Web User-Facing

Search Engine Optimization (SEO)

  • The influence of search engine optimization on Google’s results: A multi-dimensional approach for detecting SEO, Dirk Lewandowski, Sebastian Sünkler, Nurce Yagci, ACM WebSci, 2021
    • insight from interview w/ “SEO expert”
    • questionable heuristics (e.g., HTTPS, manual website classification)
    • dataset: Google Trends, radical right, coronavirus
    • most search result likely have SEO
  • Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines, Janek Bevendorff, Matti Wiegmann, Martin Potthast, Benno Stein, Springer ECIR, 2024
    • search result on product review & spot affiliate link
      • query: best <category> where <category> is in GS1 Global Product Classification/ Google Product Taxonomy
      • filter review based on keyword regex, but 80% accuracy in test
      • manual classification of top 30 domain: authentic review/ magazine&news/ content farm/ spam/ shop/ social media/ other
    • top SEO content: repetitive, less readable, shallower URL, longer content, more heading, less heading-content overlap
      • lots of SEO metric based on HTML
      • They are also indicators of lower-quality, possibly mass-produced, or even AI-generated content.

    • comparison w/ BM25 search engine ChatNoir: much more affiliate link
  • Adversarial Search Engine Optimization for Large Language Models, Fredrik Nestaas, Edoardo Debenedetti, Florian Tramèr, arXiv, 2024
    • embed instruction/defamation in web content to manipulate RAG LLM search engine (answer engine)
      • can imply other content is bad
    • test by searching w/ site: for owned domain

examples:

ideas:

  • ranking based on user feedback
  • measuring retrieval of Perplexity, ChatGPT, etc. in search mode