Baseline Websites
curl/seed_db.py
- assume a webpage published before the ChatGPT release (Nov 30, 2022) is likely human-written
- publication date extraction: htmldate: A Python package to extract publication dates from web pages, Adrien Barbaresi, JOSS, 2020
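A minimal sketch of the date cutoff above, not the contents of curl/seed_db.py. It assumes htmldate's `find_date` is used with its default `YYYY-MM-DD` output; the helper names and the choice to exclude undated pages are hypothetical.

```python
from datetime import date
from typing import Optional

# Cutoff from the notes: the ChatGPT release date.
CHATGPT_RELEASE = date(2022, 11, 30)


def extract_publication_date(url: str) -> Optional[str]:
    """Return the page's publication date as 'YYYY-MM-DD', or None if not found."""
    # Imported lazily so the rest of the sketch runs without htmldate installed.
    from htmldate import find_date

    return find_date(url)


def likely_human_written(published: Optional[str]) -> bool:
    """True only when a date was found and it precedes the ChatGPT release."""
    if published is None:
        # Assumption: undated pages are excluded rather than guessed.
        return False
    return date.fromisoformat(published) < CHATGPT_RELEASE
```

Usage would look like `likely_human_written(extract_publication_date(url))`; the comparison is strict, so a page dated exactly Nov 30, 2022 is not treated as human-written.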
Human-written
credible source
- Wikihow https://www.wikihow.com/
- Reddit recommended blog
personal website
- (30x) IndieWeb Wiki: registry of personal website
- after filtering by having sitemap, most are tech blog
- WordPress directory?
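The notes do not say how "having sitemap" was checked. One possible check, sketched here under that assumption, reads the `Sitemap:` lines of a site's robots.txt; the function names are hypothetical, and a site may also serve /sitemap.xml without declaring it.

```python
from typing import List


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Collect URLs declared via 'Sitemap:' lines in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # because the sitemap URL itself contains colons.
        field, sep, value = line.split("#", 1)[0].partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def has_sitemap(robots_txt: str) -> bool:
    """True if the robots.txt body declares at least one sitemap."""
    return bool(sitemaps_from_robots(robots_txt))
```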
company website
- EDGAR database → company name → search for website: many of them do not have a website
- US Business Database? no website link
- LinkedIn? forbids crawling
- (30x) Russell 2000
- many do not have blog; many are after ChatGPT; many have no sitemap
- after filtering by having sitemap, most are tech company
- content: most are blog/ company statement (news); some are service/product description; few functional page e.g. form
- find .*/blog/.* URL in CommonCrawl?
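A small sketch of selecting blog pages from a site's sitemap with the `.*/blog/.*` pattern noted above. It assumes the sitemap XML has already been downloaded as text; the function names are illustrative, and querying CommonCrawl is not covered.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

BLOG_URL = re.compile(r".*/blog/.*")  # pattern taken from the notes


def sitemap_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> value in a sitemap, ignoring XML namespaces."""
    root = ET.fromstring(sitemap_xml)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def blog_urls(urls: List[str]) -> List[str]:
    """Keep only URLs whose full string matches the blog pattern."""
    return [u for u in urls if BLOG_URL.fullmatch(u)]
```

The pattern requires a `/blog/` segment followed by a slash, so a bare `https://example.com/blog` index page is not selected.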
Machine-generated
- (2x) self-claim generated
- sell solution for generating article https://eulogygenerator.com/
- AI search https://www.neuralword.com/
- generated using Wix and B12 manually or w/ Browser-Use
- prompt generated by ChatGPT/Gemini (curl/suggestions.py)
- summarize given URL as website description; suggest name for similar site
- suggest 50 name+description for blog post
- input into Wix/B12 to generate home page + boilerplate + 20 blog post
- Browser-Use only usable w/ best model like Gemini 2.5 Pro
- expensive: spent ~$8/site just for LLM API
- 30x Wix site correspond to Russell 2000 company site
- 30x B12 site correspond to IndieWeb site
- 4x + 4x other
- (not included in dataset) clear cue, e.g., “as an AI”
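The exclusion of pages with a clear cue could be approximated as below. This is a sketch only: "as an AI" is the single cue named in the notes, and the function name and matching rules (case-insensitive, whole words) are assumptions.

```python
import re
from typing import Iterable

# "as an AI" is the cue named in the notes; extend the tuple with other
# phrases only after checking them against real pages.
CUE_PHRASES = ("as an AI",)


def has_clear_cue(text: str, cues: Iterable[str] = CUE_PHRASES) -> bool:
    """True if the text contains a cue phrase as whole words, ignoring case."""
    normalized = re.sub(r"\s+", " ", text).lower()
    return any(
        re.search(r"\b" + re.escape(cue.lower()) + r"\b", normalized)
        for cue in cues
    )
```

Word boundaries keep a phrase such as "used as an aid" from being flagged.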
Training/test dataset
- company website dataset: 30x Wix vs. 30x Russell 2000
- personal website dataset: 30x B12 vs. 30x IndieWeb Wiki
- other website dataset: 4x Wix + 4x B12 + 2x self-claim generated vs. 8 Reddit recommended blog + 6 top blog
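The composition above can be written down as a small table to check the class balance. The counts are copied from the list; the dictionary layout and function names are only one possible representation.

```python
from typing import Dict

# Site counts from the notes: dataset -> label -> source -> number of sites.
DATASETS: Dict[str, Dict[str, Dict[str, int]]] = {
    "company": {
        "generated": {"Wix": 30},
        "human": {"Russell 2000": 30},
    },
    "personal": {
        "generated": {"B12": 30},
        "human": {"IndieWeb Wiki": 30},
    },
    "other": {
        "generated": {"Wix": 4, "B12": 4, "self-claim generated": 2},
        "human": {"Reddit recommended blog": 8, "top blog": 6},
    },
}


def label_totals(name: str) -> Dict[str, int]:
    """Number of sites per label within one dataset."""
    return {label: sum(src.values()) for label, src in DATASETS[name].items()}


def overall_totals() -> Dict[str, int]:
    """Number of sites per label across all datasets."""
    totals: Dict[str, int] = {}
    for name in DATASETS:
        for label, count in label_totals(name).items():
            totals[label] = totals.get(label, 0) + count
    return totals
```

Summed this way, the "other" dataset has 10 generated and 14 human sites, and the three datasets together have 70 generated and 74 human sites.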
note:
- human sites have blogs, statements, service descriptions, etc., while generated sites are mostly blogs
- generated sites have boilerplate pages (e.g., policy) added by the website generator, causing occasional high Binoculars scores
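One possible way to set such boilerplate pages aside before scoring is to match the last URL path segment against a list of known slugs. This is a sketch, not something the notes say was done; the slug list and function name are hypothetical and would need checking against the pages Wix and B12 actually emit.

```python
from urllib.parse import urlparse

# Hypothetical slugs for generator-supplied pages (policy, terms, FAQ, 404).
BOILERPLATE_SLUGS = {
    "privacy-policy",
    "privacy",
    "terms-of-service",
    "terms",
    "faq",
    "404",
}


def is_boilerplate(url: str) -> bool:
    """True if the last path segment of the URL is a known boilerplate slug."""
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1].lower()
    return last_segment in BOILERPLATE_SLUGS
```

Exact slug matching is deliberately strict, so a blog post whose title merely contains "terms" is kept.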
AI website generator
- 10Web claims to generate & host a website from a name & description
- landing page & 1-paragraph sample article
- claim to have generated 1.5M+ websites
- from $13/month; need $28/month “pro” to edit&multi-site; WordPress, Cloudflare CDN
- ❌ need $49/month for each additional website
- ❌ extremely slow when generating, e.g., >10min/page
- Wix AI Website Builder
- landing page & short/long blog article on demand
- allow multiple site, sell domain&service instead of generator
- for arbitrary page, “Generate Full Page Text” produce poor result
- ❌ only generate 1 very short text block & no layout generation
- ContentBot.ai automate AI-driven content creation
- claim to be used on ABCNews, Contagious, PR Week, etc.
- no free trial; from $0.5/1000 word, $29/month for full plan
- Copy.ai go-to market AI for marketing, sales, etc.
- claim to be used by SIEMENS, Rubrik, etc.
- no free trial; $49/month for starter individual plan; mainly target business
- WebWave AI
- landing page & manually written blog
- ❌ very slow; had bug of not publishing blog
- from $3.5/month; $5/month for blog&SEO
- B12
- landing page/ medium-length blog/ service/project description/ team member, on demand
- or any page given name+description
- from $42/month
- very fast generation
- Contentful AI Content Generator use OpenAI API to write content
- HubSpot AI Website Generator optimize existing company website
- only generate landing page
- Relume only generate mockup/HTML
- Webflow only generate layout
- GoDaddy Airo focus on marketing & selling
- ❌ need GoDaddy domain
- Dorik AI
- ❌ need $39/month for unlimited #page, else limit to 5 (free) or 25 ($18/month) per site
- Vzy
- $10/month/site for 100 page
- Wegic, Tilda, Shopify Magic?
Provided example generated sites
- hand-picked; probably not purely generated
- some not text-heavy (mainly image, etc.)
- commonly business w/ /blog; unlike most content farm found
each generator:
- 10Web https://help.10web.io/hc/en-us/articles/360031026572-Can-You-Provide-Examples-of-Websites-Hosted-on-10Web
- seem not purely generated
- Wix https://www.wix.com/blog/wix-artificial-design-intelligence Examples of sites created with Wix’s ADI-powered website builder
- most down
- B12 https://www.commoninja.com/blog/b12-ai-website-builder#Examples-of-Websites-Designed-with-B12%23Examples-of-Websites-Designed-with-B12
Website generator capability
- Examining the Accessibility of Generative AI Website Builder Tools for Blind and Low Vision Users: 21 Best Practices for Designers and Developers, Sushil K. Oswal, Hitender K. Oswal, ProComm, 2024
- dorik.com, relume.io, wix.com capability
- generate landing page, sitemap, wireframe on description
- customize layout, style
- generate text&image w/ prompt
- most tutorial/comparison only showcase landing page generation
- product description/ event description/ tech doc/ blog
- marketing email/ social media post/ SEO
- not content: AI chat support, analytics report
- not AI functionality
- boilerplate page: FAQ/ privacy policy/ terms of service/ 404
- selling product/service/ booking/ form
What category to cover
- can only cover what AI website generator can generate
covering:
- personal/company/organization blog
want:
- personal/team project description
- news
- products