The Complete robots.txt for 2026: Every AI Crawler You Should Know About

Many websites block AI crawlers by accident, often through a blanket Disallow rule under User-agent: * that was written before these bots existed. When that happens, ChatGPT can’t cite you, Perplexity can’t find you, and Claude can’t read you. Here’s a list of the main AI bot user agents and how to configure robots.txt for maximum visibility.

The AI Crawlers You Need to Know

OpenAI (ChatGPT)

User-agent: GPTBot          # Crawls content for model training
User-agent: ChatGPT-User    # On-demand fetches triggered by ChatGPT users
User-agent: OAI-SearchBot   # ChatGPT Search (citations!)

OAI-SearchBot is the most important for traffic. When someone asks ChatGPT “best job boards in Europe,” this bot fetches pages to cite.

Google (Gemini)

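User-agent: Google-Extended  # Control token for Gemini training/grounding (not a separate crawler)
User-agent: GoogleOther      # R&D crawling

Blocking Google-Extended opts your content out of Gemini model training and grounding. It does not affect Google Search, and Google says it is not a ranking signal. AI Overviews are part of Search and follow your Googlebot rules, so Google-Extended does not control whether you appear in them. For maximum visibility, allow both.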

Anthropic (Claude)

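User-agent: ClaudeBot         # Crawls content for model training
User-agent: Claude-User       # On-demand fetches triggered by Claude users
User-agent: Claude-SearchBot  # Indexing for Claude's search results
User-agent: anthropic-ai      # Older identifier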

Perplexity

User-agent: PerplexityBot   # Perplexity AI search

Perplexity is growing fast as an AI search engine. Being cited here drives real traffic.

Apple (Siri / Apple Intelligence)

User-agent: Applebot           # Siri, Spotlight
User-agent: Applebot-Extended  # Opt-out token for Apple's AI model training (does not crawl)

Others

User-agent: CopilotBot    # Microsoft Copilot (unconfirmed token; Bing's documented crawler is bingbot)
User-agent: YouBot         # You.com AI search
User-agent: cohere-ai      # Cohere RAG
User-agent: CCBot          # Common Crawl (feeds many AI systems)
User-agent: Bytespider     # ByteDance/TikTok
User-agent: Amazonbot      # Alexa answers
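
One way to check whether a robots.txt blocks these crawlers by accident is to run it through Python's standard-library urllib.robotparser. The sketch below is a minimal audit, not a definitive tool: blocked_agents, AI_AGENTS, and the EXAMPLE file are illustrative names, and the standard-library parser matches agent names by simple substring, so real crawlers may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the lists above (illustrative subset).
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "Google-Extended", "ClaudeBot", "PerplexityBot",
    "Applebot-Extended", "CCBot", "Bytespider", "Amazonbot",
]


def blocked_agents(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI agents that this robots.txt text denies for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, path)]


# Hypothetical file: Googlebot is allowed, everything else is denied.
EXAMPLE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Every agent in AI_AGENTS falls through to the blanket rule and is blocked.
print(blocked_agents(EXAMPLE))
```

To audit a live site instead of a string, RobotFileParser also accepts a URL and exposes read() to fetch it.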

The Simple Approach

If you want maximum AI visibility (recommended for most sites):

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

That’s it. Allow everything. Let every AI system read, cite, and reference your content.

The Selective Approach

If you want search engines and AI search, but not AI training:

# Allow search + AI search
User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

# Block training crawlers (GPTBot collects training data, so it belongs here)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
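
A selective policy is easier to maintain when it is generated from two lists and then checked before deployment. The following is a rough sketch of that idea in Python: build_robots_txt, SEARCH_AGENTS, and TRAINING_AGENTS are hypothetical names, and the split between the lists is an assumption you should adjust. GPTBot is placed with the training crawlers here because OpenAI describes it as the crawler that collects training data.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allow: list[str], block: list[str], sitemap: str) -> str:
    """Emit one group per agent: allowed agents first, then blocked ones."""
    groups = [f"User-agent: {agent}\nAllow: /\n" for agent in allow]
    groups += [f"User-agent: {agent}\nDisallow: /\n" for agent in block]
    # Groups are separated by blank lines; the Sitemap line goes last.
    return "\n".join(groups) + f"\nSitemap: {sitemap}\n"


# Assumed classification; edit these lists to match your own policy.
SEARCH_AGENTS = ["Googlebot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
TRAINING_AGENTS = ["GPTBot", "CCBot", "Bytespider"]

policy = build_robots_txt(
    SEARCH_AGENTS, TRAINING_AGENTS, "https://yoursite.com/sitemap.xml"
)

# Re-parse the generated text to confirm it expresses the intended policy.
parser = RobotFileParser()
parser.parse(policy.splitlines())
for agent in SEARCH_AGENTS + TRAINING_AGENTS:
    status = "allowed" if parser.can_fetch(agent, "/") else "blocked"
    print(f"{agent}: {status}")
```

Agents that appear in neither list are allowed by default, because the generated file has no User-agent: * group.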

Why This Matters

Ahrefs now shows an “AI Citations” metric. Sites that block AI crawlers tend to show few or no citations, while sites that allow them can be referenced in ChatGPT, Perplexity, and Gemini responses, which is increasingly where people find information.

At MisuJob, we allow ALL AI crawlers. Our job listings appear in AI search results, driving traffic from ChatGPT Search and Perplexity.

What’s your robots.txt policy for AI crawlers? Block all, allow all, or selective?