Many websites block AI crawlers without meaning to. When that happens, ChatGPT can’t cite you, Perplexity can’t find you, and Claude can’t read you. Here’s a list of the main AI bot user agents and how to configure robots.txt for maximum visibility.
The AI Crawlers You Need to Know
OpenAI (ChatGPT)
User-agent: GPTBot # Crawls content that may be used for model training
User-agent: ChatGPT-User # Fetches pages when a ChatGPT user's request needs them
User-agent: OAI-SearchBot # ChatGPT Search (citations!)
OAI-SearchBot is the most important for traffic. It indexes pages so that ChatGPT Search can surface and cite them when someone asks, say, “best job boards in Europe.”
Google (Gemini)
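User-agent: Google-Extended # Gemini training and grounding (a control token, not a separate crawler)
User-agent: GoogleOther # R&D crawling
Blocking Google-Extended opts your content out of Gemini training and grounding. It does not affect Google Search inclusion or ranking, and it does not remove you from AI Overviews, which are part of Search and follow your Googlebot rules. Usually you want to allow both.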
Anthropic (Claude)
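User-agent: ClaudeBot # Crawls content that may be used for model training
User-agent: Claude-User # Fetches pages when a Claude user's request needs them
User-agent: Claude-SearchBot # Indexes pages for Claude's search results
User-agent: anthropic-ai # Older identifier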
Perplexity
User-agent: PerplexityBot # Perplexity AI search
Perplexity is growing fast as an AI search engine. Being cited here drives real traffic.
Apple (Siri / Apple Intelligence)
User-agent: Applebot # Siri, Spotlight
User-agent: Applebot-Extended # Opt-out token for Apple's AI model training (does not crawl itself)
Meta
User-agent: meta-externalagent # Meta AI training
User-agent: FacebookBot # Meta language-model training (link previews use facebookexternalhit)
Others
User-agent: CopilotBot # Microsoft Copilot
User-agent: YouBot # You.com AI search
User-agent: cohere-ai # Cohere RAG
User-agent: CCBot # Common Crawl (feeds many AI systems)
User-agent: Bytespider # ByteDance/TikTok
User-agent: Amazonbot # Alexa answers
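To see which of these agents an existing robots.txt turns away, a quick audit can be sketched with Python's standard `urllib.robotparser`. The `AI_AGENTS` subset, the `blocked_agents` helper, and the `SAMPLE` file are illustrative assumptions, not part of any crawler's tooling. Note that Python's parser uses first-match, substring-based agent matching, so real crawlers may read edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# A subset of the user-agent tokens listed above (extend as needed).
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "Google-Extended",
    "ClaudeBot", "PerplexityBot", "Applebot-Extended", "CCBot",
]


def blocked_agents(robots_txt: str, path: str = "/") -> list[str]:
    """Return the agents from AI_AGENTS that may not fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, path)]


# Hypothetical robots.txt: only Googlebot is allowed, and the catch-all
# rule shuts out every crawler that is not named explicitly.
SAMPLE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

print(blocked_agents(SAMPLE))
```

With this sample, every agent in `AI_AGENTS` comes back as blocked, which is the typical shape of an accidental block: nobody wrote a rule against AI crawlers, but the catch-all group covers them anyway.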
The Simple Approach
If you want maximum AI visibility (recommended for most sites):
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
That’s it. Allow everything. Let every AI system read, cite, and reference your content.
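As a sanity check, the allow-all file can be parsed locally before it is deployed. This is a minimal sketch using Python's standard `urllib.robotparser` (the `site_maps()` call needs Python 3.8 or later). The `load_policy` helper and the `/jobs/123` page URL are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
"""


def load_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = load_policy(ALLOW_ALL)
page = "https://yoursite.com/jobs/123"  # hypothetical page URL

for agent in ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"):
    verdict = "allowed" if parser.can_fetch(agent, page) else "blocked"
    print(f"{agent}: {verdict}")

# The Sitemap line is independent of any user-agent group.
print("Sitemaps:", parser.site_maps())
```

Every agent should report as allowed, and the sitemap URL should be listed, since the single `*` group applies to any crawler without a more specific group of its own.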
The Selective Approach
If you want search engines and AI search, but not AI training:
# Allow search + AI search
User-agent: Googlebot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
# Block training crawlers (GPTBot is OpenAI's training crawler; OAI-SearchBot above handles ChatGPT Search)
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
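A selective policy is easy to get subtly wrong, so it can help to state the intent as two lists and check the file against them. The sketch below again uses Python's standard `urllib.robotparser`. The `policy_mismatches` helper and the `POLICY` text are assumptions for illustration, and the grouped `User-agent` lines are just a compact way to write the same kind of rules.

```python
from urllib.robotparser import RobotFileParser


def policy_mismatches(robots_txt, expect_allowed, expect_blocked, path="/"):
    """List every agent whose actual access differs from the intent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent in expect_allowed:
        if not parser.can_fetch(agent, path):
            problems.append(f"{agent} is blocked but should be allowed")
    for agent in expect_blocked:
        if parser.can_fetch(agent, path):
            problems.append(f"{agent} is allowed but should be blocked")
    return problems


# Illustrative policy: search-facing agents allowed, two crawlers blocked.
POLICY = """\
User-agent: Googlebot
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /

User-agent: CCBot
User-agent: Bytespider
Disallow: /
"""

print(policy_mismatches(
    POLICY,
    expect_allowed=["Googlebot", "OAI-SearchBot", "PerplexityBot"],
    expect_blocked=["CCBot", "Bytespider"],
))
```

One design point worth noting: with no `User-agent: *` group, any crawler the file does not name is allowed by default. Passing an unlisted agent such as `Amazonbot` in `expect_blocked` would surface that as a mismatch.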
Why This Matters
Ahrefs now shows an “AI Citations” metric. Sites that block AI crawlers tend to show 0 citations, while sites that allow them can be referenced in ChatGPT, Perplexity, and Gemini responses, which is increasingly where people find information.
At MisuJob, we allow ALL AI crawlers. Our job listings appear in AI search results, driving traffic from ChatGPT Search and Perplexity.
What’s your robots.txt policy for AI crawlers? Block all, allow all, or selective?

