Blocking AI Crawler Robots
AI Crawler Robots
There are many AI crawler robots, and new ones come online frequently. You may not want to block all of them; for example, GoogleBot and BingBot are also the crawlers behind Google and Bing search. Below is a list of currently known AI crawler robots, along with the robots.txt lines that block each one. You can add these lines using the self-serve robots.txt tool in Continuum.
Operator | Crawler | Lines to add to robots.txt (use both lines) | Notes |
---|---|---|---|
Amazon | Amazonbot | User-agent: Amazonbot<br>Disallow: / | |
Anthropic | Claude-SearchBot | User-agent: Claude-SearchBot<br>Disallow: / | |
Anthropic | Claude-User | User-agent: Claude-User<br>Disallow: / | |
Anthropic | anthropic-ai | User-agent: anthropic-ai<br>Disallow: / | |
Apple | Applebot | User-agent: Applebot<br>Disallow: / | |
ByteDance | Bytespider | User-agent: Bytespider<br>Disallow: / | |
Common Crawl | CCBot | User-agent: CCBot<br>Disallow: / | |
DuckDuckGo | DuckAssistBot | User-agent: DuckAssistBot<br>Disallow: / | |
Google | Google-CloudVertexBot | User-agent: Google-CloudVertexBot<br>Disallow: / | |
Google | GoogleBot | User-agent: GoogleBot<br>Disallow: / | |
Google | Google-Extended | User-agent: Google-Extended<br>Disallow: / | A newer user agent that feeds data to Bard (Google's generative AI product, since renamed Gemini) and the Vertex AI generative APIs. |
Google | GoogleOther | User-agent: GoogleOther<br>Disallow: / | Used by Google to crawl for internal research and development. Please read the documentation for this crawler to determine whether blocking is appropriate in your situation. |
Huawei | PetalBot | User-agent: PetalBot<br>Disallow: / | |
Internet Archive | archive.org_bot | User-agent: archive.org_bot<br>Disallow: / | |
Meta | FacebookBot | User-agent: FacebookBot<br>Disallow: / | |
Meta | Meta-ExternalAgent | User-agent: Meta-ExternalAgent<br>Disallow: / | |
Meta | Meta-ExternalFetcher | User-agent: Meta-ExternalFetcher<br>Disallow: / | |
Microsoft | BingBot | User-agent: BingBot<br>Disallow: / | |
Mistral | MistralAI-User | User-agent: MistralAI-User<br>Disallow: / | |
OpenAI | ChatGPT-User | User-agent: ChatGPT-User<br>Disallow: / | |
OpenAI | GPTBot | User-agent: GPTBot<br>Disallow: / | |
OpenAI | OAI-SearchBot | User-agent: OAI-SearchBot<br>Disallow: / | |
Perplexity | Perplexity-User | User-agent: Perplexity-User<br>Disallow: / | |
Perplexity | PerplexityBot | User-agent: PerplexityBot<br>Disallow: / | |
ProRata.ai | ProRataInc | User-agent: ProRataInc<br>Disallow: / | |
Timpi | Timpibot | User-agent: Timpibot<br>Disallow: / | |
Webz.io | Omgilibot | User-agent: Omgilibot<br>Disallow: / | |

Please be aware that the landscape of AI crawlers and bots is constantly evolving, and this list may not be exhaustive. Do your own research and verify whether any specific AI crawler or bot relevant to your needs should be blocked or allowed.
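As a rough way to sanity-check a set of blocking rules before pasting them into the robots.txt tool, the sketch below builds the two-line group for each chosen crawler and then asks Python's standard `urllib.robotparser` module which user agents those rules would block. The function names `build_block_rules` and `blocked_crawlers` and the URL `https://example.com/` are illustrative assumptions, not part of Continuum, and the parser's matching is only an approximation of how an individual crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser


def build_block_rules(crawlers):
    """Return robots.txt text with one two-line blocking group per crawler."""
    groups = [f"User-agent: {name}\nDisallow: /" for name in crawlers]
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


def blocked_crawlers(robots_txt, crawlers, url="https://example.com/"):
    """Return the names in `crawlers` that the rules forbid from fetching `url`.

    The URL is a placeholder (assumption); only its path is checked.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, url)]


# Hypothetical selection: block three AI crawlers, leave GoogleBot alone.
to_block = ["GPTBot", "CCBot", "PerplexityBot"]
rules = build_block_rules(to_block)
print(rules)

# GoogleBot is included in the check to confirm it is NOT blocked.
print(blocked_crawlers(rules, to_block + ["GoogleBot"]))
```

Keeping one group per crawler mirrors the table above, so a single crawler can be unblocked later by deleting its two lines without touching the rest.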