Blocking AI Crawler Robots
AI Crawler Robots
There are many AI crawler robots, and new ones come online frequently. You may not want to block all of them. Below is a list of some currently known AI crawler robots, along with the lines to add to the robots.txt file to block each one. These lines can be added using the self-serve robots.txt tool in Continuum.
Anthropic-AI
Website: https://www.anthropic.com/
Blocking line to add to robots.txt (use the two lines below)
User-agent: anthropic-ai
Disallow: /
CCBot
Website: https://commoncrawl.org/ccbot
Blocking line to add to robots.txt (use the two lines below)
User-agent: CCBot
Disallow: /
ChatGPT-User
Website: https://chat.openai.com/
Blocking line to add to robots.txt (use the two lines below)
User-agent: ChatGPT-User
Disallow: /
Facebook/Meta
Website: https://developers.facebook.com/docs/sharing/bot/
Blocking line to add to robots.txt (use the two lines below)
User-agent: FacebookBot
Disallow: /
GoogleOther
Website: https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
Used by Google to crawl for internal research and development.
*Please read the documentation for this crawler to determine whether blocking is appropriate in your situation.
Blocking line to add to robots.txt (use the two lines below)
User-agent: GoogleOther
Disallow: /
Google-Extended
Website: https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
A newer user agent that feeds data to Bard (Google's conversational AI product) and the Vertex AI generative APIs.
Blocking line to add to robots.txt (use the two lines below)
User-agent: Google-Extended
Disallow: /
GPTBot
Website: https://platform.openai.com/docs/gptbot
Blocking line to add to robots.txt (use the two lines below)
User-agent: GPTBot
Disallow: /
Webz.io
Website: https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/
Blocking line to add to robots.txt (use the two lines below)
User-agent: omgilibot
Disallow: /
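Combined example
If you decide to block every crawler listed above, the individual entries can be placed together in one robots.txt file. The following is a minimal sketch, not a required layout. It assumes that each user agent gets its own group separated by a blank line, that the field name is written with a hyphen (User-agent), and that the Webz.io token is spelled omgilibot. Lines starting with # are comments. Remove any group you do not want to block.

```
# Combined example: blocks every AI crawler listed in this article.
# Remove any group you do not want to block.

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: FacebookBot
Disallow: /

# GoogleOther: read Google's crawler documentation before blocking.
User-agent: GoogleOther
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

# Assumed spelling of the Webz.io token; confirm against the Webz.io page.
User-agent: omgilibot
Disallow: /
```

These groups only affect the named crawlers. Any crawler that is not listed, including regular search engine crawlers, continues to follow whatever other rules are already in the file.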