Robots.txt Tester
Enter any domain to analyze its robots.txt. See which search engines and AI crawlers are blocked, find sitemaps, and identify access issues.
What This Tool Checks
AI crawler access
GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot, Google-Extended
Search engine access
Googlebot, Bingbot, DuckDuckBot, YandexBot, Baiduspider, Slurp
Sitemap directives
Checks for Sitemap URLs declared in robots.txt
Rule analysis
Parses all Allow, Disallow, and Crawl-delay rules per User-Agent
Blocking issues
Flags bots that appear to be unintentionally blocked from crawling
Raw file view
Shows the complete robots.txt contents for manual review
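A simplified version of the access and sitemap checks above can be sketched with Python's standard-library urllib.robotparser. The robots.txt contents, domain, and paths below are hypothetical, chosen only to illustrate the parsing logic:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot entirely, blocks /private/
# for everyone else, and declares a sitemap.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# GPTBot matches its own group, so it is blocked everywhere.
print(rp.can_fetch("GPTBot", "https://example.com/page"))         # False
# OAI-SearchBot falls through to the wildcard group.
print(rp.can_fetch("OAI-SearchBot", "https://example.com/page"))  # True
print(rp.can_fetch("*", "https://example.com/private/x"))         # False
# Sitemap directives declared in the file.
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

A real checker would fetch the live file (for example with rp.set_url(...) and rp.read()) and test each known crawler's User-Agent against a sample of site URLs.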
Frequently Asked Questions
What is robots.txt?
robots.txt is a plain-text file at the root of your website (example.com/robots.txt) that tells crawlers which pages they may and may not access. It uses 'Allow' and 'Disallow' directives per User-Agent to control crawler behavior, and it's the first file any well-behaved crawler checks before crawling your site.
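For example, a minimal robots.txt (the paths here are purely illustrative) might look like:

```txt
User-agent: *
Disallow: /admin/
Allow: /admin/public/

Sitemap: https://example.com/sitemap.xml
```

Each 'User-agent' line starts a group of rules; a blank line ends it. 'Sitemap' lines stand alone and apply to all crawlers.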
Does robots.txt affect AI search visibility?
Yes — critically. In 2026, AI search engines like ChatGPT (OAI-SearchBot), Claude (Claude-SearchBot), and Perplexity (PerplexityBot) respect robots.txt. If you block these crawlers, your content won't appear in AI search results. Many sites accidentally block AI crawlers, losing all AI search visibility.
What's the difference between GPTBot and OAI-SearchBot?
GPTBot collects data for OpenAI model training. OAI-SearchBot crawls content for ChatGPT search results. You can block GPTBot (preventing training use) while allowing OAI-SearchBot (keeping ChatGPT search visibility). Same pattern for Anthropic: ClaudeBot (training) vs Claude-SearchBot (search).
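Put together, a robots.txt that opts out of model training while keeping AI search visibility could look like this sketch (the paths and policy are an example, not a recommendation):

```txt
# Block model-training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /
```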
Should I have a Sitemap directive in robots.txt?
Yes. The Sitemap directive in robots.txt tells all crawlers where to find your XML sitemap. While Google can find sitemaps through Search Console, other crawlers (including AI bots) rely on this directive. Format: Sitemap: https://example.com/sitemap.xml
What happens if I don't have a robots.txt?
Without a robots.txt file, all crawlers assume they can access everything on your site. This is fine for most sites, but it means you can't control which bots access your content. If you want to block AI training bots while allowing search bots, you need a robots.txt file.
Can robots.txt block Google from indexing a page?
robots.txt prevents crawling, not indexing. If Google finds a link to a blocked page, it may still index the URL (with a 'URL is blocked by robots.txt' note); it just can't see the content. To prevent indexing, use a noindex meta tag instead, and make sure that page is not blocked in robots.txt, or crawlers will never see the tag. Use robots.txt for crawler access control, not indexation control.
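A noindex directive goes in the page's HTML head, or can be sent as an HTTP response header (the snippet below is a generic example):

```html
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">

<!-- Or as an HTTP response header:
X-Robots-Tag: noindex -->
```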
Audit crawler access across your entire site
CrawlRaven audits robots.txt, crawl access, and 200+ other technical SEO factors in a single site-wide crawl.