
Robots.txt Generator

Free online robots.txt generator with presets for common configurations. Create robots.txt files for your website with support for AI bot blocking, WordPress, and custom rules.

What is Robots.txt Generator?

Robots.txt Generator is a free online tool that creates properly formatted robots.txt files for your website. The robots.txt file is a standard text file placed at the root of your website that tells search engine crawlers and other bots which pages or sections they are allowed or not allowed to access. This tool supports all major directives including User-agent, Disallow, Allow, Sitemap, and Crawl-delay — with presets for common configurations and support for blocking AI training bots like GPTBot and Google-Extended.
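As an illustration of the output, a minimal robots.txt using each of these directives might look like the following (the domain and paths are placeholders):

```text
# Apply these rules to all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

# Declare the sitemap location (applies to all crawlers)
Sitemap: https://example.com/sitemap.xml
```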

When to use it?

Use this tool when launching a new website, updating your crawl policies, or blocking specific bots from accessing parts of your site. It is especially useful when you want to prevent AI crawlers from scraping your content for training data, hide admin or staging areas from search engines, or ensure your sitemap location is properly declared for all crawlers.

Common use cases

Web developers create robots.txt files when deploying new sites to control search engine indexing. SEO specialists configure crawl directives to prevent duplicate content issues and protect private sections. Site owners block AI training bots (GPTBot, CCBot, Google-Extended) from scraping their content. WordPress administrators block access to wp-admin, wp-includes, and other sensitive directories while allowing CSS and JS files for proper rendering.

Key Concepts

Essential terms and definitions related to robots.txt and the Robots Exclusion Protocol.

Robots Exclusion Protocol

A standard (originally proposed in 1994) that defines how web crawlers should interact with websites. The robots.txt file, the noindex meta tag, and the X-Robots-Tag HTTP header are all part of this protocol. Compliance is voluntary — well-behaved crawlers follow the rules, but the protocol provides no enforcement mechanism.

User-agent Directive

A robots.txt directive that specifies which crawler the following rules apply to. User-agent: * matches all crawlers, while User-agent: Googlebot targets only Google's crawler. Multiple User-agent blocks can be defined in one robots.txt file to apply different rules to different bots.
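For example, two User-agent blocks can give Googlebot and all other crawlers different rules (the paths here are illustrative):

```text
# Googlebot may crawl everything except /private/
User-agent: Googlebot
Disallow: /private/

# All other crawlers are restricted more broadly
User-agent: *
Disallow: /private/
Disallow: /tmp/
```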

Crawl-delay Directive

A robots.txt directive that requests crawlers to wait a specified number of seconds between requests. For example, Crawl-delay: 10 asks bots to wait 10 seconds between each page fetch. Bing respects this directive; Google does not — use Google Search Console's crawl rate settings instead. Use crawl-delay when your server has limited resources and heavy bot traffic is causing performance issues.
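In context, a Crawl-delay rule sits inside the User-agent block it applies to (here targeting Bingbot, since Bing honors the directive):

```text
# Ask Bingbot to wait 10 seconds between fetches
User-agent: Bingbot
Crawl-delay: 10
Disallow: /search/
```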

Sitemap Directive

A robots.txt directive that declares the location of your XML sitemap: Sitemap: https://example.com/sitemap.xml. This helps search engines discover all pages on your site, including those not reachable through internal links. Multiple Sitemap directives can be specified. The Sitemap directive is not tied to any User-agent block and applies to all crawlers.
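Since multiple Sitemap directives are allowed and none are tied to a User-agent block, they are typically listed together (URLs are placeholders):

```text
# Sitemap directives sit outside User-agent blocks and apply globally
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
```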

Frequently Asked Questions

Where should I place the robots.txt file?

The robots.txt file must be placed at the root of your domain: https://example.com/robots.txt. It must be accessible at that exact URL. Placing it in a subdirectory will not work — crawlers only look for it at the domain root. The file must be served with a text/plain Content-Type header.

Does robots.txt prevent pages from appearing in search results?

No. Robots.txt blocks crawling, not indexing. If other pages link to a URL that is disallowed in robots.txt, search engines may still index the URL based on anchor text and link context — they just won't crawl the page content. To prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header instead.

How do I block AI crawlers like GPTBot and Google-Extended?

Add separate User-agent blocks for each AI crawler you want to block: User-agent: GPTBot followed by Disallow: /. This tool includes presets for major AI crawlers including GPTBot (OpenAI), ChatGPT-User, Google-Extended (Gemini), CCBot (Common Crawl), anthropic-ai, and Bytespider (TikTok). Note that compliance is voluntary — well-behaved bots respect robots.txt, but malicious scrapers may ignore it.
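To sanity-check a generated file before deploying it, Python's standard-library urllib.robotparser can evaluate plain Disallow rules (note it implements simple prefix matching only, not the * / $ wildcard extensions). The robots.txt content below is a sketch:

```python
from urllib.robotparser import RobotFileParser

# A generated robots.txt that blocks GPTBot entirely
# and hides /admin/ from everyone else
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# GPTBot is blocked everywhere; other crawlers only lose /admin/
print(rp.can_fetch("GPTBot", "https://example.com/article"))
print(rp.can_fetch("Googlebot", "https://example.com/article"))
print(rp.can_fetch("Googlebot", "https://example.com/admin/x"))
```

Running the sketch prints False, True, False: GPTBot is denied the article, Googlebot may fetch it, and Googlebot is denied the admin path.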

Can I use wildcards in robots.txt paths?

Yes. Google and Bing support the * wildcard and the $ end-of-URL anchor. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf, and Disallow: /*/private/ blocks any URL containing /private/ in the path. However, not all crawlers support these extensions — the original robots.txt specification only defines exact prefix matching.
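Put together, a wildcard-using file might look like this (rules are illustrative, and will only be honored by crawlers that implement the extensions):

```text
User-agent: *
# Block all URLs ending in .pdf ($ anchors the end of the URL)
Disallow: /*.pdf$
# Block any URL with /private/ after at least one path segment
Disallow: /*/private/
```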

What is the Sitemap directive and why is it important?

The Sitemap directive tells crawlers where to find your XML sitemap: Sitemap: https://example.com/sitemap.xml. This is important because it helps search engines discover pages that might not be reachable through internal links alone. The Sitemap directive can be placed outside any User-agent block and applies globally.

Troubleshooting & Technical Tips

Common errors developers encounter and how to resolve them.

Pages still appearing in Google despite Disallow rule

Robots.txt blocks crawling, not indexing. If external sites link to your disallowed pages, Google can still index the URL (showing a title and snippet derived from links, not page content). To fully prevent indexing, add a <meta name="robots" content="noindex"> tag to the page HTML, or send an X-Robots-Tag: noindex HTTP header. Note: Google must be able to crawl a page to see a noindex tag, so do not both disallow and noindex the same URL.
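The two noindex mechanisms mentioned above look like this in practice:

```text
# Option 1: meta tag in the page's HTML <head>
<meta name="robots" content="noindex">

# Option 2: HTTP response header (also works for PDFs and other non-HTML files)
X-Robots-Tag: noindex
```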

AI bots ignoring robots.txt: GPTBot still scraping content

Robots.txt compliance is voluntary. While reputable bots (Googlebot, Bingbot, GPTBot) honor robots.txt, some AI scrapers do not. For additional protection: use your web server or CDN (Cloudflare, Fastly) to block known AI bot user-agent strings at the server level, implement rate limiting, or require JavaScript rendering that simple scrapers cannot execute. Robots.txt should be your first line of defense but not your only one.
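As one example of server-level blocking, a minimal nginx sketch placed inside a server {} block can reject requests by User-Agent string (the bot list here is illustrative; maintain your own, and note that scrapers can spoof their User-Agent):

```nginx
# Return 403 Forbidden to known AI-bot User-Agent strings
if ($http_user_agent ~* "(GPTBot|CCBot|Bytespider)") {
    return 403;
}
```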

Googlebot cannot access CSS and JS files: Page renders incorrectly

If you disallow CSS or JavaScript directories in robots.txt, Googlebot cannot render your pages properly — leading to poor indexing and potential ranking drops. Google's Search Console will flag this as a crawling issue. Always allow access to CSS, JS, and image files that are needed for page rendering. Use Allow: /wp-content/uploads/ and Allow: /wp-includes/css/ in WordPress robots.txt configurations.
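A WordPress robots.txt following this advice might look like the sketch below (the sitemap URL assumes WordPress's default wp-sitemap.xml; adjust for your setup):

```text
User-agent: *
Disallow: /wp-admin/
# admin-ajax.php is used by many themes and plugins on the front end
Allow: /wp-admin/admin-ajax.php
# Keep rendering assets crawlable
Allow: /wp-content/uploads/
Allow: /wp-includes/css/
Allow: /wp-includes/js/

Sitemap: https://example.com/wp-sitemap.xml
```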
