---
title: "Free Robots.txt Generator · Block AI Bots & Control Crawlers · Nuxt SEO"
meta:
  "og:description": "Free robots.txt generator with AI bot blocking. Create, test & download robots.txt files. Block GPTBot, ClaudeBot, CCBot & 20+ crawlers. No signup required."
  "og:title": "Free Robots.txt Generator · Block AI Bots & Control Crawlers · Nuxt SEO"
  description: "Free robots.txt generator with AI bot blocking. Create, test & download robots.txt files. Block GPTBot, ClaudeBot, CCBot & 20+ crawlers. No signup required."
---

# **Free Robots.txt Generator**

Create robots.txt files with AI bot blocking. Block GPTBot, ClaudeBot, CCBot & 20+ crawlers. Test rules and download instantly.

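## **Rules**

Each rule takes a list of Disallow paths (none by default) and Allow paths. You can also add Sitemaps and set two kinds of content-preference signals:

- Content-Usage ([IETF](https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/)): search, train-ai
- Content-Signal ([Cloudflare](https://blog.cloudflare.com/content-signals-policy/)): search, ai-input, ai-train

A signal set to "No preference" is left out of the output.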

robots.txt

```
# Generated by Nuxt SEO
# https://nuxtseo.com/tools/robots-txt-generator
User-agent: *
Allow: /
```

**Use with AI Assistants**

These tools are also available in the free Nuxt SEO MCP server for Claude, Cursor, VS Code, and more.

[Setup MCP Server](https://nuxtseo.com/docs/nuxt-seo/guides/mcp)

## **Test Your Rules**

Enter a URL path and a user-agent to check whether the current rules allow it. With the default rules, every path is granted access (matched: `User-agent: *`, `Allow: /`).

## **Why Block AI Bots?**

AI bots like GPTBot, ClaudeBot, and CCBot crawl the web to train Large Language Models (LLMs). Blocking them can:

- **Prevent AI Scraping:** Stop your content from being used to answer AI queries without attribution.
- **Stop Model Training:** Opt out of having your data used to train future AI models.
- **Reduce Server Load:** AI crawlers can be aggressive. Blocking them saves bandwidth.

#### **Key Bots to Block**

**GPTBot**, **ChatGPT-User**, **ClaudeBot**, **anthropic-ai**, **CCBot**, **Google-Extended**, **Bytespider**, **PerplexityBot**

Use the **Block AI Crawlers** preset above to block all of these instantly.
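
As a rough sketch, a rule set that blocks the bots listed here could look like the following. The preset's exact output is an assumption; this only illustrates the general shape, with several `User-agent` lines sharing one `Disallow: /` rule.

```
# Sketch only: the generator's actual preset output may differ.
# Several User-agent lines can share one group of rules.
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
User-agent: PerplexityBot
Disallow: /

# Every other crawler keeps full access.
User-agent: *
Allow: /
```

Because robots.txt is advisory, this only affects crawlers that choose to honor it.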

## **Common User Agents**

### **Search Engines**

- [Googlebot](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers): Google Search (28% of bot traffic)
- [Bingbot](https://www.bing.com/webmasters/help/which-crawlers-does-bing-use-8c184ec0): Microsoft Bing Search
- [DuckDuckBot](https://duckduckgo.com/duckduckbot): DuckDuckGo Search
- [YandexBot](https://yandex.com/support/webmaster/robot-workings/check-yandex-robots.html): Yandex Search (Russia)
- [Baiduspider](https://www.baidu.com/search/robots_english.html): Baidu Search (China)

### **AI Crawlers**

- [GPTBot](https://platform.openai.com/docs/bots): OpenAI model training (7.5% of bot traffic, most blocked bot)
- [ChatGPT-User](https://platform.openai.com/docs/bots): ChatGPT live browsing
- [OAI-SearchBot](https://platform.openai.com/docs/bots): ChatGPT Search feature
- [ClaudeBot](https://www.anthropic.com/crawlers-info): Anthropic model training
- [Claude-Web](https://www.anthropic.com/crawlers-info): Claude live browsing
- [anthropic-ai](https://www.anthropic.com/crawlers-info): Anthropic data collection
- [CCBot](https://commoncrawl.org/ccbot): Common Crawl dataset (frequently blocked, used by many AI labs)
- [Google-Extended](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#google-extended): Gemini/Bard training (separate from Search)
- [PerplexityBot](https://docs.perplexity.ai/guides/bots): Perplexity AI search
- Bytespider: TikTok/ByteDance AI
- [Amazonbot](https://developer.amazon.com/amazonbot): Amazon Alexa training
- [cohere-ai](https://cohere.com/robots): Cohere model training
- [Meta-ExternalAgent](https://www.facebook.com/externalhit_uatext.php): Meta AI training
- [meta-externalfetcher](https://www.facebook.com/externalhit_uatext.php): Meta data fetching
- [Applebot-Extended](https://support.apple.com/en-us/119829): Apple AI training (not Search)

### **Social Platforms**

- [facebookexternalhit](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/): Facebook/Meta link previews
- [Twitterbot](https://developer.x.com/en/docs/x-for-websites/cards/guides/getting-started): Twitter/X link previews
- [LinkedInBot](https://www.linkedin.com/help/linkedin/answer/a521928): LinkedIn link previews
- [Slackbot](https://api.slack.com/robots): Slack link previews
- Discordbot: Discord link previews
- WhatsApp: WhatsApp link previews
- TelegramBot: Telegram link previews

Source: [Cloudflare 2025 Bot Traffic Report](https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/)

## **Directives**

### **Core Directives**

- `User-agent: *`: applies to all crawlers
- `Disallow: /`: block the entire site
- `Allow: /`: explicitly allow (for exceptions)
- `Crawl-delay: 10`: wait 10 seconds between requests (Bing/Yandex; Google ignores this directive)
- `Sitemap: URL`: specify the sitemap location
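
To show how the core directives fit together, here is a minimal sketch of a file that uses all of them. The domain, paths, and the choice of Bingbot are placeholders picked for illustration, not output from the generator.

```
# Illustrative values: example.com and the paths are placeholders.
User-agent: *
Disallow: /admin/
Allow: /admin/help/

# A separate group for one crawler. A crawler follows only the most
# specific group that matches it, so Bingbot ignores the * group above
# and needs its own Disallow line.
User-agent: Bingbot
Crawl-delay: 10
Disallow: /admin/

# Sitemap lines sit outside any group and apply to all crawlers.
Sitemap: https://example.com/sitemap.xml
```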

### **Content-Usage** [IETF](https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/)

Uses `y`/`n` values.

- `Content-Usage: search=y`: allow search indexing
- `Content-Usage: train-ai=n`: disallow AI model training

Combine: `Content-Usage: search=y, train-ai=n`

### **Content-Signal** [Cloudflare](https://blog.cloudflare.com/content-signals-policy/)

Uses `yes`/`no` values.

- `search=yes`: allow search indexing
- `ai-input=no`: disallow live AI answers
- `ai-train=no`: disallow model training
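
A short sketch of how both kinds of signal might appear in a robots.txt group. Placing them next to the `Allow` line inside a `User-agent` group is an assumption based on the linked IETF draft and Cloudflare post; both proposals are still evolving, so check them before relying on this layout.

```
# Sketch: allow search indexing, opt out of AI answers and training.
# Placement inside the group is an assumption (see the linked proposals).
User-agent: *
Content-Usage: search=y, train-ai=n
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /
```

These lines express a preference only. They do not block any crawler on their own.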

### **Pattern Matching**

- `*`: matches any sequence of characters
- `$`: matches the end of the URL
- `/*.pdf`: any URL containing `.pdf` (add `$` to match only URLs that end in `.pdf`)
- `/*.php$`: URLs ending in `.php`
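
The difference between a pattern with and without `$` is easiest to see side by side. The paths in the comments are made-up examples.

```
User-agent: *
# * matches any run of characters and rules are prefix matches,
# so this blocks /docs/a.pdf and also /docs/a.pdf?download=1.
Disallow: /*.pdf

# $ anchors the pattern to the end of the URL, so this blocks
# /index.php but not /index.php?page=2.
Disallow: /*.php$
```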

## **Common Patterns**

- `Disallow: /admin/`: block the /admin/ directory
- `Disallow: /*?`: block URLs with query strings
- `Disallow: /*.json$`: block all .json files
- `Disallow: /private/*`: block everything under /private/
- `Allow: /api/public`: allow a specific path (exception)
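
An `Allow` exception only matters when it is paired with a broader `Disallow`. The sketch below assumes a hypothetical `/api/` section and relies on the longest-match rule from RFC 9309, which Google also follows: when several rules match a URL, the most specific (longest) one wins.

```
User-agent: *
Disallow: /api/
# The longer, more specific rule wins: /api/public/status is allowed,
# while /api/users stays blocked by the shorter Disallow above.
Allow: /api/public
```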

## **How to Block GPTBot & AI Crawlers**

[GPTBot: most blocked bot 2024 →](https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/)

**Top AI crawlers to block**

`GPTBot`, `ClaudeBot`, `CCBot`, `Google-Extended`, `Bytespider`, `PerplexityBot`

Google-Extended = Gemini training (not Search)

**Content preference headers**

- [IETF](https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/): `Content-Usage: search=y, train-ai=n`
- [Cloudflare](https://blog.cloudflare.com/content-signals-policy/): `Content-Signal: search=yes, ai-train=no`

`ai-input` = live answers, `ai-train` = model training

**Need to block everything?**

To keep all compliant crawlers out of your site (for example on staging or dev environments), use `User-agent: *` with `Disallow: /`. This is the "nuclear option", often searched for as _robots.txt disallow all_ or _robots.txt deny all_.
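
Written out as a complete file, the disallow-all configuration is just two lines:

```
# Ask every crawler to stay out of the whole site (e.g. staging).
User-agent: *
Disallow: /
```

This is a request, not access control. Anything that must stay private still needs authentication.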

## **How to Create a Robots.txt File**

1. **Choose your crawl policy.** Decide which bots to allow and which to block. Most sites allow search engines but block AI crawlers.
2. **Add User-agent rules.** Each rule targets a specific bot (or `*` for all). Add `Disallow:` paths to block and `Allow:` paths for exceptions.
3. **Add your sitemap URL.** Include a `Sitemap:` directive pointing to your XML sitemap. This helps search engines discover all your pages.
4. **Test your rules.** Use the tester above to verify specific paths are blocked or allowed as expected.
5. **Download and deploy.** Save the file as `robots.txt` in your site's root directory so it's accessible at `example.com/robots.txt`.
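
Following these steps, a finished file might look like the sketch below. The domain, the `/admin/` path, and the selection of bots are placeholders chosen for illustration.

```
# Placeholder domain and paths: adjust to your site.

# Steps 1-2: block selected AI crawlers entirely.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# Step 2: every other crawler may fetch everything except /admin/.
User-agent: *
Disallow: /admin/

# Step 3: point crawlers at the XML sitemap.
Sitemap: https://example.com/sitemap.xml
```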

## **Robots.txt vs Meta Robots Tag**

### **robots.txt**

- Controls **crawling access** — which bots can fetch pages
- Site-wide rules in a **single file**
- Cannot prevent **indexing** — blocked pages can still appear in search results
- Cannot control **link following** (nofollow)

### **Meta Robots Tag**

- Controls **indexing** — noindex removes pages from search results
- Per-page control with **nofollow**, nosnippet, noimageindex
- Requires the bot to **crawl the page first** to read the tag
- Must be added to **every page** individually

**Best practice:** Use robots.txt to block crawling of entire sections (admin, staging, API endpoints). Use meta robots to control indexing of individual pages. To remove a page from search results completely, the two must work together: allow crawling in robots.txt (so the bot can read the tag) and set `<meta name="robots" content="noindex">` on the page.

## **Common Robots.txt Mistakes**

### **Blocking CSS/JS files**

Disallowing `/_nuxt/` or `/assets/` prevents Google from rendering your page. Google needs these resources to understand your layout and content.

### **Using robots.txt to hide pages**

Blocking a URL in robots.txt doesn't remove it from search results — Google can still index it based on links from other sites. Use the `noindex` meta tag instead.

### **Wrong file location**

The file must be at the root: `example.com/robots.txt`. Crawlers won't find a file placed in a subdirectory (such as `/public/robots.txt`).

### **Missing sitemap directive**

Always include a `Sitemap:` line pointing to your XML sitemap. This is the simplest way to help search engines discover all your pages.

## **Using Nuxt? Generate robots.txt automatically**

The **@nuxtjs/robots** module generates robots.txt from your Nuxt config — with environment-aware rules, AI bot blocking, route annotations, and HMR support.

[**Get Started**](https://nuxtseo.com/docs/robots/getting-started/installation) [**Robots.txt Guide**](https://nuxtseo.com/docs/robots/guides/robots-txt)

## **Frequently Asked Questions**

<details>

<summary>**01**

### **What is a robots.txt generator?**

</summary>

A robots.txt generator is a tool that helps you create a valid robots.txt file for your website. It lets you specify which crawlers can access which parts of your site, block AI bots, and add sitemap references - all without writing the syntax manually.

</details>

<details>

<summary>**02**

### **How do I block AI bots with robots.txt?**

</summary>

Add User-agent rules for AI crawlers like GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), and Google-Extended (Gemini training). Set `Disallow: /` under each to block them. Note: robots.txt is advisory, so not every crawler respects it, and it does not affect training data that was already collected.

</details>

<details>

<summary>**03**

### **What is the meta robots nofollow directive?**

</summary>

The nofollow directive tells search engines not to follow links on a page. In robots.txt, you can't use nofollow - it's only for the meta robots tag or link rel attribute. Robots.txt only controls crawling access, not link following behavior.

</details>

<details>

<summary>**04**

### **Where do I put my robots.txt file?**

</summary>

Place robots.txt in your website's root directory (e.g., example.com/robots.txt). It must be accessible at this exact URL. Search engines check this location automatically. Most web frameworks have built-in ways to generate this file.

</details>

<details>

<summary>**05**

### **Can robots.txt block pages from Google?**

</summary>

Robots.txt prevents crawling but not indexing. Blocked pages can still appear in search results if linked from other sites. To fully remove pages from search results, use the meta robots noindex tag or X-Robots-Tag HTTP header instead.

</details>

## **Learn More**

- [**Robots.txt Validator**](https://nuxtseo.com/tools/robots-txt-validator): Check your robots.txt for syntax errors, AI signals (IETF/Cloudflare), and verify bot access.
- [**Robots.txt Guide**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/robots-txt): Complete guide to robots.txt syntax, common patterns, and security considerations.
- [**XML Sitemaps**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/sitemaps): How sitemaps work with robots.txt to guide crawler discovery.
- [**Meta Robots Tags**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/meta-tags): Page-level control when robots.txt isn't enough.