---
title: "Robots Txt"
canonical_url: "https://nuxtseo.com/learn-seo/nuxt/controlling-crawlers/robots-txt"
last_updated: "2026-01-29"
---

<key-takeaways>

- robots.txt is advisory: crawlers can ignore it, so never rely on it for security
- Use the Nuxt Robots module for environment-aware generation (auto-block staging)
- In 2026, use `Content-Signal` and `Content-Usage` for granular AI governance
- Include a sitemap reference and distinguish between search and AI training bots

</key-takeaways>

The `robots.txt` file controls which parts of your site crawlers can access. [Officially adopted as RFC 9309](https://datatracker.ietf.org/doc/html/rfc9309) in September 2022, it's primarily used to [manage crawl budget](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget) on large sites and for **AI bot governance**.

In 2026, robots.txt has evolved from a simple "block/allow" file into a sophisticated policy document for the **Agentic Web**. It's how you tell AI models whether they can use your data for training, real-time answering, or agentic actions.

## Quick Setup

### Static robots.txt

For simple static rules, add the file in your public directory:

```dir
public/
  robots.txt
```

Add your rules:

```robots-txt [robots.txt]
# Allow all crawlers
User-agent: *
Disallow:

# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
```

### Server Route

For custom dynamic generation, create a server route:

```ts [server/routes/robots.txt.ts]
export default defineEventHandler((event) => {
  const isDev = process.env.NODE_ENV !== 'production'
  const robots = isDev
    ? 'User-agent: *\nDisallow: /'
    : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'

  setHeader(event, 'Content-Type', 'text/plain')
  return robots
})
```

### Automatic Generation with Module

For environment-aware generation (auto-block staging), use the Nuxt Robots module:

<module-card className="w-1/2" slug="robots">



</module-card>

Install the module:

```bash
npx nuxi@latest module add robots
```

The module automatically generates `robots.txt` with zero config. For environment-specific rules:

```ts [nuxt.config.ts]
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  robots: {
    disallow: process.env.NODE_ENV !== 'production' ? '/' : undefined
  }
})
```

## Robots.txt Syntax

The `robots.txt` file consists of directives grouped by user agent. Google [uses the most specific matching rule](https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt) based on path length:

```robots-txt [robots.txt]
# Define which crawler these rules apply to
User-agent: *

# Block access to specific paths
Disallow: /admin

# Allow access to specific paths (optional, more specific than Disallow)
Allow: /admin/public

# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
```
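
How the "most specific rule" plays out in practice: with the rules above, `/admin/settings` is blocked but `/admin/public/report` is allowed, because the longer `Allow: /admin/public` rule wins. A simplified sketch of that precedence logic (illustrative only, not Google's parser):

```ts
// Simplified sketch of RFC 9309 longest-match precedence (illustrative only).
// Real parsers also handle wildcards, '$' anchors and percent-encoding.
interface Rule {
  type: 'allow' | 'disallow'
  path: string
}

function isAllowed(urlPath: string, rules: Rule[]): boolean {
  // Collect every rule whose path is a prefix of the URL path
  const matches = rules.filter(rule => urlPath.startsWith(rule.path))
  if (matches.length === 0)
    return true // no matching rule: crawling is allowed

  // The most specific (longest) matching rule wins
  const winner = [...matches].sort((a, b) => b.path.length - a.path.length)[0]
  return winner.type === 'allow'
}

const rules: Rule[] = [
  { type: 'disallow', path: '/admin' },
  { type: 'allow', path: '/admin/public' }
]

isAllowed('/admin/settings', rules) // false - blocked
isAllowed('/admin/public/report', rules) // true - the longer Allow rule wins
```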

### User-agent

The `User-agent` directive specifies which crawler the rules apply to:

```robots-txt [robots.txt]
# All crawlers
User-agent: *

# Just Googlebot
User-agent: Googlebot

# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private
```

Common crawler user agents:

- [Googlebot](https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers): Google's search crawler (28% of all bot traffic in 2025)
- [Google-Extended](https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/): Google's AI training crawler (separate from search)
- [GPTBot](https://platform.openai.com/docs/bots/overview-of-openai-crawlers): OpenAI's AI training crawler (7.5% of bot traffic)
- [ClaudeBot](https://www.playwire.com/blog/how-to-block-ai-bots-with-robotstxt-the-complete-publishers-guide): Anthropic's AI training crawler
- [CCBot](https://commoncrawl.org/ccbot): Common Crawl's dataset builder (frequently blocked)
- [Bingbot](https://ahrefs.com/seo/glossary/bingbot): Microsoft's search crawler
- [FacebookExternalHit](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/): Facebook's link preview crawler
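
If you're unsure which of these bots actually visit your site, log them before deciding what to block. A minimal sketch using Nitro server middleware (the file path and crawler list here are just examples):

```ts [server/middleware/log-crawlers.ts]
// Illustrative sketch: log requests from well-known crawlers so you can
// see which bots hit your site before writing robots.txt rules.
const KNOWN_CRAWLERS = ['Googlebot', 'Google-Extended', 'GPTBot', 'ClaudeBot', 'CCBot', 'Bingbot']

export default defineEventHandler((event) => {
  const ua = getHeader(event, 'user-agent') ?? ''
  const crawler = KNOWN_CRAWLERS.find(bot => ua.includes(bot))
  if (crawler)
    console.log(`[crawler] ${crawler} requested ${event.path}`)
})
```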

### Allow / Disallow

The `Allow` and `Disallow` directives control path access:

```robots-txt [robots.txt]
User-agent: *
# Block all paths starting with /admin
Disallow: /admin

# Block a specific file
Disallow: /private.html

# Block files with specific extensions
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*
```

Two wildcards are supported ([RFC 9309](https://datatracker.ietf.org/doc/html/rfc9309)):

- `*`: matches zero or more characters
- `$`: matches the end of the URL

Paths are case-sensitive and relative to the domain root.
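
One way to reason about these patterns is to translate them into regular expressions. A rough sketch (assumes simple patterns and ignores percent-encoding edge cases):

```ts
// Convert a robots.txt path pattern into a RegExp (rough illustration).
// '*' matches any sequence of characters, a trailing '$' anchors the end of the URL.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    .map(part => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return escaped.endsWith('\\$')
    ? new RegExp('^' + escaped.slice(0, -2) + '$') // anchored match
    : new RegExp('^' + escaped) // prefix match
}

patternToRegExp('/*.pdf$').test('/files/report.pdf') // true
patternToRegExp('/*.pdf$').test('/files/report.pdf?v=2') // false - '$' requires the URL to end there
patternToRegExp('/admin').test('/admin/settings') // true - plain paths are prefix matches
```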

### Sitemap

The `Sitemap` directive tells crawlers where to find your [sitemap.xml](/learn-seo/nuxt/controlling-crawlers/sitemaps):

```robots-txt [robots.txt]
Sitemap: https://mysite.com/sitemap.xml

# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml
```

The [Nuxt Sitemap module](/docs/sitemap/getting-started/introduction) automatically adds the sitemap URL to your `robots.txt`.

### Crawl-Delay (Non-Standard)

`Crawl-Delay` is not part of [RFC 9309](https://datatracker.ietf.org/doc/html/rfc9309). Google ignores it. Bing and Yandex support it:

```robots-txt [robots.txt]
User-agent: Bingbot
Crawl-delay: 10  # seconds between requests
```

Google manages its own crawl rate automatically; see its [crawl budget documentation](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget) for ways to influence it.

## Security: Why robots.txt Fails

[Robots.txt is not a security mechanism](https://developer.mozilla.org/en-US/docs/Web/Security/Practical_implementation_guides/Robots_txt). Malicious crawlers ignore it, and listing paths in `Disallow` [reveals their location to attackers](https://www.searchenginejournal.com/robots-txt-security-risks/289719/).

**Common mistake:**

```robots-txt
# ❌ Advertises your admin panel location
User-agent: *
Disallow: /admin
Disallow: /wp-admin
Disallow: /api/internal
```

Use [proper authentication](https://developers.google.com/search/docs/crawling-indexing/block-indexing) instead. See our [security guide](/learn-seo/nuxt/routes-and-rendering/security) for details.
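
As a rough illustration, protecting a sensitive path with Nitro server middleware looks something like this (a minimal sketch; the `/admin` prefix and `ADMIN_TOKEN` variable are placeholders, not a production-grade auth setup):

```ts [server/middleware/admin-auth.ts]
// Minimal sketch: require a bearer token for /admin routes instead of
// relying on robots.txt to hide them. ADMIN_TOKEN is a placeholder secret.
export default defineEventHandler((event) => {
  if (!event.path.startsWith('/admin'))
    return

  const auth = getHeader(event, 'authorization')
  if (auth !== `Bearer ${process.env.ADMIN_TOKEN}`)
    throw createError({ statusCode: 401, statusMessage: 'Unauthorized' })
})
```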

## Crawling vs Indexing

Blocking a URL in `robots.txt` prevents crawling but [doesn't prevent indexing](https://developers.google.com/search/docs/crawling-indexing/robots/intro). If other sites link to the URL, Google can still index it without crawling, showing the URL with no snippet. Use [meta robots tags](/learn-seo/nuxt/controlling-crawlers/meta-tags) for page-level indexing control.

To prevent indexing:

- Use [`noindex` meta tag](/learn-seo/nuxt/controlling-crawlers/meta-tags) (requires allowing crawl)
- Use password protection or authentication
- Return 404/410 status codes

Don't block pages with `noindex` in `robots.txt`. Google can't see the tag if it can't crawl.
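
For the `noindex` route, a minimal sketch inside a page's `<script setup lang="ts">` (the page itself is hypothetical):

```ts
// Inside <script setup lang="ts"> of a page you want kept out of search results.
// The page must remain crawlable so Google can actually read this tag.
useSeoMeta({
  robots: 'noindex, nofollow'
})
```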

## Common Mistakes

### 1. Blocking JavaScript and CSS

[Google needs JavaScript and CSS to render pages](https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics). Blocking them breaks indexing:

```robots-txt [robots.txt]
# ❌ Prevents Google from rendering your Nuxt app
User-agent: *
Disallow: /assets/
Disallow: /*.js$
Disallow: /*.css$
```

Nuxt apps are JavaScript-heavy. Never block `.js`, `.css`, or `/assets/` from Googlebot.

### 2. Blocking Dev Sites in Production

Copy-pasting a dev `robots.txt` to production blocks all crawlers:

```robots-txt [robots.txt]
# ❌ Accidentally left from staging
User-agent: *
Disallow: /
```

The [Nuxt Robots module](/docs/robots/getting-started/introduction) handles this automatically based on environment.

### 3. Confusing robots.txt with noindex

Blocking pages doesn't remove them from search results. Use [`noindex` meta tags](/learn-seo/nuxt/controlling-crawlers/meta-tags) for that.

## Testing Your robots.txt

1. Check syntax: Visit `https://yoursite.com/robots.txt` to confirm it loads
2. [Google Search Console robots.txt tester](https://search.google.com/search-console/robots-txt) validates syntax and tests URLs
3. Verify crawlers can access: Check server logs for 200 status on `/robots.txt`
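
You can also automate the check, for example in a CI step (an illustrative sketch; the site URL is a placeholder and the check is deliberately naive):

```ts [scripts/check-robots.ts]
// Illustrative CI check: fail if robots.txt is missing or blocks the whole site.
// Note: a site that intentionally serves 'Disallow: /' to specific bots
// (e.g. AI training crawlers) would also trip this naive check.
const SITE = 'https://mysite.com' // placeholder

const res = await fetch(`${SITE}/robots.txt`)
if (!res.ok)
  throw new Error(`robots.txt returned ${res.status}`)

const body = await res.text()
if (/^Disallow:\s*\/\s*$/m.test(body))
  throw new Error('robots.txt contains a blanket "Disallow: /"')

console.log('robots.txt looks OK')
```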

## Common Patterns

### Allow Everything (Default)

```robots-txt
User-agent: *
Disallow:
```

### Block Everything

Useful for staging or development environments.

```robots-txt
User-agent: *
Disallow: /
```

See our [security guide](/learn-seo/nuxt/routes-and-rendering/security) for more on environment protection.

### Block AI Training Crawlers

[GPTBot was the most blocked bot in 2024](https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/), fully disallowed by 250 domains. Blocking AI training bots doesn't affect search rankings:

```robots-txt
# Block AI model training (doesn't affect Google search)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /
```

`Google-Extended` is separate from `Googlebot`, so blocking it won't hurt search visibility.
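
With the Nuxt Robots module, the same policy can be expressed as crawler groups in `nuxt.config.ts` (a sketch of one possible configuration):

```ts [nuxt.config.ts]
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  robots: {
    groups: [
      // Default group: regular crawlers keep full access
      { userAgent: '*', allow: '/' },
      // AI training crawlers are blocked without touching search bots
      {
        userAgent: ['GPTBot', 'ClaudeBot', 'CCBot', 'Google-Extended'],
        disallow: ['/']
      }
    ]
  }
})
```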

### AI Directives: Content-Usage & Content-Signal

In 2026, blocking user agents isn't always enough. Two emerging standards let you express granular preferences about how AI systems use your content without blocking crawlers entirely. This is crucial for **AI Search Optimization (ASO)**: you want to be indexed for search but may want to opt out of model training.

- **Content-Usage** (IETF aipref-vocab): Uses `y`/`n` values for `train-ai` and `serve-ai`.
- **Content-Signal** (Cloudflare): Uses `yes`/`no` values for `search`, `ai-input`, `ai-train`.

```robots-txt [robots.txt]
User-agent: *
Allow: /

# 2026 AI Governance Strategy
# Allow AI for real-time answers (ASO), but block training
Content-Usage: train-ai=n, serve-ai=y
Content-Signal: search=yes, ai-input=yes, ai-train=no
```

#### Nuxt Implementation

With the [Nuxt Robots module](/docs/robots/guides/ai-directives), configure these signals programmatically in `nuxt.config.ts`:

```ts [nuxt.config.ts]
export default defineNuxtConfig({
  robots: {
    groups: [{
      userAgent: '*',
      allow: '/',
      contentUsage: {
        'train-ai': 'n',
        'serve-ai': 'y'
      },
      contentSignal: {
        'ai-train': 'no',
        'ai-input': 'yes',
        'search': 'yes'
      }
    }]
  }
})
```

<tip>

**Why allow ai-input?** Real-time AI tools like [Perplexity](https://perplexity.ai) or ChatGPT Search use this to provide citations. If you block this, you won't appear as a source in AI-generated answers. See [AI-optimized content](/learn-seo/nuxt/launch-and-listen/ai-optimized-content) for strategies on maximizing your visibility in AI search results.

</tip>

<warning>

AI directives rely on voluntary compliance: crawlers can ignore them, so combine them with `User-agent` blocks for stronger protection.

</warning>

### Block Search, Allow Social Sharing

For private sites where you still want [link previews](/learn-seo/nuxt/mastering-meta/open-graph):

```robots-txt
# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /

# Allow social link preview crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: Slackbot
Allow: /
```

### Optimize Crawl Budget for Large Sites

If you have 10,000+ pages, [block low-value URLs](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget) to focus crawl budget on important content:

```robots-txt
User-agent: *
# Block internal search results
Disallow: /search?
# Block infinite scroll pagination
Disallow: /*?page=
# Block filtered/sorted product pages
Disallow: /products?*sort=
Disallow: /products?*filter=
# Block print versions
Disallow: /*/print
```

Sites under 1,000 pages don't need crawl budget optimization.
