The robots.txt file controls which parts of your site crawlers can access. Officially adopted as RFC 9309 in September 2022 after 28 years as a de facto standard, it's primarily used to manage crawl budget on large sites and block AI training bots.
Robots.txt is not a security mechanism—crawlers can ignore it. For individual page control, use meta robots tags instead.
Nuxt provides multiple approaches for robots.txt. For most sites, use the Nuxt Robots module:
Install the module:
npx nuxi@latest module add robots
The module automatically generates robots.txt with zero config. For environment-specific rules (e.g., blocking all crawlers in staging), configure in nuxt.config.ts:
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  robots: {
    disallow: process.env.NODE_ENV !== 'production' ? '/' : undefined
  }
})
For simple static rules, add the file in your public directory:
public/
  robots.txt
Add your rules:
# Allow all crawlers
User-agent: *
Disallow:
# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
For custom dynamic generation without the module, create a server route at server/routes/robots.txt.ts:
export default defineEventHandler((event) => {
  const isDev = process.env.NODE_ENV !== 'production'
  const robots = isDev
    ? 'User-agent: *\nDisallow: /'
    : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'

  setHeader(event, 'Content-Type', 'text/plain')
  return robots
})
The robots.txt file consists of directives grouped by user agent. Google uses the most specific matching rule based on path length:
# Define which crawler these rules apply to
User-agent: *
# Block access to specific paths
Disallow: /admin
# Allow access to specific paths (optional, more specific than Disallow)
Allow: /admin/public
# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
The User-agent directive specifies which crawler the rules apply to:
# All crawlers
User-agent: *
# Just Googlebot
User-agent: Googlebot
# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private
Common crawler user agents:
Googlebot (Google Search)
Bingbot (Microsoft Bing)
GPTBot (OpenAI)
ClaudeBot (Anthropic)
CCBot (Common Crawl)
facebookexternalhit (Facebook link previews)
Twitterbot (X/Twitter link previews)
The Allow and Disallow directives control path access:
User-agent: *
# Block all paths starting with /admin
Disallow: /admin
# Block a specific file
Disallow: /private.html
# Block files with specific extensions
Disallow: /*.pdf$
# Block URL parameters
Disallow: /*?*
Wildcards supported (RFC 9309):
* — matches zero or more characters
$ — matches the end of the URL
The Sitemap directive tells crawlers where to find your sitemap.xml:
Sitemap: https://mysite.com/sitemap.xml
# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml
With the Nuxt Sitemap module, the sitemap URL is automatically added to your robots.txt.
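If you run both modules, a combined config might look like the sketch below; the @nuxtjs/sitemap module name and the shared site.url option (provided by nuxt-site-config) are assumptions to verify against your module versions:
// nuxt.config.ts: minimal sketch of running the robots and sitemap modules together
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots', '@nuxtjs/sitemap'],
  site: {
    // Absolute site URL used to build the Sitemap line in robots.txt
    url: 'https://mysite.com'
  }
})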
Crawl-Delay is not part of RFC 9309. Google ignores it. Bing and Yandex support it:
User-agent: Bingbot
Crawl-delay: 10 # seconds between requests
For Google, crawl rate is managed in Search Console.
Robots.txt is not a security mechanism. Malicious crawlers ignore it, and listing paths in Disallow reveals their location to attackers.
Common mistake:
# ❌ Advertises your admin panel location
User-agent: *
Disallow: /admin
Disallow: /wp-admin
Disallow: /api/internal
Use proper authentication instead. See our security guide for details.
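As a rough illustration only (the /admin prefix, header check, and ADMIN_TOKEN variable are placeholders, not a complete auth solution), a Nitro server middleware can gate sensitive paths regardless of what crawlers do:
// server/middleware/admin-auth.ts: hypothetical guard for /admin paths
export default defineEventHandler((event) => {
  const path = getRequestURL(event).pathname
  if (!path.startsWith('/admin')) return

  // Replace this token check with your real session/auth logic
  const authHeader = getHeader(event, 'authorization')
  if (authHeader !== `Bearer ${process.env.ADMIN_TOKEN}`) {
    throw createError({ statusCode: 401, statusMessage: 'Unauthorized' })
  }
})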
Blocking a URL in robots.txt prevents crawling but doesn't prevent indexing. If other sites link to the URL, Google can still index it without crawling, showing the URL with no snippet.
To prevent indexing, use a noindex meta tag (which requires allowing the crawl). Don't block pages with noindex in robots.txt—Google can't see the tag if it can't crawl the page.
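In a Nuxt page, one way to set the tag is with useSeoMeta in the page's script (the page file name here is illustrative):
<script setup lang="ts">
// pages/internal-report.vue: the page stays crawlable so Google can read the tag
useSeoMeta({
  robots: 'noindex, nofollow'
})
</script>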
Google needs JavaScript and CSS to render pages. Blocking them breaks indexing:
# ❌ Prevents Google from rendering your Nuxt app
User-agent: *
Disallow: /assets/
Disallow: /*.js$
Disallow: /*.css$
Nuxt apps are JavaScript-heavy. Never block .js, .css, or /assets/ from Googlebot.
Copy-pasting a dev robots.txt to production blocks all crawlers:
# ❌ Accidentally left from staging
User-agent: *
Disallow: /
The Nuxt Robots module handles this automatically based on environment.
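If you need to override that detection (for example, to allow indexing on a specific preview deployment), a sketch is shown below; the site.indexable flag comes from nuxt-site-config and ALLOW_INDEXING is a placeholder variable, so check both against your module version:
// nuxt.config.ts: sketch of forcing indexability on or off explicitly
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  site: {
    // When not indexable, the module serves a disallow-all robots.txt
    indexable: process.env.ALLOW_INDEXING === 'true'
  }
})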
Blocking pages doesn't remove them from search results. Use noindex meta tags for that.
Visit https://yoursite.com/robots.txt to confirm it loads.
Allow all crawlers:
User-agent: *
Disallow:
Block all crawlers (useful for staging or development environments):
User-agent: *
Disallow: /
See our security guide for more on environment protection.
GPTBot was the most blocked bot in 2024, fully disallowed by 250 domains. Blocking AI training bots doesn't affect search rankings:
# Block AI model training (doesn't affect Google search)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /
Google-Extended is separate from Googlebot—blocking it won't hurt search visibility.
Two emerging standards let you express preferences about how AI systems use your content—without blocking crawlers entirely:
IETF aipref-vocab (Content-Usage): y/n values for train-ai
Cloudflare Content Signals (Content-Signal): yes/no values for search, ai-input, ai-train
Both can appear in the same robots.txt file:
User-agent: *
Allow: /
# IETF aipref-vocab
Content-Usage: train-ai=n
# Cloudflare Content Signals
Content-Signal: search=yes, ai-input=no, ai-train=no
With the Nuxt Robots module, configure programmatically:
export default defineNuxtConfig({
  robots: {
    groups: [{
      userAgent: '*',
      allow: '/',
      contentUsage: { 'train-ai': 'n' },
      contentSignal: { 'ai-train': 'no', 'search': 'yes' }
    }]
  }
})
For private sites where you still want link previews:
# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /
# Allow social link preview crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: Slackbot
Allow: /
If you have 10,000+ pages, block low-value URLs to focus crawl budget on important content:
User-agent: *
# Block internal search results
Disallow: /search?
# Block infinite scroll pagination
Disallow: /*?page=
# Block filtered/sorted product pages
Disallow: /products?*sort=
Disallow: /products?*filter=
# Block print versions
Disallow: /*/print
Sites under 1,000 pages don't need crawl budget optimization.