Robots.txt in Nuxt

Robots.txt tells crawlers what they can access. Here's how to set it up in Nuxt.
Harlan Wilton · 10 min read
What you'll learn
  • robots.txt is advisory—crawlers can ignore it, never use for security
  • Use Nuxt Robots module for environment-aware generation (auto-block staging)
  • Include sitemap reference and consider blocking AI training bots separately

The robots.txt file controls which parts of your site crawlers can access. Officially adopted as RFC 9309 in September 2022 after 28 years as a de facto standard, it's primarily used to manage crawl budget on large sites and block AI training bots.

Robots.txt is not a security mechanism—crawlers can ignore it. For individual page control, use meta robots tags instead.

Quick Setup

There are several ways to serve a robots.txt file in Nuxt. For most sites, use the Nuxt Robots module:

@nuxtjs/robots: Tame the robots crawling and indexing your site with ease.

Install the module:

npx nuxi@latest module add robots

The module automatically generates robots.txt with zero config. For environment-specific rules (e.g., blocking all crawlers in staging), configure in nuxt.config.ts:

nuxt.config.ts
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  robots: {
    disallow: process.env.NODE_ENV !== 'production' ? '/' : undefined
  }
})

Static robots.txt

For simple static rules, add the file in your public directory:

public/
  robots.txt

Add your rules:

robots.txt
# Allow all crawlers
User-agent: *
Disallow:

# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

Server Route

For custom dynamic generation without the module, create a server route:

server/routes/robots.txt.ts
export default defineEventHandler((event) => {
  const isDev = process.env.NODE_ENV !== 'production'
  const robots = isDev
    ? 'User-agent: *\nDisallow: /'
    : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'

  setHeader(event, 'Content-Type', 'text/plain')
  return robots
})
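
If you deploy as a fully static site, a server route won't run at request time. One option is to prerender the route at build time so it ships as a plain file; a minimal sketch using Nitro's prerender option:

nuxt.config.ts
export default defineNuxtConfig({
  nitro: {
    // Render /robots.txt at build time so static hosts serve it as a file
    prerender: {
      routes: ['/robots.txt']
    }
  }
})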

Robots.txt Syntax

The robots.txt file consists of directives grouped by user agent. Google uses the most specific matching rule based on path length:

robots.txt
# Define which crawler these rules apply to
User-agent: *

# Block access to specific paths
Disallow: /admin

# Allow a sub-path inside a disallowed path (the more specific rule wins)
Allow: /admin/public

# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

User-agent

The User-agent directive specifies which crawler the rules apply to:

robots.txt
# All crawlers
User-agent: *

# Just Googlebot
User-agent: Googlebot

# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private

Common crawler user agents:

  • Googlebot: Google's search crawler (28% of all bot traffic in 2025)
  • Google-Extended: Google's AI training crawler (separate from search)
  • GPTBot: OpenAI's AI training crawler (7.5% of bot traffic)
  • ClaudeBot: Anthropic's AI training crawler
  • CCBot: Common Crawl's dataset builder (frequently blocked)
  • Bingbot: Microsoft's search crawler
  • facebookexternalhit: Facebook's link preview crawler

Allow / Disallow

The Allow and Disallow directives control path access:

robots.txt
User-agent: *
# Block all paths starting with /admin
Disallow: /admin

# Block a specific file
Disallow: /private.html

# Block files with specific extensions
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*

Pattern matching rules (RFC 9309):

  • * — matches zero or more characters
  • $ — matches the end of the URL
  • Paths are case-sensitive and relative to domain root

Sitemap

The Sitemap directive tells crawlers where to find your sitemap.xml:

robots.txt
Sitemap: https://mysite.com/sitemap.xml

# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml

With the Nuxt Sitemap module, the sitemap URL is automatically added to your robots.txt.
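
A typical pairing looks like this; a sketch assuming the @nuxtjs/sitemap module and the shared site.url setting used by the Nuxt SEO modules:

nuxt.config.ts
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots', '@nuxtjs/sitemap'],
  site: {
    // Used to build the absolute Sitemap URL added to robots.txt
    url: 'https://mysite.com'
  }
})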

Crawl-Delay (Non-Standard)

Crawl-Delay is not part of RFC 9309. Google ignores it. Bing and Yandex support it:

robots.txt
User-agent: Bingbot
Crawl-delay: 10  # seconds between requests

For Google, crawl rate is managed in Search Console.

Security: Why robots.txt Fails

Robots.txt is not a security mechanism. Malicious crawlers ignore it, and listing paths in Disallow reveals their location to attackers.

Common mistake:

# ❌ Advertises your admin panel location
User-agent: *
Disallow: /admin
Disallow: /wp-admin
Disallow: /api/internal

Use proper authentication instead. See our security guide for details.
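
For example, instead of hiding /admin in robots.txt, gate it with server-side auth. A minimal sketch of a Nitro server middleware, assuming a hypothetical ADMIN_TOKEN environment variable:

server/middleware/admin-auth.ts
export default defineEventHandler((event) => {
  // Only guard the admin area; let every other request through
  if (!event.path.startsWith('/admin'))
    return

  // Hypothetical bearer-token check; swap in your real auth (session, OAuth, etc.)
  const auth = getHeader(event, 'authorization')
  if (auth !== `Bearer ${process.env.ADMIN_TOKEN}`) {
    throw createError({ statusCode: 401, statusMessage: 'Unauthorized' })
  }
})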

Crawling vs Indexing

Blocking a URL in robots.txt prevents crawling but doesn't prevent indexing. If other sites link to the URL, Google can still index it without crawling, showing the URL with no snippet.

To prevent indexing:

  • Use noindex meta tag (requires allowing crawl)
  • Use password protection or authentication
  • Return 404/410 status codes

Don't block pages with noindex in robots.txt—Google can't see the tag if it can't crawl.
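
In a Nuxt page this looks like the sketch below: keep the path crawlable in robots.txt and set the meta tag in the page itself (the file name is illustrative):

pages/private.vue
<script setup lang="ts">
// Keep this path crawlable in robots.txt so Google can see the noindex directive
useSeoMeta({
  robots: 'noindex, nofollow'
})
</script>

<template>
  <div>Crawlable but not indexed</div>
</template>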

Common Mistakes

1. Blocking JavaScript and CSS

Google needs JavaScript and CSS to render pages. Blocking them breaks indexing:

robots.txt
# ❌ Prevents Google from rendering your Nuxt app
User-agent: *
Disallow: /_nuxt/
Disallow: /*.js$
Disallow: /*.css$

Nuxt apps are JavaScript-heavy. Never block .js, .css, or Nuxt's build assets (served from /_nuxt/ by default) from Googlebot.

2. Shipping a Staging robots.txt to Production

Copy-pasting a dev robots.txt to production blocks all crawlers:

robots.txt
# ❌ Accidentally left from staging
User-agent: *
Disallow: /

The Nuxt Robots module handles this automatically based on environment.

3. Confusing robots.txt with noindex

Blocking pages doesn't remove them from search results. Use noindex meta tags for that.

Testing Your robots.txt

  1. Check it loads: visit https://yoursite.com/robots.txt and confirm your rules are returned
  2. Validate rules: use the robots.txt report in Google Search Console to confirm Google can fetch and parse the file
  3. Verify crawlers can access it: check server logs for 200 responses on /robots.txt
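
You can also automate the check. A minimal sketch of a Vitest test that fetches the deployed file and asserts the basics (the file name and SITE_URL variable are hypothetical):

test/robots.test.ts
import { describe, expect, it } from 'vitest'

// Hypothetical deployed URL; point this at your production or preview site
const SITE_URL = process.env.SITE_URL || 'https://mysite.com'

describe('robots.txt', () => {
  it('is served as plain text and allows crawling', async () => {
    const res = await fetch(`${SITE_URL}/robots.txt`)
    expect(res.status).toBe(200)
    expect(res.headers.get('content-type')).toContain('text/plain')

    const body = await res.text()
    expect(body).toContain('User-agent: *')
    // Guard against an accidental site-wide block shipping to production
    expect(body).not.toMatch(/^Disallow: \/$/m)
  })
})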

Common Patterns

Allow Everything (Default)

User-agent: *
Disallow:

Block Everything

Useful for staging or development environments.

User-agent: *
Disallow: /

See our security guide for more on environment protection.

Block AI Training Crawlers

GPTBot was the most blocked bot in 2024, fully disallowed by 250 domains. Blocking AI training bots doesn't affect search rankings:

# Block AI model training (doesn't affect Google search)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

Google-Extended is separate from Googlebot—blocking it won't hurt search visibility.
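
With the Nuxt Robots module, the same block can live in config instead of a hand-maintained file; a sketch, assuming the module's groups option accepts a list of user agents:

nuxt.config.ts
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  robots: {
    groups: [{
      // AI training crawlers only; Googlebot and Bingbot stay unaffected
      userAgent: ['GPTBot', 'ClaudeBot', 'CCBot', 'Google-Extended'],
      disallow: ['/']
    }]
  }
})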

AI Directives: Content-Usage & Content-Signal

Two emerging standards let you express preferences about how AI systems use your content—without blocking crawlers entirely:

User-agent: *
Allow: /

# IETF aipref-vocab
Content-Usage: train-ai=n

# Cloudflare Content Signals
Content-Signal: search=yes, ai-input=no, ai-train=no

With the Nuxt Robots module, configure programmatically:

nuxt.config.ts
export default defineNuxtConfig({
  robots: {
    groups: [{
      userAgent: '*',
      allow: '/',
      contentUsage: { 'train-ai': 'n' },
      contentSignal: { 'ai-train': 'no', 'search': 'yes' }
    }]
  }
})
AI directives rely on voluntary compliance. Crawlers can ignore them—combine with User-agent blocks for stronger protection.

Block Search, Allow Social Sharing

For private sites where you still want link previews:

# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /

# Allow social link preview crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: Slackbot
Allow: /

Optimize Crawl Budget for Large Sites

If you have 10,000+ pages, block low-value URLs to focus crawl budget on important content:

User-agent: *
# Block internal search results
Disallow: /search?
# Block infinite scroll pagination
Disallow: /*?page=
# Block filtered/sorted product pages
Disallow: /products?*sort=
Disallow: /products?*filter=
# Block print versions
Disallow: /*/print

Sites under 1,000 pages don't need crawl budget optimization.