The robots.txt file controls which parts of your site crawlers can access. Officially adopted as RFC 9309 in September 2022 after 28 years as a de facto standard, it's primarily used to manage crawl budget on large sites and block AI training bots.
Robots.txt is not a security mechanism—crawlers can ignore it. For individual page control, use meta robots tags instead.
Nuxt provides multiple approaches for robots.txt. For most sites, use the Nuxt Robots module:
Install the module:
npx nuxi@latest module add robots
The module automatically generates robots.txt with zero config. For environment-specific rules (e.g., blocking all crawlers in staging), configure in nuxt.config.ts:
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  robots: {
    disallow: process.env.NODE_ENV !== 'production' ? '/' : undefined
  }
})
For simple static rules, add the file in your public directory:
public/
  robots.txt
Add your rules:
# Allow all crawlers
User-agent: *
Disallow:
# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
For custom dynamic generation without the module, create a server route at server/routes/robots.txt.ts:
export default defineEventHandler((event) => {
  const isDev = process.env.NODE_ENV !== 'production'
  const robots = isDev
    ? 'User-agent: *\nDisallow: /'
    : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'

  setHeader(event, 'Content-Type', 'text/plain')
  return robots
})
The robots.txt file consists of directives grouped by user agent. Google uses the most specific matching rule based on path length:
# Define which crawler these rules apply to
User-agent: *
# Block access to specific paths
Disallow: /admin
# Allow access to specific paths (optional, more specific than Disallow)
Allow: /admin/public
# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
The User-agent directive specifies which crawler the rules apply to:
# All crawlers
User-agent: *
# Just Googlebot
User-agent: Googlebot
# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private
Common crawler user agents:
Googlebot (Google Search)
Bingbot (Microsoft Bing)
GPTBot (OpenAI)
ClaudeBot (Anthropic)
CCBot (Common Crawl)
facebookexternalhit (Facebook link previews)
Twitterbot (X/Twitter link previews)
The Allow and Disallow directives control path access:
User-agent: *
# Block all paths starting with /admin
Disallow: /admin
# Block a specific file
Disallow: /private.html
# Block files with specific extensions
Disallow: /*.pdf$
# Block URL parameters
Disallow: /*?*
Wildcards supported (RFC 9309):
* — matches zero or more characters
$ — matches the end of the URL
The Sitemap directive tells crawlers where to find your sitemap.xml:
Sitemap: https://mysite.com/sitemap.xml
# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml
With the Nuxt Sitemap module, the sitemap URL is automatically added to your robots.txt.
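If you run both modules, a combined config might look like the sketch below; the @nuxtjs/sitemap module name and the shared site.url option (provided by nuxt-site-config) are assumptions to verify against your module versions:
// nuxt.config.ts: minimal sketch of running the robots and sitemap modules together
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots', '@nuxtjs/sitemap'],
  site: {
    // Absolute site URL used to build the Sitemap line in robots.txt
    url: 'https://mysite.com'
  }
})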
Crawl-Delay is not part of RFC 9309. Google ignores it. Bing and Yandex support it:
User-agent: Bingbot
Crawl-delay: 10 # seconds between requests
For Google, crawl rate is managed in Search Console.
Robots.txt is not a security mechanism. Malicious crawlers ignore it, and listing paths in Disallow reveals their location to attackers.
Common mistake:
# ❌ Advertises your admin panel location
User-agent: *
Disallow: /admin
Disallow: /wp-admin
Disallow: /api/internal
Use proper authentication instead. See our security guide for details.
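As a rough illustration only (the /admin prefix, header check, and ADMIN_TOKEN variable are placeholders, not a complete auth solution), a Nitro server middleware can gate sensitive paths regardless of what crawlers do:
// server/middleware/admin-auth.ts: hypothetical guard for /admin paths
export default defineEventHandler((event) => {
  const path = getRequestURL(event).pathname
  if (!path.startsWith('/admin')) return

  // Replace this token check with your real session/auth logic
  const authHeader = getHeader(event, 'authorization')
  if (authHeader !== `Bearer ${process.env.ADMIN_TOKEN}`) {
    throw createError({ statusCode: 401, statusMessage: 'Unauthorized' })
  }
})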
Blocking a URL in robots.txt prevents crawling but doesn't prevent indexing. If other sites link to the URL, Google can still index it without crawling, showing the URL with no snippet.
To prevent indexing, use a noindex meta tag (which requires allowing the crawl). Don't block pages with noindex in robots.txt—Google can't see the tag if it can't crawl the page.
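In a Nuxt page, one way to set the tag is with useSeoMeta in the page's script (the page file name here is illustrative):
<script setup lang="ts">
// pages/internal-report.vue: the page stays crawlable so Google can read the tag
useSeoMeta({
  robots: 'noindex, nofollow'
})
</script>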
Google needs JavaScript and CSS to render pages. Blocking them breaks indexing:
# ❌ Prevents Google from rendering your Nuxt app
User-agent: *
Disallow: /assets/
Disallow: /*.js$
Disallow: /*.css$
Nuxt apps are JavaScript-heavy. Never block .js, .css, or /assets/ from Googlebot.
Copy-pasting a dev robots.txt to production blocks all crawlers:
# ❌ Accidentally left from staging
User-agent: *
Disallow: /
The Nuxt Robots module handles this automatically based on environment.
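If you need to override that detection (for example, to allow indexing on a specific preview deployment), a sketch is shown below; the site.indexable flag comes from nuxt-site-config and ALLOW_INDEXING is a placeholder variable, so check both against your module version:
// nuxt.config.ts: sketch of forcing indexability on or off explicitly
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  site: {
    // When not indexable, the module serves a disallow-all robots.txt
    indexable: process.env.ALLOW_INDEXING === 'true'
  }
})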
Blocking pages doesn't remove them from search results. Use noindex meta tags for that.
Visit https://yoursite.com/robots.txt to confirm it loads.
Allow all crawlers:
User-agent: *
Disallow:
Block all crawlers (useful for staging or development environments):
User-agent: *
Disallow: /
See our security guide for more on environment protection.
GPTBot was the most blocked bot in 2024, fully disallowed by 250 domains. Blocking AI training bots doesn't affect search rankings:
# Block AI model training (doesn't affect Google search)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /
Google-Extended is separate from Googlebot—blocking it won't hurt search visibility.
Two emerging standards let you express preferences about how AI systems use your content—without blocking crawlers entirely:
IETF aipref-vocab (Content-Usage): y/n values for train-ai
Cloudflare Content Signals (Content-Signal): yes/no values for search, ai-input, ai-train
Both can appear in the same robots.txt file:
User-agent: *
Allow: /
# IETF aipref-vocab
Content-Usage: train-ai=n
# Cloudflare Content Signals
Content-Signal: search=yes, ai-input=no, ai-train=no
With the Nuxt Robots module, configure programmatically:
export default defineNuxtConfig({
  robots: {
    groups: [{
      userAgent: '*',
      allow: '/',
      contentUsage: { 'train-ai': 'n' },
      contentSignal: { 'ai-train': 'no', 'search': 'yes' }
    }]
  }
})
For private sites where you still want link previews:
# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /
# Allow social link preview crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: Slackbot
Allow: /
If you have 10,000+ pages, block low-value URLs to focus crawl budget on important content:
User-agent: *
# Block internal search results
Disallow: /search?
# Block infinite scroll pagination
Disallow: /*?page=
# Block filtered/sorted product pages
Disallow: /products?*sort=
Disallow: /products?*filter=
# Block print versions
Disallow: /*/print
Sites under 1,000 pages don't need crawl budget optimization.