The robots.txt file controls which parts of your site crawlers can access. Officially adopted as RFC 9309 in September 2022 after 28 years as a de facto standard, it's primarily used to manage crawl budget on large sites and block AI training bots.
Robots.txt is not a security mechanism—crawlers can ignore it. For individual page control, use meta robots tags instead.
To get started quickly with a static robots.txt, add the file to your public directory:
public/
  robots.txt
Add your rules:
# Allow all crawlers
User-agent: *
Disallow:
# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
For environment-specific rules (e.g., blocking all crawlers in staging), generate robots.txt server-side:
import express from 'express'

const app = express()

// Serve robots.txt dynamically: block everything outside production
app.get('/robots.txt', (req, res) => {
  const isDev = process.env.NODE_ENV !== 'production'
  const robots = isDev
    ? 'User-agent: *\nDisallow: /'
    : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'
  res.type('text/plain').send(robots)
})

app.listen(3000)
// server.js for Vite SSR: intercept /robots.txt before the SSR handler
import express from 'express'

const app = express()

app.use((req, res, next) => {
  if (req.path === '/robots.txt') {
    const isDev = process.env.NODE_ENV !== 'production'
    const robots = isDev
      ? 'User-agent: *\nDisallow: /'
      : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'
    return res.type('text/plain').send(robots)
  }
  next()
})
// h3 / Nitro: e.g. as a Nuxt server middleware (server/middleware/robots.js)
import { defineEventHandler, setHeader } from 'h3'

export default defineEventHandler((event) => {
  if (event.path === '/robots.txt') {
    const isDev = process.env.NODE_ENV !== 'production'
    const robots = isDev
      ? 'User-agent: *\nDisallow: /'
      : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'
    setHeader(event, 'Content-Type', 'text/plain')
    return robots
  }
})
The robots.txt file consists of directives grouped by user agent. Google uses the most specific matching rule based on path length:
# Define which crawler these rules apply to
User-agent: *
# Block access to specific paths
Disallow: /admin
# Allow access to specific paths (optional, more specific than Disallow)
Allow: /admin/public
# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
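To make the longest-match rule concrete, here is an illustrative sketch (not Google's actual matcher, and it ignores the * and $ wildcards described below): with the rules above, /admin/public/logo.png is crawlable because the matching Allow path (/admin/public) is longer than the matching Disallow path (/admin).
// Illustrative only: simplified longest-match resolution, no wildcard support
function isAllowed(path, rules) {
  // rules: [{ type: 'allow' | 'disallow', pattern: '/admin' }, ...]
  const longestMatch = (type) =>
    rules
      .filter((r) => r.type === type && path.startsWith(r.pattern))
      .reduce((max, r) => Math.max(max, r.pattern.length), -1)
  const allow = longestMatch('allow')
  const disallow = longestMatch('disallow')
  if (disallow === -1) return true   // no Disallow matches: crawling is allowed
  return allow >= disallow           // longer path wins; Allow wins a tie
}

isAllowed('/admin/public/logo.png', [
  { type: 'disallow', pattern: '/admin' },
  { type: 'allow', pattern: '/admin/public' },
]) // => true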
The User-agent directive specifies which crawler the rules apply to:
# All crawlers
User-agent: *
# Just Googlebot
User-agent: Googlebot
# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private
Common crawler user agents:
- Googlebot (Google Search)
- Bingbot (Microsoft Bing)
- GPTBot (OpenAI, AI training)
- ClaudeBot (Anthropic)
- CCBot (Common Crawl)
- Google-Extended (Google AI training)
- facebookexternalhit, Twitterbot, Slackbot (social link previews)
The Allow and Disallow directives control path access:
User-agent: *
# Block all paths starting with /admin
Disallow: /admin
# Block a specific file
Disallow: /private.html
# Block files with specific extensions
Disallow: /*.pdf$
# Block URL parameters
Disallow: /*?*
Wildcards supported (RFC 9309):
- * — matches zero or more characters
- $ — matches the end of the URL
The Sitemap directive tells crawlers where to find your sitemap.xml:
Sitemap: https://mysite.com/sitemap.xml
# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml
Crawl-delay is not part of RFC 9309. Google ignores it; Bing and Yandex support it:
User-agent: Bingbot
Crawl-delay: 10 # seconds between requests
For Google, crawl rate is managed in Search Console.
Robots.txt is not a security mechanism. Malicious crawlers ignore it, and listing paths in Disallow reveals their location to attackers.
Common mistake:
# ❌ Advertises your admin panel location
User-agent: *
Disallow: /admin
Disallow: /wp-admin
Disallow: /api/internal
Never use robots.txt to hide sensitive content; use authentication and proper access controls instead. See our security guide for details.
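For instance (a sketch only, reusing the Express app from the earlier examples; the session check is a placeholder for whatever auth you actually use), gate the path rather than listing it in robots.txt:
// Require authentication for /admin instead of advertising it in robots.txt
app.use('/admin', (req, res, next) => {
  // Placeholder: swap in your real session / token validation
  if (req.session?.user) return next()
  res.status(401).send('Authentication required')
})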
Blocking a URL in robots.txt prevents crawling but doesn't prevent indexing. If other sites link to the URL, Google can still index it without crawling, showing the URL with no snippet.
To prevent indexing, use a noindex meta tag, which requires allowing the crawl. Don't block pages with noindex in robots.txt—Google can't see the tag if it can't crawl.
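A rough illustration, again assuming Express (the route and HTML here are made up for the example; X-Robots-Tag is the header equivalent of the meta tag): the page stays crawlable but opts out of the index.
app.get('/internal-report', (req, res) => {
  // Header equivalent of <meta name="robots" content="noindex">
  res.set('X-Robots-Tag', 'noindex')
  // Or include the meta tag in the rendered HTML itself
  res.send(`<!doctype html>
<html>
  <head><meta name="robots" content="noindex"></head>
  <body>Crawlable, but kept out of the index</body>
</html>`)
})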
Google needs JavaScript and CSS to render pages. Blocking them breaks indexing:
# ❌ Prevents Google from rendering your Vue app
User-agent: *
Disallow: /assets/
Disallow: /*.js$
Disallow: /*.css$
Vue apps are JavaScript-heavy. Never block .js, .css, or /assets/ from Googlebot.
Copy-pasting a dev robots.txt to production blocks all crawlers:
# ❌ Accidentally left from staging
User-agent: *
Disallow: /
Use dynamic generation or environment checks to avoid this.
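One safeguard (a sketch only; the script name and location are assumptions, and it assumes the static public/robots.txt approach): fail the production deploy if the committed file blocks everything.
// scripts/check-robots.js (hypothetical): run before a production deploy
import { readFileSync } from 'node:fs'

const robots = readFileSync('public/robots.txt', 'utf8')

// A bare "Disallow: /" blocks the entire site for the matching user agents
if (/^Disallow:\s*\/\s*$/m.test(robots)) {
  console.error('robots.txt blocks all crawlers; refusing to deploy')
  process.exit(1)
}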
Blocking pages doesn't remove them from search results. Use noindex meta tags for that.
Visit https://yoursite.com/robots.txt to confirm it loads (the file must be served at /robots.txt in the site root).
To allow all crawlers:
User-agent: *
Disallow:
To block all crawlers, useful for staging or development environments:
User-agent: *
Disallow: /
See our security guide for more on environment protection.
GPTBot was the most blocked bot in 2024, fully disallowed by 250 domains. Blocking AI training bots doesn't affect search rankings:
# Block AI model training (doesn't affect Google search)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /
Google-Extended is separate from Googlebot—blocking it won't hurt search visibility.
Two emerging standards let you express preferences about how AI systems use your content—without blocking crawlers entirely:
- Content-Usage (IETF aipref-vocab): y/n values for train-ai
- Content-Signal (Cloudflare Content Signals): yes/no values for search, ai-input, ai-train
Both go directly in robots.txt:
User-agent: *
Allow: /
# IETF aipref-vocab
Content-Usage: train-ai=n
# Cloudflare Content Signals
Content-Signal: search=yes, ai-input=no, ai-train=no
This allows crawlers to access your content for search indexing while blocking AI training and RAG/grounding uses. Both can be used together for broader coverage.
For private sites where you still want link previews:
# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /
# Allow social link preview crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: Slackbot
Allow: /
If you have 10,000+ pages, block low-value URLs to focus crawl budget on important content:
User-agent: *
# Block internal search results
Disallow: /search?
# Block infinite scroll pagination
Disallow: /*?page=
# Block filtered/sorted product pages
Disallow: /products?*sort=
Disallow: /products?*filter=
# Block print versions
Disallow: /*/print
Sites under 1,000 pages don't need crawl budget optimization.
If you're using Nuxt, check out Nuxt SEO which handles much of this automatically.