Robots.txt in Vue

Robots.txt tells crawlers what they can access. Here's how to set it up in Vue.
Harlan Wilton · 10 mins read
What you'll learn
  • robots.txt is advisory—crawlers can ignore it, so never use it for security
  • Primary uses: crawl budget optimization and blocking AI training bots
  • Always include a sitemap reference in your robots.txt

The robots.txt file controls which parts of your site crawlers can access. Officially adopted as RFC 9309 in September 2022 after 28 years as a de facto standard, it's primarily used to manage crawl budget on large sites and block AI training bots.

Robots.txt is not a security mechanism—crawlers can ignore it. For individual page control, use meta robots tags instead.

Quick Setup

To get started quickly with a static robots.txt, add the file in your public directory:

public/
  robots.txt

Add your rules:

robots.txt
# Allow all crawlers
User-agent: *
Disallow:

# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

Dynamic robots.txt

For environment-specific rules (e.g., blocking all crawlers in staging), generate robots.txt server-side:

import express from 'express'

const app = express()

// Serve environment-specific rules: block everything outside production
app.get('/robots.txt', (req, res) => {
  const isDev = process.env.NODE_ENV !== 'production'
  const robots = isDev
    ? 'User-agent: *\nDisallow: /'
    : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'
  res.type('text/plain').send(robots)
})
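
If your Vue app is served by Vite without a separate Node server, the same idea works as a small dev-only Vite plugin. A minimal sketch, assuming the plugin name and the "block everything" rules below (both are illustrative, not part of the setup above):

vite.config.js
import { defineConfig } from 'vite'

// Dev-only middleware that answers /robots.txt with a "block everything" file
function devRobots() {
  return {
    name: 'dev-robots', // hypothetical plugin name
    configureServer(server) {
      server.middlewares.use('/robots.txt', (req, res) => {
        res.setHeader('Content-Type', 'text/plain')
        res.end('User-agent: *\nDisallow: /')
      })
    },
  }
}

export default defineConfig({
  plugins: [devRobots()],
})

The configureServer hook only runs during development, so what ships to production is still whatever you put in public/robots.txt (or the server route above).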

Robots.txt Syntax

The robots.txt file consists of directives grouped by user agent. When rules conflict, Google applies the most specific rule (the one with the longest matching path); on a tie, the less restrictive Allow rule wins:

robots.txt
# Define which crawler these rules apply to
User-agent: *

# Block access to specific paths
Disallow: /admin

# Allow access to specific paths (optional, more specific than Disallow)
Allow: /admin/public

# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

User-agent

The User-agent directive specifies which crawler the rules apply to:

robots.txt
# All crawlers
User-agent: *

# Just Googlebot
User-agent: Googlebot

# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private

Common crawler user agents:

  • Googlebot: Google's search crawler (28% of all bot traffic in 2025)
  • Google-Extended: Google's AI training crawler (separate from search)
  • GPTBot: OpenAI's AI training crawler (7.5% of bot traffic)
  • ClaudeBot: Anthropic's AI training crawler
  • CCBot: Common Crawl's dataset builder (frequently blocked)
  • Bingbot: Microsoft's search crawler
  • FacebookExternalHit: Facebook's link preview crawler

Allow / Disallow

The Allow and Disallow directives control path access:

robots.txt
User-agent: *
# Block all paths starting with /admin
Disallow: /admin

# Block a specific file
Disallow: /private.html

# Block files with specific extensions
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*

Pattern-matching rules (RFC 9309):

  • * — matches zero or more characters
  • $ — matches the end of the URL
  • Paths are case-sensitive and relative to domain root

Sitemap

The Sitemap directive tells crawlers where to find your sitemap.xml:

robots.txt
Sitemap: https://mysite.com/sitemap.xml

# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml

Crawl-Delay (Non-Standard)

Crawl-Delay is not part of RFC 9309. Google ignores it. Bing and Yandex support it:

robots.txt
User-agent: Bingbot
Crawl-delay: 10  # seconds between requests

For Google, crawl rate is managed in Search Console.

Security: Why robots.txt Fails

Robots.txt is not a security mechanism. Malicious crawlers ignore it, and listing paths in Disallow reveals their location to attackers.

Common mistake:

# ❌ Advertises your admin panel location
User-agent: *
Disallow: /admin
Disallow: /wp-admin
Disallow: /api/internal

Never use robots.txt to hide sensitive content. Use authentication and proper access controls instead; see our security guide for details.

Crawling vs Indexing

Blocking a URL in robots.txt prevents crawling but doesn't prevent indexing. If other sites link to the URL, Google can still index it without crawling, showing the URL with no snippet.

To prevent indexing:

  • Use a noindex meta tag (requires allowing the crawl; see the sketch below)
  • Use password protection or authentication
  • Return 404/410 status codes

Don't block pages with noindex in robots.txt—Google can't see the tag if it can't crawl.
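
In a Vue component, a per-page noindex is usually set through the document head. A minimal sketch using useHead from @unhead/vue (this assumes the head client is already installed via createHead; the component name is illustrative):

PrivatePage.vue
<script setup>
import { useHead } from '@unhead/vue'

// Keep the page crawlable (don't block it in robots.txt),
// but ask search engines not to index it
useHead({
  meta: [
    { name: 'robots', content: 'noindex, nofollow' },
  ],
})
</script>

Here nofollow also tells crawlers not to follow the page's links; use plain noindex if you only want the page itself kept out of results.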

Common Mistakes

1. Blocking JavaScript and CSS

Google needs JavaScript and CSS to render pages. Blocking them breaks indexing:

robots.txt
# ❌ Prevents Google from rendering your Vue app
User-agent: *
Disallow: /assets/
Disallow: /*.js$
Disallow: /*.css$

Vue apps are JavaScript-heavy. Never block .js, .css, or /assets/ from Googlebot.

2. Blocking Dev Sites in Production

Copy-pasting a dev robots.txt to production blocks all crawlers:

robots.txt
# ❌ Accidentally left from staging
User-agent: *
Disallow: /

Use dynamic generation or environment checks to avoid this.
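
One way to catch this automatically is a small check before a production deploy. A minimal sketch, assuming a static public/robots.txt and a hypothetical script path:

scripts/check-robots.mjs
import { readFile } from 'node:fs/promises'

const robots = await readFile('public/robots.txt', 'utf8')

// Flag only a group that blocks *all* crawlers: "User-agent: *"
// paired with a bare "Disallow: /" (groups are separated by blank lines)
const blocksEveryone = robots
  .split(/\n\s*\n/)
  .some(group =>
    /^User-agent:\s*\*\s*$/mi.test(group) &&
    /^Disallow:\s*\/\s*$/mi.test(group)
  )

if (blocksEveryone) {
  console.error('robots.txt blocks all crawlers; refusing to deploy')
  process.exit(1)
}

Run it in your deploy pipeline (for example, node scripts/check-robots.mjs before the build) so a staging file never reaches production. Intentional blocks for specific bots, such as the AI crawler pattern below, are not flagged because they don't use User-agent: *.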

3. Confusing robots.txt with noindex

Blocking pages doesn't remove them from search results. Use noindex meta tags for that.

Testing Your robots.txt

  1. Confirm it loads: visit https://yoursite.com/robots.txt and check the file is served
  2. Validate parsing: the robots.txt report in Google Search Console shows how Google reads the file and flags errors
  3. Verify crawlers can access it: check server logs for 200 responses on /robots.txt, or run a quick check like the sketch below
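
For the last point, a quick scripted check works too. A minimal sketch (Node 18+ for the global fetch; the domain and file name are placeholders):

check-robots-live.mjs
// Confirm the file is reachable and served as plain text
const res = await fetch('https://mysite.com/robots.txt')

console.log(res.status)                      // expect 200
console.log(res.headers.get('content-type')) // expect text/plain
console.log(await res.text())                // the exact rules crawlers will see

Run it with node check-robots-live.mjs.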

Common Patterns

Allow Everything (Default)

User-agent: *
Disallow:

Block Everything

Useful for staging or development environments.

User-agent: *
Disallow: /

See our security guide for more on environment protection.

Block AI Training Crawlers

GPTBot was the most blocked bot in 2024, fully disallowed by 250 domains. Blocking AI training bots doesn't affect search rankings:

# Block AI model training (doesn't affect Google search)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

Google-Extended is separate from Googlebot—blocking it won't hurt search visibility.

AI Directives: Content-Usage & Content-Signal

Two emerging standards let you express preferences about how AI systems use your content—without blocking crawlers entirely:

User-agent: *
Allow: /

# IETF aipref-vocab
Content-Usage: train-ai=n

# Cloudflare Content Signals
Content-Signal: search=yes, ai-input=no, ai-train=no

This allows crawlers to access your content for search indexing while blocking AI training and RAG/grounding uses. Both can be used together for broader coverage.

AI directives rely on voluntary compliance. Crawlers can ignore them—combine with User-agent blocks for stronger protection.

Block Search, Allow Social Sharing

For private sites where you still want link previews:

# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /

# Allow social link preview crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: Slackbot
Allow: /

Optimize Crawl Budget for Large Sites

If you have 10,000+ pages, block low-value URLs to focus crawl budget on important content:

User-agent: *
# Block internal search results
Disallow: /search?
# Block infinite scroll pagination
Disallow: /*?page=
# Block filtered/sorted product pages
Disallow: /products?*sort=
Disallow: /products?*filter=
# Block print versions
Disallow: /*/print

Sites under 1,000 pages don't need crawl budget optimization.

Using Nuxt?

If you're using Nuxt, check out Nuxt SEO which handles much of this automatically.

Learn more about robots.txt in Nuxt →