Robots.txt in Vue · Nuxt SEO

# Robots.txt in Vue

Robots.txt tells crawlers what they can access. Here's how to set it up in Vue.

[Harlan Wilton](https://x.com/harlan-zw) · 10 mins read · Published Nov 3, 2024 · Updated Jan 29, 2026

What you'll learn

- Robots.txt is advisory: crawlers can ignore it, so never use it for security
- Its primary uses are crawl budget optimization and blocking AI training bots
- Distinguish AI training bots (safe to block) from AI search bots (needed for traffic)

The `robots.txt` file controls which parts of your site crawlers can access. [Officially adopted as RFC 9309](https://datatracker.ietf.org/doc/html/rfc9309) in September 2022 after 28 years as a de facto standard, it's primarily used to [manage crawl budget](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget) on large sites and block AI training bots.

Robots.txt is not a security mechanism, and [crawlers can ignore it](https://developer.mozilla.org/en-US/docs/Web/Security/Practical_implementation_guides/Robots_txt). For individual page control, use the robots `meta` tag instead.

## [Quick Setup](#quick-setup)

To get started quickly with a static `robots.txt`, add the file in your public directory:

```
public/
  robots.txt
```

Add your rules:

robots.txt

```
# Allow all crawlers
User-agent: *
Disallow:

# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
```

### [Dynamic robots.txt](#dynamic-robotstxt)

For environment-specific rules (e.g., blocking all crawlers in staging), generate `robots.txt` server-side:

Express
```
import express from 'express'

const app = express()

app.get('/robots.txt', (req, res) => {
  const isDev = process.env.NODE_ENV !== 'production'
  const robots = isDev
    ? 'User-agent: *\nDisallow: /'
    : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'
  res.type('text/plain').send(robots)
})
```

Vite

```
// server.js for Vite SSR
import express from 'express'

const app = express()

app.use((req, res, next) => {
  if (req.path === '/robots.txt') {
    const isDev = process.env.NODE_ENV !== 'production'
    const robots = isDev
      ? 'User-agent: *\nDisallow: /'
      : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'
    return res.type('text/plain').send(robots)
  }
  next()
})
```

H3

```
import { defineEventHandler, setHeader } from 'h3'

export default defineEventHandler((event) => {
  if (event.path === '/robots.txt') {
    const isDev = process.env.NODE_ENV !== 'production'
    const robots = isDev
      ? 'User-agent: *\nDisallow: /'
      : 'User-agent: *\nDisallow:\nSitemap: https://mysite.com/sitemap.xml'
    setHeader(event, 'Content-Type', 'text/plain')
    return robots
  }
})
```

## [Robots.txt Syntax](#robotstxt-syntax)

The `robots.txt` file consists of directives grouped by user agent. Google [uses the most specific matching rule](https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt) based on path length:

robots.txt

```
# Define which crawler these rules apply to
User-agent: *

# Block access to specific paths
Disallow: /admin

# Allow access to specific paths (optional, more specific than Disallow)
Allow: /admin/public

# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml
```
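Google's precedence can be sketched in plain JavaScript. This is a simplified model (prefix matching only, ignoring wildcards): the longest matching rule wins, and a tie goes to `Allow`.

```
// Simplified sketch of rule precedence: longest matching pattern wins;
// on a tie, the less restrictive (Allow) rule applies.
function isAllowed(path, rules) {
  // rules: [{ type: 'allow' | 'disallow', pattern: '/admin' }, ...]
  let best = { length: -1, allow: true } // no match → allowed by default
  for (const rule of rules) {
    if (path.startsWith(rule.pattern)) {
      const len = rule.pattern.length
      if (len > best.length || (len === best.length && rule.type === 'allow')) {
        best = { length: len, allow: rule.type === 'allow' }
      }
    }
  }
  return best.allow
}

const rules = [
  { type: 'disallow', pattern: '/admin' },
  { type: 'allow', pattern: '/admin/public' },
]
console.log(isAllowed('/admin/settings', rules)) // false
console.log(isAllowed('/admin/public/docs', rules)) // true
```

This is why `Allow: /admin/public` overrides `Disallow: /admin` above: its pattern is longer, so it is the more specific match.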

### [User-agent](#user-agent)

The `User-agent` directive specifies which crawler the rules apply to:

robots.txt

```
# All crawlers
User-agent: *

# Just Googlebot
User-agent: Googlebot

# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private
```

Common crawler user agents (2026):

- [Googlebot](https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers): Google's search crawler
- [Bingbot](https://ahrefs.com/seo/glossary/bingbot): Microsoft's search crawler
- [Applebot](https://support.apple.com/en-us/106381): Apple's search crawler
- [GPTBot](https://platform.openai.com/docs/bots/overview-of-openai-crawlers): OpenAI's training crawler
- [OAI-SearchBot](https://platform.openai.com/docs/bots/oai-searchbot): OpenAI's search crawler (for ChatGPT Search)
- [ClaudeBot](https://support.anthropic.com/en/articles/9906653-claude-bot-and-crawling): Anthropic's training crawler
- [Applebot-Extended](https://support.apple.com/en-us/119829): Apple's AI training crawler

### [Allow / Disallow](#allow-disallow)

The `Allow` and `Disallow` directives control path access:

robots.txt

```
User-agent: *
# Block all paths starting with /admin
Disallow: /admin

# Block a specific file
Disallow: /private.html

# Block files with specific extensions
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*
```

Wildcards supported ([RFC 9309](https://datatracker.ietf.org/doc/html/rfc9309)):

- `*`: matches zero or more characters
- `$`: matches the end of the URL
- Paths are case-sensitive and relative to domain root
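As a rough sketch of how these wildcards behave, a crawler can translate a pattern into a regular expression (`*` becomes `.*`, a trailing `$` becomes an end anchor). This is a simplified illustration, not a full RFC 9309 matcher:

```
// Sketch: convert a robots.txt path pattern to a RegExp.
// `*` matches any sequence of characters; a trailing `$` anchors the URL end.
function robotsPatternToRegex(pattern) {
  const anchored = pattern.endsWith('$')
  const body = (anchored ? pattern.slice(0, -1) : pattern)
    // escape regex metacharacters, then expand `*` wildcards
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*')
  return new RegExp(`^${body}${anchored ? '$' : ''}`)
}

function isDisallowed(path, disallowPattern) {
  return robotsPatternToRegex(disallowPattern).test(path)
}

console.log(isDisallowed('/report.pdf', '/*.pdf$'))     // true
console.log(isDisallowed('/report.pdf?x=1', '/*.pdf$')) // false (URL doesn't end in .pdf)
console.log(isDisallowed('/admin/users', '/admin'))     // true (prefix match)
```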

### [Sitemap](#sitemap)

The `Sitemap` directive tells crawlers where to find your sitemap:

robots.txt

```
Sitemap: https://mysite.com/sitemap.xml

# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml
```

### [Crawl-Delay (Non-Standard)](#crawl-delay-non-standard)

`Crawl-Delay` is not part of [RFC 9309](https://datatracker.ietf.org/doc/html/rfc9309). Google ignores it. Bing and Yandex support it:

robots.txt

```
User-agent: Bingbot
Crawl-delay: 10  # seconds between requests
```

For Google, you [manage crawl rate in Search Console](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget).

## [Security: Why robots.txt Fails](#security-why-robotstxt-fails)

[Robots.txt is not a security mechanism](https://developer.mozilla.org/en-US/docs/Web/Security/Practical_implementation_guides/Robots_txt). Malicious crawlers ignore it, and listing paths in `Disallow` [reveals their location to attackers](https://www.searchenginejournal.com/robots-txt-security-risks/289719/).

**Common mistake:**

```
# ❌ Advertises your admin panel location
User-agent: *
Disallow: /admin
Disallow: /wp-admin
Disallow: /api/internal
```

Never use robots.txt to hide sensitive content. Use authentication and proper access controls instead.

Use [proper authentication](https://developers.google.com/search/docs/crawling-indexing/block-indexing) instead.

## [Crawling vs Indexing](#crawling-vs-indexing)

Blocking a URL in `robots.txt` prevents crawling but [doesn't prevent indexing](https://developers.google.com/search/docs/crawling-indexing/robots/intro). If other sites link to the URL, Google can still index it without crawling, showing the URL with no snippet.

To prevent indexing:

- Use `noindex` (requires allowing crawl)
- Use password protection or authentication
- Return 404/410 status codes

Don't block pages with `noindex` in `robots.txt`. Google can't see the tag if it can't crawl the page.
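If you're serving pages from Express (as in the dynamic example above), the `X-Robots-Tag` response header is another way to send `noindex`, since Google honors it for any crawlable resource. Below is a minimal sketch; `noindexPaths` is a hypothetical helper, not a library function:

```
// Sketch: Express-style middleware that adds X-Robots-Tag: noindex
// for the listed paths. The pages stay crawlable, so Google can see
// the directive — unlike a robots.txt Disallow.
function noindexPaths(paths) {
  return (req, res, next) => {
    if (paths.includes(req.path)) {
      res.set('X-Robots-Tag', 'noindex')
    }
    next()
  }
}

// Usage with Express: app.use(noindexPaths(['/private', '/drafts']))
```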

## [Common Mistakes](#common-mistakes)

### [1. Blocking JavaScript and CSS](#_1-blocking-javascript-and-css)

[Google needs JavaScript and CSS to render pages](https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics). Blocking them breaks indexing:

robots.txt

```
# ❌ Prevents Google from rendering your Vue app
User-agent: *
Disallow: /assets/
Disallow: /*.js$
Disallow: /*.css$
```

Vue apps are JavaScript-heavy. Never block `.js`, `.css`, or `/assets/` from Googlebot.

### [2. Blocking Dev Sites in Production](#_2-blocking-dev-sites-in-production)

Copy-pasting a dev `robots.txt` to production blocks all crawlers:

robots.txt

```
# ❌ Accidentally left from staging
User-agent: *
Disallow: /
```

Use [dynamic generation](#dynamic-robotstxt) or environment checks to avoid this.

### [3. Confusing robots.txt with noindex](#_3-confusing-robotstxt-with-noindex)

Blocking pages doesn't remove them from search results. Use `noindex` for that.

## [Testing Your robots.txt](#testing-your-robotstxt)

1. Check syntax: Visit `https://yoursite.com/robots.txt` to confirm it loads
2. [Google Search Console robots.txt tester](https://search.google.com/search-console/robots-txt) validates syntax and tests URLs
3. Verify crawlers can access: Check server logs for 200 status on `/robots.txt`
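For a quick automated check, you can also lint the file for unrecognized directives. `lintRobotsTxt` below is a hypothetical helper and a rough sanity check only, not a full RFC 9309 parser:

```
// Rough syntax check: flags lines that aren't blank, comments,
// or one of the common robots.txt directives.
const KNOWN = ['user-agent', 'allow', 'disallow', 'sitemap', 'crawl-delay']

function lintRobotsTxt(text) {
  const problems = []
  text.split('\n').forEach((line, i) => {
    const trimmed = line.trim()
    if (!trimmed || trimmed.startsWith('#')) return
    const colon = trimmed.indexOf(':')
    const directive = colon === -1 ? '' : trimmed.slice(0, colon).trim().toLowerCase()
    if (!KNOWN.includes(directive)) {
      problems.push(`line ${i + 1}: unrecognized directive "${trimmed}"`)
    }
  })
  return problems
}

console.log(lintRobotsTxt('User-agent: *\nDisallow: /admin').length) // 0
console.log(lintRobotsTxt('Dissalow: /admin').length) // 1 (typo caught)
```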

## [Common Patterns](#common-patterns)

### [Allow Everything (Default)](#allow-everything-default)

```
User-agent: *
Disallow:
```

### [Block Everything](#block-everything)

Useful for staging or development environments.

```
User-agent: *
Disallow: /
```

See [Dynamic robots.txt](#dynamic-robotstxt) above for generating this per environment.

### [Block AI Training Crawlers](#block-ai-training-crawlers)

Blocking AI training bots is a common practice in 2026. This prevents models from using your content for training but doesn't affect your appearance in search results.

```
# Block AI model training
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Applebot-Extended
User-agent: Google-Extended
User-agent: CCBot
Disallow: /
```

Be careful not to block **Search Bots** like `OAI-SearchBot` or `Claude-SearchBot` (unless you want to be invisible in their search products). Blocking `GPTBot` is safe for search visibility; blocking `OAI-SearchBot` removes you from ChatGPT Search.

### [AI Directives: Content-Usage & Content-Signal](#ai-directives-content-usage-content-signal)

Two emerging standards let you express preferences about how AI systems use your content without blocking crawlers entirely:

- **[Content-Usage](https://ietf-wg-aipref.github.io/drafts/draft-ietf-aipref-vocab.html)** (IETF): Uses `y`/`n` values for `train-ai`
- **[Content-Signal](https://contentsignals.org/)** (Cloudflare): Uses `yes`/`no` values for `search`, `ai-input`, `ai-train`

```
User-agent: *
Allow: /

# IETF aipref-vocab
Content-Usage: train-ai=n

# Cloudflare Content Signals
Content-Signal: search=yes, ai-input=no, ai-train=no
```

This allows crawlers to access your content for search indexing while blocking AI training and RAG/grounding uses. You can use both together for broader coverage.

AI directives rely on voluntary compliance and crawlers can ignore them; combine them with `User-agent` blocks for stronger protection.

### [Block Search, Allow Social Sharing](#block-search-allow-social-sharing)

For private sites where you still want social link previews to work:

```
# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /

# Allow social link preview crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: Slackbot
Allow: /
```

### [Optimize Crawl Budget for Large Sites](#optimize-crawl-budget-for-large-sites)

If you have 10,000+ pages, [block low-value URLs](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget) to focus crawl budget on important content:

```
User-agent: *
# Block internal search results
Disallow: /search?
# Block infinite scroll pagination
Disallow: /*?page=
# Block filtered/sorted product pages
Disallow: /products?*sort=
Disallow: /products?*filter=
# Block print versions
Disallow: /*/print
```

Sites under 1,000 pages don't need crawl budget optimization.

## [Using Nuxt?](#using-nuxt)

If you're using Nuxt, the Nuxt Robots module handles much of this automatically.

---


Copyright © 2023-2026 Harlan Wilton - [MIT License](https://github.com/harlan-zw/nuxt-seo/blob/main/license) · [mdream](https://mdream.dev)