
Introduction

The robots.txt file, which lives in your site's root web directory, is a common way to control how crawlers access your site.

✅ Good for:

  • Blocking large site sections (e.g., /admin/*)
  • Managing crawler bandwidth on heavy pages (e.g., search, infinite scroll)
  • Preventing crawling of development sites

❌ Don't use for:

  • Protecting sensitive data (crawlers can ignore rules)
  • Individual page indexing (use meta robots instead)
  • Removing existing pages from search results

Implementing robots.txt is straightforward: you can either create a static file or generate one dynamically for Vue / Nuxt applications.

Quick Setup

To get started quickly with a static robots.txt, add the file in your public directory:

public/
  robots.txt

Add your rules:

robots.txt
# Allow all crawlers
User-agent: *
Disallow:

# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

Dynamic Implementation

In some cases, you may prefer a dynamic robots.txt file, where the server generates the file based on the environment or other factors.

// example using Vite SSR
import express from 'express'

function createServer() {
  const app = express()
  // ..
  app.get('/robots.txt', (req, res) => {
    // Join the directives without leading whitespace so each one starts its line
    const robots = [
      'User-agent: *',
      'Disallow: /admin',
    ].join('\n')
    res.type('text/plain').send(robots)
  })
  // ..
}
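
For example, a common pattern is to serve a restrictive file everywhere except production. A minimal sketch that replaces the route handler above, assuming NODE_ENV is set by your deployment:

// Sketch: block all crawling outside production (assumes NODE_ENV is set by the host)
app.get('/robots.txt', (req, res) => {
  const isProduction = process.env.NODE_ENV === 'production'
  const robots = isProduction
    ? ['User-agent: *', 'Disallow: /admin'].join('\n')
    : ['User-agent: *', 'Disallow: /'].join('\n')
  res.type('text/plain').send(robots)
})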

Using Nuxt? The Nuxt Robots module can handle this automatically.
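
If you go that route, registering the module is a one-line change in your Nuxt config. A minimal sketch, assuming the module is installed as @nuxtjs/robots; rule options differ between module versions, so check its documentation for the exact keys:

// nuxt.config.ts
export default defineNuxtConfig({
  // The module then serves /robots.txt for you; rules are configured via its options
  modules: ['@nuxtjs/robots'],
})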

Understanding robots.txt

The robots.txt file consists of these main directives:

robots.txt
# Define which crawler these rules apply to
User-agent: *

# Block access to specific paths
Disallow: /admin

# Allow access to specific paths (optional)
Allow: /admin/public

# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

User-agent

The User-agent directive specifies which crawler the rules apply to:

robots.txt
# All crawlers
User-agent: *

# Just Googlebot
User-agent: Googlebot

# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private

Common crawler user agents:

  • Googlebot: Google Search
  • Bingbot: Microsoft Bing
  • GPTBot: OpenAI
  • Claude-Web: Anthropic
  • CCBot: Common Crawl
  • Google-Extended: Google's AI training control token
  • facebookexternalhit: Facebook link previews
  • Twitterbot: Twitter/X link previews

Allow / Disallow

The Allow and Disallow directives control path access:

robots.txt
User-agent: *
# Block all paths starting with /admin
Disallow: /admin

# Block a specific file
Disallow: /private.html

# Block files with specific extensions
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*

Path matching uses simple pattern matching; a rough code sketch of this behavior follows the list:

  • * matches any sequence of characters
  • $ matches the end of the URL
  • Paths are relative to the root domain
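
To make the wildcard behavior above concrete, here is a rough sketch that turns a robots.txt path pattern into a regular expression for testing URLs. It is an illustration only, not a full robots.txt matcher (real crawlers also apply rule-precedence logic):

// Illustration only: '*' becomes '.*', '$' anchors the end, everything else is literal
function patternToRegExp(pattern) {
  const source = pattern
    .split('')
    .map((ch) => {
      if (ch === '*') return '.*'
      if (ch === '$') return '$'
      return ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    })
    .join('')
  return new RegExp('^' + source)
}

patternToRegExp('/*.pdf$').test('/files/report.pdf')          // true
patternToRegExp('/*.pdf$').test('/files/report.pdf?download') // false
patternToRegExp('/admin').test('/admin/settings')             // true (prefix match)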

Sitemap

The Sitemap directive tells crawlers where to find your sitemap.xml:

robots.txt
Sitemap: https://mysite.com/sitemap.xml

# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml

Yandex Directives

The Yandex search engine introduced additional directives, of which only Clean-Param is useful.

  • Clean-Param: Strips the listed URL parameters before crawling
  • Host: Specifies the preferred host name of the site (deprecated)
  • Crawl-Delay: Specifies the delay between requests (deprecated)

If you need Clean-Param, target the Yandex user agent:

robots.txt
# Remove URL parameters
User-agent: Yandex
Clean-Param: param1 param2

Security Considerations

  • robots.txt is publicly visible - avoid revealing sensitive URL patterns
  • Not all crawlers follow the rules - see our security guide

SEO Impact

  • Blocking search crawlers prevents indexing but doesn't remove existing pages
  • For page-level control, use meta robots tags instead (see the sketch after this list)
  • Blocked resources can affect page rendering and SEO
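
In a Vue or Nuxt app, a per-page noindex can be set from the page component. A minimal sketch using the useHead composable (auto-imported in Nuxt, importable from @unhead/vue elsewhere):

// Inside a page component's <script setup>: opt the page out of indexing
// instead of blocking it in robots.txt
useHead({
  meta: [{ name: 'robots', content: 'noindex' }]
})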

Common Mistakes

  1. Blocking CSS/JS/Assets

robots.txt
# ❌ May break page rendering
User-agent: *
Disallow: /assets
Disallow: /css

  2. Using robots.txt for Authentication

robots.txt
# ❌ Not secure
User-agent: *
Disallow: /admin

  3. Blocking Site Features

robots.txt
# ❌ Better to use meta robots
User-agent: *
Disallow: /search

Testing

Using Google's Tools

  1. Visit Google's robots.txt Tester
  2. Add your site
  3. Test specific URLs
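
Independent of Google's tools, you can also confirm what your site actually serves. A quick check with Node 18+ (run as an ES module), replacing the URL with your own:

// Print the robots.txt your site actually serves
const res = await fetch('https://mysite.com/robots.txt')
console.log(res.status)
console.log(await res.text())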

Common Patterns

Allow Everything (Default)

User-agent: *
Disallow:

Block Everything

Useful for staging or development environments.

User-agent: *
Disallow: /

See our security guide for more on environment protection.

Block AI Crawlers

User-agent: GPTBot
User-agent: Claude-Web
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

Block Search While Allowing Social

# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /

# Allow social crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
Allow: /

Block Heavy Pages

User-agent: *
# Block search results
Disallow: /search
# Block filter pages
Disallow: /products?*
# Block print pages
Disallow: /*/print