
Introduction

The robots.txt file, which lives in your site's root web directory, is a common way to control how crawlers access your site.

✅ Good for:

  • Blocking large site sections (e.g., /admin/*)
  • Managing crawler bandwidth on heavy pages (e.g., search, infinite scroll)
  • Preventing crawling of development sites

❌ Don't use for:

  • Protecting sensitive data (crawlers can ignore rules)
  • Individual page indexing (use meta robots instead)
  • Removing existing pages from search results

Implementing robots.txt is straightforward: you can either create a static file or generate one dynamically for Vue / Nuxt applications.

Quick Setup

To get started quickly with a static robots.txt, add the file in your public directory:

public/
  robots.txt

Add your rules:

robots.txt
# Allow all crawlers
User-agent: *
Disallow:

# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

Dynamic Implementation

In some cases, you may prefer a dynamic robots.txt file, where the server generates the file based on the environment or other factors.

// example using Vite SSR
import express from 'express'

function createServer() {
  const app = express()
  // ..
  app.get('/robots.txt', (req, res) => {
    // Join the directives without leading whitespace so each one starts its line
    const robots = [
      'User-agent: *',
      'Disallow: /admin',
    ].join('\n')
    res.type('text/plain').send(robots)
  })
  // ..
}
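
For example, a common pattern is to serve a restrictive file everywhere except production. A minimal sketch that replaces the route handler above, assuming NODE_ENV is set by your deployment:

// Sketch: block all crawling outside production (assumes NODE_ENV is set by the host)
app.get('/robots.txt', (req, res) => {
  const isProduction = process.env.NODE_ENV === 'production'
  const robots = isProduction
    ? ['User-agent: *', 'Disallow: /admin'].join('\n')
    : ['User-agent: *', 'Disallow: /'].join('\n')
  res.type('text/plain').send(robots)
})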

Using Nuxt? The Nuxt Robots module can handle this automatically.
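
If you go that route, registering the module is a one-line change in your Nuxt config. A minimal sketch, assuming the module is installed as @nuxtjs/robots; rule options differ between module versions, so check its documentation for the exact keys:

// nuxt.config.ts
export default defineNuxtConfig({
  // The module then serves /robots.txt for you; rules are configured via its options
  modules: ['@nuxtjs/robots'],
})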

Understanding robots.txt

The robots.txt file consists of these main directives:

robots.txt
# Define which crawler these rules apply to
User-agent: *

# Block access to specific paths
Disallow: /admin

# Allow access to specific paths (optional)
Allow: /admin/public

# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

User-agent

The User-agent directive specifies which crawler the rules apply to:

robots.txt
# All crawlers
User-agent: *

# Just Googlebot
User-agent: Googlebot

# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private

Common crawler user agents:

  • Googlebot: Google Search
  • Bingbot: Microsoft Bing
  • GPTBot: OpenAI
  • Claude-Web: Anthropic
  • CCBot: Common Crawl
  • Google-Extended: Google's AI training control token
  • facebookexternalhit: Facebook link previews
  • Twitterbot: Twitter/X link previews

Allow / Disallow

The Allow and Disallow directives control path access:

robots.txt
User-agent: *
# Block all paths starting with /admin
Disallow: /admin

# Block a specific file
Disallow: /private.html

# Block files with specific extensions
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*

Path matching uses simple pattern matching; a rough code sketch of this behavior follows the list:

  • * matches any sequence of characters
  • $ matches the end of the URL
  • Paths are relative to the root domain
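
To make the wildcard behavior above concrete, here is a rough sketch that turns a robots.txt path pattern into a regular expression for testing URLs. It is an illustration only, not a full robots.txt matcher (real crawlers also apply rule-precedence logic):

// Illustration only: '*' becomes '.*', '$' anchors the end, everything else is literal
function patternToRegExp(pattern) {
  const source = pattern
    .split('')
    .map((ch) => {
      if (ch === '*') return '.*'
      if (ch === '$') return '$'
      return ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    })
    .join('')
  return new RegExp('^' + source)
}

patternToRegExp('/*.pdf$').test('/files/report.pdf')          // true
patternToRegExp('/*.pdf$').test('/files/report.pdf?download') // false
patternToRegExp('/admin').test('/admin/settings')             // true (prefix match)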

Sitemap

The Sitemap directive tells crawlers where to find your sitemap.xml:

robots.txt
Sitemap: https://mysite.com/sitemap.xml

# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml

Yandex Directives

The Yandex search engine introduced additional directives, of which only Clean-Param is useful.

  • Clean-Param: Strips the listed URL parameters before crawling
  • Host: Specifies the preferred host name of the site (deprecated)
  • Crawl-Delay: Specifies the delay between requests (deprecated)

If you need Clean-Param, target the Yandex user agent:

robots.txt
# Remove URL parameters
User-agent: Yandex
Clean-Param: param1 param2

Security Considerations

  • robots.txt is publicly visible - avoid revealing sensitive URL patterns
  • Not all crawlers follow the rules - see our security guide

SEO Impact

  • Blocking search crawlers prevents indexing but doesn't remove existing pages
  • For page-level control, use meta robots tags instead (see the sketch after this list)
  • Blocked resources can affect page rendering and SEO
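
In a Vue or Nuxt app, a per-page noindex can be set from the page component. A minimal sketch using the useHead composable (auto-imported in Nuxt, importable from @unhead/vue elsewhere):

// Inside a page component's <script setup>: opt the page out of indexing
// instead of blocking it in robots.txt
useHead({
  meta: [{ name: 'robots', content: 'noindex' }]
})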

Common Mistakes

  1. Blocking CSS/JS/Assets

robots.txt
# ❌ May break page rendering
User-agent: *
Disallow: /assets
Disallow: /css

  2. Using robots.txt for Authentication

robots.txt
# ❌ Not secure
User-agent: *
Disallow: /admin

  3. Blocking Site Features

robots.txt
# ❌ Better to use meta robots
User-agent: *
Disallow: /search

Testing

Using Google's Tools

  1. Visit Google's robots.txt Tester
  2. Add your site
  3. Test specific URLs
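
Independent of Google's tools, you can also confirm what your site actually serves. A quick check with Node 18+ (run as an ES module), replacing the URL with your own:

// Print the robots.txt your site actually serves
const res = await fetch('https://mysite.com/robots.txt')
console.log(res.status)
console.log(await res.text())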

Common Patterns

Allow Everything (Default)

User-agent: *
Disallow:

Block Everything

Useful for staging or development environments.

User-agent: *
Disallow: /

See our security guide for more on environment protection.

Block AI Crawlers

User-agent: GPTBot
User-agent: Claude-Web
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

Block Search While Allowing Social

# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /

# Allow social crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
Allow: /

Block Heavy Pages

User-agent: *
# Block search results
Disallow: /search
# Block filter pages
Disallow: /products?*
# Block print pages
Disallow: /*/print