Introduction

The robots.txt file, which lives in your root web directory, is a common way to control how crawlers access your site.

✅ Good for:

  • Blocking large site sections (e.g., /admin/*)
  • Managing crawler bandwidth on heavy pages (e.g., search, infinite scroll)
  • Preventing crawling of development sites

❌ Don't use for:

  • Protecting sensitive data (crawlers can ignore rules)
  • Individual page indexing (use meta robots instead)
  • Removing existing pages from search results

Implementing robots.txt is straightforward: you can either create a static file or generate one dynamically for Vue / Nuxt applications.

Quick Setup

To get started quickly with a static robots.txt, add the file in your public directory:

public/
  robots.txt

Add your rules:

robots.txt
# Allow all crawlers
User-agent: *
Disallow:

# Optionally point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

Dynamic Implementation

In some cases, you may prefer a dynamic robots.txt file: one generated server-side based on the environment or other factors.

// example using Vite SSR with an Express server
import express from 'express'

function createServer() {
  const app = express()
  // ..
  app.get('/robots.txt', (req, res) => {
    // Build the rules without leading whitespace so crawlers parse them correctly
    const robots = [
      'User-agent: *',
      'Disallow: /admin',
    ].join('\n')
    res.type('text/plain').send(robots)
  })
  // ..
  return app
}

Using Nuxt? The Nuxt Robots module can handle this automatically.
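
If you go the Nuxt route, a minimal setup might look like the sketch below. The @nuxtjs/robots package name is real, but treat the disallow and sitemap option names as assumptions and verify them against the module documentation.

nuxt.config.ts
// Minimal sketch; the option names below are assumptions to verify against the module docs
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  robots: {
    // Block crawling of the admin area
    disallow: ['/admin'],
    // Point crawlers at the sitemap
    sitemap: ['/sitemap.xml'],
  },
})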

Understanding robots.txt

The robots.txt file consists of these main directives:

robots.txt
# Define which crawler these rules apply to
User-agent: *

# Block access to specific paths
Disallow: /admin

# Allow access to specific paths (optional)
Allow: /admin/public

# Point to your sitemap
Sitemap: https://mysite.com/sitemap.xml

User-agent

The User-agent directive specifies which crawler the rules apply to:

robots.txt
# All crawlers
User-agent: *

# Just Googlebot
User-agent: Googlebot

# Multiple specific crawlers
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private

Common crawler user agents:

  • Googlebot - Google Search
  • Bingbot - Microsoft Bing
  • YandexBot - Yandex
  • DuckDuckBot - DuckDuckGo
  • GPTBot - OpenAI
  • facebookexternalhit - Facebook link previews
  • Twitterbot - X (Twitter) link previews

Allow / Disallow

The Allow and Disallow directives control path access:

robots.txt
User-agent: *
# Block all paths starting with /admin
Disallow: /admin

# Block a specific file
Disallow: /private.html

# Block files with specific extensions
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*

Path matching uses simple pattern matching:

  • * matches any sequence of characters
  • $ matches the end of the URL
  • Paths are relative to the root domain
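
To make the wildcard rules above concrete, here is a small TypeScript sketch of how a path can be tested against a robots.txt pattern. It handles only the * wildcard and the $ end-anchor described in this list, so it is an illustration rather than a full robots.txt parser.

// Minimal sketch of robots.txt path matching: handles only the "*" wildcard
// and the "$" end-anchor; real parsers also handle user-agent groups, etc.
function matchesRule(path: string, pattern: string): boolean {
  const anchored = pattern.endsWith('$')
  const body = anchored ? pattern.slice(0, -1) : pattern
  // Escape regex metacharacters, then turn "*" into ".*"
  const escaped = body
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*')
  // Rules match as prefixes unless anchored with "$"
  const regex = new RegExp('^' + escaped + (anchored ? '$' : ''))
  return regex.test(path)
}

matchesRule('/files/report.pdf', '/*.pdf$') // true
matchesRule('/admin/settings', '/admin')    // true (prefix match)
matchesRule('/blog/post', '/admin')         // false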

Sitemap

The Sitemap directive tells crawlers where to find your sitemap.xml:

robots.txt
Sitemap: https://mysite.com/sitemap.xml

# Multiple sitemaps
Sitemap: https://mysite.com/products-sitemap.xml
Sitemap: https://mysite.com/blog-sitemap.xml

Yandex Directives

The Yandex search engine introduced additional directives, of which only Clean-Param is useful.

  • Clean-Param: Strips the listed URL parameters before crawling, so duplicate URLs aren't re-crawled
  • Host: Specifies the host name of the site (unused)
  • Crawl-Delay: Specifies the delay between requests (unused)

If you need Clean-Param, target the Yandex user agent:

robots.txt
# Remove URL parameters
User-agent: Yandex
Clean-Param: param1 param2

Security Considerations

  • robots.txt is publicly visible - avoid revealing sensitive URL patterns
  • Not all crawlers follow the rules - see our security guide

SEO Impact

  • Blocking search crawlers prevents indexing but doesn't remove existing pages
  • For page-level control, use meta robots tags instead (see the sketch after this list)
  • Blocked resources can affect page rendering and SEO
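
For reference, the snippet below shows the page-level approach: a meta robots tag set from a Nuxt 3 page component. It assumes the auto-imported useHead composable is available; in plain HTML the equivalent is a <meta name="robots" content="noindex, nofollow"> tag in the page head.

// Inside <script setup lang="ts"> of a Nuxt 3 page component (sketch);
// assumes the auto-imported useHead composable is available
useHead({
  meta: [
    // Page-level control: ask crawlers not to index this page or follow its links
    { name: 'robots', content: 'noindex, nofollow' },
  ],
})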

Common Mistakes

  1. Blocking CSS/JS/Assets

robots.txt
# ❌ May break page rendering
User-agent: *
Disallow: /assets
Disallow: /css

  2. Using robots.txt for Authentication

robots.txt
# ❌ Not secure
User-agent: *
Disallow: /admin

  3. Blocking Site Features

robots.txt
# ❌ Better to use meta robots
User-agent: *
Disallow: /search

Testing

Using Google's Tools

  1. Visit Google's robots.txt Tester
  2. Add your site
  3. Test specific URLs
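
As a quick supplementary check alongside Google's tools, you can fetch your live robots.txt and test a path against its Disallow rules from a script. This is only a sketch: it uses mysite.com, the placeholder domain from this guide, does a naive prefix comparison, and ignores wildcards and user-agent groups.

// Fetch the live robots.txt and run a naive prefix check against one path.
// Sketch only: ignores wildcards, "$" anchors, and user-agent grouping.
const text = await (await fetch('https://mysite.com/robots.txt')).text()
const disallows = text
  .split('\n')
  .map(line => line.trim())
  .filter(line => line.toLowerCase().startsWith('disallow:'))
  .map(line => line.slice('disallow:'.length).trim())
  .filter(Boolean)
console.log(disallows.some(rule => '/admin/settings'.startsWith(rule)))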

Common Patterns

Allow Everything (Default)

User-agent: *
Disallow:

Block Everything

Useful for staging or development environments.

User-agent: *
Disallow: /

See our security guide for more on environment protection.
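
One way to apply this automatically is to branch on the environment in a dynamic handler like the one from the Dynamic Implementation section. The sketch below assumes NODE_ENV distinguishes your production deployment from staging or development.

import express from 'express'

const app = express()

app.get('/robots.txt', (req, res) => {
  const isProduction = process.env.NODE_ENV === 'production'
  const robots = isProduction
    ? 'User-agent: *\nDisallow: /admin' // production rules
    : 'User-agent: *\nDisallow: /'      // staging/dev: block all crawlers
  res.type('text/plain').send(robots)
})

app.listen(3000)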

Block AI Crawlers

User-agent: GPTBot
User-agent: Claude-Web
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

Block Search While Allowing Social

# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /

# Allow social crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
Allow: /

Block Heavy Pages

User-agent: *
# Block search results
Disallow: /search
# Block filter pages
Disallow: /products?*
# Block print pages
Disallow: /*/print