
Config using Robots.txt


Introduction

The robots.txt standard helps search engines understand which pages to crawl and index on your site.

New to robots.txt? Check out the Robots.txt Guide to learn more.

To stay closer to the robots standard, Nuxt Robots recommends configuring the module with a robots.txt file, which is parsed, validated, and used to configure the module.

If you need programmatic control, you can configure the module using nuxt.config.ts, Route Rules and Nitro hooks.
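For instance, common rules can be set directly in your Nuxt config. The following is a minimal sketch; the `disallow` and `sitemap` option names are assumptions about the module's config API, so check the module's option reference before relying on them.

nuxt.config.ts
export default defineNuxtConfig({
  robots: {
    // assumed option: block these paths for all user agents
    disallow: ['/admin'],
    // assumed option: advertise a sitemap in the generated robots.txt
    sitemap: ['/sitemap.xml']
  }
})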

Creating a robots.txt file

You can place your file in any location; the easiest is to use: <rootDir>/public/_robots.txt.

Additionally, the following paths are supported by default:

Example File Structure
# root directory
robots.txt
# asset folders
assets/
├── robots.txt
# pages folder
pages/
├── robots.txt
├── _dir/
│   └── robots.txt
# public folder
public/
├── _robots.txt
├── _dir/
│   └── robots.txt

Custom paths

If you find this too restrictive, you can use the mergeWithRobotsTxtPath config to load your robots.txt file from any path.

export default defineNuxtConfig({
  robots: {
    mergeWithRobotsTxtPath: 'assets/custom/robots.txt'
  }
})

Parsed robots.txt

The following rules are parsed from your robots.txt file:

  • User-agent - The user-agent to apply the rules to.
  • Disallow - An array of paths to disallow for the user-agent.
  • Allow - An array of paths to allow for the user-agent.
  • Sitemap - An array of sitemap URLs to include in the generated robots.txt.
  • Content-Usage / Content-Signal - Directives for expressing AI usage preferences (see Content Signals below).

This parsed data will be shown for environments that are indexable.
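For example, a file such as the following (the paths and sitemap URL are placeholders) would be parsed into Allow and Disallow rules for the `*` user agent plus a sitemap entry:

robots.txt
User-agent: *
Allow: /
Disallow: /admin
Sitemap: https://example.com/sitemap.xml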

Content Signals

Content Signals allow you to express preferences about how AI systems should interact with your content. Both Content-Usage and Content-Signal directives are supported:

Content-Usage (IETF Standard)

The Content-Usage directive follows the IETF AI Preferences specification:

robots.txt
User-agent: *
Allow: /
Content-Usage: ai=n
Content-Usage: /public/ train-ai=y
Content-Usage: /restricted/ ai=n train-ai=n

Content-Signal (Cloudflare Implementation)

The Content-Signal directive is Cloudflare's implementation, widely deployed across millions of domains:

robots.txt
User-agent: *
Allow: /
Content-Signal: ai-train=no, search=yes, ai-input=yes

Both directives are parsed identically and output as Content-Usage in the generated robots.txt. Use whichever format matches your preferences or existing tooling.

Conflicting public/robots.txt

To ensure other modules can integrate with your generated robots file, you must not have a robots.txt file in your public folder.

If you do, it will be moved to <rootDir>/public/_robots.txt and merged with the generated file.
