# Control Web Crawlers and Crawl Budget in Nuxt

Manage how search engines crawl and index your Nuxt app. Configure robots.txt, sitemaps, canonical URLs, and redirects for better SEO.

[Harlan Wilton](https://x.com/harlan-zw) · 8 mins read · Published Nov 3, 2024 · Updated Jan 29, 2026

Web crawlers determine what gets indexed and how often. Controlling them affects your [crawl budget](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget): the number of pages Google will crawl on your site in a given timeframe.

In 2026, crawler control extends to **AI Bot Governance** and the **Agentic Web**. You need to guide LLMs to your best content while protecting your data from unauthorized training.

Most sites don't need to worry about crawl budget. But if you have 10,000+ pages, frequently updated content, or want to manage how AI agents consume your data, crawler control matters.

## [Types of Crawlers](#types-of-crawlers)

**Search engines**: Index your pages for search results

- [Googlebot](https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers) (28% of bot traffic)
- [Bingbot](https://ahrefs.com/seo/glossary/bingbot)

**Social platforms**: Generate link previews when shared

- [FacebookExternalHit](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/)
- Twitterbot, Slackbot, Discordbot

**AI training & inference**: Scrape content for model training or real-time answers

- [GPTBot](https://platform.openai.com/docs/bots/overview-of-openai-crawlers) (OpenAI training)
- [ClaudeBot](https://www.anthropic.com) (Anthropic training)
- [PerplexityBot](https://www.perplexity.ai) (Real-time AI search)
- [Google-Extended](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) (Google's AI training)

**Agentic AI**: Bots that perform actions on behalf of users (e.g., booking, shopping)

- [OpenAI-GPT-4o](https://openai.com)
- [Claude-3.5-Sonnet](https://anthropic.com)

**Malicious**: Ignore robots.txt, spoof user agents, scan for vulnerabilities. Block these at the network level (firewall/WAF), not with robots.txt.
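Identifying well-behaved crawlers server-side usually starts with a User-Agent check. A minimal TypeScript sketch, with an illustrative (not exhaustive) token list:

```typescript
// Illustrative AI crawler User-Agent tokens; extend as needed.
const AI_CRAWLER_TOKENS = ['GPTBot', 'ClaudeBot', 'CCBot', 'PerplexityBot', 'Google-Extended']

// Case-insensitive substring match against a request's User-Agent header.
function isAiCrawler(userAgent: string): boolean {
  const ua = userAgent.toLowerCase()
  return AI_CRAWLER_TOKENS.some(token => ua.includes(token.toLowerCase()))
}
```

Note that malicious bots spoof their User-Agent, so a check like this is a hint for well-behaved crawlers, never authentication.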

## [Control Mechanisms](#control-mechanisms)

| Mechanism | Use When |
| --- | --- |
| robots.txt | Block site sections, manage crawl budget, block AI crawlers |
| Sitemap | Help crawlers discover pages, especially on large sites |
| Robots meta tag | Control indexing per page (noindex, nofollow) |
| Canonical URLs | Consolidate duplicate content, handle URL parameters |
| Redirects | Preserve SEO when moving/deleting pages |
| llms.txt | Guide AI tools to your documentation (via nuxt-llms) |
| [X-Robots-Tag](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#xrobotstag) | Control non-HTML files (PDFs, images) |
| Firewall / WAF | Block malicious bots at network level |
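For the X-Robots-Tag row, Nuxt's `routeRules` can attach the header to non-HTML assets. A sketch, where the `/downloads/**` path is an assumption for illustration:

```ts
// nuxt.config.ts — send X-Robots-Tag for files under a hypothetical /downloads/ path
export default defineNuxtConfig({
  routeRules: {
    '/downloads/**': {
      headers: { 'X-Robots-Tag': 'noindex' } // keep PDFs etc. out of search results
    }
  }
})
```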

## [Quick Recipes](#quick-recipes)

**Block page from indexing**:

pages/admin.vue

```vue
<script setup lang="ts">
useSeoMeta({ robots: 'noindex, follow' })
</script>
```

**Block AI training bots**:

public/robots.txt

```
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
```
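If you manage robots.txt through the @nuxtjs/robots module rather than a static file, the same rules can be expressed in config. A sketch assuming the module's `groups` option (option names may vary by module version):

```ts
// nuxt.config.ts — equivalent AI-training block via @nuxtjs/robots
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],
  robots: {
    groups: [
      { userAgent: ['GPTBot', 'ClaudeBot', 'CCBot'], disallow: ['/'] }
    ]
  }
})
```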

**Fix duplicate content**:

pages/products/[id].vue

```vue
<script setup lang="ts">
const route = useRoute()
useHead({
  link: [{ rel: 'canonical', href: `https://mysite.com/products/${route.params.id}` }]
})
</script>
```

**Redirect moved page**:

nuxt.config.ts

```ts
export default defineNuxtConfig({
  routeRules: {
    '/old-url': { redirect: { to: '/new-url', statusCode: 301 } }
  }
})
```

## [When Crawler Control Matters](#when-crawler-control-matters)

Most small sites don't need to optimize crawler behavior. But it matters when:

**Crawl budget concerns**: Sites with 10,000+ pages need Google to prioritize important content. Block low-value pages (search results, filtered products, admin areas) so crawlers focus on what matters.
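Blocking those low-value sections in robots.txt might look like this (the paths are illustrative):

```
# public/robots.txt — keep crawlers focused on important content
User-agent: *
Disallow: /search
Disallow: /admin
Disallow: /*?sort=
```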

**Duplicate content**: URLs like `/about` and `/about/` compete against each other, as do `?sort=price` variations. Canonical URLs consolidate these.

**Staging environments**: Search engines index any public site they find. Block staging/dev environments in robots.txt to avoid duplicate content issues.
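One hedged approach is to derive the robots.txt body from a deployment-environment flag, e.g. behind a server route such as `server/routes/robots.txt.ts` (the flag name and route are assumptions for illustration):

```typescript
// Sketch: pick a robots.txt body from a deployment environment flag.
// 'production' allows crawling; anything else (staging, dev, preview) blocks everything.
function robotsBody(env: string): string {
  return env === 'production'
    ? 'User-agent: *\nDisallow:' // empty Disallow = allow everything
    : 'User-agent: *\nDisallow: /' // blanket block for non-production
}
```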

**AI training opt-out**: [GPTBot was the most-blocked crawler in 2024](https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/). Block AI training bots without affecting search rankings.

**Server costs**: Bots consume CPU. Heavy pages (maps, infinite scroll, SSR) cost money per request. Blocking unnecessary crawlers reduces load.

## [Nuxt SEO Modules](#nuxt-seo-modules)

Nuxt SEO handles crawler control through dedicated modules. Install them once, configure them in `nuxt.config.ts`, and forget about it.

---

Copyright © 2023-2026 Harlan Wilton - [MIT License](https://github.com/harlan-zw/nuxt-seo/blob/main/license) · [mdream](https://mdream.dev)