---
title: "Control Web Crawlers and Crawl Budget in Vue · Nuxt SEO"
meta:
  "og:description": "Manage how search engines crawl and index your Vue app. Configure robots.txt, sitemaps, canonical URLs, and redirects for better SEO."
  "og:title": "Control Web Crawlers and Crawl Budget in Vue · Nuxt SEO"
  author: "Harlan Wilton"
  description: "Manage how search engines crawl and index your Vue app. Configure robots.txt, sitemaps, canonical URLs, and redirects for better SEO."
---

# **Control Web Crawlers and Crawl Budget in Vue**

Manage how search engines crawl and index your Vue app. Configure robots.txt, sitemaps, canonical URLs, and redirects for better SEO.

[![Harlan Wilton](https://avatars.githubusercontent.com/u/5326365?v=4)Harlan Wilton](https://x.com/harlan-zw)8 mins read Published **Nov 3, 2024** Updated **Jan 29, 2026**

**What you'll learn**

- robots.txt is advisory. malicious crawlers ignore it, use firewalls for real security
- Sites under 10,000 pages rarely need crawl budget optimization
- Block AI training (Applebot-Extended, GPTBot) without losing AI search traffic (OAI-SearchBot)

Web crawlers determine what gets indexed and how often. Controlling them affects your [**crawl budget**](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget). the number of pages Google will crawl on your site in a given timeframe.

Most sites don't need to worry about crawl budget. But if you have 10,000+ pages, frequently updated content, or want to block AI training bots, crawler control matters.

## [Types of Crawlers](#types-of-crawlers)

**Search engines** . Index your pages for search results

- [**Googlebot**](https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers) (28% of bot traffic)
- [**Bingbot**](https://ahrefs.com/seo/glossary/bingbot)
- [**Applebot**](https://support.apple.com/en-us/106381) (Used for Spotlight and Siri suggestions)

**Social platforms** . Generate link previews when shared

- [**FacebookExternalHit**](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/)
- Twitterbot, Slackbot, Discordbot

**AI training** . Scrape content for model training

- [**GPTBot**](https://platform.openai.com/docs/bots/overview-of-openai-crawlers) (OpenAI training data)
- [**Applebot-Extended**](https://support.apple.com/en-us/119829) (Apple Intelligence training)
- [**Google-Extended**](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) (Gemini training)
- [**ClaudeBot**](https://www.anthropic.com) (Anthropic training)

**AI Search** . Live fetching for AI answers (Don't block these if you want to appear in AI search)

- [**OAI-SearchBot**](https://platform.openai.com/docs/bots/oai-searchbot) (ChatGPT Search)
- [**Claude-SearchBot**](https://support.anthropic.com/en/articles/9906653-claude-bot-and-crawling) (Claude Search)

**Malicious** . Ignore robots.txt, spoof user agents, scan for vulnerabilities. Block these at the [**firewall level**](https://nuxtseo.com/learn-seo/vue/routes-and-rendering/security), not with robots.txt.

## [Control Mechanisms](#control-mechanisms)

| **Mechanism** | **Use When** |
| --- | --- |
| [**robots.txt**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/robots-txt) | Block site sections, manage crawl budget, block AI bots |
| [**Sitemaps**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/sitemaps) | Help crawlers discover pages, especially on large sites |
| [**Meta robots**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/meta-tags) | Control indexing per page (noindex, nofollow) |
| [**Canonical URLs**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/canonical-urls) | Consolidate duplicate content, handle URL parameters |
| [**Redirects**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/redirects) | Preserve SEO when moving/deleting pages |
| [**llms.txt**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/llms-txt) | Guide AI tools to your documentation (MCP servers, coding assistants) |
| [**X-Robots-Tag**](https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#xrobotstag) | Control non-HTML files (PDFs, images) |
| [**Firewall**](https://nuxtseo.com/learn-seo/vue/routes-and-rendering/security) | Block malicious bots at network level |

A key shift in 2026 is distinguishing between **Training Bots** (which consume your data to build models) and **Search Bots** (which fetch data to answer user queries).Blocking `GPTBot` stops your data from training future GPT models, but you should allow `OAI-SearchBot` if you want your site to be cited in ChatGPT Search results.

## [Quick Recipes](#quick-recipes)

**Block page from indexing** . [**Full guide**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/meta-tags)

pages/admin.vue

```
<script setup>
import { useSeoMeta } from '@unhead/vue'

useSeoMeta({ robots: 'noindex, follow' })
</script>
```

**Block AI training bots** . [**Full guide**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/robots-txt)

public/robots.txt

```
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Applebot-Extended
User-agent: Google-Extended
Disallow: /
```

**Fix duplicate content** . [**Full guide**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/canonical-urls)

pages/products/[id].vue

```
<script setup>
import { useHead } from '@unhead/vue'

useHead({
  link: [{ rel: 'canonical', href: \`https://mysite.com/products/${route.params.id}\` }]
})
</script>
```

**Redirect moved page** . [**Full guide**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/redirects)

server.ts

```
app.get('/old-url', (req, res) => res.redirect(301, '/new-url'))
```

## [When Crawler Control Matters](#when-crawler-control-matters)

Most small sites don't need to optimize crawler behavior. But it matters when:

**Crawl budget concerns** . Sites with 10,000+ pages need Google to prioritize important content. Block low-value pages (search results, filtered products, admin areas) so crawlers focus on what matters.

**Duplicate content** . URLs like `/about` and `/about/` compete against each other. Same with `?sort=price` variations. [**Canonical tags**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/canonical-urls) consolidate these.

**Staging environments** . Search engines index any public site they find. Block staging/dev environments in [**robots.txt**](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/robots-txt) to avoid duplicate content issues.

**AI training opt-out** . [**GPTBot was the most-blocked crawler in 2024**](https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/). Block AI training bots without affecting search rankings.

**Server costs** . Bots consume CPU. Heavy pages (maps, infinite scroll, SSR) cost money per request. Blocking unnecessary crawlers reduces load.

## [Using Nuxt?](#using-nuxt)

If you're using Nuxt, check out [**Nuxt SEO**](https://nuxtseo.com/docs/nuxt-seo/getting-started/introduction) which handles much of this automatically.

[**Learn more about controlling crawlers in Nuxt →**](https://nuxtseo.com/learn-seo/nuxt/controlling-crawlers)

---

[**Rich Results** Get rich results in Google search with Schema.org structured data. Learn which types still work after Google's 2023 and 2026 removals.](https://nuxtseo.com/learn-seo/vue/mastering-meta/rich-results) [**Robots.txt** Robots.txt tells crawlers what they can access. Here's how to set it up in Vue.](https://nuxtseo.com/learn-seo/vue/controlling-crawlers/robots-txt)

**On this page**

- [Types of Crawlers](#types-of-crawlers)
- [Control Mechanisms](#control-mechanisms)
- [Quick Recipes](#quick-recipes)
- [When Crawler Control Matters](#when-crawler-control-matters)
- [Using Nuxt?](#using-nuxt)