---
title: "Control Web Crawlers and Crawl Budget in Vue"
description: "Manage how search engines crawl and index your Vue app. Configure robots.txt, sitemaps, canonical URLs, and redirects for better SEO."
canonical_url: "https://nuxtseo.com/learn-seo/vue/controlling-crawlers"
last_updated: "2026-01-29"
---

<key-takeaways>

- robots.txt is advisory: malicious crawlers ignore it, so use firewalls for real security
- Sites under 10,000 pages rarely need crawl budget optimization
- Block AI training bots (Applebot-Extended, GPTBot) without losing AI search traffic (OAI-SearchBot)

</key-takeaways>

Web crawlers determine what gets indexed and how often. Controlling them affects your [crawl budget](https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget), the number of pages Google will crawl on your site in a given timeframe.

Most sites don't need to worry about crawl budget. But if you have 10,000+ pages, frequently updated content, or want to block AI training bots, crawler control matters.

## Types of Crawlers

**Search engines**: Index your pages for search results

- [Googlebot](https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers) (28% of bot traffic)
- [Bingbot](https://ahrefs.com/seo/glossary/bingbot)
- [Applebot](https://support.apple.com/en-us/106381) (used for Spotlight and Siri suggestions)

**Social platforms**: Generate link previews when shared

- [FacebookExternalHit](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/)
- Twitterbot, Slackbot, Discordbot

**AI training**: Scrape content for model training

- [GPTBot](https://platform.openai.com/docs/bots/overview-of-openai-crawlers) (OpenAI training data)
- [Applebot-Extended](https://support.apple.com/en-us/119829) (Apple Intelligence training)
- [Google-Extended](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) (Gemini training)
- [ClaudeBot](https://www.anthropic.com) (Anthropic training)

**AI Search**: Live fetching for AI answers (Don't block these if you want to appear in AI search)

- [OAI-SearchBot](https://platform.openai.com/docs/bots/oai-searchbot) (ChatGPT Search)
- [Claude-SearchBot](https://support.anthropic.com/en/articles/9906653-claude-bot-and-crawling) (Claude Search)

**Malicious**: Ignore robots.txt, spoof user agents, scan for vulnerabilities. Block these at the [firewall level](/learn-seo/vue/routes-and-rendering/security), not with robots.txt.

## Control Mechanisms

<table>
<thead>
  <tr>
    <th>
      Mechanism
    </th>
    
    <th>
      Use When
    </th>
  </tr>
</thead>

<tbody>
  <tr>
    <td>
      <a href="/learn-seo/vue/controlling-crawlers/robots-txt">
        robots.txt
      </a>
    </td>
    
    <td>
      Block site sections, manage crawl budget, block AI bots
    </td>
  </tr>
  
  <tr>
    <td>
      <a href="/learn-seo/vue/controlling-crawlers/sitemaps">
        Sitemaps
      </a>
    </td>
    
    <td>
      Help crawlers discover pages, especially on large sites
    </td>
  </tr>
  
  <tr>
    <td>
      <a href="/learn-seo/vue/controlling-crawlers/meta-tags">
        Meta robots
      </a>
    </td>
    
    <td>
      Control indexing per page (noindex, nofollow)
    </td>
  </tr>
  
  <tr>
    <td>
      <a href="/learn-seo/vue/controlling-crawlers/canonical-urls">
        Canonical URLs
      </a>
    </td>
    
    <td>
      Consolidate duplicate content, handle URL parameters
    </td>
  </tr>
  
  <tr>
    <td>
      <a href="/learn-seo/vue/controlling-crawlers/redirects">
        Redirects
      </a>
    </td>
    
    <td>
      Preserve SEO when moving/deleting pages
    </td>
  </tr>
  
  <tr>
    <td>
      <a href="/learn-seo/vue/controlling-crawlers/llms-txt">
        llms.txt
      </a>
    </td>
    
    <td>
      Guide AI tools to your documentation (MCP servers, coding assistants)
    </td>
  </tr>
  
  <tr>
    <td>
      <a href="https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag#xrobotstag" rel="nofollow">
        X-Robots-Tag
      </a>
    </td>
    
    <td>
      Control non-HTML files (PDFs, images)
    </td>
  </tr>
  
  <tr>
    <td>
      <a href="/learn-seo/vue/routes-and-rendering/security">
        Firewall
      </a>
    </td>
    
    <td>
      Block malicious bots at network level
    </td>
  </tr>
</tbody>
</table>
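
The table's last two mechanisms have no recipe below, so here is a sketch of `X-Robots-Tag`: because it is an HTTP response header, it can mark non-HTML files (PDFs, images) as noindex where a meta tag can't exist. The helper name and paths are hypothetical, using Node's built-in `http` module:

```typescript
import http from 'node:http'

// Hypothetical helper: choose an X-Robots-Tag value for a request path.
// Here, PDFs and anything under /downloads/ are kept out of the index.
function xRobotsFor(path: string): string | null {
  if (path.endsWith('.pdf') || path.startsWith('/downloads/')) {
    return 'noindex'
  }
  return null
}

const server = http.createServer((req, res) => {
  const value = xRobotsFor(req.url ?? '/')
  if (value) {
    // Crawlers treat this header like a <meta name="robots"> tag
    res.setHeader('X-Robots-Tag', value)
  }
  res.end('ok')
})
```

The same header can usually be set from a CDN or reverse proxy instead of application code, which is simpler for static assets.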

<tip title="2026 Strategy: Training vs. Search">

A key shift in 2026 is distinguishing between **Training Bots** (which consume your data to build models) and **Search Bots** (which fetch data to answer user queries).

Blocking `GPTBot` stops your data from training future GPT models, but you should allow `OAI-SearchBot` if you want your site to be cited in ChatGPT Search results.

</tip>
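
In robots.txt terms, that split looks like the sketch below. robots.txt is allow-by-default, so the `Allow` group is not strictly required; it is listed here to make the policy explicit:

```robots-txt [public/robots.txt]
# Opt out of model training
User-agent: GPTBot
Disallow: /

# Keep AI search citations (allow-by-default; listed for clarity)
User-agent: OAI-SearchBot
Allow: /
```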

## Quick Recipes

**Block page from indexing**: [Full guide](/learn-seo/vue/controlling-crawlers/meta-tags)

```vue [pages/admin.vue]
<script setup lang="ts">
import { useSeoMeta } from '@unhead/vue'

useSeoMeta({ robots: 'noindex, follow' })
</script>
```

**Block AI training bots**: [Full guide](/learn-seo/vue/controlling-crawlers/robots-txt)

```robots-txt [public/robots.txt]
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Applebot-Extended
User-agent: Google-Extended
Disallow: /
```

**Fix duplicate content**: [Full guide](/learn-seo/vue/controlling-crawlers/canonical-urls)

```vue [pages/products/[id].vue]
<script setup lang="ts">
import { useHead } from '@unhead/vue'
import { useRoute } from 'vue-router'

const route = useRoute()

useHead({
  link: [{ rel: 'canonical', href: `https://mysite.com/products/${route.params.id}` }]
})
</script>
```

**Redirect moved page**: [Full guide](/learn-seo/vue/controlling-crawlers/redirects)

```ts [server.ts]
import express from 'express'

const app = express()
app.get('/old-url', (req, res) => res.redirect(301, '/new-url'))
```

## When Crawler Control Matters

Most small sites don't need to optimize crawler behavior. But it matters when:

**Crawl budget concerns**: Sites with 10,000+ pages need Google to prioritize important content. Block low-value pages (search results, filtered products, admin areas) so crawlers focus on what matters.
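
A robots.txt that steers crawlers away from those low-value URLs might look like this (the paths and query parameter are illustrative; Google supports `*` wildcards in `Disallow` rules):

```robots-txt [public/robots.txt]
User-agent: *
Disallow: /search
Disallow: /admin
Disallow: /*?sort=
```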

**Duplicate content**: URLs like `/about` and `/about/` compete against each other. Same with `?sort=price` variations. [Canonical tags](/learn-seo/vue/controlling-crawlers/canonical-urls) consolidate these.

**Staging environments**: Search engines index any public site they find. Block staging/dev environments in [robots.txt](/learn-seo/vue/controlling-crawlers/robots-txt) to avoid duplicate content issues.
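
A blanket staging robots.txt is a two-line sketch. Note that robots.txt only stops crawling, not indexing of already-discovered URLs, so pairing it with HTTP auth or a `noindex` header is safer:

```robots-txt [public/robots.txt]
# Staging only: block all crawlers
User-agent: *
Disallow: /
```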

**AI training opt-out**: [GPTBot was the most-blocked crawler in 2024](https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/). Block AI training bots without affecting search rankings.

**Server costs**: Bots consume CPU. Heavy pages (maps, infinite scroll, SSR) cost money per request. Blocking unnecessary crawlers reduces load.

## Using Nuxt?

If you're using Nuxt, check out [Nuxt SEO](/docs/nuxt-seo/getting-started/introduction) which handles much of this automatically.

[Learn more about controlling crawlers in Nuxt →](/learn-seo/nuxt/controlling-crawlers)
