Control Web Crawlers and Crawl Budget in Vue

Manage how search engines crawl and index your Vue app. Configure robots.txt, sitemaps, canonical URLs, and redirects for better SEO.
Harlan Wilton · 8 min read
What you'll learn
  • robots.txt is advisory: malicious crawlers ignore it, so use firewalls for real security
  • Sites under 10,000 pages rarely need crawl budget optimization
  • GPTBot was the most-blocked crawler in 2024; you can block AI training bots without affecting search

Web crawlers determine what gets indexed and how often. Controlling them affects your crawl budget: the number of pages Google will crawl on your site in a given timeframe.

Most sites don't need to worry about crawl budget. But if you have 10,000+ pages, frequently updated content, or want to block AI training bots, crawler control matters.

Types of Crawlers

Search engines: index your pages for search results.

Social platforms: generate link previews when links are shared.

AI training: scrape content for model training.

Malicious: ignore robots.txt, spoof user agents, and scan for vulnerabilities. Block these at the firewall level, not with robots.txt.

Control Mechanisms

Mechanism | Use When
robots.txt | Block site sections, manage crawl budget, block AI bots
Sitemaps | Help crawlers discover pages, especially on large sites
Meta robots | Control indexing per page (noindex, nofollow)
Canonical URLs | Consolidate duplicate content, handle URL parameters
Redirects | Preserve SEO when moving/deleting pages
llms.txt | Guide AI tools to your documentation (MCP servers, coding assistants)
X-Robots-Tag | Control non-HTML files such as PDFs and images (see the sketch after this table)
Firewall | Block malicious bots at the network level
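
Most of these mechanisms have recipes below; X-Robots-Tag doesn't, so here is a minimal sketch. It assumes the same Express-style server.ts used in the redirect recipe and simply adds the header to PDF responses.

server.ts
// Sketch: meta robots tags can't help here since PDFs have no <head>,
// so the X-Robots-Tag response header carries the noindex directive instead.
app.use((req, res, next) => {
  if (req.path.endsWith('.pdf')) {
    res.set('X-Robots-Tag', 'noindex')
  }
  next()
})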

Quick Recipes

Block a page from indexing · Full guide

pages/admin.vue
<script setup>
import { useSeoMeta } from '@unhead/vue'

// Keep this page out of search results, but let crawlers follow its links
useSeoMeta({ robots: 'noindex, follow' })
</script>

Block AI training bots · Full guide

public/robots.txt
# Opt out of AI training crawlers; search bots like Googlebot are unaffected
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

Fix duplicate content · Full guide

pages/products/[id].vue
<script setup>
import { useHead } from '@unhead/vue'
import { useRoute } from 'vue-router'

const route = useRoute()

useHead({
  link: [{ rel: 'canonical', href: `https://mysite.com/products/${route.params.id}` }]
})
</script>

Redirect a moved page · Full guide

server.ts
// A 301 is permanent: search engines transfer the old URL's ranking to the new one
app.get('/old-url', (req, res) => res.redirect(301, '/new-url'))

When Crawler Control Matters

Most small sites don't need to optimize crawler behavior. But it matters when:

Crawl budget concerns: Sites with 10,000+ pages need Google to prioritize important content. Block low-value pages (search results, filtered products, admin areas) so crawlers focus on what matters.
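
For example, a robots.txt along these lines tells crawlers to skip low-value sections (the paths are illustrative; match them to your own routes):

public/robots.txt
User-agent: *
# Hypothetical low-value sections: on-site search, admin, filtered listings
Disallow: /search
Disallow: /admin/
Disallow: /*?sort=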

Duplicate content: URLs like /about and /about/ compete against each other, as do ?sort=price variations. Canonical tags consolidate these.
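
Building on the canonical recipe above, here is a sketch that derives the canonical from route.path so query-string variations all point at the same URL (the file name is illustrative):

pages/products/index.vue
<script setup>
import { useHead } from '@unhead/vue'
import { useRoute } from 'vue-router'

const route = useRoute()

// route.path excludes the query string, so /products?sort=price and /products
// both declare /products as their canonical URL
useHead({
  link: [{ rel: 'canonical', href: `https://mysite.com${route.path}` }]
})
</script>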

Staging environments: Search engines index any public site they find. Block staging/dev environments in robots.txt to avoid duplicate content issues.
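
A blanket robots.txt, served only on the staging deployment, asks every crawler to stay out:

public/robots.txt
# Staging only: don't ship this file to production
User-agent: *
Disallow: /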

AI training opt-out: GPTBot was the most-blocked crawler in 2024. You can block AI training bots without affecting search rankings.

Server costs: Bots consume CPU, and heavy pages (maps, infinite scroll, SSR) cost money per request. Blocking unnecessary crawlers reduces load.
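
robots.txt only asks politely; to actually cut the load you can refuse requests at the server. Here is a sketch assuming the same Express-style server.ts as the redirect recipe, with an illustrative bot list:

server.ts
// Illustrative list of crawlers you don't want spending your CPU
const blockedBots = ['PetalBot', 'AhrefsBot', 'SemrushBot']

app.use((req, res, next) => {
  const ua = req.get('user-agent') || ''
  if (blockedBots.some(bot => ua.includes(bot))) {
    return res.status(403).end()
  }
  next()
})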

Using Nuxt?

If you're using Nuxt, check out Nuxt SEO which handles much of this automatically.

Learn more about controlling crawlers in Nuxt →