Control Web Crawlers and Crawl Budget in Nuxt

Manage how search engines crawl and index your Nuxt app. Configure robots.txt, sitemaps, canonical URLs, and redirects for better SEO.
Harlan Wilton · 8 min read

Web crawlers determine what gets indexed and how often. Controlling them affects your crawl budget—the number of pages Google will crawl on your site in a given timeframe.

Most sites don't need to worry about crawl budget. But if you have 10,000+ pages, frequently updated content, or want to block AI training bots, crawler control matters.

Types of Crawlers

Search engines — Index your pages for search results

Social platforms — Generate link previews when shared

AI training — Scrape content for model training

Malicious — Ignore robots.txt, spoof user agents, scan for vulnerabilities. Block these at the firewall level, not with robots.txt.

Control Mechanisms

| Mechanism | Use When |
| --- | --- |
| robots.txt | Block site sections, manage crawl budget, block AI bots |
| Sitemaps | Help crawlers discover pages, especially on large sites |
| Meta robots | Control indexing per page (noindex, nofollow) |
| Canonical URLs | Consolidate duplicate content, handle URL parameters |
| Redirects | Preserve SEO when moving or deleting pages |
| llms.txt | Guide AI tools to your documentation (via nuxt-llms) |
| X-Robots-Tag | Control non-HTML files like PDFs and images (see the sketch after this table) |
| Firewall | Block malicious bots at the network level |
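
Meta robots tags only work on HTML pages; for PDFs, images, and other static files the equivalent is the X-Robots-Tag response header. A minimal sketch using Nitro route rules, assuming a hypothetical /downloads/ path for the files you want kept out of the index:

```ts
// nuxt.config.ts
export default defineNuxtConfig({
  routeRules: {
    // Attach an X-Robots-Tag header to everything under /downloads/
    // so crawlers can fetch the files but won't index them.
    '/downloads/**': {
      headers: { 'X-Robots-Tag': 'noindex' }
    }
  }
})
```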

Quick Recipes

Block a page from indexing (see the full guide)

pages/admin.vue
<script setup>
useSeoMeta({ robots: 'noindex, follow' })
</script>

Block AI training bots (see the full guide)

public/robots.txt
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

Fix duplicate content (see the full guide)

pages/products/[id].vue
<script setup>
const route = useRoute()
useHead({
  link: [{ rel: 'canonical', href: `https://mysite.com/products/${route.params.id}` }]
})
</script>
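
The recipe above hard-codes the origin; you can also derive the canonical from the incoming request so query-string variants such as ?sort=price collapse into one URL. A sketch using Nuxt's built-in useRequestURL() composable (in production you may prefer your configured site URL over the request origin):

pages/products/[id].vue
```vue
<script setup>
// Build the canonical from the request URL, dropping the query string
// so /products/1?sort=price and /products/1 share one canonical.
const url = useRequestURL()
useHead({
  link: [{ rel: 'canonical', href: `${url.origin}${url.pathname}` }]
})
</script>
```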

Redirect a moved page (see the full guide)

nuxt.config.ts
export default defineNuxtConfig({
  routeRules: {
    '/old-url': { redirect: { to: '/new-url', statusCode: 301 } }
  }
})
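
When a whole section moves, you don't need to list every URL: Nitro route rules accept wildcards. A hedged sketch assuming a hypothetical move from /blog to /articles (check wildcard redirect behaviour against the Nitro docs for your version):

```ts
// nuxt.config.ts
export default defineNuxtConfig({
  routeRules: {
    // Redirect every path under /blog/ to the matching path under /articles/.
    '/blog/**': { redirect: { to: '/articles/**', statusCode: 301 } }
  }
})
```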

When Crawler Control Matters

Most small sites don't need to optimize crawler behavior. But it matters when:

Crawl budget concerns — Sites with 10,000+ pages need Google to prioritize important content. Block low-value pages (search results, filtered products, admin areas) so crawlers focus on what matters; a sketch follows this list.

Duplicate content — URLs like /about and /about/ compete against each other. Same with ?sort=price variations. Canonical tags consolidate these.

Staging environments — Search engines index any public site they find. Block staging/dev environments in robots.txt to avoid duplicate content issues.

AI training opt-out — GPTBot was the most-blocked crawler in 2024. Block AI training bots without affecting search rankings.

Server costs — Bots consume CPU. Heavy pages (maps, infinite scroll, SSR) cost money per request. Blocking unnecessary crawlers reduces load.
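
The crawl budget and staging points above can be handled in one place by serving robots.txt from a Nitro server route instead of a static file. This is a sketch, not the Nuxt SEO modules' approach: the /search and /admin paths and the SITE_ENV variable are placeholders, and you would delete public/robots.txt so the static file doesn't take precedence.

server/routes/robots.txt.ts
```ts
// Serves robots.txt dynamically. SITE_ENV is a hypothetical
// environment variable you set per deployment.
export default defineEventHandler((event) => {
  setHeader(event, 'Content-Type', 'text/plain')

  // Outside production (staging, previews), block every crawler.
  if (process.env.SITE_ENV !== 'production') {
    return 'User-agent: *\nDisallow: /'
  }

  // In production, keep crawlers away from low-value pages
  // so crawl budget goes to content that should rank.
  return [
    'User-agent: *',
    'Disallow: /search',
    'Disallow: /admin',
    'Disallow: /*?sort=',
  ].join('\n')
})
```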

Nuxt SEO Modules

Nuxt SEO handles crawler control through dedicated modules. Install once, configure in nuxt.config.ts, and forget about it.
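
As a starting point, a minimal sketch registering the robots and sitemap modules from the Nuxt SEO collection; the example.com URL is a placeholder, and the site block is the shared site config (from nuxt-site-config) that these modules read:

```ts
// nuxt.config.ts
export default defineNuxtConfig({
  modules: [
    '@nuxtjs/robots',  // robots.txt generation and per-route indexing rules
    '@nuxtjs/sitemap', // sitemap.xml generated from your routes
  ],
  // Shared site config the SEO modules read from.
  site: {
    url: 'https://example.com',
    name: 'Example',
  },
})
```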