At a minimum, the only recommended robots configuration is to disable indexing for non-production environments. Many sites will never need to configure their robots.txt or robots meta tag beyond this, as controlling web crawlers is an advanced topic.

However, if you're looking for the best SEO and performance results, you may consider some of the recipes on this page for your site.
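As a minimal sketch of the environment toggle, assuming your project uses the `site.indexable` option from Nuxt Site Config (bundled with the robots module) and that a `NUXT_ENV` variable is set by your deployment pipeline (both are assumptions about your setup):

```ts
export default defineNuxtConfig({
  site: {
    // assumption: NUXT_ENV is provided by your deployment pipeline;
    // when indexable is false, the site is served with robots disabled
    indexable: process.env.NUXT_ENV === 'production'
  }
})
```

In many setups this is handled automatically by the environment detection of Nuxt Site Config, so an explicit flag like this is only needed when that detection doesn't match your hosting.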
If you're finding your site is getting hit with a lot of bots, you may consider enabling the `blockNonSeoBots` option.

```ts
export default defineNuxtConfig({
  robots: {
    blockNonSeoBots: true
  }
})
```
This mostly blocks web scrapers. The full list is: Nuclei, WikiDo, Riddler, PetalBot, Zoominfobot, Go-http-client, Node/simplecrawler, CazoodleBot, dotbot/1.0, Gigabot, Barkrowler, BLEXBot, magpie-crawler.
AI crawlers can be beneficial as they can help users find your site, but some educational sites, or sites not interested in being indexed by AI crawlers, may want to block them using the `blockAiBots` option.

```ts
export default defineNuxtConfig({
  robots: {
    blockAiBots: true
  }
})
```

This will block the following AI crawlers: GPTBot, ChatGPT-User, Claude-Web, anthropic-ai, Applebot-Extended, Bytespider, CCBot, cohere-ai, Diffbot, FacebookBot, Google-Extended, ImagesiftBot, PerplexityBot, OmigiliBot, Omigili.
If you have pages that require authentication or are only available to certain users, you should block these from being indexed.

```txt
User-agent: *
Disallow: /admin
Disallow: /dashboard
```

See Config using Robots.txt for more information.
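If you'd rather keep this in your Nuxt config than in a robots.txt file, a sketch using the module's `disallow` option (assuming the paths above are the ones you want to protect):

```ts
export default defineNuxtConfig({
  robots: {
    // equivalent to Disallow rules under `User-agent: *` in robots.txt
    disallow: ['/admin', '/dashboard']
  }
})
```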
If you have certain pages that you don't want indexed, but you still want their Open Graph tags to be crawled, you can target specific user agents.

```txt
# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /user-profiles

# Allow social crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
Allow: /user-profiles
```

See Config using Robots.txt for more information.
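The same per-agent rules can be sketched in your Nuxt config using the module's `groups` option, where each group maps to a `User-agent` block (the user agents and paths here mirror the robots.txt example above):

```ts
export default defineNuxtConfig({
  robots: {
    groups: [
      {
        // Block search engines from user profiles
        userAgent: ['Googlebot', 'Bingbot'],
        disallow: ['/user-profiles']
      },
      {
        // Allow social crawlers so Open Graph tags are still fetched
        userAgent: ['facebookexternalhit', 'Twitterbot'],
        allow: ['/user-profiles']
      }
    ]
  }
})
```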
You may consider blocking search result pages from being indexed, as they can be seen as duplicate content and make for a poor user experience.

```txt
User-agent: *
# block search results
Disallow: /*?query=
# block pagination
Disallow: /*?page=
# block sorting
Disallow: /*?sort=
# block filtering
Disallow: /*?filter=
```
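As a config-based sketch of the same rules, assuming the module's `disallow` option accepts the same wildcard patterns as robots.txt (the query parameter names are examples from above; substitute your own):

```ts
export default defineNuxtConfig({
  robots: {
    disallow: [
      '/*?query=',  // search results
      '/*?page=',   // pagination
      '/*?sort=',   // sorting
      '/*?filter='  // filtering
    ]
  }
})
```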