Robot.txt Recipes
Introduction
As a minimum the only recommended configuration for robots is to disable indexing for non-production environments.
Many sites will never need to configure their robots.txt
or robots
meta tag beyond this, as the controlling web crawlers
is an advanced use case and topic.
However, if you're looking to get the best SEO and performance results, you may consider some of the recipes on this page for your site.
Robots.txt recipes
Blocking Bad Bots
If you're finding your site is getting hit with a lot of bots, you may consider enabling the blockNonSeoBots
option.
export default defineNuxtConfig({
robots: {
blockNonSeoBots: true
}
})
This will block mostly web scrapers, the full list is: Nuclei
, WikiDo
, Riddler
, PetalBot
, Zoominfobot
, Go-http-client
, Node/simplecrawler
, CazoodleBot
, dotbot/1.0
, Gigabot
, Barkrowler
, BLEXBot
, magpie-crawler
.
Blocking AI Crawlers
AI crawlers can be beneficial as they can help users finding your site, but for some educational sites or those not
interested in being indexed by AI crawlers, you can block them using the blockAIBots
option.
export default defineNuxtConfig({
robots: {
blockAiBots: true
}
})
This will block the following AI crawlers: GPTBot
, ChatGPT-User
, Claude-Web
, anthropic-ai
, Applebot-Extended
, Bytespider
, CCBot
, cohere-ai
, Diffbot
, FacebookBot
, Google-Extended
, ImagesiftBot
, PerplexityBot
, OmigiliBot
, Omigili
Blocking Privileged Pages
If you have pages that require authentication or are only available to certain users, you should block these from being indexed.
User-agent: *
Disallow: /admin
Disallow: /dashboard
See Config using Robots.txt for more information.
Whitelisting Open Graph Tags
If you have certain pages that you don't want indexed but you still want their Open Graph Tags to be crawled, you can target the specific user-agents.
# Block search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /user-profiles
# Allow social crawlers
User-agent: facebookexternalhit
User-agent: Twitterbot
Disallow: /user-profiles
See Config using Robots.txt for more information.
Blocking Search Results
You may consider blocking search results from being indexed, as they can be seen as duplicate content and can be a poor user experience.
User-agent: *
# block search results
Disallow: /*?query=
# block pagination
Disallow: /*?page=
# block sorting
Disallow: /*?sort=
# block filtering
Disallow: /*?filter=