Robots.txt and meta robots tags are polite suggestions. Malicious crawlers ignore them. You need actual security to protect sensitive content.
✅ Good Security Practices:
- Server-level blocking with the X-Robots-Tag header
- Authentication for staging environments and protected routes
- Rate limiting, HTTPS enforcement, and security headers
- Verifying crawlers that claim to be search engine bots

❌ Don't Rely On:
- robots.txt or meta robots tags to keep content private
- User-agent strings alone to identify legitimate crawlers
Protect your Vue app from unwanted crawlers at the server level:
```js
// server/middleware/security.js
import express from 'express'
import helmet from 'helmet'
import rateLimit from 'express-rate-limit'

const app = express()

// Block indexing in non-production environments
app.use((req, res, next) => {
  if (process.env.NODE_ENV !== 'production') {
    res.setHeader('X-Robots-Tag', 'noindex, nofollow')
  }
  next()
})

// Enforce HTTPS
app.use((req, res, next) => {
  if (req.headers['x-forwarded-proto'] !== 'https') {
    return res.redirect(301, `https://${req.headers.host}${req.url}`)
  }
  next()
})

// Add security headers to your server
app.use(helmet({
  frameguard: { action: 'deny' },
  contentSecurityPolicy: {
    directives: {
      defaultSrc: ["'self'"],
      styleSrc: ["'self'", "'unsafe-inline'"],
      scriptSrc: ["'self'"]
    }
  },
  referrerPolicy: { policy: 'strict-origin-when-cross-origin' }
}))

// Rate limit the API
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
})
app.use('/api', limiter)
```
Always block search engines in non-production environments:
```js
// middleware/block-non-production.js
app.use((req, res, next) => {
  const isProd = process.env.NODE_ENV === 'production'
  const isMainDomain = req.headers.host === 'mysite.com'

  if (!isProd || !isMainDomain) {
    res.setHeader('X-Robots-Tag', 'noindex, nofollow')

    // Also consider basic auth for staging
    // (note: validate the credentials as well; checking only that the header
    // exists lets any request through once it sends an Authorization header)
    const auth = req.headers.authorization
    if (!auth) {
      res.setHeader('WWW-Authenticate', 'Basic')
      return res.status(401).send('Authentication required')
    }
  }
  next()
})
```
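If you gate staging behind basic auth, verify the credentials themselves rather than only checking that the header exists. A minimal sketch, assuming hypothetical STAGING_USER and STAGING_PASS environment variables:

```js
// Hypothetical helper: decode "Authorization: Basic <base64>" and compare
// against credentials supplied via environment variables
function isValidStagingAuth(authHeader) {
  const encoded = (authHeader || '').split(' ')[1] || ''
  const [user, pass] = Buffer.from(encoded, 'base64').toString().split(':')
  return user === process.env.STAGING_USER && pass === process.env.STAGING_PASS
}
```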
Protect admin and user areas:
```js
// middleware/protect-routes.js
app.use((req, res, next) => {
  const protectedPaths = ['/admin', '/dashboard', '/user']

  if (protectedPaths.some(path => req.path.startsWith(path))) {
    // Ensure the user is authenticated
    // (assumes a session middleware such as express-session populates req.session)
    if (!req.session?.user) {
      return res.redirect('/login')
    }
    // Block indexing of protected content
    res.setHeader('X-Robots-Tag', 'noindex, nofollow')
  }
  next()
})
```
Identify legitimate crawlers by verifying the requesting IP with a reverse DNS lookup:
```js
// utils/verify-crawler.js
import dns from 'dns'
import { promisify } from 'util'

const reverse = promisify(dns.reverse)

export async function isLegitCrawler(ip, userAgent) {
  // Example: verify Googlebot. Genuine Googlebot IPs reverse-resolve to
  // *.googlebot.com or *.google.com; Google also recommends a forward
  // lookup to confirm the hostname maps back to the same IP.
  if (userAgent.includes('Googlebot')) {
    try {
      const hostnames = await reverse(ip)
      return hostnames.some(
        h => h.endsWith('.googlebot.com') || h.endsWith('.google.com')
      )
    } catch {
      return false // reverse lookup failed, treat as unverified
    }
  }
  return false
}
```
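A sketch of how this utility might be wired into the Express app to reject spoofed Googlebot requests (the import path and the 403 response are assumptions):

```js
// middleware/reject-spoofed-crawlers.js (sketch)
import { isLegitCrawler } from '../utils/verify-crawler.js'

app.use(async (req, res, next) => {
  const ua = req.headers['user-agent'] || ''
  // Only challenge requests that claim to be Googlebot
  if (ua.includes('Googlebot') && !(await isLegitCrawler(req.ip, ua))) {
    return res.status(403).send('Forbidden')
  }
  next()
})
```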
Implement tiered rate limiting:
```js
import rateLimit from 'express-rate-limit'

// Different limits for different paths
const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100
})

// Stricter limit that only applies when the user agent contains "bot";
// all other requests are skipped
const crawlerLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 10,
  skip: req => !req.headers['user-agent']?.includes('bot')
})

app.use('/api', apiLimiter)
app.use(crawlerLimiter)
```
Always redirect HTTP to HTTPS:
```js
app.use((req, res, next) => {
  const proto = req.headers['x-forwarded-proto']
  if (proto === 'http') {
    return res.redirect(301, `https://${req.headers.host}${req.url}`)
  }
  next()
})
```
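The x-forwarded-proto checks read a header set by your reverse proxy, and req.ip (used by the rate limiters and the monitoring below) only reflects the real client address if Express is told to trust that proxy. A one-line sketch, assuming a single proxy hop:

```js
// Trust the first proxy (e.g. nginx or a load balancer) so that req.ip and
// forwarded headers reflect the original client request
app.set('trust proxy', 1)
```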
Add security headers using helmet:
```js
import helmet from 'helmet'

app.use(helmet({
  // Prevent clickjacking
  frameguard: { action: 'deny' },
  // Prevent MIME type sniffing
  noSniff: true,
  // Control referrer information
  referrerPolicy: { policy: 'strict-origin-when-cross-origin' },
  // Enable strict CSP in production only
  contentSecurityPolicy: process.env.NODE_ENV === 'production' ? {
    directives: {
      defaultSrc: ["'self'"]
    }
  } : false
}))
```
Monitor crawler traffic so you can spot abusive patterns early:

```js
// middleware/crawler-monitor.js
app.use((req, res, next) => {
  const ua = req.headers['user-agent']
  const ip = req.ip

  // Log suspicious patterns (isSuspiciousPattern is your own heuristic)
  if (isSuspiciousPattern(ua, ip)) {
    console.warn(`Suspicious crawler: ${ip} with UA: ${ua}`)
    // Consider blocking or rate limiting
  }
  next()
})
```
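isSuspiciousPattern is left undefined above; here is a minimal heuristic sketch (the flagged user-agent substrings are illustrative assumptions, not a definitive blocklist):

```js
// Hypothetical heuristic: flag requests with no user agent at all or with
// user agents commonly sent by scraping scripts
const SCRAPER_UA_HINTS = ['python-requests', 'scrapy', 'curl', 'wget'] // illustrative

function isSuspiciousPattern(ua, ip) {
  if (!ua) return true // real browsers always send a user agent
  const lower = ua.toLowerCase()
  // ip is unused here but could feed an allow/deny list
  return SCRAPER_UA_HINTS.some(hint => lower.includes(hint))
}
```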
Services like Cloudflare or AWS WAF can block known malicious bots, apply rate limits and IP reputation rules at the edge, and absorb DDoS traffic before it ever reaches your server.
Opinion: If you're running a small blog, a WAF is overkill. Add it when you're actually getting attacked.
Prevent automated content theft:
```js
const requestCounts = new Map()

app.use((req, res, next) => {
  const ip = req.ip
  const count = requestCounts.get(ip) || 0

  if (count > 100) {
    return res.status(429).send('Too Many Requests')
  }
  requestCounts.set(ip, count + 1)

  // Add a slight delay to automated requests (isBot is your own UA check)
  if (isBot(req.headers['user-agent'])) {
    setTimeout(next, 500)
  } else {
    next()
  }
})
```
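Note that the in-memory Map above is never cleared, so the counts (and memory use) grow until the process restarts. One simple fix, as a sketch, is to reset it on an interval (the 15 minute window is an assumption):

```js
// Periodically clear the per-IP counters so they act as a rolling window
// and the Map does not grow without bound
setInterval(() => requestCounts.clear(), 15 * 60 * 1000)
```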
Protect forms from bot submissions:
```js
// routes/contact.js
app.post('/api/contact', async (req, res) => {
  const { website, ...formData } = req.body

  // Honeypot check: "website" is a hidden field real users never fill in
  if (website) {
    return res.json({ success: false })
  }

  // Rate limiting (exceedsRateLimit is your own helper)
  if (exceedsRateLimit(req.ip)) {
    return res.status(429).json({
      error: 'Too many attempts'
    })
  }

  // Process legitimate submission
  // ...
})
```
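exceedsRateLimit is referenced but not defined; a minimal in-memory sketch (the limit of 5 attempts per minute and the contactAttempts name are illustrative assumptions):

```js
// Hypothetical helper: track submission timestamps per IP and report
// whether that IP has exceeded the allowed attempts within the window
const contactAttempts = new Map()

function exceedsRateLimit(ip, max = 5, windowMs = 60 * 1000) {
  const now = Date.now()
  const recent = (contactAttempts.get(ip) || []).filter(t => now - t < windowMs)
  recent.push(now)
  contactAttempts.set(ip, recent)
  return recent.length > max
}
```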
Looking for a Nuxt-specific implementation? Check out the Nuxt Security Guide for server middleware examples and Nuxt-specific configurations.