Protecting Vue Apps from Malicious Crawlers

Learn how to protect your Vue application from malicious crawlers and bots.

Introduction

Robots.txt and meta robots are polite suggestions. Malicious crawlers ignore them. You need actual security to protect sensitive content.

✅ Good Security Practices:

  • Block non-production environments
  • Protect development assets
  • Rate limit aggressive crawlers
  • Authenticate sensitive routes
  • Monitor crawler behavior
  • Use HTTPS everywhere

❌ Don't Rely On:

  • robots.txt for sensitive data (it publicly advertises the paths you want hidden)
  • IP blocking alone (easily bypassed)
  • User-agent detection (trivial to fake)
  • Client-side only protection (not real security)
  • Security through obscurity

Quick Setup

Protect your Vue/Nuxt app from unwanted crawlers:

// server/middleware/security.ts
export default defineEventHandler((event) => {
  // Block indexing of non-production environments
  if (process.env.NODE_ENV !== 'production') {
    setHeader(event, 'X-Robots-Tag', 'noindex, nofollow')
  }

  // Enforce HTTPS. The host never contains the scheme, so check the
  // x-forwarded-proto header set by your proxy or load balancer
  if (event.headers.get('x-forwarded-proto') === 'http') {
    return sendRedirect(event, `https://${getRequestHost(event)}${event.path}`, 301)
  }
})

Environment Protection

Development & Staging

Always block search engines in non-production environments:

// server/middleware/block-non-production.ts
export default defineEventHandler((event) => {
  const isProd = process.env.NODE_ENV === 'production'
  const isMainDomain = getRequestHost(event) === 'mysite.com'

  if (!isProd || !isMainDomain) {
    setHeader(event, 'X-Robots-Tag', 'noindex, nofollow')

    // Also gate staging behind basic auth. This only checks that the
    // header is present; validate the credentials too (see below)
    if (!event.headers.get('authorization')) {
      setResponseStatus(event, 401)
      setHeader(event, 'WWW-Authenticate', 'Basic realm="Staging"')
      return 'Authentication required'
    }
  }
})
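
Validating the credentials could look like this (a sketch; STAGING_USER and STAGING_PASS are hypothetical environment variables, and the helper is assumed to be called from the middleware above):

// server/utils/check-basic-auth.ts
export function checkBasicAuth(header: string | null) {
  if (!header?.startsWith('Basic '))
    return false
  // Decode "Basic base64(user:pass)"
  const [user, pass] = Buffer.from(header.slice(6), 'base64').toString().split(':')
  // STAGING_USER / STAGING_PASS are assumed env vars for this sketch
  return user === process.env.STAGING_USER && pass === process.env.STAGING_PASS
}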

Sensitive Routes

Protect admin and user areas:

// server/middleware/protect-routes.ts
export default defineEventHandler((event) => {
  const protectedPaths = ['/admin', '/dashboard', '/user']

  if (protectedPaths.some(path => event.path.startsWith(path))) {
    // Block indexing of protected content (set before the auth check so
    // even the redirect response carries the header)
    setHeader(event, 'X-Robots-Tag', 'noindex, nofollow')

    // Ensure the user is authenticated
    if (!event.context.auth?.user) {
      return sendRedirect(event, '/login')
    }
  }
})
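
For pages behind these routes you can also set meta robots at the page level (a sketch using Nuxt's useSeoMeta composable; it complements, but doesn't replace, the server-side header):

<script setup lang="ts">
// pages/dashboard.vue — belt and braces alongside the X-Robots-Tag header
useSeoMeta({ robots: 'noindex, nofollow' })
</script>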

Crawler Identification

Good vs Bad Crawlers

Identify legitimate crawlers through:

  • Reverse DNS lookup
  • IP verification
  • Behavior patterns
  • Request rate

For example, Googlebot can be verified with a reverse DNS lookup plus a forward confirmation, as Google recommends:

// server/utils/verify-crawler.ts
import { promises as dns } from 'node:dns'

export async function isLegitCrawler(ip: string, userAgent: string) {
  if (!userAgent.includes('Googlebot'))
    return false
  try {
    // Reverse DNS: the IP must resolve to a Google-owned hostname
    const [hostname] = await dns.reverse(ip)
    if (!/\.(googlebot|google)\.com$/.test(hostname))
      return false
    // Forward confirmation: the hostname must resolve back to the same IP
    return (await dns.resolve(hostname)).includes(ip)
  }
  catch {
    return false
  }
}
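
A middleware can then reject impostors that claim to be Googlebot but fail verification (a sketch; assumes Nitro auto-imports the utility above from server/utils):

// server/middleware/block-fake-bots.ts
export default defineEventHandler(async (event) => {
  const ua = event.headers.get('user-agent') ?? ''
  const ip = getRequestIP(event)

  // Anyone claiming to be Googlebot must pass DNS verification
  if (ua.includes('Googlebot') && ip && !(await isLegitCrawler(ip, ua))) {
    setResponseStatus(event, 403)
    return 'Forbidden'
  }
})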

Rate Limiting

Implement tiered rate limiting. Packages such as express-rate-limit target Express's middleware signature, so they can't be dropped straight into an h3 event handler. The sketch below implements tiered limits in memory; back it with Redis or similar when running multiple instances:

// server/middleware/rate-limit.ts
// In-memory counters: stale entries are overwritten on the next hit,
// but use a shared store with TTLs in production
const hits = new Map<string, { count: number, reset: number }>()

function withinLimit(key: string, max: number, windowMs: number) {
  const now = Date.now()
  const entry = hits.get(key)
  if (!entry || entry.reset < now) {
    hits.set(key, { count: 1, reset: now + windowMs })
    return true
  }
  return ++entry.count <= max
}

export default defineEventHandler((event) => {
  const ip = getRequestIP(event) ?? 'unknown'
  const ua = event.headers.get('user-agent') ?? ''

  // 100 requests / 15 min on API routes; 10 requests / min for
  // self-identified bots everywhere else
  const allowed = event.path.startsWith('/api')
    ? withinLimit(`api:${ip}`, 100, 15 * 60 * 1000)
    : /bot|crawler|spider/i.test(ua)
        ? withinLimit(`bot:${ip}`, 10, 60 * 1000)
        : true

  if (!allowed) {
    setResponseStatus(event, 429)
    setHeader(event, 'Retry-After', '60')
    return 'Too Many Requests'
  }
})

Infrastructure Security

HTTPS Enforcement

Always redirect HTTP to HTTPS:

// server/middleware/https-redirect.ts
export default defineEventHandler((event) => {
  const proto = event.headers.get('x-forwarded-proto')

  if (proto === 'http') {
    return sendRedirect(
      event,
      `https://${getRequestHost(event)}${event.path}`,
      301
    )
  }
})

Security Headers

Add security headers:

// nuxt.config.ts
export default defineNuxtConfig({
  nitro: {
    routeRules: {
      '/**': {
        headers: {
          // Prevent clickjacking
          'X-Frame-Options': 'DENY',
          // Prevent MIME type sniffing
          'X-Content-Type-Options': 'nosniff',
          // Control referrer information
          'Referrer-Policy': 'strict-origin-when-cross-origin',
          // Enable strict CSP in production
          ...(process.env.NODE_ENV === 'production'
            ? {
                'Content-Security-Policy': 'default-src \'self\';'
              }
            : {})
        }
      }
    }
  }
})
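
To make the HTTPS redirect stick, you can also send an HSTS header alongside the ones above (a sketch; browsers cache this header aggressively, so confirm HTTPS works everywhere before shipping it):

// nuxt.config.ts — merged into the routeRules headers shown above
export default defineNuxtConfig({
  nitro: {
    routeRules: {
      '/**': {
        headers: {
          // Browsers will refuse plain HTTP for max-age seconds
          'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'
        }
      }
    }
  }
})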

Monitoring & Detection

Logging Suspicious Activity

// server/middleware/crawler-monitor.ts
// Naive heuristic: flag missing user agents and common scraping libraries
function isSuspiciousPattern(ua: string | null) {
  return !ua || /curl|wget|python-requests|scrapy|go-http-client/i.test(ua)
}

export default defineEventHandler((event) => {
  const ua = event.headers.get('user-agent')
  const ip = getRequestIP(event)

  // Log suspicious patterns for later review
  if (isSuspiciousPattern(ua)) {
    console.warn(`Suspicious crawler: ${ip} with UA: ${ua}`)
    // Feed repeat offenders into your rate limiter or WAF rules
  }
})

Using Web Application Firewalls

Services like Cloudflare or AWS WAF can:

  • Block malicious IPs
  • Prevent DDoS attacks
  • Filter suspicious requests
  • Monitor traffic patterns

Opinion: If you're running a small blog, a WAF is overkill. Add it when you're actually getting attacked.

Common Attacks

Content Scraping

Prevent automated content theft:

// server/middleware/anti-scraping.ts
export default defineEventHandler(async (event) => {
  // getRequestCount is app-specific: however you track requests per IP
  const requests = getRequestCount(getRequestIP(event))

  if (requests > 100) {
    setResponseStatus(event, 429)
    return 'Too Many Requests'
  }

  // Slow down self-identified bots to make bulk scraping expensive
  // (the handler is async so it can await the delay)
  const ua = event.headers.get('user-agent') ?? ''
  if (/bot|crawler|spider/i.test(ua)) {
    await new Promise(resolve => setTimeout(resolve, 500))
  }
})

Form Spam

Protect forms from bot submissions:

// server/api/contact.post.ts
export default defineEventHandler(async (event) => {
  const body = await readBody(event)

  // Honeypot check: "website" is a field hidden from humans with CSS,
  // so any value means a bot filled the form
  if (body.website) {
    return { success: false }
  }

  // Rate limiting (exceedsRateLimit is app-specific)
  if (exceedsRateLimit(getRequestIP(event))) {
    throw createError({
      statusCode: 429,
      message: 'Too many attempts'
    })
  }

  // Process the legitimate submission
})
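
On the client, the honeypot is just a visually hidden field humans never fill in (a sketch; the field name must match the server-side check above):

<!-- components/ContactForm.vue -->
<template>
  <form @submit.prevent="submit">
    <input v-model="email" type="email" name="email" required>
    <!-- Honeypot: hidden from humans via CSS, bots tend to auto-fill it -->
    <input v-model="website" name="website" tabindex="-1" autocomplete="off" class="hp" aria-hidden="true">
    <button type="submit">Send</button>
  </form>
</template>

<script setup lang="ts">
const email = ref('')
const website = ref('')

async function submit() {
  await $fetch('/api/contact', {
    method: 'POST',
    body: { email: email.value, website: website.value }
  })
}
</script>

<style scoped>
.hp {
  position: absolute;
  left: -9999px;
}
</style>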
