Protecting Vue Apps from Malicious Crawlers · Nuxt SEO

# Protecting Vue Apps from Malicious Crawlers

Robots.txt is a polite suggestion. Malicious crawlers ignore it. Here's how to actually protect your Vue app.

[Harlan Wilton](https://x.com/harlan-zw) · 8 min read · Published Nov 3, 2024 · Updated Dec 5, 2024

Robots.txt rules and meta robots tags are polite suggestions. Malicious crawlers ignore them.

You need actual security: block non-production environments, protect development assets, rate limit aggressive crawlers, authenticate sensitive routes, use HTTPS everywhere. Don't rely on robots.txt for sensitive data, IP blocking alone (easily bypassed), or user-agent detection (trivial to fake).

## [Quick Setup](#quick-setup)

Protect your Vue app from unwanted crawlers at the server level:

Express Middleware

```
// server/middleware/security.js
import express from 'express'

const app = express()

// Block non-production environments
app.use((req, res, next) => {
  if (process.env.NODE_ENV !== 'production') {
    res.setHeader('X-Robots-Tag', 'noindex, nofollow')
  }
  next()
})

// Enforce HTTPS
app.use((req, res, next) => {
  if (req.headers['x-forwarded-proto'] !== 'https') {
    return res.redirect(301, `https://${req.headers.host}${req.url}`)
  }
  next()
})
```

Security Headers

```
// Add security headers to your server
import helmet from 'helmet'

app.use(helmet({
  frameguard: { action: 'deny' },
  contentSecurityPolicy: {
    directives: {
      defaultSrc: ['\'self\''],
      styleSrc: ['\'self\'', '\'unsafe-inline\''],
      scriptSrc: ['\'self\'']
    }
  },
  referrerPolicy: { policy: 'strict-origin-when-cross-origin' }
}))
```

Rate Limiting

```
import rateLimit from 'express-rate-limit'

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
})

app.use('/api', limiter)
```

## [Environment Protection](#environment-protection)

### [Development & Staging](#development-staging)

Always block search engines in non-production environments:

```
// middleware/block-non-production.js
app.use((req, res, next) => {
  const isProd = process.env.NODE_ENV === 'production'
  const isMainDomain = req.headers.host === 'mysite.com'

  if (!isProd || !isMainDomain) {
    res.setHeader('X-Robots-Tag', 'noindex, nofollow')

    // Also consider basic auth for staging. Note: this only checks that
    // credentials were sent; validate them against your staging users too.
    const auth = req.headers.authorization

    if (!auth) {
      res.setHeader('WWW-Authenticate', 'Basic')
      return res.status(401).send('Authentication required')
    }
  }
  next()
})
```
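The staging middleware above only checks that an Authorization header exists, so any credentials would pass. A minimal sketch of actually validating them (`STAGING_USER` and `STAGING_PASS` are assumed names for your own environment variables, not part of the article's setup):

```js
// Hypothetical helper: decode "Basic <base64>" and compare against
// known credentials. Only the part after the first ':' is the password.
function checkBasicAuth(header, user, pass) {
  if (!header?.startsWith('Basic '))
    return false
  const decoded = Buffer.from(header.slice(6), 'base64').toString('utf8')
  const idx = decoded.indexOf(':')
  if (idx === -1)
    return false
  return decoded.slice(0, idx) === user && decoded.slice(idx + 1) === pass
}
```

In the middleware, the `if (!auth)` branch would then become `if (!auth || !checkBasicAuth(auth, process.env.STAGING_USER, process.env.STAGING_PASS))`.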

### [Sensitive Routes](#sensitive-routes)

Protect admin and user areas:

```
// middleware/protect-routes.js
app.use((req, res, next) => {
  const protectedPaths = ['/admin', '/dashboard', '/user']

  if (protectedPaths.some(path => req.path.startsWith(path))) {
    // Ensure user is authenticated
    if (!req.session?.user) {
      return res.redirect('/login')
    }

    // Block indexing of protected content
    res.setHeader('X-Robots-Tag', 'noindex, nofollow')
  }
  next()
})
```

## [Crawler Identification](#crawler-identification)

### [Good vs Bad Crawlers](#good-vs-bad-crawlers)

Identify legitimate crawlers through:

- Reverse DNS lookup
- IP verification
- Behavior patterns
- Request rate

```
// utils/verify-crawler.js
import dns from 'node:dns'
import { promisify } from 'node:util'

const reverse = promisify(dns.reverse)

export async function isLegitCrawler(ip, userAgent) {
  // Example: verify Googlebot via reverse DNS. Note the leading dot:
  // a plain endsWith('googlebot.com') would also match 'evilgooglebot.com'
  if (userAgent.includes('Googlebot')) {
    const hostnames = await reverse(ip)
    return hostnames.some(h => h.endsWith('.googlebot.com') || h.endsWith('.google.com'))
  }
  return false
}
```
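Reverse DNS alone can be spoofed with a fake PTR record. Google documents a two-step verification: reverse-resolve the IP, check the hostname suffix, then forward-resolve the hostname and confirm it maps back to the same IP. A sketch using Node's promise-based `dns` API:

```js
import dns from 'node:dns/promises'

// Pure suffix check, separated out so it can be tested without network access
function isGoogleHostname(host) {
  return host.endsWith('.googlebot.com') || host.endsWith('.google.com')
}

// Two-step verification: reverse lookup, then forward confirmation,
// so a faked PTR record on its own is not enough to pass
async function verifyGooglebot(ip) {
  const hostnames = await dns.reverse(ip).catch(() => [])
  for (const host of hostnames) {
    if (isGoogleHostname(host)) {
      const addresses = await dns.resolve4(host).catch(() => [])
      if (addresses.includes(ip))
        return true
    }
  }
  return false
}
```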

### [Rate Limiting](#rate-limiting)

Implement tiered rate limiting:

```
import rateLimit from 'express-rate-limit'

// Different limits for different paths
const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100
})

const crawlerLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 10,
  skip: req => !req.headers['user-agent']?.includes('bot')
})

app.use('/api', apiLimiter)
app.use(crawlerLimiter)
```

## [Infrastructure Security](#infrastructure-security)

### [HTTPS Enforcement](#https-enforcement)

Always redirect HTTP to HTTPS:

```
app.use((req, res, next) => {
  const proto = req.headers['x-forwarded-proto']

  if (proto === 'http') {
    return res.redirect(301, `https://${req.headers.host}${req.url}`)
  }
  next()
})
```

### [Security Headers](#security-headers)

Add security headers using helmet:

```
import helmet from 'helmet'

app.use(helmet({
  // Prevent clickjacking
  frameguard: { action: 'deny' },
  // Prevent MIME type sniffing
  noSniff: true,
  // Control referrer information
  referrerPolicy: { policy: 'strict-origin-when-cross-origin' },
  // Enable strict CSP in production
  contentSecurityPolicy: process.env.NODE_ENV === 'production'
    ? {
        directives: {
          defaultSrc: ['\'self\'']
        }
      }
    : false
}))
```

## [Monitoring & Detection](#monitoring-detection)

### [Logging Suspicious Activity](#logging-suspicious-activity)

```
// middleware/crawler-monitor.js
app.use((req, res, next) => {
  const ua = req.headers['user-agent']
  const ip = req.ip

  // Log suspicious patterns (isSuspiciousPattern is your own heuristic,
  // e.g. missing user-agent or known scraper strings)
  if (isSuspiciousPattern(ua, ip)) {
    console.warn(`Suspicious crawler: ${ip} with UA: ${ua}`)
    // Consider blocking or rate limiting
  }
  next()
})
```
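`isSuspiciousPattern` is left undefined above. A hypothetical implementation might flag a missing user-agent or common scraping-tool strings; these heuristics are illustrative, not exhaustive, and will produce false positives on their own:

```js
// Common scraping-tool markers; tune this list for your own traffic
const SCRAPER_STRINGS = ['curl', 'wget', 'python-requests', 'scrapy', 'headless']

// ip is unused in this sketch but kept for parity with the middleware call
function isSuspiciousPattern(ua, ip) {
  if (!ua)
    return true // real browsers always send a user-agent
  const lower = ua.toLowerCase()
  return SCRAPER_STRINGS.some(s => lower.includes(s))
}
```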

### [Using Web Application Firewalls](#using-web-application-firewalls)

Services like [Cloudflare](https://cloudflare.com) or AWS WAF can:

- Block malicious IPs
- Prevent DDoS attacks
- Filter suspicious requests
- Monitor traffic patterns

**Opinion:** If you're running a small blog, a WAF is overkill. Add it when you're getting attacked.

## [Common Attacks](#common-attacks)

### [Content Scraping](#content-scraping)

Prevent automated content theft:

```
// Naive in-memory counter: reset it periodically (or use a shared store
// like Redis) so the Map doesn't grow unbounded and IPs aren't blocked forever
const requestCounts = new Map()

app.use((req, res, next) => {
  const ip = req.ip
  const count = requestCounts.get(ip) || 0

  if (count > 100) {
    return res.status(429).send('Too Many Requests')
  }

  requestCounts.set(ip, count + 1)

  // Add slight delays to automated requests (isBot is your own UA heuristic)
  if (isBot(req.headers['user-agent'])) {
    setTimeout(next, 500)
  }
  else {
    next()
  }
})
```
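One caveat with the snippet above: the Map grows without bound and never forgets an IP. A minimal fix is a self-clearing windowed counter (`makeWindowedCounter` is a hypothetical helper, not part of the article's middleware):

```js
// Returns a counter whose per-IP tallies reset every windowMs
function makeWindowedCounter(windowMs = 15 * 60 * 1000) {
  const counts = new Map()
  const timer = setInterval(() => counts.clear(), windowMs)
  timer.unref?.() // don't keep the process alive just for this timer
  return {
    hit(ip) {
      const next = (counts.get(ip) ?? 0) + 1
      counts.set(ip, next)
      return next
    }
  }
}
```

In the middleware you would create one counter at module scope and replace the manual get/set with `if (counter.hit(req.ip) > 100) return res.status(429).send('Too Many Requests')`.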

### [Form Spam](#form-spam)

Protect forms from bot submissions:

```
// routes/contact.js
app.post('/api/contact', async (req, res) => {
  const { website, ...formData } = req.body

  // Honeypot check: 'website' is a hidden field that real users never fill in
  if (website) {
    return res.json({ success: false })
  }

  // Rate limiting
  if (exceedsRateLimit(req.ip)) {
    return res.status(429).json({
      error: 'Too many attempts'
    })
  }

  // Process legitimate submission
  // ...
})
```

## [Using Nuxt?](#using-nuxt)

If you're using Nuxt, much of this is easier to wire up: the Express middleware above maps to Nitro server middleware, and the Nuxt SEO modules handle indexing rules (such as sending `X-Robots-Tag: noindex` outside production) for you.

---

Copyright © 2023-2026 Harlan Wilton - [MIT License](https://github.com/harlan-zw/nuxt-seo/blob/main/license) · [mdream](https://mdream.dev)