---
title: "RAG Setup"
description: "Vectorize your site's markdown for semantic search and RAG pipelines."
canonical_url: "https://nuxtseo.com/docs/ai-ready/advanced/rag-example"
last_updated: "2026-05-06T21:34:15.865Z"
---

Nuxt AI Ready outputs clean markdown optimized for vectorizing. This guide shows how to build a RAG pipeline using `llms-full.txt`.

## Fetch Markdown Content

`llms-full.txt` contains all pages as markdown, separated by `---` dividers with frontmatter:

```ts
const RE_PAGE_SPLIT = /^---$/m

const response = await fetch('https://yoursite.com/llms-full.txt')
const content = await response.text()

// Split at the --- dividers: segments alternate frontmatter, page body
const segments = content.split(RE_PAGE_SPLIT).map(s => s.trim()).filter(Boolean)

const pages: Array<Record<string, string> & { markdown: string }> = []
for (let i = 0; i + 1 < segments.length; i += 2) {
  // Parse frontmatter `key: value` pairs
  const meta: Record<string, string> = {}
  segments[i].split('\n').forEach((line) => {
    const [key, ...val] = line.split(':')
    if (key?.trim())
      meta[key.trim()] = val.join(':').trim()
  })

  pages.push({ ...meta, markdown: segments[i + 1] })
}
```

## Generate Embeddings

Use any embedding provider. Example with [OpenAI](https://openai.com):

```ts
import OpenAI from 'openai'

// reads OPENAI_API_KEY from the environment
const openai = new OpenAI()

async function embed(text: string) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  })
  return response.data[0].embedding
}

// Embed each page (for large sites, batch requests to respect rate limits)
const vectors = await Promise.all(
  pages.map(async page => ({
    id: page.route,
    embedding: await embed(page.markdown),
    metadata: { title: page.title, route: page.route }
  }))
)
```
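For a small site you may not need a vector DB at all: a brute-force cosine-similarity scan over the embeddings above is often enough. A minimal helper (the function name is ours, not part of any SDK):

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| · |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Scoring every page against a query embedding and sorting by similarity is O(n) per query, which is fine for a few hundred pages.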

## Store in Vector DB

### sqlite-vec (Local)

```ts
import Database from 'better-sqlite3'
import * as sqliteVec from 'sqlite-vec'

const db = new Database(':memory:')
sqliteVec.load(db)

db.exec(`
  CREATE VIRTUAL TABLE pages USING vec0(
    id TEXT PRIMARY KEY,
    embedding FLOAT[1536]
  )
`)

// better-sqlite3 binds BLOBs as Buffers, so wrap the Float32Array bytes
const insert = db.prepare('INSERT INTO pages VALUES (?, ?)')
for (const v of vectors) {
  insert.run(v.id, Buffer.from(new Float32Array(v.embedding).buffer))
}
```

### Upstash Vector (Serverless)

```ts
import { Index } from '@upstash/vector'

// configure via constructor options or the UPSTASH_VECTOR_REST_URL /
// UPSTASH_VECTOR_REST_TOKEN environment variables
const index = new Index()

await index.upsert(vectors.map(v => ({
  id: v.id,
  vector: v.embedding,
  metadata: v.metadata
})))
```

## Query

```ts
async function search(query: string, topK = 5) {
  const queryEmbedding = await embed(query)

  // sqlite-vec
  const results = db.prepare(`
    SELECT id, distance
    FROM pages
    WHERE embedding MATCH ?
      AND k = ?
    ORDER BY distance
  `).all(Buffer.from(new Float32Array(queryEmbedding).buffer), topK) as { id: string, distance: number }[]

  return results
}

// Use in RAG prompt
const relevant = await search('how do I configure meta tags?')
const context = relevant.map(r => pages.find(p => p.route === r.id)?.markdown).join('\n\n')
```
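The `context` string is what grounds the model's answer. One way to fold it into a prompt (the wording, delimiters, and function name here are illustrative, not a prescribed format):

```typescript
// Build a grounded RAG prompt: retrieved markdown first, question after
function buildRagPrompt(context: string, question: string): string {
  return [
    'Answer using only the documentation below.',
    '--- documentation ---',
    context,
    '--- question ---',
    question,
  ].join('\n')
}
```

Pass the result as the user message to your chat model of choice.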

## Chunking Strategy

By default, each page is one chunk. For large pages, split by heading:

```ts
const RE_HEADING_SPLIT = /^##\s+/m

function chunkByHeading(markdown: string, route: string) {
  const sections = markdown.split(RE_HEADING_SPLIT)
  return sections.map((section, i) => ({
    id: `${route}#${i}`,
    // the split strips the `## ` marker, so restore it (the first
    // section is whatever precedes the first heading)
    content: i === 0 ? section.trim() : `## ${section.trim()}`,
    route
  }))
}
```

| Strategy | When to use |
| --- | --- |
| Page-level | Small pages (<2k tokens), general search |
| Heading-level | Long docs, precise retrieval needed |
| Sliding window | Dense technical content, overlap matters |
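The sliding-window strategy can be sketched as a character-based splitter (the window and overlap sizes below are illustrative; token-based windows work the same way):

```typescript
// Split text into overlapping chunks of roughly `size` characters,
// stepping forward by `size - overlap` so adjacent chunks share context.
function slidingWindow(text: string, size = 800, overlap = 200): string[] {
  const step = size - overlap
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size))
    if (start + size >= text.length)
      break
  }
  return chunks
}
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated storage.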

## Build Script

Run vectorization at build time:

```ts
// scripts/vectorize.ts
import { readFileSync } from 'node:fs'

const llmsFull = readFileSync('.output/public/llms-full.txt', 'utf-8')
// ... parse and vectorize as above
```

Add to your build:

```json
{
  "scripts": {
    "generate": "nuxt generate && tsx scripts/vectorize.ts"
  }
}
```
