pgvector Embeddings
Generate vector embeddings using Ollama and store them in PostgreSQL with pgvector. This skill covers the ingestion phase of RAG pipelines.
When to Apply
Use this skill when:
- Generating embeddings for documents or text
- Storing embeddings in PostgreSQL
- Building the ingestion pipeline for RAG
- Converting text to vectors for semantic search
- Chunking documents for better retrieval
Embedding Models
Recommended: nomic-embed-text
The nomic-embed-text model provides 768-dimensional embeddings with good quality and performance:
# Pull the model
ollama pull nomic-embed-text
| Model | Dimensions | Speed | Quality |
|---|---|---|---|
nomic-embed-text | 768 | Fast | Good |
mxbai-embed-large | 1024 | Medium | Better |
all-minilm | 384 | Very Fast | Acceptable |
OllamaClient Implementation
Create a TypeScript client for generating embeddings:
// src/utils/ollama.ts
import fetch from 'cross-fetch';
interface OllamaEmbeddingResponse {
embedding: number[];
}
export class OllamaClient {
private baseUrl: string;
constructor(baseUrl?: string) {
this.baseUrl = baseUrl || process.env.OLLAMA_HOST || 'http://localhost:11434';
}
async generateEmbedding(text: string, model: string = 'nomic-embed-text'): Promise<number[]> {
const response = await fetch(`${this.baseUrl}/api/embeddings`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model,
prompt: text,
}),
});
if (!response.ok) {
throw new Error(`Failed to generate embedding: ${response.statusText}`);
}
const data: OllamaEmbeddingResponse = await response.json();
return data.embedding;
}
}
Storing Embeddings
Format Vector for PostgreSQL
pgvector expects vectors in bracket notation:
const formatVector = (embedding: number[]): string => `[${embedding.join(',')}]`;
Insert Document with Embedding
import { Pool } from 'pg';
import { OllamaClient } from './utils/ollama';
const pool = new Pool();
const ollama = new OllamaClient();
async function addDocument(title: string, content: string, metadata: Record<string, unknown> = {}) {
// Generate embedding for the full document
const embedding = await ollama.generateEmbedding(content);
// Insert with embedding
const result = await pool.query(
`INSERT INTO intelligence.documents (title, content, metadata, embedding)
VALUES ($1, $2, $3, $4::vector)
RETURNING id`,
[title, content, metadata, formatVector(embedding)]
);
return result.rows[0].id;
}
Document Chunking
Why Chunk Documents?
Large documents should be split into smaller chunks for better retrieval:
- Embeddings capture meaning better for shorter text
- Retrieval returns more relevant context
- Reduces noise in LLM responses
Chunking Function (SQL)
Create a PostgreSQL function for chunking:
-- deploy/schemas/intelligence/procedures/create_document_chunks.sql
CREATE FUNCTION intelligence.create_document_chunks(
p_document_id INTEGER,
p_chunk_size INTEGER DEFAULT 1000,
p_chunk_overlap INTEGER DEFAULT 200
)
RETURNS VOID AS $$
DECLARE
v_content TEXT;
v_position INTEGER := 1;
v_chunk_index INTEGER := 0;
v_chunk TEXT;
v_len INTEGER;
BEGIN
SELECT content INTO v_content
FROM intelligence.documents
WHERE id = p_document_id;
IF v_content IS NULL THEN
RAISE NOTICE 'No content found for document_id %', p_document_id;
RETURN;
END IF;
v_len := LENGTH(v_content);
WHILE v_position <= v_len LOOP
v_chunk := SUBSTRING(v_content FROM v_position FOR p_chunk_size);
INSERT INTO intelligence.chunks (document_id, content, chunk_index)
VALUES (p_document_id, v_chunk, v_chunk_index);
v_position := v_position + (p_chunk_size - p_chunk_overlap);
v_chunk_index := v_chunk_index + 1;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Chunking Parameters
| Parameter | Recommended | Description |
|---|---|---|
chunk_size | 500-1000 | Characters per chunk |
chunk_overlap | 100-200 | Overlap between chunks |
Overlap ensures context isn't lost at chunk boundaries.
Complete Ingestion Pipeline
import { Pool } from 'pg';
import { OllamaClient } from './utils/ollama';
const formatVector = (embedding: number[]): string => `[${embedding.join(',')}]`;
export class DocumentIngester {
private pool: Pool;
private ollama: OllamaClient;
constructor(pool: Pool, ollamaBaseUrl?: string) {
this.pool = pool;
this.ollama = new OllamaClient(ollamaBaseUrl);
}
async ingestDocument(
title: string,
content: string,
metadata: Record<string, unknown> = {},
chunkSize: number = 1000,
chunkOverlap: number = 200
): Promise<number> {
// 1. Generate embedding for full document
const docEmbedding = await this.ollama.generateEmbedding(content);
// 2. Insert document
const docResult = await this.pool.query(
`INSERT INTO intelligence.documents (title, content, metadata, embedding)
VALUES ($1, $2, $3, $4::vector)
RETURNING id`,
[title, content, metadata, formatVector(docEmbedding)]
);
const documentId = docResult.rows[0].id;
// 3. Create chunks
await this.pool.query(
'SELECT intelligence.create_document_chunks($1, $2, $3)',
[documentId, chunkSize, chunkOverlap]
);
// 4. Generate embeddings for each chunk
const chunks = await this.pool.query(
'SELECT id, content FROM intelligence.chunks WHERE document_id = $1 ORDER BY chunk_index',
[documentId]
);
for (const chunk of chunks.rows) {
const chunkEmbedding = await this.ollama.generateEmbedding(chunk.content);
await this.pool.query(
'UPDATE intelligence.chunks SET embedding = $1::vector WHERE id = $2',
[formatVector(chunkEmbedding), chunk.id]
);
}
return documentId;
}
}
Testing Embeddings
import { getConnections, PgTestClient } from 'pgsql-test';
import { OllamaClient } from '../src/utils/ollama';
let pg: PgTestClient;
let teardown: () => Promise<void>;
let ollama: OllamaClient;
const formatVector = (embedding: number[]): string => `[${embedding.join(',')}]`;
beforeAll(async () => {
({ pg, teardown } = await getConnections());
ollama = new OllamaClient();
});
afterAll(() => teardown());
test('should generate and store embedding', async () => {
const text = 'Machine learning is a subset of artificial intelligence.';
const embedding = await ollama.generateEmbedding(text);
expect(embedding).toHaveLength(768); // nomic-embed-text dimensions
const result = await pg.client.query(
`INSERT INTO intelligence.documents (title, content, embedding)
VALUES ($1, $2, $3::vector)
RETURNING id`,
['ML Basics', text, formatVector(embedding)]
);
expect(result.rows[0].id).toBeDefined();
});
Design Considerations
Embeddings at Application Layer
Generate embeddings in your application, not in database triggers:
Why not triggers?
- HTTP calls in triggers can cause transaction timeouts
- Failed embedding calls would rollback the entire transaction
- Harder to retry or handle rate limits
Recommended approach:
- Generate embeddings in application code
- Use job queues for async processing if needed
- Handle failures gracefully with retries
Batch Processing
For large document sets, process in batches:
async function batchIngest(documents: Array<{title: string, content: string}>) {
for (const doc of documents) {
try {
await ingester.ingestDocument(doc.title, doc.content);
console.log(`Ingested: ${doc.title}`);
} catch (error) {
console.error(`Failed to ingest ${doc.title}:`, error);
// Continue with next document
}
}
}
Troubleshooting
| Issue | Solution |
|---|---|
| "Connection refused" to Ollama | Ensure Ollama is running: ollama serve |
| "Model not found" | Pull the model: ollama pull nomic-embed-text |
| Dimension mismatch | Ensure VECTOR(n) matches model output dimensions |
| Slow embedding generation | Consider batching or using a faster model |
| Memory issues | Process documents in smaller batches |
References
- Related skill:
pgvector-setupfor database schema setup - Related skill:
pgvector-similarity-searchfor querying embeddings - Related skill:
ollama-integrationfor Ollama client details - Related skill:
rag-pipelinefor complete RAG implementation
