GPTCrunch
Architecture

Embeddings

Dense numerical vector representations of text (or other data) that capture semantic meaning. Texts with similar meanings have similar embeddings, enabling search, clustering, and retrieval applications.

Embeddings are fixed-length numerical vectors that represent the meaning of text in a high-dimensional space. When you pass a sentence through an embedding model, it outputs a vector (typically 768 to 3,072 dimensions) where each number captures some aspect of the text's semantic content. The key property is that semantically similar texts produce vectors that are close together in this space, allowing mathematical comparison of meaning.
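The "close together" property is usually measured with cosine similarity. A minimal sketch with toy 4-dimensional vectors (real models output 768-3,072 dimensions, and the vectors shown are made up for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for three texts.
cat = np.array([0.9, 0.1, 0.0, 0.2])
kitten = np.array([0.85, 0.15, 0.05, 0.25])
invoice = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine_similarity(cat, kitten))   # high: related meanings
print(cosine_similarity(cat, invoice))  # much lower: unrelated meanings
```

Because cosine similarity compares direction rather than magnitude, a vector always scores 1.0 against itself, and semantically related texts score closer to 1.0 than unrelated ones.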

Embedding models serve a different purpose than generative models. While GPT-4 or Claude generate text, embedding models like OpenAI's text-embedding-3-large, Cohere's embed-v3, or open-source models such as E5 and BGE produce numerical representations used for downstream tasks. The most common application is semantic search: embed a query, compare it against pre-computed document embeddings using cosine similarity, and return the most relevant results. This powers RAG systems, recommendation engines, and knowledge bases.
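The query-against-documents comparison above can be sketched as a top-k search over a matrix of pre-computed embeddings (the vectors here are random stand-ins for real model output):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 3):
    """Return indices and scores of the k most similar document embeddings."""
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q
    idx = np.argsort(scores)[::-1][:k]
    return idx, scores[idx]

# Five pre-computed 6-dimensional "document embeddings" (random for demo).
rng = np.random.default_rng(0)
doc_matrix = rng.standard_normal((5, 6))
# A query vector that is a slightly perturbed copy of document 2.
query_vec = doc_matrix[2] + 0.01 * rng.standard_normal(6)

idx, scores = top_k(query_vec, doc_matrix, k=3)
```

Vector databases implement the same ranking at scale, using approximate nearest-neighbor indexes instead of a full matrix multiply.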

Creating an embedding-based system follows a standard pipeline. First, chunk your documents into manageable pieces (typically 200-500 tokens each). Second, generate embeddings for each chunk using an embedding model. Third, store these embeddings in a vector database like Pinecone, Weaviate, ChromaDB, or pgvector. Fourth, at query time, embed the user's question, find the most similar document chunks, and pass them to a generative model as context. This is the foundation of retrieval-augmented generation.
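The four-step pipeline can be sketched end to end. The `embed`, `chunk`, and `retrieve` functions below are hypothetical stand-ins: a real system would call an embedding model API and a vector database rather than a hash-seeded toy embedding and an in-memory list.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Deterministic toy embedding -- a real system would call an
    embedding model API here instead."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def chunk(document: str, max_words: int = 12) -> list[str]:
    """Split a document into fixed-size pieces (word count stands in
    for the 200-500 token chunks used in practice)."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Steps 1-3: chunk each document, embed each chunk, and store the pairs
# (a plain list stands in for a vector database).
corpus = [
    "Embeddings map text to vectors so that similar meanings land close together.",
    "A vector database stores embeddings and answers nearest-neighbor queries.",
]
store = [(piece, embed(piece)) for doc in corpus for piece in chunk(doc)]

# Step 4: at query time, embed the question and return the closest chunks,
# which would then be passed to a generative model as context.
def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(store, key=lambda item: float(item[1] @ q), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Because the toy `embed` is deterministic, a query identical to a stored chunk retrieves that chunk first; with a real embedding model, paraphrased queries retrieve semantically related chunks the same way.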

Embedding quality varies significantly between models. Better embedding models produce vectors that more accurately capture semantic nuance, improving retrieval quality. Dimensions matter too: higher-dimensional embeddings can capture more information but require more storage and make similarity computation slower. For most applications, a modern embedding model with 1,024-1,536 dimensions provides an excellent balance. Cost is minimal — embedding models are orders of magnitude cheaper than generative models, often at $0.02-0.13 per million tokens.
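The storage side of the dimension trade-off is easy to estimate: vectors stored as float32 take 4 bytes per dimension, so at one million chunks the difference between widths is several gigabytes.

```python
# Back-of-envelope storage for embeddings stored as float32 (4 bytes each).
def storage_gb(n_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    return n_vectors * dims * bytes_per_value / 1e9

# One million chunks at common embedding widths:
for dims in (768, 1536, 3072):
    print(f"{dims} dims: {storage_gb(1_000_000, dims):.2f} GB")
# 768 -> ~3.1 GB, 1536 -> ~6.1 GB, 3072 -> ~12.3 GB
```

This is raw vector storage only; index structures and metadata in a vector database add overhead on top.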
