
Embedding Quickstart

Get started with embeddings quickly, using BGE and OpenAI as examples.

Option 1: Local BGE

Installation

pip install sentence-transformers torch

Basic usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
texts = ["RAG 是检索增强生成", "Embedding 将文本转为向量"]
embeddings = model.encode(texts, normalize_embeddings=True)  # unit-length vectors, so dot product equals cosine similarity
print(embeddings.shape) # (2, 1024)

Query vectors (BGE recommends prepending an instruction to queries)

query = "什么是检索增强生成"
# For the Chinese BGE models, the recommended query instruction is
# "为这个句子生成表示以用于检索相关文章:" (documents are encoded without it)
query_emb = model.encode("为这个句子生成表示以用于检索相关文章:" + query)
# then compute similarity against the document embeddings
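The similarity computation mentioned in the comment is straightforward: once vectors are unit-normalized, cosine similarity reduces to a dot product. A minimal NumPy sketch (the vectors below are toy stand-ins, not real model outputs):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # normalize rows, then take pairwise dot products
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# toy stand-ins for query_emb and the document embeddings
query_emb = np.array([0.1, 0.9, 0.2])
doc_embs = np.array([[0.1, 0.8, 0.3],
                     [0.9, 0.1, 0.0]])
scores = cosine_similarity(query_emb[None, :], doc_embs)
best = int(np.argmax(scores))  # index of the most similar document
```

With real BGE output, `query_emb` and `doc_embs` would simply be the arrays returned by `model.encode(...)`.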

Option 2: OpenAI Embedding

pip install openai
from openai import OpenAI
client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="RAG 是检索增强生成"
)
embedding = response.data[0].embedding # a list of 1536 floats

Integration with LangChain

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-large-zh-v1.5"
)
vector = embeddings.embed_query("你好")

Integration with LlamaIndex

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-zh-v1.5")
vector = embed_model.get_text_embedding("你好")
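Whichever option or integration produces the vectors, retrieval then works the same way: embed the query, score it against the document vectors, and keep the top-k. A minimal sketch with toy vectors (`top_k` is a hypothetical helper for illustration, not part of any library above):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2):
    # cosine similarity via normalized dot products
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    idx = np.argsort(-scores)[:k]  # indices of the k highest-scoring documents
    return idx, scores[idx]

# toy document vectors standing in for real embeddings
docs = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
idx, scores = top_k(np.array([1.0, 0.1]), docs, k=2)
```

In practice `docs` would be the stacked embeddings of your corpus and `query_vec` the embedding of the user's question.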

Next steps