# Embedding Quick Start

A quick-start guide to embeddings, using BGE and OpenAI as examples.
## Option 1: Local BGE

### Installation

```shell
pip install sentence-transformers torch
```
### Basic usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
texts = ["RAG 是检索增强生成", "Embedding 将文本转为向量"]
embeddings = model.encode(texts)
print(embeddings.shape)  # (2, 1024)
```
### Query vectors (BGE recommends prepending a retrieval instruction)

```python
# The bge-*-zh-v1.5 models recommend this Chinese retrieval instruction
# for queries (the English models use an English instruction instead)
query = "什么是检索增强生成"
query_emb = model.encode("为这个句子生成表示以用于检索相关文章:" + query)
# Then compare against the document embeddings with cosine similarity
```
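The similarity step above can be sketched in plain Python. The vectors below are toy 4-dimensional stand-ins for real embeddings (actual BGE vectors are 1024-dimensional), purely for illustration:

```python
import math

def cosine_similarity(a, b):
    # cosine(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the query and document embeddings
query_emb = [0.1, 0.3, 0.5, 0.2]
doc_embs = [
    [0.1, 0.3, 0.5, 0.2],  # identical to the query
    [0.9, 0.1, 0.0, 0.0],  # unrelated
]

scores = [cosine_similarity(query_emb, d) for d in doc_embs]
print(scores)  # the first document scores highest
```

Scores range from -1 to 1; the document most similar to the query gets the highest score.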
## Option 2: OpenAI Embedding

```shell
pip install openai
```

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="RAG 是检索增强生成",
)
embedding = response.data[0].embedding  # 1536-dimensional list of floats
```
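Whichever backend produced the vectors, retrieval is then a nearest-neighbor ranking. A minimal sketch over toy unit-length vectors (`top_k` is a hypothetical helper, not part of either API; for unit vectors the dot product equals cosine similarity):

```python
def top_k(query_vec, doc_vecs, k=2):
    # Score each document by dot product with the query,
    # then return the indices of the k highest-scoring documents
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy unit vectors standing in for document embeddings
docs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
query = [0.8, 0.6, 0.0]
print(top_k(query, docs))  # indices of the two most similar documents
```

In practice this ranking is delegated to a vector store, but the underlying scoring is the same.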
## LangChain integration

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-large-zh-v1.5"
)
vector = embeddings.embed_query("你好")
```
## LlamaIndex integration

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-zh-v1.5")
vector = embed_model.get_text_embedding("你好")
```