1 Introduction to Qwen3-Embedding
Qwen3-Embedding is released under the Apache 2.0 license; the models range in size from 0.6B to 8B parameters and support encoding long texts of up to 32K tokens.
| Model Type | Models | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|---|---|---|---|---|---|---|---|
| Text Embedding | Qwen3-Embedding-0.6B | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | Qwen3-Embedding-4B | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | Qwen3-Embedding-8B | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | Qwen3-Reranker-0.6B | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | Qwen3-Reranker-4B | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | Qwen3-Reranker-8B | 8B | 36 | 32K | - | - | Yes |
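"MRL Support" means the embeddings can be truncated to a smaller dimension with only modest quality loss, and "Instruction Aware" means queries should carry a task instruction. A minimal sketch of both, using sentence-transformers (installed in the next section) and assuming version >= 2.7.0, whose `truncate_dim` parameter performs the MRL truncation:

```python
from sentence_transformers import SentenceTransformer

# MRL: request 256-dim vectors instead of the native 1024 dims of the 0.6B model.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=256)

# Instruction aware: prefix the query with a task description,
# following the "Instruct: ...\nQuery: ..." convention from the model card.
task = "Given a web search query, retrieve relevant passages that answer the query"
emb = model.encode(f"Instruct: {task}\nQuery: What is the capital of China?")
print(emb.shape)  # (256,)
```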
2 sentence-transformers example
Install sentence-transformers:

```bash
pip install -U sentence-transformers -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Example code:
```python
import os
os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0
import torch
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

with torch.no_grad():
    # Encode the queries and documents. Note that queries benefit from using a prompt.
    # Here we use the prompt called "query" stored under `model.prompts`, but you can
    # also pass your own prompt via the `prompt` argument.
    query_embeddings = model.encode(queries, prompt_name="query")
    document_embeddings = model.encode(documents)

    # Compute the (cosine) similarity between the query and document embeddings
    similarity = model.similarity(query_embeddings, document_embeddings)
    print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])
```
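Each row of `similarity` corresponds to a query and each column to a document, so retrieval is just a matter of picking the highest-scoring columns per row. A small sketch reusing the tensors and lists from the example above:

```python
# Rank documents for each query by cosine similarity, highest first.
top_scores, top_indices = similarity.topk(k=2, dim=1)
for query, scores, indices in zip(queries, top_scores, top_indices):
    print(query)
    for score, idx in zip(scores.tolist(), indices.tolist()):
        print(f"  {score:.4f}  {documents[idx]}")
```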
3 HF transformers example (Qwen3-Reranker)
For the installation steps, refer to the CSDN blog post "Open-source embedding LLM - BGE (BAAI General Embedding)".
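Alternatively, installing the dependencies directly should also work; a minimal command mirroring the one in section 2 (the Tsinghua mirror index is optional):

```bash
pip install -U transformers torch -i https://pypi.tuna.tsinghua.edu.cn/simple
```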
The example code is as follows:
```python
import os
os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

# Requires transformers>=4.51.0
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def format_instruction(instruction, query, doc):
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(
        instruction=instruction, query=query, doc=doc)
    return output

def process_inputs(pairs):
    # Tokenize without padding, leaving room for the chat-template prefix/suffix
    inputs = tokenizer(pairs, padding=False, truncation='longest_first',
                       return_attention_mask=False,
                       max_length=max_length - len(prefix_tokens) - len(suffix_tokens))
    for i, ele in enumerate(inputs['input_ids']):
        inputs['input_ids'][i] = prefix_tokens + ele + suffix_tokens
    inputs = tokenizer.pad(inputs, padding=True, return_tensors="pt", max_length=max_length)
    for key in inputs:
        inputs[key] = inputs[key].to(model.device)
    return inputs

@torch.no_grad()
def compute_logits(inputs, **kwargs):
    # Read the logits at the final position and compare the "yes" vs. "no" tokens
    batch_scores = model(**inputs).logits[:, -1, :]
    true_vector = batch_scores[:, token_true_id]
    false_vector = batch_scores[:, token_false_id]
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    scores = batch_scores[:, 1].exp().tolist()
    return scores

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", padding_side='left')
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B").eval()

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B", torch_dtype=torch.float16, attn_implementation="flash_attention_2").cuda().eval()

token_false_id = tokenizer.convert_tokens_to_ids("no")
token_true_id = tokenizer.convert_tokens_to_ids("yes")
max_length = 8192

prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
prefix_tokens = tokenizer.encode(prefix, add_special_tokens=False)
suffix_tokens = tokenizer.encode(suffix, add_special_tokens=False)

task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
pairs = [format_instruction(task, query, doc) for query, doc in zip(queries, documents)]

# Tokenize the input texts
inputs = process_inputs(pairs)
scores = compute_logits(inputs)
print("scores: ", scores)
```
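The printed scores are the probability of the "yes" token, so higher means more relevant. To rerank a candidate list for a single query, pair the query with every candidate and sort by score; a minimal sketch reusing the helpers defined above:

```python
# Rerank candidate passages for one query using the helpers above.
query = "What is the capital of China?"
candidates = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other.",
]
pairs = [format_instruction(task, query, doc) for doc in candidates]
scores = compute_logits(process_inputs(pairs))
for doc, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {doc}")
```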
Reference
---
Qwen3-Embedding
https://github.com/QwenLM/Qwen3-Embedding
vllm-on-intel-extension-for-pytorch
https://github.com/malcolmchanhaoxian/VLLM-on-Intel-Extension-for-Pytorch-
vLLM CPU installation
https://vllm.hyper.ai/docs/getting-started/installation/cpu/
Understanding Hugging Face's AutoModel series: automatic model-loading classes for different tasks
https://blog.csdn.net/weixin_42426841/article/details/142236561
Image feature extraction
https://hugging-face.cn/docs/transformers/tasks/image_feature_extraction
Understanding Hugging Face's AutoModel series: automatic model-loading classes for different tasks
https://zhuanlan.zhihu.com/p/721062232