1 Introduction to Qwen3-Embedding
Qwen3-Embedding is released under the Apache 2.0 license; the models range in size from 0.6B to 8B parameters and support encoding long texts of up to 32K tokens.
| Model Type | Models | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|---|---|---|---|---|---|---|---|
| Text Embedding | Qwen3-Embedding-0.6B | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | Qwen3-Embedding-4B | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | Qwen3-Embedding-8B | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | Qwen3-Reranker-0.6B | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | Qwen3-Reranker-4B | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | Qwen3-Reranker-8B | 8B | 36 | 32K | - | - | Yes |
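"MRL Support" means the embeddings can be truncated to a smaller dimension with only modest quality loss, and "Instruction Aware" means queries should carry a task instruction. A minimal sketch of both, using sentence-transformers (installed in the next section) and assuming version >= 2.7.0, whose `truncate_dim` parameter performs the MRL truncation:

```python
from sentence_transformers import SentenceTransformer

# MRL: request 256-dim vectors instead of the native 1024 dims of the 0.6B model.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=256)

# Instruction aware: prefix the query with a task description,
# following the "Instruct: ...\nQuery: ..." convention from the model card.
task = "Given a web search query, retrieve relevant passages that answer the query"
emb = model.encode(f"Instruct: {task}\nQuery: What is the capital of China?")
print(emb.shape)  # (256,)
```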
2 sentence-transformers example
Install sentence-transformers:

```bash
pip install -U sentence-transformers -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Example code:
```python
import os
os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0
import torch
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

with torch.no_grad():
    # Encode the queries and documents. Note that queries benefit from using a prompt.
    # Here we use the prompt called "query" stored under `model.prompts`, but you can
    # also pass your own prompt via the `prompt` argument.
    query_embeddings = model.encode(queries, prompt_name="query")
    document_embeddings = model.encode(documents)

    # Compute the (cosine) similarity between the query and document embeddings
    similarity = model.similarity(query_embeddings, document_embeddings)
    print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])
```
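Each row of `similarity` corresponds to a query and each column to a document, so retrieval is just a matter of picking the highest-scoring columns per row. A small sketch reusing the tensors and lists from the example above:

```python
# Rank documents for each query by cosine similarity, highest first.
top_scores, top_indices = similarity.topk(k=2, dim=1)
for query, scores, indices in zip(queries, top_scores, top_indices):
    print(query)
    for score, idx in zip(scores.tolist(), indices.tolist()):
        print(f"  {score:.4f}  {documents[idx]}")
```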
3 HF transformers example (Qwen3-Reranker)
For the installation steps, refer to the CSDN blog post "Open-source embedding LLM - BGE (BAAI General Embedding)".
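Alternatively, installing the dependencies directly should also work; a minimal command mirroring the one in section 2 (the Tsinghua mirror index is optional):

```bash
pip install -U transformers torch -i https://pypi.tuna.tsinghua.edu.cn/simple
```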
The example code is as follows:
```python
import os
os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

# Requires transformers>=4.51.0
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def format_instruction(instruction, query, doc):
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(
        instruction=instruction, query=query, doc=doc)
    return output

def process_inputs(pairs):
    # Tokenize without padding, leaving room for the chat-template prefix/suffix
    inputs = tokenizer(pairs, padding=False, truncation='longest_first',
                       return_attention_mask=False,
                       max_length=max_length - len(prefix_tokens) - len(suffix_tokens))
    for i, ele in enumerate(inputs['input_ids']):
        inputs['input_ids'][i] = prefix_tokens + ele + suffix_tokens
    inputs = tokenizer.pad(inputs, padding=True, return_tensors="pt", max_length=max_length)
    for key in inputs:
        inputs[key] = inputs[key].to(model.device)
    return inputs

@torch.no_grad()
def compute_logits(inputs, **kwargs):
    # Read the logits at the final position and compare the "yes" vs. "no" tokens
    batch_scores = model(**inputs).logits[:, -1, :]
    true_vector = batch_scores[:, token_true_id]
    false_vector = batch_scores[:, token_false_id]
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    scores = batch_scores[:, 1].exp().tolist()
    return scores

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", padding_side='left')
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B").eval()

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B", torch_dtype=torch.float16, attn_implementation="flash_attention_2").cuda().eval()

token_false_id = tokenizer.convert_tokens_to_ids("no")
token_true_id = tokenizer.convert_tokens_to_ids("yes")
max_length = 8192

prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
prefix_tokens = tokenizer.encode(prefix, add_special_tokens=False)
suffix_tokens = tokenizer.encode(suffix, add_special_tokens=False)

task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
pairs = [format_instruction(task, query, doc) for query, doc in zip(queries, documents)]

# Tokenize the input texts
inputs = process_inputs(pairs)
scores = compute_logits(inputs)
print("scores: ", scores)
```
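The printed scores are the probability of the "yes" token, so higher means more relevant. To rerank a candidate list for a single query, pair the query with every candidate and sort by score; a minimal sketch reusing the helpers defined above:

```python
# Rerank candidate passages for one query using the helpers above.
query = "What is the capital of China?"
candidates = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other.",
]
pairs = [format_instruction(task, query, doc) for doc in candidates]
scores = compute_logits(process_inputs(pairs))
for doc, score in sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {doc}")
```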
Reference
---
Qwen3-Embedding
https://github.com/QwenLM/Qwen3-Embedding
vllm-on-intel-extension-for-pytorch
https://github.com/malcolmchanhaoxian/VLLM-on-Intel-Extension-for-Pytorch-
vLLM CPU installation
https://vllm.hyper.ai/docs/getting-started/installation/cpu/
Understanding Hugging Face's AutoModel series: automatic model-loading classes for different tasks
https://blog.csdn.net/weixin_42426841/article/details/142236561
Image feature extraction
https://hugging-face.cn/docs/transformers/tasks/image_feature_extraction
Understanding Hugging Face's AutoModel series: automatic model-loading classes for different tasks
https://zhuanlan.zhihu.com/p/721062232