通义智文开源QwenLong-L1: 迈向长上下文大推理模型的强化学习

在这里插入图片描述

🎉 动态

2025年5月26日: 🔥 我们正式发布🤗QwenLong-L1-32B——首个采用强化学习训练、专攻长文本推理的LRM模型。在七项长文本文档问答基准测试中,QwenLong-L1-32B性能超越OpenAI-o3-mini和Qwen3-235B-A22B等旗舰LRM,达到与Claude-3.7-Sonnet-Thinking持平的水准,展现了当前最先进长文本推理模型的领先实力。

2025年5月26日: 🔥 我们同步开源🤗DocQA-RL-1.6K专项强化学习数据集,包含1,600道涵盖数学演算、逻辑推理和多跳推理等领域的文档问答题目。

📚 简介

在本研究中,我们提出了QwenLong-L1,这是一种新颖的强化学习(RL)框架,旨在促进LRM从短上下文熟练度向稳健的长上下文泛化过渡。在我们的初步实验中,我们展示了短上下文和长上下文推理RL训练动态之间的差异。

在这里插入图片描述

我们的框架通过强化学习训练中的渐进式上下文扩展,增强了短上下文语言推理模型(LRM)的性能。该框架包含三个核心组件:用于初始化稳健策略的预热监督微调(SFT)阶段;通过课程引导的强化学习阶段实现从短上下文到长上下文的稳定适应;以及难度感知的回溯采样机制,通过动态调整各阶段训练复杂度来激励策略探索。我们整合了包括GRPO和DAPO在内的最新强化学习算法,结合基于规则和基于模型的二元结果奖励混合函数,以平衡精确率与召回率。在策略优化过程中,通过战略性利用群体相对优势,引导LRM学习对实现稳健长上下文锚定和卓越推理能力至关重要的有效推理模式。

在这里插入图片描述

🎯 模型发布

我们发布了🤗 QwenLong-L1-32B,这是首个通过强化学习训练、专为长文本推理设计的长上下文语言推理模型。在七项长文本文档问答基准测试中,QwenLong-L1-32B性能超越OpenAI-o3-mini和Qwen3-235B-A22B等旗舰语言推理模型,达到与Claude-3.7-Sonnet-Thinking相当的水准,展现出当前最先进语言推理模型中的领先性能。

以下是评估结果。

在这里插入图片描述

🛠️ 要求

# Create the conda environment
conda create -n qwenlongl1 python==3.10
conda activate qwenlongl1# Install requirements
pip3 install -r requirements.txt# Install verl
cd verl
pip3 install -e .# Install vLLM
pip3 install vllm==0.7.3 # Install flash-attn
pip3 install flash-attn --no-build-isolation

🚀 快速入门

以下是如何使用 🤗 Transformers 运行该模型:

from transformers import AutoModelForCausalLM, AutoTokenizermodel_name = "Tongyi-Zhiwen/QwenLong-L1-32B"# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,torch_dtype="auto",device_map="auto"
)# prepare the model input
template = """Please read the following text and answer the question below.<text>
$DOC$
</text>$Q$Format your response as follows: "Therefore, the answer is (insert answer here)"."""
context = "<YOUR_CONTEXT_HERE>" 
question = "<YOUR_QUESTION_HERE>"
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages,tokenize=False,add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)# conduct text completion
generated_ids = model.generate(**model_inputs,max_new_tokens=10000,temperature=0.7,top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() # parsing thinking content
try:# rindex finding 151649 (</think>)index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:index = 0thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")print("thinking content:", thinking_content)
print("content:", content)

🗂️ 数据集

为了构建一个具有挑战性的可验证长文本推理强化学习数据集,我们开发了🤗DocQA-RL-1.6K,该数据集包含跨三个推理领域的1600个文档问答问题:

(1) 数学推理:我们使用DocMath数据集中的600道问题,这些问题要求对财务报告等长专业文档进行数值推理。对于DocMath数据集,我们从其验证集中每个子集抽取75%条目用于训练,25%用于评估;

(2) 逻辑推理:我们采用DeepSeek-R1合成了600道多选题,这些问题需要对我们精选的法律、金融、保险和生产领域真实文档进行逻辑分析;

(3) 多跳推理:我们从MultiHopRAG选取200个样本,从Musique选取200个样本,重点关注跨文档推理。

请下载以下数据集并放入./datasets/目录用于训练和评估。

强化学习训练数据:🤗DocQA-RL-1.6K

评估数据:🤗docmath、🤗frames、🤗longbench

💻 训练

我们为单阶段强化学习训练提供了基于DAPO的基础演示代码。

首先,我们应该启动一个本地验证器。

export CUDA_VISIBLE_DEVICES=0vllm serve "Qwen/Qwen2.5-1.5B-Instruct" \--host 0.0.0.0 \--port 23547

然后,我们开始使用4个节点进行强化学习训练。

export PROJ_DIR="<YOUR_PROJ_DIR_HERE>"
export MASTER_IP="<YOUR_MASTER_IP_HERE>" # ray master ip
export NNODES=4 # total GPU nodes
export NODE_RANK=${RANK} # rank of current node
export PORT=6382
export WANDB_API_KEY="<YOUR_WANDB_API_KEY_HERE>"
export WANDB_PROJECT="QwenLong-L1"
export LLM_JUDGE=Y # 'Y': LLM JUDGE, 'N': RULE BASED
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
# verifier
export VERIFIER_PATH="Qwen/Qwen2.5-1.5B-Instruct"
export VERIFIER_HOST="<YOUR_VERIFIER_HOST_HERE>"
export VERIFIER_PORT="23547"ray_start_retry() {while true; doray start --address="${MASTER_IP}:${PORT}"if [ $? -eq 0 ]; thenbreakfiecho "Failed to connect to master, retrying in 5 seconds..."sleep 5done
}check_ray_status() {until ray status >/dev/null 2>&1; doecho "Waiting for Ray cluster to be ready..."sleep 5done
}if [ "$RANK" == "0" ]; thenecho "Starting HEAD node..."ray start --head --port=${PORT}check_ray_statusecho "Ray head node started successfully"elseecho "Starting WORKER node..."ray_start_retrycheck_ray_statusecho "Successfully joined Ray cluster"
fiif [ "$RANK" == "0" ]; thenbash ${PROJ_DIR}/scripts/rl_4nodes_dapo.sh 2>&1 | tee ${PROJ_DIR}/logs/rl_log_$(date +%Y%m%d_%H%M%S).txt &
elsesleep 30d
fiwait

实践演示

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfigmodel_name = "Tongyi-Zhiwen/QwenLong-L1-32B"quantization_config = BitsAndBytesConfig(load_in_4bit=True)# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,torch_dtype="auto",device_map="auto",quantization_config=quantization_config,
)# prepare the model input
template = """Please read the following text and answer the question below.<text>
$DOC$
</text>$Q$Format your response as follows: "Therefore, the answer is (insert answer here)"."""# 填充上下文和问题
context = """
Renewable energy sources are crucial for addressing climate change and reducing dependence on fossil fuels. Solar power is one of the most abundant and widely available renewable energy sources. It converts sunlight directly into electricity using photovoltaic (PV) cells or indirectly through concentrated solar power (CSP) systems.Wind energy is another rapidly growing renewable source. Wind turbines capture the kinetic energy of moving air and convert it into electrical energy. Onshore wind farms are more common, but offshore wind farms are becoming increasingly popular due to stronger and more consistent wind resources.Hydroelectric power is generated by harnessing the energy of flowing water in rivers or dams. It is one of the oldest and most established renewable energy technologies, providing a reliable and flexible source of electricity.Biomass energy uses organic materials such as wood, agricultural waste, and dedicated energy crops to produce heat, electricity, or biofuels. It is considered renewable because the carbon dioxide released during combustion is offset by the carbon dioxide absorbed during the growth of the biomass feedstock.Geothermal energy taps into the Earth's internal heat to generate electricity or provide direct heating and cooling. It is a reliable and consistent energy source, particularly in regions with high geothermal activity.The transition to renewable energy is driven by several factors, including environmental concerns, energy security, and technological advancements. However, challenges remain, such as the intermittency of solar and wind power, high initial costs, and the need for energy storage solutions.Despite these challenges, the global renewable energy market is expanding rapidly. Many countries have set ambitious renewable energy targets, and investments in renewable energy technologies continue to grow. The International Renewable Energy Agency (IRENA) projects that renewable energy could account for up to 86% of global electricity generation by 2050 if current trends continue.
"""question = """
What are the main types of renewable energy sources mentioned in the text, and what are their respective advantages?
"""# 构建提示
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages,tokenize=False,add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)# conduct text completion
generated_ids = model.generate(**model_inputs,max_new_tokens=10000,temperature=0.7,top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() # parsing thinking content
try:# rindex finding 151649 (</think>)index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:index = 0thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")print("thinking content:", thinking_content)
print("content:", content)
输出:
thinking content: Okay, let me try to figure this out. The question is asking for the main types of renewable energy sources mentioned in the text and their respective advantages. First, I need to go through the text again to identify each renewable energy source and note down their advantages.Starting with the first paragraph, it mentions solar power. The text says solar power uses photovoltaic cells or CSP systems. The advantage here is that it's abundant and widely available. Then, wind energy is next. The text talks about onshore and offshore wind farms. The advantage for wind is that offshore farms have stronger and more consistent winds, which I think makes them more reliable.Hydroelectric power is mentioned next. It's described as one of the oldest and most established, providing reliable and flexible electricity. So the advantage here is reliability and flexibility. Biomass energy uses organic materials and is considered renewable because the CO2 released is offset by the growth of the feedstock. The advantage here is that it's renewable in terms of carbon balance.Geothermal energy is next, using the Earth's internal heat. The advantage is that it's reliable and consistent, especially in areas with high geothermal activity. Wait, the question is about the main types and their advantages. Let me list them out:1. Solar power: Abundant and widely available.
2. Wind energy: Stronger and more consistent winds offshore.
3. Hydroelectric power: Reliable and flexible.
4. Biomass energy: Carbon neutrality (offset by growth).
5. Geothermal energy: Reliable and consistent.I think that's all the main types mentioned. The text also mentions challenges like intermittency for solar and wind, but the question is about advantages, so I should focus on the positive aspects each has. I need to make sure I didn't miss any. Let me check the text again.Yes, the text lists solar, wind, hydroelectric, biomass, and geothermal. Each has their specific advantages as I noted. So the answer should list each type with their respective advantages.
</think>
content: Therefore, the answer is Solar power (abundant and widely available), wind energy (stronger and consistent offshore winds), hydroelectric power (reliable and flexible), biomass energy (carbon neutrality through growth of feedstock), and geothermal energy (reliable and consistent internal heat).

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.pswp.cn/diannao/84943.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

学习如何设计大规模系统,为系统设计面试做准备!

前言 在当今快速发展的技术时代&#xff0c;系统设计能力已成为衡量一名软件工程师专业素养的重要标尺。随着云计算、大数据、人工智能等领域的兴起&#xff0c;构建高性能、可扩展且稳定的系统已成为企业成功的关键。然而&#xff0c;对于许多工程师而言&#xff0c;如何有效…

Python生成ppt(python-pptx)N问N答(如何绘制一个没有背景的矩形框;如何绘制一个没有背景的矩形框)

文章目录 [toc]1. **如何安装python-pptx库&#xff1f;**2. **如何创建一个空白PPT文件&#xff1f;**3. **如何添加幻灯片并设置布局&#xff1f;**4. **如何添加文本内容&#xff1f;**5. **如何插入图片&#xff1f;**6. **如何设置动画和转场效果&#xff1f;**9. **如何绘…

命令模式,观察者模式,状态模式,享元模式

什么是命令模式&#xff1f; 核心思想是将原本直接调用的方法封装为对象&#xff08;如AttackCommand&#xff09;&#xff0c;对象包含​​执行逻辑​​和​​上下文信息​​&#xff08;如目标、参数&#xff09;。比如&#xff0c;玩家的按键操作被封装成一个命令对象&#…

Window Server 2019--07 PKI、SSL网站与邮件安全

了解PKI、SSL技术的核心原理掌握PKI架构服务器配置掌握证书管理与应用 公钥基础设施&#xff08;Public Key Infrastructure&#xff0c;PKI&#xff09;是一个完整的颁发、吊销、管理数字证书的系统&#xff0c;是支持认证、加密、完整性和可追究性服务的基础设施。PKI通过第…

从C++编程入手设计模式2——工厂模式

从C编程入手设计模式 工厂模式 ​ 我们马上就要迎来我们的第二个创建型设计模式&#xff1a;工厂方法模式&#xff08;Factory Method Pattern&#xff09;。换而言之&#xff0c;我们希望使用一个这样的接口&#xff0c;使用其他手段而不是直接创建的方式&#xff08;说的有…

MySQL、PostgreSQL、Oracle 区别详解

MySQL、PostgreSQL、Oracle 区别详解 一、基础架构对比 1.1 数据库类型 MySQL:关系型数据库(支持NoSQL插件如MySQL Document Store)PostgreSQL:对象-关系型数据库(支持JSON等半结构化数据)Oracle:多模型数据库(关系型+文档+图+空间等)关键结论:PostgreSQL在数据类型…

window11系统 使用GO语言建立TDengine 连接

目录 1、安装GCC、TDengine-client 1、github下载mingw64 软件包 2、解压指定目录、配置环境变量 3、检验gcc是否安装成功 4、安装TDengine-client 2、配置go环境变量 3、配置Goland 系统变量、重启Goland&#xff08;该软件自己也有系统变量&#xff0c;有时候会和win…

VR 赋能病毒分离鉴定:开启微观探索新视界

在大众认知里&#xff0c;VR 技术往往与沉浸式游戏体验、虚拟社交紧密相连&#xff0c;让人仿佛置身于奇幻的虚拟世界中&#xff0c;感受着科技带来的奇妙娱乐享受。而病毒分离鉴定&#xff0c;听起来则是一个充满专业性与严肃性的科学领域&#xff0c;它关乎病毒的研究、疾病的…

Azure Devops pipeline 技巧和最佳实践

1. 如何显示release pipeline ? 解决方法: 登录devops, 找到organization - pipeline - setting下的Disable creation of classic release pipelines,禁用该选项。 然后在project - pipeline - setting,禁用Disable creation of classic release pipelines 现在可以看到r…

GPU的通信技术

GPU 之间直接通信主要采用了以下几种技术1&#xff1a; GPUDirect P2P&#xff1a;NVIDIA 开发的技术&#xff0c;用于单机上的 GPU 间高速通信。在没有该技术时&#xff0c;GPU 间数据交换需先通过 CPU 和 PCIe 总线复制到主机固定的共享内存&#xff0c;再复制到目标 GPU&…

重新测试deepseek Jakarta EE 10编程能力

听说deepseek做了一个小更新&#xff0c;我重新测试了一下Jakarta EE 10编程能力&#xff1b;有点进步&#xff0c;遗漏的功能比以前少了。 采用Jakarta EE 10 编写员工信息表维护表&#xff0c;包括员工查询与搜索、员工列表、新增员工、删除员工&#xff0c;修改员工&#xf…

​Windows 11 安装 Miniconda 与 Jupyter 全流程指南​

​一、Miniconda 安装与配置​ 1. 下载安装程序 ​访问官网​&#xff1a;打开 Miniconda 官网&#xff0c;下载 ​Python 3.x 版本的 Windows 64 位安装包​。​安装路径选择​&#xff1a; 推荐路径&#xff1a;D:\Miniconda3&#xff08;避免使用中文路径和空格&#xff0…

RuoYi前后端分离框架集成手机短信验证码(一)之后端篇

一、背景 本项目基于RuoYi 3.8.9前后端分离框架构建,采用Spring Security实现系统权限管理。作为企业级应用架构的子模块,系统需要与顶层项目实现用户数据无缝对接(以手机号作为统一用户标识),同时承担用户信息采集的重要职能。为此,我们在保留原有账号密码登录方式的基…

Java ThreadLocal 应用指南:从用户会话到数据库连接的线程安全实践

ThreadLocal 提供了一种线程局部变量&#xff08;thread-local variables&#xff09;的机制&#xff0c;这意味着每个访问该变量的线程都会拥有其自己独立的、初始化的变量副本。这确保了线程之间不会共享数据&#xff0c;也避免了因共享数据而可能产生的竞争条件和同步问题&a…

GitCode镜像门法律分析:PL协议在中国的司法实践

本文以2022年引发广泛争议的GitCode开源代码镜像事件为研究对象&#xff0c;系统分析公共许可证&#xff08;Public License&#xff0c;PL&#xff09;在中国法律体系下的适用性挑战。通过研究中国法院近五年涉及GPL、Apache、MIT等主流协议的21个司法案例&#xff0c;揭示开源…

Rider崩溃问题终极解决指南

JetBrains Rider 2025.1.2 频繁崩溃问题解决指南 问题描述&#xff1a; 编辑器频繁自动崩溃&#xff0c;任务管理器显示大量 Git for Windows 进程被启动。 原因分析&#xff1a; 这是 Rider 的自动版本控制功能导致的。当检测到代码变更时&#xff0c;编辑器会不断尝试启动 …

4 串电池保护芯片创芯微CM1341-DAT使用介绍

特性 专用于 4 串锂/铁/钠电池的保护芯片&#xff0c;内置有高精度电压检测电路和电流检测电路。通过检测各节电池的电压、充放电电流及温度等信息&#xff0c;实现电池过充电、过放电、均衡、断线、低压禁充、放电过电流、短路、充电过电流和过温保护等功能&#xff0c;放电过…

煤矿电液控制器-底座倾角传感器4K型护套连接器ZE0703-09(100)

煤矿电液控制器作为井下自动化开采的核心设备&#xff0c;其可靠性直接关系到生产安全与效率。在众多关键组件中&#xff0c;底座倾角传感器4K型护套连接器ZE0703-09&#xff08;100&#xff09;凭借独特设计成为保障系统稳定运行的"神经末梢"&#xff0c;其技术特性…

Vue计算属性与监视

在Vue.js中&#xff0c;处理复杂的逻辑和数据依赖关系是构建高效、可维护的前端应用的关键。Vue提供了两种强大的工具来帮助我们实现这一点&#xff1a;计算属性&#xff08;Computed Properties&#xff09; 和 侦听器&#xff08;Watchers&#xff09;。本文将深入探讨这两者…

基于RT-Thread的STM32F4开发第七讲——RTC(硬件、软件)

提示&#xff1a;文章写完后&#xff0c;目录可以自动生成&#xff0c;如何生成可参考右边的帮助文档 文章目录 前言一、RT-Thread工程创建1.硬件RTC配置2.软件RTC配置3.RTC闹钟配置 总结 前言 本章是基于RT-Thread studio实现RTC硬件和软件下的日历时钟功能&#xff0c;开发板…