# [No GGUF version] How to run gpt-oss 20B on a Colab T4


OpenAI has released gpt-oss in 120B and 20B versions. Both models are released under the Apache 2.0 license.

Notably, gpt-oss-20b is designed for low-latency and local/specialized use cases (21B total parameters, 3.6B active parameters).

Because the model was trained with native MXFP4 quantization, the 20B version runs comfortably even in resource-constrained environments such as Google Colab.
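A rough back-of-the-envelope on why that fits (our arithmetic, not from the original post, assuming MXFP4's nominal ~4.25 bits per weight: 4-bit values plus one shared 8-bit scale per 32-element block; in practice only the MoE expert weights are MXFP4 and the rest stays in bf16, so treat this as indicative only):

```python
# Approximate memory for 21B parameters stored in MXFP4.
total_params = 21e9            # total parameter count quoted above
bits_per_weight = 4 + 8 / 32   # 4-bit value + one 8-bit scale per 32-weight block
size_gb = total_params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")    # ~11.2 GB, under a T4's 16 GB
```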

Authors: Pedro and VB.

Because mxfp4 support in transformers is still bleeding-edge, we need the latest PyTorch and CUDA releases in order to install the mxfp4 triton kernels.

We also need to install transformers from source, and uninstall torchvision and torchaudio to avoid dependency conflicts.

```python
!pip install -q --upgrade torch accelerate kernels
!pip install -q git+https://github.com/huggingface/transformers triton==3.4 git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
!pip uninstall -q torchvision torchaudio -y
!pip list | grep -E "transformers|triton|torch|accelerate|kernels"
```

```
accelerate                            1.10.1
kernels                               0.9.0
sentence-transformers                 5.1.0
torch                                 2.8.0+cu126
torchao                               0.10.0
torchdata                             0.11.0
torchsummary                          1.5.1
torchtune                             0.6.1
transformers                          4.57.0.dev0
triton                                3.4.0
triton_kernels                        1.0.0
```
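Before loading anything, it is worth confirming the runtime picked up the fresh stack and can actually see the GPU (a minimal sanity check of our own, not from the original post):

```python
import torch
import triton

# Versions should match the pip list output above; the device should be a Tesla T4.
print("torch:", torch.__version__, "| triton:", triton.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))
```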

## Loading the model from Hugging Face in Google Colab

We load the model from here: openai/gpt-oss-20b

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, Mxfp4Config

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

config = AutoConfig.from_pretrained(model_id)
print(config)

quantization_config = Mxfp4Config.from_dict(config.quantization_config)
print(quantization_config)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    torch_dtype="auto",
    device_map="cuda",
)
```
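At this point you can check that the quantized model actually fits on the T4 (a sketch of our own; `get_memory_footprint` is the stock transformers helper and reports parameters plus buffers, so treat the number as approximate):

```python
# Report the model's parameter + buffer memory as tracked by transformers, in GiB.
print(f"Model memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
```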
```python
import torch

def print_model_params(model: torch.nn.Module, extra_info="", f=None):
    # Print the module tree and the number of parameters in the model
    model_million_params = sum(p.numel() for p in model.parameters()) / 1e6
    print(model, file=f)
    print(f"{extra_info} {model_million_params} M parameters", file=f)

print_model_params(model, model_id)
```
```
GptOssForCausalLM(
  (model): GptOssModel(
    (embed_tokens): Embedding(201088, 2880, padding_idx=199999)
    (layers): ModuleList(
      (0-23): 24 x GptOssDecoderLayer(
        (self_attn): GptOssAttention(
          (q_proj): Linear(in_features=2880, out_features=4096, bias=True)
          (k_proj): Linear(in_features=2880, out_features=512, bias=True)
          (v_proj): Linear(in_features=2880, out_features=512, bias=True)
          (o_proj): Linear(in_features=4096, out_features=2880, bias=True)
        )
        (mlp): GptOssMLP(
          (router): GptOssTopKRouter()
          (experts): Mxfp4GptOssExperts()
        )
        (input_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
        (post_attention_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
      )
    )
    (norm): GptOssRMSNorm((2880,), eps=1e-05)
    (rotary_emb): GptOssRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2880, out_features=201088, bias=False)
)
openai/gpt-oss-20b 1804.459584 M parameters
```

(The ~1.8B figure counts only the dense weights: embeddings, attention projections, and norms. The MXFP4-packed expert weights inside `Mxfp4GptOssExperts` appear not to be registered as regular parameters, so the `p.numel()` sum above does not see them.)

## Setting up messages / chat

You can provide an optional system prompt, or just pass the user input directly.
```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "What is the weather like in Madrid?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
```
<|channel|>analysis<|message|>We have a system that says "Always respond in riddles". The user asks: "What is the weather like in Madrid?" We need to answer in a riddle style. Probably we need to incorporate a riddle that hints at the weather. Possibly referencing that Madrid has warm days, cooler nights, sometimes unpredictable. The question is to describe the weather, but we must answer in riddles.

We need to craft a riddle that describes the weather. We could say something like: "In a place where the summer sun is fierce, yet winter brings a chill, what doth the sky declare? The answer: It's a tapestry of sun, clouds, and gentle rain." But it's in a riddle form, so perhaps like: "A palace of sunshine, the winter brings a chill across its marble steps." But we have to respond in riddles. We can provide a riddle that when solved the answer is "varied: sunny, dry, occasional showers." We can hint at temperatures.

Given the requirement, we must answer with a riddle. Probably no other text. So let's produce a riddle. I'll produce maybe an extended riddle. It's okay.

We should keep it in Spanish? The user is not specifying a language. The question is in English. The answer in English but in riddle form.

Answer: "In the heart of Spain’s southern sun, a desert’s heart lies in the city, yet it drinks rain at intervals, ..." Something. Let's craft.

Possible riddle: "Morning bright as fire, midday relentless, night cool as a sigh; the gods of clouds wander, sometimes weeping, sometimes staying away; which realm in Spain does this dance with sun?" The answer: Madrid.

But need to describe weather in Madrid.

Maybe: "I’m a city with heat that can scorch your thoughts, yet with winter my nights feel like glass. I’m known for blue skies, yet sometimes I cry to thin drops. The clouds come and go like dancers in a waltz. What am I?" The answer: Madrid's weather.

Better: "Where the summer sun is a silvered sword, the winter wind gives breathless chill. The clouds are silvered ghosts that sometimes fall; a city that wears both sun and mist. Tell me, what city is this?" Answer: Madrid.

However, maybe simpler: "I am a city where summer suns burn, winter nights chill, and clouds sometimes pour.
```
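A side note, not part of the original notebook: if you would rather watch tokens appear as they are generated instead of waiting for the full completion, transformers ships a `TextStreamer` utility (a minimal sketch reusing the `inputs` from the cell above):

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are produced; skip_prompt avoids
# re-printing the chat template we already built.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=500)
```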

## Specifying reasoning effort

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42", "reasoning_effort": "high"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
```
<|channel|>analysis<|message|>The user asks: "Explain why the meaning of life is 42". The instruction: "Always respond in riddles". So we must respond in riddles. The user likely expects an explanation like referencing Hitchhiker's guide: 42 is a random number. But we have to respond with riddles. That may be tricky: we might produce riddle-like explanation about the number 42. We need to produce something that "explains" why 42 is the meaning of life but in riddle form.

We need to craft a riddle that describes the reasoning: The answer to everything is 42, because it's a puzzle, because it's a random number, also can be expressed as 6 times 7, 2-digit, and it contains hidden meaning like the sum of ASCII codes maybe.

We need to incorporate the explanation: it's an arbitrary number but resonates because of its prime factorization, its location on the periodic table, its relation to Earth, or in the book, the number 42 was given as answer by Deep Thought after 7.5 million years of computation. But we are to answer as riddle.

Probably: "What is a number that appears in a science-fiction book, which is the answer to life, the Universe, and everything, and why?".

But we are told to respond in riddles, not straightforward explanation.

Thus we will write a riddle explaining that 42 is chosen because it's 6 * 7, it's the number of something, it is the number of letters: 'life' etc, or maybe referencing that 42 is the 'Answer' to an existential question but the question itself remains unknown. It's like we ask: "What is the number that is the answer to the big question but which big question?".

We might produce a riddle: "Take six from seven to make something; multiply for the meaning; the result is 'the answer'."

But is that a riddle? We can present something like: "Two humble companions cross, one half and one full; together they become the number that guides your quest, but the question remains unseen. Who are they and what is their sum?" But that is a riddle. Maybe we need to explain why 42 is the meaning of life.

Alternatively, we might produce a longer riddle that explains, e.g., we can incorporate that 42 is the only number that ... but we must incorporate explanation.

We can
```
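Both completions above stop inside the analysis channel because `max_new_tokens=500` runs out before the final answer. When generation does finish, the raw text contains harmony-style channel markers (`<|channel|>analysis<|message|>` for the chain of thought, `<|channel|>final<|message|>` for the user-facing answer). A minimal sketch for separating the two, assuming those markers; this helper is ours, not part of the original post:

```python
# Split a decoded gpt-oss completion into its analysis and final channels.
# Assumes harmony-style markers as shown above; returns None for the final
# channel if generation was cut off before reaching it.
def split_channels(text: str):
    analysis, sep, final = text.partition("<|channel|>final<|message|>")
    analysis = analysis.replace("<|channel|>analysis<|message|>", "").strip()
    return analysis, (final.strip() if sep else None)

analysis, final = split_channels(
    tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:])
)
print(final if final is not None else "(final channel not reached)")
```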
