Internally called Tuning-Factory
Parameter documentation: https://llamafactory.readthedocs.io/zh-cn/latest/index.html
Advanced techniques such as acceleration: https://llamafactory.readthedocs.io/zh-cn/latest/advanced/acceleration.html
0. Environment
conda env list
conda remove --name llm --all
conda create -n llm python=3.10
(Important: do not use Python 3.11; see README.md for the recommended versions)
conda activate llm
cd LLaMA-Factory
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple --no-build-isolation
Success.
(
You can also try pinning a release tag, e.g.:
git clone --branch 0.7.1 --depth 1 https://github.com/username/repository.git
pip install llamafactory[metrics]==0.7.1
or specify the tag and download the source archive directly with wget.
A plain git clone pulls whatever the latest dev version is at the time: 0.9.4.dev0.
)
llamafactory-cli version (takes a while)
Errors seen in earlier attempts:
llamafactory-cli version raised ImportError: cannot import name 'logging' from 'huggingface_hub'
from transformers import AutoTokenizer, AutoModelForCausalLM also failed
1. SFT
Downloading the base model
Several options:
- ModelScope model hub: git clone https://www.modelscope.cn/Qwen/Qwen2.5-0.5B-Instruct.git
- Hugging Face Hub
CLI fine-tuning
Enter the LLaMA-Factory repository directory.
For a custom dataset, move the dataset JSON file into the data directory and edit data/dataset_info.json there, adding a key name and the custom dataset file name as its value:
"identity_xuefeng": {"file_name": "identity_xuefeng.json"},
Copy the llama3_lora_sft_awq.yaml file provided under examples/train_qlora, rename it, and edit the following fields (a sketch of the result follows this list):
- model_name_or_path: absolute path of the base model downloaded earlier
- template: llama3 or qwen
- dataset: the key registered in data/dataset_info.json
- output_dir: a relative path such as saves/Qwen2.5-0.5B-Instruct/lora/sft (under the repository directory)
- num_train_epochs
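A minimal sketch of the resulting training YAML, assuming a non-quantized Qwen2.5-0.5B-Instruct base model and the identity_xuefeng dataset registered above (the path and hyperparameter values are illustrative, not copied from the original example file):
### model
model_name_or_path: /path/to/Qwen2.5-0.5B-Instruct   # absolute path of the downloaded base model
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: identity_xuefeng          # key registered in data/dataset_info.json
template: qwen
cutoff_len: 1024
### output
output_dir: saves/Qwen2.5-0.5B-Instruct/lora/sft
logging_steps: 10
save_steps: 500
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true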
llamafactory-cli help
Usage:
  llamafactory-cli api -h: launch an OpenAI-style API server
  llamafactory-cli chat -h: launch a chat interface in CLI
  llamafactory-cli eval -h: evaluate models
  llamafactory-cli export -h: merge LoRA adapters and export model
  llamafactory-cli train -h: train models
  llamafactory-cli webchat -h: launch a chat interface in Web UI
  llamafactory-cli webui: launch LlamaBoard
  llamafactory-cli version: show version info
llamafactory-cli train examples/train_qlora/xuefeng_qwen_lora_sft_awq.yaml
llamafactory-cli version
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_qlora/xuefeng_qwen_lora_sft_awq.yaml
Web UI fine-tuning
llamafactory-cli webui
export USE_MODELSCOPE_HUB=1 && llamafactory-cli webui
CUDA_VISIBLE_DEVICES=0 USE_MODELSCOPE_HUB=1 llamafactory-cli webui
2. Inference
Direct inference with the base model
/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/examples/inference/llama3_lora_sft.yaml
Copy the llama3_lora_sft.yaml file provided under examples/inference, rename it, and edit the following fields (a sketch follows this list):
- model_name_or_path: absolute path of the base model downloaded earlier
- template: llama3 or qwen
- adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft (the SFT output path; comment it out to use only the base model)
- infer_backend: huggingface # choices: [huggingface, vllm, sglang]
- trust_remote_code: true
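A minimal sketch of the inference YAML under the same assumptions (the base-model path is a placeholder; uncomment adapter_name_or_path to chat with the SFT adapter):
model_name_or_path: /path/to/Qwen2.5-0.5B-Instruct    # absolute path of the base model
# adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft   # SFT output; leave commented for base-model-only inference
template: qwen                   # or llama3
infer_backend: huggingface       # choices: [huggingface, vllm, sglang]
trust_remote_code: true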
llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
Error 1:
MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443)
Fix: vim ~/.bashrc and add
export HF_ENDPOINT=https://hf-mirror.com
source ~/.bashrc
conda activate llm
Error 2: a LOCAL_RANK error; this one took quite a while to resolve.
export LOCAL_RANK=0
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
Fix: the transformers 4.52.4 installed by pip install -e . triggers the LOCAL_RANK error above; downgrade to transformers==4.51.3, even though requirements.txt claims the whole range should work:
transformers>=4.49.0,<=4.52.4,!=4.52.0; sys_platform != 'darwin'
transformers>=4.49.0,<=4.51.3,!=4.52.0; sys_platform == 'darwin'
Downgrading transformers alone is not enough (it still errors); the command also has to change. The command that finally runs successfully is:
LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12346 WORLD_SIZE=1 RANK=0 CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
In fact, the minimal working form is:
LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12345 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
You can also export these instead:
export LOCAL_RANK=0
export MASTER_ADDR=127.0.0.1
At this point the pipeline of downloading a model and running direct inference on it (without the SFT adapter) works end to end.
Inference after SFT
However, inference after (Q)LoRA SFT still fails with:
RuntimeError: aten.add_.Tensor: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
Regardless of whether finetuning_type: lora is present in the inference config or commented out.
Full traceback:
[INFO|2025-07-12 18:58:51] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/bin/llamafactory-cli", line 8, in <module>
[rank0]: sys.exit(main())
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/cli.py", line 151, in main
[rank0]: COMMAND_MAP[command]()
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 154, in run_chat
[rank0]: chat_model = ChatModel()
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 53, in __init__
[rank0]: self.engine: BaseEngine = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 59, in __init__
[rank0]: self.model = load_model(
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/loader.py", line 184, in load_model
[rank0]: model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/adapter.py", line 300, in init_adapter
[rank0]: model = _setup_lora_tuning(
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/adapter.py", line 184, in _setup_lora_tuning
[rank0]: model = model.merge_and_unload()
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 900, in merge_and_unload
[rank0]: return self._unload_and_optionally_merge(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 531, in _unload_and_optionally_merge
[rank0]: target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 617, in merge
[rank0]: base_layer.weight.data += delta_weight
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/_compile.py", line 51, in inner
[rank0]: return disable_fn(*args, **kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_api.py", line 344, in __torch_dispatch__
[rank0]: return DTensor._op_dispatcher.dispatch(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
[rank0]: op_info = self.unwrap_to_op_info(op_call, args, kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 366, in unwrap_to_op_info
[rank0]: self._try_replicate_spec_for_scalar_tensor(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 468, in _try_replicate_spec_for_scalar_tensor
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: aten.add_.Tensor: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
[rank0]:[W712 18:58:52.262922106 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Workaround: first merge the adapter with export, then run inference on the merged model. This works!! (The SFT train command before the merge succeeds whether or not LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12345 is set.)
merge_lora.yaml contents (as used here this is the post-merge inference config: model_name_or_path points at the merged model and the adapter line stays commented out; the export config that produces the merge is sketched after this block):
model_name_or_path: merge/Qwen2.5-0.5B-Instruct/identity_xuefeng
# model_name_or_path: /home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/offline_models/qwen/Qwen2.5-0.5B-Instruct
# model_name_or_path: /home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/offline_models/llama/Llama-3.2-1B-Instruct
# adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft
template: qwen # llama3
infer_backend: huggingface # choices: [huggingface, vllm, sglang]
trust_remote_code: true
# finetuning_type: lora
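For reference, the merge step itself is driven by llamafactory-cli export with its own config. A minimal sketch modeled on the examples/merge_lora templates; the file name xuefeng_merge_lora.yaml and all paths are assumptions, and the export_* values are typical defaults:
### model
model_name_or_path: /path/to/Qwen2.5-0.5B-Instruct           # original full-precision base model
adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft   # SFT LoRA output
template: qwen
trust_remote_code: true
### export
export_dir: merge/Qwen2.5-0.5B-Instruct/identity_xuefeng     # merged model, referenced by the inference config above
export_size: 5
export_device: cpu
export_legacy_format: false
Run the merge with:
llamafactory-cli export examples/merge_lora/xuefeng_merge_lora.yaml
Note that LoRA adapters generally cannot be merged into a quantized (AWQ/GPTQ) base model, so the merge should point at the full-precision weights.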
3. Fine-tuning datasets
LLM dataset formats fall into two families: sharegpt and alpaca.
Alpaca format
General instruction fine-tuning:
{
  "instruction": "将以下中文翻译成英文",
  "input": "今天的天气非常好",
  "output": "The weather is very nice today."
}
instruction: the explicit task instruction (required)
input: the task input (may be empty)
output: the expected output (required)
ShareGPT format
Multi-turn dialogue:
{"id": "chatcmpl-7F6Wr8JQ6JgB","conversations": [{"from": "human", "value": "Python里如何快速排序列表?"},{"from": "gpt", "value": "可以使用sorted()函数..."},{"from": "human", "value": "时间复杂度是多少?"}]
}
def alpaca_to_sharegpt(alpaca_data):
    # Convert one alpaca-format sample into sharegpt format:
    # instruction + input become the human turn, output becomes the gpt turn.
    return {
        "conversations": [
            {"from": "human", "value": f"{alpaca_data['instruction']}\n{alpaca_data['input']}"},
            {"from": "gpt", "value": alpaca_data["output"]},
        ]
    }
Easy Dataset
A tool for building LLM fine-tuning datasets.
It processes raw text and generates datasets in alpaca or sharegpt format.
https://github.com/ConardLi/easy-dataset
You can feed it arbitrary text and it automatically generates questions and the corresponding answers.
Pipeline: document processing → question generation → answer construction → tag management → format export.
git clone https://www.modelscope.cn/datasets/xiaofengalg/Chinese-medical-dialogue.git
Go to the ModelScope dataset page and download the data to LLaMA-Factory/data/xxx.json.
"custom_sft_train_data":{
"file_name":"Chinese-medical-dialogue/data/train_0001_of_0001.json",
"columns":{
"prompt":"instruction",
"query":"input",
"response":"output"}
},
Write the registration entry according to the dataset's format and add it to LLaMA-Factory/data/dataset_info.json.
If the dataset is already in sharegpt format:
"data_name":{
"file_name":"xx/xxx/xx.json",
"formatting": "sharegpt"
},
4. Directory structure and model checkpoint layout
Repository layout
The LLaMA-Factory project directory structure: below is a brief introduction to the more important files and folders, to give an overview of the framework.
- Folders
  - assets
    - Purpose: typically holds static project resources such as images, stylesheets, and JavaScript files.
    - Notes: these resources may be used for the front end or the user interface.
  - data
    - Purpose: holds datasets, configuration files, and other data-related files. (Downloaded fine-tuning datasets go here.)
    - Notes: may include training data, test data, or model configuration info.
  - docker
    - Purpose: Docker-related configuration files and scripts for containerized deployment.
    - Notes: these files automate deployment and ensure consistency across environments.
  - evaluation
    - Purpose: scripts and tools for evaluating model performance.
    - Notes: used to measure accuracy and other metrics.
  - examples
    - Purpose: example code and use cases to help users get started quickly. (The fine-tuning/training config files live here.)
    - Notes: these examples show how to use the project's features.
  - scripts
    - Purpose: miscellaneous scripts for automation and auxiliary tasks.
    - Notes: may include data preprocessing, model training, etc.
  - src
    - Purpose: the project's source code.
    - Notes: this is where the core code lives.
  - tests
    - Purpose: test code to verify the correctness of project functionality.
    - Notes: these tests ensure code quality and stability.
Output files after fine-tuning/training
The model output directory contains:
- config.json: model configuration file (architecture, parameters, etc.)
- generation_config.json: generation-time configuration
- merges.txt: the tokenizer's merge-rules file, used to combine subwords into full tokens
- model.safetensors: model weights in the safetensors binary format; large models may be split into multiple shard files
- optimizer.pt: (largest file) presumably the optimizer state
- scheduler.pt: learning-rate scheduler state
- tokenizer_config.json: tokenizer configuration
- tokenizer.json: tokenizer data
- vocab.json: vocabulary