Internally called Tuning-Factory
Parameter documentation: https://llamafactory.readthedocs.io/zh-cn/latest/index.html
Advanced techniques such as acceleration: https://llamafactory.readthedocs.io/zh-cn/latest/advanced/acceleration.html
0. Environment
conda env list
conda remove --name llm --all
conda create -n llm python=3.10
(Important: do not use Python 3.11; see README.md for the recommended versions)
conda activate llm
cd LLaMA-Factory
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple --no-build-isolation
Success.
(
You can also try pinning a release tag, e.g.:
git clone --branch 0.7.1 --depth 1 https://github.com/username/repository.git
pip install llamafactory[metrics]==0.7.1
or specify the tag and download the source archive directly with wget.
A plain git clone pulls whatever the latest dev version is at the time: 0.9.4.dev0.
)
llamafactory-cli version (takes a while)
Errors seen in earlier attempts:
llamafactory-cli version raised ImportError: cannot import name 'logging' from 'huggingface_hub'
from transformers import AutoTokenizer, AutoModelForCausalLM also failed
1. SFT
Downloading the base model
Several options:
- ModelScope model hub: git clone https://www.modelscope.cn/Qwen/Qwen2.5-0.5B-Instruct.git
- Hugging Face Hub
CLI fine-tuning
Enter the LLaMA-Factory repository directory.
For a custom dataset, move the dataset JSON file into the data directory and edit data/dataset_info.json there, adding a key name and the custom dataset file name as its value:
"identity_xuefeng": {"file_name": "identity_xuefeng.json"},
Copy the llama3_lora_sft_awq.yaml file provided under examples/train_qlora, rename it, and edit the following fields (a sketch of the result follows this list):
- model_name_or_path: absolute path of the base model downloaded earlier
- template: llama3 or qwen
- dataset: the key registered in data/dataset_info.json
- output_dir: a relative path such as saves/Qwen2.5-0.5B-Instruct/lora/sft (under the repository directory)
- num_train_epochs
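A minimal sketch of the resulting training YAML, assuming a non-quantized Qwen2.5-0.5B-Instruct base model and the identity_xuefeng dataset registered above (the path and hyperparameter values are illustrative, not copied from the original example file):
### model
model_name_or_path: /path/to/Qwen2.5-0.5B-Instruct   # absolute path of the downloaded base model
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: identity_xuefeng          # key registered in data/dataset_info.json
template: qwen
cutoff_len: 1024
### output
output_dir: saves/Qwen2.5-0.5B-Instruct/lora/sft
logging_steps: 10
save_steps: 500
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true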
llamafactory-cli help
Usage:
  llamafactory-cli api -h: launch an OpenAI-style API server
  llamafactory-cli chat -h: launch a chat interface in CLI
  llamafactory-cli eval -h: evaluate models
  llamafactory-cli export -h: merge LoRA adapters and export model
  llamafactory-cli train -h: train models
  llamafactory-cli webchat -h: launch a chat interface in Web UI
  llamafactory-cli webui: launch LlamaBoard
  llamafactory-cli version: show version info
llamafactory-cli train examples/train_qlora/xuefeng_qwen_lora_sft_awq.yaml
llamafactory-cli version
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_qlora/xuefeng_qwen_lora_sft_awq.yaml
Web UI fine-tuning
llamafactory-cli webui
export USE_MODELSCOPE_HUB=1 && llamafactory-cli webui
CUDA_VISIBLE_DEVICES=0 USE_MODELSCOPE_HUB=1 llamafactory-cli webui
2. Inference
Direct inference with the base model
/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/examples/inference/llama3_lora_sft.yaml
Copy the llama3_lora_sft.yaml file provided under examples/inference, rename it, and edit the following fields (a sketch follows this list):
- model_name_or_path: absolute path of the base model downloaded earlier
- template: llama3 or qwen
- adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft (the SFT output path; comment it out to use only the base model)
- infer_backend: huggingface # choices: [huggingface, vllm, sglang]
- trust_remote_code: true
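A minimal sketch of the inference YAML under the same assumptions (the base-model path is a placeholder; uncomment adapter_name_or_path to chat with the SFT adapter):
model_name_or_path: /path/to/Qwen2.5-0.5B-Instruct    # absolute path of the base model
# adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft   # SFT output; leave commented for base-model-only inference
template: qwen                   # or llama3
infer_backend: huggingface       # choices: [huggingface, vllm, sglang]
trust_remote_code: true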
llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
Error 1:
MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443)
Fix: vim ~/.bashrc and add
export HF_ENDPOINT=https://hf-mirror.com
source ~/.bashrc
conda activate llm
Error 2: a LOCAL_RANK error; this one took quite a while to resolve.
export LOCAL_RANK=0
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
Fix: the transformers 4.52.4 installed by pip install -e . triggers the LOCAL_RANK error above; downgrade to transformers==4.51.3, even though requirements.txt claims the whole range should work:
transformers>=4.49.0,<=4.52.4,!=4.52.0; sys_platform != 'darwin'
transformers>=4.49.0,<=4.51.3,!=4.52.0; sys_platform == 'darwin'
Downgrading transformers alone is not enough (it still errors); the command also has to change. The command that finally runs successfully is:
LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12346 WORLD_SIZE=1 RANK=0 CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
In fact, the minimal working form is:
LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12345 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
You can also export these instead:
export LOCAL_RANK=0
export MASTER_ADDR=127.0.0.1
At this point the pipeline of downloading a model and running direct inference on it (without the SFT adapter) works end to end.
Inference after SFT
However, inference after (Q)LoRA SFT still fails with:
RuntimeError: aten.add_.Tensor: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
Regardless of whether finetuning_type: lora is present in the inference config or commented out.
Full traceback:
[INFO|2025-07-12 18:58:51] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/bin/llamafactory-cli", line 8, in <module>
[rank0]: sys.exit(main())
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/cli.py", line 151, in main
[rank0]: COMMAND_MAP[command]()
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 154, in run_chat
[rank0]: chat_model = ChatModel()
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 53, in __init__
[rank0]: self.engine: BaseEngine = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 59, in __init__
[rank0]: self.model = load_model(
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/loader.py", line 184, in load_model
[rank0]: model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/adapter.py", line 300, in init_adapter
[rank0]: model = _setup_lora_tuning(
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/adapter.py", line 184, in _setup_lora_tuning
[rank0]: model = model.merge_and_unload()
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 900, in merge_and_unload
[rank0]: return self._unload_and_optionally_merge(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 531, in _unload_and_optionally_merge
[rank0]: target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 617, in merge
[rank0]: base_layer.weight.data += delta_weight
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/_compile.py", line 51, in inner
[rank0]: return disable_fn(*args, **kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_api.py", line 344, in __torch_dispatch__
[rank0]: return DTensor._op_dispatcher.dispatch(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
[rank0]: op_info = self.unwrap_to_op_info(op_call, args, kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 366, in unwrap_to_op_info
[rank0]: self._try_replicate_spec_for_scalar_tensor(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 468, in _try_replicate_spec_for_scalar_tensor
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: aten.add_.Tensor: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
[rank0]:[W712 18:58:52.262922106 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Workaround: first merge the adapter with export, then run inference on the merged model. This works!! (The SFT train command before the merge succeeds whether or not LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12345 is set.)
merge_lora.yaml contents (as used here this is the post-merge inference config: model_name_or_path points at the merged model and the adapter line stays commented out; the export config that produces the merge is sketched after this block):
model_name_or_path: merge/Qwen2.5-0.5B-Instruct/identity_xuefeng
# model_name_or_path: /home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/offline_models/qwen/Qwen2.5-0.5B-Instruct
# model_name_or_path: /home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/offline_models/llama/Llama-3.2-1B-Instruct
# adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft
template: qwen # llama3
infer_backend: huggingface # choices: [huggingface, vllm, sglang]
trust_remote_code: true
# finetuning_type: lora
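For reference, the merge step itself is driven by llamafactory-cli export with its own config. A minimal sketch modeled on the examples/merge_lora templates; the file name xuefeng_merge_lora.yaml and all paths are assumptions, and the export_* values are typical defaults:
### model
model_name_or_path: /path/to/Qwen2.5-0.5B-Instruct           # original full-precision base model
adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft   # SFT LoRA output
template: qwen
trust_remote_code: true
### export
export_dir: merge/Qwen2.5-0.5B-Instruct/identity_xuefeng     # merged model, referenced by the inference config above
export_size: 5
export_device: cpu
export_legacy_format: false
Run the merge with:
llamafactory-cli export examples/merge_lora/xuefeng_merge_lora.yaml
Note that LoRA adapters generally cannot be merged into a quantized (AWQ/GPTQ) base model, so the merge should point at the full-precision weights.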
3. Fine-tuning datasets
LLM dataset formats fall into two families: sharegpt and alpaca.
Alpaca format
General instruction fine-tuning:
{
  "instruction": "将以下中文翻译成英文",
  "input": "今天的天气非常好",
  "output": "The weather is very nice today."
}
instruction: the explicit task instruction (required)
input: the task input (may be empty)
output: the expected output (required)
ShareGPT format
Multi-turn dialogue:
{"id": "chatcmpl-7F6Wr8JQ6JgB","conversations": [{"from": "human", "value": "Python里如何快速排序列表?"},{"from": "gpt", "value": "可以使用sorted()函数..."},{"from": "human", "value": "时间复杂度是多少?"}]
}
def alpaca_to_sharegpt(alpaca_data):
    # Convert one alpaca-format sample into sharegpt format:
    # instruction + input become the human turn, output becomes the gpt turn.
    return {
        "conversations": [
            {"from": "human", "value": f"{alpaca_data['instruction']}\n{alpaca_data['input']}"},
            {"from": "gpt", "value": alpaca_data["output"]},
        ]
    }
Easy Dataset
A tool for building LLM fine-tuning datasets.
It processes raw text and generates datasets in alpaca or sharegpt format.
https://github.com/ConardLi/easy-dataset
You can feed it arbitrary text and it automatically generates questions and the corresponding answers.
Pipeline: document processing → question generation → answer construction → tag management → format export.
git clone https://www.modelscope.cn/datasets/xiaofengalg/Chinese-medical-dialogue.git
Go to the ModelScope dataset page and download the data to LLaMA-Factory/data/xxx.json.
"custom_sft_train_data":{
"file_name":"Chinese-medical-dialogue/data/train_0001_of_0001.json",
"columns":{
"prompt":"instruction",
"query":"input",
"response":"output"}
},
Write the registration entry according to the dataset's format and add it to LLaMA-Factory/data/dataset_info.json.
If the dataset is already in sharegpt format:
"data_name":{
"file_name":"xx/xxx/xx.json",
"formatting": "sharegpt"
},
4. Directory structure and model checkpoint layout
Repository layout
The LLaMA-Factory project directory structure: below is a brief introduction to the more important files and folders, to give an overview of the framework.
- Folders
  - assets
    - Purpose: typically holds static project resources such as images, stylesheets, and JavaScript files.
    - Notes: these resources may be used for the front end or the user interface.
  - data
    - Purpose: holds datasets, configuration files, and other data-related files. (Downloaded fine-tuning datasets go here.)
    - Notes: may include training data, test data, or model configuration info.
  - docker
    - Purpose: Docker-related configuration files and scripts for containerized deployment.
    - Notes: these files automate deployment and ensure consistency across environments.
  - evaluation
    - Purpose: scripts and tools for evaluating model performance.
    - Notes: used to measure accuracy and other metrics.
  - examples
    - Purpose: example code and use cases to help users get started quickly. (The fine-tuning/training config files live here.)
    - Notes: these examples show how to use the project's features.
  - scripts
    - Purpose: miscellaneous scripts for automation and auxiliary tasks.
    - Notes: may include data preprocessing, model training, etc.
  - src
    - Purpose: the project's source code.
    - Notes: this is where the core code lives.
  - tests
    - Purpose: test code to verify the correctness of project functionality.
    - Notes: these tests ensure code quality and stability.
Output files after fine-tuning/training
The model output directory contains:
- config.json: model configuration file (architecture, parameters, etc.)
- generation_config.json: generation-time configuration
- merges.txt: the tokenizer's merge-rules file, used to combine subwords into full tokens
- model.safetensors: model weights in the safetensors binary format; large models may be split into multiple shard files
- optimizer.pt: (largest file) presumably the optimizer state
- scheduler.pt: learning-rate scheduler state
- tokenizer_config.json: tokenizer configuration
- tokenizer.json: tokenizer data
- vocab.json: vocabulary