[GPT Primer] Lesson 43: Fine-Tuning Llama3 with LLaMA-Factory

  • 1. Environment preparation
  • 2. Downloading the base model
  • 3. Deploying and launching LLaMA-Factory
  • 4. Retraining

1. Environment preparation

Rent an AutoDL server with a 24 GB GPU (RTX 3090); it costs roughly 2 RMB per hour.
(screenshot)
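Once the instance is up, it is worth confirming that the GPU is visible before installing anything. This check is not part of the original write-up, just a generic sanity step:

# should report a single RTX 3090 with roughly 24 GB of memory
nvidia-smi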

2. Downloading the base model

Download the shenzhi-wang/Llama3-8B-Chinese-Chat model:

source /etc/network_turbo
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download shenzhi-wang/Llama3-8B-Chinese-Chat --local-dir /root/autodl-tmp/models/Llama3-8B-Chinese-Chat

Notes:
Configure the domestic Hugging Face mirror:
export HF_ENDPOINT=https://hf-mirror.com
Enable AutoDL's academic network acceleration:
source /etc/network_turbo
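If the download succeeds, the weights end up under /root/autodl-tmp/models/Llama3-8B-Chinese-Chat. A quick check before moving on (generic shell, not part of the original steps):

# the directory should contain config.json, tokenizer files and the .safetensors shards
ls -lh /root/autodl-tmp/models/Llama3-8B-Chinese-Chat
# an 8B model in bf16 occupies on the order of 15-16 GB on disk
du -sh /root/autodl-tmp/models/Llama3-8B-Chinese-Chat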

3. Deploying and launching LLaMA-Factory

  • Official repository
    https://github.com/hiyouga/LLaMA-Factory

Clone LLaMA-Factory onto the data disk:

cd /root/autodl-tmp
  • Clone and install
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .
  • Launch the web UI
llamafactory-cli webui
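The web UI is a Gradio app, which by default listens on port 7860. Since the AutoDL instance is remote, one common way to reach it (an assumption about your setup, not something the original post covers) is to forward the port over SSH from your local machine and then open http://127.0.0.1:7860 in a local browser:

# run on your LOCAL machine; the host and SSH port come from the AutoDL console
ssh -L 7860:127.0.0.1:7860 -p <ssh-port> root@<instance-host>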

(screenshot)
Configure the model path, dataset, and training hyperparameters in the UI.

The UI can preview the full training command it will run; click Start to begin training (a sketch of an equivalent CLI command follows the screenshot below).
(screenshot)
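For reference, what the web UI previews is an ordinary llamafactory-cli train invocation. The following is only a sketch of roughly what it looks like for this run, assuming LoRA SFT with the hyperparameters visible in the training log further down (per-device batch size 2, gradient accumulation 8, 10 epochs, initial learning rate 5e-5); the dataset name and output directory are placeholders, and the exact flags depend on your UI selections:

llamafactory-cli train \
    --stage sft \
    --do_train true \
    --model_name_or_path /root/autodl-tmp/models/Llama3-8B-Chinese-Chat \
    --dataset <your_dataset> \
    --template llama3 \
    --finetuning_type lora \
    --lora_target all \
    --output_dir saves/Llama-3-8B-Instruct/lora/train_demo \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --learning_rate 5e-5 \
    --num_train_epochs 10 \
    --bf16 true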

Training loss:
(screenshot)
Training results:
(screenshot)

4. Retraining
(screenshot)

Configuration changes:
(screenshot)

  • Monitor memory and GPU usage with nvitop (installable via pip install nvitop)
    During training, GPU memory and compute are fully utilized.
    (screenshot)
  • Training progress
    (screenshot)
  • Training output (log excerpt below; the beginning of the excerpt is truncated)
null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}[INFO|2025-08-10 17:41:53] logging.py:143 >> KV cache is disabled during training.
[INFO|2025-08-10 17:41:53] modeling_utils.py:1305 >> loading weights file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/model.safetensors.index.json
[INFO|2025-08-10 17:41:53] modeling_utils.py:2411 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|2025-08-10 17:41:53] configuration_utils.py:1098 >> Generate config GenerationConfig {"bos_token_id": 128000,"eos_token_id": 128009,"use_cache": false
}
[INFO|2025-08-10 17:41:55] modeling_utils.py:5606 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|2025-08-10 17:41:55] modeling_utils.py:5614 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|2025-08-10 17:41:55] configuration_utils.py:1051 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/generation_config.json
[INFO|2025-08-10 17:41:55] configuration_utils.py:1098 >> Generate config GenerationConfig {"bos_token_id": 128000,"eos_token_id": 128009,"pad_token_id": 128009
}
[INFO|2025-08-10 17:41:55] logging.py:143 >> Gradient checkpointing enabled.
[INFO|2025-08-10 17:41:55] logging.py:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-08-10 17:41:55] logging.py:143 >> Upcasting trainable params to float32.
[INFO|2025-08-10 17:41:55] logging.py:143 >> Fine-tuning method: LoRA
[INFO|2025-08-10 17:41:55] logging.py:143 >> Found linear modules: k_proj,gate_proj,o_proj,down_proj,v_proj,up_proj,q_proj
[INFO|2025-08-10 17:41:55] logging.py:143 >> trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
[INFO|2025-08-10 17:41:56] trainer.py:757 >> Using auto half precision backend
[INFO|2025-08-10 17:41:56] trainer.py:2433 >> ***** Running training *****
[INFO|2025-08-10 17:41:56] trainer.py:2434 >>   Num examples = 481
[INFO|2025-08-10 17:41:56] trainer.py:2435 >>   Num Epochs = 10
[INFO|2025-08-10 17:41:56] trainer.py:2436 >>   Instantaneous batch size per device = 2
[INFO|2025-08-10 17:41:56] trainer.py:2439 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|2025-08-10 17:41:56] trainer.py:2440 >>   Gradient Accumulation steps = 8
[INFO|2025-08-10 17:41:56] trainer.py:2441 >>   Total optimization steps = 310
[INFO|2025-08-10 17:41:56] trainer.py:2442 >>   Number of trainable parameters = 20,971,520
[INFO|2025-08-10 17:42:46] logging.py:143 >> {'loss': 1.2718, 'learning_rate': 4.9979e-05, 'epoch': 0.17, 'throughput': 973.44}
[INFO|2025-08-10 17:43:33] logging.py:143 >> {'loss': 1.1509, 'learning_rate': 4.9896e-05, 'epoch': 0.33, 'throughput': 977.09}
[INFO|2025-08-10 17:44:25] logging.py:143 >> {'loss': 1.1788, 'learning_rate': 4.9749e-05, 'epoch': 0.50, 'throughput': 973.40}
[INFO|2025-08-10 17:45:15] logging.py:143 >> {'loss': 1.1129, 'learning_rate': 4.9538e-05, 'epoch': 0.66, 'throughput': 972.88}
[INFO|2025-08-10 17:46:07] logging.py:143 >> {'loss': 1.0490, 'learning_rate': 4.9264e-05, 'epoch': 0.83, 'throughput': 973.34}
[INFO|2025-08-10 17:46:52] logging.py:143 >> {'loss': 1.0651, 'learning_rate': 4.8928e-05, 'epoch': 1.00, 'throughput': 970.40}
[INFO|2025-08-10 17:47:32] logging.py:143 >> {'loss': 1.0130, 'learning_rate': 4.8531e-05, 'epoch': 1.13, 'throughput': 968.83}
[INFO|2025-08-10 17:48:25] logging.py:143 >> {'loss': 1.0207, 'learning_rate': 4.8073e-05, 'epoch': 1.30, 'throughput': 968.68}
[INFO|2025-08-10 17:49:20] logging.py:143 >> {'loss': 0.9748, 'learning_rate': 4.7556e-05, 'epoch': 1.46, 'throughput': 968.56}
[INFO|2025-08-10 17:50:18] logging.py:143 >> {'loss': 0.9706, 'learning_rate': 4.6980e-05, 'epoch': 1.63, 'throughput': 969.27}
[INFO|2025-08-10 17:51:10] logging.py:143 >> {'loss': 0.8635, 'learning_rate': 4.6349e-05, 'epoch': 1.80, 'throughput': 968.72}
[INFO|2025-08-10 17:51:52] logging.py:143 >> {'loss': 0.8789, 'learning_rate': 4.5663e-05, 'epoch': 1.96, 'throughput': 967.37}
[INFO|2025-08-10 17:52:26] logging.py:143 >> {'loss': 0.7969, 'learning_rate': 4.4923e-05, 'epoch': 2.10, 'throughput': 966.36}
[INFO|2025-08-10 17:53:13] logging.py:143 >> {'loss': 0.8251, 'learning_rate': 4.4133e-05, 'epoch': 2.27, 'throughput': 965.43}
[INFO|2025-08-10 17:54:01] logging.py:143 >> {'loss': 0.8170, 'learning_rate': 4.3293e-05, 'epoch': 2.43, 'throughput': 965.72}
[INFO|2025-08-10 17:54:52] logging.py:143 >> {'loss': 0.7979, 'learning_rate': 4.2407e-05, 'epoch': 2.60, 'throughput': 965.57}
[INFO|2025-08-10 17:55:41] logging.py:143 >> {'loss': 0.8711, 'learning_rate': 4.1476e-05, 'epoch': 2.76, 'throughput': 966.39}
[INFO|2025-08-10 17:56:35] logging.py:143 >> {'loss': 0.8219, 'learning_rate': 4.0502e-05, 'epoch': 2.93, 'throughput': 966.59}
[INFO|2025-08-10 17:57:19] logging.py:143 >> {'loss': 0.8487, 'learning_rate': 3.9489e-05, 'epoch': 3.07, 'throughput': 967.22}
[INFO|2025-08-10 17:58:06] logging.py:143 >> {'loss': 0.6639, 'learning_rate': 3.8438e-05, 'epoch': 3.23, 'throughput': 967.08}
[INFO|2025-08-10 17:58:06] trainer.py:4408 >> 
***** Running Evaluation *****
[INFO|2025-08-10 17:58:06] trainer.py:4410 >>   Num examples = 10
[INFO|2025-08-10 17:58:06] trainer.py:4413 >>   Batch size = 2
[INFO|2025-08-10 17:58:08] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-100
[INFO|2025-08-10 17:58:08] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 17:58:08] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 17:58:08] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-100/chat_template.jinja
[INFO|2025-08-10 17:58:08] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-100/tokenizer_config.json
[INFO|2025-08-10 17:58:08] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-100/special_tokens_map.json
[INFO|2025-08-10 17:58:57] logging.py:143 >> {'loss': 0.7453, 'learning_rate': 3.7353e-05, 'epoch': 3.40, 'throughput': 964.85}
[INFO|2025-08-10 17:59:43] logging.py:143 >> {'loss': 0.7854, 'learning_rate': 3.6237e-05, 'epoch': 3.56, 'throughput': 964.81}
[INFO|2025-08-10 18:00:31] logging.py:143 >> {'loss': 0.7068, 'learning_rate': 3.5091e-05, 'epoch': 3.73, 'throughput': 964.43}
[INFO|2025-08-10 18:01:27] logging.py:143 >> {'loss': 0.7155, 'learning_rate': 3.3920e-05, 'epoch': 3.90, 'throughput': 964.82}
[INFO|2025-08-10 18:02:05] logging.py:143 >> {'loss': 0.6858, 'learning_rate': 3.2725e-05, 'epoch': 4.03, 'throughput': 964.73}
[INFO|2025-08-10 18:02:52] logging.py:143 >> {'loss': 0.6058, 'learning_rate': 3.1511e-05, 'epoch': 4.20, 'throughput': 964.82}
[INFO|2025-08-10 18:03:41] logging.py:143 >> {'loss': 0.6702, 'learning_rate': 3.0280e-05, 'epoch': 4.37, 'throughput': 964.40}
[INFO|2025-08-10 18:04:29] logging.py:143 >> {'loss': 0.6454, 'learning_rate': 2.9036e-05, 'epoch': 4.53, 'throughput': 964.05}
[INFO|2025-08-10 18:05:23] logging.py:143 >> {'loss': 0.6122, 'learning_rate': 2.7781e-05, 'epoch': 4.70, 'throughput': 964.45}
[INFO|2025-08-10 18:06:15] logging.py:143 >> {'loss': 0.6598, 'learning_rate': 2.6519e-05, 'epoch': 4.86, 'throughput': 964.33}
[INFO|2025-08-10 18:06:55] logging.py:143 >> {'loss': 0.5058, 'learning_rate': 2.5253e-05, 'epoch': 5.00, 'throughput': 964.34}
[INFO|2025-08-10 18:07:57] logging.py:143 >> {'loss': 0.6128, 'learning_rate': 2.3987e-05, 'epoch': 5.17, 'throughput': 964.73}
[INFO|2025-08-10 18:08:49] logging.py:143 >> {'loss': 0.5506, 'learning_rate': 2.2723e-05, 'epoch': 5.33, 'throughput': 965.04}
[INFO|2025-08-10 18:09:33] logging.py:143 >> {'loss': 0.4599, 'learning_rate': 2.1465e-05, 'epoch': 5.50, 'throughput': 964.54}
[INFO|2025-08-10 18:10:21] logging.py:143 >> {'loss': 0.5927, 'learning_rate': 2.0216e-05, 'epoch': 5.66, 'throughput': 964.48}
[INFO|2025-08-10 18:11:14] logging.py:143 >> {'loss': 0.5014, 'learning_rate': 1.8979e-05, 'epoch': 5.83, 'throughput': 964.93}
[INFO|2025-08-10 18:11:58] logging.py:143 >> {'loss': 0.5629, 'learning_rate': 1.7758e-05, 'epoch': 6.00, 'throughput': 965.11}
[INFO|2025-08-10 18:12:35] logging.py:143 >> {'loss': 0.3429, 'learning_rate': 1.6555e-05, 'epoch': 6.13, 'throughput': 964.97}
[INFO|2025-08-10 18:13:27] logging.py:143 >> {'loss': 0.5253, 'learning_rate': 1.5374e-05, 'epoch': 6.30, 'throughput': 965.11}
[INFO|2025-08-10 18:14:15] logging.py:143 >> {'loss': 0.4926, 'learning_rate': 1.4218e-05, 'epoch': 6.46, 'throughput': 965.22}
[INFO|2025-08-10 18:14:15] trainer.py:4408 >> 
***** Running Evaluation *****
[INFO|2025-08-10 18:14:15] trainer.py:4410 >>   Num examples = 10
[INFO|2025-08-10 18:14:15] trainer.py:4413 >>   Batch size = 2
[INFO|2025-08-10 18:14:18] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-200
[INFO|2025-08-10 18:14:18] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 18:14:18] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 18:14:18] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-200/chat_template.jinja
[INFO|2025-08-10 18:14:18] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-200/tokenizer_config.json
[INFO|2025-08-10 18:14:18] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-200/special_tokens_map.json
[INFO|2025-08-10 18:15:09] logging.py:143 >> {'loss': 0.4487, 'learning_rate': 1.3090e-05, 'epoch': 6.63, 'throughput': 963.77}
[INFO|2025-08-10 18:16:01] logging.py:143 >> {'loss': 0.5197, 'learning_rate': 1.1992e-05, 'epoch': 6.80, 'throughput': 964.14}
[INFO|2025-08-10 18:16:48] logging.py:143 >> {'loss': 0.4752, 'learning_rate': 1.0927e-05, 'epoch': 6.96, 'throughput': 964.07}
[INFO|2025-08-10 18:17:29] logging.py:143 >> {'loss': 0.4830, 'learning_rate': 9.8985e-06, 'epoch': 7.10, 'throughput': 964.03}
[INFO|2025-08-10 18:18:11] logging.py:143 >> {'loss': 0.4122, 'learning_rate': 8.9088e-06, 'epoch': 7.27, 'throughput': 963.90}
[INFO|2025-08-10 18:19:01] logging.py:143 >> {'loss': 0.3856, 'learning_rate': 7.9603e-06, 'epoch': 7.43, 'throughput': 963.90}
[INFO|2025-08-10 18:19:49] logging.py:143 >> {'loss': 0.4221, 'learning_rate': 7.0557e-06, 'epoch': 7.60, 'throughput': 963.93}
[INFO|2025-08-10 18:20:43] logging.py:143 >> {'loss': 0.4601, 'learning_rate': 6.1970e-06, 'epoch': 7.76, 'throughput': 963.97}
[INFO|2025-08-10 18:21:38] logging.py:143 >> {'loss': 0.4670, 'learning_rate': 5.3867e-06, 'epoch': 7.93, 'throughput': 964.27}
[INFO|2025-08-10 18:22:17] logging.py:143 >> {'loss': 0.3538, 'learning_rate': 4.6267e-06, 'epoch': 8.07, 'throughput': 964.06}
[INFO|2025-08-10 18:23:05] logging.py:143 >> {'loss': 0.3823, 'learning_rate': 3.9190e-06, 'epoch': 8.23, 'throughput': 963.94}
[INFO|2025-08-10 18:23:58] logging.py:143 >> {'loss': 0.3467, 'learning_rate': 3.2654e-06, 'epoch': 8.40, 'throughput': 963.98}
[INFO|2025-08-10 18:24:45] logging.py:143 >> {'loss': 0.4004, 'learning_rate': 2.6676e-06, 'epoch': 8.56, 'throughput': 963.92}
[INFO|2025-08-10 18:25:33] logging.py:143 >> {'loss': 0.4647, 'learning_rate': 2.1271e-06, 'epoch': 8.73, 'throughput': 963.86}
[INFO|2025-08-10 18:26:28] logging.py:143 >> {'loss': 0.3902, 'learning_rate': 1.6454e-06, 'epoch': 8.90, 'throughput': 964.21}
[INFO|2025-08-10 18:27:05] logging.py:143 >> {'loss': 0.4114, 'learning_rate': 1.2236e-06, 'epoch': 9.03, 'throughput': 963.96}
[INFO|2025-08-10 18:28:06] logging.py:143 >> {'loss': 0.4310, 'learning_rate': 8.6282e-07, 'epoch': 9.20, 'throughput': 964.55}
[INFO|2025-08-10 18:28:53] logging.py:143 >> {'loss': 0.4251, 'learning_rate': 5.6401e-07, 'epoch': 9.37, 'throughput': 964.58}
[INFO|2025-08-10 18:29:35] logging.py:143 >> {'loss': 0.3407, 'learning_rate': 3.2793e-07, 'epoch': 9.53, 'throughput': 964.39}
[INFO|2025-08-10 18:30:26] logging.py:143 >> {'loss': 0.4325, 'learning_rate': 1.5518e-07, 'epoch': 9.70, 'throughput': 964.48}
[INFO|2025-08-10 18:30:26] trainer.py:4408 >> 
***** Running Evaluation *****
[INFO|2025-08-10 18:30:26] trainer.py:4410 >>   Num examples = 10
[INFO|2025-08-10 18:30:26] trainer.py:4413 >>   Batch size = 2
[INFO|2025-08-10 18:30:28] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-300
[INFO|2025-08-10 18:30:28] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 18:30:28] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 18:30:28] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-300/chat_template.jinja
[INFO|2025-08-10 18:30:28] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-300/tokenizer_config.json
[INFO|2025-08-10 18:30:28] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-300/special_tokens_map.json
[INFO|2025-08-10 18:31:15] logging.py:143 >> {'loss': 0.3490, 'learning_rate': 4.6201e-08, 'epoch': 9.86, 'throughput': 963.47}
[INFO|2025-08-10 18:32:01] logging.py:143 >> {'loss': 0.3733, 'learning_rate': 1.2838e-09, 'epoch': 10.00, 'throughput': 963.59}
[INFO|2025-08-10 18:32:01] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-310
[INFO|2025-08-10 18:32:01] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 18:32:01] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-310/chat_template.jinja
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-310/tokenizer_config.json
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-310/special_tokens_map.json
[INFO|2025-08-10 18:32:01] trainer.py:2718 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2025-08-10 18:32:01] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45
[INFO|2025-08-10 18:32:01] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 18:32:01] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/chat_template.jinja
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/tokenizer_config.json
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/special_tokens_map.json
[WARNING|2025-08-10 18:32:02] logging.py:148 >> No metric eval_accuracy to plot.
[INFO|2025-08-10 18:32:02] trainer.py:4408 >> 
***** Running Evaluation *****
[INFO|2025-08-10 18:32:02] trainer.py:4410 >>   Num examples = 10
[INFO|2025-08-10 18:32:02] trainer.py:4413 >>   Batch size = 2
[INFO|2025-08-10 18:32:04] modelcard.py:456 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
  • Training completed
    (screenshot)
  • Training results
    (screenshot)
  • After training, GPU memory is released automatically
    (screenshot)
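A couple of sanity checks on the numbers in the log above (my own arithmetic, not from the original post): with 481 training examples, a per-device batch size of 2 and 8 gradient-accumulation steps, the effective batch size is 16, so one epoch takes ceil(481 / 16) = 31 optimizer steps and 10 epochs give the 310 total steps reported; likewise the LoRA adapter trains 20,971,520 of 8,051,232,768 parameters, i.e. about 0.26 %, matching the reported trainable% of 0.2605.

To actually use the fine-tuned adapter, LLaMA-Factory can chat with it or merge it back into the base weights. The sketch below is an assumption about the typical follow-up commands, not something the original post shows; the adapter path is the output directory from the log, the merged-model directory is a placeholder, and the exact flags may vary between LLaMA-Factory versions:

# interactive chat with the base model plus the LoRA adapter
llamafactory-cli chat \
    --model_name_or_path /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1 \
    --adapter_name_or_path saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45 \
    --template llama3 \
    --finetuning_type lora

# merge the adapter into a standalone model directory (placeholder path)
llamafactory-cli export \
    --model_name_or_path /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1 \
    --adapter_name_or_path saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45 \
    --template llama3 \
    --finetuning_type lora \
    --export_dir /root/autodl-tmp/models/Llama3-8B-Chinese-Chat-lora-merged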
