Running PaddlePaddle's open-source LLM Ernie4.5 0.3B with llama-cpp on FreeBSD (failed)

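Conclusion first: as of today (2025-07-25), it does not work. Ernie4.5 models cannot be run for inference with llama.cpp or Ollama, mainly because llama.cpp does not support Ernie4.5's heterogeneous MoE architecture.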

The problem is not specific to FreeBSD: the test also failed on Windows, and in theory it would fail on Ubuntu as well.

What I tried

Installing llama-cpp

First, install llama-cpp with pkg:

pkg install llama-cpp

I also tried building it from source.

Download the source code:

git clone https://github.com/ggerganov/llama.cpp

Enter the llama.cpp directory:

cd llama.cpp

Build:

mkdir build
cd build
cmake ..
cmake --build . --config Release

Add the build output directory to PATH:

export PATH=~/github/llama.cpp/build/bin:$PATH

After that, the llama.cpp programs can be run directly.

When I built it directly, the resulting executable was named main, and it is run like this:

main -m ~/work/model/chinesellama/ggml-model-f16.gguf  -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
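Depending on the build, the executable may be called llama-cli or main (both names appear in this article), so it can help to check which one is actually reachable after changing PATH. The sketch below is a hypothetical helper (find_llama_binary and its candidate list are illustrative assumptions, not part of llama.cpp); it only uses the standard library's shutil.which.

```python
import shutil
from typing import Iterable, Optional

# Hypothetical candidate names: newer builds produce llama-cli, the build above produced main.
DEFAULT_CANDIDATES = ("llama-cli", "main")


def find_llama_binary(
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
    path: Optional[str] = None,
) -> Optional[str]:
    """Return the full path of the first candidate found on PATH (or on `path`), else None."""
    for name in candidates:
        # shutil.which searches os.environ["PATH"] when path is None
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


if __name__ == "__main__":
    print(find_llama_binary() or "no llama.cpp executable found on PATH")
```

Note that a generic name like main may also match an unrelated program earlier on PATH, which is one reason to prefer llama-cli when both exist.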

Downloading the model

Download it from: unsloth/ERNIE-4.5-0.3B-PT-GGUF at main

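If the download is slow, consider downloading from the official Hugging Face site instead, although from mainland China that requires a VPN.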

Once the download is complete:

ls E*
ERNIE-4.5-0.3B-PT-F16.gguf	ERNIE-4.5-0.3B-PT-Q2_K.gguf

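Alternatively, you can download the regular (non-GGUF) model files and convert them to GGUF with the conversion script (in recent llama.cpp versions this script is named convert_hf_to_gguf.py):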

python convert.py ~/work/model/chinesellama/

Running

llama-cli -m ERNIE-4.5-0.3B-PT-Q2_K.gguf -p "hello"

If the built executable is named main, run:

main -m ERNIE-4.5-0.3B-PT-Q2_K.gguf -p "hello"

The run failed.
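When a run fails like this, the exit status alone says little; the text llama.cpp prints is more informative. The sketch below runs the model once and scans the captured output for llama.cpp's own load-failure messages (the two strings are copied from the Windows error log in this article). The helper names diagnose and run_and_diagnose are hypothetical, and passing -no-cnv assumes a build recent enough to accept the flag that its log suggests for disabling conversation mode.

```python
import subprocess
from typing import Optional

# Substrings copied from llama.cpp's load-failure output in the Windows log.
# Order matters: the more specific message is checked first.
KNOWN_FAILURES = (
    ("unknown model architecture", "this llama.cpp build does not recognise the model architecture"),
    ("failed to load model", "the model file could not be loaded"),
)


def diagnose(log_text: str) -> Optional[str]:
    """Return a short explanation for the first known failure found in log_text, else None."""
    for needle, meaning in KNOWN_FAILURES:
        if needle in log_text:
            return meaning
    return None


def run_and_diagnose(binary: str, model: str, prompt: str = "hello",
                     timeout: float = 300) -> Optional[str]:
    """Run llama.cpp once, non-interactively, and diagnose its combined output."""
    result = subprocess.run(
        [binary, "-m", model, "-p", prompt, "-n", "16", "-no-cnv"],
        stdin=subprocess.DEVNULL,  # never block waiting for interactive input
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return diagnose(result.stdout + result.stderr)
```

For example, run_and_diagnose("llama-cli", "ERNIE-4.5-0.3B-PT-Q2_K.gguf") returns None when none of the known messages appear, which does not by itself mean the run succeeded.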

Summary

As of now, Ernie4.5 still cannot be run for inference with llama.cpp.

Frankly, this does limit the adoption of Ernie4.5.

Debugging

The error: Terminating due to uncaught exception 0x28323c45c340 of type std::runtime_error

main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
[New LWP 112399 of process 29362]
[New LWP 112400 of process 29362]
[New LWP 112401 of process 29362]
[New LWP 112402 of process 29362]
0x0000000829dc1818 in _wait4 () from /lib/libc.so.7
#0 0x0000000829dc1818 in _wait4 () from /lib/libc.so.7
#1 0x0000000821b3993c in ?? () from /lib/libthr.so.3
#2 0x00000008231e6809 in ?? () from /usr/local/lib/libggml-base.so
#3 0x00000008281be199 in std::terminate() () from /lib/libcxxrt.so.1
#4 0x00000008281be674 in ?? () from /lib/libcxxrt.so.1
#5 0x00000008281be589 in __cxa_throw () from /lib/libcxxrt.so.1
#6 0x00000000002d8070 in ?? ()
#7 0x00000000002d8adc in ?? ()
#8 0x000000000025e8b8 in ?? ()
#9 0x0000000829d0dc3a in __libc_start1 () from /lib/libc.so.7
#10 0x000000000025e120 in ?? ()
[Inferior 1 (process 29362) detached]
Terminating due to uncaught exception 0x28323c45c340 of type std::runtime_error
Abort trap (core dumped)

Probably caused by insufficient memory.

Later, using llama.cpp on Windows, I got this error:

print_info: file size   = 688.14 MiB (16.00 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'ernie4_5'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'e:\360Downloads\ERNIE-4.5-0.3B-PT-F16.gguf'
main: error: unable to load model
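The print_info line at the top of this log also gives a quick sanity check on the file. Assuming BPW means total bits of weight data divided by the number of weights (which is what the name says), inverting it recovers the approximate parameter count. estimate_params is a hypothetical helper for illustration.

```python
MIB = 1024 * 1024  # llama.cpp reports file size in MiB


def estimate_params(file_size_mib: float, bits_per_weight: float) -> float:
    """Approximate number of weights from llama.cpp's 'file size = X MiB (Y BPW)' line."""
    total_bits = file_size_mib * MIB * 8
    return total_bits / bits_per_weight


if __name__ == "__main__":
    n = estimate_params(688.14, 16.00)  # values from the F16 log above
    print(f"{n / 1e9:.2f}B parameters")
```

For the F16 file this comes to about 3.6 × 10^8 weights (16 bits, i.e. 2 bytes, each), in line with the "0.3B" in the model name, so the file itself looks intact and the failure lies in the architecture check rather than in the download.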

This confirms that the model indeed cannot be run for inference with llama.cpp.
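The error shows llama.cpp reading the architecture name ernie4_5 from the file's metadata and then not recognising it. To see which architecture a GGUF file declares without loading the model, one can read the general.architecture key from its header. The sketch below assumes the GGUF v2/v3 header layout (magic "GGUF", uint32 version, uint64 tensor count, uint64 metadata count, then key/value pairs, all little-endian); read_gguf_architecture is a hypothetical helper, and llama.cpp's own gguf-py package provides a complete GGUFReader for anything beyond this quick check.

```python
import struct
from typing import BinaryIO, Optional

GGUF_MAGIC = b"GGUF"

# Fixed-size metadata value types from the GGUF spec: type id -> struct format.
_SCALAR_FORMATS = {
    0: "<B", 1: "<b",    # UINT8, INT8
    2: "<H", 3: "<h",    # UINT16, INT16
    4: "<I", 5: "<i",    # UINT32, INT32
    6: "<f", 7: "<?",    # FLOAT32, BOOL
    10: "<Q", 11: "<q",  # UINT64, INT64
    12: "<d",            # FLOAT64
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # strings are a uint64 length followed by UTF-8 bytes
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_architecture(path: str) -> Optional[str]:
    """Return the general.architecture string from a GGUF file, or None if absent."""
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError("not a GGUF file")
        if _read(f, "<I") < 2:
            raise ValueError("only GGUF v2/v3 headers are handled here")
        _read(f, "<Q")  # tensor count, not needed
        kv_count = _read(f, "<Q")
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key == "general.architecture":
                return value
    return None
```

Judging by the error message above, read_gguf_architecture("ERNIE-4.5-0.3B-PT-F16.gguf") should return "ernie4_5"; a llama.cpp build can only load the file if it recognises that name.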

Source: http://www.pswp.cn/pingmian/90667.shtml