Notes on Every Pitfall Hit While Getting VILA to Run
- 1. Trying Docker
- 2. Deploying the Server Locally
Repository: https://github.com/NVlabs/VILA
Everything here is recorded in the chronological order the pitfalls were hit. I don't recommend following along step by step and reliving every sad story; better to read the whole thing first, then get hands-on.
TL;DR: the official docs are basically useless, even less practical than the issues fellow users wrote on GitHub.
Since the goal is real-world use, I went straight for the server deployment, because in the end the API will be called from other projects anyway, and the official repo provides what looks like a perfectly polished OpenAI-style client as the final calling convention:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000",
    api_key="fake-key",
)
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://blog.logomyway.com/wp-content/uploads/2022/01/NVIDIA-logo.jpg",
                        # Or you can pass in a base64 encoded image
                        # "url": "data:image/png;base64,<base64_encoded_image>",
                    },
                },
            ],
        }
    ],
    model="NVILA-15B",
)
print(response.choices[0].message.content)
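The snippet above also hints at passing a base64-encoded local image instead of a URL. Here is a small helper of my own (not part of the repo) for building that message payload either way:

```python
import base64

def image_message(text: str, image_url: str) -> dict:
    """Build the multimodal user message in the shape the snippet above sends."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

def local_image_message(text: str, path: str, mime: str = "image/jpeg") -> dict:
    """Same shape, but embeds a local file as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return image_message(text, f"data:{mime};base64,{encoded}")
```

With these, `client.chat.completions.create(messages=[image_message(...)], model=...)` works unchanged.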
1. Trying Docker
Looks great. So let's start with the theoretically least painful option: Docker.
Following the official page, build the Docker image used for Python SDK access... wait, what? There's no prebuilt image, you have to build it yourself???
docker build -t vila-server:latest .
Unsurprisingly, it errored out:
 => [6/8] COPY environment_setup.sh environment_setup.sh       0.0s
 => ERROR [7/8] RUN bash environment_setup.sh vila             2.9s
------
 > [7/8] RUN bash environment_setup.sh vila:
2.779
2.779 CondaToSNonInteractiveError: Terms of Service have not been accepted for the following channels. Please accept or remove them before proceeding:
2.779 - https://repo.anaconda.com/pkgs/main
2.779 - https://repo.anaconda.com/pkgs/r
2.779
2.779 To accept these channels' Terms of Service, run the following commands:
2.779 conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
2.779 conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
2.779
2.779 For information on safely removing channels from your conda configuration,
2.779 please see the official documentation:
2.779
2.779 https://www.anaconda.com/docs/tools/working-with-conda/channels
2.779
------
1 warning found (use docker --debug to expand):
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 9)
Dockerfile:14
--------------------
  12 |
  13 | COPY environment_setup.sh environment_setup.sh
  14 | >>> RUN bash environment_setup.sh vila
  15 |
  16 |
--------------------
ERROR: failed to solve: process "/bin/sh -c bash environment_setup.sh vila" did not complete successfully: exit code: 1
The fix:
1. Open the Dockerfile in the project root and change its contents to:
FROM nvcr.io/nvidia/pytorch:24.06-py3

WORKDIR /app

RUN curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o ~/miniconda.sh \
    && sh ~/miniconda.sh -b -p /opt/conda \
    && rm ~/miniconda.sh
ENV PATH=/opt/conda/bin:$PATH

COPY pyproject.toml pyproject.toml
COPY llava llava
COPY environment_setup.sh environment_setup.sh

RUN conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main && \
    conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

RUN bash environment_setup.sh vila

COPY server.py server.py

CMD ["conda", "run", "-n", "vila", "--no-capture-output", "python", "-u", "-W", "ignore", "server.py"]
That is, the legacy environment-variable syntax was updated:
ENV PATH=/opt/conda/bin:$PATH
and two lines were added to accept the Terms of Service that newer conda versions require you to approve manually:
RUN conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main && \
    conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
Run the build again:
docker build -t vila-server:latest .
The build succeeds.
Then it turned out this was only the beginning; everything afterwards kept erroring too. NVIDIA, what kind of documentation is this, seriously.
After a lot of thrashing around, calling everything from inside Docker proved way too painful, so I decided to get the local Python deployment working first.
2. Deploying the Server Locally
First, a look at the official instructions:
Set up the environment:
./environment_setup.sh vila
conda activate vila
Run the server:
python -W ignore server.py \
    --port 8000 \
    --model-path Efficient-Large-Model/NVILA-15B \
    --conv-mode auto
Then you can just call it. That looks way too easy, right?
Unexpectedly, this seemingly two-step install turned out to be pure hell.
For well-known reasons, downloading packages from inside China is extremely slow. To speed things up, open ./environment_setup.sh and append the Tsinghua mirror to the end of every pip install command:
-i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
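To avoid editing a dozen lines by hand, a sed one-liner can append the flag to every uncommented pip install line. This is my own sketch, demonstrated on a one-line sample file rather than the real script:

```shell
# Create a one-line sample standing in for environment_setup.sh.
printf 'pip install --upgrade pip setuptools\n' > sample_setup.sh

# Append the Tsinghua mirror flag to every line starting with `pip install`
# (commented-out lines start with `#` and are left alone).
sed -i 's|^\(pip install .*\)$|\1 -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple|' sample_setup.sh

cat sample_setup.sh
```

Point the same sed at environment_setup.sh itself after backing it up, and check the result before running the script.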
After a long wait, ./environment_setup.sh finished. Launch the server, get an error, all in one smooth motion:
No module named 'ps3'
Seriously? How do you leave a dependency out of your own environment setup script?
An extremely useful thread in the issues came to the rescue. One commenter not only explained the fix but also listed several more problems he ran into afterwards... and later I really did hit every single one of them, no exceptions:
https://github.com/NVlabs/VILA/issues/258
But that comes later. First, fix the missing module:
pip install ps3-torch
Run it again, and this time triton's code errors out; luckily the commenter in that thread had already explained how to handle it:
pip install triton==3.3.1
With these changes, environment_setup.sh becomes:
#!/usr/bin/env bash
set -e

CONDA_ENV=${1:-""}
if [ -n "$CONDA_ENV" ]; then
    # This is required to activate conda environment
    eval "$(conda shell.bash hook)"
    conda create -n $CONDA_ENV python=3.10.14 -y
    conda activate $CONDA_ENV
    # This is optional if you prefer to use built-in nvcc
    conda install -c nvidia cuda-toolkit -y
else
    echo "Skipping conda environment creation. Make sure you have the correct environment activated."
fi

# This is required to enable PEP 660 support
pip install --upgrade pip setuptools -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install FlashAttention2
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install VILA
pip install -e ".[train,eval]" -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install ps3-torch -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Quantization requires the newest triton version, and introduce dependency issue
pip install triton==3.3.0 -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# numpy introduce a lot dependencies issues, separate from pyproject.yaml
# pip install numpy==1.26.4

# Replace transformers and deepspeed files
site_pkg_path=$(python -c 'import site; print(site.getsitepackages()[0])')
cp -rv ./llava/train/deepspeed_replace/* $site_pkg_path/deepspeed/

# Downgrade protobuf to 3.20 for backward compatibility
pip install protobuf==3.20.* -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
At last, the conda vila environment is fully installed, so time to launch the server...
Wait, why is something taking forever to download? Ah, it's the Efficient-Large-Model/NVILA-15B model from the launch arguments.
That won't fly: the download barely moves from inside China, and a bit more digging showed the model is far too large for my local GPU anyway. So: grab something smaller from a mirror site.
So I tried its slimmed-down sibling:
git clone https://modelscope.cn/models/Efficient-Large-Model/NVILA-8B
Download done, try launching the server... it starts! Perfect, the server address and port show up. Then run the official API client script... error!
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'loc': ['body', 'model'], 'msg': "unexpected value; permitted: 'NVILA-15B', 'VILA1.5-3B', 'VILA1.5-3B-AWQ', 'VILA1.5-3B-S2', 'VILA1.5-3B-S2-AWQ', 'Llama-3-VILA1.5-8B', 'Llama-3-VILA1.5-8B-AWQ', 'VILA1.5-13B', 'VILA1.5-13B-AWQ', 'VILA1.5-40B', 'VILA1.5-40B-AWQ'", 'type': 'value_error.const', 'ctx': {'given': 'NVILA-8B', 'permitted': ['NVILA-15B', 'VILA1.5-3B', 'VILA1.5-3B-AWQ', 'VILA1.5-3B-S2', 'VILA1.5-3B-S2-AWQ', 'Llama-3-VILA1.5-8B', 'Llama-3-VILA1.5-8B-AWQ', 'VILA1.5-13B', 'VILA1.5-13B-AWQ', 'VILA1.5-40B', 'VILA1.5-40B-AWQ']}}]}
Digging through the project source shows the server only accepts a fixed list of LLMs... but bizarrely, it supports Llama and yet not NVILA-8B, the scaled-down model from the official examples?
A bit speechless, but after skimming the source, patching it looked risky. Fine, concede and switch to a small model that is on the supported list.
git clone https://modelscope.cn/models/Efficient-Large-Model/VILA1.5-3b
And then it errored yet again!
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
Some searching showed the cause: newer versions of transformers no longer ship the legacy default chat template, so it has to be set by hand.
So, open VILA1.5-3B/llm/tokenizer_config.json inside the downloaded model folder and add one property at the end of the JSON:
"chat_template": "{% if messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}{% endif %}{% for message in messages if message['content'] is not none %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
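Hand-editing JSON is error-prone, so the same property can also be added with a short script. This is a sketch of my own, not part of the repo; the template string is the one from above:

```python
import json

# ChatML-style template, identical to the property added above.
CHAT_TEMPLATE = (
    "{% if messages[0]['role'] != 'system' %}"
    "{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}"
    "{% endif %}"
    "{% for message in messages if message['content'] is not none %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

def set_chat_template(config_path: str, template: str = CHAT_TEMPLATE) -> None:
    """Insert (or overwrite) chat_template in a tokenizer_config.json file."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    config["chat_template"] = template
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False, indent=2)

# Usage:
# set_chat_template("VILA1.5-3B/llm/tokenizer_config.json")
```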
Try again... another error. By this point I was getting numb.
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'loc': ['body', 'model'], 'msg': "unexpected value; permitted: 'NVILA-15B', 'VILA1.5-3B', 'VILA1.5-3B-AWQ', 'VILA1.5-3B-S2', 'VILA1.5-3B-S2-AWQ', 'Llama-3-VILA1.5-8B', 'Llama-3-VILA1.5-8B-AWQ', 'VILA1.5-13B', 'VILA1.5-13B-AWQ', 'VILA1.5-40B', 'VILA1.5-40B-AWQ'", 'type': 'value_error.const', 'ctx': {'given': 'VILA1.5-3b', 'permitted': ['NVILA-15B', 'VILA1.5-3B', 'VILA1.5-3B-AWQ', 'VILA1.5-3B-S2', 'VILA1.5-3B-S2-AWQ', 'Llama-3-VILA1.5-8B', 'Llama-3-VILA1.5-8B-AWQ', 'VILA1.5-13B', 'VILA1.5-13B-AWQ', 'VILA1.5-40B', 'VILA1.5-40B-AWQ']}}]}
This one was baffling: the VILA1.5-3b I passed in is right there in the permitted list, so why was it rejected?
After a long struggle, I was stunned to discover the truth: the downloaded model is named VILA1.5-3b, while the source code permits VILA1.5-3B...
Yes, the case of the letter B differs... Come on, guys, what exactly did you write here?
Same rule as before: changing the source carries unknown risk, so the workaround I landed on was simply to rename the downloaded model folder to VILA1.5-3B.
And it actually worked?
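For the record, the rename itself is a one-liner. The mkdir here only makes the snippet runnable standalone; in practice you just rename the folder cloned from ModelScope:

```shell
# Stand-in for the folder cloned from ModelScope.
mkdir -p VILA1.5-3b

# Rename to the exact case the server's whitelist expects.
mv VILA1.5-3b VILA1.5-3B
```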
So, in high spirits, try server-side inference once more... and, entirely unsurprisingly, another error.
openai.InternalServerError: Error code: 500 - {'error': 'Invalid style: SeparatorStyle.AUTO'}
...Off to GitHub to dig through the issues again:
https://github.com/NVlabs/VILA/issues/160
This exact problem is mentioned there, and then someone replied with this...
It seems that we haven't found a solution to this problem yet
What???
Fortunately, they later added a follow-up:
offical serving scripts are uploaded:https://github.com/NVlabs/VILA/tree/main/serving
That page documents how to launch the server:
# launch server
python serving/server.py --port 8001 --model-path Efficient-Large-Model/NVILA-15B --conv-mode auto
That's a completely different script from the one on your front page!
So, try this launch invocation:
python -W ignore serving/server.py --port 8000 --model-path ./model/VILA1.5-3B --conv-mode auto
That cleared the previous error, but a new one appeared...
from llava.constants import (
ImportError: cannot import name 'DEFAULT_IM_END_TOKEN' from 'llava.constants' (VILA/llava/constants.py)
So I added an import path fix to serving/server.py:
import sys
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), ".."))
The error persisted. Odd. A look inside VILA/llava/constants.py shows the variable simply doesn't exist there...
Back in serving/server.py, it turns out that apart from the import statement, the variable is never used anywhere...
=-= What on earth are you people writing?
It soon became clear it wasn't the only one of its kind, either. After a round of cleanup, serving/server.py ends up with a pile of imports commented out that neither exist nor are ever used:
from llava.constants import (
    # DEFAULT_IM_END_TOKEN,
    # DEFAULT_IM_START_TOKEN,
    DEFAULT_IMAGE_TOKEN,
    # IMAGE_PLACEHOLDER,
    # IMAGE_TOKEN_INDEX,
)
Run it again:
python -W ignore serving/server.py --port 8000 --model-path ./model/VILA1.5-3B --conv-mode auto
Then run the official OpenAI-style API call.
Thank goodness. This time it finally ran and returned a description of the image:
Assistant: The image is a collage of two distinct photos. On the left, there's a man dressed in a black leather jacket and glasses. He appears to be in the middle of a speech or presentation, as he is gesturing with his hands. The background behind him is blurred, suggesting a focus on the man and his actions.

On the right side of the image, there's a logo for NVIDIA. The logo is a stylized representation of a green spiral, which is a common symbol for NVIDIA. The word "NVIDIA" is written in white letters beneath the logo, clearly indicating the company's identity.

The two images are placed side by side, creating a juxtaposition between the man's presentation and the NVIDIA logo. The man's action of speaking and the logo's static nature create a contrast between the dynamic and the static, the human and the corporate.