Notes on Every Pitfall Hit While Getting VILA to Run
- 1. Trying Docker
- 2. Deploying the Server Locally
Repository: https://github.com/NVlabs/VILA
Everything here is recorded in the chronological order the pitfalls were hit. I don't recommend following along step by step and reliving every sad story; better to read the whole thing first, then get hands-on.
TL;DR: the official docs are basically useless, even less practical than the issues fellow users wrote on GitHub.
Since the goal is real-world use, I went straight for the server deployment, because in the end the API will be called from other projects anyway, and the official repo provides what looks like a perfectly polished OpenAI-style client as the final calling convention:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000",
    api_key="fake-key",
)
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://blog.logomyway.com/wp-content/uploads/2022/01/NVIDIA-logo.jpg",
                        # Or you can pass in a base64 encoded image
                        # "url": "data:image/png;base64,<base64_encoded_image>",
                    },
                },
            ],
        }
    ],
    model="NVILA-15B",
)
print(response.choices[0].message.content)
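The snippet above also hints at passing a base64-encoded local image instead of a URL. Here is a small helper of my own (not part of the repo) for building that message payload either way:

```python
import base64

def image_message(text: str, image_url: str) -> dict:
    """Build the multimodal user message in the shape the snippet above sends."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

def local_image_message(text: str, path: str, mime: str = "image/jpeg") -> dict:
    """Same shape, but embeds a local file as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return image_message(text, f"data:{mime};base64,{encoded}")
```

With these, `client.chat.completions.create(messages=[image_message(...)], model=...)` works unchanged.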
1. Trying Docker
Looks great. So let's start with the theoretically least painful option: Docker.
Following the official page, build the Docker image used for Python SDK access... wait, what? There's no prebuilt image, you have to build it yourself???
docker build -t vila-server:latest .
Unsurprisingly, it errored out:
 => [6/8] COPY environment_setup.sh environment_setup.sh       0.0s
 => ERROR [7/8] RUN bash environment_setup.sh vila             2.9s
------
 > [7/8] RUN bash environment_setup.sh vila:
2.779
2.779 CondaToSNonInteractiveError: Terms of Service have not been accepted for the following channels. Please accept or remove them before proceeding:
2.779 - https://repo.anaconda.com/pkgs/main
2.779 - https://repo.anaconda.com/pkgs/r
2.779
2.779 To accept these channels' Terms of Service, run the following commands:
2.779 conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
2.779 conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
2.779
2.779 For information on safely removing channels from your conda configuration,
2.779 please see the official documentation:
2.779
2.779 https://www.anaconda.com/docs/tools/working-with-conda/channels
2.779
------
1 warning found (use docker --debug to expand):
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 9)
Dockerfile:14
--------------------
  12 |
  13 | COPY environment_setup.sh environment_setup.sh
  14 | >>> RUN bash environment_setup.sh vila
  15 |
  16 |
--------------------
ERROR: failed to solve: process "/bin/sh -c bash environment_setup.sh vila" did not complete successfully: exit code: 1
The fix:
1. Open the Dockerfile in the project root and change its contents to:
FROM nvcr.io/nvidia/pytorch:24.06-py3

WORKDIR /app

RUN curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o ~/miniconda.sh \
    && sh ~/miniconda.sh -b -p /opt/conda \
    && rm ~/miniconda.sh
ENV PATH=/opt/conda/bin:$PATH

COPY pyproject.toml pyproject.toml
COPY llava llava
COPY environment_setup.sh environment_setup.sh

RUN conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main && \
    conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

RUN bash environment_setup.sh vila

COPY server.py server.py

CMD ["conda", "run", "-n", "vila", "--no-capture-output", "python", "-u", "-W", "ignore", "server.py"]
That is, the legacy environment-variable syntax was updated:
ENV PATH=/opt/conda/bin:$PATH
and two lines were added to accept the Terms of Service that newer conda versions require you to approve manually:
RUN conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main && \
    conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
Run the build again:
docker build -t vila-server:latest .
The build succeeds.
Then it turned out this was only the beginning; everything afterwards kept erroring too. NVIDIA, what kind of documentation is this, seriously.
After a lot of thrashing around, calling everything from inside Docker proved way too painful, so I decided to get the local Python deployment working first.
2. Deploying the Server Locally
First, a look at the official instructions:
Set up the environment:
./environment_setup.sh vila
conda activate vila
Run the server:
python -W ignore server.py \
    --port 8000 \
    --model-path Efficient-Large-Model/NVILA-15B \
    --conv-mode auto
Then you can just call it. That looks way too easy, right?
Unexpectedly, this seemingly two-step install turned out to be pure hell.
For well-known reasons, downloading packages from inside China is extremely slow. To speed things up, open ./environment_setup.sh and append the Tsinghua mirror to the end of every pip install command:
-i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
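To avoid editing a dozen lines by hand, a sed one-liner can append the flag to every uncommented pip install line. This is my own sketch, demonstrated on a one-line sample file rather than the real script:

```shell
# Create a one-line sample standing in for environment_setup.sh.
printf 'pip install --upgrade pip setuptools\n' > sample_setup.sh

# Append the Tsinghua mirror flag to every line starting with `pip install`
# (commented-out lines start with `#` and are left alone).
sed -i 's|^\(pip install .*\)$|\1 -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple|' sample_setup.sh

cat sample_setup.sh
```

Point the same sed at environment_setup.sh itself after backing it up, and check the result before running the script.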
After a long wait, ./environment_setup.sh finished. Launch the server, get an error, all in one smooth motion:
No module named 'ps3'
Seriously? How do you leave a dependency out of your own environment setup script?
An extremely useful thread in the issues came to the rescue. One commenter not only explained the fix but also listed several more problems he ran into afterwards... and later I really did hit every single one of them, no exceptions:
https://github.com/NVlabs/VILA/issues/258
But that comes later. First, fix the missing module:
pip install ps3-torch
Run it again, and this time triton's code errors out; luckily the commenter in that thread had already explained how to handle it:
pip install triton==3.3.1
With these changes, environment_setup.sh becomes:
#!/usr/bin/env bash
set -e

CONDA_ENV=${1:-""}
if [ -n "$CONDA_ENV" ]; then
    # This is required to activate conda environment
    eval "$(conda shell.bash hook)"
    conda create -n $CONDA_ENV python=3.10.14 -y
    conda activate $CONDA_ENV
    # This is optional if you prefer to use built-in nvcc
    conda install -c nvidia cuda-toolkit -y
else
    echo "Skipping conda environment creation. Make sure you have the correct environment activated."
fi

# This is required to enable PEP 660 support
pip install --upgrade pip setuptools -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install FlashAttention2
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install VILA
pip install -e ".[train,eval]" -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install ps3-torch -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Quantization requires the newest triton version, and introduce dependency issue
pip install triton==3.3.0 -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# numpy introduce a lot dependencies issues, separate from pyproject.yaml
# pip install numpy==1.26.4

# Replace transformers and deepspeed files
site_pkg_path=$(python -c 'import site; print(site.getsitepackages()[0])')
cp -rv ./llava/train/deepspeed_replace/* $site_pkg_path/deepspeed/

# Downgrade protobuf to 3.20 for backward compatibility
pip install protobuf==3.20.* -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
At last, the conda vila environment is fully installed, so time to launch the server...
Wait, why is something taking forever to download? Ah, it's the Efficient-Large-Model/NVILA-15B model from the launch arguments.
That won't fly: the download barely moves from inside China, and a bit more digging showed the model is far too large for my local GPU anyway. So: grab something smaller from a mirror site.
So I tried its slimmed-down sibling:
git clone https://modelscope.cn/models/Efficient-Large-Model/NVILA-8B
Download done, try launching the server... it starts! Perfect, the server address and port show up. Then run the official API client script... error!
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'loc': ['body', 'model'], 'msg': "unexpected value; permitted: 'NVILA-15B', 'VILA1.5-3B', 'VILA1.5-3B-AWQ', 'VILA1.5-3B-S2', 'VILA1.5-3B-S2-AWQ', 'Llama-3-VILA1.5-8B', 'Llama-3-VILA1.5-8B-AWQ', 'VILA1.5-13B', 'VILA1.5-13B-AWQ', 'VILA1.5-40B', 'VILA1.5-40B-AWQ'", 'type': 'value_error.const', 'ctx': {'given': 'NVILA-8B', 'permitted': ['NVILA-15B', 'VILA1.5-3B', 'VILA1.5-3B-AWQ', 'VILA1.5-3B-S2', 'VILA1.5-3B-S2-AWQ', 'Llama-3-VILA1.5-8B', 'Llama-3-VILA1.5-8B-AWQ', 'VILA1.5-13B', 'VILA1.5-13B-AWQ', 'VILA1.5-40B', 'VILA1.5-40B-AWQ']}}]}
Digging through the project source shows the server only accepts a fixed list of LLMs... but bizarrely, it supports Llama and yet not NVILA-8B, the scaled-down model from the official examples?
A bit speechless, but after skimming the source, patching it looked risky. Fine, concede and switch to a small model that is on the supported list.
git clone https://modelscope.cn/models/Efficient-Large-Model/VILA1.5-3b
And then it errored yet again!
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
Some searching showed the cause: newer versions of transformers no longer ship the legacy default chat template, so it has to be set by hand.
So, open VILA1.5-3B/llm/tokenizer_config.json inside the downloaded model folder and add one property at the end of the JSON:
"chat_template": "{% if messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}{% endif %}{% for message in messages if message['content'] is not none %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
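Hand-editing JSON is error-prone, so the same property can also be added with a short script. This is a sketch of my own, not part of the repo; the template string is the one from above:

```python
import json

# ChatML-style template, identical to the property added above.
CHAT_TEMPLATE = (
    "{% if messages[0]['role'] != 'system' %}"
    "{{ '<|im_start|>system\nYou are a helpful assistant<|im_end|>\n' }}"
    "{% endif %}"
    "{% for message in messages if message['content'] is not none %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

def set_chat_template(config_path: str, template: str = CHAT_TEMPLATE) -> None:
    """Insert (or overwrite) chat_template in a tokenizer_config.json file."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    config["chat_template"] = template
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False, indent=2)

# Usage:
# set_chat_template("VILA1.5-3B/llm/tokenizer_config.json")
```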
Try again... another error. By this point I was getting numb.
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'loc': ['body', 'model'], 'msg': "unexpected value; permitted: 'NVILA-15B', 'VILA1.5-3B', 'VILA1.5-3B-AWQ', 'VILA1.5-3B-S2', 'VILA1.5-3B-S2-AWQ', 'Llama-3-VILA1.5-8B', 'Llama-3-VILA1.5-8B-AWQ', 'VILA1.5-13B', 'VILA1.5-13B-AWQ', 'VILA1.5-40B', 'VILA1.5-40B-AWQ'", 'type': 'value_error.const', 'ctx': {'given': 'VILA1.5-3b', 'permitted': ['NVILA-15B', 'VILA1.5-3B', 'VILA1.5-3B-AWQ', 'VILA1.5-3B-S2', 'VILA1.5-3B-S2-AWQ', 'Llama-3-VILA1.5-8B', 'Llama-3-VILA1.5-8B-AWQ', 'VILA1.5-13B', 'VILA1.5-13B-AWQ', 'VILA1.5-40B', 'VILA1.5-40B-AWQ']}}]}
This one was baffling: the VILA1.5-3b I passed in is right there in the permitted list, so why was it rejected?
After a long struggle, I was stunned to discover the truth: the downloaded model is named VILA1.5-3b, while the source code permits VILA1.5-3B...
Yes, the case of the letter B differs... Come on, guys, what exactly did you write here?
Same rule as before: changing the source carries unknown risk, so the workaround I landed on was simply to rename the downloaded model folder to VILA1.5-3B.
And it actually worked?
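For the record, the rename itself is a one-liner. The mkdir here only makes the snippet runnable standalone; in practice you just rename the folder cloned from ModelScope:

```shell
# Stand-in for the folder cloned from ModelScope.
mkdir -p VILA1.5-3b

# Rename to the exact case the server's whitelist expects.
mv VILA1.5-3b VILA1.5-3B
```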
So, in high spirits, try server-side inference once more... and, entirely unsurprisingly, another error.
openai.InternalServerError: Error code: 500 - {'error': 'Invalid style: SeparatorStyle.AUTO'}
...Off to GitHub to dig through the issues again:
https://github.com/NVlabs/VILA/issues/160
This exact problem is mentioned there, and then someone replied with this...
It seems that we haven't found a solution to this problem yet
What???
Fortunately, they later added a follow-up:
offical serving scripts are uploaded:https://github.com/NVlabs/VILA/tree/main/serving
That page documents how to launch the server:
# launch server
python serving/server.py --port 8001 --model-path Efficient-Large-Model/NVILA-15B --conv-mode auto
That's a completely different script from the one on your front page!
So, try this launch invocation:
python -W ignore serving/server.py --port 8000 --model-path ./model/VILA1.5-3B --conv-mode auto
That cleared the previous error, but a new one appeared...
from llava.constants import (
ImportError: cannot import name 'DEFAULT_IM_END_TOKEN' from 'llava.constants' (VILA/llava/constants.py)
So I added an import path fix to serving/server.py:
import sys
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), ".."))
The error persisted. Odd. A look inside VILA/llava/constants.py shows the variable simply doesn't exist there...
Back in serving/server.py, it turns out that apart from the import statement, the variable is never used anywhere...
=-= What on earth are you people writing?
It soon became clear it wasn't the only one of its kind, either. After a round of cleanup, serving/server.py ends up with a pile of imports commented out that neither exist nor are ever used:
from llava.constants import (
    # DEFAULT_IM_END_TOKEN,
    # DEFAULT_IM_START_TOKEN,
    DEFAULT_IMAGE_TOKEN,
    # IMAGE_PLACEHOLDER,
    # IMAGE_TOKEN_INDEX,
)
Run it again:
python -W ignore serving/server.py --port 8000 --model-path ./model/VILA1.5-3B --conv-mode auto
Then run the official OpenAI-style API call.
Thank goodness. This time it finally ran and returned a description of the image:
Assistant: The image is a collage of two distinct photos. On the left, there's a man dressed in a black leather jacket and glasses. He appears to be in the middle of a speech or presentation, as he is gesturing with his hands. The background behind him is blurred, suggesting a focus on the man and his actions.

On the right side of the image, there's a logo for NVIDIA. The logo is a stylized representation of a green spiral, which is a common symbol for NVIDIA. The word "NVIDIA" is written in white letters beneath the logo, clearly indicating the company's identity.

The two images are placed side by side, creating a juxtaposition between the man's presentation and the NVIDIA logo. The man's action of speaking and the logo's static nature create a contrast between the dynamic and the static, the human and the corporate.