Note: the steps in this article were verified on OpenShift 4.18 with OpenShift AI 2.19.
Contents
- Prepare the Triton Runtime environment
  - Add the Triton Serving Runtime
  - Run a Model Server based on the Triton Runtime
- Run models in the Triton Runtime
  - Prepare the model environment
  - Run a PyTorch model
  - Run an ONNX model
  - Run a TensorFlow model
- References
Prepare the Triton Runtime environment
Add the Triton Serving Runtime
- In RHOAI, open the Settings -> Serving runtime menu.
- Click the Add serving runtime button.
- On the Add serving runtime page, select Multi-model serving platform and REST.
- In the YAML area, click 'Start from scratch', paste the following content, and then click Create.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-23.05
  labels:
    name: triton-23.05
  annotations:
    maxLoadingConcurrency: "2"
    openshift.io/display-name: Triton runtime - 25.05-py3
spec:
  supportedModelFormats:
    - name: keras
      version: "2" # 2.6.0
      autoSelect: true
    - name: onnx
      version: "1" # 1.5.3
      autoSelect: true
    - name: pytorch
      version: "1" # 1.8.0a0+17f8c32
      autoSelect: true
    - name: tensorflow
      version: "1" # 1.15.4
      autoSelect: true
    - name: tensorflow
      version: "2" # 2.3.1
      autoSelect: true
    - name: tensorrt
      version: "7" # 7.2.1
      autoSelect: true
    - name: sklearn
      version: "0" # v0.23.1
      autoSelect: false
    - name: xgboost
      version: "1" # v1.1.1
      autoSelect: false
    - name: lightgbm
      version: "3" # v3.2.1
      autoSelect: false
  protocolVersions:
    - grpc-v2
  multiModel: true
  grpcEndpoint: port:8085
  grpcDataEndpoint: port:8001
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 2Gi
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:25.05-py3
      command:
        - /bin/sh
      args:
        - -c
        - 'mkdir -p /models/_triton_models;
          chmod 777 /models/_triton_models;
          exec tritonserver
          "--model-repository=/models/_triton_models"
          "--model-control-mode=explicit"
          "--strict-model-config=false"
          "--strict-readiness=false"
          "--allow-http=true"
          "--allow-sagemaker=false"
          '
      volumeMounts:
        - name: shm
          mountPath: /dev/shm
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: "5"
          memory: 1Gi
      livenessProbe:
        exec:
          command:
            - curl
            - --fail
            - --silent
            - --show-error
            - --max-time
            - "9"
            - http://localhost:8000/v2/health/live
        initialDelaySeconds: 5
        periodSeconds: 30
        timeoutSeconds: 10
  builtInAdapter:
    serverType: triton
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
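Run a Model Server based on the Triton Runtime
- In an RHOAI project, set the serving type under Models to Multi-model serving platform.
- Under Models, start a Model Server based on the Triton runtime, as shown in the figure below.
- When it is ready, check that the Triton Model Server is running.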
$ oc get deploy
NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
modelmesh-serving-triton-model-server   1/1     1            1           24h
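Run models in the Triton Runtime
Prepare the model environment
- Create a project in RHOAI, then choose 'Select multi-model' under Models.
- Make sure the object store contains a bucket named ai-models.
- Create a Connection named ai-models that points to the ai-models bucket in the object store.
Run a PyTorch model
- Clone modelmesh-minio-examples to your local machine, then inspect the files under the modelmesh-minio-examples/pytorch/cifar directory.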
$ git clone https://github.com/kserve/modelmesh-minio-examples && cd modelmesh-minio-examples/pytorch
$ tree cifar/
cifar/
├── 1
│ └── model.pt
└── config.pbtxt
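The layout above (a config.pbtxt next to a numeric version directory that holds the model file) can be checked before uploading. The following is a minimal sketch under that assumption; the helper name `list_model_files` is hypothetical, and it only computes the object keys to upload rather than talking to any object store.

```python
from pathlib import Path


def list_model_files(model_dir, require_config=True):
    """Return the upload keys for one model directory, or raise ValueError.

    Assumed layout (matching the cifar example): <model>/config.pbtxt plus at
    least one numeric version directory such as <model>/1/ with the model file.
    """
    root = Path(model_dir)
    # Version directories are the subdirectories whose names are all digits.
    versions = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    if not versions:
        raise ValueError(f"{root} has no numeric version directory")
    if require_config and not (root / "config.pbtxt").is_file():
        raise ValueError(f"{root} has no config.pbtxt")
    # Keys are relative to the parent directory so the model folder name is kept.
    files = [p for p in root.rglob("*") if p.is_file()]
    return sorted(p.relative_to(root.parent).as_posix() for p in files)
```

Called on the cifar directory, the function would return the keys `cifar/1/model.pt` and `cifar/config.pbtxt`, which is the structure the bucket should contain after the upload.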
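- Upload the modelmesh-minio-examples/pytorch/cifar directory to the ai-models bucket in the object store.
- On the Models page in RHOAI, click the Deploy model button at the right of the Triton Model Server row, then deploy the cifar model from the object store as shown in the figure below.
- When the deployment finishes, the status of the cifar-triton-torch model is displayed.
- Query the model's input and output formats.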
$ MODEL_NAME=cifar-triton-torch
$ MODEL_URL=$(oc get route $MODEL_NAME -ojsonpath=https://{.spec.host})/v2/models/$MODEL_NAME
$ curl -s ${MODEL_URL} | jq
{
  "name": "cifar-triton-torch__isvc-9f77f26bf2",
  "versions": ["1"],
  "platform": "pytorch_libtorch",
  "inputs": [
    {
      "name": "INPUT__0",
      "datatype": "FP32",
      "shape": ["-1", "3", "32", "32"]
    }
  ],
  "outputs": [
    {
      "name": "OUTPUT__0",
      "datatype": "FP32",
      "shape": ["-1", "10"]
    }
  ]
}
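The metadata response above is enough to construct a request body of the right shape. As a minimal sketch (the function name `zero_request_from_metadata` is hypothetical, and it assumes floating-point inputs whose only dynamic dimension is the batch size), the following builds a zero-filled request from such metadata:

```python
from math import prod


def zero_request_from_metadata(metadata, batch_size=1):
    """Build a v2 inference request body filled with zeros for every input.

    Dynamic dimensions (reported as -1) are replaced with batch_size; this is
    an assumption that holds when only the batch dimension is dynamic.
    """
    inputs = []
    for spec in metadata["inputs"]:
        # The metadata reports dimensions as strings, so cast them to int.
        shape = [batch_size if int(d) == -1 else int(d) for d in spec["shape"]]
        inputs.append({
            "name": spec["name"],
            "shape": shape,
            "datatype": spec["datatype"],
            # 0.0 is only a sensible filler for floating-point datatypes.
            "data": [0.0] * prod(shape),
        })
    return {"inputs": inputs}


# Input section copied from the metadata response of cifar-triton-torch.
metadata = {
    "inputs": [{"name": "INPUT__0", "datatype": "FP32",
                "shape": ["-1", "3", "32", "32"]}],
}
body = zero_request_from_metadata(metadata)
print(body["inputs"][0]["shape"], len(body["inputs"][0]["data"]))
# prints: [1, 3, 32, 32] 3072
```

A zero-filled body is useful only as a smoke test that the endpoint accepts the shape; real predictions need real image data.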
- Download the test data file, submit it to the cifar-triton-torch model, and check the returned result.
$ wget https://raw.githubusercontent.com/kserve/kserve/master/docs/samples/v1beta1/triton/torchscript/input.json
$ curl -s -X POST -k "${MODEL_URL}/infer" -H "Content-Type: application/json" -d @./input.json | jq
{
  "model_name": "cifar-triton-torch__isvc-9f77f26bf2",
  "model_version": "1",
  "outputs": [
    {
      "name": "OUTPUT__0",
      "datatype": "FP32",
      "shape": [1, 10],
      "data": [-0.55252016, -1.7675304, 0.6265609, 1.4070208, 0.38794953, 1.3849527, -0.16314837, 0.85409915, -0.6349715, -0.6840154]
    }
  ]
}
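The ten values in OUTPUT__0 are per-class scores, so the predicted class is the index of the largest one. The sketch below picks that index; the label list is an assumption (the conventional CIFAR-10 class order), since the article does not state which labels the model was trained with.

```python
# Assumption: the model uses the conventional CIFAR-10 class order.
CIFAR10_LABELS = ["airplane", "automobile", "bird", "cat", "deer",
                  "dog", "frog", "horse", "ship", "truck"]


def top_class(response, labels):
    """Return (index, label) of the highest score in the first output tensor."""
    scores = response["outputs"][0]["data"]
    # Index of the maximum score, i.e. an argmax over a flat list.
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, labels[index]


# Output section copied from the inference response shown above.
response = {
    "outputs": [{
        "name": "OUTPUT__0", "datatype": "FP32", "shape": [1, 10],
        "data": [-0.55252016, -1.7675304, 0.6265609, 1.4070208, 0.38794953,
                 1.3849527, -0.16314837, 0.85409915, -0.6349715, -0.6840154],
    }]
}
index, label = top_class(response, CIFAR10_LABELS)
print(index, label)
# prints: 3 cat
```

This handles a single-item batch only; for larger batches the flat data list would first need to be split into rows of ten values.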
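Run an ONNX model
- Download the model file https://ai-on-openshift.io/odh-rhoai/img-triton/card.fraud.detection.onnx to your local machine.
- Upload the model file to the card-fraud-detection folder in the ai-models bucket of the object store.
- Deploy card-fraud-detection.onnx to the Triton Model Server as shown in the figure below.
- Check the deployment status.
- Query the model's input and output formats.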
$ MODEL_NAME=card-fraud-detection
$ MODEL_URL=$(oc get route $MODEL_NAME -ojsonpath=https://{.spec.host})/v2/models/$MODEL_NAME
$ curl -s ${MODEL_URL} | jq
{
  "name": "card-fraud-detection-1__isvc-c0a9fa30b8",
  "versions": ["1"],
  "platform": "onnxruntime_onnx",
  "inputs": [
    {
      "name": "dense_input",
      "datatype": "FP32",
      "shape": ["-1", "7"]
    }
  ],
  "outputs": [
    {
      "name": "dense_3",
      "datatype": "FP32",
      "shape": ["-1", "1"]
    }
  ]
}
- Send an inference request to the card-fraud-detection model.
$ curl -s -X POST -k "${MODEL_URL}/infer" -d '{"inputs": [{ "name": "dense_input", "shape": [1, 7], "datatype": "FP32", "data": [57.87785658389723,0.3111400080477545,1.9459399775518593,1.0,1.0,0.0,0.0]}]}' | jq
{
  "model_name": "card-fraud-detection__isvc-7bda50d09c",
  "model_version": "1",
  "outputs": [
    {
      "name": "dense_3",
      "datatype": "FP32",
      "shape": [1, 1],
      "data": [0.86280495]
    }
  ]
}
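The curl call above can also be issued from a program. The following is a minimal sketch using only the Python standard library; the function names and the example.com host are hypothetical, and disabling certificate verification is shown only because it mirrors curl's -k flag.

```python
import json
import ssl
import urllib.request


def build_infer_request(model_url, name, shape, datatype, data):
    """Create (but do not send) a POST request for the v2 /infer endpoint."""
    body = {"inputs": [{"name": name, "shape": shape,
                        "datatype": datatype, "data": data}]}
    return urllib.request.Request(
        url=f"{model_url}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, verify_tls=False):
    """Send the request and decode the JSON reply.

    verify_tls=False mirrors curl's -k flag and should not be used when the
    route has a trusted certificate.
    """
    context = ssl.create_default_context()
    if not verify_tls:
        # check_hostname must be disabled before verify_mode is relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


# Hypothetical host; substitute the value of MODEL_URL from the shell session.
request = build_infer_request(
    "https://card-fraud-detection.example.com/v2/models/card-fraud-detection",
    name="dense_input", shape=[1, 7], datatype="FP32",
    data=[57.87785658389723, 0.3111400080477545, 1.9459399775518593,
          1.0, 1.0, 0.0, 0.0],
)
print(request.get_method(), request.full_url)
```

Calling `send(request)` against a real route would return the same JSON structure that curl prints; building and sending are kept separate so the request can be inspected without network access.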
Run a TensorFlow model
- Clone modelmesh-minio-examples to your local machine, then inspect the files under the modelmesh-minio-examples/tensorflow directory.
$ git clone https://github.com/kserve/modelmesh-minio-examples && cd modelmesh-minio-examples
$ tree tensorflow
tensorflow
+--- mnist
| +--- saved_model.pb
| +--- variables
| | +--- variables.data-00000-of-00001
| | +--- variables.index
+--- simple_string
| +--- 1
| | +--- model.graphdef
| +--- config.pbtxt
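- Upload the tensorflow directory to the ai-models bucket in the object store.
- Deploy the models in RHOAI as shown in the figure below, using tensorflow/mnist and tensorflow/simple_string respectively as the model Path.
- When the deployments finish, the mnist and simplestring models are listed.
- Query each model's input and output formats.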
$ MODEL_NAME=mnist
$ MODEL_URL=$(oc get route $MODEL_NAME -ojsonpath=https://{.spec.host})/v2/models/$MODEL_NAME
$ curl -s ${MODEL_URL} | jq
{
  "name": "mnist__isvc-a18e6fe55d",
  "versions": ["1"],
  "platform": "tensorflow_savedmodel",
  "inputs": [
    {
      "name": "inputs",
      "datatype": "FP32",
      "shape": ["-1", "784"]
    }
  ],
  "outputs": [
    {
      "name": "classes",
      "datatype": "INT64",
      "shape": ["-1", "1"]
    }
  ]
}
- Save the following content to a file named mnist-test.json.
{"inputs": [{"name": "inputs","shape": [1, 784],"datatype": "FP32","data": [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2392, 0.0118, 0.1647, 0.4627, 0.7569, 0.4627, 0.4627, 0.2392, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0549, 0.7020, 0.9608, 0.9255, 0.9490, 0.9961, 0.9961, 0.9961, 0.9961, 0.9608, 0.9216, 0.3294, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.5922, 0.9961, 0.9961, 0.9961, 0.8353, 0.7529, 0.6980, 0.6980, 0.7059, 0.9961, 0.9961, 0.9451, 0.1804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1686, 0.9216, 0.9961, 0.8863, 0.2510, 0.1098, 0.0471, 0.0000, 0.0000, 0.0078, 0.5020, 0.9882, 1.0000, 0.6784, 0.0667, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2196, 0.9961, 0.9922, 0.4196, 0.0000, 0.0000, 0.0000, 0.0000, 
0.0000, 0.0000, 0.0000, 0.5255, 0.9804, 0.9961, 0.2941, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2471, 0.9961, 0.6196, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8667, 0.9961, 0.6157, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.7608, 0.9961, 0.4039, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.5882, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1333, 0.8627, 0.9373, 0.2275, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.4941, 0.9961, 0.6706, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8392, 0.9373, 0.2353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8392, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0431, 0.8588, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3294, 0.9961, 0.8353, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3843, 0.9961, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.6353, 0.9961, 0.8196, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3843, 0.9961, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2000, 0.9333, 0.9961, 0.2941, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3843, 0.9961, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2000, 0.6471, 0.9961, 0.7647, 0.0157, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2588, 0.9451, 0.7804, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0118, 0.6549, 0.9961, 0.8902, 0.2157, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8392, 0.8353, 0.0784, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1804, 0.5961, 0.7922, 0.9961, 0.9961, 0.2471, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.8392, 0.9961, 0.8000, 0.7059, 0.7059, 0.7059, 0.7059, 0.7059, 0.9216, 0.9961, 0.9961, 0.9176, 0.6118, 0.0392, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3176, 0.8039, 0.9961, 0.9961, 0.9961, 0.9961, 0.9961, 0.9961, 0.9961, 0.9882, 0.9176, 0.4706, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1020, 0.8235, 0.9961, 0.9961, 0.9961, 0.9961, 0.9961, 0.6000, 0.4078, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]}]
}
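A payload like the one above does not have to be typed by hand. As a minimal sketch, the hypothetical helper below flattens a 28x28 grid of grey values into the 784-element input; dividing by 255 is an assumption, based on the sample values looking like pixel intensities scaled to the range 0 to 1 (for example 61/255 rounds to 0.2392).

```python
def mnist_request(pixels, input_name="inputs"):
    """Flatten a 28x28 grid of 0-255 grey values into a v2 request body."""
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected a 28x28 grid")
    # Row-major flattening yields the 784 values the model input expects.
    data = [round(value / 255, 4) for row in pixels for value in row]
    return {"inputs": [{"name": input_name, "shape": [1, 784],
                        "datatype": "FP32", "data": data}]}


# Hypothetical all-black image with one white pixel at row 0, column 1.
image = [[0] * 28 for _ in range(28)]
image[0][1] = 255
body = mnist_request(image)
print(len(body["inputs"][0]["data"]), body["inputs"][0]["data"][:3])
# prints: 784 [0.0, 1.0, 0.0]
```

Writing the returned dictionary to mnist-test.json with the standard json module gives a file that can be passed to curl with -d @./mnist-test.json.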
- Submit the test file to the mnist model and check the returned result.
$ curl -s -X POST -k "${MODEL_URL}/infer" -H "Content-Type: application/json" -d @./mnist-test.json | jq
{
  "model_name": "mnist__isvc-a18e6fe55d",
  "model_version": "1",
  "outputs": [
    {
      "name": "classes",
      "datatype": "INT64",
      "shape": [1, 1],
      "data": [0]
    }
  ]
}
References
https://github.com/kserve/modelmesh-serving/blob/main/config/runtimes/triton-2.x.yaml
https://github.com/kserve/modelmesh-minio-examples/tree/main/pytorch/cifar
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_repository.html
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/tutorials/Quick_Deploy/PyTorch/README.html
https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton_inference_server_1150/user-guide/docs/model_repository.html#pytorch-models
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags
https://github.com/triton-inference-server/tutorials/tree/main/Conceptual_Guide/Part_1-model_deployment
https://github.com/triton-inference-server/server/tree/main/docs/examples
https://ai-on-openshift.io/odh-rhoai/custom-runtime-triton
https://ai-on-openshift.io/tools-and-applications/ensemble-serving/ensemble-serving/
https://ai-on-openshift.io/odh-rhoai/custom-runtime-triton/#deploying-a-model-into-it
https://github.com/rh-aiservices-bu/kserve-triton-ensemble-testing
https://github.com/rh-aiservices-bu/kserve-triton-ensemble-testing/blob/main/runtime/runtime-rest.yaml
https://kserve.github.io/website/latest/modelserving/v1beta1/triton/torchscript/