Serving

This guide walks through the steps to deploy and serve a model with NVIDIA Triton Inference Server.

1. Create a PVC named triton-pvc containing the models to deploy (a sketch is shown below).
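
If the PVC does not exist yet, the sketch below shows one way to create it; the size, access mode, and model/version names are assumptions to adapt to your cluster, and the triton namespace is assumed to already exist. The model repository inside the PVC must follow Triton's layout: one directory per model, each containing a config.pbtxt and numbered version subdirectories.

$ cat <<EOF | kubectl apply -n triton -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: triton-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

# expected layout under the mount path used in the next step (model name is an example)
/mnt/models/ai/triton/model_repository/
└── mymodel
    ├── config.pbtxt
    └── 1
        └── model.onnx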

2. Submit your serving job to NVIDIA Triton Inference Server.

$ arena serve triton \
 --name=test-triton \
 --namespace=triton \
 --gpus=1 \
 --image=nvcr.io/nvidia/tritonserver:24.01-py3 \
 --data=triton-pvc:/mnt/models \
 --model-repository=/mnt/models/ai/triton/model_repository

configmap/test-triton-202105312038-triton-serving created
configmap/test-triton-202105312038-triton-serving labeled
service/test-triton-202105312038-tritoninferenceserver created
deployment.apps/test-triton-202105312038-tritoninferenceserver created
INFO[0001] The Job test-triton has been submitted successfully 
INFO[0001] You can run `arena get test-triton --type triton-serving` to check the job status 
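
The --data flag mounts the PVC triton-pvc at /mnt/models inside the pod, so --model-repository points at the model repository path within that mount. You can run the suggested command to check more details of the job (the namespace flag is added here to match the namespace used above; output omitted):

$ arena get test-triton --type triton-serving -n triton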

3. List the serving job you just submitted.

$ arena serve list -n triton
NAME         TYPE    VERSION       DESIRED  AVAILABLE  ADDRESS       PORTS
test-triton  Triton  202105312038  1        1          172.16.72.43  RESTFUL:8000,GRPC:8001

4. Test the model service

$ kubectl get svc -n triton
NAME                                             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                        AGE
test-triton-202105312038-tritoninferenceserver   ClusterIP      172.16.72.43    <none>          8000/TCP,8001/TCP,8002/TCP                     5m41s

$ kubectl port-forward svc/test-triton-202105312038-tritoninferenceserver -n triton 8000:8000
Forwarding from 127.0.0.1:8000 -> 8000
Forwarding from [::1]:8000 -> 8000

# check that the server is ready to serve the deployed models
$ curl -v localhost:8000/v2/health/ready
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
* Closing connection 0
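
Once the server reports ready, you can also check an individual model and send a test inference request through Triton's KServe v2 HTTP API. The model name, input name, datatype, shape, and data below are placeholders and must match the config.pbtxt of a model in your repository:

# check that a specific model is ready (model name is an example)
$ curl localhost:8000/v2/models/mymodel/ready

# send a test inference request
$ curl -X POST localhost:8000/v2/models/mymodel/infer \
    -H 'Content-Type: application/json' \
    -d '{
          "inputs": [
            {
              "name": "input__0",
              "shape": [1, 4],
              "datatype": "FP32",
              "data": [1.0, 2.0, 3.0, 4.0]
            }
          ]
        }'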

5. Delete the inference service

$ arena serve delete test-triton -n triton                                                                                         
service "test-triton-202105312038-tritoninferenceserver" deleted
deployment.apps "test-triton-202105312038-tritoninferenceserver" deleted
configmap "test-triton-202105312038-triton-serving" deleted
INFO[0001] The serving job test-triton with version 202105312038 has been deleted successfully