Update serving
This recipe suggest how to update the triton serving after it has deployed.
1. Deploy a triton serving job follow the submit a nvidia triton serving job which use gpus.
2. Update the serving.
arena support update some config of triton serving after it has deployed.
$ arena serve update triton --help
Update a triton serving job and its associated instances
Usage:
arena serve update triton [flags]
Flags:
--allow-metrics open metrics (default true)
--command string the command will inject to container's command.
--cpu string the request cpu of each replica to run the serve.
-e, --env stringArray the environment variables
--gpumemory int the limit GPU memory of each replica to run the serve.
--gpus int the limit GPU count of each replica to run the serve.
-h, --help help for triton
--image string the docker image name of serving job
--memory string the request memory of each replica to run the serve.
--model-repository string the path of triton model path
--name string the serving name
--replicas int the replicas number of the serve job.
--version string the serving version
Global Flags:
--arena-namespace string The namespace of arena system service, like tf-operator (default "arena-system")
--config string Path to a kube config. Only required if out-of-cluster
--loglevel string Set the logging level. One of: debug|info|warn|error (default "info")
-n, --namespace string the namespace of the job
--pprof enable cpu profile
--trace enable trace
for example, if you want to scale the replicas, you can use
$ arena serve update triton --name=test-triton --replicas=2
and if you want to update the model path, you can do like this command.
$ arean serve update triton --name=test-triton --model-repository=/mnt/models/ai/triton/model_repository
After you execute the command, the triton serving will do rolling update with the support of kubernetes deployment.