PyTorch Training Job with Configuration Files

You can pass configuration files to containers when submitting PyTorch jobs. The following steps show how to use this feature.

1. Prepare the configuration file on the machine from which you submit the job.

# prepare your config-file
➜ cat  /tmp/test-config.json
{
    "key": "job-config"
}
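
If you prefer to generate the file rather than write it by hand, the following is a minimal sketch in Python that writes the same content and reads it back to confirm it is valid JSON. The helper name `write_config` is hypothetical and not part of arena; only the path and the key/value pair come from the example above.

```python
import json
from pathlib import Path


def write_config(path, config):
    """Write `config` to `path` as indented JSON and return the parsed round-trip.

    Hypothetical helper for illustration; arena only needs the file to exist.
    """
    target = Path(path)
    target.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    # Re-read the file so a malformed write is caught before the job is submitted.
    return json.loads(target.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Same path and content as the example configuration file in this step.
    print(write_config("/tmp/test-config.json", {"key": "job-config"}))
```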

2. Submit the job and specify the configuration file with --config-file <LOCAL_FILE>:<CONTAINER_FILE>. The following command mounts the local file /tmp/test-config.json into the containers at the path /etc/config/config.json.

➜ arena --loglevel info submit pytorch \
    --name=pytorch-config-file \
    --gpus=1 \
    --image=registry.cn-beijing.aliyuncs.com/ai-samples/pytorch-with-tensorboard:1.5.1-cuda10.1-cudnn7-runtime \
    --sync-mode=git \
    --sync-source=https://code.aliyun.com/370272561/mnist-pytorch.git \
    --config-file /tmp/test-config.json:/etc/config/config.json \
    "python /root/code/mnist-pytorch/mnist.py --epochs 50 --backend gloo"

configmap/pytorch-config-file-pytorchjob created
configmap/pytorch-config-file-pytorchjob labeled
configmap/pytorch-config-file-a9cbad1b8719778 created
pytorchjob.kubeflow.org/pytorch-config-file created
INFO[0000] The Job pytorch-config-file has been submitted successfully
INFO[0000] You can run `arena get pytorch-config-file --type pytorchjob` to check the job status

3. Get the details of the job.

➜ arena get pytorch-config-file --type pytorchjob
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 51s

NAME                 STATUS   TRAINER     AGE  INSTANCE                      NODE
pytorch-config-file  RUNNING  PYTORCHJOB  51s  pytorch-config-file-master-0  172.16.0.210

4. Use kubectl to check whether the file is present in the container:

➜ kubectl exec -ti pytorch-config-file-master-0 -- cat /etc/config/config.json
{
    "key": "job-config"
}
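
Once the file is mounted, the training script can read it like any other local file. The following is a minimal sketch of how a script might do that, assuming the container path /etc/config/config.json used above. The function `load_job_config` and its `defaults` parameter are hypothetical names for illustration; the sample mnist.py script is not known to read this file.

```python
import json
from pathlib import Path

# Container path taken from the --config-file option in step 2.
DEFAULT_CONFIG_PATH = "/etc/config/config.json"


def load_job_config(path=DEFAULT_CONFIG_PATH, defaults=None):
    """Return the mounted JSON config merged over `defaults`.

    Hypothetical helper: falls back to `defaults` alone when the file is
    absent, so the script also runs outside a job submitted with --config-file.
    """
    config = dict(defaults or {})
    config_file = Path(path)
    if config_file.is_file():
        with config_file.open(encoding="utf-8") as f:
            config.update(json.load(f))
    return config


if __name__ == "__main__":
    config = load_job_config()
    print(config.get("key"))
```

Falling back to defaults when the file is missing is a design choice for this sketch; a script that requires the configuration could raise an error instead.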