
Tensorflow job with configuration files

The following steps show how to pass configuration files to a job's containers when you submit the job.

1. Prepare a sample configuration file. Create a test file named "test-config.json" at "/tmp/test-config.json". We want to make this file available in the containers of a TFJob (or MPIJob) at the container path "/etc/config/config.json".

$ cat /tmp/test-config.json
{
    "key": "job-config"

}
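If you want to script this step, a heredoc creates the same file without an editor. This is a minimal sketch that assumes bash and the /tmp path used throughout this guide:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the sample configuration file used in this example.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > /tmp/test-config.json <<'EOF'
{
    "key": "job-config"
}
EOF

# Print the file so it can be compared with the copy inside the containers later.
cat /tmp/test-config.json
```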

2. Submit a TFJob and specify the configuration file with the --config-file option.

$ arena submit tfjob \
    --name=tf \
    --gpus=1 \
    --workers=1 \
    --worker-image=cheyang/tf-mnist-distributed:gpu \
    --ps-image=cheyang/tf-mnist-distributed:cpu \
    --ps=1 \
    --tensorboard \
    --config-file /tmp/test-config.json:/etc/config/config.json \
    "python /app/main.py"

Use --config-file <host_path_file>:<container_path_file> to make a configuration file on the host available at the given path in the job's containers.

Note

The following rules apply:

  • If you specify <host_path_file> but omit <container_path_file>, <container_path_file> defaults to the same path as <host_path_file>.
  • <container_path_file> must be an absolute file path.
  • You can pass --config-file more than once in a single command, e.g. --config-file <file1>:<container_file1> --config-file <file2>:<container_file2>.

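The rules above can be checked before calling arena. The sketch below is illustrative only: `resolve_config_spec` is a hypothetical helper, not part of arena (which does its own parsing), and /tmp/extra-config.json is a made-up second file. It applies the default-path and absolute-path rules to each spec, then expands the specs into repeated --config-file options.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of arena): normalize one --config-file value
# to "<host_path_file>:<container_path_file>" following the rules in the note.
resolve_config_spec() {
  local spec="$1"
  local host_path="${spec%%:*}"      # text before the first ':'
  local container_path
  if [[ "$spec" == *:* ]]; then
    container_path="${spec#*:}"      # text after the first ':'
  else
    container_path="$host_path"      # rule 1: default to the host path
  fi
  if [[ "$container_path" != /* ]]; then
    echo "error: '$container_path' is not an absolute path" >&2   # rule 2
    return 1
  fi
  printf '%s:%s\n' "$host_path" "$container_path"
}

# Rule 3: one --config-file option per file. The second spec is hypothetical.
config_args=()
for spec in /tmp/test-config.json:/etc/config/config.json \
            /tmp/extra-config.json:/etc/config/extra.json; do
  resolved="$(resolve_config_spec "$spec")" || exit 1
  config_args+=(--config-file "$resolved")
done

# Dry run: print the command instead of running it. The other submit
# options from step 2 are omitted here for brevity.
echo arena submit tfjob --name=tf "${config_args[@]}" "python /app/main.py"
```

Building the options in a bash array keeps each path a single argument even if it contains spaces.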
3. Query the job details and make sure the job status is "RUNNING".

$ arena get tf
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 16s

NAME  STATUS   TRAINER  AGE  INSTANCE     NODE
tf    RUNNING  TFJOB    16s  tf-ps-0      192.168.7.18
tf    RUNNING  TFJOB    16s  tf-worker-0  192.168.7.16

Your tensorboard will be available on:
http://192.168.7.10:31825
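Instead of re-running `arena get` by hand, you can poll it until the job is RUNNING. This sketch assumes the output layout shown above (a `STATUS:` line, then an instance table that ends at the first blank line), which may differ between arena versions. `job_status`, `job_instances` and `wait_for_running` are hypothetical helper names, and the 10-second interval and 30 attempts are arbitrary choices.

```shell
#!/usr/bin/env bash

# Print the job-level status from `arena get` output read on stdin,
# e.g. "RUNNING" for the line "STATUS: RUNNING".
job_status() {
  awk -F': ' '/^STATUS:/ { print $2; exit }'
}

# Print the INSTANCE column (5th field) of the instance table, which starts
# after the "NAME  STATUS ..." header and ends at the first blank line.
job_instances() {
  awk '/^NAME[ \t]+STATUS/ { in_table = 1; next }
       in_table && NF == 0 { exit }
       in_table            { print $5 }'
}

# Poll `arena get <job>` until the job is RUNNING or the attempts run out.
wait_for_running() {
  local job="$1" attempts="${2:-30}" i
  for ((i = 0; i < attempts; i++)); do
    if [[ "$(arena get "$job" | job_status)" == "RUNNING" ]]; then
      return 0
    fi
    sleep 10
  done
  echo "job $job did not reach RUNNING" >&2
  return 1
}
```

For example, `wait_for_running tf && arena get tf | job_instances` waits for the job and then, given output like the one above, lists tf-ps-0 and tf-worker-0.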

4. Use kubectl to check whether the file exists in each container.

$ kubectl exec -ti tf-ps-0 -- cat /etc/config/config.json
{
    "key": "job-config"

}

$ kubectl exec -ti tf-worker-0 -- cat /etc/config/config.json
{
    "key": "job-config"

}

As you can see, the file /etc/config/config.json exists in both the PS and the worker container.
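To repeat this check for every instance and compare the result with the original host file, you can loop over the pods. This is a sketch assuming the pods are in kubectl's current namespace (default here) and are named as in the `arena get` output (tf-ps-0, tf-worker-0); `verify_config` is a hypothetical helper. The `-ti` flags are dropped because the output is piped to diff rather than shown on a terminal.

```shell
#!/usr/bin/env bash

# Compare a file inside each pod with the corresponding host file.
# Usage: verify_config <host_file> <container_file> <pod>...
# Returns non-zero if any pod is missing the file or has different content.
verify_config() {
  local host_file="$1" container_file="$2" pod failed=0
  shift 2
  for pod in "$@"; do
    # diff reads the container copy from stdin ("-"); if kubectl fails,
    # stdin is empty and diff reports a difference.
    if kubectl exec "$pod" -- cat "$container_file" | diff -q - "$host_file" >/dev/null; then
      echo "$pod: $container_file matches $host_file"
    else
      echo "$pod: $container_file is missing or differs" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example for the job in this guide:
# verify_config /tmp/test-config.json /etc/config/config.json tf-ps-0 tf-worker-0
```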