Training Job Guide
Welcome to the Arena Training Job Guide! This guide covers how to use the Arena CLI
to manage training jobs. This page outlines the most common situations and questions that bring readers to this section.
Who should use this guide?
If you want to use Arena to manage training jobs, this guide is for you. We have included detailed usage instructions for managing training jobs.
Manage Training Jobs
- How to list all training jobs.
- How to get the details of a training job.
- How to attach to a training job.
- How to get the logs of a training job.
- How to delete training jobs.
- How to clean up finished training jobs.
TensorFlow Training Job Guide
- I want to submit a standalone TensorFlow training job.
- I want to submit a TensorFlow training job with a specified TensorBoard.
- I want to submit a distributed TensorFlow training job.
- I want to submit a TensorFlow training job with specified datasets.
- I want to submit a TensorFlow training job with gang scheduling enabled.
- I want to submit a TensorFlow training job with a specified estimator.
- I want to submit a TensorFlow training job with specified node selectors.
- I want to submit a TensorFlow training job that tolerates specified node taints.
- I want to submit a TensorFlow training job with specified configuration files.
- I want to submit a TensorFlow training job with a specified role sequence.
MPI Training Job Guide
- I want to submit a distributed MPI training job.
- I want to submit a distributed MPI training job with GPU topology scheduling.
- I want to preempt an MPI training job.
- I want to submit an MPI training job with specified tolerations.
- I want to submit an MPI training job with specified node selectors.
- I want to submit an MPI training job with specified configuration files.
- I want to submit an MPI training job with specified RDMA devices.
PyTorch Training Job Guide
- I want to submit a standalone PyTorch training job.
- I want to submit a distributed PyTorch training job.
- I want to submit a PyTorch training job with a specified TensorBoard.
- I want to submit a PyTorch training job with specified datasets.
- I want to submit a PyTorch training job with specified node selectors.
- I want to submit a PyTorch training job with specified node tolerations.
- I want to submit a PyTorch training job with specified configuration files.
- I want to preempt a PyTorch training job.
- I want to submit a PyTorch training job with a specified task-cleaning policy.
Elastic Training Job Guide
- I want to submit an elastic training job (PyTorch).
- I want to submit an elastic training job (TensorFlow).
Cron Training Job Guide
- I want to submit a cron training job (TensorFlow).
Spark Training Job Guide
- I want to submit a distributed Spark training job.
Volcano Training Job Guide
- I want to submit a Volcano training job.
Ray Training Job Guide
- I want to submit a Ray training job.