Skip to content

Release 0.9.1

Fixed

  • Fix the bug that failed to run pytorchjob with RDMA
  • Fix the bug that error dispaly gpu core resources on nodes
  • Fix the bug that add evaluator and tensorboard to pod group

Changed

  • Refact installtion
  • Modify restful-serving to http-serving of deployment services
  • Optimize the operators to omit the Completed jobs into the queue

Added

  • Support modeljob adapts helm3
  • Cron workload supports custom labels
  • Java SDK submits training job with --label
  • Add resource limits for tfjob
  • Add subpathexpr for job