Release 0.4.0
Added
- Add Pytorch
- Support custom serving
- Add tarball installation for Linux and Mac
- Support non-root installation
- Add train init framework
- Add GPU support for PS
- Support GangScheduling Native in MPIJob
- Support GangScheduling Native in PytorchJob
Fixed
- Upgrade Deployment version from extensions/v1beta1 to apps/v1
- Fix the issue of incorrect number of allocated GPUs
- Upgrade Helm to v2.14.1
- Fix evaluator & chief validation
- Fix incorrect cpu resource variable, should be psCPU
- Set exit code as 2 when delete job failed
- Fix the bug of using Estimator
- Fix the bug of deploying Prometheus
- Support Kubernetes 1.18 and above