Submit a distributed spark job
Arena supports and simplifies distributed spark job.
1. To run a distributed spark job, you need to specify:
- The spark job image which contains the main class jar. (required)
- Main class of your jar. (required)
- Jar path in the container.(required)
- The number of executors.(default: 1)
- The resource cpu request of driver pod (default: 1)
- The resource memory request of driver pod (default: 500m)
- The resource cpu request of executor pod (default: 1)
- The resource memory request of executor pod (default: 500m)
2. How to create spark job image.
Arena spark job is based on spark-operator.You can create spark job image with tool docker-image-tool.
3. Install the spark operator.
Please refer Spark operator documentation to get more details about how to install spark operator.
4. Submit a spark job,the following command shows how to use arena to submit a sparkjob.
$ arena submit sparkjob \
--name=spark-pi \
--image=spark:3.5.0 \
--jar=local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar \
--main-class=org.apache.spark.examples.SparkPi \
--spark-version=3.5.0 \
--driver-cpu-request=1 \
--driver-memory-request=500m \
--driver-service-account=spark-operator-spark \
--replicas=1 \
--executor-cpu-request=1 \
--executor-memory-request=500m
sparkapplication.sparkoperator.k8s.io/spark-pi created
INFO[0002] The Job spark-pi has been submitted successfully
INFO[0002] You can run `arena get spark-pi --type sparkjob -n default` to check the job status
5. Get spark job details
$ bin/arena get spark-pi
Name: spark-pi
Status: SUCCEEDED
Namespace: default
Priority: N/A
Trainer: SPARKJOB
Duration: 9s
CreateTime: 2024-07-30 10:56:25
EndTime: 2024-07-30 10:56:34
Instances:
NAME STATUS AGE IS_CHIEF GPU(Requested) NODE
---- ------ --- -------- -------------- ----
spark-pi-driver Completed 9s true 0 cn-hongkong.172.21.122.2
6. Get logs of the spark job.
$ arena logs spark-pi --tail 10
24/07/30 02:56:33 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed.
24/07/30 02:56:33 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/07/30 02:56:33 INFO MemoryStore: MemoryStore cleared
24/07/30 02:56:33 INFO BlockManager: BlockManager stopped
24/07/30 02:56:33 INFO BlockManagerMaster: BlockManagerMaster stopped
24/07/30 02:56:33 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/07/30 02:56:33 INFO SparkContext: Successfully stopped SparkContext
24/07/30 02:56:33 INFO ShutdownHookManager: Shutdown hook called
24/07/30 02:56:33 INFO ShutdownHookManager: Deleting directory /tmp/spark-271a7ffc-3be4-4f5d-b6dc-082f4f046cb1
24/07/30 02:56:33 INFO ShutdownHookManager: Deleting directory /var/data/spark-1b0e1d8b-4dce-43aa-8e5c-20ea82490648/spark-0b1ae353-731c-4bc5-b4fe-0f21c92adfb0
7. Delete the spark job when it finished.
$ arena delete spark-pi
INFO[0002] The training job spark-pi has been deleted successfully