Using the Scheduler with Third-Party Workloads
By default, Kubernetes uses its own native scheduler to determine pod placement. The NVIDIA Run:ai platform provides a custom scheduler, runai-scheduler, which is used by default for workloads submitted using the NVIDIA Run:ai platform. This section outlines how to configure third-party workloads, such as those submitted directly to Kubernetes, to run with the NVIDIA Run:ai Scheduler, runai-scheduler, instead of the default Kubernetes scheduler.
Specify the Scheduler in the Workload YAML
To use the NVIDIA Run:ai Scheduler for third-party workloads, specify it in the workload’s YAML file. This instructs Kubernetes to schedule the workload using the NVIDIA Run:ai Scheduler instead of the default one.
spec:
  schedulerName: runai-scheduler
For example:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    user: test
    gpu-fraction: "0.5"
    gpu-fraction-num-devices: "2"
  labels:
    runai/queue: test
  name: multi-fractional-pod-job
  namespace: test
spec:
  containers:
  - image: gcr.io/run-ai-demo/quickstart-cuda
    imagePullPolicy: Always
    name: job
    env:
    - name: RUNAI_VERBOSE
      value: "1"
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 100Mi
    securityContext:
      capabilities:
        drop: ["ALL"]
  schedulerName: runai-scheduler
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 5
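After the pod is created, it can be useful to confirm which scheduler Kubernetes recorded for it. The following is a minimal sketch, assuming kubectl is configured for the cluster; the function name pod_scheduler is hypothetical and not part of NVIDIA Run:ai. It reads the standard spec.schedulerName field of the pod.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NVIDIA Run:ai): print the scheduler
# recorded in a pod's spec.
# Arguments: <pod-name> <namespace>
pod_scheduler() {
  local pod="$1"
  local namespace="$2"
  kubectl get pod "$pod" --namespace "$namespace" \
    --output jsonpath='{.spec.schedulerName}'
}

# Usage, with the pod name and namespace from the example manifest:
#   pod_scheduler multi-fractional-pod-job test
```

If the manifest was applied as written, the command should print the value set in schedulerName.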
Enforce the Scheduler at the Namespace Level
If modifying the workload YAML is not possible, you can enforce the use of the NVIDIA Run:ai Scheduler for all workloads in a given namespace (i.e., NVIDIA Run:ai project) by applying an annotation. Once applied, all workloads submitted to the annotated namespace will automatically use the NVIDIA Run:ai Scheduler without requiring individual YAML modifications.
Annotate the namespace with runai/enforce-scheduler-name: true. For example, to annotate a project named proj-a (namespace runai-proj-a), use the following command:
kubectl annotate ns runai-proj-a runai/enforce-scheduler-name=true
To confirm that the annotation was applied, display the namespace in YAML format:
kubectl get ns runai-proj-a -o yaml
The following shows an example output:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    runai/enforce-scheduler-name: "true"
  creationTimestamp: "2024-04-09T08:15:50Z"
  labels:
    kubernetes.io/metadata.name: runai-proj-a
    runai/namespace-version: v2
    runai/queue: proj-a
  name: runai-proj-a
  resourceVersion: "388336"
  uid: c53af666-7989-43df-9804-42bf8965ce83
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
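To check that enforcement is taking effect, one option is to list the pods in the annotated namespace together with the scheduler recorded for each. This is a sketch under the assumption that kubectl is configured for the cluster; the function name list_pod_schedulers is hypothetical. It uses kubectl's standard custom-columns output to show each pod's spec.schedulerName.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NVIDIA Run:ai): list every pod in a
# namespace alongside the scheduler recorded in its spec.
# Arguments: <namespace>
list_pod_schedulers() {
  local namespace="$1"
  kubectl get pods --namespace "$namespace" \
    --output custom-columns='NAME:.metadata.name,SCHEDULER:.spec.schedulerName'
}

# Usage, with the namespace from the example above:
#   list_pod_schedulers runai-proj-a
```

Pods submitted after the annotation was applied would be expected to show runai-scheduler in the SCHEDULER column. Pods created earlier may still show their original scheduler, since the scheduler name is fixed when a pod is created.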