Using the Scheduler with Third-Party Workloads

By default, Kubernetes places pods using its native scheduler, default-scheduler. The NVIDIA Run:ai platform provides its own scheduler, runai-scheduler, which it uses automatically for workloads submitted through NVIDIA Run:ai. This section explains how to make third-party workloads, such as those submitted directly to Kubernetes, run with the NVIDIA Run:ai Scheduler instead of the default Kubernetes scheduler.

Specify the Scheduler in the Workload YAML

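To use the NVIDIA Run:ai Scheduler for a third-party workload, set the schedulerName field in the workload's YAML. This instructs Kubernetes to schedule the workload with the NVIDIA Run:ai Scheduler instead of the default one. For a Pod, the field goes directly under spec; for controllers that create pods from a template, such as a Job or Deployment, it goes in the pod template (spec.template.spec.schedulerName).

spec:
  schedulerName: runai-scheduler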

For example:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    user: test
    gpu-fraction: "0.5"
    gpu-fraction-num-devices: "2"
  labels:
    runai/queue: test
  name: multi-fractional-pod-job
  namespace: test
spec:
  containers:
  - image: gcr.io/run-ai-demo/quickstart-cuda
    imagePullPolicy: Always
    name: job
    env:
    - name: RUNAI_VERBOSE
      value: "1"
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 100Mi
    securityContext:
      capabilities:
        drop: ["ALL"]
  schedulerName: runai-scheduler
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 5
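If you generate or patch third-party manifests programmatically, the field has to land in the pod spec, whose location depends on the resource kind. The following Python sketch is illustrative only: the helper name set_scheduler_name, the set of kinds it handles, and the sample Job are assumptions, not part of NVIDIA Run:ai. It sets schedulerName on a Pod or on the pod template of common controllers and prints the result as JSON, which kubectl apply -f also accepts. It does not add the Run:ai labels or annotations shown in the Pod example above.

```python
import copy
import json

# Path from the top-level manifest to the pod spec, per resource kind.
# Illustrative subset of Kubernetes kinds; extend it for others you submit.
POD_SPEC_PATHS = {
    "Pod": ("spec",),
    "Job": ("spec", "template", "spec"),
    "Deployment": ("spec", "template", "spec"),
    "StatefulSet": ("spec", "template", "spec"),
    "DaemonSet": ("spec", "template", "spec"),
    "ReplicaSet": ("spec", "template", "spec"),
    "CronJob": ("spec", "jobTemplate", "spec", "template", "spec"),
}


def set_scheduler_name(manifest, scheduler="runai-scheduler"):
    """Return a copy of `manifest` whose pod spec uses `scheduler` (hypothetical helper)."""
    kind = manifest.get("kind")
    if kind not in POD_SPEC_PATHS:
        raise ValueError(f"unsupported kind: {kind!r}")
    result = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    node = result
    for key in POD_SPEC_PATHS[kind]:
        node = node.setdefault(key, {})  # create missing levels as empty objects
    node["schedulerName"] = scheduler
    return result


if __name__ == "__main__":
    # Hypothetical sample Job; names and image reuse values from the Pod example.
    job = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "example-job", "namespace": "test"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "job", "image": "gcr.io/run-ai-demo/quickstart-cuda"}
                    ],
                    "restartPolicy": "Never",
                }
            }
        },
    }
    # Pipe this output to `kubectl apply -f -` to submit the Job.
    print(json.dumps(set_scheduler_name(job), indent=2))
```

Copying the manifest rather than mutating it in place keeps the original available, for example to diff the before and after versions.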

Enforce the Scheduler at the Namespace Level

If you cannot modify the workload YAML, you can enforce the NVIDIA Run:ai Scheduler for all workloads in a namespace (that is, the namespace of an NVIDIA Run:ai project) by annotating the namespace. Once the annotation is applied, all workloads submitted to that namespace automatically use the NVIDIA Run:ai Scheduler, with no per-workload YAML changes.
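The resulting precedence can be modeled in a few lines. This Python sketch is a simplified model for reasoning about which scheduler a pod ends up with, not how NVIDIA Run:ai implements enforcement. It assumes that an annotated namespace overrides any schedulerName in the pod spec, and it relies on the Kubernetes behavior that a pod without schedulerName is handled by default-scheduler. The function name effective_scheduler is hypothetical.

```python
ENFORCE_ANNOTATION = "runai/enforce-scheduler-name"
RUNAI_SCHEDULER = "runai-scheduler"
K8S_DEFAULT_SCHEDULER = "default-scheduler"


def effective_scheduler(namespace_annotations, pod_spec):
    """Simplified model of which scheduler a pod is handed to (illustrative only)."""
    # Annotation values are strings, so compare against "true", not the boolean True.
    # Assumption: an enforced namespace overrides the pod's own schedulerName.
    if namespace_annotations.get(ENFORCE_ANNOTATION) == "true":
        return RUNAI_SCHEDULER
    # Otherwise the pod's own field applies; Kubernetes falls back to default-scheduler.
    return pod_spec.get("schedulerName") or K8S_DEFAULT_SCHEDULER


if __name__ == "__main__":
    enforced = {ENFORCE_ANNOTATION: "true"}
    print(effective_scheduler(enforced, {}))                             # runai-scheduler
    print(effective_scheduler({}, {}))                                   # default-scheduler
    print(effective_scheduler({}, {"schedulerName": RUNAI_SCHEDULER}))  # runai-scheduler
```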

  1. Annotate the namespace with runai/enforce-scheduler-name=true. Annotation values are strings, so in YAML the value appears quoted as "true". For example, to annotate the namespace of a project named proj-a (runai-proj-a), run the following command:

kubectl annotate ns runai-proj-a runai/enforce-scheduler-name=true

  2. Verify that the annotation is present by viewing the namespace in YAML format:

kubectl get ns runai-proj-a -o yaml

The output should look similar to the following:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    runai/enforce-scheduler-name: "true"
  creationTimestamp: "2024-04-09T08:15:50Z"
  labels:
    kubernetes.io/metadata.name: runai-proj-a
    runai/namespace-version: v2
    runai/queue: proj-a
  name: runai-proj-a
  resourceVersion: "388336"
  uid: c53af666-7989-43df-9804-42bf8965ce83
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
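To script the verification step, you can request JSON instead of YAML (kubectl get ns runai-proj-a -o json) and inspect it. The Python sketch below is illustrative: the function names are hypothetical, and the sample pods pod-a and pod-b are made up. It checks the namespace annotation and lists pods whose spec.schedulerName is not runai-scheduler, assuming enforcement is reflected in that field. Pods created before the annotation was applied may still show their original scheduler.

```python
import json

ENFORCE_ANNOTATION = "runai/enforce-scheduler-name"


def namespace_enforces_runai(namespace_json):
    """True if a Namespace object (JSON text) has the enforcement annotation set to "true"."""
    ns = json.loads(namespace_json)
    annotations = ns.get("metadata", {}).get("annotations") or {}
    return annotations.get(ENFORCE_ANNOTATION) == "true"


def pods_not_on_runai(pod_list_json, scheduler="runai-scheduler"):
    """Names of pods in a PodList (JSON text) whose spec.schedulerName differs from `scheduler`."""
    pods = json.loads(pod_list_json).get("items", [])
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("spec", {}).get("schedulerName") != scheduler
    ]


if __name__ == "__main__":
    # Trimmed JSON form of the example output above.
    # In practice, use the stdout of: kubectl get ns runai-proj-a -o json
    ns_json = json.dumps({
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": "runai-proj-a",
            "annotations": {ENFORCE_ANNOTATION: "true"},
            "labels": {"runai/queue": "proj-a"},
        },
    })
    print(namespace_enforces_runai(ns_json))  # True

    # Made-up pod list; in practice, use: kubectl get pods -n runai-proj-a -o json
    pods_json = json.dumps({
        "kind": "PodList",
        "items": [
            {"metadata": {"name": "pod-a"}, "spec": {"schedulerName": "runai-scheduler"}},
            {"metadata": {"name": "pod-b"}, "spec": {"schedulerName": "default-scheduler"}},
        ],
    })
    print(pods_not_on_runai(pods_json))  # ['pod-b']
```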
