Setting the Default Scheduler
By default, Kubernetes uses its own native scheduler to determine pod placement. The NVIDIA Run:ai platform provides a custom scheduler, runai-scheduler, which is used by default for workloads submitted through the NVIDIA Run:ai platform.
This section outlines how to configure workloads submitted directly to Kubernetes or through external frameworks to run with the NVIDIA Run:ai Scheduler instead of the default Kubernetes scheduler.
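Before changing any workload configuration, it can help to confirm that both schedulers are running in the cluster. The following is a minimal sketch using standard kubectl commands; it assumes a kubeadm-style cluster where the default scheduler carries the component=kube-scheduler label, and that the NVIDIA Run:ai Scheduler runs in the runai namespace, which may differ in your installation.

    # List the default Kubernetes scheduler (runs in kube-system on kubeadm-based clusters)
    kubectl get pods -n kube-system -l component=kube-scheduler

    # List the NVIDIA Run:ai Scheduler (assumed to run in the runai namespace)
    kubectl get pods -n runai | grep runai-scheduler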
Enforce the Scheduler at the Namespace Level
When submitting workloads in a given namespace (i.e., an NVIDIA Run:ai project), the enforceRunaiScheduler parameter is enabled (true) by default. This ensures that any workload associated with an NVIDIA Run:ai project automatically uses the runai-scheduler, including workloads submitted directly to Kubernetes or through external frameworks.
If this parameter is disabled (enforceRunaiScheduler=false), workloads no longer default to the NVIDIA Run:ai Scheduler. In this case, you can still use the NVIDIA Run:ai Scheduler by specifying it manually in the workload YAML.
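To check the current value for a given project, you can inspect the project resource on the cluster. This is a hedged sketch, assuming projects are exposed through a projects.run.ai custom resource and using the example project name test; the exact resource name and field path may vary by NVIDIA Run:ai version.

    # Inspect the enforceRunaiScheduler setting for the example project "test"
    kubectl get projects.run.ai test -o yaml | grep enforceRunaiScheduler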
Specify the Scheduler in the Workload YAML
To use the NVIDIA Run:ai Scheduler, specify it in the workload’s YAML file. This instructs Kubernetes to schedule the workload using the NVIDIA Run:ai Scheduler instead of the default one.
spec:
  schedulerName: runai-scheduler
For example:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    user: test
    gpu-fraction: "0.5"
    gpu-fraction-num-devices: "2"
  labels:
    runai/queue: test
  name: multi-fractional-pod-job
  namespace: test
spec:
  containers:
    - image: gcr.io/run-ai-demo/quickstart-cuda
      imagePullPolicy: Always
      name: job
      env:
        - name: RUNAI_VERBOSE
          value: "1"
      resources:
        limits:
          cpu: 200m
          memory: 200Mi
        requests:
          cpu: 100m
          memory: 100Mi
      securityContext:
        capabilities:
          drop: ["ALL"]
  schedulerName: runai-scheduler
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 5
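After applying the manifest, you can confirm that the pod was bound by the NVIDIA Run:ai Scheduler. This is a minimal sketch using standard kubectl commands; the file name pod.yaml is an assumption for wherever you saved the manifest above.

    # Submit the workload (pod.yaml is an assumed file name)
    kubectl apply -f pod.yaml

    # Confirm which scheduler the pod requested
    kubectl get pod multi-fractional-pod-job -n test -o jsonpath='{.spec.schedulerName}'

    # The Scheduled event should be reported by runai-scheduler rather than default-scheduler
    kubectl get events -n test --field-selector involvedObject.name=multi-fractional-pod-job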