Using Node Affinity via API

NVIDIA Run:ai leverages the Kubernetes Node Affinity feature to give administrators and researchers more control over where workloads are scheduled. This guide explains how NVIDIA Run:ai integrates with and supports the standard Kubernetes Node Affinity API, both directly in workload specifications and through administrative policies. For more details, refer to the official Kubernetes documentation.

Functionality

You can use the nodeAffinityRequired field within your workload specification (spec.nodeAffinityRequired) to define scheduling constraints based on node labels.

When a workload with a node affinity specification is submitted, the NVIDIA Run:ai Scheduler evaluates these constraints alongside other scheduling factors such as resource availability and fairness policies.

Note

Preferred node affinity is not supported.

Supported Features

  • nodeAffinityRequired (requiredDuringSchedulingIgnoredDuringExecution) - Define hard requirements for node selection. Pods are scheduled only onto nodes that meet these requirements.

  • Node selector terms - Use nodeSelectorTerms with matchExpressions to specify label-based rules.

  • Operators - Supported operators in matchExpressions include: In, NotIn, Exists, DoesNotExist, Gt, and Lt.

nodeAffinityRequired	<Object>
  nodeSelectorTerms	<[]Object>
    matchExpressions	<[]Object>
      key	    <string>
      operator	<enum> (In, NotIn, Exists, DoesNotExist, Gt, Lt)
      values	<[]string>
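
For example, a single term can combine several of the supported operators, as in the following sketch. The label keys shown here (nvidia.com/gpu.present and gpu-memory-gb) are illustrative placeholders rather than labels created by NVIDIA Run:ai; substitute labels that exist on your nodes. Note that multiple matchExpressions within one term must all match (AND), while multiple nodeSelectorTerms are alternatives (OR).

"nodeAffinityRequired": {
  "nodeSelectorTerms": [
    {
      "matchExpressions": [
        {
          "key": "nvidia.com/gpu.present",
          "operator": "Exists"
        },
        {
          "key": "gpu-memory-gb",
          "operator": "Gt",
          "values": ["40"]
        }
      ]
    }
  ]
}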

Setting Node Affinity in Workload Submissions

When submitting a workload, include the nodeAffinityRequired field in the API body. This field should describe the required node affinity rule, similar to Kubernetes’ nodeAffinity under requiredDuringSchedulingIgnoredDuringExecution.

See NVIDIA Run:ai API for more details.

Example:

curl -L 'https://<COMPANY-URL>/api/v1/workloads/workspaces' \
-H 'Authorization: Bearer <API-TOKEN>' \
-H 'Content-Type: application/json' \
-d '{
      "name": "workload-name", 
      "projectId": "<PROJECT-ID>", 
      "clusterId": "<CLUSTER-UUID>", 
      "spec": {
        "nodeAffinityRequired": {
          "nodeSelectorTerms": [
            {
              "matchExpressions": [
                {
                  "key": "run.ai/type",
                  "operator": "In",
                  "values": ["training", "inference"]
                }
              ]
            }
          ]
        }
      }
    }'

Viewing Node Affinity in Workloads

The workload's nodeAffinity is generated dynamically by combining your input with system-level scheduling requirements. The final, effective affinity expression is the result of several components:

  • User-defined affinity - The initial rules you provide for the workload

  • Platform features - System-generated rules for features such as node pools and Multi-Node NVLink (MNNVL)

  • Scheduling policies - Additional constraints applied by the NVIDIA Run:ai Scheduler

As a result, the affinity expression returned by the GET workloads/{workloadId}/pods endpoint reflects this final merged configuration, not only your original input.
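
To inspect the merged result, you can query the pods endpoint directly; the path below assumes the same /api/v1 base URL used in the submission example above:

curl -L 'https://<COMPANY-URL>/api/v1/workloads/<WORKLOAD-ID>/pods' \
-H 'Authorization: Bearer <API-TOKEN>'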

Example:

A user submits a workload excluding nodes runai-cluster-system-0-0 and runai-cluster-system-0-1:

"nodeAffinityRequired": {
        "nodeSelectorTerms": [
          {
            "matchExpressions": [
              {
                "key": "kubernetes.io/hostname",
                "operator": "NotIn",
                "values": [
                    "runai-cluster-system-0-0",
                    "runai-cluster-system-0-1"
                ]
              }
            ]
          }
       ]
     }

The project also has quotas on two node pools: pool-a and pool-b. The merged affinity expression returned by the API reflects both the user-defined rules and the system-enforced node pool constraints:

{
  "pods": [
    {
      .
      .
      .
      "requestedNodePools": [
          "pool-b",
          "pool-a"
      ],
      "nodeAffinity": {
        "required": {
          "nodeSelectorTerms": [
            {
              "matchExpressions": [
                {
                  "key": "kubernetes.io/hostname",
                  "operator": "NotIn",
                  "values": [
                    "runai-cluster-system-0-0",
                    "runai-cluster-system-0-1"
                  ]
                },
                {
                  "key": "node-pool-label",
                  "operator": "In",
                  "values": [
                    "b"
                  ]
                }
              ]
            },
            {
              "matchExpressions": [
                {
                  "key": "kubernetes.io/hostname",
                  "operator": "NotIn",
                  "values": [
                    "runai-cluster-system-0-0",
                    "runai-cluster-system-0-1"
                  ]
                },
                {
                  "key": "node-pool-label",
                  "operator": "In",
                  "values": [
                    "a" 
                  ]
                }
              ]
            }
          ]
        }
      },
      .
      .
      .
    }
  ]
}

Applying Node Affinity via Policies

Administrators can apply node affinity through policies in one of two ways:

  • Can edit - The administrator applies a policy, but users can override it when submitting a workload.

  • Can't edit - The administrator applies a policy that can't be overridden by the user.

Example:

defaults:
  nodeAffinityRequired:
    nodeSelectorTerms:
      - matchExpressions:
          - key: app
            operator: In
            values:
              - frontend
              - backend

rules:
  nodeAffinityRequired:
    canEdit: false
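
In this example, because the rules section sets canEdit: false, the default affinity defined above (nodes labeled app=frontend or app=backend) applies to every workload within the policy's scope and cannot be changed by users at submission time.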