What's New in Version 2.24

These release notes provide a detailed summary of the features, enhancements, and updates introduced in NVIDIA Run:ai v2.24. They serve as a guide to help users, administrators, and researchers understand the new capabilities and how to leverage them for improved workload management, resource optimization, and more.

Important

For a complete list of deprecations, see Deprecation notifications. Deprecated features, APIs, and capabilities remain available for two versions from the time of the deprecation notice, after which they may be removed.

Dynamo Support

NVIDIA Run:ai supports Dynamo-based inference workloads through the DynamoGraphDeployment workload type. This allows Dynamo workloads to be deployed, scheduled, and monitored using the same platform capabilities and operational model as native workloads. See Supported workload types for more details. From cluster v2.23 onward

Key capabilities include:

  • YAML-based deployment and management - Dynamo workloads can be submitted via YAML from the UI, API, or CLI, without requiring direct cluster access.

  • Hierarchical gang scheduling - NVIDIA Run:ai supports multi-level gang scheduling for Dynamo workloads. Replica groups are scheduled together as sub-gangs, and the entire workload is then scheduled as a single unit. This ensures coordinated placement and execution across all components of the Dynamo workload.

  • Topology-aware scheduling - NVIDIA Run:ai applies topology-aware scheduling at the workload level to ensure Dynamo workload components are placed according to the underlying cluster topology, improving communication efficiency and execution consistency.

  • Support for automatic MNNVL - Dynamo workloads are supported on Multi-Node NVLink (MNNVL) domains, including GB200 NVL72 systems. NVIDIA Run:ai applies the appropriate compute domain configuration to ensure Dynamo workloads are placed and scaled within the same NVLink domain.

  • Automatic discovery of Dynamo frontend endpoints - NVIDIA Run:ai automatically detects Dynamo frontend endpoints and exposes them for access and monitoring.

  • Unified workload lifecycle and status visibility - Dynamo workloads are managed, monitored, and tracked with a unified lifecycle and status view.
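For orientation, the sketch below shows roughly what a YAML-submitted Dynamo workload can look like. It is an illustrative minimum only: the apiVersion, service names, and fields are assumptions that vary by Dynamo operator version, so refer to the Dynamo documentation and Supported workload types for authoritative manifests.

```yaml
# Illustrative sketch only - apiVersion, service names, and fields are assumptions
# and may differ between Dynamo operator versions.
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: llm-disagg              # placeholder workload name
  namespace: runai-team-a       # placeholder NVIDIA Run:ai project namespace
spec:
  services:
    Frontend:                   # frontend endpoint that NVIDIA Run:ai auto-discovers
      replicas: 1
    Worker:                     # replica group scheduled together as a sub-gang
      replicas: 2
      resources:
        limits:
          gpu: "1"
```

Submitted this way from the UI, API, or CLI, the workload is gang-scheduled and monitored like any other supported workload type.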

AI Practitioners

Workloads

  • YAML-based workload submission in the UI and CLI - Submit supported workload types defined in YAML directly from the UI and CLI. This brings YAML-based submission, previously available through the API, into interactive workflows, allowing you to submit existing Kubernetes or framework-specific manifests while still benefiting from NVIDIA Run:ai scheduling, resource management, and monitoring. See Submit supported workload types via YAML for more details. From cluster v2.23 onward

  • Automatic network topology acceleration for supported workloads - Network topology–aware scheduling is applied automatically to supported distributed workload types submitted via YAML. Once a topology is attached to a node pool, NVIDIA Run:ai automatically applies Preferred topology constraints at the lowest available level for the entire workload, optimizing pod placement without additional user configuration. This expands the topology acceleration beyond NVIDIA Run:ai native distributed workloads to additional workload types. See Accelerating workloads with network topology-aware scheduling for more details. From cluster v2.23 onward

  • Visibility into workload topology constraints - Workloads now expose the topology constraints requested during scheduling, providing clear visibility into how network topology influences placement decisions. In the UI, NVIDIA Run:ai native workloads display the requested topology constraints in the workload Details view, while the Workloads API exposes these fields across native and supported workload types. From cluster v2.24 onward

  • MNNVL acceleration for supported workload types - NVIDIA Run:ai now enables running supported workload types on Multi-Node NVLink (MNNVL) domains, including GB200 NVL72 systems. NVIDIA Run:ai applies the appropriate compute domain configuration to ensure workloads are placed and scaled within the same NVLink domain. AI practitioners can submit supported workload types using the Workloads V2 API and configure their MNNVL preference as part of the workload submission. From cluster v2.24 onward

  • AI application–based workload grouping in the UI - NVIDIA Run:ai provides a dedicated AI applications view. This view automatically groups Kubernetes resources deployed via Helm charts into a single logical application, allowing you to list, sort, and filter AI applications. You can also inspect aggregated resource requests and allocations (GPU, CPU, memory) and view the underlying workloads through the Details pane, making it easier to understand and manage complex, multi-component solutions. See AI applications for more details. From cluster v2.23 onward

  • Separate priority and preemptibility controls - Workload priority and preemptibility are configured as two independent parameters across the UI, CLI, and API for native and supported workload types. If no preemptibility value is specified, the existing behavior based on priority is applied automatically. See Workload priority and preemption for more details. From cluster v2.24 onward

  • Authenticated browsing for the NGC catalog - Browse the NGC catalog and private NGC registries as an authenticated user by selecting your NGC API key credentials during workload submission or template creation. This provides access to models and containers that require authentication while preserving the option to browse the public container registry. Private NGC registries require administrator configuration in the General settings. Beta From cluster v2.23 onward

  • Improved fractional GPU support for multi-container pods - Fractional GPUs are no longer limited to the first container in a pod. You can explicitly specify which container should receive fractional GPU resources using an annotation (see the illustrative sketch after this list). If no container is specified, fractional GPUs continue to be associated with the first container by default. See GPU fractions and Dynamic GPU fractions for more details. From cluster v2.24 onward

  • Support for elastic distributed workloads on NVLink domains - Elastic distributed workloads, including auto-scaling and dynamically sized deployments, are fully supported on GB200 NVL72 and Multi-Node NVLink (MNNVL) domains using NVIDIA DRA driver version 25.8 and later. NVIDIA Run:ai automatically applies ComputeDomain configuration and topology-aware scheduling to ensure workloads scale within the same NVLink domain. See Using GB200 NVL72 and Multi-Node NVLink domains for more details. From cluster v2.24 onward

  • Native Load Balancer support - NVIDIA Run:ai exposes LoadBalancer connectivity directly in the UI and CLI when submitting workloads or creating templates (assuming a load balancer is already installed in the cluster). Configure service ports explicitly and view clearer port configuration and connectivity status. From cluster v2.24 onward

  • Min/max worker configuration for PyTorch distributed training - Define the minimum and maximum number of workers directly from the UI when submitting PyTorch distributed training workloads, providing greater flexibility and control over resource allocation. See Train models using a distributed training workload for more details.

  • Connections column enabled by default in the Workloads table - The Connections column is now selected by default in the Workloads table. When a workload has a single connection, the URL is displayed directly, with long URLs automatically shortened. The URL is clickable and opens the Connections dialog. When multiple connections exist, the table displays the total count.

  • New EmptyDir data source for ephemeral storage - A new EmptyDir data source is available for one-time configuration during workload submission or template creation. EmptyDir provides temporary, node-local storage that exists only for the lifetime of the workload (see the sketch after this list). From cluster v2.23 onward

  • Enhanced workload details view - The workload Details tab for NVIDIA Run:ai native workloads has been updated to provide a clearer and more structured view of workload configuration. The updated design improves readability and makes it easier to understand how a workload was submitted and configured. Key enhancements include:

    • Improved layout and data presentation - Configuration fields are grouped and displayed more intuitively, helping users quickly find the information they need.

    • Specification selector - When a workload contains multiple specs, a new dropdown allows you to easily switch between them.
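Two of the items above lend themselves to short sketches. First, for multi-container fractional GPUs, the general shape is a pod-level fraction request plus an annotation naming the target container. The gpu-fraction annotation is the existing NVIDIA Run:ai mechanism; the container-selection key shown here is a hypothetical placeholder, so check GPU fractions for the exact annotation.

```yaml
# Illustrative sketch only. "gpu-fraction" is the existing NVIDIA Run:ai annotation;
# the container-selection key below is a HYPOTHETICAL placeholder, not the real key.
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-fraction
  annotations:
    gpu-fraction: "0.5"
    runai/gpu-fraction-container: "trainer"   # hypothetical key naming the target container
spec:
  containers:
    - name: sidecar                                # receives no GPU resources
      image: registry.example.com/sidecar:latest   # placeholder image
    - name: trainer                                # intended recipient of the 0.5 GPU fraction
      image: registry.example.com/trainer:latest   # placeholder image
```

Second, the EmptyDir data source corresponds to the standard Kubernetes emptyDir volume; the pod-spec fragment below shows the equivalent, with an arbitrary mount path and size limit.

```yaml
# Pod-spec fragment: node-local scratch space that is removed when the workload ends.
volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 10Gi                # optional cap on scratch space
containers:
  - name: trainer
    image: registry.example.com/trainer:latest   # placeholder image
    volumeMounts:
      - name: scratch
        mountPath: /scratch          # example mount path
```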

Inference

  • NVIDIA NIM service API enhancements - NVIDIA Run:ai expands support for deploying and managing NIMs through the NVIDIA NIM Operator, providing a standardized, operator-based deployment flow aligned with NIM-native configurations. NIM services are fully managed through the NVIDIA Run:ai API, with UI and CLI support planned for a future release. This capability does not replace the current NIM deployment flow and is available as an additional option. See NVIDIA NIM API for more details. From cluster v2.23 onward

    • Autoscaling allowing NIM services to scale dynamically based on demand

    • Fractional GPU support, enabling NIM services to request and use partial GPUs for more efficient GPU utilization

    • Multi-node NIM deployments, enabling distributed NIM workloads across multiple nodes

    • Policy enforcement through a dedicated NVIDIA NIM Policy API for consistent governance of NIM services

    • Partial updates via a new PATCH endpoint, allowing targeted changes without resubmitting the full specification

    • NIM Cache support for model stores, enabling caching of specific LLM or multi-LLM model artifacts to improve startup time and reuse across deployments

  • LeaderWorkerSet (LWS) as a new workload type - LeaderWorkerSet is now available as a supported workload type. LWS workloads can be deployed and managed using YAML submission from the UI, API, or CLI, providing a standardized way to run leader–worker and multi-process workloads across the platform without direct cluster access (see the sketch after this list). See Supported workload types for more details. From cluster v2.23 onward

  • NGC API key support for NVIDIA NIM workloads - NVIDIA Run:ai supports using an NGC API key when deploying NIM workloads to handle both image access and model runtime authentication. A single NGC API key is automatically applied for pulling NIM images from the NGC catalog and injected as a runtime environment variable required for downloading model weights. This streamlines NIM deployment by removing the need for separate pull secrets and runtime credentials while enabling full user self-service for authenticated NIM workloads. See Deploy inference workloads from NVIDIA NIM for more details. From cluster v2.23 onward

  • Distributed inference support in the CLI - Native distributed inference workloads can be submitted and managed directly from the NVIDIA Run:ai CLI. AI practitioners can use familiar NVIDIA Run:ai commands to work with distributed inference workloads, such as list, describe, logs, exec, port-forward, update, and delete. See CLI command reference for more details. From cluster v2.23 onward

  • Control access scope for inference serving endpoints - Set whether an inference serving endpoint is accessible externally or restricted to internal cluster traffic when submitting workloads or creating templates. Endpoints can be configured as External (public access), if your administrator has configured Knative to support external access, or Internal only, limiting access to in-cluster traffic. From cluster v2.24 onward

  • Hugging Face model catalog browsing - Browse and search the Hugging Face model catalog directly from the NVIDIA Run:ai UI and API when creating Hugging Face inference workloads. The live catalog view displays model details such as download count and gated status. For gated models, the platform prompts you to provide a Hugging Face token for access, while open models can be selected without authentication. See Deploy inference workloads from Hugging Face for more details.

  • Distributed inference templates (API) - Distributed inference templates allow you to save workload configurations that can be reused across distributed inference submissions. These templates simplify the submission process and promote standardization across distributed inference workloads. From cluster v2.22 onward

  • New NVIDIA NIM performance histogram metrics - The Metrics pane now includes two new histograms for NVIDIA NIM workloads: end-to-end request latency and time to first token (TTFT). These metrics provide deeper visibility into inference performance and responsiveness. See NVIDIA NIM metrics for more details. From cluster v2.23 onward
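For reference, a minimal LeaderWorkerSet manifest (the upstream leaderworkerset.x-k8s.io API) has the following shape; the name, image, and sizes are placeholders, and fields may vary with the LWS version installed in the cluster.

```yaml
# Minimal LeaderWorkerSet sketch using the upstream API; values are placeholders.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llm-serving
spec:
  replicas: 2                        # number of leader/worker groups
  leaderWorkerTemplate:
    size: 4                          # pods per group: one leader plus three workers
    workerTemplate:
      spec:
        containers:
          - name: worker
            image: registry.example.com/inference:latest   # placeholder image
            resources:
              limits:
                nvidia.com/gpu: 1
```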

Workload Assets

  • Updated credential creation in the UI - The Credentials page has been redesigned for improved usability. The Access key and Username & password credential types have been consolidated under Generic secret, where each secret format now opens a dedicated form with context-specific input fields. In addition, a dedicated SSH key format has been added under Generic secret for easier configuration of SSH-based authentication. This change simplifies the UI and provides a more streamlined experience for managing credentials. See Credentials for more details. From cluster v2.23 onward

  • NGC API key support for credentials - A new NGC API Key credential type is available across both credential assets and user credentials, enabling authenticated access to NGC resources, including gated or private models and images. From cluster v2.23 onward

  • New PVC events - NVIDIA Run:ai emits new PVC asset lifecycle events - Creating, Deleting, and Syncing. These events appear in the PVC’s Event history, extending the visibility introduced in previous releases and giving administrators clearer insight into PVC asset changes and activity over time.

Command-line Interface (CLI v2)

  • Template and asset-based workload submission in the CLI - The NVIDIA Run:ai CLI supports submitting native workloads using existing templates and workload assets, such as compute resources, environments, and data sources. This allows AI practitioners to reuse the same predefined configurations available in the UI and API, reducing the need for long, flag-heavy CLI commands. Templates and assets can be browsed and inspected directly from the CLI to support consistent and reliable workload submission. See CLI command reference for more details. From cluster v2.23 onward

  • Cluster diagnostics collection command - Added a new CLI command, runai diagnostics collect-logs, which gathers diagnostic logs from the Kubernetes cluster for troubleshooting or sharing with NVIDIA Run:ai support. You can collect logs from all or specific namespaces, specify an output directory, and choose whether to include previous pod logs, simplifying cluster debugging and support workflows. See runai diagnostics command for more details.

  • Extended storage visibility in CLI describe commands - The describe command for native workloads supports --storage, showing storage resources such as PVCs, ConfigMaps, and Secrets.
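By way of example, the diagnostics command and the --storage flag described above might be used as follows. The output flag and the exact describe subcommand form are assumptions; verify the syntax with --help or the CLI command reference.

```shell
# Gather diagnostic logs for troubleshooting or for NVIDIA Run:ai support
# (the --output flag name is an assumption - verify with --help)
runai diagnostics collect-logs --output ./runai-diagnostics

# Describe a native training workload including its storage resources
# (PVCs, ConfigMaps, Secrets); workload name and project are placeholders
runai training describe my-workload -p my-project --storage
```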

Platform Administrators

Organizations - Projects/Departments

Redesigned Projects and Departments management - NVIDIA Run:ai introduces an improved organization management experience that provides better visibility into resource distribution and clearer explainability for how resources are prioritized and allocated across the organization. This update simplifies large-scale organizational management while maintaining full compatibility with NVIDIA Run:ai’s advanced scheduling capabilities. See Projects and Departments for more details. From cluster v2.20 onward

Key improvements include:

  • Improved organizational visibility - A clearer, “big picture” view of projects and departments, making it easier to understand how GPU resources are distributed and prioritized.

  • Bulk management operations - Administrators can perform bulk actions across multiple organizational units directly from the UI and API, reducing operational overhead.

  • Clearer resource explainability - Improved transparency into resource contention and ordering, helping align scheduling behavior with business needs.

Authentication and Authorization

  • Custom roles using permission sets (API) - Administrators can now create custom roles by combining predefined permission sets using the Roles API. Permission sets are predefined, supported groupings of permissions that represent all required permission dependencies for a specific operation (for example, workload submission). Custom roles can then be assigned to users or groups through access rules in the UI or API, alongside the existing NVIDIA Run:ai predefined roles. This allows organizations to tailor access control to their operational needs while maintaining compatibility with the platform’s supported permission model. See Roles for more details. From cluster v2.21 onward

  • Updated predefined roles - Predefined roles in NVIDIA Run:ai have been updated to better align with common organizational responsibilities and workflows. See Roles for more details.

    • Added new predefined roles - AI practitioner, Data and storage administrator, and Project administrator

    • Some existing predefined roles have been deprecated. See Deprecation notifications for more details.

  • Reduced access to clusters and node pools - New APIs and permissions are now available to support reduced access to clusters and node pools - Clusters minimal and Node pools minimal. These APIs allow roles to perform actions such as workload submission while exposing only the minimal required cluster and node pool information (for example, names and IDs), rather than full read access. This improves role design by aligning the visible data with what is actually required for the action being performed. Roles that rely on full read access remain unchanged. Some predefined roles are planned to transition to the new minimal access as described in the Deprecation notifications.

  • Renamed applications and user applications - Authentication entities in NVIDIA Run:ai have been renamed across the UI, API, and CLI to better reflect their purpose and usage. Applications are now referred to as service accounts, and user applications are now referred to as access keys. These changes are terminology updates only: all existing functionality remains the same, and all existing records continue to work unchanged. Existing API endpoints and CLI login modes reflect the updated terminology. See Deprecation notifications for more details.

  • Email invitations for local users - Administrators can now choose to automatically send an email invitation when creating a local user. When enabled, the invitation email allows the new user to sign in and get started. To use this feature, ensure that email notifications are configured under General settings.

Node Pools

Time-based fairshare configuration per node pool - NVIDIA Run:ai supports time-based fairshare to improve long-term fairness in over-quota resource allocation. Instead of relying only on momentary demand, the Scheduler factors in historical GPU usage over time, ensuring that projects with lower recent consumption are given fair access to resources. Usage is tracked continuously, and each project’s GPU-hour consumption is evaluated against its configured weight to balance resource distribution more effectively across projects. Time-based fairshare can be enabled and configured per node pool using the Node pools form, with advanced customization available through the Node pools API. From cluster v2.24 onward
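For intuition only, one way to picture the weighted usage described above (an illustrative model, not necessarily the exact Scheduler formula) is:

$$\text{weighted usage}(p) \;=\; \frac{\text{GPU-hours consumed by project } p \text{ in the tracking window}}{\text{fairshare weight of } p}$$

Projects with lower weighted usage are then favored when over-quota resources are contended, which is how a project with lower recent consumption regains fair access over time.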

Onboarding

Email server configuration during admin onboarding - The administrator onboarding wizard includes an optional step for configuring an email server (SMTP). This allows administrators to set up email delivery early in the onboarding process, enabling email invitations for local users and supporting email-based notifications across the platform using the same configuration.

Analytics

Overview dashboard enhancements - Improvements to the Overview dashboard strengthen visibility and support key monitoring workflows. These enhancements also support deprecating the legacy Grafana dashboards. See Deprecation notifications for more details:

  • Enhancements to the Projects/Departments tables - Timeframe controls for GPU allocation, utilization, and memory utilization are located within each column. The tables display up to 15 entries, include GPU quota, separate pending and running workloads, and provide direct links to each project or department.

  • New pending-time widget - Introduced a new widget that displays the number of pending workloads by pending time, helping administrators understand how long workloads wait and identify projects and departments with extended pending times.

  • New guided tour for the Overview dashboard - A built-in tour walks administrators and AI practitioners through the key areas of the Overview dashboard, helping them navigate the dashboard and become familiar with the functionality available.

  • Additional dashboard improvements include:

    • Added a top-stats Failed workloads widget.

    • Added numeric counters to bar graphs, displaying values directly on each bar rather than only in tooltips.

    • Updated the Workloads by category/type widget to count running workloads only.

    • Added an Idle time column to the idle workloads table.

    • Updated widget ordering to separate current-time widgets from over-time widgets, improving the analysis flow from identifying current issues to investigating trends over time.

Policies

Policy-aware behavior for templates and assets - Templates and assets that do not fully comply with policy are no longer blocked outright when submitting a workload. Instead, NVIDIA Run:ai now evaluates non-compliance on a case-by-case basis:

  • Fixable non-compliance - If compliance can be achieved by adjusting settings during submission, the template or asset can be loaded. The UI highlights what needs to be updated to meet policy requirements.

  • Non-fixable non-compliance - If the non-compliant configuration cannot be changed, the template or asset cannot be used, and the relevant policy is displayed to explain the restriction.

  • Quick workload submit behavior - Templates with any non-compliance can now be loaded, but quick submit is automatically blocked. The full workload submission flow opens by default, where the UI highlights what needs to be updated to meet policy requirements.

Network Topology

Network topology visibility in clusters and node pools - The Network topologies modal in the Clusters page displays a new column showing which node pools each topology is associated with. This information is also available in the Network topologies API. From cluster v2.23 onward

Reports

Consumption report enhancements for GPU hour breakdown - The Consumption report includes two new columns, GPU deserved quota hours and GPU over-quota hours. These metrics fully support all existing grouping options, including cluster, node pool, department, and project. This change also supports deprecating the legacy Consumption dashboard. See Deprecation notifications for more details.

Event History

Audit logging for password resets - Audit logs capture all password reset events, including administrator-initiated resets, user-initiated resets, and password-recovery (“forgot password”) actions. This enhancement improves traceability and security visibility across user management workflows.

Infrastructure Administrators

Installation

Ingress controller recommendation update - Due to an announced deprecation by the upstream NGINX Ingress Controller project, NVIDIA Run:ai is updating its recommended ingress controller to HAProxy Ingress for supported environments. The Kubernetes Ingress standard remains fully supported. This change affects only the underlying ingress controller implementation and is intended to ensure long-term security, stability, and maintainability. For fresh installations, see Installation. To upgrade from earlier versions, see Migrate from NGINX to HAProxy Ingress. From cluster v2.24 onward

Advanced Configurations

  • Global pod tolerations (control plane) - You can define global.tolerations to apply a shared list of Kubernetes pod tolerations across NVIDIA Run:ai services and supported third-party services. These tolerations are applied globally unless explicitly overridden at the individual service level. See Advanced control plane configurations for more details. From cluster v2.24 onward

  • Global replica count (control plane) - A new global.replicaCount setting allows you to define a default number of replicas for all NVIDIA Run:ai services that support scaling. See Advanced control plane configurations for more details. From cluster v2.24 onward

  • Default pod anti-affinity - A new control plane and cluster configuration, global.requireDefaultPodAntiAffinity, applies a default pod anti-affinity rule to prevent pods from the same service from being scheduled on the same node when possible. This setting is enabled by default. See Advanced control plane configurations and Advanced cluster configurations for more details. From cluster v2.24 onward
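Taken together, the three settings above would typically appear in the control plane Helm values. The sketch below is illustrative only: the nesting under global follows the names listed above, while the toleration key/value and replica count are placeholder examples; see Advanced control plane configurations for the authoritative structure.

```yaml
# Illustrative Helm values sketch - keys under "global" come from the items above;
# the toleration key/value and the replica count are placeholder examples.
global:
  tolerations:
    - key: "dedicated"                      # placeholder taint key
      operator: "Equal"
      value: "runai"
      effect: "NoSchedule"
  replicaCount: 2                           # default replicas for services that support scaling
  requireDefaultPodAntiAffinity: true       # enabled by default
```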

System Requirements

  • NVIDIA Run:ai supports Kubernetes version 1.35.

  • NVIDIA Run:ai supports OpenShift version 4.20.

  • NVIDIA Run:ai supports GPU Operator version 25.10.

  • OpenShift version 4.16 is no longer supported.

  • Kubernetes version 1.31 is no longer supported.

  • Rancher Kubernetes Engine (RKE1) is no longer supported due to reaching end of life (EOL). RKE2 is the recommended Rancher distribution. See the Rancher migration guide for more details.

Deprecation Notifications

Note

Deprecated features, APIs, and capabilities remain available for two versions from the time of the deprecation notice, after which they may be removed.

NVIDIA Run:ai Predefined Roles

The following predefined roles are deprecated in the UI and API. Review the new predefined roles to determine whether they meet your requirements, or create a custom role using the API. See Roles for more details:

  • Compute resource administrator

  • Credentials administrator

  • Data source administrator

  • Data volume administrator

  • Environment administrator

  • L1 researcher

  • L2 researcher

  • ML engineer

  • Research manager

  • Template administrator

During the deprecation period, the following predefined roles will be updated with minimal access to cluster and node pool data:

  • Both cluster and node pool access are planned to transition to Clusters minimal and Node pools minimal - L1 researcher, L2 researcher, Research manager

  • Cluster access is planned to transition to Clusters minimal while node pools access remains unchanged - ML engineer

  • Cluster access is planned to transition to Clusters minimal - Compute resource administrator, Credentials administrator, Data source administrator, Data volume administrator, Environment administrator, Template administrator

Models Catalog

The Models catalog page is deprecated. Previously, the Models catalog provided a quick start experience for deploying a curated set of Hugging Face models. The same capability is now available through the Hugging Face inference workload flow, which integrates directly with Hugging Face and allows you to browse, select, and deploy any supported model from an open list. To deploy Hugging Face models, use the Hugging Face inference workload flow.

API Deprecation Notifications

Deprecated Endpoints

| Deprecated Endpoint | Replacement Endpoint |
| --- | --- |
| /api/v1/apps | /api/v1/service-accounts |
| /api/v1/user-applications | /api/v1/access-keys |
| /api/v1/authorization/roles | /api/v2/authorization/roles |

Deprecated Parameters

| Endpoint | Deprecated Parameter | Replacement Parameter |
| --- | --- | --- |
| api/v1/authorization/access-rules | subjectType: app (enum) | subjectType: service-account |
| /api/v2/authorization/roles, /api/v1/authorization/roles, /api/v1/authorization/permissions | resourceType: app (enum) | resourceType: service-account |
| /api/v1/org-unit/projects, /api/v1/org-unit/departments | resources: priority | resources: rank |

CLI Deprecation Notifications

| CLI Command | Deprecated Parameter | Replacement Parameter |
| --- | --- | --- |
| runai login | application | access-key |
| runai login | user | password |
| runai [workloadtype] submit | --environment | --environment-variable |
