What's new in version 2.21
The NVIDIA Run:ai v2.21 What's new provides a detailed summary of the latest features, enhancements, and updates introduced in this version. It serves as a guide to help users, administrators, and researchers understand the new capabilities and how to leverage them for improved workload management, resource optimization, and more.
Important
For a complete list of deprecations, see Deprecation notifications. Deprecated features and capabilities remain available for two versions after the deprecation notification.
AI practitioners
Flexible workload submission
Streamlined workload submission with a customizable form – The new customizable submission form allows you to submit workloads by selecting and modifying an existing setup or providing your own settings. This enables faster, more accurate submissions that align with organizational policies and individual workload needs. Experimental
From cluster v2.18 onward
High-level feature details:
Flexible submission options – Choose from an existing setup and customize it, or start from scratch and provide your own settings for a one-time setup.
Improved visibility – Review existing setups and understand their associated policy definitions.
One-time data sources setup – Configure a data source as part of your one-time setup for a specific workload.
Unified experience – Use the new form for all workload types — workspaces, standard training, distributed training, and custom inference.
Workspaces and training
Support for JAX distributed training workloads – You can now submit distributed training workloads using the JAX framework via the UI, API, and CLI. This enables you to leverage JAX for scalable, high-performance training, making it easier to run and manage JAX-based workloads seamlessly within NVIDIA Run:ai. See Train models using a distributed training workload for more details.
From cluster v2.21 onward
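For illustration, a minimal CLI sketch of such a submission is shown below; the subcommand and flag names (distributed submit, --framework, --workers, --gpu-devices-request) and the image are assumptions, so check the CLI reference for the exact syntax.
# Sketch only: submit a JAX distributed training workload with 4 workers.
# Subcommand and flag names are assumptions, not verified syntax.
runai distributed submit jax-train \
  --framework jax \
  --workers 4 \
  --image <your-jax-image> \
  --gpu-devices-request 1 \
  -- python train.py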
Pod restart policy for all workload types – A restart policy can be configured to define how pods are restarted when they terminate. The policy is set at the workload level across all workload types via the API and CLI. For distributed training workloads, restart policies can be set separately for master and worker pods. This enhancement ensures workloads are restarted efficiently, minimizing downtime and optimizing resource usage.
From cluster v2.21 onward
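As a sketch, a restart policy could be set at submission time along the following lines; the flag names are assumptions, while the values mirror the standard Kubernetes restart policies (Always, OnFailure, Never).
# Sketch only: flag names are assumptions.
# Standard training workload: restart pods only when they fail.
runai training submit my-train --image <image> --restart-policy OnFailure

# Distributed training: assumed separate flags for master and worker pods.
runai distributed submit my-dist --image <image> \
  --master-restart-policy Never \
  --worker-restart-policy OnFailure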
Enhanced failure status details for workloads – When a workload is marked as "Failed", clicking the “i” icon next to the status provides detailed failure reasons, with clear explanations across compute, network, and storage resources. This enhancement improves troubleshooting efficiency, and helps you quickly diagnose and resolve issues, leading to faster workload recovery.
From cluster v2.21 onward
Workload priority class management for training workloads – You can now change the default priority class of training workloads within a project, via the API or CLI, by selecting from predefined priority class values. This influences the workload’s position in the project scheduling queue managed by the Run:ai Scheduler, ensuring critical training jobs are prioritized and resources are allocated more efficiently. See Workload priority class control for more details.
From cluster v2.18 onward
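A hedged CLI sketch of overriding the default priority class at submission time; the --priority-class flag name is an assumption, and the value must be one of the predefined priority classes listed in Workload priority class control.
# Sketch only: the flag name is an assumption; use a predefined priority class value.
runai training submit prio-train \
  --image <image> \
  --priority-class <predefined-priority-class>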
Workload assets
New environment presets – Added new NVIDIA Run:ai environment presets when running in a host-based routing cluster: vscode, rstudio, jupyter-scipy, and tensorboard-tensorflow. See Environments for more details.
From cluster v2.21 onward
Support for PVC size expansion – Adjust the size of Persistent Volume Claims (PVCs) via the Update a PVC asset API, leveraging the allowVolumeExpansion field of the storage class resource. This enhancement enables you to dynamically adjust storage capacity as needed.
Improved visibility of storage class configurations – When creating new PVCs or volumes, the UI now displays access modes, volume modes, and size options based on administrator-defined storage class configurations. This update ensures consistency, increases transparency, and helps prevent misconfigurations during setup.
From cluster v2.21 onward
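As background, expansion only works when the underlying storage class allows it. A minimal Kubernetes-side sketch (using kubectl rather than the Run:ai API; names are illustrative) of an expandable storage class and a PVC resize:
# The storage class must allow expansion for PVC resizing to take effect.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-sc
provisioner: <your-csi-driver>   # must be a CSI driver that supports expansion
allowVolumeExpansion: true
EOF

# Resize an existing PVC by patching its requested storage.
kubectl patch pvc my-data-pvc -n <project-namespace> \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'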
ConfigMaps as environment variables – Use predefined ConfigMaps as environment variables during environment setup or workload submission.
From cluster v2.21 onward
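Under the hood this maps to the standard Kubernetes pattern of injecting ConfigMap data as environment variables. A small sketch of creating a ConfigMap that could then be selected during environment setup or workload submission (names and values are illustrative):
# Create a ConfigMap holding non-sensitive configuration values.
kubectl create configmap training-config -n <project-namespace> \
  --from-literal=BATCH_SIZE=64 \
  --from-literal=DATA_DIR=/mnt/data
# In plain Kubernetes this would be referenced via envFrom/configMapRef in the
# pod spec; in NVIDIA Run:ai you select the ConfigMap when defining the
# environment or submitting the workload.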
Improved scope selection experience – The scope mechanism has been improved to reduce clicks and enhance usability. The organization tree now opens by default at the cluster level for quicker navigation. Scope search now includes alphabetical sorting and supports browsing non-displayed scopes. You can also use keyboard shortcuts: Escape to cancel, or click outside the modal to close it. These improvements apply across templates, policies, projects, and all workload assets.
Command-line interface (CLI v2)
New default CLI – CLI v2 is the default command-line interface. CLI v1 has been deprecated as of version 2.20.
Secret volume mapping for workloads – You can now map secrets to volumes when submitting workloads using the --secret-volume flag. This feature is available for all workload types - workspaces, training, and inference.
Support for environment field references in submit commands – A new flag, fieldRef, has been added to all submit commands to support environment field references in a key:value format. This enhancement enables dynamic injection of environment variables directly from pod specifications, offering greater flexibility during workload submission.
Improved PVC visibility and selection for researchers – Use runai pvc to list existing PVCs within your scope, making it easier to reference available options when submitting workloads. A noun auto-completion has been introduced for storage, streamlining the selection process. The workload describe command also includes a PVC section, improving visibility into persistent volume claims. These enhancements provide greater clarity and efficiency in storage utilization.
Enhanced workload deletion options – The runai workload delete command now supports deleting multiple workloads by specifying a list of workload names (e.g., workload-a, workload-b, workload-c).
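Taken together, a hedged sketch of these CLI v2 additions follows; the --secret-volume value format, the --field-ref flag form, the pvc subcommand, and space-separated deletion are assumptions based on the descriptions above.
# Map a secret into the workload as a volume (value format is an assumption).
runai workspace submit secure-ws --image <image> \
  --secret-volume <secret-name>:<mount-path>

# Inject a pod spec field as an environment variable in key:value format
# (flag form and field path are assumptions).
runai training submit field-demo --image <image> \
  --field-ref NODE_NAME:spec.nodeName

# List PVCs available in your scope before referencing one at submission.
runai pvc list

# Inspect a workload, including its PVC section.
runai workload describe secure-ws

# Delete several workloads at once by listing their names.
runai workload delete workload-a workload-b workload-c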
ML engineers
Workloads - inference
Support for inference workloads via CLI v2 – You can now run inference workloads directly from the command-line interface. This update enables greater automation and flexibility for managing inference workloads. See runai inference for more details, and the submission sketch after the rolling update notes below.
Enhanced rolling inference updates – Rolling inference updates allow ML engineers to apply live updates to existing inference workloads—regardless of their current status (e.g., running or pending)—without disrupting critical services.
Experimental
This capability is now supported for both Hugging Face and custom inference workloads, with a new UI flow that aligns with the API functionality introduced in v2.19.
From cluster v2.19 onward
Compute resource is now also updatable via API, UI, and CLI.
From cluster v2.21 onward
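Returning to the CLI v2 inference support above, a hedged sketch of submitting a custom inference workload; apart from the runai inference command group, the flag names are assumptions.
# Sketch only: flag names are assumptions; see runai inference for the exact syntax.
runai inference submit my-endpoint \
  --image <serving-image> \
  --gpu-devices-request 1 \
  --serving-port 8000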
Support for NVIDIA Cloud Functions (NVCF) external workloads – NVIDIA Run:ai enables you to deploy, schedule and manage NVCF workloads as external workloads within the platform. See Deploy NVIDIA Cloud Functions (NVCF) in NVIDIA Run:ai for more details.
From cluster v2.21 onward
Added validation for Knative – You can now only submit inference workloads if Knative is properly installed. This ensures workloads are deployed successfully by preventing submission when Knative is misconfigured or missing.
From cluster v2.21 onward
Enhancements in Hugging Face workloads. For more details, see Deploy inference workloads from Hugging Face:
Added Hugging Face model authentication – NVIDIA Run:ai validates whether a user-provided token grants access to a specific model, in addition to checking if a model requires a token and verifying the token format. This enhancement ensures that users can only load models they have permission to access, improving security and usability.
From cluster v2.18 onward
Introduced model store support using data sources – Select a data source to serve as a model store, caching model weights to reduce loading time and avoid repeated downloads. This improves performance and deployment speed, especially for frequently used models, minimizing the need to re-authenticate with external sources.
Improved model selection – Select a model from a drop-down list. The list is not exhaustive and includes only models that have been tested.
From cluster v2.18 onward
Enhanced Hugging Face environment control – Choose between vLLM, TGI, or any other custom container image by selecting an image tag and providing additional arguments. By default, workloads use the official vLLM or TGI containers, with full flexibility to override the image and customize runtime settings for more controlled and adaptable inference deployments.
From cluster v2.18 onward
Updated authentication for NIM model access – You can now authenticate access to NIM models using tokens or credentials, ensuring a consistent, flexible, and secure authentication process. See Deploy inference workloads with NVIDIA NIM for more details.
From cluster v2.19 onward
Added support for volume configuration – You can now set volumes for custom inference workloads. This feature allows inference workloads to allocate and retain storage, ensuring continuity and efficiency in inference execution.
From cluster v2.20 onward
Platform administrators
Analytics
Enhancements to the Overview dashboard – The Overview dashboard includes optimization insights for projects and departments, providing real-time visibility into GPU resource allocation and utilization. These insights help department and project managers make more informed decisions about quota management, ensuring efficient resource usage.
Dashboard UX improvements:
Improved visibility of metrics in the Resources utilization widget by repositioning them above the graphs.
Added a new Idle workloads table widget to help you easily identify and manage underutilized resources.
Renamed and updated the "Workloads by type" widget to provide clearer insights into cluster usage with a focus on workloads.
Improved user experience by moving the date picker to a dedicated section within the overtime widgets, Resources allocation and Resources utilization.
Organizations - projects/departments
Enhanced resource prioritization for projects and departments – Admins can now define and manage SLAs tailored to specific departments and projects via the UI, ensuring resource allocation aligns with real business priorities. This enhancement empowers admins to assign strict priority to over-quota resources, extending control beyond the existing over-quota weight system.
From cluster v2.20 onward
This feature allows administrators to:
Set the priority of each department relative to other departments within the same node pool.
Define the priority of projects within a department, on a per-node pool basis.
Set specific GPU resource limits for both departments and projects.
Audit logs
Updated access control for audit logs – Only users with tenant-wide permissions have the ability to access audit logs, ensuring proper access control and data security. This update reinforces security and compliance by restricting access to sensitive system logs. It ensures that only authorized users can view audit logs, reducing the risk of unauthorized access and potential data exposure.
Notifications
Slack API integration for notifications – A new API allows organizations to receive notifications directly to Slack. This feature enhances real-time communication and monitoring by enabling users to stay informed about workload statuses. See Configuring Slack notifications for more details.
Authentication and authorization
Improved visibility into user roles and access scopes – Individual users can now view their assigned roles and scopes directly in their settings. This enhancement provides greater transparency into user permissions, allowing individuals to easily verify their access levels. It helps users understand what actions they can perform and reduces dependency on administrators for access-related inquiries. See Access rules for more details.
Added auto-redirect to SSO – To deliver a consistent and streamlined login experience across customer applications, users accessing the NVIDIA Run:ai login page will be automatically redirected to SSO, bypassing the standard login screen entirely. This can be enabled via a toggle after an Identity Provider is added, and is available through both the UI and API. See Single Sign-On (SSO) for more details.
SAML service provider metadata XML – After configuring SAML IDP, the service provider metadata XML is now available for download to simplify integration with identity providers. See Set up SSO with SAML for more details.
Expanded SSO OpenID Connect authentication support – SSO OpenID Connect authentication supports attribute mapping of groups in both list and map formats. In map format, the group name is used as the value. This applies to new identity providers only. See Set up SSO with OpenID Connect for more details.
Improved permission error messaging – Enhanced clarity when attempting to delete a user with higher privileges, making it easier to understand and resolve permission-related issues.
Data & storage
Added Data volumes to the UI – Administrators can now create and manage data volumes directly from the UI and share data across different scopes in a cluster, including projects and departments. See Data volumes for more details. Experimental
From cluster v2.19 onward
Infrastructure administrator
NVIDIA Datacenter GPUs - Grace-Blackwell
Support for NVIDIA GB200 NVL72 and MultiNode NVLink systems – NVIDIA Run:ai offers full support for NVIDIA’s most advanced MultiNode NVLink (MNNVL) systems, including NVIDIA GB200, NVIDIA GB200 NVL72 and its derivatives. NVIDIA Run:ai simplifies the complexity of managing and submitting workloads on these systems by automating infrastructure detection, domain labeling, and distributed job submission via the UI, CLI, or API. See Node pools for more details.
From cluster v2.21 onward
Advanced cluster configurations
Automatic cleanup of resources for failed workloads – When a workload fails due to infrastructure issues, its resources can be automatically cleaned up using failureResourceCleanupPolicy, reducing resource waste from failed workloads. For more details, see Advanced cluster configurations.
From cluster v2.21 onward
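Advanced cluster configurations are typically applied by editing the runaiconfig resource in the cluster. A minimal sketch, assuming failureResourceCleanupPolicy sits directly under spec; confirm the exact path and allowed values in Advanced cluster configurations.
# Sketch only: the spec path and policy value below are assumptions.
kubectl patch runaiconfig runai -n runai --type merge \
  -p '{"spec":{"failureResourceCleanupPolicy":"<policy-value>"}}'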
Advanced setup
Custom pod labels and annotations – Add custom labels and annotations to pods in both the control plane and cluster. This new capability enables service mesh deployment in NVIDIA Run:ai. This feature provides greater flexibility in workload customization and management, allowing users to integrate with service meshes more easily. See Service mesh for more details.
System requirements
NVIDIA Run:ai now supports NVIDIA GPU Operator version 25.3.
NVIDIA Run:ai now supports OpenShift version 4.18.
NVIDIA Run:ai now supports Kubeflow Training Operator 1.9.
Kubernetes version 1.29 is no longer supported.
Deprecation notifications
Cluster API for workload submission
Using the Cluster API to submit NVIDIA Run:ai workloads via YAML was deprecated starting from NVIDIA Run:ai version 2.18. For cluster version 2.18 and above, use the Run:ai REST API to submit workloads. The Cluster API documentation has also been removed from v2.20 and above.