What’s new in version 2.20
This What's new for NVIDIA Run:ai v2.20 provides a detailed summary of the latest features, enhancements, and updates introduced in this version. It serves as a guide to help users, administrators, and researchers understand the new capabilities and how to leverage them for improved workload management, resource optimization, and more.
Important
For a complete list of deprecations, see Deprecation notifications. Deprecated features and capabilities remain available for two versions after the notification.
Researchers
Workloads - workspaces and training
Stop/run actions for distributed workloads - You can now stop and run distributed workloads from the UI, CLI, and API. Scheduling rules for training workloads also apply to distributed workloads. This enhances control over distributed workloads, enabling greater flexibility and resource management.
From cluster v2.20 onward
Visibility into idle GPU devices - Idle GPU devices are now displayed in the UI and API, showing the number of allocated GPU devices that have been idle for more than 5 minutes. This provides better visibility into resource utilization, enabling more efficient workload management.
Configurable workload completion with multiple runs - You can now define, directly in the UI, API, and CLI v2, the number of runs a training workload must complete to be considered finished. Running training workloads multiple times improves the reliability and validity of training results. Additionally, you can configure how many runs can be scheduled in parallel, which helps significantly reduce training time and simplifies the management of jobs that require multiple runs. See Train models using a standard training workload for more details.
From cluster v2.20 onward
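As a rough sketch, such a submission might look like this in YAML form; the completions and parallelism field names mirror the Kubernetes Job API and are assumptions, not the confirmed NVIDIA Run:ai schema:

```yaml
# Sketch of a standard training submission body (YAML form). The
# completions/parallelism field names mirror the Kubernetes Job API
# and are assumptions, not the confirmed NVIDIA Run:ai schema.
name: train-resnet
projectId: "4002"          # hypothetical project ID
spec:
  image: nvcr.io/nvidia/pytorch:24.08-py3
  command: python train.py
  completions: 5           # considered finished after 5 successful runs
  parallelism: 2           # at most 2 runs scheduled at once
```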
Configurable grace period for workload preemption - You can now set a grace period in the UI, API, and CLI v2 for standard and distributed training workloads, providing buffer time for preempted workloads to reach a safe checkpoint before being forcibly stopped. The grace period can be configured between 0 seconds and 5 minutes. This aims to minimize data loss and avoid unnecessary retraining by ensuring the latest checkpoints are saved.
From cluster v2.20 onward
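For illustration only, the sketch below uses a hypothetical field name (preemptionGracePeriodSeconds) to show where such a setting might sit in a workload spec:

```yaml
# preemptionGracePeriodSeconds is a hypothetical field name, shown only
# to illustrate the concept; the allowed range is 0 seconds to 5 minutes.
spec:
  preemptionGracePeriodSeconds: 120   # buffer to write a final checkpoint
```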
Pod deletion policy for terminal workloads - You can now specify which pods should be deleted when a distributed workload reaches a terminal state (completed/failed) using cleanPodPolicy in CLI v2 and API. This enhancement provides greater control over resource cleanup and helps maintain a more organized and efficient cluster environment. See cleanPodPolicy for more details.
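A minimal sketch of where cleanPodPolicy might sit in a distributed workload spec, assuming the Kubeflow training-operator convention for its values:

```yaml
# Sketch of a distributed training spec (field placement assumed).
# cleanPodPolicy values follow the Kubeflow training-operator convention:
#   All     - delete all pods once the workload is completed/failed
#   Running - delete only pods still running at that point
#   None    - keep all pods for post-mortem inspection
spec:
  cleanPodPolicy: Running
```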
Workload assets
Instructions for environment variables - You can now add instructions to environment variables when creating new environments via the UI and API. In addition, NVIDIA Run:ai's environments now include default instructions. Adding instructions provides guidance, enabling anyone using the environment to set the environment variable values correctly.
From cluster v2.20 onward
Enhanced environments and compute resource management - The action bar now contains "Make a Copy" and "Edit" actions, while the "Rename" option has been removed. A new "Last Updated" column has also been added for easier tracking of asset modifications.
From cluster v2.20 onward
Enhanced data sources and credentials tables - Added a new "Kubernetes name" column to data sources and credentials tables for visibility into Kubernetes resource associations. The credentials table now includes an "Environments" column displaying the environments associated with the credential.
From cluster v2.20 onward
Authentication and authorization
User applications for API authentication - You can now create your own applications for API integrations with NVIDIA Run:ai. Each application includes client credentials, which can be used to obtain an authentication token for subsequent API calls. See User applications for more details.
From cluster v2.20 onward
Scheduler
Support for multiple fractional GPUs in a single workload - NVIDIA Run:ai now supports submitting workloads that utilize multiple fractional GPUs within a single workload using the UI and CLI. This feature enhances GPU utilization, increases scheduling probability in shorter timeframes, and allows workloads to consume only the memory they need. It maximizes quota usage and enables more workloads to share the same GPUs effectively. See Multi-GPU fractions and Multi-GPU dynamic fractions for more details.
Beta for Dynamic Fractions
From cluster v2.20 onward
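For example, a single workload might request four devices at half a GPU each. The sketch below reuses the gpuPortionRequest field named elsewhere in these notes; gpuDevicesRequest is an assumed name for the device-count field:

```yaml
# Sketch: 4 GPU devices at half a device each (2 GPUs' worth of compute
# spread over 4 devices). gpuDevicesRequest is an assumed field name;
# gpuPortionRequest is the fraction field named elsewhere in these notes.
compute:
  gpuDevicesRequest: 4
  gpuPortionRequest: 0.5
```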
Support for GPU memory swap with multiple GPUs per workload - NVIDIA Run:ai now supports GPU memory swap for workloads utilizing multiple GPUs. By leveraging GPU memory swap, you can maximize GPU utilization and serve more workloads using the same hardware. The swap scheduler on each node ensures that all GPUs of a distributed model run simultaneously, maintaining synchronization across GPUs. Workload configurations combine swap settings with multi-GPU dynamic fractions, providing flexibility and efficiency for managing large-scale workloads. See Multi-GPU memory swap.
Beta
From cluster v2.20 onward
Command-line interface (CLI v2)
Support for Windows OS - CLI v2 now supports Windows operating systems, enabling you to leverage the full capabilities of the CLI.
From cluster v2.18 onward
Unified training command structure - Unified the distributed command into the training command to align with the NVIDIA Run:ai UI. The training command now includes a new sub-command to support distributed workloads, ensuring a more consistent and streamlined user experience across both the CLI v2 and UI.
New command for Kubernetes access - Added a new CLI v2 command, runai kubeconfig set, allowing users to set the kubeconfig file with the NVIDIA Run:ai authorization token. This enables users to gain access to the Kubernetes cluster, simplifying authentication and integration with NVIDIA Run:ai-managed environments. For more details, see Add NVIDIA Run:ai authorization to kubeconfig.
Added view workload labels - You can now view the labels associated with a workload when using the CLI v2 runai workload describe command for all workload types. This enhancement provides better visibility into workload metadata.
ML engineers
Workloads - inference
Enhanced visibility into rolling updates for inference workloads - NVIDIA Run:ai now shows a phase message with detailed insights into the current state of a rolling update when you hover over the workload's status. This helps users monitor and manage updates more effectively. See Rolling inference updates for more details.
From cluster v2.20 onward
Inference serving endpoint configuration - You can now define an inference serving endpoint directly within the environment using the NVIDIA Run:ai UI.
From cluster v2.19 onward
Persistent token management for Hugging Face models - NVIDIA Run:ai allows users to save their Hugging Face tokens persistently as part of their credentials within the NVIDIA Run:ai UI. Once saved, tokens can be easily selected from a list of stored credentials, removing the need to manually enter them each time. This enhancement improves the process of deploying Hugging Face models, making it more efficient and user-friendly. See Deploy inference workloads from Hugging Face for more details.
From cluster v2.13 onward
Deploy and manage NVIDIA NIM models in inference workloads - NVIDIA Run:ai now supports NVIDIA NIM models, enabling you to easily deploy and manage these models when submitting inference workloads. You can select a NIM model and leverage NVIDIA’s hardware optimizations directly through the NVIDIA Run:ai UI. This feature also allows you to take advantage of NVIDIA Run:ai capabilities such as autoscaling and GPU fractioning. See Deploy inference workloads with NVIDIA NIM for more details.
Customizable autoscaling plans for inference workloads - NVIDIA Run:ai allows advanced users of inference workload autoscaling to fine-tune their autoscaling plans using the Update inference spec API. This enables you to achieve optimal behavior to meet fluctuating request demands.
Experimental
From cluster v2.20 onward
Platform administrator
Analytics
New Reports view for analytics - The new Reports view enables generating and organizing large amounts of data in a structured, CSV-formatted layout. With this feature, you can monitor resource consumption, identify trends, and make informed decisions to optimize your AI workloads with greater efficiency.
From cluster v2.20 onward
Authentication and authorization
Client credentials for applications - Applications now use client credentials - Client ID and Client secret - to obtain an authentication token, in line with the OAuth standard. See Applications for more details.
From cluster v2.20 onward
Node pools
Enhanced metric graphs for node pools - Enhanced the metric graphs in the DETAILS tab for node pools by aligning them with the dashboard and the node pools API. As part of this improvement, the following columns have been removed from the Node pools table:
Node GPU Allocation
GPU Utilization Distribution
GPU Utilization
GPU Memory Utilization
CPU Utilization
CPU Memory Utilization
Organizations - projects/departments
Enhanced project deletion - Deleting a project now also attempts to delete the project's associated workloads and assets, allowing better management of your organization's assets.
From cluster v2.20 onward
Enhanced resource prioritization for projects and departments - NVIDIA Run:ai has introduced advanced prioritization capabilities to manage resources between projects or between departments more effectively using the Projects and Departments APIs.
From cluster v2.20 onward
This feature allows administrators to:
Prioritize resource allocation and reclaim between different projects and departments.
Prioritize projects within the same department.
Set priorities per node-pool for both projects and departments.
Implement distinct SLAs by assigning strict priority levels to over-quota resources.
Updated over quota naming - Renamed over quota priority to over quota weight to reflect its actual functionality.
Policy
Added policy-based default field values - Administrators can now set default values for fields that are automatically calculated based on the values of other fields using defaultFrom. This ensures that critical fields in the workload submission form are populated automatically if not provided by the user.
From cluster v2.20 onward
This feature supports various field types:
Integer fields (e.g., cpuCoresRequest)
Number fields (e.g., gpuPortionRequest)
Quantity fields (e.g., gpuMemoryRequest)
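A minimal sketch of a policy using defaultFrom, assuming a rules section keyed by field path (the field and factor key names are assumptions):

```yaml
# Sketch of a policy rule using defaultFrom; the nested structure and the
# field/factor key names are assumptions. If the user leaves
# cpuCoresRequest empty, it is derived from the GPU request instead of
# falling back to a static default.
rules:
  compute:
    cpuCoresRequest:
      defaultFrom:
        field: compute.gpuPortionRequest   # source field (assumed path)
        factor: 4                          # assumed multiplier key
```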
Data sources
Improved control over data source and storage class visibility - NVIDIA Run:ai now provides administrators with the ability to control the visibility of data source types and storage classes in the UI. Data source types that are restricted by policy no longer appear during workload submission or when creating new data source assets. Additionally, administrators can configure storage classes as internal using the Storage class configuration API.
From cluster v2.20 onward
Email notifications
Added email notifications API - Email notifications can now be configured via API in addition to the UI, enabling integration with external tools. See NotificationChannels API for more details.
Infrastructure Administrator
NVIDIA Data Center GPUs - Grace-Hopper
Support for ARM-based Grace-Hopper Superchip (GH200) - NVIDIA Run:ai now supports the ARM-based Grace-Hopper Superchip (GH200). Due to an ARM64 limitation in version 2.20, the NVIDIA Run:ai control plane services must be scheduled on non-ARM-based CPU nodes. This limitation will be addressed in a future release. See self-hosted installation for more details.
From cluster v2.20 onward
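One way to keep services off ARM nodes is a standard Kubernetes node-affinity rule on the architecture label; a generic sketch, not a confirmed NVIDIA Run:ai values file:

```yaml
# Standard Kubernetes node affinity that keeps pods off arm64 nodes.
# Where this belongs in the control plane Helm values is an assumption.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: NotIn
              values: ["arm64"]
```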
System requirements
NVIDIA Run:ai now supports Kubernetes version 1.32.
NVIDIA Run:ai now supports OpenShift version 4.17.
Kubernetes version 1.28 is no longer supported.
OpenShift versions 4.12 to 4.13 are no longer supported.
Advanced cluster configurations
Exclude nodes in mixed node clusters - NVIDIA Run:ai now allows you to exclude specific nodes in a mixed node cluster using the nodeSelectorTerms flag. See Advanced cluster configurations for more details.
From cluster v2.20 onward
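Since nodeSelectorTerms uses the standard Kubernetes node-affinity grammar, an exclusion can be expressed with a NotIn match; where the flag sits within the cluster configuration is assumed here:

```yaml
# Sketch: keep NVIDIA Run:ai off two named nodes. The surrounding key
# path in the cluster configuration is an assumption; the
# matchExpressions grammar is standard Kubernetes.
nodeSelectorTerms:
  - matchExpressions:
      - key: kubernetes.io/hostname
        operator: NotIn
        values: ["node-a", "node-b"]   # hypothetical node names
```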
Advanced configuration options for cluster services - Introduced new cluster configuration options for setting node affinity and tolerations for NVIDIA Run:ai cluster services. These configurations ensure that the NVIDIA Run:ai cluster services are scheduled on the desired nodes. See Advanced cluster configurations for more details. The new options are listed below, followed by a configuration sketch.
From cluster v2.20 onward
global.affinity
global.tolerations
daemonSetsTolerations
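A sketch of how these options might be combined; placement under the cluster spec is assumed, and the label, taint, and values are hypothetical examples:

```yaml
# Sketch (placement under the cluster spec is assumed; the label, taint,
# and values are hypothetical examples).
global:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/runai-system
                operator: Exists
  tolerations:
    - key: dedicated
      operator: Equal
      value: runai-system
      effect: NoSchedule
daemonSetsTolerations:
  - operator: Exists   # lets DaemonSets schedule onto tainted nodes too
```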
Added Argo Workflows auto-pod grouping - Introduced a new cluster configuration option, gangScheduleArgoWorkflow, to modify the default behavior for grouping Argo Workflows pods, allowing you to prevent pods from being grouped into a single pod-group. See Advanced cluster configurations for more details.
Cluster v2.20 and v2.18
Added cloud auto-scaling for memory fractions - NVIDIA Run:ai now supports auto-scaling for workloads using memory fractions in cloud environments. Using the gpuMemoryToFractionRatio configuration option allows a failed scheduling attempt for a memory fractions workload to create NVIDIA Run:ai scaling pods, triggering the auto-scaler. See Advanced cluster configurations for more details.
From cluster v2.19 onward
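A combined sketch of the two options above as they might appear in the cluster configuration; key placement and values are assumptions:

```yaml
# Sketch; key placement and values are assumptions.
gangScheduleArgoWorkflow: false   # don't group Argo Workflows pods into one pod-group
gpuMemoryToFractionRatio: 0.1     # assumed example value for the memory-to-fraction ratio
```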
Added stale gang eviction timeout for improved stability - NVIDIA Run:ai has introduced a default timeout of 60 seconds for gang eviction in gang scheduling workloads using defaultStalenessGracePeriod. This timeout allows both the workload controller and the scheduler sufficient time to remediate the workload, improving the stability of large training jobs. See Advanced cluster configurations for more details.
From cluster v2.18 onward
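In the cluster configuration this might look as follows; key placement and duration format are assumptions, with 60 seconds matching the documented default:

```yaml
# Sketch; key placement and duration format are assumptions. 60 seconds
# is the documented default.
defaultStalenessGracePeriod: 60s
```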
Added custom labels for built-in alerts - Administrators can now add their own custom labels to the built-in alerts from Prometheus by setting spec.prometheus.additionalAlertLabels in their cluster. See Advanced cluster configurations for more details.
From cluster v2.20 onward
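For example (the label names and values are hypothetical):

```yaml
# Attach custom labels to every built-in Prometheus alert. The label
# names and values below are hypothetical examples.
spec:
  prometheus:
    additionalAlertLabels:
      team: ml-platform
      environment: production
```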
Enhanced configuration flexibility for cluster replica management - Administrators can now use spec.global.replicaCount to manage the number of replicas for NVIDIA Run:ai services. See Advanced cluster configurations for more details.
From cluster v2.20 onward
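For example, to run services with two replicas (the key path is from these notes; the value is an example):

```yaml
# Run NVIDIA Run:ai services with 2 replicas for redundancy. The key
# path comes from these notes; the replica count is an example value.
spec:
  global:
    replicaCount: 2
```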
NVIDIA Run:ai built-in alerts
Added two new NVIDIA Run:ai built-in alerts for Kubernetes nodes hosting GPU workloads. The unknown state alert notifies when the node's health and readiness cannot be determined, and the low memory alert warns when the node has insufficient memory to support current or upcoming workloads.
From cluster v2.20 onward
NVIDIA Run:ai Developer
Metrics and telemetry
Additional metrics and telemetry are available via the API. For more details, see Metrics API:
Metrics (over time)
Project
GPU_QUOTA
CPU_QUOTA_MILLICORES
CPU_MEMORY_QUOTA_MB
GPU_ALLOCATION
CPU_ALLOCATION_MILLICORES
CPU_MEMORY_ALLOCATION_MB
Department
GPU_QUOTA
CPU_QUOTA_MILLICORES
CPU_MEMORY_QUOTA_MB
GPU_ALLOCATION
CPU_ALLOCATION_MILLICORES
CPU_MEMORY_ALLOCATION_MB
Telemetry (current time)
Project
GPU_QUOTA
CPU_QUOTA
MEMORY_QUOTA
GPU_ALLOCATION
CPU_ALLOCATION
MEMORY_ALLOCATION
GPU_ALLOCATION_NON_PREEMPTIBLE
CPU_ALLOCATION_NON_PREEMPTIBLE
MEMORY_ALLOCATION_NON_PREEMPTIBLE
Department
GPU_QUOTA
CPU_QUOTA
MEMORY_QUOTA
GPU_ALLOCATION
CPU_ALLOCATION
MEMORY_ALLOCATION
GPU_ALLOCATION_NON_PREEMPTIBLE
CPU_ALLOCATION_NON_PREEMPTIBLE
MEMORY_ALLOCATION_NON_PREEMPTIBLE
Deprecation notifications
Ongoing Dynamic MIG deprecation process
The Dynamic MIG deprecation process started in version 2.19. NVIDIA Run:ai supports standard MIG profiles as detailed in Configuring NVIDIA MIG profiles.
Before upgrading to version 2.20, workloads submitted with Dynamic MIG and their associated node configurations must be removed.
In version 2.20, MIG was removed from the NVIDIA Run:ai UI under compute resources.
In Q2/25, all Dynamic MIG APIs and CLI commands will be fully deprecated; calls to them will fail.
CLI v1 deprecation
CLI v1 is deprecated and no new features will be developed for it. It will remain available for the next two releases to ensure a smooth transition for all users. We recommend switching to CLI v2, which provides feature parity, backwards compatibility, and ongoing support for new enhancements. CLI v2 is designed to deliver a more robust, efficient, and user-friendly experience.
Legacy Jobs view deprecation
Starting with version 2.20, the legacy Jobs view will be discontinued in favor of the more advanced Workloads view. The legacy submission form will still be accessible via the Workload manager view for a smoother transition.
appID and appSecret deprecation
The appID and appSecret parameters used for requesting an API token are deprecated. They will remain available for the next two releases. To create application tokens, use your client credentials - Client ID and Client secret.