What's New in Version 2.23
These release notes provide a detailed summary of the latest features, enhancements, and updates introduced in NVIDIA Run:ai v2.23. They serve as a guide to help users, administrators, and researchers understand the new capabilities and how to leverage them for improved workload management, resource optimization, and more.
Simplified Onboarding
Guided onboarding for first-time admins - A new onboarding flow helps system and platform administrators get started quickly by walking them through cluster installation, setting up SSO, and onboarding the first research team, reducing setup complexity and accelerating time to adoption.
Guided onboarding experience for new researchers - On their first login, all new researchers are directed to the Workloads page and guided through creating their first Jupyter Notebook workspace with a short tour. A template is available for immediate launch, helping users get started quickly. The guided tour remains available anytime from the Help menu.
Workload Extensibility with Resource Interface
The Resource Interface (RI) enables organizations to extend NVIDIA Run:ai with new workload types from any ML framework, tool, or Kubernetes resource using a no-code configuration through the Workload Types API. This allows organizations to incorporate emerging AI/ML tools or custom resources without platform updates or code changes. These workloads become immediately available across the organization, empowering teams to innovate and collaborate while benefiting from advanced scheduling and monitoring. See Extending workload support with Resource Interface for more details.
Experimental
From cluster v2.23 onward
No-code onboarding - Register new workload types instantly via the Workload Types API.
Seamless researcher experience - Submit and run workloads using a standard YAML manifest via the Workloads v2 API (see the example manifest after this list).
Unified management - Newly added workloads are available to all teams and benefit from the same orchestration and monitoring as native types.
Resource Interface-powered integration - Defines how each workload is interpreted and optimized, enabling consistent support for scaling, dependencies, and advanced scheduling.
Newly added workload types - NIM Services, KServe, Milvus, and JobSet.
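Once a workload type such as KServe is registered through the Resource Interface, researchers work with the resource's own standard manifest. As a point of reference only, the sketch below shows a minimal KServe InferenceService based on KServe's well-known sklearn-iris example; the name and model location are illustrative, and how the manifest is wrapped and sent through the Workloads v2 API is not shown here.

```yaml
# Illustrative KServe manifest (based on the public sklearn-iris example).
# The name and storageUri are placeholders; submission via the Workloads v2 API
# is handled by NVIDIA Run:ai and is not shown in this sketch.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```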
AI Practitioners
Workloads
New workload template capabilities - The new templates simplify the workload submission experience by allowing you to launch a workload in a single click, without modifying any settings. In addition, several supporting capabilities have been introduced. See Workload templates for more details.
From cluster v2.23 onward
Preset templates - A set of ready-to-use workload templates for NeMo, BioNeMo and PyTorch is now available, enabling you to launch workloads quickly.
Linked assets - Templates can now be linked to assets such as environments and compute resources. Any changes to these assets are automatically reflected in the template, ensuring consistency across workloads.
Migrating legacy templates - Existing legacy templates can now be migrated into the new workload templates format, allowing teams to retain their saved configurations while taking advantage of new features. This capability is available when the Flexible workload templates setting is toggled on. You will not lose your existing templates - all legacy templates remain available.
AI Application-based workload grouping - NVIDIA Run:ai now automatically groups related workloads into a single logical application for any workloads deployed via Helm charts. This provides a unified view of complex solutions. Using the API, you can track aggregated resource requests and allocations (GPU, CPU, memory) and monitor the overall application status. In the UI, you can filter the Workloads grid by application name to easily see all components of a solution together. See AI Applications API for more details.
NGC public registry support for environment images - Environment images and tags can now be selected directly from the NGC public registry when creating workloads, environment assets and templates. This provides a streamlined way to access trusted NVIDIA containers without manually entering image URLs.
Beta
From cluster v2.23 onward
Enhanced logging with per-container support - Workload logs can be viewed at the container level within each pod through the UI, API and CLI, giving researchers and administrators finer control when monitoring and debugging workloads. In addition, downloaded logs are saved with unique file names that include the workload, pod, container, and timestamp, making it easier to organize and analyze logs from distributed workloads. See Workloads for more details.
Application access for workload tools - Added support for authorizing applications (in addition to users and groups) when connecting to tools. This makes it easier to integrate external systems or services that need direct access to workload tools, providing more flexibility in how connections are managed.
Networking metrics - A new metric, NVLink bandwidth total, has been added to Nodes and Workloads in the UI and is also available through the Nodes and Pods APIs. This improves visibility into network utilization, giving teams deeper insight into consumption patterns and resource allocations.
From cluster v2.23 onward
Workload Assets
PVC details view in data sources - A new details pane is available when selecting a PVC data source from the Data sources table. The pane shows Event History for cluster events, as well as Details such as scope, request settings, and partial storage class information. This enhancement gives administrators and AI practitioners greater visibility into PVC usage history and configuration, improving monitoring and debugging. See Data sources for more details.
From cluster v2.23 onward
Enhanced Git credential management - Git data sources can now be configured with Generic secret credentials through the UI or API, with support for SSH private keys. This provides a consistent and secure way to authenticate to repositories, simplifying setup for administrators and enabling users to connect to Git-based workflows more easily. See Credentials for more details.
From cluster v2.23 onward
Command-line Interface (CLI v2)
Customize your CLI list views - The new --columns flag allows you to tailor the output of runai list commands to display only the fields you need, giving you complete control over table views. See CLI commands reference for more details.
Select and order columns - Define exactly which columns to display and in what order.
Discover more data - Show useful fields that are not part of the default output.
Autocompletion support - Use tab completion to discover and select all available columns for any list command.
ML Engineers
Inference
Flexible submission form for NVIDIA NIM and Hugging Face workloads - The flexible submission form is now supported for NVIDIA NIM and Hugging Face inference workloads. This form allows users to submit workloads using an existing setup or provide custom settings for one-time use, enabling faster, more consistent submissions aligned with organizational policies.
Advanced setup form for NVIDIA NIM and Hugging Face workloads - Advanced configuration options are available when submitting NVIDIA NIM and Hugging Face inference workloads, including editing the image and tag, modifying or adding environment variables, and setting workload priority. This provides greater flexibility for adapting workload configurations to specific requirements.
Dynamic NVIDIA NIM model list from NGC catalog - The list of available NVIDIA NIM models is now retrieved directly from the NGC catalog using an API call. This ensures the model list remains current and reflects the latest offerings.
Flexible inference workload templates - Flexible workload templates allow you to save workload configurations that can be reused across workload submissions. You can create templates from scratch or base them on existing assets - environments, compute resources, or data sources. These templates simplify the submission process and promote standardization across users and teams. See Inference templates for more details.
Distributed inference API enhancements - The inference API has been extended with support for multi-node deployments, adding autoscaling and rolling updates. These enhancements improve the robustness, scalability, and manageability of distributed inference workloads. See Distributed inferences API for more details.
From cluster v2.22 onward
Policy API for distributed inference - A dedicated policy API is available for distributed inference, enabling fine-grained control over distributed inference workloads. Administrators can define and enforce policies that govern scheduling, scaling, and update behavior, ensuring workloads adhere to organizational requirements and operate consistently across environments. See Policy API for more details.
From cluster v2.22 onward
Distributed inference support for GB200 and MNNVL - Distributed inference workloads can now take advantage of NVIDIA GB200 NVL72 and other Multi-Node NVLink systems. This enables automatic infrastructure detection, domain labeling, and optimized cross-node communication for high-bandwidth, performance-optimized inference execution. See Using GB200 NVL72 and Multi-Node NVLink domains for more details.
From cluster v2.23 onward
NVIDIA NIM observability metrics - Observability metrics are now available for NVIDIA NIM inference workloads via the UI and Workloads / Pods APIs, giving teams better visibility into the performance of large language model (LLM) deployments. These metrics can be collected when deploying NIM through NVIDIA Run:ai, the NIM operator, a Helm chart, or directly via container images (with the run.ai/nim-workload: "true" label). This enhancement enables more effective monitoring and troubleshooting of NIM-based inference workloads. See Workloads and NIM observability metrics via API for more details.
From cluster v2.23 onward
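For NIM deployed outside the NVIDIA Run:ai submission flow (for example, directly via a container image), the item above notes that metrics collection relies on the run.ai/nim-workload: "true" label. The sketch below is a rough illustration of where such a label might sit on a plain Kubernetes Deployment; only the label itself comes from the text above, and the names, image, and label placement on the pod template are assumptions.

```yaml
# Minimal sketch: only the run.ai/nim-workload label is taken from the release note above.
# The Deployment name, image, and label placement are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nim-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-nim-service
  template:
    metadata:
      labels:
        app: my-nim-service
        run.ai/nim-workload: "true"   # label referenced in the note above
    spec:
      containers:
        - name: nim
          image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest   # example NIM image
```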
NVIDIA NIM service deployment API - A new API is available for deploying NVIDIA NIM services, allowing programmatic creation and management of NIM service workloads for easier automation and integration. See NIM services API for more details.
From cluster v2.23 onward
Support for Dynamo inference workloads - Multi-node inference workloads deployed with the NVIDIA Dynamo framework can now be scheduled efficiently using gang scheduling and topology-aware scheduling. This ensures fast startup, low latency, and better resource utilization for disaggregated inference pipelines.
Experimental
From cluster v2.23 onward
Application access for inference serving endpoints - All inference workloads - custom, Hugging Face, and NVIDIA NIM - support authorizing applications (in addition to users and groups) when connecting to inference serving endpoints. This enables secure, programmatic access to inference endpoints from outside the cluster. To use this capability, configure the serving endpoint, authenticate using a token issued to the application, and use the token in API requests to the endpoint.
Credential creation during NIM and Hugging Face submissions - You can now create My credentials of type Generic secret directly in the NVIDIA NIM and Hugging Face inference workload submission forms, avoiding the need to leave the flow to configure authentication.
Separate admin toggles for NVIDIA NIM and Hugging Face models - Previously, enabling NVIDIA NIM and Hugging Face models was managed through a single Models toggle in the General settings. These options are now separated into distinct toggles, allowing administrators to enable or disable NVIDIA NIM and Hugging Face models independently for finer control over inference model availability.
Platform Administrators
Clusters
Network topology-aware scheduling for distributed workloads - NVIDIA Run:ai now supports topology-aware scheduling to optimize placement of distributed workloads across data center nodes. By leveraging Kubernetes node labels, the Scheduler can co-locate pods on nodes that are “closer” to each other in the network. This reduces communication overhead, improves workload efficiency, and helps maximize GPU utilization. Once administrators configure the network topology and associate it with node pools, scheduling is applied automatically for distributed workloads submitted through the platform. See Accelerating workloads with network topology-aware scheduling for more details.
From cluster v2.23 onward
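Topology awareness is driven by whatever node labels describe proximity in your data center. As a purely illustrative sketch, nodes might carry labels like the ones below, which administrators would then reference when configuring the network topology and node pools; the label keys here are examples, not required names.

```yaml
# Illustrative only: these label keys are examples of proximity labels you might
# define or already have; configure NVIDIA Run:ai to use your actual labels.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-01
  labels:
    topology.kubernetes.io/zone: dc1-zone-a   # coarser proximity level
    example.com/rack: rack-17                 # finer proximity level (hypothetical key)
```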
Expanded cluster role permissions - Cluster roles have been updated to include watch permissions for all supported workload Custom Resource Definitions (CRDs) wherever get and list permissions were already present. This change ensures compatibility with Kubernetes operators that require get, list, and watch access for proper monitoring and integration with NVIDIA Run:ai workloads.
From cluster v2.21 onward
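In RBAC terms, this amounts to adding the watch verb alongside get and list in the ClusterRole rules covering the supported workload CRDs. The snippet below is a generic sketch of that pattern; the API group and resource names are placeholder examples, not the literal contents of the NVIDIA Run:ai cluster role.

```yaml
# Generic RBAC sketch of the get/list/watch pattern described above.
# The apiGroups and resources are placeholders, not the actual Run:ai role contents.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: example-workload-reader
rules:
  - apiGroups: ["kubeflow.org"]            # example workload CRD API group
    resources: ["pytorchjobs", "mpijobs"]  # example workload CRDs
    verbs: ["get", "list", "watch"]        # watch added wherever get/list existed
```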
Analytics
Workloads by category over time - Added a widget to the Overview dashboard that shows the number of workloads per category (e.g., Train, Build, Deploy) over time. This visualization helps identify usage trends, compare activity across categories, and track changes over specific periods. This feature is also supported in the API.
From cluster v2.22 onward
Policies
System policies for workload governance - By default, every NVIDIA Run:ai account is governed by system policies that establish foundational security controls across all workloads, scopes, and interfaces (UI, API and CLI). These policies ensure consistent workload behavior and prevent unauthorized escalation, and can be viewed as part of the effective policy for any scope. Administrators can create new policies to update these defaults at any desired scope. This flexibility allows easing certain API restrictions when needed, while ensuring every change is explicit and auditable. See System policies for more details.
From cluster v2.23 onward
Privileged parameter - Set to false by default and not editable (canEdit: false), preventing containers from running with full host access unless explicitly enabled by an administrator (a hedged policy sketch follows this list).
Grace period - Defines how long a workload can continue running after a preemption request before termination. The default grace period is 30 seconds, with a system-enforced maximum of 5 minutes across UI, API and CLI submissions. This value can be updated at any scope within the policy hierarchy.
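As a rough sketch of how the privileged default and its editability could be expressed in a workload policy, the excerpt below shows the defaults/rules pattern; the exact field paths are assumptions here, so consult the policy reference for the real schema.

```yaml
# Hypothetical policy excerpt: the field paths are illustrative, not the documented schema.
defaults:
  security:
    privileged: false        # containers do not get full host access by default
rules:
  security:
    privileged:
      canEdit: false         # users cannot override the default at this scope
```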
Policy synchronization changes - Starting in version 2.23, control plane policies are no longer synchronized with the cluster. Policies are now stored and enforced only in the control plane, preventing conflicts with outdated cluster policies. See Workload policies for more details.
Nodes / Node Pools
Minimum guaranteed runtime for preemptible workloads - You can now configure a minimum guaranteed runtime for preemptible workloads in node pools via the UI and API. This setting specifies the minimum time a preemptible workload will run once scheduled and bound to a node before becoming eligible for preemption. This reduces unexpected interruptions and makes workload execution more predictable. See Node pools for more details.
From cluster v2.23 onward
Improved status messaging for node pools with undrained nodes - When creating a node pool or labeling nodes to add to the node pool, nodes that are not fully drained (i.e., still have running workloads) trigger clearer status messages in the API and UI. These messages indicate that the node pool cannot include the affected nodes until they are drained and reach a "Ready" state. This helps administrators better understand node pool readiness and identify which nodes are still in transition.
From cluster v2.23 onward
Cluster filter enhancements for Nodes page - The Nodes grid includes an “All” option in the clusters filter to make it easier to view and manage nodes across multiple clusters at once. When multiple clusters are selected, a Cluster column is displayed by default, showing each node’s associated cluster. Available in both the UI and API.
Authentication and Authorization
Scoped access rules - Users with permissions restricted to a specific scope are now limited to access rules within that scope. This capability is enabled via a tenant setting (enable_scoped_authorization) in the Settings API. Once enabled, the Access rules API returns only the rules within the viewer’s scope (or narrower), and the same scope filtering is applied when viewing access rules in the UI. This ensures access control is aligned with scope boundaries and prevents users from seeing or modifying rules outside their domain.
General Settings
Several changes have been made to the General settings:
Removed the following - the Job submission, MPI distributed training, and Weights & Biases SWEEP toggles, and the Set Docker image registry setting
The following toggles are now enabled by default - Flexible workload submission, Flexible workload templates, Data volumes, and Policies
Custom logo branding - You can now upload a custom logo to appear in the top-right corner of the NVIDIA Run:ai platform interface. This allows organizations to personalize the platform UI with their own branding. Logos can be uploaded in SVG or PNG format (up to 128 KB) directly from the Branding settings.
Infrastructure Administrators
Installation
Cluster configuration via Helm values - Cluster configurations can now be managed directly through the Helm values interface (clusterConfig). At runtime, runaiconfig is the actual source of truth, representing what is actively running in the cluster. When a Helm upgrade is performed, the Helm values overwrite the existing runaiconfig, ensuring alignment with the chart. As a result, clusters configured through a Helm chart should always be managed through Helm. This keeps configurations consistent and predictable across deployments and upgrades. See Advanced cluster configurations for more details.
From cluster v2.23 onward
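As a hedged sketch of what this could look like in practice, cluster settings would live under the clusterConfig key in the Helm values file; only that entry point is taken from the note above, and the nested keys below are placeholders rather than documented configuration options.

```yaml
# values.yaml sketch: only the clusterConfig key is referenced in the note above.
# The nested settings are placeholders, not documented configuration keys.
clusterConfig:
  spec:
    someFeature:
      enabled: true    # hypothetical setting, shown only to illustrate the structure
```

Applying these values through a Helm upgrade would then overwrite the runaiconfig in the cluster, which is why Helm should remain the single place such clusters are managed from.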
Support for custom CA with S3 and Git integrations - Administrators can configure a custom Certificate Authority (CA) for secure TLS communication with S3 and Git repositories. This update extends existing custom CA support to also cover the Git-sync and S3 sidecar containers. It simplifies setup for airgapped environments by eliminating the need for manually built images and ensures consistent secure communication across all components. See Cluster system requirements for more details.
From cluster v2.23 onward
System Requirements
NVIDIA Run:ai supports Kubernetes version 1.34.
OpenShift version 4.15 is no longer supported.
Support for ARM on OpenShift - NVIDIA Run:ai now supports running on ARM-based nodes in OpenShift clusters, expanding deployment flexibility, and allowing organizations to leverage ARM architectures alongside existing x86 infrastructure within their OpenShift environments.
General Enhancements
Keyboard shortcuts for dialogs and forms - Common keyboard actions are now supported across most UI screens and dialogs. Press Enter to confirm actions and Esc to cancel, making it quicker and easier to navigate workflows.
Deleted workloads visible by default - Deleted workloads are displayed by default in the UI under Workload manager. The toggle to enable this view has been removed, simplifying the experience and making it easier for users to track and review deleted workloads without extra configuration.
Direct tool connection - When a workload has only one configured tool, clicking Connect opens the connection directly, without showing a selection menu. If multiple tools are configured, the selection menu will still appear.
Metrics view updates - The metrics view has been reorganized with new naming and grouping:
Renamed Default metrics view to Resource utilization
Renamed Advanced metrics view to GPU profiling
Inference metrics are shown in a dedicated Inference dropdown, available for all inference workloads
Deprecation Notifications
Grafana Dashboards
The legacy Grafana dashboards - Overview and Analytics - are deprecated and will be removed in a future release. We recommend transitioning to the new dashboards available in the NVIDIA Run:ai UI, which are powered by NVIDIA Run:ai APIs. These dashboards provide improved visibility with drill-down capabilities and more flexibility for analyzing usage and performance.
CLI v1
CLI v1 was deprecated in v2.20 and has now been fully removed from the platform. All command-line interactions should be performed using CLI v2.
Jobs
The Jobs workload type was deprecated in v2.20 and has now been fully removed from the platform. This means the Jobs option no longer appears in the General settings or in the Workloads page.