What's New in Version 2.23

These release notes provide a detailed summary of the features, enhancements, and updates introduced in NVIDIA Run:ai v2.23. They serve as a guide to help users, administrators, and researchers understand the new capabilities and how to leverage them for improved workload management, resource optimization, and more.

Important

For a complete list of deprecations, see Deprecation notifications. Deprecated features and capabilities remain available for two versions after the deprecation notification.

Simplified Onboarding

  • Guided onboarding for first-time admins - A new onboarding flow helps system and platform administrators quickly get started by walking through cluster installation, setting up SSO and onboarding the first research team, reducing setup complexity and accelerating time to adoption.

  • Guided onboarding experience for new researchers - On their first login, all new researchers are directed to the Workloads page and guided through creating their first Jupyter Notebook workspace with a short tour. A template is available for immediate launch, helping users get started quickly. The guided tour remains available anytime from the Help menu.

Workload Extensibility with Resource Interface

The Resource Interface (RI) enables organizations to extend NVIDIA Run:ai with new workload types from any ML framework, tool, or Kubernetes resource using a no-code configuration through the Workload Types API. This allows organizations to incorporate emerging AI/ML tools or custom resources without platform updates or code changes. These workloads become immediately available across the organization, empowering teams to innovate and collaborate while benefiting from advanced scheduling and monitoring. See Extending workload support with Resource Interface for more details. Experimental From cluster v2.23 onward

  • No-code onboarding - Register new workload types instantly via the Workload Types API.

  • Seamless researcher experience - Submit and run workloads using a standard YAML manifest via the Workloads v2 API.

  • Unified management - Newly added workloads are available to all teams and benefit from the same orchestration and monitoring as native types.

  • Resource Interface-powered integration - Defines how each workload is interpreted and optimized, enabling consistent support for scaling, dependencies, and advanced scheduling.

  • Newly added workload types - NIM Services, KServe, Milvus, and JobSet.

AI Practitioners

Workloads

  • New workload template capabilities - The new templates simplify the workload submission experience by allowing you to launch a workload in a single click, without modifying any settings. In addition, several supporting capabilities have been introduced. See Workload templates for more details. From cluster v2.23 onward

    • Preset templates - A set of ready-to-use workload templates for NeMo, BioNeMo and PyTorch is now available, enabling you to launch workloads quickly.

    • Linked assets - Templates can now be linked to assets such as environments and compute resources. Any changes to these assets are automatically reflected in the template, ensuring consistency across workloads.

    • Migrating legacy templates - Existing legacy templates can now be migrated into the new workload templates format, allowing teams to retain their saved configurations while taking advantage of new features. This capability is available when the Flexible workload templates setting is toggled on. You will not lose your existing templates - all legacy templates remain available.

  • AI Application-based workload grouping - NVIDIA Run:ai now automatically groups related workloads into a single logical application for any workloads deployed via Helm charts. This provides a unified view of complex solutions. Using the API, you can track aggregated resource requests and allocations (GPU, CPU, memory) and monitor the overall application status. In the UI, you can filter the Workloads grid by application name to easily see all components of a solution together. See AI Applications API for more details.

  • NGC public registry support for environment images - Environment images and tags can now be selected directly from the NGC public registry when creating workloads, environment assets and templates. This provides a streamlined way to access trusted NVIDIA containers without manually entering image URLs. Beta From cluster v2.23 onward

  • Enhanced logging with per-container support - Workload logs can be viewed at the container level within each pod through the UI, API and CLI, giving researchers and administrators finer control when monitoring and debugging workloads. In addition, downloaded logs are saved with unique file names that include the workload, pod, container, and timestamp, making it easier to organize and analyze logs from distributed workloads. See Workloads for more details.

  • Application access for workload tools - Added support for authorizing applications (in addition to users and groups) when connecting to tools. This makes it easier to integrate external systems or services that need direct access to workload tools, providing more flexibility in how connections are managed.

  • Networking metrics - A new metric, NVLink bandwidth total, has been added to Nodes and Workloads in the UI and is also available through the Nodes and Pods APIs. This improves visibility into network utilization, giving teams deeper insight into consumption patterns and resource allocations. From cluster v2.23 onward

Workload Assets

  • PVC details view in data sources - A new details pane is available when selecting a PVC data source from the Data sources table. The pane shows Event History for cluster events, as well as Details such as scope, request settings, and partial storage class information. This enhancement gives administrators and AI practitioners greater visibility into PVC usage history and configuration, improving monitoring and debugging. See Data sources for more details. From cluster v2.23 onward

  • Enhanced Git credential management - Git data sources can now be configured with Generic secret credentials through the UI or API, with support for SSH private keys. This provides a consistent and secure way to authenticate to repositories, simplifying setup for administrators and enabling users to connect to Git-based workflows more easily. See Credentials for more details. From cluster v2.23 onward

Command-line Interface (CLI v2)

  • Customize your CLI list views - The new --columns flag allows you to tailor the output of runai list commands to display only the fields you need, giving you complete control over table views. See CLI commands reference for more details and the usage example after this list.

    • Select and order columns - Define exactly which columns to display and in what order.

    • Discover more data - Show useful fields that are not part of the default output.

    • Autocompletion support - Use tab completion to discover and select all available columns for any list command.
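
For example, the following is a minimal usage sketch of the --columns flag; the list subcommand and the column identifiers shown are illustrative assumptions, and the actual names should be taken from autocompletion or the CLI commands reference.

    # List workloads showing only selected columns, in the order given
    # (subcommand and column identifiers below are illustrative assumptions)
    runai workload list --columns name,project,status

    # Use tab completion to discover the columns available for any list command
    runai workload list --columns <TAB>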

ML Engineers

Inference

  • Flexible submission form for NVIDIA NIM and Hugging Face workloads - The flexible submission form is now supported for NVIDIA NIM and Hugging Face inference workloads. This form allows users to submit workloads using an existing setup or provide custom settings for one-time use, enabling faster, more consistent submissions aligned with organizational policies.

  • Advanced setup form for NVIDIA NIM and Hugging Face workloads - Advanced configuration options are available when submitting NVIDIA NIM and Hugging Face inference workloads, including editing the image and tag, modifying or adding environment variables, and setting workload priority. This provides greater flexibility for adapting workload configurations to specific requirements.

  • Dynamic NVIDIA NIM model list from NGC catalog - The list of available NVIDIA NIM models is now retrieved directly from the NGC catalog using an API call. This ensures the model list remains current and reflects the latest offerings.

  • Flexible inference workload templates - Flexible workload templates allow you to save workload configurations that can be reused across workload submissions. You can create templates from scratch or base them on existing assets - environments, compute resources, or data sources. These templates simplify the submission process and promote standardization across users and teams. See Inference templates for more details.

  • Distributed inference API enhancements - The inference API has been extended with support for multi-node deployments, adding autoscaling and rolling updates. These enhancements improve the robustness, scalability, and manageability of distributed inference workloads. See Distributed inferences API for more details. From cluster v2.22 onward

  • Policy API for distributed inference - A dedicated policy API is available for distributed inference, enabling fine-grained control over these workloads. Administrators can define and enforce policies that govern scheduling, scaling, and update behavior, ensuring workloads adhere to organizational requirements and operate consistently across environments. See Policy API for more details. From cluster v2.22 onward

  • Distributed inference support for GB200 and MNNVL - Distributed inference workloads can now take advantage of NVIDIA GB200 NVL72 and other Multi-Node NVLink systems. This enables automatic infrastructure detection, domain labeling, and optimized cross-node communication for high-bandwidth, performance-optimized inference execution. See Using GB200 NVL72 and Multi-Node NVLink domains for more details. From cluster v2.23 onward

  • NVIDIA NIM observability metrics - Observability metrics are now available for NVIDIA NIM inference workloads via the UI and Workloads / Pods APIs, giving teams better visibility into the performance of large language model (LLM) deployments. These metrics can be collected when deploying NIM through NVIDIA Run:ai, the NIM operator, a Helm chart, or directly via container images (with the run.ai/nim-workload: "true" label); a labeling example appears at the end of this section. This enhancement enables more effective monitoring and troubleshooting of NIM-based inference workloads. See Workloads and NIM observability metrics via API for more details. From cluster v2.23 onward

  • NVIDIA NIM service deployment API - A new API is available for deploying NVIDIA NIM services, allowing programmatic creation and management of NIM service workloads for easier automation and integration. See NIM services API for more details. From cluster v2.23 onward

  • Support for Dynamo inference workloads - Multi-node inference workloads deployed with the NVIDIA Dynamo framework can now be scheduled efficiently using gang scheduling and topology-aware scheduling. This ensures fast startup, low latency, and better resource utilization for disaggregated inference pipelines. Experimental From cluster v2.23 onward

  • Application access for inference serving endpoints - All inference workloads (custom, Hugging Face, and NVIDIA NIM) support authorizing applications (in addition to users and groups) when connecting to inference serving endpoints. This enables secure, programmatic access to inference endpoints from outside the cluster. To use this capability, configure the serving endpoint, authenticate using a token granted by an application, and use the token in API requests to the endpoint (see the example token flow at the end of this section).

  • Credential creation during NIM and Hugging Face submissions - You can now create My credentials of type Generic secret directly in the NVIDIA NIM and Hugging Face inference workload submission forms, avoiding the need to leave the flow to configure authentication.

  • Separate admin toggles for NVIDIA NIM and Hugging Face models - Previously, enabling NVIDIA NIM and Hugging Face models was managed through a single Models toggle in the General settings. These options are now separated into distinct toggles, allowing administrators to enable or disable NVIDIA NIM and Hugging Face models independently for finer control over inference model availability.
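
As noted above, NIM observability metrics can also be collected for NIM deployed outside the NVIDIA Run:ai submission flow by labeling the workload. The following is an illustrative Deployment fragment; only the run.ai/nim-workload label comes from this release, and the exact label placement should be confirmed in the observability documentation.

    # Illustrative Deployment fragment for a directly deployed NIM container
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-nim                       # illustrative name
      labels:
        run.ai/nim-workload: "true"      # enables collection of NIM observability metrics
    spec:
      template:
        metadata:
          labels:
            run.ai/nim-workload: "true"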
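Also as noted above, applications can now be authorized for inference serving endpoints. The following is a hedged sketch of the token flow; the token endpoint path, payload field names, and the OpenAI-compatible request body are assumptions and should be checked against the API authentication documentation and your model server.

    # 1. Obtain a bearer token with the application's client credentials
    #    (endpoint path and field names are assumptions)
    TOKEN=$(curl -s -X POST "https://<company>.run.ai/api/v1/token" \
      -H "Content-Type: application/json" \
      -d '{"grantType": "client_credentials", "clientId": "<APP_ID>", "clientSecret": "<APP_SECRET>"}' \
      | jq -r '.accessToken')

    # 2. Call the serving endpoint with the token (body assumes an OpenAI-compatible
    #    endpoint such as NIM exposes; adjust for your model server)
    curl -X POST "https://<serving-endpoint>/v1/chat/completions" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"model": "<model>", "messages": [{"role": "user", "content": "Hello"}]}'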

Platform Administrators

Clusters

  • Network topology-aware scheduling for distributed workloads - NVIDIA Run:ai now supports topology-aware scheduling to optimize placement of distributed workloads across data center nodes. By leveraging Kubernetes node labels, the Scheduler can co-locate pods on nodes that are “closer” to each other in the network. This reduces communication overhead, improves workload efficiency, and helps maximize GPU utilization. Once administrators configure the network topology and associate it with node pools, scheduling is applied automatically for distributed workloads submitted through the platform. See Accelerating workloads with network topology-aware scheduling for more details; a node-labeling sketch follows this list. From cluster v2.23 onward

  • Expanded cluster role permissions - Cluster roles have been updated to include watch permissions for all supported workload Custom Resource Definitions (CRDs) wherever get and list permissions were already present. This change ensures compatibility with Kubernetes operators that require get, list, and watch access for proper monitoring and integration with NVIDIA Run:ai workloads. From cluster v2.21 onward
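
As a sketch of the node-label side of this configuration: administrators label nodes with their position in the network fabric, then reference those label keys when defining the network topology in the platform. The label keys and values below are purely illustrative assumptions; use the keys your topology definition references.

    # Illustrative only -- label keys/values are assumptions, not platform-defined names
    kubectl label node node-01 example.com/network-block=block-a
    kubectl label node node-02 example.com/network-block=block-a
    kubectl label node node-17 example.com/network-block=block-b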

Analytics

  • Workloads by category over time - Added a widget to the Overview dashboard that shows the number of workloads per category (e.g., Train, Build, Deploy) over time. This visualization helps identify usage trends, compare activity across categories, and track changes over specific periods. This feature is also supported in the API. From cluster v2.22 onward

Policies

  • System policies for workload governance - By default, every NVIDIA Run:ai account is governed by system policies that establish foundational security controls across all workloads, scopes, and interfaces (UI, API and CLI). These policies ensure consistent workload behavior and prevent unauthorized escalation, and can be viewed as part of the effective policy for any scope. Administrators can create new policies to update these defaults at any desired scope. This flexibility allows easing certain API restrictions when needed, while ensuring every change is explicit and auditable. See System policies for more details. From cluster v2.23 onward

    • Privileged parameter - Set to false by default and not editable (canEdit: false), preventing containers from running with full host access unless explicitly enabled by an administrator (see the policy sketch after this list).

    • Grace period - Defines how long a workload can continue running after a preemption request before termination. The default grace period is 30 seconds, with a system-enforced maximum of 5 minutes across UI, API and CLI submissions. This value can be updated at any scope within the policy hierarchy.

  • Policy synchronization changes - Starting in version 2.23, control plane policies are no longer synchronized with the cluster. Policies are now stored and enforced only in the control plane, preventing conflicts with outdated cluster policies. See Workload policies for more details.
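
As a reference for the defaults described above, the following is a minimal sketch of how an administrator-created policy might relax the privileged restriction at a specific scope. The defaults/rules structure follows the general workload policy format, but field names and nesting are assumptions that should be verified against the policy reference.

    # Illustrative workload policy fragment (verify fields against the policy reference)
    defaults:
      security:
        privileged: false        # system default: containers are not privileged
    rules:
      security:
        privileged:
          canEdit: true          # relax the system default (canEdit: false) at this scope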

Nodes / Node Pools

  • Minimum guaranteed runtime for preemptible workloads - You can now configure a minimum guaranteed runtime for preemptible workloads in node pools via the UI and API. This setting specifies the minimum time a preemptible workload will run once scheduled and bound to a node before becoming eligible for preemption. This reduces unexpected interruptions and makes workload execution more predictable. See Node pools for more details. From cluster v2.23 onward

  • Improved status messaging for node pools with undrained nodes - When creating a node pool or labeling nodes to add to the node pool, nodes that are not fully drained (i.e., still have running workloads) trigger clearer status messages in the API and UI. These messages indicate that the node pool cannot include the affected nodes until they are drained and reach a "Ready" state. This helps administrators better understand node pool readiness and identify which nodes are still in transition. From cluster v2.23 onward

  • Cluster filter enhancements for Nodes page - The Nodes grid includes an “All” option in the clusters filter to make it easier to view and manage nodes across multiple clusters at once. When multiple clusters are selected, a Cluster column is displayed by default, showing each node’s associated cluster. Available in both the UI and API.

Authentication and Authorization

  • Scoped access rules - Users with permissions restricted to a specific scope are now limited to access rules within that scope. This capability is enabled via a tenant setting (enable_scoped_authorization) in the Settings API. Once enabled, the Access rules API returns only the rules within the viewer’s scope (or narrower), and the same scope filtering is applied when viewing access rules in the UI. This ensures access control is aligned with scope boundaries and prevents users from seeing or modifying rules outside their domain.
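
A hedged sketch of enabling this setting follows; the endpoint path and payload shape are assumptions and should be checked against the Settings API reference. Only the setting key comes from this release note.

    # Enable scoped access rules for the tenant (endpoint and payload are illustrative)
    curl -X PUT "https://<company>.run.ai/api/v1/settings" \
      -H "Authorization: Bearer <TOKEN>" \
      -H "Content-Type: application/json" \
      -d '{"key": "enable_scoped_authorization", "value": true}'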

General Settings

  • Several changes have been made to the General settings:

    • Removed the following - the Job submission, MPI distributed training, and Weights & Biases SWEEP toggles, and the Set Docker image registry setting

    • The following toggles are now enabled by default - Flexible workload submission, Flexible workload templates, Data volumes, and Policies

  • Custom logo branding - You can now upload a custom logo to appear in the top-right corner of the NVIDIA Run:ai platform interface. This allows organizations to personalize the platform UI with their own branding. Logos can be uploaded in SVG or PNG format (up to 128 KB) directly from the Branding settings.

Infrastructure Administrators

Installation

  • Cluster configuration via Helm values - Cluster configurations can now be managed directly through the Helm values interface (clusterConfig). At runtime, runaiconfig is the actual source of truth, representing what is actively running in the cluster. When a Helm upgrade is performed, the Helm values overwrite the existing runaiconfig, ensuring alignment with the chart. As a result, clusters configured through a Helm chart should always be managed through Helm. This keeps configurations consistent and predictable across deployments and upgrades. See Advanced cluster configurations for more details; a values sketch follows this list. From cluster v2.23 onward

  • Support for custom CA with S3 and Git integrations - Administrators can configure a custom Certificate Authority (CA) for secure TLS communication with S3 and Git repositories. This update extends existing custom CA support to also cover the Git-sync and S3 sidecar containers. It simplifies setup for air-gapped environments by eliminating the need for manually built images and ensures consistent secure communication across all components. See Cluster system requirements for more details. From cluster v2.23 onward
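
As a sketch of the Helm-managed flow described above, a values fragment might look like the following; the nested keys under clusterConfig mirror the runaiconfig spec, and the placeholder shown is an assumption (see Advanced cluster configurations for real keys).

    # values.yaml (illustrative)
    clusterConfig:
      global: {}        # cluster-wide settings go here; keys mirror the runaiconfig spec

It is then applied with a standard Helm upgrade; the release, chart, and namespace names depend on your installation.

    helm upgrade -i runai-cluster runai/runai-cluster -n runai -f values.yaml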

System Requirements

  • NVIDIA Run:ai supports Kubernetes version 1.34.

  • OpenShift version 4.15 is no longer supported.

  • Support for ARM on OpenShift - NVIDIA Run:ai now supports running on ARM-based nodes in OpenShift clusters, expanding deployment flexibility and allowing organizations to leverage ARM architectures alongside existing x86 infrastructure within their OpenShift environments.

General Enhancements

  • Keyboard shortcuts for dialogs and forms - Common keyboard actions are now supported across most UI screens and dialogs. Press Enter to confirm actions and Esc to cancel, making it quicker and easier to navigate workflows.

  • Deleted workloads visible by default - Deleted workloads are displayed by default in the UI under Workload manager. The toggle to enable this view has been removed, simplifying the experience and making it easier for users to track and review deleted workloads without extra configuration.

  • Direct tool connection - When a workload has only one configured tool, clicking Connect opens the connection directly, without showing a selection menu. If multiple tools are configured, the selection menu will still appear.

  • Metrics view updates - The metrics view has been reorganized with new naming and grouping:

    • Renamed Default metrics view to Resource utilization

    • Renamed Advanced metrics view to GPU profiling

    • Inference metrics are shown in a dedicated Inference dropdown, available for all inference workloads

Deprecation Notifications

Grafana Dashboards

The legacy Grafana dashboards - Overview and Analytics - are deprecated and will be removed in a future release. We recommend transitioning to the new dashboards available in the NVIDIA Run:ai UI, which are powered by NVIDIA Run:ai APIs. These dashboards provide improved visibility with drill-down capabilities and more flexibility for analyzing usage and performance.

Note

The Consumption dashboard was deprecated in version 2.22 and replaced with Reports.

CLI v1

CLI v1 was deprecated in v2.20 and has now been fully removed from the platform. All command-line interactions should be performed using CLI v2.

Note

CLI v1 is still available for clusters below v2.18.

Jobs

The Jobs workload type was deprecated in v2.20 and has now been fully removed from the platform. This means the Jobs option no longer appears in the General settings or in the Workloads page.
