What's New
This page provides transparency into the latest changes and improvements to NVIDIA Run:ai’s SaaS platform. The updates include new features, optimizations, and fixes aimed at improving performance and user experience.
Important
For a complete list of deprecations, see Deprecation notifications. Deprecated features, APIs, and capabilities remain available for six months from the time of the deprecation notice, after which they may be removed.
Gradual Rollout
SaaS features and bug fixes are gradually rolled out to customers to ensure a smooth transition and minimize any potential disruption. SaaS releases follow a scheduled rollout cadence, typically every two weeks, allowing us to introduce new functionalities in a controlled and predictable manner. All customers receive the changes within 10 days of the initial release.
In contrast, hotfixes are deployed as needed to address urgent issues and are released immediately to ensure the stability and security of the service.
DGX Cloud
Certain features are first made available in fully managed cloud-based deployments provisioned through DGX Cloud. These features are labeled as DGX Cloud only and will become available to all customers in future releases.
Feature Life Cycle
NVIDIA Run:ai uses life cycle labels to indicate the maturity and stability of features across releases:
Experimental - This feature is in early development. It may not be stable and could be removed or changed significantly in future versions. Use with caution.
Beta - This feature is still being developed for official release in a future version and may have some limitations. Use with caution.
Legacy - This feature is scheduled to be removed in future versions. We recommend using alternatives if available. Use only if necessary.
February 2026 Releases
February 23
Product Enhancements
New blocked rule for workload policies - A new blocked rule was added to workload policies, allowing administrators to prevent AI practitioners from specifying a value for a field. This can be used to lock security-related configurations from user modification without enforcing a specific default value (for example, supplemental groups). See Policy YAML reference for more details (an illustrative sketch follows these items).
UI inactivity timeout updates - The Session timeout setting was renamed to UI inactivity timeout. If left blank, users are logged out after 24 hours of inactivity by default. CLI and API access remain unaffected. See General settings for more details.
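The following is a minimal, hedged sketch of how the new rule might look in policy YAML. The blocked rule name comes from this release note, but the field path (security.supplementalGroups) and surrounding structure are assumptions; consult the Policy YAML reference for the exact schema.

```yaml
# Hedged sketch of a workload policy using the new blocked rule.
# The field path below is an illustrative assumption, not the documented schema.
rules:
  security:
    supplementalGroups:
      blocked: true   # practitioners cannot set this field; no default value is enforced
```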
Permission-based access to the Overview dashboard - The Overview dashboard is now available to roles that have at least one of the relevant READ permissions listed below. Dashboard widgets and data are displayed according to each user’s permissions, and widgets that are not applicable to the user’s permissions are automatically hidden. This enhancement also supports custom roles with different permission sets.
Clusters READ
Node pools READ
Nodes READ
Projects READ
Departments READ
Workloads READ
Resolved Bugs
RUN-34472
Fixed an issue where the "Allocation ratio by node pool" widget in the Overview dashboard aggregated unlimited quotas together with other quotas, resulting in incorrect data.
RUN-33566
Fixed an issue where, after the runai upgrade command completed successfully, the CLI incorrectly prompted the user to run the upgrade again.
RUN-36045
Fixed an issue where inference workload metrics were not being refreshed correctly.
RUN-36257
Fixed an issue in the flexible workload submission form where the image pull secret section presented shared credentials instead of shared secrets, resulting in a failure to submit the workload.
RUN-36381
Fixed a security vulnerability related to GHSA-jmp9-x22r-554x with severity HIGH.
RUN-36598
Fixed an issue where department data was not synced to the cluster, affecting both department creation and updates.
RUN-34624
Fixed an issue in Projects and Departments where GPU utilization/allocation metrics were not displayed if only partial data was available.
RUN-36382
Fixed a security vulnerability related to GHSA-cv78-6m8q-ph82 with severity HIGH.
RUN-36414
Fixed a security vulnerability related to CVE-2025-14459 and CVE-2025-64324 with severity HIGH.
RUN-36451
Fixed an issue where users with the appropriate permissions could not delete system templates in the UI.
RUN-36457
Fixed an issue where, on rare occasions, the "Allocation ratio by node pool" widget would show incorrect data.
RUN-36501
Fixed an issue where a node pool that included nodes without required topology labels became stuck in Updating after a topology was attached.
RUN-36506
Fixed an issue where the UI showed the wrong GPU quotas for node pools associated with the “Default” department.
RUN-36505
Fixed an issue where, on rare occasions, a race condition in some of the metrics caused the average GPU utilization to exceed 100%.
February 10
Product Enhancements
Redesigned Projects and Departments management - NVIDIA Run:ai introduces an improved organization management experience that provides better visibility into resource distribution and clearer explainability for how resources are prioritized and allocated across the organization. This update simplifies large-scale organizational management while maintaining full compatibility with NVIDIA Run:ai’s advanced scheduling capabilities. See Projects and Departments for more details.
From cluster v2.20 onward
Improved organizational visibility - A clearer, “big picture” view of projects and departments, making it easier to understand how GPU resources are distributed and prioritized.
Bulk management operations - Administrators can perform bulk actions across multiple organizational units directly from the UI and API, reducing operational overhead.
Clearer resource explainability - Improved transparency into resource contention and ordering, helping align scheduling behavior with business needs.
Increased initialization timeout for inference workloads - The maximum initialization timeout for inference workloads and templates in the UI has been increased to 720 minutes, allowing workloads with longer startup times, such as large models, to complete successfully without premature failure.
Delete predefined environment assets - Users can now delete the predefined environment assets for inference workloads, chatbot-ui, gpt2, and llm-server, giving greater control over environment configuration.
UI adjustments for distributed training - The distributed training workflow (workloads and templates) in the UI now includes a third step for mutual workload setup. The "Allow different setup for the master" toggle has been removed. By default, the master and workers use the same setup unless a policy defines different behavior. This applies to Flexible submission only. See Train models using a distributed training workload for more details.
New guided tour for projects and departments - A built-in tour guides administrators through the projects and departments experience, highlighting key areas and workflows to help them get started quickly.
Resolved Bugs
RUN-36122
Fixed an issue where credentials assets were not displayed in the Credentials table.
RUN-36010
Fixed an issue where navigating back to the root level in dashboard widgets caused the dashboard to crash.
RUN-35976
Fixed an issue where workloads submitted with names longer than 63 characters failed to schedule.
RUN-35922
Fixed a security vulnerability related to CVE-2026-0861 with severity HIGH.
RUN-35637
Fixed an issue where, when CPU quota and Limit projects from exceeding department quota were both enabled, updating department or project memory quotas to very large values failed with incorrect validation errors, even though the values were valid.
RUN-35620
Fixed an issue where providing an invalid admin password during installation caused the tenant to become permanently stuck.
RUN-35594
Fixed an issue where the workload describe command did not display the master specification for distributed workloads.
RUN-35511
Fixed an issue where an incorrect FQDN used during certificate generation caused errors.
RUN-35834
Fixed an issue where the AI practitioner role did not have read access to policies granted through workload submission permission sets (for example, workspaceEditAccess).
RUN-36254
Fixed an issue where a race condition during webhook certificate generation caused failures.
RUN-35443
Fixed a security vulnerability related to CVE-2025-68973 with severity HIGH.
RUN-35326
Fixed an issue where the Projects/Departments table in the Overview dashboard sometimes showed fewer than 15 projects/departments when their workloads did not have allocated GPUs or were not in Running or Pending status.
RUN-35169
Fixed an issue where distributed inference workloads could be submitted successfully with an invalid workers value.
RUN-34593
Fixed an issue in the Overview dashboard where the Node pool filter did not work for the Idle workloads table.
RUN-34017
Fixed an issue where runai template list returned incorrect output when using --page-size and --max-items together.
January 2026 Releases
January 26
Product Enhancements
Pod logs and terminal access - Accessing pod logs and interactive shells is now faster and more consistent across the Workloads experience. You can open logs or connect to running pods directly from multiple entry points, with pod selection and status kept in sync as you move between views. See Workloads for more details. From cluster v2.24 onward
One-click access to logs and terminals from the Pods view and Logs view, with the selected pod opened automatically.
New Terminal tab for interactive access to pods and containers, including automatic connection when launching from the Workload grid.
Synchronized pod selection and status across Pods, Logs, and Terminal views, while preserving existing responsive pod name behavior.
Resolved Bugs
RUN-35623
Fixed an issue where running runai logout returned 404 Not Found when the session token had already expired. The logout command now completes successfully and returns a clear message.
RUN-35583
Fixed an issue where the template describe command did not display the master specification for distributed templates when the master and worker configurations differed.
RUN-35566
Fixed an issue where image pull secrets marked with exclude=true were not excluded from the workload.
RUN-35460
Fixed an issue where during password change, the wrong current password logged the user out and redirected them to the login page.
RUN-35769
Fixed an issue on OpenShift clusters where missing permissions to manage finalizers caused all workloads to remain stuck in Creating state.
RUN-35421
Fixed a security vulnerability related to CVE-2025-15284 with severity HIGH.
RUN-35388
Fixed an issue where distributed training workloads were not blocked when master and worker roles used different node pools.
RUN-35148
Fixed an issue where charts in the Overview dashboard did not render data after the node pool filter was changed.
RUN-32181
Fixed a security vulnerability related to CVE-2025-32988 with severity HIGH.
RUN-34875
Fixed an issue where enabling authentication and authorization prevented user metrics from being collected for inference workloads running on Knative and NIM.
January 15
Product Enhancements
Custom roles using permission sets (API) - Administrators can now create custom roles by combining predefined permission sets using the Roles API. Permission sets are predefined, supported groupings of permissions that represent all required dependencies for a specific operation (for example, workload submission). Custom roles can then be assigned to users or groups through access rules in the UI or API, alongside the existing NVIDIA Run:ai predefined roles. This allows organizations to tailor access control to their operational needs while maintaining compatibility with the platform’s supported permission model. See Roles for more details.
From cluster v2.21 onward
NVIDIA NIM service API enhancements - NVIDIA Run:ai expands support for deploying and managing NIMs through the NVIDIA NIM Operator, providing a standardized, operator-based deployment flow aligned with NIM-native configurations. NIM services are fully managed through the NVIDIA Run:ai API, with UI and CLI support planned for a future release. This capability does not replace the current NIM deployment flow and is available as an additional option. See NVIDIA NIM API for more details; a hedged example follows the capability list below.
From cluster v2.23 onward
Autoscaling allowing NIM services to scale dynamically based on demand
Fractional GPU support, enabling NIM services to request and use partial GPUs for more efficient GPU utilization
Multi-node NIM deployments, enabling distributed NIM workloads across multiple nodes
Policy enforcement through a dedicated NVIDIA NIM Policy API for consistent governance of NIM services
Partial updates via a new PATCH endpoint, allowing targeted changes without resubmitting the full specification
NIM Cache support for model stores, enabling caching of specific LLM or multi-LLM model artifacts to improve startup time and reuse across deployments
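As a rough illustration of the partial-update capability above, a PATCH body might carry only the fields being changed, as in the following hypothetical example (rendered as YAML for readability). The field names are assumptions; see the NVIDIA NIM API reference for the actual schema.

```yaml
# Hypothetical PATCH body for a NIM service: only the changed fields are sent.
# Field names are illustrative assumptions, not the documented schema.
spec:
  autoscaling:
    minReplicas: 1
    maxReplicas: 4   # widen scaling bounds without resubmitting the full spec
```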
MNNVL acceleration for supported workload types - NVIDIA Run:ai now enables running supported workload types on Multi-Node NVLink (MNNVL) domains, including GB200 NVL72 systems. NVIDIA Run:ai applies the appropriate compute domain configuration to ensure workloads are placed and scaled within the same NVLink domain. AI practitioners can submit supported workload types using the Workloads V2 API and configure their MNNVL preference as part of the workload submission.
From cluster v2.24 onward
Separate priority and preemptibility controls - Workload priority and preemptibility are configured as two independent parameters across the UI, CLI, and API for native and supported workload types. If no preemptibility value is specified, the existing behavior based on priority is applied automatically. See Workload priority and preemption for more details.
From cluster v2.24 onward
Authenticated browsing for the NGC catalog - Browse the NGC catalog and private NGC registries as an authenticated user by selecting your NGC API key credentials during workload submission or template creation. This provides access to models and containers that require authentication while preserving the option to browse the public container registry. Private NGC registries require administrator configuration in the General settings.
Beta
From cluster v2.23 onward
NGC API key support for NVIDIA NIM workloads - NVIDIA Run:ai supports using an NGC API key when deploying NIM workloads to handle both image access and model runtime authentication. A single NGC API key is automatically applied for pulling NIM images from the NGC catalog and injected as a runtime environment variable required for downloading model weights. This streamlines NIM deployment by removing the need for separate pull secrets and runtime credentials while enabling full user self-service for authenticated NIM workloads. See Deploy inference workloads from NVIDIA NIM for more details.
From cluster v2.23 onward
Updated predefined roles - Predefined roles in NVIDIA Run:ai have been updated to better align with common organizational responsibilities and workflows. See Roles for more details:
Added new predefined roles - AI practitioner, Data and storage administrator, and Project administrator
Some existing predefined roles have been deprecated. See Deprecation notifications for more details.
Reduced access to clusters and node pools - New APIs and permissions are now available to support reduced access to clusters and node pools - Clusters minimal and Node pools minimal. These APIs allow roles to perform actions such as workload submission while exposing only the minimal required cluster and node pool information (for example, names and IDs), rather than full read access. This improves role design by aligning the visible data with what is actually required for the action being performed. Roles that rely on full read access remain unchanged. Some predefined roles are planned to transition to the new minimal access as described in the Deprecation notifications.
Visibility into workload topology constraints - Workloads now expose the topology constraints requested during scheduling, providing clear visibility into how network topology influences placement decisions. In the UI, NVIDIA Run:ai native workloads display the requested topology constraints in the workload Details view, while the Workloads API exposes these fields across native and supported workload types.
From cluster v2.24 onward
Control access scope for inference serving endpoints - Set whether an inference serving endpoint is accessible externally or restricted to internal cluster traffic when submitting workloads or creating templates. Endpoints can be configured as External (public access), if your administrator has configured Knative to support external access, or Internal only, limiting access to in-cluster traffic.
From cluster v2.24 onward
Asset-based workload submission in the CLI - The NVIDIA Run:ai CLI supports submitting native workloads using workload assets, such as compute resources, environments, and data sources. This allows AI practitioners to reuse the same predefined configurations available in the UI and API, reducing the need for long, flag-heavy CLI commands. Assets can be browsed and inspected directly from the CLI to support consistent and reliable workload submission. See CLI command reference.
From cluster v2.23 onward
Improved fractional GPU support for multi-container pods - Fractional GPUs are no longer limited to the first container in a pod. You can explicitly specify which container should receive fractional GPU resources using an annotation. If no container is specified, fractional GPUs continue to be associated with the first container by default. See GPU fractions and Dynamic GPU fractions for more details; an illustrative sketch appears at the end of this list.
From cluster v2.24 onward
Support for elastic distributed workloads on NVLink domains - Elastic distributed workloads, including auto-scaling and dynamically sized deployments, are fully supported on GB200 NVL72 and Multi-Node NVLink (MNNVL) domains using NVIDIA DRA driver version 25.8 and later. NVIDIA Run:ai automatically applies ComputeDomain configuration and topology-aware scheduling to ensure workloads scale within the same NVLink domain. See Using GB200 NVL72 and Multi-Node NVLink domains for more details.
From cluster v2.24 onward
Native Load Balancer support - NVIDIA Run:ai exposes LoadBalancer connectivity directly in the UI and CLI when submitting workloads or creating templates (assuming a load balancer is already installed in the cluster). Configure service ports explicitly and view clearer port configuration and connectivity status.
From cluster v2.24 onward
Time-based fairshare configuration per node pool - NVIDIA Run:ai supports time-based fairshare to improve long-term fairness in over-quota resource allocation. Instead of relying only on momentary demand, the Scheduler factors in historical GPU usage over time, ensuring that projects with lower recent consumption are given fair access to resources. Usage is tracked continuously, and each project’s GPU-hour consumption is evaluated against its configured weight to balance resource distribution more effectively across projects. Time-based fairshare can be enabled and configured per node pool using the Node pools form, with advanced customization available through the Node pools API.
From cluster v2.24 onward
Extended storage visibility in CLI describe commands - The describe command for native workloads supports --storage, showing storage resources such as PVCs, ConfigMaps, and Secrets.
Default pod anti-affinity - A new cluster configuration, global.requireDefaultPodAntiAffinity, applies a default pod anti-affinity rule to prevent pods from the same service from being scheduled on the same node when possible. This setting is enabled by default. See Advanced cluster configurations for more details.
From cluster v2.24 onward
Ingress controller recommendation update - Due to an announced deprecation by the upstream NGINX Ingress Controller project, NVIDIA Run:ai is updating its recommended ingress controller to HAProxy Ingress for supported environments. The Kubernetes Ingress standard remains fully supported. This change affects only the underlying ingress controller implementation and is intended to ensure long-term security, stability, and maintainability. For fresh installations, see Installation. To upgrade from earlier versions, see Migrate from NGINX to HAProxy Ingress.
From cluster v2.24 onward
Component version updates - NVIDIA Run:ai now supports Kubernetes version 1.35, OpenShift version 4.20, and GPU Operator version 25.10. Support for Kubernetes version 1.31 and OpenShift version 4.16 has been removed.
From cluster v2.24 onward
Rancher Kubernetes Engine (RKE1) - RKE1 is no longer supported due to reaching end of life (EOL). RKE2 is the recommended Rancher distribution. See the Rancher migration guide for more details.
From cluster v2.24 onward
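The sketch below illustrates the multi-container fractional GPU item above. The gpu-fraction annotation and runai-scheduler scheduler name follow NVIDIA Run:ai's documented conventions, but the container-selection annotation key shown is a hypothetical placeholder, since this note does not name it; see GPU fractions for the exact key.

```yaml
# Hedged sketch: a multi-container pod requesting half a GPU for one container.
# gpu-fraction-container is a hypothetical placeholder for the new annotation.
apiVersion: v1
kind: Pod
metadata:
  name: fractional-demo
  annotations:
    gpu-fraction: "0.5"                 # request half a GPU
    gpu-fraction-container: trainer     # hypothetical: target this container, not the first
spec:
  schedulerName: runai-scheduler        # schedule via the NVIDIA Run:ai Scheduler
  containers:
    - name: sidecar                     # first container; receives no GPU with the annotation above
      image: busybox
      command: ["sleep", "infinity"]
    - name: trainer                     # intended recipient of the GPU fraction
      image: nvcr.io/nvidia/pytorch:24.05-py3
```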
Resolved Bugs
RUN-35189
Fixed an issue where the --working-dir parameter was ignored for Knative-based inference workloads, causing containers to start in / instead of the specified directory.
RUN-34639
Fixed an issue where the Fully free GPU devices column displayed - instead of 0 when no fully free GPU devices were available under fractional GPU allocations.
RUN-34867
Fixed an issue where projects created or updated during node pool deletion could reference a non-existent node pool and remain NotReady.
RUN-34607
Fixed issues where readiness probes did not work correctly with serving port authorization in single-node Knative inference workloads.
RUN-35348
Fixed an issue where the /v1/k8s/setting endpoint returned a 500 error for tenants without clusters, causing the UI to hang instead of redirecting to cluster creation.
RUN-34381
Fixed an issue where the Node column displayed a sort icon but did not actually sort results in the Running / Requested Pods modal.
RUN-34379
Fixed an issue where image names longer than the display limit were truncated without providing access to the full name.
RUN-35206
Fixed an issue causing a delay before newly created clusters could be deleted, leaving the Remove option temporarily unavailable after creation.
RUN-34611
Fixed an issue where Overview widgets did not update correctly when navigating from departments to projects.
RUN-35290
Fixed an issue where copying a workload that included node affinity settings caused the re-submission to fail.
RUN-34721
Fixed a security vulnerability related to CVE-2024-25621 with severity HIGH.
RUN-32181
Fixed a security vulnerability related to CVE-2025-32988 with severity HIGH.
RUN-34720
Fixed a security vulnerability related to CVE-2025-65637 with severity HIGH.
RUN-34680
Fixed a security vulnerability related to CVE-2025-58183 with severity HIGH.
RUN-35089
Fixed a security vulnerability related to CVE-2025-64756 with severity HIGH.
RUN-34620
Fixed an issue where, in rare cases, sessions could disconnect due to token refresh handling.
January 05
Product Enhancements
YAML-based workload submission in the UI - Submit supported workload types defined in YAML directly from the UI. This brings YAML-based submission, previously available through the API, into interactive workflows, allowing you to submit existing Kubernetes or framework-specific manifests while still benefiting from NVIDIA Run:ai scheduling, resource management, and monitoring. See Submit supported workload types via YAML for more details.
From cluster v2.23 onward
Automatic network topology acceleration for supported workloads - Network topology–aware scheduling is applied automatically to supported distributed workloads submitted via YAML. Once a topology is attached to a node pool, NVIDIA Run:ai automatically applies Preferred topology constraints at the lowest available level for the entire workload, optimizing pod placement without additional user configuration. This expands the topology acceleration beyond NVIDIA Run:ai native distributed workloads to additional workload types. See Accelerating workloads with network topology-aware scheduling for more details.
From cluster v2.23 onward
AI application–based workload grouping in the UI - NVIDIA Run:ai provides a dedicated AI applications view. This view automatically groups Kubernetes resources deployed via Helm charts into a single logical application, allowing you to list, sort, and filter AI applications. You can also inspect aggregated resource requests and allocations (GPU, CPU, memory) and view the underlying workloads through the Details pane, making it easier to understand and manage complex, multi-component solutions. See AI applications for more details.
From cluster v2.23 onward
Dynamo as a supported workload type - NVIDIA Run:ai supports Dynamo-based inference workloads through the DynamoGraphDeployment workload type. This allows Dynamo workloads to be deployed, scheduled, and monitored using the same platform capabilities and operational model as native workloads. See Supported workload types for more details.
From cluster v2.23 onward
Key capabilities include:
YAML-based deployment and management - Dynamo workloads can be submitted using YAML from the UI, API, or CLI, without requiring direct cluster access.
Hierarchical gang scheduling - NVIDIA Run:ai supports hierarchical (multi-level) gang scheduling for Dynamo workloads. Replica groups are scheduled together as sub-gangs, and the entire workload is then scheduled as a single unit. This ensures coordinated placement and execution across all components of the Dynamo workload.
Topology-aware scheduling - NVIDIA Run:ai applies topology-aware scheduling at the workload level to ensure Dynamo workload components are placed according to the underlying cluster topology, improving communication efficiency and execution consistency.
Automatic discovery of Dynamo frontend endpoints - NVIDIA Run:ai automatically detects Dynamo frontend endpoints and exposes them for access and monitoring.
Unified workload lifecycle and status visibility - Dynamo workloads are managed, monitored, and tracked with a unified lifecycle and status view.
Distributed inference support in the CLI - Native distributed inference workloads can be submitted and managed directly from the NVIDIA Run:ai CLI. AI practitioners can use familiar NVIDIA Run:ai commands to work with distributed inference workloads, such as list, describe, logs, exec, port-forward, update, and delete. See CLI command reference for more details.
From cluster v2.23 onward
Template-based workload submission in the CLI - The NVIDIA Run:ai CLI now supports submitting native workloads using existing templates. This allows AI practitioners to reuse the same predefined configurations available in the UI and API, reducing the need for long, flag-heavy CLI commands. Templates can be browsed and inspected directly from the CLI to support consistent and reliable workload submission. See CLI command reference for more details.
From cluster v2.23 onward
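To ground the YAML-based submission items in this release, the following is a minimal, hedged example of a standard Kubernetes manifest of the kind that can be submitted through the UI, API, or CLI. The names, image, and project label are placeholders, and the exact mechanism for associating a workload with a project may differ in your environment.

```yaml
# Illustrative manifest for YAML-based workload submission.
# All names, the image, and the project label are placeholder assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: yaml-submit-demo
  labels:
    project: team-a                    # hypothetical project association
spec:
  template:
    spec:
      schedulerName: runai-scheduler   # schedule via the NVIDIA Run:ai Scheduler
      containers:
        - name: main
          image: nvcr.io/nvidia/pytorch:24.05-py3
          command: ["python", "-c", "print('hello from a YAML-submitted workload')"]
          resources:
            limits:
              nvidia.com/gpu: 1        # request one GPU
      restartPolicy: Never
```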
Resolved Bugs
RUN-34613
Fixed an issue where the Project GET API returned missing limit fields instead of an explicit unlimited value when CPU quotas were enabled.
RUN-30979
Fixed an issue where the PVC API did not validate claimName uniqueness.
RUN-34203
Fixed an issue where workloads using multiple GPU fractions were missing GPU utilization and memory metrics.
RUN-34631
Fixed an issue where the identity manager failed to start when the notification service was disabled.
RUN-34684
Fixed a security vulnerability related to CVE-2025-58183 with severity HIGH.
RUN-34694
Fixed a security vulnerability related to CVE-2025-58186 with severity HIGH.
RUN-34703
Fixed a security vulnerability related to CVE-2025-58187 with severity HIGH.
RUN-34712
Fixed a security vulnerability related to CVE-2025-61729 with severity HIGH.
RUN-34758
Fixed an issue where setting a GPU memory limit caused workload creation to fail.
December 2025 Releases
December 14
Product Enhancements
LeaderWorkerSet (LWS) as a new workload type - LeaderWorkerSet is now available as a supported workload type. LWS workloads can be deployed and managed via YAML submission from the UI, API, or CLI, providing a standardized way to run leader–worker and multi-process workloads across the platform without direct cluster access. See Supported workload types for more details; an illustrative manifest appears at the end of this list.
From cluster v2.23 onward
Submit workloads from YAML via the CLI - The CLI now supports submitting workloads directly from a YAML definition using a new runai workload submit -f command. This enables declarative workload creation while still allowing key fields to be overridden at submission time. Workloads created from YAML can also be deleted through the CLI, providing a simple way to manage YAML-defined workloads. See CLI commands reference for more details.
Workloads v2 API update - The Workloads v2 API now includes a new PUT endpoint for updating workloads. This endpoint requires submitting a complete workload manifest, which fully replaces the existing workload configuration.
From cluster v2.23 onward
Update NVIDIA NIM services API - The NVIDIA NIM API now includes a new PATCH endpoint for modifying NIM service workloads. This endpoint supports partial updates, allowing you to update only the fields you need without submitting the full workload definition.
From cluster v2.23 onward
New NVIDIA NIM performance histograms added to Metrics - The Metrics pane now includes two new histograms for NVIDIA NIM metrics. See NVIDIA NIM metrics for more details.
From cluster v2.23 onwardEnd-to-end request latency - Displays request distribution across latency buckets, helping you identify performance patterns and outliers over time.
Time to first token (TTFT) - Shows the distribution of TTFT across requests, enabling faster detection of model responsiveness issues.
Updated runai login CLI command - The runai login CLI command has been updated to streamline authentication options. These changes align the CLI login modes with the current authentication model and improve clarity in how each method is used. See CLI commands reference for more details.
The application login mode is deprecated and replaced with access-key, which is now the supported method for logging in with a service account or user access key.
The previous user login mode is deprecated and renamed to password to more accurately reflect username-and-password authentication.
Email invitations for local users - Administrators can now choose to automatically send an email invitation when creating a local user.
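As a hedged illustration of the LWS and YAML-submission items above, the following is a minimal LeaderWorkerSet manifest following the upstream leaderworkerset.x-k8s.io/v1 API. The names and images are placeholders, and any NVIDIA Run:ai-specific fields (such as project association) are omitted.

```yaml
# Minimal LeaderWorkerSet sketch (upstream LWS v1 API); names and images
# are placeholder assumptions.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: lws-demo
spec:
  replicas: 2                   # two leader-worker groups
  leaderWorkerTemplate:
    size: 4                     # pods per group, including the leader
    leaderTemplate:
      spec:
        containers:
          - name: leader
            image: nvcr.io/nvidia/pytorch:24.05-py3
    workerTemplate:
      spec:
        containers:
          - name: worker
            image: nvcr.io/nvidia/pytorch:24.05-py3
```

Assuming the manifest is saved as lws-demo.yaml, it could then be submitted with runai workload submit -f lws-demo.yaml and later deleted through the CLI.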
Resolved Bugs
RUN-30979
Fixed an issue where the PVC API did not validate claimName uniqueness.
RUN-33516
Fixed an issue where access rules created or deleted in a batch action were not individually audited; each access rule in a batch action is now recorded in the events history.
RUN-33806
Fixed an issue where containers ran as root instead of a non-privileged user.
RUN-33971
Fixed a permissions issue that allowed users with write-settings permissions to edit a centralized channel.
RUN-34048
Fixed an issue where inference workload URLs were generated as http instead of https.
RUN-34196
Fixed an issue where users with L1 Researcher and L2 Researcher roles could not list node pools using the NVIDIA Run:ai CLI.
RUN-34420
Fixed an issue where the NGC API key asset getById API response was missing the status field.
RUN-34429
Fixed an issue where users with the correct project permissions could create templates but were blocked from saving edits due to incorrect permission checks.
December 03
Product Enhancements
Service accounts replacing applications in the UI - The Applications feature has been renamed to Service accounts throughout the UI. All existing functionality remains the same, and existing application records continue to appear unchanged. See Service accounts for more details.
Connections column enabled by default in the Workloads grid - The Connections column is now selected by default in the Workloads table. When a workload has a single connection its URL is displayed directly, with long URLs automatically shortened using an ellipsis. The URL is clickable and opens the Connections dialog. When multiple connections exist, the table displays the total count.
Enhanced workload details view - The Workload Details tab now provides an enriched and clearer view of workload configuration data. The updated design improves readability and makes it easier to understand how a workload was submitted and configured. Key enhancements include:
Improved layout and data presentation - Configuration fields are now grouped and displayed more intuitively, helping users quickly find the information they need.
Specification selector - When a workload contains multiple specs, a new dropdown allows you to easily switch between them.
Overview dashboard enhancements - We’ve made several improvements to the Overview dashboard to strengthen visibility and better support key monitoring workflows. These enhancements also support deprecating the legacy Grafana dashboards. See Deprecation notifications for more details:
Enhancements to the Projects/Departments tables - Timeframe controls for GPU allocation, utilization, and memory utilization are now located within each column. The tables now display up to 15 entries, include GPU quota, separate pending and running workloads, and provide direct links to each project or department.
New pending-time widget - Introduced a new widget that displays the count of pending workloads by pending time, helping administrators understand how long their workloads have been waiting and identify the projects/departments experiencing extended pending times.
New guided tour for the Overview dashboard - A built-in tour now walks administrators and AI practitioners through the key areas of the Overview dashboard, helping them navigate the interface and become familiar with the functionalities that enable them to get the most out of the dashboard.
Additional dashboard improvements:
Added a top-stats Failed workloads widget.
Added numeric counters to bar graphs, displaying values directly on each bar rather than only in tooltips.
Updated the Workloads by category/type widget to count running workloads only.
Added an Idle time column to the idle workloads table.
Updated widget ordering to separate current time widgets from over time widgets, improving the analysis flow from identifying an issue to investigating it over time.
Consumption report enhancements for GPU hour breakdown - The Consumption report now includes two new columns, GPU deserved quota hours and GPU over-quota hours. These metrics fully support all existing grouping options, including cluster, node pool, department, and project. This change also supports deprecating the legacy Consumption dashboard. See Deprecation notifications for more details.
Network topology visibility in clusters and node pools - The Network topologies modal in the Clusters page displays a new column showing which node pools each topology is associated with. This information is also available in the Network topologies API. In addition, the node pools list command in the CLI now includes a network topology column, showing the name of the topology assigned to each node pool.
From cluster v2.23 onward
Policy-aware behavior for templates and assets - Templates and assets that do not fully comply with policy are no longer blocked outright when submitting a workload. Instead, NVIDIA Run:ai now evaluates non-compliance on a case-by-case basis, as illustrated in the sketch after this list:
Fixable non-compliance - If compliance can be achieved by adjusting settings during submission, the template or asset can be loaded. The UI highlights what needs to be updated to meet policy requirements.
Non-fixable non-compliance - If the non-compliant configuration cannot be changed, the template or asset cannot be used, and the relevant policy is displayed to explain the restriction.
Quick workload submit behavior - Templates with any non-compliance can now be loaded, but quick submit is automatically blocked. The full workload submission flow opens by default, where the UI highlights what needs to be updated to meet policy requirements.
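For intuition, the hedged policy fragment below shows the kind of rule that produces fixable non-compliance: a template requesting more GPUs than allowed can still be loaded and corrected at submission time. The field path is an assumption; the rule names (min, max, canEdit) follow the policy conventions referenced elsewhere in these notes.

```yaml
# Illustrative policy fragment (assumed field path). A template requesting
# 8 GPU devices violates max below, but the violation is fixable: the user
# can lower the request during submission, so the template still loads.
rules:
  compute:
    gpuDevicesRequest:
      min: 0
      max: 4          # fixable: adjust the request at submission time
      canEdit: true   # if the violating field were locked, the non-compliance would not be fixable
```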
Authenticated browsing for the NGC catalog - You can now browse the NGC catalog as an authenticated user by selecting your own NGC API key credentials during workload submission or template creation. This enables access to models and containers that require authentication while preserving the existing option to browse the public container registry.
Beta
From cluster v2.23 onward
New PVC events - NVIDIA Run:ai now emits new PVC asset lifecycle events - Creating, Deleting, and Syncing. These events appear in the PVC’s Event history, extending the visibility introduced in previous releases and giving administrators clearer insight into PVC asset changes and activity over time.
Resolved Bugs
RUN-34252
Fixed an issue that caused charts to remain in a loading state on every data refresh instead of only during the initial load.
RUN-33902
Fixed an issue where the workloads service could enter a CrashLoopBackOff during upgrade.
RUN-31856
Fixed a security vulnerability related to CVE-2025-47907 with severity HIGH.
RUN-33841
Fixed an issue that caused session disconnections.
RUN-33802
Fixed an issue that caused distributed inference workloads to become unsynchronized.
RUN-33642
Fixed an issue where the external-workload-integrator on OpenShift entered a constant reconcile loop, causing high CPU utilization.
RUN-33613
Fixed missing validations for CPU resources when the CPU quota feature flag was disabled, which caused project and department updates to skip required CPU checks.
RUN-33526
Fixed an issue that could cause the operator to crash during installation due to a race condition in ingress initialization.
RUN-33519
Fixed an issue where the UI incorrectly prevented creating templates with the same name across different scopes.
RUN-32889
Fixed an issue where idle GPU timeout rules were incorrectly applied to preemptible workspaces.
November 2025 Releases
November 18
Product Enhancements
Service accounts replacing applications - The applications feature has been renamed to service accounts in the API. Service accounts provide the same functionality for programmatic authentication and management but with updated terminology. The deprecation of applications begins with version 2.24 and will continue for two additional releases before removal. Existing application records and endpoints will remain functional during this period to ensure backward compatibility. See the Applications API (/api/v1/apps) for more details.
Distributed inference templates (API) - Distributed inference templates allow you to save workload configurations that can be reused across distributed inference submissions. These templates simplify the submission process and promote standardization across distributed inference workloads.
From cluster v2.22 onward
Policy API for NIM services (API) - A new Policy API is now available for NVIDIA NIM services, enabling administrators to define and enforce policies that control the behavior of NIM service workloads. These policies help ensure consistent configurations across deployments, improve governance, and simplify management of NIM service workloads.
From cluster v2.23 onward
Autoscaling support in NVIDIA NIM service API - The NVIDIA NIM service API now supports autoscaling for inference workloads deployed through the NIM Operator. When enabled, NIM services automatically adjust the number of active replicas based on defined metrics, allowing deployments to scale up or down dynamically as traffic changes.
From cluster v2.23 onward
Multi-node NIM support in the NVIDIA NIM service API - The NVIDIA NIM service API now supports deploying multi-node NVIDIA NIM workloads.
From cluster v2.23 onward
Hugging Face model catalog browsing - You can now browse and search the Hugging Face model catalog directly from the NVIDIA Run:ai UI and API when creating inference workloads. The live catalog view displays model details such as download count and gated status. For gated models, the platform prompts you to provide a Hugging Face token for access, while open models can be selected without authentication. See Deploy inference workloads from Hugging Face for more details.
Resolved Bugs
RUN-33471
Fixed an issue where cluster authentication didn’t use the tenant URL.
RUN-33638
Fixed an issue where the DCGM metric chart was displayed even when the cluster did not support DCGM metrics.
RUN-33634
Fixed an issue where resource name validation failed for hugepage resources by enhancing validation rules to properly support hugepages.
RUN-33448
Fixed an issue where switching between workloads in the workload Details drawer displayed incorrect data, particularly the workload lifespan value.
RUN-33418
Fixed an issue where the master spec was not inherited when creating a distributed workload from a template.
RUN-33364
Fixed an issue where policies allowed canEdit: false under attributes without specifying a default value, which incorrectly passed validation.
RUN-33313
Fixed an issue where the log viewer for distributed workloads displayed only a partial and unsorted list of pods.
RUN-33300
Fixed an issue where the metric gpu_memory_utilization_avg returned a NaN value.
RUN-33144
Fixed a security vulnerability related to CVE-2025-62156 with severity HIGH.
RUN-33127
Fixed an issue where workload submission in the CLI failed when commands contained special characters.
RUN-33099
Fixed an issue where a mismatch between Helm schema validation and pre-hooks runtime validation code caused clusterConfig.binder.resources errors during upgrades.
RUN-33091
Fixed an issue where workload logs initially loaded older logs instead of the most recent ones.
RUN-33054
Fixed an issue where creating or updating a policy failed with an ‘asset Id not found’ error when specifying an imposedAsset.
RUN-33044
Fixed an issue where the workload controller could delete all running workloads when init-ca generated a new certificate (every 30 days).
RUN-32702
Fixed an issue where users running Red Hat OpenShift Serverless experienced “Down” status alerts in OpenShift monitoring due to NVIDIA Run:ai Knative ServiceMonitors.
RUN-32680
Fixed an issue where logs were not displayed in the UI for workloads submitted using the Workloads v2 submission API.
RUN-32673
Fixed an issue where inference workload metrics did not allow selecting a specific pod for viewing metrics.
RUN-32642
Fixed an issue where the UI displayed an incorrect access rule status for users with Cloud Operator roles.
RUN-32572
Fixed an issue where the RunaiAgentPullRateLow and RunaiAgentClusterInfoPushRateLow Prometheus alerts were firing incorrectly without cause.
RUN-32449
Fixed an issue where a race condition between the NVIDIA Run:ai operator and upgrade/install post hooks caused the upgrade to fail.
RUN-31738
Fixed an issue where GPU fraction requests were not applied when submitting distributed workloads.
RUN-32989
Fixed an issue where the NVIDIA Run:ai operator experienced unusually high CPU utilization after upgrade.
RUN-32986
Fixed an issue where PVCs appeared with the status “Issues found” after upgrading to version 2.22.
November 02
Product Enhancements
Updated credential creation in the UI - The Credentials page has been redesigned for improved usability. The Access key and Username & password credential types have been consolidated under Generic secret, where each secret format now opens a dedicated form with context-specific input fields. In addition, a dedicated SSH key format has been added under Generic secret for easier configuration of SSH-based authentication. This change simplifies the UI and provides a more streamlined experience for managing credentials. See Credentials for more details.
Min/max worker configuration for PyTorch distributed training - You can now define the minimum and maximum number of workers directly from the UI when submitting PyTorch distributed training workloads. This provides greater flexibility and control over resource allocation. See Train models using a distributed training workload for more details.
Audit logging for password resets - Audit logs now capture all password reset events, including administrator-initiated resets, user-initiated resets, and password-recovery (“forgot password”) actions. This enhancement improves traceability and security visibility across user management workflows.
Access keys replacing user applications - The User applications feature has been renamed to Access keys across the UI and API (/api/v1/user-applications). Access keys provide the same functionality for programmatic authentication and management but with updated terminology. The deprecation of User applications begins with version 2.24 and will continue for two additional releases before removal. Existing User application records and endpoints will remain functional during this period to ensure backward compatibility. See Access keys for more details.
Resolved Bugs
RUN-33365
Fixed an issue where selecting an environment asset template in the flexible workload form did not present the capabilities field correctly.
RUN-33447
Fixed an issue where the API allowed creating a PVC asset without a claimName when existingPVC=false.
RUN-32968
Fixed an issue where users without permission to create data source assets were blocked from adding one-time data sources during workload submission.
RUN-33314
Fixed an issue where the NGC API key validation did not allow special characters (-, _, .). Validation now supports these characters as expected.
RUN-33177
Fixed an issue where removing the logo in Branding settings displayed an empty square.
RUN-33176
Fixed an issue where pagination in the Node Pool page did not respond.
RUN-33038
Fixed an issue where department administrators could not include cluster-scope templates in workloads due to incorrect validation of permitted scopes.
RUN-33036
Fixed an issue where the grace period preemption field in the UI was limited to 5 minutes, even when the workload policy allowed longer durations.
RUN-33006
Fixed an issue in the CLI installer where the PATH was not configured for all shells. The installer now correctly configures PATH for both zsh and bash.
RUN-32995
Fixed an issue where policies were not applied when submitting a workload using a template.
RUN-32752
Fixed an issue where the filterBy department option in the consumption report did not work as expected.
RUN-29375
Fixed an issue where stale departments were not properly removed after deleting a cluster.
RUN-33053
Fixed an issue that caused conflicts with additional built-in Prometheus Operator deployments in OpenShift.
RUN-32876
Fixed an issue where running a NIM inference workload on a fractional GPU prevented the Triton server from starting, causing inference endpoint requests to fail.
RUN-32730
Fixed an issue where incorrect average GPU utilization per project and workload type was displayed in the Projects view charts and tables.
RUN-32159
Fixed an issue where the updatedBy field of a policy did not show the latest user who updated it.
RUN-31803
Fixed an issue where the Quota management dashboard occasionally displayed incorrect GPU quota values.
October 2025 Releases
October 19
Resolved Bugs
RUN-33039
Fixed an issue where setting uid or gid to 0 during environment creation was not allowed.
RUN-33147
Fixed an issue where users with expired refresh tokens (after 24 hours) could not log in, as the token endpoint returned a 400 error.
RUN-33168
Fixed an issue where certain policy calls failed when at least one unconfigured cluster existed in the system.
October 08
Product Enhancements
Cluster diagnostics collection command - Added a new CLI command, runai diagnostics collect-logs, which gathers diagnostic logs from the Kubernetes cluster for troubleshooting or sharing with NVIDIA Run:ai support. You can collect logs from all or specific namespaces, specify an output directory, and choose whether to include previous pod logs, simplifying cluster debugging and support workflows. See runai diagnostics command for more details.
Resolved Bugs
RUN-32571
Fixed an issue where credentials that were not yet synced to the cluster appeared in the credential selection dropdown in Hugging Face and NIM inference workloads.
RUN-32652
Fixed an issue where YAML-submitted workloads were not supported in batch deletion.
RUN-32605
Fixed a security vulnerability related to CVE-2025-58754 with severity HIGH.
RUN-32314
Fixed an issue where deleting a project did not remove access rules scoped to that project.
September 2025 Releases
September 28
Product Enhancements
Guided onboarding for first-time admins - A new onboarding flow helps system and platform administrators quickly get started by walking through cluster installation, setting up SSO, and onboarding the first research team, reducing setup complexity and accelerating time to adoption.
Guided onboarding experience for new researchers - On their first login, all new researchers are directed to the Workloads page and guided through creating their first Jupyter Notebook workspace with a short tour. A template is available for immediate launch, helping users get started quickly. The guided tour remains available anytime from the Help menu.
Workload extensibility with Resource Interface - The Resource Interface (RI) enables organizations to extend NVIDIA Run:ai with new workload types from any ML framework, tool, or Kubernetes resource using a no-code configuration through the Workload Types API. This allows organizations to incorporate emerging AI/ML tools or custom resources without platform updates or code changes. These workloads become immediately available across the organization, empowering teams to innovate and collaborate while benefiting from advanced scheduling and monitoring. See Extending workload support with Resource Interface for more details.
Experimental
From cluster v2.23 onward
No-code onboarding - Register new workload types instantly via the Workload Types API.
Seamless researcher experience - Submit and run workloads using a standard YAML manifest via the Workloads v2 API.
Unified management - Newly added workloads are available to all teams and benefit from the same orchestration and monitoring as native types.
Resource Interface-powered integration - Defines how each workload is interpreted and optimized, enabling consistent support for scaling, dependencies, and advanced scheduling.
Newly added workload types - NIM Services, KServe, and JobSet.
New workload template capabilities - The new templates simplify the workload submission experience by allowing you to launch a workload in a single click, without modifying any settings. In addition, several supporting capabilities have been introduced. See Workload templates for more details.
From cluster v2.23 onward
Preset templates - A set of ready-to-use workload templates for NeMo, BioNeMo, and PyTorch is now available, enabling you to launch workloads quickly.
Linked assets - Templates can now be linked to assets such as environments and compute resources. Any changes to these assets are automatically reflected in the template, ensuring consistency across workloads.
Migrating legacy templates - Existing legacy templates can now be migrated into the new workload templates format, allowing teams to retain their saved configurations while taking advantage of new features. This capability is available when the Flexible workload templates setting is toggled on. You will not lose your existing templates - all legacy templates remain available.
NGC public registry support for environment images - Environment images and tags can now be selected directly from the NGC public registry when creating workloads, environment assets and templates. This provides a streamlined way to access trusted NVIDIA containers without manually entering image URLs.
Beta
From cluster v2.23 onward
Enhanced logging with per-container support - Workload logs can be viewed at the container level within each pod through the UI, API and CLI, giving researchers and administrators finer control when monitoring and debugging workloads. In addition, downloaded logs are saved with unique file names that include the workload, pod, container, and timestamp, making it easier to organize and analyze logs from distributed workloads. See Workloads for more details.
Networking metrics - A new metric, NVLink bandwidth total, has been added to Nodes and Workloads views in the UI and is also available through the Nodes and Pods APIs. This improves visibility into network utilization, giving teams deeper insight into consumption patterns and resource allocations.
From cluster v2.23 onward
Enhanced Git credential management - Git data sources can now be configured with Generic secret credentials through the UI or API, with support for SSH private keys. This provides a consistent and secure way to authenticate to repositories, simplifying setup for administrators and enabling users to connect to Git-based workflows more easily. See Credentials for more details.
From cluster v2.23 onwardCustomize your CLI list views - The new
--columnsflag allows you to tailor the output ofrunai listcommands to display only the fields you need, giving you complete control over table views. See CLI commands reference for more details.Select and order columns - Define exactly which columns to display and in what order.
Discover more data - Show useful fields that are not part of the default output.
Autocompletion support - Use tab completion to discover and select all available columns for any list command.
Distributed inference API enhancements - The inference API has been extended with support for multi-node deployments, adding autoscaling and rolling updates. These enhancements improve the robustness, scalability, and manageability of distributed inference workloads. See Distributed inferences API for more details.
From cluster v2.22 onward
Distributed inference support for GB200 and MNNVL - Distributed inference workloads can now take advantage of NVIDIA GB200 NVL72 and other Multi-Node NVLink systems. This enables automatic infrastructure detection, domain labeling, and optimized cross-node communication for high-bandwidth, performance-optimized inference execution. See Using GB200 NVL72 and Multi-Node NVLink domains for more details.
From cluster v2.23 onward
NVIDIA NIM service deployment API - A new API is available for deploying NVIDIA NIM services, allowing programmatic creation and management of NIM service workloads for easier automation and integration. See NVIDIA NIM API for more details.
From cluster v2.23 onward
Support for Dynamo inference workloads - Multi-node inference workloads deployed with the NVIDIA Dynamo framework can now be scheduled efficiently using gang scheduling and topology-aware scheduling. This ensures fast startup, low latency, and better resource utilization for disaggregated inference pipelines.
Experimental
From cluster v2.23 onward
Network topology-aware scheduling for distributed workloads - NVIDIA Run:ai now supports topology-aware scheduling to optimize placement of distributed workloads across data center nodes. By leveraging Kubernetes node labels, the Scheduler can co-locate pods on nodes that are “closer” to each other in the network. This reduces communication overhead, improves workload efficiency, and helps maximize GPU utilization. Once administrators configure the network topology and associate it with node pools, scheduling is applied automatically for distributed workloads submitted through the platform. See Accelerating workloads with network topology-aware scheduling for more details.
From cluster v2.23 onward
Scoped access rules - Users with permissions restricted to a specific scope are now limited to access rules within that scope. This capability is enabled via a tenant setting (enable_scoped_authorization) in the Settings API. Once enabled, the Access rules API returns only the rules within the viewer’s scope (or narrower), and the same scope filtering is applied when viewing access rules in the UI. This ensures access control is aligned with scope boundaries and prevents users from seeing or modifying rules outside their domain.
Cluster configuration via Helm values - Cluster configurations can now be managed directly through the Helm values interface (clusterConfig). At runtime, runaiconfig is the actual source of truth, representing what is actively running in the cluster. When a Helm upgrade is performed, the Helm values overwrite the existing runaiconfig, ensuring alignment with the chart. As a result, clusters configured through a Helm chart should always be managed through Helm. This keeps configurations consistent and predictable across deployments and upgrades. See Advanced cluster configurations for more details; a sketch appears at the end of this list.
From cluster v2.23 onward
Component version updates - NVIDIA Run:ai now supports Kubernetes version 1.34. Support for OpenShift version 4.15 has been removed.
Support for ARM on OpenShift (from cluster v2.23 onward) - NVIDIA Run:ai now supports running on ARM-based nodes in OpenShift clusters, expanding deployment flexibility and allowing organizations to leverage ARM architectures alongside existing x86 infrastructure within their OpenShift environments.
Deleted workloads visible by default (from cluster v2.23 onward) - Deleted workloads are displayed by default in the UI under Workload manager. The toggle to enable this view has been removed, simplifying the experience and making it easier for users to track and review deleted workloads without extra configuration.
Direct tool connection - When a workload has only one configured tool, clicking Connect opens the connection directly, without showing a selection menu. If multiple tools are configured, the selection menu will still appear.
Custom logo branding - You can now upload a custom logo to appear in the top-right corner of the NVIDIA Run:ai platform interface. This allows organizations to personalize the platform UI with their own branding. Logos can be uploaded in SVG or PNG format (up to 128 KB) directly from the Branding settings.
Resolved Bugs
RUN-32601
Fixed an issue where external token exchange failed because the API was incompatible with access_tokens.
RUN-31422
Fixed an issue where updating project resources created through the deprecated Projects API did not work correctly.
RUN-32551
Fixed an issue where inference workloads failed when using user credentials as an image pull secret.
RUN-32548
Fixed an issue where, in certain edge cases, removing an inference workload without deleting its revision caused the cluster to panic during revision sync.
RUN-32346
Fixed an issue where mappers could not be updated in identity providers (IdPs).
RUN-31993
Fixed a security vulnerability related to CVE-2025-22868 with severity HIGH.
RUN-31961
Fixed a security vulnerability related to CVE-2025-7425 with severity HIGH.
RUN-31051
Fixed a security vulnerability related to CVE-2025-49794 with severity HIGH.
RUN-31008
Fixed a security vulnerability related to CVE-2025-53547 with severity HIGH.
RUN-32123
Fixed an issue where email notifications configured through User settings were still sent after selecting and then immediately de-selecting all notification types.
RUN-32659
Fixed an issue where the search and filter logic for NIM models retrieved from the NGC catalog produced inconsistent results, causing some models to appear in unexpected positions in the list.
RUN-32699
Fixed an issue in the distributed inference policy API where some error messages displayed field names twice.
RUN-32789
Fixed an issue in CLI v2 where the --master-extended-resource flag had no effect in MPI training workloads.
RUN-30628
Fixed a security vulnerability related to CVE-2025-22874 with severity HIGH.
September 16
Product Enhancements
AI Application-based workload grouping - NVIDIA Run:ai now automatically groups related workloads into a single logical application for any workloads deployed via Helm charts. This provides a unified view of complex solutions. Using the API, you can track aggregated resource requests and allocations (GPU, CPU, memory) and monitor the overall application status. In the UI, you can filter the Workloads page by application name to easily see all components of a solution together. See AI Applications API for more details.
Flexible inference workload templates - Flexible workload templates allow you to save workload configurations that can be reused across workload submissions. You can create templates from scratch or base them on existing assets - environments, compute resources, or data sources. These templates simplify the submission process and promote standardization across users and teams. See Inference templates for more details.
Application access for inference serving endpoints - All inference workloads (custom, Hugging Face, and NVIDIA NIM) now support authorizing applications, in addition to users and groups, when connecting to inference serving endpoints. This enables secure, programmatic access to inference endpoints when they are accessed externally from the cluster. To use this capability, configure the serving endpoint, authenticate using a token granted to an application, and use the token in API requests to the endpoint.
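A hedged sketch of the request shape, assuming $TOKEN already holds a token issued to an authorized application (a client-credentials example appears under Deprecation Notifications below); the endpoint URL and payload are placeholders:

    # Placeholder host, path, and body; adapt to your serving endpoint's API
    curl "https://<endpoint-host>/<serving-path>/v1/completions" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"model": "<model-name>", "prompt": "Hello", "max_tokens": 16}'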
Credential creation during NIM and Hugging Face submissions - You can now create My credentials of type Generic secret directly in the NVIDIA NIM and Hugging Face inference workload submission forms, avoiding the need to leave the flow to configure authentication.
NVIDIA NIM observability metrics - Observability metrics are now available for NVIDIA NIM inference workloads via the UI and the Workloads / Pods APIs, giving teams better visibility into the performance of large language model (LLM) deployments. These metrics can be collected when deploying NIM through NVIDIA Run:ai, the NIM Operator, a Helm chart, or directly via container images (with the run.ai/nim-workload: "true" label, as sketched below). This enhancement enables more effective monitoring and troubleshooting of NIM-based inference workloads. See Workloads and NIM observability metrics via API for more details.
Application access for workload tools (from cluster v2.23 onward) - Added support for authorizing applications (in addition to users and groups) when connecting to tools. This makes it easier to integrate external systems or services that need direct access to workload tools, providing more flexibility in how connections are managed.
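One hedged way to attach the documented label to a NIM deployment created directly from a container image (the Deployment name below is hypothetical):

    # Adds the run.ai/nim-workload label to the pod template of an existing
    # Deployment named my-nim (hypothetical)
    kubectl patch deployment my-nim \
      -p '{"spec": {"template": {"metadata": {"labels": {"run.ai/nim-workload": "true"}}}}}'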
PVC details view in data sources - A new details pane is available when selecting a PVC data source from the Data sources table. The pane shows Event History for cluster events, as well as Details such as scope, request settings, and partial storage class information. This enhancement gives administrators and AI practitioners greater visibility into PVC usage history and configuration, improving monitoring and debugging. See Data sources for more details.
System policies for workload governance (from cluster v2.23 onward) - By default, every NVIDIA Run:ai account is governed by system policies that establish foundational security controls across all workloads, scopes, and interfaces (UI, API and CLI). These policies ensure consistent workload behavior and prevent unauthorized escalation, and can be viewed as part of the effective policy for any scope. Administrators can create new policies to update these defaults at any desired scope. This flexibility allows easing certain API restrictions when needed, while ensuring every change is explicit and auditable. See System policies for more details.
Privileged parameter - Set to false by default and not editable (canEdit: false), preventing containers from running with full host access unless explicitly enabled by an administrator.
Grace period - Defines how long a workload can continue running after a preemption request before termination. The default grace period is 30 seconds, with a system-enforced maximum of 5 minutes across UI, API and CLI submissions. This value can be updated at any scope within the policy hierarchy.
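A hedged sketch of overriding the grace period default at a project scope; the endpoint path and payload field names are assumptions for illustration, so check the Policy YAML reference and the Policies API documentation for the real contract:

    # Hypothetical endpoint and fields; raises the default grace period for
    # one project while staying under the 5-minute system cap
    curl -X PUT "https://<company>.run.ai/api/v2/policy/workspaces?projectId=<project-id>" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"defaults": {"spec": {"terminationGracePeriodSeconds": 120}}}'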
Policy synchronization changes - Starting in version 2.23, control plane policies are no longer synchronized with the cluster. Policies are now stored and enforced only in the control plane, preventing conflicts with outdated cluster policies. See Workload policies for more details.
Keyboard shortcuts for dialogs and forms (from cluster v2.23 onward) - Common keyboard actions are now supported across most UI screens and dialogs. Press Enter to confirm actions and Esc to cancel, making it quicker and easier to navigate workflows.
Updated General settings toggles - The following options are now enabled by default - Flexible workload submission, Flexible workload templates, Data volumes, and Policies.
Metrics view updates - The metrics view has been reorganized with new naming and grouping:
Renamed Default metrics view to Resource utilization
Renamed Advanced metrics view to GPU profiling
Inference metrics are shown in a dedicated Inference dropdown, available for all inference workloads
Resolved Bugs
RUN-32656
Fixed an issue where the selected node pool was not preserved when switching sections within the workload submission form for all workloads.
RUN-32002
Fixed an issue where exported CSVs had misaligned columns, causing values (e.g., scope, workload type, creation time, cluster) to shift into incorrect fields.
RUN-32150
Fixed a security vulnerability related to CVE-2025-5914 with severity HIGH.
RUN-31797
Fixed a security vulnerability related to CVE-2025-53547 with severity HIGH.
August 2025 Releases
August 31
Product Enhancements
Expanded cluster role permissions - Cluster roles have been updated to include watch permissions for all supported workload Custom Resource Definitions (CRDs) wherever get and list permissions were already present. This change ensures compatibility with Kubernetes operators that require get, list, and watch access for proper monitoring and integration with NVIDIA Run:ai workloads (see the inspection sketch after the next item).
Policy API for distributed inference (from cluster v2.21 onward) - A dedicated policy API is available for distributed inference, enabling fine-grained control over distributed inference workloads. Administrators can define and enforce policies that govern scheduling, scaling, and update behavior, ensuring workloads adhere to organizational requirements and operate consistently across environments. See Policy API for more details.
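To verify the expanded verbs on an upgraded cluster, one possible read-only inspection (the role and resource names below are illustrative; NVIDIA Run:ai's charts manage these ClusterRoles):

    # Inspect a Run:ai-managed ClusterRole; <runai-role-name> is a placeholder
    kubectl get clusterrole <runai-role-name> -o yaml
    # Expected rule form after the change (resource name is an example):
    #   apiGroups: ["run.ai"]
    #   resources: ["trainingworkloads"]
    #   verbs: ["get", "list", "watch"]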
Removed General settings toggles - The following options have been removed from the General settings page: Job submission, MPI distributed training, Weights & Biases SWEEP integration, and Docker image registry.
Resolved Bugs
RUN-31860
Fixed a security vulnerability related to CVE-2025-47907 with severity HIGH.
RUN-31745
Fixed a bug where the CPU memory value was presented in the wrong unit.
August 17
Product Enhancements
Workloads by category over time - Added a widget to the Overview dashboard that shows the number of workloads per category (e.g., Train, Build, Deploy) over time. This visualization helps identify usage trends, compare activity across categories, and track changes over specific periods. This feature is also supported in the API.
Minimum guaranteed runtime for preemptible workloads (from cluster v2.22 onward) - You can now configure a minimum guaranteed runtime for preemptible workloads in node pools via the UI and API. This setting specifies the minimum time a preemptible workload will run once scheduled and bound to a node before becoming eligible for preemption. This reduces unexpected interruptions and makes workload execution more predictable. See Node pools for more details.
Cluster filter enhancements for Nodes page (from cluster v2.23 onward) - The Nodes page now includes an “All” option in the clusters filter to make it easier to view and manage nodes across multiple clusters at once. When multiple clusters are selected, a Cluster column is displayed by default, showing each node’s associated cluster. Available in both the UI and API.
Separate admin toggles for Hugging Face and NVIDIA NIM models - Previously, enabling Hugging Face and NVIDIA NIM models was managed through a single Models toggle in the Admin settings. These options are now separated into distinct toggles, allowing administrators to enable or disable Hugging Face and NIM models independently for finer control over inference model availability.
Resolved Bugs
RUN-31850
Fixed an issue where creating a workspace/training workload returned the error "terminationGracePeriod is not supported in this cluster."
RUN-31849
Fixed an issue where the non-preemptible priority over-quota warning text was missing from the inference workload creation page.
RUN-31579
Fixed an issue in the CLI documentation where the --new-pvc description did not clearly indicate that creating a new PVC means creating a new volume that is used only for the duration of the workload's lifecycle.
RUN-28394
Fixed an issue where the "Get Role by ID" API returned an "insufficient permissions" error for system administrator.
RUN-31304
Fixed a security vulnerability related to CVE-2025-22868 with severity HIGH.
RUN-31792
Fixed a security vulnerability related to CVE-2025-7425 with severity HIGH.
August 03
Product Enhancements
Flexible submission form for NVIDIA NIM and Hugging Face workloads - The flexible submission form is now supported for NVIDIA NIM and Hugging Face inference workloads. This form allows users to submit workloads using an existing setup or provide custom settings for one-time use, enabling faster, more consistent submissions aligned with organizational policies.
Advanced setup form for NVIDIA NIM and Hugging Face workloads - You can now access advanced configuration options when submitting NVIDIA NIM and Hugging Face inference workloads, including editing the image and tag, modifying or adding environment variables, and setting workload priority. This provides greater flexibility for adapting workload configurations to specific requirements.
Dynamic NVIDIA NIM model list from NGC catalog - The platform now retrieves the list of available NVIDIA NIM models directly from the NGC catalog using an API call. This ensures the model list remains current and reflects the latest offerings.
Resolved Bugs
RUN-31392
Fixed an issue where the audit logs page filter converted strings to lowercase, causing filtering to fail.
RUN-31410
Fixed an issue where templates did not appear in the templates table.
RUN-31269
Fixed an issue where upgrades failed due to changes in the OpenShift monitoring stack.
RUN-31687
Fixed an issue where the workload flexible submission form did not load the correct default node pools for a project.
RUN-31504
Fixed an issue where workloads created via CLI could not be cloned in the UI when flexible submission was disabled.
RUN-30746
Fixed an issue where workloads could not be scheduled if the combined length of the project name and node pool name was excessively long.
RUN-31208
Fixed an issue where, in OpenShift environments, certain container failures caused workloads to remain in the "Pending" phase instead of transitioning to "Failed".
RUN-31358
Fixed an issue where enabling enableWorkloadOwnershipProtection for inference workloads caused newly submitted workloads to get stuck.
RUN-31252
Fixed an issue where the terminationGracePeriodSeconds field accepted values greater than 300 seconds when submitted via the API.
RUN-31263
Fixed an issue where setting defaults for servingPort fields failed and incorrectly required the container port default as well.
RUN-31265
Fixed a security vulnerability related to CVE-2025-30749 with severity HIGH.
RUN-31488
Fixed an issue where the UI logs view called an unsupported API. The fix added a cluster version check to ensure the correct API is used.
RUN-30918
Fixed an issue where the createdAt timestamp was not updated when a policy was recreated, causing the timestamp to incorrectly reflect the original creation time.
RUN-31039
Fixed a base image security vulnerability in libxml2 related to CVE-2025-49796 with severity HIGH.
RUN-25973
Fixed an issue where some services were missing from cluster service groups.
RUN-31380
Fixed an issue where the SAML metadata XML redirect URL was invalid.
July 2025 Releases
July 20
Product Enhancements
Improved status messaging for node pools with undrained nodes (from cluster v2.23 onward) - When creating a node pool or labeling nodes to add to the node pool, nodes that are not fully drained (i.e., still have running workloads) will now trigger clearer status messages in the API and UI. These messages indicate that the node pool cannot include the affected nodes until they are drained and reach a "Ready" state. This helps administrators better understand node pool readiness and identify which nodes are still in transition.
Resolved Bugs
RUN-31167
Removed Groups resource type from the available permissions in NVIDIA Run:ai roles.
RUN-31129
Fixed an issue where the Inference Policy View option was missing from the Project and Department pages.
RUN-31036
Fixed a security vulnerability in runai-container-runtime-installer and runai-container-toolkit related to CVE-2025-49794 with severity HIGH.
RUN-31066
Fixed an issue where the validation for the number of workers in a policy was not applied correctly.
RUN-30740
Fixed an issue where negative values were allowed for GPU resource optimization swap size in node pool API.
Deprecation Notifications
Note
Deprecated features, APIs, and capabilities remain available for six months from the time of the deprecation notice, after which they may be removed.
January 2026
NVIDIA Run:ai Predefined Roles
The following predefined roles are deprecated in the UI and API. Review the new predefined roles to determine whether they meet your requirements, or create a custom role using the API. See Roles for more details:
Compute resource administrator
Credentials administrator
Data source administrator
Data volume administrator
Environment administrator
L1 researcher
L2 researcher
ML engineer
Research manager
Template administrator
During the deprecation period, the following predefined roles will be updated with minimal access to cluster and node pool data:
Both cluster and node pool access are planned to transition to Clusters minimal and Node pools minimal - L1 researcher, L2 researcher, Research manager
Cluster access is planned to transition to Clusters minimal while node pools access remains unchanged - ML engineer
Cluster access is planned to transition to Clusters minimal - Compute resource administrator, Credentials administrator, Data source administrator, Data volume administrator, Environment administrator, Template administrator
Models Catalog
The Models catalog page is deprecated. Previously, the Models catalog provided a quick start experience for deploying a curated set of Hugging Face models. The same capability is now available through the Hugging Face inference workload flow, which integrates directly with Hugging Face and allows you to browse, select, and deploy any supported model from an open list. To deploy Hugging Face models, use the Hugging Face inference workload flow.
API Deprecation Notifications
Deprecated Endpoints
/api/v1/apps
/api/v1/service-accounts
/api/v1/user-applications
/api/v1/access-keys
/api/v1/authorization/roles
/api/v2/authorization/roles
Deprecated Parameters
/api/v1/authorization/access-rules
subjectType: app (enum)
subjectType: service-account
/api/v2/authorization/roles
/api/v1/authorization/roles
/api/v1/authorization/permissions
resourceType: app (enum)
resourceType: service-account
/api/v1/org-unit/projects
/api/v1/org-unit/departments
resources: priority
resources: rank
CLI Deprecation Notifications
runai login
application
access-key
runai login
user
password
runai [workloadtype] submit
--environment
--environment-variable
September 2025
Grafana Dashboards
The legacy Grafana dashboards - Overview and Analytics - are deprecated and will be removed in a future release. We recommend transitioning to the new dashboards available in the NVIDIA Run:ai UI, which are powered by NVIDIA Run:ai APIs. These dashboards provide improved visibility with drill-down capabilities and more flexibility for analyzing usage and performance.
Note
The Consumption dashboard was deprecated in July 2025 and replaced with Reports.
CLI v1
CLI v1 was deprecated in January 2025 and has now been fully removed from the platform. All command-line interactions should be performed using CLI v2.
Note
CLI v1 will still be available for clusters below v2.18.
Jobs
The Jobs workload type was deprecated in January 2025 and has now been fully removed from the platform. This means the Jobs option no longer appears in the General settings or in the Workloads page.
July 2025
Consumption Dashboard
The Consumption dashboard is deprecated and replaced with Reports. Consumption reports provide improved visibility into resource usage with enhanced filtering and export capabilities. We recommend transitioning to consumption reports for the most up-to-date insights.
Templates
The Templates feature is deprecated. We recommend transitioning to flexible workload templates, which offer enhanced functionality and support for flexible workload types - including workspace, standard training, and distributed training.
April 2025
Cluster API for Workload Submission
Using the Cluster API to submit NVIDIA Run:ai workloads via YAML was deprecated starting from NVIDIA Run:ai version 2.18. For cluster version 2.18 and above, use the NVIDIA Run:ai REST API to submit workloads. The Cluster API documentation has also been removed.
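For reference, a hedged sketch of a REST API submission; the endpoint and field names below follow the general shape of the workloads API but should be verified against the API reference for your version:

    # Illustrative request; image and field values are placeholders
    curl -X POST "https://<company>.run.ai/api/v1/workloads/trainings" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"name": "train-demo", "projectId": "<project-id>", "clusterId": "<cluster-uuid>", "spec": {"image": "<training-image>", "compute": {"gpuDevicesRequest": 1}}}'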
January 2025
Ongoing Dynamic MIG Deprecation Process
The Dynamic MIG deprecation process started in version 2.19. NVIDIA Run:ai supports standard MIG profiles as detailed in Configuring NVIDIA MIG profiles.
Before upgrading to version 2.20, workloads submitted with Dynamic MIG and their associated node configurations must be removed.
In version 2.20, MIG was removed from the NVIDIA Run:ai UI under compute resources.
In Q2/25, all Dynamic MIG APIs and CLI commands will be fully removed; any remaining calls to them will fail.
CLI v1 Deprecation
CLI v1 is deprecated and no new features will be developed for it. It will remain available for use for the next two releases to ensure a smooth transition for all users. We recommend switching to CLI v2, which provides feature parity, backwards compatibility, and ongoing support for new enhancements. CLI v2 is designed to deliver a more robust, efficient, and user-friendly experience.
Legacy Jobs View Deprecation
The legacy Jobs view will be discontinued in favor of the more advanced Workloads view. The legacy submission form will still be accessible via the Workload manager view for a smoother transition.
appID and appSecret Deprecation
The appID and appSecret parameters used for requesting an API token are deprecated. They will remain available for use for the next two releases. To create application tokens, use your client credentials - Client ID and Client secret.
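A hedged sketch of the replacement call; the token path and field names are assumptions drawn from the client-credentials flow and should be confirmed against the API authentication documentation:

    # Exchange client credentials for an API token (illustrative field names)
    curl -X POST "https://<company>.run.ai/api/v1/token" \
      -H "Content-Type: application/json" \
      -d '{"grantType": "client_credentials", "clientId": "<client-id>", "clientSecret": "<client-secret>"}'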