What's New
This What's New page provides transparency into the latest changes and improvements to NVIDIA Run:ai’s SaaS platform. The updates include new features, optimizations, and fixes aimed at improving performance and user experience.
Gradual Rollout
SaaS features and bug fixes are gradually rolled out to customers to ensure a smooth transition and minimize any potential disruption. SaaS releases follow a scheduled rollout cadence, typically every two weeks, allowing us to introduce new functionalities in a controlled and predictable manner.
In contrast, hotfixes are deployed as needed to address urgent issues and are released immediately to ensure the stability and security of the service.
DGX Cloud Create
Certain features are first made available in fully managed cloud-based deployments provisioned through DGX Cloud Create. These features are labeled as DGX Cloud Create only and will become available to all customers in future releases.
Feature Life Cycle
NVIDIA Run:ai uses life cycle labels to indicate the maturity and stability of features across releases:
Experimental - This feature is in early development. It may not be stable and could be removed or changed significantly in future versions. Use with caution.
Beta - This feature is still being developed for official release in a future version and may have some limitations. Use with caution.
Legacy - This feature is scheduled to be removed in future versions. We recommend using alternatives if available. Use only if necessary.
May 2025 Releases
May 18-25
Product Enhancements
Bulk delete for access rules - Users can now select and delete multiple access rules at once, provided they have the necessary permissions for each rule.
GPU memory utilization metrics for projects and departments - Added new metrics that track GPU memory utilization at the project and department level. This enables more granular visibility into resource usage across organizational units, helping teams monitor consumption and optimize allocations.
Resolved Bugs
RUN-28626
Fixed an issue where columns representing average over-time resource usage appeared empty in the departments grid.
RUN-27521
Fixed an issue where disabling CPU quota in the General settings did not remove existing CPU quotas from projects and departments.
RUN-28006
Fixed an issue where tokens became invalid for the API server after one hour.
RUN-28832
Fixed inference CLI v2 documentation with examples that reflect correct usage.
RUN-28311
Fixed an issue where user creation failed with a duplicate email error, even though the email address did not exist in the system.
RUN-27423
Fixed an issue where missing fields in the grantType request body did not return a proper error. The Tokens API now responds with a 400 Bad Request when required fields are missing.
RUN-28213
Fixed a security vulnerability in github.com.golang.org.x.crypto related to CVE-2025-22869 with severity HIGH.
RUN-27986
Fixed an issue where leaving the Overview dashboard open for an extended period caused the Grafana session cookie to expire without refreshing, resulting in the dashboard becoming unavailable.
RUN-27955
Fixed an issue where the option to create a new host path data source was incorrectly available during inference workload submission, even when the policy did not allow it.
RUN-27638
Fixed a security vulnerability in axios related to CVE-2025-27152 with severity HIGH.
RUN-27514
Fixed an issue with incorrect calculation of the ALLOCATED_CPU_MEMORY_BYTES telemetry metric.
RUN-27422
Fixed an issue where deleting a node type did not trigger project updates in the cluster.
May 04-11
Product Enhancements
Resource flag support for master pod in distributed training - Added new flags in the CLI to specify CPU and memory resources for the master pod in distributed training, including options to set CPU core limits, CPU core requests, memory limits, and memory requests. See CLI commands reference for more details.
ConfigMap subpath support - Added support for the subpath parameter in ConfigMap mounting, allowing customers to use different paths within the volume instead of just the root. This is supported across all workloads that use flexible submission in the UI, and is also available via the API and CLI.
DGX Cloud Create only
Suspend and resume actions for multiple workloads - You can now suspend or resume multiple workloads at once using the multi-select option in the UI, making it faster and easier to manage large sets of jobs.
From cluster v2.18 onward
Pod deletion policy for terminal distributed training workloads - You can now specify which pods should be deleted when a distributed training workload reaches a terminal state (completed/failed) in the UI. This enhancement provides greater control over resource cleanup and helps maintain a more organized and efficient cluster environment. See Train models using a distributed training workload for more details.
From cluster v2.20 onward
Pagination support in the CLI - The CLI now supports pagination for list commands (workloads, projects, nodes, node pools, and PVCs). You can control the number of results per page using --page-size, limit the total number of results with --max-items, and retrieve additional pages using --next-token. To disable pagination and retrieve a single page of results, use the --no-pagination flag. See CLI commands reference for more details.
Automatic CLI version updates - A new --auto-update flag has been added to the config set CLI command, allowing you to enable automatic version updates. This ensures you're always using the latest CLI features and fixes without needing manual upgrades. See CLI commands reference for more details.
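The way a page size, a total item limit, and a next-page token typically interact can be illustrated with a small, generic Python sketch. This is not the CLI's implementation: the function names fetch_page and list_items are hypothetical, and encoding the token as a stringified offset is an assumption made only to keep the example self-contained. The parameters loosely mirror the roles of --page-size, --max-items, and --next-token.

```python
from typing import List, Optional, Sequence, Tuple


def fetch_page(
    items: Sequence[str], page_size: int, next_token: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    """Return one page of items plus a token for the following page.

    The token is a stringified offset (an assumption for this sketch).
    It is None when there are no further pages.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = int(next_token) if next_token is not None else 0
    end = start + page_size
    page = list(items[start:end])
    token = str(end) if end < len(items) else None
    return page, token


def list_items(
    items: Sequence[str], page_size: int, max_items: Optional[int] = None
) -> List[str]:
    """Follow next-page tokens until the data or the max_items cap runs out."""
    results: List[str] = []
    token: Optional[str] = None
    while True:
        page, token = fetch_page(items, page_size, token)
        results.extend(page)
        if max_items is not None and len(results) >= max_items:
            return results[:max_items]
        if token is None:
            return results
```

In this sketch the page size bounds each individual request, while the cap bounds the overall result set, so a caller may receive a final page that is only partially used.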
Resolved Bugs
RUN-27837
Fixed an issue where a node pool’s placement strategy stopped functioning correctly after being edited.
RUN-28258
Fixed an issue where the nodes grid displayed undefined values in MNNVL columns.
RUN-28097
Fixed an issue where the allocated_gpu_count_per_gpu metric displayed incorrect data for fractional pods.
RUN-26359
Fixed an issue in CLI v2 where using the --toleration option required incorrect mandatory fields.
RUN-26608
Fixed an issue by adding a flag to the cli config set command and the CLI install script, allowing users to set a cache directory.
RUN-27484
Fixed an issue where duplicate app.kubernetes.io/name labels were applied to services in the control plane Helm chart.
RUN-27247
Fixed security vulnerabilities in the Spring framework used by the db-mechanic service: CVE-2021-27568, CVE-2021-44228, CVE-2022-22965, CVE-2023-20873, CVE-2024-22243, CVE-2024-22259, and CVE-2024-22262.
RUN-27309
Fixed an issue where workloads configured with a multi node pool setup could fail to schedule on a specific node pool in the future after an initial scheduling failure, even if sufficient resources later became available.
RUN-27515
Fixed an issue where users were unable to use assets from an upper scope during flexible workload submissions.
RUN-27826
Fixed an issue where the runai inference update command could result in a failure to update the workload. Although the command itself succeeded (since the update is asynchronous), the update often failed, and the new spec was not applied.
RUN-27915
Fixed an issue where the "Improved Command Line Interface" admin setting was incorrectly labeled as Beta instead of Stable.
RUN-27251
Fixed a security vulnerability in knative.dev/serving related to CVE-2023-48713 with severity MEDIUM.
Fixed a security vulnerability in golang.org.x.net related to CVE-2025-22872 with severity MEDIUM.
Fixed a security vulnerability in github.com.golang-jwt.jwt.v4 and github.com.golang-jwt.jwt.v5 related to CVE-2025-30204 with severity HIGH.
April 2025 Releases
April 20-27
Product Enhancements
New gRPC option to NIM workloads - You can now select gRPC as a protocol when submitting inference workloads through the NVIDIA NIM form, enabling more flexible communication with inference servers. See Deploy inference workloads with NVIDIA NIM for more details.
Departments enabled by default - To simplify onboarding and standardize tenant structure, all new tenants will now include a default department. The Departments setting flag has been removed from the Admin UI. See Departments for more details.
Pending time visibility for workloads - The Workloads API and UI now display total pending time, which represents the cumulative duration a workload spent in Pending state. This helps Admins assess resource demands for specific projects or departments.
Resolved Bugs
RUN-27485
Fixed an issue where users with the ML Engineer role were unable to submit inference workloads due to insufficient permissions.
RUN-27497
Fixed an issue where, after deleting an SSO user and immediately creating a local user, the delete confirmation dialog reappeared unexpectedly.
RUN-27008
Increased the range of generated reports to 31 days.
RUN-27502
Fixed the inference CLI commands documentation: --max-replicas and --min-replicas were incorrectly used instead of --max-scale and --min-scale.
RUN-27520
Fixed an issue where adding access rules immediately after creating an application did not refresh the access rules table.
RUN-26754
Fixed an issue where workload submission requests to the API did not apply UID and GID from the token when uidGidSource was set to fromIdpToken.
RUN-26953
Fixed an issue where OIDC client ID and password values containing spaces were allowed.
RUN-27264
Fixed an issue where creating a project from the UI with a non-unlimited deserved CPU value caused the queue to be created with limit = deserved instead of unlimited.
RUN-27246
Fixed an issue by adding clientID=cli to the CLI exchange request in the Tokens API to ensure proper authentication flow.
RUN-26989
Fixed an issue that prevented reordering node pools in the workload submission form.
RUN-26992
Fixed an issue where workloads submitted with an invalid node port range would get stuck in Creating status.
RUN-26433
Fixed an issue where invalid GrantType values in Tokens API requests returned unclear error messages.
April 06-14
Product Enhancements
See What's new in version 2.21 for the full list of new features.
Resolved Bugs
RUN-27088
Fixed a security vulnerability in tar-fs related to CVE-2024-12905 with severity HIGH.
RUN-26464
Fixed an issue where fields and values associated with a selected storage class were not disabled as expected.
RUN-26690
Fixed an issue where the Run:ai logs view displayed both loading and empty states.
RUN-27229
Fixed a security vulnerability in github.com.opencontainers.runc related to CVE-2024-21626 with severity HIGH.
RUN-27308
Fixed an issue where the API documentation did not include the return codes for duplicate project or department creation.
RUN-27219
Fixed an issue where project creation failed if a quota was set for CPU but not for memory.
RUN-27210
Fixed an issue that occurred when submitting a workload with multiple ConfigMap data storage entries.
RUN-26671
Fixed an issue where compute resources configured with multiple whole GPUs (e.g., 3 GPUs at 100%) were incorrectly submitted as a single GPU.
RUN-27120
Fixed an issue where copying a workload that a user was not authorized to access incorrectly granted them access to the serving port in the copied workload.
RUN-26753
Fixed an issue where creating a department-scoped or project-scoped ConfigMap using the name of an existing cluster-wide ConfigMap resulted in an incorrect status.
RUN-26386
Fixed an issue where inconsistent behavior occurred during project creation when configuring GPU resources with limit=null, overQuotaPriority=null, and deserved=0.
RUN-27075
Fixed an issue where, in some cases, creating a project through the API with partial parameters would return an error when the "Limit projects from exceeding department quota" setting was enabled.
RUN-26120
Fixed an issue where the metrics service could get stuck while processing reports under certain conditions.
RUN-26861
Fixed an issue where the create workload page could remain stuck in a loading state due to pending cluster details.
RUN-26602
Fixed an issue where multiple workloads with the same name could be created via the UI, eventually leading to workload failures.
RUN-26892
Fixed an issue where the inference serving endpoint did not display the connection protocol and container port in the UI.
RUN-24579
Fixed an issue where the pubsub package failed to connect if the pubsub server (Redis/NATS) was not yet deployed.
RUN-26691
Fixed a security vulnerability in axios related to CVE-2025-27152 with severity HIGH.
RUN-26324
Fixed an issue in the documentation where the toleration name was incorrectly marked as mandatory. Also fixed an issue in CLI v2 where the required fields were incorrect: name is no longer mandatory, and key is now required.
RUN-26410
Fixed an issue in the POST /api/v1/workloads/trainings API where, if Completions was set but Parallelism was not specified, the response returned Parallelism as null instead of the expected default value 1.
RUN-26955
Fixed an issue where duplicate results appeared in some cases for node metrics.
RUN-26764
Fixed an issue where in some cases, a node pool was stuck in "Creating" phase.
RUN-26772
Fixed an issue where a GET request for a non-existent workload returned an unexpected response format.
RUN-26641
Fixed an issue where CLI usage could be blocked even when the CLI and control plane version were aligned.
RUN-27041
Fixed an issue where Hugging Face inference workloads could not be submitted via the UI due to an error in the General section.
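The defaulting behavior described for RUN-26410 above (Parallelism falling back to 1 when Completions is set) can be sketched as follows. This is an illustration only, not the API's code: the function name and the lowercase dictionary keys completions and parallelism are assumptions chosen for the example.

```python
from typing import Any, Dict


def apply_parallelism_default(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of spec with parallelism defaulted to 1.

    The default is applied only when completions is set and parallelism
    is missing or None. Key names are hypothetical.
    """
    result = dict(spec)
    if result.get("completions") is not None and result.get("parallelism") is None:
        result["parallelism"] = 1
    return result
```

An explicit parallelism value is left untouched, and a spec without completions is returned unchanged.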
March 2025 Releases
March 16-23
Resolved Bugs
RUN-26686
Fixed an issue where workload names exceeding 50 characters caused failures due to Kubernetes label length constraints (max 63 characters).
RUN-26272
Fixed an issue where connecting to the SMTP server without credentials was not allowed.
RUN-26659
Fixed an issue where deleting the node pool did not remove it from the default node pools list.
RUN-26630
Fixed an issue that prevented updating tenant-scoped data sources.
RUN-25769
Fixed an issue where unusual text appeared at the end of each line when using the --help option with the runai inference submit command.
RUN-25918
Fixed an issue where the Running/Requested Pods column in the output of the runai workloads list command displayed 1/0 instead of the correct format (1/1-3) for inference and other workload types that support minimum and maximum requested pods.
RUN-26473
Fixed an issue where removing labels and annotations from a workload created using "Copy & Edit" did not properly remove them.
RUN-26624
Fixed an issue which caused workloads to fail if both gpuPortionRequest and gpuPortionLimit were set to 1 (100%).
RUN-26270
Fixed an issue in SSO SAML where the Entity ID field had a different value before and after configuring SAML.
RUN-26240
CLI v2: Fixed an issue in the install script, where setting the install path environment variable did not install all the files in the correct path.
RUN-26479
CLI v2: Fixed an issue where using the wrong workload type in the workload describe command did not display an error.
RUN-26345
CLI v2: Added UIDGIDSOURCE_CUSTOM when SupplementalGroups is set.
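The length arithmetic behind RUN-26686 above can be made concrete with a short sketch. Kubernetes limits label values to 63 characters, so a 50-character workload name leaves 13 characters for anything appended to it. The helper names and the "-worker-0" suffix below are hypothetical; the source does not say which suffixes the platform appends or how the fix was implemented.

```python
K8S_LABEL_VALUE_MAX = 63  # Kubernetes limit on label value length
WORKLOAD_NAME_MAX = 50    # name length mentioned in RUN-26686


def suffix_budget(name: str) -> int:
    """Characters left for an appended suffix before the label limit is hit."""
    return K8S_LABEL_VALUE_MAX - len(name)


def fits_label(name: str, suffix: str = "") -> bool:
    """Check whether name plus a (hypothetical) suffix fits in a label value."""
    return len(name) + len(suffix) <= K8S_LABEL_VALUE_MAX
```

For example, a name at the 50-character mark with the illustrative 9-character suffix still fits, while a 60-character name with the same suffix does not.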
March 05-09
Product Enhancements
Added functionality to verify the proper installation of Knative. The UI and API will reflect the status of various features based on their current state in Knative.
Added the NVIDIA logo to the platform, including the login page and other general areas.
Audit log: Only users with tenant-wide permissions now have the ability to access audit logs, ensuring proper access control and data security.
CLI v2: Users will be able to submit workloads and map secrets to volumes using the --secret-volume flag. This feature is applicable for all workload types - workspaces, training, and inference.
Resolved Bugs
RUN-26310
Fixed an issue where Docker registry credentials/secrets were not found when adding environment variables.
RUN-26253
CLI v2 list project now supports limit and offset flags.
RUN-25382
Fixed an issue where invalid min/max policy values caused an error in the policy pod.
RUN-26135
Fixed an issue which prevented enabling/disabling email notifications.
RUN-25131
Fixed an issue where authentication failures in the Grafana proxy incorrectly returned a 401 error causing users to be signed out of the UI.
RUN-26248
CLI v2: Fixed an issue where submitting an interactive workload with --attach was not possible after the workload started running.
RUN-25982
CLI v2: Fixed an issue where interactive mode did not return an error for invalid control plane/Authentication URLs and timeout duration.
RUN-26356
Fixed an issue where Lowest for over quota weight did not appear as 0.
RUN-26249
Fixed an issue where creating a policy with the fields tty and stdin resulted in a validation error.
RUN-26178
Fixed an issue where the upgrade to 2.20 failed to migrate departments and projects if the job to validate the default department to clusters ran first.
RUN-25895
Fixed an issue where projects that were updated due to changes in their department override fields were not updated in the cluster.
RUN-26152
GET API for retrieving Workspaces, Trainings, and Inferences by ID returns deleted items.
RUN-25987
Updated all workload APIs to accurately reflect that both creating and deleting workloads return a 202 status code in the API documentation.
RUN-25984
Added a validation message to api/v1/me/password.
RUN-26062
Fixed an issue where a new API, intended for clusters running version 2.18 and above, was not disabled for older clusters, causing unintended workload operations — such as creation, deletion, resumption, or stoppage — after upgrading from versions below 2.18 to 2.18 or higher.
February 2025 Releases
February 16-23
Product Enhancements
NIM and model store: UX improvements
New functionalities added for CLI v2:
Allow users to list all available Persistent Volume Claims (PVCs) when submitting workloads. This enhancement simplifies the selection process for appropriate PVCs, making workload submission more efficient.
Enable users to display the config file in multiple formats. The available options are:
--json: Output structure in JSON format
--yaml: Output structure in YAML format
Resolved Bugs
RUN-25974
Fixed an issue where using filters in the Quota management dashboard was not working properly.
RUN-25969
Fixed an issue where the UI incorrectly rejected valid toleration key inputs during validation checks.
RUN-25946
Fixed an issue where the Update Inference Spec API did not enforce a minimum cluster version. The API now returns a 400 Bad Request for versions below 2.19.
RUN-25921
Fixed an issue where the Workspaces, Trainings, and Distributed APIs did not enforce a minimum cluster version. The APIs now return a 400 Bad Request for versions below 2.18.
RUN-25249
Fixed an issue where submitting a workload using a yaml file with a port but without service type would use ClusterIP as the default service type. If no host port is provided, the target port will be used as the host.
RUN-25269
Fixed an issue where the Pods modal was not paginated, limiting the display to only 50 records.
RUN-25466
Fixed an issue where an environment variable with the value SECRET was not valid as only SECRET:xxx was accepted.
RUN-23048
Improved error handling to display meaningful messages from the CLI upgrade command.
RUN-25552
Fixed an issue where clicking on "View Access Rules" in the Users table displayed only the first group if a user belonged to multiple groups.
RUN-25558
Fixed a memory issue in the handling of external workloads (deployments, Ray, etc.) that caused etcd memory to increase when those workloads were scaled.
RUN-25659
CLI v2: Fixed an issue where min and max replicas were submitted using TensorFlow.
February 02-09
Product Enhancements
Workload Events API, /api/v1/workloads/{workloadId}/events, now supports the sort order parameter (asc, desc).
MIG profile and MIG options are now marked as deprecated in CLI v2, following the deprecation notice in the last version.
As part of inference support in CLI v2, Knative readiness is now validated on submit requests.
Improved permission error messaging when attempting to delete a user with higher privileges.
Improved visibility of metrics in the Resources utilization widget by repositioning them above the graphs.
Added a new Idle workloads table widget to help users easily identify and manage underutilized resources.
Renamed and updated the "Workloads by type" widget to provide clearer insights into cluster usage with a focus on workloads.
Improved user experience by moving the date picker to a dedicated section within the over-time widgets, Resources allocation and Resources utilization.
Simplified configuration by enabling auto-creation of storage class for discovered storage classes.
Enhanced PVC underlying storage configuration by specifying allowed context for the selected storage (Workload Volume, PVC, both, or neither).
Added configurable grace period for workload preemption in CLI v2.
Resolved Bugs
RUN-24838
Fixed an issue where an environment asset could not be created if it included an environment variable with no value specified.
RUN-25031
Fixed an issue in the Templates form where existing credentials in the environment variables section were not displayed.
RUN-25303
Fixed an issue where submitting with the --attach flag was supported only in a workspace workload.
RUN-24354
Fixed an issue where migrating workloads failed due to slow network connection.
RUN-25220
CLI v2: Changed --image flag from a required field to an optional one.
RUN-25290
Fixed a security vulnerability in golang.org/x/net v0.33.0 related to CVE-2024-45338 with severity HIGH.
RUN-24688
Fixed an issue that blocked the Create Template submission due to a server error. This occurred when using the Copy & Edit Template form.
RUN-25511
Fixed an issue where deleting a workload in the CLI v2 caused an error due to a missing response body. The CLI now correctly receives and handles the expected response body.