Hotfixes for Version 2.23

This section provides details on all hotfixes available for version 2.23. Hotfixes are critical updates released between our major and minor versions to address specific issues or vulnerabilities. These updates ensure the system remains secure, stable, and optimized without requiring a full version upgrade.

Version
Date (MM/DD/YYYY)
Internal ID
Description

2.23.68

04/30/2026

RUN-34648

Fixed an issue where an inference workload incorrectly displayed "Running" status when the pod was Pending after a scale-to-zero event.

2.23.67

04/30/2026

RUN-38467

Fixed a security vulnerability related to GHSA-pc3f-x583-g7j2 with severity HIGH.

2.23.67

04/30/2026

RUN-37894

Fixed a security vulnerability related to GHSA-xw7x-h9fj-p2c7 with severity CRITICAL.

2.23.66

04/27/2026

RUN-38510

Fixed a security vulnerability related to GHSA-9jj7-4m8r-rfcm with severity CRITICAL.

2.23.66

04/27/2026

RUN-38503

Fixed a security vulnerability related to GHSA-rp42-5vxx-qpwr with severity HIGH.

2.23.66

04/27/2026

RUN-38494

Fixed a security vulnerability related to GHSA-hfvc-g4fc-pqhx with severity HIGH.

2.23.66

04/27/2026

RUN-38430

Fixed an issue where workloads could not be submitted when the NVIDIA GPU Operator was deployed with the NRI plugin enabled.

2.23.66

04/27/2026

RUN-38428

Fixed a security vulnerability related to CVE-2026-40175 with severity CRITICAL.

2.23.66

04/27/2026

RUN-38405

Fixed an issue where uninstalling the cluster Helm chart on OpenShift failed because the runai-operator was missing permission to delete the runai-prometheus Secret in the openshift-monitoring namespace.

2.23.66

04/27/2026

RUN-38358

Fixed a security vulnerability related to CVE-2026-4424 with severity HIGH.

2.23.66

04/27/2026

RUN-38240

Fixed a security vulnerability related to GHSA-6v2p-p543-phr9 with severity HIGH.

2.23.66

04/27/2026

RUN-38183

Fixed a security vulnerability related to CVE-2026-21710 with severity HIGH.

2.23.66

04/27/2026

RUN-38173

Fixed a security vulnerability related to CVE-2026-32280 with severity HIGH.

2.23.66

04/27/2026

RUN-36522

Fixed a security vulnerability related to GHSA-37gf-gmxv-74wv with severity HIGH.

2.23.66

04/27/2026

RUN-36254

Fixed an issue where a race condition during webhook certificate generation caused failures.

2.23.65

04/19/2026

RUN-37532

Fixed an issue where workloads were slow to appear in the UI and API after being submitted.

2.23.65

04/19/2026

RUN-38029

Fixed an issue where the workloads pods API did not enforce project scope for user API tokens, allowing users to list pods across all projects.

2.23.65

04/19/2026

RUN-38099

Fixed a security vulnerability related to GHSA-hfvc-g4fc-pqhx with severity HIGH.

2.23.65

04/19/2026

RUN-38175

Fixed a security vulnerability related to CVE-2026-32280 with severity HIGH.

2.23.65

04/19/2026

RUN-38094

Fixed a security vulnerability related to CVE-2026-27654 with severity HIGH.

2.23.63

04/14/2026

RUN-38055

Fixed an issue where the access rules API accepted invalid subjectType values without returning a validation error.

2.23.63

04/14/2026

RUN-37959

Fixed an issue where automatic topology constraints for distributed workloads were applied at the wrong topology level.

2.23.63

04/14/2026

RUN-35919

Fixed an issue where db-migrations failed during control plane upgrades in the org-unit-service.

2.23.63

04/14/2026

RUN-37697

Fixed a security vulnerability related to GHSA-p77j-4mvh-x3m3 with severity CRITICAL.

2.23.63

04/14/2026

RUN-37984

Fixed a security vulnerability related to GHSA-r5fr-rjxr-66jc with severity HIGH.

2.23.63

04/14/2026

RUN-37895

Fixed a security vulnerability related to GHSA-c2c7-rcm5-vvqj with severity HIGH.

2.23.63

04/14/2026

RUN-37742

Fixed a security vulnerability related to CVE-2026-4111 with severity HIGH.

2.23.63

04/14/2026

RUN-37578

Fixed a security vulnerability related to GHSA-25h7-pfq9-p65f with severity HIGH.

2.23.63

04/14/2026

RUN-36615

Fixed a security vulnerability related to CVE-2024-12797 with severity HIGH.

2.23.61

03/27/2026

RUN-36559

Fixed an issue where tenant-level policy permissions could not delete policies belonging to scopes that no longer exist.

2.23.57

03/15/2026

RUN-37362

Fixed a security vulnerability related to CVE-2025-61732 with severity HIGH.

2.23.57

03/15/2026

RUN-37348

Fixed a security vulnerability related to CVE-2025-61726 with severity HIGH.

2.23.57

03/15/2026

RUN-37611

Fixed an issue in the distributed workload submission form where a project policy with a locked rule on storage instances could result in a failure to submit the workload.

2.23.57

03/15/2026

RUN-37504

Fixed a security vulnerability related to CVE-2026-25679 with severity HIGH.

2.23.57

03/15/2026

RUN-37170

Fixed a security vulnerability related to GHSA-23c5-xmqv-rm74 with severity HIGH.

2.23.57

03/15/2026

RUN-37169

Fixed a security vulnerability related to GHSA-5rq4-664w-9x2c with severity HIGH.

2.23.56

03/13/2026

RUN-37372

Fixed a security vulnerability related to CVE-2025-61731 with severity HIGH.

2.23.56

03/13/2026

RUN-37174

Fixed a security vulnerability related to GHSA-72hv-8253-57qq with severity HIGH.

2.23.55

03/11/2026

RUN-37341

Fixed a security vulnerability related to CVE-2025-61732 with severity HIGH.

2.23.46

03/09/2026

RUN-37278

Fixed a security vulnerability related to CVE-2024-1013 with severity HIGH.

2.23.45

03/09/2026

RUN-37167

Fixed a security vulnerability related to GHSA-72hv-8253-57qq with severity HIGH.

2.23.41

03/08/2026

RUN-37164

Fixed a security vulnerability related to GHSA-9h8m-3fm2-qjrq with severity HIGH.

2.23.40

03/06/2026

RUN-36734

Fixed an issue where the Analytics table displayed incorrect GPU Compute Utilization values for Training and Interactive workloads.

2.23.40

03/06/2026

RUN-36732

Fixed a security vulnerability related to GHSA-5vv4-hvf7-2h46 with severity HIGH.

2.23.39

02/26/2026

RUN-34875

Fixed an issue where enabling authentication and authorization prevented user metrics from being collected for inference workloads running on Knative and NIM.

2.23.39

02/26/2026

RUN-34624

Fixed an issue in Projects and Departments where GPU utilization/allocation metrics were not displayed if only partial data was available.

2.23.39

02/26/2026

RUN-36443

Fixed an issue where the dashboard returned a 500 error instead of an informative error message.

2.23.39

02/26/2026

RUN-36493

Fixed a security vulnerability related to GHSA-43fc-jf86-j433 with severity HIGH.

2.23.39

02/26/2026

RUN-36598

Fixed an issue where department data was not synced to the cluster, affecting both department creation and updates.

2.23.39

02/26/2026

RUN-37113

Fixed an issue where image strings that included a port number in the registry URL were not parsed correctly.

2.23.39

02/26/2026

RUN-37060

Fixed an issue where the NVLink total bytes per pod metric was labeled with GPU metrics labels instead of the expected pod labels.

2.23.39

02/26/2026

RUN-36560

Fixed an issue where the Connect button did not open the workspace URL for workspaces submitted through YAML.

2.23.39

02/26/2026

RUN-36370

Fixed an issue where NIM and HuggingFace inference templates failed to submit when a policy defined locked storage instances.

2.23.39

02/26/2026

RUN-35612

Fixed a security vulnerability related to CVE-2025-64756 with severity HIGH.

2.23.36

02/15/2026

RUN-36381

Fixed a security vulnerability related to GHSA-jmp9-x22r-554x with severity HIGH.

2.23.36

02/15/2026

RUN-36382

Fixed a security vulnerability related to GHSA-cv78-6m8q-ph82 with severity HIGH.

2.23.36

02/15/2026

RUN-36414

Fixed a security vulnerability related to CVE-2025-14459 and CVE-2025-64324 with severity HIGH.

2.23.36

02/15/2026

RUN-36457

Fixed an issue where, on rare occasions, the "Allocation ratio by node pool" widget showed incorrect data.

2.23.36

02/15/2026

RUN-36020

Fixed an issue where, when swap was enabled, the toolkit-reservation pod could enter an OutOfMemory state if the kubelet detected insufficient RAM at startup, and would not automatically recover once memory was freed.

2.23.36

02/15/2026

RUN-36505

Fixed an issue where, on rare occasions, a race condition in some of the metrics caused the average GPU utilization to be reported above 100%.

2.23.36

02/15/2026

RUN-36506

Fixed an issue where the UI showed the wrong GPU quotas for node pools associated with the "Default" department.

2.23.36

02/15/2026

RUN-36555

Fixed a security vulnerability related to CVE-2024-56171 with severity HIGH.

2.23.34

02/03/2026

RUN-35326

Fixed an issue where the Projects/Departments table in the Overview dashboard sometimes showed fewer than 15 projects/departments when their workloads did not have allocated GPUs or were not in Running or Pending status.

2.23.32

01/29/2026

RUN-35976

Fixed an issue where workloads submitted with names longer than 63 characters failed to schedule.

2.23.32

01/29/2026

RUN-35511

Fixed an issue where an incorrect FQDN used during certificate generation caused errors.

2.23.32

01/29/2026

RUN-35620

Fixed an issue where providing an invalid admin password during installation caused the tenant to become permanently stuck.

2.23.31

01/26/2026

RUN-35443

Fixed a security vulnerability related to CVE-2025-68973 with severity HIGH.

2.23.31

01/26/2026

RUN-35637

Fixed an issue where, when CPU quota and Limit projects from exceeding department quota were both enabled, updating department or project memory quotas to very large values failed with incorrect validation errors, even though the values were valid.

2.23.31

01/26/2026

RUN-35922

Fixed a security vulnerability related to CVE-2026-0861 with severity HIGH.

2.23.30

01/20/2026

RUN-35623

Fixed an issue where running runai logout returned 404 Not Found when the session token had already expired. The logout command now completes successfully and returns a clear message.

2.23.30

01/20/2026

RUN-35421

Fixed a security vulnerability related to CVE-2025-15284 with severity HIGH.

2.23.29

01/11/2026

RUN-32181

Fixed a security vulnerability related to CVE-2025-32988 with severity HIGH.

2.23.29

01/11/2026

RUN-33448

Fixed an issue where switching between workloads in the workload Details drawer displayed incorrect data, particularly the workload lifespan value.

2.23.29

01/11/2026

RUN-34379

Fixed an issue where image names longer than the display limit were truncated without providing access to the full name.

2.23.29

01/11/2026

RUN-34381

Fixed an issue where the Node column displayed a sort icon but did not actually sort results in the Running / Requested Pods modal.

2.23.29

01/11/2026

RUN-34607

Fixed issues where readiness probes did not work correctly with serving port authorization in single-node Knative inference workloads.

2.23.29

01/11/2026

RUN-34613

Fixed an issue where the Project GET API returned missing limit fields instead of an explicit unlimited value when CPU quotas were enabled.

2.23.29

01/11/2026

RUN-34620

Fixed an issue where, in rare cases, sessions could disconnect due to token refresh handling.

2.23.29

01/11/2026

RUN-34639

Fixed an issue where the Fully free GPU devices column displayed - instead of 0 when no fully free GPU devices were available under fractional GPU allocations.

2.23.29

01/11/2026

RUN-34680

Fixed a security vulnerability related to CVE-2025-58183 with severity HIGH.

2.23.29

01/11/2026

RUN-34720

Fixed a security vulnerability related to CVE-2025-65637 with severity HIGH.

2.23.29

01/11/2026

RUN-35189

Fixed an issue where the --working-dir parameter was ignored for Knative-based inference workloads, causing containers to start in / instead of the specified directory.

2.23.25

12/21/2025

RUN-34791

Fixed a GPU memory swap issue where, under certain circumstances, the GPU OOM killer could fail to select and preempt a GPU-consuming workload during out-of-memory or out-of-system-RAM errors.

2.23.24

12/21/2025

RUN-34631

Fixed an issue where the identity manager failed to start when the notification service was disabled.

2.23.24

12/21/2025

RUN-34633

Fixed an issue where department administrators could not include cluster-scope templates in workloads due to incorrect validation of permitted scopes.

2.23.24

12/21/2025

RUN-34758

Fixed an issue where setting a GPU memory limit caused workload creation to fail.

2.23.24

12/21/2025

RUN-34712

Fixed a security vulnerability related to CVE-2025-61729 with severity HIGH.

2.23.23

12/09/2025

RUN-34233

Fixed an issue with refresh-token handling in legacy Grafana dashboards that caused unexpected session logouts.

2.23.23

12/09/2025

RUN-33806

Fixed an issue where containers ran as root instead of a non-privileged user.

2.23.22

12/08/2025

RUN-31856

Fixed a security vulnerability related to CVE-2025-47907 with severity HIGH.

2.23.22

12/08/2025

RUN-33516

Fixed an issue where access rules created or deleted in a batch action were not each audited in the events history.

2.23.22

12/08/2025

RUN-33780

Fixed an issue where apps with the "Viewer" role could not access node metrics, even when they had read permissions at the cluster scope.

2.23.22

12/08/2025

RUN-34429

Fixed an issue where users with the correct project permissions could create templates but were blocked from saving edits due to incorrect permission checks.

2.23.21

12/01/2025

RUN-33313

Fixed an issue where the log viewer for distributed workloads displayed only a partial and unsorted list of pods.

2.23.21

12/01/2025

RUN-33802

Fixed an issue that caused distributed inference workloads to become unsynchronized.

2.23.21

12/01/2025

RUN-33862

Fixed an issue where the workloads service could enter a CrashLoopBackOff during upgrade.

2.23.21

12/01/2025

RUN-33947

Fixed an issue where SMTP configurations using the "none" option still sent empty username/password fields. Added the auth_none type to ensure no credentials are sent for passwordless SMTP servers.

2.23.20

11/19/2025

RUN-33642

Fixed an issue where the external-workload-integrator on OpenShift entered a constant reconcile loop, causing high CPU utilization.

2.23.18

11/18/2025

RUN-33613

Fixed missing validations for CPU resources when the CPU quota feature flag was disabled, which caused project and department updates to skip required CPU checks.

2.23.18

11/18/2025

RUN-33634

Fixed an issue where resource name validation failed for hugepage resources by enhancing validation rules to properly support hugepages.

2.23.18

11/18/2025

RUN-32680

Fixed an issue where logs were not displayed in the UI for workloads submitted using the Workloads v2 submission API.

2.23.17

11/04/2025

RUN-33091

Fixed an issue where workload logs initially loaded older logs instead of the most recent ones.

2.23.17

11/04/2025

RUN-33365

Fixed an issue where selecting an environment asset template in the flexible workload form did not present the capabilities field correctly.

2.23.17

11/04/2025

RUN-33418

Fixed an issue where the master spec was not inherited when creating a distributed workload from a template.

2.23.16

10/30/2025

RUN-32449

Fixed an issue where a race condition between the NVIDIA Run:ai operator and upgrade/install post hooks caused the upgrade to fail.

2.23.16

10/30/2025

RUN-32989

Fixed an issue where the NVIDIA Run:ai operator experienced unusually high CPU utilization after upgrade.

2.23.16

10/30/2025

RUN-33127

Fixed an issue where workload submission in the CLI failed when commands contained special characters.

2.23.16

10/30/2025

RUN-33144

Fixed a security vulnerability related to CVE-2025-62156 with severity HIGH.

2.23.16

10/30/2025

RUN-33235

Fixed a security vulnerability in the Valkey dependency.

2.23.16

10/30/2025

RUN-33388

Fixed an issue where dependency checks did not run properly for clusters installed with a remote control plane.

2.23.16

10/30/2025

RUN-33447

Fixed an issue where the API allowed creating a PVC asset without a claimName when existingPVC=false.

2.23.15

10/29/2025

RUN-33176

Fixed an issue where pagination in the Node Pool page did not respond.

2.23.15

10/29/2025

RUN-33006

Fixed an issue in the CLI installer where the PATH was not configured for all shells. The installer now correctly configures PATH for both zsh and bash.

2.23.14

10/27/2025

RUN-31803

Fixed an issue where the Quota management dashboard occasionally displayed incorrect GPU quota values.

2.23.14

10/27/2025

RUN-32159

Fixed an issue where the updatedBy field of a policy did not show the latest user who updated it.

2.23.14

10/27/2025

RUN-32730

Fixed an issue where incorrect average GPU utilization per project and workload type was displayed in the Projects view charts and tables.

2.23.14

10/27/2025

RUN-33036

Fixed an issue where the grace period preemption field in the UI was limited to 5 minutes, even when the workload policy allowed longer durations.

2.23.14

10/27/2025

RUN-33039

Fixed an issue where setting uid or gid to 0 during environment creation was not allowed.

2.23.14

10/27/2025

RUN-33053

Fixed an issue that caused conflicts with additional built-in Prometheus Operator deployments in OpenShift.

2.23.14

10/27/2025

RUN-33147

Fixed an issue where users with expired refresh tokens (after 24 hours) could not log in, as the token endpoint returned a 400 error.

2.23.14

10/27/2025

RUN-33168

Fixed an issue where certain policy calls failed when at least one unconfigured cluster existed in the system.

2.23.14

10/27/2025

RUN-33177

Fixed an issue where removing the logo in Branding settings displayed an empty square.

2.23.10

10/08/2025

RUN-31738

Fixed an issue where GPU fraction requests were not applied when submitting distributed workloads.

2.23.10

10/08/2025

RUN-32876

Fixed an issue where running a NIM inference workload on a fractional GPU prevented the Triton server from starting, causing inference endpoint requests to fail.
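Example: image strings with a registry port (RUN-37113)

The RUN-37113 entry above concerns image strings whose registry URL includes a port number. The product's actual parser is not shown in these notes, so the following is only a minimal sketch of the general pitfall, assuming the common `registry[:port]/repository[:tag]` layout. The function name `split_image_reference` and the example registry host are hypothetical. The key point is that a colon separates a tag only when it appears after the last slash. A colon before that belongs to the registry's `host:port`.

```python
def split_image_reference(image: str) -> tuple[str, str, str]:
    """Split an image string into (registry, repository, tag).

    Illustrative sketch only: digests are dropped and default-registry
    rules are simplified. Empty strings mean "not specified".
    """
    # Ignore an optional digest suffix such as "@sha256:...".
    name, _, _digest = image.partition("@")

    # A tag colon can only occur after the last slash; an earlier colon
    # is part of the registry's host:port.
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        name, tag = name[:last_colon], name[last_colon + 1:]
    else:
        tag = ""

    # Treat the first path component as a registry only if it looks like
    # a host (contains "." or ":", or is "localhost").
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    return "", name, tag


if __name__ == "__main__":
    for ref in (
        "registry.example.com:5000/team/app:1.2",
        "registry.example.com:5000/team/app",
        "ubuntu:22.04",
    ):
        print(ref, "->", split_image_reference(ref))
```

A parser that splits on the first colon would read `registry.example.com` as the image name and `5000/team/app:1.2` as the tag, which is the kind of misparse this sketch avoids.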
