Hotfixes for Version 2.23
This section provides details on all hotfixes available for version 2.23. Hotfixes are critical updates released between our major and minor versions to address specific issues or vulnerabilities. These updates ensure the system remains secure, stable, and optimized without requiring a full version upgrade.
2.23.68
04/30/2026
RUN-34648
Fixed an issue where an inference workload incorrectly displayed "Running" status when the pod was Pending after a scale-to-zero event.
2.23.67
04/30/2026
RUN-38467
Fixed a security vulnerability related to GHSA-pc3f-x583-g7j2 with severity HIGH.
2.23.67
04/30/2026
RUN-37894
Fixed a security vulnerability related to GHSA-xw7x-h9fj-p2c7 with severity CRITICAL.
2.23.66
04/27/2026
RUN-38510
Fixed a security vulnerability related to GHSA-9jj7-4m8r-rfcm with severity CRITICAL.
2.23.66
04/27/2026
RUN-38503
Fixed a security vulnerability related to GHSA-rp42-5vxx-qpwr with severity HIGH.
2.23.66
04/27/2026
RUN-38494
Fixed a security vulnerability related to GHSA-hfvc-g4fc-pqhx with severity HIGH.
2.23.66
04/27/2026
RUN-38430
Fixed an issue where workloads could not be submitted when the NVIDIA GPU Operator was deployed with the NRI plugin enabled.
2.23.66
04/27/2026
RUN-38428
Fixed a security vulnerability related to CVE-2026-40175 with severity CRITICAL.
2.23.66
04/27/2026
RUN-38405
Fixed an issue where uninstalling the cluster Helm chart on OpenShift failed because the runai-operator was missing permission to delete the runai-prometheus Secret in the openshift-monitoring namespace.
2.23.66
04/27/2026
RUN-38358
Fixed a security vulnerability related to CVE-2026-4424 with severity HIGH.
2.23.66
04/27/2026
RUN-38240
Fixed a security vulnerability related to GHSA-6v2p-p543-phr9 with severity HIGH.
2.23.66
04/27/2026
RUN-38183
Fixed a security vulnerability related to CVE-2026-21710 with severity HIGH.
2.23.66
04/27/2026
RUN-38173
Fixed a security vulnerability related to CVE-2026-32280 with severity HIGH.
2.23.66
04/27/2026
RUN-36522
Fixed a security vulnerability related to GHSA-37gf-gmxv-74wv with severity HIGH.
2.23.66
04/27/2026
RUN-36254
Fixed an issue where a race condition during webhook certificate generation caused failures.
2.23.65
04/19/2026
RUN-37532
Fixed an issue where workloads were slow to appear in the UI and API after being submitted.
2.23.65
04/19/2026
RUN-38029
Fixed an issue where the workloads pods API did not enforce project scope for user API tokens, allowing users to list pods across all projects.
2.23.65
04/19/2026
RUN-38099
Fixed a security vulnerability related to GHSA-hfvc-g4fc-pqhx with severity HIGH.
2.23.65
04/19/2026
RUN-38175
Fixed a security vulnerability related to CVE-2026-32280 with severity HIGH.
2.23.65
04/19/2026
RUN-38094
Fixed a security vulnerability related to CVE-2026-27654 with severity HIGH.
2.23.63
04/14/2026
RUN-38055
Fixed an issue where the access rules API accepted invalid subjectType values without returning a validation error.
2.23.63
04/14/2026
RUN-37959
Fixed an issue where automatic topology constraints for distributed workloads were applied at the wrong topology level.
2.23.63
04/14/2026
RUN-35919
Fixed an issue where db-migrations failed during control plane upgrades in the org-unit-service.
2.23.63
04/14/2026
RUN-37697
Fixed a security vulnerability related to GHSA-p77j-4mvh-x3m3 with severity CRITICAL.
2.23.63
04/14/2026
RUN-37984
Fixed a security vulnerability related to GHSA-r5fr-rjxr-66jc with severity HIGH.
2.23.63
04/14/2026
RUN-37895
Fixed a security vulnerability related to GHSA-c2c7-rcm5-vvqj with severity HIGH.
2.23.63
04/14/2026
RUN-37742
Fixed a security vulnerability related to CVE-2026-4111 with severity HIGH.
2.23.63
04/14/2026
RUN-37578
Fixed a security vulnerability related to GHSA-25h7-pfq9-p65f with severity HIGH.
2.23.63
04/14/2026
RUN-36615
Fixed a security vulnerability related to CVE-2024-12797 with severity HIGH.
2.23.61
03/27/2026
RUN-36559
Fixed an issue where tenant-level policy permissions could not delete policies belonging to scopes that no longer exist.
2.23.57
03/15/2026
RUN-37362
Fixed a security vulnerability related to CVE-2025-61732 with severity HIGH.
2.23.57
03/15/2026
RUN-37348
Fixed a security vulnerability related to CVE-2025-61726 with severity HIGH.
2.23.57
03/15/2026
RUN-37611
Fixed an issue in the distributed workload submission form where a project policy with a locked rule on storage instances could result in a failure to submit the workload.
2.23.57
03/15/2026
RUN-37504
Fixed a security vulnerability related to CVE-2026-25679 with severity HIGH.
2.23.57
03/15/2026
RUN-37170
Fixed a security vulnerability related to GHSA-23c5-xmqv-rm74 with severity HIGH.
2.23.57
03/15/2026
RUN-37169
Fixed a security vulnerability related to GHSA-5rq4-664w-9x2c with severity HIGH.
2.23.56
03/13/2026
RUN-37372
Fixed a security vulnerability related to CVE-2025-61731 with severity HIGH.
2.23.56
03/13/2026
RUN-37174
Fixed a security vulnerability related to GHSA-72hv-8253-57qq with severity HIGH.
2.23.55
03/11/2026
RUN-37341
Fixed a security vulnerability related to CVE-2025-61732 with severity HIGH.
2.23.46
03/09/2026
RUN-37278
Fixed a security vulnerability related to CVE-2024-1013 with severity HIGH.
2.23.45
03/09/2026
RUN-37167
Fixed a security vulnerability related to GHSA-72hv-8253-57qq with severity HIGH.
2.23.41
03/08/2026
RUN-37164
Fixed a security vulnerability related to GHSA-9h8m-3fm2-qjrq with severity HIGH.
2.23.40
03/06/2026
RUN-36734
Fixed an issue where the Analytics table displayed incorrect GPU Compute Utilization values for Training and Interactive workloads.
2.23.40
03/06/2026
RUN-36732
Fixed a security vulnerability related to GHSA-5vv4-hvf7-2h46 with severity HIGH.
2.23.39
02/26/2026
RUN-34875
Fixed an issue where enabling authentication and authorization prevented user metrics from being collected for inference workloads running on Knative and NIM.
2.23.39
02/26/2026
RUN-34624
Fixed an issue in Projects and Departments where GPU utilization/allocation metrics were not displayed if only partial data was available.
2.23.39
02/26/2026
RUN-36443
Fixed an issue where the dashboard returned a 500 error instead of an informative error message.
2.23.39
02/26/2026
RUN-36493
Fixed a security vulnerability related to GHSA-43fc-jf86-j433 with severity HIGH.
2.23.39
02/26/2026
RUN-36598
Fixed an issue where department data was not synced to the cluster, affecting both department creation and updates.
2.23.39
02/26/2026
RUN-37113
Fixed an issue where image strings that included a port number in the registry URL were not parsed correctly.
2.23.39
02/26/2026
RUN-37060
Fixed an issue where the NVLink total bytes per pod metric was labeled with GPU metrics labels instead of the expected pod labels.
2.23.39
02/26/2026
RUN-36560
Fixed an issue where the Connect button did not open the workspace URL for workspaces submitted through YAML.
2.23.39
02/26/2026
RUN-36370
Fixed an issue where NIM and HuggingFace inference templates failed to submit when a policy defined locked storage instances.
2.23.39
02/26/2026
RUN-35612
Fixed a security vulnerability related to CVE-2025-64756 with severity HIGH.
2.23.36
02/15/2026
RUN-36381
Fixed a security vulnerability related to GHSA-jmp9-x22r-554x with severity HIGH.
2.23.36
02/15/2026
RUN-36382
Fixed a security vulnerability related to GHSA-cv78-6m8q-ph82 with severity HIGH.
2.23.36
02/15/2026
RUN-36414
Fixed a security vulnerability related to CVE-2025-14459 and CVE-2025-64324 with severity HIGH.
2.23.36
02/15/2026
RUN-36457
Fixed an issue where, on rare occasions, the "Allocation ratio by node pool" widget showed incorrect data.
2.23.36
02/15/2026
RUN-36020
Fixed an issue where, when swap was enabled, the toolkit-reservation pod could enter an OutOfMemory state if the kubelet detected insufficient RAM at startup, and would not automatically recover once memory was freed.
2.23.36
02/15/2026
RUN-36505
Fixed an issue where, on rare occasions, a race condition in some metrics caused the average GPU utilization to be reported above 100%.
2.23.36
02/15/2026
RUN-36506
Fixed an issue where the UI showed the wrong GPU quotas for node pools associated with the "Default" department.
2.23.36
02/15/2026
RUN-36555
Fixed a security vulnerability related to CVE-2024-56171 with severity HIGH.
2.23.34
02/03/2026
RUN-35326
Fixed an issue where the Projects/Departments table in the Overview dashboard sometimes showed fewer than 15 projects/departments when their workloads did not have allocated GPUs or were not in Running or Pending status.
2.23.32
01/29/2026
RUN-35976
Fixed an issue where workloads submitted with names longer than 63 characters failed to schedule.
2.23.32
01/29/2026
RUN-35511
Fixed an issue where an incorrect FQDN used during certificate generation caused errors.
2.23.32
01/29/2026
RUN-35620
Fixed an issue where providing an invalid admin password during installation caused the tenant to become permanently stuck.
2.23.31
01/26/2026
RUN-35443
Fixed a security vulnerability related to CVE-2025-68973 with severity HIGH.
2.23.31
01/26/2026
RUN-35637
Fixed an issue where, when CPU quota and Limit projects from exceeding department quota were both enabled, updating department or project memory quotas to very large values failed with incorrect validation errors, even though the values were valid.
2.23.31
01/26/2026
RUN-35922
Fixed a security vulnerability related to CVE-2026-0861 with severity HIGH.
2.23.30
01/20/2026
RUN-35623
Fixed an issue where running runai logout returned 404 Not Found when the session token had already expired. The logout command now completes successfully and returns a clear message.
2.23.30
01/20/2026
RUN-35421
Fixed a security vulnerability related to CVE-2025-15284 with severity HIGH.
2.23.29
01/11/2026
RUN-32181
Fixed a security vulnerability related to CVE-2025-32988 with severity HIGH.
2.23.29
01/11/2026
RUN-33448
Fixed an issue where switching between workloads in the workload Details drawer displayed incorrect data, particularly the workload lifespan value.
2.23.29
01/11/2026
RUN-34379
Fixed an issue where image names longer than the display limit were truncated without providing access to the full name.
2.23.29
01/11/2026
RUN-34381
Fixed an issue where the Node column displayed a sort icon but did not actually sort results in the Running / Requested Pods modal.
2.23.29
01/11/2026
RUN-34607
Fixed issues where readiness probes did not work correctly with serving port authorization in single-node Knative inference workloads.
2.23.29
01/11/2026
RUN-34613
Fixed an issue where the Project GET API returned missing limit fields instead of an explicit unlimited value when CPU quotas were enabled.
2.23.29
01/11/2026
RUN-34620
Fixed an issue where, in rare cases, sessions could disconnect due to token refresh handling.
2.23.29
01/11/2026
RUN-34639
Fixed an issue where the Fully free GPU devices column displayed - instead of 0 when no fully free GPU devices were available under fractional GPU allocations.
2.23.29
01/11/2026
RUN-34680
Fixed a security vulnerability related to CVE-2025-58183 with severity HIGH.
2.23.29
01/11/2026
RUN-34720
Fixed a security vulnerability related to CVE-2025-65637 with severity HIGH.
2.23.29
01/11/2026
RUN-35189
Fixed an issue where the --working-dir parameter was ignored for Knative-based inference workloads, causing containers to start in / instead of the specified directory.
2.23.25
12/21/2025
RUN-34791
Fixed a GPU memory swap issue where, under certain circumstances, the GPU OOM killer could fail to select and preempt a GPU-consuming workload during out-of-memory or out-of-system-RAM errors.
2.23.24
12/21/2025
RUN-34631
Fixed an issue where the identity manager failed to start when the notification service was disabled.
2.23.24
12/21/2025
RUN-34633
Fixed an issue where department administrators could not include cluster-scope templates in workloads due to incorrect validation of permitted scopes.
2.23.24
12/21/2025
RUN-34758
Fixed an issue where setting a GPU memory limit caused workload creation to fail.
2.23.24
12/21/2025
RUN-34712
Fixed a security vulnerability related to CVE-2025-61729 with severity HIGH.
2.23.23
12/09/2025
RUN-34233
Fixed an issue with refresh-token handling in legacy Grafana dashboards that caused unexpected session logouts.
2.23.23
12/09/2025
RUN-33806
Fixed an issue where containers ran as root instead of a non-privileged user.
2.23.22
12/08/2025
RUN-31856
Fixed a security vulnerability related to CVE-2025-47907 with severity HIGH.
2.23.22
12/08/2025
RUN-33516
Fixed an issue where access rules created or deleted in a batch action were not individually audited in the events history. Each rule is now audited.
2.23.22
12/08/2025
RUN-33780
Fixed an issue where apps with the "Viewer" role could not access node metrics, even when they had read permissions at the cluster scope.
2.23.22
12/08/2025
RUN-34429
Fixed an issue where users with the correct project permissions could create templates but were blocked from saving edits due to incorrect permission checks.
2.23.21
12/01/2025
RUN-33313
Fixed an issue where the log viewer for distributed workloads displayed only a partial and unsorted list of pods.
2.23.21
12/01/2025
RUN-33802
Fixed an issue that caused distributed inference workloads to become unsynchronized.
2.23.21
12/01/2025
RUN-33862
Fixed an issue where the workloads service could enter a CrashLoopBackOff during upgrade.
2.23.21
12/01/2025
RUN-33947
Fixed an issue where SMTP configurations using the "none" option still sent empty username/password fields. Added the auth_none type to ensure no credentials are sent for passwordless SMTP servers.
2.23.20
11/19/2025
RUN-33642
Fixed an issue where the external-workload-integrator on OpenShift entered a constant reconcile loop, causing high CPU utilization.
2.23.18
11/18/2025
RUN-33613
Fixed missing validations for CPU resources when the CPU quota feature flag was disabled, which caused project and department updates to skip required CPU checks.
2.23.18
11/18/2025
RUN-33634
Fixed an issue where resource name validation failed for hugepage resources by enhancing validation rules to properly support hugepages.
2.23.18
11/18/2025
RUN-32680
Fixed an issue where logs were not displayed in the UI for workloads submitted using the Workloads v2 submission API.
2.23.17
11/04/2025
RUN-33091
Fixed an issue where workload logs initially loaded older logs instead of the most recent ones.
2.23.17
11/04/2025
RUN-33365
Fixed an issue where selecting an environment asset template in the flexible workload form did not present the capabilities field correctly.
2.23.17
11/04/2025
RUN-33418
Fixed an issue where the master spec was not inherited when creating a distributed workload from a template.
2.23.16
10/30/2025
RUN-32449
Fixed an issue where a race condition between the NVIDIA Run:ai operator and upgrade/install post hooks caused the upgrade to fail.
2.23.16
10/30/2025
RUN-32989
Fixed an issue where the NVIDIA Run:ai operator experienced unusually high CPU utilization after upgrade.
2.23.16
10/30/2025
RUN-33127
Fixed an issue where workload submission in the CLI failed when commands contained special characters.
2.23.16
10/30/2025
RUN-33144
Fixed a security vulnerability related to CVE-2025-62156 with severity HIGH.
2.23.16
10/30/2025
RUN-33235
Fixed a security vulnerability in the Valkey dependency.
2.23.16
10/30/2025
RUN-33388
Fixed an issue where dependency checks did not run properly for clusters installed with a remote control plane.
2.23.16
10/30/2025
RUN-33447
Fixed an issue where the API allowed creating a PVC asset without a claimName when existingPVC=false.
2.23.15
10/29/2025
RUN-33176
Fixed an issue where pagination in the Node Pool page did not respond.
2.23.15
10/29/2025
RUN-33006
Fixed an issue in the CLI installer where the PATH was not configured for all shells. The installer now correctly configures PATH for both zsh and bash.
2.23.14
10/27/2025
RUN-31803
Fixed an issue where the Quota management dashboard occasionally displayed incorrect GPU quota values.
2.23.14
10/27/2025
RUN-32159
Fixed an issue where the updatedBy field of a policy did not show the latest user who updated it.
2.23.14
10/27/2025
RUN-32730
Fixed an issue where incorrect average GPU utilization per project and workload type was displayed in the Projects view charts and tables.
2.23.14
10/27/2025
RUN-33036
Fixed an issue where the grace period preemption field in the UI was limited to 5 minutes, even when the workload policy allowed longer durations.
2.23.14
10/27/2025
RUN-33039
Fixed an issue where setting uid or gid to 0 during environment creation was not allowed.
2.23.14
10/27/2025
RUN-33053
Fixed an issue that caused conflicts with additional built-in Prometheus Operator deployments in OpenShift.
2.23.14
10/27/2025
RUN-33147
Fixed an issue where users with expired refresh tokens (after 24 hours) could not log in, as the token endpoint returned a 400 error.
2.23.14
10/27/2025
RUN-33168
Fixed an issue where certain policy calls failed when at least one unconfigured cluster existed in the system.
2.23.14
10/27/2025
RUN-33177
Fixed an issue where removing the logo in Branding settings displayed an empty square.
2.23.10
10/08/2025
RUN-31738
Fixed an issue where GPU fraction requests were not applied when submitting distributed workloads.
2.23.10
10/08/2025
RUN-32876
Fixed an issue where running a NIM inference workload on a fractional GPU prevented the Triton server from starting, causing inference endpoint requests to fail.
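Illustrative note for RUN-37113 (image strings with a port number in the registry URL): the sketch below shows one common way to avoid mistaking the registry's host:port colon for the tag separator. It is not the product's implementation. The function name split_image_reference and the example registry host are hypothetical, and digest references (name@sha256:...) are deliberately not handled.

```python
def split_image_reference(image: str) -> tuple[str, str, str]:
    """Split an image string into (registry, repository, tag).

    Hypothetical helper for illustration only; does not handle digests.
    """
    # A tag separator can only appear after the last "/". A colon that
    # occurs earlier belongs to the registry's host:port.
    last_slash = image.rfind("/")
    last_colon = image.rfind(":")
    if last_colon > last_slash:
        name, tag = image[:last_colon], image[last_colon + 1:]
    else:
        name, tag = image, ""

    # Treat the first path component as a registry only when it looks
    # like a host: it contains a dot or a port, or is "localhost".
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    return "", name, tag


if __name__ == "__main__":
    print(split_image_reference("registry.example.com:5000/team/app:1.2"))
    print(split_image_reference("registry.example.com:5000/team/app"))
    print(split_image_reference("ubuntu:22.04"))
```

A naive split on the first colon would cut the first example at the port and yield a tag of "5000/team/app:1.2"; comparing the position of the last colon with the last slash avoids that.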