Hotfixes for Version 2.23

This section provides details on all hotfixes available for version 2.23. Hotfixes are critical updates released between our major and minor versions to address specific issues or vulnerabilities. These updates ensure the system remains secure, stable, and optimized without requiring a full version upgrade.

| Version | Date (DD/MM/YYYY) | Internal ID | Description |
| --- | --- | --- | --- |
| 2.23.29 | 11/01/2026 | RUN-32181 | Fixed a security vulnerability related to CVE-2025-32988 with severity HIGH. |
| 2.23.29 | 11/01/2026 | RUN-33448 | Fixed an issue where switching between workloads in the workload Details drawer displayed incorrect data, particularly the workload lifespan value. |
| 2.23.29 | 11/01/2026 | RUN-34379 | Fixed an issue where image names longer than the display limit were truncated without providing access to the full name. |
| 2.23.29 | 11/01/2026 | RUN-34381 | Fixed an issue where the Node column displayed a sort icon but did not actually sort results in the Running / Requested Pods modal. |
| 2.23.29 | 11/01/2026 | RUN-34607 | Fixed issues where readiness probes did not work correctly with serving port authorization in single-node Knative inference workloads. |
| 2.23.29 | 11/01/2026 | RUN-34613 | Fixed an issue where the Project GET API returned missing limit fields instead of an explicit unlimited value when CPU quotas were enabled. |
| 2.23.29 | 11/01/2026 | RUN-34620 | Fixed an issue where, in rare cases, sessions could disconnect due to token refresh handling. |
| 2.23.29 | 11/01/2026 | RUN-34639 | Fixed an issue where the Fully free GPU devices column displayed - instead of 0 when no fully free GPU devices were available under fractional GPU allocations. |
| 2.23.29 | 11/01/2026 | RUN-34680 | Fixed a security vulnerability related to CVE-2025-58183 with severity HIGH. |
| 2.23.29 | 11/01/2026 | RUN-34720 | Fixed a security vulnerability related to CVE-2025-65637 with severity HIGH. |
| 2.23.29 | 11/01/2026 | RUN-35189 | Fixed an issue where the --working-dir parameter was ignored for Knative-based inference workloads, causing containers to start in / instead of the specified directory. |
| 2.23.25 | 21/12/2025 | RUN-34791 | Fixed a GPU memory swap issue where, under certain circumstances, the GPU OOM killer could fail to select and preempt a GPU-consuming workload during out-of-memory or out of system RAM errors. |
| 2.23.24 | 21/12/2025 | RUN-34631 | Fixed an issue where the identity manager failed to start when the notification service was disabled. |
| 2.23.24 | 21/12/2025 | RUN-34633 | Fixed an issue where department administrators could not include cluster-scope templates in workloads due to incorrect validation of permitted scopes. |
| 2.23.24 | 21/12/2025 | RUN-34758 | Fixed an issue where setting a GPU memory limit caused workload creation to fail. |
| 2.23.24 | 21/12/2025 | RUN-34712 | Fixed a security vulnerability related to CVE-2025-61729 with severity HIGH. |
| 2.23.23 | 09/12/2025 | RUN-34233 | Fixed an issue with refresh-token handling in legacy Grafana dashboards that caused unexpected session logouts. |
| 2.23.23 | 09/12/2025 | RUN-33806 | Fixed an issue where containers ran as root instead of a non-privileged user. |
| 2.23.22 | 08/12/2025 | RUN-31856 | Fixed a security vulnerability related to CVE-2025-47907 with severity HIGH. |
| 2.23.22 | 08/12/2025 | RUN-33516 | Fixed an issue so that each access rule created or deleted in a batch action is now audited in the events history. |
| 2.23.22 | 08/12/2025 | RUN-33780 | Fixed an issue where apps with the “Viewer” role could not access node metrics, even when they had read permissions at the cluster scope. |
| 2.23.22 | 08/12/2025 | RUN-34429 | Fixed an issue where users with the correct project permissions could create templates but were blocked from saving edits due to incorrect permission checks. |
| 2.23.21 | 01/12/2025 | RUN-33313 | Fixed an issue where the log viewer for distributed workloads displayed only a partial and unsorted list of pods. |
| 2.23.21 | 01/12/2025 | RUN-33802 | Fixed an issue that caused distributed inference workloads to become unsynchronized. |
| 2.23.21 | 01/12/2025 | RUN-33862 | Fixed an issue where the workloads service could enter a CrashLoopBackOff during upgrade. |
| 2.23.21 | 01/12/2025 | RUN-33947 | Fixed an issue where SMTP configurations using the “none” option still sent empty username/password fields. Added the auth_none type to ensure no credentials are sent for passwordless SMTP servers. |
| 2.23.20 | 19/11/2025 | RUN-33642 | Fixed an issue where the external-workload-integrator on OpenShift entered a constant reconcile loop, causing high CPU utilization. |
| 2.23.18 | 18/11/2025 | RUN-33613 | Fixed missing validations for CPU resources when the CPU quota feature flag was disabled, which caused project and department updates to skip required CPU checks. |
| 2.23.18 | 18/11/2025 | RUN-33634 | Fixed an issue where resource name validation failed for hugepage resources by enhancing validation rules to properly support hugepages. |
| 2.23.18 | 18/11/2025 | RUN-32680 | Fixed an issue where logs were not displayed in the UI for workloads submitted using the Workloads v2 submission API. |
| 2.23.17 | 04/11/2025 | RUN-33091 | Fixed an issue where workload logs initially loaded older logs instead of the most recent ones. |
| 2.23.17 | 04/11/2025 | RUN-33365 | Fixed an issue where selecting an environment asset template in the flexible workload form did not present the capabilities field correctly. |
| 2.23.17 | 04/11/2025 | RUN-33418 | Fixed an issue where the master spec was not inherited when creating a distributed workload from a template. |
| 2.23.16 | 30/10/2025 | RUN-32449 | Fixed an issue where a race condition between the NVIDIA Run:ai operator and upgrade/install post hooks caused the upgrade to fail. |
| 2.23.16 | 30/10/2025 | RUN-32989 | Fixed an issue where the NVIDIA Run:ai operator experienced unusually high CPU utilization after upgrade. |
| 2.23.16 | 30/10/2025 | RUN-33127 | Fixed an issue where workload submission in the CLI failed when commands contained special characters. |
| 2.23.16 | 30/10/2025 | RUN-33144 | Fixed a security vulnerability related to CVE-2025-62156 with severity HIGH. |
| 2.23.16 | 30/10/2025 | RUN-33235 | Fixed a security vulnerability in the Valkey dependency. |
| 2.23.16 | 30/10/2025 | RUN-33388 | Fixed an issue where dependency checks did not run properly for clusters installed with a remote control plane. |
| 2.23.16 | 30/10/2025 | RUN-33447 | Fixed an issue where the API allowed creating a PVC asset without a claimName when existingPVC=false. |
| 2.23.15 | 29/10/2025 | RUN-33176 | Fixed an issue where pagination in the Node Pool page did not respond. |
| 2.23.15 | 29/10/2025 | RUN-33006 | Fixed an issue in the CLI installer where the PATH was not configured for all shells. The installer now correctly configures PATH for both zsh and bash. |
| 2.23.14 | 27/10/2025 | RUN-31803 | Fixed an issue where the Quota management dashboard occasionally displayed incorrect GPU quota values. |
| 2.23.14 | 27/10/2025 | RUN-32159 | Fixed an issue where the updatedBy field of a policy did not show the latest user who updated it. |
| 2.23.14 | 27/10/2025 | RUN-32730 | Fixed an issue where incorrect average GPU utilization per project and workload type was displayed in the Projects view charts and tables. |
| 2.23.14 | 27/10/2025 | RUN-33036 | Fixed an issue where the grace period preemption field in the UI was limited to 5 minutes, even when the workload policy allowed longer durations. |
| 2.23.14 | 27/10/2025 | RUN-33039 | Fixed an issue where setting uid or gid to 0 during environment creation was not allowed. |
| 2.23.14 | 27/10/2025 | RUN-33053 | Fixed an issue that caused conflicts with additional built-in Prometheus Operator deployments in OpenShift. |
| 2.23.14 | 27/10/2025 | RUN-33147 | Fixed an issue where users with expired refresh tokens (after 24 hours) could not log in, as the token endpoint returned a 400 error. |
| 2.23.14 | 27/10/2025 | RUN-33168 | Fixed an issue where certain policy calls failed when at least one unconfigured cluster existed in the system. |
| 2.23.14 | 27/10/2025 | RUN-33177 | Fixed an issue where removing the logo in Branding settings displayed an empty square. |
| 2.23.10 | 08/10/2025 | RUN-31738 | Fixed an issue where GPU fraction requests were not applied when submitting distributed workloads. |
| 2.23.10 | 08/10/2025 | RUN-32876 | Fixed an issue where running a NIM inference workload on a fractional GPU prevented the Triton server from starting, causing inference endpoint requests to fail. |
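
When checking an installed version against the hotfix versions listed here, compare the version components as numbers rather than as text, because a plain string comparison ranks 2.23.9 above 2.23.10. The following is a minimal Python sketch of such a check. The function names `parse_version` and `includes_hotfix` are hypothetical and not part of any product tooling, and the sketch assumes that each hotfix release also contains the fixes from earlier 2.23 hotfixes, which this page does not state explicitly.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version such as '2.23.10' into integers for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def includes_hotfix(installed: str, hotfix: str) -> bool:
    """Return True if the installed version is at or above the hotfix version.

    Assumption (not confirmed by this page): hotfix releases are cumulative,
    so a later 2.23.x release contains the fixes of every earlier one.
    """
    return parse_version(installed) >= parse_version(hotfix)


if __name__ == "__main__":
    # Compare an installed version against two hotfix versions from the table.
    print(includes_hotfix("2.23.29", "2.23.10"))
    print(includes_hotfix("2.23.9", "2.23.10"))
```

Comparing tuples of integers keeps the ordering correct when the patch number grows from one digit to two.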
