Hotfixes for Version 2.21

This section lists all hotfixes available for version 2.21. Hotfixes are critical updates released between major and minor versions to address specific issues or vulnerabilities. They keep the system secure and stable without requiring a full version upgrade.
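When checking whether an installed version already contains a given hotfix, compare the version components numerically rather than as strings. The following is a minimal sketch, not an official tool. The helper names `parse_version` and `includes_hotfix` are hypothetical, and the sketch assumes that hotfix releases within 2.21 are cumulative, which these notes do not state explicitly.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version such as '2.21.9' into integers for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def includes_hotfix(installed: str, hotfix: str) -> bool:
    """Return True if `installed` is on the same major.minor line and at or above `hotfix`.

    Assumption (not stated in these notes): hotfix releases are cumulative,
    so a later patch number contains all earlier fixes.
    """
    installed_v, hotfix_v = parse_version(installed), parse_version(hotfix)
    same_line = installed_v[:2] == hotfix_v[:2]
    return same_line and installed_v >= hotfix_v


if __name__ == "__main__":
    # A plain string comparison would wrongly rank '2.21.9' above '2.21.25'.
    print(includes_hotfix("2.21.9", "2.21.25"))   # False
    print(includes_hotfix("2.21.25", "2.21.16"))  # True
```

Comparing integer tuples avoids the lexical ordering problem, where "2.21.9" sorts after "2.21.25" as a string.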

Version
Date (DD/MM/YYYY)
Internal ID
Description

2.21.25

11/06/2025

RUN-29548

Fixed a typo in the documentation where the API key was incorrectly written as enforceRun:aiScheduler instead of the correct enforceRunaiScheduler.

2.21.25

11/06/2025

RUN-29320

Fixed an issue in CLI v2 where the update server did not receive the terminal size during exec commands requiring TTY support. The terminal size is now set once upon session creation, ensuring proper behavior for interactive sessions.

2.21.24

08/06/2025

RUN-29282

Fixed a security vulnerability in golang.org/x/crypto related to CVE-2025-22869 with severity HIGH.

2.21.23

08/06/2025

RUN-28891

  • Fixed a security vulnerability in golang.org/x/crypto related to CVE-2024-45337 with severity HIGH.

  • Fixed a security vulnerability in go-git/go-git related to CVE-2025-21613 with severity HIGH.

2.21.23

08/06/2025

RUN-25281

Fixed an issue where deploying a Hugging Face model with vLLM using the Hugging Face inference UI form on an OpenShift environment failed due to permission errors.

2.21.22

03/06/2025

RUN-29341

Fixed an issue that caused high CPU usage in the Cluster API.

2.21.22

03/06/2025

RUN-29323

Fixed an issue where Prometheus failed to send metrics for OpenShift.

2.21.19

27/05/2025

RUN-29093

Fixed an issue where rotating the runai-config webhook secret caused the app.kubernetes.io/managed-by=helm label to be removed.

2.21.18

27/05/2025

RUN-28286

Fixed an issue where CPU-only workloads incorrectly triggered idle timeout notifications intended for GPU workloads.

2.21.18

27/05/2025

RUN-28555

Fixed an issue in Admin → General Settings where the "Disabled" workloads count displayed inconsistently between the collapsed and expanded views.

2.21.18

27/05/2025

RUN-26361

Fixed an issue where Prometheus remote-write credentials were not properly updated on OpenShift clusters.

2.21.18

27/05/2025

RUN-28780

Fixed an issue where Hugging Face model validation incorrectly blocked some valid models supported by vLLM and TGI.

2.21.18

27/05/2025

RUN-28851

Fixed an issue in CLI v2 where the port-forward command terminated SSH connections after 15–30 seconds due to an idle timeout.

2.21.18

27/05/2025

RUN-25281

Fixed an issue where the Hugging Face UI submission flow failed on OpenShift (OCP) clusters.

2.21.17

21/05/2025

RUN-28266

Fixed an issue where the documentation examples for the runai workload delete CLI command were incorrect.

2.21.17

21/05/2025

RUN-28609

Fixed an issue where users with the ML Engineer role were unable to delete multiple inference jobs at once.

2.21.17

21/05/2025

RUN-28665

Fixed an issue where using servingPort authorization fields in the Create an inference API on unsupported clusters did not return an error.

2.21.17

21/05/2025

RUN-28717

Fixed an issue where the Update inference spec API documentation listed an incorrect response code.

2.21.17

21/05/2025

RUN-28755

Fixed an issue where the tooltip next to the External URL for an inference endpoint incorrectly stated that the URL was internal.

2.21.17

21/05/2025

RUN-28762

Fixed an issue with the inference workload ownership protection.

2.21.17

21/05/2025

RUN-28859

Fixed an issue where the knative.enable-scale-to-zero setting did not default to true as expected.

2.21.17

21/05/2025

RUN-28923

Fixed an issue where calling the Get node telemetry data API with the telemetryType IDLE_ALLOCATED_GPUS resulted in a 500 Internal Server Error.

2.21.17

21/05/2025

RUN-28950

Fixed a security vulnerability in github.com/moby and github.com/docker/docker related to CVE-2024-41110 with severity CRITICAL.

2.21.16

18/05/2025

RUN-27295

Fixed an issue in CLI v2 where the --node-type flag for inference workloads was not properly propagated to the pod specification.

2.21.16

18/05/2025

RUN-27375

Fixed an issue where projects were not visible in the legacy job submission form, preventing users from selecting a target project.

2.21.16

18/05/2025

RUN-27514

Fixed an issue where disabling CPU quota in the General settings did not remove existing CPU quotas from projects and departments.

2.21.16

18/05/2025

RUN-27521

Fixed a security vulnerability in axios related to CVE-2025-27152 with severity HIGH.

2.21.16

18/05/2025

RUN-27638

Fixed an issue where a node pool’s placement strategy stopped functioning correctly after being edited.

2.21.16

18/05/2025

RUN-27438

Fixed an issue where MPI jobs were unavailable due to an OpenShift MPI Operator installation error.

2.21.16

18/05/2025

RUN-27952

Fixed a security vulnerability in emacs-filesystem related to CVE-2025-1244 with severity HIGH.

2.21.16

18/05/2025

RUN-28244

Fixed a security vulnerability in liblzma5 related to CVE-2025-31115 with severity HIGH.

2.21.16

18/05/2025

RUN-28006

Fixed an issue where tokens became invalid for the API server after one hour.

2.21.16

18/05/2025

RUN-28097

Fixed an issue where the allocated_gpu_count_per_gpu metric displayed incorrect data for fractional pods.

2.21.16

18/05/2025

RUN-28213

Fixed a security vulnerability in golang.org/x/crypto related to CVE-2025-22869 with severity HIGH.

2.21.16

18/05/2025

RUN-28311

Fixed an issue where user creation failed with a duplicate email error, even though the email address did not exist in the system.

2.21.16

18/05/2025

RUN-28832

Fixed the inference CLI v2 documentation so that its examples reflect correct usage.

2.21.15

30/04/2025

RUN-27533

Fixed an issue where workloads with idle GPUs were not suspended after exceeding the configured idle time.

2.21.14

29/04/2025

RUN-26608

Added a flag to the cli config set command and to the CLI install script that allows users to set a cache directory.

2.21.14

29/04/2025

RUN-27264

Fixed an issue where creating a project from the UI with a non-unlimited deserved CPU value caused the queue to be created with limit = deserved instead of unlimited.

2.21.14

29/04/2025

RUN-27484

Fixed an issue where duplicate app.kubernetes.io/name labels were applied to services in the control plane Helm chart.

2.21.14

29/04/2025

RUN-27502

Fixed the inference CLI commands documentation: --max-replicas and --min-replicas were incorrectly used instead of --max-scale and --min-scale.

2.21.14

29/04/2025

RUN-27513

Fixed an issue where cluster-scoped policies were not visible to users with appropriate permissions.

2.21.14

29/04/2025

RUN-27515

Fixed an issue where users were unable to use assets from an upper scope during flexible workload submissions.

2.21.14

29/04/2025

RUN-27520

Fixed an issue where adding access rules immediately after creating an application did not refresh the access rules table.

2.21.14

29/04/2025

RUN-27628

Fixed an issue where a node pool could remain stuck in Updating status in certain cases.

2.21.14

29/04/2025

RUN-27826

Fixed an issue where the runai inference update command reported success (because the update is asynchronous) but the update itself often failed, so the new spec was not applied.

2.21.14

29/04/2025

RUN-27915

Fixed an issue where the "Improved Command Line Interface" admin setting was incorrectly labeled as Beta instead of Stable.

2.21.11

29/04/2025

RUN-27251

  • Fixed a security vulnerability in github.com/golang-jwt/jwt/v4 and github.com/golang-jwt/jwt/v5 related to CVE-2025-30204 with severity HIGH.

  • Fixed a security vulnerability in golang.org/x/net related to CVE-2025-22872 with severity MEDIUM.

  • Fixed a security vulnerability in knative.dev/serving related to CVE-2023-48713 with severity MEDIUM.

2.21.11

29/04/2025

RUN-27309

Fixed an issue where workloads configured with multiple node pools could, after an initial scheduling failure, continue to fail scheduling on a specific node pool even after sufficient resources became available.

2.21.10

29/04/2025

RUN-26992

Fixed an issue where workloads submitted with an invalid node port range would get stuck in Creating status.

2.21.10

29/04/2025

RUN-27497

Fixed an issue where, after deleting an SSO user and immediately creating a local user, the delete confirmation dialog reappeared unexpectedly.

2.21.9

15/04/2025

RUN-26989

Fixed an issue that prevented reordering node pools in the workload submission form.

2.21.9

15/04/2025

RUN-27247

Fixed security vulnerabilities in the Spring framework used by the db-mechanic service: CVE-2021-27568, CVE-2021-44228, CVE-2022-22965, CVE-2023-20873, CVE-2024-22243, CVE-2024-22259, and CVE-2024-22262.

2.21.9

15/04/2025

RUN-26359

Fixed an issue in CLI v2 where using the --toleration option required incorrect mandatory fields.
