Upgrade

Before Upgrade

Before upgrading, apply the prerequisites for your current NVIDIA Run:ai version and for every intermediate version, up to and including the version you are upgrading to.

To ensure a smooth and supported upgrade process:

  • Align control plane and cluster versions - For best results, upgrade the control plane and cluster components to the same NVIDIA Run:ai version during the same maintenance window. Keeping versions aligned helps avoid unexpected behavior caused by version mismatches and ensures full compatibility across platform components.

  • Upgrade order - When performing an upgrade:

    • Upgrade the control plane Helm chart first

    • Upgrade the cluster Helm chart only after the control plane upgrade completes successfully
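One way to enforce the upgrade order is to check the control plane release status before touching the cluster chart. The following is a minimal sketch, not an official procedure: the function name `control_plane_deployed` is hypothetical, and the default release name and namespace (`runai-backend`) are assumptions that you should replace with the values used in your installation.

```shell
# Hypothetical helper: succeeds only when the Helm release reports "deployed".
# The default release name and namespace are assumptions; pass your own as $1 and $2.
control_plane_deployed() {
  release="${1:-runai-backend}"
  namespace="${2:-runai-backend}"
  # `helm status -o json` includes the release status under info.status.
  helm status "$release" -n "$namespace" -o json \
    | grep -Eq '"status"[[:space:]]*:[[:space:]]*"deployed"'
}

# Example usage:
#   control_plane_deployed && echo "Control plane upgraded; safe to upgrade the cluster chart"
```

A "deployed" status only confirms that Helm finished applying the release; checking that the control plane pods are running is still advisable before continuing.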

Helm

NVIDIA Run:ai requires Helm 3.14 or later. Before you continue, validate your installed Helm client version. To install or upgrade Helm, see Installing Helm.
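The version check can be scripted. The sketch below assumes a shell with GNU `sort -V` available; the helper name `version_at_least` is hypothetical and only illustrates the comparison against the 3.14 minimum stated above.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), which is available in GNU coreutils.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="3.14.0"

if command -v helm >/dev/null 2>&1; then
  # `helm version --short` prints a string such as v3.14.2+gc309b6f;
  # strip the leading "v" and the "+<commit>" suffix before comparing.
  INSTALLED="$(helm version --short | sed -e 's/^v//' -e 's/+.*$//')"
  if version_at_least "$INSTALLED" "$REQUIRED"; then
    echo "Helm $INSTALLED meets the minimum version $REQUIRED"
  else
    echo "Helm $INSTALLED is older than $REQUIRED; upgrade Helm first"
  fi
else
  echo "helm client not found in PATH"
fi
```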

Software Files

Run the following commands to add the NVIDIA Run:ai Helm repository and browse the available versions:

helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod
helm repo update
helm search repo -l runai-backend

Upgrade Control Plane

System and Network Requirements

Before upgrading the NVIDIA Run:ai control plane, validate that the latest system requirements and network requirements are met, as they can change from time to time.

Upgrade

To upgrade, run the following:
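As a rough sketch of what such an upgrade can look like, the helper below saves the values currently applied to the release and then upgrades the `runai-backend/control-plane` chart to a given version. The function name `upgrade_control_plane`, the release name, the namespace, and the values file name are all assumptions made for illustration; confirm them against your installation before running anything.

```shell
# Assumed names; adjust to match your installation.
RELEASE="runai-backend"
NAMESPACE="runai-backend"

# Hypothetical helper: upgrade_control_plane <VERSION>
upgrade_control_plane() {
  version="$1"
  if [ -z "$version" ]; then
    echo "usage: upgrade_control_plane <VERSION>" >&2
    return 1
  fi
  # Save the user-supplied values of the current release so they can be reviewed and reused.
  helm get values "$RELEASE" -n "$NAMESPACE" -o yaml > runai_control_plane_values.yaml || return 1
  # Upgrade the release in place to the requested chart version.
  helm upgrade "$RELEASE" runai-backend/control-plane \
    -n "$NAMESPACE" \
    --version "$version" \
    -f runai_control_plane_values.yaml
}

# Example usage:
#   upgrade_control_plane "<VERSION>"
```

Saving the values to a file first makes it possible to review what will be carried over to the new chart version.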

Note

To upgrade to a specific version, modify the --version flag by specifying the desired <VERSION>. You can find all available versions by using the helm search repo runai-backend/control-plane --versions command.

Migrate from NGINX to HAProxy Ingress

Starting with v2.24, NVIDIA Run:ai recommends using HAProxy as the ingress controller. This change aligns with the announced retirement of the upstream NGINX Ingress Controller project. For more details, see the NGINX Ingress Controller retirement announcement.

Clusters upgraded from earlier versions typically already have NGINX installed. After upgrading to v2.24, follow the steps below to migrate ingress traffic from NGINX to HAProxy.

Check the Service Type of the Existing Ingress Controller

Before installing the HAProxy ingress controller, identify which ingress controller is currently in use. If your cluster already has an ingress controller installed, check how its Service is exposed (for example, NodePort or LoadBalancer) to avoid port conflicts.

If the existing ingress controller uses NodePort, note the HTTP/HTTPS NodePort values to ensure HAProxy is configured with non-overlapping ports.
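A quick way to see how an existing ingress controller is exposed is to list Services together with their type and NodePort values. The sketch below is illustrative only: the function name `list_ingress_services` is hypothetical, and it assumes the controller's Service name contains the word "ingress", which may not hold in every environment.

```shell
# Hypothetical helper: list Services whose name mentions "ingress",
# showing the Service type and any NodePort values.
list_ingress_services() {
  kubectl get svc --all-namespaces \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type,NODEPORTS:.spec.ports[*].nodePort' \
    | awk 'NR == 1 || tolower($2) ~ /ingress/'   # keep the header row and matching names
}

# Example usage:
#   list_ingress_services
```

If the TYPE column shows NodePort, record the values in the NODEPORTS column; these are the ports HAProxy must avoid.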

Note

If your setup differs from the examples above, adjust the configuration accordingly. When an external load balancer sits in front of an ingress controller exposed with service type NodePort, you may need to update external resources to route traffic to HAProxy’s configured NodePort values.

Install and Configure HAProxy Ingress Controller

If your cluster already has an ingress controller installed (for example, NGINX) and it is exposed via NodePort, configure HAProxy to use different NodePort values so both controllers can run simultaneously.

Ensure the selected NodePort values do not overlap with ports already used by the existing ingress controller.
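The overlap check can be done by hand, but a small helper makes it repeatable. This is a sketch under stated assumptions: the function name `ports_overlap` and the port numbers in the usage example are hypothetical, and the 30000-32767 range mentioned in the comment is the Kubernetes default NodePort range, which a cluster administrator may have changed.

```shell
# Hypothetical helper: succeeds (and reports the port) when any port appears in both lists.
#   $1: space-separated NodePorts used by the existing ingress controller
#   $2: space-separated NodePorts planned for HAProxy
ports_overlap() {
  for existing in $1; do
    for candidate in $2; do
      if [ "$existing" = "$candidate" ]; then
        echo "conflict on port $candidate"
        return 0
      fi
    done
  done
  return 1
}

# Example usage (illustrative port numbers within the default 30000-32767 NodePort range):
#   ports_overlap "30080 30443" "31080 31443" || echo "no overlap; ports are safe to use"
```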

Verify HAProxy Ingress

After installing the HAProxy ingress controller, verify that HAProxy ingresses are reachable before switching NVIDIA Run:ai components to use it. You can do this by deploying a simple hello-world application.

To run the test, identify an IP address through which the cluster’s nodes are reachable in your environment.

  1. Create a local haproxy-test.yml file:

  2. Run the following command:
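The two steps above could look like the following sketch. It is not the manifest from the product documentation: the resource names, the `hashicorp/http-echo` image (which serves a fixed text response on port 5678), and the `haproxy` ingress class name are assumptions chosen for illustration, so adjust them to match the HAProxy installation in your cluster.

```shell
# Step 1: write a hypothetical test manifest to haproxy-test.yml.
cat > haproxy-test.yml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: haproxy-test
  template:
    metadata:
      labels:
        app: haproxy-test
    spec:
      containers:
        - name: echo
          image: hashicorp/http-echo          # assumed test image
          args: ["-text=hello from haproxy-ingress"]
          ports:
            - containerPort: 5678
---
apiVersion: v1
kind: Service
metadata:
  name: haproxy-test
spec:
  selector:
    app: haproxy-test
  ports:
    - port: 80
      targetPort: 5678
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: haproxy-test
spec:
  ingressClassName: haproxy                   # assumed ingress class name
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: haproxy-test
                port:
                  number: 80
EOF

# Step 2 (requires cluster access): apply the manifest.
#   kubectl apply -f haproxy-test.yml
```

After testing, the resources can be removed with `kubectl delete -f haproxy-test.yml`.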

Once the application is deployed, access the cluster’s IP address in a browser. If the page displays “hello from haproxy-ingress”, HAProxy is functioning correctly and you can proceed with upgrading NVIDIA Run:ai.

Upgrade the Control Plane

Run the following Helm command to update the NVIDIA Run:ai control plane to use HAProxy instead of NGINX.
