Upgrade

This section explains how to upgrade NVIDIA Run:ai cluster version.

System and Network Requirements

Before upgrading the NVIDIA Run:ai cluster, validate that the latest system requirements and network requirements are met, as they can change from time to time.

Note

It is highly recommended to upgrade the Kubernetes version together with the NVIDIA Run:ai cluster version, to ensure compatibility with latest supported version of your Kubernetes distribution.

Helm

The latest releases of the NVIDIA Run:ai cluster require Helm 3.14 or above.

Upgrade

Follow the instructions to upgrade using Helm. The Helm commands to upgrade the NVIDIA Run:ai cluster version may differ between versions. The steps below describe how to get the instructions from the NVIDIA Run:ai UI.

Getting Installation Instructions

Follow the setup and installation instructions below to get the installation instructions to upgrade the NVIDIA Run:ai cluster.

Setup

In the NVIDIA Run:ai UI, go to Clusters
Select the cluster you want to upgrade
Click INSTALLATION INSTRUCTIONS
Optional: Select the NVIDIA Run:ai cluster version (latest, by default)
Click CONTINUE

Installation Instructions

Follow the installation instructions. Run the Helm commands provided on your Kubernetes cluster. See the below if installation fails.
Click DONE
Once installation is complete, validate the cluster is Connected and listed with the new cluster version (see the cluster troubleshooting scenarios). Once you have done this, the cluster is upgraded to the latest version.

Migrate from NGINX to HAProxy Ingress

Starting with v2.24, NVIDIA Run:ai recommends using HAProxy as the ingress controller. This change aligns with the announced retirement of the upstream NGINX Ingress Controller project. For more details, see the NGINX Ingress Controller retirement announcement.

Clusters upgraded from earlier versions typically already have NGINX installed. After upgrading to v2.24, follow the steps below to migrate ingress traffic from NGINX to HAProxy.

Check the Service Type of the Existing Ingress Controller

Before installing the HAProxy ingress controller, identify which ingress controller is currently in use. If your cluster already has an ingress controller installed, verify how it is exposed to avoid port or IP address conflicts.

kubectl get svc -n <nginx-namespace>

If the existing ingress controller uses NodePort, note the HTTP/HTTPS NodePort values to ensure HAProxy is configured with non-overlapping ports.
If the existing ingress controller uses LoadBalancer, no additional action is required.

When running more than one ingress controller in the same cluster, port conflicts are relevant only for NodePort-based setups. LoadBalancer-based controllers automatically receive separate external IP addresses.

Note

If your setup differs from the examples above, adjust the configuration accordingly. When using external LoadBalancer on top of Ingress with service type NodePort, you may need to update external resources to route traffic to HAProxy’s configured NodePort values.

Install and Configure HAProxy Ingress Controller

Ingress controllers can be installed and configured in different ways depending on your Kubernetes distribution and how you expose services (for example, NodePort vs. LoadBalancer).

The sections below provide environment-specific Helm installation examples. Select the option that matches your deployment environment.

Note

OpenShift and RKE2 include a pre-installed ingress controller by default.

Vanilla Kubernetes

If your cluster already has an ingress controller installed (for example, NGINX) and it is exposed via NodePort, configure HAProxy to use different NodePort values so both controllers can run simultaneously.

Ensure the selected NodePort values do not overlap with ports already used by the existing ingress controller.

helm repo add haproxytech https://haproxytech.github.io/helm-charts
helm repo update
helm install haproxy-kubernetes-ingress haproxytech/kubernetes-ingress \
  --create-namespace \
  --namespace haproxy-controller \
  --set controller.ingressClassResource.enabled=true \
  --set controller.service.type=NodePort \
  --set controller.service.nodePorts.http=32080 \
  --set controller.service.nodePorts.https=32443

Managed Kubernetes (EKS, GKE, AKS)

When using a LoadBalancer, each ingress controller automatically receives its own external IP address from the cloud provider. This allows multiple ingress controllers to run in the same cluster without additional configuration.

helm repo add haproxytech https://haproxytech.github.io/helm-charts
helm repo update
helm install haproxy-kubernetes-ingress haproxytech/kubernetes-ingress \
--create-namespace \
--namespace haproxy-controller \
--set controller.service.type=LoadBalancer \

Oracle Kubernetes Engine (OKE)

helm repo add haproxytech https://haproxytech.github.io/helm-charts
helm repo update
helm install haproxy-kubernetes-ingress haproxytech/kubernetes-ingress \
  --create-namespace \
  --namespace haproxy-controller \
  --set controller.kind=DaemonSet \
  --set controller.service.type=LoadBalancer \
  --set controller.service.externalTrafficPolicy=Local \
  --set controller.service.annotations."oci-network-load-balancer\.oraclecloud\.com/is-preserve-source"="True" \
  --set controller.service.annotations."oci-network-load-balancer\.oraclecloud\.com/security-list-management-mode"=All \
  --set controller.service.annotations."oci\.oraclecloud\.com/load-balancer-type"=nlb

Verify HAProxy Ingress

After installing the HAProxy ingress controller, verify that HAProxy ingresses are reachable before switching NVIDIA Run:ai components to use it. You can do this by deploying a simple hello-world application.

To run the test, identify the IP address that should reach the cluster’s nodes in your environment.

Create a local haproxy-test.yml file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: hashicorp/http-echo:1.0
        args:
          - "-text=hello from haproxy-ingress"
        ports:
          - containerPort: 5678
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 5678
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
spec:
  ingressClassName: haproxy
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello
            port:
              number: 80

Run the following command:
```
kubectl apply -f ha-proxy-test.yml
```

Once the application is deployed, access the cluster’s IP address in a browser. If the page displays “hello from haproxy-ingress”, HAProxy is functioning correctly and you can proceed with upgrading NVIDIA Run:ai.

Upgrade the Cluster

Setup

In the NVIDIA Run:ai UI, go to Clusters
Select the cluster you want to upgrade
Click INSTALLATION INSTRUCTIONS
Click CONTINUE

Installation Instructions

Follow the installation instructions. Run the Helm commands provided on your Kubernetes cluster.
If not present, add the following flag to the helm install command:
```
--set clusterConfig.global.ingress.ingressClass=haproxy
```
Click DONE
Once installation is complete, validate the cluster is Connected and listed with the new cluster version (see the cluster troubleshooting scenarios). Once you have done this, the cluster is upgraded and the workloads in this cluster will now use HAProxy instead of NGINX.

Troubleshooting

If you encounter an issue with the cluster upgrade, use the troubleshooting scenarios below.

Installation Fails

If the NVIDIA Run:ai cluster upgrade fails, check the installation logs to identify the issue.

Run the following script to print the installation logs:

curl -fsSL https://raw.githubusercontent.com/run-ai/public/main/installation/get-installation-logs.sh

Cluster Status

If the NVIDIA Run:ai cluster upgrade completes, but the cluster status does not show as Connected, refer to the cluster troubleshooting scenarios.

PreviousInstall the Cluster NextUninstall

Last updated 18 days ago

Good afternoon

hashtagSystem and Network Requirements

hashtagHelm

hashtagUpgrade

hashtagGetting Installation Instructions

hashtagSetup

hashtagInstallation Instructions

hashtagMigrate from NGINX to HAProxy Ingress

hashtagCheck the Service Type of the Existing Ingress Controller

hashtagInstall and Configure HAProxy Ingress Controller

hashtagVerify HAProxy Ingress

hashtagUpgrade the Cluster

hashtagSetup

hashtagInstallation Instructions

hashtagTroubleshooting

hashtagInstallation Fails

hashtagCluster Status

System and Network Requirements

Helm

Upgrade

Getting Installation Instructions

Setup

Installation Instructions

Migrate from NGINX to HAProxy Ingress

Check the Service Type of the Existing Ingress Controller

Install and Configure HAProxy Ingress Controller

Verify HAProxy Ingress

Upgrade the Cluster

Setup

Installation Instructions

Troubleshooting

Installation Fails

Cluster Status