# Upgrade

## Before Upgrade

Before proceeding with the upgrade, it's crucial to apply the specific prerequisites associated with your current version of NVIDIA Run:ai and every version in between up to the version you are upgrading to.

To ensure a smooth and supported upgrade process:

* **Align control plane and cluster versions** - For best results, upgrade the control plane and cluster components to the same NVIDIA Run:ai version during the same maintenance window. Keeping versions aligned helps avoid unexpected behavior caused by version mismatches and ensures full compatibility across platform components.
* **Upgrade order** - When performing an upgrade:
  * Upgrade the control plane Helm chart first
  * Upgrade the cluster Helm chart only after the control plane upgrade completes successfully

## Helm

NVIDIA Run:ai requires [Helm](https://helm.sh/) 3.14 or later. Before you continue, validate your installed helm client version. To install or upgrade Helm, see [Installing Helm](https://helm.sh/docs/intro/install/). If you are installing an air-gapped version of NVIDIA Run:ai, the NVIDIA Run:ai tar file contains the helm binary.

## Software Files

{% tabs %}
{% tab title="Connected" %}
Run the following commands to add the NVIDIA Run:ai Helm repository and browse the available versions:

```bash
helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod
helm repo update
helm search repo -l runai-backend
```

{% endtab %}

{% tab title="Air-gapped" %}
Run the following command to browse all available air-gapped packages using the token provided by NVIDIA Run:ai.

To download and extract a specific version, and to upload the container images to your private registry, see the [Preparations](https://run-ai-docs.nvidia.com/self-hosted/2.22/getting-started/installation/install-using-helm/preparations) section.

```bash
curl -H "Authorization: Bearer <token>" "https://runai.jfrog.io/artifactory/api/storage/runai-airgapped-prod/?list"
```

{% endtab %}
{% endtabs %}

## Upgrade Control Plane

### System and Network Requirements

Before upgrading the NVIDIA Run:ai control plane, validate that the latest [system requirements](https://run-ai-docs.nvidia.com/self-hosted/2.22/getting-started/installation/install-using-helm/cp-system-requirements) and [network requirements](https://run-ai-docs.nvidia.com/self-hosted/2.22/getting-started/installation/install-using-helm/network-requirements) are met, as they can change from time to time.

### Upgrade

{% hint style="info" %}
**Note**

To upgrade to a specific version, modify the `--version` flag by specifying the desired `<VERSION>`. You can find all available versions by using the `helm search repo runai-backend/control-plane --versions` command.
{% endhint %}

**Upgrading from Version 2.16**

You must perform a two-step upgrade:

1. Upgrade to version 2.18:

{% tabs %}
{% tab title="Connected" %}

```bash
helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend -n runai-backend runai-backend/control-plane --version "2.18.0" -f runai_control_plane_values.yaml --reset-then-reuse-values
```

{% endtab %}

{% tab title="Air-gapped" %}

```bash
helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend control-plane-2.18.0.tgz -n runai-backend -f runai_control_plane_values.yaml --reset-then-reuse-values
```

{% endtab %}
{% endtabs %}

2. Then upgrade to the required version:

{% tabs %}
{% tab title="Connected" %}

```bash
helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend -n runai-backend runai-backend/control-plane --version "<VERSION>" -f runai_control_plane_values.yaml --reset-then-reuse-values
```

{% endtab %}

{% tab title="Air-gapped" %}

```bash
helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend control-plane-<NEW-VERSION>.tgz -n runai-backend -f runai_control_plane_values.yaml --reset-then-reuse-values
```

{% endtab %}
{% endtabs %}

**Upgrading from Version 2.17 or Later**

If your current version is 2.17 or higher, you can upgrade directly to the required version:

{% tabs %}
{% tab title="Connected" %}

```bash
helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend -n runai-backend runai-backend/control-plane --version "<VERSION>" -f runai_control_plane_values.yaml --reset-then-reuse-values
```

{% endtab %}

{% tab title="Air-gapped" %}

```bash
helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend control-plane-<NEW-VERSION>.tgz -n runai-backend -f runai_control_plane_values.yaml --reset-then-reuse-values
```

{% endtab %}
{% endtabs %}

## Upgrade Cluster

### System and Network Requirements

Before upgrading the NVIDIA Run:ai cluster, validate that the latest [system requirements](https://run-ai-docs.nvidia.com/self-hosted/2.22/getting-started/installation/install-using-helm/system-requirements) and [network requirements](https://run-ai-docs.nvidia.com/self-hosted/2.22/getting-started/installation/install-using-helm/network-requirements) are met, as they can change from time to time.

{% hint style="info" %}
**Note**

It is highly recommended to upgrade the Kubernetes version together with the NVIDIA Run:ai cluster version, to ensure compatibility with latest supported version of your [Kubernetes distribution](https://run-ai-docs.nvidia.com/self-hosted/2.22/getting-started/installation/system-requirements#kubernetes-distribution).
{% endhint %}

### Getting Installation Instructions

Follow the setup and installation instructions below to get the installation instructions to upgrade the NVIDIA Run:ai cluster.

{% hint style="info" %}
**Note**

To upgrade to a specific version, modify the `--version` flag by specifying the desired `<VERSION>`. You can find all available versions by using the `helm search repo runai/runai-cluster --versions` command.
{% endhint %}

#### Setup

1. In the NVIDIA Run:ai UI, go to **Clusters**
2. Select the cluster you want to upgrade
3. Click **INSTALLATION INSTRUCTIONS**
4. Optional: Select the NVIDIA Run:ai cluster version (latest, by default)
5. Click **CONTINUE**

#### Installation Instructions

1. Follow the installation instructions. Run the Helm commands provided on your Kubernetes cluster. See the below if [installation fails](#installation-fails).
2. Click **DONE**
3. Once installation is complete, validate the cluster is **Connected** and listed with the new cluster version (see the [cluster troubleshooting scenarios](https://run-ai-docs.nvidia.com/self-hosted/2.22/infrastructure-setup/procedures/clusters#troubleshooting-scenarios)). Once you have done this, the cluster is upgraded to the latest version.

### Troubleshooting

If you encounter an issue with the cluster upgrade, use the troubleshooting scenarios below.

#### Installation Fails

If the NVIDIA Run:ai cluster upgrade fails, check the installation logs to identify the issue.

Run the following script to print the installation logs:

```bash
curl -fsSL https://raw.githubusercontent.com/run-ai/public/main/installation/get-installation-logs.sh
```

#### Cluster Status

If the NVIDIA Run:ai cluster upgrade completes, but the cluster status does not show as **Connected**, refer to [Troubleshooting scenarios](https://run-ai-docs.nvidia.com/self-hosted/2.22/infrastructure-setup/procedures/clusters#troubleshooting-scenarios).
