# Install the Cluster

In this section you will install the NVIDIA Run:ai cluster on your Kubernetes environment using Helm. The cluster extends Kubernetes with NVIDIA Run:ai orchestration capabilities - scheduling and workload management - and connects to NVIDIA’s cloud-hosted control plane for centralized management.

The first time you access the NVIDIA Run:ai UI, an onboarding wizard opens automatically. The wizard guides you through the cluster setup and generates a Helm installation command.

This procedure includes:

* Adding the NVIDIA Run:ai Helm repository from NGC or JFrog
* Installing the NVIDIA Run:ai cluster into the `runai` namespace
* Registering the NVIDIA Run:ai cluster with the NVIDIA Run:ai control plane using the provided connection details

After you complete this process, the NVIDIA Run:ai cluster is connected to NVIDIA’s cloud-hosted control plane and ready to run training, inference, and other workloads.

## System and Network Requirements

Before installing the NVIDIA Run:ai cluster, validate that the [system requirements](https://github.com/run-ai/runai-product-docs/blob/SaaS/getting-started/installation/system-requirements.md) and [network requirements](/saas/getting-started/installation/install-using-helm/network-requirements.md) are met.

Once all the requirements are met, it is highly recommended to run the NVIDIA Run:ai cluster preinstall diagnostics tool. The tool:

* Tests the requirements above, as well as failure points related to Kubernetes, NVIDIA, storage, and networking
* Inspects additional installed components and analyzes their relevance to a successful installation

To run the preinstall diagnostics tool, [download](https://runai.jfrog.io/ui/native/pd-cli-prod/preinstall-diagnostics-cli/) the latest version, and run:

```bash
chmod +x ./preinstall-diagnostics-<platform> && \
./preinstall-diagnostics-<platform> \
  --domain ${COMPANY_NAME}.run.ai \
  --cluster-domain ${CLUSTER_FQDN}
```

For more information, see [preinstall diagnostics](https://github.com/run-ai/preinstall-diagnostics).
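
The command above reads `COMPANY_NAME` and `CLUSTER_FQDN` from the shell environment, so both must be set before it runs. The following is a minimal sketch, not part of the tool itself, that checks the two variables and prints the resulting arguments. The helper name `diag_args` and the example values `acme` and `runai.example.com` are hypothetical.

```bash
# Print the preinstall diagnostics arguments, or fail if a variable is missing.
# diag_args is a hypothetical helper, not part of the diagnostics tool.
diag_args() {
  if [ -z "${COMPANY_NAME:-}" ] || [ -z "${CLUSTER_FQDN:-}" ]; then
    echo "Set COMPANY_NAME and CLUSTER_FQDN before running the tool" >&2
    return 1
  fi
  echo "--domain ${COMPANY_NAME}.run.ai --cluster-domain ${CLUSTER_FQDN}"
}

# Example values (assumptions): replace with your tenant name and cluster FQDN.
COMPANY_NAME="acme"
CLUSTER_FQDN="runai.example.com"
diag_args
```

Checking the variables first avoids passing a malformed `--domain` value such as `.run.ai` when `COMPANY_NAME` is empty.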

## Helm

The NVIDIA Run:ai cluster requires Helm 3.14 or above. To install Helm, see [Helm Install](https://helm.sh/docs/helm/helm_install/).
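
To confirm that the installed Helm client meets the 3.14 minimum, a small version check along the following lines may help. This is a sketch under the assumption that `helm version --short` prints a string such as `v3.14.0+g3fc9f4b`. The helper name `helm_version_ok` is hypothetical.

```bash
# Return 0 if the given Helm version string (for example "v3.14.2+gabc")
# is 3.14 or above, 1 if it is older, and 2 if it cannot be parsed.
helm_version_ok() {
  _hv="${1#v}"              # strip a leading "v"
  _hv_major="${_hv%%.*}"    # text before the first dot
  _hv_rest="${_hv#*.}"      # text after the first dot
  _hv_minor="${_hv_rest%%.*}"
  for _hv_part in "$_hv_major" "$_hv_minor"; do
    case "$_hv_part" in
      ''|*[!0-9]*) return 2 ;;   # empty or not a number
    esac
  done
  [ "$_hv_major" -gt 3 ] || { [ "$_hv_major" -eq 3 ] && [ "$_hv_minor" -ge 14 ]; }
}

# Only query Helm when the client is actually installed.
if command -v helm >/dev/null 2>&1; then
  if helm_version_ok "$(helm version --short)"; then
    echo "Helm version is supported"
  else
    echo "Helm 3.14 or above is required" >&2
  fi
fi
```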

{% hint style="info" %}
**Note**

Helm 4 defaults to [server-side apply](https://helm.sh/docs/overview/#server-side-apply) when installing a new chart release, which can conflict with resources managed by the NVIDIA Run:ai operator. If you are installing with Helm 4, append `--server-side=false` to your `helm upgrade` command. NVIDIA Run:ai clusters originally installed with Helm 3.x are unaffected.
{% endhint %}
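
If the same installation script has to work with both Helm 3.x and Helm 4, the flag from the note above can be added conditionally. The sketch below assumes the version string begins with `v<major>.`, as printed by `helm version --short`. The helper name `helm_apply_flag` is hypothetical.

```bash
# Print "--server-side=false" for Helm 4 or later and nothing for Helm 3.x.
# Returns 1 if the major version cannot be parsed.
helm_apply_flag() {
  _af="${1#v}"              # strip a leading "v"
  _af_major="${_af%%.*}"    # text before the first dot
  case "$_af_major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  if [ "$_af_major" -ge 4 ]; then
    echo "--server-side=false"
  fi
}

# Example with a fixed version string; in practice pass "$(helm version --short)".
EXTRA_FLAG="$(helm_apply_flag "v4.0.0")"
echo "Extra helm upgrade flag: ${EXTRA_FLAG:-none}"
```

The printed flag can then be appended to the `helm upgrade` command generated by the wizard.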

## Permissions

To ensure a successful installation, it is recommended to use a Kubernetes user with the `cluster-admin` role. For more information, see [Using RBAC authorization](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).
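
One way to check the current user's permissions before installing is `kubectl auth can-i`. The sketch below asks whether the user may perform every verb on every resource in all namespaces, which is what `cluster-admin` grants. It approximates the role check and does not verify the role binding itself. The helper name `has_cluster_wide_access` is hypothetical.

```bash
# Return 0 if the current kubectl user can do everything in all namespaces.
has_cluster_wide_access() {
  [ "$(kubectl auth can-i '*' '*' --all-namespaces 2>/dev/null)" = "yes" ]
}

# Only run the check when kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  if has_cluster_wide_access; then
    echo "Current user has cluster-wide access"
  else
    echo "Current user lacks cluster-wide access; the installation may fail" >&2
  fi
fi
```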

## Installation

Follow these instructions to install using Helm.

{% hint style="info" %}
**Note**

* To customize the installation based on your environment, see [Advanced cluster configurations](/saas/infrastructure-setup/advanced-setup/cluster-config.md).
* You can store the `clientSecret` as a Kubernetes secret within the cluster instead of using plain text. You can then configure the installation to use it by setting the `controlPlane.existingSecret` and `controlPlane.secretKeys.clientSecret` parameters as described in [Advanced cluster configurations](/saas/infrastructure-setup/advanced-setup/cluster-config.md).
  {% endhint %}
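
As a rough illustration of the second option in the note above, the sketch below creates a Kubernetes secret and prints the matching Helm flags. It assumes that `controlPlane.existingSecret` takes the secret name and `controlPlane.secretKeys.clientSecret` takes the key inside that secret. Confirm both against the Advanced cluster configurations page. The helper names, the secret name `runai-client-secret`, and the key `client-secret` are hypothetical.

```bash
# Create a secret in the runai namespace that holds the client secret.
# $1 = secret name, $2 = key inside the secret, $3 = client secret value
create_client_secret() {
  kubectl create secret generic "$1" \
    --namespace runai \
    --from-literal="$2=$3"
}

# Print the Helm flags that point the chart at the secret (assumed semantics).
# $1 = secret name, $2 = key inside the secret
secret_helm_flags() {
  echo "--set controlPlane.existingSecret=$1 --set controlPlane.secretKeys.clientSecret=$2"
}

# Show the flags for the hypothetical names; nothing is created here.
secret_helm_flags "runai-client-secret" "client-secret"
```

The `runai` namespace must exist before the secret is created, for example with `kubectl create namespace runai`. The printed flags would replace `--set controlPlane.clientSecret=...` in the installation command.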

### Artifact Source <a href="#artifact-source" id="artifact-source"></a>

Starting with v2.24, NVIDIA Run:ai artifacts are available on both NVIDIA NGC and JFrog. NGC is the recommended artifact source. JFrog remains supported but will be removed in a future release.

### Adding a New Cluster

When adding a cluster for the first time, the onboarding wizard opens automatically when you log in to the NVIDIA Run:ai platform. You cannot perform other actions in the platform until the cluster is created.

1. Enter a unique name for your cluster and click **CONTINUE**.
2. Set your **Cluster URL** by entering the URL of the Kubernetes cluster. The URL is accessible only from within your organization's network. For more information, see [Fully Qualified Domain Name (FQDN)](/saas/getting-started/installation/install-using-helm/system-requirements.md#fully-qualified-domain-name-fqdn).
3. Click **CONTINUE**.

#### Installation Instructions

The next section of the wizard presents the NVIDIA Run:ai cluster installation steps.

1. Before installing the NVIDIA Run:ai cluster, ensure that all required [system](/saas/getting-started/installation/install-using-helm/system-requirements.md) and [network](/saas/getting-started/installation/install-using-helm/network-requirements.md) requirements are met.
2. The NVIDIA Run:ai platform displays the Helm installation command in the cluster wizard. Follow the instructions for your artifact source.

{% tabs %}
{% tab title="NGC (Recommended)" %}
Modify the UI-generated command as follows:

* Add `--username='$oauthtoken'` and `--password=<NGC_API_KEY>` to the `helm repo add` command, and replace `<NGC_API_KEY>` with your NGC API key.
* If you are using a local certificate authority, add `--set global.customCA.enabled=true` to the Helm command as described in the [Local certificate authority](/saas/getting-started/installation/install-using-helm/system-requirements.md#local-certificate-authority) section.
* The recommended ingress controller is HAProxy. If you are using a different ingress controller, update the `clusterConfig.global.ingress.ingressClass` value to match it.

```bash
helm repo add runai https://helm.ngc.nvidia.com/nvidia/runai --force-update \
  --username='$oauthtoken' \
  --password=<NGC_API_KEY>
helm repo update
helm upgrade -i runai-cluster runai/runai-cluster -n runai \
  --set controlPlane.url=... \
  --set controlPlane.clientSecret=... \
  --set cluster.uid=... \
  --set cluster.url=... --version="<VERSION>" --create-namespace \
  --set clusterConfig.global.ingress.ingressClass=haproxy
```

{% endtab %}

{% tab title="JFrog" %}
Run the Helm commands as shown in the UI, with the following adjustments where they apply:

* If you are using a local certificate authority, add `--set global.customCA.enabled=true` to the Helm command as described in the [Local certificate authority](/saas/getting-started/installation/install-using-helm/system-requirements.md#local-certificate-authority) section.
* The recommended ingress controller is HAProxy. If you are using a different ingress controller, update the `clusterConfig.global.ingress.ingressClass` value to match it.

```bash
helm repo add runai https://runai.jfrog.io/artifactory/api/helm/run-ai-charts --force-update
helm repo update
helm upgrade -i runai-cluster runai/runai-cluster -n runai \
  --set controlPlane.url=... \
  --set controlPlane.clientSecret=... \
  --set cluster.uid=... \
  --set cluster.url=... --version="<VERSION>" --create-namespace \
  --set clusterConfig.global.ingress.ingressClass=haproxy
```

{% endtab %}
{% endtabs %}

The wizard displays **Waiting for cluster to connect** while the cluster is being installed and connected to the control plane. Once the installation completes successfully and the cluster establishes communication with the control plane, the wizard updates to **Cluster connected**. After completing the wizard flow, the cluster is added to the [Clusters](/saas/infrastructure-setup/procedures/clusters.md) table.
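
While the wizard shows **Waiting for cluster to connect**, it can be useful to look at the Helm release and the pods in the `runai` namespace. The sketch below is illustrative only. `pending_pods` is a hypothetical helper that counts pods whose status is neither `Running` nor `Completed`. A pod that is `Running` is not necessarily ready, and a count of 0 can also mean that no pods were found.

```bash
# Count pods in the runai namespace that are neither Running nor Completed.
# With --no-headers, the third column of `kubectl get pods` is the STATUS.
pending_pods() {
  kubectl get pods -n runai --no-headers 2>/dev/null |
    awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Only query the cluster when both clients are installed.
if command -v kubectl >/dev/null 2>&1 && command -v helm >/dev/null 2>&1; then
  helm status runai-cluster -n runai || echo "Release runai-cluster not found" >&2
  echo "Pods not yet Running or Completed: $(pending_pods)"
fi
```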

## Troubleshooting

If you encounter an issue with the installation, try the troubleshooting scenarios below.

### Installation

If the NVIDIA Run:ai cluster installation failed, check the installation logs to identify the issue. Run the following script to print the installation logs:

{% file src="/files/MqckgdW2v7HB4rKj98sU" %}

### Cluster Status

If the NVIDIA Run:ai cluster installation completed but the cluster status did not change to **Connected**, check the cluster [troubleshooting scenarios](/saas/infrastructure-setup/procedures/clusters.md#troubleshooting-scenarios).

## Next Steps

Once the cluster is installed and connected, the NVIDIA Run:ai UI guides you through post-installation configurations. These steps are optional but recommended for a production setup:

* **SSO (Single Sign-On)** - Configure SSO to allow users to log in with your organization's identity provider. See [SSO](/saas/infrastructure-setup/authentication/sso.md) for setup instructions using SAML or OpenID Connect.
* **Create your first research team** - Set up [projects](/saas/platform-management/aiinitiatives/organization/projects.md) and [permissions](/saas/infrastructure-setup/authentication/accessrules.md) to organize your AI practitioners and allocate GPU resources.

