Install the Control Plane

In this section, you will install the NVIDIA Run:ai control plane on your Kubernetes cluster using Helm. The control plane provides the central management layer for NVIDIA Run:ai, handling multi-cluster management, resource and access management, and workload submission and monitoring.

This procedure includes:

  • Adding the NVIDIA Run:ai Helm repository from NGC or JFrog

  • Configuring key settings such as domain name, ingress, and administrator credentials

  • Deploying the control plane into the runai-backend namespace

After you complete this process, the NVIDIA Run:ai control plane will be running in your cluster and accessible via the configured domain.

System and Network Requirements

Before installing the NVIDIA Run:ai control plane, verify that the system requirements and network requirements are met. For air-gapped environments, make sure you have prepared the software artifacts.

Permissions

As part of the installation, you will install the NVIDIA Run:ai control plane Helm charts. The Helm charts require Kubernetes administrator permissions. You can review the exact objects that the charts create by using the --dry-run flag on both Helm charts.

Installation

Note

Artifact Source

Starting with v2.24, NVIDIA Run:ai artifacts are available on both NVIDIA NGC and JFrog. NGC is the recommended artifact source. JFrog remains supported in v2.24 but will be removed in a future release. For connected environments, follow the instructions for your artifact source in the sections below. For air-gapped environments, the installation steps are the same regardless of artifact source. Artifacts are prepared in the Preparations step.

Kubernetes

Connected

Run the following command and update the values as described below:

  • Replace <DOMAIN> in global.domain=<DOMAIN> with the fully qualified domain name (FQDN) obtained in the Fully qualified domain name section.

  • The recommended ingress controller is HAProxy. If you are using a different ingress controller, update the ingress class: --set global.ingress.ingressClass=<ingress class>

  • If you are using a local certificate authority, add --set global.customCA.enabled=true to the Helm command as described in the Local certificate authority section.

  • Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator's email address.

  • Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:

    • Minimum Length: Passwords must be at least 8 characters long.

    • Digits: Must contain at least 1 numeric digit (0-9).

    • Lowercase Characters: Must contain at least 1 lowercase letter (a-z).

    • Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).

    • Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).
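The password requirements above can be checked before running the Helm command. The following is a minimal sketch, not part of the NVIDIA Run:ai tooling: the names PASSWORD_RULES and unmet_password_rules are hypothetical, and the definition of "special character" (anything that is not an ASCII letter, digit, or whitespace) is an assumption, since the list only gives examples.

```python
import re

# Rules transcribed from the requirements list above.
# Each entry pairs a human-readable description with a check function.
PASSWORD_RULES = [
    ("at least 8 characters", lambda p: len(p) >= 8),
    ("at least 1 digit (0-9)", lambda p: re.search(r"[0-9]", p) is not None),
    ("at least 1 lowercase letter (a-z)", lambda p: re.search(r"[a-z]", p) is not None),
    ("at least 1 uppercase letter (A-Z)", lambda p: re.search(r"[A-Z]", p) is not None),
    # Assumption: "special" means any character that is not an ASCII
    # letter, digit, or whitespace. The product may accept a narrower set.
    ("at least 1 special character", lambda p: re.search(r"[^A-Za-z0-9\s]", p) is not None),
]


def unmet_password_rules(password: str) -> list[str]:
    """Return the descriptions of all rules the password does not satisfy."""
    return [description for description, check in PASSWORD_RULES if not check(password)]
```

An empty list means the candidate password satisfies every listed rule. A non-empty list names the rules that still need to be met.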

NGC (Recommended)

Replace <NGC_API_KEY> with your NGC API key.

For example:

JFrog

For example:

Note

To preview what will be installed before the actual installation, add the dry-run flag --dry-run=client to the Helm command.

Air-gapped

Run the following command and update the values as described below. The custom-env.yaml file is created during the preparations step:

  • Replace control-plane-<VERSION>.tgz with the full filename of the control plane Helm chart (e.g., control-plane-2.24.58.tgz), located in the chart folder of the extracted software artifacts.

  • Replace <DOMAIN> in global.domain=<DOMAIN> with the fully qualified domain name (FQDN) obtained in the Fully qualified domain name section.

  • Set global.customCA.enabled=true as described in the Local certificate authority section.

  • The recommended ingress controller is HAProxy. If you are using a different ingress controller, update the ingress class: --set global.ingress.ingressClass=<ingress class>

  • Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator’s email address.

  • Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:

    • Minimum Length: Passwords must be at least 8 characters long.

    • Digits: Must contain at least 1 numeric digit (0-9).

    • Lowercase Characters: Must contain at least 1 lowercase letter (a-z).

    • Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).

    • Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).

Run the following command from the root of the extracted software artifacts directory:

For example:

Note

To preview what will be installed before the actual installation, add the dry-run flag --dry-run=client to the Helm command.

OpenShift

Connected

Run the following command and update the values as described below:

  • Replace <OPENSHIFT-CLUSTER-DOMAIN> with the domain configured for the OpenShift cluster. To determine the OpenShift cluster domain, run oc get routes -A.

  • If you are using a local certificate authority, add --set global.customCA.enabled=true to the Helm command as described in the Local certificate authority section.

  • Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator's email address.

  • Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:

    • Minimum Length: Passwords must be at least 8 characters long.

    • Digits: Must contain at least 1 numeric digit (0-9).

    • Lowercase Characters: Must contain at least 1 lowercase letter (a-z).

    • Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).

    • Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).

NGC (Recommended)

Replace <NGC_API_KEY> with your NGC API key.

For example:

JFrog

For example:

Note

To preview what will be installed before the actual installation, add the dry-run flag --dry-run=client to the Helm command.

Air-gapped

Run the following command and update the values as described below. The custom-env.yaml file is created during the preparations step:

  • Replace control-plane-<VERSION>.tgz with the full filename of the control plane Helm chart (e.g., control-plane-2.24.49.tgz), located in the chart folder of the extracted software artifacts.

  • Replace <OPENSHIFT-CLUSTER-DOMAIN> with the domain configured for the OpenShift cluster. To determine the OpenShift cluster domain, run oc get routes -A.

  • Set global.customCA.enabled=true as described in the Local certificate authority section.

  • Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator’s email address.

  • Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:

    • Minimum Length: Passwords must be at least 8 characters long.

    • Digits: Must contain at least 1 numeric digit (0-9).

    • Lowercase Characters: Must contain at least 1 lowercase letter (a-z).

    • Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).

    • Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).

Run the following command from the root of the extracted software artifacts directory:

For example:

Note

To preview what will be installed before the actual installation, add the dry-run flag --dry-run=client to the Helm command.

Connect to NVIDIA Run:ai User Interface

Note

After installing the NVIDIA Run:ai control plane, it may take up to 15 minutes for the UI to become accessible. Accessing it too early may result in a "server cannot be reached" error.

  1. Open your browser and go to:

https://runai.<DOMAIN>.local

  2. Log in using the administrator credentials provided during the installation. It is recommended to change the password after the first login.
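Because the UI can take up to 15 minutes to become accessible, it can be convenient to poll the URL instead of retrying in the browser. The following is a minimal sketch using only the Python standard library. The function names, the 15-second polling interval, and the rule that any HTTP response counts as "reachable" are assumptions for illustration, not part of the NVIDIA Run:ai tooling.

```python
import time
import urllib.error
import urllib.request


def ui_is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server at `url` answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status, so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or TLS failure.
        # Note: a certificate signed by a local CA that this machine does
        # not trust also lands here and is reported as "not reachable".
        return False


def wait_for_ui(url, max_wait_s=15 * 60, interval_s=15,
                probe=ui_is_reachable, sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until the probe succeeds or `max_wait_s` seconds elapse.

    `probe`, `sleep`, and `clock` are injectable so the loop can be
    exercised without network access or real waiting.
    """
    deadline = clock() + max_wait_s
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

To use the sketch, call wait_for_ui with the UI address from step 1 (with <DOMAIN> replaced by your value). It returns True as soon as the server responds and False if the wait limit passes first.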
