Install the Control Plane

In this section, you will install the NVIDIA Run:ai control plane on your Kubernetes cluster using Helm. The control plane provides the central management layer for NVIDIA Run:ai, handling multi-cluster management, resource and access management, and workload submission and monitoring.

This procedure includes:

  • Adding the NVIDIA Run:ai Helm repository from NGC or JFrog

  • Configuring key settings such as domain name, ingress, and administrator credentials

  • Deploying the control plane into the runai-backend namespace

When you complete this process, the NVIDIA Run:ai control plane will be running in your cluster and accessible via the configured domain.

System and Network Requirements

Before installing the NVIDIA Run:ai control plane, validate that the system requirements and network requirements are met. For air-gapped environments, make sure you have the software artifacts prepared.

Permissions

As part of the installation, you will be required to install the NVIDIA Run:ai control plane Helm charts. The Helm charts require Kubernetes administrator permissions. You can review the exact objects that the charts create by using the --dry-run flag on both Helm charts.

Installation

Note

Artifact Source

Starting with v2.24, NVIDIA Run:ai artifacts are available on both NVIDIA NGC and JFrog. NGC is the recommended artifact source. JFrog remains supported in v2.24 but will be removed in a future release. For connected environments, follow the instructions for your artifact source in the sections below. For air-gapped environments, the installation steps are the same regardless of artifact source. Artifacts are prepared in the Preparations step.

Kubernetes

Connected

Run the following command and update the values as described below:

  • Replace global.domain=<DOMAIN> with the fully qualified domain name (FQDN) obtained in the Fully qualified domain name section.

  • The recommended ingress controller is HAProxy. If you are using a different ingress controller, update the ingress class: --set global.ingress.ingressClass=<ingress class>

  • If you are using a local certificate authority, add --set global.customCA.enabled=true to the Helm command as described in the Local certificate authority section.

  • Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator's email address.

  • Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:

    • Minimum Length: Passwords must be at least 8 characters long.

    • Digits: Must contain at least 1 numeric digit (0-9).

    • Lowercase Characters: Must contain at least 1 lowercase letter (a-z).

    • Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).

    • Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).
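The password requirements above can be checked before running the installation. The following is a minimal sketch, not part of the product: the function name `password_problems` is hypothetical, and it assumes that "special character" means any ASCII punctuation character, because the list above only gives examples (!, @, #, $). The set actually accepted by the control plane may differ.

```python
import string

# Assumption: any ASCII punctuation counts as a "special character".
# The documentation only lists examples (!, @, #, $), so the accepted
# set may be narrower or wider in practice.
SPECIAL_CHARACTERS = set(string.punctuation)


def password_problems(password: str) -> list:
    """Return the documented requirements that the password does not meet."""
    problems = []
    if len(password) < 8:
        problems.append("must be at least 8 characters long")
    if not any(c in string.digits for c in password):
        problems.append("must contain at least 1 numeric digit (0-9)")
    if not any(c in string.ascii_lowercase for c in password):
        problems.append("must contain at least 1 lowercase letter (a-z)")
    if not any(c in string.ascii_uppercase for c in password):
        problems.append("must contain at least 1 uppercase letter (A-Z)")
    if not any(c in SPECIAL_CHARACTERS for c in password):
        problems.append("must contain at least 1 special character")
    return problems


# Usage: an empty list means every documented rule is satisfied.
# password_problems("Str0ng#Pass")
# password_problems("alllowercase")
```

Returning the list of unmet rules, rather than a single true/false value, makes it easier to see which requirement to fix before passing the value to tenantsManager.config.adminPassword.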

NGC (Recommended)

Replace <NGC_API_KEY> with your NGC API key.

For example:

JFrog

For example:

Note

Use the dry-run flag --dry-run=client to gain an understanding of what is being installed before the actual installation.

Air-gapped

Run the following command and update the values as described below. The custom-env.yaml file is created during the preparations step:

  • Replace control-plane-<VERSION>.tgz with the NVIDIA Run:ai control plane version.

  • Replace global.domain=<DOMAIN> with the fully qualified domain name (FQDN) obtained in the Fully qualified domain name section.

  • Set global.customCA.enabled=true as described in the Local certificate authority section.

  • The recommended ingress controller is HAProxy. If you are using a different ingress controller, update the ingress class: --set global.ingress.ingressClass=<ingress class>

  • Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator’s email address.

  • Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:

    • Minimum Length: Passwords must be at least 8 characters long.

    • Digits: Must contain at least 1 numeric digit (0-9).

    • Lowercase Characters: Must contain at least 1 lowercase letter (a-z).

    • Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).

    • Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).
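As an illustration of how the settings above fit together, the following sketch assembles a Helm argument list in Python. It is not the official installation command. The value keys, the runai-backend namespace, the custom-env.yaml file, and the --dry-run=client flag come from this page; the `helm upgrade --install` form, the `release_name` parameter, and both function names are assumptions made for this example. Building an argument list avoids shell quoting problems when the password contains characters such as $ or !, and the helper escapes commas because `--set` treats an unescaped comma as a separator between values.

```python
from typing import Dict, List, Optional


def helm_set_flags(values: Dict[str, str]) -> List[str]:
    """Turn key/value pairs into repeated `--set key=value` arguments."""
    flags = []
    for key, value in values.items():
        # Helm's --set parser uses backslash as an escape character and
        # comma as a separator, so both are escaped in the value.
        escaped = value.replace("\\", "\\\\").replace(",", "\\,")
        flags += ["--set", f"{key}={escaped}"]
    return flags


def air_gapped_install_argv(
    release_name: str,   # hypothetical: choose the release name you use
    chart_archive: str,  # e.g. "control-plane-<VERSION>.tgz"
    domain: str,
    admin_email: str,
    admin_password: str,
    ingress_class: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble an argument list for an air-gapped Kubernetes install (sketch)."""
    values = {
        "global.domain": domain,
        "global.customCA.enabled": "true",
        "tenantsManager.config.adminUsername": admin_email,
        "tenantsManager.config.adminPassword": admin_password,
    }
    if ingress_class is not None:
        # Only needed when the ingress controller is not HAProxy.
        values["global.ingress.ingressClass"] = ingress_class

    # Assumption: `helm upgrade --install` is used for this illustration.
    argv = [
        "helm", "upgrade", "--install", release_name, chart_archive,
        "--namespace", "runai-backend",
        "--values", "custom-env.yaml",
    ]
    argv += helm_set_flags(values)
    if dry_run:
        argv.append("--dry-run=client")
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. Running once with `dry_run=True` first shows what would be installed without changing the cluster.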

For example:

Note

Use the dry-run flag --dry-run=client to gain an understanding of what is being installed before the actual installation.

OpenShift

Connected

Run the following command and update the values as described below:

  • Replace <OPENSHIFT-CLUSTER-DOMAIN> with the domain configured for the OpenShift cluster. To determine the OpenShift cluster domain, run oc get routes -A.

  • If you are using a local certificate authority, add --set global.customCA.enabled=true to the Helm command as described in the Local certificate authority section.

  • Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator's email address.

  • Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:

    • Minimum Length: Passwords must be at least 8 characters long.

    • Digits: Must contain at least 1 numeric digit (0-9).

    • Lowercase Characters: Must contain at least 1 lowercase letter (a-z).

    • Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).

    • Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).

NGC (Recommended)

Replace <NGC_API_KEY> with your NGC API key.

For example:

JFrog

For example:

Note

Use the dry-run flag --dry-run=client to gain an understanding of what is being installed before the actual installation.

Air-gapped

Run the following command and update the values as described below. The custom-env.yaml file is created during the preparations step:

  • Replace control-plane-<VERSION>.tgz with the NVIDIA Run:ai control plane version.

  • Replace <OPENSHIFT-CLUSTER-DOMAIN> with the domain configured for the OpenShift cluster. To determine the OpenShift cluster domain, run oc get routes -A.

  • Set global.customCA.enabled=true as described in the Local certificate authority section.

  • Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator’s email address.

  • Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:

    • Minimum Length: Passwords must be at least 8 characters long.

    • Digits: Must contain at least 1 numeric digit (0-9).

    • Lowercase Characters: Must contain at least 1 lowercase letter (a-z).

    • Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).

    • Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).

For example:

Note

Use the dry-run flag --dry-run=client to gain an understanding of what is being installed before the actual installation.

Connect to NVIDIA Run:ai User Interface

Note

After you install the NVIDIA Run:ai control plane, it may take up to 15 minutes for the UI to become accessible. Accessing it too early may result in a "server cannot be reached" error.

  1. Open your browser and go to:

https://runai.<DOMAIN>.local

  2. Log in using the administrator credentials provided during the installation. It is recommended to change the password after the first login.
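Because the UI can take up to 15 minutes to become accessible, it can be convenient to poll the URL instead of retrying by hand. The following is a minimal sketch using only the Python standard library. The function names, the 30-second interval, and the rule that any HTTP status below 400 counts as "reachable" are assumptions for this example. If the control plane uses a local certificate authority that your machine does not trust, the probe reports the UI as unreachable until that CA is trusted.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def ui_is_reachable(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection failures, HTTP errors, TLS errors, and timeouts.
        return False


def wait_for_ui(
    url: str,
    max_wait_s: float = 15 * 60,  # the note above mentions up to 15 minutes
    interval_s: float = 30.0,     # assumption: poll every 30 seconds
    probe: Callable[[str], bool] = ui_is_reachable,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the UI is reachable or max_wait_s has elapsed."""
    deadline = clock() + max_wait_s
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage: pass the same URL you would open in the browser.
# wait_for_ui("https://runai.<DOMAIN>.local")
```

The `probe`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a live cluster; in normal use the defaults are sufficient.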