Install the Control Plane
In this section, you will install the NVIDIA Run:ai control plane on your Kubernetes cluster using Helm. The control plane is the central management layer for NVIDIA Run:ai. It handles multi-cluster management, resource and access management, and workload submission and monitoring.
This procedure includes:
Adding the NVIDIA Run:ai Helm repository from NGC or JFrog
Configuring key settings such as domain name, ingress, and administrator credentials
Deploying the control plane into the runai-backend namespace
After you complete this process, the NVIDIA Run:ai control plane will be running in your cluster and accessible via the configured domain.
System and Network Requirements
Before installing the NVIDIA Run:ai control plane, validate that the system requirements and network requirements are met. For air-gapped environments, make sure you have the software artifacts prepared.
Permissions
As part of the installation, you will be required to install the NVIDIA Run:ai control plane Helm charts. The Helm charts require Kubernetes administrator permissions. You can review the exact objects that the charts create by running both Helm charts with the --dry-run flag.
Installation
Note
To customize the installation based on your environment, see Advanced control plane configurations.
PostgreSQL and Keycloakx are installed with default usernames and passwords. To change the default credentials, see Additional third-party configurations.
Artifact Source
Starting with v2.24, NVIDIA Run:ai artifacts are available on both NVIDIA NGC and JFrog. NGC is the recommended artifact source. JFrog remains supported in v2.24 but will be removed in a future release. For connected environments, follow the instructions for your artifact source in the sections below. For air-gapped environments, the installation steps are the same regardless of artifact source. Artifacts are prepared in the Preparations step.
Kubernetes
Connected
Run the following command and update the values as described below:
Replace global.domain=<DOMAIN> with the fully qualified domain name (FQDN) obtained in the Fully qualified domain name section.
The recommended ingress controller is HAProxy. If you are using a different ingress controller, update the ingress class:
--set global.ingress.ingressClass=<ingress class>
If you are using a local certificate authority, add --set global.customCA.enabled=true to the Helm command as described in the Local certificate authority section.
Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator's email address.
Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:
Minimum Length: Passwords must be at least 8 characters long.
Digits: Must contain at least 1 numeric digit (0-9).
Lowercase Characters: Must contain at least 1 lowercase letter (a-z).
Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).
Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).
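The password requirements above can be checked before running the Helm command. The following is a minimal Python sketch, not part of the NVIDIA Run:ai tooling. The function name password_problems is hypothetical, and treating every ASCII punctuation character as a "special character" is an assumption, since the requirements only list !, @, #, and $ as examples.

```python
import string

# Assumption: any ASCII punctuation character counts as "special".
# The requirements only give !, @, #, $ as examples.
SPECIAL_CHARS = set(string.punctuation)


def password_problems(password: str) -> list:
    """Return the unmet requirements; an empty list means the password passes."""
    problems = []
    if len(password) < 8:
        problems.append("must be at least 8 characters long")
    if not any(c in string.digits for c in password):
        problems.append("must contain at least 1 numeric digit (0-9)")
    if not any(c in string.ascii_lowercase for c in password):
        problems.append("must contain at least 1 lowercase letter (a-z)")
    if not any(c in string.ascii_uppercase for c in password):
        problems.append("must contain at least 1 uppercase letter (A-Z)")
    if not any(c in SPECIAL_CHARS for c in password):
        problems.append("must contain at least 1 special character")
    return problems


# A password that satisfies all five rules yields an empty list.
print(password_problems("Example#2024"))
# A weak password yields one message per unmet rule.
for problem in password_problems("abc"):
    print("abc:", problem)
```

Running a check like this locally gives early feedback on a candidate password. The control plane remains the authority on what it accepts.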
NGC (Recommended)
Replace <NGC_API_KEY> with your NGC API key.
For example:
JFrog
For example:
Note
Use the dry-run flag --dry-run=client to gain an understanding of what is being installed before the actual installation.
Air-gapped
Run the following command and update the values as described below. The custom-env.yaml file is created during the preparations step:
Replace control-plane-<VERSION>.tgz with the NVIDIA Run:ai control plane version.
Replace global.domain=<DOMAIN> with the fully qualified domain name (FQDN) obtained in the Fully qualified domain name section.
Set global.customCA.enabled=true as described in the Local certificate authority section.
The recommended ingress controller is HAProxy. If you are using a different ingress controller, update the ingress class:
--set global.ingress.ingressClass=<ingress class>
Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator's email address.
Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:
Minimum Length: Passwords must be at least 8 characters long.
Digits: Must contain at least 1 numeric digit (0-9).
Lowercase Characters: Must contain at least 1 lowercase letter (a-z).
Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).
Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).
For example:
Note
Use the dry-run flag --dry-run=client to gain an understanding of what is being installed before the actual installation.
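When the installation is scripted, the --set options described for the Kubernetes installation can be assembled from variables instead of being edited by hand. The sketch below is illustrative only. The helper names escape_set_value and control_plane_set_flags are hypothetical, the example values are placeholders, and the output is only the list of --set arguments to append to your Helm command (the chart reference and other arguments are not shown here). The escaping step relies on Helm treating commas as separators and backslashes as escape characters in --set values.

```python
import shlex


def escape_set_value(value: str) -> str:
    """Escape backslashes and commas, which Helm's --set parser treats specially."""
    return value.replace("\\", "\\\\").replace(",", "\\,")


def control_plane_set_flags(domain, admin_email, admin_password,
                            ingress_class=None, custom_ca=True):
    """Build the --set arguments named in this section (hypothetical helper)."""
    values = {
        "global.domain": domain,
        # Only emitted when a local certificate authority is used.
        "global.customCA.enabled": "true" if custom_ca else None,
        # Only emitted when the ingress controller is not the default.
        "global.ingress.ingressClass": ingress_class,
        "tenantsManager.config.adminUsername": admin_email,
        "tenantsManager.config.adminPassword": admin_password,
    }
    flags = []
    for key, value in values.items():
        if value is None:
            continue
        flags += ["--set", f"{key}={escape_set_value(value)}"]
    return flags


# Placeholder values for illustration; substitute your own.
flags = control_plane_set_flags(
    domain="runai.example.com",
    admin_email="admin@example.com",
    admin_password="Example#2024",
)
# shlex.join quotes each argument so the result is safe to paste into a shell.
print(shlex.join(flags))
```

Passing the arguments as a list (for example to subprocess.run) avoids shell quoting problems with special characters in the password. Printing them with shlex.join is only for inspection.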
OpenShift
Connected
Run the following command and update the values as described below:
Replace <OPENSHIFT-CLUSTER-DOMAIN> with the domain configured for the OpenShift cluster. To determine the OpenShift cluster domain, run oc get routes -A.
If you are using a local certificate authority, add --set global.customCA.enabled=true to the Helm command as described in the Local certificate authority section.
Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator's email address.
Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:
Minimum Length: Passwords must be at least 8 characters long.
Digits: Must contain at least 1 numeric digit (0-9).
Lowercase Characters: Must contain at least 1 lowercase letter (a-z).
Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).
Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).
NGC (Recommended)
Replace <NGC_API_KEY> with your NGC API key.
For example:
JFrog
For example:
Note
Use the dry-run flag --dry-run=client to gain an understanding of what is being installed before the actual installation.
Air-gapped
Run the following command and update the values as described below. The custom-env.yaml file is created during the preparations step:
Replace control-plane-<VERSION>.tgz with the NVIDIA Run:ai control plane version.
Replace <OPENSHIFT-CLUSTER-DOMAIN> with the domain configured for the OpenShift cluster. To determine the OpenShift cluster domain, run oc get routes -A.
Set global.customCA.enabled=true as described in the Local certificate authority section.
Set tenantsManager.config.adminUsername=<ADMIN_EMAIL> to the administrator's email address.
Set tenantsManager.config.adminPassword=<ADMIN_PASSWORD> to the initial administrator password. The password must meet the following requirements:
Minimum Length: Passwords must be at least 8 characters long.
Digits: Must contain at least 1 numeric digit (0-9).
Lowercase Characters: Must contain at least 1 lowercase letter (a-z).
Uppercase Characters: Must contain at least 1 uppercase letter (A-Z).
Special Characters: Must contain at least 1 special character (e.g., !, @, #, $).
For example:
Note
Use the dry-run flag --dry-run=client to gain an understanding of what is being installed before the actual installation.
Connect to NVIDIA Run:ai User Interface
Note
After you install the NVIDIA Run:ai control plane, it may take up to 15 minutes for the UI to become accessible. Accessing it too early may result in a "server cannot be reached" error.
Open your browser and go to the URL for your platform:
Kubernetes: https://runai.<DOMAIN>.local
OpenShift: https://runai.apps.<OpenShift-DOMAIN>
Log in using the administrator credentials provided during the installation. It is recommended to change the password after the first login.
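Because the UI can take up to 15 minutes to become reachable, an automated setup may want to poll the URL rather than retry by hand. The following is a minimal sketch using only the Python standard library. The function names is_reachable and wait_for_ui are hypothetical, the 15-second polling interval is an arbitrary choice, and the sketch assumes the certificate presented by the UI is trusted by the machine running it (otherwise the probe reports the UI as unreachable).

```python
import time
import urllib.error
import urllib.request


def is_reachable(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if the server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        return True  # The server responded, so it is reachable.
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, refused connection, TLS error, or timeout.


def wait_for_ui(url, max_wait_s=15 * 60, interval_s=15,
                probe=is_reachable, sleep=time.sleep, clock=time.monotonic):
    """Poll url until probe succeeds or max_wait_s elapses; return the outcome."""
    deadline = clock() + max_wait_s
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example usage (commented out so that importing this sketch does not
# start polling); replace the URL with the one for your platform:
# if wait_for_ui("https://runai.apps.<OpenShift-DOMAIN>"):
#     print("UI is reachable; log in with the administrator credentials.")
# else:
#     print("UI still unreachable after 15 minutes; check the control plane pods.")
```

The probe, sleep, and clock parameters are injectable so the waiting logic can be exercised in tests without a network connection or real delays.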