Install using Helm
This section explains the steps required to install the NVIDIA Run:ai cluster on a Kubernetes cluster using Helm.
System and network requirements
Before installing the NVIDIA Run:ai cluster, validate that the system requirements and network requirements are met.
Once all the requirements are met, it is highly recommended to use the NVIDIA Run:ai cluster preinstall diagnostics tool to:
Test the requirements below, as well as failure points related to Kubernetes, NVIDIA, storage, and networking
Look at additional components installed and analyze their relevance to a successful installation
To run the preinstall diagnostics tool, download the latest version, and run:
For more information, see preinstall diagnostics.
Helm
NVIDIA Run:ai cluster requires Helm 3.14 or above. To install Helm, see Helm Install.
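The Helm version requirement above can be checked before installing. The following is a minimal sketch, not part of the official procedure: the helper name version_at_least is hypothetical, and the comparison assumes a sort implementation that supports the -V (version sort) option, such as GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
version_at_least() {
  local have="${1#v}" want="${2#v}"
  # sort -V orders version strings; if the lowest of the two is the
  # required version, the installed version is at least that high.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

if command -v helm >/dev/null 2>&1; then
  # Prints only the client version, for example v3.14.2
  current="$(helm version --template '{{.Version}}')"
  if version_at_least "$current" "3.14.0"; then
    echo "Helm $current meets the 3.14 minimum"
  else
    echo "Helm $current is older than 3.14; upgrade before installing" >&2
  fi
else
  echo "helm not found on PATH" >&2
fi
```

If the check reports an older version, upgrade Helm before continuing with the installation.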
Permissions
A Kubernetes user with the cluster-admin role is required to ensure a successful installation. For more information, see Using RBAC authorization.
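One way to sanity-check the permissions of the current kubectl user is to ask the API server whether that user may perform any action on any resource. This is only an approximation of holding the cluster-admin role, offered as a sketch; the function name has_cluster_wide_access and the KUBECTL override variable are hypothetical.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden to point at a different kubectl binary (assumption).
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: succeeds when the current user may perform every verb
# on every resource in all namespaces, which is what cluster-admin grants.
has_cluster_wide_access() {
  [ "$("$KUBECTL" auth can-i '*' '*' --all-namespaces 2>/dev/null)" = "yes" ]
}

if command -v "$KUBECTL" >/dev/null 2>&1; then
  if has_cluster_wide_access; then
    echo "Current user has cluster-wide permissions"
  else
    echo "Current user lacks cluster-wide permissions; use a cluster-admin user" >&2
  fi
else
  echo "$KUBECTL not found on PATH" >&2
fi
```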
Installation
Follow these instructions to install using Helm.
Adding a new cluster
Follow the steps below to add a new cluster.
Note
When adding a cluster for the first time, the New Cluster form opens automatically when you log in to the NVIDIA Run:ai platform. Other actions are prevented until the cluster is created.
If this is your first cluster and you have completed the New Cluster form, start at step 3. Otherwise, start at step 1.
Setup
In the NVIDIA Run:ai platform, go to Resources
Click +NEW CLUSTER
Enter a unique name for your cluster
Optional: Choose the NVIDIA Run:ai cluster version (latest, by default)
Enter the Cluster URL. For more information, see Domain Name Requirement
Click Continue
Installation instructions
Follow the installation instructions and run the commands provided on your Kubernetes cluster
Click DONE
The cluster is displayed in the table with the status Waiting to connect. Once installation is complete, the cluster status changes to Connected.
Note
To customize the installation based on your environment, see Customize cluster installation.
Troubleshooting
If you encounter an issue with the installation, try the troubleshooting scenarios below.
Installation
If the NVIDIA Run:ai cluster installation failed, check the installation logs to identify the issue. Run the following script to print the installation logs:
Cluster status
If the NVIDIA Run:ai cluster installation completed but the cluster status did not change to Connected, check the cluster troubleshooting scenarios.
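As a first look when the status stays at Waiting to connect, it can help to list pods that are not fully ready. The sketch below is an illustration rather than the official troubleshooting procedure: the namespace runai is an assumption (override it with the NAMESPACE variable if your installation differs), and filter_unhealthy is a hypothetical helper that parses the default kubectl get pods columns.

```shell
#!/usr/bin/env bash
# Assumed namespace for the cluster components; adjust if yours differs.
NAMESPACE="${NAMESPACE:-runai}"

# Hypothetical helper: reads `kubectl get pods --no-headers` output
# (NAME READY STATUS RESTARTS AGE) and prints the name and status of every
# pod whose containers are not all ready, ignoring Completed pods.
filter_unhealthy() {
  awk '{ split($2, ready, "/"); if ($3 != "Completed" && ready[1] != ready[2]) print $1, $3 }'
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -n "$NAMESPACE" --no-headers 2>/dev/null | filter_unhealthy
else
  echo "kubectl not found on PATH" >&2
fi
```

An empty result means every pod in the namespace reports all containers ready. Any pod that is listed is a candidate for closer inspection with kubectl describe pod or kubectl logs.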