Installation

NVIDIA Run:ai is a Kubernetes-native orchestration and management platform designed to maximize GPU utilization for AI workloads.

NVIDIA Run:ai System Components

NVIDIA Run:ai consists of two components, both installed on a Kubernetes cluster:

  • NVIDIA Run:ai control plane - Manages resources, handles workload submission, and provides cluster monitoring and analytics.

  • NVIDIA Run:ai cluster - Provides enhanced scheduling and workload management, extending native Kubernetes capabilities.

As part of the installation process, you will install the NVIDIA Run:ai control plane and one or more NVIDIA Run:ai clusters. Both require Kubernetes. In a typical deployment, the control plane and the first cluster are installed on the same Kubernetes cluster.

Installation Types

The self-hosted option is for organizations that cannot use a SaaS solution due to data leakage concerns. NVIDIA Run:ai self-hosting comes in two variants:

Type         Description
Connected    The organization can freely download from the internet (though upload is not allowed).
Air-gapped   The organization has no connection to the internet.
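As a rough illustration of how the two variants above differ, the following Python sketch picks a variant from a single fact about the environment: whether it can download from the internet. The names `InstallType` and `choose_install_type` are hypothetical and are not part of any NVIDIA Run:ai tooling.

```python
from enum import Enum


class InstallType(Enum):
    """Self-hosted variants described above (hypothetical helper type)."""
    CONNECTED = "connected"
    AIR_GAPPED = "air-gapped"


def choose_install_type(can_download_from_internet: bool) -> InstallType:
    """Map outbound internet access to a self-hosted variant (illustrative only)."""
    if can_download_from_internet:
        # Downloads are allowed, so the connected variant applies.
        return InstallType.CONNECTED
    # No internet connection at all: the air-gapped variant applies.
    return InstallType.AIR_GAPPED


if __name__ == "__main__":
    print(choose_install_type(True).value)
    print(choose_install_type(False).value)
```

In practice the choice is usually dictated by organizational network policy, not detected at run time, so a helper like this is useful mainly for documenting the decision in automation scripts.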

Software Artifact Sources

NVIDIA Run:ai software artifacts (container images and Helm charts) can be obtained from two sources:

Source                     Description
NVIDIA NGC (Recommended)   The NVIDIA NGC catalog. Recommended for all new and existing installations.
JFrog                      The NVIDIA Run:ai artifact repository. JFrog artifacts are deprecated and will be removed in a future release.
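The guidance on the two sources above could be encoded in installation automation along the following lines. This is a minimal sketch, assuming a script that takes the artifact source as a parameter. `ArtifactSource` and `select_artifact_source` are hypothetical names, and no registry URLs or credentials are modeled.

```python
import warnings
from enum import Enum


class ArtifactSource(Enum):
    """Artifact sources listed above (hypothetical helper type)."""
    NGC = "NVIDIA NGC"
    JFROG = "JFrog"


def select_artifact_source(
    requested: ArtifactSource = ArtifactSource.NGC,
) -> ArtifactSource:
    """Return the requested source, warning if it is the deprecated one."""
    if requested is ArtifactSource.JFROG:
        # JFrog artifacts are deprecated according to the section above.
        warnings.warn(
            "JFrog artifacts are deprecated and will be removed in a future "
            "release; consider NVIDIA NGC instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return requested
```

Defaulting to NGC mirrors the recommendation for all new and existing installations, while still letting an existing JFrog-based setup continue with a visible warning.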

Before installing, complete the Preparations steps to set up access to your chosen artifact source.
