Preparations

These steps prepare your environment for installation and include retrieving the required software artifacts, configuring access to container images, and preparing installation-specific configuration files where applicable (for example, in air-gapped environments).

Software Artifacts

This section describes how to prepare the software artifacts required for installing the NVIDIA Run:ai control plane and cluster. The steps depend on:

  • Platform distribution - Kubernetes or OpenShift

  • On-premises environment type - connected or air-gapped (disconnected)


Note

Starting with v2.24, NVIDIA Run:ai artifacts are available on both NVIDIA NGC and JFrog. NGC is the recommended artifact source. JFrog remains supported in v2.24 but will be removed in a future release.

Kubernetes

Connected

In connected environments, Kubernetes pulls NVIDIA Run:ai container images directly from a remote registry at runtime. To enable this, you must create a Kubernetes secret for registry access.

NGC (Recommended)

NVIDIA Run:ai container images are hosted in the NVIDIA NGC container registry under nvidia/runai. You must have access to NVIDIA Run:ai artifacts in NGC in order to pull NVIDIA Run:ai container images. Create the required secret using your NGC API key. For information on creating an NGC API key and authenticating to NGC, see NGC API keys.

kubectl create secret docker-registry runai-reg-creds \
--docker-server=https://nvcr.io \
--docker-username='$oauthtoken' \
--docker-password=<NGC_API_KEY> \
--docker-email=<EMAIL> \
--namespace=runai-backend
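
For reference, the resulting docker-registry secret stores a `.dockerconfigjson` whose `auth` field is the base64-encoded `user:password` pair. A minimal sketch of that encoding, assuming the NGC username is the literal string `$oauthtoken` (the key value below is a hypothetical placeholder):

```shell
# The docker-registry secret stores base64("<user>:<password>") under the
# "auth" key of .dockerconfigjson. For NGC the username is the literal
# string $oauthtoken; the key here is a hypothetical placeholder.
NGC_USER='$oauthtoken'
NGC_KEY='example-ngc-key'
AUTH=$(printf '%s:%s' "$NGC_USER" "$NGC_KEY" | base64)
echo "$AUTH"
```

Note the single quotes around `$oauthtoken` in both this sketch and the kubectl command above: they prevent the shell from expanding it as a variable.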

JFrog

You will receive a token from NVIDIA Run:ai to access the NVIDIA Run:ai container registry. Use the following command to create the required Kubernetes secret:

kubectl create secret docker-registry runai-reg-creds \
--docker-server=https://runai.jfrog.io \
--docker-username=self-hosted-image-puller-prod \
--docker-password=<TOKEN> \
--docker-email=<EMAIL> \
--namespace=runai-backend
Air-gapped

In air-gapped environments, Kubernetes cannot pull container images from external registries during runtime. Instead, NVIDIA Run:ai provides an air-gapped installation package that contains all required images. These images are uploaded to an internal registry that your cluster can access.

Download and Extract the Air-gapped Package

NGC (Recommended)

Use your NGC API key to download the NVIDIA Run:ai air-gapped installation package from NVIDIA NGC. For information on creating an NGC API key and authenticating to NGC, see NGC API keys.

  1. Browse the available package versions. Air-gapped packages are published as an NGC Resource. To view available versions, refer to the NGC Resource page and select the required version.

  2. Download the package using your NGC API key (authenticated download):

    curl -LO --request GET \
      'https://api.ngc.nvidia.com/v2/org/nvidia/team/runai/resources/runai-airgapp-package/versions/<VERSION>/files/runai-airgapped-package-<VERSION>.tar.gz' \
      -H "Authorization: Bearer ${NGC_CLI_API_KEY}" \
      -H "Content-Type: application/json"

    For example, run the following to download the 2.24 air-gapped package:

    curl -LO --request GET \
      'https://api.ngc.nvidia.com/v2/org/nvidia/team/runai/resources/runai-airgapp-package/versions/2.24.58/files/runai-airgapped-package-2.24.58.tar.gz' \
      -H "Authorization: Bearer ${NGC_CLI_API_KEY}" \
      -H "Content-Type: application/json"
  3. SSH into a node with kubectl access to the cluster and Docker installed.

  4. Extract the NVIDIA Run:ai package. Replace <VERSION> in the command below and run:

    tar xvf runai-airgapped-package-<VERSION>.tar.gz
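
If you script these downloads, the NGC URL follows a fixed pattern; a small sketch that builds it for a given version, using the resource path from the curl command above:

```shell
# Build the air-gapped package URL for a given version. The resource path
# is the one used in the authenticated curl download above.
VERSION=2.24.58
BASE='https://api.ngc.nvidia.com/v2/org/nvidia/team/runai/resources/runai-airgapp-package'
URL="${BASE}/versions/${VERSION}/files/runai-airgapped-package-${VERSION}.tar.gz"
echo "$URL"
```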

JFrog

You will receive a token from NVIDIA Run:ai to access the NVIDIA Run:ai air-gapped installation package. Use the following commands with this token to download and extract the package.

  1. Run the following command to browse all available air-gapped packages:

    curl -H "Authorization: Bearer <token>" "https://runai.jfrog.io/artifactory/api/storage/runai-airgapped-prod/?list"
  2. Run the following command to download the desired package:

    curl -L -H "Authorization: Bearer <token>" -O "https://runai.jfrog.io/artifactory/runai-airgapped-prod/runai-airgapped-package-<VERSION>.tar.gz"
  3. SSH into a node with kubectl access to the cluster and Docker installed.

  4. Extract the NVIDIA Run:ai package. Replace <VERSION> in the command below and run:

    tar xvf runai-airgapped-package-<VERSION>.tar.gz

Upload Images


Note

The following steps apply to both NGC and JFrog artifact sources.

NVIDIA Run:ai assumes the existence of a Docker registry within your organization for hosting container images. The installation requires the network address and port for this registry (referred to as <REGISTRY_URL>).

  1. Upload images to a local Docker registry. Set the Docker registry address in the form NAME:PORT (do not include the https:// prefix):

    export REGISTRY_URL=<DOCKER REGISTRY ADDRESS>
  2. Run the following script. You must have at least 20GB of free disk space. If Docker is configured to run as non-root, then sudo is not required:

    sudo ./setup.sh

The script should create a file named custom-env.yaml which will be used during control plane installation.
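
For orientation, the script's core job is to load each bundled image and push it to your registry under a new name. A sketch of the name mapping only, under the assumption that images are bundled from nvcr.io/nvidia/runai (the image name below is hypothetical; the actual setup.sh also handles loading and pushing):

```shell
# Map a source image name to its local-registry equivalent (sketch only).
REGISTRY_URL=registry.local:5000                  # example NAME:PORT value
SRC=nvcr.io/nvidia/runai/runai-operator:2.24.0    # hypothetical image
DST="${REGISTRY_URL}/${SRC#nvcr.io/nvidia/runai/}"
echo "$DST"
```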

OpenShift

Connected

In connected environments, OpenShift pulls NVIDIA Run:ai container images directly from a remote registry at runtime. To enable this, you must create a Kubernetes secret for registry access.

NGC (Recommended)

NVIDIA Run:ai container images are hosted in the NVIDIA NGC container registry under nvidia/runai. You must have access to NVIDIA Run:ai artifacts in NGC in order to pull NVIDIA Run:ai container images. Create the required secret using your NGC API key. For information on creating an NGC API key and authenticating to NGC, see NGC API keys.

kubectl create secret docker-registry runai-reg-creds \
--docker-server=https://nvcr.io \
--docker-username='$oauthtoken' \
--docker-password=<NGC_API_KEY> \
--docker-email=<EMAIL> \
--namespace=runai-backend

JFrog

You will receive a token from NVIDIA Run:ai to access the NVIDIA Run:ai container registry. Use the following command to create the required Kubernetes secret:

kubectl create secret docker-registry runai-reg-creds \
--docker-server=https://runai.jfrog.io \
--docker-username=self-hosted-image-puller-prod \
--docker-password=<TOKEN> \
--docker-email=<EMAIL> \
--namespace=runai-backend

Air-gapped

In air-gapped environments, OpenShift cannot pull container images from external registries during runtime. Instead, NVIDIA Run:ai provides an air-gapped installation package that contains all required images. These images are uploaded to an internal registry that your cluster can access.

Download and Extract the Air-gapped Package

NGC (Recommended)

Use your NGC API key to download the NVIDIA Run:ai air-gapped installation package from NVIDIA NGC. For information on creating an NGC API key and authenticating to NGC, see NGC API keys.

  1. Browse the available package versions. Air-gapped packages are published as an NGC Resource. To view available versions, refer to the NGC Resource page and select the required version.

  2. Download the package using your NGC API key (authenticated download):

    curl -LO --request GET \
      'https://api.ngc.nvidia.com/v2/org/nvidia/team/runai/resources/runai-airgapp-package/versions/<VERSION>/files/runai-airgapped-package-<VERSION>.tar.gz' \
      -H "Authorization: Bearer ${NGC_CLI_API_KEY}" \
      -H "Content-Type: application/json"

    For example, run the following to download the 2.24 air-gapped package:

    curl -LO --request GET \
      'https://api.ngc.nvidia.com/v2/org/nvidia/team/runai/resources/runai-airgapp-package/versions/2.24.58/files/runai-airgapped-package-2.24.58.tar.gz' \
      -H "Authorization: Bearer ${NGC_CLI_API_KEY}" \
      -H "Content-Type: application/json"

  3. SSH into a node with oc access to the cluster and Docker installed.

  4. Extract the NVIDIA Run:ai package. Replace <VERSION> in the command below and run:

    tar xvf runai-airgapped-package-<VERSION>.tar.gz

JFrog

You will receive a token from NVIDIA Run:ai to access the NVIDIA Run:ai air-gapped installation package. Use the following commands with this token to download and extract the package.

  1. Run the following command to browse all available air-gapped packages:

    curl -H "Authorization: Bearer <token>" "https://runai.jfrog.io/artifactory/api/storage/runai-airgapped-prod/?list"
  2. Run the following command to download the desired package:

    curl -L -H "Authorization: Bearer <token>" -O "https://runai.jfrog.io/artifactory/runai-airgapped-prod/runai-airgapped-package-<VERSION>.tar.gz"

  3. SSH into a node with oc access to the cluster and Docker installed.

  4. Extract the NVIDIA Run:ai package. Replace <VERSION> in the command below and run:

    tar xvf runai-airgapped-package-<VERSION>.tar.gz

Upload Images


Note

The following steps apply to both NGC and JFrog artifact sources.

NVIDIA Run:ai assumes the existence of a Docker registry within your organization for hosting container images. The installation requires the network address and port for this registry (referred to as <REGISTRY_URL>).

  1. Upload images to a local Docker registry. Set the Docker registry address in the form NAME:PORT (do not include the https:// prefix):

    export REGISTRY_URL=<DOCKER REGISTRY ADDRESS>

  2. Run the following script. You must have at least 20GB of free disk space. If Docker is configured to run as non-root, then sudo is not required:

    sudo ./setup.sh

The script should create a file named custom-env.yaml which will be used during control plane installation.

Private Docker Registry

This step is required only if you are installing NVIDIA Run:ai in an environment that uses a private Docker registry.

Kubernetes

To access the organization's Docker registry, set the registry's credentials (imagePullSecret).

Create the secret named runai-reg-creds based on your existing credentials. For more information, see Pull an Image from a Private Registry.
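
As a sketch, following the linked Kubernetes guide, the secret can be created from an existing Docker config file; the path below assumes you have already run docker login against your registry on this machine:

```shell
kubectl create secret generic runai-reg-creds \
  --from-file=.dockerconfigjson=$HOME/.docker/config.json \
  --type=kubernetes.io/dockerconfigjson \
  --namespace=runai-backend
```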

OpenShift

To access the organization's Docker registry, set the registry's credentials (imagePullSecret).

Create the secret named runai-reg-creds in the runai-backend namespace based on your existing credentials. The configuration will be copied over to the runai namespace during cluster installation. For more information, see Allowing pods to reference images from other secured registries.

Set Up Your Environment

External Postgres Database

If you have opted to use an external PostgreSQL database, you need to perform initial setup to ensure successful installation. Follow these steps:

  1. Create an SQL script file, edit the parameters below, and save it locally:

    • Replace <DATABASE_NAME> with a dedicated database name for NVIDIA Run:ai in your PostgreSQL database.

    • Replace <ROLE_NAME> with a dedicated role name (user) for the NVIDIA Run:ai database.

    • Replace <ROLE_PASSWORD> with a password for the new PostgreSQL role.

    • Replace <GRAFANA_PASSWORD> with the password to be set for Grafana integration.

  2. Run the following command on a machine where the PostgreSQL client (psql) is installed:

    • Replace <POSTGRESQL_HOST> with the PostgreSQL IP address or hostname.

    • Replace <POSTGRESQL_USER> with the PostgreSQL username.

    • Replace <POSTGRESQL_PORT> with the port number where PostgreSQL is running.

    • Replace <POSTGRESQL_DB> with the name of your PostgreSQL database.

    • Replace <SQL_FILE> with the path to the SQL script created in the previous step.
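
As a sketch of what these two steps look like end to end: the SQL script in step 1 creates the database and roles. The exact statements and the Grafana role name below are assumptions; if NVIDIA Run:ai provides you a script, use that instead.

```sql
-- Sketch only; the exact statements may differ from the script
-- NVIDIA Run:ai provides.
CREATE DATABASE <DATABASE_NAME>;
CREATE ROLE <ROLE_NAME> WITH LOGIN PASSWORD '<ROLE_PASSWORD>';
GRANT ALL PRIVILEGES ON DATABASE <DATABASE_NAME> TO <ROLE_NAME>;
-- Role for the Grafana integration; the role name here is hypothetical.
CREATE ROLE grafana WITH LOGIN PASSWORD '<GRAFANA_PASSWORD>';
```

The command in step 2 would then take a shape like the following (psql prompts for the password unless PGPASSWORD is set):

```shell
psql -h <POSTGRESQL_HOST> -p <POSTGRESQL_PORT> \
     -U <POSTGRESQL_USER> -d <POSTGRESQL_DB> \
     -f <SQL_FILE>
```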
