Preparations

Note

This section applies to self-hosted installation only.

The following section provides the information needed to prepare for an NVIDIA Run:ai installation.

Software artifacts

The following software artifacts should be used when installing the control plane and cluster.

Kubernetes

Connected

You should receive a token from NVIDIA Run:ai customer support. The following command provides access to the NVIDIA Run:ai container registry:

kubectl create secret docker-registry runai-reg-creds  \
--docker-server=https://runai.jfrog.io \
--docker-username=self-hosted-image-puller-prod \
--docker-password=<TOKEN> \
--docker-email=<EMAIL> \
--namespace=runai-backend
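
For reference, `kubectl create secret docker-registry` stores these credentials as a base64-encoded `.dockerconfigjson` payload. The sketch below is an illustration only (not an installation step) and reproduces that payload locally; `<TOKEN>` remains a placeholder:

```shell
# Illustration only: build the .dockerconfigjson payload that the
# docker-registry secret above stores, using the same username and
# the <TOKEN> placeholder.
auth=$(printf '%s:%s' 'self-hosted-image-puller-prod' '<TOKEN>' | base64 | tr -d '\n')
printf '{"auths":{"https://runai.jfrog.io":{"auth":"%s"}}}\n' "$auth"
```
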
Air-gapped

You should receive a single file runai-airgapped-package-<VERSION>.tar.gz from NVIDIA Run:ai customer support.

NVIDIA Run:ai assumes the existence of a Docker registry for images, most likely installed within the organization. The installation requires the network address and port of the registry (referenced below as <REGISTRY_URL>).

SSH into a node with kubectl access to the cluster and Docker installed. To extract the NVIDIA Run:ai files, replace <VERSION> in the command below and run:

tar xvf runai-airgapped-package-<VERSION>.tar.gz

Upload images

  1. Upload images to a local Docker registry. Set the Docker registry address in the form NAME:PORT (do not include an https:// prefix):

export REGISTRY_URL=<DOCKER REGISTRY ADDRESS>
  2. Run the following script. You must have at least 20GB of free disk space. If Docker is configured to run as non-root, sudo is not required:

sudo ./setup.sh

The script should create a file named custom-env.yaml which will be used during control plane installation.
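
Because setup.sh pushes to the address in REGISTRY_URL, it can help to sanity-check the value first. A minimal sketch, assuming the NAME:PORT convention above (the example address is hypothetical):

```shell
# Hypothetical check: REGISTRY_URL must be NAME:PORT, with no scheme.
REGISTRY_URL="registry.local:5000"   # example value; use your own
case "$REGISTRY_URL" in
  http://*|https://*) echo "invalid: remove the http(s):// prefix" ;;
  *:[0-9]*)           echo "ok" ;;
  *)                  echo "invalid: expected NAME:PORT" ;;
esac
```
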

OpenShift

Connected

You should receive a token from NVIDIA Run:ai customer support. The following command provides access to the NVIDIA Run:ai container registry:

oc create secret docker-registry runai-reg-creds  \
--docker-server=https://runai.jfrog.io \
--docker-username=self-hosted-image-puller-prod \
--docker-password=<TOKEN> \
--docker-email=<EMAIL> \
--namespace=runai-backend
Air-gapped

You should receive a single file runai-airgapped-package-<VERSION>.tar.gz from NVIDIA Run:ai customer support.

NVIDIA Run:ai assumes the existence of a Docker registry for images, most likely installed within the organization. The installation requires the network address and port of the registry (referenced below as <REGISTRY_URL>).

SSH into a node with oc access to the cluster and Docker installed. To extract the NVIDIA Run:ai files, replace <VERSION> in the command below and run:

tar xvf runai-airgapped-package-<VERSION>.tar.gz

Upload images

  1. Upload images to a local Docker registry. Set the Docker registry address in the form NAME:PORT (do not include an https:// prefix):

export REGISTRY_URL=<DOCKER REGISTRY ADDRESS>
  2. Run the following script. You must have at least 20GB of free disk space. If Docker is configured to run as non-root, sudo is not required:

sudo ./setup.sh

The script should create a file named custom-env.yaml which will be used by the control plane installation.

Private docker registry (optional)

Kubernetes

To access the organization's Docker registry, you must configure the registry's credentials (imagePullSecret).

Create the secret named runai-reg-creds based on your existing credentials. For more information, see Pull an Image from a Private Registry.
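
If you prefer to create the secret declaratively, a manifest along these lines can be used (a sketch only; the namespace and the base64-encoded Docker config are placeholders to fill in from your own registry credentials):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: runai-reg-creds
  namespace: <NAMESPACE>   # placeholder: the namespace used by your installation
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <BASE64_ENCODED_DOCKER_CONFIG>   # placeholder
```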

OpenShift

To access the organization's Docker registry, you must configure the registry's credentials (imagePullSecret).

Create the secret named runai-reg-creds in the runai-backend namespace based on your existing credentials. The configuration is copied to the runai namespace during cluster installation. For more information, see Allowing pods to reference images from other secured registries.

Set up your environment

External Postgres database (optional)

If you have opted to use an external PostgreSQL database, you need to perform initial setup to ensure successful installation. Follow these steps:

  1. Create a SQL script file, edit the parameters below, and save it locally:

    • Replace <DATABASE_NAME> with a dedicated database name for NVIDIA Run:ai in your PostgreSQL database.

    • Replace <ROLE_NAME> with a dedicated role name (user) for the NVIDIA Run:ai database.

    • Replace <ROLE_PASSWORD> with a password for the new PostgreSQL role.

    • Replace <GRAFANA_PASSWORD> with the password to be set for Grafana integration.

    -- Create a new database for runai
    CREATE DATABASE <DATABASE_NAME>; 
    
    -- Create the role with login and password
    CREATE ROLE <ROLE_NAME>  WITH LOGIN PASSWORD '<ROLE_PASSWORD>'; 
    
    -- Grant all privileges on the database to the role
    GRANT ALL PRIVILEGES ON DATABASE <DATABASE_NAME> TO <ROLE_NAME>; 
    
    -- Connect to the newly created database
    \c <DATABASE_NAME> 
    
    -- grafana
    CREATE ROLE grafana WITH LOGIN PASSWORD '<GRAFANA_PASSWORD>'; 
    CREATE SCHEMA grafana authorization grafana;
    ALTER USER grafana set search_path='grafana';
    -- Exit psql
    \q
  2. Run the following command on a machine where the PostgreSQL client (psql) is installed:

    • Replace <POSTGRESQL_HOST> with the PostgreSQL ip address or hostname.

    • Replace <POSTGRESQL_USER> with the PostgreSQL username.

    • Replace <POSTGRESQL_PORT> with the port number where PostgreSQL is running.

    • Replace <POSTGRESQL_DB> with the name of your PostgreSQL database.

    • Replace <SQL_FILE> with the path to the SQL script created in the previous step.

    psql --host <POSTGRESQL_HOST> \
    --username <POSTGRESQL_USER> \
    --port <POSTGRESQL_PORT> \
    --dbname <POSTGRESQL_DB> \
    -a -f <SQL_FILE>
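
After the script runs, you can optionally verify the setup by connecting to <DATABASE_NAME> as the new role and checking that the grafana schema exists. A sketch, using the same placeholders as above:

```sql
-- Optional sanity check: run via psql, connected to <DATABASE_NAME>.
-- Should return one row ('grafana') if the script above succeeded.
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name = 'grafana';
```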
