Advanced Control Plane Configurations

Helm Chart Values

The NVIDIA Run:ai control plane installation can be customized for your environment using Helm values files or Helm install flags. After changing any value, restart the relevant NVIDIA Run:ai pods so they pick up the new configuration.

| Key | Change | Description |
| --- | --- | --- |
| global.ingress.ingressClass | Ingress class | NVIDIA Run:ai uses NGINX by default. If your cluster has a different ingress controller, you can configure the ingress class to be created by NVIDIA Run:ai. |
| global.ingress.tlsSecretName | TLS secret name | NVIDIA Run:ai requires a secret containing the domain certificate. If the runai-backend namespace already has such a secret, set its name here. |
| <service-name>.podLabels | Pod labels | Set the pod labels of NVIDIA Run:ai and third-party services as key/value pairs. |
| <service-name>.tolerations | Pod tolerations | Set the pod tolerations of NVIDIA Run:ai and third-party services as a list. |
| <service-name>.resources | Pod requests and limits | Set the resources of NVIDIA Run:ai and third-party services, for example limits of cpu: 500m and memory: 512Mi with requests of cpu: 250m and memory: 256Mi. |
| disableIstioSidecarInjection.enabled | Disable Istio sidecar injection | Disable the automatic injection of Istio sidecars for all NVIDIA Run:ai control plane services. |
| global.affinity | System nodes | Sets the system nodes where NVIDIA Run:ai system-level services are scheduled. Default: prefer to schedule on nodes labeled with node-role.kubernetes.io/runai-system. |
| global.customCA.enabled | Certificate authority | Enables the use of a custom Certificate Authority (CA) in your deployment. When set to true, the system is configured to trust a user-provided CA certificate for secure communication. |
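
As an illustration, several of the keys above could be combined in a single values file. This is a minimal sketch, not a recommended configuration: every name in angle brackets is a placeholder, the label and taint values are hypothetical, and the tolerations entry assumes the chart accepts standard Kubernetes toleration objects.

```yaml
# values.yaml (sketch). Angle-bracket names and example values are placeholders.
global:
  ingress:
    ingressClass: <ingress-class-name>   # class of the ingress controller in your cluster
    tlsSecretName: <tls-secret-name>     # existing secret in the runai-backend namespace
  customCA:
    enabled: true                        # trust a user-provided CA certificate

disableIstioSidecarInjection:
  enabled: true

<service-name>:                          # replace with the values key of the service
  podLabels:
    team: platform                       # hypothetical label
  tolerations:                           # assumed: standard Kubernetes toleration fields
    - key: dedicated                     # hypothetical taint key
      operator: Equal
      value: runai-system                # hypothetical taint value
      effect: NoSchedule
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 250m
      memory: 256Mi
```

The resources values are the ones quoted in the table; adjust them per service.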

Email Notifications Configuration

To enable and manage outbound email notifications for the NVIDIA Run:ai platform, configure the following values under the notificationsService.config.sinks.runai-email section in your Helm values. These settings are used both for workload-related notifications and for system messages from NVIDIA Run:ai. All parameters can be set in your values.yaml file during Helm deployment or via Helm upgrade.

| Parameter | Required | Default Value | Description |
| --- | --- | --- | --- |
| notificationsService.config.sinks.runai-email.type | required | email | Type of sink; must be set to email |
| notificationsService.config.sinks.runai-email.smtp_host | required | Empty | SMTP server hostname |
| notificationsService.config.sinks.runai-email.smtp_port | optional | 587 | SMTP server port |
| notificationsService.config.sinks.runai-email.user | optional | Empty | SMTP authentication username |
| notificationsService.config.sinks.runai-email.password | optional | Empty | SMTP authentication password |
| notificationsService.config.sinks.runai-email.smtp_tls_enabled | optional | true | Enable TLS for SMTP |
| notificationsService.config.sinks.runai-email.from_display_name | optional | NVIDIA Run:ai | Display name for the sender email address |
| notificationsService.config.sinks.runai-email.from | optional | | Sender email address |
| notificationsService.config.sinks.runai-email.direct_notifications | optional | false | Send workload notifications directly to the submitter's inbox instead of sending all notifications to a single global email address |
| notificationsService.config.sinks.runai-email.auth_type | optional | auth_login | Authentication type: auth_login or auth_plain |

Example Configuration

notificationsService:
  config:
    sinks:
      runai-email:
        type: email
        smtp_host: my.smtp.host
        smtp_port: 587
        user: smtp_user
        password: smtp_password
        smtp_tls_enabled: true
        from_display_name: NVIDIA Run:ai
        from: <sender-email-address>
        direct_notifications: true
        logo_url: https://s3.amazonaws.com/www.run.ai/nvidia_runai_logo_new.png
        auth_type: auth_login

Additional Third-Party Configurations

The NVIDIA Run:ai control plane chart includes multiple sub-charts of third-party components:

  • Data store - PostgreSQL (postgresql)

  • Metrics Store - Thanos (thanos)

  • Identity & Access Management - Keycloakx (keycloakx)

  • Analytics Dashboard - Grafana (grafana)

  • Caching, Queue - NATS (nats)

Note

Click on any component to view its chart values and configurations.

PostgreSQL

If you have opted to connect to an external PostgreSQL database, refer to the additional configurations table below. Adjust the following parameters based on your connection details:

  1. Disable PostgreSQL deployment - postgresql.enabled

  2. NVIDIA Run:ai connection details - global.postgresql.auth

  3. Grafana connection details - grafana.dbUser, grafana.dbPassword

| Key | Change | Description |
| --- | --- | --- |
| postgresql.enabled | PostgreSQL installation | If set to false, PostgreSQL will not be installed. |
| global.postgresql.auth.host | PostgreSQL host | Hostname or IP address of the PostgreSQL server. |
| global.postgresql.auth.port | PostgreSQL port | Port number on which PostgreSQL is running. |
| global.postgresql.auth.username | PostgreSQL username | Username for connecting to PostgreSQL. |
| global.postgresql.auth.password | PostgreSQL password | Password for the PostgreSQL user specified by global.postgresql.auth.username. |
| global.postgresql.auth.postgresPassword | PostgreSQL default admin password | Password for the built-in PostgreSQL superuser (postgres). |
| global.postgresql.auth.existingSecret | PostgreSQL credentials (secret) | Existing secret name with authentication credentials. |
| global.postgresql.auth.dbSslMode | PostgreSQL connection SSL mode | Set the SSL mode. See the full list in "Protection Provided in Different Modes". The prefer mode is not supported. |
| postgresql.primary.initdb.password | PostgreSQL default admin password | Set the same password as in global.postgresql.auth.postgresPassword (if changed). |
| postgresql.primary.persistence.storageClass | Storage class | Configures the installation to use a specific storage class instead of the default one. |
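
The three adjustments for an external PostgreSQL database listed earlier in this section might look like the following in a values file. This is a sketch under assumptions: the host, user names, and passwords are placeholders, 5432 is simply the PostgreSQL default port, and require is one of the standard PostgreSQL SSL modes, used here only as an example.

```yaml
# values.yaml (sketch) for connecting to an external PostgreSQL database.
postgresql:
  enabled: false                     # 1. do not deploy the bundled PostgreSQL

global:
  postgresql:
    auth:                            # 2. NVIDIA Run:ai connection details
      host: <postgresql-host>        # placeholder hostname or IP address
      port: 5432                     # PostgreSQL default port; change if yours differs
      username: <username>           # placeholder
      password: <password>           # placeholder
      dbSslMode: require             # example mode; prefer is not supported

grafana:                             # 3. Grafana connection details
  dbUser: <grafana-db-user>          # placeholder
  dbPassword: <grafana-db-password>  # placeholder
```

To keep passwords out of the values file, global.postgresql.auth.existingSecret can point to an existing secret instead.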

Thanos

Note

This section applies to Kubernetes only.

| Key | Change | Description |
| --- | --- | --- |
| thanos.receive.persistence.storageClass | Storage class | Configures the installation to use a specific storage class instead of the default one. |
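
For reference, the dotted key above corresponds to the following nesting in a values file. The storage class name is a placeholder and must match a storage class that exists in your cluster.

```yaml
# values.yaml (sketch): storage class for the Thanos receive component.
thanos:
  receive:
    persistence:
      storageClass: <storage-class-name>   # placeholder
```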

Keycloakx

The keycloakx.adminUser can only be set during the initial installation. The admin password can be changed later through the Keycloak UI, but you must also update the keycloakx.adminPassword value in the Helm chart using helm upgrade. Failing to update the Helm values after changing the password can lead to control plane services encountering errors.

| Key | Change | Description |
| --- | --- | --- |
| keycloakx.adminUser | User name of the internal identity provider administrator | This user is the administrator of Keycloak. |
| keycloakx.adminPassword | Password of the internal identity provider administrator | This password is for the administrator of Keycloak. |
| keycloakx.existingSecret | Keycloakx credentials (secret) | Existing secret name with authentication credentials. |
| global.keycloakx.host | Keycloak (NVIDIA Run:ai internal identity provider) host path | Override the DNS for Keycloak. This can be used to access Keycloak from outside the cluster. |
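
A values fragment for the Keycloak keys could be sketched as follows. All values are placeholders. As noted above, adminUser only takes effect at initial installation, and adminPassword must be kept in sync with any password change made in the Keycloak UI.

```yaml
# values.yaml (sketch): Keycloak administrator and host override.
keycloakx:
  adminUser: <admin-user>            # placeholder; can only be set at initial installation
  adminPassword: <admin-password>    # placeholder; update via helm upgrade after a UI change

global:
  keycloakx:
    host: <keycloak-hostname>        # placeholder; overrides the DNS for Keycloak
```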

Grafana

| Key | Change | Description |
| --- | --- | --- |
| grafana.db.existingSecret | Grafana database connection credentials (secret) | Existing secret name with authentication credentials. |
| grafana.dbUser | Grafana database username | Username for accessing the Grafana database. |
| grafana.dbPassword | Grafana database password | Password for the Grafana database user. |
| grafana.admin.existingSecret | Grafana admin default credentials (secret) | Existing secret name with authentication credentials. |
| grafana.adminUser | Grafana username | Override the NVIDIA Run:ai default user name for accessing Grafana. |
| grafana.adminPassword | Grafana password | Override the NVIDIA Run:ai default password for accessing Grafana. |
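
The Grafana keys nest as shown below. This sketch uses the existing-secret variants for both the database and admin credentials; the secret names are placeholders, and the keys each secret must contain are not specified here, so check the chart values before relying on it.

```yaml
# values.yaml (sketch): Grafana credentials taken from existing secrets.
grafana:
  db:
    existingSecret: <grafana-db-secret>      # placeholder secret name
  admin:
    existingSecret: <grafana-admin-secret>   # placeholder secret name
  # Alternatively, set the credentials inline:
  # dbUser: <grafana-db-user>
  # dbPassword: <grafana-db-password>
  # adminUser: <grafana-admin-user>
  # adminPassword: <grafana-admin-password>
```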
