Service mesh

NVIDIA Run:ai supports service mesh implementations. When a service mesh is deployed with automatic sidecar injection, the configurations below must be applied to keep it compatible with NVIDIA Run:ai. This document outlines the required changes for the NVIDIA Run:ai control plane and cluster.

Control plane configuration

Note

This section applies to self-hosted deployments only.

By default, NVIDIA Run:ai prevents Istio from injecting sidecar containers into system jobs in the control plane. For other service mesh solutions, you must add the required labels or annotations manually during installation.

To disable sidecar injection in the NVIDIA Run:ai control plane, modify the Helm values file by adding the required pod labels to the following components. See Advanced control plane configurations for more details.

Example for Open Service Mesh:

authorizationMigrator:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
clusterMigrator:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
identityProviderReconciler:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
keepPVC:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
orgUnitsMigrator:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
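Because the same label is repeated for every component, the values block can also be generated rather than typed by hand. The following is a minimal shell sketch: the helper name gen_mesh_values and the output file name are hypothetical, and the component list is copied from the example above, so update it if your control plane chart lists different components.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Control plane components taken from the example above.
COMPONENTS=(
  authorizationMigrator
  clusterMigrator
  identityProviderReconciler
  keepPVC
  orgUnitsMigrator
)

# gen_mesh_values (hypothetical helper): print a Helm values fragment that
# adds one pod label to every component in COMPONENTS.
#   $1 = label key, $2 = label value (written as a quoted YAML string)
gen_mesh_values() {
  local key="$1" value="$2" component
  for component in "${COMPONENTS[@]}"; do
    printf '%s:\n  podLabels:\n    %s: "%s"\n' "$component" "$key" "$value"
  done
}

# Reproduce the Open Service Mesh values shown above in a (hypothetical) file.
gen_mesh_values openservicemesh.io/sidecar-injection disabled > runai-control-plane-mesh.yaml
```

The value is always quoted so that inputs such as false are emitted as strings, which is what Kubernetes expects for label values. Pass the generated file to your control plane Helm command with -f, alongside any values files you already use.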

Cluster configuration

Installation phase

Sidecar containers injected by some service mesh solutions can prevent NVIDIA Run:ai installation hooks from completing. To avoid this, add the required labels or annotations to the Helm installation command, replacing A and B with the key and value your service mesh expects:

helm upgrade -i ... \
--set global.additionalJobLabels.A=B --set global.additionalJobAnnotations.A=B

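Example for Istio service mesh. The label key contains dots, which --set would interpret as nested keys, so the label is passed with --set-json: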

helm upgrade -i ... \
--set-json global.additionalJobLabels='{"sidecar.istio.io/inject":"false"}'
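If you prefer to keep installation settings in a file, the same label can be supplied through a Helm values file instead of a command-line flag, which also avoids JSON quoting in the shell. This is a minimal sketch; the file name runai-cluster-mesh-values.yaml is hypothetical, and the remaining installation arguments are left elided as in the command above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values-file form of the Istio job label. The value is quoted so YAML
# parses it as the string "false"; Kubernetes label values must be strings.
cat > runai-cluster-mesh-values.yaml <<'EOF'
global:
  additionalJobLabels:
    sidecar.istio.io/inject: "false"
EOF

# Pass the file together with your existing installation arguments, e.g.:
#   helm upgrade -i ... -f runai-cluster-mesh-values.yaml
```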

Workloads

To prevent sidecar injection in workloads created at runtime (such as training workloads), update the runaiconfig resource. See Advanced cluster configurations for more details:

spec:
  workload-controller:
    additionalPodLabels:
      sidecar.istio.io/inject: "false"
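One way to apply this change without opening an editor is a JSON merge patch with kubectl. The sketch below assumes the RunaiConfig object is named runai in the runai namespace; verify both on your cluster and override them through the environment if they differ. The helper name patch_runaiconfig is hypothetical. By default it only prints the command; it runs it against the current kube context when APPLY=1 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the RunaiConfig object is named "runai" in the "runai"
# namespace. Override with RUNAICONFIG_NAME / RUNAI_NAMESPACE if yours differ.
RUNAICONFIG_NAME="${RUNAICONFIG_NAME:-runai}"
RUNAI_NAMESPACE="${RUNAI_NAMESPACE:-runai}"

# JSON merge patch carrying the same spec fragment as above. The label value
# is the string "false" because Kubernetes label values must be strings.
PATCH='{"spec":{"workload-controller":{"additionalPodLabels":{"sidecar.istio.io/inject":"false"}}}}'

# patch_runaiconfig (hypothetical helper): print the kubectl command by
# default; execute it only when APPLY=1.
patch_runaiconfig() {
  local cmd=(kubectl patch runaiconfig "$RUNAICONFIG_NAME" -n "$RUNAI_NAMESPACE" --type merge -p "$PATCH")
  if [[ "${APPLY:-0}" == "1" ]]; then
    "${cmd[@]}"
  else
    printf '%q ' "${cmd[@]}"
    printf '\n'
  fi
}

patch_runaiconfig
```

Printing by default lets you review the exact patch before it touches the cluster; a merge patch only adds or updates the listed keys and leaves the rest of the spec unchanged.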
