Cluster Authentication

To allow users to securely submit workloads using kubectl, you must configure the Kubernetes API server to authenticate users via the NVIDIA Run:ai identity provider. This is done by adding OpenID Connect (OIDC) flags to the Kubernetes API server configuration on each cluster. These flags instruct Kubernetes to validate credentials using NVIDIA Run:ai's identity provider.

Retrieve Required OIDC Flags

  1. In the NVIDIA Run:ai platform, go to General settings.

  2. Navigate to Cluster authentication.

  3. Copy the required OIDC flags, which follow this format:

  containers:
  - command:
    ...
    - --oidc-client-id=runai
    - --oidc-issuer-url=https://<HOST>/auth/realms/runai
    - --oidc-username-prefix=-
  • --oidc-client-id - The client ID that all tokens must be issued for.

  • --oidc-issuer-url - The URL of the NVIDIA Run:ai identity provider.

  • --oidc-username-prefix - A prefix prepended to username claims to prevent clashes with existing names (e.g., [email protected]).
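For illustration only, the prefix determines how the username claim appears as an RBAC subject: with a prefix such as oidc:, a token whose username claim is jane is seen by Kubernetes as oidc:jane, while the special value - disables prefixing entirely. A hypothetical sketch (the user, namespace, and role names are assumptions, not part of the NVIDIA Run:ai setup):

    # Hypothetical RoleBinding showing how a prefixed OIDC username
    # would be matched if a prefix such as "oidc:" were configured.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: jane-view        # assumed name
      namespace: team-a      # assumed namespace
    subjects:
    - kind: User
      name: "oidc:jane"      # username claim "jane" with --oidc-username-prefix=oidc:
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: view
      apiGroup: rbac.authorization.k8s.io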

Note

These flags must be configured in the API server startup parameters for each cluster in your environment.

Kubernetes Distribution-Specific Configuration

Note

  • Azure Kubernetes Service (AKS) is not supported.

  • For other Kubernetes distributions, refer to the distribution-specific instructions below.

Vanilla Kubernetes
  1. Locate the Kubernetes API server configuration file. For vanilla Kubernetes, the configuration file is typically located at: /etc/kubernetes/manifests/kube-apiserver.yaml.

  2. Edit the file. Under the command section, add the required OIDC flags.

  3. Verify that the changes have been applied. Because the API server is managed as a static pod, it restarts automatically after the file is saved. Confirm that the kube-apiserver-<master-node-name> pod in the kube-system namespace has restarted and is running with the new configuration. You can run the following command to check the pod status:

    kubectl get pods -n kube-system kube-apiserver-<master-node-name> -o yaml
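To quickly confirm the flags are present, you can filter the pod's command line instead — a minimal sketch using the same placeholder pod name:

    # Print only the OIDC-related flags from the API server's command
    kubectl get pod -n kube-system kube-apiserver-<master-node-name> \
      -o jsonpath='{.spec.containers[0].command}' | tr ',' '\n' | grep oidc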
OpenShift Container Platform (OCP)

No additional configuration is required.

Rancher Kubernetes Engine (RKE1)
  1. Edit the cluster.yml file used by RKE1. If you're using the Rancher UI, edit the cluster configuration there instead (see the Rancher documentation).

  2. Add the required OIDC flags under the kube-api section:

    kube-api:
        always_pull_images: false
        extra_args:
            oidc-client-id: runai
            oidc-issuer-url: https://<HOST>/auth/realms/runai
            oidc-username-prefix: "-"
  3. Verify the flags are applied by inspecting the running API server container:

    • Refer to the Rancher documentation to locate the API server container ID.

    • Run the following command:

      docker inspect <kube-api-server-container-id>
    • Confirm that the OIDC flags have been added correctly to the container's configuration.
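Because docker inspect output is verbose, you may want to filter it down to the OIDC arguments — a minimal sketch, assuming a Docker-based RKE1 node:

      # Print only the OIDC-related arguments of the kube-apiserver container
      docker inspect <kube-api-server-container-id> --format '{{json .Args}}' | tr ',' '\n' | grep oidc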

Rancher Kubernetes Engine 2 (RKE2)

If you're using the RKE2 Quickstart:

  1. Edit /etc/rancher/rke2/config.yaml.

  2. Add the required OIDC flags under kube-apiserver-arg, using the format shown below:

    kube-apiserver-arg:
    - "oidc-client-id=runai"
    - "oidc-issuer-url=https://<HOST>/auth/realms/runai"
    - "oidc-username-prefix=-"
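Note that changes to /etc/rancher/rke2/config.yaml typically take effect only after the RKE2 server service is restarted on each server node — a minimal sketch:

    # Restart RKE2 so the API server picks up the new arguments
    sudo systemctl restart rke2-server.service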

If you're using Rancher UI:

  1. Add the required flags during the cluster provisioning process.

  2. Navigate to: Cluster Management > Create, select RKE2, and choose your platform.

  3. In the Cluster Configuration screen, go to: Advanced > Additional API Server Args.

  4. Add the required OIDC flags as <key>=<value> (e.g. oidc-username-prefix=-).
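To verify the flags on a provisioned server node, you can check the static pod manifest that RKE2 renders — a sketch, assuming the default RKE2 data directory:

    # RKE2 runs the API server as a static pod; confirm the flags landed there
    grep oidc /var/lib/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml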

Google Kubernetes Engine (GKE)

To configure researcher authentication on GKE, use Anthos Identity Service and apply the appropriate OIDC configuration.

  1. Install Anthos Identity Service by running:

    gcloud container clusters update <gke-cluster-name> \
        --enable-identity-service --project=<gcp-project-name> --zone=<gcp-zone-name>
  2. Install the yq utility.

  3. Configure the OIDC provider for username-password authentication. Make sure to use the required OIDC flags:

    kubectl get clientconfig default -n kube-public -o yaml > login-config.yaml
    yq -i e ".spec +={\"authentication\":[{\"name\":\"oidc\",\"oidc\":{\"clientID\":\"runai\",\"issuerURI\":\"$OIDC_ISSUER_URL\",\"kubectlRedirectURI\":\"http://localhost:8000/callback\",\"userClaim\":\"sub\",\"userPrefix\":\"-\"}}]}" login-config.yaml
    kubectl apply -f login-config.yaml
  4. Or, configure the OIDC provider for single sign-on (SSO). Make sure to use the required OIDC flags:

    kubectl get clientconfig default -n kube-public -o yaml > login-config.yaml
    yq -i e ".spec +={\"authentication\":[{\"name\":\"oidc\",\"oidc\":{\"clientID\":\"runai\",\"issuerURI\":\"$OIDC_ISSUER_URL\",\"groupsClaim\":\"groups\",\"kubectlRedirectURI\":\"http://localhost:8000/callback\",\"userClaim\":\"sub\",\"userPrefix\":\"-\"}}]}" login-config.yaml
    kubectl apply -f login-config.yaml
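    For reference, after either command the ClientConfig spec should contain an authentication entry roughly like the following (a sketch of the fields the yq edit injects; groupsClaim appears only in the SSO variant):

      spec:
        authentication:
        - name: oidc
          oidc:
            clientID: runai
            issuerURI: https://<HOST>/auth/realms/runai
            groupsClaim: groups              # SSO variant only
            kubectlRedirectURI: http://localhost:8000/callback
            userClaim: sub
            userPrefix: "-"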
  5. Update the runaiconfig with the Anthos Identity Service endpoint. First, get the external IP of the gke-oidc-envoy service:

    kubectl get svc -n anthos-identity-service
    NAME               TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)              AGE
    gke-oidc-envoy     LoadBalancer   10.37.3.111   35.236.229.19   443:31545/TCP        12h
  6. Then, patch the runaiconfig to use this endpoint, replacing the IP below with the actual external IP of the gke-oidc-envoy service:

    kubectl -n runai patch runaiconfig runai --type=merge \
      -p '{"spec": {"researcher-service": {"args": {"gkeOidcEnvoyHost": "35.236.229.19"}}}}'
Amazon Elastic Kubernetes Service (EKS)
  1. In the AWS Console, under EKS, find your cluster.

  2. Go to Configuration and then to Authentication.

  3. Associate a new identity provider. Use the required OIDC flags.

The process can take up to 30 minutes.
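If you prefer the CLI, the same association can be performed with the AWS CLI — a sketch using the flags above (the configuration name runai is an assumption):

    # Associate the NVIDIA Run:ai OIDC identity provider with the EKS cluster
    aws eks associate-identity-provider-config \
      --cluster-name <eks-cluster-name> \
      --oidc identityProviderConfigName=runai,issuerUrl=https://<HOST>/auth/realms/runai,clientId=runai,usernamePrefix=-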

NVIDIA Base Command Manager (BCM)
  1. Locate the Kubernetes API server configuration file. As with vanilla Kubernetes, it is typically located at: /etc/kubernetes/manifests/kube-apiserver.yaml.

  2. Edit the file. Under the command section, add the required OIDC flags.

  3. Verify that the changes have been applied. Because the API server is managed as a static pod, it restarts automatically after the file is saved. Confirm that the kube-apiserver-<master-node-name> pod in the kube-system namespace has restarted and is running with the new configuration (the same filtering sketch shown in the Vanilla Kubernetes section applies here). You can run the following command to check the pod status:

    kubectl get pods -n kube-system kube-apiserver-<master-node-name> -o yaml
