# Cluster Authentication

To allow users to securely submit workloads using `kubectl`, you must configure the Kubernetes API server to authenticate users via the NVIDIA Run:ai identity provider. This is done by adding OpenID Connect (OIDC) flags to the Kubernetes API server configuration on each cluster.

### Retrieve Required OIDC Values

1. Go to **General settings**.
2. Navigate to **Cluster authentication**.
3. Copy the OIDC flags displayed. They follow this format:

```yaml
  containers:
  - command:
    ...
    - --oidc-client-id=runai
    - --oidc-issuer-url=https://<HOST>/auth/realms/runai
    - --oidc-username-prefix=-
```

* `--oidc-client-id` - The client ID that all tokens must be issued for.
* `--oidc-issuer-url` - The URL of the NVIDIA Run:ai identity provider.
* `--oidc-username-prefix` - Prefix prepended to username claims to prevent clashes with existing names. The special value `-` disables all prefixing.

{% hint style="info" %}
**Note**

These flags must be configured in the API server startup parameters for each cluster in your environment.
{% endhint %}
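Given the values retrieved above, a small helper can print the three flag lines in the form the API server expects. This is a minimal sketch, assuming `<HOST>` is replaced with your NVIDIA Run:ai control-plane host:

```shell
# Print the three OIDC flag lines to paste into the API server config.
# <HOST> is a placeholder for your NVIDIA Run:ai control-plane host.
client_id="runai"
issuer_url="https://<HOST>/auth/realms/runai"
username_prefix="-"
printf -- '- --oidc-client-id=%s\n- --oidc-issuer-url=%s\n- --oidc-username-prefix=%s\n' \
  "$client_id" "$issuer_url" "$username_prefix"
```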

## Kubernetes Distribution-Specific Configuration

{% hint style="info" %}
**Note**

* Azure Kubernetes Service (AKS) is not supported.
* For other Kubernetes distributions, refer to the distribution-specific instructions below.
{% endhint %}

<details>

<summary>Vanilla Kubernetes</summary>

1. Locate the Kubernetes API server configuration file. For vanilla Kubernetes, the configuration file is typically located at: `/etc/kubernetes/manifests/kube-apiserver.yaml`.
2. Edit the file. Under the `command` section, add the [required OIDC flags](#retrieve-required-oidc-values).
3. Verify that the changes have been applied. After saving the file, the API server should automatically restart since it's managed as a static pod. Confirm that the `kube-apiserver-<master-node-name>` pod in the `kube-system` namespace has restarted and is running with the new configuration. You can run the following command to check the pod status:

   ```bash
   kubectl get pods -n kube-system kube-apiserver-<master-node-name> -o yaml
   ```
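The dump above includes the full pod spec; to see only the OIDC flags, the JSON `command` array returned by a jsonpath query can be split and filtered. A sketch against sample output (on a live cluster, substitute `kubectl get pod -n kube-system kube-apiserver-<master-node-name> -o jsonpath='{.spec.containers[0].command}'` for the sample string):

```shell
# Sample of what the jsonpath query returns (a JSON array of args);
# split on commas and keep only the OIDC entries.
sample='["kube-apiserver","--oidc-client-id=runai","--oidc-issuer-url=https://<HOST>/auth/realms/runai","--oidc-username-prefix=-"]'
printf '%s\n' "$sample" | tr ',' '\n' | grep -- '--oidc'
```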

</details>

<details>

<summary>OpenShift Container Platform (OCP)</summary>

No additional configuration is required.

</details>

<details>

<summary>Rancher Kubernetes Engine (RKE1)</summary>

1. Edit the `cluster.yml` file used by RKE1. If you're using the Rancher UI, follow the instructions [here](https://ranchermanager.docs.rancher.com/reference-guides/cluster-configuration/rancher-server-configuration/rke1-cluster-configuration#editing-clusters-with-a-form-in-the-rancher-ui).
2. Add the [required OIDC flags](#retrieve-required-oidc-values) under the `kube-api` section:

   ```yaml
   kube-api:
       always_pull_images: false
       extra_args:
           oidc-client-id: runai
           ...
   ```
3. Verify the flags are applied by inspecting the running API server container:
   * Follow the Rancher documentation [here](https://ranchermanager.docs.rancher.com/troubleshooting/kubernetes-components/troubleshooting-controlplane-nodes) to locate the API server container ID.
   * Run the following command:

     ```bash
     docker inspect <kube-api-server-container-id>
     ```
   * Confirm that the OIDC flags have been added correctly to the container's configuration.

</details>

<details>

<summary>Rancher Kubernetes Engine 2 (RKE2)</summary>

If you're using the [RKE2 Quickstart](https://docs.rke2.io/install/quickstart/):

1. Edit `/etc/rancher/rke2/config.yaml`.
2. Add the [required OIDC flags](#retrieve-required-oidc-values) under `kube-apiserver-arg`, using the format shown below:

   ```yaml
   kube-apiserver-arg:
   - "oidc-client-id=runai"
   ...
   ```

If you're using the Rancher UI, add the required flags during cluster provisioning:

1. Navigate to **Cluster Management > Create**, select RKE2, and choose your platform.
2. In the Cluster Configuration screen, go to **Advanced > Additional API Server Args**.
3. Add the [required OIDC flags](#retrieve-required-oidc-values) as `<key>=<value>` (e.g., `oidc-username-prefix=-`).
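Whichever path you use, the flag list translates mechanically into RKE2's `<key>=<value>` form. A small sketch that prints the `kube-apiserver-arg` entries (illustrative `<HOST>` placeholder):

```shell
# Print the three flags as kube-apiserver-arg entries for config.yaml.
for flag in \
  'oidc-client-id=runai' \
  'oidc-issuer-url=https://<HOST>/auth/realms/runai' \
  'oidc-username-prefix=-'
do
  printf -- '- "%s"\n' "$flag"
done
```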

</details>

<details>

<summary>Google Kubernetes Engine (GKE)</summary>

To configure researcher authentication on GKE, use **Anthos Identity Service** and apply the appropriate OIDC configuration.

1. Install [Anthos Identity Service](https://cloud.google.com/kubernetes-engine/docs/how-to/oidc#enable-oidc) by running:

   ```bash
   gcloud container clusters update <gke-cluster-name> \
       --enable-identity-service --project=<gcp-project-name> --zone=<gcp-zone-name>
   ```
2. Install the [yq](https://github.com/mikefarah/yq) utility.
3. Configure the OIDC provider for username-password authentication. Make sure to use the [required OIDC flags](#retrieve-required-oidc-values):

   ```bash
   kubectl get clientconfig default -n kube-public -o yaml > login-config.yaml
   yq -i e ".spec +={\"authentication\":[{\"name\":\"oidc\",\"oidc\":{\"clientID\":\"runai\",\"issuerURI\":\"$OIDC_ISSUER_URL\",\"kubectlRedirectURI\":\"http://localhost:8000/callback\",\"userClaim\":\"sub\",\"userPrefix\":\"-\"}}]}" login-config.yaml
   kubectl apply -f login-config.yaml
   ```
4. Alternatively, configure the OIDC provider for single sign-on (SSO). Make sure to use the [required OIDC flags](#retrieve-required-oidc-values):

   ```bash
   kubectl get clientconfig default -n kube-public -o yaml > login-config.yaml
   yq -i e ".spec +={\"authentication\":[{\"name\":\"oidc\",\"oidc\":{\"clientID\":\"runai\",\"issuerURI\":\"$OIDC_ISSUER_URL\",\"groupsClaim\":\"groups\",\"kubectlRedirectURI\":\"http://localhost:8000/callback\",\"userClaim\":\"sub\",\"userPrefix\":\"-\"}}]}" login-config.yaml
   kubectl apply -f login-config.yaml
   ```
5. Update the `runaiconfig` with the Anthos Identity Service endpoint. First, get the external IP of the `gke-oidc-envoy` service:

   ```bash
   kubectl get svc -n anthos-identity-service
   NAME               TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)              AGE
   gke-oidc-envoy     LoadBalancer   10.37.3.111   35.236.229.19   443:31545/TCP        12h
   ```
6. Then, patch the `runaiconfig` to use this endpoint, replacing the IP address below with the actual external IP of the `gke-oidc-envoy` service:

   ```bash
   kubectl -n runai patch runaiconfig runai --type="merge" \
       -p '{"spec": {"researcher-service": {"args": {"gkeOidcEnvoyHost": "35.236.229.19"}}}}'
   ```
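Steps 5 and 6 can be chained: extract the external IP of `gke-oidc-envoy` and feed it into the patch. A sketch against the sample `kubectl get svc` output above (on a live cluster, `kubectl get svc gke-oidc-envoy -n anthos-identity-service --no-headers` yields the same single-line shape):

```shell
# Pull the EXTERNAL-IP column (4th field) from the service line; on a live
# cluster replace the sample with:
#   kubectl get svc gke-oidc-envoy -n anthos-identity-service --no-headers
sample='gke-oidc-envoy   LoadBalancer   10.37.3.111   35.236.229.19   443:31545/TCP   12h'
envoy_ip=$(printf '%s\n' "$sample" | awk '{print $4}')
echo "$envoy_ip"
```

The resulting `$envoy_ip` can then be substituted into the `kubectl patch` command from step 6.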

</details>

<details>

<summary>Amazon Elastic Kubernetes Service (EKS)</summary>

1. In the AWS Console, under EKS, find your cluster.
2. Go to **Configuration** and then to **Authentication**.
3. Associate a new identity provider, using the [required OIDC flags](#retrieve-required-oidc-values).

The process can take up to 30 minutes.
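The console steps above can also be scripted with `aws eks associate-identity-provider-config`. The parameter names below follow the EKS `AssociateIdentityProviderConfig` API but are an assumption to verify against the AWS documentation; the command is printed rather than executed, with placeholder values left as-is:

```shell
# Hypothetical CLI equivalent of the console steps; verify parameter names
# against the AWS EKS documentation before running.
cmd="aws eks associate-identity-provider-config --cluster-name <cluster-name> --oidc identityProviderConfigName=runai,issuerUrl=https://<HOST>/auth/realms/runai,clientId=runai,usernamePrefix=-"
echo "$cmd"
```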

</details>

<details>

<summary>NVIDIA Base Command Manager (BCM)</summary>

1. Locate the Kubernetes API server configuration file. For clusters deployed with BCM, the configuration file is typically located at: `/etc/kubernetes/manifests/kube-apiserver.yaml`.
2. Edit the file. Under the `command` section, add the [required OIDC flags](#retrieve-required-oidc-values).
3. Verify that the changes have been applied. After saving the file, the API server should automatically restart since it's managed as a static pod. Confirm that the `kube-apiserver-<master-node-name>` pod in the `kube-system` namespace has restarted and is running with the new configuration. You can run the following command to check the pod status:

   ```bash
   kubectl get pods -n kube-system kube-apiserver-<master-node-name> -o yaml
   ```

</details>
