# Kubernetes Gateway API

NVIDIA Run:ai supports the Kubernetes Gateway API as an alternative to Ingress for routing external traffic. Gateway API provides a flexible and extensible model for defining how traffic is exposed and routed within the cluster.

This page builds on the concepts described in [Routing Traffic to and from NVIDIA Run:ai Services](/self-hosted/getting-started/installation/install-using-helm/system-requirements.md#routing-traffic-to-and-from-nvidia-run-ai-services), including FQDN configuration, TLS certificates, and routing behavior.

{% hint style="info" %}
**Note**

Gateway API support in NVIDIA Run:ai is **optional**. If you are currently using HAProxy Ingress or other ingress controllers, no action is required. Customers who wish to adopt Gateway API may do so using the instructions on this page.
{% endhint %}

## Scope and Prerequisites

This guide assumes a self-hosted deployment where the NVIDIA Run:ai control plane and cluster services are installed on the same Kubernetes cluster.

Before proceeding, ensure that the following prerequisites are already configured:

* Fully Qualified Domain Names (FQDN) for:
  * Control plane access
  * Development workspaces and training workloads
  * Inference workloads
* TLS certificates associated with the configured FQDNs

These configurations are described in the [Routing Traffic to and from NVIDIA Run:ai Services](/self-hosted/getting-started/installation/install-using-helm/system-requirements.md#routing-traffic-to-and-from-nvidia-run-ai-services) section.

## Routing Modes in NVIDIA Run:ai

NVIDIA Run:ai supports two routing approaches for exposing services: **host-based routing** and **path-based routing**.

Different NVIDIA Run:ai services use these routing approaches as follows:

| Service                         | Routing Mode                     | Example                                          |
| ------------------------------- | -------------------------------- | ------------------------------------------------ |
| Control plane                   | Host-based (single domain)       | `https://runai.mycorp.local`                     |
| Inference workloads             | Host-based (wildcard subdomains) | `https://<service>.runai-inference.mycorp.local` |
| Workspaces & training workloads | Host-based or path-based         | See section below                                |

## Installing a Gateway Controller

NVIDIA Run:ai supports any [conformant Gateway API implementation](https://gateway-api.sigs.k8s.io/implementations/). The example below uses **KGateway**. If you are using a different conformant controller, follow its installation documentation and then proceed to the migration steps.

1. Install the Gateway API CRDs:

   ```bash
   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml
   ```
2. Install the KGateway CRDs:

   ```bash
   helm upgrade -i --create-namespace \
     --namespace kgateway-system \
     --version v2.2.1 \
     kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
   ```
3. Install the KGateway controller:

   ```bash
   helm upgrade -i -n kgateway-system kgateway \
     oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
     --version v2.2.1
   ```

Ensure that the Gateway controller is running before proceeding.
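
A quick check is to confirm that the controller pods are up and that the `kgateway` GatewayClass (referenced by the Gateway manifests below) was registered:

```bash
# Controller pods should be Running
kubectl get pods -n kgateway-system

# The GatewayClass should report Accepted=True
kubectl get gatewayclass kgateway
```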

## Routing Traffic for Workspaces and Training Workloads

The following sections describe how development workspaces and training workloads are exposed using Gateway API.

Depending on the selected routing mode, these workloads are accessed differently:

* In **host-based routing**, each development workspace and training workload is exposed using its own subdomain:

  ```bash
  https://<project>-<workload>.<CLUSTER_DOMAIN>
  ```
* In **path-based routing**, development workspaces and training workloads are exposed under a shared domain using URL paths:

  ```bash
  https://runai.mycorp.local/<project>/<workload>
  ```

This distinction determines the required FQDN structure, TLS certificates, and Gateway configuration described in the sections below.

## Gateway API with Host-Based Routing

In this configuration:

* Development workspaces and training workloads are exposed using subdomains
* A wildcard FQDN is required (e.g., `*.runai.mycorp.local`)
* A wildcard TLS certificate is required for those workloads
* The Gateway includes listeners for:
  * The cluster domain (control plane access)
  * Wildcard subdomains (workspace and training workloads)

### Gateway Configuration

{% hint style="info" %}
**Note**

Routing is based on hostnames (subdomains). HTTPRoute resources for the control plane and workloads are created automatically by NVIDIA Run:ai and bind to the configured Gateway.
{% endhint %}

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: runai-gateway
  namespace: runai
spec:
  gatewayClassName: kgateway
  listeners:
  - name: https-cluster-domain
    protocol: HTTPS
    port: 443
    hostname: "<CLUSTER_DOMAIN>"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: runai-cluster-domain-tls-secret

  - name: https-workloads
    protocol: HTTPS
    port: 443
    hostname: "*.<CLUSTER_DOMAIN>"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: runai-cluster-domain-star-tls-secret
```
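
The listeners above reference two TLS secrets that must exist in the `runai` namespace before the Gateway can serve traffic. As a sketch, assuming you have certificate and private key files for the cluster domain and its wildcard (the file paths are placeholders):

```bash
# Certificate for <CLUSTER_DOMAIN>
kubectl create secret tls runai-cluster-domain-tls-secret -n runai \
    --cert /path/to/fullchain.pem \
    --key /path/to/private.pem

# Wildcard certificate for *.<CLUSTER_DOMAIN>
kubectl create secret tls runai-cluster-domain-star-tls-secret -n runai \
    --cert /path/to/star-fullchain.pem \
    --key /path/to/star-private.pem
```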

## Gateway API with Path-Based Routing

In this configuration:

* Development workspaces and training workloads are exposed under a single domain using URL paths
* A wildcard FQDN is **not required** for workspace and training workloads
* A wildcard TLS certificate is **not required** for workspace and training workloads
* The Gateway uses a single domain listener for these services

Ensure that host-based routing is disabled by setting the following cluster configuration. For more details, see [Advanced cluster configurations](/self-hosted/infrastructure-setup/advanced-setup/cluster-config.md).

```yaml
clusterConfig.global.subdomainSupport: false
```
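
If the cluster installation is managed with Helm, one way to apply this setting (mirroring the upgrade commands used later on this page) is:

```bash
helm upgrade runai-cluster -n runai runai/runai-cluster \
  --reuse-values \
  --set-string clusterConfig.global.subdomainSupport=false
```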

### Gateway Configuration

{% hint style="info" %}
**Note**

Routing is based on URL paths under a shared domain. HTTPRoute resources for the control plane and workloads are created automatically by NVIDIA Run:ai and bind to the configured Gateway.
{% endhint %}

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: runai-gateway
  namespace: runai
spec:
  gatewayClassName: kgateway
  listeners:
  - name: https-cluster-domain
    protocol: HTTPS
    port: 443
    hostname: "<CLUSTER_DOMAIN>"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: runai-cluster-domain-tls-secret
```
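
Apply the Gateway and confirm that the controller accepts it (the manifest file name is an example):

```bash
kubectl apply -f runai-gateway.yaml
kubectl get gateway runai-gateway -n runai
```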

## Inference Routing with Gateway API

Inference workloads are always exposed using **host-based routing**, regardless of the selected routing mode for development workspaces and training workloads.

Inference endpoints are accessed via dedicated subdomains, for example:

```bash
https://<service>.runai-inference.mycorp.local
```

To enable inference routing with Gateway API:

* A wildcard FQDN must be configured for inference (e.g., `*.runai-inference.mycorp.local`)
* A wildcard TLS certificate must be configured for that domain
* The Gateway must include a listener for the inference subdomain
* Knative Serving must be configured to route inference traffic from the Gateway to inference workloads

### TLS Certificate

Ensure that a wildcard TLS certificate for inference is created:

* Secret name: `runai-cluster-inference-tls-secret`
* Namespace: `runai`

Replace `/path/to/inference-fullchain.pem` and `/path/to/inference-private.pem` with the actual paths to your certificate and private key:

```bash
kubectl create secret tls runai-cluster-inference-tls-secret -n runai \
    --cert /path/to/inference-fullchain.pem \
    --key /path/to/inference-private.pem
```
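
You can confirm that the secret exists before referencing it from the Gateway:

```bash
kubectl get secret runai-cluster-inference-tls-secret -n runai
```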

### Gateway Configuration

{% hint style="info" %}
**Note**

Routing to inference workloads is based on hostnames (subdomains).
{% endhint %}

Add an inference listener to the existing Gateway resource:

```yaml
- name: https-inference
  protocol: HTTPS
  port: 443
  hostname: "*.runai-inference.<CLUSTER_DOMAIN>"
  tls:
    mode: Terminate
    certificateRefs:
    - kind: Secret
      name: runai-cluster-inference-tls-secret
```
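
After updating the Gateway, the new listener should appear in its status alongside the existing ones:

```bash
# Lists the listener names reported in the Gateway status
kubectl get gateway runai-gateway -n runai \
  -o jsonpath='{range .status.listeners[*]}{.name}{"\n"}{end}'
```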

### Configure Knative Serving

NVIDIA Run:ai supports Knative-based inference workloads. Inference traffic arrives at the Gateway and is forwarded to Knative Serving through Kourier, which routes it to the individual inference endpoints. The following steps install Knative Serving and configure the HTTPRoute to connect the Gateway to Kourier. Knative versions 1.19 to 1.21 are supported.

{% tabs %}
{% tab title="Kubernetes" %}

1. Install Knative Serving. Follow the [Installing Knative](https://knative.dev/docs/install/operator/knative-with-operators/) instructions or run:

   ```bash
   helm repo add knative-operator https://knative.github.io/operator
   helm install knative-operator --create-namespace \
     --namespace knative-operator \
     --version 1.18.2 \
     knative-operator/knative-operator
   ```
2. Create the `knative-serving` namespace:

   ```bash
   kubectl create ns knative-serving
   ```
3. Create a YAML file named `knative-serving.yaml` and replace the placeholder FQDN with your wildcard [inference FQDN](/self-hosted/getting-started/installation/install-using-helm/system-requirements.md#fully-qualified-domain-name-fqdn) (for example, `runai-inference.mycorp.local`):

   ```yaml
   apiVersion: operator.knative.dev/v1beta1
   kind: KnativeServing
   metadata:
     name: knative-serving
     namespace: knative-serving
   spec:
     config:
       config-autoscaler:
         enable-scale-to-zero: "true"
       config-features:
         kubernetes.podspec-affinity: enabled
         kubernetes.podspec-init-containers: enabled
         kubernetes.podspec-persistent-volume-claim: enabled
         kubernetes.podspec-persistent-volume-write: enabled
         kubernetes.podspec-schedulername: enabled
         kubernetes.podspec-securitycontext: enabled
         kubernetes.podspec-tolerations: enabled
         kubernetes.podspec-volumes-emptydir: enabled
         kubernetes.podspec-fieldref: enabled
         kubernetes.containerspec-addcapabilities: enabled
         kubernetes.podspec-nodeselector: enabled
         multi-container: enabled
         kubernetes.podspec-hostipc: enabled
         kubernetes.podspec-hostnetwork: enabled
       domain:
         runai-inference.mycorp.local: "" # replace with the wildcard FQDN for Inference
       network:
         domainTemplate: '{{.Name}}-{{.Namespace}}.{{.Domain}}'
         ingress-class: kourier.ingress.networking.knative.dev
         default-external-scheme: https
     high-availability:
       replicas: 2
     ingress:
       kourier:
         enabled: true
   ```
4. Apply the changes:

   ```bash
   kubectl apply -f knative-serving.yaml
   ```
5. Create a YAML file named `knative-httproute.yaml` to route inference traffic from the Gateway to the Kourier service. Replace the FQDN placeholder with your wildcard [inference FQDN](/self-hosted/getting-started/installation/install-using-helm/system-requirements.md#fully-qualified-domain-name-fqdn):

   ```yaml
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: knative-serving-httproute
     namespace: knative-serving
   spec:
     parentRefs:
       - name: runai-gateway
         namespace: runai
     hostnames:
       - "*.runai-inference.mycorp.local" # replace with the wildcard FQDN for Inference
     rules:
       - matches:
         - path:
             type: PathPrefix
             value: /
         backendRefs:
           - name: kourier
             port: 80
   ```
6. Apply the changes:

   ```bash
   kubectl apply -f knative-httproute.yaml
   ```
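
After applying, you can optionally confirm that Kourier is running and that the route was attached to the Gateway; the resource names follow the manifests above:

```bash
# Kourier and the Knative Serving pods should be Running
kubectl get pods -n knative-serving

# The HTTPRoute status should list runai-gateway as an accepted parent
kubectl get httproute knative-serving-httproute -n knative-serving
```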

{% endtab %}

{% tab title="OpenShift" %}

1. Install the OpenShift Serverless Operator by following the [Installing the OpenShift Serverless Operator](https://docs.redhat.com/en/documentation/red_hat_openshift_serverless/1.37/html/installing_openshift_serverless/install-serverless-operator) instructions, then continue with the steps below.
2. Create the `knative-serving` project:

   ```bash
   oc new-project knative-serving
   ```
3. Create a YAML file named `knative-serving.yaml`:

   ```yaml
   apiVersion: operator.knative.dev/v1beta1
   kind: KnativeServing
   metadata:
     finalizers:
       - knative-serving-openshift
       - knativeservings.operator.knative.dev
     name: knative-serving
     namespace: knative-serving
   spec:
     config:
       config-features:
         kubernetes.podspec-tolerations: enabled
         kubernetes.podspec-volumes-emptydir: enabled
         kubernetes.podspec-persistent-volume-claim: enabled
         multi-container: enabled
         kubernetes.podspec-persistent-volume-write: enabled
         kubernetes.podspec-fieldref: enabled
         kubernetes.podspec-schedulername: enabled
         kubernetes.podspec-nodeselector: enabled
         kubernetes.podspec-init-containers: enabled
         kubernetes.podspec-securitycontext: enabled
         kubernetes.podspec-affinity: enabled
         kubernetes.containerspec-addcapabilities: enabled
     controller-custom-certs:
       name: ''
       type: ''
     registry: {}
   ```
4. Apply the changes:

   ```bash
   oc apply -f knative-serving.yaml
   ```
5. Create a YAML file named `knative-httproute.yaml` to route inference traffic from the Gateway to the Kourier service. Replace the FQDN placeholder with your wildcard [inference FQDN](/self-hosted/getting-started/installation/install-using-helm/system-requirements.md#fully-qualified-domain-name-fqdn):

   {% hint style="info" %}
   **Note**

   OpenShift's Ingress Operator automatically creates DNS records for Gateway listeners, so no manual DNS configuration is required for the inference subdomain.
   {% endhint %}

   ```yaml
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: knative-serving-httproute
     namespace: knative-serving
   spec:
     parentRefs:
       - name: openshift-default
         namespace: openshift-ingress
     hostnames:
       - "*.runai-inference.mycorp.local" # replace with the wildcard FQDN for Inference
     rules:
       - matches:
         - path:
             type: PathPrefix
             value: /
         backendRefs:
           - name: kourier
             port: 80
   ```
6. Apply the changes:

   ```bash
   oc apply -f knative-httproute.yaml
   ```
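
After applying, you can confirm that the route was created and attached to the Gateway:

```bash
# The HTTPRoute status should list the Gateway as an accepted parent
oc get httproute knative-serving-httproute -n knative-serving
```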

{% endtab %}
{% endtabs %}

## Introduce the Gateway API to NVIDIA Run:ai Services

To enable NVIDIA Run:ai to route traffic through the configured Gateway, both the control plane and the cluster must be updated to reference the Gateway.

At this stage, traffic is still served through the existing Ingress. The Gateway is introduced alongside the current setup and will become active only after DNS is updated in a later step.

### Configure the NVIDIA Run:ai Control Plane

```bash
helm upgrade runai-backend -n runai-backend runai-backend/control-plane \
  --reuse-values \
  --set-string global.gatewayAPI.enabled=true \
  --set-string global.gatewayAPI.parentRefs[0].name=<gateway-name> \
  --set-string global.gatewayAPI.parentRefs[0].namespace=<gateway-namespace>
```

### Configure the NVIDIA Run:ai Cluster

```bash
helm upgrade runai-cluster -n runai runai/runai-cluster \
  --reuse-values \
  --set-string clusterConfig.global.gatewayAPI.enabled=true \
  --set-string clusterConfig.global.gatewayAPI.name=<gateway-name> \
  --set-string clusterConfig.global.gatewayAPI.namespace=<gateway-namespace>
```

## Verify Gateway Configuration

Confirm that the Gateway was accepted by the controller and assigned an address:

```bash
kubectl get gateway -n runai
```

Confirm that the NVIDIA Run:ai HTTPRoute resources were created automatically and bound to the Gateway:

```bash
kubectl get httproute -A
```
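
To find the Gateway IP used in the test below, read it from the Gateway status (field path per the Gateway API specification):

```bash
kubectl get gateway runai-gateway -n runai \
  -o jsonpath='{.status.addresses[0].value}'
```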

Test connectivity through the Gateway:

```bash
# --resolve directs the request to the Gateway IP while presenting
# <CLUSTER_DOMAIN> for TLS SNI and the Host header
curl --resolve <CLUSTER_DOMAIN>:443:<gateway-ip> https://<CLUSTER_DOMAIN>/
```

## Switch Traffic to Gateway

Before switching traffic, ensure that the NVIDIA Run:ai control plane and cluster are already configured to use the Gateway API.

Update DNS:

* `<CLUSTER_DOMAIN>` -> Gateway IP
* `*.<CLUSTER_DOMAIN>` -> Gateway IP

Traffic is routed based on DNS configuration. Until the DNS records are updated to point to the Gateway IP, all traffic continues to be served through the existing Ingress.

Verify that DNS now resolves to the Gateway IP:

```bash
nslookup <CLUSTER_DOMAIN>
```

Then confirm that the control plane responds through the Gateway:

```bash
curl https://<CLUSTER_DOMAIN>/
```

Disable Ingress on the cluster:

```bash
helm upgrade runai-cluster -n runai runai/runai-cluster \
  --reuse-values \
  --set-string clusterConfig.global.ingress.enabled=false
```

Disable Ingress on the control plane:

```bash
helm upgrade runai-backend -n runai-backend runai-backend/control-plane \
  --reuse-values \
  --set-string global.ingress.enabled=false
```

## Rollback to Ingress

To roll back to Ingress, re-enable Ingress and disable Gateway API on both the cluster and the control plane. Then update DNS records to point back to the Ingress IP.

Cluster:

```bash
helm upgrade runai-cluster -n runai runai/runai-cluster \
  --reuse-values \
  --set-string clusterConfig.global.ingress.enabled=true \
  --set-string clusterConfig.global.gatewayAPI.enabled=false
```

Control plane:

```bash
helm upgrade runai-backend -n runai-backend runai-backend/control-plane \
  --reuse-values \
  --set-string global.ingress.enabled=true \
  --set-string global.gatewayAPI.enabled=false
```

Update DNS records for `<CLUSTER_DOMAIN>` and `*.<CLUSTER_DOMAIN>` to point back to the Ingress IP.
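
After the DNS change propagates, you can confirm that the domain resolves to the Ingress IP again:

```bash
nslookup <CLUSTER_DOMAIN>
```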

