Kubernetes Gateway API

NVIDIA Run:ai supports the Kubernetes Gateway API as an alternative to Ingress for routing external traffic. Gateway API provides a flexible and extensible model for defining how traffic is exposed and routed within the cluster.

This page builds on the concepts described in Routing Traffic to and from NVIDIA Run:ai Services, including FQDN configuration, TLS certificates, and routing behavior.

Note

Gateway API support in NVIDIA Run:ai is optional. If you are currently using HAProxy Ingress or other ingress controllers, no action is required. Customers who wish to adopt Gateway API may do so using the instructions on this page.

Scope and Prerequisites

Before proceeding, ensure that the following prerequisites are already configured:

  • Fully Qualified Domain Names (FQDN) for:

    • Control plane access

    • Development workspaces and training workloads

    • Inference workloads

  • TLS certificates associated with the configured FQDNs

These configurations are described in the Routing Traffic to and from NVIDIA Run:ai Services section.

Routing Modes in NVIDIA Run:ai

NVIDIA Run:ai supports two routing approaches for exposing services: host-based routing and path-based routing.

Different NVIDIA Run:ai services use these routing approaches as follows:

| Service | Routing Mode | Example |
|---|---|---|
| Control plane | Host-based (single domain) | https://runai.mycorp.local |
| Inference workloads | Host-based (wildcard subdomains) | https://<service>.runai-inference.mycorp.local |
| Workspaces & training workloads | Host-based or path-based | See Routing Traffic for Workspaces and Training Workloads |

Installing a Gateway Controller

NVIDIA Run:ai supports any conformant Gateway API implementation. The example below uses KGateway. If you are using a different conformant controller, follow its installation documentation and then proceed to the migration steps.

  1. Install the Gateway API CRDs:

  2. Install the KGateway CRDs:

  3. Install the KGateway controller:

Ensure that the Gateway controller is running before proceeding.

Routing Traffic for Workspaces and Training Workloads

The following sections describe how development workspaces and training workloads are exposed using Gateway API.

Depending on the selected routing mode, these workloads are accessed differently:

  • In host-based routing, each development workspace and training workload is exposed using its own subdomain:

  • In path-based routing, development workspaces and training workloads are exposed under a shared domain using URL paths:

This distinction determines the required FQDN structure, TLS certificates, and Gateway configuration described in the sections below.
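To make the distinction concrete, the sketch below shows how the two modes differ at the HTTPRoute level in Gateway API terms. It is illustrative only: NVIDIA Run:ai creates its own HTTPRoute resources, and every name here (the Gateway `runai-gateway`, the namespace `example-project`, the Service `my-workspace`, port 8888, and the URL shapes) is a hypothetical placeholder rather than what Run:ai actually generates.

```yaml
# Host-based: the workload is selected by its own hostname.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-host-route          # hypothetical
  namespace: example-project        # hypothetical
spec:
  parentRefs:
    - name: runai-gateway           # hypothetical Gateway name
      namespace: runai
  hostnames:
    - "my-workspace.runai.mycorp.local"   # requires wildcard DNS and TLS
  rules:
    - backendRefs:
        - name: my-workspace        # hypothetical Service
          port: 8888
---
# Path-based: all workloads share one hostname and are selected by URL path.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-path-route          # hypothetical
  namespace: example-project        # hypothetical
spec:
  parentRefs:
    - name: runai-gateway           # hypothetical Gateway name
      namespace: runai
  hostnames:
    - "runai.mycorp.local"          # single domain, no wildcard needed
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /example-project/my-workspace   # hypothetical path layout
      backendRefs:
        - name: my-workspace        # hypothetical Service
          port: 8888
```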

Gateway API with Host-Based Routing

In this configuration:

  • Development workspaces and training workloads are exposed using subdomains

  • A wildcard FQDN is required (e.g., *.runai.mycorp.local)

  • A wildcard TLS certificate is required for those workloads

  • The Gateway includes listeners for:

    • The cluster domain (control plane access)

    • Wildcard subdomains (workspace and training workloads)

Gateway Configuration

Note

Routing is based on hostnames (subdomains). HTTPRoute resources for the control plane and workloads are created automatically by NVIDIA Run:ai and bind to the configured Gateway.
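A host-based Gateway could look roughly like the following sketch, with one HTTPS listener for the cluster domain and one for the wildcard subdomains. The Gateway name, the `kgateway` GatewayClass, the namespace, the listener names, and both Secret names are assumptions for illustration; substitute the values used in your environment.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: runai-gateway               # hypothetical name
  namespace: runai                  # assumed: same namespace as the TLS secrets
spec:
  gatewayClassName: kgateway        # assumed: GatewayClass provided by your controller
  listeners:
    # Control plane access on the cluster domain
    - name: cluster-domain
      hostname: "runai.mycorp.local"
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: runai-cluster-domain-tls-secret    # assumed Secret name
      allowedRoutes:
        namespaces:
          from: All                 # routes are created in workload namespaces
    # Workspaces and training workloads on wildcard subdomains
    - name: workloads-wildcard
      hostname: "*.runai.mycorp.local"
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: runai-wildcard-tls-secret          # hypothetical Secret name
      allowedRoutes:
        namespaces:
          from: All
```

If the Gateway lives in a different namespace than the referenced Secrets, Gateway API requires a ReferenceGrant in the Secret's namespace for the cross-namespace reference to be accepted.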

Gateway API with Path-Based Routing

In this configuration:

  • Development workspaces and training workloads are exposed under a single domain using URL paths

  • A wildcard FQDN is not required for workspace and training workloads

  • A wildcard TLS certificate is not required for workspace and training workloads

  • The Gateway uses a single domain listener for these services

Ensure that host-based routing is disabled using the following. For more details, see Advanced cluster configurations.

Gateway Configuration

Note

Routing is based on URL paths under a shared domain. HTTPRoute resources for the control plane and workloads are created automatically by NVIDIA Run:ai and bind to the configured Gateway.
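For path-based routing, a single HTTPS listener on the cluster domain is enough for the control plane, workspaces, and training workloads. The sketch below assumes a Gateway named `runai-gateway` in the `runai` namespace, a `kgateway` GatewayClass, and a Secret named `runai-cluster-domain-tls-secret`; all of these are placeholders to adapt.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: runai-gateway               # hypothetical name
  namespace: runai                  # assumed: same namespace as the TLS secret
spec:
  gatewayClassName: kgateway        # assumed: GatewayClass provided by your controller
  listeners:
    # One listener serves the control plane and all path-routed workloads
    - name: cluster-domain
      hostname: "runai.mycorp.local"
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: runai-cluster-domain-tls-secret    # assumed Secret name
      allowedRoutes:
        namespaces:
          from: All                 # routes are created in workload namespaces
```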

Inference Routing with Gateway API

Inference workloads are always exposed using host-based routing, regardless of the selected routing mode for development workspaces and training workloads.

Inference endpoints are accessed via dedicated subdomains, for example https://<service>.runai-inference.mycorp.local.

To enable inference routing with Gateway API:

  • A wildcard FQDN must be configured for inference (e.g., *.runai-inference.mycorp.local)

  • A wildcard TLS certificate must be configured for that domain

  • The Gateway must include a listener for the inference subdomain

  • Knative Serving must be configured to route inference traffic from the Gateway to inference workloads

TLS Certificate

Ensure that a wildcard TLS certificate for inference is created:

  • Secret name: runai-cluster-inference-tls-secret

  • Namespace: runai

Replace /path/to/inference-fullchain.pem and /path/to/inference-private.pem with the actual paths to your certificate and private key:
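If you prefer to manage the secret declaratively, an equivalent manifest could be sketched as follows. The Secret name and namespace come from this page; the `data` values are placeholders that must be replaced with the base64-encoded contents of your certificate chain and private key files.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: runai-cluster-inference-tls-secret
  namespace: runai
type: kubernetes.io/tls
data:
  # Placeholder: base64-encoded contents of /path/to/inference-fullchain.pem
  tls.crt: <BASE64_ENCODED_FULLCHAIN_PEM>
  # Placeholder: base64-encoded contents of /path/to/inference-private.pem
  tls.key: <BASE64_ENCODED_PRIVATE_KEY_PEM>
```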

Gateway Configuration

Note

Routing to inference workloads is based on hostnames (subdomains).

Add an inference listener to the existing Gateway resource:
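As a rough sketch, the additional listener might look like the fragment below, appended to the `spec.listeners` list of the existing Gateway. The listener name `inference` is a hypothetical choice, and the fragment assumes the Gateway is in the `runai` namespace alongside the inference TLS secret.

```yaml
spec:
  listeners:
    # ...keep the existing listeners unchanged...
    - name: inference               # hypothetical listener name
      hostname: "*.runai-inference.mycorp.local"
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: runai-cluster-inference-tls-secret
      allowedRoutes:
        namespaces:
          from: All                 # assumed: lets the Knative HTTPRoute attach from its own namespace
```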

Configure Knative Serving

NVIDIA Run:ai supports Knative-based inference workloads. Inference traffic arrives at the Gateway and is forwarded to Knative Serving through Kourier, which routes it to the individual inference endpoints. The following steps install Knative Serving and configure the HTTPRoute to connect the Gateway to Kourier. Knative versions 1.19 to 1.21 are supported.

  1. Install Knative Serving. Follow the Installing Knative instructions or run:

  2. Create the knative-serving namespace:

  3. Create a YAML file named knative-serving.yaml and replace the placeholder FQDN with your wildcard inference FQDN (for example, runai-inference.mycorp.local):

  4. Apply the changes:

  5. Create a YAML file named knative-httproute.yaml to route inference traffic from the Gateway to the Kourier service. Replace the FQDN placeholder with your wildcard inference FQDN:

  6. Apply the changes:
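For orientation, the two files from steps 3 and 5 could be sketched as below. This assumes Knative Serving is managed through the Knative Operator (so that a `KnativeServing` resource exists and Kourier runs as a Service named `kourier` in the `knative-serving` namespace); the Gateway name `runai-gateway`, its namespace, the listener name `inference`, and the HTTPRoute name are hypothetical and must match your own Gateway.

```yaml
# knative-serving.yaml (step 3): assumes an Operator-managed installation
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    domain:
      # Base inference domain; services are exposed as subdomains of it
      runai-inference.mycorp.local: ""
    network:
      ingress-class: kourier.ingress.networking.knative.dev
  ingress:
    kourier:
      enabled: true
---
# knative-httproute.yaml (step 5): forwards inference hostnames to Kourier
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: knative-inference           # hypothetical name
  namespace: knative-serving        # same namespace as the backend Service
spec:
  parentRefs:
    - name: runai-gateway           # hypothetical Gateway name
      namespace: runai              # assumed Gateway namespace
      sectionName: inference        # hypothetical listener name
  hostnames:
    - "*.runai-inference.mycorp.local"
  rules:
    - backendRefs:
        - name: kourier             # assumed Kourier Service name
          port: 80
```

Keeping the HTTPRoute in the same namespace as the Kourier Service avoids the need for a ReferenceGrant on the backend reference; the Gateway listener must still allow routes from that namespace.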

Introduce the Gateway API to NVIDIA Run:ai Services

To enable NVIDIA Run:ai to route traffic through the configured Gateway, both the control plane and the cluster must be updated to reference the Gateway.

At this stage, traffic is still served through the existing Ingress. The Gateway is introduced alongside the current setup and will become active only after DNS is updated in a later step.

Configure the NVIDIA Run:ai Cluster

Verify Gateway Configuration

Test connectivity:

Switch Traffic to Gateway

Before switching traffic, ensure that the NVIDIA Run:ai control plane and cluster are already configured to use the Gateway API.

Update DNS:

  • <CLUSTER_DOMAIN> -> Gateway IP

  • *.<CLUSTER_DOMAIN> -> Gateway IP

Traffic is routed based on DNS configuration. Until DNS records are updated to point to the Gateway IP, all traffic continues to be served through the existing Ingress.

Disable Ingress on the cluster:

Rollback to Ingress

To roll back to Ingress, re-enable Ingress and disable Gateway API on both the cluster and the control plane. Then update the DNS records for <CLUSTER_DOMAIN> and *.<CLUSTER_DOMAIN> to point back to the Ingress IP.
