Kubernetes Gateway API
NVIDIA Run:ai supports the Kubernetes Gateway API as an alternative to Ingress for routing external traffic. Gateway API provides a flexible and extensible model for defining how traffic is exposed and routed within the cluster.
This page builds on the concepts described in Routing Traffic to and from NVIDIA Run:ai Services, including FQDN configuration, TLS certificates, and routing behavior.
Note
Gateway API support in NVIDIA Run:ai is optional. If you are currently using HAProxy Ingress or other ingress controllers, no action is required. Customers who wish to adopt Gateway API may do so using the instructions on this page.
Scope and Prerequisites
This guide assumes a self-hosted deployment where the NVIDIA Run:ai control plane and cluster services are installed on the same Kubernetes cluster.
Before proceeding, ensure that the following prerequisites are already configured:
Fully Qualified Domain Names (FQDN) for:
Control plane access
Development workspaces and training workloads
Inference workloads
TLS certificates associated with the configured FQDNs
These configurations are described in the Routing Traffic to and from NVIDIA Run:ai Services section.
Routing Modes in NVIDIA Run:ai
NVIDIA Run:ai supports two routing approaches for exposing services: host-based routing and path-based routing.
Different NVIDIA Run:ai services use these routing approaches as follows:
Control plane: host-based (single domain), for example https://runai.mycorp.local
Inference workloads: host-based (wildcard subdomains), for example https://<service>.runai-inference.mycorp.local
Workspaces and training workloads: host-based or path-based; see Routing Traffic for Workspaces and Training Workloads below
Installing a Gateway Controller
NVIDIA Run:ai supports any conformant Gateway API implementation. The example below uses KGateway. If you are using a different conformant controller, follow its installation documentation and then proceed to the migration steps.
Install the Gateway API CRDs:
Install the KGateway CRDs:
Install the KGateway controller:
Ensure that the Gateway controller is running before proceeding.
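One way to confirm the controller is ready is to check that its GatewayClass reports the standard Accepted condition. The sketch below parses JSON shaped like the output of kubectl get gatewayclass -o json. The helper name accepted_gateway_classes and the sample class names are illustrative assumptions, not part of NVIDIA Run:ai.

```python
import json


def accepted_gateway_classes(kubectl_json: str) -> list:
    """Return the names of GatewayClasses whose Accepted condition is True."""
    doc = json.loads(kubectl_json)
    accepted = []
    for item in doc.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Accepted" and c.get("status") == "True"
               for c in conditions):
            accepted.append(item["metadata"]["name"])
    return accepted


# Illustrative sample only; real output comes from:
#   kubectl get gatewayclass -o json
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "kgateway"},
         "status": {"conditions": [{"type": "Accepted", "status": "True"}]}},
        {"metadata": {"name": "pending-class"},
         "status": {"conditions": [{"type": "Accepted", "status": "Unknown"}]}},
    ]
})

if __name__ == "__main__":
    print(accepted_gateway_classes(SAMPLE))
```

In practice the JSON string would come from running the kubectl command above, for example through subprocess.run with capture_output=True. An empty result suggests the controller has not yet accepted any GatewayClass.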
Routing Traffic for Workspaces and Training Workloads
The following sections describe how development workspaces and training workloads are exposed using Gateway API.
Depending on the selected routing mode, these workloads are accessed differently:
In host-based routing, each development workspace and training workload is exposed on its own subdomain of the cluster domain.
In path-based routing, development workspaces and training workloads are exposed under a shared domain and distinguished by URL path.
This distinction determines the required FQDN structure, TLS certificates, and Gateway configuration described in the sections below.
Gateway API with Host-Based Routing
In this configuration:
Development workspaces and training workloads are exposed using subdomains
A wildcard FQDN is required (e.g., *.runai.mycorp.local)
A wildcard TLS certificate is required for those workloads
The Gateway includes listeners for:
The cluster domain (control plane access)
Wildcard subdomains (workspace and training workloads)
Gateway Configuration
Note
Routing is based on hostnames (subdomains). HTTPRoute resources for the control plane and workloads are created automatically by NVIDIA Run:ai and bind to the configured Gateway.
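As a rough illustration of a Gateway with the two listeners described in this section, the sketch below builds a standard Gateway API manifest as a Python dict and prints it as JSON, which kubectl apply -f - accepts. The Gateway name, namespace, gatewayClassName value, listener names, and both Secret names are assumptions chosen for the example; substitute the values from your own installation.

```python
import json


def https_listener(name: str, hostname: str, secret_name: str) -> dict:
    """One HTTPS listener that terminates TLS using the given Secret."""
    return {
        "name": name,
        "hostname": hostname,
        "port": 443,
        "protocol": "HTTPS",
        "tls": {
            "mode": "Terminate",
            "certificateRefs": [{"kind": "Secret", "name": secret_name}],
        },
        # Let HTTPRoutes from any namespace attach to this listener.
        "allowedRoutes": {"namespaces": {"from": "All"}},
    }


def host_based_gateway(cluster_domain: str,
                       domain_secret: str,
                       wildcard_secret: str,
                       gateway_class: str = "kgateway",   # assumed class name
                       name: str = "runai-gateway",       # hypothetical name
                       namespace: str = "runai") -> dict:
    """Gateway with a cluster-domain listener and a wildcard listener."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "gatewayClassName": gateway_class,
            "listeners": [
                https_listener("cluster-domain", cluster_domain, domain_secret),
                https_listener("workloads", f"*.{cluster_domain}", wildcard_secret),
            ],
        },
    }


if __name__ == "__main__":
    manifest = host_based_gateway(
        "runai.mycorp.local",
        domain_secret="example-domain-tls",      # hypothetical Secret name
        wildcard_secret="example-wildcard-tls",  # hypothetical Secret name
    )
    print(json.dumps(manifest, indent=2))
```

The referenced Secrets are expected to live in the same namespace as the Gateway; referencing a Secret in another namespace would additionally require a ReferenceGrant.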
Gateway API with Path-Based Routing
In this configuration:
Development workspaces and training workloads are exposed under a single domain using URL paths
A wildcard FQDN is not required for workspace and training workloads
A wildcard TLS certificate is not required for workspace and training workloads
The Gateway uses a single domain listener for these services
Ensure that host-based routing is disabled using the following. For more details, see Advanced cluster configurations.
Gateway Configuration
Note
Routing is based on URL paths under a shared domain. HTTPRoute resources for the control plane and workloads are created automatically by NVIDIA Run:ai and bind to the configured Gateway.
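The difference between the two routing modes comes down to which listener hostnames the Gateway needs. The helper below is a small sketch of that rule as stated on this page; the function name and parameters are hypothetical.

```python
def required_listener_hostnames(cluster_domain: str,
                                host_based: bool,
                                inference_domain: str = None) -> list:
    """List the listener hostnames a Gateway needs for a given setup."""
    # The cluster domain listener is always needed (control plane access,
    # plus workspaces and training workloads in path-based mode).
    hostnames = [cluster_domain]
    if host_based:
        # Host-based mode adds a wildcard listener for workload subdomains.
        hostnames.append(f"*.{cluster_domain}")
    if inference_domain:
        # Inference is always host-based, whatever mode is selected above.
        hostnames.append(f"*.{inference_domain}")
    return hostnames


if __name__ == "__main__":
    print(required_listener_hostnames("runai.mycorp.local", host_based=False))
    print(required_listener_hostnames("runai.mycorp.local", host_based=True,
                                      inference_domain="runai-inference.mycorp.local"))
```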
Inference Routing with Gateway API
Inference workloads are always exposed using host-based routing, regardless of the selected routing mode for development workspaces and training workloads.
Inference endpoints are accessed via dedicated subdomains, for example https://<service>.runai-inference.mycorp.local.
To enable inference routing with Gateway API:
A wildcard FQDN must be configured for inference (e.g., *.runai-inference.mycorp.local)
A wildcard TLS certificate must be configured for that domain
The Gateway must include a listener for the inference subdomain
Knative Serving must be configured to route inference traffic from the Gateway to inference workloads
TLS Certificate
Ensure that a wildcard TLS certificate for inference is created:
Secret name: runai-cluster-inference-tls-secret
Namespace: runai
Replace /path/to/inference-fullchain.pem and /path/to/inference-private.pem with the actual paths to your certificate and private key:
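A minimal sketch of creating this Secret with the standard kubectl create secret tls command is shown below. The helper only assembles the command so it can be reviewed before running; the function name is hypothetical, and the paths are the placeholders from the text above.

```python
import shlex


def tls_secret_command(cert_path: str,
                       key_path: str,
                       name: str = "runai-cluster-inference-tls-secret",
                       namespace: str = "runai") -> list:
    """Build the kubectl command that creates a TLS Secret from PEM files."""
    return [
        "kubectl", "create", "secret", "tls", name,
        f"--namespace={namespace}",
        f"--cert={cert_path}",
        f"--key={key_path}",
    ]


if __name__ == "__main__":
    command = tls_secret_command("/path/to/inference-fullchain.pem",
                                 "/path/to/inference-private.pem")
    # Print a shell-safe version of the command for review.
    print(shlex.join(command))
    # To execute it: subprocess.run(command, check=True)
```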
Gateway Configuration
Note
Routing to inference workloads is based on hostnames (subdomains).
Add an inference listener to the existing Gateway resource:
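The sketch below shows one way such a listener could be appended to a Gateway manifest held as a Python dict (for example, loaded from kubectl get gateway <name> -o json). The listener name inference and the helper name are assumptions; the Secret name is the one given in the TLS Certificate section.

```python
import copy


def add_inference_listener(gateway: dict,
                           inference_domain: str,
                           secret_name: str = "runai-cluster-inference-tls-secret",
                           listener_name: str = "inference") -> dict:
    """Return a copy of the Gateway with a wildcard inference listener added."""
    updated = copy.deepcopy(gateway)  # leave the caller's manifest untouched
    listeners = updated.setdefault("spec", {}).setdefault("listeners", [])
    # Listener names must be unique within a Gateway, so skip if present.
    if any(l.get("name") == listener_name for l in listeners):
        return updated
    listeners.append({
        "name": listener_name,
        "hostname": f"*.{inference_domain}",
        "port": 443,
        "protocol": "HTTPS",
        "tls": {
            "mode": "Terminate",
            "certificateRefs": [{"kind": "Secret", "name": secret_name}],
        },
        "allowedRoutes": {"namespaces": {"from": "All"}},
    })
    return updated


if __name__ == "__main__":
    existing = {"kind": "Gateway", "spec": {"listeners": [{"name": "cluster-domain"}]}}
    result = add_inference_listener(existing, "runai-inference.mycorp.local")
    print([l["name"] for l in result["spec"]["listeners"]])
```

Calling the helper twice leaves a single inference listener, so it is safe to re-run against a Gateway that was already updated.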
Configure Knative Serving
NVIDIA Run:ai supports Knative-based inference workloads. Inference traffic arrives at the Gateway and is forwarded to Knative Serving through Kourier, which routes it to the individual inference endpoints. The following steps install Knative Serving and configure the HTTPRoute to connect the Gateway to Kourier. Knative versions 1.19 to 1.21 are supported.
Install Knative Serving. Follow the Installing Knative instructions or run:
Create the knative-serving namespace:
Create a YAML file named knative-serving.yaml and replace the placeholder FQDN with your wildcard inference FQDN (for example, runai-inference.mycorp.local):
Apply the changes:
Create a YAML file named knative-httproute.yaml to route inference traffic from the Gateway to the Kourier service. Replace the FQDN placeholder with your wildcard inference FQDN:
Apply the changes:
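For orientation, an HTTPRoute that forwards wildcard inference hostnames to Kourier might look like the manifest built below. This is a sketch under several assumptions: the Gateway is named runai-gateway in the runai namespace with a listener named inference, and Kourier is exposed as a Service named kourier on port 80 in the kourier-system namespace. Verify each of these against your cluster before use.

```python
import json


def kourier_httproute(inference_domain: str,
                      gateway_name: str = "runai-gateway",      # hypothetical
                      gateway_namespace: str = "runai",         # assumed
                      listener_name: str = "inference",         # hypothetical
                      kourier_service: str = "kourier",         # assumed
                      kourier_namespace: str = "kourier-system",  # assumed
                      kourier_port: int = 80) -> dict:
    """HTTPRoute that sends *.<inference_domain> traffic to Kourier."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        # Created next to the Kourier Service so the backendRef stays in the
        # same namespace and needs no ReferenceGrant.
        "metadata": {"name": "knative-inference", "namespace": kourier_namespace},
        "spec": {
            "parentRefs": [{
                "name": gateway_name,
                "namespace": gateway_namespace,
                "sectionName": listener_name,
            }],
            "hostnames": [f"*.{inference_domain}"],
            "rules": [{
                "backendRefs": [{"name": kourier_service, "port": kourier_port}],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(kourier_httproute("runai-inference.mycorp.local"), indent=2))
```

Because the route lives in a different namespace from the Gateway, the target listener must allow routes from that namespace through its allowedRoutes setting.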
Install the OpenShift Serverless Operator. Follow the Installing the OpenShift Serverless Operator instructions. Once installed, follow the steps below.
Create the knative-serving project:
Create a YAML file named knative-serving.yaml:
Apply the changes:
Create a YAML file named knative-httproute.yaml to route inference traffic from the Gateway to the Kourier service. Replace the FQDN placeholder with your wildcard inference FQDN:
Note
OpenShift's Ingress Operator automatically creates DNS records for Gateway listeners, so no manual DNS configuration is required for the inference subdomain.
Apply the changes:
Introduce the Gateway API to NVIDIA Run:ai Services
To enable NVIDIA Run:ai to route traffic through the configured Gateway, both the control plane and the cluster must be updated to reference the Gateway.
At this stage, traffic is still served through the existing Ingress. The Gateway is introduced alongside the current setup and will become active only after DNS is updated in a later step.
Configure the NVIDIA Run:ai Control Plane
Configure the NVIDIA Run:ai Cluster
Verify Gateway Configuration
Test connectivity:
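Because DNS still points at the existing Ingress at this stage, a connectivity test has to reach the Gateway by IP while presenting the real hostname for TLS (SNI) and the Host header. The sketch below does this with the Python standard library, similar in spirit to curl --resolve. The function names are hypothetical, and the Gateway IP must be taken from your own Gateway's status.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Minimal HTTP/1.1 GET request carrying the intended Host header."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n").encode("ascii")


def parse_status(response: bytes) -> int:
    """Extract the numeric status code from the first response line."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return int(status_line.split()[1])


def probe(gateway_ip: str, host: str, port: int = 443,
          timeout: float = 5.0, verify: bool = True) -> int:
    """Connect to the Gateway IP, send SNI and Host for `host`, return status."""
    context = ssl.create_default_context()
    if not verify:
        # Only for certificates not trusted by this machine (e.g. private CA).
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((gateway_ip, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host))
            return parse_status(tls.recv(4096))


# Example call (requires network access to the Gateway):
#   probe("<GATEWAY_IP>", "runai.mycorp.local")
```

Any HTTP status code returned shows that the listener accepted the TLS handshake and a route answered; a TLS or connection error points at the listener, certificate, or network path instead.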
Switch Traffic to Gateway
Before switching traffic, ensure that the NVIDIA Run:ai control plane and cluster are already configured to use the Gateway API.
Update DNS:
<CLUSTER_DOMAIN> -> Gateway IP
*.<CLUSTER_DOMAIN> -> Gateway IP
Traffic is routed based on DNS configuration. Until DNS records are updated to point to the Gateway IP, all traffic continues to be served through the existing Ingress.
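Before disabling Ingress, it can help to confirm that both records already resolve to the Gateway. The sketch below checks a list of hostnames against the expected IP; the helper name is hypothetical, and the wildcard record is checked by resolving an arbitrary label under the domain.

```python
import socket


def dns_points_to_gateway(hostnames, gateway_ip, resolve=socket.gethostbyname):
    """Map each hostname to True if it resolves to the Gateway IP."""
    status = {}
    for host in hostnames:
        try:
            status[host] = resolve(host) == gateway_ip
        except OSError:
            # Resolution failures (socket.gaierror) count as "not switched".
            status[host] = False
    return status


# Example call (performs real DNS lookups):
#   dns_points_to_gateway(
#       ["runai.mycorp.local", "probe.runai.mycorp.local"],  # exact + wildcard
#       "<GATEWAY_IP>",
#   )
```

The resolve parameter exists so the check can be exercised with a stub resolver; by default it uses the system resolver, which may return cached answers until the previous records' TTL expires.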
Disable Ingress on the cluster:
Disable Ingress on the control plane:
Rollback to Ingress
To roll back to Ingress, re-enable Ingress and disable Gateway API on both the cluster and the control plane. Then update DNS records to point back to the Ingress IP.
Cluster:
Control plane:
Update DNS records for <CLUSTER_DOMAIN> and *.<CLUSTER_DOMAIN> to point back to the Ingress IP.