External Access to Containers

Researchers may need to access containers remotely during workload execution. Common use cases include:

  • Running a Jupyter Notebook inside the container

  • Connecting PyCharm for remote Python development

  • Viewing machine learning visualizations using TensorBoard

To enable this access, you must expose the relevant container ports.

Exposing Container Ports

In Docker, ports are exposed by declaring them when launching the container. NVIDIA Run:ai provides similar functionality within a Kubernetes environment.
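For comparison, this is how a port is published in plain Docker at launch time. The image name and port numbers below are illustrative, not part of the NVIDIA Run:ai product:

```shell
# Publish container port 8888 on host port 8888
# (image name and ports are illustrative)
docker run -p 8888:8888 my-jupyter-image
```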

Since Kubernetes abstracts the container's physical location, exposing ports is more complex. Kubernetes supports multiple methods for exposing container ports. For more details, refer to the Kubernetes services and networking documentation.

| Method | Description | NVIDIA Run:ai Support |
| --- | --- | --- |
| Port Forwarding | Simple port forwarding allows access to the container via a local and/or remote port. | Supported natively via Kubernetes |
| NodePort | Exposes the service on each Node's IP at a static port (the NodePort). You can reach the NodePort service from outside the cluster by requesting <NODE-IP>:<NODE-PORT>, regardless of which node the container actually resides on. | Supported |
| LoadBalancer | Exposes the service externally using a cloud provider's load balancer. | Supported via API with limited capabilities |
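Port forwarding, the first method above, can be used directly with kubectl. The namespace, pod name, and ports below are illustrative placeholders, not values generated by NVIDIA Run:ai:

```shell
# Forward local port 8888 to port 8888 of the workload's pod
# (replace the namespace and pod name with your own)
kubectl port-forward -n runai-my-project pod/my-workload-0-0 8888:8888
# The container is then reachable at http://localhost:8888
```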

Access to the Running Workload's Container

Many tools used by researchers, such as Jupyter, TensorBoard, or VSCode, require remote access to the running workload's container. In NVIDIA Run:ai, this access is provided through dynamically generated URLs.

Path-Based Routing

By default, NVIDIA Run:ai uses the Cluster URL provided to dynamically create SSL-secured URLs in the following format:

https://<CLUSTER_URL>/project-name/workload-name

While path-based routing works with applications such as Jupyter Notebooks, it may not be compatible with other applications. Some applications assume they are running at the root path, so hardcoded file paths and settings within the container may become invalid when the application is served at a path other than the root. For example, if an application expects to access /etc/config.json but is served at /project-name/workload-name, the file will not be found. This can cause the container to fail or not function as intended.

Host-Based Routing

NVIDIA Run:ai provides support for host-based routing. When enabled, URLs follow the format:

https://project-name-workload-name.<CLUSTER_URL>/

This allows all workloads to run at the root path, avoiding file path issues and ensuring proper application behavior.

Enabling Host-Based Routing

To enable host-based routing, perform the following steps:

Note

For OpenShift, editing runaiconfig (the last step below) is the only step required to generate the URLs.

  1. Create a second DNS entry (A record) for *.<CLUSTER_URL>, pointing to the same IP as the cluster's Fully Qualified Domain Name (FQDN).

  2. Obtain a wildcard SSL certificate for this second DNS entry.

  3. Add the certificate as a secret:

# Replace the certificate and key paths below with the actual paths to your files
kubectl create secret tls runai-cluster-domain-tls-secret -n runai \
  --cert /path/to/fullchain.pem \
  --key /path/to/private.pem
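To confirm the secret was created successfully before continuing, you can list it in the runai namespace:

```shell
# Verify the TLS secret exists and is of type kubernetes.io/tls
kubectl get secret runai-cluster-domain-tls-secret -n runai
```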
  4. Create the following ingress rule and replace <CLUSTER_URL>:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: runai-cluster-domain-star-ingress
  namespace: runai
spec:
  ingressClassName: nginx
  rules:
  - host: '*.<CLUSTER_URL>'
  tls:
  - hosts:
    - '*.<CLUSTER_URL>'
    secretName: runai-cluster-domain-tls-secret
  5. Run the following:

kubectl apply -f <filename>
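After applying the manifest, you can verify that the wildcard ingress rule was created:

```shell
# Confirm the ingress exists and lists the wildcard host
kubectl get ingress runai-cluster-domain-star-ingress -n runai
```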
  6. Edit runaiconfig to generate the URLs correctly:

kubectl patch RunaiConfig runai -n runai --type="merge" \
    -p '{"spec":{"global":{"subdomainSupport": true}}}'
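To confirm the patch took effect, you can read the field back from the resource:

```shell
# Should print: true
kubectl get runaiconfig runai -n runai \
  -o jsonpath='{.spec.global.subdomainSupport}'
```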

Once these requirements have been met, all workloads will automatically be assigned a secured URL with a subdomain, ensuring full functionality for all researcher applications.
