# Control Plane System Requirements

The NVIDIA Run:ai control plane is a Kubernetes-based application that centrally manages workloads, users, scheduling, and cluster integrations across multiple tenants.

In a multi-tenant deployment, the control plane is installed once, in a dedicated Kubernetes cluster, and configured for multi-tenancy. This section outlines the hardware and software requirements for deploying and operating the control plane in a multi-tenant environment.

## Installer Machine

The machine running the installation script (typically the Kubernetes master) must have:

* At least 50 GB of free disk space
* Docker installed
* [Helm](https://helm.sh/) 3.14 or later
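
The requirements above can be checked with a small pre-flight script. This is a minimal sketch, not an official NVIDIA Run:ai tool: `check_installer_machine` and `version_ge` are hypothetical helper names, the script assumes GNU `sort -V`, and which filesystem the installer actually needs is an assumption (it checks the directory you pass, defaulting to `$HOME`). The Helm version is read with `helm version --template '{{.Version}}'`.

```bash
# Returns success if dotted version $1 >= $2 (a leading "v" is ignored).
version_ge() {
  local have="${1#v}" need="${2#v}"
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Hypothetical pre-flight check for the installer machine requirements.
check_installer_machine() {
  local path="${1:-$HOME}" free_kb free_gb helm_ver
  # df -Pk reports available space in 1 KiB blocks (column 4); convert to decimal GB
  free_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  free_gb=$(( free_kb * 1024 / 1000000000 ))
  if [ "$free_gb" -ge 50 ]; then
    echo "OK: ${free_gb} GB free on $path"
  else
    echo "FAIL: ${free_gb} GB free on $path (need at least 50 GB)"
  fi

  if command -v docker >/dev/null 2>&1; then
    echo "OK: docker found"
  else
    echo "FAIL: docker not found"
  fi

  if command -v helm >/dev/null 2>&1; then
    # Prints a version such as "v3.14.2"
    helm_ver=$(helm version --template '{{.Version}}')
    if version_ge "$helm_ver" 3.14; then
      echo "OK: helm $helm_ver"
    else
      echo "FAIL: helm $helm_ver is older than 3.14"
    fi
  else
    echo "FAIL: helm not found"
  fi
}
```

For example, `check_installer_machine /opt` prints one OK or FAIL line per requirement for the filesystem that holds `/opt`.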

{% hint style="info" %}
**Note**

Helm 4 defaults to [server-side apply](https://helm.sh/docs/overview/#server-side-apply) when installing a new chart release, which can conflict with resources managed by the NVIDIA Run:ai operator. If you are using Helm 4, append `--server-side=false` to your `helm upgrade` command. NVIDIA Run:ai clusters originally installed with Helm 3.x are unaffected.
{% endhint %}
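
If the same scripts run on machines with different Helm versions, the flag can be added conditionally. This is a sketch under stated assumptions: `helm_ssa_flag` is a hypothetical helper, and it assumes `helm version --template '{{.Version}}'` prints a string such as `v4.0.1` on Helm 4 as it does on Helm 3.

```bash
# Prints "--server-side=false" for Helm 4.x and nothing for other versions.
# Pass a version string explicitly (e.g. "v4.0.1") to test without helm installed.
helm_ssa_flag() {
  local version="${1:-$(helm version --template '{{.Version}}' 2>/dev/null)}"
  case "${version#v}" in
    4.*) printf '%s\n' '--server-side=false' ;;
  esac
}
```

Use it unquoted, for example `helm upgrade ... $(helm_ssa_flag)`, so that it expands to nothing on Helm 3.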

## Hardware Requirements

The following hardware requirements are for the control plane system nodes. By default, all NVIDIA Run:ai control plane services run on all available nodes.

### Architecture

**x86** and **ARM** architectures are supported for Kubernetes.

### NVIDIA Run:ai Control Plane - System Nodes

The following configuration is the minimum required to install and use the NVIDIA Run:ai control plane:

| Component  | Required Capacity |
| ---------- | ----------------- |
| CPU        | 10 cores          |
| Memory     | 12GB              |
| Disk space | 110GB             |
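
To compare a cluster against this table, you can sum the allocatable CPU and memory that Kubernetes reports for the nodes. This is a minimal sketch: the helper names are hypothetical, treating the minimum as an aggregate across system nodes is an assumption, only the `m`, `Ki`, `Mi`, and `Gi` suffixes (or plain bytes) are handled, and 12 GB is approximated as 12 GiB, which is slightly stricter.

```bash
# Convert a Kubernetes CPU quantity ("8" or "7800m") to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Convert a Kubernetes memory quantity (Ki, Mi, Gi, or plain bytes) to MiB.
mem_to_mib() {
  local q="$1"
  case "$q" in
    *Ki) echo $(( ${q%Ki} / 1024 )) ;;
    *Mi) echo "${q%Mi}" ;;
    *Gi) echo $(( ${q%Gi} * 1024 )) ;;
    *)   echo $(( q / 1024 / 1024 )) ;;
  esac
}

# Read "name cpu memory" lines on stdin and print the summed allocatable capacity.
sum_capacity() {
  local name cpu mem total_cpu=0 total_mem=0
  while read -r name cpu mem; do
    [ -n "$name" ] || continue
    total_cpu=$(( total_cpu + $(cpu_to_millicores "$cpu") ))
    total_mem=$(( total_mem + $(mem_to_mib "$mem") ))
  done
  echo "cpu_millicores=$total_cpu memory_mib=$total_mem"
}

# Succeeds if totals ($1 millicores, $2 MiB) meet 10 cores and 12 GB (as 12 GiB).
meets_minimum() {
  [ "$1" -ge 10000 ] && [ "$2" -ge $(( 12 * 1024 )) ]
}

# Example:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.allocatable.cpu}{" "}{.status.allocatable.memory}{"\n"}{end}' | sum_capacity
```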

{% hint style="info" %}
**Note**

* To designate nodes for NVIDIA Run:ai system services, follow the instructions in [System nodes](/multi-tenant/2.22/infrastructure-setup/advanced-setup/node-roles.md#system-nodes).
* If you are using **Grafana Mimir** for monitoring, we recommend using **Microservices mode** to properly size your environment. For capacity planning, see [Planning Grafana Mimir capacity](https://grafana.com/docs/mimir/latest/manage/run-production-environment/planning-capacity/#microservices-mode).
  {% endhint %}

## Software Requirements

The following software requirements must be met.

### Operating System

* Any **Linux** operating system supported by both Kubernetes and NVIDIA GPU Operator
* Internal testing is performed on **Ubuntu 22.04**.

### Network Time Protocol

All nodes must have their clocks synchronized using NTP (Network Time Protocol) for the system to function properly.
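
On systemd-based nodes, synchronization status can be read from `timedatectl`. This sketch assumes systemd 239 or later (for `timedatectl show`) and must be run on each node; `is_ntp_synced` is a hypothetical helper name.

```bash
# Succeeds if systemd reports the system clock as NTP-synchronized.
# Pass "yes" or "no" explicitly to exercise the logic without timedatectl.
is_ntp_synced() {
  local state="${1:-$(timedatectl show --property=NTPSynchronized --value 2>/dev/null)}"
  [ "$state" = "yes" ]
}

# Example: is_ntp_synced && echo "clock synchronized" || echo "clock NOT synchronized"
```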

### Kubernetes Distribution

NVIDIA Run:ai control plane requires Kubernetes. The following Kubernetes distributions are supported:

* Vanilla Kubernetes
* NVIDIA Base Command Manager (BCM)
* Amazon Elastic Kubernetes Service (EKS)
* Google Kubernetes Engine (GKE)
* Azure Kubernetes Service (AKS)
* Oracle Kubernetes Engine (OKE)
* Rancher Kubernetes Engine (RKE1)
* Rancher Kubernetes Engine 2 (RKE2)

See the following Kubernetes version support matrix for the latest NVIDIA Run:ai releases:

| NVIDIA Run:ai version | Supported Kubernetes versions |
| --------------------- | ----------------------------- |
| v2.22 (latest)        | 1.31 to 1.33                  |

For managed Kubernetes services, consult your provider's release notes to confirm which Kubernetes version the service runs and that it is compatible with NVIDIA Run:ai. For up-to-date end-of-life information, see [Kubernetes Release History](https://kubernetes.io/releases/) or [OpenShift Container Platform Life Cycle Policy](https://access.redhat.com/support/policy/updates/openshift).
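
A quick way to compare a cluster against the support matrix is to read the API server's `gitVersion` and check its minor version. This is a sketch: the helper names are hypothetical, the 1.31 to 1.33 range is copied from the table for v2.22, and the parsing assumes version strings like `v1.32.4` or vendor-suffixed ones such as `v1.31.5-eks-1234abc`.

```bash
# Succeeds if a Kubernetes version string falls within 1.31 to 1.33.
k8s_version_supported() {
  local v="${1#v}" major minor
  major="${v%%.*}"
  minor="${v#*.}"
  minor="${minor%%[!0-9]*}"   # drop the patch level and any vendor suffix
  [ "$major" = "1" ] && [ -n "$minor" ] && [ "$minor" -ge 31 ] && [ "$minor" -le 33 ]
}

# Reads gitVersion from the API server's /version endpoint (no jq needed).
server_k8s_version() {
  kubectl get --raw /version | sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p'
}

# Example: k8s_version_supported "$(server_k8s_version)" && echo supported
```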

### NVIDIA Run:ai Namespace

The NVIDIA Run:ai control plane uses the `runai-backend` namespace. Create it by running:

```bash
kubectl create namespace runai-backend
```

### Default Storage Class

The NVIDIA Run:ai control plane requires a **default storage class** to create persistent volume claims for NVIDIA Run:ai storage. As is standard in Kubernetes, the storage class controls the reclaim behavior, that is, whether NVIDIA Run:ai persistent data is retained or deleted when the NVIDIA Run:ai control plane is deleted.

{% hint style="info" %}
**Note**

For a simple (non-production) storage class example, see [Kubernetes Local Storage Class](https://kubernetes.io/docs/concepts/storage/storage-classes/#local). In this example, the `local-path` storage class uses the directory `/opt/local-path-provisioner` on every node as the path for provisioning persistent volumes. After creating it, set the storage class as the default:

```bash
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```

{% endhint %}
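
To confirm which storage class is the default, and which reclaim policy it applies to NVIDIA Run:ai data, you can list the `storageclass.kubernetes.io/is-default-class` annotation with `kubectl` custom columns. This is a sketch: the helper names are hypothetical, and requiring exactly one default is a conservative choice of this example, not a stated NVIDIA Run:ai requirement.

```bash
# Reads "NAME DEFAULT RECLAIM" rows and prints the name and reclaim policy
# of every storage class whose is-default-class annotation is "true".
default_storage_classes() {
  awk '$2 == "true" { print $1, $3 }'
}

# Succeeds if exactly one default storage class exists in the current cluster.
check_default_storage_class() {
  local defaults count
  defaults=$(kubectl get storageclass --no-headers \
    -o custom-columns='NAME:.metadata.name,DEFAULT:.metadata.annotations.storageclass\.kubernetes\.io/is-default-class,RECLAIM:.reclaimPolicy' \
    | default_storage_classes)
  count=$(printf '%s\n' "$defaults" | grep -c .)
  if [ "$count" -eq 1 ]; then
    echo "OK: default storage class (name, reclaim policy): $defaults"
  else
    echo "FAIL: expected exactly one default storage class, found $count"
    return 1
  fi
}
```

A reclaim policy of `Retain` keeps the underlying volumes when their claims are deleted; `Delete` removes them.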

### Kubernetes Ingress Controller

The NVIDIA Run:ai control plane requires a [Kubernetes Ingress Controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/) to be installed.

* RKE and RKE2 come with a pre-installed ingress controller.
* Internal testing is performed on NGINX, Rancher NGINX, OpenShift Router, and Istio.
* Make sure that a default ingress controller is set, that is, an IngressClass annotated with `ingressclass.kubernetes.io/is-default-class: "true"`.

There are many ways to install and configure ingress controllers. The following is a simple example of installing and configuring the NGINX ingress controller using [Helm](https://helm.sh/). When setting `controller.service.externalIPs`:

* For cloud deployments, provide both the **internal IP** and the **external IP**.
* For on-prem deployments, provide only the **external IP**.

```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
# Replace <INTERNAL-IP> and <EXTERNAL-IP> with the internal and external IP addresses of one of the nodes.
# For on-prem deployments, pass only the external IP: controller.service.externalIPs="{<EXTERNAL-IP>}"
helm upgrade -i nginx-ingress ingress-nginx/ingress-nginx \
    --namespace nginx-ingress --create-namespace \
    --set controller.kind=DaemonSet \
    --set controller.service.externalIPs="{<INTERNAL-IP>,<EXTERNAL-IP>}"
```
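
The cloud versus on-prem difference in the `externalIPs` value, and the check for a default IngressClass, can be scripted as follows. This is a sketch with hypothetical helper names and example addresses from documentation ranges. If no default class is printed, the ingress-nginx chart's `controller.ingressClassResource.default=true` value is one way to mark its class as the default; verify it against your chart version.

```bash
# Builds the Helm list value for controller.service.externalIPs.
# $1 = external IP (always required); $2 = internal IP (cloud deployments only).
ingress_external_ips() {
  if [ -n "${2:-}" ]; then
    printf '{%s,%s}\n' "$2" "$1"
  else
    printf '{%s}\n' "$1"
  fi
}

# Prints the IngressClasses annotated as the cluster default.
default_ingress_classes() {
  kubectl get ingressclass --no-headers \
    -o custom-columns='NAME:.metadata.name,DEFAULT:.metadata.annotations.ingressclass\.kubernetes\.io/is-default-class' \
    | awk '$2 == "true" { print $1 }'
}

# Cloud:   --set controller.service.externalIPs="$(ingress_external_ips 203.0.113.10 10.0.0.5)"
# On-prem: --set controller.service.externalIPs="$(ingress_external_ips 203.0.113.10)"
```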

## Domain and Certificate Requirements

To install the NVIDIA Run:ai control plane in an NVIDIA Run:ai multi-tenant deployment, configure a wildcard DNS record and a wildcard TLS certificate that allow secure access across tenant environments.

### Wildcard DNS Record

To expose the NVIDIA Run:ai platform under a unified domain, configure a wildcard DNS record (e.g., `*.runai.hostorg.com`) that resolves to the IP address of the cluster's load balancer.
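
Because a wildcard record should answer for any single label, one way to test it is to resolve an arbitrary name under the domain and compare the result with the load balancer IP. This sketch assumes a Linux host with `getent` (glibc); the helper names and the probe label are hypothetical, and `getent` may return an IPv6 address first on dual-stack hosts.

```bash
# Replaces the leading "*." of a wildcard record with a concrete label.
wildcard_probe_name() {
  local wildcard="$1" label="${2:-dns-check-$$}"
  printf '%s.%s\n' "$label" "${wildcard#\*.}"
}

# Resolves a probe name under the wildcard and compares it with the expected IP.
check_wildcard_dns() {
  local wildcard="$1" expected_ip="$2" name resolved
  name=$(wildcard_probe_name "$wildcard")
  resolved=$(getent hosts "$name" | awk '{ print $1; exit }')
  if [ "$resolved" = "$expected_ip" ]; then
    echo "OK: $name -> $resolved"
  else
    echo "FAIL: $name -> ${resolved:-<unresolved>} (expected $expected_ip)"
    return 1
  fi
}

# Example: check_wildcard_dns '*.runai.hostorg.com' <LOAD-BALANCER-IP>
```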

### Wildcard TLS Certificate

You must provide a TLS certificate that matches your wildcard DNS domain (e.g., `*.runai.hostorg.com`). This certificate is used to secure HTTPS access to tenant-facing endpoints, ensuring that each tenant receives a secure URL when accessing NVIDIA Run:ai services.

Create a [Kubernetes Secret](https://kubernetes.io/docs/concepts/configuration/secret/) named `runai-backend-tls` in the `runai-backend` namespace, passing the path to your TLS certificate with `--cert` and the path to its private key with `--key`:

```bash
# Replace /path/to/fullchain.pem with the path to your TLS certificate
# and /path/to/private.pem with the path to its private key.
kubectl create secret tls runai-backend-tls -n runai-backend \
  --cert /path/to/fullchain.pem \
  --key /path/to/private.pem
```
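
Before creating the secret, it can help to confirm that the certificate lists the wildcard name and that the key belongs to the certificate. This sketch assumes OpenSSL 1.1.1 or later (for `openssl x509 -ext`); the helper names are hypothetical. The key check compares the public key extracted from the certificate with the one derived from the private key.

```bash
# Succeeds if SAN text (as printed by openssl) contains the exact entry DNS:$2.
cert_has_san() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^[[:space:]]*//' | grep -Fxq "DNS:$2"
}

# Checks a certificate/key pair against a wildcard name such as '*.runai.hostorg.com'.
check_tls_pair() {
  local cert="$1" key="$2" wildcard="$3" san
  # The first output line is the extension header; the SAN values follow
  san=$(openssl x509 -in "$cert" -noout -ext subjectAltName 2>/dev/null | tail -n +2)
  if cert_has_san "$san" "$wildcard"; then
    echo "OK: certificate covers $wildcard"
  else
    echo "FAIL: certificate SAN does not include $wildcard"
  fi
  if [ "$(openssl x509 -in "$cert" -noout -pubkey)" = "$(openssl pkey -in "$key" -pubout)" ]; then
    echo "OK: private key matches certificate"
  else
    echo "FAIL: private key does not match certificate"
  fi
}

# Example: check_tls_pair /path/to/fullchain.pem /path/to/private.pem '*.runai.hostorg.com'
```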

## External Postgres Database (Optional)

The NVIDIA Run:ai control plane installation includes a default PostgreSQL database. If you have specific requirements or preferences, you can use an existing PostgreSQL database instead, as described in [External Postgres database configuration](/multi-tenant/2.22/getting-started/installation/preparations.md#external-postgres-database-optional). Only PostgreSQL version 16 is supported.
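
To confirm that an existing database runs PostgreSQL 16, you can query `server_version_num`, which encodes the major version in the digits above the last four (for example, `160004` is 16.4). This is a sketch: `pg_is_version_16` is a hypothetical helper, and `DATABASE_URL` is a hypothetical libpq connection string.

```bash
# Succeeds if a server_version_num value (e.g. "160004") is PostgreSQL 16.x.
pg_is_version_16() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ $(( $1 / 10000 )) -eq 16 ]
}

# Example:
#   pg_is_version_16 "$(psql "$DATABASE_URL" -Atc 'SHOW server_version_num')" && echo "PostgreSQL 16"
```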
