NVIDIA Run:ai is a GPU orchestration and optimization platform that helps organizations maximize compute utilization for AI workloads. By optimizing the use of expensive compute resources, NVIDIA Run:ai accelerates AI development cycles, and drives faster time-to-market for AI-powered innovations.
Built on Kubernetes, NVIDIA Run:ai supports dynamic GPU allocation, workload submission, workload scheduling, and resource sharing, ensuring that AI teams get the compute power they need while IT teams maintain control over infrastructure efficiency.
NVIDIA Run:ai centralizes cluster management and optimizes infrastructure control by enabling administrators to:
- Manage all clusters from a single platform, ensuring consistency and control across environments.
- Gain real-time and historical insights into GPU consumption across clusters to optimize resource allocation and plan future capacity efficiently.
- Define and enforce security and usage policies to align GPU consumption with business and compliance requirements.
- Integrate with the organization's identity provider for streamlined authentication (Single Sign-On) and role-based access control (RBAC).
NVIDIA Run:ai simplifies AI infrastructure management by providing a structured approach to managing AI initiatives, resources, and user access. It enables platform administrators to maintain control, efficiency, and scalability across their infrastructure:
- Map and set up AI initiatives according to your organization's structure, ensuring clear resource allocation.
- Enable seamless sharing and pooling of GPUs across multiple users, reducing idle time and optimizing utilization.
- Assign users (AI practitioners, ML engineers) to specific projects and departments to manage access and enforce security policies, using role-based access control (RBAC) to ensure permissions align with user roles.
NVIDIA Run:ai empowers data scientists and ML engineers by enabling them to:
- Ensure high-priority jobs get GPU resources, with workloads dynamically receiving resources based on demand.
- Request and utilize only a fraction of a GPU's memory, ensuring efficient resource allocation and leaving room for other workloads.
- Run the entire AI initiative lifecycle - Jupyter Notebooks, training jobs, and inference workloads - efficiently.
- Work on Jupyter Notebooks with an uninterrupted experience, without GPUs being taken away.
NVIDIA Run:ai is made up of two components, both installed on a Kubernetes cluster:
NVIDIA Run:ai cluster - Provides scheduling and workload management, extending Kubernetes native capabilities.
NVIDIA Run:ai control plane - Provides resource management, handles workload submission and provides cluster monitoring and analytics.
The NVIDIA Run:ai cluster is responsible for scheduling AI workloads and efficiently allocating GPU resources across users and projects:
- Applies AI-aware rules to efficiently schedule workloads submitted by AI practitioners.
- Handles workload management, which includes the researcher code running as a Kubernetes container and the system resources required to run the code, such as storage, credentials, and network endpoints to access the container.
- Installed as a Kubernetes Operator to automate deployment, upgrades and configuration of NVIDIA Run:ai cluster services.
The NVIDIA Run:ai control plane provides a centralized management interface for organizations to oversee their GPU infrastructure across multiple locations/subnets, accessible via the Web UI and APIs. The control plane can be deployed on the cloud or on-premises for organizations that require local control over their infrastructure (self-hosted).
- Manages multiple NVIDIA Run:ai clusters for a single tenant across different locations and subnets from a single unified interface.
- Allows administrators to define Projects, Departments and user roles, enforcing policies for fair resource distribution.
- Allows teams to submit workloads, track usage, and monitor GPU performance in real time.
In addition, the NVIDIA Run:ai cluster provides:
Kubernetes-native application - Installed as a Kubernetes-native application, seamlessly extending Kubernetes for a cloud-native experience and operational standards (install, upgrade, configure).
Monitoring and insights - Track real-time and historical data on GPU usage to help track resource consumption and optimize costs.
Scalability for training and inference - Support for distributed training across multiple GPUs and auto-scales inference workloads.
Integrations - Integrate with popular ML frameworks - PyTorch, TensorFlow, XGBoost, Knative, Spark, Kubeflow Pipelines, Apache Airflow, Argo workloads, Ray and more.
Flexible workload submission - Submit workloads using the NVIDIA Run:ai UI, API, CLI or run third-party workloads.
Secured communication - Uses an outbound-only, secured (SSL) connection to synchronize with the NVIDIA Run:ai control plane.
Private - NVIDIA Run:ai only synchronizes metadata and operational metrics (e.g., workloads, nodes) with the control plane. No proprietary data, model artifacts, or user data sets are ever transmitted, ensuring full data privacy and security.
There are two main installation options:
SaaS - NVIDIA Run:ai is installed on the customer's data science GPU clusters, and the cluster connects to the NVIDIA Run:ai control plane on the cloud (https://<tenant-name>.run.ai). With this installation, the cluster requires an outbound connection to the NVIDIA Run:ai cloud.
Self-hosted - The NVIDIA Run:ai control plane is also installed in the customer's data center.


This section includes release information for the self-hosted version of NVIDIA Run:ai:
New Features and Enhancements - Highlights major updates introduced in each version, including new capabilities, UI improvements, and changes to system behavior.
Hotfixes - Lists patches applied to released versions, including critical fixes and behavior corrections.
NVIDIA Run:ai uses life cycle labels to indicate the maturity and stability of features across releases:
Experimental - This feature is in early development. It may not be stable and could be removed or changed significantly in future versions. Use with caution.
Beta - This feature is still being developed for official release in a future version and may have some limitations. Use with caution.
Legacy - This feature is scheduled to be removed in future versions. We recommend using alternatives if available. Use only if necessary.
This article explains the procedure to create your own user applications.
Applications are used for API integrations with NVIDIA Run:ai. An application contains a client ID and a client secret. With the client credentials, you can obtain an API token and use it within subsequent API calls.
The token obtained through user applications assumes the roles and permissions of the user.
To create an application:
Click the user avatar at the top right corner, then select Settings
Click +APPLICATION
Enter the application’s name
Click CREATE
Copy the Client ID and Client secret and store securely
Click DONE
You can create up to 20 user applications.
To regenerate a client secret:
Locate the application whose client secret you want to regenerate
Click Regenerate client secret
Click REGENERATE
Copy the New client secret and store it securely
Click DONE
Important
Regenerating a client secret revokes the previous one.
To delete an application:
Locate the application you want to delete
Click on the trash icon
On the dialog, click DELETE to confirm
Go to the User Applications API reference to view the available actions.
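As an illustration, client credentials are typically exchanged for a token with a single HTTP call. The sketch below is hedged: the endpoint path and payload field names are assumptions and should be confirmed against the API reference above.

```bash
# Hedged sketch: exchange a user application's client credentials for a token.
# The endpoint path and field names are assumptions -- verify them in the API reference.
curl -s -X POST "https://<tenant-name>.run.ai/api/v1/token" \
  -H "Content-Type: application/json" \
  -d '{
        "grantType": "app_token",
        "AppId": "<CLIENT_ID>",
        "AppSecret": "<CLIENT_SECRET>"
      }'
# The returned access token is then passed on subsequent calls as:
#   Authorization: Bearer <TOKEN>
```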
Both the control plane and clusters require Kubernetes. Typically, the control plane and first cluster are installed on the same Kubernetes cluster.
The self-hosted option is for organizations that cannot use a SaaS solution due to data leakage concerns. NVIDIA Run:ai self-hosting comes with two variants:
Connected
The organization can freely download from the internet (though upload is not allowed)
Air-gapped
The organization has no connection to the internet
NVIDIA Run:ai supports service mesh implementations. When a service mesh is deployed with sidecar injection, specific configurations must be applied to ensure compatibility with NVIDIA Run:ai. This document outlines the required changes for the NVIDIA Run:ai control plane and cluster.
By default, NVIDIA Run:ai prevents Istio from injecting sidecar containers into system jobs in the control plane. For other service mesh solutions, users must manually add annotations during installation.
To disable sidecar injection in the NVIDIA Run:ai control plane, modify the Helm values file by adding the required pod labels (for example, openservicemesh.io/sidecar-injection: disabled) to the relevant components.
Sidecar containers injected by some service mesh solutions can prevent NVIDIA Run:ai installation hooks from completing. To avoid this, modify the Helm installation command to include the required labels or annotations, for example --set-json global.additionalJobLabels='{"sidecar.istio.io/inject":false}' for Istio.
To prevent sidecar injection in workloads created at runtime (such as training workloads), update the runaiconfig resource by adding the required labels under the workload controller's additionalPodLabels. Full examples of these settings appear in the configuration snippets later in this document.
Deploying NVIDIA Run:ai in mission-critical environments requires proper monitoring and maintenance of resources to ensure workloads run and are deployed as expected.
Details on how to monitor different parts of the physical resources in your Kubernetes system, including clusters and nodes, can be found in the monitoring and maintenance section. Adjacent configuration and troubleshooting sections also cover high availability, restoring and securing clusters, collecting logs, and reviewing audit logs to meet compliance requirements.
In addition to monitoring NVIDIA Run:ai resources, it is also highly recommended to monitor the Kubernetes environment that NVIDIA Run:ai runs on, which manages the containerized applications. In particular, focus on three main layers:
This is the highest layer and includes the NVIDIA Run:ai components themselves, which run as pods in containers managed by Kubernetes.
This layer includes the main Kubernetes system that runs and manages NVIDIA Run:ai components. Important elements to monitor include:
The health of the cluster and nodes (machines in the cluster).
The status of key Kubernetes services, such as the API server. For detailed information on managing clusters, see the Kubernetes documentation.
This is the base layer, representing the actual machines (virtual or physical) that make up the cluster. IT teams need to handle:
Managing CPU, memory, and storage
Keeping the operating system updated
Setting up the network and balancing the load
NVIDIA Run:ai does not require any special configurations at this level.
The articles below explain how to monitor these layers, maintain system security and compliance, and ensure the reliable operation of NVIDIA Run:ai in critical environments.
Karpenter is an open-source, Kubernetes cluster autoscaler built for cloud deployments. Karpenter optimizes the cloud cost of a customer’s cluster by moving workloads between different node types, consolidating workloads into fewer nodes, using lower-cost nodes where possible, scaling up new nodes when needed, and shutting down unused nodes.
Karpenter’s main goal is cost optimization. Unlike Karpenter, NVIDIA Run:ai’s Scheduler optimizes for fairness and resource utilization. Therefore, there are a few potential friction points when using both on the same cluster.
Karpenter looks for “unschedulable” pending workloads and may try to scale up new nodes to make those workloads schedulable. However, in some scenarios, these workloads may exceed their quota parameters, and the NVIDIA Run:ai Scheduler will put them into a pending state.
Karpenter is not aware of the NVIDIA Run:ai fractions mechanism and may try to interfere incorrectly.
Karpenter preempts any type of workload (i.e., high-priority, non-preemptible workloads will potentially be interrupted and moved to save cost).
Karpenter has no pod-group (i.e., workload) notion or gang scheduling awareness, meaning that Karpenter is unaware that a set of “arbitrary” pods is a single workload. This may cause Karpenter to schedule those pods into different node pools (in the case of multi-node-pool workloads) or scale up or down a mix of wrong nodes.
NVIDIA Run:ai Scheduler mitigates the friction points using the following techniques (each numbered bullet below corresponds to the related friction point listed above):
Karpenter uses a “nominated node” to recommend a node for the Scheduler. The NVIDIA Run:ai Scheduler treats this as a “preferred” recommendation, meaning it will try to use this node, but it’s not required and it may choose another node.
Fractions - Karpenter won’t consolidate nodes with one or more pods that cannot be moved. The NVIDIA Run:ai reservation pod is marked as ‘do not evict’ to allow the NVIDIA Run:ai Scheduler to control the scheduling of fractions.
Non-preemptible workloads - NVIDIA Run:ai marks non-preemptible workloads as ‘do not evict’ and Karpenter respects this annotation.
NVIDIA Run:ai node pools (single-node-pool workloads) - Karpenter respects the ‘node affinity’ that NVIDIA Run:ai sets on a pod, so Karpenter uses the node affinity for its recommended node. For the gang-scheduling/pod-group (workload) notion, NVIDIA Run:ai Scheduler considers Karpenter directives as preferred recommendations rather than mandatory instructions and overrides Karpenter instructions where appropriate.
Using multi-node-pool workloads
Workloads may include a list of optional node pools. Karpenter is not aware that only a single node pool should be selected out of that list for the workload. It may therefore recommend putting pods of the same workload into different node pools and may scale up nodes from different node pools to serve a “multi-node-pool” workload instead of nodes on the selected single node pool.
If this becomes an issue (i.e., if Karpenter scales up the wrong node types), users can set an inter-pod affinity using the node pool label or another common label as a ‘topology’ identifier. This will force Karpenter to choose nodes from a single-node pool per workload, selecting from any of the node pools listed as allowed by the workload.
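A hedged sketch of that suggestion follows. The node pool label key (run.ai/node-pool) and the workload label (workload-name) are assumptions; substitute the labels actually used in your cluster.

```bash
# Sketch: require all pods of one workload to land on nodes sharing the same
# node pool label value, by using the node pool label as the affinity topology key.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-worker-0
  labels:
    workload-name: my-distributed-job
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              workload-name: my-distributed-job
          topologyKey: run.ai/node-pool   # assumed node pool label key
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]
EOF
```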
This guide outlines the best practices for configuring the NVIDIA Run:ai platform to ensure high availability and maintain service continuity during system failures or under heavy load. The goal is to reduce downtime and eliminate single points of failure by leveraging Kubernetes best practices alongside NVIDIA Run:ai specific configuration options. The NVIDIA Run:ai platform relies on two fundamental high availability strategies:
Use of system nodes - Assigning multiple dedicated nodes for critical system services ensures control, resource isolation, and enables system-level scaling.
Replication of core and third-party services - Configuring multiple replicas of essential services, including both platform and third-party components, distributes workloads and reduces single points of failure. If a component fails on one node, requests can seamlessly route to another instance.
The NVIDIA Run:ai platform allows you to dedicate specific nodes (system nodes) exclusively for core platform services. This approach provides improved operational isolation and easier resource management.
Ensure that at least three system nodes are configured to support high availability. If you use only a single node for core services, horizontally scaled components will not be distributed, resulting in a single point of failure. This practice applies to both the NVIDIA Run:ai cluster and control plane (self-hosted).
The NVIDIA Run:ai control plane runs in the runai-backend namespace and consists of multiple Kubernetes Deployments and StatefulSets. To achieve high availability, it is recommended to configure multiple replicas during installation or upgrade using Helm flags.
In addition, the control plane supports autoscaling for certain services to handle variable load and improve system resiliency. Autoscaling can be enabled or configured during installation or upgrade using Helm flags.
Each of the NVIDIA Run:ai deployments can be set to scale up by adding Helm settings on install/upgrade. For a full list of settings, contact NVIDIA Run:ai support.
To increase the replica count of a service, use the --set <service>.replicaCount=2 control plane Helm flag, as shown in the example below.
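A hedged example of applying this flag during an upgrade follows; the release name, chart reference and namespace are assumptions and should match your original installation command.

```bash
# Sketch: scale a specific control plane service to 2 replicas.
helm upgrade runai-backend runai-backend/control-plane \
  -n runai-backend \
  --reuse-values \
  --set <service-name>.replicaCount=2
```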
NVIDIA Run:ai uses the following third-party components, which are managed as Kubernetes StatefulSets:
PostgreSQL - The internal PostgreSQL cannot be scaled horizontally. For high availability, connect NVIDIA Run:ai to an external PostgreSQL service that is configured for high availability.
Thanos - To enable Thanos autoscaling, use the thanos.query.autoscaling.* control plane Helm flags (see the configuration snippets later in this document).
Keycloak - By default, Keycloak sets a minimum of 3 pods and scales out under transaction load. To scale Keycloak, use the keycloakx.autoscaling.enabled control plane Helm flag.
By default, NVIDIA Run:ai cluster services are deployed with a single replica. To achieve high availability, it is recommended to configure multiple replicas for core NVIDIA Run:ai services.
This section details the security considerations for deploying NVIDIA Run:ai. It is intended to help administrators and security officers understand the specific permissions required by NVIDIA Run:ai.
NVIDIA Run:ai integrates with Kubernetes clusters and requires specific permissions to operate successfully. These permissions are controlled with configuration flags that dictate how NVIDIA Run:ai interacts with cluster resources. Prior to installation, security teams can review the permissions and ensure they align with their organization's policies.
NVIDIA Run:ai provides various security-related permissions that can be customized to fit specific organizational needs. Below are brief descriptions of the key use cases for these customizations:
Many organizations enforce IT compliance rules for Kubernetes, with strict access control for installing and running workloads. OpenShift uses Security Context Constraints (SCC) for this purpose. NVIDIA Run:ai fully supports SCC, ensuring integration with OpenShift's security requirements.
The platform is actively monitored for security vulnerabilities, with regular scans conducted to identify and address potential issues. Necessary fixes are applied to ensure that the software remains secure and resilient against emerging threats, providing a safe and reliable experience.
Shared storage is a critical component in AI and machine learning workflows, particularly in scenarios involving distributed training and shared datasets. In AI and ML environments, data must be readily accessible across multiple nodes, especially when training large models or working with vast datasets. Shared storage enables seamless access to data, ensuring that all nodes in a distributed training setup can read and write to the same datasets simultaneously. This setup not only enhances efficiency but is also crucial for maintaining consistency and speed in high-performance computing environments.
While the NVIDIA Run:ai platform supports a variety of remote data sources, such as Git and S3, it is often more efficient to keep data close to the compute resources. This proximity is typically achieved through shared storage accessible to multiple nodes in your Kubernetes cluster.
When implementing shared storage in Kubernetes, there are two primary approaches:
Utilizing the storage class of your storage provider (Recommended)
Using a direct NFS (Network File System) mount
NVIDIA Run:ai supports both direct NFS mounts and Kubernetes storage classes.
Storage classes in Kubernetes define how storage is provisioned and managed. This allows you to select storage types optimized for AI workloads. For example, you can choose storage with high IOPS (Input/Output Operations Per Second) for rapid data access during intensive training sessions, or tiered storage options to balance cost and performance based on your organization's requirements. This approach supports dynamic provisioning, enabling storage to be allocated on demand as required by your applications.
NVIDIA Run:ai data sources leverage storage classes to manage and allocate storage efficiently. This ensures that the most suitable storage option is always accessible, contributing to the efficiency and performance of AI workloads.
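As a hedged sketch of dynamic provisioning, the claim below requests shared, multi-node storage through a storage class. The class name shared-nfs is an assumption; use the class published by your storage provider's CSI driver.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-datasets
spec:
  accessModes:
    - ReadWriteMany            # lets multiple nodes mount the same data
  storageClassName: shared-nfs # assumed storage class name
  resources:
    requests:
      storage: 500Gi
EOF
```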
Direct NFS allows you to mount a shared file system directly across multiple nodes in your Kubernetes cluster. This method provides a straightforward way to share data among nodes and is often used for simple setups or when a dedicated NFS server is available.
However, using NFS can present challenges related to security and control. Direct NFS setups might lack the fine-grained control and security features available with storage classes.
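For comparison, a minimal direct NFS setup typically looks like the sketch below: a statically defined PersistentVolume pointing at an existing NFS export, plus a claim bound to it. The server address and export path are placeholders.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-datasets
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.internal   # placeholder NFS server
    path: /exports/datasets        # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-datasets
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""             # empty string disables dynamic provisioning
  volumeName: nfs-datasets
  resources:
    requests:
      storage: 500Gi
EOF
```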
This section explains the procedure to manage your organization's applications.
Applications are used for API integrations with NVIDIA Run:ai. An application contains a client ID and a client secret. With the client credentials, you can obtain an API token and use it within subsequent API calls.
Applications are assigned access rules to manage permissions. For example, application ci-pipeline-prod is assigned a Researcher role in Cluster: A.
The Applications table can be found under Access in the NVIDIA Run:ai platform.
This section provides instructions for IT administrators on collecting NVIDIA Run:ai logs for support, including prerequisites, CLI commands, and log file retrieval. It also covers enabling verbose logging for Prometheus and the NVIDIA Run:ai Scheduler.
To collect NVIDIA Run:ai logs, follow these steps:
NVIDIA Run:ai authentication and authorization enables a streamlined experience for the user with precise controls covering the data each user can see and the actions each user can perform in the NVIDIA Run:ai platform.
Authentication verifies user identity during login, and authorization assigns the user specific permissions according to their assigned access rules.
Authenticated access is required to use all aspects of the NVIDIA Run:ai interfaces, including the NVIDIA Run:ai platform, the NVIDIA Run:ai Command Line Interface (CLI) and APIs.
There are multiple methods to authenticate and access NVIDIA Run:ai.
NVIDIA Run:ai provides metrics and telemetry for both physical cluster entities, such as clusters, nodes, and node pools, and organizational entities, such as departments and projects. Metrics represent over-time data, while telemetry represents current analytics data. This data is essential for monitoring and analyzing the performance and health of your platform.
Users can consume the data based on their permissions:
API - Access the data programmatically through the Metrics and Telemetry APIs
NVIDIA Run:ai assets are preconfigured building blocks that simplify the workload submission effort and remove the complexities of Kubernetes and networks for AI practitioners.
Workload assets enable organizations to:
Create and reuse preconfigured setup for code, data, storage and resources to be used by AI practitioners to simplify the process of submitting workloads
Share the preconfigured setup with a wide audience of AI practitioners with similar needs
An alternative approach is to use a single-node pool for each workload instead of multi-node pools.
Consolidation
To make Karpenter more effective when using its consolidation function, users should consider separating preemptible and non-preemptible workloads, either by using node pools, node affinities, taint/tolerations, or inter-pod anti-affinity.
If users don’t separate preemptible and non-preemptible workloads (i.e., make them run on different nodes), Karpenter’s ability to consolidate (bin-pack) and shut down nodes will be reduced, but it is still effective.
Conflicts between bin-packing and spread policies
If NVIDIA Run:ai is used with a scheduling spread policy, it will clash with Karpenter’s default bin-packs/consolidation policy, and the outcome may be a deployment that is not optimized for any of these policies.
Usually spread is used for Inference, which is non-preemptible and therefore not controlled by Karpenter (NVIDIA Run:ai Scheduler will mark those workloads as ‘do not evict’ for Karpenter), so this should not present a real deployment issue for customers.
Automatic Namespace creation
Controls whether NVIDIA Run:ai automatically creates Kubernetes namespaces when new projects are created. Useful in environments where namespace creation must be strictly managed.
Automatic user assignment
Decides if users are automatically assigned to projects within NVIDIA Run:ai. Helps manage user access more tightly in certain compliance-driven environments.
Secret propagation
Determines whether NVIDIA Run:ai should propagate secrets across the cluster. Relevant for organizations with specific security protocols for managing sensitive data.
Disabling Kubernetes limit range
Chooses whether to disable the Kubernetes Limit Range feature. May be adjusted in environments with specific resource management needs.
Note
The creation of assets is possible only via API and the NVIDIA Run:ai UI.
The submission of workloads using assets is possible only via the NVIDIA Run:ai UI.
There are four workload asset types used by the workload:
Environments - The container image, tools and connections for the workload
Data sources - The type of data, its origin and the target storage location such as PVCs or cloud storage buckets where datasets are stored
Compute resources - The compute specification, including GPU and CPU compute and memory
Credentials - The secrets to be used to access sensitive data, services, and applications such as docker registry or S3 buckets
When a workload asset is created, a scope is required. The scope defines who in the organization can view and/or use the asset.
Any subject (user, application, or SSO group) with a role that has permissions to Create an asset, can do so within their scope.
Assets are used when submitting workloads. Any subject (user, application or SSO group) with a role that has permissions to Create workloads, can also use assets.
Any subject (user, application, or SSO group) with a role that has permission to View an asset, can do so within their scope.
Disabling sidecar injection in the NVIDIA Run:ai control plane Helm values (OpenServiceMesh example):

```yaml
authorizationMigrator:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
clusterMigrator:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
identityProviderReconciler:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
keepPVC:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
orgUnitsMigrator:
  podLabels:
    openservicemesh.io/sidecar-injection: disabled
```

Adding labels or annotations to the control plane installation hooks:

```bash
helm upgrade -i ... \
  --set global.additionalJobLabels.A=B --set global.additionalJobAnnotations.A=B
```

Example for Istio:

```bash
helm upgrade -i ... \
  --set-json global.additionalJobLabels='{"sidecar.istio.io/inject":false}'
```

Preventing sidecar injection in workloads created at runtime (runaiconfig):

```yaml
spec:
  workload-controller:
    additionalPodLabels:
      sidecar.istio.io/inject: false
```

Increasing the replica count of a control plane service:

```bash
--set <service>.replicaCount=2
```

Enabling Thanos autoscaling:

```bash
--set thanos.query.autoscaling.enabled=true \
--set thanos.query.autoscaling.maxReplicas=2 \
--set thanos.query.autoscaling.minReplicas=2
```

Enabling Keycloak autoscaling:

```bash
--set keycloakx.autoscaling.enabled=true
```

The Applications table consists of the following columns:
Application - The name of the application
Client ID - The client ID of the application
Access rule(s) - The access rules assigned to the application
Last login - The timestamp for the last time the user signed in
Created by - The user who created the application
Creation time - The timestamp for when the application was created
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Download table - Click MORE and then Click Download as CSV. Export to CSV is limited to 20,000 rows.
To create an application:
Click +NEW APPLICATION
Enter the application’s name
Click CREATE
Copy the Client ID and Client secret and store them securely
Click DONE
To create an access rule:
Select the application you want to add an access rule for
Click ACCESS RULES
Click +ACCESS RULE
Select a role
Select a scope
Click SAVE RULE
Click CLOSE
To delete an access rule:
Select the application you want to remove an access rule from
Click ACCESS RULES
Find the access rule assigned to the user you would like to delete
Click on the trash icon
Click CLOSE
To regenerate a client secret:
Locate the application whose client secret you want to regenerate
Click REGENERATE CLIENT SECRET
Click REGENERATE
Copy the New client secret and store it securely
Click DONE
Important
Regenerating a client secret revokes the previous one.
To delete an application:
Select the application you want to delete
Click DELETE
On the dialog, click DELETE to confirm
Go to the Applications, Access rules API reference to view the available actions.
Ensure that you have administrator-level access to the Kubernetes cluster where NVIDIA Run:ai is installed.
The NVIDIA Run:ai Administrator Command-Line Interface (CLI) must be installed.
Run the Command from your local machine or a Bastion Host (secure server). Open a terminal on your local machine (or any machine that has network access to the Kubernetes cluster) where the NVIDIA Run:ai Administrator CLI is installed.
Collect the Logs. Execute the following command to collect the logs:
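A minimal invocation of the Administrator CLI log collection command:

```bash
runai-adm collect-logs
```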
This command gathers all relevant NVIDIA Run:ai logs from the system and generates a compressed file.
Locate the Generated File. After running the command, note the location of the generated compressed log file. You can retrieve and send this file to NVIDIA Run:ai Support for further troubleshooting.
Increase log verbosity to capture more detailed information, providing deeper insights into system behavior and making it easier to identify and resolve issues.
Before you begin, ensure you have the following:
Access to the Kubernetes cluster where NVIDIA Run:ai is installed, including the necessary permissions to view and modify configurations.
kubectl installed and configured:
The Kubernetes command-line tool, kubectl, must be installed and configured to interact with the cluster.
Sufficient privileges to edit configurations and view logs.
Monitoring Disk Space
When enabling verbose logging, ensure adequate disk space to handle the increased log output, especially when enabling debug or high verbosity levels.
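As an example of increasing verbosity, the Scheduler's log level can be raised through the cluster configuration. This is a hedged sketch: the exact location of the runai-scheduler block inside the runaiconfig spec may differ between versions.

```bash
# Open the cluster configuration for editing:
kubectl edit runaiconfig runai -n runai
#
# Then raise the Scheduler verbosity, for example:
#
# spec:
#   runai-scheduler:
#     args:
#       verbosity: 6
```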
NVIDIA Run:ai supports three methods to set up SSO:
When using SSO, it is highly recommended to manage at least one local user, as a breakglass account (an emergency account), in case access to SSO is not possible.
Username and password access can be used when SSO integration is not possible.
Secret is the authentication method for Applications. Applications use the NVIDIA Run:ai APIs to perform automated tasks including scripts and pipelines based on their assigned access rules.
The NVIDIA Run:ai platform uses Role-Based Access Control (RBAC) to manage authorization. Once a user or an application is authenticated, they can perform actions according to their assigned access rules.
While Kubernetes RBAC is limited to a single cluster, NVIDIA Run:ai expands the scope of Kubernetes RBAC, making it easy for administrators to manage access rules across multiple clusters.
RBAC at NVIDIA Run:ai is configured using access rules. An access rule is the assignment of a role to a subject in a scope: <Subject> is a <Role> in a <Scope>.
Subject
A user, a group, or an application assigned with the role
Role
A set of permissions that can be assigned to subjects. Roles at NVIDIA Run:ai are system defined and cannot be created, edited or deleted.
A permission is a set of actions (view, edit, create and delete) over a NVIDIA Run:ai entity (e.g. projects, workloads, users). For example, a role might allow a user to create and read Projects, but not update or delete them
Scope
A scope is part of an organization in which a set of permissions (roles) is effective. Scopes include Projects, Departments, Clusters, Account (all clusters).
Below is an example of an access rule: [email protected] is a Department admin in Department: A
CLI - Use the NVIDIA Run:ai Command Line Interface to query and manage the data.
UI - Visualize the data through the NVIDIA Run:ai user interface.
Metrics API - Access over-time detailed analytics data programmatically.
Telemetry API - Access current analytics data programmatically.
Refer to metrics and telemetry to see the full list of supported metrics and telemetry APIs.
Use the list and describe commands to fetch and manage the data. See CLI reference for more details.
Refer to metrics and telemetry to see the full list of supported metrics and telemetry.
Overview dashboard - Provides a high-level summary of the cluster's health and performance, including key metrics such as GPU utilization, memory usage, and node status. Allows administrators to quickly identify any potential issues or areas for optimization. Offers advanced analytics capabilities for analyzing GPU usage patterns and identifying trends. Helps administrators optimize resource allocation and improve cluster efficiency.
Quota management - Enables administrators to monitor and manage GPU quotas across the cluster. Includes features for setting and adjusting quotas, tracking usage, and receiving alerts when quotas are exceeded.
Workload visualizations - Provides detailed insights into the resource usage and utilization of each GPU in the cluster. Includes metrics such as GPU memory utilization, core utilization, and power consumption. Allows administrators to identify GPUs that are under-utilized and overloaded.
Node and node pool visualizations - Similar to workload visualizations, but focused on the resource usage and utilization of each GPU within a specific node or node pool. Helps administrators identify potential issues or bottlenecks at the node level.
Advanced NVIDIA metrics - Provides access to a range of advanced NVIDIA metrics, such as GPU temperature, fan speed, and voltage. Enables administrators to monitor the health and performance of GPUs in greater detail. This data is available at the node and workload level. To enable these metrics, contact NVIDIA Run:ai customer support.
Before proceeding with the upgrade, it's crucial to apply the specific prerequisites associated with your current version of NVIDIA Run:ai and every version in between up to the version you are upgrading to.
NVIDIA Run:ai requires Helm 3.14 or later. Before you continue, validate your installed Helm client version. To install or upgrade Helm, see the Helm documentation. If you are installing an air-gapped version of NVIDIA Run:ai, the NVIDIA Run:ai tar file contains the Helm binary.
Run the following commands to add the NVIDIA Run:ai Helm repository and browse the available versions:
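A hedged example of these commands follows; the repository URL is an assumption, so use the URL provided with your NVIDIA Run:ai installation instructions.

```bash
helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod   # assumed repository URL
helm repo update
helm search repo runai-backend -l   # list all published chart versions
```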
Run the following command to browse all available air-gapped packages using the token provided by NVIDIA Run:ai.
To download and extract a specific version, and to upload the container images to your private registry, see the corresponding section of the self-hosted installation documentation.
Before upgrading the NVIDIA Run:ai control plane, validate that the latest system and network requirements are met, as they can change from time to time.
Upgrading from Version 2.16
You must perform a two-step upgrade:
Upgrade to version 2.18:
Then upgrade to the required version:
Upgrading from Version 2.17 or Later
If your current version is 2.17 or higher, you can upgrade directly to the required version:
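A hedged sketch of the direct upgrade command follows; the release name, chart reference and namespace are assumptions and should match the values used at install time.

```bash
helm upgrade runai-backend runai-backend/control-plane \
  -n runai-backend \
  --version "<REQUIRED_VERSION>" \
  --reuse-values
```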
Before upgrading the NVIDIA Run:ai cluster, validate that the latest system and network requirements are met, as they can change from time to time.
Follow the steps below to get the installation instructions and upgrade the NVIDIA Run:ai cluster.
In the NVIDIA Run:ai UI, go to Clusters
Select the cluster you want to upgrade
Click INSTALLATION INSTRUCTIONS
Optional: Select the NVIDIA Run:ai cluster version (latest, by default)
Follow the installation instructions and run the Helm commands provided on your Kubernetes cluster. If you encounter issues, see the troubleshooting scenarios below.
Click DONE
Once installation is complete, validate that the cluster is Connected and listed with the new cluster version. Once you have done this, the cluster is upgraded to the latest version.
If you encounter an issue with the cluster upgrade, use the troubleshooting scenarios below.
If the NVIDIA Run:ai cluster upgrade fails, check the installation logs to identify the issue.
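A minimal way to inspect the installation state manually, assuming the NVIDIA Run:ai cluster components run in the runai namespace:

```bash
kubectl get pods -n runai                        # look for pods that are not Running/Completed
kubectl describe pod <failing-pod> -n runai      # inspect events for a failing pod
kubectl logs <failing-pod> -n runai              # print its logs
```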
If the NVIDIA Run:ai cluster upgrade completes but the cluster status does not show as Connected, refer to the cluster troubleshooting documentation.
This section explains the procedure to manage users and their permissions.
Users can be managed locally or via the identity provider (IdP), and are assigned access rules to manage permissions. For example, user [email protected] is a department admin in department A.
The Users table can be found under Access in the NVIDIA Run:ai platform.
The users table provides a list of all the users in the platform. You can manage users and user permissions (access rules) for both local and SSO users.
The Users table consists of the following columns:
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
To create a local user:
Click +NEW LOCAL USER
Enter the user’s Email address
Click CREATE
Review and copy the user’s credentials:
To create an access rule:
Select the user you want to add an access rule for
Click ACCESS RULES
Click +ACCESS RULE
Select a role
To delete an access rule:
Select the user you want to remove an access rule from
Click ACCESS RULES
Find the access rule assigned to the user you would like to delete
Click on the trash icon
To reset a user’s password:
Select the user you want to reset it’s password
Click RESET PASSWORD
Click RESET
Review and copy the user’s credentials:
To delete a user:
Select the user you want to delete
Click DELETE
In the dialog, click DELETE to confirm
Go to the Users and Access rules API reference to view the available actions.
Researchers may need to access containers remotely during workload execution. Common use cases include:
Running a Jupyter Notebook inside the container
Connecting PyCharm for remote Python development
Viewing machine learning visualizations using TensorBoard
To enable this access, you must expose the relevant container ports.
Accessing the containers remotely requires exposing container ports. In Docker, ports are exposed by specifying them when launching the container. NVIDIA Run:ai provides similar functionality within a Kubernetes environment.
Since Kubernetes abstracts the container's physical location, exposing ports is more complex. Kubernetes supports multiple methods for exposing container ports. For more details, refer to the Kubernetes documentation.
Many tools used by researchers, such as Jupyter, TensorBoard, or VSCode, require remote access to the running workload's container. In NVIDIA Run:ai, this access is provided through dynamically generated URLs.
By default, NVIDIA Run:ai uses the provided cluster domain to dynamically create SSL-secured URLs in a path-based format, for example https://<CLUSTER_URL>/<project-name>/<workload-name>.
While path-based routing works with applications such as Jupyter Notebooks, it may not be compatible with other applications. Some applications assume they are running at the root file system, so hardcoded file paths and settings within the container may become invalid when running at a path other than the root. For example, if an application expects to access /etc/config.json but is served at /project-name/workspace-name, the file will not be found. This can cause the container to fail or not function as intended.
NVIDIA Run:ai also provides support for host-based routing. When enabled, each workload is served from its own subdomain of the cluster domain (https://<subdomain>.<CLUSTER_URL>) instead of a path.
This allows all workloads to run at the root path, avoiding file path issues and ensuring proper application behavior.
To enable host-based routing, perform the following steps (a hedged sketch of the certificate secret and ingress rule appears after this list):
Create a second DNS entry (A record) for *.<CLUSTER_URL>, pointing to the same IP address as the cluster's existing DNS record.
Obtain a wildcard SSL certificate for this second DNS entry.
Add the certificate as a Kubernetes secret.
Create an ingress rule for the wildcard host, replacing <CLUSTER_URL>, and apply it to the cluster.
Update the cluster configuration so that workload URLs are generated with subdomains.
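A hedged sketch of the certificate secret and ingress rule referenced above follows. The secret name, namespace, ingress class and backend service are assumptions or placeholders; align them with your cluster setup.

```bash
# Store the wildcard certificate as a TLS secret
kubectl create secret tls runai-wildcard-tls -n runai \
  --cert=path/to/fullchain.pem --key=path/to/private.key

# Ingress rule for the wildcard host (replace <CLUSTER_URL>)
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: runai-wildcard
  namespace: runai
spec:
  ingressClassName: nginx                        # assumed ingress class
  tls:
    - hosts:
        - "*.<CLUSTER_URL>"
      secretName: runai-wildcard-tls
  rules:
    - host: "*.<CLUSTER_URL>"
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: <WORKLOAD_ROUTING_SERVICE>  # placeholder: service fronting workload URLs
                port:
                  number: 443
EOF
```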
Once these requirements have been met, all workloads will automatically be assigned a secured URL with a subdomain, ensuring full functionality for all researcher applications.
Before installing the NVIDIA Run:ai control plane, validate that the system requirements and network requirements are met. For air-gapped environments, make sure you have the software artifacts prepared.
As part of the installation, you will be required to install the NVIDIA Run:ai control plane Helm charts. The Helm charts require Kubernetes administrator permissions. You can review the exact objects that are created by the charts using the --dry-run flag on both Helm charts.
It's recommended to install the latest NVIDIA Run:ai release. If you need to install a specific version, you can browse the available versions with the helm search repo command.
Open your browser and go to:
https://<DOMAIN>
https://runai.apps.<OpenShift-DOMAIN>
Log in using the default credentials:
User: [email protected]
Password: Abcd!234
You will be prompted to change the password.
This section explains the available configurations for customizing the NVIDIA Run:ai control plane and cluster installation.
The NVIDIA Run:ai control plane installation can be customized to support your environment via Helm values files or Helm install flags. See Advanced control plane configurations.
The NVIDIA Run:ai cluster installation can be customized to support your environment via Helm values files or installation flags.
These configurations are saved in the runaiconfig Kubernetes object and can be edited post-installation as needed, as shown below.
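For example, the configuration object can be opened for editing with kubectl:

```bash
kubectl edit runaiconfig runai -n runai
```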
The following table lists the available Helm chart values that can be configured to customize the NVIDIA Run:ai cluster installation.
NVIDIA’s Multi-Instance GPU (MIG) enables splitting a GPU into multiple logical GPU devices, each with its own memory and compute portion of the physical GPU.
NVIDIA provides two MIG strategies:
Single - A GPU can be divided evenly. This means all MIG profiles are the same.
Mixed - A GPU can be divided into different profiles.
The NVIDIA Run:ai platform supports running workloads using NVIDIA MIG. Administrators can set the Kubernetes nodes to their preferred MIG strategy and configure the appropriate MIG profiles for researchers and MLOps engineers to use.
This guide explains how to configure MIG in each strategy. It also outlines the individual implications of each strategy and best practices for administrators.
To use MIG single and mixed strategy effectively, make sure to familiarize yourself with the following NVIDIA resources:
When deploying MIG using the single strategy, all GPUs within a node are configured with the same profile. For example, a node might have GPUs configured with 3 MIG slices of profile type 1g.20gb, or 7 MIG slices of profile 1g.10gb. With this strategy, MIG profiles are displayed as whole GPU devices by CUDA.
The NVIDIA Run:ai platform discovers these MIG profiles as whole GPU devices as well, ensuring MIG devices are transparent to the end user (practitioner). For example, a node with 8 physical GPUs, each split into 3 MIG slices of profile 2g.20gb, is discovered by the NVIDIA Run:ai platform as a node with 24 GPU devices.
Users can submit workloads by requesting a specific number of GPU devices (X GPUs), and NVIDIA Run:ai allocates X MIG slices (logical devices). The NVIDIA Run:ai platform deducts X GPUs from the workload's Project quota, regardless of whether this 'logical GPU' represents 1/3 or 1/7 of a physical GPU device.
When deploying MIG using the mixed strategy, each GPU in a node can be configured with a different combination of MIG profiles, such as 2×2g.20gb and 3×1g.10gb. For details on supported combinations per GPU type, refer to the NVIDIA MIG documentation.
In mixed strategy, physical GPU devices continue to be displayed as physical GPU devices by CUDA, and each MIG profile is shown individually. The NVIDIA Run:ai platform identifies the physical GPU devices normally, however, MIG profiles are not visible in the UI or node APIs.
When submitting third-party workloads with this strategy, the user should explicitly specify the exact requested MIG profile (for example, nvidia.com/gpu.product: A100-SXM4-40GB-MIG-3g.20gb). The NVIDIA Run:ai finds a node that can provide this specific profile and binds it to the workload.
A third-party workload submitted with a MIG profile of type Xg.Ygb (e.g. 3g.40gb or 2g.20gb) is considered as consuming X GPUs, which are deducted from the workload's Project quota of GPUs. For example, a 3g.40gb profile deducts 3 GPUs from the associated Project's quota, while 2g.20gb deducts 2 GPUs. This maintains a logical ratio according to the characteristics of the MIG profile.
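As an illustration, a third-party pod can request a specific MIG slice through an extended resource. This is a hedged sketch: the resource name convention (nvidia.com/mig-<profile>) follows the NVIDIA device plugin's mixed-strategy naming and is an assumption about your device plugin configuration.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.3.2-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]    # prints the MIG device the pod received
      resources:
        limits:
          nvidia.com/mig-3g.20gb: 1    # one 3g.20gb slice; deducts 3 GPUs from the Project quota
EOF
```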
Configure proper and uniform sizes of MIG slices (profiles) across all GPUs within a node.
Set the same MIG profiles on all nodes of a single node pool.
Create separate node pools with different MIG profile configurations allowing users to select the pool that best matches their workloads’ needs.
Ensure Project quotas are allocated according to the MIG profile sizes.
Use mixed strategy with workloads that require diverse resources. Make sure to evaluate the workload requirements and plan accordingly.
Configure individual MIG profiles on each node by using a limited set of MIG profile combinations to minimize complexity. Make sure to evaluate your requirements and node configurations.
Ensure Project quotas are allocated according to the MIG profile sizes.
Operating NVIDIA Run:ai at scale ensures that the system can efficiently handle fluctuating workloads while maintaining optimal performance. As clusters grow, whether due to an increasing number of nodes or a surge in workload demand, NVIDIA Run:ai services must be appropriately tuned to support large-scale environments.
This guide outlines the best practices for optimizing NVIDIA Run:ai for high-performance deployments, including NVIDIA Run:ai system services configurations, vertical scaling (adjusting CPU and memory resources) and where applicable, horizontal scaling (replicas).
Each of the NVIDIA Run:ai containers has default resource requirements that reflect an average customer load. With significantly larger cluster loads, certain NVIDIA Run:ai services will require more CPU and memory resources. NVIDIA Run:ai supports configuring these resources for each NVIDIA Run:ai service group separately. For instructions and more information, see .
The scheduling services group should be scaled together with the number of nodes and the number of workloads (running/pending) handled by the Scheduler. These resource recommendations are based on internal benchmarks performed on stressed environments.
The sync and workload service groups are less sensitive to scale; for large or intensive environments, follow the recommended settings for these groups.
By default, NVIDIA Run:ai cluster services are deployed with a single replica. For large-scale and intensive environments, it is recommended to scale the NVIDIA Run:ai services horizontally by increasing the number of replicas.
NVIDIA Run:ai relies on Prometheus to scrape cluster metrics and forward them to the NVIDIA Run:ai control plane. The volume of metrics generated is directly proportional to the number of nodes, workloads, and projects in the system. When operating at scale, reaching hundreds or thousands of nodes and projects, the system generates a significant volume of metrics that can place a strain on the cluster and the network bandwidth.
To mitigate this impact, it is recommended to tune the Prometheus remote-write configuration. See the Prometheus documentation for the tuning parameters available via the remote write configuration and for guidance on optimizing remote write performance.
You can apply the required remote-write configurations through the NVIDIA Run:ai cluster configuration.
The following example illustrates the general approach for tuning Prometheus remote-write configurations in NVIDIA Run:ai:
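The sketch below is hedged: the placement of the remoteWrite block under spec.prometheus.spec and the values shown are assumptions, so tune them for your environment.

```bash
kubectl edit runaiconfig runai -n runai
#
# spec:
#   prometheus:
#     spec:
#       remoteWrite:
#         - queueConfig:
#             capacity: 5000           # samples buffered per shard
#             maxSamplesPerSend: 1000  # batch size per send
#             batchSendDeadline: 30s   # maximum wait before sending a partial batch
```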
For clusters with more than 32 nodes (SuperPod and larger), increase the replica count for key control plane services to 2.
To set the replica count, use the --set <service>.replicaCount=2 control plane Helm flag, as shown earlier in this guide.
Replicas for the following services should not be increased: postgres, keycloak, grafana, thanos, nats, redoc, cluster-migrator, identity provider reconciler, settings migrator.
For Grafana, enable autoscaling first and then set the number of minReplicas, using the corresponding NVIDIA Run:ai control plane Helm flags.
Thanos is the third-party component used by NVIDIA Run:ai to store metrics under a significant user load. To increase resources for the Thanos query function, use the corresponding NVIDIA Run:ai control plane Helm flags (both are included in the sketch below).
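A hedged sketch of these flags follows; the exact value paths are assumptions, so confirm them with NVIDIA Run:ai support before applying them in production.

```bash
helm upgrade runai-backend runai-backend/control-plane -n runai-backend --reuse-values \
  --set grafana.autoscaling.enabled=true \
  --set grafana.autoscaling.minReplicas=2 \
  --set thanos.query.resources.requests.cpu=1 \
  --set thanos.query.resources.requests.memory=2Gi \
  --set thanos.query.resources.limits.memory=4Gi
```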
This section provides details about NVIDIA Run:ai’s Audit log.
The NVIDIA Run:ai control plane provides the audit log API and event history table in the NVIDIA Run:ai UI. Both reflect the same information regarding changes to business objects: clusters, projects and assets etc.
The Event history table can be found under Event history in the NVIDIA Run:ai UI.
The Event history table consists of the following columns:
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
The Event history table saves events for the last 90 days. However, the table itself presents up to the last 30 days of information due to the potentially very high number of operations that might be logged during this period.
To view older events, or to refine your search for more specific results or fewer results, use the time selector and change the period you search for. You can also refine your search by clicking and using ADD FILTER accordingly.
Go to the Audit log API reference to view the available actions. Since the amount of data is not trivial, the API is based on paging: it retrieves a specified number of items for each API call, and you can get more data by using subsequent calls.
Submissions of workloads are not audited. As a result, the system does not track or log details of workload submissions, such as timestamps or user activity.
The Node Level Scheduler optimizes the performance of your pods and maximizes the utilization of GPUs by making optimal local decisions on GPU allocation to your pods. While the NVIDIA Run:ai Scheduler chooses the specific node for a pod, it has no visibility to the node’s GPUs' internal state. The Node Level Scheduler is aware of the local GPUs' states and makes optimal local decisions such that it can optimize both the GPU utilization and pods’ performance running on the node’s GPUs.
This guide provides an overview of the best use cases for the Node Level Scheduler and instructions for configuring it to maximize GPU performance and pod efficiency.
While the Node Level Scheduler applies to all workloads, it best optimizes the performance of burstable workloads. Burstable workloads are workloads that use dynamic GPU fractions, giving them more GPU memory than requested, up to the specified limit.
Burstable workloads are always susceptible to an OOM Kill signal if the owner of the excess memory requires it back. This means that using the Node Level Scheduler with inference or training workloads may cause pod preemption.
Using interactive workloads with notebooks is the best use case for burstable workloads and Node Level Scheduler. These workloads behave differently since the OOM Kill signal will cause the notebooks' GPU process to exit but not the notebook itself. This keeps the interactive pod running and retrying to attach a GPU again.
This use case is one scenario that shows how Node Level Scheduler locally optimizes and maximizes GPU utilization and workspaces’ performance.
The following example describes a node with 2 GPUs and 2 submitted workspaces.
The Scheduler instructs the node to put the 2 workspaces on a single GPU, leaving the other GPU free for a workload that requires more resources. This means GPU#2 is idle while the two workspaces can only use up to half a GPU each, even if they temporarily need more.
With the Node Level Scheduler enabled, the local decision will be to spread those 2 workspaces on 2 GPUs and allow them to maximize both workspaces’ performance and GPUs’ utilization by bursting out up to the full GPU memory and GPU compute resources:
The NVIDIA Run:ai Scheduler still sees a node with one fully empty GPU and one fully occupied GPU. When a 3rd workload is scheduled and it requires a full GPU (or more than 0.5 GPU), the Scheduler schedules it to that node, and the Node Level Scheduler moves one of the workspaces back to share a GPU with the other workspace, as was the Scheduler's initial plan. Moving the workspace keeps it running: the GPU process within the Jupyter notebook is killed and re-established on the shared GPU, continuing to serve the workspace.
The Node Level Scheduler can be enabled per node pool. To use Node Level Scheduler, follow the below steps.
Enable the Node Level Scheduler at the cluster level (per cluster) by:
Editing the runaiconfig directly, or
Using a kubectl patch command (see the sketch below).
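A hedged sketch of both options follows. The exact configuration key for the Node Level Scheduler inside runaiconfig is an assumption; verify it against the advanced cluster configuration reference before applying.

```bash
# Option 1: edit the configuration directly
kubectl edit runaiconfig runai -n runai
#
# spec:
#   global:
#     core:
#       nodeScheduler:
#         enabled: true        # assumed key path
#
# Option 2: equivalent patch form (same assumption about the key path)
kubectl patch runaiconfig runai -n runai --type merge \
  -p '{"spec":{"global":{"core":{"nodeScheduler":{"enabled":true}}}}}'
```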
Enable Node Level Scheduler on any of the node pools:
Select Resources → Node pools
Edit an existing node pool or create a new one
Under the Resource Utilization Optimization tab, change the number of workloads on each GPU to any value other than Not Enforced (i.e. 2, 3, 4, 5)
The Node Level Scheduler is now ready to be used on that node pool.
In order for a workload to be considered by the Node Level Scheduler for rerouting, it must be submitted with a GPU request and limit, where the limit is larger than the request, by enabling and using dynamic GPU fractions.
This article explains how to designate specific node roles in a Kubernetes cluster to ensure optimal performance and reliability in production deployments.
For optimal performance in production clusters, it is essential to avoid extensive CPU usage on GPU nodes where possible. This can be done by ensuring the following:
NVIDIA Run:ai system-level services run on dedicated CPU-only nodes.
Workloads that do not request GPU resources (e.g. Machine Learning jobs) are executed on CPU-only nodes.
NVIDIA Run:ai services are scheduled on the defined node roles by applying node labels.
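A hedged example of applying node roles follows; the Administrator CLI subcommand and flags shown are assumptions based on typical NVIDIA Run:ai deployments, so verify them against the Administrator CLI reference before use.

```bash
runai-adm set node-role --runai-system-worker <system-node-name>   # dedicated CPU-only system node
runai-adm set node-role --gpu-worker <gpu-node-name>               # GPU workload node
runai-adm set node-role --cpu-worker <cpu-node-name>               # CPU-only workload node
```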
This document outlines how to back up and restore a NVIDIA Run:ai deployment, including both the NVIDIA Run:ai cluster and control plane.
Backing up and restoring the NVIDIA Run:ai configuration and data stored locally on the Kubernetes cluster is optional, and each can be backed up and restored separately. As a backup of this data is not strictly required, the backup procedure is optional and intended for advanced deployments.
This section explains the procedure to manage Access rules.
Access rules provide users, groups, or applications privileges to system entities. An access rule is the assignment of a role to a subject in a scope: <Subject> is a <Role> in a <Scope>. For example, user [email protected] is a department admin in department A.
The identity of the user inside a container determines its access to various resources. For example, network file systems often rely on this identity to control access to mounted volumes. As a result, propagating the correct user identity into a container is crucial for both functionality and security.
By default, containers in both Docker and Kubernetes run as the root user. This means any process inside the container has full administrative privileges, capable of modifying system files, installing packages, or changing configurations.
While this level of access provides researchers with maximum flexibility, it conflicts with modern enterprise security practices. If the container’s root identity is propagated to external systems (e.g., network-attached storage), it can result in elevated permissions outside the container, increasing the risk of security breaches.
To uninstall the NVIDIA Run:ai cluster, run the following command in your terminal:
To remove the NVIDIA Run:ai cluster from the NVIDIA Run:ai platform, use the Clusters grid in the NVIDIA Run:ai UI.
kubectl edit runaiconfig runai -n runai

runai-scheduler:
  args:
    verbosity: 6

runai-adm collect-logs

Single strategy supports both NVIDIA Run:ai and third-party workloads. Using mixed strategy can only be done using third-party workloads. For more details on NVIDIA Run:ai and third-party workloads, see Introduction to workloads.
Last updated - The last time the user was updated
Download table - Click MORE and then Click Download as CSV. Export to CSV is limited to 20,000 rows.
User Email
Temporary password to be used on first sign-in
Click DONE
Select a scope
Click SAVE RULE
Click CLOSE
Click CLOSE
User Email
Temporary password to be used on next sign-in
Click DONE
User - The unique identity of the user (email address)
Type - The type of the user - SSO / local
Last login - The timestamp for the last time the user signed in
Access rule(s) - The access rule assigned to the user
Created By - The user who created the user
Creation time - The timestamp for when the user was created

Status - The outcome of the logged operation. Possible values: Succeeded, Failed
Entity type - The type of the logged business object.
Entity name - The name of the logged business object.
Entity ID - The system's internal ID of the logged business object.
URL - The endpoint or address that was accessed during the logged event.
HTTP Method - The HTTP operation method used for the request. Possible values include standard HTTP methods such as GET, POST, PUT, DELETE, indicating what kind of action was performed on the specified URL.
Download table - Click MORE and then Click Download as CSV or Download as JSON
Subject - The name of the subject
Subject type - The user or application assigned with the role
Source IP - The IP address of the subject
Date & time - The exact timestamp at which the event occurred. Format dd/mm/yyyy for date and hh:mm am/pm for time.
Event - The type of the event. Possible values: Create, Update, Delete, Login
Event ID - Internal event ID, can be used for support purposes


Last updated - The last time the application was updated
helm uninstall runai-cluster -n runai

global.customCA.enabled
Enables the use of a custom Certificate Authority (CA) in your deployment. When set to true, the system is configured to trust a user-provided CA certificate for secure communication.
openShift.securityContextConstraints.create
Enables the deployment of Security Context Constraints (SCC). Disable for CIS compliance.
Default: true
controlPlane.existingSecret
Specifies the name of the existing Kubernetes secret where the cluster’s clientSecret used for secure connection with the control plane is stored.
controlPlane.secretKeys.clientSecret
Specifies the key within the controlPlane.existingSecret that stores the cluster’s clientSecret used for secure connection with the control plane.
global.image.registry (string)
Global Docker image registry
Default: ""
global.additionalImagePullSecrets (list)
List of image pull secrets references
Default: []
spec.researcherService.ingress.tlsSecret (string)
Existing secret key where cluster TLS certificates are stored (non-OpenShift)
Default: runai-cluster-domain-tls-secret
spec.researcherService.route.tlsSecret (string)
Existing secret key where cluster TLS certificates are stored (OpenShift only)
Default: ""
spec.prometheus.spec.image (string)
Due to a known issue in the Prometheus Helm chart, the imageRegistry setting is ignored. To pull the image from a different registry, you can manually specify the Prometheus image reference.
Default: quay.io/prometheus/prometheus
spec.prometheus.spec.imagePullSecrets (string)
List of image pull secrets references in the runai namespace to use for pulling Prometheus images (relevant for air-gapped installations).
Default: []
NVIDIA Run:ai allows you to enhance security and enforce organizational policies by:
Controlling root access and privilege escalation within containers
Propagating the user identity to align with enterprise access policies
NVIDIA Run:ai supports security-related workload configurations to control user permissions and restrict privilege escalation. These options are available via the API and CLI during workload creation, and map to standard Kubernetes pod security settings as shown in the sketch after this list:
runAsNonRoot / --run-as-user - Force the container to run as a non-root user.
allowPrivilegeEscalation / --allow-privilege-escalation - Allow the container to use setuid binaries to escalate privileges, even when running as a non-root user. This setting can increase security risk and should be disabled if elevated privileges are not required.
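For reference, a minimal sketch of the equivalent Kubernetes securityContext on a pod spec; the pod name, image, and UID below are illustrative and are not NVIDIA Run:ai defaults:

apiVersion: v1
kind: Pod
metadata:
  name: secure-workload              # illustrative name
spec:
  containers:
    - name: main
      image: python:3.11             # illustrative image
      securityContext:
        runAsNonRoot: true           # refuse to start if the container would run as root
        runAsUser: 1000              # illustrative non-root UID
        allowPrivilegeEscalation: false   # block setuid-based privilege escalation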
Administrators can enforce secure defaults across the environment using Policies, ensuring consistent workload behavior aligned with organizational security practices.
A best practice is to store the User Identifier (UID) and Group Identifier (GID) in the organization's directory. NVIDIA Run:ai allows you to pass these values to the container and use them as the container identity. To do this, you must set up single sign-on and complete the steps for UID/GID integration.
It is possible to explicitly pass user identity when creating an environment or submitting a workload:
From the image - Use the UID/GID defined in the container image.
From the IdP token - Use identity attributes provided by the SSO identity provider (available only in SSO-enabled installations).
Custom - Manually set the User ID (UID), Group ID (GID) and supplementary groups that can run commands in the container.
In OpenShift, Security Context Constraints (SCCs) manage pod-level security, including root access. By default, containers are assigned a random non-root UID, and flags such as --run-as-user and --allow-privilege-escalation are disabled.
On non-OpenShift Kubernetes clusters, similar enforcement can be achieved using tools like Gatekeeper, which applies system-level policies to restrict containers from running as root.
By default, OpenShift restricts setting specific user and group IDs (UIDs/GIDs) in workloads through its SCCs. To allow NVIDIA Run:ai workloads to run with explicitly defined UIDs and GIDs, a cluster administrator must modify the relevant SCCs.
To enable UID and GID assignment:
Edit the runai-user-job SCC:
Edit the runai-jupyter-notebook SCC (only required if using Jupyter environments):
In both SCC definitions, ensure the following sections are configured:
These settings allow NVIDIA Run:ai to pass specific UID and GID values into the container, enabling compatibility with identity-aware file systems and enterprise access controls.
When containers run as a specific user, the user must have a home directory defined within the image. Otherwise, starting a shell session will fail due to the absence of a home directory.
Since pre-creating a home directory for every possible user is impractical, NVIDIA Run:ai offers the createHomeDir / --create-home-dir option. When enabled, this flag creates a temporary home directory for the user inside the container at runtime. By default, the directory is created at /home/<username>.
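A hedged sketch of a CLI submission combining these options; the submit syntax, workload name, and image are illustrative and may differ between CLI versions:

runai submit my-workspace -i jupyter/base-notebook -g 1 \
  --run-as-user \
  --create-home-dir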
Click CONTINUE
helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod
helm repo update
helm search repo -l runai-backend

curl -H "Authorization: Bearer <token>" "https://runai.jfrog.io/artifactory/api/storage/runai-airgapped-prod/?list"
Port Forwarding - Simple port forwarding allows access to the container via local and/or remote port. Supported natively via Kubernetes.
NodePort - Exposes the service on each Node’s IP at a static port (the NodePort). You’ll be able to contact the NodePort service from outside the cluster by requesting <NODE-IP>:<NODE-PORT>, regardless of which node the container actually resides in. Supported.
LoadBalancer - Exposes the service externally using a cloud provider’s load balancer. Supported via API with limited capabilities.
Small - 30 / 480: 1, 1GB
Medium - 100 / 1600: 2, 2GB
Large - 500 / 8500: 2, 7GB

Small - 30 / 480: 1, 2GB
Medium - 100 / 1600: 2, 10GB
Large - 500 / 8500: 4, 24GB
global.customCA.enabled=true as described here

Note: Use the --dry-run flag to gain an understanding of what is being installed before the actual installation.

oc get routes -A

global.customCA.enabled=true as described here

helm search repo -l runai-backend

oc edit scc runai-user-job

oc edit scc runai-jupyter-notebook

runAsUser:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny

spec:
  prometheus:
    spec:
      logLevel: debug

kubectl logs -n runai prometheus-runai-0

helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend -n runai-backend runai-backend/control-plane --version "2.18.0" -f runai_control_plane_values.yaml --reset-then-reuse-values

helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend control-plane-2.18.0.tgz -n runai-backend -f runai_control_plane_values.yaml --reset-then-reuse-values

helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend -n runai-backend runai-backend/control-plane --version "<VERSION>" -f runai_control_plane_values.yaml --reset-then-reuse-values

helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend control-plane-<NEW-VERSION>.tgz -n runai-backend -f runai_control_plane_values.yaml --reset-then-reuse-values

helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend -n runai-backend runai-backend/control-plane --version "<VERSION>" -f runai_control_plane_values.yaml --reset-then-reuse-values

helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
helm upgrade runai-backend control-plane-<NEW-VERSION>.tgz -n runai-backend -f runai_control_plane_values.yaml --reset-then-reuse-values

curl -fsSL https://raw.githubusercontent.com/run-ai/public/main/installation/get-installation-logs.sh

https://<CLUSTER_URL>/project-name/workload-name
https://project-name-workload-name.<CLUSTER_URL>/

kubectl create secret tls runai-cluster-domain-star-tls-secret -n runai \
  --cert /path/to/fullchain.pem \ # Replace /path/to/fullchain.pem with the actual path to your TLS certificate
  --key /path/to/private.pem # Replace /path/to/private.pem with the actual path to your private key

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: runai-cluster-domain-star-ingress
  namespace: runai
spec:
  ingressClassName: nginx
  rules:
    - host: '*.<CLUSTER_URL>'
  tls:
    - hosts:
        - '*.<CLUSTER_URL>'
      secretName: runai-cluster-domain-star-tls-secret

kubectl apply -f <filename>

kubectl patch RunaiConfig runai -n runai --type="merge" \
  -p '{"spec":{"global":{"subdomainSupport": true}}}'

remoteWrite:
  queueConfig:
    capacity: 5000
    maxSamplesPerSend: 1000
    maxShards: 100

--set <service>.replicaCount=2

--set grafana.autoscaling.enabled=true \
--set grafana.autoscaling.minReplicas=2

--set thanos.query.resources.limits.memory=3G \
--set thanos.query.resources.requests.memory=3G \
--set thanos.query.resources.limits.cpu=1 \
--set thanos.query.resources.requests.cpu=1 \
--set thanos.receive.resources.limits.memory=15G \
--set thanos.receive.resources.requests.memory=15G \
--set thanos.receive.resources.limits.cpu=2 \
--set thanos.receive.resources.requests.cpu=2

spec:
  global:
    core:
      nodeScheduler:
        enabled: true

kubectl patch -n runai runaiconfigs.run.ai/runai --type='merge' --patch '{"spec":{"global":{"core":{"nodeScheduler":{"enabled": true}}}}}'

curl -H "Authorization: Bearer <token>" "https://runai.jfrog.io/artifactory/api/storage/runai-airgapped-prod/?list"

helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod
helm repo update
helm upgrade -i runai-backend -n runai-backend runai-backend/control-plane \
--set global.domain=<DOMAIN>

helm repo add runai-backend https://runai.jfrog.io/artifactory/cp-charts-prod
helm repo update
helm upgrade -i runai-backend -n runai-backend runai-backend/control-plane \
--set global.domain=runai.apps.<OPENSHIFT-CLUSTER-DOMAIN> \
--set global.config.kubernetesDistribution=openshift

helm upgrade -i runai-backend control-plane-<VERSION>.tgz \
--set global.domain=<DOMAIN> \
--set global.customCA.enabled=true \
-n runai-backend -f custom-env.yaml

helm upgrade -i runai-backend ./control-plane-<VERSION>.tgz -n runai-backend \
--set global.domain=runai.apps.<OPENSHIFT-CLUSTER-DOMAIN> \
--set global.config.kubernetesDistribution=openshift \
--set global.customCA.enabled=true \
-f custom-env.yaml

To allow your organization’s NVIDIA Run:ai users to interact with the cluster using the NVIDIA Run:ai Command-line interface, or access specific UI features, certain inbound ports need to be open:
NVIDIA Run:ai control plane - HTTPS entrypoint - from 0.0.0.0 to NVIDIA Run:ai system nodes, port 443
NVIDIA Run:ai cluster - HTTPS entrypoint - from 0.0.0.0 to NVIDIA Run:ai system nodes, port 443
For the NVIDIA Run:ai cluster installation and usage, certain outbound ports must be open:
Cluster sync - Sync NVIDIA Run:ai cluster with NVIDIA Run:ai control plane - from NVIDIA Run:ai cluster system nodes to NVIDIA Run:ai control plane FQDN, port 443
Metric store - Push NVIDIA Run:ai cluster metrics to NVIDIA Run:ai control plane's metric store - from NVIDIA Run:ai cluster system nodes to NVIDIA Run:ai control plane FQDN, port 443
The NVIDIA Run:ai installation depends on additional software components being installed on the cluster. This article includes optional, simple installation examples for these components, which require the following cluster outbound ports to be open:
Kubernetes Registry - Ingress Nginx image repository - from all Kubernetes nodes to registry.k8s.io, port 443
Google Container Registry - GPU Operator and Knative image repository - from all Kubernetes nodes to gcr.io, port 443
Ensure that all Kubernetes nodes can communicate with each other across all necessary ports. Kubernetes assumes full interconnectivity between nodes, so you must configure your network to allow this seamless communication. Specific port requirements may vary depending on your network setup.
To perform these tasks, make sure to install the NVIDIA Run:ai Administrator CLI.
The following node roles can be configured on the cluster:
System node: Reserved for NVIDIA Run:ai system-level services.
GPU Worker node: Dedicated for GPU-based workloads.
CPU Worker node: Used for CPU-only workloads.
NVIDIA Run:ai system nodes run the system-level services required for NVIDIA Run:ai to operate. Setting the system node role can be done via kubectl (recommended) or via the NVIDIA Run:ai Administrator CLI.
By default, NVIDIA Run:ai applies a node affinity rule to prefer nodes that are labeled with node-role.kubernetes.io/runai-system for system services scheduling. You can modify the default node affinity rule by:
Editing the spec.global.affinity configuration parameter as detailed in Advanced cluster configurations.
Editing the global.affinity configuration as detailed in Advanced control plane configurations for self-hosted deployments.
To set a system role for a node in your Kubernetes cluster using Kubectl, follow these steps:
Use the kubectl get nodes command to list all the nodes in your cluster and identify the name of the node you want to modify.
Run one of the following commands to label the node with its role:
To set a system role for a node in your Kubernetes cluster using the NVIDIA Run:ai Administrator CLI, follow these steps:
Run the kubectl get nodes command to list all the nodes in your cluster and identify the name of the node you want to modify.
Run one of the following commands to set or remove a node’s role:
The set node-role command will label the node and set relevant cluster configurations.
NVIDIA Run:ai worker nodes run user-submitted workloads and the system-level DaemonSets required to operate. This can be managed via kubectl (recommended) or via the NVIDIA Run:ai Administrator CLI.
By default, GPU workloads are scheduled on GPU nodes based on the nvidia.com/gpu.present label. When global.nodeAffinity.restrictScheduling is set to true via the Advanced cluster configurations:
GPU Workloads are scheduled with node affinity rule to require nodes that are labeled with node-role.kubernetes.io/runai-gpu-worker
CPU-only Workloads are scheduled with node affinity rule to require nodes that are labeled with node-role.kubernetes.io/runai-cpu-worker
To set a worker role for a node in your Kubernetes cluster using Kubectl, follow these steps:
Validate the global.nodeAffinity.restrictScheduling is set to true in the cluster’s Configurations.
Use the kubectl get nodes command to list all the nodes in your cluster and identify the name of the node you want to modify.
Run one of the following commands to label the node with its role. Replace the label and value (true/false) to enable or disable GPU/CPU roles as needed:
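A sketch of such commands, based on the label keys listed above (replace <node-name> with your node's name; set the value to true to enable the role or false to disable it):

kubectl label nodes <node-name> node-role.kubernetes.io/runai-gpu-worker=true
kubectl label nodes <node-name> node-role.kubernetes.io/runai-cpu-worker=true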
To set worker role for a node in your Kubernetes cluster via NVIDIA Run:ai Administrator CLI, follow these steps:
Use the kubectl get nodes command to list all the nodes in your cluster and identify the name of the node you want to modify.
Run one of the following commands to set or remove a node’s role. <node-role> must be either --gpu-worker or --cpu-worker:
The set node-role command will label the node and set cluster configuration global.nodeAffinity.restrictScheduling true.
To back up the NVIDIA Run:ai cluster configurations:
Run the following command in your terminal:
Once the runaiconfig_back.yaml backup file is created, save the file externally, so that it can be retrieved later.
In the event of a critical Kubernetes failure or alternatively, if you want to migrate a NVIDIA Run:ai cluster to a new Kubernetes environment, simply reinstall the NVIDIA Run:ai cluster. Once you have reinstalled and reconnected the cluster, projects, workloads and other cluster data are synced automatically. Follow the steps below to restore the NVIDIA Run:ai cluster on a new Kubernetes environment.
Before restoring the NVIDIA Run:ai cluster, it is essential to validate that it is both disconnected and uninstalled:
If the Kubernetes cluster is still available, uninstall the NVIDIA Run:ai cluster. Make sure not to remove the cluster from the control plane.
Navigate to the Clusters grid in the NVIDIA Run:ai UI
Locate the cluster and verify its status is Disconnected
Follow the NVIDIA Run:ai cluster installation instructions and ensure all prerequisites are met.
If you have a backup of the cluster configurations, reload it once the installation is complete:
Navigate to the Clusters grid in the NVIDIA Run:ai UI
Locate the cluster and verify its status is Connected
If your cluster configuration disables automatic namespace creation for projects, you must manually:
Re-create each project namespace
Reapply the required role bindings for access control
For more information, see Advanced cluster configurations.
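A hedged sketch of the manual re-creation, assuming the default runai-<project-name> namespace naming convention (adjust the namespace name and role bindings to match your environment):

kubectl create namespace runai-<project-name>
# Reapply the role bindings your organization uses for this project's namespace,
# for example from a previously backed-up manifest:
kubectl apply -f <project-rolebindings-backup>.yaml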
By default, NVIDIA Run:ai utilizes an internal PostgreSQL database to manage control plane data. This database resides on a Kubernetes Persistent Volume (PV). To safeguard against data loss, it's essential to implement a reliable backup strategy.
Consider the following methods to back up the PostgreSQL database:
PostgreSQL logical backup - Use pg_dump to create a logical backup of the database. Replace <password> with the appropriate PostgreSQL password. For example:
Persistent volume backup - Back up the entire PV that stores the PostgreSQL data.
Third-Party backup solutions - Integrate with external backup tools that support Kubernetes and PostgreSQL to automate and manage backups effectively.
NVIDIA Run:ai stores metrics history using Thanos. Thanos is configured to write data to a persistent volume (PV). To protect against data loss, it is recommended to regularly back up this volume.
The NVIDIA Run:ai control plane installation can be customized using --set flags during Helm deployment. These configuration overrides are preserved during upgrades but are not retained if Kubernetes is uninstalled or damaged. To ensure recovery, it's recommended to back up the full set of applied Helm customizations. You can retrieve the current configuration using:
Follow the steps below to restore the control plane including previously backed-up data and configurations:
Recreate the Kubernetes environment - Begin by provisioning a new Kubernetes or OpenShift cluster that meets all NVIDIA Run:ai installation requirements.
Restore Persistent Volumes - Recover the PVs and ensure these volumes are correctly reattached or restored from your backup solution:
PostgreSQL database - Stores control plane metadata
Thanos - Stores workload metrics and historical data
Reinstall the control plane - Install the NVIDIA Run:ai control plane on the newly created cluster. During installation:
Use the saved Helm configuration overrides to preserve custom settings
Connect the control plane to the recovered PostgreSQL volume
Reconnect Thanos to the restored metrics volume
The Access rules table provides a list of all the access rules defined in the platform and allows you to manage them.
The Access rules table consists of the following columns:
Type - The type of subject assigned to the access rule (user, SSO group, or application).
Subject - The user, SSO group, or application assigned with the role
Role - The role assigned to the subject
Scope - The scope to which the subject has access. Click the name of the scope to see the scope and its subordinates
Authorized by - The user who granted the access rule
Creation time - The timestamp for when the rule was created
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Download table - Click MORE and then Click Download as CSV. Export to CSV is limited to 20,000 rows.
To add a new access rule:
Click +NEW ACCESS RULE
Select a subject User, SSO Group, or Application
Select or enter the subject identifier:
User Email for a local user created in NVIDIA Run:ai or for SSO user as recognized by the IDP
Group name as recognized by the IDP
Application name as created in NVIDIA Run:ai
Select a role
Select a scope
Click SAVE RULE
Access rules cannot be edited. To change an access rule, you must delete the rule, and then create a new rule to replace it.
Select the access rule you want to delete
Click DELETE
On the dialog, click DELETE to confirm
To view the assigned roles and scopes you have access to:
Click the user avatar at the top right corner, then select Settings
Click User details
The list of assigned roles and scopes will be displayed.
Go to the Access rules API reference to view the available actions.
AI initiatives refer to advancing research, development, and implementation of AI technologies. These initiatives represent your business needs and involve collaboration between individuals, teams, and other stakeholders. AI initiatives require compute resources and a methodology to effectively and efficiently use those compute resources and split them among the different AI initiatives stakeholders. The building blocks of AI compute resources are GPUs, CPUs, and memory, which are built into nodes (servers) and can be further grouped into node pools. Nodes and node pools are part of a Kubernetes cluster.
To manage AI initiatives in NVIDIA Run:ai you should:
Map your organization and initiatives to projects and optionally departments
Map compute resources (node pools and quotas) to projects and optionally departments
Assign users (e.g. AI practitioners, ML engineers, Admins) to projects and departments
The way you map your AI initiatives and organization into NVIDIA Run:ai should reflect your organization’s structure and project management practices. There are multiple options; below are three examples of typical ways to map your organization, initiatives, and users into NVIDIA Run:ai, but other mappings that suit your requirements are also acceptable.
A typical use case would be students (individual practitioners) within a faculty (business unit) - an individual practitioner may be involved in one or more initiatives. In this example, the resources are accounted for by the student (project) and aggregated per faculty (department).
Department = business unit / Project = individual practitioner
A typical use case would be an AI service (business unit) split into AI capabilities (initiatives) - an individual practitioner may be involved in several initiatives. In this example, the resources are accounted for by Initiative (project) and aggregated per AI service (department).
Department = business unit / Project = initiative
A typical use case would be a business unit split into teams - an individual practitioner is involved in a single team (project) but the team may be involved in several AI initiatives. In this example, the resources are accounted for by team (project) and aggregated per business unit (department).
Department = business unit / Project = team
AI initiatives require compute resources such as GPUs and CPUs to run. Compute resources in any organization are limited, whether by the number of servers (nodes) the organization owns or by the budget it can spend on leasing cloud resources or purchasing in-house servers. Every organization strives to optimize the usage of its resources by maximizing utilization and providing all users with what they need. Therefore, the organization needs to split resources according to its internal priorities and budget constraints. But even after splitting the resources, the orchestration layer should still provide fairness between resource consumers and allow access to unused resources, to minimize scenarios of idle resources.
Another aspect of resource management is how to group your resources effectively, especially in large environments or environments made up of heterogeneous hardware types, where some users need specific hardware types, or where certain workloads should avoid occupying hardware that is critical to other users or initiatives.
NVIDIA Run:ai assists you with all of these complex issues by allowing you to map your cluster resources to node pools, map each project and department to a quota allocation per node pool, and set access rights to unused resources (over quota) per node pool.
There are several reasons why you would group resources (nodes) into node pools:
Control the GPU type to use in a heterogeneous hardware environment - in many cases, AI models are optimized for the specific hardware type they run on, e.g. a training workload optimized for an H100 does not necessarily run optimally on an A100, and vice versa. Segmenting into node pools, each with a different hardware type, gives the AI researcher and ML engineer better control over where to run.
Quota control - splitting into node pools allows the admin to set a specific quota per hardware type, e.g. give a high-priority project guaranteed access to advanced GPU hardware, while giving a lower-priority project a smaller quota, or even no quota at all, for that high-end GPU and only “best-effort” access to it (i.e. only when the high-priority project is not using those resources).
Multi-region or multi-availability-zone cloud environments - if some or all of your clusters run in the cloud (or even on-premises) and use different physical locations or different topologies (e.g. racks), you probably want to segment your resources per region/zone/topology. This lets you control where your workloads run and how much quota to assign to specific environments (per project, per department), even if all those locations use the same hardware type. This methodology can also help optimize workload performance thanks to the superior performance of local computing, such as the locality of distributed workloads, local storage, etc.
Set out below are illustrations of different grouping options.
Example: grouping nodes by topology
Example: grouping nodes by hardware type
After the initial grouping of resources, it is time to associate resources to AI initiatives. This is performed by assigning quotas to projects and, optionally, to departments. Assigning GPU quota to a project on a node pool basis means that the workloads submitted by that project are entitled to use those GPUs as guaranteed resources.
However, what happens if the project requires more resources than its quota? This depends on the type of workload the user wants to submit. If the user requires more resources for non-preemptible workloads, then the quota must be increased, because non-preemptible workloads require guaranteed resources. On the other hand, if the workload is preemptible, for example a model training workload, the project can exploit unused resources of other projects, as long as the other projects don’t need them. Over quota is set per project on a node-pool basis, and per department.
Administrators can use quota allocations to prioritize resources between users, teams, and AI initiatives. The administrator can completely prevent the use of certain node pools by a project or department by setting the node pool quota to 0 and disabling over quota for that node pool, or keep the quota at 0 and enable over quota for that node pool, allowing access based on resource availability only (e.g. unused GPUs). However, when a project with a non-zero quota needs to use those resources, the Scheduler reclaims those resources and preempts the preemptible workloads of over quota projects. As an administrator, you can also influence the amount of over quota resources a project or department uses.
It is essential to make sure that the sum of all projects' quotas does NOT surpass that of the department, and that the sum of all departments' quotas does not surpass the number of physical resources, per node pool and for the entire cluster (such behavior is called ‘over-subscription’). Over-subscription is not recommended because it may produce unexpected scheduling decisions, such as preempting ‘non-preemptible’ workloads or failing to schedule in-quota workloads, whether non-preemptible or preemptible, so quota can no longer be considered ‘guaranteed’. Admins can opt in to a system flag that helps prevent over-subscription scenarios.
Example: assigning resources to projects
The NVIDIA Run:ai system uses role-based access control (RBAC) to manage users’ access rights to the different objects of the system, its resources, and the set of allowed actions. To allow AI researchers, ML engineers, Project Admins, or any other stakeholder of your AI initiatives to access projects and use AI compute resources for their AI initiatives, the administrator needs to assign users to projects. After a user is assigned to a project with the proper role, e.g. ‘L1 Researcher’, the user can submit and monitor their workloads under that project. Assigning users to departments is usually done to assign a ‘Department Admin’ to manage a specific department. Other roles, such as ‘L1 Researcher’, can also be assigned to departments; this gives the researcher access to all projects within that department.
This is an example of an organization, as represented in the NVIDIA Run:ai platform:
The organizational tree is structured top-down under a single node headed by the account. The account is composed of clusters, departments, and projects.
After mapping and building your hierarchically structured organization as shown above, you can assign or associate various NVIDIA Run:ai components (e.g. workloads, roles, assets, policies, and more) to different parts of the organization - these organizational parts are the scopes. The following organizational example consists of 5 optional scopes:
Now that resources are grouped into node pools, organizational units or business initiatives are mapped into projects and departments, projects’ quota parameters are set per node pool, and users are assigned to projects, you can finally submit workloads from a project and use compute resources to run your AI initiatives.
Efficient resource allocation is critical for managing AI and compute-intensive workloads in Kubernetes clusters. The NVIDIA Run:ai Scheduler enhances Kubernetes' native capabilities by introducing advanced scheduling principles such as fairness, quota management, and dynamic resource balancing. It ensures that workloads, whether simple single-pod or complex distributed tasks, are allocated resources effectively while adhering to organizational policies and priorities.
This guide explores the NVIDIA Run:ai Scheduler’s allocation process, preemption mechanisms, and resource management. Through examples and detailed explanations, you'll gain insights into how the Scheduler dynamically balances workloads to optimize cluster utilization and maintain fairness across projects and departments.
When a workload is submitted, the workload controller creates a pod or pods (for distributed training workloads or deployment-based inference). When the Scheduler gets a submit request with the first pod, it creates a pod group and allocates all the relevant building blocks of that workload. The next pods of the same workload are attached to the same pod group.
A workload, with its associated pod group, is queued in the appropriate queue. In every scheduling cycle, the Scheduler ranks the order of queues by calculating their precedence for scheduling.
The next step is for the Scheduler to find nodes for those pods, assign the pods to their nodes (bind operation), and bind other building blocks of the pods such as storage, ingress, and so on. If the pod group has a gang scheduling rule attached to it, the Scheduler either allocates and binds all pods together, or puts all of them into pending state; it then retries to schedule them all together in the next scheduling cycle. The Scheduler also updates the status of the pods and their associated pod group. Users are able to track the workload submission process both in the CLI and the NVIDIA Run:ai UI.
If the Scheduler cannot find resources for the submitted workload (and all of its associated pods), and the workload deserves resources, either because it is under its queue quota or to preserve fairness between queues, the Scheduler first tries to reclaim resources from other queues. If this does not solve the resource issue, the Scheduler tries to preempt lower priority workloads within the same queue (project).
Reclaim is an inter-project and inter-department resource balancing action that takes back resources from a project or department that has used them as over quota. It returns the resources to a project (or department) that deserves them as part of its deserved quota, or balances fairness between projects (or departments) so that a project (or department) does not exceed its fairshare (its portion of the unused resources).
This mode of operation means that a lower priority workload submitted in one project (e.g. training) can reclaim resources from a project that runs a higher priority workload (e.g. a preemptible workspace) if fairness balancing is required.
Higher priority workloads may preempt lower priority preemptible workloads within the same project/node pool queue. For example, in a project that runs a training workload that exceeds the project quota for a certain node pool, a newly submitted workspace within the same project/node pool may stop (preempt) the training workload if there are not enough over quota resources for the project within that node pool to run both workloads (e.g. the workspace using in-quota resources and the training using over quota resources).
The NVIDIA Run:ai Scheduler strives to ensure fairness between projects and between departments. This means the Scheduler always strives to give each department and project its deserved quota, and to split unused resources between projects according to known rules (e.g. over quota weights).
If a project needs more resources even beyond its fairshare, and the Scheduler finds unused resources that no other project needs, this project can consume resources even beyond its fairshare.
Some scenarios can prevent the Scheduler from fully providing deserved quota and fairness:
Fragmentation or other scheduling constraints such as affinities, taints, etc.
Some requested resources, such as GPUs and CPU memory, can be allocated, while others, like CPU cores, are insufficient to meet the request. As a result, the Scheduler will place the workload in a pending state until the required resource becomes available.
The example below illustrates a split of quota between different projects and departments using several node pools:
The example below illustrates how fairshare is calculated per project/node pool for the above example:
For each Project:
The over quota (OQ) portion of each project (per node pool) is calculated as:
[(OQ-Weight) / (Σ Projects OQ-Weights)] x (Unused Resource per node pool)
Fairshare is calculated as the sum of quota + over quota.
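As an illustrative example with hypothetical numbers: if a node pool has 10 unused GPUs and two projects compete for them with over quota weights of 2 (Project A) and 1 (Project B), Project A's over quota portion is (2 / 3) x 10 ≈ 6.7 GPUs and Project B's is about 3.3 GPUs. If Project A's deserved quota in that node pool is 4 GPUs, its fairshare is approximately 4 + 6.7 = 10.7 GPUs.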
The Scheduler constantly re-calculates the fairshare of each project and department per node pool, represented in the scheduler as queues, resulting in the re-balancing of resources between projects and between departments. This means that a preemptible workload that was granted resources to run in one scheduling cycle, can find itself preempted and go back to pending state while waiting for resources in the next cycle.
A queue, representing a scheduler-managed object for each project or department per node pool, can be in one of 3 states:
In-quota: The queue’s allocated resources ≤ queue deserved quota. The Scheduler’s first priority is to ensure each queue receives its deserved quota.
Over quota but below fairshare: The queue’s deserved quota < queue’s allocated resources <= queue’s fairshare. The Scheduler tries to find and allocate more resources to queues that need resources beyond their deserved quota and up to their fairshare.
Over-fairshare and over quota: The queue’s fairshare < queue’s allocated resources. The Scheduler tries to allocate resources to queues that need even more resources beyond their fairshare.
When re-balancing resources between queues of different projects and departments, the Scheduler goes in the opposite direction, i.e. it first takes resources from over-fairshare queues, then from over quota queues, and finally, in some scenarios, even from queues that are below their deserved quota.
Now that you have gained insights into how the Scheduler dynamically balances workloads to optimize cluster utilization and maintain fairness across projects and departments, you are ready to submit workloads. Before submitting your workloads, it’s important to familiarize yourself with the following key topics:
Introduction to workloads - Learn what workloads are and what is supported for both NVIDIA Run:ai and third-party workloads.
Workload types - Explore the various NVIDIA Run:ai workload types available and understand their specific purposes, so you can choose the most appropriate workload type for your needs.
In NVIDIA Run:ai, administrators can access a suite of tools designed to facilitate efficient account management. This article focuses on two key features: workload policies and workload scheduling rules. These features empower admins to establish default values and implement restrictions, allowing enhanced control, ensuring compatibility with organizational policies, and optimizing resource usage and utilization.
A workload policy is an end-to-end solution for AI managers and administrators to control and simplify how workloads are submitted. This solution allows them to set best practices, enforce limitations, and standardize processes for the submission of workloads for AI projects within their organization. It acts as a key guideline for data scientists, researchers, ML & MLOps engineers by standardizing submission practices and simplifying the workload submission process.
Implementing workload policies is essential when managing complex AI projects within an enterprise for several reasons:
Resource control and management - Defining or limiting the use of costly resources across the enterprise via a centralized management system to ensure efficient allocation and prevent overuse.
Setting best practices - Provide managers with the ability to establish guidelines and standards to follow, reducing errors amongst AI practitioners within the organization.
Security and compliance - Define and enforce permitted and restricted actions to uphold organizational security and meet compliance requirements.
The following sections provide details of how the workload policy mechanism works.
The policy enforces the workloads regardless of whether they were submitted via UI, CLI, Rest APIs, or Kubernetes YAMLs.
NVIDIA Run:ai’s policies enforce NVIDIA Run:ai workloads. Policies are defined per workload type, which allows administrators to set different policies for each workload type.
A policy consists of rules for limiting and controlling the values of fields of the workload. In addition to rules, some defaults allow the implementation of default values to different workload fields. These default values are not rules, as they simply suggest values that can be overridden during the workload submission.
Furthermore, policies allow the enforcement of workload assets. For example, as an admin, you can impose a data source of type PVC to be used by any workload submitted.
For more information, see the policies documentation.
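To make the distinction between rules and defaults concrete, here is a purely illustrative sketch of the idea; the field names below are hypothetical and do not reflect the exact NVIDIA Run:ai policy schema:

defaults:
  environment:
    image: python:3.11        # a suggested value; users may override it at submission
rules:
  compute:
    gpuRequest:
      max: 4                  # a hard limit; submissions requesting more are rejected by the policy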
Numerous teams working on various projects require the use of different tools, requirements, and safeguards. One policy may not suit all teams and their requirements. Hence, administrators can select the scope to cover the effectiveness of the policy. When a scope is selected, all of its subordinate units are also affected. As a result, all workloads submitted within the selected scope are controlled by the policy.
For example, if a policy is set for Department A, all workloads submitted by any of the projects within this department are controlled.
A scope for a policy can be:
The different scoping of policies also allows the breakdown of responsibility between different administrators. This allows delegation of ownership between different levels within the organization. The policies, containing rules and defaults, propagate down the organizational tree, forming an “effective” policy that enforces any workload submitted by users within the project.
If a field is used by multiple policies at different scopes, the platform applies a reconciliation mechanism to determine which policy takes effect. Defaults of the same field can still be submitted by different organizational policies, as they are considered “soft” rules. In this case, the closest scope to the workload becomes the effective default (project default “wins” vs. department default, department default “wins” vs. cluster default, etc.). For rules, precedence depends on their type: simple rules on non-security and non-compute fields follow the same order as defaults (project > department > cluster), while strict rules on security and compute fields apply in reverse order (cluster > department > project).
Scheduling rules limit a researcher's access to resources and provide a way for the admin to control resource allocation and prevent the waste of resources. Admins should use these rules to prevent GPU idleness, prevent GPU hogging, and allocate specific types of resources to different types of workloads.
Admins can limit the duration of a workload, the duration of its idle time, or the type of nodes the workload can use. Rules are defined for projects or departments and apply to all workloads in the project or department. In addition, rules can be applied to a specific type of workload in a project or department (workspace, standard training, or inference). When a workload reaches the limit of a time-based rule, it is stopped. The node type rule prevents the workload from being scheduled on nodes that violate the rule limitation.
This section provides detailed instructions on how to manage both planned and unplanned node downtimes in a Kubernetes cluster running NVIDIA Run:ai. It covers all the steps to maintain service continuity and ensure the proper handling of workloads during these events.
Access to Kubernetes cluster - Administrative access to the Kubernetes cluster, including permissions to run kubectl commands
Basic knowledge of Kubernetes - Familiarity with Kubernetes concepts such as nodes, taints, and workloads
NVIDIA Run:ai installation - The NVIDIA Run:ai cluster installed and configured within your Kubernetes cluster
Node naming conventions - Know the names of the nodes within your cluster, as these are required when executing the commands
This section distinguishes between two types of nodes within a NVIDIA Run:ai installation:
Worker nodes - Nodes on which AI practitioners can submit and run workloads
NVIDIA Run:ai system nodes - Nodes on which the NVIDIA Run:ai software runs, managing the cluster's operations
Worker nodes are responsible for running workloads. When a worker node goes down, either due to planned maintenance or unexpected failure, workloads ideally migrate to other available nodes or wait in the queue to be executed when possible.
The following workload types can run on worker nodes:
Training workloads - These are long-running processes that, in case of node downtime, can automatically move to another node.
Interactive workloads - These are short-lived, interactive processes that require manual intervention to be relocated to another node.
Before stopping a worker node for maintenance, perform the following steps:
Prevent new workloads on the node
To stop the Kubernetes Scheduler from assigning new workloads to the node and to safely remove all existing workloads, copy the following command to your terminal:
<node-name>
Replace this placeholder with the actual name of the node you want to drain
kubectl taint nodes
This command is used to add a taint to the node, which prevents any new pods from being scheduled on it
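A hedged sketch of commands that accomplish this; the taint key, value, and effect are illustrative, so adjust them to your environment's conventions:

kubectl taint nodes <node-name> maintenance=true:NoSchedule    # prevent new pods from being scheduled on the node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data    # safely evict existing workloads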
In the event of unplanned downtime:
Automatic restart - If a node fails but immediately restarts, all services and workloads automatically resume.
Extended downtime
If the node remains down for an extended period, drain the node to migrate workloads to other nodes. Copy the following command to your terminal:
The command works the same as in the planned maintenance section, ensuring that no workloads remain scheduled on the node while it is down.
Reintegrate the node
In a production environment, the services responsible for scheduling, submitting and managing NVIDIA Run:ai workloads operate on one or more NVIDIA Run:ai system nodes. It is recommended to have more than one system node to ensure high availability. If one system node goes down, another can take over, maintaining continuity. If a second system node does not exist, you must designate another node in the cluster as a temporary NVIDIA Run:ai system node to maintain operations.
The protocols for handling planned maintenance and unplanned downtime are identical to those for worker nodes. Refer to the above section for detailed instructions.
To rejoin a node to the Kubernetes cluster, follow these steps:
Generate a join command on the master node
On the master node, copy the following command to your terminal:
kubeadm token create
This command generates a token that can be used to join a node to the Kubernetes cluster.
--print-join-command
This option outputs the full command that needs to be run on the worker node to rejoin it to the cluster.
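A sketch of the flow, assuming a kubeadm-based cluster; the join command output shown as a comment is illustrative:

# On the master (control plane) node:
kubeadm token create --print-join-command

# Illustrative output, to be run on the node being reintegrated:
# kubeadm join <CONTROL-PLANE-IP>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>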
The workload priority management feature allows you to change the priority of a workload within a project. The priority determines the workload's position in the project scheduling queue managed by the NVIDIA Run:ai Scheduler. By adjusting the priority, you can increase the likelihood that a workload will be scheduled and preferred over others within the same project, ensuring that critical tasks are given higher priority and resources are allocated efficiently.
You can change the priority of a workload by selecting one of the predefined values from the NVIDIA Run:ai priority dictionary. This can be done using the NVIDIA Run:ai UI, API or CLI, depending on the workload type.
Workload priority is defined by selecting a string name from a predefined list in the NVIDIA Run:ai priority dictionary. Each string corresponds to a specific Kubernetes PriorityClass, which in turn determines scheduling behavior, such as whether the workload is preemptible or allowed to run over quota.
Non-preemptible workloads must run within the project’s deserved quota, cannot use over-quota resources, and will not be interrupted once scheduled.
Preemptible workloads can use opportunistic compute resources beyond the project’s quota but may be interrupted at any time.
Both NVIDIA Run:ai and third-party workloads are assigned a default priority. The below table shows the default priority per workload type:
The below table shows the default priority listed in the previous section and the supported override options per workload:
You can override the default priority when submitting a workload through the UI, API, or CLI depending on the workload type.
To use the override options:
UI: Enable "Allow the workload to exceed the project quota" when submitting the workload
API: Set PriorityClass when creating the workload via the API
CLI: Use the --priority flag
To use the override options:
API: Set PriorityClass when creating the workload via the API
CLI: Use the --priority flag
This article explains the procedure to configure and manage scheduling rules.
Scheduling rules are restrictions applied to workloads. These restrictions apply to either the resources (nodes) on which workloads can run or the duration of the run time. Scheduling rules are set for Projects or Departments and apply to specific workload types. Once scheduling rules are set for a project or department, all matching workloads associated with the project have the restrictions applied to them, as defined, when the workload was submitted. New scheduling rules added to a project are not applied over previously created workloads associated with that project.
There are three types of scheduling rules:
This rule limits the duration of a workload's run time. Workload run time is calculated as the total time in which the workload was in status Running. You can apply a single rule per workload type - preemptible Workspaces, non-preemptible Workspaces, and Training.
This rule limits the total GPU idle time of a workload. Workload idle time is counted from the first time the workload is in status Running and the GPU is idle. Idleness is calculated using the runai_gpu_idle_seconds_per_workload metric. This metric determines the total duration of zero GPU utilization within each 30-second interval. If the GPU remains idle throughout the 30-second window, 30 seconds are added to the idleness sum; otherwise, the idleness count is reset. You can apply a single rule per workload type - preemptible Workspaces, non-preemptible Workspaces, and Training.
Node type is used to select a group of nodes, typically with specific characteristics such as a hardware feature, storage type, fast networking interconnection, etc. The Scheduler uses node type as an indication of which nodes should be used for your workloads within this project.
Node type is a label in the form of run.ai/type and a value (e.g. run.ai/type = dgx200) that the administrator uses to tag a set of nodes. Adding the node type to the project’s scheduling rules mandates the user to submit workloads with a node type label/value pairs from this list, according to the workload type - Workspace or Training. The Scheduler then schedules workloads using a node selector, targeting nodes tagged with the NVIDIA Run:ai node type label/value pair. Node pools and a node type can be used in conjunction. For example, specifying a node pool and a smaller group of nodes from that node pool that includes a fast SSD memory or other unique characteristics.
The administrator should use a node label with the key run.ai/type and any coupled value.
To assign a label to nodes you want to group, set the ‘node type (affinity)’ on each relevant node:
Obtain the list of nodes and their current labels by copying the following to your terminal:
Annotate a specific node with a new label by copying the following to your terminal:
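A sketch of these commands, using the run.ai/type key described above and an illustrative value:

kubectl get nodes --show-labels

kubectl label node <node-name> run.ai/type=dgx200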
To add a scheduling rule:
Select the project/department for which you want to add a scheduling rule
Click EDIT
In the Scheduling rules section click +RULE
Select the rule type
To edit a scheduling rule:
Select the project/department for which you want to edit its scheduling rule
Click EDIT
Find the scheduling rule you would like to edit
Edit the rule
To delete a scheduling rule:
Select the project/department from which you want to delete a scheduling rule
Click EDIT
Find the scheduling rule you would like to delete
Click on the x icon
Go to the API reference to view the available actions
In the world of machine learning (ML), the journey from raw data to actionable insights is a complex process that spans multiple stages. Each stage of the AI lifecycle requires different tools, resources, and frameworks to ensure optimal performance. NVIDIA Run:ai simplifies this process by offering specialized workload types tailored to each phase, facilitating a smooth transition across various stages of the ML workflows.
The ML lifecycle usually begins with the experimental work on data and exploration of different modeling techniques to identify the best approach for accurate predictions. At this stage, resource consumption is usually moderate as experimentation is done on a smaller scale. As confidence grows in the model's potential and its accuracy, the demand for compute resources increases. This is especially true during the training phase, where vast amounts of data need to be processed, particularly with complex models such as large language models (LLMs), with their huge parameter sizes, that often require distributed training across multiple GPUs to handle the intensive computational load.
Finally, once the model is ready, it moves to the inference stage, where it is deployed to make predictions on new, unseen data. NVIDIA Run:ai's workload types are designed to correspond with the natural stages of this lifecycle. They are structured to align with the specific resource and framework requirements of each phase, ensuring that AI researchers and data scientists can focus on advancing their models without worrying about infrastructure management.
NVIDIA Run:ai offers three workload types that correspond to a specific phase of the researcher’s work:
Workspaces – For experimentation with data and models.
Training – For resource-intensive tasks such as model training and data preparation.
Inference – For deploying and serving the trained model.
The Workspace is where data scientists conduct initial research, experiment with different data sets, and test various algorithms. This is the most flexible stage in the ML lifecycle, where models and data are explored, tuned, and refined. The value of workspaces lies in the flexibility they offer, allowing the researcher to iterate quickly without being constrained by rigid infrastructure.
Framework flexibility
Workspaces support a variety of machine learning frameworks, as researchers need to experiment with different tools and methods.
Resource requirements
Workspaces are often lighter on resources compared to the training phase, but they still require significant computational power for data processing, analysis, and model iteration.
Hence, the default for NVIDIA Run:ai workspaces is to schedule those workloads without the ability to preempt them once resources have been allocated. However, this non-preemptible state does not allow utilizing resources beyond the project’s deserved quota.
See the workspaces documentation to learn more about how to submit a workspace via the NVIDIA Run:ai platform, and the quick start guides for worked examples.
As models mature and the need for more robust data processing and model training increases, NVIDIA Run:ai facilitates this shift through the Training workload. This phase is resource-intensive, often requiring distributed computing and high-performance clusters to process vast data sets and train models.
Training architecture
For training workloads NVIDIA Run:ai allows you to specify the architecture - standard or distributed. The distributed architecture is relevant for larger data sets and more complex models that require utilizing multiple nodes. For the distributed architecture, NVIDIA Run:ai allows you to specify different configurations for the master and workers and select which framework to use - PyTorch, XGBoost, MPI, TensorFlow and JAX. In addition, as part of the distributed configuration, NVIDIA Run:ai enables the researchers to schedule their distributed workloads on nodes within the same region, zone, placement group, or any other topology.
Resource requirements
Training tasks demand high memory, compute power, and storage. NVIDIA Run:ai ensures that the allocated resources match the scale of the task and allows those workloads to utilize more compute resources than the project’s deserved quota. If you do not want your training workload to be preempted, make sure to specify a number of GPUs that is within your project’s quota.
See and to learn more about how to submit a training workload via the NVIDIA Run:ai UI. For quick starts, see and .
Once a model is trained and validated, it moves to the Inference stage, where it is deployed to make predictions (usually in a production environment). This phase is all about efficiency and responsiveness, as the model needs to serve real-time or batch predictions to end-users or other systems.
Inference-specific use cases
Naturally, inference workloads must change and adapt to ever-changing demand in order to meet SLAs. For example, additional replicas may be deployed, manually or automatically, to increase compute resources as part of a horizontal scaling approach, or a new version of the deployment may need to be rolled out without affecting the running services.
Resource requirements
Inference models differ in size and purpose, leading to varying computational requirements. For example, small OCR models can run efficiently on CPUs, whereas LLMs typically require significant GPU memory for deployment and serving. Inference workloads are considered production-critical and are given the highest priority to ensure compliance with SLAs. Additionally, NVIDIA Run:ai ensures that inference workloads cannot be preempted, maintaining consistent performance and reliability.
See to learn more about how to submit an inference workload via the NVIDIA Run:ai UI. For a quick start, see .
Single Sign-On (SSO) is an authentication scheme that allows users to log in with a single pair of credentials to multiple, independent software systems.
This article explains the procedure to configure SSO to NVIDIA Run:ai using the OpenID Connect protocol.
Before you start, make sure you have the following available from your identity provider:
This guide provides actionable best practices for administrators to securely configure, operate, and manage NVIDIA Run:ai environments. Each section highlights both platform-native features and mapped Kubernetes security practices to maintain robust protection for workloads and resources.
This section explains the procedure of managing reports in NVIDIA Run:ai.
Reports allow users to access and organize large amounts of data in a clear, CSV-formatted layout. They enable users to monitor resource consumption, analyze trends, and make data-driven decisions to optimize their AI workloads effectively.
kubectl label nodes <node-name> node-role.kubernetes.io/runai-system=true
kubectl label nodes <node-name> node-role.kubernetes.io/runai-system=false
runai-adm set node-role --runai-system-worker <node-name>
runai-adm remove node-role --runai-system-worker <node-name>
runai-adm set node-role <node-role> <node-name>
runai-adm remove node-role <node-role> <node-name>
kubectl get runaiconfig runai -n runai -o yaml -o=jsonpath='{.spec}' > runaiconfig_backup.yaml
kubectl apply -f runaiconfig_backup.yaml -n runai
kubectl -n runai-backend exec -it runai-backend-postgresql-0 -- \
  env PGPASSWORD=<password> pg_dump -U postgres backend > cluster_name_db_backup.sql
helm get values runai-backend -n runai-backend

| Name | Purpose | Source | Destination | Port |
|---|---|---|---|---|
| Container Registry | Pull NVIDIA Run:ai images | All Kubernetes nodes | runai.jfrog.io | 443 |
| Hugging Face | Browse Hugging Face models | NVIDIA Run:ai control plane system nodes | huggingface.co | 443 |
| Helm repository | NVIDIA Run:ai Helm repository for installation | Installer machine | runai.jfrog.io | 443 |
| Red Hat Container Registry | Prometheus Operator image repository | All Kubernetes nodes | quay.io | 443 |
| Docker Hub Registry | Training Operator image repository | All Kubernetes nodes | docker.io | 443 |
Explainability and predictability - Large environments are complex to understand, and even more so when they are heavily loaded. Segmenting your cluster into smaller node pools helps maintain users’ satisfaction and their understanding of the resource state, and keeps the likelihood of your workloads being scheduled predictable.
Scale - NVIDIA Run:ai’s implementation of node pools has many benefits, one of the main ones being scale. Each node pool has its own Scheduler instance, allowing the cluster to handle more nodes and schedule workloads faster when segmented into node pools versus one large cluster. To allow your workloads to use any resource within a cluster that is split into node pools, a second-level Scheduler is in charge of scheduling workloads between node pools according to your preferences and resource availability.
Prevent mutual exclusion - Some AI workloads consume CPU-only resources. To prevent those workloads from consuming the CPU resources of GPU nodes and blocking GPU workloads from using those nodes, it is recommended to group CPU-only nodes into one or more dedicated node pools, assign quota for CPU projects to CPU node pools only, and keep GPU node pools at zero quota (optionally with “best-effort” over-quota access) for CPU-only projects.
In Project 2, we assume that out of the 36 available GPUs in node pool A, 20 GPUs are currently unused. This means either these GPUs are not part of any project’s quota, or they are part of a project’s quota but not used by any workloads of that project:
Project 2 over quota share:
[(Project 2 OQ-Weight) / (Σ all Projects OQ-Weights)] x (Unused Resource within node pool A)
[(3) / (2 + 3 + 1)] x (20) = (3/6) x 20 = 10 GPUs
Fairshare = deserved quota + over quota = 6 +10 = 16 GPUs. Similarly, fairshare is also calculated for CPU and CPU memory. The Scheduler can grant a project more resources than its fairshare if the Scheduler finds resources not required by other projects that may deserve those resources.
In Project 3, fairshare = deserved quota + over quota = 0 +3 = 3 GPUs. Project 3 has no guaranteed quota, but it still has a share of the excess resources in node pool A. The NVIDIA Run:ai Scheduler ensures that Project 3 receives its part of the unused resources for over quota, even if this results in reclaiming resources from other projects and preempting preemptible workloads.
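The same calculation can be written compactly as a general formula; this simply restates the worked example above, where \(w_p\) denotes a project's over-quota weight:

```latex
\text{OverQuota}_p = \frac{w_p}{\sum_{q \in \text{projects}} w_q} \times \text{UnusedResources}_{\text{node pool}}
\qquad
\text{Fairshare}_p = \text{DeservedQuota}_p + \text{OverQuota}_p
```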



Last updated
The last time the access rule was updated

Scalability and diversity
Multi-purpose clusters with various workload types that may have different requirements and characteristics for resource usage.
The organization has multiple hierarchies, each with distinct goals, objectives, and degrees of flexibility.
Manage multiple users and projects with distinct requirements and methods, ensuring appropriate utilization of resources.
Workspace
Workspace
Interactive workload
Training: Standard
Training: Standard
Training workload
Training: Distributed
Training: Distributed
Distributed workload
Inference
Inference
Inference workload
Select the workload type and time limitation period
For Node type, choose one or more labels for the desired nodes
Click SAVE
Click SAVE
kubectl taint nodes <node-name> runai=drain:NoExecute
This specific taint ensures that all existing pods on the node are evicted and rescheduled on other available nodes, if possible
Result: The node stops accepting new workloads, and existing workloads either migrate to other nodes or are placed in a queue for later execution.
Shut down and perform maintenance
After draining the node, you can safely shut it down and perform the necessary maintenance tasks.
Restart the node
Once maintenance is complete and the node is back online, remove the taint to allow the node to resume normal operations. Copy the following command to your terminal:
kubectl taint nodes <node-name> runai=drain:NoExecute-
The - at the end of the command indicates the removal of the taint. This allows the node to start accepting new workloads again.
Result: The node rejoins the cluster's pool of available resources, and workloads can be scheduled on it as usual.
Result: This action reintegrates the node into the cluster, allowing it to accept new workloads.
Permanent shutdown
If the node is to be permanently decommissioned, remove it from Kubernetes with the following command:
kubectl delete node
This command completely removes the node from the cluster
<node-name>
Replace this placeholder with the actual name of the node
Result: The node is no longer part of the Kubernetes cluster. If you plan to bring the node back later, it must be rejoined to the cluster using the steps outlined in the next section.
Generate the kubeadm join command (using kubeadm token create --print-join-command), then run the join command on the worker node
Copy the kubeadm join command generated from the previous step and run it on the worker node that needs to rejoin the cluster.
The kubeadm join command re-enrolls the node into the cluster, allowing it to start participating in the cluster's workload scheduling.
Verify node rejoining
Verify that the node has successfully rejoined the cluster by running:
kubectl get nodes
This command lists all nodes currently part of the Kubernetes cluster, along with their status
Result: The rejoined node should appear in the list with a status of Ready
Re-label nodes
Once the node is ready, ensure it is labeled according to its role within the cluster.
kubectl label nodes <node-name> node-role.kubernetes.io/runai-gpu-worker=true
kubectl label nodes <node-name> node-role.kubernetes.io/runai-cpu-worker=false
kubectl get nodes --show-labels
kubectl label node <node-name> run.ai/type=<value>
kubectl taint nodes <node-name> runai=drain:NoExecute-
kubectl delete node <node-name>
kubeadm join <master-ip>:<master-port> --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
kubectl get nodes
kubectl taint nodes <node-name> runai=drain:NoExecute
kubectl taint nodes <node-name> runai=drain:NoExecute
kubeadm token create --print-join-command
kubectl taint nodes <node-name> runai=drain:NoExecute-
4
train
Preemptible
Available
1
inference
Non-preemptible
Not available
2
build
Non-preemptible
Not available
3
interactive-preemptible
Preemptible
build
train
inference
train
inference
Available
ClientID - The ID used to identify the client with the Authorization Server.
Client Secret - A secret password that only the Client and Authorization server know.
Optional: Scopes - A set of user attributes to be used during authentication to authorize access to a user's details.
Go to General settings
Open the Security section and click +IDENTITY PROVIDER
Select Custom OpenID Connect
Enter the Discovery URL, Client ID, and Client Secret
Copy the Redirect URL to be used in your identity provider
Optional: Add the OIDC scopes
Optional: Enter the user attributes and their value in the identity provider as shown in the below table
Click SAVE
Optional: Enable Auto-Redirect to SSO to automatically redirect users to your configured identity provider’s login page when accessing the platform.
User role groups
GROUPS
If it exists in the IDP, it allows you to assign NVIDIA Run:ai role groups via the IDP. The IDP attribute must be a list of strings or an object where the group names are the values.
Linux User ID
UID
If it exists in the IDP, it allows Researcher containers to start with the Linux User UID. Used to map access to network resources such as file systems to users. The IDP attribute must be of type integer.
Linux Group ID
GID
If it exists in the IDP, it allows Researcher containers to start with the Linux Group GID. The IDP attribute must be of type integer.
Supplementary Groups
SUPPLEMENTARYGROUPS
Log in to the NVIDIA Run:ai platform as an admin
Add access rules to an SSO user defined in the IDP
Open the NVIDIA Run:ai platform in an incognito browser tab
On the sign-in page, click CONTINUE WITH SSO. You are redirected to the identity provider sign-in page
In the identity provider sign-in page, log in with the SSO user who you granted with access rules
If you are unsuccessful signing in to the identity provider, follow the Troubleshooting section below
You can view the identity provider details and edit its configuration:
Go to General settings
Open the Security section
On the identity provider box, click Edit identity provider
You can edit either the Discovery URL, Client ID, Client Secret, OIDC scopes, or the User attributes
You can remove the identity provider configuration:
Go to General settings
Open the Security section
On the identity provider card, click Remove identity provider
In the dialog, click REMOVE to confirm the action
If testing the setup was unsuccessful, try the different troubleshooting scenarios according to the error you received.
If you have opted to use an external PostgreSQL database, you need to perform initial setup to ensure successful installation. Follow these steps:
Create a SQL script file, edit the parameters below, and save it locally:
Replace <DATABASE_NAME> with a dedicate database name for NVIDIA Run:ai in your PostgreSQL database.
Replace <ROLE_NAME> with a dedicated role name (user) for NVIDIA Run:ai database.
Replace <ROLE_PASSWORD> with a password for the new PostgreSQL role.
Replace <GRAFANA_PASSWORD> with the password to be set for Grafana integration.
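A minimal sketch of such a script, using the placeholders above (this is an assumption-based illustration, not the official script; the exact privileges your installation requires and the Grafana role name may differ):

```sql
-- init_runai_db.sql (hypothetical file name)
CREATE DATABASE <DATABASE_NAME>;
CREATE ROLE <ROLE_NAME> WITH LOGIN PASSWORD '<ROLE_PASSWORD>';
GRANT ALL PRIVILEGES ON DATABASE <DATABASE_NAME> TO <ROLE_NAME>;

-- Read-only role for the Grafana integration (role name "grafana" is an assumption)
CREATE ROLE grafana WITH LOGIN PASSWORD '<GRAFANA_PASSWORD>';
GRANT CONNECT ON DATABASE <DATABASE_NAME> TO grafana;
```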
Run the following command on a machine where the PostgreSQL client (psql) is installed:
Replace <POSTGRESQL_HOST> with the PostgreSQL ip address or hostname.
Replace <POSTGRESQL_USER> with the PostgreSQL username.
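A typical invocation is sketched below; the script file name refers to the hypothetical example above, and the port and maintenance database may differ in your environment:

```bash
psql -h <POSTGRESQL_HOST> -U <POSTGRESQL_USER> -d postgres -p 5432 -f init_runai_db.sql
```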
Tools and serving endpoint access control
Control who can access tools and endpoints; restrict network exposure
Maintenance and compliance
Follow secure install guides, perform vulnerability scans, maintain data-privacy alignment
NVIDIA Run:ai uses Role‑Based Access Control to define what each user, group, or application can do, and where. Roles are assigned within a scope, such as a project, department, or cluster, and permissions cover actions like viewing, creating, editing, or deleting entities. Unlike Kubernetes RBAC, NVIDIA Run:ai’s RBAC works across multiple clusters, giving you a single place to manage access rules. See Role Based Access Control (RBAC) for more details.
Assign the minimum required permissions to users, groups and applications.
Segment duties using organizational scopes to restrict roles to specific projects or departments.
Regularly audit access rules and remove unnecessary privileges, especially admin-level roles.
NVIDIA Run:ai predefined roles are automatically mapped to Kubernetes cluster roles (also predefined by NVIDIA Run:ai). This means administrators do not need to manually configure role mappings.
These cluster roles define permissions for the entities NVIDIA Run:ai manages and displays (such as workloads) and also apply to users who access cluster data directly through Kubernetes tools (for example, kubectl).
NVIDIA Run:ai supports several authentication methods to control platform access. You can use single sign-on (SSO) for unified enterprise logins, traditional username/password accounts if SSO isn’t an option, and API secret keys for automated application access. Authentication is mandatory for all interfaces, including the UI, CLI, and APIs, ensuring only verified users or applications can interact with your environment.
Administrators can also configure session timeout. This refers to the period of inactivity before a user is automatically logged out. Once the timeout is reached, the session ends and re‑authentication is required, helping protect against risks from unattended or abandoned sessions. See Authentication and authorization for more details.
Integrate corporate SSO for centralized identity management.
Enforce strong password policies for local accounts.
Set appropriate session timeout values to minimize idle session risk.
Prefer SSO to eliminate password management within NVIDIA Run:ai.
Configure the Kubernetes API server to validate tokens via NVIDIA Run:ai’s identity service, ensuring unified authentication across the platform. For more information, see Cluster authentication.
Workload policies allow administrators to define and enforce how AI workloads are submitted and controlled across projects and teams. With these policies, you can set clear rules and defaults for workload parameters such as which resources can be requested, required security settings, and which defaults should apply. Policies are enforced whether workloads are submitted via the UI, CLI, API or Kubernetes YAML, and can be scoped to specific projects, departments, or clusters for fine-grained control. See Policies and rules for more details.
Enforce containers to run as non-root by default. Define policies that set constraints and defaults for workload submissions, such as requiring non-root users or specifying minimum UID/GID. Example security fields in policies:
security.runAsNonRoot: true
security.runAsUid: 1000
Restrict runAsUid with canEdit: false to prevent users from overriding.
Require explicit user/group IDs for all workload containers.
Impose data source and resource usage limits through policies.
Use policy rules to prevent users from submitting non-compliant workloads.
Apply policies by organizational scope for nuanced control within departments or projects.
Map these policies to PodSecurityContext settings in Kubernetes, and enforce them with Pod Security Admission or Kyverno for stricter compliance.
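As a minimal sketch of the Kubernetes-side mapping mentioned above, the same constraints expressed as a PodSecurityContext look like this (names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo            # placeholder name
spec:
  securityContext:
    runAsNonRoot: true  # mirrors security.runAsNonRoot: true
    runAsUser: 1000     # mirrors security.runAsUid: 1000
    runAsGroup: 1000
  containers:
    - name: main
      image: <your-image>
```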
NVIDIA Run:ai offers flexible controls for how namespaces and resources are created and managed within your clusters. When a new project is set up, you can choose whether Kubernetes namespaces are created automatically, and whether users are auto-assigned to those projects. There are also options to manage how secrets are propagated across namespaces and to enable or disable resource limit enforcement using Kubernetes LimitRange objects. See Advanced cluster configurations for more details.
Require admin approval for namespace creation to avoid sprawl.
Limit secret propagation to essential cases only.
Use Kubernetes LimitRanges and ResourceQuotas alongside NVIDIA Run:ai policies for layered resource control.
Regularly audit and remove unused namespaces, secrets, and workloads.
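A sketch of the layered Kubernetes-side controls mentioned above, applied to a project namespace (the namespace name and values are placeholders):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a            # placeholder project namespace
spec:
  limits:
    - type: Container
      default:                 # default limits applied when none are set
        cpu: "1"
        memory: 1Gi
      defaultRequest:          # default requests applied when none are set
        cpu: 250m
        memory: 256Mi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    requests.nvidia.com/gpu: "8"   # caps GPU requests in this namespace
```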
NVIDIA Run:ai provides flexible options to control access to tools and serving endpoints. Access can be defined during workload submission or updated later, ensuring that only the intended users or groups can interact with the resource.
When configuring an endpoint or tool, users can select from the following access levels:
Public - Everyone within the network can access with no authentication (serving endpoints).
All authenticated users - Access is granted to anyone in the organization who can log in (NVIDIA Run:ai or SSO).
Specific groups - Access is restricted to members of designated identity provider groups.
Specific users - Access is restricted to individual users by email or username.
By default, network exposure is restricted, and access must be explicitly granted. Model endpoints automatically inherit RBAC and workload policy controls, ensuring consistent enforcement of role- and scope-based permissions across the platform. Administrators can also limit who can deploy, view, or manage endpoints, and should open network access only when required.
Define explicit roles for model management/use.
Restrict endpoint access to authorized users, groups and applications.
Monitor and audit endpoint access logs.
Use Kubernetes NetworkPolicies to limit inter-pod and external traffic to model-serving pods. Pair with NVIDIA Run:ai RBAC for end-to-end control.
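A sketch of such a NetworkPolicy restricting ingress to model-serving pods; the labels, namespace, and port are placeholders and depend on how your serving workloads are deployed:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-serving-ingress
  namespace: team-a                    # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: model-server                # placeholder label on serving pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumption: traffic enters via this namespace
      ports:
        - protocol: TCP
          port: 8000                   # placeholder serving port
```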
A secure deployment is the foundation on which all other controls rest, and NVIDIA Run:ai’s installation procedures are built to align with organizational policies such as OpenShift Security Context Constraints (SCC). See Advanced cluster configurations for more details.
Deploy NVIDIA Run:ai cluster following secure installation guides (including IT compliance mandates such as SCC for OpenShift).
Run regular security scans and patch/update NVIDIA Run:ai deployments promptly when vulnerabilities are reported.
Regularly review and update all security policies, both at the NVIDIA Run:ai and Kubernetes levels, to adapt to evolving risks.
NVIDIA Run:ai supports SaaS and self-hosted modes to satisfy a range of data security needs. The self-hosted mode keeps all models, logs, and user data entirely within your infrastructure; SaaS requires careful review of what (minimal) data is transmitted for platform operations and analytics. See for more details.
Use the self-hosted mode when full control over the environment is required - including deployment and day-2 operations such as upgrades, monitoring, backup, and metadata restore.
Ensure transmission to the NVIDIA Run:ai cloud is scoped (in SaaS mode) and aligns with organization policy.
Encrypt secrets and sensitive resources; control secret propagation.
Document and audit data flows for regulatory alignment.
Access control (RBAC)
Enforce least privilege, segment roles by scope, audit regularly
Authentication and sessions management
Use SSO, token-based authentication, strong passwords, limit idle time
Workload policies
Require non-root, set UID/GID, block overrides, use trusted images
Namespace and resource management
Require namespace approval, limit secret propagation, apply quotas
Currently, only “Consumption Reports” are available, providing insights into the consumption of resources such as GPU, CPU, and CPU memory across organizational units.
The Reports table can be found under Analytics in the NVIDIA Run:ai platform.
The Reports table provides a list of all the reports defined in the platform and allows you to manage them.
Users are able to access the reports they have generated themselves. Users with project viewing permissions throughout the tenant can access all reports within the tenant.
The Reports table comprises the following columns:
Report
The name of the report
Description
The description of the report
Status
The different lifecycle phases and representation of the report condition
Type
The type of the report – e.g., consumption
Created by
The user who created the report
Creation time
The timestamp of when the report was created
The following table describes the reports' condition and whether they were created successfully:
Ready
Report is ready and can be downloaded as CSV
Pending
Report is in the queue and waiting to be processed
Failed
The report couldn’t be created
Processing...
The report is being created
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Before you start, make sure you have a project.
To create a new report:
Click +NEW REPORT
Enter a name for the report (if the name already exists, you will need to choose a different one)
Optional: Provide a description of the report
Set the report’s data collection period
Start date - The date at which the report data commenced
End date - The date at which the report data concluded
Set the report segmentation and filters
Filters - Filter by project or department name
Segment by - Data is collected and aggregated based on the segment
Click CREATE REPORT
Select the report you want to delete
Click DELETE
On the dialog, click DELETE to confirm
Select the report you want to download
Click DOWNLOAD CSV
Reports must be saved in a storage solution compatible with S3. To activate this feature for self-hosted accounts, the storage needs to be linked to the account. The configuration should be incorporated into two ConfigMap objects within the Control Plane.
Edit the runai-backend-org-unit-service ConfigMap:
Add the following lines to the file:
Edit the runai-backend-metrics-service ConfigMap:
Add the following lines to the file:
In addition, in the same file, under the config.yaml section, add the following right after log_level: "Info":
Restart the deployments:
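For example, restart the affected services; the deployment names below are assumptions based on the ConfigMaps edited above, so verify the exact names in your control plane namespace:

```bash
kubectl -n runai-backend rollout restart deployment runai-backend-org-unit-service runai-backend-metrics-service
```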
Refresh the page to see Reports under Analytics in the NVIDIA Run:ai platform.
To view the available actions, go to the Reports API reference.
Before installing the NVIDIA Run:ai cluster, validate that the system requirements and network requirements are met. For air-gapped environments, make sure you have the software artifacts prepared.
Once all the requirements are met, it is highly recommended to use the NVIDIA Run:ai cluster preinstall diagnostics tool to:
Test the below requirements in addition to failure points related to Kubernetes, NVIDIA, storage, and networking
Look at additional components installed and analyze their relevance to a successful installation
For more information, see . To run the preinstall diagnostics tool, download the latest version, and run:
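A sketch of a typical invocation; the binary name and the absence of flags are assumptions, so refer to the preinstall diagnostics documentation for the exact command:

```bash
chmod +x ./preinstall-diagnostics
./preinstall-diagnostics
```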
In an air-gapped deployment, the diagnostics image is saved, pushed, and pulled manually from the organization's registry.
Run the binary with the --image parameter to modify the diagnostics image to be used:
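For example (the --image parameter is taken from this document; the registry path, image name, and tag are placeholders for your internal registry):

```bash
./preinstall-diagnostics --image <internal-registry>/preinstall-diagnostics:<version>
```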
NVIDIA Run:ai requires Helm 3.14 or later. To install Helm, see . If you are installing an air-gapped version of NVIDIA Run:ai, the NVIDIA Run:ai tar file contains the .
A Kubernetes user with the cluster-admin role is required to ensure a successful installation. For more information, see .
If you encounter an issue with the installation, try the troubleshooting scenario below.
If the NVIDIA Run:ai cluster installation failed, check the installation logs to identify the issue. Run the following script to print the installation logs:
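The official script is not reproduced here; as a generic stand-in, the following kubectl commands (assuming the cluster components run in the runai namespace) can help surface installer logs:

```bash
# List cluster component pods and inspect any that are not Running (namespace is an assumption)
kubectl get pods -n runai
kubectl describe pod <failing-pod> -n runai
kubectl logs <failing-pod> -n runai
```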
If the NVIDIA Run:ai cluster installation completed, but the cluster status did not change to Connected, check the cluster troubleshooting scenarios.
This section explains the procedure for managing Nodes.
Nodes are Kubernetes elements automatically discovered by the NVIDIA Run:ai platform. Once a node is discovered, an associated instance is created in the Nodes table, administrators can view the node’s relevant information, and the NVIDIA Run:ai Scheduler can use the node for scheduling.
The Nodes table can be found under Resources in the NVIDIA Run:ai platform.
The Nodes table displays a list of predefined nodes available to users in the NVIDIA Run:ai platform.
The Nodes table consists of the following columns:
Click one of the values in the GPU devices column, to view the list of GPU devices and their parameters.
Click one of the values in the Pod(s) column, to view the list of pods and their parameters.
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Click a row in the Nodes table and then click the Show details button at the upper right side of the action bar. The details screen appears, presenting the following metrics graphs:
GPU utilization - Per GPU graph and an average of all GPUs graph, all on the same chart, along an adjustable period allows you to see the trends of all GPUs compute utilization (percentage of GPU compute) in this node.
GPU memory utilization - Per GPU graph and an average of all GPUs graph, all on the same chart, along an adjustable period allows you to see the trends of all GPUs memory usage (percentage of the GPU memory) in this node.
CPU compute utilization - The average of all CPUs’ cores compute utilization graph, along an adjustable period allows you to see the trends of CPU compute utilization (percentage of CPU compute) in this node.
To view the available actions, go to the API reference.
To allow users to securely submit workloads using kubectl, you must configure the Kubernetes API server to authenticate users via the NVIDIA Run:ai identity provider. This is done by adding OpenID Connect (OIDC) flags to the Kubernetes API server configuration on each cluster.
Go to General settings
Navigate to Cluster authentication
--oidc-client-id - A client id that all tokens must be issued for.
--oidc-issuer-url - The URL of the NVIDIA Run:ai identity provider
--oidc-username-prefix - Prefix prepended to username claims to prevent clashes with existing names.
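As an illustration only, on a kubeadm-managed cluster these flags can be set through the API server configuration as sketched below; all values are placeholders, and managed Kubernetes distributions expose the same flags differently:

```yaml
# kubeadm ClusterConfiguration excerpt (illustrative; values are placeholders)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    oidc-client-id: <client-id>        # from the Cluster authentication settings
    oidc-issuer-url: <issuer-url>      # the NVIDIA Run:ai identity provider URL
    oidc-username-prefix: <prefix>     # prefix that avoids clashes with existing names
```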
Single Sign-On (SSO) is an authentication scheme, allowing users to log in with a single pair of credentials to multiple, independent software systems.
This section explains the procedure to configure SSO to NVIDIA Run:ai using the SAML 2.0 protocol.
Before you start, make sure you have the IDP Metadata XML available from your identity provider.
Go to General settings
Open the Security section and click +IDENTITY PROVIDER
Select Custom SAML 2.0
Select either From computer or From URL to upload your identity provider metadata file
Open the NVIDIA Run:ai platform as an admin
Add access rules to an SSO user defined in the IDP
Open the NVIDIA Run:ai platform in an incognito browser tab
On the sign-in page click CONTINUE WITH SSO. You are redirected to the identity provider sign in page
You can view the identity provider details and edit its configuration:
Go to General settings
Open the Security section
On the identity provider box, click Edit identity provider
You can edit either the metadata file or the user attributes
You can remove the identity provider configuration:
Go to General settings
Open the Security section
On the identity provider card, click Remove identity provider
In the dialog, click REMOVE to confirm the action
You can download the XML file to view the identity provider settings:
Go to General settings
Open the Security section
On the identity provider card, click Edit identity provider
In the dialog, click DOWNLOAD IDP METADATA XML FILE
If testing the setup was unsuccessful, try the different troubleshooting scenarios according to the error you received. If an error still occurs, check the .
Support for third-party integrations varies. When noted below, the integration is supported out of the box with NVIDIA Run:ai. For other integrations, our Customer Success team has prior experience assisting customers with setup. In many cases, the NVIDIA Enterprise Support Portal may include additional reference documentation provided on an as-is basis.
Kubernetes has several built-in resources that encapsulate running Pods. These are called Kubernetes Workloads and should not be confused with NVIDIA Run:ai workloads.
Examples of such resources are a Deployment that manages a stateless application, or a Job that runs tasks to completion.
A NVIDIA Run:ai workload encapsulates all the resources needed to run and creates/deletes them together. Since NVIDIA Run:ai is an open platform, it allows the scheduling of any Kubernetes Workflow.
For more information, see .
NVIDIA Run:ai’s GPU memory swap helps administrators and AI practitioners to further increase the utilization of their existing GPU hardware by improving GPU sharing between AI initiatives and stakeholders. This is done by expanding the GPU physical memory to the CPU memory, typically an order of magnitude larger than that of the GPU.
Expanding the GPU physical memory helps the NVIDIA Run:ai system to put more workloads on the same GPU physical hardware, and to provide a smooth workload context switching between GPU memory and CPU memory, eliminating the need to kill workloads when the memory requirement is larger than what the GPU physical memory can provide.
There are several use cases where GPU memory swap can benefit and improve the user experience and the system's overall utilization.
AI practitioners use notebooks to develop and test new AI models and to improve existing AI models. While developing or testing an AI model, notebooks use GPU resources intermittently, yet the GPU resources they request are pre-allocated by the notebook and cannot be used by other workloads once the notebook has reserved them. To overcome this inefficiency, NVIDIA Run:ai introduced dynamic GPU fractions and the Node Level Scheduler.
When one or more workloads require more than their requested GPU resources, there’s a high probability not all workloads can run on a single GPU because the total memory required is larger than the physical size of the GPU memory.
With GPU memory swap, several workloads can run on the same GPU, even if the sum of their used memory is larger than the size of the physical GPU memory. GPU memory swap can swap in and out workloads interchangeably, allowing multiple workloads to each use the full amount of GPU memory. The most common scenario is for one workload to run on the GPU (for example, an interactive notebook), while other notebooks are either idle or using the CPU to develop new code (while not using the GPU). From a user experience point of view, the swap in and out is a smooth process since the notebooks do not notice that they are being swapped in and out of the GPU memory. On rare occasions, when multiple notebooks need to access the GPU simultaneously, slower workload execution may be experienced.
Notebooks typically use the GPU intermittently, therefore, with high probability, only one workload (for example, an interactive notebook) will use the GPU at a time. The more notebooks the system puts on a single GPU, the higher the chances that more than one notebook will require GPU resources at the same time. Admins have a significant role here in fine-tuning the number of notebooks running on the same GPU, based on specific use patterns and required SLAs. Using the Node Level Scheduler reduces GPU access contention between different interactive notebooks running on the same node.
A single GPU can be shared between an inference or interactive workload (for example, a Jupyter notebook, an image recognition service, or an LLM service) and a training workload that is not time-sensitive or delay-sensitive. At times when the inference/interactive workload uses the GPU, both training and inference/interactive workloads share the GPU resources, each running part of the time swapped-in to the GPU memory, and swapped-out into the CPU memory the rest of the time.
Whenever the inference/interactive workload stops using the GPU, the swap mechanism swaps out the inference/interactive workload GPU data to the CPU memory. Kubernetes wise, the pod is still alive and running using the CPU. This allows the training workload to run faster when the inference/interactive workload is not using the GPU, and slower when it does, thus sharing the same resource between multiple workloads, fully utilizing the GPU at all times, and maintaining uninterrupted service for both workloads.
Running multiple inference models is a demanding task, and you will need to ensure that your SLA is met. You need to provide high performance and low latency while maximizing GPU utilization. This becomes even more challenging when the exact model usage patterns are unpredictable. You must plan for the agility of inference services and strive to keep models on standby in a ready state rather than an idle state.
NVIDIA Run:ai’s GPU memory swap feature enables you to load multiple models to a single GPU, where each can use up to the full amount of GPU memory. Using an application load balancer, the administrator can control to which server each inference request is sent. Then the GPU can be loaded with multiple models, where the model in use is loaded into the GPU memory and the rest of the models are swapped-out to the CPU memory. The swapped models are stored as ready models to be loaded when required. GPU memory swap always maintains the context of the workload (model) on the GPU so it can easily and quickly switch between models. This is unlike industry standard model servers that load models from scratch into the GPU whenever required.
Swapping the workload’s GPU memory to and from the CPU is performed simultaneously and synchronously for all GPUs used by the workload. In some cases, if workloads specify a memory limit smaller than a full GPU memory size, multiple workloads can run in parallel on the same GPUs, maximizing the utilization and shortening the response times.
In other cases, workloads will run serially, with each workload running for a few seconds before the system swaps them in/out. If multiple workloads occupy more than the GPU physical memory and attempt to run simultaneously, memory swapping will occur. In this scenario, each workload will run part of the time on the GPU while being swapped out to the CPU memory the other part of the time, slowing down the execution of the workloads. Therefore, it is important to evaluate whether memory swapping is suitable for your specific use cases, weighing the benefits against the potential for slower execution time. To better understand the benefits and use cases of GPU memory swap, refer to the detailed sections below. This will help you determine how to best utilize GPU swap for your workloads and achieve optimal performance.
The workload must use dynamic GPU fractions. This means the workload’s memory Request is less than a full GPU, but it may add a GPU memory Limit to allow the workload to effectively use the full GPU memory. The NVIDIA Run:ai Scheduler allocates the dynamic fraction pair (Request and Limit) on single or multiple GPU devices in the same node.
The administrator must label each node on which they want to provide GPU memory swap with the run.ai/swap-enabled=true label. Enabling the feature reserves CPU memory to serve the swapped GPU memory from all GPUs on that node. The administrator sets the size of the reserved CPU RAM using the runaiconfig file as detailed in .
Optionally, you can also configure the Node Level Scheduler:
The Node Level Scheduler automatically spreads workloads between the different GPUs on a node, ensuring maximum workload performance and GPU utilization.
In scenarios where Interactive notebooks are involved, if the CPU reserved memory for the GPU swap is full, the Node Level Scheduler preempts the GPU process of that workload and potentially routes the workload to another GPU to run.
NVIDIA Run:ai also supports workload submission using multi-GPU memory swap. Multi-GPU memory swap works similarly to single GPU memory swap, but instead of swapping memory for a single GPU workload, it swaps memory for workloads across multiple GPUs simultaneously and synchronously.
The NVIDIA Run:ai Scheduler allocates the same dynamic GPU fraction pair (Request and Limit) on multiple GPU devices in the same node. For example, if you want to run two LLM models, each consuming 8 GPUs that are not used simultaneously, you can use GPU memory swap to share their GPUs. This approach allows multiple models to be stacked on the same node.
The following outlines the advantages of stacking multiple models on the same node:
Maximizes GPU utilization - Efficiently uses available GPU resources by enabling multiple workloads to share GPUs.
Improves cold start times - Loading large LLM models to a node and its GPUs can take several minutes during a “cold start”. Using memory swap turns this process into a “warm start” that takes only a fraction of a second to a few seconds (depending on the model size and the GPU model).
Increases GPU availability - Frees up and maximizes GPU availability for additional workloads (and users), enabling better resource sharing.
A pod created before the GPU memory swap feature was enabled in that cluster cannot be scheduled to a swap-enabled node. An event is generated if no matching node is found. Users must re-submit those pods to make them swap-enabled.
GPU memory swap cannot be enabled if NVIDIA Run:ai’s fairshare or strict time-slicing is used. GPU memory swap can only be used with the default NVIDIA time-slicing mechanism.
CPU RAM size cannot be decreased once GPU memory swap is enabled.
Before configuring GPU memory swap, dynamic GPU fractions must be enabled. You can also configure and use Node Level Scheduler. Dynamic GPU fractions enable you to make your workloads burstable, while both features will maximize your workloads’ performance and GPU utilization within a single node.
To enable GPU memory swap in a NVIDIA Run:ai cluster:
Add the following label to each node where you want to enable GPU memory swap:
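For example, using the label described earlier in this section:

```bash
kubectl label node <node-name> run.ai/swap-enabled=true
```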
Edit the runaiconfig file with the following parameters. This example uses 100Gi as the size of the swap memory. For more details, see :
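A sketch of the relevant runaiconfig section is shown below; the field layout is an assumption, so verify the exact field names against the runaiconfig reference:

```yaml
spec:
  global:
    core:
      swap:
        enabled: true
        limits:
          cpuRam: 100Gi   # size of CPU RAM reserved for swapped GPU memory (field name assumed)
```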
Or, use the following patch command from your terminal:
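An equivalent patch form, under the same field-name assumptions as the YAML sketch above:

```bash
kubectl patch runaiconfig runai -n runai --type merge -p \
  '{"spec":{"global":{"core":{"swap":{"enabled":true,"limits":{"cpuRam":"100Gi"}}}}}}'
```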
Swappable workloads require reserving a small part of the GPU for non-swappable allocations like binaries and GPU context. To avoid out-of-memory (OOM) errors due to non-swappable memory regions, the system reserves 2GiB of GPU RAM by default, effectively truncating the total size of the GPU memory. For example, a 16GiB T4 will appear as 14GiB on a swap-enabled node. The exact reserved size is application-dependent, and 2GiB is a safe assumption for 2-3 applications sharing and swapping on a GPU. This value can be changed by:
Editing the runaiconfig as follows:
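A sketch, assuming a reserved-GPU-RAM field alongside the swap limits shown earlier (the field name is an assumption; check the runaiconfig reference):

```yaml
spec:
  global:
    core:
      swap:
        limits:
          reservedGpuRam: 2Gi   # assumed field controlling the reserved (non-swappable) GPU memory
```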
Or, using the following patch command from your terminal:
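The equivalent patch form, under the same assumption:

```bash
kubectl patch runaiconfig runai -n runai --type merge -p \
  '{"spec":{"global":{"core":{"swap":{"limits":{"reservedGpuRam":"2Gi"}}}}}}'
```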
If you prefer your workloads not to be swapped into CPU memory, you can specify an anti-affinity to the run.ai/swap-enabled=true node label when submitting your workloads, and the Scheduler will ensure not to use swap-enabled nodes. Alternatively, set up swap on a dedicated node pool and do not use that node pool for workloads you prefer not to swap.
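A sketch of a pod spec fragment expressing that anti-affinity with standard Kubernetes node affinity; only the label key and value come from this document:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: run.ai/swap-enabled
              operator: NotIn       # matches nodes without the label or with a different value
              values:
                - "true"
```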
CPU memory is limited, and a single CPU serves multiple GPUs on a node, usually between 2 and 8 GPUs. For example, when using 80GB of GPU memory, each swapped workload consumes up to 80GB (but may use less), assuming each GPU is shared between 2-4 workloads. In this example, you can see how the swap memory can become very large. Therefore, administrators are given a way to limit the size of the CPU memory reserved for swapped GPU memory on each swap-enabled node as shown in .
Limiting the CPU reserved memory means that there may be scenarios where the GPU memory cannot be swapped out to the CPU reserved RAM. Whenever the CPU reserved memory for swapped GPU memory is exhausted, the workloads currently running are not swapped out to the CPU reserved RAM; instead, the Node Level Scheduler logic (if enabled) takes over and provides GPU resource optimization.
Many workloads utilize GPU resources intermittently, with long periods of inactivity. These workloads typically need GPU resources when they are running AI applications or debugging a model in development. Other workloads such as inference may utilize GPUs at lower rates than requested, but may demand higher resource usage during peak utilization. The disparity between resource request and actual resource utilization often leads to inefficient utilization of GPUs. This usually occurs when multiple workloads request resources based on their peak demand, despite operating below those peaks for the majority of their runtime.
To address this challenge, NVIDIA Run:ai has introduced dynamic GPU fractions. This feature optimizes GPU utilization by enabling workloads to dynamically adjust their resource usage. It allows users to specify a guaranteed fraction of GPU memory and compute resources with a higher limit that can be dynamically utilized when additional resources are requested.
With dynamic GPU fractions, users can submit workloads using GPU fraction Request and Limit which is achieved by leveraging the Kubernetes Request and Limit notations. You can either:
Request a GPU fraction (portion) using a percentage of a GPU and specify a Limit
Request a GPU memory size (GB, MB) and specify a Limit
When setting a GPU memory limit either as GPU fraction or GPU memory size, the Limit must be equal to or greater than the GPU fractional memory request. Both GPU fraction and GPU memory are translated into the actual requested memory size of the Request (guaranteed resources) and the Limit (burstable resources - non guaranteed).
For example, a user can specify a workload with a GPU fraction request of 0.25 GPU, and add a limit of up to 0.80 GPU. The NVIDIA Run:ai Scheduler schedules the workload to a node that can provide the GPU fraction request (0.25), and then assigns the workload to a GPU. The GPU scheduler monitors the workload and allows it to occupy memory between 0 and 0.80 of the GPU memory (based on the Limit), where only 0.25 of the GPU memory is guaranteed to that workload. The rest of the memory (from 0.25 to 0.8) is “loaned” to the workload, as long as it is not needed by other workloads.
NVIDIA Run:ai automatically manages the state changes between Request and Limit as well as the reverse (when the balance needs to be "returned"), updating the workloads’ utilization vs. Request and Limit parameters in the NVIDIA Run:ai UI.
To guarantee fair quality of service between different workloads using the same GPU, NVIDIA Run:ai developed an extendable GPUOOMKiller (Out Of Memory Killer) component that guarantees the quality of service using Kubernetes semantics for resources of Request and Limit.
The OOMKiller capability requires adding CAP_KILL capabilities to the dynamic GPU fractions and to the NVIDIA Run:ai core scheduling module (toolkit daemon). This capability is enabled by default.
NVIDIA Run:ai also supports workload submission using multi-GPU dynamic fractions. Multi-GPU dynamic fractions work similarly to dynamic fractions on a single GPU workload, however, instead of a single GPU device, the NVIDIA Run:ai Scheduler allocates the same dynamic fraction pair (Request and Limit) on multiple GPU devices within the same node. For example, if practitioners develop a new model that uses 8 GPUs and requires 40GB of memory per GPU, but may want to burst out and consume up to the full GPU memory, they can allocate 8×40GB with multi-GPU fractions and a limit of 80GB (e.g. H100 GPU) instead of reserving the full memory of each GPU (e.g. 80GB). This leaves 40GB of GPU memory available on each of the 8 GPUs for other workloads within that node. This is useful during model development, where memory requirements are usually lower due to experimentation with smaller models or configurations.
This approach significantly improves GPU utilization and availability, enabling more precise and often smaller quota requirements for the end user. Time sharing where single GPUs can serve multiple workloads with dynamic fractions remains unchanged, only now, it serves multiple workloads using multi-GPUs per workload.
Using the compute resource asset, you can define the compute requirements by specifying your requested GPU portion or GPU memory, and set a Limit. You can then use the compute resource when submitting workloads with single and multi-GPU dynamic fractions. In addition, you will be able to view the workloads’ utilization vs. Request and Limit parameters in the NVIDIA Run:ai UI.
Single dynamic GPU fractions - Define the compute requirement to run 1 GPU device, by specifying either a fraction (percentage) of the overall memory or specifying the memory request (GB, MB) with a Limit. The limit must be equal to or greater than the GPU fractional memory request.
Multi-GPU dynamic fractions - Define the compute requirement to run multiple GPU devices, by specifying either a fraction (percentage) of the overall memory or specifying the memory request (GB, MB) with a Limit. The limit must be equal to or greater than the GPU fractional memory request.
To enable dynamic GPU fractions for workloads submitted via Kubernetes YAML, use the following annotations to define the GPU fraction configuration. You can configure either gpu-fraction or gpu-memory. You must also set the RUNAI_GPU_MEMORY_LIMIT environment variable in the first container to enforce the memory limit. This is the GPU consuming container. Make sure the default scheduler is set to runai-scheduler. See for more details.
The following example YAML creates a pod that requests 2 GPU devices, each requesting 50% of memory (gpu-fraction: "0.5") and allows usage of up to 95% (RUNAI_GPU_MEMORY_LIMIT: "0.95") if available.
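A sketch of such a pod is shown below. The gpu-fraction annotation, the RUNAI_GPU_MEMORY_LIMIT environment variable, and the runai-scheduler name come from this document; the annotation used here to request two GPU devices (gpu-fraction-num-devices) is an assumption and should be verified against the workloads reference:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fractions-demo                  # placeholder name
  annotations:
    gpu-fraction: "0.5"                 # guaranteed fraction of each GPU device's memory
    gpu-fraction-num-devices: "2"       # assumption: requests 2 GPU devices with this fraction
spec:
  schedulerName: runai-scheduler        # must be scheduled by the NVIDIA Run:ai Scheduler
  containers:
    - name: main                        # first container is the GPU-consuming container
      image: <your-image>
      env:
        - name: RUNAI_GPU_MEMORY_LIMIT
          value: "0.95"                 # burstable limit, up to 95% of each GPU's memory
```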
To view the available actions, go to the and run according to your workload.
Single Sign-On (SSO) is an authentication scheme that allows users to log in with a single pair of credentials to multiple, independent software systems.
This article explains the procedure to configure single sign-on to NVIDIA Run:ai using the OpenID Connect protocol in OpenShift V4.
Before starting, make sure you have the following available from your OpenShift cluster:
ClientID - The ID used to identify the client with the Authorization Server.
Client Secret - A secret password that only the Client and Authorization Server know.
Base URL - The OpenShift API Server endpoint (for example, )
Go to General settings
Open the Security section and click +IDENTITY PROVIDER
Select OpenShift V4
Enter the Base URL, Client ID, and Client Secret from your OpenShift OAuth client.
Open the NVIDIA Run:ai platform as an admin
Add access rules to an SSO user defined in the IDP
Open the NVIDIA Run:ai platform in an incognito browser tab
On the sign-in page, click CONTINUE WITH SSO. You are redirected to the OpenShift IDP sign-in page
You can view the identity provider details and edit its configuration:
Go to General settings
Open the Security section
On the identity provider box, click Edit identity provider
You can edit either the Base URL, Client ID, Client Secret, or the User attributes
You can remove the identity provider configuration:
Go to General settings
Open the Security section
On the identity provider card, click Remove identity provider
In the dialog, click REMOVE to confirm
If testing the setup was unsuccessful, try the different troubleshooting scenarios according to the error you received.
Multi-Node NVLink (MNNVL) systems, including NVIDIA GB200, NVIDIA GB200 NVL72 and its derivatives are fully supported by the NVIDIA Run:ai platform.
Kubernetes does not natively recognize NVIDIA’s MNNVL architecture, which makes managing and scheduling workloads across these high-performance domains more complex. The NVIDIA Run:ai platform simplifies this by abstracting the complexity of MNNVL configuration. Without this abstraction, optimal performance on a GB200 NVL72 system would require deep knowledge of NVLink domains, their hardware dependencies, and manual configuration for each distributed workload. NVIDIA Run:ai automates these steps, ensuring high performance with minimal effort. While GB200 NVL72 supports all workload types, distributed training workloads benefit most from its accelerated GPU networking capabilities.
To learn more about GB200, MNNVL and related NVIDIA technologies, refer to the following:
To submit a workload with GPU resources in Kubernetes, you typically need to specify an integer number of GPUs. However, workloads often have diverse GPU memory and compute requirements, or even use GPUs intermittently depending on the application (such as inference workloads, training workloads, or notebooks at the model-creation phase). Additionally, GPUs are becoming increasingly powerful, offering more processing power and larger memory capacity for applications. Despite the increasing model sizes, the increasing capabilities of GPUs allow them to be effectively shared among multiple users or applications.
NVIDIA Run:ai’s GPU fractions provide an agile and easy-to-use method to share a GPU or multiple GPUs across workloads. With GPU fractions, you can divide the GPU/s memory into smaller chunks and share the GPU/s compute resources between different workloads and users, resulting in higher GPU utilization and more efficient resource allocation.
Utilizing GPU fractions to share GPU resources among multiple workloads provides numerous advantages for both platform administrators and practitioners, including improved efficiency, resource optimization, and enhanced user experience.
When a user submits a workload, the workload is directed to the selected Kubernetes cluster and managed by the NVIDIA Run:ai Scheduler. The Scheduler’s primary responsibility is to allocate workloads to the most suitable node or nodes based on resource requirements and other characteristics, as well as adherence to NVIDIA Run:ai’s fairness and quota management.
The NVIDIA Run:ai Scheduler schedules native Kubernetes workloads, NVIDIA Run:ai workloads, or any other type of third-party workloads. To learn more about workloads support, see .
To understand what is behind the NVIDIA Run:ai Scheduler’s decision-making logic, get to know the key concepts, resource management and scheduling principles of the Scheduler.
Workloads can range from a single pod running on an individual node to distributed workloads using multiple pods, each running on a node (or part of a node). For example, a large-scale training workload could use up to 128 nodes or more, while an inference workload could use many pods (replicas) and nodes.
Data volumes (DVs) are one type of . They offer a powerful solution for storing, managing, and sharing AI training data, promoting collaboration, simplifying data access control, and streamlining the AI development lifecycle.
Acting as a central repository for organizational data resources, data volumes can represent datasets or raw data, that is stored in Kubernetes Persistent Volume Claims (PVCs).
Once a data volume is created, it can be shared with additional multiple scopes and easily utilized by AI practitioners when submitting workloads. Shared data volumes are mounted with read-only permissions, ensuring data integrity. Any modifications to the data in a shared DV must be made by writing to the original volume of the PVC used to create the data volume.
runai workspace submit --priority priority-class
runai training submit --priority priority-class
curl -H "Authorization: Bearer <token>" "https://runai.jfrog.io/artifactory/api/storage/runai-airgapped-prod/?list"
curl -L -H "Authorization: Bearer <token>" -O "https://runai.jfrog.io/artifactory/runai-airgapped-prod/runai-airgapped-package-<VERSION>.tar.gz"
kubectl edit cm runai-backend-org-unit-service -n runai-backend
S3_ENDPOINT: <S3_END_POINT_URL>
S3_ACCESS_KEY_ID: <S3_ACCESS_KEY_ID>
S3_ACCESS_KEY: <S3_ACCESS_KEY>
S3_USE_SSL: "true"
S3_BUCKET: <BUCKET_NAME>
kubectl edit cm runai-backend-metrics-service -n runai-backend
S3_ENDPOINT: <S3_END_POINT_URL>
S3_ACCESS_KEY_ID: <S3_ACCESS_KEY_ID>
S3_ACCESS_KEY: <S3_ACCESS_KEY>
S3_USE_SSL: "true"
GPU devices
The number of GPU devices installed on the node. Clicking this field pops up a dialog with details per GPU (described below in this article)
Free GPU devices
The current number of fully vacant GPU devices
GPU memory
The total amount of GPU memory installed on this node. For example, if the number is 640GB and the number of GPU devices is 8, then each GPU is installed with 80GB of memory (assuming the node is composed of homogeneous GPU devices)
Allocated GPUs
The total allocation of GPU devices in units of GPUs (decimal number). For example, if 3 GPUs are 50% allocated, the field prints out the value 1.50. This value represents the portion of GPU memory consumed by all running pods using this node
Used GPU memory
The actual amount of memory (in GB or MB) used by pods running on this node.
GPU compute utilization
The average compute utilization of all GPU devices in this node
GPU memory utilization
The average memory utilization of all GPU devices in this node
CPU (Cores)
The number of CPU cores installed on this node
CPU memory
The total amount of CPU memory installed on this node
Allocated CPU (Cores)
The number of CPU cores allocated by pods running on this node (decimal number, e.g. a pod allocating 350 milli-cores shows an allocation of 0.35 cores).
Allocated CPU memory
The total amount of CPU memory allocated by pods running on this node (in GB or MB)
Used CPU memory
The total amount of actually used CPU memory by pods running on this node. Pods may allocate memory but not use all of it, or go beyond their CPU memory allocation if using Limit > Request for CPU memory (burstable workload)
CPU compute utilization
The utilization of all CPU compute resources on this node (percentage)
CPU memory utilization
The utilization of all CPU memory resources on this node (percentage)
Used swap CPU memory
The amount of CPU memory (in GB or MB) used for GPU swap memory (* future)
Pod(s)
List of pods running on this node, click the field to view details (described below in this article)
Download table - Click MORE and then click Download as CSV. Export to CSV is limited to 20,000 rows.
Show/Hide details - Click to view additional information on the selected row
CPU memory utilization - The utilization of all CPUs memory in a single graph, along an adjustable period allows you to see the trends of CPU memory utilization (percentage of CPU memory) in this node.
CPU memory usage - The usage of all CPUs memory in a single graph, along an adjustable period allows you to see the trends of CPU memory usage (in GB or MB of CPU memory) in this node.
For GPUs charts - Click the GPU legend on the right-hand side of the chart, to activate or deactivate any of the GPU lines.
You can click the date picker to change the presented period
You can use your mouse to mark a sub-period in the graph for zooming in, and use the ‘Reset zoom’ button to go back to the preset period
Changes in the period affect all graphs on this screen.
Node
The Kubernetes name of the node
Status
The state of the node. Nodes in the Ready state are eligible for scheduling. If the state is Not ready, the main reason appears in parentheses on the right side of the state field. Hovering over the state lists the reasons why a node is Not ready.
NVLink domain UID
The MNNVL domain ID, extracted from the MNNVL label value. If the MNNVL label key is not the default (nvidia.com/gpu.clique), this field shows the whole label value.
MNNVL domain clique ID
The MNNVL clique ID, extracted from the MNNVL label value. If the MNNVL label key is not the default (nvidia.com/gpu.clique), this field shows an empty value.
Node pool
The name of the associated node pool. By default, every node in the NVIDIA Run:ai platform is associated with the default node pool, if no other node pool is associated
GPU type
The GPU model, for example, H100, or V100
Index
The GPU index, read from the GPU hardware. The same index is used when accessing the GPU directly
Used memory
The amount of memory used by pods and drivers using the GPU (in GB or MB)
Compute utilization
The portion of time the GPU is being used by applications (percentage)
Memory utilization
The portion of the GPU memory that is being used by applications (percentage)
Idle time
The elapsed time since the GPU was last used (i.e. how long the GPU has been idle)
Pod
The Kubernetes name of the pod. The pod name is usually composed of the parent workload name (if there is one) and an index that is unique for that pod instance within the workload
Status
The state of the pod. In steady state this should be Running, together with the amount of time the pod has been running
Project
The NVIDIA Run:ai project name the pod belongs to. Clicking this field takes you to the Projects table filtered by this project name
Workload
The workload name the pod belongs to. Clicking this field takes you to the Workloads table filtered by this workload name
Image
The full path of the image used by the main container of this pod
Creation time
The pod’s creation date and time

If it exists in the IDP, it allows Researcher containers to start with the relevant Linux supplementary groups. The IDP attribute must be a list of integers.
Defines the user attribute in the IDP holding the user's email address, which is the user identifier in NVIDIA Run:ai
User first name
firstName
Used as the user’s first name appearing in the NVIDIA Run:ai user interface
User last name
lastName
Used as the user’s last name appearing in the NVIDIA Run:ai user interface


Copy the Redirect URL to be used in your OpenShift OAuth client
Optional: Enter the user attributes and their values in the identity provider as shown in the table below
Click SAVE
Optional: Enable Auto-Redirect to SSO to automatically redirect users to your configured identity provider’s login page when accessing the platform.
Defines the user attribute in the IDP holding the user's email address, which is the user identifier in NVIDIA Run:ai
User first name
firstName
Used as the user’s first name appearing in the NVIDIA Run:ai platform
User last name
lastName
Used as the user’s last name appearing in the NVIDIA Run:ai platform
In the identity provider sign-in page, log in with the SSO user to whom you granted access rules
If you are unable to sign in to the identity provider, follow the Troubleshooting section below
Validate the user’s groups attribute is mapped correctly
Advanced:
Open the Chrome DevTools: Right-click on page → Inspect → Console tab
Run the following command to retrieve and copy the user’s token: localStorage.token;
Paste in https://jwt.io
Under the Payload section validate the value of the user’s attributes
Validate that the configured Client Secret matches the Client Secret value in the OAuthClient Kubernetes object.
Advanced: Look for the specific error message in the URL address
User role groups
GROUPS
If it exists in the IDP, it allows you to assign NVIDIA Run:ai role groups via the IDP. The IDP attribute must be a list of strings.
Linux User ID
UID
If it exists in the IDP, it allows researcher containers to start with the Linux User UID. Used to map access to network resources such as file systems to users. The IDP attribute must be of type integer.
Linux Group ID
GID
If it exists in the IDP, it allows researcher containers to start with the Linux Group GID. The IDP attribute must be of type integer.
Supplementary Groups
SUPPLEMENTARYGROUPS
If it exists in the IDP, it allows researcher containers to start with the relevant Linux supplementary groups. The IDP attribute must be a list of integers.
Collection period
The period in which the data was collected

gpu-fraction
A portion of GPU memory as a double-precision floating-point number. Example: 0.25, 0.75.
Pod annotation (metadata.annotations)
gpu-memory
Memory size in MiB. Example: 2500, 4096. The gpu-memory values are always in MiB.
Pod annotation (metadata.annotations)
gpu-fraction-num-devices
The number of GPU devices to allocate using the specified gpu-fraction or gpu-memory value. Set this annotation only if you want to request multiple GPU devices.
Pod annotation (metadata.annotations)
RUNAI_GPU_MEMORY_LIMIT
To use for gpu-fraction - Specify a double-precision floating-point number. Example: 0.95
To use for gpu-memory - Specify a Kubernetes resource quantity format. Example: 500000000, 2500M
The limit must be equal to or greater than the GPU fractional memory request.
Environment variable in the first container
Replace <POSTGRESQL_PORT> with the port number where PostgreSQL is running.
Replace <POSTGRESQL_DB> with the name of your PostgreSQL database.
Replace <SQL_FILE> with the path to the SQL script created in the previous step.
tar xvf runai-airgapped-package-<VERSION>.tar.gz
export REGISTRY_URL=<DOCKER REGISTRY ADDRESS>
sudo ./setup.sh
Verify that the changes have been applied. After saving the file, the API server should automatically restart since it is managed as a static pod. Confirm that the kube-apiserver-<master-node-name> pod in the kube-system namespace has restarted and is running with the new configuration. You can run the following command to check the pod status:
Verify the flags are applied by inspecting the running API server container:
Follow the Rancher documentation here to locate the API server container ID.
Run the following command:
Confirm that the OIDC flags have been added correctly to the container's configuration.
If you're using Rancher UI:
Add the required flags during the cluster provisioning process.
Navigate to: Cluster Management > Create, select RKE2, and choose your platform.
In the Cluster Configuration screen, go to: Advanced > Additional API Server Args.
Add the required OIDC flags as <key>=<value> (e.g. oidc-username-prefix=-).
Configure the OIDC provider for username-password authentication. Make sure to use the required OIDC flags:
Or, configure the OIDC provider for single-sign-on. Make sure to use the required OIDC flags:
Update the runaiconfig with the Anthos Identity Service endpoint. First, get the external IP of the gke-oidc-envoy service:
Then, patch the runaiconfig to use this endpoint. Replace the IP address below with the actual external IP of the gke-oidc-envoy service:
Associate a new identity provider. Use the required OIDC flags.
The process can take up to 30 minutes.
Verify that the changes have been applied. After saving the file, the API server should automatically restart since it's managed as a static pod. Confirm that the kube-apiserver-<master-node-name> pod in the kube-system namespace has restarted and is running with the new configuration. You can run the following command to check the pod status:
reports:
  s3_config:
    bucket: "<BUCKET_NAME>"
kubectl rollout restart deployment runai-backend-metrics-service runai-backend-org-unit-service -n runai-backend
run.ai/swap-enabled=true
spec:
  global:
    core:
      swap:
        enabled: true
        limits:
          cpuRam: 100Gi
kubectl patch -n runai runaiconfigs.run.ai/runai --type='merge' --patch '{"spec":{"global":{"core":{"swap":{"enabled": true, "limits": {"cpuRam": "100Gi"}}}}}}'
spec:
  global:
    core:
      swap:
        limits:
          reservedGpuRam: 2Gi
kubectl patch -n runai runaiconfigs.run.ai/runai --type='merge' --patch '{"spec":{"global":{"core":{"swap":{"limits":{"reservedGpuRam": <quantity>}}}}}}'
apiVersion: v1
kind: Pod
metadata:
  annotations:
    user: test
    gpu-fraction: "0.5"
    gpu-fraction-num-devices: "2"
  labels:
    runai/queue: test
  name: multi-fractional-pod-job
  namespace: test
spec:
  containers:
    - image: gcr.io/run-ai-demo/quickstart-cuda
      imagePullPolicy: Always
      name: job
      env:
        - name: RUNAI_VERBOSE
          value: "1"
        - name: RUNAI_GPU_MEMORY_LIMIT
          value: "0.95"
      resources:
        limits:
          cpu: 200m
          memory: 200Mi
        requests:
          cpu: 100m
          memory: 100Mi
      securityContext:
        capabilities:
          drop: ["ALL"]
  schedulerName: runai-scheduler
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 5
tar xvf runai-airgapped-package-<VERSION>.tar.gz
export REGISTRY_URL=<DOCKER REGISTRY ADDRESS>
sudo ./setup.sh
-- Create a new database for runai
CREATE DATABASE <DATABASE_NAME>;
-- Create the role with login and password
CREATE ROLE <ROLE_NAME> WITH LOGIN PASSWORD '<ROLE_PASSWORD>';
-- Grant all privileges on the database to the role
GRANT ALL PRIVILEGES ON DATABASE <DATABASE_NAME> TO <ROLE_NAME>;
-- Connect to the newly created database
\c <DATABASE_NAME>
-- grafana
CREATE ROLE grafana WITH LOGIN PASSWORD '<GRAFANA_PASSWORD>';
CREATE SCHEMA grafana authorization grafana;
ALTER USER grafana set search_path='grafana';
-- Exit psql
\q
psql --host <POSTGRESQL_HOST> \
--user <POSTGRESQL_USER> \
--port <POSTGRESQL_PORT> \
--dbname <POSTGRESQL_DB> \
-a -f <SQL_FILE>
kubectl get pods -n kube-system kube-apiserver-<master-node-name> -o yaml
docker inspect <kube-api-server-container-id>
kubectl get clientconfig default -n kube-public -o yaml > login-config.yaml
yq -i e ".spec +={\"authentication\":[{\"name\":\"oidc\",\"oidc\":{\"clientID\":\"runai\",\"issuerURI\":\"$OIDC_ISSUER_URL\",\"kubectlRedirectURI\":\"http://localhost:8000/callback\",\"userClaim\":\"sub\",\"userPrefix\":\"-\"}}]}" login-config.yaml
kubectl apply -f login-config.yaml
kubectl get clientconfig default -n kube-public -o yaml > login-config.yaml
yq -i e ".spec +={\"authentication\":[{\"name\":\"oidc\",\"oidc\":{\"clientID\":\"runai\",\"issuerURI\":\"$OIDC_ISSUER_URL\",\"groupsClaim\":\"groups\",\"kubectlRedirectURI\":\"http://localhost:8000/callback\",\"userClaim\":\"sub\",\"userPrefix\":\"-\"}}]}" login-config.yaml
kubectl apply -f login-config.yaml
kubectl get svc -n anthos-identity-service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
gke-oidc-envoy   LoadBalancer   10.37.3.111   39.201.319.10   443:31545/TCP   12h
kubectl -n runai patch runaiconfig runai -p '{"spec": {"researcher-service": {"args": {"gkeOidcEnvoyHost": "35.236.229.19"}}}}' --type="merge"
kubectl get pods -n kube-system kube-apiserver-<master-node-name> -o yaml
containers:
- command:
...
- --oidc-client-id=runai
- --oidc-issuer-url=https://<HOST>/auth/realms/runai
- --oidc-username-prefix=-
kube-api:
always_pull_images: false
extra_args:
oidc-client-id: runai #
...
gcloud container clusters update <gke-cluster-name> \
  --enable-identity-service --project=<gcp-project-name> --zone=<gcp-zone-name>
kube-apiserver-arg:
- "oidc-client-id=runai" #
...
controlPlane.secretKeys.clientSecret
Click +NEW CLUSTER
Enter a unique name for your cluster
Optional: Choose the NVIDIA Run:ai cluster version (latest, by default)
Enter the Cluster URL. For more information, see Fully Qualified Domain Name requirement.
Click Continue
Installing NVIDIA Run:ai cluster
In the next Section, the NVIDIA Run:ai cluster installation steps will be presented.
Follow the installation instructions and run the commands provided on your Kubernetes cluster.
Click DONE
The cluster is displayed in the table with the status Waiting to connect. Once installation is complete, the cluster status changes to Connected.
Tip: Use the --dry-run flag to gain an understanding of what is being installed before the actual installation. For more details, see Understanding cluster access roles.
Click +NEW CLUSTER
Enter a unique name for your cluster
Optional: Choose the NVIDIA Run:ai cluster version (latest, by default)
Enter the Cluster URL. For more information, see Fully Qualified Domain Name requirement.
Click Continue
Installing NVIDIA Run:ai cluster
In the next Section, the NVIDIA Run:ai cluster installation steps will be presented.
Follow the installation instructions and run the commands provided on your Kubernetes cluster.
On the second tab of the cluster wizard, when copying the helm command for installation, you will need to use the pre-provided installation file instead of using helm repositories. As such:
Do not add the helm repository and do not run helm repo update.
Instead, edit the helm upgrade command.
Replace runai/runai-cluster with runai-cluster-<VERSION>.tgz.
Add --set global.image.registry=<DOCKER REGISTRY ADDRESS> where the registry address is as entered in the section
Add --set global.customCA.enabled=true as described
The command should look like the following:
Click DONE
The cluster is displayed in the table with the status Waiting to connect. Once installation is complete, the cluster status changes to Connected.
Tip: Use the --dry-run flag to gain an understanding of what is being installed before the actual installation. For more details, see Understanding cluster access roles.
Click +NEW CLUSTER
Enter a unique name for your cluster
Optional: Choose the NVIDIA Run:ai cluster version (latest, by default)
Enter the Cluster URL
Click Continue
Installing NVIDIA Run:ai cluster
In the next Section, the NVIDIA Run:ai cluster installation steps will be presented.
Follow the installation instructions and run the commands provided on your Kubernetes cluster.
Click DONE
The cluster is displayed in the table with the status Waiting to connect. Once installation is complete, the cluster status changes to Connected.
Click +NEW CLUSTER
Enter a unique name for your cluster
Optional: Choose the NVIDIA Run:ai cluster version (latest, by default)
Enter the Cluster URL
Click Continue
Installing NVIDIA Run:ai cluster
In the next Section, the NVIDIA Run:ai cluster installation steps will be presented.
Follow the installation instructions and run the commands provided on your Kubernetes cluster.
On the second tab of the cluster wizard, when copying the helm command for installation, you will need to use the pre-provided installation file instead of using helm repositories. As such:
Do not add the helm repository and do not run helm repo update.
Instead, edit the helm upgrade command.
Replace runai/runai-cluster with runai-cluster-<VERSION>.tgz.
Add --set global.image.registry=<DOCKER REGISTRY ADDRESS> where the registry address is as entered in the section
Add --set global.customCA.enabled=true as described
The command should look like the following:
Click DONE
The cluster is displayed in the table with the status Waiting to connect. Once installation is complete, the cluster status changes to Connected.
chmod +x ./preinstall-diagnostics-<platform> && \
./preinstall-diagnostics-<platform> \
--domain ${CONTROL_PLANE_FQDN} \
--cluster-domain ${CLUSTER_FQDN} \
#if the diagnostics image is hosted in a private registry
--image-pull-secret ${IMAGE_PULL_SECRET_NAME} \
--image ${PRIVATE_REGISTRY_IMAGE_URL}
#Save the image locally
docker save --output preinstall-diagnostics.tar gcr.io/run-ai-lab/preinstall-diagnostics:${VERSION}
#Load the image to the organization's registry
docker load --input preinstall-diagnostics.tar
docker tag gcr.io/run-ai-lab/preinstall-diagnostics:${VERSION} ${CLIENT_IMAGE_AND_TAG}
docker push ${CLIENT_IMAGE_AND_TAG}
chmod +x ./preinstall-diagnostics-darwin-arm64 && \
./preinstall-diagnostics-darwin-arm64 \
--domain ${CONTROL_PLANE_FQDN} \
--cluster-domain ${CLUSTER_FQDN} \
--image-pull-secret ${IMAGE_PULL_SECRET_NAME} \
--image ${PRIVATE_REGISTRY_IMAGE_URL}
From computer - Click the Metadata XML file field, then select your file for upload
From URL - In the Metadata XML field, enter the URL to the IDP Metadata XML file
You can either copy the Redirect URL and Entity ID displayed on the screen and enter them in your identity provider, or use the service provider metadata XML, which contains the same information in XML format. This file becomes available after you click SAVE in step 7.
Optional: Enter the user attributes and their values in the identity provider as shown in the table below
Click SAVE. After save, click Open service provider metadata XML to access the metadata file. This file can be used to configure your identity provider.
Optional: Enable Auto-Redirect to SSO to automatically redirect users to your configured identity provider’s login page when accessing the platform.
Defines the user attribute in the IDP holding the user's email address, which is the user identifier in NVIDIA Run:ai.
User first name
firstName
Used as the user’s first name appearing in the NVIDIA Run:ai platform.
User last name
lastName
Used as the user’s last name appearing in the NVIDIA Run:ai platform.
In the identity provider sign-in page, log in with the SSO user to whom you granted access rules
If you are unable to sign in to the identity provider, follow the Troubleshooting section below
You can view the identity provider URL, identity provider entity ID, and the certificate expiration date
In the identity provider box, check for a "Certificate expired” error
If it is expired, update the SAML metadata file to include a valid certificate
Advanced:
Open the Chrome DevTools: Right-click on page → Inspect → Console tab
Run the following command to retrieve and copy the user’s token: localStorage.token;
Paste in https://jwt.io
Under the Payload section validate the values of the user’s attributes
Go to the NVIDIA Run:ai login screen
Open the Chrome Network inspector: Right-click → Inspect on the page → Network tab
On the sign-in page click CONTINUE WITH SSO.
Once redirected to the Identity Provider, search in the Chrome network inspector for an HTTP request showing the SAML Request. Depending on the IDP url, this would be a request to the IDP domain name. For example, accounts.google.com/idp?1234.
When found, go to the Payload tab and copy the value of the SAML Request
Paste the value into a SAML decoder (e.g. )
Validate the request:
The content of the <saml:Issuer> tag is the same as the Entity ID given when configuring SSO
The content of the AssertionConsumerServiceURL is the same as the Redirect URL given when configuring SSO
Validate the response:
The user email under the <saml2:Subject> tag is the same as the logged-in user
Make sure that under the <saml2:AttributeStatement> tag, there is an Attribute named email (lowercase). This attribute is mandatory.
User role groups
GROUPS
If it exists in the IDP, it allows you to assign NVIDIA Run:ai role groups via the IDP. The IDP attribute must be a list of strings.
Linux User ID
UID
If it exists in the IDP, it allows Researcher containers to start with the Linux User UID. Used to map access to network resources such as file systems to users. The IDP attribute must be of type integer.
Linux Group ID
GID
If it exists in the IDP, it allows Researcher containers to start with the Linux Group GID. The IDP attribute must be of type integer.
Supplementary Groups
SUPPLEMENTARYGROUPS
If it exists in the IDP, it allows Researcher containers to start with the relevant Linux supplementary groups. The IDP attribute must be a list of integers.
Supported
NVIDIA Run:ai communicates with GitHub by defining it as an asset
Hugging Face
Repositories
Supported
NVIDIA Run:ai provides an out of the box integration with
JupyterHub
Development
Community Support
It is possible to submit NVIDIA Run:ai workloads via JupyterHub.
Jupyter Notebook
Development
Supported
NVIDIA Run:ai provides integrated support with Jupyter Notebooks. See example.
Karpenter
Cost Optimization
Supported
NVIDIA Run:ai provides out of the box support for Karpenter to save cloud costs. Integration notes with Karpenter can be found .
MPI
Training
Supported
NVIDIA Run:ai provides out of the box support for submitting MPI workloads via API, CLI or UI. See for more details.
Kubeflow notebooks
Development
Community Support
It is possible to launch a Kubeflow notebook with the NVIDIA Run:ai Scheduler. Sample code: .
Kubeflow Pipelines
Orchestration
Community Support
It is possible to schedule kubeflow pipelines with the NVIDIA Run:ai Scheduler. Sample code: .
MLFlow
Model Serving
Community Support
It is possible to use ML Flow together with the NVIDIA Run:ai Scheduler.
PyCharm
Development
Supported
Containers created by NVIDIA Run:ai can be accessed via PyCharm.
PyTorch
Training
Supported
NVIDIA Run:ai provides out of the box support for submitting PyTorch workloads via API, CLI or UI. See for more details.
Ray
Training, inference, data processing
Community Support
It is possible to schedule Ray jobs with the NVIDIA Run:ai Scheduler. Sample code: .
SeldonX
Orchestration
Community Support
It is possible to schedule Seldon Core workloads with the NVIDIA Run:ai Scheduler.
Spark
Orchestration
Community Support
It is possible to schedule Spark workflows with the NVIDIA Run:ai Scheduler.
S3
Storage
Supported
NVIDIA Run:ai communicates with S3 by defining a asset
TensorBoard
Experiment tracking
Supported
NVIDIA Run:ai comes with a preset TensorBoard asset
TensorFlow
Training
Supported
NVIDIA Run:ai provides out of the box support for submitting TensorFlow workloads via API, CLI or UI. See for more details.
Triton
Orchestration
Supported
Usage via docker base image
VScode
Development
Supported
Containers created by NVIDIA Run:ai can be accessed via Visual Studio Code. You can automatically launch Visual Studio code web from the NVIDIA Run:ai console.
Weights & Biases
Experiment tracking
Community Support
It is possible to schedule W&B workloads with the NVIDIA Run:ai Scheduler. Sample code: .
XGBoost
Training
Supported
NVIDIA Run:ai provides out of the box support for submitting XGBoost via API, CLI or UI. See for more details.
Apache Airflow
Orchestration
Community Support
It is possible to schedule Airflow workflows with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Apache Airflow.
Argo workflows
Orchestration
Community Support
It is possible to schedule Argo workflows with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Argo Workflows.
ClearML
Experiment tracking
Community Support
It is possible to schedule ClearML workloads with the NVIDIA Run:ai Scheduler.
Docker Registry
Repositories
Supported
NVIDIA Run:ai allows using a docker registry as a Credential asset
GitHub
Storage
The NVIDIA Run:ai platform enables administrators, researchers, and MLOps engineers to fully leverage GB200 NVL72 systems and other NVLink-based domains without requiring deep knowledge of hardware configurations or NVLink topologies. Key capabilities include:
Automatic detection and labeling
Detects GB200 NVL72 nodes and identifies MNNVL domains (e.g., GB200 NVL72 racks).
Automatically detects whether a node pool contains GB200 NVL72.
Supports manual override of GB200 MNNVL detection and label key for future compatibility and improved resiliency.
Simplified distributed workload submission
Allows seamless submission of distributed workloads into GB200-based node pools, eliminating all the complexities involved with that operation on top of GB200 MNNVL domains.
Abstracts away the complexity of configuring workloads for NVL domains.
Flexible support for NVLink domain variants
Compatible with current and future NVL domain configurations.
Supports any number of domains or GB200 racks.
Enhanced monitoring and visibility
Provides detailed NVIDIA Run:ai dashboards for monitoring GB200 nodes and MNNVL domains by node pool.
Control and customization
Offers manual override and label configuration for greater resiliency and future-proofing.
Enables advanced users to fine-tune GB200 scheduling behavior based on workload requirements.
Kubernetes version - Requires Kubernetes 1.32 or later.
NVIDIA GPU Operator - Install NVIDIA GPU Operator version 25.3 or above. See the NVIDIA GPU Operator section for installation instructions. This version must include the associated Dynamic Resource Allocation (DRA) driver, which provides support for GB200 accelerated networking resources and the ComputeDomain feature. For detailed steps on installing the DRA driver and configuring ComputeDomain, refer to NVIDIA Dynamic Resource Allocation (DRA) Driver.
NVIDIA Network Operator - Install the NVIDIA Network Operator. See the NVIDIA Network Operator section for installation instructions.
Enable GPU network acceleration - After installation, update runaiconfig using the GPUNetworkAccelerationEnabled=True flag to enable GPU network acceleration. This triggers an update of the NVIDIA Run:ai workload-controller deployment and restarts the controller. See for more details.
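For reference, a minimal sketch of enabling the flag with kubectl, following the same runaiconfig patch pattern used elsewhere in this document. The exact location of GPUNetworkAccelerationEnabled within the runaiconfig spec is assumed here and should be verified against your runaiconfig schema:
# Sketch only: the key path below is assumed, verify it for your version
kubectl patch -n runai runaiconfigs.run.ai/runai --type='merge' \
  --patch '{"spec":{"global":{"core":{"GPUNetworkAccelerationEnabled": true}}}}'
# The NVIDIA Run:ai workload-controller deployment is then updated and restarted automatically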
Administrators must define dedicated node pools that align with GB200 NVL72 rack topologies. These node pools ensure that workloads are isolated to nodes with NVLink interconnects and are not scheduled on incompatible hardware. Each node pool can be manually configured in the NVIDIA Run:ai platform and associated with specific node labels. Two key configurations are required for each node pool:
Node Labels – Identify nodes equipped with GB200.
MNNVL Domain Discovery – Specify how the platform detects whether the node pool includes NVLink-connected nodes.
To create a node pool with GPU network acceleration, see Node pools.
To enable the NVIDIA Run:ai Scheduler to recognize GB200-based nodes, administrators must:
Use the default node label provided by the NVIDIA GPU Operator - nvidia.com/gpu.clique.
Or, apply a custom label that clearly marks the node as GB200/MNNVL capable.
This node label serves as the basis for identifying appropriate nodes and ensuring workloads are scheduled on the correct hardware.
The administrator can configure how the NVIDIA Run:ai platform detects MNNVL domains for each node pool. The available options include:
Automatic Discovery – Uses the default label key nvidia.com/gpu.clique, or a custom label key specified by the administrator. The NVIDIA Run:ai platform automatically discovers MNNVL domains within node pools. If a node is labeled with the MNNVL label key, the NVIDIA Run:ai platform indicates this node pool as MNNVL detected. MNNVL detected node pools are treated differently by the NVIDIA Run:ai platform when submitting a distributed training workload.
Manual Discovery – The platform does not evaluate any node labels. Detection is based solely on the administrator’s configuration of the node pool as MNNVL “Detected” or “Not Detected.”
When automatic discovery is enabled, all GB200 nodes that are part of the same physical rack (NVL72 or other future topologies) are part of the same NVL Domain and automatically labeled by the GPU Operator with a common label using a unique label value per domain and sub-domain. The default label key set by the NVIDIA GPU Operator is nvidia.com/gpu.clique and its value consists of - <NVL Domain ID (ClusterUUID)>.<Clique ID> :
The NVL Domain ID (ClusterUUID) is a unique identifier that represents the physical NVL domain, for example, a physical GB200 NVL72 rack.
The Clique ID denotes a logical MNNVL sub-domain. A clique represents a further logical split of the MNNVL into smaller domains that enable secure, fast, and isolated communication between pods running on different GB200 nodes within the same GB200 NVL72.
The Nodes table provides more information on which GB200 NVL72 domain each node belongs to, and which Clique ID it is associated with.
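To inspect the same information directly from the cluster, you can list the label values with kubectl. The node names and label values in the commented output are illustrative only:
kubectl get nodes -L nvidia.com/gpu.clique
# NAME            STATUS   ROLES    AGE   GPU.CLIQUE
# gb200-node-01   Ready    worker   12d   <ClusterUUID>.<CliqueID>
# gb200-node-02   Ready    worker   12d   <ClusterUUID>.<CliqueID>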
When a distributed training workload is submitted to an MNNVL-detected node pool, the NVIDIA Run:ai platform automates several key configuration steps to ensure optimal workload execution:
ComputeDomain creation - The NVIDIA Run:ai platform creates a ComputeDomain Custom Resource Definition (CRD), which is a proprietary resource used to manage NVLink-based domain assignments.
Resource Claim injection - A reference to the ComputeDomain is automatically added to the workload specification as a resource claim, allowing the Scheduler to link the workload to a specific NVLink domain.
Pod affinity configuration - Pod affinity is applied using a Preferred policy with the MNNVL label key (e.g., nvidia.com/gpu.clique) as the topology key. This ensures that pods within the distributed workload are located on nodes with NVLink interconnects.
Node affinity configuration - Node affinity is also applied using a Preferred policy based on the same label key, further guiding the Scheduler to place workloads within the correct node group.
These additional steps are crucial for the creation of underlying HW resources (also known as IMEX channels) and stickiness of the distributed workload to MNNVL topologies and nodes. When a distributed workload is stopped or evicted, the platform automatically removes the corresponding ComputeDomain.
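To illustrate the affinity rules the platform injects, the result roughly corresponds to the following pod-spec fragment. This is a simplified sketch, not the exact specification generated by NVIDIA Run:ai; the label selector is a placeholder for the workload’s own pod-group label:
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          topologyKey: nvidia.com/gpu.clique   # MNNVL label key used as the topology key
          labelSelector:
            matchLabels:
              pod-group-name: <workload-pod-group>   # placeholder selector
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: nvidia.com/gpu.clique   # prefer nodes that belong to an MNNVL domain
              operator: Exists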
When submitting a distributed workload, you should explicitly specify a list of one or more MNNVL detected node pools, or a list of one or more non-MNNVL detected node pools. A mix of MNNVL detected and non-MNNVL detected node pools is not supported. A GB200 MNNVL node pool is a pool that contains at least one node belonging to an MNNVL domain.
Other workload types (not distributed) can include a list of mixed MNNVL and non-MNNVL node pools, from which the Scheduler will choose.
MNNVL node pools can include any size of MNNVL domains (i.e. NVL72 and any future domain size) and support any Grace-Blackwell models (GB200 and any future models).
To support the submission of larger distributed workloads, it is recommended to group as many GB200 racks as possible into fewer node pools. When possible, use a single GB200 node pool, unless there is a specific operational reason to divide resources across multiple node pools.
When submitting distributed training workloads with the controller pod set as a distinct non-GPU workload, the MNNVL feature should be used with the default Preferred mode as explained in the below section.
You can influence how the Scheduler places distributed training workloads into GB200 MNNVL node pools using the Topology field available in the distributed training workload submission form.
Confine a workload to a single GB200 MNNVL domain - To ensure the workload is scheduled within a single GB200 MNNVL domain (e.g., a GB200 NVL72 rack), apply a topology label with a Required policy using the MNNVL label key (nvidia.com/gpu.clique). This instructs the Scheduler to strictly place all pods within the same MNNVL domain. If the workload exceeds 18 pods (or 72 GPUs), the Scheduler will not be able to find a matching domain and will fail to schedule the workload.
Try to schedule a workload using a Preferred topology - To guide the Scheduler to prioritize a specific topology without enforcing it, apply a topology label with a policy of Preferred. You can apply any topology label with a Preferred policy. These labels are treated with higher scheduling weight than the default Preferred pod affinity automatically applied by NVIDIA Run:ai for MNNVL.
Mandate a custom topology - To force scheduling a workload into a custom topology, add a topology label with a policy of Required. This ensures the workload is strictly scheduled according to the specified topology. Keep in mind that using a Required policy can significantly constrain scheduling. If matching resources are not available, the Scheduler may fail to place the workload.
You can customize how the NVIDIA Run:ai platform applies the MNNVL feature to each distributed training workload. This allows you to override the default behavior when needed. To configure this behavior, set the proprietary label key run.ai/MNNVL in the General settings section of the distributed training workload submission form. The following values are supported:
None - Disables the MNNVL feature for the workload. The platform does not create a ComputeDomain and no pod affinity or node affinity is applied by default.
Preferred (default) - Indicates that MNNVL feature is preferred but not required. This is the default behavior when submitting a distributed training workload:
If the workload is submitted to a 'non-MNNVL detected' node pool, then the NVIDIA Run:ai platform does not add a ComputeDomain, ComputeDomain claim, pod affinity or node affinity for MNNVL nodes.
Otherwise, if the workload is submitted to a 'MNNVL detected' node pool, then the NVIDIA Run:ai platform automatically adds: ComputeDomain, ComputeDomain claim, NodeAffinity and PodAffinity both with a Preferred policy and using the MNNVL label.
If you manually add an additional Preferred topology label, it will be given higher scheduling weight than the default embedded pod affinity (which has weight = 1).
Required - Enforces a strict use of MNNVL domains for the workload. The workload must be scheduled on MNNVL supported nodes:
The NVIDIA Run:ai platform creates a ComputeDomain and ComputeDomain claim.
The NVIDIA Run:ai platform will automatically add a node affinity rule with a Required policy using the appropriate label.
Pod affinity is set to Preferred by default, but you can override it manually with a Required pod affinity rule using the MNNVL label key or another custom label.
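As an illustration, the override is carried as a label on the workload with one of the supported values. The placement shown here as a plain Kubernetes metadata label is a sketch; the exact placement depends on how you submit the workload (for example, the labels field in the General settings of the submission form):
metadata:
  labels:
    run.ai/MNNVL: "Required"   # supported values: None, Preferred (default), Required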
If the DRA driver is not installed correctly in the cluster, particularly if the required CRDs are missing, and the MNNVL feature is enabled in the NVIDIA Run:ai platform, the workload controller will enter a crash loop. This will continue until the DRA driver is properly installed with all necessary CRDs or the MNNVL feature is disabled in the NVIDIA Run:ai platform.
To run workloads on a GB200 node pool (i.e., a node pool detected as MNNVL-enabled), the workload must explicitly request that node pool. To prevent unintentional use of MNNVL-detected node pools, administrators must ensure these node pools are not included in any project's default list of node pools.
Only one distributed training workload per node can use GB200 accelerated networking resources. If GPUs remain unused on that node, other workload types may still utilize them.
If a GB200 node fails, any associated pod will be re-scheduled, causing the entire distributed workload to fail and restart. On non-GB200 nodes, this scenario may be self-healed by the Scheduler without impacting the entire workload.
If a pod from a distributed training workload fails or is evicted by the Scheduler, it must be re-scheduled on the same node. Otherwise, the entire workload will be evicted and, in some cases, re-queued.
Elastic distributed training workloads are not supported with MNNVL.
Workloads created in versions earlier than 2.21 do not include GB200 MNNVL node pools and are therefore not expected to experience compatibility issues.
If a node pool that was previously used in a workload submission is later updated to include GB200 nodes (i.e., becomes a mixed node pool), the workload submitted before version 2.21 will not use any accelerated networking resources, although it may still run on GB200 nodes.
For the AI practitioner:
Reduced wait time - Workloads with smaller GPU requests are more likely to be scheduled quickly, minimizing delays in accessing resources.
Increased workload capacity - More workloads can be run using the same admin-defined GPU quota and available unused resources - over quota.
For the platform administrator:
Improved GPU utilization - Sharing GPUs across workloads increases the utilization of individual GPUs, resulting in better overall platform efficiency.
Higher resource availability - More users gain access to GPU resources, ensuring better distribution.
Enhanced workload throughput - More workloads can be served per GPU, ensuring maximum output from existing hardware.
When planning the quota distribution for your projects and departments, using fractions gives the platform administrator the ability to allocate quotas more precisely per project and department, assuming GPU fractions are used or enforced with pre-defined policies or compute resource templates.
For example, in an organization with a department budgeted for two nodes of 8×H100 GPUs and a team of 32 researchers:
Allocating 0.5 GPU per researcher ensures all researchers have access to GPU resources.
Using fractions enables researchers to run smaller workloads intermittently within their quota or go over their quota by using temporary over quota resources with higher resource demanding workloads.
Using GPUs for notebook-based model development, where GPUs are not continuously active and can be shared among multiple users.
For more details on mapping your organization and resources, see Adapting AI initiatives to your organization.
When a workload is submitted, the Scheduler finds a node with a GPU that can satisfy the requested GPU portion or GPU memory, then it schedules the pod to that node. The NVIDIA Run:ai GPU fractions logic, running locally on each NVIDIA Run:ai worker node, allocates the requested memory size on the selected GPU. Each pod uses its own separate virtual memory address space. NVIDIA Run:ai’s GPU fractions logic enforces the requested memory size, so no workload can use more than requested, and no workload can run over another workload’s memory. This gives users the experience of a ‘logical GPU’ per workload.
While MIG requires administrative work to configure every MIG slice, where a slice is a fixed chunk of memory, GPU fractions allow dynamic and fully flexible allocation of GPU memory chunks. By default, GPU fractions use NVIDIA’s time-slicing to share the GPU compute runtime. You can also use the NVIDIA Run:ai GPU time-slicing which allows dynamic and fully flexible splitting of the GPU compute time.
NVIDIA Run:ai GPU fractions are agile and dynamic allowing a user to allocate and free GPU fractions during the runtime of the system, at any size between zero to the maximum GPU portion (100%) or memory size (up to the maximum memory size of a GPU).
The NVIDIA Run:ai Scheduler can work alongside other schedulers. In order to avoid collisions with other schedulers, the NVIDIA Run:ai Scheduler creates special reservation pods. Once a workload is submitted requesting a fraction of a GPU, NVIDIA Run:ai will create a pod in a dedicated runai-reservation namespace with the full GPU as a resource, allowing other schedulers to understand that the GPU is reserved.
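You can observe these reservation pods directly; each GPU that hosts at least one fractional workload has a corresponding pod in the dedicated namespace:
kubectl get pods -n runai-reservation -o wide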
NVIDIA Run:ai also supports workload submission using multi-GPU fractions. Multi-GPU fractions work similarly to single-GPU fractions, however, the NVIDIA Run:ai Scheduler allocates the same fraction size on multiple GPU devices within the same node. For example, if practitioners develop a new model that uses 8 GPUs and requires 40GB of memory per GPU, they can allocate 8×40GB with multi-GPU fractions instead of reserving the full memory of each GPU (e.g. 80GB). This leaves 40GB of GPU memory available on each of the 8 GPUs for other workloads within that node.
Time sharing, where a single GPU can serve multiple fractional workloads, remains unchanged; a GPU can now serve workloads that use multi-GPU fractions, single-GPU fractions, or a mix of both.
Selecting a GPU portion using percentages as units does not guarantee an exact memory size. For example, 50% of an A100 40GB GPU is 20GB, while 50% of an A100 80GB GPU is 40GB. To have better control over the exact allocated memory, specify the exact memory size, e.g. 40GB.
Using NVIDIA Run:ai GPU fractions controls the memory split (i.e. 0.5 GPU means 50% of the GPU memory) but not the compute (processing time). To split the compute time, see NVIDIA Run:ai’s GPU time slicing.
NVIDIA Run:ai GPU fractions and MIG mode cannot be used on the same node.
Using the compute resources asset, you can define the compute requirements by specifying your requested GPU portion or GPU memory, and use it with any of the NVIDIA Run:ai workload types for single GPU and multi-GPU fractions.
Single-GPU fractions - Define the compute requirement to run 1 GPU device, by specifying either a fraction (percentage) of the overall memory or a memory request (GB, MB).
Multi-GPU fractions - Define the compute requirement to run multiple GPU devices, by specifying either a fraction (percentage) of the overall memory or a memory request (GB, MB).
To enable GPU fractions for workloads submitted via Kubernetes YAML, use the following annotations to define the GPU fraction configuration. You can configure either gpu-fraction or gpu-memory. Make sure the default scheduler is set to runai-scheduler. See Using the Scheduler with third-party workloads for more details.
gpu-fraction
A portion of GPU memory as a double-precision floating-point number. Example: 0.25, 0.75.
Pod annotation (metadata.annotations)
gpu-memory
Memory size in MiB. Example: 2500, 4096. The gpu-memory values are always in MiB.
Pod annotation (metadata.annotations)
gpu-fraction-num-devices
The number of GPU devices to allocate using the specified gpu-fraction or gpu-memory value. Set this annotation only if you want to request multiple GPU devices.
Pod annotation (metadata.annotations)
The following example YAML creates a pod that requests 2 GPU devices, each requesting 50% of memory (gpu-fraction: "0.5") .
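The key parts are the fraction annotations, the runai/queue label, and the runai-scheduler scheduler name; this is a condensed version of the full pod manifest shown earlier in this document:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    gpu-fraction: "0.5"
    gpu-fraction-num-devices: "2"
  labels:
    runai/queue: test
  name: multi-fractional-pod-job
  namespace: test
spec:
  schedulerName: runai-scheduler
  containers:
    - name: job
      image: gcr.io/run-ai-demo/quickstart-cuda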
To view the available actions, go to the CLI v2 reference or the CLI v1 reference and run according to your workload.
To view the available actions, go to the API reference and run according to your workload.
Every newly created pod is assigned to a pod group, which can represent one or multiple pods within a workload. For example, a distributed PyTorch training workload with 32 workers is grouped into a single pod group. All pods are attached to the pod group with certain rules, such as gang scheduling, applied to the entire pod group.
A scheduling queue (or simply a queue) represents a scheduler primitive that manages the scheduling of workloads based on different parameters.
A queue is created for each project/node pool pair and department/node pool pair. The NVIDIA Run:ai Scheduler supports hierarchical queueing, project queues are bound to department queues, per node pool. This allows an organization to manage quota, over quota and more for projects and their associated departments.
Each project and department includes a set of deserved resource quotas, per node pool and resource type. For example, project “LLM-Train/Node Pool NV-H100” quota parameters specify the number of GPUs, CPUs(cores), and the amount of CPU memory that this project deserves to get when using this node pool. Non-preemptible workloads can only be scheduled if their requested resources are within the deserved resource quotas of their respective project/node-pool and department/node-pool.
Projects and departments can have a share in the unused resources of any node pool, beyond their quota of deserved resources. These resources are referred to as over quota resources. The administrator configures the over quota parameters per node pool for each project and department.
Projects can receive a share of the cluster/node pool unused resources when the over quota weight setting is enabled. The part each Project receives depends on its over quota weight value, and the total weights of all other projects’ over quota weights. The administrator configures the over quota weight parameters per node pool for each project and department.
Each project has a set of guaranteed resource quotas (GPUs, CPUs, and CPU memory) per node pool. Projects can go over quota and get a share of the unused resources in a node pool beyond their guaranteed quota in that node pool. The same applies to Departments. The Scheduler balances the amount of over quota between departments, and then between projects. The department’s deserved quota and over quota limit the sum of resources of all projects, within the department. If the project shows it has deserved quota, but the department deserved quota is exhausted, the Scheduler will not give the project anymore deserved resources. The same applies to over quota resources. over quota resources are first given to the department, and only then split among its projects.
The NVIDIA Run:ai Scheduler calculates a numerical value, fairshare, per project (or department) for each node pool, representing the project’s (department’s) sum of guaranteed resources plus the portion of non-guaranteed resources in that node pool.
The Scheduler aims to provide each project (or department) the resources they deserve per node pool using two main parameters: deserved quota and deserved fairshare (i.e. quota + over quota resources). If one project’s node pool queue is below fairshare and another project’s node pool queue is above fairshare, the Scheduler shifts resources between queues to balance fairness. This may result in the preemption of some over quota preemptible workloads.
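As a simplified illustration with hypothetical numbers: suppose a node pool has 20 GPUs, project A has a quota of 6 GPUs with an over quota weight of 2, and project B has a quota of 4 GPUs with an over quota weight of 3. The 10 non-guaranteed GPUs are split by weight, 4 to project A and 6 to project B, so the fairshare values are 6 + 4 = 10 GPUs for project A and 4 + 6 = 10 GPUs for project B. If project A’s queue is consuming above its fairshare while project B’s queue is below fairshare and has pending workloads, the Scheduler may preempt project A’s over quota preemptible workloads to restore the balance.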
Over-subscription is a scenario where the sum of all guaranteed resource quotas surpasses the physical resources of the cluster or node pool. In this case, there may be scenarios in which the Scheduler cannot find matching nodes to all workload requests, even if those requests were within the resource quota of their associated projects.
The administrator can set a placement strategy, bin-pack or spread, of the Scheduler per node pool. For GPU based workloads, workloads can request both GPU and CPU resources. For CPU-only based workloads, workloads can request CPU resources only.
GPU workloads:
Bin-pack - The Scheduler places as many workloads as possible in each GPU and node to use fewer resources and maximize GPU and node vacancy.
Spread - The Scheduler spreads workloads across as many GPUs and nodes as possible to minimize the load and maximize the available resources per workload.
CPU workloads:
Bin-pack - The Scheduler places as many workloads as possible in each CPU and node to use fewer resources and maximize CPU and node vacancy.
Spread - The Scheduler spreads workloads across as many CPUs and nodes as possible to minimize the load and maximize the available resources per workload.
NVIDIA Run:ai supports scheduling workloads using different priority and preemption policies:
High-priority workloads (pods) can preempt lower priority workloads (pods) within the same scheduling queue (project), according to their preemption policy. The NVIDIA Run:ai Scheduler implicitly assumes any PriorityClass >= 100 is non-preemptible and any PriorityClass < 100 is preemptible.
Cross project and cross department workload preemptions are referred to as resource reclaim and are based on fairness between queues rather than the priority of the workloads.
To make it easier for users to submit workloads, NVIDIA Run:ai preconfigured several Kubernetes PriorityClass objects. The NVIDIA Run:ai preset PriorityClass objects have their ‘preemptionPolicy’ always set to ‘PreemptLowerPriority’, regardless of their actual NVIDIA Run:ai preemption policy within the NVIDIA Run:ai platform. A non-preemptible workload is only scheduled if in-quota and cannot be preempted after being scheduled, not even by a higher priority workload.
Inference
125
Non-preemptible
PreemptLowerPriority
Build
100
Non-preemptible
PreemptLowerPriority
Interactive-preemptible
75
Preemptible
PreemptLowerPriority
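Workload priority can be set at submission time. For example, using the CLI flag shown earlier in this document, where <priority-class> is one of the preset classes listed above:
runai training submit --priority <priority-class>
runai workspace submit --priority <priority-class>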
Workload priority is always respected within a project. This means higher priority workloads are scheduled before lower priority workloads. It also means that higher priority workloads may preempt lower priority workloads within the same project if the lower priority workloads are preemptible.
Fairness is a major principle within the NVIDIA Run:ai scheduling system. It means that the NVIDIA Run:ai Scheduler always respects certain resource splitting rules (fairness) between projects and between departments.
Reclaim is an inter-project (and inter-department) scheduling action that takes back resources from one project (or department) that has used them as over quota, back to a project (or department) that deserves those resources as part of its deserved quota, or to balance fairness between projects, each to its fairshare (i.e. sharing fairly the portion of the unused resources).
Gang scheduling describes a scheduling principle where a workload composed of multiple pods is either fully scheduled (i.e. all pods are scheduled and running) or fully pending (i.e. all pods are not running). Gang scheduling refers to a single pod group.
Now that you have learned the key concepts and principles of the NVIDIA Run:ai Scheduler, see how the Scheduler works - allocating pods to workloads, applying preemption mechanisms, and managing resources.
Data volumes are disabled, by default. If you cannot see Data volumes, then it must be enabled by your Administrator, under General settings → Workloads → Data volumes.
Data volumes are supported only for flexible workload submission.
Sharing with multiple scopes - Data volumes can be shared across different scopes in a cluster, including projects and departments. Using data volumes allows for data reuse and collaboration within the organization.
Storage saving - A single copy of the data can be used across multiple scopes
Sharing large datasets - In large organizations, the data is often stored in a remote location, which can be a barrier for large model training. Even if the data is transferred into the cluster, sharing it easily with multiple users is still challenging. Data volumes can help share the data seamlessly, with maximum security and control.
Sharing data with colleagues - When sharing training results, generated datasets, or other artifacts with team members is needed, data volumes can help make the data available easily.
To create a data volume, you must have a PVC data source already created. Make sure the PVC includes data before sharing it.
The data volumes table can be found under Workload manager in the NVIDIA Run:ai platform.
The data volumes table provides a list of all the data volumes defined in the platform and allows you to manage them.
The data volumes table comprises the following columns:
Data volume
The name of the data volume
Description
A description of the data volume
Status
The different lifecycle and representation of the data volume condition
Scope
The scope of the data volume within the organizational tree. Click the scope name to view the organizational tree diagram
Origin project
The project of the origin PVC
Origin PVC
The original PVC from which the data volume was created that points to the same PV
The following table describes the data volumes' condition and whether they were created successfully for the selected scope.
No issues found
No issues were found while creating the data volume
Issues found
Issues were found while sharing the data volume. Contact NVIDIA Run:ai support.
Creating…
The data volume is being created
Deleting...
The data volume is being deleted
No status / “-”
When the data volume’s scope is an account, the current version of the cluster is not up to date, or the asset is not a cluster-syncing entity, the status can’t be displayed
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Refresh - Click REFRESH to update the table with the latest data
To create a new data volume:
Click +NEW DATA VOLUME
Enter a name for the data volume. The name must be unique.
Optional: Provide a description of the data volume
Set the project where the data is located
Set a PVC from which to create the data volume
Set the scopes that will be able to mount the data volume
Click CREATE DATA VOLUME
To edit a data volume:
Select the data volume you want to edit
Click Edit
Click SAVE DATA VOLUME
To copy an existing data volume:
Select the data volume you want to copy
Click MAKE A COPY
Enter a name for the data volume. The name must be unique.
Set a new Origin PVC for your data volume, since only one Origin PVC can be used per data volume
Click CREATE DATA VOLUME
To delete a data volume:
Select the data volume you want to delete
Click DELETE
Confirm you want to delete the data volume
To view the available actions, go to the Data volumes API reference.
The NVIDIA Run:ai v2.21 What's New provides a detailed summary of the latest features, enhancements, and updates introduced in this version. It serves as a guide to help users, administrators, and researchers understand the new capabilities and how to leverage them for improved workload management, resource optimization, and more.
Streamlined workload submission with a customizable form - The new customizable submission form allows you to submit workloads by selecting and modifying an existing setup or providing your own settings. This enables faster, more accurate submissions that align with organizational policies and individual workload needs. Beta From cluster v2.18 onward
Feature high level details:
Flexible submission options - Choose from an existing setup and customize it, or start from scratch and provide your own settings for a one-time setup.
Improved visibility - Review existing setups and understand their associated policy definitions.
One-time data sources setup - Configure a data source as part of your one-time setup for a specific workload.
Unified experience - Use the new form for all workload types: , , , and
Support for JAX distributed training workloads - You can now submit distributed training workloads using the JAX framework via the UI, API, and CLI. This enables you to leverage JAX for scalable, high-performance training, making it easier to run and manage JAX-based workloads seamlessly within NVIDIA Run:ai. See for more details. From cluster v2.21 onward
Pod restart policy for all workload types - A restart policy can be configured to define how pods are restarted when they terminate. The policy is set at the workload level across all workload types via the API and CLI. For distributed training workloads, restart policies can be set separately for master and worker pods. This enhancement ensures workloads are restarted efficiently, minimizing downtime and optimizing resource usage. From cluster v2.21 onward
New environment presets - Added new NVIDIA Run:ai environment presets when running in a host-based routing cluster - vscode, rstudio, jupyter-scipy, tensorboard-tensorflow. See for more details. From cluster v2.21 onward
Support for PVC size expansion - Adjust the size of Persistent Volume Claims (PVCs) via the API, leveraging the allowVolumeExpansion field of the storage class resource. This enhancement enables you to dynamically adjust storage capacity as needed.
Improved visibility of storage class configurations - When creating new PVCs or volumes, the UI now displays access modes, volume modes, and size options based on administrator-defined storage class configurations. This update ensures consistency, increases transparency, and helps prevent misconfigurations during setup.
New default CLI - CLI v2 is the default command-line interface. CLI v1 has been deprecated as of version 2.20.
Secret volume mapping for workloads - You can now map secrets to volumes when submitting workloads using the --secret-volume flag. This feature is available for all workload types - workspaces, training, and inference.
Support for environment field references in submit commands - A new flag, fieldRef, has been added to all submit commands to support environment field references in a key:value format. This enhancement enables dynamic injection of environment variables directly from pod specifications, offering greater flexibility during workload submission.
Support for inference workloads via CLI v2 - You can now run inference workloads directly from the command-line interface. This update enables greater automation and flexibility for managing inference workloads. See for more details.
Enhanced rolling inference updates - Rolling inference updates allow ML engineers to apply live updates to existing inference workloads—regardless of their current status (e.g., running or pending)—without disrupting critical services. Experimental
This capability is now supported for both and workloads, with a new UI flow that aligns with the API functionality introduced in v2.19.
Enhancements to the Overview dashboard - The Overview dashboard includes optimization insights for projects and departments, providing real-time visibility into GPU resource allocation and utilization. These insights help department and project managers make more informed decisions about quota management, ensuring efficient resource usage.
Dashboard UX improvements:
Improved visibility of metrics in the Resources utilization widget by repositioning them above the graphs.
Enhanced resource prioritization for projects and departments - Admins can now define and manage SLAs tailored to specific projects and departments via the UI, ensuring resource allocation aligns with real business priorities. This enhancement empowers admins to assign strict priority to over-quota resources, extending control beyond the existing over-quota weight system. From cluster v2.20 onward
This feature allows administrators to:
Set the priority of each department relative to other departments within the same node pool.
Updated access control for audit logs - Only users with tenant-wide permissions have the ability to access audit logs, ensuring proper access control and data security. This update reinforces security and compliance by restricting access to sensitive system logs. It ensures that only authorized users can view audit logs, reducing the risk of unauthorized access and potential data exposure.
Slack API integration for notifications - A new API allows organizations to receive notifications directly to Slack. This feature enhances real-time communication and monitoring by enabling users to stay informed about workload statuses. See for more details.
Improved visibility into user roles and access scopes - Individual users can now view their assigned roles and scopes directly in their settings. This enhancement provides greater transparency into user permissions, allowing individuals to easily verify their access levels. It helps users understand what actions they can perform and reduces dependency on administrators for access-related inquiries. See for more details.
Added auto-redirect to SSO - To deliver a consistent and streamlined login experience across customer applications, users accessing the NVIDIA Run:ai login page will be automatically redirected to SSO, bypassing the standard login screen entirely. This can be enabled via a toggle after an Identity Provider is added, and is available through both the UI and API. See for more details.
SAML service provider metadata XML - After configuring SAML IDP, the service provider metadata XML is now available for download to simplify integration with identity providers. See
Added Data volumes to the UI - Administrators can now create and manage data volumes directly from the UI and share data across different scopes in a cluster, including projects and departments. See for more details. Experimental From cluster v2.19 onward
Support for NVIDIA GB200 NVL72 and MultiNode NVLink systems - NVIDIA Run:ai offers full support for NVIDIA’s most advanced MultiNode NVLink (MNNVL) systems, including NVIDIA GB200, NVIDIA GB200 NVL72 and its derivatives. NVIDIA Run:ai simplifies the complexity of managing and submitting workloads on these systems by automating infrastructure detection, domain labeling, and distributed job submission via the UI, CLI, or API. See for more details. From cluster v2.21 onward
Automatic cleanup of resources for failed workloads - When a workload fails due to infrastructure issues, its resources can be automatically cleaned up using failureResourceCleanupPolicy, reducing the resource consumption of failed workloads. From cluster v2.21 onward
Custom pod labels and annotations - Add custom labels and annotations to pods in both the control plane and cluster. This new capability enables service mesh deployment in NVIDIA Run:ai. This feature provides greater flexibility in workload customization and management, allowing users to integrate with service meshes more easily. See for more details.
NVIDIA Run:ai now supports NVIDIA GPU Operator version 25.3.
NVIDIA Run:ai now supports OpenShift version 4.18.
NVIDIA Run:ai now supports Kubeflow Training Operator 1.9.
Kubernetes version 1.29 is no longer supported.
Using the Cluster API to submit NVIDIA Run:ai workloads via YAML was deprecated starting from NVIDIA Run:ai version 2.18. For cluster version 2.18 and above, use the Workloads API to submit workloads. The Cluster API documentation has also been removed from v2.20 and above.
The NVIDIA Run:ai control plane is a Kubernetes application. This section explains the required hardware and software system requirements for the NVIDIA Run:ai control plane. Before you start, make sure to review the Installation overview.
The machine running the installation script (typically the Kubernetes master) must have:
At least 50GB of free space
Docker installed
Helm 3.14 or later
The following hardware requirements are for the control plane system nodes. By default, all NVIDIA Run:ai control plane services run on all available nodes.
x86 - Supported for both Kubernetes and OpenShift deployments.
ARM - Supported for Kubernetes only. ARM is currently not supported for OpenShift.
This configuration is the minimum requirement you need to install and use NVIDIA Run:ai control plane:
If the NVIDIA Run:ai control plane is planned to be installed on the same Kubernetes cluster as the NVIDIA Run:ai cluster, make sure the cluster hardware requirements are considered in addition to the NVIDIA Run:ai control plane hardware requirements.
The following software requirements must be fulfilled.
Any Linux operating system supported by both Kubernetes and NVIDIA GPU Operator
Internal tests are being performed on Ubuntu 22.04 and CoreOS for OpenShift.
Nodes are required to be synchronized by time using NTP (Network Time Protocol) for proper system functionality.
NVIDIA Run:ai control plane requires Kubernetes. The following Kubernetes distributions are supported:
Vanilla Kubernetes
OpenShift Container Platform (OCP)
NVIDIA Base Command Manager (BCM)
Elastic Kubernetes Engine (EKS)
See the following Kubernetes version support matrix for the latest NVIDIA Run:ai releases:
For information on supported versions of managed Kubernetes, it's important to consult the release notes provided by your Kubernetes service provider. There, you can confirm the specific version of the underlying Kubernetes platform supported by the provider, ensuring compatibility with NVIDIA Run:ai. For an up-to-date end-of-life statement see or .
The NVIDIA Run:ai control plane uses a namespace or project (OpenShift) called runai-backend. Use the following command to create the namespace/project:
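On Kubernetes:

kubectl create namespace runai-backend

On OpenShift:

oc new-project runai-backend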
The NVIDIA Run:ai control plane requires a default storage class to create persistent volume claims for NVIDIA Run:ai storage. The storage class, as per Kubernetes standards, controls the reclaim behavior, whether the NVIDIA Run:ai persistent data is saved or deleted when the NVIDIA Run:ai control plane is deleted.
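For example, to mark an existing storage class (here, local-path) as the default:

kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'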
The NVIDIA Run:ai control plane requires an ingress controller to be installed.
OpenShift, RKE and RKE2 come with a pre-installed ingress controller.
Internal tests are being performed on NGINX, Rancher NGINX, OpenShift Router, and Istio.
Make sure that a default ingress controller is set.
There are many ways to install and configure different ingress controllers. The following shows a simple example to install and configure the NGINX ingress controller using Helm:
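helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace nginx-ingress --create-namespace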
You must have a Fully Qualified Domain Name (FQDN) to install the NVIDIA Run:ai control plane (ex: runai.mycorp.local). This cannot be an IP. The FQDN must be resolvable within the organization's private network.
You must have a TLS certificate that is associated with the FQDN for HTTPS access. Create a secret named runai-backend-tls in the runai-backend namespace and include the path to the TLS --cert and its corresponding private --key by running the following:
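# Replace /path/to/fullchain.pem and /path/to/private.pem with the actual paths to your TLS certificate and private key
kubectl create secret tls runai-backend-tls -n runai-backend \
    --cert /path/to/fullchain.pem \
    --key /path/to/private.pem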
NVIDIA Run:ai uses the OpenShift default Ingress router for serving. The TLS certificate configured for this router must be issued by a trusted CA. For more details, see the OpenShift documentation on .
A local certificate authority serves as the root certificate for organizations that cannot use a publicly trusted certificate authority. Follow the below steps to configure the local certificate authority.
In air-gapped environments, you must configure and install the local CA's public key in the Kubernetes cluster. This is required for the installation to succeed:
Add the public key to the runai-backend namespace:
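kubectl -n runai-backend create secret generic runai-ca-cert \
    --from-file=runai-ca.pem=<ca_bundle_path>

On OpenShift, use oc instead of kubectl:

oc -n runai-backend create secret generic runai-ca-cert \
    --from-file=runai-ca.pem=<ca_bundle_path>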
When installing the control plane, make sure the following flag is added to the helm command --set global.customCA.enabled=true. See .
The NVIDIA Run:ai control plane installation includes a default PostgreSQL database. However, you may opt to use an existing PostgreSQL database if you have specific requirements or preferences, as detailed in the additional configurations. Ensure that your PostgreSQL database is version 16 or higher.
This quick start provides a step-by-step walkthrough for running a Jupyter Notebook workspace using GPU fractions.
NVIDIA Run:ai’s GPU fractions provide an agile and easy-to-use method to share one or more GPUs across workloads. With GPU fractions, you can divide GPU memory into smaller chunks and share GPU compute resources between different workloads and users, resulting in higher GPU utilization and more efficient resource allocation.
Before you start, make sure:
You have created a project or have one created for you.
The project has an assigned quota of at least 0.5 GPU.
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Workspace
Select under which cluster to create the workload
Select the project in which your workspace will run
Select the newly created workspace with the Jupyter application that you want to connect to
Click CONNECT
Select the Jupyter tool. The selected tool is opened in a new tab on your browser.
To connect to the Jupyter Notebook, browse directly to https://<COMPANY-URL>/<PROJECT-NAME>/<WORKLOAD-NAME>
Manage and monitor your newly created workload using the Workloads table.
This section explains the procedure to manage workload policies.
The Workload policies table can be found under Policies in the NVIDIA Run:ai platform.
The Workload policies table provides a list of all the policies defined in the platform, and allows you to manage them.
The Workload policies table consists of the following columns:
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
To create a new policy:
Click +NEW POLICY
Select a scope
Select the workload type
Click +POLICY YAML
Select the policy you want to edit
Click EDIT
Update the policy and click APPLY
Click SAVE POLICY
Listed below are issues that might occur when creating or editing a policy via the YAML Editor:
To view a policy:
Select the policy you want to view
Click VIEW POLICY
In the Policy form per workload section, view the workload rules and defaults:
Parameter - The workload submission parameter that Rules and Defaults are applied to
Select the policy you want to delete
Click DELETE
On the dialog, click DELETE to confirm the deletion
Go to the API reference to view the available actions.
curl -fsSL https://raw.githubusercontent.com/run-ai/public/main/installation/get-installation-logs.sh

apiVersion: v1
kind: Pod
metadata:
  annotations:
    user: test
    gpu-fraction: "0.5"
    gpu-fraction-num-devices: "2"
  labels:
    runai/queue: test
  name: multi-fractional-pod-job
  namespace: test
spec:
  containers:
  - image: gcr.io/run-ai-demo/quickstart-cuda
    imagePullPolicy: Always
    name: job
    env:
    - name: RUNAI_VERBOSE
      value: "1"
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 100Mi
    securityContext:
      capabilities:
        drop: ["ALL"]
  schedulerName: runai-scheduler
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 5

If other, optional user attributes (groups, firstName, lastName, uid, gid) are mapped, make sure they also exist under <saml2:AttributeStatement> along with their respective values.
If any of the targeted node pools do not support MNNVL or if the workload (or any of its pods) does not request GPU resources, the workload will fail to run.
Preemptible
PreemptLowerPriority
50
Preemptible
PreemptLowerPriority
Last updated
The last time the policy was updated
Refresh - Click REFRESH to update the table with the latest data
In the YAML editor type or paste a YAML policy with defaults and rules. You can utilize the following references and examples:
Click SAVE POLICY
Policy can’t be saved for some reason
The policy couldn't be saved due to a network or other unknown issue. Download your draft and try pasting and saving it again later.
Possible cluster connectivity issues. Try updating the policy once again at a different time.
Policies were submitted before version 2.18, you upgraded to version 2.18 or above and wish to submit new policies
If you have policies and want to create a new one, first contact NVIDIA Run:ai support to prevent potential conflicts
Contact NVIDIA Run:ai support. R&D can migrate your old policies to the new version.
Type (applicable for data sources only) - The data source type (Git, S3, NFS, PVC, etc.)
Default - The default value of the Parameter
Rule - Sets up a constraint on a workload policy field
Source - The origin of the applied policy (cluster, department or project)
Policy
The policy name which is a combination of the policy scope and the policy type
Type
The policy type is per NVIDIA Run:ai workload type. This allows administrators to set different policies for each workload type.
Status
Representation of the policy lifecycle (one of the following - “Creating…”, “Updating…”, “Deleting…”, Ready or Failed)
Scope
The scope the policy affects. Click the name of the scope to view the organizational tree diagram. You can only view the parts of the organizational tree for which you have permission to view.
Created by
The user who created the policy
Creation time
The timestamp for when the policy was created
Cluster connectivity issues
There's no communication from cluster “cluster_name“. Actions may be affected, and the data may be stale.
Verify that you are on a network that has been allowed access to the cluster. Reach out to your cluster administrator for instructions on verifying the issue.
Policy can’t be applied due to a rule that is occupied by a different policy
Field “field_name” already has rules in cluster: “cluster_id”
Remove the rule from the new policy or adjust the old policy for the specific rule.
Policy is not visible in the UI
-
Check that the policy hasn’t been deleted.
Policy syntax is not valid
Add a valid policy YAML; json: unknown field "field_name"
For correct syntax check the Policy YAML reference or the Policy YAML examples.

Cluster
The cluster that the data volume is associated with
Created by
The user who created the data volume
Creation time
The timestamp for when the data volume was created
Last updated
The timestamp of when the data volume was last updated

Azure Kubernetes Service (AKS)
Oracle Kubernetes Engine (OKE)
Rancher Kubernetes Engine (RKE1)
Rancher Kubernetes Engine 2 (RKE2)
NVIDIA Run:ai version - Supported Kubernetes versions - Supported OpenShift versions
v2.21 (latest) - 1.30 to 1.32 - 4.14 to 4.18
v2.17 - 1.27 to 1.29 - 4.12 to 4.15
v2.18 - 1.28 to 1.30 - 4.12 to 4.16
v2.19 - 1.28 to 1.31 - 4.12 to 4.17
v2.20 - 1.29 to 1.32 - 4.14 to 4.17

Minimum hardware requirements for the NVIDIA Run:ai control plane:
CPU - 10 cores
Memory - 12GB
Disk space - 110GB
helm upgrade -i runai-cluster runai-cluster-<VERSION>.tgz \
    --set controlPlane.url=... \
    --set controlPlane.clientSecret=... \
    --set cluster.uid=... \
    --set cluster.url=... --create-namespace \
    --set global.image.registry=registry.mycompany.local \
    --set global.customCA.enabled=true

kubectl create namespace runai-backend

oc new-project runai-backend

kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm upgrade -i nginx-ingress ingress-nginx/ingress-nginx \
    --namespace nginx-ingress --create-namespace \
    --set controller.kind=DaemonSet \
    --set controller.service.externalIPs="{<INTERNAL-IP>,<EXTERNAL-IP>}" # Replace <INTERNAL-IP> and <EXTERNAL-IP> with the internal and external IP addresses of one of the nodes

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace nginx-ingress --create-namespace

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace ingress-nginx --create-namespace \
    --set controller.service.annotations.oci.oraclecloud.com/load-balancer-type=nlb \
    --set controller.service.annotations.oci-network-load-balancer.oraclecloud.com/is-preserve-source=True \
    --set controller.service.annotations.oci-network-load-balancer.oraclecloud.com/security-list-management-mode=None \
    --set controller.service.externalTrafficPolicy=Local \
    --set controller.service.annotations.oci-network-load-balancer.oraclecloud.com/subnet=<SUBNET-ID> # Replace <SUBNET-ID> with the subnet ID of one of your cluster subnets

# Replace /path/to/fullchain.pem and /path/to/private.pem with the actual paths to your TLS certificate and private key
kubectl create secret tls runai-backend-tls -n runai-backend \
    --cert /path/to/fullchain.pem \
    --key /path/to/private.pem

kubectl -n runai-backend create secret generic runai-ca-cert \
    --from-file=runai-ca.pem=<ca_bundle_path>

oc -n runai-backend create secret generic runai-ca-cert \
    --from-file=runai-ca.pem=<ca_bundle_path>

Workload priority class management for training workloads - You can now change the default priority class of training workloads within a project, via the API or CLI, by selecting from predefined priority class values. This influences the workload’s position in the project scheduling queue managed by the Run:ai Scheduler, ensuring critical training jobs are prioritized and resources are allocated more efficiently. See Workload priority class control for more details. From cluster v2.18 onward
ConfigMaps as environment variables - Use predefined ConfigMaps as environment variables during environment setup or workload submission. From cluster v2.21 onward
Improved scope selection experience - The scope mechanism has been improved to reduce clicks and enhance usability. The organization tree now opens by default at the cluster level for quicker navigation. Scope search now includes alphabetical sorting and supports browsing non-displayed scopes. You can also use keyboard shortcuts: Escape to cancel, or click outside the modal to close it. These improvements apply across templates, policies, projects, and all workload assets.
Improved PVC visibility and selection for researchers - Use runai pvc to list existing PVCs within your scope, making it easier to reference available options when submitting workloads. A noun auto-completion has been introduced for storage, streamlining the selection process. The workload describe command also includes a PVC section, improving visibility into persistent volume claims. These enhancements provide greater clarity and efficiency in storage utilization.
Enhanced workload deletion options - The runai workload delete command now supports deleting multiple workloads by specifying a list of workload names (e.g., workload-a, workload-b, workload-c).
Compute resources can now be updated via API and UI. From cluster v2.21 onward
Support for NVIDIA Cloud Functions (NVCF) external workloads - NVIDIA Run:ai enables you to deploy, schedule and manage NVCF workloads as external workloads within the platform. See Deploy NVIDIA Cloud Functions (NVCF) in NVIDIA Run:ai for more details. From cluster v2.21 onward
Added validation for Knative - You can now only submit inference workloads if Knative is properly installed. This ensures workloads are deployed successfully by preventing submission when Knative is misconfigured or missing. From cluster v2.21 onward
Enhancements in Hugging Face workloads. For more details, see Deploy inference workloads from Hugging Face:
Added Hugging Face model authentication - NVIDIA Run:ai validates whether a user-provided token grants access to a specific model, in addition to checking if a model requires a token and verifying the token format. This enhancement ensures that users can only load models they have permission to access, improving security and usability. From cluster v2.18 onward
Introduced model store support using data sources - Select a data source to serve as a model store, caching model weights to reduce loading time and avoid repeated downloads. This improves performance and deployment speed, especially for frequently used models, minimizing the need to re-authenticate with external sources.
Improved model selection - Select a model from a drop-down list. The list is partial and consists only of models that were tested. From cluster v2.18 onward
Enhanced Hugging Face environment control - Choose between vLLM, TGI, or any other custom container image by selecting an image tag and providing additional arguments. By default, workloads use the official vLLM or TGI containers, with full flexibility to override the image and customize runtime settings for more controlled and adaptable inference deployments. From cluster v2.18 onward
Updated authentication for NIM model access - You can now authenticate access to NIM models using tokens or credentials, ensuring a consistent, flexible, and secure authentication process. See Deploy inference workloads with NVIDIA NIM for more details. From cluster v2.19 onward
Added support for volume configuration - You can now set volumes for custom inference workloads. This feature allows inference workloads to allocate and retain storage, ensuring continuity and efficiency in inference execution. From cluster v2.20 onward
Renamed and updated the "Workloads by type" widget to provide clearer insights into cluster usage with a focus on workloads.
Improved user experience by moving the date picker to a dedicated section within the overtime widgets, Resources allocation and Resources utilization.
Set specific GPU resource limits for both departments and projects.
Expanded SSO OpenID Connect authentication support - SSO OpenID Connect authentication supports attribute mapping of groups in both list and map formats. In map format, the group name is used as the value. This applies to new identity providers only. See Set up SSO with OpenID Connect for more details.
Improved permission error messaging - Enhanced clarity when attempting to delete a user with higher privileges, making it easier to understand and resolve permission-related actions.
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Flexible and click CONTINUE
Click the load icon. A side pane appears, displaying a list of available environments. Select the ‘jupyter-lab’ environment for your workspace (Image URL: jupyter/scipy-notebook)
If ‘jupyter-lab’ is not displayed in the gallery, follow the below steps to create a one-time environment configuration:
Enter the jupyter-lab Image URL - jupyter/scipy-notebook
Tools - Set the connection for your tool
Click +TOOL
Select Jupyter tool from the list
Set the runtime settings for the environment. Click +COMMAND & ARGUMENTS and add the following:
Enter the command - start-notebook.sh
Enter the arguments - --NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''
Note: If host-based routing is enabled on the cluster, enter the --NotebookApp.token='' only.
Click the load icon. A side pane appears, displaying a list of available compute resources. Select the ‘small-fraction’ compute resource for your workspace.
If ‘small-fraction’ is not displayed in the gallery, follow the below steps to create a one-time compute resource configuration:
Set GPU devices per pod - 1
Set GPU memory per device
Select % (of device) - Fraction of a GPU device’s memory
Set the memory Request - 10 (the workload will allocate 10% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE WORKSPACE
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Workspace
Select under which cluster to create the workload
Select the project in which your workspace will run
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Original and click CONTINUE
Select the ‘jupyter-lab’ environment for your workspace (Image URL: jupyter/scipy-notebook)
If the ‘jupyter-lab’ is not displayed in the gallery, follow the below steps:
Click +NEW ENVIRONMENT
Select the ‘small-fraction’ compute resource for your workspace
If ‘small-fraction’ is not displayed in the gallery, follow the below steps:
Click +NEW COMPUTE RESOURCE
Enter small-fraction as the name for the compute resource. The name must be unique.
Click CREATE WORKSPACE
Copy the following command to your terminal. Make sure to update the below parameters with the name of your project and workload. For more details, see CLI reference:
Copy the following command to your terminal. Make sure to update the below parameters with the name of your project and workload. For more details, see CLI reference:
Copy the following command to your terminal. Make sure to update the below parameters. For more details, see Workspaces API:
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in Step 1
<PROJECT-ID> - The ID of the Project the workload is running on. You can get the Project ID via the API.
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the API.
toolType will show the Jupyter icon when connecting to the Jupyter tool via the user interface.
toolName will show when connecting to the Jupyter tool via the user interface.
To connect to the Jupyter Notebook, browse directly to https://<COMPANY-URL>/<PROJECT-NAME>/<WORKLOAD-NAME>
To connect to the Jupyter Notebook, browse directly to https://<COMPANY-URL>/<PROJECT-NAME>/<WORKLOAD-NAME>
Ingress class
NVIDIA Run:ai uses NGINX as the default ingress controller. If your cluster has a different ingress controller, you can configure the ingress class to be created by NVIDIA Run:ai.
global.ingress.tlsSecretName
TLS secret name
NVIDIA Run:ai requires the creation of a TLS secret for the control plane domain. If the runai-backend namespace already has such a secret, you can set the secret name here
<service-name>.podLabels
Pod labels
Set NVIDIA Run:ai and 3rd party services' pod labels in a format of key/value pairs.
<service-name>
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi
Pod request and limits
Set NVIDIA Run:ai and 3rd party services' resource requests and limits
disableIstioSidecarInjection.enabled
Disable Istio sidecar injection
Disable the automatic injection of Istio sidecars across the entire NVIDIA Run:ai Control Plane services.
global.affinity
System nodes
Sets the system nodes where NVIDIA Run:ai system-level services are scheduled.
Default: Prefer to schedule on nodes that are labeled with node-role.kubernetes.io/runai-system
global.customCA.enabled
Certificate authority
Enables the use of a custom Certificate Authority (CA) in your deployment. When set to true, the system is configured to trust a user-provided CA certificate for secure communication.
The NVIDIA Run:ai control plane chart includes multiple sub-charts of third-party components:
Data store- PostgreSQL (postgresql)
Metrics Store - Thanos (thanos)
Identity & Access Management - Keycloakx (keycloakx)
Analytics Dashboard - Grafana (grafana)
Caching, Queue - NATS (nats)
If you have opted to connect to an external PostgreSQL database, refer to the additional configurations table below. Adjust the following parameters based on your connection details:
Disable PostgreSQL deployment - postgresql.enabled
NVIDIA Run:ai connection details - global.postgresql.auth
Grafana connection details - grafana.dbUser, grafana.dbPassword
postgresql.enabled
PostgreSQL installation
If set to false, PostgreSQL will not be installed.
global.postgresql.auth.host
PostgreSQL host
Hostname or IP address of the PostgreSQL server.
global.postgresql.auth.port
PostgreSQL port
Port number on which PostgreSQL is running.
global.postgresql.auth.username
PostgreSQL username
Username for connecting to PostgreSQL.
thanos.receive.persistence.storageClass
Storage class
The installation is configured to work with a specific storage class instead of the default one.
The keycloakx.adminUser can only be set during the initial installation. The admin password can be changed later through the Keycloak UI, but you must also update the keycloakx.adminPassword value in the Helm chart using helm upgrade. See Changing Keycloak admin password for more details.
keycloakx.adminUser
User name of the internal identity provider administrator
Defines the username for the Keycloak administrator. This can only be set during the initial installation.
keycloakx.adminPassword
Password of the internal identity provider administrator
Defines the password for the Keycloak administrator.
keycloakx.existingSecret
Keycloakx credentials (secret)
Existing secret name with authentication credentials.
global.keycloakx.host
Keycloak (NVIDIA Run:ai internal identity provider) host path
Overrides the DNS for Keycloak. This can be used to access Keycloak externally to the cluster.
You can change the Keycloak admin password after deployment by performing the following steps:
Open the Keycloak UI at: https://<runai-domain>/auth
Sign in with your existing admin credentials as configured in your Helm values
Go to Users and select admin (or your admin username)
Open Credentials → Reset password
Set the new password and click Save
Update the keycloakx.adminPassword value using the helm upgrade command to match the password you set in the Keycloak UI
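For example, a minimal sketch of this last step, assuming the control plane was installed as a Helm release named runai-backend from the runai-backend/control-plane chart into the runai-backend namespace (adjust the release, chart, and namespace names to match your installation):

helm upgrade runai-backend runai-backend/control-plane \
    -n runai-backend \
    --reuse-values \
    --set keycloakx.adminPassword=<NEW_PASSWORD>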
grafana.db.existingSecret
Grafana database connection credentials (secret)
Existing secret name with authentication credentials.
grafana.dbUser
Grafana database username
Username for accessing the Grafana database.
grafana.dbPassword
Grafana database password
Password for the Grafana database user.
grafana.admin.existingSecret
Grafana admin default credentials (secret)
Existing secret name with authentication credentials.
global.ingress.ingressClass
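Putting several of the parameters above together, the following is a hedged sketch of a custom values file that could be passed to the control plane Helm installation with -f (all values are illustrative placeholders; include only the parameters you actually need to override):

global:
  ingress:
    ingressClass: nginx
    tlsSecretName: runai-backend-tls
  customCA:
    enabled: true
  postgresql:
    auth:
      host: postgres.mycorp.local
      port: 5432
      username: runai
      password: <PASSWORD>
postgresql:
  enabled: false   # use the external PostgreSQL defined above instead of the bundled one
thanos:
  receive:
    persistence:
      storageClass: <STORAGE-CLASS>
keycloakx:
  adminUser: admin
  adminPassword: <ADMIN-PASSWORD>
grafana:
  adminUser: admin
  adminPassword: <GRAFANA-PASSWORD>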
NVIDIA Run:ai enhances visibility and simplifies management by monitoring, presenting and orchestrating all AI workloads in the clusters where it is installed. Workloads are the fundamental building blocks for consuming resources, enabling AI practitioners such as researchers, data scientists and engineers to efficiently support the entire life cycle of an AI initiative.
A typical AI initiative progresses through several key stages, each with distinct workloads and objectives. With NVIDIA Run:ai, research and engineering teams can host and manage all these workloads to achieve the following:
Data preparation: Aggregating, cleaning, normalizing, and labeling data to prepare for training.
Training: Conducting resource-intensive model development and iterative performance optimization.
Fine-tuning: Adapting pre-trained models to domain-specific datasets while balancing efficiency and performance.
Inference: Deploying models for real-time or batch predictions with a focus on low latency and high throughput.
Monitoring and optimization: Ensuring ongoing performance by addressing data drift, usage patterns, and retraining as needed.
A workload runs in the cluster, is associated with a namespace, and operates to fulfill its targets, whether that is running to completion for a training workload, allocating resources for interactive sessions in an integrated development environment (IDE)/notebook, or serving requests in production.
The workload, defined by the AI practitioner, consists of:
Container images: This includes the application, its dependencies, and the runtime environment.
Compute resources: CPU, GPU, and RAM to execute efficiently and address the workload’s needs.
Data & storage configuration: The data needed for processing such as training and testing datasets or input from external databases, and the storage configuration which refers to the way this data is managed, stored and accessed.
NVIDIA Run:ai’s core mission is to optimize AI resource usage at scale. This is achieved through efficient scheduling of all cluster workloads using the NVIDIA Run:ai Scheduler. The Scheduler allows the prioritization of workloads across different departments and projects within the organization at large scales, based on the resource distribution set by the system administrator.
NVIDIA Run:ai workloads: These workloads are submitted via the NVIDIA Run:ai platform. They are represented by Kubernetes Custom Resource Definitions (CRDs) and APIs. When using NVIDIA Run:ai workloads, a complete Workload and Scheduling Policy solution is offered for administrators to ensure optimizations, governance and security standards are applied.
Third-party workloads: These workloads are submitted via third-party applications that use the NVIDIA Run:ai Scheduler. The NVIDIA Run:ai platform manages and monitors these workloads. They enable seamless integrations with external tools, allowing teams and individuals flexibility.
Different types of workloads have different levels of support. It is important to understand which capabilities are needed before selecting the workload type to work with. The table below details the level of support for each workload type in NVIDIA Run:ai. NVIDIA Run:ai workloads are fully supported with all of NVIDIA Run:ai’s advanced features and capabilities, while third-party workloads are partially supported. The list of capabilities can change between different NVIDIA Run:ai versions.
This section explains the procedure for managing Node pools.
Node pools assist in managing heterogeneous resources effectively. A node pool is a NVIDIA Run:ai construct representing a set of nodes grouped into a bucket of resources using a predefined node label (e.g. NVIDIA GPU type) or an administrator-defined node label (any key/value pair).
Typically, the grouped nodes share a common feature or property, such as GPU type or other HW capability (such as Infiniband connectivity), or represent a proximity group (i.e. nodes interconnected via a local ultra-fast switch). Researchers and ML Engineers would typically use node pools to run specific workloads on specific resource types.
In the NVIDIA Run:ai Platform, a user with the System administrator role can create, view, edit, and delete node pools. Creating a new node pool creates a new instance of the NVIDIA Run:ai Scheduler. Workloads submitted to a node pool are scheduled using the node pool’s designated scheduler instance.
Once created, the new node pool is automatically assigned to all projects and departments with a quota of zero GPU resources, unlimited CPU resources, and over quota enabled (medium weight if over-quota weight is enabled). This allows any project and department to use any node pool when over quota is enabled, even if the administrator has not assigned a quota for a specific node pool within that project or department.
When submitting a new workload, users can add a prioritized list of node pools. The node pool selector picks one node pool at a time (according to the prioritized list), and the designated node pool scheduler instance handles the submission request and tries to match the requested resources within that node pool. If the scheduler cannot find resources to satisfy the submitted workload, the node pool selector moves the request to the next node pool in the prioritized list. If no node pool satisfies the request, the node pool selector starts from the first node pool again until one of the node pools satisfies the request.
By default, Kubernetes uses its own native scheduler to determine pod placement. The NVIDIA Run:ai platform provides a custom scheduler, runai-scheduler, which is used by default for workloads submitted using the platform. This section outlines how to configure third-party workloads, such as those submitted directly to Kubernetes or through external frameworks, to run with the NVIDIA Run:ai Scheduler, runai-scheduler, instead of the default Kubernetes scheduler.
runai project set "project-name"
runai workspace submit "workload-name" --image jupyter/scipy-notebook \
    --gpu-devices-request 0.1 --command --external-url container=8888 \
    --name-prefix jupyter --command -- start-notebook.sh \
    --NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=

runai config project "project-name"
runai submit "workload-name" --jupyter -g 0.1

curl -L 'https://<COMPANY-URL>/api/v1/workloads/workspaces' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <TOKEN>' \
    -d '{
        "name": "workload-name",
        "projectId": "<PROJECT-ID>",
        "clusterId": "<CLUSTER-UUID>",
        "spec": {
            "command" : "start-notebook.sh",
            "args" : "--NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''",
            "image": "jupyter/scipy-notebook",
            "compute": {
                "gpuDevicesRequest": 1,
                "gpuRequestType": "portion",
                "gpuPortionRequest": 0.1
            },
            "exposedUrls" : [
                {
                    "container" : 8888,
                    "toolType": "jupyter-notebook",
                    "toolName": "Jupyter"
                }
            ]
        }
    }'

global.postgresql.auth.password
PostgreSQL password
Password for the PostgreSQL user specified by global.postgresql.auth.username.
global.postgresql.auth.postgresPassword
PostgreSQL default admin password
Password for the built-in PostgreSQL superuser (postgres).
global.postgresql.auth.existingSecret
Postgres Credentials (secret)
Existing secret name with authentication credentials.
global.postgresql.auth.dbSslMode
Postgres connection SSL mode
Set the SSL mode. See the full list in Protection Provided in Different Modes. Prefer mode is not supported.
postgresql.primary.initdb.password
PostgreSQL default admin password
Set the same password as in global.postgresql.auth.postgresPassword (if changed).
postgresql.primary.persistence.storageClass
Storage class
The installation is configured to work with a specific storage class instead of the default one.
grafana.adminUser
Grafana username
Override the NVIDIA Run:ai default user name for accessing Grafana.
grafana.adminPassword
Grafana password
Override the NVIDIA Run:ai default password for accessing Grafana.
enforceRunaiScheduler is enabled (true) by default. This ensures that any workload associated with a NVIDIA Run:ai project automatically uses the runai-scheduler, including workloads submitted directly to Kubernetes or through external frameworks. If this parameter is disabled (enforceRunaiScheduler=false), workloads will no longer default to the NVIDIA Run:ai Scheduler. In this case, you can still use the NVIDIA Run:ai Scheduler by specifying it manually in the workload YAML.
To use the NVIDIA Run:ai Scheduler, specify it in the workload’s YAML file. This instructs Kubernetes to schedule the workload using the NVIDIA Run:ai Scheduler instead of the default one.
For example:
spec:
  schedulerName: runai-scheduler

apiVersion: v1
kind: Pod
metadata:
  annotations:
    user: test
    gpu-fraction: "0.5"
    gpu-fraction-num-devices: "2"
  labels:
    runai/queue: test
  name: multi-fractional-pod-job
  namespace: test
spec:
  containers:
  - image: gcr.io/run-ai-demo/quickstart-cuda
    imagePullPolicy: Always
    name: job
    env:
    - name: RUNAI_VERBOSE
      value: "1"
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 100Mi
    securityContext:
      capabilities:
        drop: ["ALL"]
  schedulerName: runai-scheduler
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 5

Enter the jupyter-lab Image URL - jupyter/scipy-notebook
Tools - Set the connection for your tool
Click +TOOL
Select Jupyter tool from the list
Set the runtime settings for the environment. Click +COMMAND & ARGUMENTS and add the following:
Enter the command - start-notebook.sh
Enter the arguments - --NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''
Note: If host-based routing is enabled on the cluster, enter the --NotebookApp.token='' only.
Click CREATE ENVIRONMENT
The newly created environment will be selected automatically
Set GPU devices per pod - 1
Set GPU memory per device
Select % (of device) - Fraction of a GPU device’s memory
Set the memory Request - 10 (the workload will allocate 10% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE COMPUTE RESOURCE
The newly created compute resource will be selected automatically
Browse to the provided NVIDIA Run:ai user interface and log in with your credentials.
Run the below --help command to obtain the login options and log in according to your setup:
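runai login --help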
Log in using the following command. You will be prompted to enter your username and password:
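runai login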
To use the API, you will need to obtain a token. See API authentication for more details.
Most capabilities in the support matrix are marked as supported ("v"); Elastic scaling is marked NA for some workload types, and Workload awareness is among the listed capabilities.
The Node pools table can be found under Resources in the NVIDIA Run:ai platform.
The Node pools table lists all the node pools defined in the NVIDIA Run:ai platform and allows you to manage them.
The Node pools table consists of the following columns:
Node pool
The node pool name, set by the administrator during its creation (the node pool name cannot be changed after its creation).
Status
Node pool status. A ‘Ready’ status means the scheduler can use this node pool to schedule workloads. ‘Empty’ status means no nodes are currently included in that node pool.
Label key Label value
The node pool controller will use this node-label key-value pair to match nodes into this node pool.
Node(s)
List of nodes included in this node pool. Click the field to view details (described below in this article).
GPU network acceleration (MNNVL)
Indicates whether Multi-Node NVLink (MNNVL) nodes are discovered automatically or manually
MNNVL label key
The label key that is used to automatically detect if a node is part of an MNNVL domain. The default MNNVL domain label is nvidia.com/gpu.clique.
Click one of the values in the Workload(s) column, to view the list of workloads and their parameters.
Workload
The name of the workload. If the workloads’ type is one of the recognized types (for example: Pytorch, MPI, Jupyter, Ray, Spark, Kubeflow, and many more), an appropriate icon is printed.
Type
The NVIDIA Run:ai platform type of the workload - Workspace, Training, or Inference
Status
The state of the workload. The Workloads state is described in the NVIDIA Run:ai section
Created by
The User or Application created this workload
Running/requested pods
The number of running pods out of the number of requested pods within this workload.
Creation time
The workload’s creation date and time
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Download table - Click MORE and then Click Download as CSV. Export to CSV is limited to 20,000 rows.
Show/Hide details - Click to view additional information on the selected row
Select a row in the Node pools table and then click Show details in the upper-right corner of the action bar. The details window appears, presenting metrics graphs for the whole node pool:
Node GPU allocation - This graph shows an overall sum of the Allocated, Unallocated, and Total number of GPUs for this node pool, over time. From observing this graph, you can learn about the occupancy of GPUs in this node pool, over time.
GPU Utilization Distribution - This graph shows the distribution of GPU utilization in this node pool over time. Observing this graph, you can learn how many GPUs are utilized up to 25%, 25%-50%, 50%-75%, and 75%-100%. This information helps to understand how many available resources you have in this node pool, and how well those resources are utilized by comparing the allocation graph to the utilization graphs, over time.
GPU Utilization - This graph shows the average GPU utilization in this node pool over time. Comparing this graph with the GPU Utilization Distribution helps to understand the actual distribution of GPU occupancy over time.
GPU Memory Utilization - This graph shows the average GPU memory utilization in this node pool over time, for example an average of all nodes’ GPU memory utilization over time.
CPU Utilization - This graph shows the average CPU utilization in this node pool over time, for example, an average of all nodes’ CPU utilization over time.
CPU Memory Utilization - This graph shows the average CPU memory utilization in this node pool over time, for example an average of all nodes’ CPU memory utilization over time.
To create a new node pool:
Click +NEW NODE POOL
Enter a name for the node pool. Node pool names must start with a letter and can only contain lowercase Latin letters, numbers or a hyphen ('-’)
Enter the node pool label: The node pool controller will use this node-label key-value pair to match nodes into this node pool.
Key is the unique identifier of a node label.
The key must fit the following regular expression: ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?/?([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]$
The administrator can use an automatically preset label, such as nvidia.com/gpu.product which labels the GPU type, or any other node label key.
Value is the value of that label identifier (key). The same key may have different values, in this case, they are considered as different labels.
Value must fit the following regular expression: ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$
A node pool is defined by a single key-value pair. Do not use different labels that are set on the same node by different node pools; this situation may lead to unexpected results.
Set the GPU placement strategy:
Bin-pack - Place as many workloads as possible in each GPU and node to use fewer resources and maximize GPU and node vacancy.
Spread - Spread workloads across as many GPUs and nodes as possible to minimize the load and maximize the available resources per workload.
GPU workloads are workloads that request both GPU and CPU resources
Set the CPU placement strategy:
Bin-pack - Place as many workloads as possible in each CPU and node to use fewer resources and maximize CPU and node vacancy.
Spread - Spread workloads across as many CPUs and nodes as possible to minimize the load and maximize the available resources per workload.
CPU workloads are workloads that request purely CPU resources
Set the GPU network acceleration (MNNVL):
Set the discovery method of GPU network acceleration (MNNVL)
Automatic - Automatically identify whether the node pool contains any MNNVL nodes. MNNVL nodes that share the same ID are part of the same NVL rack.
Click CREATE NODE POOL
The administrator can use a preset node label, such as the nvidia.com/gpu.product that labels the GPU type, or configure any other node label (e.g. faculty=physics).
To assign a label to nodes you want to group into a node pool, set a node label on each node:
Obtain the list of nodes and their current labels by copying the following to your terminal:
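kubectl get nodes --show-labels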
Annotate a specific node with a new label by copying the following to your terminal:
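kubectl label node <node-name> <key>=<value>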
Most cloud providers allow you to configure node labels at the node pool level. You can apply labels when creating a cluster, creating a node pool, or by editing an existing node pool.
Ensure that each node is labeled using the Kubernetes label format. This label ensures that workloads are scheduled correctly based on node pool definitions:
Refer to the provider-specific documentation below for guidance on how to configure node pool labels:
Select the node pool you want to edit
Click EDIT
Update the node pool and click SAVE
Select the node pool you want to delete
Click DELETE
On the dialog, click DELETE to confirm the deletion
To view the available actions, go to the Node pools API reference.
This section explains the procedure for managing departments
Departments are a grouping of projects. By grouping projects into a department, you can set quota limitations for a set of projects, create policies that are applied to the department, and create assets that can be scoped to the whole department or a partial group of descendant projects.
For example, in an academic environment, a department can be the Physics Department grouping various projects (AI Initiatives) within the department, or grouping projects where each project represents a single student.
The Departments table can be found under Organization in the NVIDIA Run:ai platform.
The Departments table lists all departments defined for a specific cluster and allows you to manage them. You can switch between clusters by selecting your cluster using the filter at the top.
The Departments table consists of the following columns:
Click one of the values of Node pool(s) with quota column, to view the list of node pools and their parameters
Click one of the values of the Subject(s) column, to view the list of subjects and their parameters. This column is only viewable if your role in the NVIDIA Run:ai system affords you those permissions.
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
To create a new Department:
Click +NEW DEPARTMENT
Select a scope. By default, the field contains the scope of the current UI context cluster, viewable at the top left side of your screen. You can change the current UI context cluster by clicking the ‘Cluster: cluster-name’ field and applying another cluster as the UI context. Alternatively, you can choose another cluster within the ‘+ New Department’ form by clicking the organizational tree icon on the right side of the scope field, opening the organizational tree and selecting one of the available clusters.
Enter a name for the department. Department names must start with a letter and can only contain lowercase Latin letters, numbers or a hyphen ('-’).
To create a new access rule for a department:
Select the department you want to add an access rule for
Click ACCESS RULES
Click +ACCESS RULE
Select a subject
To delete an access rule from a department:
Select the department you want to remove an access rule from
Click ACCESS RULES
Find the access rule you would like to delete
Click on the trash icon
Select the Department you want to edit
Click EDIT
Update the Department and click SAVE
To view the policy of a department:
Select the department for which you want to view its policy. This option is only active if the department has defined policies in place.
Click VIEW POLICY and select the workload type for which you want to view the policies: a. Workspace workload type policy with its set of rules b. Training workload type policies with its set of rules
In the Policy form, view the workload rules that are enforced on your department for the selected workload type, as well as the defaults:
Select the department you want to delete
Click DELETE
On the dialog, click DELETE to confirm the deletion
Select the department you want to review
Click REVIEW
Review and click CLOSE
To view the available actions, go to the API reference.
This section explains the procedure to view and manage Clusters.
The Cluster table provides a quick and easy way to see the status of your cluster.
The Clusters table can be found under Resources in the NVIDIA Run:ai platform.
The clusters table provides a list of the clusters added to NVIDIA Run:ai platform, along with their status.
The clusters table consists of the following columns:
Advanced cluster configurations can be used to tailor your NVIDIA Run:ai cluster deployment to meet specific operational requirements and optimize resource management. By fine-tuning these settings, you can enhance functionality, ensure compatibility with organizational policies, and achieve better control over your cluster environment. This article provides guidance on implementing and managing these configurations to adapt the NVIDIA Run:ai cluster to your unique needs.
After the NVIDIA Run:ai cluster is installed, you can adjust various settings to better align with your organization's operational needs and security requirements.
Advanced cluster configurations in NVIDIA Run:ai are managed through the runaiconfig custom resource. To edit the cluster configurations, run:
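A hedged example, assuming the default NVIDIA Run:ai cluster namespace runai and a runaiconfig resource named runai (adjust the names to match your installation):

kubectl edit runaiconfig runai -n runai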
This quick start provides a step-by-step walkthrough for running a Jupyter Notebook with dynamic GPU fractions.
NVIDIA Run:ai’s dynamic GPU fractions optimizes GPU utilization by enabling workloads to dynamically adjust their resource usage. It allows users to specify a guaranteed fraction of GPU memory and compute resources with a higher limit that can be dynamically utilized when additional resources are requested.
Before you start, make sure:
This section explains what environments are and how to create and use them.
Environments are one type of workload asset. An environment consists of a configuration that simplifies how workloads are submitted and can be used by AI practitioners when they submit their workloads.
An environment asset is a preconfigured building block that encapsulates aspects for the workload such as:
Container image and container configuration
Tools and connections
kubectl get nodes --show-labels
kubectl label node <node-name> <key>=<value>
run.ai/type=<TYPE_VALUE>
Detected
Not detected
Set the node’s label used to discover GPU network acceleration (MNNVL) to nvidia.com/gpu.clique
MNNVL nodes
Indicates whether MNNVL nodes are detected - automatically or manually.
Total GPU devices
The total number of GPU devices installed into nodes included in this node pool. For example, a node pool that includes 12 nodes each with 8 GPU devices would show a total number of 96 GPU devices.
Total GPU memory
The total amount of GPU memory installed in nodes included in this node pool. For example, a node pool that includes 12 nodes, each with 8 GPU devices, and each device with 80 GB of memory would show a total memory amount of 7.68 TB.
Allocated GPUs
The total allocation of GPU devices in units of GPUs (decimal number). For example, if 3 GPUs are 50% allocated, the field prints out the value 1.50. This value represents the portion of GPU memory consumed by all running pods using this node pool. ‘Allocated GPUs’ can be larger than ‘Projects’ GPU quota’ if over quota is used by workloads, but not larger than GPU devices.
GPU resource optimization ratio
Shows the Node Level Scheduler mode.
Total CPU (Cores)
The number of CPU cores installed on nodes included in this node pool
Total CPU memory
The total amount of CPU memory installed on nodes using this node pool
Allocated CPU (Cores)
The total allocation of CPU compute in units of cores (decimal number). This value represents the amount of CPU cores consumed by all running pods using this node pool. ‘Allocated CPUs’ can be larger than ‘Projects’ CPU quota’ if over quota is used by workloads, but not larger than the total CPUs (Cores).
Allocated CPU memory
The total allocation of CPU memory in units of TB/GB/MB (decimal number). This value represents the amount of CPU memory consumed by all running pods using this node pool. ‘Allocated CPU memory’ can be larger than ‘Projects’ CPU memory quota’ if over quota is used by workloads, but not larger than CPU memory.
GPU placement strategy
Sets the Scheduler strategy for the assignment of pods requesting both GPU and CPU resources to nodes, which can be either Bin-pack or Spread. By default, Bin-pack is used, but it can be changed to Spread by editing the node pool. When set to Bin-pack, the scheduler tries to fill nodes as much as possible before using empty or sparse nodes. When set to Spread, the scheduler tries to keep nodes as sparse as possible by spreading workloads across as many nodes as possible.
CPU placement strategy
Sets the Scheduler strategy for the assignment of pods requesting only CPU resources to nodes, which can be either Bin-pack or Spread. By default, Bin-pack is used, but it can be changed to Spread by editing the node pool. When set to Bin-pack, the scheduler tries to fill nodes as much as possible before using empty or sparse nodes. When set to Spread, the scheduler tries to keep nodes as sparse as possible by spreading workloads across as many nodes as possible.
Last update
The date and time when the node pool was last updated
Creation time
The date and time when the node pool was created
Workload(s)
List of workloads running on nodes included in this node pool, click the field to view details (described below in this article)
Allocated GPU compute
The total amount of GPU compute allocated by this workload. A workload with 3 Pods, each allocating 0.5 GPU, will show a value of 1.5 GPUs for the workload.
Allocated GPU memory
The total amount of GPU memory allocated by this workload. A workload with 3 Pods, each allocating 20GB, will show a value of 60 GB for the workload.
Allocated CPU compute (cores)
The total amount of CPU compute allocated by this workload. A workload with 3 Pods, each allocating 0.5 Core, will show a value of 1.5 Cores for the workload.
Allocated CPU memory
The total amount of CPU memory allocated by this workload. A workload with 3 Pods, each allocating 5 GB of CPU memory, will show a value of 15 GB of CPU memory for the workload.

Allocated GPUs
The total number of GPUs allocated by successfully scheduled workloads in projects associated with this department
GPU allocation ratio
The ratio of Allocated GPUs to GPU quota. This number reflects how well the department’s GPU quota is utilized by its descendant projects. A number higher than 100% means the department is using over quota GPUs. A number lower than 100% means not all projects are utilizing their quotas. A quota becomes allocated once a workload is successfully scheduled.
Creation time
The timestamp for when the department was created
Workload(s)
The list of workloads under projects associated with this department. Click the values under this column to view the list of workloads with their resource parameters (as described below)
Cluster
The cluster that the department is associated with
Allocated CPU memory
The actual amount of CPU memory allocated by workloads using this node pool under all projects associated with this department. The allocated CPU memory may temporarily surpass the CPU memory quota if over-quota is used.
Download table - Click MORE and then Click Download as CSV. Export to CSV is limited to 20,000 rows.
In the Quota management section, you can set the quota parameters and prioritize resources
Order of priority This column is displayed only if more than one node pool exists. The node pool order of priority in Departments/Quota management sets the default node pool priority for newly created projects under that department; the Administrator can then change the order per project. The order of priority determines the order in which the Scheduler uses node pools to schedule a workload: it first tries to allocate resources using the highest-priority node pool, then the next in priority, until it reaches the lowest-priority node pool, and then starts from the highest again. The Scheduler uses the project's list of prioritized node pools only if the order of priority is not set in the workload during submission, either by an admin policy or by the user. An empty value means the node pool is not part of the department's default node pool priority list inherited by newly created projects, but the node pool can still be chosen by an admin policy or a user during workload submission.
Node pool This column is displayed only if more than one node pool exists. It represents the name of the node pool
Under the QUOTA tab
Over-quota state Indicates if over-quota is enabled or disabled as set in the SCHEDULING PREFERENCES tab. If over-quota is set to None, then it is disabled.
GPU devices The number of GPUs you want to allocate for this department in this node pool (decimal number).
Set Scheduling rules as required.
Click CREATE DEPARTMENT
Select or enter the subject identifier:
User Email for a local user created in NVIDIA Run:ai or for an SSO user as recognized by the IDP
Group name as recognized by the IDP
Application name as created in NVIDIA Run:ai
Select a role
Click SAVE RULE
Click CLOSE
Parameter - The workload submission parameter that Rule and Default is applied on
Type (applicable for data sources only) - The data source type (Git, S3, NFS, PVC, etc.)
Default - The default value of the Parameter
Rule - Set up constraints on workload policy fields
Source - The origin of the applied policy (cluster, department or project)
Department
The name of the department
Node pool(s) with quota
The node pools associated with this department. By default, all node pools within a cluster are associated with each department. Administrators can change the node pools’ quota parameters for a department. Click the values under this column to view the list of node pools with their parameters (as described below)
GPU quota
GPU quota associated with the department
Total GPUs for projects
The sum of all projects’ GPU quotas associated with this department
Project(s)
List of projects associated with this department
Subject(s)
The users, SSO groups, or applications with access to the project. Click the values under this column to view the list of subjects with their parameters (as described below). This column is only viewable if your role in the NVIDIA Run:ai platform grants you those permissions.
Node pool
The name of the node pool is given by the administrator during node pool creation. All clusters have a default node pool created automatically by the system and named ‘default’.
GPU quota
The amount of GPU quota the administrator dedicated to the department for this node pool (floating number, e.g. 2.3 means 230% of a GPU capacity)
CPU (Cores)
The amount of CPU (cores) quota the administrator has dedicated to the department for this node pool (floating number, e.g. 1.3 cores = 1300 milli-cores). The ‘unlimited’ value means the CPU (Cores) quota is not bounded and workloads using this node pool can use as many CPU (Cores) resources as they need (if available)
CPU memory
The amount of CPU memory quota the administrator has dedicated to the department for this node pool (floating number, in MB or GB). The ‘unlimited’ value means the CPU memory quota is not bounded and workloads using this node pool can use as much CPU memory resource as they need (if available).
Allocated GPUs
The total amount of GPUs allocated by workloads using this node pool under projects associated with this department. The number of allocated GPUs may temporarily surpass the GPU quota of the department if over quota is used.
Allocated CPU (Cores)
The total amount of CPUs (cores) allocated by workloads using this node pool under all projects associated with this department. The number of allocated CPUs (cores) may temporarily surpass the CPUs (Cores) quota of the department if over quota is used.
Subject
A user, SSO group, or application assigned with a role in the scope of this department
Type
The type of subject assigned to the access rule (user, SSO group, or application).
Scope
The scope of this department within the organizational tree. Click the name of the scope to view the organizational tree diagram, you can only view the parts of the organizational tree for which you have permission to view.
Role
The role assigned to the subject, in this department’s scope
Authorized by
The user who granted the access rule
Last updated
The last time the access rule was updated

Cluster
The name of the cluster
Status
The status of the cluster. For more information, see the cluster status descriptions below. Hover over the information icon for a short description and links to troubleshooting
Creation time
The timestamp when the cluster was created
URL
The URL that was given to the cluster
NVIDIA Run:ai cluster version
The NVIDIA Run:ai version installed on the cluster
Kubernetes distribution
The flavor of Kubernetes distribution
Kubernetes version
The Kubernetes version installed on the cluster
Waiting to connect
The cluster has never been connected.
Disconnected
There is no communication from the cluster to the Control plane. This may be due to a network issue.
Missing prerequisites
Some prerequisites are missing from the cluster. As a result, some features may be impacted.
Service issues
At least one of the services is not working properly. You can view the list of nonfunctioning services for more information.
Connected
The NVIDIA Run:ai cluster is connected, and all NVIDIA Run:ai services are running.
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Download table - Click MORE and then Click Download as CSV. Export to CSV is limited to 20,000 rows.
To add a new cluster, see the installation guide.
Select the cluster you want to remove
Click REMOVE
A dialog appears: Make sure to carefully read the message before removing
Click REMOVE to confirm the removal.
Go to the Clusters API reference to view the available actions
Before starting, make sure you have access to the Kubernetes cluster where NVIDIA Run:ai is deployed with the necessary permissions

To edit the cluster configurations, use:
kubectl edit runaiconfig runai -n runai
To view the runaiconfig object structure, use:
kubectl get crds/runaiconfigs.run.ai -n runai -o yaml
The following configurations allow you to enable or disable features, control permissions, and customize the behavior of your NVIDIA Run:ai cluster:
spec.global.affinity (object)
Sets the system nodes where NVIDIA Run:ai system-level services are scheduled. Using global.affinity will overwrite the set using the Administrator CLI (runai-adm).
Default: Prefer to schedule on nodes that are labeled with node-role.kubernetes.io/runai-system
spec.global.nodeAffinity.restrictScheduling (boolean)
Enables setting and restricting workload scheduling to designated nodes
Default: false
spec.global.tolerations (object)
Configure Kubernetes tolerations for NVIDIA Run:ai system-level services
spec.global.ingress.ingressClass
NVIDIA Run:ai uses NGINX as the default ingress controller. If your cluster has a different ingress controller, you can configure the ingress class to be created by NVIDIA Run:ai.
spec.global.subdomainSupport (boolean)
Allows the creation of subdomains for ingress endpoints, enabling access to workloads via unique subdomains of the cluster domain.
Default: false
spec.global.enableWorkloadOwnershipProtection (boolean)
Prevents users within the same project from deleting workloads created by others. This enhances workload ownership security and ensures better collaboration by restricting unauthorized modifications or deletions.
Default: false
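As an illustration, a minimal runaiconfig snippet combining a few of the flags above might look like the following. This is a sketch only; the values are examples, and the snippet is applied with kubectl edit runaiconfig runai -n runai:
spec:
  global:
    subdomainSupport: true                   # expose workloads via unique subdomains
    enableWorkloadOwnershipProtection: true  # block deletion of other users' workloads
    nodeAffinity:
      restrictScheduling: true               # restrict workload scheduling to designated nodes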
The NVIDIA Run:ai cluster includes many different services. To simplify resource management, the configuration structure allows you to configure the containers' CPU/memory resources for each service individually or for a group of services together.
SchedulingServices
Containers associated with the NVIDIA Run:ai Scheduler
Scheduler, StatusUpdater, MetricsExporter, PodGrouper, PodGroupAssigner, Binder
SyncServices
Containers associated with syncing updates between the NVIDIA Run:ai cluster and the NVIDIA Run:ai control plane
Agent, ClusterSync, AssetsSync
WorkloadServices
Containers associated with submitting NVIDIA Run:ai workloads
WorkloadController, JobController
Apply the following configuration in order to change resources request and limit for a group of services:
Or, apply the following configuration in order to change resources request and limit for each service individually:
For resource recommendations, see Vertical scaling.
By default, all NVIDIA Run:ai containers are deployed with a single replica. Some services support multiple replicas for redundancy and performance.
To simplify configuring replicas, a global replicas configuration can be set and is applied to all supported services:
This can be overwritten for specific services (if supported). Services without the replicas configuration do not support replicas:
The Prometheus instance in NVIDIA Run:ai is used for metrics collection and alerting.
The configuration scheme follows the official PrometheusSpec and supports additional custom configurations. The PrometheusSpec schema is available using the spec.prometheus.spec configuration.
A common use case using the PrometheusSpec is for metrics retention. This prevents metrics loss during potential connectivity issues and can be achieved by configuring local temporary metrics retention. For more information, see Prometheus Storage:
In addition to the PrometheusSpec schema, some custom NVIDIA Run:ai configurations are also available:
Additional labels – Set additional labels for NVIDIA Run:ai's built-in alerts sent by Prometheus.
Log level configuration – Configure the logLevel setting for the Prometheus container.
To include or exclude specific nodes from running workloads within a cluster managed by NVIDIA Run:ai, use the nodeSelectorTerms flag. For additional details, see Kubernetes nodeSelector.
Define the nodeSelectorTerms using the following fields:
key: Label key (e.g., zone, instance-type).
operator: Operator defining the inclusion/exclusion condition (In, NotIn, Exists, DoesNotExist).
values: List of values for the key when using In or NotIn.
The below example shows how to include NVIDIA GPUs only and exclude all other GPU types in a cluster with mixed nodes, based on product type GPU label:
For air-gapped environments, when working with a Local Certificate Authority, it is required to replace the default sidecar images in order to use the Git and S3 data source integrations. Use the following configurations:
Before you start, make sure:
The project has an assigned quota of at least 0.5 GPU.
Dynamic GPU fractions is enabled.
Browse to the provided NVIDIA Run:ai user interface and log in with your credentials.
Run the below --help command to obtain the login options and log in according to your setup:
runai login --help
runai login
To use the API, you will need to obtain a token as shown in API authentication.
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Workspace
Select under which cluster to create the workload
Select the project in which your workspace will run
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Flexible and click CONTINUE
Click the load icon. A side pane appears, displaying a list of available environments. To add a new environment:
Click the + icon to create a new environment
Enter quick-start as the name for the environment. The name must be unique.
Enter the Image URL - gcr.io/run-ai-lab/pytorch-example-jupyter
Click the load icon. A side pane appears, displaying a list of available compute resources. To add a new compute resource:
Click the + icon to create a new compute resource
Enter request-limit as the name for the compute resource. The name must be unique.
Set GPU devices per pod - 1
Click CREATE WORKSPACE
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Workspace
Select under which cluster to create the workload
Select the project in which your workspace will run
Copy the following command to your terminal. Make sure to update the command below with the name of your project and workload:
Copy the following command to your terminal. Make sure to update the following parameters:
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in API authentication
<PROJECT-ID> - The ID of the project
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Workspace
Select the cluster where the previous workspace was created
Select the project where the previous workspace was created
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Flexible and click CONTINUE
Click the load icon. A side pane appears, displaying a list of available environments. Select the environment created previously.
Click the load icon. A side pane appears, displaying a list of available compute resources. Select the compute resource created previously.
Click CREATE WORKSPACE
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Workspace
Select the cluster where the previous workspace was created
Select the project where the previous workspace was created
Copy the following command to your terminal. Make sure to update the command below with the name of your project and workload:
Copy the following command to your terminal. Make sure to update the following parameters:
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in API authentication
<PROJECT-ID> - The ID of the project
Select the newly created workspace with the Jupyter application that you want to connect to
Click CONNECT
Select the Jupyter tool. The selected tool is opened in a new tab on your browser.
Open a terminal and use the watch nvidia-smi command to get a constant reading of the memory consumed by the pod. Note that the number shown in the memory box is the Limit and not the Request or Guarantee.
Open the file Untitled.ipynb and move the frame so you can see both tabs
Execute both cells in Untitled.ipynb. This will consume about 3 GB of GPU memory and be well below the 4GB of the GPU Memory Request value.
In the second cell, edit the value after --image-size from 100 to 200 and run the cell. This will increase the GPU memory utilization to about 11.5 GB which is above the Request value.
To connect to the Jupyter Notebook, browse directly to https://<COMPANY-URL>/<PROJECT-NAME>/<WORKLOAD-NAME>
Open a terminal and use the watch nvidia-smi command to get a constant reading of the memory consumed by the pod. Note that the number shown in the memory box is the Limit and not the Request or Guarantee.
Open the file Untitled.ipynb and move the frame so you can see both tabs
To connect to the Jupyter Notebook, browse directly to https://<COMPANY-URL>/<PROJECT-NAME>/<WORKLOAD-NAME>
Open a terminal and use the watch nvidia-smi command to get a constant reading of the memory consumed by the pod. Note that the number shown in the memory box is the Limit and not the Request or Guarantee.
Open the file Untitled.ipynb and move the frame so you can see both tabs
Manage and monitor your newly created workload using the Workloads table.
The Environments table can be found under Workload manager in the NVIDIA Run:ai platform.
The Environments table provides a list of all the environments defined in the platform and allows you to manage them.
The Environments table consists of the following columns:
Environment
The name of the environment
Description
A description of the environment
Scope
The scope of this environment within the organizational tree. Click the name of the scope to view the organizational tree diagram
Image
The application or service to be run by the workload
Workload Architecture
This can be either standard for running workloads on a single node or distributed for running distributed workloads on multiple nodes
Tool(s)
The tools and connection types the environment exposes
Click one of the values in the tools column to view the list of tools and their connection type.
Tool name
The name of the tool or application that the AI practitioner can set up within the environment.
Connection type
The method by which you can access and interact with the running workload. It's essentially the "doorway" through which you can reach and use the tools the workload provides (e.g., node port, external URL, etc.)
Click one of the values in the Workload(s) column to view the list of workloads and their parameters.
Workload
The workload that uses the environment
Type
The workload type (Workspace/Training/Inference)
Status
Represents the workload lifecycle. See the full list of workload statuses.
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Download table - Click MORE and then Click Download as CSV. Export to CSV is limited to 20,000 rows.
When installing NVIDIA Run:ai, you automatically get the environments created by NVIDIA Run:ai to ease the onboarding process and support different use cases out of the box. These environments are created at the scope of the account.
jupyter-lab / jupyter-scipy
jupyter/scipy-notebook
An interactive development environment for Jupyter notebooks, code, and data visualization
jupyter-tensorboard
gcr.io/run-ai-demo/jupyter-tensorboard
An integrated combination of the interactive Jupyter development environment and TensorFlow's visualization toolkit for monitoring and analyzing ML models
tensorboard / tensorboard-tensorflow
tensorflow/tensorflow:latest
A visualization toolkit for TensorFlow that helps users monitor and analyze ML models, displaying various metrics and model architecture
llm-server
runai.jfrog.io/core-llm/runai-vllm:v0.6.4-0.10.0
Environment creation is limited to specific roles
To add a new environment:
Go to the Environments table
Click +NEW ENVIRONMENT
Select under which cluster to create the environment
Select a scope
Enter a name for the environment. The name must be unique.
Optional: Provide a description of the essence of the environment
Enter the Image URL. If a token or secret is required to pull the image, it is possible to create it via Credentials. These credentials are automatically used once the image is pulled (which happens when the workload is submitted)
Set the image pull policy - the condition for when to pull the image from the registry
Set the workload architecture:
Standard Only standard workloads can use the environment. A standard workload consists of a single process.
Distributed Only distributed workloads can use the environment. A distributed workload consists of multiple processes working together. These processes can run on different nodes.
Set the workload type:
Workspace
Training
Inference
Optional: Set the connection for your tool(s). The tools must be configured in the image. When submitting a workload using the environment, it is possible to connect to these tools
Select the tool from the list (the available tools vary, from IDEs and experiment tracking to a custom tool of your choice)
Select the connection type
Optional: Set a command and arguments for the container running the pod
When no command is added, the default command of the image is used (the image entrypoint)
The command can be modified while submitting a workload using the environment
The argument(s) can be modified while submitting a workload using the environment
Optional: Set the environment variable(s)
Click +ENVIRONMENT VARIABLE
Enter a name
Select the source for the environment variable
Optional: Set the container’s working directory to define where the container’s process starts running. When left empty, the default directory is used.
Optional: Set where the UID, GID and supplementary groups are taken from. This can be:
From the image
From the IdP token (only available in SSO installations)
Custom (manually set) - decide whether the submitter can modify these values upon submission.
Optional: Select Linux capabilities - Grant certain privileges to a container without granting all the privileges of the root user.
Click CREATE ENVIRONMENT
To edit an existing environment:
Select the environment you want to edit
Click Edit
Update the environment and click SAVE ENVIRONMENT
To copy an existing environment:
Select the environment you want to copy
Click MAKE A COPY
Enter a name for the environment. The name must be unique.
Update the environment and click CREATE ENVIRONMENT
To delete an environment:
Select the environment you want to delete
Click DELETE
On the dialog, click DELETE to confirm
Go to the Environment API reference to view the available actions
The following examples show the cluster service configurations described above.
Resources for a group of services:
spec:
  global:
    <service-group-name>: # schedulingServices | SyncServices | WorkloadServices
      resources:
        limits:
          cpu: 1000m
          memory: 1Gi
        requests:
          cpu: 100m
          memory: 512Mi
Resources for an individual service:
spec:
  <service-name>: # for example: pod-grouper
    resources:
      limits:
        cpu: 1000m
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 512Mi
Global replicas:
spec:
  global:
    replicaCount: 1 # default
Replicas for an individual service:
spec:
  <service-name>: # for example: pod-grouper
    replicas: 1 # default
Prometheus metrics retention:
spec:
  prometheus:
    spec: # PrometheusSpec
      retention: 2h # default
      retentionSize: 20GB
Prometheus log level and additional alert labels:
spec:
  prometheus:
    logLevel: info # debug | info | warn | error
    additionalAlertLabels:
      - env: prod # example
Node inclusion criteria (include NVIDIA GPU nodes only):
spec:
  global:
    managedNodes:
      inclusionCriteria:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: Exists
Sidecar images for air-gapped environments:
spec:
  workload-controller:
    s3FileSystemImage:
      name: goofys
      registry: runai.jfrog.io/op-containers-prod
      tag: 3.12.24
    gitSyncImage:
      name: git-sync
      registry: registry.k8s.io
      tag: v4.4.0
spec.project-controller.createNamespaces (boolean)
Allows Kubernetes namespace creation for new projects
Default: true
spec.project-controller.createRoleBindings (boolean)
Specifies if role bindings should be created in the project's namespace
Default: true
spec.project-controller.limitRange (boolean)
Specifies if limit ranges should be defined for projects
Default: true
spec.project-controller.clusterWideSecret (boolean)
Allows Kubernetes Secrets creation at the cluster scope. See Credentials for more details.
Default: true
spec.workload-controller.additionalPodLabels (object)
Sets the workload's pod labels as key/value pairs. These labels are applied to all pods.
spec.workload-controller.failureResourceCleanupPolicy
Determines how NVIDIA Run:ai cleans up a failed workload's resources:
All - Removes all resources of the failed workload
None - Retains all resources
KeepFailing - Removes all resources except for those that encountered issues (primarily for debugging purposes)
Default: All
spec.workload-controller.GPUNetworkAccelerationEnabled
Enables GPU network acceleration. See Using GB200 NVL72 and Multi-Node NVLink Domains for more details.
Default: false
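For example, a runaiconfig snippet setting the workload-controller flags above could look like the following sketch. The label key/value pair is illustrative only:
spec:
  workload-controller:
    additionalPodLabels:
      team: research                             # example label applied to all workload pods
    failureResourceCleanupPolicy: KeepFailing    # All | None | KeepFailing
    GPUNetworkAccelerationEnabled: false         # default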
spec.mps-server.enabled (boolean)
Enabled when using NVIDIA MPS
Default: false
spec.daemonSetsTolerations (object)
Configure Kubernetes tolerations for NVIDIA Run:ai daemonSets / engine
spec.runai-container-toolkit.logLevel (string)
Specifies the NVIDIA Run:ai container-toolkit logging level: either 'SPAM', 'DEBUG', 'INFO', 'NOTICE', 'WARN', or 'ERROR'
Default: INFO
spec.runai-container-toolkit.enabled (boolean)
Enables workloads to use GPU fractions
Default: true
node-scale-adjuster.args.gpuMemoryToFractionRatio (object)
A scaling-pod requesting a single GPU device will be created for every 1 to 10 pods requesting fractional GPU memory (1/gpuMemoryToFractionRatio). This value represents the ratio (0.1-0.9) of fractional GPU memory (any size) to GPU fraction (portion) conversion.
Default: 0.1
spec.global.core.dynamicFractions.enabled (boolean)
Enables dynamic GPU fractions
Default: true
spec.global.core.swap.enabled (boolean)
Enables memory swap for GPU workloads
Default: false
spec.global.core.swap.limits.cpuRam (string)
Sets the CPU memory size used to swap GPU workloads
Default: 100Gi
spec.global.core.swap.limits.reservedGpuRam (string)
Sets the reserved GPU memory size used to swap GPU workloads
Default: 2Gi
spec.global.core.nodeScheduler.enabled (boolean)
Enables the node-level scheduler
Default: false
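The core flags above can be combined in a single runaiconfig snippet, for example (a sketch with illustrative values):
spec:
  global:
    core:
      dynamicFractions:
        enabled: true         # default
      swap:
        enabled: true         # default is false
        limits:
          cpuRam: 100Gi       # CPU memory used to swap GPU workloads
          reservedGpuRam: 2Gi # reserved GPU memory used for swapping
      nodeScheduler:
        enabled: false        # node-level scheduler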
spec.global.core.timeSlicing.mode (string)
Sets the GPU time-slicing mode. Possible values:
timesharing - all pods on a GPU share the GPU compute time evenly.
strict - each pod gets an exact time slice according to its memory fraction value.
fair - each pod gets an exact time slice according to its memory fraction value and any unused GPU compute time is split evenly between the running pods.
Default: timesharing
spec.runai-scheduler.args.fullHierarchyFairness (boolean)
Enables fairness between departments, on top of projects fairness
Default: true
spec.runai-scheduler.args.defaultStalenessGracePeriod
Sets the timeout in seconds before the scheduler evicts a stale pod-group (gang) that went below its min-members in running state:
0s - Immediately (no timeout)
-1 - Never
Default: 60s
spec.pod-grouper.args.gangSchedulingKnative (boolean)
Enables gang scheduling for inference workloads. For backward compatibility with versions earlier than v2.19, change the value to false
Default: false
spec.pod-grouper.args.gangScheduleArgoWorkflow (boolean)
Groups all pods of a single ArgoWorkflow workload into a single Pod-Group for gang scheduling
Default: true
spec.runai-scheduler.args.verbosity (int)
Configures the level of detail in the logs generated by the scheduler service
Default: 4
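A sketch of the scheduler-related flags above in runaiconfig form (the values shown are the documented defaults):
spec:
  runai-scheduler:
    args:
      fullHierarchyFairness: true       # fairness between departments, on top of projects
      defaultStalenessGracePeriod: 60s  # 0s = immediately, -1 = never
      verbosity: 4                      # level of detail in scheduler logs
  pod-grouper:
    args:
      gangScheduleArgoWorkflow: true    # group ArgoWorkflow pods into a single pod-group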
spec.limitRange.cpuDefaultRequestCpuLimitFactorNoGpu (string)
Sets a default ratio between the CPU request and the limit for workloads without GPU requests
Default: 0.1
spec.limitRange.memoryDefaultRequestMemoryLimitFactorNoGpu (string)
Sets a default ratio between the memory request and the limit for workloads without GPU requests
Default: 0.1
spec.limitRange.cpuDefaultRequestGpuFactor (string)
Sets a default amount of CPU allocated per GPU when the CPU is not specified
Default: 100
spec.limitRange.cpuDefaultLimitGpuFactor (int)
Sets a default CPU limit based on the number of GPUs requested when no CPU limit is specified
Default: NO DEFAULT
spec.limitRange.memoryDefaultRequestGpuFactor (string)
Sets a default amount of memory allocated per GPU when the memory is not specified
Default: 100Mi
spec.limitRange.memoryDefaultLimitGpuFactor (string)
Sets a default memory limit based on the number of GPUs requested when no memory limit is specified
Default: NO DEFAULT
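Put together, the limitRange defaults above map to a runaiconfig snippet such as the following sketch (only the flags with documented defaults are shown):
spec:
  limitRange:
    cpuDefaultRequestCpuLimitFactorNoGpu: "0.1"       # CPU request/limit ratio without GPUs
    memoryDefaultRequestMemoryLimitFactorNoGpu: "0.1" # memory request/limit ratio without GPUs
    cpuDefaultRequestGpuFactor: "100"                 # default CPU allocated per GPU
    memoryDefaultRequestGpuFactor: 100Mi              # default memory allocated per GPU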
When inference is selected, define the endpoint of the model by providing both the protocol and the container’s serving port
Auto generate A unique URL is automatically created for each workload using the environment
Custom URL The URL is set manually
Node port
Auto generate A unique port is automatically exposed for each workload using the environment
Custom port Set the port manually
Set the container port
Custom
Enter a value
Leave empty
Add instructions for the expected value if any
Credentials - Select an existing credential as the environment variable
Select a credential name To add new credentials to the credentials list, and for additional information, see Credentials.
Select a secret key
ConfigMap - Select a predefined ConfigMap
Select a ConfigMap name To create a ConfigMap in your cluster, see Creating ConfigMaps in advance.
Enter a ConfigMap key
The environment variables can be modified and new variables can be added while submitting a workload using the environment
Set the User ID (UID), Group ID (GID) and the supplementary groups that can run commands in the container
Enter UID
Enter GID
Add Supplementary groups (multiple groups can be added, separated by commas)
Disable "Allow the values above to be modified within the workload" if you want the above values to be used as the default
Workload(s)
The list of existing workloads that use the environment
Workload types
The workload types that can use the environment (Workspace/ Training / Inference)
Template(s)
The list of workload templates that use this environment
Created by
The user who created the environment. By default, the NVIDIA Run:ai UI comes with preinstalled environments created by NVIDIA Run:ai
Creation time
The timestamp of when the environment was created
Last updated
The timestamp of when the environment was last updated
Cluster
The cluster with which the environment is associated
A vLLM-based server that hosts and serves large language models for inference, enabling API-based access to AI models
chatbot-ui
runai.jfrog.io/core-llm/llm-app
A user interface for interacting with chat-based AI models, often used for testing and deploying chatbot applications
rstudio
rocker/rstudio:4
An integrated development environment (IDE) for R, commonly used for statistical computing and data analysis
vscode
ghcr.io/coder/code-server
A fast, lightweight code editor with powerful features like intelligent code completion, debugging, Git integration, and extensions, ideal for web development, data science, and more
gpt2
runai.jfrog.io/core-llm/quickstart-inference:gpt2-cpu
A package containing an inference server, GPT2 model and chat UI often used for quick demos

Replace <control-plane-endpoint> with the URL of the Control Plane in your environment. If the pod fails to connect to the Control Plane, check for potential network policies
Check infrastructure-level configurations:
Ensure that firewall rules and security groups allow traffic between your Kubernetes cluster and the Control Plane
Verify required ports and protocols:
Ensure that the necessary ports and protocols for NVIDIA Run:ai’s services are not blocked by any firewalls or security groups
Try to identify the problem from the logs. If you cannot resolve the issue, continue to the next step.
Use this pod to perform network resolution tests and other diagnostics to identify any DNS or connectivity problems within your Kubernetes cluster.
Replace <control-plane-endpoint> with the URL of the Control Plane in your environment. If the pod fails to connect to the Control Plane, check for potential network policies:
Example of allowing traffic:
Check infrastructure-level configurations:
Ensure that firewall rules and security groups allow traffic between your Kubernetes cluster and the Control Plane
Verify required ports and protocols:
Ensure that the necessary ports and protocols for NVIDIA Run:ai’s services are not blocked by any firewalls or security groups
Try to identify the problem from the logs. If you cannot resolve the issue, continue to the next step
The version of Kubernetes installed
NVIDIA Run:ai cluster UUID
The unique ID of the cluster
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-control-plane-traffic
  namespace: runai
spec:
  podSelector:
    matchLabels:
      app: runai
  policyTypes:
    - Ingress
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: <control-plane-ip-range>
      ports:
        - protocol: TCP
          port: <control-plane-port>
  ingress:
    - from:
        - ipBlock:
            cidr: <control-plane-ip-range>
      ports:
        - protocol: TCP
          port: <control-plane-port>
kubectl get pods -n runai | grep -E 'runai-agent|cluster-sync|assets-sync'
kubectl run control-plane-connectivity-check -n runai --image=wbitt/network-multitool --command -- /bin/sh -c 'curl -sSf <control-plane-endpoint> > /dev/null && echo "Connection Successful" || echo "Failed connecting to the Control Plane"'
kubectl get networkpolicies -n runai
kubectl logs deployment/runai-agent -n runai
kubectl logs deployment/cluster-sync -n runai
kubectl logs deployment/assets-sync -n runai
kubectl run -i --tty netutils --image=dersimn/netutils -- bash
kubectl get runaiconfig -n runai runai -ojson | jq -r '.status.conditions | map(select(.type == "Available"))'
kubectl get events -A
kubectl describe <resource_type> <name>
kubectl get pods -n runai | grep -E 'runai-agent|cluster-sync|assets-sync'
kubectl run control-plane-connectivity-check -n runai --image=wbitt/network-multitool --command -- /bin/sh -c 'curl -sSf <control-plane-endpoint> > /dev/null && echo "Connection Successful" || echo "Failed connecting to the Control Plane"'
kubectl get networkpolicies -n runai
kubectl logs deployment/runai-agent -n runai
kubectl logs deployment/cluster-sync -n runai
kubectl logs deployment/assets-sync -n runai
kubectl get configmap -n runai-public
kubectl describe configmap runai-public -n runai-public
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-control-plane-traffic
  namespace: runai
spec:
  podSelector:
    matchLabels:
      app: runai
  policyTypes:
    - Ingress
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: <control-plane-ip-range>
      ports:
        - protocol: TCP
          port: <control-plane-port>
  ingress:
    - from:
        - ipBlock:
            cidr: <control-plane-ip-range>
      ports:
        - protocol: TCP
          port: <control-plane-port>
CPU memory This column is displayed only if CPU quota is enabled via the General settings. Represents the amount of CPU memory you want to allocate for this department in this node pool (in Megabytes or Gigabytes).
Under the SCHEDULING PREFERENCES tab
Department priority Sets the department's scheduling priority compared to other departments in the same node pool, using one of the following priorities:
Highest - 255
VeryHigh - 240
High - 210
MediumHigh - 180
Medium - 150
MediumLow - 100
Low - 50
VeryLow - 20
Lowest - 1
For v2.21, the default value is MediumLow. All departments are set with the same default value, so there is no change in scheduling behavior unless the Administrator changes any department priority values.
Over-quota If over-quota weight is enabled via the General settings, then Over-quota weight is presented; otherwise, Over-quota is presented
Over-quota When enabled, the department can use non-guaranteed overage resources above its quota in this node pool. The amount of the non-guaranteed overage resources for this department is calculated proportionally to the department's quota in this node pool. When disabled, the department cannot use more resources than the guaranteed quota in this node pool.
Over quota weight Represents a weight used to calculate the amount of non-guaranteed overage resources a project can get on top of its quota in this node pool. All unused resources are split between departments that require the use of overage resources:
Department max. GPU device allocation Represents the maximum GPU device allocation the department can get from this node pool - the maximum sum of quota and over-quota GPUs (decimal number).
Tools - Set the connection for your tool:
Click +TOOL
Select Jupyter tool from the list
Set the runtime settings for the environment. Click +COMMAND & ARGUMENTS and add the following:
Enter the command - start-notebook.sh
Enter the arguments - --NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''
Note: If host-based routing is enabled on the cluster, enter the --NotebookApp.token='' only.
Click CREATE ENVIRONMENT
Select the newly created environment from the side pane
Set GPU memory per device
Select GB - Fraction of a GPU device’s memory
Set the memory Request - 4GB (the workload will allocate 4GB of the GPU memory)
Toggle Limit and set it to 12 GB
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Select More settings and toggle Increase shared memory size
Click CREATE COMPUTE RESOURCE
Select the newly created compute resource from the side pane
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Original and click CONTINUE
Create an environment for your workspace
Click +NEW ENVIRONMENT
Enter quick-start as the name for the environment. The name must be unique.
Enter the Image URL - gcr.io/run-ai-lab/pytorch-example-jupyter
Tools - Set the connection for your tool
Click +TOOL
Select Jupyter tool from the list
Set the runtime settings for the environment. Click +COMMAND & ARGUMENTS and add the following:
Enter the command - start-notebook.sh
Enter the arguments - --NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''
Note: If host-based routing is enabled on the cluster, enter the --NotebookApp.token='' only.
Click CREATE ENVIRONMENT
The newly created environment will be selected automatically
Create a new “request-limit” compute resource for your workspace
Click +NEW COMPUTE RESOURCE
Enter request-limit as the name for the compute resource. The name must be unique.
Set GPU devices per pod - 1
Set GPU memory per device
Select GB - Fraction of a GPU device’s memory
Set the memory Request - 4GB (the workload will allocate 4GB of the GPU memory)
Toggle Limit and set it to 12 GB
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Select More settings and toggle Increase shared memory size
Click CREATE COMPUTE RESOURCE
The newly created compute resource will be selected automatically
Click CREATE WORKSPACE
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the Get Clusters API.
toolType will show the Jupyter icon when connecting to the Jupyter tool via the user interface.
toolName will show when connecting to the Jupyter tool via the user interface.
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Original and click CONTINUE
Select the environment created in Step 2
Select the compute resource created in Step 2
Click CREATE WORKSPACE
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the Get Clusters API.
toolType will show the Jupyter icon when connecting to the Jupyter tool via the user interface.
toolName will show when connecting to the Jupyter tool via the user interface.
Execute both cells in Untitled.ipynb. This will consume about 3 GB of GPU memory and be well below the 4GB of the GPU Memory Request value.
In the second cell, edit the value after --image-size from 100 to 200 and run the cell. This will increase the GPU memory utilization to about 11.5 GB which is above the Request value.
Execute both cells in Untitled.ipynb. This will consume about 3 GB of GPU memory and be well below the 4GB of the GPU Memory Request value.
In the second cell, edit the value after --image-size from 100 to 200 and run the cell. This will increase the GPU memory utilization to about 11.5 GB which is above the Request value.
This section explains the procedure to manage Projects.
Researchers submit AI workloads. To streamline resource allocation and prioritize work, NVIDIA Run:ai introduces the concept of Projects. Projects are the tool to implement resource allocation policies as well as the segregation between different initiatives. A project may represent a team, an individual, or an initiative that shares resources or has a specific resource quota. Projects may be aggregated in NVIDIA Run:ai departments.
For example, you may have several people involved in a specific face-recognition initiative collaborating under one project named “face-recognition-2024”. Alternatively, you can have a project per person in your team, where each member receives their own quota.
The Projects table can be found under Organization in the NVIDIA Run:ai platform.
The Projects table provides a list of all projects defined for a specific cluster, and allows you to manage them. You can switch between clusters by selecting your cluster using the filter at the top.
The Projects table consists of the following columns:
Click one of the values of Node pool(s) with quota column, to view the list of node pools and their parameters
Click one of the values in the Subject(s) column to view the list of subjects and their parameters. This column is only viewable if your role in the NVIDIA Run:ai system affords you those permissions.
Click one of the values of Workload(s) column, to view the list of workloads and their parameters
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
To create a new Project:
Click +NEW PROJECT
Select a scope, you can only view clusters if you have permission to do so - within the scope of the roles assigned to you
Enter a name for the project. Project names must start with a letter and can only contain lowercase Latin letters, numbers, or a hyphen ('-')
Namespace associated with Project Each project has an associated (Kubernetes) namespace in the cluster. All workloads under this project use this namespace.
When no node pools are configured, you can set the same parameters for the whole project, instead of per node pool. After node pools are created, you can set the above parameters for each node-pool separately.
Set as required.
Click CREATE PROJECT
To create a new access rule for a project:
Select the project you want to add an access rule for
Click ACCESS RULES
Click +ACCESS RULE
Select a subject
To delete an access rule from a project:
Select the project you want to remove an access rule from
Click ACCESS RULES
Find the access rule you want to delete
Click on the trash icon
To edit a project:
Select the project you want to edit
Click EDIT
Update the Project and click SAVE
To view the policy of a project:
Select the project for which you want to view its policy. This option is only active for projects with defined policies in place.
Click VIEW POLICY and select the workload type for which you want to view the policies: a. Workspace workload type policy with its set of rules b. Training workload type policies with its set of rules
In the Policy form, view the workload rules that are enforcing your project for the selected workload type as well as the defaults:
To delete a project:
Select the project you want to delete
Click DELETE
On the dialog, click DELETE to confirm
To view the available actions, go to the API reference.
This section explains the procedure for managing workloads.
The Workloads table can be found under Workload manager in the NVIDIA Run:ai platform.
The workloads table provides a list of all the workloads scheduled on the NVIDIA Run:ai Scheduler, and allows you to manage them.
The Workloads table consists of the following columns:
The following table describes the different phases in a workload life cycle. The UI provides additional details for some of the below workload statuses which can be viewed by clicking the icon next to the status.
Click one of the values in the Running/requested pods column, to view the list of pods and their parameters.
A connection refers to the method by which you can access and interact with the running workloads. It is essentially the "doorway" through which you can reach and use the applications (tools) these workloads provide.
Click one of the values in the Connection(s) column, to view the list of connections and their parameters. Connections are network interfaces that communicate with the application running in the workload. Connections are either the URL the application exposes or the IP and the port of the node that the workload is running on.
Click one of the values in the Data source(s) column to view the list of data sources and their parameters.
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Click a row in the Workloads table and then click the SHOW DETAILS button at the upper-right side of the action bar. The details pane appears, presenting the following tabs:
Displays the workload status over time. It displays events describing the workload lifecycle and alerts on notable events. Use the filter to search through the history for specific events.
GPU utilization Per GPU graph and an average of all GPUs graph, all on the same chart, along an adjustable period allows you to see the trends of all GPUs compute utilization (percentage of GPU compute) in this node.
GPU memory utilization Per GPU graph and an average of all GPUs graph, all on the same chart, along an adjustable period allows you to see the trends of all GPUs memory usage (percentage of the GPU memory) in this node.
CPU compute utilization The average of all CPUs’ cores compute utilization graph, along an adjustable period allows you to see the trends of CPU compute utilization (percentage of CPU compute) in this node.
Workload events are ordered in chronological order. The logs contain events from the workload’s lifecycle to help monitor and debug issues.
Before starting, make sure you have created a project or have one created for you to work with workloads.
To create a new workload:
Click +NEW WORKLOAD
Select a workload type - Follow the links below to view the step-by-step guide for each workload type:
Workspace - Used for data preparation and model-building tasks.
Training - Used for standard training tasks of all sorts.
Stopping a workload kills the workload pods and releases the workload resources.
Select the workload you want to stop
Click STOP
Running a workload spins up new pods and resumes the workload work after it was stopped.
Select the workload you want to run again
Click RUN
To connect to an application running in the workload (for example, Jupyter Notebook)
Select the workload you want to connect
Click CONNECT
Select the tool from the drop-down list
The selected tool is opened in a new tab on your browser
Select the workload you want to copy
Click MAKE A COPY
Enter a name for the workload. The name must be unique.
Update the workload and click CREATE WORKLOAD
Select the workload you want to delete
Click DELETE
On the dialog, click DELETE to confirm the deletion
Go to the API reference to view the available actions
To understand the condition of the workload, review the workload status in the Workloads table. For more information, see the workload statuses described in this section.
Listed below are a number of known issues when working with workloads and how to fix them:
NVIDIA Run:ai supports simultaneous submission of multiple workloads to a single GPU or multiple GPUs when using GPU fractions. This is achieved by slicing the GPU memory between the different workloads according to the requested GPU fraction, and by using NVIDIA’s GPU time-slicing to share the GPU compute runtime. NVIDIA Run:ai ensures each workload receives its exact share of the GPU memory (GPU memory × requested fraction), while the NVIDIA GPU time-slicing splits the GPU runtime evenly between the different workloads running on that GPU.
To provide customers with predictable and accurate GPU compute resource scheduling, NVIDIA Run:ai’s GPU time-slicing adds fractional compute capabilities on top of NVIDIA Run:ai GPU fraction capabilities.
While the default NVIDIA GPU time-slicing allows for sharing the GPU compute runtime evenly without splitting or limiting the runtime of each workload, NVIDIA Run:ai’s GPU time-slicing mechanism gives each workload exclusive access to the full GPU for a limited amount of time (lease time) in each scheduling cycle (plan time). This cycle repeats itself for the lifetime of the workload. Using the GPU runtime this way guarantees a workload is granted its requested GPU compute resources proportionally to its requested GPU fraction, and also allows splitting unused GPU compute time up to a requested Limit.
For example, when there are 2 workloads running on the same GPU, with NVIDIA’s default GPU time slicing, each workload gets 50% of the GPU compute runtime, even if one workload requests 25% of the GPU memory, and the other workload requests 75% of the GPU memory. With the NVIDIA Run:ai GPU time-slicing, the first workload will get 25% of the GPU compute time and the second will get 75%. If one of the workloads does not use its deserved GPU compute time, the others can split that time evenly between them. As shown in the example, if one of the workloads does not request the GPU for some time, the other will get the full GPU compute time.
NVIDIA Run:ai offers two GPU time-slicing modes:
Strict - Each workload gets its precise GPU compute fraction, which equals its requested GPU (memory) fraction. In terms of official Kubernetes resource specification, this means:
Fair - Each workload is guaranteed at least its GPU compute fraction, but at the same time can also use additional GPU runtime compute slices that are not used by other idle workloads. Those excess time slices are divided equally between all workloads running on that GPU (after each got at least its requested GPU compute fraction). In terms of official Kubernetes resource specification, this means:
The figure below illustrates how Strict time-slicing mode uses the GPU from Lease (slice) and Plan (cycle) perspective:
The figure below illustrates how Fair time-slicing mode uses the GPU from Lease (slice) and Plan (cycle) perspective:
Each GPU scheduling cycle is a plan. The plan is determined by the lease time and granularity (precision). By default, basic lease time is 250ms with 5% granularity (precision), which means the plan (cycle) time is: 250 / 0.05 = 5000ms (5 Sec). Using these values, a workload that requests gpu-fraction=0.5 gets 2.5s runtime out of the 5s cycle time.
Different workloads require different SLAs and precision, so it is also possible to tune the lease time and precision to customize the time-slicing capabilities for your cluster.
Once timeSlicing is enabled in the runaiconfig, all submitted GPU fraction or GPU memory workloads will have their gpu-compute-request/limit set automatically by the system, depending on the annotation used and the time-slicing mode:
Strict compute resources:
Fair compute resources:
NVIDIA Run:ai’s GPU time-slicing is a cluster flag that changes the default NVIDIA time-slicing used by GPU fractions.
Enable GPU time-slicing by setting the following cluster flag in the runaiconfig file:
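Based on the spec.global.core.timeSlicing.mode flag described earlier, the change is expected to look like the following sketch (the mode value is an example):
spec:
  global:
    core:
      timeSlicing:
        mode: fair   # timesharing (default) | strict | fair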
If the timeSlicing flag is not set, the system continues to use the default NVIDIA GPU time-slicing to maintain backward compatibility.
This section explains what data sources are and how to create and use them.
Data sources are a type of asset and represent a location where data is actually stored. They may represent a remote data location, such as NFS, Git, or S3, or a Kubernetes local resource, such as PVC, ConfigMap, HostPath, or Secret.
This configuration simplifies the mapping of the data into the workload’s file system and handles the mounting process during workload creation for reading and writing. These data sources are reusable and can be easily integrated and used by AI practitioners while submitting workloads across various scopes.
The data sources table can be found under Workload manager in the NVIDIA Run:ai platform.
runai project set "project-name"
runai workspace submit "workload-name" \
--image gcr.io/run-ai-lab/pytorch-example-jupyter \
--gpu-memory-request 4G --gpu-memory-limit 12G --large-shm \
--external-url container=8888 --name-prefix jupyter \
--command -- start-notebook.sh \
--NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''
curl -L 'https://<COMPANY-URL>/api/v1/workloads/workspaces' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"name": "workload-name",
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"command" : "start-notebook.sh",
"args" : "--NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''",
"image": "gcr.io/run-ai-lab/pytorch-example-jupyter",
"compute": {
"gpuDevicesRequest": 1,
"gpuMemoryRequest": "4G",
"gpuMemoryLimit": "12G",
"largeShmRequest": true
},
"exposedUrls" : [
{
"container" : 8888,
"toolType": "jupyter-notebook",
"toolName": "Jupyter"
}
]
}
}
runai project set "project-name"
runai workspace submit "workload-name" \
--image gcr.io/run-ai-lab/pytorch-example-jupyter --gpu-memory-request 4G \
--gpu-memory-limit 12G --large-shm --external-url container=8888 \
--name-prefix jupyter --command -- start-notebook.sh \
--NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''
curl -L 'https://<COMPANY-URL>/api/v1/workloads/workspaces' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"name": "workload-name",
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"command" : "start-notebook.sh",
"args" : "--NotebookApp.base_url=/${RUNAI_PROJECT}/${RUNAI_JOB_NAME} --NotebookApp.token=''",
"image": "gcr.io/run-ai-lab/pytorch-example-jupyter",
"compute": {
"gpuDevicesRequest": 1,
"gpuMemoryRequest": "4G",
"gpuMemoryLimit": "12G",
"largeShmRequest": true
},
"exposedUrls" : [
{
"container" : 8888,
"toolType": "jupyter-notebook",
"toolName": "Jupyter"
}
]
}
}
Medium The default value. The Admin can change the default to any of the following values: High, Low, Lowest, or None.
Lowest over quota weight ‘Lowest’ has a unique behavior: it can only use over-quota (unused overage) resources if no other department needs them, and any department with a higher over-quota weight can take the overage resources at any time.
None When set, the department cannot use more resources than the guaranteed quota in this node pool.
In case over quota is disabled, workloads running under subordinate projects are not able to use more resources than the department’s quota, but each project can still go over-quota (if enabled at the project level) up to the department’s quota.
Unlimited CPU(Cores) and CPU memory quotas are an exception - in this case, workloads of subordinated projects can consume available resources up to the physical limitation of the cluster or any of the node pools.
Creation time
The timestamp of when the workload was created
Completion time
The timestamp the workload reached a terminal state (failed/completed)
Connection(s)
The method by which you can access and interact with the running workload. It's essentially the "doorway" through which you can reach and use the tools the workload provides (e.g., node port, external URL). Click one of the values in the column to view the list of connections and their parameters.
Data source(s)
Data resources used by the workload
Environment
The environment used by the workload
Workload architecture
Standard or distributed. A standard workload consists of a single process. A distributed workload consists of multiple processes working together. These processes can run on different nodes.
GPU compute request
Amount of GPU devices requested
GPU compute allocation
Amount of GPU devices allocated
GPU memory request
Amount of GPU memory requested
GPU memory allocation
Amount of GPU memory allocated
Idle GPU devices
The number of allocated GPU devices that have been idle for more than 5 minutes
CPU compute request
Amount of CPU cores requested
CPU compute allocation
Amount of CPU cores allocated
CPU memory request
Amount of CPU memory requested
CPU memory allocation
Amount of CPU memory allocated
Cluster
The cluster that the workload is associated with
Running
Workload is currently in progress with all pods operational
All pods initialized (all containers in pods are ready)
Workload completion or failure
Degraded
Pods may not align with specifications, network services might be incomplete, or persistent volumes may be detached. Check your logs for specific details.
Pending - All pods are running but have issues.
Running - All pods are running with no issues.
Running - All resources are OK.
Completed - Workload finished with fewer resources
Failed - Workload failure or user-defined rules.
Deleting
Workload and its associated resources are being decommissioned from the cluster
Deleting the workload
Resources are fully deleted
Stopped
Workload is on hold and resources are intact but inactive
Stopping the workload without deleting resources
Transitioning back to the initializing phase or proceeding to deleting the workload
Failed
Image retrieval failed or containers experienced a crash. Check your logs for specific details
An error occurs preventing the successful completion of the workload
Terminal state
Completed
Workload has successfully finished its execution
The workload has finished processing without errors
Terminal state
GPU memory allocation
Amount of GPU memory allocated for the pod
Download table - Click MORE and then click Download as CSV. Export to CSV is limited to 20,000 rows.
Refresh - Click REFRESH to update the table with the latest data
Show/Hide details - Click to view additional information on the selected row
CPU memory utilization The utilization of the node's CPU memory (percentage of CPU memory), shown in a single graph over an adjustable period, allowing you to see the trends of CPU memory utilization on this node.
CPU memory usage The usage of the node's CPU memory (in GB or MB), shown in a single graph over an adjustable period, allowing you to see the trends of CPU memory usage on this node.
For GPUs charts - Click the GPU legend on the right-hand side of the chart, to activate or deactivate any of the GPU lines.
You can click the date picker to change the presented period
You can use your mouse to mark a sub-period in the graph for zooming in, and use Reset zoom to go back to the preset period
Changes in the period affect all graphs on this screen.
Distributed Training - Used for distributed tasks of all sorts
Inference - Used for inference and serving tasks
Job (legacy). This type is displayed only if enabled by your Administrator, under General settings → Workloads → Workload policies
Click CREATE WORKLOAD
Workload
The name of the workload
Type
The workload type
Preemptible
Is the workload preemptible (Yes/no)
Status
The different phases in a workload lifecycle
Project
The project in which the workload runs
Department
The department that the workload is associated with. This column is visible only if the department toggle is enabled by your administrator.
Created by
The user who created the workload
Running/requested pods
The number of running pods out of the requested
Creating
Workload setup is initiated in the cluster. Resources and pods are now provisioning.
A workload is submitted
A multi-pod group is created
Pending
Workload is queued and awaiting resource allocation
A pod group exists
All pods are scheduled
Initializing
Workload is retrieving images, starting containers, and preparing pods
All pods are scheduled
Pod
Pod name
Status
Pod lifecycle stages
Node
The node on which the pod resides
Node pool
The node pool in which the pod resides (applicable if node pools are enabled)
Image
The pod’s main image
GPU compute allocation
Amount of GPU devices allocated for the pod
Name
The name of the application running on the workload
Connection type
The network connection type selected for the workload
Access
Who is authorized to use this connection (everyone, specific groups/users)
Address
The connection URL
Copy button
Copy URL to clipboard
Connect button
Enabled only for supported tools
Data source
The name of the data source mounted to the workload
Type
The data source type
Cluster connectivity issues (there are issues with your connection to the cluster error message)
Verify that you are on a network that has been granted access to the cluster.
Reach out to your cluster admin for instructions on verifying this.
If you are an admin, see the troubleshooting section in the cluster documentation
Workload in “Initializing” status for some time
Check that you have access to the Container image registry.
Check the statuses of the pods in the pods’ dialog.
Check the event history for more details
Workload has been pending for some time
Check that you have the required quota.
Check the project’s available quota in the project dialog.
Check that all services needed to run are bound to the workload.
Check the event history for more details.
PVCs created using the K8s API or kubectl are not visible or mountable in NVIDIA Run:ai
This is by design.
Create a new data source of type PVC in the NVIDIA Run:ai UI
In the Data mount section, select Existing PVC
Select the PVC you created via the K8S API
You are now able to select and mount this PVC in your NVIDIA Run:ai submitted workloads.
Workload is not visible in the UI
Check that the workload hasn’t been deleted.
See the “Deleted” tab in the workloads view

All pods are initialized or a failure to initialize is detected
Annotation | Value | GPU Compute Request | GPU Compute Limit
gpu-fraction | x | x | x
gpu-memory | x | 0 | 1.0

Annotation | Value | GPU Compute Request | GPU Compute Limit
gpu-fraction | x | x | 1.0
gpu-memory | x | 0 | 1.0


gpu-compute-request = gpu-compute-limit = gpu-(memory-)fraction

gpu-compute-request = gpu-(memory-)fraction
gpu-compute-limit = 1.0

global:
  core:
    timeSlicing:
      mode: fair/strict

GPU allocation ratio
The ratio of Allocated GPUs to GPU quota. This number reflects how well the project’s GPU quota is utilized by its descendent workloads. A number higher than 100% indicates the project is using over quota GPUs.
GPU quota
The GPU quota allocated to the project. This number represents the sum of all node pools’ GPU quota allocated to this project.
Allocated CPUs (Core)
The total number of CPU cores allocated by workloads submitted within this project. (This column is only available if the CPU Quota setting is enabled, as described below).
Allocated CPU Memory
The total amount of CPU memory allocated by successfully scheduled workloads under this project. (This column is only available if the CPU Quota setting is enabled, as described below).
CPU quota (Cores)
CPU quota allocated to this project. (This column is only available if the CPU Quota setting is enabled, as described below). This number represents the sum of all node pools’ CPU quota allocated to this project. The ‘unlimited’ value means the CPU (cores) quota is not bounded and workloads using this project can use as many CPU (cores) resources as they need (if available).
CPU memory quota
CPU memory quota allocated to this project. (This column is only available if the CPU Quota setting is enabled, as described below). This number represents the sum of all node pools’ CPU memory quota allocated to this project. The ‘unlimited’ value means the CPU memory quota is not bounded and workloads using this Project can use as much CPU memory resources as they need (if available).
CPU allocation ratio
The ratio of Allocated CPUs (cores) to CPU quota (cores). This number reflects how much the project’s ‘CPU quota’ is utilized by its descendent workloads. A number higher than 100% indicates the project is using over quota CPU cores.
CPU memory allocation ratio
The ratio of Allocated CPU memory to CPU memory quota. This number reflects how well the project’s ‘CPU memory quota’ is utilized by its descendent workloads. A number higher than 100% indicates the project is using over quota CPU memory.
Node affinity of training workloads
The list of NVIDIA Run:ai node-affinities. Any training workload submitted within this project must specify one of those NVIDIA Run:ai node affinities, otherwise it is not submitted.
Node affinity of interactive workloads
The list of NVIDIA Run:ai node-affinities. Any interactive (workspace) workload submitted within this project must specify one of those NVIDIA Run:ai node affinities, otherwise it is not submitted.
Idle time limit of training workloads
The time in days:hours:minutes after which the project stops a training workload not using its allocated GPU resources.
Idle time limit of preemptible workloads
The time in days:hours:minutes after which the project stops a preemptible interactive (workspace) workload not using its allocated GPU resources.
Idle time limit of non preemptible workloads
The time in days:hours:minutes after which the project stops a non-preemptible interactive (workspace) workload not using its allocated GPU resources.
Interactive workloads time limit
The duration in days:hours:minutes after which the project stops an interactive (workspace) workload
Training workloads time limit
The duration in days:hours:minutes after which the project stops a training workload
Creation time
The timestamp for when the project was created
Workload(s)
The list of workloads associated with the project. Click the values under this column to view the list of workloads with their resource parameters (as described below).
Cluster
The cluster that the project is associated with
Allocated CPU memory
The actual amount of CPU memory allocated by workloads using this node pool under this Project. The number of Allocated CPU memory may temporarily surpass the CPU memory quota if over quota is used.
Order of priority
The default order in which the Scheduler uses node-pools to schedule a workload. This is used only if the order of priority of node pools is not set in the workload during submission, either by an admin policy or the user. An empty value means the node pool is not part of the project’s default list, but can still be chosen by an admin policy or the user during workload submission
GPU compute request
The amount of GPU compute requested (floating number, represents either a portion of the GPU compute, or the number of whole GPUs requested)
GPU memory request
The amount of GPU memory requested (floating number, can either be presented as a portion of the GPU memory, an absolute memory size in MB or GB, or a MIG profile)
CPU memory request
The amount of CPU memory requested (floating number, presented as an absolute memory size in MB or GB)
CPU compute request
The amount of CPU compute requested (floating number, represents the number of requested Cores)
Download table - Click MORE and then click Download as CSV. Export to CSV is limited to 20,000 rows.
By default, Run:ai creates a namespace based on the Project name (in the form of runai-<name>)
Alternatively, you can choose an existing namespace created for you by the cluster administrator
In the Quota management section, you can set the quota parameters and prioritize resources
Order of priority This column is displayed only if more than one node pool exists. The default order in which the Scheduler uses node pools to schedule a workload. This means the Scheduler first tries to allocate resources using the highest priority node pool, then the next in priority, until it reaches the lowest priority node pool list, then the Scheduler starts from the highest again. The Scheduler uses the Project list of prioritized node pools, only if the order of priority of node pools is not set in the workload during submission, either by an admin policy or by the user. Empty value means the node pool is not part of the Project’s default node pool priority list, but a node pool can still be chosen by the admin policy or a user during workload submission
Node pool This column is displayed only if more than one node pool exists. It represents the name of the node pool
Under the QUOTA tab
Over-quota state Indicates if over-quota is enabled or disabled as set in the SCHEDULING PREFERENCES tab. If over-quota is set to None, then it is disabled.
GPU devices The number of GPUs you want to allocate for this project in this node pool (decimal number)
Select or enter the subject identifier:
User Email for a local user created in NVIDIA Run:ai or for SSO user as recognized by the IDP
Group name as recognized by the IDP
Application name as created in NVIDIA Run:ai
Select a role
Click SAVE RULE
Click CLOSE
Parameter - The workload submission parameter that Rules and Defaults are applied to
Type (applicable for data sources only) - The data source type (Git, S3, NFS, PVC, etc.)
Default - The default value of the Parameter
Rule - Set up constraints on workload policy fields
Source - The origin of the applied policy (cluster, department or project)
Project
The name of the project
Department
The name of the parent department. Several projects may be grouped under a department.
Status
The Project creation status. Projects are manifested as Kubernetes namespaces. The project status represents the Namespace creation status.
Node pool(s) with quota
The node pools associated with the project. By default, a new project is associated with all node pools within its associated cluster. Administrators can change the node pools’ quota parameters for a project. Click the values under this column to view the list of node pools with their parameters (as described below)
Subject(s)
The users, SSO groups, or applications with access to the project. Click the values under this column to view the list of subjects with their parameters (as described below). This column is only viewable if your role in the NVIDIA Run:ai platform allows you those permissions.
Allocated GPUs
The total number of GPUs allocated by successfully scheduled workloads under this project
Node pool
The name of the node pool, given by the administrator during node pool creation. All clusters have a default node pool created automatically by the system and named ‘default’.
GPU quota
The amount of GPU quota the administrator dedicated to the project for this node pool (floating number, e.g. 2.3 means 230% of GPU capacity).
CPU (Cores)
The amount of CPU (cores) quota the administrator has dedicated to the project for this node pool (floating number, e.g. 1.3 cores = 1300 milli-cores). The ‘unlimited’ value means the CPU (cores) quota is not bounded and workloads using this node pool can use as many CPU (cores) resources as they require (if available).
CPU memory
The amount of CPU memory quota the administrator has dedicated to the project for this node pool (floating number, in MB or GB). The ‘unlimited’ value means the CPU memory quota is not bounded and workloads using this node pool can use as much CPU memory resource as they need (if available).
Allocated GPUs
The actual amount of GPUs allocated by workloads using this node pool under this project. The number of allocated GPUs may temporarily surpass the GPU quota if over quota is used.
Allocated CPU (Cores)
The actual amount of CPUs (cores) allocated by workloads using this node pool under this project. The number of allocated CPUs (cores) may temporarily surpass the CPUs (Cores) quota if over quota is used.
Subject
A user, SSO group, or application assigned with a role in the scope of this Project
Type
The type of subject assigned to the access rule (user, SSO group, or application)
Scope
The scope of this project in the organizational tree. Click the name of the scope to view the organizational tree diagram, you can only view the parts of the organizational tree for which you have permission to view.
Role
The role assigned to the subject, in this project’s scope
Authorized by
The user who granted the access rule
Last updated
The last time the access rule was updated
Workload
The name of the workload, given during its submission. Optionally, an icon describing the type of workload is also visible
Type
The type of the workload, e.g. Workspace, Training, Inference
Status
The state of the workload and time elapsed since the last status change
Created by
The subject that created this workload
Running/ requested pods
The number of running pods out of the number of requested pods for this workload. For example, a distributed workload requesting 4 pods may be in a state where only 2 are running and 2 are pending
Creation time
The date and time the workload was created

The data sources table comprises the following columns:
Data source
The name of the data source
Description
A description of the data source
Type
The type of data source connected – e.g., S3 bucket, PVC, or others
Status
The different lifecycle and representation of the data source condition
Scope
The scope of the data source within the organizational tree. Click the scope name to view the organizational tree diagram
Kubernetes name
The unique Kubernetes name of the data source as it appears in the cluster
The following table describes the data sources' condition and whether they were created successfully for the selected scope.
No issues found
No issues were found while creating the data source
Issues found
Issues were found while propagating the data source credentials
Issues found
The data source couldn’t be created at the cluster
Creating…
The data source is being created
No status / “-”
When the data source’s scope is an account, the current version of the cluster is not up to date, or the asset is not a cluster-syncing entity, the status can’t be displayed
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Download table - Click MORE and then click ‘Download as CSV’. Export to CSV is limited to 20,000 rows.
Refresh - Click REFRESH to update the table with the latest data
To create a new data source:
Click +NEW DATA SOURCE
Select the data source type from the list. Follow the step-by-step guide for each data source type:
To copy an existing data source:
Select the data source you want to copy
Click MAKE A COPY
Enter a name for the data source. The name must be unique.
Update the data source and click CREATE DATA SOURCE
To rename an existing data source:
Select the data source you want to rename
Click Rename and edit the name/description
To delete a data source:
Select the data source you want to delete
Click DELETE
Confirm you want to delete the data source
Add PVCs in advance to be used when creating a PVC-type data source via the NVIDIA Run:ai UI.
The actions taken by the admin are based on the scope (cluster, department or project) that the admin wants for data source of type PVC. Follow the steps below for each required scope:
Locate the PVC in the NVIDIA Run:ai namespace (runai)
Provide NVIDIA Run:ai with visibility and authorization to share the PVC to your selected scope by adding the following label: run.ai/cluster-wide: "true" (see the example kubectl commands after these steps)
The PVC is now displayed for that scope in the list of existing PVCs.
Locate the PVC in the NVIDIA Run:ai namespace (runai)
To authorize NVIDIA Run:ai to use the PVC, label it: run.ai/department: "<department-id>"
The PVC is now displayed for that scope in the list of existing PVCs.
Locate the PVC in the project’s namespace.
The PVC is now displayed for that scope in the list of existing PVCs.
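As an illustration of the labeling steps above, the labels can be applied with kubectl. The PVC name my-pvc is a placeholder; the label keys are those listed in the steps above:

# Cluster scope: make the PVC in the runai namespace available to all scopes
kubectl label pvc my-pvc -n runai run.ai/cluster-wide=true
# Department scope: authorize a specific department (replace <department-id>)
kubectl label pvc my-pvc -n runai run.ai/department=<department-id>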
Add ConfigMaps in advance to be used when creating a ConfigMap-type data source via the NVIDIA Run:ai UI.
Locate the ConfigMap in the NVIDIA Run:ai namespace (runai)
To authorize NVIDIA Run:ai to use the ConfigMap, label it: run.ai/cluster-wide: "true" (see the example kubectl commands after these steps)
The ConfigMap must have a label of run.ai/resource: <resource-name>
The ConfigMap is now displayed for that scope in the list of existing ConfigMaps.
Locate the ConfigMap in the NVIDIA Run:ai namespace (runai)
To authorize NVIDIA Run:ai to use the ConfigMap, label it: run.ai/department: "<department-id>"
The ConfigMap must have a label of run.ai/resource: <resource-name>
The ConfigMap is now displayed for that scope in the list of existing ConfigMaps.
Locate the ConfigMap in the project’s namespace
The ConfigMap must have a label of run.ai/resource: <resource-name>
The ConfigMap is now displayed for that scope in the list of existing ConfigMaps.
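As with PVCs, the labels above can be applied with kubectl. The ConfigMap name my-configmap is a placeholder; the label keys and placeholders come from the steps above:

# Cluster scope: authorize NVIDIA Run:ai and tag the resource name
kubectl label configmap my-configmap -n runai run.ai/cluster-wide=true run.ai/resource=<resource-name>
# Department scope
kubectl label configmap my-configmap -n runai run.ai/department=<department-id> run.ai/resource=<resource-name>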
To view the available actions, go to the Data sources API reference.
This quick start provides a step-by-step walkthrough of the core scheduling concepts - over quota, fairness, and preemption. It demonstrates the simplicity of resource provisioning and how the system eliminates bottlenecks by allowing users or teams to exceed their resource quota when free GPUs are available.
Over quota - In this scenario, team-a runs two training workloads and team-b runs one. Team-a has a quota of 3 GPUs and is over quota by 1 GPU, while team-b has a quota of 1 GPU. The system allows this over quota usage as long as there are available GPUs in the cluster.
Fairness and preemption - Since the cluster is already at full capacity, when team-b launches a new b2 workload requiring 1 GPU, team-a can no longer remain over quota. To maintain fairness, the NVIDIA Run:ai Scheduler preempts workload a1 (1 GPU), freeing up resources for team-b.
You have created two projects - team-a and team-b - or have them created for you.
Each project has an assigned quota of 2 GPUs.
Go to Workload manager → Workloads
Click +NEW WORKLOAD and select Training
Select under which cluster to create the workload
Select the project named team-a
Go to the Workload Manager → Workloads
Click +NEW WORKLOAD and select Training
Select the cluster where the previous training workload was created
Select the project named team-a
Go to the Workload Manager → Workloads
Click +NEW WORKLOAD and select Training
Select the cluster where the previous training was created
Select the project named team-b
System status after run:
System status after run:
System status after run:
System status after run:
Go to the Workload Manager → Workloads
Click +NEW WORKLOAD and select Training
Select the cluster where the previous training was created
Select the project named team-b
Workloads status after run:
Workloads status after run:
Workloads status after run:
Workloads status after run:
Manage and monitor your newly created workload using the table.
CPU memory This column is displayed only if CPU quota is enabled via the General settings. Represents the amount of CPU memory you want to allocate for this project in this node pool (in Megabytes or Gigabytes).
Under the SCHEDULING PREFERENCES tab
Project priority Sets the project's scheduling priority compared to other projects in the same node pool, using one of the following priorities:
Highest - 255
VeryHigh - 240
High - 210
MediumHigh - 180
Medium - 150
MediumLow - 100
Low - 50
VeryLow - 20
Lowest - 1
For v2.21, the default value is MediumLow. All Projects are set with the same default value, therefore there is no change of scheduling behavior unless the Administrator changes any Project priority values. To learn more about Project priority, see .
Over-quota If over quota weight is enabled via the General settings, then over quota weight is presented, otherwise over quota is presented
Over-quota When enabled, the project can use non-guaranteed overage resources above its quota in this node pool. The amount of the non-guaranteed overage resources for this project is calculated proportionally to the project quota in this node pool. When disabled, the project cannot use more resources than the guaranteed quota in this node pool.
Over quota weight Represents a weight used to calculate the amount of non-guaranteed overage resources a project can get on top of its quota in this node pool. All unused resources are split between projects that require the use of overage resources:
Project max. GPU device allocation Represents the maximum GPU device allocation the project can get from this node pool - the maximum sum of quota and over-quota GPUs (decimal number)
When creating a PVC-type data source and selecting the ‘New PVC’ option, the PVC is immediately created in the cluster (even if no workload has requested this PVC).
Set the data origin
Set the S3 service URL
Select the credential
None - for public buckets
Credential names - This option is relevant for private buckets based on existing credentials that were created for the scope.
To add new credentials to the credentials list, and for additional information, check the Credentials article.
Enter the bucket name
Set the data target location
container path
Click CREATE DATA SOURCE
None - for public repositories
Credential names - This option applies to private repositories based on existing credentials that were created for the scope.
To add new credentials to the credentials list, and for additional information, check the Credentials article.
Workload(s)
The list of existing workloads that use the data source
Template(s)
The list of workload templates that use the data source
Created by
The user who created the data source
Creation time
The timestamp for when the data source was created
Cluster
The cluster that the data source is associated with

Under Workload architecture, select Standard
Select Start from scratch to launch a new training quickly
Enter a1 as the workload name
Under Submission, select Flexible and click CONTINUE
Under Environment, enter the Image URL - runai.jfrog.io/demo/quickstart
Click the load icon. A side pane appears, displaying a list of available compute resources. Select the ‘one-gpu’ compute resource for your workload.
If ‘one-gpu’ is not displayed, follow the below steps to create a one-time compute resource configuration:
Set GPU devices per pod - 1
Set GPU memory per device
Select % (of device) - Fraction of a GPU device's memory
Set the memory Request - 100 (the workload will allocate 100% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE TRAINING
Go to the Workload Manager → Workloads
Click +NEW WORKLOAD and select Training
Select under which cluster to create the workload
Select the project named team-a
Under Workload architecture, select Standard
Select Start from scratch to launch a new training quickly
Enter a1 as the workload name
Under Submission, select Original and click CONTINUE
Create a new environment:
Click +NEW ENVIRONMENT
Enter quick-start as the name for the environment. The name must be unique.
Enter the Image URL - runai.jfrog.io/demo/quickstart
Select the ‘one-gpu’ compute resource for your workload
If ‘one-gpu’ is not displayed in the gallery, follow the below steps:
Click +NEW COMPUTE RESOURCE
Enter one-gpu as the name for the compute resource. The name must be unique.
Click CREATE TRAINING
Copy the following command to your terminal. For more details, see CLI reference:
Copy the following command to your terminal. For more details, see CLI reference:
Copy the following command to your terminal. Make sure to update the following parameters. For more details, see Trainings API.
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in Step 1
<PROJECT-ID> - The ID of the Project the workload is running on. You can get the Project ID via the .
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the .
Under Workload architecture, select Standard
Select Start from scratch to launch a new training quickly
Enter a2 as the workload name
Under Submission, select Flexible and click CONTINUE
Under Environment, enter the Image URL - runai.jfrog.io/demo/quickstart
Click the load icon. A side pane appears, displaying a list of available compute resources. Select the ‘two-gpus’ compute resource for your workload.
If ‘two-gpus’ is not displayed, follow the below steps to create a one-time compute resource configuration:
Set GPU devices per pod - 2
Set GPU memory per device
Select % (of device) - Fraction of a GPU device's memory
Set the memory Request - 100 (the workload will allocate 100% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE TRAINING
Go to the Workload Manager → Workloads
Click +NEW WORKLOAD and select Training
Select the cluster where the previous training workload was created
Select the project named team-a
Under Workload architecture, select Standard
Select Start from scratch to launch a new training quickly
Enter a2 as the workload name
Under Submission, select Original and click CONTINUE
Select the environment created in
Select the ‘two-gpus’ compute resource for your workload
If ‘two-gpus’ is not displayed in the gallery, follow the below steps:
Click +NEW COMPUTE RESOURCE
Enter two-gpus as the name for the compute resource. The name must be unique.
Click CREATE TRAINING
Copy the following command to your terminal. For more details, see CLI reference:
Copy the following command to your terminal. For more details, see CLI reference:
Copy the following command to your terminal. Make sure to update the following parameters. For more details, see Trainings API.
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in Step 1
<PROJECT-ID> - The ID of the Project the workload is running on. You can get the Project ID via the .
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the .
Under Workload architecture, select Standard
Select Start from scratch to launch a new training quickly
Enter b1 as the workload name
Under Submission, select Flexible and click CONTINUE
Under Environment, enter the Image URL - runai.jfrog.io/demo/quickstart
Click the load icon. A side pane appears, displaying a list of available compute resources. Select the ‘one-gpu’ compute resource for your workload.
If ‘one-gpu’ is not displayed, follow the below steps to create a one-time compute resource configuration:
Set GPU devices per pod - 1
Set GPU memory per device
Select % (of device) - Fraction of a GPU device's memory
Set the memory Request - 100 (the workload will allocate 100% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE TRAINING
Go to the Workload Manager → Workloads
Click +NEW WORKLOAD and select Training
Select the cluster where the previous training was created
Select the project named team-b
Under Workload architecture, select Standard
Select Start from scratch to launch a new training quickly
Enter b1 as the workload name
Under Submission, select Original and click CONTINUE
Create a new environment:
Click +NEW ENVIRONMENT
Enter quick-start as the name for the environment. The name must be unique.
Enter the Image URL - runai.jfrog.io/demo/quickstart
Select the ‘one-gpu’ compute resource for your workload
If ‘one-gpu’ is not displayed in the gallery, follow the below steps:
Click +NEW COMPUTE RESOURCE
Enter one-gpu as the name for the compute resource. The name must be unique.
Click CREATE TRAINING
Copy the following command to your terminal. For more details, see CLI reference:
Copy the following command to your terminal. For more details, see CLI reference:
Copy the following command to your terminal. Make sure to update the following parameters. For more details, see Trainings API.
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in Step 1
<PROJECT-ID> - The ID of the Project the workload is running on. You can get the Project ID via the .
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the .
Under Workload architecture, select Standard
Select Start from scratch to launch a new training quickly
Enter b2 as the workload name
Under Submission, select Flexible and click CONTINUE
Under Environment, enter the Image URL - runai.jfrog.io/demo/quickstart
Click the load icon. A side pane appears, displaying a list of available compute resources. Select the ‘one-gpu’ compute resource for your workload.
If ‘one-gpu’ is not displayed, follow the below steps to create a one-time compute resource configuration:
Set GPU devices per pod - 1
Set GPU memory per device
Select % (of device) - Fraction of a GPU device's memory
Set the memory Request - 100 (the workload will allocate 100% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE TRAINING
Go to the Workload Manager → Workloads
Click +NEW WORKLOAD and select Training
Select the cluster where the previous training was created
Select the project named team-b
Under Workload architecture, select Standard
Select Start from scratch to launch a new training quickly
Enter b2 as the workload name
Under Submission, select Original and click CONTINUE
Select the environment created in
Select the compute resource created in
Click CREATE TRAINING
Copy the following command to your terminal. For more details, see CLI reference:
Copy the following command to your terminal. For more details, see CLI reference:
Copy the following command to your terminal. Make sure to update the following parameters. For more details, see Trainings API.
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in Step 1
<PROJECT-ID> - The ID of the Project the workload is running on. You can get the Project ID via the .
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the .


Browse to the provided NVIDIA Run:ai user interface and log in with your credentials.
Run the below --help command to obtain the login options and log in according to your setup:
runai login --help
Log in using the following command. You will be prompted to enter your username and password:
runai login
To use the API, you will need to obtain a token as shown in API authentication.
The NVIDIA Run:ai cluster is a Kubernetes application. This section explains the required hardware and software system requirements for the NVIDIA Run:ai cluster.
The system requirements needed depend on where the control plane and cluster are installed. The following applies for Kubernetes only:
If you are installing the first cluster and control plane on the same Kubernetes cluster, and are not required.
If you are installing the first cluster and control plane on separate Kubernetes clusters, the and are required.
runai training submit a1 -i runai.jfrog.io/demo/quickstart -g 1 -p team-a
runai submit a1 -i runai.jfrog.io/demo/quickstart -g 1 -p team-a
curl --location 'https://<COMPANY-URL>/api/v1/workloads/trainings' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <TOKEN>' \
--data '{
"name": "a1",
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"image":"runai.jfrog.io/demo/quickstart",
"compute": {
"gpuDevicesRequest": 1
}
}
}'
runai training submit a2 -i runai.jfrog.io/demo/quickstart -g 2 -p team-a
runai submit a2 -i runai.jfrog.io/demo/quickstart -g 2 -p team-a
curl --location 'https://<COMPANY-URL>/api/v1/workloads/trainings' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <TOKEN>' \
--data '{
"name": "a2",
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"image":"runai.jfrog.io/demo/quickstart",
"compute": {
"gpuDevicesRequest": 2
}
}
}'
runai training submit b1 -i runai.jfrog.io/demo/quickstart -g 1 -p team-b
runai submit b1 -i runai.jfrog.io/demo/quickstart -g 1 -p team-b
curl --location 'https://<COMPANY-URL>/api/v1/workloads/trainings' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <TOKEN>' \
--data '{
"name": "b1",
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"image":"runai.jfrog.io/demo/quickstart",
"compute": {
"gpuDevicesRequest": 1
}
}
}'
runai training submit b2 -i runai.jfrog.io/demo/quickstart -g 1 -p team-b
runai submit b2 -i runai.jfrog.io/demo/quickstart -g 1 -p team-b
curl --location 'https://<COMPANY-URL>/api/v1/workloads/trainings' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <TOKEN>' \
--data '{
"name": "b2",
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"image":"runai.jfrog.io/demo/quickstart",
"compute": {
"gpuDevicesRequest": 1
}
}
}'
~ runai workload list -A
Workload Type Status Project Running/Req.Pods GPU Alloc.
────────────────────────────────────────────────────────────────────────────
a2 Training Running team-a 1/1 2.00
b1 Training Running team-b 1/1 1.00
a1 Training Running team-a 0/1 1.00
~ runai list -A
Workload Type Status Project Running/Req.Pods GPU Alloc.
────────────────────────────────────────────────────────────────────────────
a2 Training Running team-a 1/1 2.00
b1 Training Running team-b 1/1 1.00
a1 Training Running team-a 0/1 1.00
curl --location 'https://<COMPANY-URL>/api/v1/workloads' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <TOKEN>' \ #<TOKEN> is the API access token obtained in Step 1.
--data ''
~ runai workload list -A
Workload Type Status Project Running/Req.Pods GPU Alloc.
────────────────────────────────────────────────────────────────────────────
a2 Training Running team-a 1/1 2.00
b1 Training Running team-b 1/1 1.00
b2 Training Running team-b 1/1 1.00
a1 Training Pending team-a 0/1 1.00
~ runai list -A
Workload Type Status Project Running/Req.Pods GPU Alloc.
────────────────────────────────────────────────────────────────────────────
a2 Training Running team-a 1/1 2.00
b1 Training Running team-b 1/1 1.00
b2 Training Running team-b 1/1 1.00
a1 Training Pending team-a 0/1 1.00
curl --location 'https://<COMPANY-URL>/api/v1/workloads' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <TOKEN>' \ #<TOKEN> is the API access token obtained in Step 1.
--data ''
Medium The default value. The Administrator can change the default to any of the following values - High, Low, Lowest, or None.
Lowest Over quota weight ‘Lowest’ has a unique behavior since it can only use over-quota (unused overage) resources if no other project needs them. Any project with a higher over quota weight can take those overage resources at any time.
None When set, the project cannot use more resources than the guaranteed quota in this node pool
Unlimited CPU(Cores) and CPU memory quotas are an exception. In this case, workloads of subordinated projects can consume available resources up to the physical limitation of the cluster or any of the node pools.
Click CREATE ENVIRONMENT
The newly created environment will be selected automatically
Set GPU devices per pod - 1
Set GPU memory per device
Select % (of device) - Fraction of a GPU device's memory
Set the memory Request - 100 (the workload will allocate 100% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE COMPUTE RESOURCE
The newly created compute resource will be selected automatically
Set GPU devices per pod - 2
Set GPU memory per device
Select % (of device) - Fraction of a GPU device's memory
Set the memory Request - 100 (the workload will allocate 100% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE COMPUTE RESOURCE
The newly created compute resource will be selected automatically
Click CREATE ENVIRONMENT
The newly created environment will be selected automatically
Set GPU devices per pod - 1
Set GPU memory per device
Select % (of device) - Fraction of a GPU device's memory
Set the memory Request - 100 (the workload will allocate 100% of the GPU memory)
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Click CREATE COMPUTE RESOURCE
The newly created compute resource will be selected automatically
The following hardware requirements are for the Kubernetes cluster nodes. By default, all NVIDIA Run:ai cluster services run on all available nodes. For production deployments, you may want to set node roles to separate system and worker nodes, reduce downtime, and save CPU cycles on expensive GPU machines.
x86 - Supported for both Kubernetes and OpenShift deployments.
ARM - Supported for Kubernetes only. ARM is currently not supported for OpenShift.
This configuration is the minimum requirement you need to install and use NVIDIA Run:ai cluster.
CPU
10 cores
Memory
20GB
Disk space
50GB
The NVIDIA Run:ai cluster supports x86 and ARM CPUs, and any NVIDIA GPU supported by the NVIDIA GPU Operator. GPU compatibility depends on the version of the NVIDIA GPU Operator installed in the cluster. NVIDIA Run:ai supports GPU Operator versions 22.9 to 25.3. For the list of supported GPU models, see Supported NVIDIA Data Center GPUs and Systems. To install the GPU Operator, see NVIDIA GPU Operator.
The following configuration represents the minimum hardware requirements for installing and operating the NVIDIA Run:ai cluster on worker nodes. Each node must meet these specifications:
CPU
2 cores
Memory
4GB
NVIDIA Run:ai workloads must be able to access data from any worker node in a uniform way, to access training data and code as well as save checkpoints, weights, and other machine-learning-related artifacts.
Typical protocols are Network File Storage (NFS) or Network-attached storage (NAS). NVIDIA Run:ai cluster supports both, for more information see Shared storage.
The following software requirements must be fulfilled on the Kubernetes cluster.
Any Linux operating system supported by both Kubernetes and NVIDIA GPU Operator
NVIDIA Run:ai cluster on Google Kubernetes Engine (GKE) supports both Ubuntu and Container Optimized OS (COS). COS is supported only with NVIDIA GPU Operator 24.6 or newer, and NVIDIA Run:ai cluster version 2.19 or newer.
NVIDIA Run:ai cluster on Elastic Kubernetes Service (EKS) does not support Bottlerocket or Amazon Linux.
NVIDIA Run:ai cluster on Oracle Kubernetes Engine (OKE) supports only Ubuntu.
Internal tests are being performed on Ubuntu 22.04 and CoreOS for OpenShift.
NVIDIA Run:ai cluster requires Kubernetes. The following Kubernetes distributions are supported:
Vanilla Kubernetes
OpenShift Container Platform (OCP)
NVIDIA Base Command Manager (BCM)
Elastic Kubernetes Engine (EKS)
Google Kubernetes Engine (GKE)
Azure Kubernetes Service (AKS)
Oracle Kubernetes Engine (OKE)
Rancher Kubernetes Engine (RKE1)
Rancher Kubernetes Engine 2 (RKE2)
For existing Kubernetes clusters, see the following Kubernetes version support matrix for the latest NVIDIA Run:ai cluster releases:
NVIDIA Run:ai version | Kubernetes versions | OpenShift versions
v2.17 | 1.27 to 1.29 | 4.12 to 4.15
v2.18 | 1.28 to 1.30 | 4.12 to 4.16
v2.19 | 1.28 to 1.31 | 4.12 to 4.17
v2.20 | 1.29 to 1.32
For information on supported versions of managed Kubernetes, it's important to consult the release notes provided by your Kubernetes service provider. There, you can confirm the specific version of the underlying Kubernetes platform supported by the provider, ensuring compatibility with NVIDIA Run:ai. For an up-to-date end-of-life statement see Kubernetes Release History or OpenShift Container Platform Life Cycle Policy.
NVIDIA Run:ai supports the following container runtimes. Make sure your Kubernetes cluster is configured with one of these runtimes:
Containerd (default in Kubernetes)
CRI-O (default in OpenShift)
NVIDIA Run:ai supports restricted policy for Pod Security Admission (PSA) on OpenShift only. Other Kubernetes distributions are only supported with privileged policy.
For NVIDIA Run:ai on OpenShift to run with PSA restricted policy:
Label the runai namespace as described in Pod Security Admission with the following labels:
The workloads submitted through NVIDIA Run:ai should comply with the restrictions of PSA restricted policy. This can be enforced using Policies.
NVIDIA Run:ai must be installed in a namespace or project (OpenShift) called runai. Use the following to create the namespace/project:
NVIDIA Run:ai cluster requires Kubernetes Ingress Controller to be installed on the Kubernetes cluster.
OpenShift, RKE, and RKE2 come with a pre-installed ingress controller.
Internal tests are being performed on NGINX, Rancher NGINX, OpenShift Router, and Istio.
Make sure that a default ingress controller is set.
There are many ways to install and configure different ingress controllers. A simple example of installing and configuring the NGINX ingress controller using Helm:
You must have a Fully Qualified Domain Name (FQDN) to install the NVIDIA Run:ai cluster (ex: runai.mycorp.local). This cannot be an IP. The domain name must be accessible inside the organization's private network.
In order to make inference serving endpoints available externally to the cluster, configure a wildcard DNS record (*.runai-inference.mycorp.local) that resolves to the cluster’s public IP address, or to the cluster's load balancer IP address in on-prem environments. This ensures each inference workload receives a unique subdomain under the wildcard domain.
You must have a TLS certificate that is associated with the FQDN for HTTPS access. Create a Kubernetes Secret named runai-cluster-domain-tls-secret in the runai namespace and include the path to the TLS --cert and its corresponding private --key by running the following:
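The command itself is not reproduced at this point in the text. A sketch using the secret name, namespace, and flags mentioned above (the certificate and key paths are placeholders):

kubectl create secret tls runai-cluster-domain-tls-secret -n runai \
  --cert=<path-to-cert-file> \
  --key=<path-to-key-file>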
NVIDIA Run:ai uses the OpenShift default Ingress router for serving. The TLS certificate configured for this router must be issued by a trusted CA. For more details, see the OpenShift documentation on configuring certificates.
For serving inference endpoints over HTTPS, NVIDIA Run:ai requires a dedicated wildcard TLS certificate that matches the fully qualified domain name (FQDN) used for inference. This certificate ensures secure external access to inference workloads.
A local certificate authority serves as the root certificate for organizations that cannot use publicly trusted certificate authority. Follow the below steps to configure the local certificate authority.
In air-gapped environments, you must configure and install the local CA's public key in the Kubernetes cluster. This is required for the installation to succeed:
Add the public key to the required namespace:
When installing the cluster, make sure the following flag is added to the helm command --set global.customCA.enabled=true. See Install cluster.
NVIDIA Run:ai cluster requires NVIDIA GPU Operator to be installed on the Kubernetes cluster. GPU Operator versions 22.9 to 25.3 are supported.
For air-gapped installation, follow the instructions in Install NVIDIA GPU Operator in Air-Gapped Environments.
See Installing the NVIDIA GPU Operator, followed by notes below:
Use the default gpu-operator namespace. Otherwise, you must specify the target namespace using the flag runai-operator.config.nvidiaDcgmExporter.namespace as described in customized cluster installation.
NVIDIA drivers may already be installed on the nodes. In such cases, use the NVIDIA GPU Operator flags --set driver.enabled=false. DGX OS is one such example as it comes bundled with NVIDIA Drivers.
For distribution-specific additional instructions see below:
For troubleshooting information, see the NVIDIA GPU Operator Troubleshooting Guide.
When deploying on clusters with RDMA or Multi Node NVLink‑capable nodes (e.g. B200, GB200), the NVIDIA Network Operator is required to enable high-performance networking features such as GPUDirect RDMA in Kubernetes. Network Operator versions v24.4 and above are supported.
The Network Operator works alongside the NVIDIA GPU Operator to provide:
NVIDIA networking drivers for advanced network capabilities.
Kubernetes device plugins to expose high‑speed network hardware to workloads.
Secondary network components to support network‑intensive applications.
The Network Operator must be installed and configured as follows:
Install the network operator as detailed in Network Operator Deployment on Vanilla Kubernetes Cluster.
Configure SR-IOV InfiniBand support as detailed in Network Operator Deployment with an SR-IOV InfiniBand Network.
For air-gapped installation, follow the instructions in Network Operator Deployment in an Air-gapped Environment.
When deploying on clusters with Multi-Node NVLink (e.g. GB200), the NVIDIA DRA driver is essential to enable Dynamic Resource Allocation at the Kubernetes level. To install, follow the instructions in Configure and Helm-install the driver.
After installation, update runaiconfig using the GPUNetworkAccelerationEnabled=True flag to enable GPU network acceleration. This triggers an update of the NVIDIA Run:ai workload-controller deployment and restarts the controller. See Advanced cluster configurations for more details.
NVIDIA Run:ai cluster requires Prometheus to be installed on the Kubernetes cluster.
OpenShift comes pre-installed with Prometheus
For RKE2 see Enable Monitoring instructions to install Prometheus
There are many ways to install Prometheus. As a simple example, to install the community Kube-Prometheus Stack using Helm, run the following commands:
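The exact commands are not shown at this point in the text. As a hedged sketch, a common Helm-based installation of the community kube-prometheus-stack chart looks like this; the release name "prometheus" and the "monitoring" namespace are arbitrary choices:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace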
Additional NVIDIA Run:ai capabilities, Distributed Training and Inference require additional Kubernetes applications (frameworks) to be installed on the cluster.
Distributed training enables training of AI models over multiple nodes. This requires installing a distributed training framework on the cluster. The following frameworks are supported:
There are several ways to install each framework. A simple installation example is the Kubeflow Training Operator, which includes TensorFlow, PyTorch, XGBoost, and JAX.
It is recommended to use Kubeflow Training Operator v1.9.2, and MPI Operator v0.6.0 or later for compatibility with advanced workload capabilities, such as Stopping a workload and Scheduling rules.
To install the Kubeflow Training Operator for TensorFlow, PyTorch, XGBoost and JAX frameworks, run the following command:
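The installation command does not appear at this point in the text. As a hedged sketch, the standalone Kubeflow Training Operator is commonly installed with kustomize, here assuming the v1.9.2 release tag mentioned above:

kubectl apply --server-side -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.9.2"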
To install the MPI Operator for MPI v2, run the following command:
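Similarly, a typical MPI Operator installation applies the released manifest directly; this sketch assumes the v0.6.0 release noted above:

kubectl apply --server-side -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.6.0/deploy/v2beta1/mpi-operator.yaml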
Inference enables serving of AI models. This requires the Knative Serving framework to be installed on the cluster and supports Knative versions 1.11 to 1.16. Follow the Installing Knative instructions or run:
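The installation command is not shown here. One common approach, consistent with the KnativeServing resource used in the next steps, is to install the Knative Operator from its GitHub releases; the version below is an example within the supported 1.11 to 1.16 range:

kubectl apply -f https://github.com/knative/operator/releases/download/knative-v1.16.0/operator.yaml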
Once installed, follow the below steps:
Create the knative-serving namespace:
Create a YAML file named knative-serving.yaml and replace the placeholder FQDN with your wildcard inference domain (for example, runai-inference.mycorp.local):
Apply the changes:
Configure NGINX to proxy requests to Kourier / Knative and handle TLS termination using the wildcard certificate. Create a YAML file named knative-ingress.yaml and replace the FQDN placeholders with your wildcard inference domain:
Apply the changes:
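Assuming the file name used in the previous step, applying the ingress configuration would look like:

kubectl apply -f knative-ingress.yaml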
NVIDIA Run:ai allows for autoscaling a deployment according to the below metrics:
Latency (milliseconds)
Throughput (requests/sec)
Concurrency (requests)
Using a custom metric (for example, Latency) requires installing the Kubernetes Horizontal Pod Autoscaler (HPA). Use the following command to install. Make sure to update the {VERSION} in the below command with a supported Knative version.
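The command itself is not reproduced above. Knative's HPA autoscaling extension is typically installed from the Knative Serving release assets; this sketch keeps the {VERSION} placeholder from the text:

kubectl apply -f https://github.com/knative/serving/releases/download/knative-v{VERSION}/serving-hpa.yaml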
This section explains how to configure NVIDIA Run:ai to generate health alerts and to connect these alerts to alert-management systems within your organization. Alerts are generated for NVIDIA Run:ai clusters.
NVIDIA Run:ai uses Prometheus for externalizing metrics and providing visibility to end-users. The NVIDIA Run:ai Cluster installation includes Prometheus or can connect to an existing Prometheus instance used in your organization. The alerts are based on the Prometheus AlertManager. Once installed, it is enabled by default.
This document explains how to:
Configure alert destinations - triggered alerts send data to specified destinations
Understand the out-of-the-box cluster alerts, provided by NVIDIA Run:ai
Add additional custom alerts
A Kubernetes cluster with the necessary permissions
Up and running NVIDIA Run:ai environment, including Prometheus Operator
kubectl command-line tool installed and configured to interact with the cluster
Use the steps below to set up monitoring alerts.
Verify that the Prometheus Operator Deployment is running. Copy the following command and paste it in a terminal where you have access to the Kubernetes cluster. The output indicates the deployment's status, including the number of replicas and their current state.
Verify that Prometheus instances are running. Copy the following command and paste it in your terminal. You can see the Prometheus instance(s) listed along with their status:
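The verification commands are not included at this point in the text. Assuming the Prometheus Operator was installed with standard tooling, the following generic checks cover both steps; the namespace varies by installation:

# Verify the Prometheus Operator Deployment is running
kubectl get deployments -A | grep prometheus-operator
# Verify Prometheus instances (Prometheus custom resources) and their status
kubectl get prometheus -A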
In each of the steps in this section, copy the content of the code snippet to a new YAML file (e.g., step1.yaml).
Copy the following command to your terminal, to apply the YAML file to the cluster:
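Assuming the example file name from the previous step:

kubectl apply -f step1.yaml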
Copy the following command to your terminal to create the AlertManager CustomResource, to enable AlertManager:
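The CustomResource itself is not reproduced here. A minimal Alertmanager resource managed by the Prometheus Operator could look like the sketch below; the name and replica count are assumptions:

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: runai        # assumed name
  namespace: runai
spec:
  replicas: 1        # assumed replica count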
Copy the following command to your terminal to validate that the AlertManager instance has started:
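A generic check, assuming the runai namespace used above:

kubectl get alertmanager -n runai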
Copy the following command to your terminal to validate that the Prometheus operator has created a Service for AlertManager:
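A generic way to check (the Prometheus Operator normally creates a governing service named alertmanager-operated):

kubectl get svc -n runai | grep alertmanager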
Open the terminal on your local machine or another machine that has access to your Kubernetes cluster.
Copy and paste the following command in your terminal to edit the Prometheus configuration for the runai namespace. This command opens the Prometheus configuration file in your default text editor (usually vi or nano):
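The exact command is not shown here. A generic equivalent is to look up the Prometheus custom resource in the runai namespace and edit it; the resource name is environment-specific:

kubectl get prometheus -n runai
kubectl edit prometheus <prometheus-name> -n runai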
Copy and paste the following text to your terminal to change the configuration file:
Set out below are the various alert destinations.
In each step, copy the contents of the code snippets to a new file and apply it to the cluster using kubectl apply -f.
Add your smtp password as a secret:
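For example (the secret name and key below are assumptions, not values mandated by NVIDIA Run:ai):

kubectl create secret generic alertmanager-smtp-password -n runai \
  --from-literal=password=<SMTP-PASSWORD>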
Replace the relevant smtp details with your own, then apply the alertmanagerconfig using kubectl apply:
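The original snippet is not reproduced here. A hedged sketch of an AlertmanagerConfig with an email receiver might look like the following; all names, addresses, and the SMTP host are placeholders or assumptions:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: runai-email-alerts   # assumed name
  namespace: runai
spec:
  route:
    receiver: email
  receivers:
  - name: email
    emailConfigs:
    - to: alerts@example.com             # placeholder
      from: runai-alerts@example.com     # placeholder
      smarthost: smtp.example.com:587    # placeholder
      authUsername: runai-alerts@example.com
      authPassword:
        name: alertmanager-smtp-password # secret created in the previous step (assumed name)
        key: password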
Save and exit the editor. The configuration is automatically reloaded.
Prometheus AlertManager provides a structured way to connect to alert-management systems. There are built-in plugins for popular systems such as PagerDuty and OpsGenie, including a generic Webhook.
Use to get a unique URL.
Use the upgrade cluster instructions to modify the values file:
Edit the values file to add the following, and replace <WEB-HOOK-URL> with the unique URL obtained above:
Verify that you are receiving alerts on the , in the left pane:
An NVIDIA Run:ai cluster comes with several built-in alerts. Each alert notifies about a specific functionality of an NVIDIA Run:ai entity. There is also a single, inclusive alert: NVIDIA Run:ai Critical Problems, which aggregates all component-based alerts into a single cluster health test.
You can add additional alerts from NVIDIA Run:ai. Alerts are triggered by using the Prometheus query language with any NVIDIA Run:ai metric.
To create an alert, follow these steps using Prometheus query language with NVIDIA Run:ai Metrics:
Modify Values File: Use the upgrade cluster instructions to modify the values file.
Add Alert Structure: Incorporate alerts according to the structure outlined below. Replace placeholders <ALERT-NAME>, <ALERT-SUMMARY-TEXT>, <PROMQL-EXPRESSION>, <optional: duration s/m/h>, and <critical/warning> with appropriate values for your alert, as described below:
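The structure itself is not reproduced at this point in the text. As a hedged sketch, the placeholders named above map onto the standard Prometheus alerting rule shape; the top-level key under which custom alerts are nested in the NVIDIA Run:ai values file is an assumption here:

customAlerts:                       # hypothetical key - check the values file for the actual location
  - alert: <ALERT-NAME>
    annotations:
      summary: <ALERT-SUMMARY-TEXT>
    expr: <PROMQL-EXPRESSION>
    for: <optional: duration s/m/h>
    labels:
      severity: <critical/warning>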
You can find an example in the .
This quick start provides a step-by-step walkthrough for running multiple LLMs (inference workloads) on a single GPU using NVIDIA Run:ai GPU memory swap.
GPU memory swap expands the GPU physical memory to the CPU memory, allowing NVIDIA Run:ai to place and run more workloads on the same GPU physical hardware. This provides a smooth workload context switching between GPU memory and CPU memory, eliminating the need to kill workloads when the memory requirement is larger than what the GPU physical memory can provide.
Before you start, make sure:
kubectl create ns runai
oc new-project runai
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm upgrade -i nginx-ingress ingress-nginx/ingress-nginx \
--namespace nginx-ingress --create-namespace \
--set controller.kind=DaemonSet \
--set controller.service.externalIPs="{<INTERNAL-IP>,<EXTERNAL-IP>}" # Replace <INTERNAL-IP> and <EXTERNAL-IP> with the internal and external IP addresses of one of the nodes
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace nginx-ingress --create-namespace
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-nginx --create-namespace \
--set controller.service.annotations.oci.oraclecloud.com/load-balancer-type=nlb \
--set controller.service.annotations.oci-network-load-balancer.oraclecloud.com/is-preserve-source=True \
--set controller.service.annotations.oci-network-load-balancer.oraclecloud.com/security-list-management-mode=None \
--set controller.service.externalTrafficPolicy=Local \
--set controller.service.annotations.oci-network-load-balancer.oraclecloud.com/subnet=<SUBNET-ID> # Replace <SUBNET-ID> with the subnet ID of one of your cluster's subnets
kubectl -n runai create secret generic runai-ca-cert \
--from-file=runai-ca.pem=<ca_bundle_path>
kubectl label secret runai-ca-cert -n runai run.ai/cluster-wide=true run.ai/name=runai-ca-cert --overwrite
oc -n runai create secret generic runai-ca-cert \
--from-file=runai-ca.pem=<ca_bundle_path>
oc -n openshift-monitoring create secret generic runai-ca-cert \
--from-file=runai-ca.pem=<ca_bundle_path>
oc label secret runai-ca-cert -n runai run.ai/cluster-wide=true run.ai/name=runai-ca-cert --overwrite
kubectl create ns gpu-operator
# resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: gcp-critical-pods
namespace: gpu-operator
spec:
scopeSelector:
matchExpressions:
- operator: In
scopeName: PriorityClass
values:
- system-node-critical
- system-cluster-critical
kubectl patch deployment training-operator -n kubeflow --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args", "value": ["--enable-scheme=tfjob", "--enable-scheme=pytorchjob", "--enable-scheme=xgboostjob", "--enable-scheme=jaxjob"]}]'
kubectl delete crd mpijobs.kubeflow.org
kubectl create ns knative-serving
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
name: knative-serving
namespace: knative-serving
spec:
config:
config-autoscaler:
enable-scale-to-zero: "true"
config-features:
kubernetes.podspec-affinity: enabled
kubernetes.podspec-init-containers: enabled
kubernetes.podspec-persistent-volume-claim: enabled
kubernetes.podspec-persistent-volume-write: enabled
kubernetes.podspec-schedulername: enabled
kubernetes.podspec-securitycontext: enabled
kubernetes.podspec-tolerations: enabled
kubernetes.podspec-volumes-emptydir: enabled
kubernetes.podspec-fieldref: enabled
kubernetes.containerspec-addcapabilities: enabled
kubernetes.podspec-nodeselector: enabled
multi-container: enabled
domain:
runai-inference.mycorp.local: "" # replace with the wildcard FQDN for Inference
network:
domainTemplate: '{{.Name}}-{{.Namespace}}.{{.Domain}}'
ingress-class: kourier.ingress.networking.knative.dev
default-external-scheme: https
high-availability:
replicas: 2
ingress:
kourier:
enabled: true
kubectl apply -f knative-serving.yaml
pod-security.kubernetes.io/audit=privileged
pod-security.kubernetes.io/enforce=privileged
pod-security.kubernetes.io/warn=privileged
kubectl create secret tls runai-cluster-domain-tls-secret -n runai \
--cert /path/to/fullchain.pem \ # Replace /path/to/fullchain.pem with the actual path to your TLS certificate
--key /path/to/private.pem # Replace /path/to/private.pem with the actual path to your private key
kubectl create secret tls runai-cluster-inference-tls-secret -n knative-serving \
--cert /path/to/fullchain.pem \ # Replace /path/to/fullchain.pem with the actual path to your TLS certificate
--key /path/to/private.pem # Replace /path/to/private.pem with the actual path to your private key
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
-n monitoring --create-namespace --set grafana.enabled=false
kubectl apply --server-side -k "github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=v1.9.2"
kubectl apply --server-side -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.6.0/deploy/v2beta1/mpi-operator.yaml
helm repo add knative-operator https://knative.github.io/operator
helm install knative-operator --create-namespace --namespace knativeoperator --version 1.16.6 knative-operator/knative-operator
kubectl apply -f https://github.com/knative/serving/releases/download/knative-{VERSION}/serving-hpa.yaml
4.14 to 4.17
v2.21 (latest)
1.30 to 1.32
4.14 to 4.18
kubectl apply -f resourcequota.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: knative-serving
namespace: knative-serving
spec:
ingressClassName: nginx
rules:
- host: '*.runai-inference.mycorp.local' # replace with the wildcard FQDN for Inference
http:
paths:
- backend:
service:
name: kourier
port:
number: 80
path: /
pathType: Prefix
tls:
- hosts:
- '*.runai-inference.mycorp.local' # replace with the wildcard FQDN for Inference
secretName: runai-cluster-inference-tls-secret
kubectl apply -f knative-ingress.yaml
Delete the prometheus pod to reset the pod's settings:
Save the changes and exit the text editor.
<ALERT-NAME>: Choose a descriptive name for your alert, such as HighCPUUsage or LowMemory.
<ALERT-SUMMARY-TEXT>: Provide a brief summary of what the alert signifies, for example, High CPU usage detected or Memory usage below threshold.
<PROMQL-EXPRESSION>: Construct a Prometheus query (PROMQL) that defines the conditions under which the alert should trigger. This query should evaluate to a boolean value (1 for alert, 0 for no alert).
<optional: duration s/m/h>: Optionally, specify a duration in seconds (s), minutes (m), or hours (h) that the alert condition should persist before triggering an alert. If not specified, the alert triggers as soon as the condition is met.
<critical/warning>: Assign a severity level to the alert, indicating its importance. Choose between critical for severe issues requiring immediate attention, or warning for less critical issues that still need monitoring.
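As an illustration, here is a hypothetical custom rule that fills in the structure above; the alert name, summary, and PromQL expression are placeholders invented for this example and should be replaced with a query over a real NVIDIA Run:ai metric:
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    custom-runai:
      groups:
      - name: custom-runai-rules
        rules:
        - alert: HighGpuMemoryUtilization              # hypothetical alert name
          annotations:
            summary: GPU memory utilization above 90%
          expr: my_runai_gpu_memory_utilization > 0.9  # hypothetical metric, replace with a real NVIDIA Run:ai metric
          for: 5m
          labels:
            severity: warning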
Meaning
The cluster-sync Pod in the runai namespace might not be functioning properly
Impact
Possible impact - no info/partial info from the cluster is being synced back to the control-plane
Severity
Critical
Diagnosis
kubectl get pod -n runai to see if the cluster-sync pod is running
Troubleshooting/Mitigation
To diagnose issues with the cluster-sync pod, follow these steps:
Paste the following command to your terminal to receive detailed information about the cluster-sync deployment: kubectl describe deployment cluster-sync -n runai
Check the Logs: Use the following command to view the logs of the cluster-sync deployment: kubectl logs deployment/cluster-sync -n runai
Analyze the Logs and Pod Details: From the information provided by the logs and the deployment details, attempt to identify the reason why the cluster-sync pod is not functioning correctly
Check Connectivity: Ensure there is a stable network connection between the cluster and the NVIDIA Run:ai Control Plane. A connectivity issue may be the root cause of the problem.
Contact Support: If the network connection is stable and you are still unable to resolve the issue, contact NVIDIA Run:ai support for further assistance
Meaning
The runai-agent pod may be too loaded, is slow in processing data (possible in very big clusters), or the runai-agent pod itself in the runai namespace may not be functioning properly.
Impact
Possible impact - no info/partial info from the control-plane is being synced in the cluster
Severity
Critical
Diagnosis
Run: kubectl get pod -n runai And see if the runai-agent pod is running.
Troubleshooting/Mitigation
To diagnose issues with the runai-agent pod, follow these steps:
Describe the Deployment: Run the following command to get detailed information about the runai-agent deployment: kubectl describe deployment runai-agent -n runai
Check the Logs: Use the following command to view the logs of the runai-agent deployment: kubectl logs deployment/runai-agent -n runai
Analyze the Logs and Pod Details: From the information provided by the logs and the deployment details, attempt to identify the reason why the runai-agent pod is not functioning correctly. There may be a connectivity issue with the control plane.
Check Connectivity: Ensure there is a stable network connection between the runai-agent and the control plane. A connectivity issue may be the root cause of the problem.
Consider Cluster Load: If the runai-agent appears to be functioning properly but the cluster is very large and heavily loaded, it may take more time for the agent to process data from the control plane.
Adjust Alert Threshold: If the cluster load is causing the alert to fire, you can adjust the threshold at which the alert triggers. The default value is 0.05; you can try changing it to a lower value (e.g., 0.045 or 0.04). To edit the value, paste the following in your terminal: kubectl edit runaiconfig -n runai. In the editor, navigate to spec -> prometheus -> agentPullPushRateMinForAlert. If the agentPullPushRateMinForAlert value does not exist, add it under spec -> prometheus, as in the sketch below.
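A minimal sketch of the resulting runaiconfig fragment, assuming the field path described above (spec -> prometheus -> agentPullPushRateMinForAlert):
spec:
  prometheus:
    agentPullPushRateMinForAlert: 0.04   # lowered from the default of 0.05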
Meaning
Runai container is using more than 90% of its Memory limit
Impact
The container might run out of memory and crash.
Severity
Critical
Diagnosis
Calculate the memory usage. This is performed by pasting the following to your terminal: container_memory_usage_bytes{namespace=~"runai
Troubleshooting/Mitigation
Add more memory resources to the container. If the issue persists, contact NVIDIA Run:ai
Meaning
Runai container is using more than 80% of its memory limit
Impact
The container might run out of memory and crash
Severity
Warning
Diagnosis
Calculate the memory usage. This can be done by pasting the following to your terminal: container_memory_usage_bytes{namespace=~"runai
Troubleshooting/Mitigation
Add more memory resources to the container. If the issue persists, contact NVIDIA Run:ai
Meaning
Runai container has restarted more than twice in the last 10 min
Impact
The container might become unavailable and impact the NVIDIA Run:ai system
Severity
Warning
Diagnosis
To diagnose the issue and identify the problematic pods, paste the following into your terminal: kubectl get pods -n runai and kubectl get pods -n runai-backend. One or more of the pods will have a restart count >= 2.
Troubleshooting/Mitigation
Paste this into your terminal: kubectl logs -n NAMESPACE POD_NAME. Replace NAMESPACE and POD_NAME with the relevant pod information from the previous step. Check the logs for any standout issues and verify that the container has sufficient resources. If you need further assistance, contact NVIDIA Run:ai.
Meaning
runai container is using more than 80% of its CPU limit
Impact
This might cause slowness in the operation of certain NVIDIA Run:ai features.
Severity
Warning
Diagnosis
Paste the following query to your terminal in order to calculate the CPU usage: rate(container_cpu_usage_seconds_total{namespace=~"runai
Troubleshooting/Mitigation
Add more CPU resources to the container. If the issue persists, please contact NVIDIA Run:ai.
Meaning
One of the critical NVIDIA Run:ai alerts is currently active
Impact
Impact is based on the active alert
Severity
Critical
Diagnosis
Check NVIDIA Run:ai alerts in Prometheus to identify any active critical alerts
Meaning
The Kubernetes node hosting GPU workloads is in an unknown state, and its health and readiness cannot be determined.
Impact
This may interrupt GPU workload scheduling and execution.
Severity
Critical - Node is either unschedulable or has unknown status. The node is in one of the following states:
Ready=Unknown: The control plane cannot communicate with the node.
Ready=False: The node is not healthy.
Unschedulable=True: The node is marked as unschedulable.
Diagnosis
Check the node's status using kubectl describe node, verify Kubernetes API server connectivity, and inspect system logs for GPU-specific or node-level errors.
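A short sketch of the diagnosis commands mentioned above; <NODE-NAME> is a placeholder for the affected node:
kubectl get nodes                     # look for NotReady or SchedulingDisabled nodes
kubectl describe node <NODE-NAME>     # inspect Conditions and recent Events
kubectl get --raw='/readyz?verbose'   # basic API server health check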
Meaning
The Kubernetes node hosting GPU workloads has insufficient memory to support current or upcoming workloads.
Impact
GPU workloads may fail to schedule, experience degraded performance, or crash due to memory shortages, disrupting dependent applications.
Severity
Critical - Node is using more than 90% of its memory. Warning - Node is using more than 80% of its memory.
Diagnosis
Use kubectl top node to assess memory usage and identify memory-intensive pods; consider resizing the node or optimizing memory usage in the affected pods.
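A short sketch of the commands mentioned above; kubectl top requires the metrics-server to be available in the cluster:
kubectl top node                        # per-node CPU and memory usage
kubectl top pod -A --sort-by=memory     # identify the most memory-intensive pods across namespaces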
Meaning
There are currently 0 available pods for the runai daemonset on the relevant node
Impact
No fractional GPU workloads support
Severity
Critical
Diagnosis
Paste the following command to your terminal: kubectl get daemonset -n runai-backend. In the result of this command, identify the daemonset(s) that don’t have any running pods.
Troubleshooting/Mitigation
Paste the following command to your terminal, replacing <DAEMONSET_NAME> with the problematic daemonset from the previous step: kubectl describe daemonset <DAEMONSET_NAME> -n runai. Then look for the specific error that prevents it from creating pods. Possible reasons might be:
Node Resource Constraints: The nodes in the cluster may lack sufficient resources (CPU, memory, etc.) to accommodate new pods from the daemonset.
Node Selector or Affinity Rules: The daemonset may have node selector or affinity rules that are not matching with any nodes currently available in the cluster, thus preventing pod creation.
Meaning
Runai deployment has one or more unavailable pods
Impact
When this happens, there may be scale issues. Additionally, new versions cannot be deployed, potentially resulting in missing features.
Severity
Critical
Diagnosis
Paste the following commands to your terminal to get the status of the deployments in the runai and runai-backend namespaces: kubectl get deployment -n runai and kubectl get deployment -n runai-backend. Identify any deployments that have missing pods by looking for discrepancies between the DESIRED and AVAILABLE columns. If the number of AVAILABLE pods is less than the DESIRED pods, it indicates that there are missing pods.
Troubleshooting/Mitigation
Paste the following commands to your terminal to receive detailed information about the problematic deployment: kubectl describe deployment <DEPLOYMENT_NAME> -n runai or kubectl describe deployment <DEPLOYMENT_NAME> -n runai-backend
Paste the following commands to your terminal to check the replicaset details associated with the deployment: kubectl describe replicaset <REPLICASET_NAME> -n runai or kubectl describe replicaset <REPLICASET_NAME> -n runai-backend
Paste the following commands to your terminal to retrieve the logs for the deployment and identify any errors or issues: kubectl logs deployment/<DEPLOYMENT_NAME> -n runai or kubectl logs deployment/<DEPLOYMENT_NAME> -n runai-backend
From the logs and the detailed information provided by the describe commands, analyze the reasons why the deployment is unable to create pods. Look for common issues such as:
Resource constraints (CPU, memory)
Misconfigured deployment settings or replicasets
Node selector or affinity rules preventing pod scheduling
Meaning
The project-controller in runai namespace had errors while reconciling projects
Impact
Some projects might not be in the “Ready” state. This means that they are not fully operational and may not have all the necessary components running or configured correctly.
Severity
Critical
Diagnosis
Retrieve the logs for the project-controller deployment by pasting the following command in your terminal: kubectl logs deployment/project-controller -n runai. Carefully examine the logs for any errors or warning messages. These logs help you understand what might be going wrong with the project controller.
Troubleshooting/Mitigation
Once errors in the log have been identified, follow these steps to mitigate the issue: The error messages in the logs should provide detailed information about the problem.
Read through them to understand the nature of the issue. If the logs indicate which project failed to reconcile, you can further investigate by checking the status of that specific project.
Run the following command, replacing <PROJECT_NAME> with the name of the problematic project: kubectl get project <PROJECT_NAME> -o yaml
Review the status section in the YAML output. This section describes the current state of the project and provides insights into what might be causing the failure. If the issue persists, contact NVIDIA Run:ai.
Meaning
Runai statefulset has no available pods
Impact
Absence of metrics and metrics database unavailability
Severity
Critical
Diagnosis
To diagnose the issue, follow these steps:
Check the status of the stateful sets in the runai-backend namespace by running the following command: kubectl get statefulset -n runai-backend
Identify any stateful sets that have no running pods. These are the ones that might be causing the problem.
Troubleshooting/Mitigation
Once you've identified the problematic stateful sets, follow these steps to mitigate the issue:
Describe the stateful set to get detailed information on why it cannot create pods. Replace X with the name of the stateful set: kubectl describe statefulset X -n runai-backend
Review the description output to understand the root cause of the issue. Look for events or error messages that explain why the pods are not being created.
If you're unable to resolve the issue based on the information gathered, contact NVIDIA Run:ai support for further assistance.

The project has an assigned quota of at least 1 GPU.
Dynamic GPU fractions is enabled.
GPU memory swap is enabled on at least one free node as detailed here.
Host-based routing is configured.
Browse to the provided NVIDIA Run:ai user interface and log in with your credentials.
To use the API, you will need to obtain a token as shown in API authentication.
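A hypothetical sketch of obtaining an API token using an application's client credentials; the endpoint path and field names shown here are assumptions and should be verified against the API authentication documentation:
curl -L 'https://<COMPANY-URL>/api/v1/token' \
  -H 'Content-Type: application/json' \
  -d '{
    "grantType": "client_credentials",
    "clientId": "<APPLICATION-CLIENT-ID>",
    "clientSecret": "<APPLICATION-CLIENT-SECRET>"
  }'
# The access token in the response is used as <TOKEN> in the API commands below.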
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Inference
Select under which cluster to create the workload
Select the project in which your workload will run
Select custom inference from Inference type (if applicable)
Enter a name for the workload (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Flexible and click CONTINUE
Click the load icon. A side pane appears, displaying a list of available environments. To add a new environment:
Click the + icon to create a new environment
Enter quick-start as the name for the environment. The name must be unique.
Enter the NVIDIA Run:ai vLLM Image URL - runai.jfrog.io/core-llm/runai-vllm:v0.6.4-0.10.0
Click the load icon. A side pane appears, displaying a list of available compute resources. To add a new compute resource:
Click the + icon to create a new compute resource
Enter request-limit as the name for the compute resource. The name must be unique.
Set GPU devices per pod
Click CREATE INFERENCE
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Inference
Select under which cluster to create the workload
Select the project in which your workload will run
Copy the following command to your terminal. Make sure to update the below parameters. For more details, see :
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in
<PROJECT-ID>
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Inference
Select the cluster where the previous inference workload was created
Select the project where the previous inference workload was created
Select custom inference from Inference type (if applicable)
Enter a name for the workload (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Flexible and click CONTINUE
Click the load icon. A side pane appears, displaying a list of available environments. Select the environment created in .
Click the load icon. A side pane appears, displaying a list of available compute resources. Select the compute resources created in .
Click CREATE INFERENCE
Go to the Workload manager → Workloads
Click +NEW WORKLOAD and select Inference
Select the cluster where the previous inference workload was created
Select the project where the previous inference workload was created
Copy the following command to your terminal. Make sure to update the below parameters. For more details, see :
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in
<PROJECT-ID>
Go to the Workload manager → Workloads
Click COLUMNS and select Connections
Select the link under the Connections column for the first inference workload created in Step 2
In the Connections Associated with Workload form, copy the URL under the Address column
Click +NEW WORKLOAD and select Workspace
Select the cluster where the previous inference workloads were created
Select the project where the previous inference workloads were created
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Flexible and click CONTINUE
Click the load icon. A side pane appears, displaying a list of available environments. Select the ‘chatbot-ui’ environment for your workspace (Image URL: runai.jfrog.io/core-llm/llm-app)
Set the runtime settings for the environment with the following environment variables:
Name: RUNAI_MODEL_NAME Source: Custom Value: meta-llama/Llama-3.2-1B-Instruct
Click the load icon. A side pane appears, displaying a list of available compute resources. Select ‘cpu-only’ from the list.
If ‘cpu-only’ is not displayed, follow the below steps:
Click the + icon to create a new compute resource
Click CREATE WORKSPACE
Go to the Workload manager → Workloads
Click COLUMNS and select Connections
Select the link under the Connections column for the first inference workload created in
In the Connections Associated with Workload form, copy the URL under the Address
Copy the following command to your terminal. Make sure to update the below parameters. For more details, see
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in
<PROJECT-ID>
Go to the Workload manager → Workloads
Click COLUMNS and select Connections
Select the link under the Connections column for the second inference workload created in Step 3
In the Connections Associated with Workload form, copy the URL under the Address column
Click +NEW WORKLOAD and select Workspace
Select the cluster where the previous inference workloads were created
Select the project where the previous inference workloads were created
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Flexible and click CONTINUE
Click the load icon. A side pane appears, displaying a list of available environments. Select the environment created in .
Set the runtime settings for the environment with the following environment variables:
Name: RUNAI_MODEL_NAME Source: Custom Value: meta-llama/Llama-3.2-1B-Instruct
Click the load icon. A side pane appears, displaying a list of available compute resources. Select the compute resources created in .
Click CREATE WORKSPACE
Go to the Workload manager → Workloads
Click COLUMNS and select Connections
Select the link under the Connections column for the second inference workload created in
In the Connections Associated with Workload form, copy the URL under the Address
Copy the following command to your terminal. Make sure to update the below parameters. For more details, see
<COMPANY-URL> - The link to the NVIDIA Run:ai user interface
<TOKEN> - The API access token obtained in
<PROJECT-ID>
Select the newly created workspace that you want to connect to
Click CONNECT
Select the ChatbotUI tool. The selected tool is opened in a new tab on your browser.
Query both workspaces simultaneously and see them both responding. The workspace whose model is currently in CPU RAM takes longer to respond while it is swapped back into GPU memory, and vice versa.
To connect to the ChatbotUI tool, browse directly to https://<COMPANY-URL>/<PROJECT-NAME>/<WORKLOAD-NAME>
Query both workspaces simultaneously and see them both responding. The workspace whose model is currently in CPU RAM takes longer to respond while it is swapped back into GPU memory, and vice versa.
Manage and monitor your newly created workloads using the Workloads table.
kubectl delete pod prometheus-runai-0 -n runai
kubectl get deployment kube-prometheus-stack-operator -n monitoring
kubectl get prometheus -n runai
kubectl apply -f step1.yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
name: runai
namespace: runai
spec:
replicas: 1
alertmanagerConfigSelector:
matchLabels:
alertmanagerConfig: runai
kubectl get alertmanager -n runai
kubectl get svc alertmanager-operated -n runai
kubectl edit runaiconfig -n runai
prometheus:
spec:
alerting:
alertmanagers:
- name: alertmanager-operated
namespace: runai
port: web
apiVersion: v1
kind: Secret
metadata:
name: alertmanager-smtp-password
namespace: runai
stringData:
password: "your_smtp_password"apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: runai
namespace: runai
labels:
alertmanagerConfig: runai
spec:
route:
continue: true
groupBy:
- alertname
groupWait: 30s
groupInterval: 5m
repeatInterval: 1h
matchers:
- matchType: =~
name: alertname
value: Runai.*
receiver: email
receivers:
- name: 'email'
emailConfigs:
- to: '<destination_email_address>'
from: '<from_email_address>'
smarthost: 'smtp.gmail.com:587'
authUsername: '<smtp_server_user_name>'
authPassword:
name: alertmanager-smtp-password
key: password
kube-prometheus-stack:
...
alertmanager:
enabled: true
config:
global:
resolve_timeout: 5m
receivers:
- name: "null"
- name: webhook-notifications
webhook_configs:
- url: <WEB-HOOK-URL>
send_resolved: true
route:
group_by:
- alertname
group_interval: 5m
group_wait: 30s
receiver: 'null'
repeat_interval: 10m
routes:
- receiver: webhook-notifications
kube-prometheus-stack:
additionalPrometheusRulesMap:
custom-runai:
groups:
- name: custom-runai-rules
rules:
- alert: <ALERT-NAME>
annotations:
summary: <ALERT-SUMMARY-TEXT>
expr: <PROMQL-EXPRESSION>
for: <optional: duration s/m/h>
labels:
severity: <critical/warning>
runai.jfrog.io/core-llm/runai-vllm:v0.6.4-0.10.0
Set the inference serving endpoint to HTTP and the container port to 8000
Set the runtime settings for the environment. Click +ENVIRONMENT VARIABLE and add the following:
Name: RUNAI_MODEL Source: Custom Value: meta-llama/Llama-3.2-1B-Instruct (you can choose any vLLM-supported model from Hugging Face)
Name: RUNAI_MODEL_NAME Source: Custom Value: Llama-3.2-1B-Instruct
Name: HF_TOKEN Source: Custom Value: <Your Hugging Face token> (only needed for gated models)
Name: VLLM_RPC_TIMEOUT Source: Custom Value: 60000
Click CREATE ENVIRONMENT
Select the newly created environment from the side pane
Set GPU memory per device
Select % (of device) - Fraction of a GPU device’s memory
Set the memory Request - 50 (the workload will allocate 50% of the GPU memory)
Toggle Limit and set to 100%
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Select More settings and toggle Increase shared memory size
Click CREATE COMPUTE RESOURCE
Select the newly created compute resource from the side pane
Select custom inference from Inference type (if applicable)
Enter a name for the workload (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Original and click CONTINUE
Create an environment for your workload
Click +NEW ENVIRONMENT
Enter quick-start as the name for the environment. The name must be unique.
Enter the NVIDIA Run:ai vLLM Image URL - runai.jfrog.io/core-llm/runai-vllm:v0.6.4-0.10.0
Set the runtime settings for the environment. Click +ENVIRONMENT VARIABLE and add the following:
Name: RUNAI_MODEL Source: Custom Value: meta-llama/Llama-3.2-1B-Instruct (you can choose any vLLM-supported model from Hugging Face)
Name: RUNAI_MODEL_NAME Source: Custom Value: Llama-3.2-1B-Instruct
Click CREATE ENVIRONMENT
The newly created environment will be selected automatically
Create a new “request-limit” compute resource
Click +NEW COMPUTE RESOURCE
Enter request-limit as the name for the compute resource. The name must be unique.
Set GPU devices per pod - 1
Set GPU memory per device
Select % (of device) - Fraction of a GPU device’s memory
Set the memory Request - 50 (the workload will allocate 50% of the GPU memory)
Toggle Limit and set to 100%
Optional: set the CPU compute per pod - 0.1 cores (default)
Optional: set the CPU memory per pod - 100 MB (default)
Select More settings and toggle Increase shared memory size
Click CREATE COMPUTE RESOURCE
The newly created compute resource will be selected automatically
Click CREATE INFERENCE
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the Get Clusters API.
Select custom inference from Inference type (if applicable)
Enter a name for the workload (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Original and click CONTINUE
Select the environment created in Step 2
Select the compute resource created in Step 2
Click CREATE INFERENCE
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the Get Clusters API.
Name: RUNAI_MODEL_BASE_URL Source: Custom Value: Add the address link from Step 4
Delete the PATH_PREFIX environment variable if you are using host-based routing.
If ‘chatbot-ui’ is not displayed in the gallery, follow the below steps:
Click the + icon to create a new environment
Enter chatbot-ui as the name for the environment. The name must be unique.
Enter the chatbot-ui Image URL - runai.jfrog.io/core-llm/llm-app
Tools - Set the connection for your tool
Click +TOOL
Select Chatbot UI tool from the list
Set the runtime settings for the environment. Click +ENVIRONMENT VARIABLE and add the following:
Name: RUNAI_MODEL_NAME Source: Custom Value: meta-llama/Llama-3.2-1B-Instruct
Name: RUNAI_MODEL_BASE_URL Source: Custom Value: Add the Address link
Click CREATE ENVIRONMENT
Select the newly created environment from the side pane
Set GPU devices per pod - 0
Set CPU compute per pod - 0.1 cores
Set the CPU memory per pod - 100 MB (default)
Click CREATE COMPUTE RESOURCE
Select the newly created compute resource from the side pane
Click +NEW WORKLOAD and select Workspace
Select the cluster where the previous inference workloads were created
Select the project where the previous inference workloads were created
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Original and click CONTINUE
Select the ‘chatbot-ui’ environment for your workspace (Image URL: runai.jfrog.io/core-llm/llm-app)
Set the runtime settings for the environment with the following environment variables:
Name: RUNAI_MODEL_NAME Source: Custom Value: meta-llama/Llama-3.2-1B-Instruct
Name: RUNAI_MODEL_BASE_URL Source: Custom Value: Add the address link from Step 4
Delete the PATH_PREFIX environment variable if you are using host-based routing.
If ‘chatbot-ui’ is not displayed in the gallery, follow the below steps:
Click +NEW ENVIRONMENT
Enter chatbot-ui as the name for the environment. The name must be unique.
Enter the chatbot-ui Image URL - runai.jfrog.io/core-llm/llm-app
The newly created environment will be selected automatically
Select the ‘cpu-only’ compute resource for your workspace
If ‘cpu-only’ is not displayed in the gallery, follow the below steps:
Click +NEW COMPUTE RESOURCE
Enter cpu-only as the name for the compute resource. The name must be unique.
Set GPU devices per pod - 0
Set CPU compute per pod - 0.1 cores
Set the CPU memory per pod - 100 MB (default)
Click CREATE COMPUTE RESOURCE
The newly created compute resource will be selected automatically
Click CREATE WORKSPACE
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the Get Clusters API.
<URL> - The URL for connecting an external service related to the workload. You can get the URL via the List Workloads API.
Name: RUNAI_MODEL_BASE_URL Source: Custom Value: Add the Address link
Delete the PATH_PREFIX environment variable if you are using host-based routing.
Click +NEW WORKLOAD and select Workspace
Select the cluster where the previous inference workloads were created
Select the project where the previous inference workloads were created
Select Start from scratch to launch a new workspace quickly
Enter a name for the workspace (if the name already exists in the project, you will be requested to submit a different name)
Under Submission, select Original and click CONTINUE
Select the environment created in Step 4
Set the runtime settings for the environment with the following environment variables:
Name: RUNAI_MODEL_NAME Source: Custom Value: meta-llama/Llama-3.2-1B-Instruct
Name: RUNAI_MODEL_BASE_URL Source: Custom Value: Add the Address link
Delete the PATH_PREFIX environment variable if you are using host-based routing.
Select the compute resource created in Step 4
Click CREATE WORKSPACE
<CLUSTER-UUID> - The unique identifier of the Cluster. You can get the Cluster UUID via the Get Clusters API.
<URL> - The URL for connecting an external service related to the workload. You can get the URL via the List Workloads API.
This article provides examples of:
Creating a new rule within a policy
Best practices for adding sections to a policy
A full example of a whole policy
This example shows how to add a new limitation to the GPU usage for workloads of type workspace:
Check the documentation and select the field(s) that are most relevant for GPU usage.
Search for the field in the . For example, gpuDevicesRequest appears under the Compute fields sub-table as follows:
Use the value type of the gpuDevicesRequest field indicated in the table ("integer") and navigate to the Value types table to view the possible rules that can be applied to this value type.
For integer, the options are:
canEdit
required
curl -L 'https://<COMPANY-URL>/api/v1/workloads/inferences' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"name": "workload-name",
"useGivenNameAsPrefix": true,
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"image": "runai.jfrog.io/core-llm/runai-vllm:v0.6.4-0.10.0",
"imagePullPolicy":"IfNotPresent",
"environmentVariables": [
{
"name": "RUNAI_MODEL",
"value": "meta-lama/Llama-3.2-1B-Instruct"
},
{
"name": "VLLM_RPC_TIMEOUT",
"value": "60000"
},
{
"name": "HF_TOKEN",
"value":"<INSERT HUGGINGFACE TOKEN>"
}
],
"compute": {
"gpuDevicesRequest": 1,
"gpuRequestType": "portion",
"gpuPortionRequest": 0.1,
"gpuPortionLimit": 1,
"cpuCoreRequest":0.2,
"cpuMemoryRequest": "200M",
"largeShmRequest": false
},
"servingPort": {
"container": 8000,
"protocol": "http",
"authorizationType": "public"
}
}
}'
curl -L 'https://<COMPANY-URL>/api/v1/workloads/inferences' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"name": "workload-name",
"useGivenNameAsPrefix": true,
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"image": "runai.jfrog.io/core-llm/runai-vllm:v0.6.4-0.10.0",
"imagePullPolicy":"IfNotPresent",
"environmentVariables": [
{
"name": "RUNAI_MODEL",
"value": "meta-lama/Llama-3.2-1B-Instruct"
},
{
"name": "VLLM_RPC_TIMEOUT",
"value": "60000"
},
{
"name": "HF_TOKEN",
"value":"<INSERT HUGGINGFACE TOKEN>"
}
],
"compute": {
"gpuDevicesRequest": 1,
"gpuRequestType": "portion",
"gpuPortionRequest": 0.1,
"gpuPortionLimit": 1,
"cpuCoreRequest":0.2,
"cpuMemoryRequest": "200M",
"largeShmRequest": false
},
"servingPort": {
"container": 8000,
"protocol": "http",
"authorizationType": "public"
}
}
}'
curl -L 'https://<COMPANY-URL>/api/v1/workloads/workspaces' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"name": "workload-name",
"projectId": "<PROJECT-ID>",
"clusterId": "<CLUSTER-UUID>",
"spec": {
"image": "runai.jfrog.io/core-llm/llm-app",
"environmentVariables": [
{
"name": "RUNAI_MODEL_NAME",
"value": "meta-llama/Llama-3.2-1B-Instruct"
},
{
"name": "RUNAI_MODEL_BASE_URL",
"value": "<URL>"
}
],
"compute": {
"cpuCoreRequest":0.1,
"cpuMemoryRequest": "100M",
}
}
}'
curl -L 'https://<COMPANY-URL>/api/v1/workloads/workspaces' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"name": "workload-name",
"projectId": "<PROJECT-ID>", '\
"clusterId": "<CLUSTER-UUID>", \
"spec": {
"image": "runai.jfrog.io/core-llm/llm-app",
"environmentVariables": [
{
"name": "RUNAI_MODEL_NAME",
"value": "meta-llama/Llama-3.2-1B-Instruct"
},
{
"name": "RUNAI_MODEL_BASE_URL",
"value": "<URL>"
}
],
"compute": {
"cpuCoreRequest":0.1,
"cpuMemoryRequest": "100M",
}
}
}'
Name: HF_TOKEN Source: Custom Value: <Your Hugging Face token> (only needed for gated models)
Name: VLLM_RPC_TIMEOUT Source: Custom Value: 60000
Name: RUNAI_MODEL_TOKEN_LIMIT Source: Custom Value: 8192
Name: RUNAI_MODEL_MAX_LENGTH Source: Custom Value: 16384
Tools - Set the connection for your tool
Click +TOOL
Select Chatbot UI tool from the list
Set the runtime settings for the environment. Click +ENVIRONMENT VARIABLE and add the following:
Name: RUNAI_MODEL_NAME Source: Custom Value: meta-llama/Llama-3.2-1B-Instruct
Name: RUNAI_MODEL_BASE_URL Source: Custom Value: Add the Address link
Name: RUNAI_MODEL_TOKEN_LIMIT Source: Custom Value: 8192
Name: RUNAI_MODEL_MAX_LENGTH Source: Custom Value: 16384
Click CREATE ENVIRONMENT
max
step
Proceed to the Rule Type table and select the required rule for limiting the field, for example "max", then use the example syntax to indicate the maximum number of GPU devices requested, as shown in the sketch below.
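For example, limiting workspaces to at most 2 requested GPU devices results in the following policy fragment (the compute/gpuDevicesRequest snippet also appears later in this article); nesting it under rules follows the structure of the full policy example below:
rules:
  compute:
    gpuDevicesRequest:
      max: 2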
gpuDevicesRequest
Specifies the number of GPUs to allocate for the created workload. gpuRequestType can be defined only if gpuDevicesRequest = 1.
integer
Workspace & Training
{
"spec": {
"compute": {
"gpuDevicesRequest": 1,
"gpuRequestType": "portion",
"gpuPortionRequest": 0.5,
"gpuPortionLimit": 0.5,
"gpuMemoryRequest": "10M",
"gpuMemoryLimit": "10M",
"migProfile": "1g.5gb",
"cpuCoreRequest": 0.5,
"cpuCoreLimit": 2,
"cpuMemoryRequest": "20M",
"cpuMemoryLimit": "30M",
"largeShmRequest": false,
"extendedResources": [
{
"resource": "hardware-vendor.example/foo",
"quantity": 2,
"exclude": false
}
]
},
}
}
compute:
gpuDevicesRequest:
max: 2
defaults:
createHomeDir: true
environmentVariables:
instances:
- name: MY_ENV
value: my_value
security:
allowPrivilegeEscalation: false
rules:
storage:
s3:
attributes:
url:
options:
- value: https://www.google.com
displayed: https://www.google.com
- value: https://www.yahoo.com
displayed: https://www.yahoo.com
rules:
storage:
dataVolume:
instances:
canAdd: false
hostPath:
instances:
canAdd: false
pvc:
instances:
canAdd: false
git:
attributes:
repository:
required: true
branch:
required: true
path:
required: true
nfs:
instances:
canAdd: false
s3:
instances:
canAdd: false
compute:
cpuCoreRequest:
required: true
min: 0
max: 8
cpuCoreLimit:
min: 0
max: 8
cpuMemoryRequest:
required: true
min: '0'
max: 16G
cpuMemoryLimit:
min: '0'
max: 8G
migProfile:
canEdit: false
gpuPortionRequest:
min: 0
max: 1
gpuMemoryRequest:
canEdit: false
extendedResources:
instances:
canAdd: false
defaults:
worker:
command: my-command-worker-1
environmentVariables:
instances:
- name: LOG_DIR
value: policy-worker-to-be-ignored
- name: ADDED_VAR
value: policy-worker-added
security:
runAsUid: 500
storage:
s3:
attributes:
bucket: bucket1-worker
master:
command: my-command-master-2
environmentVariables:
instances:
- name: LOG_DIR
value: policy-master-to-be-ignored
- name: ADDED_VAR
value: policy-master-added
security:
runAsUid: 800
storage:
s3:
attributes:
bucket: bucket1-master
rules:
worker:
command:
options:
- value: my-command-worker-1
displayed: command1
- value: my-command-worker-2
displayed: command2
storage:
nfs:
instances:
canAdd: false
s3:
attributes:
bucket:
options:
- value: bucket1-worker
- value: bucket2-worker
master:
command:
options:
- value: my-command-master-1
displayed: command1
- value: my-command-master-2
displayed: command2
storage:
nfs:
instances:
canAdd: false
s3:
attributes:
bucket:
options:
- value: bucket1-master
- value: bucket2-master
rules:
imagePullPolicy:
required: true
options:
- value: Always
displayed: Always
- value: Never
displayed: Never
createHomeDir:
canEdit: false
rules:
security:
runAsUid:
min: 1
max: 32700
allowPrivilegeEscalation:
canEdit: false
defaults:
createHomeDir: true
imagePullPolicy: IfNotPresent
nodePools:
- node-pool-a
- node-pool-b
environmentVariables:
instances:
- name: WANDB_API_KEY
value: REPLACE_ME!
- name: WANDB_BASE_URL
value: https://wandb.mydomain.com
compute:
cpuCoreRequest: 0.1
cpuCoreLimit: 20
cpuMemoryRequest: 10G
cpuMemoryLimit: 40G
largeShmRequest: true
security:
allowPrivilegeEscalation: false
storage:
git:
attributes:
repository: https://git-repo.my-domain.com
branch: master
hostPath:
instances:
- name: vol-data-1
path: /data-1
mountPath: /mount/data-1
- name: vol-data-2
path: /data-2
mountPath: /mount/data-2
rules:
createHomeDir:
canEdit: false
imagePullPolicy:
canEdit: false
environmentVariables:
instances:
locked:
- WANDB_BASE_URL
compute:
cpuCoreRequest:
max: 32
cpuCoreLimit:
max: 32
cpuMemoryRequest:
min: 1G
max: 20G
cpuMemoryLimit:
min: 1G
max: 40G
largeShmRequest:
canEdit: false
extendedResources:
instances:
canAdd: false
security:
allowPrivilegeEscalation:
canEdit: false
runAsUid:
min: 1
storage:
hostPath:
instances:
locked:
- vol-data-1
- vol-data-2
imposedAssets:
- 4ba37689-f528-4eb6-9377-5e322780cc27
defaults: null
rules: null
imposedAssets:
- f12c965b-44e9-4ff6-8b43-01d8f9e630cc
This section provides details on all hotfixes available for version 2.21. Hotfixes are critical updates released between our major and minor versions to address specific issues or vulnerabilities. These updates ensure the system remains secure, stable, and optimized without requiring a full version upgrade.
2.21.57
20/11/2025
RUN-33802
Fixed an issue that caused distributed inference workloads to become unsynchronized.
2.21.57
20/11/2025
RUN-33144
Fixed a security vulnerability related to CVE-2025-62156 with severity HIGH.
2.21.56
20/11/2025
RUN-33613
Fixed missing validations for CPU resources when the CPU quota feature flag was disabled, which caused project and department updates to skip required CPU checks.
2.21.56
20/11/2025
RUN-33947
Fixed an issue where SMTP configurations using the “none” option still sent empty username/password fields. Added the auth_none type to ensure no credentials are sent for passwordless SMTP servers.
2.21.55
13/11/2025
RUN-33840
Fixed a revision sync issue that caused excessive error logs
2.21.53
02/11/2025
RUN-33053
Fixed an issue that caused conflicts with additional built-in Prometheus Operator deployments in OpenShift.
2.21.53
02/11/2025
RUN-33006
Fixed an issue in the CLI installer where the PATH was not configured for all shells. The installer now correctly configures PATH for both zsh and bash.
2.21.53
02/11/2025
RUN-32945
Fixed a security vulnerability related to CVE-2025-58754 with severity HIGH.
2.21.53
02/11/2025
RUN-32548
Fixed an issue where, in certain edge cases, removing an inference workload without deleting its revision caused the cluster to panic during revision sync.
2.21.52
19/10/2025
RUN-31803
Fixed an issue where the Quota management dashboard occasionally displayed incorrect GPU quota values.
2.21.52
19/10/2025
RUN-33044
Fixed an issue where the workload controller could delete all running workloads when init-ca generated a new certificate (every 30 days).
2.21.51
16/10/2025
RUN-31383
Fixed a security vulnerability related to CVE-2025-7783 with severity HIGH.
2.21.51
16/10/2025
RUN-31422
Fixed an issue where updating project resources created through the deprecated Projects API did not work correctly.
2.21.51
16/10/2025
RUN-31571
Fixed a security vulnerability related to CVE-2025-6965 with severity HIGH.
2.21.51
16/10/2025
RUN-31792
Fixed a security vulnerability related to CVE-2025-7425 with severity HIGH.
2.21.51
16/10/2025
RUN-31855
Fixed a security vulnerability related to CVE-2025-47907 with severity HIGH.
2.21.51
16/10/2025
RUN-31993
Fixed a security vulnerability related to CVE-2025-22868 with severity HIGH.
2.21.51
16/10/2025
RUN-32146
Fixed a security vulnerability related to CVE-2025-5914 with severity HIGH.
2.21.51
16/10/2025
RUN-32572
Fixed an issue where the RunaiAgentPullRateLow and RunaiAgentClusterInfoPushRateLow Prometheus alerts were firing incorrectly without cause.
2.21.51
16/10/2025
RUN-32730
Fixed an issue where incorrect average GPU utilization per project and workload type was displayed in the Projects view charts and tables.
2.21.51
16/10/2025
RUN-32789
Fixed an issue in CLI v2 where the --master-extended-resource flag had no effect in MPI training workloads.
2.21.51
16/10/2025
RUN-32889
Fixed an issue where idle GPU timeout rules were incorrectly applied to preemptible workspaces.
2.21.51
16/10/2025
RUN-33039
Fixed an issue where setting uid or gid to 0 during environment creation was not allowed.
2.21.46
12/08/2025
RUN-28394
Fixed an issue where using the GET Roles API returned a 403 unauthorized for all users.
2.21.46
12/08/2025
RUN-31008
Fixed a security vulnerability related to CVE-2025-53547 with severity HIGH.
2.21.46
12/08/2025
RUN-31051
Fixed a security vulnerability related to CVE-2025-49794 with severity HIGH.
2.21.46
12/08/2025
RUN-31310
Fixed a security vulnerability related to CVE-2025-22868 with severity HIGH.
2.21.46
12/08/2025
RUN-31678
Fixed an issue where the workload flexible submission form did not load the correct default node pools for a project.
2.21.45
01/08/2025
RUN-31265
Fixed a security vulnerability related to CVE-2025-30749 with severity HIGH.
2.21.45
01/08/2025
RUN-31007
Fixed a security vulnerability related to CVE-2025-22874 with severity HIGH.
2.21.43
30/07/2025
RUN-29828
Fixed an issue where the completion time date formatting in the Workload grid was inconsistent. Also resolved a bug where exported CSV files shifted date values to the next cell.
2.21.43
30/07/2025
RUN-31039
Fixed a base image security vulnerability in libxml2 related to CVE-2025-49796 with severity HIGH.
2.21.43
30/07/2025
RUN-31263
Fixed an issue where setting defaults for servingPort fields failed and incorrectly required the container port default as well.
2.21.42
24/07/2025
RUN-30746
Fixed an issue where workloads could not be scheduled if the combined length of the project name and node pool name was excessively long.
2.21.42
24/07/2025
RUN-31039
Fixed a security vulnerability in golang.org/x/oauth2 related to CVE-2025-22868 with severity HIGH.
2.21.42
24/07/2025
RUN-31358
Fixed an issue where enabling enableWorkloadOwnershipProtection for inference workloads caused newly submitted workloads to get stuck.
2.21.41
20/07/2025
RUN-31131
Fixed a security vulnerability in runai-container-runtime-installer and runai-container-toolkit related to CVE-2025-49794 with severity HIGH.
2.21.39
17/07/2025
RUN-29092
Fixed an issue where project quota could not be changed due to scheduling rules being set to 0 instead of null.
2.21.38
14/07/2025
RUN-28377
Fixed an issue where the CLI cache folder was created in a location where the user might not have sufficient permissions, leading to failures. The cache folder is now created in the same directory as the config file.
2.21.38
14/07/2025
RUN-30713
Fixed an issue where configuring an incorrect Auth URL during CLI installation could lead to connectivity issues. To prevent this, the option to set the Auth URL during installation has been removed. The install script now automatically sets the control plane URL based on the script's source.
2.21.37
09/07/2025
RUN-29113
Fixed a security vulnerability in DOMPurify related to CVE-2024-24762 with severity HIGH.
2.21.37
09/07/2025
RUN-30634
Fixed a security vulnerability in cluster-installer related to CVE-2025-30204 with severity HIGH.
2.21.37
09/07/2025
RUN-29831
Fixed an issue where the API documentation for asset filtering parameters was inaccurate.
2.21.37
09/07/2025
RUN-30673
Fixed an issue where users with create permissions on one scope and read-only permissions on another were incorrectly allowed to create projects in both scopes.
2.21.37
09/07/2025
RUN-30657
Fixed a security vulnerability in runai-container-runtime-installer and runai-container-toolkit related to CVE-2025-6020 with severity HIGH.
2.21.33
30/06/2025
RUN-30197
Fixed a security vulnerability in the Go v1.24.2 standard library (stdlib) related to CVE-2025-22874 with severity HIGH.
2.21.32
29/06/2025
RUN-30674
Fixed an issue where, on rare occasions, running the runai upgrade command deleted all files in the current directory.
2.21.30
29/06/2025
RUN-25883
Fixed a security vulnerability in io.netty:netty-handler related to CVE-2025-24970 with severity HIGH.
2.21.30
29/06/2025
RUN-30666
Fixed an issue where users were unable to create Hugging Face workloads due to a missing function in the system.
2.21.29
25/06/2025
RUN-27390
Fixed an issue where CPU-only workloads submitted via the CLI incorrectly displayed a GPU allocation value.
2.21.29
25/06/2025
RUN-29768
Fixed an issue where the Get token request returned a 500 error when the email mapper failed.
2.21.29
25/06/2025
RUN-29049
Fixed a security vulnerability in github.com.golang.org.x.crypto related to CVE-2025-22869 with severity HIGH.
2.21.28
25/06/2025
RUN-29143
Fixed an issue where nodes could become unschedulable when workloads were submitted to a different node pool.
2.21.27
17/06/2025
RUN-29709
Fixed a security vulnerability in jq cli related to CVE-2024-53427 with severity HIGH.
Fixed a security vulnerability in jq cli related to CVE-2025-48060 with severity HIGH.
2.21.27
17/06/2025
RUN-29756
Fixed an issue where not all subjects were returned for each project or department
2.21.27
17/06/2025
RUN-29700
Fixed a security vulnerability in github.com/moby and github.com/docker/docker related to CVE-2024-41110 with severity Critical.
2.21.25
11/06/2025
RUN-29548
Fixed a typo in the documentation where the API key was incorrectly written as enforceRun:aiScheduler instead of the correct enforceRunaiScheduler.
2.21.25
11/06/2025
RUN-29320
Fixed an issue in CLI v2 where the update server did not receive the terminal size during exec commands requiring TTY support. The terminal size is now set once upon session creation, ensuring proper behavior for interactive sessions.
2.21.24
08/06/2025
RUN-29282
Fixed a security vulnerability in golang.org.x.crypto related to CVE-2025-22869 with severity HIGH.
2.21.23
08/06/2025
RUN-28891
Fixed a security vulnerability in golang.org/x/crypto related to CVE-2024-45337 with severity HIGH.
Fixed a security vulnerability in go-git/go-git related to CVE-2025-21613 with severity HIGH.
2.21.23
08/06/2025
RUN-25281
Fixed an issue where deploying a Hugging Face model with vLLM using the Hugging Face inference UI form on an OpenShift environment failed due to permission errors.
2.21.22
03/06/2025
RUN-29341
Fixed an issue which caused high CPU usage in the Cluster API.
2.21.22
03/06/2025
RUN-29323
Fixed an issue where Prometheus failed to send metrics for OpenShift.
2.21.19
27/05/2025
RUN-29093
Fixed an issue where rotating the runai-config webhook secret caused the app.kubernetes.io/managed-by=helm label to be removed.
2.21.18
27/05/2025
RUN-28286
Fixed an issue where CPU-only workloads incorrectly triggered idle timeout notifications intended for GPU workloads.
2.21.18
27/05/2025
RUN-28555
Fixed an issue in Admin → General Settings where the "Disabled" workloads count displayed inconsistently between the collapsed and expanded views.
2.21.18
27/05/2025
RUN-26361
Fixed an issue where Prometheus remote-write credentials were not properly updated on OpenShift clusters.
2.21.18
27/05/2025
RUN-28780
Fixed an issue where Hugging Face model validation incorrectly blocked some valid models supported by vLLM and TGI.
2.21.18
27/05/2025
RUN-28851
Fixed an issue in CLI v2 where the port-forward command terminated SSH connections after 15–30 seconds due to an idle timeout.
2.21.18
27/05/2025
RUN-25281
Fixed an issue where the Hugging Face UI submission flow failed on OpenShift (OCP) clusters.
2.21.17
21/05/2025
RUN-28266
Fixed an issue where the documentation examples for the runai workload delete CLI command were incorrect.
2.21.17
21/05/2025
RUN-28609
Fixed an issue where users with the ML Engineer role were unable to delete multiple inference jobs at once.
2.21.17
21/05/2025
RUN-28665
Fixed an issue where using servingPort authorization fields in the Create an inference API on unsupported clusters did not return an error.
2.21.17
21/05/2025
RUN-28717
Fixed an issue where the Update inference spec API documentation listed an incorrect response code.
2.21.17
21/05/2025
RUN-28755
Fixed an issue where the tooltip next to the External URL for an inference endpoint incorrectly stated that the URL was internal.
2.21.17
21/05/2025
RUN-28762
Fixed an issue with the inference workload ownership protection.
2.21.17
21/05/2025
RUN-28859
Fixed an issue where the knative.enable-scale-to-zero setting did not default to true as expected.
2.21.17
21/05/2025
RUN-28923
Fixed an issue where calling the Get node telemetry data API with the telemetryType IDLE_ALLOCATED_GPUS resulted in a 500 Internal Server Error.
2.21.17
21/05/2025
RUN-28950
Fixed a security vulnerability in github.com/moby and github.com/docker/docker related to CVE-2024-41110 with severity Critical.
2.21.16
18/05/2025
RUN-27295
Fixed an issue in CLI v2 where the --node-type flag for inference workloads was not properly propagated to the pod specification.
2.21.16
18/05/2025
RUN-27375
Fixed an issue where projects were not visible in the legacy job submission form, preventing users from selecting a target project.
2.21.16
18/05/2025
RUN-27514
Fixed an issue where disabling CPU quota in the General settings did not remove existing CPU quotas from projects and departments.
2.21.16
18/05/2025
RUN-27521
Fixed a security vulnerability in axios related to CVE-2025-27152 with severity HIGH.
2.21.16
18/05/2025
RUN-27638
Fixed an issue where a node pool’s placement strategy stopped functioning correctly after being edited.
2.21.16
18/05/2025
RUN-27438
Fixed an issue where MPI jobs were unavailable due to an OpenShift MPI Operator installation error.
2.21.16
18/05/2025
RUN-27952
Fixed a security vulnerability in emacs-filesystem related to CVE-2025-1244 with severity HIGH.
2.21.16
18/05/2025
RUN-28244
Fixed a security vulnerability in liblzma5 related to CVE-2025-31115 with severity HIGH.
2.21.16
18/05/2025
RUN-28006
Fixed an issue where tokens became invalid for the API server after one hour.
2.21.16
18/05/2025
RUN-28097
Fixed an issue where the allocated_gpu_count_per_gpu metric displayed incorrect data for fractional pods.
2.21.16
18/05/2025
RUN-28213
Fixed a security vulnerability in github.com.golang.org.x.crypto related to CVE-2025-22869 with severity HIGH.
2.21.16
18/05/2025
RUN-28311
Fixed an issue where user creation failed with a duplicate email error, even though the email address did not exist in the system.
2.21.16
18/05/2025
RUN-28832
Fixed inference CLI v2 documentation with examples that reflect correct usage.
2.21.15
30/04/2025
RUN-27533
Fixed an issue where workloads with idle GPUs were not suspended after exceeding the configured idle time.
2.21.14
29/04/2025
RUN-26608
Fixed an issue by adding a flag to the cli config set command and the CLI install script, allowing users to set a cache directory.
2.21.14
29/04/2025
RUN-27264
Fixed an issue where creating a project from the UI with a non-unlimited deserved CPU value caused the queue to be created with limit = deserved instead of unlimited.
2.21.14
29/04/2025
RUN-27484
Fixed an issue where duplicate app.kubernetes.io/name labels were applied to services in the control plane Helm chart.
2.21.14
29/04/2025
RUN-27502
Fixed the inference CLI commands documentation: --max-replicas and --min-replicas were incorrectly used instead of --max-scale and --min-scale.
2.21.14
29/04/2025
RUN-27513
Fixed an issue where cluster-scoped policies were not visible to users with appropriate permissions.
2.21.14
29/04/2025
RUN-27515
Fixed an issue where users were unable to use assets from an upper scope during flexible workload submissions.
2.21.14
29/04/2025
RUN-27520
Fixed an issue where adding access rules immediately after creating an application did not refresh the access rules table.
2.21.14
29/04/2025
RUN-27628
Fixed an issue where a node pool could remain stuck in Updating status in certain cases.
2.21.14
29/04/2025
RUN-27826
Fixed an issue where the runai inference update command could result in a failure to update the workload. Although the command itself succeeded (since the update is asynchronous), the update often failed, and the new spec was not applied.
2.21.14
29/04/2025
RUN-27915
Fixed an issue where the "Improved Command Line Interface" admin setting was incorrectly labeled as Beta instead of Stable.
2.21.11
29/04/2025
RUN-27251
Fixed a security vulnerability in github.com.golang-jwt.jwt.v4 and github.com.golang-jwt.jwt.v5 with CVE-2025-30204 with severity HIGH.
Fixed a security vulnerability in golang.org.x.net with CVE-2025-22872 with severity MEDIUM.
Fixed a security vulnerability in knative.dev/serving with CVE-2023-48713 with severity MEDIUM.
2.21.11
29/04/2025
RUN-27309
Fixed an issue where workloads configured with a multi node pool setup could fail to schedule on a specific node pool in the future after an initial scheduling failure, even if sufficient resources later became available.
2.21.10
29/04/2025
RUN-26992
Fixed an issue where workloads submitted with an invalid node port range would get stuck in Creating status.
2.21.10
29/04/2025
RUN-27497
Fixed an issue where, after deleting an SSO user and immediately creating a local user, the delete confirmation dialog reappeared unexpectedly.
2.21.9
15/04/2025
RUN-26989
Fixed an issue that prevented reordering node pools in the workload submission form.
2.21.9
15/04/2025
RUN-27247
Fixed security vulnerabilities in Spring framework used by db-mechanic service - CVE-2021-27568, CVE-2021-44228, CVE-2022-22965, CVE-2023-20873, CVE-2024-22243, CVE-2024-22259 and CVE-2024-22262.
2.21.9
15/04/2025
RUN-26359
Fixed an issue in CLI v2 where using the --toleration option required incorrect mandatory fields.
Metrics are numeric measurements recorded over time and emitted from the NVIDIA Run:ai cluster. Telemetry is a numeric measurement recorded in real time at the moment it is emitted from the NVIDIA Run:ai cluster.
NVIDIA Run:ai provides a control-plane API that supports and aggregates analytics at various levels.
NVIDIA provides extended metrics as shown here. To enable these metrics, contact NVIDIA Run:ai customer support.
CPU_MEMORY_LIMIT_BYTES
CPU memory limit
CPU_MEMORY_REQUEST_BYTES
CPU memory request
CPU_MEMORY_USAGE_BYTES
CPU memory usage
CPU_MEMORY_UTILIZATION
CPU memory utilization
CPU_REQUEST_CORES
CPU request
CPU_USAGE_CORES
CPU usage
CPU_UTILIZATION
CPU compute utilization
CPU utilization
GPU_ALLOCATION
GPU devices (allocated)
GPU_MEMORY_REQUEST_BYTES
GPU memory request
GPU_MEMORY_USAGE_BYTES
GPU memory usage
GPU_MEMORY_USAGE_BYTES_PER_GPU
GPU memory usage per GPU
GPU_MEMORY_UTILIZATION
GPU memory utilization
GPU_MEMORY_UTILIZATION_PER_GPU
GPU memory utilization per GPU
GPU_QUOTA
Quota
GPU_UTILIZATION
GPU compute utilization
GPU_UTILIZATION_PER_GPU
GPU utilization per GPU
TOTAL_GPU
GPU devices total
Total GPUs
TOTAL_GPU_NODES
GPU_UTILIZATION_DISTRIBUTION
GPU utilization distribution
UNALLOCATED_GPU
GPU devices (unallocated)
Unallocated GPUs
CPU_QUOTA_MILLICORES
CPU_MEMORY_QUOTA_MB
CPU_ALLOCATION_MILLICORES
CPU_MEMORY_ALLOCATION_MB
POD_COUNT
RUNNING_POD_COUNT
GPU_GRAPHICS_ENGINE_ACTIVITY_PER_GPU
Graphics engine activity
GPU_MEMORY_BANDWIDTH_UTILIZATION_PER_GPU
GPU_NVLINK_RECEIVED_BANDWIDTH_PER_GPU
GPU_NVLINK_TRANSMITTED_BANDWIDTH_PER_GPU
GPU_PCIE_RECEIVED_BANDWIDTH_PER_GPU
GPU_PCIE_TRANSMITTED_BANDWIDTH_PER_GPU
GPU_SM_ACTIVITY_PER_GPU
GPU SM activity
GPU_SM_OCCUPANCY_PER_GPU
GPU SM occupancy
GPU_TENSOR_ACTIVITY_PER_GPU
GPU tensor activity
READY_GPU_NODES
Ready / Total GPU nodes
READY_GPUS
Ready / Total GPU devices
TOTAL_GPU_NODES
Ready / Total GPU nodes
TOTAL_GPUS
Ready / Total GPU devices
IDLE_ALLOCATED_GPUS
Idle allocated GPU devices
FREE_GPUS
Free GPU devices
TOTAL_CPU_CORES
CPU (Cores)
USED_CPU_CORES
ALLOCATED_CPU_CORES
Allocated CPU cores
TOTAL_GPU_MEMORY_BYTES
GPU memory
USED_GPU_MEMORY_BYTES
Used GPU memory
TOTAL_CPU_MEMORY_BYTES
CPU memory
USED_CPU_MEMORY_BYTES
Used CPU memory
ALLOCATED_CPU_MEMORY_BYTES
Allocated CPU memory
GPU_QUOTA
GPU quota
CPU_QUOTA
MEMORY_QUOTA
GPU_ALLOCATION_NON_PREEMPTIBLE
CPU_ALLOCATION_NON_PREEMPTIBLE
MEMORY_ALLOCATION_NON_PREEMPTIBLE
Cluster
A cluster is a set of node pools and nodes. Cluster metrics are aggregated at the cluster level. In the NVIDIA Run:ai user interface, these metrics are available in the Overview dashboard.
Node
Data is aggregated at the node level.
Node pool
Data is aggregated at the node pool level.
Workload
Data is aggregated at the workload level. In some workloads, e.g. with distributed workloads, these metrics aggregate data from all worker pods.
Pod
The basic unit of execution.
Project
The basic organizational unit. Projects are the tool to implement resource allocation policies as well as the segregation between different initiatives.
Department
Departments are a grouping of projects.
ALLOCATED_GPU
GPU devices (allocated)
Allocated GPUs
AVG_WORKLOAD_WAIT_TIME
CPU_LIMIT_CORES
CPU limit
GPU_FP16_ENGINE_ACTIVITY_PER_GPU
GPU FP16 engine activity
GPU_FP32_ENGINE_ACTIVITY_PER_GPU
GPU FP32 engine activity
GPU_FP64_ENGINE_ACTIVITY_PER_GPU
GPU FP64 engine activity
WORKLOADS_COUNT
ALLOCATED_GPUS
Allocated GPUs
GPU_ALLOCATION
This section explains the available roles in the NVIDIA Run:ai platform.
A role is a set of permissions that can be assigned to a subject in a scope. A permission is a set of actions (View, Edit, Create and Delete) over a NVIDIA Run:ai entity (e.g. projects, workloads, users).
The Roles table can be found under Access in the NVIDIA Run:ai platform.
The Roles table displays a list of roles available to users in the NVIDIA Run:ai platform. Both predefined and custom roles will be displayed in the table.
The Roles table consists of the following columns:
Filter - Click ADD FILTER, select the column to filter by, and enter the filter values
Search - Click SEARCH and type the value to search by
Sort - Click each column header to sort by
Column selection - Click COLUMNS and select the columns to display in the table
Download table - Click MORE and then click Download as CSV. Export to CSV is limited to 20,000 rows.
To review a role, click the role name in the table.
In the role form, review the following:
Role name - The name of the role
Entity - A system-managed object that can be viewed, edited, created or deleted by a user based on their assigned role and scope
Actions - The actions that the role assignee is authorized to perform for each entity:
View - If checked, an assigned user with this role can view instances of this type of entity within their defined scope
Edit - If checked, an assigned user with this role can change the settings of an instance of this type of entity within their defined scope
Create - If checked, an assigned user with this role can create new instances of this type of entity within their defined scope
Delete - If checked, an assigned user with this role can delete instances of this type of entity within their defined scope
NVIDIA Run:ai supports the following roles and their permissions. Under each role is a detailed list of the actions that the role assignee is authorized to perform for each entity.
Go to the API reference to view the available actions.
Each role's permission table lists its View, Edit, Create and Delete permissions over the following entities:
Departments
Event history
Policies
Projects
Settings
Clusters
Node pools
Nodes
Access rules
Applications
Groups
Roles
User applications
Users
Analytics dashboard
Consumption dashboard
Overview dashboard
Inferences
Workloads
Compute resources
Credentials
Data sources
Data volumes
Data volumes - sharing list
Environments
Storage class configurations
Templates
k8s: Pod
k8s: Deployment
batch: Job
batch: CronJob
machinelearning.seldon.io: SeldonDeployment
kubevirt.io: VirtualMachineInstance
kubeflow.org: TFJob
kubeflow.org: PyTorchJob
kubeflow.org: XGBoostJob
kubeflow.org: MPIJob
kubeflow.org: MPIJob
kubeflow.org: Notebook
kubeflow.org: ScheduledWorkflow
amlarc.azureml.com: AmlJob
serving.knative.dev: Service
workspace.devfile.io: DevWorkspace
ray.io: RayCluster
ray.io: RayJob
ray.io: RayService
tekton.dev: TaskRun
tekton.dev: PipelineRun
argoproj.io: Workflow
Role - The name of the role
Created by - The name of the role creator
Creation time - The timestamp when the role was created

A workload policy is an end-to-end solution for AI managers and administrators to control and simplify how workloads are submitted by setting best practices, enforcing limitations, and standardizing processes for AI projects within their organization.
This article explains the policy YAML fields and the possible rules and defaults that can be set for each field.
The policy fields are structured in a format similar to the workload API fields. The following tables are a structured guide to help you understand and configure policies in YAML format. They provide the fields, descriptions, defaults and rules for each workload type. The example YAML snippets referenced throughout this article are collected at the end of the article.
Each field has a specific value type. The following value types are supported.
Workload fields of type itemized have multiple instances; however, in comparison to objects, each instance can be referenced by a key field. The key field is defined for each field.
Consider the following workload spec:
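A sketch of such a spec (the resource names added/cpu and added/memory and their quantities are illustrative):
spec:
  image: ubuntu
  compute:
    extendedResources:
      - resource: added/cpu
        quantity: 10
      - resource: added/memory
        quantity: 20M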
In this example, extendedResources has two instances; each has two attributes: resource (the key attribute) and quantity.
In policy, the defaults and rules for itemized fields have two sub sections:
Instances: default items to be added to the policy or rules which apply to an instance as a whole.
Attributes: defaults for attributes within an item or rules which apply to attributes within each item.
Consider the following example:
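A sketch of policy defaults and rules for the itemized extendedResources field (resource names and quantities are illustrative and match the worked example in this article):
defaults:
  compute:
    extendedResources:
      instances:
        - resource: default/cpu
          quantity: 5
        - resource: default/memory
          quantity: 4M
      attributes:
        quantity: 3
rules:
  compute:
    extendedResources:
      instances:
        locked:
          - default/cpu
      attributes:
        quantity:
          required: true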
Assume the following workload submission is requested:
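A sketch of such a submission, which excludes one default instance and adds two new ones (illustrative values):
spec:
  image: ubuntu
  compute:
    extendedResources:
      - resource: default/memory
        exclude: true
      - resource: added/cpu
      - resource: added/memory
        quantity: 5M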
The effective policy for the above workload has the following extendedResources instances: default/cpu with quantity 5 (from the policy defaults), added/cpu with quantity 3 (from the quantity default in the attributes section), and added/memory with quantity 5M (from the submission request).
A workload submission request cannot exclude the default/cpu resource, as this key is included in the locked rules under the instances section.
For each field of a specific policy, you can specify both rules and defaults. A policy spec consists of the following sections (see the sketch after this list):
Rules
Defaults
Imposed Assets
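Putting these sections together, a policy spec might look like the following sketch (the values and the asset ID placeholder are illustrative):
rules:
  compute:
    gpuDevicesRequest:
      max: 8
defaults:
  imagePullPolicy: Always
imposedAssets:
  - <asset-id>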
Rules set up constraints on workload policy fields. For example, consider the following policy:
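A sketch of such a rules section, matching the constraints described next:
rules:
  compute:
    gpuDevicesRequest:
      max: 8
  security:
    runAsUid:
      min: 500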
Such a policy restricts the maximum value for gpuDevicesRequest to 8, and the minimum value for runAsUid, provided in the security section, to 500.
The defaults section is used for providing defaults for various workload fields. For example, consider the following policy:
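A sketch of such a defaults section (these are the defaults assumed by the effective workload values described below):
defaults:
  imagePullPolicy: Always
  security:
    runAsNonRoot: true
    runAsUid: 500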
Assume a submission request with the following values:
Image: ubuntu
runAsUid: 501
The effective workload that runs has the following set of values: image ubuntu and runAsUid 501, taken from the submission request, and imagePullPolicy Always and runAsNonRoot true, taken from the policy defaults.
Default instances of a storage field can be provided using a datasource containing the details of this storage instance. To add such instances in the policy, specify those asset IDs in the imposedAssets section of the policy.
Assets with references to credential assets (for example: private S3, containing reference to an AccessKey asset) cannot be used as imposedAssets.
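A sketch of a policy that only imposes an asset (the asset ID is illustrative):
defaults: null
rules: null
imposedAssets:
  - f12c965b-44e9-4ff6-8b43-01d8f9e630cc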
environmentVariables
Set of environmentVariables to populate the container running the workspace
Workspace
Standard training
Distributed training
image
Specifies the image to use when creating the container running the workload
Workspace
Standard training
Distributed training
imagePullPolicy
Specifies the pull policy of the image when starting a container running the created workload. Options are: Always, Never, or IfNotPresent
Workspace
Standard training
Distributed training
workingDir
Container’s working directory. If not specified, the container runtime default is used, which might be configured in the container image
Workspace
Standard training
Distributed training
nodeType
Nodes (machines) or a group of nodes on which the workload runs
Workspace
Standard training
Distributed training
nodePools
A prioritized list of node pools for the scheduler to run the workspace on. The scheduler always tries to use the first node pool before moving to the next one when the first is not available.
Workspace
Standard training
Distributed training
annotations
Set of annotations to populate into the container running the workspace
Workspace
Standard training
Distributed training
labels
Set of labels to populate into the container running the workspace
Workspace
Standard training
Distributed training
terminateAfterPreemption
Indicates whether the job should be terminated by the system after it has been preempted
Workspace
Standard training
Distributed training
autoDeletionTimeAfterCompletionSeconds
Specifies the duration after which a finished workload (Completed or Failed) is automatically deleted. If this field is set to zero, the workload becomes eligible to be deleted immediately after it finishes.
Workspace
Standard training
Distributed training
backoffLimit
Specifies the number of retries before marking a workload as failed
Workspace
Standard training
Distributed training
restartPolicy
Specifies the restart policy of the workload pods. The default is empty, which means the policy is determined by the framework default
Enum: "Always" "Never" "OnFailure"
Workspace
Standard training
Distributed training
cleanPodPolicy
Specifies which pods will be deleted when the workload reaches a terminal state (completed/failed). The policy can be one of the following values:
Running - Only pods still running when a job completes (for example, parameter servers) will be deleted immediately. Completed pods will not be deleted so that the logs will be preserved. (Default for MPI)
All - All (including completed) pods will be deleted immediately when the job finishes.
None - No pods are deleted when the workload reaches a terminal state.
Distributed training
completions
Used with Hyperparameter Optimization. Specifies the number of successful pods the job should reach to be completed. The Job is marked as successful once the specified number of pods has succeeded.
Standard training
parallelism
Used with Hyperparameter Optimization. Specifies the maximum desired number of pods the workload should run at any given time.
Standard training
exposedUrls
Specifies a set of exposed URLs (e.g. ingress) from the container running the created workload.
Workspace
Standard training
Distributed training
relatedUrls
Specifies a set of URLs related to the workload. For example, a URL to an external server providing statistics or logging about the workload.
Workspace
Standard training
Distributed training
PodAffinitySchedulingRule
Indicates whether to apply the Pod Affinity rule as the "hard" (required) or the "soft" (preferred) option. This field can be specified only if PodAffinity is set to true.
Workspace
Standard training
Distributed training
podAffinityTopology
Specifies the Pod Affinity Topology to be used for scheduling the job. This field can be specified only if PodAffinity is set to true.
Workspace
Standard training
Distributed training
sshAuthMountPath
Specifies the directory where SSH keys are mounted
Distributed training (MPI only)
ports
Specifies a set of ports exposed from the container running the created workload. More information in Ports fields below.
Workspace
Standard training
Distributed training
probes
Specifies the ReadinessProbe to use to determine if the container is ready to accept traffic. More information in Probes fields below
-
Workspace
Standard training
Distributed training
tolerations
Toleration rules which apply to the pods running the workload. Toleration rules guide (but do not require) the system to which node each pod can be scheduled to or evicted from, based on matching between those rules and the set of taints defined for each Kubernetes node.
Workspace
Standard training
Distributed training
priorityClass
Priority class of the workload. The default value for workspace is 'build' and it can be changed to 'interactive-preemptible' to allow the workload to use over-quota resources. The default value for training is 'train' and it can be changed to 'build' to allow the training workload to have a higher priority for in-queue scheduling and also become non-preemptive (if it's in deserved quota).
Enum: "build" "train" "interactive-preemptible"
Workspace
Standard training
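For example, a policy could default workspaces to the interactive-preemptible priority class so they can use over-quota resources (a sketch, using the flat defaults structure shown in the examples in this article):
defaults:
  priorityClass: interactive-preemptible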
storage
Contains all the fields related to storage configurations. More information in Storage fields below.
-
Workspace
Standard training
Distributed training
security
Contains all the fields related to security configurations. More information in Security fields below.
-
Workspace
Standard training
Distributed training
compute
Contains all the fields related to compute configurations. More information in Compute fields below.
-
Workspace
Standard training
Distributed training
tty
Whether this container should allocate a TTY for itself, also requires 'stdin' to be true
Workspace
Standard training
Distributed training
stdin
Whether this container should allocate a buffer for stdin in the container runtime. If this is not set, reads from stdin in the container will always result in EOF
Workspace
Standard training
Distributed training
numWorkers
The number of workers that will be allocated for running the workload.
Distributed training
distributedFramework
The distributed training framework used in the workload.
Enum: "MPI" "PyTorch" "TF" "XGBoost"
Distributed training
slotsPerWorker
Specifies the number of slots per worker used in hostfile. Defaults to 1. (applicable only for MPI)
Distributed training (MPI only)
minReplicas
The lower limit for the number of worker pods to which the training job can scale down. (applicable only for PyTorch)
Distributed training (PyTorch only)
maxReplicas
The upper limit for the number of worker pods that can be set by the autoscaler. Cannot be smaller than MinReplicas. (applicable only for PyTorch)
Distributed training (PyTorch only)
Workspace
Standard training
Distributed training
toolType
The tool type that runs on this port.
Workspace
Standard training
Distributed training
toolName
A name describing the tool that runs on this port.
Workspace
Standard training
Distributed training
initialDelaySeconds
Number of seconds after the container has started before liveness or readiness probes are initiated.
periodSeconds
How often (in seconds) to perform the probe
timeoutSeconds
Number of seconds after which the probe times out
successThreshold
Minimum consecutive successes for the probe to be considered successful after having failed
Workspace
Standard training
Distributed training
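For example, a policy could set defaults for the probe fields listed above (a sketch; the nesting under probes.readiness follows the readiness example at the end of this article, and the specific values are illustrative):
defaults:
  probes:
    readiness:
      initialDelaySeconds: 2
      periodSeconds: 10
      timeoutSeconds: 5
      successThreshold: 1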
runAsNonRoot
Indicates that the container must run as a non-root user.
Workspace
Standard training
Distributed training
readOnlyRootFilesystem
If true, mounts the container's root filesystem as read-only.
Workspace
Standard training
Distributed training
runAsUid
Specifies the Unix user id with which the container running the created workload should run.
Workspace
Standard training
Distributed training
runAsGid
Specifies the Unix Group ID with which the container should run.
Workspace
Standard training
Distributed training
supplementalGroups
Comma separated list of groups that the user running the container belongs to, in addition to the group indicated by runAsGid.
Workspace
Standard training
Distributed training
allowPrivilegeEscalation
Allows the container running the workload and all launched processes to gain additional privileges after the workload starts
Workspace
Standard training
Distributed training
hostIpc
Whether to enable hostIpc. Defaults to false.
Workspace
Standard training
Distributed training
hostNetwork
Whether to enable host network.
Workspace
Standard training
Distributed training
Workspace
Standard training
Distributed training
cpuMemoryLimit
Limitations on the CPU memory to allocate for this workload (1G, 20M, etc.). The system guarantees that this workload is not able to consume more than this amount of memory. The workload receives an error when trying to allocate more memory than this limit.
Workspace
Standard training
Distributed training
largeShmRequest
A large /dev/shm device to mount into a container running the created workload (shm is a shared file system mounted on RAM).
Workspace
Standard training
Distributed training
gpuRequestType
Sets the unit type for GPU resource requests to either portion, memory or migProfile. The request type can be stated as portion, memory or migProfile only if gpuDeviceRequest = 1.
Workspace
Standard training
Distributed training
migProfile (Deprecated)
Specifies the memory profile to be used for workload running on NVIDIA Multi-Instance GPU (MIG) technology.
Workspace
Standard training
Distributed training
gpuPortionRequest
Specifies the fraction of GPU to be allocated to the workload, between 0 and 1. For backward compatibility, it also supports the number of gpuDevices larger than 1, currently provided using the gpuDevices field.
Workspace
Standard training
Distributed training
gpuDeviceRequest
Specifies the number of GPUs to allocate for the created workload. Only if gpuDeviceRequest = 1, the gpuRequestType can be defined.
Workspace
Standard training
Distributed training
gpuPortionLimit
When a fraction of a GPU is requested, the GPU limit specifies the portion limit to allocate to the workload. The range of the value is from 0 to 1.
Workspace
Standard training
Distributed training
gpuMemoryRequest
Specifies GPU memory to allocate for the created workload. The workload receives this amount of memory. Note that the workload is not scheduled unless the system can guarantee this amount of GPU memory to the workload.
Workspace
Standard training
Distributed training
gpuMemoryLimit
Specifies a limit on the GPU memory to allocate for this workload. Should be no less than gpuMemoryRequest.
Workspace
Standard training
Distributed training
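For example, a policy could default workloads to a fraction of a GPU and cap the GPU memory limit (a sketch using the field names listed above; the values are illustrative):
defaults:
  compute:
    gpuDeviceRequest: 1
    gpuRequestType: portion
    gpuPortionRequest: 0.5
rules:
  compute:
    gpuMemoryLimit:
      max: 2G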
extendedResources
Specifies values for extended resources. Extended resources are third-party devices (such as high-performance NICs, FPGAs, or InfiniBand adapters) that you want to allocate to your Job.
Workspace
Standard training
Distributed training
Workspace
Standard training
Distributed training
pvc
Specifies persistent volume claims to mount into a container running the created workload.
Workspace
Standard training
Distributed training
nfs
Specifies NFS volume to mount into the container running the workload.
Workspace
Standard training
Distributed training
s3
Specifies S3 buckets to mount into the container running the workload.
Workspace
Standard training
Distributed training
configMapVolumes
Specifies ConfigMaps to mount as volumes into a container running the created workload.
Workspace
Standard training
Distributed training
secretVolume
Set of secret volumes to use in the workload. A secret volume maps a secret resource in the cluster to a file-system mount point within the container running the workload.
Workspace
Standard training
Distributed training
name
Unique name to identify the instance. Primarily used for policy locked rules.
path
Local path within the controller to which the host volume is mapped.
readOnly
Force the volume to be mounted with read-only permissions. Defaults to false
mountPath
The path that the host volume is mounted to when in use
repository
URL to a remote git repository. The content of this repository is mapped to the container running the workload
revision
Specific revision to synchronize the repository from
path
Local path within the workspace to which the Git repository is mapped
secretName
Optional name of Kubernetes secret that holds your git username and password
claimName (mandatory)
A given name for the PVC. Allows referencing it across workspaces
ephemeral
Use true to set PVC to ephemeral. If set to true, the PVC is deleted when the workspace is stopped.
path
Local path within the workspace to which the PVC is mapped
readonly
Permits read-only access to the PVC, preventing additions or modifications to its content
mountPath
The path that the NFS volume is mounted to when in use
path
Path that is exported by the NFS server
readOnly
Whether to force the NFS export to be mounted with read-only permissions
nfsServer
The hostname or IP address of the NFS server
Bucket
The name of the bucket
path
Local path within the workspace to which the S3 bucket is mapped
url
The URL of the S3 service provider. The default is the URL of the Amazon AWS S3 service
Integer
An Integer is a whole number without a fractional component.
Supported rules: canEdit, required, min, max. Example: 100
Number
A Number is capable of having non-integer values.
Supported rules: canEdit, required, min, defaultFrom. Example: 10.3
Quantity
Holds a string composed of a number and a unit representing a quantity.
Supported rules: canEdit, required, min, max. Example: 5M
Array
A set of values that are treated as one, as opposed to Itemized, in which each item can be referenced separately.
Supported rules: canEdit, required. Example (from a submission request): node-a, node-b, node-c
min
The minimal value for the field
max
The maximal value for the field
step
The allowed gap between values for this field. In the sketch following this list, with min 1, max 7 and step 2, the allowed values are 1, 3, 5 and 7
options
Set of allowed values for this field
defaultFrom
Set a default value for a field that will be calculated based on the value of another field
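A sketch combining these rule types on the compute.cpuCoreRequest and image fields used elsewhere in this article (values are illustrative):
rules:
  compute:
    cpuCoreRequest:
      min: 1
      max: 7
      step: 2
      defaultFrom:
        field: compute.cpuCoreLimit
        factor: 0.5
  image:
    options:
      - value: image-1
      - value: image-2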
args
When set, contains the arguments sent along with the command. These override the entry point of the image in the created workload
Workspace
Standard training
Distributed training
command
A command to serve as the entry point of the container running the workspace
Workspace
Standard training
Distributed training
createHomeDir
Instructs the system to create a temporary home directory for the user within the container. Data stored in this directory is not saved when the container exits. When the runAsUser flag is set to true, this flag defaults to true as well
Workspace
Standard training
Distributed training
container
The port that the container running the workload exposes.
Workspace
Standard training
Distributed training
serviceType
Specifies the default service exposure method for ports. The default is used for ports that do not specify a service type. Options are: LoadBalancer, NodePort or ClusterIP. For more information see the External Access to Containers guide.
Workspace
Standard training
Distributed training
external
The external port which allows a connection to the container port. If not specified, the port is auto-generated by the system.
readiness
Specifies the Readiness Probe to use to determine if the container is ready to accept traffic.
-
Workspace
Standard training
Distributed training
uidGidSource
Indicates the way to determine the user and group ids of the container. The options are:
fromTheImage - user and group IDs are determined by the docker image that the container runs. This is the default option.
custom - user and group IDs can be specified in the environment asset and/or the workspace creation request.
fromIdpToken - user and group IDs are automatically taken from the identity provider (IdP) token (available only in SSO-enabled installations).
Workspace
Standard training
Distributed training
capabilities
The capabilities field allows adding a set of Unix capabilities to the container running the workload. Capabilities are distinct Linux privileges, traditionally associated with the superuser, that can be independently enabled and disabled
Workspace
Standard training
Distributed training
seccompProfileType
Indicates which kind of seccomp profile is applied to the container. The options are:
RuntimeDefault - the container runtime default profile should be used
Unconfined - no profile should be applied
cpuCoreRequest
CPU units to allocate for the created workload (0.5, 1, etc.). The workload receives at least this amount of CPU. Note that the workload is not scheduled unless the system can guarantee this amount of CPUs to the workload.
Workspace
Standard training
Distributed training
cpuCoreLimit
Limitations on the number of CPUs consumed by the workload (0.5, 1, etc.). The system guarantees that this workload is not able to consume more than this amount of CPUs.
Workspace
Standard training
Distributed training
cpuMemoryRequest
The amount of CPU memory to allocate for this workload (1G, 20M, etc.). The workload receives at least this amount of memory. Note that the workload is not scheduled unless the system can guarantee this amount of memory to the workload
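For example, a policy could default the CPU fields listed above (a sketch; the values are illustrative):
defaults:
  compute:
    cpuCoreRequest: 0.5
    cpuCoreLimit: 1
    cpuMemoryRequest: 1G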
dataVolume
Set of data volumes to use in the workload. Each data volume is mapped to a file-system mount point within the container running the workload.
Workspace
Standard training
Distributed training
hostPath
Maps a folder to a file-system mount point within the container running the workload.
Workspace
Standard training
Distributed training
git
Details of the git repository and items mapped to it.
Boolean
A binary value that can be either True or False.
Supported rules: canEdit, required. Example: true/false
String
A sequence of characters used to represent text. It can include letters, numbers, symbols, and spaces.
Supported rules: canEdit, required, options. Example: abc
Itemized
An ordered collection of items (objects), which can be of different types (all items in the list are of the same type). For further information see the chapter below the table.
Supported rules: canAdd, locked.
Effective extendedResources instances from the worked example above:
default/cpu - Policy defaults - quantity 5 - the default of this instance in the policy defaults section
added/cpu - Submission request - quantity 3 - the default of the quantity attribute from the attributes section
added/memory - Submission request - quantity 5M
canAdd
Whether the submission request can add items to an itemized field other than those listed in the policy defaults for this field.
locked
Set of items that the workload is unable to modify or exclude. In this example, a workload policy default is given to HOME and USER, which the submission request cannot modify or exclude from the workload.
canEdit
Whether the submission request can modify the policy default for this field. In this example, it is assumed that the policy has a default for imagePullPolicy; because canEdit is set to false, submission requests cannot alter this default.
required
When set to true, the workload must have a value for this field. The value can be obtained from the policy defaults. If no value is specified in the policy defaults, a value must be specified for this field in the submission request.
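A sketch combining these rules; the HOME and USER instance names refer to the hostPath instances discussed above, and the imagePullPolicy and image fields are the ones described earlier in this article:
rules:
  storage:
    hostPath:
      instances:
        canAdd: false
        locked:
          - HOME
          - USER
  imagePullPolicy:
    canEdit: false
  image:
    required: true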
Effective workload values from the defaults example above:
Image - Ubuntu - Submission request
ImagePullPolicy - Always - Policy defaults
security.runAsNonRoot - true - Policy defaults
security.runAsUid - 501 - Submission request
The following example snippets are referenced throughout this article.

Workload spec with itemized extendedResources:
spec:
  image: ubuntu
  compute:
    extendedResources:
      - resource: added/cpu
        quantity: 10
      - resource: added/memory
        quantity: 20M

Policy defaults and rules for extendedResources:
defaults:
  compute:
    extendedResources:
      instances:
        - resource: default/cpu
          quantity: 5
        - resource: default/memory
          quantity: 4M
      attributes:
        quantity: 3
rules:
  compute:
    extendedResources:
      instances:
        locked:
          - default/cpu
      attributes:
        quantity:
          required: true

Workload submission request that excludes a default instance:
spec:
  image: ubuntu
  compute:
    extendedResources:
      - resource: default/memory
        exclude: true
      - resource: added/cpu
      - resource: added/memory
        quantity: 5M

canAdd rule for hostPath instances:
storage:
  hostPath:
    instances:
      canAdd: false

locked rule for hostPath instances:
storage:
  hostPath:
    instances:
      locked:
        - HOME
        - USER

canEdit rule:
imagePullPolicy:
  canEdit: false

required rule:
image:
  required: true

min rule:
compute:
  gpuDevicesRequest:
    min: 3

max rule:
compute:
  gpuMemoryRequest:
    max: 2G

min, max and step rules:
compute:
  cpuCoreRequest:
    min: 1
    max: 7
    step: 2

options rule:
image:
  options:
    - value: image-1
    - value: image-2

defaultFrom rule:
cpuCoreRequest:
  defaultFrom:
    field: compute.cpuCoreLimit
    factor: 0.5

Rules section:
rules:
  compute:
    gpuDevicesRequest:
      max: 8
  security:
    runAsUid:
      min: 500

Defaults section:
defaults:
  imagePullPolicy: Always
  security:
    runAsNonRoot: true
    runAsUid: 500

Defaults combined with rules:
defaults:
  imagePullPolicy: Always
  security:
    runAsNonRoot: true
    runAsUid: 500
rules:
  security:
    runAsUid:
      canEdit: false

Imposed assets:
defaults: null
rules: null
imposedAssets:
  - f12c965b-44e9-4ff6-8b43-01d8f9e630cc

Defaults for readiness probes:
defaults:
  probes:
    readiness:
      initialDelaySeconds: 2

Defaults for hostPath storage:
defaults:
  storage:
    hostPath:
      instances:
        - path: h3-path-1
          mountPath: h3-mount-1
        - path: h3-path-2
          mountPath: h3-mount-2
      attributes:
        readOnly: true

Defaults for git storage:
defaults:
  storage:
    git:
      attributes:
        repository: https://runai.public.github.com
      instances:
        - branch: "master"
          path: /container/my-repository
          passwordSecret: my-password-secret

Defaults for PVC storage:
defaults:
  storage:
    pvc:
      instances:
        - claimName: pvc-staging-researcher1-home
          existingPvc: true
          path: /myhome
          readOnly: false
          claimInfo:
            accessModes:
              readWriteMany: true

Defaults and rules for NFS storage:
defaults:
  storage:
    nfs:
      instances:
        - path: nfs-path
          readOnly: true
          server: nfs-server
          mountPath: nfs-mount
rules:
  storage:
    nfs:
      instances:
        canAdd: false

Defaults for S3 storage:
defaults:
  storage:
    s3:
      instances:
        - bucket: bucket-opt-1
          path: /s3/path
          accessKeySecret: s3-access-key
          secretKeyOfAccessKeyId: s3-secret-id
          secretKeyOfSecretKey: s3-secret-key
      attributes:
        url: https://amazonaws.s3.com
failureThreshold
When a probe fails, the number of times to try before giving up
mountPropagation
Share this volume mount with other containers. If set to HostToContainer, this volume mount receives all subsequent mounts that are mounted to this volume or any of its subdirectories. In case of multiple hostPath entries, this field should have the same value for all of them. Enum: "None" "HostToContainer"
username
If secretName is provided, this field should contain the key, within the provided Kubernetes secret, which holds the value of your git username. Otherwise, this field should specify your git username in plain text (example: myuser).
readWriteOnce
Requesting claim that can be mounted in read/write mode to exactly 1 host. If none of the modes are specified, the default is readWriteOnce.
size
Requested size for the PVC. Mandatory when existing PVC is false
storageClass
Storage class name to associate with the PVC. This parameter may be omitted if there is a single storage class in the system, or you are using the default storage class. Further details at Kubernetes storage classes.
readOnlyMany
Requesting claim that can be mounted in read-only mode to many hosts
readWriteMany
Requesting claim that can be mounted in read/write mode to many hosts