Quick Start for Infrastructure Administrators

This guide is for infrastructure administrators responsible for installing, configuring, and operating NVIDIA Run:ai.

The quick start walks through the initial infrastructure setup lifecycle, including platform installation and the essential post-installation configuration required to prepare the cluster for onboarding and workload execution. It focuses on infrastructure-level concerns such as cluster readiness, control plane behavior, security boundaries, and operational stability.

Prerequisites

Before you begin, ensure that:

  • A Kubernetes cluster is up and running.

  • Helm 3.14 or later is installed.

  • You have kubectl access to the cluster with admin-level permissions.
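
The prerequisites above can be checked from a workstation before starting the installation. The following is a minimal sketch, not an official NVIDIA Run:ai tool: the function names (`parse_helm_version`, `helm_version_ok`, `check_prerequisites`) are hypothetical, and it assumes that `helm` and `kubectl` are on the PATH and that wildcard `kubectl auth can-i` access is an acceptable stand-in for "admin-level permissions".

```python
import re
import shutil
import subprocess

MIN_HELM = (3, 14)  # minimum Helm version stated in the prerequisites


def parse_helm_version(text: str) -> tuple[int, int]:
    """Extract (major, minor) from output such as 'v3.14.2+gc309b6f'."""
    match = re.search(r"v?(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized Helm version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def helm_version_ok(text: str, minimum: tuple[int, int] = MIN_HELM) -> bool:
    """Compare numerically, so that 3.9 is correctly treated as older than 3.14."""
    return parse_helm_version(text) >= minimum


def _run(cmd: list[str]) -> tuple[int, str]:
    """Run a command and return (exit code, stripped stdout) without raising."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def check_prerequisites() -> dict[str, bool]:
    """Return a pass/fail result for each prerequisite (assumed checks, see lead-in)."""
    results = {"cluster_reachable": False, "helm_version": False, "admin_access": False}

    if shutil.which("kubectl"):
        # The cluster is considered up if the API server answers.
        code, _ = _run(["kubectl", "cluster-info"])
        results["cluster_reachable"] = code == 0
        # Assumption: "admin-level" means all verbs on all resources in all namespaces.
        code, out = _run(["kubectl", "auth", "can-i", "*", "*", "--all-namespaces"])
        results["admin_access"] = code == 0 and out == "yes"

    if shutil.which("helm"):
        code, out = _run(["helm", "version", "--short"])
        results["helm_version"] = code == 0 and helm_version_ok(out)

    return results
```

Calling `check_prerequisites()` returns a dictionary with one boolean per prerequisite; any `False` entry should be resolved before proceeding to installation. The version is compared as a tuple of integers rather than as a string because a plain string comparison would rank 3.9 above 3.14.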

Installation

The platform supports deployment using two primary methods, depending on your environment:

  • Install using Helm - The standard installation method using Helm charts. Provides full control and flexibility over configuration and deployment.

  • Install using Base Command Manager (BCM) - A guided installation method available through NVIDIA Base Command Manager. It simplifies deployment by applying defaults that enable most NVIDIA Run:ai capabilities on NVIDIA DGX SuperPOD systems.

Post Installation Infrastructure Setup

After installing NVIDIA Run:ai, complete the following foundational infrastructure configuration steps to ensure the platform is production-ready and can safely support organizational onboarding and workloads. These steps focus on cluster readiness, control plane behavior, and operational guardrails, rather than day-to-day platform usage:

  • Validate node readiness and assign node roles as required

  • Configure advanced control plane and cluster settings based on your environment requirements

  • Enable required integrations and networking components

  • Apply security and operational best practices

  • Prepare the platform for scale, availability, and ongoing maintenance
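
As an illustration of the first step in the list above, node readiness can be validated by inspecting the standard Kubernetes `Ready` condition on each node. The sketch below is an assumption about how such a check might be scripted, not a documented NVIDIA Run:ai procedure: the function name `not_ready_nodes` and the node names in the sample are hypothetical, and the input is expected to be the output of `kubectl get nodes -o json`. Assigning node roles is not covered here.

```python
import json


def not_ready_nodes(nodes_json: str) -> list[str]:
    """Return the names of nodes whose 'Ready' condition is not 'True'."""
    doc = json.loads(nodes_json)
    failing = []
    for node in doc.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            failing.append(node["metadata"]["name"])
    return failing


# Hypothetical sample in the shape produced by `kubectl get nodes -o json`.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "gpu-node-1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "gpu-node-2"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
        {"metadata": {"name": "cpu-node-1"},
         "status": {"conditions": [{"type": "MemoryPressure", "status": "False"}]}},
    ]
})

print(not_ready_nodes(SAMPLE))  # prints ['gpu-node-2', 'cpu-node-1']
```

A node with no `Ready` condition at all is reported as not ready, which is the conservative choice for a pre-onboarding check. In practice the JSON would come from the cluster, for example by capturing the standard output of `kubectl get nodes -o json`.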

The exact configuration required depends on your environment, scale, and operational model. Detailed procedures and advanced options are documented in the Advanced setup and Infrastructure procedures sections.
