Deployment
Preparations
Before installing NVIDIA Run:ai, make sure you have reviewed the Preparations section and completed all tasks indicated in the Pre-installation checklist.
BCM Version
The instructions in this document are specific to BCM 11, with a minimum required version of 11.25.08.
Deploy Using the Wizard
Access the active BCM head node via ssh:
ssh root@<IP address of BCM head node>
Verify the BCM version:
cm-package-release-info -f cm-setup,cmdaemon
Name     Version  Release(s)
-------- -------- ------------
cm-setup 123245   11.25.08
cmdaemon 163415   11.25.08
Create the following files in the /cm/shared/runai/ directory, populating each respectively from the linked content:
If on DGX GB200, add these:
If on DGX B200, add these:
Verify that all files from the Preparations section and the step above have been created and are present:
root@bcm11-headnode:~# ls -1 /cm/shared/runai/*
credential.jwt
netop-values.yaml
nic-cluster-policy.yaml
combined-ippools-gb200.yaml
combined-sriovibnet-gb200.yaml
dra-test-gb200.yaml
ib-test-gb200.yaml
full-chain.pem
private.key
ca.crt # optional
Run the following command to initiate deployment via an interactive command-line assistant:
cm-kubernetes-setup
Select Deploy Kubernetes installation wizard and click Ok to proceed. If cm-kubernetes-setup is being run from GB200, refer to the second screenshot:


Select the relevant Kubernetes version. This guide is written for Base Command Manager 11.25.08 and is based on and requires Kubernetes 1.32. Click Ok to proceed:

The next step asks whether a Docker Hub registry mirror is available. A local registry mirror is recommended when one is available. For the purposes of this guide, leave the default value (blank) and click Ok to proceed:

Insert values for the new Kubernetes cluster that NVIDIA Run:ai will be installed into. Click Ok to proceed:
The Kubernetes cluster name should be a short, unique name that can be used to distinguish between multiple clusters (e.g. k8s-user).
The k8s-user.local value for Kubernetes domain name is the default value for internal (within the Kubernetes cluster) name resolution and service discovery. It should be unique to distinguish it from the NMC cluster on DGX GB200 and later SuperPODs. Common practice is to avoid using the same domain for the internal Kubernetes domain name and the externally referenceable FQDN, to avoid potential name resolution inconsistencies.
The Kubernetes external FQDN field refers to the domain name that the Kubernetes API Server will be proxied at and will be automatically populated by BCM. If a valid name record (FQDN) for the BCM head node has been established beforehand, it should be entered here. Please see the reference architecture section of the BCM Containerization Manual for details on how this is implemented via an NGINX proxy.
The Service network base address, Service network netmask bits, Pod network base address, & Pod network netmask bits fields provide CIDR ranges for Kubernetes service and pod networks. These will be pre-populated (taking care to avoid overlapping ranges from networks known to BCM) from private, non-routable ranges.

The next step asks about exposing the Kubernetes API server to the external network. Select no and click Ok to proceed:

The preferred internal network is used for Kubernetes communication between control plane and worker nodes. Select internalnet for the preferred internal network and click Ok to proceed:

Select 3 or more Kubernetes master nodes. These should be the same nodes assigned to the control plane category. The screenshot below is for illustration only - the correct category should be
k8s-system-user. See the BCM Node Categories section for more information. Click Ok to proceed:

Select the worker node categories to operate as the Kubernetes worker nodes. The screenshot below is for illustration only - the correct categories should be either dgx-gb200-k8s or dgx-b200-k8s, and k8s-system-user. See the BCM Node Categories section for more information. Click Ok to proceed:

Skip the selection of individual Kubernetes worker nodes (the category selected in the previous step will be used instead). Click Ok to proceed:

Select nodes for deploying etcd on. Make sure to select the same three nodes as the Kubernetes control plane nodes (Step 13). Click Ok to proceed:

Leave the API server proxy port and etcd spool directory values at their prepopulated values (do not modify them). Click Ok to proceed:

Select Calico as the Kubernetes network plugin. Click Ok to proceed:

Select no and click Ok to proceed:

The components selected in this screen represent those required by NVIDIA Run:ai for a self-hosted installation. Select the operator and NVIDIA Run:ai self-hosted options as depicted below. Click Ok to proceed:
NVIDIA GPU Operator
Ingress NGINX Controller
Knative Operator
KubeFlow Training operator
Kubernetes Metrics Server
Kubernetes State Metrics
LeaderWorkerSet operator
MetalLB
Network Operator
Prometheus Adapter
Prometheus Operator Stack
Run:ai (self-hosted)

Provide the NVIDIA Run:ai configuration with the below and click Ok to proceed:
Run:ai Registry Credentials - Enter the path to a file containing the base64-encoded NVIDIA token. Alternatively, the base64-encoded value can be pasted in directly (one way to prepare and sanity-check these inputs is sketched after this list).
Run:ai Control Plane Domain Name (FQDN) - Enter the Run:ai control plane’s fully qualified domain name (e.g., runai.example.com). This value should be different from the FQDN entered on the first “Insert basic values” Kubernetes setup screen in Step 10. It should be the name that was used when creating the certificates (and should not be the same as the BCM head node hostname).
Local CA Cert Path (.crt or .pem) - Path to the root CA certificate file if you are using a local CA–issued certificate (common in testing or internal environments). It is optional if using a certificate from a public CA.
Domain Cert Path (.crt/.pem) - Path to the full-chain certificate for your domain (the domain’s leaf certificate followed by any intermediate certificates).
Domain Cert Key Path (.key) - Path to the private key that matches the domain certificate.
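The exact inputs depend on the environment, but the following is a minimal sketch of one way to prepare the registry credential file and sanity-check the certificate files referenced above before running the wizard. It assumes the raw NVIDIA token is stored in a plain-text file named token.txt (a hypothetical name) and uses the file names from the Preparations section:
# Sketch only - token.txt is a hypothetical file holding the raw NVIDIA token
base64 -w0 token.txt > /cm/shared/runai/credential.jwt
# Confirm the domain certificate and private key match (for an RSA key, the two digests should be identical)
openssl x509 -noout -modulus -in /cm/shared/runai/full-chain.pem | openssl md5
openssl rsa  -noout -modulus -in /cm/shared/runai/private.key    | openssl md5
# Confirm the certificate covers the NVIDIA Run:ai control plane FQDN
openssl x509 -noout -subject -ext subjectAltName -in /cm/shared/runai/full-chain.pem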

Select yes to install NVIDIA Run:ai components. Click Ok to proceed:

Select the k8s-system-user node category for the NVIDIA Run:ai control plane nodes and click Ok to proceed:

Select the required NVIDIA GPU Operator version (v25.3.2). Click Ok to proceed:

Select the required Network Operator version (v25.4.0). Click Ok to proceed:

Select the required NVIDIA Run:ai version. Click Ok to proceed:

When prompted to supply a Custom YAML config for the GPU Operator, leave the default (blank) and click Ok to proceed:

Configure the NVIDIA GPU Operator by selecting the following configuration parameters. Click Ok to proceed:

Supply the path to the netop-values.yaml file that was created earlier. Click Ok to proceed:

Click Ok on the MetalLB IP address pools page; the requirements for NVIDIA Run:ai will be set up automatically:

Specify the ingress IP addresses prepared as documented in the Pre-installation checklist section. The mention of MetalLB here is an indication that these will be set up as part of a load balanced pool and assigned to each respective ingress. Click Ok to proceed:

When asked whether to expose the Kubernetes Ingress on the default HTTPS port, select no. Click Ok to proceed:

Leave the node ports for the Ingress NGINX Controller at the pre-populated values (do not modify them) and click Ok to proceed:

Select the serving option in the Knative Operator components dialog. Click Ok to proceed:

If deploying onto an A100 or H100 only cluster, select yes. If deploying on any other cluster configuration select no. Click Ok to proceed:

If yes was selected for the previous step, select the appropriate option for the cluster and click Ok to proceed. If no was selected for the previous step, this page will not appear:

Select yes to install the Permission Manager. Click Ok to proceed:

Select Local path as the Kubernetes StorageClass. Ensure that both enabled and default are specified. Click Ok to proceed:

Configure the CSI Provider (local-path-provisioner) to employ shared storage (/cm/shared/apps/kubernetes/k8s-user/var/volumes as a default). Click Ok to proceed:
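Once the deployment finishes, the storage configuration can be spot-checked. A quick sketch, assuming the class created by local-path-provisioner is named local-path:
# Run after the wizard completes - verify the default StorageClass exists
kubectl get storageclass
# Expect local-path to be listed and marked (default)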

Select yes to enable local persistent storage for Grafana. Click Ok to proceed:

Select Save config, set an accessible location for the config file (for example: /cm/shared/runai/cm-kubernetes-setup.conf) alongside the rest of the config files, and then click Ok. Select Exit and Ok to complete the wizard and return to the terminal:

The deployment process may require an extended period (60+ minutes). To prevent interruptions, failures, or network outages from disrupting the deployment, it’s recommended to run it from a persistent terminal session such as tmux or screen.
# Start a new screen session named "install_runai"
# This allows detach/reattaching safely during the installation
screen -S install_runai
# Inside the screen session: run the cluster setup using the configuration file
cm-kubernetes-setup -c /cm/shared/runai/cm-kubernetes-setup.conf
Connect to NVIDIA Run:ai User Interface
Open your browser and go to:
https://<DOMAIN>
Log in using the default credentials:
User: test@run.ai
Password: Abcd!234
You will be prompted to change the password.

Post-wizard Deployment Steps
After the BCM installation assistant completes, additional steps are required.
If multiple Kubernetes clusters are configured in this instance of BCM, load the correct Kubernetes module before running all post-wizard commands:
module unload kubernetes
module load kubernetes/k8s-user
MPI Operator
Install the MPI Operator v0.6.0 or later by running the following command:
kubectl apply --server-side -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.6.0/deploy/v2beta1/mpi-operator.yaml --force-conflicts
# Validate MPIJob CRD is installed
kubectl get crd mpijobs.kubeflow.org
> NAME                   CREATED AT
> mpijobs.kubeflow.org   2025-09-10T20:57:42Z
NVIDIA Dynamic Resource Allocation (DRA) Driver
The NVIDIA DRA Driver for GPUs extends how NVIDIA GPUs are consumed within Kubernetes. This is required to enable secure Internode Memory Exchange (IMEX) on Multi-Node NVLink (MNNVL) systems (e.g. GB200 and similar) for Kubernetes workloads and should be included with all NVIDIA GPU systems.
Install using Helm:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && helm repo update
helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
  --version="25.3.0" \
  --create-namespace \
  --namespace nvidia-dra-driver-gpu \
  --set nvidiaDriverRoot=/ \
  --set resources.gpus.enabled=false
GB200 only - Create a file in
/cm/shared/runai from dra-test-gb200.yaml and update the clique ID to match the clique ID from the cluster. Note that the following test addresses single rack NVL72 clusters. For multi-rack systems, you’ll need to adjust podAffinity (e.g. topologyKey: nvidia.com/gpu.clique).
kubectl describe nodes | grep nvidia.com/gpu.clique=
> nvidia.com/gpu.clique=f84d133c-bbc9-55fd-b1ff-ffffc7ef6783.23322
# For DGX GB200 systems
kubectl apply -f /cm/shared/runai/dra-test-gb200.yaml
GB200 only - Validate the test successfully completed and inspect the logs of the launcher:
# For GB200
kubectl get pods
> NAME                              READY   STATUS      RESTARTS   AGE
> nvbandwidth-test-launcher-snb82   0/1     Completed   0          72s
kubectl logs nvbandwidth-test-launcher-snb82
GB200 only - Cleanup test:
# For GB200
kubectl delete -f /cm/shared/runai/dra-test-gb200.yaml
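Regardless of platform, it’s worth confirming that the DRA driver components are healthy before enabling the feature in NVIDIA Run:ai. A minimal check, assuming the Helm release was installed into the nvidia-dra-driver-gpu namespace as shown above and that the cluster’s DRA APIs are enabled:
# Verify the DRA driver pods are running
kubectl get pods -n nvidia-dra-driver-gpu
# List the DeviceClasses registered by the driver (names vary by driver version)
kubectl get deviceclasses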
Enable DRA and Multi-Node NVLink
The default NVIDIA Run:ai configuration does not expose DRA features. After installing the DRA components, this can be enabled by modifying the runaiconfig in the cluster. See Advanced cluster configurations for more details:
# Edit the runaiconfig object to toggle GPUNetworkAccelerationEnable
# to true and adjust tolerations for the Kubernetes control plane
kubectl patch runaiconfig runai \
-n runai \
--context=kubernetes-admin@k8s-user \
--type='merge' \
-p '{
"spec": {
"workload-controller": {
"GPUNetworkAccelerationEnabled": true
},
"global": {
"tolerations": [
{
"key": "node-role.kubernetes.io/control-plane",
"operator": "Exists",
"effect": "NoSchedule"
}
]
}
}
}'
Instructions for validating the change and reverting if necessary:
# Validate the patch was applied successfully
kubectl get runaiconfig runai \
-n runai \
--context=kubernetes-admin@k8s-user \
-o custom-columns=GPUAccelEnabled:.spec.workload-controller.GPUNetworkAccelerationEnabled,Tolerations:.spec.global.tolerations
# To revert the runaiconfig object change
kubectl patch runaiconfig runai -n runai --type='merge' -p '{
"spec": {
"workload-controller": {
"GPUNetworkAccelerationEnabled": false
},
"global": {
"tolerations": null
}
}
}'
Configure the Network Operator for B200 and GB200 Systems
In version 11.25.08 of the BCM installation assistant, the Network Operator requires additional configuration on DGX B200 and GB200 SuperPOD / BasePOD systems. While the operator is installed in a preceding step, it does not automatically initialize or configure SR-IOV and secondary network plugins.
The following CRD resources have to be created in the exact order shown below:
SR-IOV Network Policies for each NVIDIA InfiniBand NIC
An nvIPAM IP address pool
SR-IOV InfiniBand networks
Create SR-IOV network node policies using the nic-cluster-policy.yaml that was created in an earlier step:
kubectl apply -f /cm/shared/runai/nic-cluster-policy.yaml
Create an IPAM IP Pool using the respective combined-ippools-gb200.yaml or combined-ippools-b200.yaml that was created in an earlier step:
kubectl apply -f /cm/shared/runai/combined-ippools-gb200.yaml
Create the SR-IOV IB networks using the respective combined-sriovibnet-gb200.yaml or combined-sriovibnet-b200.yaml that was created in an earlier step:
kubectl apply -f /cm/shared/runai/combined-sriovibnet-gb200.yaml
Create the SR-IOV node pool configuration using the sriov-node-pool-config.yaml:
kubectl apply -f /cm/shared/runai/sriov-node-pool-config.yaml
Validate by describing one of the DGX nodes and checking for SRIOV devices:
# Describe a DGX worker node
kubectl describe node <dgx-node> --context=kubernetes-admin@k8s-user | grep sriovib
# Example output
nvidia.com/sriovib_resource_a: 16
nvidia.com/sriovib_resource_b: 16
nvidia.com/sriovib_resource_c: 16
nvidia.com/sriovib_resource_d: 16
nvidia.com/sriovib_resource_a: 16
nvidia.com/sriovib_resource_b: 16
nvidia.com/sriovib_resource_c: 16
nvidia.com/sriovib_resource_d: 16
nvidia.com/sriovib_resource_b
# Check the state of SR-IOV Nodes
kubectl get -n network-operator sriovnetworknodestate --context=kubernetes-admin@k8s-user
# Example Output
NAME         SYNC STATUS
<dgx_node>   Succeeded
Validate by running the DGX SuperPOD platform specific tests:
For GB200 - ib-test-gb200.yaml:
# DGX GB200
kubectl apply -f /cm/shared/runai/ib-test-gb200.yaml -n default
MPI version: Open MPI v4.1.4, package: Debian OpenMPI, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022
CUDA Runtime Version: 12080
CUDA Driver Version: 12080
Driver Version: 570.172.08
Process 0 (nvbandwidth-test-worker-0): device 0: NVIDIA GB200 (00000008:01:00)
Process 1 (nvbandwidth-test-worker-0): device 1: NVIDIA GB200 (00000009:01:00)
Process 2 (nvbandwidth-test-worker-0): device 2: NVIDIA GB200 (00000018:01:00)
Process 3 (nvbandwidth-test-worker-0): device 3: NVIDIA GB200 (00000019:01:00)
Process 4 (nvbandwidth-test-worker-1): device 0: NVIDIA GB200 (00000008:01:00)
Process 5 (nvbandwidth-test-worker-1): device 1: NVIDIA GB200 (00000009:01:00)
Process 6 (nvbandwidth-test-worker-1): device 2: NVIDIA GB200 (00000018:01:00)
Process 7 (nvbandwidth-test-worker-1): device 3: NVIDIA GB200 (00000019:01:00)
Running host_to_device_memcpy_ce.
memcpy CE CPU(row) -> GPU(column) bandwidth (GB/s)
           0         1         2         3
0      85.59     95.33    200.73    191.27
SUM host_to_device_memcpy_ce 572.93
For B200 - ib-test-b200.yaml:
# DGX B200
kubectl apply -f /cm/shared/runai/ib-test-b200.yaml -n default
kubectl logs -n default nccl-test-launcher-hdm54
Warning: Permanently added '[nccl-test-worker-0.nccl-test.default.svc]:2222' (ED25519) to the list of known hosts.
Warning: Permanently added '[nccl-test-worker-1.nccl-test.default.svc]:2222' (ED25519) to the list of known hosts.
# nThread 1 nGpus 1 minBytes 16 maxBytes 17179869184 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid     46 on nccl-test-worker-0 device  0 [0x1b] NVIDIA B200
#  Rank  1 Group  0 Pid     47 on nccl-test-worker-0 device  1 [0x43] NVIDIA B200
#  Rank  2 Group  0 Pid     48 on nccl-test-worker-0 device  2 [0x52] NVIDIA B200
#  Rank  3 Group  0 Pid     49 on nccl-test-worker-0 device  3 [0x61] NVIDIA B200
#  Rank  4 Group  0 Pid     50 on nccl-test-worker-0 device  4 [0x9d] NVIDIA B200
#  Rank  5 Group  0 Pid     52 on nccl-test-worker-0 device  5 [0xc3] NVIDIA B200
#  Rank  6 Group  0 Pid     55 on nccl-test-worker-0 device  6 [0xd1] NVIDIA B200
#  Rank  7 Group  0 Pid     59 on nccl-test-worker-0 device  7 [0xdf] NVIDIA B200
#  Rank  8 Group  0 Pid     46 on nccl-test-worker-1 device  0 [0x1b] NVIDIA B200
#  Rank  9 Group  0 Pid     47 on nccl-test-worker-1 device  1 [0x43] NVIDIA B200
#  Rank 10 Group  0 Pid     48 on nccl-test-worker-1 device  2 [0x52] NVIDIA B200
#  Rank 11 Group  0 Pid     49 on nccl-test-worker-1 device  3 [0x61] NVIDIA B200
#  Rank 12 Group  0 Pid     50 on nccl-test-worker-1 device  4 [0x9d] NVIDIA B200
#  Rank 13 Group  0 Pid     51 on nccl-test-worker-1 device  5 [0xc3] NVIDIA B200
#  Rank 14 Group  0 Pid     54 on nccl-test-worker-1 device  6 [0xd1] NVIDIA B200
#  Rank 15 Group  0 Pid     58 on nccl-test-worker-1 device  7 [0xdf] NVIDIA B200
#
#                                                 out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
          16             4     float     sum      -1    38.60    0.00    0.00      0    49.18    0.00    0.00      0
          32             8     float     sum      -1    42.23    0.00    0.00      0    50.96    0.00    0.00      0
          64            16     float     sum      -1    47.44    0.00    0.00      0    42.18    0.00    0.00      0
         128            32     float     sum      -1    39.51    0.00    0.01      0    42.78    0.00    0.01      0
         256            64     float     sum      -1    40.16    0.01    0.01      0    43.35    0.01    0.01      0
         512           128     float     sum      -1    39.22    0.01    0.02      0    44.53    0.01    0.02      0
        1024           256     float     sum      -1    42.81    0.02    0.04      0    44.47    0.02    0.04      0
        2048           512     float     sum      -1    40.63    0.05    0.09      0    52.70    0.04    0.07      0
        4096          1024     float     sum      -1    46.76    0.09    0.16      0    52.63    0.08    0.15      0
        8192          2048     float     sum      -1    47.22    0.17    0.33      0    53.69    0.15    0.29      0
       16384          4096     float     sum      -1    49.02    0.33    0.63      0    50.96    0.32    0.60      0
       32768          8192     float     sum      -1    54.24    0.60    1.13      0    53.88    0.61    1.14      0
       65536         16384     float     sum      -1    59.05    1.11    2.08      0    59.53    1.10    2.06      0
      131072         32768     float     sum      -1    62.04    2.11    3.96      0    63.99    2.05    3.84      0
      262144         65536     float     sum      -1    106.4    2.46    4.62      0    103.1    2.54    4.77      0
      524288        131072     float     sum      -1    107.5    4.88    9.15      0    102.8    5.10    9.56      0
     1048576        262144     float     sum      -1    108.8    9.64   18.07      0    106.6    9.83   18.44      0
     2097152        524288     float     sum      -1    112.7   18.60   34.88      0    106.6   19.67   36.88      0
     4194304       1048576     float     sum      -1    118.2   35.49   66.54      0    116.6   35.97   67.44      0
     8388608       2097152     float     sum      -1    150.2   55.85  104.72      0    153.8   54.54  102.26      0
    16777216       4194304     float     sum      -1    187.5   89.46  167.73      0    188.1   89.19  167.23      0
    33554432       8388608     float     sum      -1    250.5  133.97  251.20      0    251.6  133.35  250.02      0
    67108864      16777216     float     sum      -1    395.9  169.52  317.86      0    395.1  169.87  318.50      0
   134217728      33554432     float     sum      -1    618.9  216.85  406.59      0    620.8  216.20  405.37      0
   268435456      67108864     float     sum      -1   1073.4  250.08  468.90      0   1074.2  249.89  468.54      0
   536870912     134217728     float     sum      -1   1977.2  271.53  509.13      0   1976.0  271.69  509.42      0
  1073741824     268435456     float     sum      -1   3713.5  289.14  542.15      0   3710.3  289.40  542.62      0
  2147483648     536870912     float     sum      -1   7245.1  296.40  555.76      0   7226.3  297.18  557.20      0
  4294967296    1073741824     float     sum      -1    14049  305.71  573.20      0    13939  308.13  577.75      0
  8589934592    2147483648     float     sum      -1    27360  313.97  588.68      0    27292  314.74  590.13      0
 17179869184    4294967296     float     sum      -1    53941  318.49  597.17      0    53953  318.43  597.05      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 168.649
# Clean up after validating via ib-test-b200.yaml
kubectl delete -f /cm/shared/runai/ib-test-b200.yaml -n default
Apply Security Policies - Optional
By default, the BCM Kubernetes deployment has permissive security policies to ease use in development environments. For production clusters or secure environments, it’s recommended to take additional steps to harden the cluster. This includes steps such as configuring the permission manager, applying Kyverno policies, and applying Calico policies.
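As an illustration of the kind of hardening involved, the following is a minimal sketch of a Kyverno ClusterPolicy that audits (rather than blocks) privileged containers. The policy name and scope are placeholders; production policies should come from the guidance referenced below.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: audit-privileged-containers   # placeholder name
spec:
  validationFailureAction: Audit      # switch to Enforce only after validating the impact
  background: true
  rules:
    - name: disallow-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            containers:
              - =(securityContext):
                  =(privileged): "false"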
For deployments of NVIDIA Run:ai as a part of NVIDIA Mission Control, please reach out to your NVIDIA representative for the latest example configurations and suggested policies. The Mission Control software installation guide’s Kubernetes Security Hardening documentation provides guidance for application.
Create Node Pools - Optional
See Node pools to create and manage groups of nodes (either by predefined node label or administrator-defined node labels). This optional configuration step can be used for advanced deployment scenarios to allocate different resources across teams or projects.
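Node pools are defined on top of node labels, so grouping nodes typically starts by labeling them. A hedged example, where the node names, label key, and value are placeholders chosen for illustration:
# Label nodes so they can be grouped into a node pool (names and label are examples)
kubectl label nodes dgx-001 dgx-002 nodepool=gb200-rack-b06
# Confirm the label is present
kubectl get nodes -L nodepool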
Add Additional Users - Optional
See Users for steps on adding additional users beyond the initial test@run.ai account or connecting SSO.
Install the NVIDIA Run:ai Command-line - Optional
To obtain the command line binary, see the Install and configure CLI section.
Test the command line tool installation
Validate the installation by running the following command:
runai version
Set the Control Plane URL
The following step is required for Windows users only. Linux and Mac clients are configured via the installation script.
Run the following command (substituting the NVIDIA Run:ai control plane FQDN value specified in previous steps) to create the config.json file in the default path:
runai config set --cp-url runai.example.com
Alternatively, the Base Command Manager installation assistant can generate this config with the following steps:


Validate NVIDIA Run:ai
To validate the installation, please refer to the quick start guides for deploying single-GPU training jobs, multi-node training jobs, single-GPU inference jobs, and multi-GPU inference jobs. Certain NGC workloads may require adding NGC API keys and docker credentials into the cluster.
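For NGC-hosted images, the registry credential is typically provided as a Kubernetes pull secret created from an NGC API key. A minimal sketch, assuming the key is exported as NGC_API_KEY and the target namespace follows the runai-<project> convention (runai-training here is an example):
# Create a docker-registry pull secret for nvcr.io (namespace is an example)
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="${NGC_API_KEY}" \
  -n runai-training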
Validate the ingress IP for NVIDIA Run:ai inference is configured; EXTERNAL-IP should have the value configured in the prior MetalLB steps:
kubectl get svc -n knative-serving kourier -o wide
NAME      TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
kourier   LoadBalancer   x.x.x.x      10.1.1.26     80:31038/TCP,443:30783/TCP   8h
Validate distributed training workloads, see Run your first distributed training workload:
# Example command
runai training mpi submit distributed-training \
  -g 4 \
  -p training \
  --node-pools nvl72rackb06 \
  -i ghcr.io/nvidia/k8s-samples:nvbandwidth-v0.7-8d103163 \
  --workers 2 \
  --slots-per-worker 4 \
  --run-as-uid 1000 \
  --ssh-auth-mount-path /home/mpiuser/.ssh \
  --clean-pod-policy Running \
  --master-command mpirun \
  --master-args "--bind-to core --map-by ppr:4:node -np 8 --report-bindings -q nvbandwidth -t multinode_device_to_device_memcpy_read_ce" \
  --command -- /usr/sbin/sshd -De -f /home/mpiuser/.sshd_config
Validate distributed inference workloads, see Run your first custom inference workload:
{ "name": "distributed-vllm", "projectId": "4501034", "clusterId": "c7cd67df-c309-45ac-9056-5a04d074617d", "spec": { "workers": 1, "replicas": 1, "servingPort": { "port": 8000, "exposedUrl": "http://vllm.infernece-calorado.runailabs-ps.com/" }, "leader": { "image": "vllm/vllm-openai:latest-aarch64", "command": "sh -c \"bash /vllm-workspace/examples/online_serving/multi-node-serving.sh leader --ray_cluster_size=$(LWS_GROUP_SIZE); python3 -m vllm.entrypoints.openai.api_server --port 8000 --model meta-llama/Meta-Llama-3.1-8B-Instruct --tensor-parallel-size 4 --pipeline_parallel_size 2\"", "environmentVariables": [ { "name": "NCCL_MNNVL_ENABLE", "value": "0" }, { "name": "HF_TOKEN", "value": "hf_xxx } ], "compute": { "largeShmRequest": true, "gpuDevicesRequest": 4 } }, "worker": { "image": "vllm/vllm-openai:latest-aarch64", "command": "sh -c \"bash /vllm-workspace/examples/online_serving/multi-node-serving.sh worker --ray_address=$(LWS_LEADER_ADDRESS)\"", "environmentVariables": [ { "name": "NCCL_MNNVL_ENABLE", "value": "0" }, { "name": "HF_TOKEN", "value": "<HF_TOKEN>" } ], "compute": { "largeShmRequest": true, "gpuDevicesRequest": 4 } } } }
Troubleshooting Common Issues