Metrics and Telemetry

Metrics are numeric measurements recorded over time that are emitted from the NVIDIA Run:ai cluster. Telemetry is a numeric measurement recorded in real time at the moment it is emitted from the NVIDIA Run:ai cluster.

Scopes

NVIDIA Run:ai provides a control-plane API that supports and aggregates analytics at the following levels (scopes). A request sketch follows the table below.

| Level | Description |
| --- | --- |
| Cluster | A cluster is a set of node pools and nodes. Cluster metrics aggregate data at the cluster level. In the NVIDIA Run:ai user interface, cluster metrics are shown in the Overview dashboard. |
| Node | Data is aggregated at the node level. |
| Node pool | Data is aggregated at the node pool level. |
| Workload | Data is aggregated at the workload level. For some workload types, such as distributed workloads, these metrics aggregate data from all of the workload's worker pods. |
| Pod | The basic unit of execution. |
| Project | The basic organizational unit. Projects are the tool for implementing resource allocation policies and for segregating different initiatives. |
| Department | A grouping of projects. |
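
The same time-series query pattern is assumed to apply at each scope, with only the endpoint path changing. The sketch below is illustrative only: the base URL, token handling, endpoint paths, and the numberOfSamples parameter are assumptions, not confirmed by this page; check the control-plane API reference for the authoritative values.

```python
# Minimal sketch of querying metrics at different scopes through the
# NVIDIA Run:ai control-plane API. Paths and parameters below are
# assumptions for illustration.
import datetime
import requests

BASE_URL = "https://myorg.run.ai"      # hypothetical tenant URL
TOKEN = "<application-token>"          # placeholder bearer token

end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=1)

params = {
    "metricType": "ALLOCATED_GPU",     # one of the API metric names listed under Supported Metrics below
    "start": start.isoformat(),
    "end": end.isoformat(),
    "numberOfSamples": 20,             # assumed sampling parameter
}

# Hypothetical per-scope metric endpoints; only the path differs per scope.
scoped_paths = {
    "cluster": "/api/v1/clusters/<cluster-id>/metrics",
    "nodepool": "/api/v1/nodepools/<nodepool-id>/metrics",
    "workload": "/api/v1/workloads/<workload-id>/metrics",
}

for scope, path in scoped_paths.items():
    resp = requests.get(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {TOKEN}"},
        params=params,
        timeout=30,
    )
    resp.raise_for_status()
    print(scope, resp.json())
```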

Supported Metrics

| Metric name in API | Applicable API endpoint | Metric name in UI per grid | Applicable UI grid |
| --- | --- | --- | --- |
| ALLOCATED_GPU | | GPU devices (allocated); Allocated GPUs | |
| AVG_WORKLOAD_WAIT_TIME | | | |
| CPU_LIMIT_CORES | | CPU limit | |
| CPU_MEMORY_LIMIT_BYTES | | CPU memory limit | |
| CPU_MEMORY_REQUEST_BYTES | | CPU memory request | |
| CPU_MEMORY_USAGE_BYTES | | CPU memory usage | |
| CPU_MEMORY_UTILIZATION | | CPU memory utilization | |
| CPU_REQUEST_CORES | | CPU request | |
| CPU_USAGE_CORES | | CPU usage | |
| CPU_UTILIZATION | | CPU compute utilization; CPU utilization | |
| GPU_ALLOCATION | | GPU devices (allocated) | |
| GPU_MEMORY_REQUEST_BYTES | | GPU memory request | |
| GPU_MEMORY_USAGE_BYTES | | GPU memory usage | |
| GPU_MEMORY_USAGE_BYTES_PER_GPU | | GPU memory usage per GPU | |
| GPU_MEMORY_UTILIZATION | | GPU memory utilization | |
| GPU_MEMORY_UTILIZATION_PER_GPU | | GPU memory utilization per GPU | |
| GPU_UTILIZATION_PER_GPU | | GPU utilization per GPU | |
| TOTAL_GPU | | GPU devices total; Total GPUs | |
| TOTAL_GPU_NODES | | | |
| GPU_UTILIZATION_DISTRIBUTION | | GPU utilization distribution | |
| UNALLOCATED_GPU | | GPU devices (unallocated); Unallocated GPUs | |
| CPU_QUOTA_MILLICORES | | | |
| CPU_MEMORY_QUOTA_MB | | | |
| CPU_ALLOCATION_MILLICORES | | | |
| CPU_MEMORY_ALLOCATION_MB | | | |
| POD_COUNT | | | |
| RUNNING_POD_COUNT | | | |
| NVLINK_BANDWIDTH_TOTAL | | | |
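
Several of the metric names above can be requested together for a single workload over a time window. The sketch below assumes the workload metrics endpoint accepts repeated metricType parameters and returns a list of measurements with timestamp/value pairs; the path, parameter names, and response shape are assumptions to verify against the API reference.

```python
# Sketch: pull two supported metric types for one workload and print the
# returned time series. Endpoint path and response layout are assumed.
import datetime
import requests

BASE_URL = "https://myorg.run.ai"          # hypothetical tenant URL
WORKLOAD_ID = "<workload-id>"              # placeholder
TOKEN = "<application-token>"              # placeholder bearer token

end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=6)

resp = requests.get(
    f"{BASE_URL}/api/v1/workloads/{WORKLOAD_ID}/metrics",   # assumed path
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={
        # requests encodes a list as repeated metricType=... query parameters
        "metricType": ["GPU_UTILIZATION_PER_GPU", "GPU_MEMORY_USAGE_BYTES"],
        "start": start.isoformat(),
        "end": end.isoformat(),
        "numberOfSamples": 48,
    },
    timeout=30,
)
resp.raise_for_status()

# Assumed response layout: one entry per metric type, each holding a list
# of timestamp/value measurements.
for measurement in resp.json().get("measurements", []):
    print(measurement.get("type"))
    for point in measurement.get("values", []):
        print(" ", point.get("timestamp"), point.get("value"))
```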

GPU Profiling

NVIDIA provides the extended GPU profiling metrics listed below. To enable these metrics, contact NVIDIA Run:ai customer support.

Note

GPU profiling metrics are disabled by default. If they are unavailable, your administrator must enable them under General settings → Analytics → GPU profiling metrics.

| Metric name in API | Applicable API endpoint | Metric name in UI | Applicable UI table |
| --- | --- | --- | --- |
| GPU_FP16_ENGINE_ACTIVITY_PER_GPU | | GPU FP16 engine activity | |
| GPU_FP32_ENGINE_ACTIVITY_PER_GPU | | GPU FP32 engine activity | |
| GPU_FP64_ENGINE_ACTIVITY_PER_GPU | | GPU FP64 engine activity | |
| GPU_GRAPHICS_ENGINE_ACTIVITY_PER_GPU | | Graphics engine activity | |
| GPU_MEMORY_BANDWIDTH_UTILIZATION_PER_GPU | | Memory bandwidth utilization | |
| GPU_NVLINK_RECEIVED_BANDWIDTH_PER_GPU | | NVLink received bandwidth | |
| GPU_NVLINK_TRANSMITTED_BANDWIDTH_PER_GPU | | NVLink transmitted bandwidth | |
| GPU_PCIE_RECEIVED_BANDWIDTH_PER_GPU | | PCIe received bandwidth | |
| GPU_PCIE_TRANSMITTED_BANDWIDTH_PER_GPU | | PCIe transmitted bandwidth | |
| GPU_SM_ACTIVITY_PER_GPU | | GPU SM activity | |
| GPU_SM_OCCUPANCY_PER_GPU | | GPU SM occupancy | |
| GPU_TENSOR_ACTIVITY_PER_GPU | | GPU tensor activity | |
| GPU_OOMKILL_SWAP_OUT_OF_RAM_COUNT_PER_GPU | | OOMKill swap out of RAM count | |
| GPU_OOMKILL_BURST_COUNT_PER_GPU | | OOMKill burst count | |
| GPU_OOMKILL_IDLE_COUNT_PER_GPU | | OOMKill idle count | |
| GPU_SWAP_MEMORY_BYTES_PER_GPU | | GPU swap memory | |

NVIDIA NIM

NVIDIA NIM metrics provide workload-level observability, including key runtime and performance data such as request throughput, latency, and token usage for LLMs. See NIM observability metrics via API for more details.

| Metric name in API | Applicable API endpoint | Metric name in UI | Applicable UI table |
| --- | --- | --- | --- |
| NIM_NUM_REQUESTS_RUNNING | | Request concurrency by status | |
| NIM_NUM_REQUESTS_WAITING | | Request concurrency by status | |
| NIM_NUM_REQUEST_MAX | | Request concurrency by status | |
| NIM_REQUEST_SUCCESS_TOTAL | | Request count by status | |
| NIM_REQUEST_FAILURE_TOTAL | | Request count by status | |
| NIM_GPU_CACHE_USAGE_PERC | | GPU KV cache utilization | |
| NIM_TIME_TO_FIRST_TOKEN_SECONDS | | | |
| NIM_E2E_REQUEST_LATENCY_SECONDS | | | |
| NIM_TIME_TO_FIRST_TOKEN_SECONDS_PERCENTILES | | Time to first token (TTFT) by percentiles | |
| NIM_E2E_REQUEST_LATENCY_SECONDS_PERCENTILES | | End to end request latency by percentiles | |
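
A hypothetical sketch of pulling two of the NIM metric types above for an inference workload. It assumes they are served by the same workload metrics endpoint and parameters used earlier on this page, which is not confirmed here; see NIM observability metrics via API for the authoritative request format.

```python
# Hypothetical sketch: request NIM metric types for an inference workload.
# Endpoint path, parameters, and time range below are placeholders.
import requests

resp = requests.get(
    "https://myorg.run.ai/api/v1/workloads/<workload-id>/metrics",  # assumed path
    headers={"Authorization": "Bearer <application-token>"},
    params={
        "metricType": [
            "NIM_NUM_REQUESTS_RUNNING",
            "NIM_TIME_TO_FIRST_TOKEN_SECONDS_PERCENTILES",
        ],
        "start": "2025-01-01T00:00:00Z",   # placeholder time range
        "end": "2025-01-01T06:00:00Z",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```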

Supported Telemetry

| Metric | Applicable API endpoint | Metric name in UI | Applicable UI table |
| --- | --- | --- | --- |
| WORKLOADS_COUNT | | | |
| ALLOCATED_GPUS | | Allocated GPUs | |
| READY_GPU_NODES | | Ready / Total GPU nodes | |
| READY_GPUS | | Ready / Total GPU devices | |
| TOTAL_GPU_NODES | | Ready / Total GPU nodes | |
| TOTAL_GPUS | | Ready / Total GPU devices | |
| IDLE_ALLOCATED_GPUS | | Idle allocated GPU devices | |
| FREE_GPUS | | Free GPU devices | |
| TOTAL_CPU_CORES | | CPU (Cores) | |
| USED_CPU_CORES | | | |
| ALLOCATED_CPU_CORES | | Allocated CPU cores | |
| TOTAL_GPU_MEMORY_BYTES | | GPU memory | |
| USED_GPU_MEMORY_BYTES | | Used GPU memory | |
| TOTAL_CPU_MEMORY_BYTES | | CPU memory | |
| USED_CPU_MEMORY_BYTES | | Used CPU memory | |
| ALLOCATED_CPU_MEMORY_BYTES | | Allocated CPU memory | |
| GPU_ALLOCATION_NON_PREEMPTIBLE | | | |
| CPU_ALLOCATION_NON_PREEMPTIBLE | | | |
| MEMORY_ALLOCATION_NON_PREEMPTIBLE | | | |
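
Unlike metrics, telemetry is a point-in-time reading, so no time range is sent with the request. The sketch below assumes a nodes telemetry endpoint with telemetryType and clusterId query parameters; both the path and the parameter names are assumptions for illustration and should be verified against the API reference.

```python
# Minimal sketch: read a point-in-time telemetry value for GPU nodes.
# The endpoint path and query parameters below are assumed.
import requests

resp = requests.get(
    "https://myorg.run.ai/api/v1/nodes/telemetry",   # assumed path
    headers={"Authorization": "Bearer <application-token>"},
    params={
        "telemetryType": "TOTAL_GPUS",               # one of the names in the table above
        "clusterId": "<cluster-id>",                 # assumed optional filter
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```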
