Metrics and Telemetry

Metrics are numeric measurements recorded over time that are emitted from the NVIDIA Run:ai cluster. Telemetry is a numeric measurement recorded in real time at the moment it is emitted from the NVIDIA Run:ai cluster.

Scopes

NVIDIA Run:ai provides a control-plane API that supports and aggregates analytics at the following levels (scopes). A request sketch follows the table below.

| Level | Description |
| --- | --- |
| Cluster | A cluster is a set of node pools and nodes. Cluster metrics aggregate data at the cluster level. In the NVIDIA Run:ai user interface, cluster metrics are shown in the Overview dashboard. |
| Node | Data is aggregated at the node level. |
| Node pool | Data is aggregated at the node pool level. |
| Workload | Data is aggregated at the workload level. For some workload types, such as distributed workloads, these metrics aggregate data from all of the workload's worker pods. |
| Pod | The basic unit of execution. |
| Project | The basic organizational unit. Projects are the tool for implementing resource allocation policies and for segregating different initiatives. |
| Department | A grouping of projects. |
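
The same time-series query pattern is assumed to apply at each scope, with only the endpoint path changing. The sketch below is illustrative only: the base URL, token handling, endpoint paths, and the numberOfSamples parameter are assumptions, not confirmed by this page; check the control-plane API reference for the authoritative values.

```python
# Minimal sketch of querying metrics at different scopes through the
# NVIDIA Run:ai control-plane API. Paths and parameters below are
# assumptions for illustration.
import datetime
import requests

BASE_URL = "https://myorg.run.ai"      # hypothetical tenant URL
TOKEN = "<application-token>"          # placeholder bearer token

end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=1)

params = {
    "metricType": "ALLOCATED_GPU",     # one of the API metric names listed under Supported Metrics below
    "start": start.isoformat(),
    "end": end.isoformat(),
    "numberOfSamples": 20,             # assumed sampling parameter
}

# Hypothetical per-scope metric endpoints; only the path differs per scope.
scoped_paths = {
    "cluster": "/api/v1/clusters/<cluster-id>/metrics",
    "nodepool": "/api/v1/nodepools/<nodepool-id>/metrics",
    "workload": "/api/v1/workloads/<workload-id>/metrics",
}

for scope, path in scoped_paths.items():
    resp = requests.get(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {TOKEN}"},
        params=params,
        timeout=30,
    )
    resp.raise_for_status()
    print(scope, resp.json())
```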

Supported Metrics

| Metric name in API | Applicable API endpoint | Metric name in UI per grid | Applicable UI grid |
| --- | --- | --- | --- |
| ALLOCATED_GPU | | GPU devices (allocated); Allocated GPUs | |
| AVG_WORKLOAD_WAIT_TIME | | | |
| CPU_LIMIT_CORES | | CPU limit | |
| CPU_MEMORY_LIMIT_BYTES | | CPU memory limit | |
| CPU_MEMORY_REQUEST_BYTES | | CPU memory request | |
| CPU_MEMORY_USAGE_BYTES | | CPU memory usage | |
| CPU_MEMORY_UTILIZATION | | CPU memory utilization | |
| CPU_REQUEST_CORES | | CPU request | |
| CPU_USAGE_CORES | | CPU usage | |
| CPU_UTILIZATION | | CPU compute utilization; CPU utilization | |
| GPU_ALLOCATION | | GPU devices (allocated) | |
| GPU_MEMORY_REQUEST_BYTES | | GPU memory request | |
| GPU_MEMORY_USAGE_BYTES | | GPU memory usage | |
| GPU_MEMORY_USAGE_BYTES_PER_GPU | | GPU memory usage per GPU | |
| GPU_MEMORY_UTILIZATION | | GPU memory utilization | |
| GPU_MEMORY_UTILIZATION_PER_GPU | | GPU memory utilization per GPU | |
| GPU_UTILIZATION_PER_GPU | | GPU utilization per GPU | |
| TOTAL_GPU | | GPU devices total; Total GPUs | |
| TOTAL_GPU_NODES | | | |
| GPU_UTILIZATION_DISTRIBUTION | | GPU utilization distribution | |
| UNALLOCATED_GPU | | GPU devices (unallocated); Unallocated GPUs | |
| CPU_QUOTA_MILLICORES | | | |
| CPU_MEMORY_QUOTA_MB | | | |
| CPU_ALLOCATION_MILLICORES | | | |
| CPU_MEMORY_ALLOCATION_MB | | | |
| POD_COUNT | | | |
| RUNNING_POD_COUNT | | | |
| NVLINK_BANDWIDTH_TOTAL | | | |
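
Several of the metric names above can be requested together for a single workload over a time window. The sketch below assumes the workload metrics endpoint accepts repeated metricType parameters and returns a list of measurements with timestamp/value pairs; the path, parameter names, and response shape are assumptions to verify against the API reference.

```python
# Sketch: pull two supported metric types for one workload and print the
# returned time series. Endpoint path and response layout are assumed.
import datetime
import requests

BASE_URL = "https://myorg.run.ai"          # hypothetical tenant URL
WORKLOAD_ID = "<workload-id>"              # placeholder
TOKEN = "<application-token>"              # placeholder bearer token

end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=6)

resp = requests.get(
    f"{BASE_URL}/api/v1/workloads/{WORKLOAD_ID}/metrics",   # assumed path
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={
        # requests encodes a list as repeated metricType=... query parameters
        "metricType": ["GPU_UTILIZATION_PER_GPU", "GPU_MEMORY_USAGE_BYTES"],
        "start": start.isoformat(),
        "end": end.isoformat(),
        "numberOfSamples": 48,
    },
    timeout=30,
)
resp.raise_for_status()

# Assumed response layout: one entry per metric type, each holding a list
# of timestamp/value measurements.
for measurement in resp.json().get("measurements", []):
    print(measurement.get("type"))
    for point in measurement.get("values", []):
        print(" ", point.get("timestamp"), point.get("value"))
```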

GPU Profiling

NVIDIA provides the extended GPU profiling metrics listed below. To enable these metrics, contact NVIDIA Run:ai customer support.

Note

GPU profiling metrics are disabled by default. If they are unavailable, your administrator must enable them under General settings → Analytics → GPU profiling metrics.

| Metric name in API | Applicable API endpoint | Metric name in UI | Applicable UI table |
| --- | --- | --- | --- |
| GPU_FP16_ENGINE_ACTIVITY_PER_GPU | | GPU FP16 engine activity | |
| GPU_FP32_ENGINE_ACTIVITY_PER_GPU | | GPU FP32 engine activity | |
| GPU_FP64_ENGINE_ACTIVITY_PER_GPU | | GPU FP64 engine activity | |
| GPU_GRAPHICS_ENGINE_ACTIVITY_PER_GPU | | Graphics engine activity | |
| GPU_MEMORY_BANDWIDTH_UTILIZATION_PER_GPU | | Memory bandwidth utilization | |
| GPU_NVLINK_RECEIVED_BANDWIDTH_PER_GPU | | NVLink received bandwidth | |
| GPU_NVLINK_TRANSMITTED_BANDWIDTH_PER_GPU | | NVLink transmitted bandwidth | |
| GPU_PCIE_RECEIVED_BANDWIDTH_PER_GPU | | PCIe received bandwidth | |
| GPU_PCIE_TRANSMITTED_BANDWIDTH_PER_GPU | | PCIe transmitted bandwidth | |
| GPU_SM_ACTIVITY_PER_GPU | | GPU SM activity | |
| GPU_SM_OCCUPANCY_PER_GPU | | GPU SM occupancy | |
| GPU_TENSOR_ACTIVITY_PER_GPU | | GPU tensor activity | |
| GPU_OOMKILL_SWAP_OUT_OF_RAM_COUNT_PER_GPU | | OOMKill swap out of RAM count | |
| GPU_OOMKILL_BURST_COUNT_PER_GPU | | OOMKill burst count | |
| GPU_OOMKILL_IDLE_COUNT_PER_GPU | | OOMKill idle count | |
| GPU_SWAP_MEMORY_BYTES_PER_GPU | | GPU swap memory | |

NVIDIA NIM

NVIDIA NIM metrics provide workload-level observability, including key runtime and performance data such as request throughput, latency, and token usage for LLMs. See NIM observability metrics via API for more details.

| Metric name in API | Applicable API endpoint | Metric name in UI | Applicable UI table |
| --- | --- | --- | --- |
| NIM_NUM_REQUESTS_RUNNING | | Request concurrency by status | |
| NIM_NUM_REQUESTS_WAITING | | Request concurrency by status | |
| NIM_NUM_REQUEST_MAX | | Request concurrency by status | |
| NIM_REQUEST_SUCCESS_TOTAL | | Request count by status | |
| NIM_REQUEST_FAILURE_TOTAL | | Request count by status | |
| NIM_GPU_CACHE_USAGE_PERC | | GPU KV cache utilization | |
| NIM_TIME_TO_FIRST_TOKEN_SECONDS | | | |
| NIM_E2E_REQUEST_LATENCY_SECONDS | | | |
| NIM_TIME_TO_FIRST_TOKEN_SECONDS_PERCENTILES | | Time to first token (TTFT) by percentiles | |
| NIM_E2E_REQUEST_LATENCY_SECONDS_PERCENTILES | | End to end request latency by percentiles | |
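
A hypothetical sketch of pulling two of the NIM metric types above for an inference workload. It assumes they are served by the same workload metrics endpoint and parameters used earlier on this page, which is not confirmed here; see NIM observability metrics via API for the authoritative request format.

```python
# Hypothetical sketch: request NIM metric types for an inference workload.
# Endpoint path, parameters, and time range below are placeholders.
import requests

resp = requests.get(
    "https://myorg.run.ai/api/v1/workloads/<workload-id>/metrics",  # assumed path
    headers={"Authorization": "Bearer <application-token>"},
    params={
        "metricType": [
            "NIM_NUM_REQUESTS_RUNNING",
            "NIM_TIME_TO_FIRST_TOKEN_SECONDS_PERCENTILES",
        ],
        "start": "2025-01-01T00:00:00Z",   # placeholder time range
        "end": "2025-01-01T06:00:00Z",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```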

Supported Telemetry

| Metric | Applicable API endpoint | Metric name in UI | Applicable UI table |
| --- | --- | --- | --- |
| WORKLOADS_COUNT | | | |
| ALLOCATED_GPUS | | Allocated GPUs | |
| READY_GPU_NODES | | Ready / Total GPU nodes | |
| READY_GPUS | | Ready / Total GPU devices | |
| TOTAL_GPU_NODES | | Ready / Total GPU nodes | |
| TOTAL_GPUS | | Ready / Total GPU devices | |
| IDLE_ALLOCATED_GPUS | | Idle allocated GPU devices | |
| FREE_GPUS | | Free GPU devices | |
| TOTAL_CPU_CORES | | CPU (Cores) | |
| USED_CPU_CORES | | | |
| ALLOCATED_CPU_CORES | | Allocated CPU cores | |
| TOTAL_GPU_MEMORY_BYTES | | GPU memory | |
| USED_GPU_MEMORY_BYTES | | Used GPU memory | |
| TOTAL_CPU_MEMORY_BYTES | | CPU memory | |
| USED_CPU_MEMORY_BYTES | | Used CPU memory | |
| ALLOCATED_CPU_MEMORY_BYTES | | Allocated CPU memory | |
| GPU_ALLOCATION_NON_PREEMPTIBLE | | | |
| CPU_ALLOCATION_NON_PREEMPTIBLE | | | |
| MEMORY_ALLOCATION_NON_PREEMPTIBLE | | | |
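
Unlike metrics, telemetry is a point-in-time reading, so no time range is sent with the request. The sketch below assumes a nodes telemetry endpoint with telemetryType and clusterId query parameters; both the path and the parameter names are assumptions for illustration and should be verified against the API reference.

```python
# Minimal sketch: read a point-in-time telemetry value for GPU nodes.
# The endpoint path and query parameters below are assumed.
import requests

resp = requests.get(
    "https://myorg.run.ai/api/v1/nodes/telemetry",   # assumed path
    headers={"Authorization": "Bearer <application-token>"},
    params={
        "telemetryType": "TOTAL_GPUS",               # one of the names in the table above
        "clusterId": "<cluster-id>",                 # assumed optional filter
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```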
