# Metrics and Telemetry

Metrics are numeric measurements emitted from the NVIDIA Run:ai cluster and recorded **over time**. Telemetry is a numeric measurement recorded in real time as it is emitted from the NVIDIA Run:ai cluster.

## Scopes

NVIDIA Run:ai provides a control-plane API that supports and aggregates analytics at the following levels.

| Level      | Description                                                                                                                                                                                           |
| ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Cluster    | A cluster is a set of node pools and nodes. Data is aggregated at the cluster level. In the NVIDIA Run:ai user interface, these metrics are available in the Overview dashboard.                     |
| Node       | Data is aggregated at the node level.                                                                                                                                                                 |
| Node pool  | Data is aggregated at the node pool level.                                                                                                                                                            |
| Workload   | Data is aggregated at the workload level. For some workloads, such as distributed workloads, these metrics aggregate data from all worker pods.                                                      |
| Pod        | The basic unit of execution.                                                                                                                                                                          |
| Project    | The basic organizational unit. Projects are used to implement resource allocation policies and to segregate different initiatives.                                                                   |
| Department | Departments are a grouping of projects.                                                                                                                                                               |

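As a rough illustration, each level corresponds to its own metrics endpoint in the API reference linked from the tables on this page. The sketch below maps each level to a path template. The paths are inferred from the link anchors on this page (for example `get-api-v1-workloads-workloadid-metrics`), so the exact casing of the placeholder names is an assumption, and `METRICS_PATHS` and `metrics_path` are hypothetical helper names. Confirm the paths against the API reference before relying on them.

```python
# Path templates inferred from the API reference link anchors on this page.
# Placeholder casing (for example clusterUuid) is an assumption.
METRICS_PATHS = {
    "cluster": "/api/v1/clusters/{clusterUuid}/metrics",
    "node_pool": "/api/v1/clusters/{clusterUuid}/nodepools/{nodepoolName}/metrics",
    "node": "/api/v1/nodes/{nodeId}/metrics",
    "workload": "/api/v1/workloads/{workloadId}/metrics",
    "pod": "/api/v1/workloads/{workloadId}/pods/{podId}/metrics",
    "project": "/api/v1/org-unit/projects/{projectId}/metrics",
    "department": "/api/v1/org-unit/departments/{departmentId}/metrics",
}


def metrics_path(scope: str, **ids: str) -> str:
    """Return the metrics path for a scope, filling in its identifiers.

    Raises ValueError for an unknown scope and KeyError when an
    identifier required by the template is missing.
    """
    try:
        template = METRICS_PATHS[scope]
    except KeyError:
        raise ValueError(f"unknown scope: {scope!r}") from None
    return template.format(**ids)
```

For example, `metrics_path("pod", workloadId="wl-1", podId="pod-0")` fills in both identifiers that the pod-level template needs.
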
## Supported Metrics

| Metric name in API               | Applicable API endpoint                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | Metric name in UI per grid                                           | Applicable UI grid                                                                                                                                                                                                                                                |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ALLOCATED_GPU`                  | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li></ul>                                                                                                                                                                                                                                                                                                                     | <ul><li>GPU devices (allocated)</li><li>Allocated GPUs</li></ul>     | <ul><li><a href="../before-you-start#ui-views">Overview dashboard</a></li><li><a href="../../aiinitiatives/resources/node-pools#show-hide-details">Node pools</a></li></ul>                                                                                       |
| `AVG_WORKLOAD_WAIT_TIME`         | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li></ul>                                                                                                                                                                                                                                                                                                                     |                                                                      |                                                                                                                                                                                                                                                                   |
| `CPU_LIMIT_CORES`                | [Workloads](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | CPU limit                                                            | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#metrics)                                                                                                                                                         |
| `CPU_MEMORY_LIMIT_BYTES`         | [Workloads](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | CPU memory limit                                                     | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#metrics)                                                                                                                                                         |
| `CPU_MEMORY_REQUEST_BYTES`       | [Workloads](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | CPU memory request                                                   | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#metrics)                                                                                                                                                         |
| `CPU_MEMORY_USAGE_BYTES`         | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics">Workloads</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li></ul>                                                                                                                                                                                                                                                                                                                                                  | CPU memory usage                                                     | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#metrics)                                                                                                                                                         |
| `CPU_MEMORY_UTILIZATION`         | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul>                                                                                                                                                                                      | CPU memory utilization                                               | <ul><li><a href="../before-you-start#ui-views">Overview dashboard</a></li><li><a href="../../aiinitiatives/resources/node-pools#show-hide-details">Node pools</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul>     |
| `CPU_REQUEST_CORES`              | [Workloads](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | CPU request                                                          | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads)                                                                                                                                                                 |
| `CPU_USAGE_CORES`                | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics">Workloads</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li></ul>                                                                                                                                                                                                                   | CPU usage                                                            | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#metrics)                                                                                                                                                         |
| `CPU_UTILIZATION`                | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul>                                                                                                                                                                                      | <ul><li>CPU compute utilization</li><li>CPU utilization</li></ul>    | <ul><li><a href="../before-you-start#ui-views">Overview dashboard</a> and <a href="../../aiinitiatives/resources/node-pools#show-hide-details">Node pools</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul>         |
| `GPU_ALLOCATION`                 | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics">Workloads</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-departmentid-metrics">Departments</a></li></ul>                                                                                                                                                                          | GPU devices (allocated)                                              | [Overview dashboard](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/before-you-start#ui-views)                                                                                                                                               |
| `GPU_MEMORY_REQUEST_BYTES`       | [Workloads](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | GPU memory request                                                   | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#metrics)                                                                                                                                                         |
| `GPU_MEMORY_USAGE_BYTES`         | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics">Workloads</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul>                                                                                                                                                                                                                   | GPU memory usage                                                     | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#metrics)                                                                                                                                                         |
| `GPU_MEMORY_USAGE_BYTES_PER_GPU` | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li></ul>                                                                                                                                                                                                                                                                                                                                                              | GPU memory usage per GPU                                             | [Workloads per pod](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#metrics)                                                                                                                                                 |
| `GPU_MEMORY_UTILIZATION`         | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li></ul>                                                                                                                                                                                                                                                                                                                     | GPU memory utilization                                               | <ul><li><a href="../before-you-start#ui-views">Overview dashboard</a></li><li><a href="../../aiinitiatives/resources/node-pools#show-hide-details">Node pools</a></li></ul>                                                                                       |
| `GPU_MEMORY_UTILIZATION_PER_GPU` | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | GPU memory utilization per GPU                                       | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/aiinitiatives/resources/nodes#show-hide-details)                                                                                                                                                          |
| `GPU_QUOTA`                      | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-departmentid-metrics">Departments</a></li></ul> | Quota                                                                | [Quota management](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/before-you-start#ui-views)                                                                                                                                                 |
| `GPU_UTILIZATION`                | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics">Workloads</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li></ul>                              | GPU compute utilization                                              | <ul><li><a href="../before-you-start#ui-views">Overview dashboard</a></li><li><a href="../../aiinitiatives/resources/node-pools#show-hide-details">Node pools</a></li><li><a href="../../../workloads-in-nvidia-run-ai/workloads#metrics">Workloads</a></li></ul> |
| `GPU_UTILIZATION_PER_GPU`        | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li></ul>                                                                                                                                                                                                                                                                                                                                                              | GPU utilization per GPU                                              | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/aiinitiatives/resources/nodes#show-hide-details)                                                                                                                                                          |
| `TOTAL_GPU`                      | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li></ul>                                                                                                                                                                                                                                                                                                                     | <ul><li>GPU devices total</li><li>Total GPUs</li></ul>               | <ul><li><a href="../before-you-start#ui-views">Overview dashboard</a></li><li><a href="../../aiinitiatives/resources/node-pools#show-hide-details">Node pools</a></li></ul>                                                                                       |
| `TOTAL_GPU_NODES`                | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li></ul>                                                                                                                                                                                                                                                                                                                     |                                                                      |                                                                                                                                                                                                                                                                   |
| `GPU_UTILIZATION_DISTRIBUTION`   | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li></ul>                                                                                                                                                                                                                                                                                                                     | GPU utilization distribution                                         | [Node pools](https://run-ai-docs.nvidia.com/self-hosted/2.22/aiinitiatives/resources/node-pools#show-hide-details)                                                                                                                                                |
| `UNALLOCATED_GPU`                | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/clusters#get-api-v1-clusters-clusteruuid-metrics">Clusters</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodepools#get-api-v1-clusters-clusteruuid-nodepools-nodepoolname-metrics">Node pools</a></li></ul>                                                                                                                                                                                                                                                                                                                     | <ul><li>GPU devices (unallocated)</li><li>Unallocated GPUs</li></ul> | <ul><li><a href="../before-you-start#ui-views">Overview dashboard</a></li><li><a href="../../aiinitiatives/resources/node-pools#show-hide-details">Node pools</a></li></ul>                                                                                       |
| `CPU_QUOTA_MILLICORES`           | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-departmentid-metrics">Departments</a></li></ul>                                                                                                                                                                                                                                                                                                                     |                                                                      |                                                                                                                                                                                                                                                                   |
| `CPU_MEMORY_QUOTA_MB`            | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-departmentid-metrics">Departments</a></li></ul>                                                                                                                                                                                                                                                                                                                     |                                                                      |                                                                                                                                                                                                                                                                   |
| `CPU_ALLOCATION_MILLICORES`      | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-departmentid-metrics">Departments</a></li></ul>                                                                                                                                                                                                                                                                                                                     |                                                                      |                                                                                                                                                                                                                                                                   |
| `CPU_MEMORY_ALLOCATION_MB`       | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-departmentid-metrics">Departments</a></li></ul>                                                                                                                                                                                                                                                                                                                     |                                                                      |                                                                                                                                                                                                                                                                   |
| `POD_COUNT`                      | [Workloads](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |                                                                      |                                                                                                                                                                                                                                                                   |
| `RUNNING_POD_COUNT`              | [Workloads](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-workloadid-metrics)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |                                                                      |                                                                                                                                                                                                                                                                   |

### Advanced Metrics

NVIDIA provides extended metrics, described in the [DCGM profiling metrics documentation](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/feature-overview.html#profiling-metrics).

{% hint style="info" %}
**Note**

Advanced metrics are disabled by default. If they are unavailable, your administrator must enable them under **General Settings** → Analytics → Advanced metrics. Before enabling them, the administrator must configure GPU profiling through the DCGM Exporter and the NVIDIA Run:ai Prometheus integration. For configuration steps, see [Advanced metrics](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/monitor-performance/advanced-metrics).
{% endhint %}

| Metric name in API                          | Applicable API endpoint                                                                                                                                                                                                                                                              | Metric name in UI             | Applicable UI table                                                                                                                                                                         |
| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `GPU_FP16_ENGINE_ACTIVITY_PER_GPU`          | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | GPU FP16 engine activity      | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_FP32_ENGINE_ACTIVITY_PER_GPU`          | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | GPU FP32 engine activity      | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_FP64_ENGINE_ACTIVITY_PER_GPU`          | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | GPU FP64 engine activity      | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_GRAPHICS_ENGINE_ACTIVITY_PER_GPU`      | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | Graphics engine activity      | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_MEMORY_BANDWIDTH_UTILIZATION_PER_GPU`  | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | Memory bandwidth utilization  | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_NVLINK_RECEIVED_BANDWIDTH_PER_GPU`     | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | NVLink received bandwidth     | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_NVLINK_TRANSMITTED_BANDWIDTH_PER_GPU`  | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | NVLink transmitted bandwidth  | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_PCIE_RECEIVED_BANDWIDTH_PER_GPU`       | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | PCIe received bandwidth       | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_PCIE_TRANSMITTED_BANDWIDTH_PER_GPU`    | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | PCIe transmitted bandwidth    | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_SM_ACTIVITY_PER_GPU`                   | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | GPU SM activity               | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_SM_OCCUPANCY_PER_GPU`                  | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | GPU SM occupancy              | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_TENSOR_ACTIVITY_PER_GPU`               | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics">Pods</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics">Nodes</a></li></ul> | GPU tensor activity           | <ul><li><a href="../../../workloads-in-nvidia-run-ai/workloads#show-hide-details">Workloads</a></li><li><a href="../../aiinitiatives/resources/nodes#show-hide-details">Nodes</a></li></ul> |
| `GPU_OOMKILL_SWAP_OUT_OF_RAM_COUNT_PER_GPU` | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics)                                                                                                                                                                          | OOMKill swap out of RAM count | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/aiinitiatives/resources/nodes#show-hide-details)                                                                                    |
| `GPU_OOMKILL_BURST_COUNT_PER_GPU`           | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics)                                                                                                                                                                          | OOMKill burst count           | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/aiinitiatives/resources/nodes#show-hide-details)                                                                                    |
| `GPU_OOMKILL_IDLE_COUNT_PER_GPU`            | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-nodeid-metrics)                                                                                                                                                                          | OOMKill idle count            | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/aiinitiatives/resources/nodes#show-hide-details)                                                                                    |
| `GPU_SWAP_MEMORY_BYTES_PER_GPU`             | [Pods](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/pods#get-api-v1-workloads-workloadid-pods-podid-metrics)                                                                                                                                                             | GPU swap memory               | [Workloads](https://run-ai-docs.nvidia.com/self-hosted/2.22/workloads-in-nvidia-run-ai/workloads#show-hide-details)                                                                         |
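
As a rough illustration of how the API names in the table might be requested, the sketch below builds a query URL for the pod metrics endpoint. The path is inferred from the endpoint anchor linked in the table (`/api/v1/workloads/{workloadId}/pods/{podId}/metrics`). The query parameter names (`metricType`, `start`, `end`, `numberOfSamples`), the base URL, and the IDs are assumptions made for illustration, so check them against the API reference before use.

```python
from urllib.parse import quote, urlencode


def build_pod_metrics_url(base_url, workload_id, pod_id, metric_types,
                          start, end, number_of_samples=20):
    """Build a GET URL for pod-level metrics (hypothetical helper).

    The query parameter names are assumptions, not confirmed by this page.
    """
    path = "/api/v1/workloads/{}/pods/{}/metrics".format(
        quote(workload_id, safe=""), quote(pod_id, safe="")
    )
    # One metricType entry per requested metric, followed by the time window.
    query = [("metricType", name) for name in metric_types]
    query += [
        ("start", start),
        ("end", end),
        ("numberOfSamples", str(number_of_samples)),
    ]
    return base_url.rstrip("/") + path + "?" + urlencode(query)


url = build_pod_metrics_url(
    "https://runai.example.com",  # placeholder control-plane URL
    "workload-123",               # placeholder workload ID
    "pod-456",                    # placeholder pod ID
    ["GPU_SM_ACTIVITY_PER_GPU", "GPU_TENSOR_ACTIVITY_PER_GPU"],
    start="2025-01-01T00:00:00Z",
    end="2025-01-01T01:00:00Z",
)
print(url)
```

A real request would typically also need an authentication header, which is omitted here.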

## Supported Telemetry

| Metric                              | Applicable API endpoint                                                                                                                                                                                                                                                                                                                                                                                                                              | Metric name in UI              | Applicable UI table                                                                                                                                                                                                      |
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `WORKLOADS_COUNT`                   | [Workloads](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-telemetry)                                                                                                                                                                                                                                                                                                                                       |                                |                                                                                                                                                                                                                          |
| `ALLOCATED_GPUS`                    | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Allocated GPUs                 | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/aiinitiatives/resources/nodes)                                                                                                               |
| `GPU_ALLOCATION`                    | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/workloads/workloads#get-api-v1-workloads-telemetry">Workloads</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul> |                                |                                                                                                                                                                                                                          |
| `READY_GPU_NODES`                   | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Ready / Total GPU nodes        | [Overview dashboard](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/before-you-start#ui-views)                                                                                                      |
| `READY_GPUS`                        | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Ready / Total GPU devices      | [Overview dashboard](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/before-you-start#ui-views)                                                                                                      |
| `TOTAL_GPU_NODES`                   | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Ready / Total GPU nodes        | [Overview dashboard](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/before-you-start#ui-views)                                                                                                      |
| `TOTAL_GPUS`                        | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Ready / Total GPU devices      | [Overview dashboard](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/before-you-start#ui-views)                                                                                                      |
| `IDLE_ALLOCATED_GPUS`               | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Idle allocated GPU devices     | [Overview dashboard](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/before-you-start#ui-views)                                                                                                      |
| `FREE_GPUS`                         | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Free GPU devices               | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/aiinitiatives/resources/nodes)                                                                                                               |
| `TOTAL_CPU_CORES`                   | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | CPU (Cores)                    | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/aiinitiatives/resources/nodes)                                                                                                               |
| `USED_CPU_CORES`                    | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               |                                |                                                                                                                                                                                                                          |
| `ALLOCATED_CPU_CORES`               | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry">Nodes</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul>         | Allocated CPU cores            | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/aiinitiatives/resources/nodes)                                                                                                               |
| `TOTAL_GPU_MEMORY_BYTES`            | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | GPU memory                     | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/aiinitiatives/resources/nodes)                                                                                                               |
| `USED_GPU_MEMORY_BYTES`             | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Used GPU memory                | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/aiinitiatives/resources/nodes)                                                                                                               |
| `TOTAL_CPU_MEMORY_BYTES`            | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | CPU memory                     | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/aiinitiatives/resources/nodes)                                                                                                               |
| `USED_CPU_MEMORY_BYTES`             | [Nodes](https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry)                                                                                                                                                                                                                                                                                                                                               | Used CPU memory                | [Nodes](https://run-ai-docs.nvidia.com/self-hosted/2.22/platform-management/aiinitiatives/resources/nodes)                                                                                                               |
| `ALLOCATED_CPU_MEMORY_BYTES`        | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/nodes#get-api-v1-nodes-telemetry">Nodes</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul>         | Allocated CPU memory           | <ul><li><a href="../aiinitiatives/resources/nodes">Nodes</a></li><li><a href="../aiinitiatives/organization/projects">Projects</a></li><li><a href="../aiinitiatives/organization/departments">Departments</a></li></ul> |
| `GPU_QUOTA`                         | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul>                                                                                                                                   | GPU quota                      | <ul><li><a href="../aiinitiatives/organization/projects">Projects</a></li><li><a href="../aiinitiatives/organization/departments">Departments</a></li></ul>                                                              |
| `CPU_QUOTA`                         | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul>                                                                                                                                   |                                |                                                                                                                                                                                                                          |
| `MEMORY_QUOTA`                      | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul>                                                                                                                                   |                                |                                                                                                                                                                                                                          |
| `GPU_ALLOCATION_NON_PREEMPTIBLE`    | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul>                                                                                                                                   |                                |                                                                                                                                                                                                                          |
| `CPU_ALLOCATION_NON_PREEMPTIBLE`    | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul>                                                                                                                                   |                                |                                                                                                                                                                                                                          |
| `MEMORY_ALLOCATION_NON_PREEMPTIBLE` | <ul><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/projects#get-api-v1-org-unit-projects-projectid-metrics">Projects</a></li><li><a href="https://app.gitbook.com/s/b5QLzc5pV7wpXz3CDYyp/organizations/departments#get-api-v1-org-unit-departments-telemetry">Departments</a></li></ul>                                                                                                                                   |                                |                                                                                                                                                                                                                          |
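
The table lists `READY_GPU_NODES` and `TOTAL_GPU_NODES` as separate telemetry names that the UI shows together as "Ready / Total GPU nodes". The sketch below shows one way to build the two requests and combine their results into that display string. It assumes a `telemetryType` query parameter on the nodes telemetry endpoint (path inferred from the linked anchor), and the base URL and numeric values are made up for illustration.

```python
from urllib.parse import urlencode


def build_nodes_telemetry_url(base_url, telemetry_type):
    """Build a GET URL for node telemetry (hypothetical helper).

    `telemetryType` as the query parameter name is an assumption.
    """
    query = urlencode({"telemetryType": telemetry_type})
    return base_url.rstrip("/") + "/api/v1/nodes/telemetry?" + query


def format_ready_total(ready, total):
    """Combine two telemetry values into a 'ready / total' display string."""
    return "{} / {}".format(ready, total)


base = "https://runai.example.com"  # placeholder control-plane URL
urls = [build_nodes_telemetry_url(base, name)
        for name in ("READY_GPU_NODES", "TOTAL_GPU_NODES")]

# Made-up values standing in for the numbers each request would return.
label = format_ready_total(7, 8)
print(urls)
print(label)
```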
