# Deploy NVIDIA Cloud Functions (NVCF) in NVIDIA Run:ai

NVIDIA Cloud Functions (NVCF) is a serverless API platform designed to deploy and manage AI workloads on GPUs. Through its integration with NVIDIA Run:ai, NVCF functions can be deployed directly onto NVIDIA Run:ai-managed GPU clusters, allowing users to take advantage of NVIDIA Run:ai's scheduling, quota management, and monitoring capabilities. See [Supported features](#supported-features) for more details.

This guide provides the required steps for integrating NVIDIA Cloud Functions with the NVIDIA Run:ai platform.

## Workload Priority

By default, inference workloads in NVIDIA Run:ai are assigned a priority of `very-high`, which is non-preemptible. This behavior ensures that inference workloads, which often serve real-time or latency-sensitive traffic, are guaranteed the resources they need and will not be disrupted by other workloads. For more details, see [Workload priority control](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/workload-priority-control).

{% hint style="info" %}
**Note**

Changing the priority is not supported for NVCF workloads.
{% endhint %}

## Setup

Follow the official instructions provided in the [NVIDIA Cloud Functions](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/overview.html) documentation.

### Setting up a Cluster in NVCF

Cloud Functions administrators can install the NVIDIA Cluster Agent to enable existing GPU clusters as deployment targets for NVCF functions. Once installed, the cluster appears as a deployment option in the API and Cloud Functions menu, allowing authorized functions to deploy on it. See [Cluster Setup & Management](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/cluster-management.html) to register, configure and verify the cluster.

### Setting up a Project in NVIDIA Run:ai

Once the cluster is registered to NVCF and appears as ready, create a project in the NVIDIA Run:ai UI with an **NVCF namespace**:

1. Follow the instructions detailed in [Projects](https://run-ai-docs.nvidia.com/self-hosted/platform-management/aiinitiatives/organization/projects) to create a new project.
2. When setting the **Namespace**, choose **"Enter existing namespace from the cluster"** and enter **`nvcf-backend`**.
3. Assign the necessary resource quotas to the project.

{% hint style="info" %}
**Note**

* This is the designated NVIDIA Run:ai project for all NVCF functions; other projects are not used.
* NVCF functions are assigned to this project and adhere to its resource quota. Ensure that the project has sufficient quota allocated to accommodate your NVCF deployments.
{% endhint %}
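The project-creation step above can also be driven through the NVIDIA Run:ai REST API. The sketch below only builds the request payload; the endpoint path and field names (`requestedNamespace`, `resources`, and so on) are assumptions for illustration and should be verified against your control plane's API reference before use.

```python
import json

# Hypothetical sketch of a project-creation payload for the NVIDIA Run:ai
# REST API. The endpoint path and all field names below are assumptions --
# confirm them against your control plane's API reference.
RUNAI_PROJECTS_URL = "https://<control-plane>/api/v1/org-unit/projects"  # assumed path

project_payload = {
    "name": "nvcf",                     # project name shown in the UI (placeholder)
    "clusterId": "<cluster-uuid>",      # the NVCF-registered cluster
    # Step 2: reuse the existing namespace created by the NVCF Cluster Agent.
    "requestedNamespace": "nvcf-backend",
    # Step 3: resource quota sized to accommodate the expected NVCF deployments.
    "resources": [
        {"nodePool": {"name": "default"}, "gpu": {"deserved": 8}},  # assumed shape
    ],
}

print(json.dumps(project_payload, indent=2))
```

Whatever the exact API shape, the key detail from the steps above is that the namespace must be the existing `nvcf-backend` namespace rather than a newly generated one.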

## Deploying a Function

In NVCF, function creation defines the code and resources, while deployment registers the function to a GPU cluster and makes it available for execution:

1. Create the function as detailed in [Function Creation](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-creation.html#function-creation).
2. Deploy the function as detailed in [Function Deployment](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-deployment.html).
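The two steps above map to two NVCF REST calls. The sketch below only constructs the request bodies; the endpoint paths and field names are loosely based on the NVCF API and may differ by version, so treat them as assumptions to verify against the linked documentation.

```python
import json

# Hedged sketch of the two NVCF API requests corresponding to the steps
# above. Endpoint paths and field names are assumptions -- verify them
# against the NVCF Function Creation / Function Deployment docs.
CREATE_URL = "https://api.ngc.nvidia.com/v2/nvcf/functions"              # assumed
DEPLOY_URL = "https://api.ngc.nvidia.com/v2/nvcf/deployments/functions"  # assumed

# Step 1: function creation -- defines the container and invocation endpoint.
create_body = {
    "name": "my-inference-fn",                      # placeholder
    "containerImage": "nvcr.io/<org>/<image>:<tag>",
    "inferenceUrl": "/v1/infer",                    # path the container serves
    "inferencePort": 8000,
}

# Step 2: function deployment -- registers the function version on the
# NVIDIA Run:ai-managed GPU cluster, subject to the project's quota.
deploy_body = {
    "deploymentSpecifications": [
        {
            "cluster": "<runai-cluster-name>",  # the registered cluster
            "gpu": "H100",                      # GPU type available in the cluster
            "minInstances": 1,
            "maxInstances": 2,
        }
    ]
}

print(json.dumps({"create": create_body, "deploy": deploy_body}, indent=2))
```

Note that, per the hint below the steps, the function must use a standard container-based definition; custom Helm charts are not supported in this integration.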

{% hint style="info" %}
**Note**

Using a custom Helm chart when creating a [Cloud Function](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-creation.html) is not supported.
{% endhint %}

## Managing and Monitoring

After the NVCF function is deployed, it is added to the [Workloads](https://run-ai-docs.nvidia.com/self-hosted/workloads-in-nvidia-run-ai/workloads) table, where it can be managed and monitored:

* Monitor resource usage, performance, and execution status.
* Manage workload lifecycle, including scaling, logging, and troubleshooting.
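Besides the UI table, deployed functions can in principle be monitored programmatically. The sketch below only builds a filtered query URL; the endpoint and parameter names are assumptions for illustration, not a documented NVIDIA Run:ai API contract.

```python
from urllib.parse import urlencode

# Hypothetical sketch: composing a workloads query to monitor NVCF
# functions. The endpoint and parameter names are assumptions -- check
# your platform's API reference for the real workloads API.
WORKLOADS_URL = "https://<control-plane>/api/v1/workloads"  # assumed endpoint

params = {
    "filterBy": "projectName==<nvcf-project>",  # the project backed by nvcf-backend
    "sortBy": "phase",                          # e.g. group by execution status
}
query_url = f"{WORKLOADS_URL}?{urlencode(params)}"

print(query_url)
```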

## Supported Features

| NVIDIA Run:ai Functionality                                                                                                                                                      | NVCF |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--: |
| [Fairness](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#fairness-fair-resource-distribution)                |   ✓  |
| [Priority and preemption](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#priority-and-preemption)             |   ✓  |
| [Over quota](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#over-quota)                                       |   ✓  |
| [Node pools](https://run-ai-docs.nvidia.com/self-hosted/platform-management/aiinitiatives/resources/node-pools)                                                                  |   ✓  |
| [Bin packing / Spread](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#placement-strategy-bin-pack-and-spread) |   ✓  |
| [Multi-GPU fractions](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/resource-optimization/fractions)                                            |      |
| [Multi-GPU dynamic fractions](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/resource-optimization/dynamic-fractions)                            |      |
| [Node level scheduler](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/resource-optimization/node-level-scheduler)                                |   ✓  |
| [Multi-GPU memory swap](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/resource-optimization/memory-swap)                                        |      |
| [Gang scheduling](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#gang-scheduling)                             |   ✓  |
| [Monitoring](https://run-ai-docs.nvidia.com/self-hosted/infrastructure-setup/procedures/system-monitoring)                                                                       |   ✓  |
| [RBAC](https://run-ai-docs.nvidia.com/self-hosted/infrastructure-setup/authentication/overview#role-based-access-control-rbac-in-run-ai)                                         |   ✓  |
| Workload awareness                                                                                                                                                               |   ✓  |
| [Workload submission](https://run-ai-docs.nvidia.com/self-hosted/workloads-in-nvidia-run-ai/workloads)                                                                           |      |
| [Workload actions (stop/run)](https://run-ai-docs.nvidia.com/self-hosted/workloads-in-nvidia-run-ai/workloads)                                                                   |      |
| [Workload Policies](https://run-ai-docs.nvidia.com/self-hosted/platform-management/policies/workload-policies)                                                                   |      |
| [Scheduling rules](https://run-ai-docs.nvidia.com/self-hosted/platform-management/policies/scheduling-rules)                                                                     |      |
