> For the complete documentation index, see [llms.txt](https://run-ai-docs.nvidia.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://run-ai-docs.nvidia.com/self-hosted/2.23/workloads-in-nvidia-run-ai/using-inference/nvcf.md).

# Deploy NVIDIA Cloud Functions (NVCF) in NVIDIA Run:ai

NVIDIA Cloud Functions (NVCF) is a serverless API platform designed to deploy and manage AI workloads on GPUs. Through its integration with NVIDIA Run:ai, NVCF can be deployed directly onto NVIDIA Run:ai-managed GPU clusters. This allows users to take advantage of NVIDIA Run:ai's scheduling, quota management, and monitoring features. See [Supported features](#supported-features) for more details.

This guides provides the required steps for integrating NVIDIA Cloud Functions with the NVIDIA Run:ai platform.

## Workload Priority

By default, inference workloads in NVIDIA Run:ai are assigned a priority of `very-high`, which is non-preemptible. This behavior ensures that inference workloads, which often serve real-time or latency-sensitive traffic, are guaranteed the resources they need and will not be disrupted by other workloads. For more details, see [Workload priority control](/self-hosted/2.23/platform-management/runai-scheduler/scheduling/workload-priority-control.md).

{% hint style="info" %}
**Note**

Changing the priority is not supported for NVCF workloads.
{% endhint %}

## Setup

Follow the official instructions provided in the [NVIDIA Cloud Functions](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/overview.html) documentation.

### Setting up a Cluster in NVCF

Cloud Functions administrators can install the NVIDIA Cluster Agent to enable existing GPU clusters as deployment targets for NVCF functions. Once installed, the cluster appears as a deployment option in the API and Cloud Functions menu, allowing authorized functions to deploy on it. See [Cluster Setup & Management](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/cluster-management.html) to register, configure and verify the cluster.

### Setting up a Project in NVIDIA Run:ai

Once the cluster is registered to NVCF and appears as ready, create a project in the NVIDIA Run:ai UI with an **NVCF namespace**:

1. Follow the instructions detailed in [Projects](/self-hosted/2.23/platform-management/aiinitiatives/organization/projects.md) to create a new project.
2. When setting the **Namespace**, choose **"Enter existing namespace from the cluster"** and enter **`nvcf-backend`**.
3. Assign the necessary resource quotas to the project.

{% hint style="info" %}
**Note**

* This is the designated NVIDIA Run:ai project for all NVCF functions; other projects will not be used.
* NVCF is assigned to a specific project and adheres to the project's resource quota. Ensure that your project has sufficient quota allocated to accommodate.
  {% endhint %}

## Deploying a Function

In NVCF, function creation defines the code and resources while deployment registers the function to a GPU cluster, making it available for execution:

1. Create the function as detailed in [Function Creation](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-creation.html#function-creation).
2. Deploy the function as detailed in [Function Deployment](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-deployment.html).

{% hint style="info" %}
**Note**

Using a custom Helm chart when creating a [Cloud Function](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-creation.html) is not supported.
{% endhint %}

## Managing and Monitoring

After the NVCF function is deployed, it is added to the [Workloads](/self-hosted/2.23/workloads-in-nvidia-run-ai/workloads.md) table, where it can be managed and monitored:

* Monitor resource usage, performance, and execution status.
* Manage workload lifecycle, including scaling, logging, and troubleshooting.

## Supported Features

| NVIDIA Run:ai Functionality                                                                                                                                | NVCF |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------- | :--: |
| [Fairness](/self-hosted/2.23/platform-management/runai-scheduler/scheduling/concepts-and-principles.md#fairness-fair-resource-distribution)                |   v  |
| [Priority and preemption](/self-hosted/2.23/platform-management/runai-scheduler/scheduling/concepts-and-principles.md#priority-and-preemption)             |   v  |
| [Over quota](/self-hosted/2.23/platform-management/runai-scheduler/scheduling/concepts-and-principles.md#over-quota)                                       |   v  |
| [Node pools](/self-hosted/2.23/platform-management/aiinitiatives/resources/node-pools.md)                                                                  |   v  |
| [Bin packing / Spread](/self-hosted/2.23/platform-management/runai-scheduler/scheduling/concepts-and-principles.md#placement-strategy-bin-pack-and-spread) |   v  |
| [Multi-GPU fractions](/self-hosted/2.23/platform-management/runai-scheduler/resource-optimization/fractions.md)                                            |      |
| [Multi-GPU dynamic fractions](/self-hosted/2.23/platform-management/runai-scheduler/resource-optimization/dynamic-fractions.md)                            |      |
| [Node level scheduler](/self-hosted/2.23/platform-management/runai-scheduler/resource-optimization/node-level-scheduler.md)                                |   v  |
| [Multi-GPU memory swap](/self-hosted/2.23/platform-management/runai-scheduler/resource-optimization/memory-swap.md)                                        |      |
| [Gang scheduling](/self-hosted/2.23/platform-management/runai-scheduler/scheduling/concepts-and-principles.md#gang-scheduling)                             |   v  |
| [Monitoring](/self-hosted/2.23/infrastructure-setup/procedures/system-monitoring.md)                                                                       |   v  |
| [RBAC](/self-hosted/2.23/infrastructure-setup/authentication/overview.md#role-based-access-control-rbac-in-run-ai)                                         |   v  |
| Workload awareness                                                                                                                                         |   v  |
| [Workload submission](/self-hosted/2.23/workloads-in-nvidia-run-ai/workloads.md)                                                                           |      |
| [Workload actions (stop/run)](/self-hosted/2.23/workloads-in-nvidia-run-ai/workloads.md)                                                                   |      |
| [Workload Policies](/self-hosted/2.23/platform-management/policies/workload-policies.md)                                                                   |      |
| [Scheduling rules](/self-hosted/2.23/platform-management/policies/scheduling-rules.md)                                                                     |      |


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://run-ai-docs.nvidia.com/self-hosted/2.23/workloads-in-nvidia-run-ai/using-inference/nvcf.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.