# Deploy NVIDIA Cloud Functions (NVCF) in NVIDIA Run:ai

NVIDIA Cloud Functions (NVCF) is a serverless API platform designed to deploy and manage AI workloads on GPUs. Through its integration with NVIDIA Run:ai, NVCF functions can be deployed directly onto NVIDIA Run:ai-managed GPU clusters, allowing users to take advantage of NVIDIA Run:ai's scheduling, quota management, and monitoring capabilities. See [Supported features](#supported-features) for more details.

This guide provides the required steps for integrating NVIDIA Cloud Functions with the NVIDIA Run:ai platform.

## Workload Priority

By default, inference workloads in NVIDIA Run:ai are assigned a priority of `very-high`, which is non-preemptible. This behavior ensures that inference workloads, which often serve real-time or latency-sensitive traffic, are guaranteed the resources they need and will not be disrupted by other workloads. For more details, see [Workload priority control](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/workload-priority-control).

{% hint style="info" %}
**Note**

Changing the priority is not supported for NVCF workloads.
{% endhint %}

## Setup

Follow the official instructions provided in the [NVIDIA Cloud Functions](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/overview.html) documentation.

### Setting up a Cluster in NVCF

Cloud Functions administrators can install the NVIDIA Cluster Agent to enable existing GPU clusters as deployment targets for NVCF functions. Once installed, the cluster appears as a deployment option in the API and Cloud Functions menu, allowing authorized functions to deploy on it. See [Cluster Setup & Management](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/cluster-management.html) to register, configure and verify the cluster.

### Setting up a Project in NVIDIA Run:ai

Once the cluster is registered to NVCF and appears as ready, create a project in the NVIDIA Run:ai UI with an **NVCF namespace**:

1. Follow the instructions detailed in [Projects](https://run-ai-docs.nvidia.com/self-hosted/platform-management/aiinitiatives/organization/projects) to create a new project.
2. When setting the **Namespace**, choose **"Enter existing namespace from the cluster"** and enter **`nvcf-backend`**.
3. Assign the necessary resource quotas to the project.

{% hint style="info" %}
**Note**

* This is the designated NVIDIA Run:ai project for all NVCF functions; other projects are not used.
* NVCF functions are assigned to this project and adhere to its resource quota. Ensure that the project has sufficient quota allocated to accommodate your NVCF deployments.
{% endhint %}
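The project-creation step above can also be driven through the NVIDIA Run:ai REST API. The sketch below only builds the request payload; the endpoint path and field names (`requestedNamespace`, `resources`, and so on) are assumptions for illustration and should be verified against your control plane's API reference before use.

```python
import json

# Hypothetical sketch of a project-creation payload for the NVIDIA Run:ai
# REST API. The endpoint path and all field names below are assumptions --
# confirm them against your control plane's API reference.
RUNAI_PROJECTS_URL = "https://<control-plane>/api/v1/org-unit/projects"  # assumed path

project_payload = {
    "name": "nvcf",                     # project name shown in the UI (placeholder)
    "clusterId": "<cluster-uuid>",      # the NVCF-registered cluster
    # Step 2: reuse the existing namespace created by the NVCF Cluster Agent.
    "requestedNamespace": "nvcf-backend",
    # Step 3: resource quota sized to accommodate the expected NVCF deployments.
    "resources": [
        {"nodePool": {"name": "default"}, "gpu": {"deserved": 8}},  # assumed shape
    ],
}

print(json.dumps(project_payload, indent=2))
```

Whatever the exact API shape, the key detail from the steps above is that the namespace must be the existing `nvcf-backend` namespace rather than a newly generated one.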

## Deploying a Function

In NVCF, function creation defines the code and resources, while deployment registers the function to a GPU cluster and makes it available for execution:

1. Create the function as detailed in [Function Creation](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-creation.html#function-creation).
2. Deploy the function as detailed in [Function Deployment](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-deployment.html).
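The two steps above map to two NVCF REST calls. The sketch below only constructs the request bodies; the endpoint paths and field names are loosely based on the NVCF API and may differ by version, so treat them as assumptions to verify against the linked documentation.

```python
import json

# Hedged sketch of the two NVCF API requests corresponding to the steps
# above. Endpoint paths and field names are assumptions -- verify them
# against the NVCF Function Creation / Function Deployment docs.
CREATE_URL = "https://api.ngc.nvidia.com/v2/nvcf/functions"              # assumed
DEPLOY_URL = "https://api.ngc.nvidia.com/v2/nvcf/deployments/functions"  # assumed

# Step 1: function creation -- defines the container and invocation endpoint.
create_body = {
    "name": "my-inference-fn",                      # placeholder
    "containerImage": "nvcr.io/<org>/<image>:<tag>",
    "inferenceUrl": "/v1/infer",                    # path the container serves
    "inferencePort": 8000,
}

# Step 2: function deployment -- registers the function version on the
# NVIDIA Run:ai-managed GPU cluster, subject to the project's quota.
deploy_body = {
    "deploymentSpecifications": [
        {
            "cluster": "<runai-cluster-name>",  # the registered cluster
            "gpu": "H100",                      # GPU type available in the cluster
            "minInstances": 1,
            "maxInstances": 2,
        }
    ]
}

print(json.dumps({"create": create_body, "deploy": deploy_body}, indent=2))
```

Note that, per the hint below the steps, the function must use a standard container-based definition; custom Helm charts are not supported in this integration.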

{% hint style="info" %}
**Note**

Using a custom Helm chart when creating a [Cloud Function](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/function-creation.html) is not supported.
{% endhint %}

## Managing and Monitoring

After the NVCF function is deployed, it is added to the [Workloads](https://run-ai-docs.nvidia.com/self-hosted/workloads-in-nvidia-run-ai/workloads) table, where it can be managed and monitored:

* Monitor resource usage, performance, and execution status.
* Manage workload lifecycle, including scaling, logging, and troubleshooting.
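Besides the UI table, deployed functions can in principle be monitored programmatically. The sketch below only builds a filtered query URL; the endpoint and parameter names are assumptions for illustration, not a documented NVIDIA Run:ai API contract.

```python
from urllib.parse import urlencode

# Hypothetical sketch: composing a workloads query to monitor NVCF
# functions. The endpoint and parameter names are assumptions -- check
# your platform's API reference for the real workloads API.
WORKLOADS_URL = "https://<control-plane>/api/v1/workloads"  # assumed endpoint

params = {
    "filterBy": "projectName==<nvcf-project>",  # the project backed by nvcf-backend
    "sortBy": "phase",                          # e.g. group by execution status
}
query_url = f"{WORKLOADS_URL}?{urlencode(params)}"

print(query_url)
```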

## Supported Features

| NVIDIA Run:ai Functionality                                                                                                                                                      | NVCF |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--: |
| [Fairness](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#fairness-fair-resource-distribution)                |   ✓  |
| [Priority and preemption](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#priority-and-preemption)             |   ✓  |
| [Over quota](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#over-quota)                                       |   ✓  |
| [Node pools](https://run-ai-docs.nvidia.com/self-hosted/platform-management/aiinitiatives/resources/node-pools)                                                                  |   ✓  |
| [Bin packing / Spread](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#placement-strategy-bin-pack-and-spread) |   ✓  |
| [Multi-GPU fractions](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/resource-optimization/fractions)                                            |      |
| [Multi-GPU dynamic fractions](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/resource-optimization/dynamic-fractions)                            |      |
| [Node level scheduler](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/resource-optimization/node-level-scheduler)                                |   ✓  |
| [Multi-GPU memory swap](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/resource-optimization/memory-swap)                                        |      |
| [Gang scheduling](https://run-ai-docs.nvidia.com/self-hosted/platform-management/runai-scheduler/scheduling/concepts-and-principles#gang-scheduling)                             |   ✓  |
| [Monitoring](https://run-ai-docs.nvidia.com/self-hosted/infrastructure-setup/procedures/system-monitoring)                                                                       |   ✓  |
| [RBAC](https://run-ai-docs.nvidia.com/self-hosted/infrastructure-setup/authentication/overview#role-based-access-control-rbac-in-run-ai)                                         |   ✓  |
| Workload awareness                                                                                                                                                               |   ✓  |
| [Workload submission](https://run-ai-docs.nvidia.com/self-hosted/workloads-in-nvidia-run-ai/workloads)                                                                           |      |
| [Workload actions (stop/run)](https://run-ai-docs.nvidia.com/self-hosted/workloads-in-nvidia-run-ai/workloads)                                                                   |      |
| [Workload Policies](https://run-ai-docs.nvidia.com/self-hosted/platform-management/policies/workload-policies)                                                                   |      |
| [Scheduling rules](https://run-ai-docs.nvidia.com/self-hosted/platform-management/policies/scheduling-rules)                                                                     |      |
