Deploy NVIDIA Cloud Functions (NVCF) in NVIDIA Run:ai

NVIDIA Cloud Functions (NVCF) is a serverless API platform designed to deploy and manage AI workloads on GPUs. Through its integration with NVIDIA Run:ai, NVCF can be deployed directly onto NVIDIA Run:ai-managed GPU clusters. This allows users to take advantage of NVIDIA Run:ai's scheduling, quota management, and monitoring features. See Supported features for more details.

This guide provides the required steps for integrating NVIDIA Cloud Functions with the NVIDIA Run:ai platform.

Workload priority class

By default, inference workloads in NVIDIA Run:ai are assigned the inference priority class which is non-preemptible. This behavior ensures that inference workloads, which often serve real-time or latency-sensitive traffic, are guaranteed the resources they need and will not be disrupted by other workloads. For more details, see Workload priority class control.
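In Kubernetes terms, a priority class is attached to a workload through the pod spec's priorityClassName field. The sketch below illustrates where the inference class would appear; the manifest fields and image are illustrative, not the actual manifests NVIDIA Run:ai generates:

```python
# Illustrative pod spec showing where a priority class is attached.
# The "inference" class name follows the text above; the rest of the
# manifest is a generic example, not what NVIDIA Run:ai generates.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "nvcf-inference-example"},
    "spec": {
        "priorityClassName": "inference",  # non-preemptible by default
        "containers": [
            {"name": "server", "image": "example.com/inference:latest"}
        ],
    },
}

print(pod_spec["spec"]["priorityClassName"])
```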

Setup

Follow the official instructions provided in the NVIDIA Cloud Functions documentation.

Setting up a cluster in NVCF

Cloud Functions administrators can install the NVIDIA Cluster Agent to enable existing GPU clusters as deployment targets for NVCF functions. Once installed, the cluster appears as a deployment option in the API and Cloud Functions menu, allowing authorized functions to deploy on it. See Cluster Setup & Management to register, configure and verify the cluster.

Setting up a project in NVIDIA Run:ai

Once the cluster is registered to NVCF and appears as ready, create a project in the NVIDIA Run:ai UI with an NVCF namespace:

  1. Follow the instructions detailed in Projects to create a new project.

  2. When setting the Namespace, choose "Enter existing namespace from the cluster" and enter nvcf-backend.

  3. Assign the necessary resource quotas to the project.

Note

  • This is the designated NVIDIA Run:ai project for all NVCF functions; other projects will not be used.

  • NVCF is assigned to a specific project and adheres to the project's resource quota. Ensure that your project has sufficient quota allocated to accommodate the NVCF functions you plan to deploy.
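As a quick sanity check before deploying, you can compare the project's GPU quota against the GPUs your planned functions will request. The helper below is a hypothetical illustration of that arithmetic, not part of the NVIDIA Run:ai API:

```python
# Hypothetical helper: check whether a project's GPU quota can
# accommodate the GPUs requested by planned NVCF deployments.
def quota_accommodates(project_gpu_quota, requested_gpus):
    """Return True if the sum of requested GPUs fits in the project quota."""
    return sum(requested_gpus) <= project_gpu_quota

# A project with a quota of 8 GPUs fits two functions requesting
# 4 and 2 GPUs, but not a third requesting 4 more.
print(quota_accommodates(8, [4, 2]))     # True
print(quota_accommodates(8, [4, 2, 4]))  # False
```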

Deploying a function

In NVCF, function creation defines the code and resources while deployment registers the function to a GPU cluster, making it available for execution:

  1. Create the function as detailed in Function Creation.

  2. Deploy the function as detailed in Function Deployment.
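At the API level, deploying a function amounts to submitting a request that names the function version, the target cluster, and the instance configuration. The payload below is a hedged sketch of what such a request body might look like; the field names and placeholder values are illustrative, and the authoritative schema is in the Function Deployment documentation:

```python
import json

# Hypothetical deployment request body. Field names and values are
# illustrative only, not the authoritative NVCF API schema.
deployment_request = {
    "functionId": "<function-id>",
    "versionId": "<version-id>",
    "deploymentSpecifications": [
        {
            "cluster": "<runai-managed-cluster>",  # cluster registered via the Cluster Agent
            "gpu": "H100",                         # example GPU type
            "minInstances": 1,
            "maxInstances": 2,
        }
    ],
}

print(json.dumps(deployment_request, indent=2))
```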

Note

Using a custom Helm chart when creating a Cloud Function is not supported.

Managing and monitoring

After the NVCF function is deployed, it is added to the Workloads table, where it can be managed and monitored:

  • Monitor resource usage, performance, and execution status.

  • Manage workload lifecycle, including scaling, logging, and troubleshooting.
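As a sketch of the kind of monitoring the Workloads table enables, the snippet below aggregates pending GPU demand from a list of workload records. The record shape is invented for illustration and does not reflect the actual NVIDIA Run:ai API response:

```python
# Hypothetical workload records, shaped for illustration only.
workloads = [
    {"name": "nvcf-fn-a", "type": "Inference", "status": "Running", "gpus": 2},
    {"name": "nvcf-fn-b", "type": "Inference", "status": "Pending", "gpus": 4},
    {"name": "train-job", "type": "Training", "status": "Running", "gpus": 8},
]

def pending_gpu_demand(records):
    """Sum GPUs requested by workloads that are not yet running."""
    return sum(w["gpus"] for w in records if w["status"] == "Pending")

print(pending_gpu_demand(workloads))  # 4
```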

Supported features
