NVIDIA NIM

The NVIDIA NIM API provides endpoints to create and manage workloads that deploy NVIDIA Inference Microservices (NIM) through the NIM Operator. These workloads package optimized NVIDIA model servers and run as managed services on the NVIDIA Run:ai platform. Each request includes NVIDIA Run:ai scheduling metadata (for example, project, priority, and category) and a NIM service specification that defines the container image, compute resources, environment variables, storage, and networking configuration. Once submitted, NVIDIA Run:ai handles scheduling, orchestration, and lifecycle management of the NIM service to ensure reliable and efficient model serving.

Create an NVIDIA NIM service. [Experimental]

post

Create an NVIDIA NIM service

Authorizations
Authorization · string · Required

Bearer authentication

Body
Responses
post
/api/v2/workloads/nim-services
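As a minimal sketch of calling this endpoint, the snippet below builds the POST request with Python's standard library. The base URL, token, and every field name in the body (`projectId`, `priority`, `spec`, `compute`, and so on) are illustrative assumptions; consult the request body schema for the authoritative structure.

```python
import json
import urllib.request

BASE_URL = "https://my-cluster.run.ai"  # hypothetical tenant URL
TOKEN = "example-token"  # bearer token from your identity provider

payload = {
    # NVIDIA Run:ai scheduling metadata (field names assumed)
    "projectId": "team-a",
    "priority": "high",
    # NIM service specification (field names assumed)
    "spec": {
        "image": "nvcr.io/nim/meta/llama-3.1-8b-instruct:latest",
        "compute": {"gpuDevicesRequest": 1},
        "environmentVariables": [{"name": "NIM_LOG_LEVEL", "value": "INFO"}],
    },
}

req = urllib.request.Request(
    f"{BASE_URL}/api/v2/workloads/nim-services",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually submit the request
```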

Get an NVIDIA NIM service. [Experimental]

get

Retrieve the details of a specific NVIDIA NIM service by its ID.

Authorizations
Authorization · string · Required

Bearer authentication

Path parameters
WorkloadV2Id · string · uuid · Required

The ID of the workload.

Responses
200

Successfully retrieved the workload

application/json
get
/api/v2/workloads/nim-services/{WorkloadV2Id}
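A minimal sketch of fetching one NIM service by its workload ID follows; the base URL and the UUID are placeholders, not real resources.

```python
import json
import urllib.request

BASE_URL = "https://my-cluster.run.ai"  # hypothetical tenant URL
workload_id = "123e4567-e89b-12d3-a456-426614174000"  # example UUID

req = urllib.request.Request(
    f"{BASE_URL}/api/v2/workloads/nim-services/{workload_id}",
    headers={"Authorization": "Bearer example-token"},
)
# A 200 response carries the workload details as application/json:
# with urllib.request.urlopen(req) as resp:
#     service = json.load(resp)
```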

Update an NVIDIA NIM service spec. [Experimental]

patch

Update the specification of an existing NVIDIA NIM service.

Authorizations
Authorization · string · Required

Bearer authentication

Path parameters
WorkloadV2Id · string · uuid · Required

The ID of the workload.

Body
Responses
patch
/api/v2/workloads/nim-services/{WorkloadV2Id}
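The patch request can be sketched the same way. The body below (a `spec` object raising the GPU request) uses assumed field names purely for illustration; the actual patchable fields are defined by the endpoint's body schema.

```python
import json
import urllib.request

BASE_URL = "https://my-cluster.run.ai"  # hypothetical tenant URL
workload_id = "123e4567-e89b-12d3-a456-426614174000"  # example UUID

# Partial spec update; field names are illustrative assumptions.
spec_update = {"spec": {"compute": {"gpuDevicesRequest": 2}}}

req = urllib.request.Request(
    f"{BASE_URL}/api/v2/workloads/nim-services/{workload_id}",
    data=json.dumps(spec_update).encode("utf-8"),
    headers={
        "Authorization": "Bearer example-token",
        "Content-Type": "application/json",
    },
    method="PATCH",
)
# urllib.request.urlopen(req)  # uncomment to apply the update
```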
