The NVIDIA NIM API provides endpoints to create and manage workloads that deploy NVIDIA Inference Microservices (NIM) through the NIM Operator. These workloads package optimized NVIDIA model servers and run as managed services on the NVIDIA Run:ai platform. Each request includes NVIDIA Run:ai scheduling metadata (for example, project, priority, and category) and a NIM service specification that defines the container image, compute resources, environment variables, storage, and networking configuration. Once submitted, NVIDIA Run:ai handles scheduling, orchestration, and lifecycle management of the NIM service to ensure reliable and efficient model serving.
Create an NVIDIA NIM service. [Experimental]
POST
Authorizations
Authorization (string, required): Bearer authentication
Body
Responses
202: Workload creation accepted (application/json)
400: Bad request (application/json)
401: Unauthorized (application/json)
403: Forbidden (application/json)
409: The specified resource already exists (application/json)
500: Unexpected error (application/json)
503: Unexpected error (application/json)
POST /api/v2/workloads/nim-services
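The create endpoint above can be exercised with a minimal sketch like the following. The base URL, token, and body field names (`name`, `projectId`, the `spec` layout) are illustrative assumptions drawn from the description of scheduling metadata and the NIM service specification; consult the full request schema for the exact fields.

```python
import json
import urllib.request

BASE_URL = "https://my-cluster.example.com"  # hypothetical Run:ai endpoint
TOKEN = "my-bearer-token"  # placeholder Bearer token from your identity provider

# Illustrative request body: Run:ai scheduling metadata plus a NIM service
# spec (image, compute resources, environment variables). Field names are
# assumptions, not the exact schema.
payload = {
    "name": "llama3-nim",
    "projectId": "team-a",  # assumed field for the Run:ai project
    "spec": {
        "image": "nvcr.io/nim/meta/llama3-8b-instruct:latest",
        "compute": {"gpuDevicesRequest": 1},
        "environmentVariables": [{"name": "NIM_LOG_LEVEL", "value": "INFO"}],
    },
}

req = urllib.request.Request(
    f"{BASE_URL}/api/v2/workloads/nim-services",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# A 202 response indicates the workload was accepted for scheduling:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, json.load(resp))
```

The network call is left commented out; a 409 would indicate a service with the same name already exists, per the response codes above.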
Get an NVIDIA NIM service. [Experimental]
GET
Retrieve details of a specific NVIDIA NIM service, by ID.
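A corresponding retrieval sketch, assuming the ID is appended to the collection path (`/api/v2/workloads/nim-services/{workloadId}`); the path shape, base URL, token, and ID are assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "https://my-cluster.example.com"  # hypothetical Run:ai endpoint
TOKEN = "my-bearer-token"  # placeholder Bearer token
workload_id = "workload-123"  # placeholder ID returned when the service was created

# Assumed path shape: the workload ID appended to the collection endpoint.
req = urllib.request.Request(
    f"{BASE_URL}/api/v2/workloads/nim-services/{workload_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)

# with urllib.request.urlopen(req) as resp:
#     details = json.load(resp)  # JSON details of the NIM service
```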