The NVIDIA NIM API provides endpoints to create and manage workloads that deploy NVIDIA Inference Microservices (NIM) through the NIM Operator. These workloads package optimized NVIDIA model servers and run as managed services on the NVIDIA Run:ai platform. Each request includes NVIDIA Run:ai scheduling metadata (for example, project, priority, and category) and a NIM service specification that defines the container image, compute resources, environment variables, storage, and networking configuration. Once submitted, NVIDIA Run:ai handles scheduling, orchestration, and lifecycle management of the NIM service to ensure reliable and efficient model serving.
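For orientation, here is a minimal sketch of such a request body. The field names below (name, projectId, clusterId, spec, and the keys inside spec) are illustrative assumptions, not the authoritative body schema:

```python
# Illustrative payload for creating a NIM service. All field names are
# assumptions for the sake of the example, not the authoritative schema.
payload = {
    # NVIDIA Run:ai scheduling metadata (assumed field names)
    "name": "llama3-nim",
    "projectId": "proj-123",
    "clusterId": "cluster-abc",
    # NIM service specification (assumed field names)
    "spec": {
        "image": "nvcr.io/nim/meta/llama3-8b-instruct:latest",
        "compute": {"gpuDevicesRequest": 1},
        "environmentVariables": [
            {"name": "NIM_LOG_LEVEL", "value": "INFO"},
        ],
    },
}
```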
Create an NVIDIA NIM service. [Experimental]
post
Create an NVIDIA NIM service.
Authorizations
Authorization · string · Required
Bearer authentication
Body
Responses
202
Workload creation accepted
application/json
400
Bad submission request
application/json
401
Unauthorized
application/json
403
Forbidden
application/json
409
The specified resource already exists
application/json
500
Unexpected error
application/json
503
Unexpected error
application/json
post /api/v2/workloads/nim-services
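A minimal sketch of submitting a create request with Python's requests library, assuming a bearer token in a RUNAI_API_TOKEN environment variable and a tenant URL of https://myorg.run.ai (both hypothetical):

```python
import os

import requests

BASE_URL = "https://myorg.run.ai"      # hypothetical control-plane URL
TOKEN = os.environ["RUNAI_API_TOKEN"]  # assumed to hold a valid bearer token

# A payload like the one sketched above, trimmed for brevity;
# its field names remain illustrative assumptions.
payload = {
    "name": "llama3-nim",
    "projectId": "proj-123",
    "spec": {"image": "nvcr.io/nim/meta/llama3-8b-instruct:latest"},
}

resp = requests.post(
    f"{BASE_URL}/api/v2/workloads/nim-services",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
if resp.status_code == 202:
    print("creation accepted:", resp.json())
elif resp.status_code == 409:
    print("a NIM service with this name already exists")
else:
    resp.raise_for_status()  # surfaces 400/401/403/5xx
```

Note that 202 means the creation was accepted for scheduling, not that the service is already serving traffic.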
Get an NVIDIA NIM service. [Experimental]
get
Retrieve details of a specific NVIDIA NIM service by ID.
Authorizations
AuthorizationstringRequired
Bearer authentication
Path parameters
WorkloadV2Id · string · uuid · Required
The ID of the workload.
Responses
200
Successfully retrieved the workload
application/json
401
Unauthorized
application/json
403
Forbidden
application/json
404
The specified resource was not found
application/json
500
Unexpected error
application/json
503
Unexpected error
application/json
get /api/v2/workloads/nim-services/{WorkloadV2Id}
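A corresponding retrieval sketch, reusing the hypothetical token and tenant URL from the create example; the UUID is a placeholder:

```python
import os

import requests

BASE_URL = "https://myorg.run.ai"      # hypothetical control-plane URL
TOKEN = os.environ["RUNAI_API_TOKEN"]  # assumed to hold a valid bearer token
workload_id = "123e4567-e89b-12d3-a456-426614174000"  # placeholder WorkloadV2Id

resp = requests.get(
    f"{BASE_URL}/api/v2/workloads/nim-services/{workload_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
if resp.status_code == 404:
    print("no NIM service with that ID")
else:
    resp.raise_for_status()
    print(resp.json())  # 200: the workload details
```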
Update NVIDIA NIM service spec. [Experimental]
patch
Update the specification of an existing NVIDIA NIM service.
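The request path and body schema for this endpoint are not shown in this excerpt; the sketch below assumes the same per-workload path pattern as the GET endpoint, and the update fields are likewise assumptions:

```python
import os

import requests

BASE_URL = "https://myorg.run.ai"      # hypothetical control-plane URL
TOKEN = os.environ["RUNAI_API_TOKEN"]  # assumed to hold a valid bearer token
workload_id = "123e4567-e89b-12d3-a456-426614174000"  # placeholder WorkloadV2Id

# Hypothetical partial spec update; the accepted fields are assumptions.
update = {"spec": {"compute": {"gpuDevicesRequest": 2}}}

resp = requests.patch(
    # Path pattern assumed to mirror the GET endpoint above.
    f"{BASE_URL}/api/v2/workloads/nim-services/{workload_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=update,
)
resp.raise_for_status()
print(resp.json())
```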