Inferences

Inference workloads deploy trained models into a production environment to generate predictions from live data. These workloads are prioritized over Trainings and Workspaces during scheduling. NVIDIA Run:ai Inference workloads support auto-scaling to maintain service-level agreements (SLAs) by dynamically adjusting resources as demand changes.

Create an inference.

post

Create an inference using container-related fields.

Authorizations
Authorization: string, Required

Bearer authentication

Body
Responses
post
/api/v1/workloads/inferences
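A minimal Python sketch of this call. The host, token, and body fields below are placeholders, not values from this reference; a real create request carries the full set of container-related fields defined by the API schema (image, resources, autoscaling, and so on).

```python
import json
import urllib.request

BASE_URL = "https://my-runai-host"  # placeholder control-plane host
TOKEN = "my-bearer-token"           # placeholder API token

# Minimal illustrative body; the actual schema requires more fields.
body = {
    "name": "my-inference",
    "projectId": "my-project-id",
    "clusterId": "my-cluster-id",
}

req = urllib.request.Request(
    url=f"{BASE_URL}/api/v1/workloads/inferences",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; omitted here.
```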

Get inference data.

get

Retrieve inference details using a workload id.

Authorizations
Authorization: string, Required

Bearer authentication

Path parameters
workloadId: string · uuid, Required

The Universally Unique Identifier (UUID) of the workload.

Responses
200

Executed successfully.

application/json
get
/api/v1/workloads/inferences/{workloadId}
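A sketch of the same call in Python; the host, token, and workload id are placeholders:

```python
import urllib.request

BASE_URL = "https://my-runai-host"  # placeholder control-plane host
TOKEN = "my-bearer-token"           # placeholder API token
workload_id = "3fa85f64-5717-4562-b3fc-2c963f66afa6"  # example UUID

req = urllib.request.Request(
    url=f"{BASE_URL}/api/v1/workloads/inferences/{workload_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="GET",
)
# urllib.request.urlopen(req).read() would return the JSON inference details.
```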

Delete an inference.

delete

Delete an inference using a workload id.

Authorizations
Authorization: string, Required

Bearer authentication

Path parameters
workloadId: string · uuid, Required

The Universally Unique Identifier (UUID) of the workload.

Responses
delete
/api/v1/workloads/inferences/{workloadId}
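Deleting differs from the GET call above only in the HTTP method; again, host, token, and id are placeholders:

```python
import urllib.request

BASE_URL = "https://my-runai-host"  # placeholder control-plane host
TOKEN = "my-bearer-token"           # placeholder API token
workload_id = "3fa85f64-5717-4562-b3fc-2c963f66afa6"  # example UUID

req = urllib.request.Request(
    url=f"{BASE_URL}/api/v1/workloads/inferences/{workload_id}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="DELETE",
)
# urllib.request.urlopen(req) would delete the inference workload.
```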

Update inference spec.

patch

Update the specification of an existing inference workload.

Authorizations
Authorization: string, Required

Bearer authentication

Path parameters
workloadId: string · uuid, Required

The Universally Unique Identifier (UUID) of the workload.

Body
Responses
patch
/api/v1/workloads/inferences/{workloadId}
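A sketch of a partial update in Python. The body here is purely illustrative; which spec fields are patchable is defined by the API schema, and the host, token, and id are placeholders:

```python
import json
import urllib.request

BASE_URL = "https://my-runai-host"  # placeholder control-plane host
TOKEN = "my-bearer-token"           # placeholder API token
workload_id = "3fa85f64-5717-4562-b3fc-2c963f66afa6"  # example UUID

# Hypothetical partial spec update; consult the schema for real field names.
body = {"name": "my-inference-renamed"}

req = urllib.request.Request(
    url=f"{BASE_URL}/api/v1/workloads/inferences/{workload_id}",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="PATCH",
)
# urllib.request.urlopen(req) would apply the update; omitted here.
```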

Get inference metrics data.

get

Retrieve inference metrics data by id. Supported from control-plane version 2.18 or later.

Authorizations
Authorization: string, Required

Bearer authentication

Path parameters
workloadId: string · uuid, Required

The Universally Unique Identifier (UUID) of the workload.

Query parameters
start: string · date-time, Required

Start date of time range to fetch data in ISO 8601 timestamp format.

Example: 2023-06-06T12:09:18.211Z
end: string · date-time, Required

End date of time range to fetch data in ISO 8601 timestamp format.

Example: 2023-06-07T12:09:18.211Z
numberOfSamples: integer · max: 1000, Optional

The number of samples to take in the specified time range.

Default: 20 · Example: 20
Responses
200

Executed successfully.

get
/api/v1/workloads/inferences/{workloadId}/metrics
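The query parameters above can be encoded as in this Python sketch, using the example timestamps from this reference; host, token, and id remain placeholders:

```python
import urllib.parse
import urllib.request

BASE_URL = "https://my-runai-host"  # placeholder control-plane host
TOKEN = "my-bearer-token"           # placeholder API token
workload_id = "3fa85f64-5717-4562-b3fc-2c963f66afa6"  # example UUID

params = urllib.parse.urlencode({
    "start": "2023-06-06T12:09:18.211Z",  # ISO 8601, required
    "end": "2023-06-07T12:09:18.211Z",    # ISO 8601, required
    "numberOfSamples": 20,                # optional; default 20, max 1000
})
req = urllib.request.Request(
    url=f"{BASE_URL}/api/v1/workloads/inferences/{workload_id}/metrics?{params}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="GET",
)
# urllib.request.urlopen(req).read() would return the metrics samples as JSON.
```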

Get inference pod metrics data.

get

Retrieve inference pod metrics data by workload and pod id. Supported from control-plane version 2.18 or later.

Authorizations
Authorization: string, Required

Bearer authentication

Path parameters
workloadId: string · uuid, Required

The Universally Unique Identifier (UUID) of the workload.

podId: string · uuid, Required

The requested pod id.

Query parameters
start: string · date-time, Required

Start date of time range to fetch data in ISO 8601 timestamp format.

Example: 2023-06-06T12:09:18.211Z
end: string · date-time, Required

End date of time range to fetch data in ISO 8601 timestamp format.

Example: 2023-06-07T12:09:18.211Z
numberOfSamples: integer · max: 1000, Optional

The number of samples to take in the specified time range.

Default: 20 · Example: 20
Responses
200

Executed successfully.

get
/api/v1/workloads/inferences/{workloadId}/pods/{podId}/metrics
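The per-pod variant only adds the pod id to the path; both UUIDs below are illustrative placeholders:

```python
import urllib.parse
import urllib.request

BASE_URL = "https://my-runai-host"  # placeholder control-plane host
TOKEN = "my-bearer-token"           # placeholder API token
workload_id = "3fa85f64-5717-4562-b3fc-2c963f66afa6"  # example workload UUID
pod_id = "7c9e6679-7425-40de-944b-e07fc1f90ae7"       # example pod UUID

params = urllib.parse.urlencode({
    "start": "2023-06-06T12:09:18.211Z",
    "end": "2023-06-07T12:09:18.211Z",
})
req = urllib.request.Request(
    url=(f"{BASE_URL}/api/v1/workloads/inferences/"
         f"{workload_id}/pods/{pod_id}/metrics?{params}"),
    headers={"Authorization": f"Bearer {TOKEN}"},
    method="GET",
)
# urllib.request.urlopen(req).read() would return this pod's metrics as JSON.
```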
