runai inference standard submit

Submit an inference workload.

runai inference standard submit [flags]

Examples


# Submit a workload with scale to zero
runai inference submit <workload-name> -p <project-name> -i ghcr.io/knative/helloworld-go --gpu-devices-request 1 \
  --serving-port=8080 --min-replicas=0 --max-replicas=1

# Submit a workload with a template
runai inference submit <workload-name> -p <project-name> --template <template-name>

# Submit a workload with an asset
runai inference submit <workload-name> -p <project-name> --environment <environment-asset-name>

# Submit a workload with autoscaling and authorization
runai inference submit <workload-name> -p <project-name> -i ghcr.io/knative/helloworld-go --gpu-devices-request 1 \
  --serving-port=container=8080,authorization-type=authorizedUsersOrGroups,authorized-users=user1:user2:app1,protocol=http \
  --min-replicas=1 --max-replicas=4 --metric=concurrency --metric-threshold=100

Options

Options inherited from parent commands

SEE ALSO

  • runai inference standard - Runs a single inference process on one node. Suitable for smaller models or simpler inference tasks.