runai inference submit

Submit an inference workload.

runai inference submit [flags]

Examples


# Submit a workload with scale to zero
runai inference submit <workload-name> -p <project-name> -i ghcr.io/knative/helloworld-go --gpu-devices-request 1 \
    --serving-port=8080 --min-replicas=0 --max-replicas=1

# Submit a workload with arguments
runai inference submit <workload-name> -p <project-name> -i vllm/vllm-openai:latest --gpu-devices-request 1 --serving-port=8000 -- --model Qwen/Qwen3-0.6B

# Submit a workload with a template
runai inference submit <workload-name> -p <project-name> --template <template-name>

# Submit a workload with an asset
runai inference submit <workload-name> -p <project-name> --environment <environment-asset-name>

# Submit a workload with autoscaling and authorization
runai inference submit <workload-name> -p <project-name> -i ghcr.io/knative/helloworld-go --gpu-devices-request 1 \
    --serving-port=container=8080,authorization-type=authorizedUsersOrGroups,authorized-users=user1:user2:app1,protocol=http \
    --min-replicas=1 --max-replicas=4 --metric=concurrency --metric-threshold=100
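
# Once a workload is running, requests go to the URL the cluster assigns it. As a
# sketch only: the endpoint below is a placeholder, and the request path assumes the
# vLLM example above (vllm/vllm-openai serves an OpenAI-compatible API on the serving port).

```shell
# Placeholder: substitute the endpoint URL your cluster assigns to the workload.
ENDPOINT="https://<workload-name>.<cluster-domain>"

# OpenAI-compatible completion request against the vLLM example's serving port (8000).
curl "$ENDPOINT/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "prompt": "Hello", "max_tokens": 16}'
```

# Note: with --min-replicas=0 (scale to zero), the first request after an idle period
# triggers a scale-from-zero cold start, so expect extra latency on that call.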

Options

Options inherited from parent commands

SEE ALSO
