runai inference distributed submit

Submit a distributed inference workload.

Synopsis

Before using the flags, keep in mind:

By default (--worker-as-leader=true), worker pods inherit their configuration from the leader:

  • You only need to pass worker-specific (--worker-*) flags when the worker pods should differ from the leader

  • When --worker-as-leader=false is set, workers inherit nothing from the leader and require their own explicit --worker-* flags (see the sketch after this list)
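A minimal sketch of the difference between the two modes, using only flags that appear in the Examples below; <workload-name>, <project-name>, and <image> are placeholders:

# Default (--worker-as-leader=true): workers inherit the leader's configuration,
# and only the explicitly overridden field (here, the environment) differs
runai inference distributed submit <workload-name> -p <project-name> -i <image> --replicas 2 --workers 3 -g 1 --serving-port 80 --worker-environment ROLE=WORKER

# With --worker-as-leader=false: nothing is inherited, so worker flags such as
# --worker-image and --worker-gpu-devices-request must be set explicitly
runai inference distributed submit <workload-name> -p <project-name> -i <image> --replicas 2 --workers 3 -g 1 --serving-port 80 --worker-as-leader=false --worker-image <image> --worker-gpu-devices-request 1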

runai inference distributed submit [flags]

Examples

# Submit a workload
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --replicas 2 --workers 3 -g 1 --serving-port 80

# Submit a workload with a template
runai inference distributed submit <workload-name> -p <project-name> --template <template-name>

# Submit a workload where workers inherit leader settings but override specific fields
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --environment ROLE=LEADER --replicas 3 --workers 7 -g 2 --serving-port 80 --worker-environment ROLE=WORKER

# Submit a workload with a different worker setup (workers not based on leader configuration)
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --replicas 4 --workers 1 -g 8 --serving-port 80 --existing-pvc claimname=my-pvc,path=/data --worker-as-leader=false --worker-image tiangolo/uvicorn-gunicorn-fastapi:python3.9 --worker-gpu-devices-request 8

Options

Options inherited from parent commands

SEE ALSO

  • runai inference distributed - Runs multiple coordinated inference processes across multiple nodes. Required for models too large to run on a single node.
