By default (--worker-as-leader=true), worker pods inherit their configuration from the leader:
You only need to specify worker-specific flags if you want worker pods to differ from the leader
When --worker-as-leader=false is set, workers require their own explicit --worker-* flags
runai inference distributed submit [flags]
Examples
# Submit a workload
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --replicas 2 --workers 3 -g 1 --serving-port 80
# Submit a workload with a template
runai inference distributed submit <workload-name> -p <project-name> --template <template-name>
# Submit a workload where workers inherit leader settings but override specific fields
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --environment ROLE=LEADER --replicas 3 --workers 7 -g 2 --serving-port 80 --worker-environment ROLE=WORKER
# Submit a workload with a different worker setup (workers not based on leader configuration)
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --replicas 4 --workers 1 -g 8 --serving-port 80 --existing-pvc claimname=my-pvc,path=/data --worker-as-leader=false --worker-image tiangolo/uvicorn-gunicorn-fastapi:python3.9 --worker-gpu-devices-request 8
Options
Options inherited from parent commands
SEE ALSO
runai inference distributed - Runs multiple coordinated inference processes across multiple nodes. Required for models too large to run on a single node.
--annotation stringArray Set of annotations to populate into the container running the workload
--arguments string Specifies the arguments to pass to the container command
--capability stringArray Add POSIX capabilities to the container. Defaults to the runtime's default set.
--command string Specifies the command to run in the container, overriding the image's default entrypoint
--compute string Specifies the compute resource asset to use for the workload.
--configmap-map-volume stringArray Mount a ConfigMap as a volume. Format: name=CONFIGMAP_NAME,path=PATH,subpath=SUBPATH,default-mode=DEFAULT_MODE.
--cpu-core-limit positiveFloat Maximum number of CPU cores allowed (e.g. 0.5, 1).
--cpu-core-request positiveFloat Number of CPU cores to request (e.g. 0.5, 1).
--cpu-memory-limit string Maximum memory allowed (e.g. 1G, 500M).
--cpu-memory-request string Amount of memory to request (e.g. 1G, 500M).
--create-home-dir Create a temporary home directory for the container. Defaults to true when --run-as-user is set, false otherwise.
--datasource stringArray Specifies the data source asset to attach to the workload. Format: type=hostPath|nfs|pvc|git|s3|configMap|secretVolume,name=NAME
--empty-dir-volume stringArray Mount an empty directory as a volume. Format: name=NAME,path=PATH,medium=MEDIUM,size-limit=SIZE.
--env-pod-field-ref stringArray Set an environment variable from a pod field reference. Format: ENV_VARIABLE=FIELD_REFERENCE.
--env-secret stringArray Set an environment variable from a Kubernetes secret. Format: ENV_VARIABLE=secret-name,key=secret-key.
--environment stringArray Specifies the environment asset to use for the workload.
-e, --environment-variable stringArray Set environment variables in the container. Format: --environment-variable name=value --environment-variable name-b=value-b.
--exclude-node stringArray Nodes that will be excluded from use by the scheduler. Format: --exclude-node node-a --exclude-node node-b
--existing-pvc stringArray Mount an existing PersistentVolumeClaim. Format: claimname=CLAIM_NAME,path=PATH. Auto-complete supported.
--extended-resource stringArray Request access to a Kubernetes extended resource. Format: resource_name=quantity.
--git-sync stringArray Mount a Git repository into the container. Format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION.
-g, --gpu-devices-request positiveInt Number of GPU devices to allocate for the workload (e.g. 1, 2).
--gpu-memory-limit string Maximum GPU memory to allocate (e.g. 1G, 500M).
--gpu-memory-request string Amount of GPU memory to allocate (e.g. 1G, 500M).
--gpu-portion-limit positiveFloat Maximum GPU fraction allowed for the workload (between 0 and 1).
--gpu-portion-request positiveFloat Fraction of a GPU to allocate (between 0 and 1, e.g. 0.5).
--gpu-request-type string Type of GPU request: portion, memory
-h, --help help for submit
--host-path stringArray Mount a host path as a volume. Format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite.
-i, --image string The container image to use for the workload.
--image-pull-policy string Image pull policy for the container. Valid values: Always, IfNotPresent, Never.
--label stringArray Set of labels to populate into the container running the workspace
--large-shm Request a large /dev/shm device to mount in the container. Useful for memory-intensive workloads.
--mnnvl mnnvl-mode Multi-Node NVLINK (MNNVL) mode controls GPU network acceleration for distributed workloads. Valid values: None, Preferred, Required
--name-prefix string Set defined prefix for the workload name and add index as a suffix
--new-pvc stringArray Create and mount a new volume. This volume is used only for the duration of the workload's lifecycle. Format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral.
--nfs stringArray Mount an NFS volume. Format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite.
--node-pools stringArray Node pools to use for scheduling the job, ordered by priority. Format: --node-pools pool-a --node-pools pool-b
--node-type string Enforce node type affinity by setting a node-type label.
--pod-running-timeout duration Timeout for pod to reach running state (e.g. 5s, 2m, 3h).
--preemptibility preemptibility Specify whether the workload can be preempted by higher-priority workloads. Valid values: preemptible, non-preemptible. Overrides the default preemptibility for the workload type.
--preferred-pod-topology-key string If possible, schedule all pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
--priority string Sets the workload’s scheduling priority. Valid values: very-low, low, medium-low, medium, medium-high, high, very-high.
Overrides the default priority for the workload type. Changing priority does not update preemptibility automatically.
--privileged Grants the container full access to the host, bypassing almost all container isolation; the container acts like root.
-p, --project string Specify the project for the command to use. Defaults to the project set in the context, if any. Use 'runai project set <project>' to set the default.
--replicas int32 The number of replicas (sets of leader and workers) to run. Defaults to 1 (default 1)
--required-pod-topology-key string Require scheduling pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
--restart-policy distributed-inference-restart-policy Restart policy for distributed inference workload. Valid values: RecreateGroupOnPodRestart (restarts all pods if any pod fails), None (no automatic restart). Defaults to RecreateGroupOnPodRestart
--run-as-gid int Group ID to run the container as.
--run-as-uid int User ID to run the container as.
--run-as-user Set the user and group IDs for the container. Uses local terminal credentials if not specified.
--s3 stringArray Mount an S3 bucket as a volume. Format: name=NAME,bucket=BUCKET,path=PATH,accesskey=ACCESS_KEY,url=URL.
--seccomp-profile string Seccomp profile for the container. Valid values: RuntimeDefault, Unconfined, or Localhost.
--secret-volume stringArray Mount a Kubernetes Secret as a volume. Format: path=PATH,name=SECRET_RESOURCE_NAME.
--serving-port string Defines various attributes for the serving port. Usage formats: (1) Simplified format: --serving-port=CONTAINER_PORT (2) Full format: --serving-port=container=CONTAINER_PORT,[authorization-type=public|authenticatedUsers|authorizedUsers|authorizedGroups],[authorized-users=USER1:USER2:APP1...],[authorized-groups=GROUP1:GROUP2...],[expose-externally],[exposed-url=URL],[protocol=http]
--startup-policy distributed-inference-startup-policy Controls when worker pods start. Valid values: LeaderCreated (workers start when leader is created), LeaderReady (workers start when leader is ready). Defaults to LeaderCreated
--supplemental-groups ints Comma-separated list of group IDs for the container user.
--template string The name of the template to use for submitting this workload. Values from the template and CLI flags are combined. Flags with matching keys replace template values; other flags are applied in addition.
--termination-grace-period duration The length of time (like 5s or 2m, higher than zero) the workload's pod is expected to terminate gracefully upon probe failure. In case value is not specified, kubernetes default of 30 seconds applies (default 0s)
--toleration stringArray Add Kubernetes tolerations. Format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS].
--user-group-source string How to determine user/group IDs. Valid values: fromTheImage, fromIdpToken.
--wait-for-submit duration How long to wait for the workload to be created in the cluster. Default: 1m.
--worker-allow-privilege-escalation Allow worker pods to gain additional privileges after starting.
--worker-annotation stringArray Set of annotations to populate into worker pods running the workload
--worker-arguments string Specifies the arguments to pass to worker pod command
--worker-as-leader Use leader configuration for workers unless overridden with --worker-* flags. Set to false when workers require explicit configuration. (default true)
--worker-capability stringArray Add POSIX capabilities to worker pods. Defaults to the runtime's default set.
--worker-command string Specifies the command to run in worker pods, overriding the image's default entrypoint
--worker-configmap-map-volume stringArray Mount a ConfigMap as a volume for worker pods. Format: name=CONFIGMAP_NAME,path=PATH,subpath=SUBPATH,default-mode=DEFAULT_MODE.
--worker-cpu-core-limit positiveFloat Maximum number of CPU cores allowed per worker pod (e.g. 0.5, 1).
--worker-cpu-core-request positiveFloat Number of CPU cores to request per worker pod (e.g. 0.5, 1).
--worker-cpu-memory-limit string Maximum memory allowed per worker pod (e.g. 1G, 500M).
--worker-cpu-memory-request string Amount of memory to request per worker pod (e.g. 1G, 500M).
--worker-create-home-dir Create a temporary home directory for worker pods. Defaults to true when --worker-run-as-uid is set, false otherwise.
--worker-empty-dir-volume stringArray Mount an empty directory as a volume for worker pods. Format: path=PATH,medium=MEDIUM,size-limit=SIZE.
--worker-env-pod-field-ref stringArray Set an environment variable from a pod field reference for worker pods. Format: ENV_VARIABLE=FIELD_REFERENCE.
--worker-env-secret stringArray Set an environment variable from a Kubernetes secret for worker pods. Format: ENV_VARIABLE=secret-name,key=secret-key.
--worker-environment stringArray Set environment variables in worker pods. Format: --worker-environment name=value --worker-environment name-b=value-b.
--worker-exclude-node stringArray Nodes that will be excluded from use by worker pods. Format: --worker-exclude-node node-a --worker-exclude-node node-b
--worker-existing-pvc stringArray Mount an existing PersistentVolumeClaim for worker pods. Format: claimname=CLAIM_NAME,path=PATH. Auto-complete supported.
--worker-extended-resource stringArray Request access to a Kubernetes extended resource per worker pod. Format: resource_name=quantity.
--worker-git-sync stringArray Mount a Git repository into worker pods. Format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION.
--worker-gpu-devices-request positiveInt Number of GPU devices to allocate per worker pod (e.g. 1, 2).
--worker-gpu-memory-limit string Maximum GPU memory to allocate per worker pod (e.g. 1G, 500M).
--worker-gpu-memory-request string Amount of GPU memory to allocate per worker pod (e.g. 1G, 500M).
--worker-gpu-portion-limit positiveFloat Maximum GPU fraction allowed per worker pod (between 0 and 1).
--worker-gpu-portion-request positiveFloat Fraction of a GPU to allocate per worker pod (between 0 and 1, e.g. 0.5).
--worker-host-path stringArray Mount a host path as a volume for worker pods. Format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite.
--worker-image string The container image to use for worker pods.
--worker-image-pull-policy string Image pull policy for worker pods. Valid values: Always, IfNotPresent, Never.
--worker-label stringArray Set of labels to populate into worker pods running the workload
--worker-new-pvc stringArray Create and mount a new volume for worker pods. This volume is used only for the duration of the workload's lifecycle. Format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral.
--worker-nfs stringArray Mount an NFS volume for worker pods. Format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite.
--worker-node-type string Enforce node type affinity for worker pods by setting a node-type label.
--worker-preferred-pod-topology-key string If possible, schedule all worker pods on nodes with a matching label key and value. Format: key=VALUE.
--worker-privileged Grants worker pods full access to the host, bypassing almost all container isolation; the container acts like root.
--worker-required-pod-topology-key string Require scheduling worker pods on nodes with a matching label key and value. Format: key=VALUE.
--worker-run-as-gid int Group ID to run worker pods as.
--worker-run-as-uid int User ID to run worker pods as.
--worker-s3 stringArray Mount an S3 bucket as a volume for worker pods. Format: name=NAME,bucket=BUCKET,path=PATH,accesskey=ACCESS_KEY,url=URL.
--worker-secret-volume stringArray Mount a Kubernetes Secret as a volume for worker pods. Format: path=PATH,name=SECRET_RESOURCE_NAME.
--worker-supplemental-groups ints Comma-separated list of group IDs for worker pods.
--worker-toleration stringArray Add Kubernetes tolerations for worker pods. Format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS].
--worker-working-dir string Working directory inside worker pods. Overrides the default working directory set in the image.
--workers int32 the number of workers that will be allocated for running the workload
--working-dir string Working directory inside the container. Overrides the default working directory set in the image.
--config-file string config file name; can be set by environment variable RUNAI_CLI_CONFIG_FILE (default "config.json")
--config-path string config path; can be set by environment variable RUNAI_CLI_CONFIG_PATH
-d, --debug enable debug mode
-q, --quiet enable quiet mode, suppress all output except error messages
--verbose enable verbose mode