# runai inference distributed submit

submit a distributed inference workload

## Synopsis

Before using the flags, keep in mind:

By default (--worker-as-leader=true), worker pods inherit their configuration from the leader:

* You only need to specify worker-specific flags if you want worker pods to differ from the leader
* When --worker-as-leader=false is set, workers require their own explicit --worker-\* flags

```
runai inference distributed submit [flags]
```

## Examples

```

# Submit a workload
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --replicas 2 --workers 3 -g 1 --serving-port 80

# Submit a workload with a template
runai inference distributed submit <workload-name> -p <project-name> --template <template-name>

# Submit a workload where workers inherit leader settings but override specific fields
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --environment ROLE=LEADER --replicas 3 --workers 7 -g 2 --serving-port 80 --worker-environment ROLE=WORKER

# Submit a workload with a different worker setup (workers not based on leader configuration)
runai inference distributed submit <workload-name> -p <project-name> -i tiangolo/uvicorn-gunicorn-fastapi:python3.9 --replicas 4 --workers 1 -g 8 --serving-port 80 --existing-pvc claimname=my-pvc,path=/data --worker-as-leader=false --worker-image tiangolo/uvicorn-gunicorn-fastapi:python3.9 --worker-gpu-devices-request 8
```

## Options

```
      --annotation stringArray                                Set of annotations to populate into the container running the workload
      --arguments string                                      Specifies the arguments to pass to the container command
      --capability stringArray                                Add POSIX capabilities to the container. Defaults to the runtime's default set.
      --command string                                        Specifies the command to run in the container, overriding the image's default entrypoint
      --compute string                                        Specifies the compute resource asset to use for the workload.
      --configmap-map-volume stringArray                      Mount a ConfigMap as a volume. Format: name=CONFIGMAP_NAME,path=PATH,subpath=SUBPATH,default-mode=DEFAULT_MODE.
      --cpu-core-limit positiveFloat                          Maximum number of CPU cores allowed (e.g. 0.5, 1).
      --cpu-core-request positiveFloat                        Number of CPU cores to request (e.g. 0.5, 1).
      --cpu-memory-limit string                               Maximum memory allowed (e.g. 1G, 500M).
      --cpu-memory-request string                             Amount of memory to request (e.g. 1G, 500M).
      --create-home-dir                                       Create a temporary home directory for the container. Defaults to true when --run-as-user is set, false otherwise.
      --datasource stringArray                                Specifies the data source asset to attach to the workload. Format: type=hostPath|nfs|pvc|git|s3|configMap|secretVolume,name=NAME
      --empty-dir-volume stringArray                          Mount an empty directory as a volume. Format: name=NAME,path=PATH,medium=MEDIUM,size-limit=SIZE.
      --env-pod-field-ref stringArray                         Set an environment variable from a pod field reference. Format: ENV_VARIABLE=FIELD_REFERENCE.
      --env-secret stringArray                                Set an environment variable from a Kubernetes secret. Format: ENV_VARIABLE=secret-name,key=secret-key.
      --environment stringArray                               Specifies the environment asset to use for the workload.
  -e, --environment-variable stringArray                      Set environment variables in the container. Format: --environment-variable name=value --environment-variable name-b=value-b.
      --exclude-node stringArray                              Nodes that will be excluded from use by the scheduler. Format: --exclude-node node-a --exclude-node node-b
      --existing-pvc stringArray                              Mount an existing PersistentVolumeClaim. Format: claimname=CLAIM_NAME,path=PATH. Auto-complete supported.
      --extended-resource stringArray                         Request access to a Kubernetes extended resource. Format: resource_name=quantity.
      --git-sync stringArray                                  Mount a Git repository into the container. Format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION.
  -g, --gpu-devices-request positiveInt                       Number of GPU devices to allocate for the workload (e.g. 1, 2).
      --gpu-memory-limit string                               Maximum GPU memory to allocate (e.g. 1G, 500M).
      --gpu-memory-request string                             Amount of GPU memory to allocate (e.g. 1G, 500M).
      --gpu-portion-limit positiveFloat                       Maximum GPU fraction allowed for the workload (between 0 and 1).
      --gpu-portion-request positiveFloat                     Fraction of a GPU to allocate (between 0 and 1, e.g. 0.5).
      --gpu-request-type string                               Type of GPU request: portion, memory
  -h, --help                                                  help for submit
      --host-path stringArray                                 Mount a host path as a volume. Format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite.
  -i, --image string                                          The container image to use for the workload.
      --image-pull-policy string                              Image pull policy for the container. Valid values: Always, IfNotPresent, Never.
      --label stringArray                                     Set of labels to populate into the container running the workspace
      --large-shm                                             Request a large /dev/shm device to mount in the container. Useful for memory-intensive workloads.
      --mnnvl mnnvl-mode                                      Multi-Node NVLINK (MNNVL) mode controls GPU network acceleration for distributed workloads. Valid values: None, Preferred, Required
      --name-prefix string                                    Set defined prefix for the workload name and add index as a suffix
      --new-pvc stringArray                                   Create and mount a new volume. This volume is used only for the duration of the workload's lifecycle. Format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral.
      --nfs stringArray                                       Mount an NFS volume. Format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite.
      --node-pools stringArray                                Node pools to use for scheduling the job, ordered by priority. Format: --node-pools pool-a --node-pools pool-b
      --node-type string                                      Enforce node type affinity by setting a node-type label.
      --pod-running-timeout duration                          Timeout for pod to reach running state (e.g. 5s, 2m, 3h).
      --preemptibility preemptibility                         Specify whether the workload can be preempted by higher-priority workloads. Valid values: preemptible, non-preemptible. Overrides the default preemptibility for the workload type.
      --preferred-pod-topology-key string                     If possible, schedule all pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
      --priority string                                       Sets the workload’s scheduling priority. Valid values: very-low, low, medium-low, medium, medium-high, high, very-high. 
                                                              Overrides the default priority for the workload type. Changing priority does not update preemptibility automatically.
      --privileged                                            Grants the container full access to the host, bypassing almost all container isolation; the container acts like root.
  -p, --project string                                        Specify the project for the command to use. Defaults to the project set in the context, if any. Use 'runai project set <project>' to set the default.
      --replicas int32                                        The number of replicas (sets of leader and workers) to run. Defaults to 1 (default 1)
      --required-pod-topology-key string                      Require scheduling pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
      --restart-policy distributed-inference-restart-policy   Restart policy for distributed inference workload. Valid values: RecreateGroupOnPodRestart (restarts all pods if any pod fails), None (no automatic restart). Defaults to RecreateGroupOnPodRestart
      --run-as-gid int                                        Group ID to run the container as.
      --run-as-uid int                                        User ID to run the container as.
      --run-as-user                                           Set the user and group IDs for the container. Uses local terminal credentials if not specified.
      --s3 stringArray                                        Mount an S3 bucket as a volume. Format: name=NAME,bucket=BUCKET,path=PATH,accesskey=ACCESS_KEY,url=URL.
      --seccomp-profile string                                Seccomp profile for the container. Valid values: RuntimeDefault, Unconfined, or Localhost.
      --secret-volume stringArray                             Mount a Kubernetes Secret as a volume. Format: path=PATH,name=SECRET_RESOURCE_NAME.
      --serving-port string                                   Defines various attributes for the serving port. Usage formats: (1) Simplified format: --serving-port=CONTAINER_PORT (2) Full format: --serving-port=container=CONTAINER_PORT,[authorization-type=public|authenticatedUsers|authorizedUsers|authorizedGroups],[authorized-users=USER1:USER2:APP1...],[authorized-groups=GROUP1:GROUP2...],[expose-externally],[exposed-url=URL],[protocol=http]
      --startup-policy distributed-inference-startup-policy   Controls when worker pods start. Valid values: LeaderCreated (workers start when leader is created), LeaderReady (workers start when leader is ready). Defaults to LeaderCreated
      --supplemental-groups ints                              Comma-separated list of group IDs for the container user.
      --template string                                       The name of the template to use for submitting this workload. Values from the template and CLI flags are combined. Flags with matching keys replace template values; other flags are applied in addition.
      --termination-grace-period duration                     The length of time (like 5s or 2m, higher than zero) the workload's pod is expected to terminate gracefully upon probe failure. In case value is not specified, kubernetes default of 30 seconds applies (default 0s)
      --toleration stringArray                                Add Kubernetes tolerations. Format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS].
      --user-group-source string                              How to determine user/group IDs. Valid values: fromTheImage, fromIdpToken.
      --wait-for-submit duration                              How long to wait for the workload to be created in the cluster. Default: 1m.
      --worker-allow-privilege-escalation                     Allow worker pods to gain additional privileges after starting.
      --worker-annotation stringArray                         Set of annotations to populate into worker pods running the workload
      --worker-arguments string                               Specifies the arguments to pass to worker pod command
      --worker-as-leader                                      Use leader configuration for workers unless overridden with --worker-* flags. Set to false when workers require explicit configuration. (default true)
      --worker-capability stringArray                         Add POSIX capabilities to worker pods. Defaults to the runtime's default set.
      --worker-command string                                 Specifies the command to run in worker pods, overriding the image's default entrypoint
      --worker-configmap-map-volume stringArray               Mount a ConfigMap as a volume for worker pods. Format: name=CONFIGMAP_NAME,path=PATH,subpath=SUBPATH,default-mode=DEFAULT_MODE.
      --worker-cpu-core-limit positiveFloat                   Maximum number of CPU cores allowed per worker pod (e.g. 0.5, 1).
      --worker-cpu-core-request positiveFloat                 Number of CPU cores to request per worker pod (e.g. 0.5, 1).
      --worker-cpu-memory-limit string                        Maximum memory allowed per worker pod (e.g. 1G, 500M).
      --worker-cpu-memory-request string                      Amount of memory to request per worker pod (e.g. 1G, 500M).
      --worker-create-home-dir                                Create a temporary home directory for worker pods. Defaults to true when --worker-run-as-uid is set, false otherwise.
      --worker-empty-dir-volume stringArray                   Mount an empty directory as a volume for worker pods. Format: path=PATH,medium=MEDIUM,size-limit=SIZE.
      --worker-env-pod-field-ref stringArray                  Set an environment variable from a pod field reference for worker pods. Format: ENV_VARIABLE=FIELD_REFERENCE.
      --worker-env-secret stringArray                         Set an environment variable from a Kubernetes secret for worker pods. Format: ENV_VARIABLE=secret-name,key=secret-key.
      --worker-environment stringArray                        Set environment variables in worker pods. Format: --worker-environment name=value --worker-environment name-b=value-b.
      --worker-exclude-node stringArray                       Nodes that will be excluded from use by worker pods. Format: --worker-exclude-node node-a --worker-exclude-node node-b
      --worker-existing-pvc stringArray                       Mount an existing PersistentVolumeClaim for worker pods. Format: claimname=CLAIM_NAME,path=PATH. Auto-complete supported.
      --worker-extended-resource stringArray                  Request access to a Kubernetes extended resource per worker pod. Format: resource_name=quantity.
      --worker-git-sync stringArray                           Mount a Git repository into worker pods. Format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION.
      --worker-gpu-devices-request positiveInt                Number of GPU devices to allocate per worker pod (e.g. 1, 2).
      --worker-gpu-memory-limit string                        Maximum GPU memory to allocate per worker pod (e.g. 1G, 500M).
      --worker-gpu-memory-request string                      Amount of GPU memory to allocate per worker pod (e.g. 1G, 500M).
      --worker-gpu-portion-limit positiveFloat                Maximum GPU fraction allowed per worker pod (between 0 and 1).
      --worker-gpu-portion-request positiveFloat              Fraction of a GPU to allocate per worker pod (between 0 and 1, e.g. 0.5).
      --worker-host-path stringArray                          Mount a host path as a volume for worker pods. Format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite.
      --worker-image string                                   The container image to use for worker pods.
      --worker-image-pull-policy string                       Image pull policy for worker pods. Valid values: Always, IfNotPresent, Never.
      --worker-label stringArray                              Set of labels to populate into worker pods running the workload
      --worker-new-pvc stringArray                            Create and mount a new volume for worker pods. This volume is used only for the duration of the workload's lifecycle. Format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral.
      --worker-nfs stringArray                                Mount an NFS volume for worker pods. Format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite.
      --worker-node-type string                               Enforce node type affinity for worker pods by setting a node-type label.
      --worker-preferred-pod-topology-key string              If possible, schedule all worker pods on nodes with a matching label key and value. Format: key=VALUE.
      --worker-privileged                                     Grants worker pods full access to the host, bypassing almost all container isolation; the container acts like root.
      --worker-required-pod-topology-key string               Require scheduling worker pods on nodes with a matching label key and value. Format: key=VALUE.
      --worker-run-as-gid int                                 Group ID to run worker pods as.
      --worker-run-as-uid int                                 User ID to run worker pods as.
      --worker-s3 stringArray                                 Mount an S3 bucket as a volume for worker pods. Format: name=NAME,bucket=BUCKET,path=PATH,accesskey=ACCESS_KEY,url=URL.
      --worker-secret-volume stringArray                      Mount a Kubernetes Secret as a volume for worker pods. Format: path=PATH,name=SECRET_RESOURCE_NAME.
      --worker-supplemental-groups ints                       Comma-separated list of group IDs for worker pods.
      --worker-toleration stringArray                         Add Kubernetes tolerations for worker pods. Format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS].
      --worker-working-dir string                             Working directory inside worker pods. Overrides the default working directory set in the image.
      --workers int32                                         the number of workers that will be allocated for running the workload
      --working-dir string                                    Working directory inside the container. Overrides the default working directory set in the image.
```

## Options inherited from parent commands

```
      --config-file string   config file name; can be set by environment variable RUNAI_CLI_CONFIG_FILE (default "config.json")
      --config-path string   config path; can be set by environment variable RUNAI_CLI_CONFIG_PATH
  -d, --debug                enable debug mode
  -q, --quiet                enable quiet mode, suppress all output except error messages
      --verbose              enable verbose mode
```

## SEE ALSO

* [runai inference distributed](/self-hosted/2.24/reference/cli/runai/runai-inference-distributed.md) - Runs multiple coordinated inference processes across multiple nodes. Required for models too large to run on a single node.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://run-ai-docs.nvidia.com/self-hosted/2.24/reference/cli/runai/runai-inference-distributed-submit.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.