runai mpi submit

Submit an MPI training workload.

runai mpi submit [flags]

Examples

# Submit a workload
runai training mpi submit <name> -p <project_name> -i runai.jfrog.io/demo/quickstart-demo

# Submit a workload with arguments
runai training mpi submit <name> -p <project_name> -i ubuntu -- ls -la

# Submit a workload with a custom command
runai training mpi submit <name> -p <project_name> -i ubuntu --command -- echo "Hello, World"

# Submit a workload with a field reference
runai training mpi submit <name> -p <project_name> -i ubuntu --env-pod-field-ref "PROJECT=metadata.labels['project']"

# Submit a workload with master args and worker args
runai training mpi submit <name> -p <project_name> -i ubuntu --master-args "-a master_arg_a -b master_arg_b" -- '-a worker_arg_a'

# Submit a workload with a master command and worker args
runai training mpi submit <name> -p <project_name> -i ubuntu --master-command "echo -e 'master command'" -- '-a worker_arg_a'

# Submit a workload with a master command and a worker command
runai training mpi submit <name> -p <project_name> -i ubuntu --master-command "echo -e 'master command'" --command -- echo -e 'worker command'
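
# Submit a workload with multiple workers (dry-run sketch)
The examples above do not show the --workers and --slots-per-worker flags listed under Options. The following is a minimal sketch that assembles such a command and prints it instead of running it. The workload name, project name, and numeric values are hypothetical, and the slot arithmetic assumes the usual MPI convention that total slots equal workers multiplied by slots per worker.

```shell
# Hypothetical values; replace with your own.
NAME="mpi-demo"
PROJECT="team-a"
WORKERS=4
SLOTS_PER_WORKER=2

# Assumption: total MPI slots = workers x slots per worker.
TOTAL_SLOTS=$((WORKERS * SLOTS_PER_WORKER))

# Assemble the command using only flags documented on this page.
SUBMIT_CMD="runai training mpi submit $NAME -p $PROJECT -i ubuntu --workers $WORKERS --slots-per-worker $SLOTS_PER_WORKER"

# Dry run: print the command for review rather than executing it.
echo "$SUBMIT_CMD"
echo "total slots: $TOTAL_SLOTS"
```

Once the printed command looks right, it can be pasted into the terminal to submit the workload.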

Options

      --allow-privilege-escalation                          Allow the container to gain additional privileges after starting.
      --annotation stringArray                              Set of annotations to populate into the container running the workload
      --attach                                              Wait for the pod to start running, then attach to it as if 'runai attach' was called. Implies --tty and --stdin.
      --auto-deletion-time-after-completion duration        Automatically delete a completed job after a specified duration (e.g. 5s, 2m, 3h). (default 0s)
      --backoff-limit int32                                 Number of times to retry a failed job before marking it as failed.
      --capability stringArray                              Add POSIX capabilities to the container. Defaults to the runtime's default set.
      --clean-pod-policy string                             Specifies which pods will be deleted when the workload reaches a terminal state (completed/failed)
  -c, --command                                             If true, override the image's entrypoint with the command supplied after '--'
      --configmap-map-volume stringArray                    Mount a ConfigMap as a volume. Format: name=CONFIGMAP_NAME,path=PATH,subpath=SUBPATH.
      --cpu-core-limit positiveFloat                        Maximum number of CPU cores allowed (e.g. 0.5, 1).
      --cpu-core-request positiveFloat                      Number of CPU cores to request (e.g. 0.5, 1).
      --cpu-memory-limit string                             Maximum memory allowed (e.g. 1G, 500M).
      --cpu-memory-request string                           Amount of memory to request (e.g. 1G, 500M).
      --create-home-dir                                     Create a temporary home directory for the container. Defaults to true when --run-as-user is set, false otherwise.
      --env-pod-field-ref stringArray                       Set an environment variable from a pod field reference. Format: ENV_VARIABLE=FIELD_REFERENCE.
  -e, --environment stringArray                             Set environment variables in the container. Format: --environment name=value --environment name-b=value-b.
      --exclude-node stringArray                            Nodes that will be excluded from use by the scheduler. Format: --exclude-node node-a --exclude-node node-b
      --existing-pvc stringArray                            Mount an existing PersistentVolumeClaim. Format: claimname=CLAIM_NAME,path=PATH. Auto-complete supported.
      --extended-resource stringArray                       Request access to a Kubernetes extended resource. Format: resource_name=quantity.
      --external-url stringArray                            Expose a URL from the workload container. Format: container=PORT,url=https://external.runai.com,authusers=user1,authgroups=group1.
      --git-sync stringArray                                Mount a Git repository into the container. Format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION.
  -g, --gpu-devices-request positiveInt                     Number of GPU devices to allocate for the workload (e.g. 1, 2).
      --gpu-memory-limit string                             Maximum GPU memory to allocate (e.g. 1G, 500M).
      --gpu-memory-request string                           Amount of GPU memory to allocate (e.g. 1G, 500M).
      --gpu-portion-limit positiveFloat                     Maximum GPU fraction allowed for the workload (between 0 and 1).
      --gpu-portion-request positiveFloat                   Fraction of a GPU to allocate (between 0 and 1, e.g. 0.5).
      --gpu-request-type string                             Type of GPU request: portion, memory
  -h, --help                                                help for submit
      --host-ipc                                            Enable host IPC for the container. Default: false.
      --host-network                                        Enable host networking for the container. Default: false.
      --host-path stringArray                               Mount a host path as a volume. Format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite.
  -i, --image string                                        The container image to use for the workload.
      --image-pull-policy string                            Image pull policy for the container. Valid values: Always, IfNotPresent, Never.
      --label stringArray                                   Set of labels to populate into the container running the workload
      --large-shm                                           Request a large /dev/shm device to mount in the container. Useful for memory-intensive workloads.
      --launcher-creation-policy launcher-creation-policy   Launcher creation policy. Valid values: AtStartup, WaitForWorkersReady
      --master-args string                                  Specifies the arguments to pass to the master pod container command
      --master-command string                               Specifies the command to run in the master pod container, overriding the image's default entrypoint. The command can include arguments following it.
      --master-cpu-core-limit positiveFloat                 Maximum number of CPU cores allowed for the master pod (e.g., 0.5, 1).
      --master-cpu-core-request positiveFloat               Number of CPU cores to request for the master pod (e.g., 0.5, 1).
      --master-cpu-memory-limit string                      Maximum memory allowed for the master pod (e.g., 1G, 500M).
      --master-cpu-memory-request string                    Amount of memory to request for the master pod (e.g., 1G, 500M).
      --master-environment stringArray                      Set environment variables in the master container. Format: --master-environment name=value --master-environment name-b=value-b.
      --master-extended-resource stringArray                Request access to a Kubernetes extended resource for the master pod. Format: resource_name=quantity.
      --master-no-pvcs                                      Do not mount any persistent volumes in the master pod
      --master-restart-policy restart-policy                Restart policy for master pod. Valid values: Always, OnFailure, Never.
      --name-prefix string                                  Set a prefix for the workload name; an index is appended as a suffix.
      --new-pvc stringArray                                 Create and mount a new PersistentVolumeClaim. Format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral.
      --nfs stringArray                                     Mount an NFS volume. Format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite.
      --node-pools stringArray                              Node pools to use for scheduling the job, ordered by priority. Format: --node-pools pool-a --node-pools pool-b
      --node-type string                                    Enforce node type affinity by setting a node-type label.
      --pod-running-timeout duration                        Timeout for pod to reach running state (e.g. 5s, 2m, 3h).
      --port stringArray                                    Expose ports from the workload container. Format: service-type=NodePort,container=80,external=8080.
      --preferred-pod-topology-key string                   If possible, schedule all pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
      --priority string                                     Set the priority class for the workload.
  -p, --project string                                      Specify the project for the command to use. Defaults to the project set in the context, if any. Use 'runai project set <project>' to set the default.
      --required-pod-topology-key string                    Require scheduling pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
      --restart-policy restart-policy                       Restart policy for worker pods. Valid values: Always, OnFailure, Never.
      --run-as-gid int                                      Group ID to run the container as.
      --run-as-uid int                                      User ID to run the container as.
      --run-as-user                                         Set the user and group IDs for the container. Uses local terminal credentials if not specified.
      --s3 stringArray                                      Mount an S3 bucket as a volume. Format: name=NAME,bucket=BUCKET,path=PATH,accesskey=ACCESS_KEY,url=URL.
      --seccomp-profile string                              Seccomp profile for the container. Valid values: RuntimeDefault, Unconfined, or Localhost.
      --secret-volume stringArray                           Mount a Kubernetes Secret as a volume. Format: path=PATH,name=SECRET_RESOURCE_NAME.
      --slots-per-worker int32                              Number of slots to allocate for each worker
      --ssh-auth-mount-path string                          Specifies the directory where SSH keys are mounted.
      --stdin                                               Keep stdin open on the container(s) in the pod, even if nothing is attached.
      --supplemental-groups ints                            Comma-separated list of group IDs for the container user.
      --termination-grace-period duration                   Time allowed for the workload's pod to terminate gracefully upon probe failure (e.g. 5s, 2m; must be greater than zero). If not specified, the Kubernetes default of 30 seconds applies. (default 0s)
      --toleration stringArray                              Add Kubernetes tolerations. Format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS].
  -t, --tty                                                 Allocate a TTY for the container. Useful for interactive workloads.
      --user-group-source string                            How to determine user/group IDs. Valid values: fromTheImage, fromIdpToken.
      --wait-for-submit duration                            How long to wait for the workload to be created in the cluster. Default: 1m.
      --workers int32                                       Number of workers to allocate for running the workload.
      --working-dir string                                  Working directory inside the container. Overrides the default working directory set in the image.
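
Several of the options above take repeated flags or comma-separated key=value lists. The sketch below shows one way to combine them, building the argument list step by step and printing the result as a dry run. Every value (workload name, project, claim name, toleration key, environment variables) is a hypothetical placeholder; only the flag names and formats come from the list above.

```shell
# All values below are hypothetical placeholders.
NAME="mpi-demo"
PROJECT="team-a"

# Build the argument list incrementally with POSIX positional parameters.
set -- training mpi submit "$NAME" -p "$PROJECT" -i ubuntu
set -- "$@" --workers 2
set -- "$@" --gpu-devices-request 1
set -- "$@" --cpu-core-request 0.5 --cpu-memory-request 1G
# Repeat --environment once per variable.
set -- "$@" --environment LOG_LEVEL=debug --environment MODE=train
# Comma-separated key=value lists, following the documented formats.
set -- "$@" --existing-pvc claimname=my-claim,path=/data
set -- "$@" --toleration operator=Exists,key=gpu-only,effect=NoSchedule

# Dry run: print the command instead of executing it.
DRY_RUN_CMD="runai $*"
echo "$DRY_RUN_CMD"
# To submit for real, run: runai "$@"
```

Keeping one `set --` line per concern makes it easy to comment out a single flag group while experimenting.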

Options inherited from parent commands

      --config-file string   config file name; can be set by environment variable RUNAI_CLI_CONFIG_FILE (default "config.json")
      --config-path string   config path; can be set by environment variable RUNAI_CLI_CONFIG_PATH
  -d, --debug                enable debug mode
  -q, --quiet                enable quiet mode, suppress all output except error messages
      --verbose              enable verbose mode
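
The two configuration options above can also be supplied through the environment variables named in their descriptions. A minimal sketch, assuming a hypothetical configuration directory and file name:

```shell
# Hypothetical locations; adjust to your environment.
export RUNAI_CLI_CONFIG_PATH="${HOME:-/tmp}/.runai-staging"
export RUNAI_CLI_CONFIG_FILE="staging.json"

# Later runai commands in this shell session should pick these up,
# in place of passing --config-path and --config-file on each call.
echo "config path: $RUNAI_CLI_CONFIG_PATH"
echo "config file: $RUNAI_CLI_CONFIG_FILE"
```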
