runai training mpi submit
Submit an MPI training workload.
runai training mpi submit [flags]
Examples
# Submit a workload
runai training mpi submit <name> -p <project_name> -i runai.jfrog.io/demo/quickstart-demo
# Submit a workload with arguments
runai training mpi submit <name> -p <project_name> -i ubuntu -- ls -la
# Submit a workload with a custom command
runai training mpi submit <name> -p <project_name> -i ubuntu --command -- echo "Hello, World"
# Submit a workload with a field reference
runai training mpi submit <name> -p <project_name> -i ubuntu --env-pod-field-ref "PROJECT=metadata.labels['project']"
# Submit a workload with master args and worker args
runai training mpi submit <name> -p <project_name> -i ubuntu --master-args "-a master_arg_a -b master_arg_b" -- '-a worker_arg_a'
# Submit a workload with a master command and worker args
runai training mpi submit <name> -p <project_name> -i ubuntu --master-command "echo -e 'master command'" -- '-a worker_arg_a'
# Submit a workload with a master command and a worker command
runai training mpi submit <name> -p <project_name> -i ubuntu --master-command "echo -e 'master command'" --command -- echo -e 'worker command'
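The master/worker examples above depend on shell quoting: `--master-args` and `--master-command` each take a single string, while the words after `--` go to the worker containers (as arguments, or as a replacement entrypoint when `--command` is set). A minimal sketch, assuming bash 4+, that builds two of these commands as bash arrays so every argument boundary is explicit, and by default only prints them (dry run). The workload name, project, `DRY_RUN` variable, and `submit_or_print` function are hypothetical helpers, not part of the runai CLI.

```shell
#!/usr/bin/env bash
# Sketch only: NAME, PROJECT, DRY_RUN and submit_or_print are hypothetical.
NAME="mpi-demo"
PROJECT="team-a"

# --master-args receives ONE string; the single quoted word after '--'
# ('-a worker_arg_a') is passed through as the worker args.
master_args_example=(
  training mpi submit "$NAME" -p "$PROJECT" -i ubuntu
  --master-args "-a master_arg_a -b master_arg_b"
  -- "-a worker_arg_a"
)

# --master-command also receives ONE string. With --command, the words
# after '--' override the worker image's entrypoint.
master_worker_command_example=(
  training mpi submit "$NAME" -p "$PROJECT" -i ubuntu
  --master-command "echo -e 'master command'"
  --command -- echo -e "worker command"
)

# Print the command with each argument shell-quoted (default), or run it
# for real when DRY_RUN=0.
submit_or_print() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf 'runai'
    printf ' %q' "$@"
    printf '\n'
  else
    runai "$@"
  fi
}

submit_or_print "${master_args_example[@]}"
submit_or_print "${master_worker_command_example[@]}"
```

Both arrays use the same workload name, so for a real submission call `DRY_RUN=0 submit_or_print` with only one of them.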
Options
--allow-privilege-escalation Allow the container to gain additional privileges after starting.
--annotation stringArray Set of annotations to populate into the container running the workload
--attach Wait for the pod to start running, then attach to it as if 'runai attach' was called. Implies --tty and --stdin.
--auto-deletion-time-after-completion duration Automatically delete a completed job after a specified duration (e.g. 5s, 2m, 3h). (default 0s)
--backoff-limit int32 Number of times to retry a failed job before marking it as failed.
--capability stringArray Add POSIX capabilities to the container. Defaults to the runtime's default set.
--clean-pod-policy string Specifies which pods will be deleted when the workload reaches a terminal state (completed/failed)
-c, --command If true, override the image's entrypoint with the command supplied after '--'
--configmap-map-volume stringArray Mount a ConfigMap as a volume. Format: name=CONFIGMAP_NAME,path=PATH,subpath=SUBPATH.
--cpu-core-limit positiveFloat Maximum number of CPU cores allowed (e.g. 0.5, 1).
--cpu-core-request positiveFloat Number of CPU cores to request (e.g. 0.5, 1).
--cpu-memory-limit string Maximum memory allowed (e.g. 1G, 500M).
--cpu-memory-request string Amount of memory to request (e.g. 1G, 500M).
--create-home-dir Create a temporary home directory for the container. Defaults to true when --run-as-user is set, false otherwise.
--env-pod-field-ref stringArray Set an environment variable from a pod field reference. Format: ENV_VARIABLE=FIELD_REFERENCE.
-e, --environment stringArray Set environment variables in the container. Format: --environment name=value --environment name-b=value-b.
--exclude-node stringArray Nodes that will be excluded from use by the scheduler. Format: --exclude-node node-a --exclude-node node-b
--existing-pvc stringArray Mount an existing PersistentVolumeClaim. Format: claimname=CLAIM_NAME,path=PATH. Auto-complete supported.
--extended-resource stringArray Request access to a Kubernetes extended resource. Format: resource_name=quantity.
--external-url stringArray Expose a URL from the workload container. Format: container=PORT,url=https://external.runai.com,authusers=user1,authgroups=group1.
--git-sync stringArray Mount a Git repository into the container. Format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION.
-g, --gpu-devices-request positiveInt Number of GPU devices to allocate for the workload (e.g. 1, 2).
--gpu-memory-limit string Maximum GPU memory to allocate (e.g. 1G, 500M).
--gpu-memory-request string Amount of GPU memory to allocate (e.g. 1G, 500M).
--gpu-portion-limit positiveFloat Maximum GPU fraction allowed for the workload (between 0 and 1).
--gpu-portion-request positiveFloat Fraction of a GPU to allocate (between 0 and 1, e.g. 0.5).
--gpu-request-type string Type of GPU request. Valid values: portion, memory.
-h, --help help for submit
--host-ipc Enable host IPC for the container. Default: false.
--host-network Enable host networking for the container. Default: false.
--host-path stringArray Mount a host path as a volume. Format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite.
-i, --image string The container image to use for the workload.
--image-pull-policy string Image pull policy for the container. Valid values: Always, IfNotPresent, Never.
--label stringArray Set of labels to populate into the container running the workload
--large-shm Request a large /dev/shm device to mount in the container. Useful for memory-intensive workloads.
--launcher-creation-policy launcher-creation-policy Launcher creation policy. Valid values: AtStartup, WaitForWorkersReady
--master-args string Specifies the arguments to pass to the master pod container command
--master-command string Specifies the command to run in the master pod container, overriding the image's default entrypoint. The command can include arguments following it.
--master-cpu-core-limit positiveFloat Maximum number of CPU cores allowed for the master pod (e.g., 0.5, 1).
--master-cpu-core-request positiveFloat Number of CPU cores to request for the master pod (e.g., 0.5, 1).
--master-cpu-memory-limit string Maximum memory allowed for the master pod (e.g., 1G, 500M).
--master-cpu-memory-request string Amount of memory to request for the master pod (e.g., 1G, 500M).
--master-environment stringArray Set environment variables in the master container. Format: --master-environment name=value --master-environment name-b=value-b.
--master-extended-resource stringArray Request access to a Kubernetes extended resource for the master pod. Format: resource_name=quantity.
--master-no-pvcs Do not mount any persistent volumes in the master pod
--master-restart-policy restart-policy Restart policy for master pod. Valid values: Always, OnFailure, Never.
--name-prefix string Use the given prefix for the workload name and append an index as a suffix.
--new-pvc stringArray Create and mount a new volume. This volume is used only for the duration of the workload's lifecycle. Format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral.
--nfs stringArray Mount an NFS volume. Format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite.
--node-pools stringArray Node pools to use for scheduling the job, ordered by priority. Format: --node-pools pool-a --node-pools pool-b
--node-type string Enforce node type affinity by setting a node-type label.
--pod-running-timeout duration Timeout for pod to reach running state (e.g. 5s, 2m, 3h).
--port stringArray Expose ports from the workload container. Format: service-type=NodePort,container=80,external=8080.
--preferred-pod-topology-key string If possible, schedule all pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
--priority string Set the priority class for the workload.
-p, --project string Specify the project for the command to use. Defaults to the project set in the context, if any. Use 'runai project set <project>' to set the default.
--required-pod-topology-key string Require scheduling pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
--restart-policy restart-policy Restart policy for worker pods. Valid values: Always, OnFailure, Never.
--run-as-gid int Group ID to run the container as.
--run-as-uid int User ID to run the container as.
--run-as-user Set the user and group IDs for the container. Uses local terminal credentials if not specified.
--s3 stringArray Mount an S3 bucket as a volume. Format: name=NAME,bucket=BUCKET,path=PATH,accesskey=ACCESS_KEY,url=URL.
--seccomp-profile string Seccomp profile for the container. Valid values: RuntimeDefault, Unconfined, or Localhost.
--secret-volume stringArray Mount a Kubernetes Secret as a volume. Format: path=PATH,name=SECRET_RESOURCE_NAME.
--slots-per-worker int32 Number of slots to allocate for each worker
--ssh-auth-mount-path string Specifies the directory where SSH keys are mounted.
--stdin Keep stdin open on the container(s) in the pod, even if nothing is attached.
--supplemental-groups ints Comma-separated list of group IDs for the container user.
--termination-grace-period duration Time allowed for the workload's pods to terminate gracefully upon probe failure (e.g. 5s, 2m; must be greater than zero). If not specified, the Kubernetes default of 30 seconds applies. (default 0s)
--toleration stringArray Add Kubernetes tolerations. Format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS].
-t, --tty Allocate a TTY for the container. Useful for interactive workloads.
--user-group-source string How to determine user/group IDs. Valid values: fromTheImage, fromIdpToken.
--wait-for-submit duration How long to wait for the workload to be created in the cluster. Default: 1m.
--workers int32 Number of workers to allocate for running the workload.
--working-dir string Working directory inside the container. Overrides the default working directory set in the image.
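Several of the options above combine naturally in a multi-worker submission, and many take comma-separated `key=value` strings (for example `--existing-pvc claimname=CLAIM_NAME,path=PATH`). A minimal sketch, assuming bash 4+, that assembles such a submission from documented flags only and prints it by default. The workload name, project, image, PVC name, paths, sizes, the `EPOCHS` variable, and the `join_kv` helper are hypothetical; the assumption that the non-`--master-*` resource flags apply to worker pods is ours, not stated on this page.

```shell
#!/usr/bin/env bash
# Sketch only: every name, size, path and EPOCHS below is hypothetical.

# Join KEY=VALUE pairs with commas, producing the "key=value,key=value"
# strings expected by flags such as --existing-pvc.
join_kv() {
  local IFS=','
  printf '%s' "$*"
}

pvc_spec="$(join_kv claimname=training-data path=/data)"

args=(training mpi submit mpi-train -p team-a -i registry.example.com/mpi-train:latest)

# Topology: number of workers and slots per worker.
args+=(--workers 4 --slots-per-worker 1)

# Resource requests (assumed to apply to workers); the master pod has its
# own --master-* resource flags.
args+=(-g 1 --cpu-core-request 4 --cpu-memory-request 16G)
args+=(--master-cpu-core-request 1 --master-cpu-memory-request 2G)

# Storage: mount an existing PVC, but keep persistent volumes off the master.
args+=(--existing-pvc "$pvc_spec" --master-no-pvcs)

# Lifecycle: one of the documented launcher policies, and automatic
# deletion one hour after completion.
args+=(--launcher-creation-policy WaitForWorkersReady
  --auto-deletion-time-after-completion 1h)

# A hypothetical environment variable for the workload's containers.
args+=(--environment EPOCHS=10)

# Dry run by default: print the shell-quoted command instead of submitting.
if [[ "${DRY_RUN:-1}" == "1" ]]; then
  printf 'runai'
  printf ' %q' "${args[@]}"
  printf '\n'
else
  runai "${args[@]}"
fi
```

Building the command with `args+=(...)` keeps each group of flags next to a comment explaining it, and quoting `"${args[@]}"` preserves values that contain spaces or commas.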
Options inherited from parent commands
--config-file string config file name; can be set by environment variable RUNAI_CLI_CONFIG_FILE (default "config.json")
--config-path string config path; can be set by environment variable RUNAI_CLI_CONFIG_PATH
-d, --debug enable debug mode
-q, --quiet enable quiet mode, suppress all output except error messages
--verbose enable verbose mode
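The two inherited config options can also be set through the environment variables named above. A minimal sketch, assuming bash 4+ and a hypothetical alternative configuration directory and file name, that prepares one submission using the flags and one using `env` to set `RUNAI_CLI_CONFIG_PATH` and `RUNAI_CLI_CONFIG_FILE` for that invocation only. This page does not say which wins when both a flag and its variable are set, so the sketch uses one or the other. `VARIANT` and `DRY_RUN` are hypothetical script variables; by default the selected command is only printed.

```shell
#!/usr/bin/env bash
# Sketch only: CONFIG_DIR, CONFIG_NAME, VARIANT and DRY_RUN are hypothetical.
CONFIG_DIR="${HOME}/.runai-staging"
CONFIG_NAME="staging.json"

# Variant A: pass the inherited flags on the command line (-q keeps output
# to error messages only, which suits scripts).
flag_cmd=(runai training mpi submit mpi-demo -p team-a -i ubuntu -q
  --config-path "$CONFIG_DIR" --config-file "$CONFIG_NAME")

# Variant B: set the documented environment variables for this single
# invocation via env(1) instead of passing the flags.
env_cmd=(env "RUNAI_CLI_CONFIG_PATH=$CONFIG_DIR" "RUNAI_CLI_CONFIG_FILE=$CONFIG_NAME"
  runai training mpi submit mpi-demo -p team-a -i ubuntu -q)

# Pick exactly one variant so the workload is not submitted twice.
if [[ "${VARIANT:-A}" == "A" ]]; then chosen=("${flag_cmd[@]}"); else chosen=("${env_cmd[@]}"); fi

# Dry run by default: print the chosen command instead of executing it.
if [[ "${DRY_RUN:-1}" == "1" ]]; then
  printf '%q ' "${chosen[@]}"
  printf '\n'
else
  "${chosen[@]}"
fi
```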
SEE ALSO
runai training mpi - mpi management