runai inference submit

submit an inference workload

runai inference submit [flags]

Examples


# Submit a workload with scale to zero
runai inference submit <name> -p <project_name> -i ubuntu --gpu-devices-request 1 \
--serving-port=8000 --min-replicas=0 --max-replicas=1

# Submit a workload with autoscaling and authorization
runai inference submit <name> -p <project_name> -i ubuntu --gpu-devices-request 1 \
--serving-port=container=8000,authorization-type=authorizedUsersOrGroups,authorized-users=user1:user2,protocol=http \
--min-replicas=1 --max-replicas=4 --metric=concurrency --metric-threshold=100
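
The full --serving-port value in the second example is a comma-separated list of key=value fields (see the --serving-port entry under Options). As a minimal sketch, the value could be assembled in a script before submitting. The helper name build_serving_port is hypothetical and not part of the runai CLI; the field names are the ones listed for --serving-port. The command is printed rather than executed so the sketch runs without a cluster.

```shell
# Hypothetical helper (not part of the runai CLI): builds the full-format
# --serving-port value from its parts.
# Usage: build_serving_port CONTAINER_PORT AUTH_TYPE USERS PROTOCOL
#   USERS is a colon-separated list such as "user1:user2".
#   Pass "" for any optional part to leave it out.
build_serving_port() {
  sp_value="container=$1"
  if [ -n "$2" ]; then sp_value="${sp_value},authorization-type=$2"; fi
  if [ -n "$3" ]; then sp_value="${sp_value},authorized-users=$3"; fi
  if [ -n "$4" ]; then sp_value="${sp_value},protocol=$4"; fi
  printf '%s\n' "$sp_value"
}

serving_port=$(build_serving_port 8000 authorizedUsersOrGroups user1:user2 http)

# Print the command instead of running it.
echo "runai inference submit <name> -p <project_name> -i ubuntu --serving-port=${serving_port}"
```

Passing only the container port yields container=8000, which is the full-format equivalent of the simplified --serving-port=8000 form.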

Options

      --activation-replicas int32               The number of replicas to run when scaling-up from zero. Defaults to minReplicas, or to 1 if minReplicas is set to 0
      --annotation stringArray                  Set of annotations to populate into the container running the workload
      --attach                                  If true, wait for the pod to start running, and then attach to the pod as if 'runai attach' was called. Attach makes tty and stdin true by default. Defaults to false
      --capability stringArray                  The POSIX capabilities to add when running containers. Defaults to the default set of capabilities granted by the container runtime.
  -c, --command                                 If true, override the image's entrypoint with the command supplied after '--'
      --concurrency-hard-limit int32            The maximum number of requests allowed to flow to a single replica at any time. 0 means no limit
      --configmap-map-volume stringArray        Mount ConfigMap as a volume. Use the format: name=CONFIGMAP_NAME,path=PATH
      --cpu-core-limit float                    CPU core limit (e.g. 0.5, 1)
      --cpu-core-request float                  CPU core request (e.g. 0.5, 1)
      --cpu-memory-limit string                 CPU memory limit to allocate for the job (e.g. 1G, 500M)
      --cpu-memory-request string               CPU memory to allocate for the job (e.g. 1G, 500M)
      --create-home-dir                         Create a temporary home directory. Defaults to true when --run-as-user is set, false otherwise
      --env-pod-field-ref stringArray           Set an environment variable in the container with a field reference as value. Format: "ENV_VARIABLE=FIELD_REFERENCE"
  -e, --environment stringArray                 Set environment variables in the container
      --existing-pvc stringArray                Mount an existing persistent volume. Use the format: claimname=CLAIM_NAME,path=PATH <auto-complete supported>
      --extended-resource stringArray           Request access to an extended resource. Use the format: resource_name=quantity
      --external-url stringArray                Expose URL from the job container. Use the format: container=9443,url=https://external.runai.com,authusers=user1,authgroups=group1
      --git-sync stringArray                    Specifies git repositories to mount into the container. Use the format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION
  -g, --gpu-devices-request int32               GPU units to allocate for the job (e.g. 1, 2)
      --gpu-memory-limit string                 GPU memory limit to allocate for the job (e.g. 1G, 500M)
      --gpu-memory-request string               GPU memory to allocate for the job (e.g. 1G, 500M)
      --gpu-portion-limit float                 GPU portion limit, must be no less than the gpu-portion-request (between 0 and 1, e.g. 0.5, 0.2)
      --gpu-portion-request float               GPU portion request (between 0 and 1, e.g. 0.5, 0.2)
      --gpu-request-type string                 GPU request type (portion|memory|migProfile[Deprecated])
  -h, --help                                    help for submit
      --host-path stringArray                   host paths (Volumes) to mount into the container. Format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite
  -i, --image string                            The image for the workload
      --image-pull-policy string                Set image pull policy. One of: Always, IfNotPresent, Never (default "Always")
      --initial-replicas int32                  The number of replicas to run when initializing the workload for the first time. Defaults to minReplicas, or to 1 if minReplicas is set to 0
      --initialization-timeout-seconds int32    The maximum amount of time (in seconds) to wait for the container to become ready
      --label stringArray                       Set of labels to populate into the container running the workload
      --large-shm                               Request large /dev/shm device to mount
      --max-replicas int32                      The maximum number of replicas for autoscaling. Defaults to minReplicas, or to 1 if minReplicas is set to 0
      --metric string                           Autoscaling metric is required if minReplicas < maxReplicas, except when minReplicas = 0 and maxReplicas = 1. Use 'throughput', 'concurrency', 'latency', or custom metrics.
      --metric-threshold int32                  The threshold to use with the specified metric for autoscaling. Mandatory if metric is specified
      --metric-threshold-percentage float32     The percentage of metric threshold value to use for autoscaling. Defaults to 70. Applicable only with the 'throughput' and 'concurrency' metrics
      --min-replicas int32                      The minimum number of replicas for autoscaling. Defaults to 1. Use 0 to allow scale-to-zero
      --name-prefix string                      Set defined prefix for the workload name and add index as a suffix
      --new-pvc stringArray                     Mount a persistent volume, create it if it does not exist. Use the format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral
      --nfs stringArray                         NFS volumes to use in the workload. Format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite
      --node-pools stringArray                  List of node pools to use for scheduling the job, ordered by priority
      --node-type string                        Enforce node type affinity by setting a node-type label
      --pod-running-timeout duration            Timeout for waiting for the pod to reach the running state
      --port stringArray                        Expose ports from the job container. Use the format: service-type=NodePort,container=80,external=8080
      --preferred-pod-topology-key string       If possible, all pods of this job will be scheduled onto nodes that have a label with this key and identical values
      --priority priority-class                 Priority class of the workload (build|train|interactive-preemptible). The default value for workspace is 'build'; it can be changed to 'interactive-preemptible' to allow the workload to use over-quota resources. The default value for training is 'train'; it can be changed to 'build' to give the training workload a higher priority for in-queue scheduling and make it non-preemptible (if it is within its deserved quota).
  -p, --project string                          Specify the project to which the command applies. By default, commands apply to the default project. To change the default project, use 'runai config project <project name>'
      --required-pod-topology-key string        Enforce scheduling pods of this job onto nodes that have a label with this key and identical values
      --run-as-gid int                          The group ID the container will run with
      --run-as-uid int                          The user ID the container will run with
      --run-as-user                             Take the uid, gid, and supplementary groups fields from the token. If none of the fields exist, use the local running terminal user credentials. If only some of the fields exist, use only those
      --scale-down-delay-seconds int32          The minimum amount of time (in seconds) that a replica will remain active after a scale-down decision
      --scale-to-zero-retention-seconds int32   The minimum amount of time (in seconds) that the last replica will remain active after a scale-to-zero decision. Defaults to 0. Available only if minReplicas is set to 0
      --seccomp-profile string                  Indicates which kind of seccomp profile will be applied to the container, options: RuntimeDefault|Unconfined|Localhost
      --secret-volume stringArray               Secret volumes to use in the workload. Format: path=PATH,name=SECRET_RESOURCE_NAME
      --serving-port string                     Defines various attributes for the serving port. Usage formats: (1) Simplified format: --serving-port=CONTAINER_PORT (2) Full format: --serving-port=container=CONTAINER_PORT,[authorization-type=public|authenticatedUsers|authorizedUsersOrGroups],[authorized-users=USER1:USER2...],[authorized-groups=GROUP1:GROUP2...],[cluster-local-access-only],[protocol=http|grpc]
      --supplemental-groups ints                Comma-separated list of group IDs that the user running the container belongs to
      --toleration stringArray                  Toleration details. Use the format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS]
      --user-group-source string                Indicate the way to determine the user and group IDs of the container, options: fromTheImage|fromIdpToken
      --wait-for-submit duration                Waiting duration for the workload to be created in the cluster. Defaults to 1 minute (1m)
      --working-dir string                      Set the container's working directory
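
The autoscaling rules stated in the --metric, --activation-replicas, --initial-replicas, and --max-replicas descriptions can be checked in a script before submitting. The following is a minimal sketch under that reading of the descriptions; the function names metric_required and default_replicas are hypothetical, and this is not how the CLI itself validates its flags.

```shell
# Hypothetical pre-submit checks (not part of the runai CLI).

# Per the --metric description: a metric is required when
# minReplicas < maxReplicas, except when minReplicas = 0 and maxReplicas = 1.
# Usage: metric_required MIN_REPLICAS MAX_REPLICAS
metric_required() {
  if [ "$1" -eq 0 ] && [ "$2" -eq 1 ]; then
    echo no            # scale-to-zero special case
  elif [ "$1" -lt "$2" ]; then
    echo yes           # real autoscaling range, a metric must be given
  else
    echo no            # fixed replica count
  fi
}

# Per the --activation-replicas, --initial-replicas, and --max-replicas
# descriptions: the default is minReplicas, or 1 if minReplicas is 0.
# Usage: default_replicas MIN_REPLICAS
default_replicas() {
  if [ "$1" -eq 0 ]; then echo 1; else echo "$1"; fi
}

metric_required 0 1    # prints "no"
metric_required 1 4    # prints "yes"
default_replicas 0     # prints "1"
default_replicas 2     # prints "2"
```

With --min-replicas=1 and --max-replicas=4, as in the second example, the check reports that --metric (and therefore --metric-threshold) must be supplied.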

Options inherited from parent commands

      --config-file string   config file name; can be set by environment variable RUNAI_CLI_CONFIG_FILE (default "config.json")
      --config-path string   config path; can be set by environment variable RUNAI_CLI_CONFIG_PATH
  -d, --debug                enable debug mode
  -q, --quiet                enable quiet mode, suppress all output except error messages
      --verbose              enable verbose mode

SEE ALSO
