runai inference submit

Submit an inference workload.

runai inference submit [flags]

Examples


# Submit a workload with scale to zero
runai inference submit <name> -p <project_name> -i ghcr.io/knative/helloworld-go --gpu-devices-request 1 \
  --serving-port=8080 --min-replicas=0 --max-replicas=1

# Submit a workload with autoscaling and authorization
runai inference submit <name> -p <project_name> -i ghcr.io/knative/helloworld-go --gpu-devices-request 1 \
  --serving-port=container=8080,authorization-type=authorizedUsersOrGroups,authorized-users=user1:user2:app1,protocol=http \
  --min-replicas=1 --max-replicas=4 --metric=concurrency --metric-threshold=100
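
The scale-to-zero example above can be combined with --scale-to-zero-retention-seconds, which the Options section describes as available only when minReplicas is 0. The following is a minimal sketch, not a definitive invocation: the workload name, project name, and the 300-second retention value are hypothetical placeholders. It only prints the assembled command so it can be reviewed before running.

```shell
# Hypothetical values; replace with your own workload and project names.
NAME="demo-inference"
PROJECT="team-a"

# Collect the arguments as positional parameters so each flag sits on its own line.
set -- inference submit "$NAME" \
  -p "$PROJECT" \
  -i ghcr.io/knative/helloworld-go \
  --gpu-devices-request 1 \
  --serving-port=8080 \
  --min-replicas=0 \
  --max-replicas=1 \
  --scale-to-zero-retention-seconds=300

# Dry run: print the command for review instead of submitting it.
CMD="runai $*"
echo "$CMD"
```

To submit for real, run the printed command, or replace the last two lines with `runai "$@"`.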

Options

      --activation-replicas int32               The number of replicas to run when scaling-up from zero. Defaults to minReplicas, or to 1 if minReplicas is set to 0
      --annotation stringArray                  Set of annotations to populate into the container running the workload
      --capability stringArray                  Add POSIX capabilities to the container. Defaults to the runtime's default set.
  -c, --command                                 If true, override the image's entrypoint with the command supplied after '--'
      --concurrency-hard-limit int32            The maximum number of requests allowed to flow to a single replica at any time. 0 means no limit
      --configmap-map-volume stringArray        Mount a ConfigMap as a volume. Format: name=CONFIGMAP_NAME,path=PATH,subpath=SUBPATH,default-mode=DEFAULT_MODE.
      --cpu-core-limit positiveFloat            Maximum number of CPU cores allowed (e.g. 0.5, 1).
      --cpu-core-request positiveFloat          Number of CPU cores to request (e.g. 0.5, 1).
      --cpu-memory-limit string                 Maximum memory allowed (e.g. 1G, 500M).
      --cpu-memory-request string               Amount of memory to request (e.g. 1G, 500M).
      --create-home-dir                         Create a temporary home directory for the container. Defaults to true when --run-as-user is set, false otherwise.
      --env-pod-field-ref stringArray           Set an environment variable from a pod field reference. Format: ENV_VARIABLE=FIELD_REFERENCE.
  -e, --environment stringArray                 Set environment variables in the container. Format: --environment name=value --environment name-b=value-b.
      --exclude-node stringArray                Nodes that will be excluded from use by the scheduler. Format: --exclude-node node-a --exclude-node node-b
      --existing-pvc stringArray                Mount an existing PersistentVolumeClaim. Format: claimname=CLAIM_NAME,path=PATH. Auto-complete supported.
      --extended-resource stringArray           Request access to a Kubernetes extended resource. Format: resource_name=quantity.
      --external-url stringArray                Expose a URL from the workload container. Format: container=PORT,url=https://external.runai.com,authusers=user1,authgroups=group1.
      --git-sync stringArray                    Mount a Git repository into the container. Format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION.
  -g, --gpu-devices-request positiveInt         Number of GPU devices to allocate for the workload (e.g. 1, 2).
      --gpu-memory-limit string                 Maximum GPU memory to allocate (e.g. 1G, 500M).
      --gpu-memory-request string               Amount of GPU memory to allocate (e.g. 1G, 500M).
      --gpu-portion-limit positiveFloat         Maximum GPU fraction allowed for the workload (between 0 and 1).
      --gpu-portion-request positiveFloat       Fraction of a GPU to allocate (between 0 and 1, e.g. 0.5).
      --gpu-request-type string                 Type of GPU request: portion, memory
  -h, --help                                    help for submit
      --host-path stringArray                   Mount a host path as a volume. Format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite.
  -i, --image string                            The container image to use for the workload.
      --image-pull-policy string                Image pull policy for the container. Valid values: Always, IfNotPresent, Never.
      --initial-replicas int32                  The number of replicas to run when initializing the workload for the first time. Defaults to minReplicas, or to 1 if minReplicas is set to 0
      --initialization-timeout-seconds int32    The maximum amount of time (in seconds) to wait for the container to become ready
      --label stringArray                       Set of labels to populate into the container running the workload
      --large-shm                               Request a large /dev/shm device to mount in the container. Useful for memory-intensive workloads.
      --max-replicas int32                      The maximum number of replicas for autoscaling. Defaults to minReplicas, or to 1 if minReplicas is set to 0
      --metric string                           Autoscaling metric is required if minReplicas < maxReplicas, except when minReplicas = 0 and maxReplicas = 1. Use 'throughput', 'concurrency', 'latency', or custom metrics.
      --metric-threshold int32                  The threshold to use with the specified metric for autoscaling. Mandatory if metric is specified
      --metric-threshold-percentage float32     The percentage of metric threshold value to use for autoscaling. Defaults to 70. Applicable only with the 'throughput' and 'concurrency' metrics
      --min-replicas int32                      The minimum number of replicas for autoscaling. Defaults to 1. Use 0 to allow scale-to-zero
      --name-prefix string                      Set defined prefix for the workload name and add index as a suffix
      --new-pvc stringArray                     Create and mount a new volume. This volume is used only for the duration of the workload's lifecycle. Format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral.
      --nfs stringArray                         Mount an NFS volume. Format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite.
      --node-pools stringArray                  Node pools to use for scheduling the workload, ordered by priority. Format: --node-pools pool-a --node-pools pool-b
      --node-type string                        Enforce node type affinity by setting a node-type label.
      --pod-running-timeout duration            Timeout for pod to reach running state (e.g. 5s, 2m, 3h).
      --port stringArray                        Expose ports from the workload container. Format: service-type=NodePort,container=80,external=8080.
      --preferred-pod-topology-key string       If possible, schedule all pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
      --priority string                         Set the priority class for the workload.
      --privileged                              Grants the container full access to the host, bypassing almost all container isolation; the container acts like root.
  -p, --project string                          Specify the project for the command to use. Defaults to the project set in the context, if any. Use 'runai project set <project>' to set the default.
      --request-timeout-seconds int32           The maximum time (in seconds) allowed to process an end-user request. If no response is returned within this time, the request will be ignored
      --required-pod-topology-key string        Require scheduling pods of this workload on nodes with a matching label key and value. Format: key=VALUE.
      --run-as-gid int                          Group ID to run the container as.
      --run-as-uid int                          User ID to run the container as.
      --run-as-user                             Set the user and group IDs for the container. Uses local terminal credentials if not specified.
      --scale-down-delay-seconds int32          The minimum amount of time (in seconds) that a replica will remain active after a scale-down decision
      --scale-to-zero-retention-seconds int32   The minimum amount of time (in seconds) that the last replica will remain active after a scale-to-zero decision. Defaults to 0. Available only if minReplicas is set to 0
      --seccomp-profile string                  Seccomp profile for the container. Valid values: RuntimeDefault, Unconfined, or Localhost.
      --secret-volume stringArray               Mount a Kubernetes Secret as a volume. Format: path=PATH,name=SECRET_RESOURCE_NAME.
      --serving-port string                     Defines various attributes for the serving port. Usage formats: (1) Simplified format: --serving-port=CONTAINER_PORT (2) Full format: --serving-port=container=CONTAINER_PORT,[authorization-type=public|authenticatedUsers|authorizedUsersOrGroups],[authorized-users=USER1:USER2:APP1...],[authorized-groups=GROUP1:GROUP2...],[cluster-local-access-only],[protocol=http|grpc]
      --supplemental-groups ints                Comma-separated list of group IDs for the container user.
      --toleration stringArray                  Add Kubernetes tolerations. Format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS].
      --user-group-source string                How to determine user/group IDs. Valid values: fromTheImage, fromIdpToken.
      --wait-for-submit duration                How long to wait for the workload to be created in the cluster. Default: 1m.
      --working-dir string                      Working directory inside the container. Overrides the default working directory set in the image.
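
The --metric description states when an autoscaling metric is required: whenever minReplicas < maxReplicas, except for the pair minReplicas = 0 and maxReplicas = 1. As an illustration, that rule can be checked in a wrapper script before calling the CLI. The helper name `metric_required` is hypothetical and is not part of runai; this is a sketch of the rule as documented, not of how the CLI validates it internally.

```shell
# Returns 0 (true) when --metric must be supplied, following the documented rule:
# required if minReplicas < maxReplicas, except when minReplicas = 0 and maxReplicas = 1.
# metric_required is a hypothetical helper, not a runai command.
metric_required() {
  min=$1
  max=$2
  # Documented exception: plain scale-to-zero with a single replica.
  if [ "$min" -eq 0 ] && [ "$max" -eq 1 ]; then
    return 1
  fi
  [ "$min" -lt "$max" ]
}

# Try a few min/max pairs.
for pair in "0 1" "1 4" "2 2"; do
  set -- $pair
  if metric_required "$1" "$2"; then
    echo "min=$1 max=$2: --metric required"
  else
    echo "min=$1 max=$2: --metric not required"
  fi
done
```

When the check reports that a metric is required, --metric-threshold must be passed as well, since it is mandatory whenever --metric is specified.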

Options inherited from parent commands

      --config-file string   config file name; can be set by environment variable RUNAI_CLI_CONFIG_FILE (default "config.json")
      --config-path string   config path; can be set by environment variable RUNAI_CLI_CONFIG_PATH
  -d, --debug                enable debug mode
  -q, --quiet                enable quiet mode, suppress all output except error messages
      --verbose              enable verbose mode
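
The two configuration options can also be supplied through the environment variables named above instead of being repeated on every command. A minimal sketch, assuming a hypothetical configuration directory under the home directory:

```shell
# Hypothetical directory; point this at wherever your CLI configuration lives.
export RUNAI_CLI_CONFIG_PATH="$HOME/.runai-staging"
# Same value as the documented default, shown here only for completeness.
export RUNAI_CLI_CONFIG_FILE="config.json"

# Confirm what subsequent runai commands in this shell will see.
echo "RUNAI_CLI_CONFIG_PATH=$RUNAI_CLI_CONFIG_PATH"
echo "RUNAI_CLI_CONFIG_FILE=$RUNAI_CLI_CONFIG_FILE"
```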
