---
title: ClearML Serving CLI
---

The `clearml-serving` utility is a CLI tool for model deployment and orchestration.

The following page provides a reference for `clearml-serving`'s CLI commands:

* [list](#list) - List running Serving Services
* [create](#create) - Create a new Serving Service
* [metrics](#metrics) - Configure inference metrics Service
* [config](#config) - Configure a new Serving Service
* [model](#model) - Configure model endpoints for a running Service

## Global Parameters

```bash
clearml-serving [-h] [--debug] [--yes] [--id ID] {list,create,metrics,config,model}
```

|Name|Description|Optional|
|---|---|---|
|`--id`|Serving Service (Control plane) Task ID to configure (if not provided, automatically detect the running control plane Task)|No|
|`--debug`|Print debug messages|Yes|
|`--yes`|Always answer YES on interactive inputs|Yes|

:::info Service ID
The Serving Service's ID (`--id`) is required to execute the `metrics`, `config`, and `model` commands.
:::
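For example, to point a subcommand at a specific Serving Service (the ID below is a placeholder; use the ID printed by `create` or shown by `list`):

```bash
clearml-serving --id aa11bb22cc33dd44 model list
```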

## list

List running Serving Services.

```bash
clearml-serving list [-h]
```

## create

Create a new Serving Service.

```bash
clearml-serving create [-h] [--name NAME] [--tags TAGS [TAGS ...]] [--project PROJECT]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--name`|Serving service's name. Default: `Serving-Service`|No|
|`--project`|Serving service's project. Default: `DevOps`|No|
|`--tags`|Serving service's user tags. The serving service can be labeled, which can be useful for organizing services|Yes|
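For example, a minimal sketch (the name and project below are placeholders). The command prints the ID of the newly created Serving Service, which is then passed to the other commands via `--id`:

```bash
clearml-serving create --name "my serving service" --project "DevOps"
```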

## metrics

Configure inference metrics Service.

```bash
clearml-serving metrics [-h] {add,remove,list}
```

### add

Add/modify a metric for a specific endpoint.

```bash
clearml-serving metrics add [-h] --endpoint ENDPOINT [--log-freq LOG_FREQ]
                            [--variable-scalar VARIABLE_SCALAR [VARIABLE_SCALAR ...]]
                            [--variable-enum VARIABLE_ENUM [VARIABLE_ENUM ...]]
                            [--variable-value VARIABLE_VALUE [VARIABLE_VALUE ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Metric endpoint name, including version (e.g. `"model/1"`) or a prefix (e.g. `"model/*"`). Notice: this overrides any metrics previously logged for the endpoint|No|
|`--log-freq`|Logging request frequency, between 0.0 and 1.0. For example, 1.0 means all requests are logged, and 0.5 means half of the requests are logged. If not specified, the global logging frequency is used (see `config --metric-log-freq`)|Yes|
|`--variable-scalar`|Add a float (scalar) argument to the metric logger, `<name>=<histogram>`. Example with specific buckets: `"x1=0,0.2,0.4,0.6,0.8,1"`; with min/max/num_buckets: `"x1=0.0/1.0/5"`. Notice: when thousands of requests per second reach the serving service, it makes no sense to display every data point, so scalar values are divided into buckets (for example, per minute). It is then possible to calculate what percentage of the total traffic fell into bucket 1, bucket 2, bucket 3, and so on. The Y axis represents the buckets, the color is the percentage of traffic in that bucket, and the X axis is time|Yes|
|`--variable-enum`|Add an enum (string) argument to the metric logger, `<name>=<optional_values>`. Example: `"detect=cat,dog,sheep"`|Yes|
|`--variable-value`|Add a non-sampled scalar argument to the metric logger, `<name>`. Example: `"latency"`|Yes|
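For example, to log two scalar histograms on an endpoint (the service ID, endpoint, and variable names below are placeholders):

```bash
clearml-serving --id aa11bb22cc33dd44 metrics add \
    --endpoint "test_model_sklearn" \
    --variable-scalar x0=0,0.1,0.5,1 y=0.1,0.2,0.3,0.4
```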

### remove

Remove a metric from a specific endpoint.

```bash
clearml-serving metrics remove [-h] [--endpoint ENDPOINT]
                               [--variable VARIABLE [VARIABLE ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Metric endpoint name, including version (e.g. `"model/1"`) or a prefix (e.g. `"model/*"`)|No|
|`--variable`|Remove a (scalar/enum) argument from the metric logger, `<name>`. Example: `"x1"`|Yes|
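For example, to remove the `x0` variable from an endpoint (the service ID and endpoint are placeholders):

```bash
clearml-serving --id aa11bb22cc33dd44 metrics remove \
    --endpoint "test_model_sklearn" --variable x0
```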

### list

List metrics logged on all endpoints.

```bash
clearml-serving metrics list [-h]
```

## config

Configure a new Serving Service.

```bash
clearml-serving config [-h] [--base-serving-url BASE_SERVING_URL]
                       [--triton-grpc-server TRITON_GRPC_SERVER]
                       [--kafka-metric-server KAFKA_METRIC_SERVER]
                       [--metric-log-freq METRIC_LOG_FREQ]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--base-serving-url`|External base serving service URL. Example: `http://127.0.0.1:8080/serve`|Yes|
|`--triton-grpc-server`|External ClearML-Triton serving container gRPC address. Example: `127.0.0.1:9001`|Yes|
|`--kafka-metric-server`|External Kafka service URL. Example: `127.0.0.1:9092`|Yes|
|`--metric-log-freq`|Set the default metric logging frequency, between 0.0 and 1.0. 1.0 means that 100% of all requests are logged|Yes|
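For example, to set a global logging frequency of 50% (the service ID is a placeholder):

```bash
clearml-serving --id aa11bb22cc33dd44 config --metric-log-freq 0.5
```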

## model

Configure model endpoints for an already running Service.

```bash
clearml-serving model [-h] {list,remove,upload,canary,auto-update,add}
```

### list

List current models.

```bash
clearml-serving model list [-h]
```

### remove

Remove a model by its endpoint name.

```bash
clearml-serving model remove [-h] [--endpoint ENDPOINT]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Model endpoint name|No|
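For example (the service ID and endpoint are placeholders):

```bash
clearml-serving --id aa11bb22cc33dd44 model remove --endpoint "test_model_sklearn"
```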

### upload

Upload and register model files/folder.

```bash
clearml-serving model upload [-h] --name NAME [--tags TAGS [TAGS ...]] --project PROJECT
                             [--framework {tensorflow,tensorflowjs,tensorflowlite,pytorch,torchscript,caffe,caffe2,onnx,keras,mknet,cntk,torch,darknet,paddlepaddle,scikitlearn,xgboost,lightgbm,parquet,megengine,catboost,tensorrt,openvino,custom}]
                             [--publish] [--path PATH] [--url URL]
                             [--destination DESTINATION]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--name`|Specify the model name to be registered|No|
|`--tags`|Add tags to the newly created model|Yes|
|`--project`|Specify the project for the model to be registered in|No|
|`--framework`|Specify the model framework. Options are: 'tensorflow', 'tensorflowjs', 'tensorflowlite', 'pytorch', 'torchscript', 'caffe', 'caffe2', 'onnx', 'keras', 'mknet', 'cntk', 'torch', 'darknet', 'paddlepaddle', 'scikitlearn', 'xgboost', 'lightgbm', 'parquet', 'megengine', 'catboost', 'tensorrt', 'openvino', 'custom'|Yes|
|`--publish`|Publish the newly created model (change the model state to "published", i.e. locked and ready to deploy)|Yes|
|`--path`|Specify a model file/folder to be uploaded and registered|Yes|
|`--url`|Specify an already uploaded model URL (e.g. `s3://bucket/model.bin`, `gs://bucket/model.bin`)|Yes|
|`--destination`|Specify the target destination for the model to be uploaded. For example: `s3://bucket/folder/`, `s3://host_addr:port/bucket` (for non-AWS S3-like services like MinIO), `gs://bucket-name/folder`, `azure://<account name>.blob.core.windows.net/path/to/file`|Yes|
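For example, to register a locally trained scikit-learn model (the service ID, names, and file path below are illustrative):

```bash
clearml-serving --id aa11bb22cc33dd44 model upload \
    --name "manual sklearn model" --project "serving examples" \
    --framework "scikitlearn" --path sklearn-model.pkl
```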

### canary

Add a model Canary/A/B endpoint.

```bash
clearml-serving model canary [-h] [--endpoint ENDPOINT] [--weights WEIGHTS [WEIGHTS ...]]
                             [--input-endpoints INPUT_ENDPOINTS [INPUT_ENDPOINTS ...]]
                             [--input-endpoint-prefix INPUT_ENDPOINT_PREFIX]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Model canary serving endpoint name (e.g. `my_model/latest`)|Yes|
|`--weights`|Model canary weights, in an order matching the model endpoints (e.g. 0.2 0.8)|Yes|
|`--input-endpoints`|Model endpoint prefixes, can also include version (e.g. `my_model`, `my_model/v1`)|Yes|
|`--input-endpoint-prefix`|Model endpoint prefix, ordered lexicographically or by version `<int>` (e.g. `my_model/1`, `my_model/v1`), where the first weight matches the last version|Yes|
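For example, to route 10% of traffic to the newest model version under a prefix and 90% to the previous one, per the weight-ordering note above (the service ID, endpoint, and weights are illustrative):

```bash
clearml-serving --id aa11bb22cc33dd44 model canary \
    --endpoint "test_model_sklearn_canary" --weights 0.1 0.9 \
    --input-endpoint-prefix test_model_sklearn
```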

### auto-update

Add/modify a model auto-update service.

```bash
clearml-serving model auto-update [-h] [--endpoint ENDPOINT] --engine ENGINE
                                  [--max-versions MAX_VERSIONS] [--name NAME]
                                  [--tags TAGS [TAGS ...]] [--project PROJECT]
                                  [--published] [--preprocess PREPROCESS]
                                  [--input-size INPUT_SIZE [INPUT_SIZE ...]]
                                  [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
                                  [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
                                  [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
                                  [--aux-config AUX_CONFIG [AUX_CONFIG ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Base model endpoint (must be unique)|No|
|`--engine`|Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)|No|
|`--max-versions`|Maximum number of versions to store (and create endpoints for). The highest number is the latest version|Yes|
|`--name`|Specify the model name to be selected and auto-updated (notice: regexp selection; use `"$name^"` for an exact match)|Yes|
|`--tags`|Specify tags to be selected and auto-updated|Yes|
|`--project`|Specify the model project to be selected and auto-updated|Yes|
|`--published`|Only select a published model for auto-update|Yes|
|`--preprocess`|Specify pre/post processing code to be used with the model (point to a local file/folder). This should hold for all the models|Yes|
|`--input-size`|Specify the model matrix input size [Rows x Columns x Channels, etc.]|Yes|
|`--input-type`|Specify the model matrix input type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--input-name`|Specify the model layer to push the input into. Example: `layer_0`|Yes|
|`--output-size`|Specify the model matrix output size [Rows x Columns x Channels, etc.]|Yes|
|`--output-type`|Specify the model matrix output type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--output-name`|Specify the model layer to pull results from. Example: `layer_99`|Yes|
|`--aux-config`|Specify additional engine-specific auxiliary configuration in the form of key=value. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Notice: you can also pass a full configuration file (e.g. Triton "config.pbtxt")|Yes|
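For example, to keep an endpoint automatically updated with the two latest versions of a model as it is retrained (the service ID, names, and preprocessing file below are illustrative):

```bash
clearml-serving --id aa11bb22cc33dd44 model auto-update \
    --engine sklearn --endpoint "test_model_sklearn_auto" \
    --preprocess "preprocess.py" \
    --name "train sklearn model" --project "serving examples" \
    --max-versions 2
```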

### add

Add/update a model.

```bash
clearml-serving model add [-h] --engine ENGINE --endpoint ENDPOINT [--version VERSION]
                          [--model-id MODEL_ID] [--preprocess PREPROCESS]
                          [--input-size INPUT_SIZE [INPUT_SIZE ...]]
                          [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
                          [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
                          [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
                          [--aux-config AUX_CONFIG [AUX_CONFIG ...]] [--name NAME]
                          [--tags TAGS [TAGS ...]] [--project PROJECT] [--published]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--engine`|Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)|No|
|`--endpoint`|Base model endpoint (must be unique)|No|
|`--version`|Model endpoint version (default: None)|Yes|
|`--model-id`|Specify a model ID to be served|No|
|`--preprocess`|Specify pre/post processing code to be used with the model (point to a local file/folder). This should hold for all the models|Yes|
|`--input-size`|Specify the model matrix input size [Rows x Columns x Channels, etc.]|Yes|
|`--input-type`|Specify the model matrix input type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--input-name`|Specify the model layer to push the input into. Example: `layer_0`|Yes|
|`--output-size`|Specify the model matrix output size [Rows x Columns x Channels, etc.]|Yes|
|`--output-type`|Specify the model matrix output type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--output-name`|Specify the model layer to pull results from. Example: `layer_99`|Yes|
|`--aux-config`|Specify additional engine-specific auxiliary configuration in the form of key=value. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Notice: you can also pass a full configuration file (e.g. Triton "config.pbtxt")|Yes|
|`--name`|Instead of specifying `--model-id`, select based on model name|Yes|
|`--tags`|Specify tags to be selected and auto-updated|Yes|
|`--project`|Instead of specifying `--model-id`, select based on model project|Yes|
|`--published`|Instead of specifying `--model-id`, select based on model published state|Yes|
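For example, to serve a registered model, selected by name and project, on a static endpoint (the service ID, names, and preprocessing file below are illustrative):

```bash
clearml-serving --id aa11bb22cc33dd44 model add \
    --engine sklearn --endpoint "test_model_sklearn" \
    --preprocess "preprocess.py" \
    --name "train sklearn model" --project "serving examples"
```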