---
title: CLI
---

The `clearml-serving` utility is a CLI tool for model deployment and orchestration.

The following page provides a reference for `clearml-serving`'s CLI commands:
* [list](#list) - List running Serving Services
* [create](#create) - Create a new Serving Service
* [metrics](#metrics) - Configure inference metrics Service
* [config](#config) - Configure an existing Serving Service
* [model](#model) - Configure model endpoints for a running Service

## Global Parameters

```bash
clearml-serving [-h] [--debug] [--yes] [--id ID] {list,create,metrics,config,model}
```
|Name|Description|Optional|
|---|---|---|
|`--id`|Serving Service (Control plane) Task ID to configure (if not provided, automatically detect the running control plane Task)|No|
|`--debug`|Print debug messages|Yes|
|`--yes`|Always answer YES on interactive inputs|Yes|
:::info Service ID
The Serving Service's ID (`--id`) is required to execute the `metrics`, `config`, and `model` commands.
:::

## list

List running Serving Services.

```bash
clearml-serving list [-h]
```

## create

Create a new Serving Service.

```bash
clearml-serving create [-h] [--name NAME] [--tags TAGS [TAGS ...]] [--project PROJECT]
```

**Parameters**
|Name|Description|Optional|
|---|---|---|
|`--name`|Serving service's name. Default: `Serving-Service`|Yes|
|`--project`|Serving service's project. Default: `DevOps`|Yes|
|`--tags`|Serving service's user tags. The serving service can be labeled, which can be useful for organizing|Yes|
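For example, a new service can be created like this (the name, project, and tag values are illustrative placeholders):

```bash
# Create a Serving Service; the printed Task ID is what later
# commands expect as their --id argument
clearml-serving create --name "inference-service" --project "DevOps" --tags prod
```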
## metrics

Configure inference metrics Service.

```bash
clearml-serving metrics [-h] {add,remove,list}
```

### add

Add/modify metric for a specific endpoint.

```bash
clearml-serving metrics add [-h] --endpoint ENDPOINT [--log-freq LOG_FREQ]
                            [--variable-scalar VARIABLE_SCALAR [VARIABLE_SCALAR ...]]
                            [--variable-enum VARIABLE_ENUM [VARIABLE_ENUM ...]]
                            [--variable-value VARIABLE_VALUE [VARIABLE_VALUE ...]]
```

**Parameters**
|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Metric endpoint name, including version (e.g. `"model/1"`) or a prefix (e.g. `"model/*"`). Note: this overrides any metrics previously logged for the endpoint|No|
|`--log-freq`|Logging request frequency, between 0.0 and 1.0. For example, 1.0 means all requests are logged and 0.5 means half of the requests are logged. If not specified, the global logging frequency is used (see [`config --metric-log-freq`](#config))|Yes|
|`--variable-scalar`|Add a float (scalar) argument to the metric logger in the form `<name>=<buckets>`. Example with specific buckets: `"x1=0,0.2,0.4,0.6,0.8,1"`, or with min/max/num_buckets: `"x1=0.0/1.0/5"`. Note: when thousands of requests per second reach the serving service, it makes no sense to display every datapoint, so scalars are divided into buckets (per minute, for example). It is then possible to calculate what percentage of the total traffic fell into bucket 1, bucket 2, bucket 3, and so on. The Y axis represents the buckets, the color is the percentage of traffic in that bucket, and X is time|Yes|
|`--variable-enum`|Add an enum (string) argument to the metric logger in the form `<name>=<values>`. Example: `"detect=cat,dog,sheep"`|Yes|
|`--variable-value`|Add a non-sampled scalar argument to the metric logger, `<name>`. Example: `"latency"`|Yes|
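As an illustrative sketch (the service ID, endpoint, and variable names are placeholders), logging a bucketed scalar and an enum for every request to any version of an endpoint might look like:

```bash
# Log a histogram of scalar "x1" and an enum "detect" for 100% of requests
# hitting any version of the "my_model" endpoint
clearml-serving --id <service_id> metrics add \
    --endpoint "my_model/*" \
    --log-freq 1.0 \
    --variable-scalar "x1=0,0.2,0.4,0.6,0.8,1" \
    --variable-enum "detect=cat,dog,sheep"
```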
### remove

Remove metric from a specific endpoint.

```bash
clearml-serving metrics remove [-h] [--endpoint ENDPOINT]
                               [--variable VARIABLE [VARIABLE ...]]
```

**Parameters**
|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Metric endpoint name, including version (e.g. `"model/1"`) or a prefix (e.g. `"model/*"`)|No|
|`--variable`|Remove a (scalar/enum) argument from the metric logger, `<name>`. Example: `"x1"`|Yes|
### list

List metrics logged on all endpoints.

```bash
clearml-serving metrics list [-h]
```
## config

Configure an existing Serving Service.

```bash
clearml-serving config [-h] [--base-serving-url BASE_SERVING_URL]
                       [--triton-grpc-server TRITON_GRPC_SERVER]
                       [--kafka-metric-server KAFKA_METRIC_SERVER]
                       [--metric-log-freq METRIC_LOG_FREQ]
```

**Parameters**
|Name|Description|Optional|
|---|---|---|
|`--base-serving-url`|External base serving service URL. Example: `http://127.0.0.1:8080/serve`|Yes|
|`--triton-grpc-server`|External ClearML-Triton serving container gRPC address. Example: `127.0.0.1:9001`|Yes|
|`--kafka-metric-server`|External Kafka service URL. Example: `127.0.0.1:9092`|Yes|
|`--metric-log-freq`|Set the default metric logging frequency, between 0.0 and 1.0. 1.0 means that 100% of all requests are logged|Yes|
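For example, pointing the service at its companion components (the addresses are the illustrative values from the table above; the service ID is a placeholder):

```bash
clearml-serving --id <service_id> config \
    --base-serving-url http://127.0.0.1:8080/serve \
    --triton-grpc-server 127.0.0.1:9001 \
    --kafka-metric-server 127.0.0.1:9092 \
    --metric-log-freq 0.1
```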

## model

Configure model endpoints for an already running Service.

```bash
clearml-serving model [-h] {list,remove,upload,canary,auto-update,add}
```

### list

List current models.

```bash
clearml-serving model list [-h]
```

### remove

Remove model by its endpoint name.

```bash
clearml-serving model remove [-h] [--endpoint ENDPOINT]
```

**Parameter**
|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Model endpoint name|No|
### upload

Upload and register model files/folder.

```bash
clearml-serving model upload [-h] --name NAME [--tags TAGS [TAGS ...]] --project PROJECT
                             [--framework {tensorflow,tensorflowjs,tensorflowlite,pytorch,torchscript,caffe,caffe2,onnx,keras,mknet,cntk,torch,darknet,paddlepaddle,scikitlearn,xgboost,lightgbm,parquet,megengine,catboost,tensorrt,openvino,custom}]
                             [--publish] [--path PATH] [--url URL]
                             [--destination DESTINATION]
```

**Parameters**
|Name|Description|Optional|
|---|---|---|
|`--name`|Specify the model name to be registered|No|
|`--tags`|Add tags to the newly created model|Yes|
|`--project`|Specify the project for the model to be registered in|No|
|`--framework`|Specify the model framework. Options are: 'tensorflow', 'tensorflowjs', 'tensorflowlite', 'pytorch', 'torchscript', 'caffe', 'caffe2', 'onnx', 'keras', 'mknet', 'cntk', 'torch', 'darknet', 'paddlepaddle', 'scikitlearn', 'xgboost', 'lightgbm', 'parquet', 'megengine', 'catboost', 'tensorrt', 'openvino', 'custom'|Yes|
|`--publish`|Publish the newly created model (change model state to "published", i.e. locked and ready to deploy)|Yes|
|`--path`|Specify a model file/folder to be uploaded and registered|Yes|
|`--url`|Specify an already uploaded model URL (e.g. `s3://bucket/model.bin`, `gs://bucket/model.bin`)|Yes|
|`--destination`|Specify the target destination for the model to be uploaded (e.g. `s3://bucket/folder/`, `gs://bucket/folder/`)|Yes|
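A minimal sketch of registering a locally stored scikit-learn model (the service ID, model name, project, and file path are illustrative placeholders):

```bash
clearml-serving --id <service_id> model upload \
    --name "sklearn-model" \
    --project "DevOps" \
    --framework scikitlearn \
    --path ./model.pkl \
    --publish
```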
### canary

Add model Canary/A/B endpoint.

```bash
clearml-serving model canary [-h] [--endpoint ENDPOINT] [--weights WEIGHTS [WEIGHTS ...]]
                             [--input-endpoints INPUT_ENDPOINTS [INPUT_ENDPOINTS ...]]
                             [--input-endpoint-prefix INPUT_ENDPOINT_PREFIX]
```

**Parameters**
|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Model canary serving endpoint name (e.g. `my_model/latest`)|Yes|
|`--weights`|Model canary weights, in an order matching the model endpoints (e.g. 0.2 0.8)|Yes|
|`--input-endpoints`|Model endpoint prefixes, can also include version (e.g. `my_model`, `my_model/v1`)|Yes|
|`--input-endpoint-prefix`|Model endpoint prefix, in lexicographic order or by version (e.g. `my_model/1`, `my_model/v1`), where the first weight matches the last version|Yes|
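For instance, a 10/90 traffic split between two versions of a hypothetical `my_model` endpoint could be sketched as follows (weight order matches the endpoint order, and the service ID is a placeholder):

```bash
# Route 10% of requests to version 1 and 90% to version 2
clearml-serving --id <service_id> model canary \
    --endpoint "my_model/latest" \
    --weights 0.1 0.9 \
    --input-endpoints my_model/1 my_model/2
```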
### auto-update

Add/Modify model auto-update service.

```bash
clearml-serving model auto-update [-h] [--endpoint ENDPOINT] --engine ENGINE
                                  [--max-versions MAX_VERSIONS] [--name NAME]
                                  [--tags TAGS [TAGS ...]] [--project PROJECT]
                                  [--published] [--preprocess PREPROCESS]
                                  [--input-size INPUT_SIZE [INPUT_SIZE ...]]
                                  [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
                                  [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
                                  [--output_type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
                                  [--aux-config AUX_CONFIG [AUX_CONFIG ...]]
```

**Parameters**
|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Base model endpoint (must be unique)|No|
|`--engine`|Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)|No|
|`--max-versions`|Max versions to store (and create endpoints) for the model. The highest number is the latest version|Yes|
|`--name`|Specify the model name to be selected and auto-updated (note: selection uses a regular expression; use `"^name$"` for an exact match)|Yes|
|`--tags`|Specify tags to be selected and auto-updated|Yes|
|`--project`|Specify the model project to be selected and auto-updated|Yes|
|`--published`|Only select published models for auto-update|Yes|
|`--preprocess`|Specify pre/post-processing code to be used with the model (point to a local file/folder) - this should hold for all the models|Yes|
|`--input-size`|Specify the model matrix input size [Rows x Columns x Channels etc.]|Yes|
|`--input-type`|Specify the model matrix input type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--input-name`|Specify the model layer to push input into. Example: layer_0|Yes|
|`--output-size`|Specify the model matrix output size [Rows x Columns x Channels etc.]|Yes|
|`--output_type`|Specify the model matrix output type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--output-name`|Specify the model layer to pull results from. Example: layer_99|Yes|
|`--aux-config`|Specify additional engine-specific auxiliary configuration in the form of key=value. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Note: you can also pass a full configuration file (e.g. Triton "config.pbtxt")|Yes|
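As an illustrative sketch (the service ID, endpoint, model name, and preprocessing file are placeholders), serving the two most recent published versions of a model selected by name might look like:

```bash
# Keep endpoints for the two latest published versions of models whose
# name matches "train sklearn model"
clearml-serving --id <service_id> model auto-update \
    --engine sklearn \
    --endpoint "sklearn_auto" \
    --preprocess "preprocess.py" \
    --name "train sklearn model" \
    --published \
    --max-versions 2
```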
### add

Add/Update model.

```bash
clearml-serving model add [-h] --engine ENGINE --endpoint ENDPOINT [--version VERSION]
                          [--model-id MODEL_ID] [--preprocess PREPROCESS]
                          [--input-size INPUT_SIZE [INPUT_SIZE ...]]
                          [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
                          [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
                          [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
                          [--aux-config AUX_CONFIG [AUX_CONFIG ...]] [--name NAME]
                          [--tags TAGS [TAGS ...]] [--project PROJECT] [--published]
```

**Parameters**
|Name|Description|Optional|
|---|---|---|
|`--engine`|Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)|No|
|`--endpoint`|Base model endpoint (must be unique)|No|
|`--version`|Model endpoint version (default: None)|Yes|
|`--model-id`|Specify a model ID to be served|No|
|`--preprocess`|Specify pre/post-processing code to be used with the model (point to a local file/folder) - this should hold for all the models|Yes|
|`--input-size`|Specify the model matrix input size [Rows x Columns x Channels etc.]|Yes|
|`--input-type`|Specify the model matrix input type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--input-name`|Specify the model layer to push input into. Example: layer_0|Yes|
|`--output-size`|Specify the model matrix output size [Rows x Columns x Channels etc.]|Yes|
|`--output-type`|Specify the model matrix output type. Examples: uint8, float32, int16, float16, etc.|Yes|
|`--output-name`|Specify the model layer to pull results from. Example: layer_99|Yes|
|`--aux-config`|Specify additional engine-specific auxiliary configuration in the form of key=value. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Note: you can also pass a full configuration file (e.g. Triton "config.pbtxt")|Yes|
|`--name`|Instead of specifying `--model-id`, select based on model name|Yes|
|`--tags`|Specify tags to be selected and auto-updated|Yes|
|`--project`|Instead of specifying `--model-id`, select based on model project|Yes|
|`--published`|Instead of specifying `--model-id`, select based on model published state|Yes|
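Putting it together, serving an already registered model behind a new endpoint might be sketched as follows (the service ID, model ID, endpoint name, and preprocessing file are all illustrative placeholders):

```bash
clearml-serving --id <service_id> model add \
    --engine sklearn \
    --endpoint "test_model_sklearn" \
    --preprocess "preprocess.py" \
    --model-id <model_id>
```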