---
title: CLI
---

The `clearml-serving` utility is a CLI tool for model deployment and orchestration.

The following page provides a reference for `clearml-serving`'s CLI commands:
- `list` - List running Serving Services
- `create` - Create a new Serving Service
- `metrics` - Configure inference metrics Service
- `config` - Configure a new Serving Service
- `model` - Configure model endpoints for a running Service
## Global Parameters

```bash
clearml-serving [-h] [--debug] [--id ID] {list,create,metrics,config,model}
```

|Name|Description|Optional|
|---|---|---|
|`--id`|Serving Service (Control plane) Task ID to configure (if not provided, the running control plane Task is detected automatically)|Yes|
|`--debug`|Print debug messages|Yes|
:::info Service ID
The Serving Service's ID (`--id`) is required to execute the `metrics`, `config`, and `model` commands.
:::
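For example, running a `model` command against a specific Serving Service might look like the following (the service ID is a placeholder):

```bash
# <service_id> stands for your Serving Service's (Control plane) Task ID
clearml-serving --id <service_id> model list
```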
## list

List running Serving Services.

```bash
clearml-serving list [-h]
```
## create

Create a new Serving Service.

```bash
clearml-serving create [-h] [--name NAME] [--tags TAGS [TAGS ...]] [--project PROJECT]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--name`|Serving service's name. Default: `Serving-Service`|Yes|
|`--project`|Serving service's project. Default: `DevOps`|Yes|
|`--tags`|Serving service's user tags. The serving service can be labeled, which can be useful for organization|Yes|
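As an illustration, a minimal sketch of creating a tagged service (the name, project, and tag values are placeholders):

```bash
clearml-serving create --name "my-serving-service" --project "DevOps" --tags prod
```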
## metrics

Configure inference metrics Service.

```bash
clearml-serving metrics [-h] {add,remove,list}
```
### add

Add/modify a metric for a specific endpoint.

```bash
clearml-serving metrics add [-h] --endpoint ENDPOINT [--log-freq LOG_FREQ]
                            [--variable-scalar VARIABLE_SCALAR [VARIABLE_SCALAR ...]]
                            [--variable-enum VARIABLE_ENUM [VARIABLE_ENUM ...]]
                            [--variable-value VARIABLE_VALUE [VARIABLE_VALUE ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Metric endpoint name, including version (e.g. `"model/1"`) or a prefix (e.g. `"model/*"`). Note: this overrides any metrics previously logged for the endpoint|No|
|`--log-freq`|Request logging frequency, between 0.0 and 1.0. For example, 1.0 means all requests are logged, and 0.5 means half of the requests are logged. If not specified, the global logging frequency is used (see `config --metric-log-freq`)|Yes|
|`--variable-scalar`|Add a float (scalar) argument to the metric logger, in the form `<name>=<histogram>`. Example with specific buckets: `"x1=0,0.2,0.4,0.6,0.8,1"`; with min/max/num_buckets: `"x1=0.0/1.0/5"`|Yes|
|`--variable-enum`|Add an enum (string) argument to the metric logger, in the form `<name>=<optional_values>`. Example: `"detect=cat,dog,sheep"`|Yes|
|`--variable-value`|Add a non-sampled scalar argument to the metric logger, `<name>`. Example: `"latency"`|Yes|
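For example, a hypothetical call that samples half of the requests on the `"model/1"` endpoint and logs a scalar histogram and an enum (the endpoint and variable names are illustrative):

```bash
clearml-serving --id <service_id> metrics add --endpoint "model/1" \
    --log-freq 0.5 \
    --variable-scalar "x1=0,0.2,0.4,0.6,0.8,1" \
    --variable-enum "detect=cat,dog,sheep"
```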
### remove

Remove a metric from a specific endpoint.

```bash
clearml-serving metrics remove [-h] [--endpoint ENDPOINT]
                               [--variable VARIABLE [VARIABLE ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Metric endpoint name, including version (e.g. `"model/1"`) or a prefix (e.g. `"model/*"`)|Yes|
|`--variable`|Remove a (scalar/enum) argument from the metric logger, `<name>`. Example: `"x1"`|Yes|
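For instance, removing the `"x1"` variable added above might look like this (placeholders as before):

```bash
clearml-serving --id <service_id> metrics remove --endpoint "model/1" --variable "x1"
```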
### list

List metrics logged on all endpoints.

```bash
clearml-serving metrics list [-h]
```
## config

Configure a new Serving Service.

```bash
clearml-serving config [-h] [--base-serving-url BASE_SERVING_URL]
                       [--triton-grpc-server TRITON_GRPC_SERVER]
                       [--kafka-metric-server KAFKA_METRIC_SERVER]
                       [--metric-log-freq METRIC_LOG_FREQ]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--base-serving-url`|External base serving service URL. Example: `http://127.0.0.1:8080/serve`|Yes|
|`--triton-grpc-server`|External ClearML-Triton serving container gRPC address. Example: `127.0.0.1:9001`|Yes|
|`--kafka-metric-server`|External Kafka service URL. Example: `127.0.0.1:9092`|Yes|
|`--metric-log-freq`|Set the default metric logging frequency, between 0.0 and 1.0. 1.0 means that 100% of all requests are logged|Yes|
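A sketch of pointing a service at locally running serving and Kafka containers, reusing the example addresses above:

```bash
clearml-serving --id <service_id> config \
    --base-serving-url http://127.0.0.1:8080/serve \
    --kafka-metric-server 127.0.0.1:9092 \
    --metric-log-freq 1.0
```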
## model

Configure model endpoints for an already running Service.

```bash
clearml-serving model [-h] {list,remove,upload,canary,auto-update,add}
```

### list

List current models.

```bash
clearml-serving model list [-h]
```
### remove

Remove a model by its endpoint name.

```bash
clearml-serving model remove [-h] [--endpoint ENDPOINT]
```

**Parameter**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Model endpoint name|Yes|
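For example (the endpoint name is a placeholder):

```bash
clearml-serving --id <service_id> model remove --endpoint "my_model"
```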
### upload

Upload and register model files/folder.

```bash
clearml-serving model upload [-h] --name NAME [--tags TAGS [TAGS ...]] --project PROJECT
                             [--framework {scikit-learn,xgboost,lightgbm,tensorflow,pytorch}]
                             [--publish] [--path PATH] [--url URL]
                             [--destination DESTINATION]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--name`|Specify the model name to be registered|No|
|`--tags`|Add tags to the newly created model|Yes|
|`--project`|Specify the project for the model to be registered in|No|
|`--framework`|Specify the model framework. Options are: "scikit-learn", "xgboost", "lightgbm", "tensorflow", "pytorch"|Yes|
|`--publish`|Publish the newly created model (change the model state to "published", i.e. locked and ready to deploy)|Yes|
|`--path`|Specify a model file/folder to be uploaded and registered|Yes|
|`--url`|Specify an already uploaded model URL (e.g. `s3://bucket/model.bin`, `gs://bucket/model.bin`)|Yes|
|`--destination`|Specify the target destination for the model to be uploaded (e.g. `s3://bucket/folder/`, `gs://bucket/folder/`)|Yes|
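As a sketch, uploading a local scikit-learn model file (the name, project, and file path are placeholders):

```bash
clearml-serving --id <service_id> model upload \
    --name "my-model" --project "DevOps" \
    --framework scikit-learn --path ./model.pkl
```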
### canary

Add a model Canary/A/B endpoint.

```bash
clearml-serving model canary [-h] [--endpoint ENDPOINT] [--weights WEIGHTS [WEIGHTS ...]]
                             [--input-endpoints INPUT_ENDPOINTS [INPUT_ENDPOINTS ...]]
                             [--input-endpoint-prefix INPUT_ENDPOINT_PREFIX]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Model canary serving endpoint name (e.g. `my_model/latest`)|Yes|
|`--weights`|Model canary weights, in an order matching the input endpoints (e.g. 0.2 0.8)|Yes|
|`--input-endpoints`|Model endpoint prefixes, can also include version (e.g. `my_model`, `my_model/v1`)|Yes|
|`--input-endpoint-prefix`|Model endpoint prefix, in lexicographic order or by version `<int>` (e.g. `my_model/1`, `my_model/v1`), where the first weight matches the last version|Yes|
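For example, a hypothetical 90/10 split that routes most traffic to the newer of two explicitly listed endpoint versions (all names and weights are placeholders):

```bash
clearml-serving --id <service_id> model canary --endpoint "my_model/latest" \
    --weights 0.9 0.1 \
    --input-endpoints "my_model/2" "my_model/1"
```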
### auto-update

Add/modify a model auto-update service.

```bash
clearml-serving model auto-update [-h] [--endpoint ENDPOINT] --engine ENGINE
                                  [--max-versions MAX_VERSIONS] [--name NAME]
                                  [--tags TAGS [TAGS ...]] [--project PROJECT]
                                  [--published] [--preprocess PREPROCESS]
                                  [--input-size INPUT_SIZE [INPUT_SIZE ...]]
                                  [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
                                  [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
                                  [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
                                  [--aux-config AUX_CONFIG [AUX_CONFIG ...]]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--endpoint`|Base model endpoint (must be unique)|Yes|
|`--engine`|Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)|No|
|`--max-versions`|Maximum number of versions to store (and create endpoints for). The highest number is the latest version|Yes|
|`--name`|Specify the model name to be selected and auto-updated (note: selection uses regexp; use `"^name$"` for an exact match)|Yes|
|`--tags`|Specify tags to be selected and auto-updated|Yes|
|`--project`|Specify the model project to be selected and auto-updated|Yes|
|`--published`|Only select published models for auto-update|Yes|
|`--preprocess`|Specify pre/post processing code to be used with the model (point to a local file/folder). This should hold for all the models|Yes|
|`--input-size`|Specify the model matrix input size [Rows x Columns x Channels ...]|Yes|
|`--input-type`|Specify the model matrix input type. Examples: uint8, float32, int16, float16|Yes|
|`--input-name`|Specify the model layer to push input into. Example: layer_0|Yes|
|`--output-size`|Specify the model matrix output size [Rows x Columns x Channels ...]|Yes|
|`--output-type`|Specify the model matrix output type. Examples: uint8, float32, int16, float16|Yes|
|`--output-name`|Specify the model layer to pull results from. Example: layer_99|Yes|
|`--aux-config`|Specify additional engine-specific auxiliary configuration in the form of key=value. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Note: you can also pass a full configuration file (e.g. Triton "config.pbtxt")|Yes|
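A minimal sketch of an auto-update rule for a scikit-learn model, assuming registered models named exactly "my model" in project "my project" (the endpoint, name, project, and preprocessing file are placeholders):

```bash
clearml-serving --id <service_id> model auto-update --engine sklearn \
    --endpoint "my_model" --name "^my model$" --project "my project" \
    --max-versions 2 --preprocess preprocess.py
```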
### add

Add/update a model.

```bash
clearml-serving model add [-h] --engine ENGINE --endpoint ENDPOINT [--version VERSION]
                          [--model-id MODEL_ID] [--preprocess PREPROCESS]
                          [--input-size INPUT_SIZE [INPUT_SIZE ...]]
                          [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
                          [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
                          [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
                          [--aux-config AUX_CONFIG [AUX_CONFIG ...]] [--name NAME]
                          [--tags TAGS [TAGS ...]] [--project PROJECT] [--published]
```

**Parameters**

|Name|Description|Optional|
|---|---|---|
|`--engine`|Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)|No|
|`--endpoint`|Base model endpoint (must be unique)|No|
|`--version`|Model endpoint version (default: None)|Yes|
|`--model-id`|Specify a model ID to be served|Yes|
|`--preprocess`|Specify pre/post processing code to be used with the model (point to a local file/folder). This should hold for all the models|Yes|
|`--input-size`|Specify the model matrix input size [Rows x Columns x Channels ...]|Yes|
|`--input-type`|Specify the model matrix input type. Examples: uint8, float32, int16, float16|Yes|
|`--input-name`|Specify the model layer to push input into. Example: layer_0|Yes|
|`--output-size`|Specify the model matrix output size [Rows x Columns x Channels ...]|Yes|
|`--output-type`|Specify the model matrix output type. Examples: uint8, float32, int16, float16|Yes|
|`--output-name`|Specify the model layer to pull results from. Example: layer_99|Yes|
|`--aux-config`|Specify additional engine-specific auxiliary configuration in the form of key=value. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Note: you can also pass a full configuration file (e.g. Triton "config.pbtxt")|Yes|
|`--name`|Instead of specifying `--model-id`, select based on model name|Yes|
|`--tags`|Specify tags to be selected and auto-updated|Yes|
|`--project`|Instead of specifying `--model-id`, select based on model project|Yes|
|`--published`|Instead of specifying `--model-id`, select only published models|Yes|
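For example, serving an already registered model on a new endpoint might look like the following (the service ID, model ID, and preprocessing file are placeholders):

```bash
clearml-serving --id <service_id> model add --engine sklearn \
    --endpoint "my_model" --model-id <model_id> --preprocess preprocess.py
```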