diff --git a/docs/clearml_serving/clearml_serving_cli.md b/docs/clearml_serving/clearml_serving_cli.md index da0ec42f..fa72030b 100644 --- a/docs/clearml_serving/clearml_serving_cli.md +++ b/docs/clearml_serving/clearml_serving_cli.md @@ -9,15 +9,14 @@ The following page provides a reference for `clearml-serving`'s CLI commands: * [create](#create) - Create a new Serving Service * [metrics](#metrics) - Configure inference metrics Service * [config](#config) - Configure a new Serving Service -* [model](#model) - Configure Model endpoints for a running Service +* [model](#model) - Configure model endpoints for a running Service +## Global Parameters ```bash clearml-serving [-h] [--debug] [--id ID] {list,create,metrics,config,model} ``` -**Parameters** -
|Name|Description|Optional| @@ -31,21 +30,22 @@ clearml-serving [-h] [--debug] [--id ID] {list,create,metrics,config,model} The Serving Service's ID (`--id`) is required to execute the `metrics`, `config`, and `model` commands. ::: -### list +## list + +List running Serving Services. + ```bash clearml-serving list [-h] ``` -List running Serving Services. +## create -### create +Create a new Serving Service. ```bash clearml-serving create [-h] [--name NAME] [--tags TAGS [TAGS ...]] [--project PROJECT] ``` -Create a new Serving Service - **Parameters**
@@ -58,34 +58,76 @@ Create a new Serving Service
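+
+For example, a possible invocation (the service name and project here are placeholders):
+
+```bash
+clearml-serving create --name "serving example" --project "DevOps"
+```
+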
-### metrics +## metrics -Configure inference metrics Service +Configure inference metrics Service. ```bash clearml-serving metrics [-h] {add,remove,list} ``` +### add + +Add/modify metric for a specific endpoint. + +```bash +clearml-serving metrics add [-h] --endpoint ENDPOINT [--log-freq LOG_FREQ] + [--variable-scalar VARIABLE_SCALAR [VARIABLE_SCALAR ...]] + [--variable-enum VARIABLE_ENUM [VARIABLE_ENUM ...]] + [--variable-value VARIABLE_VALUE [VARIABLE_VALUE ...]] +``` **Parameters**
 |Name|Description|Optional|
 |---|---|---|
-|`--add` | Add/modify metric for a specific endpoint| Yes |
-|`--remove` | Remove metric from a specific endpoint| Yes |
-|`--list` | list metrics logged on all endpoints | Yes |
+|`--endpoint`|Metric endpoint name, including version (e.g. `"model/1"`) or a prefix (e.g. `"model/*"`). Note: this overrides any metrics previously logged for the endpoint| No|
+|`--log-freq`|Logging request frequency, between 0.0 and 1.0. For example, 1.0 means all requests are logged, and 0.5 means half of the requests are logged. If not specified, the global logging frequency is used (see [`config --metric-log-freq`](#config))| Yes|
+|`--variable-scalar`|Add a float (scalar) argument to the metric logger, in the form `<name>=<buckets>`. Example with specific buckets: `"x1=0,0.2,0.4,0.6,0.8,1"`; with min/max/num_buckets: `"x1=0.0/1.0/5"` | Yes|
+|`--variable-enum`|Add an enum (string) argument to the metric logger, in the form `<name>=<values>`. Example: `"detect=cat,dog,sheep"` |Yes|
+|`--variable-value`|Add a non-sampled scalar argument to the metric logger, `<name>`. Example: `"latency"` |Yes|
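+
+For example, a possible invocation tracking one scalar and one enum variable while logging half of the requests (the service ID, endpoint, and variable names here are illustrative):
+
+```bash
+clearml-serving --id <service_id> metrics add --endpoint "my_model/1" --log-freq 0.5 \
+    --variable-scalar "x1=0,0.2,0.4,0.6,0.8,1" --variable-enum "detect=cat,dog,sheep"
+```
+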
+### remove + +Remove metric from a specific endpoint. + +```bash +clearml-serving metrics remove [-h] [--endpoint ENDPOINT] + [--variable VARIABLE [VARIABLE ...]] +``` +**Parameters** + +
+
+|Name|Description|Optional|
+|---|---|---|
+|`--endpoint`| Metric endpoint name, including version (e.g. `"model/1"`) or a prefix (e.g. `"model/*"`) |No|
+|`--variable`| Remove a (scalar/enum) argument from the metric logger, `<name>`. Example: `"x1"` |Yes|
+
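+For example, removing an illustrative variable `x1` from an endpoint:
+
+```bash
+clearml-serving --id <service_id> metrics remove --endpoint "my_model/1" --variable "x1"
+```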
+ +### list + +List metrics logged on all endpoints. + +```bash +clearml-serving metrics list [-h] +``` +
-### config
+## config

 Configure a new Serving Service.

 ```bash
-clearml-serving {base-serving-url, triton-grpc, kafka-metric-server, metric-log-freq}
+clearml-serving config [-h] [--base-serving-url BASE_SERVING_URL]
+                       [--triton-grpc-server TRITON_GRPC_SERVER]
+                       [--kafka-metric-server KAFKA_METRIC_SERVER]
+                       [--metric-log-freq METRIC_LOG_FREQ]
 ```

 **Parameters**

@@ -97,33 +139,173 @@ clearml-serving {base-serving-url, triton-grpc, kafka-metric-server, metric-log-
 |`--base-serving-url`|External base serving service url. Example: `http://127.0.0.1:8080/serve`|Yes|
 |`--triton-grpc-server`|External ClearML-Triton serving container gRPC address. Example: `127.0.0.1:9001`|Yes|
 |`--kafka-metric-server`|External Kafka service url. Example: `127.0.0.1:9092`|Yes|
-|`--metric-log-freq`|Set default metric logging frequency. 1.0 is 100% of all requests are logged|Yes|
+|`--metric-log-freq`|Set the default metric logging frequency, between 0.0 and 1.0. For example, 1.0 means that 100% of requests are logged|Yes|
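+
+For example, a possible configuration pointing the Service at local components (the service ID, URLs, and addresses here are placeholders):
+
+```bash
+clearml-serving --id <service_id> config --base-serving-url http://127.0.0.1:8080/serve \
+    --triton-grpc-server 127.0.0.1:9001 --metric-log-freq 1.0
+```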

-### model +## model -Configure Model endpoints for an already running Service +Configure model endpoints for an already running Service. ```bash clearml-serving model [-h] {list,remove,upload,canary,auto-update,add} ``` +### list + +List current models. + +```bash +clearml-serving model list [-h] +``` + +### remove + +Remove model by its endpoint name. + +```bash +clearml-serving model remove [-h] [--endpoint ENDPOINT] +``` + +**Parameter** + +
+
+|Name|Description|Optional|
+|---|---|---|
+|`--endpoint` | Model endpoint name | No|
+
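+For example, removing a model served on an illustrative endpoint:
+
+```bash
+clearml-serving --id <service_id> model remove --endpoint "my_model"
+```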
+
+### upload
+
+Upload and register model files/folders.
+
+```bash
+clearml-serving model upload [-h] --name NAME [--tags TAGS [TAGS ...]] --project PROJECT
+                             [--framework {scikit-learn,xgboost,lightgbm,tensorflow,pytorch}]
+                             [--publish] [--path PATH] [--url URL]
+                             [--destination DESTINATION]
+```
+**Parameters**
+
+
+|Name|Description|Optional|
+|---|---|---|
+|`--name`|Specify the name for the newly registered model| No|
+|`--tags`| Add tags to the newly created model| Yes|
+|`--project`| Specify the project in which the model will be registered| No|
+|`--framework`| Specify the model framework. Options are: "scikit-learn", "xgboost", "lightgbm", "tensorflow", "pytorch" | Yes|
+|`--publish`| Publish the newly created model (set its state to "published", i.e. locked and ready to deploy)|Yes|
+|`--path`|Specify a local model file/folder to be uploaded and registered| Yes|
+|`--url`| Specify the URL of an already uploaded model (e.g. `s3://bucket/model.bin`, `gs://bucket/model.bin`)|Yes|
+|`--destination`|Specify the target destination to which the model will be uploaded (e.g. `s3://bucket/folder/`, `gs://bucket/folder/`)|Yes|
+
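+For example, a possible upload of a locally trained scikit-learn model (the service ID, model name, project, and path here are placeholders):
+
+```bash
+clearml-serving --id <service_id> model upload --name "manual sklearn model" \
+    --project "serving examples" --framework "scikit-learn" --path sklearn-model.pkl
+```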
+ + +### canary + +Add model Canary/A/B endpoint. + +```bash +clearml-serving model canary [-h] [--endpoint ENDPOINT] [--weights WEIGHTS [WEIGHTS ...]] + [--input-endpoints INPUT_ENDPOINTS [INPUT_ENDPOINTS ...]] + [--input-endpoint-prefix INPUT_ENDPOINT_PREFIX] +``` + **Parameters**
 |Name|Description|Optional|
 |---|---|---|
-|`--list`| List current models| Yes |
-|`--remove`| Remove model by its endpoint name | Yes |
-|`--upload` | Upload and register model files/folder | Yes|
-|`--canary` | Add model Canary/A/B endpoint | Yes|
-|`--auto-update` | Add/Modify model auto update service | Yes|
-|`--add` | Add/Update model | Yes|
+|`--endpoint`| Model canary serving endpoint name (e.g. `my_model/latest`)| Yes|
+|`--weights`| Model canary weights, in the order matching the input endpoints (e.g. `0.2 0.8`) |Yes|
+|`--input-endpoints`|Model endpoint prefixes; can also include a version (e.g. `my_model`, `my_model/v1`)| Yes|
+|`--input-endpoint-prefix`| Model endpoint prefix, ordered lexicographically or by version `<int>` (e.g. `my_model/1`, `my_model/v1`), where the first weight matches the last (newest) version|Yes|
+
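+For example, a possible canary routing 10% of the traffic to the newest version of an illustrative model:
+
+```bash
+clearml-serving --id <service_id> model canary --endpoint "my_model/latest" \
+    --weights 0.1 0.9 --input-endpoint-prefix "my_model"
+```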
+
+### auto-update
+
+Add/Modify model auto-update service.
+
+```bash
+clearml-serving model auto-update [-h] [--endpoint ENDPOINT] --engine ENGINE
+                                  [--max-versions MAX_VERSIONS] [--name NAME]
+                                  [--tags TAGS [TAGS ...]] [--project PROJECT]
+                                  [--published] [--preprocess PREPROCESS]
+                                  [--input-size INPUT_SIZE [INPUT_SIZE ...]]
+                                  [--input-type INPUT_TYPE] [--input-name INPUT_NAME]
+                                  [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]]
+                                  [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME]
+                                  [--aux-config AUX_CONFIG [AUX_CONFIG ...]]
+```
+**Parameters**
+
+
+|Name|Description|Optional|
+|---|---|---|
+|`--endpoint`| Base model endpoint (must be unique)| No|
+|`--engine`| Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)| No|
+|`--max-versions`|Maximum number of model versions to store (and create endpoints for). The highest number is the latest version | Yes|
+|`--name`| Specify the model name to be selected and auto-updated (note: selection uses a regexp; use `"^name$"` for an exact match) | Yes|
+|`--tags`|Specify the tags to be selected and auto-updated |Yes|
+|`--project`|Specify the model project to be selected and auto-updated | Yes|
+|`--published`| Only select published models for auto-update |Yes|
+|`--preprocess` |Specify pre/post-processing code to be used with the model (point to a local file/folder); the same code is used for all model versions |Yes|
+|`--input-size`| Specify the model matrix input size [rows x columns x channels ...] | Yes|
+|`--input-type`| Specify the model matrix input type. Examples: uint8, float32, int16, float16 |Yes|
+|`--input-name`|Specify the model layer into which input is pushed. Example: `layer_0` | Yes|
+|`--output-size`|Specify the model matrix output size [rows x columns x channels ...]|Yes|
+|`--output-type`| Specify the model matrix output type. Examples: uint8, float32, int16, float16 | Yes|
+|`--output-name`|Specify the model layer from which results are pulled. Example: `layer_99`| Yes|
+|`--aux-config`| Specify additional engine-specific auxiliary configuration as key=value pairs. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Note: a full configuration file can also be passed (e.g. a Triton "config.pbtxt")|Yes|
+
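+For example, a possible auto-update setup that keeps endpoints for the last two published versions of an illustrative model (the service ID, endpoint, names, and preprocessing file here are placeholders):
+
+```bash
+clearml-serving --id <service_id> model auto-update --engine sklearn --endpoint "my_model_auto" \
+    --preprocess "preprocess.py" --name "train sklearn model" --project "serving examples" \
+    --published --max-versions 2
+```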
+ +### add + +Add/Update model. + +```bash +clearml-serving model add [-h] --engine ENGINE --endpoint ENDPOINT [--version VERSION] + [--model-id MODEL_ID] [--preprocess PREPROCESS] + [--input-size INPUT_SIZE [INPUT_SIZE ...]] + [--input-type INPUT_TYPE] [--input-name INPUT_NAME] + [--output-size OUTPUT_SIZE [OUTPUT_SIZE ...]] + [--output-type OUTPUT_TYPE] [--output-name OUTPUT_NAME] + [--aux-config AUX_CONFIG [AUX_CONFIG ...]] [--name NAME] + [--tags TAGS [TAGS ...]] [--project PROJECT] [--published] +``` + +**Parameters** + +
+
+|Name|Description|Optional|
+|---|---|---|
+|`--engine`| Model endpoint serving engine (triton, sklearn, xgboost, lightgbm)| No|
+|`--endpoint`| Base model endpoint (must be unique)| No|
+|`--version`|Model endpoint version (default: None) | Yes|
+|`--model-id`|Specify a model ID to be served. Either `--model-id`, or model selection via `--name` / `--project` / `--published`, must be provided|Yes|
+|`--preprocess` |Specify pre/post-processing code to be used with the model (point to a local file/folder); the same code is used for all model versions |Yes|
+|`--input-size`| Specify the model matrix input size [rows x columns x channels ...] | Yes|
+|`--input-type`| Specify the model matrix input type. Examples: uint8, float32, int16, float16 |Yes|
+|`--input-name`|Specify the model layer into which input is pushed. Example: `layer_0` | Yes|
+|`--output-size`|Specify the model matrix output size [rows x columns x channels ...]|Yes|
+|`--output-type`| Specify the model matrix output type. Examples: uint8, float32, int16, float16 | Yes|
+|`--output-name`|Specify the model layer from which results are pulled. Example: `layer_99`| Yes|
+|`--aux-config`| Specify additional engine-specific auxiliary configuration as key=value pairs. Example: `platform=onnxruntime_onnx response_cache.enable=true max_batch_size=8`. Note: a full configuration file can also be passed (e.g. a Triton "config.pbtxt")|Yes|
+|`--name`| Instead of specifying `model-id`, select based on model name | Yes|
+|`--tags`|Instead of specifying `model-id`, select based on model tags |Yes|
+|`--project`|Instead of specifying `model-id`, select based on model project | Yes|
+|`--published`| Instead of specifying `model-id`, select only published models |Yes|
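+
+For example, a possible invocation serving an existing model selected by name (the service ID, endpoint, names, and preprocessing file here are placeholders):
+
+```bash
+clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" \
+    --preprocess "preprocess.py" --name "train sklearn model" --project "serving examples"
+```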
-