From 521b3f883887e0156735f33d6d0b5075da6d7204 Mon Sep 17 00:00:00 2001
From: pollfly <75068813+pollfly@users.noreply.github.com>
Date: Mon, 31 Jul 2023 12:32:51 +0300
Subject: [PATCH] Edit CatBoost and XGBoost integration pages (#624)

---
 docs/guides/frameworks/catboost/catboost.md | 1 -
 .../frameworks/xgboost/xgboost_metrics.md | 3 +-
 docs/integrations/catboost.md | 120 ++++++++++++++
 docs/integrations/xgboost.md | 148 ++++++++++++++++++
 sidebars.js | 4 +-
 5 files changed, 271 insertions(+), 5 deletions(-)
 create mode 100644 docs/integrations/catboost.md
 create mode 100644 docs/integrations/xgboost.md

diff --git a/docs/guides/frameworks/catboost/catboost.md b/docs/guides/frameworks/catboost/catboost.md
index 497e65f5..05c26f84 100644
--- a/docs/guides/frameworks/catboost/catboost.md
+++ b/docs/guides/frameworks/catboost/catboost.md
@@ -1,6 +1,5 @@
 ---
 title: CatBoost
-displayed_sidebar: mainSidebar
 ---
 
 The [catboost_example.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/catboost/catboost_example.py)
diff --git a/docs/guides/frameworks/xgboost/xgboost_metrics.md b/docs/guides/frameworks/xgboost/xgboost_metrics.md
index 7e266c85..6337a130 100644
--- a/docs/guides/frameworks/xgboost/xgboost_metrics.md
+++ b/docs/guides/frameworks/xgboost/xgboost_metrics.md
@@ -1,6 +1,5 @@
 ---
-title: XGBoost
-displayed_sidebar: mainSidebar
+title: XGBoost Metrics
 ---
 
 The [xgboost_metrics.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/xgboost/xgboost_metrics.py)
diff --git a/docs/integrations/catboost.md b/docs/integrations/catboost.md
new file mode 100644
index 00000000..e50dd159
--- /dev/null
+++ b/docs/integrations/catboost.md
@@ -0,0 +1,120 @@
+---
+title: CatBoost
+---
+
+:::tip
+If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup
+instructions.
+:::
+
+ClearML integrates seamlessly with [CatBoost](https://catboost.ai/), automatically logging its models and scalars.
+
+All you have to do is add two lines of code to your CatBoost script:
+
+```python
+from clearml import Task
+task = Task.init(task_name="", project_name="")
+```
+
+And that’s it! This creates a [ClearML Task](../fundamentals/task.md) that captures:
+* Source code and uncommitted changes
+* Installed packages
+* CatBoost model files
+* Scalars (loss, learning rates)
+* Console output
+* General details such as machine details, runtime, creation date, etc.
+* Hyperparameters created with standard Python packages (e.g. argparse, click, Python Fire)
+* And more
+
+You can view all the task details in the [WebApp](../webapp/webapp_overview.md).
+
+See an example of CatBoost and ClearML in action [here](../guides/frameworks/catboost/catboost.md).
+
+![Experiment scalars](../img/examples_catboost_scalars.png)
+
+## Automatic Logging Control
+By default, when ClearML is integrated into your CatBoost script, it captures models and
+scalars. However, you may want more control over what your experiment logs.
+
+To control a task's framework logging, use the `auto_connect_frameworks` parameter of [`Task.init()`](../references/sdk/task.md#taskinit).
+Completely disable all automatic logging by setting the parameter to `False`. For finer-grained control of logged
+frameworks, pass a dictionary of framework-boolean pairs.
+
+For example:
+
+```python
+auto_connect_frameworks={
+    'catboost': False, 'tensorflow': False, 'tensorboard': False, 'pytorch': True,
+    'xgboost': False, 'scikit': True, 'fastai': True, 'lightgbm': False,
+    'hydra': True, 'detect_repository': True, 'tfdefines': True, 'joblib': True,
+    'megengine': True, 'jsonargparse': True
+}
+```
+
+You can also pass wildcards as dictionary values, so ClearML logs a model created by a framework only if its local
+path matches at least one wildcard.
+
+For example, in the code below, ClearML logs CatBoost models only if their paths have the `.pt` extension. The values of
+unspecified frameworks default to `True`, so all of their models are automatically logged.
+
+```python
+auto_connect_frameworks={'catboost': '*.pt'}
+```
+
+## Manual Logging
+To augment its automatic logging, ClearML also provides an explicit logging interface.
+
+For more information about explicitly logging information to a ClearML Task, see:
+* [Models](../clearml_sdk/model_sdk.md#manually-logging-models)
+* [Configuration](../clearml_sdk/task_sdk.md#configuration) (e.g. parameters, configuration files)
+* [Artifacts](../clearml_sdk/task_sdk.md#artifacts) (e.g. output files or Python objects created by a task)
+* [Scalars](../clearml_sdk/task_sdk.md#scalars)
+* [Text/Plots/Debug Samples](../fundamentals/logger.md#manual-reporting)
+
+See the [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
+
+## Remote Execution
+ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
+uncommitted changes, etc.). The [ClearML Agent](../clearml_agent) listens to designated queues, and when a task is enqueued,
+the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
+experiment manager.
+
+Deploy a ClearML Agent onto any machine (e.g. a cloud VM, a local GPU machine, your own laptop) by running the
+following command on it:
+
+```commandline
+clearml-agent daemon --queue [--docker]
+```
+
+Use the ClearML [Autoscalers](../cloud_autoscaling/autoscaling_overview.md) to help you manage cloud workloads in the
+cloud of your choice (AWS, GCP, Azure) and automatically deploy ClearML agents: the autoscaler automatically spins up
+and shuts down instances as needed, according to a resource budget that you set.
+
+### Cloning, Editing, and Enqueuing
+
+![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif)
+
+Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task
+with the new configuration on a remote machine:
+
+* Clone the experiment
+* Edit the hyperparameters and/or other details
+* Enqueue the task
+
+The ClearML Agent executing the task will use the new values to [override any hard-coded values](../clearml_agent).
+
+### Executing a Task Remotely
+
+To execute a task remotely from code, add [`Task.execute_remotely()`](../references/sdk/task.md#execute_remotely)
+to your script. This method stops the current local execution of the task and enqueues it to a specified queue, where
+a remote machine re-runs it.
+
+```python
+# If executed locally, the process terminates and a copy is executed by an agent instead
+task.execute_remotely(queue_name='default', exit_process=True)
+```
+
+## Hyperparameter Optimization
+Use ClearML’s [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find
+the hyperparameter values that yield the best-performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md)
+for more information.
diff --git a/docs/integrations/xgboost.md b/docs/integrations/xgboost.md
new file mode 100644
index 00000000..a27cc684
--- /dev/null
+++ b/docs/integrations/xgboost.md
@@ -0,0 +1,148 @@
+---
+title: XGBoost
+---
+
+:::tip
+If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup
+instructions.
+:::
+
+ClearML integrates seamlessly with [XGBoost](https://xgboost.readthedocs.io/en/stable/), automatically logging its models
+and scalars.
+
+All you have to do is add two lines of code to your XGBoost script:
+
+```python
+from clearml import Task
+task = Task.init(task_name="", project_name="")
+```
+
+And that’s it! This creates a [ClearML Task](../fundamentals/task.md) that captures:
+* Source code and uncommitted changes
+* Installed packages
+* XGBoost model files
+* Scalars (loss, learning rates)
+* Console output
+* General details such as machine details, runtime, creation date, etc.
+* Hyperparameters created with standard Python packages (e.g. argparse, click, Python Fire)
+* And more
+
+:::tip Logging Plots
+ClearML automatically logs plots displayed using Matplotlib. To automatically log XGBoost plots, like tree and
+feature importance plots, call `matplotlib.pyplot.show()` after the plot creation method:
+
+```python
+import matplotlib.pyplot as plt
+import xgboost as xgb
+from xgboost import plot_tree
+
+# train an XGBoost model here and assign it to `model`
+# ...
+
+xgb.plot_importance(model)
+plt.show()
+try:
+    plot_tree(model)
+    plt.show()
+except ImportError:
+    print('Skipping tree plot: You must install graphviz to support plot tree')
+```
+:::
+
+You can view all the task details in the [WebApp](../webapp/webapp_overview.md).
+
+![Experiment scalars](../img/examples_xgboost_metric_scalars.png)
+
+## Automatic Logging Control
+By default, when ClearML is integrated into your XGBoost script, it captures models and
+scalars. However, you may want more control over what your experiment logs.
+
+To control a task's framework logging, use the `auto_connect_frameworks` parameter of [`Task.init()`](../references/sdk/task.md#taskinit).
+Completely disable all automatic logging by setting the parameter to `False`. For finer-grained control of logged
+frameworks, pass a dictionary of framework-boolean pairs.
+
+For example:
+
+```python
+auto_connect_frameworks={
+    'xgboost': False, 'catboost': False, 'tensorflow': False, 'tensorboard': False,
+    'pytorch': True, 'scikit': True, 'fastai': True, 'lightgbm': False,
+    'hydra': True, 'detect_repository': True, 'tfdefines': True, 'joblib': True,
+    'megengine': True, 'jsonargparse': True
+}
+```
+
+You can also pass wildcards as dictionary values, so ClearML logs a model created by a framework only if its local
+path matches at least one wildcard.
+
+For example, in the code below, ClearML logs XGBoost models only if their paths have the `.pt` extension. The values of
+unspecified frameworks default to `True`, so all of their models are automatically logged.
+
+```python
+auto_connect_frameworks={'xgboost': '*.pt'}
+```
+
+## Manual Logging
+To augment its automatic logging, ClearML also provides an explicit logging interface.
+
+For more information about explicitly logging information to a ClearML Task, see:
+* [Models](../clearml_sdk/model_sdk.md#manually-logging-models)
+* [Configuration](../clearml_sdk/task_sdk.md#configuration) (e.g. parameters, configuration files)
+* [Artifacts](../clearml_sdk/task_sdk.md#artifacts) (e.g. output files or Python objects created by a task)
+* [Scalars](../clearml_sdk/task_sdk.md#scalars)
+* [Text/Plots/Debug Samples](../fundamentals/logger.md#manual-reporting)
+
+See the [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
+
+## Examples
+
+Take a look at ClearML's XGBoost examples. The examples use XGBoost and ClearML in different configurations with
+additional tools, like Matplotlib and scikit-learn:
+* [XGBoost Metrics](../guides/frameworks/xgboost/xgboost_metrics.md) - Demonstrates ClearML automatic logging of XGBoost models and plots
+* [XGBoost and scikit-learn](../guides/frameworks/xgboost/xgboost_sample.md) - Demonstrates ClearML automatic logging of XGBoost scalars and models
+
+## Remote Execution
+ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
+uncommitted changes, etc.). The [ClearML Agent](../clearml_agent) listens to designated queues, and when a task is enqueued,
+the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
+experiment manager.
+
+Deploy a ClearML Agent onto any machine (e.g. a cloud VM, a local GPU machine, your own laptop) by running the
+following command on it:
+
+```commandline
+clearml-agent daemon --queue [--docker]
+```
+
+Use the ClearML [Autoscalers](../cloud_autoscaling/autoscaling_overview.md) to help you manage cloud workloads in the
+cloud of your choice (AWS, GCP, Azure) and automatically deploy ClearML agents: the autoscaler automatically spins up
+and shuts down instances as needed, according to a resource budget that you set.
+
+### Cloning, Editing, and Enqueuing
+
+![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif)
+
+Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task
+with the new configuration on a remote machine:
+
+* Clone the experiment
+* Edit the hyperparameters and/or other details
+* Enqueue the task
+
+The ClearML Agent executing the task will use the new values to [override any hard-coded values](../clearml_agent).
+
+### Executing a Task Remotely
+
+To execute a task remotely from code, add [`Task.execute_remotely()`](../references/sdk/task.md#execute_remotely)
+to your script. This method stops the current local execution of the task and enqueues it to a specified queue, where
+a remote machine re-runs it.
+
+```python
+# If executed locally, the process terminates and a copy is executed by an agent instead
+task.execute_remotely(queue_name='default', exit_process=True)
+```
+
+## Hyperparameter Optimization
+Use ClearML’s [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find
+the hyperparameter values that yield the best-performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md)
+for more information.
diff --git a/sidebars.js b/sidebars.js
index 2a197666..2e832a2c 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -60,7 +60,7 @@ module.exports = {
 {'CLI Tools': ['apps/clearml_session', 'apps/clearml_task', 'apps/clearml_param_search']},
 {'Integrations': [
 'guides/frameworks/autokeras/integration_autokeras',
- 'guides/frameworks/catboost/catboost', 'integrations/click', 'guides/frameworks/fastai/fastai_with_tensorboard',
+ 'integrations/catboost', 'integrations/click', 'guides/frameworks/fastai/fastai_with_tensorboard',
 'integrations/hydra', 'guides/frameworks/keras/keras_tensorboard',
 'guides/frameworks/tensorflow/integration_keras_tuner',
 'guides/frameworks/lightgbm/lightgbm_example', 'guides/frameworks/matplotlib/matplotlib_example',
@@ -69,7 +69,7 @@ module.exports = {
 {'PyTorch Ignite':['guides/frameworks/pytorch_ignite/integration_pytorch_ignite', 'guides/frameworks/pytorch_ignite/pytorch_ignite_mnist']},
 'guides/frameworks/pytorch_lightning/pytorch_lightning_example', 'guides/frameworks/scikit-learn/sklearn_joblib_example',
 'guides/frameworks/pytorch/pytorch_tensorboard', 'guides/frameworks/tensorboardx/tensorboardx', 'guides/frameworks/tensorflow/tensorflow_mnist',
- 'integrations/seaborn', 'guides/frameworks/xgboost/xgboost_metrics', 'integrations/yolov5', 'integrations/yolov8'
+ 'integrations/seaborn', 'integrations/xgboost', 'integrations/yolov5', 'integrations/yolov8'
 ]
 },
 'integrations/storage',