Rewrite scikit-learn integration page (#647)

2025-06-26 18:17:44 +00:00 · 2023-08-15 19:01:19 +03:00 · 2023-08-15 19:01:19 +03:00 · 4a2af6d084
commit 4a2af6d084
parent 6eefec2325
3 changed files with 123 additions and 4 deletions
--- a/docs/guides/frameworks/scikit-learn/sklearn_joblib_example.md
+++ b/docs/guides/frameworks/scikit-learn/sklearn_joblib_example.md
@ -1,6 +1,5 @@
 ---
-title: Scikit-Learn
-displayed_sidebar: mainSidebar
+title: Scikit-Learn with Joblib 
 ---

 The [sklearn_joblib_example.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_joblib_example.py) 
--- a/docs/integrations/scikit_learn.md
+++ b/docs/integrations/scikit_learn.md
@ -0,0 +1,119 @@
+---
+title: Scikit-Learn
+---
+
+:::tip
+If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup 
+instructions.
+:::
+
+ClearML integrates seamlessly with [Scikit-Learn](https://scikit-learn.org/stable/), automatically logging models created
+with `joblib`.
+
+All you have to do is add two lines of code to your scikit-learn script:
+
+```python
+from clearml import Task
+task = Task.init(task_name="<task_name>", project_name="<project_name>")
+```
+
+And that’s it! This creates a [ClearML Task](../fundamentals/task.md) which captures: 
+* Source code and uncommitted changes
+* Installed packages
+* Joblib model files 
+* Console output
+* General details such as machine details, runtime, creation date, etc.
+* Hyperparameters created with standard python packages (e.g. argparse, click, Python Fire, etc.)
+* And more
+
+You can view all the task details in the [WebApp](../webapp/webapp_exp_track_visual.md). 
+
+## Automatic Logging Control 
+By default, when ClearML is integrated into your scikit-learn script, it captures models and 
+scalars. However, you may want more control over what your experiment logs.
+
+To control a task's framework logging, use the `auto_connect_frameworks` parameter of [`Task.init()`](../references/sdk/task.md#taskinit). 
+Completely disable all automatic logging by setting the parameter to `False`. For finer-grained control of logged 
+frameworks, pass a dictionary of framework-boolean pairs.
+
+For example:
+
+```python
+auto_connect_frameworks={
+   'joblib': False, 'xgboost': True, 'catboost': True, 'tensorflow': True, 'tensorboard': True, 
+   'pytorch': True, 'scikit': True, 'fastai': True, 'lightgbm': False,
+   'hydra': True, 'detect_repository': True, 'tfdefines': True, 
+   'megengine': True, 'jsonargparse': True
+}
+```
+
+You can also pass wildcards as dictionary values, so that ClearML logs a model created by a framework only if its local 
+path matches at least one wildcard. 
+
+For example, in the code below, ClearML will log joblib models only if their paths have the `.pkl` extension. The 
+unspecified frameworks' values default to `True`, so all their models are automatically logged.
+
+```python
+auto_connect_frameworks={'joblib': '*.pkl'}
+```
+
+## Manual Logging
+To augment its automatic logging, ClearML also provides an explicit logging interface.
+
+For more information about explicitly logging to a ClearML Task, see:
+* [Models](../clearml_sdk/model_sdk.md#manually-logging-models)
+* [Configuration](../clearml_sdk/task_sdk.md#configuration) (e.g. parameters, configuration files)
+* [Artifacts](../clearml_sdk/task_sdk.md#artifacts) (e.g. output files or python objects created by a task)
+* [Scalars](../clearml_sdk/task_sdk.md#scalars) 
+* [Text/Plots/Debug Samples](../fundamentals/logger.md#manual-reporting)
+
+See [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
+
+## Examples 
+
+Take a look at ClearML's scikit-learn examples. The examples use scikit-learn and ClearML in different configurations with 
+additional tools, like Matplotlib: 
+* [Scikit-Learn with Joblib](../guides/frameworks/scikit-learn/sklearn_joblib_example.md) - Demonstrates ClearML automatically logging the models created with joblib and a scatter plot created by Matplotlib.
+* [Scikit-Learn with Matplotlib](../guides/frameworks/scikit-learn/sklearn_matplotlib_example.md) - Demonstrates ClearML automatically logging scatter diagrams created with Matplotlib.
+
+
+## Remote Execution
+ClearML logs all the information required to reproduce an experiment on a different machine (installed packages, 
+uncommitted changes, etc.). The [ClearML Agent](../clearml_agent) listens to designated queues, and when a task is enqueued, 
+the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the 
+experiment manager.
+
+Deploy a ClearML Agent onto any machine (e.g. a cloud VM, a local GPU machine, your own laptop) by running the 
+following command on it:
+
+```commandline
+clearml-agent daemon --queue <queues_to_listen_to> [--docker]
+```
+
+Use the ClearML [Autoscalers](../cloud_autoscaling/autoscaling_overview.md) to help you manage cloud workloads in the 
+cloud of your choice (AWS, GCP, Azure) and automatically deploy ClearML agents: the autoscaler automatically spins up 
+and shuts down instances as needed, according to a resource budget that you set.
+
+### Cloning, Editing, and Enqueuing
+
+![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif)
+
+Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task 
+with the new configuration on a remote machine:
+
+* Clone the experiment
+* Edit the hyperparameters and/or other details
+* Enqueue the task
+
+The ClearML Agent executing the task will use the new values to [override any hard-coded values](../clearml_agent).
+
+### Executing a Task Remotely
+
+You can programmatically set a task to be executed remotely by adding [`Task.execute_remotely()`](../references/sdk/task.md#execute_remotely) 
+to your script. This method stops the current local execution of the task, and then enqueues it to a specified queue to 
+re-run it on a remote machine.
+
+```python
+# If executed locally, the process will terminate, and a copy will be executed by an agent instead
+task.execute_remotely(queue_name='default', exit_process=True)
+```
--- a/sidebars.js
+++ b/sidebars.js
@ -67,9 +67,10 @@ module.exports = {
                'integrations/megengine', 'integrations/openmmv', 'integrations/optuna',
                'integrations/python_fire', 'integrations/pytorch',
                'integrations/ignite',
-                'guides/frameworks/pytorch_lightning/pytorch_lightning_example', 'guides/frameworks/scikit-learn/sklearn_joblib_example',
+                'guides/frameworks/pytorch_lightning/pytorch_lightning_example',
+                'integrations/scikit_learn', 'integrations/seaborn',
                'integrations/tensorboard', 'integrations/tensorboardx', 'integrations/tensorflow',
-                'integrations/seaborn', 'integrations/xgboost', 'integrations/yolov5', 'integrations/yolov8'
+                 'integrations/xgboost', 'integrations/yolov5', 'integrations/yolov8'
            ]
        },
        'integrations/storage',