2023-08-15 16:01:19 +00:00
---
2023-11-05 08:30:37 +00:00
title: scikit-learn
2023-08-15 16:01:19 +00:00
---
:::tip
If you are not already using ClearML, see [Getting Started ](../getting_started/ds/ds_first_steps.md ) for setup
instructions.
:::
2023-11-05 08:30:37 +00:00
ClearML integrates seamlessly with [scikit-learn ](https://scikit-learn.org/stable/ ), automatically logging models created
2023-08-15 16:01:19 +00:00
with `joblib` .
All you have to do is simply add two lines of code to your scikit-learn script:
```python
from clearml import Task
2024-01-10 12:40:19 +00:00
2023-08-15 16:01:19 +00:00
task = Task.init(task_name="< task_name > ", project_name="< project_name > ")
```
2023-12-03 12:27:46 +00:00
And that's it! This creates a [ClearML Task ](../fundamentals/task.md ) which captures:
2023-08-15 16:01:19 +00:00
* Source code and uncommitted changes
* Installed packages
* Joblib model files
* Console output
* General details such as machine details, runtime, creation date etc.
* Hyperparameters created with standard python packages (e.g. argparse, click, Python Fire, etc.)
* And more
You can view all the task details in the [WebApp ](../webapp/webapp_exp_track_visual.md ).
## Automatic Logging Control
By default, when ClearML is integrated into your scikit-learn script, it captures models, and
scalars. But, you may want to have more control over what your experiment logs.
To control a task's framework logging, use the `auto_connect_frameworks` parameter of [`Task.init()` ](../references/sdk/task.md#taskinit ).
Completely disable all automatic logging by setting the parameter to `False` . For finer grained control of logged
frameworks, input a dictionary, with framework-boolean pairs.
For example:
```python
auto_connect_frameworks={
'joblib': False, 'xgboost': True, 'catboost': True, 'tensorflow': True, 'tensorboard': True,
'pytorch': True, 'scikit': True, 'fastai': True, 'lightgbm': False,
2023-11-23 13:05:19 +00:00
'hydra': True, 'detect_repository': True, 'tfdefines': True, 'megengine': True
2023-08-15 16:01:19 +00:00
}
```
You can also input wildcards as dictionary values, so ClearML will log a model created by a framework only if its local
path matches at least one wildcard.
For example, in the code below, ClearML will log joblib models only if their paths have the `.pkl` extension. The
unspecified frameworks' values default to true so all their models are automatically logged.
```python
auto_connect_frameworks={'joblib' : '*.pkl'}
```
## Manual Logging
To augment its automatic logging, ClearML also provides an explicit logging interface.
See more information about explicitly logging information to a ClearML Task:
* [Models ](../clearml_sdk/model_sdk.md#manually-logging-models )
* [Configuration ](../clearml_sdk/task_sdk.md#configuration ) (e.g. parameters, configuration files)
* [Artifacts ](../clearml_sdk/task_sdk.md#artifacts ) (e.g. output files or python objects created by a task)
* [Scalars ](../clearml_sdk/task_sdk.md#scalars )
* [Text/Plots/Debug Samples ](../fundamentals/logger.md#manual-reporting )
See [Explicit Reporting Tutorial ](../guides/reporting/explicit_reporting.md ).
## Examples
Take a look at ClearML's scikit-learn examples. The examples use scikit-learn and ClearML in different configurations with
additional tools, like Matplotlib:
2023-11-05 08:30:37 +00:00
* [scikit-learn with Joblib ](../guides/frameworks/scikit-learn/sklearn_joblib_example.md ) - Demonstrates ClearML automatically logging the models created with joblib and a scatter plot created by Matplotlib.
* [scikit-learn with Matplotlib ](../guides/frameworks/scikit-learn/sklearn_matplotlib_example.md ) - Demonstrates ClearML automatically logging scatter diagrams created with Matplotlib.
2023-08-15 16:01:19 +00:00
## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages,
uncommitted changes etc.). The [ClearML Agent ](../clearml_agent ) listens to designated queues and when a task is enqueued,
the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the
experiment manager.
Deploy a ClearML Agent onto any machine (e.g. a cloud VM, a local GPU machine, your own laptop) by simply running the
following command on it:
```commandline
clearml-agent daemon --queue < queues_to_listen_to > [--docker]
```
2023-09-11 10:33:30 +00:00
Use the ClearML [Autoscalers ](../cloud_autoscaling/autoscaling_overview.md ) to help you manage cloud workloads in the
2023-08-15 16:01:19 +00:00
cloud of your choice (AWS, GCP, Azure) and automatically deploy ClearML agents: the autoscaler automatically spins up
and shuts down instances as needed, according to a resource budget that you set.
### Cloning, Editing, and Enqueuing
![Cloning, editing, enqueuing gif ](../img/gif/integrations_yolov5.gif )
Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task
with the new configuration on a remote machine:
* Clone the experiment
* Edit the hyperparameters and/or other details
* Enqueue the task
The ClearML Agent executing the task will use the new values to [override any hard coded values ](../clearml_agent ).
### Executing a Task Remotely
You can set a task to be executed remotely programmatically by adding [`Task.execute_remotely()` ](../references/sdk/task.md#execute_remotely )
to your script. This method stops the current local execution of the task, and then enqueues it to a specified queue to
re-run it on a remote machine.
```python
# If executed locally, process will terminate, and a copy will be executed by an agent instead
task.execute_remotely(queue_name='default', exit_process=True)
```