---
title: scikit-learn
---

:::tip
If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup 
instructions.
:::

ClearML integrates seamlessly with [scikit-learn](https://scikit-learn.org/stable/), automatically logging models created
with `joblib`.

Add two lines of code to your scikit-learn script:

```python
from clearml import Task

task = Task.init(task_name="<task_name>", project_name="<project_name>")
```

And that's it! This creates a [ClearML Task](../fundamentals/task.md) which captures: 
* Source code and uncommitted changes
* Installed packages
* Joblib model files 
* Console output
* General details such as machine details, runtime, creation date, etc.
* Hyperparameters created with standard Python packages (e.g. argparse, click, Python Fire, etc.)
* And more

You can view all the task details in the [WebApp](../webapp/webapp_exp_track_visual.md). 
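
As a rough illustration of where the two lines sit in a training script, the sketch below trains a small classifier and 
saves it with `joblib.dump()`. The Iris dataset, the model choice, the file name, the project and task names, and the 
`train_and_save` function itself are assumptions made for this example, not part of ClearML.

```python
def train_and_save(model_path="iris_model.pkl"):
    """Illustrative only: train a small classifier and save it with joblib."""
    # Imports live inside the function so the snippet can be loaded without
    # clearml, joblib, or scikit-learn installed; move them to the top of a real script.
    import joblib
    from clearml import Task
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # The integration itself; project and task names are placeholders.
    task = Task.init(project_name="examples", task_name="sklearn joblib sketch")

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)

    # Printed text ends up in the task's console output.
    print(f"test accuracy: {model.score(X_test, y_test):.3f}")

    # Saving through joblib is the step described above as being logged automatically.
    joblib.dump(model, model_path)
    return task
```

Calling `train_and_save()` requires a configured ClearML environment, as described in the setup instructions linked at 
the top of this page.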

## Automatic Logging Control 
By default, when ClearML is integrated into your scikit-learn script, it captures models and 
scalars. However, you may want more control over what your experiment logs.

To control a task's framework logging, use the `auto_connect_frameworks` parameter of [`Task.init()`](../references/sdk/task.md#taskinit). 
Completely disable all automatic logging by setting the parameter to `False`. For finer-grained control of logged 
frameworks, pass a dictionary of framework-boolean pairs.

For example:

```python
auto_connect_frameworks={
   'joblib': False, 'xgboost': True, 'catboost': True, 'tensorflow': True, 'tensorboard': True, 
   'pytorch': True, 'scikit': True, 'fastai': True, 'lightgbm': False,
   'hydra': True, 'detect_repository': True, 'tfdefines': True, 'megengine': True
}
```

You can also input wildcards as dictionary values, so ClearML will log a model created by a framework only if its local 
path matches at least one wildcard. 

For example, in the code below, ClearML will log joblib models only if their paths have the `.pkl` extension. The 
values of unspecified frameworks default to `True`, so all their models are automatically logged.

```python
auto_connect_frameworks={'joblib': '*.pkl'}
```
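
The rules above (a boolean per framework, a wildcard instead of a boolean, and `True` for anything unspecified) can be 
sketched in plain Python. This is only a mental model: `would_log_model` is a hypothetical helper, matching with the 
standard library's `fnmatch` is an assumption, and accepting a list of wildcards is inferred from the "at least one 
wildcard" wording. ClearML's own matching may differ in detail.

```python
from fnmatch import fnmatchcase


def would_log_model(framework, model_path, auto_connect_frameworks):
    """Hypothetical helper: apply the rules described above to one model file."""
    # Frameworks missing from the dictionary default to True (log everything).
    setting = auto_connect_frameworks.get(framework, True)
    if isinstance(setting, bool):
        return setting
    # A single wildcard string is treated as a one-item list of wildcards.
    patterns = [setting] if isinstance(setting, str) else list(setting)
    # The model is logged if its path matches at least one wildcard.
    return any(fnmatchcase(model_path, pattern) for pattern in patterns)


config = {"joblib": "*.pkl", "lightgbm": False}
print(would_log_model("joblib", "models/iris.pkl", config))       # True
print(would_log_model("joblib", "models/iris.joblib", config))    # False
print(would_log_model("xgboost", "models/booster.json", config))  # True
print(would_log_model("lightgbm", "models/gbm.txt", config))      # False
```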

## Manual Logging
To augment its automatic logging, ClearML also provides an explicit logging interface.

See more information about explicitly logging information to a ClearML Task:
* [Models](../clearml_sdk/model_sdk.md#manually-logging-models)
* [Configuration](../clearml_sdk/task_sdk.md#configuration) (e.g. parameters, configuration files)
* [Artifacts](../clearml_sdk/task_sdk.md#artifacts) (e.g. output files or python objects created by a task)
* [Scalars](../clearml_sdk/task_sdk.md#scalars) 
* [Text/Plots/Debug Samples](../fundamentals/logger.md#manual-reporting)

See the [Explicit Reporting Tutorial](../guides/reporting/explicit_reporting.md).
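
As a minimal sketch of explicit logging, the function below reports a few evaluation metrics as scalars and stores 
them as an artifact. It assumes `task` is the object returned by `Task.init()`. The `report_evaluation` name, the 
`"evaluation"` title, and the `"evaluation_metrics"` artifact name are illustrative choices, not ClearML conventions.

```python
def report_evaluation(task, metrics, iteration=0):
    """Illustrative helper: report each metric as a scalar, then keep the dict as an artifact."""
    logger = task.get_logger()
    for series, value in metrics.items():
        # Each metric becomes one series in a scalar plot titled "evaluation".
        logger.report_scalar(
            title="evaluation", series=series, value=value, iteration=iteration
        )
    # Store the raw numbers alongside the task as a Python-object artifact.
    task.upload_artifact(name="evaluation_metrics", artifact_object=dict(metrics))
```

For example, after scoring a fitted scikit-learn model, you might call 
`report_evaluation(task, {"accuracy": 0.93, "f1": 0.91})`, where the numbers are placeholders.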

## Examples 

Take a look at ClearML's scikit-learn examples. The examples use scikit-learn and ClearML in different configurations with 
additional tools, like Matplotlib: 
* [scikit-learn with Joblib](../guides/frameworks/scikit-learn/sklearn_joblib_example.md) - Demonstrates ClearML automatically logging the models created with joblib and a scatter plot created by Matplotlib.
* [scikit-learn with Matplotlib](../guides/frameworks/scikit-learn/sklearn_matplotlib_example.md) - Demonstrates ClearML automatically logging scatter diagrams created with Matplotlib.


## Remote Execution
ClearML logs all the information required to reproduce an experiment on a different machine (installed packages, 
uncommitted changes, etc.). The [ClearML Agent](../clearml_agent) listens to designated queues, and when a task is 
enqueued, the agent pulls it, recreates its execution environment, and runs it, reporting its scalars, plots, etc. to the 
experiment manager.

Deploy a ClearML Agent onto any machine (e.g. a cloud VM, a local GPU machine, your own laptop) by simply running the 
following command on it:

```commandline
clearml-agent daemon --queue <queues_to_listen_to> [--docker]
```

Use the ClearML [Autoscalers](../cloud_autoscaling/autoscaling_overview.md) to manage workloads in the 
cloud of your choice (AWS, GCP, Azure) and automatically deploy ClearML agents: the autoscaler spins up 
and shuts down instances as needed, according to a resource budget that you set.

### Cloning, Editing, and Enqueuing

![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif)

Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task 
with the new configuration on a remote machine:

* Clone the experiment
* Edit the hyperparameters and/or other details
* Enqueue the task

The ClearML Agent executing the task will use the new values to [override any hardcoded values](../clearml_agent).
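
The same clone, edit, and enqueue steps can also be scripted with the SDK. The sketch below is one possible way to do 
it, not the only one: `clone_and_enqueue` is a hypothetical helper, and the parameter names you pass in `overrides` 
(for example `"Args/C"`) depend on how your own script defines its hyperparameters.

```python
def clone_and_enqueue(source_task_id, overrides, queue_name="default"):
    """Illustrative helper mirroring the three UI steps: clone, edit, enqueue."""
    # Imported here so the snippet can be loaded without clearml installed.
    from clearml import Task

    source = Task.get_task(task_id=source_task_id)

    # 1. Clone the experiment.
    cloned = Task.clone(source_task=source, name=f"{source.name} (clone)")

    # 2. Edit the hyperparameters; the names are whatever your task actually uses.
    for name, value in overrides.items():
        cloned.set_parameter(name, value)

    # 3. Enqueue the task so that an agent listening to the queue picks it up.
    Task.enqueue(task=cloned, queue_name=queue_name)
    return cloned
```

A call might look like `clone_and_enqueue("<task_id>", {"Args/C": 0.5})`, with the task ID copied from the WebApp.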

### Executing a Task Remotely

To run a task remotely from within your code, add [`Task.execute_remotely()`](../references/sdk/task.md#execute_remotely) 
to your script. This method stops the current local execution of the task, and then enqueues it to a specified queue to 
re-run it on a remote machine.

```python
# If executed locally, process will terminate, and a copy will be executed by an agent instead
task.execute_remotely(queue_name='default', exit_process=True)
```