---
title: Audio Classification - Jupyter Notebooks
---
The example [audio_classifier_UrbanSound8K.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/audio/audio_classifier_UrbanSound8K.ipynb) demonstrates integrating **ClearML** into a Jupyter Notebook which uses PyTorch, TensorBoard, and TorchVision to train a neural network on the UrbanSound8K dataset for audio classification. The example calls TensorBoard methods in training and testing to report scalars, audio debug samples, and spectrogram visualizations. The spectrogram visualizations are plotted by calling Matplotlib methods. In the example, we also demonstrate connecting parameters to a Task and logging them. When the script runs, it creates an experiment named `audio classifier`, which is associated with the `Audio Example` project.
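The following is a minimal sketch of how such a notebook attaches to **ClearML**. The project and experiment names are taken from the paragraph above; treat the rest as an assumption rather than the notebook's exact code.

```python
from clearml import Task

# creates the `audio classifier` experiment in the `Audio Example` project;
# everything reported afterwards (TensorBoard, Matplotlib) is captured by this Task
task = Task.init(project_name='Audio Example', task_name='audio classifier')
```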
## Scalars
The accuracy, learning rate, and training loss scalars are automatically logged, along with the resource utilization plots (titled **:monitor: machine**), and appear in **RESULTS** **>** **SCALARS**.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_03.png)
## Debug samples
The audio samples and spectrogram plots are automatically logged and appear in **RESULTS** **>** **DEBUG SAMPLES**.
### Audio samples
![image](../../../../../img/examples_audio_classification_UrbanSound8K_06.png)
By double clicking a thumbnail, you can play an audio sample.
### Spectrogram visualizations
![image](../../../../../img/examples_audio_classification_UrbanSound8K_04.png)
By double clicking a thumbnail, you can view a spectrogram plot in the image viewer.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_05.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. A parameter dictionary is logged by connecting it to the Task using
a call to the [Task.connect](../../../../../references/sdk/task.md#connect) method.
```python
configuration_dict = {'number_of_epochs': 10, 'batch_size': 4, 'dropout': 0.25, 'base_lr': 0.001}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
Parameter dictionaries appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **General**.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_01.png)
TensorFlow Definitions appear in the **TF_DEFINE** subsection.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_01a.png)
## Log
Text printed to the console for training progress, as well as all other console output, appears in **RESULTS** **>** **LOG**.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_02.png)

---
title: Audio Preprocessing - Jupyter Notebook
---
The example [audio_preprocessing_example.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/audio/audio_preprocessing_example.ipynb)
demonstrates integrating **ClearML** into a Jupyter Notebook which uses PyTorch and preprocesses audio samples. **ClearML** automatically logs spectrogram visualizations reported by calling Matplotlib methods, and audio samples reported by calling TensorBoard methods. In the example, we also demonstrate connecting parameters to a Task and logging them. When the script runs, it creates an experiment named `data pre-processing`, which is associated with the `Audio Example` project.
## Plots
**ClearML** automatically logs the waveform plots, which the example reports by calling Matplotlib methods. These appear in **RESULTS** **>** **PLOTS**.
![image](../../../../../img/examples_audio_preprocessing_example_08.png)
## Debug samples
**ClearML** automatically logs the audio samples which the example reports by calling TensorBoard methods, and the spectrogram visualizations reported by calling Matplotlib methods. They appear in **RESULTS** **>** **DEBUG SAMPLES**.
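The following is a rough sketch of the kind of reporting that gets captured as debug samples. The file name and tags are hypothetical, and the notebook's actual code may differ.

```python
import matplotlib.pyplot as plt
import torchaudio
from torch.utils.tensorboard import SummaryWriter

# 'sample.wav' is a placeholder; any audio file from the dataset works
waveform, sample_rate = torchaudio.load('sample.wav')

# audio written to TensorBoard is captured by ClearML as an audio debug sample
writer = SummaryWriter('runs')
writer.add_audio('audio sample', waveform[:1], 0, sample_rate=sample_rate)

# a Matplotlib spectrogram is captured as a debug sample as well
plt.specgram(waveform[0].numpy(), Fs=sample_rate)
plt.show()
```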
### Audio samples
You can play the audio samples by double clicking the audio thumbnail.
![image](../../../../../img/examples_audio_preprocessing_example_03.png)
### Spectrogram visualizations
![image](../../../../../img/examples_audio_preprocessing_example_06.png)
![image](../../../../../img/examples_audio_preprocessing_example_06a.png)
You can view the spectrogram visualizations in the **ClearML Web UI** image viewer.
![image](../../../../../img/examples_audio_preprocessing_example_07.png)

---
title: Image Hyperparameter Optimization - Jupyter Notebook
---
[hyperparameter_search.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/hyperparameter_search.ipynb)
demonstrates integrating **ClearML** into a Jupyter Notebook which performs automated hyperparameter optimization. This
is an example of **ClearML** [automation](../../../../../references/sdk/automation_controller_pipelinecontroller.md). It creates a **ClearML**
[HyperParameterOptimizer](../../../../../references/sdk/hpo_optimization_hyperparameteroptimizer.md)
object, which is a search controller. The search controller's search strategy optimizer is [OptimizerBOHB](../../../../../references/sdk/hpo_hpbandster_bandster_optimizerbohb.md).
The example maximizes total accuracy by finding an optimal batch size, base learning rate, and dropout. **ClearML**
automatically logs the optimization's top performing experiments.
The experiment whose hyperparameters are optimized is named `image_classification_CIFAR10`. It is created by running another
**ClearML** example, [image_classification_CIFAR10.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/image_classification_CIFAR10.ipynb), which must run before `hyperparameter_search.ipynb`.
When `hyperparameter_search.ipynb` runs, it creates an experiment named `Hyper-Parameter Optimization` which is associated
with the `Hyper-Parameter Search` project.
The optimizer Task, `Hyper-Parameter Optimization`, and the experiments appear individually in the **ClearML Web UI**.
## Optimizer Task
### Scalars
Scalars for total accuracy and remaining budget by iteration, and a plot of total accuracy by iteration, appear in **RESULTS** **>** **SCALARS**. Remaining budget indicates the percentage of the total iterations for all jobs that is still left before that total is reached.
These scalars are reported automatically by **ClearML** from `HyperParameterOptimizer` when it runs.
![image](../../../../../img/examples_hyperparameter_search_04.png)
### Plots
A plot of the optimization of total accuracy by job appears in **RESULTS** **>** **PLOTS**.
This is also reported automatically by **ClearML** when `HyperParameterOptimizer` runs.
![image](../../../../../img/examples_hyperparameter_search_05.png)
### Hyperparameters
`HyperParameterOptimizer` hyperparameters, including the optimizer parameters, appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS**.
These hyperparameters are those in the optimizer Task, where the `HyperParameterOptimizer` object is created.
```python
from clearml.automation import (HyperParameterOptimizer, UniformIntegerParameterRange,
                                UniformParameterRange)
from clearml.automation.hpbandster import OptimizerBOHB

optimizer = HyperParameterOptimizer(
    base_task_id=TEMPLATE_TASK_ID,  # This is the experiment we want to optimize
    # here we define the hyper-parameters to optimize
    hyper_parameters=[
        UniformIntegerParameterRange('number_of_epochs', min_value=5, max_value=15, step_size=1),
        UniformIntegerParameterRange('batch_size', min_value=2, max_value=12, step_size=2),
        UniformParameterRange('dropout', min_value=0, max_value=0.5, step_size=0.05),
        UniformParameterRange('base_lr', min_value=0.0005, max_value=0.01, step_size=0.0005),
    ],
    # this is the objective metric we want to maximize/minimize
    objective_metric_title='accuracy',
    objective_metric_series='total',
    objective_metric_sign='max',  # maximize or minimize the objective metric
    max_number_of_concurrent_tasks=3,  # number of concurrent experiments
    # setting the optimizer - clearml supports GridSearch, RandomSearch, or OptimizerBOHB
    optimizer_class=OptimizerBOHB,  # can be replaced with RandomSearch or GridSearch
    execution_queue='default',  # queue to schedule the experiments for execution
    optimization_time_limit=30.,  # time limit for each experiment (optional, ignored by OptimizerBOHB)
    pool_period_min=1,  # check the experiments every x minutes
    # set the maximum number of experiments for the optimization.
    # OptimizerBOHB sets the total number of iterations as total_max_jobs * max_iteration_per_job
    total_max_jobs=12,
    # setting OptimizerBOHB configuration (ignored by other optimizers)
    min_iteration_per_job=15000,  # minimum number of iterations per experiment, till early stopping
    max_iteration_per_job=150000,  # maximum number of iterations per experiment
)
```
![image](../../../../../img/examples_hyperparameter_search_01.png)
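Once created, the search controller is typically started and polled until it finishes. A minimal sketch, assuming the `optimizer` object above (the report period and `top_k` values are arbitrary choices):

```python
optimizer.set_report_period(1)  # report the optimization state every minute
optimizer.start()               # start the optimization (runs in the background)
optimizer.wait()                # block until the optimization completes
top_experiments = optimizer.get_top_experiments(top_k=3)  # best performing Tasks
optimizer.stop()                # make sure the background optimization is stopped
```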
### Log
All console output from `Hyper-Parameter Optimization` appears in **RESULTS** **>** **LOG**.
![image](../../../../../img/examples_hyperparameter_search_03.png)
## Experiments comparison
**ClearML** automatically logs each job, meaning each experiment that executes with a set of hyperparameters, separately. Each appears as an individual experiment in the **ClearML Web UI**, where the Task name is `image_classification_CIFAR10` with the hyperparameter values appended.
For example:
`image_classification_CIFAR10: base_lr=0.0075 batch_size=12 dropout=0.05 number_of_epochs=6`
Use the **ClearML Web UI** [experiment comparison](../../../../../webapp/webapp_exp_comparing.md) to visualize the following:
* Side by side hyperparameter value comparison
* Metric comparison by hyperparameter
* Scalars by specific values and series
* Plots
* Debug images
### Side by side hyperparameter value comparison
In the experiment comparison window, in the **HYPER PARAMETERS** tab, select **Values** in the list (to the right of **+ Add Experiment**), and hyperparameter differences appear with a different background color.
![image](../../../../../img/examples_hyperparameter_search_06.png)
### Metric comparison by hyperparameter
Select **Parallel Coordinates** in the list, click a **Performance Metric**, and then select the checkboxes of the hyperparameters.
![image](../../../../../img/examples_hyperparameter_search_07.png)
### Scalar values comparison
In the **SCALARS** tab, select **Last Values**, **Min Values**, or **Max Values**. Value differences appear with a different background color.
![image](../../../../../img/examples_hyperparameter_search_09.png)
### Scalar series comparison
Select **Graph**, and the scalar series for the jobs appear, where each scalar plot shows the series for all jobs.
![image](../../../../../img/examples_hyperparameter_search_08.png)
### Debug samples comparison
In the **DEBUG SAMPLES** tab, debug images appear.
![image](../../../../../img/examples_hyperparameter_search_10.png)

---
title: Image Classification - Jupyter Notebook
---
The example [image_classification_CIFAR10.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/image_classification_CIFAR10.ipynb)
demonstrates integrating **ClearML** into a Jupyter Notebook, which uses PyTorch, TensorBoard, and TorchVision to train a
neural network on the CIFAR-10 dataset for image classification. **ClearML** automatically logs the example script's
calls to TensorBoard methods in training and testing which report scalars and image debug samples, as well as the model
and console log. In the example, we also demonstrate connecting parameters to a Task and logging them. When the script runs,
it creates an experiment named `image_classification_CIFAR10` which is associated with the `Image Example` project.
Another example optimizes the hyperparameters for this image classification example (see the [Hyperparameter Optimization - Jupyter Notebook](hyperparameter_search.md) documentation page). This image classification example must run before the hyperparameter optimization example.
## Scalars
The accuracy, accuracy per class, and training loss scalars are automatically logged, along with the resource utilization plots (titled **:monitor: machine**), and appear in **RESULTS** **>** **SCALARS**.
![image](../../../../../img/examples_image_classification_CIFAR10_05.png)
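For reference, scalars reach **ClearML** through ordinary TensorBoard calls; a minimal sketch (the tag name is illustrative, not the notebook's exact one):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs')
# once a ClearML Task exists, scalars written to TensorBoard are logged automatically
writer.add_scalar('accuracy/total', 0.85, global_step=1)
```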
## Debug samples
The image samples are automatically logged and appear in **RESULTS** **>** **DEBUG SAMPLES**.
![image](../../../../../img/examples_image_classification_CIFAR10_07.png)
By double clicking a thumbnail, you can view an image sample in the image viewer.
![image](../../../../../img/examples_image_classification_CIFAR10_06.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. A parameter dictionary is logged by connecting it to the Task using
a call to the [Task.connect](../../../../../references/sdk/task.md#connect) method.
```python
configuration_dict = {'number_of_epochs': 3, 'batch_size': 4, 'dropout': 0.25, 'base_lr': 0.001}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
Parameter dictionaries appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **General**.
![image](../../../../../img/examples_image_classification_CIFAR10_01.png)
TensorFlow Definitions appear in the **TF_DEFINE** subsection.
![image](../../../../../img/examples_image_classification_CIFAR10_01a.png)
## Log
Text printed to the console for training progress, as well as all other console output, appears in **RESULTS** **>** **LOG**.
![image](../../../../../img/examples_image_classification_CIFAR10_04.png)

---
title: Tabular Data Downloading and Preprocessing - Jupyter Notebook
---
The [download_and_preprocessing.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/download_and_preprocessing.ipynb) example demonstrates **ClearML** storing preprocessed tabular data as artifacts, and explicitly reporting the tabular data in the **ClearML Web UI**. When the script runs, it creates an experiment named `tabular preprocessing` which is associated with the `Table Example` project.
This tabular data is prepared for another script, [train_tabular_predictor.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/train_tabular_predictor.ipynb), which trains a network with it.
## Artifacts
The example code preprocesses the downloaded data using Pandas DataFrames, and stores it as three artifacts:
* `Categories per column` - Number of unique values per column of data.
* `Outcome dictionary` - Label enumeration for training.
* `Processed data` - A dictionary containing the paths of the training and validation data.
Each artifact is uploaded by calling the [Task.upload_artifact](../../../../../references/sdk/task.md#upload_artifact)
method. Artifacts appear in the **ARTIFACTS** tab.
![image](../../../../../img/download_and_preprocessing_02.png)
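A sketch of how such uploads might look; `column_categories`, `outcome_dict`, and `processed_data_paths` are hypothetical names for the objects the notebook builds before uploading:

```python
# each call stores one object on the Task as a named artifact
task.upload_artifact('Categories per column', artifact_object=column_categories)
task.upload_artifact('Outcome dictionary', artifact_object=outcome_dict)
task.upload_artifact('Processed data', artifact_object=processed_data_paths)
```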
## Plots (tables)
The example code explicitly reports the data in Pandas DataFrames by calling the [Logger.report_table](../../../../../references/sdk/logger.md#report_table)
method.
For example, the raw data is read into a Pandas DataFrame named `train_set`, and the `head` of the DataFrame is reported.
```python
train_set = pd.read_csv(Path(path_to_ShelterAnimal) / 'train.csv')
Logger.current_logger().report_table(title='Trainset - raw', series='pandas DataFrame', iteration=0, table_plot=train_set.head())
```
The tables appear in **RESULTS** **>** **PLOTS**.
![image](../../../../../img/download_and_preprocessing_07.png)
## Hyperparameters
A parameter dictionary is logged by connecting it to the Task using a call to the [Task.connect](../../../../../references/sdk/task.md#connect)
method.
```python
logger = task.get_logger()
configuration_dict = {'test_size': 0.1, 'split_random_state': 0}
configuration_dict = task.connect(configuration_dict)
```
Parameter dictionaries appear in the **General** subsection.
![image](../../../../../img/download_and_preprocessing_01.png)
## Log
Output to the console appears in **RESULTS** **>** **LOG**.
![image](../../../../../img/download_and_preprocessing_06.png)

---
title: Tabular Data Pipeline with Concurrent Steps - Jupyter Notebook
---
This example demonstrates an ML pipeline which preprocesses data in two concurrent steps, trains two networks (where each
network's training depends upon the completion of its own preprocessed data), and picks the best model. It is implemented
using the [automation.controller.PipelineController](../../../../../references/sdk/automation_controller_pipelinecontroller.md)
class.
The pipeline uses four Tasks (each Task is created using a different notebook):
* The pipeline controller Task ([tabular_ml_pipeline.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/tabular_ml_pipeline.ipynb))
* A data preprocessing Task ([preprocessing_and_encoding.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/preprocessing_and_encoding.ipynb))
* A training Task ([train_tabular_predictor.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/train_tabular_predictor.ipynb))
* A better model comparison Task ([pick_best_model.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/pick_best_model.ipynb))
The `automation.controller.PipelineController` class includes functionality to create a pipeline controller, add steps to the pipeline, pass data from one step to another, control the dependencies of a step beginning only after other steps complete, run the pipeline, wait for it to complete, and clean up afterwards.
In this pipeline example, the data preprocessing Task and training Task are each added to the pipeline twice (each is in two steps). When the pipeline runs, the data preprocessing Task and training Task are cloned twice, and the newly cloned Tasks execute. The Task they are cloned from, called the base Task, does not execute. The pipeline controller passes different data to each cloned Task by overriding parameters. In this way, the same Task can run more than once in the pipeline, but with different data.
:::note
The data download Task is not a step in the pipeline; see [download_and_split](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/download_and_split.ipynb).
:::
## Pipeline controller and steps
In this example, a pipeline controller object is created.
```python
from clearml.automation.controller import PipelineController

pipe = PipelineController(default_execution_queue='dan_queue', add_pipeline_tags=True)
```
### Preprocessing step
Two preprocessing nodes are added to the pipeline. These steps will run concurrently.
```python
pipe.add_step(name='preprocessing_1', base_task_project='Tabular Example', base_task_name='tabular preprocessing',
              parameter_override={'General/data_task_id': '39fbf86fc4a341359ac6df4aa70ff91b',
                                  'General/fill_categorical_NA': 'True',
                                  'General/fill_numerical_NA': 'True'})
pipe.add_step(name='preprocessing_2', base_task_project='Tabular Example', base_task_name='tabular preprocessing',
              parameter_override={'General/data_task_id': '39fbf86fc4a341359ac6df4aa70ff91b',
                                  'General/fill_categorical_NA': 'False',
                                  'General/fill_numerical_NA': 'True'})
```
The preprocessing data Task fills in `NaN` values based on the values of the parameters named `fill_categorical_NA`
and `fill_numerical_NA`. It connects a parameter dictionary containing keys with those same names to the Task.
The pipeline overrides the values of those keys when it executes the cloned Tasks of the base Task. In this way,
two sets of data are created in the pipeline.
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">ClearML tracks and reports the preprocessing step</summary>
<div className="cml-expansion-panel-content">
In the preprocessing data Task, the parameter values in ``data_task_id``, ``fill_categorical_NA``, and ``fill_numerical_NA`` are overridden.
```python
configuration_dict = {'data_task_id': '39fbf86fc4a341359ac6df4aa70ff91b',
                      'fill_categorical_NA': True, 'fill_numerical_NA': True}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
**ClearML** tracks and reports each instance of the preprocessing Task.
The raw data appears as a table in **RESULTS** **>** **PLOTS**.
These images are from one of the two preprocessing Tasks.
![image](../../../../../img/preprocessing_and_encoding_02.png)
The data after filling NA values is also reported.
![image](../../../../../img/preprocessing_and_encoding_03.png)
After an outcome dictionary (label enumeration) is created, it appears in **ARTIFACTS** **>** **OTHER** **>** **Outcome Dictionary**.
![image](../../../../../img/preprocessing_and_encoding_04.png)
The training and validation data are labeled with the encoding and reported as tables.
![image](../../../../../img/preprocessing_and_encoding_05.png)
The column categories are created and uploaded as artifacts, which appear in **ARTIFACTS** **>** **OTHER** **>** **Categories per Column**.
![image](../../../../../img/preprocessing_and_encoding_06.png)
Finally, the training data and validation data are stored as artifacts.
![image](../../../../../img/preprocessing_and_encoding_07.png)
</div>
</details>
### Training step
Each training node depends upon the completion of one preprocessing node. The parameter `parents` is a list of step names indicating all steps that must complete before the new step starts. In this case, `preprocessing_1` must complete before `train_1` begins, and `preprocessing_2` must complete before `train_2` begins.
The ID of a Task whose artifact contains a set of preprocessed data for training will be overridden using the `data_task_id` key. Its value takes the form `${<stage-name>.<part-of-Task>}`. In this case, `${preprocessing_1.id}` is the ID of one of the preprocessing node Tasks. In this way, each training Task consumes its own set of data.
```python
pipe.add_step(name='train_1', parents=['preprocessing_1'],
              base_task_project='Tabular Example', base_task_name='tabular prediction',
              parameter_override={'General/data_task_id': '${preprocessing_1.id}'})
pipe.add_step(name='train_2', parents=['preprocessing_2'],
              base_task_project='Tabular Example', base_task_name='tabular prediction',
              parameter_override={'General/data_task_id': '${preprocessing_2.id}'})
```
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">ClearML tracks and reports the training step</summary>
<div className="cml-expansion-panel-content">
In the training Task, the ``data_task_id`` parameter value is overridden. This allows the pipeline controller to pass a
different Task ID to each instance of training, where each Task has an artifact containing different data.
```python
configuration_dict = {'data_task_id': 'b605d76398f941e69fc91b43420151d2',
                      'number_of_epochs': 15, 'batch_size': 100, 'dropout': 0.3, 'base_lr': 0.1}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
**ClearML** tracks and reports the training step with each instance of the newly cloned and executed training Task.
**ClearML** automatically logs the training loss and learning rate. They appear in **RESULTS** **>** **SCALARS**.
The following images show one of the two training Tasks.
![image](../../../../../img/train_tabular_predictor_04.png)
Parameter dictionaries appear in the **General** subsection.
![image](../../../../../img/train_tabular_predictor_01.png)
The TensorFlow Definitions appear in the **TF_DEFINE** subsection.
![image](../../../../../img/train_tabular_predictor_02.png)
</div>
</details>
### Best model step
The best model step depends upon the completion of both training nodes, and overrides the parameter holding the two training node Task IDs.
```python
pipe.add_step(name='pick_best', parents=['train_1', 'train_2'],
              base_task_project='Tabular Example', base_task_name='pick best model',
              parameter_override={'General/train_tasks_ids': '[${train_1.id}, ${train_2.id}]'})
```
The IDs of the training Tasks from the steps named `train_1` and `train_2` are passed to the best model Task. They take the form `${<stage-name>.<part-of-Task>}`.
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">ClearML tracks and reports the best model step</summary>
<div className="cml-expansion-panel-content">
In the best model Task, the `train_tasks_ids` parameter is overridden with the Task IDs of the two training tasks.
```python
configuration_dict = {'train_tasks_ids': ['c9bff3d15309487a9e5aaa00358ff091', 'c9bff3d15309487a9e5aaa00358ff091']}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
The log shows the Task ID and accuracy of the best model in **RESULTS** **>** **LOG**.
![image](../../../../../img/tabular_training_pipeline_02.png)
In **ARTIFACTS** **>** **Output Model** is a link to the model details.
![image](../../../../../img/tabular_training_pipeline_03.png)
The model details appear in the **MODELS** table **>** **GENERAL**.
![image](../../../../../img/tabular_training_pipeline_04.png)
</div>
</details>
### Pipeline start, wait, and cleanup
Once all steps are added to the pipeline, start it, wait for it to complete, and finally clean up the pipeline processes.
```python
# Starting the pipeline (in the background)
pipe.start()
# Wait until pipeline terminates
pipe.wait()
# cleanup everything
pipe.stop()
```
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">ClearML tracks and reports the pipeline's execution</summary>
<div className="cml-expansion-panel-content">
ClearML reports the pipeline with its steps in **RESULTS** **>** **PLOTS**.
![image](../../../../../img/tabular_training_pipeline_01.png)
By hovering over a step or path between nodes, you can view information about it.
![image](../../../../../img/tabular_training_pipeline_06.png)
</div>
</details>
## Running the pipeline
**To run the pipeline:**
1. Download the data by running the notebook [download_and_split.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/download_and_split.ipynb).
1. Run the notebook for each of the steps, if it has not run once before:
* [preprocessing_and_encoding.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/preprocessing_and_encoding.ipynb)
* [train_tabular_predictor.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/train_tabular_predictor.ipynb)
* [pick_best_model.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/pick_best_model.ipynb)
1. Run the pipeline controller in one of the following two ways:
* Run the notebook [tabular_ml_pipeline.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/tabular_ml_pipeline.ipynb).
* Remotely execute the Task - if the Task `tabular training pipeline`, which is associated with the project `Tabular Example`, already exists in **ClearML Server**, clone it and enqueue it to execute (see the sketch after the note below).
:::note
If you enqueue a Task, a worker must be listening to that queue for the Task to execute.
:::
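For the remote-execution route, the following is a hedged sketch of cloning and enqueuing the controller Task programmatically; the queue name `default` is an assumption.

```python
from clearml import Task

# look up the existing pipeline controller Task, clone it, and enqueue the clone
template = Task.get_task(project_name='Tabular Example', task_name='tabular training pipeline')
cloned = Task.clone(source_task=template)
Task.enqueue(task=cloned, queue_name='default')
```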

---
title: Text Classification - Jupyter Notebook
---
The example [text_classification_AG_NEWS.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/text/text_classification_AG_NEWS.ipynb)
demonstrates integrating **ClearML** into a Jupyter Notebook and into code which trains a network
to classify text in the `torchtext` [AG_NEWS](https://pytorch.org/text/stable/datasets.html#ag-news) dataset, and then applies the model to predict the classification of sample text. **ClearML** automatically logs the scalars reported by calling TensorBoard methods, as well as the console output. In the example, we explicitly log parameters with the Task. When the script runs, it creates an experiment named `text classifier` which is associated with the `Text Example` project.
## Scalars
Accuracy, learning rate, and training loss appear in **RESULTS** **>** **SCALARS**, along with the resource utilization plots, which are titled **:monitor: machine**.
![image](../../../../../img/text_classification_AG_NEWS_03.png)
## Hyperparameters
**ClearML** automatically logs the command line options, because the example code uses `argparse`. A parameter dictionary
is logged by connecting it to the Task using a call to the [Task.connect](../../../../../references/sdk/task.md#connect)
method.
```python
configuration_dict = {'number_of_epochs': 6, 'batch_size': 16, 'ngrams': 2, 'base_lr': 1.0}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
Command line options appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **Args**.
![image](../../../../../img/text_classification_AG_NEWS_01.png)
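As a sketch of why that section is populated: once a Task exists, arguments parsed with `argparse` are logged automatically. The option below is hypothetical, not the notebook's exact argument list.

```python
from argparse import ArgumentParser

from clearml import Task

task = Task.init(project_name='Text Example', task_name='text classifier')

parser = ArgumentParser()
parser.add_argument('--ngrams', type=int, default=2)  # hypothetical option
args = parser.parse_args([])  # empty list keeps this runnable inside a notebook
```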
Parameter dictionaries appear in the **General** subsection.
![image](../../../../../img/text_classification_AG_NEWS_01a.png)
## Log
Text printed to the console for training progress, as well as all other console output, appears in **RESULTS** **>** **LOG**.
![image](../../../../../img/text_classification_AG_NEWS_02.png)