clearml/docs/tutorials/Getting_Started_1_Experiment_Management.ipynb
2023-03-12 16:58:48 +02:00

{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "fauyMKCEJ6x4"
},
"source": [
"<div align=\"center\">\n",
"\n",
" <a href=\"https://clear.ml\" target=\"_blank\">\n",
" <img width=\"512\", src=\"https://github.com/allegroai/clearml/raw/master/docs/clearml-logo.svg\"></a>\n",
"\n",
"\n",
"<br>\n",
"\n",
"<h1>Notebook 1: Experiment Management</h1>\n",
"\n",
"<br>\n",
"\n",
"Hi there! This is the ClearML getting started notebook, meant to teach you the ropes. ClearML has a lot of modules that you can use, so in this notebook, we'll start with the most well-known one: <a href=\"https://app.clear.ml/projects\" target=\"_blank\">Experiment Management.</a>\n",
"\n",
"You can find out more details about the other ClearML modules and the technical specifics of each in <a href=\"https://clear.ml/docs\" target=\"_blank\">our documentation.</a>\n",
"\n",
"\n",
"<table>\n",
"<tbody>\n",
" <tr>\n",
" <td><b>Step 1: Experiment Management</b></td>\n",
" <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/allegroai/clearml/blob/master/docs/tutorials/Getting_Started_1_Experiment_Management.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a></td>\n",
" </tr>\n",
" <tr>\n",
" <td>Step 2: Remote Agent</td>\n",
" <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/allegroai/clearml/blob/master/docs/tutorials/Getting_Started_2_Setting_Up_Agent.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a></td>\n",
" </tr>\n",
" <tr>\n",
" <td>Step 3: Remote Task Execution</td>\n",
" <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/allegroai/clearml/blob/master/docs/tutorials/Getting_Started_3_Remote_Execution.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a></td>\n",
" </tr>\n",
"</tbody>\n",
"</table>\n",
"\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "s8eUiQauOA31"
},
"source": [
"# 📦 Setup\n",
"\n",
"Since we are using a notebook here, we're importing the special `browser_login` function, it will try to help you easily log in. If it doesn't work, don't worry, it will guide you through the steps to get it done :)\n",
"\n",
"**If it asks you to generate new credentials, keep them handy, you'll need them again in later notebooks**\n",
"\n",
"When installing ClearML in a normal python environment (not a colab notebook), you'll want to use `clearml-init` instead. It, too, will guide you through the setup.\n",
"\n",
"What we're doing here is connecting to a ClearML server, that will store all your experiment details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IMZsEYw5J5JI"
},
"outputs": [],
"source": [
"%pip install --upgrade xgboost clearml\n",
"import clearml\n",
"clearml.browser_login()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TLsgNR5PhPPN"
},
"source": [
"# ✈️ Example: XGBoost\n",
"\n",
"Let's start simple, by adding the ClearML experiment tracker to an XGBoost training script.\n",
"\n",
"The important parts are:\n",
"\n",
"- Initializing ClearML. Always do this as a very first line if possible!\n",
"- Manually log the parameter dict (e.g. CLI commands are captured automatically)\n",
"\n",
"**⚠️ NOTE: `output_uri` in `Task.init` is an important parameter. By default it is set to `False`, meaning any registered models will NOT be uploaded to ClearML, but their info will be registered. Set this to `True` to automatically upload all model files.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CSaL3XTqhYAy",
"outputId": "6c870d67-d4f8-4c11-a356-0678b2fd9a41"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ClearML Task: created new task id=38ba982031b04187a59acbd7346c5010\n",
"ClearML results page: https://app.clear.ml/projects/df7858a441f94c29a1230d467f131840/experiments/38ba982031b04187a59acbd7346c5010/output/log\n",
"2023-03-06 16:06:09,239 - clearml.Task - INFO - Storing jupyter notebook directly as code\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"from sklearn.datasets import load_iris\n",
"from clearml import Task\n",
"import xgboost as xgb\n",
"import numpy as np\n",
"\n",
"\n",
"# Always initialize ClearML before anything else. Automatic hooks will track as\n",
"# much as possible for you!\n",
"task = Task.init(\n",
" project_name=\"Getting Started\",\n",
" task_name=\"XGBoost Training\",\n",
" output_uri=True # IMPORTANT: setting this to True will upload the model\n",
" # If not set the local path of the model will be saved instead!\n",
")\n",
"\n",
"# Training data\n",
"X, y = load_iris(return_X_y=True)\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
" X, y, test_size=0.2, random_state=100\n",
")\n",
"\n",
"dtrain = xgb.DMatrix(X_train, label=y_train)\n",
"dtest = xgb.DMatrix(X_test, label=y_test)\n",
"\n",
"# Setting the parameters\n",
"params = {\n",
" 'max_depth': 2,\n",
" 'eta': 1,\n",
" 'objective': 'reg:squarederror',\n",
" 'nthread': 4,\n",
" 'eval_metric': 'rmse',\n",
"}\n",
"# Make sure ClearML knows these parameters are our hyperparameters!\n",
"task.connect(params)\n",
"\n",
"# Train the model\n",
"bst = xgb.train(\n",
" params,\n",
" dtrain,\n",
" num_boost_round=100,\n",
" evals=[(dtrain, \"train\"), (dtest, \"test\")],\n",
" verbose_eval=0,\n",
")\n",
"\n",
"# Save the model, saving the model will automatically also register it to \n",
"# ClearML thanks to the automagic hooks\n",
"bst.save_model(\"best_model\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "P150YtRbhYww"
},
"outputs": [],
"source": [
"# When a python script ends, the ClearML task is closed automatically. But in\n",
"# a notebook (that never ends), we need to manually close the task.\n",
"task.close()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-jWVBTuXyRHd"
},
"source": [
"## 🔹 XGBoost WebUI\n",
"\n",
"\n",
"After running the code above you should have a new ClearML project on your server called \"Getting Started\".\n",
"\n",
"Inside the task, a lot of things are tracked!\n",
"\n",
"\n",
"\n",
"### Source information (git, installed packages, uncommitted changes, ...)\n",
"\n",
"Naturally, running this notebook doesn't actually give us any git information. Instead ClearML saves the execution order of the cells you've executed until it detected the `Task.close()` command.\n",
"\n",
"![](https://i.imgur.com/DehMv2X.png)\n",
"\n",
"### Configuration\n",
"\n",
"The configuration section holds all the values we added using the `task.connect()` call before. You can also use `task.set_parameter()` for a single value or `task.connect_configuration()` to connect an external configuration file.\n",
"\n",
"![](https://i.imgur.com/hlHcKfm.png)\n",
"\n",
"### Artifacts\n",
"\n",
"\n",
"Artifacts are very flexible and can mean every type of file storage. Mostly this is used to track and save input and output models. In this case we get the saved XGBoost model. But when running a notebook you'll also get the original notebook here as well as an HTML preview!\n",
"\n",
"![](https://i.imgur.com/Ow5meUZ.png)\n",
"\n",
"\n",
"### Scalars\n",
"\n",
"Scalars are the performance values of your models. They can either be a value for each epoch/iteration, which will display them as a plot, or a single one, which displays them as a table. In our example above the scalars where automatically grabbed from XGBoost using our integration!\n",
"\n",
"![](https://i.imgur.com/QBl79GI.png)\n",
"\n",
"\n",
"### Plots and Debug Samples\n",
"\n",
"We haven't generated any plots or debug samples in our XGBoost example. We'll look at these a little later in the tutorial.\n",
"\n",
"\n",
"### Info + Console logs\n",
"\n",
"These sections speak for themselves: you get some additional information (like runtime or original machine hostname) as well as the original console logs.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pkYjS0RTtJ6s"
},
"source": [
"# 🚀 Example: Pytorch + Tensorboard + Matplotlib\n",
"\n",
"As you might have seen, our previous example was missing plots and debug samples, none were logged to ClearML! Luckily, XGBoost is not the only integration that ClearML has, it can also lift scalars, plots and debug samples from other frameworks you most likely use already.\n",
"\n",
"A full list of our integrations can be found [here](https://clear.ml/docs/latest/docs/integrations/libraries)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5iwK3p-X3WIC"
},
"source": [
"## 🔹 Logging scalars through Tensorboard\n",
"\n",
"ClearML will detect this and also log scalars, images etc. to the ClearML experiment manager. So you don't even have to change your existing code!\n",
"\n",
"![](https://i.imgur.com/4MNZdGi.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4aeacPunj_7o",
"outputId": "52a3afcf-bb5e-4be2-e345-5b2199a5358a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ClearML Task: created new task id=0bfefb8b86ba44798d9abe34ba0a6ab4\n",
"ClearML results page: https://app.clear.ml/projects/df7858a441f94c29a1230d467f131840/experiments/0bfefb8b86ba44798d9abe34ba0a6ab4/output/log\n",
"Epoch 1 complete\n",
"Epoch 2 complete\n",
"Epoch 3 complete\n",
"Epoch 4 complete\n",
"Epoch 5 complete\n",
"Epoch 6 complete\n",
"Epoch 7 complete\n",
"Epoch 8 complete\n",
"Epoch 9 complete\n",
"Epoch 10 complete\n"
]
}
],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"from clearml import Task\n",
"from torch.utils.data import DataLoader\n",
"from torchvision.datasets import MNIST\n",
"from torchvision.transforms import ToTensor\n",
"from torch.utils.tensorboard import SummaryWriter\n",
"\n",
"\n",
"# Always initialize ClearML before anything else. Automatic hooks will track as\n",
"# much as possible for you (such as in this case TensorBoard logs)!\n",
"task = Task.init(project_name=\"Getting Started\", task_name=\"TB Logging\")\n",
"\n",
"# Set up TensorBoard logging\n",
"writer = SummaryWriter()\n",
"\n",
"# Load MNIST dataset\n",
"train_data = MNIST('data', train=True, download=True, transform=ToTensor())\n",
"train_loader = DataLoader(train_data, batch_size=64, shuffle=True)\n",
"\n",
"# Define model\n",
"model = nn.Sequential(\n",
" nn.Linear(784, 128),\n",
" nn.ReLU(),\n",
" nn.Linear(128, 10)\n",
")\n",
"\n",
"# Define loss and optimizer\n",
"criterion = nn.CrossEntropyLoss()\n",
"optimizer = optim.SGD(model.parameters(), lr=0.01)\n",
"\n",
"# Train the model\n",
"for epoch in range(10):\n",
" for i, (inputs, labels) in enumerate(train_loader):\n",
" # Flatten input images\n",
" inputs = inputs.view(-1, 784)\n",
" \n",
" # Zero the gradients\n",
" optimizer.zero_grad()\n",
" \n",
" # Forward pass\n",
" outputs = model(inputs)\n",
" loss = criterion(outputs, labels)\n",
" \n",
" # Backward pass and update parameters\n",
" loss.backward()\n",
" optimizer.step()\n",
" \n",
" # Log loss to TensorBoard\n",
" # ClearML will detect this and also log the scalar to the ClearML\n",
" # experiment manager. So you don't even have to change your existing code!\n",
" writer.add_scalar('Training loss', loss.item(), epoch * len(train_loader) + i)\n",
" \n",
" print(f'Epoch {epoch + 1} complete')\n",
" \n",
"# Close TensorBoard writer\n",
"writer.close()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fsHJZGcC3afb"
},
"source": [
"## 🔹 Logging debug samples through matplotlib\n",
"\n",
"Whenever you use `plt.imshow()` ClearML will intercept the call and immediately log the image to the experiment manager. The images will become visible under the Debug Samples tab.\n",
"\n",
"![](https://i.imgur.com/FuHTPKP.png)\n",
"\n",
"You can log basically any media type (images, videos, audio, ...) as a debug sample. Check [our docs](https://clear.ml/docs/latest/docs/references/sdk/logger#report_media) for more info."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 281
},
"id": "bAEB04OUuI3R",
"outputId": "d9e8e484-3903-4212-9674-7490cbb192e3"
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import torchvision\n",
"\n",
"# Helper function to show an image\n",
"def matplotlib_imshow(img):\n",
" img = img.mean(dim=0)\n",
" img = img / 2 + 0.5 # unnormalize\n",
" npimg = img.numpy()\n",
" plt.title(\"MNIST Images\")\n",
" plt.imshow(npimg, cmap=\"Greys\")\n",
"\n",
"# get some random training images\n",
"dataiter = iter(train_loader)\n",
"images, labels = next(dataiter)\n",
"\n",
"# create grid of images\n",
"img_grid = torchvision.utils.make_grid(images)\n",
"\n",
"# show images\n",
"matplotlib_imshow(img_grid)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pjtyBV0h5Aak"
},
"source": [
"## 🔹 Logging plots through Matplotlib\n",
"\n",
"Similar to above, matplotlib is automatically captured, but this time, we use `plt.show()` (implicit inside `ConfusionMatrixDisplay`) which logs the result as a plot!\n",
"\n",
"![](https://i.imgur.com/Q4H7RDM.png)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 313
},
"id": "GvM3VYPvxZoR",
"outputId": "b1d637a6-47b5-446c-9728-d6b720106a51"
},
"outputs": [
{
"data": {
"text/plain": [
"<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7efe139e0b50>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\n",
"\n",
"\n",
"# Load MNIST test dataset\n",
"test_data = MNIST('data', train=False, download=True, transform=ToTensor())\n",
"test_loader = DataLoader(test_data, batch_size=64, shuffle=False)\n",
"\n",
"# Test the model and compute confusion matrix\n",
"y_true = []\n",
"y_pred = []\n",
"model.eval()\n",
"with torch.no_grad():\n",
" for inputs, labels in test_loader:\n",
" inputs = inputs.view(-1, 784)\n",
" outputs = model(inputs)\n",
" _, predicted = torch.max(outputs, 1)\n",
" y_true.extend(labels.numpy())\n",
" y_pred.extend(predicted.numpy())\n",
"cm = confusion_matrix(y_true, y_pred)\n",
"\n",
"# Display confusion matrix\n",
"plt.title(\"Confusion Matrix Logging\")\n",
"ConfusionMatrixDisplay(cm).plot(ax=plt.gca())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6hvbrbvEp3yb"
},
"outputs": [],
"source": [
"# When a python script ends, the ClearML task is closed automatically. But in\n",
"# a notebook (that never ends), we need to manually close the task.\n",
"task.close()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nXCUDvfZ8kAs"
},
"source": [
"# 🚁 Example: Sklearn\n",
"\n",
"As a third example, let's train an sklearn example. Here, too, ClearML will automatically capture a number of outputs, which we will describe in the code comments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 264
},
"id": "I_UH3SSZ9WQ0",
"outputId": "e6c6c58d-db9d-4dd0-edb2-3d556283ae55"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ClearML Task: created new task id=62fb0fc73d384181a220dfb8adb2c75a\n",
"ClearML results page: https://app.clear.ml/projects/df7858a441f94c29a1230d467f131840/experiments/62fb0fc73d384181a220dfb8adb2c75a/output/log\n"
]
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 288x216 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import joblib\n",
"\n",
"from sklearn import datasets\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from clearml import Task\n",
"\n",
"\n",
"# Connecting ClearML with the current process,\n",
"# from here on everything is logged automatically\n",
"task = Task.init(\n",
" project_name=\"Getting Started\",\n",
" task_name=\"Scikit-Learn\",\n",
" output_uri=True\n",
")\n",
"\n",
"iris = datasets.load_iris()\n",
"X = iris.data\n",
"y = iris.target\n",
"\n",
"X_train, X_test, y_train, y_test = \\\n",
" train_test_split(X, y, test_size=0.2, random_state=42)\n",
"\n",
"model = LogisticRegression(solver='liblinear', multi_class='auto')\n",
"model.fit(X_train, y_train)\n",
"\n",
"# Using joblib to save the model will automatically register it to ClearML, too!\n",
"joblib.dump(model, 'model.pkl', compress=True)\n",
"\n",
"loaded_model = joblib.load('model.pkl')\n",
"result = loaded_model.score(X_test, y_test)\n",
"x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5\n",
"y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5\n",
"h = .02 # step size in the mesh\n",
"xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
"plt.figure(1, figsize=(4, 3))\n",
"\n",
"plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.Paired)\n",
"plt.title(\"Iris Dataset\")\n",
"plt.xlabel('Sepal length')\n",
"plt.ylabel('Sepal width')\n",
"\n",
"plt.xlim(xx.min(), xx.max())\n",
"plt.ylim(yy.min(), yy.max())\n",
"plt.xticks(())\n",
"plt.yticks(())\n",
"\n",
"# Plt.show() will trigger ClearML to log the resulting plot automatically\n",
"plt.show()\n",
"\n",
"# Always close the task when in a notebook! If using a python file, the task is\n",
"# closed automatically when the script ends.\n",
"task.close()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AaxJ4QmErwTw"
},
"source": [
"# ✋ Manual Logging\n",
"\n",
"Naturally, when our integrations aren't cutting it for you, you can always manually log anything you wish!\n",
"\n",
"Check our documentation for [a list of examples](https://clear.ml/docs/latest/docs/fundamentals/logger#explicit-reporting-examples) of how to manually log anything else!"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "Rn1Csnat7b5D"
},
"source": [
"# 🥾 Next Steps\n",
"\n",
"Now that you have the basics of the experiment manager, take a look at **running this experiment remotely**! Start by setting up a remote agent and then clone and enqueue these experiments using the ClearML orchestration component.\n",
"\n",
"The [second notebook](https://colab.research.google.com/github/allegroai/clearml/blob/master/docs/tutorials/Getting_Started_2_Setting_Up_Agent.ipynb) in this series will get you started with this."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<table>\n",
"<tbody>\n",
" <tr>\n",
" <td>Step 1: Experiment Management</td>\n",
" <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/allegroai/clearml/blob/master/docs/tutorials/Getting_Started_1_Experiment_Management.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a></td>\n",
" </tr>\n",
" <tr>\n",
" <td><b>NEXT UP -> Step 2: Remote Execution Agent Setup</b></td>\n",
" <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/allegroai/clearml/blob/master/docs/tutorials/Getting_Started_2_Setting_Up_Agent.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a></td>\n",
" </tr>\n",
" <tr>\n",
" <td>Step 3: Remotely Execute Tasks</td>\n",
" <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/allegroai/clearml/blob/master/docs/tutorials/Getting_Started_3_Remote_Execution.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a></td>\n",
" </tr>\n",
"</tbody>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [
"TLsgNR5PhPPN",
"pkYjS0RTtJ6s",
"5iwK3p-X3WIC",
"fsHJZGcC3afb",
"nXCUDvfZ8kAs"
],
"provenance": [],
"toc_visible": true
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}