mirror of https://github.com/clearml/clearml-docs

Added the video tutorials (#382)

---
title: Agent Remote Execution and Automation
---

## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/MX3BrXnaULs"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">

Welcome to ClearML. In this video we'll take a look at the ClearML Agent, which lets you run your tasks remotely and opens the door to automating your workflows.

Remember our overview from the previous videos? We talked about the pip package that handles experiment tracking and data management, as well as the server, which stores everything we track. Today we add a third component: the ClearML Agent.

The agent turns any machine, either on-premise or in the cloud, into a worker that executes your tasks. So let's see how that's done!

For the purpose of this video, we'll be running the agent on a simple Ubuntu machine, but you can run it anywhere you want.

The agent is installed with the `clearml-agent` pip package. Then we run the command `clearml-agent init` to connect our agent to the ClearML server.
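
A minimal setup sketch (run on the machine that should become a worker; you'll be prompted for server credentials and git info):

```bash
# Install the agent
pip install clearml-agent

# Interactive one-time setup: connects this machine to your ClearML server
clearml-agent init
```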

Pasting the credentials works the same way as in the experiment manager, but compared to the regular `clearml-init` command, there are several more options you'll be asked to fill in.

The most important difference is that you'll also be asked for your git information; this is necessary for the agent to be able to pull your code when it's asked to run it. You'll find more information about these settings in our documentation.

Before we run the agent though, let's take a quick look at what will happen when we spin it up.

Our server hosts one or more queues in which we can put our tasks. And then we have our agent. By default, it runs in pip mode, or virtual environment mode. Once the agent pulls a new task from the queue, it creates a new Python virtual environment for it, clones the code, and installs all required Python packages into the new virtual environment. It then runs the code and injects any new hyperparameter values we changed in the UI.

Pip mode is really handy and efficient. It creates a new Python virtual environment for every task it pulls and uses smart caching, so packages or even whole environments can be reused across multiple tasks.

You can also run the agent in conda mode or poetry mode, which do essentially the same thing as pip mode, only with a conda or poetry environment instead.

However, there's also docker mode. In this case the agent runs every incoming task in its own docker container instead of just a virtual environment. This makes things much easier if your tasks have system package dependencies, for example, or when not every task uses the same Python version. For our example, we'll be using docker mode.

Now that our configuration is ready, we can start our agent in docker mode by running the command `clearml-agent daemon --docker`.
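
For example (the queue name is an assumption; the agent listens on the `default` queue if none is given):

```bash
# Turn this machine into a worker that runs each task in a docker container
clearml-agent daemon --queue default --docker
```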

After running the command, we can see it pop up in our workers table. The agent now starts listening for tasks in the default queue, and it's ready to go!

Now, let's say you have a task that you already ran on your local machine and tracked using the 2 magic lines we saw before. Just like in the last video, we can right click it and clone it, so it's now in draft mode. We can easily change some of the hyperparameters on the fly and *enqueue* the task.
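
The same clone-edit-enqueue flow can also be scripted. A sketch (project, task, queue and parameter names are placeholders):

```python
from clearml import Task

# Grab the task we ran locally before
template = Task.get_task(project_name="examples", task_name="my experiment")

# Clone it (the clone starts in draft mode), tweak a hyperparameter, enqueue it
cloned = Task.clone(source_task=template, name="my experiment (tuned)")
cloned.set_parameter("General/learning_rate", 0.001)
Task.enqueue(cloned, queue_name="default")
```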

The agent will immediately detect that we enqueued a task and start working on it. Like we saw before, it will spin up a docker container, install the required packages and dependencies, and run the code.

The task itself is reported to the experiment manager just like any other task, and you can browse its outputs as usual, albeit with the changed parameters we edited earlier in draft mode.

On the left we can see a button labeled "Workers and Queues". Under the Workers tab we can see that our worker is indeed busy with our task, and we can see its resource utilization as well. If we click on the current experiment, we end up in our experiment view again. Now, imagine we see in the scalar output that our model isn't training the way we want it to. We can abort the task here, and the agent will start working on the next task in the queue.

Back to our workers overview. Over in the Queues tab, we get some extra information about which experiments are currently in the queue, and we can even change their order by dragging them into position like so. Finally, we have graphs of the overall waiting time and the overall amount of enqueued tasks over time.

Speaking of which, let's say your wait times are very long because all data scientists have collectively decided that now is a perfect time to train their models and your on-premise servers are at capacity. We have built-in autoscalers for AWS and GCP (in the works) which will automatically spin up new ClearML agent VMs when the queue wait time becomes too long. If you go for the premium tiers of ClearML, you'll even get a really nice dashboard to go along with it.

In the following video we'll dig deeper into this newly discovered automation capability and introduce things like automatic hyperparameter optimization and pipelines.

But for now, feel free to start spinning up some agents on your own machines completely for free at app.clear.ml or by using our self-hosted server on GitHub, and don't forget to join our Slack channel if you need any help.

</div>
</details>

---
title: ClearML-Data
---

## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/S2pz9jn26uI"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">

Hello and welcome to ClearML. In this video we'll take a look at both the command line and Python interfaces of our data versioning tool, clearml-data.

In the world of machine learning, you are very likely dealing with large amounts of data that you need to put into a dataset. ClearML Data solves 2 important challenges that occur in this situation:

One is accessibility: making sure the data can be accessed from every machine you use. And two is versioning: linking which dataset version was used in which task. This helps make experiments more reproducible. Moreover, versioning systems like git were never really designed for the size and number of files in machine learning datasets. We're going to need something else.

clearml-data comes built into the clearml Python package and has both a command line interface, for easy and quick operations, and a Python interface if you want more flexibility. Both interfaces are quite similar, so we'll address both of them in this video.

Let's start with an example. Say I have some files here that I want to put into a dataset and start keeping track of.

First, we need to actually create an initial dataset version. The easiest way to do this is with the command line interface. Use the command `clearml-data create` and give it a name and a project, just like with a ClearML task. It will return the dataset ID, which we'll copy for later. The dataset is now initialized, but it's still empty because we haven't added any files yet.

We can do that by using the `clearml-data add` command and providing the path to the files we want to add. This will recursively add all files in that path to the dataset.

Now we need to tell the server that we're done here. We can call `clearml-data close` to upload the files and change the dataset status to done, which finalizes this version of the dataset.
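
Put together, the CLI flow looks something like this (project, name and paths are placeholders):

```bash
# Create a new, empty dataset version; prints the dataset ID
clearml-data create --project "Food Dataset" --name "v1"

# Recursively add all files under the given path
clearml-data add --files ./data

# Upload the files and finalize this version
clearml-data close
```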

The process of doing this with the Python interface is very similar.

You can create a new dataset by importing the `Dataset` object from the clearml pip package and calling its `create` method. Again we give the dataset a name and a project, just like with the command line tool. The `create` method returns a dataset instance on which we'll do all of our operations.

To add some files to this newly created dataset version, call the `add_files` method on the dataset object and provide a path to a local file or folder. Bear in mind that nothing is uploaded just yet; we're simply instructing the dataset object what it should do when we eventually *do* want to upload.

A really useful thing we can do with the Python interface is adding some interesting statistics about the dataset itself, such as a plot. Here we simply report a histogram of the number of files in the train and test folders. You can add anything to a dataset that you can add to a ClearML task, so go nuts!

Finally, upload the dataset and then finalize it, or just set `auto_upload` to true to make it a one-liner.
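
A sketch of that flow in Python (names, paths and the histogram values are placeholders):

```python
from clearml import Dataset

# Create a new dataset version
ds = Dataset.create(dataset_name="v1", dataset_project="Food Dataset")

# Register local files; nothing is uploaded yet
ds.add_files(path="./data")

# Optional: attach a plot, e.g. a histogram of file counts per split
ds.get_logger().report_histogram(
    title="Dataset statistics", series="file count",
    values=[120, 30], xlabels=["train", "test"], iteration=0,
)

# Upload and finalize in one call
ds.finalize(auto_upload=True)
```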

In the web UI, we can now see the details of our dataset version by clicking on the Datasets button on the left. When we click on our newly created dataset here, we get an overview of our latest version; of course we have only one for now.

At a glance you can see things like the dataset ID, its size and which files have been changed in this particular version. If you click on Details, you'll get a list of those files in the Content tab. Let's make the view a little larger with this button, so it's easier to see. When we switch to the Preview tab, we can see the histogram we made before as well as an automatically generated preview of some of the files in our dataset version. Feel free to add anything you want in here! Finally, you can check out the original console logs, which can be handy for debugging.

Now imagine we're on a different machine. Maybe one from a team member, a classmate or just one of your remote agents, and you want to get the dataset to do something cool with it.

Using the command line tool, you can download a dataset version locally by using the `clearml-data get` command and providing its unique ID. You can find a dataset's ID in the UI here, or alternatively, you can search for a specific dataset by providing the dataset name, its project, some tags attached to the dataset, or any combination of the three. Running the command will give you the system path where the data was downloaded.
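
For example (the ID is a placeholder):

```bash
# Download a dataset version into the local cache
clearml-data get --id 24d05040f3e14f0f8d7674cdab091d6f
```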

That path will be a local cached folder, which means that if you try to get the same dataset again, or any other dataset that's based on this one, it will check which files are already on your system and will not download those again.

The Python interface is similar, with one major difference. You can also get a dataset using any combination of name, project, ID or tags, but _getting_ the dataset does not mean it is downloaded; we simply got all of the metadata, which we can now access from the dataset object. This is important, as it means you don't have to download the dataset to make changes to it or to add files. More on that in just a moment.

If you do want to download a local copy of the dataset, it has to be done explicitly, by calling `get_local_copy`, which will return the path to which the data was downloaded for you.

This is a good approach for when you just want to download and use the data. But it *is* a read-only copy, so if we want to add or remove some data to create a new version, we'll have to get a mutable copy instead, which we can do by using `get_local_mutable_copy`. We give it a local path, and it will download the dataset into that path, but this time, we have full control over the contents.
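
In Python, that looks roughly like this (names and paths are placeholders):

```python
from clearml import Dataset

# Getting a dataset only fetches its metadata
ds = Dataset.get(dataset_project="Food Dataset", dataset_name="v1")

read_only_path = ds.get_local_copy()                      # cached, read-only
mutable_path = ds.get_local_mutable_copy("./my_dataset")  # writable copy
```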

We can do this with the command line tool too, by simply adding a `--copy` flag to the command.

Now that we have this mutable copy, let's try to change our dataset and create a new version.

Let's say we found an issue with the hamburgers, so we remove them from the folder. Then we add new pictures of chocolate cake. Essentially, we have now removed 3 files and added 4 new ones.

Now we can tell ClearML that the changes we made to this folder should become a new version of the previous dataset. We start by creating a new dataset just like we saw before, but now, we add the previous dataset ID as a parent. This tells ClearML that the new dataset version we're creating is based on the previous one, so our dataset object here will already contain all the files that the parent contained.

Now we can manually remove and add the files that we want, even without actually downloading the dataset. It will just change the metadata inside the Python object and sync everything when it's finalized.
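
A sketch of that metadata-only edit (names, paths and the wildcard are placeholders):

```python
from clearml import Dataset

parent = Dataset.get(dataset_project="Food Dataset", dataset_name="v1")

# New version based on the previous one
ds = Dataset.create(
    dataset_name="v2",
    dataset_project="Food Dataset",
    parent_datasets=[parent.id],
)

ds.remove_files("data/hamburger/*")        # metadata change, no download needed
ds.add_files("./new_data/chocolate_cake")
```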

That said, we do have a local copy of the dataset in this case, so we have a better option.

Using the Python SDK, we can call the `sync_folder` method. This method essentially compares the dataset object's metadata with the content of a `local_path` that you supply. So when we now call finalize and upload, it will only upload or remove the files that changed.
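
Continuing the sketch above:

```python
# Register only the differences between the metadata and the local folder
ds.sync_folder(local_path="./my_dataset")

# Upload the changed files and lock in the new version
ds.finalize(auto_upload=True)
```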

The command line interface doesn't have the Python object for metadata, so it can only work with local data using the sync command. But it bunches this whole process together into one single command. Call `clearml-data sync`, provide it with the dataset name and project for the new version, and maybe add some parent datasets too if applicable. This single call will create a new dataset version, sync it and then upload the changes, all in one go. Neat, right?
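
For example (the parent ID and names are placeholders):

```bash
# Create, sync and upload a new version in a single command
clearml-data sync --project "Food Dataset" --name "v2" \
                  --parents 24d05040f3e14f0f8d7674cdab091d6f \
                  --folder ./my_dataset
```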

Now we can take another look at the dataset UI. We'll see our original dataset as well as the new version we made just now that's based on it.

When we click on our newest version in the lineage view, we can see that we indeed added 4 files and removed 3.

If we now click on Details again to look at the content, we can see that our chocolate cakes have been added correctly. You'll also notice that when we go to the Preview tab, we only see chocolate cakes. This is because a dataset version only stores the differences between itself and its parents. So in this case, only chocolate cakes were added.

In this video, we've covered the most important uses of clearml-data, so hopefully you now have a good intuition of what's possible and how valuable it can be. Building and updating your dataset versions from code is the best way to keep everything updated and make sure no data is ever lost. You're highly encouraged to explore ways to automate as much of this process as possible; take a look at our documentation to find the full range of possibilities.

So what are you waiting for? Start tracking your datasets with clearml-data, and don't forget to join our Slack channel if you need any help.

</div>
</details>

---
title: Core Component Overview
---

## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/s3k9ntmQmD4"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">

Welcome to ClearML! This video will serve as an overview of the complete ClearML stack. We'll introduce you to the most important concepts and show you how everything fits together, so you can dive deep into the next videos, which cover the ClearML functionality in more detail.

ClearML is designed to get you up and running in less than 10 minutes and 2 magic lines of code. But if you start digging, you'll quickly find out that it has a lot of functionality to offer. So let's break it down, shall we?

At the heart of ClearML lies the experiment manager. It consists of the clearml pip package and the ClearML server.

After running `pip install clearml`, you can add 2 simple lines of Python code to your existing codebase. These 2 lines will capture all the output that your code produces: logs, source code, hyperparameters, plots, images, you name it.

The pip package also includes clearml-data. It can help you keep track of your ever-changing datasets and provides an easy way to store, track and version control your data. It's also an easy way to share your dataset with colleagues over multiple machines while keeping track of who has which version. clearml-data can even keep track of your data's ancestry, making sure you can always figure out where specific parts of your data came from.

Both the 2 magic lines and the data tool send all of their information to a ClearML server. This server keeps an overview of your experiment runs and datasets over time, so you can always go back to a previous experiment, see how it was created and even recreate it exactly. Keep track of your best models by creating leaderboards based on your own metrics, and even directly compare multiple experiment runs, helping you figure out the best way forward for your models.

To get started with a server right away, you can make use of the free tier. And when your needs grow, we've got you covered too! Just check out our website to find the tier that fits your organisation best. But, because we're open source, you can also host your own completely for free. We have AWS images, Google Cloud images, you can run it on docker-compose locally or even, if you really hate yourself, run it on a self-hosted Kubernetes cluster using our Helm charts.

So, to recap: to get started, all you need is a pip package and a server to store everything. Easy, right? But MLOps is much more than experiment and data management. It's also about automation and orchestration, which is exactly where the clearml-agent comes into play.

The ClearML Agent is a daemon that you can run on one or multiple machines to turn them into workers. An agent executes an experiment or other workflow by reproducing the state of the code from the original machine on a remote machine.

Now that we have this remote execution capability, the possibilities are near endless.

For example, it's easy to set up an agent on either a CPU or a GPU machine, so you can run all of your experiments on any compute resource you have available. And if you spin up your agents in the cloud, they'll even support autoscaling out of the box. But it can also do all of this locally, if you don't have access to the cloud.

You can set up multiple machines as agents to support large teams with their complex projects, and easily configure a queuing system to get the most out of your available hardware.

Talking about using multiple machines: say you have an experiment and want to optimize your hyperparameters. ClearML can easily and automatically clone your experiment however many times you want, change some hyperparameters on the fly according to your strategy, and send each task to any one of your agents.

You can even use a Google Colab instance as a ClearML agent to get free GPU power, just sayin'!

As a final example of how you could use the agent functionality, ClearML provides a PipelineController, which allows you to chain together tasks by plugging the output of one task in as the input of another. Each of the tasks is, of course, run on your army of agents for full automation.

As you can see, ClearML is a large toolbox, stuffed with the most useful components for both data scientists and MLOps engineers. We dive deeper into each component in the following videos if you need more details, but feel free to get started now at clear.ml

</div>
</details>

---
title: Experiment Management Best Practices
---

## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/kyOfwVg05EM"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">

Welcome to ClearML. In this video, we'll go deeper into some of the best practices and advanced tricks you can use while working with ClearML for experiment management.

The first thing to know is that the Task object is the central pillar of both the experiment manager and the orchestration and automation components. This means that if you manage the task well in the experiment phase, it will be much easier to scale to production later down the line.

So let's take a look at the Task object in more detail. We have inputs, called hyperparameters, and configuration objects for external config files. Outputs can be anything, like we saw in the last video. Things like debug images, plots and console output kind of speak for themselves, so the ones we'll focus on here are scalars and artifacts.

Right, so let's start with the inputs: hyperparameters. Hyperparameters are the configuration options of your code, not only your model. Usually people put them in a config.py file or a global dictionary, for example. Others just use command line parameters for this.

Let's take this simple code as an example. First of all, we start the script with the 2 magic lines of code that we covered before. Next to that, we have a mix of command line arguments and some additional parameters in a dictionary here.

The command line arguments will be captured automatically, and for the dict (or really any Python object) we can use the `task.connect()` function to report our dict values as ClearML hyperparameters.
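
A sketch of what that can look like (argument and parameter names are placeholders):

```python
import argparse
from clearml import Task

task = Task.init(project_name="my project", task_name="my experiment")

# argparse arguments are captured automatically
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()

# Any dict (or other Python object) can be connected explicitly;
# values edited in the UI are injected back into the returned object
params = {"batch_size": 64, "learning_rate": 0.001}
params = task.connect(params)
```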

As you can see, when we run the script, all hyperparameters are captured and parsed by the server, giving you a clean overview in the UI.

Configuration objects, however, work slightly differently and are mostly used for more complex configurations, like a nested dict or a yaml file, for example. They're logged by using the `task.connect_configuration()` function instead, which saves the configuration as a whole, without parsing it.
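
For example (the file name is a placeholder):

```python
# Log a whole config file (or nested dict) verbatim, without parsing it
config_path = task.connect_configuration(
    name="my config", configuration="config.yaml"
)
```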

We have now logged our task with all of its inputs, but if we wanted to, we could rerun our code with different parameters, and this is where the magic happens.

Remember, ClearML also stores your code environment, making it reproducible. So when we clone our task here, we're making a copy of everything in that task, and it will be in draft mode. Now we can edit any of the hyperparameters straight from the interface. We can then enqueue the task, so it will be remotely executed by one of your agents. What's special about this is that the changed parameters will be injected into your original code! So when your code now addresses the parameter we just changed, it will work with the new value instead. This allows you to very quickly run experiments with different parameters. We can even do this automatically, but that's a topic for the video on automation.

Back to the overview. One of the output types you can add to your task is what's called an artifact.

An artifact can be a lot of things; mostly they're files like model weights, or pandas dataframes containing preprocessed features, for example. Our documentation lists all supported data types.

You can download the artifacts your code produced from the web UI to your local computer if you want to, but artifacts can also be retrieved programmatically.

Let's say we have a preprocessing task, which produces a set of features as an artifact, and we have a training task.

We can set up our training task to pull the features artifact from our preprocessing task and start training on it. We can select the preprocessing task either by its ID or just by its name, in which case the newest task in the project will be pulled. In that case, if we get new data and rerun our preprocessing task, our training task will automatically pull the newest features the next time it's executed. You can even go further by using tags, for example. No more shuffling around csv files.
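
Something along these lines (project, task and artifact names are placeholders):

```python
from clearml import Task

# No task ID given: picks the newest matching task in the project
preprocess_task = Task.get_task(
    project_name="my project", task_name="preprocessing"
)

# Download the 'features' artifact that task produced
features_path = preprocess_task.artifacts["features"].get_local_copy()
```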

Another type of artifact is model files, or weight files, but *they* get a special place in the ClearML ecosystem. First of all, models that are saved using any of the major machine learning libraries will be captured automatically, just like the command line arguments from before.

Next to that, models are not JUST artifacts of their original task; they also exist as standalone entities, which has 2 major advantages.

First, you don't have to find the original task and then get the attached model like with other artifacts; you can just pull a model by its ID or tag. Models can also be shared individually, without sharing the whole task.

Secondly, you're organically building up a central model repository while running your experiments, which will be super valuable when we later need to serve the model, for example. We can just pull the latest model in a similar way to how we pulled the features from before.
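
A sketch of pulling a model as a standalone entity (the ID is a placeholder):

```python
from clearml import InputModel

model = InputModel(model_id="0123456789abcdef0123456789abcdef")
weights_path = model.get_weights()  # downloads a local copy of the weights
```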

Finally, we have the scalars. These are numeric metrics that reflect the performance of your training runs, such as loss or accuracy.

There are real benefits to properly keeping track of your scalars instead of just looking at the console output. They are plotted over time and can easily be compared to other experiment runs, like we saw in the last video. Next to that, you can add scalars as custom columns in your experiment overview, effectively creating a leaderboard of your best models.
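
Explicit scalar reporting is a one-liner per value (titles, series and the training loop are placeholders):

```python
logger = task.get_logger()

for epoch in range(10):
    loss = 1.0 / (epoch + 1)  # stand-in for a real training step
    # title selects the plot, series selects the trace within it
    logger.report_scalar(title="loss", series="train", value=loss, iteration=epoch)
```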

And then we're not even talking about all the ways to automate tasks using these scalars, artifacts and hyperparameters. Trust me, in the future, your MLOps engineer will cry tears of happiness if you do this correctly now, and chances are that engineer is going to be you.

In the next videos, we'll cover automation and orchestration as well as ClearML Data, our data versioning tool.

Feel free to check out and test all of these features at app.clear.ml or using our self-hosted server on GitHub, and don't forget to join our Slack channel if you need any help.

</div>
</details>

---
title: Experiment Manager Hands-on
---

## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/bjWwZAzDxTY"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">

Welcome to ClearML! In this video, you'll learn how to quickly get started with ClearML by adding 2 simple lines of Python code to your existing project.

This is the experiment manager UI, and every row you can see here is a single run of your code. So let's set everything up in the code first, and then we'll come back to this UI later in the video.

We're currently in our project folder. As you can see, we have our very basic toy example here that we want to start tracking with ClearML's experiment manager.

The first thing to do is to install the clearml Python package in our virtual environment. Installing the package itself will add 3 commands for you. We'll cover the `clearml-data` and `clearml-task` commands later; for now the one we need is `clearml-init`.
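
That boils down to (a sketch; run inside your project's virtual environment):

```bash
pip install clearml

# One-time setup: connects this machine to a ClearML server
clearml-init
```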

If you paid attention in the first video of this series, you'll remember that we need to connect to a ClearML server to save all our tracked data; the server is where we saw the list of experiments earlier. This connection is what `clearml-init` will set up for us. When running the command, it'll ask for your server API credentials.

To get those, go to your ClearML server web page. If you're using our hosted service, this will be at app.clear.ml; if you're hosting your own, browse to your server's address at port 8080. Go to your settings on the top right and, under workspace, create new credentials. This will pop up a window with your API info, and you can just copy-paste it into the `clearml-init` prompt.

The prompt will suggest the server URLs that were in your copied snippet. If they are correct, just press Enter; otherwise you can change them here.

Now we're all set to add the 2 magic lines to our code and start tracking our experiments!

The first line imports the Task object from the clearml package, and the second line creates a new task in a certain project. That project will be created if it doesn't exist already.
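
The 2 lines look like this (project and task names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="my project", task_name="my experiment")
```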

Now we can just run the experiment and see it pop up in real time in our web view! The experiment will also generate a link to your experiment page for easy access. From here we have a lot of cool features at our disposal.

This is our experiment overview; experiments in ClearML are called tasks. If we click on one of our tasks here, we get the detailed overview that we saw earlier.

We can change the task's name by clicking it here, add a description, or get the task's ID here. We can also add tags to our task for easy filtering and searching.

First of all, source code is captured. If you're working in a git repository, we'll save your git information along with any uncommitted changes. If you're running an unversioned script, ClearML will save the script instead.

Together with the Python packages your code uses, this'll allow you to recreate your experiment on any machine.

Similarly, all of the output the code produces will also be captured.

The 2 magic lines will also automatically hook into most ML/DL libraries when they're imported by your code. For example, when Python's argparse is used to handle command line arguments, the arguments themselves will be captured by ClearML without any extra code. In the same way, other frameworks such as TensorFlow, PyTorch and Matplotlib will automatically log hyperparameters, model checkpoints, preview images and plots, for example.

Next to automatic logging, it is super easy to manually add anything you want to the task with just a single extra line of code. Artifacts are usually meant for reusable files like model weights, while debug samples can show any output type, like annotated images or even audio or video clips.
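
For example (names and paths are placeholders):

```python
# Upload a reusable file as an artifact
task.upload_artifact(name="features", artifact_object="./features.csv")

# Report a debug sample, e.g. an annotated prediction image
task.get_logger().report_image(
    title="predictions", series="sample", iteration=0,
    local_path="./prediction.png",
)
```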

Just take a look at our documentation for more info.

If you want to show colleagues or friends how well your models are performing, you can easily share a task by right clicking it and choosing Share to make it accessible with a link. Anyone visiting that link will get the detail view in fullscreen mode, and the task itself will get a tag showing that it's now shared.

In many cases, we also want to compare multiple versions of our experiments directly. This is easily done by selecting the tasks you're interested in and clicking on Compare in the bottom ribbon.

This will bring up the same information tabs as in our detail view.

Differences are highlighted in red, and you can choose to hide everything that's the same between tasks for a cleaner comparison.

Scalars such as loss or accuracy will be plotted on the same axes, which makes comparing them much more convenient.

Finally, plots such as a confusion matrix and debug samples can be compared too. For those times when you just want to confirm with your own eyes that the new model is better.

Now that you're ready to start tracking and managing your experiments, we'll cover some more advanced features and concepts of the experiment manager in the next video. Later we'll also look at remote execution and automation using the ClearML agent.

But if you want to get started right now, head over to clear.ml, and join our community Slack channel if you need any help.

</div>
</details>

---
title: Hyperdatasets Data Versioning
---

## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/1VliYRexeLU"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">

Hello and welcome to ClearML. In this video, we're taking a closer look at hyperdatasets, a supercharged version of ClearML Data.

Hyperdatasets is a data management system that's designed for unstructured data like text, audio or visual data. It is part of the ClearML paid offering, which means it brings along quite a few upgrades over the open source clearml-data.

The main conceptual difference between the two is that hyperdatasets decouple the metadata from the raw data files. This allows you to manipulate the metadata in all kinds of ways while abstracting away the logistics of having to deal with large amounts of data.

Manipulating the metadata is done through queries and parameters, both of which can then be tracked using the experiment manager.

This means it's easy to not only trace back which data was used at the time of training, but also to clone the experiment and rerun it using different data manipulations, without changing a single line of code! Combine this with the clearml-agent and autoscalers, and you can start to see the potential.

The data manipulations themselves become part of the experiment; we call this a dataview. A machine learning engineer can create the model training code, and then a data engineer or QA engineer can experiment with different dataset configurations without any coding. In essence, the data access is completely abstracted.

By contrast, in ClearML Data, just like in many other data versioning tools, the data and the metadata are entangled. Take this example where the label of an image is defined by which folder it is in, a common dataset structure. What if I want to train only on donuts? Or what if I have a large class imbalance? I still have to download the whole dataset even though I might only be using a small part of it. Then I have to change my code to only grab the donut images, or to rebalance my classes by over- or undersampling them. If later I want to add waffles to the mix, I have to change my code again.

Let's take a look at an example that will show you how to use hyperdatasets to debug an underperforming model. But first, we start where any good data science project starts: data exploration.

When you open hyperdatasets to explore a dataset, you can find the version history of that dataset here. Datasets can have multiple versions, which in turn can have multiple child versions. Each of the child versions inherits the content of its parents.

By default, a dataset version will be in draft mode, meaning it can still be modified. You can press the publish button to essentially lock it, making sure it will not change anymore. If you want to make changes to a published dataset version, make a new version that's based on it.

You'll find automatically generated label statistics here, giving you a quick overview of the label distribution in your dataset, as well as some version metadata and other version information.

Over here you can see the contents of the dataset itself. In this case, we're storing images, but it could also be video, audio, text or even a reference to a file that's stored somewhere else, such as in an S3 bucket.

When you click on one of the samples, you can see the image itself as well as any bounding boxes, keypoints or masks the image may have been annotated with. In fact, over here you can see a list of all the annotations in the image, including classification labels for the image itself. After going back to the main screen, you can also view your samples as a table instead of a preview grid, which can be handy for audio or text, for example.

Above the table, you can try out the querying functionality by switching to advanced filters here. As an example, you could create a query that only includes donuts with a certainty of at least 75 percent. You can query on basically any metadata or annotation, so go nuts!

The goal of these queries is not to simply serve as a neat filter for data exploration; we want to use them as part of our machine learning experiments!

Enter the dataviews that I introduced in the beginning of this video. Dataviews can use sophisticated queries to connect specific data from one or more datasets to an experiment in the experiment manager. Essentially, a dataview creates and manages local views of remote datasets.

As an example, imagine you have created an experiment that tries to train a model based on a specific subset of data using hyperdatasets.

To get the data you need to train on, you can easily create a dataview from code like so. Then you can add all sorts of constraints, like class filters, metadata filters and class weights, which will over- or undersample the data as required.
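
A rough sketch of that idea (the `allegroai` package ships with the paid offering; the dataset name, version and query here are hypothetical, and the exact API may differ, so treat this purely as illustration):

```python
from allegroai import DataView

dv = DataView()
# Pull only frames whose annotations match the query
dv.add_query(dataset_name="food", version_name="v2", roi_query="donut")

frames = dv.to_list()  # materialize the current query as a list of frames
```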

After running the task, we can see it in the experiment manager. The model is reporting scalars and training as we would expect. When using hyperdatasets, there is also a Dataviews tab with all of the possibilities at your disposal. You can see which input datasets and versions you used, and the queries that were used to subset them. This already gives you a nice, clean way to train your models on a very specific subset of the data, but there is more!

If you want to remap labels, or enumerate them to integers on the fly, ClearML will keep track of all the transformations that are done and make sure they are reproducible. There is, of course, more still, so if you're interested, check out our documentation on hyperdatasets.

ClearML veterans already know what's coming next. Cloning.

Imagine the scenario where the machine learning engineer has created the model training code that we saw before and integrated a dataview as the data source.

Now, a QA engineer or data analyst has spotted that the data distribution is not very balanced, and that's throwing the model off.

Without changing anything in the underlying code, someone can clone the existing experiment. This allows them to change any of the queries or parameters in the dataview itself. In this example, we'll change the class weight to something else, modifying the data distribution in the process. You can then enqueue this experiment for a remote ClearML agent to start working on. The exact same model will be retrained on a different data distribution, running on a remote machine, in just a few clicks; no code change required.

After the remote machine has executed the experiment on the new dataview, we can easily compare the 2 to further help us with our analysis. This is a very fast and efficient way to iterate, and it gets rid of so much unnecessary work.

If you've been following along with the other getting started videos, you should already start to see the potential this approach can have. For example: we could now run hyperparameter optimization on the data itself, because all of the filters and settings previously shown are just parameters on a task. The whole process could be running in parallel on a cloud autoscaler, for example. Imagine finding the best training data confidence threshold for each class to optimize the model performance.

If you're interested in using hyperdatasets for your team, contact us using our website and we'll get you going in no time. In the meantime, you can enjoy the power of the open source components at app.clear.ml, and don't forget to join our Slack channel if you need any help!

</div>
</details>

---
title: Hyperparameter Optimization
---

## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/dLkP7y4USFg"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">

Hello and welcome to ClearML. In this video we'll take a look at one cool way of using the agent other than rerunning a task remotely: hyperparameter optimization.

By now, we know that ClearML can easily capture our hyperparameters and scalars as part of the experiment tracking. We also know we can clone any task and change its hyperparameters, so they'll be injected into the original code at runtime. In the last video, we learned how to make a remote machine execute this task automatically by using the agent.

Soooo... can we just clone a task like 100 times, inject different hyperparameters into every clone, run the clones distributed on 10 agents and then sort the results based on a specific scalar?

Yeah, yeah we can; it's called hyperparameter optimization. And we can do all of this automatically too; no way you were going to clone and edit those 100 tasks yourself, right?

If you don't know what hyperparameter optimization is yet, you can find a link to our blog post on the topic in the description below. But in its most basic form, hyperparameter optimization tries to optimize a certain output by changing a set of inputs.

Let's say we've been working on this model here, and we were tracking our experiments with it anyway. We can see we have some hyperparameters to work with in the Hyperparameters tab of the web UI. They are logged by using the `task.connect` function in our code. These are our inputs. We also have a scalar called validation epoch_accuracy that we want to get as high as possible. This is our output. We could also choose to minimize the epoch_loss, for example; that is something you can decide yourself.

For a simple optimizer like random search, we will simply run a bunch of random combinations of hyperparameters and sort the tasks based on the value of the output scalar, so the best performing task will be on top. But more complex optimizers like Optuna use Bayesian optimization: they actually look at how this scalar changes over time and try to predict the best possible next set of hyperparameters to try out. So it's not just a shot in the dark. Pretty neat, right?

We are using a training script as our task in this example, but the optimizer doesn't actually care what's in our task; it just wants inputs and outputs. So you can optimize basically anything you want.

The only thing we have to do to start optimizing this model is to write a small Python file detailing what exactly we want our optimizer to do.

When you're a ClearML Pro user, you can just start the optimizer straight from the UI, but more on that later.

First of all, everything in ClearML is a task, and the optimizer itself is one too, so we let the server know that by using the `task_type` argument.

Next, we choose which task we want to optimize by providing its ID. We can either get the task's ID from the web UI by pressing this button next to the task we want, or we can grab the ID with a single line of code, given the task's name and project.

Now the optimizer needs its inputs and outputs. For the inputs, we can tell it to choose a parameter either from a discrete list of options or within certain boundaries. The name of the hyperparameter consists of the section it's reported to, followed by a slash and then its name.

For the outputs, we tell the optimizer what the scalar is that we want to optimize. You can find the necessary information in your original task, under Scalars. The metric title is the title of the plot, the metric series is the trace, and the sign is whether we want to minimize or maximize this scalar.

There are many more parameters that you can tune, but if you want to go deeper, check out our other HPO blog post on the website and in the description.
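
Put together, an optimizer script might look roughly like this (project, task, parameter and scalar names are placeholders; Optuna support requires `pip install optuna`):

```python
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange, HyperParameterOptimizer, UniformIntegerParameterRange,
)
from clearml.automation.optuna import OptimizerOptuna

# The optimizer is a task too
task = Task.init(project_name="HPO", task_name="optimizer",
                 task_type=Task.TaskTypes.optimizer)

# Grab the ID of the task we want to optimize, by name and project
base_task = Task.get_task(project_name="my project", task_name="my experiment")

optimizer = HyperParameterOptimizer(
    base_task_id=base_task.id,
    # Inputs: "section/name", as shown in the Hyperparameters tab
    hyper_parameters=[
        DiscreteParameterRange("General/batch_size", values=[32, 64, 128]),
        UniformIntegerParameterRange("General/epochs", min_value=5, max_value=20),
    ],
    # Output: plot title, trace and direction, as shown under Scalars
    objective_metric_title="validation",
    objective_metric_series="epoch_accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    execution_queue="default",
    max_number_of_concurrent_tasks=4,
)

optimizer.start()
optimizer.wait()   # blocks until the optimization is done
optimizer.stop()
```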

That's it! With just a few lines of code, we can optimize a task. If we take a look now at the experiment list, we can see that both our optimizer task and our different clones are showing up. Each clone is using the same code as the original task, but with different hyperparameters injected.

And that's really cool! Instead of inserting the HPO process into our original code, like you would do with most optimization libraries, we've now put it on top instead. So we can keep our code completely separate from the optimization process. Which, again, means we can optimize anything we want.

The more advanced optimizers like Optuna and BOHB use early stopping to avoid wasting time on bad parameter runs, so expect some of these tasks to be aborted; that's normal.

We can now follow the progress of our optimization process by looking at the optimizer task under the Plots section. Here we can see several interesting things happening.

Every point in this graph is a task, or a single run of your code using a specific hyperparameter configuration. It gives you a quick glimpse into how all tasks are performing.

The next graph is a really cool one, designed to give you some intuition about which parameter ranges are good and which parameters have the most impact on the final outcome. For example, here we can clearly see that the Adam optimizer is much better for our task than the SGD optimizer.

Then we have the table, which is a sorted list of all tasks with their objective value, parameter combinations and current status.

As we saw earlier, if you're a ClearML Pro user, you can even launch your optimizer straight from the UI, no optimizer script required, and you get a nicer overview dashboard included. This means you can optimize your tasks in literally a minute.

Next to that, we have a graph that tells us the most recent and maximum values of our objective scalar. If everything is going as planned, these should steadily climb.

And don't forget about autoscaling! You can run it for free using code, of course, but with ClearML Pro you can set it up in the UI as well. Which means that, starting from scratch, you can have an autoscaling cluster of cloud VMs running hyperparameter optimization on your experiment tasks in just a few minutes.

How cool is that? In the next video, we'll take a look at another example of automation goodness: pipelines. In the meantime, why not try to optimize one of your existing models for free at app.clear.ml, and don't forget to join our Slack channel if you need any help.

</div>
</details>

---
title: Pipelines from code
---

## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/UVBk337xzZo"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">

Hello and welcome to ClearML. In this video we'll take a look at how pipelines can be used to easily automate and orchestrate multiple tasks.

Essentially, pipelines are a way to automate and orchestrate the execution of multiple tasks in a scalable way. Each task in the context of a ClearML pipeline is called a step or component, and it doesn't necessarily have to be an existing *ClearML* task; it can be any code.

A pipeline can be orchestrated using your own control logic. So you could say: run task 2 only if task 1 was successful. But you can implement more complex control logic too, like: if the accuracy of the final model is not high enough, run the pipeline again with different parameters.

Pipelines are highly scalable too. Just like any object in the ClearML ecosystem, a pipeline is a task with inputs and outputs that you can clone just like any other. If you saw our video on HPO, this should ring a bell. It's completely doable to use hyperparameter optimization to optimize a complete pipeline and have all of the steps run distributed on an autoscaling cluster of agents. How is that not awesome?

Ok, but how do we make one? In ClearML there are 2 main ways. You can easily chain existing tasks together into a single pipeline, meaning each step in the pipeline is a task that you tracked before using the experiment manager. On the other hand, you can go a little deeper and create pipelines straight from your codebase, which is what we'll focus on in this video. But don't worry, the end result is the same in both cases: a ClearML pipeline.

Let's say we have some functions that we already use to run ETL, and a function that trains a model on the preprocessed data. We already have a main function too, which orchestrates when and how these other components should be run.

If we want to make this code into a pipeline, the first thing we have to do is tell ClearML that these functions are supposed to become steps in our pipeline. We can do that by using a Python decorator! For each function we want as a step, we decorate it with `PipelineDecorator.component`.

The component call will fully automatically transform this function into a ClearML task, with all the benefits that come with that. It also makes clear that this task will be part of a larger pipeline.

We can specify which values the function will return, and these will become artifacts in the new task. This allows the following tasks in the pipeline to easily access them.

We can also cache the function, which means that if the pipeline is rerun but this function didn't change, we will not execute the function again. This is super handy when loading lots of data that takes a long time, for example.

Finally, we can add parameters to the pipeline as a whole. This means we can easily change these parameters later in the UI and rerun the pipeline with the new parameters fully automatically, just like we did with normal tasks in the previous videos.

You can go quite far with configuring this component; you can even specify which docker image this particular step should run in when it's executed by the agent. Check our documentation in the links below for a detailed overview of all the arguments.

The next thing we need is our control logic, the code that binds all other code together. In ClearML this is called a controller. We already have our control logic as code in our main function, so we can add a different decorator to it, called `pipeline`. The only arguments you need for the pipeline are a name and a project, just like any other task. Easy as pie.

An important note here: a step will only wait for a previous step to finish if it uses that step's output. If not, the steps are executed in parallel.

At last, we can now run our pipeline! We can choose to run it locally, which means both the controller and all the steps will run as subprocesses on your local machine. This is great for debugging, but if we want the real scaling powers of our pipeline, we can execute it normally and the pipeline and tasks will be queued instead, so they can be executed by our remote agents. The pipeline task itself will be enqueued in a special "services" queue, so when setting up your agents for pipeline execution, take a look at the documentation first.
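
A condensed sketch of the decorator approach (function names, arguments and the data source are placeholders):

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], cache=True)
def load_data(url: str):
    import pandas as pd  # imports inside the step travel with it to the agent
    return pd.read_csv(url)

@PipelineDecorator.component(return_values=["accuracy"])
def train_model(data):
    accuracy = 0.9  # stand-in for real training on the preprocessed data
    return accuracy

@PipelineDecorator.pipeline(name="my pipeline", project="examples", version="1.0")
def main(url: str):
    # Steps only wait on the outputs they actually consume
    data = load_data(url)
    accuracy = train_model(data)
    print(f"final accuracy: {accuracy}")

if __name__ == "__main__":
    # Run controller and steps as local subprocesses while debugging;
    # remove this line to enqueue everything for remote agents instead
    PipelineDecorator.run_locally()
    main("https://example.com/data.csv")
```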
After running the pipeline, you can see both the controller task and the first step popping up in the experiment view. But it’s easier to use the dedicated pipeline UI, which we can find on the left here.

Here, we can find our pipeline project which automatically keeps track of every run we do. If we click on our pipeline here, we can see a nice visual representation of our pipeline steps.

When no step is selected, we can see our global pipeline info on the right. By clicking on the details button, we get the console output of our pipeline controller, our main function in the example, so we can see which steps were executed when.

If we select a step from our pipeline, we can see many of the same details, but this time for that specific step. On the right we can see any inputs or outputs our step produced, and below, we can see the step’s console output as well as the original code.

But now comes the most powerful feature of all. Again, a pipeline controller is a task like any other, so… we can clone it like any other. Pressing the “new run” button will allow us to do that from the UI! We can even change our global pipeline parameters here and, just like with normal tasks, these will be injected into the original task and overwrite the original parameters. In this way, you can very quickly run many pipelines, each with different parameters.

In the next video of this getting started series, we’ll get a long-overdue look at ClearML Data, our data versioning tool. In the meantime, slap some pipeline decorators on your own functions for free at app.clear.ml and don’t forget to join our Slack channel if you need any help.
</div>
</details>
63
docs/getting_started/video_tutorials/pipelines_from_tasks.md
Normal file
@ -0,0 +1,63 @@
---
title: Pipelines from Tasks
---


## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/prZ_eiv_y3c"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">
Hello and welcome to ClearML. In this video we’ll take a look at how pipelines can be created from tasks, instead of from code like we saw in the last video.

The tasks themselves are already in the system thanks to the experiment manager. What’s important to note here, though, is that hyperparameters, scalars and artifacts should be reported correctly, because the pipeline will consider them to be the inputs and outputs of each step. In that way, a step can easily access, for example, the artifacts from a previous step.

So with the tasks as our steps this time, we really only need to add our control logic. And since we don’t have a main function like we had in the last video, we’ll put our control logic code in a dedicated PipelineController script instead. Let’s start with a small example.

Our example pipeline will consist of three distinct tasks. The first task downloads some data and then uploads it to ClearML as an artifact.

In a future video, I’ll introduce you to ClearML Data, which is actually our preferred way to handle data instead of uploading it as an artifact. So keep watching this getting started playlist if you want to know more.

The next task will preprocess that data. It has some hyperparameters here that configure the way the preprocessing is done. As you can see, the dataset url parameter is still empty. When the pipeline is run, these hyperparameters can be overwritten by the output of the previous step; we’ll see how that’s done a little later in the video. After the preprocessing, we’ll upload the resulting training and test data as artifacts again.

The final task will train a model on the preprocessed data by downloading the train and test artifacts from the previous step. Again, the actual parameter, the preprocessing task ID in this case, will be overwritten by the real ID when the pipeline is run. You can see here in my experiment list that I already have these three tasks logged.

Now comes our control logic. Let’s start by making a simple Python script. We can create a PipelineController object and give it a name and a project; it will become visible in the experiment list under that name because, just like anything in ClearML, the controller is just a task, albeit a special type of task in this case.
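
A minimal sketch of such a script (the name and project are placeholders, not the ones from the video):

```python
from clearml.automation import PipelineController

# The controller is itself a (special) task, so it gets a name and a project
# and will show up in the experiment list under that name.
pipe = PipelineController(
    name="pipeline demo",
    project="pipeline examples",
    version="1.0",
)
```
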
Next, we can add some pipeline-level parameters. These can be easily accessed from within every step of the pipeline; they’re basically global variables. In this case we’ll add a parameter that tells the first step where to get the raw data from. This is very useful because, as we’ll see later, it lets us easily rerun our pipeline with a different URL.
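
Assuming the parameter from the video is a URL, adding it could look like this (the name and default value are illustrative):

```python
# A pipeline-level parameter, accessible from every step and editable in the UI.
pipe.add_parameter(
    name="raw_data_url",
    default="https://example.com/raw_data.csv",
    description="Where the first step should download the raw data from",
)
```
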
Now we can define our steps. Each step needs a name and some link to the original task. We can either give it the original task’s ID or provide the task name and project, in which case the controller will use the most recent task with that name in that project.

For the next step we do the same thing, only now we want the controller to know that this step should only run after the previous one has completed. We can easily do that by providing the names of the previous steps as a list to the parents argument.

The structure of your pipeline is derived from this parents argument, so you can build your flow by defining the previous steps as parents for each step in the pipeline.

Now we do the same for the final step. However, remember the empty hyperparameters we saw before? We still have to overwrite these. We can use the parameter_override argument to do just that.

For example, we can tell the first step to use the global pipeline parameter raw_data_url like so. But we can also reference output artifacts from a previous step by using its name, and we can of course also just overwrite a parameter with a plain value. Finally, we can even pass along the unique task ID of a previous step, so you can get the task object based on that ID and access anything and everything within that task.
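
Putting the last few paragraphs together, the three steps could be wired up like this. The project, task names, and parameter keys are stand-ins for the tasks shown in the video; the `${...}` references are the override syntax the transcript describes:

```python
pipe.add_step(
    name="stage_data",
    base_task_project="pipeline examples",
    base_task_name="step 1 download data",
    # inject the pipeline-level parameter defined above
    parameter_override={"General/dataset_url": "${pipeline.raw_data_url}"},
)
pipe.add_step(
    name="stage_process",
    parents=["stage_data"],  # wait for stage_data to complete first
    base_task_project="pipeline examples",
    base_task_name="step 2 preprocess data",
    parameter_override={
        # reference an output artifact of a previous step by its name
        "General/dataset_url": "${stage_data.artifacts.dataset.url}",
        # or simply override with a plain value
        "General/test_size": 0.25,
    },
)
pipe.add_step(
    name="stage_train",
    parents=["stage_process"],
    base_task_project="pipeline examples",
    base_task_name="step 3 train model",
    # pass along the unique task ID of the previous step
    parameter_override={"General/preprocessing_task_id": "${stage_process.id}"},
)
```
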
And that’s it! We now have our first pipeline!

Just like in the previous video, we can run the whole pipeline locally first to debug our flow and make sure everything is working. If everything works as planned, we can then start it normally, and everything will be enqueued instead. Your agents listening to the services queue will pick up the pipeline controller, clone the tasks that form your steps, override the necessary parameters and enqueue them into the default queue for your other agents to start working on.
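
The launch itself is a couple of lines (a sketch; the queue names are the usual defaults):

```python
# Debug run: controller and steps all execute on the local machine.
# pipe.start_locally(run_pipeline_steps_locally=True)

# Normal run: steps are cloned and enqueued for the agents to pick up,
# while the controller itself goes to the "services" queue.
pipe.set_default_execution_queue("default")
pipe.start(queue="services")
```
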
After running the script, you can go to the pipeline screen and see the same kind of output as we saw in the last video: a list of pipeline runs, and when we click one, we get a nice visual representation of the pipeline.

Now we can do all the same things that we could with a pipeline built from code. We can see the overall details of the pipeline itself and the logs of the pipeline controller.

When we select a specific step, we can see its inputs and outputs as well as its logs down here, and even the original code.

Finally, we can also clone the whole pipeline and change its parameters by clicking on the new run button. This is the most powerful feature of all, as it allows us to really quickly rerun the whole pipeline with different parameters from the UI. The agents will take care of the rest!

In the next video of this getting started series, we’ll take a look at ClearML Data, for realz this time. In the meantime, spin up some PipelineControllers yourself for free at app.clear.ml and don’t forget to join our Slack channel if you need any help.
</div>
</details>
16
docs/getting_started/video_tutorials/quick_introduction.md
Normal file
@ -0,0 +1,16 @@
---
title: Quick Introduction
---


## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/-9vqxF2UfFU"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>
@ -0,0 +1,73 @@
---
title: The ClearML Autoscaler
---


## Video Tutorial

<div style={{position: 'relative', overflow: 'hidden', width: '100%', paddingTop: '56.25%' }} >
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}}
src="https://www.youtube.com/embed/j4XVMAaUt3E"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen"
allowfullscreen>
</iframe>
</div>

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Read the transcript</summary>
<div className="cml-expansion-panel-content">
Hello and welcome to ClearML. In this video we’ll go a little more advanced and introduce autoscalers, the easiest way to build your very own flock of ClearML Agents.

Data science is inherently very inconsistent in its demand for compute resources. One moment you’re just researching papers and need no compute at all, another moment you’re making 16 GPUs scream and wish you had more. Especially when running hyperparameter optimization or pipelines, it can be very handy to have some extra hardware for a short time.

Even then, no one has 16 GPUs on their desk ready to go. It’s generally a good idea to run and test your code on your local machine first, maybe for 1 epoch or on a subset of the data, just to check if it works. But then you’ll want to hand it over to a more powerful remote machine to do the longer-term heavy lifting, so you can play some Elden Ring in the meantime on your own GPU.

Remote machines are easy to get from any cloud provider, and you only pay for the time you use them…

As long as you don’t forget to shut them down after you’re done. Seriously, I’m pretty sure at least 30% of GPU usage is people forgetting to shut down their remote machines.

Anyway, that’s what an autoscaler takes care of for you: spinning up as many machines as you need, when you need them, and automatically shutting them down again when you don’t.

Once the autoscaler is deployed, you can just add experiments to a queue as we saw in the previous videos. Once experiments are detected in the queue, the autoscaler will automatically spin up new remote machines and turn them into ClearML agents that will run them for you. No fiddling with remote SSH and no docker containers. And no need to worry about shutting down either: when an agent has been idle for a while, it gets shut down automatically, so you don’t even have to think about it.

You can also get fancy with queues. Create as many of them as you want, and specify which type of remote machine should serve which queue. So imagine you have a CPU queue and a GPU queue: all you have to do is put your experiment in the right queue, and you know exactly what type of machine will be running it.

Obviously, you can also configure a maximum budget by limiting the number of machines that can be spun up at one time, so you don’t incur unexpected expenses.

Now that the theory is taken care of, let’s take a look at how to set up an autoscaler on ClearML.

To launch the autoscaler, go to app.clear.ml and open the application page; there you’ll find the autoscalers for each of the large cloud providers. Launching the autoscaler this way requires ClearML Pro, but it’s cheap enough that forgetting to shut down a remote GPU machine for 3 days costs more than a year of ClearML Pro, so…

We’ll go through the AWS wizard in this video, but the other autoscalers have a very similar setup. First come the credentials for your cloud provider of choice. Make sure you assign the correct access rights, because the autoscaler will use these credentials to launch the machines and shut them down again when they are idle.

Naturally, you want the agent to be able to run your original code, so we need to supply our git credentials as well. This works by using a git application token as your password; you can find out how to generate such a token in the description below.

If you’re running from a notebook, don’t worry! Even notebooks that were tracked can be reproduced on the remote machine!

The last big, important setting is of course which kind of machines we want to spin up.

The exact details will depend heavily on which cloud platform you end up using, but in general you’ll mainly need to provide the machine type you want to run (so the number of CPU cores, the amount of RAM, and the number of GPUs). Each cloud provider has different options and naming schemes, but there will always be a handy tooltip here that will guide you to the relevant documentation.

Once you have decided the details of your machine, you can also enter which queues you want these kinds of machines to listen to, like we discussed in the first part of the video. You also have to specify the maximum number of these machines that are allowed to run at the same time, so you can keep your budget under control.

You can add as many of these machine types as you wish. Finally, there are some more advanced configuration settings that you can read more about in the documentation linked below.

After filling in all these settings, let’s launch the autoscaler and see how it actually works.

We start right away in the autoscaler dashboard, where we can see the number of machines that are running, the number that are idle, how many machines we have available per queue, and all the autoscaler logs. Right now we have no machines running at all because our queues are empty.

So let’s go to one of our projects, clone these tasks here, and enqueue them in the CPU queue. We’ll clone this task here as well; we can edit the parameters like we saw before and even change which container it should run in. We then enqueue it in the GPU queue, and we should now see the autoscaler kicking into action.

The autoscaler has detected the tasks in the queue and has started booting up remote machines to process them. We can follow along with the process in our autoscaler dashboard.

Once the machines are spun up, the ClearML agents will register as available workers in the workers and queues tab. From here, they behave just like any other agent we’ve seen before.

Finally, when everything is done and the remote machines are idle, they will be shut down automatically and the workers list will be empty again.

You can see that this functionality is very powerful when combined with, for example, hyperparameter optimization or pipelines that launch a lot of tasks at once. Obviously, it can be used as the primary way to get access to remote compute, but it can even be used as an extra layer on top of the machines you already have on premise, to spill over into the cloud in case of large demand spikes, for example. You don’t pay when you don’t use it, so there isn’t really a good reason not to have one running at all times.

Get started right now for free at app.clear.ml and start spinning up remote machines with ClearML Pro if you want to save some money and effort by automating the boring stuff. If you run into any issues along the way, join our Slack channel and we’ll help you out.
</div>
</details>
15
sidebars.js
@ -12,7 +12,20 @@ module.exports = {
{'Getting Started': ['getting_started/main', {
'Where do I start?': [{'Data Scientists': ['getting_started/ds/ds_first_steps', 'getting_started/ds/ds_second_steps', 'getting_started/ds/best_practices']},
{'MLOps': ['getting_started/mlops/mlops_first_steps','getting_started/mlops/mlops_second_steps','getting_started/mlops/mlops_best_practices']}]
}, 'getting_started/architecture']},
}, 'getting_started/architecture', {'Video Tutorials':
[
'getting_started/video_tutorials/quick_introduction',
'getting_started/video_tutorials/core_component_overview',
'getting_started/video_tutorials/experiment_manager_hands-on',
'getting_started/video_tutorials/experiment_management_best_practices',
'getting_started/video_tutorials/agent_remote_execution_and_automation',
'getting_started/video_tutorials/hyperparameter_optimization',
'getting_started/video_tutorials/pipelines_from_code',
'getting_started/video_tutorials/pipelines_from_tasks',
'getting_started/video_tutorials/clearml-data',
'getting_started/video_tutorials/the_clearml_autoscaler',
'getting_started/video_tutorials/hyperdatasets_data_versioning'
]}]},
{'ClearML Fundamentals': ['fundamentals/projects', 'fundamentals/task', 'fundamentals/hyperparameters', 'fundamentals/artifacts', 'fundamentals/logger', 'fundamentals/agents_and_queues',
'fundamentals/hpo']},
{'ClearML SDK': ['clearml_sdk/clearml_sdk', 'clearml_sdk/task_sdk', 'clearml_sdk/model_sdk', 'clearml_sdk/apiclient_sdk']},