mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-26 18:17:44 +00:00
Small edits (#828)
This commit is contained in:
@@ -20,27 +20,27 @@ keywords: [mlops, components, ClearML data]

<br/>

<Collapsible type="info" title="Video Transcript">

Hello and welcome to ClearML. In this video we'll take a look at both the command line and python interfaces of our data versioning tool called `clearml-data`.

In the world of machine learning, you are very likely dealing with large amounts of data that you need to put into a dataset. ClearML Data solves 2 important challenges that occur in this situation:

One is accessibility, making sure the data can be accessed from every machine you use. And two is versioning, linking which dataset version was used in which task. This helps to make experiments more reproducible. Moreover, versioning systems like git were never really designed for the size and number of files in machine learning datasets. We're going to need something else.

ClearML Data comes built-in with the `clearml` python package and has both a command line interface for easy and quick operations and a python interface if you want more flexibility. Both interfaces are quite similar, so we'll address both of them in the video.

Let's start with an example. Say I have some files here that I want to put into a dataset and start to keep track of.

First, we need to actually create an initial dataset version. The easiest way to do this is with the command line interface. Use the command `clearml-data create` and then give it a name and a project, just like with a ClearML task. It will return the dataset ID, which we will copy for later. The dataset is now initialized, but is still empty because we haven't added any files yet.

We can do that by using the `clearml-data add` command and providing the path to the files we want to add. This will recursively add all files in that path to the Dataset.

Now we need to tell the server that we're done here. We can call `clearml-data close` to upload the files and change the dataset status to done, which finalizes this version of the dataset.

The process of doing this with the python interface is very similar.

You can create a new Dataset by importing the Dataset object from the `clearml` pip package and calling its `create` method. Now we have to give the dataset a name and a project just like with the command line tool. The create method returns a dataset instance which we will use to do all of our operations on.

To add some files to this newly created dataset version, call the `add_files` method on the dataset object and provide a path to a local file or folder. Bear in mind that nothing is uploaded just yet, we're simply instructing the dataset object what it should do when we eventually *do* want to upload.

A really useful thing we can do with the python interface is adding some interesting statistics about the dataset itself, such as a plot for example. Here we simply report a histogram on the number of files in the train and test folders. You can add anything to a dataset that you can add to a ClearML task, so go nuts!
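
A minimal sketch of reporting such a plot (assumes a configured ClearML setup; the dataset ID, counts, and folder labels here are illustrative):

```python
def report_file_count_histogram(dataset_id: str) -> None:
    """Attach a file-count histogram to an existing dataset version."""
    from clearml import Dataset  # requires `pip install clearml` and credentials

    dataset = Dataset.get(dataset_id=dataset_id)
    # The dataset's logger accepts the same reporting calls as a regular task
    dataset.get_logger().report_histogram(
        title="Dataset Statistics",
        series="file count",
        values=[620, 180],          # made-up counts for illustration
        xlabels=["train", "test"],  # one bar per folder
    )
```

Because the logger is the same `Logger` object a task uses, any other reporting method (tables, images, scalars) would work in its place.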

@@ -48,37 +48,37 @@ Finally, upload the dataset and then finalize it, or just set `auto_upload` to `
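
The whole python flow from this section can be sketched as follows (assumes `pip install clearml` and a configured server; the dataset and project names are illustrative):

```python
def create_example_dataset(files_path: str) -> str:
    """Create a dataset version, add files, upload, and finalize it."""
    from clearml import Dataset  # deferred import: needs clearml installed

    dataset = Dataset.create(
        dataset_name="Example Dataset",
        dataset_project="Example Project",
    )
    dataset.add_files(path=files_path)  # metadata only, nothing uploaded yet
    dataset.upload()                    # push the files to storage
    dataset.finalize()                  # mark this version as done
    return dataset.id

if __name__ == "__main__":
    print(create_example_dataset("data/"))
```
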

In the web UI, we can now see the details of our dataset version by clicking on the Dataset button on the left. When we click on our newly created dataset here, we get an overview of our latest version, of course we have only one for now.

At a glance you can see things like the dataset ID, its size, and which files have been changed in this particular version. If you click on details, you'll get a list of those files in the **Content** tab. Let's make the view a little larger with this button, so it's easier to see. When we switch to the preview tab, we can see the histogram we made before as well as an automatically generated preview of some of the files in our dataset version. Feel free to add anything you want in here! Finally, you can check out the original console logs that can be handy for debugging.

Now imagine we're on a different machine. Maybe one from a team member, a classmate, or just one of your remote agents, and you want to get the dataset to do something cool with it.

Using the command line tool, you can download a dataset version locally by using the `clearml-data get` command and providing its unique ID. You can find a dataset's ID in the UI here, or alternatively, you can search for a specific dataset by providing the dataset name, its project, some tags attached to the dataset or any combination of the three. Running the command will give you the system path where the data was downloaded.

That path will be a local cached folder, which means that if you try to get the same dataset again, or any other dataset that's based on this one, it will check which files are already on your system, and it will not download these again.

The python interface is similar, with one major difference. You can also get a dataset using any combination of name, project, ID or tags, but _getting_ the dataset does not mean it is downloaded, we simply got all of the metadata, which we can now access from the dataset object. This is important, as it means you don't have to download the dataset to make changes to it, or to add files. More on that in just a moment.

If you do want to download a local copy of the dataset, it has to be done explicitly, by calling `get_local_copy` which will return the path to which the data was downloaded for you.
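
A short sketch of the lookup-then-download pattern (names are illustrative; assumes a configured ClearML setup):

```python
def fetch_dataset_copy() -> str:
    """Look a dataset up by name/project, then download a cached local copy."""
    from clearml import Dataset  # deferred import: needs clearml installed

    # Any combination of name, project, ID, or tags works for the lookup;
    # get() only fetches metadata, nothing is downloaded yet.
    dataset = Dataset.get(
        dataset_name="Example Dataset",
        dataset_project="Example Project",
    )
    # The actual download (into the local cache) happens here
    return dataset.get_local_copy()
```
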

This is a good approach for when you want to just download and use the data. But it *is* a read-only copy, so if we want to add or remove some data to create a new version, we'll have to get a mutable copy instead, which we can do by using `get_mutable_local_copy`. We can give it a local path, and it will download the dataset into that path, but this time, we have full control over the contents.

We can do this with the command line tool too, by simply adding a `--copy` flag to the command

We can do this with the command line tool too, by simply adding a `--copy` flag to the command.

Let's say we found an issue with the hamburgers here, so we remove them from the folder. Then we add new pictures of chocolate cake. Essentially, we have now removed 3 files and added 4 new ones.

Now we can tell ClearML that the changes we made to this folder should become a new version of the previous dataset. We start by creating a new dataset just like we saw before, but now, we add the previous dataset ID as a parent. This tells ClearML that this new dataset version we're creating is based on the previous one and so our dataset object here will already contain all the files that the parent contained.

Now we can manually remove and add the files that we want, even without actually downloading the dataset. It will just change the metadata inside the python object and sync everything when it's finalized.

That said, we do have a local copy of the dataset in this case, so we have a better option.

Using the python SDK, we can call the `sync_folder` method. This method will essentially compare the dataset object metadata with the content of a `local_path` that you supply. So when we now call `finalize` and upload, it will only upload or remove the files that changed.
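
Putting the parent-version and sync steps together, a hedged sketch (names are illustrative; assumes a configured ClearML setup):

```python
def sync_new_version(parent_id: str, local_path: str) -> str:
    """Create a child dataset version and sync it against a local folder."""
    from clearml import Dataset  # deferred import: needs clearml installed

    new_version = Dataset.create(
        dataset_name="Example Dataset",
        dataset_project="Example Project",
        parent_datasets=[parent_id],  # start from the previous version's files
    )
    # Diff the dataset metadata against the folder contents (adds/removes)
    new_version.sync_folder(local_path=local_path)
    new_version.upload()    # only the changed files get uploaded
    new_version.finalize()
    return new_version.id
```
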

The command line interface doesn't have the python object for metadata, so it can only work with local data using the sync command. But it bunches this whole process together in one single command. Call `clearml-data sync`, provide it with the dataset name and project for the new version and maybe add some parent datasets too if applicable. This single call will create a new dataset version, sync it and then upload the changes all in 1 go. Neat, right?

Now we can take a look again at the dataset UI. We'll see our original dataset as well as the new version we made just now that's based on it.

When we click on our newest version in the lineage view, we can see that we indeed added 4 files and removed 3.

@@ -19,27 +19,27 @@ keywords: [mlops, components]

<br/>

<Collapsible type="info" title="Video Transcript">

Welcome to ClearML! This video will serve as an overview of the complete ClearML stack. We'll introduce you to the most important concepts and show you how everything fits together, so you can deep dive into the next videos, which will cover the ClearML functionality in more detail.

ClearML is designed to get you up and running in less than 10 minutes and 2 magic lines of code. But if you start digging, you'll quickly find out that it has a lot of functionality to offer. So let's break it down, shall we?

At the heart of ClearML lies the experiment manager. It consists of the `clearml` pip package and the ClearML Server.

After running `pip install clearml` you can add 2 simple lines of python code to your existing codebase. These 2 lines will capture all the output that your code produces: logs, source code, hyperparameters, plots, images, you name it.
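
Those "2 magic lines" look like this in practice (a minimal sketch; the project and task names are illustrative, and a configured `clearml.conf` is assumed):

```python
def main() -> None:
    # The 2 magic lines: from here on, everything the script produces is captured
    from clearml import Task
    task = Task.init(project_name="Example Project", task_name="Example Experiment")

    # ...the rest of your existing training code runs unchanged; you can also
    # report things explicitly if you want, e.g. a scalar:
    task.get_logger().report_scalar("accuracy", "val", value=0.9, iteration=1)

if __name__ == "__main__":
    main()
```
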

The pip package also includes `clearml-data`. It can help you keep track of your ever-changing datasets and provides an easy way to store, track and version control your data. It's also an easy way to share your dataset with colleagues over multiple machines while keeping track of who has which version. ClearML Data can even keep track of your data's ancestry, making sure you can always figure out where specific parts of your data came from.

Both the 2 magic lines and the data tool will send all of their information to a ClearML server. This server then keeps an overview of your experiment runs and data sets over time, so you can always go back to a previous experiment, see how it was created and even recreate it exactly. Keep track of your best models by creating leaderboards based on your own metrics, and you can even directly compare multiple experiment runs, helping you to figure out the best way forward for your models.

To get started with a server right away, you can make use of the free tier. And when your needs grow, we've got you covered too! Just check out our website to find a tier that fits your organisation best. But, because we're open source, you can also host your own completely for free. We have AWS images, Google Cloud images, you can run it on docker-compose locally or even, if you really hate yourself, run it on a self-hosted kubernetes cluster using our helm charts.

So, to recap: to get started, all you need is a pip package and a server to store everything. Easy right? But MLOps is much more than experiment and data management. It's also about automation and orchestration, which is exactly where the `clearml-agent` comes into play.

The `clearml-agent` is a daemon that you can run on 1 or multiple machines and turns them into workers. An agent executes an experiment or other workflow by reproducing the state of the code from the original machine to a remote machine.

Now that we have this remote execution capability, the possibilities are near endless.

For example, it's easy to set up an agent on either a CPU or a GPU machine, so you can easily run all of your experiments on any compute resource you have available. And if you spin up your agents in the cloud, they'll even support autoscaling out of the box.

You can set up multiple machines as agents to support large teams with their complex projects and easily configure a queuing system to get the most out of your available hardware.

@@ -69,7 +69,7 @@ keep track of installed packages and stuff like that. In this case, of course, w

it's only the `Task.init` and then just reporting some scalars.

What we do have is some scalars, so this is what it would look like, and we'll be using this one later down the line.

Right, so if I go back here to my code you can also see we have a GitHub folder with the workflow subfolder in there.
This basically tells GitHub that whatever you do--a push or commit or whatever--it will check this `yaml` file to see

if it has to do any kind of checks. In this case, we'll call it ClearML checks, and we'll set it on to pull requests.

Now, most of the time that you're using ClearML, it's going to be interesting to do checks on a pull request because it

@@ -20,17 +20,17 @@ keywords: [mlops, components, hyperparameter optimization, hyperparameter]

<br/>

<Collapsible type="info" title="Video Transcript">

Hello and welcome to ClearML. In this video we'll take a look at one cool way of using the agent other than rerunning a task remotely: hyperparameter optimization (HPO).

By now, we know that ClearML can easily capture our hyperparameters and scalars as part of the experiment tracking. We also know we can clone any task and change its hyperparameters, so they'll be injected into the original code at runtime. In the last video, we learnt how to make a remote machine execute this task automatically by using the agent.

Soooo… Can we just clone a task like 100 times, inject different hyperparameters in every clone, run the clones on 10 agents and then sort the results based on a specific scalar?

Yeah, yeah we can, it's called hyperparameter optimization. And we can do all of this automatically too! No way you were going to clone and edit those 100 tasks yourself, right?

If you don't know what Hyperparameter Optimization is yet, you can find a link to our blog post on the topic in the description below. But in its most basic form, hyperparameter optimization tries to optimize a certain output by changing a set of inputs.

Let's say we've been working on this model here, and we were tracking our experiments with it anyway. We can see we have some hyperparameters to work with in the **Hyperparameters** tab of the web UI. They are logged by using the `Task.connect` function in our code. These are our inputs. We also have a scalar called `validation/epoch_accuracy`, that we want to get as high as possible. This is our output. We could also choose to minimize the `epoch_loss` for example; that is something you can decide yourself.

We can see that no code was used to log the scalar. It's done automatically because we are using TensorBoard.
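
The inputs-and-objective setup described above can be sketched with the `clearml.automation` API (a hedged example; the base task ID, parameter names, and ranges are illustrative, and a configured ClearML setup with running agents is assumed):

```python
def run_hpo(base_task_id: str) -> None:
    """Clone a base task many times with different hyperparameters."""
    from clearml import Task
    from clearml.automation import (
        DiscreteParameterRange,
        HyperParameterOptimizer,
        RandomSearch,
        UniformIntegerParameterRange,
    )

    # The optimizer itself is tracked as a task too
    Task.init(project_name="Example Project", task_name="HPO controller",
              task_type=Task.TaskTypes.optimizer)

    optimizer = HyperParameterOptimizer(
        base_task_id=base_task_id,          # the experiment to clone
        hyper_parameters=[                  # the inputs to vary
            UniformIntegerParameterRange("General/epochs", min_value=5, max_value=20),
            DiscreteParameterRange("General/batch_size", values=[32, 64, 128]),
        ],
        objective_metric_title="validation",        # the output to optimize:
        objective_metric_series="epoch_accuracy",   # the validation/epoch_accuracy scalar
        objective_metric_sign="max",                # maximize it
        optimizer_class=RandomSearch,
        execution_queue="default",          # agents listening here run the clones
        max_number_of_concurrent_tasks=10,
    )
    optimizer.start()
    optimizer.wait()
    optimizer.stop()
```
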

@@ -20,19 +20,19 @@ keywords: [mlops, components, automation, orchestration, pipeline]

<br/>

<Collapsible type="info" title="Video Transcript">

Hello and welcome to ClearML. In this video we'll take a look at how pipelines can be used as a way to easily automate and orchestrate multiple tasks.

Essentially, pipelines are a way to automate and orchestrate the execution of multiple tasks in a scalable way. Each task in the context of a ClearML pipeline is called a step or component, and it doesn't necessarily have to be an existing ClearML *task*, it can be any code.

A pipeline can be orchestrated using your own control logic. So you could say run task 2 only if task 1 was successful. But you can do more complex control logic too, like if the accuracy of the final model is not high enough, run the pipeline again with different parameters.

Pipelines are highly scalable too. Just like any object in the ClearML ecosystem, a pipeline is a task with inputs and outputs that you can clone just like any other. If you saw our video on HPO, this should ring a bell. It's completely doable to use hyperparameter optimization to optimize a complete pipeline and have all of the steps be run distributed on an auto-scaling cluster of agents. How is that not awesome?

Ok, but how do we make one? In ClearML there are 2 main ways.

One is you can easily chain existing ClearML tasks together to create a single pipeline. This means each step in the pipeline is a task that you tracked before using the experiment manager. On the other hand, you could go a little deeper and create pipelines straight from your codebase, which is what we'll focus on in this video. But don't worry, the end result is the same in both cases: a ClearML pipeline.

Let's say we have some functions that we already use to run ETL and another function that trains a model on the preprocessed data. We already have a main function too, that orchestrates when and how these other components should be run.

If we want to make this code into a pipeline, the first thing we have to do is to tell ClearML that these functions are supposed to become steps in our pipeline. We can do that by using a python decorator! For each function we want as a step, we can decorate it with `PipelineDecorator.component`.

@@ -40,9 +40,9 @@ The component call will fully automatically transform this function into a Clear

We can specify what values the function will return and these will become artifacts in the new task. This will allow the following tasks in the pipeline to easily access them.

We can also cache the function, which means that if the pipeline is rerun, but this function didn't change, we will not execute the function again, which is super handy when loading lots of data that takes a long time for example.

You can go quite far with configuring this component; you can even specify in which docker image this particular step should be executed when it's run by the agent. Check our documentation in the links below for a detailed overview of all the arguments.

The next thing we need is our control logic, the code that binds all other code together. In ClearML this is called a controller. We already have our control logic as code in our main function, so we can add a different decorator on here which is called: `pipeline`. The only arguments you need for the pipeline decorator is a name and a project just like any other task. Easy as pie.

@@ -52,7 +52,7 @@ An important note here is that only if a step uses the output of a previous step

At last, we can now run our pipeline! We can choose to run it locally which means both the controller and all the steps will be run as subprocesses on your local machine. This is great for debugging, but if we want the real scaling powers of our pipeline, we can execute it normally and the pipeline and tasks will be queued instead, so they can be executed by our remote agents. The pipeline task itself will be enqueued in a special `services` queue, so when setting up your agents for pipeline execution, take a look at the documentation first.

After running the pipeline, you can see both the controller task and the first step popping up in the experiment view. But it's easier to use the dedicated pipeline UI, which we can find on the left here.

Here, we can find our pipeline project which automatically keeps track of every run we do. If we click on our pipeline here, we can see a nice visual representation of our pipeline steps.

@@ -62,5 +62,5 @@ If we select a step from our pipeline, we can see much of the same details, but

But now comes the most powerful feature of all. Again, a pipeline controller is a task like any other, so… we can clone it like any other. Pressing the **+ New Run** button will allow us to do that from the UI! We can even change our global pipeline parameters here and, just like normal tasks, these will be injected into the original task and overwrite the original parameters. In this way, you can very quickly run many pipelines each with different parameters.
|
||||
|
||||
In the next video of this Getting Started series, we’ll get a long-overdue look at ClearML Data, our data versioning tool. In the meantime, slap some pipeline decorators on your own functions for free at [app.clear.ml](https://app.clear.ml), and don’t forget to join our [Slack channel](https://joinslack.clear.ml), if you need any help.
|
||||
In the next video of this Getting Started series, we'll get a long-overdue look at ClearML Data, our data versioning tool. In the meantime, slap some pipeline decorators on your own functions for free at [app.clear.ml](https://app.clear.ml), and don't forget to join our [Slack channel](https://joinslack.clear.ml), if you need any help.
</Collapsible>
@@ -20,23 +20,23 @@ keywords: [mlops, components, automation, orchestration, pipeline]
<br/>
<Collapsible type="info" title="Video Transcript">
Hello and welcome to ClearML. In this video we'll take a look at how pipelines can be created from tasks instead of from code like we saw in the last video.
The tasks themselves are already in the system by using the experiment manager. What's important to note here, though, is that hyperparameters, scalars, and artifacts should be reported correctly, because the pipeline will consider them to be the inputs and outputs of each step. That way, a step can easily access, for example, the artifacts from a previous step.
So with the tasks as our steps this time, we really only need to add our control logic. And since we don't have a main function like we had in the last video, we'll put our control logic in a dedicated `PipelineController` script instead. Let's start with a small example.
Our example pipeline will consist of three distinct tasks. The first task downloads some data and then uploads it to ClearML as an artifact.
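A sketch of such a first step might look like this (the project name, task name, and URL are all hypothetical; the ClearML import is kept inside the function so the sketch has no import-time side effects):

```python
def main():
    from clearml import Task, StorageManager

    task = Task.init(project_name="pipeline demo", task_name="step 1 - download data")
    # Fetch the raw data (illustrative URL) to a local cache...
    local_path = StorageManager.get_local_copy(
        remote_url="https://example.com/raw_data.csv"
    )
    # ...and register it as an artifact on this task,
    # so the next pipeline step can pick it up.
    task.upload_artifact(name="dataset", artifact_object=local_path)


if __name__ == "__main__":
    main()
```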
In a future video, I'll introduce you to ClearML Data which is actually our preferred way to handle data instead of uploading it as an artifact. So keep watching this getting started playlist if you want to know more.
The next task will preprocess that data. It has some hyperparameters here that configure the way the preprocessing is done. As you can see, the dataset `url` parameter is still empty. When the pipeline is run, these hyperparameters can be overwritten by the output of the previous step. We'll see how that's done a little later in the video. After the preprocessing, we'll upload the resulting training and test data as an artifact again.
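Sketched out, the preprocessing step could connect its hyperparameters like this (parameter names and values are illustrative); `dataset_url` is deliberately left empty so the controller can fill it in:

```python
# Defaults for the preprocessing step; `dataset_url` stays empty on purpose:
# the pipeline controller overwrites it with the previous step's artifact URL.
DEFAULT_PARAMS = {"dataset_url": "", "test_size": 0.2, "random_state": 42}


def main():
    # Imported here so the sketch has no import-time side effects.
    from clearml import Task

    task = Task.init(project_name="pipeline demo", task_name="step 2 - preprocess")
    # connect() registers the dict as hyperparameters; the returned values
    # may have been overridden (e.g. by the pipeline controller).
    params = task.connect(dict(DEFAULT_PARAMS))
    # ... load params["dataset_url"], split, preprocess, then e.g.:
    # task.upload_artifact("train_data", artifact_object=train_path)
    # task.upload_artifact("test_data", artifact_object=test_path)


if __name__ == "__main__":
    main()
```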
The final task will train a model on the preprocessed data by downloading the train and test artifacts from the previous step. Again, the actual parameter, the preprocessing task ID in this case, will be overwritten by the real ID when the pipeline is run. You can see in my experiment list that these 3 tasks are already logged.
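That final step could look roughly like this (names are hypothetical); `Task.get_task()` plus `artifacts[...].get_local_copy()` is how one task downloads another task's artifacts:

```python
def main():
    # Imported here so the sketch has no import-time side effects.
    from clearml import Task

    task = Task.init(project_name="pipeline demo", task_name="step 3 - train model")
    # Empty on purpose: the controller injects the real task ID at run time.
    params = task.connect({"preprocessing_task_id": ""})

    preprocess_task = Task.get_task(task_id=params["preprocessing_task_id"])
    train_path = preprocess_task.artifacts["train_data"].get_local_copy()
    test_path = preprocess_task.artifacts["test_data"].get_local_copy()
    # ... train on train_path, evaluate on test_path ...


if __name__ == "__main__":
    main()
```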
Now comes our control logic. Let's start by making a simple python script. We can create a `PipelineController` object and give it a name and a project; it will become visible in the experiment list under that name, because, just like anything in ClearML, the controller is just a task, albeit a special type of task in this case.
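A minimal sketch of that script (the name and project are illustrative; the import is kept inside the function so nothing runs on import):

```python
def make_pipeline():
    from clearml import PipelineController

    # The controller registers itself as a (special) task and shows up
    # in the experiment list under this name and project.
    pipe = PipelineController(
        name="my pipeline controller",
        project="pipeline demo",
        version="1.0.0",
    )
    return pipe
```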
Next, we can add some pipeline-level parameters. These can be easily accessed from within every step of the pipeline; they're basically global variables. In this case we'll add a parameter that tells the first step where to get the raw data from. This is very useful because, as we'll see later, we can easily rerun our pipeline with a different URL.
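Adding such a parameter is one call on the controller; as a self-contained sketch (the controller creation is repeated so the snippet stands alone, and the parameter name and default URL are illustrative):

```python
def make_pipeline():
    # Imported here so the sketch has no import-time side effects.
    from clearml import PipelineController

    pipe = PipelineController(
        name="my pipeline controller", project="pipeline demo", version="1.0.0"
    )
    # A pipeline-level parameter: effectively a global variable that every
    # step can reference, and that we can change on a later rerun.
    pipe.add_parameter(
        name="raw_data_url",
        description="where the first step fetches the raw data",
        default="https://example.com/raw_data.csv",
    )
    return pipe
```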
Now we can define our steps. Each step needs a name and some link to the original task. We can either give it the original task's ID or provide the task name and project, in which case the controller will use the most recent task with that name in that project.
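Both ways of linking a step to its template task might look like this (the task ID, projects, and names are hypothetical; again the snippet repeats the controller creation so it stands alone):

```python
def make_pipeline():
    # Imported here so the sketch has no import-time side effects.
    from clearml import PipelineController

    pipe = PipelineController(
        name="my pipeline controller", project="pipeline demo", version="1.0.0"
    )
    # Option 1: reference the template task by its unique ID.
    pipe.add_step(
        name="download_data",
        base_task_id="aabbccdd11223344aabbccdd11223344",  # hypothetical ID
    )
    # Option 2: reference it by project + name; the most recent task with
    # that name in that project becomes the template.
    pipe.add_step(
        name="preprocess_data",
        parents=["download_data"],
        base_task_project="pipeline demo",
        base_task_name="step 2 - preprocess",
    )
    return pipe
```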
@@ -48,7 +48,7 @@ Now we do the same for the final step. However, remember the empty hyperparamete
For example, we can tell the first step to use the global pipeline parameter `raw_data_url` like so. But we can also reference output artifacts from a previous step by using its name, and we can of course also just overwrite a parameter with a normal value. Finally, we can even pass along the unique task ID of a previous step, so you can get the task object based on that ID and access anything and everything within that task.
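As a sketch, all four kinds of overrides could be expressed like this (the step, section, and parameter names are illustrative; override keys follow the `<section>/<parameter>` form shown in a task's CONFIGURATION tab):

```python
# Each entry shows one kind of override:
PARAMETER_OVERRIDES = {
    # 1. a global pipeline parameter
    "General/raw_data_url": "${pipeline.raw_data_url}",
    # 2. an output artifact of a previous step, referenced by step name
    "General/dataset_url": "${download_data.artifacts.dataset.url}",
    # 3. a plain literal value
    "General/test_size": 0.2,
    # 4. the unique task ID of a previous step
    "General/preprocessing_task_id": "${preprocess_data.id}",
}


def add_training_step(pipe):
    # `pipe` is a configured PipelineController; template names are hypothetical.
    pipe.add_step(
        name="train_model",
        parents=["preprocess_data"],
        base_task_project="pipeline demo",
        base_task_name="step 3 - train model",
        parameter_override=PARAMETER_OVERRIDES,
    )
```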
And that's it! We now have our first pipeline!
Just like in the previous video, we can run the whole pipeline locally first, to debug our flow and make sure everything is working. If everything works as planned, we can then start it normally and everything will be enqueued instead. Your agents listening to the services queue will pick up the pipeline controller, clone the tasks that form your steps, override the necessary parameters and enqueue them into the `default` queue, for your other agents to start working on.
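The two launch modes differ only in the final call on the controller; a sketch of that switch (`pipe` is assumed to be a fully configured `PipelineController`):

```python
def launch(pipe, debug=False):
    if debug:
        # Controller and every step run as subprocesses on this machine.
        pipe.start_locally(run_pipeline_steps_locally=True)
    else:
        # The controller is enqueued into `services`; an agent picks it up,
        # clones each step's template task, applies the parameter overrides,
        # and enqueues the clones for your other agents.
        pipe.start(queue="services")
```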
@@ -60,5 +60,5 @@ When we select a specific step, we can see its inputs and outputs as well as its
Finally, we can also clone the whole pipeline and change its parameters by clicking on the **+ New Run** button. This is the most powerful feature of all, as it allows us to really quickly rerun the whole pipeline with different parameters from the UI. The agents will take care of the rest!
In the next video of this Getting Started series, we'll take a look at ClearML Data, for real this time. In the meantime, spin up some pipeline controllers yourself for free at [app.clear.ml](https://app.clear.ml) and don't forget to join our [Slack channel](https://joinslack.clear.ml), if you need any help.
</Collapsible>
@@ -20,29 +20,29 @@ keywords: [mlops, components, Autoscaler]
<br/>
<Collapsible type="info" title="Video Transcript">
Hello and welcome to ClearML. In this video we'll go a little more advanced and introduce autoscalers, the easiest way to build your very own flock of ClearML Agents.
Data science is inherently very inconsistent in its demand for compute resources. One moment you're just researching papers and need no compute at all, another moment you're making 16 GPUs scream and wishing you had more. Especially when running Hyperparameter Optimization or Pipelines, it can be very handy to have some extra hardware for a short time.
Remote machines are easy to get from any cloud provider, and you only pay for the time you use them… as long as you don't forget to shut them down after you're done. Seriously, I'm pretty sure at least 30% of GPU usage is people forgetting to shut down their remote machines.
Anyway, that's what an autoscaler takes care of for you: spinning up as many machines as you need, when you need them and automatically shutting them down again when you don't.
Once the autoscaler is deployed, you can just add experiments to a queue as we saw in the previous videos. Once experiments are detected in the queue, the autoscaler will automatically spin up new remote machines and turn them into ClearML agents that will run them for you. No fiddling with remote SSH, no docker containers, and no need to worry about shutting down: the autoscaler does it for you.
You can also get fancy with queues. Create as many of them as you want, and you can specify which type of remote machine should serve which queues. So imagine you have a CPU queue and a GPU queue, all you have to do is put your experiment in the right queue, and you know exactly what type of machine will be running it.
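For instance, moving a task into a specific queue from code is a one-liner with `Task.enqueue` (the queue name is illustrative, and the import is kept inside the function so nothing runs on import):

```python
def send_to_gpu_queue(task_id: str):
    from clearml import Task

    # The autoscaler configuration maps machine types to queues, so
    # enqueueing into "gpu_queue" (illustrative name) means a GPU
    # machine will pick this task up.
    Task.enqueue(task=task_id, queue_name="gpu_queue")
```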
Obviously, you can also configure a maximum budget by limiting the number of machines that can be spun up at one time, so you don't incur unexpected expenses.
Now that the theory is taken care of, let's take a look at how to set up an autoscaler on ClearML.
To launch the autoscaler, go to [app.clear.ml](https://app.clear.ml) and open the Applications page. There you'll find the autoscalers for each of the large cloud providers. Launching the autoscaler this way requires ClearML Pro, but it's cheap enough that forgetting to shut down a remote GPU machine for 3 days costs more than a year of ClearML Pro, so…
We'll go into the AWS wizard in this video, but the other autoscalers have a very similar setup. First come the credentials for your cloud provider of choice; make sure you assign the correct access rights, because the autoscaler will use these credentials to launch machines and shut them down again when they are idle.
Naturally, you want the agent to be able to run your original code, so we need to supply our git credentials as well. This works by using a git application token as your password; you can find out how to generate such a token in the description below.
If you're running from a notebook, don't worry! Even notebooks that were tracked using the experiment manager can be reproduced on the remote machine!
The last big, important setting is of course which kind of machines we want to spin up.