mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-26 18:17:44 +00:00
Small edits (#418)
This commit is contained in:
@@ -47,17 +47,17 @@ However, there’s also docker mode. In this case the agent will run every incom
Now that our configuration is ready, we can start our agent in docker mode by running the command `clearml-agent daemon --docker`
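For reference, the daemon is usually also told which queue to listen on; a minimal sketch (the queue name and Docker image below are illustrative placeholders, not mandated by this transcript):

```shell
# Start an agent listening on the "default" queue, running each task
# inside a Docker container (the image is just an example default).
clearml-agent daemon --queue default --docker nvidia/cuda:11.8.0-base-ubuntu22.04
```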
After running the command, we can see it pop up in our workers table. Now the agent will start listening for tasks in the `default` queue, and it’s ready to go!
Let's give our workers something to do. Say you have a task that you already ran on your local machine, and you tracked it using the 2 magic lines that we saw before. Just like in the last video, we can right-click it and clone it, so it’s now in draft mode. We can easily change some of the hyperparameters on-the-fly and *enqueue* the task.
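The same clone, edit, and enqueue flow can also be driven from the SDK; a minimal sketch, assuming a real task ID and a `General/learning_rate` hyperparameter exist in your project (both are placeholders here):

```python
from clearml import Task

# Fetch the original task (the ID is a placeholder).
original = Task.get_task(task_id="<your-task-id>")

# Clone it: the clone starts in draft mode, just like in the UI.
cloned = Task.clone(source_task=original, name="cloned experiment")

# Change a hyperparameter on the draft, then enqueue it for an agent.
cloned.set_parameter("General/learning_rate", 0.001)
Task.enqueue(cloned, queue_name="default")
```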
The agent will immediately detect that we enqueued a task and start working on it. Like we saw before, it will spin up a docker container, install the required packages and dependencies and run the code.
The task itself is reported to the experiment manager just like any other task, and you can browse its outputs like normal, albeit with the changed parameters we edited earlier during draft mode.
On the left we can see a button labeled “Workers and Queues”. Under the workers tab we can see that our worker is indeed busy with our task, and we can see its resource utilization as well. If we click on the current experiment, we end up in our experiment view again. Now, imagine we see in the scalar output that our model isn’t training the way we want it to, we can abort the task here and the agent will start working on the next task in the queue.
Back to our workers overview. Over in the Queues tab, we get some extra information about which experiments are currently in the queue, and we can even change their order by dragging them in the correct position like so. Finally, we have graphs of the overall waiting time and overall amount of enqueued tasks over time.
Talking of which, let’s say your wait times are very long because all data scientists have collectively decided that now is a perfect time to train their models and your on-premise servers are at capacity. We have built-in autoscalers for AWS and GCP (in the works) which will automatically spin up new `clearml-agent` VMs when the queue wait time becomes too long. If you go for the premium tiers of ClearML, you’ll even get a really nice dashboard to go along with it.
@@ -30,7 +30,7 @@ After running `pip install clearml` we can add 2 simple lines of python code to
The pip package also includes `clearml-data`. It can help you keep track of your ever-changing datasets and provides an easy way to store, track and version control your data. It’s also an easy way to share your dataset with colleagues over multiple machines while keeping track of who has which version. ClearML Data can even keep track of your data’s ancestry, making sure you can always figure out where specific parts of your data came from.
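The basic `clearml-data` workflow is command-line driven; a minimal sketch (the project name, dataset name, and folder are placeholders):

```shell
# Create a new dataset version, add a local folder, then upload and close it.
clearml-data create --project "My Project" --name "my dataset"
clearml-data add --files ./data
clearml-data close
```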
Both the 2 magic lines and the data tool will send all of their information to a ClearML server. This server then keeps an overview of your experiment runs and data sets over time, so you can always go back to a previous experiment, see how it was created and even recreate it exactly. Keep track of your best models by creating leaderboards based on your own metrics, and you can even directly compare multiple experiment runs, helping you to figure out the best way forward for your models.
To get started with a server right away, you can make use of the free tier. And when your needs grow, we’ve got you covered too! Just check out our website to find a tier that fits your organisation best. But, because we’re open source, you can also host your own completely for free. We have AWS images, Google Cloud images, you can run it on docker-compose locally or even, if you really hate yourself, run it on a self-hosted kubernetes cluster using our helm charts.
@@ -40,7 +40,7 @@ The `clearml-agent` is a daemon that you can run on 1 or multiple machines and t
Now that we have this remote execution capability, the possibilities are near endless.
For example, it’s easy to set up an agent on either a CPU or a GPU machine, so you can easily run all of your experiments on any compute resource you have available. And if you spin up your agents in the cloud, they’ll even support auto scaling out of the box.
You can set up multiple machines as agents to support large teams with their complex projects and easily configure a queuing system to get the most out of your available hardware.
@@ -48,7 +48,7 @@ Talking about using multiple machines, say you have an experiment and want to op
You can even use a Google Colab instance as a ClearML Agent to get free GPU power, just sayin!
As a final example of how you could use the agent's functionality, ClearML provides a `PipelineController`, which allows you to chain together tasks by plugging the output of one task as the input of another. Each of the tasks is of course run on your army of agents for full automation.
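A chain like that can be sketched with the `PipelineController` as follows; the project and task names are placeholders, and the steps are assumed to exist as tracked tasks already:

```python
from clearml import PipelineController

# A two-step pipeline: the training step runs only after data ingestion.
pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_step(name="get_data",
              base_task_project="examples", base_task_name="get data")
pipe.add_step(name="train", parents=["get_data"],
              base_task_project="examples", base_task_name="train model")

# Each step is enqueued for the agents to pick up.
pipe.start()
```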
As you can see ClearML is a large toolbox, stuffed with the most useful components for both data scientists and MLOps engineers. We’re diving deeper into each component in the following videos if you need more details, but feel free to get started now at clear.ml.
@@ -60,7 +60,7 @@ Next to automatic logging, it is super easy to manually add anything you want to
Just take a look at our documentation for more info.
If you want to show colleagues or friends how well your models are performing, you can easily share a task by right-clicking it and choosing share to make it accessible with a link. Anyone visiting that link will get the detail view in fullscreen mode and the task itself will get a tag showing that it’s now shared.
In many cases, we also want to compare multiple versions of our experiments directly. This is easily done by selecting the tasks you’re interested in and clicking on compare in the bottom ribbon.
@@ -48,7 +48,7 @@ I've collapsed a lot of the functions here so that it's a lot easier to take a l
One thing you'll notice as I'm going through these files is the `Task.init` command, and essentially this is what ClearML uses to keep track of every
time you run this specific script. So you'll see it in `get_data.py`, you'll see it in `preprocessing.py`, and you'll
see it in `training.py` as well. And so this line is all you need to get started. It will already start capturing
everything that you'll need and that the program produces like plots or hyperparameters, you name it.
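The `Task.init` line referred to throughout these scripts is a two-line setup; the project and task names below are placeholders you would choose yourself:

```python
from clearml import Task

# One call per script: ClearML now tracks every run of this file,
# capturing plots, hyperparameters, console output, and more.
task = Task.init(project_name="examples", task_name="preprocessing")
```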
So let's take a look in depth first at what `get_data.py` does for me. So getting data is very simple, but what I used
to do is I would get the data from a remote location: you download a zip file or whatever, and then you extract it
@@ -69,8 +69,8 @@ don't change the name, you overwrite it. so that's all the thing of the past. No
it to you later in the UI, we have a nice and clear overview of all of the different versions.
I'll add some dataset statistics. That's also something you can do in ClearML: just add some, for example, class
distribution or other kind of plots that could be interesting, and then I'm actually building the ClearML dataset here.
Also, an extra thing that is really, really useful if you use ClearML datasets is you can actually share it as well.
So not only with colleagues and friends, for example. You can share the data with them, and they can add to the data, and
you will always have the latest version, and you will always know what happened before that.
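Building a versioned dataset with attached statistics can be sketched from code like this; the project, dataset name, folder, and histogram values are all illustrative placeholders:

```python
from clearml import Dataset

# Create a new dataset version and attach the local files.
ds = Dataset.create(dataset_project="examples", dataset_name="urban sounds")
ds.add_files("./data")

# Optional: attach statistics, e.g. a (made-up) class distribution plot.
ds.get_logger().report_histogram(
    title="class distribution", series="train",
    values=[120, 80, 95], xlabels=["dog", "siren", "children"], iteration=0)

ds.upload()    # push the files to storage
ds.finalize()  # lock this version; colleagues can now fetch it by project/name
```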
@@ -199,7 +199,7 @@ learning something, it's doing something so that actually is very interesting.
And then you have debug samples as well, which you can use to show actually whatever kind of media you need. So these
are, for example, the images that I generated, the mel spectrograms that the preprocessing outputs, and you
can just show them here with the name of what the label was and what the prediction was. So I can just have a very quick
overview of how this is working, and then I can actually even do it with audio samples as well. So I can for example here
say this is labeled "dog", and it is predicted as "children playing". So then I can listen to it and get an idea on, is
this correct? Is it not correct? In this case, obviously it's not correct, but then I can go further into the iterations
and then hopefully it will get better and better over time. But this is a quick way that I can just validate that what
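Reporting image and audio debug samples like the ones described above goes through the task's logger; a sketch, assuming local `.png` and `.wav` files exist and with series names chosen only as a convention:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="audio classifier")
logger = task.get_logger()

# An image debug sample (e.g. a mel spectrogram) and an audio debug sample.
logger.report_image(title="spectrograms", series="label=dog",
                    iteration=1, local_path="spectrogram.png")
logger.report_media(title="samples", series="label=dog/pred=children_playing",
                    iteration=1, local_path="sample.wav")
```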
@@ -253,7 +253,7 @@ also use these differences to then go back to the original code.
Of course, hyperparameters. There weren't any differences. We didn't actually change any of the hyperparameters here,
but if we did, that would also be highlighted in red in this section. So if we're going to look at the scalars, this is
where it gets really interesting because now the plots are overlaid on top of each other, and you can change the color
if you don't like the color. I think green is a bit ugly. So let's take red, for example. We can just
change that here. And then we have a quick overview of two different compared experiments and then how their scalars did
over time. And because they have the same X-axis, the iterations, we can actually compare them immediately to each other,
@@ -311,7 +311,7 @@ us the full range of experiments that we trained this way on the full dataset, a
it got the most or the highest F1 score on the subset, we don't actually have the highest score on the full dataset yet.
However, even though it is not the best model, it might be interesting to get a colleague or a friend to take a look at
it and see what we could do better or just show off the new model that you made. So the last thing I want to show you is
that you can now easily click it, right-click, and then go to share, and you can share it publicly. If you create a
link, you can send this link to your friend, colleague, whatever, and they will be able to see the complete details of
the whole experiment, of everything you did. They can see the graphs, they can see the hyperparameters, and they can help
you find the best ways forward for your own models.
@@ -22,9 +22,9 @@ keywords: [mlops, components, hyperparameter optimization, hyperparameter]
<div className="cml-expansion-panel-content">
Hello and welcome to ClearML. In this video we’ll take a look at one cool way of using the agent other than rerunning a task remotely: hyperparameter optimization.
By now, we know that ClearML can easily capture our hyperparameters and scalars as part of the experiment tracking. We also know we can clone any task and change its hyperparameters, so they’ll be injected into the original code at runtime. In the last video, we learnt how to make a remote machine execute this task automatically by using the agent.
Soooo… Can we just clone a task like 100 times, inject different hyperparameters in every clone, run the clones on 10 agents and then sort the results based on a specific scalar?
Yeah, yeah we can, it's called hyperparameter optimization. And we can do all of this automatically too! No way you were going to clone and edit those 100 tasks yourself, right?
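Programmatically, this is what ClearML's `HyperParameterOptimizer` automates; a minimal sketch, where the base task ID, metric names, and parameter ranges are placeholders and random search is just one possible strategy:

```python
from clearml import Task
from clearml.automation import (HyperParameterOptimizer, RandomSearch,
                                UniformIntegerParameterRange,
                                UniformParameterRange)

task = Task.init(project_name="examples", task_name="HPO",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<task-to-optimize>",   # the task that gets cloned
    hyper_parameters=[
        UniformIntegerParameterRange("General/batch_size", 32, 128, step_size=32),
        UniformParameterRange("General/learning_rate", 1e-4, 1e-1),
    ],
    objective_metric_title="validation",  # the scalar used to sort results
    objective_metric_series="f1",
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
    execution_queue="default",            # clones run on your agents
    max_number_of_concurrent_tasks=10,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```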
@@ -47,7 +47,7 @@ The structure of your pipeline will be derived from looking at this `parents` ar
Now we do the same for the final step. However, remember the empty hyperparameters we saw before? We still have to overwrite these. We can use the `parameter_override` argument to do just that.
For example, we can tell the first step to use the global pipeline parameter raw data url like so. But we can also reference output artifacts from a previous step by using its name, and we can of course also just overwrite a parameter with a normal value. Finally, we can even pass along the unique task ID of a previous step, so you can get the task object based on that ID and access anything and everything within that task.
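The four kinds of overrides described above can be sketched on a single `add_step` call; the step, project, task, and parameter names are placeholders:

```python
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_parameter(name="raw_data_url", default="http://example.com/data.zip")
pipe.add_step(name="get_data",
              base_task_project="examples", base_task_name="get data")
pipe.add_step(
    name="preprocess",
    parents=["get_data"],
    base_task_project="examples",
    base_task_name="preprocess data",
    parameter_override={
        # a global pipeline parameter
        "General/raw_data_url": "${pipeline.raw_data_url}",
        # an output artifact of a previous step, referenced by name
        "General/dataset_path": "${get_data.artifacts.dataset.url}",
        # a plain value
        "General/batch_size": 64,
        # the unique task ID of a previous step
        "General/parent_task_id": "${get_data.id}",
    },
)
```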
And that’s it! We now have our first pipeline!
@@ -57,7 +57,7 @@ After filling in all these settings, let’s launch the autoscaler now, so we ca
We immediately start in the autoscaler dashboard, and we can see the number of machines that are running, the number that are doing nothing, how many machines we have available per queue, and all the autoscaler logs. Right now we have no machines running at all because our queues are empty.
So we go to one of our projects, clone these tasks here, and then enqueue them in the CPU queue, and we clone this task here as well. We can edit the parameters like we saw before and even change which container it should be run in. We then enqueue it in the GPU queue, and we should now see the autoscaler kicking into action.
The autoscaler has detected the tasks in the queue and has started booting up remote machines to process them. We can follow along with the process in our autoscaler dashboard.