From 7df37fe79ade8a080f0702dc71393f13367e8266 Mon Sep 17 00:00:00 2001
From: pollfly <75068813+pollfly@users.noreply.github.com>
Date: Tue, 24 Jan 2023 11:43:07 +0200
Subject: [PATCH] Edit video tutorial docs (#452)
---
.../agent_remote_execution_and_automation.md | 13 ++--
.../video_tutorials/clearml-data.md | 14 ++--
.../core_component_overview.md | 8 +-
.../experiment_management_best_practices.md | 11 +--
.../experiment_manager_hands-on.md | 13 +---
...how_clearml_is_used_by_a_data_scientist.md | 54 +++++++-------
...ow_clearml_is_used_by_an_mlops_engineer.md | 73 +++++++++----------
..._ci_cd_using_github_actions_and_clearml.md | 49 ++++++-------
.../hyperdatasets_data_versioning.md | 13 ++--
.../hyperparameter_optimization.md | 13 ++--
.../video_tutorials/pipelines_from_code.md | 10 +--
.../video_tutorials/pipelines_from_tasks.md | 10 +--
.../video_tutorials/quick_introduction.md | 8 +-
.../video_tutorials/the_clearml_autoscaler.md | 12 +--
14 files changed, 124 insertions(+), 177 deletions(-)
diff --git a/docs/getting_started/video_tutorials/agent_remote_execution_and_automation.md b/docs/getting_started/video_tutorials/agent_remote_execution_and_automation.md
index 7e4661f5..4e428d9b 100644
--- a/docs/getting_started/video_tutorials/agent_remote_execution_and_automation.md
+++ b/docs/getting_started/video_tutorials/agent_remote_execution_and_automation.md
@@ -17,9 +17,8 @@ keywords: [mlops, components, ClearML agent]
-
-Read the transcript
-
+### Video Transcript
+
Welcome to ClearML. In this video we’ll take a look at the ClearML Agent, which will allow you to run your tasks remotely and open the door for automating your workflows.
Remember our overview from the previous video? We talked about the pip package that allows us to run experiments and data management as well as the server, which stores everything we track. Today we add a third component: the ClearML Agent.
@@ -55,14 +54,12 @@ The agent will immediately detect that we enqueued a task and start working on i
The task itself is reported to the experiment manager just like any other task, and you can browse its outputs like normal, albeit with the changed parameters we edited earlier during draft mode.
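As a rough sketch of the clone-edit-enqueue flow described here, assuming the standard ClearML Python SDK (the project, task, queue, and parameter names are placeholders):

```python
from clearml import Task

# Grab the task we want to re-run and clone it; the clone starts out
# in draft mode, so its parameters can still be edited.
template = Task.get_task(project_name="examples", task_name="my training task")
draft = Task.clone(source_task=template, name="my training task (tuned)")

# Override a hyperparameter on the draft, then push it to a queue
# that a clearml-agent is listening to.
draft.set_parameter("General/learning_rate", 0.001)
Task.enqueue(task=draft, queue_name="default")
```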
-On the left we can see a button labeled “Workers and Queues”. Under the workers tab we can see that our worker is indeed busy with our task, and we can see its resource utilization as well. If we click on the current experiment, we end up in our experiment view again. Now, imagine we see in the scalar output that our model isn’t training the way we want it to, we can abort the task here and the agent will start working on the next task in the queue.
+On the left we can see a button labeled **Workers and Queues**. Under the **Workers** tab we can see that our worker is indeed busy with our task, and we can see its resource utilization as well. If we click on the current experiment, we end up in our experiment view again. Now, imagine we see in the scalar output that our model isn’t training the way we want it to, we can abort the task here and the agent will start working on the next task in the queue.
-Back to our workers overview. Over in the Queues tab, we get some extra information about which experiments are currently in the queue, and we can even change their order by dragging them in the correct position like so. Finally, we have graphs of the overall waiting time and overall amount of enqueued tasks over time.
+Back to our workers overview. Over in the **Queues** tab, we get some extra information about which experiments are currently in the queue, and we can even change their order by dragging them in the correct position like so. Finally, we have graphs of the overall waiting time and overall amount of enqueued tasks over time.
Talking of which, let’s say your wait times are very long because all data scientists have collectively decided that now is a perfect time to train their models and your on-premise servers are at capacity. We have built-in autoscalers for AWS and GCP (in the works) which will automatically spin up new `clearml-agent` VMs when the queue wait time becomes too long. If you go for the premium tiers of ClearML, you’ll even get a really nice dashboard to go along with it.
In the following video we’ll go a little deeper yet into this newly discovered automation thing we just saw and introduce things like automatic hyperparameter optimization and pipelines.
-But for now, feel free to start spinning up some agents on your own machines completely for free at app.clear.ml or by using our self-hosted server on GitHub, and don’t forget to join our Slack channel if you need any help.
-
-
+But for now, feel free to start spinning up some agents on your own machines completely for free at [app.clear.ml](https://app.clear.ml) or by using our self-hosted server on GitHub, and don’t forget to join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg) if you need any help.
diff --git a/docs/getting_started/video_tutorials/clearml-data.md b/docs/getting_started/video_tutorials/clearml-data.md
index e62038fb..d9110ffc 100644
--- a/docs/getting_started/video_tutorials/clearml-data.md
+++ b/docs/getting_started/video_tutorials/clearml-data.md
@@ -17,9 +17,7 @@ keywords: [mlops, components, ClearML data]
-
-Read the transcript
-
+### Video Transcript
Hello and welcome to ClearML. In this video we’ll take a look at both the command line and python interfaces of our data versioning tool called `clearml-data`.
@@ -47,9 +45,9 @@ A really useful thing we can do with the python interface is adding some interes
Finally, upload the dataset and then finalize it, or just set `auto_upload` to `true` to make it a one liner.
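A minimal sketch of that Python flow might look like the following (project name, dataset name, file paths, and histogram values are made up for illustration):

```python
from clearml import Dataset

# Create a new dataset version and add local files to it
dataset = Dataset.create(dataset_project="cakes", dataset_name="cake images")
dataset.add_files(path="data/raw_images")

# Attach some interesting info, e.g. a class-distribution histogram,
# so it shows up in the dataset's preview
dataset.get_logger().report_histogram(
    title="Class distribution",
    series="count",
    values=[37, 21, 44],
    xlabels=["chocolate", "cheese", "carrot"],
)

# Upload the files and close the version
# (or call dataset.finalize(auto_upload=True) to do both in one line)
dataset.upload()
dataset.finalize()
```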
-In the webUI, we can now see the details of our dataset version by clicking on the Dataset button on the left. When we click on our newly created dataset here, we get an overview of our latest version, of course we have only one for now.
+In the web UI, we can now see the details of our dataset version by clicking on the Dataset button on the left. When we click on our newly created dataset here, we get an overview of our latest version; of course, we have only one for now.
-At a glance you can see things like the dataset ID, its size, and which files have been changed in this particular version. If you click on details, you’ll get a list of those files in the content tab. Let’s make the view a little larger with this button, so it’s easier to see. When we switch to the preview tab, we can see the histogram we made before as well as an automatically generated preview of some of the files in our dataset version. Feel free to add anything you want in here! Finally, you can check out the original console logs that can be handy for debugging.
+At a glance you can see things like the dataset ID, its size, and which files have been changed in this particular version. If you click on details, you’ll get a list of those files in the **Content** tab. Let’s make the view a little larger with this button, so it’s easier to see. When we switch to the **Preview** tab, we can see the histogram we made before as well as an automatically generated preview of some of the files in our dataset version. Feel free to add anything you want in here! Finally, you can check out the original console logs that can be handy for debugging.
Now imagine we’re on a different machine. Maybe one from a team member, a classmate, or just one of your remote agents, and you want to get the dataset to do something cool with it.
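On that other machine, getting a local copy of the version is only a couple of lines; a sketch, assuming the same project and dataset names as above:

```python
from clearml import Dataset

# Fetch the latest version of the dataset by project and name
dataset = Dataset.get(dataset_project="cakes", dataset_name="cake images")

# Download (or reuse from the local cache) a read-only copy of its files
local_path = dataset.get_local_copy()
print("Dataset files are in:", local_path)

# Use get_mutable_local_copy() instead if the files need to be modified
```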
@@ -83,10 +81,8 @@ Now we can take a look again at the dataset UI. We’ll see our original dataset
When we click on our newest version in the lineage view, we can see that we indeed added 4 files and removed 3.
-If we now click on details again to look at the content, we can see that our chocolate cakes have been added correctly. You’ll also notice that when we go to the preview tab, we only see chocolate cakes. This is because a dataset version only stores the differences between itself and its parents. So in this case, only chocolate cakes were added.
+If we now click on details again to look at the content, we can see that our chocolate cakes have been added correctly. You’ll also notice that when we go to the **Preview** tab, we only see chocolate cakes. This is because a dataset version only stores the differences between itself and its parents. So in this case, only chocolate cakes were added.
In this video, we’ve covered the most important uses of ClearML Data, so hopefully you have a good intuition into what’s possible now and how valuable it can be. Building and updating your dataset versions from code is the best way to keep everything updated and make sure no data is ever lost. You’re highly encouraged to explore ways to automate as much of this process as possible, take a look at our documentation to find the full range of possibilities.
-So what are you waiting for? Start tracking your datasets with `clearml-data` and don’t forget to join our Slack channel if you need any help.
-
-
+So what are you waiting for? Start tracking your datasets with `clearml-data` and don’t forget to join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg) if you need any help.
diff --git a/docs/getting_started/video_tutorials/core_component_overview.md b/docs/getting_started/video_tutorials/core_component_overview.md
index c2f2a210..e6cd23e8 100644
--- a/docs/getting_started/video_tutorials/core_component_overview.md
+++ b/docs/getting_started/video_tutorials/core_component_overview.md
@@ -16,9 +16,7 @@ keywords: [mlops, components]
-
-Read the transcript
-
+### Video Transcript
Welcome to ClearML! This video will serve as an overview of the complete ClearML stack. We’ll introduce you to the most important concepts and show you how everything fits together, so you can deep dive into the next videos, which will cover the ClearML functionality in more detail.
@@ -40,7 +38,7 @@ The `clearml-agent` is a daemon that you can run on 1 or multiple machines and t
Now that we have this remote execution capability, the possibilities are near endless.
-For example, It’s easy to set up an agent on either a CPU or a GPU machine, so you can easily run all of your experiments on any compute resource you have available. And if you spin up your agents in the cloud, they’ll even support auto scaling out of the box.
+For example, It’s easy to set up an agent on either a CPU or a GPU machine, so you can easily run all of your experiments on any compute resource you have available. And if you spin up your agents in the cloud, they’ll even support autoscaling out of the box.
You can set up multiple machines as agents to support large teams with their complex projects and easily configure a queuing system to get the most out of your available hardware.
@@ -52,5 +50,3 @@ As a final example of how you could use the agent's functionality, ClearML provi
As you can see ClearML is a large toolbox, stuffed with the most useful components for both data scientists and MLOps engineers. We’re diving deeper into each component in the following videos if you need more details, but feel free to get started now at clear.ml.
-
-
diff --git a/docs/getting_started/video_tutorials/experiment_management_best_practices.md b/docs/getting_started/video_tutorials/experiment_management_best_practices.md
index 779f5dc3..f0d1ec0f 100644
--- a/docs/getting_started/video_tutorials/experiment_management_best_practices.md
+++ b/docs/getting_started/video_tutorials/experiment_management_best_practices.md
@@ -17,12 +17,8 @@ keywords: [mlops, components, Experiment Manager]
+### Video Transcript
-
-
-
-Read the transcript
-
Welcome to ClearML. In this video, we’ll go deeper into some of the best practices and advanced tricks you can use while working with ClearML experiment management.
The first thing to know is that the Task object is the central pillar of both the experiment manager and the orchestration and automation components. This means that if you manage the task well in the experiment phase, it will be much easier to scale to production later down the line.
@@ -69,6 +65,5 @@ And then we’re not even talking about all the ways to automate tasks using the
For the next videos we’ll finally cover automation and orchestration as well as ClearML Data, our data versioning tool.
-Feel free to check out and test all of these features at app.clear.ml, or using our self-hosted server on GitHub and don’t forget to join our Slack channel if you need any help.
-
-
+Feel free to check out and test all of these features at [app.clear.ml](https://app.clear.ml), or by using our self-hosted server on GitHub, and don’t forget to join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg) if you need any help.
+
diff --git a/docs/getting_started/video_tutorials/experiment_manager_hands-on.md b/docs/getting_started/video_tutorials/experiment_manager_hands-on.md
index ae42f849..febf7cea 100644
--- a/docs/getting_started/video_tutorials/experiment_manager_hands-on.md
+++ b/docs/getting_started/video_tutorials/experiment_manager_hands-on.md
@@ -17,12 +17,7 @@ keywords: [mlops, components, Experiment Manager]
-
-
-
-
-Read the transcript
-
+### Video Transcript
Welcome to ClearML! In this video, you’ll learn how to quickly get started with the experiment manager by adding 2 simple lines of Python code to your existing project.
@@ -34,7 +29,7 @@ The first thing to do is to install the `clearml` python package in our virtual
If you paid attention in the first video of this series, you’d remember that we need to connect to a ClearML Server to save all our tracked data. The server is where we saw the list of experiments earlier. This connection is what `clearml-init` will set up for us. When running the command it’ll ask for your server API credentials.
-To get those, go to your ClearML server webpage. If you’re using our hosted service, this will be at app.clear.ml. if you’re hosting your own, browse to your server's address at port 8080. Go to your settings on the top right and, under workspace, create new credentials. This will pop up a window with your API info, and you can just copy paste it into the `clearml-init` prompt.
+To get those, go to your ClearML server webpage. If you’re using our hosted service, this will be at [app.clear.ml](https://app.clear.ml). If you’re hosting your own, browse to your server's address at port 8080. Go to your settings on the top right and, under workspace, create new credentials. This will pop up a window with your API info, and you can just copy-paste it into the `clearml-init` prompt.
The prompt will suggest the server URLs that were in your copied snippet. If they are correct just press Enter, otherwise you can change them here.
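Once `clearml-init` has stored those credentials, the "2 simple lines" mentioned at the start of this video boil down to something like this (project and task names are placeholders):

```python
from clearml import Task

# These two lines are all that is needed to start tracking a run:
# ClearML captures the source code, installed packages, console output,
# hyperparameters, and framework outputs automatically.
task = Task.init(project_name="getting started", task_name="my first experiment")
```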
@@ -72,6 +67,4 @@ Scalars such as loss or accuracy will be plotted on the same axes which makes co
Finally, plots such as a confusion matrix and debug samples can be compared too. For those times when you just want to confirm that the new model is better with your own eyes.
-Now that you’re ready to start tracking and managing your experiments, we’ll cover some more advanced features and concepts of the experiment manager in the next video. But if you want to get started right now, head over to clear.ml and join our community Slack channel if you need any help.
-
-
+Now that you’re ready to start tracking and managing your experiments, we’ll cover some more advanced features and concepts of the experiment manager in the next video. But if you want to get started right now, head over to clear.ml and join our community [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg) if you need any help.
diff --git a/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_a_data_scientist.md b/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_a_data_scientist.md
index 45d5ee5f..7aef5ec7 100644
--- a/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_a_data_scientist.md
+++ b/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_a_data_scientist.md
@@ -17,9 +17,7 @@ keywords: [mlops, components, machine learning, data scientist]
-
-Read the transcript
-
+### Video Transcript
Welcome to ClearML! In this video, I'll try to walk you through a day in my life where I try to optimize a model, and
I'll be teaching you how I used to do it before I was working for ClearML, and then now that I'm using ClearML all the
@@ -28,12 +26,12 @@ time, what kind of problems it solved and what, how it made my life easier. So l
You can see the overview of the code, so I'm not going to dive into the code immediately, I'm just going to give you some
context, and then we'll go deeper from there.
-So the idea is that I'm doing audio classification here. I have a client who I want to give like a proof of concept on
+So the idea is that I'm doing audio classification here. I have a client who I want to give a proof of concept on
how well it can work, and I'm doing that on the UrbanSound dataset. So the first thing I'll do, and you'll see that
later is I'll get the data from the UrbanSound servers. I'm using a script called `get_data.py` for that, and then for
reasons I'll go further into in the video I'm actually putting all of that data into a ClearML dataset which is a special
-kind of dataset task or like a special kind of ClearML task that can keep track of your data. Then the `preprocessing.py`
-script will get that data and then convert the WAV files or like the audio files to spectrum images. Essentially you're
+kind of dataset task or a special kind of ClearML task that can keep track of your data. Then the `preprocessing.py`
+script will get that data and then convert the WAV files or the audio files to spectrum images. Essentially you're
turning all your data into image data because the models that do image data are actually very, very easy to work with and
are pretty good, so you can actually do the classification by classifying an image from your audio instead of classifying
your audio as a whole. Really cool stuff.
@@ -51,7 +49,7 @@ see it in `training.py` as well. And so this line is all you need to get started
everything that you'll need and that the program produces like plots or hyperparameters, you name it.
So let's take a look in depth first at what `get_data.py` does for me. So getting data is very simple, but what I used
-to do is I would get the data from like a remote location, You download a zip file or whatever, and then you extract it
+to do is I would get the data from a remote location. You download a zip file or whatever, and then you extract it
to your local folder, and then you start working on that. Now the problem with that is it's really difficult to keep
that thing clean. So, how would I version that right if I add data to it? For example, the preprocessed data we'll see
later. How can I keep my correct version? How did I? How do I know if the data changes over time? When did I do that?
@@ -68,8 +66,8 @@ folders and then probably different folders with different names as well for eve
don't change the name, you overwrite it. so that's all the thing of the past. Now we have nice and clear. I'll show
it to you later in the UI, we have a nice and clear overview of all of the different versions.
-I'll add some dataset statistics that's also something you can do and ClearML is just add some, for example, class
-distribution or other kind of plots that could be interesting, and then I'm actually building the ClearML dataset here.
+I'll add some dataset statistics. That's also something you can do in ClearML: just add some, for example, class
+distribution or other kinds of plots that could be interesting, and then I'm actually building the ClearML dataset here.
Also, an extra thing that is really, really useful if you use ClearML datasets is you can actually share it as well.
So not only with colleagues and friends, for example. You can share the data with them, and they can add to the data, and
always you will always have the latest version, you will always know what happened before that.
@@ -83,7 +81,7 @@ tries to solve. So that's what the dataset or the `get_data.py` does for you.
Then we have the `preprocessing.py` which is relatively simple. Essentially, what I'm doing is, I'll get the data from
the `get_data.py` So the previous dataset version. I'll get that data and then each line by line. So each, every, each
and every sample in that dataset will then be preprocessed using the preprocessing class, which will just calculate a
-mel spectrogram if you're into that kind of thing. but I won't go into depth in about it here. Essentially, we'll create
+mel spectrogram if you're into that kind of thing, but I won't go into depth about it here. Essentially, we'll create
a mel spectrogram for each sample that will give us an image, and then we take that image and put it into a different
dataset, which now has the same structure as the previous dataset, but now also with images there. And because the WAV
files, or the audio files, are already in the previous dataset, this new version will only upload the images that we
@@ -91,7 +89,7 @@ just produced. It won't duplicate the data because it knows it's already in a pr
instead. So that also saves a bit of disk space, if you're trying to put it on the cloud as well.
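A rough sketch of what that incremental version could look like in code (project, dataset names, and paths are illustrative, not the exact ones used in the video):

```python
from clearml import Dataset

# Look up the raw-audio version produced by get_data.py
parent = Dataset.get(dataset_project="urbansounds", dataset_name="raw audio")

# Create a child version on top of it: only the newly generated
# spectrogram images are uploaded, the WAV files stay referenced
# from the parent version instead of being duplicated.
child = Dataset.create(
    dataset_project="urbansounds",
    dataset_name="preprocessed spectrograms",
    parent_datasets=[parent.id],
)
child.add_files(path="data/spectrograms")
child.upload()
child.finalize()
```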
Now how I used to do this before ClearML is actually creating a new folder with a unique name for that specific run and
-then putting all of the images in there. But that's that's just a huge mess, right? We've all done this. But then you
+then putting all of the images in there. But that's just a huge mess, right? We've all done this. But then you
forget to change the name, and then you overwrite your previous samples. But you also don't know if you're just running
through it. You don't know what kind of code or like what the code was that created your previous versions right? So
they're not saved together which is a huge mess. It gets out of hand really quickly. You end up with a huge folder full
@@ -118,7 +116,7 @@ automatically captured by ClearML. So that's again something that I don't have t
saved together with my code together with those hyperparameters you just saw together with the output, which is really
handy.
-But then there is a last thing that I want to focus on. And that is the model files. So again, before I used ClearML,
+But then there is one last thing that I want to focus on. And that is the model files. So again, before I used ClearML,
the model files, I would essentially create one long name for the file with just underscores and all of the different
parameters in there, so that in every model file, I could easily see what the parameters were that I used over time to
create those model files. But that's just a huge mess because the amount of parameters that you use changes over time,
@@ -148,9 +146,9 @@ models or training runs that I did that failed or that I made a code mistake or
They just clutter the whole thing. You can do whatever you want of course, but this is my preference.
And then there is the option to sort as well. So in any of these columns you can just sort on that property and in this
-case I'm going to sort on name and then quick tip for you. If you use shift click you can sort secondarily on that
-column as well. So right now I'm sorted on name first and then all of the experiments that have the same name are
-secondarily sorted on their started time, which will give me the most recent one of each batch. And then you can see
+case I'm going to sort by name, and then a quick tip for you. If you use shift click you can sort secondarily on that
+column as well. So right now I'm sorted by name first and then all of the experiments that have the same name are
+secondarily sorted by their started time, which will give me the most recent one of each batch. And then you can see
here the last training that I did was 18 hours ago and if I scroll down a while ago I did some preprocessing. I did
some downloading of the data and that is also sorted quite nicely. You can also see on the tag that this is specifically
dataset, and then you can also see if I go to this dataset which is really cool.
@@ -194,7 +192,7 @@ iterations, I plot the confusion matrix again just so I can see over time how we
you can see here, a perfect confusion matrix will be a diagonal line because every true label will be combined with the
exact same predicted label. And in this case, it's horribly wrong. But then over time it starts getting closer and
closer to this diagonal shape that we're trying to get to. So this is showing me that at least in this point it's
-learning something, it's doing something so that actually is very interesting.
+learning something, it's doing something, so that actually is very interesting.
And then you have debug samples as well, which you can use to show actually whatever kind of media you need. So these
are for example, the images that I generated that are the mel spectrogram's so that the preprocessing outputs, and you
@@ -214,13 +212,13 @@ that, I can see the max F1 score of every single training run in this list, and
leaderboard essentially which will give me a nice overview of the best models that I have, and then I can just dive in
deeper to figure out why they were so good.
-So, But now it's time to actually start changing some stuff. So this is the beginning of my day. I've just gotten my
+But now it's time to actually start changing some stuff. So this is the beginning of my day. I've just gotten my
bearings, I know what the latest models were, and the score to beat here is 0.571429, and that's the F1 score we're
trying to beat on the subset and if the moment that we find a combination of parameters or a change of the code that
does better than this or that is in the general ballpark of this, we can actually then run it on the full dataset as
well. But I'll tell you more about that later.
-So the first thing we're going to do is going back to training the `training.py` script. I might want to change several
+So the first thing we're going to do is go back to training the `training.py` script. I might want to change several
parameters here, but what I've read something that I've been interested in and while getting the model, I see that here
I still use the optimizer stochastic gradient descent, and it could be really interesting to see how it compares if I
change this to atom. Now the atom optimizer is a really, really good one, so maybe it can give me an edge here. Of
@@ -230,10 +228,10 @@ all will be well. So you can see here that ClearML created a new task, and it's
using a specific dataset ID which you can find in the configuration dict. I set it to this dataset tag, use the latest
dataset using a subset tag so in that case it will get the latest data that is only in the subset. So that's what we're
training on here. You can see I have 102 samples in the training set, only seven in the test set. This is why it's
-subset, and now you can see that it's training in the app box and if I go to the experiment overview and I take a look
+subset, and now you can see that it's training in the app box, and if I go to the experiment overview and I take a look
at what is here, I can see that the training run, here, I'll sort it on started up front so that we have it up top.
I can see that the training run here is in status running, which means it's essentially reporting to ClearML which is
-what exactly what we want. And if I go to the Details tab, I can go to Results and see the console output being logged
+exactly what we want. And if I go to the **Details** tab, I can go to Results and see the console output being logged
here in real time and the causal output might not be this interesting to keep staring at, but what is interesting to
keep staring at is the scalars. So here you can see the F1 score and the training loss go up or down before your eyes
and that's really cool because then I can keep track of it or like have it as a dashboard somewhere just so that I know
@@ -243,10 +241,10 @@ into more in-depth analysis of this whole thing.
So right now we see that it's completed and if we go back to what we had before and I sort again by the F1 score, we see
that the newest training run that we just did two minutes ago, and it was updated a few seconds ago is actually better
than what we had before. So it seems that the atom optimizer in fact does have a big effect on what we're doing. And
-just to make sure that we didn't overlook anything, what I can do is I can select both my new model my new best model
+just to make sure that we didn't overlook anything, what I can do is I can select both my new model, my new best model,
and the previous model that I had and then compare them. So it's what I have right here and everything that you just saw
that was tracked, be it hyperparameters or plots or whatever, can all be compared between different training runs. So
-what we can see here if we click on execution, we have some uncommitted changes that are obviously different, and then
+what we can see here if we click on **Execution**, we have some uncommitted changes that are obviously different, and then
if we scroll down, what we can see is that for example, here the atom optimizer was added and the optimizer SGD was
removed. So this already gives us the idea of okay, this is what changed. This is really interesting and we can always
also use these differences to then go back to the original code.
@@ -256,7 +254,7 @@ but if we did, that would also be highlighted in red in this section. So if we'r
where it gets really interesting because now the plots are overlaid on top of each other, and you can change the color
if you don't if you don't like the color. I think green is a bit ugly. So let's take red for example. We can just
change that here. And then we have a quick overview of two different compared experiments and then how their scalars did
-over time. And because they have the same X-axis the iterations, we can actually compare them immediately to each other,
+over time. And because they have the same X-axis as the iterations, we can actually compare them immediately to each other,
which is really, really cool. We can actually even see how the GPU memory usage or the GPU utilization has fared over
time, which is really interesting. And then things come up like for example, in this case, the higher F1 score which is
in our case, the atom optimizer, had a higher loss as well, which is really interesting and we might want to take a look
@@ -288,10 +286,10 @@ these bars there and then say enqueue and what that will do is it will put that
stronger machine, a machine with a GPU as a ClearML Agent. And that agent is currently listening to a queue, and it will
grab experiments that are enqueued and start running them. And because we tracked everything, the code, the
hyperparameters, the original packages. The agent, the different machine, has no issues with completely reproducing my
-experiment, but just on a different dataset. And so now I can click on enqueue. I just want to enqueue it in the default
+experiment, but just on a different dataset. And so now I can click on **Enqueue**. I just want to enqueue it in the default
queue because my agent is listening to the default queue, and it is currently pending. As we can see here.
-If I now go to Workers and Queues, what you can see is that I have my available worker here, and it is currently running
+If I now go to **Workers and Queues**, what you can see is that I have my available worker here, and it is currently running
my training experiment. So if I click on the training experiment, it will get me back to the experiment that we just
edited. So if I go to the configuration we see that the batch sizes and the full dataset is right here. And it's
currently running, but it's not running on this machine. It's running on a different machine that is hosting the
@@ -300,7 +298,7 @@ the output of this agent is essentially right now, just showing that it's trying
had before. So now the agent is installing all the packages and is installing the whole environment to be able to run
your code without any issues on its own. And then we'll be able to follow along with the scalars as well and the plots
just as we would on any other task. How cool is that? That's awesome. So I'm going to let this run for a while, and
-we'll come back. We'll keep it on the scalars tab so that you can see the progress being made, and then we can see the
+we'll come back. We'll keep it on the **Scalars** tab so that you can see the progress being made, and then we can see the
whole loss and F1 score grow and go down over time, but on the full dataset this time, and then I'll come back when
it's done.
@@ -316,9 +314,7 @@ link, you can send this link to your friend, colleague, whatever, and they will
the whole experiment, of everything you did, you can see the graphs, they can see the hyperparameters, and I can help
you find the best ways forward for your own models.
-So I hope this kind of inspired you a little bit to try out ClearML. It's free to try at [app.clear.ml](https://app.clear.ml/),
+So I hope this kind of inspired you a little bit to try out ClearML. It's free to try at [app.clear.ml](https://app.clear.ml),
or you can even host your own open source server with the interface that you can see right now. So why not have a go at
it? And thank you for watching.
-
-
diff --git a/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_an_mlops_engineer.md b/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_an_mlops_engineer.md
index c457a69d..8742b7fa 100644
--- a/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_an_mlops_engineer.md
+++ b/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_an_mlops_engineer.md
@@ -17,24 +17,22 @@ keywords: [mlops, components, machine learning, mlops engineer]
-
-Read the transcript
-
+### Video Transcript
Hello again and welcome to ClearML. In this video we'll be going over a workflow of a potential MLOps Engineer. Now an
-MLOps Engineer is a vague term. This might be a specific person in your team that is doing only the Ops part of the
-MLOps Engineer is a vague term. This might be a specific person in your team that is doing only the Ops part of the
+MLOps Engineer is a vague term. This might be a specific person in your team that is doing only the Ops part of
machine learning. So the infrastructure and all of the workers and whatnot. Or it could be you as a data scientist. It
could be just the data scientist of the team that is most into things like docker and deployments. And that person now
has the job of a MLOps Engineer. So it really doesn't matter who you are. This video is going to be about what this kind
-of person will be doing and what ClearML can do to make the life of them a little easier. Just a little.
+of person will be doing and what ClearML can do to make their life a little easier. Just a little.
-So what we're going to do here is take a look or get started at the very least with our Workers and Queues tab. So if
+So what we're going to do here is take a look or get started at the very least with our **Workers and Queues** tab. So if
you've followed along with the Getting Started videos, this is actually something you've probably seen before. but I'm
going to go a little bit more into depth in this video.
-So the workers and queues tab, what does it do? So we have what we can expect. We have the workers tab, and we have the
-queues tab. Workers in ClearML are actually called agents. So you can see here that we have a bunch of available workers
-which are spun up by using the ClearML agent. I'll go more in depth in that in a minute. So we have a few available
+So the **Workers and Queues** tab, what does it do? So we have what we can expect. We have the **Workers** tab, and we have the
+**Queues** tab. Workers in ClearML are actually called agents. So you can see here that we have a bunch of available workers
+which are spun up by using the ClearML Agent. I'll go more in depth in that in a minute. So we have a few available
workers. We have Beast Zero, One, Two, and Three. I'm the person that called my own computer Beast. So my own computer
is running a few workers here. And then we also have Apps Agents, and I'll go a little bit more into detail what that
means later. Essentially, what it means is you have the applications right here and what that's going to do is give you
@@ -48,13 +46,13 @@ So we have example: Task 0 1, 2 and 3, programmer terms of course. We see the ex
iterations. In this case, it's a classical machine learning model, so we don't really have iterations, but if you have a
deep learning model, this is where your amount of iterations would come into play.
-If we click on any of these, we can see the worker name, and we can see its utilization over time in here as well. All
+If we click on any of these, we can see the worker name, and we can see its utilization over time here as well. All
right, so we can obviously make this longer. I've only been running this for a few hours or for an hour. So we have the
-worker name right here. We have the update time, so just to know that when was the last time that the worker actually
+worker name right here. We have the update time, so just to know when was the last time that the worker actually
sent in any new data. We have the current experiment on which we can click through, so I'll do that in a minute, and we
have the experiment runtime, and experiment iterations here as well.
-We also have the queues, which means that we can actually find out what queues this worker listening to. I should
+We also have the queues, which means that we can actually find out what queues this worker is listening to. I should
give some more context here. So if we go into the queues, ClearML works with a system of queues as well as workers. So
this actually comes from the fact that originally people were just giving out SSH keys to everyone to get them to work
on a remote system. And this is far far far from perfect, right? So you have several people SSHing in, you have several
@@ -62,7 +60,7 @@ people running their own workloads on the same GPUs. They have to share everythi
are also running their stuff on the GPU, you actually have no idea how long your own task will take, so that's something.
You can't have any priorities. So if everyone is just running their stuff and actually probably killing your stuff as
well because it's out of memory, because too many people are using it. So that's just a disaster, right? If you have a
-large GPU machine that you have to share with multiple people, or just even want to orchestrate several different tasks
+large GPU machine that you have to share with multiple people, or just want to orchestrate several different tasks
on, with different priorities, it becomes a real hassle. So that's actually why ClearML has workers and queues to try
and deal with that a little bit. And this is actually what we're calling orchestration. So if you look at our website,
you'll see orchestration and automation. Those terms might not mean very much. So this is what I'm going to be talking
@@ -77,8 +75,8 @@ worthless, right? But you can make however many of them you want. I can delete t
queue how many workers it has. So I'll show you that in a minute when we spin up a new worker. But we can actually
pretty easily see how many workers are serving a specific queue. So listening to that queue and that actually has an
effect on the overall waiting time. So for example, here we have four workers that we saw here before, right? So these
-are these four workers. They're all listening to the CPU queue. They're all running a CPU experiment. But then we have a
-bunch of other experiments in the queue still. So this is just a list of the next, like, essentially the order in which
+are these four workers. They're all listening to the CPU queue. They're all running a CPU experiment. But then we still
+have a bunch of other experiments in the queue. So this is just a list of the next, like, essentially the order in which
the next example tasks will be executed. So we see here that the next experiment is task four. We see that it was last
updated there, and now we see the queue experiment time rising rapidly. Because people are waiting in queue here, there
are many more tasks to be executed. We also have CPU queued experiments, which is just the amount of queued experiments
@@ -89,7 +87,7 @@ bit more into depth on that later because we don't actually have zero workers th
listening to this. Then we have the default queue, and we have the services queue and I should stop a little bit on the
services queue because the services queue is relatively special in ClearML. You have all of your custom queues that you
can create however you want CPU queue, GPU queue, whatever. The services queue is actually meant to host specific, not
-very heavy workloads, but that do this kind of orchestration that I was talking about. So imagine you have a pipeline
+very heavy workloads that do this kind of orchestration that I was talking about. So imagine you have a pipeline
for example, if you don't know what a pipeline does in ClearML, you can look at the Getting Started video that we made
on pipelines. And if you want to run a pipeline, you will need a pipeline controller. Essentially, it's a very small
piece of code that tells ClearML now you should run this step, then take that output, give it to the other step, run the
@@ -102,7 +100,7 @@ we can choose to just enqueue tasks or experiments that do simply that. So that'
when compared to other user-made queues.
We can see here that we have a bunch of workers, so we have the Beast 0, 1, 2 and 3 that are assigned to this services
-queue. But as we can see if we go and take a look at the CPU queue, we have a whole bunch of tasks here. So there is a
+queue. But as we can see if we go and take a look at the CPU queue, we have a whole bunch of tasks here. So there are a
lot of people waiting for their turn. So actually one thing that we can do is we can already change the priority. So
imagine we have example Person 19 that has a very, very tight deadline, and they actually need to be first. So we can
just drag them all the way up, let me scroll there, all the way up and now. There we go, all the way up top. So now we
@@ -123,7 +121,7 @@ bunch of output here. I can actually abort it as well. And if I abort it, what w
executing. Essentially, it will send a `ctrl c`, so a quit command or a terminate command, to the original task on the \
remote machine. So the remote machine will say okay, I'm done here. I will just quit it right here. If, for example,
your model is not performing very well, or you see like oh, something is definitely wrong here, you can always just
-abort it. And the cool thing is if we go back to the workers and queues, we'll see that the Beast 0 has given up working
+abort it. And the cool thing is if we go back to the **Workers and Queues** tab, we'll see that `Beast 0` has given up working
on task 0 and has now picked task 18 instead. Which is the task that we put in there in terms of priority. So this has the
next priority. Work has already started on task 18. So this is really, really cool.
@@ -144,7 +142,7 @@ hosted version. You can also host your own open source server if you so require.
that I have already done this of course, but in this case you should be able to just add your credentials from the
server, and it should connect no problem. If you want more information on that, we have tutorial videos on that as well.
And then the next thing we should do is `clearml-agent daemon --queue` Now we can decide which queues we want this
-ClearML agent to listen to. So in this case, if we go and take a look at queues, we have CPU queue, which is by far the
+ClearML agent to listen to. So in this case, if we go and take a look at queues, we have a CPU queue, which is by far the
most requested queue. So in this case, imagine we have an extra machine on hand in the faculty or in the company you're
working with, and you should just add this machine or its resources to the pool. So in this case we're just going to say
`CPU Queue` and that's everything. So we just want a simple extra machine for the CPU queue. Then we're going to
@@ -154,17 +152,17 @@ depending on either the default image that I give here or the image that is atta
data scientists that are creating their remote tasks or their experiments, they can also assign a docker file or a
docker image to that, that it should be running. So if you have very specific package requirements or very specific
needs, you can, as a data scientist, already say I want to attach this docker image to it, and it will be run like such
-on the ClearML agent. So that's that gives you a lot of power. But in this case I will just say if the data scientists
+on the ClearML agent. So that gives you a lot of power. But in this case I will just say if the data scientists
gave no indication of what docker container to use, just use Python 3.7. This is the standard in our company, let's say,
and this is what we want to use. Okay, so if I run this, it should start up a new ClearML agent. So in this case you can
see it's running in docker mode, it's using default image Python 3.7, and it's listening to the CPU queue. Now if we go
-back to our workers and queues tab. We can see that here `any-remote-machine:0`. So we can actually see that we now
+back to our **Workers and Queues** tab. We can see that here `any-remote-machine:0`. So we can actually see that we now
immediately have a new remote worker, and it's actually already started on the next task. So now we're currently running
five workers on the CPU queue instead of four. So this was very, very easy to handle, very, very easy to set up. So this
is one part of what an MLOps engineer could be doing.
Now this is very, very manual to set up and the MLOps engineer is king of the automation after all. So we want some kind
-of some kind of way to automate all of this, right? So what we can do here is go to applications. And what we have is AWS
+of way to automate all of this, right? So what we can do here is go to applications. And what we have is AWS
Autoscaler and GCP Autoscaler in essence. Also, Azure will come later so that will be out soon. So if we go into the
AWS Autoscaler. What we see here is we have an MLOps GPU scaler and what that means is, we don't always have fixed
demand for GPU resources, right? So imagine you have a company in this case that has a lot of demand for CPU compute
@@ -244,7 +242,7 @@ it will say `clearml WARNING - Terminating local execution process`, so in this
machine. Now if we're going to take a look at the remote machine, we can see that we have our Model Training GPU in
`pending` state and remember we had no workers at all in our GPU queue. We have zero workers and the next experiment is
our Model Training GPU. But remember again that we also have the autoscaler. So if I go to Applications and go to
-autoscaler, you'll see here that we have indeed one task in the GPU queue. And we also see that the `GPU_machines`
+autoscaler, you'll see here that we indeed have one task in the GPU queue. And we also see that the `GPU_machines`
Running Instances is one as well. So we can follow along with the logs here. And it actually detected that there is a
task in a GPU queue, and it's now spinning up a new machine, a new GPU machine to be running that specific task, and then
it will shut that back down again when it's done. So this is just one example of how you can use `task.execute_remotely`
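A minimal sketch of that pattern (the queue name is a placeholder):

```python
from clearml import Task

task = Task.init(project_name="mlops", task_name="model training gpu")

# Everything above this call runs (and is captured) locally.
# execute_remotely() then terminates the local run and enqueues the task,
# so an agent -- possibly a machine the autoscaler just spun up -- picks
# it up and re-runs the whole script remotely.
task.execute_remotely(queue_name="GPU Queue", exit_process=True)

# Code below this point only runs on the remote worker
```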
@@ -284,7 +282,7 @@ going to take a look, the query date here is a specific date, but that's not tod
want to do is rerun this, get data every single day or week or month depending on how quickly they can get their data
labeled.
-So this could be done manually relatively easily. You could just do every week, Click, go here, and it will just put a
+So this could be done manually relatively easily. You could just do every week, click, go here, and it will just put a
new entry in the experiment list, or you could of course automate it. And that's essentially what we're going to do with
the task scheduler. So you just get the task scheduler object from the automation module. You say the amount of sync
frequency. So this is essentially just when you change something in the configuration of the task scheduler, it will
@@ -297,7 +295,9 @@ can do. You could easily just clone a task. This is essentially what the task sc
Getting Started videos, you know that we can actually clone any of the experiments that are in the system and then
change the parameters and rerun it. So we could get the data or clone the `get data` and then do a task parameter
override of the query date with the current date today. That's very valid, but the NASA team actually made something
-really cool. If we go to pipelines here, you see that there is a NASA pipeline as well and the NASA pipeline is actually
+really cool.
+
+If we go to pipelines here, you see that there is a NASA pipeline as well and the NASA pipeline is actually
the exact steps that we saw before, but they train three different models with three different parameters and then pick
the best model from there. And what we see is that the query date is actually a parameter of the pipeline as well. And
if you remember correctly, pipelines are also tasks in ClearML, everything is a task, so that means that you can use
@@ -305,7 +305,9 @@ this the task scheduler also to schedule a complete pipeline run. And then overw
just as easily as you could do with any other task. So if I go into the full details of this task here, you will see
that this is actually the pipeline itself. The pipeline has just as any other task, these different tabs with info,
consoles, scalars, etc. and it has an ID as well. And this ID, if we copy it, we can actually use that instead. So let
-me paste it, it's already there. So the task to schedule is in fact the pipeline that we want to schedule. And then if I
+me paste it, it's already there.
+
+So the task to schedule is in fact the pipeline that we want to schedule. And then if I
do `scheduler.add_task`, I take the ID of the task to schedule, which is the pipeline in this case, I want to enqueue it
in the CPU queue. I want it to be at the hour 08:30 every Friday. So every week at 08:30, this will be run. So the
pipeline will be cloned and run using this parameter override. And the parameter override says essentially, take the
@@ -344,14 +346,14 @@ on the dataset of another data scientist in another project. So if we're going t
it's called Bob. So Bob is the other data scientist which is in charge of producing the dataset that is required.
So essentially, he uses the production tag to tell Alice this is the kind of dataset that you want. This is the dataset
that you want to use. This is the best so far. He has more recent datasets, but hasn't tagged them as production yet
-because he says they're not ready. So he can just keep continue to experiment, and do all the kind of things that he
+because he says they're not ready. So he can just keep experimenting, and do all the kinds of things that he
wants while Alice is still firmly using production. So what Alice is doing is she's essentially querying on the dataset
of production. But it's annoying because they're in a different time zone, for example. And when Bob publishes a new
dataset, Alice has to be notified by using a chat application or whatever. And then Alice has to re-run everything
remotely so that her training is using the latest version. So this is not ideal, we can automate all of this. And this
is essentially what the task trigger is trying to do.
-So again, we make a scheduler just like we did with task scheduler. Pooling frequency in minutes is again to poll the
+So again, we make a scheduler just like we did with task scheduler. Polling frequency in minutes is again to poll the
configuration as well as sync frequency minutes. We put again the scheduler itself. We put it in the Project MLOps.
We call it Alice-Bob Trigger instead of the scheduler before. And then we get the task that we want to actually trigger.
So not the task that triggers, but the task that we want to create when the trigger is triggered, if that makes sense. So
@@ -371,16 +373,16 @@ Now we have to add a trigger. And you can add a dataset trigger, you can add a t
trigger that you wish. In this case it will be a dataset trigger. If we have a different dataset, a new kind of dataset
that fits this description, we want to trigger this task. So essentially the scheduled task ID is the task that we want
to run if the trigger is triggered, which is in this case `training_task.id` is the Project Alice task, the Model Training
-task. We have the schedule queue, so we want to obviously schedule it in any of the queues. We can use CPU queue in
+task. We have the schedule queue, so we want to obviously schedule it in any of the queues. We can use the CPU queue in
this case, and then we can give it a name as well. And just to make it clear that this training is not actually training
from Alice herself, but it's training on the new data of Bob. It's an automated training. We can give it a specific name
so that Alice knows this was triggered automatically, and then we can use `trigger_on_tags` where we should look to
actually trigger the trigger. Damn this is a lot of trigger.
So what happens here is we look in the Project Bob folder and then if a new tag production is found that wasn't there
-before we trigger, and in this case, this means we create a new task project Alice. So if we're going to run this so
+before we trigger, and in this case, this means we create a new task in Project Alice. So if we're going to run this
automation, not task scheduler but task trigger, this will again create a new specific let's call it orchestration task
-or automation task. And these are kind of tasks that you want in the services queue. These are the tasks that
+or automation task. And these are the kind of tasks that you want in the services queue. These are the tasks that
essentially just keep track of, is there a new data set version or not and it's very light to do so. This is typically
something you want in the services queue.
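A rough sketch of what that trigger could look like, assuming the `clearml.automation` interfaces described above (IDs, project names, and queue names are placeholders):

```python
from clearml.automation import TriggerScheduler

# The scheduler object that keeps polling for changes
# (note: the SDK argument is spelled "pooling_frequency_minutes")
trigger = TriggerScheduler(pooling_frequency_minutes=3, sync_frequency_minutes=30)

# When a dataset in Bob's project gets the "production" tag,
# clone and enqueue Alice's training task on the CPU queue
trigger.add_dataset_trigger(
    name="Automatic: training on new production data",
    schedule_task_id="<alice_training_task_id>",  # placeholder task ID
    schedule_queue="CPU Queue",
    trigger_project="Project Bob",
    trigger_on_tags=["production"],
)

# The trigger logic itself is lightweight, so run it as a task
# in the services queue
trigger.start_remotely(queue="services")
```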
@@ -426,7 +428,7 @@ just add to your Slack if you want it, and it will essentially just tell you wha
are succeeded or failed. You can set, I want only alerts from this project, I want only alerts that are failed, only
alerts that are completed, will give you a bunch of output as well, which is really, really useful to see, etc. So this
is just a very small extra thing that you can take a look at to have some monitoring as well so that you don't even have
-to wait or take a look yourself at when your tasks finished.
+to wait or take a look yourself when your tasks are finished.
Another thing that I want to show you is a cool way to just get a little bit more of that juicy GPU power. One way you
can add agents next to Kubernetes spinning up themselves, spinning up a ClearML agent on your own machines or the auto
@@ -442,7 +444,7 @@ version than you, it's a bit annoying, you can't spin up a different container.
downside. So this is just a quick way. The actual notebook you can find on our GitHub. But this is just a really cool
way to get some extra GPU power as well.
-Now, all of these agents is one thing. You have the queues now, finally. Now, thank you for making it through this far.
+Now, all of these agents are one thing. You have the queues now, finally. Now, thank you for making it through this far.
We haven't actually even covered everything that ClearML can automate for you. There is HPO, which is hyperparameter
optimization. There are pipelines as well that can chain everything together. You saw a little bit when I showed you the
NASA project, but yeah, we're not there yet. There's also even a ClearML Session that you can use to run on a specific
@@ -450,7 +452,4 @@ machine, on a remote machine, and it will give you a remote interactive Jupyter
instance so that you can always code already on the remote machine. So that's also really, really cool. It's something
we're going to cover soon, but I think the video is already long enough. So thank you very, very much for watching.
Thank you very, very much for your attention. Let me know in the comments: if you want to see videos of these
-hyperparameters, and pipelines, and sessions and don't forget to join our Slack channel if you need any help.
-
-
-
+hyperparameters, and pipelines, and sessions, and don't forget to join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg) if you need any help.
diff --git a/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/ml_ci_cd_using_github_actions_and_clearml.md b/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/ml_ci_cd_using_github_actions_and_clearml.md
index d545599f..3183b583 100644
--- a/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/ml_ci_cd_using_github_actions_and_clearml.md
+++ b/docs/getting_started/video_tutorials/hands-on_mlops_tutorials/ml_ci_cd_using_github_actions_and_clearml.md
@@ -17,9 +17,7 @@ keywords: [mlops, components, GitHub Actions, CI/CD]
-
-Read the transcript
-
+### Video Transcript
Hello, welcome back to ClearML. My name is Victor, and in this video I'll be going through some CI/CD tips and tricks you
can do with ClearML. For this video, I'm going to assume that you already know about ClearML and CI/CD.
@@ -48,12 +46,12 @@ is remotely runnable right.
So those were the three jobs that I want to talk about in this video. Let's get started.
So as you can see, I have here my example project with me and there's a few things immediately apparent. So one is we
-have the `.github` folder with workflows. We're using GitHub actions in this specific video again you don't have to use
-GitHub actions if you don't want to. It's just as an example for General CI/CD stuff. Then we have a few scripts here,
+have the `.github` folder with workflows. We're using GitHub Actions in this specific video. Again, you don't have to use
+GitHub Actions if you don't want to. It's just an example for general CI/CD stuff. Then we have a few scripts here,
and we have our task as well.
Now, I'll start with the task because that's the thing we're going to run as the experiment you want to keep track of
-in your Git, and in ClearML, and in this case, we'll just take like a dummy task. We'll take a very, very simple example
+in your Git, and in ClearML, and in this case, we'll just take a dummy task. We'll take a very, very simple example
here, so we just do `from clearml import Task`. If you're familiar with ClearML this will be very familiar to you as
well. It's just the `Task.init`, give it a project, give it a name, and then I basically always set `reuse_last_task_id`
to `False`, which basically means that it will never overwrite the previous task if it didn't complete properly. It's more
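
A minimal sketch of the dummy task just described; the project name, task name, and reported scalar here are only illustrative:

```python
# Minimal sketch of the dummy task; names and the reported metric are illustrative.
from clearml import Task

task = Task.init(
    project_name="CICD example",     # assumption: any project name works here
    task_name="dummy training task",
    reuse_last_task_id=False,        # never reuse/overwrite a previous, unfinished task
)

# The rest of the script would do some "training" and report a scalar, e.g.:
task.get_logger().report_scalar(title="performance", series="accuracy", value=0.42, iteration=1)
```
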
@@ -76,10 +74,10 @@ if it has to do any kind of checks. In this case, we'll call it ClearML checks,
Now, most of the time that you're using ClearML, it's going to be interesting to do checks on a pull request because it
can take some time. It's machine learning after all, but it highly depends on what you want to do, of course. Now,
I'll be setting it to pull requests specifically to branch `main`. So if I want to do a pull request to my `main`
-branch, I will want those checks being fired, and then I wanted them to be added to several different actions there,
+branch, I will want those checks to be fired, and then I want them to be added to several different actions there,
specifically the edited and opened are the ones that I'm interested in. So, every time I open a PR, but also every
time I update a PR, like send a new commit to it, it will trigger. And then, what do we actually want to trigger? This is
-the meat of the story this is the jobs.
+the meat of the story; these are the jobs.
In this video, we're going to run three specific jobs. One is `task-stats-to-comment`, the other one is `compare-models`,
and the third one is `test-remote-runnable`.
@@ -89,14 +87,14 @@ trying to merge, and then add a comment on the PR with the different performance
kind of neat; you can easily see what the task is doing, how good it is, stuff like that. So that's what we're going
to do first.
-Now, how this is built up? I'll run down this and I will go into the code later in a second, but then to start with we
+Now, how is this built up? I'll run through it here and go into the code in a second, but to start with, we
have the environment variables. Now, to be sure that the GitHub Actions worker or the GitLab runner or whatever you're
going to run these actions on has access to ClearML, you have to give it the ClearML credentials. You can do that with
the environment variables `CLEARML_API_ACCESS_KEY` and `CLEARML_API_SECRET_KEY`; these are the keys you get when you
create new credentials in the main UI. In this case I'll get them from the secrets; I've added them to GitHub as a
-secret, and we can gather them from there. Same thing with the ClearML API host. in our case it will just be
-`app.clear.ml`, which is the free tier version pf ClearML. You also want a GitHub token because we want to actually
-add a comment to a PR, so we also need to GitHub token, which is very easy to generate. I'll put a link for that down
+secret, and we can gather them from there. Same thing with the ClearML API host. In our case it will just be
+`app.clear.ml`, which is the free tier version of ClearML. You also want a GitHub token because we want to actually
+add a comment to a PR. That token is very easy to generate; I'll put a link for that down
in the description. Then we also have the commit ID. So, specifically we want the pull request head SHA, which
is the latest commit in the pull request. We're going to do some things with that.
@@ -138,7 +136,7 @@ the task, there could be multiple, but again they're sorted on last update, so w
then if not `task[script.diff]`, basically if there aren't any uncommitted changes, we know the exact code that was
used there, then we can just return the task, and that's it.
-So now we have our task object. We know for sure that was run with the same code as was done in the PR, and we also know
+So now we have our task object. We know for sure that it was run with the same code as was done in the PR, and we also know
that it was completed successfully. So we want to add a tag, for example `main_branch`; then in your ClearML UI, you will be
able to see a `main_branch` tag there.
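
One possible shape for that lookup, as a hedged sketch rather than the exact script from the video; the `COMMIT_ID` environment variable, project name, and helper name are assumptions:

```python
# Hedged sketch: find the latest completed task that ran exactly the PR's commit,
# with no uncommitted changes, and tag it.
import os
from clearml import Task

def get_task_for_commit(project_name: str, commit_id: str) -> Task:
    # Sort so the most recently updated task comes first, only consider completed ones
    tasks = Task.get_tasks(
        project_name=project_name,
        task_filter={"order_by": ["-last_update"], "status": ["completed"]},
    )
    for task in tasks:
        script = task.data.script
        # Same commit and no uncommitted diff: we know exactly which code was run
        if script and script.version_num == commit_id and not script.diff:
            return task
    raise ValueError(f"No completed task found for commit {commit_id}")

if __name__ == "__main__":
    task = get_task_for_commit("CICD example", os.environ["COMMIT_ID"])
    task.add_tags(["main_branch"])  # shows up as a tag on the task in the ClearML UI
```
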
@@ -169,7 +167,7 @@ everything, install ClearML, and then run the task. Now no task based on this co
just changed the code, it has an uncommitted change, remember? So there is no task in ClearML yet with the change that
we just made. So in order to get that running, we have to go into the task, run this first with this new PR, and now we
actually get a new task right here with the exact commit in branch `video_example`, without any uncommitted changes,
-and if we now rerun our pipeline we should be good to go. So let me just go there it is almost done here. Yep, it's done
+and if we now rerun our pipeline we should be good to go. So let me just go there. It is almost done here. Yep, it's done
so this should now be completed. And if I go back to our tests here, we can see that some of them have failed, so let's
rerun the failed jobs. Now, in this case we should actually find a task in ClearML that has all our
code changes, and it should work just nicely.
@@ -179,15 +177,17 @@ which of the tasks you run, but `task_stats_to_comment` was successful, so this
request, we see our little checkbox here that all the checks worked out perfectly fine, and if I go in
here, you can see that the actual performance metric of series 1 is right there, so that's really, really cool. We
just changed it and there's already an example there. So that was actually the first one, `task_stats_to_comment`, which
-is really handy. You can just slap it on any task, and you'll always get the output there, if you add a new commit to
+is really handy. You can just slap it on any task, and you'll always get the output there. If you add a new commit to
your PR, you'll just get a new comment from these checks just to be sure that it's always up-to-date.
So let's get to the second part. So we had our `task_stats_to_comment`, what else might you want to do with GitHub CI/CD?
-Another thing you might want to do is compare models, basically compare the output of the model or like the last metric
+Another thing you might want to do is compare models, basically compare the output of the model or the last metric
that we just pulled from the current task, which is the task connected to the PR that we want to open, or that we've
just opened, and compare its performance to the performance of the best model before it. So we can always know that it's
-either equal or better performance than the last commit. So if we go to `compare-models` here, and we have our
-environments again, so this is all the same thing. We run again on Ubuntu 20.04, we check out the code we set up Python,
+either equal or better performance than the last commit.
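
Roughly, such a comparison could look like the sketch below; the metric title/series and the `best_model` tag are assumptions for illustration, not necessarily what the actual script uses:

```python
# Hedged sketch: compare the current task's last metric against the previous best,
# and only tag it if it is at least as good.
from clearml import Task

def compare_and_tag_task(current_task: Task, project_name: str) -> None:
    # Assumption: the previous best task carries a "best_model" tag
    previous_best = Task.get_tasks(
        project_name=project_name,
        task_filter={"tags": ["best_model"], "order_by": ["-last_update"]},
    )
    current_metric = current_task.get_last_scalar_metrics()["performance"]["accuracy"]["last"]
    if previous_best:
        best_metric = previous_best[0].get_last_scalar_metrics()["performance"]["accuracy"]["last"]
    else:
        best_metric = float("-inf")  # nothing to compare against yet

    if current_metric >= best_metric:
        current_task.add_tags(["best_model"])
    else:
        raise ValueError("New task performs worse than the previous best model")
```
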
+
+So if we go to `compare-models` here, we have our
+environments again, so this is all the same thing. We run again on Ubuntu 20.04, we check out the code, we set up Python,
we install our packages, and then we run `compare_models.py`. `compare_models.py` is very, very similar. It is very
simple. So here we print "running on Commit hash" which we get from the environment variable that we just gave to
GitHub, and then we run `compare_and_tag_task`. So what we want to do is basically compare and then if it's better, tag
@@ -258,7 +258,7 @@ nice because we don't want the timer to be triggered because it's waiting in the
to it, so we only want the timer to be started whenever it's actually being executed by a ClearML agent. So we've reset
the timer. At some point the task status will change from `queued` to anything else. If this task status is `failed` or
`stopped`, it means we did have an error, which is not ideal, and is exactly what we want to catch in this case, so
-we'll raise a value error saying "Task did not return correctly, check the logs in the web UI." You'll see probably in
+we'll raise a value error saying "Task did not return correctly, check the logs in the web UI." You'll probably see in
ClearML that the task will actually have failed, and then you can check and debug there. Also raising a value error
will actually fail the pipeline as well, which is exactly what we want. We don't want this PR to go through if the
pipeline fails because of a task that can't be run remotely; this is exactly what we want to catch.
@@ -267,9 +267,9 @@ But, if the task status is in progress, we go into a next loop, in which we say,
basically if we get only one iteration, it means that the whole setup process was successful, the model is training,
and we're good to go. So in that case, we just clean up, we've just checked everything is good, so we set the task as
`mark_stopped`, we set the task as `set_archived`, and we return `true`, which basically says get the task out of the
-way, it shouldn't be in the project anymore. We just checked everything works get it out of my sight.
+way, it shouldn't be in the project anymore. We just checked everything works. Get it out of my sight!
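
Putting those pieces together, the polling logic might look something like this sketch; the queue name, timeout, and sleep interval are assumptions, not the exact script from the video:

```python
# Hedged sketch of the "is it remotely runnable" check: clone, enqueue, and poll.
import time
from clearml import Task

def check_remote_runnable(task_id: str, queue: str = "default", timeout_minutes: int = 30) -> bool:
    task = Task.clone(source_task=task_id, name="remote runnable check")
    Task.enqueue(task, queue_name=queue)

    start = None  # only start the timer once an agent actually picks the task up
    while True:
        task.reload()
        status = str(task.get_status())
        if status == "queued":
            time.sleep(30)
            continue
        if start is None:
            start = time.time()  # reset the timer: execution has actually started
        if status in ("failed", "stopped"):
            raise ValueError("Task did not return correctly, check the logs in the web UI.")
        # One reported iteration means the whole setup worked and the model is training
        if task.get_last_iteration() and task.get_last_iteration() >= 1:
            task.mark_stopped()
            task.set_archived(True)  # we just checked everything works, get it out of sight
            return True
        if time.time() - start > timeout_minutes * 60:
            raise ValueError("Timed out waiting for the task to start iterating.")
        time.sleep(30)
```
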
-So that was the last of the three checks that I wanted to cover today. I hope you found this interesting I mean if we
+So that was the last of the three checks that I wanted to cover today. I hope you found this interesting. I mean, if we
go back to the PR here, it's really nice to see all of these checks coming back green. It's very easy to just use the
ClearML API and even a ClearML Task, for example, to launch stuff remotely. It's not that far-fetched either to just
think: why not use a ClearML agent as, for example, a test bed for GPU tests? So you could very easily add things to the
@@ -278,10 +278,7 @@ you could actually run tests that are supposed to be run on GPU machines this wa
or out-of-the-box allow you to run on GPU workers.
So it's just one of the very many ways that you can use ClearML to do
-these kind of things and I hope you learned something valuable today. All of the code that you saw in this example
-will be available in the link in the description, and if you need any help, join our Slack Channel, we're always there,
+these kinds of things and I hope you learned something valuable today. All of the code that you saw in this example
+will be available in the link in the description, and if you need any help, join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg), we're always there,
always happy to help and thank you for watching.
-
-
-
diff --git a/docs/getting_started/video_tutorials/hyperdatasets_data_versioning.md b/docs/getting_started/video_tutorials/hyperdatasets_data_versioning.md
index d9ddf52a..fd30bdde 100644
--- a/docs/getting_started/video_tutorials/hyperdatasets_data_versioning.md
+++ b/docs/getting_started/video_tutorials/hyperdatasets_data_versioning.md
@@ -17,9 +17,7 @@ keywords: [mlops, components, hyperdatasets]
-
-Read the transcript
-
+### Video Transcript
Hello and welcome to ClearML. In this video, we're taking a closer look at Hyper-Datasets, a supercharged version of ClearML Data.
@@ -39,7 +37,7 @@ Let’s take a look at an example that will show you how to use Hyper-Datasets t
When you open Hyper-Datasets to explore a dataset, you can find the version history of that dataset here. Datasets can have multiple versions, which in turn can have multiple child versions. Each of the child versions will inherit the contents of their parents.
-By default, a dataset version will be in draft mode, meaning it can still be modified. You can press the publish button to essentially lock it to make sure it will not change anymore. If you want to make changes to a published dataset version, make a new version that’s based on it.
+By default, a dataset version will be in draft mode, meaning it can still be modified. You can press the **Publish** button to essentially lock it to make sure it will not change anymore. If you want to make changes to a published dataset version, make a new version that’s based on it.
You’ll find automatically generated label statistics here that give you a quick overview of the label distribution in your dataset, as well as some version metadata and other version information.
@@ -57,7 +55,7 @@ As an example, imagine you have created an experiment that tries to train a mode
To get the data you need to train on, you can easily create a dataview from code like so. Then you can add all sorts of constraints, like class filters, metadata filters, and class weights, which will over- or undersample the data as required.
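
As a rough illustration only, creating such a dataview from code might look like this; the Hyper-Dataset API lives in the `allegroai` package, and the exact arguments (especially the sampling weight) may differ from this sketch:

```python
# Rough illustration of a dataview; dataset, version, and label names are illustrative.
from allegroai import DataView

dataview = DataView()

# Query a specific dataset version and filter frames by ROI label
dataview.add_query(dataset_name="Example Dataset", version_name="Current", roi_query="cat")
# A second query with a sampling weight to over-sample an under-represented class (assumption)
dataview.add_query(dataset_name="Example Dataset", version_name="Current", roi_query="dog", weight=2.0)

# Remap / enumerate labels on the fly; ClearML records this as part of the task
dataview.set_labels({"cat": 0, "dog": 1})

frames = dataview.to_list()  # or iterate lazily instead of materializing the list
```
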
-After running the task, we can see it in the experiment manager. The model is reporting scalars and training as we would expect. When using Hyper-Datasets, there is also a dataviews tab with all of the possibilities at your disposal. You can see which input datasets and versions that you used and can see the querying system that is used to subset them. This will already give you a nice, clean way to train your models on a very specific subset of the data, but there is more!
+After running the task, we can see it in the experiment manager. The model is reporting scalars and training as we would expect. When using Hyper-Datasets, there is also a **Dataviews** tab with all of the possibilities at your disposal. You can see which input datasets and versions that you used and can see the querying system that is used to subset them. This will already give you a nice, clean way to train your models on a very specific subset of the data, but there is more!
If you want to remap labels or enumerate them to integers on the fly, ClearML will keep track of all the transformations that are done and make sure they are reproducible. There is, of course, more still, so if you’re interested, check out our documentation on Hyper-Datasets.
@@ -73,6 +71,5 @@ After the remote machine has executed the experiment on the new dataview, we can
If you’ve been following along with the other Getting Started videos, you should already start to see the potential this approach can have. For example: we could now run hyperparameter optimization on the data itself, because all of the filters and settings previously shown are just parameters on a task. The whole process could be running in parallel on a cloud autoscaler for example. Imagine finding the best training data confidence threshold for each class to optimize the model performance.
-If you’re interested in using Hyper-Datasets for your team, then contact us using our website and we’ll get you going in no time. In the meantime, you can enjoy the power of the open source components at app.clear.ml, and don’t forget to join our Slack channel, if you need any help!
-
-
+If you’re interested in using Hyper-Datasets for your team, then contact us using our website, and we’ll get you going in no time. In the meantime, you can enjoy the power of the open source components at [app.clear.ml](https://app.clear.ml), and don’t forget to join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg), if you need any help!
+
diff --git a/docs/getting_started/video_tutorials/hyperparameter_optimization.md b/docs/getting_started/video_tutorials/hyperparameter_optimization.md
index b5f6c787..13b9f470 100644
--- a/docs/getting_started/video_tutorials/hyperparameter_optimization.md
+++ b/docs/getting_started/video_tutorials/hyperparameter_optimization.md
@@ -17,9 +17,8 @@ keywords: [mlops, components, hyperparameter optimization, hyperparameter]
-
-Read the transcript
-
+### Video Transcript
+
Hello and welcome to ClearML. In this video we’ll take a look at one cool way of using the agent other than rerunning a task remotely: hyperparameter optimization (HPO).
By now, we know that ClearML can easily capture our hyperparameters and scalars as part of the experiment tracking. We also know we can clone any task and change its hyperparameters, so they’ll be injected into the original code at runtime. In the last video, we learnt how to make a remote machine execute this task automatically by using the agent.
@@ -30,7 +29,7 @@ Yeah, yeah we can, it's called hyperparameter optimization. And we can do all of
If you don’t know what Hyperparameter Optimization is yet, you can find a link to our blog post on the topic in the description below. But in its most basic form, hyperparameter optimization tries to optimize a certain output by changing a set of inputs.
-Let’s say we’ve been working on this model here, and we were tracking our experiments with it anyway. We can see we have some hyperparameters to work with in the hyperparameters tab of the webUI. They are logged by using the `task.connect` function in our code. These are our inputs. We also have a scaler called `validation/epoch_accuracy`, that we want to get as high as possible. This is our output. We could also select to minimize the `epoch_loss` for example, that is something you can decide yourself.
+Let’s say we’ve been working on this model here, and we were tracking our experiments with it anyway. We can see we have some hyperparameters to work with in the **Hyperparameters** tab of the web UI. They are logged by using the `task.connect` function in our code. These are our inputs. We also have a scalar called `validation/epoch_accuracy` that we want to get as high as possible. This is our output. We could also select to minimize the `epoch_loss`, for example; that is something you can decide yourself.
We can see that no code was used to log the scalar. It's done automatically because we are using TensorBoard.
@@ -54,7 +53,7 @@ That’s it! With just a few lines of code, we can optimize a task. If we take a
And that’s really cool! Instead of inserting the HPO process in our original code, like you would do with most optimization libraries, we’ve now put it on top of it instead. So we can keep our code completely separate from the optimization process. Which, again, means we can optimize anything we want.
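
For reference, those few lines might look roughly like this; the hyperparameter names assume they were connected under the `General` section, and the template task ID, queue, and ranges are placeholders:

```python
# Sketch of an optimizer task sitting "on top of" an existing, tracked training task.
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange
from clearml.automation.optuna import OptimizerOptuna  # requires the optuna package

task = Task.init(project_name="HPO example", task_name="optimizer controller",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<template_task_id>",      # the task we were already tracking
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("General/batch_size", values=[32, 64, 128]),
    ],
    objective_metric_title="validation",
    objective_metric_series="epoch_accuracy",
    objective_metric_sign="max",            # we want accuracy as high as possible
    optimizer_class=OptimizerOptuna,
    execution_queue="default",              # agents listening here run the trials
    max_number_of_concurrent_tasks=2,
    total_max_jobs=10,
)

optimizer.start()
optimizer.wait()
optimizer.stop()
```
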
-We can now follow the progress of our optimization process by looking at the optimizer task under the plots section. Here we can see several interesting things happening.
+We can now follow the progress of our optimization process by looking at the optimizer task under the **Plots** section. Here we can see several interesting things happening.
Every point in this graph is a task, or a single run of your code using a specific hyperparameter configuration. It will give you a quick glimpse into how all tasks are performing.
@@ -66,6 +65,4 @@ As we saw earlier, if you’re a ClearML pro user, you can even launch your opti
And don’t forget about autoscaling! You can run it for free using code of course, but with ClearML Pro you can set it up in the UI as well. Which means that, starting from scratch, you can have an autoscaling cluster of cloud VMs running hyperparameter optimization on your experiment tasks in just a few minutes. How cool is that?
-In the next video, we’ll take a look at another example of automation goodness: pipelines. In the meantime, why not try and optimize one of your existing models for free at app.clear.ml, and don’t forget to join our Slack channel, if you need any help.
-
-
+In the next video, we’ll take a look at another example of automation goodness: pipelines. In the meantime, why not try and optimize one of your existing models for free at [app.clear.ml](https://app.clear.ml), and don’t forget to join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg), if you need any help.
diff --git a/docs/getting_started/video_tutorials/pipelines_from_code.md b/docs/getting_started/video_tutorials/pipelines_from_code.md
index cca8d434..39a13f59 100644
--- a/docs/getting_started/video_tutorials/pipelines_from_code.md
+++ b/docs/getting_started/video_tutorials/pipelines_from_code.md
@@ -17,9 +17,7 @@ keywords: [mlops, components, automation, orchestration, pipeline]
-
-Read the transcript
-
+### Video Transcript
Hello and welcome to ClearML. In this video we’ll take a look at how pipelines can be used as a way to easily automate and orchestrate multiple tasks.
@@ -61,8 +59,6 @@ When no step is selected, we can see our global pipeline info on the right. By c
If we select a step from our pipeline, we can see much of the same details, but this time for that specific step. On the right, we can see any inputs or outputs our step produced, and below, we can see the step’s console output as well as the original code.
-But now comes the most powerful feature of all. Again, a pipeline controller is a task like any other, so… we can clone it like any other. Pressing the "new run" button will allow us to do that from the UI! We can even change our global pipeline parameters here and, just like normal tasks, these will be injected into the original task and overwrite the original parameters. In this way, you can very quickly run many pipelines each with different parameters.
+But now comes the most powerful feature of all. Again, a pipeline controller is a task like any other, so… we can clone it like any other. Pressing the **+ New Run** button will allow us to do that from the UI! We can even change our global pipeline parameters here and, just like normal tasks, these will be injected into the original task and overwrite the original parameters. In this way, you can very quickly run many pipelines each with different parameters.
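
For context, a pipeline built from code with decorators might look roughly like this minimal sketch; the function bodies, names, and parameters are illustrative only:

```python
# Minimal sketch of a decorator-based pipeline; each component runs as its own task.
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], cache=True)
def load_data(url: str):
    data = list(range(10))  # stand-in for a real download / preprocessing step
    return data

@PipelineDecorator.component(return_values=["accuracy"])
def train(data, learning_rate: float):
    accuracy = 0.9  # stand-in for real training
    return accuracy

@PipelineDecorator.pipeline(name="example pipeline", project="pipeline examples", version="0.1")
def run_pipeline(url: str = "https://example.com/data.csv", learning_rate: float = 0.01):
    # These global pipeline parameters are what you can override from "+ New Run" in the UI
    data = load_data(url)
    accuracy = train(data, learning_rate)
    print(f"pipeline finished, accuracy={accuracy}")

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # quick local test; drop this to run on agents instead
    run_pipeline()
```
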
-In the next video of this Getting Started series, we’ll get a long-overdue look at ClearML Data, our data versioning tool. In the meantime, slap some pipeline decorators on your own functions for free at app.clear.ml, and don’t forget to join our Slack channel, if you need any help.
-
-
+In the next video of this Getting Started series, we’ll get a long-overdue look at ClearML Data, our data versioning tool. In the meantime, slap some pipeline decorators on your own functions for free at [app.clear.ml](https://app.clear.ml), and don’t forget to join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg), if you need any help.
diff --git a/docs/getting_started/video_tutorials/pipelines_from_tasks.md b/docs/getting_started/video_tutorials/pipelines_from_tasks.md
index 17dbacdf..f574b615 100644
--- a/docs/getting_started/video_tutorials/pipelines_from_tasks.md
+++ b/docs/getting_started/video_tutorials/pipelines_from_tasks.md
@@ -17,9 +17,7 @@ keywords: [mlops, components, automation, orchestration, pipeline]
-
-Read the transcript
-
+### Video Transcript
Hello and welcome to ClearML. In this video we’ll take a look at how pipelines can be created from tasks instead of from code like we saw in the last video.
@@ -59,8 +57,6 @@ Now we can do all the same things that we could with a pipeline built from code.
When we select a specific step, we can see its inputs and outputs as well as its logs down here and even the original code.
-Finally, we can also clone the whole pipeline and change its parameters by clicking on the new run button. This is the most powerful feature of all, as it allows us to really quickly rerun the whole pipeline with different parameters from the UI. The agents will take care of the rest!
+Finally, we can also clone the whole pipeline and change its parameters by clicking on the **+ New Run** button. This is the most powerful feature of all, as it allows us to really quickly rerun the whole pipeline with different parameters from the UI. The agents will take care of the rest!
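
For reference, building a pipeline from existing tasks might look roughly like this sketch; the project, task, and parameter names are illustrative:

```python
# Sketch of a pipeline controller that chains existing tasks together.
from clearml import PipelineController

pipe = PipelineController(name="example pipeline", project="pipeline examples", version="0.1")

# Global pipeline parameter that can be changed from the UI with "+ New Run"
pipe.add_parameter(name="learning_rate", default=0.01)

pipe.add_step(
    name="prepare_data",
    base_task_project="pipeline examples",
    base_task_name="data preparation",   # an existing task to clone as this step
)
pipe.add_step(
    name="train_model",
    parents=["prepare_data"],
    base_task_project="pipeline examples",
    base_task_name="model training",
    parameter_override={"General/learning_rate": "${pipeline.learning_rate}"},
)

# The controller itself runs on the services queue; the steps run on regular agents
pipe.start(queue="services")
```
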
-In the next video of this Getting Started series, we’ll take a look at ClearML Data, for real this time. In the meantime, spin up some pipeline controllers yourself for free at app.clear.ml and don’t forget to join our Slack channel, if you need any help.
-
-
+In the next video of this Getting Started series, we’ll take a look at ClearML Data, for real this time. In the meantime, spin up some pipeline controllers yourself for free at [app.clear.ml](https://app.clear.ml) and don’t forget to join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg), if you need any help.
diff --git a/docs/getting_started/video_tutorials/quick_introduction.md b/docs/getting_started/video_tutorials/quick_introduction.md
index a6ab248e..d6c2fdcb 100644
--- a/docs/getting_started/video_tutorials/quick_introduction.md
+++ b/docs/getting_started/video_tutorials/quick_introduction.md
@@ -17,9 +17,7 @@ keywords: [mlops, components, features, ClearML]
-
-Read the transcript
-
+### Video Transcript
ClearML is an open source MLOps platform.
@@ -33,7 +31,5 @@ It's essentially a toolbox stuffed with everything you'll need to go from experi
Doesn't matter if you're starting small or already in production, there's always a ClearML tool that can make your life easier.
-Start for free at app.clear.ml or host your own server from our GitHub page.
+Start for free at [app.clear.ml](https://app.clear.ml) or host your own server from our GitHub page.
-
-
diff --git a/docs/getting_started/video_tutorials/the_clearml_autoscaler.md b/docs/getting_started/video_tutorials/the_clearml_autoscaler.md
index 2d36cd85..ed0d35c8 100644
--- a/docs/getting_started/video_tutorials/the_clearml_autoscaler.md
+++ b/docs/getting_started/video_tutorials/the_clearml_autoscaler.md
@@ -17,9 +17,7 @@ keywords: [mlops, components, Autoscaler]
-
-Read the transcript
-
+### Video Transcript
Hello and welcome to ClearML. In this video we’ll go a little more advanced and introduce autoscalers, the easiest way to build your very own flock of ClearML Agents.
@@ -37,7 +35,7 @@ Obviously, you also configure a maximum budget by limiting the number of machine
Now that the theory is taken care of, let’s take a look at how to set up an autoscaler on ClearML.
-To launch the autoscaler, go to app.clear.ml and open the Applications page. There you’ll find the autoscalers for each of the large cloud providers. To launch the autoscaler this way requires ClearML Pro, but it’s cheap enough that forgetting to shut down a remote GPU machine for 3 days costs more than a year of ClearML Pro, so…
+To launch the autoscaler, go to [app.clear.ml](https://app.clear.ml) and open the Applications page. There you’ll find the autoscalers for each of the large cloud providers. To launch the autoscaler this way requires ClearML Pro, but it’s cheap enough that forgetting to shut down a remote GPU machine for 3 days costs more than a year of ClearML Pro, so…
We’ll go into the AWS wizard in this video, but the other autoscalers have a very similar setup. First are the credentials for your cloud provider of choice; make sure you assign the correct access rights, because the autoscaler will use these credentials to launch the machines and shut them down again when they are idle.
@@ -61,12 +59,10 @@ So if we go to one of our projects, clone these tasks here, and then enqueue the
The autoscaler has detected the tasks in the queue and has started booting up remote machines to process them. We can follow along with the process in our autoscaler dashboard.
-Once the machines are spinned up, the ClearML agents will register as available workers in the workers and queues tab. From here, they behave just like any other agent we’ve seen before.
+Once the machines are spun up, the ClearML agents will register as available workers in the **Workers and Queues** tab. From here, they behave just like any other agent we’ve seen before.
Finally, when everything is done and the remote machines are idle, they will be shut down automatically and the workers list will be empty again.
You can see that this functionality is very powerful when combined with, for example, hyperparameter optimization or pipelines that launch a lot of tasks at once. Obviously, it can be used as the primary way to get access to remote compute, but it can even be used as an extra layer on top of the machines you already have on-premise, to spill over in case of large demand spikes, for example. You don’t pay when you don’t use it, so there isn’t really a good reason not to have one running at all times.
-Get started right now for free at app.clear.ml and start spinning up remote machines with ClearML Pro if you want to save some money and effort by automating the boring stuff. If you run into any issues along the way, join our Slack channel, and we’ll help you out.
-
-
+Get started right now for free at [app.clear.ml](https://app.clear.ml) and start spinning up remote machines with ClearML Pro if you want to save some money and effort by automating the boring stuff. If you run into any issues along the way, join our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-1kvcxu5hf-SRH_rmmHdLL7l2WadRJTQg), and we’ll help you out.