Small edits (#595)
@@ -84,7 +84,7 @@ preprocess_task = Task.get_task(task_id='preprocessing_task_id')
 local_csv = preprocess_task.artifacts['data'].get_local_copy()
 ```
 
-The `task.artifacts` is a dictionary where the keys are the artifact names, and the returned object is the artifact object.
+`task.artifacts` is a dictionary where the keys are the artifact names, and the returned object is the artifact object.
 Calling `get_local_copy()` returns a local cached copy of the artifact. Therefore, next time we execute the code, we don't
 need to download the artifact again.
 Calling `get()` gets a deserialized pickled object.
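For reference, a minimal sketch of the artifact API this hunk documents (the task id is a placeholder):

```python
from clearml import Task

# fetch the task that produced the artifact (the id is a placeholder)
preprocess_task = Task.get_task(task_id='preprocessing_task_id')

# download the artifact file; the result is cached locally, so the next
# run does not download it again
local_csv = preprocess_task.artifacts['data'].get_local_copy()

# or retrieve the deserialized (pickled) object directly
data = preprocess_task.artifacts['data'].get()
```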
@@ -30,11 +30,11 @@ So let’s start with the inputs: hyperparameters. Hyperparameters are the confi
 
 Let’s take this simple code as an example. First of all, we start the script with the 2 magic lines of code that we covered before. Next to that we have a mix of command line arguments and some additional parameters in a dictionary here.
 
-The command line arguments will be captured automatically, and for the dict (or really any python object) we can use the `task.connect()` function, to report our dict values as ClearML hyperparameters.
+The command line arguments will be captured automatically, and for the dict (or really any python object) we can use the `Task.connect()` function, to report our dict values as ClearML hyperparameters.
 
 As you can see, when we run the script, all hyperparameters are captured and parsed by the server, giving you a clean overview in the UI.
 
-Configuration objects, however, work slightly differently and are mostly used for more complex configurations, like a nested dict or a yaml file for example. They’re logged by using the `task.connect_configuration()` function instead and will save the configuration as a whole, without parsing it.
+Configuration objects, however, work slightly differently and are mostly used for more complex configurations, like a nested dict or a yaml file for example. They’re logged by using the `Task.connect_configuration()` function instead and will save the configuration as a whole, without parsing it.
 
 We have now logged our task with all of its inputs, but if we wanted to, we could rerun our code with different parameters and this is where the magic happens.
 
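For reference, a minimal sketch of the hyperparameter logging this hunk describes (project/task names and values are placeholders):

```python
import argparse
from clearml import Task

# the "2 magic lines": the import plus Task.init()
task = Task.init(project_name='examples', task_name='hyperparameters demo')

# command line arguments are captured automatically
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=10)
args = parser.parse_args()

# a plain dict is reported as hyperparameters via connect();
# the returned dict reflects any values overridden from the UI
params = {'batch_size': 64, 'learning_rate': 1e-3}
params = task.connect(params)

# a nested config is stored as a whole object, without parsing
config = {'model': {'layers': [128, 64], 'dropout': 0.25}}
config = task.connect_configuration(config)
```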
@@ -188,7 +188,7 @@ your machine usage and your GPU usage and stuff like that, and then the learning
 give you a really, really quick overview of the most important metrics that you're trying to solve. And keep in mind
 this F1 score because this is the thing that we're trying to optimize here.
 
-Then plots. I can, for example, plot a confusion matrix every X iterations. So in this case ,for example, after a few
+Then plots. I can, for example, plot a confusion matrix every X iterations. So in this case, for example, after a few
 iterations, I plot the confusion matrix again just so I can see over time how well the model starts performing. So as
 you can see here, a perfect confusion matrix will be a diagonal line because every true label will be combined with the
 exact same predicted label. And in this case, it's horribly wrong. But then over time it starts getting closer and
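For reference, a hedged sketch of reporting a confusion matrix every X iterations with the ClearML logger (names, sizes, and the random matrix are placeholders):

```python
import numpy as np
from clearml import Task

task = Task.init(project_name='examples', task_name='confusion matrix demo')
logger = task.get_logger()

for iteration in range(0, 500, 100):
    # stand-in for a real confusion matrix computed on the validation set
    matrix = np.random.randint(0, 50, size=(10, 10))
    logger.report_confusion_matrix(
        title='Confusion Matrix',
        series='validation',
        matrix=matrix,
        iteration=iteration,
    )
```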
@@ -119,7 +119,7 @@ abort task 0. So one way of doing that would be to go to the current experiment,
 ClearML will actually bring me to the original experiment view, the experiment manager, remember everything is
 integrated here. The experiment manager of that example task. So what I can do here if I look at the console, I have a
 bunch of output here. I can actually abort it as well. And if I abort it, what will happen is this task will stop
-executing. Essentially, it will send a `ctrl c`, so a quit command or a terminate command, to the original task on the \
+executing. Essentially, it will send a `ctrl c`, so a quit command or a terminate command, to the original task on the
 remote machine. So the remote machine will say okay, I'm done here. I will just quit it right here. If, for example,
 your model is not performing very well, or you see like oh, something is definitely wrong here, you can always just
 abort it. And the cool thing is if we go back to the **Workers and Queues**, we'll see that the `Beast 0` has given up working
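The abort described here can also be triggered from code; a hedged sketch, assuming `mark_stopped()` has the same effect as the UI button (the task id is a placeholder):

```python
from clearml import Task

# look up the running experiment (the id is a placeholder)
running_task = Task.get_task(task_id='example_task_id')

# mark it stopped; the agent executing it should then terminate the
# process, much like pressing Abort in the UI (assumption)
running_task.mark_stopped()
```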
@@ -212,13 +212,13 @@ you would have made yourself, and now you want to get it into the queue. Now one
 you could do a `Task.init` which essentially tracks the run of your code as an experiment in the experiment manager, and
 then you could go and clone the experiment and then enqueue it. This is something that we saw in the Getting Started videos before.
 
-Now, another way of doing this is to actually use what you can see here, which is `task.execute_remotely`. What this line
+Now, another way of doing this is to actually use what you can see here, which is `Task.execute_remotely()`. What this line
 specifically will do, is when you run the file right here. Let me just do that real quick. So if we do
 `python setup/example_task_CPU.py` what will happen is ClearML will do the `Task.init` like it would always do, but then
-it would encounter the `task.execute_remotely` and what that will tell ClearML is say okay, take all of this code, take
+it would encounter the `Task.execute_remotely()` and what that will tell ClearML is say okay, take all of this code, take
 all of the packages that are installed, take all of the things that you would normally take as part of the experiment
 manager, but stop executing right here and then send the rest, send everything through to a ClearML agent or to the queue
-so that a ClearML agent can start working on it. So one way of doing this is to add a `task.execute_remotely` just all
+so that a ClearML agent can start working on it. So one way of doing this is to add a `Task.execute_remotely()` just all
 the way at the top and then once you run it, you will see here `clearml WARNING - Terminating local execution process`,
 and so if we're seeing here if we're going to take a look we can see that Model Training currently running, and if we go
 and take a look, at our queues here, we have `any-remote-machine` running Model Training right here. And if we go and
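For reference, a minimal sketch of the `Task.execute_remotely()` pattern this hunk describes (project, task, and queue names are placeholders):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='Model Training')

# everything above ran locally and was captured as part of the experiment;
# this line stops local execution and enqueues the task for a clearml-agent
task.execute_remotely(queue_name='default')

# from here on, the code only runs on the remote machine
print('training on the remote agent...')
```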
@@ -246,7 +246,7 @@ our Model Training GPU. But remember again that we also have the autoscaler. So
 autoscaler, you'll see here that we indeed have one task in the GPU queue. And we also see that the `GPU_machines`
 Running Instances is one as well. So we can follow along with the logs here. And it actually detected that there is a
 task in a GPU queue, and it's now spinning up a new machine, a new GPU machine to be running that specific task, and then
-it will shut that back down again when it's done. So this is just one example of how you can use `task.execute_remotely`
+it will shut that back down again when it's done. So this is just one example of how you can use `Task.execute_remotely()`
 to very efficiently get your tasks into the queue. Actually, it could also be the first time. So if you don't want to
 use the experiment manager for example, you don't actually have to use a task that is already in the system, you can
 just say it does not execute remotely, and it will just put it into the system for you and immediately launch it remotely.
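The clone-and-enqueue route mentioned in the previous hunk can also be scripted; a hedged sketch (the task id and queue name are placeholders):

```python
from clearml import Task

# clone an existing experiment and put the copy into a queue
template = Task.get_task(task_id='existing_experiment_id')
cloned = Task.clone(source_task=template, name='Model Training (clone)')
Task.enqueue(task=cloned, queue_name='GPU_machines')
```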
@@ -146,7 +146,7 @@ there and open it up, we first get the status of the task, just to be sure. Reme
 something else might have happened in the meantime. If the status is not `completed`, we want to say this is the
 status, it isn't completed this should not happen but. If it is completed, we are going to create a table with these
 functions that I won't go deeper into. Basically, they format the dictionary of the state of the task scalars into
-markdown that we can actually use. Let me just go into this though one quick time. So we can basically do `task.get_last_scalar_metrics`,
+markdown that we can actually use. Let me just go into this though one quick time. So we can basically do `Task.get_last_scalar_metrics()`,
 and this function is built into ClearML, which basically gives you a dictionary with all the metrics on your task.
 We'll just get that formatted into a table, make it into a pandas DataFrame, and then tabulate it with this cool package
 that turns it into MarkDown. So now that we have marked down in the table, we then want to return results table. You can
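For reference, a hedged sketch of turning the task scalars into a markdown table (the task id is a placeholder, and the exact shape of the returned dictionary is an assumption):

```python
import pandas as pd
from tabulate import tabulate
from clearml import Task

task = Task.get_task(task_id='training_task_id')  # placeholder id
if task.get_status() != 'completed':
    raise RuntimeError(f'unexpected status: {task.get_status()}')

# assumed shape: {metric_title: {series: {'last': ..., 'min': ..., 'max': ...}}}
metrics = task.get_last_scalar_metrics()
rows = [
    {'metric': title, 'series': series, **values}
    for title, series_dict in metrics.items()
    for series, values in series_dict.items()
]
results_table = tabulate(pd.DataFrame(rows), headers='keys',
                         tablefmt='github', showindex=False)
print(results_table)
```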
@@ -30,7 +30,7 @@ Yeah, yeah we can, it's called hyperparameter optimization. And we can do all of
 
 If you don’t know what Hyperparameter Optimization is yet, you can find a link to our blog post on the topic in the description below. But in its most basic form, hyperparameter optimization tries to optimize a certain output by changing a set of inputs.
 
-Let’s say we’ve been working on this model here, and we were tracking our experiments with it anyway. We can see we have some hyperparameters to work with in the **Hyperparameters** tab of the web UI. They are logged by using the `task.connect` function in our code. These are our inputs. We also have a scaler called `validation/epoch_accuracy`, that we want to get as high as possible. This is our output. We could also select to minimize the `epoch_loss` for example, that is something you can decide yourself.
+Let’s say we’ve been working on this model here, and we were tracking our experiments with it anyway. We can see we have some hyperparameters to work with in the **Hyperparameters** tab of the web UI. They are logged by using the `Task.connect` function in our code. These are our inputs. We also have a scaler called `validation/epoch_accuracy`, that we want to get as high as possible. This is our output. We could also select to minimize the `epoch_loss` for example, that is something you can decide yourself.
 
 We can see that no code was used to log the scalar. It's done automatically because we are using TensorBoard.
 
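For reference, a hedged sketch of setting up that optimization with the ClearML SDK (the base task id, parameter name, and queue are placeholders):

```python
from clearml.automation import (
    HyperParameterOptimizer, RandomSearch, UniformParameterRange,
)

optimizer = HyperParameterOptimizer(
    base_task_id='experiment_to_optimize_id',  # placeholder id
    hyper_parameters=[
        UniformParameterRange('General/learning_rate',
                              min_value=1e-4, max_value=1e-1),
    ],
    # maximize the scalar mentioned above; use 'min' with epoch_loss instead
    objective_metric_title='validation',
    objective_metric_series='epoch_accuracy',
    objective_metric_sign='max',
    optimizer_class=RandomSearch,
    execution_queue='default',
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```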