mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-26 18:17:44 +00:00
@@ -36,13 +36,13 @@ The most important difference is that you’ll also be asked for your git inform
Before we run the agent though, let's take a quick look at what will happen when we spin it up.
Our server hosts one or more queues in which we can put our tasks. And then we have our agent. By default, it will be running in pip mode, or virtual environment mode. Once an agent pulls a new task from the queue to be executed, it will create a new Python virtual environment for it. It will then clone the code itself and install all required Python packages in the new virtual environment. It then runs the code and injects any new hyperparameters we changed in the UI.
PIP mode is really handy and efficient. It will create a new Python virtual environment for every task it pulls and will use smart caching so packages or even whole environments can be reused over multiple tasks.
You can also run the agent in conda mode or poetry mode, which essentially do the same thing as pip mode, only with a conda or poetry environment instead.
However, there’s also docker mode. In this case the agent will run every incoming task in its own docker container instead of just a virtual environment. This makes things much easier if your tasks have system package dependencies for example, or when not every task uses the same Python version. For our example, we’ll be using docker mode.
Now that our configuration is ready, we can start our agent in docker mode by running the command `clearml-agent daemon --docker`.
@@ -20,13 +20,13 @@ keywords: [mlops, components, ClearML data]
<br/>
<Collapsible type="info" title="Video Transcript">
Hello and welcome to ClearML. In this video we'll take a look at both the command line and Python interfaces of our data versioning tool called `clearml-data`.
In the world of machine learning, you are very likely dealing with large amounts of data that you need to put into a dataset. ClearML Data solves 2 important challenges that occur in this situation:
One is accessibility, making sure the data can be accessed from every machine you use. And two is versioning, linking which dataset version was used in which task. This helps to make experiments more reproducible. Moreover, versioning systems like git were never really designed for the size and number of files in machine learning datasets. We're going to need something else.
ClearML Data comes built-in with the `clearml` Python package and has both a command line interface for easy and quick operations and a Python interface if you want more flexibility. Both interfaces are quite similar, so we'll address both of them in the video.
Let's start with an example. Say I have some files here that I want to put into a dataset and start to keep track of.
@@ -36,13 +36,13 @@ We can do that by using the `clearml-data add` command and providing the path to
Now we need to tell the server that we're done here. We can call `clearml-data close` to upload the files and change the dataset status to done, which finalizes this version of the dataset.
The process of doing this with the Python interface is very similar.
You can create a new Dataset by importing the Dataset object from the `clearml` pip package and calling its `create` method. Now we have to give the dataset a name and a project just like with the command line tool. The create method returns a dataset instance which we will use to do all of our operations on.
To add some files to this newly created dataset version, call the `add_files` method on the dataset object and provide a path to a local file or folder. Bear in mind that nothing is uploaded just yet, we're simply instructing the dataset object what it should do when we eventually *do* want to upload.
A really useful thing we can do with the Python interface is adding some interesting statistics about the dataset itself, such as a plot for example. Here we simply report a histogram of the number of files in the train and test folders. You can add anything to a dataset that you can add to a ClearML task, so go nuts!
Finally, upload the dataset and then finalize it, or just set `auto_upload=True` to make it a one-liner.
@@ -56,7 +56,7 @@ Using the command line tool, you can download a dataset version locally by using
That path will be a local cached folder, which means that if you try to get the same dataset again, or any other dataset that's based on this one, it will check which files are already on your system, and it will not download these again.
The Python interface is similar, with one major difference. You can also get a dataset using any combination of name, project, ID, or tags, but _getting_ the dataset does not mean it is downloaded; we simply get all of the metadata, which we can now access from the dataset object. This is important, as it means you don't have to download the dataset to make changes to it, or to add files. More on that in just a moment.
If you do want to download a local copy of the dataset, it has to be done explicitly, by calling `get_local_copy` which will return the path to which the data was downloaded for you.
@@ -70,7 +70,7 @@ Let's say we found an issue with the hamburgers here, so we remove them from the
Now we can tell ClearML that the changes we made to this folder should become a new version of the previous dataset. We start by creating a new dataset just like we saw before, but now, we add the previous dataset ID as a parent. This tells ClearML that this new dataset version we're creating is based on the previous one and so our dataset object here will already contain all the files that the parent contained.
Now we can manually remove and add the files that we want, even without actually downloading the dataset. It will just change the metadata inside the Python object and sync everything when it's finalized.
That said, we do have a local copy of the dataset in this case, so we have a better option.
@@ -25,7 +25,7 @@ ClearML is designed to get you up and running in less than 10 minutes and 2 magi
At the heart of ClearML lies the experiment manager. It consists of the `clearml` pip package and the ClearML Server.
After running `pip install clearml` we can add 2 simple lines of Python code to your existing codebase. These 2 lines will capture all the output that your code produces: logs, source code, hyperparameters, plots, images, you name it.
The pip package also includes `clearml-data`. It can help you keep track of your ever-changing datasets and provides an easy way to store, track and version control your data. It's also an easy way to share your dataset with colleagues over multiple machines while keeping track of who has which version. ClearML Data can even keep track of your data's ancestry, making sure you can always figure out where specific parts of your data came from.
@@ -26,7 +26,7 @@ This is the experiment manager's UI, and every row you can see here, is a single
We’re currently in our project folder. As you can see, we have our very basic toy example here that we want to keep track of by using ClearML’s experiment manager.
The first thing to do is to install the `clearml` Python package in our virtual environment. Installing the package itself will add 3 commands for you. We’ll cover the `clearml-data` and `clearml-task` commands later. For now the one we need is `clearml-init`.
If you paid attention in the first video of this series, you’d remember that we need to connect to a ClearML Server to save all our tracked data. The server is where we saw the list of experiments earlier. This connection is what `clearml-init` will set up for us. When running the command it’ll ask for your server API credentials.
@@ -36,7 +36,7 @@ We can see that no code was used to log the scalar. It's done automatically beca
We are using a training script as our task in our example here, but the optimizer doesn’t actually care what’s in our task, it just wants inputs and outputs. So you can optimize basically anything you want.
The only thing we have to do to start optimizing this model is to write a small Python file detailing what exactly we want our optimizer to do.
When you’re a ClearML Pro user, you can just start the optimizer straight from the UI, but more on that later.
@@ -34,7 +34,7 @@ One is you can easily chain existing ClearML tasks together to create a single p
Let's say we have some functions that we already use to run ETL and another function that trains a model on the preprocessed data. We already have a main function too, that orchestrates when and how these other components should be run.
If we want to make this code into a pipeline, the first thing we have to do is to tell ClearML that these functions are supposed to become steps in our pipeline. We can do that by using a Python decorator! For each function we want as a step, we can decorate it with `PipelineDecorator.component`.
The component call will fully automatically transform this function into a ClearML task, with all the benefits that come with that. It will also make it clear that this task will be part of a larger pipeline.