mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-03 11:27:25 +00:00
Small edits (#595)
This commit is contained in:
parent c256f46993
commit fdffc9c271
@@ -138,7 +138,7 @@ clearml-agent execute [-h] --id TASK_ID [--log-file LOG_FILE] [--disable-monitor
 |`--log-file`| The log file for Task execution output (stdout / stderr) to a text file.|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--log-level`| SDK log level. The values are:<ul><li>`DEBUG`</li><li>`INFO`</li><li>`WARN`</li><li>`WARNING`</li><li>`ERROR`</li><li>`CRITICAL`</li></ul>|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`-O`| Compile optimized pyc code (see python documentation). Repeat for more optimization.|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
-|`--require-queue`| If the specified task is not queued (in any queue), the execution will fail. (Used for 3rd party scheduler integration, e.g. K8s, SLURM, etc.)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
+|`--require-queue`| If the specified task is not queued, the execution will fail. (Used for 3rd party scheduler integration, e.g. K8s, SLURM, etc.)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--standalone-mode`| Do not use any network connections, assume everything is pre-installed|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|

 ## list
@@ -156,7 +156,7 @@ dataset = Dataset.create(dataset_name="my dataset", dataset_project="example pro
 dataset.add_files(path="path/to/folder_or_file")
 ```

-There is an option to add a set of files based on wildcard matching of a single string or a list of strings, using the
+You can add a set of files based on wildcard matching of a single string or a list of strings, using the
 `wildcard` parameter. Specify whether to match the wildcard files recursively using the `recursive` parameter.

 For example:
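The `wildcard` and `recursive` parameters mentioned in this hunk behave like shell-style globbing; the following is a standalone sketch of those semantics (illustrative file names, not the ClearML implementation):

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

def select(files, wildcard, recursive=False):
    # Shell-style matching against each file's name; when recursive is False,
    # only files at the top level are considered (illustrative semantics).
    patterns = [wildcard] if isinstance(wildcard, str) else wildcard
    picked = []
    for f in files:
        parts = PurePosixPath(f).parts
        if not recursive and len(parts) > 1:
            continue
        if any(fnmatch(parts[-1], p) for p in patterns):
            picked.append(f)
    return picked

files = ["train.csv", "labels.json", "raw/extra.csv"]
select(files, "*.csv")                  # ["train.csv"]
select(files, "*.csv", recursive=True)  # ["train.csv", "raw/extra.csv"]
```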
@@ -207,7 +207,7 @@ To remove files from a current dataset, use the [`Dataset.remove_files`](../refe
 Input the path to the folder or file to be removed in the `dataset_path` parameter. The path is relative to the dataset.
 To remove links, specify their URL (e.g. `s3://bucket/file`).

-There is also an option to input a wildcard into `dataset_path` in order to remove a set of files matching the wildcard.
+You can also input a wildcard into `dataset_path` in order to remove a set of files matching the wildcard.
 Set the `recursive` parameter to `True` in order to match all wildcard files recursively

 For example:
@@ -220,7 +220,7 @@ dataset.remove_files(dataset_path="*.csv", recursive=True)

 To upload the dataset files to network storage, use the [`Dataset.upload`](../references/sdk/dataset.md#upload) method.

-Use the `output_url` parameter to specify storage target, such as S3 / GS / Azure (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data` , `/mnt/share/data`).
+Use the `output_url` parameter to specify storage target, such as S3 / GS / Azure (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://bucket/data`, `/mnt/share/data`).
 By default, the dataset uploads to ClearML's file server. This target storage overrides the `output_uri` value of the
 [`Dataset.create`](#creating-datasets) method.
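The precedence between the two settings in this hunk can be sketched as follows (a standalone illustration; the helper and the default file server URL are assumptions, not taken from the docs):

```python
# Hypothetical helper mirroring the precedence described above: an explicit
# `output_url` at upload time wins over the `output_uri` set at create time,
# and the ClearML file server is the fallback (URL shown is an assumption).
DEFAULT_FILE_SERVER = "https://files.clear.ml"

def resolve_storage_target(create_output_uri=None, upload_output_url=None):
    return upload_output_url or create_output_uri or DEFAULT_FILE_SERVER

resolve_storage_target(upload_output_url="s3://bucket/data")  # "s3://bucket/data"
resolve_storage_target(create_output_uri="gs://bucket/data")  # "gs://bucket/data"
```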
@@ -153,7 +153,7 @@ Pass one of the following in the `continue_last_task` parameter:
 * Task ID (string) - The ID of the task to be continued.
 * Initial iteration offset (integer) - Specify the initial iteration offset. By default, the task will continue one
 iteration after the last reported one. Pass `0`, to disable the automatic last iteration offset. To also specify a
-task ID, use the `reuse_last_task_id` parameter .
+task ID, use the `reuse_last_task_id` parameter.

 You can also continue a task previously executed in offline mode, using the `Task.import_offline_session` method.
 See [Offline Mode](#offline-mode).
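The iteration bookkeeping described in this hunk can be illustrated with a small standalone sketch (a hypothetical helper, not the ClearML implementation):

```python
def starting_iteration(last_reported, continue_last_task):
    # True  -> resume one iteration after the last reported one (the default)
    # 0     -> disable the automatic offset and start counting from zero
    # int n -> use n as the explicit initial iteration offset
    if continue_last_task is True:
        return last_reported + 1
    return continue_last_task

starting_iteration(100, True)  # 101
starting_iteration(100, 0)     # 0
```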
@@ -417,7 +417,7 @@ a_func_task = task.create_function_task(
 )
 ```
 Arguments passed to the function will be automatically logged in the
-experiment's **CONFIGURATION** tab under the **HYPERPARAMETER > Function** section .
+experiment's **CONFIGURATION** tab under the **HYPERPARAMETER > Function** section.
 Like any other arguments, they can be changed from the UI or programmatically.

 :::note Function Task Creation
@@ -914,8 +914,8 @@ This returns a nested dictionary of the scalar graph values:
 {
   "title": {
     "series": {
-      "x": [0, 1 ,2],
-      "y": [10, 11 ,12]
+      "x": [0, 1, 2],
+      "y": [10, 11, 12]
     }
   }
 }
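Given that shape, individual series can be read with plain dictionary indexing (sample values below, not fetched from a server):

```python
# Sample of the nested structure shown above: graph title -> series -> axes.
scalars = {
    "title": {
        "series": {
            "x": [0, 1, 2],
            "y": [10, 11, 12],
        }
    }
}

# Pair up the iteration axis with the reported values for one series.
points = list(zip(scalars["title"]["series"]["x"],
                  scalars["title"]["series"]["y"]))
# points == [(0, 10), (1, 11), (2, 12)]
```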
@@ -927,10 +927,10 @@ This call is not cached. If the Task has many reported scalars, it might take a

 #### Get Single Value Scalars

-To get the values of a reported single-value scalars, use [`task.get_reported_single_value()`](../references/sdk/task.md#get_reported_single_value)
+To get the value of a reported single-value scalar, use [`Task.get_reported_single_value()`](../references/sdk/task.md#get_reported_single_value)
 and specify the scalar's `name`.

-To get all reported single scalar values, use [`task.get_reported_single_values()`](../references/sdk/task.md#get_reported_single_values),
+To get all reported single scalar values, use [`Task.get_reported_single_values()`](../references/sdk/task.md#get_reported_single_values),
 which returns a dictionary of scalar name and value pairs:

 ```console
@@ -198,7 +198,7 @@ clearml-serving model upload [-h] --name NAME [--tags TAGS [TAGS ...]] --project
 |`--name`|Specifying the model name to be registered in| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
 |`--tags`| Add tags to the newly created model| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--project`| Specify the project for the model to be registered in| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" />|
-|`--framework`| Specify the model framework. Options are: 'tensorflow', 'tensorflowjs', 'tensorflowlite', 'pytorch', 'torchscript', 'caffe', 'caffe2', 'onnx', 'keras', 'mknet', 'cntk' , 'torch', 'darknet', 'paddlepaddle', 'scikitlearn', 'xgboost', 'lightgbm', 'parquet', 'megengine', 'catboost', 'tensorrt', 'openvino', 'custom' | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
+|`--framework`| Specify the model framework. Options are: 'tensorflow', 'tensorflowjs', 'tensorflowlite', 'pytorch', 'torchscript', 'caffe', 'caffe2', 'onnx', 'keras', 'mknet', 'cntk', 'torch', 'darknet', 'paddlepaddle', 'scikitlearn', 'xgboost', 'lightgbm', 'parquet', 'megengine', 'catboost', 'tensorrt', 'openvino', 'custom' | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--publish`| Publish the newly created model (change model state to "published" (i.e. locked and ready to deploy)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--path`|Specify a model file/folder to be uploaded and registered| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
 |`--url`| Specify an already uploaded model url (e.g. `s3://bucket/model.bin`, `gs://bucket/model.bin`)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
@@ -84,7 +84,7 @@ If your code is saved in a private repository, you can add your Git credentials
 cloud instances will be able to retrieve the code from your repos.

 #### Cloud Storage Access
-If your tasks need to access data stored in cloud storage , you can provide your cloud storage credentials, so the
+If your tasks need to access data stored in cloud storage, you can provide your cloud storage credentials, so the
 executed tasks will have access to your storage service.

 #### Additional Configuration
@@ -95,7 +95,7 @@ Go to a specific app’s documentation page to view all configuration options
 * [GCP Autoscaler](../webapp/applications/apps_gcp_autoscaler.md)

 ## Kubernetes
-ClearML offers an option to install `clearml-agent` through a Helm chart.
+You can install `clearml-agent` through a Helm chart.

 The ClearML Agent deployment is set to service a queue(s). When tasks are added to the queues, the agent pulls the task
 and creates a pod to execute the task. Kubernetes handles resource management. Your task pod will remain pending until
@@ -61,7 +61,7 @@ help maintainers reproduce the problem:
 a [gist](https://gist.github.com) (and provide a link to that gist).
 * **Describe the behavior you observed after following the steps** and the exact problem with that behavior.
 * **Explain which behavior you expected to see and why.**
-* **For Web-App issues, please include screenshots and animated GIFs** that recreate the described steps and clearly demonstrate
+* **For WebApp (UI) issues, please include screenshots and animated GIFs** that recreate the described steps and clearly demonstrate
 the problem. You can use [LICEcap](https://www.cockos.com/licecap) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast)
 or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
@@ -85,9 +85,9 @@ Enhancement suggestions are tracked as GitHub issues. After you determine which

 Before you submit a new PR:

-* Verify that the work you plan to merge addresses an existing [issue](https://github.com/allegroai/clearml/issues) (If not, open a new one)
+* Verify that the work you plan to merge addresses an existing [issue](https://github.com/allegroai/clearml/issues) (if not, open a new one)
 * Check related discussions in the [ClearML slack community](https://joinslack.clear.ml)
-  (Or start your own discussion on the ``#clearml-dev`` channel)
+  (or start your own discussion on the ``#clearml-dev`` channel)
 * Make sure your code conforms to the ClearML coding standards by running:

   flake8 --max-line-length=120 --statistics --show-source --extend-ignore=E501 ./clearml*
@@ -45,7 +45,7 @@ Once deployed, ClearML Server exposes the following services:
 1. Go to AWS EC2 Console.
 1. In the **Details** tab, **Public DNS (IPv4)** shows the ClearML Server address.

-**To access ClearML Server Web-App (UI):**
+**To access ClearML Server WebApp (UI):**

 * Direct browser to its web server URL: `http://<Server Address>:8080`
@@ -88,7 +88,7 @@ The minimum requirements for ClearML Server are:

 ## Backing Up and Restoring Data and Configuration

-The commands in this section are an example of how to back up and restore data and configuration .
+The commands in this section are an example of how to back up and restore data and configuration.

 If data and configuration folders are in `/opt/clearml`, then archive all data into `~/clearml_backup_data.tgz`, and
 configuration into `~/clearml_backup_config.tgz`:
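The archiving step this hunk refers to can be sketched with Python's `tarfile` module (a standalone illustration against a throwaway directory; the docs' actual commands run `tar` against `/opt/clearml`):

```python
import os
import tarfile
import tempfile

def backup(src_dir, archive_path):
    # Mirrors `tar czf <archive> -C <src_dir> .`: gzip-compress the directory's
    # contents so they can be restored into a fresh deployment later.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=".")

# Demonstration against a temp directory standing in for /opt/clearml/data
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "sample.txt"), "w") as f:
    f.write("payload")
archive = os.path.join(tempfile.gettempdir(), "clearml_backup_data.tgz")
backup(data_dir, archive)
```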
@@ -157,7 +157,7 @@ After deploying ClearML Server, the services expose the following ports:

 ## Backing Up and Restoring Data and Configuration

-The commands in this section are an example of how to back up and to restore data and configuration .
+The commands in this section are an example of how to back up and to restore data and configuration.

 If the data and configuration folders are in `/opt/clearml`, then archive all data into `~/clearml_backup_data.tgz`, and
 configuration into `~/clearml_backup_config.tgz`:
docs/faq.md (16 changes)
@@ -24,7 +24,7 @@ title: FAQ
 * [I noticed that all of my experiments appear as "Training". Are there other options?](#other-experiment-types)
 * [Sometimes I see experiments as running when in fact they are not. What's going on?](#experiment-running-but-stopped)
 * [My code throws an exception, but my experiment status is not "Failed". What happened?](#exception-not-failed)
-* [CERTIFICATE_VERIFY_FAILED - When I run my experiment, I get an SSL Connection error . Do you have a solution?](#ssl-connection-error)
+* [CERTIFICATE_VERIFY_FAILED - When I run my experiment, I get an SSL Connection error. Do you have a solution?](#ssl-connection-error)
 * [How do I modify experiment names once they have been created?](#modify_exp_names)
 * [Using Conda and the "typing" package, I get the error "AttributeError: type object 'Callable' has no attribute '_abc_registry'". How do I fix this?](#typing)
 * [My ClearML Server disk space usage is too high. What can I do about this?](#delete_exp)
@@ -91,12 +91,12 @@ title: FAQ

 **ClearML Server Troubleshooting**

-* [I did a reinstall. Why can't I create credentials in the Web-App (UI)?](#clearml-server-reinstall-cookies)
+* [I did a reinstall. Why can't I create credentials in the WebApp (UI)?](#clearml-server-reinstall-cookies)
 * [How do I fix Docker upgrade errors?](#common-docker-upgrade-errors)
 * [Why is web login authentication not working?](#port-conflict)
 * [How do I bypass a proxy configuration to access my local ClearML Server?](#proxy-localhost)
 * [Trains is failing to update ClearML Server. I get an error 500 (or 400). How do I fix this?](#elastic_watermark)
-* [Why is my Trains Web-App (UI) not showing any data?](#web-ui-empty)
+* [Why is my Trains WebApp (UI) not showing any data?](#web-ui-empty)
 * [Why can't I access my ClearML Server when I run my code in a virtual machine?](#vm_server)

 **ClearML Agent**
@@ -321,7 +321,7 @@ task = Task.init(project_name, task_name, Task.TaskTypes.testing)

 **Sometimes I see experiments as running when in fact they are not. What's going on?** <a id="experiment-running-but-stopped"></a>

-ClearML monitors your Python process. When the process exits properly, ClearML closes the experiment. When the process crashes and terminates abnormally, it sometimes misses the stop signal. In this case, you can safely right-click the experiment in the Web-App and abort it.
+ClearML monitors your Python process. When the process exits properly, ClearML closes the experiment. When the process crashes and terminates abnormally, it sometimes misses the stop signal. In this case, you can safely right-click the experiment in the WebApp and abort it.

 <br/>
@@ -919,7 +919,7 @@ on the "Configuring Your Own ClearML Server" page.

 **Can I add web login authentication to ClearML Server?** <a id="web-auth"></a>

-By default, anyone can log in to the ClearML Server Web-App. You can configure the ClearML Server to allow only a specific set of users to access the system.
+By default, anyone can log in to the ClearML Server WebApp. You can configure the ClearML Server to allow only a specific set of users to access the system.

 For detailed instructions, see [Web Login Authentication](deploying_clearml/clearml_server_config.md#web-login-authentication)
 on the "Configuring Your Own ClearML Server" page in the "Deploying ClearML" section.
@@ -940,7 +940,7 @@ For detailed instructions, see [Modifying non-responsive Task watchdog settings]

 ## ClearML Server Troubleshooting

-**I did a reinstall. Why can't I create credentials in the Web-App (UI)?** <a id="clearml-server-reinstall-cookies"></a>
+**I did a reinstall. Why can't I create credentials in the WebApp (UI)?** <a id="clearml-server-reinstall-cookies"></a>

 The issue is likely your browser cookies for ClearML Server. Clearing your browser cookies for ClearML Server is recommended.
 For example:
@@ -1089,9 +1089,9 @@ A likely indication of this situation can be determined by searching your clearm

 <br/>

-**Why is my ClearML Web-App (UI) not showing any data?** <a className="tr_top_negative" id="web-ui-empty"></a>
+**Why is my ClearML WebApp (UI) not showing any data?** <a className="tr_top_negative" id="web-ui-empty"></a>

-If your ClearML Web-App (UI) does not show anything, it may be an error authenticating with the server. Try clearing the application cookies for the site in your browser's developer tools.
+If your ClearML WebApp (UI) does not show anything, it may be an error authenticating with the server. Try clearing the application cookies for the site in your browser's developer tools.

 **Why can't I access my ClearML Server when I run my code in a virtual machine?** <a id="vm_server"></a>
@@ -7,7 +7,7 @@ title: Tasks
 A Task is a single code execution session, which can represent an experiment, a step in a workflow, a workflow controller,
 or any custom implementation you choose.

-To transform an existing script into a **ClearML Task**, one must call the [`Task.init()`](../references/sdk/task.md#taskinit) method
+To transform an existing script into a **ClearML Task**, call the [`Task.init()`](../references/sdk/task.md#taskinit) method
 and specify a task name and its project. This creates a Task object that automatically captures code execution
 information as well as execution outputs.
@@ -84,7 +84,7 @@ preprocess_task = Task.get_task(task_id='preprocessing_task_id')
 local_csv = preprocess_task.artifacts['data'].get_local_copy()
 ```

-The `task.artifacts` is a dictionary where the keys are the artifact names, and the returned object is the artifact object.
+`task.artifacts` is a dictionary where the keys are the artifact names, and the returned object is the artifact object.
 Calling `get_local_copy()` returns a local cached copy of the artifact. Therefore, next time we execute the code, we don't
 need to download the artifact again.
 Calling `get()` gets a deserialized pickled object.
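The `get()` behavior mentioned in this hunk amounts to unpickling; as a standalone sketch (the artifact bytes here are constructed locally, not downloaded from a server):

```python
import pickle

# What an uploaded object artifact stores, roughly: a pickled Python object.
artifact_bytes = pickle.dumps({"rows": 3, "split": "train"})

# What `get()` hands back: the deserialized object.
restored = pickle.loads(artifact_bytes)
# restored == {"rows": 3, "split": "train"}
```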
@@ -30,11 +30,11 @@ So let’s start with the inputs: hyperparameters. Hyperparameters are the confi

 Let’s take this simple code as an example. First of all, we start the script with the 2 magic lines of code that we covered before. Next to that we have a mix of command line arguments and some additional parameters in a dictionary here.

-The command line arguments will be captured automatically, and for the dict (or really any python object) we can use the `task.connect()` function, to report our dict values as ClearML hyperparameters.
+The command line arguments will be captured automatically, and for the dict (or really any python object) we can use the `Task.connect()` function, to report our dict values as ClearML hyperparameters.

 As you can see, when we run the script, all hyperparameters are captured and parsed by the server, giving you a clean overview in the UI.

-Configuration objects, however, work slightly differently and are mostly used for more complex configurations, like a nested dict or a yaml file for example. They’re logged by using the `task.connect_configuration()` function instead and will save the configuration as a whole, without parsing it.
+Configuration objects, however, work slightly differently and are mostly used for more complex configurations, like a nested dict or a yaml file for example. They’re logged by using the `Task.connect_configuration()` function instead and will save the configuration as a whole, without parsing it.

 We have now logged our task with all of its inputs, but if we wanted to, we could rerun our code with different parameters and this is where the magic happens.
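The distinction made in this hunk can be illustrated with a standalone sketch (not the ClearML implementation, and the `/` separator is an assumption): `Task.connect()` effectively reports leaf name/value pairs, while `Task.connect_configuration()` keeps the object whole:

```python
def flatten(params, prefix=""):
    # Turn a nested dict into the flat name/value pairs that hyperparameter
    # parsing produces (illustrative separator).
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "/"))
        else:
            flat[name] = value
    return flat

config = {"training": {"lr": 0.01, "epochs": 10}, "seed": 42}
flatten(config)  # {"training/lr": 0.01, "training/epochs": 10, "seed": 42}
# connect_configuration(), by contrast, would store `config` as one unparsed blob.
```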
@@ -188,7 +188,7 @@ your machine usage and your GPU usage and stuff like that, and then the learning
 give you a really, really quick overview of the most important metrics that you're trying to solve. And keep in mind
 this F1 score because this is the thing that we're trying to optimize here.

-Then plots. I can, for example, plot a confusion matrix every X iterations. So in this case ,for example, after a few
+Then plots. I can, for example, plot a confusion matrix every X iterations. So in this case, for example, after a few
 iterations, I plot the confusion matrix again just so I can see over time how well the model starts performing. So as
 you can see here, a perfect confusion matrix will be a diagonal line because every true label will be combined with the
 exact same predicted label. And in this case, it's horribly wrong. But then over time it starts getting closer and
@@ -119,7 +119,7 @@ abort task 0. So one way of doing that would be to go to the current experiment,
 ClearML will actually bring me to the original experiment view, the experiment manager, remember everything is
 integrated here. The experiment manager of that example task. So what I can do here if I look at the console, I have a
 bunch of output here. I can actually abort it as well. And if I abort it, what will happen is this task will stop
-executing. Essentially, it will send a `ctrl c`, so a quit command or a terminate command, to the original task on the \
+executing. Essentially, it will send a `ctrl c`, so a quit command or a terminate command, to the original task on the
 remote machine. So the remote machine will say okay, I'm done here. I will just quit it right here. If, for example,
 your model is not performing very well, or you see like oh, something is definitely wrong here, you can always just
 abort it. And the cool thing is if we go back to the **Workers and Queues**, we'll see that the `Beast 0` has given up working
@@ -212,13 +212,13 @@ you would have made yourself, and now you want to get it into the queue. Now one
 you could do a `Task.init` which essentially tracks the run of your code as an experiment in the experiment manager, and
 then you could go and clone the experiment and then enqueue it. This is something that we saw in the Getting Started videos before.

-Now, another way of doing this is to actually use what you can see here, which is `task.execute_remotely`. What this line
+Now, another way of doing this is to actually use what you can see here, which is `Task.execute_remotely()`. What this line
 specifically will do, is when you run the file right here. Let me just do that real quick. So if we do
 `python setup/example_task_CPU.py` what will happen is ClearML will do the `Task.init` like it would always do, but then
-it would encounter the `task.execute_remotely` and what that will tell ClearML is say okay, take all of this code, take
+it would encounter the `Task.execute_remotely()` and what that will tell ClearML is say okay, take all of this code, take
 all of the packages that are installed, take all of the things that you would normally take as part of the experiment
 manager, but stop executing right here and then send the rest, send everything through to a ClearML agent or to the queue
-so that a ClearML agent can start working on it. So one way of doing this is to add a `task.execute_remotely` just all
+so that a ClearML agent can start working on it. So one way of doing this is to add a `Task.execute_remotely()` just all
 the way at the top and then once you run it, you will see here `clearml WARNING - Terminating local execution process`,
 and so if we're seeing here if we're going to take a look we can see that Model Training currently running, and if we go
 and take a look, at our queues here, we have `any-remote-machine` running Model Training right here. And if we go and
@@ -246,7 +246,7 @@ our Model Training GPU. But remember again that we also have the autoscaler. So
 autoscaler, you'll see here that we indeed have one task in the GPU queue. And we also see that the `GPU_machines`
 Running Instances is one as well. So we can follow along with the logs here. And it actually detected that there is a
 task in a GPU queue, and it's now spinning up a new machine, a new GPU machine to be running that specific task, and then
-it will shut that back down again when it's done. So this is just one example of how you can use `task.execute_remotely`
+it will shut that back down again when it's done. So this is just one example of how you can use `Task.execute_remotely()`
 to very efficiently get your tasks into the queue. Actually, it could also be the first time. So if you don't want to
 use the experiment manager for example, you don't actually have to use a task that is already in the system, you can
 just say it does not execute remotely, and it will just put it into the system for you and immediately launch it remotely.
@@ -146,7 +146,7 @@ there and open it up, we first get the status of the task, just to be sure. Reme
 something else might have happened in the meantime. If the status is not `completed`, we want to say this is the
 status, it isn't completed this should not happen but. If it is completed, we are going to create a table with these
 functions that I won't go deeper into. Basically, they format the dictionary of the state of the task scalars into
-markdown that we can actually use. Let me just go into this though one quick time. So we can basically do `task.get_last_scalar_metrics`,
+markdown that we can actually use. Let me just go into this though one quick time. So we can basically do `Task.get_last_scalar_metrics()`,
 and this function is built into ClearML, which basically gives you a dictionary with all the metrics on your task.
 We'll just get that formatted into a table, make it into a pandas DataFrame, and then tabulate it with this cool package
 that turns it into MarkDown. So now that we have marked down in the table, we then want to return results table. You can
@@ -30,7 +30,7 @@ Yeah, yeah we can, it's called hyperparameter optimization. And we can do all of

 If you don’t know what Hyperparameter Optimization is yet, you can find a link to our blog post on the topic in the description below. But in its most basic form, hyperparameter optimization tries to optimize a certain output by changing a set of inputs.

-Let’s say we’ve been working on this model here, and we were tracking our experiments with it anyway. We can see we have some hyperparameters to work with in the **Hyperparameters** tab of the web UI. They are logged by using the `task.connect` function in our code. These are our inputs. We also have a scaler called `validation/epoch_accuracy`, that we want to get as high as possible. This is our output. We could also select to minimize the `epoch_loss` for example, that is something you can decide yourself.
+Let’s say we’ve been working on this model here, and we were tracking our experiments with it anyway. We can see we have some hyperparameters to work with in the **Hyperparameters** tab of the web UI. They are logged by using the `Task.connect` function in our code. These are our inputs. We also have a scalar called `validation/epoch_accuracy`, that we want to get as high as possible. This is our output. We could also select to minimize the `epoch_loss` for example, that is something you can decide yourself.

 We can see that no code was used to log the scalar. It's done automatically because we are using TensorBoard.
@@ -17,7 +17,7 @@ The example script does the following:
 1. Builds a sequential model using a categorical cross entropy loss objective function.
 1. Specifies accuracy as the metric, and uses two callbacks: a TensorBoard callback and a model checkpoint callback.
 1. During script execution, creates an experiment named `Keras with TensorBoard example`, which is associated with the
-   `examples` project (in script) or the `Colab notebooks` project (in Jupyter Notebook) .
+   `examples` project (in script) or the `Colab notebooks` project (in Jupyter Notebook).

 ## Scalars
@@ -205,7 +205,7 @@ The logs show the Task ID and accuracy for the best model in **CONSOLE**.

 

-The link to the model details is in **ARTIFACTS** **>** **Output Model** .
+The link to the model details is in **ARTIFACTS** **>** **Output Model**.

 
@@ -103,7 +103,7 @@ In `slack_alerts.py`, the class `SlackMonitor` inherits from the `Monitor` class
 * Builds the Slack message which includes the most recent output to the console (retrieved by calling [`Task.get_reported_console_output`](../../references/sdk/task.md#get_reported_console_output)),
 and the URL of the Task's output log in the ClearML Web UI (retrieved by calling [`Task.get_output_log_web_page`](../../references/sdk/task.md#get_output_log_web_page)).

-The example provides the option to run locally or execute remotely by calling the [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely)
+You can run the example remotely by calling the [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely)
 method.

 To interface to Slack, the example uses `slack_sdk.WebClient` and `slack_sdk.errors.SlackApiError`.
@@ -22,7 +22,7 @@ In the `examples/frameworks/pytorch` directory, run the experiment script:

 Clone the experiment to create an editable copy for tuning.

-1. In the **ClearML Web-App (UI)**, on the Projects page, click the `examples` project card.
+1. In the ClearML WebApp (UI), on the Projects page, click the `examples` project card.

 1. In the experiments table, right-click the experiment `pytorch mnist train`.
@@ -82,7 +82,7 @@ Run the worker daemon on the local development machine.

 Enqueue the tuned experiment.

-1. In the **ClearML Web-App (UI)**, experiments table, right-click the experiment `Clone Of pytorch mnist train`.
+1. In the ClearML WebApp > experiments table, right-click the experiment `Clone Of pytorch mnist train`.

 1. In the context menu, click **Enqueue**.
@@ -95,7 +95,7 @@ Enqueue the tuned experiment.
 ## Step 6: Compare the Experiments

 To compare the original and tuned experiments:
-1. In the **ClearML Web-App (UI)**, on the Projects page, click the `examples` project.
+1. In the ClearML WebApp (UI), on the Projects page, click the `examples` project.
 1. In the experiments table, select the checkboxes for the two experiments: `pytorch mnist train` and `Clone Of pytorch mnist train`.
 1. On the menu bar at the bottom of the experiments table, click **COMPARE**. The experiment comparison window appears.
 All differences appear with a different background color to highlight them.
@ -504,7 +504,7 @@ my_dataview = DataView.get(dataview_id='12344kg2p3hf8')

Access the Dataview's frames as a python list, dictionary, or through a pythonic iterator.

The [`DataView.to_list`](../references/hyperdataset/dataview.md#to_list) method returns the Dataview queries result as a python list .
The [`DataView.to_list`](../references/hyperdataset/dataview.md#to_list) method returns the Dataview queries result as a python list.

The [`DataView.to_dict`](../references/hyperdataset/dataview.md#to_dict) method returns a list of dictionaries, where each dictionary represents a frame. Use the
`projection` parameter to specify a subset of the frame fields to be included in the result. Input a list of strings,
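The effect of a `projection` can be illustrated with plain dictionaries (the frame data and the `project` helper below are hypothetical sketches, limited to top-level fields; this is not the `allegroai` implementation):

```python
def project(frames, fields):
    """Keep only the requested top-level fields from each frame dictionary."""
    return [{k: f[k] for k in fields if k in f} for f in frames]

# Hypothetical frames, each represented as a dictionary
frames = [
    {"id": "f1", "timestamp": 1000, "context_id": "car_1"},
    {"id": "f2", "timestamp": 2000, "context_id": "car_2"},
]

# Keep only the "id" and "timestamp" fields, as a projection would
print(project(frames, ["id", "timestamp"]))
# → [{'id': 'f1', 'timestamp': 1000}, {'id': 'f2', 'timestamp': 2000}]
```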
@ -27,7 +27,7 @@ In the UI, you can view the mapping in a dataset version's [Metadata](webapp/web


When viewing a frame with a mask corresponding with the version’s mask-label mapping, the UI arbitrarily assigns a color
to each label . The color assignment can be [customized](webapp/webapp_datasets_frames.md#labels).
to each label. The color assignment can be [customized](webapp/webapp_datasets_frames.md#labels).

For example:
* Original frame image:
@ -44,7 +44,7 @@ The frame's sources array contains a masks list of dictionaries that looks somet
```editorconfig
{
  "id": "<framegroup_id>",
  "timestamp": "<timestamp>" ,
  "timestamp": "<timestamp>",
  "context_id": "car_1",
  "sources": [
    {
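A minimal sketch of reading such a frame dictionary in Python (the structure is abbreviated to the fields shown above, and every field value is hypothetical):

```python
import json

# Hypothetical frame, mirroring the abbreviated structure above
frame_json = """
{
  "id": "framegroup_001",
  "timestamp": 1234,
  "context_id": "car_1",
  "sources": [
    {"id": "front", "masks": [{"id": "mask_0", "uri": "s3://bucket/mask_0.png"}]}
  ]
}
"""

frame = json.loads(frame_json)
# Collect the mask URIs from every source in the frame
mask_uris = [m["uri"] for src in frame["sources"] for m in src.get("masks", [])]
print(frame["context_id"], mask_uris)
# → car_1 ['s3://bucket/mask_0.png']
```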
@ -10,7 +10,7 @@ Hyper-Datasets are supported by the `allegroai` python package.

### Connecting Dataviews to a Task

Use [`Task.connect`](../references/sdk/task.md#connect) to connect a Dataview object to a Task:
Use [`Task.connect()`](../references/sdk/task.md#connect) to connect a Dataview object to a Task:

```python
from allegroai import DataView, Task
@ -102,7 +102,7 @@ def step_one(pickle_data_url: str, extra: int = 43):
instead of rerunning the step.
* `packages` - A list of required packages or a local requirements.txt file. Example: `["tqdm>=2.1", "scikit-learn"]` or
  `"./requirements.txt"`. If not provided, packages are automatically added based on the imports used inside the function.
* `execution_queue` (Optional) - Queue in which to enqueue the specific step. This overrides the queue set with the
* `execution_queue` (optional) - Queue in which to enqueue the specific step. This overrides the queue set with the
  [`PipelineDecorator.set_default_execution_queue method`](../references/sdk/automation_controller_pipelinecontroller.md#pipelinedecoratorset_default_execution_queue)
  method.
* `continue_on_fail` - If `True`, a failed step does not cause the pipeline to stop (or marked as failed). Notice, that
@ -115,11 +115,11 @@ def step_one(pickle_data_url: str, extra: int = 43):
* Examples:
  * remote url: `"https://github.com/user/repo.git"`
  * local repo copy: `"./repo"` -> will automatically store the remote repo url and commit ID based on the locally cloned copy
* `repo_branch` (Optional) - Specify the remote repository branch (Ignored, if local repo path is used)
* `repo_commit` (Optional) - Specify the repository commit ID (Ignored, if local repo path is used)
* `helper_functions` (Optional) - A list of helper functions to make available for the standalone pipeline step. By default, the pipeline step function has no access to any of the other functions, by specifying additional functions here, the remote pipeline step could call the additional functions.
* `repo_branch` (optional) - Specify the remote repository branch (ignored, if local repo path is used)
* `repo_commit` (optional) - Specify the repository commit ID (ignored, if local repo path is used)
* `helper_functions` (optional) - A list of helper functions to make available for the standalone pipeline step. By default, the pipeline step function has no access to any of the other functions, by specifying additional functions here, the remote pipeline step could call the additional functions.
  Example, assuming you have two functions, `parse_data()` and `load_data()`: `[parse_data, load_data]`
* `parents` – Optional list of parent steps in the pipeline. The current step in the pipeline will be sent for execution only after all the parent steps have been executed successfully.
* `parents` (optional) - A list of parent steps in the pipeline. The current step in the pipeline will be sent for execution only after all the parent steps have been executed successfully.
* `retry_on_failure` - Number of times to retry step in case of failure. You can also input a callable function in the
  following format:
@ -153,12 +153,12 @@ def step_one(pickle_data_url: str, extra: int = 43):

Additionally, you can enable automatic logging of a step’s metrics / artifacts / models to the pipeline task using the
following arguments:
* `monitor_metrics` (Optional) - Automatically log the step's reported metrics also on the pipeline Task. The expected
* `monitor_metrics` (optional) - Automatically log the step's reported metrics also on the pipeline Task. The expected
  format is one of the following:
  * List of pairs metric (title, series) to log: [(step_metric_title, step_metric_series), ]. Example: `[('test', 'accuracy'), ]`
  * List of tuple pairs, to specify a different target metric to use on the pipeline Task: [((step_metric_title, step_metric_series), (target_metric_title, target_metric_series)), ].
    Example: `[[('test', 'accuracy'), ('model', 'accuracy')], ]`
* `monitor_artifacts` (Optional) - Automatically log the step's artifacts on the pipeline Task.
* `monitor_artifacts` (optional) - Automatically log the step's artifacts on the pipeline Task.
  * Provided a list of
    artifact names created by the step function, these artifacts will be logged automatically also on the Pipeline Task
    itself. Example: `['processed_data', ]` (target artifact name on the Pipeline Task will have the same name as the original
@ -166,7 +166,7 @@ following arguments:
  * Alternatively, provide a list of pairs (source_artifact_name, target_artifact_name), where the first string is the
    artifact name as it appears on the component Task, and the second is the target artifact name to put on the Pipeline
    Task. Example: `[('processed_data', 'final_processed_data'), ]`
* `monitor_models` (Optional) - Automatically log the step's output models on the pipeline Task.
* `monitor_models` (optional) - Automatically log the step's output models on the pipeline Task.
  * Provided a list of model names created by the step's Task, they will also appear on the Pipeline itself. Example: `['model_weights', ]`
  * To select the latest (lexicographic) model use `model_*`, or the last created model with just `*`. Example: `['model_weights_*', ]`
  * Alternatively, provide a list of pairs (source_model_name, target_model_name), where the first string is the model
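Pulling the argument formats above together, a hedged sketch of what such a component configuration might look like (the queue name, parent step, and metric/artifact/model names are all hypothetical; in a real pipeline these keyword arguments would be passed to `PipelineDecorator.component`):

```python
# Hypothetical argument values, matching the formats described above
component_kwargs = dict(
    packages=["tqdm>=2.1", "scikit-learn"],     # or a local "./requirements.txt"
    execution_queue="default",                  # overrides the default execution queue
    parents=["step_one"],                       # run only after step_one succeeds
    monitor_metrics=[("test", "accuracy")],     # (title, series) pairs to log
    monitor_artifacts=[("processed_data", "final_processed_data")],
    monitor_models=["model_weights_*"],         # latest lexicographic model match
)

def step_two(pickle_data_url: str):
    # Step body would go here; in a real pipeline the function is decorated with
    # @PipelineDecorator.component(**component_kwargs)
    return pickle_data_url
```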
@ -224,7 +224,7 @@ title: Version 0.17
* Replace "Download" action with "Copy to Clipboard" for local files.
* Add ClearML usage tips.
* Add Task and Model, Clone and Move to allow creating new project on the fly.
* Add support for all S3 / minio special character \(%\) encoding cases .
* Add support for all S3 / minio special character \(%\) encoding cases.
* Add API for filter by parent.
* Improve browser-search to find data not visible in uncommitted changes / installed packages window.
* Improve Task artifacts optimization.
@ -2,7 +2,7 @@
title: Comparing Experiments
---
It is always useful to investigate what causes an experiment to succeed.
The **ClearML Web UI** provides experiment comparison features, allowing to locate, visualize, and analyze differences including:
The ClearML Web UI provides experiment comparison features, allowing to locate, visualize, and analyze differences including:

* [Details](#details)
  - Artifacts - Input model, output model, and model design.
@ -123,7 +123,7 @@ Visualize the comparison of scalars, which includes metrics and monitored resour

### Compare Scalar Series

Compare scalar series in plots and analyze differences using **ClearML Web UI** plot tools.
Compare scalar series in plots and analyze differences using plot tools.

**To compare scalar series:**
@ -7,7 +7,7 @@ The **ClearML Web UI** is the graphical user interface for the ClearML platform,
* Browsing
* Resource utilization monitoring
* Profile management
* Direct access to the ClearML community (Slack Channel, Youtube, and GitHub).
* Direct access to the ClearML community (Slack Channel, YouTube, and GitHub).


@ -24,7 +24,7 @@ be sent to the experiment's page.

Every project has a `description` field. The UI provides a Markdown editor to edit this field.

In the Markdown document, you can write and share reports and add links to **ClearML** experiments
In the Markdown document, you can write and share reports and add links to ClearML experiments
or any network resource such as issue tracker, web repository, etc.

### Editing the Description