mirror of https://github.com/clearml/clearml-docs (synced 2025-06-26 18:17:44 +00:00)
Small edits (#476)
@@ -17,11 +17,11 @@ If you are afraid of clutter, use the archive option, and set up your own [clean
These metrics can later be part of your own in-house monitoring solution, don't let good data go to waste :)

## Clone Tasks

-In order to define a Task in ClearML we have two options
+Define a ClearML Task with one of the following options:

- Run the actual code with a `Task.init` call. This will create and auto-populate the Task in ClearML (including Git repo / Python packages / command line, etc.).
- Register a local / remote code repository with `clearml-task`. See [details](../../apps/clearml_task.md).

-Once we have a Task in ClearML, we can clone and edit its definitions in the UI, then launch it on one of our nodes with [ClearML Agent](../../clearml_agent.md).
+Once you have a Task in ClearML, you can clone and edit its definitions in the UI, then launch it on one of your nodes with [ClearML Agent](../../clearml_agent.md).
## Advanced Automation

- Create daily / weekly cron jobs for retraining the best-performing models.
@@ -164,7 +164,7 @@ and [pipeline](../../pipelines/pipelines.md) solutions.
Logging models into the model repository is the easiest way to integrate the development process directly with production.
Any model stored by a supported framework (Keras / TensorFlow / PyTorch / Joblib, etc.) will be automatically logged into ClearML.

-ClearML also offers methods to explicitly log models. Models can be automatically stored on a preferred storage medium
+ClearML also supports methods to explicitly log models. Models can be automatically stored on a preferred storage medium
(S3 bucket, Google Storage, etc.).
#### Log Metrics
@@ -208,7 +208,7 @@ tasks = Task.get_tasks(
Data is probably one of the biggest factors that determines the success of a project. Associating a model's data with
the model's configuration, code, and results (such as accuracy) is key to deducing meaningful insights into model behavior.

-[ClearML Data](../../clearml_data/clearml_data.md) allows you to version your data, so it's never lost, fetch it from every
+[ClearML Data](../../clearml_data/clearml_data.md) lets you version your data, so it's never lost, fetch it from every
machine with minimal code changes, and associate data to experiment results.

Logging data can be done via command line, or programmatically. If any preprocessing code is involved, ClearML logs it
@@ -16,19 +16,19 @@ The sections below describe the following scenarios:
## Building Tasks

### Dataset Creation

-Let's assume we have some code that extracts data from a production database into a local folder.
-Our goal is to create an immutable copy of the data to be used by further steps:
+Let's assume you have some code that extracts data from a production database into a local folder.
+Your goal is to create an immutable copy of the data to be used by further steps:

```bash
clearml-data create --project data --name dataset
clearml-data sync --folder ./from_production
```
We could also add a tag `latest` to the Dataset, marking it as the latest version.
|
||||
You can add a tag `latest` to the Dataset, marking it as the latest version.
|
||||
|
||||
### Preprocessing Data

-The second step is to preprocess the data. First we need to access it, then we want to modify it,
-and lastly we want to create a new version of the data.
+The second step is to preprocess the data. First access the data, then modify it,
+and lastly create a new version of the data.
```python
# create a task for the data processing part
@@ -59,10 +59,10 @@ dataset.tags = []
new_dataset.tags = ['latest']
```
We passed the `parents` argument when we created v2 of the Dataset, which inherits all the parent's version content.
|
||||
This not only helps trace back dataset changes with full genealogy, but also makes our storage more efficient,
|
||||
The new dataset inherits the contents of the datasets specified in `Dataset.create`'s `parents` argument.
|
||||
This not only helps trace back dataset changes with full genealogy, but also makes the storage more efficient,
|
||||
since it only stores the changed and / or added files from the parent versions.
|
||||
When we access the Dataset, it automatically merges the files from all parent versions
|
||||
When you access the Dataset, it automatically merges the files from all parent versions
|
||||
in a fully automatic and transparent process, as if the files were always part of the requested Dataset.
|
||||
|
||||
### Training