Small edits (#741)

pollfly 2023-12-29 12:58:47 +02:00 committed by GitHub
parent 054eb2ad54
commit 50fcf7c700
6 changed files with 24 additions and 24 deletions

View File

@ -260,7 +260,7 @@ Dataset files must be uploaded before a dataset is [finalized](#finalizing-a-dat
## Finalizing a Dataset
Use the [`Dataset.finalize`](../references/sdk/dataset.md#finalize) method to close the current dataset. This marks the
Use [`Dataset.finalize()`](../references/sdk/dataset.md#finalize) to close the current dataset. This marks the
dataset task as *Completed*, at which point the dataset can no longer be modified.
Before closing a dataset, its files must first be [uploaded](#uploading-files).
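For example, a minimal sketch, assuming `dataset` is an existing `Dataset` object whose files have already been added:
```python
# sketch: upload the dataset's files, then close the dataset
dataset.upload()    # files must be uploaded before finalizing
dataset.finalize()  # marks the dataset task as Completed; no further modifications
```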
@ -268,7 +268,7 @@ Before closing a dataset, its files must first be [uploaded](#uploading-files).
## Syncing Local Storage
Use the [`Dataset.sync_folder`](../references/sdk/dataset.md#sync_folder) method in order to update a dataset according
Use [`Dataset.sync_folder()`](../references/sdk/dataset.md#sync_folder) in order to update a dataset according
to a specific folder's content changes. Specify the folder to sync with the `local_path` parameter (the method considers all files within the folder, recursively).
This method is useful in the case where there's a single point of truth, either a local or network folder, that gets updated periodically.
@ -276,7 +276,7 @@ The folder changes will be reflected in a new dataset version. This method saves
update (add / remove) files in a dataset.
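For example, a sketch of a periodic sync, assuming `dataset` is a new, writable dataset version and `./dataset_folder` is the folder of truth (both names are placeholders):
```python
# sketch: reflect a local folder's changes in a new dataset version
dataset.sync_folder(local_path="./dataset_folder")  # add new/changed files, remove missing ones
dataset.upload()
dataset.finalize()
```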
## Deleting Datasets
Delete a dataset using the [`Dataset.delete`](../references/sdk/dataset.md#datasetdelete) class method. Input any of the
Delete a dataset using [`Dataset.delete()`](../references/sdk/dataset.md#datasetdelete). Input any of the
attributes of the dataset(s) you want to delete, including ID, project name, version, and/or dataset name. If multiple
datasets match the query, an exception is raised, unless you pass `entire_dataset=True` and `force=True`. In this
case, all matching datasets will be deleted.
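For example, a hedged sketch (parameter names follow the SDK reference; all values below are placeholders):
```python
from clearml import Dataset

# delete a single dataset by its ID
Dataset.delete(dataset_id="<dataset_id>")

# delete every dataset matching the project/name query
Dataset.delete(
    dataset_project="<project_name>",
    dataset_name="<dataset_name>",
    entire_dataset=True,
    force=True
)
```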
@ -360,11 +360,11 @@ Note that in offline mode, any methods that require communicating with the serve
`finalize()`, `get_local_copy()`, `get()`, `move_to_project()`, etc.).
Upload the offline dataset to the ClearML Server using [`Dataset.import_offline_session()`](../references/sdk/dataset.md#datasetimport_offline_session).
In the `session_folder_zip` argument, insert the path to the zip folder containing the dataset. To [upload](#uploading-files)
the dataset's data to network storage, set `upload` to `True`. To [finalize](#finalizing-a-dataset) the dataset,
which will close it and prevent further modifications to the dataset, set `finalize` to `True`.
```python
Dataset.import_offline_session(session_folder_zip="<path_to_offline_dataset>", upload=True, finalize=True)
```

View File

@ -43,7 +43,7 @@ New dataset created id=ee1c35f60f384e65bc800f42f0aca5ec
Where `ee1c35f60f384e65bc800f42f0aca5ec` is the dataset ID.
## Adding Files
Add the files that were just downloaded to the dataset:
Add the [downloaded files](#downloading-the-data) to the dataset:
```
clearml-data add --files <dataset_path>

View File

@ -5,9 +5,9 @@ title: Multiple Tasks in Single Process
The [multiple_tasks_single_process](https://github.com/allegroai/clearml/blob/master/examples/advanced/multiple_tasks_single_process.py)
script demonstrates the capability to log a single script in multiple ClearML tasks.
In order to log a script in multiple tasks, each task needs to be initialized using the [`Task.init`](../../references/sdk/task.md#taskinit)
method with the `task_name` and `project_name` parameters input. Before initializing an additional task in the same script, the
previous task must be manually shut down with the [`close`](../../references/sdk/task.md#close) method.
In order to log a script in multiple tasks, each task needs to be initialized using [`Task.init()`](../../references/sdk/task.md#taskinit)
with the `task_name` and `project_name` parameters specified. Before initializing an additional task in the same script, the
previous task must be manually shut down with [`Task.close()`](../../references/sdk/task.md#close).
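A minimal sketch of this pattern (the project and task names below are placeholders, not those used by the example script):
```python
from clearml import Task

task1 = Task.init(project_name="examples", task_name="first task")
# ... everything logged here is captured by task1 ...
task1.close()  # shut down the current task before initializing the next one

task2 = Task.init(project_name="examples", task_name="second task")
# ... everything logged here is captured by task2 ...
task2.close()
```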
When the script is executed, the console should display the following output:

View File

@ -30,7 +30,7 @@ The sections below describe in more detail what happens in the controller task a
## The Pipeline Controller
1. Create the [pipeline controller](../../references/sdk/automation_controller_pipelinecontroller.md) object.
1. Create the [PipelineController](../../references/sdk/automation_controller_pipelinecontroller.md) object:
```python
pipe = PipelineController(
@ -90,7 +90,7 @@ The sections below describe in more detail what happens in the controller task a
The [third step](#step-3---training-the-network) uses the pre-existing task `pipeline step 3 train model` in the
`examples` project. The step uses Step 2's artifacts.
1. Run the pipeline.
1. Run the pipeline:
```python
pipe.start()
@ -103,7 +103,7 @@ The sections below describe in more detail what happens in the controller task a
The pipeline's first step ([step1_dataset_artifact.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step1_dataset_artifact.py))
does the following:
1. Download data using [`StorageManager.get_local_copy`](../../references/sdk/storage.md#storagemanagerget_local_copy)
1. Download data using [`StorageManager.get_local_copy()`](../../references/sdk/storage.md#storagemanagerget_local_copy):
```python
# simulate local dataset, download one, so we have something local
@ -111,7 +111,7 @@ does the following:
remote_url='https://github.com/allegroai/events/raw/master/odsc20-east/generic/iris_dataset.pkl'
)
```
1. Store the data as an artifact named `dataset` using [`Task.upload_artifact`](../../references/sdk/task.md#upload_artifact)
1. Store the data as an artifact named `dataset` using [`Task.upload_artifact()`](../../references/sdk/task.md#upload_artifact):
```python
# add and upload local file containing our toy dataset
task.upload_artifact('dataset', artifact_object=local_iris_pkl)
@ -137,7 +137,7 @@ does the following:
```
1. Download the data created in the previous step (specified through the `dataset_url` parameter) using
[`StorageManager.get_local_copy`](../../references/sdk/storage.md#storagemanagerget_local_copy)
[`StorageManager.get_local_copy()`](../../references/sdk/storage.md#storagemanagerget_local_copy):
```python
iris_pickle = StorageManager.get_local_copy(remote_url=args['dataset_url'])
@ -167,13 +167,13 @@ does the following:
task.connect(args)
```
1. Clone the base task and enqueue it using [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely).
1. Clone the base task and enqueue it using [`Task.execute_remotely()`](../../references/sdk/task.md#execute_remotely):
```python
task.execute_remotely()
```
1. Access the data created in the previous task.
1. Access the data created in the previous task:
```python
dataset_task = Task.get_task(task_id=args['dataset_task_id'])
@ -189,14 +189,14 @@ does the following:
**To run the pipeline:**
1. If the pipeline steps tasks do not yet exist, run their code to create the ClearML tasks.
1. If the pipeline step tasks do not yet exist, run their code to create the ClearML tasks:
```bash
python step1_dataset_artifact.py
python step2_data_processing.py
python step3_train_model.py
```
1. Run the pipeline controller.
1. Run the pipeline controller:
```bash
python pipeline_from_tasks.py

View File

@ -23,7 +23,7 @@ logged as required packages for the pipeline execution step.
## Pipeline Controller
1. Create the [PipelineController](../../references/sdk/automation_controller_pipelinecontroller.md) object.
1. Create the [PipelineController](../../references/sdk/automation_controller_pipelinecontroller.md) object:
```python
pipe = PipelineController(
@ -98,7 +98,7 @@ logged as required packages for the pipeline execution step.
)
```
1. Run the pipeline.
1. Run the pipeline:
```python
pipe.start()
```

View File

@ -11,14 +11,14 @@ artifact and utilizes it.
## Task 1: Uploading an Artifact
The first task uploads a data file as an artifact using the [`Task.upload_artifact`](../../references/sdk/task.md#upload_artifact)
method, inputting the artifact's name and the location of the file.
The first task uploads a data file as an artifact using [`Task.upload_artifact()`](../../references/sdk/task.md#upload_artifact),
inputting the artifact's name and the location of the file.
```python
task1.upload_artifact(name='data file', artifact_object='data_samples/sample.json')
```
The task is then closed, using the [`Task.close`](../../references/sdk/task.md#close) method, so another task can be
The task is then closed, using [`Task.close()`](../../references/sdk/task.md#close), so another task can be
initialized in the same script.
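A possible sketch of the hand-off, assuming `task1` is the first task (the second task's name is a placeholder):
```python
# close the first task, then initialize a second task in the same script
task1.close()
task2 = Task.init(project_name="examples", task_name="use artifact")

# fetch the first task and get a local copy of its uploaded artifact
artifact_task = Task.get_task(task_id=task1.id)
local_json = artifact_task.artifacts['data file'].get_local_copy()
```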
Artifact details (location and size) can be viewed in ClearML's **web UI > experiment details > ARTIFACTS tab > OTHER section**.