Add info for non-AWS S3 storage (#783)

pollfly
2024-02-26 17:07:59 +02:00
committed by GitHub
parent 90faaf234e
commit 91fcaa2f24
7 changed files with 39 additions and 6 deletions

@@ -104,7 +104,12 @@ clearml-data remove [-h] [--id ID] [--files [FILES [FILES ...]]]
## upload
Upload the local dataset changes to the server. By default, it's uploaded to the [ClearML Server](../deploying_clearml/clearml_server.md). You can specify a different storage
medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`.
medium by entering an upload destination. For example:
* A shared folder: `/mnt/shared/folder`
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
```bash
clearml-data upload [-h] [--id ID] [--storage STORAGE] [--chunk-size CHUNK_SIZE]

@@ -62,7 +62,13 @@ and auto-increments the version number.
:::
Use the `output_uri` parameter to specify a network storage target to upload the dataset files, and associated information
(such as previews) to (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://<account name>.blob.core.windows.net/path/to/file`, `file:///mnt/share/data`).
(such as previews) to. For example:
* A shared folder: `/mnt/share/folder`
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
By default, the dataset uploads to ClearML's file server. The `output_url` parameter of the [`Dataset.upload`](#uploading-files)
method overrides this parameter's value.
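
A minimal sketch of setting the storage target at creation time, assuming the `clearml` package is installed and credentials for the target storage are already configured. The dataset name, project, and MinIO address below are hypothetical placeholders, not values from the documentation.

```python
# Hypothetical destination following the s3://host_addr:port/bucket pattern
MINIO_DESTINATION = "s3://minio.example.com:9000/datasets"


def create_and_upload(local_path: str, destination: str = MINIO_DESTINATION) -> None:
    """Create a dataset, add local files, and upload them to `destination`."""
    # Imported lazily so this module loads even where clearml is not installed
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name="example-dataset",     # placeholder name
        dataset_project="example-project",  # placeholder project
        output_uri=destination,             # storage target for this dataset
    )
    dataset.add_files(path=local_path)
    # No output_url is passed, so the target given at creation is used
    dataset.upload()
    dataset.finalize()
```

Calling `create_and_upload("/path/to/local/files")` would require a reachable ClearML Server and storage endpoint.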
@@ -248,7 +254,13 @@ dataset.get_logger().report_histogram(
To upload the dataset files to network storage, use the [`Dataset.upload`](../references/sdk/dataset.md#upload) method.
Use the `output_url` parameter to specify storage target, such as S3 / GS / Azure (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://<account name>.blob.core.windows.net/path/to/file`, `/mnt/share/data`).
Use the `output_url` parameter to specify a storage target, such as S3 / GS / Azure. For example:
* A shared folder: `/mnt/share/folder`
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
By default, the dataset uploads to ClearML's file server. This target storage overrides the `output_uri` value of the
[`Dataset.create`](#creating-datasets) method.
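
To illustrate how the destination formats listed above differ, here is a small sketch that classifies a destination string by its scheme. The function `describe_destination` is a hypothetical helper, not part of ClearML, and treating an explicit port as the sign of a non-AWS S3 endpoint is an assumption based on the `s3://host_addr:port/bucket` pattern.

```python
from urllib.parse import urlparse


def describe_destination(destination: str) -> str:
    """Return a rough label for a storage destination string."""
    parsed = urlparse(destination)
    if parsed.scheme in ("", "file"):
        # Plain paths such as /mnt/share/folder have no scheme
        return "shared folder"
    if parsed.scheme == "s3":
        # Assumption: an explicit port means a non-AWS S3-like service
        return "non-AWS S3" if parsed.port is not None else "S3"
    if parsed.scheme == "gs":
        return "Google Cloud Storage"
    if parsed.scheme == "azure":
        return "Azure Storage"
    return "unknown"


if __name__ == "__main__":
    for url in ("/mnt/share/folder", "s3://bucket/folder",
                "s3://minio.example.com:9000/bucket", "gs://bucket-name/folder"):
        print(url, "->", describe_destination(url))
```

A string validated this way could then be passed as `output_url` to `Dataset.upload`.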