Add info for non-AWS S3 storage (#783)

pollfly 2024-02-26 17:07:59 +02:00 committed by GitHub
parent 90faaf234e
commit 91fcaa2f24
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
7 changed files with 39 additions and 6 deletions

View File

@@ -104,7 +104,12 @@ clearml-data remove [-h] [--id ID] [--files [FILES [FILES ...]]]
## upload
Upload the local dataset changes to the server. By default, it's uploaded to the [ClearML Server](../deploying_clearml/clearml_server.md). You can specify a different storage
medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`.
medium by entering an upload destination. For example:
* A shared folder: `/mnt/shared/folder`
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
```bash
clearml-data upload [-h] [--id ID] [--storage STORAGE] [--chunk-size CHUNK_SIZE]

View File

@@ -62,7 +62,13 @@ and auto-increments the version number.
:::
Use the `output_uri` parameter to specify a network storage target to upload the dataset files, and associated information
(such as previews) to (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://<account name>.blob.core.windows.net/path/to/file`, `file:///mnt/share/data`).
(such as previews) to. For example:
* A shared folder: `/mnt/share/folder`
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
By default, the dataset uploads to ClearML's file server. The `output_uri` parameter of the [`Dataset.upload`](#uploading-files)
method overrides this parameter's value.
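
For instance, a minimal sketch of creating a dataset that uploads to a non-AWS S3-like endpoint (the project name, dataset name, host address, port, and bucket below are placeholders):

```python
from clearml import Dataset

# A sketch: the dataset's files (and previews) will upload to a MinIO-style endpoint.
# The names and the s3://host:port/bucket URI are placeholders -- adjust to your setup.
dataset = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="my_project",
    output_uri="s3://127.0.0.1:9000/datasets-bucket/folder"
)
```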
@@ -248,7 +254,13 @@ dataset.get_logger().report_histogram(
To upload the dataset files to network storage, use the [`Dataset.upload`](../references/sdk/dataset.md#upload) method.
Use the `output_url` parameter to specify storage target, such as S3 / GS / Azure (e.g. `s3://bucket/data`, `gs://bucket/data`, `azure://<account name>.blob.core.windows.net/path/to/file`, `/mnt/share/data`).
Use the `output_url` parameter to specify the storage target, such as S3 / GS / Azure. For example:
* A shared folder: `/mnt/share/folder`
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
By default, the dataset uploads to ClearML's file server. This target storage overrides the `output_uri` value of the
[`Dataset.create`](#creating-datasets) method.
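
For instance, a minimal sketch (the `dataset` object is assumed to already exist, and the endpoint is a placeholder):

```python
# A sketch: push the dataset files to a non-AWS S3-like service (e.g. MinIO).
# The host address, port, and bucket are placeholders.
dataset.upload(output_url="s3://127.0.0.1:9000/datasets-bucket/folder")
```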

View File

@@ -782,6 +782,15 @@ task = Task.init(
)
```
:::tip Output URI Formats
Specify the model storage URI location using the relevant format:
* A shared folder: `/mnt/share/folder`
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
:::
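
For instance, a minimal sketch that sends a task's model uploads to a non-AWS S3-like endpoint (the project name, task name, host address, port, and bucket are placeholders; credentials for such an endpoint are assumed to be configured in `clearml.conf`):

```python
from clearml import Task

# A sketch: models created by this task upload to a MinIO-style endpoint.
# All names and the s3://host:port/bucket URI are placeholders.
task = Task.init(
    project_name="examples",
    task_name="model_storage_example",
    output_uri="s3://127.0.0.1:9000/models-bucket/folder"
)
```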
To automatically store all models created by any experiment at a specific location, edit the `clearml.conf` (see
[ClearML Configuration Reference](../configs/clearml_conf.md#sdkdevelopment)) and set `sdk.development.default_output_uri`
to the desired storage (see [Storage](../integrations/storage.md)). This is especially helpful when

View File

@@ -202,7 +202,7 @@ clearml-serving model upload [-h] --name NAME [--tags TAGS [TAGS ...]] --project
|`--publish`| Publish the newly created model (change the model state to "published", i.e. locked and ready to deploy)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--path`|Specify a model file/folder to be uploaded and registered| <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--url`| Specify an already uploaded model URL (e.g. `s3://bucket/model.bin`, `gs://bucket/model.bin`)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--destination`|Specify the target destination for the model to be uploaded (e.g. `s3://bucket/folder/`, `gs://bucket/folder/`)|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
|`--destination`|Specify the target destination for the model to be uploaded. For example: `s3://bucket/folder/`, `s3://host_addr:port/bucket` (for non-AWS S3-like services like MinIO), `gs://bucket-name/folder`, `azure://<account name>.blob.core.windows.net/path/to/file`|<img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" />|
</div>

View File

@@ -104,7 +104,7 @@ or with the `clearml-serving` CLI.
:::info Model Storage
You can also provide a different storage destination for the model, such as S3/GS/Azure, by passing
`--destination="s3://bucket/folder"`, `gs://bucket/folder`, `azure://<account name>.blob.core.windows.net/path/to/file`. There is no need to provide a unique
`--destination="s3://bucket/folder"`, `s3://host_addr:port/bucket` (for non-AWS S3-like services like MinIO), `gs://bucket/folder`, `azure://<account name>.blob.core.windows.net/path/to/file`. There is no need to provide a unique
path to the destination argument; the location of the model will be a unique path based on the serving service ID and the
model name.
:::

View File

@@ -670,7 +670,13 @@ the experiment's ID. If the experiment's ID is `6ea4f0b56d994320a713aeaf13a86d9d`
/mnt/shared/folder/task.6ea4f0b56d994320a713aeaf13a86d9d/models/
```
ClearML supports other storage types for `output_uri`, including:
ClearML supports other storage types for `output_uri`:
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
For example:
```python
# AWS S3 bucket
task = Task.init(project_name, task_name, output_uri="s3://bucket-name/folder")
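
# Non-AWS S3-like services (e.g. MinIO) -- a sketch; the host address, port, and bucket are placeholders
task = Task.init(project_name, task_name, output_uri="s3://host_addr:port/bucket-name/folder")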

View File

@@ -86,6 +86,7 @@ and formats for specifying locations include:
* A shared folder: `/mnt/share/folder`
* S3: `s3://bucket/folder`
* Non-AWS S3-like services (e.g. MinIO): `s3://host_addr:port/bucket`
* Google Cloud Storage: `gs://bucket-name/folder`
* Azure Storage: `azure://<account name>.blob.core.windows.net/path/to/file`
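
Any of these formats can also be exercised directly through the SDK's `StorageManager`, which is a handy way to check that a storage target (and the credentials configured for it in `clearml.conf`) works before pointing tasks or datasets at it. A minimal sketch, with the local path and the endpoint as placeholders:

```python
from clearml import StorageManager

# A sketch: upload a local file to a non-AWS S3-like endpoint (e.g. MinIO).
# The local path, host address, port, and bucket are placeholders.
uploaded_url = StorageManager.upload_file(
    local_file="/tmp/example.txt",
    remote_url="s3://127.0.0.1:9000/my-bucket/example.txt",
)
print(uploaded_url)
```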