Update StorageManager examples (#897)

pollfly 2024-08-08 12:51:10 +03:00 committed by GitHub
parent 3ce20be57d
commit 9a61779a34
2 changed files with 58 additions and 14 deletions


@ -1,24 +1,22 @@
--- ---
title: Storage Examples title: StorageManager
--- ---
This page describes storage examples using the [StorageManager](../../references/sdk/storage.md) This page describes storage examples using the [`StorageManager`](../../references/sdk/storage.md)
class. The storage examples include: class. The storage examples include:
* [Downloading a file](#downloading-a-file) - Get an object from storage. * Downloading [files](#downloading-a-file) and [folders](#downloading-a-folder) - Get an object from storage.
* [Uploading a file](#uploading-a-file) - Upload an object. * Uploading [files](#uploading-a-file) and [folders](#uploading-a-folder) - Upload an object.
* [Setting cache limits](#setting-cache-limits) - Set the maximum number of objects. * [Setting cache limits](#setting-cache-limits)
:::note :::note
`StorageManager` supports HTTP(S), S3, Google Cloud Storage, Azure, and file system folders. `StorageManager` supports HTTP(S), S3, Google Cloud Storage, Azure, and file system folders.
::: :::
## StorageManager ## Working with Files
### Downloading a File ### Downloading a File
To download a ZIP file from storage to the `global` cache context, call the [`StorageManager.get_local_copy`](../../references/sdk/storage.md#storagemanagerget_local_copy) To download a ZIP file from storage to the `global` cache context, use the [`StorageManager.get_local_copy`](../../references/sdk/storage.md#storagemanagerget_local_copy)
class method, and specify the remote file's location as the `remote_url` argument: class method, and specify the remote file's location as the `remote_url` argument:
```python ```python
@ -47,15 +45,19 @@ By default, the `StorageManager` reports its download progress to the console ev
[`StorageManager.set_report_download_chunk_size`](../../references/sdk/storage.md#storagemanagerset_report_download_chunk_size) [`StorageManager.set_report_download_chunk_size`](../../references/sdk/storage.md#storagemanagerset_report_download_chunk_size)
class method, and specifying the chunk size in MB (not supported for Azure and GCP storage). class method, and specifying the chunk size in MB (not supported for Azure and GCP storage).
```python
StorageManager.set_report_download_chunk_size(chunk_size_mb=10)
```
### Uploading a File ### Uploading a File
To upload a file to storage, call the [`StorageManager.upload_file`](../../references/sdk/storage.md#storagemanagerupload_file) To upload a file to storage, use the [`StorageManager.upload_file`](../../references/sdk/storage.md#storagemanagerupload_file)
class method. Specify the full path of the local file as the `local_file` argument, and the remote URL as the `remote_url` class method. Specify the full path of the local file as the `local_file` argument, and the remote URL as the `remote_url`
argument. argument.
```python ```python
StorageManager.upload_file( StorageManager.upload_file(
local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder" local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder/also_file.ext"
) )
``` ```
@ -65,10 +67,52 @@ By default, the `StorageManager` reports its upload progress to the console ever
[`StorageManager.set_report_upload_chunk_size`](../../references/sdk/storage.md#storagemanagerset_report_upload_chunk_size) [`StorageManager.set_report_upload_chunk_size`](../../references/sdk/storage.md#storagemanagerset_report_upload_chunk_size)
class method, and specifying the chunk size in MB (not supported for Azure and GCP storage). class method, and specifying the chunk size in MB (not supported for Azure and GCP storage).
```python
StorageManager.set_report_upload_chunk_size(chunk_size_mb=10)
```
### Setting Cache Limits ## Working with Folders
### Downloading a Folder
Download a folder to a local machine using the [`StorageManager.download_folder`](../../references/sdk/storage.md#storagemanagerdownload_folder)
class method. Specify the remote storage location as the `remote_url` argument and the target local location as the
`local_folder` argument.
To set a limit on the number of files cached, call the [`StorageManager.set_cache_file_limit`](../../references/sdk/storage.md#storagemanagerset_cache_file_limit) ```python
StorageManager.download_folder(remote_url="s3://bucket/", local_folder="/folder/")
```
This method downloads a remote folder recursively, maintaining the sub-folder structure from
the remote storage.
For example, if you have a remote file `s3://bucket/sub/file.ext`, then
`StorageManager.download_folder(remote_url="s3://bucket/", local_folder="/folder/")` will create `/folder/sub/file.ext`.
To download only the files that match a wildcard pattern, pass the `match_wildcard` argument.
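The path mapping and wildcard filtering described above can be sketched in plain Python. This is an illustration only, not ClearML's implementation: the helper names `local_path_for` and `plan_download` are hypothetical, and matching the full remote path with `fnmatch` is an assumption about how `match_wildcard` is evaluated.

```python
import fnmatch
import posixpath


def local_path_for(remote_file, remote_url, local_folder):
    """Map a remote object URL to the local path it would be saved under."""
    # Path of the object relative to the remote folder being downloaded
    relative = remote_file[len(remote_url):].lstrip("/")
    return posixpath.join(local_folder, relative)


def plan_download(remote_files, remote_url, local_folder, match_wildcard=None):
    """Return {remote_file: local_path} for the files that would be downloaded."""
    plan = {}
    for remote_file in remote_files:
        # Ignore objects that are not under the requested remote folder
        if not remote_file.startswith(remote_url):
            continue
        # Assumption: the wildcard is matched against the full remote path
        if match_wildcard and not fnmatch.fnmatch(remote_file, match_wildcard):
            continue
        plan[remote_file] = local_path_for(remote_file, remote_url, local_folder)
    return plan


if __name__ == "__main__":
    files = ["s3://bucket/sub/file.ext", "s3://bucket/notes.txt"]
    print(plan_download(files, "s3://bucket/", "/folder/", match_wildcard="*.ext"))
```

The sub-folder `sub/` is preserved under `/folder/`, and `notes.txt` is skipped because it does not match the pattern.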
### Uploading a Folder
Upload a local folder to remote storage using the [`StorageManager.upload_folder`](../../references/sdk/storage.md#storagemanagerupload_folder)
class method. Specify the local folder to upload as the `local_folder` argument and the target remote location as the
`remote_url` argument.
```python
StorageManager.upload_folder(local_folder="/LocalFolder", remote_url="s3://MyBucket/MyFolder")
```
This method uploads the local folder recursively to remote storage, maintaining the sub-folder structure from the local
storage.
For example, if you have a local file `/LocalFolder/sub/file.ext`, then `StorageManager.upload_folder(local_folder="/LocalFolder", remote_url="s3://MyBucket/MyFolder")`
will create `s3://MyBucket/MyFolder/sub/file.ext`.
Use the `retries` parameter to set the number of upload attempts for each file in the folder in case
of failure. If some files fail to upload, the incomplete folder will remain in the target location with the
successfully uploaded files.
To upload only the files that match a wildcard pattern, pass the `match_wildcard` argument.
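The per-file retry behavior described above, where files that keep failing are skipped and the successfully uploaded files stay in place, can be sketched as follows. This is a minimal illustration under stated assumptions, not ClearML's code: `upload_with_retries` and the `upload_one` callable are hypothetical names, and `retries` is treated here as the total number of attempts per file.

```python
def upload_with_retries(files, upload_one, retries=3):
    """Try each file up to `retries` times; return (uploaded, failed) lists."""
    uploaded, failed = [], []
    for path in files:
        for _attempt in range(retries):
            try:
                upload_one(path)  # hypothetical single-file upload callable
            except Exception:
                continue  # this attempt failed, try again if attempts remain
            uploaded.append(path)
            break
        else:
            # All attempts failed: files uploaded earlier are left in place
            failed.append(path)
    return uploaded, failed
```

A caller can inspect the returned `failed` list to decide whether to rerun the upload for the missing files.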
## Setting Cache Limits
To set a limit on the number of files cached, use the [`StorageManager.set_cache_file_limit`](../../references/sdk/storage.md#storagemanagerset_cache_file_limit)
class method and specify the `cache_file_limit` argument as the maximum number of files. This does not limit the cache size, class method and specify the `cache_file_limit` argument as the maximum number of files. This does not limit the cache size,
only the number of files. only the number of files.


@ -211,7 +211,7 @@ fails to load the credentials as a file, it will attempt to decode the JSON dire
ClearML provides the [StorageManager](../references/sdk/storage.md) class to manage downloading, uploading, and caching of ClearML provides the [StorageManager](../references/sdk/storage.md) class to manage downloading, uploading, and caching of
content directly from code. content directly from code.
See [Storage Examples](../guides/storage/examples_storagehelper.md). See [StorageManager Examples](../guides/storage/examples_storagehelper.md).
### Path Substitution ### Path Substitution
The ClearML StorageManager supports local path substitution when fetching files. The ClearML StorageManager supports local path substitution when fetching files.