---
title: StorageManager
---
This page provides examples of working with storage using the [`StorageManager`](../../references/sdk/storage.md)
class. The examples include:
* Downloading [files](#downloading-a-file) and [folders](#downloading-a-folder) - Get an object from storage.
* Uploading [files](#uploading-a-file) and [folders](#uploading-a-folder) - Upload an object.
* [Setting cache limits](#setting-cache-limits)

:::note
`StorageManager` supports HTTP(S), S3, Google Cloud Storage, Azure, and file system folders.
:::
## Working with Files
### Downloading a File
To download a ZIP file from storage to the `global` cache context, use the [`StorageManager.get_local_copy()`](../../references/sdk/storage.md#storagemanagerget_local_copy)
class method, and specify the file's remote location as the `remote_url` argument:
```python
from clearml import StorageManager

StorageManager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.zip")
```
:::note
ZIP and `tar.gz` files are automatically extracted to the cache. Use the `extract_archive` argument to control this behavior.
:::

To download a file to a specific context in cache, specify the name of the context as the `cache_context` argument:
```python
StorageManager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext", cache_context="test")
```
To download a file without extracting it (for example, to keep an archive compressed), set the `extract_archive` argument to `False`:
```python
StorageManager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext", extract_archive=False)
```
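The calls above discard the return value, but `get_local_copy()` returns the path of the local copy, which is usually what the rest of a script needs. The following is a minimal sketch of a wrapper that checks that return value. It assumes that a failed download is reported as `None`; the helper name `fetch_or_fail` and its `fetch` parameter are hypothetical and exist only so the logic can be tried without a storage backend. A call such as `fetch_or_fail("s3://MyBucket/MyFolder/file.zip", extract_archive=False)` would then either return a usable path or raise.

```python
from typing import Callable, Optional


def fetch_or_fail(
    remote_url: str,
    fetch: Optional[Callable[..., Optional[str]]] = None,
    **kwargs,
) -> str:
    """Return the local path for remote_url, raising if the download fails.

    `fetch` is a hypothetical injection point; by default the helper calls
    StorageManager.get_local_copy().
    """
    if fetch is None:
        # Imported lazily so the helper can be exercised without ClearML installed.
        from clearml import StorageManager

        fetch = StorageManager.get_local_copy

    # Extra keyword arguments (cache_context, extract_archive) are passed through.
    local_path = fetch(remote_url=remote_url, **kwargs)
    if local_path is None:
        # Assumption: a failed download yields None rather than an exception.
        raise FileNotFoundError(f"Could not download {remote_url}")
    return local_path
```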
By default, the `StorageManager` reports its download progress to the console every 5 MB. You can change this using the
[`StorageManager.set_report_download_chunk_size()`](../../references/sdk/storage.md#storagemanagerset_report_download_chunk_size)
class method, specifying the chunk size in MB (not supported for Azure and GCP storage).
```python
StorageManager.set_report_download_chunk_size(chunk_size_mb=10)
```
### Uploading a File
To upload a file to storage, use the [`StorageManager.upload_file()`](../../references/sdk/storage.md#storagemanagerupload_file)
class method. Specify the full path of the local file as the `local_file` argument, and the remote URL as the `remote_url`
argument.
```python
StorageManager.upload_file(
    local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder/also_file.ext"
)
```
Use the `retries` parameter to set the number of times a file upload is retried in case of failure.

By default, the `StorageManager` reports its upload progress to the console every 5 MB. You can change this using the
[`StorageManager.set_report_upload_chunk_size()`](../../references/sdk/storage.md#storagemanagerset_report_upload_chunk_size)
class method, specifying the chunk size in MB (not supported for Azure and GCP storage).
```python
StorageManager.set_report_upload_chunk_size(chunk_size_mb=10)
```
## Working with Folders
### Downloading a Folder
Download a folder to a local machine using the [`StorageManager.download_folder()`](../../references/sdk/storage.md#storagemanagerdownload_folder)
class method. Specify the remote storage location as the `remote_url` argument and the target local location as the
`local_folder` argument.
```python
StorageManager.download_folder(remote_url="s3://bucket/", local_folder="/folder/")
```
This method downloads a remote folder recursively, maintaining the sub-folder structure from
the remote storage.

For example, if you have a remote file `s3://bucket/sub/file.ext`, then
`StorageManager.download_folder(remote_url="s3://bucket/", local_folder="/folder/")` will create `/folder/sub/file.ext`.

Use the `match_wildcard` argument to download only files matching the wildcard.
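
The path mapping described above can be sketched in plain Python. This is only an illustration of how the sub-folder structure carries over, not ClearML's implementation; the helper name `local_path_for` is hypothetical.

```python
import posixpath


def local_path_for(remote_file: str, remote_url: str, local_folder: str) -> str:
    """Map a remote file to the local path that mirrors its sub-folder structure."""
    # Normalize the remote prefix so it always ends with exactly one slash.
    prefix = remote_url.rstrip("/") + "/"
    if not remote_file.startswith(prefix):
        raise ValueError(f"{remote_file} is not under {remote_url}")
    # Everything after the prefix is the path relative to the remote folder.
    relative = remote_file[len(prefix):]
    return posixpath.join(local_folder, relative)


print(local_path_for("s3://bucket/sub/file.ext", "s3://bucket/", "/folder/"))
# prints: /folder/sub/file.ext
```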
### Uploading a Folder
Upload a local folder to remote storage using the [`StorageManager.upload_folder()`](../../references/sdk/storage.md#storagemanagerupload_folder)
class method. Specify the local folder to upload as the `local_folder` argument and the target remote location as the
`remote_url` argument.
```python
StorageManager.upload_folder(local_folder="/LocalFolder", remote_url="s3://MyBucket/MyFolder")
```
This method uploads the local folder recursively to remote storage, maintaining the sub-folder structure from the local
storage.

For example, if you have a local file `/LocalFolder/sub/file.ext`, then `StorageManager.upload_folder(local_folder="/LocalFolder", remote_url="s3://MyBucket/MyFolder")`
will create `s3://MyBucket/MyFolder/sub/file.ext`.

Use the `retries` parameter to set the number of upload attempts for each file in the folder in case
of failure. If some files fail to upload, the incomplete folder will remain in the target location with the
successfully uploaded files.

Use the `match_wildcard` argument to upload only files matching the wildcard.
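
Before uploading a large folder, it can help to preview which files a wildcard would select and where they would land. The sketch below is a dry run in plain Python and does not call ClearML. It assumes the wildcard is a shell-style pattern matched against the full local file path, as Python's `fnmatch` does; whether `upload_folder()` matches in exactly this way is an assumption. The helper name `plan_upload` is hypothetical.

```python
import fnmatch
import os
from typing import Iterator, Optional, Tuple


def plan_upload(
    local_folder: str, remote_url: str, match_wildcard: Optional[str] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (local_file, target_url) pairs for a recursive folder upload."""
    base = remote_url.rstrip("/")
    for root, _dirs, files in os.walk(local_folder):
        for name in sorted(files):
            local_file = os.path.join(root, name)
            # Assumption: shell-style matching against the full local path.
            if match_wildcard and not fnmatch.fnmatch(local_file, match_wildcard):
                continue
            # Keep the sub-folder structure, using forward slashes in the URL.
            relative = os.path.relpath(local_file, local_folder).replace(os.sep, "/")
            yield local_file, f"{base}/{relative}"


if __name__ == "__main__":
    for local_file, target in plan_upload("/LocalFolder", "s3://MyBucket/MyFolder", "*.ext"):
        print(f"{local_file} -> {target}")
```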
## Setting Cache Limits
To set a limit on the number of files cached, use the [`StorageManager.set_cache_file_limit()`](../../references/sdk/storage.md#storagemanagerset_cache_file_limit)
class method and specify the `cache_file_limit` argument as the maximum number of files. This limits only the number of
files, not the cache size.
```python
new_cache_limit = StorageManager.set_cache_file_limit(cache_file_limit=100)
```