---
title: Storage Examples
---
This page describes storage examples using the [StorageManager](../../references/sdk/storage.md)
class. The storage examples include:
* [Downloading a file](#downloading-a-file) - Get an object from storage.
* [Uploading a file](#uploading-a-file) - Upload an object.
* [Setting cache limits](#setting-cache-limits) - Set the maximum number of files kept in the cache.
:::note
`StorageManager` supports http(s), S3, Google Cloud Storage, Azure, and file system folders.
:::
## StorageManager
### Downloading a File
To download a ZIP file from storage to the `global` cache context, call the [StorageManager.get_local_copy](../../references/sdk/storage.md#storagemanagerget_local_copy)
class method, and specify the file's storage location as the `remote_url` argument:
```python
from clearml import StorageManager
StorageManager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.zip")
```
:::note
Zip and tar.gz files are automatically extracted to the cache. This can be controlled with the `extract_archive` flag.
:::
To download a file to a specific context in cache, specify the name of the context as the `cache_context` argument:
```python
StorageManager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext", cache_context="test")
```
To download a file as is, without extracting it, set the `extract_archive` argument to `False`:
```python
StorageManager.get_local_copy(remote_url="s3://MyBucket/MyFolder/file.ext", extract_archive=False)
```
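
The download options above can be combined in one call. The following is a minimal sketch, not part of the SDK: `fetch_archive` is a hypothetical helper name, and the `None` check assumes that `get_local_copy` returns `None` when the object cannot be retrieved.

```python
from typing import Optional


def fetch_archive(remote_url: str, cache_context: Optional[str] = None) -> str:
    """Download remote_url into the cache without extracting it and return the local path."""
    # Imported inside the function so the helper can be defined before ClearML is configured.
    from clearml import StorageManager

    local_path = StorageManager.get_local_copy(
        remote_url=remote_url,
        cache_context=cache_context,  # None leaves the choice of context to StorageManager
        extract_archive=False,  # keep the file exactly as downloaded
    )
    if local_path is None:
        # Assumption: a failed download is signalled by a None return value.
        raise FileNotFoundError(f"Could not download {remote_url}")
    return local_path
```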
By default, the `StorageManager` reports its download progress to the console every 5MB. To change this, call the
[`StorageManager.set_report_download_chunk_size`](../../references/sdk/storage.md#storagemanagerset_report_download_chunk_size)
class method and specify the chunk size in MB (not supported for Azure and GCP storage).
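
As an illustration, the reporting interval can be set just before a download. This is a sketch only: `download_with_progress` and the 10 MB value are hypothetical, and the chunk size is passed positionally because the parameter name is not stated on this page.

```python
def download_with_progress(remote_url: str, report_every_mb: int = 10):
    """Set the download progress interval, then fetch remote_url into the cache."""
    # Imported inside the function so the helper can be defined before ClearML is configured.
    from clearml import StorageManager

    # Report progress every `report_every_mb` MB instead of the default 5MB.
    StorageManager.set_report_download_chunk_size(report_every_mb)
    return StorageManager.get_local_copy(remote_url=remote_url)
```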
### Uploading a File
To upload a file to storage, call the [StorageManager.upload_file](../../references/sdk/storage.md#storagemanagerupload_file)
class method. Specify the full path of the local file as the `local_file` argument, and the remote URL as the `remote_url`
argument:
```python
StorageManager.upload_file(
    local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder"
)
```
Use the `retries` parameter to set the number of times the file upload should be retried in case of failure.

By default, the `StorageManager` reports its upload progress to the console every 5MB. To change this, call the
[`StorageManager.set_report_upload_chunk_size`](../../references/sdk/storage.md#storagemanagerset_report_upload_chunk_size)
class method and specify the chunk size in MB (not supported for Azure and GCP storage).
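
The retry and progress settings can be applied together. The sketch below is illustrative: `upload_with_retries`, the retry count of 5, and the 10 MB interval are hypothetical choices, not SDK defaults.

```python
def upload_with_retries(
    local_file: str, remote_url: str, retries: int = 5, report_every_mb: int = 10
):
    """Upload local_file to remote_url, retrying on failure and reporting progress."""
    # Imported inside the function so the helper can be defined before ClearML is configured.
    from clearml import StorageManager

    # Report upload progress every `report_every_mb` MB instead of the default 5MB.
    StorageManager.set_report_upload_chunk_size(report_every_mb)
    return StorageManager.upload_file(
        local_file=local_file,
        remote_url=remote_url,
        retries=retries,  # number of times a failed upload is retried
    )
```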
### Setting Cache Limits
To set a limit on the number of files cached, call the [StorageManager.set_cache_file_limit](../../references/sdk/storage.md#storagemanagerset_cache_file_limit)
class method and specify the maximum number of files as the `cache_file_limit` argument. This limits only the number of
files, not the total size of the cache:
```python
new_cache_limit = StorageManager.set_cache_file_limit(cache_file_limit=100)
```