Fix storage examples (#811)

pollfly 2024-03-26 10:10:37 +02:00 committed by GitHub
parent ec1f4d069f
commit 57be45d2a8

@@ -45,32 +45,34 @@ You can specify additional [ExtraArgs](https://boto3.amazonaws.com/v1/documentat
 to pass to boto3 when uploading files. You can set this on a per-bucket basis.
 ```
-aws {
-    s3 {
-        # S3 credentials, used for read/write access by various SDK elements
+sdk {
+    aws {
+        s3 {
+            # S3 credentials, used for read/write access by various SDK elements

-        # default, used for any bucket not specified below
-        key: ""
-        secret: ""
-        region: ""
-        use_credentials_chain: false
-        extra_args: {}
-        credentials: [
-            # specifies key/secret credentials to use when handling s3 urls (read or write)
-            {
-                bucket: "my-bucket-name"
-                key: ""
-                secret: ""
-                verify: "/path/to/ca/bundle.crt" OR false to not verify
-                use_credentials_chain: false
-            },
-        ]
-    }
-    boto3 {
-        pool_connections: 512
-        max_multipart_concurrency: 16
+            # default, used for any bucket not specified below
+            key: ""
+            secret: ""
+            region: ""
+            use_credentials_chain: false
+            extra_args: {}
+            credentials: [
+                # specifies key/secret credentials to use when handling s3 urls (read or write)
+                {
+                    bucket: "my-bucket-name"
+                    key: ""
+                    secret: ""
+                    verify: "/path/to/ca/bundle.crt" OR false to not verify
+                    use_credentials_chain: false
+                },
+            ]
+        }
+        boto3 {
+            pool_connections: 512
+            max_multipart_concurrency: 16
+        }
     }
 }
 ```
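
The change in this hunk is purely structural: the `aws` block moves under `sdk`. As a rough illustration of why the nesting matters, the sketch below models the configuration as nested Python dictionaries and resolves a dotted path. The `lookup` helper, the two sample dictionaries, and the region value are hypothetical and are not part of ClearML, which uses its own configuration loader.

```python
from typing import Any, Optional


def lookup(config: dict, dotted_path: str) -> Optional[Any]:
    """Walk a nested dict along a dotted path; return None if any part is missing."""
    node: Any = config
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# Hypothetical stand-ins for the old and the corrected example layouts.
old_layout = {"aws": {"s3": {"region": "us-east-1"}}}
new_layout = {"sdk": {"aws": {"s3": {"region": "us-east-1"}}}}

print(lookup(old_layout, "sdk.aws.s3.region"))  # None
print(lookup(new_layout, "sdk.aws.s3.region"))  # us-east-1
```

A reader that looks for settings under `sdk.aws.s3` finds nothing in the old layout, which is presumably what this commit corrects in the examples.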
@@ -79,36 +81,40 @@ AWS's S3 access parameters can be specified by referencing the standard environm
 For example:
 ```
-aws {
-    s3 {
-        # default, used for any bucket not specified below
-        key: ${AWS_ACCESS_KEY_ID}
-        secret: ${AWS_SECRET_ACCESS_KEY}
-        region: ${AWS_DEFAULT_REGION}
-    }
+sdk {
+    aws {
+        s3 {
+            # default, used for any bucket not specified below
+            key: ${AWS_ACCESS_KEY_ID}
+            secret: ${AWS_SECRET_ACCESS_KEY}
+            region: ${AWS_DEFAULT_REGION}
+        }
+    }
 }
 ```
 ClearML also supports [MinIO](https://github.com/minio/minio) by adding this configuration:
 ```
-aws {
-    s3 {
-        # default, used for any bucket not specified below
-        key: ""
-        secret: ""
-        region: ""
+sdk {
+    aws {
+        s3 {
+            # default, used for any bucket not specified below
+            key: ""
+            secret: ""
+            region: ""

-        credentials: [
-            {
-                # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
-                host: "my-minio-host:9000"
-                key: ""
-                secret: ""
-                multipart: false
-                secure: false
-            }
-        ]
-    }
+            credentials: [
+                {
+                    # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
+                    host: "my-minio-host:9000"
+                    key: ""
+                    secret: ""
+                    multipart: false
+                    secure: false
+                }
+            ]
+        }
+    }
 }
 ```
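
The comments in the S3 and MinIO examples describe a fallback order: a bucket-specific entry, then a host-wide entry, then the section defaults. The sketch below is one possible reading of that order in plain Python. The function `resolve_s3_credentials` and all key/secret values are hypothetical, and ClearML's actual matching logic may differ.

```python
from typing import Optional


def resolve_s3_credentials(s3_conf: dict, bucket: str, host: Optional[str] = None) -> dict:
    """Pick credentials for a bucket: bucket entry, then host entry, then defaults."""
    entries = s3_conf.get("credentials", [])
    # 1. An entry that names this bucket wins.
    for entry in entries:
        if entry.get("bucket") == bucket:
            return entry
    # 2. Otherwise an entry covering the whole host (e.g. a MinIO server) applies.
    if host is not None:
        for entry in entries:
            if entry.get("host") == host and "bucket" not in entry:
                return entry
    # 3. Fall back to the top-level key/secret/region of the section.
    return {k: v for k, v in s3_conf.items() if k != "credentials"}


# Placeholder values for illustration only.
s3_conf = {
    "key": "default-key",
    "secret": "default-secret",
    "region": "",
    "credentials": [
        {"bucket": "my-bucket-name", "key": "bucket-key", "secret": "bucket-secret"},
        {"host": "my-minio-host:9000", "key": "minio-key", "secret": "minio-secret",
         "multipart": False, "secure": False},
    ],
}

print(resolve_s3_credentials(s3_conf, "my-bucket-name")["key"])               # bucket-key
print(resolve_s3_credentials(s3_conf, "other", "my-minio-host:9000")["key"])  # minio-key
print(resolve_s3_credentials(s3_conf, "other")["key"])                        # default-key
```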
@@ -121,14 +127,16 @@ To enable TLS, pass `secure: true`.
 To configure Azure blob storage specify the account name and key.
 ```
-azure.storage {
-    containers: [
-        {
-            account_name: ""
-            account_key: ""
-            # container_name:
-        }
-    ]
+sdk {
+    azure.storage {
+        containers: [
+            {
+                account_name: ""
+                account_key: ""
+                # container_name:
+            }
+        ]
+    }
 }
 ```
@@ -136,14 +144,16 @@ Azure's storage access parameters can be specified by referencing the standard e
 For example:
 ```
-azure.storage {
-    containers: [
-        {
-            account_name: ${AZURE_STORAGE_ACCOUNT}
-            account_key: ${AZURE_STORAGE_KEY}
-            # container_name:
-        }
-    ]
+sdk {
+    azure.storage {
+        containers: [
+            {
+                account_name: ${AZURE_STORAGE_ACCOUNT}
+                account_key: ${AZURE_STORAGE_KEY}
+                # container_name:
+            }
+        ]
+    }
 }
 ```
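
The `${AZURE_STORAGE_ACCOUNT}`-style references in this example are resolved from the environment when the configuration is loaded. The sketch below imitates that substitution in plain Python to show the idea. The `substitute_env` helper and the sample values are hypothetical. Leaving unknown names untouched is a choice made for this sketch, not a statement about how ClearML's loader behaves.

```python
import os
import re

# Matches ${NAME}, where NAME is a typical environment variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env=None) -> str:
    """Replace ${NAME} placeholders with values from env; keep unknown names as-is."""
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)


# Illustrative values only; pass env=None to read the real process environment.
sample_env = {"AZURE_STORAGE_ACCOUNT": "myaccount"}

print(substitute_env("${AZURE_STORAGE_ACCOUNT}", sample_env))  # myaccount
print(substitute_env("${AZURE_STORAGE_KEY}", sample_env))      # ${AZURE_STORAGE_KEY}
```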
@@ -154,36 +164,40 @@ It's also possible to specify credentials for a specific bucket in the `google.s
 configuration provided in the `google.storage` section is applied to any bucket without a bucket-specific configuration.
 ```
-google.storage {
-    # Default project and credentials file
-    # Will be used when no bucket configuration is found
-    project: "clearml"
-    credentials_json: "/path/to/credentials.json"
+sdk {
+    google.storage {
+        # Default project and credentials file
+        # Will be used when no bucket configuration is found
+        project: "clearml"
+        credentials_json: "/path/to/credentials.json"

-    # Specific credentials per bucket and sub directory
-    credentials = [
-        {
-            bucket: ""
-            subdir: "path/in/bucket" # Not required
-            project: ""
-            credentials_json: "/path/to/credentials.json"
-        },
-    ]
+        # Specific credentials per bucket and sub directory
+        credentials = [
+            {
+                bucket: ""
+                subdir: "path/in/bucket" # Not required
+                project: ""
+                credentials_json: "/path/to/credentials.json"
+            },
+        ]
+    }
 }
 ```
 GCP's storage access parameters can be specified by referencing the standard environment variables if already defined.
 ```
-google.storage {
-    credentials = [
-        {
-            bucket: ""
-            subdir: "path/in/bucket" # Not required
-            project: ""
-            credentials_json: ${GOOGLE_APPLICATION_CREDENTIALS}
-        },
-    ]
+sdk {
+    google.storage {
+        credentials = [
+            {
+                bucket: ""
+                subdir: "path/in/bucket" # Not required
+                project: ""
+                credentials_json: ${GOOGLE_APPLICATION_CREDENTIALS}
+            },
+        ]
+    }
 }
 ```
@@ -208,8 +222,8 @@ substitution allows for registering the data into `clearml-data` once, and then
 To enable path substitution, modify the clearml.conf file and configure:
 ```bash
-sdk{
-    storage{
+sdk {
+    storage {
         path_substitution = [
             # Replace registered links with local prefixes,
             # Solve mapping issues, and allow for external resource caching.
@@ -233,18 +247,20 @@ piece twice!
 Configure cache location by modifying the [clearml.conf](../configs/clearml_conf.md) file:
 ```
-storage {
-    cache {
-        # Defaults to <system_temp_folder>/clearml_cache
-        default_base_dir: "~/.clearml/cache"
-    }
+sdk {
+    storage {
+        cache {
+            # Defaults to <system_temp_folder>/clearml_cache
+            default_base_dir: "~/.clearml/cache"
+        }

-    direct_access: [
-        # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
-        # or cached, and any download request will return a direct reference.
-        # Objects are specified in glob format, available for url and content_type.
-        { url: "file://*" } # file-urls are always directly referenced
-    ]
+        direct_access: [
+            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
+            # or cached, and any download request will return a direct reference.
+            # Objects are specified in glob format, available for url and content_type.
+            { url: "file://*" } # file-urls are always directly referenced
+        ]
+    }
 }
 ```