4.3 KiB
Handling the Google Cloud Storage breaking change
Rationale
Due to an issue with ClearML SDK versions 1.11.x, URLs of objects uploaded to the Google Cloud Storage were stored in the ClearML backend as a quoted string. This causes issues accessing these objects directly from the ClearML SDK. This is relevant for URLs of models, datasets, artifacts, and media files/debug samples. In case you have such objects uploaded with the affected ClearML SDK versions and wish to be able to access them programmatically using the ClearML SDK (note that access from the ClearML UI is still possible), you should perform one of the actions in the section below.
Recommended Steps
The code snippets below should serve as an example rather than an actual conversion script. Depending on what object you're trying to fix, you should pick the respective lines of code from step 1 and 2.
- You need to be able to download objects (models, datasets, media, artifacts) registered by affected versions. See the code snippet below and adjust it according to your use case to be able to get a local copy of the object
from clearml import Task, ImportModel
from urllib.parse import unquote # <- you will need this
ds_task = Task.get_task(dataset_id) # For Datasets
# OR
task = Task.get_task(task_id) # For Artifacts, Media, and Models
url = unquote(ds_task.artifacts['data'].url) # For Datasets
# OR
url = unquote(task.artifacts[artifact_name].url) # For Artifacts
# OR
model = InputModel(task.output_models_id['test_file']) # For Models associated to tasks
url = unquote(model.url)
# OR
model = InputModel(model_id) # For any Models
url = unquote(model.url)
# OR
samples = task.get_debug_samples(title, series) # For Media/Debug samples
sample_urls = [unquote(sample['url']) for sample in samples]
local_path = StorageManager.get_local_copy(url)
# NOTE: For Datasets you will need to unzip the `local_path`
- Once the object is downloaded locally, you can re-register it with the new version. See the snipped below and adjust according to your use case
from clearml import Task, Dataset, OutputModel
import os
ds = Dataset.create(dataset_name=task.name, dataset_projecte=task.get_project_name(), parents=[Dataset.get(dataset_id)]) # For Datasets
# OR
task = Task.get_task(task_name=task.name, project_name=task.get_project_name()) # For Artifacts, Media, and Models
ds.add_files(unzipped_local_path) # For Datasets
ds.finalize(auto_upload=True)
# OR
task.upload_artifact(name=artifact_name, artifact_object=local_path) # For Artifacts
# OR
model = OutputModel(task=task) # For any Models
model.update_weights(local_path) # note: if the original model was created with update_weights_package,
# preserve this behavior by saving the new one with update_weights_package too
# OR
for sample in samples:
task.get_logger().report_media(sample['metric'], sample['variant'], local_path=unquote(sample['url'])) # For Media/Debug samples
Alternative methods
These methods are more advanced (read "more likely to mess up"). If you're unsure whether to use them or not, better don't. Both methods described below will alter the existing objects. Note that you still need to run the code from step 1 to have access to all required metadata.
Method 1: You can try to alter the existing unpublished experiments/models using the lower-level APIClient
from clearml.backend_api.session.client import APIClient
client = APIClient()
client.tasks.add_or_update_artifacts(task=ds_task.id, force=True, artifacts=[{"uri": unquote(ds_task.artifacts['state'].url), "key": "state", "type": "dict"}])
client.tasks.add_or_update_artifacts(task=ds_task.id, force=True, artifacts=[{"uri": unquote(ds_task.artifacts['data'].url), "key": "data", "type": "custom"}]) # For datasets on completed dataset uploads
# OR
client.tasks.add_or_update_artifacts(task=task.id, force=True, artifacts=[{"uri": unquote(url), "key": artifact_name, "type": "custom"}]) # For artifacts on completed tasks
# OR
client.models.edit(model=model.id, force=True, uri=url) # For any unpublished Model
Method 2: There's an option available only to those who self-host their ClearML server. It is possible to manually update the values registered in MongoDB, but beware - this is an advanced procedure that should be handled with extreme care, as it can lead to an inconsistent state if mishandled.