diff --git a/docs/deploying_clearml/enterprise_deploy/change_artifact_links.md b/docs/deploying_clearml/enterprise_deploy/change_artifact_links.md
new file mode 100644
index 00000000..cbb100fd
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/change_artifact_links.md
@@ -0,0 +1,78 @@
+---
+title: Changing ClearML Artifact Links
+---
+
+This guide describes how to update artifact references in the ClearML Enterprise server.
+
+By default, artifacts are stored on the file server; however, external storage such as AWS S3, MinIO, or Google Cloud
+Storage may be used to store artifacts. References to these artifacts may exist in the ClearML databases: MongoDB and ElasticSearch.
+Use this procedure if external storage is being migrated to a different location or URL.
+
+:::important
+This procedure does not migrate the actual data; it only changes the references in ClearML that point to the data.
+:::
+
+## Preparation
+
+### Version Confirmation
+
+To change the links, use the `fix_fileserver_urls.py` script, located inside the `allegro-apiserver`
+Docker container. The script is executed from within the `apiserver` container. Make sure the `apiserver` version
+is 3.20 or higher.
+
+### Backup
+
+It is highly recommended to back up the ClearML MongoDB and ElasticSearch databases before running the script, as the
+script changes values in the databases and these changes cannot be undone.
+
+## Fixing MongoDB Links
+
+1. Access the `apiserver` Docker container:
+   * In `docker-compose`:
+
+     ```commandline
+     sudo docker exec -it allegro-apiserver /bin/bash
+     ```
+
+   * In Kubernetes:
+
+     ```commandline
+     kubectl exec -it <apiserver-pod-name> -n clearml -- bash
+     ```
+
+1. Navigate to the script location in the `upgrade` folder:
+
+   ```commandline
+   cd /opt/seematics/apiserver/server/upgrade
+   ```
+
+1. Run the following command:
+
+   :::important
+   Before running the script, verify that you are using the correct script version (from `apiserver` v3.20 or higher,
+   or a script provided by ClearML that was copied into the container).
+   :::
+
+   ```commandline
+   python3 fix_fileserver_urls.py \
+     --mongo-host mongodb://mongo:27017 \
+     --elastic-host elasticsearch:9200 \
+     --host-source "<source storage URL>" \
+     --host-target "<target storage URL>" --datasets
+   ```
+
+:::note Notes
+* If the MongoDB or ElasticSearch services are accessed from the `apiserver` container using custom addresses, update the
+`--mongo-host` and `--elastic-host` arguments accordingly.
+* If ElasticSearch is set up to require authentication, pass the username and password using the following arguments:
+`--elastic-user <user> --elastic-password <password>`
+:::
+
+The script fixes the links in MongoDB and outputs `cURL` commands for updating the links in ElasticSearch.
+
+## Fixing the ElasticSearch Links
+
+Copy the `cURL` commands printed by the script in the previous stage, and run them one after the other. Make sure that
+each command returns a successful result. Depending on the amount of data in ElasticSearch, running these commands may
+take some time.
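+
+For example, the printed commands can be collected into a file and executed one after the other from inside the
+`apiserver` container. The following is a minimal sketch, assuming the commands were saved one per line to a
+hypothetical `es_fix_commands.txt` file (the file name and the one-command-per-line layout are illustrative, not part
+of the script's output):
+
+```commandline
+# run each cURL command printed by fix_fileserver_urls.py and show its response
+while IFS= read -r cmd; do
+    echo "Running: ${cmd}"
+    eval "${cmd}"
+    echo
+done < es_fix_commands.txt
+```
+
+Inspect each printed response and make sure it does not report any errors or failed updates.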
\ No newline at end of file
diff --git a/docs/deploying_clearml/enterprise_deploy/import_projects.md b/docs/deploying_clearml/enterprise_deploy/import_projects.md
new file mode 100644
index 00000000..324350bf
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/import_projects.md
@@ -0,0 +1,240 @@
+---
+title: Exporting and Importing ClearML Projects
+---
+
+When migrating from a ClearML Open Server to a ClearML Enterprise Server, you may need to transfer projects. This is done
+using the `data_tool.py` script. This utility is available in the `apiserver` Docker image, and can be used for
+exporting and importing ClearML project data for both the open source and Enterprise versions.
+
+This guide covers the following:
+* Exporting data from Open Source and Enterprise servers
+* Importing data into an Enterprise server
+* Handling artifacts stored on the file server
+
+:::note
+Export instructions differ for ClearML Open Source and Enterprise servers. Make sure you follow the guidelines that match
+your server type.
+:::
+
+## Exporting Data
+
+The export process runs the ***data_tool*** script, which generates a zip file containing project and task
+data. This file should then be copied to the server on which the import will run.
+
+Note that artifacts stored in the ClearML ***file server*** should be copied manually if required (see [Handling Artifacts](#handling-artifacts)).
+
+### Exporting Data from ClearML Open Servers
+
+#### Preparation
+
+* Make sure the `apiserver` is at least Open Source server version 1.12.0.
+* Note that any `pending` or `running` tasks will not be exported. If you wish to export them, make sure to stop/dequeue
+them before exporting.
+
+#### Running the Data Tool
+
+Execute the data tool from within the `apiserver` container.
+
+Open a bash session inside the `apiserver` container of the server:
+* In `docker-compose`:
+
+  ```commandline
+  sudo docker exec -it clearml-apiserver /bin/bash
+  ```
+
+* In Kubernetes:
+
+  ```commandline
+  kubectl exec -it <apiserver-pod-name> -n <namespace> -- bash
+  ```
+
+#### Export Commands
+
+**To export specific projects:**
+
+```commandline
+python3 -m apiserver.data_tool export \
+  --projects <project names or IDs> \
+  --statuses created stopped published failed completed \
+  --output <output file name>.zip
+```
+
+As a result, you should get an `<output file name>.zip` file that contains all the data from the specified projects and
+their children.
+
+**To export all the projects:**
+
+```commandline
+python3 -m apiserver.data_tool export \
+  --all \
+  --statuses created stopped published failed completed \
+  --output <output file name>.zip
+```
+
+#### Optional Parameters
+
+* `--experiments <experiment IDs>` - Export only the specified experiments. If not specified, all experiments from the
+  specified projects are exported
+* `--statuses <status list>` - Export tasks of the specified statuses. If the parameter is omitted, only `published`
+  tasks are exported
+* `--no-events` - Do not export task events, i.e. logs and metrics (scalars, plots, debug samples)
+
+Make sure to copy the generated zip file containing the exported data.
+
+### Exporting Data from ClearML Enterprise Servers
+
+#### Preparation
+
+* Make sure the `apiserver` is at least Enterprise Server version 3.18.0.
+* Note that any `pending` or `running` tasks will not be exported. If you wish to export them, make sure to stop/dequeue
+them before exporting.
+
+#### Running the Data Tool
+
+Execute the data tool from within the `apiserver` Docker container.
+
+Open a bash session inside the `apiserver` container of the server:
+* In `docker-compose`:
+
+  ```commandline
+  sudo docker exec -it allegro-apiserver /bin/bash
+  ```
+
+* In Kubernetes:
+
+  ```commandline
+  kubectl exec -it <apiserver-pod-name> -n <namespace> -- bash
+  ```
+
+#### Export Commands
+
+**To export specific projects:**
+
+```commandline
+PYTHONPATH=/opt/seematics/apiserver/trains-server-repo python3 data_tool.py \
+  export \
+  --projects <project names or IDs> \
+  --statuses created stopped published failed completed \
+  --output <output file name>.zip
+```
+
+As a result, you should get an `<output file name>.zip` file that contains all the data from the specified projects and
+their children.
+
+**To export all the projects:**
+
+```commandline
+PYTHONPATH=/opt/seematics/apiserver/trains-server-repo python3 data_tool.py \
+  export \
+  --all \
+  --statuses created stopped published failed completed \
+  --output <output file name>.zip
+```
+
+#### Optional Parameters
+
+* `--experiments <experiment IDs>` - Export only the specified experiments. If not specified, all experiments from the
+  specified projects are exported
+* `--statuses <status list>` - Export tasks of the specified statuses. If the parameter is omitted, only `published`
+  tasks are exported
+* `--no-events` - Do not export task events, i.e. logs and metrics (scalars, plots, debug samples)
+
+Make sure to copy the generated zip file containing the exported data.
+
+## Importing Data
+
+This section explains how to import the exported data into a ClearML Enterprise server.
+
+### Preparation
+
+* It is highly recommended to back up the ClearML databases before importing data, as the import injects data into the
+databases and cannot be undone.
+* Make sure you are working with `apiserver` version 3.22.3 or higher.
+* Make the zip file accessible from within the `apiserver` container by copying the exported data into the
+`apiserver` container or into a host folder that is mounted into the `apiserver` container.
+
+### Usage
+
+The data tool should be executed from within the `apiserver` Docker container.
+
+1. Open a bash session inside the `apiserver` container of the server:
+   * In `docker-compose`:
+
+     ```commandline
+     sudo docker exec -it allegro-apiserver /bin/bash
+     ```
+
+   * In Kubernetes:
+
+     ```commandline
+     kubectl exec -it <apiserver-pod-name> -n <namespace> -- bash
+     ```
+
+1. Run the data tool script in *import* mode:
+
+   ```commandline
+   PYTHONPATH=/opt/seematics/apiserver/trains-server-repo python3 data_tool.py \
+     import \
+     <exported zip file> \
+     --company <company_id> \
+     --user <user_id>
+   ```
+
+   * `company_id` - The default company ID used in the target deployment. Inside the `apiserver` container you can
+     usually get it from the environment variable `CLEARML__APISERVER__DEFAULT_COMPANY`.
+     If you do not specify the `--company` parameter, all the data will be imported as `Examples` (read-only)
+   * `user_id` - The ID of the user in the target deployment who will become the owner of the imported data
+
+## Handling Artifacts
+
+***Artifacts*** refers to any content that the ClearML server holds references to. This can include:
+* Dataset or Hyper-Dataset frame URLs
+* ClearML artifact URLs
+* Model snapshots
+* Debug samples
+
+Artifacts may be stored in any external storage (e.g., AWS S3, MinIO, Google Cloud Storage) or in the ClearML file server.
+* If the artifacts are **not** stored in the ClearML file server, they do not need to be moved during the export/import process,
+as the URLs registered in ClearML entities pointing to these artifacts will not change.
+* If the artifacts are stored in the ClearML file server, then the file server content must also be moved, and the URLs
+  in the ClearML databases must point to the new location. See instructions [below](#exporting-file-server-data-for-clearml-open-server).
+
+### Exporting File Server Data for ClearML Open Server
+
+Data in the file server is organized by project. For each project, all data referenced by entities in that project is
+stored in a folder bearing the name of the project. This folder can be located in:
+
+```
+/opt/clearml/data/fileserver/
+```
+
+The entire content of the project folders should be copied to the target server (see [Importing File Server Data](#importing-file-server-data)).
+
+### Exporting File Server Data for ClearML Enterprise Server
+
+Data in the file server is organized by tenant and project. For each project, all data referenced by entities in that
+project is stored in a folder bearing the name of the project. This folder can be located in:
+
+```
+/opt/allegro/data/fileserver/<company_id>/
+```
+
+The entire content of the project folders should be copied to the target server (see [Importing File Server Data](#importing-file-server-data)).
+
+## Importing File Server Data
+
+### Copying the Data
+
+Place the content of the exported project folders into the target file server's storage, under the following folder:
+
+```
+/opt/allegro/data/fileserver/<company_id>/
+```
+
+### Fixing Registered URLs
+
+Since URLs pointing to the file server contain the file server's address, these need to be changed to the address of the
+new file server.
+
+Note that this is not required if the new file server replaces the old file server and can be accessed using the exact
+same address.
+
+Once the project data has been copied to the target server, and the projects themselves have been imported, see
+[Changing ClearML Artifact Links](change_artifact_links.md) for information on how to fix the URLs.
+
diff --git a/sidebars.js b/sidebars.js
index 2843c553..cbbd6289 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -652,6 +652,8 @@ module.exports = {
         ]
       },
       'deploying_clearml/enterprise_deploy/delete_tenant',
+      'deploying_clearml/enterprise_deploy/import_projects',
+      'deploying_clearml/enterprise_deploy/change_artifact_links',
       {
         'Enterprise Applications': [
           'deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem',