<div align="center">

<img src="docs/clearml_server_logo.png" width="250px">

**ClearML - Auto-Magical Suite of tools to streamline your ML workflow
</br> Experiment Manager, ML-Ops and Data-Management**

[![GitHub license](https://img.shields.io/badge/license-SSPL-green.svg)](https://img.shields.io/badge/license-SSPL-green.svg)
[![Python versions](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)
[![GitHub version](https://img.shields.io/github/release-pre/allegroai/trains-server.svg)](https://img.shields.io/github/release-pre/allegroai/trains-server.svg)
[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/allegroai)](https://artifacthub.io/packages/search?repo=allegroai)

</div>

---
<div align="center">

**Note regarding Apache Log4j2 Remote Code Execution (RCE) Vulnerability - CVE-2021-44228 - ESA-2021-31**

</div>

According to [Elasticsearch's latest report](https://discuss.elastic.co/t/apache-log4j2-remote-code-execution-rce-vulnerability-cve-2021-44228-esa-2021-31/291476),
supported versions of Elasticsearch (6.8.9+, 7.8+) used with recent versions of the JDK (JDK9+) **are not susceptible to either remote code execution or information leakage**
due to Elasticsearch's usage of the Java Security Manager.

**As the latest version of ClearML Server uses Elasticsearch 7.10+ with JDK15, it is not affected by these vulnerabilities.**

As a precaution, we've added the mitigation recommended by Elasticsearch to our latest [docker-compose.yml](https://github.com/allegroai/clearml-server/blob/cfccbe05c158b75e520581f86e9668291da5c70a/docker/docker-compose.yml#L42) file.

While previous Elasticsearch versions (5.6.11+, 6.4.0+ and 7.0.0+) used by older ClearML Server versions are only susceptible to the information leakage vulnerability
(which in any case **does not permit access to data within the Elasticsearch cluster**),
we still recommend upgrading to the latest version of ClearML Server. Alternatively, you can apply the mitigation as implemented in our latest
[docker-compose.yml](https://github.com/allegroai/clearml-server/blob/cfccbe05c158b75e520581f86e9668291da5c70a/docker/docker-compose.yml#L42) file.

**Update 15 December**: A further vulnerability (CVE-2021-45046) was disclosed on December 14th.
Elastic's guidance for Elasticsearch remains unchanged by this new vulnerability, so **ClearML Server is not affected**.

---

## ClearML Server
#### *Formerly known as Trains Server*

The **ClearML Server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
**ClearML** offers a [free hosted service](https://app.community.clear.ml/), which is maintained by **ClearML** and open to anyone.
To host your own server, launch the **ClearML Server** and point **ClearML** to it.

The **ClearML Server** contains the following components:

* The **ClearML** Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
    * Documenting and logging experiment information, statistics and results
    * Querying experiments history, logs and results
* Locally-hosted file server for storing images and models, making them easily accessible using the Web-App

You can quickly [deploy](#launching-the-clearml-server) your **ClearML Server** using Docker, AWS EC2 AMI, or Kubernetes.

## System design

![Alt Text](docs/ClearML_Server_Diagram.png)

The **ClearML Server** has two supported configurations:
- Single IP (domain) with the following open ports
    - Web application on port 8080
    - API service on port 8008
    - File storage service on port 8081

- Sub-Domain configuration with default http/s ports (80 or 443)
    - Web application on sub-domain: app.\*.\*
    - API service on sub-domain: api.\*.\*
    - File storage service on sub-domain: files.\*.\*
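
The two configurations differ only in how the three service addresses are formed. The following is a minimal sketch of that mapping, not part of ClearML itself: the function name `server_urls` and its parameters are hypothetical, and the dictionary keys simply mirror the setting names used in `clearml.conf`.

```python
from typing import Dict


def server_urls(host: str, sub_domain: bool = False, secure: bool = False) -> Dict[str, str]:
    """Return the web, API and file-server URLs for a ClearML Server host.

    Hypothetical helper for illustration only.
    """
    scheme = "https" if secure else "http"
    if sub_domain:
        # Sub-domain configuration: default http/s ports, one sub-domain per service.
        return {
            "web_server": f"{scheme}://app.{host}",
            "api_server": f"{scheme}://api.{host}",
            "files_server": f"{scheme}://files.{host}",
        }
    # Single IP (domain) configuration: one host, one fixed port per service.
    return {
        "web_server": f"{scheme}://{host}:8080",
        "api_server": f"{scheme}://{host}:8008",
        "files_server": f"{scheme}://{host}:8081",
    }


if __name__ == "__main__":
    print(server_urls("localhost"))
    print(server_urls("example.com", sub_domain=True, secure=True))
```

Here `example.com` is a placeholder domain; substitute the address of your own deployment.
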

## Launching The ClearML Server

### Prerequisites

Ports 8080, 8081 and 8008 must be available for the **ClearML Server** services.

For example, to see if port `8080` is in use:

* Linux or macOS:

      sudo lsof -Pn -i4 | grep :8080 | grep LISTEN

* Windows:

      netstat -an |find /i "8080"
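
If you prefer a cross-platform check, the same test can be approximated from Python. This is a rough sketch rather than an official tool: the helper names `port_in_use` and `check_ports` are made up for this example, and the check only tries a TCP connection to `127.0.0.1`, so a service bound to a different interface would not be detected.

```python
import socket
from typing import Dict, Iterable

# Ports required by the ClearML Server services (web app, file server, API).
CLEARML_PORTS = (8080, 8081, 8008)


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def check_ports(ports: Iterable[int] = CLEARML_PORTS) -> Dict[int, bool]:
    """Map each port to True (already in use) or False (appears free)."""
    return {port: port_in_use(port) for port in ports}


if __name__ == "__main__":
    for port, busy in check_ports().items():
        print(f"port {port}: {'IN USE' if busy else 'free'}")
```
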

### Launching

Launch the **ClearML Server** in any of the following formats:

- Pre-built [AWS EC2 AMI](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_aws_ec2_ami)
- Pre-built [GCP Custom Image](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_gcp)
- Pre-built Docker Image
    - [Linux](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac)
    - [macOS](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac)
    - [Windows 10](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_win)
- Kubernetes
    - [Kubernetes Helm](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_kubernetes_helm)
    - Manual [Kubernetes installation](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_kubernetes)

## Connecting ClearML to your ClearML Server

To set up the **ClearML** client to work with your **ClearML Server**:
- Run the `clearml-init` command for an interactive setup.
- Or manually edit the `~/clearml.conf` file, making sure the server settings (`api_server`, `web_server`, `files_server`) are configured correctly, for example:

        api {
            # API server on port 8008
            api_server: "http://localhost:8008"

            # web_server on port 8080
            web_server: "http://localhost:8080"

            # file server on port 8081
            files_server: "http://localhost:8081"
        }

**Note**: If you have set up your **ClearML Server** in a sub-domain configuration, there is no need to specify a port number;
it will be inferred from the http/s scheme.
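
To make the port inference in the note concrete, here is a small sketch of the general rule (explicit port if present, otherwise 80 for `http` and 443 for `https`). It illustrates the idea using the standard library only and is not how the ClearML client is implemented; `effective_port` is a name chosen for this example.

```python
from urllib.parse import urlsplit

# Default ports implied by each URL scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port of url, or the default port for its scheme."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported or missing scheme in {url!r}")
    if parts.port is not None:
        # Single IP configuration: the port is written in the URL.
        return parts.port
    # Sub-domain configuration: fall back to the scheme's default port.
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    for url in ("http://localhost:8008", "https://api.example.com"):
        print(url, "->", effective_port(url))
```
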

After launching the **ClearML Server** and configuring the **ClearML** client to use the **ClearML Server**,
you can [use](https://github.com/allegroai/clearml) **ClearML** in your experiments and view them in your **ClearML Server** web server,
for example http://localhost:8080.

For more information about the ClearML client, see [**ClearML**](https://github.com/allegroai/clearml).

## ClearML-Agent Services <a name="services"></a>

As of version 0.15 of **ClearML Server**, dockerized deployment includes a **ClearML-Agent Services** container running as
part of the docker container collection.

ClearML-Agent Services is an extension of ClearML-Agent that provides the ability to launch long-lasting jobs
that previously had to be executed on local / dedicated machines. It allows a single agent to
launch multiple dockers (Tasks) for different use cases. To name a few use cases: auto-scaler service (spinning instances
when the need arises and the budget allows), Controllers (implementing pipelines and more sophisticated DevOps logic),
Optimizer (such as Hyper-parameter Optimization or sweeping), and Application (such as interactive Bokeh apps for
increased data transparency).

The ClearML-Agent Services container will spin **any** task enqueued into the dedicated `services` queue.
Every task launched by ClearML-Agent Services will be registered as a new node in the system,
providing tracking and transparency capabilities.
You can also run the ClearML-Agent Services manually; see details in [ClearML-agent services mode](https://github.com/allegroai/clearml-agent#clearml-agent-services-mode-).

**Note**: It is the user's responsibility to make sure the proper tasks are pushed into the `services` queue.
Do not enqueue training / inference tasks into the `services` queue, as it will put unnecessary load on the server.

## Advanced Functionality

The **ClearML Server** provides a few additional useful features, which can be manually enabled:

* [Web login authentication](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication)
* [Non-responsive experiments watchdog](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#non-responsive-task-watchdog)

## Restarting ClearML Server

To restart the **ClearML Server**, you must first stop the containers, and then restart them.

```bash
docker-compose down
docker-compose -f docker-compose.yml up
```

## Upgrading <a name="upgrade"></a>

**ClearML Server** releases are also reflected in the [docker compose configuration file](https://github.com/allegroai/trains-server/blob/master/docker/docker-compose.yml).
We strongly encourage you to keep your **ClearML Server** up to date, by keeping up with the current release.

**Note**: The following upgrade instructions use the Linux OS as an example.

To upgrade your existing **ClearML Server** deployment:

1. Shut down the docker containers
    ```bash
    docker-compose down
    ```

1. We highly recommend backing up your data directory before upgrading.

    Assuming your data directory is `/opt/clearml`, to archive all data into `~/clearml_backup.tgz` execute:

    ```bash
    sudo tar czvf ~/clearml_backup.tgz /opt/clearml/data
    ```

    <details>
    <summary>Restore instructions:</summary>

    To restore this example backup, execute the following. The archive stores paths relative to `/`, so it is extracted there:
    ```bash
    sudo rm -R /opt/clearml/data
    sudo tar -xzf ~/clearml_backup.tgz -C /
    ```
    </details>

1. Download the latest `docker-compose.yml` file.

    ```bash
    curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker/docker-compose.yml -o docker-compose.yml
    ```

1. Configure the ClearML-Agent Services (not supported on Windows installation).
    If `CLEARML_HOST_IP` is not provided, ClearML-Agent Services will use the external
    public address of the **ClearML Server**. If `CLEARML_AGENT_GIT_USER` / `CLEARML_AGENT_GIT_PASS` are not provided,
    the ClearML-Agent Services will not be able to access any private repositories for running service tasks.
    ```bash
    export CLEARML_HOST_IP=server_host_ip_here
    export CLEARML_AGENT_GIT_USER=git_username_here
    export CLEARML_AGENT_GIT_PASS=git_password_here
    ```

1. Spin up the docker containers. This will automatically pull the latest **ClearML Server** build
    ```bash
    docker-compose -f docker-compose.yml pull
    docker-compose -f docker-compose.yml up
    ```

**\* If something went wrong along the way, check our FAQ: [Common Docker Upgrade Errors](https://clear.ml/docs/latest/docs/faq/).**
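
The backup step of the upgrade procedure can also be scripted. The sketch below uses Python's standard `tarfile` module; the function names `backup_data` and `restore_data` are hypothetical, and unlike the `tar` command in the backup step it stores paths relative to the data directory, so an archive made this way should be restored with `restore_data` rather than with the shell commands. It assumes the process has permission to read and write the data directory (typically root for `/opt/clearml/data`).

```python
import tarfile
from pathlib import Path


def backup_data(data_dir: Path, archive: Path) -> None:
    """Archive the contents of data_dir into a gzip-compressed tarball."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores member paths relative to data_dir.
        tar.add(data_dir, arcname=".")


def restore_data(archive: Path, data_dir: Path) -> None:
    """Extract a tarball produced by backup_data into data_dir."""
    data_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(data_dir)


if __name__ == "__main__":
    # Paths taken from the example in the upgrade steps.
    backup_data(Path("/opt/clearml/data"), Path.home() / "clearml_backup.tgz")
```

As with the shell commands, stop the containers before taking the backup so the databases are not written to while they are being archived.
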

## Community & Support

If you have any questions, look to the ClearML [FAQ](https://clear.ml/docs/latest/docs/faq), or
tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/clearml) with the '**clearml**' tag.

For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/clearml-server/issues).

Additionally, you can always find us at *clearml@allegro.ai*

## License

[Server Side Public License v1.0](https://github.com/mongodb/mongo/blob/master/LICENSE-Community.txt)

The **ClearML Server** relies on both [MongoDB](https://github.com/mongodb/mongo) and [Elasticsearch](https://github.com/elastic/elasticsearch).
With the recent changes in both MongoDB's and Elasticsearch's OSS license, we feel it is our responsibility as a
member of the community to support the projects we love and cherish.
We believe the cause for the license change in both cases is more than just,
and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more general and flexible of the two licenses.

This is our way to say - we support you guys!