mirror of https://github.com/clearml/clearml-server synced 2025-06-26 23:15:47 +00:00

TRAINS Server

Auto-Magical Experiment Manager & Version Control for AI

Introduction

The trains-server is the backend service infrastructure for TRAINS. It allows multiple users to collaborate and manage their experiments. By default, TRAINS is set up to work with the TRAINS demo server, which is open to anyone and resets periodically. In order to host your own server, you will need to install trains-server and point TRAINS to it.

trains-server contains the following components:

- The TRAINS Web-App, a single-page UI for experiment management and browsing
- RESTful API for:
  - Documenting and logging experiment information, statistics and results
  - Querying experiments history, logs and results
- Locally-hosted file server for storing images and models, making them easily accessible using the Web-App

You can quickly set up your trains-server using:

- Docker Installation
- Pre-built Amazon AWS image
- Kubernetes Helm or manual Kubernetes installation

System design

trains-server has two supported configurations:

- Single IP (domain) with the following open ports
  - Web application on port 8080
  - API service on port 8008
  - File storage service on port 8081
- Sub-Domain configuration with default http/s ports (80 or 443)
  - Web application on sub-domain: app.*.*
  - API service on sub-domain: api.*.*
  - File storage service on sub-domain: files.*.*
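
The two configurations differ only in how each service's address is formed. As a rough sketch, the helper below prints the three service URLs for either layout, assuming the default ports and sub-domain names listed above. The function name `trains_endpoints` and its arguments are hypothetical and not part of trains-server.

```shell
# Hypothetical helper: print the service URLs for a given layout.
#   $1  "single" (one host, three ports) or "subdomain" (app./api./files.)
#   $2  host or base domain, e.g. localhost
#   $3  URL scheme, defaults to http
trains_endpoints() {
    mode=$1
    host=$2
    scheme=${3:-http}
    case "$mode" in
        single)
            echo "web:   $scheme://$host:8080"
            echo "api:   $scheme://$host:8008"
            echo "files: $scheme://$host:8081"
            ;;
        subdomain)
            # Default http/s ports, so no port number is needed
            echo "web:   $scheme://app.$host"
            echo "api:   $scheme://api.$host"
            echo "files: $scheme://files.$host"
            ;;
        *)
            echo "unknown layout: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, `trains_endpoints single localhost` lists the same three addresses used in the Docker installation below.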

Install / Upgrade - AWS

Use one of our pre-installed Amazon Machine Images for easy deployment in AWS.

For details and instructions, see TRAINS-server: AWS pre-installed images.

Docker Installation - Linux, macOS, and Windows

Use our pre-built Docker image for easy deployment in Linux and macOS.
For Windows, please see detailed docker-compose installation instructions on our FAQ.
Latest docker images can be found here.

Set up Docker (docker-compose installation details: Ubuntu / macOS)

Make sure ports 8080/8081/8008 are available for the TRAINS-server services:

For example, to see if port 8080 is in use:

$ sudo lsof -Pn -i4 | grep :8080 | grep LISTEN
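
The same check can be repeated for all three ports in one pass. A minimal sketch, assuming `lsof` output in the format produced by the command above; the function name `ports_in_use` is illustrative only.

```shell
# Illustrative helper: read `lsof -Pn -i4` output on stdin and report
# which of the TRAINS-server ports already have a listener.
ports_in_use() {
    listing=$(cat)
    for port in 8080 8081 8008; do
        # Match the port followed by a space so that 8080 does not match 80801
        if printf '%s\n' "$listing" | grep ":$port " | grep -q LISTEN; then
            echo "port $port is in use"
        fi
    done
}

# Usage:
#   sudo lsof -Pn -i4 | ports_in_use
```

No output means none of the three ports has a listener.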

Increase vm.max_map_count for the ElasticSearch Docker container

Linux

$ echo "vm.max_map_count=262144" > /tmp/99-trains.conf
$ sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
$ sudo sysctl -w vm.max_map_count=262144
$ sudo service docker restart
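
To confirm that the setting took effect, the current value can be read back with `sysctl -n`. A small sketch; the function `map_count_ok` is a hypothetical helper, not part of trains-server.

```shell
# Hypothetical helper: succeeds when the given value is at least
# the 262144 configured above.
map_count_ok() {
    [ "$1" -ge 262144 ]
}

# Usage on the Linux host:
#   map_count_ok "$(sysctl -n vm.max_map_count)" && echo "vm.max_map_count is high enough"
```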

macOS

$ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
$ sysctl -w vm.max_map_count=262144

Create local directories for the databases and storage.

$ sudo mkdir -p /opt/trains/data/elastic
$ sudo mkdir -p /opt/trains/data/mongo/db
$ sudo mkdir -p /opt/trains/data/mongo/configdb
$ sudo mkdir -p /opt/trains/data/redis
$ sudo mkdir -p /opt/trains/logs
$ sudo mkdir -p /opt/trains/data/fileserver
$ sudo mkdir -p /opt/trains/config
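
Before moving on, it may be worth confirming that every directory exists. The sketch below lists any that are missing; the function name `missing_trains_dirs` is hypothetical, and the root defaults to /opt/trains as used above.

```shell
# Hypothetical helper: print each expected directory that does not exist.
#   $1  root directory, defaults to /opt/trains
missing_trains_dirs() {
    root=${1:-/opt/trains}
    for sub in data/elastic data/mongo/db data/mongo/configdb \
               data/redis logs data/fileserver config; do
        [ -d "$root/$sub" ] || echo "$root/$sub"
    done
}

# Usage:
#   missing_trains_dirs
```

An empty result means all seven directories are in place.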

Set folder permissions

Linux
```
$ sudo chown -R 1000:1000 /opt/trains
```

macOS

$ sudo chown -R $(whoami):staff /opt/trains

Download the docker-compose.yml file, either manually or by executing:

$ curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml

Launch the Docker containers

$ docker-compose -f docker-compose.yml up

Your server is now running on http://localhost:8080 and the following ports are available:
- Web server on port 8080
- API server on port 8008
- File server on port 8081
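
A quick way to see whether the three services answer is to probe each port with `curl`. This is only a sketch: the function name `service_status` is illustrative, and it reports that something is listening, not that the service is healthy.

```shell
# Illustrative helper: print "up" if anything answers HTTP at the URL,
# "down" otherwise. curl exits 0 on any HTTP response (even an error
# page), so "up" only means that a service is listening.
service_status() {
    if curl -s -o /dev/null --max-time 5 "$1"; then
        echo "up"
    else
        echo "down"
    fi
}

# Usage, once the containers have started:
#   for port in 8080 8008 8081; do
#       echo "port $port: $(service_status "http://localhost:$port")"
#   done
```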

* If something went wrong along the way, check our FAQ: Docker Setup, Ubuntu Support, macOS Support

Optional Configuration

The trains-server default configuration can be easily overridden using external configuration files. By default, the server will look for these files in /opt/trains/config.

In order to apply the new configuration, you must restart the server (see Restarting trains-server).

By default, anyone can log in to the trains-server Web-App. You can configure the trains-server to allow only a specific set of users to access the system.

Enable this feature by placing an apiserver.conf file under /opt/trains/config.

A sample apiserver.conf configuration file can be found here

To apply the changes, you must restart the trains-server.

Configuring the Non-Responsive Experiments Watchdog

The non-responsive experiment watchdog monitors experiments that have not been updated for a given period of time and marks them as aborted. The watchdog is always active, with a default inactivity threshold of 7200 seconds (2 hours).

To change the watchdog's timeouts, place a services.conf file under /opt/trains/config.

Sample watchdog services.conf configuration file can be found here

To apply the changes, you must restart the trains-server.
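
As an illustration only, the sketch below writes a services.conf with a threshold given in hours, converted to seconds (2 hours = 7200 seconds). The key names `tasks`, `non_responsive_tasks_watchdog` and `threshold_sec` are assumptions made here and should be checked against the project's sample services.conf before use; the function name `write_watchdog_conf` is hypothetical.

```shell
# Hypothetical helper: write a watchdog override into a config directory.
#   $1  config directory (the server reads /opt/trains/config by default)
#   $2  inactivity threshold in hours
# ASSUMPTION: the key names in the generated file are not confirmed by
# this README; compare them with the sample services.conf.
write_watchdog_conf() {
    config_dir=$1
    hours=$2
    threshold=$((hours * 3600))   # the watchdog threshold is given in seconds
    cat > "$config_dir/services.conf" <<EOF
tasks {
    non_responsive_tasks_watchdog {
        # seconds without updates before an experiment is marked as aborted
        threshold_sec: $threshold
    }
}
EOF
}

# Usage:
#   write_watchdog_conf /opt/trains/config 4
```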

Restarting trains-server

To restart the trains-server, you must first stop the containers, and then restart them.

$ docker-compose down
$ docker-compose -f docker-compose.yml up

Configuring TRAINS client

Once you have installed the trains-server, make sure to configure the TRAINS client to use your locally installed server (and not the demo server).

Run the trains-init command for an interactive setup.

Or manually edit the ~/trains.conf file, making sure the api_server value is configured correctly, for example:

  api {
      # API server on port 8008
      api_server: "http://localhost:8008"

      # web_server on port 8080
      web_server: "http://localhost:8080"

      # file server on port 8081
      files_server: "http://localhost:8081"
  }
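
To double-check which servers the client is pointed at, the relevant lines can be pulled out of the configuration file. A minimal sketch, assuming the three keys appear one per line as in the example above; the function name `show_trains_servers` is illustrative.

```shell
# Illustrative helper: print the api_server, web_server and files_server
# lines from a TRAINS client configuration file (commented lines are skipped).
#   $1  path to the configuration file, defaults to ~/trains.conf
show_trains_servers() {
    grep -E '^[[:space:]]*(api_server|web_server|files_server):' "${1:-$HOME/trains.conf}"
}

# Usage:
#   show_trains_servers
```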

Note that if you set up trains-server in a sub-domain configuration, there is no need to specify a port number; it will be inferred from the http/s scheme.

See Installing and Configuring TRAINS for more details.

What next?

Now that the trains-server is installed and TRAINS is configured to use it, you can use TRAINS in your experiments and view them in the Web-App, for example at http://localhost:8080.

Upgrading

We are constantly updating, improving and adding to the trains-server. Each new release includes a new pre-built Docker image. To upgrade to a new release, proceed as follows:

Shut down the docker containers
```
$ docker-compose down
```
We highly recommend backing up your data directory before upgrading.

Assuming your data is stored under /opt/trains/data, to archive all data into ~/trains_backup.tgz execute:
```
$ sudo tar czvf ~/trains_backup.tgz /opt/trains/data
```
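
It can be useful to inspect the archive after creating it. Note that `tar` strips the leading `/` from absolute paths when archiving, so the backup created above lists entries starting with `opt/trains/data/`. A small sketch; the function name `list_backup` is illustrative.

```shell
# Illustrative helper: print the first few paths stored in a backup archive.
#   $1  archive file
#   $2  number of entries to show, defaults to 5
list_backup() {
    tar -tzf "$1" | head -n "${2:-5}"
}

# Usage:
#   list_backup ~/trains_backup.tgz
```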
Restore instructions:

To restore this example backup, execute the following. The archive stores paths relative to /, so it is extracted from the root directory:
```
$ sudo rm -R /opt/trains/data
$ sudo tar -xzf ~/trains_backup.tgz -C /
```

Download the latest docker-compose.yml file, either manually or by executing:

$ curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml

Pull the latest trains-server build and spin up the Docker containers
```
$ docker-compose -f docker-compose.yml pull
$ docker-compose -f docker-compose.yml up
```
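
The upgrade steps can also be collected into a single function. This is a sketch rather than an official script: the names `run` and `upgrade_trains_server` and the `DRY_RUN` variable are hypothetical, while the commands themselves are the ones shown in the steps above.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the upgrade steps; stops at the first failure.
upgrade_trains_server() {
    compose_url="https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml"
    run docker-compose down &&
    run sudo tar czvf "$HOME/trains_backup.tgz" /opt/trains/data &&
    run curl "$compose_url" -o docker-compose.yml &&
    run docker-compose -f docker-compose.yml pull &&
    run docker-compose -f docker-compose.yml up
}

# Usage (preview the commands first, then run them for real):
#   DRY_RUN=1; upgrade_trains_server
#   DRY_RUN=;  upgrade_trains_server
```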

* If something went wrong along the way, check our FAQ: Docker Upgrade

Community & Support

If you have any questions, check the TRAINS-server FAQ, or post your question on Stack Overflow with the 'trains' tag.

For feature requests or bug reports, please use GitHub issues.

Additionally, you can always find us at trains@allegro.ai

License

Server Side Public License v1.0

trains-server relies on both MongoDB and ElasticSearch. With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our responsibility as a member of the community to support the projects we love and cherish. We believe the cause for the license change in both cases is more than just, and chose SSPL because it is the more general and flexible of the two licenses.

This is our way of saying: we support you guys!