# TRAINS Server

## Auto-Magical Experiment Manager & Version Control for AI

[![GitHub license](https://img.shields.io/badge/license-SSPL-green.svg)](https://img.shields.io/badge/license-SSPL-green.svg)
[![Python versions](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)
[![GitHub version](https://img.shields.io/github/release-pre/allegroai/trains-server.svg)](https://img.shields.io/github/release-pre/allegroai/trains-server.svg)
[![PyPI status](https://img.shields.io/badge/status-beta-yellow.svg)](https://img.shields.io/badge/status-beta-yellow.svg)

## Introduction

The **trains-server** is the backend service infrastructure for [TRAINS](https://github.com/allegroai/trains).
It allows multiple users to collaborate and manage their experiments.
By default, TRAINS is set up to work with the TRAINS demo server, which is open to anyone and resets periodically.
To host your own server, install **trains-server** and point TRAINS to it.

**trains-server** contains the following components:

* The TRAINS Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
    * Documenting and logging experiment information, statistics and results
    * Querying experiments history, logs and results
* Locally-hosted file server for storing images and models, making them easily accessible using the Web-App

You can quickly set up your **trains-server** using:
- [Docker Installation](#installation)
- Pre-built Amazon [AWS image](#aws)
- [Kubernetes Helm](https://github.com/allegroai/trains-server-helm#trains-server-for-kubernetes-clusters-using-helm)
  or manual [Kubernetes installation](https://github.com/allegroai/trains-server-k8s#trains-server-for-kubernetes-clusters)

## System design

![Alt Text](https://github.com/allegroai/trains/blob/master/docs/system_diagram.png?raw=true)

**trains-server** has two supported configurations:
- Single IP (domain) with the following open ports
    - Web application on port 8080
    - API service on port 8008
    - File storage service on port 8081

- Sub-Domain configuration with default http/s ports (80 or 443)
    - Web application on sub-domain: app.\*.\*
    - API service on sub-domain: api.\*.\*
    - File storage service on sub-domain: files.\*.\*

## Install / Upgrade - AWS <a name="aws"></a>

Use one of our pre-installed Amazon Machine Images for easy deployment in AWS.

For details and instructions, see [TRAINS-server: AWS pre-installed images](docs/install_aws.md).

## Docker Installation - Linux, macOS, and Windows <a name="installation"></a>

Use our pre-built Docker image for easy deployment in Linux and macOS. <br>
For [Windows](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#docker_compose_win10), please see the detailed docker-compose installation instructions in our [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#docker_compose_win10).<br>
Latest docker images can be found [here](https://hub.docker.com/r/allegroai/trains).

1. Set up Docker (docker-compose installation details: [Ubuntu](docs/faq.md#ubuntu) / [macOS](docs/faq.md#mac-osx))

    <details>
    <summary>Make sure ports 8080/8081/8008 are available for the TRAINS-server services:</summary>

    For example, to see if port `8080` is in use:

    ```bash
    $ sudo lsof -Pn -i4 | grep :8080 | grep LISTEN
    ```

    </details>

    Increase `vm.max_map_count` for the `ElasticSearch` docker container:

    - Linux
        ```bash
        $ echo "vm.max_map_count=262144" > /tmp/99-trains.conf
        $ sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
        $ sudo sysctl -w vm.max_map_count=262144
        $ sudo service docker restart
        ```
    - macOS
        ```bash
        $ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
        $ sysctl -w vm.max_map_count=262144
        ```

1. Create local directories for the databases and storage.

    ```bash
    $ sudo mkdir -p /opt/trains/data/elastic
    $ sudo mkdir -p /opt/trains/data/mongo/db
    $ sudo mkdir -p /opt/trains/data/mongo/configdb
    $ sudo mkdir -p /opt/trains/data/redis
    $ sudo mkdir -p /opt/trains/logs
    $ sudo mkdir -p /opt/trains/data/fileserver
    $ sudo mkdir -p /opt/trains/config
    ```

    Set folder permissions

    - Linux
        ```bash
        $ sudo chown -R 1000:1000 /opt/trains
        ```
    - macOS
        ```bash
        $ sudo chown -R $(whoami):staff /opt/trains
        ```

1. Download the `docker-compose.yml` file, either [manually](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml) or by executing:

    ```bash
    $ curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
    ```

1. Launch the Docker containers <a name="launch-docker"></a>

    ```bash
    $ docker-compose -f docker-compose.yml up
    ```

1. Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:

    * Web server on port `8080`
    * API server on port `8008`
    * File server on port `8081`

**\* If something went wrong along the way, check our FAQ: [Docker Setup](docs/docker_setup.md#setup-docker), [Ubuntu Support](docs/faq.md#ubuntu), [macOS Support](docs/faq.md#mac-osx)**
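
Before launching, the prerequisites from the steps above can be checked with a small script. The following is only a sketch: the helper names `map_count_ok` and `missing_trains_dirs` are hypothetical and not part of **trains-server**, and reading the live value with `sysctl -n` assumes a Linux host.

```bash
# Minimum vm.max_map_count, taken from the setup step above.
REQUIRED_MAP_COUNT=262144

# Succeeds when the given vm.max_map_count value is high enough.
# With no argument, the live value is read via `sysctl -n` (Linux only).
map_count_ok() (
    current=${1:-$(sysctl -n vm.max_map_count)}
    [ "$current" -ge "$REQUIRED_MAP_COUNT" ]
)

# Print every expected directory that is missing under the given root
# (default: /opt/trains). Prints nothing when all of them exist.
missing_trains_dirs() (
    root=${1:-/opt/trains}
    for sub in data/elastic data/mongo/db data/mongo/configdb data/redis \
               logs data/fileserver config; do
        [ -d "$root/$sub" ] || echo "$root/$sub"
    done
)
```

Calling `map_count_ok` and `missing_trains_dirs` without arguments checks the current host. Passing a value or a root directory makes it possible to try the helpers without touching the real setup.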
## Optional Configuration

The **trains-server** default configuration can be easily overridden using external configuration files. By default, the server will look for these files in `/opt/trains/config`.

To apply a new configuration, you must restart the server (see [Restarting trains-server](#restart-server)).

### Adding Web Login Authentication

By default, anyone can log in to the **trains-server** Web-App.
You can configure the **trains-server** to allow only a specific set of users to access the system.
Enable this feature by placing an `apiserver.conf` file under `/opt/trains/config`.

A sample `apiserver.conf` configuration file can be found [here](https://github.com/allegroai/trains-server/blob/master/docs/apiserver.conf).

To apply the changes, you must [restart the *trains-server*](#restart-server).

### Configuring the Non-Responsive Experiments Watchdog

The non-responsive experiment watchdog monitors experiments that were not updated for a given period of time
and marks them as `aborted`. The watchdog is always active, with a default inactivity threshold of 7200 seconds (2 hours).

To change the watchdog's timeouts, place a `services.conf` file under `/opt/trains/config`.
A sample watchdog `services.conf` configuration file can be found [here](https://github.com/allegroai/trains-server/blob/master/docs/services.conf).

To apply the changes, you must [restart the *trains-server*](#restart-server).

### Restarting trains-server <a name="restart-server"></a>

To restart the **trains-server**, you must first stop the containers, and then restart them.

```bash
$ docker-compose down
$ docker-compose -f docker-compose.yml up
```
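
The two restart commands can be wrapped in a single helper. This is a minimal sketch under stated assumptions: `restart_trains` and the `COMPOSE` variable are hypothetical names, and the sketch starts the containers in the background with `up -d` (detached mode), which differs from the foreground `up` shown above.

```bash
# Command used to drive Docker Compose; override it to dry-run without Docker.
COMPOSE=${COMPOSE:-docker-compose}

# Stop the containers, then start them again in the background (-d).
# The compose file defaults to docker-compose.yml in the current directory.
restart_trains() (
    file=${1:-docker-compose.yml}
    $COMPOSE -f "$file" down && $COMPOSE -f "$file" up -d
)
```

Running `restart_trains` from the directory holding `docker-compose.yml` restarts the server. Setting `COMPOSE="echo docker-compose"` first prints the commands instead of running them.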
## Configuring **TRAINS** client

Once you have installed the **trains-server**, make sure to configure the **TRAINS** [client](https://github.com/allegroai/trains)
to use your locally installed server (and not the demo server).

- Run the `trains-init` command for an interactive setup

- Or manually edit the `~/trains.conf` file, making sure the `api_server` value is configured correctly, for example:

        api {
            # API server on port 8008
            api_server: "http://localhost:8008"

            # web_server on port 8080
            web_server: "http://localhost:8080"

            # file server on port 8081
            files_server: "http://localhost:8081"
        }

* Note that if you set up **trains-server** in a sub-domain configuration, there is no need to specify a port number;
it will be inferred from the http/s scheme.

See [Installing and Configuring TRAINS](https://github.com/allegroai/trains#configuration) for more details.
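
To illustrate how the `api` section differs between the two server configurations, here is a small sketch that prints it for either case. The function name `print_api_section`, the `example.com` domain in the usage note, and the `https` default for the sub-domain case are assumptions made for the example; only the key names, ports and sub-domain prefixes come from this document.

```bash
# Print an `api { ... }` section for trains.conf.
#   single    <host>             -> explicit ports 8008 / 8080 / 8081
#   subdomain <domain> [scheme]  -> api. / app. / files. sub-domains, no ports
print_api_section() (
    mode=$1
    host=$2
    scheme=${3:-https}
    case "$mode" in
        single)
            api="http://$host:8008"
            web="http://$host:8080"
            files="http://$host:8081"
            ;;
        subdomain)
            api="$scheme://api.$host"
            web="$scheme://app.$host"
            files="$scheme://files.$host"
            ;;
        *)
            echo "usage: print_api_section single|subdomain <host> [scheme]" >&2
            exit 1
            ;;
    esac
    cat <<EOF
api {
    api_server: "$api"
    web_server: "$web"
    files_server: "$files"
}
EOF
)
```

For example, `print_api_section single localhost` reproduces the values shown above, while `print_api_section subdomain example.com` prints URLs without port numbers.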
## What next?

Now that the **trains-server** is installed, and TRAINS is configured to use it,
you can [use](https://github.com/allegroai/trains#using-trains) TRAINS in your experiments and view them in the web server,
for example http://localhost:8080

## Upgrading <a name="upgrade"></a>

We are constantly updating, improving and adding to the **trains-server**.
New releases will include new pre-built Docker images.
When we release a new version and include a new pre-built Docker image for it, upgrade as follows:

1. Shut down the docker containers
    ```bash
    $ docker-compose down
    ```

1. We highly recommend backing up your data directory before upgrading.

    Assuming your data directory is `/opt/trains`, to archive all data into `~/trains_backup.tgz` execute:

    ```bash
    $ sudo tar czvf ~/trains_backup.tgz /opt/trains/data
    ```

    <details>
    <summary>Restore instructions:</summary>

    To restore this example backup, execute:

    ```bash
    $ sudo rm -R /opt/trains/data
    # tar stores the members as opt/trains/data/... (leading "/" stripped), so extract relative to /
    $ sudo tar -xzf ~/trains_backup.tgz -C /
    ```

    </details>

1. Download the latest `docker-compose.yml` file, either [manually](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml) or by executing:

    ```bash
    $ curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
    ```

1. Pull the latest trains-server build and spin up the docker containers
    ```bash
    $ docker-compose -f docker-compose.yml pull
    $ docker-compose -f docker-compose.yml up
    ```

**\* If something went wrong along the way, check our FAQ: [Docker Upgrade](docs/docker_setup.md#common-docker-upgrade-errors)**
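
The backup and restore cycle can be rehearsed on a scratch directory before touching real data. The sketch below is an illustration only: `backup_data` and `restore_data` are hypothetical helpers, and unlike the command in the steps above they archive paths relative to the data directory (using `tar -C`), so the archive can be unpacked into any target directory.

```bash
# Archive the contents of a data directory (paths stored relative to it).
backup_data() (
    src=$1
    archive=$2
    tar -czf "$archive" -C "$src" .
)

# Recreate the target directory and unpack the archive into it.
restore_data() (
    archive=$1
    dest=$2
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
)

# Rehearsal on a scratch directory, so no real data is touched.
scratch=$(mktemp -d)
mkdir -p "$scratch/data/mongo/db"
echo "sample" > "$scratch/data/mongo/db/file.txt"

backup_data "$scratch/data" "$scratch/backup.tgz"
rm -R "$scratch/data"
restore_data "$scratch/backup.tgz" "$scratch/data"

cat "$scratch/data/mongo/db/file.txt"   # prints "sample"
```

The same helpers could be pointed at `/opt/trains/data` (with `sudo`) once the containers are shut down.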
## Community & Support

If you have any questions, see the TRAINS-server [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md), or
tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/trains) with the '**trains**' tag.

For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains-server/issues).

Additionally, you can always find us at *trains@allegro.ai*

## License

[Server Side Public License v1.0](https://github.com/mongodb/mongo/blob/master/LICENSE-Community.txt)

**trains-server** relies on both [MongoDB](https://github.com/mongodb/mongo) and [ElasticSearch](https://github.com/elastic/elasticsearch).
With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our responsibility as a
member of the community to support the projects we love and cherish.
We believe the cause for the license change in both cases is more than just,
and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more general and flexible of the two licenses.

This is our way to say - we support you guys!