# TRAINS Server
## Auto-Magical Experiment Manager & Version Control for AI

[![GitHub license](https://img.shields.io/badge/license-SSPL-green.svg)](https://img.shields.io/badge/license-SSPL-green.svg)
[![Python versions](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)
[![GitHub version](https://img.shields.io/github/release-pre/allegroai/trains-server.svg)](https://img.shields.io/github/release-pre/allegroai/trains-server.svg)
[![PyPI status](https://img.shields.io/badge/status-beta-yellow.svg)](https://img.shields.io/badge/status-beta-yellow.svg)

## Introduction

The **trains-server** is the backend service infrastructure for [TRAINS](https://github.com/allegroai/trains).
It allows multiple users to collaborate and manage their experiments.
By default, TRAINS is set up to work with the TRAINS demo server, which is open to anyone and resets periodically.
To host your own server, install **trains-server** and point TRAINS to it.

**trains-server** contains the following components:

* The TRAINS Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
    * Documenting and logging experiment information, statistics and results
    * Querying experiment history, logs and results
* Locally-hosted file server for storing images and models, making them easily accessible from the Web-App

You can quickly set up your **trains-server** using:

- [Docker Installation](#installation)
- Pre-built Amazon [AWS image](#aws)
- [Kubernetes Helm](https://github.com/allegroai/trains-server-helm#trains-server-for-kubernetes-clusters-using-helm)
  or manual [Kubernetes installation](https://github.com/allegroai/trains-server-k8s#trains-server-for-kubernetes-clusters)

## System design

![TRAINS system diagram](https://github.com/allegroai/trains/blob/master/docs/system_diagram.png?raw=true)

**trains-server** has two supported configurations:
- Single IP (domain) with the following open ports
    - Web application on port 8080
    - API service on port 8008
    - File storage service on port 8081
- Sub-domain configuration with default HTTP/S ports (80 or 443)
    - Web application on sub-domain: app.\*.\*
    - API service on sub-domain: api.\*.\*
    - File storage service on sub-domain: files.\*.\*
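
To make the two configurations concrete, here is a minimal shell sketch that prints the three service URLs a client would use in each case. The function name `trains_urls` and its arguments are hypothetical (not part of **trains-server**); it assumes plain HTTP for the single-IP ports and HTTPS by default for the sub-domain setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of trains-server): print the web, API and
# file server URLs for either supported configuration.
#   trains_urls single <host>                -> ports 8080 / 8008 / 8081
#   trains_urls subdomain <domain> [scheme]  -> app. / api. / files. sub-domains
trains_urls() {
  local mode="$1" host="$2" scheme="${3:-https}"
  case "$mode" in
    single)
      # Assumption: plain HTTP on the dedicated ports.
      echo "web_server: http://${host}:8080"
      echo "api_server: http://${host}:8008"
      echo "files_server: http://${host}:8081"
      ;;
    subdomain)
      # The default HTTP/S port (80 or 443) is implied by the scheme, so no port is given.
      echo "web_server: ${scheme}://app.${host}"
      echo "api_server: ${scheme}://api.${host}"
      echo "files_server: ${scheme}://files.${host}"
      ;;
    *)
      echo "usage: trains_urls single|subdomain <host> [scheme]" >&2
      return 1
      ;;
  esac
}
```

For example, `trains_urls subdomain example.com` prints `https://app.example.com`, `https://api.example.com` and `https://files.example.com`, each labeled with the matching `~/trains.conf` key.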

## Install / Upgrade - AWS <a name="aws"></a>

Use one of our pre-installed Amazon Machine Images for easy deployment in AWS.

For details and instructions, see [TRAINS-server: AWS pre-installed images](docs/install_aws.md).

## Docker Installation - Linux, Mac OS X <a name="installation"></a>

Use our pre-built Docker image for easy deployment in Linux and Mac OS X.
For Windows, we recommend installing our pre-built Docker image on a Linux virtual machine.
The latest Docker images can be found [here](https://hub.docker.com/r/allegroai/trains).

1. Set up Docker ([docker-compose Ubuntu](docs/faq.md#ubuntu), [docker-compose OS X](docs/faq.md#mac-osx), [Set up Docker Service Manually](docs/docker_setup.md#setup-docker)).

    Make sure ports 8080, 8081 and 8008 are available for the `trains-server` services.

    Increase `vm.max_map_count` for the `ElasticSearch` Docker container:

    ```bash
    echo "vm.max_map_count=262144" > /tmp/99-trains.conf
    sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
    sudo sysctl -w vm.max_map_count=262144
    sudo service docker restart
    ```

1. Create local directories for the databases and storage.

    ```bash
    sudo mkdir -p /opt/trains/data/elastic
    sudo mkdir -p /opt/trains/data/mongo/db
    sudo mkdir -p /opt/trains/data/mongo/configdb
    sudo mkdir -p /opt/trains/data/redis
    sudo mkdir -p /opt/trains/logs
    sudo mkdir -p /opt/trains/data/fileserver
    sudo mkdir -p /opt/trains/config
    ```

    Set the owner of `/opt/trains`. On Linux:

    ```bash
    $ sudo chown -R 1000:1000 /opt/trains
    ```

    On Mac OS X:

    ```bash
    $ sudo chown -R $(whoami):staff /opt/trains
    ```

1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.

    ```bash
    $ git clone https://github.com/allegroai/trains-server.git
    $ cd trains-server
    ```

1. Launch the Docker containers <a name="launch-docker"></a>

    * Automatically with docker-compose (details: [Linux/Ubuntu](docs/faq.md#ubuntu), [OS X](docs/faq.md#mac-osx))

        ```bash
        $ docker-compose up
        ```

    * Manually, see [Launching Docker Containers Manually](docs/docker_setup.md#launch) for instructions.

1. Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:

    * Web server on port `8080`
    * API server on port `8008`
    * File server on port `8081`
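
The following is a minimal sketch of two hypothetical pre-flight helpers for the steps above; neither ships with **trains-server**. `max_map_count_ok` checks that `vm.max_map_count` is at least 262144 (it reads `/proc/sys/vm/max_map_count`, so it assumes Linux), and `create_trains_dirs` creates the same directory layout as step 2 with a single brace-expanded `mkdir -p`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of trains-server).

# Succeed if vm.max_map_count is at least 262144, the value set in step 1.
# An explicit value may be passed for testing; otherwise the live kernel
# value is read from /proc (Linux only).
max_map_count_ok() {
  local required=262144
  local current="${1:-$(cat /proc/sys/vm/max_map_count 2>/dev/null)}"
  # Reject empty or non-numeric values before comparing.
  [[ "$current" =~ ^[0-9]+$ ]] && (( current >= required ))
}

# Create the database, storage, log and config directories from step 2
# under a root directory (default: /opt/trains). Writing to /opt/trains
# needs root permissions.
create_trains_dirs() {
  local root="${1:-/opt/trains}"
  mkdir -p "$root"/data/{elastic,mongo/db,mongo/configdb,redis,fileserver} \
           "$root"/{logs,config}
}
```

For example, run `create_trains_dirs /opt/trains` as root, and `max_map_count_ok || echo "vm.max_map_count is too low"` before starting the containers.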

## Optional Configuration

The **trains-server** default configuration can be easily overridden using external configuration files. By default, the server looks for these files in `/opt/trains/config`.
To apply a new configuration, you must restart the server (see [Restarting trains-server](#restart-server)).

### Adding Web Login Authentication

By default, anyone can log in to the **trains-server** Web-App.
You can configure the **trains-server** to allow only a specific set of users to access the system.
Enable this feature by placing an `apiserver.conf` file under `/opt/trains/config`.

Sample fixed users configuration file `/opt/trains/config/apiserver.conf`:

    auth {
        # Fixed users login credentials
        # No other user will be able to log in
        fixed_users {
            enabled: true
            users: [
                {
                    username: "jane"
                    password: "12345678"
                    name: "Jane Doe"
                },
                {
                    username: "john"
                    password: "12345678"
                    name: "John Doe"
                },
            ]
        }
    }

To apply the `apiserver.conf` changes, you must restart the *trains-apiserver* Docker container (see [Restarting trains-server](#restart-server)).
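
If you prefer to generate this file rather than edit it by hand, the sketch below writes a fixed-users `apiserver.conf` in the same layout as the sample. `write_fixed_users_conf` is a hypothetical helper; it does no escaping, so it assumes usernames and passwords contain no `:` or `"` characters and names contain no `"`, and, like the sample, it stores passwords as plain text.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of trains-server): write a fixed-users
# apiserver.conf into CONF_DIR. Each user is given as "username:password:Full Name".
# No escaping is done: ':' and '"' in usernames/passwords are not supported.
write_fixed_users_conf() {
  local conf_dir="$1"; shift
  local out="$conf_dir/apiserver.conf"
  local entry username password name
  {
    echo 'auth {'
    echo '    # Fixed users login credentials'
    echo '    # No other user will be able to log in'
    echo '    fixed_users {'
    echo '        enabled: true'
    echo '        users: ['
    for entry in "$@"; do
      # Split on ':'; the last field (name) keeps any remaining text, including spaces.
      IFS=: read -r username password name <<< "$entry"
      echo '            {'
      echo "                username: \"$username\""
      echo "                password: \"$password\""
      echo "                name: \"$name\""
      echo '            },'
    done
    echo '        ]'
    echo '    }'
    echo '}'
  } > "$out"
}
```

For example, `write_fixed_users_conf /opt/trains/config "jane:12345678:Jane Doe" "john:12345678:John Doe"` reproduces the sample above; restart the *trains-apiserver* container afterwards.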

### Configuring the Non-Responsive Experiments Watchdog

The non-responsive experiment watchdog monitors experiments that were not updated for a given period of time
and marks them as `aborted`. The watchdog is always active, with a default inactivity threshold of 7200 seconds (2 hours).

To change the watchdog's timeouts, place a `services.conf` file under `/opt/trains/config`.

Sample watchdog configuration file `/opt/trains/config/services.conf`:

    tasks {
        non_responsive_tasks_watchdog {
            # In-progress tasks that haven't been updated for at least 'threshold_sec' seconds will be stopped by the watchdog
            threshold_sec: 7200

            # Watchdog will sleep for this number of seconds after each cycle
            watch_interval_sec: 900
        }
    }

To apply the `services.conf` changes, you must restart the *trains-apiserver* Docker container (see [Restarting trains-server](#restart-server)).
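
A similar sketch for the watchdog: the hypothetical `write_watchdog_conf` helper writes `services.conf` with the given threshold and interval (defaulting to 7200 and 900 seconds) and rejects values that are not positive integers. Assuming the watchdog checks all in-progress tasks once per cycle, a stalled experiment would be marked `aborted` between `threshold_sec` and roughly `threshold_sec + watch_interval_sec` seconds after its last update, i.e. about 7200 to 8100 seconds with the defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of trains-server): write services.conf with
# custom watchdog timeouts into CONF_DIR.
# Usage: write_watchdog_conf CONF_DIR [THRESHOLD_SEC] [INTERVAL_SEC]
write_watchdog_conf() {
  local conf_dir="$1" threshold_sec="${2:-7200}" interval_sec="${3:-900}"
  local v
  # Both values must be positive integers (no units such as "2h").
  for v in "$threshold_sec" "$interval_sec"; do
    if ! [[ "$v" =~ ^[1-9][0-9]*$ ]]; then
      echo "invalid number of seconds: '$v'" >&2
      return 1
    fi
  done
  cat > "$conf_dir/services.conf" <<EOF
tasks {
    non_responsive_tasks_watchdog {
        # In-progress tasks that haven't been updated for at least threshold_sec seconds will be stopped by the watchdog
        threshold_sec: ${threshold_sec}

        # Watchdog will sleep for this number of seconds after each cycle
        watch_interval_sec: ${interval_sec}
    }
}
EOF
}
```

For example, `write_watchdog_conf /opt/trains/config 14400 600` sets a 4-hour threshold checked every 10 minutes.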

### Restarting trains-server <a name="restart-server"></a>

To restart the **trains-server**, you must first stop and remove the containers, and then restart.

1. Restarting docker-compose containers:

       $ docker-compose down
       $ docker-compose up

1. Manually restarting the Docker containers: see the [instructions](docs/docker_setup.md#launch).
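
After a restart it can help to wait until the services accept connections again. Below is a minimal bash sketch; `wait_for_port` is hypothetical, relies on bash's `/dev/tcp` redirection, and assumes the host refuses closed ports quickly (true for `localhost`, while a firewalled remote host may make each attempt hang until the TCP connect timeout).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of trains-server): wait until HOST:PORT accepts
# TCP connections, giving up after TIMEOUT_SEC seconds (default 120).
# Requires bash, since /dev/tcp is a bash redirection feature.
wait_for_port() {
  local host="$1" port="$2" timeout_sec="${3:-120}"
  local deadline=$(( SECONDS + timeout_sec ))
  while (( SECONDS < deadline )); do
    # Try to open the connection in a subshell so the descriptor is closed immediately.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}
```

For example: `docker-compose down && docker-compose up -d && wait_for_port localhost 8080 180`. Note that `-d` runs the containers in the background, unlike the foreground `docker-compose up` shown above.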

## Configuring **TRAINS** client

Once you have installed the **trains-server**, make sure to configure the **TRAINS** [client](https://github.com/allegroai/trains)
to use your locally installed server (and not the demo server).

- Run the `trains-init` command for an interactive setup
- Or manually edit the `~/trains.conf` file, making sure the `api_server` value is configured correctly, for example:

            api {
                # API server on port 8008
                api_server: "http://localhost:8008"

                # web_server on port 8080
                web_server: "http://localhost:8080"

                # file server on port 8081
                files_server: "http://localhost:8081"
            }

Note that if you set up **trains-server** in a sub-domain configuration, there is no need to specify a port number;
it will be inferred from the http/s scheme.

See [Installing and Configuring TRAINS](https://github.com/allegroai/trains#configuration) for more details.
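
To double-check which server the client points at, the sketch below prints the `api_server` value from `~/trains.conf`. `get_api_server` is a hypothetical helper that does a simple line-based text match, not full HOCON parsing, so it assumes the value sits on one line as in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of TRAINS): print the first api_server value
# found in a trains.conf file (default: ~/trains.conf).
# Matches lines like `api_server: "http://localhost:8008"` or `api_server = http://...`;
# this is a plain text match, not a HOCON parser.
get_api_server() {
  local conf="${1:-$HOME/trains.conf}"
  sed -n -E 's/^[[:space:]]*api_server[[:space:]]*[:=][[:space:]]*"?([^"[:space:]]+)"?.*$/\1/p' "$conf" \
    | head -n 1
}
```

For example: `[ "$(get_api_server)" = "http://localhost:8008" ] || echo "TRAINS is not using the local server"`.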

## What next?

Now that the **trains-server** is installed, and TRAINS is configured to use it,
you can [use](https://github.com/allegroai/trains#using-trains) TRAINS in your experiments and view them in the web server,
for example http://localhost:8080.

## Upgrading <a name="upgrade"></a>

We are constantly updating, improving and adding to the **trains-server**.
New releases will include new pre-built Docker images.
When we release a new version and include a new pre-built Docker image for it, upgrade as follows:

* Upgrading your docker-compose installation

    * Shut down the Docker containers:

        ```bash
        $ docker-compose down
        ```

    * We highly recommend backing up your data directory before upgrading
      (see **Step 2** in the Manual Docker upgrade).
    * Spin up the Docker containers. `docker-compose up` only pulls images that are not already present locally,
      so run `docker-compose pull` first to fetch the latest **trains-server** build:

        ```bash
        $ docker-compose pull
        $ docker-compose up
        ```

    * In case of a Docker error: "... The container name "/trains-???" is already in use by ...",
      try removing the old containers. Note that the following command force-removes **all** containers on the machine, not only the **trains-server** ones:

        ```bash
        $ docker rm -f $(docker ps -a -q)
        ```

* Manual Docker upgrade

    1. Shut down and remove each of your Docker containers using the following commands:

        ```bash
        $ sudo docker stop <docker-name>
        $ sudo docker rm -v <docker-name>
        ```

        The Docker names are (see [Launching Docker Containers](#launch-docker)):
        * `trains-elastic`
        * `trains-mongo`
        * `trains-redis`
        * `trains-fileserver`
        * `trains-apiserver`
        * `trains-webserver`

    2. We highly recommend backing up your data directory. A simple way to do that is using `tar`.
        For example, if your data directory is `/opt/trains/data`, use the following command:

        ```bash
        $ sudo tar czvf ~/trains_backup.tgz /opt/trains/data
        ```

        This backs up all data to an archive in your home directory.

        To restore this example backup, use the following commands. `tar` stores the paths without the leading `/`, so extract the archive relative to `/`:

        ```bash
        $ sudo rm -R /opt/trains/data
        $ sudo tar -xzf ~/trains_backup.tgz -C /
        ```

    3. Pull the new **trains-server** Docker image using the following command:

        ```bash
        $ sudo docker pull allegroai/trains:latest
        ```

        If you wish to pull a different version, replace `latest` with the required version number, for example:

        ```bash
        $ sudo docker pull allegroai/trains:0.11.0
        ```

    4. Launch the newly released Docker image (see [Launching Docker Containers](#launch-docker)).
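
For repeated upgrades, timestamped archives avoid overwriting the previous backup. This is a minimal sketch of hypothetical `backup_trains_data` / `restore_trains_data` helpers; unlike the `tar` example above, they store paths relative to the data directory's parent, so archives from the two methods are not interchangeable. Run them with enough permissions to read and write `/opt/trains/data` (for example as root), and only while the containers are stopped.

```bash
#!/usr/bin/env bash
# Hypothetical backup/restore helpers (not part of trains-server).
# Archive members are stored relative to the data directory's parent
# (e.g. "data/mongo/..."), so restore must target the same parent directory.

# Create a timestamped archive of DATA_DIR in DEST_DIR and print its path.
backup_trains_data() {
  local data_dir="${1:-/opt/trains/data}" dest_dir="${2:-$HOME}"
  local archive="${dest_dir}/trains_backup_$(date +%Y%m%d_%H%M%S).tgz"
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1
  echo "$archive"
}

# Replace DATA_DIR with the contents of ARCHIVE.
restore_trains_data() {
  local archive="$1" data_dir="${2:-/opt/trains/data}"
  [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
  # List the archive first so a corrupt file is detected before anything is deleted.
  tar -tzf "$archive" > /dev/null || return 1
  rm -rf "$data_dir"
  tar -xzf "$archive" -C "$(dirname "$data_dir")"
}
```

For example, `archive="$(backup_trains_data /opt/trains/data ~)"` before the upgrade, and `restore_trains_data "$archive" /opt/trains/data` if you need to roll back.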

## Community & Support

If you have any questions, see the TRAINS-server [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md), or
tag your questions on [Stack Overflow](https://stackoverflow.com/questions/tagged/trains) with the '**trains**' tag.
For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains-server/issues).
Additionally, you can always find us at *trains@allegro.ai*

## License

[Server Side Public License v1.0](https://github.com/mongodb/mongo/blob/master/LICENSE-Community.txt)

**trains-server** relies on both [MongoDB](https://github.com/mongodb/mongo) and [ElasticSearch](https://github.com/elastic/elasticsearch).
With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our responsibility as a
member of the community to support the projects we love and cherish.
We believe the cause for the license change in both cases is more than justified,
and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more general and flexible of the two licenses.

This is our way to say - we support you guys!