# Trains Server
## Auto-Magical Experiment Manager & Version Control for AI - ε Devops Included!

[![GitHub license](https://img.shields.io/badge/license-SSPL-green.svg)](https://img.shields.io/badge/license-SSPL-green.svg)
[![Python versions](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)
[![GitHub version](https://img.shields.io/github/release-pre/allegroai/trains-server.svg)](https://img.shields.io/github/release-pre/allegroai/trains-server.svg)
[![PyPI status](https://img.shields.io/badge/status-beta-yellow.svg)](https://img.shields.io/badge/status-beta-yellow.svg)

### Help improve Trains by filling out our 2-min [user survey](https://allegro.ai/lp/trains-user-survey/)

## :rocket: Trains-Agent Services is now included. For more information, see [services](https://github.com/allegroai/trains-server#services)

## v0.16 Upgrade Notice

In v0.16, the Elasticsearch subsystem of Trains Server has been upgraded from version 5.6 to version 7.6. This change necessitates the migration of the database contents to accommodate the change in index structure across the different versions.

Follow [this procedure](https://allegro.ai/docs/deploying_trains/trains_server_es7_migration/) to migrate existing data.

## Introduction

The **trains-server** is the backend service infrastructure for [Trains](https://github.com/allegroai/trains).
It allows multiple users to collaborate and manage their experiments.
By default, **Trains** is set up to work with the **Trains** demo server, which is open to anyone and resets periodically.
To host your own server, launch **trains-server** and point **Trains** to it.

**trains-server** contains the following components:

* The **Trains** Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
    * Documenting and logging experiment information, statistics and results
    * Querying experiments history, logs and results
* Locally-hosted file server for storing images and models making them easily accessible using the Web-App

You can quickly [deploy](#launching-trains-server) your **trains-server** using Docker, AWS EC2 AMI, or Kubernetes.

## System design

![Alt Text](https://github.com/allegroai/trains/blob/master/docs/system_diagram.png?raw=true)

**trains-server** has two supported configurations:
- Single IP (domain) with the following open ports
    - Web application on port 8080
    - API service on port 8008
    - File storage service on port 8081

- Sub-Domain configuration with default http/s ports (80 or 443)
    - Web application on sub-domain: app.\*.\*
    - API service on sub-domain: api.\*.\*
    - File storage service on sub-domain: files.\*.\*

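As a minimal sketch of how the two configurations map to service addresses, the helper below prints the URL of each service. The function name `trains_service_url`, its argument order, and the domain `example.com` are hypothetical and used only for illustration; they are not part of **trains-server**.

```bash
# Hypothetical helper: print the URL of a trains-server service for
# either supported configuration. Not part of trains-server itself.
#   $1 = service: web | api | files
#   $2 = mode:    single-ip | sub-domain
#   $3 = host (single-ip) or base domain (sub-domain)
#   $4 = scheme, http or https (optional, defaults to http)
trains_service_url() {
    service=$1 mode=$2 host=$3 scheme=${4:-http}
    case "$mode" in
        single-ip)
            # One address, one dedicated port per service.
            case "$service" in
                web)   port=8080 ;;
                api)   port=8008 ;;
                files) port=8081 ;;
                *) echo "unknown service: $service" >&2; return 1 ;;
            esac
            echo "$scheme://$host:$port"
            ;;
        sub-domain)
            # One sub-domain per service, default port of the scheme.
            case "$service" in
                web)   sub=app ;;
                api)   sub=api ;;
                files) sub=files ;;
                *) echo "unknown service: $service" >&2; return 1 ;;
            esac
            echo "$scheme://$sub.$host"
            ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

trains_service_url api single-ip localhost
trains_service_url files sub-domain example.com https
```

The two calls at the end print `http://localhost:8008` and then `https://files.example.com`.
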
## Launching trains-server
### Prerequisites

The ports 8080/8081/8008 must be available for the **trains-server** services.

For example, to see if port `8080` is in use:

* Linux or macOS:

        sudo lsof -Pn -i4 | grep :8080 | grep LISTEN

* Windows:

        netstat -an |find /i "8080"

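The same check can be repeated for all three ports. The following is a rough sketch for Linux or macOS, assuming the listing format printed by `lsof -Pn -i4`; the function names `listening_on` and `report_trains_ports` are hypothetical.

```bash
# Hypothetical helper: succeed if the listing on stdin (in the format
# printed by `lsof -Pn -i4`) shows a socket listening on port $1.
listening_on() {
    grep -q ":$1 (LISTEN)"
}

# Hypothetical helper: report the state of every trains-server port.
#   $1 = captured output of `sudo lsof -Pn -i4`
report_trains_ports() {
    for port in 8080 8081 8008; do
        if printf '%s\n' "$1" | listening_on "$port"; then
            echo "port $port: in use"
        else
            echo "port $port: free"
        fi
    done
}

# Usage (Linux or macOS):
#   report_trains_ports "$(sudo lsof -Pn -i4)"
```

Capturing the listing once and passing it in means `lsof` runs a single time rather than once per port.
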
### Launching

Launch **trains-server** in any of the following formats:

- Pre-built [AWS EC2 AMI](https://allegro.ai/docs/deploying_trains/trains_server_aws_ec2_ami/)
- Pre-built [GCP Custom Image](https://allegro.ai/docs/deploying_trains/trains_server_gcp/)
- Pre-built Docker Image
    - [Linux](https://allegro.ai/docs/deploying_trains/trains_server_linux_mac/)
    - [macOS](https://allegro.ai/docs/deploying_trains/trains_server_linux_mac/)
    - [Windows 10](https://allegro.ai/docs/deploying_trains/trains_server_win/)
- Kubernetes
    - [Kubernetes Helm](https://allegro.ai/docs/deploying_trains/trains_server_kubernetes_helm/)
    - Manual [Kubernetes installation](https://allegro.ai/docs/deploying_trains/trains_server_kubernetes/)

## Connecting Trains to your trains-server

By default, the **Trains** client is set up to work with the [**Trains** demo server](https://demoapp.trains.allegro.ai/).
To have the **Trains** client use your **trains-server** instead:

- Run the `trains-init` command for an interactive setup.
- Or manually edit the `~/trains.conf` file, making sure the server settings (`api_server`, `web_server`, `files_server`) are configured correctly, for example:

        api {
            # API server on port 8008
            api_server: "http://localhost:8008"

            # web_server on port 8080
            web_server: "http://localhost:8080"

            # file server on port 8081
            files_server: "http://localhost:8081"
        }

**Note**: If you have set up **trains-server** in a sub-domain configuration, then there is no need to specify a port number;
it will be inferred from the http/s scheme.
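
After editing the file by hand, a quick way to confirm that all three server settings are present is to search for them. This is only a sketch: the function name `missing_server_settings` is hypothetical, and the check only looks for uncommented lines that assign each key; it does not validate the URLs.

```bash
# Hypothetical helper: print each server setting that is missing from
# a trains.conf-style file, and fail if any is missing.
#   $1 = path to the configuration file (for example ~/trains.conf)
missing_server_settings() {
    status=0
    for key in api_server web_server files_server; do
        # Look for a non-comment line that assigns the key with ':' or '='.
        if ! grep -Eq "^[[:space:]]*$key[[:space:]]*[:=]" "$1"; then
            echo "$key"
            status=1
        fi
    done
    return $status
}

# Usage:
#   missing_server_settings ~/trains.conf || echo "add the settings listed above"
```
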

After launching the **trains-server** and configuring the **Trains** client to use the **trains-server**,
you can [use](https://github.com/allegroai/trains#using-trains) **Trains** in your experiments and view them in your **trains-server** web server,
for example http://localhost:8080.

For more information about the Trains client, see [**Trains**](https://github.com/allegroai/trains).

## Trains-Agent Services <a name="services"></a>

As of version 0.15 of **trains-server**, dockerized deployment includes a **Trains-Agent Services** container running as
part of the docker container collection.

Trains-Agent Services is an extension of Trains-Agent that provides the ability to launch long-lasting jobs
that previously had to be executed on local / dedicated machines. It allows a single agent to
launch multiple dockers (Tasks) for different use cases. To name a few use cases: auto-scaler service (spinning up instances
when the need arises and the budget allows), Controllers (implementing pipelines and more sophisticated DevOps logic),
Optimizer (such as Hyper-parameter Optimization or sweeping), and Application (such as interactive Bokeh apps for
increased data transparency).

The Trains-Agent Services container will spin up **any** task enqueued into the dedicated `services` queue.
Every task launched by Trains-Agent Services will be registered as a new node in the system,
providing tracking and transparency capabilities.
You can also run the Trains-Agent Services manually; see details in [trains-agent services mode](https://github.com/allegroai/trains-agent#trains-agent-services-mode-).

**Note**: It is the user's responsibility to make sure the proper tasks are pushed into the `services` queue.
Do not enqueue training / inference tasks into the `services` queue, as it will put unnecessary load on the server.

## Advanced Functionality

**trains-server** provides a few additional useful features, which can be manually enabled:

* [Web login authentication](https://allegro.ai/docs/faq/faq/#web-auth)
* [Non-responsive experiments watchdog](https://allegro.ai/docs/faq/faq/#watchdog)

## Restarting trains-server

To restart the **trains-server**, you must first stop the containers, and then restart them.

```bash
docker-compose down
docker-compose -f docker-compose.yml up
```

## Upgrading <a name="upgrade"></a>

**trains-server** releases are also reflected in the [docker compose configuration file](https://github.com/allegroai/trains-server/blob/master/docker-compose.yml).
We strongly encourage you to keep your **trains-server** up to date, by keeping up with the current release.

**Note**: The following upgrade instructions use the Linux OS as an example.

To upgrade your existing **trains-server** deployment:

1. Shut down the docker containers

    ```bash
    docker-compose down
    ```

1. We highly recommend backing up your data directory before upgrading.

    Assuming your data directory is `/opt/trains/data`, to archive all data into `~/trains_backup.tgz` execute:

    ```bash
    sudo tar czvf ~/trains_backup.tgz /opt/trains/data
    ```

    <details>
    <summary>Restore instructions:</summary>

    To restore this example backup, execute:

    ```bash
    sudo rm -R /opt/trains/data
    # tar stored the paths without the leading "/", so extract relative to "/"
    sudo tar -xzf ~/trains_backup.tgz -C /
    ```
    </details>

1. Download the latest `docker-compose.yml` file.

    ```bash
    curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
    ```

1. Configure the Trains-Agent Services (not supported on Windows installation).

    If `TRAINS_HOST_IP` is not provided, Trains-Agent Services will use the external
    public address of the **trains-server**. If `TRAINS_AGENT_GIT_USER` / `TRAINS_AGENT_GIT_PASS` are not provided,
    the Trains-Agent Services will not be able to access any private repositories for running service tasks.

    ```bash
    export TRAINS_HOST_IP=server_host_ip_here
    export TRAINS_AGENT_GIT_USER=git_username_here
    export TRAINS_AGENT_GIT_PASS=git_password_here
    ```

1. Spin up the docker containers. The `pull` command first downloads the latest **trains-server** build.

    ```bash
    docker-compose -f docker-compose.yml pull
    docker-compose -f docker-compose.yml up
    ```

**\* If something went wrong along the way, check our FAQ: [Common Docker Upgrade Errors](https://allegro.ai/docs/faq/faq/#common-docker-upgrade-errors).**

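The upgrade steps can also be chained into one script. The following is a sketch under stated assumptions rather than an official upgrade tool: the `run` wrapper, the `DRY_RUN` variable, and the `upgrade_trains_server` function are hypothetical names, the paths follow the Linux example in this section, and any Trains-Agent Services variables are expected to be exported beforehand.

```bash
# Hypothetical wrapper: echo the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch of the upgrade steps, run from the directory that holds
# docker-compose.yml. Each step runs only if the previous one succeeded.
upgrade_trains_server() {
    compose_url=https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml
    run docker-compose down &&
    run sudo tar czvf "$HOME/trains_backup.tgz" /opt/trains/data &&
    run curl "$compose_url" -o docker-compose.yml &&
    run docker-compose -f docker-compose.yml pull &&
    run docker-compose -f docker-compose.yml up
}

# Preview the commands without changing anything:
#   DRY_RUN=1 upgrade_trains_server
```

Chaining the steps with `&&` stops the upgrade at the first failure, so the containers are not restarted if, for instance, the backup could not be written.
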
## Community & Support
2020-08-10 20:48:38 +00:00
If you have any questions, look to the Trains [FAQ ](https://allegro.ai/docs/faq/faq/ ), or
2019-08-01 16:36:58 +00:00
tag your questions on [stackoverflow ](https://stackoverflow.com/questions/tagged/trains ) with '**trains**' tag.
For feature requests or bug reports, please use [GitHub issues ](https://github.com/allegroai/trains-server/issues ).
Additionally, you can always find us at *trains@allegro.ai*
2019-06-10 21:24:35 +00:00
## License

[Server Side Public License v1.0](https://github.com/mongodb/mongo/blob/master/LICENSE-Community.txt)

**trains-server** relies on both [MongoDB](https://github.com/mongodb/mongo) and [Elasticsearch](https://github.com/elastic/elasticsearch).
With the recent changes in both MongoDB's and Elasticsearch's OSS licenses, we feel it is our responsibility as a
member of the community to support the projects we love and cherish.
We believe the cause for the license change in both cases is more than just,
and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more general and flexible of the two licenses.

This is our way to say - we support you guys!