# TRAINS Server

## Magic Version Control & Experiment Manager for AI

## Introduction

The **trains-server** is the infrastructure behind [trains](https://github.com/allegroai/trains).

The server provides:

* UI (single-page webapp) for experiment management and browsing
* REST interface for documenting and logging experiment information, statistics and results
* REST interface for querying experiment history, logs and results
* Locally hosted file server for storing images and models, making them easily accessible from the UI

The server is designed to allow multiple users to collaborate and manage their experiments.
The server’s code is freely available [here](https://github.com/allegroai/trains-server).
We've also pre-built a docker image to allow **trains** users to quickly set up their own server.

## System diagram
The **trains-server** runs as three Docker containers on a single host, which keep their data in host storage. Training code on a GPU machine reaches the server over HTTP:

* **Server Docker** (Pythonic server):
  * Web server on port `8080`
  * API server on port `8008`, connected to ElasticSearch and MongoDB
  * File server on port `8081`
* **Elastic Docker**: ElasticSearch
* **Mongo Docker**: MongoDB
* **Host Storage**: used by ElasticSearch, MongoDB and the file server
* **GPU Machine**: training code using **TRAINS**, configured via `~/trains.conf`

## Installation

In order to install and run the pre-built **trains-server**, you must be logged in as a user with sudo privileges.

### Setup

In order to run the pre-packaged **trains-server**, you'll need to install **docker**.

#### Install docker

On Ubuntu and Debian, the Docker engine package is `docker.io` (the package named `docker` is not the Docker engine):

```bash
sudo apt-get install docker.io
```

#### Setup docker daemon

In order to run the ElasticSearch docker container, you'll need to change some of the default values in the Docker configuration file.
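Which of the two configuration styles below applies depends on which file exists on your system. The following is a minimal sketch, not part of **trains-server**, that reports which one is present; the helper name `detect_docker_config` and its optional root-prefix argument (there only to make it testable) are assumptions. If neither file exists, `/etc/docker/daemon.json` can usually be created, since it is the Docker daemon's default configuration file on Linux.

```bash
#!/usr/bin/env bash
# detect_docker_config [ROOT]: hypothetical helper (not part of trains-server).
# Prints which Docker daemon configuration file exists under ROOT
# (default: the real filesystem root). daemon.json is checked first.
# Returns 1 and prints "none" if neither file is present.
detect_docker_config() {
  local root="${1:-}"
  if [ -f "$root/etc/docker/daemon.json" ]; then
    echo "daemon.json"
  elif [ -f "$root/etc/sysconfig/docker" ]; then
    echo "sysconfig"
  else
    echo "none"
    return 1
  fi
}
```

Running `detect_docker_config` with no arguments prints `daemon.json`, `sysconfig` or `none`, telling you which of the instructions below to follow.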
For systems with an `/etc/sysconfig/docker` file, add the options in quotes to the available arguments in `OPTIONS`:

```bash
OPTIONS="--default-ulimit nofile=1024:65536 --default-ulimit memlock=-1:-1"
```

For systems with an `/etc/docker/daemon.json` file, add the following `default-ulimits` entry to the top-level JSON object:

```json
"default-ulimits": {
    "nofile": {
        "name": "nofile",
        "hard": 65536,
        "soft": 1024
    },
    "memlock": {
        "name": "memlock",
        "soft": -1,
        "hard": -1
    }
}
```

Following this configuration change, you will have to restart the docker daemon:

```bash
sudo service docker stop
sudo service docker start
```

#### vm.max_map_count

The `vm.max_map_count` kernel setting must be at least 262144.

The following example was tested with CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19:

```bash
# Persist the setting across reboots
sudo echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
# Apply the setting immediately
sudo sysctl -w vm.max_map_count=262144
```

For additional information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.

#### Choose a data folder

You will need to choose a directory on your system in which **trains-server** will store all of its data (including the databases, uploaded files and logs).

The following instructions assume the directory is `/opt/trains`.
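The `chown` in the commands that follow assigns the data folder to UID/GID `1000`, presumably because the ElasticSearch container writes its data as that user. Once the folder exists, a small helper like this one can confirm the ownership took effect. It is a sketch, not part of **trains-server**; the name `check_owner` is hypothetical, and it relies on GNU `stat -c`, as found on common Linux distributions.

```bash
#!/usr/bin/env bash
# check_owner DIR [EXPECTED]: hypothetical helper (not part of trains-server).
# Succeeds if DIR is owned by EXPECTED, given as "UID:GID"
# (default 1000:1000, matching the chown used for the data folder).
# Returns 1 on an ownership mismatch, 2 if DIR cannot be inspected.
check_owner() {
  local dir="$1" expected="${2:-1000:1000}" actual
  # GNU stat: %u = owner UID, %g = owner GID
  actual="$(stat -c '%u:%g' "$dir")" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "ok: $dir is owned by $actual"
  else
    echo "mismatch: $dir is owned by $actual, expected $expected" >&2
    return 1
  fi
}
```

For example, `check_owner /opt/trains/data/elastic` should print an `ok:` line after the commands below have run.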
Issue the following commands:

```bash
sudo mkdir -p /opt/trains/data/elastic && sudo chown -R 1000:1000 /opt/trains
```

### Launching docker images

To launch the docker images, issue the following commands:

```bash
sudo docker run -d --restart="always" --name="trains-elastic" \
    -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" \
    -e "bootstrap.memory_lock=true" \
    -e "cluster.name=trains" \
    -e "discovery.zen.minimum_master_nodes=1" \
    -e "node.name=trains" \
    -e "script.inline=true" \
    -e "script.update=true" \
    -e "thread_pool.bulk.queue_size=2000" \
    -e "thread_pool.search.queue_size=10000" \
    -e "xpack.security.enabled=false" \
    -e "xpack.monitoring.enabled=false" \
    -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" \
    -e "node.ingest=true" \
    -e "http.compression_level=7" \
    -e "reindex.remote.whitelist=*.*" \
    -e "script.painless.regex.enabled=true" \
    --network="host" \
    -v /opt/trains/data/elastic:/usr/share/elasticsearch/data \
    docker.elastic.co/elasticsearch/elasticsearch:5.6.16
```

```bash
sudo docker run -d --restart="always" --name="trains-mongo" \
    -v /opt/trains/data/mongo/db:/data/db \
    -v /opt/trains/data/mongo/configdb:/data/configdb \
    --network="host" \
    mongo:3.6.5
```

```bash
sudo docker run -d --restart="always" --name="trains-fileserver" --network="host" \
    -v /opt/trains/logs:/var/log/trains \
    -v /opt/trains/data/fileserver:/mnt/fileserver \
    allegroai/trains:latest fileserver
```

```bash
sudo docker run -d --restart="always" --name="trains-apiserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest apiserver
```

```bash
sudo docker run -d --restart="always" --name="trains-webserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest webserver
```

Once the **trains-server** containers are up, the following are available:

* API server on port `8008`
* Web server on port `8080`
* File server on port `8081`

## Upgrade

We are constantly updating and adding features. When we release a new version, we’ll include a new pre-built docker image.
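Before upgrading, you may want to check whether a container is still running an older image than the one currently tagged `allegroai/trains:latest`. A minimal sketch, assuming you have first fetched the newest image with `sudo docker pull allegroai/trains:latest`: it compares the image ID a container was created from (`docker inspect --format '{{.Image}}'`) with the ID the tag now points to (`docker image inspect --format '{{.Id}}'`). The helper name `image_is_current` is hypothetical and not part of **trains-server**.

```bash
#!/usr/bin/env bash
# image_is_current CONTAINER [IMAGE]: hypothetical helper (not part of trains-server).
# Succeeds (exit 0) if CONTAINER was created from the image IMAGE currently points to.
# Returns 1 if the container runs a different (older) image, 2 if a lookup fails.
image_is_current() {
  local container="$1" image="${2:-allegroai/trains:latest}" running latest
  # Image ID the container was created from
  running="$(sudo docker inspect --format '{{.Image}}' "$container")" || return 2
  # Image ID the tag points to right now (e.g. after a docker pull)
  latest="$(sudo docker image inspect --format '{{.Id}}' "$image")" || return 2
  [ "$running" = "$latest" ]
}
```

Only the `trains-fileserver`, `trains-apiserver` and `trains-webserver` containers use `allegroai/trains:latest`; for example, `image_is_current trains-apiserver || echo "trains-apiserver runs an outdated image"`.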
Once a new release is out, you can simply:

1. Shut down and remove your docker instances. Each instance can be shut down and removed using the following commands:

```bash
sudo docker stop