TRAINS Server
Magic Version Control & Experiment Manager for AI
Introduction
The trains-server is the infrastructure behind trains.
The server provides:
- UI (single-page webapp) for experiment management and browsing
- REST interface for documenting and logging experiment information, statistics and results
- REST interface for querying experiments history, logs and results
- Locally-hosted fileserver, for storing images and models to be easily accessible from the UI
The server is designed to allow multiple users to collaborate and manage their experiments. The server’s code is freely available here. We've also pre-built a docker image to allow trains users to quickly set up their own server.
System diagram
TRAINS-server
+--------------------------------------------------------------------+
|                                                                    |
|  Server Docker                   Elastic Docker     Mongo Docker   |
|  +-------------------------+     +---------------+  +------------+ |
|  |     Pythonic Server     |     |               |  |            | |
|  |   +-----------------+   |     | ElasticSearch |  |  MongoDB   | |
|  |   |   WEB server    |   |     |               |  |            | |
|  |   |   Port 8080     |   |     |               |  |            | |
|  |   +--------+--------+   |     |               |  |            | |
|  |            |            |     |               |  |            | |
|  |   +--------+--------+   |     |               |  |            | |
|  |   |   API server    +----------------------------+            | |
|  |   |   Port 8008     +---------+               |  |            | |
|  |   +-----------------+   |     +-------+-------+  +-----+------+ |
|  |                         |             |                |        |
|  |   +-----------------+   |         +---+----------------+------+ |
|  |   |   File Server   +-------+     |        Host Storage       | |
|  |   |   Port 8081     |   |   +-----+                           | |
|  |   +-----------------+   |         +---------------------------+ |
|  +------------+------------+                                       |
+---------------|----------------------------------------------------+
                |HTTP
                +--------+
GPU Machine              |
+------------------------|-------------------------------------------+
|     +------------------|--------------+                            |
|     |  Training        |              |    +---------------------+ |
|     |  Code        +---+------------+ |    | trains configuration| |
|     |              | TRAINS         | |    | ~/trains.conf       | |
|     |              |                +------+                     | |
|     |              +----------------+ |    +---------------------+ |
|     +---------------------------------+                            |
+--------------------------------------------------------------------+
Installation
In order to install and run the pre-built trains-server, you must be logged in as a user with sudo privileges.
Setup
In order to run the pre-packaged trains-server, you'll need to install docker.
Install docker
sudo apt-get install docker.io
Setup docker daemon
In order to run the ElasticSearch docker container, you'll need to change some of the default values in the Docker configuration file.
For systems with an /etc/sysconfig/docker file, add the options in quotes to the available arguments in OPTIONS:
OPTIONS="--default-ulimit nofile=1024:65536 --default-ulimit memlock=-1:-1"
For systems with an /etc/docker/daemon.json file, add the section in curly brackets to default-ulimits:
"default-ulimits": {
    "nofile": {
        "name": "nofile",
        "hard": 65536,
        "soft": 1024
    },
    "memlock": {
        "name": "memlock",
        "soft": -1,
        "hard": -1
    }
}
Following this configuration change, you will have to restart the docker daemon:
sudo service docker stop
sudo service docker start
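One way to confirm that the new limits took effect after the restart is to read them from inside a throwaway container. The sketch below is not part of trains-server: the busybox image and the helper names nofile_ok and container_nofile are assumptions made for illustration.

```shell
# Hypothetical helper: succeed when a hard "nofile" limit is at least the
# 65536 configured above. A value of "unlimited" also counts as sufficient.
nofile_ok() {
    limit="$1"
    [ "$limit" = "unlimited" ] && return 0
    # Non-numeric or empty input makes the test fail rather than pass.
    [ "$limit" -ge 65536 ] 2>/dev/null
}

# Print the hard open-files limit as seen from inside a container.
# Assumes the docker daemon is running and the busybox image can be pulled.
container_nofile() {
    sudo docker run --rm busybox sh -c 'ulimit -H -n'
}
```

After sourcing these functions, running nofile_ok "$(container_nofile)" && echo "nofile limit OK" should print the message once the daemon has picked up the new defaults.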
vm.max_map_count
The vm.max_map_count kernel setting must be at least 262144.
The following example was tested with CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19:
echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144
For additional information about setting this parameter on other systems, see the elastic documentation.
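To check whether the setting is already high enough, you can compare the live kernel value with the required minimum. This is a minimal sketch, assuming a Linux host that exposes the value at /proc/sys/vm/max_map_count; the helper name map_count_ok is hypothetical.

```shell
REQUIRED_MAP_COUNT=262144

# Succeed when the given value meets the minimum.
# With no argument, the live kernel setting is read from /proc.
map_count_ok() {
    current="${1:-$(cat /proc/sys/vm/max_map_count 2>/dev/null)}"
    [ "$current" -ge "$REQUIRED_MAP_COUNT" ] 2>/dev/null
}

if map_count_ok; then
    echo "vm.max_map_count is sufficient"
else
    echo "vm.max_map_count is below $REQUIRED_MAP_COUNT (or could not be read)"
fi
```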
Choose a data folder
You will need to choose a directory on your system in which all data maintained by trains-server will be stored (among others, this includes database, uploaded files and logs).
The following instructions assume the directory is /opt/trains.
Issue the following commands:
sudo mkdir -p /opt/trains/data/elastic && sudo chown -R 1000:1000 /opt/trains
Launching docker images
To launch the docker images, issue the following commands:
sudo docker run -d --restart="always" --name="trains-elastic" -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16
sudo docker run -d --restart="always" --name="trains-mongo" -v /opt/trains/data/mongo/db:/data/db -v /opt/trains/data/mongo/configdb:/data/configdb --network="host" mongo:3.6.5
sudo docker run -d --restart="always" --name="trains-fileserver" --network="host" -v /opt/trains/logs:/var/log/trains -v /opt/trains/data/fileserver:/mnt/fileserver allegroai/trains:latest fileserver
sudo docker run -d --restart="always" --name="trains-apiserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest apiserver
sudo docker run -d --restart="always" --name="trains-webserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest webserver
Once the trains-server dockers are up, the following are available:
- API server on port 8008
- Web server on port 8080
- File server on port 8081
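A quick way to see whether the three services accept connections is to probe their ports. The sketch below assumes the server runs on the local machine (the TRAINS_HOST variable is an illustrative name, not a trains setting) and that bash and the coreutils timeout command are installed. An open port only shows that something is listening there, not that the service is healthy.

```shell
# Assumption: the server runs on this machine; set TRAINS_HOST otherwise.
TRAINS_HOST="${TRAINS_HOST:-localhost}"

# Succeed if a TCP connection to host $1, port $2 opens within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils "timeout" command.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report the state of the three ports listed above.
check_trains_ports() {
    for entry in "API server:8008" "Web server:8080" "File server:8081"; do
        name="${entry%%:*}"
        port="${entry##*:}"
        if port_open "$TRAINS_HOST" "$port"; then
            echo "$name is listening on port $port"
        else
            echo "$name is NOT reachable on port $port"
        fi
    done
}

check_trains_ports
```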
Upgrade
We are constantly updating trains-server and adding features. Each new release includes a new pre-built docker image. To upgrade to a new release:
- Shut down and remove your docker instances. Each instance can be shut down and removed using the following commands:
sudo docker stop <docker-name>
sudo docker rm -v <docker-name>
The docker names are (see Launching docker images):
trains-elastic
trains-mongo
trains-fileserver
trains-apiserver
trains-webserver
- Back up your data folder (recommended!). A simple way to do that is using this command:
sudo tar czvf ~/trains_backup.tgz /opt/trains/data
This backs up all data to an archive in your home folder. Because tar stores the paths without the leading /, restore such a backup by extracting it relative to the root directory:
sudo rm -R /opt/trains/data
sudo tar -xzf ~/trains_backup.tgz -C /
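- Launch the newly released docker image (see Launching docker images). Since docker run reuses a locally cached allegroai/trains:latest image, pull the new one first with sudo docker pull allegroai/trains:latest.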
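The first two upgrade steps can be scripted. The following is a minimal sketch rather than an official upgrade tool: the container names come from the launch commands above, while the function names and the archive check are assumptions made for illustration. The check relies on tar dropping the leading / from member names when the archive is created.

```shell
# Container names taken from the "Launching docker images" commands.
TRAINS_CONTAINERS="trains-elastic trains-mongo trains-fileserver trains-apiserver trains-webserver"

# Step 1: stop and remove every trains-server container.
remove_trains_containers() {
    for name in $TRAINS_CONTAINERS; do
        sudo docker stop "$name"
        sudo docker rm -v "$name"
    done
}

# Hypothetical check for step 2: succeed only if the archive is readable
# and contains entries under opt/trains/data/.
backup_looks_valid() {
    tar -tzf "$1" 2>/dev/null | grep -q '^opt/trains/data/'
}
```

For example, after creating the backup you could run backup_looks_valid ~/trains_backup.tgz && echo "backup OK" before deleting anything.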
License
Server Side Public License v1.0
trains-server relies heavily on both MongoDB and ElasticSearch. With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our job as a community to support the projects we love and cherish. We feel the cause for the license change in both cases is more than just, and chose SSPL because it is the more restrictive of the two.
This is our way to say - we support you guys!