Documentation

allegroai 2019-06-11 18:55:04 +03:00
parent 06b64cd13f
commit 3269199262
5 changed files with 150 additions and 102 deletions

141
README.md

@ -1,20 +1,25 @@
# TRAINS Server # TRAINS Server
## Magic Version Control & Experiment Manager for AI ## Magic Version Control & Experiment Manager for AI
## Introduction ## Introduction
The **trains-server** is the infrastructure behind [trains](https://github.com/allegroai/trains). The **trains-server** is the infrastructure for [trains](https://github.com/allegroai/trains).
It allows multiple users to collaborate and manage their experiments.
The server provides: The **trains-server** contains the following components:
* UI (single-page webapp) for experiment management and browsing * the Web-App which is a single-page UI for experiment management and browsing
* REST interface for documenting and logging experiment information, statistics and results * a REST interface for:
* REST interface for querying experiments history, logs and results * documenting and logging experiment information, statistics and results
* Locally-hosted fileserver, for storing images and models to be easily accessible from the UI * querying experiments history, logs and results
* a locally-hosted file server for storing images and models making them easily accessible using the Web-App
The server is designed to allow multiple users to collaborate and manage their experiments. You can quickly set up your **trains-server** using a pre-built Docker image (see [Installation](#installation)).
The servers code is freely available [here](https://github.com/allegroai/trains-server).
We've also pre-built a docker image to allow **trains** users to quickly set up their own server. When new releases are available, you can upgrade your pre-built Docker image (see [Upgrade](#upgrade)).
The **trains-server's** code is freely available [here](https://github.com/allegroai/trains-server).
## System diagram ## System diagram
@ -57,31 +62,61 @@ We've also pre-built a docker image to allow **trains** users to quickly set up
## Installation ## Installation
In order to install and run the pre-built **trains-server**, you must be logged in as a user with sudo privileges. This section contains the instructions to set up and launch a pre-built Docker image for the **trains-server**.
**Note**: This Docker image was tested only on Linux. For Windows users, we recommend running the server
on a Linux virtual machine.
### Prerequisites
You must be logged in as a user with sudo privileges.
### Setup ### Setup
In order to run the pre-packaged **trains-server**, you'll need to install **docker**. #### Step 1. Install Docker CE
#### Install docker You must install Docker to run the pre-packaged **trains-server**.
* For [Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/) / Mint (x86_64/amd64):
```bash ```bash
sudo apt-get install docker sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
. /etc/os-release
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $UBUNTU_CODENAME stable"
sudo apt-get update
sudo apt-get install -y docker-ce
``` ```
#### Setup docker daemon * For other operating systems, see [Supported platforms](https://docs.docker.com/install//#support) in the Docker documentation for instructions.
In order to run the ElasticSearch docker container, you'll need to change some of the default values in the Docker configuration file.
For systems with an `/etc/sysconfig/docker` file, add the options in quotes to the available arguments in `OPTIONS`: #### Step 2. Set up the Docker daemon
To run the ElasticSearch Docker container, you must set up the Docker daemon by modifying the default
values in your Docker configuration file to those required by Elastic,
which the **trains-server** uses. We provide instructions for the most common Docker configuration files.
You must edit or create a Docker configuration file:
* If your Docker configuration file is `/etc/sysconfig/docker`, edit it.
Add the options in quotes to the available arguments in the `OPTIONS` section:
```bash ```bash
OPTIONS="--default-ulimit nofile=1024:65536 --default-ulimit memlock=-1:-1" OPTIONS="--default-ulimit nofile=1024:65536 --default-ulimit memlock=-1:-1"
``` ```
For systems with an `/etc/docker/daemon.json` file, add the section in curly brackets to `default-ulimits`: * Otherwise, edit `/etc/docker/daemon.json` (if it exists) or create it (if it does not exist).
Add or modify the `default-ulimits` section as shown below. Be sure your configuration file contains the `nofile` and `memlock` sub-sections and values shown.
**Note**: Your configuration file may contain other sections. If so, confirm that the sections are separated by commas. For more information about Docker configuration files, see [Daemon configuration file](https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-configuration-file) in the Docker documentation.
The default values required by the **trains-server** are:
```json ```json
"default-ulimits": { {
"default-ulimits": {
"nofile": { "nofile": {
"name": "nofile", "name": "nofile",
"hard": 65536, "hard": 65536,
@ -93,21 +128,28 @@ For systems with an `/etc/docker/daemon.json` file, add the section in curly bra
"soft": -1, "soft": -1,
"hard": -1 "hard": -1
} }
}
} }
``` ```
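When `/etc/docker/daemon.json` already contains other sections, the `default-ulimits` values have to be merged in rather than overwriting the file. The following is a minimal sketch of that merge, not part of the **trains-server** itself. The helper name `merge_trains_ulimits` and the sample `log-driver` setting are assumptions for illustration; the `nofile` soft limit of 1024 is taken from the `OPTIONS` line above (`nofile=1024:65536`, read as soft:hard).

```python
import json

# Values from the README: nofile=1024:65536 and memlock=-1:-1 (soft:hard).
REQUIRED_ULIMITS = {
    "nofile": {"name": "nofile", "soft": 1024, "hard": 65536},
    "memlock": {"name": "memlock", "soft": -1, "hard": -1},
}


def merge_trains_ulimits(config: dict) -> dict:
    """Return a copy of a daemon.json mapping with the required ulimits set.

    Other top-level sections and other ulimit entries are left untouched.
    """
    merged = dict(config)
    ulimits = dict(merged.get("default-ulimits", {}))
    ulimits.update(REQUIRED_ULIMITS)
    merged["default-ulimits"] = ulimits
    return merged


# Hypothetical existing configuration, for illustration only.
existing = {"log-driver": "json-file"}
print(json.dumps(merge_trains_ulimits(existing), indent=4))
```

Writing the result back to `/etc/docker/daemon.json` requires root privileges and is deliberately left out of the sketch.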
Following this configuration change, you will have to restart the docker daemon: #### Step 3. Restart the Docker daemon
You must restart the Docker daemon after modifying the configuration file:
```bash ```bash
sudo service docker stop sudo service docker stop
sudo service docker start sudo service docker start
``` ```
#### vm.max_map_count #### Step 4. Set the Maximum Number of Memory Map Areas
The `vm.max_map_count` kernel setting must be at least 262144. The maximum number of memory map areas a process can use is defined
using the `vm.max_map_count` kernel setting.
The following example was tested with CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19: Elastic requires `vm.max_map_count` to be at least 262144.
* For CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19 users, we tested the following commands to set
`vm.max_map_count`:
```bash ```bash
sudo echo "vm.max_map_count=262144" > /tmp/99-trains.conf sudo echo "vm.max_map_count=262144" > /tmp/99-trains.conf
@ -115,25 +157,23 @@ sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144 sudo sysctl -w vm.max_map_count=262144
``` ```
For additional information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation. * For information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.
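To confirm that the setting took effect, the current value can be read back from the kernel. The following is a minimal sketch, assuming a Linux host that exposes the setting at `/proc/sys/vm/max_map_count`; the function name `max_map_count_ok` is hypothetical.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # minimum stated in the README


def max_map_count_ok(proc_file: str = "/proc/sys/vm/max_map_count") -> bool:
    """Return True if the value stored in proc_file meets the minimum."""
    current = int(Path(proc_file).read_text().strip())
    return current >= REQUIRED_MAX_MAP_COUNT
```

Calling `max_map_count_ok()` with no arguments checks the running kernel; passing another path makes the check easy to try against a sample file.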
#### Choose a data folder #### Step 5. Choose a Data Directory
You will need to choose a directory on your system in which all data maintained by **trains-server** will be stored (among others, this includes database, uploaded files and logs). You must choose a directory on your system in which all data maintained by the **trains-server** is stored,
create that directory, and set its permissions. The data stored in that directory includes the database, uploaded files and logs.
The following instructions assume the directory is `/opt/trains`. For example, if your data directory is `/opt/trains`, then use the following command:
Issue the following commands:
```bash ```bash
sudo mkdir -p /opt/trains/data/elastic && sudo chown -R 1000:1000 /opt/trains sudo mkdir -p /opt/trains/data/elastic && sudo chown -R 1000:1000 /opt/trains
``` ```
### Launching docker images ### Launching Docker Containers
To launch the docker images, issue the following commands:
Launch the Docker containers. For example, if your data directory is `/opt/trains`,
then use the following commands:
```bash ```bash
sudo docker run -d --restart="always" --name="trains-elastic" -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16 sudo docker run -d --restart="always" --name="trains-elastic" -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16
@ -155,7 +195,7 @@ sudo docker run -d --restart="always" --name="trains-apiserver" --network="host"
sudo docker run -d --restart="always" --name="trains-webserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest webserver sudo docker run -d --restart="always" --name="trains-webserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest webserver
``` ```
Once the **trains-server** dockers are up, the following are available: After the **trains-server** Dockers are up, the following are available:
* API server on port `8008` * API server on port `8008`
* Web server on port `8080` * Web server on port `8080`
@ -163,32 +203,37 @@ Once the **trains-server** dockers are up, the following are available:
## Upgrade ## Upgrade
We are constantly updating and adding stuff. We are constantly updating, improving and adding to the **trains-server**.
When we release a new version, well include a new pre-built docker image. New releases will include new pre-built Docker images.
Once a new release is out, you can simply: When we release a new version and include a new pre-built Docker image for it, upgrade as follows:
1. Shut down and remove each of your Docker instances using the following commands:
1. Shut down and remove your docker instances. Each instance can be shut down and removed using the following commands:
```bash
sudo docker stop <docker-name> sudo docker stop <docker-name>
sudo docker rm -v <docker-name> sudo docker rm -v <docker-name>
```
The docker names are (see [Launching docker images](#Launching-docker-images)): The Docker names are (see [Launching Docker Containers](#launching-docker-containers)):
* `trains-elastic` * `trains-elastic`
* `trains-mongo` * `trains-mongo`
* `trains-fileserver` * `trains-fileserver`
* `trains-apiserver` * `trains-apiserver`
* `trains-webserver` * `trains-webserver`
2. Back up your data folder (recommended!). A simple way to do that is using this command: 2. We highly recommend backing up your data directory. A simple way to do that is using `tar`:
```bash
For example, if your data directory is `/opt/trains`, use the following command:
sudo tar czvf ~/trains_backup.tgz /opt/trains/data sudo tar czvf ~/trains_backup.tgz /opt/trains/data
```
Which will back up all data to an archive in your home folder. Restoring such a backup can be done using these commands: This backs up all data to an archive in your home directory.
```bash
To restore this example backup, use the following command:
sudo rm -R /opt/trains/data sudo rm -R /opt/trains/data
sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
```
3. Launch the newly released docker image (see [Launching docker images](#Launching-docker-images)) 3. Launch the newly released Docker image (see [Launching Docker Containers](#launching-docker-containers)).
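For the backup in step 2, note that an archive created from an absolute path stores member names such as `opt/trains/data/...`, so the directory it is extracted into matters when restoring. The following is a minimal sketch of a backup and restore round trip that stores names relative to the data directory instead. It uses Python's standard `tarfile` module rather than the `tar` commands above, and the function names `backup_data` and `restore_data` are hypothetical.

```python
import tarfile
from pathlib import Path


def backup_data(data_dir: str, archive: str) -> None:
    """Archive the contents of data_dir with names relative to data_dir."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." keeps member names like "./elastic/..." instead of
        # embedding the absolute path of data_dir.
        tar.add(data_dir, arcname=".")


def restore_data(archive: str, data_dir: str) -> None:
    """Extract an archive made by backup_data into data_dir."""
    # Create the target first, since it may have been removed beforehand.
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(data_dir)
```

For a data directory such as `/opt/trains/data`, both functions would need to run with privileges sufficient to read and write that directory.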
## License ## License
@ -196,6 +241,6 @@ Once a new release is out, you can simply:
**trains-server** relies *heavily* on both [MongoDB](https://github.com/mongodb/mongo) and [ElasticSearch](https://github.com/elastic/elasticsearch). **trains-server** relies *heavily* on both [MongoDB](https://github.com/mongodb/mongo) and [ElasticSearch](https://github.com/elastic/elasticsearch).
With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our job as a community to support the projects we love and cherish. With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our job as a community to support the projects we love and cherish.
We feel the cause for the license change in both cases is more than just, and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more restrictive of the two. We feel the cause for the license change in both cases is more than just, and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more general and flexible of the two.
This is our way to say - we support you guys! This is our way to say - we support you guys!


@ -57,7 +57,7 @@ class BasicConfig:
return conf return conf
if verbose: if verbose:
print("Loading config from {conf_root}") print(f"Loading config from {conf_root}")
for file in conf_root.rglob("*.conf"): for file in conf_root.rglob("*.conf"):
key = ".".join(file.relative_to(conf_root).with_suffix("").parts) key = ".".join(file.relative_to(conf_root).with_suffix("").parts)
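The one-character change in this hunk matters because a plain string literal does not interpolate `{conf_root}`; only the `f` prefix does. The following minimal sketch shows the difference, and also what key the `rglob` loop derives for a given file. The configuration root `/opt/trains/config` and the file `services/auth.conf` are hypothetical and do not appear in the source.

```python
from pathlib import PurePosixPath

# Hypothetical paths, for illustration only.
conf_root = PurePosixPath("/opt/trains/config")
file = conf_root / "services" / "auth.conf"

plain = "Loading config from {conf_root}"       # braces stay literal
formatted = f"Loading config from {conf_root}"  # value is interpolated

# Same expression as in the loop body: path relative to the root,
# extension dropped, components joined with dots.
key = ".".join(file.relative_to(conf_root).with_suffix("").parts)

print(plain)      # Loading config from {conf_root}
print(formatted)  # Loading config from /opt/trains/config
print(key)        # services.auth
```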


@ -57,7 +57,7 @@ class BasicConfig:
return conf return conf
if verbose: if verbose:
print("Loading config from {conf_root}") print(f"Loading config from {conf_root}")
for file in conf_root.rglob("*.conf"): for file in conf_root.rglob("*.conf"):
key = ".".join(file.relative_to(conf_root).with_suffix("").parts) key = ".".join(file.relative_to(conf_root).with_suffix("").parts)


@ -1,10 +1,9 @@
{ # requested token expiration in seconds (one month)
# requested token expiration in seconds (one month) apiserver_token_expiration: 2592000
apiserver_token_expiration: 2592000
debug: false debug: false
flask { flask {
# Uncomment next line to disable login requirement while testing (or unit-testing) # Uncomment next line to disable login requirement while testing (or unit-testing)
TESTING: False TESTING: False
@ -18,14 +17,14 @@
REMEMBER_COOKIE_HTTPONLY: True REMEMBER_COOKIE_HTTPONLY: True
SESSION_COOKIE_SECURE: False SESSION_COOKIE_SECURE: False
REMEMBER_COOKIE_SECURE: False REMEMBER_COOKIE_SECURE: False
} }
listen { listen {
ip : "0.0.0.0" ip : "0.0.0.0"
port: 8080 port: 8080
} }
auth { auth {
cookies { cookies {
httponly: true # allow only http to access the cookies (no JS etc) httponly: true # allow only http to access the cookies (no JS etc)
secure: false # not using HTTPS secure: false # not using HTTPS
@ -35,15 +34,14 @@
session_auth_cookie_name: "trains_token_basic" session_auth_cookie_name: "trains_token_basic"
user_token_expiration_sec: 3600 user_token_expiration_sec: 3600
} }
docs { docs {
# Default filename used when file not found error is reported when serving docs. # Default filename used when file not found error is reported when serving docs.
# This usually happens when the path is to a folder and not a file. # This usually happens when the path is to a folder and not a file.
default_filename: "index.html" default_filename: "index.html"
}
default_company: "d1bd92a3b039400cbafc60a7a5b1e52b"
redirect_to_https: false
} }
default_company: "d1bd92a3b039400cbafc60a7a5b1e52b"
redirect_to_https: false


@ -210,6 +210,11 @@ def _serve_webapp(path=None):
return response return response
@app.route("/favicon.ico")
def favicon():
return send_from_directory("static", "favicon.ico")
@app.route("/") @app.route("/")
def index(): def index():
if not current_user.is_authenticated: if not current_user.is_authenticated: