mirror of
https://github.com/clearml/clearml-server
synced 2025-06-26 23:15:47 +00:00
Documentation
parent
06b64cd13f
commit
3269199262
141
README.md
@ -1,20 +1,25 @@
|
|||||||
# TRAINS Server
|
# TRAINS Server
|
||||||
|
|
||||||
## Magic Version Control & Experiment Manager for AI
|
## Magic Version Control & Experiment Manager for AI
|
||||||
|
|
||||||
## Introduction
|
## Introduction
|
||||||
|
|
||||||
The **trains-server** is the infrastructure behind [trains](https://github.com/allegroai/trains).
|
The **trains-server** is the infrastructure for [trains](https://github.com/allegroai/trains).
|
||||||
|
It allows multiple users to collaborate and manage their experiments.
|
||||||
|
|
||||||
The server provides:
|
The **trains-server** contains the following components:
|
||||||
|
|
||||||
* UI (single-page webapp) for experiment management and browsing
|
* the Web-App which is a single-page UI for experiment management and browsing
|
||||||
* REST interface for documenting and logging experiment information, statistics and results
|
* a REST interface for:
|
||||||
* REST interface for querying experiments history, logs and results
|
* documenting and logging experiment information, statistics and results
|
||||||
* Locally-hosted fileserver, for storing images and models to be easily accessible from the UI
|
* querying experiments history, logs and results
|
||||||
|
* a locally-hosted file server for storing images and models, making them easily accessible using the Web-App
|
||||||
|
|
||||||
The server is designed to allow multiple users to collaborate and manage their experiments.
|
You can quickly set up your **trains-server** using a pre-built Docker image (see [Installation](#installation)).
|
||||||
The server’s code is freely available [here](https://github.com/allegroai/trains-server).
|
|
||||||
We've also pre-built a docker image to allow **trains** users to quickly set up their own server.
|
When new releases are available, you can upgrade your pre-built Docker image (see [Upgrade](#upgrade)).
|
||||||
|
|
||||||
|
The **trains-server's** code is freely available [here](https://github.com/allegroai/trains-server).
|
||||||
|
|
||||||
## System diagram
|
## System diagram
|
||||||
|
|
||||||
@ -57,31 +62,61 @@ We've also pre-built a docker image to allow **trains** users to quickly set up
|
|||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
In order to install and run the pre-built **trains-server**, you must be logged in as a user with sudo privileges.
|
This section contains the instructions to set up and launch a pre-built Docker image for the **trains-server**.
|
||||||
|
|
||||||
|
**Note**: This Docker image was tested on Linux only. For Windows users, we recommend running the server
|
||||||
|
on a Linux virtual machine.
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
You must be logged in as a user with sudo privileges.
|
||||||
|
|
||||||
### Setup
|
### Setup
|
||||||
|
|
||||||
In order to run the pre-packaged **trains-server**, you'll need to install **docker**.
|
#### Step 1. Install Docker CE
|
||||||
|
|
||||||
#### Install docker
|
You must install Docker to run the pre-packaged **trains-server**.
|
||||||
|
|
||||||
|
* For [Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/) / Mint (x86_64/amd64):
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
sudo apt-get install docker
|
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
|
||||||
|
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
|
||||||
|
. /etc/os-release
|
||||||
|
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $UBUNTU_CODENAME stable"
|
||||||
|
sudo apt-get update
|
||||||
|
sudo apt-get install -y docker-ce
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Setup docker daemon
|
* For other operating systems, see [Supported platforms](https://docs.docker.com/install//#support) in the Docker documentation for instructions.
|
||||||
In order to run the ElasticSearch docker container, you'll need to change some of the default values in the Docker configuration file.
|
|
||||||
|
|
||||||
For systems with an `/etc/sysconfig/docker` file, add the options in quotes to the available arguments in `OPTIONS`:
|
#### Step 2. Set up the Docker daemon
|
||||||
|
|
||||||
|
To run the ElasticSearch Docker container, you must set up the Docker daemon by modifying the default
|
||||||
|
values required by Elastic in your Docker configuration file
|
||||||
|
that are used by the **trains-server**. We provide instructions for the most common Docker configuration files.
|
||||||
|
|
||||||
|
You must edit or create a Docker configuration file:
|
||||||
|
|
||||||
|
* If your Docker configuration file is `/etc/sysconfig/docker`, edit it.
|
||||||
|
|
||||||
|
Add the options in quotes to the available arguments in the `OPTIONS` section:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
OPTIONS="--default-ulimit nofile=1024:65536 --default-ulimit memlock=-1:-1"
|
OPTIONS="--default-ulimit nofile=1024:65536 --default-ulimit memlock=-1:-1"
|
||||||
```
|
```
|
||||||
|
|
||||||
For systems with an `/etc/docker/daemon.json` file, add the section in curly brackets to `default-ulimits`:
|
* Otherwise, edit `/etc/docker/daemon.json` (if it exists) or create it (if it does not exist).
|
||||||
|
|
||||||
|
Add or modify the `default-ulimits` section as shown below. Be sure your configuration file contains the `nofile` and `memlock` sub-sections and values shown.
|
||||||
|
|
||||||
|
**Note**: Your configuration file may contain other sections. If so, confirm that the sections are separated by commas. For more information about Docker configuration files, see [Daemon configuration file](https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-configuration-file) in the Docker documentation.
|
||||||
|
|
||||||
|
The default values required by the **trains-server** are:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
"default-ulimits": {
|
{
|
||||||
|
"default-ulimits": {
|
||||||
"nofile": {
|
"nofile": {
|
||||||
"name": "nofile",
|
"name": "nofile",
|
||||||
"hard": 65536,
|
"hard": 65536,
|
||||||
@ -93,21 +128,28 @@ For systems with an `/etc/docker/daemon.json` file, add the section in curly bra
|
|||||||
"soft": -1,
|
"soft": -1,
|
||||||
"hard": -1
|
"hard": -1
|
||||||
}
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
```
|
```
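As a hedged illustration of the `daemon.json` edit described above, the following Python sketch merges the `default-ulimits` section into an existing configuration without discarding other sections. The helper name `merge_ulimits` is hypothetical, and the `nofile` soft limit of 1024 is an assumption taken from the `OPTIONS` example (`nofile=1024:65536`); check the values against your own setup before writing the result to `/etc/docker/daemon.json`.

```python
import json

# Limits required by trains-server. The nofile "soft" value is an assumption
# derived from the OPTIONS example (nofile=1024:65536).
REQUIRED_ULIMITS = {
    "nofile": {"name": "nofile", "hard": 65536, "soft": 1024},
    "memlock": {"name": "memlock", "soft": -1, "hard": -1},
}


def merge_ulimits(config_text: str) -> str:
    """Return daemon.json text with the required default-ulimits merged in."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    ulimits = config.setdefault("default-ulimits", {})
    ulimits.update(REQUIRED_ULIMITS)
    # json.dumps places the commas between sections for us.
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    # Example: an existing file that already contains another section.
    existing = '{"log-driver": "json-file"}'
    print(merge_ulimits(existing))
```

Generating the file with a JSON library rather than editing it by hand avoids the missing-comma mistake mentioned in the note.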
|
||||||
|
|
||||||
Following this configuration change, you will have to restart the docker daemon:
|
#### Step 3. Restart the Docker daemon
|
||||||
|
|
||||||
|
You must restart the Docker daemon after modifying the configuration file:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
sudo service docker stop
|
sudo service docker stop
|
||||||
sudo service docker start
|
sudo service docker start
|
||||||
```
|
```
|
||||||
|
|
||||||
#### vm.max_map_count
|
#### Step 4. Set the Maximum Number of Memory Map Areas
|
||||||
|
|
||||||
The `vm.max_map_count` kernel setting must be at least 262144.
|
The maximum number of memory map areas a process can use is defined
|
||||||
|
using the `vm.max_map_count` kernel setting.
|
||||||
|
|
||||||
The following example was tested with CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19:
|
Elastic requires `vm.max_map_count` to be at least 262144.
|
||||||
|
|
||||||
|
* For CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19 users, we tested the following commands to set
|
||||||
|
`vm.max_map_count`:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
sudo echo "vm.max_map_count=262144" > /tmp/99-trains.conf
|
sudo echo "vm.max_map_count=262144" > /tmp/99-trains.conf
|
||||||
@ -115,25 +157,23 @@ sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
|
|||||||
sudo sysctl -w vm.max_map_count=262144
|
sudo sysctl -w vm.max_map_count=262144
|
||||||
```
|
```
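To confirm that the setting took effect, the current value can be read back from `/proc/sys/vm/max_map_count`. The sketch below is one possible check, assuming a Linux host; the function name `is_sufficient` is hypothetical.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # minimum required by Elastic, as stated above


def is_sufficient(raw_value: str, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True if the kernel setting meets the required minimum."""
    return int(raw_value.strip()) >= required


if __name__ == "__main__":
    proc_file = Path("/proc/sys/vm/max_map_count")
    if proc_file.exists():
        value = proc_file.read_text()
        status = "OK" if is_sufficient(value) else "too low"
        print(f"vm.max_map_count = {value.strip()} ({status})")
    else:
        print("vm.max_map_count is not exposed on this system")
```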
|
||||||
|
|
||||||
For additional information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.
|
* For information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.
|
||||||
|
|
||||||
#### Choose a data folder
|
#### Step 5. Choose a Data Directory
|
||||||
|
|
||||||
You will need to choose a directory on your system in which all data maintained by **trains-server** will be stored (among others, this includes database, uploaded files and logs).
|
You must choose a directory on your system in which all data maintained by the **trains-server** is stored,
|
||||||
|
create that directory, and set its permissions. The data stored in that directory includes the database, uploaded files and logs.
|
||||||
|
|
||||||
The following instructions assume the directory is `/opt/trains`.
|
For example, if your data directory is `/opt/trains`, then use the following command:
|
||||||
|
|
||||||
Issue the following commands:
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
sudo mkdir -p /opt/trains/data/elastic && sudo chown -R 1000:1000 /opt/trains
|
sudo mkdir -p /opt/trains/data/elastic && sudo chown -R 1000:1000 /opt/trains
|
||||||
```
|
```
|
||||||
|
|
||||||
### Launching docker images
|
### Launching Docker Containers
|
||||||
|
|
||||||
|
|
||||||
To launch the docker images, issue the following commands:
|
|
||||||
|
|
||||||
|
Launch the Docker containers. For example, if your data directory is `/opt/trains`,
|
||||||
|
then use the following commands:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
sudo docker run -d --restart="always" --name="trains-elastic" -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16
|
sudo docker run -d --restart="always" --name="trains-elastic" -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16
|
||||||
@ -155,7 +195,7 @@ sudo docker run -d --restart="always" --name="trains-apiserver" --network="host"
|
|||||||
sudo docker run -d --restart="always" --name="trains-webserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest webserver
|
sudo docker run -d --restart="always" --name="trains-webserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest webserver
|
||||||
```
|
```
|
||||||
|
|
||||||
Once the **trains-server** dockers are up, the following are available:
|
After the **trains-server** Docker containers are up, the following are available:
|
||||||
|
|
||||||
* API server on port `8008`
|
* API server on port `8008`
|
||||||
* Web server on port `8080`
|
* Web server on port `8080`
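A quick way to verify that the services listed in this hunk are listening is to attempt a TCP connection to each port. This is a minimal sketch, assuming the containers run on the local machine; the helper name `port_is_open` is hypothetical, and only the ports shown above are checked.

```python
import socket

# Ports listed above; the names are used for display only.
SERVICES = {"API server": 8008, "Web server": 8080}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

A successful connection only shows that something is listening on the port, not that the service is healthy.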
|
||||||
@ -163,32 +203,37 @@ Once the **trains-server** dockers are up, the following are available:
|
|||||||
|
|
||||||
## Upgrade
|
## Upgrade
|
||||||
|
|
||||||
We are constantly updating and adding stuff.
|
We are constantly updating, improving and adding to the **trains-server**.
|
||||||
When we release a new version, we’ll include a new pre-built docker image.
|
New releases will include new pre-built Docker images.
|
||||||
Once a new release is out, you can simply:
|
When we release a new version and include a new pre-built Docker image for it, upgrade as follows:
|
||||||
|
|
||||||
|
1. Shut down and remove each of your Docker instances using the following commands:
|
||||||
|
|
||||||
1. Shut down and remove your docker instances. Each instance can be shut down and removed using the following commands:
|
|
||||||
```bash
|
|
||||||
sudo docker stop <docker-name>
|
sudo docker stop <docker-name>
|
||||||
sudo docker rm -v <docker-name>
|
sudo docker rm -v <docker-name>
|
||||||
```
|
|
||||||
The docker names are (see [Launching docker images](#Launching-docker-images)):
|
The Docker names are (see [Launching Docker Containers](#launching-docker-containers)):
|
||||||
|
|
||||||
* `trains-elastic`
|
* `trains-elastic`
|
||||||
* `trains-mongo`
|
* `trains-mongo`
|
||||||
* `trains-fileserver`
|
* `trains-fileserver`
|
||||||
* `trains-apiserver`
|
* `trains-apiserver`
|
||||||
* `trains-webserver`
|
* `trains-webserver`
|
||||||
|
|
||||||
2. Back up your data folder (recommended!). A simple way to do that is using this command:
|
2. We highly recommend backing up your data directory. A simple way to do that is using `tar`:
|
||||||
```bash
|
|
||||||
|
For example, if your data directory is `/opt/trains`, use the following command:
|
||||||
|
|
||||||
sudo tar czvf ~/trains_backup.tgz /opt/trains/data
|
sudo tar czvf ~/trains_backup.tgz /opt/trains/data
|
||||||
```
|
|
||||||
Which will back up all data to an archive in your home folder. Restoring such a backup can be done using these commands:
|
This backs up all data to an archive in your home directory.
|
||||||
```bash
|
|
||||||
|
To restore this example backup, use the following commands:
|
||||||
|
|
||||||
sudo rm -R /opt/trains/data
|
sudo rm -R /opt/trains/data
|
||||||
sudo tar -xzf ~/trains_backup.tgz -C /
|
sudo tar -xzf ~/trains_backup.tgz -C /
|
||||||
```
|
|
||||||
3. Launch the newly released docker image (see [Launching docker images](#Launching-docker-images))
|
3. Launch the newly released Docker image (see [Launching Docker Containers](#launching-docker-containers)).
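Step 1 of the upgrade procedure above repeats the same two commands for each container. The following Python sketch generates that command list from the container names given in step 1; the helper name `removal_commands` is hypothetical, and the script only prints the commands so they can be reviewed before being run.

```python
import shlex

# Container names as listed in step 1 of the upgrade procedure.
CONTAINERS = [
    "trains-elastic",
    "trains-mongo",
    "trains-fileserver",
    "trains-apiserver",
    "trains-webserver",
]


def removal_commands(names):
    """Build the stop and remove commands for each container, in order."""
    commands = []
    for name in names:
        commands.append(["sudo", "docker", "stop", name])
        commands.append(["sudo", "docker", "rm", "-v", name])
    return commands


if __name__ == "__main__":
    for cmd in removal_commands(CONTAINERS):
        print(shlex.join(cmd))
```

Each entry is an argument list, so it could also be passed to `subprocess.run` once the printed output looks right.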
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
@ -196,6 +241,6 @@ Once a new release is out, you can simply:
|
|||||||
|
|
||||||
**trains-server** relies *heavily* on both [MongoDB](https://github.com/mongodb/mongo) and [ElasticSearch](https://github.com/elastic/elasticsearch).
|
**trains-server** relies *heavily* on both [MongoDB](https://github.com/mongodb/mongo) and [ElasticSearch](https://github.com/elastic/elasticsearch).
|
||||||
With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our job as a community to support the projects we love and cherish.
|
With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our job as a community to support the projects we love and cherish.
|
||||||
We feel the cause for the license change in both cases is more than just, and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more restrictive of the two.
|
We feel the cause for the license change in both cases is more than just, and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more general and flexible of the two.
|
||||||
|
|
||||||
This is our way to say - we support you guys!
|
This is our way to say - we support you guys!
|
||||||
|
@ -57,7 +57,7 @@ class BasicConfig:
|
|||||||
return conf
|
return conf
|
||||||
|
|
||||||
if verbose:
|
if verbose:
|
||||||
print("Loading config from {conf_root}")
|
print(f"Loading config from {conf_root}")
|
||||||
|
|
||||||
for file in conf_root.rglob("*.conf"):
|
for file in conf_root.rglob("*.conf"):
|
||||||
key = ".".join(file.relative_to(conf_root).with_suffix("").parts)
|
key = ".".join(file.relative_to(conf_root).with_suffix("").parts)
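The one-character change in this hunk matters because, without the `f` prefix, the braces are printed literally instead of being replaced by the path. A minimal sketch, using a hypothetical `conf_root` location and a hypothetical helper `config_key`, that also shows how the dotted `key` is derived from a file's path relative to the root:

```python
from pathlib import PurePosixPath

conf_root = PurePosixPath("/opt/trains/config")  # hypothetical location

plain = "Loading config from {conf_root}"       # placeholder stays literal
formatted = f"Loading config from {conf_root}"  # placeholder is interpolated


def config_key(file: PurePosixPath, root: PurePosixPath) -> str:
    """Mirror the key expression from the hunk: path parts joined by dots."""
    return ".".join(file.relative_to(root).with_suffix("").parts)


if __name__ == "__main__":
    print(plain)
    print(formatted)
    print(config_key(conf_root / "services" / "auth.conf", conf_root))
```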
|
||||||
|
@ -1,10 +1,9 @@
|
|||||||
{
|
# requested token expiration in seconds (one month)
|
||||||
# requested token expiration in seconds (one month)
|
apiserver_token_expiration: 2592000
|
||||||
apiserver_token_expiration: 2592000
|
|
||||||
|
|
||||||
debug: false
|
debug: false
|
||||||
|
|
||||||
flask {
|
flask {
|
||||||
# Uncomment next line to disable login requirement while testing (or unit-testing)
|
# Uncomment next line to disable login requirement while testing (or unit-testing)
|
||||||
TESTING: False
|
TESTING: False
|
||||||
|
|
||||||
@ -18,14 +17,14 @@
|
|||||||
REMEMBER_COOKIE_HTTPONLY: True
|
REMEMBER_COOKIE_HTTPONLY: True
|
||||||
SESSION_COOKIE_SECURE: False
|
SESSION_COOKIE_SECURE: False
|
||||||
REMEMBER_COOKIE_SECURE: False
|
REMEMBER_COOKIE_SECURE: False
|
||||||
}
|
}
|
||||||
|
|
||||||
listen {
|
listen {
|
||||||
ip : "0.0.0.0"
|
ip : "0.0.0.0"
|
||||||
port: 8080
|
port: 8080
|
||||||
}
|
}
|
||||||
|
|
||||||
auth {
|
auth {
|
||||||
cookies {
|
cookies {
|
||||||
httponly: true # allow only http to access the cookies (no JS etc)
|
httponly: true # allow only http to access the cookies (no JS etc)
|
||||||
secure: false # not using HTTPS
|
secure: false # not using HTTPS
|
||||||
@ -35,15 +34,14 @@
|
|||||||
session_auth_cookie_name: "trains_token_basic"
|
session_auth_cookie_name: "trains_token_basic"
|
||||||
|
|
||||||
user_token_expiration_sec: 3600
|
user_token_expiration_sec: 3600
|
||||||
}
|
}
|
||||||
|
|
||||||
docs {
|
docs {
|
||||||
# Default filename used when file not found error is reported when serving docs.
|
# Default filename used when file not found error is reported when serving docs.
|
||||||
# This usually happens when the path is to a folder and not a file.
|
# This usually happens when the path is to a folder and not a file.
|
||||||
default_filename: "index.html"
|
default_filename: "index.html"
|
||||||
}
|
|
||||||
|
|
||||||
default_company: "d1bd92a3b039400cbafc60a7a5b1e52b"
|
|
||||||
|
|
||||||
redirect_to_https: false
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
default_company: "d1bd92a3b039400cbafc60a7a5b1e52b"
|
||||||
|
|
||||||
|
redirect_to_https: false
|
||||||
|
@ -210,6 +210,11 @@ def _serve_webapp(path=None):
|
|||||||
return response
|
return response
|
||||||
|
|
||||||
|
|
||||||
|
@app.route("/favicon.ico")
|
||||||
|
def favicon():
|
||||||
|
return send_from_directory("static", "favicon.ico")
|
||||||
|
|
||||||
|
|
||||||
@app.route("/")
|
@app.route("/")
|
||||||
def index():
|
def index():
|
||||||
if not current_user.is_authenticated:
|
if not current_user.is_authenticated:
|
||||||
|