
TRAINS Server

Auto-Magical Experiment Manager & Version Control for AI

Introduction

The trains-server is the backend service infrastructure for TRAINS. It allows multiple users to collaborate and manage their experiments. By default, TRAINS is set up to work with the TRAINS demo server, which is open to anyone and resets periodically. In order to host your own server, you will need to install trains-server and point TRAINS to it.

trains-server contains the following components:

- The TRAINS Web-App, a single-page UI for experiment management and browsing
- RESTful API for:
  - Documenting and logging experiment information, statistics and results
  - Querying experiment history, logs and results
- Locally-hosted file server for storing images and models, making them easily accessible from the Web-App

You can quickly set up your trains-server using a pre-built Docker image (see Installation).

When new releases are available, you can upgrade your pre-built Docker image (see Upgrading).

System diagram

Install / Upgrade - AWS

Use one of our pre-installed Amazon Machine Images for easy deployment in AWS.

For details and instructions, see TRAINS-server: AWS pre-installed images.

Install - Linux, Mac OS X

Use our pre-built Docker image for easy deployment in Linux and Mac OS X. For Windows, we recommend installing our pre-built Docker image on a Linux virtual machine.

Set up Docker (for full details, see Setup Docker Service).

Make sure ports 8080, 8081 and 8008 are available for the trains-server services.
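To check this before launching, the Python sketch below tries to bind each port; if the bind fails, another process is already holding it. It is a rough, stdlib-only pre-flight check (for example, it does not detect IPv6-only listeners), and the function names are hypothetical, not part of trains-server.

```python
import socket

# Ports listed in the installation steps: web server, API server, file server
TRAINS_PORTS = (8080, 8008, 8081)


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # EADDRINUSE (or a permission error) means the port is not usable
            return False
    return True


def busy_ports(ports=TRAINS_PORTS) -> list:
    """Return the ports from `ports` that cannot currently be bound."""
    return [port for port in ports if not port_is_free(port)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print("Ports already in use:", ", ".join(map(str, taken)))
    else:
        print("Ports 8080, 8008 and 8081 are free")
```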

Increase vm.max_map_count for the ElasticSearch Docker container:

echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144

sudo service docker restart
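On a Linux host you can confirm that the new value is in effect by reading /proc/sys/vm/max_map_count. A minimal Python sketch, with hypothetical helper names (Linux only, since it relies on /proc):

```python
from pathlib import Path

# Minimum value set by the sysctl commands above (needed by ElasticSearch)
REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(raw_value: str, minimum: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True if a vm.max_map_count value, given as text, is at least `minimum`."""
    return int(raw_value.strip()) >= minimum


def read_max_map_count(proc_path: str = "/proc/sys/vm/max_map_count") -> str:
    """Read the live kernel setting (Linux only)."""
    return Path(proc_path).read_text()


if __name__ == "__main__":
    try:
        value = read_max_map_count()
    except OSError:
        print("Cannot read /proc/sys/vm/max_map_count (is this a Linux host?)")
    else:
        verdict = "OK" if max_map_count_ok(value) else "too low, re-run the sysctl command"
        print(f"vm.max_map_count = {value.strip()}: {verdict}")
```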

Create local directories for the databases and storage.

sudo mkdir -p /opt/trains/data/elastic
sudo mkdir -p /opt/trains/data/mongo/db
sudo mkdir -p /opt/trains/data/mongo/configdb
sudo mkdir -p /opt/trains/logs
sudo mkdir -p /opt/trains/data/fileserver

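Set the ownership of these directories so the Docker containers can access them, depending on your OS:

Linux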

sudo chown -R 1000:1000 /opt/trains

Mac OS X

sudo chown -R $(whoami):staff /opt/trains
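Before launching, a quick sanity check can confirm that the directories exist and have the expected owner. The sketch below is a hypothetical helper, not part of trains-server; it assumes the default /opt/trains root and the subdirectories created above, and takes the expected numeric owner as a parameter (1000 for the Linux chown above).

```python
from pathlib import Path

# Sub-directories created by the mkdir commands above (relative to the root)
REQUIRED_SUBDIRS = (
    "data/elastic",
    "data/mongo/db",
    "data/mongo/configdb",
    "logs",
    "data/fileserver",
)


def missing_dirs(root: str = "/opt/trains") -> list:
    """Return the required directories that do not exist under `root`."""
    base = Path(root)
    return [str(base / sub) for sub in REQUIRED_SUBDIRS if not (base / sub).is_dir()]


def not_owned_by(uid: int, root: str = "/opt/trains") -> list:
    """Return the existing required directories whose numeric owner is not `uid`."""
    base = Path(root)
    return [
        str(base / sub)
        for sub in REQUIRED_SUBDIRS
        if (base / sub).is_dir() and (base / sub).stat().st_uid != uid
    ]


if __name__ == "__main__":
    missing = missing_dirs()
    print("Missing directories:", missing if missing else "none")
```

On Linux, after the commands above, both `missing_dirs()` and `not_owned_by(1000)` should return empty lists.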

Clone the trains-server repository and change directories to the new trains-server directory.
```
 $ git clone https://github.com/allegroai/trains-server.git
 $ cd trains-server
```
Launch the Docker containers
- Automatically with docker-compose (details: Linux/Ubuntu, OS X)
```
$ docker-compose up
```
- Manually
  See TRAINS-server: Launching Docker Containers Manually for instructions.
Your server is now running on http://localhost:8080 and the following ports are available:
- Web server on port 8080
- API server on port 8008
- File server on port 8081
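Once the containers are up, you can verify that each service accepts TCP connections on its port. This Python sketch only checks that something is listening; it does not confirm that the service is healthy, and the helper names are hypothetical.

```python
import socket

# Services and ports listed above
SERVICES = {"web server": 8080, "API server": 8008, "file server": 8081}


def is_listening(port: int, host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each trains-server service name to whether its port accepts connections."""
    return {name: is_listening(port, host) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:12} {'up' if up else 'DOWN'}")
```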

Optional: Configuration

The trains-server default configuration can be easily overridden using external configuration files. By default, the server will look for these files in /opt/trains/config.

If you change the configuration while the server is running, you must restart the server to apply the changes (see Restarting trains-server).

By default, anyone can log in to the trains-server Web-App. You can configure the trains-server to allow access only to specific users (with pre-configured usernames and passwords).

Enable this feature by placing an apiserver.conf file under /opt/trains/config.

Sample fixed user configuration file /opt/trains/config/apiserver.conf:

auth {
    # Fixed users login credentials
    # No other user will be able to login
    fixed_users {
        enabled: true
        users: [
            {
                username: "jane"
                password: "12345678"
                name: "Jane Doe"
            },
            {
                username: "john"
                password: "12345678"
                name: "John Doe"
            },
        ]
    }
}

To apply the apiserver.conf changes, you must restart the trains-apiserver Docker container (see Restarting trains-server).
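If you manage several users, generating this file can avoid quoting mistakes. The sketch below is a hypothetical helper that renders the same auth.fixed_users structure as the sample, quoting values as JSON strings (which HOCON accepts), and uses Python's `secrets` module to create random passwords instead of the sample's 12345678. As in the sample, passwords are stored in plain text in this file, so restrict its permissions.

```python
import json
import secrets


def render_fixed_users(users) -> str:
    """Render an apiserver.conf with an auth.fixed_users section.

    `users` is a list of (username, password, display_name) tuples.
    Values are written as JSON strings, which HOCON accepts as quoted strings.
    """
    lines = [
        "auth {",
        "    fixed_users {",
        "        enabled: true",
        "        users: [",
    ]
    for username, password, name in users:
        lines += [
            "            {",
            f"                username: {json.dumps(username)}",
            f"                password: {json.dumps(password)}",
            f"                name: {json.dumps(name)}",
            "            },",
        ]
    lines += ["        ]", "    }", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    team = [("jane", "Jane Doe"), ("john", "John Doe")]
    # secrets.token_urlsafe(12) yields a random 16-character URL-safe password
    users = [(login, secrets.token_urlsafe(12), name) for login, name in team]
    print(render_fixed_users(users), end="")
```

For example, piping the output through `sudo tee /opt/trains/config/apiserver.conf` writes the file; share each printed password with its user.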

Configuring the Non-Responsive Experiments Watchdog Thresholds

The non-responsive experiment watchdog monitors experiments that were not updated for a given period of time, and marks them as aborted. The watchdog is always active with a default of 7200 seconds (2 hours).

To change the watchdog's timeouts, place a services.conf file under /opt/trains/config, containing for example:

tasks {
    non_responsive_tasks_watchdog {
        # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
        threshold_sec: 7200
    
        # Watchdog will sleep for this number of seconds after each cycle
        watch_interval_sec: 900
    }
}

To apply the services.conf changes, you must restart the trains-apiserver Docker container (see Restarting trains-server).
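The Python sketch below illustrates the behavior described above: one watchdog cycle selects in-progress experiments whose last update is at least threshold_sec old and marks them as aborted. It is a simplified model for reasoning about the two thresholds, not the apiserver's actual implementation; the `Task` class and the status strings are hypothetical.

```python
from dataclasses import dataclass

# Values from the services.conf example above
THRESHOLD_SEC = 7200        # non_responsive_tasks_watchdog.threshold_sec
WATCH_INTERVAL_SEC = 900    # non_responsive_tasks_watchdog.watch_interval_sec


@dataclass
class Task:
    """Hypothetical minimal experiment record used only for this illustration."""
    id: str
    status: str         # illustrative values: "in_progress", "completed", "aborted"
    last_update: float  # UNIX time of the experiment's last report


def watchdog_cycle(tasks, now: float, threshold_sec: float = THRESHOLD_SEC) -> list:
    """Abort in-progress tasks idle for at least threshold_sec; return their ids."""
    stale = [
        task for task in tasks
        if task.status == "in_progress" and now - task.last_update >= threshold_sec
    ]
    for task in stale:
        task.status = "aborted"
    return [task.id for task in stale]


def worst_case_delay(threshold_sec: float = THRESHOLD_SEC,
                     interval_sec: float = WATCH_INTERVAL_SEC) -> float:
    """Longest time from a task's last update until a cycle aborts it (ignoring cycle run time)."""
    return threshold_sec + interval_sec
```

Because the watchdog sleeps watch_interval_sec between cycles, an unresponsive experiment is aborted roughly between threshold_sec and threshold_sec + watch_interval_sec after its last update; with the values above, that is between 7200 and 8100 seconds (2 to 2.25 hours).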

Restarting trains-server

To restart the trains-server, you must first stop and remove the containers, and then restart.

To restart the docker-compose containers:

 $ docker-compose down
 $ docker-compose up

For containers launched manually, see the manual Docker restart instructions.

Configuring TRAINS client

Once you have installed the trains-server, configure the TRAINS client to use your locally installed server (and not the demo server):

- Run the trains-init command for an interactive setup, or
- Manually edit the ~/trains.conf file, making sure the api_server value is configured correctly, for example:

api { api_server: "http://localhost:8008" }

See Installing and Configuring TRAINS for more details.

What next?

Now that the trains-server is installed and TRAINS is configured to use it, you can use TRAINS in your experiments and view them in the Web-App, for example at http://localhost:8080.

Upgrading

We are constantly updating, improving and adding to the trains-server. New releases will include new pre-built Docker images. When we release a new version and include a new pre-built Docker image for it, upgrade as follows:

Shut down and remove each of your Docker instances using the following commands:
- Using Docker-Compose
```
$ docker-compose down
```
- Manual Docker launching
```
  sudo docker stop <docker-name>
  sudo docker rm -v <docker-name>
```
  The Docker names are (see Launching Docker Containers):
  - trains-elastic
  - trains-mongo
  - trains-fileserver
  - trains-apiserver
  - trains-webserver
We highly recommend backing up your data directory! A simple way to do that is with tar:

For example, if your data directory is /opt/trains/data, use the following command:
```
 sudo tar czvf ~/trains_backup.tgz /opt/trains/data
```
This backs up all data to an archive in your home directory.

To restore this example backup, use the following command:
```
 sudo rm -R /opt/trains/data
 sudo tar -xzf ~/trains_backup.tgz -C /
```
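The same backup and restore can be scripted with Python's standard `tarfile` module. This sketch stores paths relative to the data directory, so the archive can be restored to any location. The helper names are hypothetical. Shut the containers down first so the databases are not writing while the archive is created, and run it as root (for example with sudo) so that file ownership is restored, or re-run the chown command from the installation steps afterwards.

```python
import tarfile
from pathlib import Path


def backup(data_dir: str, archive: str) -> None:
    """Create a gzip-compressed tar of data_dir with paths stored relative to it."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=".")


def restore(archive: str, data_dir: str) -> None:
    """Extract an archive created by backup() into data_dir, creating it if needed."""
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Pythons provide extraction filters; "tar" rejects paths that escape
        # data_dir while keeping the stored owners (applied only when run as root).
        options = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
        tar.extractall(data_dir, **options)
```

For example, `backup("/opt/trains/data", "/root/trains_backup.tgz")` creates the archive, and `restore("/root/trains_backup.tgz", "/opt/trains/data")` extracts it back after the old directory has been removed.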
Pull the new trains-server docker image using the following command:
```
 sudo docker pull allegroai/trains:latest
```
If you wish to pull a different version, replace latest with the required version number, for example:
```
 sudo docker pull allegroai/trains:0.10.1
```
Launch the newly released Docker image (see Launching Docker Containers).
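For docker-compose installations, the shut-down, pull and relaunch steps can be scripted. This Python sketch is a convenience under stated assumptions, not an official tool: it assumes docker-compose.yml in the current directory refers to the image tag you pull and that your user can run Docker without sudo. It does not include the backup step, which you should run between `down` and `pull`. Unlike the interactive `docker-compose up` used above, it passes `-d` so the containers run detached.

```python
import subprocess

IMAGE = "allegroai/trains"  # image name used in the docker pull command above


def upgrade_commands(version: str = "latest") -> list:
    """Return the upgrade steps for a docker-compose installation, in order."""
    return [
        ["docker-compose", "down"],                # stop and remove the containers
        ["docker", "pull", f"{IMAGE}:{version}"],  # fetch the new release
        ["docker-compose", "up", "-d"],            # relaunch, detached from the terminal
    ]


def upgrade(version: str = "latest", run=subprocess.run) -> None:
    """Run each step in order; check=True makes subprocess.run raise on the first failure."""
    for cmd in upgrade_commands(version):
        run(cmd, check=True)
```

The `run` parameter makes a dry run easy: `upgrade(run=lambda cmd, check: print(" ".join(cmd)))` prints the commands without executing them.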

Community & Support

If you have any questions, see the TRAINS-server FAQ, or tag your questions on Stack Overflow with the 'trains' tag.

For feature requests or bug reports, please use GitHub issues.

Additionally, you can always reach us at trains@allegro.ai.

License

Server Side Public License v1.0

trains-server relies on both MongoDB and ElasticSearch. With the recent changes in both MongoDB's and ElasticSearch's OSS licenses, we feel it is our responsibility as a member of the community to support the projects we love and cherish. We believe the cause for the license change in both cases is more than just, and chose SSPL because it is the more general and flexible of the two licenses.

This is our way to say - we support you guys!