mirror of
https://github.com/clearml/clearml-server
synced 2025-01-31 10:56:48 +00:00
# trains-server FAQ

Launching **trains-server**

* How do I launch **trains-server** on:

    * [Stand alone Linux Ubuntu systems?](#ubuntu)

    * [macOS?](#mac-osx)

    * [Windows 10?](#docker_compose_win10)

* [How do I restart trains-server?](#restart)

Kubernetes

* [Can I deploy trains-server on Kubernetes clusters?](#kubernetes)

* [Can I create a Helm Chart for trains-server Kubernetes deployment?](#helm)

Configuration

* [How do I configure trains-server for sub-domains and load balancers?](#sub-domains)

* [Can I add web login authentication to trains-server?](#web-auth)

* [Can I modify the non-responsive experiment watchdog settings?](#watchdog)

Troubleshooting

* [How do I fix Docker upgrade errors?](#common-docker-upgrade-errors)

* [Why is web login authentication not working?](#port-conflict)

## Launching **trains-server**

### How do I launch trains-server on stand alone Linux Ubuntu systems? <a name="ubuntu"></a>

To launch **trains-server** on a stand alone Linux Ubuntu system:

1. Install [Docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).

1. Install `docker-compose` using the following commands (for more detailed information, see [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation):

        sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
        sudo chmod +x /usr/local/bin/docker-compose

1. Remove any previous installation of **trains-server**.

    **WARNING**: This clears all existing **Trains** databases.

        sudo rm -R /opt/trains/

1. Create local directories for the databases and storage.

        sudo mkdir -p /opt/trains/data/elastic
        sudo mkdir -p /opt/trains/data/mongo/db
        sudo mkdir -p /opt/trains/data/mongo/configdb
        sudo mkdir -p /opt/trains/logs
        sudo mkdir -p /opt/trains/config
        sudo mkdir -p /opt/trains/data/fileserver
        sudo chown -R 1000:1000 /opt/trains

1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change to the new **trains-server** directory.

        git clone https://github.com/allegroai/trains-server.git
        cd trains-server

1. Run `docker-compose`.

        /usr/local/bin/docker-compose -f docker-compose.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080).

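The directory-creation step above repeats one command per path. As a minimal sketch, the same layout can be produced with a loop. The function name `make_trains_dirs` is hypothetical and not part of **trains-server**; it takes the root directory as an argument so that it can be tried on a scratch path first. It does not change ownership, so the `chown` step above is still required.

```shell
# Hypothetical helper (not part of trains-server): create the directory
# layout from the steps above under the given root (default: /opt/trains).
make_trains_dirs() {
    root="${1:-/opt/trains}"
    for sub in data/elastic data/mongo/db data/mongo/configdb \
               logs config data/fileserver; do
        # mkdir -p is idempotent, so re-running the function is harmless.
        mkdir -p "$root/$sub" || return 1
    done
}
```

Calling `make_trains_dirs /tmp/trains-test` creates the layout under a scratch directory; creating it under `/opt/trains` typically requires root privileges.
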
### How do I launch trains-server on macOS? <a name="mac-osx"></a>

To launch **trains-server** on macOS:

1. Install [Docker for macOS](https://docs.docker.com/docker-for-mac/install/).

1. Configure Docker for Elasticsearch by raising `vm.max_map_count` inside the Docker VM (see the Elasticsearch [Docker](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation).

        screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
        sysctl -w vm.max_map_count=262144

1. Create local directories for the databases and storage.

        sudo mkdir -p /opt/trains/data/elastic
        sudo mkdir -p /opt/trains/data/mongo/db
        sudo mkdir -p /opt/trains/data/mongo/configdb
        sudo mkdir -p /opt/trains/data/redis
        sudo mkdir -p /opt/trains/logs
        sudo mkdir -p /opt/trains/config
        sudo mkdir -p /opt/trains/data/fileserver
        sudo chown -R $(whoami):staff /opt/trains

1. Open the Docker app, select **Preferences**, and then on the **File Sharing** tab, add `/opt/trains`.

1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change to the new **trains-server** directory.

        git clone https://github.com/allegroai/trains-server.git
        cd trains-server

1. Run `docker-compose` with the docker-compose file.

        docker-compose -f docker-compose.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080).

### How do I launch trains-server on Windows 10? <a name="docker_compose_win10"></a>

You can run **trains-server** on Windows 10 using Docker Desktop for Windows (see the Docker [System Requirements](https://docs.docker.com/docker-for-windows/install/#system-requirements)).

To launch **trains-server** on Windows 10:

1. Install the Docker Desktop for Windows application by either:

    * following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
    * running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).

1. Increase the memory allocation in Docker Desktop to at least `4GB`.

    1. In your Windows notification area (system tray), right click the Docker icon.

    1. Click *Settings*, *Advanced*, and then set the memory to at least `4096` MB.

    1. Click *Apply*.

1. Create local directories for data and logs. Open PowerShell and execute the following commands:

        cd c:
        mkdir c:\opt\trains\data
        mkdir c:\opt\trains\logs

1. Download the **trains-server** docker-compose YAML file [docker-compose-win10.yml](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml) and save it as `c:\opt\trains\docker-compose.yml`.

1. Run `docker-compose`. In PowerShell, execute the following command, which points at the file saved in the previous step:

        docker-compose -f c:\opt\trains\docker-compose.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080).

### How do I restart trains-server? <a name="restart"></a>

Restart **trains-server** by first stopping the Docker containers and then starting them again. The `-f` option must come before the `down` or `up` subcommand.

```bash
docker-compose -f docker-compose.yml down
docker-compose -f docker-compose.yml up
```

**Note**: If you are using a different docker-compose YAML file, specify that file in both commands.

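The two restart commands can be wrapped so that the same compose file is always used for both. This is a minimal sketch; the function name `restart_trains` is hypothetical and not part of **trains-server**.

```shell
# Hypothetical wrapper: stop and then start the containers, using the same
# compose file for both commands (default: docker-compose.yml).
restart_trains() {
    compose_file="${1:-docker-compose.yml}"
    # -f is a global docker-compose option, so it precedes the subcommand.
    docker-compose -f "$compose_file" down || return 1
    docker-compose -f "$compose_file" up
}
```

For example, `restart_trains docker-compose-win10.yml` restarts a server that was launched from a differently named compose file. Because `up` runs in the foreground, the function returns only when the containers stop.
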
## Kubernetes

### Can I deploy trains-server on Kubernetes clusters? <a name="kubernetes"></a>

**trains-server** supports Kubernetes. See [trains-server-k8s](https://github.com/allegroai/trains-server-k8s), which contains the YAML files describing the required services and detailed instructions for deploying **trains-server** to a Kubernetes cluster.

### Can I create a Helm Chart for trains-server Kubernetes deployment? <a name="helm"></a>

**trains-server** supports creating a Helm chart for Kubernetes deployment. See [trains-server-helm](https://github.com/allegroai/trains-server-helm), which you can use to create a Helm chart for **trains-server** and which contains detailed instructions for deploying **trains-server** to a Kubernetes cluster using Helm.

## Configuration

### How do I configure trains-server for sub-domains and load balancers? <a name="sub-domains"></a>

You can configure **trains-server** for sub-domains and a load balancer.

For example, if your domain is `trains.mydomain.com` and your sub-domains are `app`, `api`, and `files`, then do the following:

1. If you are not using the current **trains-server** version, [upgrade](https://github.com/allegroai/trains-server#upgrade) **trains-server**.

1. Add the following to `/opt/trains/config/apiserver.conf`:

        auth {
            cookies {
                httponly: true
                secure: true
                domain: ".trains.mydomain.com"
                max_age: 99999999999
            }
        }

1. Use the following load balancer configuration:

    * Listeners:
        * Optional: HTTP listener that redirects all traffic to HTTPS.
        * HTTPS listener for `app.` forwarded to `AppTargetGroup`
        * HTTPS listener for `api.` forwarded to `ApiTargetGroup`
        * HTTPS listener for `files.` forwarded to `FilesTargetGroup`
    * Target groups:
        * `AppTargetGroup`: HTTP based target group, port `8080`
        * `ApiTargetGroup`: HTTP based target group, port `8008`
        * `FilesTargetGroup`: HTTP based target group, port `8081`
    * Security and routing:
        * Load balancer: make sure the load balancers are able to receive traffic from the relevant IP addresses (Security groups and Subnets definitions).
        * Instances: make sure the load balancers are able to access the instances, using the relevant ports (Security groups definitions).

1. Run the Docker containers with our updated `docker run` commands (see [Launching Docker Containers](https://github.com/allegroai/trains-server#launching-docker-containers)).

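The listener and target group mapping above can be summarized as one host and backend port per sub-domain. The sketch below only prints that mapping for a given domain, which can help when reviewing a load balancer setup; the function name `lb_routes` is hypothetical, and the ports are the ones listed in the target groups above.

```shell
# Hypothetical helper: print "<host> <backend port>" for each sub-domain,
# using the target group ports listed above.
lb_routes() {
    domain="$1"
    # printf reuses the format string for each pair of arguments.
    printf '%s %s\n' \
        "app.$domain" 8080 \
        "api.$domain" 8008 \
        "files.$domain" 8081
}
```

For example, `lb_routes trains.mydomain.com` prints three lines, one per listener.
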
### Can I add web login authentication to trains-server? <a name="web-auth"></a>

By default, anyone can log in to the **trains-server** Web-App. You can configure **trains-server** to allow only a specific set of users to access the system.

To add web login authentication to **trains-server**:

1. If you are not using the current **trains-server** version, [upgrade](https://github.com/allegroai/trains-server#upgrade) **trains-server**.

1. In `/opt/trains/config/apiserver.conf`, add the `auth` section and in it specify the users, for example:

    **Note**: A sample `apiserver.conf` configuration file is also available [here](https://github.com/allegroai/trains-server/blob/master/docs/apiserver.conf).

        auth {
            # Fixed users login credentials
            # No other user will be able to login
            fixed_users {
                enabled: true
                users: [
                    {
                        username: "jane"
                        password: "12345678"
                        name: "Jane Doe"
                    },
                    {
                        username: "john"
                        password: "12345678"
                        name: "John Doe"
                    },
                ]
            }
        }

1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).

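The passwords in the sample configuration above are placeholders. As a minimal sketch, a random value for the `password` field can be generated with standard POSIX tools; the function name `gen_password` is hypothetical and not part of **trains-server**.

```shell
# Hypothetical helper: print 32 random hexadecimal characters (16 bytes
# read from /dev/urandom), suitable for pasting into a password field.
gen_password() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}
```

Run `gen_password` once per user and paste each result between the quotes of the corresponding `password` entry.
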
### Can I modify the non-responsive experiment watchdog settings? <a name="watchdog"></a>

The non-responsive experiment watchdog monitors experiments that were not updated for a specified period of time and marks them as `aborted`. The watchdog is always active.

You can modify the following settings for the watchdog:

* the time threshold (in seconds) of experiment inactivity (the default is 7200 seconds, which is 2 hours)
* the time interval (in seconds) between watchdog cycles

To change the watchdog's settings:

1. In `/opt/trains/config`, add the `services.conf` file and in it specify the watchdog settings, for example:

    **Note**: A sample watchdog `services.conf` configuration file is also available [here](https://github.com/allegroai/trains-server/blob/master/docs/services.conf).

        tasks {
            non_responsive_tasks_watchdog {
                # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
                threshold_sec: 7200

                # Watchdog will sleep for this number of seconds after each cycle
                watch_interval_sec: 900
            }
        }

1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).

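Both watchdog settings are expressed in seconds, so it can be convenient to compute them rather than type them by hand. The sketch below writes a `services.conf` containing only the two watchdog settings shown above; the function name `write_watchdog_conf` is hypothetical, and the function overwrites any existing `services.conf` in the target directory.

```shell
# Hypothetical helper: write <dir>/services.conf with the two watchdog
# settings. Overwrites an existing services.conf in that directory.
write_watchdog_conf() {
    conf_dir="$1"
    threshold_sec="$2"
    interval_sec="$3"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/services.conf" <<EOF
tasks {
    non_responsive_tasks_watchdog {
        threshold_sec: $threshold_sec
        watch_interval_sec: $interval_sec
    }
}
EOF
}
```

For example, `write_watchdog_conf /opt/trains/config $((4 * 3600)) $((10 * 60))` sets a 4 hour inactivity threshold and a 10 minute interval between cycles. A restart is still needed afterwards.
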
## Troubleshooting

### How do I fix Docker upgrade errors? <a name="common-docker-upgrade-errors"></a>

To resolve the Docker error "... The container name "/trains-???" is already in use by ...", try removing the existing containers:

    docker rm -f $(docker ps -a -q)

**Note**: This command force-removes every container on the host, including containers that do not belong to **trains-server**.

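Because the command above removes every container on the host, a narrower variant may be preferable on a shared machine. This sketch assumes that the **trains-server** container names contain `trains`, as the error message suggests; the function name `remove_trains_containers` is hypothetical.

```shell
# Hypothetical, narrower alternative: force-remove only containers whose
# name contains "trains", leaving other containers on the host untouched.
remove_trains_containers() {
    ids=$(docker ps -a -q --filter "name=trains") || return 1
    if [ -z "$ids" ]; then
        echo "no trains containers found"
        return 0
    fi
    # $ids is intentionally unquoted so each container ID is its own argument.
    docker rm -f $ids
}
```

Run `docker ps -a --filter "name=trains"` first to review which containers would be removed.
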
### Why is web login authentication not working? <a name="port-conflict"></a>

A port conflict between the **trains-server** MongoDB and / or Elastic instances and other instances running on your system may prevent web login authentication from working correctly.

**trains-server** uses the following default ports, which may conflict with other instances:

* MongoDB port `27017`
* Elastic port `9200`

You can check for port conflicts in the logs in `/opt/trains/logs`.

If a port conflict occurs, change the MongoDB and / or Elastic ports in `docker-compose.yml`, and then run the Docker compose commands to restart the **trains-server** instance.

To change the MongoDB and / or Elastic ports for **trains-server**:

1. Edit the `docker-compose.yml` file.

1. In the `services/trainsserver/environment` section, add the following environment variable(s):

    * For MongoDB:

            MONGODB_SERVICE_PORT: <new-mongodb-port>

    * For Elastic:

            ELASTIC_SERVICE_PORT: <new-elasticsearch-port>

    For example:

        MONGODB_SERVICE_PORT: 27018
        ELASTIC_SERVICE_PORT: 9201

1. For MongoDB, in the `services/mongo/ports` section, expose the new MongoDB port:

        <new-mongodb-port>:27017

    For example:

        27018:27017

1. For Elastic, in the `services/elasticsearch/ports` section, expose the new Elastic port:

        <new-elasticsearch-port>:9200

    For example:

        9201:9200

1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).
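
Before choosing new ports, it can help to confirm which local ports already accept connections. The following is a minimal sketch that relies on the `/dev/tcp` redirection, which is a bash feature, so it needs bash rather than a plain POSIX `sh`. The function name `port_in_use` is hypothetical, and the check only covers `127.0.0.1`, so a service bound to another interface is not detected.

```shell
# Hypothetical helper (bash only): return 0 if something accepts TCP
# connections on the given port of 127.0.0.1, non-zero otherwise.
port_in_use() {
    # The subshell opens the connection and closes it again on exit.
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}
```

For example, `for p in 27017 9200; do port_in_use "$p" && echo "port $p is already in use"; done` reports the default MongoDB and Elastic ports that are taken before **trains-server** is launched.
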