mirror of
https://github.com/clearml/clearml-server
synced 2025-06-26 23:15:47 +00:00
Compare commits
59 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
da1182a405 | ||
|
|
53e995ee8c | ||
|
|
4732dc1a88 | ||
|
|
e325bcaf67 | ||
|
|
a7c30453db | ||
|
|
dedac3b2fe | ||
|
|
7d10bbdf8e | ||
|
|
72213dffa4 | ||
|
|
f778837d4b | ||
|
|
153ed6a7b7 | ||
|
|
5d279c8c5a | ||
|
|
ed910d5f6a | ||
|
|
87d2b6fa15 | ||
|
|
94cfb17291 | ||
|
|
3f641d37b7 | ||
|
|
551be12f01 | ||
|
|
b536020058 | ||
|
|
fb6fbc0a06 | ||
|
|
5ae64fd791 | ||
|
|
f9776e4319 | ||
|
|
75e736e7d5 | ||
|
|
1e4756aa1d | ||
|
|
52529d3c55 | ||
|
|
53296e8891 | ||
|
|
1c87ebc900 | ||
|
|
14d9924ea0 | ||
|
|
69f9b424c7 | ||
|
|
1a6da301a8 | ||
|
|
2728b3ed14 | ||
|
|
38284eef1f | ||
|
|
9debe1adcd | ||
|
|
cc93c15f8a | ||
|
|
2c3f0e4ba3 | ||
|
|
c48eb34d8d | ||
|
|
49515e06e1 | ||
|
|
4a1d97c02f | ||
|
|
6c6c1c3f41 | ||
|
|
0ad687008c | ||
|
|
fe3dbc92dc | ||
|
|
dc53970ff0 | ||
|
|
73592b991b | ||
|
|
47b981a993 | ||
|
|
b500bcab0b | ||
|
|
59e910db1a | ||
|
|
2ecb430f02 | ||
|
|
a08722e394 | ||
|
|
67c210d9d7 | ||
|
|
101ba540f4 | ||
|
|
82fc28d477 | ||
|
|
7b73f699d2 | ||
|
|
a7e5380f67 | ||
|
|
bcade31786 | ||
|
|
6b902f85f4 | ||
|
|
6d4c974045 | ||
|
|
2346c6f3f5 | ||
|
|
82e51b4d36 | ||
|
|
e63599254e | ||
|
|
8e7e234161 | ||
|
|
17d94b26c3 |
4
.gitignore
vendored
4
.gitignore
vendored
@@ -1,11 +1,10 @@
|
||||
syntax: glob
|
||||
.idea
|
||||
apierrors/errors
|
||||
static/build.json
|
||||
static/dashboard/node_modules
|
||||
static/webapp/node_modules
|
||||
static/webapp/.git
|
||||
scripts/
|
||||
generators/
|
||||
*.pyc
|
||||
__pycache__
|
||||
.ropeproject
|
||||
@@ -20,3 +19,4 @@ build
|
||||
dist
|
||||
code.tar.gz
|
||||
server/schema/services/_cache.json
|
||||
server/apierrors/errors/*
|
||||
|
||||
315
README.md
315
README.md
@@ -1,4 +1,4 @@
|
||||
# TRAINS Server
|
||||
# Trains Server
|
||||
|
||||
## Auto-Magical Experiment Manager & Version Control for AI
|
||||
|
||||
@@ -9,25 +9,20 @@
|
||||
|
||||
## Introduction
|
||||
|
||||
The **trains-server** is the backend service infrastructure for [TRAINS](https://github.com/allegroai/trains).
|
||||
The **trains-server** is the backend service infrastructure for [Trains](https://github.com/allegroai/trains).
|
||||
It allows multiple users to collaborate and manage their experiments.
|
||||
By default, TRAINS is set up to work with the TRAINS demo server, which is open to anyone and resets periodically.
|
||||
In order to host your own server, you will need to install **trains-server** and point TRAINS to it.
|
||||
By default, **Trains** is set up to work with the **Trains** demo server, which is open to anyone and resets periodically.
|
||||
In order to host your own server, you will need to launch **trains-server** and point **Trains** to it.
|
||||
|
||||
**trains-server** contains the following components:
|
||||
|
||||
* The TRAINS Web-App, a single-page UI for experiment management and browsing
|
||||
* The **Trains** Web-App, a single-page UI for experiment management and browsing
|
||||
* RESTful API for:
|
||||
* Documenting and logging experiment information, statistics and results
|
||||
* Querying experiments history, logs and results
|
||||
* Locally-hosted file server for storing images and models making them easily accessible using the Web-App
|
||||
|
||||
You can quickly setup your **trains-server** using:
|
||||
- [Docker Installation](#installation)
|
||||
- Pre-built Amazon [AWS image](#aws)
|
||||
- [Kubernetes Helm](https://github.com/allegroai/trains-server-helm#trains-server-for-kubernetes-clusters-using-helm)
|
||||
or manual [Kubernetes installation](https://github.com/allegroai/trains-server-k8s#trains-server-for-kubernetes-clusters)
|
||||
|
||||
You can quickly [deploy](#launching-trains-server) your **trains-server** using Docker, AWS EC2 AMI, or Kubernetes.
|
||||
|
||||
## System design
|
||||
|
||||
@@ -44,155 +39,42 @@ You can quickly setup your **trains-server** using:
|
||||
- Web application on sub-domain: app.\*.\*
|
||||
- API service on sub-domain: api.\*.\*
|
||||
- File storage service on sub-domain: files.\*.\*
|
||||
|
||||
## Launching trains-server
|
||||
|
||||
## Install / Upgrade - AWS <a name="aws"></a>
|
||||
### Prerequisites
|
||||
|
||||
Use one of our pre-installed Amazon Machine Images for easy deployment in AWS.
|
||||
The ports 8080/8081/8008 must be available for the **trains-server** services.
|
||||
|
||||
For example, to see if port `8080` is in use:
|
||||
|
||||
For details and instructions, see [TRAINS-server: AWS pre-installed images](docs/install_aws.md).
|
||||
* Linux or macOS:
|
||||
|
||||
sudo lsof -Pn -i4 | grep :8080 | grep LISTEN
|
||||
|
||||
## Docker Installation - Linux, Mac OS X <a name="installation"></a>
|
||||
* Windows:
|
||||
|
||||
Use our pre-built Docker image for easy deployment in Linux and Mac OS X.
|
||||
For Windows, we recommend installing our pre-built Docker image on a Linux virtual machine.
|
||||
Latest docker images can be found [here](https://hub.docker.com/r/allegroai/trains).
|
||||
netstat -an |find /i "8080"
|
||||
|
||||
### Launching
|
||||
|
||||
Launch **trains-server** in any of the following formats:
|
||||
|
||||
1. Setup Docker ([docker-compose Ubuntu](docs/faq.md#ubuntu), [docker-compose OS X](docs/faq.md#mac-osx), [Setup Docker Service Manually](docs/docker_setup.md#setup-docker))
|
||||
- Pre-built [AWS EC2 AMI](https://github.com/allegroai/trains-server/blob/master/docs/install_aws.md)
|
||||
- Pre-built Docker Image
|
||||
- [Linux](https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md)
|
||||
- [macOS](https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md)
|
||||
- [Windows 10](https://github.com/allegroai/trains-server/blob/master/docs/install_win.md)
|
||||
- Kubernetes
|
||||
- [Kubernetes Helm](https://github.com/allegroai/trains-server-helm#prerequisites)
|
||||
- Manual [Kubernetes installation](https://github.com/allegroai/trains-server-k8s#prerequisites)
|
||||
|
||||
Make sure port 8080/8081/8008 are available for the `trains-server` services
|
||||
## Connecting Trains to your trains-server
|
||||
|
||||
Increase vm.max_map_count for `ElasticSearch` docker
|
||||
|
||||
```bash
|
||||
echo "vm.max_map_count=262144" > /tmp/99-trains.conf
|
||||
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
|
||||
sudo sysctl -w vm.max_map_count=262144
|
||||
|
||||
sudo service docker restart
|
||||
```
|
||||
|
||||
1. Create local directories for the databases and storage.
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /opt/trains/data/elastic
|
||||
sudo mkdir -p /opt/trains/data/mongo/db
|
||||
sudo mkdir -p /opt/trains/data/mongo/configdb
|
||||
sudo mkdir -p /opt/trains/data/redis
|
||||
sudo mkdir -p /opt/trains/logs
|
||||
sudo mkdir -p /opt/trains/data/fileserver
|
||||
sudo mkdir -p /opt/trains/config
|
||||
```
|
||||
|
||||
Linux
|
||||
```bash
|
||||
$ sudo chown -R 1000:1000 /opt/trains
|
||||
```
|
||||
Mac OS X
|
||||
```bash
|
||||
$ sudo chown -R $(whoami):staff /opt/trains
|
||||
```
|
||||
|
||||
1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.
|
||||
|
||||
```bash
|
||||
$ git clone https://github.com/allegroai/trains-server.git
|
||||
$ cd trains-server
|
||||
```
|
||||
|
||||
1. Launch the Docker containers <a name="launch-docker"></a>
|
||||
|
||||
* Automatically with docker-compose (details: [Linux/Ubuntu](docs/faq.md#ubuntu), [OS X](docs/faq.md#mac-osx))
|
||||
|
||||
```bash
|
||||
$ docker-compose up
|
||||
```
|
||||
|
||||
* Manually, see [Launching Docker Containers Manually](docs/docker_setup.md#launch) for instructions.
|
||||
|
||||
1. Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:
|
||||
|
||||
* Web server on port `8080`
|
||||
* API server on port `8008`
|
||||
* File server on port `8081`
|
||||
|
||||
## Optional Configuration
|
||||
|
||||
The **trains-server** default configuration can be easily overridden using external configuration files. By default, the server will look for these files in `/opt/trains/config`.
|
||||
|
||||
In order to apply the new configuration, you must restart the server (see [Restarting trains-server](#restart-server)).
|
||||
|
||||
### Adding Web Login Authentication
|
||||
|
||||
By default anyone can login to the **trains-server** Web-App.
|
||||
You can configure the **trains-server** to allow only a specific set of users to access the system.
|
||||
|
||||
Enable this feature by placing `apiserver.conf` file under `/opt/trains/config`.
|
||||
|
||||
|
||||
Sample fixed user configuration file `/opt/trains/config/apiserver.conf`:
|
||||
|
||||
auth {
|
||||
# Fixed users login credetials
|
||||
# No other user will be able to login
|
||||
fixed_users {
|
||||
enabled: true
|
||||
users: [
|
||||
{
|
||||
username: "jane"
|
||||
password: "12345678"
|
||||
name: "Jane Doe"
|
||||
},
|
||||
{
|
||||
username: "john"
|
||||
password: "12345678"
|
||||
name: "John Doe"
|
||||
},
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
To apply the `apiserver.conf` changes, you must restart the *trains-apiserver* (docker) (see [Restarting trains-server](#restart-server)).
|
||||
|
||||
### Configuring the Non-Responsive Experiments Watchdog
|
||||
|
||||
The non-responsive experiment watchdog, monitors experiments that were not updated for a given period of time,
|
||||
and marks them as `aborted`. The watchdog is always active with a default of 7200 seconds (2 hours) of inactivity threshold.
|
||||
|
||||
To change the watchdog's timeouts, place a `services.conf` file under `/opt/trains/config`.
|
||||
|
||||
Sample watchdog configuration file `/opt/trains/config/services.conf`:
|
||||
|
||||
tasks {
|
||||
non_responsive_tasks_watchdog {
|
||||
# In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
|
||||
threshold_sec: 7200
|
||||
|
||||
# Watchdog will sleep for this number of seconds after each cycle
|
||||
watch_interval_sec: 900
|
||||
}
|
||||
}
|
||||
|
||||
To apply the `services.conf` changes, you must restart the *trains-apiserver* (docker) (see [Restarting trains-server](#restart-server)).
|
||||
|
||||
### Restarting trains-server <a name="restart-server"></a>
|
||||
|
||||
To restart the **trains-server**, you must first stop and remove the containers, and then restart.
|
||||
|
||||
1. Restarting docker-compose containers.
|
||||
|
||||
$ docker-compose down
|
||||
$ docker-compose up
|
||||
|
||||
1. Manually restarting dockers [instructions](docs/docker_setup.md#launch).
|
||||
|
||||
## Configuring **TRAINS** client
|
||||
|
||||
Once you have installed the **trains-server**, make sure to configure **TRAINS** [client](https://github.com/allegroai/trains)
|
||||
to use your locally installed server (and not the demo server).
|
||||
|
||||
- Run the `trains-init` command for an interactive setup
|
||||
|
||||
- Or manually edit `~/trains.conf` file, making sure the `api_server` value is configured correctly, for example:
|
||||
By default, the **Trains** client is set up to work with the [**Trains** demo server](https://demoapp.trains.allegro.ai/).
|
||||
To have the **Trains** client use your **trains-server** instead:
|
||||
- Run the `trains-init` command for an interactive setup.
|
||||
- Or manually edit `~/trains.conf` file, making sure the server settings (`api_server`, `web_server`, `file_server`) are configured correctly, for example:
|
||||
|
||||
api {
|
||||
# API server on port 8008
|
||||
@@ -205,93 +87,80 @@ to use your locally installed server (and not the demo server).
|
||||
files_server: "http://localhost:8081"
|
||||
}
|
||||
|
||||
* Notice that if you setup **trains-server** in a sub-domain configuration, there is no need to specify a port number,
|
||||
**Note**: If you have set up **trains-server** in a sub-domain configuration, then there is no need to specify a port number,
|
||||
it will be inferred from the http/s scheme.
|
||||
|
||||
See [Installing and Configuring TRAINS](https://github.com/allegroai/trains#configuration) for more details.
|
||||
After launching the **trains-server** and configuring the **Trains** client to use the **trains-server**,
|
||||
you can [use](https://github.com/allegroai/trains#using-trains) **Trains** in your experiments and view them in your **trains-server** web server,
|
||||
for example http://localhost:8080.
|
||||
For more information about the Trains client, see [**Trains**](https://github.com/allegroai/trains).
|
||||
|
||||
## What next?
|
||||
## Advanced Functionality
|
||||
|
||||
Now that the **trains-server** is installed, and TRAINS is configured to use it,
|
||||
you can [use](https://github.com/allegroai/trains#using-trains) TRAINS in your experiments and view them in the web server,
|
||||
for example http://localhost:8080
|
||||
**trains-server** provides a few additional useful features, which can be manually enabled:
|
||||
|
||||
* [Web login authentication](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#web-auth)
|
||||
* [Non-responsive experiments watchdog](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#watchdog-the-non-responsive-task-watchdog-settings)
|
||||
|
||||
## Restarting trains-server
|
||||
|
||||
To restart the **trains-server**, you must first stop the containers, and then restart them.
|
||||
|
||||
```bash
|
||||
docker-compose down
|
||||
docker-compose -f docker-compose.yml up
|
||||
```
|
||||
|
||||
## Upgrading <a name="upgrade"></a>
|
||||
|
||||
We are constantly updating, improving and adding to the **trains-server**.
|
||||
New releases will include new pre-built Docker images.
|
||||
When we release a new version and include a new pre-built Docker image for it, upgrade as follows:
|
||||
**trains-server** releases are also reflected in the [docker compose configuration file](https://github.com/allegroai/trains-server/blob/master/docker-compose.yml).
|
||||
We strongly encourage you to keep your **trains-server** up to date, by keeping up with the current release.
|
||||
|
||||
* Upgrading your docker-compose installation
|
||||
**Note**: The following upgrade instructions use the Linux OS as an example.
|
||||
|
||||
* Shut down the docker containers
|
||||
```bash
|
||||
$ docker-compose down
|
||||
```
|
||||
|
||||
* We highly recommend backing up your data directory before upgrading
|
||||
(see **Step ii** in the Manual Docker upgrade)
|
||||
To upgrade your existing **trains-server** deployment:
|
||||
|
||||
* Spin up the docker containers, it will automatically pull the latest trains-server build
|
||||
```bash
|
||||
$ docker-compose up
|
||||
```
|
||||
1. Shut down the docker containers
|
||||
```bash
|
||||
docker-compose down
|
||||
```
|
||||
|
||||
* In case of a docker error: "... The container name "/trains-???" is already in use by ..."
|
||||
Try removing deprecated images with:
|
||||
```bash
|
||||
$ docker rm -f $(docker ps -a -q)
|
||||
```
|
||||
1. We highly recommend backing up your data directory before upgrading.
|
||||
|
||||
* Manual Docker upgrade
|
||||
1. Shut down and remove each of your Docker instances using the following commands:
|
||||
|
||||
```bash
|
||||
$ sudo docker stop <docker-name>
|
||||
$ sudo docker rm -v <docker-name>
|
||||
```
|
||||
|
||||
The Docker names are (see [Launching Docker Containers](#launch-docker)):
|
||||
|
||||
* `trains-elastic`
|
||||
* `trains-mongo`
|
||||
* `trains-redis`
|
||||
* `trains-fileserver`
|
||||
* `trains-apiserver`
|
||||
* `trains-webserver`
|
||||
|
||||
2. We highly recommend backing up your data directory!. A simple way to do that is using `tar`:
|
||||
|
||||
For example, if your data directory is `/opt/trains`, use the following command:
|
||||
|
||||
```bash
|
||||
$ sudo tar czvf ~/trains_backup.tgz /opt/trains/data
|
||||
```
|
||||
This backups all data to an archive in your home directory.
|
||||
|
||||
To restore this example backup, use the following command:
|
||||
```bash
|
||||
$ sudo rm -R /opt/trains/data
|
||||
$ sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
|
||||
```
|
||||
|
||||
3. Pull the new **trains-server** docker image using the following command:
|
||||
|
||||
```bash
|
||||
$ sudo docker pull allegroai/trains:latest
|
||||
```
|
||||
|
||||
If you wish to pull a different version, replace `latest` with the required version number, for example:
|
||||
```bash
|
||||
$ sudo docker pull allegroai/trains:0.11.0
|
||||
```
|
||||
|
||||
4. Launch the newly released Docker image (see [Launching Docker Containers](#launch-docker)).
|
||||
Assuming your data directory is `/opt/trains`, to archive all data into `~/trains_backup.tgz` execute:
|
||||
|
||||
```bash
|
||||
sudo tar czvf ~/trains_backup.tgz /opt/trains/data
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary>Restore instructions:</summary>
|
||||
|
||||
To restore this example backup, execute:
|
||||
```bash
|
||||
sudo rm -R /opt/trains/data
|
||||
sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
|
||||
```
|
||||
</details>
|
||||
|
||||
1. Download the latest `docker-compose.yml` file.
|
||||
|
||||
```bash
|
||||
curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
|
||||
```
|
||||
|
||||
1. Spin up the docker containers, it will automatically pull the latest **trains-server** build
|
||||
```bash
|
||||
docker-compose -f docker-compose.yml pull
|
||||
docker-compose -f docker-compose.yml up
|
||||
```
|
||||
|
||||
**\* If something went wrong along the way, check our FAQ: [Common Docker Upgrade Errors](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#common-docker-upgrade-errors).**
|
||||
|
||||
|
||||
## Community & Support
|
||||
|
||||
If you have any questions, look to the TRAINS-server [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md), or
|
||||
If you have any questions, look to the Trains server [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md), or
|
||||
tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/trains) with '**trains**' tag.
|
||||
|
||||
For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains-server/issues).
|
||||
|
||||
@@ -20,9 +20,12 @@ services:
|
||||
- mongo
|
||||
- elasticsearch
|
||||
environment:
|
||||
ELASTIC_SERVICE_HOST: elasticsearch
|
||||
MONGODB_SERVICE_HOST: mongo
|
||||
REDIS_SERVICE_HOST: redis
|
||||
TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
|
||||
TRAINS_ELASTIC_SERVICE_PORT: 9200
|
||||
TRAINS_MONGODB_SERVICE_HOST: mongo
|
||||
TRAINS_MONGODB_SERVICE_PORT: 27017
|
||||
TRAINS_REDIS_SERVICE_HOST: redis
|
||||
TRAINS_REDIS_SERVICE_PORT: 6379
|
||||
networks:
|
||||
- backend
|
||||
elasticsearch:
|
||||
|
||||
120
docker-compose-win10.yml
Normal file
120
docker-compose-win10.yml
Normal file
@@ -0,0 +1,120 @@
|
||||
version: "3.6"
|
||||
services:
|
||||
|
||||
apiserver:
|
||||
command:
|
||||
- apiserver
|
||||
container_name: trains-apiserver
|
||||
image: allegroai/trains:latest
|
||||
restart: unless-stopped
|
||||
volumes:
|
||||
- c:/opt/trains/logs:/var/log/trains
|
||||
- c:/opt/trains/config:/opt/trains/config
|
||||
depends_on:
|
||||
- redis
|
||||
- mongo
|
||||
- elasticsearch
|
||||
- fileserver
|
||||
environment:
|
||||
TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
|
||||
TRAINS_ELASTIC_SERVICE_PORT: 9200
|
||||
TRAINS_MONGODB_SERVICE_HOST: mongo
|
||||
TRAINS_MONGODB_SERVICE_PORT: 27017
|
||||
TRAINS_REDIS_SERVICE_HOST: redis
|
||||
TRAINS_REDIS_SERVICE_PORT: 6379
|
||||
ports:
|
||||
- "8008:8008"
|
||||
networks:
|
||||
- backend
|
||||
|
||||
elasticsearch:
|
||||
networks:
|
||||
- backend
|
||||
container_name: trains-elastic
|
||||
environment:
|
||||
ES_JAVA_OPTS: -Xms2g -Xmx2g
|
||||
bootstrap.memory_lock: "true"
|
||||
cluster.name: trains
|
||||
cluster.routing.allocation.node_initial_primaries_recoveries: "500"
|
||||
discovery.zen.minimum_master_nodes: "1"
|
||||
http.compression_level: "7"
|
||||
node.ingest: "true"
|
||||
node.name: trains
|
||||
reindex.remote.whitelist: '*.*'
|
||||
script.inline: "true"
|
||||
script.painless.regex.enabled: "true"
|
||||
script.update: "true"
|
||||
thread_pool.bulk.queue_size: "2000"
|
||||
thread_pool.search.queue_size: "10000"
|
||||
xpack.monitoring.enabled: "false"
|
||||
xpack.security.enabled: "false"
|
||||
ulimits:
|
||||
memlock:
|
||||
soft: -1
|
||||
hard: -1
|
||||
nofile:
|
||||
soft: 65536
|
||||
hard: 65536
|
||||
image: docker.elastic.co/elasticsearch/elasticsearch:5.6.16
|
||||
restart: unless-stopped
|
||||
volumes:
|
||||
- c:/opt/trains/data/elastic:/usr/share/elasticsearch/data
|
||||
ports:
|
||||
- "9200:9200"
|
||||
|
||||
fileserver:
|
||||
networks:
|
||||
- backend
|
||||
command:
|
||||
- fileserver
|
||||
container_name: trains-fileserver
|
||||
image: allegroai/trains:latest
|
||||
restart: unless-stopped
|
||||
volumes:
|
||||
- c:/opt/trains/logs:/var/log/trains
|
||||
- c:/opt/trains/data/fileserver:/mnt/fileserver
|
||||
ports:
|
||||
- "8081:8081"
|
||||
|
||||
mongo:
|
||||
networks:
|
||||
- backend
|
||||
container_name: trains-mongo
|
||||
image: mongo:3.6.5
|
||||
restart: unless-stopped
|
||||
command: --setParameter internalQueryExecMaxBlockingSortBytes=196100200
|
||||
volumes:
|
||||
- mongodata:/data
|
||||
ports:
|
||||
- "27017:27017"
|
||||
|
||||
redis:
|
||||
networks:
|
||||
- backend
|
||||
container_name: trains-redis
|
||||
image: redis:5.0
|
||||
restart: unless-stopped
|
||||
volumes:
|
||||
- c:/opt/trains/data/redis:/data
|
||||
ports:
|
||||
- "6379:6379"
|
||||
|
||||
webserver:
|
||||
command:
|
||||
- webserver
|
||||
container_name: trains-webserver
|
||||
image: allegroai/trains:latest
|
||||
restart: unless-stopped
|
||||
volumes:
|
||||
- c:/trains/logs:/var/log/trains
|
||||
depends_on:
|
||||
- apiserver
|
||||
ports:
|
||||
- "8080:80"
|
||||
|
||||
networks:
|
||||
backend:
|
||||
driver: bridge
|
||||
|
||||
volumes:
|
||||
mongodata:
|
||||
@@ -16,9 +16,12 @@ services:
|
||||
- elasticsearch
|
||||
- fileserver
|
||||
environment:
|
||||
ELASTIC_SERVICE_HOST: elasticsearch
|
||||
MONGODB_SERVICE_HOST: mongo
|
||||
REDIS_SERVICE_HOST: redis
|
||||
TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
|
||||
TRAINS_ELASTIC_SERVICE_PORT: 9200
|
||||
TRAINS_MONGODB_SERVICE_HOST: mongo
|
||||
TRAINS_MONGODB_SERVICE_PORT: 27017
|
||||
TRAINS_REDIS_SERVICE_HOST: redis
|
||||
TRAINS_REDIS_SERVICE_PORT: 6379
|
||||
ports:
|
||||
- "8008:8008"
|
||||
networks:
|
||||
|
||||
19
docs/apiserver.conf
Normal file
19
docs/apiserver.conf
Normal file
@@ -0,0 +1,19 @@
|
||||
auth {
|
||||
# Fixed users login credentials
|
||||
# No other user will be able to login
|
||||
fixed_users {
|
||||
enabled: true
|
||||
users: [
|
||||
{
|
||||
username: "jane"
|
||||
password: "12345678"
|
||||
name: "Jane Doe"
|
||||
},
|
||||
{
|
||||
username: "john"
|
||||
password: "12345678"
|
||||
name: "John Doe"
|
||||
},
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -1,106 +0,0 @@
|
||||
# TRAINS-server: Using Docker Pre-Built Images
|
||||
|
||||
The pre-built Docker image for the **trains-server** is the quickest way to get started with your own **TRAINS** server.
|
||||
|
||||
You can also build the entire **trains-server** architecture using the code available in the [trains-server](https://github.com/allegroai/trains-server) repository.
|
||||
|
||||
**Note**: We tested this pre-built Docker image with Linux, only. For Windows users, we recommend installing the pre-built image on a Linux virtual machine.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
* You must be logged in as a user with sudo privileges
|
||||
* Use `bash` for all command-line instructions in this installation
|
||||
|
||||
## Setup Docker
|
||||
|
||||
### Step 1: Install Docker CE
|
||||
|
||||
You must first install Docker. For instructions about installing Docker, see [Supported platforms](https://docs.docker.com/install//#support) in the Docker documentation.
|
||||
|
||||
For example, to [install in Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/) / Mint (x86_64/amd64):
|
||||
|
||||
```bash
|
||||
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
|
||||
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
|
||||
. /etc/os-release
|
||||
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $UBUNTU_CODENAME stable"
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y docker-ce
|
||||
```
|
||||
|
||||
### Step 2: Set the Maximum Number of Memory Map Areas
|
||||
|
||||
Elastic requires that the `vm.max_map_count` kernel setting, which is the maximum number of memory map areas a process can use, is set to at least 262144.
|
||||
|
||||
For CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19.x, we tested the following commands to set `vm.max_map_count`:
|
||||
|
||||
```bash
|
||||
echo "vm.max_map_count=262144" > /tmp/99-trains.conf
|
||||
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
|
||||
sudo sysctl -w vm.max_map_count=262144
|
||||
```
|
||||
|
||||
For information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.
|
||||
|
||||
### Step 3: Restart the Docker daemon
|
||||
|
||||
Restart the Docker daemon.
|
||||
|
||||
```bash
|
||||
sudo service docker restart
|
||||
```
|
||||
|
||||
### Step 4: Choose a Data Directory
|
||||
|
||||
Choose a directory on your system in which all data maintained by the **trains-server** is stored.
|
||||
Create this directory, and set its owner and group to `uid` 1000. The data stored in this directory includes the database, uploaded files and logs.
|
||||
|
||||
For example, if your data directory is `/opt/trains`, then use the following command:
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /opt/trains/data/elastic
|
||||
sudo mkdir -p /opt/trains/data/mongo/db
|
||||
sudo mkdir -p /opt/trains/data/mongo/configdb
|
||||
sudo mkdir -p /opt/trains/data/redis
|
||||
sudo mkdir -p /opt/trains/logs
|
||||
sudo mkdir -p /opt/trains/data/fileserver
|
||||
sudo mkdir -p /opt/trains/config
|
||||
|
||||
sudo chown -R 1000:1000 /opt/trains
|
||||
```
|
||||
|
||||
## TRAINS-server: Manually Launching Docker Containers <a name="launch"></a>
|
||||
|
||||
You can manually launch the Docker containers using the following commands.
|
||||
|
||||
If your data directory is not `/opt/trains`, then in the five `docker run` commands below, you must replace all occurrences of `/opt/trains` with your data directory path.
|
||||
|
||||
1. Launch the **trains-elastic** Docker container.
|
||||
|
||||
sudo docker run -d --restart="always" --name="trains-elastic" -e "bootstrap.memory_lock=true" --ulimit memlock=-1:-1 -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16
|
||||
|
||||
1. Launch the **trains-mongo** Docker container.
|
||||
|
||||
sudo docker run -d --restart="always" --name="trains-mongo" -v /opt/trains/data/mongo/db:/data/db -v /opt/trains/data/mongo/configdb:/data/configdb --network="host" mongo:3.6.5
|
||||
|
||||
1. Launch the **trains-redis** Docker container.
|
||||
|
||||
sudo docker run -d --restart="always" --name="trains-redis" -v /opt/trains/data/redis:/data --network="host" redis:5.0
|
||||
|
||||
1. Launch the **trains-fileserver** Docker container.
|
||||
|
||||
sudo docker run -d --restart="always" --name="trains-fileserver" --network="host" -v /opt/trains/logs:/var/log/trains -v /opt/trains/data/fileserver:/mnt/fileserver allegroai/trains:latest fileserver
|
||||
|
||||
1. Launch the **trains-apiserver** Docker container.
|
||||
|
||||
sudo docker run -d --restart="always" --name="trains-apiserver" --network="host" -v /opt/trains/logs:/var/log/trains -v /opt/trains/config:/opt/trains/config allegroai/trains:latest apiserver
|
||||
|
||||
1. Launch the **trains-webserver** Docker container.
|
||||
|
||||
sudo docker run -d --restart="always" --name="trains-webserver" -p 8080:80 allegroai/trains:latest webserver
|
||||
|
||||
1. Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:
|
||||
|
||||
* API server on port `8008`
|
||||
* Web server on port `8080`
|
||||
* File server on port `8081`
|
||||
335
docs/faq.md
335
docs/faq.md
@@ -1,66 +1,42 @@
|
||||
# TRAINS-server FAQ
|
||||
# trains-server FAQ
|
||||
|
||||
* [Deploying trains-server on Kubernetes clusters](#kubernetes)
|
||||
Launching **trains-server**
|
||||
|
||||
* [Creating a Helm Chart for trains-server Kubernetes deployment](#helm)
|
||||
* How do I launch **trains-server** on:
|
||||
|
||||
* [Running trains-server on Mac OS X](#mac-osx)
|
||||
* [Stand alone Linux Ubuntu systems?](#ubuntu)
|
||||
|
||||
* [macOS?](#mac-osx)
|
||||
|
||||
* [Windows 10?](#docker_compose_win10)
|
||||
|
||||
* [Installing trains-server on stand alone Linux Ubuntu systems ](#ubuntu)
|
||||
* [How do I restart trains-server?](#restart)
|
||||
|
||||
* [Resolving port conflicts preventing fixed users mode authentication and login](#port-conflict)
|
||||
Kubernetes
|
||||
|
||||
* [Configuring trains-server for sub-domains and load balancers](#sub-domains)
|
||||
* [Can I deploy trains-server on Kubernetes clusters?](#kubernetes)
|
||||
|
||||
### Deploying trains-server on Kubernetes clusters <a name="kubernetes"></a>
|
||||
* [Can I create a Helm Chart for trains-server Kubernetes deployment?](#helm)
|
||||
|
||||
**trains-server** supports Kubernetes. See [trains-server-k8s](https://github.com/allegroai/trains-server-k8s)
|
||||
which contains the YAML files describing the required services and detailed instructions for deploying
|
||||
**trains-server** to a Kubernetes clusters.
|
||||
Configuration
|
||||
|
||||
### Creating a Helm Chart for trains-server Kubernetes deployment <a name="helm"></a>
|
||||
* [How do I configure trains-server for sub-domains and load balancers?](#sub-domains)
|
||||
|
||||
**trains-server** supports creating a Helm chart for Kubernetes deployment. See [trains-server-helm](https://github.com/allegroai/trains-server-helm)
|
||||
which you can use to create a Helm chart for **trains-server** and contains detailed instructions for deploying
|
||||
**trains-server** to a Kubernetes clusters using Helm.
|
||||
* [Can I add web login authentication to trains-server?](#web-auth)
|
||||
|
||||
### Running trains-server on Mac OS X <a name="mac-osx"></a>
|
||||
* [Can I modify the non-responsive experiment watchdog settings?](#watchdog)
|
||||
|
||||
To install and configure **trains-server** on Mac OS X, follow the steps below.
|
||||
Troubleshooting
|
||||
|
||||
1. Install [docker for OS X](https://docs.docker.com/docker-for-mac/install/).
|
||||
* [How do I fix Docker upgrade errors?](#common-docker-upgrade-errors)
|
||||
|
||||
1. Configure [Docker](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode).
|
||||
* [Why is web login authentication not working?](#port-conflict)
|
||||
|
||||
$ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
|
||||
sysctl -w vm.max_map_count=262144
|
||||
## Launching **trains-server**
|
||||
|
||||
1. Create local directories for the databases and storage.
|
||||
### How do I launch trains-server on stand alone Linux Ubuntu systems? <a name="ubuntu"></a>
|
||||
|
||||
$ sudo mkdir -p /opt/trains/data/elastic
|
||||
$ sudo mkdir -p /opt/trains/data/mongo/db
|
||||
$ sudo mkdir -p /opt/trains/data/mongo/configdb
|
||||
$ sudo mkdir -p /opt/trains/logs
|
||||
$ sudo mkdir -p /opt/trains/config
|
||||
$ sudo mkdir -p /opt/trains/data/fileserver
|
||||
$ sudo chown -R $(whoami):staff /opt/trains
|
||||
|
||||
1. Open the Docker app, select **Preferences**, and then on the **File Sharing** tab, add `/opt/trains`.
|
||||
|
||||
1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.
|
||||
|
||||
$ git clone https://github.com/allegroai/trains-server.git
|
||||
$ cd trains-server
|
||||
|
||||
1. Run `docker-compose` with the unified docker image.
|
||||
|
||||
$ docker-compose -f docker-compose-unified.yml up
|
||||
|
||||
Your server is now running on [http://localhost:8080](http://localhost:8080)
|
||||
|
||||
### Installing trains-server on stand alone Linux Ubuntu systems <a name="ubuntu"></a>
|
||||
|
||||
To install **trains-server** on a stand alone Linux Ubuntu, follow the steps belows.
|
||||
To launch **trains-server** on a stand alone Linux Ubuntu:
|
||||
|
||||
1. Install [docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
|
||||
|
||||
@@ -71,81 +47,127 @@ To install **trains-server** on a stand alone Linux Ubuntu, follow the steps bel
|
||||
|
||||
1. Remove the previous installation of **trains-server**.
|
||||
|
||||
**WARNING**: This clears all existing **TRAINS** databases.
|
||||
**WARNING**: This clears all existing **Trains** databases.
|
||||
|
||||
$ sudo rm -R /opt/trains/
|
||||
sudo rm -R /opt/trains/
|
||||
|
||||
1. Create local directories for the databases and storage.
|
||||
|
||||
$ sudo mkdir -p /opt/trains/data/elastic
|
||||
$ sudo mkdir -p /opt/trains/data/mongo/db
|
||||
$ sudo mkdir -p /opt/trains/data/mongo/configdb
|
||||
$ sudo mkdir -p /opt/trains/logs
|
||||
$ sudo mkdir -p /opt/trains/config
|
||||
$ sudo mkdir -p /opt/trains/data/fileserver
|
||||
$ sudo chown -R 1000:1000 /opt/trains
|
||||
sudo mkdir -p /opt/trains/data/elastic
|
||||
sudo mkdir -p /opt/trains/data/mongo/db
|
||||
sudo mkdir -p /opt/trains/data/mongo/configdb
|
||||
sudo mkdir -p /opt/trains/logs
|
||||
sudo mkdir -p /opt/trains/config
|
||||
sudo mkdir -p /opt/trains/data/fileserver
|
||||
sudo chown -R 1000:1000 /opt/trains
|
||||
|
||||
1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.
|
||||
|
||||
$ git clone https://github.com/allegroai/trains-server.git
|
||||
$ cd trains-server
|
||||
git clone https://github.com/allegroai/trains-server.git
|
||||
cd trains-server
|
||||
|
||||
1. Run `docker-compose`
|
||||
|
||||
$ /usr/local/bin/docker-compose -f docker-compose.yml up
|
||||
/usr/local/bin/docker-compose -f docker-compose.yml up
|
||||
|
||||
Your server is now running on [http://localhost:8080](http://localhost:8080)
|
||||
|
||||
### How do I launch trains-server on macOS? <a name="mac-osx"></a>
|
||||
|
||||
To launch **trains-server** on macOS:
|
||||
|
||||
1. Install [docker for macOS](https://docs.docker.com/docker-for-mac/install/).
|
||||
|
||||
1. Configure [Docker](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode).
|
||||
|
||||
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
|
||||
sysctl -w vm.max_map_count=262144
|
||||
|
||||
1. Create local directories for the databases and storage.
|
||||
|
||||
sudo mkdir -p /opt/trains/data/elastic
|
||||
sudo mkdir -p /opt/trains/data/mongo/db
|
||||
sudo mkdir -p /opt/trains/data/mongo/configdb
|
||||
sudo mkdir -p /opt/trains/data/redis
|
||||
sudo mkdir -p /opt/trains/logs
|
||||
sudo mkdir -p /opt/trains/config
|
||||
sudo mkdir -p /opt/trains/data/fileserver
|
||||
sudo chown -R $(whoami):staff /opt/trains
|
||||
|
||||
1. Open the Docker app, select **Preferences**, and then on the **File Sharing** tab, add `/opt/trains`.
|
||||
|
||||
1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.
|
||||
|
||||
git clone https://github.com/allegroai/trains-server.git
|
||||
cd trains-server
|
||||
|
||||
1. Run `docker-compose` with the docker compose file.
|
||||
|
||||
docker-compose -f docker-compose.yml up
|
||||
|
||||
Your server is now running on [http://localhost:8080](http://localhost:8080)
|
||||
|
||||
### Resolving port conflicts preventing fixed users mode authentication and login <a name="port-conflict"></a>
|
||||
### How do I launch trains-server on Windows 10? <a name="docker_compose_win10"></a>
|
||||
|
||||
A port conflict may occur between the **trains-server** MongoDB and Elastic instances and other
|
||||
instances running on your system. **trains-server** uses the following default ports which may be in conflict with other instances:
|
||||
You can run **trains-server** on Windows 10 using Docker Desktop for Windows (see the Docker [System Requirements](https://docs.docker.com/docker-for-windows/install/#system-requirements)).
|
||||
|
||||
* MongoDB port `27017`
|
||||
* Elastic port `9200`
|
||||
To launch **trains-server** on Windows 10:
|
||||
|
||||
You can check for port conflicts in the logs in `/opt/trains/log`.
|
||||
1. Install the Docker Desktop for Windows application by either:
|
||||
|
||||
If a port conflict occurs, first change the port in your **trains-server** `/opt/trains/server/config/default/hosts.conf` file to the new port and then
|
||||
run the `docker run` command with the `port` option specifying the new port to restart the **trains-server** instance.
|
||||
* following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
|
||||
* running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).
|
||||
|
||||
For example, to resolve a MongoDB port conflict change port `27017` to `27018`:
|
||||
1. Increase the memory allocation in Docker Desktop to `4GB`.
|
||||
|
||||
1. Modify `/opt/trains/server/config/default/hosts.conf` changing the ports in the `mongo` section:
|
||||
1. In your Windows notification area (system tray), right click the Docker icon.
|
||||
|
||||
1. Click *Settings*, *Advanced*, and then set the memory to at least `4096`.
|
||||
|
||||
1. Click *Apply*.
|
||||
|
||||
elastic {
|
||||
events {
|
||||
hosts: [{host: "127.0.0.1", port: 9200}]
|
||||
args {
|
||||
timeout: 60
|
||||
dead_timeout: 10
|
||||
max_retries: 5
|
||||
retry_on_timeout: true
|
||||
}
|
||||
index_version: "1"
|
||||
}
|
||||
}
|
||||
1. Create local directories for data and logs. Open PowerShell and execute the following commands:
|
||||
|
||||
mongo {
|
||||
backend {
|
||||
host: "mongodb://127.0.0.1:27018/backend"
|
||||
}
|
||||
auth {
|
||||
host: "mongodb://127.0.0.1:27018/auth"
|
||||
}
|
||||
}
|
||||
cd c:
|
||||
mkdir c:\opt\trains\data
|
||||
mkdir c:\opt\trains\logs
|
||||
|
||||
2. Start the **trains-server** MongoDB container using `--port 27018`.
|
||||
1. Download the **trains-server** docker-compose YAML file [docker-compose-win10.yml](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml) as `c:\opt\trains\docker-compose.yml`.
|
||||
|
||||
sudo docker run -d --restart="always" --name="trains-mongo" -v /opt/trains/data/mongo/db:/data/db -v /opt/trains/data/mongo/configdb:/data/configdb --network="host" mongo:3.6.5 mongod --port 27018
|
||||
1. Run `docker-compose`. In PowerShell, execute the following commands:
|
||||
|
||||
In a future version of **trains-server**, to start the API server, environment variables will be available to use instead of modifying the configuration file (instead of Step 1 above).
|
||||
The environment variables will be available to set different ports for both MongoDB and Elastic instances:
|
||||
docker-compose -f up docker-compose-win10.yml
|
||||
|
||||
* `MONGODB_SERVICE_PORT` (e.g., `MONGODB_SERVICE_PORT=27018`)
|
||||
* `ELASTIC_SERVICE_POST` (e.g., `ELASTIC_SERVICE_POST=9201`)
|
||||
Your server is now running on [http://localhost:8080](http://localhost:8080)
|
||||
|
||||
### Configuring trains-server for sub-domains and load balancers <a name="sub-domains"></a>
|
||||
### How do I restart trains-server? <a name="restart"></a>
|
||||
|
||||
Restart *trains-server* by first stopping the Docker containers and then restarting them.
|
||||
|
||||
```bash
|
||||
docker-compose down
|
||||
docker-compose up -f docker-compose.yml
|
||||
```
|
||||
|
||||
**Note**: If you are using a different docker-compose YAML file, specify that file.
|
||||
|
||||
## Kubernetes
|
||||
|
||||
### Can I deploy trains-server on Kubernetes clusters? <a name="kubernetes"></a>
|
||||
|
||||
**trains-server** supports Kubernetes. See [trains-server-k8s](https://github.com/allegroai/trains-server-k8s)
|
||||
which contains the YAML files describing the required services and detailed instructions for deploying
|
||||
**trains-server** to a Kubernetes clusters.
|
||||
|
||||
### Can I create a Helm Chart for trains-server Kubernetes deployment? <a name="helm"></a>
|
||||
|
||||
**trains-server** supports creating a Helm chart for Kubernetes deployment. See [trains-server-helm](https://github.com/allegroai/trains-server-helm)
|
||||
which you can use to create a Helm chart for **trains-server** and contains detailed instructions for deploying
|
||||
**trains-server** to a Kubernetes clusters using Helm.
|
||||
|
||||
## Configuration
|
||||
|
||||
### How do I configure trains-server for sub-domains and load balancers? <a name="sub-domains"></a>
|
||||
|
||||
You can configure **trains-server** for sub-domains and a load balancer.
|
||||
|
||||
@@ -181,3 +203,126 @@ For example, if your domain is `trains.mydomain.com` and your sub-domains are `a
|
||||
|
||||
1. Run the Docker containers with our updated `docker run` commands (see [Launching Docker Containers](#https://github.com/allegroai/trains-server#launching-docker-containers)).
|
||||
|
||||
### Can I add web login authentication to trains-server? <a name="web-auth"></a>
|
||||
|
||||
By default, anyone can login to the **trains-server** Web-App.
|
||||
You can configure the **trains-server** to allow only a specific set of users to access the system.
|
||||
|
||||
To add web login authentication to **trains-server**:
|
||||
|
||||
1. If you are not using the current **trains-server** version, then [upgrade](https://github.com/allegroai/trains-server#upgrade).
|
||||
|
||||
1. In `/opt/trains/config/apiserver.conf`, add the `auth` section and in it specify the users, for example:
|
||||
|
||||
**Note**: A sample `apiserver.conf` configuration file is also available [here](https://github.com/allegroai/trains-server/blob/master/docs/apiserver.conf).
|
||||
|
||||
auth {
|
||||
# Fixed users login credentials
|
||||
# No other user will be able to login
|
||||
fixed_users {
|
||||
enabled: true
|
||||
users: [
|
||||
{
|
||||
username: "jane"
|
||||
password: "12345678"
|
||||
name: "Jane Doe"
|
||||
},
|
||||
{
|
||||
username: "john"
|
||||
password: "12345678"
|
||||
name: "John Doe"
|
||||
},
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).
|
||||
|
||||
### Can I modify the experiment watchdog settings? <a name="watchdog"></a>
|
||||
|
||||
The non-responsive experiment watchdog monitors experiments that were not updated for a specified period of time
|
||||
and marks them as `aborted`. The watchdog is always active.
|
||||
|
||||
You can modify the following settings for the watchdog:
|
||||
|
||||
* the time threshold (in seconds) of experiment inactivity (default value is 7200 seconds (2 hours))
|
||||
* the time interval (in seconds) between watchdog cycles
|
||||
|
||||
To change the watchdog's settings:
|
||||
|
||||
1. In `/opt/trains/config`, add the `services.conf` file and in it specify the watchdog settings, for example:
|
||||
|
||||
**Note**: A sample watchdog `services.conf` configuration file is also available [here](https://github.com/allegroai/trains-server/blob/master/docs/services.conf).
|
||||
|
||||
tasks {
|
||||
non_responsive_tasks_watchdog {
|
||||
# In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
|
||||
threshold_sec: 7200
|
||||
|
||||
# Watchdog will sleep for this number of seconds after each cycle
|
||||
watch_interval_sec: 900
|
||||
}
|
||||
}
|
||||
|
||||
1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### How do I fix Docker upgrade errors? <a name="common-docker-upgrade-errors"></a>
|
||||
|
||||
To resolve the Docker error "... The container name "/trains-???" is already in use by ...", try removing deprecated images:
|
||||
|
||||
docker rm -f $(docker ps -a -q)
|
||||
|
||||
### Why is web login authentication not working?
|
||||
|
||||
A port conflict between the **trains-server** MongoDB and / or Elastic instances, and other
|
||||
instances running on your system may prevent web login authentication
|
||||
from working correctly.
|
||||
|
||||
**trains-server** uses the following default ports which may be in conflict with other instances:
|
||||
|
||||
* MongoDB port `27017`
|
||||
* Elastic port `9200`
|
||||
|
||||
You can check for port conflicts in the logs in `/opt/trains/log`.
|
||||
|
||||
If a port conflict occurs, change the MongoDB and / or Elastic ports in the `docker-compose.yml`,
|
||||
and then run the Docker compose commands to restart the **trains-server** instance.
|
||||
|
||||
To change the MongoDB and / or Elastic ports for **trains-server**:
|
||||
|
||||
1. Edit the `docker-compose.yml` file.
|
||||
|
||||
1. In the `services/trainsserver/environment` section, add the following environment variable(s):
|
||||
|
||||
* For MongoDB:
|
||||
|
||||
MONGODB_SERVICE_PORT: <new-mongodb-port>
|
||||
|
||||
* For Elastic:
|
||||
|
||||
ELASTIC_SERVICE_PORT: <new-elasticsearch-port>
|
||||
|
||||
For example:
|
||||
|
||||
MONGODB_SERVICE_PORT: 27018
|
||||
ELASTIC_SERVICE_PORT: 9201
|
||||
|
||||
1. For MongoDB, in the `services/mongo/ports` section, expose the new MongoDB port:
|
||||
|
||||
<new-mongodb-port>:27017
|
||||
|
||||
For example:
|
||||
|
||||
20718:27017
|
||||
|
||||
1. For Elastic, in the `services/elasticsearch/ports` section, expose the new Elastic port:
|
||||
|
||||
<new-elsticsearch-port>:9200
|
||||
|
||||
For example:
|
||||
|
||||
9201:9200
|
||||
|
||||
2. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).
|
||||
@@ -1,32 +1,36 @@
|
||||
# **TRAINS-server**: AWS pre-installed images
|
||||
# Deploying **trains-server** on AWS
|
||||
|
||||
In order to easily deploy **trains-server** on AWS, we created the following Amazon Machine Images (AMIs).
|
||||
To easily deploy **trains-server** on AWS, use one of our pre-built Amazon Machine Images (AMIs).
|
||||
We provide AMIs per region for each released version of **trains-server**, see [Released versions](#released-versions) below.
|
||||
|
||||
Service port numbers on these AMIs are:
|
||||
- Web: 8080
|
||||
- API: 8008
|
||||
- File Server: 8081
|
||||
Once the AMI is up and running, [configure the Trains client](https://github.com/allegroai/trains/blob/master/README.md#configuration) to use your **trains-server**.
|
||||
The service port numbers on our **trains-server** AMIs:
|
||||
|
||||
Persistent storage configuration:
|
||||
- MongoDB: /opt/trains/data/mongo/
|
||||
- ElasticSearch: /opt/trains/data/elastic/
|
||||
- File Server: /mnt/fileserver/
|
||||
- Web application: `8080`
|
||||
- API Server: `8008`
|
||||
- File Server: `8081`
|
||||
|
||||
Instructions on launching a custom AMI from the EC2 console can be found [here](https://aws.amazon.com/premiumsupport/knowledge-center/launch-instance-custom-ami/)
|
||||
and a detailed version [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launching-instance.html).
|
||||
The persistent storage configuration:
|
||||
|
||||
The minimum recommended instance type is **t3a.large**
|
||||
- MongoDB: `/opt/trains/data/mongo/`
|
||||
- ElasticSearch: `/opt/trains/data/elastic/`
|
||||
- File Server: `/mnt/fileserver/`
|
||||
|
||||
For examples and use cases, check the [Trains usage examples](https://github.com/allegroai/trains/blob/master/docs/trains_examples.md).
|
||||
|
||||
For instructions on launching a custom AMI from the EC2 console, see the [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/launch-instance-custom-ami/) or detailed instructions in the [AWS Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launching-instance.html).
|
||||
|
||||
The minimum recommended amount of RAM is 8GB. For example, **t3.large** or **t3a.large** would have the minimum recommended amount of resources.
|
||||
|
||||
## Upgrading
|
||||
|
||||
In order to upgrade **trains-server** on an existing EC2 instance based on one of these AMIs, SSH into the instance and follow the [upgrade instructions](../README.md#upgrade) for **trains-server**.
|
||||
To upgrade **trains-server** on an existing EC2 instance based on one of these AMIs, SSH into the instance and follow the [upgrade instructions](../README.md#upgrade) for **trains-server**.
|
||||
|
||||
### Upgrading AMI's to v0.12
|
||||
**Including the automatically updated AMI**
|
||||
### Upgrading AMIs to v0.12
|
||||
|
||||
Version 0.12 introduced an additional REDIS docker to the trains-server setup.
|
||||
This upgrade includes the automatically updated AMI in Version 0.12. It also includes an additional REDIS docker to the **trains-server** setup.
|
||||
|
||||
AMI upgrading instructions:
|
||||
To upgrade the AMI:
|
||||
|
||||
1. SSH to the EC2 machine running one of the `Latest Version AMI's`
|
||||
2. Execute the following bash commands
|
||||
@@ -44,47 +48,86 @@ AMI upgrading instructions:
|
||||
|
||||
## Released versions
|
||||
|
||||
The following sections provide a list containing AMI Image ID per region for each released **trains-server** version.
|
||||
The following sections contain lists of AMI Image IDs, per region, for each released **trains-server** version.
|
||||
|
||||
### Latest Version AMI <a name="autoupdate"></a>
|
||||
**For easier upgrades: The following AMI automatically update to the latest release every reboot**
|
||||
### Latest version AMI - v0.13.0 (auto update)<a name="autoupdate"></a>
|
||||
|
||||
* **eu-north-1** : ami-072aef14041e70651
|
||||
* **ap-south-1** : ami-08032d881daca4de1
|
||||
* **eu-west-3** : ami-0b39c123d4343d408
|
||||
* **eu-west-2** : ami-0e0fe6fd14b2e9029
|
||||
* **eu-west-1** : ami-087c81e06d722e938
|
||||
* **ap-northeast-2** : ami-0caf74f03322b994c
|
||||
* **ap-northeast-1** : ami-0f723b3d49c0f2749
|
||||
* **sa-east-1** : ami-0ac5595ad0e106502
|
||||
* **ca-central-1** : ami-053049b463869469a
|
||||
* **ap-southeast-1** : ami-0b440ec389d6ff541
|
||||
* **ap-southeast-2** : ami-02af978ddc2c15b71
|
||||
* **eu-central-1** : ami-09ef364aa8df29760
|
||||
* **us-east-2** : ami-02e33f8ab77071509
|
||||
* **us-west-1** : ami-0ff33f256907fd460
|
||||
* **us-west-2** : ami-0387728fb09c8cda7
|
||||
* **us-east-1** : ami-02c47c5233eed7f88
|
||||
For easier upgrades, the following AMIs automatically update to the latest release every reboot:
|
||||
|
||||
### v0.12.0
|
||||
* **eu-north-1** : ami-0ebb4bb8637d0da65
|
||||
* **ap-south-1** : ami-0fb3c89eb8a8fc294
|
||||
* **eu-west-3** : ami-0b55ea4a6698d5875
|
||||
* **eu-west-2** : ami-02979b6d77856b842
|
||||
* **eu-west-1** : ami-07f4c17a636489574
|
||||
* **ap-northeast-2** : ami-06071092427dd5ab4
|
||||
* **ap-northeast-1** : ami-0fbacddfc0e8d2651
|
||||
* **sa-east-1** : ami-073590d3b3e6f4cfd
|
||||
* **ca-central-1** : ami-0839610fc0101e41c
|
||||
* **ap-southeast-1** : ami-0ff0adeef7f9fa879
|
||||
* **ap-southeast-2** : ami-03ed15d31bfc2844c
|
||||
* **eu-central-1** : ami-0813c06d8b2462c39
|
||||
* **us-east-2** : ami-07c593425f988b054
|
||||
* **us-west-1** : ami-0eb0e13b1f06c03c0
|
||||
* **us-west-2** : ami-000568ca142798412
|
||||
* **us-east-1** : ami-062d9da44f96c8a87
|
||||
* **eu-north-1** : ami-003024b7b575d3f2a
|
||||
* **ap-south-1** : ami-0d784c7ac2ab4cc72
|
||||
* **eu-west-3** : ami-091d745be445b69db
|
||||
* **eu-west-2** : ami-0a4ebf5d45c672411
|
||||
* **eu-west-1** : ami-021e3421c50d1482c
|
||||
* **ap-northeast-2** : ami-0d0a25ec610d6d122
|
||||
* **ap-northeast-1** : ami-01d896f9ae5d87890
|
||||
* **sa-east-1** : ami-09bcb93835428a412
|
||||
* **ca-central-1** : ami-077fa58c9f73690c7
|
||||
* **ap-southeast-1** : ami-046fe4832b077b517
|
||||
* **ap-southeast-2** : ami-0ab9acb41f8abbba7
|
||||
* **eu-central-1** : ami-079be664aae12db00
|
||||
* **us-east-2** : ami-0d48555f80cb7993a
|
||||
* **us-west-1** : ami-0ed85ab91a7bb5a8a
|
||||
* **us-west-2** : ami-0b4fe4ca18e9b1227
|
||||
* **us-east-1** : ami-043b95dd034e581e6
|
||||
|
||||
### v0.13.0 (static update)
|
||||
* **eu-north-1** : ami-0e26c3af1663428dc
|
||||
* **ap-south-1** : ami-07451eb44f51380a8
|
||||
* **eu-west-3** : ami-0108e506c6e0be8d8
|
||||
* **eu-west-2** : ami-0fc1fdbc7699f0dde
|
||||
* **eu-west-1** : ami-0efbf8d2f580a9cee
|
||||
* **ap-northeast-2** : ami-08f0bbd7e08d0603e
|
||||
* **ap-northeast-1** : ami-024522bea34dbe3ce
|
||||
* **sa-east-1** : ami-0fe5b6e0ddc1553d9
|
||||
* **ca-central-1** : ami-0037c26178a584ade
|
||||
* **ap-southeast-1** : ami-049dbcc0f0a6dba20
|
||||
* **ap-southeast-2** : ami-02d1ce8d31c27f187
|
||||
* **eu-central-1** : ami-0550b14b40371182a
|
||||
* **us-east-2** : ami-040a1f16ceda8f255
|
||||
* **us-west-1** : ami-003b5673c08d68cdb
|
||||
* **us-west-2** : ami-0fec951d8043da62d
|
||||
* **us-east-1** : ami-049694de0137fdea4
|
||||
|
||||
### v0.12.1 (static update)
|
||||
* **eu-north-1** : ami-003118a8103286d84
|
||||
* **ap-south-1** : ami-02dfe86baa48e096f
|
||||
* **eu-west-3** : ami-0cc1f01267d2a780d
|
||||
* **eu-west-2** : ami-0e4c8332e5ce09585
|
||||
* **eu-west-1** : ami-03459a2f0b0a3b1ab
|
||||
* **ap-northeast-2** : ami-08f6c2aed3a53f24c
|
||||
* **ap-northeast-1** : ami-0b798eab95a7c5435
|
||||
* **sa-east-1** : ami-0d3ee166c09f0d1b2
|
||||
* **ca-central-1** : ami-00a758c56bd63acd5
|
||||
* **ap-southeast-1** : ami-0be64d4988cd03fbb
|
||||
* **ap-southeast-2** : ami-02087310d43a63f31
|
||||
* **eu-central-1** : ami-097bbefeac0c74225
|
||||
* **us-east-2** : ami-07eda256712b90f4d
|
||||
* **us-west-1** : ami-02ef2b55cbd01c7df
|
||||
* **us-west-2** : ami-037c6176ef4735360
|
||||
* **us-east-1** : ami-08715c20c0e3f1c15
|
||||
|
||||
### v0.12.0 (static update)
|
||||
|
||||
* **eu-north-1** : ami-03ff8ab48cd43e77e
|
||||
* **ap-south-1** : ami-079c1a41ff836487c
|
||||
* **eu-west-3** : ami-0121ef0398ae87ab0
|
||||
* **eu-west-2** : ami-09f0f97654d8c79de
|
||||
* **eu-west-1** : ami-0b7ba303f757bfcd9
|
||||
* **ap-northeast-2** : ami-053f416517b5f40a6
|
||||
* **ap-northeast-1** : ami-056dff06c698c2d9d
|
||||
* **sa-east-1** : ami-017ab655119258639
|
||||
* **ca-central-1** : ami-03bf5fa1d86ac97f6
|
||||
* **ap-southeast-1** : ami-0e667958002b0360c
|
||||
* **ap-southeast-2** : ami-091f1b69cb43b1933
|
||||
* **eu-central-1** : ami-068ec2f0e98c26541
|
||||
* **us-east-2** : ami-0524bbdc1b64ff83f
|
||||
* **us-west-1** : ami-0b4facd7534e393c9
|
||||
* **us-west-2** : ami-0018d5a7e58966848
|
||||
* **us-east-1** : ami-08f24178fc14a84d2
|
||||
|
||||
### v0.11.0 (static update)
|
||||
|
||||
### v0.11.0
|
||||
* **eu-north-1** : ami-0cbe338f058018c97
|
||||
* **ap-south-1** : ami-06d72ff894f7a5e5d
|
||||
* **eu-west-3** : ami-00f2a45d67df2d2f3
|
||||
@@ -102,7 +145,8 @@ The following sections provide a list containing AMI Image ID per region for eac
|
||||
* **us-west-2** : ami-0e384b6f78bf96ebe
|
||||
* **us-east-1** : ami-0a7b46f907d5d9c4a
|
||||
|
||||
### v0.10.1
|
||||
### v0.10.1 (static update)
|
||||
|
||||
* **eu-north-1** : ami-09937ec4d18350c32
|
||||
* **ap-south-1** : ami-089d6ba7541ec4c7f
|
||||
* **eu-west-3** : ami-0accb1a94bdd5c5c1
|
||||
@@ -120,7 +164,8 @@ The following sections provide a list containing AMI Image ID per region for eac
|
||||
* **us-west-2** : ami-0d1cb8ba7de246ff0
|
||||
* **us-east-1** : ami-049ccba6abdb40cba
|
||||
|
||||
### v0.10.0
|
||||
### v0.10.0 (static update)
|
||||
|
||||
* **eu-north-1** : ami-05ba33c763877e54e
|
||||
* **ap-south-1** : ami-0529eec569161cae5
|
||||
* **eu-west-3** : ami-03cb9396f63e26ff6
|
||||
@@ -139,7 +184,7 @@ The following sections provide a list containing AMI Image ID per region for eac
|
||||
* **us-west-2** : ami-04a522ecb2250fb44
|
||||
* **us-east-1** : ami-0a66ddbd50959f91e
|
||||
|
||||
### v0.9.0
|
||||
### v0.9.0 (static update)
|
||||
|
||||
* **us-east-1** : ami-0991ad536ecbacdac
|
||||
* **eu-north-1** : ami-07cbcdff501b14afe
|
||||
@@ -157,3 +202,4 @@ The following sections provide a list containing AMI Image ID per region for eac
|
||||
* **us-east-2** : ami-03b01914b07428488
|
||||
* **us-west-1** : ami-0cf4768e9d47ed076
|
||||
* **us-west-2** : ami-0b145f37da31eb9fb
|
||||
|
||||
|
||||
97
docs/install_linux_mac.md
Normal file
97
docs/install_linux_mac.md
Normal file
@@ -0,0 +1,97 @@
|
||||
# Launching the **trains-server** Docker in Linux or macOS
|
||||
|
||||
For Linux or macOS, use our pre-built Docker image for easy deployment. The latest Docker images can be found [here](https://hub.docker.com/r/allegroai/trains).
|
||||
|
||||
For Linux users:
|
||||
|
||||
* You must be logged in as a user with sudo privileges.
|
||||
* Use `bash` for all command-line instructions in this installation.
|
||||
|
||||
To launch **trains-server** on Linux or macOS:
|
||||
|
||||
1. Install Docker.
|
||||
|
||||
* Linux - see [Docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
|
||||
* macOS - see [Docker for macOS](https://docs.docker.com/docker-for-mac/install/).
|
||||
|
||||
1. Verify the Docker CE installation. Execute the command:
|
||||
|
||||
sudo docker run hello-world
|
||||
|
||||
The expected is output is:
|
||||
|
||||
Hello from Docker!
|
||||
This message shows that your installation appears to be working correctly.
|
||||
To generate this message, Docker took the following steps:
|
||||
|
||||
1. The Docker client contacted the Docker daemon.
|
||||
2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
|
||||
3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
|
||||
4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.
|
||||
|
||||
1. For Linux only, install `docker-compose`. Execute the following commands (for more information, see [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation):
|
||||
|
||||
sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
|
||||
sudo chmod +x /usr/local/bin/docker-compose
|
||||
|
||||
1. Increase `vm.max_map_count` for ElasticSearch docker.
|
||||
|
||||
Linux:
|
||||
|
||||
echo "vm.max_map_count=262144" > /tmp/99-trains.conf
|
||||
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
|
||||
sudo sysctl -w vm.max_map_count=262144
|
||||
sudo service docker restart
|
||||
|
||||
macOS:
|
||||
|
||||
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
|
||||
sysctl -w vm.max_map_count=262144
|
||||
|
||||
|
||||
1. Remove any previous installation of **trains-server**.
|
||||
|
||||
**WARNING**: This clears all existing **Trains** databases.
|
||||
|
||||
sudo rm -R /opt/trains/
|
||||
|
||||
1. Create local directories for the databases and storage.
|
||||
|
||||
sudo mkdir -p /opt/trains/data/elastic
|
||||
sudo mkdir -p /opt/trains/data/mongo/db
|
||||
sudo mkdir -p /opt/trains/data/mongo/configdb
|
||||
sudo mkdir -p /opt/trains/data/redis
|
||||
sudo mkdir -p /opt/trains/logs
|
||||
sudo mkdir -p /opt/trains/config
|
||||
sudo mkdir -p /opt/trains/data/fileserver
|
||||
|
||||
1. For macOS only, open the Docker app, select **Preferences**, and then on the **File Sharing** tab, add `/opt/trains`.
|
||||
|
||||
1. Grant access to the Dockers.
|
||||
|
||||
Linux:
|
||||
|
||||
sudo chown -R 1000:1000 /opt/trains
|
||||
|
||||
macOS:
|
||||
|
||||
sudo chown -R $(whoami):staff /opt/trains
|
||||
|
||||
1. Download the **trains-server** docker-compose YAML file.
|
||||
|
||||
cd /opt/trains
|
||||
curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
|
||||
|
||||
1. Run `docker-compose` with the downloaded configuration file.
|
||||
|
||||
sudo docker-compose -f docker-compose.yml up
|
||||
|
||||
Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:
|
||||
|
||||
* Web server on port `8080`
|
||||
* API server on port `8008`
|
||||
* File server on port `8081`
|
||||
|
||||
## Next Step
|
||||
|
||||
Configure the [Trains client for trains-server](https://github.com/allegroai/trains/blob/master/README.md#configuration).
|
||||
50
docs/install_win.md
Normal file
50
docs/install_win.md
Normal file
@@ -0,0 +1,50 @@
|
||||
# Launching the **trains-server** Docker in Windows 10
|
||||
|
||||
For Windows, we recommend launching our pre-built Docker image on a Linux virtual machine.
|
||||
However, you can launch **trains-server** on Windows 10 using Docker Desktop for Windows (see the Docker [System Requirements](https://docs.docker.com/docker-for-windows/install/#system-requirements)).
|
||||
|
||||
To launch **trains-server** on Windows 10:
|
||||
|
||||
1. Install the Docker Desktop for Windows application by either:
|
||||
|
||||
* Following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
|
||||
* Running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).
|
||||
|
||||
1. Increase the memory allocation in Docker Desktop to `4GB`.
|
||||
|
||||
1. In your Windows notification area (system tray), right click the Docker icon.
|
||||
|
||||
1. Click *Settings*, *Advanced*, and then set the memory to at least `4096`.
|
||||
|
||||
1. Click *Apply*.
|
||||
|
||||
1. Remove any previous installation of **trains-server**.
|
||||
|
||||
**WARNING**: This clears all existing **Trains** databases.
|
||||
|
||||
rmdir c:\opt\trains /s
|
||||
|
||||
1. Create local directories for data and logs. Open PowerShell and execute the following commands:
|
||||
|
||||
cd c:
|
||||
mkdir c:\opt\trains\data
|
||||
mkdir c:\opt\trains\logs
|
||||
|
||||
1. Save the **trains-server** docker-compose YAML file.
|
||||
|
||||
cd c:\opt\trains
|
||||
curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml -o docker-compose-win10.yml
|
||||
|
||||
1. Run `docker-compose`. In PowerShell, execute the following commands:
|
||||
|
||||
docker-compose -f docker-compose-win10.yml up
|
||||
|
||||
Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:
|
||||
|
||||
* Web server on port `8080`
|
||||
* API server on port `8008`
|
||||
* File server on port `8081`
|
||||
|
||||
## Next Step
|
||||
|
||||
Configure the [Trains client for trains-server](https://github.com/allegroai/trains/blob/master/README.md#configuration).
|
||||
9
docs/services.conf
Normal file
9
docs/services.conf
Normal file
@@ -0,0 +1,9 @@
|
||||
tasks {
|
||||
non_responsive_tasks_watchdog {
|
||||
# In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
|
||||
threshold_sec: 7200
|
||||
|
||||
# Watchdog will sleep for this number of seconds after each cycle
|
||||
watch_interval_sec: 900
|
||||
}
|
||||
}
|
||||
@@ -105,7 +105,6 @@ _error_codes = {
|
||||
|
||||
(403, 'forbidden'): {
|
||||
10: ('routing_error', 'forbidden (routing error)'),
|
||||
11: ('missing_routing_header', 'forbidden (missing routing header)'),
|
||||
12: ('blocked_internal_endpoint', 'forbidden (blocked internal endpoint)'),
|
||||
20: ('role_not_allowed', 'forbidden (not allowed for role)'),
|
||||
21: ('no_write_permission', 'forbidden (modification not allowed)'),
|
||||
@@ -121,6 +120,7 @@ _error_codes = {
|
||||
100: ('data_error', 'general data error'),
|
||||
101: ('inconsistent_data', 'inconsistent data encountered in document'),
|
||||
102: ('database_unavailable', 'database is temporarily unavailable'),
|
||||
110: ('update_failed', 'update failed'),
|
||||
|
||||
# Index-related issues
|
||||
201: ('missing_index', 'missing internal index'),
|
||||
|
||||
@@ -5,12 +5,12 @@ from typing import Union, Type, Iterable
|
||||
|
||||
import jsonmodels.errors
|
||||
import six
|
||||
import validators
|
||||
from jsonmodels import fields
|
||||
from jsonmodels.fields import _LazyType, NotSet
|
||||
from jsonmodels.models import Base as ModelBase
|
||||
from jsonmodels.validators import Enum as EnumValidator
|
||||
from luqum.parser import parser, ParseError
|
||||
from validators import email as email_validator, domain as domain_validator
|
||||
|
||||
from apierrors import errors
|
||||
|
||||
@@ -66,9 +66,7 @@ class DictField(fields.BaseField):
|
||||
value_types = tuple()
|
||||
|
||||
return tuple(
|
||||
_LazyType(type_)
|
||||
if isinstance(type_, six.string_types)
|
||||
else type_
|
||||
_LazyType(type_) if isinstance(type_, six.string_types) else type_
|
||||
for type_ in value_types
|
||||
)
|
||||
|
||||
@@ -78,6 +76,9 @@ class DictField(fields.BaseField):
|
||||
if not self.value_types:
|
||||
return
|
||||
|
||||
if not value:
|
||||
return
|
||||
|
||||
for item in value.values():
|
||||
self.validate_single_value(item)
|
||||
|
||||
@@ -104,7 +105,7 @@ class IntField(fields.IntField):
|
||||
|
||||
|
||||
def validate_lucene_query(value):
|
||||
if value == '':
|
||||
if value == "":
|
||||
return
|
||||
try:
|
||||
parser.parse(value)
|
||||
@@ -122,6 +123,7 @@ class LuceneQueryField(fields.StringField):
|
||||
|
||||
class NullableEnumValidator(EnumValidator):
|
||||
"""Validator for enums that allows a None value."""
|
||||
|
||||
def validate(self, value):
|
||||
if value is not None:
|
||||
super(NullableEnumValidator, self).validate(value)
|
||||
@@ -150,10 +152,6 @@ class EnumField(fields.StringField):
|
||||
|
||||
|
||||
class ActualEnumField(fields.StringField):
|
||||
@property
|
||||
def types(self):
|
||||
return (self.__enum,)
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
enum_class: Type[Enum],
|
||||
@@ -164,6 +162,7 @@ class ActualEnumField(fields.StringField):
|
||||
**kwargs
|
||||
):
|
||||
self.__enum = enum_class
|
||||
self.types = (enum_class,)
|
||||
# noinspection PyTypeChecker
|
||||
choices = list(enum_class)
|
||||
validator_cls = EnumValidator if required else NullableEnumValidator
|
||||
@@ -194,7 +193,7 @@ class EmailField(fields.StringField):
|
||||
super().validate(value)
|
||||
if value is None:
|
||||
return
|
||||
if validators.email(value) is not True:
|
||||
if email_validator(value) is not True:
|
||||
raise errors.bad_request.InvalidEmailAddress()
|
||||
|
||||
|
||||
@@ -203,7 +202,7 @@ class DomainField(fields.StringField):
|
||||
super().validate(value)
|
||||
if value is None:
|
||||
return
|
||||
if validators.domain(value) is not True:
|
||||
if domain_validator(value) is not True:
|
||||
raise errors.bad_request.InvalidDomainName()
|
||||
|
||||
|
||||
|
||||
@@ -58,3 +58,7 @@ class UpdateResponse(models.Base):
|
||||
class PagedRequest(models.Base):
|
||||
page = fields.IntField()
|
||||
page_size = fields.IntField()
|
||||
|
||||
|
||||
class IdResponse(models.Base):
|
||||
id = fields.StringField(required=True)
|
||||
|
||||
@@ -9,7 +9,7 @@ from apimodels.tasks import PublishResponse as TaskPublishResponse
|
||||
class CreateModelRequest(models.Base):
|
||||
name = fields.StringField(required=True)
|
||||
uri = fields.StringField(required=True)
|
||||
labels = DictField(value_types=string_types+(int,), required=True)
|
||||
labels = DictField(value_types=string_types+(int,))
|
||||
tags = ListField(items_types=string_types)
|
||||
system_tags = ListField(items_types=string_types)
|
||||
comment = fields.StringField()
|
||||
|
||||
15
server/apimodels/server.py
Normal file
15
server/apimodels/server.py
Normal file
@@ -0,0 +1,15 @@
|
||||
from jsonmodels.fields import BoolField, DateTimeField, StringField
|
||||
from jsonmodels.models import Base
|
||||
|
||||
|
||||
class ReportStatsOptionRequest(Base):
|
||||
enabled = BoolField(default=None, nullable=True)
|
||||
|
||||
|
||||
class ReportStatsOptionResponse(Base):
|
||||
supported = BoolField(default=True)
|
||||
enabled = BoolField()
|
||||
enabled_time = DateTimeField(nullable=True)
|
||||
enabled_version = StringField(nullable=True)
|
||||
enabled_user = StringField(nullable=True)
|
||||
current_version = StringField()
|
||||
@@ -1,6 +1,6 @@
|
||||
import six
|
||||
from jsonmodels import models
|
||||
from jsonmodels.fields import StringField, BoolField, IntField
|
||||
from jsonmodels.fields import StringField, BoolField, IntField, EmbeddedField
|
||||
from jsonmodels.validators import Enum
|
||||
|
||||
from apimodels import DictField, ListField
|
||||
@@ -9,6 +9,24 @@ from database.model.task.task import TaskType
|
||||
from database.utils import get_options
|
||||
|
||||
|
||||
class ArtifactTypeData(models.Base):
|
||||
preview = StringField()
|
||||
content_type = StringField()
|
||||
data_hash = StringField()
|
||||
|
||||
|
||||
class Artifact(models.Base):
|
||||
key = StringField(required=True)
|
||||
type = StringField(required=True)
|
||||
mode = StringField(validators=Enum("input", "output"), default="output")
|
||||
uri = StringField()
|
||||
hash = StringField()
|
||||
content_size = IntField()
|
||||
timestamp = IntField()
|
||||
type_data = EmbeddedField(ArtifactTypeData)
|
||||
display_data = ListField([list])
|
||||
|
||||
|
||||
class StartedResponse(UpdateResponse):
|
||||
started = IntField()
|
||||
|
||||
@@ -72,3 +90,22 @@ class CreateRequest(TaskData):
|
||||
|
||||
class PingRequest(TaskRequest):
|
||||
pass
|
||||
|
||||
|
||||
class CloneRequest(TaskRequest):
|
||||
new_task_name = StringField()
|
||||
new_task_comment = StringField()
|
||||
new_task_tags = ListField([str])
|
||||
new_task_system_tags = ListField([str])
|
||||
new_task_parent = StringField()
|
||||
new_task_project = StringField()
|
||||
execution_overrides = DictField()
|
||||
|
||||
|
||||
class AddOrUpdateArtifactsRequest(TaskRequest):
|
||||
artifacts = ListField([Artifact], required=True)
|
||||
|
||||
|
||||
class AddOrUpdateArtifactsResponse(models.Base):
|
||||
added = ListField([str])
|
||||
updated = ListField([str])
|
||||
|
||||
@@ -1,3 +1,4 @@
|
||||
import hashlib
|
||||
from collections import defaultdict
|
||||
from contextlib import closing
|
||||
from datetime import datetime
|
||||
@@ -16,6 +17,7 @@ import es_factory
|
||||
from apierrors import errors
|
||||
from bll.event.event_metrics import EventMetrics
|
||||
from bll.task import TaskBLL
|
||||
from config import config
|
||||
from database.errors import translate_errors_context
|
||||
from database.model.task.task import Task, TaskStatus
|
||||
from timing_context import TimingContext
|
||||
@@ -45,11 +47,12 @@ class TaskEventsResult(object):
|
||||
|
||||
|
||||
class EventBLL(object):
|
||||
id_fields = ["task", "iter", "metric", "variant", "key"]
|
||||
id_fields = ("task", "iter", "metric", "variant", "key")
|
||||
|
||||
def __init__(self, events_es=None):
|
||||
self.es = events_es or es_factory.connect("events")
|
||||
self._metrics = EventMetrics(self.es)
|
||||
self._skip_iteration_for_metric = set(config.get("services.events.ignore_iteration.metrics", []))
|
||||
|
||||
@property
|
||||
def metrics(self) -> EventMetrics:
|
||||
@@ -121,7 +124,7 @@ class EventBLL(object):
|
||||
if task_id is not None:
|
||||
es_action["_routing"] = task_id
|
||||
task_ids.add(task_id)
|
||||
if iter is not None:
|
||||
if iter is not None and event.get("metric") not in self._skip_iteration_for_metric:
|
||||
task_iteration[task_id] = max(iter, task_iteration[task_id])
|
||||
|
||||
if event_type == EventType.metrics_scalar.value:
|
||||
@@ -245,7 +248,7 @@ class EventBLL(object):
|
||||
|
||||
def _get_event_id(self, event):
|
||||
id_values = (str(event[field]) for field in self.id_fields if field in event)
|
||||
return "-".join(id_values)
|
||||
return hashlib.md5("-".join(id_values).encode()).hexdigest()
|
||||
|
||||
def scroll_task_events(
|
||||
self,
|
||||
|
||||
@@ -161,7 +161,7 @@ class QueueMetrics:
|
||||
In case no queue ids are specified the avg across all the
|
||||
company queues is calculated for each metric
|
||||
"""
|
||||
# self._log_current_metrics(company_id, queue_ids=queue_ids)
|
||||
# self._log_current_metrics(company, queue_ids=queue_ids)
|
||||
|
||||
if from_date >= to_date:
|
||||
raise bad_request.FieldsValueError("from_date must be less than to_date")
|
||||
|
||||
90
server/bll/statistics/resource_monitor.py
Normal file
90
server/bll/statistics/resource_monitor.py
Normal file
@@ -0,0 +1,90 @@
|
||||
from datetime import datetime
|
||||
import operator
|
||||
from threading import Thread, Lock
|
||||
from time import sleep
|
||||
|
||||
import attr
|
||||
import psutil
|
||||
|
||||
from utilities.threads_manager import ThreadsManager
|
||||
|
||||
|
||||
class ResourceMonitor(Thread):
|
||||
@attr.s(auto_attribs=True)
|
||||
class Sample:
|
||||
cpu_usage: float = 0.0
|
||||
mem_used_gb: float = 0
|
||||
mem_free_gb: float = 0
|
||||
|
||||
@classmethod
|
||||
def _apply(cls, op, *samples):
|
||||
return cls(
|
||||
**{
|
||||
field: op(*(getattr(sample, field) for sample in samples))
|
||||
for field in attr.fields_dict(cls)
|
||||
}
|
||||
)
|
||||
|
||||
def min(self, sample):
|
||||
return self._apply(min, self, sample)
|
||||
|
||||
def max(self, sample):
|
||||
return self._apply(max, self, sample)
|
||||
|
||||
def avg(self, sample, count):
|
||||
res = self._apply(lambda x: x * count, self)
|
||||
res = self._apply(operator.add, res, sample)
|
||||
res = self._apply(lambda x: x / (count + 1), res)
|
||||
return res
|
||||
|
||||
def __init__(self, sample_interval_sec=5):
|
||||
super(ResourceMonitor, self).__init__(daemon=True)
|
||||
self.sample_interval_sec = sample_interval_sec
|
||||
self._lock = Lock()
|
||||
self._clear()
|
||||
|
||||
def _clear(self):
|
||||
sample = self._get_sample()
|
||||
self._avg = sample
|
||||
self._min = sample
|
||||
self._max = sample
|
||||
self._clear_time = datetime.utcnow()
|
||||
self._count = 1
|
||||
|
||||
@classmethod
|
||||
def _get_sample(cls) -> Sample:
|
||||
return cls.Sample(
|
||||
cpu_usage=psutil.cpu_percent(),
|
||||
mem_used_gb=psutil.virtual_memory().used / (1024 ** 3),
|
||||
mem_free_gb=psutil.virtual_memory().free / (1024 ** 3),
|
||||
)
|
||||
|
||||
def run(self):
|
||||
while not ThreadsManager.terminating:
|
||||
sleep(self.sample_interval_sec)
|
||||
|
||||
sample = self._get_sample()
|
||||
|
||||
with self._lock:
|
||||
self._min = self._min.min(sample)
|
||||
self._max = self._max.max(sample)
|
||||
self._avg = self._avg.avg(sample, self._count)
|
||||
self._count += 1
|
||||
|
||||
def get_stats(self) -> dict:
|
||||
""" Returns current resource statistics and clears internal resource statistics """
|
||||
with self._lock:
|
||||
min_ = attr.asdict(self._min)
|
||||
max_ = attr.asdict(self._max)
|
||||
avg = attr.asdict(self._avg)
|
||||
interval = datetime.utcnow() - self._clear_time
|
||||
self._clear()
|
||||
|
||||
return {
|
||||
"interval_sec": interval.total_seconds(),
|
||||
"num_cores": psutil.cpu_count(),
|
||||
**{
|
||||
k: {"min": v, "max": max_[k], "avg": avg[k]}
|
||||
for k, v in min_.items()
|
||||
}
|
||||
}
|
||||
305
server/bll/statistics/stats_reporter.py
Normal file
305
server/bll/statistics/stats_reporter.py
Normal file
@@ -0,0 +1,305 @@
|
||||
import logging
|
||||
import queue
|
||||
import random
|
||||
import time
|
||||
from datetime import timedelta, datetime
|
||||
from time import sleep
|
||||
from typing import Sequence, Optional
|
||||
|
||||
import dpath
|
||||
import requests
|
||||
from requests.adapters import HTTPAdapter
|
||||
from requests.packages.urllib3.util.retry import Retry
|
||||
|
||||
from bll.query import Builder as QueryBuilder
|
||||
from bll.util import get_server_uuid
|
||||
from bll.workers import WorkerStats, WorkerBLL
|
||||
from config import config
|
||||
from config.info import get_deployment_type
|
||||
from database.model import Company, User
|
||||
from database.model.queue import Queue
|
||||
from database.model.task.task import Task
|
||||
from utilities import safe_get
|
||||
from utilities.json import dumps
|
||||
from utilities.threads_manager import ThreadsManager
|
||||
from version import __version__ as current_version
|
||||
from .resource_monitor import ResourceMonitor
|
||||
|
||||
log = config.logger(__file__)
|
||||
|
||||
worker_bll = WorkerBLL()
|
||||
|
||||
|
||||
class StatisticsReporter:
|
||||
threads = ThreadsManager("Statistics", resource_monitor=ResourceMonitor)
|
||||
send_queue = queue.Queue()
|
||||
supported = config.get("apiserver.statistics.supported", True)
|
||||
|
||||
@classmethod
|
||||
def start(cls):
|
||||
cls.start_sender()
|
||||
cls.start_reporter()
|
||||
|
||||
@classmethod
|
||||
@threads.register("reporter", daemon=True)
|
||||
def start_reporter(cls):
|
||||
"""
|
||||
Periodically send statistics reports for companies who have opted in.
|
||||
Note: in trains we usually have only a single company
|
||||
"""
|
||||
if not cls.supported:
|
||||
return
|
||||
|
||||
report_interval = timedelta(
|
||||
hours=config.get("apiserver.statistics.report_interval_hours", 24)
|
||||
)
|
||||
sleep(report_interval.total_seconds())
|
||||
while not ThreadsManager.terminating:
|
||||
try:
|
||||
for company in Company.objects(
|
||||
defaults__stats_option__enabled=True
|
||||
).only("id"):
|
||||
stats = cls.get_statistics(company.id)
|
||||
cls.send_queue.put(stats)
|
||||
|
||||
except Exception as ex:
|
||||
log.exception(f"Failed collecting stats: {str(ex)}")
|
||||
|
||||
sleep(report_interval.total_seconds())
|
||||
|
||||
@classmethod
|
||||
@threads.register("sender", daemon=True)
|
||||
def start_sender(cls):
|
||||
if not cls.supported:
|
||||
return
|
||||
|
||||
url = config.get("apiserver.statistics.url")
|
||||
|
||||
retries = config.get("apiserver.statistics.max_retries", 5)
|
||||
max_backoff = config.get("apiserver.statistics.max_backoff_sec", 5)
|
||||
session = requests.Session()
|
||||
adapter = HTTPAdapter(max_retries=Retry(retries))
|
||||
session.mount("http://", adapter)
|
||||
session.mount("https://", adapter)
|
||||
session.headers["Content-type"] = "application/json"
|
||||
|
||||
WarningFilter.attach()
|
||||
|
||||
while not ThreadsManager.terminating:
|
||||
try:
|
||||
report = cls.send_queue.get()
|
||||
|
||||
# Set a random backoff factor each time we send a report
|
||||
adapter.max_retries.backoff_factor = random.random() * max_backoff
|
||||
|
||||
session.post(url, data=dumps(report))
|
||||
|
||||
except Exception as ex:
|
||||
pass
|
||||
|
||||
@classmethod
|
||||
def get_statistics(cls, company_id: str) -> dict:
|
||||
"""
|
||||
Returns a statistics report per company
|
||||
"""
|
||||
return {
|
||||
"time": datetime.utcnow(),
|
||||
"company_id": company_id,
|
||||
"server": {
|
||||
"version": current_version,
|
||||
"deployment": get_deployment_type(),
|
||||
"uuid": get_server_uuid(),
|
||||
"queues": {"count": Queue.objects(company=company_id).count()},
|
||||
"users": {"count": User.objects(company=company_id).count()},
|
||||
"resources": cls.threads.resource_monitor.get_stats(),
|
||||
"experiments": next(
|
||||
iter(cls._get_experiments_stats(company_id).values()), {}
|
||||
),
|
||||
},
|
||||
"agents": cls._get_agents_statistics(company_id),
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def _get_agents_statistics(cls, company_id: str) -> Sequence[dict]:
|
||||
result = cls._get_resource_stats_per_agent(company_id, key="resources")
|
||||
dpath.merge(
|
||||
result, cls._get_experiments_stats_per_agent(company_id, key="experiments")
|
||||
)
|
||||
return [{"uuid": agent_id, **data} for agent_id, data in result.items()]
|
||||
|
||||
@classmethod
|
||||
def _get_resource_stats_per_agent(cls, company_id: str, key: str) -> dict:
|
||||
agent_resource_threshold_sec = timedelta(
|
||||
hours=config.get("apiserver.statistics.report_interval_hours", 24)
|
||||
).total_seconds()
|
||||
to_timestamp = int(time.time())
|
||||
from_timestamp = to_timestamp - int(agent_resource_threshold_sec)
|
||||
es_req = {
|
||||
"size": 0,
|
||||
"query": QueryBuilder.dates_range(from_timestamp, to_timestamp),
|
||||
"aggs": {
|
||||
"workers": {
|
||||
"terms": {"field": "worker"},
|
||||
"aggs": {
|
||||
"categories": {
|
||||
"terms": {"field": "category"},
|
||||
"aggs": {"count": {"cardinality": {"field": "variant"}}},
|
||||
},
|
||||
"metrics": {
|
||||
"terms": {"field": "metric"},
|
||||
"aggs": {
|
||||
"min": {"min": {"field": "value"}},
|
||||
"max": {"max": {"field": "value"}},
|
||||
"avg": {"avg": {"field": "value"}},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
},
|
||||
}
|
||||
res = cls._run_worker_stats_query(company_id, es_req)
|
||||
|
||||
def _get_cardinality_fields(categories: Sequence[dict]) -> dict:
|
||||
names = {"cpu": "num_cores"}
|
||||
return {
|
||||
names[c["key"]]: safe_get(c, "count/value")
|
||||
for c in categories
|
||||
if c["key"] in names
|
||||
}
|
||||
|
||||
def _get_metric_fields(metrics: Sequence[dict]) -> dict:
|
||||
names = {
|
||||
"cpu_usage": "cpu_usage",
|
||||
"memory_used": "mem_used_gb",
|
||||
"memory_free": "mem_free_gb",
|
||||
}
|
||||
return {
|
||||
names[m["key"]]: {
|
||||
"min": safe_get(m, "min/value"),
|
||||
"max": safe_get(m, "max/value"),
|
||||
"avg": safe_get(m, "avg/value"),
|
||||
}
|
||||
for m in metrics
|
||||
if m["key"] in names
|
||||
}
|
||||
|
||||
buckets = safe_get(res, "aggregations/workers/buckets", default=[])
|
||||
return {
|
||||
b["key"]: {
|
||||
key: {
|
||||
"interval_sec": agent_resource_threshold_sec,
|
||||
**_get_cardinality_fields(safe_get(b, "categories/buckets", [])),
|
||||
**_get_metric_fields(safe_get(b, "metrics/buckets", [])),
|
||||
}
|
||||
}
|
||||
for b in buckets
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def _get_experiments_stats_per_agent(cls, company_id: str, key: str) -> dict:
|
||||
agent_relevant_threshold = timedelta(
|
||||
days=config.get("apiserver.statistics.agent_relevant_threshold_days", 30)
|
||||
)
|
||||
to_timestamp = int(time.time())
|
||||
from_timestamp = to_timestamp - int(agent_relevant_threshold.total_seconds())
|
||||
workers = cls._get_active_workers(company_id, from_timestamp, to_timestamp)
|
||||
if not workers:
|
||||
return {}
|
||||
|
||||
stats = cls._get_experiments_stats(company_id, list(workers.keys()))
|
||||
return {
|
||||
worker_id: {key: {**workers[worker_id], **stat}}
|
||||
for worker_id, stat in stats.items()
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def _get_active_workers(
|
||||
cls, company_id, from_timestamp: int, to_timestamp: int
|
||||
) -> dict:
|
||||
es_req = {
|
||||
"size": 0,
|
||||
"query": QueryBuilder.dates_range(from_timestamp, to_timestamp),
|
||||
"aggs": {
|
||||
"workers": {
|
||||
"terms": {"field": "worker"},
|
||||
"aggs": {"last_activity_time": {"max": {"field": "timestamp"}}},
|
||||
}
|
||||
},
|
||||
}
|
||||
res = cls._run_worker_stats_query(company_id, es_req)
|
||||
buckets = safe_get(res, "aggregations/workers/buckets", default=[])
|
||||
return {
|
||||
b["key"]: {"last_activity_time": b["last_activity_time"]["value"]}
|
||||
for b in buckets
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def _run_worker_stats_query(cls, company_id, es_req) -> dict:
|
||||
return worker_bll.es_client.search(
|
||||
index=f"{WorkerStats.worker_stats_prefix_for_company(company_id)}*",
|
||||
doc_type="stat",
|
||||
body=es_req,
|
||||
)
|
||||
|
||||
@classmethod
|
||||
def _get_experiments_stats(
|
||||
cls, company_id, workers: Optional[Sequence] = None
|
||||
) -> dict:
|
||||
pipeline = [
|
||||
{
|
||||
"$match": {
|
||||
"company": company_id,
|
||||
"started": {"$exists": True, "$ne": None},
|
||||
"last_update": {"$exists": True, "$ne": None},
|
||||
"status": {"$nin": ["created", "queued"]},
|
||||
**({"last_worker": {"$in": workers}} if workers else {}),
|
||||
}
|
||||
},
|
||||
{
|
||||
"$group": {
|
||||
"_id": "$last_worker" if workers else None,
|
||||
"count": {"$sum": 1},
|
||||
"avg_run_time_sec": {
|
||||
"$avg": {
|
||||
"$divide": [
|
||||
{"$subtract": ["$last_update", "$started"]},
|
||||
1000,
|
||||
]
|
||||
}
|
||||
},
|
||||
"avg_iterations": {"$avg": "$last_iteration"},
|
||||
}
|
||||
},
|
||||
{
|
||||
"$project": {
|
||||
"count": 1,
|
||||
"avg_run_time_sec": {"$trunc": "$avg_run_time_sec"},
|
||||
"avg_iterations": {"$trunc": "$avg_iterations"},
|
||||
}
|
||||
},
|
||||
]
|
||||
return {
|
||||
group["_id"]: {k: v for k, v in group.items() if k != "_id"}
|
||||
for group in Task.aggregate(*pipeline)
|
||||
}
|
||||
|
||||
|
||||
class WarningFilter(logging.Filter):
|
||||
@classmethod
|
||||
def attach(cls):
|
||||
from urllib3.connectionpool import (
|
||||
ConnectionPool,
|
||||
) # required to make sure the logger is created
|
||||
|
||||
assert ConnectionPool # make sure import is not optimized out
|
||||
|
||||
logging.getLogger("urllib3.connectionpool").addFilter(cls())
|
||||
|
||||
def filter(self, record):
|
||||
if (
|
||||
record.levelno == logging.WARNING
|
||||
and len(record.args) > 2
|
||||
and record.args[2] == "/stats"
|
||||
):
|
||||
return False
|
||||
return True
|
||||
@@ -4,4 +4,5 @@ from .utils import (
|
||||
update_project_time,
|
||||
validate_status_change,
|
||||
split_by,
|
||||
ParameterKeyEscaper,
|
||||
)
|
||||
|
||||
@@ -1,15 +1,18 @@
|
||||
import re
|
||||
from collections import OrderedDict
|
||||
from datetime import datetime, timedelta
|
||||
from operator import attrgetter
|
||||
from random import random
|
||||
from time import sleep
|
||||
from typing import Collection, Sequence, Tuple, Any
|
||||
from typing import Collection, Sequence, Tuple, Any, Optional, List
|
||||
|
||||
import pymongo.results
|
||||
import six
|
||||
from mongoengine import Q
|
||||
from six import string_types
|
||||
|
||||
import es_factory
|
||||
from apierrors import errors
|
||||
from apimodels.tasks import Artifact as ApiArtifact
|
||||
from config import config
|
||||
from database.errors import translate_errors_context
|
||||
from database.model.model import Model
|
||||
@@ -20,16 +23,21 @@ from database.model.task.task import (
|
||||
TaskStatus,
|
||||
TaskStatusMessage,
|
||||
TaskSystemTags,
|
||||
ArtifactModes,
|
||||
Artifact,
|
||||
)
|
||||
from database.utils import get_company_or_none_constraint, id as create_id
|
||||
from service_repo import APICall
|
||||
from timing_context import TimingContext
|
||||
from utilities.dicts import deep_merge
|
||||
from utilities.threads_manager import ThreadsManager
|
||||
from .utils import ChangeStatusRequest, validate_status_change
|
||||
from .utils import ChangeStatusRequest, validate_status_change, ParameterKeyEscaper
|
||||
|
||||
log = config.logger(__file__)
|
||||
|
||||
|
||||
class TaskBLL(object):
|
||||
threads = ThreadsManager()
|
||||
threads = ThreadsManager("TaskBLL")
|
||||
|
||||
def __init__(self, events_es=None):
|
||||
self.events_es = (
|
||||
@@ -144,6 +152,59 @@ class TaskBLL(object):
|
||||
|
||||
return model
|
||||
|
||||
@classmethod
|
||||
def clone_task(
|
||||
cls,
|
||||
company_id,
|
||||
user_id,
|
||||
task_id,
|
||||
name: Optional[str] = None,
|
||||
comment: Optional[str] = None,
|
||||
parent: Optional[str] = None,
|
||||
project: Optional[str] = None,
|
||||
tags: Optional[Sequence[str]] = None,
|
||||
system_tags: Optional[Sequence[str]] = None,
|
||||
execution_overrides: Optional[dict] = None,
|
||||
) -> Task:
|
||||
task = cls.get_by_id(company_id=company_id, task_id=task_id)
|
||||
execution_dict = task.execution.to_proper_dict() if task.execution else {}
|
||||
if execution_overrides:
|
||||
parameters = execution_overrides.get("parameters")
|
||||
if parameters is not None:
|
||||
execution_overrides["parameters"] = {
|
||||
ParameterKeyEscaper.escape(k): v for k, v in parameters.items()
|
||||
}
|
||||
execution_dict = deep_merge(execution_dict, execution_overrides)
|
||||
artifacts = execution_dict.get("artifacts")
|
||||
if artifacts:
|
||||
execution_dict["artifacts"] = [
|
||||
a for a in artifacts if a.get("mode") != ArtifactModes.output
|
||||
]
|
||||
now = datetime.utcnow()
|
||||
|
||||
with translate_errors_context():
|
||||
new_task = Task(
|
||||
id=create_id(),
|
||||
user=user_id,
|
||||
company=company_id,
|
||||
created=now,
|
||||
last_update=now,
|
||||
name=name or task.name,
|
||||
comment=comment or task.comment,
|
||||
parent=parent or task.parent,
|
||||
project=project or task.project,
|
||||
tags=tags or task.tags,
|
||||
system_tags=system_tags or [],
|
||||
type=task.type,
|
||||
script=task.script,
|
||||
output=Output(destination=task.output.destination) if task.output else None,
|
||||
execution=execution_dict,
|
||||
)
|
||||
cls.validate(new_task)
|
||||
new_task.save()
|
||||
|
||||
return new_task
|
||||
|
||||
@classmethod
|
||||
def validate(cls, task: Task):
|
||||
assert isinstance(task, Task)
|
||||
@@ -153,23 +214,13 @@ class TaskBLL(object):
|
||||
):
|
||||
raise errors.bad_request.InvalidTaskId("invalid parent", parent=task.parent)
|
||||
|
||||
if task.project:
|
||||
Project.get_for_writing(company=task.company, id=task.project)
|
||||
if task.project and not Project.get_for_writing(
|
||||
company=task.company, id=task.project
|
||||
):
|
||||
raise errors.bad_request.InvalidProjectId(id=task.project)
|
||||
|
||||
cls.validate_execution_model(task)
|
||||
|
||||
if task.execution:
|
||||
if task.execution.parameters:
|
||||
cls._validate_execution_parameters(task.execution.parameters)
|
||||
|
||||
@staticmethod
|
||||
def _validate_execution_parameters(parameters):
|
||||
invalid_keys = [k for k in parameters if re.search(r"\s", k)]
|
||||
if invalid_keys:
|
||||
raise errors.bad_request.ValidationError(
|
||||
"execution.parameters keys contain whitespace", keys=invalid_keys
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def get_unique_metric_variants(company_id, project_ids=None):
|
||||
pipeline = [
|
||||
@@ -373,7 +424,7 @@ class TaskBLL(object):
|
||||
:return: updated task fields
|
||||
"""
|
||||
|
||||
task = TaskBLL.get_task_with_access(
|
||||
task = cls.get_task_with_access(
|
||||
task_id,
|
||||
company_id=company_id,
|
||||
only=(
|
||||
@@ -411,6 +462,97 @@ class TaskBLL(object):
|
||||
force=force,
|
||||
).execute()
|
||||
|
||||
@classmethod
|
||||
def add_or_update_artifacts(
|
||||
cls, task_id: str, company_id: str, artifacts: List[ApiArtifact]
|
||||
) -> Tuple[List[str], List[str]]:
|
||||
key = attrgetter("key", "mode")
|
||||
|
||||
if not artifacts:
|
||||
return [], []
|
||||
|
||||
with translate_errors_context(), TimingContext("mongo", "update_artifacts"):
|
||||
artifacts: List[Artifact] = [
|
||||
Artifact(**artifact.to_struct()) for artifact in artifacts
|
||||
]
|
||||
|
||||
attempts = int(config.get("services.tasks.artifacts.update_attempts", 10))
|
||||
|
||||
for retry in range(attempts):
|
||||
task = cls.get_task_with_access(
|
||||
task_id, company_id=company_id, requires_write_access=True
|
||||
)
|
||||
|
||||
current = list(map(key, task.execution.artifacts))
|
||||
updated = [a for a in artifacts if key(a) in current]
|
||||
added = [a for a in artifacts if a not in updated]
|
||||
|
||||
filter = {"_id": task_id, "company": company_id}
|
||||
update = {}
|
||||
array_filters = None
|
||||
if current:
|
||||
filter["execution.artifacts"] = {
|
||||
"$size": len(current),
|
||||
"$all": [
|
||||
*(
|
||||
{"$elemMatch": {"key": key, "mode": mode}}
|
||||
for key, mode in current
|
||||
)
|
||||
],
|
||||
}
|
||||
else:
|
||||
filter["$or"] = [
|
||||
{"execution.artifacts": {"$exists": False}},
|
||||
{"execution.artifacts": {"$size": 0}},
|
||||
]
|
||||
|
||||
if added:
|
||||
update["$push"] = {
|
||||
"execution.artifacts": {"$each": [a.to_mongo() for a in added]}
|
||||
}
|
||||
if updated:
|
||||
update["$set"] = {
|
||||
f"execution.artifacts.$[artifact{index}]": artifact.to_mongo()
|
||||
for index, artifact in enumerate(updated)
|
||||
}
|
||||
array_filters = [
|
||||
{
|
||||
f"artifact{index}.key": artifact.key,
|
||||
f"artifact{index}.mode": artifact.mode,
|
||||
}
|
||||
for index, artifact in enumerate(updated)
|
||||
]
|
||||
|
||||
if not update:
|
||||
return [], []
|
||||
|
||||
result: pymongo.results.UpdateResult = Task._get_collection().update_one(
|
||||
filter=filter,
|
||||
update=update,
|
||||
array_filters=array_filters,
|
||||
upsert=False,
|
||||
)
|
||||
|
||||
if result.matched_count >= 1:
|
||||
break
|
||||
|
||||
wait_msec = random() * int(
|
||||
config.get("services.tasks.artifacts.update_retry_msec", 500)
|
||||
)
|
||||
|
||||
log.warning(
|
||||
f"Failed to update artifacts for task {task_id} (updated by another party),"
|
||||
f" retrying {retry+1}/{attempts} in {wait_msec}ms"
|
||||
)
|
||||
|
||||
sleep(wait_msec / 1000)
|
||||
else:
|
||||
raise errors.server_error.UpdateFailed(
|
||||
"task artifacts updated by another party"
|
||||
)
|
||||
|
||||
return [a.key for a in added], [a.key for a in updated]
|
||||
|
||||
@classmethod
|
||||
@threads.register("non_responsive_tasks_watchdog", daemon=True)
|
||||
def start_non_responsive_tasks_watchdog(cls):
|
||||
@@ -421,13 +563,11 @@ class TaskBLL(object):
|
||||
"services.tasks.non_responsive_tasks_watchdog.threshold_sec", 7200
|
||||
)
|
||||
)
|
||||
while True:
|
||||
sleep(
|
||||
config.get(
|
||||
"services.tasks.non_responsive_tasks_watchdog.watch_interval_sec",
|
||||
900,
|
||||
)
|
||||
)
|
||||
watch_interval = config.get(
|
||||
"services.tasks.non_responsive_tasks_watchdog.watch_interval_sec", 900
|
||||
)
|
||||
sleep(watch_interval)
|
||||
while not ThreadsManager.terminating:
|
||||
try:
|
||||
|
||||
ref_time = datetime.utcnow() - threshold
|
||||
@@ -463,6 +603,8 @@ class TaskBLL(object):
|
||||
except Exception as ex:
|
||||
log.exception(f"Failed stopping tasks: {str(ex)}")
|
||||
|
||||
sleep(watch_interval)
|
||||
|
||||
@staticmethod
|
||||
def get_aggregated_project_execution_parameters(
|
||||
company_id,
|
||||
@@ -502,10 +644,7 @@ class TaskBLL(object):
|
||||
]
|
||||
|
||||
with translate_errors_context():
|
||||
result = next(
|
||||
Task.aggregate(*pipeline),
|
||||
None,
|
||||
)
|
||||
result = next(Task.aggregate(*pipeline), None)
|
||||
|
||||
total = 0
|
||||
remaining = 0
|
||||
@@ -513,7 +652,10 @@ class TaskBLL(object):
|
||||
|
||||
if result:
|
||||
total = int(result.get("total", -1))
|
||||
results = [r["_id"] for r in result.get("results", [])]
|
||||
results = [
|
||||
ParameterKeyEscaper.unescape(r["_id"])
|
||||
for r in result.get("results", [])
|
||||
]
|
||||
remaining = max(0, total - (len(results) + page * page_size))
|
||||
|
||||
return total, remaining, results
|
||||
|
||||
@@ -3,6 +3,7 @@ from typing import TypeVar, Callable, Tuple, Sequence
|
||||
|
||||
import attr
|
||||
import six
|
||||
from boltons.dictutils import OneToOne
|
||||
|
||||
from apierrors import errors
|
||||
from database.errors import translate_errors_context
|
||||
@@ -171,3 +172,26 @@ def split_by(
|
||||
[item for cond, item in applied if cond],
|
||||
[item for cond, item in applied if not cond],
|
||||
)
|
||||
|
||||
|
||||
class ParameterKeyEscaper:
|
||||
_mapping = OneToOne({".": "%2E", "$": "%24"})
|
||||
|
||||
@classmethod
|
||||
def escape(cls, value):
|
||||
""" Quote a parameter key """
|
||||
value = value.strip().replace("%", "%%")
|
||||
for c, r in cls._mapping.items():
|
||||
value = value.replace(c, r)
|
||||
return value
|
||||
|
||||
@classmethod
|
||||
def _unescape(cls, value):
|
||||
for c, r in cls._mapping.inv.items():
|
||||
value = value.replace(c, r)
|
||||
return value
|
||||
|
||||
@classmethod
|
||||
def unescape(cls, value):
|
||||
""" Unquote a quoted parameter key """
|
||||
return "%".join(map(cls._unescape, value.split("%%")))
|
||||
|
||||
@@ -1,7 +1,9 @@
|
||||
import functools
|
||||
from operator import itemgetter
|
||||
from typing import Sequence, Optional, Callable, Tuple, Dict, Any, Set
|
||||
|
||||
from database.model import AttributedDocument
|
||||
from database.model.settings import Settings
|
||||
|
||||
|
||||
def extract_properties_to_lists(
|
||||
@@ -64,3 +66,8 @@ class SetFieldsResolver:
|
||||
in the format suitable for projection (dot separated)
|
||||
"""
|
||||
return set(name.replace("__", ".") for name in self.fields.values())
|
||||
|
||||
|
||||
@functools.lru_cache()
|
||||
def get_server_uuid() -> Optional[str]:
|
||||
return Settings.get_by_key("server.uuid")
|
||||
|
||||
@@ -4,6 +4,7 @@ from typing import Sequence, Set, Optional
|
||||
|
||||
import attr
|
||||
import elasticsearch.helpers
|
||||
|
||||
import es_factory
|
||||
from apierrors import APIError
|
||||
from apierrors.errors import bad_request, server_error
|
||||
@@ -22,10 +23,9 @@ from database.model.auth import User
|
||||
from database.model.company import Company
|
||||
from database.model.queue import Queue
|
||||
from database.model.task.task import Task
|
||||
from service_repo.redis_manager import redman
|
||||
from redis_manager import redman
|
||||
from timing_context import TimingContext
|
||||
from tools import safe_get
|
||||
|
||||
from .stats import WorkerStats
|
||||
|
||||
log = config.logger(__file__)
|
||||
@@ -33,9 +33,9 @@ log = config.logger(__file__)
|
||||
|
||||
class WorkerBLL:
|
||||
def __init__(self, es=None, redis=None):
|
||||
self.es = es if es is not None else es_factory.connect("workers")
|
||||
self.es_client = es if es is not None else es_factory.connect("workers")
|
||||
self.redis = redis if redis is not None else redman.connection("workers")
|
||||
self._stats = WorkerStats(self.es)
|
||||
self._stats = WorkerStats(self.es_client)
|
||||
|
||||
@property
|
||||
def stats(self) -> WorkerStats:
|
||||
@@ -396,7 +396,7 @@ class WorkerBLL:
|
||||
for i, val in enumerate(value)
|
||||
)
|
||||
|
||||
es_res = elasticsearch.helpers.bulk(self.es, actions)
|
||||
es_res = elasticsearch.helpers.bulk(self.es_client, actions)
|
||||
added, errors = es_res[:2]
|
||||
return (added == len(actions)) and not errors
|
||||
|
||||
|
||||
@@ -101,4 +101,18 @@
|
||||
# GET request timeout
|
||||
request_timeout_sec: 3.0
|
||||
}
|
||||
|
||||
statistics {
|
||||
# Note: statistics are sent ONLY if the user has actively opted-in
|
||||
supported: true
|
||||
|
||||
url: "https://updates.trains.allegro.ai/stats"
|
||||
|
||||
report_interval_hours: 24
|
||||
agent_relevant_threshold_days: 30
|
||||
|
||||
max_retries: 5
|
||||
max_backoff_sec: 5
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
@@ -1,3 +1,5 @@
|
||||
{
|
||||
es_index_prefix:"events"
|
||||
es_index_prefix: "events"
|
||||
|
||||
ignore_iteration {
|
||||
metrics: [":monitor:machine", ":monitor:gpu"]
|
||||
}
|
||||
@@ -5,3 +5,8 @@ non_responsive_tasks_watchdog {
|
||||
# Watchdog will sleep for this number of seconds after each cycle
|
||||
watch_interval_sec: 900
|
||||
}
|
||||
|
||||
artifacts {
|
||||
update_attempts: 10
|
||||
update_retry_msec: 500
|
||||
}
|
||||
@@ -1,28 +1,43 @@
|
||||
from functools import lru_cache
|
||||
from os import getenv
|
||||
from pathlib import Path
|
||||
from version import __version__
|
||||
|
||||
from config import config
|
||||
|
||||
root = Path(__file__).parent.parent
|
||||
|
||||
|
||||
def _get(prop_name, env_suffix=None, default=""):
|
||||
value = getenv(f"TRAINS_SERVER_{env_suffix or prop_name}")
|
||||
if value:
|
||||
return value
|
||||
|
||||
try:
|
||||
return (root / prop_name).read_text().strip()
|
||||
except FileNotFoundError:
|
||||
return default
|
||||
|
||||
|
||||
@lru_cache()
|
||||
def get_build_number():
|
||||
try:
|
||||
return (root / "BUILD").read_text().strip()
|
||||
except FileNotFoundError:
|
||||
return ""
|
||||
return _get("BUILD")
|
||||
|
||||
|
||||
@lru_cache()
|
||||
def get_version():
|
||||
try:
|
||||
return (root / "VERSION").read_text().strip()
|
||||
except FileNotFoundError:
|
||||
return ""
|
||||
return _get("VERSION", default=__version__)
|
||||
|
||||
|
||||
@lru_cache()
|
||||
def get_commit_number():
|
||||
try:
|
||||
return (root / "COMMIT").read_text().strip()
|
||||
except FileNotFoundError:
|
||||
return ""
|
||||
return _get("COMMIT")
|
||||
|
||||
|
||||
@lru_cache()
|
||||
def get_deployment_type() -> str:
|
||||
return _get("DEPLOY", env_suffix="DEPLOYMENT_TYPE", default="manual")
|
||||
|
||||
|
||||
def get_default_company():
|
||||
return config.get("apiserver.default_company")
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
from os import getenv
|
||||
|
||||
from boltons.iterutils import first
|
||||
from furl import furl
|
||||
from jsonmodels import models
|
||||
from jsonmodels.errors import ValidationError
|
||||
@@ -11,14 +12,16 @@ from config import config
|
||||
from .defs import Database
|
||||
from .utils import get_items
|
||||
|
||||
from boltons.iterutils import first
|
||||
|
||||
log = config.logger("database")
|
||||
|
||||
strict = config.get("apiserver.mongo.strict", True)
|
||||
|
||||
OVERRIDE_HOST_ENV_KEY = ("MONGODB_SERVICE_HOST", "MONGODB_SERVICE_SERVICE_HOST")
|
||||
OVERRIDE_PORT_ENV_KEY = "MONGODB_SERVICE_PORT"
|
||||
OVERRIDE_HOST_ENV_KEY = (
|
||||
"TRAINS_MONGODB_SERVICE_HOST",
|
||||
"MONGODB_SERVICE_HOST",
|
||||
"MONGODB_SERVICE_SERVICE_HOST",
|
||||
)
|
||||
OVERRIDE_PORT_ENV_KEY = ("TRAINS_MONGODB_SERVICE_PORT", "MONGODB_SERVICE_PORT")
|
||||
|
||||
_entries = []
|
||||
|
||||
@@ -41,7 +44,7 @@ def initialize():
|
||||
if override_hostname:
|
||||
log.info(f"Using override mongodb host {override_hostname}")
|
||||
|
||||
override_port = getenv(OVERRIDE_PORT_ENV_KEY)
|
||||
override_port = first(map(getenv, OVERRIDE_PORT_ENV_KEY), None)
|
||||
if override_port:
|
||||
log.info(f"Using override mongodb port {override_port}")
|
||||
|
||||
|
||||
@@ -52,7 +52,7 @@ class User(DbModelMixin, AuthDocument):
|
||||
meta = {"db_alias": Database.auth, "strict": strict}
|
||||
|
||||
id = StringField(primary_key=True)
|
||||
name = StringField(unique_with="company")
|
||||
name = StringField()
|
||||
|
||||
created = DateTimeField()
|
||||
""" User auth entry creation time """
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
import re
|
||||
from collections import namedtuple
|
||||
from functools import reduce
|
||||
from typing import Collection, Sequence, Union
|
||||
from typing import Collection, Sequence, Union, Optional
|
||||
|
||||
from boltons.iterutils import first
|
||||
from dateutil.parser import parse as parse_datetime
|
||||
@@ -60,7 +60,7 @@ class ProperDictMixin(object):
|
||||
|
||||
class GetMixin(PropsMixin):
|
||||
_text_score = "$text_score"
|
||||
|
||||
_projection_key = "projection"
|
||||
_ordering_key = "order_by"
|
||||
_search_text_key = "search_text"
|
||||
|
||||
@@ -270,11 +270,26 @@ class GetMixin(PropsMixin):
|
||||
return override_projection
|
||||
if not parameters:
|
||||
return []
|
||||
return parameters.get("projection") or parameters.get("only_fields", [])
|
||||
return parameters.get(cls._projection_key) or parameters.get("only_fields", [])
|
||||
|
||||
@classmethod
|
||||
def set_default_ordering(cls, parameters, value):
|
||||
parameters[cls._ordering_key] = parameters.get(cls._ordering_key) or value
|
||||
def set_projection(cls, parameters: dict, value: Sequence[str]) -> Sequence[str]:
|
||||
parameters.pop("only_fields", None)
|
||||
parameters[cls._projection_key] = value
|
||||
return value
|
||||
|
||||
@classmethod
|
||||
def get_ordering(cls, parameters: dict) -> Optional[Sequence[str]]:
|
||||
return parameters.get(cls._ordering_key)
|
||||
|
||||
@classmethod
|
||||
def set_ordering(cls, parameters: dict, value: Sequence[str]) -> Sequence[str]:
|
||||
parameters[cls._ordering_key] = value
|
||||
return value
|
||||
|
||||
@classmethod
|
||||
def set_default_ordering(cls, parameters: dict, value: Sequence[str]) -> None:
|
||||
cls.set_ordering(parameters, cls.get_ordering(parameters) or value)
|
||||
|
||||
@classmethod
|
||||
def get_many_with_join(
|
||||
|
||||
@@ -1,23 +1,36 @@
|
||||
from mongoengine import Document, EmbeddedDocument, EmbeddedDocumentField, StringField, Q
|
||||
from mongoengine import (
|
||||
Document,
|
||||
EmbeddedDocument,
|
||||
EmbeddedDocumentField,
|
||||
StringField,
|
||||
Q,
|
||||
BooleanField,
|
||||
DateTimeField,
|
||||
)
|
||||
|
||||
from database import Database, strict
|
||||
from database.fields import StrippedStringField
|
||||
from database.model import DbModelMixin
|
||||
|
||||
|
||||
class ReportStatsOption(EmbeddedDocument):
|
||||
enabled = BooleanField(default=False) # opt-in for statistics reporting
|
||||
enabled_version = StringField() # server version when enabled
|
||||
enabled_time = DateTimeField() # time when enabled
|
||||
enabled_user = StringField() # ID of user who enabled
|
||||
|
||||
|
||||
class CompanyDefaults(EmbeddedDocument):
|
||||
cluster = StringField()
|
||||
stats_option = EmbeddedDocumentField(ReportStatsOption, default=ReportStatsOption)
|
||||
|
||||
|
||||
class Company(DbModelMixin, Document):
|
||||
meta = {
|
||||
'db_alias': Database.backend,
|
||||
'strict': strict,
|
||||
}
|
||||
meta = {"db_alias": Database.backend, "strict": strict}
|
||||
|
||||
id = StringField(primary_key=True)
|
||||
name = StrippedStringField(unique=True, min_length=3)
|
||||
defaults = EmbeddedDocumentField(CompanyDefaults)
|
||||
defaults = EmbeddedDocumentField(CompanyDefaults, default=CompanyDefaults)
|
||||
|
||||
@classmethod
|
||||
def _prepare_perm_query(cls, company, allow_public=False):
|
||||
|
||||
53
server/database/model/settings.py
Normal file
53
server/database/model/settings.py
Normal file
@@ -0,0 +1,53 @@
|
||||
from typing import Any, Optional, Sequence, Tuple
|
||||
|
||||
from mongoengine import Document, StringField, DynamicField, Q
|
||||
from mongoengine.errors import NotUniqueError
|
||||
|
||||
from database import Database, strict
|
||||
from database.model import DbModelMixin
|
||||
|
||||
|
||||
class Settings(DbModelMixin, Document):
|
||||
meta = {
|
||||
"db_alias": Database.backend,
|
||||
"strict": strict,
|
||||
}
|
||||
|
||||
key = StringField(primary_key=True)
|
||||
value = DynamicField()
|
||||
|
||||
@classmethod
|
||||
def get_by_key(cls, key: str, default: Optional[Any] = None, sep: str = ".") -> Any:
|
||||
key = key.strip(sep)
|
||||
res = Settings.objects(key=key).first()
|
||||
if not res:
|
||||
return default
|
||||
return res.value
|
||||
|
||||
@classmethod
|
||||
def get_by_prefix(
|
||||
cls, key_prefix: str, default: Optional[Any] = None, sep: str = "."
|
||||
) -> Sequence[Tuple[str, Any]]:
|
||||
key_prefix = key_prefix.strip(sep)
|
||||
query = Q(key=key_prefix) | Q(key__startswith=key_prefix + sep)
|
||||
res = Settings.objects(query)
|
||||
if not res:
|
||||
return default
|
||||
return [(x.key, x.value) for x in res]
|
||||
|
||||
@classmethod
|
||||
def set_or_add_value(cls, key: str, value: Any, sep: str = ".") -> bool:
|
||||
""" Sets a new value or adds a new key/value setting (if key does not exist) """
|
||||
key = key.strip(sep)
|
||||
res = Settings.objects(key=key).update(key=key, value=value, upsert=True)
|
||||
return bool(res)
|
||||
|
||||
@classmethod
|
||||
def add_value(cls, key: str, value: Any, sep: str = ".") -> bool:
|
||||
""" Adds a new key/value settings. Fails if key already exists. """
|
||||
key = key.strip(sep)
|
||||
try:
|
||||
res = Settings(key=key, value=value).save(force_insert=True)
|
||||
return bool(res)
|
||||
except NotUniqueError:
|
||||
return False
|
||||
@@ -18,6 +18,7 @@ from database.fields import (
|
||||
SafeSortedListField,
|
||||
)
|
||||
from database.model import AttributedDocument
|
||||
from database.model.base import ProperDictMixin
|
||||
from database.model.model_labels import ModelLabels
|
||||
from database.model.project import Project
|
||||
from database.utils import get_options
|
||||
@@ -66,10 +67,15 @@ class ArtifactTypeData(EmbeddedDocument):
|
||||
data_hash = StringField()
|
||||
|
||||
|
||||
class ArtifactModes:
|
||||
input = "input"
|
||||
output = "output"
|
||||
|
||||
|
||||
class Artifact(EmbeddedDocument):
|
||||
key = StringField(required=True)
|
||||
type = StringField(required=True)
|
||||
mode = StringField(choices=("input", "output"), default="output")
|
||||
mode = StringField(choices=get_options(ArtifactModes), default=ArtifactModes.output)
|
||||
uri = StringField()
|
||||
hash = StringField()
|
||||
content_size = LongField()
|
||||
@@ -78,7 +84,7 @@ class Artifact(EmbeddedDocument):
|
||||
display_data = SafeSortedListField(ListField(UnionField((int, float, str))))
|
||||
|
||||
|
||||
class Execution(EmbeddedDocument):
|
||||
class Execution(EmbeddedDocument, ProperDictMixin):
|
||||
test_split = IntField(default=0)
|
||||
parameters = SafeDictField(default=dict)
|
||||
model = StringField(reference_field="Model")
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
from mongoengine import Document, StringField
|
||||
from mongoengine import Document, StringField, DynamicField
|
||||
|
||||
from database import Database, strict
|
||||
from database.fields import SafeDictField
|
||||
from database.model import DbModelMixin
|
||||
from database.model.company import Company
|
||||
|
||||
@@ -18,4 +17,4 @@ class User(DbModelMixin, Document):
|
||||
family_name = StringField(user_set_allowed=True)
|
||||
given_name = StringField(user_set_allowed=True)
|
||||
avatar = StringField()
|
||||
preferences = SafeDictField(default=dict, exclude_by_default=True)
|
||||
preferences = DynamicField(default="", exclude_by_default=True)
|
||||
|
||||
@@ -10,7 +10,11 @@ from pathlib import Path
|
||||
from requests.adapters import HTTPAdapter
|
||||
from requests.packages.urllib3.util.retry import Retry
|
||||
|
||||
HERE = Path(__file__).parent
|
||||
HERE = Path(__file__).resolve().parent
|
||||
|
||||
session = requests.Session()
|
||||
adapter = HTTPAdapter(max_retries=Retry(5, backoff_factor=0.5))
|
||||
session.mount('http://', adapter)
|
||||
|
||||
|
||||
def apply_mappings_to_host(host: str):
|
||||
@@ -20,10 +24,6 @@ def apply_mappings_to_host(host: str):
|
||||
es_server = host
|
||||
url = f"{es_server}/_template/{f.stem}"
|
||||
|
||||
session = requests.Session()
|
||||
adapter = HTTPAdapter(max_retries=Retry(5, backoff_factor=0.5))
|
||||
session.mount('http://', adapter)
|
||||
|
||||
session.delete(url)
|
||||
r = session.post(
|
||||
url,
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
{
|
||||
"template": "events-*",
|
||||
"settings": {
|
||||
"number_of_shards": 5
|
||||
"number_of_shards": 1
|
||||
},
|
||||
"mappings": {
|
||||
"_default_": {
|
||||
|
||||
@@ -1,20 +1,25 @@
|
||||
from datetime import datetime
|
||||
from os import getenv
|
||||
|
||||
from boltons.iterutils import first
|
||||
from elasticsearch import Elasticsearch, Transport
|
||||
|
||||
from config import config
|
||||
|
||||
log = config.logger(__file__)
|
||||
|
||||
OVERRIDE_HOST_ENV_KEY = ("ELASTIC_SERVICE_HOST", "ELASTIC_SERVICE_SERVICE_HOST")
|
||||
OVERRIDE_PORT_ENV_KEY = "ELASTIC_SERVICE_PORT"
|
||||
OVERRIDE_HOST_ENV_KEY = (
|
||||
"TRAINS_ELASTIC_SERVICE_HOST",
|
||||
"ELASTIC_SERVICE_HOST",
|
||||
"ELASTIC_SERVICE_SERVICE_HOST",
|
||||
)
|
||||
OVERRIDE_PORT_ENV_KEY = ("TRAINS_ELASTIC_SERVICE_PORT", "ELASTIC_SERVICE_PORT")
|
||||
|
||||
OVERRIDE_HOST = next(filter(None, map(getenv, OVERRIDE_HOST_ENV_KEY)), None)
|
||||
OVERRIDE_HOST = first(filter(None, map(getenv, OVERRIDE_HOST_ENV_KEY)))
|
||||
if OVERRIDE_HOST:
|
||||
log.info(f"Using override elastic host {OVERRIDE_HOST}")
|
||||
|
||||
OVERRIDE_PORT = getenv(OVERRIDE_PORT_ENV_KEY)
|
||||
OVERRIDE_PORT = first(filter(None, map(getenv, OVERRIDE_PORT_ENV_KEY)))
|
||||
if OVERRIDE_PORT:
|
||||
log.info(f"Using override elastic port {OVERRIDE_PORT}")
|
||||
|
||||
@@ -25,6 +30,7 @@ class MissingClusterConfiguration(Exception):
|
||||
"""
|
||||
Exception when cluster configuration is not found in config files
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
|
||||
@@ -32,6 +38,7 @@ class InvalidClusterConfiguration(Exception):
|
||||
"""
|
||||
Exception when cluster configuration does not contain required properties
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
|
||||
@@ -46,12 +53,14 @@ def connect(cluster_name):
|
||||
"""
|
||||
if cluster_name not in _instances:
|
||||
cluster_config = get_cluster_config(cluster_name)
|
||||
hosts = cluster_config.get('hosts', None)
|
||||
hosts = cluster_config.get("hosts", None)
|
||||
if not hosts:
|
||||
raise InvalidClusterConfiguration(cluster_name)
|
||||
|
||||
args = cluster_config.get('args', {})
|
||||
_instances[cluster_name] = Elasticsearch(hosts=hosts, transport_class=Transport, **args)
|
||||
args = cluster_config.get("args", {})
|
||||
_instances[cluster_name] = Elasticsearch(
|
||||
hosts=hosts, transport_class=Transport, **args
|
||||
)
|
||||
|
||||
return _instances[cluster_name]
|
||||
|
||||
@@ -63,13 +72,13 @@ def get_cluster_config(cluster_name):
|
||||
:return: config section for the cluster
|
||||
:raises MissingClusterConfiguration: in case no config section is found for the cluster
|
||||
"""
|
||||
cluster_key = '.'.join(('hosts.elastic', cluster_name))
|
||||
cluster_key = ".".join(("hosts.elastic", cluster_name))
|
||||
cluster_config = config.get(cluster_key, None)
|
||||
if not cluster_config:
|
||||
raise MissingClusterConfiguration(cluster_name)
|
||||
|
||||
def set_host_prop(key, value):
|
||||
for host in cluster_config.get('hosts', []):
|
||||
for host in cluster_config.get("hosts", []):
|
||||
host[key] = value
|
||||
|
||||
if OVERRIDE_HOST:
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
import importlib.util
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from uuid import uuid4
|
||||
|
||||
import attr
|
||||
from furl import furl
|
||||
@@ -10,11 +11,13 @@ from semantic_version import Version
|
||||
import database.utils
|
||||
from bll.queue import QueueBLL
|
||||
from config import config
|
||||
from config.info import get_default_company
|
||||
from database import Database
|
||||
from database.model.auth import Role
|
||||
from database.model.auth import User as AuthUser, Credentials
|
||||
from database.model.company import Company
|
||||
from database.model.queue import Queue
|
||||
from database.model.settings import Settings
|
||||
from database.model.user import User
|
||||
from database.model.version import Version as DatabaseVersion
|
||||
from elastic.apply_mappings import apply_mappings_to_host
|
||||
@@ -23,7 +26,7 @@ from service_repo.auth.fixed_user import FixedUser
|
||||
|
||||
log = config.logger(__file__)
|
||||
|
||||
migration_dir = (Path(__file__) / "../../migration/mongodb").resolve()
|
||||
migration_dir = Path(__file__).resolve().parent / "mongo" / "migrations"
|
||||
|
||||
|
||||
class MissingElasticConfiguration(Exception):
|
||||
@@ -47,7 +50,7 @@ def init_es_data():
|
||||
|
||||
|
||||
def _ensure_company():
|
||||
company_id = config.get("apiserver.default_company")
|
||||
company_id = get_default_company()
|
||||
company = Company.objects(id=company_id).only("id").first()
|
||||
if company:
|
||||
return company_id
|
||||
@@ -109,10 +112,7 @@ def _ensure_user(user: FixedUser, company_id: str):
|
||||
data["email"] = f"{user.user_id}@example.com"
|
||||
data["role"] = Role.user
|
||||
|
||||
_ensure_auth_user(
|
||||
user_data=data,
|
||||
company_id=company_id,
|
||||
)
|
||||
_ensure_auth_user(user_data=data, company_id=company_id)
|
||||
|
||||
given_name, _, family_name = user.name.partition(" ")
|
||||
|
||||
@@ -142,9 +142,7 @@ def _apply_migrations():
|
||||
try:
|
||||
new_scripts = {
|
||||
ver: path
|
||||
for ver, path in (
|
||||
(Version(f.stem), f) for f in migration_dir.glob("*.py")
|
||||
)
|
||||
for ver, path in ((Version(f.stem), f) for f in migration_dir.glob("*.py"))
|
||||
if ver > last_version
|
||||
}
|
||||
except ValueError as ex:
|
||||
@@ -179,16 +177,30 @@ def _apply_migrations():
|
||||
).save()
|
||||
|
||||
|
||||
def _ensure_uuid():
|
||||
Settings.add_value("server.uuid", str(uuid4()))
|
||||
|
||||
|
||||
def init_mongo_data():
|
||||
try:
|
||||
_apply_migrations()
|
||||
|
||||
_ensure_uuid()
|
||||
|
||||
company_id = _ensure_company()
|
||||
_ensure_default_queue(company_id)
|
||||
|
||||
users = [
|
||||
{"name": "apiserver", "role": Role.system, "email": "apiserver@example.com"},
|
||||
{"name": "webserver", "role": Role.system, "email": "webserver@example.com"},
|
||||
{
|
||||
"name": "apiserver",
|
||||
"role": Role.system,
|
||||
"email": "apiserver@example.com",
|
||||
},
|
||||
{
|
||||
"name": "webserver",
|
||||
"role": Role.system,
|
||||
"email": "webserver@example.com",
|
||||
},
|
||||
{"name": "tests", "role": Role.user, "email": "tests@example.com"},
|
||||
]
|
||||
|
||||
@@ -200,10 +212,11 @@ def init_mongo_data():
|
||||
|
||||
if FixedUser.enabled():
|
||||
log.info("Fixed users mode is enabled")
|
||||
FixedUser.validate()
|
||||
for user in FixedUser.from_config():
|
||||
try:
|
||||
_ensure_user(user, company_id)
|
||||
except Exception as ex:
|
||||
log.error(f"Failed creating fixed user {user['name']}: {ex}")
|
||||
log.error(f"Failed creating fixed user {user.name}: {ex}")
|
||||
except Exception as ex:
|
||||
log.exception("Failed initializing mongodb")
|
||||
|
||||
20
server/mongo/migrations/0.13.0.py
Normal file
20
server/mongo/migrations/0.13.0.py
Normal file
@@ -0,0 +1,20 @@
|
||||
import json
|
||||
|
||||
from pymongo.database import Database, Collection
|
||||
|
||||
|
||||
def migrate_auth(db: Database):
|
||||
collection: Collection = db["user"]
|
||||
if "name_1_company_1" in [doc["name"] for doc in collection.list_indexes()]:
|
||||
collection.drop_index("name_1_company_1")
|
||||
|
||||
|
||||
def migrate_backend(db: Database):
|
||||
collection: Collection = db["user"]
|
||||
users = collection.find(
|
||||
{"preferences": {"$exists": True, "$ne": None, "$type": "object"}}
|
||||
)
|
||||
for doc in users:
|
||||
collection.update_one(
|
||||
{"_id": doc["_id"]}, {"$set": {"preferences": json.dumps(doc["preferences"])}}
|
||||
)
|
||||
@@ -2,21 +2,23 @@ import threading
|
||||
from os import getenv
|
||||
from time import sleep
|
||||
|
||||
from apierrors.errors.server_error import ConfigError, GeneralError
|
||||
from config import config
|
||||
from boltons.iterutils import first
|
||||
from redis import StrictRedis
|
||||
from redis.sentinel import Sentinel, SentinelConnectionPool
|
||||
|
||||
from apierrors.errors.server_error import ConfigError, GeneralError
|
||||
from config import config
|
||||
|
||||
log = config.logger(__file__)
|
||||
|
||||
OVERRIDE_HOST_ENV_KEY = "REDIS_SERVICE_HOST"
|
||||
OVERRIDE_PORT_ENV_KEY = "REDIS_SERVICE_PORT"
|
||||
OVERRIDE_HOST_ENV_KEY = ("TRAINS_REDIS_SERVICE_HOST", "REDIS_SERVICE_HOST")
|
||||
OVERRIDE_PORT_ENV_KEY = ("TRAINS_REDIS_SERVICE_PORT", "REDIS_SERVICE_PORT")
|
||||
|
||||
OVERRIDE_HOST = getenv(OVERRIDE_HOST_ENV_KEY)
|
||||
OVERRIDE_HOST = first(filter(None, map(getenv, OVERRIDE_HOST_ENV_KEY)))
|
||||
if OVERRIDE_HOST:
|
||||
log.info(f"Using override redis host {OVERRIDE_HOST}")
|
||||
|
||||
OVERRIDE_PORT = getenv(OVERRIDE_PORT_ENV_KEY)
|
||||
OVERRIDE_PORT = first(filter(None, map(getenv, OVERRIDE_PORT_ENV_KEY)))
|
||||
if OVERRIDE_PORT:
|
||||
log.info(f"Using override redis port {OVERRIDE_PORT}")
|
||||
|
||||
@@ -3,7 +3,6 @@ Flask>=0.12.2
|
||||
elasticsearch>=5.0.0,<6.0.0
|
||||
pyhocon>=0.3.35
|
||||
requests>=2.13.0
|
||||
arrow>=0.10.0
|
||||
pymongo==3.6.1 # 3.7 has a bug multiple users logged in
|
||||
Flask-Cors>=3.0.5
|
||||
Flask-Compress>=1.4.0
|
||||
@@ -17,7 +16,6 @@ jsonschema>=2.6.0
|
||||
dpath>=1.4.2
|
||||
funcsigs==1.0.2
|
||||
luqum>=0.7.2
|
||||
typing>=3.6.4
|
||||
attrs>=19.1.0
|
||||
nested_dict>=1.61
|
||||
related>=0.7.2
|
||||
@@ -28,3 +26,4 @@ semantic_version>=2.6.0,<3
|
||||
furl>=2.0.0
|
||||
redis>=2.10.5
|
||||
humanfriendly==4.18
|
||||
psutil>=5.6.5
|
||||
|
||||
@@ -455,7 +455,7 @@
|
||||
}
|
||||
batch_size {
|
||||
type: integer
|
||||
description: "Number of events to return each time"
|
||||
description: "Number of events to return each time (default 500)"
|
||||
}
|
||||
event_type {
|
||||
type: string
|
||||
|
||||
@@ -324,7 +324,6 @@
|
||||
required: [
|
||||
uri
|
||||
name
|
||||
labels
|
||||
]
|
||||
properties {
|
||||
uri {
|
||||
|
||||
@@ -3,6 +3,25 @@ _default {
|
||||
internal: true
|
||||
allow_roles: ["root", "system"]
|
||||
}
|
||||
get_stats {
|
||||
"2.1" {
|
||||
description: "Get the server collected statistics."
|
||||
request {
|
||||
type: object
|
||||
properties {
|
||||
interval {
|
||||
description: "The period for statistics collection in seconds."
|
||||
type: long
|
||||
}
|
||||
}
|
||||
}
|
||||
response {
|
||||
type: object
|
||||
properties: {
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
config {
|
||||
"2.1" {
|
||||
description: "Get server configuration. Secure section is not returned."
|
||||
@@ -66,3 +85,44 @@ endpoints {
|
||||
}
|
||||
}
|
||||
}
|
||||
report_stats_option {
|
||||
allow_roles = [ "*" ]
|
||||
"2.4" {
|
||||
description: "Get or set the report statistics option per-company"
|
||||
request {
|
||||
type: object
|
||||
properties {
|
||||
enabled {
|
||||
description: "If provided, sets the report statistics option (true/false)"
|
||||
type: boolean
|
||||
}
|
||||
}
|
||||
}
|
||||
response {
|
||||
type: object
|
||||
properties {
|
||||
enabled {
|
||||
description: "Returns the current report stats option value"
|
||||
type: boolean
|
||||
}
|
||||
enabled_time {
|
||||
description: "If enabled, returns the time at which option was enabled"
|
||||
type: string
|
||||
format: date-time
|
||||
}
|
||||
enabled_version {
|
||||
description: "If enabled, returns the server version at the time option was enabled"
|
||||
type: string
|
||||
}
|
||||
enabled_user {
|
||||
description: "If enabled, returns Id of the user who enabled the option"
|
||||
type: string
|
||||
}
|
||||
current_version {
|
||||
description: "Returns the current server version"
|
||||
type: string
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -550,6 +550,60 @@ get_all {
|
||||
}
|
||||
}
|
||||
}
|
||||
clone {
|
||||
"2.5" {
|
||||
description: "Clone an existing task"
|
||||
request {
|
||||
type: object
|
||||
required: [ task ]
|
||||
properties {
|
||||
task {
|
||||
description: "ID of the task"
|
||||
type: string
|
||||
}
|
||||
new_task_name {
|
||||
description: "The name of the cloned task. If not provided then taken from the original task"
|
||||
type: string
|
||||
}
|
||||
new_task_comment {
|
||||
description: "The comment of the cloned task. If not provided then taken from the original task"
|
||||
type: string
|
||||
}
|
||||
new_task_tags {
|
||||
description: "The user-defined tags of the cloned task. If not provided then taken from the original task"
|
||||
type: array
|
||||
items { type: string }
|
||||
}
|
||||
new_task_system_tags {
|
||||
description: "The system tags of the cloned task. If not provided then empty"
|
||||
type: array
|
||||
items { type: string }
|
||||
}
|
||||
new_task_parent {
|
||||
description: "The parent of the cloned task. If not provided then taken from the original task"
|
||||
type: string
|
||||
}
|
||||
new_task_project {
|
||||
description: "The project of the cloned task. If not provided then taken from the original task"
|
||||
type: string
|
||||
}
|
||||
execution_overrides {
|
||||
description: "The execution params for the cloned task. The params not specified are taken from the original task"
|
||||
"$ref": "#/definitions/execution"
|
||||
}
|
||||
}
|
||||
}
|
||||
response {
|
||||
type: object
|
||||
properties {
|
||||
id {
|
||||
description: "ID of the new task"
|
||||
type: string
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
create {
|
||||
"2.1" {
|
||||
description: "Create a new task"
|
||||
@@ -1304,4 +1358,40 @@ ping {
|
||||
additionalProperties: false
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
add_or_update_artifacts {
|
||||
"2.6" {
|
||||
description: """ Update an existing artifact (search by key/mode) or add a new one """
|
||||
request {
|
||||
type: object
|
||||
required: [ task, artifacts ]
|
||||
properties {
|
||||
task {
|
||||
description: "Task ID"
|
||||
type: string
|
||||
}
|
||||
artifacts {
|
||||
description: "Artifacts to add or update"
|
||||
type: array
|
||||
items { "$ref": "#/definitions/artifact" }
|
||||
}
|
||||
}
|
||||
}
|
||||
response {
|
||||
type: object
|
||||
properties {
|
||||
added {
|
||||
description: "Keys of artifacts added"
|
||||
type: array
|
||||
items { type: string }
|
||||
}
|
||||
updated {
|
||||
description: "Keys of artifacts updated"
|
||||
type: array
|
||||
items { type: string }
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1,3 +1,4 @@
|
||||
import atexit
|
||||
from argparse import ArgumentParser
|
||||
|
||||
from flask import Flask, request, Response
|
||||
@@ -7,15 +8,16 @@ from werkzeug.exceptions import BadRequest
|
||||
|
||||
import database
|
||||
from apierrors.base import BaseError
|
||||
from bll.statistics.stats_reporter import StatisticsReporter
|
||||
from config import config
|
||||
from init_data import init_es_data, init_mongo_data
|
||||
from service_repo import ServiceRepo, APICall
|
||||
from service_repo.auth import AuthType
|
||||
from service_repo.errors import PathParsingError
|
||||
from timing_context import TimingContext
|
||||
from utilities import json
|
||||
from init_data import init_es_data, init_mongo_data
|
||||
from updates import check_updates_thread
|
||||
|
||||
from utilities import json
|
||||
from utilities.threads_manager import ThreadsManager
|
||||
|
||||
app = Flask(__name__, static_url_path="/static")
|
||||
CORS(app, **config.get("apiserver.cors"))
|
||||
@@ -38,6 +40,14 @@ log.info(f"Exposed Services: {' '.join(ServiceRepo.endpoint_names())}")
|
||||
|
||||
|
||||
check_updates_thread.start()
|
||||
StatisticsReporter.start()
|
||||
|
||||
|
||||
def graceful_shutdown():
|
||||
ThreadsManager.terminating = True
|
||||
|
||||
|
||||
atexit.register(graceful_shutdown)
|
||||
|
||||
|
||||
@app.before_first_request
|
||||
@@ -57,7 +67,9 @@ def before_request():
|
||||
content, content_type = ServiceRepo.handle_call(call)
|
||||
headers = {}
|
||||
if call.result.filename:
|
||||
headers["Content-Disposition"] = f"attachment; filename={call.result.filename}"
|
||||
headers[
|
||||
"Content-Disposition"
|
||||
] = f"attachment; filename={call.result.filename}"
|
||||
|
||||
if call.result.headers:
|
||||
headers.update(call.result.headers)
|
||||
@@ -71,7 +83,9 @@ def before_request():
|
||||
if value is None:
|
||||
response.set_cookie(key, "", expires=0)
|
||||
else:
|
||||
response.set_cookie(key, value, **config.get("apiserver.auth.cookies"))
|
||||
response.set_cookie(
|
||||
key, value, **config.get("apiserver.auth.cookies")
|
||||
)
|
||||
|
||||
return response
|
||||
except Exception as ex:
|
||||
|
||||
@@ -21,6 +21,8 @@ JSON_CONTENT_TYPE = "application/json"
|
||||
class DataContainer(object):
|
||||
""" Data container that supports raw data (dict or a list of batched dicts) and a data model """
|
||||
|
||||
null_schema_validator: SchemaValidator = SchemaValidator(None)
|
||||
|
||||
def __init__(self, data=None, batched_data=None):
|
||||
if data and batched_data:
|
||||
raise ValueError("data and batched data are not supported simultaneously")
|
||||
@@ -28,7 +30,7 @@ class DataContainer(object):
|
||||
self._data = None
|
||||
self._data_model = None
|
||||
self._data_model_cls = None
|
||||
self._schema_validator: SchemaValidator = SchemaValidator(None)
|
||||
self._schema_validator: SchemaValidator = self.null_schema_validator
|
||||
# use setter to properly initialize data
|
||||
self.data = data
|
||||
self.batched_data = batched_data
|
||||
@@ -255,6 +257,7 @@ class MissingIdentity(Exception):
|
||||
class APICall(DataContainer):
|
||||
HEADER_AUTHORIZATION = "Authorization"
|
||||
HEADER_REAL_IP = "X-Real-IP"
|
||||
HEADER_FORWARDED_FOR = "X-Forwarded-For"
|
||||
""" Standard headers """
|
||||
|
||||
_transaction_headers = ("X-Trains-Trx",)
|
||||
@@ -382,8 +385,13 @@ class APICall(DataContainer):
|
||||
|
||||
@property
|
||||
def real_ip(self):
|
||||
real_ip = self.get_header(self.HEADER_REAL_IP)
|
||||
return real_ip or self._remote_addr or "untrackable"
|
||||
""" Obtain visitor's IP address """
|
||||
return (
|
||||
self.get_header(self.HEADER_FORWARDED_FOR)
|
||||
or self.get_header(self.HEADER_REAL_IP)
|
||||
or self._remote_addr
|
||||
or "untrackable"
|
||||
)
|
||||
|
||||
@property
|
||||
def failed(self):
|
||||
|
||||
@@ -5,27 +5,45 @@ from typing import Sequence, TypeVar
|
||||
import attr
|
||||
|
||||
from config import config
|
||||
from config.info import get_default_company
|
||||
|
||||
T = TypeVar("T", bound="FixedUser")
|
||||
|
||||
|
||||
class FixedUsersError(Exception):
|
||||
pass
|
||||
|
||||
|
||||
@attr.s(auto_attribs=True)
|
||||
class FixedUser:
|
||||
username: str
|
||||
password: str
|
||||
name: str
|
||||
company: str = get_default_company()
|
||||
|
||||
def __attrs_post_init__(self):
|
||||
self.user_id = hashlib.md5(f"{self.username}:{self.password}".encode()).hexdigest()
|
||||
self.user_id = hashlib.md5(f"{self.company}:{self.username}".encode()).hexdigest()
|
||||
|
||||
@classmethod
|
||||
def enabled(cls):
|
||||
return config.get("apiserver.auth.fixed_users.enabled", False)
|
||||
|
||||
@classmethod
|
||||
def validate(cls):
|
||||
if not cls.enabled():
|
||||
return
|
||||
users = cls.from_config()
|
||||
if len({user.username for user in users}) < len(users):
|
||||
raise FixedUsersError(
|
||||
"Duplicate user names found in fixed users configuration"
|
||||
)
|
||||
|
||||
@classmethod
|
||||
@lru_cache()
|
||||
def from_config(cls) -> Sequence[T]:
|
||||
return [cls(**user) for user in config.get("apiserver.auth.fixed_users.users", [])]
|
||||
return [
|
||||
cls(**user) for user in config.get("apiserver.auth.fixed_users.users", [])
|
||||
]
|
||||
|
||||
@classmethod
|
||||
@lru_cache()
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
from enum import Enum
|
||||
from typing import Callable, Sequence, Text
|
||||
|
||||
from boltons.iterutils import remap
|
||||
from jsonmodels import models
|
||||
from jsonmodels.errors import FieldNotSupported
|
||||
|
||||
@@ -87,7 +88,14 @@ class Endpoint(object):
|
||||
Provided data_model schema if available
|
||||
"""
|
||||
try:
|
||||
return data_model.to_json_schema()
|
||||
res = data_model.to_json_schema()
|
||||
|
||||
def visit(path, key, value):
|
||||
if isinstance(value, Enum):
|
||||
value = str(value)
|
||||
return key, value
|
||||
|
||||
return remap(res, visit=visit)
|
||||
except (FieldNotSupported, TypeError):
|
||||
return str(data_model.__name__)
|
||||
|
||||
|
||||
@@ -34,7 +34,7 @@ class ServiceRepo(object):
|
||||
"""If the check is set, parsing will fail for endpoint request with the version that is grater than the current
|
||||
maximum """
|
||||
|
||||
_max_version = PartialVersion("2.4")
|
||||
_max_version = PartialVersion("2.6")
|
||||
""" Maximum version number (the highest min_version value across all endpoints) """
|
||||
|
||||
_endpoint_exp = (
|
||||
|
||||
@@ -211,7 +211,7 @@ def vector_metrics_iter_histogram(call, company_id, req_model):
|
||||
@endpoint("events.get_task_events", required_fields=["task"])
|
||||
def get_task_events(call, company_id, _):
|
||||
task_id = call.data["task"]
|
||||
batch_size = call.data.get("batch_size")
|
||||
batch_size = call.data.get("batch_size", 500)
|
||||
event_type = call.data.get("event_type")
|
||||
scroll_id = call.data.get("scroll_id")
|
||||
order = call.data.get("order") or "asc"
|
||||
@@ -394,7 +394,7 @@ def get_task_plots_v1_7(call, company_id, req_model):
|
||||
|
||||
task_bll.assert_exists(call.identity.company, task_id, allow_public=True)
|
||||
# events, next_scroll_id, total_events = event_bll.get_task_events(
|
||||
# company_id, task_id,
|
||||
# company, task_id,
|
||||
# event_type="plot",
|
||||
# sort=[{"iter": {"order": "desc"}}],
|
||||
# last_iter_count=iters,
|
||||
@@ -453,7 +453,7 @@ def get_debug_images_v1_7(call, company_id, req_model):
|
||||
|
||||
task_bll.assert_exists(call.identity.company, task_id, allow_public=True)
|
||||
# events, next_scroll_id, total_events = event_bll.get_task_events(
|
||||
# company_id, task_id,
|
||||
# company, task_id,
|
||||
# event_type="training_debug_image",
|
||||
# sort=[{"iter": {"order": "desc"}}],
|
||||
# last_iter_count=iters,
|
||||
|
||||
@@ -1,10 +1,25 @@
|
||||
from datetime import datetime
|
||||
|
||||
from pyhocon.config_tree import NoneValue
|
||||
|
||||
from apierrors import errors
|
||||
from apimodels.server import ReportStatsOptionRequest, ReportStatsOptionResponse
|
||||
from bll.statistics.stats_reporter import StatisticsReporter
|
||||
from config import config
|
||||
from config.info import get_version, get_build_number, get_commit_number
|
||||
from database.errors import translate_errors_context
|
||||
from database.model import Company
|
||||
from database.model.company import ReportStatsOption
|
||||
from service_repo import ServiceRepo, APICall, endpoint
|
||||
|
||||
|
||||
@endpoint("server.get_stats")
|
||||
def get_stats(call: APICall):
|
||||
call.result.data = StatisticsReporter.get_statistics(
|
||||
company_id=call.identity.company
|
||||
)
|
||||
|
||||
|
||||
@endpoint("server.config")
|
||||
def get_config(call: APICall):
|
||||
path = call.data.get("path")
|
||||
@@ -43,3 +58,36 @@ def info(call: APICall):
|
||||
"build": get_build_number(),
|
||||
"commit": get_commit_number(),
|
||||
}
|
||||
|
||||
|
||||
@endpoint(
|
||||
"server.report_stats_option",
|
||||
request_data_model=ReportStatsOptionRequest,
|
||||
response_data_model=ReportStatsOptionResponse,
|
||||
)
|
||||
def report_stats(call: APICall, company: str, request: ReportStatsOptionRequest):
|
||||
if not StatisticsReporter.supported:
|
||||
result = ReportStatsOptionResponse(supported=False)
|
||||
else:
|
||||
enabled = request.enabled
|
||||
with translate_errors_context():
|
||||
query = Company.objects(id=company)
|
||||
if enabled is None:
|
||||
stats_option = query.first().defaults.stats_option
|
||||
else:
|
||||
stats_option = ReportStatsOption(
|
||||
enabled=enabled,
|
||||
enabled_time=datetime.utcnow(),
|
||||
enabled_version=get_version(),
|
||||
enabled_user=call.identity.user,
|
||||
)
|
||||
updated = query.update(defaults__stats_option=stats_option)
|
||||
if not updated:
|
||||
raise errors.server_error.InternalError(
|
||||
f"Failed setting report_stats to {enabled}"
|
||||
)
|
||||
data = stats_option.to_mongo()
|
||||
data["current_version"] = get_version()
|
||||
result = ReportStatsOptionResponse(**data)
|
||||
|
||||
call.result.data_model = result
|
||||
|
||||
@@ -1,18 +1,17 @@
|
||||
from copy import deepcopy
|
||||
from datetime import datetime
|
||||
from operator import attrgetter
|
||||
from typing import Sequence, Callable, Type, TypeVar
|
||||
from typing import Sequence, Callable, Type, TypeVar, Union
|
||||
|
||||
import attr
|
||||
import dpath
|
||||
import mongoengine
|
||||
import six
|
||||
from mongoengine import EmbeddedDocument, Q
|
||||
from mongoengine.queryset.transform import COMPARISON_OPERATORS
|
||||
from pymongo import UpdateOne
|
||||
|
||||
from apierrors import errors, APIError
|
||||
from apimodels.base import UpdateResponse
|
||||
from apimodels.base import UpdateResponse, IdResponse
|
||||
from apimodels.tasks import (
|
||||
StartedResponse,
|
||||
ResetResponse,
|
||||
@@ -27,10 +26,19 @@ from apimodels.tasks import (
|
||||
EnqueueRequest,
|
||||
EnqueueResponse,
|
||||
DequeueResponse,
|
||||
CloneRequest,
|
||||
AddOrUpdateArtifactsRequest,
|
||||
AddOrUpdateArtifactsResponse,
|
||||
)
|
||||
from bll.event import EventBLL
|
||||
from bll.queue import QueueBLL
|
||||
from bll.task import TaskBLL, ChangeStatusRequest, update_project_time, split_by
|
||||
from bll.task import (
|
||||
TaskBLL,
|
||||
ChangeStatusRequest,
|
||||
update_project_time,
|
||||
split_by,
|
||||
ParameterKeyEscaper,
|
||||
)
|
||||
from bll.util import SetFieldsResolver
|
||||
from database.errors import translate_errors_context
|
||||
from database.model.model import Model
|
||||
@@ -94,13 +102,37 @@ def get_by_id(call: APICall, company_id, req_model: TaskRequest):
|
||||
req_model.task, company_id=company_id, allow_public=True
|
||||
)
|
||||
task_dict = task.to_proper_dict()
|
||||
conform_output_tags(call, task_dict)
|
||||
unprepare_from_saved(call, task_dict)
|
||||
call.result.data = {"task": task_dict}
|
||||
|
||||
|
||||
def escape_execution_parameters(call: APICall):
|
||||
default_prefix = "execution.parameters."
|
||||
|
||||
def escape_paths(paths, prefix=default_prefix):
|
||||
return [
|
||||
prefix + ParameterKeyEscaper.escape(path[len(prefix) :])
|
||||
if path.startswith(prefix)
|
||||
else path
|
||||
for path in paths
|
||||
]
|
||||
|
||||
projection = Task.get_projection(call.data)
|
||||
if projection:
|
||||
Task.set_projection(call.data, escape_paths(projection))
|
||||
|
||||
ordering = Task.get_ordering(call.data)
|
||||
if ordering:
|
||||
ordering = Task.set_ordering(call.data, escape_paths(ordering, default_prefix))
|
||||
Task.set_ordering(call.data, escape_paths(ordering, "-" + default_prefix))
|
||||
|
||||
|
||||
@endpoint("tasks.get_all_ex", required_fields=[])
|
||||
def get_all_ex(call: APICall):
|
||||
conform_tag_fields(call, call.data)
|
||||
|
||||
escape_execution_parameters(call)
|
||||
|
||||
with translate_errors_context():
|
||||
with TimingContext("mongo", "task_get_all_ex"):
|
||||
tasks = Task.get_many_with_join(
|
||||
@@ -109,13 +141,16 @@ def get_all_ex(call: APICall):
|
||||
query_options=get_all_query_options,
|
||||
allow_public=True, # required in case projection is requested for public dataset/versions
|
||||
)
|
||||
conform_output_tags(call, tasks)
|
||||
unprepare_from_saved(call, tasks)
|
||||
call.result.data = {"tasks": tasks}
|
||||
|
||||
|
||||
@endpoint("tasks.get_all", required_fields=[])
|
||||
def get_all(call: APICall):
|
||||
conform_tag_fields(call, call.data)
|
||||
|
||||
escape_execution_parameters(call)
|
||||
|
||||
with translate_errors_context():
|
||||
with TimingContext("mongo", "task_get_all"):
|
||||
tasks = Task.get_many(
|
||||
@@ -125,7 +160,7 @@ def get_all(call: APICall):
|
||||
query_options=get_all_query_options,
|
||||
allow_public=True, # required in case projection is requested for public dataset/versions
|
||||
)
|
||||
conform_output_tags(call, tasks)
|
||||
unprepare_from_saved(call, tasks)
|
||||
call.result.data = {"tasks": tasks}
|
||||
|
||||
|
||||
@@ -220,6 +255,45 @@ create_fields = {
|
||||
}
|
||||
|
||||
|
||||
def prepare_for_save(call: APICall, fields: dict):
|
||||
conform_tag_fields(call, fields)
|
||||
|
||||
# Strip all script fields (remove leading and trailing whitespace chars) to avoid unusable names and paths
|
||||
for field in task_script_fields:
|
||||
try:
|
||||
path = f"script/{field}"
|
||||
value = dpath.get(fields, path)
|
||||
if isinstance(value, str):
|
||||
value = value.strip()
|
||||
dpath.set(fields, path, value)
|
||||
except KeyError:
|
||||
pass
|
||||
|
||||
parameters = safe_get(fields, "execution/parameters")
|
||||
if parameters is not None:
|
||||
# Escape keys to make them mongo-safe
|
||||
parameters = {ParameterKeyEscaper.escape(k): v for k, v in parameters.items()}
|
||||
dpath.set(fields, "execution/parameters", parameters)
|
||||
|
||||
return fields
|
||||
|
||||
|
||||
def unprepare_from_saved(call: APICall, tasks_data: Union[Sequence[dict], dict]):
|
||||
if isinstance(tasks_data, dict):
|
||||
tasks_data = [tasks_data]
|
||||
|
||||
conform_output_tags(call, tasks_data)
|
||||
|
||||
for task_data in tasks_data:
|
||||
parameters = safe_get(task_data, "execution/parameters")
|
||||
if parameters is not None:
|
||||
# Escape keys to make them mongo-safe
|
||||
parameters = {
|
||||
ParameterKeyEscaper.unescape(k): v for k, v in parameters.items()
|
||||
}
|
||||
dpath.set(task_data, "execution/parameters", parameters)
|
||||
|
||||
|
||||
def prepare_create_fields(
|
||||
call: APICall, valid_fields=None, output=None, previous_task: Task = None
|
||||
):
|
||||
@@ -239,25 +313,7 @@ def prepare_create_fields(
|
||||
output = Output(destination=output_dest)
|
||||
fields["output"] = output
|
||||
|
||||
conform_tag_fields(call, fields)
|
||||
|
||||
# Strip all script fields (remove leading and trailing whitespace chars) to avoid unusable names and paths
|
||||
for field in task_script_fields:
|
||||
try:
|
||||
path = "script/%s" % field
|
||||
value = dpath.get(fields, path)
|
||||
if isinstance(value, six.string_types):
|
||||
value = value.strip()
|
||||
dpath.set(fields, path, value)
|
||||
except KeyError:
|
||||
pass
|
||||
|
||||
parameters = safe_get(fields, "execution/parameters")
|
||||
if parameters is not None:
|
||||
parameters = {k.strip(): v for k, v in parameters.items()}
|
||||
dpath.set(fields, "execution/parameters", parameters)
|
||||
|
||||
return fields
|
||||
return prepare_for_save(call, fields)
|
||||
|
||||
|
||||
def _validate_and_get_task_from_call(call: APICall, **kwargs):
|
||||
@@ -278,7 +334,9 @@ def validate(call: APICall, company_id, req_model: CreateRequest):
|
||||
_validate_and_get_task_from_call(call)
|
||||
|
||||
|
||||
@endpoint("tasks.create", request_data_model=CreateRequest)
|
||||
@endpoint(
|
||||
"tasks.create", request_data_model=CreateRequest, response_data_model=IdResponse
|
||||
)
|
||||
def create(call: APICall, company_id, req_model: CreateRequest):
|
||||
task = _validate_and_get_task_from_call(call)
|
||||
|
||||
@@ -286,7 +344,26 @@ def create(call: APICall, company_id, req_model: CreateRequest):
|
||||
task.save()
|
||||
update_project_time(task.project)
|
||||
|
||||
call.result.data = {"id": task.id}
|
||||
call.result.data_model = IdResponse(id=task.id)
|
||||
|
||||
|
||||
@endpoint(
|
||||
"tasks.clone", request_data_model=CloneRequest, response_data_model=IdResponse
|
||||
)
|
||||
def clone_task(call: APICall, company_id, request: CloneRequest):
|
||||
task = task_bll.clone_task(
|
||||
company_id=company_id,
|
||||
user_id=call.identity.user,
|
||||
task_id=request.task,
|
||||
name=request.new_task_name,
|
||||
comment=request.new_task_comment,
|
||||
parent=request.new_task_parent,
|
||||
project=request.new_task_project,
|
||||
tags=request.new_task_tags,
|
||||
system_tags=request.new_task_system_tags,
|
||||
execution_overrides=request.execution_overrides,
|
||||
)
|
||||
call.result.data_model = IdResponse(id=task.id)
|
||||
|
||||
|
||||
def prepare_update_fields(call: APICall, task, call_data):
|
||||
@@ -296,8 +373,7 @@ def prepare_update_fields(call: APICall, task, call_data):
|
||||
t_fields = task_fields
|
||||
t_fields.add("output__error")
|
||||
fields = parse_from_call(call_data, update_fields, t_fields)
|
||||
conform_tag_fields(call, fields)
|
||||
return fields, valid_fields
|
||||
return prepare_for_save(call, fields), valid_fields
|
||||
|
||||
|
||||
@endpoint(
|
||||
@@ -324,7 +400,7 @@ def update(call: APICall, company_id, req_model: UpdateRequest):
|
||||
)
|
||||
|
||||
update_project_time(updated_fields.get("project"))
|
||||
conform_output_tags(call, updated_fields)
|
||||
unprepare_from_saved(call, updated_fields)
|
||||
return UpdateResponse(updated=updated_count, fields=updated_fields)
|
||||
|
||||
|
||||
@@ -449,7 +525,7 @@ def edit(call: APICall, company_id, req_model: UpdateRequest):
|
||||
fixed_fields.update(last_update=now)
|
||||
updated = task.update(upsert=False, **fixed_fields)
|
||||
update_project_time(fields.get("project"))
|
||||
conform_output_tags(call, fields)
|
||||
unprepare_from_saved(call, fields)
|
||||
call.result.data_model = UpdateResponse(updated=updated, fields=fields)
|
||||
else:
|
||||
call.result.data_model = UpdateResponse(updated=0)
|
||||
@@ -702,7 +778,7 @@ def cleanup_task(task, force=False):
|
||||
else:
|
||||
updated_models = 0
|
||||
|
||||
event_bll.delete_task_events(task.company, task.id)
|
||||
event_bll.delete_task_events(task.company, task.id, allow_locked=force)
|
||||
|
||||
return CleanupResult(
|
||||
deleted_models=deleted_models,
|
||||
@@ -837,3 +913,18 @@ def ping(_, company_id, request: PingRequest):
|
||||
TaskBLL.set_last_update(
|
||||
task_ids=[request.task], company_id=company_id, last_update=datetime.utcnow()
|
||||
)
|
||||
|
||||
|
||||
@endpoint(
|
||||
"tasks.add_or_update_artifacts",
|
||||
min_version="2.6",
|
||||
request_data_model=AddOrUpdateArtifactsRequest,
|
||||
response_data_model=AddOrUpdateArtifactsResponse,
|
||||
)
|
||||
def add_or_update_artifacts(
|
||||
call: APICall, company_id, request: AddOrUpdateArtifactsRequest
|
||||
):
|
||||
added, updated = TaskBLL.add_or_update_artifacts(
|
||||
task_id=request.task, company_id=company_id, artifacts=request.artifacts
|
||||
)
|
||||
call.result.data_model = AddOrUpdateArtifactsResponse(added=added, updated=updated)
|
||||
|
||||
@@ -7,10 +7,7 @@ from mongoengine import Q
|
||||
|
||||
from apierrors import errors
|
||||
from apimodels.base import UpdateResponse
|
||||
from apimodels.users import (
|
||||
CreateRequest,
|
||||
SetPreferencesRequest,
|
||||
)
|
||||
from apimodels.users import CreateRequest, SetPreferencesRequest
|
||||
from bll.user import UserBLL
|
||||
from config import config
|
||||
from database.errors import translate_errors_context
|
||||
@@ -19,6 +16,7 @@ from database.model.company import Company
|
||||
from database.model.user import User
|
||||
from database.utils import parse_from_call
|
||||
from service_repo import APICall, endpoint
|
||||
from utilities.json import loads, dumps
|
||||
|
||||
log = config.logger(__file__)
|
||||
get_all_query_options = User.QueryParameterOptions(list_fields=("id",))
|
||||
@@ -160,7 +158,10 @@ def update(call, company_id, _):
|
||||
|
||||
def get_user_preferences(call):
|
||||
user_id = call.identity.user
|
||||
return get_user(call, user_id, ["preferences"]).get("preferences", {})
|
||||
preferences = get_user(call, user_id, ["preferences"]).get("preferences")
|
||||
if preferences and isinstance(preferences, str):
|
||||
preferences = loads(preferences)
|
||||
return preferences or {}
|
||||
|
||||
|
||||
@endpoint("users.get_preferences")
|
||||
@@ -169,9 +170,7 @@ def get_preferences(call):
|
||||
return {"preferences": get_user_preferences(call)}
|
||||
|
||||
|
||||
@endpoint(
|
||||
"users.set_preferences", request_data_model=SetPreferencesRequest
|
||||
)
|
||||
@endpoint("users.set_preferences", request_data_model=SetPreferencesRequest)
|
||||
def set_preferences(call, company_id, req_model):
|
||||
# type: (APICall, str, SetPreferencesRequest) -> Dict
|
||||
assert isinstance(call, APICall)
|
||||
@@ -205,9 +204,11 @@ def set_preferences(call, company_id, req_model):
|
||||
updated, fields = 0, {}
|
||||
else:
|
||||
with translate_errors_context("updating user preferences"):
|
||||
fields = dict(preferences=new_preferences)
|
||||
updated = User.objects(id=call.identity.user, company=company_id).update(
|
||||
upsert=False, **fields
|
||||
upsert=False, preferences=dumps(new_preferences)
|
||||
)
|
||||
|
||||
return {"updated": updated, "fields": fields if updated else {}}
|
||||
return {
|
||||
"updated": updated,
|
||||
"fields": {"preferences": new_preferences} if updated else {},
|
||||
}
|
||||
|
||||
@@ -52,7 +52,7 @@ class TestEntityOrdering(TestService):
|
||||
def _get_page_tasks(self, order_by, page: int, page_size: int) -> Sequence:
|
||||
return self.api.tasks.get_all_ex(
|
||||
only_fields=self.only_fields,
|
||||
order_by=order_by,
|
||||
order_by=[order_by] if order_by else None,
|
||||
comment=self.test_comment,
|
||||
page=page,
|
||||
page_size=page_size,
|
||||
@@ -79,7 +79,7 @@ class TestEntityOrdering(TestService):
|
||||
def _assertGetTasksWithOrdering(self, order_by: str = None, **kwargs):
|
||||
tasks = self.api.tasks.get_all_ex(
|
||||
only_fields=self.only_fields,
|
||||
order_by=order_by,
|
||||
order_by=[order_by] if order_by else None,
|
||||
comment=self.test_comment,
|
||||
**kwargs,
|
||||
).tasks
|
||||
|
||||
@@ -6,6 +6,9 @@ log = config.logger(__file__)
|
||||
|
||||
|
||||
class TestTasksEdit(TestService):
|
||||
def setUp(self, **kwargs):
|
||||
super().setUp(version=2.5)
|
||||
|
||||
def new_task(self, **kwargs):
|
||||
return self.create_temp(
|
||||
"tasks", type="testing", name="test", input=dict(view=dict()), **kwargs
|
||||
@@ -34,3 +37,39 @@ class TestTasksEdit(TestService):
|
||||
self.api.models.edit(model=not_ready_model, ready=False)
|
||||
self.assertFalse(self.api.models.get_by_id(model=not_ready_model).model.ready)
|
||||
self.api.tasks.edit(task=task, execution=dict(model=not_ready_model))
|
||||
|
||||
def test_clone_task(self):
|
||||
script = dict(
|
||||
binary="python",
|
||||
requirements=dict(pip=["six"]),
|
||||
repository="https://example.come/foo/bar",
|
||||
entry_point="test.py",
|
||||
diff="foo",
|
||||
)
|
||||
execution = dict(parameters=dict(test="Test"))
|
||||
tags = ["hello"]
|
||||
system_tags = ["development", "test"]
|
||||
task = self.new_task(
|
||||
script=script, execution=execution, tags=tags, system_tags=system_tags
|
||||
)
|
||||
|
||||
new_name = "new test"
|
||||
new_tags = ["by"]
|
||||
execution_overrides = dict(framework="Caffe")
|
||||
new_task_id = self.api.tasks.clone(
|
||||
task=task,
|
||||
new_task_name=new_name,
|
||||
new_task_tags=new_tags,
|
||||
execution_overrides=execution_overrides,
|
||||
new_task_parent=task,
|
||||
).id
|
||||
new_task = self.api.tasks.get_by_id(task=new_task_id).task
|
||||
self.assertEqual(new_task.name, new_name)
|
||||
self.assertEqual(new_task.type, "testing")
|
||||
self.assertEqual(new_task.tags, new_tags)
|
||||
self.assertEqual(new_task.status, "created")
|
||||
self.assertEqual(new_task.script, script)
|
||||
self.assertEqual(new_task.parent, task)
|
||||
self.assertEqual(new_task.execution.parameters, execution["parameters"])
|
||||
self.assertEqual(new_task.execution.framework, execution_overrides["framework"])
|
||||
self.assertEqual(new_task.system_tags, [])
|
||||
|
||||
@@ -183,12 +183,13 @@ class TestWorkersService(TestService):
|
||||
# run on an empty es db since we have no way
|
||||
# to pass non existing workers to this api
|
||||
# res = self.api.workers.get_activity_report(
|
||||
# from_date=from_date.timestamp(),
|
||||
# to_date=to_date.timestamp(),
|
||||
# from_timestamp=from_timestamp.timestamp(),
|
||||
# to_timestamp=to_timestamp.timestamp(),
|
||||
# interval=20,
|
||||
# )
|
||||
|
||||
self._simulate_workers()
|
||||
|
||||
to_date = utc_now_tz_aware()
|
||||
from_date = to_date - timedelta(minutes=10)
|
||||
|
||||
|
||||
@@ -1 +1,2 @@
|
||||
numpy>=1.12.1
|
||||
nose==1.3.7
|
||||
parameterized>=0.7.1
|
||||
|
||||
@@ -8,7 +8,9 @@ import requests
|
||||
from semantic_version import Version
|
||||
|
||||
from config import config
|
||||
from version import __version__ as current_version
|
||||
from config.info import get_version
|
||||
from database.model.settings import Settings
|
||||
from utilities.threads_manager import ThreadsManager
|
||||
|
||||
log = config.logger(__name__)
|
||||
|
||||
@@ -39,12 +41,15 @@ class CheckUpdatesThread(Thread):
|
||||
|
||||
def _check_new_version_available(self) -> Optional[_VersionResponse]:
|
||||
url = config.get(
|
||||
"apiserver.check_for_updates.url", "https://updates.trains.allegro.ai/updates"
|
||||
"apiserver.check_for_updates.url",
|
||||
"https://updates.trains.allegro.ai/updates",
|
||||
)
|
||||
|
||||
uid = Settings.get_by_key("server.uuid")
|
||||
|
||||
response = requests.get(
|
||||
url,
|
||||
json={"versions": {self.component_name: str(current_version)}},
|
||||
json={"versions": {self.component_name: str(get_version())}, "uid": uid},
|
||||
timeout=float(
|
||||
config.get("apiserver.check_for_updates.request_timeout_sec", 3.0)
|
||||
),
|
||||
@@ -61,7 +66,7 @@ class CheckUpdatesThread(Thread):
|
||||
if not latest_version:
|
||||
return
|
||||
|
||||
cur_version = Version(current_version)
|
||||
cur_version = Version(get_version())
|
||||
latest_version = Version(latest_version)
|
||||
if cur_version >= latest_version:
|
||||
return
|
||||
@@ -76,7 +81,16 @@ class CheckUpdatesThread(Thread):
|
||||
)
|
||||
|
||||
def _check_updates(self):
|
||||
while True:
|
||||
update_interval_sec = max(
|
||||
float(
|
||||
config.get(
|
||||
"apiserver.check_for_updates.check_interval_sec",
|
||||
60 * 60 * 24,
|
||||
)
|
||||
),
|
||||
60 * 5,
|
||||
)
|
||||
while not ThreadsManager.terminating:
|
||||
# noinspection PyBroadException
|
||||
try:
|
||||
response = self._check_new_version_available()
|
||||
@@ -94,17 +108,7 @@ class CheckUpdatesThread(Thread):
|
||||
except Exception:
|
||||
log.exception("Failed obtaining updates")
|
||||
|
||||
sleep(
|
||||
max(
|
||||
float(
|
||||
config.get(
|
||||
"apiserver.check_for_updates.check_interval_sec",
|
||||
60 * 60 * 24,
|
||||
)
|
||||
),
|
||||
60 * 5,
|
||||
)
|
||||
)
|
||||
sleep(update_interval_sec)
|
||||
|
||||
|
||||
check_updates_thread = CheckUpdatesThread()
|
||||
|
||||
@@ -12,6 +12,24 @@ def flatten_nested_items(
|
||||
for key, value in dictionary.items():
|
||||
path = prefix + (key,)
|
||||
if isinstance(value, dict) and nesting != 0:
|
||||
yield from flatten_nested_items(value, next_nesting, include_leaves, prefix=path)
|
||||
yield from flatten_nested_items(
|
||||
value, next_nesting, include_leaves, prefix=path
|
||||
)
|
||||
elif include_leaves is None or key in include_leaves:
|
||||
yield path, value
|
||||
|
||||
|
||||
def deep_merge(source: dict, override: dict) -> dict:
|
||||
"""
|
||||
Merge the override dict into the source in-place
|
||||
Contrary to the dpath.merge the sequences are not expanded
|
||||
If override contains the sequence with the same name as source
|
||||
then the whole sequence in the source is overridden
|
||||
"""
|
||||
for key, value in override.items():
|
||||
if key in source and isinstance(source[key], dict) and isinstance(value, dict):
|
||||
deep_merge(source[key], value)
|
||||
else:
|
||||
source[key] = value
|
||||
|
||||
return source
|
||||
|
||||
@@ -1,13 +1,30 @@
|
||||
from functools import wraps
|
||||
from threading import Lock, Thread
|
||||
|
||||
import attr
|
||||
from typing import ClassVar
|
||||
|
||||
|
||||
@attr.s(auto_attribs=True)
|
||||
class ThreadsManager:
|
||||
objects = {}
|
||||
lock = Lock()
|
||||
terminating: ClassVar[bool] = False
|
||||
|
||||
def __init__(self, name=None, **threads):
|
||||
super(ThreadsManager, self).__init__()
|
||||
self.name = name or self.__class__.name
|
||||
self.objects = {}
|
||||
self.lock = Lock()
|
||||
|
||||
for thread_name, thread in threads.items():
|
||||
if issubclass(thread, Thread):
|
||||
thread = thread()
|
||||
thread.start()
|
||||
elif isinstance(thread, Thread):
|
||||
if not thread.is_alive():
|
||||
thread.start()
|
||||
else:
|
||||
raise Exception(f"Expected thread or thread class ({thread_name}): {thread}")
|
||||
|
||||
self.objects[thread_name] = thread
|
||||
|
||||
def register(self, thread_name, daemon=True):
|
||||
def decorator(f):
|
||||
@@ -17,7 +34,7 @@ class ThreadsManager:
|
||||
thread = self.objects.get(thread_name)
|
||||
if not thread:
|
||||
thread = Thread(
|
||||
target=f, name=thread_name, args=args, kwargs=kwargs
|
||||
target=f, name=f"{self.name}_{thread_name}", args=args, kwargs=kwargs
|
||||
)
|
||||
thread.daemon = daemon
|
||||
thread.start()
|
||||
@@ -27,3 +44,13 @@ class ThreadsManager:
|
||||
return wrapper
|
||||
|
||||
return decorator
|
||||
|
||||
def __getattr__(self, item):
|
||||
if item in self.objects:
|
||||
return self.objects[item]
|
||||
return self.__getattribute__(item)
|
||||
|
||||
def __getitem__(self, item):
|
||||
if item in self.objects:
|
||||
return self.objects[item]
|
||||
raise KeyError(item)
|
||||
|
||||
@@ -1 +1 @@
|
||||
__version__ = "0.12.0"
|
||||
__version__ = "0.13.0"
|
||||
|
||||
@@ -2,13 +2,13 @@
|
||||
|
||||
## Introduction
|
||||
|
||||
The webserver is the **trains-server**'s component responsible for serving the TRAINS webapp.
|
||||
The webserver is the **trains-server**'s component responsible for serving the Trains webapp.
|
||||
For this purpose, we use an [NGINX](https://www.nginx.com/) server.
|
||||
|
||||
## Configuration
|
||||
|
||||
In order to serve the TRAINS webapp, the following is required:
|
||||
* The pre-built TRAINS webapp should be copied to the NGINX html directory (usually `/usr/share/nginx/html`)
|
||||
In order to serve the Trains webapp, the following is required:
|
||||
* The pre-built Trains webapp should be copied to the NGINX html directory (usually `/usr/share/nginx/html`)
|
||||
* The default NGINX port (usually `80`) should be changed to match the **trains-server** configuration (usually `8080`)
|
||||
|
||||
NOTE: This configuration may vary in different systems, depending on the NGINX version and distribution used.
|
||||
|
||||
Reference in New Issue
Block a user