mirror of
https://github.com/clearml/clearml-server
synced 2025-06-26 23:15:47 +00:00
Compare commits
52 Commits

| SHA1 |
|---|
| fb5c06e9c3 |
| 1a9bbc9420 |
| 294da32401 |
| 7f00672010 |
| 99bf89a360 |
| 6c8508eb7f |
| 69714d5b5c |
| f9516ec7d3 |
| 6fdde93dee |
| 7afc71ec91 |
| 4595117d91 |
| 8630cc1021 |
| 135885b609 |
| eb0865662c |
| b7b94e7ae5 |
| 72be8bee19 |
| 0722b20c1c |
| a392a0e6ff |
| e22fa2f478 |
| 8b49c1ac06 |
| da1182a405 |
| 53e995ee8c |
| 4732dc1a88 |
| e325bcaf67 |
| a7c30453db |
| dedac3b2fe |
| 7d10bbdf8e |
| 72213dffa4 |
| f778837d4b |
| 153ed6a7b7 |
| 5d279c8c5a |
| ed910d5f6a |
| 87d2b6fa15 |
| 94cfb17291 |
| 3f641d37b7 |
| 551be12f01 |
| b536020058 |
| fb6fbc0a06 |
| 5ae64fd791 |
| f9776e4319 |
| 75e736e7d5 |
| 1e4756aa1d |
| 52529d3c55 |
| 53296e8891 |
| 1c87ebc900 |
| 14d9924ea0 |
| 69f9b424c7 |
| 1a6da301a8 |
| 2728b3ed14 |
| 38284eef1f |
| 9debe1adcd |
| cc93c15f8a |
4
.gitignore
vendored
@@ -1,11 +1,10 @@
syntax: glob
.idea
apierrors/errors
static/build.json
static/dashboard/node_modules
static/webapp/node_modules
static/webapp/.git
scripts/
generators/
*.pyc
__pycache__
.ropeproject
@@ -20,3 +19,4 @@ build
dist
code.tar.gz
server/schema/services/_cache.json
server/apierrors/errors/*

223
README.md
@@ -1,4 +1,4 @@
# TRAINS Server
# Trains Server

## Auto-Magical Experiment Manager & Version Control for AI

@@ -9,25 +9,20 @@

## Introduction

The **trains-server** is the backend service infrastructure for [TRAINS](https://github.com/allegroai/trains).
The **trains-server** is the backend service infrastructure for [Trains](https://github.com/allegroai/trains).
It allows multiple users to collaborate and manage their experiments.
By default, TRAINS is set up to work with the TRAINS demo server, which is open to anyone and resets periodically.
In order to host your own server, you will need to install **trains-server** and point TRAINS to it.
By default, **Trains** is set up to work with the **Trains** demo server, which is open to anyone and resets periodically.
In order to host your own server, you will need to launch **trains-server** and point **Trains** to it.

**trains-server** contains the following components:

* The TRAINS Web-App, a single-page UI for experiment management and browsing
* The **Trains** Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
    * Documenting and logging experiment information, statistics and results
    * Querying experiments history, logs and results
* Locally-hosted file server for storing images and models making them easily accessible using the Web-App

You can quickly setup your **trains-server** using:
- [Docker Installation](#installation)
- Pre-built Amazon [AWS image](#aws)
- [Kubernetes Helm](https://github.com/allegroai/trains-server-helm#trains-server-for-kubernetes-clusters-using-helm)
or manual [Kubernetes installation](https://github.com/allegroai/trains-server-k8s#trains-server-for-kubernetes-clusters)

You can quickly [deploy](#launching-trains-server) your **trains-server** using Docker, AWS EC2 AMI, or Kubernetes.

## System design

@@ -44,136 +39,42 @@ You can quickly setup your **trains-server** using:
- Web application on sub-domain: app.\*.\*
- API service on sub-domain: api.\*.\*
- File storage service on sub-domain: files.\*.\*
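
With the sub-domain layout above, the three service URLs differ only in their first label. A minimal sketch for deriving them, assuming `https` by default and a caller-supplied base domain such as the hypothetical `trains.example.com` (neither is prescribed here); no port numbers are added, since in a sub-domain setup the port is inferred from the scheme:

```bash
#!/usr/bin/env bash
# Sketch: print the web, API and file-server URLs for a sub-domain deployment.
# Assumptions (not from the README): https by default, base domain passed in.
trains_subdomain_urls() {
  local base="$1" scheme="${2:-https}"
  # No ports: in a sub-domain configuration the port follows from the scheme.
  echo "web_server: ${scheme}://app.${base}"
  echo "api_server: ${scheme}://api.${base}"
  echo "files_server: ${scheme}://files.${base}"
}
```

For example, `trains_subdomain_urls trains.example.com` prints one line per service, in the same key names the client configuration uses.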

## Launching trains-server

## Install / Upgrade - AWS <a name="aws"></a>
### Prerequisites

Use one of our pre-installed Amazon Machine Images for easy deployment in AWS.

For details and instructions, see [TRAINS-server: AWS pre-installed images](docs/install_aws.md).

## Docker Installation - Linux, macOS, and Windows <a name="installation"></a>

Use our pre-built Docker image for easy deployment in Linux and macOS. <br>
For [Windows](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#docker_compose_win10), please see detailed docker-compose installation instructions on our [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#docker_compose_win10).<br>
Latest docker images can be found [here](https://hub.docker.com/r/allegroai/trains).

1. Setup Docker (docker-compose installation details: [Ubuntu](docs/faq.md#ubuntu) / [macOS](docs/faq.md#mac-osx))

<details>
<summary>Make sure ports 8080/8081/8008 are available for the TRAINS-server services:</summary>
The ports 8080/8081/8008 must be available for the **trains-server** services.

For example, to see if port `8080` is in use:
For example, to see if port `8080` is in use:

```bash
$ sudo lsof -Pn -i4 | grep :8080 | grep LISTEN
```
* Linux or macOS:

sudo lsof -Pn -i4 | grep :8080 | grep LISTEN

* Windows:

netstat -an |find /i "8080"

### Launching

</details>
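
The check above handles one port at a time. The sketch below loops over all three ports in one go. It swaps `lsof` for bash's `/dev/tcp` pseudo-device, which assumes the script runs under bash rather than `sh`: a successful TCP connect to `127.0.0.1` is taken to mean that something is already listening. It will not notice a service bound only to a non-loopback address, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: report whether the trains-server ports are free on localhost.
# Assumption: bash (the /dev/tcp redirection is a bash feature).

port_in_use() {
  local port="$1"
  # Try to open fd 3 to localhost:$port in a subshell; success means a listener answered.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

check_trains_ports() {
  local busy=0 port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "port ${port}: IN USE"
      busy=1
    else
      echo "port ${port}: free"
    fi
  done
  # Non-zero exit status if any port is taken.
  return "$busy"
}
```

Usage: `check_trains_ports 8080 8081 8008`.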

Increase vm.max_map_count for `ElasticSearch` docker
Launch **trains-server** in any of the following formats:

- Linux
```bash
$ echo "vm.max_map_count=262144" > /tmp/99-trains.conf
$ sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
$ sudo sysctl -w vm.max_map_count=262144
$ sudo service docker restart
```

- macOS
```bash
$ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
$ sysctl -w vm.max_map_count=262144
```
- Pre-built [AWS EC2 AMI](https://github.com/allegroai/trains-server/blob/master/docs/install_aws.md)
- Pre-built Docker Image
    - [Linux](https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md)
    - [macOS](https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md)
    - [Windows 10](https://github.com/allegroai/trains-server/blob/master/docs/install_win.md)
- Kubernetes
    - [Kubernetes Helm](https://github.com/allegroai/trains-server-helm#prerequisites)
    - Manual [Kubernetes installation](https://github.com/allegroai/trains-server-k8s#prerequisites)

1. Create local directories for the databases and storage.
## Connecting Trains to your trains-server

```bash
$ sudo mkdir -p /opt/trains/data/elastic
$ sudo mkdir -p /opt/trains/data/mongo/db
$ sudo mkdir -p /opt/trains/data/mongo/configdb
$ sudo mkdir -p /opt/trains/data/redis
$ sudo mkdir -p /opt/trains/logs
$ sudo mkdir -p /opt/trains/data/fileserver
$ sudo mkdir -p /opt/trains/config
```

Set folder permissions

- Linux
```bash
$ sudo chown -R 1000:1000 /opt/trains
```
- macOS
```bash
$ sudo chown -R $(whoami):staff /opt/trains
```

1. Download the `docker-compose.yml` file, either download [manually](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml) or execute:

```bash
$ curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
```

1. Launch the Docker containers <a name="launch-docker"></a>

```bash
$ docker-compose -f docker-compose.yml up
```

1. Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:

* Web server on port `8080`
* API server on port `8008`
* File server on port `8081`

**\* If something went wrong along the way, check our FAQ: [Docker Setup](docs/docker_setup.md#setup-docker), [Ubuntu Support](docs/faq.md#ubuntu), [macOS Support](docs/faq.md#mac-osx)**

## Optional Configuration

The **trains-server** default configuration can be easily overridden using external configuration files. By default, the server will look for these files in `/opt/trains/config`.

In order to apply the new configuration, you must restart the server (see [Restarting trains-server](#restart-server)).

### Adding Web Login Authentication

By default anyone can login to the **trains-server** Web-App.
You can configure the **trains-server** to allow only a specific set of users to access the system.

Enable this feature by placing `apiserver.conf` file under `/opt/trains/config`.

Sample `apiserver.conf` configuration file can be found [here](https://github.com/allegroai/trains-server/blob/master/docs/apiserver.conf)

To apply the changes, you must [restart the *trains-server*](#restart-server).

### Configuring the Non-Responsive Experiments Watchdog

The non-responsive experiment watchdog, monitors experiments that were not updated for a given period of time,
and marks them as `aborted`. The watchdog is always active with a default of 7200 seconds (2 hours) of inactivity threshold.

To change the watchdog's timeouts, place a `services.conf` file under `/opt/trains/config`.

Sample watchdog `services.conf` configuration file can be found [here](https://github.com/allegroai/trains-server/blob/master/docs/services.conf)

To apply the changes, you must [restart the *trains-server*](#restart-server).

### Restarting trains-server <a name="restart-server"></a>

To restart the **trains-server**, you must first stop the containers, and then restart them.
```bash
$ docker-compose down
$ docker-compose -f docker-compose.yml up
```

## Configuring **TRAINS** client

Once you have installed the **trains-server**, make sure to configure **TRAINS** [client](https://github.com/allegroai/trains)
to use your locally installed server (and not the demo server).

- Run the `trains-init` command for an interactive setup

- Or manually edit `~/trains.conf` file, making sure the `api_server` value is configured correctly, for example:
By default, the **Trains** client is set up to work with the [**Trains** demo server](https://demoapp.trains.allegro.ai/).
To have the **Trains** client use your **trains-server** instead:
- Run the `trains-init` command for an interactive setup.
- Or manually edit `~/trains.conf` file, making sure the server settings (`api_server`, `web_server`, `file_server`) are configured correctly, for example:

api {
# API server on port 8008
@@ -186,26 +87,42 @@ to use your locally installed server (and not the demo server).
files_server: "http://localhost:8081"
}

* Notice that if you setup **trains-server** in a sub-domain configuration, there is no need to specify a port number,
**Note**: If you have set up **trains-server** in a sub-domain configuration, then there is no need to specify a port number,
it will be inferred from the http/s scheme.
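
If you prefer to script the manual edit, the sketch below writes only an `api` section with the three server URLs for a given host, using the default ports listed above and plain `http`. The function name is illustrative, and the file-server key is spelled `files_server`, as in the configuration snippet. Anything else `trains-init` would generate is out of scope, so the sketch refuses to overwrite an existing file; write to a scratch path and merge by hand.

```bash
#!/usr/bin/env bash
# Sketch: write a minimal "api" section pointing the client at a self-hosted server.
# Assumptions: default ports 8008/8080/8081 and plain http, as in the example above.
write_trains_api_section() {
  local target="$1" host="${2:-localhost}"
  # Never clobber an existing configuration file.
  if [ -e "$target" ]; then
    echo "refusing to overwrite existing $target" >&2
    return 1
  fi
  cat > "$target" <<EOF
api {
    # API server on port 8008
    api_server: "http://${host}:8008"
    # Web server on port 8080
    web_server: "http://${host}:8080"
    # File server on port 8081
    files_server: "http://${host}:8081"
}
EOF
}
```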

See [Installing and Configuring TRAINS](https://github.com/allegroai/trains#configuration) for more details.
After launching the **trains-server** and configuring the **Trains** client to use the **trains-server**,
you can [use](https://github.com/allegroai/trains#using-trains) **Trains** in your experiments and view them in your **trains-server** web server,
for example http://localhost:8080.
For more information about the Trains client, see [**Trains**](https://github.com/allegroai/trains).

## What next?
## Advanced Functionality

Now that the **trains-server** is installed, and TRAINS is configured to use it,
you can [use](https://github.com/allegroai/trains#using-trains) TRAINS in your experiments and view them in the web server,
for example http://localhost:8080
**trains-server** provides a few additional useful features, which can be manually enabled:

* [Web login authentication](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#web-auth)
* [Non-responsive experiments watchdog](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#watchdog-the-non-responsive-task-watchdog-settings)

## Restarting trains-server

To restart the **trains-server**, you must first stop the containers, and then restart them.

```bash
docker-compose down
docker-compose -f docker-compose.yml up
```

## Upgrading <a name="upgrade"></a>

We are constantly updating, improving and adding to the **trains-server**.
New releases will include new pre-built Docker images.
When we release a new version and include a new pre-built Docker image for it, upgrade as follows:
**trains-server** releases are also reflected in the [docker compose configuration file](https://github.com/allegroai/trains-server/blob/master/docker-compose.yml).
We strongly encourage you to keep your **trains-server** up to date, by keeping up with the current release.

**Note**: The following upgrade instructions use the Linux OS as an example.

To upgrade your existing **trains-server** deployment:

1. Shut down the docker containers
```bash
$ docker-compose down
docker-compose down
```

1. We highly recommend backing up your data directory before upgrading.
@@ -213,7 +130,7 @@ When we release a new version and include a new pre-built Docker image for it, u
Assuming your data directory is `/opt/trains`, to archive all data into `~/trains_backup.tgz` execute:

```bash
$ sudo tar czvf ~/trains_backup.tgz /opt/trains/data
sudo tar czvf ~/trains_backup.tgz /opt/trains/data
```

<details>
@@ -221,29 +138,29 @@ When we release a new version and include a new pre-built Docker image for it, u

To restore this example backup, execute:
```bash
$ sudo rm -R /opt/trains/data
$ sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
sudo rm -R /opt/trains/data
sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
```
</details>
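
A caveat worth checking before relying on the restore step: `tar` strips the leading `/` by default, so the archive created above stores its members as `opt/trains/data/...`. Extracting it with `-C /opt/trains/data` would nest them under that directory, and after `rm -R` the directory no longer exists, so `tar` stops with an error; extracting with `-C /` puts them back at their original paths. The sketch below avoids the question by archiving relative to the data directory's parent. The function names are illustrative, and archives it writes (members start with `data/`) are not interchangeable with the one created above. For `/opt/trains/data` it needs root and stopped containers.

```bash
#!/usr/bin/env bash
# Sketch: back up and restore a data directory with predictable archive paths.
# Members are stored relative to the parent directory, e.g. "data/mongo/...".

backup_trains_data() {
  local data_dir="$1" archive="$2"
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
}

restore_trains_data() {
  local data_dir="$1" archive="$2"
  # Remove the current copy, then unpack into the parent so paths line up again.
  rm -rf "$data_dir"
  tar -xzf "$archive" -C "$(dirname "$data_dir")"
}
```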

1. Download the latest `docker-compose.yml` file, either [manually](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml) or execute:
1. Download the latest `docker-compose.yml` file.

```bash
$ curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
```

1. Spin up the docker containers, it will automatically pull the latest trains-server build
1. Spin up the docker containers, it will automatically pull the latest **trains-server** build
```bash
$ docker-compose -f docker-compose.yml pull
$ docker-compose -f docker-compose.yml up
docker-compose -f docker-compose.yml pull
docker-compose -f docker-compose.yml up
```

**\* If something went wrong along the way, check our FAQ: [Docker Upgrade](docs/docker_setup.md#common-docker-upgrade-errors)**
**\* If something went wrong along the way, check our FAQ: [Common Docker Upgrade Errors](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#common-docker-upgrade-errors).**
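
The upgrade steps above can be collected into a single script. This is a sketch, not a tool shipped with **trains-server**: the `run` helper and the `DRY_RUN` switch are hypothetical additions that print each command instead of executing it, so you can review the sequence before touching a running server. It assumes `/opt/trains` as the data directory and adds `curl -f`, so an HTTP error fails the step instead of saving an error page as `docker-compose.yml`.

```bash
#!/usr/bin/env bash
# Sketch of the upgrade sequence: stop, back up, fetch compose file, pull, start.
COMPOSE_URL="https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_trains_server() {
  # Stop at the first failing step, so a failed backup never leads to an upgrade.
  run docker-compose down || return 1
  run sudo tar czvf "$HOME/trains_backup.tgz" /opt/trains/data || return 1
  run curl -f "$COMPOSE_URL" -o docker-compose.yml || return 1
  run docker-compose -f docker-compose.yml pull || return 1
  run docker-compose -f docker-compose.yml up
}
```

Preview with `DRY_RUN=1 upgrade_trains_server`, then run `upgrade_trains_server` from the directory holding your `docker-compose.yml`.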

## Community & Support

If you have any questions, look to the TRAINS-server [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md), or
If you have any questions, look to the Trains server [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md), or
tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/trains) with '**trains**' tag.

For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains-server/issues).

@@ -20,9 +20,12 @@ services:
- mongo
- elasticsearch
environment:
ELASTIC_SERVICE_HOST: elasticsearch
MONGODB_SERVICE_HOST: mongo
REDIS_SERVICE_HOST: redis
TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
TRAINS_ELASTIC_SERVICE_PORT: 9200
TRAINS_MONGODB_SERVICE_HOST: mongo
TRAINS_MONGODB_SERVICE_PORT: 27017
TRAINS_REDIS_SERVICE_HOST: redis
TRAINS_REDIS_SERVICE_PORT: 6379
networks:
- backend
elasticsearch:

@@ -16,9 +16,12 @@ services:
- elasticsearch
- fileserver
environment:
ELASTIC_SERVICE_HOST: elasticsearch
MONGODB_SERVICE_HOST: mongo
REDIS_SERVICE_HOST: redis
TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
TRAINS_ELASTIC_SERVICE_PORT: 9200
TRAINS_MONGODB_SERVICE_HOST: mongo
TRAINS_MONGODB_SERVICE_PORT: 27017
TRAINS_REDIS_SERVICE_HOST: redis
TRAINS_REDIS_SERVICE_PORT: 6379
ports:
- "8008:8008"
networks:
@@ -114,4 +117,4 @@ networks:
driver: bridge

volumes:
mongodata:
mongodata:

@@ -16,9 +16,14 @@ services:
- elasticsearch
- fileserver
environment:
ELASTIC_SERVICE_HOST: elasticsearch
MONGODB_SERVICE_HOST: mongo
REDIS_SERVICE_HOST: redis
TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
TRAINS_ELASTIC_SERVICE_PORT: 9200
TRAINS_MONGODB_SERVICE_HOST: mongo
TRAINS_MONGODB_SERVICE_PORT: 27017
TRAINS_REDIS_SERVICE_HOST: redis
TRAINS_REDIS_SERVICE_PORT: 6379
TRAINS__apiserver__mongo__pre_populate__enabled: "true"
TRAINS__apiserver__mongo__pre_populate__zip_file: "/opt/trains/db-pre-populate/export.zip"
ports:
- "8008:8008"
networks:
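
The `TRAINS__apiserver__mongo__pre_populate__...` variables in the last hunk follow a different pattern from the `*_SERVICE_HOST` and `*_SERVICE_PORT` ones: the double underscores appear to separate nested configuration keys. Assuming that reading (this compare view does not spell it out), the sketch below converts such a variable name into the dotted key path it would override; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: map a TRAINS__a__b__c environment variable name to the key path a.b.c.
# Assumption: "__" separates nesting levels after the TRAINS__ prefix;
# single underscores (as in pre_populate) are part of a key name.
trains_env_to_key() {
  local name="$1"
  case "$name" in
    TRAINS__*) ;;
    *) return 1 ;;  # not an override of this form
  esac
  printf '%s\n' "${name#TRAINS__}" | sed 's/__/./g'
}
```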

@@ -1,5 +1,5 @@
auth {
# Fixed users login credetials
# Fixed users login credentials
# No other user will be able to login
fixed_users {
enabled: true

@@ -1,166 +0,0 @@
# TRAINS-server: Using Docker Pre-Built Images

The pre-built Docker image for the **trains-server** is the quickest way to get started with your own **TRAINS** server.

You can also build the entire **trains-server** architecture using the code available in the [trains-server](https://github.com/allegroai/trains-server) repository.

**Note**: We tested this pre-built Docker image with Linux, only. For Windows users, we recommend installing the pre-built image on a Linux virtual machine.

## Prerequisites

* You must be logged in as a user with sudo privileges
* Use `bash` for all command-line instructions in this installation

## Setup Docker

### Step 1: Install Docker CE

You must first install Docker. For instructions about installing Docker, see [Supported platforms](https://docs.docker.com/install//#support) in the Docker documentation.

For example, to [install in Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/) / Mint (x86_64/amd64):

```bash
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
. /etc/os-release
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $UBUNTU_CODENAME stable"
sudo apt-get update
sudo apt-get install -y docker-ce
```

### Step 2: Set the Maximum Number of Memory Map Areas

Elastic requires that the `vm.max_map_count` kernel setting, which is the maximum number of memory map areas a process can use, is set to at least 262144.

For CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19.x, we tested the following commands to set `vm.max_map_count`:

```bash
echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144
```

For information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.
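
Before restarting Docker, you may want to confirm that the setting took effect. This Linux-only sketch reads the current value from `/proc/sys/vm/max_map_count` and compares it with the 262144 minimum stated above. The function name is illustrative, and the file path is a parameter only so the check can be exercised against a test file.

```bash
#!/usr/bin/env bash
# Minimum vm.max_map_count required by Elasticsearch, per the step above.
REQUIRED_MAX_MAP_COUNT=262144

# Returns 0 if the limit is high enough, 1 if it is too low,
# and 2 if the value could not be read or is not a plain number.
max_map_count_ok() {
  local file="${1:-/proc/sys/vm/max_map_count}" current
  current=$(cat "$file" 2>/dev/null) || return 2
  case "$current" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$current" -ge "$REQUIRED_MAX_MAP_COUNT" ] || return 1
}
```

Usage: `max_map_count_ok && echo "vm.max_map_count is OK"`.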

### Step 3: Restart the Docker daemon

Restart the Docker daemon.

```bash
sudo service docker restart
```

### Step 4: Choose a Data Directory

Choose a directory on your system in which all data maintained by the **trains-server** is stored.
Create this directory, and set its owner and group to `uid` 1000. The data stored in this directory includes the database, uploaded files and logs.

For example, if your data directory is `/opt/trains`, then use the following command:

```bash
sudo mkdir -p /opt/trains/data/elastic
sudo mkdir -p /opt/trains/data/mongo/db
sudo mkdir -p /opt/trains/data/mongo/configdb
sudo mkdir -p /opt/trains/data/redis
sudo mkdir -p /opt/trains/logs
sudo mkdir -p /opt/trains/data/fileserver
sudo mkdir -p /opt/trains/config

sudo chown -R 1000:1000 /opt/trains
```
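
The same layout can be created from a single list, which keeps the subdirectory names in one place when your data directory is not `/opt/trains`. A sketch with a hypothetical function name: it only creates the directories, so the `chown` above still applies, and for `/opt/trains` it must run from a root shell.

```bash
#!/usr/bin/env bash
# Subdirectories used by the containers, relative to the data directory.
TRAINS_SUBDIRS="data/elastic data/mongo/db data/mongo/configdb data/redis logs data/fileserver config"

create_trains_dirs() {
  local base="${1:-/opt/trains}" sub
  # Unquoted on purpose: split the list on spaces.
  for sub in $TRAINS_SUBDIRS; do
    mkdir -p "${base}/${sub}" || return 1
  done
}
```

Usage: `create_trains_dirs /srv/trains`, then `chown -R 1000:1000 /srv/trains`.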

## TRAINS-server: Manually Launching Docker Containers <a name="launch"></a>

You can manually launch the Docker containers using the following commands.

If your data directory is not `/opt/trains`, then in the five `docker run` commands below, you must replace all occurrences of `/opt/trains` with your data directory path.

1. Launch the **trains-elastic** Docker container.

sudo docker run -d --restart="always" --name="trains-elastic" -e "bootstrap.memory_lock=true" --ulimit memlock=-1:-1 -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16

1. Launch the **trains-mongo** Docker container.

sudo docker run -d --restart="always" --name="trains-mongo" -v /opt/trains/data/mongo/db:/data/db -v /opt/trains/data/mongo/configdb:/data/configdb --network="host" mongo:3.6.5

1. Launch the **trains-redis** Docker container.

sudo docker run -d --restart="always" --name="trains-redis" -v /opt/trains/data/redis:/data --network="host" redis:5.0

1. Launch the **trains-fileserver** Docker container.

sudo docker run -d --restart="always" --name="trains-fileserver" --network="host" -v /opt/trains/logs:/var/log/trains -v /opt/trains/data/fileserver:/mnt/fileserver allegroai/trains:latest fileserver

1. Launch the **trains-apiserver** Docker container.

sudo docker run -d --restart="always" --name="trains-apiserver" --network="host" -v /opt/trains/logs:/var/log/trains -v /opt/trains/config:/opt/trains/config allegroai/trains:latest apiserver

1. Launch the **trains-webserver** Docker container.

sudo docker run -d --restart="always" --name="trains-webserver" -p 8080:80 allegroai/trains:latest webserver

1. Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:

* API server on port `8008`
* Web server on port `8080`
* File server on port `8081`

## Manually Upgrading TRAINS-server Containers <a name="upgrade"></a>

We are constantly updating, improving and adding to the **trains-server**.
New releases will include new pre-built Docker images.
When we release a new version and include a new pre-built Docker image for it, upgrade as follows:

1. Shut down and remove each of your Docker instances using the following commands:

```bash
$ sudo docker stop <docker-name>
$ sudo docker rm -v <docker-name>
```

The Docker names are (see [Launching Docker Containers](#launch-docker)):

* `trains-elastic`
* `trains-mongo`
* `trains-redis`
* `trains-fileserver`
* `trains-apiserver`
* `trains-webserver`

2. We highly recommend backing up your data directory!. A simple way to do that is using `tar`:

For example, if your data directory is `/opt/trains`, use the following command:

```bash
$ sudo tar czvf ~/trains_backup.tgz /opt/trains/data
```
This backups all data to an archive in your home directory.

To restore this example backup, use the following command:
```bash
$ sudo rm -R /opt/trains/data
$ sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
```

3. Pull the new **trains-server** docker image using the following command:

```bash
$ sudo docker pull allegroai/trains:latest
```

If you wish to pull a different version, replace `latest` with the required version number, for example:
```bash
$ sudo docker pull allegroai/trains:0.11.0
```

4. Launch the newly released Docker image (see [Launching Docker Containers](#trains-server-manually-launching-docker-containers-)).
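
Step 1 above repeats the same two commands for each of the six container names. A sketch of that loop follows; the function name and the `DRY_RUN` switch are hypothetical additions, the latter printing the commands instead of running them. `docker rm -v` also removes anonymous volumes attached to a container, while the bind-mounted data under `/opt/trains` is left in place.

```bash
#!/usr/bin/env bash
# Container names from the list in step 1.
TRAINS_CONTAINERS="trains-elastic trains-mongo trains-redis trains-fileserver trains-apiserver trains-webserver"

remove_trains_containers() {
  local name
  for name in $TRAINS_CONTAINERS; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "sudo docker stop $name"
      echo "sudo docker rm -v $name"
    else
      # Keep going if a container is already stopped or removed.
      sudo docker stop "$name" || true
      sudo docker rm -v "$name" || true
    fi
  done
}
```

Preview with `DRY_RUN=1 remove_trains_containers`.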

#### Common Docker Upgrade Errors

* In case of a docker error: "... The container name "/trains-???" is already in use by ..."
Try removing deprecated images with:
```bash
$ docker rm -f $(docker ps -a -q)
```

354
docs/faq.md
@@ -1,77 +1,122 @@
|
||||
# TRAINS-server FAQ
|
||||
# trains-server FAQ
|
||||
|
||||
* [Deploying trains-server on Kubernetes clusters](#kubernetes)
|
||||
Launching **trains-server**
|
||||
|
||||
* [Creating a Helm Chart for trains-server Kubernetes deployment](#helm)
|
||||
* How do I launch **trains-server** on:
|
||||
|
||||
* [Running trains-server on Mac OS X](#mac-osx)
|
||||
* [Stand alone Linux Ubuntu systems?](#ubuntu)
|
||||
|
||||
* [macOS?](#mac-osx)
|
||||
|
||||
* [Windows 10?](#docker_compose_win10)
|
||||
|
||||
* [Running trains-server on Windows 10](#docker_compose_win10)
|
||||
* [How do I restart trains-server?](#restart)
|
||||
|
||||
* [Installing trains-server on stand alone Linux Ubuntu systems ](#ubuntu)
|
||||
Kubernetes
|
||||
|
||||
* [Resolving port conflicts preventing fixed users mode authentication and login](#port-conflict)
|
||||
* [Can I deploy trains-server on Kubernetes clusters?](#kubernetes)
|
||||
|
||||
* [Configuring trains-server for sub-domains and load balancers](#sub-domains)
|
||||
* [Can I create a Helm Chart for trains-server Kubernetes deployment?](#helm)
|
||||
|
||||
Configuration
|
||||
|
||||
### Deploying trains-server on Kubernetes clusters <a name="kubernetes"></a>
|
||||
* [How do I configure trains-server for sub-domains and load balancers?](#sub-domains)
|
||||
|
||||
**trains-server** supports Kubernetes. See [trains-server-k8s](https://github.com/allegroai/trains-server-k8s)
|
||||
which contains the YAML files describing the required services and detailed instructions for deploying
|
||||
**trains-server** to a Kubernetes clusters.
|
||||
* [Can I add web login authentication to trains-server?](#web-auth)
|
||||
|
||||
### Creating a Helm Chart for trains-server Kubernetes deployment <a name="helm"></a>
|
||||
* [Can I modify the non-responsive experiment watchdog settings?](#watchdog)
|
||||
|
||||
**trains-server** supports creating a Helm chart for Kubernetes deployment. See [trains-server-helm](https://github.com/allegroai/trains-server-helm)
|
||||
which you can use to create a Helm chart for **trains-server** and contains detailed instructions for deploying
|
||||
**trains-server** to a Kubernetes clusters using Helm.
|
||||
Troubleshooting
|
||||
|
||||
### Running trains-server on Mac OS X <a name="mac-osx"></a>
|
||||
* [How do I fix Docker upgrade errors?](#common-docker-upgrade-errors)
|
||||
|
||||
To install and configure **trains-server** on Mac OS X, follow the steps below.
|
||||
* [Why is web login authentication not working?](#port-conflict)
|
||||
|
||||
1. Install [docker for OS X](https://docs.docker.com/docker-for-mac/install/).
|
||||
## Launching **trains-server**
|
||||
|
||||
1. Configure [Docker](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode).
### How do I launch trains-server on standalone Linux Ubuntu systems? <a name="ubuntu"></a>

$ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
$ sysctl -w vm.max_map_count=262144
To launch **trains-server** on a standalone Linux Ubuntu system:

1. Install [docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).

1. Install `docker-compose` using the following commands (for more detailed information, see [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation):

sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

1. Remove the previous installation of **trains-server**.

**WARNING**: This clears all existing **Trains** databases.

sudo rm -R /opt/trains/

1. Create local directories for the databases and storage.

$ sudo mkdir -p /opt/trains/data/elastic
$ sudo mkdir -p /opt/trains/data/mongo/db
$ sudo mkdir -p /opt/trains/data/mongo/configdb
$ sudo mkdir -p /opt/trains/data/redis
$ sudo mkdir -p /opt/trains/logs
$ sudo mkdir -p /opt/trains/config
$ sudo mkdir -p /opt/trains/data/fileserver
$ sudo chown -R $(whoami):staff /opt/trains
sudo mkdir -p /opt/trains/data/elastic
sudo mkdir -p /opt/trains/data/mongo/db
sudo mkdir -p /opt/trains/data/mongo/configdb
sudo mkdir -p /opt/trains/logs
sudo mkdir -p /opt/trains/config
sudo mkdir -p /opt/trains/data/fileserver
sudo chown -R 1000:1000 /opt/trains

1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.

git clone https://github.com/allegroai/trains-server.git
cd trains-server

1. Run `docker-compose`

/usr/local/bin/docker-compose -f docker-compose.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080)

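The directory-creation step above can be collected into a single shell function. This is a minimal sketch, not part of **trains-server**: the name `create_trains_dirs` is hypothetical, it takes the root directory as an argument (defaulting to `/opt/trains`) so it can be tried against a scratch directory first, and it also creates `data/redis`, which the Linux/macOS install steps list.

```bash
# Hypothetical helper: create the trains-server data, log and config
# directories under a root directory (default: /opt/trains).
create_trains_dirs() {
  local root="${1:-/opt/trains}"
  local sub
  for sub in data/elastic data/mongo/db data/mongo/configdb data/redis \
             logs config data/fileserver; do
    mkdir -p "$root/$sub" || return 1
  done
}
```

For example, after saving it as `trains_dirs.sh` (a hypothetical file name), run `sudo bash -c 'source ./trains_dirs.sh && create_trains_dirs'` and then the `chown` command from the steps above.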
### How do I launch trains-server on macOS? <a name="mac-osx"></a>

To launch **trains-server** on macOS:

1. Install [docker for macOS](https://docs.docker.com/docker-for-mac/install/).

1. Configure [Docker](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode).

screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
sysctl -w vm.max_map_count=262144

1. Create local directories for the databases and storage.

sudo mkdir -p /opt/trains/data/elastic
sudo mkdir -p /opt/trains/data/mongo/db
sudo mkdir -p /opt/trains/data/mongo/configdb
sudo mkdir -p /opt/trains/data/redis
sudo mkdir -p /opt/trains/logs
sudo mkdir -p /opt/trains/config
sudo mkdir -p /opt/trains/data/fileserver
sudo chown -R $(whoami):staff /opt/trains

1. Open the Docker app, select **Preferences**, and then on the **File Sharing** tab, add `/opt/trains`.

1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.

$ git clone https://github.com/allegroai/trains-server.git
$ cd trains-server
git clone https://github.com/allegroai/trains-server.git
cd trains-server

1. Run `docker-compose` with the unified docker image.
1. Run `docker-compose` with the docker compose file.

$ docker-compose -f docker-compose-unified.yml up
docker-compose -f docker-compose.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080)

### Running trains-server on Windows 10 <a name="docker_compose_win10"></a>
### How do I launch trains-server on Windows 10? <a name="docker_compose_win10"></a>

You can run **trains-server** on Windows 10 using Docker Desktop for Windows (see the Docker [System Requirements](https://docs.docker.com/docker-for-windows/install/#system-requirements)).

To run **trains-server** on Windows 10, follow the steps below.
To launch **trains-server** on Windows 10:

1. Install the Docker Desktop for Windows application by either:

* Following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
* Running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).
* following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
* running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).

1. Increase the memory allocation in Docker Desktop to `4GB`.

@@ -83,110 +128,46 @@ To run **trains-server** on Windows 10, follow the steps below.

1. Create local directories for data and logs. Open PowerShell and execute the following commands:

mkdir c:\opt\trains\logs
mkdir c:\opt\trains\config
cd c:
mkdir c:\opt\trains\data
mkdir c:\opt\trains\data\elastic
mkdir c:\opt\trains\data\redis
mkdir c:\opt\trains\data\fileserver
mkdir c:\opt\trains\logs

1. Save the **trains-server** docker-compose YAML file [docker-compose-win10.yml](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml) as `c:\opt\trains\docker-compose.yml`.
1. Download the **trains-server** docker-compose YAML file [docker-compose-win10.yml](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml) as `c:\opt\trains\docker-compose.yml`.

1. Run `docker-compose`. In PowerShell, execute the following commands:

cd c:\opt\trains\
docker-compose up
docker-compose -f docker-compose-win10.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080)

### Installing trains-server on standalone Linux Ubuntu systems <a name="ubuntu"></a>
### How do I restart trains-server? <a name="restart"></a>

To install **trains-server** on a standalone Linux Ubuntu system, follow the steps below.
Restart **trains-server** by first stopping the Docker containers and then restarting them.

1. Install [docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
```bash
docker-compose down
docker-compose -f docker-compose.yml up
```

**Note**: If you are using a different docker-compose YAML file, specify that file with the `-f` option in both commands.

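The restart commands above can be wrapped so that the same compose file is always passed to both steps. A minimal sketch: `restart_trains` is a hypothetical name, and unlike the commands above it starts the containers detached with `up -d`, so use `docker-compose logs -f` to follow their output.

```bash
# Hypothetical helper: stop and start trains-server using one compose file.
restart_trains() {
  local compose_file="${1:-docker-compose.yml}"
  # Stop and remove the containers defined in the compose file.
  docker-compose -f "$compose_file" down || return 1
  # Recreate them in the background.
  docker-compose -f "$compose_file" up -d
}
```

For example, `restart_trains /opt/trains/docker-compose.yml` when the compose file lives outside the current directory.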
1. Install `docker-compose` using the following commands (for more detailed information, see the [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation):
## Kubernetes

sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
### Can I deploy trains-server on Kubernetes clusters? <a name="kubernetes"></a>

1. Remove the previous installation of **trains-server**.
**trains-server** supports Kubernetes. See [trains-server-k8s](https://github.com/allegroai/trains-server-k8s)
which contains the YAML files describing the required services and detailed instructions for deploying
**trains-server** to a Kubernetes cluster.

**WARNING**: This clears all existing **TRAINS** databases.
### Can I create a Helm Chart for trains-server Kubernetes deployment? <a name="helm"></a>

$ sudo rm -R /opt/trains/
**trains-server** supports creating a Helm chart for Kubernetes deployment. See [trains-server-helm](https://github.com/allegroai/trains-server-helm)
which you can use to create a Helm chart for **trains-server** and contains detailed instructions for deploying
**trains-server** to a Kubernetes cluster using Helm.

1. Create local directories for the databases and storage.
## Configuration

$ sudo mkdir -p /opt/trains/data/elastic
$ sudo mkdir -p /opt/trains/data/mongo/db
$ sudo mkdir -p /opt/trains/data/mongo/configdb
$ sudo mkdir -p /opt/trains/logs
$ sudo mkdir -p /opt/trains/config
$ sudo mkdir -p /opt/trains/data/fileserver
$ sudo chown -R 1000:1000 /opt/trains

1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.

$ git clone https://github.com/allegroai/trains-server.git
$ cd trains-server

1. Run `docker-compose`

$ /usr/local/bin/docker-compose -f docker-compose.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080)

### Resolving port conflicts preventing fixed users mode authentication and login <a name="port-conflict"></a>

A port conflict may occur between the **trains-server** MongoDB and Elastic instances and other
instances running on your system. **trains-server** uses the following default ports which may be in conflict with other instances:

* MongoDB port `27017`
* Elastic port `9200`

You can check for port conflicts in the logs in `/opt/trains/logs`.

If a port conflict occurs, first change the port in your **trains-server** `/opt/trains/server/config/default/hosts.conf` file to the new port, and then
run the `docker run` command, passing the new port with the `--port` option, to restart the **trains-server** instance.

For example, to resolve a MongoDB port conflict, change port `27017` to `27018`:

1. Modify `/opt/trains/server/config/default/hosts.conf`, changing the ports in the `mongo` section:

elastic {
    events {
        hosts: [{host: "127.0.0.1", port: 9200}]
        args {
            timeout: 60
            dead_timeout: 10
            max_retries: 5
            retry_on_timeout: true
        }
        index_version: "1"
    }
}

mongo {
    backend {
        host: "mongodb://127.0.0.1:27018/backend"
    }
    auth {
        host: "mongodb://127.0.0.1:27018/auth"
    }
}

2. Start the **trains-server** MongoDB container using `--port 27018`.

sudo docker run -d --restart="always" --name="trains-mongo" -v /opt/trains/data/mongo/db:/data/db -v /opt/trains/data/mongo/configdb:/data/configdb --network="host" mongo:3.6.5 mongod --port 27018

In a future version of **trains-server**, you will be able to set environment variables when starting the API server instead of modifying the configuration file (Step 1 above).
These environment variables will set different ports for the MongoDB and Elastic instances:

* `MONGODB_SERVICE_PORT` (e.g., `MONGODB_SERVICE_PORT=27018`)
* `ELASTIC_SERVICE_PORT` (e.g., `ELASTIC_SERVICE_PORT=9201`)

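To act on the log check above, the logs directory can be searched for failed port binds. A minimal sketch, assuming a port conflict shows up as the usual operating-system message "Address already in use"; MongoDB or Elasticsearch may word it differently, so adjust the pattern if nothing matches. `find_port_conflicts` is a hypothetical name.

```bash
# Hypothetical helper: list log lines that look like a failed port bind.
# The "address already in use" pattern is an assumption (the usual OS error
# text), not a message documented by trains-server.
find_port_conflicts() {
  local log_dir="${1:-/opt/trains/logs}"
  # -r: recurse, -I: skip binary files, -i: ignore case, -n: line numbers.
  grep -rIin "address already in use" "$log_dir"
}
```

The function's exit status follows `grep`: 0 when a match is found, 1 when none is found.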
### Configuring trains-server for sub-domains and load balancers <a name="sub-domains"></a>
### How do I configure trains-server for sub-domains and load balancers? <a name="sub-domains"></a>

You can configure **trains-server** for sub-domains and a load balancer.

@@ -222,3 +203,126 @@ For example, if your domain is `trains.mydomain.com` and your sub-domains are `a

1. Run the Docker containers with our updated `docker run` commands (see [Launching Docker Containers](https://github.com/allegroai/trains-server#launching-docker-containers)).

### Can I add web login authentication to trains-server? <a name="web-auth"></a>

By default, anyone can log in to the **trains-server** Web-App.
You can configure **trains-server** to allow only a specific set of users to access the system.

To add web login authentication to **trains-server**:

1. If you are not using the current **trains-server** version, then [upgrade](https://github.com/allegroai/trains-server#upgrade).

1. In `/opt/trains/config/apiserver.conf`, add the `auth` section and in it specify the users, for example:

**Note**: A sample `apiserver.conf` configuration file is also available [here](https://github.com/allegroai/trains-server/blob/master/docs/apiserver.conf).

auth {
    # Fixed users login credentials
    # No other user will be able to login
    fixed_users {
        enabled: true
        users: [
            {
                username: "jane"
                password: "12345678"
                name: "Jane Doe"
            },
            {
                username: "john"
                password: "12345678"
                name: "John Doe"
            },
        ]
    }
}

1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).

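Typing the `fixed_users` block by hand gets error-prone with more than a few users. The sketch below, a hypothetical helper `make_fixed_users_conf`, prints an `auth` section in the same shape as the example above from `username:password:Full Name` arguments. It does no escaping, so values must not contain `:`, `"` or `\`, and, as in the example, passwords end up in plain text in `apiserver.conf`.

```bash
# Hypothetical helper: print an `auth` section for apiserver.conf from
# "username:password:Full Name" arguments (one argument per user).
# No escaping is done, so values must not contain ':', '"' or '\'.
make_fixed_users_conf() {
  local entry username password name
  printf 'auth {\n    fixed_users {\n        enabled: true\n        users: [\n'
  for entry in "$@"; do
    # Split on ':' into the three fields; the name keeps its spaces.
    IFS=: read -r username password name <<< "$entry"
    printf '            {\n'
    printf '                username: "%s"\n' "$username"
    printf '                password: "%s"\n' "$password"
    printf '                name: "%s"\n' "$name"
    printf '            },\n'
  done
  printf '        ]\n    }\n}\n'
}
```

For example, `make_fixed_users_conf "jane:12345678:Jane Doe" "john:12345678:John Doe"` prints the two users shown above; review the output before adding it to `/opt/trains/config/apiserver.conf`.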
### Can I modify the experiment watchdog settings? <a name="watchdog"></a>

The non-responsive experiment watchdog monitors experiments that were not updated for a specified period of time
and marks them as `aborted`. The watchdog is always active.

You can modify the following settings for the watchdog:

* the time threshold (in seconds) of experiment inactivity (default: 7200 seconds, i.e., 2 hours)
* the time interval (in seconds) between watchdog cycles

To change the watchdog's settings:

1. In `/opt/trains/config`, add the `services.conf` file and in it specify the watchdog settings, for example:

**Note**: A sample watchdog `services.conf` configuration file is also available [here](https://github.com/allegroai/trains-server/blob/master/docs/services.conf).

tasks {
    non_responsive_tasks_watchdog {
        # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
        threshold_sec: 7200

        # Watchdog will sleep for this number of seconds after each cycle
        watch_interval_sec: 900
    }
}

1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).

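Assuming the watchdog behaves as the comments in the sample describe (one check, then a sleep of `watch_interval_sec`), a stalled experiment is only caught on the first cycle after `threshold_sec` has passed. Ignoring the time a cycle itself takes, it is therefore marked `aborted` between `threshold_sec` and `threshold_sec + watch_interval_sec` seconds after its last update: with the sample values, between 7200 and 7200 + 900 = 8100 seconds. The hypothetical helper below writes `services.conf` in the sample's format and prints that upper bound. It overwrites the target file, so merge by hand if your `services.conf` already holds other settings.

```bash
# Hypothetical helper: write the watchdog settings to services.conf and
# print the approximate worst-case delay (seconds) before a stalled
# experiment is aborted.
write_watchdog_conf() {
  local threshold_sec="$1" interval_sec="$2"
  local out="${3:-/opt/trains/config/services.conf}"
  case "$threshold_sec" in ''|*[!0-9]*) echo "threshold must be whole seconds" >&2; return 1 ;; esac
  case "$interval_sec" in ''|*[!0-9]*) echo "interval must be whole seconds" >&2; return 1 ;; esac
  cat > "$out" <<EOF
tasks {
    non_responsive_tasks_watchdog {
        # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
        threshold_sec: ${threshold_sec}

        # Watchdog will sleep for this number of seconds after each cycle
        watch_interval_sec: ${interval_sec}
    }
}
EOF
  # The threshold is crossed during some sleep, so the next cycle catches it.
  echo $(( threshold_sec + interval_sec ))
}
```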
## Troubleshooting

### How do I fix Docker upgrade errors? <a name="common-docker-upgrade-errors"></a>

To resolve the Docker error "... The container name "/trains-???" is already in use by ...", try removing the old containers.
**Note**: The following command force-removes *all* containers on the machine, not only the **trains-server** containers:

docker rm -f $(docker ps -a -q)

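If unrelated containers run on the same machine, a narrower cleanup may be preferable. A minimal sketch, assuming the **trains-server** containers are the ones whose names contain `trains-` (as in the error message above); `remove_trains_containers` is a hypothetical helper built on `docker ps --filter "name=..."`, which matches all or part of a container name.

```bash
# Hypothetical helper: force-remove only containers whose names contain
# "trains-", leaving other containers on the machine alone.
remove_trains_containers() {
  local ids
  ids="$(docker ps -a -q --filter "name=trains-")"
  if [ -n "$ids" ]; then
    # $ids is intentionally unquoted: one argument per container ID.
    docker rm -f $ids
  fi
}
```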
### Why is web login authentication not working?

A port conflict between the **trains-server** MongoDB and/or Elastic instances and other
instances running on your system may prevent web login authentication
from working correctly.

**trains-server** uses the following default ports which may be in conflict with other instances:

* MongoDB port `27017`
* Elastic port `9200`

You can check for port conflicts in the logs in `/opt/trains/logs`.

If a port conflict occurs, change the MongoDB and/or Elastic ports in the `docker-compose.yml`,
and then run the Docker compose commands to restart the **trains-server** instance.

To change the MongoDB and/or Elastic ports for **trains-server**:

1. Edit the `docker-compose.yml` file.

1. In the `services/trainsserver/environment` section, add the following environment variable(s):

* For MongoDB:

MONGODB_SERVICE_PORT: <new-mongodb-port>

* For Elastic:

ELASTIC_SERVICE_PORT: <new-elasticsearch-port>

For example:

MONGODB_SERVICE_PORT: 27018
ELASTIC_SERVICE_PORT: 9201

1. For MongoDB, in the `services/mongo/ports` section, expose the new MongoDB port:

<new-mongodb-port>:27017

For example:

27018:27017

1. For Elastic, in the `services/elasticsearch/ports` section, expose the new Elastic port:

<new-elasticsearch-port>:9200

For example:

9201:9200

1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).
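The `docker-compose.yml` edits in the steps above can be previewed for a chosen pair of host ports. A minimal sketch: `compose_port_fragments` is a hypothetical helper that only prints YAML fragments to paste under the sections named in the steps, and it rejects values that are not valid TCP port numbers. The indentation is illustrative and must be matched to your file.

```bash
# Hypothetical helper: print the docker-compose.yml fragments for new host
# ports. Container-side ports stay at the defaults (27017 and 9200).
compose_port_fragments() {
  local mongo_port="${1:-27017}" elastic_port="${2:-9200}" p
  for p in "$mongo_port" "$elastic_port"; do
    # Reject anything that is not a number in the TCP port range.
    case "$p" in
      '' | *[!0-9]*) echo "invalid port: '$p'" >&2; return 1 ;;
    esac
    if [ "$p" -lt 1 ] || [ "$p" -gt 65535 ]; then
      echo "port out of range: $p" >&2
      return 1
    fi
  done
  cat <<EOF
# services/trainsserver/environment:
      MONGODB_SERVICE_PORT: ${mongo_port}
      ELASTIC_SERVICE_PORT: ${elastic_port}
# services/mongo/ports:
      - "${mongo_port}:27017"
# services/elasticsearch/ports:
      - "${elastic_port}:9200"
EOF
}
```

For example, `compose_port_fragments 27018 9201` prints the values used in the example above.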
@@ -1,32 +1,36 @@
# **TRAINS-server**: AWS pre-installed images
# Deploying **trains-server** on AWS

In order to easily deploy **trains-server** on AWS, we created the following Amazon Machine Images (AMIs).
To easily deploy **trains-server** on AWS, use one of our pre-built Amazon Machine Images (AMIs).
We provide AMIs per region for each released version of **trains-server**; see [Released versions](#released-versions) below.

Service port numbers on these AMIs are:
- Web: 8080
- API: 8008
- File Server: 8081
Once the AMI is up and running, [configure the Trains client](https://github.com/allegroai/trains/blob/master/README.md#configuration) to use your **trains-server**.
The service port numbers on our **trains-server** AMIs:

Persistent storage configuration:
- MongoDB: /opt/trains/data/mongo/
- ElasticSearch: /opt/trains/data/elastic/
- File Server: /mnt/fileserver/
- Web application: `8080`
- API Server: `8008`
- File Server: `8081`

Instructions on launching a custom AMI from the EC2 console can be found [here](https://aws.amazon.com/premiumsupport/knowledge-center/launch-instance-custom-ami/)
and a detailed version [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launching-instance.html).
The persistent storage configuration:

The minimum recommended instance type is **t3a.large**.
- MongoDB: `/opt/trains/data/mongo/`
- ElasticSearch: `/opt/trains/data/elastic/`
- File Server: `/mnt/fileserver/`

For examples and use cases, check the [Trains usage examples](https://github.com/allegroai/trains/blob/master/docs/trains_examples.md).

For instructions on launching a custom AMI from the EC2 console, see the [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/launch-instance-custom-ami/) or detailed instructions in the [AWS Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launching-instance.html).

The minimum recommended amount of RAM is 8GB. For example, **t3.large** or **t3a.large** would have the minimum recommended amount of resources.

## Upgrading

In order to upgrade **trains-server** on an existing EC2 instance based on one of these AMIs, SSH into the instance and follow the [upgrade instructions](../README.md#upgrade) for **trains-server**.
To upgrade **trains-server** on an existing EC2 instance based on one of these AMIs, SSH into the instance and follow the [upgrade instructions](../README.md#upgrade) for **trains-server**.

### Upgrading AMI's to v0.12
**Including the automatically updated AMI**
### Upgrading AMIs to v0.12

Version 0.12 introduced an additional Redis container to the trains-server setup.
This upgrade includes the automatically updated AMI in Version 0.12. It also adds a Redis container to the **trains-server** setup.

AMI upgrading instructions:
To upgrade the AMI:

1. SSH into the EC2 machine running one of the `Latest Version AMIs`
2. Execute the following bash commands
@@ -44,47 +48,85 @@ AMI upgrading instructions:

## Released versions

The following sections provide a list containing AMI Image ID per region for each released **trains-server** version.
The following sections contain lists of AMI Image IDs, per region, for each released **trains-server** version.

### Latest Version AMI <a name="autoupdate"></a>
**For easier upgrades: The following AMIs automatically update to the latest release every reboot**
### Latest version AMI - v0.14.0 (auto update)<a name="autoupdate"></a>

* **eu-north-1** : ami-055909c1b9471451d
* **ap-south-1** : ami-0476123cc77226faf
* **eu-west-3** : ami-01df7d35ab63cca70
* **eu-west-2** : ami-00e8004c11fd0228e
* **eu-west-1** : ami-04293fbba6d3acad1
* **ap-northeast-2** : ami-004331f9c5eb13e94
* **ap-northeast-1** : ami-08cc80e2049b30e61
* **sa-east-1** : ami-06d814a0b6ffa3153
* **ca-central-1** : ami-069210ff757e9c1b7
* **ap-southeast-1** : ami-0d12cc70d6e9c0f39
* **ap-southeast-2** : ami-0b4615aa76c055267
* **eu-central-1** : ami-06537f431e52e4763
* **us-east-2** : ami-0c3cfbcb8e72ecfc5
* **us-west-1** : ami-0d83de031b83b6880
* **us-west-2** : ami-06968633c4f7187c4
* **us-east-1** : ami-07ff2f5f7ef99e8f6
For easier upgrades, the following AMIs automatically update to the latest release every reboot:

### v0.12.1
* **eu-north-1** : ami-003118a8103286d84
* **ap-south-1** : ami-02dfe86baa48e096f
* **eu-west-3** : ami-0cc1f01267d2a780d
* **eu-west-2** : ami-0e4c8332e5ce09585
* **eu-west-1** : ami-03459a2f0b0a3b1ab
* **ap-northeast-2** : ami-08f6c2aed3a53f24c
* **ap-northeast-1** : ami-0b798eab95a7c5435
* **sa-east-1** : ami-0d3ee166c09f0d1b2
* **ca-central-1** : ami-00a758c56bd63acd5
* **ap-southeast-1** : ami-0be64d4988cd03fbb
* **ap-southeast-2** : ami-02087310d43a63f31
* **eu-central-1** : ami-097bbefeac0c74225
* **us-east-2** : ami-07eda256712b90f4d
* **us-west-1** : ami-02ef2b55cbd01c7df
* **us-west-2** : ami-037c6176ef4735360
* **us-east-1** : ami-08715c20c0e3f1c15
* **eu-north-1** : ami-050c24cc0099e9512
* **ap-south-1** : ami-07bb33de49e319d73
* **eu-west-3** : ami-00ecdf092af972d24
* **eu-west-2** : ami-09ace28116ad33dd9
* **eu-west-1** : ami-01d85e00c7741d69b
* **ap-northeast-2** : ami-0ccc3d85996362545
* **ap-northeast-1** : ami-06abda05aa2407b1a
* **sa-east-1** : ami-0ce3597b116cfdd79
* **ca-central-1** : ami-0cb2d22a74007fa14
* **ap-southeast-1** : ami-06a9784d792a7c30f
* **ap-southeast-2** : ami-012ab6092f28f62b6
* **eu-central-1** : ami-04443efac619cac6d
* **us-east-2** : ami-05391549da2d5e38c
* **us-west-1** : ami-0444959077f5f7310
* **us-west-2** : ami-029b979c20d7f16f3
* **us-east-1** : ami-024ab496fe05a4b4d

### v0.14.0 (static update)
* **eu-north-1** : ami-02de71586ec496e38
* **ap-south-1** : ami-074b03849b51852e5
* **eu-west-3** : ami-022c388835e0eeb03
* **eu-west-2** : ami-0a151c236c6b27707
* **eu-west-1** : ami-06de69b06b4e73312
* **ap-northeast-2** : ami-0ee821b72d9f669b1
* **ap-northeast-1** : ami-03687ae215e64e100
* **sa-east-1** : ami-01eb83364b7f667af
* **ca-central-1** : ami-02e9b35f9c90377e6
* **ap-southeast-1** : ami-0d3ab5ab0048fea51
* **ap-southeast-2** : ami-0bd39d908fe3a9e06
* **eu-central-1** : ami-0b8638701311b35c4
* **us-east-2** : ami-02ff039693fc3a614
* **us-west-1** : ami-08634f7dfb608a9a7
* **us-west-2** : ami-034d693ef742b9333
* **us-east-1** : ami-0b828b05c323dde7f

### v0.13.0 (static update)
* **eu-north-1** : ami-0d9c74a015e7510d8
* **ap-south-1** : ami-02acd6dd0659bb5c1
* **eu-west-3** : ami-0f0cc5cb6d9afd194
* **eu-west-2** : ami-0298fdc0860206ed9
* **eu-west-1** : ami-0cdc072e528401d5e
* **ap-northeast-2** : ami-0055579cc95b0e53e
* **ap-northeast-1** : ami-0ced7becb9b83b5d0
* **sa-east-1** : ami-033345d0f16a1b5e4
* **ca-central-1** : ami-06c63b05aed47ae67
* **ap-southeast-1** : ami-09f0355f367f30602
* **ap-southeast-2** : ami-0bd2314163ce0fba0
* **eu-central-1** : ami-05fbae957df63e366
* **us-east-2** : ami-050c51b5b4074d3fc
* **us-west-1** : ami-06ad513073d4e5a19
* **us-west-2** : ami-0c96e1361d1d4ca94
* **us-east-1** : ami-07b669040d1eea213

### v0.12.1 (static update)
* **eu-north-1** : ami-003118a8103286d84
* **ap-south-1** : ami-02dfe86baa48e096f
* **eu-west-3** : ami-0cc1f01267d2a780d
* **eu-west-2** : ami-0e4c8332e5ce09585
* **eu-west-1** : ami-03459a2f0b0a3b1ab
* **ap-northeast-2** : ami-08f6c2aed3a53f24c
* **ap-northeast-1** : ami-0b798eab95a7c5435
* **sa-east-1** : ami-0d3ee166c09f0d1b2
* **ca-central-1** : ami-00a758c56bd63acd5
* **ap-southeast-1** : ami-0be64d4988cd03fbb
* **ap-southeast-2** : ami-02087310d43a63f31
* **eu-central-1** : ami-097bbefeac0c74225
* **us-east-2** : ami-07eda256712b90f4d
* **us-west-1** : ami-02ef2b55cbd01c7df
* **us-west-2** : ami-037c6176ef4735360
* **us-east-1** : ami-08715c20c0e3f1c15

### v0.12.0 (static update)

### v0.12.0
* **eu-north-1** : ami-03ff8ab48cd43e77e
* **ap-south-1** : ami-079c1a41ff836487c
* **eu-west-3** : ami-0121ef0398ae87ab0
@@ -102,7 +144,8 @@ The following sections provide a list containing AMI Image ID per region for eac
* **us-west-2** : ami-0018d5a7e58966848
* **us-east-1** : ami-08f24178fc14a84d2

### v0.11.0
### v0.11.0 (static update)

* **eu-north-1** : ami-0cbe338f058018c97
* **ap-south-1** : ami-06d72ff894f7a5e5d
* **eu-west-3** : ami-00f2a45d67df2d2f3
@@ -120,7 +163,8 @@ The following sections provide a list containing AMI Image ID per region for eac
* **us-west-2** : ami-0e384b6f78bf96ebe
* **us-east-1** : ami-0a7b46f907d5d9c4a

### v0.10.1
### v0.10.1 (static update)

* **eu-north-1** : ami-09937ec4d18350c32
* **ap-south-1** : ami-089d6ba7541ec4c7f
* **eu-west-3** : ami-0accb1a94bdd5c5c1
@@ -138,7 +182,8 @@ The following sections provide a list containing AMI Image ID per region for eac
* **us-west-2** : ami-0d1cb8ba7de246ff0
* **us-east-1** : ami-049ccba6abdb40cba

### v0.10.0
### v0.10.0 (static update)

* **eu-north-1** : ami-05ba33c763877e54e
* **ap-south-1** : ami-0529eec569161cae5
* **eu-west-3** : ami-03cb9396f63e26ff6
@@ -157,7 +202,7 @@ The following sections provide a list containing AMI Image ID per region for eac
* **us-west-2** : ami-04a522ecb2250fb44
* **us-east-1** : ami-0a66ddbd50959f91e

### v0.9.0
### v0.9.0 (static update)

* **us-east-1** : ami-0991ad536ecbacdac
* **eu-north-1** : ami-07cbcdff501b14afe
@@ -175,3 +220,4 @@ The following sections provide a list containing AMI Image ID per region for eac
* **us-east-2** : ami-03b01914b07428488
* **us-west-1** : ami-0cf4768e9d47ed076
* **us-west-2** : ami-0b145f37da31eb9fb


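To script an EC2 launch, a region's AMI ID can be looked up from the lists above. A minimal sketch: `trains_ami_v0_14_0` is a hypothetical helper that covers only six regions from the v0.14.0 (static update) list; other regions can be added from the same list in the same way.

```bash
# Hypothetical helper: print the v0.14.0 (static update) trains-server AMI ID
# for a region, using IDs copied from the list above (subset of regions).
trains_ami_v0_14_0() {
  case "$1" in
    us-east-1)    echo "ami-0b828b05c323dde7f" ;;
    us-east-2)    echo "ami-02ff039693fc3a614" ;;
    us-west-1)    echo "ami-08634f7dfb608a9a7" ;;
    us-west-2)    echo "ami-034d693ef742b9333" ;;
    eu-west-1)    echo "ami-06de69b06b4e73312" ;;
    eu-central-1) echo "ami-0b8638701311b35c4" ;;
    *) echo "no v0.14.0 AMI listed here for region: $1" >&2; return 1 ;;
  esac
}
```

For example, `aws ec2 run-instances --region eu-west-1 --image-id "$(trains_ami_v0_14_0 eu-west-1)" --instance-type t3a.large --key-name my-key`, where `my-key` is a placeholder for your EC2 key pair. To reach the web, API and file servers from outside the instance, you will likely also need a security group that allows ports 8080, 8008 and 8081.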
97
docs/install_linux_mac.md
Normal file
@@ -0,0 +1,97 @@
# Launching the **trains-server** Docker on Linux or macOS

For Linux or macOS, use our pre-built Docker image for easy deployment. The latest Docker images can be found [here](https://hub.docker.com/r/allegroai/trains).

For Linux users:

* You must be logged in as a user with sudo privileges.
* Use `bash` for all command-line instructions in this installation.

To launch **trains-server** on Linux or macOS:

1. Install Docker.

* Linux - see [Docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
* macOS - see [Docker for macOS](https://docs.docker.com/docker-for-mac/install/).

1. Verify the Docker CE installation. Execute the command:

sudo docker run hello-world

The expected output is:

Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:

1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.

1. For Linux only, install `docker-compose`. Execute the following commands (for more information, see [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation):

sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

1. Increase `vm.max_map_count` for the ElasticSearch Docker container.

Linux:

echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart

macOS:

screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
sysctl -w vm.max_map_count=262144


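After the Linux `sysctl` step above, the running value can be checked before starting the containers. A minimal sketch: `check_max_map_count` is a hypothetical helper that reads `/proc/sys/vm/max_map_count` (or a file passed as its first argument, which only exists to make the check testable) and compares it with 262144, the value used above.

```bash
# Hypothetical helper: verify vm.max_map_count is at least 262144.
# Returns 0 if high enough, 1 if too low, 2 if the value cannot be read.
check_max_map_count() {
  local file="${1:-/proc/sys/vm/max_map_count}" required=262144 current
  current="$(cat "$file")" || return 2
  if [ "$current" -ge "$required" ]; then
    echo "ok: vm.max_map_count=$current"
  else
    echo "too low: vm.max_map_count=$current (need >= $required)" >&2
    return 1
  fi
}
```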
1. Remove any previous installation of **trains-server**.

**WARNING**: This clears all existing **Trains** databases.

sudo rm -R /opt/trains/

1. Create local directories for the databases and storage.

sudo mkdir -p /opt/trains/data/elastic
sudo mkdir -p /opt/trains/data/mongo/db
sudo mkdir -p /opt/trains/data/mongo/configdb
sudo mkdir -p /opt/trains/data/redis
sudo mkdir -p /opt/trains/logs
sudo mkdir -p /opt/trains/config
sudo mkdir -p /opt/trains/data/fileserver

1. For macOS only, open the Docker app, select **Preferences**, and then on the **File Sharing** tab, add `/opt/trains`.

1. Grant the Docker containers access to `/opt/trains`.

Linux:

sudo chown -R 1000:1000 /opt/trains

macOS:

sudo chown -R $(whoami):staff /opt/trains

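The Linux/macOS split in the access step can be folded into one command. A minimal sketch: `trains_owner` is a hypothetical helper that maps the output of `uname -s` (`Linux` or `Darwin`) to the owner used above; it accepts the OS name as an optional argument so the mapping can be tested.

```bash
# Hypothetical helper: print the owner to pass to `chown -R` for /opt/trains.
trains_owner() {
  local os="${1:-$(uname -s)}"
  case "$os" in
    Linux)  echo "1000:1000" ;;
    Darwin) echo "$(whoami):staff" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}
```

Usage: `sudo chown -R "$(trains_owner)" /opt/trains`. The command substitution runs in your own shell before `sudo`, so on macOS `whoami` still reports your user rather than `root`, matching the `$(whoami)` in the macOS command above.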
1. Download the **trains-server** docker-compose YAML file.

cd /opt/trains
curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml

1. Run `docker-compose` with the downloaded configuration file.

sudo docker-compose -f docker-compose.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:

* Web server on port `8080`
* API server on port `8008`
* File server on port `8081`

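Elasticsearch and MongoDB can take a while to start, so the web server may not answer immediately after `docker-compose up`. A minimal sketch of a readiness check, assuming `curl` is installed: `wait_for_url` is a hypothetical helper that polls a URL once per second until it returns a non-error HTTP status or the retry budget runs out. It is most useful when the containers are started in the background with `sudo docker-compose -f docker-compose.yml up -d`.

```bash
# Hypothetical readiness check: poll a URL until it returns a non-error HTTP
# status. The second argument is the number of one-second retries (default 120).
wait_for_url() {
  local url="$1" retries="${2:-120}" waited=0
  # -f: treat HTTP errors as failures; --max-time: don't hang on a stuck socket.
  until curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$retries" ]; then
      echo "gave up waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "up: $url"
}
```

For example, `wait_for_url http://localhost:8080 300` waits up to about five minutes for the web server.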
## Next Step

Configure the [Trains client for trains-server](https://github.com/allegroai/trains/blob/master/README.md#configuration).
50
docs/install_win.md
Normal file
@@ -0,0 +1,50 @@
# Launching the **trains-server** Docker on Windows 10

For Windows, we recommend launching our pre-built Docker image on a Linux virtual machine.
However, you can launch **trains-server** on Windows 10 using Docker Desktop for Windows (see the Docker [System Requirements](https://docs.docker.com/docker-for-windows/install/#system-requirements)).

To launch **trains-server** on Windows 10:

1. Install the Docker Desktop for Windows application by either:

* Following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
* Running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).

1. Increase the memory allocation in Docker Desktop to `4GB`.

1. In your Windows notification area (system tray), right-click the Docker icon.

1. Click *Settings*, *Advanced*, and then set the memory to at least `4096`.

1. Click *Apply*.

1. Remove any previous installation of **trains-server**.

**WARNING**: This clears all existing **Trains** databases.

rmdir c:\opt\trains /s

1. Create local directories for data and logs. Open PowerShell and execute the following commands:

cd c:
mkdir c:\opt\trains\data
mkdir c:\opt\trains\logs

1. Save the **trains-server** docker-compose YAML file.

cd c:\opt\trains
curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml -o docker-compose-win10.yml

1. Run `docker-compose`. In PowerShell, execute the following command:

docker-compose -f docker-compose-win10.yml up

Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:

* Web server on port `8080`
* API server on port `8008`
* File server on port `8081`

## Next Step

Configure the [Trains client for trains-server](https://github.com/allegroai/trains/blob/master/README.md#configuration).
@@ -14,6 +14,9 @@ app = Flask(__name__)
CORS(app, **config.get("fileserver.cors"))
Compress(app)

if os.environ.get("TRAINS_UPLOAD_FOLDER"):
    app.config["UPLOAD_FOLDER"] = os.environ.get("TRAINS_UPLOAD_FOLDER")


@app.route("/", methods=["POST"])
def upload():

1
server/api_version.py
Normal file
@@ -0,0 +1 @@
__version__ = "2.7.0"

@@ -89,6 +89,8 @@ _error_codes = {
        1003: ('worker_registered', 'worker is already registered'),
        1004: ('worker_not_registered', 'worker is not registered'),
        1005: ('worker_stats_not_found', 'worker stats not found'),

        1104: ('invalid_scroll_id', 'Invalid scroll id'),
    },

    (401, 'unauthorized'): {

@@ -105,7 +107,6 @@ _error_codes = {

    (403, 'forbidden'): {
        10: ('routing_error', 'forbidden (routing error)'),
        11: ('missing_routing_header', 'forbidden (missing routing header)'),
        12: ('blocked_internal_endpoint', 'forbidden (blocked internal endpoint)'),
        20: ('role_not_allowed', 'forbidden (not allowed for role)'),
        21: ('no_write_permission', 'forbidden (modification not allowed)'),

@@ -121,6 +122,7 @@ _error_codes = {
        100: ('data_error', 'general data error'),
        101: ('inconsistent_data', 'inconsistent data encountered in document'),
        102: ('database_unavailable', 'database is temporarily unavailable'),
        110: ('update_failed', 'update failed'),

        # Index-related issues
        201: ('missing_index', 'missing internal index'),

@@ -5,12 +5,12 @@ from typing import Union, Type, Iterable

import jsonmodels.errors
import six
import validators
from jsonmodels import fields
from jsonmodels.fields import _LazyType, NotSet
from jsonmodels.models import Base as ModelBase
from jsonmodels.validators import Enum as EnumValidator
from luqum.parser import parser, ParseError
from validators import email as email_validator, domain as domain_validator

from apierrors import errors

@@ -66,9 +66,7 @@ class DictField(fields.BaseField):
            value_types = tuple()

        return tuple(
            _LazyType(type_)
            if isinstance(type_, six.string_types)
            else type_
            _LazyType(type_) if isinstance(type_, six.string_types) else type_
            for type_ in value_types
        )

@@ -78,6 +76,9 @@ class DictField(fields.BaseField):
        if not self.value_types:
            return

        if not value:
            return

        for item in value.values():
            self.validate_single_value(item)

@@ -104,7 +105,7 @@ class IntField(fields.IntField):


def validate_lucene_query(value):
    if value == '':
    if value == "":
        return
    try:
        parser.parse(value)

@@ -122,6 +123,7 @@ class LuceneQueryField(fields.StringField):


class NullableEnumValidator(EnumValidator):
    """Validator for enums that allows a None value."""

    def validate(self, value):
        if value is not None:
            super(NullableEnumValidator, self).validate(value)

@@ -150,10 +152,6 @@ class EnumField(fields.StringField):


class ActualEnumField(fields.StringField):
    @property
    def types(self):
        return (self.__enum,)

    def __init__(
        self,
        enum_class: Type[Enum],

@@ -164,12 +162,13 @@ class ActualEnumField(fields.StringField):
        **kwargs
    ):
        self.__enum = enum_class
        self.types = (enum_class,)
        # noinspection PyTypeChecker
        choices = list(enum_class)
        validator_cls = EnumValidator if required else NullableEnumValidator
        validators = [*(validators or []), validator_cls(*choices)]
        super().__init__(
            default=default and self.parse_value(default),
            default=self.parse_value(default) if default else NotSet,
            *args,
            required=required,
            validators=validators,

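In the `ActualEnumField` hunk, the default changes from `default and self.parse_value(default)` to `self.parse_value(default) if default else NotSet`. As we understand jsonmodels, `NotSet` is its marker for "no default specified", because `None` is itself a valid default. The old expression therefore turned "no default" into an explicit default of `None`. The standalone sketch below contrasts the two expressions. `Color`, `NOT_SET` and `parse_value` are hypothetical stand-ins that let it run without jsonmodels.

```python
from enum import Enum
from typing import Any, Type


class _NotSetType:
    """Hypothetical sentinel standing in for jsonmodels' NotSet ('no default given')."""

    def __repr__(self) -> str:
        return "NotSet"


NOT_SET = _NotSetType()


class Color(Enum):
    """Hypothetical enum used only for this illustration."""

    red = "red"
    green = "green"


def parse_value(enum_class: Type[Enum], value: Any) -> Enum:
    """Turn a raw value into an enum member; members pass through unchanged."""
    return value if isinstance(value, enum_class) else enum_class(value)


def old_default(enum_class: Type[Enum], default: Any = None) -> Any:
    # Old expression: a falsy default (None, "") is passed on as-is,
    # so "no default" becomes an explicit default of None.
    return default and parse_value(enum_class, default)


def new_default(enum_class: Type[Enum], default: Any = None) -> Any:
    # New expression: a falsy default maps to the sentinel instead.
    return parse_value(enum_class, default) if default else NOT_SET


print(old_default(Color))           # None
print(new_default(Color))           # NotSet
print(new_default(Color, "green"))  # Color.green
```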
@@ -194,7 +193,7 @@ class EmailField(fields.StringField):
        super().validate(value)
        if value is None:
            return
        if validators.email(value) is not True:
        if email_validator(value) is not True:
            raise errors.bad_request.InvalidEmailAddress()

@@ -203,7 +202,7 @@ class DomainField(fields.StringField):
        super().validate(value)
        if value is None:
            return
        if validators.domain(value) is not True:
        if domain_validator(value) is not True:
            raise errors.bad_request.InvalidDomainName()

@@ -58,3 +58,7 @@ class UpdateResponse(models.Base):
class PagedRequest(models.Base):
    page = fields.IntField()
    page_size = fields.IntField()


class IdResponse(models.Base):
    id = fields.StringField(required=True)

@@ -1,9 +1,12 @@
from typing import Sequence

from jsonmodels.fields import StringField
from jsonmodels import validators
from jsonmodels.fields import StringField, BoolField
from jsonmodels.models import Base
from jsonmodels.validators import Length

from apimodels import ListField, IntField, ActualEnumField
from bll.event.event_metrics import EventType
from bll.event.scalar_key import ScalarKeyEnum

@@ -17,4 +20,44 @@ class ScalarMetricsIterHistogramRequest(HistogramRequestBase):


class MultiTaskScalarMetricsIterHistogramRequest(HistogramRequestBase):
    tasks: Sequence[str] = ListField(items_types=str)
    tasks: Sequence[str] = ListField(
        items_types=str, validators=[Length(minimum_value=1)]
    )


class TaskMetric(Base):
    task: str = StringField(required=True)
    metric: str = StringField(required=True)


class DebugImagesRequest(Base):
    metrics: Sequence[TaskMetric] = ListField(
        items_types=TaskMetric, validators=[Length(minimum_value=1)]
    )
    iters: int = IntField(default=1, validators=validators.Min(1))
    navigate_earlier: bool = BoolField(default=True)
    refresh: bool = BoolField(default=False)
    scroll_id: str = StringField()


class IterationEvents(Base):
    iter: int = IntField()
    events: Sequence[dict] = ListField(items_types=dict)


class MetricEvents(Base):
    task: str = StringField()
    metric: str = StringField()
    iterations: Sequence[IterationEvents] = ListField(items_types=IterationEvents)


class DebugImageResponse(Base):
    metrics: Sequence[MetricEvents] = ListField(items_types=MetricEvents)
    scroll_id: str = StringField()


class TaskMetricsRequest(Base):
    tasks: Sequence[str] = ListField(
        items_types=str, validators=[Length(minimum_value=1)]
    )
    event_type: EventType = ActualEnumField(EventType, required=True)

@@ -9,7 +9,7 @@ from apimodels.tasks import PublishResponse as TaskPublishResponse
class CreateModelRequest(models.Base):
    name = fields.StringField(required=True)
    uri = fields.StringField(required=True)
    labels = DictField(value_types=string_types+(int,), required=True)
    labels = DictField(value_types=string_types+(int,))
    tags = ListField(items_types=string_types)
    system_tags = ListField(items_types=string_types)
    comment = fields.StringField()

@@ -12,3 +12,4 @@ class ReportStatsOptionResponse(Base):
    enabled_time = DateTimeField(nullable=True)
    enabled_version = StringField(nullable=True)
    enabled_user = StringField(nullable=True)
    current_version = StringField()

@@ -1,6 +1,6 @@
import six
from jsonmodels import models
from jsonmodels.fields import StringField, BoolField, IntField
from jsonmodels.fields import StringField, BoolField, IntField, EmbeddedField
from jsonmodels.validators import Enum

from apimodels import DictField, ListField
@@ -9,6 +9,24 @@ from database.model.task.task import TaskType
from database.utils import get_options


class ArtifactTypeData(models.Base):
    preview = StringField()
    content_type = StringField()
    data_hash = StringField()


class Artifact(models.Base):
    key = StringField(required=True)
    type = StringField(required=True)
    mode = StringField(validators=Enum("input", "output"), default="output")
    uri = StringField()
    hash = StringField()
    content_size = IntField()
    timestamp = IntField()
    type_data = EmbeddedField(ArtifactTypeData)
    display_data = ListField([list])


class StartedResponse(UpdateResponse):
    started = IntField()

@@ -72,3 +90,22 @@ class CreateRequest(TaskData):

class PingRequest(TaskRequest):
    pass


class CloneRequest(TaskRequest):
    new_task_name = StringField()
    new_task_comment = StringField()
    new_task_tags = ListField([str])
    new_task_system_tags = ListField([str])
    new_task_parent = StringField()
    new_task_project = StringField()
    execution_overrides = DictField()


class AddOrUpdateArtifactsRequest(TaskRequest):
    artifacts = ListField([Artifact], required=True)


class AddOrUpdateArtifactsResponse(models.Base):
    added = ListField([str])
    updated = ListField([str])

464
server/bll/event/debug_images_iterator.py
Normal file
@@ -0,0 +1,464 @@
from collections import defaultdict
from concurrent.futures.thread import ThreadPoolExecutor
from functools import partial
from itertools import chain
from operator import attrgetter, itemgetter

import attr
import dpath
from boltons.iterutils import bucketize
from elasticsearch import Elasticsearch
from redis import StrictRedis
from typing import Sequence, Tuple, Optional, Mapping

import database
from apierrors import errors
from bll.redis_cache_manager import RedisCacheManager
from bll.event.event_metrics import EventMetrics
from config import config
from database.errors import translate_errors_context
from jsonmodels.models import Base
from jsonmodels.fields import StringField, ListField, IntField

from database.model.task.metrics import MetricEventStats
from database.model.task.task import Task
from timing_context import TimingContext
from utilities.json import loads, dumps


class VariantScrollState(Base):
    name: str = StringField(required=True)
    recycle_url_marker: str = StringField()
    last_invalid_iteration: int = IntField()


class MetricScrollState(Base):
    task: str = StringField(required=True)
    name: str = StringField(required=True)
    last_min_iter: Optional[int] = IntField()
    last_max_iter: Optional[int] = IntField()
    timestamp: int = IntField(default=0)
    variants: Sequence[VariantScrollState] = ListField([VariantScrollState])

    def reset(self):
        """Reset the scrolling state for the metric"""
        self.last_min_iter = self.last_max_iter = None


class DebugImageEventsScrollState(Base):
    id: str = StringField(required=True)
    metrics: Sequence[MetricScrollState] = ListField([MetricScrollState])

    def to_json(self):
        return dumps(self.to_struct())

    @classmethod
    def from_json(cls, s):
        return cls(**loads(s))


@attr.s(auto_attribs=True)
class DebugImagesResult(object):
    metric_events: Sequence[tuple] = []
    next_scroll_id: str = None


class DebugImagesIterator:
    EVENT_TYPE = "training_debug_image"
    STATE_EXPIRATION_SECONDS = 3600

    @property
    def _max_workers(self):
        return config.get("services.events.max_metrics_concurrency", 4)

    def __init__(self, redis: StrictRedis, es: Elasticsearch):
        self.es = es
        self.cache_manager = RedisCacheManager(
            state_class=DebugImageEventsScrollState,
            redis=redis,
            expiration_interval=self.STATE_EXPIRATION_SECONDS,
        )

    def get_task_events(
        self,
        company_id: str,
        metrics: Sequence[Tuple[str, str]],
        iter_count: int,
        navigate_earlier: bool = True,
        refresh: bool = False,
        state_id: str = None,
    ) -> DebugImagesResult:
        es_index = EventMetrics.get_index_name(company_id, self.EVENT_TYPE)
        if not self.es.indices.exists(es_index):
            return DebugImagesResult()

        unique_metrics = set(metrics)
        state = self.cache_manager.get_state(state_id) if state_id else None
        if not state:
            state = DebugImageEventsScrollState(
                id=database.utils.id(),
                metrics=self._init_metric_states(es_index, list(unique_metrics)),
            )
        else:
            state_metrics = set((m.task, m.name) for m in state.metrics)
            if state_metrics != unique_metrics:
                raise errors.bad_request.InvalidScrollId(
                    "while getting debug images events", scroll_id=state_id
                )

            if refresh:
                self._reinit_outdated_metric_states(company_id, es_index, state)
                for metric_state in state.metrics:
                    metric_state.reset()

        res = DebugImagesResult(next_scroll_id=state.id)
        try:
            with ThreadPoolExecutor(self._max_workers) as pool:
                res.metric_events = list(
                    pool.map(
                        partial(
                            self._get_task_metric_events,
                            es_index=es_index,
                            iter_count=iter_count,
                            navigate_earlier=navigate_earlier,
                        ),
                        state.metrics,
                    )
                )
        finally:
            self.cache_manager.set_state(state)

        return res

    def _reinit_outdated_metric_states(
        self, company_id, es_index, state: DebugImageEventsScrollState
    ):
        """
        Determines the metrics for which new debug image events were added
        since their states were initialized and reinits these states
        """
        task_ids = set(metric.task for metric in state.metrics)
        tasks = Task.objects(id__in=list(task_ids), company=company_id).only(
            "id", "metric_stats"
        )

        def get_last_update_times_for_task_metrics(task: Task) -> Sequence[Tuple]:
            """For metrics that reported debug image events get tuples of task_id/metric_name and last update times"""
            metric_stats: Mapping[str, MetricEventStats] = task.metric_stats
            if not metric_stats:
                return []

            return [
                (
                    (task.id, stats.metric),
                    stats.event_stats_by_type[self.EVENT_TYPE].last_update,
                )
                for stats in metric_stats.values()
                if self.EVENT_TYPE in stats.event_stats_by_type
            ]

        update_times = dict(
            chain.from_iterable(
                get_last_update_times_for_task_metrics(task) for task in tasks
            )
        )
        outdated_metrics = [
            metric
            for metric in state.metrics
            if (metric.task, metric.name) in update_times
            and update_times[metric.task, metric.name] > metric.timestamp
        ]
        state.metrics = [
            *(metric for metric in state.metrics if metric not in outdated_metrics),
            *(
                self._init_metric_states(
                    es_index,
                    [(metric.task, metric.name) for metric in outdated_metrics],
                )
            ),
        ]

    def _init_metric_states(
        self, es_index, metrics: Sequence[Tuple[str, str]]
    ) -> Sequence[MetricScrollState]:
        """
        Returned initialized metric scroll stated for the requested task metrics
        """
        tasks = defaultdict(list)
        for (task, metric) in metrics:
            tasks[task].append(metric)

        with ThreadPoolExecutor(self._max_workers) as pool:
            return list(
                chain.from_iterable(
                    pool.map(
                        partial(self._init_metric_states_for_task, es_index=es_index),
                        tasks.items(),
                    )
                )
            )

    def _init_metric_states_for_task(
        self, task_metrics: Tuple[str, Sequence[str]], es_index
    ) -> Sequence[MetricScrollState]:
        """
        Return metric scroll states for the task filled with the variant states
        for the variants that reported any debug images
        """
        task, metrics = task_metrics
        es_req: dict = {
            "size": 0,
            "query": {
                "bool": {
                    "must": [{"term": {"task": task}}, {"terms": {"metric": metrics}}]
                }
            },
            "aggs": {
                "metrics": {
                    "terms": {
                        "field": "metric",
                        "size": EventMetrics.MAX_METRICS_COUNT,
                    },
                    "aggs": {
                        "last_event_timestamp": {"max": {"field": "timestamp"}},
                        "variants": {
                            "terms": {
                                "field": "variant",
                                "size": EventMetrics.MAX_VARIANTS_COUNT,
                            },
                            "aggs": {
                                "urls": {
                                    "terms": {
                                        "field": "url",
                                        "order": {"max_iter": "desc"},
                                        "size": 1, # we need only one url from the most recent iteration
                                    },
                                    "aggs": {
                                        "max_iter": {"max": {"field": "iter"}},
                                        "iters": {
                                            "top_hits": {
                                                "sort": {"iter": {"order": "desc"}},
                                                "size": 2, # need two last iterations so that we can take
                                                # the second one as invalid
                                                "_source": "iter",
                                            }
                                        },
                                    },
                                }
                            },
                        },
                    },
                }
            },
        }

        with translate_errors_context(), TimingContext("es", "_init_metric_states"):
            es_res = self.es.search(index=es_index, body=es_req, routing=task)
        if "aggregations" not in es_res:
            return []

        def init_variant_scroll_state(variant: dict):
            """
            Return new variant scroll state for the passed variant bucket
            If the image urls get recycled then fill the last_invalid_iteration field
            """
            state = VariantScrollState(name=variant["key"])
            top_iter_url = dpath.get(variant, "urls/buckets")[0]
            iters = dpath.get(top_iter_url, "iters/hits/hits")
            if len(iters) > 1:
                state.last_invalid_iteration = dpath.get(iters[1], "_source/iter")
            return state

        return [
            MetricScrollState(
                task=task,
                name=metric["key"],
                variants=[
                    init_variant_scroll_state(variant)
                    for variant in dpath.get(metric, "variants/buckets")
                ],
                timestamp=dpath.get(metric, "last_event_timestamp/value"),
            )
            for metric in dpath.get(es_res, "aggregations/metrics/buckets")
        ]

    def _get_task_metric_events(
        self,
        metric: MetricScrollState,
        es_index: str,
        iter_count: int,
        navigate_earlier: bool,
    ) -> Tuple:
        """
        Return task metric events grouped by iterations
        Update metric scroll state
        """
        if metric.last_max_iter is None:
            # the first fetch is always from the latest iteration to the earlier ones
            navigate_earlier = True

        must_conditions = [
            {"term": {"task": metric.task}},
            {"term": {"metric": metric.name}},
        ]
        must_not_conditions = []

        range_condition = None
        if navigate_earlier and metric.last_min_iter is not None:
            range_condition = {"lt": metric.last_min_iter}
        elif not navigate_earlier and metric.last_max_iter is not None:
            range_condition = {"gt": metric.last_max_iter}
        if range_condition:
            must_conditions.append({"range": {"iter": range_condition}})

        if navigate_earlier:
            """
            When navigating to earlier iterations consider only
            variants whose invalid iterations border is lower than
            our starting iteration. For these variants make sure
            that only events from the valid iterations are returned
            """
            if not metric.last_min_iter:
                variants = metric.variants
            else:
                variants = list(
                    v
                    for v in metric.variants
                    if v.last_invalid_iteration is None
                    or v.last_invalid_iteration < metric.last_min_iter
                )
                if not variants:
                    return metric.task, metric.name, []
                must_conditions.append(
                    {"terms": {"variant": list(v.name for v in variants)}}
                )
        else:
            """
            When navigating to later iterations all variants may be relevant.
            For the variants whose invalid border is higher than our starting
            iteration make sure that only events from valid iterations are returned
            """
            variants = list(
                v
                for v in metric.variants
                if v.last_invalid_iteration is not None
                and v.last_invalid_iteration > metric.last_max_iter
            )

        variants_conditions = [
            {
                "bool": {
                    "must": [
                        {"term": {"variant": v.name}},
                        {"range": {"iter": {"lte": v.last_invalid_iteration}}},
                    ]
                }
            }
            for v in variants
            if v.last_invalid_iteration is not None
        ]
        if variants_conditions:
            must_not_conditions.append({"bool": {"should": variants_conditions}})

        es_req = {
            "size": 0,
            "query": {
                "bool": {"must": must_conditions, "must_not": must_not_conditions}
            },
            "aggs": {
                "iters": {
                    "terms": {
                        "field": "iter",
                        "size": iter_count,
                        "order": {"_term": "desc" if navigate_earlier else "asc"},
                    },
                    "aggs": {
                        "variants": {
                            "terms": {
                                "field": "variant",
                                "size": EventMetrics.MAX_VARIANTS_COUNT,
                            },
                            "aggs": {
                                "events": {
                                    "top_hits": {"sort": {"url": {"order": "desc"}}}
                                }
                            },
                        }
                    },
                }
            },
        }
        with translate_errors_context(), TimingContext("es", "get_debug_image_events"):
            es_res = self.es.search(index=es_index, body=es_req, routing=metric.task)
        if "aggregations" not in es_res:
            return metric.task, metric.name, []

        def get_iteration_events(variant_buckets: Sequence[dict]) -> Sequence:
            return [
                ev["_source"]
                for v in variant_buckets
                for ev in dpath.get(v, "events/hits/hits")
            ]

        iterations = [
            {
                "iter": it["key"],
                "events": get_iteration_events(dpath.get(it, "variants/buckets")),
            }
            for it in dpath.get(es_res, "aggregations/iters/buckets")
        ]
        if not navigate_earlier:
            iterations.sort(key=itemgetter("iter"), reverse=True)
        if iterations:
            metric.last_max_iter = iterations[0]["iter"]
            metric.last_min_iter = iterations[-1]["iter"]

        # Commented for now since the last invalid iteration is calculated in the beginning
        # if navigate_earlier and any(
        #     variant.last_invalid_iteration is None for variant in variants
        # ):
        #     """
        #     Variants validation flags due to recycling can
        #     be set only on navigation to earlier frames
        #     """
        #     iterations = self._update_variants_invalid_iterations(variants, iterations)

        return metric.task, metric.name, iterations

    @staticmethod
    def _update_variants_invalid_iterations(
        variants: Sequence[VariantScrollState], iterations: Sequence[dict]
    ) -> Sequence[dict]:
        """
        This code is currently not in used since the invalid iterations
        are calculated during MetricState initialization
        For variants that do not have recycle url marker set it from the
        first event
        For variants that do not have last_invalid_iteration set check if the
        recycle marker was reached on a certain iteration and set it to the
        corresponding iteration
        For variants that have a newly set last_invalid_iteration remove
        events from the invalid iterations
        Return the updated iterations list
        """
        variants_lookup = bucketize(variants, attrgetter("name"))
        for it in iterations:
            iteration = it["iter"]
            events_to_remove = []
            for event in it["events"]:
                variant = variants_lookup[event["variant"]][0]
                if (
                    variant.last_invalid_iteration
                    and variant.last_invalid_iteration >= iteration
                ):
                    events_to_remove.append(event)
                    continue
                event_url = event.get("url")
                if not variant.recycle_url_marker:
                    variant.recycle_url_marker = event_url
                elif variant.recycle_url_marker == event_url:
                    variant.last_invalid_iteration = iteration
                    events_to_remove.append(event)
            if events_to_remove:
                it["events"] = [ev for ev in it["events"] if ev not in events_to_remove]
        return [it for it in iterations if it["events"]]

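The scrolling rules in `_get_task_metric_events` and `init_variant_scroll_state` are hard to follow through the Elasticsearch request bodies. The following is a minimal in-memory sketch of the same rules over plain lists. It assumes a single metric, integer iterations, and no Elasticsearch or Redis. `MetricState`, `iter_range`, `fetch_page` and `last_invalid_iteration` are hypothetical names, not part of the server. The sketch also leaves out the per-variant `must_not` filtering of invalid iterations.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class MetricState:
    """Hypothetical stand-in for MetricScrollState (only the scrolling window)."""

    last_min_iter: Optional[int] = None
    last_max_iter: Optional[int] = None


def iter_range(state: MetricState, navigate_earlier: bool) -> Optional[dict]:
    """Range condition on `iter` for the next page, mirroring _get_task_metric_events."""
    if state.last_max_iter is None:
        navigate_earlier = True  # the first fetch always starts from the latest iteration
    if navigate_earlier and state.last_min_iter is not None:
        return {"lt": state.last_min_iter}
    if not navigate_earlier and state.last_max_iter is not None:
        return {"gt": state.last_max_iter}
    return None


def fetch_page(
    all_iters: Sequence[int], state: MetricState, iter_count: int, navigate_earlier: bool
) -> List[int]:
    """Return up to iter_count iterations (newest first) and advance the window in state."""
    if state.last_max_iter is None:
        navigate_earlier = True
    cond = iter_range(state, navigate_earlier) or {}
    # Closest iterations in the navigation direction come first
    candidates = sorted(set(all_iters), reverse=navigate_earlier)
    if "lt" in cond:
        candidates = [i for i in candidates if i < cond["lt"]]
    if "gt" in cond:
        candidates = [i for i in candidates if i > cond["gt"]]
    page = sorted(candidates[:iter_count], reverse=True)
    if page:
        state.last_max_iter, state.last_min_iter = page[0], page[-1]
    return page


def last_invalid_iteration(events: Sequence[Tuple[int, str]]) -> Optional[int]:
    """
    (iter, url) pairs of one variant -> the highest iteration whose image file was
    overwritten by a later event with the same url, or None if urls are never reused.
    Mirrors init_variant_scroll_state: take the url of the newest event; if it also
    appears at earlier iterations, the most recent of those is the last invalid one.
    """
    if not events:
        return None
    _, newest_url = max(events, key=lambda e: e[0])
    same_url_iters = sorted((it for it, url in events if url == newest_url), reverse=True)
    return same_url_iters[1] if len(same_url_iters) > 1 else None


state = MetricState()
iters = list(range(1, 11))
print(fetch_page(iters, state, 3, navigate_earlier=True))   # [10, 9, 8]
print(fetch_page(iters, state, 3, navigate_earlier=True))   # [7, 6, 5]
print(fetch_page(iters, state, 3, navigate_earlier=False))  # [10, 9, 8]

# A variant that cycles through 3 file names: img_1, img_2, img_0, img_1, ...
recycled = [(it, f"img_{it % 3}.png") for it in range(1, 8)]
print(last_invalid_iteration(recycled))  # 4
```

In the recycled example, the files written at iterations 1 to 4 have all been overwritten by iterations 5 to 7. Every event at or below iteration 4 therefore points at a newer image, which is why such events are excluded.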
@@ -1,7 +1,7 @@
import hashlib
from collections import defaultdict
from contextlib import closing
from datetime import datetime
from enum import Enum
from operator import attrgetter
from typing import Sequence

@@ -14,42 +14,39 @@ from nested_dict import nested_dict
import database.utils as dbutils
import es_factory
from apierrors import errors
from bll.event.event_metrics import EventMetrics
from bll.event.debug_images_iterator import DebugImagesIterator
from bll.event.event_metrics import EventMetrics, EventType
from bll.task import TaskBLL
from config import config
from database.errors import translate_errors_context
from database.model.task.task import Task, TaskStatus
from redis_manager import redman
from timing_context import TimingContext
from utilities.dicts import flatten_nested_items


class EventType(Enum):
    metrics_scalar = "training_stats_scalar"
    metrics_vector = "training_stats_vector"
    metrics_image = "training_debug_image"
    metrics_plot = "plot"
    task_log = "log"


# noinspection PyTypeChecker
EVENT_TYPES = set(map(attrgetter("value"), EventType))


LOCKED_TASK_STATUSES = (TaskStatus.publishing, TaskStatus.published)


@attr.s
@attr.s(auto_attribs=True)
class TaskEventsResult(object):
    events = attr.ib(type=list, default=attr.Factory(list))
    total_events = attr.ib(type=int, default=0)
    next_scroll_id = attr.ib(type=str, default=None)
    total_events: int = 0
    next_scroll_id: str = None
    events: list = attr.ib(factory=list)


class EventBLL(object):
    id_fields = ["task", "iter", "metric", "variant", "key"]
    id_fields = ("task", "iter", "metric", "variant", "key")

    def __init__(self, events_es=None):
    def __init__(self, events_es=None, redis=None):
        self.es = events_es or es_factory.connect("events")
        self._metrics = EventMetrics(self.es)
        self._skip_iteration_for_metric = set(
            config.get("services.events.ignore_iteration.metrics", [])
        )
        self.redis = redis or redman.connection("apiserver")
        self.debug_images_iterator = DebugImagesIterator(es=self.es, redis=self.redis)

    @property
    def metrics(self) -> EventMetrics:

@@ -59,9 +56,12 @@ class EventBLL(object):
        actions = []
        task_ids = set()
        task_iteration = defaultdict(lambda: 0)
        task_last_events = nested_dict(
        task_last_scalar_events = nested_dict(
            3, dict
        ) # task_id -> metric_hash -> variant_hash -> MetricEvent
        task_last_events = nested_dict(
            3, dict
        ) # task_id -> metric_hash -> event_type -> MetricEvent

        for event in events:
            # remove spaces from event type

@@ -103,6 +103,9 @@ class EventBLL(object):
                event["value"] = event["values"]
                del event["values"]

            event["metric"] = event.get("metric") or ""
            event["variant"] = event.get("variant") or ""

            index_name = EventMetrics.get_index_name(company_id, event_type)
            es_action = {
                "_op_type": "index", # overwrite if exists with same ID

@@ -121,12 +124,18 @@ class EventBLL(object):
            if task_id is not None:
                es_action["_routing"] = task_id
                task_ids.add(task_id)
                if iter is not None:
                if (
                    iter is not None
                    and event.get("metric") not in self._skip_iteration_for_metric
                ):
                    task_iteration[task_id] = max(iter, task_iteration[task_id])

                self._update_last_metric_events_for_task(
                    last_events=task_last_events[task_id], event=event,
                )
                if event_type == EventType.metrics_scalar.value:
                    self._update_last_metric_event_for_task(
                        task_last_events=task_last_events, task_id=task_id, event=event
                    self._update_last_scalar_events_for_task(
                        last_events=task_last_scalar_events[task_id], event=event
                    )
            else:
                es_action["_routing"] = task_id

@@ -179,6 +188,7 @@ class EventBLL(object):
                    task_id=task_id,
                    now=now,
                    iter_max=task_iteration.get(task_id),
                    last_scalar_events=task_last_scalar_events.get(task_id),
                    last_events=task_last_events.get(task_id),
                )

@@ -194,12 +204,12 @@ class EventBLL(object):

        return added, errors_in_bulk

    def _update_last_metric_event_for_task(self, task_last_events, task_id, event):
    def _update_last_scalar_events_for_task(self, last_events, event):
        """
        Update task_last_events structure for the provided task_id with the provided event details if this event is more
        Update last_events structure with the provided event details if this event is more
        recent than the currently stored event for its metric/variant combination.

        task_last_events contains [hashed_metric_name -> hashed_variant_name -> event]. Keys are hashed to avoid mongodb
        last_events contains [hashed_metric_name -> hashed_variant_name -> event]. Keys are hashed to avoid mongodb
        key conflicts due to invalid characters and/or long field names.
        """
        metric = event.get("metric")

@@ -210,13 +220,34 @@ class EventBLL(object):
|
||||
metric_hash = dbutils.hash_field_name(metric)
|
||||
variant_hash = dbutils.hash_field_name(variant)
|
||||
|
||||
last_events = task_last_events[task_id]
|
||||
|
||||
timestamp = last_events[metric_hash][variant_hash].get("timestamp", None)
|
||||
if timestamp is None or timestamp < event["timestamp"]:
|
||||
last_events[metric_hash][variant_hash] = event
|
||||
|
||||
def _update_task(self, company_id, task_id, now, iter_max=None, last_events=None):
|
||||
def _update_last_metric_events_for_task(self, last_events, event):
|
||||
"""
|
||||
Update last_events structure with the provided event details if this event is more
|
||||
recent than the currently stored event for its metric/event_type combination.
|
||||
last_events contains [metric_name -> event_type -> event]
|
||||
"""
|
||||
metric = event.get("metric")
|
||||
event_type = event.get("type")
|
||||
if not (metric and event_type):
|
||||
return
|
||||
|
||||
timestamp = last_events[metric][event_type].get("timestamp", None)
|
||||
if timestamp is None or timestamp < event["timestamp"]:
|
||||
last_events[metric][event_type] = event
|
||||
|
||||
def _update_task(
|
||||
self,
|
||||
company_id,
|
||||
task_id,
|
||||
now,
|
||||
iter_max=None,
|
||||
last_scalar_events=None,
|
||||
last_events=None,
|
||||
):
|
||||
"""
|
||||
Update task information in DB with aggregated results after handling event(s) related to this task.
|
||||
|
||||
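The two helpers above keep, per task, only the most recent event for each metric/variant pair (scalars) or metric/event-type pair. A minimal standalone sketch of that bookkeeping follows. `hash_field_name` here is a hypothetical md5-based stand-in for `dbutils.hash_field_name`, whose implementation is not part of this diff, and the empty-name guard in the scalar helper is assumed because the diff elides those lines.

```python
import hashlib
from collections import defaultdict


def hash_field_name(name: str) -> str:
    # Hypothetical stand-in for dbutils.hash_field_name: an md5 hex digest
    # contains no "." or "$" and has a fixed length, so it is a safe MongoDB key.
    return hashlib.md5(name.encode()).hexdigest()


def update_last_scalar_events(last_events, event) -> None:
    """Keep the newest event per (metric, variant), keyed by hashed names."""
    metric, variant = event.get("metric"), event.get("variant")
    if not (metric and variant):  # assumed guard; elided in the diff
        return
    metric_hash, variant_hash = hash_field_name(metric), hash_field_name(variant)
    timestamp = last_events[metric_hash][variant_hash].get("timestamp")
    if timestamp is None or timestamp < event["timestamp"]:
        last_events[metric_hash][variant_hash] = event


def update_last_metric_events(last_events, event) -> None:
    """Keep the newest event per (metric, event type), keyed by plain names."""
    metric, event_type = event.get("metric"), event.get("type")
    if not (metric and event_type):
        return
    timestamp = last_events[metric][event_type].get("timestamp")
    if timestamp is None or timestamp < event["timestamp"]:
        last_events[metric][event_type] = event


# Two nested defaultdicts: an unseen metric/variant reads as an empty dict,
# so .get("timestamp") returns None and the first event is always stored.
scalar_events = defaultdict(lambda: defaultdict(dict))
metric_events = defaultdict(lambda: defaultdict(dict))
for ev in (
    {"metric": "loss", "variant": "train", "type": "training_stats_scalar", "timestamp": 2, "value": 0.5},
    {"metric": "loss", "variant": "train", "type": "training_stats_scalar", "timestamp": 1, "value": 0.9},
):
    update_last_scalar_events(scalar_events, ev)
    update_last_metric_events(metric_events, ev)
```

The older event (timestamp 1) arrives second and is ignored, so both structures end up holding the timestamp-2 event.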
@@ -229,15 +260,18 @@ class EventBLL(object):
if iter_max is not None:
fields["last_iteration_max"] = iter_max

if last_events:
fields["last_values"] = list(
if last_scalar_events:
fields["last_scalar_values"] = list(
flatten_nested_items(
last_events,
last_scalar_events,
nesting=2,
include_leaves=["value", "metric", "variant"],
)
)

if last_events:
fields["last_events"] = last_events

if not fields:
return False

@@ -245,7 +279,7 @@ class EventBLL(object):

def _get_event_id(self, event):
id_values = (str(event[field]) for field in self.id_fields if field in event)
return "-".join(id_values)
return hashlib.md5("-".join(id_values).encode()).hexdigest()

def scroll_task_events(
self,
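`_get_event_id` now hashes the joined id fields instead of returning them raw. A sketch of the resulting ids follows, assuming a hypothetical `ID_FIELDS` tuple (the real `self.id_fields` is not shown in this diff). A fixed-length digest also stays well under Elasticsearch's 512-byte `_id` limit, which is one plausible motivation; the diff itself does not state the reason.

```python
import hashlib

# Hypothetical id fields; EventBLL.id_fields is not shown in this diff.
ID_FIELDS = ("task", "iter", "metric", "variant")


def get_event_id(event: dict) -> str:
    # Join the id fields present in the event, as the old code did...
    id_values = (str(event[field]) for field in ID_FIELDS if field in event)
    # ...then hash the joined string, as the new code does, so every id is a
    # 32-character hex digest regardless of how long metric/variant names are.
    return hashlib.md5("-".join(id_values).encode()).hexdigest()


event_id = get_event_id({"task": "abc", "iter": 7, "metric": "loss", "variant": "train"})
```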
@@ -276,7 +310,9 @@ class EventBLL(object):
}

with translate_errors_context(), TimingContext("es", "scroll_task_events"):
es_res = self.es.search(index=es_index, body=es_req, scroll="1h")
es_res = self.es.search(
index=es_index, body=es_req, scroll="1h", routing=task_id
)

events = [hit["_source"] for hit in es_res["hits"]["hits"]]
next_scroll_id = es_res["_scroll_id"]
@@ -294,10 +330,16 @@ class EventBLL(object):
"size": 0,
"aggs": {
"metrics": {
"terms": {"field": "metric"},
"terms": {
"field": "metric",
"size": EventMetrics.MAX_METRICS_COUNT,
},
"aggs": {
"variants": {
"terms": {"field": "variant"},
"terms": {
"field": "variant",
"size": EventMetrics.MAX_VARIANTS_COUNT,
},
"aggs": {
"iters": {
"terms": {
@@ -496,8 +538,18 @@ class EventBLL(object):
"size": 0,
"aggs": {
"metrics": {
"terms": {"field": "metric", "size": 200},
"aggs": {"variants": {"terms": {"field": "variant", "size": 200}}},
"terms": {
"field": "metric",
"size": EventMetrics.MAX_METRICS_COUNT,
},
"aggs": {
"variants": {
"terms": {
"field": "variant",
"size": EventMetrics.MAX_VARIANTS_COUNT,
}
}
},
}
},
"query": {"bool": {"must": [{"term": {"task": task_id}}]}},
@@ -537,14 +589,14 @@ class EventBLL(object):
"metrics": {
"terms": {
"field": "metric",
"size": 1000,
"size": EventMetrics.MAX_METRICS_COUNT,
"order": {"_term": "asc"},
},
"aggs": {
"variants": {
"terms": {
"field": "variant",
"size": 1000,
"size": EventMetrics.MAX_VARIANTS_COUNT,
"order": {"_term": "asc"},
},
"aggs": {

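The aggregations above replace hard-coded or missing `size` values with `EventMetrics.MAX_METRICS_COUNT` and `MAX_VARIANTS_COUNT`. This matters because an Elasticsearch `terms` aggregation returns only the top 10 buckets when `size` is omitted. A sketch of the metric/variant request body, using the constant values defined in `EventMetrics` in this diff; the builder function name is hypothetical.

```python
# Values copied from EventMetrics in this diff.
MAX_METRICS_COUNT = 200
MAX_VARIANTS_COUNT = 500


def metrics_variants_aggs(task_id: str) -> dict:
    """Hypothetical builder: a size-0 search body listing metric/variant pairs for one task."""
    return {
        "size": 0,  # aggregation results only, no hits
        "query": {"bool": {"must": [{"term": {"task": task_id}}]}},
        "aggs": {
            "metrics": {
                # Without an explicit "size", a terms aggregation returns only 10 buckets.
                "terms": {"field": "metric", "size": MAX_METRICS_COUNT},
                "aggs": {
                    "variants": {
                        "terms": {"field": "variant", "size": MAX_VARIANTS_COUNT}
                    }
                },
            }
        },
    }


body = metrics_variants_aggs("task-1")
```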
@@ -1,12 +1,13 @@
import itertools
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from functools import partial
from operator import itemgetter
from typing import Sequence, Tuple, Callable, Iterable

from boltons.iterutils import bucketize
from elasticsearch import Elasticsearch
from typing import Sequence, Tuple, Callable

from mongoengine import Q

from apierrors import errors
@@ -20,10 +21,19 @@ from utilities import safe_get
log = config.logger(__file__)


class EventType(Enum):
metrics_scalar = "training_stats_scalar"
metrics_vector = "training_stats_vector"
metrics_image = "training_debug_image"
metrics_plot = "plot"
task_log = "log"


class EventMetrics:
MAX_TASKS_COUNT = 100
MAX_TASKS_COUNT = 50
MAX_METRICS_COUNT = 200
MAX_VARIANTS_COUNT = 500
MAX_AGGS_ELEMENTS_COUNT = 50

def __init__(self, es: Elasticsearch):
self.es = es
@@ -62,6 +72,12 @@ class EventMetrics:
Compare scalar metrics for different tasks per metric and variant
The amount of points in each histogram should not exceed the requested samples
"""
if len(task_ids) > self.MAX_TASKS_COUNT:
raise errors.BadRequest(
f"Up to {self.MAX_TASKS_COUNT} tasks supported for comparison",
len(task_ids),
)

task_name_by_id = {}
with translate_errors_context():
task_objs = Task.get_many(
@@ -97,6 +113,31 @@ class EventMetrics:
MetricInterval = Tuple[int, Sequence[TaskMetric]]
MetricData = Tuple[str, dict]

def _split_metrics_by_max_aggs_count(
self, task_metrics: Sequence[TaskMetric]
) -> Iterable[Sequence[TaskMetric]]:
"""
Return task metrics in groups where the number of task metrics in each group
is roughly limited by MAX_AGGS_ELEMENTS_COUNT. The split is done on metrics and
variants, always keeping all the tasks of a metric/variant in the same group
"""
if len(task_metrics) < self.MAX_AGGS_ELEMENTS_COUNT:
yield task_metrics
return

tm_grouped = bucketize(task_metrics, key=itemgetter(1, 2))
groups = []
for group in tm_grouped.values():
groups.append(group)
if sum(map(len, groups)) >= self.MAX_AGGS_ELEMENTS_COUNT:
yield list(itertools.chain(*groups))
groups = []

if groups:
yield list(itertools.chain(*groups))

return

def _run_get_scalar_metrics_as_parallel(
self,
company_id: str,
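`_split_metrics_by_max_aggs_count` groups `(task, metric, variant)` tuples by metric/variant and closes a chunk once it reaches `MAX_AGGS_ELEMENTS_COUNT`, so a chunk can overshoot the limit but a metric/variant is never split across chunks. The sketch below reproduces that logic with a plain dict in place of boltons' `bucketize`; the tuple layout is assumed from `itemgetter(1, 2)`.

```python
import itertools
from collections import defaultdict
from operator import itemgetter
from typing import Iterable, List, Sequence, Tuple

# Assumed shape, consistent with itemgetter(1, 2) in the diff: (task, metric, variant).
TaskMetric = Tuple[str, str, str]


def split_by_max_aggs_count(
    task_metrics: Sequence[TaskMetric], max_count: int
) -> Iterable[List[TaskMetric]]:
    if len(task_metrics) < max_count:
        yield list(task_metrics)
        return
    # Plain-dict stand-in for boltons' bucketize(..., key=itemgetter(1, 2)).
    buckets = defaultdict(list)
    for tm in task_metrics:
        buckets[itemgetter(1, 2)(tm)].append(tm)
    groups = []
    for group in buckets.values():
        groups.append(group)
        # Cut only between (metric, variant) buckets, never inside one.
        if sum(map(len, groups)) >= max_count:
            yield list(itertools.chain(*groups))
            groups = []
    if groups:
        yield list(itertools.chain(*groups))


task_metrics = [(t, m, "x") for m in ("a", "b", "c") for t in ("t1", "t2")]
chunks = list(split_by_max_aggs_count(task_metrics, max_count=3))
```

With three buckets of two tasks each and a limit of 3, the first chunk closes at 4 elements (buckets `a` and `b`) and the remainder forms a second chunk.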
@@ -126,21 +167,25 @@ class EventMetrics:
if not intervals:
return {}

with ThreadPoolExecutor(len(intervals)) as pool:
metrics = list(
itertools.chain.from_iterable(
pool.map(
partial(
get_func, task_ids=task_ids, es_index=es_index, key=key
),
intervals,
)
intervals = list(
itertools.chain.from_iterable(
zip(itertools.repeat(i), self._split_metrics_by_max_aggs_count(tms))
for i, tms in intervals
)
)
max_concurrency = config.get("services.events.max_metrics_concurrency", 4)
with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
metrics = itertools.chain.from_iterable(
pool.map(
partial(get_func, task_ids=task_ids, es_index=es_index, key=key),
intervals,
)
)

ret = defaultdict(dict)
for metric_key, metric_values in metrics:
ret[metric_key].update(metric_values)

return ret

def _get_metric_intervals(
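The parallel path now flattens each interval into chunks and runs them on a bounded pool sized by `services.events.max_metrics_concurrency` (default 4) rather than one thread per interval. A self-contained sketch of the fan-out and merge, with a hypothetical `fetch_interval` standing in for `get_func`:

```python
import itertools
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def fetch_interval(interval, task_ids, key):
    # Hypothetical stand-in for get_func: one (interval, metrics) chunk in,
    # a list of (metric_key, {task_id: data}) pairs out.
    step, metrics = interval
    return [(m, {t: f"{key}@{step}" for t in task_ids}) for m in metrics]


def run_parallel(intervals, task_ids, key, max_concurrency=4):
    if not intervals:
        return {}
    # A fixed-size pool instead of one thread per interval.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in input order; chain flattens the per-chunk lists.
        results = itertools.chain.from_iterable(
            pool.map(partial(fetch_interval, task_ids=task_ids, key=key), intervals)
        )
        merged = defaultdict(dict)
        for metric_key, values in results:
            merged[metric_key].update(values)
    return merged


merged = run_parallel([(10, ["loss", "acc"]), (20, ["lr"])], task_ids=["t1", "t2"], key="iter")
```

The events config comment notes that the concurrency value should not exceed the number of concurrent connections configured in the Elasticsearch driver.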
@@ -310,7 +355,13 @@ class EventMetrics:
"variants": {
"terms": {"field": "variant", "size": self.MAX_VARIANTS_COUNT},
"aggs": {
"tasks": {"terms": {"field": "task"}, "aggs": aggregation}
"tasks": {
"terms": {
"field": "task",
"size": self.MAX_TASKS_COUNT,
},
"aggs": aggregation,
}
},
}
},
@@ -396,3 +447,50 @@ class EventMetrics:
]
}
}

def get_tasks_metrics(
self, company_id, task_ids: Sequence, event_type: EventType
) -> Sequence[Tuple]:
"""
For the requested tasks return all the metrics that
reported events of the requested types
"""
es_index = EventMetrics.get_index_name(company_id, event_type.value)
if not self.es.indices.exists(es_index):
return [(tid, []) for tid in task_ids]

max_concurrency = config.get("services.events.max_metrics_concurrency", 4)
with ThreadPoolExecutor(max_concurrency) as pool:
res = pool.map(
partial(
self._get_task_metrics, es_index=es_index, event_type=event_type,
),
task_ids,
)
return list(zip(task_ids, res))

def _get_task_metrics(self, task_id, es_index, event_type: EventType) -> Sequence:
es_req = {
"size": 0,
"query": {
"bool": {
"must": [
{"term": {"task": task_id}},
{"term": {"type": event_type.value}},
]
}
},
"aggs": {
"metrics": {
"terms": {"field": "metric", "size": self.MAX_METRICS_COUNT}
}
},
}

with translate_errors_context(), TimingContext("es", "_get_task_metrics"):
es_res = self.es.search(index=es_index, body=es_req, routing=task_id)

return [
metric["key"]
for metric in safe_get(es_res, "aggregations/metrics/buckets", default=[])
]

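`_get_task_metrics` issues a `size: 0` search with a `terms` aggregation and returns only the bucket keys. A sketch of that extraction against a hand-written response follows; `get_path` is a hypothetical approximation of `utilities.safe_get`, whose exact behavior is not shown here. The `routing=task_id` argument presumably matches how events are routed at index time, which lets Elasticsearch query a single shard.

```python
from typing import Any, List


def get_path(data: Any, path: str, default=None):
    # Hypothetical approximation of utilities.safe_get: walk a "/"-separated path
    # through nested dicts and fall back to `default` when a step is missing.
    for part in path.split("/"):
        if not isinstance(data, dict) or part not in data:
            return default
        data = data[part]
    return data


def metric_names(es_res: dict) -> List[str]:
    buckets = get_path(es_res, "aggregations/metrics/buckets", default=[])
    return [bucket["key"] for bucket in buckets]


# Hand-written sample shaped like a terms aggregation response.
sample_response = {
    "aggregations": {
        "metrics": {
            "buckets": [{"key": "loss", "doc_count": 12}, {"key": "acc", "doc_count": 3}]
        }
    }
}
names = metric_names(sample_response)
```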
@@ -9,9 +9,12 @@ import es_factory
from apierrors import errors
from bll.queue.queue_metrics import QueueMetrics
from bll.workers import WorkerBLL
from config import config
from database.errors import translate_errors_context
from database.model.queue import Queue, Entry

log = config.logger(__file__)


class QueueBLL(object):
def __init__(self, worker_bll: WorkerBLL = None, es: Elasticsearch = None):
@@ -189,9 +192,7 @@ class QueueBLL(object):
"""
with translate_errors_context():
query = dict(id=queue_id, company=company_id)
queue = Queue.objects(**query).modify(
pop__entries=-1, last_update=datetime.utcnow(), upsert=False
)
queue = Queue.objects(**query).modify(pop__entries=-1, upsert=False)
if not queue:
raise errors.bad_request.InvalidQueueId(**query)

@@ -200,6 +201,11 @@ class QueueBLL(object):
if not queue.entries:
return

try:
Queue.objects(**query).update(last_update=datetime.utcnow())
except Exception:
log.exception("Error while updating Queue.last_update")

return queue.entries[0]

def remove_task(self, company_id: str, queue_id: str, task_id: str) -> int:
44
server/bll/redis_cache_manager.py
Normal file
@@ -0,0 +1,44 @@
from typing import Optional, TypeVar, Generic, Type

from redis import StrictRedis

from timing_context import TimingContext

T = TypeVar("T")


class RedisCacheManager(Generic[T]):
"""
Class for storing/retrieving state objects in redis

self.state_class - class of the state
self.redis - instance of redis
self.expiration_interval - expiration interval in seconds
"""

def __init__(
self, state_class: Type[T], redis: StrictRedis, expiration_interval: int
):
self.state_class = state_class
self.redis = redis
self.expiration_interval = expiration_interval

def set_state(self, state: T) -> None:
redis_key = self._get_redis_key(state.id)
with TimingContext("redis", "cache_set_state"):
self.redis.set(redis_key, state.to_json())
self.redis.expire(redis_key, self.expiration_interval)

def get_state(self, state_id) -> Optional[T]:
redis_key = self._get_redis_key(state_id)
with TimingContext("redis", "cache_get_state"):
response = self.redis.get(redis_key)
if response:
return self.state_class.from_json(response)

def delete_state(self, state_id) -> None:
with TimingContext("redis", "cache_delete_state"):
self.redis.delete(self._get_redis_key(state_id))

def _get_redis_key(self, state_id):
return f"{self.state_class}/{state_id}"
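`RedisCacheManager` relies only on a state object exposing `id`, `to_json()` and a `from_json()` classmethod. The sketch below exercises a trimmed copy (without `TimingContext`) against a tiny in-memory fake; `FakeRedis` and `ScrollState` are hypothetical. Note that `_get_redis_key` formats the class object itself, so keys look like `<class '...ScrollState'>/s1`.

```python
import json
from typing import Dict, Optional


class FakeRedis:
    """Hypothetical in-memory stand-in for the StrictRedis calls used below."""

    def __init__(self):
        self.data: Dict[str, str] = {}
        self.ttl: Dict[str, int] = {}

    def set(self, key, value):
        self.data[key] = value

    def expire(self, key, seconds):
        self.ttl[key] = seconds

    def get(self, key) -> Optional[str]:
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)
        self.ttl.pop(key, None)


class ScrollState:
    """Hypothetical state class implementing the id/to_json/from_json contract."""

    def __init__(self, id: str, cursor: int = 0):
        self.id = id
        self.cursor = cursor

    def to_json(self) -> str:
        return json.dumps({"id": self.id, "cursor": self.cursor})

    @classmethod
    def from_json(cls, s) -> "ScrollState":
        return cls(**json.loads(s))


class CacheManager:
    # Trimmed copy of RedisCacheManager from the diff, without TimingContext.
    def __init__(self, state_class, redis, expiration_interval: int):
        self.state_class = state_class
        self.redis = redis
        self.expiration_interval = expiration_interval

    def set_state(self, state) -> None:
        key = self._get_redis_key(state.id)
        self.redis.set(key, state.to_json())
        self.redis.expire(key, self.expiration_interval)

    def get_state(self, state_id):
        response = self.redis.get(self._get_redis_key(state_id))
        if response:
            return self.state_class.from_json(response)

    def delete_state(self, state_id) -> None:
        self.redis.delete(self._get_redis_key(state_id))

    def _get_redis_key(self, state_id):
        return f"{self.state_class}/{state_id}"


fake_redis = FakeRedis()
manager = CacheManager(ScrollState, fake_redis, expiration_interval=600)
manager.set_state(ScrollState("s1", cursor=42))
restored = manager.get_state("s1")
```

One design note: redis-py's `set()` also accepts `ex=` (expiry in seconds), which stores the value and its TTL in a single command instead of separate `set` and `expire` calls.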
@@ -6,6 +6,8 @@ from time import sleep
import attr
import psutil

from utilities.threads_manager import ThreadsManager


class ResourceMonitor(Thread):
@attr.s(auto_attribs=True)
@@ -58,7 +60,9 @@ class ResourceMonitor(Thread):
)

def run(self):
while True:
while not ThreadsManager.terminating:
sleep(self.sample_interval_sec)

sample = self._get_sample()

with self._lock:
@@ -67,21 +71,20 @@ class ResourceMonitor(Thread):
self._avg = self._avg.avg(sample, self._count)
self._count += 1

sleep(self.sample_interval_sec)

def get_stats(self) -> dict:
""" Returns current resource statistics and clears internal resource statistics """
with self._lock:
min_ = attr.asdict(self._min)
max_ = attr.asdict(self._max)
avg = attr.asdict(self._avg)
res = {
"interval_sec": (datetime.utcnow() - self._clear_time).total_seconds(),
"num_cores": psutil.cpu_count(),
**{
k: {"min": v, "max": max_[k], "avg": avg[k]}
for k, v in min_.items()
}
}
interval = datetime.utcnow() - self._clear_time
self._clear()
return res

return {
"interval_sec": interval.total_seconds(),
"num_cores": psutil.cpu_count(),
**{
k: {"min": v, "max": max_[k], "avg": avg[k]}
for k, v in min_.items()
}
}

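The `while True` loops in `ResourceMonitor`, `StatisticsReporter` and the task watchdog now test `ThreadsManager.terminating` on every cycle, so background threads can finish once shutdown begins. A minimal sketch of the pattern; `Terminator` is a hypothetical stand-in for `ThreadsManager`, whose code is not in this diff.

```python
import threading
import time


class Terminator:
    # Hypothetical stand-in for ThreadsManager: a class-level flag set at shutdown.
    terminating = False


def monitor_loop(samples: list, interval_sec: float) -> None:
    # Re-check the flag each cycle instead of looping with `while True`.
    while not Terminator.terminating:
        time.sleep(interval_sec)
        samples.append(time.monotonic())


samples: list = []
worker = threading.Thread(target=monitor_loop, args=(samples, 0.01), daemon=True)
worker.start()
time.sleep(0.05)
Terminator.terminating = True  # signal shutdown
worker.join(timeout=1)
```

A `threading.Event` with `event.wait(interval)` in place of `sleep` would additionally wake the thread immediately on shutdown instead of after the current sleep finishes.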
@@ -53,11 +53,8 @@ class StatisticsReporter:
report_interval = timedelta(
hours=config.get("apiserver.statistics.report_interval_hours", 24)
)

while True:

sleep(report_interval.total_seconds())

sleep(report_interval.total_seconds())
while not ThreadsManager.terminating:
try:
for company in Company.objects(
defaults__stats_option__enabled=True
@@ -68,6 +65,8 @@ class StatisticsReporter:
except Exception as ex:
log.exception(f"Failed collecting stats: {str(ex)}")

sleep(report_interval.total_seconds())

@classmethod
@threads.register("sender", daemon=True)
def start_sender(cls):
@@ -86,7 +85,7 @@ class StatisticsReporter:

WarningFilter.attach()

while True:
while not ThreadsManager.terminating:
try:
report = cls.send_queue.get()

@@ -4,4 +4,5 @@ from .utils import (
update_project_time,
validate_status_change,
split_by,
ParameterKeyEscaper,
)

@@ -1,31 +1,41 @@
import re
from collections import OrderedDict
from datetime import datetime, timedelta
from operator import attrgetter
from random import random
from time import sleep
from typing import Collection, Sequence, Tuple, Any
from typing import Collection, Sequence, Tuple, Any, Optional, List, Dict

import pymongo.results
import six
from mongoengine import Q
from six import string_types

import database.utils as dbutils
import es_factory
from apierrors import errors
from apimodels.tasks import Artifact as ApiArtifact
from config import config
from database.errors import translate_errors_context
from database.model.model import Model
from database.model.project import Project
from database.model.task.metrics import EventStats, MetricEventStats
from database.model.task.output import Output
from database.model.task.task import (
Task,
TaskStatus,
TaskStatusMessage,
TaskSystemTags,
ArtifactModes,
Artifact,
)
from database.utils import get_company_or_none_constraint, id as create_id
from service_repo import APICall
from timing_context import TimingContext
from utilities.dicts import deep_merge
from utilities.threads_manager import ThreadsManager
from .utils import ChangeStatusRequest, validate_status_change
from .utils import ChangeStatusRequest, validate_status_change, ParameterKeyEscaper

log = config.logger(__file__)


class TaskBLL(object):
@@ -144,6 +154,61 @@ class TaskBLL(object):

return model

@classmethod
def clone_task(
cls,
company_id,
user_id,
task_id,
name: Optional[str] = None,
comment: Optional[str] = None,
parent: Optional[str] = None,
project: Optional[str] = None,
tags: Optional[Sequence[str]] = None,
system_tags: Optional[Sequence[str]] = None,
execution_overrides: Optional[dict] = None,
) -> Task:
task = cls.get_by_id(company_id=company_id, task_id=task_id)
execution_dict = task.execution.to_proper_dict() if task.execution else {}
if execution_overrides:
parameters = execution_overrides.get("parameters")
if parameters is not None:
execution_overrides["parameters"] = {
ParameterKeyEscaper.escape(k): v for k, v in parameters.items()
}
execution_dict = deep_merge(execution_dict, execution_overrides)
artifacts = execution_dict.get("artifacts")
if artifacts:
execution_dict["artifacts"] = [
a for a in artifacts if a.get("mode") != ArtifactModes.output
]
now = datetime.utcnow()

with translate_errors_context():
new_task = Task(
id=create_id(),
user=user_id,
company=company_id,
created=now,
last_update=now,
name=name or task.name,
comment=comment or task.comment,
parent=parent or task.parent,
project=project or task.project,
tags=tags or task.tags,
system_tags=system_tags or [],
type=task.type,
script=task.script,
output=Output(destination=task.output.destination)
if task.output
else None,
execution=execution_dict,
)
cls.validate(new_task)
new_task.save()

return new_task

@classmethod
def validate(cls, task: Task):
assert isinstance(task, Task)
@@ -153,23 +218,13 @@ class TaskBLL(object):
):
raise errors.bad_request.InvalidTaskId("invalid parent", parent=task.parent)

if task.project:
Project.get_for_writing(company=task.company, id=task.project)
if task.project and not Project.get_for_writing(
company=task.company, id=task.project
):
raise errors.bad_request.InvalidProjectId(id=task.project)

cls.validate_execution_model(task)

if task.execution:
if task.execution.parameters:
cls._validate_execution_parameters(task.execution.parameters)

@staticmethod
def _validate_execution_parameters(parameters):
invalid_keys = [k for k in parameters if re.search(r"\s", k)]
if invalid_keys:
raise errors.bad_request.ValidationError(
"execution.parameters keys contain whitespace", keys=invalid_keys
)

@staticmethod
def get_unique_metric_variants(company_id, project_ids=None):
pipeline = [
@@ -226,7 +281,8 @@ class TaskBLL(object):
last_update: datetime = None,
last_iteration: int = None,
last_iteration_max: int = None,
last_values: Sequence[Tuple[Tuple[str, ...], Any]] = None,
last_scalar_values: Sequence[Tuple[Tuple[str, ...], Any]] = None,
last_events: Dict[str, Dict[str, dict]] = None,
**extra_updates,
):
"""
@@ -238,7 +294,8 @@ class TaskBLL(object):
task's last iteration value.
:param last_iteration_max: Last reported iteration. Use this to conditionally set a value only
if the current task's last iteration value is smaller than the provided value.
:param last_values: Last reported metrics summary (value, metric, variant).
:param last_scalar_values: Last reported metrics summary for scalar events (value, metric, variant).
:param last_events: Last reported metrics summary (value, metric, event type).
:param extra_updates: Extra task updates to include in this update call.
:return:
"""
@@ -249,17 +306,33 @@ class TaskBLL(object):
elif last_iteration_max is not None:
extra_updates.update(max__last_iteration=last_iteration_max)

if last_values is not None:
if last_scalar_values is not None:

def op_path(op, *path):
return "__".join((op, "last_metrics") + path)

for path, value in last_values:
for path, value in last_scalar_values:
extra_updates[op_path("set", *path)] = value
if path[-1] == "value":
extra_updates[op_path("min", *path[:-1], "min_value")] = value
extra_updates[op_path("max", *path[:-1], "max_value")] = value

if last_events is not None:

def events_per_type(metric_data: Dict[str, dict]) -> Dict[str, EventStats]:
return {
event_type: EventStats(last_update=event["timestamp"])
for event_type, event in metric_data.items()
}

metric_stats = {
dbutils.hash_field_name(metric_key): MetricEventStats(
metric=metric_key, event_stats_by_type=events_per_type(metric_data),
)
for metric_key, metric_data in last_events.items()
}
extra_updates["metric_stats"] = metric_stats

Task.objects(id=task_id, company=company_id).update(
upsert=False, last_update=last_update, **extra_updates
)
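`update_statistics` translates flattened `(path, value)` pairs into mongoengine update keywords: a `set__` for every leaf, plus `min__`/`max__` for `value` leaves so the running extremes are tracked alongside the latest value. A sketch of that translation with plain strings as path components (in the server they are hashed metric/variant names); the wrapper function name is hypothetical.

```python
from typing import Any, Dict, Sequence, Tuple


def last_metrics_updates(
    last_scalar_values: Sequence[Tuple[Tuple[str, ...], Any]]
) -> Dict[str, Any]:
    def op_path(op, *path):
        # mongoengine update keyword: "<op>__last_metrics__<metric>__<variant>__<field>"
        return "__".join((op, "last_metrics") + path)

    updates = {}
    for path, value in last_scalar_values:
        updates[op_path("set", *path)] = value
        if path[-1] == "value":
            # $min/$max keep running extremes next to the latest value.
            updates[op_path("min", *path[:-1], "min_value")] = value
            updates[op_path("max", *path[:-1], "max_value")] = value
    return updates


updates = last_metrics_updates(
    [(("m1", "v1", "value"), 0.25), (("m1", "v1", "metric"), "loss")]
)
```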
@@ -373,7 +446,7 @@ class TaskBLL(object):
:return: updated task fields
"""

task = TaskBLL.get_task_with_access(
task = cls.get_task_with_access(
task_id,
company_id=company_id,
only=(
@@ -411,6 +484,97 @@ class TaskBLL(object):
force=force,
).execute()

@classmethod
def add_or_update_artifacts(
cls, task_id: str, company_id: str, artifacts: List[ApiArtifact]
) -> Tuple[List[str], List[str]]:
key = attrgetter("key", "mode")

if not artifacts:
return [], []

with translate_errors_context(), TimingContext("mongo", "update_artifacts"):
artifacts: List[Artifact] = [
Artifact(**artifact.to_struct()) for artifact in artifacts
]

attempts = int(config.get("services.tasks.artifacts.update_attempts", 10))

for retry in range(attempts):
task = cls.get_task_with_access(
task_id, company_id=company_id, requires_write_access=True
)

current = list(map(key, task.execution.artifacts))
updated = [a for a in artifacts if key(a) in current]
added = [a for a in artifacts if a not in updated]

filter = {"_id": task_id, "company": company_id}
update = {}
array_filters = None
if current:
filter["execution.artifacts"] = {
"$size": len(current),
"$all": [
*(
{"$elemMatch": {"key": key, "mode": mode}}
for key, mode in current
)
],
}
else:
filter["$or"] = [
{"execution.artifacts": {"$exists": False}},
{"execution.artifacts": {"$size": 0}},
]

if added:
update["$push"] = {
"execution.artifacts": {"$each": [a.to_mongo() for a in added]}
}
if updated:
update["$set"] = {
f"execution.artifacts.$[artifact{index}]": artifact.to_mongo()
for index, artifact in enumerate(updated)
}
array_filters = [
{
f"artifact{index}.key": artifact.key,
f"artifact{index}.mode": artifact.mode,
}
for index, artifact in enumerate(updated)
]

if not update:
return [], []

result: pymongo.results.UpdateResult = Task._get_collection().update_one(
filter=filter,
update=update,
array_filters=array_filters,
upsert=False,
)

if result.matched_count >= 1:
break

wait_msec = random() * int(
config.get("services.tasks.artifacts.update_retry_msec", 500)
)

log.warning(
f"Failed to update artifacts for task {task_id} (updated by another party),"
f" retrying {retry+1}/{attempts} in {wait_msec}ms"
)

sleep(wait_msec / 1000)
else:
raise errors.server_error.UpdateFailed(
"task artifacts updated by another party"
)

return [a.key for a in added], [a.key for a in updated]

@classmethod
@threads.register("non_responsive_tasks_watchdog", daemon=True)
def start_non_responsive_tasks_watchdog(cls):
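`add_or_update_artifacts` is an optimistic-concurrency loop: it reads the task, issues `update_one` with a filter that matches only if `execution.artifacts` still holds exactly the `(key, mode)` pairs it read, and retries after a random delay when nothing matched. The sketch below models the same loop against an in-memory document with a version counter instead of MongoDB; every name in it is hypothetical.

```python
import random
import time


class UpdateFailed(Exception):
    """Hypothetical stand-in for errors.server_error.UpdateFailed."""


class VersionedDoc:
    """In-memory document with a version counter; compare_and_update plays the
    role of update_one with a filter pinning the artifact list read earlier."""

    def __init__(self):
        self.version = 0
        self.artifacts = {}

    def compare_and_update(self, expected_version: int, changes: dict) -> int:
        if self.version != expected_version:
            return 0  # like matched_count == 0: another writer got there first
        self.artifacts.update(changes)
        self.version += 1
        return 1


def update_with_retries(doc, changes, attempts=10, retry_msec=500, before_write=None):
    for retry in range(attempts):
        seen = doc.version  # re-read the current state on every attempt
        if before_write:
            before_write(retry)  # hypothetical hook used below to simulate a concurrent writer
        if doc.compare_and_update(seen, changes):
            break
        # Random jitter so competing writers do not retry in lockstep.
        time.sleep(random.random() * retry_msec / 1000)
    else:
        # for/else: reached only if the loop never hit `break`
        raise UpdateFailed("artifacts updated by another party")
    return doc.artifacts


doc = VersionedDoc()


def interfere_once(retry):
    if retry == 0:
        doc.version += 1  # another party modifies the document once


result = update_with_retries(doc, {"model": "weights.pt"}, retry_msec=1, before_write=interfere_once)
```

The first attempt loses the race, the second succeeds, and the `for ... else` branch raises only if every attempt fails, mirroring the structure of the server code.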
@@ -421,13 +585,11 @@ class TaskBLL(object):
"services.tasks.non_responsive_tasks_watchdog.threshold_sec", 7200
)
)
while True:
sleep(
config.get(
"services.tasks.non_responsive_tasks_watchdog.watch_interval_sec",
900,
)
)
watch_interval = config.get(
"services.tasks.non_responsive_tasks_watchdog.watch_interval_sec", 900
)
sleep(watch_interval)
while not ThreadsManager.terminating:
try:

ref_time = datetime.utcnow() - threshold
@@ -463,6 +625,8 @@ class TaskBLL(object):
except Exception as ex:
log.exception(f"Failed stopping tasks: {str(ex)}")

sleep(watch_interval)

@staticmethod
def get_aggregated_project_execution_parameters(
company_id,
@@ -502,10 +666,7 @@ class TaskBLL(object):
]

with translate_errors_context():
result = next(
Task.aggregate(*pipeline),
None,
)
result = next(Task.aggregate(*pipeline), None)

total = 0
remaining = 0
@@ -513,7 +674,10 @@ class TaskBLL(object):

if result:
total = int(result.get("total", -1))
results = [r["_id"] for r in result.get("results", [])]
results = [
ParameterKeyEscaper.unescape(r["_id"])
for r in result.get("results", [])
]
remaining = max(0, total - (len(results) + page * page_size))

return total, remaining, results

@@ -3,6 +3,7 @@ from typing import TypeVar, Callable, Tuple, Sequence

import attr
import six
from boltons.dictutils import OneToOne

from apierrors import errors
from database.errors import translate_errors_context
@@ -171,3 +172,26 @@ def split_by(
[item for cond, item in applied if cond],
[item for cond, item in applied if not cond],
)


class ParameterKeyEscaper:
_mapping = OneToOne({".": "%2E", "$": "%24"})

@classmethod
def escape(cls, value):
""" Quote a parameter key """
value = value.strip().replace("%", "%%")
for c, r in cls._mapping.items():
value = value.replace(c, r)
return value

@classmethod
def _unescape(cls, value):
for c, r in cls._mapping.inv.items():
value = value.replace(c, r)
return value

@classmethod
def unescape(cls, value):
""" Unquote a quoted parameter key """
return "%".join(map(cls._unescape, value.split("%%")))

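`ParameterKeyEscaper` makes parameter keys safe as MongoDB field names, in which `.` and `$` are restricted. Literal `%` is doubled first, so `unescape` can split on `%%` and never mistake a user's literal `%2E` for an escaped dot. A dependency-free sketch follows, using a plain dict and its inverse in place of boltons' `OneToOne` (whose `.inv` attribute provides the reverse mapping).

```python
class ParameterKeyEscaper:
    # Same mapping as the diff; a dict plus its inverse replaces boltons' OneToOne.
    _mapping = {".": "%2E", "$": "%24"}
    _inverse = {v: k for k, v in _mapping.items()}

    @classmethod
    def escape(cls, value: str) -> str:
        """Quote a key so it contains no "." or "$"."""
        value = value.strip().replace("%", "%%")  # double existing "%" first
        for c, r in cls._mapping.items():
            value = value.replace(c, r)
        return value

    @classmethod
    def _unescape(cls, value: str) -> str:
        for c, r in cls._inverse.items():
            value = value.replace(c, r)
        return value

    @classmethod
    def unescape(cls, value: str) -> str:
        """Split on the doubled "%%" so literal percent signs are never decoded."""
        return "%".join(map(cls._unescape, value.split("%%")))


escaped = ParameterKeyEscaper.escape("lr.decay$%")
```

Note that `escape` also strips surrounding whitespace, so keys with leading or trailing spaces do not round-trip exactly.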
@@ -47,7 +47,7 @@ class BasicConfig:
def logger(self, name):
if Path(name).is_file():
name = Path(name).stem
path = ".".join((self.prefix, Path(name).stem))
path = ".".join((self.prefix, name))
return logging.getLogger(path)

def _read_extra_env_config_values(self):

@@ -34,6 +34,12 @@
aggregate {
allow_disk_use: true
}

pre_populate {
enabled: false
zip_file: "/path/to/export.zip"
fail_on_error: false
}
}

auth {

@@ -32,6 +32,11 @@ mongo {
}

redis {
apiserver {
host: "127.0.0.1"
port: 6379
db: 0
}
workers {
host: "127.0.0.1"
port: 6379

@@ -1,3 +1,9 @@
{
es_index_prefix:"events"
}
es_index_prefix: "events"

ignore_iteration {
metrics: [":monitor:machine", ":monitor:gpu"]
}

# max number of concurrent queries to ES when calculating events metrics
# should not exceed the amount of concurrent connections set in the ES driver
max_metrics_concurrency: 4
@@ -5,3 +5,8 @@ non_responsive_tasks_watchdog {
# Watchdog will sleep for this number of seconds after each cycle
watch_interval_sec: 900
}

artifacts {
update_attempts: 10
update_retry_msec: 500
}
@@ -1,43 +1,43 @@
from functools import lru_cache
from pathlib import Path
from os import getenv
from pathlib import Path
from version import __version__

from config import config

root = Path(__file__).parent.parent


@lru_cache()
def get_build_number():
try:
return (root / "BUILD").read_text().strip()
except FileNotFoundError:
return ""


@lru_cache()
def get_version():
try:
return (root / "VERSION").read_text().strip()
except FileNotFoundError:
return ""


@lru_cache()
def get_commit_number():
try:
return (root / "COMMIT").read_text().strip()
except FileNotFoundError:
return ""


@lru_cache()
def get_deployment_type() -> str:
value = getenv("TRAINS_SERVER_DEPLOYMENT_TYPE")
def _get(prop_name, env_suffix=None, default=""):
value = getenv(f"TRAINS_SERVER_{env_suffix or prop_name}")
if value:
return value

try:
value = (root / "DEPLOY").read_text().strip()
return (root / prop_name).read_text().strip()
except FileNotFoundError:
pass
return default

return value or "manual"

@lru_cache()
def get_build_number():
return _get("BUILD")


@lru_cache()
def get_version():
return _get("VERSION", default=__version__)


@lru_cache()
def get_commit_number():
return _get("COMMIT")


@lru_cache()
def get_deployment_type() -> str:
return _get("DEPLOY", env_suffix="DEPLOYMENT_TYPE", default="manual")


def get_default_company():
return config.get("apiserver.default_company")

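The refactored version module resolves each property in a fixed order: a `TRAINS_SERVER_*` environment variable, then a file of the same name under `root`, then a default. A sketch of that lookup follows; `make_getter` is a hypothetical wrapper that lets the example point `root` at a temporary directory. In the server each getter is wrapped in `lru_cache`, so the values are read once per process.

```python
import os
import tempfile
from pathlib import Path


def make_getter(root: Path):
    # Hypothetical factory; the diff uses a module-level `root` instead.
    def _get(prop_name, env_suffix=None, default=""):
        # 1. Environment variable TRAINS_SERVER_<env_suffix or prop_name>
        value = os.getenv(f"TRAINS_SERVER_{env_suffix or prop_name}")
        if value:
            return value
        # 2. A file named after the property under root (an empty file yields "")
        try:
            return (root / prop_name).read_text().strip()
        except FileNotFoundError:
            pass
        # 3. The caller's default
        return default

    return _get


with tempfile.TemporaryDirectory() as tmp:
    _get = make_getter(Path(tmp))
    (Path(tmp) / "BUILD").write_text("123\n")
    build = _get("BUILD")
    commit = _get("COMMIT")
    deploy_default = _get("DEPLOY", env_suffix="DEPLOYMENT_TYPE", default="manual")
    os.environ["TRAINS_SERVER_DEPLOYMENT_TYPE"] = "docker"
    deploy_env = _get("DEPLOY", env_suffix="DEPLOYMENT_TYPE", default="manual")
    del os.environ["TRAINS_SERVER_DEPLOYMENT_TYPE"]
```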
@@ -52,7 +52,7 @@ class User(DbModelMixin, AuthDocument):
|
||||
meta = {"db_alias": Database.auth, "strict": strict}
|
||||
|
||||
id = StringField(primary_key=True)
|
||||
name = StringField(unique_with="company")
|
||||
name = StringField()
|
||||
|
||||
created = DateTimeField()
|
||||
""" User auth entry creation time """
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
import re
|
||||
from collections import namedtuple
|
||||
from functools import reduce
|
||||
from typing import Collection, Sequence, Union
|
||||
from typing import Collection, Sequence, Union, Optional
|
||||
|
||||
from boltons.iterutils import first
|
||||
from dateutil.parser import parse as parse_datetime
|
||||
@@ -60,7 +60,7 @@ class ProperDictMixin(object):
|
||||
|
||||
class GetMixin(PropsMixin):
|
||||
_text_score = "$text_score"
|
||||
|
||||
_projection_key = "projection"
|
||||
_ordering_key = "order_by"
|
||||
_search_text_key = "search_text"
|
||||
|
||||
@@ -270,11 +270,26 @@ class GetMixin(PropsMixin):
            return override_projection
        if not parameters:
            return []
        return parameters.get("projection") or parameters.get("only_fields", [])
        return parameters.get(cls._projection_key) or parameters.get("only_fields", [])

    @classmethod
    def set_default_ordering(cls, parameters, value):
        parameters[cls._ordering_key] = parameters.get(cls._ordering_key) or value
    def set_projection(cls, parameters: dict, value: Sequence[str]) -> Sequence[str]:
        parameters.pop("only_fields", None)
        parameters[cls._projection_key] = value
        return value

    @classmethod
    def get_ordering(cls, parameters: dict) -> Optional[Sequence[str]]:
        return parameters.get(cls._ordering_key)

    @classmethod
    def set_ordering(cls, parameters: dict, value: Sequence[str]) -> Sequence[str]:
        parameters[cls._ordering_key] = value
        return value

    @classmethod
    def set_default_ordering(cls, parameters: dict, value: Sequence[str]) -> None:
        cls.set_ordering(parameters, cls.get_ordering(parameters) or value)

    @classmethod
    def get_many_with_join(

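The new `set_projection`, `get_ordering`, `set_ordering` and `set_default_ordering` classmethods route all access through the `_projection_key` and `_ordering_key` class attributes. A standalone sketch that copies their logic onto a hypothetical `OrderingParams` class (without `PropsMixin`), to show how a request's parameters dict is rewritten:

```python
from typing import Optional, Sequence


class OrderingParams:
    """Hypothetical stand-alone copy of the GetMixin projection/ordering helpers."""

    _projection_key = "projection"
    _ordering_key = "order_by"

    @classmethod
    def set_projection(cls, parameters: dict, value: Sequence[str]) -> Sequence[str]:
        # The legacy "only_fields" key is dropped in favour of "projection"
        parameters.pop("only_fields", None)
        parameters[cls._projection_key] = value
        return value

    @classmethod
    def get_ordering(cls, parameters: dict) -> Optional[Sequence[str]]:
        return parameters.get(cls._ordering_key)

    @classmethod
    def set_ordering(cls, parameters: dict, value: Sequence[str]) -> Sequence[str]:
        parameters[cls._ordering_key] = value
        return value

    @classmethod
    def set_default_ordering(cls, parameters: dict, value: Sequence[str]) -> None:
        # Fill in the ordering only when the caller did not supply a non-empty one
        cls.set_ordering(parameters, cls.get_ordering(parameters) or value)


params = {"only_fields": ["id"], "order_by": []}
OrderingParams.set_projection(params, ["id", "name"])
OrderingParams.set_default_ordering(params, ["-created"])
print(params)  # {'order_by': ['-created'], 'projection': ['id', 'name']}
```

An explicitly supplied but empty `order_by` is treated as missing, because `or` falls back on any falsy value.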
@@ -40,10 +40,6 @@ class Settings(DbModelMixin, Document):
        """ Sets a new value or adds a new key/value setting (if key does not exist) """
        key = key.strip(sep)
        res = Settings.objects(key=key).update(key=key, value=value, upsert=True)
        # if Settings.objects(key=key).only("key"):
        #
        # else:
        #     res = Settings(key=key, value=value).save()
        return bool(res)

    @classmethod

@@ -1,10 +1,18 @@
from mongoengine import EmbeddedDocument, StringField, DynamicField
from mongoengine import (
    EmbeddedDocument,
    StringField,
    DynamicField,
    LongField,
    EmbeddedDocumentField,
)

from database.fields import SafeMapField


class MetricEvent(EmbeddedDocument):
    meta = {
        # For backwards compatibility reasons
        'strict': False,
        "strict": False,
    }

    metric = StringField(required=True)
@@ -12,3 +20,20 @@ class MetricEvent(EmbeddedDocument):
    value = DynamicField(required=True)
    min_value = DynamicField() # for backwards compatibility reasons
    max_value = DynamicField() # for backwards compatibility reasons


class EventStats(EmbeddedDocument):
    meta = {
        # For backwards compatibility reasons
        "strict": False,
    }
    last_update = LongField()


class MetricEventStats(EmbeddedDocument):
    meta = {
        # For backwards compatibility reasons
        "strict": False,
    }
    metric = StringField(required=True)
    event_stats_by_type = SafeMapField(field=EmbeddedDocumentField(EventStats))

@@ -18,10 +18,11 @@ from database.fields import (
    SafeSortedListField,
)
from database.model import AttributedDocument
from database.model.base import ProperDictMixin
from database.model.model_labels import ModelLabels
from database.model.project import Project
from database.utils import get_options
from .metrics import MetricEvent
from .metrics import MetricEvent, MetricEventStats
from .output import Output

DEFAULT_LAST_ITERATION = 0
@@ -66,10 +67,15 @@ class ArtifactTypeData(EmbeddedDocument):
    data_hash = StringField()


class ArtifactModes:
    input = "input"
    output = "output"


class Artifact(EmbeddedDocument):
    key = StringField(required=True)
    type = StringField(required=True)
    mode = StringField(choices=("input", "output"), default="output")
    mode = StringField(choices=get_options(ArtifactModes), default=ArtifactModes.output)
    uri = StringField()
    hash = StringField()
    content_size = LongField()
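`database.utils.get_options` is imported here, but its body is not part of this diff. A plausible minimal version, assuming it returns the public attribute values of a constants class, shows why `choices=get_options(ArtifactModes)` accepts the same values as the old literal tuple while keeping them defined in one place. The helper below is hypothetical, not the project's implementation.

```python
from typing import List


def get_options(cls: type) -> List[str]:
    """Hypothetical helper: collect the public, non-callable class attributes in definition order."""
    return [
        value
        for name, value in vars(cls).items()
        if not name.startswith("_") and not callable(value)
    ]


class ArtifactModes:
    input = "input"
    output = "output"


print(get_options(ArtifactModes))  # ['input', 'output']
```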
@@ -78,7 +84,7 @@ class Artifact(EmbeddedDocument):
    display_data = SafeSortedListField(ListField(UnionField((int, float, str))))


class Execution(EmbeddedDocument):
class Execution(EmbeddedDocument, ProperDictMixin):
    test_split = IntField(default=0)
    parameters = SafeDictField(default=dict)
    model = StringField(reference_field="Model")
@@ -156,3 +162,4 @@ class Task(AttributedDocument):
    last_update = DateTimeField()
    last_iteration = IntField(default=DEFAULT_LAST_ITERATION)
    last_metrics = SafeMapField(field=SafeMapField(EmbeddedDocumentField(MetricEvent)))
    metric_stats = SafeMapField(field=EmbeddedDocumentField(MetricEventStats))

@@ -1,7 +1,6 @@
from mongoengine import Document, StringField
from mongoengine import Document, StringField, DynamicField

from database import Database, strict
from database.fields import SafeDictField
from database.model import DbModelMixin
from database.model.company import Company

@@ -18,4 +17,4 @@ class User(DbModelMixin, Document):
    family_name = StringField(user_set_allowed=True)
    given_name = StringField(user_set_allowed=True)
    avatar = StringField()
    preferences = SafeDictField(default=dict, exclude_by_default=True)
    preferences = DynamicField(default="", exclude_by_default=True)

@@ -96,7 +96,12 @@ def parse_from_call(call_data, fields, cls_fields, discard_none_values=True):
                continue
            if desc:
                if callable(desc):
                    desc(value)
                    try:
                        desc(value)
                    except TypeError:
                        raise ParseCallError(f"expecting {desc.__name__}", field=field)
                    except Exception as ex:
                        raise ParseCallError(str(ex), field=field)
                else:
                    if issubclass(desc, (list, tuple, dict)) and not isinstance(
                        value, desc

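The new `try` block turns exceptions raised by a callable field descriptor into `ParseCallError`s: a `TypeError` is reported as a type mismatch, anything else keeps its own message. A minimal, self-contained sketch of the same pattern; `ParseCallError` and `validate_field` here are hypothetical stand-ins that only mirror the `(message, field=...)` call shape seen in the diff.

```python
class ParseCallError(Exception):
    """Hypothetical stand-in for the server's ParseCallError."""

    def __init__(self, msg: str, field: str):
        super().__init__(f"{field}: {msg}")
        self.field = field


def validate_field(field: str, value, desc) -> None:
    """Run a callable descriptor on a value and convert failures into ParseCallError."""
    try:
        desc(value)
    except TypeError:
        # e.g. int(None) raises TypeError: report the expected type by name
        raise ParseCallError(f"expecting {desc.__name__}", field=field)
    except Exception as ex:
        # e.g. int("abc") raises ValueError: keep the original message
        raise ParseCallError(str(ex), field=field)
```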
@@ -10,7 +10,11 @@ from pathlib import Path
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

HERE = Path(__file__).parent
HERE = Path(__file__).resolve().parent

session = requests.Session()
adapter = HTTPAdapter(max_retries=Retry(5, backoff_factor=0.5))
session.mount('http://', adapter)


def apply_mappings_to_host(host: str):
@@ -20,10 +24,6 @@ def apply_mappings_to_host(host: str):
        es_server = host
        url = f"{es_server}/_template/{f.stem}"

        session = requests.Session()
        adapter = HTTPAdapter(max_retries=Retry(5, backoff_factor=0.5))
        session.mount('http://', adapter)

        session.delete(url)
        r = session.post(
            url,

27
server/elastic/initialize.py
Normal file
@@ -0,0 +1,27 @@
from furl import furl

from config import config
from elastic.apply_mappings import apply_mappings_to_host
from es_factory import get_cluster_config

log = config.logger(__file__)


class MissingElasticConfiguration(Exception):
    """
    Exception when cluster configuration is not found in config files
    """

    pass


def init_es_data():
    hosts_config = get_cluster_config("events").get("hosts")
    if not hosts_config:
        raise MissingElasticConfiguration("for cluster 'events'")

    for conf in hosts_config:
        host = furl(scheme="http", host=conf["host"], port=conf["port"]).url
        log.info(f"Applying mappings to host: {host}")
        res = apply_mappings_to_host(host)
        log.info(res)
@@ -1,7 +1,7 @@
{
  "template": "events-*",
  "settings": {
    "number_of_shards": 5
    "number_of_shards": 1
  },
  "mappings": {
    "_default_": {

@@ -1,220 +0,0 @@
import importlib.util
from datetime import datetime
from pathlib import Path
from uuid import uuid4

import attr
from furl import furl
from mongoengine.connection import get_db
from semantic_version import Version

import database.utils
from bll.queue import QueueBLL
from config import config
from database import Database
from database.model.auth import Role
from database.model.auth import User as AuthUser, Credentials
from database.model.company import Company
from database.model.queue import Queue
from database.model.settings import Settings
from database.model.user import User
from database.model.version import Version as DatabaseVersion
from elastic.apply_mappings import apply_mappings_to_host
from es_factory import get_cluster_config
from service_repo.auth.fixed_user import FixedUser

log = config.logger(__file__)

migration_dir = (Path(__file__) / "../../migration/mongodb").resolve()


class MissingElasticConfiguration(Exception):
    """
    Exception when cluster configuration is not found in config files
    """

    pass


def init_es_data():
    hosts_config = get_cluster_config("events").get("hosts")
    if not hosts_config:
        raise MissingElasticConfiguration("for cluster 'events'")

    for conf in hosts_config:
        host = furl(scheme="http", host=conf["host"], port=conf["port"]).url
        log.info(f"Applying mappings to host: {host}")
        res = apply_mappings_to_host(host)
        log.info(res)


def _ensure_company():
    company_id = config.get("apiserver.default_company")
    company = Company.objects(id=company_id).only("id").first()
    if company:
        return company_id

    company_name = "trains"
    log.info(f"Creating company: {company_name}")
    company = Company(id=company_id, name=company_name)
    company.save()
    return company_id


def _ensure_default_queue(company):
    """
    If no queue is present for the company then
    create a new one and mark it as a default
    """
    queue = Queue.objects(company=company).only("id").first()
    if queue:
        return

    QueueBLL.create(company, name="default", system_tags=["default"])


def _ensure_auth_user(user_data, company_id):
    ensure_credentials = {"key", "secret"}.issubset(user_data.keys())
    if ensure_credentials:
        user = AuthUser.objects(
            credentials__match=Credentials(
                key=user_data["key"], secret=user_data["secret"]
            )
        ).first()
        if user:
            return user.id

    log.info(f"Creating user: {user_data['name']}")
    user = AuthUser(
        id=user_data.get("id", f"__{user_data['name']}__"),
        name=user_data["name"],
        company=company_id,
        role=user_data["role"],
        email=user_data["email"],
        created=datetime.utcnow(),
        credentials=[Credentials(key=user_data["key"], secret=user_data["secret"])]
        if ensure_credentials
        else None,
    )

    user.save()

    return user.id


def _ensure_user(user: FixedUser, company_id: str):
    if User.objects(id=user.user_id).first():
        return

    data = attr.asdict(user)
    data["id"] = user.user_id
    data["email"] = f"{user.user_id}@example.com"
    data["role"] = Role.user

    _ensure_auth_user(user_data=data, company_id=company_id)

    given_name, _, family_name = user.name.partition(" ")

    User(
        id=user.user_id,
        company=company_id,
        name=user.name,
        given_name=given_name,
        family_name=family_name,
    ).save()


def _apply_migrations():
    if not migration_dir.is_dir():
        raise ValueError(f"Invalid migration dir {migration_dir}")

    try:
        previous_versions = sorted(
            (Version(ver.num) for ver in DatabaseVersion.objects().only("num")),
            reverse=True,
        )
    except ValueError as ex:
        raise ValueError(f"Invalid database version number encountered: {ex}")

    last_version = previous_versions[0] if previous_versions else Version("0.0.0")

    try:
        new_scripts = {
            ver: path
            for ver, path in ((Version(f.stem), f) for f in migration_dir.glob("*.py"))
            if ver > last_version
        }
    except ValueError as ex:
        raise ValueError(f"Failed parsing migration version from file: {ex}")

    dbs = {Database.auth: "migrate_auth", Database.backend: "migrate_backend"}

    migration_log = log.getChild("mongodb_migration")

    for script_version in sorted(new_scripts.keys()):
        script = new_scripts[script_version]
        spec = importlib.util.spec_from_file_location(script.stem, str(script))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        for alias, func_name in dbs.items():
            func = getattr(module, func_name, None)
            if not func:
                continue
            try:
                migration_log.info(f"Applying {script.stem}/{func_name}()")
                func(get_db(alias))
            except Exception:
                migration_log.exception(f"Failed applying {script}:{func_name}()")
                raise ValueError("Migration failed, aborting. Please restore backup.")

        DatabaseVersion(
            id=database.utils.id(),
            num=script.stem,
            created=datetime.utcnow(),
            desc="Applied on server startup",
        ).save()


def _ensure_uuid():
    Settings.add_value("server.uuid", str(uuid4()))


def init_mongo_data():
    try:
        _apply_migrations()

        _ensure_uuid()

        company_id = _ensure_company()
        _ensure_default_queue(company_id)

        users = [
            {
                "name": "apiserver",
                "role": Role.system,
                "email": "apiserver@example.com",
            },
            {
                "name": "webserver",
                "role": Role.system,
                "email": "webserver@example.com",
            },
            {"name": "tests", "role": Role.user, "email": "tests@example.com"},
        ]

        for user in users:
            credentials = config.get(f"secure.credentials.{user['name']}")
            user["key"] = credentials.user_key
            user["secret"] = credentials.user_secret
            _ensure_auth_user(user, company_id)

        if FixedUser.enabled():
            log.info("Fixed users mode is enabled")
            for user in FixedUser.from_config():
                try:
                    _ensure_user(user, company_id)
                except Exception as ex:
                    log.error(f"Failed creating fixed user {user['name']}: {ex}")
    except Exception as ex:
        log.exception("Failed initializing mongodb")
70
server/mongo/initialize/__init__.py
Normal file
@@ -0,0 +1,70 @@
from pathlib import Path

from config import config
from database.model.auth import Role
from service_repo.auth.fixed_user import FixedUser
from .migration import _apply_migrations
from .pre_populate import PrePopulate
from .user import ensure_fixed_user, _ensure_auth_user, _ensure_backend_user
from .util import _ensure_company, _ensure_default_queue, _ensure_uuid

log = config.logger(__package__)


def init_mongo_data():
    try:
        empty_dbs = _apply_migrations(log)

        _ensure_uuid()

        company_id = _ensure_company(log)

        _ensure_default_queue(company_id)

        if empty_dbs and config.get("apiserver.mongo.pre_populate.enabled", False):
            zip_file = config.get("apiserver.mongo.pre_populate.zip_file")
            if not zip_file or not Path(zip_file).is_file():
                msg = f"Failed pre-populating database: invalid zip file {zip_file}"
                if config.get("apiserver.mongo.pre_populate.fail_on_error", False):
                    log.error(msg)
                    raise ValueError(msg)
                else:
                    log.warning(msg)
            else:

                user_id = _ensure_backend_user(
                    "__allegroai__", company_id, "Allegro.ai"
                )

                PrePopulate.import_from_zip(zip_file, user_id=user_id)

        users = [
            {
                "name": "apiserver",
                "role": Role.system,
                "email": "apiserver@example.com",
            },
            {
                "name": "webserver",
                "role": Role.system,
                "email": "webserver@example.com",
            },
            {"name": "tests", "role": Role.user, "email": "tests@example.com"},
        ]

        for user in users:
            credentials = config.get(f"secure.credentials.{user['name']}")
            user["key"] = credentials.user_key
            user["secret"] = credentials.user_secret
            _ensure_auth_user(user, company_id, log=log)

        if FixedUser.enabled():
            log.info("Fixed users mode is enabled")
            FixedUser.validate()
            for user in FixedUser.from_config():
                try:
                    ensure_fixed_user(user, company_id, log=log)
                except Exception as ex:
                    log.error(f"Failed creating fixed user {user.name}: {ex}")
    except Exception as ex:
        log.exception("Failed initializing mongodb")
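The pre-populate branch of `init_mongo_data` only imports the zip file when the databases were empty before migration, pre-population is enabled, and the configured file exists. A sketch of that decision as a pure function, with the three `apiserver.mongo.pre_populate.*` config lookups replaced by arguments; `should_pre_populate` is a hypothetical name.

```python
import logging
from pathlib import Path
from typing import Optional

log = logging.getLogger("pre_populate_sketch")


def should_pre_populate(
    empty_dbs: bool, enabled: bool, zip_file: Optional[str], fail_on_error: bool
) -> bool:
    """Mirror the pre-populate checks in init_mongo_data, with config values passed in."""
    if not (empty_dbs and enabled):
        return False
    if not zip_file or not Path(zip_file).is_file():
        msg = f"Failed pre-populating database: invalid zip file {zip_file}"
        if fail_on_error:
            log.error(msg)
            raise ValueError(msg)
        log.warning(msg)
        return False
    return True
```

Note that in `init_mongo_data` the `raise ValueError(msg)` sits inside the outer `try`, so it is caught by the `except Exception` handler and logged; as written, `fail_on_error` skips the remaining steps (such as creating the system users) and `init_mongo_data` then returns normally.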
86
server/mongo/initialize/migration.py
Normal file
@@ -0,0 +1,86 @@
import importlib.util
from datetime import datetime
from logging import Logger
from pathlib import Path

from mongoengine.connection import get_db
from semantic_version import Version

import database.utils
from database import Database
from database.model.version import Version as DatabaseVersion

migration_dir = Path(__file__).resolve().parent.with_name("migrations")


def _apply_migrations(log: Logger) -> bool:
    """
    Apply migrations as found in the migration dir.
    Returns a boolean indicating whether the database was empty prior to migration.
    """
    log = log.getChild(Path(__file__).stem)

    log.info(f"Started mongodb migrations")

    if not migration_dir.is_dir():
        raise ValueError(f"Invalid migration dir {migration_dir}")

    empty_dbs = not any(
        get_db(alias).collection_names()
        for alias in database.utils.get_options(Database)
    )

    try:
        previous_versions = sorted(
            (Version(ver.num) for ver in DatabaseVersion.objects().only("num")),
            reverse=True,
        )
    except ValueError as ex:
        raise ValueError(f"Invalid database version number encountered: {ex}")

    last_version = previous_versions[0] if previous_versions else Version("0.0.0")

    try:
        new_scripts = {
            ver: path
            for ver, path in ((Version(f.stem), f) for f in migration_dir.glob("*.py"))
            if ver > last_version
        }
    except ValueError as ex:
        raise ValueError(f"Failed parsing migration version from file: {ex}")

    dbs = {Database.auth: "migrate_auth", Database.backend: "migrate_backend"}

    for script_version in sorted(new_scripts):
        script = new_scripts[script_version]

        if empty_dbs:
            log.info(f"Skipping migration {script.name} (empty databases)")
        else:
            spec = importlib.util.spec_from_file_location(script.stem, str(script))
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)

            for alias, func_name in dbs.items():
                func = getattr(module, func_name, None)
                if not func:
                    continue
                try:
                    log.info(f"Applying {script.stem}/{func_name}()")
                    func(get_db(alias))
                except Exception:
                    log.exception(f"Failed applying {script}:{func_name}()")
                    raise ValueError(
                        "Migration failed, aborting. Please restore backup."
                    )

        DatabaseVersion(
            id=database.utils.id(),
            num=script.stem,
            created=datetime.utcnow(),
            desc="Applied on server startup",
        ).save()

    log.info("Finished mongodb migrations")

    return empty_dbs
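`_apply_migrations` picks every script whose file stem parses as a version newer than the highest version recorded in the database, then runs them in ascending order. A stdlib-only sketch of that selection step, swapping `semantic_version.Version` for a plain integer tuple (an assumption made so the example has no dependencies):

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple

VersionTuple = Tuple[int, int, int]


def parse_version(text: str) -> VersionTuple:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple; malformed input raises ValueError."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def pending_migrations(script_names: Iterable[str], applied: Iterable[str]) -> Dict[VersionTuple, str]:
    applied_versions = sorted((parse_version(v) for v in applied), reverse=True)
    last_version = applied_versions[0] if applied_versions else (0, 0, 0)
    scripts = {parse_version(Path(name).stem): name for name in script_names}
    # Keep only scripts newer than the last recorded version, in ascending order
    return {ver: scripts[ver] for ver in sorted(scripts) if ver > last_version}


print(list(pending_migrations(["0.14.0.py", "0.13.0.py", "0.9.0.py"], ["0.13.0"]).values()))
# ['0.14.0.py']
```

Parsing matters here: compared as plain strings, `"0.9.0" > "0.13.0"` is true, so a lexical sort would run migrations out of order.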
153
server/mongo/initialize/pre_populate.py
Normal file
@@ -0,0 +1,153 @@
import importlib
from collections import defaultdict
from datetime import datetime
from os.path import splitext
from typing import List, Optional, Any, Type, Set, Dict
from zipfile import ZipFile, ZIP_BZIP2

import mongoengine
from tqdm import tqdm


class PrePopulate:
    @classmethod
    def export_to_zip(
        cls, filename: str, experiments: List[str] = None, projects: List[str] = None
    ):
        with ZipFile(filename, mode="w", compression=ZIP_BZIP2) as zfile:
            cls._export(zfile, experiments, projects)

    @classmethod
    def import_from_zip(cls, filename: str, user_id: str = None):
        with ZipFile(filename) as zfile:
            cls._import(zfile, user_id)

    @staticmethod
    def _resolve_type(
        cls: Type[mongoengine.Document], ids: Optional[List[str]]
    ) -> List[Any]:
        ids = set(ids)
        items = list(cls.objects(id__in=list(ids)))
        resolved = {i.id for i in items}
        missing = ids - resolved
        for name_candidate in missing:
            results = list(cls.objects(name=name_candidate))
            if not results:
                print(f"ERROR: no match for `{name_candidate}`")
                exit(1)
            elif len(results) > 1:
                print(f"ERROR: more than one match for `{name_candidate}`")
                exit(1)
            items.append(results[0])
        return items

    @classmethod
    def _resolve_entities(
        cls, experiments: List[str] = None, projects: List[str] = None
    ) -> Dict[Type[mongoengine.Document], Set[mongoengine.Document]]:
        from database.model.project import Project
        from database.model.task.task import Task

        entities = defaultdict(set)

        if projects:
            print("Reading projects...")
            entities[Project].update(cls._resolve_type(Project, projects))
            print("--> Reading project experiments...")
            objs = Task.objects(
                project__in=list(set(filter(None, (p.id for p in entities[Project]))))
            )
            entities[Task].update(o for o in objs if o.id not in (experiments or []))

        if experiments:
            print("Reading experiments...")
            entities[Task].update(cls._resolve_type(Task, experiments))
            print("--> Reading experiments projects...")
            objs = Project.objects(
                id__in=list(set(filter(None, (p.project for p in entities[Task]))))
            )
            project_ids = {p.id for p in entities[Project]}
            entities[Project].update(o for o in objs if o.id not in project_ids)

        return entities

    @classmethod
    def _cleanup_task(cls, task):
        from database.model.task.task import TaskStatus

        task.completed = None
        task.started = None
        if task.execution:
            task.execution.model = None
            task.execution.model_desc = None
            task.execution.model_labels = None
        if task.output:
            task.output.model = None

        task.status = TaskStatus.created
        task.comment = "Auto generated by Allegro.ai"
        task.created = datetime.utcnow()
        task.last_iteration = 0
        task.last_update = task.created
        task.status_changed = task.created
        task.status_message = ""
        task.status_reason = ""
        task.user = ""

    @classmethod
    def _cleanup_entity(cls, entity_cls, entity):
        from database.model.task.task import Task
        if entity_cls == Task:
            cls._cleanup_task(entity)

    @classmethod
    def _export(
        cls, writer: ZipFile, experiments: List[str] = None, projects: List[str] = None
    ):
        entities = cls._resolve_entities(experiments, projects)

        for cls_, items in entities.items():
            if not items:
                continue
            filename = f"{cls_.__module__}.{cls_.__name__}.json"
            print(f"Writing {len(items)} items into {writer.filename}:{filename}")
            with writer.open(filename, "w") as f:
                f.write("[\n".encode("utf-8"))
                last = len(items) - 1
                for i, item in enumerate(items):
                    cls._cleanup_entity(cls_, item)
                    f.write(item.to_json().encode("utf-8"))
                    if i != last:
                        f.write(",".encode("utf-8"))
                    f.write("\n".encode("utf-8"))
                f.write("]\n".encode("utf-8"))

    @staticmethod
    def _import(reader: ZipFile, user_id: str = None):
        for file_info in reader.filelist:
            full_name = splitext(file_info.orig_filename)[0]
            print(f"Reading {reader.filename}:{full_name}...")
            module_name, _, class_name = full_name.rpartition(".")
            module = importlib.import_module(module_name)
            cls_: Type[mongoengine.Document] = getattr(module, class_name)

            with reader.open(file_info) as f:
                for item in tqdm(
                    f.readlines(),
                    desc=f"Writing {cls_.__name__.lower()}s into database",
                    unit="doc",
                ):
                    item = (
                        item.decode("utf-8")
                        .strip()
                        .lstrip("[")
                        .rstrip("]")
                        .rstrip(",")
                        .strip()
                    )
                    if not item:
                        continue
                    doc = cls_.from_json(item)
                    if user_id is not None and hasattr(doc, "user"):
                        doc.user = user_id
                    doc.save(force_insert=True)
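`PrePopulate` stores each document class as a JSON array with exactly one document per line, under a file named `<module>.<ClassName>.json`, so `_import` can parse line by line (and report per-document progress) instead of loading the whole array. A stdlib sketch of that round trip, assuming plain dicts serialized with `json` in place of mongoengine's `to_json`/`from_json`; `write_entities` and `read_entities` are hypothetical names.

```python
import io
import json
from typing import Dict, List
from zipfile import ZipFile, ZIP_BZIP2


def write_entities(zfile: ZipFile, filename: str, items: List[Dict]) -> None:
    """Write items as a JSON array with one document per line, like PrePopulate._export."""
    with zfile.open(filename, "w") as f:
        f.write("[\n".encode("utf-8"))
        last = len(items) - 1
        for i, item in enumerate(items):
            f.write(json.dumps(item).encode("utf-8"))
            if i != last:
                f.write(",".encode("utf-8"))
            f.write("\n".encode("utf-8"))
        f.write("]\n".encode("utf-8"))


def read_entities(zfile: ZipFile, filename: str) -> List[Dict]:
    """Parse the file one line at a time, stripping the array brackets and separators."""
    docs = []
    with zfile.open(filename) as f:
        for line in f.readlines():
            text = line.decode("utf-8").strip().lstrip("[").rstrip("]").rstrip(",").strip()
            if text:
                docs.append(json.loads(text))
    return docs


buffer = io.BytesIO()
with ZipFile(buffer, mode="w", compression=ZIP_BZIP2) as zf:
    write_entities(zf, "database.model.project.Project.json", [{"_id": "p1"}, {"_id": "p2"}])
with ZipFile(buffer) as zf:
    print(read_entities(zf, "database.model.project.Project.json"))
# [{'_id': 'p1'}, {'_id': 'p2'}]
```

The line-based parse relies on every serialized document fitting on a single line; compact JSON output satisfies this because newlines inside strings are escaped.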
74
server/mongo/initialize/user.py
Normal file
@@ -0,0 +1,74 @@
from datetime import datetime
from logging import Logger

import attr

from database.model.auth import Role
from database.model.auth import User as AuthUser, Credentials
from database.model.user import User
from service_repo.auth.fixed_user import FixedUser


def _ensure_auth_user(user_data: dict, company_id: str, log: Logger):
    ensure_credentials = {"key", "secret"}.issubset(user_data)
    if ensure_credentials:
        user = AuthUser.objects(
            credentials__match=Credentials(
                key=user_data["key"], secret=user_data["secret"]
            )
        ).first()
        if user:
            return user.id

    log.info(f"Creating user: {user_data['name']}")
    user = AuthUser(
        id=user_data.get("id", f"__{user_data['name']}__"),
        name=user_data["name"],
        company=company_id,
        role=user_data["role"],
        email=user_data["email"],
        created=datetime.utcnow(),
        credentials=[Credentials(key=user_data["key"], secret=user_data["secret"])]
        if ensure_credentials
        else None,
    )

    user.save()

    return user.id


def _ensure_backend_user(user_id: str, company_id: str, user_name: str):
    given_name, _, family_name = user_name.partition(" ")

    User(
        id=user_id,
        company=company_id,
        name=user_name,
        given_name=given_name,
        family_name=family_name,
    ).save()

    return user_id


def ensure_fixed_user(user: FixedUser, company_id: str, log: Logger):
    if User.objects(id=user.user_id).first():
        return

    data = attr.asdict(user)
    data["id"] = user.user_id
    data["email"] = f"{user.user_id}@example.com"
    data["role"] = Role.user

    _ensure_auth_user(user_data=data, company_id=company_id, log=log)

    given_name, _, family_name = user.name.partition(" ")

    User(
        id=user.user_id,
        company=company_id,
        name=user.name,
        given_name=given_name,
        family_name=family_name,
    ).save()
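Two small conventions in `user.py` are easy to miss: display names are split at the first space only (so a single-word name gets an empty family name), and an auth user created without an explicit `id` gets `__<name>__`. Both are isolated below as hypothetical helper functions that reproduce the same expressions:

```python
def split_name(full_name: str):
    """Split a display name the way _ensure_backend_user does: first word vs. the rest."""
    given_name, _, family_name = full_name.partition(" ")
    return given_name, family_name


def default_auth_user_id(user_data: dict) -> str:
    # Same expression as user_data.get("id", f"__{user_data['name']}__") in _ensure_auth_user
    return user_data.get("id", f"__{user_data['name']}__")


print(split_name("Allegro.ai"))                      # ('Allegro.ai', '')
print(split_name("Jane Q. Public"))                  # ('Jane', 'Q. Public')
print(default_auth_user_id({"name": "apiserver"}))   # __apiserver__
```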
40
server/mongo/initialize/util.py
Normal file
@@ -0,0 +1,40 @@
from logging import Logger
from uuid import uuid4

from bll.queue import QueueBLL
from config import config
from config.info import get_default_company
from database.model.company import Company
from database.model.queue import Queue
from database.model.settings import Settings

log = config.logger(__file__)


def _ensure_company(log: Logger):
    company_id = get_default_company()
    company = Company.objects(id=company_id).only("id").first()
    if company:
        return company_id

    company_name = "trains"
    log.info(f"Creating company: {company_name}")
    company = Company(id=company_id, name=company_name)
    company.save()
    return company_id


def _ensure_default_queue(company):
    """
    If no queue is present for the company then
    create a new one and mark it as a default
    """
    queue = Queue.objects(company=company).only("id").first()
    if queue:
        return

    QueueBLL.create(company, name="default", system_tags=["default"])


def _ensure_uuid():
    Settings.add_value("server.uuid", str(uuid4()))
20
server/mongo/migrations/0.13.0.py
Normal file
@@ -0,0 +1,20 @@
import json

from pymongo.database import Database, Collection


def migrate_auth(db: Database):
    collection: Collection = db["user"]
    if "name_1_company_1" in [doc["name"] for doc in collection.list_indexes()]:
        collection.drop_index("name_1_company_1")


def migrate_backend(db: Database):
    collection: Collection = db["user"]
    users = collection.find(
        {"preferences": {"$exists": True, "$ne": None, "$type": "object"}}
    )
    for doc in users:
        collection.update_one(
            {"_id": doc["_id"]}, {"$set": {"preferences": json.dumps(doc["preferences"])}}
        )
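This migration appears to accompany the change of `User.preferences` from `SafeDictField(default=dict)` to `DynamicField(default="")`: existing dict values are re-stored as JSON strings. An in-memory sketch of `migrate_backend`, with a list of dicts standing in for the `user` collection (an assumption for illustration only):

```python
import json
from typing import Dict, List


def migrate_preferences(users: List[Dict]) -> List[Dict]:
    """In-memory analogue of migrate_backend in 0.13.0: dict preferences become JSON strings."""
    for doc in users:
        prefs = doc.get("preferences")
        # Stands in for the filter {"$exists": True, "$ne": None, "$type": "object"}
        if isinstance(prefs, dict):
            doc["preferences"] = json.dumps(prefs)
    return users
```

Because the filter only matches object-typed values, documents that were already converted are skipped, so running the migration twice is harmless.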
46
server/mongo/migrations/0.14.0.py
Normal file
@@ -0,0 +1,46 @@
import hashlib

from pymongo.database import Database, Collection

from service_repo.auth.fixed_user import FixedUser


def _get_ids():
    if not FixedUser.enabled():
        return

    return {
        hashlib.md5(f"{user.username}:{user.password}".encode()).hexdigest(): user.user_id
        for user in FixedUser.from_config()
    }


def _switch_uuid(collection: Collection, uuid_field: str, uuids: dict):
    docs = list(collection.find({uuid_field: {"$in": [uuids]}}))
    if not docs:
        return
    replaced_uuids = [doc[uuid_field] for doc in docs]
    for doc in docs:
        doc[uuid_field] = uuids[doc[uuid_field]]
    collection.insert_many(docs)
    collection.delete_many({uuid_field: {"$in": replaced_uuids}})


def migrate_auth(db: Database):
    uuids = _get_ids()
    if not uuids:
        return

    collection = db["user"]
    collection.drop_index("name_1_company_1")

    _switch_uuid(collection=collection, uuid_field="_id", uuids=uuids)


def migrate_backend(db: Database):
    uuids = _get_ids()
    if not uuids:
        return

    for name in ("project", "task", "model"):
        _switch_uuid(collection=db[name], uuid_field="user", uuids=uuids)
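`_get_ids` maps the md5 of `"username:password"` (presumably the id fixed users had before this version) to each user's configured `user_id`, and `_switch_uuid` rewrites matching documents. Note that the filter is written as `{"$in": [uuids]}`, a one-element list containing the mapping itself; under MongoDB's `$in` semantics that matches only fields equal to the whole mapping, so string ids would likely never match, and `list(uuids)` (the mapping's keys) seems to be the intent. The in-memory sketch below assumes that reading; `fixed_user_key` and `switch_uuid` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def fixed_user_key(username: str, password: str) -> str:
    # Same derivation as _get_ids: hex md5 of "username:password"
    return hashlib.md5(f"{username}:{password}".encode()).hexdigest()


def switch_uuid(docs: List[Dict], uuid_field: str, uuids: Dict[str, str]) -> List[Dict]:
    """In-memory analogue of _switch_uuid, matching on the mapping's keys (old ids)."""
    matched = [doc for doc in docs if doc.get(uuid_field) in uuids]
    for doc in matched:
        doc[uuid_field] = uuids[doc[uuid_field]]
    return docs
```

The sketch updates documents in place; the real migration has to insert copies and delete the originals when `uuid_field` is `_id`, because MongoDB does not allow modifying a document's `_id`.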
@@ -1,31 +1,30 @@
six
Flask>=0.12.2
elasticsearch>=5.0.0,<6.0.0
pyhocon>=0.3.35
requests>=2.13.0
arrow>=0.10.0
pymongo==3.6.1 # 3.7 has a bug multiple users logged in
Flask-Cors>=3.0.5
Flask-Compress>=1.4.0
mongoengine==0.16.2
jsonmodels>=2.3
pyjwt>=1.3.0
gunicorn>=19.7.1
Jinja2==2.10
python-rapidjson>=0.6.3
jsonschema>=2.6.0
dpath>=1.4.2
funcsigs==1.0.2
luqum>=0.7.2
typing>=3.6.4
attrs>=19.1.0
nested_dict>=1.61
related>=0.7.2
validators>=0.12.4
fastjsonschema>=2.8
boltons>=19.1.0
semantic_version>=2.6.0,<3
dpath>=1.4.2,<2.0
elasticsearch>=5.0.0,<6.0.0
fastjsonschema>=2.8
Flask-Compress>=1.4.0
Flask-Cors>=3.0.5
Flask>=0.12.2
funcsigs==1.0.2
furl>=2.0.0
redis>=2.10.5
gunicorn>=19.7.1
humanfriendly==4.18
Jinja2==2.10
jsonmodels>=2.3
jsonschema>=2.6.0
luqum>=0.7.2
mongoengine==0.16.2
nested_dict>=1.61
psutil>=5.6.5
pyhocon>=0.3.35
pyjwt>=1.3.0
pymongo==3.6.1 # 3.7 has a bug multiple users logged in
python-rapidjson>=0.6.3
redis>=2.10.5
related>=0.7.2
requests>=2.13.0
semantic_version>=2.8.0,<3
six
tqdm
validators>=0.12.4
@@ -171,6 +171,30 @@
critical
]
}
event_type_enum {
type: string
enum: [
training_stats_scalar
training_stats_vector
training_debug_image
plot
log
]
}
task_metric {
type: object
required: [task, metric]
properties {
task {
description: "Task ID"
type: string
}
metric {
description: "Metric name"
type: string
}
}
}
task_log_event {
description: """A log event associated with a task."""
type: object
@@ -319,6 +343,84 @@
}
}
}
"2.7" {
description: "Get the debug image events for the requested amount of iterations per each task's metric"
request {
type: object
required: [
metrics
]
properties {
metrics {
type: array
items { "$ref": "#/definitions/task_metric" }
description: "List of metrics for which the events will be retrieved"
}
iters {
type: integer
description: "Max number of latest iterations for which to return debug images"
}
navigate_earlier {
type: boolean
description: "If set, events are retrieved from later iterations to earlier ones; otherwise, from earlier iterations to later ones. The default is True"
}
refresh {
type: boolean
description: "If set, the scroll will be moved to the latest iterations. The default is False"
}
scroll_id {
type: string
description: "Scroll ID of previous call (used for getting more results)"
}
}
}
response {
type: object
properties {
metrics {
type: array
items: { type: object }
description: "Debug image events grouped by task metrics and iterations"
}
scroll_id {
type: string
description: "Scroll ID for getting more results"
}
}
}
}
}
get_task_metrics{
"2.7": {
description: "For each task, get a list of metrics for which the requested event type was reported"
request {
type: object
required: [
tasks
]
properties {
tasks {
type: array
items { type: string }
description: "Task IDs"
}
event_type {
"description": "Event type"
"$ref": "#/definitions/event_type_enum"
}
}
}
response {
type: object
properties {
metrics {
type: array
items { type: object }
description: "List of tasks with their metrics"
}
}
}
}
}
get_task_log {
"1.5" {
@@ -455,7 +557,7 @@
}
batch_size {
type: integer
description: "Number of events to return each time"
description: "Number of events to return each time (default 500)"
}
event_type {
type: string

@@ -324,7 +324,6 @@
required: [
uri
name
labels
]
properties {
uri {

@@ -86,6 +86,7 @@ endpoints {
}
}
report_stats_option {
allow_roles = [ "*" ]
"2.4" {
description: "Get or set the report statistics option per-company"
request {
@@ -117,6 +118,10 @@ report_stats_option {
description: "If enabled, returns Id of the user who enabled the option"
type: string
}
current_version {
description: "Returns the current server version"
type: string
}
}
}
}

@@ -550,6 +550,60 @@ get_all {
}
}
}
clone {
"2.5" {
description: "Clone an existing task"
request {
type: object
required: [ task ]
properties {
task {
description: "ID of the task"
type: string
}
new_task_name {
description: "The name of the cloned task. If not provided then taken from the original task"
type: string
}
new_task_comment {
description: "The comment of the cloned task. If not provided then taken from the original task"
type: string
}
new_task_tags {
description: "The user-defined tags of the cloned task. If not provided then taken from the original task"
type: array
items { type: string }
}
new_task_system_tags {
description: "The system tags of the cloned task. If not provided then empty"
type: array
items { type: string }
}
new_task_parent {
description: "The parent of the cloned task. If not provided then taken from the original task"
type: string
}
new_task_project {
description: "The project of the cloned task. If not provided then taken from the original task"
type: string
}
execution_overrides {
description: "The execution params for the cloned task. The params not specified are taken from the original task"
"$ref": "#/definitions/execution"
}
}
}
response {
type: object
properties {
id {
description: "ID of the new task"
type: string
}
}
}
}
}
create {
"2.1" {
description: "Create a new task"
@@ -1304,4 +1358,40 @@ ping {
additionalProperties: false
}
}
}

add_or_update_artifacts {
"2.6" {
description: """ Update an existing artifact (search by key/mode) or add a new one """
request {
type: object
required: [ task, artifacts ]
properties {
task {
description: "Task ID"
type: string
}
artifacts {
description: "Artifacts to add or update"
type: array
items { "$ref": "#/definitions/artifact" }
}
}
}
response {
type: object
properties {
added {
description: "Keys of artifacts added"
type: array
items { type: string }
}
updated {
description: "Keys of artifacts updated"
type: array
items { type: string }
}
}
}
}
}
@@ -1,3 +1,4 @@
import atexit
from argparse import ArgumentParser

from flask import Flask, request, Response
@@ -9,13 +10,15 @@ import database
from apierrors.base import BaseError
from bll.statistics.stats_reporter import StatisticsReporter
from config import config
from init_data import init_es_data, init_mongo_data
from elastic.initialize import init_es_data
from mongo.initialize import init_mongo_data
from service_repo import ServiceRepo, APICall
from service_repo.auth import AuthType
from service_repo.errors import PathParsingError
from timing_context import TimingContext
from updates import check_updates_thread
from utilities import json
from utilities.threads_manager import ThreadsManager

app = Flask(__name__, static_url_path="/static")
CORS(app, **config.get("apiserver.cors"))
@@ -41,6 +44,13 @@ check_updates_thread.start()
StatisticsReporter.start()


def graceful_shutdown():
ThreadsManager.terminating = True


atexit.register(graceful_shutdown)


@app.before_first_request
def before_app_first_request():
pass

@@ -21,6 +21,8 @@ JSON_CONTENT_TYPE = "application/json"
class DataContainer(object):
""" Data container that supports raw data (dict or a list of batched dicts) and a data model """

null_schema_validator: SchemaValidator = SchemaValidator(None)

def __init__(self, data=None, batched_data=None):
if data and batched_data:
raise ValueError("data and batched data are not supported simultaneously")
@@ -28,7 +30,7 @@ class DataContainer(object):
self._data = None
self._data_model = None
self._data_model_cls = None
self._schema_validator: SchemaValidator = SchemaValidator(None)
self._schema_validator: SchemaValidator = self.null_schema_validator
# use setter to properly initialize data
self.data = data
self.batched_data = batched_data

@@ -5,27 +5,45 @@ from typing import Sequence, TypeVar
import attr

from config import config
from config.info import get_default_company

T = TypeVar("T", bound="FixedUser")


class FixedUsersError(Exception):
pass


@attr.s(auto_attribs=True)
class FixedUser:
username: str
password: str
name: str
company: str = get_default_company()

def __attrs_post_init__(self):
self.user_id = hashlib.md5(f"{self.username}:{self.password}".encode()).hexdigest()
self.user_id = hashlib.md5(f"{self.company}:{self.username}".encode()).hexdigest()

@classmethod
def enabled(cls):
return config.get("apiserver.auth.fixed_users.enabled", False)

@classmethod
def validate(cls):
if not cls.enabled():
return
users = cls.from_config()
if len({user.username for user in users}) < len(users):
raise FixedUsersError(
"Duplicate user names found in fixed users configuration"
)

@classmethod
@lru_cache()
def from_config(cls) -> Sequence[T]:
return [cls(**user) for user in config.get("apiserver.auth.fixed_users.users", [])]
return [
cls(**user) for user in config.get("apiserver.auth.fixed_users.users", [])
]

@classmethod
@lru_cache()

@@ -1,5 +1,6 @@
from enum import Enum
from typing import Callable, Sequence, Text

from boltons.iterutils import remap
from jsonmodels import models
from jsonmodels.errors import FieldNotSupported

@@ -87,7 +88,14 @@ class Endpoint(object):
Provided data_model schema if available
"""
try:
return data_model.to_json_schema()
res = data_model.to_json_schema()

def visit(path, key, value):
if isinstance(value, Enum):
value = str(value)
return key, value

return remap(res, visit=visit)
except (FieldNotSupported, TypeError):
return str(data_model.__name__)


@@ -9,6 +9,7 @@ import jsonmodels.models
import timing_context
from apierrors import APIError
from apierrors.errors.bad_request import RequestPathHasInvalidVersion
from api_version import __version__ as _api_version_
from config import config
from service_repo.base import PartialVersion
from .apicall import APICall
@@ -34,7 +35,7 @@ class ServiceRepo(object):
"""If the check is set, parsing will fail for endpoint request with the version that is greater than the current
maximum """

_max_version = PartialVersion("2.4")
_max_version = PartialVersion(".".join(_api_version_.split(".")[:2]))
""" Maximum version number (the highest min_version value across all endpoints) """

_endpoint_exp = (
@@ -166,7 +167,7 @@ class ServiceRepo(object):
return

assert isinstance(endpoint, Endpoint)
call.actual_endpoint_version: PartialVersion = endpoint.min_version
call.actual_endpoint_version = endpoint.min_version
call.requires_authorization = endpoint.authorize
return endpoint


@@ -2,12 +2,15 @@ import itertools
from collections import defaultdict
from operator import itemgetter

import six

from apierrors import errors
from apimodels.events import (
MultiTaskScalarMetricsIterHistogramRequest,
ScalarMetricsIterHistogramRequest,
DebugImagesRequest,
DebugImageResponse,
MetricEvents,
IterationEvents,
TaskMetricsRequest,
)
from bll.event import EventBLL
from bll.event.event_metrics import EventMetrics
@@ -211,7 +214,7 @@ def vector_metrics_iter_histogram(call, company_id, req_model):
@endpoint("events.get_task_events", required_fields=["task"])
def get_task_events(call, company_id, _):
task_id = call.data["task"]
batch_size = call.data.get("batch_size")
batch_size = call.data.get("batch_size", 500)
event_type = call.data.get("event_type")
scroll_id = call.data.get("scroll_id")
order = call.data.get("order") or "asc"
@@ -299,7 +302,7 @@ def multi_task_scalar_metrics_iter_histogram(
call, company_id, req_model: MultiTaskScalarMetricsIterHistogramRequest
):
task_ids = req_model.tasks
if isinstance(task_ids, six.string_types):
if isinstance(task_ids, str):
task_ids = [s.strip() for s in task_ids.split(",")]
# Note, bll already validates task ids as it needs their names
call.result.data = dict(
@@ -481,7 +484,7 @@ def get_debug_images_v1_7(call, company_id, req_model):


@endpoint("events.debug_images", min_version="1.8", required_fields=["task"])
def get_debug_images(call, company_id, req_model):
def get_debug_images_v1_8(call, company_id, req_model):
task_id = call.data["task"]
iters = call.data.get("iters") or 1
scroll_id = call.data.get("scroll_id")
@@ -507,6 +510,53 @@ def get_debug_images(call, company_id, req_model):
)


@endpoint(
"events.debug_images",
min_version="2.7",
request_data_model=DebugImagesRequest,
response_data_model=DebugImageResponse,
)
def get_debug_images(call, company_id, req_model: DebugImagesRequest):
tasks = set(m.task for m in req_model.metrics)
task_bll.assert_exists(call.identity.company, task_ids=tasks, allow_public=True)
result = event_bll.debug_images_iterator.get_task_events(
company_id=company_id,
metrics=[(m.task, m.metric) for m in req_model.metrics],
iter_count=req_model.iters,
navigate_earlier=req_model.navigate_earlier,
refresh=req_model.refresh,
state_id=req_model.scroll_id,
)

call.result.data_model = DebugImageResponse(
scroll_id=result.next_scroll_id,
metrics=[
MetricEvents(
task=task,
metric=metric,
iterations=[
IterationEvents(iter=iteration["iter"], events=iteration["events"])
for iteration in iterations
],
)
for (task, metric, iterations) in result.metric_events
],
)


@endpoint("events.get_task_metrics", request_data_model=TaskMetricsRequest)
def get_tasks_metrics(call: APICall, company_id, req_model: TaskMetricsRequest):
task_bll.assert_exists(
call.identity.company, task_ids=req_model.tasks, allow_public=True
)
res = event_bll.metrics.get_tasks_metrics(
company_id, task_ids=req_model.tasks, event_type=req_model.event_type
)
call.result.data = {
"metrics": [{"task": task, "metrics": metrics} for (task, metrics) in res]
}


@endpoint("events.delete_for_task", required_fields=["task"])
def delete_for_task(call, company_id, req_model):
task_id = call.data["task"]

@@ -61,7 +61,7 @@ def get_by_id(call):
def make_projects_get_all_pipelines(project_ids, specific_state=None):
archived = EntityVisibility.archived.value

def ensure_system_tags():
def ensure_valid_fields():
"""
Make sure system tags are always an array (required by the subsequent $in in archived_tasks_cond)
"""
@@ -73,6 +73,9 @@ def make_projects_get_all_pipelines(project_ids, specific_state=None):
"then": [],
"else": "$system_tags",
}
},
"status": {
"$ifNull": ["$status", "unknown"]
}
}
}
@@ -80,7 +83,7 @@ def make_projects_get_all_pipelines(project_ids, specific_state=None):
status_count_pipeline = [
# count tasks per project per status
{"$match": {"project": {"$in": project_ids}}},
ensure_system_tags(),
ensure_valid_fields(),
{
"$group": {
"_id": {
@@ -153,7 +156,7 @@ def make_projects_get_all_pipelines(project_ids, specific_state=None):
"project": {"$in": project_ids},
}
},
ensure_system_tags(),
ensure_valid_fields(),
{
# for each project
"$group": group_step

@@ -11,7 +11,6 @@ from database.errors import translate_errors_context
from database.model import Company
from database.model.company import ReportStatsOption
from service_repo import ServiceRepo, APICall, endpoint
from version import __version__ as current_version


@endpoint("server.get_stats")
@@ -79,7 +78,7 @@ def report_stats(call: APICall, company: str, request: ReportStatsOptionRequest)
stats_option = ReportStatsOption(
enabled=enabled,
enabled_time=datetime.utcnow(),
enabled_version=current_version,
enabled_version=get_version(),
enabled_user=call.identity.user,
)
updated = query.update(defaults__stats_option=stats_option)
@@ -87,7 +86,8 @@ def report_stats(call: APICall, company: str, request: ReportStatsOptionRequest)
raise errors.server_error.InternalError(
f"Failed setting report_stats to {enabled}"
)

result = ReportStatsOptionResponse(**stats_option.to_mongo())
data = stats_option.to_mongo()
data["current_version"] = get_version()
result = ReportStatsOptionResponse(**data)

call.result.data_model = result

@@ -1,18 +1,17 @@
from copy import deepcopy
from datetime import datetime
from operator import attrgetter
from typing import Sequence, Callable, Type, TypeVar
from typing import Sequence, Callable, Type, TypeVar, Union

import attr
import dpath
import mongoengine
import six
from mongoengine import EmbeddedDocument, Q
from mongoengine.queryset.transform import COMPARISON_OPERATORS
from pymongo import UpdateOne

from apierrors import errors, APIError
from apimodels.base import UpdateResponse
from apimodels.base import UpdateResponse, IdResponse
from apimodels.tasks import (
StartedResponse,
ResetResponse,
@@ -27,10 +26,19 @@ from apimodels.tasks import (
EnqueueRequest,
EnqueueResponse,
DequeueResponse,
CloneRequest,
AddOrUpdateArtifactsRequest,
AddOrUpdateArtifactsResponse,
)
from bll.event import EventBLL
from bll.queue import QueueBLL
from bll.task import TaskBLL, ChangeStatusRequest, update_project_time, split_by
from bll.task import (
TaskBLL,
ChangeStatusRequest,
update_project_time,
split_by,
ParameterKeyEscaper,
)
from bll.util import SetFieldsResolver
from database.errors import translate_errors_context
from database.model.model import Model
@@ -94,13 +102,37 @@ def get_by_id(call: APICall, company_id, req_model: TaskRequest):
req_model.task, company_id=company_id, allow_public=True
)
task_dict = task.to_proper_dict()
conform_output_tags(call, task_dict)
unprepare_from_saved(call, task_dict)
call.result.data = {"task": task_dict}


def escape_execution_parameters(call: APICall):
default_prefix = "execution.parameters."

def escape_paths(paths, prefix=default_prefix):
return [
prefix + ParameterKeyEscaper.escape(path[len(prefix) :])
if path.startswith(prefix)
else path
for path in paths
]

projection = Task.get_projection(call.data)
if projection:
Task.set_projection(call.data, escape_paths(projection))

ordering = Task.get_ordering(call.data)
if ordering:
ordering = Task.set_ordering(call.data, escape_paths(ordering, default_prefix))
Task.set_ordering(call.data, escape_paths(ordering, "-" + default_prefix))


@endpoint("tasks.get_all_ex", required_fields=[])
def get_all_ex(call: APICall):
conform_tag_fields(call, call.data)

escape_execution_parameters(call)

with translate_errors_context():
with TimingContext("mongo", "task_get_all_ex"):
tasks = Task.get_many_with_join(
@@ -109,13 +141,16 @@ def get_all_ex(call: APICall):
query_options=get_all_query_options,
allow_public=True, # required in case projection is requested for public dataset/versions
)
conform_output_tags(call, tasks)
unprepare_from_saved(call, tasks)
call.result.data = {"tasks": tasks}


@endpoint("tasks.get_all", required_fields=[])
def get_all(call: APICall):
conform_tag_fields(call, call.data)

escape_execution_parameters(call)

with translate_errors_context():
with TimingContext("mongo", "task_get_all"):
tasks = Task.get_many(
@@ -125,7 +160,7 @@ def get_all(call: APICall):
query_options=get_all_query_options,
allow_public=True, # required in case projection is requested for public dataset/versions
)
conform_output_tags(call, tasks)
unprepare_from_saved(call, tasks)
call.result.data = {"tasks": tasks}


@@ -220,6 +255,45 @@ create_fields = {
}


def prepare_for_save(call: APICall, fields: dict):
conform_tag_fields(call, fields)

# Strip all script fields (remove leading and trailing whitespace chars) to avoid unusable names and paths
for field in task_script_fields:
try:
path = f"script/{field}"
value = dpath.get(fields, path)
if isinstance(value, str):
value = value.strip()
dpath.set(fields, path, value)
except KeyError:
pass

parameters = safe_get(fields, "execution/parameters")
if parameters is not None:
# Escape keys to make them mongo-safe
parameters = {ParameterKeyEscaper.escape(k): v for k, v in parameters.items()}
dpath.set(fields, "execution/parameters", parameters)

return fields


def unprepare_from_saved(call: APICall, tasks_data: Union[Sequence[dict], dict]):
if isinstance(tasks_data, dict):
tasks_data = [tasks_data]

conform_output_tags(call, tasks_data)

for task_data in tasks_data:
parameters = safe_get(task_data, "execution/parameters")
if parameters is not None:
# Unescape keys that were escaped to make them mongo-safe
parameters = {
ParameterKeyEscaper.unescape(k): v for k, v in parameters.items()
}
dpath.set(task_data, "execution/parameters", parameters)


def prepare_create_fields(
call: APICall, valid_fields=None, output=None, previous_task: Task = None
):
@@ -239,25 +313,7 @@ def prepare_create_fields(
output = Output(destination=output_dest)
fields["output"] = output

conform_tag_fields(call, fields)

# Strip all script fields (remove leading and trailing whitespace chars) to avoid unusable names and paths
for field in task_script_fields:
try:
path = "script/%s" % field
value = dpath.get(fields, path)
if isinstance(value, six.string_types):
value = value.strip()
dpath.set(fields, path, value)
except KeyError:
pass

parameters = safe_get(fields, "execution/parameters")
if parameters is not None:
parameters = {k.strip(): v for k, v in parameters.items()}
dpath.set(fields, "execution/parameters", parameters)

return fields
return prepare_for_save(call, fields)


def _validate_and_get_task_from_call(call: APICall, **kwargs):
@@ -278,7 +334,9 @@ def validate(call: APICall, company_id, req_model: CreateRequest):
_validate_and_get_task_from_call(call)


@endpoint("tasks.create", request_data_model=CreateRequest)
@endpoint(
"tasks.create", request_data_model=CreateRequest, response_data_model=IdResponse
)
def create(call: APICall, company_id, req_model: CreateRequest):
task = _validate_and_get_task_from_call(call)

@@ -286,7 +344,26 @@ def create(call: APICall, company_id, req_model: CreateRequest):
task.save()
update_project_time(task.project)

call.result.data = {"id": task.id}
call.result.data_model = IdResponse(id=task.id)


@endpoint(
"tasks.clone", request_data_model=CloneRequest, response_data_model=IdResponse
)
def clone_task(call: APICall, company_id, request: CloneRequest):
task = task_bll.clone_task(
company_id=company_id,
user_id=call.identity.user,
task_id=request.task,
name=request.new_task_name,
comment=request.new_task_comment,
parent=request.new_task_parent,
project=request.new_task_project,
tags=request.new_task_tags,
system_tags=request.new_task_system_tags,
execution_overrides=request.execution_overrides,
)
call.result.data_model = IdResponse(id=task.id)


def prepare_update_fields(call: APICall, task, call_data):
@@ -296,8 +373,7 @@ def prepare_update_fields(call: APICall, task, call_data):
t_fields = task_fields
t_fields.add("output__error")
fields = parse_from_call(call_data, update_fields, t_fields)
conform_tag_fields(call, fields)
return fields, valid_fields
return prepare_for_save(call, fields), valid_fields


@endpoint(
@@ -324,7 +400,7 @@ def update(call: APICall, company_id, req_model: UpdateRequest):
)

update_project_time(updated_fields.get("project"))
conform_output_tags(call, updated_fields)
unprepare_from_saved(call, updated_fields)
return UpdateResponse(updated=updated_count, fields=updated_fields)


@@ -449,7 +525,7 @@ def edit(call: APICall, company_id, req_model: UpdateRequest):
fixed_fields.update(last_update=now)
updated = task.update(upsert=False, **fixed_fields)
update_project_time(fields.get("project"))
conform_output_tags(call, fields)
unprepare_from_saved(call, fields)
call.result.data_model = UpdateResponse(updated=updated, fields=fields)
else:
call.result.data_model = UpdateResponse(updated=0)
@@ -702,7 +778,7 @@ def cleanup_task(task, force=False):
else:
updated_models = 0

event_bll.delete_task_events(task.company, task.id)
event_bll.delete_task_events(task.company, task.id, allow_locked=force)

return CleanupResult(
deleted_models=deleted_models,
@@ -837,3 +913,18 @@ def ping(_, company_id, request: PingRequest):
TaskBLL.set_last_update(
task_ids=[request.task], company_id=company_id, last_update=datetime.utcnow()
)


@endpoint(
"tasks.add_or_update_artifacts",
min_version="2.6",
request_data_model=AddOrUpdateArtifactsRequest,
response_data_model=AddOrUpdateArtifactsResponse,
)
def add_or_update_artifacts(
call: APICall, company_id, request: AddOrUpdateArtifactsRequest
):
added, updated = TaskBLL.add_or_update_artifacts(
task_id=request.task, company_id=company_id, artifacts=request.artifacts
)
call.result.data_model = AddOrUpdateArtifactsResponse(added=added, updated=updated)

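`escape_execution_parameters` in this tasks diff escapes only the part of a projection or ordering path that follows `execution.parameters.`, and runs a second pass with a `-` prefix so that descending sort keys are handled as well. A minimal sketch of that path handling follows; `escape_key` and `unescape_key` are hypothetical stand-ins that percent-encode `%`, `.` and `$`, since the actual `ParameterKeyEscaper` rules are not part of this diff.

```python
from typing import List

DEFAULT_PREFIX = "execution.parameters."


def escape_key(key: str) -> str:
    # Hypothetical stand-in for ParameterKeyEscaper.escape: encode "%" first so
    # the result stays reversible, then the Mongo-unsafe "." and "$".
    return key.replace("%", "%25").replace(".", "%2E").replace("$", "%24")


def unescape_key(key: str) -> str:
    # Inverse of escape_key: decode "$" and "." first, then "%".
    return key.replace("%24", "$").replace("%2E", ".").replace("%25", "%")


def escape_paths(paths: List[str], prefix: str = DEFAULT_PREFIX) -> List[str]:
    # Escape only the key after the prefix; the prefix and unrelated paths stay as-is
    return [
        prefix + escape_key(path[len(prefix):]) if path.startswith(prefix) else path
        for path in paths
    ]


# Ascending paths use the plain prefix, descending ones the "-"-prefixed variant
projection = escape_paths(["id", "execution.parameters.lr.decay"])
ordering = escape_paths(["-execution.parameters.batch.size"], "-" + DEFAULT_PREFIX)
```

Escaping only the suffix keeps the dotted prefix intact, so the path still addresses the `execution.parameters` sub-document while the user-supplied key becomes a single, dot-free field name.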
@@ -7,10 +7,7 @@ from mongoengine import Q

from apierrors import errors
from apimodels.base import UpdateResponse
from apimodels.users import (
CreateRequest,
SetPreferencesRequest,
)
from apimodels.users import CreateRequest, SetPreferencesRequest
from bll.user import UserBLL
from config import config
from database.errors import translate_errors_context
@@ -19,6 +16,7 @@ from database.model.company import Company
from database.model.user import User
from database.utils import parse_from_call
from service_repo import APICall, endpoint
from utilities.json import loads, dumps

log = config.logger(__file__)
get_all_query_options = User.QueryParameterOptions(list_fields=("id",))
@@ -160,7 +158,10 @@ def update(call, company_id, _):

def get_user_preferences(call):
user_id = call.identity.user
return get_user(call, user_id, ["preferences"]).get("preferences", {})
preferences = get_user(call, user_id, ["preferences"]).get("preferences")
if preferences and isinstance(preferences, str):
preferences = loads(preferences)
return preferences or {}


@endpoint("users.get_preferences")
@@ -169,9 +170,7 @@ def get_preferences(call):
return {"preferences": get_user_preferences(call)}


@endpoint(
"users.set_preferences", request_data_model=SetPreferencesRequest
)
@endpoint("users.set_preferences", request_data_model=SetPreferencesRequest)
def set_preferences(call, company_id, req_model):
# type: (APICall, str, SetPreferencesRequest) -> Dict
assert isinstance(call, APICall)
@@ -205,9 +204,11 @@ def set_preferences(call, company_id, req_model):
updated, fields = 0, {}
else:
with translate_errors_context("updating user preferences"):
fields = dict(preferences=new_preferences)
updated = User.objects(id=call.identity.user, company=company_id).update(
upsert=False, **fields
upsert=False, preferences=dumps(new_preferences)
)

return {"updated": updated, "fields": fields if updated else {}}
return {
"updated": updated,
"fields": {"preferences": new_preferences} if updated else {},
}

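With this change, `set_preferences` writes the preferences to the user document as a serialized string (`dumps(new_preferences)`), while `get_user_preferences` still accepts a value stored as a dict and returns `{}` when nothing is stored. A minimal sketch of that read/write pair, assuming the standard-library `json` module in place of `utilities.json`; the helper names `save_preferences` and `load_preferences` are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Union


def save_preferences(prefs: Dict[str, Any]) -> str:
    # Serialize the whole preferences dict into a single string value
    return json.dumps(prefs)


def load_preferences(stored: Optional[Union[str, Dict[str, Any]]]) -> Dict[str, Any]:
    # Decode only non-empty strings; a stored dict (or None) passes through as-is
    if stored and isinstance(stored, str):
        stored = json.loads(stored)
    return stored or {}


raw = save_preferences({"ui": {"theme": "dark"}, "page.size": 50})
prefs = load_preferences(raw)
```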
@@ -1,14 +1,14 @@
import operator
from time import sleep

from typing import Sequence
from typing import Sequence, Mapping

from tests.automated import TestService


class TestEntityOrdering(TestService):
test_comment = "Entity ordering test"
only_fields = ["id", "started", "comment"]
only_fields = ["id", "started", "comment", "execution.parameters"]

def setUp(self, **kwargs):
super().setUp(**kwargs)
@@ -27,6 +27,9 @@ class TestEntityOrdering(TestService):
        # sort by the same field that we use for the search
        self._assertGetTasksWithOrdering(order_by="comment")

        # sort by parameter which type is not part of db schema
        self._assertGetTasksWithOrdering(order_by="execution.parameters.test")

    def test_order_with_paging(self):
        order_field = "started"
        # all results in one page
@@ -52,7 +55,7 @@ class TestEntityOrdering(TestService):
    def _get_page_tasks(self, order_by, page: int, page_size: int) -> Sequence:
        return self.api.tasks.get_all_ex(
            only_fields=self.only_fields,
            order_by=order_by,
            order_by=[order_by] if isinstance(order_by, str) else order_by,
            comment=self.test_comment,
            page=page,
            page_size=page_size,
@@ -63,12 +66,19 @@ class TestEntityOrdering(TestService):
        Assert that vals are sorted in the ascending or descending order
        with None values are always coming from the end
        """
        if None in vals:
            first_null_idx = vals.index(None)
            none_tail = vals[first_null_idx:]
            vals = vals[:first_null_idx]
            self.assertTrue(all(val is None for val in none_tail))
            self.assertTrue(all(val is not None for val in vals))
        empty = [None, "", [], {}]
        empty_value = None
        idx = 0
        for idx, val in enumerate(vals):
            if val in empty:
                empty_value = val
                break

        if idx < len(vals) - 1:
            none_tail = vals[idx:]
            vals = vals[:idx]
            self.assertTrue(all(val == empty_value for val in none_tail))
            self.assertTrue(all(val != empty_value for val in vals))

        if ascending:
            cmp = operator.le
@@ -76,10 +86,18 @@ class TestEntityOrdering(TestService):
            cmp = operator.ge
        self.assertTrue(all(cmp(i, j) for i, j in zip(vals, vals[1:])))

    def _get_value_for_path(self, data: Mapping, field_path: Sequence[str]):
        val = None
        for name in field_path:
            val = data.get(name)
            data = val if isinstance(val, dict) else {}

        return val

    def _assertGetTasksWithOrdering(self, order_by: str = None, **kwargs):
        tasks = self.api.tasks.get_all_ex(
            only_fields=self.only_fields,
            order_by=order_by,
            order_by=[order_by] if isinstance(order_by, str) else order_by,
            comment=self.test_comment,
            **kwargs,
        ).tasks
@@ -87,12 +105,17 @@ class TestEntityOrdering(TestService):
        if order_by:
            # test that the output is correctly ordered
            field_name = order_by if not order_by.startswith("-") else order_by[1:]
            field_vals = [t.get(field_name) for t in tasks]
            field_vals = [self._get_value_for_path(t, field_name.split(".")) for t in tasks]
            self._assertSorted(field_vals, ascending=not order_by.startswith("-"))

    def _create_tasks(self):
        tasks = [self._temp_task() for _ in range(10)]
        for _, task in zip(range(5), tasks):
        tasks = [
            self._temp_task(
                **(dict(execution={"parameters": {"test": f"{i}"} if i >= 5 else {}}))
            )
            for i in range(10)
        ]
        for idx, task in zip(range(5), tasks):
            self.api.tasks.started(task=task)
            sleep(0.1)
        return tasks

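The updated ordering test resolves dotted `order_by` fields such as `execution.parameters.test` by walking nested dicts. It also accepts a trailing run of a single empty value (`None`, `""`, `[]` or `{}`) after the ordered values. The sketch below implements both checks as plain functions, with made-up task dicts:

- `get_value_for_path` follows the diff's `_get_value_for_path`.
- `is_sorted_with_empty_tail` is a hypothetical, simplified restatement of `_assertSorted`, not the test's exact code.

```python
import operator
from typing import Any, Mapping, Sequence

EMPTY_VALUES = (None, "", [], {})


def get_value_for_path(data: Mapping, field_path: Sequence[str]) -> Any:
    """Walk nested dicts along field_path; a missing step yields None."""
    val = None
    for name in field_path:
        val = data.get(name)
        data = val if isinstance(val, dict) else {}
    return val


def is_sorted_with_empty_tail(vals: Sequence[Any], ascending: bool = True) -> bool:
    """Non-empty values must be ordered; empty values may only trail them,
    and (as in the test) the trailing run must repeat a single empty value."""
    vals = list(vals)
    first_empty = next(
        (i for i, v in enumerate(vals) if v in EMPTY_VALUES), len(vals)
    )
    head, tail = vals[:first_empty], vals[first_empty:]
    if any(v != tail[0] for v in tail):
        return False
    cmp = operator.le if ascending else operator.ge
    return all(cmp(a, b) for a, b in zip(head, head[1:]))


tasks = [
    {"id": "a", "execution": {"parameters": {"test": "5"}}},
    {"id": "b", "execution": {"parameters": {"test": "7"}}},
    {"id": "c", "execution": {"parameters": {}}},
]
vals = [get_value_for_path(t, "execution.parameters.test".split(".")) for t in tasks]
print(vals)  # ['5', '7', None]
print(is_sorted_with_empty_tail(vals))  # True
```

Because missing path segments resolve to `None`, tasks without the parameter sort into the empty tail rather than raising a `KeyError`.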
@@ -2,83 +2,199 @@
Comprehensive test of all(?) use cases of datasets and frames
"""
import json
import time
import unittest
from functools import partial
from statistics import mean
from typing import Sequence

import es_factory
from config import config
from tests.automated import TestService

log = config.logger(__file__)


class TestTaskEvents(TestService):
    def setUp(self, version="1.7"):
    def setUp(self, version="2.7"):
        super().setUp(version=version)

        self.created_tasks = []

        self.task = dict(
            name="test task events",
            type="training",
            input=dict(mapping={}, view=dict(entries=[])),
    def _temp_task(self, name="test task events"):
        task_input = dict(
            name=name, type="training", input=dict(mapping={}, view=dict(entries=[])),
        )
        res, self.task_id = self.api.send("tasks.create", self.task, extract="id")
        assert res.meta.result_code == 200
        self.created_tasks.append(self.task_id)
        return self.create_temp("tasks", **task_input)

    def tearDown(self):
        log.info("Cleanup...")
        for task_id in self.created_tasks:
            try:
                self.api.send("tasks.delete", dict(task=task_id, force=True))
            except Exception as ex:
                log.exception(ex)

    def create_task_event(self, type, iteration):
    def _create_task_event(self, type_, task, iteration, **kwargs):
        return {
            "worker": "test",
            "type": type,
            "task": self.task_id,
            "type": type_,
            "task": task,
            "iter": iteration,
            "timestamp": es_factory.get_timestamp_millis()
            "timestamp": es_factory.get_timestamp_millis(),
            **kwargs,
        }

    def copy_and_update(self, src_obj, new_data):
    def _copy_and_update(self, src_obj, new_data):
        obj = src_obj.copy()
        obj.update(new_data)
        return obj

    def test_task_metrics(self):
        tasks = {
            self._temp_task(): {
                "Metric1": ["training_debug_image"],
                "Metric2": ["training_debug_image", "log"],
            },
            self._temp_task(): {"Metric3": ["training_debug_image"]},
        }
        events = [
            self._create_task_event(
                event_type,
                task=task,
                iteration=1,
                metric=metric,
                variant="Test variant",
            )
            for task, metrics in tasks.items()
            for metric, event_types in metrics.items()
            for event_type in event_types
        ]
        self.send_batch(events)
        self._assert_task_metrics(tasks, "training_debug_image")
        self._assert_task_metrics(tasks, "log")
        self._assert_task_metrics(tasks, "training_stats_scalar")

    def _assert_task_metrics(self, tasks: dict, event_type: str):
        res = self.api.events.get_task_metrics(tasks=list(tasks), event_type=event_type)
        for task, metrics in tasks.items():
            res_metrics = next(
                (tm.metrics for tm in res.metrics if tm.task == task), ()
            )
            self.assertEqual(
                set(res_metrics),
                set(
                    metric for metric, events in metrics.items() if event_type in events
                ),
            )

    def test_task_debug_images(self):
        task = self._temp_task()
        metric = "Metric1"
        variants = [("Variant1", 7), ("Variant2", 4)]
        iterations = 10

        # test empty
        res = self.api.events.debug_images(
            metrics=[{"task": task, "metric": metric}],
            iters=5,
        )
        self.assertFalse(res.metrics)

        # create events
        events = [
            self._create_task_event(
                "training_debug_image",
                task=task,
                iteration=n,
                metric=metric,
                variant=variant,
                url=f"{metric}_{variant}_{n % unique_images}",
            )
            for n in range(iterations)
            for (variant, unique_images) in variants
        ]
        self.send_batch(events)

        # init testing
        unique_images = [unique for (_, unique) in variants]
        scroll_id = None
        assert_debug_images = partial(
            self._assertDebugImages,
            task=task,
            metric=metric,
            max_iter=iterations - 1,
            unique_images=unique_images,
        )

        # test forward navigation
        for page in range(3):
            scroll_id = assert_debug_images(scroll_id=scroll_id, page=page)

        # test backwards navigation
        scroll_id = assert_debug_images(
            scroll_id=scroll_id, page=0, navigate_earlier=False
        )

        # beyond the latest iteration and back
        res = self.api.events.debug_images(
            metrics=[{"task": task, "metric": metric}],
            iters=5,
            scroll_id=scroll_id,
            navigate_earlier=False,
        )
        self.assertEqual(len(res["metrics"][0]["iterations"]), 0)
        assert_debug_images(scroll_id=scroll_id, page=1)

        # refresh
        assert_debug_images(scroll_id=scroll_id, page=0, refresh=True)

    def _assertDebugImages(
        self,
        task,
        metric,
        max_iter: int,
        unique_images: Sequence[int],
        scroll_id,
        page: int,
        iters: int = 5,
        **extra_params,
    ):
        res = self.api.events.debug_images(
            metrics=[{"task": task, "metric": metric}],
            iters=iters,
            scroll_id=scroll_id,
            **extra_params,
        )
        data = res["metrics"][0]
        self.assertEqual(data["task"], task)
        self.assertEqual(data["metric"], metric)
        left_iterations = max(0, max(unique_images) - page * iters)
        self.assertEqual(len(data["iterations"]), min(iters, left_iterations))
        for it in data["iterations"]:
            events_per_iter = sum(
                1 for unique in unique_images if unique > max_iter - it["iter"]
            )
            self.assertEqual(len(it["events"]), events_per_iter)
        return res.scroll_id

    def test_task_logs(self):
        events = []
        for iter in range(10):
            log_event = self.create_task_event("log", iteration=iter)
        task = self._temp_task()
        for iter_ in range(10):
            log_event = self._create_task_event("log", task, iteration=iter_)
            events.append(
                self.copy_and_update(
                self._copy_and_update(
                    log_event,
                    {"msg": "This is a log message from test task iter " + str(iter)},
                    {"msg": "This is a log message from test task iter " + str(iter_)},
                )
            )
            # sleep so timestamp is not the same
            import time

            time.sleep(0.01)
        self.send_batch(events)

        data = self.api.events.get_task_log(task=self.task_id)
        data = self.api.events.get_task_log(task=task)
        assert len(data["events"]) == 10

        self.api.tasks.reset(task=self.task_id)
        data = self.api.events.get_task_log(task=self.task_id)
        self.api.tasks.reset(task=task)
        data = self.api.events.get_task_log(task=task)
        assert len(data["events"]) == 0

    def test_task_metric_value_intervals_keys(self):
        metric = "Metric1"
        variant = "Variant1"
        iter_count = 100
        task = self._temp_task()
        events = [
            {
                **self.create_task_event("training_stats_scalar", iteration),
                **self._create_task_event("training_stats_scalar", task, iteration),
                "metric": metric,
                "variant": variant,
                "value": iteration,
@@ -88,19 +204,65 @@ class TestTaskEvents(TestService):
        self.send_batch(events)
        for key in None, "iter", "timestamp", "iso_time":
            with self.subTest(key=key):
                data = self.api.events.scalar_metrics_iter_histogram(task=self.task_id, key=key)
                data = self.api.events.scalar_metrics_iter_histogram(task=task, key=key)
                self.assertIn(metric, data)
                self.assertIn(variant, data[metric])
                self.assertIn("x", data[metric][variant])
                self.assertIn("y", data[metric][variant])

    def test_multitask_events_many_metrics(self):
        tasks = [
            self._temp_task(name="test events1"),
            self._temp_task(name="test events2"),
        ]
        iter_count = 10
        metrics_count = 10
        variants_count = 10
        events = [
            {
                **self._create_task_event("training_stats_scalar", task, iteration),
                "metric": f"Metric{metric_idx}",
                "variant": f"Variant{variant_idx}",
                "value": iteration,
            }
            for iteration in range(iter_count)
            for task in tasks
            for metric_idx in range(metrics_count)
            for variant_idx in range(variants_count)
        ]
        self.send_batch(events)
        data = self.api.events.multi_task_scalar_metrics_iter_histogram(tasks=tasks)
        self._assert_metrics_and_variants(
            data.metrics,
            metrics=metrics_count,
            variants=variants_count,
            tasks=tasks,
            iterations=iter_count,
        )

    def _assert_metrics_and_variants(
        self, data: dict, metrics: int, variants: int, tasks: Sequence, iterations: int
    ):
        self.assertEqual(len(data), metrics)
        for m in range(metrics):
            metric_data = data[f"Metric{m}"]
            self.assertEqual(len(metric_data), variants)
            for v in range(variants):
                variant_data = metric_data[f"Variant{v}"]
                self.assertEqual(len(variant_data), len(tasks))
                for t in tasks:
                    task_data = variant_data[t]
                    self.assertEqual(len(task_data["x"]), iterations)
                    self.assertEqual(len(task_data["y"]), iterations)

    def test_task_metric_value_intervals(self):
        metric = "Metric1"
        variant = "Variant1"
        iter_count = 100
        task = self._temp_task()
        events = [
            {
                **self.create_task_event("training_stats_scalar", iteration),
                **self._create_task_event("training_stats_scalar", task, iteration),
                "metric": metric,
                "variant": variant,
                "value": iteration,
@@ -109,13 +271,13 @@ class TestTaskEvents(TestService):
        ]
        self.send_batch(events)

        data = self.api.events.scalar_metrics_iter_histogram(task=self.task_id)
        data = self.api.events.scalar_metrics_iter_histogram(task=task)
        self._assert_metrics_histogram(data[metric][variant], iter_count, 100)

        data = self.api.events.scalar_metrics_iter_histogram(task=self.task_id, samples=100)
        data = self.api.events.scalar_metrics_iter_histogram(task=task, samples=100)
        self._assert_metrics_histogram(data[metric][variant], iter_count, 100)

        data = self.api.events.scalar_metrics_iter_histogram(task=self.task_id, samples=10)
        data = self.api.events.scalar_metrics_iter_histogram(task=task, samples=10)
        self._assert_metrics_histogram(data[metric][variant], iter_count, 10)

    def _assert_metrics_histogram(self, data, iters, samples):
@@ -130,7 +292,8 @@ class TestTaskEvents(TestService):
        )

    def test_task_plots(self):
        event = self.create_task_event("plot", 0)
        task = self._temp_task()
        event = self._create_task_event("plot", task, 0)
        event["metric"] = "roc"
        event.update(
            {
@@ -179,7 +342,7 @@ class TestTaskEvents(TestService):
        )
        self.send(event)

        event = self.create_task_event("plot", 100)
        event = self._create_task_event("plot", task, 100)
        event["metric"] = "confusion"
        event.update(
            {
@@ -222,11 +385,11 @@ class TestTaskEvents(TestService):
        )
        self.send(event)

        data = self.api.events.get_task_plots(task=self.task_id)
        data = self.api.events.get_task_plots(task=task)
        assert len(data["plots"]) == 2

        self.api.tasks.reset(task=self.task_id)
        data = self.api.events.get_task_plots(task=self.task_id)
        self.api.tasks.reset(task=task)
        data = self.api.events.get_task_plots(task=task)
        assert len(data["plots"]) == 0

    def send_batch(self, events):

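For each page, `_assertDebugImages` derives how many iterations should be returned and how many events each iteration should carry. The sketch below reproduces only that arithmetic as a plain function, using the numbers from `test_task_debug_images`: 10 iterations, and variants with 7 and 4 unique image URLs. `expected_debug_image_page` is a hypothetical helper. It assumes pages are walked backwards from the latest iteration, newest first; the diff does not state the order explicitly.

```python
from typing import List, Sequence, Tuple


def expected_debug_image_page(
    unique_images: Sequence[int], max_iter: int, page: int, iters: int = 5
) -> List[Tuple[int, int]]:
    """Return (iteration, expected_event_count) pairs for one page.

    Same arithmetic as _assertDebugImages; assumes pages walk backwards
    from max_iter, newest iteration first (an assumption).
    """
    left_iterations = max(0, max(unique_images) - page * iters)
    count = min(iters, left_iterations)
    first_iter = max_iter - page * iters
    result = []
    for it in range(first_iter, first_iter - count, -1):
        # a variant is expected to have an event only within its last `unique` iterations
        events = sum(1 for unique in unique_images if unique > max_iter - it)
        result.append((it, events))
    return result


# 10 iterations (max_iter=9); two variants with 7 and 4 unique image URLs
for page in range(3):
    print(page, expected_debug_image_page([7, 4], max_iter=9, page=page))
# 0 [(9, 2), (8, 2), (7, 2), (6, 2), (5, 1)]
# 1 [(4, 1), (3, 1)]
# 2 []
```

Under these assumptions, a variant contributes an event only within its last `unique` iterations. That is where each of its cycling URLs appears for the final time, which suggests the server returns only the latest event per URL; this is an inference from the test, not something the diff states. Page 2 is empty because only 7 iterations carry any events.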
@@ -6,6 +6,9 @@ log = config.logger(__file__)


class TestTasksEdit(TestService):
    def setUp(self, **kwargs):
        super().setUp(version=2.5)

    def new_task(self, **kwargs):
        return self.create_temp(
            "tasks", type="testing", name="test", input=dict(view=dict()), **kwargs
@@ -34,3 +37,39 @@ class TestTasksEdit(TestService):
        self.api.models.edit(model=not_ready_model, ready=False)
        self.assertFalse(self.api.models.get_by_id(model=not_ready_model).model.ready)
        self.api.tasks.edit(task=task, execution=dict(model=not_ready_model))

    def test_clone_task(self):
        script = dict(
            binary="python",
            requirements=dict(pip=["six"]),
            repository="https://example.come/foo/bar",
            entry_point="test.py",
            diff="foo",
        )
        execution = dict(parameters=dict(test="Test"))
        tags = ["hello"]
        system_tags = ["development", "test"]
        task = self.new_task(
            script=script, execution=execution, tags=tags, system_tags=system_tags
        )

        new_name = "new test"
        new_tags = ["by"]
        execution_overrides = dict(framework="Caffe")
        new_task_id = self.api.tasks.clone(
            task=task,
            new_task_name=new_name,
            new_task_tags=new_tags,
            execution_overrides=execution_overrides,
            new_task_parent=task,
        ).id
        new_task = self.api.tasks.get_by_id(task=new_task_id).task
        self.assertEqual(new_task.name, new_name)
        self.assertEqual(new_task.type, "testing")
        self.assertEqual(new_task.tags, new_tags)
        self.assertEqual(new_task.status, "created")
        self.assertEqual(new_task.script, script)
        self.assertEqual(new_task.parent, task)
        self.assertEqual(new_task.execution.parameters, execution["parameters"])
        self.assertEqual(new_task.execution.framework, execution_overrides["framework"])
        self.assertEqual(new_task.system_tags, [])

@@ -108,7 +108,7 @@ class TestWorkersService(TestService):
        from_date = to_date - timedelta(days=1)

        # no variants
        res = self.api.workers.get_statistics(
        res = self.api.workers.get_stats(
            items=[
                dict(key="cpu_usage", aggregation="avg"),
                dict(key="cpu_usage", aggregation="max"),
@@ -142,7 +142,7 @@ class TestWorkersService(TestService):
        )

        # split by variants
        res = self.api.workers.get_statistics(
        res = self.api.workers.get_stats(
            items=[dict(key="cpu_usage", aggregation="avg")],
            from_date=from_date.timestamp(),
            to_date=to_date.timestamp(),
@@ -165,7 +165,7 @@ class TestWorkersService(TestService):

        assert all(_check_metric_and_variants(worker) for worker in res["workers"])

        res = self.api.workers.get_statistics(
        res = self.api.workers.get_stats(
            items=[dict(key="cpu_usage", aggregation="avg")],
            from_date=from_date.timestamp(),
            to_date=to_date.timestamp(),

@@ -1 +1,2 @@
numpy>=1.12.1
nose==1.3.7
parameterized>=0.7.1

@@ -8,8 +8,9 @@ import requests
from semantic_version import Version

from config import config
from config.info import get_version
from database.model.settings import Settings
from version import __version__ as current_version
from utilities.threads_manager import ThreadsManager

log = config.logger(__name__)

@@ -48,7 +49,7 @@ class CheckUpdatesThread(Thread):

        response = requests.get(
            url,
            json={"versions": {self.component_name: str(current_version)}, "uid": uid},
            json={"versions": {self.component_name: str(get_version())}, "uid": uid},
            timeout=float(
                config.get("apiserver.check_for_updates.request_timeout_sec", 3.0)
            ),
@@ -65,7 +66,7 @@ class CheckUpdatesThread(Thread):
        if not latest_version:
            return

        cur_version = Version(current_version)
        cur_version = Version(get_version())
        latest_version = Version(latest_version)
        if cur_version >= latest_version:
            return
@@ -80,7 +81,16 @@ class CheckUpdatesThread(Thread):
        )

    def _check_updates(self):
        while True:
        update_interval_sec = max(
            float(
                config.get(
                    "apiserver.check_for_updates.check_interval_sec",
                    60 * 60 * 24,
                )
            ),
            60 * 5,
        )
        while not ThreadsManager.terminating:
            # noinspection PyBroadException
            try:
                response = self._check_new_version_available()
@@ -98,17 +108,7 @@ class CheckUpdatesThread(Thread):
            except Exception:
                log.exception("Failed obtaining updates")

            sleep(
                max(
                    float(
                        config.get(
                            "apiserver.check_for_updates.check_interval_sec",
                            60 * 60 * 24,
                        )
                    ),
                    60 * 5,
                )
            )
            sleep(update_interval_sec)


check_updates_thread = CheckUpdatesThread()

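The refactored `_check_updates` computes the polling interval once: the configured `check_interval_sec`, defaulting to one day and never less than five minutes. It then loops until `ThreadsManager.terminating` is set instead of `while True`. Below is a minimal standalone sketch of that pattern, with a few assumptions:

- A flat dict with dotted keys replaces the server's `config` object.
- `poll_interval`, `Manager` and `run_checks` are hypothetical names.
- The injectable `sleep` argument exists only so the loop can be exercised without waiting.

```python
import logging
import time
from typing import Any, Callable, ClassVar, Mapping

log = logging.getLogger(__name__)

DEFAULT_INTERVAL_SEC = 60 * 60 * 24  # one day, the default in the diff
MIN_INTERVAL_SEC = 60 * 5  # five-minute floor, as in the diff


def poll_interval(config: Mapping[str, Any]) -> float:
    """Configured interval in seconds, never below the floor."""
    configured = float(
        config.get(
            "apiserver.check_for_updates.check_interval_sec", DEFAULT_INTERVAL_SEC
        )
    )
    return max(configured, MIN_INTERVAL_SEC)


class Manager:
    # Class-level shutdown flag, analogous to ThreadsManager.terminating
    terminating: ClassVar[bool] = False


def run_checks(
    check: Callable[[], None],
    interval_sec: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call check() until Manager.terminating is set, sleeping in between."""
    while not Manager.terminating:
        try:
            check()
        except Exception:
            # a failed check is logged and polling continues
            log.exception("Failed obtaining updates")
        sleep(interval_sec)


calls = []


def fake_check() -> None:
    calls.append(1)
    if len(calls) == 3:
        Manager.terminating = True  # request shutdown after the third check


run_checks(fake_check, poll_interval({}), sleep=lambda _: None)
print(len(calls))  # 3
```

Because the interval is read once before the loop, a changed `check_interval_sec` takes effect only after a restart, which matches the diff. The flag is checked only between sleeps, so shutdown can still wait up to one interval.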
@@ -12,6 +12,24 @@ def flatten_nested_items(
    for key, value in dictionary.items():
        path = prefix + (key,)
        if isinstance(value, dict) and nesting != 0:
            yield from flatten_nested_items(value, next_nesting, include_leaves, prefix=path)
            yield from flatten_nested_items(
                value, next_nesting, include_leaves, prefix=path
            )
        elif include_leaves is None or key in include_leaves:
            yield path, value


def deep_merge(source: dict, override: dict) -> dict:
    """
    Merge the override dict into the source in-place
    Contrary to the dpath.merge the sequences are not expanded
    If override contains the sequence with the same name as source
    then the whole sequence in the source is overridden
    """
    for key, value in override.items():
        if key in source and isinstance(source[key], dict) and isinstance(value, dict):
            deep_merge(source[key], value)
        else:
            source[key] = value

    return source

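Per its docstring, `deep_merge` recurses only where both sides hold a dict. Any other override value, lists included, replaces the source value wholesale, and the source dict is modified in place. The function below is adapted from the diff with a shortened docstring, followed by a usage example with made-up keys:

```python
def deep_merge(source: dict, override: dict) -> dict:
    """
    Merge the override dict into the source in-place.
    Sequences are not expanded: an override list replaces the source list.
    """
    for key, value in override.items():
        if key in source and isinstance(source[key], dict) and isinstance(value, dict):
            deep_merge(source[key], value)  # both sides are dicts: merge recursively
        else:
            source[key] = value  # anything else (including lists) is replaced

    return source


source = {"execution": {"framework": "PyTorch", "parameters": {"lr": "0.1"}}, "tags": ["a", "b"]}
override = {"execution": {"framework": "Caffe"}, "tags": ["c"]}
merged = deep_merge(source, override)
print(merged)
# {'execution': {'framework': 'Caffe', 'parameters': {'lr': '0.1'}}, 'tags': ['c']}
print(merged is source)  # True: the merge happens in place
```

Callers that need to keep the original should pass a `copy.deepcopy` of it as `source`.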
@@ -1,10 +1,12 @@
from functools import wraps
from threading import Lock, Thread
from typing import ClassVar


class ThreadsManager:
    objects = {}
    lock = Lock()
    terminating: ClassVar[bool] = False

    def __init__(self, name=None, **threads):
        super(ThreadsManager, self).__init__()
@@ -12,7 +14,7 @@ class ThreadsManager:
        self.objects = {}
        self.lock = Lock()

        for name, thread in threads.items():
        for thread_name, thread in threads.items():
            if issubclass(thread, Thread):
                thread = thread()
                thread.start()
@@ -20,9 +22,9 @@ class ThreadsManager:
                if not thread.is_alive():
                    thread.start()
            else:
                raise Exception(f"Expected thread or thread class ({name}): {thread}")
                raise Exception(f"Expected thread or thread class ({thread_name}): {thread}")

            self.objects[name] = thread
            self.objects[thread_name] = thread

    def register(self, thread_name, daemon=True):
        def decorator(f):

@@ -1 +1 @@
__version__ = "0.12.0"
__version__ = "0.14.0"

@@ -2,13 +2,13 @@

## Introduction

The webserver is the **trains-server**'s component responsible for serving the TRAINS webapp.
The webserver is the **trains-server**'s component responsible for serving the Trains webapp.
For this purpose, we use an [NGINX](https://www.nginx.com/) server.

## Configuration

In order to serve the TRAINS webapp, the following is required:
* The pre-built TRAINS webapp should be copied to the NGINX html directory (usually `/usr/share/nginx/html`)
In order to serve the Trains webapp, the following is required:
* The pre-built Trains webapp should be copied to the NGINX html directory (usually `/usr/share/nginx/html`)
* The default NGINX port (usually `80`) should be changed to match the **trains-server** configuration (usually `8080`)

NOTE: This configuration may vary in different systems, depending on the NGINX version and distribution used.
