Version bump to v0.14.0

Update docs with AMI IDs for v0.14.0
Fix getting empty metrics from task
2025-06-26 23:15:47 +00:00 · 2020-03-05 20:03:48 +02:00 · 2020-03-05 20:03:33 +02:00 · 2020-03-05 14:57:20 +02:00 · 2020-03-05 14:55:40 +02:00 · 2020-03-05 14:54:34 +02:00
80 changed files with 3093 additions and 1099 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,11 +1,10 @@
+syntax: glob
 .idea
 apierrors/errors
 static/build.json
 static/dashboard/node_modules
 static/webapp/node_modules
 static/webapp/.git
-scripts/
-generators/
 *.pyc
 __pycache__
 .ropeproject
@@ -20,3 +19,4 @@ build
 dist
 code.tar.gz
 server/schema/services/_cache.json
+server/apierrors/errors/*
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# TRAINS Server
+# Trains Server

 ##  Auto-Magical Experiment Manager & Version Control for AI

@@ -9,25 +9,20 @@

 ## Introduction

-The **trains-server** is the backend service infrastructure for [TRAINS](https://github.com/allegroai/trains).
+The **trains-server** is the backend service infrastructure for [Trains](https://github.com/allegroai/trains).
 It allows multiple users to collaborate and manage their experiments.
-By default, TRAINS is set up to work with the TRAINS demo server, which is open to anyone and resets periodically.
-In order to host your own server, you will need to install **trains-server** and point TRAINS to it.
+By default, **Trains** is set up to work with the **Trains** demo server, which is open to anyone and resets periodically.
+In order to host your own server, you will need to launch **trains-server** and point **Trains** to it.

 **trains-server** contains the following components:

-* The TRAINS Web-App, a single-page UI for experiment management and browsing
+* The **Trains** Web-App, a single-page UI for experiment management and browsing
 * RESTful API for:
    * Documenting and logging experiment information, statistics and results
    * Querying experiments history, logs and results
 * Locally-hosted file server for storing images and models making them easily accessible using the Web-App

-You can quickly setup your **trains-server** using:
- - [Docker Installation](#installation)
- - Pre-built Amazon [AWS image](#aws)
- - [Kubernetes Helm](https://github.com/allegroai/trains-server-helm#trains-server-for-kubernetes-clusters-using-helm)
- or manual [Kubernetes installation](https://github.com/allegroai/trains-server-k8s#trains-server-for-kubernetes-clusters)
-
+You can quickly [deploy](#launching-trains-server) your **trains-server** using Docker, an AWS EC2 AMI, or Kubernetes.

 ## System design

@@ -44,136 +39,42 @@ You can quickly setup your **trains-server** using:
    - Web application on sub-domain: app.\*.\*
    - API service on sub-domain: api.\*.\*
    - File storage service on sub-domain: files.\*.\*
+    
+## Launching trains-server

-## Install / Upgrade - AWS <a name="aws"></a>
+### Prerequisites

-Use one of our pre-installed Amazon Machine Images for easy deployment in AWS.
-
-For details and instructions, see [TRAINS-server: AWS pre-installed images](docs/install_aws.md).
-
-## Docker Installation - Linux, macOS, and Windows <a name="installation"></a>
-
-Use our pre-built Docker image for easy deployment in Linux and macOS. <br>
-For [Windows](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#docker_compose_win10), please see detailed docker-compose installation instructions on our [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#docker_compose_win10).<br>
-Latest docker images can be found [here](https://hub.docker.com/r/allegroai/trains).
-
-1. Setup Docker (docker-compose installation details: [Ubuntu](docs/faq.md#ubuntu) / [macOS](docs/faq.md#mac-osx))
-
-    <details>
-    <summary>Make sure ports 8080/8081/8008 are available for the TRAINS-server services:</summary>
+The ports 8080/8081/8008 must be available for the **trains-server** services.
   
-    For example, to see if port `8080` is in use: 
+For example, to see if port `8080` is in use:

-    ```bash
-    $ sudo lsof -Pn -i4 | grep :8080 | grep LISTEN
-    ```
+* Linux or macOS: 
+   
+        sudo lsof -Pn -i4 | grep :8080 | grep LISTEN
+
+* Windows:
+
+        netstat -an |find /i "8080"
+   
+### Launching   
    
-    </details>
-    
-    Increase vm.max_map_count for `ElasticSearch` docker
+Launch **trains-server** in any of the following formats:

-    - Linux
-        ```bash
-        $ echo "vm.max_map_count=262144" > /tmp/99-trains.conf
-        $ sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
-        $ sudo sysctl -w vm.max_map_count=262144
-        $ sudo service docker restart
-        ```
-      
-    - macOS
-        ```bash
-        $ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
-        $ sysctl -w vm.max_map_count=262144
-        ```    
+- Pre-built [AWS EC2 AMI](https://github.com/allegroai/trains-server/blob/master/docs/install_aws.md)
+- Pre-built Docker Image
+    - [Linux](https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md)
+    - [macOS](https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md)
+    - [Windows 10](https://github.com/allegroai/trains-server/blob/master/docs/install_win.md)
+- Kubernetes    
+    - [Kubernetes Helm](https://github.com/allegroai/trains-server-helm#prerequisites)
+    - Manual [Kubernetes installation](https://github.com/allegroai/trains-server-k8s#prerequisites)

-1. Create local directories for the databases and storage.
+## Connecting Trains to your trains-server

-    ```bash
-    $ sudo mkdir -p /opt/trains/data/elastic
-    $ sudo mkdir -p /opt/trains/data/mongo/db
-    $ sudo mkdir -p /opt/trains/data/mongo/configdb
-    $ sudo mkdir -p /opt/trains/data/redis
-    $ sudo mkdir -p /opt/trains/logs
-    $ sudo mkdir -p /opt/trains/data/fileserver
-    $ sudo mkdir -p /opt/trains/config
-    ```
-
-    Set folder permissions
-      
-    - Linux
-      ```bash
-      $ sudo chown -R 1000:1000 /opt/trains
-      ```
-    - macOS
-      ```bash
-      $ sudo chown -R $(whoami):staff /opt/trains
-      ```
-
-1. Download the `docker-compose.yml` file, either download [manually](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml) or execute:
-
-    ```bash
-    $ curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml 
-    ```
-
-1. Launch the Docker containers <a name="launch-docker"></a>
-
-    ```bash
-    $ docker-compose -f docker-compose.yml up
-    ```
-
-1. Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:
-
-    * Web server on port `8080`
-    * API server on port `8008`
-    * File server on port `8081`
-
-**\* If something went wrong along the way, check our FAQ: [Docker Setup](docs/docker_setup.md#setup-docker), [Ubuntu Support](docs/faq.md#ubuntu), [macOS Support](docs/faq.md#mac-osx)**
-
-## Optional Configuration
-
-The **trains-server** default configuration can be easily overridden using external configuration files. By default, the server will look for these files in `/opt/trains/config`.
-
-In order to apply the new configuration, you must restart the server (see [Restarting trains-server](#restart-server)).
-
-### Adding Web Login Authentication
-
-By default anyone can login to the **trains-server** Web-App.
-You can configure the **trains-server** to allow only a specific set of users to access the system.
-
-Enable this feature by placing `apiserver.conf` file under `/opt/trains/config`.
-
-Sample `apiserver.conf` configuration file can be found [here](https://github.com/allegroai/trains-server/blob/master/docs/apiserver.conf)
-
-To apply the changes, you must [restart the *trains-server*](#restart-server).
-
-### Configuring the Non-Responsive Experiments Watchdog
-
-The non-responsive experiment watchdog, monitors experiments that were not updated for a given period of time,
-and marks them as `aborted`. The watchdog is always active with a default of 7200 seconds (2 hours) of inactivity threshold.
-
-To change the watchdog's timeouts, place a `services.conf` file under `/opt/trains/config`.
-
-Sample watchdog `services.conf` configuration file can be found [here](https://github.com/allegroai/trains-server/blob/master/docs/services.conf)
-
-To apply the changes, you must [restart the *trains-server*](#restart-server).
-
-### Restarting trains-server <a name="restart-server"></a>
-
-To restart the **trains-server**, you must first stop the containers, and then restart them.
-   ```bash
-   $ docker-compose down
-   $ docker-compose -f docker-compose.yml up
-   ```
-
-
-## Configuring **TRAINS** client
-
-Once you have installed the **trains-server**, make sure to configure **TRAINS** [client](https://github.com/allegroai/trains)
-to use your locally installed server (and not the demo server).
-
- Run the `trains-init` command for an interactive setup
-
- Or manually edit `~/trains.conf` file, making sure the `api_server` value is configured correctly, for example:
+By default, the **Trains** client is set up to work with the [**Trains** demo server](https://demoapp.trains.allegro.ai/).  
+To have the **Trains** client use your **trains-server** instead:
+- Run the `trains-init` command for an interactive setup.
+- Or manually edit the `~/trains.conf` file, making sure the server settings (`api_server`, `web_server`, `files_server`) are configured correctly, for example:

        api {
            # API server on port 8008
@@ -186,26 +87,42 @@ to use your locally installed server (and not the demo server).
            files_server: "http://localhost:8081"
        }

-* Notice that if you setup **trains-server** in a sub-domain configuration, there is no need to specify a port number,
+**Note**: If you have set up **trains-server** in a sub-domain configuration, there is no need to specify a port number;
 it will be inferred from the http/s scheme.

-See [Installing and Configuring TRAINS](https://github.com/allegroai/trains#configuration) for more details.
+After launching the **trains-server** and configuring the **Trains** client to use the **trains-server**,
+you can [use](https://github.com/allegroai/trains#using-trains) **Trains** in your experiments and view them in your **trains-server** web server,
+for example http://localhost:8080.  
+For more information about the Trains client, see [**Trains**](https://github.com/allegroai/trains).

-## What next?
+## Advanced Functionality

-Now that the **trains-server** is installed, and TRAINS is configured to use it,
-you can [use](https://github.com/allegroai/trains#using-trains) TRAINS in your experiments and view them in the web server,
-for example http://localhost:8080
+**trains-server** provides a few additional useful features, which can be manually enabled:
+ 
+* [Web login authentication](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#web-auth)
+* [Non-responsive experiments watchdog](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#watchdog-the-non-responsive-task-watchdog-settings)  
+
+## Restarting trains-server
+
+To restart the **trains-server**, you must first stop the containers, and then restart them.
+
+   ```bash
+   docker-compose down
+   docker-compose -f docker-compose.yml up
+   ```

 ## Upgrading <a name="upgrade"></a>

-We are constantly updating, improving and adding to the **trains-server**.
-New releases will include new pre-built Docker images.
-When we release a new version and include a new pre-built Docker image for it, upgrade as follows:
+**trains-server** releases are also reflected in the [docker compose configuration file](https://github.com/allegroai/trains-server/blob/master/docker-compose.yml).  
+We strongly encourage you to keep your **trains-server** up to date, by keeping up with the current release.
+
+**Note**: The following upgrade instructions use the Linux OS as an example.
+
+To upgrade your existing **trains-server** deployment:

 1. Shut down the docker containers
   ```bash
-   $ docker-compose down
+   docker-compose down
   ```

 1. We highly recommend backing up your data directory before upgrading.
@@ -213,7 +130,7 @@ When we release a new version and include a new pre-built Docker image for it, u
   Assuming your data directory is `/opt/trains`, to archive all data into `~/trains_backup.tgz` execute:

   ```bash
-   $ sudo tar czvf ~/trains_backup.tgz /opt/trains/data
+   sudo tar czvf ~/trains_backup.tgz /opt/trains/data
   ```    

   <details>
@@ -221,29 +138,29 @@ When we release a new version and include a new pre-built Docker image for it, u

   To restore this example backup, execute:
   ```bash
-   $ sudo rm -R /opt/trains/data
-   $ sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
+   sudo rm -R /opt/trains/data
+   sudo tar -xzf ~/trains_backup.tgz -C /
   ```
   </details>

-1. Download the latest `docker-compose.yml` file, either [manually](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml) or execute:
+1. Download the latest `docker-compose.yml` file.

   ```bash
-   $ curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml 
+   curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml 
   ```

-1. Spin up the docker containers, it will automatically pull the latest trains-server build    
+1. Spin up the docker containers. This automatically pulls the latest **trains-server** build.
   ```bash
-   $ docker-compose -f docker-compose.yml pull
-   $ docker-compose -f docker-compose.yml up
+   docker-compose -f docker-compose.yml pull
+   docker-compose -f docker-compose.yml up
   ```

-**\* If something went wrong along the way, check our FAQ: [Docker Upgrade](docs/docker_setup.md#common-docker-upgrade-errors)**
+**\* If something went wrong along the way, check our FAQ: [Common Docker Upgrade Errors](https://github.com/allegroai/trains-server/blob/master/docs/faq.md#common-docker-upgrade-errors).**


 ## Community & Support

-If you have any questions, look to the TRAINS-server [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md), or
+If you have any questions, look to the Trains server [FAQ](https://github.com/allegroai/trains-server/blob/master/docs/faq.md), or
 tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/trains) with '**trains**' tag.

 For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains-server/issues).
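Review note on the README prerequisites: the `lsof` and `netstat` commands check one port at a time and differ per platform. A minimal cross-platform sketch of the same check, assuming the services will listen on localhost, could look like the following. The helper names `port_in_use` and `busy_ports` are hypothetical and not part of **trains-server**; the check only detects listeners reachable on the IPv4 loopback address.

```python
import socket

# Ports named in the README prerequisites (web server, file server, API server).
TRAINS_PORTS = (8080, 8081, 8008)


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=TRAINS_PORTS, host="127.0.0.1"):
    """Return the subset of ports that already have a listener."""
    return [p for p in ports if port_in_use(p, host)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print("Ports already in use:", ", ".join(map(str, taken)))
    else:
        print("Ports 8080, 8081 and 8008 look free on localhost.")
```

Running the script before `docker-compose up` gives a quick indication of conflicts; it does not replace the platform tools when a listener is bound to a specific non-loopback interface.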
--- a/docker-compose-unified.yml
+++ b/docker-compose-unified.yml
@@ -20,9 +20,12 @@ services:
      - mongo
      - elasticsearch
    environment:
-      ELASTIC_SERVICE_HOST: elasticsearch
-      MONGODB_SERVICE_HOST: mongo
-      REDIS_SERVICE_HOST: redis
+      TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
+      TRAINS_ELASTIC_SERVICE_PORT: 9200
+      TRAINS_MONGODB_SERVICE_HOST: mongo
+      TRAINS_MONGODB_SERVICE_PORT: 27017
+      TRAINS_REDIS_SERVICE_HOST: redis
+      TRAINS_REDIS_SERVICE_PORT: 6379
    networks:
      - backend
  elasticsearch:
--- a/docker-compose-win10.yml
+++ b/docker-compose-win10.yml
@@ -16,9 +16,12 @@ services:
      - elasticsearch
      - fileserver
    environment:
-      ELASTIC_SERVICE_HOST: elasticsearch
-      MONGODB_SERVICE_HOST: mongo
-      REDIS_SERVICE_HOST: redis
+      TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
+      TRAINS_ELASTIC_SERVICE_PORT: 9200
+      TRAINS_MONGODB_SERVICE_HOST: mongo
+      TRAINS_MONGODB_SERVICE_PORT: 27017
+      TRAINS_REDIS_SERVICE_HOST: redis
+      TRAINS_REDIS_SERVICE_PORT: 6379
    ports:
    - "8008:8008"
    networks:
@@ -114,4 +117,4 @@ networks:
    driver: bridge

 volumes:
-  mongodata:
+  mongodata:
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -16,9 +16,14 @@ services:
      - elasticsearch
      - fileserver
    environment:
-      ELASTIC_SERVICE_HOST: elasticsearch
-      MONGODB_SERVICE_HOST: mongo
-      REDIS_SERVICE_HOST: redis
+      TRAINS_ELASTIC_SERVICE_HOST: elasticsearch
+      TRAINS_ELASTIC_SERVICE_PORT: 9200
+      TRAINS_MONGODB_SERVICE_HOST: mongo
+      TRAINS_MONGODB_SERVICE_PORT: 27017
+      TRAINS_REDIS_SERVICE_HOST: redis
+      TRAINS_REDIS_SERVICE_PORT: 6379
+      TRAINS__apiserver__mongo__pre_populate__enabled: "true"
+      TRAINS__apiserver__mongo__pre_populate__zip_file: "/opt/trains/db-pre-populate/export.zip"
    ports:
    - "8008:8008"
    networks:
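Review note on the renamed variables: the compose files now pass `TRAINS_`-prefixed host and port settings for Elasticsearch, MongoDB and Redis. The sketch below illustrates how a service could resolve such settings with fallbacks to the values shown in this diff. It is an assumption for illustration only, not the server's actual lookup code; `service_address` and `DEFAULTS` are hypothetical names.

```python
import os

# Fallbacks mirror the values set in the compose files in this diff.
DEFAULTS = {
    "ELASTIC": ("elasticsearch", 9200),
    "MONGODB": ("mongo", 27017),
    "REDIS": ("redis", 6379),
}


def service_address(name, environ=None):
    """Resolve (host, port) for a backing service from TRAINS_* variables."""
    env = os.environ if environ is None else environ
    default_host, default_port = DEFAULTS[name]
    host = env.get("TRAINS_{}_SERVICE_HOST".format(name), default_host)
    # Environment values are strings, so the port is converted explicitly.
    port = int(env.get("TRAINS_{}_SERVICE_PORT".format(name), default_port))
    return host, port


if __name__ == "__main__":
    for service in DEFAULTS:
        print(service, service_address(service))
```

Passing a plain dictionary as `environ` makes the lookup easy to exercise without touching the real process environment.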
--- a/docs/apiserver.conf
+++ b/docs/apiserver.conf
@@ -1,5 +1,5 @@
 auth {
-    # Fixed users login credetials
+    # Fixed users login credentials
    # No other user will be able to login
    fixed_users {
        enabled: true
--- a/docs/docker_setup.md
+++ b/docs/docker_setup.md
@@ -1,166 +0,0 @@
-# TRAINS-server: Using Docker Pre-Built Images
-
-The pre-built Docker image for the **trains-server** is the quickest way to get started with your own **TRAINS** server.
-
-You can also build the entire **trains-server** architecture using the code available in the [trains-server](https://github.com/allegroai/trains-server) repository.
-
-**Note**: We tested this pre-built Docker image with Linux, only. For Windows users, we recommend installing the pre-built image on a Linux virtual machine.
-
-## Prerequisites
-
-* You must be logged in as a user with sudo privileges
-* Use `bash` for all command-line instructions in this installation
-
-## Setup Docker
-
-### Step 1: Install Docker CE
-
-You must first install Docker. For instructions about installing Docker, see [Supported platforms](https://docs.docker.com/install//#support) in the Docker documentation.
-
-For example, to [install in Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/) / Mint (x86_64/amd64):
-
-```bash
-sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
-curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
-. /etc/os-release
-sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $UBUNTU_CODENAME stable"
-sudo apt-get update
-sudo apt-get install -y docker-ce
-```
-
-### Step 2: Set the Maximum Number of Memory Map Areas
-
-Elastic requires that the `vm.max_map_count` kernel setting, which is the maximum number of memory map areas a process can use, is set to at least 262144.
-
-For CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19.x, we tested the following commands to set `vm.max_map_count`:
-
-```bash
-echo "vm.max_map_count=262144" > /tmp/99-trains.conf
-sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
-sudo sysctl -w vm.max_map_count=262144
-```
-
-For information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.
-
-### Step 3: Restart the Docker daemon
-
-Restart the Docker daemon.
-
-```bash
-sudo service docker restart
-```
-
-### Step 4: Choose a Data Directory
-
-Choose a directory on your system in which all data maintained by the **trains-server** is stored.
-Create this directory, and set its owner and group to `uid` 1000. The data stored in this directory includes the database, uploaded files and logs.
-
-For example, if your data directory is `/opt/trains`, then use the following command:
-
-```bash
-sudo mkdir -p /opt/trains/data/elastic
-sudo mkdir -p /opt/trains/data/mongo/db
-sudo mkdir -p /opt/trains/data/mongo/configdb
-sudo mkdir -p /opt/trains/data/redis
-sudo mkdir -p /opt/trains/logs
-sudo mkdir -p /opt/trains/data/fileserver
-sudo mkdir -p /opt/trains/config
-
-sudo chown -R 1000:1000 /opt/trains
-```
-
-## TRAINS-server: Manually Launching Docker Containers <a name="launch"></a>
-
-You can manually launch the Docker containers using the following commands.
-
-If your data directory is not `/opt/trains`, then in the five `docker run` commands below, you must replace all occurrences of `/opt/trains` with your data directory path.
-
-1. Launch the **trains-elastic** Docker container.
-
-        sudo docker run -d --restart="always" --name="trains-elastic" -e "bootstrap.memory_lock=true" --ulimit memlock=-1:-1 -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16
-
-1. Launch the **trains-mongo** Docker container.
-
-        sudo docker run -d --restart="always" --name="trains-mongo" -v /opt/trains/data/mongo/db:/data/db -v /opt/trains/data/mongo/configdb:/data/configdb --network="host" mongo:3.6.5
-
-1. Launch the **trains-redis** Docker container.
-
-        sudo docker run -d --restart="always" --name="trains-redis" -v /opt/trains/data/redis:/data --network="host" redis:5.0
-
-1. Launch the **trains-fileserver** Docker container.
-
-        sudo docker run -d --restart="always" --name="trains-fileserver" --network="host" -v /opt/trains/logs:/var/log/trains -v /opt/trains/data/fileserver:/mnt/fileserver allegroai/trains:latest fileserver
-
-1. Launch the **trains-apiserver** Docker container.
-
-        sudo docker run -d --restart="always" --name="trains-apiserver" --network="host" -v /opt/trains/logs:/var/log/trains -v /opt/trains/config:/opt/trains/config allegroai/trains:latest apiserver
-
-1. Launch the **trains-webserver** Docker container.
-
-        sudo docker run -d --restart="always" --name="trains-webserver" -p 8080:80 allegroai/trains:latest webserver
-
-1. Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:
-
-    * API server on port `8008`
-    * Web server on port `8080`
-    * File server on port `8081`
-
-## Manually Upgrading TRAINS-server Containers <a name="upgrade"></a>
-
-We are constantly updating, improving and adding to the **trains-server**.
-New releases will include new pre-built Docker images.
-When we release a new version and include a new pre-built Docker image for it, upgrade as follows:
-
-1. Shut down and remove each of your Docker instances using the following commands:
-    
-    ```bash
-    $ sudo docker stop <docker-name>
-    $ sudo docker rm -v <docker-name>
-    ```
-
-    The Docker names are (see [Launching Docker Containers](#launch-docker)):
-
-    * `trains-elastic`
-    * `trains-mongo`
-    * `trains-redis`
-    * `trains-fileserver`
-    * `trains-apiserver`
-    * `trains-webserver`
-
-2. We highly recommend backing up your data directory!. A simple way to do that is using `tar`:
-
-    For example, if your data directory is `/opt/trains`, use the following command:
-
-    ```bash
-    $ sudo tar czvf ~/trains_backup.tgz /opt/trains/data
-    ```
-    This backups all data to an archive in your home directory.
-
-    To restore this example backup, use the following command:
-    ```bash
-    $ sudo rm -R /opt/trains/data
-    $ sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
-    ```
-
-3. Pull the new **trains-server** docker image using the following command:
-
-    ```bash
-    $ sudo docker pull allegroai/trains:latest
-    ```
-
-    If you wish to pull a different version, replace `latest` with the required version number, for example:
-    ```bash
-    $ sudo docker pull allegroai/trains:0.11.0
-     ```
-
-4. Launch the newly released Docker image (see [Launching Docker Containers](#trains-server-manually-launching-docker-containers-)).
-
-
-#### Common Docker Upgrade Errors 
-
-* In case of a docker error: "... The container name "/trains-???" is already in use by ..."      
-    Try removing deprecated images with:
-    ```bash
-    $ docker rm -f $(docker ps -a -q)
-    ```
-  
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -1,77 +1,122 @@
-# TRAINS-server FAQ
+# trains-server FAQ

-* [Deploying trains-server on Kubernetes clusters](#kubernetes)
+Launching **trains-server**

-* [Creating a Helm Chart for trains-server Kubernetes deployment](#helm)
+* How do I launch **trains-server** on:

-* [Running trains-server on Mac OS X](#mac-osx)
+    * [Stand alone Linux Ubuntu systems?](#ubuntu)
+    
+    * [macOS?](#mac-osx)
+    
+    * [Windows 10?](#docker_compose_win10)

-* [Running trains-server on Windows 10](#docker_compose_win10)
+* [How do I restart trains-server?](#restart)

-* [Installing trains-server on stand alone Linux Ubuntu  systems ](#ubuntu)
+Kubernetes

-* [Resolving port conflicts preventing fixed users mode authentication and login](#port-conflict)
+* [Can I deploy trains-server on Kubernetes clusters?](#kubernetes)

-* [Configuring trains-server for sub-domains and load balancers](#sub-domains)
+* [Can I create a Helm Chart for trains-server Kubernetes deployment?](#helm)

+Configuration

-### Deploying trains-server on Kubernetes clusters <a name="kubernetes"></a>
+* [How do I configure trains-server for sub-domains and load balancers?](#sub-domains)

-**trains-server** supports Kubernetes. See [trains-server-k8s](https://github.com/allegroai/trains-server-k8s)
-which contains the YAML files describing the required services and detailed instructions for deploying
-**trains-server** to a Kubernetes clusters.
+* [Can I add web login authentication to trains-server?](#web-auth)

-### Creating a Helm Chart for trains-server Kubernetes deployment <a name="helm"></a>
+* [Can I modify the non-responsive experiment watchdog settings?](#watchdog)

-**trains-server** supports creating a Helm chart for Kubernetes deployment. See [trains-server-helm](https://github.com/allegroai/trains-server-helm)
-which you can use to create a Helm chart for **trains-server** and contains detailed instructions for deploying
-**trains-server** to a Kubernetes clusters using Helm.
+Troubleshooting

-### Running trains-server on Mac OS X <a name="mac-osx"></a>
+* [How do I fix Docker upgrade errors?](#common-docker-upgrade-errors)

-To install and configure **trains-server** on Mac OS X, follow the steps below.
+* [Why is web login authentication not working?](#port-conflict)

-1. Install [docker for OS X](https://docs.docker.com/docker-for-mac/install/).
+## Launching **trains-server**

-1. Configure [Docker](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode).
+### How do I launch trains-server on stand alone Linux Ubuntu systems? <a name="ubuntu"></a>

-        $ screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
-        $ sysctl -w vm.max_map_count=262144
+To launch **trains-server** on a standalone Ubuntu Linux system:
+
+1. Install [docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
+
+1. Install `docker-compose` using the following commands (for more detailed information, see the [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation):
+
+        sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
+        sudo chmod +x /usr/local/bin/docker-compose
+
+1. Remove the previous installation of **trains-server**.
+
+    **WARNING**: This clears all existing **Trains** databases.
+
+        sudo rm -R /opt/trains/

 1. Create local directories for the databases and storage.

-        $ sudo mkdir -p /opt/trains/data/elastic
-        $ sudo mkdir -p /opt/trains/data/mongo/db
-        $ sudo mkdir -p /opt/trains/data/mongo/configdb
-        $ sudo mkdir -p /opt/trains/data/redis
-        $ sudo mkdir -p /opt/trains/logs
-        $ sudo mkdir -p /opt/trains/config
-        $ sudo mkdir -p /opt/trains/data/fileserver
-        $ sudo chown -R $(whoami):staff /opt/trains
+        sudo mkdir -p /opt/trains/data/elastic
+        sudo mkdir -p /opt/trains/data/mongo/db
+        sudo mkdir -p /opt/trains/data/mongo/configdb
+        sudo mkdir -p /opt/trains/logs
+        sudo mkdir -p /opt/trains/config
+        sudo mkdir -p /opt/trains/data/fileserver
+        sudo chown -R 1000:1000 /opt/trains
+
+1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.
+
+        git clone https://github.com/allegroai/trains-server.git
+        cd trains-server
+
+1. Run `docker-compose`
+
+        /usr/local/bin/docker-compose -f docker-compose.yml up
+
+    Your server is now running on [http://localhost:8080](http://localhost:8080)
+    
+### How do I launch trains-server on macOS? <a name="mac-osx"></a>
+
+To launch **trains-server** on macOS:
+
+1. Install [docker for macOS](https://docs.docker.com/docker-for-mac/install/).
+
+1. Configure [Docker](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode).
+
+        screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
+        sysctl -w vm.max_map_count=262144
+
+1. Create local directories for the databases and storage.
+
+        sudo mkdir -p /opt/trains/data/elastic
+        sudo mkdir -p /opt/trains/data/mongo/db
+        sudo mkdir -p /opt/trains/data/mongo/configdb
+        sudo mkdir -p /opt/trains/data/redis
+        sudo mkdir -p /opt/trains/logs
+        sudo mkdir -p /opt/trains/config
+        sudo mkdir -p /opt/trains/data/fileserver
+        sudo chown -R $(whoami):staff /opt/trains

 1. Open the Docker app, select **Preferences**, and then on the **File Sharing** tab, add `/opt/trains`.

 1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.

-        $ git clone https://github.com/allegroai/trains-server.git
-        $ cd trains-server
+        git clone https://github.com/allegroai/trains-server.git
+        cd trains-server

-1. Run `docker-compose` with the unified docker image.
+1. Run `docker-compose` with the docker compose file.

-        $ docker-compose -f docker-compose-unified.yml up
+        docker-compose -f docker-compose.yml up

    Your server is now running on [http://localhost:8080](http://localhost:8080)

-### Running trains-server on Windows 10 <a name="docker_compose_win10"></a>
+### How do I launch trains-server on Windows 10? <a name="docker_compose_win10"></a>

 You can run **trains-server** on Windows 10 using Docker Desktop for Windows (see the Docker [System Requirements](https://docs.docker.com/docker-for-windows/install/#system-requirements)).

-To run **trains-server** on Windows 10, follow the steps below.
+To launch **trains-server** on Windows 10:

 1. Install the Docker Desktop for Windows application by either:

-    * Following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
-    * Running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).
+    * following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
+    * running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).

 1. Increase the memory allocation in Docker Desktop to `4GB`.

@@ -83,110 +128,46 @@ To run **trains-server** on Windows 10, follow the steps below.

 1. Create local directories for data and logs. Open PowerShell and execute the following commands:

-        mkdir c:\opt\trains\logs
-        mkdir c:\opt\trains\config
+        cd c:
        mkdir c:\opt\trains\data
-        mkdir c:\opt\trains\data\elastic
-        mkdir c:\opt\trains\data\redis
-        mkdir c:\opt\trains\data\fileserver
+        mkdir c:\opt\trains\logs

-1. Save the **trains-server** docker-compose YAML file [docker-compose-win10.yml](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml) as `c:\opt\trains\docker-compose.yml`.
+1. Download the **trains-server** docker-compose YAML file [docker-compose-win10.yml](https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml) as `c:\opt\trains\docker-compose.yml`.

 1. Run `docker-compose`. In PowerShell, execute the following commands:

-        cd c:\opt\trains\
-        docker-compose up
+        docker-compose -f c:\opt\trains\docker-compose.yml up

    Your server is now running on [http://localhost:8080](http://localhost:8080)

-### Installing trains-server on stand alone Linux Ubuntu systems <a name="ubuntu"></a>
+### How do I restart trains-server? <a name="restart"></a>

-To install **trains-server** on a stand alone Linux Ubuntu, follow the steps belows.
+Restart **trains-server** by first stopping the Docker containers and then restarting them.

-1. Install [docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
+   ```bash
+   docker-compose down
+   docker-compose -f docker-compose.yml up
+   ```
+   
+   **Note**: If you are using a different docker-compose YAML file, specify that file.

-1. Install `docker-compose` using the following commands (for more detailed information, see the [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation):
+## Kubernetes

-        sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
-        sudo chmod +x /usr/local/bin/docker-compose
+### Can I deploy trains-server on Kubernetes clusters? <a name="kubernetes"></a>

-1. Remove the previous installation of **trains-server**.
+**trains-server** supports Kubernetes. See [trains-server-k8s](https://github.com/allegroai/trains-server-k8s),
+which contains the YAML files describing the required services and detailed instructions for deploying
+**trains-server** to a Kubernetes cluster.

-    **WARNING**: This clears all existing **TRAINS** databases.
+### Can I create a Helm Chart for trains-server Kubernetes deployment? <a name="helm"></a>

-        $ sudo rm -R /opt/trains/
+**trains-server** supports deployment to Kubernetes with a Helm chart. See [trains-server-helm](https://github.com/allegroai/trains-server-helm),
+which you can use to create a Helm chart for **trains-server** and which contains detailed instructions for deploying
+**trains-server** to a Kubernetes cluster using Helm.

-1. Create local directories for the databases and storage.
+## Configuration

-        $ sudo mkdir -p /opt/trains/data/elastic
-        $ sudo mkdir -p /opt/trains/data/mongo/db
-        $ sudo mkdir -p /opt/trains/data/mongo/configdb
-        $ sudo mkdir -p /opt/trains/logs
-        $ sudo mkdir -p /opt/trains/config
-        $ sudo mkdir -p /opt/trains/data/fileserver
-        $ sudo chown -R 1000:1000 /opt/trains
-
-1. Clone the [trains-server](https://github.com/allegroai/trains-server) repository and change directories to the new **trains-server** directory.
-
-        $ git clone https://github.com/allegroai/trains-server.git
-        $ cd trains-server
-
-1. Run `docker-compose`
-
-        $ /usr/local/bin/docker-compose -f docker-compose.yml up
-
-    Your server is now running on [http://localhost:8080](http://localhost:8080)
-
-### Resolving port conflicts preventing fixed users mode authentication and login <a name="port-conflict"></a>
-
-A port conflict may occur between the **trains-server** MongoDB and Elastic instances and other
-instances running on your system. **trains-server** uses the following default ports which may be in conflict with other instances:
-
-* MongoDB port `27017`
-* Elastic port `9200`
-
-You can check for port conflicts in the logs in `/opt/trains/log`.
-
-If a port conflict occurs, first change the port in your **trains-server** `/opt/trains/server/config/default/hosts.conf` file to the new port and then
-run the `docker run` command with the `port` option specifying the new port to restart the **trains-server** instance.
-
-For example, to resolve a MongoDB port conflict change port `27017` to `27018`:
-
-1. Modify `/opt/trains/server/config/default/hosts.conf` changing the ports in the `mongo` section:
-
-        elastic {
-          events {
-            hosts: [{host: "127.0.0.1", port: 9200}]
-            args {
-              timeout: 60
-              dead_timeout: 10
-              max_retries: 5
-              retry_on_timeout: true
-            }
-            index_version: "1"
-          }
-        }
-
-        mongo {
-          backend {
-            host: "mongodb://127.0.0.1:27018/backend"
-          }
-          auth {
-            host: "mongodb://127.0.0.1:27018/auth"
-          }
-        }
-
-2. Start the **trains-server** MongoDB container using `--port 27018`.
-
-        sudo docker run -d --restart="always" --name="trains-mongo" -v /opt/trains/data/mongo/db:/data/db -v /opt/trains/data/mongo/configdb:/data/configdb --network="host" mongo:3.6.5 mongod --port 27018
-
-In a future version of **trains-server**, to start the API server, environment variables will be available to use instead of modifying the configuration file (instead of Step 1 above).
-The environment variables will be available to set different ports for both MongoDB and Elastic instances:
-
-* `MONGODB_SERVICE_PORT` (e.g., `MONGODB_SERVICE_PORT=27018`)
-* `ELASTIC_SERVICE_POST` (e.g., `ELASTIC_SERVICE_POST=9201`)
-
-### Configuring trains-server for sub-domains and load balancers <a name="sub-domains"></a>
+### How do I configure trains-server for sub-domains and load balancers? <a name="sub-domains"></a>

 You can configure **trains-server** for sub-domains and a load balancer.

@@ -222,3 +203,126 @@ For example, if your domain is `trains.mydomain.com` and your sub-domains are `a

 1. Run the Docker containers with our updated `docker run` commands (see [Launching Docker Containers](#https://github.com/allegroai/trains-server#launching-docker-containers)).

+### Can I add web login authentication to trains-server? <a name="web-auth"></a>
+
+By default, anyone can log in to the **trains-server** Web-App.
+You can configure the **trains-server** to allow only a specific set of users to access the system.
+
+To add web login authentication to **trains-server**:
+
+1. If you are not using the current **trains-server** version, then [upgrade](https://github.com/allegroai/trains-server#upgrade).
+
+1. In `/opt/trains/config/apiserver.conf`, add the `auth` section and in it specify the users, for example:
+
+    **Note**: A sample `apiserver.conf` configuration file is also available [here](https://github.com/allegroai/trains-server/blob/master/docs/apiserver.conf).
+
+        auth {
+            # Fixed users login credentials
+            # No other user will be able to login
+            fixed_users {
+                enabled: true
+                users: [
+                    {
+                        username: "jane"
+                        password: "12345678"
+                        name: "Jane Doe"
+                    },
+                    {
+                        username: "john"
+                        password: "12345678"
+                        name: "John Doe"
+                    },
+                ]
+            }
+        }
+
+1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).
+
+### Can I modify the experiment watchdog settings? <a name="watchdog"></a>
+
+The non-responsive experiment watchdog monitors experiments that were not updated for a specified period of time
+and marks them as `aborted`. The watchdog is always active. 
+
+You can modify the following settings for the watchdog:
+ 
+* the time threshold (in seconds) of experiment inactivity; the default value is 7200 seconds (2 hours)
+* the time interval (in seconds) between watchdog cycles
+
+To change the watchdog's settings:
+
+1. In `/opt/trains/config`, add the `services.conf` file and in it specify the watchdog settings, for example:
+
+    **Note**: A sample watchdog `services.conf` configuration file is also available [here](https://github.com/allegroai/trains-server/blob/master/docs/services.conf).
+
+        tasks {
+            non_responsive_tasks_watchdog {
+                # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
+                threshold_sec: 7200
+        
+                # Watchdog will sleep for this number of seconds after each cycle
+                watch_interval_sec: 900
+            }
+        }
+
+1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).
+
+## Troubleshooting
+
+### How do I fix Docker upgrade errors? <a name="common-docker-upgrade-errors"></a>
+
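+To resolve the Docker error "... The container name "/trains-???" is already in use by ...", try removing the existing containers. Note that the following command force-removes all containers on the host, running or stopped: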
+
+    docker rm -f $(docker ps -a -q)
+
+### Why is web login authentication not working?
+
+A port conflict between the **trains-server** MongoDB and / or Elastic instances, and other
+instances running on your system may prevent web login authentication
+from working correctly. 
+
+**trains-server** uses the following default ports which may be in conflict with other instances:
+
+* MongoDB port `27017`
+* Elastic port `9200`
+
+You can check for port conflicts in the logs in `/opt/trains/logs`.
+
+If a port conflict occurs, change the MongoDB and / or Elastic ports in the `docker-compose.yml`,
+and then run the Docker compose commands to restart the **trains-server** instance.
+
+To change the MongoDB and / or Elastic ports for **trains-server**:
+
+1. Edit the `docker-compose.yml` file.
+
+1. In the `services/trainsserver/environment` section, add the following environment variable(s):
+
+    * For MongoDB:
+    
+            MONGODB_SERVICE_PORT: <new-mongodb-port>
+        
+    * For Elastic:
+            
+            ELASTIC_SERVICE_PORT: <new-elasticsearch-port> 
+        
+    For example:
+    
+        MONGODB_SERVICE_PORT: 27018
+        ELASTIC_SERVICE_PORT: 9201
+            
+1. For MongoDB, in the `services/mongo/ports` section, expose the new MongoDB port:
+
+        <new-mongodb-port>:27017
+        
+    For example:
+    
+        27018:27017
+        
+1. For Elastic, in the `services/elasticsearch/ports` section, expose the new Elastic port:
+
+        <new-elasticsearch-port>:9200
+            
+    For example:
+
+        9201:9200
+    
+1. Restart **trains-server** (see the [Restarting trains-server](#restart) FAQ).
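The port-conflict FAQ entry above names MongoDB port `27017` and Elastic port `9200` as the defaults. A minimal sketch of one way to check, before launching **trains-server**, whether something on the host already listens on those ports. The helper name `port_in_use` and the `127.0.0.1` host are assumptions made for illustration; they are not part of trains-server.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Default ports listed in the FAQ entry; run this before starting the containers
    for name, port in (("MongoDB", 27017), ("Elastic", 9200)):
        state = "in use" if port_in_use(port) else "free"
        print(f"{name} port {port}: {state}")
```

If a port is reported as in use while **trains-server** is stopped, another service probably holds it, and remapping the port in `docker-compose.yml` as described above is likely needed.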
--- a/docs/install_aws.md
+++ b/docs/install_aws.md
@@ -1,32 +1,36 @@
-# **TRAINS-server**: AWS pre-installed images
+# Deploying **trains-server** on AWS

-In order to easily deploy **trains-server** on AWS, we created the following Amazon Machine Images (AMIs).
+To easily deploy **trains-server** on AWS, use one of our pre-built Amazon Machine Images (AMIs).  
+We provide AMIs per region for each released version of **trains-server**; see [Released versions](#released-versions) below.

-Service port numbers on these AMIs are:
- - Web: 8080
- - API: 8008
- - File Server: 8081
+Once the AMI is up and running, [configure the Trains client](https://github.com/allegroai/trains/blob/master/README.md#configuration) to use your **trains-server**.  
+The service port numbers on our **trains-server** AMIs:

-Persistent storage configuration:
- - MongoDB: /opt/trains/data/mongo/
- - ElasticSearch: /opt/trains/data/elastic/
- - File Server: /mnt/fileserver/
+- Web application: `8080`
+- API Server: `8008`
+- File Server: `8081`

-Instructions on launching a custom AMI from the EC2 console can be found [here](https://aws.amazon.com/premiumsupport/knowledge-center/launch-instance-custom-ami/)
-and a detailed version [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launching-instance.html).
+The persistent storage configuration:

-The minimum recommended instance type is **t3a.large**
+- MongoDB: `/opt/trains/data/mongo/`
+- ElasticSearch: `/opt/trains/data/elastic/`
+- File Server: `/mnt/fileserver/`
+
+For examples and use cases, check the [Trains usage examples](https://github.com/allegroai/trains/blob/master/docs/trains_examples.md).
+
+For instructions on launching a custom AMI from the EC2 console, see the [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/launch-instance-custom-ami/) or detailed instructions in the [AWS Documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launching-instance.html).
+
+The minimum recommended amount of RAM is 8GB. For example, the **t3.large** and **t3a.large** instance types meet this minimum.

 ## Upgrading

-In order to upgrade **trains-server** on an existing EC2 instance based on one of these AMIs, SSH into the instance and follow the [upgrade instructions](../README.md#upgrade) for **trains-server**.
+To upgrade **trains-server** on an existing EC2 instance based on one of these AMIs, SSH into the instance and follow the [upgrade instructions](../README.md#upgrade) for **trains-server**.

-### Upgrading AMI's to v0.12 
-**Including the automatically updated AMI**
+### Upgrading AMIs to v0.12 

-Version 0.12 introduced an additional REDIS docker to the trains-server setup.
+These instructions also apply to the automatically updated AMI. Version 0.12 adds a Redis Docker container to the **trains-server** setup.

-AMI upgrading instructions:
+To upgrade the AMI:

 1. SSH to the EC2 machine running one of the `Latest Version AMI's`
 2. Execute the following bash commands
@@ -44,47 +48,85 @@ AMI upgrading instructions:

 ## Released versions

-The following sections provide a list containing AMI Image ID per region for each released **trains-server** version.
+The following sections contain lists of AMI Image IDs, per region, for each released **trains-server** version.

-### Latest Version AMI <a name="autoupdate"></a>
-**For easier upgrades: The following AMI automatically update to the latest release every reboot**
+### Latest version AMI - v0.14.0 (auto update)<a name="autoupdate"></a>

-* **eu-north-1** : ami-055909c1b9471451d
-* **ap-south-1** : ami-0476123cc77226faf
-* **eu-west-3** : ami-01df7d35ab63cca70
-* **eu-west-2** : ami-00e8004c11fd0228e
-* **eu-west-1** : ami-04293fbba6d3acad1
-* **ap-northeast-2** : ami-004331f9c5eb13e94
-* **ap-northeast-1** : ami-08cc80e2049b30e61
-* **sa-east-1** : ami-06d814a0b6ffa3153
-* **ca-central-1** : ami-069210ff757e9c1b7
-* **ap-southeast-1** : ami-0d12cc70d6e9c0f39
-* **ap-southeast-2** : ami-0b4615aa76c055267
-* **eu-central-1** : ami-06537f431e52e4763
-* **us-east-2** : ami-0c3cfbcb8e72ecfc5
-* **us-west-1** : ami-0d83de031b83b6880
-* **us-west-2** : ami-06968633c4f7187c4
-* **us-east-1** : ami-07ff2f5f7ef99e8f6
+For easier upgrades, the following AMIs automatically update to the latest release on every reboot:

-### v0.12.1
-* **eu-north-1** : ami-003118a8103286d84
-* **ap-south-1** : ami-02dfe86baa48e096f
-* **eu-west-3** : ami-0cc1f01267d2a780d
-* **eu-west-2** : ami-0e4c8332e5ce09585
-* **eu-west-1** : ami-03459a2f0b0a3b1ab
-* **ap-northeast-2** : ami-08f6c2aed3a53f24c
-* **ap-northeast-1** : ami-0b798eab95a7c5435
-* **sa-east-1** : ami-0d3ee166c09f0d1b2
-* **ca-central-1** : ami-00a758c56bd63acd5
-* **ap-southeast-1** : ami-0be64d4988cd03fbb
-* **ap-southeast-2** : ami-02087310d43a63f31
-* **eu-central-1** : ami-097bbefeac0c74225
-* **us-east-2** : ami-07eda256712b90f4d
-* **us-west-1** : ami-02ef2b55cbd01c7df
-* **us-west-2** : ami-037c6176ef4735360
-* **us-east-1** : ami-08715c20c0e3f1c15
+* **eu-north-1** : ami-050c24cc0099e9512 
+* **ap-south-1** : ami-07bb33de49e319d73 
+* **eu-west-3** : ami-00ecdf092af972d24 
+* **eu-west-2** : ami-09ace28116ad33dd9 
+* **eu-west-1** : ami-01d85e00c7741d69b 
+* **ap-northeast-2** : ami-0ccc3d85996362545 
+* **ap-northeast-1** : ami-06abda05aa2407b1a 
+* **sa-east-1** : ami-0ce3597b116cfdd79 
+* **ca-central-1** : ami-0cb2d22a74007fa14 
+* **ap-southeast-1** : ami-06a9784d792a7c30f 
+* **ap-southeast-2** : ami-012ab6092f28f62b6 
+* **eu-central-1** : ami-04443efac619cac6d 
+* **us-east-2** : ami-05391549da2d5e38c 
+* **us-west-1** : ami-0444959077f5f7310 
+* **us-west-2** : ami-029b979c20d7f16f3 
+* **us-east-1** : ami-024ab496fe05a4b4d 
+
+### v0.14.0 (static update)
+* **eu-north-1** : ami-02de71586ec496e38 
+* **ap-south-1** : ami-074b03849b51852e5 
+* **eu-west-3** : ami-022c388835e0eeb03 
+* **eu-west-2** : ami-0a151c236c6b27707 
+* **eu-west-1** : ami-06de69b06b4e73312 
+* **ap-northeast-2** : ami-0ee821b72d9f669b1 
+* **ap-northeast-1** : ami-03687ae215e64e100 
+* **sa-east-1** : ami-01eb83364b7f667af 
+* **ca-central-1** : ami-02e9b35f9c90377e6 
+* **ap-southeast-1** : ami-0d3ab5ab0048fea51 
+* **ap-southeast-2** : ami-0bd39d908fe3a9e06 
+* **eu-central-1** : ami-0b8638701311b35c4 
+* **us-east-2** : ami-02ff039693fc3a614 
+* **us-west-1** : ami-08634f7dfb608a9a7 
+* **us-west-2** : ami-034d693ef742b9333 
+* **us-east-1** : ami-0b828b05c323dde7f
+
+### v0.13.0 (static update)
+* **eu-north-1** : ami-0d9c74a015e7510d8 
+* **ap-south-1** : ami-02acd6dd0659bb5c1 
+* **eu-west-3** : ami-0f0cc5cb6d9afd194 
+* **eu-west-2** : ami-0298fdc0860206ed9 
+* **eu-west-1** : ami-0cdc072e528401d5e 
+* **ap-northeast-2** : ami-0055579cc95b0e53e 
+* **ap-northeast-1** : ami-0ced7becb9b83b5d0 
+* **sa-east-1** : ami-033345d0f16a1b5e4 
+* **ca-central-1** : ami-06c63b05aed47ae67 
+* **ap-southeast-1** : ami-09f0355f367f30602 
+* **ap-southeast-2** : ami-0bd2314163ce0fba0 
+* **eu-central-1** : ami-05fbae957df63e366 
+* **us-east-2** : ami-050c51b5b4074d3fc 
+* **us-west-1** : ami-06ad513073d4e5a19 
+* **us-west-2** : ami-0c96e1361d1d4ca94 
+* **us-east-1** : ami-07b669040d1eea213 
+
+### v0.12.1 (static update)
+* **eu-north-1** : ami-003118a8103286d84 
+* **ap-south-1** : ami-02dfe86baa48e096f 
+* **eu-west-3** : ami-0cc1f01267d2a780d 
+* **eu-west-2** : ami-0e4c8332e5ce09585 
+* **eu-west-1** : ami-03459a2f0b0a3b1ab 
+* **ap-northeast-2** : ami-08f6c2aed3a53f24c 
+* **ap-northeast-1** : ami-0b798eab95a7c5435 
+* **sa-east-1** : ami-0d3ee166c09f0d1b2 
+* **ca-central-1** : ami-00a758c56bd63acd5 
+* **ap-southeast-1** : ami-0be64d4988cd03fbb 
+* **ap-southeast-2** : ami-02087310d43a63f31 
+* **eu-central-1** : ami-097bbefeac0c74225 
+* **us-east-2** : ami-07eda256712b90f4d 
+* **us-west-1** : ami-02ef2b55cbd01c7df 
+* **us-west-2** : ami-037c6176ef4735360 
+* **us-east-1** : ami-08715c20c0e3f1c15 
+
+### v0.12.0 (static update)

-### v0.12.0
 * **eu-north-1** : ami-03ff8ab48cd43e77e
 * **ap-south-1** : ami-079c1a41ff836487c
 * **eu-west-3** : ami-0121ef0398ae87ab0
@@ -102,7 +144,8 @@ The following sections provide a list containing AMI Image ID per region for eac
 * **us-west-2** : ami-0018d5a7e58966848
 * **us-east-1** : ami-08f24178fc14a84d2

-### v0.11.0
+### v0.11.0 (static update)
+
 * **eu-north-1** : ami-0cbe338f058018c97
 * **ap-south-1** : ami-06d72ff894f7a5e5d
 * **eu-west-3** : ami-00f2a45d67df2d2f3
@@ -120,7 +163,8 @@ The following sections provide a list containing AMI Image ID per region for eac
 * **us-west-2** : ami-0e384b6f78bf96ebe
 * **us-east-1** : ami-0a7b46f907d5d9c4a

-### v0.10.1
+### v0.10.1 (static update)
+
 * **eu-north-1** : ami-09937ec4d18350c32
 * **ap-south-1** : ami-089d6ba7541ec4c7f
 * **eu-west-3** : ami-0accb1a94bdd5c5c1
@@ -138,7 +182,8 @@ The following sections provide a list containing AMI Image ID per region for eac
 * **us-west-2** : ami-0d1cb8ba7de246ff0
 * **us-east-1** : ami-049ccba6abdb40cba

-### v0.10.0
+### v0.10.0 (static update)
+
 * **eu-north-1** : ami-05ba33c763877e54e
 * **ap-south-1** : ami-0529eec569161cae5
 * **eu-west-3** : ami-03cb9396f63e26ff6
@@ -157,7 +202,7 @@ The following sections provide a list containing AMI Image ID per region for eac
 * **us-west-2** : ami-04a522ecb2250fb44
 * **us-east-1** : ami-0a66ddbd50959f91e

-### v0.9.0
+### v0.9.0 (static update)

 * **us-east-1** : ami-0991ad536ecbacdac
 * **eu-north-1** : ami-07cbcdff501b14afe
@@ -175,3 +220,4 @@ The following sections provide a list containing AMI Image ID per region for eac
 * **us-east-2** : ami-03b01914b07428488
 * **us-west-1** : ami-0cf4768e9d47ed076
 * **us-west-2** : ami-0b145f37da31eb9fb
+
--- a/docs/install_linux_mac.md
+++ b/docs/install_linux_mac.md
@@ -0,0 +1,97 @@
+# Launching the **trains-server** Docker in Linux or macOS
+
+For Linux or macOS, use our pre-built Docker image for easy deployment. The latest Docker images can be found [here](https://hub.docker.com/r/allegroai/trains). 
+
+For Linux users:
+
+* You must be logged in as a user with sudo privileges.
+* Use `bash` for all command-line instructions in this installation.
+
+To launch **trains-server** on Linux or macOS:
+
+1. Install Docker.
+
+    * Linux - see [Docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
+    * macOS - see [Docker for macOS](https://docs.docker.com/docker-for-mac/install/).
+
+1. Verify the Docker CE installation. Execute the command:
+
+        sudo docker run hello-world
+   
+    The expected output is:
+
+        Hello from Docker!
+        This message shows that your installation appears to be working correctly.
+        To generate this message, Docker took the following steps:
+        
+        1. The Docker client contacted the Docker daemon.
+        2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
+        3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
+        4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.
+
+1. For Linux only, install `docker-compose`. Execute the following commands (for more information, see [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation): 
+
+        sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
+        sudo chmod +x /usr/local/bin/docker-compose
+
+1. Increase `vm.max_map_count` for the ElasticSearch Docker container.
+
+    Linux:
+
+        echo "vm.max_map_count=262144" > /tmp/99-trains.conf
+        sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
+        sudo sysctl -w vm.max_map_count=262144
+        sudo service docker restart
+        
+    macOS:
+    
+        screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
+        sysctl -w vm.max_map_count=262144
+        
+
+1. Remove any previous installation of **trains-server**.
+
+    **WARNING**: This clears all existing **Trains** databases.
+
+        sudo rm -R /opt/trains/
+
+1. Create local directories for the databases and storage.
+
+        sudo mkdir -p /opt/trains/data/elastic
+        sudo mkdir -p /opt/trains/data/mongo/db
+        sudo mkdir -p /opt/trains/data/mongo/configdb
+        sudo mkdir -p /opt/trains/data/redis
+        sudo mkdir -p /opt/trains/logs
+        sudo mkdir -p /opt/trains/config
+        sudo mkdir -p /opt/trains/data/fileserver
+        
+1. For macOS only, open the Docker app, select **Preferences**, and then on the **File Sharing** tab, add `/opt/trains`.
+          
+1. Grant the Docker containers access to the directories.
+
+    Linux:
+
+        sudo chown -R 1000:1000 /opt/trains
+        
+    macOS:
+    
+        sudo chown -R $(whoami):staff /opt/trains
+
+1. Download the **trains-server** docker-compose YAML file.
+
+        cd /opt/trains
+        curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o docker-compose.yml
+    
+1. Run `docker-compose` with the downloaded configuration file.
+
+        sudo docker-compose -f docker-compose.yml up
+   
+    Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:
+
+    * Web server on port `8080`
+    * API server on port `8008`
+    * File server on port `8081`
+
+## Next Step
+
+Configure the [Trains client for trains-server](https://github.com/allegroai/trains/blob/master/README.md#configuration).
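The Linux steps in `install_linux_mac.md` raise `vm.max_map_count` to `262144`. A minimal sketch, assuming a Linux host where the kernel exposes the current value at `/proc/sys/vm/max_map_count`, of how that setting could be verified. The function names are hypothetical, and on macOS the file does not exist, so the reader returns `None`.

```python
from pathlib import Path
from typing import Optional

REQUIRED_MAX_MAP_COUNT = 262144  # value set in the install steps


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> Optional[int]:
    """Return the current vm.max_map_count, or None if the file is missing."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return int(proc_file.read_text().strip())


def max_map_count_ok(current: Optional[int], required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True when the current value is known and at least the required value."""
    return current is not None and current >= required


if __name__ == "__main__":
    value = read_max_map_count()
    print(f"vm.max_map_count={value}, sufficient={max_map_count_ok(value)}")
```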
--- a/docs/install_win.md
+++ b/docs/install_win.md
@@ -0,0 +1,50 @@
+# Launching the **trains-server** Docker in Windows 10
+
+For Windows, we recommend launching our pre-built Docker image on a Linux virtual machine. 
+However, you can launch **trains-server** on Windows 10 using Docker Desktop for Windows (see the Docker [System Requirements](https://docs.docker.com/docker-for-windows/install/#system-requirements)).
+
+To launch **trains-server** on Windows 10:
+
+1. Install the Docker Desktop for Windows application by either:
+
+    * Following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
+    * Running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).
+
+1. Increase the memory allocation in Docker Desktop to `4GB`.
+
+    1. In your Windows notification area (system tray), right click the Docker icon.
+    
+    1. Click *Settings*, *Advanced*, and then set the memory to at least `4096`. 
+    
+    1. Click *Apply*.
+    
+1. Remove any previous installation of **trains-server**.
+
+    **WARNING**: This clears all existing **Trains** databases.
+
+        rmdir c:\opt\trains /s
+
+1. Create local directories for data and logs. Open PowerShell and execute the following commands:
+
+        cd c:
+        mkdir c:\opt\trains\data
+        mkdir c:\opt\trains\logs
+
+1. Save the **trains-server** docker-compose YAML file.
+ 
+        cd c:\opt\trains
+        curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose-win10.yml -o docker-compose-win10.yml 
+ 
+1. Run `docker-compose`. In PowerShell, execute the following commands:
+
+        docker-compose -f docker-compose-win10.yml up
+   
+    Your server is now running on [http://localhost:8080](http://localhost:8080) and the following ports are available:
+
+    * Web server on port `8080`
+    * API server on port `8008`
+    * File server on port `8081`
+
+## Next Step
+
+Configure the [Trains client for trains-server](https://github.com/allegroai/trains/blob/master/README.md#configuration).
--- a/fileserver/fileserver.py
+++ b/fileserver/fileserver.py
@@ -14,6 +14,9 @@ app = Flask(__name__)
 CORS(app, **config.get("fileserver.cors"))
 Compress(app)

+if os.environ.get("TRAINS_UPLOAD_FOLDER"):
+    app.config["UPLOAD_FOLDER"] = os.environ.get("TRAINS_UPLOAD_FOLDER")
+

 @app.route("/", methods=["POST"])
 def upload():
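The hunk above lets the `TRAINS_UPLOAD_FOLDER` environment variable override the file server's `UPLOAD_FOLDER` setting. A minimal sketch of the same pattern with a plain dictionary standing in for the Flask `app.config` object. The helper name and the `/mnt/fileserver` default used in the usage lines are assumptions for illustration only.

```python
import os
from typing import Mapping, MutableMapping


def apply_upload_folder_override(
    app_config: MutableMapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> MutableMapping[str, str]:
    """Overwrite UPLOAD_FOLDER only when TRAINS_UPLOAD_FOLDER is set and non-empty."""
    folder = environ.get("TRAINS_UPLOAD_FOLDER")
    if folder:  # an unset or empty variable leaves the configured value untouched
        app_config["UPLOAD_FOLDER"] = folder
    return app_config


if __name__ == "__main__":
    config = {"UPLOAD_FOLDER": "/mnt/fileserver"}  # hypothetical configured default
    print(apply_upload_folder_override(dict(config), {}))
    print(apply_upload_folder_override(dict(config), {"TRAINS_UPLOAD_FOLDER": "/data/uploads"}))
```

As in the diff, the truthiness check means an empty variable behaves the same as an unset one.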
--- a/server/api_version.py
+++ b/server/api_version.py
@@ -0,0 +1 @@
+__version__ = "2.7.0"
--- a/server/apierrors/__init__.py
+++ b/server/apierrors/__init__.py
@@ -89,6 +89,8 @@ _error_codes = {
        1003: ('worker_registered', 'worker is already registered'),
        1004: ('worker_not_registered', 'worker is not registered'),
        1005: ('worker_stats_not_found', 'worker stats not found'),
+
+        1104: ('invalid_scroll_id', 'Invalid scroll id'),
    },

    (401, 'unauthorized'): {
@@ -105,7 +107,6 @@ _error_codes = {

    (403, 'forbidden'): {
        10: ('routing_error', 'forbidden (routing error)'),
-        11: ('missing_routing_header', 'forbidden (missing routing header)'),
        12: ('blocked_internal_endpoint', 'forbidden (blocked internal endpoint)'),
        20: ('role_not_allowed', 'forbidden (not allowed for role)'),
        21: ('no_write_permission', 'forbidden (modification not allowed)'),
@@ -121,6 +122,7 @@ _error_codes = {
        100: ('data_error', 'general data error'),
        101: ('inconsistent_data', 'inconsistent data encountered in document'),
        102: ('database_unavailable', 'database is temporarily unavailable'),
+        110: ('update_failed', 'update failed'),

        # Index-related issues
        201: ('missing_index', 'missing internal index'),
--- a/server/apimodels/__init__.py
+++ b/server/apimodels/__init__.py
@@ -5,12 +5,12 @@ from typing import Union, Type, Iterable

 import jsonmodels.errors
 import six
-import validators
 from jsonmodels import fields
 from jsonmodels.fields import _LazyType, NotSet
 from jsonmodels.models import Base as ModelBase
 from jsonmodels.validators import Enum as EnumValidator
 from luqum.parser import parser, ParseError
+from validators import email as email_validator, domain as domain_validator

 from apierrors import errors

@@ -66,9 +66,7 @@ class DictField(fields.BaseField):
            value_types = tuple()

        return tuple(
-            _LazyType(type_)
-            if isinstance(type_, six.string_types)
-            else type_
+            _LazyType(type_) if isinstance(type_, six.string_types) else type_
            for type_ in value_types
        )

@@ -78,6 +76,9 @@ class DictField(fields.BaseField):
        if not self.value_types:
            return

+        if not value:
+            return
+
        for item in value.values():
            self.validate_single_value(item)

@@ -104,7 +105,7 @@ class IntField(fields.IntField):


 def validate_lucene_query(value):
-    if value == '':
+    if value == "":
        return
    try:
        parser.parse(value)
@@ -122,6 +123,7 @@ class LuceneQueryField(fields.StringField):

 class NullableEnumValidator(EnumValidator):
    """Validator for enums that allows a None value."""
+
    def validate(self, value):
        if value is not None:
            super(NullableEnumValidator, self).validate(value)
@@ -150,10 +152,6 @@ class EnumField(fields.StringField):


 class ActualEnumField(fields.StringField):
-    @property
-    def types(self):
-        return (self.__enum,)
-
    def __init__(
        self,
        enum_class: Type[Enum],
@@ -164,12 +162,13 @@ class ActualEnumField(fields.StringField):
        **kwargs
    ):
        self.__enum = enum_class
+        self.types = (enum_class,)
        # noinspection PyTypeChecker
        choices = list(enum_class)
        validator_cls = EnumValidator if required else NullableEnumValidator
        validators = [*(validators or []), validator_cls(*choices)]
        super().__init__(
-            default=default and self.parse_value(default),
+            default=self.parse_value(default) if default else NotSet,
            *args,
            required=required,
            validators=validators,
@@ -194,7 +193,7 @@ class EmailField(fields.StringField):
        super().validate(value)
        if value is None:
            return
-        if validators.email(value) is not True:
+        if email_validator(value) is not True:
            raise errors.bad_request.InvalidEmailAddress()


@@ -203,7 +202,7 @@ class DomainField(fields.StringField):
        super().validate(value)
        if value is None:
            return
-        if validators.domain(value) is not True:
+        if domain_validator(value) is not True:
            raise errors.bad_request.InvalidDomainName()


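The `ActualEnumField` change above replaces `default and self.parse_value(default)` with `self.parse_value(default) if default else NotSet`. A minimal sketch of the difference, using a local sentinel as a stand-in for `jsonmodels.fields.NotSet` and `str.upper` as a stand-in for `parse_value`. Both helper functions are hypothetical and exist only to contrast the two expressions.

```python
NotSet = object()  # local stand-in for jsonmodels.fields.NotSet


def old_style_default(default, parse):
    """Old expression: a falsy default (such as None) is passed through unchanged."""
    return default and parse(default)


def new_style_default(default, parse):
    """New expression: a falsy default becomes the explicit NotSet sentinel."""
    return parse(default) if default else NotSet


if __name__ == "__main__":
    print(old_style_default(None, str.upper))            # None is handed on as the default
    print(new_style_default(None, str.upper) is NotSet)  # sentinel marks "no default"
    print(new_style_default("a", str.upper))             # real defaults are still parsed
```

With the old expression a field without a default ended up with `None` as its default; the new one reports that no default was set.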
--- a/server/apimodels/base.py
+++ b/server/apimodels/base.py
@@ -58,3 +58,7 @@ class UpdateResponse(models.Base):
 class PagedRequest(models.Base):
    page = fields.IntField()
    page_size = fields.IntField()
+
+
+class IdResponse(models.Base):
+    id = fields.StringField(required=True)
--- a/server/apimodels/events.py
+++ b/server/apimodels/events.py
@@ -1,9 +1,12 @@
 from typing import Sequence

-from jsonmodels.fields import StringField
+from jsonmodels import validators
+from jsonmodels.fields import StringField, BoolField
 from jsonmodels.models import Base
+from jsonmodels.validators import Length

 from apimodels import ListField, IntField, ActualEnumField
+from bll.event.event_metrics import EventType
 from bll.event.scalar_key import ScalarKeyEnum


@@ -17,4 +20,44 @@ class ScalarMetricsIterHistogramRequest(HistogramRequestBase):


 class MultiTaskScalarMetricsIterHistogramRequest(HistogramRequestBase):
-    tasks: Sequence[str] = ListField(items_types=str)
+    tasks: Sequence[str] = ListField(
+        items_types=str, validators=[Length(minimum_value=1)]
+    )
+
+
+class TaskMetric(Base):
+    task: str = StringField(required=True)
+    metric: str = StringField(required=True)
+
+
+class DebugImagesRequest(Base):
+    metrics: Sequence[TaskMetric] = ListField(
+        items_types=TaskMetric, validators=[Length(minimum_value=1)]
+    )
+    iters: int = IntField(default=1, validators=validators.Min(1))
+    navigate_earlier: bool = BoolField(default=True)
+    refresh: bool = BoolField(default=False)
+    scroll_id: str = StringField()
+
+
+class IterationEvents(Base):
+    iter: int = IntField()
+    events: Sequence[dict] = ListField(items_types=dict)
+
+
+class MetricEvents(Base):
+    task: str = StringField()
+    metric: str = StringField()
+    iterations: Sequence[IterationEvents] = ListField(items_types=IterationEvents)
+
+
+class DebugImageResponse(Base):
+    metrics: Sequence[MetricEvents] = ListField(items_types=MetricEvents)
+    scroll_id: str = StringField()
+
+
+class TaskMetricsRequest(Base):
+    tasks: Sequence[str] = ListField(
+        items_types=str, validators=[Length(minimum_value=1)]
+    )
+    event_type: EventType = ActualEnumField(EventType, required=True)
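Note on the events.py models above: the constraints that `DebugImagesRequest` declares can be illustrated without jsonmodels. This is a rough sketch only. `check_debug_images_request` is a hypothetical helper, not part of the server. It assumes that the `Length(minimum_value=1)` validator rejects an empty `metrics` list and that `validators.Min(1)` rejects `iters` below 1. The defaults are copied from the field declarations.

```python
from typing import Any, Dict, List


def check_debug_images_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in that mirrors the DebugImagesRequest constraints."""
    metrics: List[dict] = body.get("metrics") or []
    if len(metrics) < 1:
        # Mirrors Length(minimum_value=1) on the metrics list (assumed behaviour).
        raise ValueError("metrics must contain at least one task/metric pair")
    for entry in metrics:
        # TaskMetric declares both fields with required=True.
        if entry.get("task") is None or entry.get("metric") is None:
            raise ValueError("each metrics entry needs 'task' and 'metric'")

    iters = body.get("iters", 1)
    if iters < 1:
        # Mirrors validators.Min(1) on iters (assumed behaviour).
        raise ValueError("iters must be >= 1")

    return {
        "metrics": metrics,
        "iters": iters,
        "navigate_earlier": body.get("navigate_earlier", True),
        "refresh": body.get("refresh", False),
        "scroll_id": body.get("scroll_id"),
    }
```

A first call would normally omit `scroll_id`; later calls pass back the `scroll_id` returned in the response together with the same set of task/metric pairs.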
--- a/server/apimodels/models.py
+++ b/server/apimodels/models.py
@@ -9,7 +9,7 @@ from apimodels.tasks import PublishResponse as TaskPublishResponse
 class CreateModelRequest(models.Base):
    name = fields.StringField(required=True)
    uri = fields.StringField(required=True)
-    labels = DictField(value_types=string_types+(int,), required=True)
+    labels = DictField(value_types=string_types+(int,))
    tags = ListField(items_types=string_types)
    system_tags = ListField(items_types=string_types)
    comment = fields.StringField()
--- a/server/apimodels/server.py
+++ b/server/apimodels/server.py
@@ -12,3 +12,4 @@ class ReportStatsOptionResponse(Base):
    enabled_time = DateTimeField(nullable=True)
    enabled_version = StringField(nullable=True)
    enabled_user = StringField(nullable=True)
+    current_version = StringField()
--- a/server/apimodels/tasks.py
+++ b/server/apimodels/tasks.py
@@ -1,6 +1,6 @@
 import six
 from jsonmodels import models
-from jsonmodels.fields import StringField, BoolField, IntField
+from jsonmodels.fields import StringField, BoolField, IntField, EmbeddedField
 from jsonmodels.validators import Enum

 from apimodels import DictField, ListField
@@ -9,6 +9,24 @@ from database.model.task.task import TaskType
 from database.utils import get_options


+class ArtifactTypeData(models.Base):
+    preview = StringField()
+    content_type = StringField()
+    data_hash = StringField()
+
+
+class Artifact(models.Base):
+    key = StringField(required=True)
+    type = StringField(required=True)
+    mode = StringField(validators=Enum("input", "output"), default="output")
+    uri = StringField()
+    hash = StringField()
+    content_size = IntField()
+    timestamp = IntField()
+    type_data = EmbeddedField(ArtifactTypeData)
+    display_data = ListField([list])
+
+
 class StartedResponse(UpdateResponse):
    started = IntField()

@@ -72,3 +90,22 @@ class CreateRequest(TaskData):

 class PingRequest(TaskRequest):
    pass
+
+
+class CloneRequest(TaskRequest):
+    new_task_name = StringField()
+    new_task_comment = StringField()
+    new_task_tags = ListField([str])
+    new_task_system_tags = ListField([str])
+    new_task_parent = StringField()
+    new_task_project = StringField()
+    execution_overrides = DictField()
+
+
+class AddOrUpdateArtifactsRequest(TaskRequest):
+    artifacts = ListField([Artifact], required=True)
+
+
+class AddOrUpdateArtifactsResponse(models.Base):
+    added = ListField([str])
+    updated = ListField([str])
--- a/server/bll/event/debug_images_iterator.py
+++ b/server/bll/event/debug_images_iterator.py
@@ -0,0 +1,464 @@
+from collections import defaultdict
+from concurrent.futures.thread import ThreadPoolExecutor
+from functools import partial
+from itertools import chain
+from operator import attrgetter, itemgetter
+
+import attr
+import dpath
+from boltons.iterutils import bucketize
+from elasticsearch import Elasticsearch
+from redis import StrictRedis
+from typing import Sequence, Tuple, Optional, Mapping
+
+import database
+from apierrors import errors
+from bll.redis_cache_manager import RedisCacheManager
+from bll.event.event_metrics import EventMetrics
+from config import config
+from database.errors import translate_errors_context
+from jsonmodels.models import Base
+from jsonmodels.fields import StringField, ListField, IntField
+
+from database.model.task.metrics import MetricEventStats
+from database.model.task.task import Task
+from timing_context import TimingContext
+from utilities.json import loads, dumps
+
+
+class VariantScrollState(Base):
+    name: str = StringField(required=True)
+    recycle_url_marker: str = StringField()
+    last_invalid_iteration: int = IntField()
+
+
+class MetricScrollState(Base):
+    task: str = StringField(required=True)
+    name: str = StringField(required=True)
+    last_min_iter: Optional[int] = IntField()
+    last_max_iter: Optional[int] = IntField()
+    timestamp: int = IntField(default=0)
+    variants: Sequence[VariantScrollState] = ListField([VariantScrollState])
+
+    def reset(self):
+        """Reset the scrolling state for the metric"""
+        self.last_min_iter = self.last_max_iter = None
+
+
+class DebugImageEventsScrollState(Base):
+    id: str = StringField(required=True)
+    metrics: Sequence[MetricScrollState] = ListField([MetricScrollState])
+
+    def to_json(self):
+        return dumps(self.to_struct())
+
+    @classmethod
+    def from_json(cls, s):
+        return cls(**loads(s))
+
+
+@attr.s(auto_attribs=True)
+class DebugImagesResult(object):
+    metric_events: Sequence[tuple] = attr.ib(factory=list)
+    next_scroll_id: str = None
+
+
+class DebugImagesIterator:
+    EVENT_TYPE = "training_debug_image"
+    STATE_EXPIRATION_SECONDS = 3600
+
+    @property
+    def _max_workers(self):
+        return config.get("services.events.max_metrics_concurrency", 4)
+
+    def __init__(self, redis: StrictRedis, es: Elasticsearch):
+        self.es = es
+        self.cache_manager = RedisCacheManager(
+            state_class=DebugImageEventsScrollState,
+            redis=redis,
+            expiration_interval=self.STATE_EXPIRATION_SECONDS,
+        )
+
+    def get_task_events(
+        self,
+        company_id: str,
+        metrics: Sequence[Tuple[str, str]],
+        iter_count: int,
+        navigate_earlier: bool = True,
+        refresh: bool = False,
+        state_id: str = None,
+    ) -> DebugImagesResult:
+        es_index = EventMetrics.get_index_name(company_id, self.EVENT_TYPE)
+        if not self.es.indices.exists(es_index):
+            return DebugImagesResult()
+
+        unique_metrics = set(metrics)
+        state = self.cache_manager.get_state(state_id) if state_id else None
+        if not state:
+            state = DebugImageEventsScrollState(
+                id=database.utils.id(),
+                metrics=self._init_metric_states(es_index, list(unique_metrics)),
+            )
+        else:
+            state_metrics = set((m.task, m.name) for m in state.metrics)
+            if state_metrics != unique_metrics:
+                raise errors.bad_request.InvalidScrollId(
+                    "while getting debug images events", scroll_id=state_id
+                )
+
+            if refresh:
+                self._reinit_outdated_metric_states(company_id, es_index, state)
+                for metric_state in state.metrics:
+                    metric_state.reset()
+
+        res = DebugImagesResult(next_scroll_id=state.id)
+        try:
+            with ThreadPoolExecutor(self._max_workers) as pool:
+                res.metric_events = list(
+                    pool.map(
+                        partial(
+                            self._get_task_metric_events,
+                            es_index=es_index,
+                            iter_count=iter_count,
+                            navigate_earlier=navigate_earlier,
+                        ),
+                        state.metrics,
+                    )
+                )
+        finally:
+            self.cache_manager.set_state(state)
+
+        return res
+
+    def _reinit_outdated_metric_states(
+        self, company_id, es_index, state: DebugImageEventsScrollState
+    ):
+        """
+        Determines the metrics for which new debug image events were added
+        since their states were initialized and reinits these states
+        """
+        task_ids = set(metric.task for metric in state.metrics)
+        tasks = Task.objects(id__in=list(task_ids), company=company_id).only(
+            "id", "metric_stats"
+        )
+
+        def get_last_update_times_for_task_metrics(task: Task) -> Sequence[Tuple]:
+            """For metrics that reported debug image events get tuples of task_id/metric_name and last update times"""
+            metric_stats: Mapping[str, MetricEventStats] = task.metric_stats
+            if not metric_stats:
+                return []
+
+            return [
+                (
+                    (task.id, stats.metric),
+                    stats.event_stats_by_type[self.EVENT_TYPE].last_update,
+                )
+                for stats in metric_stats.values()
+                if self.EVENT_TYPE in stats.event_stats_by_type
+            ]
+
+        update_times = dict(
+            chain.from_iterable(
+                get_last_update_times_for_task_metrics(task) for task in tasks
+            )
+        )
+        outdated_metrics = [
+            metric
+            for metric in state.metrics
+            if (metric.task, metric.name) in update_times
+            and update_times[metric.task, metric.name] > metric.timestamp
+        ]
+        state.metrics = [
+            *(metric for metric in state.metrics if metric not in outdated_metrics),
+            *(
+                self._init_metric_states(
+                    es_index,
+                    [(metric.task, metric.name) for metric in outdated_metrics],
+                )
+            ),
+        ]
+
+    def _init_metric_states(
+        self, es_index, metrics: Sequence[Tuple[str, str]]
+    ) -> Sequence[MetricScrollState]:
+        """
+        Return initialized metric scroll states for the requested task metrics
+        """
+        tasks = defaultdict(list)
+        for (task, metric) in metrics:
+            tasks[task].append(metric)
+
+        with ThreadPoolExecutor(self._max_workers) as pool:
+            return list(
+                chain.from_iterable(
+                    pool.map(
+                        partial(self._init_metric_states_for_task, es_index=es_index),
+                        tasks.items(),
+                    )
+                )
+            )
+
+    def _init_metric_states_for_task(
+        self, task_metrics: Tuple[str, Sequence[str]], es_index
+    ) -> Sequence[MetricScrollState]:
+        """
+        Return metric scroll states for the task filled with the variant states
+        for the variants that reported any debug images
+        """
+        task, metrics = task_metrics
+        es_req: dict = {
+            "size": 0,
+            "query": {
+                "bool": {
+                    "must": [{"term": {"task": task}}, {"terms": {"metric": metrics}}]
+                }
+            },
+            "aggs": {
+                "metrics": {
+                    "terms": {
+                        "field": "metric",
+                        "size": EventMetrics.MAX_METRICS_COUNT,
+                    },
+                    "aggs": {
+                        "last_event_timestamp": {"max": {"field": "timestamp"}},
+                        "variants": {
+                            "terms": {
+                                "field": "variant",
+                                "size": EventMetrics.MAX_VARIANTS_COUNT,
+                            },
+                            "aggs": {
+                                "urls": {
+                                    "terms": {
+                                        "field": "url",
+                                        "order": {"max_iter": "desc"},
+                                        "size": 1,  # we need only one url from the most recent iteration
+                                    },
+                                    "aggs": {
+                                        "max_iter": {"max": {"field": "iter"}},
+                                        "iters": {
+                                            "top_hits": {
+                                                "sort": {"iter": {"order": "desc"}},
+                                                "size": 2,  # need two last iterations so that we can take
+                                                # the second one as invalid
+                                                "_source": "iter",
+                                            }
+                                        },
+                                    },
+                                }
+                            },
+                        },
+                    },
+                }
+            },
+        }
+
+        with translate_errors_context(), TimingContext("es", "_init_metric_states"):
+            es_res = self.es.search(index=es_index, body=es_req, routing=task)
+        if "aggregations" not in es_res:
+            return []
+
+        def init_variant_scroll_state(variant: dict):
+            """
+            Return new variant scroll state for the passed variant bucket
+            If the image urls get recycled then fill the last_invalid_iteration field
+            """
+            state = VariantScrollState(name=variant["key"])
+            top_iter_url = dpath.get(variant, "urls/buckets")[0]
+            iters = dpath.get(top_iter_url, "iters/hits/hits")
+            if len(iters) > 1:
+                state.last_invalid_iteration = dpath.get(iters[1], "_source/iter")
+            return state
+
+        return [
+            MetricScrollState(
+                task=task,
+                name=metric["key"],
+                variants=[
+                    init_variant_scroll_state(variant)
+                    for variant in dpath.get(metric, "variants/buckets")
+                ],
+                timestamp=dpath.get(metric, "last_event_timestamp/value"),
+            )
+            for metric in dpath.get(es_res, "aggregations/metrics/buckets")
+        ]
+
+    def _get_task_metric_events(
+        self,
+        metric: MetricScrollState,
+        es_index: str,
+        iter_count: int,
+        navigate_earlier: bool,
+    ) -> Tuple:
+        """
+        Return task metric events grouped by iterations
+        Update metric scroll state
+        """
+        if metric.last_max_iter is None:
+            # the first fetch is always from the latest iteration to the earlier ones
+            navigate_earlier = True
+
+        must_conditions = [
+            {"term": {"task": metric.task}},
+            {"term": {"metric": metric.name}},
+        ]
+        must_not_conditions = []
+
+        range_condition = None
+        if navigate_earlier and metric.last_min_iter is not None:
+            range_condition = {"lt": metric.last_min_iter}
+        elif not navigate_earlier and metric.last_max_iter is not None:
+            range_condition = {"gt": metric.last_max_iter}
+        if range_condition:
+            must_conditions.append({"range": {"iter": range_condition}})
+
+        if navigate_earlier:
+            """
+            When navigating to earlier iterations consider only
+            variants whose invalid iterations border is lower than
+            our starting iteration. For these variants make sure
+            that only events from the valid iterations are returned
+            """
+            if not metric.last_min_iter:
+                variants = metric.variants
+            else:
+                variants = list(
+                    v
+                    for v in metric.variants
+                    if v.last_invalid_iteration is None
+                    or v.last_invalid_iteration < metric.last_min_iter
+                )
+                if not variants:
+                    return metric.task, metric.name, []
+                must_conditions.append(
+                    {"terms": {"variant": list(v.name for v in variants)}}
+                )
+        else:
+            """
+            When navigating to later iterations all variants may be relevant.
+            For the variants whose invalid border is higher than our starting
+            iteration make sure that only events from valid iterations are returned
+            """
+            variants = list(
+                v
+                for v in metric.variants
+                if v.last_invalid_iteration is not None
+                and v.last_invalid_iteration > metric.last_max_iter
+            )
+
+        variants_conditions = [
+            {
+                "bool": {
+                    "must": [
+                        {"term": {"variant": v.name}},
+                        {"range": {"iter": {"lte": v.last_invalid_iteration}}},
+                    ]
+                }
+            }
+            for v in variants
+            if v.last_invalid_iteration is not None
+        ]
+        if variants_conditions:
+            must_not_conditions.append({"bool": {"should": variants_conditions}})
+
+        es_req = {
+            "size": 0,
+            "query": {
+                "bool": {"must": must_conditions, "must_not": must_not_conditions}
+            },
+            "aggs": {
+                "iters": {
+                    "terms": {
+                        "field": "iter",
+                        "size": iter_count,
+                        "order": {"_term": "desc" if navigate_earlier else "asc"},
+                    },
+                    "aggs": {
+                        "variants": {
+                            "terms": {
+                                "field": "variant",
+                                "size": EventMetrics.MAX_VARIANTS_COUNT,
+                            },
+                            "aggs": {
+                                "events": {
+                                    "top_hits": {"sort": {"url": {"order": "desc"}}}
+                                }
+                            },
+                        }
+                    },
+                }
+            },
+        }
+        with translate_errors_context(), TimingContext("es", "get_debug_image_events"):
+            es_res = self.es.search(index=es_index, body=es_req, routing=metric.task)
+        if "aggregations" not in es_res:
+            return metric.task, metric.name, []
+
+        def get_iteration_events(variant_buckets: Sequence[dict]) -> Sequence:
+            return [
+                ev["_source"]
+                for v in variant_buckets
+                for ev in dpath.get(v, "events/hits/hits")
+            ]
+
+        iterations = [
+            {
+                "iter": it["key"],
+                "events": get_iteration_events(dpath.get(it, "variants/buckets")),
+            }
+            for it in dpath.get(es_res, "aggregations/iters/buckets")
+        ]
+        if not navigate_earlier:
+            iterations.sort(key=itemgetter("iter"), reverse=True)
+        if iterations:
+            metric.last_max_iter = iterations[0]["iter"]
+            metric.last_min_iter = iterations[-1]["iter"]
+
+        # Commented for now since the last invalid iteration is calculated in the beginning
+        # if navigate_earlier and any(
+        #     variant.last_invalid_iteration is None for variant in variants
+        # ):
+        #     """
+        #     Variants validation flags due to recycling can
+        #     be set only on navigation to earlier frames
+        #     """
+        #     iterations = self._update_variants_invalid_iterations(variants, iterations)
+
+        return metric.task, metric.name, iterations
+
+    @staticmethod
+    def _update_variants_invalid_iterations(
+        variants: Sequence[VariantScrollState], iterations: Sequence[dict]
+    ) -> Sequence[dict]:
+        """
+        This code is currently not in use since the invalid iterations
+        are calculated during MetricState initialization
+        For variants that do not have recycle url marker set it from the
+        first event
+        For variants that do not have last_invalid_iteration set check if the
+        recycle marker was reached on a certain iteration and set it to the
+        corresponding iteration
+        For variants that have a newly set last_invalid_iteration remove
+        events from the invalid iterations
+        Return the updated iterations list
+        """
+        variants_lookup = bucketize(variants, attrgetter("name"))
+        for it in iterations:
+            iteration = it["iter"]
+            events_to_remove = []
+            for event in it["events"]:
+                variant = variants_lookup[event["variant"]][0]
+                if (
+                    variant.last_invalid_iteration
+                    and variant.last_invalid_iteration >= iteration
+                ):
+                    events_to_remove.append(event)
+                    continue
+                event_url = event.get("url")
+                if not variant.recycle_url_marker:
+                    variant.recycle_url_marker = event_url
+                elif variant.recycle_url_marker == event_url:
+                    variant.last_invalid_iteration = iteration
+                    events_to_remove.append(event)
+            if events_to_remove:
+                it["events"] = [ev for ev in it["events"] if ev not in events_to_remove]
+        return [it for it in iterations if it["events"]]
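Note on the paging logic in `_get_task_metric_events` above: the way the scroll state turns into Elasticsearch filters can be followed without a cluster. The sketch below is an illustration under assumptions, not the server code. `Variant` and `build_iter_filters` are hypothetical names, and the function returns only the extra `must` / `must_not` clauses (the task and metric terms are left out). It returns `None` where the original returns an empty result early.

```python
from typing import List, NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    """Hypothetical stand-in for VariantScrollState."""
    name: str
    last_invalid_iteration: Optional[int] = None


def build_iter_filters(
    variants: List[Variant],
    last_min_iter: Optional[int],
    last_max_iter: Optional[int],
    navigate_earlier: bool,
) -> Optional[Tuple[list, list]]:
    if last_max_iter is None:
        # The first fetch always goes from the latest iteration backwards.
        navigate_earlier = True

    must, must_not = [], []
    if navigate_earlier and last_min_iter is not None:
        must.append({"range": {"iter": {"lt": last_min_iter}}})
    elif not navigate_earlier and last_max_iter is not None:
        must.append({"range": {"iter": {"gt": last_max_iter}}})

    if navigate_earlier:
        if last_min_iter:
            # Keep only variants whose invalid border lies below the start point.
            variants = [
                v for v in variants
                if v.last_invalid_iteration is None
                or v.last_invalid_iteration < last_min_iter
            ]
            if not variants:
                return None  # nothing valid is left to page through
            must.append({"terms": {"variant": [v.name for v in variants]}})
    else:
        # Going forward, only borders above the start point still matter.
        variants = [
            v for v in variants
            if v.last_invalid_iteration is not None
            and v.last_invalid_iteration > last_max_iter
        ]

    # Exclude every iteration at or below a variant's invalid border.
    excluded = [
        {"bool": {"must": [
            {"term": {"variant": v.name}},
            {"range": {"iter": {"lte": v.last_invalid_iteration}}},
        ]}}
        for v in variants
        if v.last_invalid_iteration is not None
    ]
    if excluded:
        must_not.append({"bool": {"should": excluded}})
    return must, must_not
```

For example, with a single variant whose `last_invalid_iteration` is 5, paging earlier from `last_min_iter=10` yields an `iter < 10` range plus an exclusion of `iter <= 5` for that variant. Paging earlier from `last_min_iter=3` yields `None`, because every remaining iteration of that variant is invalid.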
--- a/server/bll/event/event_bll.py
+++ b/server/bll/event/event_bll.py
@@ -1,7 +1,7 @@
+import hashlib
 from collections import defaultdict
 from contextlib import closing
 from datetime import datetime
-from enum import Enum
 from operator import attrgetter
 from typing import Sequence

@@ -14,42 +14,39 @@ from nested_dict import nested_dict
 import database.utils as dbutils
 import es_factory
 from apierrors import errors
-from bll.event.event_metrics import EventMetrics
+from bll.event.debug_images_iterator import DebugImagesIterator
+from bll.event.event_metrics import EventMetrics, EventType
 from bll.task import TaskBLL
+from config import config
 from database.errors import translate_errors_context
 from database.model.task.task import Task, TaskStatus
+from redis_manager import redman
 from timing_context import TimingContext
 from utilities.dicts import flatten_nested_items

-
-class EventType(Enum):
-    metrics_scalar = "training_stats_scalar"
-    metrics_vector = "training_stats_vector"
-    metrics_image = "training_debug_image"
-    metrics_plot = "plot"
-    task_log = "log"
-
-
 # noinspection PyTypeChecker
 EVENT_TYPES = set(map(attrgetter("value"), EventType))
-
-
 LOCKED_TASK_STATUSES = (TaskStatus.publishing, TaskStatus.published)


-@attr.s
+@attr.s(auto_attribs=True)
 class TaskEventsResult(object):
-    events = attr.ib(type=list, default=attr.Factory(list))
-    total_events = attr.ib(type=int, default=0)
-    next_scroll_id = attr.ib(type=str, default=None)
+    total_events: int = 0
+    next_scroll_id: str = None
+    events: list = attr.ib(factory=list)


 class EventBLL(object):
-    id_fields = ["task", "iter", "metric", "variant", "key"]
+    id_fields = ("task", "iter", "metric", "variant", "key")

-    def __init__(self, events_es=None):
+    def __init__(self, events_es=None, redis=None):
        self.es = events_es or es_factory.connect("events")
        self._metrics = EventMetrics(self.es)
+        self._skip_iteration_for_metric = set(
+            config.get("services.events.ignore_iteration.metrics", [])
+        )
+        self.redis = redis or redman.connection("apiserver")
+        self.debug_images_iterator = DebugImagesIterator(es=self.es, redis=self.redis)

    @property
    def metrics(self) -> EventMetrics:
@@ -59,9 +56,12 @@ class EventBLL(object):
        actions = []
        task_ids = set()
        task_iteration = defaultdict(lambda: 0)
-        task_last_events = nested_dict(
+        task_last_scalar_events = nested_dict(
            3, dict
        )  # task_id -> metric_hash -> variant_hash -> MetricEvent
+        task_last_events = nested_dict(
+            3, dict
+        )  # task_id -> metric_hash -> event_type -> MetricEvent

        for event in events:
            # remove spaces from event type
@@ -103,6 +103,9 @@ class EventBLL(object):
                event["value"] = event["values"]
                del event["values"]

+            event["metric"] = event.get("metric") or ""
+            event["variant"] = event.get("variant") or ""
+
            index_name = EventMetrics.get_index_name(company_id, event_type)
            es_action = {
                "_op_type": "index",  # overwrite if exists with same ID
@@ -121,12 +124,18 @@ class EventBLL(object):
            if task_id is not None:
                es_action["_routing"] = task_id
                task_ids.add(task_id)
-                if iter is not None:
+                if (
+                    iter is not None
+                    and event.get("metric") not in self._skip_iteration_for_metric
+                ):
                    task_iteration[task_id] = max(iter, task_iteration[task_id])

+                self._update_last_metric_events_for_task(
+                    last_events=task_last_events[task_id], event=event,
+                )
                if event_type == EventType.metrics_scalar.value:
-                    self._update_last_metric_event_for_task(
-                        task_last_events=task_last_events, task_id=task_id, event=event
+                    self._update_last_scalar_events_for_task(
+                        last_events=task_last_scalar_events[task_id], event=event
                    )
            else:
                es_action["_routing"] = task_id
@@ -179,6 +188,7 @@ class EventBLL(object):
                    task_id=task_id,
                    now=now,
                    iter_max=task_iteration.get(task_id),
+                    last_scalar_events=task_last_scalar_events.get(task_id),
                    last_events=task_last_events.get(task_id),
                )

@@ -194,12 +204,12 @@ class EventBLL(object):

        return added, errors_in_bulk

-    def _update_last_metric_event_for_task(self, task_last_events, task_id, event):
+    def _update_last_scalar_events_for_task(self, last_events, event):
        """
-        Update task_last_events structure for the provided task_id with the provided event details if this event is more
+        Update last_events structure with the provided event details if this event is more
        recent than the currently stored event for its metric/variant combination.

-        task_last_events contains [hashed_metric_name -> hashed_variant_name -> event]. Keys are hashed to avoid mongodb
+        last_events contains [hashed_metric_name -> hashed_variant_name -> event]. Keys are hashed to avoid mongodb
        key conflicts due to invalid characters and/or long field names.
        """
        metric = event.get("metric")
@@ -210,13 +220,34 @@ class EventBLL(object):
        metric_hash = dbutils.hash_field_name(metric)
        variant_hash = dbutils.hash_field_name(variant)

-        last_events = task_last_events[task_id]
-
        timestamp = last_events[metric_hash][variant_hash].get("timestamp", None)
        if timestamp is None or timestamp < event["timestamp"]:
            last_events[metric_hash][variant_hash] = event

-    def _update_task(self, company_id, task_id, now, iter_max=None, last_events=None):
+    def _update_last_metric_events_for_task(self, last_events, event):
+        """
+        Update last_events structure with the provided event details if this event is more
+        recent than the currently stored event for its metric/event_type combination.
+        last_events contains [metric_name -> event_type -> event]
+        """
+        metric = event.get("metric")
+        event_type = event.get("type")
+        if not (metric and event_type):
+            return
+
+        timestamp = last_events[metric][event_type].get("timestamp", None)
+        if timestamp is None or timestamp < event["timestamp"]:
+            last_events[metric][event_type] = event
+
+    def _update_task(
+        self,
+        company_id,
+        task_id,
+        now,
+        iter_max=None,
+        last_scalar_events=None,
+        last_events=None,
+    ):
        """
        Update task information in DB with aggregated results after handling event(s) related to this task.

@@ -229,15 +260,18 @@ class EventBLL(object):
        if iter_max is not None:
            fields["last_iteration_max"] = iter_max

-        if last_events:
-            fields["last_values"] = list(
+        if last_scalar_events:
+            fields["last_scalar_values"] = list(
                flatten_nested_items(
-                    last_events,
+                    last_scalar_events,
                    nesting=2,
                    include_leaves=["value", "metric", "variant"],
                )
            )

+        if last_events:
+            fields["last_events"] = last_events
+
        if not fields:
            return False

@@ -245,7 +279,7 @@ class EventBLL(object):

    def _get_event_id(self, event):
        id_values = (str(event[field]) for field in self.id_fields if field in event)
-        return "-".join(id_values)
+        return hashlib.md5("-".join(id_values).encode()).hexdigest()

    def scroll_task_events(
        self,
@@ -276,7 +310,9 @@ class EventBLL(object):
            }

            with translate_errors_context(), TimingContext("es", "scroll_task_events"):
-                es_res = self.es.search(index=es_index, body=es_req, scroll="1h")
+                es_res = self.es.search(
+                    index=es_index, body=es_req, scroll="1h", routing=task_id
+                )

        events = [hit["_source"] for hit in es_res["hits"]["hits"]]
        next_scroll_id = es_res["_scroll_id"]
@@ -294,10 +330,16 @@ class EventBLL(object):
            "size": 0,
            "aggs": {
                "metrics": {
-                    "terms": {"field": "metric"},
+                    "terms": {
+                        "field": "metric",
+                        "size": EventMetrics.MAX_METRICS_COUNT,
+                    },
                    "aggs": {
                        "variants": {
-                            "terms": {"field": "variant"},
+                            "terms": {
+                                "field": "variant",
+                                "size": EventMetrics.MAX_VARIANTS_COUNT,
+                            },
                            "aggs": {
                                "iters": {
                                    "terms": {
@@ -496,8 +538,18 @@ class EventBLL(object):
            "size": 0,
            "aggs": {
                "metrics": {
-                    "terms": {"field": "metric", "size": 200},
-                    "aggs": {"variants": {"terms": {"field": "variant", "size": 200}}},
+                    "terms": {
+                        "field": "metric",
+                        "size": EventMetrics.MAX_METRICS_COUNT,
+                    },
+                    "aggs": {
+                        "variants": {
+                            "terms": {
+                                "field": "variant",
+                                "size": EventMetrics.MAX_VARIANTS_COUNT,
+                            }
+                        }
+                    },
                }
            },
            "query": {"bool": {"must": [{"term": {"task": task_id}}]}},
@@ -537,14 +589,14 @@ class EventBLL(object):
                "metrics": {
                    "terms": {
                        "field": "metric",
-                        "size": 1000,
+                        "size": EventMetrics.MAX_METRICS_COUNT,
                        "order": {"_term": "asc"},
                    },
                    "aggs": {
                        "variants": {
                            "terms": {
                                "field": "variant",
-                                "size": 1000,
+                                "size": EventMetrics.MAX_VARIANTS_COUNT,
                                "order": {"_term": "asc"},
                            },
                            "aggs": {
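Note on the aggregation hunks above: Elasticsearch's `terms` aggregation returns only the top 10 buckets when `size` is omitted, so a task with more metrics or variants than that would have results silently truncated. The sketch below shows one way to build such a request body with explicit limits. The helper name `build_metric_variant_aggs` is hypothetical and not part of the commit; the two constants mirror the values defined on `EventMetrics`.

```python
# Illustrative sketch; build_metric_variant_aggs is a hypothetical helper name.
MAX_METRICS_COUNT = 200   # mirrors EventMetrics.MAX_METRICS_COUNT
MAX_VARIANTS_COUNT = 500  # mirrors EventMetrics.MAX_VARIANTS_COUNT


def build_metric_variant_aggs(task_id: str) -> dict:
    """Return a search body that buckets one task's events by metric, then variant."""
    return {
        "size": 0,  # return aggregation buckets only, no hits
        "aggs": {
            "metrics": {
                "terms": {"field": "metric", "size": MAX_METRICS_COUNT},
                "aggs": {
                    "variants": {
                        "terms": {"field": "variant", "size": MAX_VARIANTS_COUNT}
                    }
                },
            }
        },
        "query": {"bool": {"must": [{"term": {"task": task_id}}]}},
    }


body = build_metric_variant_aggs("task-1")
```

Keeping the limits in one place means every query that lists metrics and variants truncates at the same point, instead of at 200 in one call and 1000 in another as before this change.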
--- a/server/bll/event/event_metrics.py
+++ b/server/bll/event/event_metrics.py
@@ -1,12 +1,13 @@
 import itertools
 from collections import defaultdict
 from concurrent.futures import ThreadPoolExecutor
+from enum import Enum
 from functools import partial
 from operator import itemgetter
+from typing import Sequence, Tuple, Callable, Iterable

+from boltons.iterutils import bucketize
 from elasticsearch import Elasticsearch
-from typing import Sequence, Tuple, Callable
-
 from mongoengine import Q

 from apierrors import errors
@@ -20,10 +21,19 @@ from utilities import safe_get
 log = config.logger(__file__)


+class EventType(Enum):
+    metrics_scalar = "training_stats_scalar"
+    metrics_vector = "training_stats_vector"
+    metrics_image = "training_debug_image"
+    metrics_plot = "plot"
+    task_log = "log"
+
+
 class EventMetrics:
-    MAX_TASKS_COUNT = 100
+    MAX_TASKS_COUNT = 50
    MAX_METRICS_COUNT = 200
    MAX_VARIANTS_COUNT = 500
+    MAX_AGGS_ELEMENTS_COUNT = 50

    def __init__(self, es: Elasticsearch):
        self.es = es
@@ -62,6 +72,12 @@ class EventMetrics:
        Compare scalar metrics for different tasks per metric and variant
        The amount of points in each histogram should not exceed the requested samples
        """
+        if len(task_ids) > self.MAX_TASKS_COUNT:
+            raise errors.BadRequest(
+                f"Up to {self.MAX_TASKS_COUNT} tasks supported for comparison",
+                len(task_ids),
+            )
+
        task_name_by_id = {}
        with translate_errors_context():
            task_objs = Task.get_many(
@@ -97,6 +113,31 @@ class EventMetrics:
    MetricInterval = Tuple[int, Sequence[TaskMetric]]
    MetricData = Tuple[str, dict]

+    def _split_metrics_by_max_aggs_count(
+        self, task_metrics: Sequence[TaskMetric]
+    ) -> Iterable[Sequence[TaskMetric]]:
+        """
+        Yield task metrics in groups, where the number of task metrics in each group
+        is roughly limited by MAX_AGGS_ELEMENTS_COUNT. The split is done on (metric,
+        variant) pairs, so all the tasks of a pair always stay in the same group
+        """
+        if len(task_metrics) < self.MAX_AGGS_ELEMENTS_COUNT:
+            yield task_metrics
+            return
+
+        tm_grouped = bucketize(task_metrics, key=itemgetter(1, 2))
+        groups = []
+        for group in tm_grouped.values():
+            groups.append(group)
+            if sum(map(len, groups)) >= self.MAX_AGGS_ELEMENTS_COUNT:
+                yield list(itertools.chain(*groups))
+                groups = []
+
+        if groups:
+            yield list(itertools.chain(*groups))
+
+        return
+
    def _run_get_scalar_metrics_as_parallel(
        self,
        company_id: str,
@@ -126,21 +167,25 @@ class EventMetrics:
        if not intervals:
            return {}

-        with ThreadPoolExecutor(len(intervals)) as pool:
-            metrics = list(
-                itertools.chain.from_iterable(
-                    pool.map(
-                        partial(
-                            get_func, task_ids=task_ids, es_index=es_index, key=key
-                        ),
-                        intervals,
-                    )
+        intervals = list(
+            itertools.chain.from_iterable(
+                zip(itertools.repeat(i), self._split_metrics_by_max_aggs_count(tms))
+                for i, tms in intervals
+            )
+        )
+        max_concurrency = config.get("services.events.max_metrics_concurrency", 4)
+        with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
+            metrics = itertools.chain.from_iterable(
+                pool.map(
+                    partial(get_func, task_ids=task_ids, es_index=es_index, key=key),
+                    intervals,
                )
            )

        ret = defaultdict(dict)
        for metric_key, metric_values in metrics:
            ret[metric_key].update(metric_values)
+
        return ret

    def _get_metric_intervals(
@@ -310,7 +355,13 @@ class EventMetrics:
                    "variants": {
                        "terms": {"field": "variant", "size": self.MAX_VARIANTS_COUNT},
                        "aggs": {
-                            "tasks": {"terms": {"field": "task"}, "aggs": aggregation}
+                            "tasks": {
+                                "terms": {
+                                    "field": "task",
+                                    "size": self.MAX_TASKS_COUNT,
+                                },
+                                "aggs": aggregation,
+                            }
                        },
                    }
                },
@@ -396,3 +447,50 @@ class EventMetrics:
                ]
            }
        }
+
+    def get_tasks_metrics(
+        self, company_id, task_ids: Sequence, event_type: EventType
+    ) -> Sequence[Tuple]:
+        """
+        For the requested tasks return all the metrics that
+        reported events of the requested type
+        """
+        es_index = EventMetrics.get_index_name(company_id, event_type.value)
+        if not self.es.indices.exists(es_index):
+            return [(tid, []) for tid in task_ids]
+
+        max_concurrency = config.get("services.events.max_metrics_concurrency", 4)
+        with ThreadPoolExecutor(max_concurrency) as pool:
+            res = pool.map(
+                partial(
+                    self._get_task_metrics, es_index=es_index, event_type=event_type,
+                ),
+                task_ids,
+            )
+        return list(zip(task_ids, res))
+
+    def _get_task_metrics(self, task_id, es_index, event_type: EventType) -> Sequence:
+        es_req = {
+            "size": 0,
+            "query": {
+                "bool": {
+                    "must": [
+                        {"term": {"task": task_id}},
+                        {"term": {"type": event_type.value}},
+                    ]
+                }
+            },
+            "aggs": {
+                "metrics": {
+                    "terms": {"field": "metric", "size": self.MAX_METRICS_COUNT}
+                }
+            },
+        }
+
+        with translate_errors_context(), TimingContext("es", "_get_task_metrics"):
+            es_res = self.es.search(index=es_index, body=es_req, routing=task_id)
+
+        return [
+            metric["key"]
+            for metric in safe_get(es_res, "aggregations/metrics/buckets", default=[])
+        ]
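Note on `_split_metrics_by_max_aggs_count`: the grouping can be sketched without `boltons`, using only the standard library. This is an illustration under assumptions, not the committed code: the `(task_id, metric, variant)` tuple layout is inferred from `itemgetter(1, 2)` and the docstring, and the function name `split_by_max_count` is hypothetical.

```python
import itertools
from collections import defaultdict
from operator import itemgetter
from typing import Iterable, List, Sequence, Tuple

# Assumed layout: (task_id, metric, variant)
TaskMetric = Tuple[str, str, str]
MAX_AGGS_ELEMENTS_COUNT = 50  # mirrors EventMetrics.MAX_AGGS_ELEMENTS_COUNT


def split_by_max_count(
    task_metrics: Sequence[TaskMetric], max_count: int = MAX_AGGS_ELEMENTS_COUNT
) -> Iterable[List[TaskMetric]]:
    """Yield chunks of roughly max_count items without splitting a (metric, variant) pair."""
    if len(task_metrics) < max_count:
        yield list(task_metrics)
        return

    # Bucket entries by (metric, variant) so all tasks of a pair stay together
    grouped = defaultdict(list)
    metric_variant = itemgetter(1, 2)
    for tm in task_metrics:
        grouped[metric_variant(tm)].append(tm)

    pending: List[List[TaskMetric]] = []
    for group in grouped.values():
        pending.append(group)
        # Flush once the accumulated size reaches the limit (may overshoot slightly)
        if sum(map(len, pending)) >= max_count:
            yield list(itertools.chain.from_iterable(pending))
            pending = []

    if pending:
        yield list(itertools.chain.from_iterable(pending))


# 3 tasks reporting 40 metrics each, one variant per metric: 120 entries
entries = [(f"t{t}", f"m{m}", "v") for m in range(40) for t in range(3)]
chunks = list(split_by_max_count(entries))
```

Because a chunk is flushed only after a whole pair has been appended, a chunk can exceed the limit by up to one group's size minus one, which is why the docstring says "roughly limited".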
--- a/server/bll/queue/queue_bll.py
+++ b/server/bll/queue/queue_bll.py
@@ -9,9 +9,12 @@ import es_factory
 from apierrors import errors
 from bll.queue.queue_metrics import QueueMetrics
 from bll.workers import WorkerBLL
+from config import config
 from database.errors import translate_errors_context
 from database.model.queue import Queue, Entry

+log = config.logger(__file__)
+

 class QueueBLL(object):
    def __init__(self, worker_bll: WorkerBLL = None, es: Elasticsearch = None):
@@ -189,9 +192,7 @@ class QueueBLL(object):
        """
        with translate_errors_context():
            query = dict(id=queue_id, company=company_id)
-            queue = Queue.objects(**query).modify(
-                pop__entries=-1, last_update=datetime.utcnow(), upsert=False
-            )
+            queue = Queue.objects(**query).modify(pop__entries=-1, upsert=False)
            if not queue:
                raise errors.bad_request.InvalidQueueId(**query)

@@ -200,6 +201,11 @@ class QueueBLL(object):
            if not queue.entries:
                return

+            try:
+                Queue.objects(**query).update(last_update=datetime.utcnow())
+            except Exception:
+                log.exception("Error while updating Queue.last_update")
+
            return queue.entries[0]

    def remove_task(self, company_id: str, queue_id: str, task_id: str) -> int:
--- a/server/bll/redis_cache_manager.py
+++ b/server/bll/redis_cache_manager.py
@@ -0,0 +1,44 @@
+from typing import Optional, TypeVar, Generic, Type
+
+from redis import StrictRedis
+
+from timing_context import TimingContext
+
+T = TypeVar("T")
+
+
+class RedisCacheManager(Generic[T]):
+    """
+    Class for storing/retrieving state objects in Redis
+
+    self.state_class - class of the state
+    self.redis - instance of redis
+    self.expiration_interval - expiration interval in seconds
+    """
+
+    def __init__(
+        self, state_class: Type[T], redis: StrictRedis, expiration_interval: int
+    ):
+        self.state_class = state_class
+        self.redis = redis
+        self.expiration_interval = expiration_interval
+
+    def set_state(self, state: T) -> None:
+        redis_key = self._get_redis_key(state.id)
+        with TimingContext("redis", "cache_set_state"):
+            self.redis.set(redis_key, state.to_json())
+            self.redis.expire(redis_key, self.expiration_interval)
+
+    def get_state(self, state_id) -> Optional[T]:
+        redis_key = self._get_redis_key(state_id)
+        with TimingContext("redis", "cache_get_state"):
+            response = self.redis.get(redis_key)
+        if response:
+            return self.state_class.from_json(response)
+
+    def delete_state(self, state_id) -> None:
+        with TimingContext("redis", "cache_delete_state"):
+            self.redis.delete(self._get_redis_key(state_id))
+
+    def _get_redis_key(self, state_id):
+        return f"{self.state_class}/{state_id}"
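Note on `RedisCacheManager`: the class expects a state type that exposes `id`, `to_json()` and `from_json()`. The sketch below exercises that contract end to end. Everything named here is an assumption made for illustration: `InMemoryRedis` is a stand-in exposing only the four calls the manager uses, `ScrollState` is a hypothetical state class, and `CacheManager` is a condensed copy of the manager's logic without `TimingContext`.

```python
import json
from typing import Dict, Generic, Optional, Type, TypeVar

T = TypeVar("T")


class InMemoryRedis:
    """Hypothetical stand-in exposing only the four calls the manager uses."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.ttl: Dict[str, int] = {}

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def expire(self, key: str, seconds: int) -> None:
        self.ttl[key] = seconds

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def delete(self, key: str) -> None:
        self.data.pop(key, None)
        self.ttl.pop(key, None)


class ScrollState:
    """Hypothetical state class: needs `id`, `to_json()` and `from_json()`."""

    def __init__(self, id: str, position: int = 0) -> None:
        self.id = id
        self.position = position

    def to_json(self) -> str:
        return json.dumps({"id": self.id, "position": self.position})

    @classmethod
    def from_json(cls, raw: str) -> "ScrollState":
        return cls(**json.loads(raw))


class CacheManager(Generic[T]):
    """Condensed copy of the manager's logic, without TimingContext."""

    def __init__(self, state_class: Type[T], redis, expiration_interval: int):
        self.state_class = state_class
        self.redis = redis
        self.expiration_interval = expiration_interval

    def _key(self, state_id: str) -> str:
        # Same key scheme as the diff: the class repr, a slash, then the id
        return f"{self.state_class}/{state_id}"

    def set_state(self, state: T) -> None:
        key = self._key(state.id)
        self.redis.set(key, state.to_json())
        self.redis.expire(key, self.expiration_interval)

    def get_state(self, state_id: str) -> Optional[T]:
        raw = self.redis.get(self._key(state_id))
        return self.state_class.from_json(raw) if raw else None

    def delete_state(self, state_id: str) -> None:
        self.redis.delete(self._key(state_id))


redis = InMemoryRedis()
manager = CacheManager(ScrollState, redis, expiration_interval=60)
manager.set_state(ScrollState("abc", position=5))
restored = manager.get_state("abc")
```

Since the key is built from the class object itself, it embeds the class repr (module and name), so two state classes with the same ids do not collide.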
--- a/server/bll/statistics/resource_monitor.py
+++ b/server/bll/statistics/resource_monitor.py
@@ -6,6 +6,8 @@ from time import sleep
 import attr
 import psutil

+from utilities.threads_manager import ThreadsManager
+

 class ResourceMonitor(Thread):
    @attr.s(auto_attribs=True)
@@ -58,7 +60,9 @@ class ResourceMonitor(Thread):
        )

    def run(self):
-        while True:
+        while not ThreadsManager.terminating:
+            sleep(self.sample_interval_sec)
+
            sample = self._get_sample()

            with self._lock:
@@ -67,21 +71,20 @@ class ResourceMonitor(Thread):
                self._avg = self._avg.avg(sample, self._count)
                self._count += 1

-            sleep(self.sample_interval_sec)
-
    def get_stats(self) -> dict:
        """ Returns current resource statistics and clears internal resource statistics """
        with self._lock:
            min_ = attr.asdict(self._min)
            max_ = attr.asdict(self._max)
            avg = attr.asdict(self._avg)
-            res = {
-                "interval_sec": (datetime.utcnow() - self._clear_time).total_seconds(),
-                "num_cores": psutil.cpu_count(),
-                **{
-                    k: {"min": v, "max": max_[k], "avg": avg[k]}
-                    for k, v in min_.items()
-                }
-            }
+            interval = datetime.utcnow() - self._clear_time
            self._clear()
-        return res
+
+        return {
+            "interval_sec": interval.total_seconds(),
+            "num_cores": psutil.cpu_count(),
+            **{
+                k: {"min": v, "max": max_[k], "avg": avg[k]}
+                for k, v in min_.items()
+            }
+        }
--- a/server/bll/statistics/stats_reporter.py
+++ b/server/bll/statistics/stats_reporter.py
@@ -53,11 +53,8 @@ class StatisticsReporter:
        report_interval = timedelta(
            hours=config.get("apiserver.statistics.report_interval_hours", 24)
        )
-
-        while True:
-
-            sleep(report_interval.total_seconds())
-
+        sleep(report_interval.total_seconds())
+        while not ThreadsManager.terminating:
            try:
                for company in Company.objects(
                    defaults__stats_option__enabled=True
@@ -68,6 +65,8 @@ class StatisticsReporter:
            except Exception as ex:
                log.exception(f"Failed collecting stats: {str(ex)}")

+            sleep(report_interval.total_seconds())
+
    @classmethod
    @threads.register("sender", daemon=True)
    def start_sender(cls):
@@ -86,7 +85,7 @@ class StatisticsReporter:

        WarningFilter.attach()

-        while True:
+        while not ThreadsManager.terminating:
            try:
                report = cls.send_queue.get()
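
Note on the loop changes in `resource_monitor.py` and `stats_reporter.py`: the background loops now sleep first and then re-check a shared termination flag on every cycle instead of running `while True`. A minimal sketch of that pattern follows. `ThreadsState` is a hypothetical stand-in for `ThreadsManager`, and the injectable `sleep` parameter is an assumption added only to make the loop easy to exercise.

```python
import logging
import time
from typing import Callable

log = logging.getLogger(__name__)


class ThreadsState:
    """Hypothetical stand-in for ThreadsManager: a class-level shutdown flag."""

    terminating = False


def run_periodically(
    action: Callable[[], None],
    interval_sec: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait one interval, then run `action` every interval until shutdown is flagged."""
    sleep(interval_sec)
    while not ThreadsState.terminating:
        try:
            action()
        except Exception:
            # A failing cycle is logged and must not kill the loop
            log.exception("Periodic action failed")
        sleep(interval_sec)
```

The flag is only checked between sleeps, so a loop written this way notices shutdown at most one interval late.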

--- a/server/bll/task/__init__.py
+++ b/server/bll/task/__init__.py
@@ -4,4 +4,5 @@ from .utils import (
    update_project_time,
    validate_status_change,
    split_by,
+    ParameterKeyEscaper,
 )
--- a/server/bll/task/task_bll.py
+++ b/server/bll/task/task_bll.py
@@ -1,31 +1,41 @@
-import re
 from collections import OrderedDict
 from datetime import datetime, timedelta
+from operator import attrgetter
+from random import random
 from time import sleep
-from typing import Collection, Sequence, Tuple, Any
+from typing import Collection, Sequence, Tuple, Any, Optional, List, Dict

+import pymongo.results
 import six
 from mongoengine import Q
 from six import string_types

+import database.utils as dbutils
 import es_factory
 from apierrors import errors
+from apimodels.tasks import Artifact as ApiArtifact
 from config import config
 from database.errors import translate_errors_context
 from database.model.model import Model
 from database.model.project import Project
+from database.model.task.metrics import EventStats, MetricEventStats
 from database.model.task.output import Output
 from database.model.task.task import (
    Task,
    TaskStatus,
    TaskStatusMessage,
    TaskSystemTags,
+    ArtifactModes,
+    Artifact,
 )
 from database.utils import get_company_or_none_constraint, id as create_id
 from service_repo import APICall
 from timing_context import TimingContext
+from utilities.dicts import deep_merge
 from utilities.threads_manager import ThreadsManager
-from .utils import ChangeStatusRequest, validate_status_change
+from .utils import ChangeStatusRequest, validate_status_change, ParameterKeyEscaper
+
+log = config.logger(__file__)


 class TaskBLL(object):
@@ -144,6 +154,61 @@ class TaskBLL(object):

        return model

+    @classmethod
+    def clone_task(
+        cls,
+        company_id,
+        user_id,
+        task_id,
+        name: Optional[str] = None,
+        comment: Optional[str] = None,
+        parent: Optional[str] = None,
+        project: Optional[str] = None,
+        tags: Optional[Sequence[str]] = None,
+        system_tags: Optional[Sequence[str]] = None,
+        execution_overrides: Optional[dict] = None,
+    ) -> Task:
+        task = cls.get_by_id(company_id=company_id, task_id=task_id)
+        execution_dict = task.execution.to_proper_dict() if task.execution else {}
+        if execution_overrides:
+            parameters = execution_overrides.get("parameters")
+            if parameters is not None:
+                execution_overrides["parameters"] = {
+                    ParameterKeyEscaper.escape(k): v for k, v in parameters.items()
+                }
+            execution_dict = deep_merge(execution_dict, execution_overrides)
+        artifacts = execution_dict.get("artifacts")
+        if artifacts:
+            execution_dict["artifacts"] = [
+                a for a in artifacts if a.get("mode") != ArtifactModes.output
+            ]
+        now = datetime.utcnow()
+
+        with translate_errors_context():
+            new_task = Task(
+                id=create_id(),
+                user=user_id,
+                company=company_id,
+                created=now,
+                last_update=now,
+                name=name or task.name,
+                comment=comment or task.comment,
+                parent=parent or task.parent,
+                project=project or task.project,
+                tags=tags or task.tags,
+                system_tags=system_tags or [],
+                type=task.type,
+                script=task.script,
+                output=Output(destination=task.output.destination)
+                if task.output
+                else None,
+                execution=execution_dict,
+            )
+            cls.validate(new_task)
+            new_task.save()
+
+        return new_task
+
    @classmethod
    def validate(cls, task: Task):
        assert isinstance(task, Task)
@@ -153,23 +218,13 @@ class TaskBLL(object):
        ):
            raise errors.bad_request.InvalidTaskId("invalid parent", parent=task.parent)

-        if task.project:
-            Project.get_for_writing(company=task.company, id=task.project)
+        if task.project and not Project.get_for_writing(
+            company=task.company, id=task.project
+        ):
+            raise errors.bad_request.InvalidProjectId(id=task.project)

        cls.validate_execution_model(task)

-        if task.execution:
-            if task.execution.parameters:
-                cls._validate_execution_parameters(task.execution.parameters)
-
-    @staticmethod
-    def _validate_execution_parameters(parameters):
-        invalid_keys = [k for k in parameters if re.search(r"\s", k)]
-        if invalid_keys:
-            raise errors.bad_request.ValidationError(
-                "execution.parameters keys contain whitespace", keys=invalid_keys
-            )
-
    @staticmethod
    def get_unique_metric_variants(company_id, project_ids=None):
        pipeline = [
@@ -226,7 +281,8 @@ class TaskBLL(object):
        last_update: datetime = None,
        last_iteration: int = None,
        last_iteration_max: int = None,
-        last_values: Sequence[Tuple[Tuple[str, ...], Any]] = None,
+        last_scalar_values: Sequence[Tuple[Tuple[str, ...], Any]] = None,
+        last_events: Dict[str, Dict[str, dict]] = None,
        **extra_updates,
    ):
        """
@@ -238,7 +294,8 @@ class TaskBLL(object):
            task's last iteration value.
        :param last_iteration_max: Last reported iteration. Use this to conditionally set a value only
            if the current task's last iteration value is smaller than the provided value.
-        :param last_values: Last reported metrics summary (value, metric, variant).
+        :param last_scalar_values: Last reported metrics summary for scalar events (value, metric, variant).
+        :param last_events: Last reported metrics summary (value, metric, event type).
        :param extra_updates: Extra task updates to include in this update call.
        :return:
        """
@@ -249,17 +306,33 @@ class TaskBLL(object):
        elif last_iteration_max is not None:
            extra_updates.update(max__last_iteration=last_iteration_max)

-        if last_values is not None:
+        if last_scalar_values is not None:

            def op_path(op, *path):
                return "__".join((op, "last_metrics") + path)

-            for path, value in last_values:
+            for path, value in last_scalar_values:
                extra_updates[op_path("set", *path)] = value
                if path[-1] == "value":
                    extra_updates[op_path("min", *path[:-1], "min_value")] = value
                    extra_updates[op_path("max", *path[:-1], "max_value")] = value

+        if last_events is not None:
+
+            def events_per_type(metric_data: Dict[str, dict]) -> Dict[str, EventStats]:
+                return {
+                    event_type: EventStats(last_update=event["timestamp"])
+                    for event_type, event in metric_data.items()
+                }
+
+            metric_stats = {
+                dbutils.hash_field_name(metric_key): MetricEventStats(
+                    metric=metric_key, event_stats_by_type=events_per_type(metric_data),
+                )
+                for metric_key, metric_data in last_events.items()
+            }
+            extra_updates["metric_stats"] = metric_stats
+
        Task.objects(id=task_id, company=company_id).update(
            upsert=False, last_update=last_update, **extra_updates
        )
@@ -373,7 +446,7 @@ class TaskBLL(object):
        :return: updated task fields
        """

-        task = TaskBLL.get_task_with_access(
+        task = cls.get_task_with_access(
            task_id,
            company_id=company_id,
            only=(
@@ -411,6 +484,97 @@ class TaskBLL(object):
            force=force,
        ).execute()

+    @classmethod
+    def add_or_update_artifacts(
+        cls, task_id: str, company_id: str, artifacts: List[ApiArtifact]
+    ) -> Tuple[List[str], List[str]]:
+        key = attrgetter("key", "mode")
+
+        if not artifacts:
+            return [], []
+
+        with translate_errors_context(), TimingContext("mongo", "update_artifacts"):
+            artifacts: List[Artifact] = [
+                Artifact(**artifact.to_struct()) for artifact in artifacts
+            ]
+
+            attempts = int(config.get("services.tasks.artifacts.update_attempts", 10))
+
+            for retry in range(attempts):
+                task = cls.get_task_with_access(
+                    task_id, company_id=company_id, requires_write_access=True
+                )
+
+                current = list(map(key, task.execution.artifacts))
+                updated = [a for a in artifacts if key(a) in current]
+                added = [a for a in artifacts if a not in updated]
+
+                filter = {"_id": task_id, "company": company_id}
+                update = {}
+                array_filters = None
+                if current:
+                    filter["execution.artifacts"] = {
+                        "$size": len(current),
+                        "$all": [
+                            *(
+                                {"$elemMatch": {"key": key, "mode": mode}}
+                                for key, mode in current
+                            )
+                        ],
+                    }
+                else:
+                    filter["$or"] = [
+                        {"execution.artifacts": {"$exists": False}},
+                        {"execution.artifacts": {"$size": 0}},
+                    ]
+
+                if added:
+                    update["$push"] = {
+                        "execution.artifacts": {"$each": [a.to_mongo() for a in added]}
+                    }
+                if updated:
+                    update["$set"] = {
+                        f"execution.artifacts.$[artifact{index}]": artifact.to_mongo()
+                        for index, artifact in enumerate(updated)
+                    }
+                    array_filters = [
+                        {
+                            f"artifact{index}.key": artifact.key,
+                            f"artifact{index}.mode": artifact.mode,
+                        }
+                        for index, artifact in enumerate(updated)
+                    ]
+
+                if not update:
+                    return [], []
+
+                result: pymongo.results.UpdateResult = Task._get_collection().update_one(
+                    filter=filter,
+                    update=update,
+                    array_filters=array_filters,
+                    upsert=False,
+                )
+
+                if result.matched_count >= 1:
+                    break
+
+                wait_msec = random() * int(
+                    config.get("services.tasks.artifacts.update_retry_msec", 500)
+                )
+
+                log.warning(
+                    f"Failed to update artifacts for task {task_id} (updated by another party),"
+                    f" retrying {retry+1}/{attempts} in {wait_msec}ms"
+                )
+
+                sleep(wait_msec / 1000)
+            else:
+                raise errors.server_error.UpdateFailed(
+                    "task artifacts updated by another party"
+                )
+
+            return [a.key for a in added], [a.key for a in updated]
+
    @classmethod
    @threads.register("non_responsive_tasks_watchdog", daemon=True)
    def start_non_responsive_tasks_watchdog(cls):
@@ -421,13 +585,11 @@ class TaskBLL(object):
                "services.tasks.non_responsive_tasks_watchdog.threshold_sec", 7200
            )
        )
-        while True:
-            sleep(
-                config.get(
-                    "services.tasks.non_responsive_tasks_watchdog.watch_interval_sec",
-                    900,
-                )
-            )
+        watch_interval = config.get(
+            "services.tasks.non_responsive_tasks_watchdog.watch_interval_sec", 900
+        )
+        sleep(watch_interval)
+        while not ThreadsManager.terminating:
            try:

                ref_time = datetime.utcnow() - threshold
@@ -463,6 +625,8 @@ class TaskBLL(object):
            except Exception as ex:
                log.exception(f"Failed stopping tasks: {str(ex)}")

+            sleep(watch_interval)
+
    @staticmethod
    def get_aggregated_project_execution_parameters(
        company_id,
@@ -502,10 +666,7 @@ class TaskBLL(object):
        ]

        with translate_errors_context():
-            result = next(
-                Task.aggregate(*pipeline),
-                None,
-            )
+            result = next(Task.aggregate(*pipeline), None)

        total = 0
        remaining = 0
@@ -513,7 +674,10 @@ class TaskBLL(object):

        if result:
            total = int(result.get("total", -1))
-            results = [r["_id"] for r in result.get("results", [])]
+            results = [
+                ParameterKeyEscaper.unescape(r["_id"])
+                for r in result.get("results", [])
+            ]
            remaining = max(0, total - (len(results) + page * page_size))

        return total, remaining, results
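Note on `add_or_update_artifacts`: the method reads the current artifacts, issues an update whose filter matches only if the stored list still has exactly those `(key, mode)` pairs, and retries after a random delay when nothing matched. The sketch below shows the same optimistic retry loop against an in-memory object. `ArtifactStore`, `update_if_unchanged` and `add_or_update` are hypothetical names, and artifacts are reduced to a plain key/value dict for brevity.

```python
import random
import time
from typing import Callable, Dict, List, Tuple


class UpdateFailed(Exception):
    """Raised when every attempt lost the race against another writer."""


class ArtifactStore:
    """Hypothetical in-memory stand-in for the task document."""

    def __init__(self, artifacts: Dict[str, str]) -> None:
        self.artifacts = dict(artifacts)

    def snapshot_keys(self) -> List[str]:
        return sorted(self.artifacts)

    def update_if_unchanged(
        self, expected_keys: List[str], changes: Dict[str, str]
    ) -> bool:
        # Conditional write: apply only if the key set still equals the snapshot
        if sorted(self.artifacts) != expected_keys:
            return False
        self.artifacts.update(changes)
        return True


def add_or_update(
    store: ArtifactStore,
    changes: Dict[str, str],
    attempts: int = 10,
    retry_msec: int = 500,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], List[str]]:
    """Return (added keys, updated keys), retrying while other writers interfere."""
    for _ in range(attempts):
        current = store.snapshot_keys()
        updated = [k for k in changes if k in current]
        added = [k for k in changes if k not in current]
        if store.update_if_unchanged(current, changes):
            return added, updated
        # Lost the race: wait a random interval, then re-read and retry
        sleep(random.random() * retry_msec / 1000)
    raise UpdateFailed("artifacts updated by another party")


store = ArtifactStore({"model": "v1"})
result = add_or_update(store, {"model": "v2", "log": "run.txt"}, sleep=lambda _: None)
```

The random wait spreads out competing writers so they are less likely to collide again on the next attempt.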
--- a/server/bll/task/utils.py
+++ b/server/bll/task/utils.py
@@ -3,6 +3,7 @@ from typing import TypeVar, Callable, Tuple, Sequence

 import attr
 import six
+from boltons.dictutils import OneToOne

 from apierrors import errors
 from database.errors import translate_errors_context
@@ -171,3 +172,26 @@ def split_by(
        [item for cond, item in applied if cond],
        [item for cond, item in applied if not cond],
    )
+
+
+class ParameterKeyEscaper:
+    _mapping = OneToOne({".": "%2E", "$": "%24"})
+
+    @classmethod
+    def escape(cls, value):
+        """ Quote a parameter key """
+        value = value.strip().replace("%", "%%")
+        for c, r in cls._mapping.items():
+            value = value.replace(c, r)
+        return value
+
+    @classmethod
+    def _unescape(cls, value):
+        for c, r in cls._mapping.inv.items():
+            value = value.replace(c, r)
+        return value
+
+    @classmethod
+    def unescape(cls, value):
+        """ Unquote a quoted parameter key """
+        return "%".join(map(cls._unescape, value.split("%%")))
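Note on `ParameterKeyEscaper`: the escaping replaces the whitespace validation removed from `task_bll.py`, presumably so that keys containing `.` or `$` can be stored as MongoDB field names. The round trip can be sketched with plain dictionaries instead of `boltons.dictutils.OneToOne`. The function names `escape_key` and `unescape_key` are hypothetical; the mapping and the order of operations follow the diff.

```python
# Same mapping as ParameterKeyEscaper._mapping, as a plain dict
_ESCAPES = {".": "%2E", "$": "%24"}


def escape_key(key: str) -> str:
    """Quote a parameter key: double literal '%' first, then encode '.' and '$'."""
    key = key.strip().replace("%", "%%")
    for char, code in _ESCAPES.items():
        key = key.replace(char, code)
    return key


def unescape_key(key: str) -> str:
    """Reverse escape_key: split on doubled '%', decode each part, rejoin with '%'."""

    def restore(part: str) -> str:
        for char, code in _ESCAPES.items():
            part = part.replace(code, char)
        return part

    return "%".join(restore(part) for part in key.split("%%"))


samples = ["lr.decay", "$cost", "50%", "%2E", "a.b$c%d", "%."]
round_trips = [unescape_key(escape_key(s)) for s in samples]
```

Doubling `%` before encoding is what keeps a literal `%2E` in the original key distinguishable from an encoded dot: it is stored as `%%2E`, and splitting on `%%` during decoding keeps the two cases apart.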
--- a/server/config/basic.py
+++ b/server/config/basic.py
@@ -47,7 +47,7 @@ class BasicConfig:
    def logger(self, name):
        if Path(name).is_file():
            name = Path(name).stem
-        path = ".".join((self.prefix, Path(name).stem))
+        path = ".".join((self.prefix, name))
        return logging.getLogger(path)

    def _read_extra_env_config_values(self):
--- a/server/config/default/apiserver.conf
+++ b/server/config/default/apiserver.conf
@@ -34,6 +34,12 @@
        aggregate {
            allow_disk_use: true
        }
+
+        pre_populate {
+            enabled: false
+            zip_file: "/path/to/export.zip"
+            fail_on_error: false
+        }
    }

    auth {
--- a/server/config/default/hosts.conf
+++ b/server/config/default/hosts.conf
@@ -32,6 +32,11 @@ mongo {
 }

 redis {
+  apiserver {
+      host: "127.0.0.1"
+      port: 6379
+      db: 0
+  }
  workers {
    host: "127.0.0.1"
    port: 6379
--- a/server/config/default/services/events.conf
+++ b/server/config/default/services/events.conf
@@ -1,3 +1,9 @@
-{
-    es_index_prefix:"events"
-}
+es_index_prefix: "events"
+
+ignore_iteration {
+    metrics: [":monitor:machine", ":monitor:gpu"]
+}
+
+# Max number of concurrent queries to ES when calculating event metrics.
+# Should not exceed the number of concurrent connections set in the ES driver.
+max_metrics_concurrency: 4
--- a/server/config/default/services/tasks.conf
+++ b/server/config/default/services/tasks.conf
@@ -5,3 +5,8 @@ non_responsive_tasks_watchdog {
    # Watchdog will sleep for this number of seconds after each cycle
    watch_interval_sec: 900
 }
+
+artifacts {
+    update_attempts: 10
+    update_retry_msec: 500
+}
--- a/server/config/info.py
+++ b/server/config/info.py
@@ -1,43 +1,43 @@
 from functools import lru_cache
-from pathlib import Path
 from os import getenv
+from pathlib import Path
+from version import __version__
+
+from config import config

 root = Path(__file__).parent.parent


-@lru_cache()
-def get_build_number():
-    try:
-        return (root / "BUILD").read_text().strip()
-    except FileNotFoundError:
-        return ""
-
-
-@lru_cache()
-def get_version():
-    try:
-        return (root / "VERSION").read_text().strip()
-    except FileNotFoundError:
-        return ""
-
-
-@lru_cache()
-def get_commit_number():
-    try:
-        return (root / "COMMIT").read_text().strip()
-    except FileNotFoundError:
-        return ""
-
-
-@lru_cache()
-def get_deployment_type() -> str:
-    value = getenv("TRAINS_SERVER_DEPLOYMENT_TYPE")
+def _get(prop_name, env_suffix=None, default=""):
+    value = getenv(f"TRAINS_SERVER_{env_suffix or prop_name}")
    if value:
        return value

    try:
-        value = (root / "DEPLOY").read_text().strip()
+        return (root / prop_name).read_text().strip()
    except FileNotFoundError:
-        pass
+        return default

-    return value or "manual"
+
+@lru_cache()
+def get_build_number():
+    return _get("BUILD")
+
+
+@lru_cache()
+def get_version():
+    return _get("VERSION", default=__version__)
+
+
+@lru_cache()
+def get_commit_number():
+    return _get("COMMIT")
+
+
+@lru_cache()
+def get_deployment_type() -> str:
+    return _get("DEPLOY", env_suffix="DEPLOYMENT_TYPE", default="manual")
+
+
+def get_default_company():
+    return config.get("apiserver.default_company")
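Note on `_get` in `info.py`: each property is now resolved in a fixed order: environment variable, then a file under the server root, then a default. The sketch below illustrates that lookup order. The function name `get_property` and its explicit `root` and `env` parameters are assumptions made so the example is self-contained; the committed `_get` uses the module-level `root` and `os.getenv` instead.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def get_property(
    root: Path,
    prop_name: str,
    env_suffix: Optional[str] = None,
    default: str = "",
    env: Mapping[str, str] = os.environ,
) -> str:
    """Resolve a property: env var first, then a file named prop_name, then default."""
    value = env.get(f"TRAINS_SERVER_{env_suffix or prop_name}")
    if value:
        return value

    try:
        return (root / prop_name).read_text().strip()
    except FileNotFoundError:
        return default
```

With this shape, `get_deployment_type` becomes a one-line call using `env_suffix="DEPLOYMENT_TYPE"` and `default="manual"`. One side effect of the generic helper is that every property, not only the deployment type, can be overridden through a `TRAINS_SERVER_*` variable.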
--- a/server/database/model/auth.py
+++ b/server/database/model/auth.py
@@ -52,7 +52,7 @@ class User(DbModelMixin, AuthDocument):
    meta = {"db_alias": Database.auth, "strict": strict}

    id = StringField(primary_key=True)
-    name = StringField(unique_with="company")
+    name = StringField()

    created = DateTimeField()
    """ User auth entry creation time """
--- a/server/database/model/base.py
+++ b/server/database/model/base.py
@@ -1,7 +1,7 @@
 import re
 from collections import namedtuple
 from functools import reduce
-from typing import Collection, Sequence, Union
+from typing import Collection, Sequence, Union, Optional

 from boltons.iterutils import first
 from dateutil.parser import parse as parse_datetime
@@ -60,7 +60,7 @@ class ProperDictMixin(object):

 class GetMixin(PropsMixin):
    _text_score = "$text_score"
-
+    _projection_key = "projection"
    _ordering_key = "order_by"
    _search_text_key = "search_text"

@@ -270,11 +270,26 @@ class GetMixin(PropsMixin):
            return override_projection
        if not parameters:
            return []
-        return parameters.get("projection") or parameters.get("only_fields", [])
+        return parameters.get(cls._projection_key) or parameters.get("only_fields", [])

    @classmethod
-    def set_default_ordering(cls, parameters, value):
-        parameters[cls._ordering_key] = parameters.get(cls._ordering_key) or value
+    def set_projection(cls, parameters: dict, value: Sequence[str]) -> Sequence[str]:
+        parameters.pop("only_fields", None)
+        parameters[cls._projection_key] = value
+        return value
+
+    @classmethod
+    def get_ordering(cls, parameters: dict) -> Optional[Sequence[str]]:
+        return parameters.get(cls._ordering_key)
+
+    @classmethod
+    def set_ordering(cls, parameters: dict, value: Sequence[str]) -> Sequence[str]:
+        parameters[cls._ordering_key] = value
+        return value
+
+    @classmethod
+    def set_default_ordering(cls, parameters: dict, value: Sequence[str]) -> None:
+        cls.set_ordering(parameters, cls.get_ordering(parameters) or value)

    @classmethod
    def get_many_with_join(
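The behaviour of the new ordering and projection helpers can be shown with plain dicts. The sketch below copies their logic into module-level functions (an assumption made for the sake of a runnable example; in the diff they are classmethods on `GetMixin`).

```python
from typing import Optional, Sequence

ORDERING_KEY = "order_by"  # mirrors GetMixin._ordering_key
PROJECTION_KEY = "projection"  # mirrors GetMixin._projection_key


def get_ordering(parameters: dict) -> Optional[Sequence[str]]:
    return parameters.get(ORDERING_KEY)


def set_ordering(parameters: dict, value: Sequence[str]) -> Sequence[str]:
    parameters[ORDERING_KEY] = value
    return value


def set_default_ordering(parameters: dict, value: Sequence[str]) -> None:
    # Keep a caller-supplied ordering; use the default only if it is missing or empty.
    set_ordering(parameters, get_ordering(parameters) or value)


def set_projection(parameters: dict, value: Sequence[str]) -> Sequence[str]:
    # The legacy "only_fields" key is dropped so it cannot compete with "projection".
    parameters.pop("only_fields", None)
    parameters[PROJECTION_KEY] = value
    return value


explicit = {"order_by": ["-last_update"]}
set_default_ordering(explicit, ["name"])

missing = {}
set_default_ordering(missing, ["name"])

empty = {"order_by": []}
set_default_ordering(empty, ["name"])

legacy = {"only_fields": ["id", "name"]}
set_projection(legacy, ["id"])
```

Note that an empty ordering list is treated the same as a missing one, because the fallback uses `or`.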
--- a/server/database/model/settings.py
+++ b/server/database/model/settings.py
@@ -40,10 +40,6 @@ class Settings(DbModelMixin, Document):
        """ Sets a new value or adds a new key/value setting (if key does not exist) """
        key = key.strip(sep)
        res = Settings.objects(key=key).update(key=key, value=value, upsert=True)
-        # if Settings.objects(key=key).only("key"):
-        #
-        # else:
-        #     res = Settings(key=key, value=value).save()
        return bool(res)

    @classmethod
--- a/server/database/model/task/metrics.py
+++ b/server/database/model/task/metrics.py
@@ -1,10 +1,18 @@
-from mongoengine import EmbeddedDocument, StringField, DynamicField
+from mongoengine import (
+    EmbeddedDocument,
+    StringField,
+    DynamicField,
+    LongField,
+    EmbeddedDocumentField,
+)
+
+from database.fields import SafeMapField


 class MetricEvent(EmbeddedDocument):
    meta = {
        # For backwards compatibility reasons
-        'strict': False,
+        "strict": False,
    }

    metric = StringField(required=True)
@@ -12,3 +20,20 @@ class MetricEvent(EmbeddedDocument):
    value = DynamicField(required=True)
    min_value = DynamicField()  # for backwards compatibility reasons
    max_value = DynamicField()  # for backwards compatibility reasons
+
+
+class EventStats(EmbeddedDocument):
+    meta = {
+        # For backwards compatibility reasons
+        "strict": False,
+    }
+    last_update = LongField()
+
+
+class MetricEventStats(EmbeddedDocument):
+    meta = {
+        # For backwards compatibility reasons
+        "strict": False,
+    }
+    metric = StringField(required=True)
+    event_stats_by_type = SafeMapField(field=EmbeddedDocumentField(EventStats))
--- a/server/database/model/task/task.py
+++ b/server/database/model/task/task.py
@@ -18,10 +18,11 @@ from database.fields import (
    SafeSortedListField,
 )
 from database.model import AttributedDocument
+from database.model.base import ProperDictMixin
 from database.model.model_labels import ModelLabels
 from database.model.project import Project
 from database.utils import get_options
-from .metrics import MetricEvent
+from .metrics import MetricEvent, MetricEventStats
 from .output import Output

 DEFAULT_LAST_ITERATION = 0
@@ -66,10 +67,15 @@ class ArtifactTypeData(EmbeddedDocument):
    data_hash = StringField()


+class ArtifactModes:
+    input = "input"
+    output = "output"
+
+
 class Artifact(EmbeddedDocument):
    key = StringField(required=True)
    type = StringField(required=True)
-    mode = StringField(choices=("input", "output"), default="output")
+    mode = StringField(choices=get_options(ArtifactModes), default=ArtifactModes.output)
    uri = StringField()
    hash = StringField()
    content_size = LongField()
@@ -78,7 +84,7 @@ class Artifact(EmbeddedDocument):
    display_data = SafeSortedListField(ListField(UnionField((int, float, str))))


-class Execution(EmbeddedDocument):
+class Execution(EmbeddedDocument, ProperDictMixin):
    test_split = IntField(default=0)
    parameters = SafeDictField(default=dict)
    model = StringField(reference_field="Model")
@@ -156,3 +162,4 @@ class Task(AttributedDocument):
    last_update = DateTimeField()
    last_iteration = IntField(default=DEFAULT_LAST_ITERATION)
    last_metrics = SafeMapField(field=SafeMapField(EmbeddedDocumentField(MetricEvent)))
+    metric_stats = SafeMapField(field=EmbeddedDocumentField(MetricEventStats))
--- a/server/database/model/user.py
+++ b/server/database/model/user.py
@@ -1,7 +1,6 @@
-from mongoengine import Document, StringField
+from mongoengine import Document, StringField, DynamicField

 from database import Database, strict
-from database.fields import SafeDictField
 from database.model import DbModelMixin
 from database.model.company import Company

@@ -18,4 +17,4 @@ class User(DbModelMixin, Document):
    family_name = StringField(user_set_allowed=True)
    given_name = StringField(user_set_allowed=True)
    avatar = StringField()
-    preferences = SafeDictField(default=dict, exclude_by_default=True)
+    preferences = DynamicField(default="", exclude_by_default=True)
--- a/server/database/utils.py
+++ b/server/database/utils.py
@@ -96,7 +96,12 @@ def parse_from_call(call_data, fields, cls_fields, discard_none_values=True):
                continue
            if desc:
                if callable(desc):
-                    desc(value)
+                    try:
+                        desc(value)
+                    except TypeError:
+                        raise ParseCallError(f"expecting {desc.__name__}", field=field)
+                    except Exception as ex:
+                        raise ParseCallError(str(ex), field=field)
                else:
                    if issubclass(desc, (list, tuple, dict)) and not isinstance(
                        value, desc
--- a/server/elastic/apply_mappings.py
+++ b/server/elastic/apply_mappings.py
@@ -10,7 +10,11 @@ from pathlib import Path
 from requests.adapters import HTTPAdapter
 from requests.packages.urllib3.util.retry import Retry

-HERE = Path(__file__).parent
+HERE = Path(__file__).resolve().parent
+
+session = requests.Session()
+adapter = HTTPAdapter(max_retries=Retry(5, backoff_factor=0.5))
+session.mount('http://', adapter)


 def apply_mappings_to_host(host: str):
@@ -20,10 +24,6 @@ def apply_mappings_to_host(host: str):
            es_server = host
            url = f"{es_server}/_template/{f.stem}"

-            session = requests.Session()
-            adapter = HTTPAdapter(max_retries=Retry(5, backoff_factor=0.5))
-            session.mount('http://', adapter)
-
            session.delete(url)
            r = session.post(
                url,
--- a/server/elastic/initialize.py
+++ b/server/elastic/initialize.py
@@ -0,0 +1,27 @@
+from furl import furl
+
+from config import config
+from elastic.apply_mappings import apply_mappings_to_host
+from es_factory import get_cluster_config
+
+log = config.logger(__file__)
+
+
+class MissingElasticConfiguration(Exception):
+    """
+    Exception when cluster configuration is not found in config files
+    """
+
+    pass
+
+
+def init_es_data():
+    hosts_config = get_cluster_config("events").get("hosts")
+    if not hosts_config:
+        raise MissingElasticConfiguration("for cluster 'events'")
+
+    for conf in hosts_config:
+        host = furl(scheme="http", host=conf["host"], port=conf["port"]).url
+        log.info(f"Applying mappings to host: {host}")
+        res = apply_mappings_to_host(host)
+        log.info(res)
--- a/server/elastic/mappings/events.json
+++ b/server/elastic/mappings/events.json
@@ -1,7 +1,7 @@
 {
  "template": "events-*",
  "settings": {
-    "number_of_shards": 5
+    "number_of_shards": 1
  },
  "mappings": {
    "_default_": {
--- a/server/init_data.py
+++ b/server/init_data.py
@@ -1,220 +0,0 @@
-import importlib.util
-from datetime import datetime
-from pathlib import Path
-from uuid import uuid4
-
-import attr
-from furl import furl
-from mongoengine.connection import get_db
-from semantic_version import Version
-
-import database.utils
-from bll.queue import QueueBLL
-from config import config
-from database import Database
-from database.model.auth import Role
-from database.model.auth import User as AuthUser, Credentials
-from database.model.company import Company
-from database.model.queue import Queue
-from database.model.settings import Settings
-from database.model.user import User
-from database.model.version import Version as DatabaseVersion
-from elastic.apply_mappings import apply_mappings_to_host
-from es_factory import get_cluster_config
-from service_repo.auth.fixed_user import FixedUser
-
-log = config.logger(__file__)
-
-migration_dir = (Path(__file__) / "../../migration/mongodb").resolve()
-
-
-class MissingElasticConfiguration(Exception):
-    """
-    Exception when cluster configuration is not found in config files
-    """
-
-    pass
-
-
-def init_es_data():
-    hosts_config = get_cluster_config("events").get("hosts")
-    if not hosts_config:
-        raise MissingElasticConfiguration("for cluster 'events'")
-
-    for conf in hosts_config:
-        host = furl(scheme="http", host=conf["host"], port=conf["port"]).url
-        log.info(f"Applying mappings to host: {host}")
-        res = apply_mappings_to_host(host)
-        log.info(res)
-
-
-def _ensure_company():
-    company_id = config.get("apiserver.default_company")
-    company = Company.objects(id=company_id).only("id").first()
-    if company:
-        return company_id
-
-    company_name = "trains"
-    log.info(f"Creating company: {company_name}")
-    company = Company(id=company_id, name=company_name)
-    company.save()
-    return company_id
-
-
-def _ensure_default_queue(company):
-    """
-    If no queue is present for the company then
-    create a new one and mark it as a default
-    """
-    queue = Queue.objects(company=company).only("id").first()
-    if queue:
-        return
-
-    QueueBLL.create(company, name="default", system_tags=["default"])
-
-
-def _ensure_auth_user(user_data, company_id):
-    ensure_credentials = {"key", "secret"}.issubset(user_data.keys())
-    if ensure_credentials:
-        user = AuthUser.objects(
-            credentials__match=Credentials(
-                key=user_data["key"], secret=user_data["secret"]
-            )
-        ).first()
-        if user:
-            return user.id
-
-    log.info(f"Creating user: {user_data['name']}")
-    user = AuthUser(
-        id=user_data.get("id", f"__{user_data['name']}__"),
-        name=user_data["name"],
-        company=company_id,
-        role=user_data["role"],
-        email=user_data["email"],
-        created=datetime.utcnow(),
-        credentials=[Credentials(key=user_data["key"], secret=user_data["secret"])]
-        if ensure_credentials
-        else None,
-    )
-
-    user.save()
-
-    return user.id
-
-
-def _ensure_user(user: FixedUser, company_id: str):
-    if User.objects(id=user.user_id).first():
-        return
-
-    data = attr.asdict(user)
-    data["id"] = user.user_id
-    data["email"] = f"{user.user_id}@example.com"
-    data["role"] = Role.user
-
-    _ensure_auth_user(user_data=data, company_id=company_id)
-
-    given_name, _, family_name = user.name.partition(" ")
-
-    User(
-        id=user.user_id,
-        company=company_id,
-        name=user.name,
-        given_name=given_name,
-        family_name=family_name,
-    ).save()
-
-
-def _apply_migrations():
-    if not migration_dir.is_dir():
-        raise ValueError(f"Invalid migration dir {migration_dir}")
-
-    try:
-        previous_versions = sorted(
-            (Version(ver.num) for ver in DatabaseVersion.objects().only("num")),
-            reverse=True,
-        )
-    except ValueError as ex:
-        raise ValueError(f"Invalid database version number encountered: {ex}")
-
-    last_version = previous_versions[0] if previous_versions else Version("0.0.0")
-
-    try:
-        new_scripts = {
-            ver: path
-            for ver, path in ((Version(f.stem), f) for f in migration_dir.glob("*.py"))
-            if ver > last_version
-        }
-    except ValueError as ex:
-        raise ValueError(f"Failed parsing migration version from file: {ex}")
-
-    dbs = {Database.auth: "migrate_auth", Database.backend: "migrate_backend"}
-
-    migration_log = log.getChild("mongodb_migration")
-
-    for script_version in sorted(new_scripts.keys()):
-        script = new_scripts[script_version]
-        spec = importlib.util.spec_from_file_location(script.stem, str(script))
-        module = importlib.util.module_from_spec(spec)
-        spec.loader.exec_module(module)
-
-        for alias, func_name in dbs.items():
-            func = getattr(module, func_name, None)
-            if not func:
-                continue
-            try:
-                migration_log.info(f"Applying {script.stem}/{func_name}()")
-                func(get_db(alias))
-            except Exception:
-                migration_log.exception(f"Failed applying {script}:{func_name}()")
-                raise ValueError("Migration failed, aborting. Please restore backup.")
-
-        DatabaseVersion(
-            id=database.utils.id(),
-            num=script.stem,
-            created=datetime.utcnow(),
-            desc="Applied on server startup",
-        ).save()
-
-
-def _ensure_uuid():
-    Settings.add_value("server.uuid", str(uuid4()))
-
-
-def init_mongo_data():
-    try:
-        _apply_migrations()
-
-        _ensure_uuid()
-
-        company_id = _ensure_company()
-        _ensure_default_queue(company_id)
-
-        users = [
-            {
-                "name": "apiserver",
-                "role": Role.system,
-                "email": "apiserver@example.com",
-            },
-            {
-                "name": "webserver",
-                "role": Role.system,
-                "email": "webserver@example.com",
-            },
-            {"name": "tests", "role": Role.user, "email": "tests@example.com"},
-        ]
-
-        for user in users:
-            credentials = config.get(f"secure.credentials.{user['name']}")
-            user["key"] = credentials.user_key
-            user["secret"] = credentials.user_secret
-            _ensure_auth_user(user, company_id)
-
-        if FixedUser.enabled():
-            log.info("Fixed users mode is enabled")
-            for user in FixedUser.from_config():
-                try:
-                    _ensure_user(user, company_id)
-                except Exception as ex:
-                    log.error(f"Failed creating fixed user {user['name']}: {ex}")
-    except Exception as ex:
-        log.exception("Failed initializing mongodb")
--- a/server/mongo/initialize/init.py
+++ b/server/mongo/initialize/init.py
@@ -0,0 +1,70 @@
+from pathlib import Path
+
+from config import config
+from database.model.auth import Role
+from service_repo.auth.fixed_user import FixedUser
+from .migration import _apply_migrations
+from .pre_populate import PrePopulate
+from .user import ensure_fixed_user, _ensure_auth_user, _ensure_backend_user
+from .util import _ensure_company, _ensure_default_queue, _ensure_uuid
+
+log = config.logger(__package__)
+
+
+def init_mongo_data():
+    try:
+        empty_dbs = _apply_migrations(log)
+
+        _ensure_uuid()
+
+        company_id = _ensure_company(log)
+
+        _ensure_default_queue(company_id)
+
+        if empty_dbs and config.get("apiserver.mongo.pre_populate.enabled", False):
+            zip_file = config.get("apiserver.mongo.pre_populate.zip_file")
+            if not zip_file or not Path(zip_file).is_file():
+                msg = f"Failed pre-populating database: invalid zip file {zip_file}"
+                if config.get("apiserver.mongo.pre_populate.fail_on_error", False):
+                    log.error(msg)
+                    raise ValueError(msg)
+                else:
+                    log.warning(msg)
+            else:
+
+                user_id = _ensure_backend_user(
+                    "__allegroai__", company_id, "Allegro.ai"
+                )
+
+                PrePopulate.import_from_zip(zip_file, user_id=user_id)
+
+        users = [
+            {
+                "name": "apiserver",
+                "role": Role.system,
+                "email": "apiserver@example.com",
+            },
+            {
+                "name": "webserver",
+                "role": Role.system,
+                "email": "webserver@example.com",
+            },
+            {"name": "tests", "role": Role.user, "email": "tests@example.com"},
+        ]
+
+        for user in users:
+            credentials = config.get(f"secure.credentials.{user['name']}")
+            user["key"] = credentials.user_key
+            user["secret"] = credentials.user_secret
+            _ensure_auth_user(user, company_id, log=log)
+
+        if FixedUser.enabled():
+            log.info("Fixed users mode is enabled")
+            FixedUser.validate()
+            for user in FixedUser.from_config():
+                try:
+                    ensure_fixed_user(user, company_id, log=log)
+                except Exception as ex:
+                    log.error(f"Failed creating fixed user {user.name}: {ex}")
+    except Exception:
+        log.exception("Failed initializing mongodb")
--- a/server/mongo/initialize/migration.py
+++ b/server/mongo/initialize/migration.py
@@ -0,0 +1,86 @@
+import importlib.util
+from datetime import datetime
+from logging import Logger
+from pathlib import Path
+
+from mongoengine.connection import get_db
+from semantic_version import Version
+
+import database.utils
+from database import Database
+from database.model.version import Version as DatabaseVersion
+
+migration_dir = Path(__file__).resolve().parent.with_name("migrations")
+
+
+def _apply_migrations(log: Logger) -> bool:
+    """
+    Apply migrations as found in the migration dir.
+    Returns a boolean indicating whether the database was empty prior to migration.
+    """
+    log = log.getChild(Path(__file__).stem)
+
+    log.info("Started mongodb migrations")
+
+    if not migration_dir.is_dir():
+        raise ValueError(f"Invalid migration dir {migration_dir}")
+
+    empty_dbs = not any(
+        get_db(alias).collection_names()
+        for alias in database.utils.get_options(Database)
+    )
+
+    try:
+        previous_versions = sorted(
+            (Version(ver.num) for ver in DatabaseVersion.objects().only("num")),
+            reverse=True,
+        )
+    except ValueError as ex:
+        raise ValueError(f"Invalid database version number encountered: {ex}")
+
+    last_version = previous_versions[0] if previous_versions else Version("0.0.0")
+
+    try:
+        new_scripts = {
+            ver: path
+            for ver, path in ((Version(f.stem), f) for f in migration_dir.glob("*.py"))
+            if ver > last_version
+        }
+    except ValueError as ex:
+        raise ValueError(f"Failed parsing migration version from file: {ex}")
+
+    dbs = {Database.auth: "migrate_auth", Database.backend: "migrate_backend"}
+
+    for script_version in sorted(new_scripts):
+        script = new_scripts[script_version]
+
+        if empty_dbs:
+            log.info(f"Skipping migration {script.name} (empty databases)")
+        else:
+            spec = importlib.util.spec_from_file_location(script.stem, str(script))
+            module = importlib.util.module_from_spec(spec)
+            spec.loader.exec_module(module)
+
+            for alias, func_name in dbs.items():
+                func = getattr(module, func_name, None)
+                if not func:
+                    continue
+                try:
+                    log.info(f"Applying {script.stem}/{func_name}()")
+                    func(get_db(alias))
+                except Exception:
+                    log.exception(f"Failed applying {script}:{func_name}()")
+                    raise ValueError(
+                        "Migration failed, aborting. Please restore backup."
+                    )
+
+        DatabaseVersion(
+            id=database.utils.id(),
+            num=script.stem,
+            created=datetime.utcnow(),
+            desc="Applied on server startup",
+        ).save()
+
+    log.info("Finished mongodb migrations")
+
+    return empty_dbs
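The selection step in `_apply_migrations` (pick every script whose version is newer than the latest recorded one, then run them oldest first) can be sketched without a database. To keep the example dependency-free, `semantic_version.Version` is replaced here by a simple integer-tuple parse; `parse_version` and `pending_scripts` are hypothetical names, and pre-release or build suffixes are not handled.

```python
from pathlib import Path
from typing import Dict, List, Tuple

VersionKey = Tuple[int, ...]


def parse_version(text: str) -> VersionKey:
    """Simplified stand-in for semantic_version.Version: "0.13.0" -> (0, 13, 0)."""
    return tuple(int(part) for part in text.split("."))


def pending_scripts(script_names: List[str], applied: List[str]) -> List[str]:
    """Return scripts newer than the latest applied version, oldest first."""
    last = max((parse_version(v) for v in applied), default=(0, 0, 0))
    candidates: Dict[VersionKey, str] = {
        parse_version(Path(name).stem): name for name in script_names
    }
    return [candidates[v] for v in sorted(candidates) if v > last]


scripts = ["0.14.0.py", "0.12.1.py", "0.13.0.py"]
after_0_12_1 = pending_scripts(scripts, applied=["0.12.1"])
fresh_db = pending_scripts(scripts, applied=[])
```

In the diff, a fresh (empty) database still walks the full list, but each script is skipped rather than executed and its version is recorded anyway, so later startups do not try to run it.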
--- a/server/mongo/initialize/pre_populate.py
+++ b/server/mongo/initialize/pre_populate.py
@@ -0,0 +1,153 @@
+import importlib
+from collections import defaultdict
+from datetime import datetime
+from os.path import splitext
+from typing import List, Optional, Any, Type, Set, Dict
+from zipfile import ZipFile, ZIP_BZIP2
+
+import mongoengine
+from tqdm import tqdm
+
+
+class PrePopulate:
+    @classmethod
+    def export_to_zip(
+        cls, filename: str, experiments: List[str] = None, projects: List[str] = None
+    ):
+        with ZipFile(filename, mode="w", compression=ZIP_BZIP2) as zfile:
+            cls._export(zfile, experiments, projects)
+
+    @classmethod
+    def import_from_zip(cls, filename: str, user_id: str = None):
+        with ZipFile(filename) as zfile:
+            cls._import(zfile, user_id)
+
+    @staticmethod
+    def _resolve_type(
+        cls: Type[mongoengine.Document], ids: Optional[List[str]]
+    ) -> List[Any]:
+        ids = set(ids)
+        items = list(cls.objects(id__in=list(ids)))
+        resolved = {i.id for i in items}
+        missing = ids - resolved
+        for name_candidate in missing:
+            results = list(cls.objects(name=name_candidate))
+            if not results:
+                print(f"ERROR: no match for `{name_candidate}`")
+                exit(1)
+            elif len(results) > 1:
+                print(f"ERROR: more than one match for `{name_candidate}`")
+                exit(1)
+            items.append(results[0])
+        return items
+
+    @classmethod
+    def _resolve_entities(
+        cls, experiments: List[str] = None, projects: List[str] = None
+    ) -> Dict[Type[mongoengine.Document], Set[mongoengine.Document]]:
+        from database.model.project import Project
+        from database.model.task.task import Task
+
+        entities = defaultdict(set)
+
+        if projects:
+            print("Reading projects...")
+            entities[Project].update(cls._resolve_type(Project, projects))
+            print("--> Reading project experiments...")
+            objs = Task.objects(
+                project__in=list(set(filter(None, (p.id for p in entities[Project]))))
+            )
+            entities[Task].update(o for o in objs if o.id not in (experiments or []))
+
+        if experiments:
+            print("Reading experiments...")
+            entities[Task].update(cls._resolve_type(Task, experiments))
+            print("--> Reading experiments projects...")
+            objs = Project.objects(
+                id__in=list(set(filter(None, (p.project for p in entities[Task]))))
+            )
+            project_ids = {p.id for p in entities[Project]}
+            entities[Project].update(o for o in objs if o.id not in project_ids)
+
+        return entities
+
+    @classmethod
+    def _cleanup_task(cls, task):
+        from database.model.task.task import TaskStatus
+
+        task.completed = None
+        task.started = None
+        if task.execution:
+            task.execution.model = None
+            task.execution.model_desc = None
+            task.execution.model_labels = None
+        if task.output:
+            task.output.model = None
+
+        task.status = TaskStatus.created
+        task.comment = "Auto generated by Allegro.ai"
+        task.created = datetime.utcnow()
+        task.last_iteration = 0
+        task.last_update = task.created
+        task.status_changed = task.created
+        task.status_message = ""
+        task.status_reason = ""
+        task.user = ""
+
+    @classmethod
+    def _cleanup_entity(cls, entity_cls, entity):
+        from database.model.task.task import Task
+        if entity_cls == Task:
+            cls._cleanup_task(entity)
+
+    @classmethod
+    def _export(
+        cls, writer: ZipFile, experiments: List[str] = None, projects: List[str] = None
+    ):
+        entities = cls._resolve_entities(experiments, projects)
+
+        for cls_, items in entities.items():
+            if not items:
+                continue
+            filename = f"{cls_.__module__}.{cls_.__name__}.json"
+            print(f"Writing {len(items)} items into {writer.filename}:{filename}")
+            with writer.open(filename, "w") as f:
+                f.write("[\n".encode("utf-8"))
+                last = len(items) - 1
+                for i, item in enumerate(items):
+                    cls._cleanup_entity(cls_, item)
+                    f.write(item.to_json().encode("utf-8"))
+                    if i != last:
+                        f.write(",".encode("utf-8"))
+                    f.write("\n".encode("utf-8"))
+                f.write("]\n".encode("utf-8"))
+
+    @staticmethod
+    def _import(reader: ZipFile, user_id: str = None):
+        for file_info in reader.filelist:
+            full_name = splitext(file_info.orig_filename)[0]
+            print(f"Reading {reader.filename}:{full_name}...")
+            module_name, _, class_name = full_name.rpartition(".")
+            module = importlib.import_module(module_name)
+            cls_: Type[mongoengine.Document] = getattr(module, class_name)
+
+            with reader.open(file_info) as f:
+                for item in tqdm(
+                    f.readlines(),
+                    desc=f"Writing {cls_.__name__.lower()}s into database",
+                    unit="doc",
+                ):
+                    item = (
+                        item.decode("utf-8")
+                        .strip()
+                        .lstrip("[")
+                        .rstrip("]")
+                        .rstrip(",")
+                        .strip()
+                    )
+                    if not item:
+                        continue
+                    doc = cls_.from_json(item)
+                    if user_id is not None and hasattr(doc, "user"):
+                        doc.user = user_id
+                    doc.save(force_insert=True)
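The file layout shared by `_export` and `_import` is a JSON array written one document per line, which lets the importer parse line by line after stripping the brackets and trailing commas. The sketch below reproduces that round trip with the standard `json` module in place of mongoengine's `to_json`/`from_json`; `dump_lines` and `load_lines` are hypothetical helpers, and the approach assumes every document serializes to a single line that starts with `{`.

```python
import json
from typing import Iterable, List


def dump_lines(docs: Iterable[dict]) -> List[str]:
    """Mimic the layout written by _export: "[", one document per line, "]"."""
    docs = list(docs)
    last = len(docs) - 1
    lines = ["[\n"]
    for i, doc in enumerate(docs):
        sep = "," if i != last else ""
        lines.append(json.dumps(doc) + sep + "\n")
    lines.append("]\n")
    return lines


def load_lines(lines: Iterable[str]) -> List[dict]:
    """Mimic the per-line cleanup done by _import before parsing."""
    docs = []
    for line in lines:
        item = line.strip().lstrip("[").rstrip("]").rstrip(",").strip()
        if not item:
            continue  # the bare "[" and "]" lines end up empty
        docs.append(json.loads(item))
    return docs


original = [{"name": "exp 1", "tags": ["a", "b"]}, {"name": "exp 2", "tags": []}]
lines = dump_lines(original)
restored = load_lines(lines)
```

A nested list at the end of a document is safe because each document line ends in `}` or `},`, so the `rstrip("]")` call only ever affects the closing bracket line.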
--- a/server/mongo/initialize/user.py
+++ b/server/mongo/initialize/user.py
@@ -0,0 +1,74 @@
+from datetime import datetime
+from logging import Logger
+
+import attr
+
+from database.model.auth import Role
+from database.model.auth import User as AuthUser, Credentials
+from database.model.user import User
+from service_repo.auth.fixed_user import FixedUser
+
+
+def _ensure_auth_user(user_data: dict, company_id: str, log: Logger):
+    ensure_credentials = {"key", "secret"}.issubset(user_data)
+    if ensure_credentials:
+        user = AuthUser.objects(
+            credentials__match=Credentials(
+                key=user_data["key"], secret=user_data["secret"]
+            )
+        ).first()
+        if user:
+            return user.id
+
+    log.info(f"Creating user: {user_data['name']}")
+    user = AuthUser(
+        id=user_data.get("id", f"__{user_data['name']}__"),
+        name=user_data["name"],
+        company=company_id,
+        role=user_data["role"],
+        email=user_data["email"],
+        created=datetime.utcnow(),
+        credentials=[Credentials(key=user_data["key"], secret=user_data["secret"])]
+        if ensure_credentials
+        else None,
+    )
+
+    user.save()
+
+    return user.id
+
+
+def _ensure_backend_user(user_id: str, company_id: str, user_name: str):
+    given_name, _, family_name = user_name.partition(" ")
+
+    User(
+        id=user_id,
+        company=company_id,
+        name=user_name,
+        given_name=given_name,
+        family_name=family_name,
+    ).save()
+
+    return user_id
+
+
+def ensure_fixed_user(user: FixedUser, company_id: str, log: Logger):
+    if User.objects(id=user.user_id).first():
+        return
+
+    data = attr.asdict(user)
+    data["id"] = user.user_id
+    data["email"] = f"{user.user_id}@example.com"
+    data["role"] = Role.user
+
+    _ensure_auth_user(user_data=data, company_id=company_id, log=log)
+
+    given_name, _, family_name = user.name.partition(" ")
+
+    User(
+        id=user.user_id,
+        company=company_id,
+        name=user.name,
+        given_name=given_name,
+        family_name=family_name,
+    ).save()
--- a/server/mongo/initialize/util.py
+++ b/server/mongo/initialize/util.py
@@ -0,0 +1,40 @@
+from logging import Logger
+from uuid import uuid4
+
+from bll.queue import QueueBLL
+from config import config
+from config.info import get_default_company
+from database.model.company import Company
+from database.model.queue import Queue
+from database.model.settings import Settings
+
+log = config.logger(__file__)
+
+
+def _ensure_company(log: Logger):
+    company_id = get_default_company()
+    company = Company.objects(id=company_id).only("id").first()
+    if company:
+        return company_id
+
+    company_name = "trains"
+    log.info(f"Creating company: {company_name}")
+    company = Company(id=company_id, name=company_name)
+    company.save()
+    return company_id
+
+
+def _ensure_default_queue(company):
+    """
+    If no queue is present for the company then
+    create a new one and mark it as a default
+    """
+    queue = Queue.objects(company=company).only("id").first()
+    if queue:
+        return
+
+    QueueBLL.create(company, name="default", system_tags=["default"])
+
+
+def _ensure_uuid():
+    Settings.add_value("server.uuid", str(uuid4()))
--- a/server/mongo/migrations/0.12.1.py
+++ b/server/mongo/migrations/0.12.1.py
--- a/server/mongo/migrations/0.13.0.py
+++ b/server/mongo/migrations/0.13.0.py
@@ -0,0 +1,20 @@
+import json
+
+from pymongo.database import Database, Collection
+
+
+def migrate_auth(db: Database):
+    collection: Collection = db["user"]
+    if "name_1_company_1" in [doc["name"] for doc in collection.list_indexes()]:
+        collection.drop_index("name_1_company_1")
+
+
+def migrate_backend(db: Database):
+    collection: Collection = db["user"]
+    users = collection.find(
+        {"preferences": {"$exists": True, "$ne": None, "$type": "object"}}
+    )
+    for doc in users:
+        collection.update_one(
+            {"_id": doc["_id"]}, {"$set": {"preferences": json.dumps(doc["preferences"])}}
+        )
--- a/server/mongo/migrations/0.14.0.py
+++ b/server/mongo/migrations/0.14.0.py
@@ -0,0 +1,46 @@
+import hashlib
+
+from pymongo.database import Database, Collection
+
+from service_repo.auth.fixed_user import FixedUser
+
+
+def _get_ids():
+    if not FixedUser.enabled():
+        return
+
+    return {
+        hashlib.md5(f"{user.username}:{user.password}".encode()).hexdigest(): user.user_id
+        for user in FixedUser.from_config()
+    }
+
+
+def _switch_uuid(collection: Collection, uuid_field: str, uuids: dict):
+    docs = list(collection.find({uuid_field: {"$in": list(uuids)}}))
+    if not docs:
+        return
+    replaced_uuids = [doc[uuid_field] for doc in docs]
+    for doc in docs:
+        doc[uuid_field] = uuids[doc[uuid_field]]
+    collection.insert_many(docs)
+    collection.delete_many({uuid_field: {"$in": replaced_uuids}})
+
+
+def migrate_auth(db: Database):
+    uuids = _get_ids()
+    if not uuids:
+        return
+
+    collection = db["user"]
+    collection.drop_index("name_1_company_1")
+
+    _switch_uuid(collection=collection, uuid_field="_id", uuids=uuids)
+
+
+def migrate_backend(db: Database):
+    uuids = _get_ids()
+    if not uuids:
+        return
+
+    for name in ("project", "task", "model"):
+        _switch_uuid(collection=db[name], uuid_field="user", uuids=uuids)
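The ID switch in this migration rests on two pieces: the old ID is the MD5 hex digest of `username:password`, and documents are matched by checking a field against the keys of the old-to-new mapping (MongoDB's `$in` operator expects an array of candidate values). The following is an in-memory sketch of that logic with no database involved; `legacy_id`, `switch_ids` and the sample credentials and IDs are all hypothetical.

```python
import hashlib
from typing import Dict, List


def legacy_id(username: str, password: str) -> str:
    """Key format used by _get_ids: md5 hex digest of "username:password"."""
    return hashlib.md5(f"{username}:{password}".encode()).hexdigest()


def switch_ids(docs: List[dict], field: str, mapping: Dict[str, str]) -> List[dict]:
    """In-memory analogue of _switch_uuid: rewrite `field` where it is a mapping key."""
    old_ids = list(mapping)  # the array a Mongo "$in" filter would be given
    return [
        {**doc, field: mapping[doc[field]]} if doc[field] in old_ids else doc
        for doc in docs
    ]


old = legacy_id("jane", "secret")  # hypothetical credentials
mapping = {old: "jane-user-id"}  # hypothetical new ID
tasks = [{"name": "t1", "user": old}, {"name": "t2", "user": "someone-else"}]
migrated = switch_ids(tasks, "user", mapping)
```

Unlike the migration, which inserts the rewritten documents and then deletes the old ones, this sketch returns new dicts and leaves the input list untouched.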
--- a/server/requirements.txt
+++ b/server/requirements.txt
@@ -1,31 +1,30 @@
-six
-Flask>=0.12.2
-elasticsearch>=5.0.0,<6.0.0
-pyhocon>=0.3.35
-requests>=2.13.0
-arrow>=0.10.0
-pymongo==3.6.1  # 3.7 has a bug multiple users logged in
-Flask-Cors>=3.0.5
-Flask-Compress>=1.4.0
-mongoengine==0.16.2
-jsonmodels>=2.3
-pyjwt>=1.3.0
-gunicorn>=19.7.1
-Jinja2==2.10
-python-rapidjson>=0.6.3
-jsonschema>=2.6.0
-dpath>=1.4.2
-funcsigs==1.0.2
-luqum>=0.7.2
-typing>=3.6.4
 attrs>=19.1.0
-nested_dict>=1.61
-related>=0.7.2
-validators>=0.12.4
-fastjsonschema>=2.8
 boltons>=19.1.0
-semantic_version>=2.6.0,<3
+dpath>=1.4.2,<2.0
+elasticsearch>=5.0.0,<6.0.0
+fastjsonschema>=2.8
+Flask-Compress>=1.4.0
+Flask-Cors>=3.0.5
+Flask>=0.12.2
+funcsigs==1.0.2
 furl>=2.0.0
-redis>=2.10.5
+gunicorn>=19.7.1
 humanfriendly==4.18
+Jinja2==2.10
+jsonmodels>=2.3
+jsonschema>=2.6.0
+luqum>=0.7.2
+mongoengine==0.16.2
+nested_dict>=1.61
 psutil>=5.6.5
+pyhocon>=0.3.35
+pyjwt>=1.3.0
+pymongo==3.6.1  # 3.7 has a bug when multiple users are logged in
+python-rapidjson>=0.6.3
+redis>=2.10.5
+related>=0.7.2
+requests>=2.13.0
+semantic_version>=2.8.0,<3
+six
+tqdm
+validators>=0.12.4
--- a/server/schema/services/events.conf
+++ b/server/schema/services/events.conf
@@ -171,6 +171,30 @@
                critical
            ]
        }
+        event_type_enum {
+            type: string
+            enum: [
+                training_stats_scalar
+                training_stats_vector
+                training_debug_image
+                plot
+                log
+            ]
+        }
+        task_metric {
+            type: object
+            required: [task, metric]
+            properties {
+                task {
+                    description: "Task ID"
+                    type: string
+                }
+                metric {
+                    description: "Metric name"
+                    type: string
+                }
+            }
+        }
        task_log_event {
            description: """A log event associated with a task."""
            type: object
@@ -319,6 +343,84 @@
                }
            }
        }
+        "2.7" {
+            description: "Get the debug image events for the requested number of iterations for each task's metric"
+            request {
+                type: object
+                required: [
+                    metrics
+                ]
+                properties {
+                    metrics {
+                        type: array
+                        items { "$ref": "#/definitions/task_metric" }
+                        description: "List of task metrics for which the events will be retrieved"
+                    }
+                    iters {
+                        type: integer
+                        description: "Max number of latest iterations for which to return debug images"
+                    }
+                    navigate_earlier {
+                        type: boolean
+                        description: "If set, events are retrieved from later iterations to earlier ones; otherwise, from earlier iterations to later ones. The default is True"
+                    }
+                    refresh {
+                        type: boolean
+                        description: "If set, the scroll is moved to the latest iterations. The default is False"
+                    }
+                    scroll_id {
+                        type: string
+                        description: "Scroll ID of previous call (used for getting more results)"
+                    }
+                }
+            }
+            response {
+                type: object
+                properties {
+                    metrics {
+                        type: array
+                        items: { type: object }
+                        description: "Debug image events grouped by task metrics and iterations"
+                    }
+                    scroll_id {
+                        type: string
+                        description: "Scroll ID for getting more results"
+                    }
+                }
+            }
+        }
+    }
+    get_task_metrics{
+        "2.7": {
+            description: "For each task, get a list of metrics for which the requested event type was reported"
+            request {
+                type: object
+                required: [
+                    tasks
+                ]
+                properties {
+                    tasks {
+                        type: array
+                        items { type: string }
+                        description: "Task IDs"
+                    }
+                    event_type {
+                        "description": "Event type"
+                        "$ref": "#/definitions/event_type_enum"
+                    }
+                }
+            }
+            response {
+                type: object
+                properties {
+                    metrics {
+                        type: array
+                        items { type: object }
+                        description: "List of tasks with their metrics"
+                    }
+                }
+            }
+        }
    }
    get_task_log {
        "1.5" {
@@ -455,7 +557,7 @@
                    }
                    batch_size {
                        type: integer
-                        description: "Number of events to return each time"
+                        description: "Number of events to return each time (default 500)"
                    }
                    event_type {
                        type: string
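As a rough illustration of the `events.debug_images` 2.7 request shape defined above, the following sketch builds a request body and checks each item against the `required` list of `task_metric`. The function name `build_debug_images_request` and the placeholder task ID and metric name are made up for the example; no HTTP call is made.

```python
from typing import Optional


def build_debug_images_request(
    metrics,
    iters: Optional[int] = None,
    navigate_earlier: Optional[bool] = None,
    refresh: Optional[bool] = None,
    scroll_id: Optional[str] = None,
) -> dict:
    # "metrics" is the only required field; each item needs "task" and "metric"
    for item in metrics:
        missing = {"task", "metric"} - set(item)
        if missing:
            raise ValueError(f"task_metric is missing {sorted(missing)}")

    body = {"metrics": [dict(item) for item in metrics]}
    optional = {
        "iters": iters,
        "navigate_earlier": navigate_earlier,
        "refresh": refresh,
        "scroll_id": scroll_id,
    }
    # Leave unset optional fields out so that the server defaults apply
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


request = build_debug_images_request(
    [{"task": "<task-id>", "metric": "<metric-name>"}], iters=5
)
```

To fetch further results, the `scroll_id` returned by one call would be passed as `scroll_id` in the next.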
--- a/server/schema/services/models.conf
+++ b/server/schema/services/models.conf
@@ -324,7 +324,6 @@
                required: [
                    uri
                    name
-                    labels
                ]
                properties {
                    uri {
--- a/server/schema/services/server.conf
+++ b/server/schema/services/server.conf
@@ -86,6 +86,7 @@ endpoints {
    }
 }
 report_stats_option {
+    allow_roles = [ "*" ]
    "2.4" {
        description: "Get or set the report statistics option per-company"
        request {
@@ -117,6 +118,10 @@ report_stats_option {
                    description: "If enabled, returns Id of the user who enabled the option"
                    type: string
                }
+                current_version {
+                    description: "Returns the current server version"
+                    type: string
+                }
            }
        }
    }
--- a/server/schema/services/tasks.conf
+++ b/server/schema/services/tasks.conf
@@ -550,6 +550,60 @@ get_all {
        }
    }
 }
+clone {
+    "2.5" {
+        description: "Clone an existing task"
+        request {
+            type: object
+            required: [ task ]
+            properties {
+                task {
+                    description: "ID of the task"
+                    type: string
+                }
+                new_task_name {
+                    description: "The name of the cloned task. If not provided then taken from the original task"
+                    type: string
+                }
+                new_task_comment {
+                    description: "The comment of the cloned task. If not provided then taken from the original task"
+                    type: string
+                }
+                new_task_tags {
+                    description: "The user-defined tags of the cloned task. If not provided then taken from the original task"
+                    type: array
+                    items { type: string }
+                }
+                new_task_system_tags {
+                    description: "The system tags of the cloned task. If not provided then empty"
+                    type: array
+                    items { type: string }
+                }
+                new_task_parent {
+                    description: "The parent of the cloned task. If not provided then taken from the original task"
+                    type: string
+                }
+                new_task_project {
+                    description: "The project of the cloned task. If not provided then taken from the original task"
+                    type: string
+                }
+                execution_overrides {
+                    description: "The execution params for the cloned task. The params not specified are taken from the original task"
+                    "$ref": "#/definitions/execution"
+                }
+            }
+        }
+        response {
+            type: object
+            properties {
+                id {
+                    description: "ID of the new task"
+                    type: string
+                }
+            }
+        }
+    }
+}
 create {
    "2.1" {
        description: "Create a new task"
@@ -1304,4 +1358,40 @@ ping {
            additionalProperties: false
        }
    }
+}
+
+add_or_update_artifacts {
+    "2.6" {
+        description: """ Update an existing artifact (search by key/mode) or add a new one """
+        request {
+            type: object
+            required: [ task, artifacts ]
+            properties {
+                task {
+                    description: "Task ID"
+                    type: string
+                }
+                artifacts {
+                    description: "Artifacts to add or update"
+                    type: array
+                    items { "$ref": "#/definitions/artifact" }
+                }
+            }
+        }
+        response {
+            type: object
+            properties {
+                added {
+                    description: "Keys of artifacts added"
+                    type: array
+                    items { type: string }
+                }
+                updated {
+                    description: "Keys of artifacts updated"
+                    type: array
+                    items { type: string }
+                }
+            }
+        }
+    }
 }
--- a/server/server.py
+++ b/server/server.py
@@ -1,3 +1,4 @@
+import atexit
 from argparse import ArgumentParser

 from flask import Flask, request, Response
@@ -9,13 +10,15 @@ import database
 from apierrors.base import BaseError
 from bll.statistics.stats_reporter import StatisticsReporter
 from config import config
-from init_data import init_es_data, init_mongo_data
+from elastic.initialize import init_es_data
+from mongo.initialize import init_mongo_data
 from service_repo import ServiceRepo, APICall
 from service_repo.auth import AuthType
 from service_repo.errors import PathParsingError
 from timing_context import TimingContext
 from updates import check_updates_thread
 from utilities import json
+from utilities.threads_manager import ThreadsManager

 app = Flask(__name__, static_url_path="/static")
 CORS(app, **config.get("apiserver.cors"))
@@ -41,6 +44,13 @@ check_updates_thread.start()
 StatisticsReporter.start()


+def graceful_shutdown():
+    ThreadsManager.terminating = True
+
+
+atexit.register(graceful_shutdown)
+
+
 @app.before_first_request
 def before_app_first_request():
    pass
--- a/server/service_repo/apicall.py
+++ b/server/service_repo/apicall.py
@@ -21,6 +21,8 @@ JSON_CONTENT_TYPE = "application/json"
 class DataContainer(object):
    """ Data container that supports raw data (dict or a list of batched dicts) and a data model """

+    null_schema_validator: SchemaValidator = SchemaValidator(None)
+
    def __init__(self, data=None, batched_data=None):
        if data and batched_data:
            raise ValueError("data and batched data are not supported simultaneously")
@@ -28,7 +30,7 @@ class DataContainer(object):
        self._data = None
        self._data_model = None
        self._data_model_cls = None
-        self._schema_validator: SchemaValidator = SchemaValidator(None)
+        self._schema_validator: SchemaValidator = self.null_schema_validator
        # use setter to properly initialize data
        self.data = data
        self.batched_data = batched_data
--- a/server/service_repo/auth/fixed_user.py
+++ b/server/service_repo/auth/fixed_user.py
@@ -5,27 +5,45 @@ from typing import Sequence, TypeVar
 import attr

 from config import config
+from config.info import get_default_company

 T = TypeVar("T", bound="FixedUser")


+class FixedUsersError(Exception):
+    pass
+
+
 @attr.s(auto_attribs=True)
 class FixedUser:
    username: str
    password: str
    name: str
+    company: str = get_default_company()

    def __attrs_post_init__(self):
-        self.user_id = hashlib.md5(f"{self.username}:{self.password}".encode()).hexdigest()
+        self.user_id = hashlib.md5(f"{self.company}:{self.username}".encode()).hexdigest()

    @classmethod
    def enabled(cls):
        return config.get("apiserver.auth.fixed_users.enabled", False)

+    @classmethod
+    def validate(cls):
+        if not cls.enabled():
+            return
+        users = cls.from_config()
+        if len({user.username for user in users}) < len(users):
+            raise FixedUsersError(
+                "Duplicate user names found in fixed users configuration"
+            )
+
    @classmethod
    @lru_cache()
    def from_config(cls) -> Sequence[T]:
-        return [cls(**user) for user in config.get("apiserver.auth.fixed_users.users", [])]
+        return [
+            cls(**user) for user in config.get("apiserver.auth.fixed_users.users", [])
+        ]

    @classmethod
    @lru_cache()
--- a/server/service_repo/endpoint.py
+++ b/server/service_repo/endpoint.py
@@ -1,5 +1,6 @@
+from enum import Enum
 from typing import Callable, Sequence, Text
-
+from boltons.iterutils import remap
 from jsonmodels import models
 from jsonmodels.errors import FieldNotSupported

@@ -87,7 +88,14 @@ class Endpoint(object):
            Provided data_model schema if available
            """
            try:
-                return data_model.to_json_schema()
+                res = data_model.to_json_schema()
+
+                def visit(path, key, value):
+                    if isinstance(value, Enum):
+                        value = str(value)
+                    return key, value
+
+                return remap(res, visit=visit)
            except (FieldNotSupported, TypeError):
                return str(data_model.__name__)

--- a/server/service_repo/service_repo.py
+++ b/server/service_repo/service_repo.py
@@ -9,6 +9,7 @@ import jsonmodels.models
 import timing_context
 from apierrors import APIError
 from apierrors.errors.bad_request import RequestPathHasInvalidVersion
+from api_version import __version__ as _api_version_
 from config import config
 from service_repo.base import PartialVersion
 from .apicall import APICall
@@ -34,7 +35,7 @@ class ServiceRepo(object):
    """If the check is set, parsing will fail for endpoint request with the version that is grater than the current 
    maximum """

-    _max_version = PartialVersion("2.4")
+    _max_version = PartialVersion(".".join(_api_version_.split(".")[:2]))
    """ Maximum version number (the highest min_version value across all endpoints) """

    _endpoint_exp = (
@@ -166,7 +167,7 @@ class ServiceRepo(object):
            return

        assert isinstance(endpoint, Endpoint)
-        call.actual_endpoint_version: PartialVersion = endpoint.min_version
+        call.actual_endpoint_version = endpoint.min_version
        call.requires_authorization = endpoint.authorize
        return endpoint

--- a/server/services/events.py
+++ b/server/services/events.py
@@ -2,12 +2,15 @@ import itertools
 from collections import defaultdict
 from operator import itemgetter

-import six
-
 from apierrors import errors
 from apimodels.events import (
    MultiTaskScalarMetricsIterHistogramRequest,
    ScalarMetricsIterHistogramRequest,
+    DebugImagesRequest,
+    DebugImageResponse,
+    MetricEvents,
+    IterationEvents,
+    TaskMetricsRequest,
 )
 from bll.event import EventBLL
 from bll.event.event_metrics import EventMetrics
@@ -211,7 +214,7 @@ def vector_metrics_iter_histogram(call, company_id, req_model):
 @endpoint("events.get_task_events", required_fields=["task"])
 def get_task_events(call, company_id, _):
    task_id = call.data["task"]
-    batch_size = call.data.get("batch_size")
+    batch_size = call.data.get("batch_size", 500)
    event_type = call.data.get("event_type")
    scroll_id = call.data.get("scroll_id")
    order = call.data.get("order") or "asc"
@@ -299,7 +302,7 @@ def multi_task_scalar_metrics_iter_histogram(
    call, company_id, req_model: MultiTaskScalarMetricsIterHistogramRequest
 ):
    task_ids = req_model.tasks
-    if isinstance(task_ids, six.string_types):
+    if isinstance(task_ids, str):
        task_ids = [s.strip() for s in task_ids.split(",")]
    # Note, bll already validates task ids as it needs their names
    call.result.data = dict(
@@ -481,7 +484,7 @@ def get_debug_images_v1_7(call, company_id, req_model):


 @endpoint("events.debug_images", min_version="1.8", required_fields=["task"])
-def get_debug_images(call, company_id, req_model):
+def get_debug_images_v1_8(call, company_id, req_model):
    task_id = call.data["task"]
    iters = call.data.get("iters") or 1
    scroll_id = call.data.get("scroll_id")
@@ -507,6 +510,53 @@ def get_debug_images(call, company_id, req_model):
    )


+@endpoint(
+    "events.debug_images",
+    min_version="2.7",
+    request_data_model=DebugImagesRequest,
+    response_data_model=DebugImageResponse,
+)
+def get_debug_images(call, company_id, req_model: DebugImagesRequest):
+    tasks = set(m.task for m in req_model.metrics)
+    task_bll.assert_exists(call.identity.company, task_ids=tasks, allow_public=True)
+    result = event_bll.debug_images_iterator.get_task_events(
+        company_id=company_id,
+        metrics=[(m.task, m.metric) for m in req_model.metrics],
+        iter_count=req_model.iters,
+        navigate_earlier=req_model.navigate_earlier,
+        refresh=req_model.refresh,
+        state_id=req_model.scroll_id,
+    )
+
+    call.result.data_model = DebugImageResponse(
+        scroll_id=result.next_scroll_id,
+        metrics=[
+            MetricEvents(
+                task=task,
+                metric=metric,
+                iterations=[
+                    IterationEvents(iter=iteration["iter"], events=iteration["events"])
+                    for iteration in iterations
+                ],
+            )
+            for (task, metric, iterations) in result.metric_events
+        ],
+    )
+
+
+@endpoint("events.get_task_metrics", request_data_model=TaskMetricsRequest)
+def get_tasks_metrics(call: APICall, company_id, req_model: TaskMetricsRequest):
+    task_bll.assert_exists(
+        call.identity.company, task_ids=req_model.tasks, allow_public=True
+    )
+    res = event_bll.metrics.get_tasks_metrics(
+        company_id, task_ids=req_model.tasks, event_type=req_model.event_type
+    )
+    call.result.data = {
+        "metrics": [{"task": task, "metrics": metrics} for (task, metrics) in res]
+    }
+
+
 @endpoint("events.delete_for_task", required_fields=["task"])
 def delete_for_task(call, company_id, req_model):
    task_id = call.data["task"]
--- a/server/services/projects.py
+++ b/server/services/projects.py
@@ -61,7 +61,7 @@ def get_by_id(call):
 def make_projects_get_all_pipelines(project_ids, specific_state=None):
    archived = EntityVisibility.archived.value

-    def ensure_system_tags():
+    def ensure_valid_fields():
        """
        Make sure system tags is always an array (required by subsequent $in in archived_tasks_cond
        """
@@ -73,6 +73,9 @@ def make_projects_get_all_pipelines(project_ids, specific_state=None):
                        "then": [],
                        "else": "$system_tags",
                    }
+                },
+                "status": {
+                    "$ifNull": ["$status", "unknown"]
                }
            }
        }
@@ -80,7 +83,7 @@ def make_projects_get_all_pipelines(project_ids, specific_state=None):
    status_count_pipeline = [
        # count tasks per project per status
        {"$match": {"project": {"$in": project_ids}}},
-        ensure_system_tags(),
+        ensure_valid_fields(),
        {
            "$group": {
                "_id": {
@@ -153,7 +156,7 @@ def make_projects_get_all_pipelines(project_ids, specific_state=None):
                "project": {"$in": project_ids},
            }
        },
-        ensure_system_tags(),
+        ensure_valid_fields(),
        {
            # for each project
            "$group": group_step
--- a/server/services/server/init.py
+++ b/server/services/server/init.py
@@ -11,7 +11,6 @@ from database.errors import translate_errors_context
 from database.model import Company
 from database.model.company import ReportStatsOption
 from service_repo import ServiceRepo, APICall, endpoint
-from version import __version__ as current_version


 @endpoint("server.get_stats")
@@ -79,7 +78,7 @@ def report_stats(call: APICall, company: str, request: ReportStatsOptionRequest)
                stats_option = ReportStatsOption(
                    enabled=enabled,
                    enabled_time=datetime.utcnow(),
-                    enabled_version=current_version,
+                    enabled_version=get_version(),
                    enabled_user=call.identity.user,
                )
                updated = query.update(defaults__stats_option=stats_option)
@@ -87,7 +86,8 @@ def report_stats(call: APICall, company: str, request: ReportStatsOptionRequest)
                    raise errors.server_error.InternalError(
                        f"Failed setting report_stats to {enabled}"
                    )
-
-        result = ReportStatsOptionResponse(**stats_option.to_mongo())
+        data = stats_option.to_mongo()
+        data["current_version"] = get_version()
+        result = ReportStatsOptionResponse(**data)

    call.result.data_model = result
--- a/server/services/tasks.py
+++ b/server/services/tasks.py
@@ -1,18 +1,17 @@
 from copy import deepcopy
 from datetime import datetime
 from operator import attrgetter
-from typing import Sequence, Callable, Type, TypeVar
+from typing import Sequence, Callable, Type, TypeVar, Union

 import attr
 import dpath
 import mongoengine
-import six
 from mongoengine import EmbeddedDocument, Q
 from mongoengine.queryset.transform import COMPARISON_OPERATORS
 from pymongo import UpdateOne

 from apierrors import errors, APIError
-from apimodels.base import UpdateResponse
+from apimodels.base import UpdateResponse, IdResponse
 from apimodels.tasks import (
    StartedResponse,
    ResetResponse,
@@ -27,10 +26,19 @@ from apimodels.tasks import (
    EnqueueRequest,
    EnqueueResponse,
    DequeueResponse,
+    CloneRequest,
+    AddOrUpdateArtifactsRequest,
+    AddOrUpdateArtifactsResponse,
 )
 from bll.event import EventBLL
 from bll.queue import QueueBLL
-from bll.task import TaskBLL, ChangeStatusRequest, update_project_time, split_by
+from bll.task import (
+    TaskBLL,
+    ChangeStatusRequest,
+    update_project_time,
+    split_by,
+    ParameterKeyEscaper,
+)
 from bll.util import SetFieldsResolver
 from database.errors import translate_errors_context
 from database.model.model import Model
@@ -94,13 +102,37 @@ def get_by_id(call: APICall, company_id, req_model: TaskRequest):
        req_model.task, company_id=company_id, allow_public=True
    )
    task_dict = task.to_proper_dict()
-    conform_output_tags(call, task_dict)
+    unprepare_from_saved(call, task_dict)
    call.result.data = {"task": task_dict}


+def escape_execution_parameters(call: APICall):
+    default_prefix = "execution.parameters."
+
+    def escape_paths(paths, prefix=default_prefix):
+        return [
+            prefix + ParameterKeyEscaper.escape(path[len(prefix) :])
+            if path.startswith(prefix)
+            else path
+            for path in paths
+        ]
+
+    projection = Task.get_projection(call.data)
+    if projection:
+        Task.set_projection(call.data, escape_paths(projection))
+
+    ordering = Task.get_ordering(call.data)
+    if ordering:
+        ordering = Task.set_ordering(call.data, escape_paths(ordering, default_prefix))
+        Task.set_ordering(call.data, escape_paths(ordering, "-" + default_prefix))
+
+
 @endpoint("tasks.get_all_ex", required_fields=[])
 def get_all_ex(call: APICall):
    conform_tag_fields(call, call.data)
+
+    escape_execution_parameters(call)
+
    with translate_errors_context():
        with TimingContext("mongo", "task_get_all_ex"):
            tasks = Task.get_many_with_join(
@@ -109,13 +141,16 @@ def get_all_ex(call: APICall):
                query_options=get_all_query_options,
                allow_public=True,  # required in case projection is requested for public dataset/versions
            )
-        conform_output_tags(call, tasks)
+        unprepare_from_saved(call, tasks)
        call.result.data = {"tasks": tasks}


 @endpoint("tasks.get_all", required_fields=[])
 def get_all(call: APICall):
    conform_tag_fields(call, call.data)
+
+    escape_execution_parameters(call)
+
    with translate_errors_context():
        with TimingContext("mongo", "task_get_all"):
            tasks = Task.get_many(
@@ -125,7 +160,7 @@ def get_all(call: APICall):
                query_options=get_all_query_options,
                allow_public=True,  # required in case projection is requested for public dataset/versions
            )
-        conform_output_tags(call, tasks)
+        unprepare_from_saved(call, tasks)
        call.result.data = {"tasks": tasks}


@@ -220,6 +255,45 @@ create_fields = {
 }


+def prepare_for_save(call: APICall, fields: dict):
+    conform_tag_fields(call, fields)
+
+    # Strip all script fields (remove leading and trailing whitespace chars) to avoid unusable names and paths
+    for field in task_script_fields:
+        try:
+            path = f"script/{field}"
+            value = dpath.get(fields, path)
+            if isinstance(value, str):
+                value = value.strip()
+            dpath.set(fields, path, value)
+        except KeyError:
+            pass
+
+    parameters = safe_get(fields, "execution/parameters")
+    if parameters is not None:
+        # Escape keys to make them mongo-safe
+        parameters = {ParameterKeyEscaper.escape(k): v for k, v in parameters.items()}
+        dpath.set(fields, "execution/parameters", parameters)
+
+    return fields
+
+
+def unprepare_from_saved(call: APICall, tasks_data: Union[Sequence[dict], dict]):
+    if isinstance(tasks_data, dict):
+        tasks_data = [tasks_data]
+
+    conform_output_tags(call, tasks_data)
+
+    for task_data in tasks_data:
+        parameters = safe_get(task_data, "execution/parameters")
+        if parameters is not None:
+            # Unescape keys that were escaped to make them mongo-safe
+            parameters = {
+                ParameterKeyEscaper.unescape(k): v for k, v in parameters.items()
+            }
+            dpath.set(task_data, "execution/parameters", parameters)
+
+
 def prepare_create_fields(
    call: APICall, valid_fields=None, output=None, previous_task: Task = None
 ):
@@ -239,25 +313,7 @@ def prepare_create_fields(
            output = Output(destination=output_dest)
        fields["output"] = output

-    conform_tag_fields(call, fields)
-
-    # Strip all script fields (remove leading and trailing whitespace chars) to avoid unusable names and paths
-    for field in task_script_fields:
-        try:
-            path = "script/%s" % field
-            value = dpath.get(fields, path)
-            if isinstance(value, six.string_types):
-                value = value.strip()
-            dpath.set(fields, path, value)
-        except KeyError:
-            pass
-
-    parameters = safe_get(fields, "execution/parameters")
-    if parameters is not None:
-        parameters = {k.strip(): v for k, v in parameters.items()}
-        dpath.set(fields, "execution/parameters", parameters)
-
-    return fields
+    return prepare_for_save(call, fields)


 def _validate_and_get_task_from_call(call: APICall, **kwargs):
@@ -278,7 +334,9 @@ def validate(call: APICall, company_id, req_model: CreateRequest):
    _validate_and_get_task_from_call(call)


-@endpoint("tasks.create", request_data_model=CreateRequest)
+@endpoint(
+    "tasks.create", request_data_model=CreateRequest, response_data_model=IdResponse
+)
 def create(call: APICall, company_id, req_model: CreateRequest):
    task = _validate_and_get_task_from_call(call)

@@ -286,7 +344,26 @@ def create(call: APICall, company_id, req_model: CreateRequest):
        task.save()
        update_project_time(task.project)

-    call.result.data = {"id": task.id}
+    call.result.data_model = IdResponse(id=task.id)
+
+
+@endpoint(
+    "tasks.clone", request_data_model=CloneRequest, response_data_model=IdResponse
+)
+def clone_task(call: APICall, company_id, request: CloneRequest):
+    task = task_bll.clone_task(
+        company_id=company_id,
+        user_id=call.identity.user,
+        task_id=request.task,
+        name=request.new_task_name,
+        comment=request.new_task_comment,
+        parent=request.new_task_parent,
+        project=request.new_task_project,
+        tags=request.new_task_tags,
+        system_tags=request.new_task_system_tags,
+        execution_overrides=request.execution_overrides,
+    )
+    call.result.data_model = IdResponse(id=task.id)


 def prepare_update_fields(call: APICall, task, call_data):
@@ -296,8 +373,7 @@ def prepare_update_fields(call: APICall, task, call_data):
    t_fields = task_fields
    t_fields.add("output__error")
    fields = parse_from_call(call_data, update_fields, t_fields)
-    conform_tag_fields(call, fields)
-    return fields, valid_fields
+    return prepare_for_save(call, fields), valid_fields


 @endpoint(
@@ -324,7 +400,7 @@ def update(call: APICall, company_id, req_model: UpdateRequest):
        )

        update_project_time(updated_fields.get("project"))
-        conform_output_tags(call, updated_fields)
+        unprepare_from_saved(call, updated_fields)
        return UpdateResponse(updated=updated_count, fields=updated_fields)


@@ -449,7 +525,7 @@ def edit(call: APICall, company_id, req_model: UpdateRequest):
            fixed_fields.update(last_update=now)
            updated = task.update(upsert=False, **fixed_fields)
            update_project_time(fields.get("project"))
-            conform_output_tags(call, fields)
+            unprepare_from_saved(call, fields)
            call.result.data_model = UpdateResponse(updated=updated, fields=fields)
        else:
            call.result.data_model = UpdateResponse(updated=0)
@@ -702,7 +778,7 @@ def cleanup_task(task, force=False):
    else:
        updated_models = 0

-    event_bll.delete_task_events(task.company, task.id)
+    event_bll.delete_task_events(task.company, task.id, allow_locked=force)

    return CleanupResult(
        deleted_models=deleted_models,
@@ -837,3 +913,18 @@ def ping(_, company_id, request: PingRequest):
    TaskBLL.set_last_update(
        task_ids=[request.task], company_id=company_id, last_update=datetime.utcnow()
    )
+
+
+@endpoint(
+    "tasks.add_or_update_artifacts",
+    min_version="2.6",
+    request_data_model=AddOrUpdateArtifactsRequest,
+    response_data_model=AddOrUpdateArtifactsResponse,
+)
+def add_or_update_artifacts(
+    call: APICall, company_id, request: AddOrUpdateArtifactsRequest
+):
+    added, updated = TaskBLL.add_or_update_artifacts(
+        task_id=request.task, company_id=company_id, artifacts=request.artifacts
+    )
+    call.result.data_model = AddOrUpdateArtifactsResponse(added=added, updated=updated)
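`prepare_for_save` and `unprepare_from_saved` above escape parameter keys before storing them and restore them on the way out. The actual `ParameterKeyEscaper` lives in `bll.task` and is not shown in this diff; the class below, `SketchKeyEscaper`, is a hypothetical stand-in that uses percent-encoding purely to illustrate a reversible escape/unescape pair, and the server's scheme may well differ.

```python
class SketchKeyEscaper:
    # Hypothetical stand-in for ParameterKeyEscaper; the real scheme may differ.
    # "%" is escaped first so that unescape() is an exact inverse.
    _pairs = (("%", "%25"), (".", "%2E"), ("$", "%24"))

    @classmethod
    def escape(cls, key: str) -> str:
        for raw, escaped in cls._pairs:
            key = key.replace(raw, escaped)
        return key

    @classmethod
    def unescape(cls, key: str) -> str:
        # Undo the replacements in reverse order, restoring "%" last
        for raw, escaped in reversed(cls._pairs):
            key = key.replace(escaped, raw)
        return key


def escape_parameters(parameters: dict) -> dict:
    # Applied before saving, as prepare_for_save() does for execution/parameters
    return {SketchKeyEscaper.escape(k): v for k, v in parameters.items()}


def unescape_parameters(parameters: dict) -> dict:
    # Applied to data read back, as unprepare_from_saved() does
    return {SketchKeyEscaper.unescape(k): v for k, v in parameters.items()}


saved = escape_parameters({"optimizer.lr": "0.01", "$seed": "7"})
restored = unescape_parameters(saved)
```

Whatever the encoding, the property that matters is that unescaping an escaped key returns the original key, so clients see the names they submitted.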
--- a/server/services/users.py
+++ b/server/services/users.py
@@ -7,10 +7,7 @@ from mongoengine import Q

 from apierrors import errors
 from apimodels.base import UpdateResponse
-from apimodels.users import (
-    CreateRequest,
-    SetPreferencesRequest,
-)
+from apimodels.users import CreateRequest, SetPreferencesRequest
 from bll.user import UserBLL
 from config import config
 from database.errors import translate_errors_context
@@ -19,6 +16,7 @@ from database.model.company import Company
 from database.model.user import User
 from database.utils import parse_from_call
 from service_repo import APICall, endpoint
+from utilities.json import loads, dumps

 log = config.logger(__file__)
 get_all_query_options = User.QueryParameterOptions(list_fields=("id",))
@@ -160,7 +158,10 @@ def update(call, company_id, _):

 def get_user_preferences(call):
    user_id = call.identity.user
-    return get_user(call, user_id, ["preferences"]).get("preferences", {})
+    preferences = get_user(call, user_id, ["preferences"]).get("preferences")
+    if preferences and isinstance(preferences, str):
+        preferences = loads(preferences)
+    return preferences or {}


@endpoint("users.get_preferences")
@@ -169,9 +170,7 @@ def get_preferences(call):
    return {"preferences": get_user_preferences(call)}


-@endpoint(
-    "users.set_preferences", request_data_model=SetPreferencesRequest
-)
+@endpoint("users.set_preferences", request_data_model=SetPreferencesRequest)
 def set_preferences(call, company_id, req_model):
    # type: (APICall, str, SetPreferencesRequest) -> Dict
    assert isinstance(call, APICall)
@@ -205,9 +204,11 @@ def set_preferences(call, company_id, req_model):
        updated, fields = 0, {}
    else:
        with translate_errors_context("updating user preferences"):
-            fields = dict(preferences=new_preferences)
            updated = User.objects(id=call.identity.user, company=company_id).update(
-                upsert=False, **fields
+                upsert=False, preferences=dumps(new_preferences)
            )

-    return {"updated": updated, "fields": fields if updated else {}}
+    return {
+        "updated": updated,
+        "fields": {"preferences": new_preferences} if updated else {},
+    }
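
The `users.py` change above stores preferences as a serialized string and decodes them on read, while still accepting values that were saved as plain dicts. A minimal sketch of that round trip follows. It assumes the standard-library `json` module as a stand-in for the project's `utilities.json` helpers, whose behaviour is not shown in this diff, and the function names `encode_preferences` and `decode_preferences` are hypothetical.

```python
import json


def encode_preferences(preferences: dict) -> str:
    """Serialize preferences to a JSON string before storing them."""
    return json.dumps(preferences)


def decode_preferences(stored) -> dict:
    """Return preferences as a dict.

    Accepts JSON text (new format), a plain dict (legacy format),
    or an empty/missing value.
    """
    if stored and isinstance(stored, str):
        stored = json.loads(stored)
    return stored or {}


# Round trip through the string representation
saved = encode_preferences({"ui": {"theme.name": "dark"}})
loaded = decode_preferences(saved)
```

Because the reader checks `isinstance(stored, str)` before decoding, records written before the change keep working without a separate code path.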
--- a/server/tests/automated/test_entity_ordering.py
+++ b/server/tests/automated/test_entity_ordering.py
@@ -1,14 +1,14 @@
 import operator
 from time import sleep

-from typing import Sequence
+from typing import Sequence, Mapping

 from tests.automated import TestService


 class TestEntityOrdering(TestService):
    test_comment = "Entity ordering test"
-    only_fields = ["id", "started", "comment"]
+    only_fields = ["id", "started", "comment", "execution.parameters"]

    def setUp(self, **kwargs):
        super().setUp(**kwargs)
@@ -27,6 +27,9 @@ class TestEntityOrdering(TestService):
        # sort by the same field that we use for the search
        self._assertGetTasksWithOrdering(order_by="comment")

+        # sort by a parameter whose type is not part of the db schema
+        self._assertGetTasksWithOrdering(order_by="execution.parameters.test")
+
    def test_order_with_paging(self):
        order_field = "started"
        # all results in one page
@@ -52,7 +55,7 @@ class TestEntityOrdering(TestService):
    def _get_page_tasks(self, order_by, page: int, page_size: int) -> Sequence:
        return self.api.tasks.get_all_ex(
            only_fields=self.only_fields,
-            order_by=order_by,
+            order_by=[order_by] if isinstance(order_by, str) else order_by,
            comment=self.test_comment,
            page=page,
            page_size=page_size,
@@ -63,12 +66,19 @@ class TestEntityOrdering(TestService):
        Assert that vals are sorted in the ascending or descending order
        with None values always coming at the end
        """
-        if None in vals:
-            first_null_idx = vals.index(None)
-            none_tail = vals[first_null_idx:]
-            vals = vals[:first_null_idx]
-            self.assertTrue(all(val is None for val in none_tail))
-            self.assertTrue(all(val is not None for val in vals))
+        empty = [None, "", [], {}]
+        empty_value = None
+        idx = 0
+        for idx, val in enumerate(vals):
+            if val in empty:
+                empty_value = val
+                break
+
+        if idx < len(vals) - 1:
+            none_tail = vals[idx:]
+            vals = vals[:idx]
+            self.assertTrue(all(val == empty_value for val in none_tail))
+            self.assertTrue(all(val != empty_value for val in vals))

        if ascending:
            cmp = operator.le
@@ -76,10 +86,18 @@ class TestEntityOrdering(TestService):
            cmp = operator.ge
        self.assertTrue(all(cmp(i, j) for i, j in zip(vals, vals[1:])))

+    def _get_value_for_path(self, data: Mapping, field_path: Sequence[str]):
+        val = None
+        for name in field_path:
+            val = data.get(name)
+            data = val if isinstance(val, dict) else {}
+
+        return val
+
    def _assertGetTasksWithOrdering(self, order_by: str = None, **kwargs):
        tasks = self.api.tasks.get_all_ex(
            only_fields=self.only_fields,
-            order_by=order_by,
+            order_by=[order_by] if isinstance(order_by, str) else order_by,
            comment=self.test_comment,
            **kwargs,
        ).tasks
@@ -87,12 +105,17 @@ class TestEntityOrdering(TestService):
        if order_by:
            # test that the output is correctly ordered
            field_name = order_by if not order_by.startswith("-") else order_by[1:]
-            field_vals = [t.get(field_name) for t in tasks]
+            field_vals = [self._get_value_for_path(t, field_name.split(".")) for t in tasks]
            self._assertSorted(field_vals, ascending=not order_by.startswith("-"))

    def _create_tasks(self):
-        tasks = [self._temp_task() for _ in range(10)]
-        for _, task in zip(range(5), tasks):
+        tasks = [
+            self._temp_task(
+                **(dict(execution={"parameters": {"test": f"{i}"} if i >= 5 else {}}))
+            )
+            for i in range(10)
+        ]
+        for idx, task in zip(range(5), tasks):
            self.api.tasks.started(task=task)
            sleep(0.1)
        return tasks
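
The `_get_value_for_path` helper added in this test resolves a dotted field name such as `execution.parameters.test` against a nested task dict. The standalone sketch below copies that logic into a free function (the name `get_value_for_path` and the sample `task` dict are illustrative only) to show how missing or non-dict intermediate values resolve to `None`.

```python
from typing import Mapping, Sequence


def get_value_for_path(data: Mapping, field_path: Sequence[str]):
    """Walk nested dicts along field_path; return None if any step is missing."""
    val = None
    for name in field_path:
        val = data.get(name)
        # Continue with an empty dict when the current value cannot be descended into
        data = val if isinstance(val, dict) else {}
    return val


task = {"id": "t1", "execution": {"parameters": {"test": "7"}}}
found = get_value_for_path(task, "execution.parameters.test".split("."))
missing = get_value_for_path(task, "execution.parameters.other".split("."))
blocked = get_value_for_path({"execution": "n/a"}, ["execution", "parameters"])
```

Here `found` is the stored parameter value, while `missing` and `blocked` are both `None`, which is what lets the sorting assertion treat tasks without the parameter as empty values.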
--- a/server/tests/automated/test_task_events.py
+++ b/server/tests/automated/test_task_events.py
@@ -2,83 +2,199 @@
 Comprehensive test of all(?) use cases of datasets and frames
 """
 import json
+import time
 import unittest
+from functools import partial
 from statistics import mean
+from typing import Sequence

 import es_factory
-from config import config
 from tests.automated import TestService

-log = config.logger(__file__)
-

 class TestTaskEvents(TestService):
-    def setUp(self, version="1.7"):
+    def setUp(self, version="2.7"):
        super().setUp(version=version)

-        self.created_tasks = []
-
-        self.task = dict(
-            name="test task events",
-            type="training",
-            input=dict(mapping={}, view=dict(entries=[])),
+    def _temp_task(self, name="test task events"):
+        task_input = dict(
+            name=name, type="training", input=dict(mapping={}, view=dict(entries=[])),
        )
-        res, self.task_id = self.api.send("tasks.create", self.task, extract="id")
-        assert res.meta.result_code == 200
-        self.created_tasks.append(self.task_id)
+        return self.create_temp("tasks", **task_input)

-    def tearDown(self):
-        log.info("Cleanup...")
-        for task_id in self.created_tasks:
-            try:
-                self.api.send("tasks.delete", dict(task=task_id, force=True))
-            except Exception as ex:
-                log.exception(ex)
-
-    def create_task_event(self, type, iteration):
+    def _create_task_event(self, type_, task, iteration, **kwargs):
        return {
            "worker": "test",
-            "type": type,
-            "task": self.task_id,
+            "type": type_,
+            "task": task,
            "iter": iteration,
-            "timestamp": es_factory.get_timestamp_millis()
+            "timestamp": es_factory.get_timestamp_millis(),
+            **kwargs,
        }

-    def copy_and_update(self, src_obj, new_data):
+    def _copy_and_update(self, src_obj, new_data):
        obj = src_obj.copy()
        obj.update(new_data)
        return obj

+    def test_task_metrics(self):
+        tasks = {
+            self._temp_task(): {
+                "Metric1": ["training_debug_image"],
+                "Metric2": ["training_debug_image", "log"],
+            },
+            self._temp_task(): {"Metric3": ["training_debug_image"]},
+        }
+        events = [
+            self._create_task_event(
+                event_type,
+                task=task,
+                iteration=1,
+                metric=metric,
+                variant="Test variant",
+            )
+            for task, metrics in tasks.items()
+            for metric, event_types in metrics.items()
+            for event_type in event_types
+        ]
+        self.send_batch(events)
+        self._assert_task_metrics(tasks, "training_debug_image")
+        self._assert_task_metrics(tasks, "log")
+        self._assert_task_metrics(tasks, "training_stats_scalar")
+
+    def _assert_task_metrics(self, tasks: dict, event_type: str):
+        res = self.api.events.get_task_metrics(tasks=list(tasks), event_type=event_type)
+        for task, metrics in tasks.items():
+            res_metrics = next(
+                (tm.metrics for tm in res.metrics if tm.task == task), ()
+            )
+            self.assertEqual(
+                set(res_metrics),
+                set(
+                    metric for metric, events in metrics.items() if event_type in events
+                ),
+            )
+
+    def test_task_debug_images(self):
+        task = self._temp_task()
+        metric = "Metric1"
+        variants = [("Variant1", 7), ("Variant2", 4)]
+        iterations = 10
+
+        # test empty
+        res = self.api.events.debug_images(
+            metrics=[{"task": task, "metric": metric}],
+            iters=5,
+        )
+        self.assertFalse(res.metrics)
+
+        # create events
+        events = [
+            self._create_task_event(
+                "training_debug_image",
+                task=task,
+                iteration=n,
+                metric=metric,
+                variant=variant,
+                url=f"{metric}_{variant}_{n % unique_images}",
+            )
+            for n in range(iterations)
+            for (variant, unique_images) in variants
+        ]
+        self.send_batch(events)
+
+        # init testing
+        unique_images = [unique for (_, unique) in variants]
+        scroll_id = None
+        assert_debug_images = partial(
+            self._assertDebugImages,
+            task=task,
+            metric=metric,
+            max_iter=iterations - 1,
+            unique_images=unique_images,
+        )
+
+        # test forward navigation
+        for page in range(3):
+            scroll_id = assert_debug_images(scroll_id=scroll_id, page=page)
+
+        # test backwards navigation
+        scroll_id = assert_debug_images(
+            scroll_id=scroll_id, page=0, navigate_earlier=False
+        )
+
+        # beyond the latest iteration and back
+        res = self.api.events.debug_images(
+            metrics=[{"task": task, "metric": metric}],
+            iters=5,
+            scroll_id=scroll_id,
+            navigate_earlier=False,
+        )
+        self.assertEqual(len(res["metrics"][0]["iterations"]), 0)
+        assert_debug_images(scroll_id=scroll_id, page=1)
+
+        # refresh
+        assert_debug_images(scroll_id=scroll_id, page=0, refresh=True)
+
+    def _assertDebugImages(
+        self,
+        task,
+        metric,
+        max_iter: int,
+        unique_images: Sequence[int],
+        scroll_id,
+        page: int,
+        iters: int = 5,
+        **extra_params,
+    ):
+        res = self.api.events.debug_images(
+            metrics=[{"task": task, "metric": metric}],
+            iters=iters,
+            scroll_id=scroll_id,
+            **extra_params,
+        )
+        data = res["metrics"][0]
+        self.assertEqual(data["task"], task)
+        self.assertEqual(data["metric"], metric)
+        left_iterations = max(0, max(unique_images) - page * iters)
+        self.assertEqual(len(data["iterations"]), min(iters, left_iterations))
+        for it in data["iterations"]:
+            events_per_iter = sum(
+                1 for unique in unique_images if unique > max_iter - it["iter"]
+            )
+            self.assertEqual(len(it["events"]), events_per_iter)
+        return res.scroll_id
+
    def test_task_logs(self):
        events = []
-        for iter in range(10):
-            log_event = self.create_task_event("log", iteration=iter)
+        task = self._temp_task()
+        for iter_ in range(10):
+            log_event = self._create_task_event("log", task, iteration=iter_)
            events.append(
-                self.copy_and_update(
+                self._copy_and_update(
                    log_event,
-                    {"msg": "This is a log message from test task iter " + str(iter)},
+                    {"msg": "This is a log message from test task iter " + str(iter_)},
                )
            )
            # sleep so timestamp is not the same
-            import time
-
            time.sleep(0.01)
        self.send_batch(events)

-        data = self.api.events.get_task_log(task=self.task_id)
+        data = self.api.events.get_task_log(task=task)
        assert len(data["events"]) == 10

-        self.api.tasks.reset(task=self.task_id)
-        data = self.api.events.get_task_log(task=self.task_id)
+        self.api.tasks.reset(task=task)
+        data = self.api.events.get_task_log(task=task)
        assert len(data["events"]) == 0

    def test_task_metric_value_intervals_keys(self):
        metric = "Metric1"
        variant = "Variant1"
        iter_count = 100
+        task = self._temp_task()
        events = [
            {
-                **self.create_task_event("training_stats_scalar", iteration),
+                **self._create_task_event("training_stats_scalar", task, iteration),
                "metric": metric,
                "variant": variant,
                "value": iteration,
@@ -88,19 +204,65 @@ class TestTaskEvents(TestService):
        self.send_batch(events)
        for key in None, "iter", "timestamp", "iso_time":
            with self.subTest(key=key):
-                data = self.api.events.scalar_metrics_iter_histogram(task=self.task_id, key=key)
+                data = self.api.events.scalar_metrics_iter_histogram(task=task, key=key)
                self.assertIn(metric, data)
                self.assertIn(variant, data[metric])
                self.assertIn("x", data[metric][variant])
                self.assertIn("y", data[metric][variant])

+    def test_multitask_events_many_metrics(self):
+        tasks = [
+            self._temp_task(name="test events1"),
+            self._temp_task(name="test events2"),
+        ]
+        iter_count = 10
+        metrics_count = 10
+        variants_count = 10
+        events = [
+            {
+                **self._create_task_event("training_stats_scalar", task, iteration),
+                "metric": f"Metric{metric_idx}",
+                "variant": f"Variant{variant_idx}",
+                "value": iteration,
+            }
+            for iteration in range(iter_count)
+            for task in tasks
+            for metric_idx in range(metrics_count)
+            for variant_idx in range(variants_count)
+        ]
+        self.send_batch(events)
+        data = self.api.events.multi_task_scalar_metrics_iter_histogram(tasks=tasks)
+        self._assert_metrics_and_variants(
+            data.metrics,
+            metrics=metrics_count,
+            variants=variants_count,
+            tasks=tasks,
+            iterations=iter_count,
+        )
+
+    def _assert_metrics_and_variants(
+        self, data: dict, metrics: int, variants: int, tasks: Sequence, iterations: int
+    ):
+        self.assertEqual(len(data), metrics)
+        for m in range(metrics):
+            metric_data = data[f"Metric{m}"]
+            self.assertEqual(len(metric_data), variants)
+            for v in range(variants):
+                variant_data = metric_data[f"Variant{v}"]
+                self.assertEqual(len(variant_data), len(tasks))
+                for t in tasks:
+                    task_data = variant_data[t]
+                    self.assertEqual(len(task_data["x"]), iterations)
+                    self.assertEqual(len(task_data["y"]), iterations)
+
    def test_task_metric_value_intervals(self):
        metric = "Metric1"
        variant = "Variant1"
        iter_count = 100
+        task = self._temp_task()
        events = [
            {
-                **self.create_task_event("training_stats_scalar", iteration),
+                **self._create_task_event("training_stats_scalar", task, iteration),
                "metric": metric,
                "variant": variant,
                "value": iteration,
@@ -109,13 +271,13 @@ class TestTaskEvents(TestService):
        ]
        self.send_batch(events)

-        data = self.api.events.scalar_metrics_iter_histogram(task=self.task_id)
+        data = self.api.events.scalar_metrics_iter_histogram(task=task)
        self._assert_metrics_histogram(data[metric][variant], iter_count, 100)

-        data = self.api.events.scalar_metrics_iter_histogram(task=self.task_id, samples=100)
+        data = self.api.events.scalar_metrics_iter_histogram(task=task, samples=100)
        self._assert_metrics_histogram(data[metric][variant], iter_count, 100)

-        data = self.api.events.scalar_metrics_iter_histogram(task=self.task_id, samples=10)
+        data = self.api.events.scalar_metrics_iter_histogram(task=task, samples=10)
        self._assert_metrics_histogram(data[metric][variant], iter_count, 10)

    def _assert_metrics_histogram(self, data, iters, samples):
@@ -130,7 +292,8 @@ class TestTaskEvents(TestService):
            )

    def test_task_plots(self):
-        event = self.create_task_event("plot", 0)
+        task = self._temp_task()
+        event = self._create_task_event("plot", task, 0)
        event["metric"] = "roc"
        event.update(
            {
@@ -179,7 +342,7 @@ class TestTaskEvents(TestService):
        )
        self.send(event)

-        event = self.create_task_event("plot", 100)
+        event = self._create_task_event("plot", task, 100)
        event["metric"] = "confusion"
        event.update(
            {
@@ -222,11 +385,11 @@ class TestTaskEvents(TestService):
        )
        self.send(event)

-        data = self.api.events.get_task_plots(task=self.task_id)
+        data = self.api.events.get_task_plots(task=task)
        assert len(data["plots"]) == 2

-        self.api.tasks.reset(task=self.task_id)
-        data = self.api.events.get_task_plots(task=self.task_id)
+        self.api.tasks.reset(task=task)
+        data = self.api.events.get_task_plots(task=task)
        assert len(data["plots"]) == 0

    def send_batch(self, events):
--- a/server/tests/automated/test_tasks_edit.py
+++ b/server/tests/automated/test_tasks_edit.py
@@ -6,6 +6,9 @@ log = config.logger(__file__)


 class TestTasksEdit(TestService):
+    def setUp(self, **kwargs):
+        super().setUp(version=2.5)
+
    def new_task(self, **kwargs):
        return self.create_temp(
            "tasks", type="testing", name="test", input=dict(view=dict()), **kwargs
@@ -34,3 +37,39 @@ class TestTasksEdit(TestService):
        self.api.models.edit(model=not_ready_model, ready=False)
        self.assertFalse(self.api.models.get_by_id(model=not_ready_model).model.ready)
        self.api.tasks.edit(task=task, execution=dict(model=not_ready_model))
+
+    def test_clone_task(self):
+        script = dict(
+            binary="python",
+            requirements=dict(pip=["six"]),
+            repository="https://example.come/foo/bar",
+            entry_point="test.py",
+            diff="foo",
+        )
+        execution = dict(parameters=dict(test="Test"))
+        tags = ["hello"]
+        system_tags = ["development", "test"]
+        task = self.new_task(
+            script=script, execution=execution, tags=tags, system_tags=system_tags
+        )
+
+        new_name = "new test"
+        new_tags = ["by"]
+        execution_overrides = dict(framework="Caffe")
+        new_task_id = self.api.tasks.clone(
+            task=task,
+            new_task_name=new_name,
+            new_task_tags=new_tags,
+            execution_overrides=execution_overrides,
+            new_task_parent=task,
+        ).id
+        new_task = self.api.tasks.get_by_id(task=new_task_id).task
+        self.assertEqual(new_task.name, new_name)
+        self.assertEqual(new_task.type, "testing")
+        self.assertEqual(new_task.tags, new_tags)
+        self.assertEqual(new_task.status, "created")
+        self.assertEqual(new_task.script, script)
+        self.assertEqual(new_task.parent, task)
+        self.assertEqual(new_task.execution.parameters, execution["parameters"])
+        self.assertEqual(new_task.execution.framework, execution_overrides["framework"])
+        self.assertEqual(new_task.system_tags, [])
--- a/server/tests/automated/test_workers.py
+++ b/server/tests/automated/test_workers.py
@@ -108,7 +108,7 @@ class TestWorkersService(TestService):
        from_date = to_date - timedelta(days=1)

        # no variants
-        res = self.api.workers.get_statistics(
+        res = self.api.workers.get_stats(
            items=[
                dict(key="cpu_usage", aggregation="avg"),
                dict(key="cpu_usage", aggregation="max"),
@@ -142,7 +142,7 @@ class TestWorkersService(TestService):
        )

        # split by variants
-        res = self.api.workers.get_statistics(
+        res = self.api.workers.get_stats(
            items=[dict(key="cpu_usage", aggregation="avg")],
            from_date=from_date.timestamp(),
            to_date=to_date.timestamp(),
@@ -165,7 +165,7 @@ class TestWorkersService(TestService):

        assert all(_check_metric_and_variants(worker) for worker in res["workers"])

-        res = self.api.workers.get_statistics(
+        res = self.api.workers.get_stats(
            items=[dict(key="cpu_usage", aggregation="avg")],
            from_date=from_date.timestamp(),
            to_date=to_date.timestamp(),
--- a/server/tests/requirements.txt
+++ b/server/tests/requirements.txt
@@ -1 +1,2 @@
-numpy>=1.12.1
+nose==1.3.7
+parameterized>=0.7.1
--- a/server/updates.py
+++ b/server/updates.py
@@ -8,8 +8,9 @@ import requests
 from semantic_version import Version

 from config import config
+from config.info import get_version
 from database.model.settings import Settings
-from version import __version__ as current_version
+from utilities.threads_manager import ThreadsManager

 log = config.logger(__name__)

@@ -48,7 +49,7 @@ class CheckUpdatesThread(Thread):

        response = requests.get(
            url,
-            json={"versions": {self.component_name: str(current_version)}, "uid": uid},
+            json={"versions": {self.component_name: str(get_version())}, "uid": uid},
            timeout=float(
                config.get("apiserver.check_for_updates.request_timeout_sec", 3.0)
            ),
@@ -65,7 +66,7 @@ class CheckUpdatesThread(Thread):
        if not latest_version:
            return

-        cur_version = Version(current_version)
+        cur_version = Version(get_version())
        latest_version = Version(latest_version)
        if cur_version >= latest_version:
            return
@@ -80,7 +81,16 @@ class CheckUpdatesThread(Thread):
        )

    def _check_updates(self):
-        while True:
+        update_interval_sec = max(
+            float(
+                config.get(
+                    "apiserver.check_for_updates.check_interval_sec",
+                    60 * 60 * 24,
+                )
+            ),
+            60 * 5,
+        )
+        while not ThreadsManager.terminating:
            # noinspection PyBroadException
            try:
                response = self._check_new_version_available()
@@ -98,17 +108,7 @@ class CheckUpdatesThread(Thread):
            except Exception:
                log.exception("Failed obtaining updates")

-            sleep(
-                max(
-                    float(
-                        config.get(
-                            "apiserver.check_for_updates.check_interval_sec",
-                            60 * 60 * 24,
-                        )
-                    ),
-                    60 * 5,
-                )
-            )
+            sleep(update_interval_sec)


 check_updates_thread = CheckUpdatesThread()
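
The `updates.py` change replaces `while True` with a loop that checks a shared `terminating` flag, so the thread can exit once shutdown is requested. A minimal sketch of that pattern follows. The `Manager` class is a hypothetical stand-in for `ThreadsManager`, and the very short interval is chosen only so the example finishes quickly.

```python
import time
from threading import Thread


class Manager:
    # Hypothetical stand-in for ThreadsManager.terminating
    terminating = False


def poll(results: list, interval_sec: float) -> None:
    """Record a timestamp per iteration until termination is requested."""
    while not Manager.terminating:
        results.append(time.monotonic())
        time.sleep(interval_sec)


results = []
worker = Thread(target=poll, args=(results, 0.01), daemon=True)
worker.start()

time.sleep(0.05)            # let the loop run a few times
Manager.terminating = True  # request shutdown
worker.join(timeout=1)
```

Note that the flag is only examined between sleeps, so with a long interval the loop notices a shutdown request only after the current sleep ends.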
--- a/server/utilities/dicts.py
+++ b/server/utilities/dicts.py
@@ -12,6 +12,24 @@ def flatten_nested_items(
    for key, value in dictionary.items():
        path = prefix + (key,)
        if isinstance(value, dict) and nesting != 0:
-            yield from flatten_nested_items(value, next_nesting, include_leaves, prefix=path)
+            yield from flatten_nested_items(
+                value, next_nesting, include_leaves, prefix=path
+            )
        elif include_leaves is None or key in include_leaves:
            yield path, value
+
+
+def deep_merge(source: dict, override: dict) -> dict:
+    """
+    Merge the override dict into the source in-place
+    Contrary to the dpath.merge the sequences are not expanded
+    If override contains the sequence with the same name as source
+    then the whole sequence in the source is overridden
+    """
+    for key, value in override.items():
+        if key in source and isinstance(source[key], dict) and isinstance(value, dict):
+            deep_merge(source[key], value)
+        else:
+            source[key] = value
+
+    return source
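
The docstring of `deep_merge` says that nested dicts are merged in place while sequences are replaced as a whole. The sketch below repeats the function from the diff so it runs on its own and applies it to a made-up `base` dict to show both behaviours.

```python
def deep_merge(source: dict, override: dict) -> dict:
    """Merge override into source in-place; lists are replaced, not extended."""
    for key, value in override.items():
        if key in source and isinstance(source[key], dict) and isinstance(value, dict):
            deep_merge(source[key], value)
        else:
            source[key] = value
    return source


# Illustrative data only
base = {
    "execution": {"framework": "Keras", "parameters": {"lr": "0.1"}},
    "tags": ["a", "b"],
}
merged = deep_merge(base, {"execution": {"framework": "Caffe"}, "tags": ["c"]})
```

After the call, `execution.parameters` is untouched, `execution.framework` is overridden, and `tags` holds only the override's list. The function returns the same object it was given, so `merged is base`.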
--- a/server/utilities/threads_manager.py
+++ b/server/utilities/threads_manager.py
@@ -1,10 +1,12 @@
 from functools import wraps
 from threading import Lock, Thread
+from typing import ClassVar


 class ThreadsManager:
    objects = {}
    lock = Lock()
+    terminating: ClassVar[bool] = False

    def __init__(self, name=None, **threads):
        super(ThreadsManager, self).__init__()
@@ -12,7 +14,7 @@ class ThreadsManager:
        self.objects = {}
        self.lock = Lock()

-        for name, thread in threads.items():
+        for thread_name, thread in threads.items():
            if issubclass(thread, Thread):
                thread = thread()
                thread.start()
@@ -20,9 +22,9 @@ class ThreadsManager:
                if not thread.is_alive():
                    thread.start()
            else:
-                raise Exception(f"Expected thread or thread class ({name}): {thread}")
+                raise Exception(f"Expected thread or thread class ({thread_name}): {thread}")

-            self.objects[name] = thread
+            self.objects[thread_name] = thread

    def register(self, thread_name, daemon=True):
        def decorator(f):
--- a/server/version.py
+++ b/server/version.py
@@ -1 +1 @@
-__version__ = "0.12.0"
+__version__ = "0.14.0"
--- a/webserver/README.md
+++ b/webserver/README.md
@@ -2,13 +2,13 @@

 ## Introduction

-The webserver is the **trains-server**'s component responsible for serving the TRAINS webapp.
+The webserver is the **trains-server**'s component responsible for serving the Trains webapp.
 For this purpose, we use an [NGINX](https://www.nginx.com/) server.

 ## Configuration

-In order to serve the TRAINS webapp, the following is required:
-* The pre-built TRAINS webapp should be copied to the NGINX html directory (usually `/usr/share/nginx/html`)
+In order to serve the Trains webapp, the following is required:
+* The pre-built Trains webapp should be copied to the NGINX html directory (usually `/usr/share/nginx/html`)
 * The default NGINX port (usually `80`) should be changed to match the **trains-server** configuration (usually `8080`)

 NOTE: This configuration may vary in different systems, depending on the NGINX version and distribution used.
Author	SHA1	Message	Date
allegroai	fb5c06e9c3	Version bump to v0.14.0	2020-03-05 20:03:48 +02:00
allegroai	1a9bbc9420	Update docs with AMI IDs for v0.14.0	2020-03-05 20:03:33 +02:00
allegroai	294da32401	Fix getting empty metrics from task	2020-03-05 14:57:20 +02:00
allegroai	7f00672010	Fix missing routing value when downloading tasks events	2020-03-05 14:55:40 +02:00
allegroai	99bf89a360	Add pre-populate feature to allow starting a new server installation with packaged example experiments	2020-03-05 14:54:34 +02:00
allegroai	6c8508eb7f	Add support for pagination in events.debug_images	2020-03-01 18:00:07 +02:00
allegroai	69714d5b5c	Use top-level module for api version number instead of a fixed value	2020-03-01 17:51:03 +02:00
allegroai	f9516ec7d3	Fix ActualEnumField initialization in case default was not provided	2020-03-01 17:47:47 +02:00
allegroai	6fdde93dee	Add migration script	2020-03-01 17:46:10 +02:00
allegroai	7afc71ec91	Update requirements	2020-02-26 17:26:59 +02:00
allegroai	4595117d91	Support setting fileserver upload folder using an environment variable	2020-02-26 17:26:46 +02:00
allegroai	8630cc1021	Fix queue update time to update when task is taken from queue, not when queried	2020-02-20 18:26:56 +02:00
allegroai	135885b609	Improve unit test for entity ordering	2020-02-04 18:21:13 +02:00
allegroai	eb0865662c	Fix projects aggregation on tasks with invalid status	2020-02-04 18:21:04 +02:00
allegroai	b7b94e7ae5	Add more validation when parsing task call	2020-02-04 18:19:07 +02:00
allegroai	72be8bee19	Limit metrics and variants to avoid ES error	2020-02-04 18:18:26 +02:00
allegroai	0722b20c1c	Fix task scalars comparison aggregation	2020-02-04 18:16:27 +02:00
allegroai	a392a0e6ff	Fix request field required constraint	2020-02-04 18:12:30 +02:00
allegroai	e22fa2f478	Limit dpath requirement	2020-02-04 18:09:55 +02:00
allegroai	8b49c1ac06	Update docs with AWS AMI IDs for v0.13.0	2020-01-07 14:40:09 +02:00
allegroai	da1182a405	Update docs with AWS AMI IDs for v0.13.0	2020-01-06 18:41:09 +02:00
allegroai	53e995ee8c	Version bump to v0.13.0	2020-01-06 15:28:31 +02:00
allegroai	4732dc1a88	Remove deprecated env vars from docker compose files	2020-01-06 12:23:06 +02:00
allegroai	e325bcaf67	Hash ROI id to make sure it does not violate Elastic's 512 bytes id limitation	2020-01-05 09:20:38 +02:00
allegroai	a7c30453db	Update documentation	2020-01-05 09:19:37 +02:00
allegroai	dedac3b2fe	Allow using "$", "." and whitespaces in hyper-parameter keys	2020-01-02 15:28:50 +02:00
allegroai	7d10bbdf8e	Update requirement	2020-01-02 15:27:04 +02:00
allegroai	72213dffa4	Update migration to convert user preferences to JSON	2020-01-02 15:26:45 +02:00
allegroai	f778837d4b	Change the way user preferences are stored (JSON instead of plain dict)	2020-01-02 15:23:47 +02:00
allegroai	153ed6a7b7	Update documentation	2020-01-02 15:21:35 +02:00
allegroai	5d279c8c5a	Add fixed user validation. Fix the way a fixed user id is generated	2020-01-02 15:20:55 +02:00
allegroai	ed910d5f6a	Improve server threads shutdown on SIGTERM	2019-12-29 09:04:07 +02:00
allegroai	87d2b6fa15	Add some missing definitions	2019-12-29 09:03:19 +02:00
allegroai	94cfb17291	Add minor updates	2019-12-29 09:02:32 +02:00
allegroai	3f641d37b7	Optimize empty schema validator usage	2019-12-29 08:59:52 +02:00
allegroai	551be12f01	Move mongodb migrations inside the server's folder	2019-12-29 08:58:54 +02:00
allegroai	b536020058	Update documentation	2019-12-29 08:47:47 +02:00
Allegro AI	fb6fbc0a06	Update README.md	2019-12-25 14:21:16 +02:00
allegroai	5ae64fd791	Add support for tasks.clone	2019-12-24 18:01:48 +02:00
allegroai	f9776e4319	Allow two users to have the same full name	2019-12-24 17:58:59 +02:00
allegroai	75e736e7d5	Update readme files	2019-12-24 17:58:02 +02:00
allegroai	1e4756aa1d	Add support for atomic add/update of task artifacts	2019-12-24 17:57:26 +02:00
allegroai	52529d3c55	Avoid updating experiment last iteration for metric events related to machine/gpu monitoring	2019-12-21 18:14:13 +02:00
allegroai	53296e8891	Use a single definitive way to obtain server version and build	2019-12-21 18:13:05 +02:00
allegroai	1c87ebc900	Use trains-specific environment variables for server configuration	2019-12-21 18:10:48 +02:00
allegroai	14d9924ea0	Update .gitignore	2019-12-21 18:09:04 +02:00
allegroai	69f9b424c7	Update readme and documentation	2019-12-19 18:27:16 +02:00
allegroai	1a6da301a8	Update internal version string	2019-12-19 18:26:19 +02:00
allegroai	2728b3ed14	Add labels to standalone models	2019-12-14 23:54:24 +02:00
allegroai	38284eef1f	Add safe guards	2019-12-14 23:53:09 +02:00
allegroai	9debe1adcd	Improve resource monitoring	2019-12-14 23:52:39 +02:00
allegroai	cc93c15f8a	Optimize ELK	2019-12-14 23:50:26 +02:00