Print error on resource monitor failure

Fix git+ssh:// links inside installed packages not being converted properly to HTTPS authenticated and vice versa
Support new Retry.DEFAULT_BACKOFF_MAX in a backwards-compatible way
2025-06-26 18:16:15 +00:00 · 2023-05-11 16:18:11 +03:00 · 2023-05-11 16:16:51 +03:00 · 2023-05-11 16:16:18 +03:00 · 2023-05-11 16:15:06 +03:00 · 2023-04-10 10:58:10 +03:00
58 changed files with 5598 additions and 1026 deletions
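
The `git+ssh://` fix listed above concerns rewriting package links between SSH and authenticated HTTPS forms. The sketch below illustrates the general idea only; it is not the agent's implementation. The function name `ssh_to_https` and its parameters are hypothetical, and it assumes any SSH port can be dropped because it does not apply to HTTPS.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def ssh_to_https(url, user=None, password=None):
    """Rewrite an ssh:// (or git+ssh://) link as https, optionally with credentials.

    Hypothetical helper for illustration; not part of clearml-agent.
    """
    prefix = ""
    if url.startswith("git+"):
        # keep the pip-style "git+" prefix and convert the rest
        prefix, url = "git+", url[len("git+"):]

    parts = urlsplit(url)
    if parts.scheme != "ssh":
        # nothing to convert, return the link unchanged
        return prefix + url

    # drop the SSH user (usually "git") and the SSH port
    netloc = parts.hostname or ""
    if user and password:
        # percent-encode credentials so characters like "@" stay unambiguous
        netloc = "{}:{}@{}".format(quote(user, safe=""), quote(password, safe=""), netloc)

    return prefix + urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))
```

For example, `ssh_to_https("git+ssh://git@github.com/allegroai/clearml.git")` yields a `git+https://github.com/...` link with the same path.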
--- a/README.md
+++ b/README.md
@@ -8,15 +8,15 @@ ML-Ops scheduler & orchestration solution supporting Linux, macOS and Windows**
 [![GitHub license](https://img.shields.io/github/license/allegroai/clearml-agent.svg)](https://img.shields.io/github/license/allegroai/clearml-agent.svg)
 [![PyPI pyversions](https://img.shields.io/pypi/pyversions/clearml-agent.svg)](https://img.shields.io/pypi/pyversions/clearml-agent.svg)
 [![PyPI version shields.io](https://img.shields.io/pypi/v/clearml-agent.svg)](https://img.shields.io/pypi/v/clearml-agent.svg)
+[![PyPI Downloads](https://pepy.tech/badge/clearml-agent/month)](https://pypi.org/project/clearml-agent/)
 [![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/allegroai)](https://artifacthub.io/packages/search?repo=allegroai)
-    
 </div>

 ---

 ### ClearML-Agent
-#### *Formerly known as Trains Agent*

+#### *Formerly known as Trains Agent*

 * Run jobs (experiments) on any local or cloud based resource
 * Implement optimized resource utilization policies
@@ -24,23 +24,31 @@ ML-Ops scheduler & orchestration solution supporting Linux, macOS and Windows**
 * Launch-and-Forget service containers
 * [Cloud autoscaling](https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler)
 * [Customizable cleanup](https://clear.ml/docs/latest/docs/guides/services/cleanup_service)
-* Advanced [pipeline building and execution](https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline)
+* Advanced
+  [pipeline building and execution](https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline)

 It is a zero configuration fire-and-forget execution agent, providing a full ML/DL cluster solution.

 **Full Automation in 5 steps**
-1. ClearML Server [self-hosted](https://github.com/allegroai/clearml-server) or [free tier hosting](https://app.community.clear.ml)
-2. `pip install clearml-agent` ([install](#installing-the-clearml-agent) the ClearML Agent on any GPU machine: on-premises / cloud / ...)
-3. Create a [job](https://github.com/allegroai/clearml/docs/clearml-task.md) or Add [ClearML](https://github.com/allegroai/clearml) to your code with just 2 lines
-4. Change the [parameters](#using-the-clearml-agent) in the UI & schedule for [execution](#using-the-clearml-agent) (or automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))
+
+1. ClearML Server [self-hosted](https://github.com/allegroai/clearml-server)
+   or [free tier hosting](https://app.clear.ml)
+2. `pip install clearml-agent` ([install](#installing-the-clearml-agent) the ClearML Agent on any GPU machine:
+   on-premises / cloud / ...)
+3. Create a [job](https://github.com/allegroai/clearml/docs/clearml-task.md) or
+   Add [ClearML](https://github.com/allegroai/clearml) to your code with just 2 lines
+4. Change the [parameters](#using-the-clearml-agent) in the UI & schedule for [execution](#using-the-clearml-agent) (or
+   automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))
 5. :chart_with_downwards_trend: :chart_with_upwards_trend: :eyes:  :beer:

 "All the Deep/Machine-Learning DevOps your research needs, and then some... Because ain't nobody got time for that"

-**Try ClearML now** [Self Hosted](https://github.com/allegroai/clearml-server) or [Free tier Hosting](https://app.community.clear.ml)
-<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/clearml-agent/blob/master/docs/screenshots.gif?raw=true" width="100%"></a>
+**Try ClearML now** [Self Hosted](https://github.com/allegroai/clearml-server)
+or [Free tier Hosting](https://app.clear.ml)
+<a href="https://app.clear.ml"><img src="https://github.com/allegroai/clearml-agent/blob/master/docs/screenshots.gif?raw=true" width="100%"></a>

 ### Simple, Flexible Experiment Orchestration
+
 **The ClearML Agent was built to address the DL/ML R&D DevOps needs:**

 * Easily add & remove machines from the cluster
@@ -56,18 +64,23 @@ It is a zero configuration fire-and-forget execution agent, providing a full ML/

 *epsilon - Because we are :triangular_ruler: and nothing is really zero work

-
 ### Kubernetes Integration (Optional)
-We think Kubernetes is awesome, but it should be a choice.
-We designed `clearml-agent` so you can run bare-metal or inside a pod with any mix that fits your environment.
-#### Benefits of integrating existing K8s with ClearML-Agent 
+
+We think Kubernetes is awesome, but it should be a choice. We designed `clearml-agent` so you can run bare-metal or
+inside a pod with any mix that fits your environment.
+
+Find Dockerfiles in the [docker](./docker) dir and a helm Chart in https://github.com/allegroai/clearml-helm-charts
+
+#### Benefits of integrating existing K8s with ClearML-Agent
+
 - ClearML-Agent adds the missing scheduling capabilities to K8s
 - Allowing for more flexible automation from code
 - A programmatic interface for easier learning curve (and debugging)
 - Seamless integration with ML/DL experiment manager
- Web UI for customization, scheduling & prioritization of jobs 
+- Web UI for customization, scheduling & prioritization of jobs
+
+**Two K8s integration flavours**

-**Two K8s integration flavours** 
 - Spin ClearML-Agent as a long-lasting service pod
    - use [clearml-agent](https://hub.docker.com/r/allegroai/clearml-agent) docker image
    - map docker socket into the pod (soon replaced by [podman](https://github.com/containers/podman))
@@ -75,57 +88,66 @@ We designed `clearml-agent` so you can run bare-metal or inside a pod with any m
    - benefits: full use of the ClearML scheduling, no need to worry about wrong container images / lost pods etc.
    - downside: Sibling containers
 - Kubernetes Glue, map ClearML jobs directly to K8s jobs
-    - Run the [clearml-k8s glue](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py) on a K8s cpu node
-    - The clearml-k8s glue pulls jobs from the ClearML job execution queue and prepares a K8s job (based on provided yaml template)
-    - Inside the pod itself the clearml-agent will install the job (experiment) environment and spin and monitor the experiment's process
+    - Run the [clearml-k8s glue](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py) on
+      a K8s cpu node
+    - The clearml-k8s glue pulls jobs from the ClearML job execution queue and prepares a K8s job (based on provided
+      yaml template)
+    - Inside the pod itself the clearml-agent will install the job (experiment) environment and spin and monitor the
+      experiment's process
    - benefits: Kubernetes full view of all running jobs in the system
-    - downside: No real scheduling (k8s scheduler), no docker image verification (post-mortem only) 
+    - downside: No real scheduling (k8s scheduler), no docker image verification (post-mortem only)

 ### Using the ClearML Agent
+
 **Full scale HPC with a click of a button**

-The ClearML Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the job and monitors its progress.
+The ClearML Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the
+job and monitors its progress.

 Any 'Draft' experiment can be scheduled for execution by a ClearML agent.

 A previously run experiment can be put into 'Draft' state by either of two methods:
-* Using the **'Reset'** action from the experiment right-click context menu in the
-  ClearML UI - This will clear any results and artifacts the previous run had created.
-* Using the **'Clone'** action from the experiment right-click context menu in the
-  ClearML UI - This will create a new 'Draft' experiment with the same configuration as the original experiment.

-An experiment is scheduled for execution using the **'Enqueue'** action from the experiment
- right-click context menu in the ClearML UI and selecting the execution queue.
+* Using the **'Reset'** action from the experiment right-click context menu in the ClearML UI - This will clear any
+  results and artifacts the previous run had created.
+* Using the **'Clone'** action from the experiment right-click context menu in the ClearML UI - This will create a new
+  'Draft' experiment with the same configuration as the original experiment.
+
+An experiment is scheduled for execution using the **'Enqueue'** action from the experiment right-click context menu in
+the ClearML UI and selecting the execution queue.

 See [creating an experiment and enqueuing it for execution](#from-scratch).

 Once an experiment is enqueued, it will be picked up and executed by a ClearML agent monitoring this queue.

 The ClearML UI Workers & Queues page provides ongoing execution information:
-  - Workers Tab: Monitor you cluster
+
+- Workers Tab: Monitor your cluster
    - Review available resources
    - Monitor machines statistics (CPU / GPU / Disk / Network)
-  - Queues Tab:
+- Queues Tab:
    - Control the scheduling order of jobs
    - Cancel or abort job execution
    - Move jobs between execution queues

 #### What The ClearML Agent Actually Does
+
 The ClearML Agent executes experiments using the following process:
-  - Create a new virtual environment (or launch the selected docker image)
-  - Clone the code into the virtual-environment (or inside the docker)
-  - Install python packages based on the package requirements listed for the experiment
-    - Special note for PyTorch: The ClearML Agent will automatically select the
-      torch packages based on the CUDA_VERSION environment variable of the machine
-  - Execute the code, while monitoring the process
-  - Log all stdout/stderr in the ClearML UI, including the cloning and installation process, for easy debugging
-  - Monitor the execution and allow you to manually abort the job using the ClearML UI (or, in the unfortunate case of a code crash, catch the error and signal the experiment has failed)
+
+- Create a new virtual environment (or launch the selected docker image)
+- Clone the code into the virtual-environment (or inside the docker)
+- Install python packages based on the package requirements listed for the experiment
+    - Special note for PyTorch: The ClearML Agent will automatically select the torch packages based on the CUDA_VERSION
+      environment variable of the machine
+- Execute the code, while monitoring the process
+- Log all stdout/stderr in the ClearML UI, including the cloning and installation process, for easy debugging
+- Monitor the execution and allow you to manually abort the job using the ClearML UI (or, in the unfortunate case of a
+  code crash, catch the error and signal the experiment has failed)

 #### System Design & Flow

 <img src="https://github.com/allegroai/clearml-agent/blob/master/docs/clearml_architecture.png" width="100%" alt="clearml-architecture">

-
 #### Installing the ClearML Agent

 ```bash
@@ -135,6 +157,7 @@ pip install clearml-agent
 #### ClearML Agent Usage Examples

 Full Interface and capabilities are available with
+
 ```bash
 clearml-agent --help
 clearml-agent daemon --help
@@ -146,7 +169,8 @@ clearml-agent daemon --help
 clearml-agent init
 ```

-Note: The ClearML Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default ClearML Agent cache folder is `~/.clearml`
+Note: The ClearML Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default
+ClearML Agent cache folder is `~/.clearml`

 See full details in your configuration file at `~/clearml.conf`

@@ -156,29 +180,36 @@ They are designed to share the same configuration file, see example [here](docs/
 #### Running the ClearML Agent

 For debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen
+
 ```bash
 clearml-agent daemon --queue default --foreground
 ```

 For actual service mode, all the stdout will be stored automatically into a temporary file (no need to pipe)
 Notice: with `--detached` flag, the *clearml-agent* will be running in the background
+
 ```bash
 clearml-agent daemon --detached --queue default
 ```

-GPU allocation is controlled via the standard OS environment `NVIDIA_VISIBLE_DEVICES` or `--gpus` flag (or disabled with `--cpu-only`).
+GPU allocation is controlled via the standard OS environment `NVIDIA_VISIBLE_DEVICES` or `--gpus` flag (or disabled
+with `--cpu-only`).

-If no flag is set, and `NVIDIA_VISIBLE_DEVICES` variable doesn't exist, all GPU's will be allocated for the `clearml-agent` <br>
-If `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES` is an empty string (""), no gpu will be allocated for the `clearml-agent`
+If no flag is set, and `NVIDIA_VISIBLE_DEVICES` variable doesn't exist, all GPU's will be allocated for
+the `clearml-agent` <br>
+If `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES="none"`, no gpu will be allocated for
+the `clearml-agent`

 Example: spin two agents, one per gpu on the same machine:
 Notice: with `--detached` flag, the *clearml-agent* will be running in the background
+
 ```bash
 clearml-agent daemon --detached --gpus 0 --queue default
 clearml-agent daemon --detached --gpus 1 --queue default
 ```

 Example: spin two agents, pulling from dedicated `dual_gpu` queue, two gpu's per agent
+
 ```bash
 clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu
 clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu
@@ -187,23 +218,29 @@ clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu
 ##### Starting the ClearML Agent in docker mode

 For debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen
+
 ```bash
 clearml-agent daemon --queue default --docker --foreground
 ```

 For actual service mode, all the stdout will be stored automatically into a file (no need to pipe)
 Notice: with `--detached` flag, the *clearml-agent* will be running in the background
+
 ```bash
 clearml-agent daemon --detached --queue default --docker
 ```

-Example: spin two agents, one per gpu on the same machine, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:
+Example: spin two agents, one per gpu on the same machine, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
+docker:
+
 ```bash
 clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
 clearml-agent daemon --detached --gpus 1 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
 ```

-Example: spin two agents, pulling from dedicated `dual_gpu` queue, two gpu's per agent, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:
+Example: spin two agents, pulling from dedicated `dual_gpu` queue, two gpu's per agent, with default
+nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:
+
 ```bash
 clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
 clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
@@ -214,55 +251,61 @@ clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda
 Priority Queues are also supported, example use case:

 High priority queue: `important_jobs`  Low priority queue: `default`
+
 ```bash
 clearml-agent daemon --queue important_jobs default
 ```
-The **ClearML Agent** will first try to pull jobs from the `important_jobs` queue, only then it will fetch a job from the `default` queue.

-Adding queues, managing job order within a queue and moving jobs between queues, is available using the Web UI, see example on our [free server](https://app.community.clear.ml/workers-and-queues/queues)
+The **ClearML Agent** will first try to pull jobs from the `important_jobs` queue, only then it will fetch a job from
+the `default` queue.
+
+Adding queues, managing job order within a queue and moving jobs between queues, is available using the Web UI, see
+example on our [free server](https://app.clear.ml/workers-and-queues/queues)

 ##### Stopping the ClearML Agent

-To stop a **ClearML Agent** running in the background, run the same command line used to start the agent with `--stop` appended.
-For example, to stop the first of the above shown same machine, single gpu agents:
+To stop a **ClearML Agent** running in the background, run the same command line used to start the agent with `--stop`
+appended. For example, to stop the first of the above shown same machine, single gpu agents:
+
 ```bash
 clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --stop
 ```

 ### How do I create an experiment on the ClearML Server? <a name="from-scratch"></a>
+
 * Integrate [ClearML](https://github.com/allegroai/clearml) with your code
 * Execute the code on your machine (Manually / PyCharm / Jupyter Notebook)
 * As your code is running, **ClearML** creates an experiment logging all the necessary execution information:
-  - Git repository link and commit ID (or an entire jupyter notebook)
-  - Git diff (we’re not saying you never commit and push, but still...)
-  - Python packages used by your code (including specific versions used)
-  - Hyper-Parameters
-  - Input Artifacts
+    - Git repository link and commit ID (or an entire jupyter notebook)
+    - Git diff (we’re not saying you never commit and push, but still...)
+    - Python packages used by your code (including specific versions used)
+    - Hyper-Parameters
+    - Input Artifacts

  You now have a 'template' of your experiment with everything required for automated execution

-* In the ClearML UI, Right click on the experiment and select 'clone'. A copy of your experiment will be created.
+* In the ClearML UI, Right-click on the experiment and select 'clone'. A copy of your experiment will be created.
 * You now have a new draft experiment cloned from your original experiment, feel free to edit it
-  - Change the Hyper-Parameters
-  - Switch to the latest code base of the repository
-  - Update package versions
-  - Select a specific docker image to run in (see docker execution mode section)
-  - Or simply change nothing to run the same experiment again...
+    - Change the Hyper-Parameters
+    - Switch to the latest code base of the repository
+    - Update package versions
+    - Select a specific docker image to run in (see docker execution mode section)
+    - Or simply change nothing to run the same experiment again...
 * Schedule the newly created experiment for execution: Right-click the experiment and select 'enqueue'

 ### ClearML-Agent Services Mode <a name="services"></a>

-ClearML-Agent Services is a special mode of ClearML-Agent that provides the ability to launch long-lasting jobs
-that previously had to be executed on local / dedicated machines. It allows a single agent to
-launch multiple dockers (Tasks) for different use cases. To name a few use cases, auto-scaler service (spinning instances
-when the need arises and the budget allows), Controllers (Implementing pipelines and more sophisticated DevOps logic),
-Optimizer (such as Hyper-parameter Optimization or sweeping), and Application (such as interactive Bokeh apps for
-increased data transparency)
+ClearML-Agent Services is a special mode of ClearML-Agent that provides the ability to launch long-lasting jobs that
+previously had to be executed on local / dedicated machines. It allows a single agent to launch multiple dockers (Tasks)
+for different use cases. To name a few use cases, auto-scaler service (spinning instances when the need arises and the
+budget allows), Controllers (Implementing pipelines and more sophisticated DevOps logic), Optimizer (such as
+Hyper-parameter Optimization or sweeping), and Application (such as interactive Bokeh apps for increased data
+transparency)

-ClearML-Agent Services mode will spin **any** task enqueued into the specified queue.
-Every task launched by ClearML-Agent Services will be registered as a new node in the system,
-providing tracking and transparency capabilities.
-Currently clearml-agent in services-mode supports cpu only configuration. ClearML-agent services mode can be launched alongside GPU agents.
+ClearML-Agent Services mode will spin **any** task enqueued into the specified queue. Every task launched by
+ClearML-Agent Services will be registered as a new node in the system, providing tracking and transparency capabilities.
+Currently clearml-agent in services-mode supports cpu only configuration. ClearML-agent services mode can be launched
+alongside GPU agents.

 ```bash
 clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
@@ -270,22 +313,27 @@ clearml-agent daemon --services-mode --detached --queue services --create-queue

 **Note**: It is the user's responsibility to make sure the proper tasks are pushed into the specified queue.

-
 ### AutoML and Orchestration Pipelines <a name="automl-pipes"></a>
-The ClearML Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the ClearML package.

-Sample AutoML & Orchestration examples can be found in the ClearML [example/automation](https://github.com/allegroai/clearml/tree/master/examples/automation) folder.
+The ClearML Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the
+ClearML package.
+
+Sample AutoML & Orchestration examples can be found in the
+ClearML [example/automation](https://github.com/allegroai/clearml/tree/master/examples/automation) folder.

 AutoML examples
-  - [Toy Keras training experiment](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
+
+- [Toy Keras training experiment](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
    - In order to create an experiment-template in the system, this code must be executed once manually
-  - [Random Search over the above Keras experiment-template](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)
-    - This example will create multiple copies of the Keras experiment-template, with different hyper-parameter combinations
+- [Random Search over the above Keras experiment-template](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)
+    - This example will create multiple copies of the Keras experiment-template, with different hyper-parameter
+      combinations

 Experiment Pipeline examples
-  - [First step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)
+
+- [First step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)
    - This example will "process data", and once done, will launch a copy of the 'second step' experiment-template
-  - [Second step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py)
+- [Second step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py)
    - In order to create an experiment-template in the system, this code must be executed once manually

 ### License
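
The GPU allocation rules in the README's "Running the ClearML Agent" section can be sketched as a small decision function. This is an illustration under stated assumptions, not the agent's code: the name `resolve_gpus` is hypothetical, and the precedence (`--cpu-only`, then `--gpus`, then the environment variable) is assumed rather than taken from the source.

```python
import os


def resolve_gpus(gpus_flag=None, cpu_only=False, env=None):
    """Return the GPU selection an agent would use, as a string.

    Hypothetical helper; the precedence order here is an assumption.
    """
    env = os.environ if env is None else env

    if cpu_only:
        # --cpu-only disables GPU allocation entirely
        return "none"
    if gpus_flag is not None:
        # explicit --gpus value, e.g. "0" or "0,1"
        return gpus_flag

    value = env.get("NVIDIA_VISIBLE_DEVICES")
    if value is None:
        # no flag and no variable: all GPUs are allocated
        return "all"
    # NVIDIA_VISIBLE_DEVICES="none" means no GPU is allocated
    return value
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.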
--- a/clearml_agent/main.py
+++ b/clearml_agent/main.py
@@ -12,7 +12,7 @@ from clearml_agent.definitions import FileBuffering, CONFIG_FILE
 from clearml_agent.helper.base import reverse_home_folder_expansion, chain_map, named_temporary_file
 from clearml_agent.helper.process import ExitStatus
 from . import interface, session, definitions, commands
-from .errors import ConfigFileNotFound, Sigterm, APIError
+from .errors import ConfigFileNotFound, Sigterm, APIError, CustomBuildScriptFailed
 from .helper.trace import PackageTrace
 from .interface import get_parser

@@ -44,6 +44,8 @@ def run_command(parser, args, command_name):
        debug = command._session.debug_mode
        func = getattr(command, command_name)
        return func(**args_dict)
+    except CustomBuildScriptFailed as e:
+        command_class.exit(e.message, e.errno)
    except ConfigFileNotFound:
        message = 'Cannot find configuration file in "{}".\n' \
                  'To create a configuration file, run:\n' \
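
The new `except CustomBuildScriptFailed` branch above turns a failed build script into a process exit that carries the script's message and error code. A minimal, self-contained sketch of that pattern follows. The exception class and `exit_with` here are stand-ins written for illustration; the real `CustomBuildScriptFailed` and `command_class.exit` live in `clearml_agent` and may differ in detail.

```python
import sys


class CustomBuildScriptFailed(Exception):
    """Stand-in for clearml_agent.errors.CustomBuildScriptFailed (assumed shape)."""

    def __init__(self, message, errno):
        super().__init__(message)
        self.message = message
        self.errno = errno


def exit_with(message, code):
    """Stand-in for command_class.exit: report the message, then exit with the code."""
    print(message, file=sys.stderr)
    raise SystemExit(code)


def run_command(func):
    """Run func and translate a build-script failure into an exit code."""
    try:
        return func()
    except CustomBuildScriptFailed as e:
        exit_with(e.message, e.errno)
```

Handling this exception before the more general ones lets the build script's own exit code reach the caller instead of a generic failure.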
--- a/clearml_agent/backend_api/config/default/agent.conf
+++ b/clearml_agent/backend_api/config/default/agent.conf
@@ -11,8 +11,15 @@

    # Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
    # leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
+    # **Notice**: GitHub personal token is equivalent to password, you can put it directly into `git_pass`
+    # To learn how to generate a git token for GitHub/Bitbucket/GitLab:
+    # https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
+    # https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
+    # https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
    # git_user: ""
    # git_pass: ""
+    # Limit credentials to a single domain, for example: github.com,
+    # all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
    # git_host: ""

    # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
@@ -30,6 +37,22 @@
    # specific python version and the system supports multiple python the agent will use the requested python version)
    # ignore_requested_python_version: true

+    # Force the root folder of the git repository (instead of the working directory) into the PYTHONPATH
+    # default false, only the working directory will be added to the PYTHONPATH
+    # force_git_root_python_path: false
+
+    # if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
+    # it solves passing user/token to git submodules.
+    # this is a safer way to ensure multiple users using the same repository will
+    # not accidentally leak credentials
+    # Only supported on Linux systems, it will be the default in future releases
+    # enable_git_ask_pass: false
+
+    # in docker mode, if container's entrypoint automatically activated a virtual environment
+    # use the activated virtual environment and install everything there
+    # set to False to disable, and always create a new venv inheriting from the system_site_packages
+    # docker_use_activated_venv: true
+
    # select python package manager:
    # currently supported: pip, conda and poetry
    # if "pip" or "conda" are used, the agent installs the required packages
@@ -42,17 +65,20 @@
        # supported options: pip, conda, poetry
        type: pip,

-        # specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
-        pip_version: "<20.2",
+        # specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
+        pip_version: ["<20.2 ; python_version < '3.10'", "<22.3 ; python_version >= '3.10'"],
+        # specify poetry version to use (examples "<2", "==1.1.1", "", empty string will install the latest version)
+        # poetry_version: "<2",
+        # poetry_install_extra_args: ["-v"]

-        # virtual environment inheres packages from system
+        # virtual environment inherits packages from system
        system_site_packages: false,

        # install with --upgrade
        force_upgrade: false,

        # additional artifact repositories to use when installing python packages
-        # extra_index_url: ["https://allegroai.jfrog.io/clearmlai/api/pypi/public/simple"]
+        # extra_index_url: ["https://allegroai.jfrog.io/clearml/api/pypi/public/simple"]

        # additional conda channels to use when installing with conda package manager
        conda_channels: ["pytorch", "conda-forge", "defaults", ]
@@ -67,7 +93,7 @@
        # set the optional priority packages to be installed before the rest of the required packages,
        # In case a package installation fails, the package will be ignored,
        # and the virtual environment process will continue
-        # priority_optional_packages: ["pygobject", ]
+        priority_optional_packages: ["pygobject", ]

        # set the post packages to be installed after all the rest of the required packages
        # post_packages: ["horovod", ]
@@ -80,6 +106,10 @@
        # set to True to support torch nightly build installation,
        # notice: torch nightly builds are ephemeral and are deleted from time to time
        torch_nightly: false,
+
+        # if set to true, the agent will look for the "poetry.lock" file 
+        # in the passed current working directory instead of the repository's root directory.
+        poetry_files_from_repo_working_dir: false
    },

    # target folder for virtual environments builds, created when executing experiment
@@ -92,7 +122,7 @@
        # minimum required free space to allow for cache entry, disable by passing 0 or negative value
        free_space_threshold_gb: 2.0
        # unmark to enable virtual environment caching
-        # path: ~/.clearml/venvs-cache
+        path: ~/.clearml/venvs-cache
    },

    # cached git clone folder
@@ -114,6 +144,12 @@
    },

    translate_ssh: true,
+
+    # set "disable_ssh_mount: true" to disable the automatic mount of ~/.ssh folder into the docker containers
+    # default is false, automatically mounts ~/.ssh
+    # Must be set to True if using "clearml-session" with this agent!
+    # disable_ssh_mount: false
+
    # reload configuration file every daemon execution
    reload_config: false,

@@ -156,7 +192,7 @@

    default_docker: {
        # default docker image to use when running in docker mode
-        image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
+        image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"

        # optional arguments to pass to docker image
        # arguments: ["--ipc=host", ]
@@ -186,8 +222,8 @@
    # default is True, report a single \r line in a sequence of consecutive lines, per 5 seconds.
    # suppress_carriage_return: true

-    # cuda versions used for solving pytorch wheel packages
-    # should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
+    # CUDA versions used for Conda setup & solving PyTorch wheel packages
+    # Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
    # cuda_version: 10.1
    # cudnn_version: 7.6

@@ -201,24 +237,152 @@
    hide_docker_command_env_vars {
        enabled: true
        extra_keys: []
+        parse_embedded_urls: true
    }

+    # Maximum execution time (in seconds) for Task's abort function call
+    abort_callback_max_timeout: 1800
+
    # allow to set internal mount points inside the docker,
    # especially useful for non-root docker container images.
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        apt_cache: "/var/cache/apt/archives"
-        ssh_folder: "/root/.ssh"
+        ssh_folder: "~/.ssh"
+        ssh_ro_folder: "/.ssh"
        pip_cache: "/root/.cache/pip"
        poetry_cache: "/root/.cache/pypoetry"
        vcs_cache: "/root/.clearml/vcs-cache"
-        venv_build: "/root/.clearml/venvs-builds"
+        venv_build: "~/.clearml/venvs-builds"
        pip_download: "/root/.clearml/pip-download-cache"
    }

    # Name docker containers created by the daemon using the following string format (supported from Docker 0.6.5)
-    # Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 charaters)
-    # Note: resulting name must start with an alpha-numeric character and continue with a alpha-numeric characters,
-    #  underscores (_), dots (.) and/or dashes (-)
-    #docker_container_name_format: "clearml-id-{task_id}-{rand_string:.8}"
+    # Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 characters)
+    # Note: resulting name must start with an alphanumeric character and
+    #       continue with alphanumeric characters, underscores (_), dots (.) and/or dashes (-)
+    # docker_container_name_format: "clearml-id-{task_id}-{rand_string:.8}"
+
+    # Apply top-level environment section from configuration into os.environ
+    apply_environment: true
+    # Top-level environment section is in the form of:
+    #   environment {
+    #     key: value
+    #     ...
+    #   }
+    # and is applied to the OS environment as `key=value` for each key/value pair
+
+    # Apply top-level files section from configuration into local file system
+    apply_files: true
+    # Top-level files section allows auto-generating files at designated paths with predefined contents
+    # and target format. Options include:
+    #  contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
+    #  format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
+    #          base64-encoded contents string, otherwise ignored
+    #  path: the target file's path, may include ~ and inplace env vars
+    #  target_format: format used to encode contents before writing into the target file. Supported values are json,
+    #                 yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
+    #  overwrite: overwrite the target file in case it exists. Default is true.
+    #
+    # Example:
+    #   files {
+    #     myfile1 {
+    #       contents: "The quick brown fox jumped over the lazy dog"
+    #       path: "/tmp/fox.txt"
+    #     }
+    #     myjsonfile {
+    #       contents: {
+    #         some {
+    #           nested {
+    #             value: [1, 2, 3, 4]
+    #           }
+    #         }
+    #       }
+    #       path: "/tmp/test.json"
+    #       target_format: json
+    #     }
+    #   }
+
+    # Specifies a custom environment setup script to be executed instead of installing a virtual environment.
+    # If provided, this script is executed following Git cloning. The script command may include environment variables,
+    # which will be expanded before execution (e.g. "$CLEARML_GIT_ROOT/script.sh").
+    # The script can also be specified using the CLEARML_AGENT_CUSTOM_BUILD_SCRIPT environment variable.
+    #
+    # When running the script, the following environment variables will be set:
+    # - CLEARML_CUSTOM_BUILD_TASK_CONFIG_JSON: specifies a path to a temporary file containing the complete task
+    #  contents in JSON format
+    # - CLEARML_TASK_SCRIPT_ENTRY: task entrypoint script as defined in the task's script section
+    # - CLEARML_TASK_WORKING_DIR: task working directory as defined in the task's script section
+    # - CLEARML_VENV_PATH: path to the agent's default virtual environment (as defined in the configuration)
+    # - CLEARML_GIT_ROOT: path to the cloned Git repository
+    # - CLEARML_CUSTOM_BUILD_OUTPUT: a path to a non-existing file that may be created by the script. If created,
+    #  this file must be in the following JSON format:
+    #      ```json
+    #      {
+    #        "binary": "/absolute/path/to/python-executable",
+    #        "entry_point": "/absolute/path/to/task-entrypoint-script",
+    #        "working_dir": "/absolute/path/to/task-working/dir"
+    #      }
+    #      ```
+    #  If provided, the agent will use these instead of the predefined task script section to execute the task and will
+    #  skip virtual environment creation.
+    #
+    # In case the custom script returns with a non-zero exit code, the agent will fail with the same exit code.
+    # In case the custom script is specified but does not exist, or if the custom script does not write valid content
+    # into the file specified in CLEARML_CUSTOM_BUILD_OUTPUT, the agent will emit a warning and continue with the
+    # standard flow.
+    custom_build_script: ""
+
+    # Crash on exception: by default when encountering an exception while running a task,
+    # the agent will catch the exception, log it and continue running.
+    # Set this to `true` to propagate exceptions and crash the agent.
+    # crash_on_exception: true
+
+    # Disable task docker override. If true, the agent will use the default docker image and ignore any docker image
+    # and arguments specified in the task's container section (setup shell script from the task container section will
+    # be used in any case, if specified).
+    disable_task_docker_override: false
+
+    # Choose the default docker based on the Task properties,
+    # Examples: 'script.requirements', 'script.binary', 'script.repository', 'script.branch', 'project'
+    # Notice: Matching is done via regular expression, for example "^searchme$" will match exactly the "searchme" string
+    #
+    #     "default_docker": {
+    #         "image": "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04",
+    #         # optional arguments to pass to docker image
+    #         # arguments: ["--ipc=host", ]
+    #         "match_rules": [
+    #             {
+    #                 "image": "sample_container:tag",
+    #                 "arguments": "-e VALUE=1 --ipc=host",
+    #                 "match": {
+    #                     "script": {
+    #                         "requirements": {
+    #                             "pip": {
+    #                                 "tensorflow": "~=1.6"
+    #                             }
+    #                         },
+    #                         "repository": "",
+    #                         "branch": "master"
+    #                     },
+    #                     "project": "example"
+    #                 }
+    #             },
+    #             {
+    #                 "image": "better_container:tag",
+    #                 "arguments": "",
+    #                 "match": {
+    #                     "container": "replace_me_please"
+    #                 }
+    #             },
+    #             {
+    #                 "image": "another_container:tag",
+    #                 "arguments": "",
+    #                 "match": {
+    #                     "project": "^examples", # anything that starts with "examples", e.g. "examples", "examples/sub_project"
+    #                 }
+    #             }
+    #         ]
+    #     },
+    #
 }
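
The `match_rules` comment above states that matching is done via regular expression. The following is a minimal sketch of how such rules could be evaluated, assuming `re.search` semantics and a flat dictionary of task properties. The agent's actual matcher is not part of this diff, and `pick_image` is a hypothetical helper introduced only for illustration.

```python
import re


def pick_image(rules, task_props, default_image):
    """Return the image of the first rule whose patterns all match (hypothetical helper)."""
    for rule in rules:
        match = rule.get("match", {})
        # every key in the rule must match the corresponding task property
        if all(re.search(pattern, str(task_props.get(key, ""))) for key, pattern in match.items()):
            return rule["image"]
    return default_image


rules = [
    # "^examples" matches anything that starts with "examples"
    {"image": "another_container:tag", "match": {"project": "^examples"}},
    # "^searchme$" matches only the exact string "searchme"
    {"image": "exact_container:tag", "match": {"project": "^searchme$"}},
]
```

With these rules, a task in project `examples/sub_project` would select `another_container:tag`, while a project named `my_examples` would fall back to the default image.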
--- a/clearml_agent/backend_api/config/default/api.conf
+++ b/clearml_agent/backend_api/config/default/api.conf
@@ -28,6 +28,9 @@

        pool_maxsize: 512
        pool_connections: 512
+
+        # Override the default http method, use "put" if working behind GCP load balancer (default: "get")
+        # default_method: "get"
    }

    auth {
--- a/clearml_agent/backend_api/schema/service.py
+++ b/clearml_agent/backend_api/schema/service.py
@@ -4,7 +4,7 @@ import re
 import attr
 import six

-import pyhocon
+from clearml_agent.external import pyhocon

 from .action import Action

--- a/clearml_agent/backend_api/session/datamodel.py
+++ b/clearml_agent/backend_api/session/datamodel.py
@@ -66,11 +66,16 @@ class DataModel(object):
        }

    def validate(self, schema=None):
-        jsonschema.validate(
-            self.to_dict(),
-            schema or self._schema,
-            types=dict(array=(list, tuple), integer=six.integer_types),
+        schema = schema or self._schema
+        validator = jsonschema.validators.validator_for(schema)
+        validator_cls = jsonschema.validators.extend(
+            validator=validator,
+            type_checker=validator.TYPE_CHECKER.redefine_many({
+                "array": lambda s, instance: isinstance(instance, (list, tuple)),
+                "integer": lambda s, instance: isinstance(instance, six.integer_types),
+            }),
        )
+        jsonschema.validate(self.to_dict(), schema, cls=validator_cls)

    def __repr__(self):
        return '<{}.{}: {}>'.format(
--- a/clearml_agent/backend_api/session/defs.py
+++ b/clearml_agent/backend_api/session/defs.py
@@ -13,6 +13,19 @@ ENV_HOST_VERIFY_CERT = EnvEntry("CLEARML_API_HOST_VERIFY_CERT", "TRAINS_API_HOST
 ENV_CONDA_ENV_PACKAGE = EnvEntry("CLEARML_CONDA_ENV_PACKAGE", "TRAINS_CONDA_ENV_PACKAGE")
 ENV_NO_DEFAULT_SERVER = EnvEntry("CLEARML_NO_DEFAULT_SERVER", "TRAINS_NO_DEFAULT_SERVER", type=bool, default=True)
 ENV_DISABLE_VAULT_SUPPORT = EnvEntry('CLEARML_AGENT_DISABLE_VAULT_SUPPORT', type=bool)
+ENV_ENABLE_ENV_CONFIG_SECTION = EnvEntry('CLEARML_AGENT_ENABLE_ENV_CONFIG_SECTION', type=bool)
+ENV_ENABLE_FILES_CONFIG_SECTION = EnvEntry('CLEARML_AGENT_ENABLE_FILES_CONFIG_SECTION', type=bool)
+ENV_VENV_CONFIGURED = EnvEntry('VIRTUAL_ENV', type=str)
+ENV_PROPAGATE_EXITCODE = EnvEntry("CLEARML_AGENT_PROPAGATE_EXITCODE", type=bool, default=False)
 ENV_INITIAL_CONNECT_RETRY_OVERRIDE = EnvEntry(
    'CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE', default=True, converter=safe_text_to_bool
 )
+
+"""
+Experimental option to set the request method for all API requests and auth login.
+This could be useful when GET requests with payloads are blocked by a server as
+POST requests can be used instead.
+
+However this has not been rigorously tested and may have unintended consequences.
+"""
+ENV_API_DEFAULT_REQ_METHOD = EnvEntry("CLEARML_API_DEFAULT_REQ_METHOD", default="GET")
--- a/clearml_agent/backend_api/session/request.py
+++ b/clearml_agent/backend_api/session/request.py
@@ -5,10 +5,18 @@ import six

 from .apimodel import ApiModel
 from .datamodel import DataModel
+from .defs import ENV_API_DEFAULT_REQ_METHOD
+
+
+if ENV_API_DEFAULT_REQ_METHOD.get().upper() not in ("GET", "POST", "PUT"):
+    raise ValueError(
+        "CLEARML_API_DEFAULT_REQ_METHOD environment variable must be 'get', 'post' or 'put' (any case is allowed)."
+    )


 class Request(ApiModel):
-    _method = 'get'
+    def_method = ENV_API_DEFAULT_REQ_METHOD.get(default="get")
+    _method = ENV_API_DEFAULT_REQ_METHOD.get(default="get")

    def __init__(self, **kwargs):
        if kwargs:
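
The import-time check above accepts `GET`, `POST` and `PUT` in any letter case. A minimal sketch of the same validation as a standalone function follows, assuming the environment variable takes precedence over a configuration value. `resolve_default_method` and `ALLOWED_METHODS` are hypothetical names, and unlike the code in this diff the sketch normalizes the result to upper case.

```python
ALLOWED_METHODS = ("GET", "POST", "PUT")


def resolve_default_method(environ, config_value=None):
    """Pick the default HTTP method: environment first, then config, then GET (hypothetical helper)."""
    raw = environ.get("CLEARML_API_DEFAULT_REQ_METHOD") or config_value or "GET"
    method = str(raw).strip().upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(
            "default request method must be 'get', 'post' or 'put' (any case is allowed), got {!r}".format(raw)
        )
    return method
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in isolation.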
--- a/clearml_agent/backend_api/session/session.py
+++ b/clearml_agent/backend_api/session/session.py
@@ -2,20 +2,25 @@
 import json as json_lib
 import os
 import sys
+import time
 import types
+from random import SystemRandom
 from socket import gethostname
 from typing import Optional

 import jwt
 import requests
 import six
-from pyhocon import ConfigTree, ConfigFactory
+from requests import RequestException
 from requests.auth import HTTPBasicAuth
 from six.moves.urllib.parse import urlparse, urlunparse

+from clearml_agent.external.pyhocon import ConfigTree, ConfigFactory
+
 from .callresult import CallResult
-from .defs import ENV_VERBOSE, ENV_HOST, ENV_ACCESS_KEY, ENV_SECRET_KEY, ENV_WEB_HOST, ENV_FILES_HOST, ENV_AUTH_TOKEN, \
-    ENV_NO_DEFAULT_SERVER, ENV_DISABLE_VAULT_SUPPORT, ENV_INITIAL_CONNECT_RETRY_OVERRIDE
+from .defs import (
+    ENV_VERBOSE, ENV_HOST, ENV_ACCESS_KEY, ENV_SECRET_KEY, ENV_WEB_HOST, ENV_FILES_HOST, ENV_AUTH_TOKEN,
+    ENV_NO_DEFAULT_SERVER, ENV_DISABLE_VAULT_SUPPORT, ENV_INITIAL_CONNECT_RETRY_OVERRIDE, ENV_API_DEFAULT_REQ_METHOD, )
 from .request import Request, BatchRequest
 from .token_manager import TokenManager
 from ..config import load
@@ -24,6 +29,9 @@ from ...backend_config.environment import backward_compatibility_support
 from ...version import __version__


+sys_random = SystemRandom()
+
+
 class LoginError(Exception):
    pass

@@ -47,6 +55,7 @@ class Session(TokenManager):
    _session_initial_retry_connect_override = 4
    _write_session_data_size = 15000
    _write_session_timeout = (30.0, 30.)
+    _request_exception_retry_timeout = (2.0, 3.0)

    api_version = '2.1'
    feature_set = 'basic'
@@ -109,6 +118,9 @@ class Session(TokenManager):
        self._verbose = verbose if verbose is not None else ENV_VERBOSE.get()
        self._logger = logger
        self.__auth_token = None
+        self._propagate_exceptions_on_send = True
+
+        self.update_default_api_method()

        if ENV_AUTH_TOKEN.get(
            value_cb=lambda key, value: print("Using environment access token {}=********".format(key))
@@ -142,7 +154,7 @@ class Session(TokenManager):
                "Could not find host server definition "
                "(missing `~/clearml.conf` or Environment CLEARML_API_HOST)\n"
                "To get started with ClearML: setup your own `clearml-server`, "
-                "or create a free account at https://app.community.clear.ml and run `clearml-agent init`"
+                "or create a free account at https://app.clear.ml and run `clearml-agent init`"
            )

        self.__host = host.strip("/")
@@ -163,6 +175,10 @@ class Session(TokenManager):
        )
        # try to connect with the server
        self.refresh_token()
+
+        # for resilience, from now on we won't allow propagating exceptions when sending requests
+        self._propagate_exceptions_on_send = False
+
        # create the default session with many retries
        http_retries_config, self.__http_session = self._setup_session(http_retries_config)

@@ -183,8 +199,6 @@ class Session(TokenManager):
        # notice: this is across the board warning omission
        urllib_log_warning_setup(total_retries=http_retries_config.get('total', 0), display_warning_after=3)

-        self._load_vaults()
-
    def _setup_session(self, http_retries_config, initial_session=False, default_initial_connect_override=None):
        # type: (dict, bool, Optional[bool]) -> (dict, requests.Session)
        http_retries_config = http_retries_config or self.config.get(
@@ -208,9 +222,24 @@ class Session(TokenManager):
                http_retries_config = dict(**http_retries_config)
                http_retries_config['connect'] = connect_retries

-        return http_retries_config, get_http_session_with_retry(**http_retries_config)
+        return http_retries_config, get_http_session_with_retry(config=self.config or None, **http_retries_config)

-    def _load_vaults(self):
+    def update_default_api_method(self):
+        if ENV_API_DEFAULT_REQ_METHOD.get(default=None):
+            # Make sure we update the config object, so we pass it into the new containers when we map them
+            self.config.put("api.http.default_method", ENV_API_DEFAULT_REQ_METHOD.get())
+            # notice: the default Request.def_method is already set from the OS environment
+        elif self.config.get("api.http.default_method", None):
+            def_method = str(self.config.get("api.http.default_method", None)).strip()
+            if def_method.upper() not in ("GET", "POST", "PUT"):
+                raise ValueError(
+                    "api.http.default_method variable must be 'get', 'post' or 'put' (any case is allowed)."
+                )
+            Request.def_method = def_method
+            Request._method = Request.def_method
+
+    def load_vaults(self):
+        # type: () -> Optional[bool]
        if not self.check_min_api_version("2.15") or self.feature_set == "basic":
            return

@@ -231,28 +260,37 @@ class Session(TokenManager):

        # noinspection PyBroadException
        try:
-            res = self.send_request("users", "get_vaults", json={"enabled": True, "types": ["config"]})
+            # Use params and not data/json otherwise payload might be dropped if we're using GET with a strict firewall
+            res = self.send_request("users", "get_vaults", params="enabled=true&types=config&types=config")
            if res.ok:
                vaults = res.json().get("data", {}).get("vaults", [])
                data = list(filter(None, map(parse, vaults)))
                if data:
                    self.config.set_overrides(*data)
+                    return True
            elif res.status_code != 404:
                raise Exception(res.json().get("meta", {}).get("result_msg", res.text))
        except Exception as ex:
            print("Failed getting vaults: {}".format(ex))

+    def verify_feature_set(self, feature_set):
+        if isinstance(feature_set, str):
+            feature_set = [feature_set]
+        if self.feature_set not in feature_set:
+            raise ValueError('ClearML-server does not support requested feature set {}'.format(feature_set))
+
    def _send_request(
        self,
        service,
        action,
        version=None,
-        method="get",
+        method=Request.def_method,
        headers=None,
        auth=None,
        data=None,
        json=None,
        refresh_token_if_unauthorized=True,
+        params=None,
    ):
        """ Internal implementation for making a raw API request.
            - Constructs the api endpoint name
@@ -276,6 +314,7 @@ class Session(TokenManager):
            if version
            else "{host}/{service}.{action}"
        ).format(**locals())
+
        while True:
            if data and len(data) > self._write_session_data_size:
                timeout = self._write_session_timeout
@@ -283,16 +322,29 @@ class Session(TokenManager):
                timeout = self._session_initial_timeout
            else:
                timeout = self._session_timeout
-            res = self.__http_session.request(
-                method, url, headers=headers, auth=auth, data=data, json=json, timeout=timeout)
+
+            try:
+                res = self.__http_session.request(
+                    method, url, headers=headers, auth=auth, data=data, json=json, timeout=timeout, params=params)
+            except RequestException as ex:
+                if self._propagate_exceptions_on_send:
+                    raise
+                sleep_time = sys_random.uniform(*self._request_exception_retry_timeout)
+                self._logger.error(
+                    "{} exception sending {} {}: {} (retrying in {:.1f}sec)".format(
+                        type(ex).__name__, method.upper(), url, str(ex), sleep_time
+                    )
+                )
+                time.sleep(sleep_time)
+                continue

            if (
                refresh_token_if_unauthorized
                and res.status_code == requests.codes.unauthorized
                and not token_refreshed_on_error
            ):
-                # it seems we're unauthorized, so we'll try to refresh our token once in case permissions changed since
-                # the last time we got the token, and try again
+                # it seems we're unauthorized, so we'll try to refresh our token once in case permissions changed
+                # since the last time we got the token, and try again
                self.refresh_token()
                token_refreshed_on_error = True
                # try again
@@ -324,11 +376,12 @@ class Session(TokenManager):
        service,
        action,
        version=None,
-        method="get",
+        method=Request.def_method,
        headers=None,
        data=None,
        json=None,
        async_enable=False,
+        params=None,
    ):
        """
        Send a raw API request.
@@ -341,6 +394,7 @@ class Session(TokenManager):
                     content type will be application/json)
        :param data: Dictionary, bytes, or file-like object to send in the request body
        :param async_enable: whether request is asynchronous
+        :param params: additional query parameters
        :return: requests Response instance
        """
        headers = self.add_auth_headers(
@@ -357,6 +411,7 @@ class Session(TokenManager):
            headers=headers,
            data=data,
            json=json,
+            params=params,
        )

    def send_request_batch(
@@ -367,7 +422,7 @@ class Session(TokenManager):
        headers=None,
        data=None,
        json=None,
-        method="get",
+        method=Request.def_method,
    ):
        """
        Send a raw batch API request. Batch requests always use application/json-lines content type.
@@ -609,14 +664,14 @@ class Session(TokenManager):

        res = None
        try:
-            data = {"expiration_sec": exp} if exp else {}
            res = self._send_request(
+                method=Request.def_method,
                service="auth",
                action="login",
                auth=auth,
-                json=data,
                headers=headers,
                refresh_token_if_unauthorized=False,
+                params={"expiration_sec": exp} if exp else {},
            )
            try:
                resp = res.json()
@@ -655,3 +710,13 @@ class Session(TokenManager):
        return "{self.__class__.__name__}[{self.host}, {self.access_key}/{secret_key}]".format(
            self=self, secret_key=self.secret_key[:5] + "*" * (len(self.secret_key) - 5)
        )
+
+    @property
+    def propagate_exceptions_on_send(self):
+        # type: () -> bool
+        return self._propagate_exceptions_on_send
+
+    @propagate_exceptions_on_send.setter
+    def propagate_exceptions_on_send(self, value):
+        # type: (bool) -> None
+        self._propagate_exceptions_on_send = value
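
The retry path added to `_send_request` sleeps for a random interval drawn from `_request_exception_retry_timeout` before trying again, unless exceptions are set to propagate. A minimal standalone sketch of that pattern is shown below. `send_with_retry`, the injectable `sleep` argument and the optional `max_attempts` cap are assumptions made for illustration (the loop in this diff has no cap), and the built-in `ConnectionError` stands in for `requests.RequestException`.

```python
import time
from random import SystemRandom

sys_random = SystemRandom()


def send_with_retry(send, retry_timeout=(2.0, 3.0), propagate=False, sleep=time.sleep, max_attempts=None):
    """Call send() until it succeeds, sleeping a jittered interval between failures (hypothetical helper)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return send()
        except ConnectionError:
            # re-raise immediately when propagation is requested or the optional cap is reached
            if propagate or (max_attempts is not None and attempts >= max_attempts):
                raise
            # random jitter keeps many agents from retrying in lockstep
            sleep(sys_random.uniform(*retry_timeout))
```

Drawing the sleep time from a range rather than using a fixed delay spreads out reconnect attempts when many workers lose the server at the same moment.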
--- a/clearml_agent/backend_api/session/token_manager.py
+++ b/clearml_agent/backend_api/session/token_manager.py
@@ -87,10 +87,16 @@ class TokenManager(object):
    @classmethod
    def get_decoded_token(cls, token, verify=False):
        """ Get token expiration time. If not present, assume forever """
+        if hasattr(jwt, '__version__') and jwt.__version__[0] == '1':
+            return jwt.decode(
+                token,
+                verify=verify,
+                algorithms=get_default_algorithms(),
+            )
+
        return jwt.decode(
            token,
-            verify=verify,
-            options=dict(verify_signature=False),
+            options=dict(verify_signature=verify),
            algorithms=get_default_algorithms(),
        )

--- a/clearml_agent/backend_api/utils.py
+++ b/clearml_agent/backend_api/utils.py
@@ -86,7 +86,10 @@ def get_http_session_with_retry(
    session = requests.Session()

    if backoff_max is not None:
-        Retry.BACKOFF_MAX = backoff_max
+        if "BACKOFF_MAX" in vars(Retry):
+            Retry.BACKOFF_MAX = backoff_max
+        else:
+            Retry.DEFAULT_BACKOFF_MAX = backoff_max

    retry = Retry(
        total=total, connect=connect, read=read, redirect=redirect, status=status,
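
The `vars(Retry)` check above picks whichever class attribute the installed retry class defines. A minimal sketch of the same check using stand-in classes is given below, so it runs without urllib3 installed. `OldRetry`, `NewRetry` and `set_backoff_max` are hypothetical names, and the attribute values are placeholders.

```python
class OldRetry:
    """Stand-in for a retry class that exposes BACKOFF_MAX."""
    BACKOFF_MAX = 120


class NewRetry:
    """Stand-in for a retry class that exposes DEFAULT_BACKOFF_MAX instead."""
    DEFAULT_BACKOFF_MAX = 120


def set_backoff_max(retry_cls, backoff_max):
    # vars() inspects only attributes defined directly on the class, not inherited ones
    if "BACKOFF_MAX" in vars(retry_cls):
        retry_cls.BACKOFF_MAX = backoff_max
    else:
        retry_cls.DEFAULT_BACKOFF_MAX = backoff_max


set_backoff_max(OldRetry, 30)
set_backoff_max(NewRetry, 30)
```

After both calls each class carries the new limit under its own attribute name, and `NewRetry` does not gain a stray `BACKOFF_MAX`.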
--- a/clearml_agent/backend_config/config.py
+++ b/clearml_agent/backend_config/config.py
@@ -7,10 +7,8 @@ import sys
 from os.path import expanduser
 from typing import Any

-import pyhocon
 import six
 from pathlib2 import Path
-from pyhocon import ConfigTree, ConfigFactory
 from pyparsing import (
    ParseFatalException,
    ParseException,
@@ -18,6 +16,9 @@ from pyparsing import (
    ParseSyntaxException,
 )

+from clearml_agent.external import pyhocon
+from clearml_agent.external.pyhocon import ConfigTree, ConfigFactory
+
 from .defs import (
    Environment,
    DEFAULT_CONFIG_FOLDER,
@@ -82,7 +83,7 @@ class Config(object):
        relative_to=None,
        app=None,
        is_server=False,
-        **_,
+        **_
    ):
        self._app = app
        self._verbose = verbose
@@ -191,16 +192,20 @@ class Config(object):
            config, self._read_extra_env_config_values(), copy_trees=True
        )

-        if self._overrides_configs:
-            config = functools.reduce(
-                lambda cfg, override: ConfigTree.merge_configs(cfg, override, copy_trees=True),
-                self._overrides_configs,
-                config,
-            )
+        config = self.resolve_override_configs(config)

        config["env"] = env
        return config

+    def resolve_override_configs(self, initial=None):
+        if not self._overrides_configs:
+            return initial
+        return functools.reduce(
+            lambda cfg, override: ConfigTree.merge_configs(cfg, override, copy_trees=True),
+            self._overrides_configs,
+            initial or ConfigTree(),
+        )
+
    def _read_extra_env_config_values(self) -> ConfigTree:
        """ Loads extra configuration from environment-injected values """
        result = ConfigTree()
@@ -214,7 +219,7 @@ class Config(object):
                    .lower()
                )
                result = ConfigTree.merge_configs(
-                    result, ConfigFactory.parse_string(f"{path}: {os.environ[key]}")
+                    result, ConfigFactory.parse_string("{}: {}".format(path, os.environ[key]))
                )

        return result
@@ -289,6 +294,12 @@ class Config(object):
            )
        return value

+    def put(self, key, value):
+        self._config.put(key, value)
+
+    def pop(self, key, default=None):
+        return self._config.pop(key, default=default)
+
    def to_dict(self):
        return self._config.as_plain_ordered_dict()
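
`resolve_override_configs` folds the list of overrides onto an initial tree with `functools.reduce`, and returns the initial value untouched when there are no overrides. A minimal sketch of the same fold using plain dictionaries in place of `ConfigTree` follows. `merge_dicts` and `resolve_overrides` are hypothetical stand-ins, and the deep-merge behavior is an assumption about what `ConfigTree.merge_configs` does for nested keys.

```python
import copy
import functools


def merge_dicts(base, override):
    """Recursively merge override into a copy of base (stand-in for ConfigTree.merge_configs)."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_dicts(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_overrides(overrides, initial=None):
    if not overrides:
        return initial
    # later overrides win over earlier ones
    return functools.reduce(merge_dicts, overrides, initial or {})


base = {"agent": {"reload_config": False, "default_docker": {"image": "a"}}}
overrides = [{"agent": {"reload_config": True}}, {"agent": {"default_docker": {"image": "b"}}}]
merged = resolve_overrides(overrides, base)
```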

--- a/clearml_agent/backend_config/converters.py
+++ b/clearml_agent/backend_config/converters.py
@@ -14,6 +14,14 @@ except ImportError:
 ConverterType = TypeVar("ConverterType", bound=Callable[[Any], Any])


+def text_to_int(value, default=0):
+    # type: (Any, int) -> int
+    try:
+        return int(value)
+    except (ValueError, TypeError):
+        return default
+
+
 def base64_to_text(value):
    # type: (Any) -> Text
    return base64.b64decode(value).decode("utf-8")
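
A short usage sketch for the new `text_to_int` converter, with the function body copied from the hunk above so the example is self-contained. The sample inputs are illustrative only.

```python
def text_to_int(value, default=0):
    # type: (object, int) -> int
    try:
        return int(value)
    except (ValueError, TypeError):
        return default


parsed = text_to_int("42")                # numeric text converts normally
fallback = text_to_int("abc", default=-1) # non-numeric text falls back to the default
missing = text_to_int(None)               # None raises TypeError inside int(), so the default is used
```

Note that a string such as `"3.7"` also falls back to the default, because `int()` does not parse decimal text.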
--- a/clearml_agent/backend_config/utils.py
+++ b/clearml_agent/backend_config/utils.py
@@ -1,3 +1,14 @@
+import base64
+import os
+from os.path import expandvars, expanduser
+from pathlib import Path
+from typing import List, TYPE_CHECKING
+
+from clearml_agent.external.pyhocon import HOCONConverter, ConfigTree
+
+if TYPE_CHECKING:
+    from .config import Config
+

 def get_items(cls):
    """ get key/value items from an enum-like class (members represent enumeration key/value) """
@@ -7,3 +18,95 @@ def get_items(cls):
 def get_options(cls):
    """ get options from an enum-like class (members represent enumeration key/value) """
    return get_items(cls).values()
+
+
+def apply_environment(config):
+    # type: (Config) -> List[str]
+    env_vars = config.get("environment", None)
+    if not env_vars:
+        return []
+    if isinstance(env_vars, (list, tuple)):
+        env_vars = dict(env_vars)
+
+    keys = list(filter(None, env_vars.keys()))
+
+    for key in keys:
+        os.environ[str(key)] = str(env_vars[key] or "")
+
+    return keys
+
+
+def apply_files(config):
+    # type: (Config) -> None
+    files = config.get("files", None)
+    if not files:
+        return
+
+    if isinstance(files, (list, tuple)):
+        files = dict(files)
+
+    print("Creating files from configuration")
+    for key, data in files.items():
+        path = data.get("path")
+        fmt = data.get("format", "string")
+        target_fmt = data.get("target_format", "string")
+        overwrite = bool(data.get("overwrite", True))
+        contents = data.get("contents")
+
+        target = Path(expanduser(expandvars(path)))
+
+        # noinspection PyBroadException
+        try:
+            if target.is_dir():
+                print("Skipped [{}]: is a directory {}".format(key, target))
+                continue
+
+            if not overwrite and target.is_file():
+                print("Skipped [{}]: file exists {}".format(key, target))
+                continue
+        except Exception as ex:
+            print("Skipped [{}]: can't access {} ({})".format(key, target, ex))
+            continue
+
+        if contents:
+            try:
+                if fmt == "base64":
+                    contents = base64.b64decode(contents)
+                    if target_fmt != "bytes":
+                        contents = contents.decode("utf-8")
+            except Exception as ex:
+                print("Skipped [{}]: failed decoding {} ({})".format(key, fmt, ex))
+                continue
+
+        # noinspection PyBroadException
+        try:
+            target.parent.mkdir(parents=True, exist_ok=True)
+        except Exception as ex:
+            print("Skipped [{}]: failed creating path {} ({})".format(key, target.parent, ex))
+            continue
+
+        try:
+            if target_fmt == "bytes":
+                try:
+                    target.write_bytes(contents)
+                except TypeError:
+                    # simpler error so the user won't get confused
+                    raise TypeError("a bytes-like object is required")
+            else:
+                try:
+                    if target_fmt == "json":
+                        text = HOCONConverter.to_json(contents)
+                    elif target_fmt in ("yaml", "yml"):
+                        text = HOCONConverter.to_yaml(contents)
+                    else:
+                        if isinstance(contents, ConfigTree):
+                            contents = contents.as_plain_ordered_dict()
+                        text = str(contents)
+                except Exception as ex:
+                    print("Skipped [{}]: failed encoding to {} ({})".format(key, target_fmt, ex))
+                    continue
+                target.write_text(text)
+            print("Saved [{}]: {}".format(key, target))
+        except Exception as ex:
+            print("Skipped [{}]: failed saving file {} ({})".format(key, target, ex))
+            continue
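
The decoding step in `apply_files` treats `format: base64` as a request to decode the contents, and keeps the result as bytes only when `target_format` is `bytes`. A minimal sketch of that step in isolation follows, using only the standard library. `decode_contents` is a hypothetical helper, and the sample text and temporary path are illustrative.

```python
import base64
import tempfile
from pathlib import Path


def decode_contents(contents, fmt="string", target_fmt="string"):
    """Mirror the base64 handling: decode, then convert to text unless bytes are requested."""
    if contents and fmt == "base64":
        contents = base64.b64decode(contents)
        if target_fmt != "bytes":
            contents = contents.decode("utf-8")
    return contents


encoded = base64.b64encode(b"hello fox").decode("ascii")
as_text = decode_contents(encoded, fmt="base64")
as_bytes = decode_contents(encoded, fmt="base64", target_fmt="bytes")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "fox.txt"
    # parent folders are created on demand, as in apply_files
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(as_text)
    written = target.read_text()
```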
--- a/clearml_agent/commands/base.py
+++ b/clearml_agent/commands/base.py
@@ -118,13 +118,15 @@ class ServiceCommandSection(BaseCommandSection):
        """ The name of the REST service used by this command """
        pass

-    def get(self, endpoint, *args, session=None, **kwargs):
+    def get(self, endpoint, *args, service=None, session=None, **kwargs):
        session = session or self._session
-        return session.get(service=self.service, action=endpoint, *args, **kwargs)
+        service = service or self.service
+        return session.get(service=service, action=endpoint, *args, **kwargs)

-    def post(self, endpoint, *args, session=None, **kwargs):
+    def post(self, endpoint, *args, service=None, session=None, **kwargs):
        session = session or self._session
-        return session.post(service=self.service, action=endpoint, *args, **kwargs)
+        service = service or self.service
+        return session.post(service=service, action=endpoint, *args, **kwargs)

    def get_with_act_as(self, endpoint, *args, **kwargs):
        return self._session.get_with_act_as(service=self.service, action=endpoint, *args, **kwargs)
@@ -347,7 +349,7 @@ class ServiceCommandSection(BaseCommandSection):
        except AttributeError:
            raise NameResolutionError('Name resolution unavailable for {}'.format(service))

-        request = request_cls.from_dict(dict(name=name, only_fields=['name', 'id']))
+        request = request_cls.from_dict(dict(name=re.escape(name), only_fields=['name', 'id']))
        # from_dict will ignore unrecognised keyword arguments - not all GetAll's have only_fields
        response = getattr(self._session.send_api(request), service)
        matches = [db_object for db_object in response if name.lower() == db_object.name.lower()]
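
The change to `re.escape(name)` matters when the receiving side interprets the name as a regular expression, which is an assumption here since the server behavior is not shown in this diff. A minimal sketch of the difference, using a local `re.search` filter as a stand-in for the server-side lookup:

```python
import re

names = ["a.b", "aXb"]


def find(pattern, candidates):
    """Stand-in for a lookup that treats the name as a regular expression."""
    return [name for name in candidates if re.search(pattern, name)]


unescaped = find("a.b", names)           # "." matches any character, so both names are returned
escaped = find(re.escape("a.b"), names)  # the escaped "." matches only a literal dot
```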
--- a/clearml_agent/commands/config.py
+++ b/clearml_agent/commands/config.py
@@ -1,20 +1,21 @@
 from __future__ import print_function

-from six.moves import input
-from pyhocon import ConfigFactory, ConfigMissingException
+from typing import Dict, Optional
+
 from pathlib2 import Path
+from six.moves import input
 from six.moves.urllib.parse import urlparse

 from clearml_agent.backend_api.session import Session
 from clearml_agent.backend_api.session.defs import ENV_HOST
 from clearml_agent.backend_config.defs import LOCAL_CONFIG_FILES
-
+from clearml_agent.external.pyhocon import ConfigFactory, ConfigMissingException

 description = """
-Please create new clearml credentials through the profile page in your `clearml-server` web app, 
-or create a free account at https://app.community.clear.ml/profile
+Please create new clearml credentials through the settings page in your `clearml-server` web app, 
+or create a free account at https://app.clear.ml/settings/webapp-configuration
    
-In the profile page, press "Create new credentials", then press "Copy to clipboard".
+In the settings > workspace page, press "Create new credentials", then press "Copy to clipboard".

 Paste copied configuration here: 
 """
@@ -27,9 +28,9 @@ except Exception:

 host_description = """
 Editing configuration file: {CONFIG_FILE}
-Enter the url of the clearml-server's Web service, for example: {HOST}
+Enter the url of the clearml-server's Web service, for example: {HOST} or https://app.clear.ml
 """.format(
-    CONFIG_FILE=LOCAL_CONFIG_FILES[0],
+    CONFIG_FILE=LOCAL_CONFIG_FILES[-1],
    HOST=def_host,
 )

@@ -84,7 +85,7 @@ def main():
        host = input_url('API Host', api_server)
    else:
        print(host_description)
-        host = input_url('WEB Host', '')
+        host = input_url('WEB Host', 'https://app.clear.ml')

    parsed_host = verify_url(host)
    api_host, files_host, web_host = parse_host(parsed_host, allow_input=True)
@@ -112,13 +113,34 @@ def main():
        print('Exiting setup without creating configuration file')
        return

+    selection = input_options(
+        'Default Output URI (used to automatically store models and artifacts)',
+        {'N': 'None', 'S': 'ClearML Server', 'C': 'Custom'},
+        default='None'
+    )
+    if selection == 'Custom':
+        print('Custom Default Output URI: ', end='')
+        default_output_uri = input().strip()
+    elif selection == "ClearML Server":
+        default_output_uri = files_host
+    else:
+        default_output_uri = None
+
+    print('\nDefault Output URI: {}'.format(default_output_uri if default_output_uri else 'not set'))
+
    # get GIT User/Pass for cloning
    print('Enter git username for repository cloning (leave blank for SSH key authentication): [] ', end='')
    git_user = input()
    if git_user.strip():
-        print('Enter password for user \'{}\': '.format(git_user), end='')
+        print(
+            "Git personal token is equivalent to a password, to learn how to generate a token:\n"
+            "  GitHub: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token\n"  # noqa
+            "  Bitbucket: https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/\n"
+            "  GitLab: https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html\n"
+        )
+        print('Enter git personal token for user \'{}\': '.format(git_user), end='')
        git_pass = input()
-        print('Git repository cloning will be using user={} password={}'.format(git_user, git_pass))
+        print('Git repository cloning will be using user={} token={}'.format(git_user, git_pass))
    else:
        git_user = None
        git_pass = None
@@ -157,7 +179,7 @@ def main():
                     '    api_server: %s\n' \
                     '    web_server: %s\n' \
                     '    files_server: %s\n' \
-                     '    # Credentials are generated using the webapp, %s/profile\n' \
+                     '    # Credentials are generated using the webapp, %s/settings\n' \
                     '    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY\n' \
                     '    credentials {"access_key": "%s", "secret_key": "%s"}\n' \
                     '}\n\n' % (api_host, web_host, files_host,
@@ -173,6 +195,13 @@ def main():
                              'agent.package_manager.extra_index_url= ' \
                              '[\n{}\n]\n\n'.format("\n".join(map("\"{}\"".format, extra_index_urls)))
            f.write(extra_index_str)
+            if default_output_uri:
+                default_output_url_str = '# Default Task output_uri. if output_uri is not provided to Task.init, ' \
+                                         'default_output_uri will be used instead.\n' \
+                                         'sdk.development.default_output_uri="{}"\n' \
+                                         '\n'.format(default_output_uri.strip('"'))
+                f.write(default_output_url_str)
+                default_conf = default_conf.replace('default_output_uri: ""', '# default_output_uri: ""')
            f.write(default_conf)
    except Exception:
        print('Error! Could not write configuration file at: {}'.format(str(conf_file)))
@@ -299,6 +328,25 @@ def input_url(host_type, host=None):
    return host


+def input_options(message, options, default=None):
+    # type: (str, Dict[str, str], Optional[str]) -> str
+    options_msg = "/".join(
+        "".join(('(' + c.upper() + ')') if c == o else c for c in option)
+        for o, option in options.items()
+    )
+    if default:
+        options_msg += " [{}]".format(default)
+    while True:
+        print('{}: {} '.format(message, options_msg), end='')
+        res = input().strip()
+        if not res:
+            return default
+        elif res.lower() in options:
+            return options[res.lower()]
+        elif res.upper() in options:
+            return options[res.upper()]
+
+
 def input_host_port(host_type, parsed_host):
    print('Enter port for {} host '.format(host_type), end='')
    replace_port = input().lower()
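
The prompt text built by `input_options` in the hunk above can be checked without any interactive input. This sketch splits the same logic into two pure functions; the names `build_options_message` and `resolve_choice` are hypothetical and not part of the diff.

```python
from typing import Dict, Optional


def build_options_message(options, default=None):
    # type: (Dict[str, str], Optional[str]) -> str
    """Render option labels with their shortcut key in parentheses."""
    parts = []
    for key, label in options.items():
        # Wrap every character equal to the shortcut key, e.g. "None" -> "(N)one".
        parts.append("".join("(" + c.upper() + ")" if c == key else c for c in label))
    message = "/".join(parts)
    if default:
        message += " [{}]".format(default)
    return message


def resolve_choice(options, answer, default=None):
    # type: (Dict[str, str], str, Optional[str]) -> Optional[str]
    """Map a typed answer to its label.

    An empty answer yields the default. An unrecognized answer yields None
    here, whereas the prompt in the diff loops and asks again.
    """
    answer = answer.strip()
    if not answer:
        return default
    for candidate in (answer.lower(), answer.upper()):
        if candidate in options:
            return options[candidate]
    return None


options = {'N': 'None', 'S': 'ClearML Server', 'C': 'Custom'}
prompt = build_options_message(options, default='None')
```

Because the shortcut keys are upper case, a lower-case answer such as `s` is resolved through the `upper()` lookup.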
--- a/clearml_agent/commands/events.py
+++ b/clearml_agent/commands/events.py
@@ -3,8 +3,6 @@ from __future__ import print_function
 import json
 import time

-from future.builtins import super
-
 from clearml_agent.commands.base import ServiceCommandSection
 from clearml_agent.helper.base import return_list

--- a/clearml_agent/commands/resolver.py
+++ b/clearml_agent/commands/resolver.py
@@ -0,0 +1,168 @@
+import json
+import re
+import shlex
+
+from clearml_agent.backend_api.session import Request
+from clearml_agent.helper.package.requirements import (
+    RequirementsManager, MarkerRequirement,
+    compare_version_rules, )
+
+
+def resolve_default_container(session, task_id, container_config):
+    container_lookup = session.config.get('agent.default_docker.match_rules', None)
+    if not session.check_min_api_version("2.13") or not container_lookup:
+        return container_config
+
+    # check backend support before sending any more requests (because they will fail and crash the Task)
+    try:
+        session.verify_feature_set('advanced')
+    except ValueError:
+        return container_config
+
+    result = session.send_request(
+        service='tasks',
+        action='get_all',
+        version='2.14',
+        json={'id': [task_id],
+              'only_fields': ['script.requirements', 'script.binary',
+                              'script.repository', 'script.branch',
+                              'project', 'container'],
+              'search_hidden': True},
+        method=Request.def_method,
+        async_enable=False,
+    )
+    try:
+        task_info = result.json()['data']['tasks'][0] if result.ok else {}
+    except (ValueError, TypeError):
+        return container_config
+
+    from clearml_agent.external.requirements_parser.requirement import Requirement
+
+    # store tasks repository
+    repository = task_info.get('script', {}).get('repository') or ''
+    branch = task_info.get('script', {}).get('branch') or ''
+    binary = task_info.get('script', {}).get('binary') or ''
+    requested_container = task_info.get('container', {})
+
+    # get project full path
+    project_full_name = ''
+    if task_info.get('project', None):
+        result = session.send_request(
+            service='projects',
+            action='get_all',
+            version='2.13',
+            json={
+                'id': [task_info.get('project')],
+                'only_fields': ['name'],
+            },
+            method=Request.def_method,
+            async_enable=False,
+        )
+        try:
+            if result.ok:
+                project_full_name = result.json()['data']['projects'][0]['name'] or ''
+        except (ValueError, TypeError):
+            pass
+
+    task_packages_lookup = {}
+    for entry in container_lookup:
+        match = entry.get('match', None)
+        if not match:
+            continue
+        if match.get('project', None):
+            # noinspection PyBroadException
+            try:
+                if not re.search(match.get('project', None), project_full_name):
+                    continue
+            except Exception:
+                print('Failed parsing regular expression \"{}\" in rule: {}'.format(
+                    match.get('project', None), entry))
+                continue
+
+        if match.get('script.repository', None):
+            # noinspection PyBroadException
+            try:
+                if not re.search(match.get('script.repository', None), repository):
+                    continue
+            except Exception:
+                print('Failed parsing regular expression \"{}\" in rule: {}'.format(
+                    match.get('script.repository', None), entry))
+                continue
+
+        if match.get('script.branch', None):
+            # noinspection PyBroadException
+            try:
+                if not re.search(match.get('script.branch', None), branch):
+                    continue
+            except Exception:
+                print('Failed parsing regular expression \"{}\" in rule: {}'.format(
+                    match.get('script.branch', None), entry))
+                continue
+
+        if match.get('script.binary', None):
+            # noinspection PyBroadException
+            try:
+                if not re.search(match.get('script.binary', None), binary):
+                    continue
+            except Exception:
+                print('Failed parsing regular expression \"{}\" in rule: {}'.format(
+                    match.get('script.binary', None), entry))
+                continue
+
+        if match.get('container', None):
+            # noinspection PyBroadException
+            try:
+                if not re.search(match.get('container', None), requested_container.get('image', '')):
+                    continue
+            except Exception:
+                print('Failed parsing regular expression \"{}\" in rule: {}'.format(
+                    match.get('container', None), entry))
+                continue
+
+        matched = True
+        for req_section in ['script.requirements.pip', 'script.requirements.conda']:
+            if not match.get(req_section, None):
+                continue
+
+            match_pip_reqs = [MarkerRequirement(Requirement.parse('{} {}'.format(k, v)))
+                              for k, v in match.get(req_section, None).items()]
+
+            if not task_packages_lookup.get(req_section):
+                req_section_parts = req_section.split('.')
+                task_packages_lookup[req_section] = \
+                    RequirementsManager.parse_requirements_section_to_marker_requirements(
+                        requirements=task_info.get(req_section_parts[0], {}).get(
+                            req_section_parts[1], {}).get(req_section_parts[2], None)
+                    )
+
+            matched_all_reqs = True
+            for mr in match_pip_reqs:
+                matched_req = False
+                for pr in task_packages_lookup[req_section]:
+                    if mr.req.name != pr.req.name:
+                        continue
+                    if compare_version_rules(mr.specs, pr.specs):
+                        matched_req = True
+                        break
+                if not matched_req:
+                    matched_all_reqs = False
+                    break
+
+            # if we have a match, check the second section
+            if matched_all_reqs:
+                continue
+            # no match stop
+            matched = False
+            break
+
+        if matched:
+            if not container_config.get('container'):
+                container_config['container'] = entry.get('image', None)
+            if not container_config.get('arguments'):
+                container_config['arguments'] = entry.get('arguments', None)
+                container_config['arguments'] = shlex.split(str(container_config.get('arguments') or '').strip())
+            print('Matching default container with rule:\n{}'.format(json.dumps(entry)))
+            return container_config
+
+    return container_config
+
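
The rule loop in `resolve_default_container` above reads `agent.default_docker.match_rules` entries with `image`, `arguments` and `match` keys. Below is a simplified sketch of the regex part of that matching. The rule values, image names, task fields and the helper `first_matching_rule` are all hypothetical. It leaves out the `script.requirements.*` version comparison and assumes every `match` value is a regex string.

```python
import re

# Hypothetical rules in the shape the diff reads: 'image', 'arguments', 'match'.
MATCH_RULES = [
    {
        "image": "example/cpu-image:latest",
        "arguments": "--ipc=host",
        "match": {"project": "^examples/", "script.branch": "^main$"},
    },
    {
        "image": "example/gpu-image:latest",
        "arguments": "",
        "match": {"script.repository": "github\\.com/acme/"},
    },
]


def first_matching_rule(rules, task_fields):
    """Return the first rule whose regex fields all match, else None."""
    for entry in rules:
        match = entry.get("match")
        if not match:
            continue  # rules without a 'match' section are skipped
        try:
            ok = all(
                re.search(pattern, task_fields.get(field) or "")
                for field, pattern in match.items()
            )
        except re.error:
            # An invalid pattern disqualifies the rule instead of raising.
            print("Failed parsing regular expression in rule: {}".format(entry))
            continue
        if ok:
            return entry
    return None


task = {
    "project": "examples/vision",
    "script.repository": "https://github.com/acme/models.git",
    "script.branch": "dev",
}
rule = first_matching_rule(MATCH_RULES, task)
```

Rules are evaluated in order and the first full match wins, so more specific rules would normally be listed first.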
--- a/clearml_agent/commands/worker.py
+++ b/clearml_agent/commands/worker.py
--- a/clearml_agent/config.py
+++ b/clearml_agent/config.py
@@ -1,6 +1,6 @@
-from pyhocon import ConfigTree
-
 import six
+
+from clearml_agent.external.pyhocon import ConfigTree
 from clearml_agent.helper.base import Singleton


--- a/clearml_agent/definitions.py
+++ b/clearml_agent/definitions.py
@@ -5,9 +5,9 @@ from enum import IntEnum
 from os import getenv, environ
 from typing import Text, Optional, Union, Tuple, Any

+import six
 from pathlib2 import Path

-import six
 from clearml_agent.helper.base import normalize_path

 PROGRAM_NAME = "clearml-agent"
@@ -69,41 +69,65 @@ ENV_AWS_SECRET_KEY = EnvironmentConfig("AWS_SECRET_ACCESS_KEY")
 ENV_AZURE_ACCOUNT_KEY = EnvironmentConfig("AZURE_STORAGE_KEY")

 ENVIRONMENT_CONFIG = {
-    "api.api_server": EnvironmentConfig("CLEARML_API_HOST", "TRAINS_API_HOST", ),
-    "api.files_server": EnvironmentConfig("CLEARML_FILES_HOST", "TRAINS_FILES_HOST", ),
-    "api.web_server": EnvironmentConfig("CLEARML_WEB_HOST", "TRAINS_WEB_HOST", ),
+    "api.api_server": EnvironmentConfig(
+        "CLEARML_API_HOST",
+        "TRAINS_API_HOST",
+    ),
+    "api.files_server": EnvironmentConfig(
+        "CLEARML_FILES_HOST",
+        "TRAINS_FILES_HOST",
+    ),
+    "api.web_server": EnvironmentConfig(
+        "CLEARML_WEB_HOST",
+        "TRAINS_WEB_HOST",
+    ),
    "api.credentials.access_key": EnvironmentConfig(
-        "CLEARML_API_ACCESS_KEY", "TRAINS_API_ACCESS_KEY",
+        "CLEARML_API_ACCESS_KEY",
+        "TRAINS_API_ACCESS_KEY",
    ),
    "api.credentials.secret_key": ENV_AGENT_SECRET_KEY,
-    "agent.worker_name": EnvironmentConfig("CLEARML_WORKER_NAME", "TRAINS_WORKER_NAME", ),
-    "agent.worker_id": EnvironmentConfig("CLEARML_WORKER_ID", "TRAINS_WORKER_ID", ),
-    "agent.cuda_version": EnvironmentConfig(
-        "CLEARML_CUDA_VERSION", "TRAINS_CUDA_VERSION", "CUDA_VERSION"
+    "agent.worker_name": EnvironmentConfig(
+        "CLEARML_WORKER_NAME",
+        "TRAINS_WORKER_NAME",
    ),
-    "agent.cudnn_version": EnvironmentConfig(
-        "CLEARML_CUDNN_VERSION", "TRAINS_CUDNN_VERSION", "CUDNN_VERSION"
-    ),
-    "agent.cpu_only": EnvironmentConfig(
-        names=("CLEARML_CPU_ONLY", "TRAINS_CPU_ONLY", "CPU_ONLY"), type=bool
+    "agent.worker_id": EnvironmentConfig(
+        "CLEARML_WORKER_ID",
+        "TRAINS_WORKER_ID",
    ),
+    "agent.cuda_version": EnvironmentConfig("CLEARML_CUDA_VERSION", "TRAINS_CUDA_VERSION", "CUDA_VERSION"),
+    "agent.cudnn_version": EnvironmentConfig("CLEARML_CUDNN_VERSION", "TRAINS_CUDNN_VERSION", "CUDNN_VERSION"),
+    "agent.cpu_only": EnvironmentConfig(names=("CLEARML_CPU_ONLY", "TRAINS_CPU_ONLY", "CPU_ONLY"), type=bool),
+    "agent.crash_on_exception": EnvironmentConfig("CLEAMRL_AGENT_CRASH_ON_EXCEPTION", type=bool),
    "sdk.aws.s3.key": EnvironmentConfig("AWS_ACCESS_KEY_ID"),
    "sdk.aws.s3.secret": ENV_AWS_SECRET_KEY,
    "sdk.aws.s3.region": EnvironmentConfig("AWS_DEFAULT_REGION"),
-    "sdk.azure.storage.containers.0": {'account_name': EnvironmentConfig("AZURE_STORAGE_ACCOUNT"),
-                                       'account_key': ENV_AZURE_ACCOUNT_KEY},
+    "sdk.azure.storage.containers.0": {
+        "account_name": EnvironmentConfig("AZURE_STORAGE_ACCOUNT"),
+        "account_key": ENV_AZURE_ACCOUNT_KEY,
+    },
    "sdk.google.storage.credentials_json": EnvironmentConfig("GOOGLE_APPLICATION_CREDENTIALS"),
 }

 ENVIRONMENT_SDK_PARAMS = {
-    "task_id": ("CLEARML_TASK_ID", "TRAINS_TASK_ID", ),
-    "config_file": ("CLEARML_CONFIG_FILE", "TRAINS_CONFIG_FILE", ),
-    "log_level": ("CLEARML_LOG_LEVEL", "TRAINS_LOG_LEVEL", ),
-    "log_to_backend": ("CLEARML_LOG_TASK_TO_BACKEND", "TRAINS_LOG_TASK_TO_BACKEND", ),
+    "task_id": (
+        "CLEARML_TASK_ID",
+        "TRAINS_TASK_ID",
+    ),
+    "config_file": (
+        "CLEARML_CONFIG_FILE",
+        "TRAINS_CONFIG_FILE",
+    ),
+    "log_level": (
+        "CLEARML_LOG_LEVEL",
+        "TRAINS_LOG_LEVEL",
+    ),
+    "log_to_backend": (
+        "CLEARML_LOG_TASK_TO_BACKEND",
+        "TRAINS_LOG_TASK_TO_BACKEND",
+    ),
 }

-ENVIRONMENT_BACKWARD_COMPATIBLE = EnvironmentConfig(
-    names=("CLEARML_AGENT_ALG_ENV", "TRAINS_AGENT_ALG_ENV"), type=bool)
+ENVIRONMENT_BACKWARD_COMPATIBLE = EnvironmentConfig(names=("CLEARML_AGENT_ALG_ENV", "TRAINS_AGENT_ALG_ENV"), type=bool)

 VIRTUAL_ENVIRONMENT_PATH = {
    "python2": normalize_path(CONFIG_DIR, "py2venv"),
@@ -122,30 +146,93 @@ TOKEN_EXPIRATION_SECONDS = int(timedelta(days=2).total_seconds())

 METADATA_EXTENSION = ".json"

-DEFAULT_VENV_UPDATE_URL = (
-    "https://raw.githubusercontent.com/Yelp/venv-update/v3.2.4/venv_update.py"
-)
+DEFAULT_VENV_UPDATE_URL = "https://raw.githubusercontent.com/Yelp/venv-update/v3.2.4/venv_update.py"
 WORKING_REPOSITORY_DIR = "task_repository"
+WORKING_STANDALONE_DIR = "code"
 DEFAULT_VCS_CACHE = normalize_path(CONFIG_DIR, "vcs-cache")
-PIP_EXTRA_INDICES = [
-]
+PIP_EXTRA_INDICES = []
 DEFAULT_PIP_DOWNLOAD_CACHE = normalize_path(CONFIG_DIR, "pip-download-cache")
-ENV_DOCKER_IMAGE = EnvironmentConfig('CLEARML_DOCKER_IMAGE', 'TRAINS_DOCKER_IMAGE')
-ENV_WORKER_ID = EnvironmentConfig('CLEARML_WORKER_ID', 'TRAINS_WORKER_ID')
-ENV_WORKER_TAGS = EnvironmentConfig('CLEARML_WORKER_TAGS')
-ENV_AGENT_SKIP_PIP_VENV_INSTALL = EnvironmentConfig('CLEARML_AGENT_SKIP_PIP_VENV_INSTALL')
-ENV_DOCKER_SKIP_GPUS_FLAG = EnvironmentConfig('CLEARML_DOCKER_SKIP_GPUS_FLAG', 'TRAINS_DOCKER_SKIP_GPUS_FLAG')
-ENV_AGENT_GIT_USER = EnvironmentConfig('CLEARML_AGENT_GIT_USER', 'TRAINS_AGENT_GIT_USER')
-ENV_AGENT_GIT_PASS = EnvironmentConfig('CLEARML_AGENT_GIT_PASS', 'TRAINS_AGENT_GIT_PASS')
-ENV_AGENT_GIT_HOST = EnvironmentConfig('CLEARML_AGENT_GIT_HOST', 'TRAINS_AGENT_GIT_HOST')
-ENV_AGENT_DISABLE_SSH_MOUNT = EnvironmentConfig('CLEARML_AGENT_DISABLE_SSH_MOUNT', type=bool)
-ENV_SSH_AUTH_SOCK = EnvironmentConfig('SSH_AUTH_SOCK')
-ENV_TASK_EXECUTE_AS_USER = EnvironmentConfig('CLEARML_AGENT_EXEC_USER', 'TRAINS_AGENT_EXEC_USER')
-ENV_TASK_EXTRA_PYTHON_PATH = EnvironmentConfig('CLEARML_AGENT_EXTRA_PYTHON_PATH', 'TRAINS_AGENT_EXTRA_PYTHON_PATH')
-ENV_DOCKER_HOST_MOUNT = EnvironmentConfig('CLEARML_AGENT_K8S_HOST_MOUNT', 'CLEARML_AGENT_DOCKER_HOST_MOUNT',
-                                          'TRAINS_AGENT_K8S_HOST_MOUNT', 'TRAINS_AGENT_DOCKER_HOST_MOUNT')
-ENV_VENV_CACHE_PATH = EnvironmentConfig('CLEARML_AGENT_VENV_CACHE_PATH')
-ENV_EXTRA_DOCKER_ARGS = EnvironmentConfig('CLEARML_AGENT_EXTRA_DOCKER_ARGS', type=list)
+ENV_DOCKER_IMAGE = EnvironmentConfig("CLEARML_DOCKER_IMAGE", "TRAINS_DOCKER_IMAGE")
+ENV_WORKER_ID = EnvironmentConfig("CLEARML_WORKER_ID", "TRAINS_WORKER_ID")
+ENV_WORKER_TAGS = EnvironmentConfig("CLEARML_WORKER_TAGS")
+ENV_AGENT_SKIP_PIP_VENV_INSTALL = EnvironmentConfig("CLEARML_AGENT_SKIP_PIP_VENV_INSTALL")
+ENV_AGENT_SKIP_PYTHON_ENV_INSTALL = EnvironmentConfig("CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL", type=bool)
+ENV_DOCKER_SKIP_GPUS_FLAG = EnvironmentConfig("CLEARML_DOCKER_SKIP_GPUS_FLAG", "TRAINS_DOCKER_SKIP_GPUS_FLAG")
+ENV_AGENT_GIT_USER = EnvironmentConfig("CLEARML_AGENT_GIT_USER", "TRAINS_AGENT_GIT_USER")
+ENV_AGENT_GIT_PASS = EnvironmentConfig("CLEARML_AGENT_GIT_PASS", "TRAINS_AGENT_GIT_PASS")
+ENV_AGENT_GIT_HOST = EnvironmentConfig("CLEARML_AGENT_GIT_HOST", "TRAINS_AGENT_GIT_HOST")
+ENV_AGENT_DISABLE_SSH_MOUNT = EnvironmentConfig("CLEARML_AGENT_DISABLE_SSH_MOUNT", type=bool)
+ENV_SSH_AUTH_SOCK = EnvironmentConfig("SSH_AUTH_SOCK")
+ENV_TASK_EXECUTE_AS_USER = EnvironmentConfig("CLEARML_AGENT_EXEC_USER", "TRAINS_AGENT_EXEC_USER")
+ENV_TASK_EXTRA_PYTHON_PATH = EnvironmentConfig("CLEARML_AGENT_EXTRA_PYTHON_PATH", "TRAINS_AGENT_EXTRA_PYTHON_PATH")
+ENV_DOCKER_HOST_MOUNT = EnvironmentConfig(
+    "CLEARML_AGENT_K8S_HOST_MOUNT",
+    "CLEARML_AGENT_DOCKER_HOST_MOUNT",
+    "TRAINS_AGENT_K8S_HOST_MOUNT",
+    "TRAINS_AGENT_DOCKER_HOST_MOUNT",
+)
+ENV_VENV_CACHE_PATH = EnvironmentConfig("CLEARML_AGENT_VENV_CACHE_PATH")
+ENV_EXTRA_DOCKER_ARGS = EnvironmentConfig("CLEARML_AGENT_EXTRA_DOCKER_ARGS", type=list)
+ENV_DEBUG_INFO = EnvironmentConfig("CLEARML_AGENT_DEBUG_INFO")
+ENV_CHILD_AGENTS_COUNT_CMD = EnvironmentConfig("CLEARML_AGENT_CHILD_AGENTS_COUNT_CMD")
+ENV_DOCKER_ARGS_FILTERS = EnvironmentConfig("CLEARML_AGENT_DOCKER_ARGS_FILTERS")
+ENV_DOCKER_ARGS_HIDE_ENV = EnvironmentConfig("CLEARML_AGENT_DOCKER_ARGS_HIDE_ENV")
+ENV_CONFIG_BC_IN_STANDALONE = EnvironmentConfig("CLEARML_AGENT_STANDALONE_CONFIG_BC", type=bool)
+""" Maintain backwards compatible configuration when launching in standalone mode """
+
+ENV_SERVICES_DOCKER_RESTART = EnvironmentConfig("CLEARML_AGENT_SERVICES_DOCKER_RESTART")
+"""
+    Specify a restart value for a services agent's task containers.
+    Note that when a restart value is provided, task containers will not be run with the '--rm' flag and will
+     not be cleaned up automatically when completed (this will need to be done externally using the
+     'docker container prune' command to free up resources).
+    Value format for this env var is "<restart-value>;<task-selector>", where:
+    - <restart-value> can be any valid restart value for docker-run (see https://docs.docker.com/engine/reference/commandline/run/#restart)
+    - <task-selector> is optional, allowing to restrict this behaviour to specific tasks. The format is:
+        "<path-to-task-field>=<value>" where:
+        * <path-to-task-field> is a dot-separated path to a task field (e.g. "container.image")
+        * <value> is optional. If not provided, the restart policy will be applied for the task container if the
+            path provided exists. If provided, the restart policy will be applied if the value matches the value
+            obtained from the task (value parsing and comparison is based on the type of value obtained from the task) 
+    For example:
+        CLEARML_AGENT_SERVICES_DOCKER_RESTART=unless-stopped
+        CLEARML_AGENT_SERVICES_DOCKER_RESTART=unless-stopped;container.image=some-image
+"""
+
+ENV_FORCE_SYSTEM_SITE_PACKAGES = EnvironmentConfig("CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES", type=bool)
+""" Force system_site_packages: true when running tasks in containers (i.e. docker mode or k8s glue) """
+
+ENV_CUSTOM_BUILD_SCRIPT = EnvironmentConfig("CLEARML_AGENT_CUSTOM_BUILD_SCRIPT")
+"""
+    Specifies a custom environment setup script to be executed instead of installing a virtual environment.
+    If provided, this script is executed following Git cloning. The script command may include environment variables,
+    which are expanded before execution (e.g. "$CLEARML_GIT_ROOT/script.sh").
+    The script can also be specified using the `agent.custom_build_script` configuration setting.
+    
+    When running the script, the following environment variables will be set:
+    - CLEARML_CUSTOM_BUILD_TASK_CONFIG_JSON: specifies a path to a temporary file containing the complete task
+     contents in JSON format
+    - CLEARML_TASK_SCRIPT_ENTRY: task entrypoint script as defined in the task's script section
+    - CLEARML_TASK_WORKING_DIR: task working directory as defined in the task's script section
+    - CLEARML_VENV_PATH: path to the agent's default virtual environment (as defined in the configuration)
+    - CLEARML_GIT_ROOT: path to the cloned Git repository
+    - CLEARML_CUSTOM_BUILD_OUTPUT: a path to a non-existing file that may be created by the script. If created,
+     this file must be in the following JSON format:
+         ```json
+         {
+           "binary": "/absolute/path/to/python-executable",
+           "entry_point": "/absolute/path/to/task-entrypoint-script",
+           "working_dir": "/absolute/path/to/task-working/dir"
+         }
+         ```
+     If provided, the agent will use these instead of the predefined task script section to execute the task and will
+     skip virtual environment creation.
+    
+    In case the custom script returns with a non-zero exit code, the agent will fail with the same exit code.
+    In case the custom script is specified but does not exist, or if the custom script does not write valid content
+    into the file specified in CLEARML_CUSTOM_BUILD_OUTPUT, the agent will emit a warning and continue with the
+    standard flow.
+"""


 class FileBuffering(IntEnum):
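
The `CLEARML_AGENT_CUSTOM_BUILD_SCRIPT` docstring above describes a JSON file that the script may write to the path given in `CLEARML_CUSTOM_BUILD_OUTPUT`. Below is a minimal sketch of such a script body in Python. Only the environment variable names and the three JSON keys come from the docstring. The helper name, the interpreter path, and the assumption that the entry point sits under the working directory inside the Git root are illustrative.

```python
import json
import os


def write_build_output(env, python_binary):
    """Write the JSON document described for CLEARML_CUSTOM_BUILD_OUTPUT."""
    git_root = env["CLEARML_GIT_ROOT"]
    # Assumption: the task working dir is relative to the cloned repository.
    working_dir = os.path.abspath(
        os.path.join(git_root, env.get("CLEARML_TASK_WORKING_DIR", "."))
    )
    # Assumption: the entry point is relative to the working directory.
    entry_point = os.path.join(working_dir, env["CLEARML_TASK_SCRIPT_ENTRY"])
    output = {
        "binary": python_binary,     # absolute path to the python executable
        "entry_point": entry_point,  # absolute path to the task entrypoint script
        "working_dir": working_dir,  # absolute path to the task working directory
    }
    with open(env["CLEARML_CUSTOM_BUILD_OUTPUT"], "w") as f:
        json.dump(output, f)
    return output


if __name__ == "__main__" and "CLEARML_CUSTOM_BUILD_OUTPUT" in os.environ:
    # Hypothetical interpreter path; a real script would create or locate one.
    write_build_output(os.environ, "/opt/custom-env/bin/python")
```

Per the docstring, the script must also exit with code zero; a non-zero exit makes the agent fail with the same code.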
--- a/clearml_agent/errors.py
+++ b/clearml_agent/errors.py
@@ -84,3 +84,13 @@ class MissingPackageError(CommandFailedError):
    def __str__(self):
        return '{self.__class__.__name__}: ' \
               '"{self.name}" package is required. Please run "pip install {self.name}"'.format(self=self)
+
+
+class CustomBuildScriptFailed(CommandFailedError):
+    def __init__(self, errno, *args, **kwargs):
+        super(CustomBuildScriptFailed, self).__init__(*args, **kwargs)
+        self.errno = errno
+
+
+class SkippedCustomBuildScript(CommandFailedError):
+    pass
--- a/clearml_agent/external/pyhocon/__init__.py
+++ b/clearml_agent/external/pyhocon/__init__.py
@@ -0,0 +1,5 @@
+from .config_parser import ConfigParser, ConfigFactory, ConfigMissingException
+from .config_tree import ConfigTree
+from .converter import HOCONConverter
+
+__all__ = ["ConfigParser", "ConfigFactory", "ConfigMissingException", "ConfigTree", "HOCONConverter"]
--- a/clearml_agent/external/pyhocon/config_parser.py
+++ b/clearml_agent/external/pyhocon/config_parser.py
@@ -0,0 +1,762 @@
+import itertools
+import re
+import os
+import socket
+import contextlib
+import codecs
+from datetime import timedelta
+
+from pyparsing import Forward, Keyword, QuotedString, Word, Literal, Suppress, Regex, Optional, SkipTo, ZeroOrMore, \
+    Group, lineno, col, TokenConverter, replaceWith, alphanums, alphas8bit, ParseSyntaxException, StringEnd
+from pyparsing import ParserElement
+from .config_tree import ConfigTree, ConfigSubstitution, ConfigList, ConfigValues, ConfigUnquotedString, \
+    ConfigInclude, NoneValue, ConfigQuotedString
+from .exceptions import ConfigSubstitutionException, ConfigMissingException, ConfigException
+import logging
+import copy
+
+use_urllib2 = False
+try:
+    # For Python 3.0 and later
+    from urllib.request import urlopen
+    from urllib.error import HTTPError, URLError
+except ImportError:  # pragma: no cover
+    # Fall back to Python 2's urllib2
+    from urllib2 import urlopen, HTTPError, URLError
+
+    use_urllib2 = True
+try:
+    basestring
+except NameError:  # pragma: no cover
+    basestring = str
+    unicode = str
+
+logger = logging.getLogger(__name__)
+
+#
+# Substitution Defaults
+#
+
+
+class DEFAULT_SUBSTITUTION(object):
+    pass
+
+
+class MANDATORY_SUBSTITUTION(object):
+    pass
+
+
+class NO_SUBSTITUTION(object):
+    pass
+
+
+class STR_SUBSTITUTION(object):
+    pass
+
+
+def period(period_value, period_unit):
+    try:
+        from dateutil.relativedelta import relativedelta as period_impl
+    except Exception:
+        from datetime import timedelta as period_impl
+
+    if period_unit == 'nanoseconds':
+        period_unit = 'microseconds'
+        period_value = int(period_value / 1000)
+
+    arguments = dict(zip((period_unit,), (period_value,)))
+
+    if period_unit == 'milliseconds':
+        return timedelta(**arguments)
+
+    return period_impl(**arguments)
+
+
+class ConfigFactory(object):
+
+    @classmethod
+    def parse_file(cls, filename, encoding='utf-8', required=True, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
+        """Parse file
+
+        :param filename: filename
+        :type filename: basestring
+        :param encoding: file encoding
+        :type encoding: basestring
+        :param required: If true, raises an exception if can't load file
+        :type required: boolean
+        :param resolve: if true, resolve substitutions
+        :type resolve: boolean
+        :param unresolved_value: value assigned to unresolved substitutions.
+            If overridden with a default value, it will replace all unresolved values with the default value.
+            If it is set to pyhocon.STR_SUBSTITUTION then it will replace the value by its
+            substitution expression (e.g., ${x})
+        :type unresolved_value: class
+        :return: Config object
+        :type return: Config
+        """
+        try:
+            with codecs.open(filename, 'r', encoding=encoding) as fd:
+                content = fd.read()
+                return cls.parse_string(content, os.path.dirname(filename), resolve, unresolved_value)
+        except IOError as e:
+            if required:
+                raise e
+            logger.warn('Cannot include file %s. File does not exist or cannot be read.', filename)
+            return []
+
+    @classmethod
+    def parse_URL(cls, url, timeout=None, resolve=True, required=False, unresolved_value=DEFAULT_SUBSTITUTION):
+        """Parse URL
+
+        :param url: url to parse
+        :type url: basestring
+        :param resolve: if true, resolve substitutions
+        :type resolve: boolean
+        :param unresolved_value: value assigned to unresolved substitutions.
+            If overridden with a default value, it will replace all unresolved values with the default value.
+            If it is set to pyhocon.STR_SUBSTITUTION then it will replace the value by
+            its substitution expression (e.g., ${x})
+        :type unresolved_value: class
+        :return: Config object or []
+        :type return: Config or list
+        """
+        socket_timeout = socket._GLOBAL_DEFAULT_TIMEOUT if timeout is None else timeout
+
+        try:
+            with contextlib.closing(urlopen(url, timeout=socket_timeout)) as fd:
+                content = fd.read() if use_urllib2 else fd.read().decode('utf-8')
+                return cls.parse_string(content, os.path.dirname(url), resolve, unresolved_value)
+        except (HTTPError, URLError) as e:
+            logger.warn('Cannot include url %s. Resource is inaccessible.', url)
+            if required:
+                raise e
+            else:
+                return []
+
+    @classmethod
+    def parse_string(cls, content, basedir=None, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
+        """Parse string
+
+        :param content: content to parse
+        :type content: basestring
+        :param resolve: If true, resolve substitutions
+        :param resolve: if true, resolve substitutions
+        :type resolve: boolean
+        :param unresolved_value: value assigned to unresolved substitutions.
+            If overridden with a default value, it will replace all unresolved values with the default value.
+            If it is set to pyhocon.STR_SUBSTITUTION then it will replace the value by
+            its substitution expression (e.g., ${x})
+        :type unresolved_value: class
+        :return: Config object
+        :type return: Config
+        """
+        return ConfigParser().parse(content, basedir, resolve, unresolved_value)
+
+    @classmethod
+    def from_dict(cls, dictionary, root=False):
+        """Convert dictionary (and ordered dictionary) into a ConfigTree
+        :param dictionary: dictionary to convert
+        :type dictionary: dict
+        :return: Config object
+        :type return: Config
+        """
+
+        def create_tree(value):
+            if isinstance(value, dict):
+                res = ConfigTree(root=root)
+                for key, child_value in value.items():
+                    res.put(key, create_tree(child_value))
+                return res
+            if isinstance(value, list):
+                return [create_tree(v) for v in value]
+            else:
+                return value
+
+        return create_tree(dictionary)
+
+
+class ConfigParser(object):
+    """
+    Parse HOCON files: https://github.com/typesafehub/config/blob/master/HOCON.md
+    """
+
+    REPLACEMENTS = {
+        '\\\\': '\\',
+        '\\\n': '\n',
+        '\\n': '\n',
+        '\\r': '\r',
+        '\\t': '\t',
+        '\\=': '=',
+        '\\#': '#',
+        '\\!': '!',
+        '\\"': '"',
+    }
+
+    period_type_map = {
+        'nanoseconds': ['ns', 'nano', 'nanos', 'nanosecond', 'nanoseconds'],
+
+        'microseconds': ['us', 'micro', 'micros', 'microsecond', 'microseconds'],
+        'milliseconds': ['ms', 'milli', 'millis', 'millisecond', 'milliseconds'],
+        'seconds': ['s', 'second', 'seconds'],
+        'minutes': ['m', 'minute', 'minutes'],
+        'hours': ['h', 'hour', 'hours'],
+        'weeks': ['w', 'week', 'weeks'],
+        'days': ['d', 'day', 'days'],
+
+    }
+
+    optional_period_type_map = {
+        'months': ['mo', 'month', 'months'],  # 'm' from hocon spec removed. conflicts with minutes syntax.
+        'years': ['y', 'year', 'years']
+    }
+
+    supported_period_map = None
+
+    @classmethod
+    def get_supported_period_type_map(cls):
+        if cls.supported_period_map is None:
+            cls.supported_period_map = {}
+            cls.supported_period_map.update(cls.period_type_map)
+
+            try:
+                from dateutil import relativedelta
+
+                if relativedelta is not None:
+                    cls.supported_period_map.update(cls.optional_period_type_map)
+            except Exception:
+                pass
+
+        return cls.supported_period_map
+
+    @classmethod
+    def parse(cls, content, basedir=None, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
+        """Parse HOCON content
+
+        :param content: HOCON content to parse
+        :type content: basestring
+        :param resolve: if true, resolve substitutions
+        :type resolve: boolean
+        :param unresolved_value: value assigned to unresolved substitutions.
+            If overridden with a default value, all unresolved values are replaced by that default.
+            If it is set to pyhocon.STR_SUBSTITUTION, each unresolved value is replaced by
+            its substitution expression (e.g., ${x})
+        :type unresolved_value: boolean
+        :return: a ConfigTree or a list
+        """
+
+        unescape_pattern = re.compile(r'\\.')
+
+        def replace_escape_sequence(match):
+            value = match.group(0)
+            return cls.REPLACEMENTS.get(value, value)
+
+        def norm_string(value):
+            return unescape_pattern.sub(replace_escape_sequence, value)
+
+        def unescape_string(tokens):
+            return ConfigUnquotedString(norm_string(tokens[0]))
+
+        def parse_multi_string(tokens):
+            # remove the first and last 3 "
+            return tokens[0][3: -3]
+
+        def convert_number(tokens):
+            n = tokens[0]
+            try:
+                return int(n, 10)
+            except ValueError:
+                return float(n)
+
+        def safe_convert_number(tokens):
+            n = tokens[0]
+            try:
+                return int(n, 10)
+            except ValueError:
+                try:
+                    return float(n)
+                except ValueError:
+                    return n
+
+        def convert_period(tokens):
+
+            period_value = int(tokens.value)
+            period_identifier = tokens.unit
+
+            period_unit = next((single_unit for single_unit, values
+                                in cls.get_supported_period_type_map().items()
+                                if period_identifier in values))
+
+            return period(period_value, period_unit)
+
+        # ${path} or ${?path} for optional substitution
+        SUBSTITUTION_PATTERN = r"\$\{(?P<optional>\?)?(?P<variable>[^}]+)\}(?P<ws>[ \t]*)"
+
+        def create_substitution(instring, loc, token):
+            # remove the ${ and }
+            match = re.match(SUBSTITUTION_PATTERN, token[0])
+            variable = match.group('variable')
+            ws = match.group('ws')
+            optional = match.group('optional') == '?'
+            substitution = ConfigSubstitution(variable, optional, ws, instring, loc)
+            return substitution
+
+        # quoted string followed by optional trailing whitespace
+        STRING_PATTERN = '"(?P<value>(?:[^"\\\\]|\\\\.)*)"(?P<ws>[ \t]*)'
+
+        def create_quoted_string(instring, loc, token):
+            # remove the surrounding quotes and unescape the value
+            match = re.match(STRING_PATTERN, token[0])
+            value = norm_string(match.group('value'))
+            ws = match.group('ws')
+            return ConfigQuotedString(value, ws, instring, loc)
+
+        def include_config(instring, loc, token):
+            url = None
+            file = None
+            required = False
+
+            if token[0] == 'required':
+                required = True
+                final_tokens = token[1:]
+            else:
+                final_tokens = token
+
+            if len(final_tokens) == 1:  # include "test"
+                value = final_tokens[0].value if isinstance(final_tokens[0], ConfigQuotedString) else final_tokens[0]
+                if value.startswith("http://") or value.startswith("https://") or value.startswith("file://"):
+                    url = value
+                else:
+                    file = value
+            elif len(final_tokens) == 2:  # include url("test") or file("test")
+                value = final_tokens[1].value if isinstance(final_tokens[1], ConfigQuotedString) else final_tokens[1]
+                if final_tokens[0] == 'url':
+                    url = value
+                else:
+                    file = value
+
+            if url is not None:
+                logger.debug('Loading config from url %s', url)
+                obj = ConfigFactory.parse_URL(
+                    url,
+                    resolve=False,
+                    required=required,
+                    unresolved_value=NO_SUBSTITUTION
+                )
+            elif file is not None:
+                path = file if basedir is None else os.path.join(basedir, file)
+                logger.debug('Loading config from file %s', path)
+                obj = ConfigFactory.parse_file(
+                    path,
+                    resolve=False,
+                    required=required,
+                    unresolved_value=NO_SUBSTITUTION
+                )
+            else:
+                raise ConfigException('No file or URL specified at: {loc}: {instring}'.format(loc=loc, instring=instring))
+
+            return ConfigInclude(obj if isinstance(obj, list) else obj.items())
+
+        @contextlib.contextmanager
+        def set_default_white_spaces():
+            default = ParserElement.DEFAULT_WHITE_CHARS
+            ParserElement.setDefaultWhitespaceChars(' \t')
+            yield
+            ParserElement.setDefaultWhitespaceChars(default)
+
+        with set_default_white_spaces():
+            assign_expr = Forward()
+            true_expr = Keyword("true", caseless=True).setParseAction(replaceWith(True))
+            false_expr = Keyword("false", caseless=True).setParseAction(replaceWith(False))
+            null_expr = Keyword("null", caseless=True).setParseAction(replaceWith(NoneValue()))
+            # key = QuotedString('"', escChar='\\', unquoteResults=False) | Word(alphanums + alphas8bit + '._- /')
+            regexp_numbers = r'[+-]?(\d*\.\d+|\d+(\.\d+)?)([eE][+\-]?\d+)?(?=$|[ \t]*([\$\}\],#\n\r]|//))'
+            key = QuotedString('"', escChar='\\', unquoteResults=False) | \
+                Regex(regexp_numbers, re.DOTALL).setParseAction(safe_convert_number) | \
+                Word(alphanums + alphas8bit + '._- /')
+
+            eol = Word('\n\r').suppress()
+            eol_comma = Word('\n\r,').suppress()
+            comment = (Literal('#') | Literal('//')) - SkipTo(eol | StringEnd())
+            comment_eol = Suppress(Optional(eol_comma) + comment)
+            comment_no_comma_eol = (comment | eol).suppress()
+            number_expr = Regex(regexp_numbers, re.DOTALL).setParseAction(convert_number)
+
+            period_types = itertools.chain.from_iterable(cls.get_supported_period_type_map().values())
+            period_expr = Regex(r'(?P<value>\d+)\s*(?P<unit>' + '|'.join(period_types) + ')$'
+                                ).setParseAction(convert_period)
+
+            # multi line string using """
+            # Using fix described in http://pyparsing.wikispaces.com/share/view/3778969
+            multiline_string = Regex('""".*?"*"""', re.DOTALL | re.UNICODE).setParseAction(parse_multi_string)
+            # single quoted line string
+            quoted_string = Regex(r'"(?:[^"\\\n]|\\.)*"[ \t]*', re.UNICODE).setParseAction(create_quoted_string)
+            # unquoted string that takes the rest of the line until an optional comment
+            # we support .properties-style multiline values, which look like this:
+            # line1  \
+            # line2 \
+            # so a backslash precedes the \n
+            unquoted_string = Regex(r'(?:[^^`+?!@*&"\[\{\s\]\}#,=\$\\]|\\.)+[ \t]*',
+                                    re.UNICODE).setParseAction(unescape_string)
+            substitution_expr = Regex(r'[ \t]*\$\{[^\}]+\}[ \t]*').setParseAction(create_substitution)
+            string_expr = multiline_string | quoted_string | unquoted_string
+
+            value_expr = period_expr | number_expr | true_expr | false_expr | null_expr | string_expr
+
+            include_content = (quoted_string | ((Keyword('url') | Keyword(
+                'file')) - Literal('(').suppress() - quoted_string - Literal(')').suppress()))
+            include_expr = (
+                Keyword("include", caseless=True).suppress() + (
+                    include_content | (
+                        Keyword("required") - Literal('(').suppress() - include_content - Literal(')').suppress()
+                    )
+                )
+            ).setParseAction(include_config)
+
+            root_dict_expr = Forward()
+            dict_expr = Forward()
+            list_expr = Forward()
+            multi_value_expr = ZeroOrMore(comment_eol | include_expr | substitution_expr |
+                                          dict_expr | list_expr | value_expr | (Literal('\\') - eol).suppress())
+            # for a dictionary : or = is optional
+            # last zeroOrMore is because we can have t = {a:4} {b: 6} {c: 7} which is dictionary concatenation
+            inside_dict_expr = ConfigTreeParser(ZeroOrMore(comment_eol | include_expr | assign_expr | eol_comma))
+            inside_root_dict_expr = ConfigTreeParser(ZeroOrMore(
+                comment_eol | include_expr | assign_expr | eol_comma), root=True)
+            dict_expr << Suppress('{') - inside_dict_expr - Suppress('}')
+            root_dict_expr << Suppress('{') - inside_root_dict_expr - Suppress('}')
+            list_entry = ConcatenatedValueParser(multi_value_expr)
+            list_expr << Suppress('[') - ListParser(list_entry - ZeroOrMore(eol_comma - list_entry)) - Suppress(']')
+
+            # special case when we have a value assignment where the string can potentially be the remainder of the line
+            assign_expr << Group(key - ZeroOrMore(comment_no_comma_eol) -
+                                 (dict_expr | (Literal('=') | Literal(':') | Literal('+=')) -
+                                  ZeroOrMore(comment_no_comma_eol) - ConcatenatedValueParser(multi_value_expr)))
+
+            # the file can be { ... } where {} can be omitted or []
+            config_expr = ZeroOrMore(comment_eol | eol) + (list_expr | root_dict_expr |
+                                                           inside_root_dict_expr) + ZeroOrMore(comment_eol | eol_comma)
+            config = config_expr.parseString(content, parseAll=True)[0]
+
+            if resolve:
+                allow_unresolved = resolve and unresolved_value is not DEFAULT_SUBSTITUTION and \
+                                   unresolved_value is not MANDATORY_SUBSTITUTION
+                has_unresolved = cls.resolve_substitutions(config, allow_unresolved)
+                if has_unresolved and unresolved_value is MANDATORY_SUBSTITUTION:
+                    raise ConfigSubstitutionException(
+                        'resolve cannot be set to True and unresolved_value to MANDATORY_SUBSTITUTION')
+
+            if unresolved_value is not NO_SUBSTITUTION and unresolved_value is not DEFAULT_SUBSTITUTION:
+                cls.unresolve_substitutions_to_value(config, unresolved_value)
+        return config
+
+    @classmethod
+    def _resolve_variable(cls, config, substitution):
+        """
+        :param config:
+        :param substitution:
+        :return: (is_resolved, resolved_variable)
+        """
+        variable = substitution.variable
+        try:
+            return True, config.get(variable)
+        except ConfigMissingException:
+            # default to environment variable
+            value = os.environ.get(variable)
+
+            if value is None:
+                if substitution.optional:
+                    return False, None
+                else:
+                    raise ConfigSubstitutionException(
+                        "Cannot resolve variable ${{{variable}}} (line: {line}, col: {col})".format(
+                            variable=variable,
+                            line=lineno(substitution.loc, substitution.instring),
+                            col=col(substitution.loc, substitution.instring)))
+            elif isinstance(value, ConfigList) or isinstance(value, ConfigTree):
+                raise ConfigSubstitutionException(
+                    "Cannot substitute variable ${{{variable}}} because it does not point to a "
+                    "string, int, float, boolean or null {type} (line:{line}, col: {col})".format(
+                        variable=variable,
+                        type=value.__class__.__name__,
+                        line=lineno(substitution.loc, substitution.instring),
+                        col=col(substitution.loc, substitution.instring)))
+            return True, value
+
+    @classmethod
+    def _fixup_self_references(cls, config, accept_unresolved=False):
+        if isinstance(config, ConfigTree) and config.root:
+            for key in config:  # Traverse history of element
+                history = config.history[key]
+                previous_item = history[0]
+                for current_item in history[1:]:
+                    for substitution in cls._find_substitutions(current_item):
+                        prop_path = ConfigTree.parse_key(substitution.variable)
+                        if len(prop_path) > 1 and config.get(substitution.variable, None) is not None:
+                            continue  # If value is present in latest version, don't do anything
+                        if prop_path[0] == key:
+                            if isinstance(previous_item, ConfigValues) and not accept_unresolved:
+                                # We hit a dead end, we cannot evaluate
+                                raise ConfigSubstitutionException(
+                                    "Property {variable} cannot be substituted. Check for cycles.".format(
+                                        variable=substitution.variable
+                                    )
+                                )
+                            else:
+                                value = previous_item if len(
+                                    prop_path) == 1 else previous_item.get(".".join(prop_path[1:]))
+                                _, _, current_item = cls._do_substitute(substitution, value)
+                    previous_item = current_item
+
+                if len(history) == 1:
+                    for substitution in cls._find_substitutions(previous_item):
+                        prop_path = ConfigTree.parse_key(substitution.variable)
+                        if len(prop_path) > 1 and config.get(substitution.variable, None) is not None:
+                            continue  # If value is present in latest version, don't do anything
+                        if prop_path[0] == key and substitution.optional:
+                            cls._do_substitute(substitution, None)
+                        if prop_path[0] == key:
+                            value = os.environ.get(key)
+                            if value is not None:
+                                cls._do_substitute(substitution, value)
+                                continue
+                            if substitution.optional:  # special case: optional self-reference with no existing value
+                                cls._do_substitute(substitution, None)
+
+    # traverse config to find all the substitutions
+    @classmethod
+    def _find_substitutions(cls, item):
+        """Recursively collect all substitutions contained in item
+
+        :return: list of substitutions found
+        :type return: list
+        """
+        if isinstance(item, ConfigValues):
+            return item.get_substitutions()
+
+        substitutions = []
+        elements = []
+        if isinstance(item, ConfigTree):
+            elements = item.values()
+        elif isinstance(item, list):
+            elements = item
+
+        for child in elements:
+            substitutions += cls._find_substitutions(child)
+        return substitutions
+
+    @classmethod
+    def _do_substitute(cls, substitution, resolved_value, is_optional_resolved=True):
+        unresolved = False
+        new_substitutions = []
+        if isinstance(resolved_value, ConfigValues):
+            resolved_value = resolved_value.transform()
+        if isinstance(resolved_value, ConfigValues):
+            unresolved = True
+            result = resolved_value
+        else:
+            # replace token by substitution
+            config_values = substitution.parent
+            # if it is a string, then add the extra ws that was present in the original string after the substitution
+            formatted_resolved_value = resolved_value \
+                if resolved_value is None \
+                or isinstance(resolved_value, (dict, list)) \
+                or substitution.index == len(config_values.tokens) - 1 \
+                else (str(resolved_value) + substitution.ws)
+            # use a deepcopy of resolved_value to avoid mutation
+            config_values.put(substitution.index, copy.deepcopy(formatted_resolved_value))
+            transformation = config_values.transform()
+            result = config_values.overriden_value \
+                if transformation is None and not is_optional_resolved \
+                else transformation
+
+            if result is None and config_values.key in config_values.parent:
+                del config_values.parent[config_values.key]
+            else:
+                config_values.parent[config_values.key] = result
+                s = cls._find_substitutions(result)
+                if s:
+                    new_substitutions = s
+                    unresolved = True
+
+        return (unresolved, new_substitutions, result)
+
+    @classmethod
+    def _final_fixup(cls, item):
+        if isinstance(item, ConfigValues):
+            return item.transform()
+        elif isinstance(item, list):
+            return list([cls._final_fixup(child) for child in item])
+        elif isinstance(item, ConfigTree):
+            items = list(item.items())
+            for key, child in items:
+                item[key] = cls._final_fixup(child)
+        return item
+
+    @classmethod
+    def unresolve_substitutions_to_value(cls, config, unresolved_value=STR_SUBSTITUTION):
+        for substitution in cls._find_substitutions(config):
+            if unresolved_value is STR_SUBSTITUTION:
+                value = substitution.raw_str()
+            elif unresolved_value is None:
+                value = NoneValue()
+            else:
+                value = unresolved_value
+            cls._do_substitute(substitution, value, False)
+        cls._final_fixup(config)
+
+    @classmethod
+    def resolve_substitutions(cls, config, accept_unresolved=False):
+        has_unresolved = False
+        cls._fixup_self_references(config, accept_unresolved)
+        substitutions = cls._find_substitutions(config)
+        if len(substitutions) > 0:
+            unresolved = True
+            any_unresolved = True
+            _substitutions = []
+            cache = {}
+            while any_unresolved and len(substitutions) > 0 and set(substitutions) != set(_substitutions):
+                unresolved = False
+                any_unresolved = True
+                _substitutions = substitutions[:]
+
+                for substitution in _substitutions:
+                    is_optional_resolved, resolved_value = cls._resolve_variable(config, substitution)
+
+                    # if the substitution is optional
+                    if not is_optional_resolved and substitution.optional:
+                        resolved_value = None
+                    if isinstance(resolved_value, ConfigValues):
+                        parents = cache.get(resolved_value)
+                        if parents is None:
+                            parents = []
+                            link = resolved_value
+                            while isinstance(link, ConfigValues):
+                                parents.append(link)
+                                link = link.overriden_value
+                            cache[resolved_value] = parents
+
+                    if isinstance(resolved_value, ConfigValues) \
+                       and substitution.parent in parents \
+                       and hasattr(substitution.parent, 'overriden_value') \
+                       and substitution.parent.overriden_value:
+
+                        # self resolution, backtrack
+                        resolved_value = substitution.parent.overriden_value
+
+                    unresolved, new_substitutions, result = cls._do_substitute(
+                        substitution, resolved_value, is_optional_resolved)
+                    any_unresolved = unresolved or any_unresolved
+                    substitutions.extend(new_substitutions)
+                    if not isinstance(result, ConfigValues):
+                        substitutions.remove(substitution)
+
+            cls._final_fixup(config)
+            if unresolved:
+                has_unresolved = True
+                if not accept_unresolved:
+                    raise ConfigSubstitutionException("Cannot resolve {variables}. Check for cycles.".format(
+                        variables=', '.join('${{{variable}}}: (line: {line}, col: {col})'.format(
+                            variable=substitution.variable,
+                            line=lineno(substitution.loc, substitution.instring),
+                            col=col(substitution.loc, substitution.instring)) for substitution in substitutions)))
+
+        cls._final_fixup(config)
+        return has_unresolved
+
+
+class ListParser(TokenConverter):
+    """Parse a list [elt1, elt2, ...]
+    """
+
+    def __init__(self, expr=None):
+        super(ListParser, self).__init__(expr)
+        self.saveAsList = True
+
+    def postParse(self, instring, loc, token_list):
+        """Create a list from the tokens
+
+        :param instring:
+        :param loc:
+        :param token_list:
+        :return:
+        """
+        cleaned_token_list = [token for tokens in (token.tokens if isinstance(token, ConfigInclude) else [token]
+                                                   for token in token_list if token != '')
+                              for token in tokens]
+        config_list = ConfigList(cleaned_token_list)
+        return [config_list]
+
+
+class ConcatenatedValueParser(TokenConverter):
+    def __init__(self, expr=None):
+        super(ConcatenatedValueParser, self).__init__(expr)
+        self.parent = None
+        self.key = None
+
+    def postParse(self, instring, loc, token_list):
+        config_values = ConfigValues(token_list, instring, loc)
+        return [config_values.transform()]
+
+
+class ConfigTreeParser(TokenConverter):
+    """
+    Parse a config tree from tokens
+    """
+
+    def __init__(self, expr=None, root=False):
+        super(ConfigTreeParser, self).__init__(expr)
+        self.root = root
+        self.saveAsList = True
+
+    def postParse(self, instring, loc, token_list):
+        """Create ConfigTree from tokens
+
+        :param instring:
+        :param loc:
+        :param token_list:
+        :return:
+        """
+        config_tree = ConfigTree(root=self.root)
+        for element in token_list:
+            expanded_tokens = element.tokens if isinstance(element, ConfigInclude) else [element]
+
+            for tokens in expanded_tokens:
+                # key, value1 (optional), ...
+                key = tokens[0].strip() if isinstance(tokens[0], (unicode, basestring)) else tokens[0]
+                operator = '='
+                if len(tokens) == 3 and tokens[1].strip() in [':', '=', '+=']:
+                    operator = tokens[1].strip()
+                    values = tokens[2:]
+                elif len(tokens) == 2:
+                    values = tokens[1:]
+                else:
+                    raise ParseSyntaxException("Unknown tokens {tokens} received".format(tokens=tokens))
+                # empty string
+                if len(values) == 0:
+                    config_tree.put(key, '')
+                else:
+                    value = values[0]
+                    if isinstance(value, list) and operator == "+=":
+                        value = ConfigValues([ConfigSubstitution(key, True, '', False, loc), value], False, loc)
+                        config_tree.put(key, value, False)
+                    elif isinstance(value, unicode) and operator == "+=":
+                        value = ConfigValues([ConfigSubstitution(key, True, '', True, loc), ' ' + value], True, loc)
+                        config_tree.put(key, value, False)
+                    elif isinstance(value, list):
+                        config_tree.put(key, value, False)
+                    else:
+                        existing_value = config_tree.get(key, None)
+                        if isinstance(value, ConfigTree) and not isinstance(existing_value, list):
+                            # Only a tree has to be merged with a tree
+                            config_tree.put(key, value, True)
+                        elif isinstance(value, ConfigValues):
+                            conf_value = value
+                            value.parent = config_tree
+                            value.key = key
+                            if isinstance(existing_value, list) or isinstance(existing_value, ConfigTree):
+                                config_tree.put(key, conf_value, True)
+                            else:
+                                config_tree.put(key, conf_value, False)
+                        else:
+                            config_tree.put(key, value, False)
+        return config_tree
--- a/clearml_agent/external/pyhocon/config_tree.py
+++ b/clearml_agent/external/pyhocon/config_tree.py
@@ -0,0 +1,608 @@
+from collections import OrderedDict
+from pyparsing import lineno
+from pyparsing import col
+try:
+    basestring
+except NameError:  # pragma: no cover
+    basestring = str
+    unicode = str
+
+import re
+import copy
+from .exceptions import ConfigException, ConfigWrongTypeException, ConfigMissingException
+
+
+class UndefinedKey(object):
+    pass
+
+
+class NonExistentKey(object):
+    pass
+
+
+class NoneValue(object):
+    pass
+
+
+class ConfigTree(OrderedDict):
+    KEY_SEP = '.'
+
+    def __init__(self, *args, **kwds):
+        self.root = kwds.pop('root') if 'root' in kwds else False
+        if self.root:
+            self.history = {}
+        super(ConfigTree, self).__init__(*args, **kwds)
+        for key, value in self.items():
+            if isinstance(value, ConfigValues):
+                value.parent = self
+                value.index = key
+
+    @staticmethod
+    def merge_configs(a, b, copy_trees=False):
+        """Merge config b into a
+
+        :param a: target config
+        :type a: ConfigTree
+        :param b: source config
+        :type b: ConfigTree
+        :return: merged config a
+        """
+        for key, value in b.items():
+            # if key is in both a and b and both values are ConfigTree instances, merge them; otherwise override
+            if key in a and isinstance(a[key], ConfigTree) and isinstance(b[key], ConfigTree):
+                if copy_trees:
+                    a[key] = a[key].copy()
+                ConfigTree.merge_configs(a[key], b[key], copy_trees=copy_trees)
+            else:
+                if isinstance(value, ConfigValues):
+                    value.parent = a
+                    value.key = key
+                    if key in a:
+                        value.overriden_value = a[key]
+                a[key] = value
+                if a.root:
+                    if b.root:
+                        a.history[key] = a.history.get(key, []) + b.history.get(key, [value])
+                    else:
+                        a.history[key] = a.history.get(key, []) + [value]
+
+        return a
+
+    def _put(self, key_path, value, append=False):
+        key_elt = key_path[0]
+        if len(key_path) == 1:
+            # if value to set does not exist, override
+            # if they are both configs then merge
+            # if not then override
+            if key_elt in self and isinstance(self[key_elt], ConfigTree) and isinstance(value, ConfigTree):
+                if self.root:
+                    new_value = ConfigTree.merge_configs(ConfigTree(), self[key_elt], copy_trees=True)
+                    new_value = ConfigTree.merge_configs(new_value, value, copy_trees=True)
+                    self._push_history(key_elt, new_value)
+                    self[key_elt] = new_value
+                else:
+                    ConfigTree.merge_configs(self[key_elt], value)
+            elif append:
+                # If we have t=1
+                # and we try to put t.a=5 then t is replaced by {a: 5}
+                l_value = self.get(key_elt, None)
+                if isinstance(l_value, ConfigValues):
+                    l_value.tokens.append(value)
+                    l_value.recompute()
+                elif isinstance(l_value, ConfigTree) and isinstance(value, ConfigValues):
+                    value.overriden_value = l_value
+                    value.tokens.insert(0, l_value)
+                    value.recompute()
+                    value.parent = self
+                    value.key = key_elt
+                    self._push_history(key_elt, value)
+                    self[key_elt] = value
+                elif isinstance(l_value, list) and isinstance(value, ConfigValues):
+                    self._push_history(key_elt, value)
+                    value.overriden_value = l_value
+                    value.parent = self
+                    value.key = key_elt
+                    self[key_elt] = value
+                elif isinstance(l_value, list):
+                    self[key_elt] = l_value + value
+                    self._push_history(key_elt, l_value)
+                elif l_value is None:
+                    self._push_history(key_elt, value)
+                    self[key_elt] = value
+
+                else:
+                    raise ConfigWrongTypeException(
+                        u"Cannot concatenate the list {key}: {value} to {prev_value} of {type}".format(
+                            key='.'.join(key_path),
+                            value=value,
+                            prev_value=l_value,
+                            type=l_value.__class__.__name__)
+                    )
+            else:
+                # if there was an override, keep the overridden value
+                if isinstance(value, ConfigValues):
+                    value.parent = self
+                    value.key = key_elt
+                    value.overriden_value = self.get(key_elt, None)
+                self._push_history(key_elt, value)
+                self[key_elt] = value
+        else:
+            next_config_tree = super(ConfigTree, self).get(key_elt)
+            if not isinstance(next_config_tree, ConfigTree):
+                # create a new dictionary or overwrite a previous value
+                next_config_tree = ConfigTree()
+                self._push_history(key_elt, next_config_tree)
+                self[key_elt] = next_config_tree
+            next_config_tree._put(key_path[1:], value, append)
+
+    def _push_history(self, key, value):
+        if self.root:
+            hist = self.history.get(key)
+            if hist is None:
+                hist = self.history[key] = []
+            hist.append(value)
+
+    def _get(self, key_path, key_index=0, default=UndefinedKey):
+        key_elt = key_path[key_index]
+        elt = super(ConfigTree, self).get(key_elt, UndefinedKey)
+
+        if elt is UndefinedKey:
+            if default is UndefinedKey:
+                raise ConfigMissingException(u"No configuration setting found for key {key}".format(
+                    key='.'.join(key_path[: key_index + 1])))
+            else:
+                return default
+
+        if key_index == len(key_path) - 1:
+            if isinstance(elt, NoneValue):
+                return None
+            elif isinstance(elt, list):
+                return [None if isinstance(x, NoneValue) else x for x in elt]
+            else:
+                return elt
+        elif isinstance(elt, ConfigTree):
+            return elt._get(key_path, key_index + 1, default)
+        else:
+            if default is UndefinedKey:
+                raise ConfigWrongTypeException(
+                    u"{key} has type {type} rather than dict".format(key='.'.join(key_path[:key_index + 1]),
+                                                                     type=type(elt).__name__))
+            else:
+                return default
+
+    @staticmethod
+    def parse_key(string):
+        """
+        Split a key into path elements:
+        - a.b.c => a, b, c
+        - a."b.c" => a, "b.c" (quotes are kept when the quoted part contains any of: $}[]:=+#`^?!@*&.)
+        - "a" => a
+        - a.b."c" => a, b, c (quotes are stripped when the quoted part has no special character)
+        :param string: either a string key ('.' separates sub-keys) or an int / float used as a regular key
+        :return: list of path elements
+        """
+        if isinstance(string, (int, float)):
+            return [string]
+
+        special_characters = '$}[]:=+#`^?!@*&.'
+        tokens = re.findall(
+            r'"[^"]+"|[^{special_characters}]+'.format(special_characters=re.escape(special_characters)),
+            string)
+
+        def contains_special_character(token):
+            return any((c in special_characters) for c in token)
+
+        return [token if contains_special_character(token) else token.strip('"') for token in tokens]
+
+    def put(self, key, value, append=False):
+        """Put a value in the tree (dot separated)
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param value: value to put
+        """
+        self._put(ConfigTree.parse_key(key), value, append)
+
+    def get(self, key, default=UndefinedKey):
+        """Get a value from the tree
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param default: default value if key not found
+        :type default: object
+        :return: value in the tree located at key
+        """
+        return self._get(ConfigTree.parse_key(key), 0, default)
+
+    def get_string(self, key, default=UndefinedKey):
+        """Return string representation of value found at key
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param default: default value if key not found
+        :type default: basestring
+        :return: string value
+        :type return: basestring
+        """
+        value = self.get(key, default)
+        if value is None:
+            return None
+
+        string_value = unicode(value)
+        if isinstance(value, bool):
+            string_value = string_value.lower()
+        return string_value
+
+    def pop(self, key, default=UndefinedKey):
+        """Remove specified key and return the corresponding value.
+        If key is not found, default is returned if given, otherwise ConfigMissingException is raised
+
+        This method assumes the user wants to remove the last value in the chain so it parses via parse_key
+        and pops the last value out of the dict.
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param default: value to return if key is not found
+        :type default: object
+        :return: value that was stored in the tree at key
+        :type return: object
+        """
+        if default is not UndefinedKey and key not in self:
+            return default
+
+        value = self.get(key, UndefinedKey)
+        lst = ConfigTree.parse_key(key)
+        parent = self.KEY_SEP.join(lst[0:-1])
+        child = lst[-1]
+
+        if parent:
+            self.get(parent).__delitem__(child)
+        else:
+            self.__delitem__(child)
+        return value
+
+    def get_int(self, key, default=UndefinedKey):
+        """Return int representation of value found at key
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param default: default value if key not found
+        :type default: int
+        :return: int value
+        :type return: int
+        """
+        value = self.get(key, default)
+        try:
+            return int(value) if value is not None else None
+        except (TypeError, ValueError):
+            raise ConfigException(
+                u"{key} has type '{type}' rather than 'int'".format(key=key, type=type(value).__name__))
+
+    def get_float(self, key, default=UndefinedKey):
+        """Return float representation of value found at key
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param default: default value if key not found
+        :type default: float
+        :return: float value
+        :type return: float
+        """
+        value = self.get(key, default)
+        try:
+            return float(value) if value is not None else None
+        except (TypeError, ValueError):
+            raise ConfigException(
+                u"{key} has type '{type}' rather than 'float'".format(key=key, type=type(value).__name__))
+
+    def get_bool(self, key, default=UndefinedKey):
+        """Return boolean representation of value found at key
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param default: default value if key not found
+        :type default: bool
+        :return: boolean value
+        :type return: bool
+        """
+
+        # String conversions as per API-recommendations:
+        # https://github.com/typesafehub/config/blob/master/HOCON.md#automatic-type-conversions
+        bool_conversions = {
+            None: None,
+            'true': True, 'yes': True, 'on': True,
+            'false': False, 'no': False, 'off': False
+        }
+        string_value = self.get_string(key, default)
+        if string_value is not None:
+            string_value = string_value.lower()
+        try:
+            return bool_conversions[string_value]
+        except KeyError:
+            raise ConfigException(
+                u"{key} does not translate to a Boolean value".format(key=key))
+
+    def get_list(self, key, default=UndefinedKey):
+        """Return list representation of value found at key
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param default: default value if key not found
+        :type default: list
+        :return: list value
+        :type return: list
+        """
+        value = self.get(key, default)
+        if isinstance(value, list):
+            return value
+        elif isinstance(value, ConfigTree):
+            lst = []
+            for k, v in sorted(value.items(), key=lambda kv: kv[0]):
+                if re.match('^(?:[1-9][0-9]*|0)$', k):
+                    lst.append(v)
+                else:
+                    raise ConfigException(u"{key} does not translate to a list".format(key=key))
+            return lst
+        elif value is None:
+            return None
+        else:
+            raise ConfigException(
+                u"{key} has type '{type}' rather than 'list'".format(key=key, type=type(value).__name__))
+
+    def get_config(self, key, default=UndefinedKey):
+        """Return tree config representation of value found at key
+
+        :param key: key to use (dot separated). E.g., a.b.c
+        :type key: basestring
+        :param default: default value if key not found
+        :type default: config
+        :return: config value
+        :type return: ConfigTree
+        """
+        value = self.get(key, default)
+        if isinstance(value, dict):
+            return value
+        elif value is None:
+            return None
+        else:
+            raise ConfigException(
+                u"{key} has type '{type}' rather than 'config'".format(key=key, type=type(value).__name__))
+
+    def __getitem__(self, item):
+        val = self.get(item)
+        if val is UndefinedKey:
+            raise KeyError(item)
+        return val
+
+    try:
+        from collections import _OrderedDictItemsView
+    except ImportError:  # pragma: nocover
+        pass
+    else:
+        def items(self):  # pragma: nocover
+            return self._OrderedDictItemsView(self)
+
+    def __getattr__(self, item):
+        val = self.get(item, NonExistentKey)
+        if val is NonExistentKey:
+            return super(ConfigTree, self).__getattr__(item)
+        return val
+
+    def __contains__(self, item):
+        return self._get(self.parse_key(item), default=NoneValue) is not NoneValue
+
+    def with_fallback(self, config, resolve=True):
+        """
+        return a new config with fallback on config
+        :param config: config or filename of the config to fallback on
+        :param resolve: resolve substitutions
+        :return: new config with fallback on config
+        """
+        if isinstance(config, ConfigTree):
+            result = ConfigTree.merge_configs(copy.deepcopy(config), copy.deepcopy(self))
+        else:
+            from . import ConfigFactory
+            result = ConfigTree.merge_configs(ConfigFactory.parse_file(config, resolve=False), copy.deepcopy(self))
+
+        if resolve:
+            from . import ConfigParser
+            ConfigParser.resolve_substitutions(result)
+        return result
+
+    def as_plain_ordered_dict(self):
+        """return a deep copy of this config as a plain OrderedDict
+
+        The config tree should be fully resolved.
+
+        This is useful to get an object with no special semantics such as path expansion for the keys.
+        In particular this means that keys that contain dots are not surrounded with '"' in the plain OrderedDict.
+
+        :return: this config as an OrderedDict
+        :type return: OrderedDict
+        """
+        def plain_value(v):
+            if isinstance(v, list):
+                return [plain_value(e) for e in v]
+            elif isinstance(v, ConfigTree):
+                return v.as_plain_ordered_dict()
+            else:
+                if isinstance(v, ConfigValues):
+                    raise ConfigException("The config tree contains unresolved elements")
+                return v
+
+        return OrderedDict((key.strip('"') if isinstance(key, (unicode, basestring)) else key, plain_value(value))
+                           for key, value in self.items())
+
+
+class ConfigList(list):
+    def __init__(self, iterable=[]):
+        new_list = list(iterable)
+        super(ConfigList, self).__init__(new_list)
+        for index, value in enumerate(new_list):
+            if isinstance(value, ConfigValues):
+                value.parent = self
+                value.key = index
+
+
+class ConfigInclude(object):
+    def __init__(self, tokens):
+        self.tokens = tokens
+
+
+class ConfigValues(object):
+    def __init__(self, tokens, instring, loc):
+        self.tokens = tokens
+        self.parent = None
+        self.key = None
+        self._instring = instring
+        self._loc = loc
+        self.overriden_value = None
+        self.recompute()
+
+    def recompute(self):
+        for index, token in enumerate(self.tokens):
+            if isinstance(token, ConfigSubstitution):
+                token.parent = self
+                token.index = index
+
+        # no value return empty string
+        if len(self.tokens) == 0:
+            self.tokens = ['']
+
+        # if the last token is an unquoted string then right strip it
+        if isinstance(self.tokens[-1], ConfigUnquotedString):
+            # strip only trailing spaces and tabs, not \n or \r, because those would have been written escaped
+            self.tokens[-1] = self.tokens[-1].rstrip(' \t')
+
+    def has_substitution(self):
+        return len(self.get_substitutions()) > 0
+
+    def get_substitutions(self):
+        lst = []
+        node = self
+        while node:
+            lst = [token for token in node.tokens if isinstance(token, ConfigSubstitution)] + lst
+            if hasattr(node, 'overriden_value'):
+                node = node.overriden_value
+                if not isinstance(node, ConfigValues):
+                    break
+            else:
+                break
+        return lst
+
+    def transform(self):
+        def determine_type(token):
+            return ConfigTree if isinstance(token, ConfigTree) else ConfigList if isinstance(token, list) else str
+
+        def format_str(v, last=False):
+            if isinstance(v, ConfigQuotedString):
+                return v.value + ('' if last else v.ws)
+            else:
+                return '' if v is None else unicode(v)
+
+        if self.has_substitution():
+            return self
+
+        # remove None tokens
+        tokens = [token for token in self.tokens if token is not None]
+
+        if not tokens:
+            return None
+
+        # check if all tokens are compatible
+        first_tok_type = determine_type(tokens[0])
+        for index, token in enumerate(tokens[1:]):
+            tok_type = determine_type(token)
+            if first_tok_type is not tok_type:
+                raise ConfigWrongTypeException(
+                    "Token '{token}' of type {tok_type} (index {index}) must be of type {req_tok_type} "
+                    "(line: {line}, col: {col})".format(
+                        token=token,
+                        index=index + 1,
+                        tok_type=tok_type.__name__,
+                        req_tok_type=first_tok_type.__name__,
+                        line=lineno(self._loc, self._instring),
+                        col=col(self._loc, self._instring)))
+
+        if first_tok_type is ConfigTree:
+            child = []
+            if hasattr(self, 'overriden_value'):
+                node = self.overriden_value
+                while node:
+                    if isinstance(node, ConfigValues):
+                        value = node.transform()
+                        if isinstance(value, ConfigTree):
+                            child.append(value)
+                        else:
+                            break
+                    elif isinstance(node, ConfigTree):
+                        child.append(node)
+                    else:
+                        break
+                    if hasattr(node, 'overriden_value'):
+                        node = node.overriden_value
+                    else:
+                        break
+
+            result = ConfigTree()
+            for conf in reversed(child):
+                ConfigTree.merge_configs(result, conf, copy_trees=True)
+            for token in tokens:
+                ConfigTree.merge_configs(result, token, copy_trees=True)
+            return result
+        elif first_tok_type is ConfigList:
+            result = []
+            main_index = 0
+            for sublist in tokens:
+                sublist_result = ConfigList()
+                for token in sublist:
+                    if isinstance(token, ConfigValues):
+                        token.parent = result
+                        token.key = main_index
+                    main_index += 1
+                    sublist_result.append(token)
+                result.extend(sublist_result)
+            return result
+        else:
+            if len(tokens) == 1:
+                if isinstance(tokens[0], ConfigQuotedString):
+                    return tokens[0].value
+                return tokens[0]
+            else:
+                return ''.join(format_str(token) for token in tokens[:-1]) + format_str(tokens[-1], True)
+
+    def put(self, index, value):
+        self.tokens[index] = value
+
+    def __repr__(self):  # pragma: no cover
+        return '[ConfigValues: ' + ','.join(str(o) for o in self.tokens) + ']'
+
+
+class ConfigSubstitution(object):
+    def __init__(self, variable, optional, ws, instring, loc):
+        self.variable = variable
+        self.optional = optional
+        self.ws = ws
+        self.index = None
+        self.parent = None
+        self.instring = instring
+        self.loc = loc
+
+    def __repr__(self):  # pragma: no cover
+        return '[ConfigSubstitution: ' + self.variable + ']'
+
+
+class ConfigUnquotedString(unicode):
+    def __new__(cls, value):
+        return super(ConfigUnquotedString, cls).__new__(cls, value)
+
+
+class ConfigQuotedString(object):
+    def __init__(self, value, ws, instring, loc):
+        self.value = value
+        self.ws = ws
+        self.instring = instring
+        self.loc = loc
+
+    def __repr__(self):  # pragma: no cover
+        return '[ConfigQuotedString: ' + self.value + ']'
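A note on the key syntax accepted by `ConfigTree.parse_key` in the file above. The sketch below copies that method's splitting logic into a free function so it can be run without the vendored package. The function name `parse_key`, the constant name `SPECIAL_CHARACTERS`, and the sample keys are illustrative assumptions, not part of this change.

```python
import re

# Same character set as in ConfigTree.parse_key
SPECIAL_CHARACTERS = '$}[]:=+#`^?!@*&.'


def parse_key(key):
    """Standalone copy of the splitting logic, for illustration only."""
    # Numeric keys are used as-is and never split
    if isinstance(key, (int, float)):
        return [key]

    # A token is either a double-quoted run or a run of non-special characters
    pattern = r'"[^"]+"|[^{chars}]+'.format(chars=re.escape(SPECIAL_CHARACTERS))
    tokens = re.findall(pattern, key)

    def contains_special_character(token):
        return any(c in SPECIAL_CHARACTERS for c in token)

    # Quotes are kept only when they protect a special character
    return [t if contains_special_character(t) else t.strip('"') for t in tokens]


print(parse_key('a.b.c'))
print(parse_key('a."b.c"'))
print(parse_key('a.b."c"'))
```

The second call keeps the quotes around `b.c`, which is how a dotted key is stored as a single path element rather than as two nested ones.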
--- a/clearml_agent/external/pyhocon/converter.py
+++ b/clearml_agent/external/pyhocon/converter.py
@@ -0,0 +1,329 @@
+import json
+import re
+import sys
+
+from . import ConfigFactory
+from .config_tree import ConfigQuotedString
+from .config_tree import ConfigSubstitution
+from .config_tree import ConfigTree
+from .config_tree import ConfigValues
+from .config_tree import NoneValue
+
+
+try:
+    basestring
+except NameError:
+    basestring = str
+    unicode = str
+
+
+class HOCONConverter(object):
+    _number_re = r'[+-]?(\d*\.\d+|\d+(\.\d+)?)([eE][+\-]?\d+)?(?=$|[ \t]*([\$\}\],#\n\r]|//))'
+    _number_re_matcher = re.compile(_number_re)
+
+    @classmethod
+    def to_json(cls, config, compact=False, indent=2, level=0):
+        """Convert HOCON input into a JSON output
+
+        :return: JSON string representation
+        :type return: basestring
+        """
+        lines = ""
+        if isinstance(config, ConfigTree):
+            if len(config) == 0:
+                lines += '{}'
+            else:
+                lines += '{\n'
+                bet_lines = []
+                for key, item in config.items():
+                    bet_lines.append('{indent}"{key}": {value}'.format(
+                        indent=''.rjust((level + 1) * indent, ' '),
+                        key=key.strip('"'),  # for dotted keys enclosed with "" to not be interpreted as nested key
+                        value=cls.to_json(item, compact, indent, level + 1))
+                    )
+                lines += ',\n'.join(bet_lines)
+                lines += '\n{indent}}}'.format(indent=''.rjust(level * indent, ' '))
+        elif isinstance(config, list):
+            if len(config) == 0:
+                lines += '[]'
+            else:
+                lines += '[\n'
+                bet_lines = []
+                for item in config:
+                    bet_lines.append('{indent}{value}'.format(
+                        indent=''.rjust((level + 1) * indent, ' '),
+                        value=cls.to_json(item, compact, indent, level + 1))
+                    )
+                lines += ',\n'.join(bet_lines)
+                lines += '\n{indent}]'.format(indent=''.rjust(level * indent, ' '))
+        elif isinstance(config, basestring):
+            lines = json.dumps(config)
+        elif config is None or isinstance(config, NoneValue):
+            lines = 'null'
+        elif config is True:
+            lines = 'true'
+        elif config is False:
+            lines = 'false'
+        else:
+            lines = str(config)
+        return lines
+
+    @staticmethod
+    def _auto_indent(lines, section):
+        # noinspection PyBroadException
+        try:
+            indent = len(lines) - lines.rindex('\n')
+        except Exception:
+            indent = len(lines)
+        # noinspection PyBroadException
+        try:
+            section_indent = section.index('\n')
+        except Exception:
+            section_indent = len(section)
+        if section_indent < 3:
+            return lines + section
+
+        indent = '\n' + ''.rjust(indent, ' ')
+        return lines + indent.join([sec.strip() for sec in section.split('\n')])
+        # indent = ''.rjust(indent, ' ')
+        # return lines + section.replace('\n', '\n'+indent)
+
+    @classmethod
+    def to_hocon(cls, config, compact=False, indent=2, level=0):
+        """Convert HOCON input into a HOCON output
+
+        :return: HOCON string representation
+        :type return: basestring
+        """
+        lines = ""
+        if isinstance(config, ConfigTree):
+            if len(config) == 0:
+                lines += '{}'
+            else:
+                if level > 0:  # don't display { at root level
+                    lines += '{\n'
+                bet_lines = []
+
+                for key, item in config.items():
+                    if compact:
+                        full_key = key
+                        while isinstance(item, ConfigTree) and len(item) == 1:
+                            key, item = next(iter(item.items()))
+                            full_key += '.' + key
+                    else:
+                        full_key = key
+
+                    if isinstance(full_key, float) or \
+                            (isinstance(full_key, (basestring, unicode)) and cls._number_re_matcher.match(full_key)):
+                        # if the key is a float, or a string that parses as a number, make sure we quote it
+                        full_key = '\"{}\"'.format(full_key)
+
+                    bet_line = ('{indent}{key}{assign_sign} '.format(
+                        indent=''.rjust(level * indent, ' '),
+                        key=full_key,
+                        assign_sign='' if isinstance(item, dict) else ' =',)
+                    )
+                    value_line = cls.to_hocon(item, compact, indent, level + 1)
+                    if isinstance(item, (list, tuple)):
+                        bet_lines.append(cls._auto_indent(bet_line, value_line))
+                    else:
+                        bet_lines.append(bet_line + value_line)
+                lines += '\n'.join(bet_lines)
+
+                if level > 0:  # don't display { at root level
+                    lines += '\n{indent}}}'.format(indent=''.rjust((level - 1) * indent, ' '))
+        elif isinstance(config, (list, tuple)):
+            if len(config) == 0:
+                lines += '[]'
+            else:
+                # lines += '[\n'
+                lines += '['
+                bet_lines = []
+                base_len = len(lines)
+                skip_comma = False
+                for i, item in enumerate(config):
+                    if 0 < i and not skip_comma:
+                        # if not isinstance(item, (str, int, float)):
+                        #     lines += ',\n{indent}'.format(indent=''.rjust(level * indent, ' '))
+                        # else:
+                        #     lines += ', '
+                        lines += ', '
+
+                    skip_comma = False
+                    new_line = cls.to_hocon(item, compact, indent, level + 1)
+                    lines += new_line
+                    if '\n' in new_line or len(lines) - base_len > 80:
+                        if i < len(config) - 1:
+                            lines += ',\n{indent}'.format(indent=''.rjust(level * indent, ' '))
+                        base_len = len(lines)
+                        skip_comma = True
+                    # bet_lines.append('{value}'.format(value=cls.to_hocon(item, compact, indent, level + 1)))
+
+                # lines += '\n'.join(bet_lines)
+                # lines += ', '.join(bet_lines)
+
+                # lines += '\n{indent}]'.format(indent=''.rjust((level - 1) * indent, ' '))
+                lines += ']'
+        elif isinstance(config, basestring):
+            if '\n' in config and len(config) > 1:
+                lines = '"""{value}"""'.format(value=config)  # multilines
+            else:
+                lines = '"{value}"'.format(value=cls.__escape_string(config))
+        elif isinstance(config, ConfigValues):
+            lines = ''.join(cls.to_hocon(o, compact, indent, level) for o in config.tokens)
+        elif isinstance(config, ConfigSubstitution):
+            lines = '${'
+            if config.optional:
+                lines += '?'
+            lines += config.variable + '}' + config.ws
+        elif isinstance(config, ConfigQuotedString):
+            if '\n' in config.value and len(config.value) > 1:
+                lines = '"""{value}"""'.format(value=config.value)  # multilines
+            else:
+                lines = '"{value}"'.format(value=cls.__escape_string(config.value))
+        elif config is None or isinstance(config, NoneValue):
+            lines = 'null'
+        elif config is True:
+            lines = 'true'
+        elif config is False:
+            lines = 'false'
+        else:
+            lines = str(config)
+        return lines
+
+    @classmethod
+    def to_yaml(cls, config, compact=False, indent=2, level=0):
+        """Convert HOCON input into a YAML output
+
+        :return: YAML string representation
+        :type return: basestring
+        """
+        lines = ""
+        if isinstance(config, ConfigTree):
+            if len(config) > 0:
+                if level > 0:
+                    lines += '\n'
+                bet_lines = []
+                for key, item in config.items():
+                    bet_lines.append('{indent}{key}: {value}'.format(
+                        indent=''.rjust(level * indent, ' '),
+                        key=key.strip('"'),  # for dotted keys enclosed with "" to not be interpreted as nested key
+                        value=cls.to_yaml(item, compact, indent, level + 1))
+                    )
+                lines += '\n'.join(bet_lines)
+        elif isinstance(config, list):
+            config_list = [line for line in config if line is not None]
+            if len(config_list) == 0:
+                lines += '[]'
+            else:
+                lines += '\n'
+                bet_lines = []
+                for item in config_list:
+                    bet_lines.append('{indent}- {value}'.format(indent=''.rjust(level * indent, ' '),
+                                                                value=cls.to_yaml(item, compact, indent, level + 1)))
+                lines += '\n'.join(bet_lines)
+        elif isinstance(config, basestring):
+            # if it contains a \n then it's multiline
+            lines = config.split('\n')
+            if len(lines) == 1:
+                lines = config
+            else:
+                lines = '|\n' + '\n'.join([' ' * (level * indent) + line for line in lines])
+        elif config is None or isinstance(config, NoneValue):
+            lines = 'null'
+        elif config is True:
+            lines = 'true'
+        elif config is False:
+            lines = 'false'
+        else:
+            lines = str(config)
+        return lines
+
+    @classmethod
+    def to_properties(cls, config, compact=False, indent=2, key_stack=[]):
+        """Convert HOCON input into a .properties output
+
+        :param key_stack: keys of the enclosing trees, outermost first
+        :return: .properties string representation
+        :type return: basestring
+        """
+
+        def escape_value(value):
+            return value.replace('=', '\\=').replace('!', '\\!').replace('#', '\\#').replace('\n', '\\\n')
+
+        stripped_key_stack = [key.strip('"') for key in key_stack]
+        lines = []
+        if isinstance(config, ConfigTree):
+            for key, item in config.items():
+                if item is not None:
+                    lines.append(cls.to_properties(item, compact, indent, stripped_key_stack + [key]))
+        elif isinstance(config, list):
+            for index, item in enumerate(config):
+                if item is not None:
+                    lines.append(cls.to_properties(item, compact, indent, stripped_key_stack + [str(index)]))
+        elif isinstance(config, basestring):
+            lines.append('.'.join(stripped_key_stack) + ' = ' + escape_value(config))
+        elif config is True:
+            lines.append('.'.join(stripped_key_stack) + ' = true')
+        elif config is False:
+            lines.append('.'.join(stripped_key_stack) + ' = false')
+        elif config is None or isinstance(config, NoneValue):
+            pass
+        else:
+            lines.append('.'.join(stripped_key_stack) + ' = ' + str(config))
+        return '\n'.join([line for line in lines if len(line) > 0])
+
+    @classmethod
+    def convert(cls, config, output_format='json', indent=2, compact=False):
+        converters = {
+            'json': cls.to_json,
+            'properties': cls.to_properties,
+            'yaml': cls.to_yaml,
+            'hocon': cls.to_hocon,
+        }
+
+        if output_format in converters:
+            return converters[output_format](config, compact, indent)
+        else:
+            raise Exception("Invalid format '{format}'. Format must be 'json', 'properties', 'yaml' or 'hocon'".format(
+                format=output_format))
+
+    @classmethod
+    def convert_from_file(cls, input_file=None, output_file=None, output_format='json', indent=2, compact=False):
+        """Convert to json, properties, yaml or hocon
+
+        :param input_file: input file, if not specified stdin
+        :param output_file: output file, if not specified stdout
+        :param output_format: json, properties, yaml or hocon
+        :return: None; the converted string is printed to stdout or written to output_file
+        """
+
+        if input_file is None:
+            content = sys.stdin.read()
+            config = ConfigFactory.parse_string(content)
+        else:
+            config = ConfigFactory.parse_file(input_file)
+
+        res = cls.convert(config, output_format, indent, compact)
+        if output_file is None:
+            print(res)
+        else:
+            with open(output_file, "w") as fd:
+                fd.write(res)
+
+    @classmethod
+    def __escape_match(cls, match):
+        char = match.group(0)
+        return {
+            '\b': r'\b',
+            '\t': r'\t',
+            '\n': r'\n',
+            '\f': r'\f',
+            '\r': r'\r',
+            '"': r'\"',
+            '\\': r'\\',
+        }.get(char) or (r'\u%04x' % ord(char))
+
+    @classmethod
+    def __escape_string(cls, string):
+        return re.sub(r'[\x00-\x1F"\\]', cls.__escape_match, string)
--- a/clearml_agent/external/pyhocon/exceptions.py
+++ b/clearml_agent/external/pyhocon/exceptions.py
@@ -0,0 +1,17 @@
+class ConfigException(Exception):
+
+    def __init__(self, message, ex=None):
+        super(ConfigException, self).__init__(message)
+        self._exception = ex
+
+
+class ConfigMissingException(ConfigException, KeyError):
+    pass
+
+
+class ConfigSubstitutionException(ConfigException):
+    pass
+
+
+class ConfigWrongTypeException(ConfigException):
+    pass
--- a/clearml_agent/external/requirements_parser/parser.py
+++ b/clearml_agent/external/requirements_parser/parser.py
@@ -1,6 +1,9 @@
 import os
+import re
 import warnings

+from clearml_agent.definitions import PIP_EXTRA_INDICES
+
 from .requirement import Requirement


@@ -42,9 +45,14 @@ def parse(reqstr, cwd=None):
                    yield requirement
        elif line.startswith('-f') or line.startswith('--find-links') or \
                line.startswith('-i') or line.startswith('--index-url') or \
-                line.startswith('--extra-index-url') or \
                line.startswith('--no-index'):
            warnings.warn('Private repos not supported. Skipping.')
+        elif line.startswith('--extra-index-url'):
+            extra_index = line[len('--extra-index-url'):].strip()
+            extra_index = re.sub(r"\s+#.*$", "", extra_index)  # strip comments
+            if extra_index and extra_index not in PIP_EXTRA_INDICES:
+                PIP_EXTRA_INDICES.append(extra_index)
+                print(f"appended {extra_index} to list of extra pip indices")
            continue
        elif line.startswith('-Z') or line.startswith('--always-unzip'):
            warnings.warn('Unused option --always-unzip. Skipping.')
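Note on the `--extra-index-url` branch above: a minimal standalone sketch of the same parsing step, for illustration only. `parse_extra_index_url` and its `known_indices` argument are hypothetical names; in the patch the URL is appended to the `PIP_EXTRA_INDICES` list imported from `clearml_agent.definitions`.

```python
import re


def parse_extra_index_url(line, known_indices):
    """Extract the URL from a `--extra-index-url` requirements line.

    Hypothetical helper mirroring the branch in the patch: the URL is
    appended to `known_indices` only if it is non-empty and not yet listed.
    Returns the URL, or None if the line carries no usable URL.
    """
    option = '--extra-index-url'
    if not line.startswith(option):
        return None
    value = line[len(option):].strip()
    # drop a trailing comment; the leading whitespace requirement keeps
    # URL fragments such as `https://host/simple#frag` intact
    value = re.sub(r"\s+#.*$", "", value)
    if value and value not in known_indices:
        known_indices.append(value)
    return value or None


indices = []
parse_extra_index_url("--extra-index-url https://example.com/simple  # mirror", indices)
parse_extra_index_url("--extra-index-url https://example.com/simple", indices)
print(indices)  # ['https://example.com/simple']
```

The second call adds nothing because the URL is already listed, which matches the duplicate check in the patch.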
--- a/clearml_agent/glue/definitions.py
+++ b/clearml_agent/glue/definitions.py
@@ -0,0 +1,7 @@
+from clearml_agent.definitions import EnvironmentConfig
+
+ENV_START_AGENT_SCRIPT_PATH = EnvironmentConfig('CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH')
+"""
+Script path to use when creating the bash script to run the agent inside the scheduled pod's docker container.
+The script will be appended to the specified file.
+"""
--- a/clearml_agent/glue/k8s.py
+++ b/clearml_agent/glue/k8s.py
--- a/clearml_agent/helper/base.py
+++ b/clearml_agent/helper/base.py
@@ -20,13 +20,13 @@ from typing import Text, Dict, Any, Optional, AnyStr, IO, Union

 import attr
 import furl
-import pyhocon
 import yaml
 from attr import fields_dict
 from pathlib2 import Path

 import six
 from six.moves import reduce
+from clearml_agent.external import pyhocon
 from clearml_agent.errors import CommandFailedError
 from clearml_agent.helper.dicts import filter_keys

@@ -204,10 +204,13 @@ def get_python_path(script_dir, entry_point, package_api, is_conda_env=False):
            ["-c", "import sys; print('{}'.join(sys.path))".format(python_path_sep)])
        org_python_path = python_path_cmd.get_output(cwd=script_dir)
        # Add path of the script directory and executable directory
-        python_path = '{}{python_path_sep}{}{python_path_sep}'.format(
-            Path(script_dir).absolute().as_posix(),
-            (Path(script_dir) / Path(entry_point)).parent.absolute().as_posix(),
-            python_path_sep=python_path_sep)
+        python_path = '{}{python_path_sep}'.format(
+            Path(script_dir).absolute().as_posix(), python_path_sep=python_path_sep)
+        if entry_point:
+            python_path += '{}{python_path_sep}'.format(
+                (Path(script_dir) / Path(entry_point)).parent.absolute().as_posix(),
+                python_path_sep=python_path_sep)
+
        if is_windows_platform():
            python_path = python_path.replace('/', '\\')

@@ -503,6 +506,38 @@ def is_conda(config):
    return config['agent.package_manager.type'].lower() == 'conda'


+def convert_cuda_version_to_float_single_digit_str(cuda_version):
+    """
+    Convert a cuda_version (string/float/int) into a major.minor representation, e.g. 11.4
+    Note: the result is returned as a string with a single minor digit
+    :return str:
+    """
+    cuda_version = str(cuda_version or 0)
+    # if we have patch version we parse it here
+    cuda_version_parts = [int(v) for v in cuda_version.split('.')]
+    if len(cuda_version_parts) > 1 or cuda_version_parts[0] < 60:
+        cuda_version = 10 * cuda_version_parts[0]
+        if len(cuda_version_parts) > 1:
+            cuda_version += float(".{:d}".format(cuda_version_parts[1]))*10
+
+        cuda_version_full = "{:.1f}".format(float(cuda_version) / 10.)
+    else:
+        cuda_version = cuda_version_parts[0]
+        cuda_version_full = "{:.1f}".format(float(cuda_version) / 10.)
+
+    return cuda_version_full
+
+
+def convert_cuda_version_to_int_10_base_str(cuda_version):
+    """
+    Convert a cuda_version (string/float/int) into a base-10 integer version, e.g. 112 for CUDA 11.2
+    The integer is returned as a string
+    :return str:
+    """
+    cuda_version = convert_cuda_version_to_float_single_digit_str(cuda_version)
+    return str(int(float(cuda_version)*10))
+
+
 class NonStrictAttrs(object):

    @classmethod
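Note on the two CUDA helpers added to `clearml_agent/helper/base.py`: the snippet below reproduces them outside the agent with a few worked inputs, to show how the accepted spellings (`"11.4"`, `"11.4.1"`, `11`, `112`) are normalized. The function bodies follow the patch; the added comments and the sample calls are illustrative only.

```python
def convert_cuda_version_to_float_single_digit_str(cuda_version):
    cuda_version = str(cuda_version or 0)
    cuda_version_parts = [int(v) for v in cuda_version.split('.')]
    if len(cuda_version_parts) > 1 or cuda_version_parts[0] < 60:
        # dotted version ("11.4", "11.4.1") or a bare major version (11)
        cuda_version = 10 * cuda_version_parts[0]
        if len(cuda_version_parts) > 1:
            cuda_version += float(".{:d}".format(cuda_version_parts[1])) * 10
        cuda_version_full = "{:.1f}".format(float(cuda_version) / 10.)
    else:
        # base-10 integer form, e.g. 112 meaning 11.2
        cuda_version = cuda_version_parts[0]
        cuda_version_full = "{:.1f}".format(float(cuda_version) / 10.)
    return cuda_version_full


def convert_cuda_version_to_int_10_base_str(cuda_version):
    cuda_version = convert_cuda_version_to_float_single_digit_str(cuda_version)
    return str(int(float(cuda_version) * 10))


print(convert_cuda_version_to_float_single_digit_str("11.4.1"))  # 11.4
print(convert_cuda_version_to_float_single_digit_str(112))       # 11.2
print(convert_cuda_version_to_float_single_digit_str(11))        # 11.0
print(convert_cuda_version_to_int_10_base_str("11.4"))           # 114
```

The `< 60` threshold is what separates a bare major version (`11`) from the base-10 integer form (`112`); any patch component is dropped.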
--- a/clearml_agent/helper/console.py
+++ b/clearml_agent/helper/console.py
@@ -2,7 +2,7 @@ from __future__ import unicode_literals, print_function

 import csv
 import sys
-from collections import Iterable
+from collections.abc import Iterable
 from typing import List, Dict, Text, Any

 from attr import attrs, attrib
--- a/clearml_agent/helper/docker_args.py
+++ b/clearml_agent/helper/docker_args.py
@@ -0,0 +1,96 @@
+import re
+import shlex
+from typing import Tuple, List, TYPE_CHECKING
+from urllib.parse import urlunparse, urlparse
+
+from clearml_agent.definitions import (
+    ENV_AGENT_GIT_PASS,
+    ENV_AGENT_SECRET_KEY,
+    ENV_AWS_SECRET_KEY,
+    ENV_AZURE_ACCOUNT_KEY,
+    ENV_AGENT_AUTH_TOKEN,
+    ENV_DOCKER_IMAGE,
+    ENV_DOCKER_ARGS_HIDE_ENV,
+)
+
+if TYPE_CHECKING:
+    from clearml_agent.session import Session
+
+
+class DockerArgsSanitizer:
+    @classmethod
+    def sanitize_docker_command(cls, session, docker_command):
+        # type: (Session, List[str]) -> List[str]
+        if not docker_command:
+            return docker_command
+
+        enabled = (
+            session.config.get('agent.hide_docker_command_env_vars.enabled', False) or ENV_DOCKER_ARGS_HIDE_ENV.get()
+        )
+        if not enabled:
+            return docker_command
+
+        keys = set(session.config.get('agent.hide_docker_command_env_vars.extra_keys', []))
+        if ENV_DOCKER_ARGS_HIDE_ENV.get():
+            keys.update(shlex.split(ENV_DOCKER_ARGS_HIDE_ENV.get().strip()))
+        keys.update(
+            ENV_AGENT_GIT_PASS.vars,
+            ENV_AGENT_SECRET_KEY.vars,
+            ENV_AWS_SECRET_KEY.vars,
+            ENV_AZURE_ACCOUNT_KEY.vars,
+            ENV_AGENT_AUTH_TOKEN.vars,
+        )
+
+        parse_embedded_urls = bool(session.config.get(
+            'agent.hide_docker_command_env_vars.parse_embedded_urls', True
+        ))
+
+        skip_next = False
+        result = docker_command[:]
+        for i, item in enumerate(docker_command):
+            if skip_next:
+                skip_next = False
+                continue
+            try:
+                if item in ("-e", "--env"):
+                    key, sep, val = result[i + 1].partition("=")
+                    if not sep:
+                        continue
+                    if key in ENV_DOCKER_IMAGE.vars:
+                        # special case - this contains a complete docker command
+                        val = " ".join(cls.sanitize_docker_command(session, re.split(r"\s", val)))
+                    elif key in keys:
+                        val = "********"
+                    elif parse_embedded_urls:
+                        val = cls._sanitize_urls(val)[0]
+                    result[i + 1] = "{}={}".format(key, val)
+                    skip_next = True
+                elif parse_embedded_urls and not item.startswith("-"):
+                    item, changed = cls._sanitize_urls(item)
+                    if changed:
+                        result[i] = item
+            except (IndexError, KeyError, TypeError):
+                pass
+
+        return result
+
+    @staticmethod
+    def _sanitize_urls(s: str) -> Tuple[str, bool]:
+        """ Replaces passwords in URLs with asterisks """
+        regex = re.compile("^([^:]*:)[^@]+(.*)$")
+        tokens = re.split(r"\s", s)
+        changed = False
+        for k in range(len(tokens)):
+            if "@" in tokens[k]:
+                res = urlparse(tokens[k])
+                if regex.match(res.netloc):
+                    changed = True
+                    tokens[k] = urlunparse((
+                        res.scheme,
+                        regex.sub("\\1********\\2", res.netloc),
+                        res.path,
+                        res.params,
+                        res.query,
+                        res.fragment
+                    ))
+        return " ".join(tokens) if changed else s, changed
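Note on `DockerArgsSanitizer._sanitize_urls`: a minimal standalone sketch of the same masking step, assuming only the standard library. `sanitize_urls` is a hypothetical free-function name used for illustration; the regular expression and the token splitting follow the patch.

```python
import re
from urllib.parse import urlparse, urlunparse


def sanitize_urls(s):
    """Mask the password of any `user:password@host` URL found in `s`.

    Returns a (text, changed) tuple; `text` is the original string when
    nothing was masked.
    """
    regex = re.compile("^([^:]*:)[^@]+(.*)$")
    tokens = re.split(r"\s", s)
    changed = False
    for k, token in enumerate(tokens):
        if "@" not in token:
            continue
        res = urlparse(token)
        if regex.match(res.netloc):
            changed = True
            # keep `user:` and `@host`, replace what sits between them
            masked = regex.sub("\\1********\\2", res.netloc)
            tokens[k] = urlunparse(res._replace(netloc=masked))
    return (" ".join(tokens) if changed else s), changed


print(sanitize_urls("https://user:secret@example.com/org/repo.git"))
# ('https://user:********@example.com/org/repo.git', True)
print(sanitize_urls("python train.py --epochs 3"))
# ('python train.py --epochs 3', False)
```

Only tokens containing `@` are parsed, so arguments without embedded credentials are returned untouched.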
--- a/clearml_agent/helper/package/base.py
+++ b/clearml_agent/helper/package/base.py
@@ -80,7 +80,12 @@ class PackageManager(object):

    def upgrade_pip(self):
        result = self._install(
-            select_for_platform(windows='pip{}', linux='pip{}').format(self.get_pip_version()), "--upgrade")
+            *select_for_platform(
+                windows=self.get_pip_versions(),
+                linux=self.get_pip_versions()
+            ),
+            "--upgrade"
+        )
        packages = self.run_with_env(('list',), output=True).splitlines()
        # p.split is ('pip', 'x.y.z')
        pip = [p.split() for p in packages if len(p.split()) == 2 and p.split()[0] == 'pip']
@@ -157,15 +162,26 @@ class PackageManager(object):
    def set_pip_version(cls, version):
        if not version:
            return
-        version = version.replace(' ', '')
-        if ('=' in version) or ('~' in version) or ('<' in version) or ('>' in version):
-            cls._pip_version = version
+
+        if isinstance(version, (list, tuple)):
+            versions = version
        else:
-            cls._pip_version = "=="+version
+            versions = [version]
+
+        cls._pip_version = []
+        for version in versions:
+            version = version.strip()
+            if ('=' in version) or ('~' in version) or ('<' in version) or ('>' in version):
+                cls._pip_version.append(version)
+            else:
+                cls._pip_version.append("==" + version)

    @classmethod
-    def get_pip_version(cls):
-        return cls._pip_version or ''
+    def get_pip_versions(cls, pip="pip", wrap=''):
+        return [
+            (wrap + pip + version + wrap)
+            for version in cls._pip_version or [pip]
+        ]

    def get_cached_venv(self, requirements, docker_cmd, python_version, cuda_version, destination_folder):
        # type: (Dict, Optional[Union[dict, str]], Optional[str], Optional[str], Path) -> Optional[Path]
@@ -176,8 +192,13 @@ class PackageManager(object):
        if not self._get_cache_manager():
            return None

-        keys = self._generate_reqs_hash_keys(requirements, docker_cmd, python_version, cuda_version)
-        return self._get_cache_manager().copy_cached_entry(keys, destination_folder)
+        try:
+            keys = self._generate_reqs_hash_keys(requirements, docker_cmd, python_version, cuda_version)
+            return self._get_cache_manager().copy_cached_entry(keys, destination_folder)
+        except Exception as ex:
+            print("WARNING: Failed accessing venvs cache at {}: {}".format(destination_folder, ex))
+            print("WARNING: Skipping venv cache - folder not accessible!")
+            return None

    def add_cached_venv(
            self,
@@ -194,9 +215,15 @@ class PackageManager(object):
        """
        if not self._get_cache_manager():
            return
-        keys = self._generate_reqs_hash_keys(requirements, docker_cmd, python_version, cuda_version)
-        return self._get_cache_manager().add_entry(
-            keys=keys, source_folder=source_folder, exclude_sub_folders=exclude_sub_folders)
+
+        try:
+            keys = self._generate_reqs_hash_keys(requirements, docker_cmd, python_version, cuda_version)
+            return self._get_cache_manager().add_entry(
+                keys=keys, source_folder=source_folder, exclude_sub_folders=exclude_sub_folders)
+        except Exception as ex:
+            print("WARNING: Failed accessing venvs cache at {}: {}".format(source_folder, ex))
+            print("WARNING: Skipping venv cache - folder not accessible!")
+            return None

    def get_cache_folder(self):
        # type: () -> Optional[Path]
@@ -213,6 +240,13 @@ class PackageManager(object):
            return
        return self._get_cache_manager().get_last_copied_entry()

+    def is_cached_enabled(self):
+        if not self._cache_manager:
+            cache_folder = ENV_VENV_CACHE_PATH.get() or self.session.config.get(self._config_cache_folder, None)
+            if not cache_folder:
+                return False
+        return True
+
    @classmethod
    def _generate_reqs_hash_keys(cls, requirements_list, docker_cmd, python_version, cuda_version):
        # type: (Union[Dict, List[Dict]], Optional[Union[dict, str]], Optional[str], Optional[str]) -> List[str]
@@ -257,12 +291,19 @@ class PackageManager(object):

    def _get_cache_manager(self):
        if not self._cache_manager:
-            cache_folder = ENV_VENV_CACHE_PATH.get() or self.session.config.get(self._config_cache_folder, None)
-            if not cache_folder:
+            cache_folder = None
+            try:
+                cache_folder = ENV_VENV_CACHE_PATH.get() or self.session.config.get(self._config_cache_folder, None)
+                if not cache_folder:
+                    return None
+
+                max_entries = int(self.session.config.get(self._config_cache_max_entries, 10))
+                free_space_threshold = float(self.session.config.get(self._config_cache_free_space_threshold, 0))
+                self._cache_manager = FolderCache(
+                    cache_folder, max_cache_entries=max_entries, min_free_space_gb=free_space_threshold)
+            except Exception as ex:
+                print("WARNING: Failed accessing venvs cache at {}: {}".format(cache_folder, ex))
+                print("WARNING: Skipping venv cache - folder not accessible!")
                return None

-            max_entries = int(self.session.config.get(self._config_cache_max_entries, 10))
-            free_space_threshold = float(self.session.config.get(self._config_cache_free_space_threshold, 0))
-            self._cache_manager = FolderCache(
-                cache_folder, max_cache_entries=max_entries, min_free_space_gb=free_space_threshold)
        return self._cache_manager
--- a/clearml_agent/helper/package/conda_api.py
+++ b/clearml_agent/helper/package/conda_api.py
@@ -19,7 +19,9 @@ from clearml_agent.external.requirements_parser import parse
 from clearml_agent.external.requirements_parser.requirement import Requirement

 from clearml_agent.errors import CommandFailedError
-from clearml_agent.helper.base import rm_tree, NonStrictAttrs, select_for_platform, is_windows_platform, ExecutionInfo
+from clearml_agent.helper.base import (
+    rm_tree, NonStrictAttrs, select_for_platform, is_windows_platform, ExecutionInfo,
+    convert_cuda_version_to_float_single_digit_str, convert_cuda_version_to_int_10_base_str, )
 from clearml_agent.helper.process import Argv, Executable, DEVNULL, CommandSequence, PathLike
 from clearml_agent.helper.package.requirements import SimpleVersion
 from clearml_agent.session import Session
@@ -133,7 +135,12 @@ class CondaAPI(PackageManager):
        if self.env_read_only:
            print('Conda environment in read-only mode, skipping pip upgrade.')
            return ''
-        return self._install(select_for_platform(windows='pip{}', linux='pip{}').format(self.pip.get_pip_version()))
+        return self._install(
+            *select_for_platform(
+                windows=self.pip.get_pip_versions(),
+                linux=self.pip.get_pip_versions()
+            )
+        )

    def create(self):
        """
@@ -167,7 +174,7 @@ class CondaAPI(PackageManager):
                raise ValueError("Could not restore Conda environment, cannot find {}".format(
                    self.conda_pre_build_env_path))

-        output = Argv(
+        command = Argv(
            self.conda,
            "create",
            "--yes",
@@ -175,7 +182,9 @@ class CondaAPI(PackageManager):
            "--prefix",
            self.path,
            "python={}".format(self.python),
-        ).get_output(stderr=DEVNULL)
+        )
+        print('Executing Conda: {}'.format(command.serialize()))
+        output = command.get_output(stderr=DEVNULL)
        match = re.search(
            r"\W*(.*activate) ({})".format(re.escape(str(self.path))), output
        )
@@ -189,14 +198,6 @@ class CondaAPI(PackageManager):
        if conda_env.is_file() and not is_windows_platform():
            self.source = self.pip.source = CommandSequence(('source', conda_env.as_posix()), self.source)

-        # install cuda toolkit
-        # noinspection PyBroadException
-        try:
-            cuda_version = float(int(self.session.config['agent.cuda_version'])) / 10.0
-            if cuda_version > 0:
-                self._install('cudatoolkit={:.1f}'.format(cuda_version))
-        except Exception:
-            pass
        return self

    def _init_existing_environment(self, conda_pre_build_env_path):
@@ -428,7 +429,7 @@ class CondaAPI(PackageManager):
            finally:
                PackageManager._selected_manager = self

-        self.requirements_manager.post_install(self.session)
+        self.requirements_manager.post_install(self.session, package_manager=self)

    def load_requirements(self, requirements):
        # if we are in read only mode, do not uninstall anything
@@ -456,9 +457,18 @@ class CondaAPI(PackageManager):
            requirements['conda'] = requirements['conda'].split('\n')
        has_torch = False
        has_matplotlib = False
+        has_cudatoolkit = False
+        cuda_version_full = 0
+        # noinspection PyBroadException
        try:
-            cuda_version = int(self.session.config.get('agent.cuda_version', 0))
-        except:
+            # notice this is an integer version: 112 (means 11.2)
+            cuda_version = str(self.session.config.get('agent.cuda_version', "")).strip()
+            if not cuda_version:
+                cuda_version = 0
+            else:
+                cuda_version_full = convert_cuda_version_to_float_single_digit_str(cuda_version)
+                cuda_version = int(convert_cuda_version_to_int_10_base_str(cuda_version))
+        except Exception:
            cuda_version = 0

        # notice 'conda' entry with empty string is a valid conda requirements list, it means pip only
@@ -475,6 +485,7 @@ class CondaAPI(PackageManager):
                continue

            m = MarkerRequirement(marker[0])
+            m.validate_local_file_ref()
            # conda does not support version control links
            if m.vcs:
                pip_requirements.append(m)
@@ -488,6 +499,19 @@ class CondaAPI(PackageManager):
                if '.' not in m.specs[0][1]:
                    continue

+            if m.name.lower() == 'cudatoolkit':
+                # skip cuda if we are running on CPU
+                if not cuda_version:
+                    continue
+
+                has_cudatoolkit = True
+                # cuda version, only major.minor
+                requested_cuda_version = '.'.join(m.specs[0][1].split('.')[:2])
+                # make sure that the cuda_version we support can install the requested cuda (major version)
+                if int(float(requested_cuda_version)) > int(float(cuda_version)/10.0):
+                    continue
+                m.specs = [(m.specs[0][0], str(requested_cuda_version)), ]
+
            conda_supported_req_names.append(m.name.lower())
            if m.req.name.lower() == 'matplotlib':
                has_matplotlib = True
@@ -504,6 +528,11 @@ class CondaAPI(PackageManager):

            reqs.append(m)

+        if not has_cudatoolkit and cuda_version:
+            m = MarkerRequirement(Requirement.parse("cudatoolkit == {}".format(cuda_version_full)))
+            has_cudatoolkit = True
+            reqs.append(m)
+
        # if we have a conda list, the rest should be installed with pip,
        # this means  any experiment that was executed with pip environment,
        # will be installed using pip
@@ -517,9 +546,9 @@ class CondaAPI(PackageManager):
                    continue

                m = MarkerRequirement(marker[0])
-                # skip over local files (we cannot change the version to a local file)
-                if m.local_file:
-                    continue
+                # remove the local file reference if the file does not exist (keep the package name)
+                m.validate_local_file_ref()
+
                m_name = (m.name or '').lower()
                if m_name in conda_supported_req_names:
                    # this package is in the conda list,
@@ -559,8 +588,12 @@ class CondaAPI(PackageManager):
            # change _ to - in name but not the prefix _ (as this is conda prefix)
            if r.name and not r.name.startswith('_') and not requirements.get('conda', None):
                r.name = r.name.replace('_', '-')
-            # remove .post from version numbers, it fails ~= version, and change == to ~=
-            if r.specs and r.specs[0]:
+
+            if has_cudatoolkit and r.specs and len(r.specs[0]) > 1 and r.name == 'cudatoolkit':
+                # select specific cuda version if it came from the requirements
+                r.specs = [(r.specs[0][0].replace('==', '='), r.specs[0][1].split('.post')[0])]
+            elif r.specs and r.specs[0] and len(r.specs[0]) > 1:
+                # remove .post from version numbers (it fails with ~= version), and change == to ~=
                r.specs = [(r.specs[0][0].replace('==', '~='), r.specs[0][1].split('.post')[0])]

        while reqs:
@@ -614,7 +647,7 @@ class CondaAPI(PackageManager):
            finally:
                PackageManager._selected_manager = self

-        self.requirements_manager.post_install(self.session)
+        self.requirements_manager.post_install(self.session, package_manager=self)
        return True

    def _parse_conda_result_bad_packges(self, result_dict):
--- a/clearml_agent/helper/package/external_req.py
+++ b/clearml_agent/helper/package/external_req.py
@@ -46,10 +46,17 @@ class ExternalRequirements(SimpleSubstitution):
        post_install_req = self.post_install_req
        self.post_install_req = []
        for req in post_install_req:
-            try:
-                freeze_base = PackageManager.out_of_scope_freeze() or ''
-            except:
-                freeze_base = ''
+            if self.is_already_installed(req):
+                print("No need to reinstall \'{}\' from VCS, "
+                      "the exact same version is already installed".format(req.name))
+                continue
+
+            if not req.pip_new_version:
+                # noinspection PyBroadException
+                try:
+                    freeze_base = PackageManager.out_of_scope_freeze() or dict(pip=[])
+                except Exception:
+                    freeze_base = dict(pip=[])

            req_line = self._add_vcs_credentials(req, session)

@@ -59,14 +66,14 @@ class ExternalRequirements(SimpleSubstitution):
                PackageManager.out_of_scope_install_package(req_line, "--no-deps")
                # noinspection PyBroadException
                try:
-                    freeze_post = PackageManager.out_of_scope_freeze() or ''
+                    freeze_post = PackageManager.out_of_scope_freeze() or dict(pip=[])
                    package_name = list(set(freeze_post['pip']) - set(freeze_base['pip']))
                    if package_name and package_name[0] not in self.post_install_req_lookup:
                        self.post_install_req_lookup[package_name[0]] = req.req.line
                except Exception:
                    pass

-            # no need to force reinstall, pip will always rebuilt if the package comes from git
+            # no need to force reinstall, pip will always rebuild if the package comes from git
            # and make sure the required packages are installed (if they are not it will install them)
            if not PackageManager.out_of_scope_install_package(req_line):
                raise ValueError("Failed installing GIT/HTTPs package \'{}\'".format(req_line))
@@ -85,19 +92,14 @@ class ExternalRequirements(SimpleSubstitution):
                vcs_url = req_line[4:]
                # reverse replace
                vcs_url = vcs_url[::-1].replace(fragment[::-1], '', 1)[::-1]
-                # remove ssh:// or git:// prefix for git detection and credentials
-                scheme = ''
-                if vcs_url and (vcs_url.startswith('ssh://') or vcs_url.startswith('git://')):
-                    scheme = 'ssh://'  # notice git:// is actually ssh://
-                    vcs_url = vcs_url[6:]
+                # notice git:// is actually ssh://
+                if vcs_url and vcs_url.startswith('git://'):
+                    vcs_url = vcs_url.replace('git://', 'ssh://', 1)

                from ..repo import Git
                vcs = Git(session=session, url=vcs_url, location=None, revision=None)
                vcs._set_ssh_url()
-                new_req_line = 'git+{}{}{}'.format(
-                    '' if scheme and '://' in vcs.url else scheme,
-                    vcs.url_with_auth, fragment
-                )
+                new_req_line = 'git+{}{}'.format(vcs.url_with_auth, fragment)
                if new_req_line != req_line:
                    furl_line = furl(new_req_line)
                    print('Replacing original pip vcs \'{}\' with \'{}\''.format(
@@ -175,5 +177,11 @@ class OnlyExternalRequirements(ExternalRequirements):
        # Do not store the skipped requirements
        # mark skip package
        if super(OnlyExternalRequirements, self).match(req):
+            if self.is_already_installed(req):
+                print("No need to reinstall \'{}\' from VCS, "
+                      "the exact same version is already installed".format(req.name))
+                return Text('')
+
            return self._add_vcs_credentials(req, self._session)
+
        return Text('')
--- a/clearml_agent/helper/package/pip_api/venv.py
+++ b/clearml_agent/helper/package/pip_api/venv.py
@@ -12,7 +12,7 @@ from ..requirements import RequirementsManager

 class VirtualenvPip(SystemPip, PackageManager):
    def __init__(self, session, python, requirements_manager, path, interpreter=None, execution_info=None, **kwargs):
-        # type: (Session, float, RequirementsManager, PathLike, PathLike, ExecutionInfo, Any) -> ()
+        # type: (Session, str, RequirementsManager, PathLike, PathLike, ExecutionInfo, Any) -> ()
        """
        Program interface to virtualenv pip.
        Must be given either path to virtualenv or source command.
@@ -39,7 +39,7 @@ class VirtualenvPip(SystemPip, PackageManager):
        if isinstance(requirements, dict) and requirements.get("pip"):
            requirements["pip"] = self.requirements_manager.replace(requirements["pip"])
        super(VirtualenvPip, self).load_requirements(requirements)
-        self.requirements_manager.post_install(self.session)
+        self.requirements_manager.post_install(self.session, package_manager=self)

    def create_flags(self):
        """
--- a/clearml_agent/helper/package/poetry_api.py
+++ b/clearml_agent/helper/package/poetry_api.py
@@ -5,6 +5,7 @@ import attr
 import sys
 import os
 from pathlib2 import Path
+
 from clearml_agent.helper.process import Argv, DEVNULL, check_if_command_exists
 from clearml_agent.session import Session, POETRY

@@ -68,6 +69,11 @@ class PoetryConfig:
                path = path.replace(':'+sys.base_prefix, ':'+sys.real_prefix, 1)
                kwargs['env']['PATH'] = path

+        if self.session and self.session.config:
+            extra_args = self.session.config.get("agent.package_manager.poetry_install_extra_args", None)
+            if extra_args:
+                args = args + tuple(extra_args)
+
        if check_if_command_exists("poetry"):
            argv = Argv("poetry", *args)
        else:
@@ -81,6 +87,32 @@ class PoetryConfig:
    @_guard_enabled
    def initialize(self, cwd=None):
        if not self._initialized:
+            if self.session.config.get("agent.package_manager.poetry_version", None) is not None:
+                version = str(self.session.config.get("agent.package_manager.poetry_version"))
+                print('Upgrading Poetry package {}'.format(version))
+                # first upgrade pip if we need to
+                try:
+                    from clearml_agent.helper.package.pip_api.venv import VirtualenvPip
+                    pip = VirtualenvPip(
+                        session=self.session, python=self._python,
+                        requirements_manager=None, path=None, interpreter=self._python)
+                    pip.upgrade_pip()
+                except Exception as ex:
+                    self.log.warning("failed upgrading pip: {}".format(ex))
+
+                # now install poetry
+                try:
+                    version = version.replace(' ', '')
+                    if ('=' in version) or ('~' in version) or ('<' in version) or ('>' in version):
+                        version = version
+                    elif version:
+                        version = "==" + version
+                    argv = Argv(self._python, "-m", "pip", "install", "poetry{}".format(version),
+                                "--upgrade", "--disable-pip-version-check")
+                    print(argv.get_output())
+                except Exception as ex:
+                    self.log.warning("failed upgrading poetry: {}".format(ex))
+
            self._initialized = True
            try:
                self._config("--local", "virtualenvs.in-project",  "true", cwd=cwd)
--- a/clearml_agent/helper/package/priority_req.py
+++ b/clearml_agent/helper/package/priority_req.py
@@ -1,3 +1,4 @@
+import re
 from typing import Text

 from .base import PackageManager
@@ -11,13 +12,14 @@ class PriorityPackageRequirement(SimpleSubstitution):

    def __init__(self, *args, **kwargs):
        super(PriorityPackageRequirement, self).__init__(*args, **kwargs)
+        self._replaced_packages = {}
        # check if we need to replace the packages:
        priority_packages = self.config.get('agent.package_manager.priority_packages', None)
        if priority_packages:
-            self.__class__.name = priority_packages
+            self.__class__.name = [p.lower() for p in priority_packages]
        priority_optional_packages = self.config.get('agent.package_manager.priority_optional_packages', None)
        if priority_optional_packages:
-            self.__class__.optional_package_names = priority_optional_packages
+            self.__class__.optional_package_names = [p.lower() for p in priority_optional_packages]

    def match(self, req):
        # match both Cython & cython
@@ -28,7 +30,9 @@ class PriorityPackageRequirement(SimpleSubstitution):
        Replace a requirement
        :raises: ValueError if version is pre-release
        """
-        if req.name in self.optional_package_names:
+        self._replaced_packages[req.name] = req.line
+
+        if req.name.lower() in self.optional_package_names:
            # noinspection PyBroadException
            try:
                if PackageManager.out_of_scope_install_package(str(req)):
@@ -39,6 +43,41 @@ class PriorityPackageRequirement(SimpleSubstitution):
        PackageManager.out_of_scope_install_package(str(req))
        return Text(req)

+    def replace_back(self, list_of_requirements):
+        """
+        :param list_of_requirements: {'pip': ['a==1.0', ]}
+        :return: {'pip': ['a==1.0', ]}
+        """
+        # if we replaced setuptools, it means someone requested it, and since freeze will not contain it,
+        # we need to add it manually
+        if not self._replaced_packages or "setuptools" not in self._replaced_packages:
+            return list_of_requirements
+
+        try:
+            for k, lines in list_of_requirements.items():
+                # k is either pip/conda
+                if k not in ('pip', 'conda'):
+                    continue
+                for i, line in enumerate(lines):
+                    if not line or line.lstrip().startswith('#'):
+                        continue
+                    parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
+                    if not parts:
+                        continue
+                    # if we found setuptools, do nothing
+                    if parts[0] == "setuptools":
+                        return list_of_requirements
+
+            # if we are here it means we have not found setuptools
+            # we should add it:
+            if "pip" in list_of_requirements:
+                list_of_requirements["pip"] = [self._replaced_packages["setuptools"]] + list_of_requirements["pip"]
+
+        except Exception as ex:  # noqa
+            return list_of_requirements
+
+        return list_of_requirements
+

 class PackageCollectorRequirement(SimpleSubstitution):
    """
--- a/clearml_agent/helper/package/pytorch.py
+++ b/clearml_agent/helper/package/pytorch.py
@@ -2,17 +2,21 @@ from __future__ import unicode_literals

 import re
 import sys
+import platform
 from furl import furl
 import urllib.parse
 from operator import itemgetter
 from html.parser import HTMLParser
-from typing import Text
+from typing import Text, Optional, Dict

 import attr
 import requests

 import six
-from .requirements import SimpleSubstitution, FatalSpecsResolutionError, SimpleVersion
+from .requirements import (
+    SimpleSubstitution, FatalSpecsResolutionError, SimpleVersion, MarkerRequirement,
+    compare_version_rules, )
+from ...external.requirements_parser.requirement import Requirement

 OS_TO_WHEEL_NAME = {"linux": "linux_x86_64", "windows": "win_amd64"}

@@ -51,17 +55,16 @@ class PytorchWheel(object):
    python = attr.ib(type=str, converter=lambda x: str(x).replace(".", ""))
    torch_version = attr.ib(type=str, converter=fix_version)

-    url_template = (
-        "http://download.pytorch.org/whl/"
-        "{0.cuda_version}/torch-{0.torch_version}-cp{0.python}-cp{0.python}m{0.unicode}-{0.os_name}.whl"
-    )
+    url_template_prefix = "http://download.pytorch.org/whl/"
+    url_template = "{0.cuda_version}/torch-{0.torch_version}" \
+                   "-cp{0.python}-cp{0.python}m{0.unicode}-{0.os_name}.whl"

    def __attrs_post_init__(self):
        self.unicode = "u" if self.python.startswith("2") else ""

    def make_url(self):
        # type: () -> Text
-        return self.url_template.format(self)
+        return (self.url_template_prefix + self.url_template).format(self)


 class PytorchResolutionError(FatalSpecsResolutionError):
@@ -168,41 +171,72 @@ class PytorchRequirement(SimpleSubstitution):
    name = "torch"
    packages = ("torch", "torchvision", "torchaudio", "torchcsprng", "torchtext")

+    extra_index_url_template = 'https://download.pytorch.org/whl/cu{}/'
+    nightly_extra_index_url_template = 'https://download.pytorch.org/whl/nightly/cu{}/'
+    torch_index_url_lookup = {}
+
    def __init__(self, *args, **kwargs):
        os_name = kwargs.pop("os_override", None)
        super(PytorchRequirement, self).__init__(*args, **kwargs)
        self.log = self._session.get_logger(__name__)
        self.package_manager = self.config["agent.package_manager.type"].lower()
        self.os = os_name or self.get_platform()
-        self.cuda = "cuda{}".format(self.cuda_version).lower()
-        self.python_version_string = str(self.config["agent.default_python"])
-        self.python_major_minor_str = '.'.join(self.python_version_string.split('.')[:2])
-        if '.' not in self.python_major_minor_str:
-            raise PytorchResolutionError(
-                "invalid python version {!r} defined in configuration file, key 'agent.default_python': "
-                "must have both major and minor parts of the version (for example: '3.7')".format(
-                    self.python_version_string
-                )
-            )
-        self.python = "python{}".format(self.python_major_minor_str)
-
-        self.exceptions = [
-            PytorchResolutionError(message)
-            for message in (
-                None,
-                'cuda version "{}" is not supported'.format(self.cuda),
-                'python version "{}" is not supported'.format(
-                    self.python_version_string
-                ),
-            )
-        ]
-
-        try:
-            self.validate_python_version()
-        except PytorchResolutionError as e:
-            self.log.warn("will not be able to install pytorch wheels: %s", e.args[0])
-
+        self.cuda = None
+        self.python_version_string = None
+        self.python_major_minor_str = None
+        self.python = None
+        self._fix_setuptools = None
+        self.exceptions = []
        self._original_req = []
+        # allow override of the pytorch extra index URL templates
+        if self.config.get("agent.package_manager.extra_index_url_template", None):
+            self.extra_index_url_template = \
+                self.config.get("agent.package_manager.extra_index_url_template", None)
+        if self.config.get("agent.package_manager.nightly_extra_index_url_template", None):
+            self.nightly_extra_index_url_template = \
+                self.config.get("agent.package_manager.nightly_extra_index_url_template", None)
+        # allow override of the pytorch lookup pages and wheel URL templates
+        if self.config.get("agent.package_manager.torch_page", None):
+            SimplePytorchRequirement.page_lookup_template = \
+                self.config.get("agent.package_manager.torch_page", None)
+        if self.config.get("agent.package_manager.torch_nightly_page", None):
+            SimplePytorchRequirement.nightly_page_lookup_template = \
+                self.config.get("agent.package_manager.torch_nightly_page", None)
+        if self.config.get("agent.package_manager.torch_url_template_prefix", None):
+            PytorchWheel.url_template_prefix = \
+                self.config.get("agent.package_manager.torch_url_template_prefix", None)
+        if self.config.get("agent.package_manager.torch_url_template", None):
+            PytorchWheel.url_template = \
+                self.config.get("agent.package_manager.torch_url_template", None)
+
+    def _init_python_ver_cuda_ver(self):
+        if self.cuda is None:
+            self.cuda = "cuda{}".format(self.cuda_version).lower()
+        if self.python_version_string is None:
+            self.python_version_string = str(self.config["agent.default_python"])
+        if self.python_major_minor_str is None:
+            self.python_major_minor_str = '.'.join(self.python_version_string.split('.')[:2])
+            if '.' not in self.python_major_minor_str:
+                raise PytorchResolutionError(
+                    "invalid python version {!r} defined in configuration file, key 'agent.default_python': "
+                    "must have both major and minor parts of the version (for example: '3.7')".format(
+                        self.python_version_string
+                    )
+                )
+        if self.python is None:
+            self.python = "python{}".format(self.python_major_minor_str)
+
+        if not self.exceptions:
+            self.exceptions = [
+                PytorchResolutionError(message)
+                for message in (
+                    None,
+                    'cuda version "{}" is not supported'.format(self.cuda),
+                    'python version "{}" is not supported'.format(
+                        self.python_version_string
+                    ),
+                )
+            ]

    @property
    def is_conda(self):
@@ -216,6 +250,8 @@ class PytorchRequirement(SimpleSubstitution):
        """
        Make sure python version has both major and minor versions as required for choosing pytorch wheel
        """
+        self._init_python_ver_cuda_ver()
+
        if self.is_pip and not self.python_major_minor_str:
            raise PytorchResolutionError(
                "invalid python version {!r} defined in configuration file, key 'agent.default_python': "
@@ -237,10 +273,15 @@ class PytorchRequirement(SimpleSubstitution):
            return "macos"
        raise RuntimeError("unrecognized OS")

+    @staticmethod
+    def get_arch():
+        return str(platform.machine()).lower()
+
    def _get_link_from_torch_page(self, req, torch_url):
        links_parser = LinksHTMLParser()
        links_parser.feed(requests.get(torch_url, timeout=10).text)
        platform_wheel = "win" if self.get_platform() == "windows" else self.get_platform()
+        arch_wheel = self.get_arch()
        py_ver = self.python_major_minor_str.replace('.', '')
        url = None
        last_v = None
@@ -261,7 +302,13 @@ class PytorchRequirement(SimpleSubstitution):
                continue
            if len(parts) < 3 or not parts[2].endswith(py_ver):
                continue
-            if len(parts) < 5 or platform_wheel not in parts[4]:
+            if len(parts) < 5 or platform_wheel not in parts[4].lower():
+                continue
+            if len(parts) < 5 or arch_wheel not in parts[4].lower():
+                continue
+
+            # yes this is for linux python 2.7 support, this is the only python 2.7 we support...
+            if py_ver and py_ver[0] == '2' and len(parts) > 3 and not parts[3].endswith('u'):
                continue
            # update the closest matched version (from above)
            if not closest_v:
@@ -291,18 +338,21 @@ class PytorchRequirement(SimpleSubstitution):

    def get_url_for_platform(self, req):
        # check if package is already installed with system packages
+        self.validate_python_version()
        # noinspection PyBroadException
        try:
            if self.config.get("agent.package_manager.system_site_packages", None):
                from pip._internal.commands.show import search_packages_info
                installed_torch = list(search_packages_info([req.name]))
                # notice the comparison order, the first part will make sure we have a valid installed package
-                if installed_torch and installed_torch[0]['version'] and \
-                        req.compare_version(installed_torch[0]['version']):
+                installed_torch_version = (getattr(installed_torch[0], 'version', None) or installed_torch[0]['version']) \
+                    if installed_torch else None
+                if installed_torch and installed_torch_version and \
+                        req.compare_version(installed_torch_version):
                    print('PyTorch: requested "{}" version {}, using pre-installed version {}'.format(
-                        req.name, req.specs[0] if req.specs else 'unspecified', installed_torch[0]['version']))
+                        req.name, req.specs[0] if req.specs else 'unspecified', installed_torch_version))
                    # package already installed, do nothing
-                    req.specs = [('==', str(installed_torch[0]['version']))]
+                    req.specs = [('==', str(installed_torch_version))]
                    return '{} {} {}'.format(req.name, req.specs[0][0], req.specs[0][1]), True
        except Exception:
            pass
@@ -343,6 +393,11 @@ class PytorchRequirement(SimpleSubstitution):
            else:
                print('Trying PyTorch CUDA version {} support'.format(torch_url_key))

+        # fix broken pytorch setuptools incompatibility
+        if req.name == "torch" and closest_matched_version and \
+                SimpleVersion.compare_versions(closest_matched_version, "<", "1.11.0"):
+            self._fix_setuptools = "setuptools < 59"
+
        if not url:
            url = PytorchWheel(
                torch_version=fix_version(version),
@@ -420,6 +475,44 @@ class PytorchRequirement(SimpleSubstitution):
        return self.match_version(req, base).replace(" ", "\n")

    def replace(self, req):
+        # make sure the python / cuda version strings are initialized and valid
+        self.validate_python_version()
+
+        # try to check if we can just use the new index URL, if we do not we will revert to old method
+        try:
+            extra_index_url = self.get_torch_index_url(self.cuda_version)
+            if extra_index_url:
+                # if the requested torch version cannot be >= 1.11.0, we need to pin setuptools < 59
+                try:
+                    if req.name == "torch" and not compare_version_rules(req.specs, [(">=", "1.11.0")]):
+                        self._fix_setuptools = "setuptools < 59"
+                except Exception:  # noqa
+                    pass
+                # now we just need to add the correct extra index url for the cuda version
+                self.set_add_install_extra_index(extra_index_url[0])
+
+                if req.specs and len(req.specs) == 1 and req.specs[0][0] == "==":
+                    # remove any +cu extension and let pip resolve that
+                    # and add .* if we have 3 parts version to deal with nvidia container 'a' version
+                    # i.e. "1.13.0" -> "1.13.0.*" so it should match preinstalled "1.13.0a0+936e930"
+                    spec_3_parts = req.format_specs(num_parts=3)
+                    spec_max3_parts = req.format_specs(max_num_parts=3)
+                    if spec_3_parts == spec_max3_parts and not spec_max3_parts.endswith("*"):
+                        line = "{} {}.*".format(req.name, spec_max3_parts)
+                    else:
+                        line = "{} {}".format(req.name, spec_max3_parts)
+
+                    if req.marker:
+                        line += " ; {}".format(req.marker)
+                else:
+                    # return the original line
+                    line = req.line
+
+                return line
+
+        except Exception:  # noqa
+            pass
+
        try:
            new_req = self._replace(req)
            if new_req:
@@ -483,7 +576,7 @@ class PytorchRequirement(SimpleSubstitution):
                for i, line in enumerate(lines):
                    if not line or line.lstrip().startswith('#'):
                        continue
-                    parts = [p for p in re.split('\s|=|\.|<|>|~|!|@|#', line) if p]
+                    parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
                    if not parts:
                        continue
                    for req, new_req in self._original_req:
@@ -505,6 +598,61 @@ class PytorchRequirement(SimpleSubstitution):

        return list_of_requirements

+    def post_scan_add_req(self):  # type: () -> Optional[MarkerRequirement]
+        """
+        Allows the RequirementSubstitution to add an extra line/requirements after
+        the initial requirements scan is completed.
+        Called only once per requirements.txt object
+        """
+        if self._fix_setuptools:
+            return MarkerRequirement(Requirement.parse(self._fix_setuptools))
+        return None
+
+    @classmethod
+    def get_torch_index_url(cls, cuda_version, nightly=False):
+        # noinspection PyBroadException
+        try:
+            cuda = int(cuda_version)
+        except Exception:
+            cuda = 0
+
+        if nightly:
+            for c in range(cuda, max(-1, cuda-15), -1):
+                # then try the nightly builds, it might be there...
+                torch_url = cls.nightly_extra_index_url_template.format(c)
+                # noinspection PyBroadException
+                try:
+                    if requests.get(torch_url, timeout=10).ok:
+                        print('Torch nightly CUDA {} index page found'.format(c))
+                        cls.torch_index_url_lookup[c] = torch_url
+                        return cls.torch_index_url_lookup[c], c
+                except Exception:
+                    pass
+            return
+
+        # first check if key is valid
+        if cuda in cls.torch_index_url_lookup:
+            return cls.torch_index_url_lookup[cuda], cuda
+
+        # then try a new cuda version page
+        for c in range(cuda, max(-1, cuda-15), -1):
+            torch_url = cls.extra_index_url_template.format(c)
+            # noinspection PyBroadException
+            try:
+                if requests.get(torch_url, timeout=10).ok:
+                    print('Torch CUDA {} index page found'.format(c))
+                    cls.torch_index_url_lookup[c] = torch_url
+                    return cls.torch_index_url_lookup[c], c
+            except Exception:
+                pass
+
+        keys = sorted(cls.torch_index_url_lookup.keys(), reverse=True)
+        for k in keys:
+            if k <= cuda:
+                return cls.torch_index_url_lookup[k], k
+        # return default - zero
+        return cls.torch_index_url_lookup[0], 0
+
    MAP = {
        "windows": {
            "cuda100": {
--- a/clearml_agent/helper/package/requirements.py
+++ b/clearml_agent/helper/package/requirements.py
@@ -11,11 +11,15 @@ from os import path
 from typing import Text, List, Type, Optional, Tuple, Dict

 from pathlib2 import Path
-from pyhocon import ConfigTree
+from clearml_agent.external.pyhocon import ConfigTree

 import six
+from six.moves.urllib.parse import unquote
+import logging
 from clearml_agent.definitions import PIP_EXTRA_INDICES
-from clearml_agent.helper.base import warning, is_conda, which, join_lines, is_windows_platform
+from clearml_agent.helper.base import (
+    warning, is_conda, which, join_lines, is_windows_platform,
+    convert_cuda_version_to_int_10_base_str, )
 from clearml_agent.helper.process import Argv, PathLike
 from clearml_agent.helper.gpu.gpustat import get_driver_cuda_version
 from clearml_agent.session import Session, normalize_cuda_version
@@ -96,7 +100,8 @@ class MarkerRequirement(object):
            return ','.join(starmap(operator.add, self.specs))

        op, version = self.specs[0]
-        for v in self._sub_versions_pep440:
+        # noinspection PyProtectedMember
+        for v in SimpleVersion._sub_versions_pep440:
            version = version.replace(v, '.')
        if num_parts:
            version = (version.strip('.').split('.') + ['0'] * num_parts)[:max_num_parts]
@@ -153,6 +158,33 @@ class MarkerRequirement(object):
        return SimpleVersion.compare_versions(
            version_a=requested_version, op=op, version_b=version, num_parts=num_parts)

+    def remove_local_file_ref(self):
+        if not self.local_file or self.vcs or self.editable or self.path:
+            return False
+        parts = re.split(r"@\s*{}".format(re.escape(self.req.uri)), self.req.line)
+        # if we did not find anything do nothing
+        if len(parts) < 2:
+            return False
+        self.req.line = ''.join(parts).strip()
+        self.req.uri = None
+        self.req.local_file = False
+        return True
+
+    def validate_local_file_ref(self):
+        # if local file does not exist, remove the reference to it
+        if self.vcs or self.editable or self.path or not self.local_file or not self.name or \
+                not self.uri or not self.uri.startswith("file://"):
+            return
+        local_path = Path(self.uri[len("file://"):])
+        if not local_path.exists():
+            local_path = Path(unquote(self.uri)[len("file://"):])
+            if not local_path.exists():
+                line = self.line
+                if self.remove_local_file_ref():
+                    # print warning
+                    logging.getLogger(__name__).warning(
+                        'Local file not found [{}], references removed'.format(line))
+

 class SimpleVersion:
    _sub_versions_pep440 = ['a', 'b', 'rc', '.post', '.dev', '+', ]
@@ -208,7 +240,11 @@ class SimpleVersion:
        if not version_b:
            return True

+        if not num_parts:
+            num_parts = max(len(version_a.split('.')), len(version_b.split('.')), )
+
        if op == '~=':
+            num_parts = len(version_b.split('.')) - 1
            num_parts = max(num_parts, 2)
            op = '=='
            ignore_sub_versions = True
@@ -243,8 +279,20 @@ class SimpleVersion:
            return version_a_key > version_b_key
        if op == '<':
            return version_a_key < version_b_key
+        if op == '!=':
+            return version_a_key != version_b_key
        raise ValueError('Unrecognized comparison operator [{}]'.format(op))

+    @classmethod
+    def max_version(cls, version_a, version_b):
+        return version_a if cls.compare_versions(
+            version_a=version_a, op='>=', version_b=version_b, num_parts=None) else version_b
+
+    @classmethod
+    def min_version(cls, version_a, version_b):
+        return version_a if cls.compare_versions(
+            version_a=version_a, op='<=', version_b=version_b, num_parts=None) else version_b
+
    @staticmethod
    def _parse_letter_version(
            letter,  # type: str
@@ -313,17 +361,94 @@ class SimpleVersion:
        return ()


+def compare_version_rules(specs_a, specs_b):
+    # specs_a/b are a list of tuples: [('==', '1.2.3'), ] or [('>=', '1.2'), ('<', '1.3')]
+    # section definition:
+    class Section(object):
+        def __init__(self, left="-999999999", left_eq=False, right="999999999", right_eq=False):
+            self.left, self.left_eq, self.right, self.right_eq = left, left_eq, right, right_eq
+    # first create a list of in/out sections for each spec
+    # >, >= are left rule
+    # <, <= are right rule
+    # ~= x.y.z is converted to: >= x.y and < x.y+1
+    # == is converted to: >= and <= (=== is not handled below and is ignored)
+    # != is not handled below and is ignored (it would split a section into: left < x.y.z and right > x.y.z)
+    def create_section(specs):
+        section = Section()
+        for op, v in specs:
+            a = section
+            if op == '>':
+                a.left = v
+                a.left_eq = False
+            elif op == '>=':
+                a.left = v
+                a.left_eq = True
+            elif op == '<':
+                a.right = v
+                a.right_eq = False
+            elif op == '<=':
+                a.right = v
+                a.right_eq = True
+            elif op == '==':
+                a.left = v
+                a.left_eq = True
+                a.right = v
+                a.right_eq = True
+            elif op == '~=':
+                new_v = v.split('.')
+                a_left = '.'.join(new_v[:-1])
+                a.left = a_left if not a.left else SimpleVersion.max_version(a_left, a.left)
+                a.left_eq = True
+                a_right = '.'.join(new_v[:-2] + [str(int(new_v[-2])+1)])
+                a.right = a_right if not a.right else SimpleVersion.min_version(a_right, a.right)
+                a.right_eq = False if a.right == a_right else a.right_eq
+
+        return section
+
+    section_a = create_section(specs_a)
+    section_b = create_section(specs_b)
+    i = Section()
+    # then we have a list of sections for spec A/B
+    if section_a.left == section_b.left:
+        i.left = section_a.left
+        i.left_eq = section_a.left_eq and section_b.left_eq
+    else:
+        i.left = SimpleVersion.max_version(section_a.left, section_b.left)
+        i.left_eq = section_a.left_eq if i.left == section_a.left else section_b.left_eq
+    if section_a.right == section_b.right:
+        i.right = section_a.right
+        i.right_eq = section_a.right_eq and section_b.right_eq
+    else:
+        i.right = SimpleVersion.min_version(section_a.right, section_b.right)
+        i.right_eq = section_a.right_eq if i.right == section_a.right else section_b.right_eq
+
+    # return true if any section from A intersects a section from B
+    valid = True
+    valid &= SimpleVersion.compare_versions(
+        version_a=i.left, op='<=' if i.left_eq else '<', version_b=i.right, num_parts=None)
+    valid &= SimpleVersion.compare_versions(
+        version_a=i.right, op='>=' if i.right_eq else '>', version_b=i.left, num_parts=None)
+
+    return valid
+
+
@six.add_metaclass(ABCMeta)
 class RequirementSubstitution(object):

    _pip_extra_index_url = PIP_EXTRA_INDICES

+    @classmethod
+    def set_add_install_extra_index(cls, extra_index_url):
+        if extra_index_url not in cls._pip_extra_index_url:
+            cls._pip_extra_index_url.append(extra_index_url)
+
    def __init__(self, session):
        # type: (Session) -> ()
        self._session = session
        self.config = session.config  # type: ConfigTree
        self.suffix = '.post{config[agent.cuda_version]}.dev{config[agent.cudnn_version]}'.format(config=self.config)
        self.package_manager = self.config['agent.package_manager.type']
+        self._is_already_installed_cb = None

    @abstractmethod
    def match(self, req):  # type: (MarkerRequirement) -> bool
@@ -339,6 +464,20 @@ class RequirementSubstitution(object):
        """
        pass

+    def set_is_already_installed_cb(self, cb):
+        self._is_already_installed_cb = cb
+
+    def is_already_installed(self, req):
+        if not self._is_already_installed_cb:
+            return False
+        # noinspection PyBroadException
+        try:
+            return self._is_already_installed_cb(req)
+        except Exception as ex:
+            # debug could not resolve something
+            print("Warning: Requirements post install callback exception (check if package installed): {}".format(ex))
+            return False
+
    def post_scan_add_req(self):  # type: () -> Optional[MarkerRequirement]
        """
        Allows the RequirementSubstitution to add an extra line/requirements after
@@ -363,7 +502,7 @@ class RequirementSubstitution(object):

    @property
    def cuda_version(self):
-        return self.config['agent.cuda_version']
+        return convert_cuda_version_to_int_10_base_str(self.config['agent.cuda_version'])

    @property
    def cudnn_version(self):
@@ -449,6 +588,7 @@ class RequirementsManager(object):
                                                 cache_dir=pip_cache_dir.as_posix())
        self._base_interpreter = base_interpreter
        self._cwd = None
+        self._installed_parsed_packages = set()

    def register(self, cls):  # type: (Type[RequirementSubstitution]) -> None
        self.handlers.append(cls(self._session))
@@ -468,20 +608,9 @@ class RequirementsManager(object):
        return None

    def replace(self, requirements):  # type: (Text) -> Text
-        def safe_parse(req_str):
-            # noinspection PyBroadException
-            try:
-                return list(parse(req_str, cwd=self._cwd))
-            except Exception as ex:
-                return [Requirement(req_str)]
+        parsed_requirements = self.parse_requirements_section_to_marker_requirements(
+            requirements=requirements, cwd=self._cwd)

-        parsed_requirements = tuple(
-            map(
-                MarkerRequirement,
-                [r for line in (requirements.splitlines() if isinstance(requirements, six.text_type) else requirements)
-                 for r in safe_parse(line)]
-            )
-        )
        if not parsed_requirements:
            # return the original requirements just in case
            return requirements
@@ -510,14 +639,29 @@ class RequirementsManager(object):

        result = list(result)
        # add post scan add requirements call back
+        double_req_set = None
        for h in self.handlers:
-            req = h.post_scan_add_req()
-            if req:
-                result.append(req.tostr())
+            reqs = h.post_scan_add_req()
+            if reqs:
+                if double_req_set is None:
+                    def safe_parse_name(line):
+                        try:
+                            return Requirement.parse(line).name
+                        except Exception:  # noqa
+                            return None
+                    double_req_set = set([safe_parse_name(r) for r in result if r])
+
+                for r in (reqs if isinstance(reqs, (tuple, list)) else [reqs]):
+                    if r and (not r.name or r.name not in double_req_set):
+                        result.append(r.tostr())
+                    elif r:
+                        print("SKIPPING additional auto installed package: \"{}\"".format(r))

        return join_lines(result)

-    def post_install(self, session):
+    def post_install(self, session, package_manager=None):
+        if package_manager:
+            self.update_installed_packages_state(package_manager.freeze())
        for h in self.handlers:
            try:
                h.post_install(session)
@@ -539,6 +683,34 @@ class RequirementsManager(object):
    def get_interpreter(self):
        return self._base_interpreter

+    def update_installed_packages_state(self, requirements):
+        """
+        Updates internal Installed Packages objects, so that later we can detect
+        if we already have a pre-installed package
+        :param requirements: the output of a freeze() call, i.e. dict {'pip': ['package==version', ]}
+        """
+        requirements = requirements if not isinstance(requirements, dict) else requirements.get("pip")
+        self._installed_parsed_packages = self.parse_requirements_section_to_marker_requirements(
+                requirements=requirements, cwd=self._cwd)
+        for h in self.handlers:
+            h.set_is_already_installed_cb(self._callback_is_already_installed)
+
+    def _callback_is_already_installed(self, req):
+        for p in (self._installed_parsed_packages or []):
+            if p.name != req.name:
+                continue
+            # if this is a version control package, only return true if both installed and requested specify a commit ID
+            if req.vcs:
+                return p.vcs and req.revision and req.revision == p.revision
+
+            if not req.specs and not p.specs:
+                return True
+
+            # return if this is the same version
+            return req.specs and p.specs and req.compare_version(p, op="==")
+
+        return False
+
    @staticmethod
    def get_cuda_version(config):  # type: (ConfigTree) -> (Text, Text)
        # we assume os.environ already updated the config['agent.cuda_version'] & config['agent.cudnn_version']
@@ -614,3 +786,29 @@ class RequirementsManager(object):

        return (normalize_cuda_version(cuda_version or 0),
                normalize_cuda_version(cudnn_version or 0))
+
+    @staticmethod
+    def parse_requirements_section_to_marker_requirements(requirements, cwd=None):
+        def safe_parse(req_str):
+            # noinspection PyBroadException
+            try:
+                return list(parse(req_str, cwd=cwd))
+            except Exception as ex:
+                return [Requirement(req_str)]
+
+        def create_req(x):
+            r = MarkerRequirement(x)
+            r.validate_local_file_ref()
+            return r
+
+        if not requirements:
+            return tuple()
+
+        parsed_requirements = tuple(
+            map(
+                create_req,
+                [r for line in (requirements.splitlines() if isinstance(requirements, str) else requirements)
+                 for r in safe_parse(line)]
+            )
+        )
+        return parsed_requirements
--- a/clearml_agent/helper/package/translator.py
+++ b/clearml_agent/helper/package/translator.py
@@ -1,3 +1,4 @@
+from tempfile import mkdtemp
 from typing import Text

 from furl import furl
@@ -20,7 +21,16 @@ class RequirementsTranslator(object):
        config = session.config
        self.cache_dir = cache_dir or Path(config["agent.pip_download_cache.path"]).expanduser().as_posix()
        self.enabled = config["agent.pip_download_cache.enabled"]
-        Path(self.cache_dir).mkdir(parents=True, exist_ok=True)
+        # noinspection PyBroadException
+        try:
+            Path(self.cache_dir).mkdir(parents=True, exist_ok=True)
+        except Exception:
+            temp_cache_folder = mkdtemp(prefix='pip_download_cache.')
+            print("Failed creating pip download cache folder at `{}` reverting to `{}`".format(
+                self.cache_dir, temp_cache_folder))
+            self.cache_dir = temp_cache_folder
+            Path(self.cache_dir).mkdir(parents=True, exist_ok=True)
+
        self.config = Config()
        self.pip = SystemPip(interpreter=interpreter, session=self._session)
        self._translate_back = {}
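
The cache-folder fallback in the hunk above can be sketched in isolation. This is a minimal illustration, not the agent's code: `ensure_cache_dir` is a hypothetical helper name, and it uses the standard-library `pathlib` rather than `pathlib2`.

```python
from pathlib import Path
from tempfile import mkdtemp


def ensure_cache_dir(cache_dir):
    """Create cache_dir; if that fails, fall back to a fresh temporary folder."""
    try:
        Path(cache_dir).mkdir(parents=True, exist_ok=True)
        return str(cache_dir)
    except Exception:
        # mkdtemp() creates the directory and returns its absolute path
        fallback = mkdtemp(prefix="pip_download_cache.")
        print("Failed creating pip download cache folder at `{}` reverting to `{}`".format(
            cache_dir, fallback))
        return fallback
```

Because `mkdtemp` already creates the folder, the caller always receives a path that exists, whichever branch ran.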
--- a/clearml_agent/helper/process.py
+++ b/clearml_agent/helper/process.py
@@ -16,7 +16,6 @@ from typing import Union, Text, Sequence, Any, TypeVar, Callable

 import psutil
 from furl import furl
-from future.builtins import super
 from pathlib2 import Path

 import six
@@ -26,7 +25,7 @@ from clearml_agent.helper.base import bash_c, is_windows_platform, select_for_pl
 PathLike = Union[Text, Path]


-def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False):
+def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False, raise_error=False):
    try:
        output = (
            subprocess.check_output(
@@ -38,10 +37,16 @@ def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False):
            .strip()
        )
    except subprocess.CalledProcessError:
+        if raise_error:
+            raise
        output = None
    return output if not strip or not output else output.strip()


+def stringify_bash_output(value):
+    return '' if not value else (value if isinstance(value, str) else value.decode('utf-8'))
+
+
 def terminate_process(pid, timeout=10., ignore_zombie=True, include_children=False):
    # noinspection PyBroadException
    try:
@@ -112,10 +117,11 @@ def terminate_all_child_processes(pid=None, timeout=10., include_parent=True):


 def get_docker_id(docker_cmd_contains):
+    # noinspection PyBroadException
    try:
        containers_running = get_bash_output(cmd='docker ps --no-trunc --format \"{{.ID}}: {{.Command}}\"')
        for docker_line in containers_running.split('\n'):
-            parts = docker_line.split(':')
+            parts = docker_line.split(':', 1)
            if docker_cmd_contains in parts[-1]:
                # we found our docker, return it
                return parts[0]
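
The `split(':', 1)` change in `get_docker_id` matters when the container command itself contains a colon. A minimal sketch, assuming a made-up `docker ps` line (the container ID, the command, and the `find_id` helper are hypothetical):

```python
# Hypothetical output line in the "{{.ID}}: {{.Command}}" format used above.
docker_line = 'abc123: "python -m app --url http://host:8080"'

# An unbounded split cuts the command at every colon.
parts_all = docker_line.split(':')
# maxsplit=1 splits only at the first colon, so the command stays in one piece.
parts_first = docker_line.split(':', 1)


def find_id(parts, needle):
    # Mirrors the check in the diff: search the last part, return the first.
    return parts[0] if needle in parts[-1] else None
```

With the unbounded split the last part is only the tail of the command, so a search for text near its start misses; with `maxsplit=1` the whole command is searched.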
--- a/clearml_agent/helper/repo.py
+++ b/clearml_agent/helper/repo.py
@@ -1,7 +1,11 @@
 import abc
+import os
 import re
 import shutil
+import stat
 import subprocess
+import sys
+import tempfile
 from distutils.spawn import find_executable
 from hashlib import md5
 from os import environ
@@ -23,7 +27,7 @@ from clearml_agent.helper.base import (
    rm_tree,
    ExecutionInfo,
    normalize_path,
-    create_file_if_not_exists,
+    create_file_if_not_exists, safe_remove_file,
 )
 from clearml_agent.helper.os.locks import FileLock
 from clearml_agent.helper.process import DEVNULL, Argv, PathLike, COMMAND_SUCCESS
@@ -108,7 +112,7 @@ class VCS(object):
        )
        self.url = url
        self.location = Text(location)
-        self.revision = revision
+        self._revision = revision
        self.log = self.session.get_logger(__name__)

    @property
@@ -118,6 +122,13 @@ class VCS(object):
        """
        return self.add_auth(self.session.config, self.url)

+    @property
+    def url_without_auth(self):
+        """
+        Return URL without configured user/password
+        """
+        return self.add_auth(self.session.config, self.url, reset_auth=True)
+
    @abc.abstractmethod
    def executable_name(self):
        """
@@ -309,6 +320,7 @@ class VCS(object):
                        self.url, new_url))
                    self.url = new_url
                return
+
            # rewrite ssh URLs only if either ssh port or ssh user are forced in config
            if parsed_url.scheme == "ssh" and (
                self.session.config.get('agent.force_git_ssh_port', None) or
@@ -323,6 +335,9 @@ class VCS(object):
                    print("Using SSH credentials - ssh url '{}' with ssh url '{}'".format(
                        self.url, new_url))
                    self.url = new_url
+                return
+            elif parsed_url.scheme == "ssh":
+                return

        if not self.session.config.agent.translate_ssh:
            return
@@ -332,7 +347,7 @@ class VCS(object):
                (ENV_AGENT_GIT_PASS.get() or self.session.config.get('agent.git_pass', None)):
            # only apply to a specific domain (if requested)
            config_domain = \
-                ENV_AGENT_GIT_HOST.get() or self.session.config.get("git_host", None)
+                ENV_AGENT_GIT_HOST.get() or self.session.config.get("agent.git_host", None)
            if config_domain and config_domain != furl(self.url).host:
                return

@@ -349,7 +364,9 @@ class VCS(object):
        If not in debug mode, filter VCS password from output.
        """
        self._set_ssh_url()
-        clone_command = ("clone", self.url_with_auth, self.location) + self.clone_flags
+        # if we are on linux no need for the full auth url because we use GIT_ASKPASS
+        url = self.url_without_auth if self._use_ask_pass else self.url_with_auth
+        clone_command = ("clone", url, self.location) + self.clone_flags
        # clone all branches regardless of when we want to later checkout
        # if branch:
        #    clone_command += ("-b", branch)
@@ -357,40 +374,41 @@ class VCS(object):
            self.call(*clone_command)
            return

-        def normalize_output(result):
-            """
-            Returns result string without user's password.
-            NOTE: ``self.get_stderr``'s result might or might not have the same type as ``e.output`` in case of error.
-            """
-            string_type = (
-                ensure_text
-                if isinstance(result, six.text_type)
-                else ensure_binary
-            )
-            return result.replace(
-                string_type(self.url),
-                string_type(furl(self.url).remove(password=True).tostr()),
-            )
-
-        def print_output(output):
-            print(ensure_text(output))
-
        try:
-            print_output(normalize_output(self.get_stderr(*clone_command)))
+            self._print_output(self._normalize_output(self.get_stderr(*clone_command)))
        except subprocess.CalledProcessError as e:
            # In Python 3, subprocess.CalledProcessError has a `stderr` attribute,
            # but since stderr is redirect to `subprocess.PIPE` it will appear in the usual `output` attribute
            if e.output:
-                e.output = normalize_output(e.output)
-                print_output(e.output)
+                e.output = self._normalize_output(e.output)
+                self._print_output(e.output)
            raise

+    def _normalize_output(self, result):
+        """
+        Returns result string without user's password.
+        NOTE: ``self.get_stderr``'s result might or might not have the same type as ``e.output`` in case of error.
+        """
+        string_type = (
+            ensure_text
+            if isinstance(result, six.text_type)
+            else ensure_binary
+        )
+        return result.replace(
+            string_type(self.url),
+            string_type(furl(self.url).remove(password=True).tostr()),
+        )
+
+    @staticmethod
+    def _print_output(output):
+        print(ensure_text(output))
+
    def checkout(self):
        # type: () -> None
        """
        Checkout repository at specified revision
        """
-        self.call("checkout", self.revision, *self.checkout_flags, cwd=self.location)
+        self.call("checkout", self._revision, *self.checkout_flags, cwd=self.location)

    @abc.abstractmethod
    def pull(self):
@@ -473,16 +491,18 @@ class VCS(object):
        return Argv(self.executable_name, *argv)

    @classmethod
-    def add_auth(cls, config, url):
+    def add_auth(cls, config, url, reset_auth=False):
        """
        Add username and password to URL if missing from URL and present in config.
        Does not modify ssh URLs.
+
+        :param reset_auth: If True, remove the user/pass from the URL (default False)
        """
        try:
            parsed_url = furl(url)
        except ValueError:
            return url
-        if parsed_url.scheme in ["", "ssh"] or parsed_url.scheme.startswith("git"):
+        if parsed_url.scheme in ["", "ssh"] or (parsed_url.scheme or '').startswith("git"):
            return parsed_url.url
        config_user = ENV_AGENT_GIT_USER.get() or config.get("agent.{}_user".format(cls.executable_name), None)
        config_pass = ENV_AGENT_GIT_PASS.get() or config.get("agent.{}_pass".format(cls.executable_name), None)
@@ -493,7 +513,10 @@ class VCS(object):
            and config_pass
            and (not config_domain or config_domain.lower() == parsed_url.host)
        ):
-            parsed_url.set(username=config_user, password=config_pass)
+            if reset_auth:
+                parsed_url.set(username=None, password=None)
+            else:
+                parsed_url.set(username=config_user, password=config_pass)
        return parsed_url.url

    @abc.abstractmethod
@@ -519,7 +542,7 @@ class VCS(object):

 class Git(VCS):
    executable_name = "git"
-    main_branch = "master"
+    main_branch = ("master", "main")
    clone_flags = ("--quiet", "--recursive")
    checkout_flags = ("--force",)
    COMMAND_ENV = {
@@ -529,9 +552,22 @@ class Git(VCS):
        "GIT_SSH_COMMAND": "ssh -oBatchMode=yes",
    }

+    def __init__(self, *args, **kwargs):
+        super(Git, self).__init__(*args, **kwargs)
+
+        self._use_ask_pass = False if not self.session.config.get('agent.enable_git_ask_pass', None) \
+            else sys.platform == "linux"
+
+        try:
+            self.call("config", "--global", "--replace-all", "safe.directory", "*", cwd=self.location)
+        except:  # noqa
+            pass
+
    @staticmethod
    def remote_branch_name(branch):
-        return "origin/{}".format(branch)
+        return [
+            "origin/{}".format(b) for b in ([branch] if isinstance(branch, str) else branch)
+        ]

    def executable_not_found_error_help(self):
        return 'Cannot find "{}" executable. {}'.format(
@@ -549,11 +585,79 @@ class Git(VCS):
    def pull(self):
        self.call("fetch", "--all", "--recurse-submodules", cwd=self.location)

+    def _git_pass_auth_wrapper(self, func, *args, **kwargs):
+        try:
+            url_with_auth = furl(self.url_with_auth)
+            password = url_with_auth.password if url_with_auth else None
+            username = url_with_auth.username if url_with_auth else None
+        except:  # noqa
+            password = None
+            username = None
+
+        # if this is not linux or we do not have a password, just run as is
+        if not self._use_ask_pass or not password or not username:
+            return func(*args, **kwargs)
+
+        # create the password file
+        fp, pass_file = tempfile.mkstemp(prefix='clearml_git_', suffix='.sh')
+        os.close(fp)
+        with open(pass_file, 'wt') as f:
+            # get first letter only (username / password are the argument options)
+            # then echo the correct information
+            f.writelines([
+                '#!/bin/bash\n',
+                'c="$1"\n',
+                'c="${c%"${c#?}"}"\n',
+                'if [ "$c" == "u" ] || [ "$c" == "U" ]; then echo "{}"; else echo "{}"; fi\n'.format(
+                    username.replace('"', '\\"'), password.replace('"', '\\"')
+                )
+            ])
+        # mark executable
+        st = os.stat(pass_file)
+        os.chmod(pass_file, st.st_mode | stat.S_IEXEC)
+        # let GIT use it
+        self.COMMAND_ENV["GIT_ASKPASS"] = pass_file
+        # call git command
+        try:
+            ret = func(*args, **kwargs)
+        finally:
+            # delete temp password file
+            self.COMMAND_ENV.pop("GIT_ASKPASS", None)
+            safe_remove_file(pass_file)
+
+        return ret
+
+    def get_stderr(self, *argv, **kwargs):
+        """
+        Wrapper with git password authentication
+        """
+        return self._git_pass_auth_wrapper(super(Git, self).get_stderr, *argv, **kwargs)
+
+    def call_with_stdin(self, *argv, **kwargs):
+        """
+        Wrapper with git password authentication
+        """
+        return self._git_pass_auth_wrapper(super(Git, self).call_with_stdin, *argv, **kwargs)
+
+    def call(self, *argv, **kwargs):
+        """
+        Wrapper with git password authentication
+        """
+        return self._git_pass_auth_wrapper(super(Git, self).call, *argv, **kwargs)
+
    def checkout(self):  # type: () -> None
        """
        Checkout repository at specified revision
        """
-        self.call("checkout", self.revision, *self.checkout_flags, cwd=self.location)
+        revisions = [self._revision] if isinstance(self._revision, str) else self._revision
+        for i, revision in enumerate(revisions):
+            try:
+                self.call("checkout", revision, *self.checkout_flags, cwd=self.location)
+                break
+            except subprocess.CalledProcessError:
+                if i == len(revisions) - 1:
+                    raise
+
        try:
            self.call("submodule", "update", "--recursive", cwd=self.location)
        except:  # noqa
@@ -593,7 +697,7 @@ class Hg(VCS):
            "pull",
            self.url_with_auth,
            cwd=self.location,
-            *(("-r", self.revision) if self.revision else ())
+            *(("-r", self._revision) if self._revision else ())
        )

    info_commands = dict(
@@ -663,7 +767,9 @@ def clone_repository_cached(session, execution, destination):

            vcs.pull()
            rm_tree(destination)
-            shutil.copytree(Text(cached_repo_path), Text(clone_folder))
+            shutil.copytree(Text(cached_repo_path), Text(clone_folder),
+                            symlinks=select_for_platform(linux=True, windows=False),
+                            ignore_dangling_symlinks=True)
            if not clone_folder.is_dir():
                raise CommandFailedError(
                    "copying of repository failed: from {} to {}".format(
@@ -671,9 +777,9 @@ def clone_repository_cached(session, execution, destination):
                    )
                )

-    # checkout in the newly copy destination
-    vcs.location = Text(clone_folder)
-    vcs.checkout()
+            # checkout in the newly copied destination
+            vcs.location = Text(clone_folder)
+            vcs.checkout()

    repo_info = vcs.get_repository_copy_info(clone_folder)
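
The new `Git.checkout` tries each candidate revision in order (for example `("master", "main")`) and re-raises only if the last one fails. A minimal sketch of that pattern, where `run_first_successful` and `fake_checkout` are hypothetical names standing in for the loop and for the real `git checkout` call:

```python
import subprocess


def run_first_successful(candidates, action):
    """Run action on each candidate in order; return the first that succeeds."""
    for i, candidate in enumerate(candidates):
        try:
            action(candidate)
            return candidate
        except subprocess.CalledProcessError:
            # Only the failure of the last candidate is propagated.
            if i == len(candidates) - 1:
                raise


def fake_checkout(revision):
    # Hypothetical stand-in for `git checkout`: only "main" exists here.
    if revision != "main":
        raise subprocess.CalledProcessError(1, ["git", "checkout", revision])
```

Earlier failures are swallowed silently, so the error surfaced to the caller always refers to the final candidate.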

--- a/clearml_agent/helper/resource_monitor.py
+++ b/clearml_agent/helper/resource_monitor.py
@@ -82,7 +82,7 @@ class ResourceMonitor(object):
        if not worker_tags and ENV_WORKER_TAGS.get():
            worker_tags = shlex.split(ENV_WORKER_TAGS.get())
        self._worker_tags = worker_tags
-        if os.environ.get('NVIDIA_VISIBLE_DEVICES') == 'none':
+        if Session.get_nvidia_visible_env() == 'none':
            # NVIDIA_VISIBLE_DEVICES set to none, marks cpu_only flag
            # active_gpus == False means no GPU reporting
            self._active_gpus = False
@@ -92,10 +92,10 @@ class ResourceMonitor(object):
            # None means no filtering, report all gpus
            self._active_gpus = None
            try:
-                active_gpus = os.environ.get('NVIDIA_VISIBLE_DEVICES', '') or \
-                              os.environ.get('CUDA_VISIBLE_DEVICES', '')
-                if active_gpus:
-                    self._active_gpus = [int(g.strip()) for g in active_gpus.split(',')]
+                active_gpus = Session.get_nvidia_visible_env()
+                # None means no filtering, report all gpus
+                if active_gpus and active_gpus != "all":
+                    self._active_gpus = [g.strip() for g in str(active_gpus).split(',')]
            except Exception:
                pass

@@ -139,42 +139,45 @@ class ResourceMonitor(object):
    def _daemon(self):
        seconds_since_started = 0
        reported = 0
-        while True:
-            last_report = time()
-            current_report_frequency = (
-                self._report_frequency if reported != 0 else self._first_report_sec
-            )
-            while (time() - last_report) < current_report_frequency:
-                # wait for self._sample_frequency seconds, if event set quit
-                if self._exit_event.wait(1 / self._sample_frequency):
-                    return
-                # noinspection PyBroadException
-                try:
-                    self._update_readouts()
-                except Exception as ex:
-                    log.warning("failed getting machine stats: %s", report_error(ex))
-                    self._failure()
+        try:
+            while True:
+                last_report = time()
+                current_report_frequency = (
+                    self._report_frequency if reported != 0 else self._first_report_sec
+                )
+                while (time() - last_report) < current_report_frequency:
+                    # wait for self._sample_frequency seconds, if event set quit
+                    if self._exit_event.wait(1 / self._sample_frequency):
+                        return
+                    # noinspection PyBroadException
+                    try:
+                        self._update_readouts()
+                    except Exception as ex:
+                        log.warning("failed getting machine stats: %s", report_error(ex))
+                        self._failure()

-            seconds_since_started += int(round(time() - last_report))
-            # check if we do not report any metric (so it means the last iteration will not be changed)
+                seconds_since_started += int(round(time() - last_report))
+                # check if we do not report any metric (so it means the last iteration will not be changed)

-            # if we do not have last_iteration, we just use seconds as iteration
+                # if we do not have last_iteration, we just use seconds as iteration

-            # start reporting only when we figured out, if this is seconds based, or iterations based
-            average_readouts = self._get_average_readouts()
-            stats = {
-                # 3 points after the dot
-                key: round(value, 3) if isinstance(value, float) else [round(v, 3) for v in value]
-                for key, value in average_readouts.items()
-            }
+                # start reporting only once we have figured out whether this is seconds-based or iterations-based
+                average_readouts = self._get_average_readouts()
+                stats = {
+                    # 3 points after the dot
+                    key: round(value, 3) if isinstance(value, float) else [round(v, 3) for v in value]
+                    for key, value in average_readouts.items()
+                }

-            # send actual report
-            if self.send_report(stats):
-                # clear readouts if this is update was sent
-                self._clear_readouts()
+                # send actual report
+                if self.send_report(stats):
+                    # clear readouts if this update was sent
+                    self._clear_readouts()

-            # count reported iterations
-            reported += 1
+                # count reported iterations
+                reported += 1
+        except Exception as ex:
+            log.exception("Error reporting monitoring info: %s", str(ex))

    def _update_readouts(self):
        readouts = self._machine_stats()
@@ -263,7 +266,7 @@ class ResourceMonitor(object):
                gpu_stat = self._gpustat.new_query()
                for i, g in enumerate(gpu_stat.gpus):
                    # only monitor the active gpu's, if none were selected, monitor everything
-                    if self._active_gpus and i not in self._active_gpus:
+                    if self._active_gpus and str(i) not in self._active_gpus:
                        continue
                    stats["gpu_temperature_{:d}".format(i)] = g["temperature.gpu"]
                    stats["gpu_utilization_{:d}".format(i)] = g["utilization.gpu"]
--- a/clearml_agent/interface/worker.py
+++ b/clearml_agent/interface/worker.py
@@ -22,7 +22,7 @@ WORKER_ARGS = {
        'help': 'git username for repository access',
    },
    '--git-pass': {
-        'help': 'git password for repository access',
+        'help': 'git password (personal access tokens) for repository access',
    },
    '--log-level': {
        'help': 'SDK log level',
@@ -99,12 +99,14 @@ DAEMON_ARGS = dict({
        'aliases': ['-d'],
    },
    '--stop': {
-        'help': 'Stop the running agent (based on the same set of arguments)',
-        'action': 'store_true',
+        'help': 'Stop the running agent (based on the same set of arguments). '
+                'Optional: provide a list of specific local worker IDs to stop',
+        'nargs': '*',
+        'default': False,
    },
    '--dynamic-gpus': {
        'help': 'Allow to dynamically allocate gpus based on queue properties, '
-                'configure with \'--queues <queue_name>=<num_gpus>\'.'
+                'configure with \'--queue <queue_name>=<num_gpus>\'.'
                ' Example: \'--dynamic-gpus --gpus 0-3 --queue dual_gpus=2 single_gpu=1\''
                ' Example Opportunistic: \'--dynamic-gpus --gpus 0-3 --queue dual_gpus=2 max_quad_gpus=1-4 \'',
        'action': 'store_true',
@@ -165,7 +167,7 @@ COMMANDS = {
            },
            '--docker': {
                'help': 'Run execution task inside a docker (v19.03 and above). Optional args <image> <arguments> or '
-                        'specify default docker image in agent.default_docker.image / agent.default_docker.arguments'
+                        'specify default docker image in agent.default_docker.image / agent.default_docker.arguments '
                        'use --gpus/--cpu-only (or set NVIDIA_VISIBLE_DEVICES) to limit gpu visibility for docker',
                'nargs': '*',
                'default': False,
@@ -199,11 +201,18 @@ COMMANDS = {
            },
            '--docker': {
                'help': 'Build the experiment inside a docker (v19.03 and above). Optional args <image> <arguments> or '
-                'specify default docker image in agent.default_docker.image / agent.default_docker.arguments'
+                'specify default docker image in agent.default_docker.image / agent.default_docker.arguments '
                'use --gpus/--cpu-only (or set NVIDIA_VISIBLE_DEVICES) to limit gpu visibility for docker',
                'nargs': '*',
                'default': False,
            },
+            '--force-docker': {
+                'help': 'Force using the agent-specified docker image (either explicitly in the --docker argument or '
+                        'using the agent\'s default docker image). If provided, the agent will not use any docker '
+                        'container information stored on the task itself (default False)',
+                'default': False,
+                'action': 'store_true',
+            },
            '--python-version': {
                'help': 'Virtual environment python version to use',
            },
--- a/clearml_agent/session.py
+++ b/clearml_agent/session.py
@@ -10,8 +10,8 @@ from typing import Any, Callable

 import attr
 from pathlib2 import Path
-from pyhocon import ConfigFactory, HOCONConverter, ConfigTree

+from clearml_agent.external.pyhocon import ConfigFactory, HOCONConverter, ConfigTree
 from clearml_agent.backend_api.session import Session as _Session, Request
 from clearml_agent.backend_api.session.client import APIClient
 from clearml_agent.backend_config.defs import LOCAL_CONFIG_FILE_OVERRIDE_VAR, LOCAL_CONFIG_FILES
@@ -19,6 +19,7 @@ from clearml_agent.definitions import ENVIRONMENT_CONFIG, ENV_TASK_EXECUTE_AS_US
 from clearml_agent.errors import APIError
 from clearml_agent.helper.base import HOCONEncoder
 from clearml_agent.helper.process import Argv
+from clearml_agent.helper.docker_args import DockerArgsSanitizer
 from .version import __version__

 POETRY = "poetry"
@@ -76,7 +77,7 @@ class Session(_Session):

        cpu_only = kwargs.get('cpu_only')
        if cpu_only:
-            os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = 'none'
+            Session.set_nvidia_visible_env('none')

        if kwargs.get('gpus') and not os.environ.get('KUBERNETES_SERVICE_HOST') \
                and not os.environ.get('KUBERNETES_PORT'):
@@ -85,7 +86,7 @@ class Session(_Session):
                os.environ.pop('CUDA_VISIBLE_DEVICES', None)
                os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
            else:
-                os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
+                Session.set_nvidia_visible_env(kwargs.get('gpus'))

        if kwargs.get('only_load_config'):
            from clearml_agent.backend_api.config import load
@@ -105,7 +106,7 @@ class Session(_Session):
                if os.path.exists(os.path.expanduser(os.path.expandvars(f))):
                    self._config_file = f
                    break
-        self.api_client = APIClient(session=self, api_version="2.5")
+        self._api_client = None
        # HACK make sure we have python version to execute,
        # if nothing was specific, use the one that runs us
        def_python = ConfigValue(self.config, "agent.default_python")
@@ -132,7 +133,7 @@ class Session(_Session):
        # override with environment variables
        # cuda_version & cudnn_version are overridden with os.environ here, and normalized in the next section
        for config_key, env_config in ENVIRONMENT_CONFIG.items():
-            # check if the propery is of a list:
+            # check if the property is of a list:
            if config_key.endswith('.0'):
                if all(not i.get() for i in env_config.values()):
                    continue
@@ -166,6 +167,16 @@ class Session(_Session):
        if not kwargs.get('only_load_config'):
            self.create_cache_folders()

+    @property
+    def api_client(self):
+        if self._api_client is None:
+            self._api_client = APIClient(session=self, api_version="2.5")
+        return self._api_client
+
+    @api_client.setter
+    def api_client(self, value):
+        self._api_client = value
+
    @staticmethod
    def get_logger(name):
        logger = logging.getLogger(name)
@@ -229,26 +240,38 @@ class Session(_Session):
            except:
                pass

-    def print_configuration(self, remove_secret_keys=("secret", "pass", "token", "account_key")):
+    def print_configuration(
+            self,
+            remove_secret_keys=("secret", "pass", "token", "account_key", "contents"),
+            skip_value_keys=("environment", ),
+            docker_args_sanitize_keys=("extra_docker_arguments", ),
+    ):
        # remove all the secrets from the print
-        def recursive_remove_secrets(dictionary, secret_keys=()):
+        def recursive_remove_secrets(dictionary, secret_keys=(), empty_keys=()):
            for k in list(dictionary):
                for s in secret_keys:
                    if s in k:
                        dictionary.pop(k)
                        break
+                for s in empty_keys:
+                    if s == k:
+                        dictionary[k] = {key: '****' for key in dictionary[k]} \
+                            if isinstance(dictionary[k], dict) else '****'
+                        break
                if isinstance(dictionary.get(k, None), dict):
-                    recursive_remove_secrets(dictionary[k], secret_keys=secret_keys)
+                    recursive_remove_secrets(dictionary[k], secret_keys=secret_keys, empty_keys=empty_keys)
                elif isinstance(dictionary.get(k, None), (list, tuple)):
+                    if k in (docker_args_sanitize_keys or []):
+                        dictionary[k] = DockerArgsSanitizer.sanitize_docker_command(self, dictionary[k])
                    for item in dictionary[k]:
                        if isinstance(item, dict):
-                            recursive_remove_secrets(item, secret_keys=secret_keys)
+                            recursive_remove_secrets(item, secret_keys=secret_keys, empty_keys=empty_keys)

        config = deepcopy(self.config.to_dict())
        # remove the env variable, it's not important
        config.pop('env', None)
-        if remove_secret_keys:
-            recursive_remove_secrets(config, secret_keys=remove_secret_keys)
+        if remove_secret_keys or skip_value_keys or docker_args_sanitize_keys:
+            recursive_remove_secrets(config, secret_keys=remove_secret_keys, empty_keys=skip_value_keys)
        # remove logging.loggers.urllib3.level from the print
        try:
            config['logging']['loggers']['urllib3'].pop('level', None)
@@ -279,7 +302,7 @@ class Session(_Session):
    def get(self, service, action, version=None, headers=None,
            data=None, json=None, async_enable=False, **kwargs):
        return self._manual_request(service=service, action=action,
-                                    version=version, method="get", headers=headers,
+                                    version=version, method=Request.def_method, headers=headers,
                                    data=data, async_enable=async_enable,
                                    json=json or kwargs)

@@ -290,7 +313,7 @@ class Session(_Session):
                                    data=data, async_enable=async_enable,
                                    json=json or kwargs)

-    def _manual_request(self, service, action, version=None, method="get", headers=None,
+    def _manual_request(self, service, action, version=None, method=Request.def_method, headers=None,
            data=None, json=None, async_enable=False, **kwargs):

        res = self.send_request(service=service, action=action,
@@ -318,6 +341,23 @@ class Session(_Session):
    def command(self, *args):
        return Argv(*args, log=self.get_logger(Argv.__module__))

+    @staticmethod
+    def set_nvidia_visible_env(gpus):
+        if not gpus:
+            gpus = ""
+        visible_env = gpus.replace(".", ":") if isinstance(gpus, str) else \
+            ','.join(str(g).replace(".", ":") for g in gpus)
+
+        os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = visible_env
+
+    @staticmethod
+    def get_nvidia_visible_env():
+        visible_env = os.environ.get('NVIDIA_VISIBLE_DEVICES') or os.environ.get('CUDA_VISIBLE_DEVICES')
+        if visible_env is None:
+            return None
+        visible_env = str(visible_env).replace(":", ".")
+        return visible_env
+

@attr.s
 class TrainsAgentLogger(object):
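Note on the hunk above: the two static helpers added to `Session` translate between the dotted GPU notation and the colon-separated form written to `CUDA_VISIBLE_DEVICES` / `NVIDIA_VISIBLE_DEVICES`. Below is a minimal standalone sketch of that string handling. The names `to_visible_env` and `from_visible_env` are hypothetical, and the functions work on plain values instead of `os.environ`, so this illustrates the conversion only and is not the agent's code.

```python
from typing import Iterable, Optional, Union


def to_visible_env(gpus: Union[None, str, Iterable]) -> str:
    """Build the value for the *_VISIBLE_DEVICES variables (hypothetical helper)."""
    if not gpus:
        # None, "" and empty lists all map to an empty device list
        return ""
    if isinstance(gpus, str):
        # a string is passed through, with "." replaced by ":"
        return gpus.replace(".", ":")
    # any other iterable is joined with commas, one entry per device
    return ",".join(str(g).replace(".", ":") for g in gpus)


def from_visible_env(visible_env: Optional[str]) -> Optional[str]:
    """Reverse the conversion: ":" back to "." (hypothetical helper)."""
    if visible_env is None:
        return None
    return str(visible_env).replace(":", ".")
```

The round trip is lossless as long as the input itself contains no ":" characters.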
--- a/clearml_agent/version.py
+++ b/clearml_agent/version.py
@@ -1 +1 @@
-__version__ = '1.1.0'
+__version__ = '1.5.2'
--- a/docker/k8s-glue/build-resources/clearml.conf
+++ b/docker/k8s-glue/build-resources/clearml.conf
@@ -57,8 +57,8 @@ agent {
        # supported options: pip, conda, poetry
        type: pip,

-        # specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
-        pip_version: "<20.2",
+        # specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
+        pip_version: "<21",

        # virtual environment inheres packages from system
        system_site_packages: false,
@@ -171,7 +171,7 @@ agent {

    default_docker: {
        # default docker image to use when running in docker mode
-        image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
+        image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"

        # optional arguments to pass to docker image
        # arguments: ["--ipc=host", ]
--- a/docker/k8s-glue/glue-build/Dockerfile.alpine
+++ b/docker/k8s-glue/glue-build/Dockerfile.alpine
@@ -0,0 +1,75 @@
+ARG TAG=3.7.12-alpine3.15
+
+FROM python:${TAG} as build
+
+RUN apk add --no-cache \
+    gcc \
+    musl-dev \
+    libffi-dev
+
+RUN python3 \
+    -m pip \
+    install \
+    --prefix=/install \
+    --no-cache-dir \
+    -U \
+    clearml-agent \
+    "cryptography>=2.9"
+
+FROM python:${TAG} as target
+
+WORKDIR /app
+
+ARG KUBECTL_VERSION=1.22.4
+
+# Not sure about these ENV vars
+# ENV LC_ALL=en_US.UTF-8
+# ENV LANG=en_US.UTF-8
+# ENV LANGUAGE=en_US.UTF-8
+# ENV PYTHONIOENCODING=UTF-8
+
+COPY --from=build /install /usr/local
+
+ADD https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl /usr/bin/
+
+RUN chmod +x /usr/bin/kubectl
+
+RUN apk add --no-cache \
+    bash
+
+COPY k8s_glue_example.py .
+
+# AWS CLI
+# https://github.com/kyleknap/aws-cli/blob/source-proposal/proposals/source-install.md#alpine-linux
+# https://github.com/aws/aws-cli/issues/4685
+# https://github.com/aws/aws-cli/pull/6352
+
+# https://github.com/GoogleCloudPlatform/cloud-sdk-docker/blob/master/alpine/Dockerfile
+
+FROM target as gcp
+
+ARG CLOUD_SDK_VERSION=371.0.0
+ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
+ENV PATH /google-cloud-sdk/bin:$PATH
+
+WORKDIR /
+
+RUN apk --no-cache add \
+        curl \
+        python3 \
+        py3-crcmod \
+        py3-openssl \
+        bash \
+        libc6-compat \
+        openssh-client \
+        git \
+        gnupg \
+    && curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
+    tar xzf google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
+    rm google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
+    gcloud config set core/disable_usage_reporting true && \
+    gcloud config set component_manager/disable_update_check true && \
+    gcloud config set metrics/environment github_docker_image && \
+    gcloud --version
+
+WORKDIR /app
--- a/docker/k8s-glue/glue-build/Dockerfile.bullseye
+++ b/docker/k8s-glue/glue-build/Dockerfile.bullseye
@@ -0,0 +1,82 @@
+ARG TAG=3.7.12-slim-bullseye
+
+FROM python:${TAG} as target
+
+ARG KUBECTL_VERSION=1.22.4
+
+WORKDIR /app
+
+RUN python3 \
+    -m pip \
+    install \
+    --no-cache-dir \
+    -U \
+    clearml-agent \
+    "cryptography>=2.9"
+
+# Not sure about these ENV vars
+# ENV LC_ALL=en_US.UTF-8
+# ENV LANG=en_US.UTF-8
+# ENV LANGUAGE=en_US.UTF-8
+# ENV PYTHONIOENCODING=UTF-8
+
+ADD https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl /usr/bin/
+
+RUN chmod +x /usr/bin/kubectl
+
+COPY k8s_glue_example.py .
+
+CMD ["python3", "k8s_glue_example.py"]
+
+FROM target as aws
+
+# https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
+# https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html
+
+RUN apt-get update -qqy && \
+    apt-get install -qqy \
+    unzip && \
+    rm -rf /var/lib/apt/lists/*
+
+ADD https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip awscliv2.zip
+ADD https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/aws-iam-authenticator /usr/local/bin/aws-iam-authenticator
+
+RUN unzip awscliv2.zip && \
+    ./aws/install && \
+    rm -r awscliv2.zip aws/ && \
+    chmod +x /usr/local/bin/aws-iam-authenticator && \
+    aws --version && \
+    aws-iam-authenticator version
+
+# https://github.com/GoogleCloudPlatform/cloud-sdk-docker/blob/master/debian_slim/Dockerfile
+
+FROM target as gcp
+
+ARG CLOUD_SDK_VERSION=371.0.0
+ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
+
+ENV PATH "$PATH:/opt/google-cloud-sdk/bin/"
+
+ARG INSTALL_COMPONENTS
+RUN mkdir -p /usr/share/man/man1/
+RUN apt-get update -qqy && \
+    apt-get install -qqy \
+        curl \
+        gcc \
+        python3-dev \
+        python3-pip \
+        apt-transport-https \
+        lsb-release \
+        openssh-client \
+        git \
+        gnupg && \
+        rm -rf /var/lib/apt/lists/* && \
+    pip3 install -U crcmod && \
+    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
+    echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
+    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
+    apt-get update && apt-get install -y google-cloud-sdk=${CLOUD_SDK_VERSION}-0 $INSTALL_COMPONENTS && \
+    gcloud config set core/disable_usage_reporting true && \
+    gcloud config set component_manager/disable_update_check true && \
+    gcloud config set metrics/environment github_docker_image && \
+    gcloud --version
--- a/docker/k8s-glue/glue-build/k8s_glue_example.py
+++ b/docker/k8s-glue/glue-build/k8s_glue_example.py
@@ -0,0 +1,94 @@
+"""
+This example assumes you have preconfigured services with selectors in the form of
+ "ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022.
+The K8sIntegration component will label each pod accordingly.
+"""
+from argparse import ArgumentParser
+
+from clearml_agent.glue.k8s import K8sIntegration
+
+
+def parse_args():
+    parser = ArgumentParser()
+    group = parser.add_mutually_exclusive_group()
+
+    parser.add_argument(
+        "--queue", type=str, help="Queue to pull tasks from"
+    )
+    group.add_argument(
+        "--ports-mode", action='store_true', default=False,
+        help="Ports-Mode will add a label to the pod which can be used as service, in order to expose ports. "
+             "Should not be used with max-pods"
+    )
+    parser.add_argument(
+        "--num-of-services", type=int, default=20,
+        help="Specify the number of k8s services to be used. Use only with ports-mode."
+    )
+    parser.add_argument(
+        "--base-port", type=int,
+        help="Used in conjunction with ports-mode, specifies the base port exposed by the services. "
+             "For pod #X, the port will be <base-port>+X. Note that pod number is calculated based on base-pod-num, "
+             "e.g. if base-port=20000 and base-pod-num=3, the port for the first pod will be 20003"
+    )
+    parser.add_argument(
+        "--base-pod-num", type=int, default=1,
+        help="Used in conjunction with ports-mode and base-port, specifies the base pod number to be used by the "
+             "service (default: %(default)s)"
+    )
+    parser.add_argument(
+        "--gateway-address", type=str, default=None,
+        help="Used in conjunction with ports-mode, specify the external address of the k8s ingress / ELB"
+    )
+    parser.add_argument(
+        "--pod-clearml-conf", type=str,
+        help="Configuration file to be used by the pod itself (if not provided, current configuration is used)"
+    )
+    parser.add_argument(
+        "--overrides-yaml", type=str,
+        help="YAML file containing pod overrides to be used when launching a new pod"
+    )
+    parser.add_argument(
+        "--template-yaml", type=str,
+        help="YAML file containing pod template. If provided pod will be scheduled with kubectl apply "
+             "and overrides are ignored, otherwise it will be scheduled with kubectl run"
+    )
+    parser.add_argument(
+        "--ssh-server-port", type=int, default=0,
+        help="If non-zero, every pod will also start an SSH server on the selected port (default: zero, not active)"
+    )
+    parser.add_argument(
+        "--namespace", type=str,
+        help="Specify the namespace in which pods will be created (default: %(default)s)", default="clearml"
+    )
+    group.add_argument(
+        "--max-pods", type=int,
+        help="Limit the maximum number of pods that this service can run at the same time. "
+             "Should not be used with ports-mode"
+    )
+    return parser.parse_args()
+
+
+def main():
+    args = parse_args()
+
+    user_props_cb = None
+    if args.ports_mode and args.base_port:
+        def k8s_user_props_cb(pod_number=0):
+            user_prop = {"k8s-pod-port": args.base_port + pod_number}
+            if args.gateway_address:
+                user_prop["k8s-gateway-address"] = args.gateway_address
+            return user_prop
+        user_props_cb = k8s_user_props_cb
+
+    k8s = K8sIntegration(
+        ports_mode=args.ports_mode, num_of_services=args.num_of_services, base_pod_num=args.base_pod_num,
+        user_props_cb=user_props_cb, overrides_yaml=args.overrides_yaml, clearml_conf_file=args.pod_clearml_conf,
+        template_yaml=args.template_yaml, extra_bash_init_script=K8sIntegration.get_ssh_server_bash(
+            ssh_port_number=args.ssh_server_port) if args.ssh_server_port else None,
+        namespace=args.namespace, max_pods_limit=args.max_pods or None,
+    )
+    k8s.k8s_daemon(args.queue)
+
+
+if __name__ == "__main__":
+    main()
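Note on `k8s_glue_example.py`: in ports mode the example derives each pod's exposed port as `base_port + pod_number` and optionally attaches the gateway address. The sketch below isolates that callback so the arithmetic is easy to check. The factory name `make_user_props_cb` is hypothetical; only the property keys and the addition come from the file above.

```python
from typing import Callable, Dict, Optional, Union


def make_user_props_cb(
    base_port: int, gateway_address: Optional[str] = None
) -> Callable[[int], Dict[str, Union[int, str]]]:
    """Return a callback that maps a pod number to its user properties (hypothetical factory)."""

    def user_props_cb(pod_number: int = 0) -> Dict[str, Union[int, str]]:
        # the port exposed for pod #X is <base-port> + X
        props: Dict[str, Union[int, str]] = {"k8s-pod-port": base_port + pod_number}
        if gateway_address:
            props["k8s-gateway-address"] = gateway_address
        return props

    return user_props_cb
```

With `base_port=20000` and a first pod number of 3, the callback reports port 20003, matching the `--base-port` help text.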
--- a/docker/services/entrypoint.sh
+++ b/docker/services/entrypoint.sh
@@ -1,16 +1,36 @@
-#!/bin/sh
+#!/bin/bash +x

-CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-$TRAINS_FILES_HOST}
+if [ -n "$SHUTDOWN_IF_NO_ACCESS_KEY" ] && [ -z "$CLEARML_API_ACCESS_KEY" ] && [ -z "$TRAINS_API_ACCESS_KEY" ]; then
+  echo "CLEARML_API_ACCESS_KEY was not provided, service will not be started"
+  exit 0
+fi
+
+export CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-$TRAINS_FILES_HOST}

 if [ -z "$CLEARML_FILES_HOST" ]; then
    CLEARML_HOST_IP=${CLEARML_HOST_IP:-${TRAINS_HOST_IP:-$(curl -s https://ifconfig.me/ip)}}
 fi

-CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-${TRAINS_FILES_HOST:-"http://$CLEARML_HOST_IP:8081"}}
-CLEARML_WEB_HOST=${CLEARML_WEB_HOST:-${TRAINS_WEB_HOST:-"http://$CLEARML_HOST_IP:8080"}}
-CLEARML_API_HOST=${CLEARML_API_HOST:-${TRAINS_API_HOST:-"http://$CLEARML_HOST_IP:8008"}}
+export CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-${TRAINS_FILES_HOST:-"http://$CLEARML_HOST_IP:8081"}}
+export CLEARML_WEB_HOST=${CLEARML_WEB_HOST:-${TRAINS_WEB_HOST:-"http://$CLEARML_HOST_IP:8080"}}
+export CLEARML_API_HOST=${CLEARML_API_HOST:-${TRAINS_API_HOST:-"http://$CLEARML_HOST_IP:8008"}}

 echo $CLEARML_FILES_HOST $CLEARML_WEB_HOST $CLEARML_API_HOST 1>&2

-python3 -m pip install -q -U "clearml-agent${CLEARML_AGENT_UPDATE_VERSION:-$TRAINS_AGENT_UPDATE_VERSION}"
-clearml-agent daemon --services-mode --queue services --create-queue --docker "${CLEARML_AGENT_DEFAULT_BASE_DOCKER:-$TRAINS_AGENT_DEFAULT_BASE_DOCKER}" --cpu-only ${CLEARML_AGENT_EXTRA_ARGS:-$TRAINS_AGENT_EXTRA_ARGS}
+if [[ "$CLEARML_AGENT_UPDATE_VERSION" =~ ^[0-9]{1,3}\.[0-9]{1,3}(\.[0-9]{1,3}([a-zA-Z]{1,3}[0-9]{1,3})?)?$ ]]
+then
+    CLEARML_AGENT_UPDATE_VERSION="==$CLEARML_AGENT_UPDATE_VERSION"
+fi
+
+DAEMON_OPTIONS=${CLEARML_AGENT_DAEMON_OPTIONS:---services-mode --create-queue}
+QUEUES=${CLEARML_AGENT_QUEUES:-services}
+
+if [ -z "$CLEARML_AGENT_NO_UPDATE" ]; then
+  if [ -n "$CLEARML_AGENT_UPDATE_REPO" ]; then
+    python3 -m pip install -q -U $CLEARML_AGENT_UPDATE_REPO
+  else
+    python3 -m pip install -q -U "clearml-agent${CLEARML_AGENT_UPDATE_VERSION:-$TRAINS_AGENT_UPDATE_VERSION}"
+  fi
+fi
+
+clearml-agent daemon $DAEMON_OPTIONS --queue $QUEUES --docker "${CLEARML_AGENT_DEFAULT_BASE_DOCKER:-$TRAINS_AGENT_DEFAULT_BASE_DOCKER}" --cpu-only ${CLEARML_AGENT_EXTRA_ARGS:-$TRAINS_AGENT_EXTRA_ARGS}
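Note on `entrypoint.sh`: the new regex check turns a bare version such as `1.5.2` into the pip specifier `==1.5.2`, while values that already carry an operator are left untouched. The Python sketch below mirrors that rule for illustration. The function name `normalize_update_version` is hypothetical; the pattern is copied from the script, with the `^`/`$` anchors replaced by `re.fullmatch`.

```python
import re

# Same pattern as the bash test: major.minor, optional patch, optional suffix such as "rc1"
_BARE_VERSION = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}(\.[0-9]{1,3}([a-zA-Z]{1,3}[0-9]{1,3})?)?")


def normalize_update_version(value: str) -> str:
    """Prefix a bare version with '==', leave anything else as-is (hypothetical helper)."""
    if _BARE_VERSION.fullmatch(value):
        return "==" + value
    return value
```

The result is what gets appended to `clearml-agent` in the `pip install` line, so `1.5.2` yields `clearml-agent==1.5.2` and `>=1.5` yields `clearml-agent>=1.5`.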
--- a/docs/clearml.conf
+++ b/docs/clearml.conf
@@ -4,7 +4,7 @@ api {
    web_server: https://demoapp.demo.clear.ml
    files_server: https://demofiles.demo.clear.ml

-    # Credentials are generated in the webapp, https://demoapp.demo.clear.ml/profile
+    # Credentials are generated in the webapp, https://app.clear.ml/settings/workspace-configuration
    # Overridden with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "EGRTCO8JMSIGI6S39GTP43NFWXDQOW", "secret_key": "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"}

@@ -13,13 +13,27 @@ api {
 }

 agent {
+    # unique name of this worker, if None, created based on hostname:process_id
+    # Override with os environment: CLEARML_WORKER_ID
+    # worker_id: "clearml-agent-machine1:gpu0"
+    worker_id: ""
+
+    # worker name, replaces the hostname when creating a unique name for this worker
+    # Override with os environment: CLEARML_WORKER_NAME
+    # worker_name: "clearml-agent-machine1"
+    worker_name: ""
    # Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
    # leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
-    git_user=""
-    git_pass=""
+    # **Notice**: GitHub personal token is equivalent to password, you can put it directly into `git_pass`
+    # To learn how to generate a git token for GitHub/Bitbucket/GitLab:
+    # https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
+    # https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
+    # https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
+    # git_user: ""
+    # git_pass: ""
    # Limit credentials to a single domain, for example: github.com,
    # all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
-    git_host=""
+    # git_host: ""

    # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
    force_git_ssh_protocol: false
@@ -28,16 +42,6 @@ agent {
    # Force a specific SSH username when converting http to ssh links (the default username is 'git')
    # force_git_ssh_user: git

-    # unique name of this worker, if None, created based on hostname:process_id
-    # Overridden with os environment: CLEARML_WORKER_NAME
-    # worker_id: "clearml-agent-machine1:gpu0"
-    worker_id: ""
-
-    # worker name, replaces the hostname when creating a unique name for this worker
-    # Overridden with os environment: CLEARML_WORKER_ID
-    # worker_name: "clearml-agent-machine1"
-    worker_name: ""
-
    # Set the python version to use when creating the virtual environment and launching the experiment
    # Example values: "/usr/bin/python3" or "/usr/local/bin/python3.6"
    # The default is the python executing the clearml_agent
@@ -46,6 +50,22 @@ agent {
    # specific python version and the system supports multiple python the agent will use the requested python version)
    # ignore_requested_python_version: true

+    # Force the root folder of the git repository (instead of the working directory) into the PYTHONPATH
+    # default false, only the working directory will be added to the PYTHONPATH
+    # force_git_root_python_path: false
+
+    # if set, use GIT_ASKPASS to pass user/pass when cloning / fetching repositories
+    # it solves passing user/token to git submodules.
+    # this is a safer way to ensure multiple users using the same repository will
+    # not accidentally leak credentials
+    # Only supported on Linux systems, it will be the default in future releases
+    # enable_git_ask_pass: false
+
+    # in docker mode, if container's entrypoint automatically activated a virtual environment
+    # use the activated virtual environment and install everything there
+    # set to False to disable, and always create a new venv inheriting from the system_site_packages
+    # docker_use_activated_venv: true
+
    # select python package manager:
    # currently supported: pip, conda and poetry
    # if "pip" or "conda" are used, the agent installs the required packages
@@ -58,8 +78,11 @@ agent {
        # supported options: pip, conda, poetry
        type: pip,

-        # specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
-        # pip_version: "<20"
+        # specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
+        # pip_version: ["<20.2 ; python_version < '3.10'",  "<22.3 ; python_version >= '3.10'"]
+        # specify poetry version to use (examples "<2", "==1.1.1", "", empty string will install the latest version)
+        # poetry_version: "<2",
+        # poetry_install_extra_args: ["-v"]

        # virtual environment inheres packages from system
        system_site_packages: false,
@@ -106,7 +129,7 @@ agent {
        # minimum required free space to allow for cache entry, disable by passing 0 or negative value
        free_space_threshold_gb: 2.0
        # unmark to enable virtual environment caching
-        # path: ~/.clearml/venvs-cache
+        path: ~/.clearml/venvs-cache
    },

    # cached git clone folder
@@ -129,6 +152,12 @@ agent {
    },

    translate_ssh: true,
+
+    # set "disable_ssh_mount: true" to disable the automatic mount of ~/.ssh folder into the docker containers
+    # default is false, automatically mounts ~/.ssh
+    # Must be set to True if using "clearml-session" with this agent!
+    # disable_ssh_mount: false
+
    # reload configuration file every daemon execution
    reload_config: false,

@@ -155,17 +184,64 @@ agent {

    default_docker: {
        # default docker image to use when running in docker mode
-        image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
+        image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"

        # optional arguments to pass to docker image
        # arguments: ["--ipc=host"]
+
+        # lookup table rules for default container
+        # first matched rule will be picked, according to rule order
+        # enterprise version only
+        # match_rules: [
+        #     {
+        #         image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
+        #         arguments: "-e define=value"
+        #         match: {
+        #             script{
+        #                 # Optional: must match all requirements (not partial)
+        #                 requirements: {
+        #                     # version selection matching PEP-440
+        #                     pip: {
+        #                         tensorflow: "~=2.6"
+        #                     },
+        #                 }
+        #                 # Optional: matching based on regular expression, example: "^exact_match$"
+        #                 repository: "/my_repository/"
+        #                 branch: "main"
+        #                 binary: "python3.6"
+        #             }
+        #             # Optional: matching based on regular expression, example: "^exact_match$"
+        #             project: "project/sub_project"
+        #         }
+        #     },
+        #     {
+        #         image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
+        #         arguments: "-e define=value"
+        #         match: {
+        #             # must match all requirements (not partial)
+        #             script{
+        #                 requirements: {
+        #                     conda: {
+        #                         torch: ">=2.6,<2.8"
+        #                     }
+        #                 }
+        #                 # no repository matching required
+        #                 repository: ""
+        #             }
+        #             # no container image matching required (allow to replace one requested container with another)
+        #             container: ""
+        #             # no project matching required
+        #             project: ""
+        #         }
+        #     },
+        # ]
    }

    # set the OS environments based on the Task's Environment section before launching the Task process.
    enable_task_env: false

    # CUDA versions used for Conda setup & solving PyTorch wheel packages
-    # it Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
+    # Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
    # cuda_version: 10.1
    # cudnn_version: 7.6

@@ -179,6 +255,7 @@ agent {
    hide_docker_command_env_vars {
        enabled: true
        extra_keys: []
+        parse_embedded_urls: true
    }

    # allow to set internal mount points inside the docker,
@@ -190,14 +267,14 @@ agent {
    #     pip_cache: "/root/.cache/pip"
    #     poetry_cache: "/root/.cache/pypoetry"
    #     vcs_cache: "/root/.clearml/vcs-cache"
-    #     venv_build: "/root/.clearml/venvs-builds"
+    #     venv_build: "~/.clearml/venvs-builds"
    #     pip_download: "/root/.clearml/pip-download-cache"
    # }

    # Name docker containers created by the daemon using the following string format (supported from Docker 0.6.5)
-    # Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 charaters)
-    # Note: resulting name must start with an alpha-numeric character and
-    #       continue with a alpha-numeric characters, underscores (_), dots (.) and/or dashes (-)
+    # Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 characters)
+    # Note: resulting name must start with an alphanumeric character and
+    #       continue with alphanumeric characters, underscores (_), dots (.) and/or dashes (-)
    # docker_container_name_format: "clearml-id-{task_id}-{rand_string:.8}"
 }

@@ -270,6 +347,11 @@ sdk {
            key: ""
            secret: ""
            region: ""
+            # Or enable credentials chain to let Boto3 pick the right credentials.
+            # This includes picking credentials from environment variables,
+            # credential file and IAM role using metadata service.
+            # Refer to the latest Boto3 docs
+            use_credentials_chain: false

            credentials: [
                # specifies key/secret credentials to use when handling s3 urls (read or write)
@@ -285,6 +367,7 @@ sdk {
                #     secret: "12345678"
                #     multipart: false
                #     secure: false
+                #     verify: /path/to/ca/bundle.crt OR false to not verify
                # }
            ]
        }
@@ -359,5 +442,49 @@ sdk {
            log_stdout: True
        }
    }
+
+    # Apply top-level environment section from configuration into os.environ
+    apply_environment: true
+    # Apply top-level files section from configuration into local file system
+    apply_files: true
 }

+# Environment section (top-level) is applied to the OS environment as `key=value` for each key/value pair
+# * enable/disable with `agent.apply_environment` OR `sdk.apply_environment`
+# Example:
+#
+#   environment {
+#     key_a: value_a
+#     key_b: value_b
+#   }
+
+# Files section (top-level) allows auto-generating files at designated paths with
+# predefined content and target format.
+# * enable/disable with `agent.apply_files` OR `sdk.apply_files`
+# Files content options include:
+#  contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
+#  format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
+#          base64-encoded contents string, otherwise ignored
+#  path: the target file's path, may include ~ and inplace env vars
+#  target_format: format used to encode contents before writing into the target file. Supported values are json,
+#                 yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
+#  overwrite: overwrite the target file in case it exists. Default is true.
+#
+# Example:
+#   files {
+#     myfile1 {
+#       contents: "The quick brown fox jumped over the lazy dog"
+#       path: "/tmp/fox.txt"
+#     }
+#     myjsonfile {
+#       contents: {
+#         some {
+#           nested {
+#             value: [1, 2, 3, 4]
+#           }
+#         }
+#       }
+#       path: "/tmp/test.json"
+#       target_format: json
+#     }
+#   }
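Note on the `files` section documented above: the options `contents`, `format` and `target_format` describe how an entry's payload is prepared before it is written to `path`. The sketch below shows one way those rules could be interpreted. It is an assumption-laden illustration, not ClearML's implementation: the function name `render_contents` is hypothetical, the `yaml`/`yml` target formats are omitted to stay within the standard library, and path expansion and the `overwrite` check are left out.

```python
import base64
import json
from typing import Any, Dict, Union


def render_contents(entry: Dict[str, Any]) -> Union[str, bytes]:
    """Prepare the payload of one `files` entry (hypothetical helper, json/bytes/text only)."""
    contents = entry.get("contents")

    # format: base64 -> decode the contents string first; any other value is ignored
    if entry.get("format") == "base64":
        contents = base64.b64decode(contents)

    target_format = entry.get("target_format")
    if target_format == "bytes":
        # binary mode: hand back raw bytes
        return contents if isinstance(contents, bytes) else str(contents).encode("utf-8")

    # text-based targets: assume decoded base64 data is UTF-8 text
    if isinstance(contents, bytes):
        contents = contents.decode("utf-8")
    if target_format == "json":
        return json.dumps(contents)

    # default is text mode
    return contents if isinstance(contents, str) else str(contents)
```

For the `myjsonfile` example above, the nested `contents` mapping would be serialized to a JSON string before being written to `/tmp/test.json`.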
--- a/docs/screenshots.gif
+++ b/docs/screenshots.gif
--- a/examples/k8s_glue_example.py
+++ b/examples/k8s_glue_example.py
@@ -65,6 +65,10 @@ def parse_args():
        help="Limit the maximum number of pods that this service can run at the same time."
             "Should not be used with ports-mode"
    )
+    parser.add_argument(
+        "--use-owner-token", action="store_true", default=False,
+        help="Generate and use task owner token for the execution of each task"
+    )
    return parser.parse_args()


@@ -87,7 +91,7 @@ def main():
            ssh_port_number=args.ssh_server_port) if args.ssh_server_port else None,
        namespace=args.namespace, max_pods_limit=args.max_pods or None,
    )
-    k8s.k8s_daemon(args.queue)
+    k8s.k8s_daemon(args.queue, use_owner_token=args.use_owner_token)


 if __name__ == "__main__":
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,17 +1,15 @@
-attrs>=18.0,<20.4.0
+attrs>=18.0,<23.0.0
 enum34>=0.9,<1.2.0 ; python_version < '3.6'
 furl>=2.0.0,<2.2.0
-future>=0.16.0,<0.19.0
-jsonschema>=2.6.0,<3.3.0
+jsonschema>=2.6.0,<5.0.0
 pathlib2>=2.3.0,<2.4.0
-psutil>=3.4.2,<5.9.0
-pyhocon>=0.3.38,<0.4.0
-pyparsing>=2.0.3,<2.5.0
+psutil>=3.4.2,<5.10.0
+pyparsing>=2.0.3,<3.1.0
 python-dateutil>=2.4.2,<2.9.0
-pyjwt>=1.6.4,<2.1.0
-PyYAML>=3.12,<5.5.0
-requests>=2.20.0,<2.26.0
-six>=1.11.0,<1.16.0
-typing>=3.6.4,<3.8.0
+pyjwt>=2.4.0,<2.7.0
+PyYAML>=3.12,<6.1
+requests>=2.20.0,<2.29.0
+six>=1.13.0,<1.17.0
+typing>=3.6.4,<3.8.0 ; python_version < '3.5'
 urllib3>=1.21.1,<1.27.0
 virtualenv>=16,<21
--- a/setup.py
+++ b/setup.py
@@ -61,6 +61,7 @@ setup(
        'Programming Language :: Python :: 3.7',
        'Programming Language :: Python :: 3.8',
        'Programming Language :: Python :: 3.9',
+        'Programming Language :: Python :: 3.10',
        'License :: OSI Approved :: Apache Software License',
    ],