Compare commits


104 Commits

Author SHA1 Message Date
allegroai
8b4f1eefc2 Add more debug printouts in k8s glue 2022-09-02 23:49:28 +03:00
allegroai
97c2e21dcc Fix resolving k8s pending queue may cause a queue with a uuid name to be created 2022-09-02 23:49:28 +03:00
allegroai
918dd39b87 Add docker ssh_ro_folder (default: "/.ssh"); changed docker ssh_folder (default: "~/.ssh") 2022-09-02 23:49:27 +03:00
allegroai
7776e906c4 Fix second .ssh temp mount fails if container changes the files inside 2022-09-02 23:49:27 +03:00
allegroai
1bf865ec08 Fix name not escaped as regex (all services "get_all" use regex for name) 2022-09-02 23:49:27 +03:00
Luca Cerone
3f1ce847dc Fixed documentation (#117)
* Fixed documentation

* Update README.md
2022-09-01 17:18:48 +03:00
allegroai
9006c2d28f Add support for abort callback registration 2022-08-29 18:06:59 +03:00
allegroai
ec216198a0 Add agent.enable_git_ask_pass to improve passing user/pass to git commands 2022-08-29 18:06:26 +03:00
allegroai
fe6adbf110 Fix package @ file:// with quoted (url style) links should not be ignored 2022-08-29 18:06:09 +03:00
allegroai
2693c565ba Fix docker mode use "~/.clearml/venvs-builds" as default for easier user-mode containers 2022-08-29 18:05:53 +03:00
allegroai
9054ea37c2 Fix home folder 2022-08-23 23:16:56 +03:00
allegroai
7292263f86 Add CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH to allow customizing the agent startup script location for k8s glue agent 2022-08-23 23:16:36 +03:00
allegroai
f8a6cd697f Add k8s agent debug env var 2022-08-23 23:15:53 +03:00
allegroai
ec9d027678 Add support for MIG devices, use 0:1 for GPU 0 slice 1 (or use 0.1) 2022-08-01 18:58:42 +03:00
allegroai
48a145a8bd Fix messages 2022-08-01 18:57:36 +03:00
allegroai
71d2ab4ce7 Add missing use_credentials_chain to config file example 2022-08-01 18:57:04 +03:00
allegroai
12a8872b27 Fix Python 3.10+ support 2022-08-01 18:56:37 +03:00
allegroai
820ab4dc0c Fix k8s glue debug mode, refactoring 2022-08-01 18:55:49 +03:00
allegroai
1d1ffd17fb Fix README 2022-07-31 19:36:48 +03:00
allegroai
d96b8ff906 Fix template namespace should override default namespace 2022-07-22 22:44:32 +03:00
allegroai
e687418194 Refactor k8s glue template handling 2022-07-22 22:43:07 +03:00
allegroai
a5a797ec5e Version bump to v1.3.0 2022-06-16 23:24:28 +03:00
allegroai
ff6cee4a44 Fix requirements --extra-index-url line with trailing comment
Fix --extra-index-url is added for different command line switches
2022-06-16 23:22:29 +03:00
allegroai
9acbad28f7 Fix repository URL contains credentials even when agent.force_git_ssh_protocol is true 2022-06-16 23:20:53 +03:00
allegroai
560e689ccd Fix always make pygobject an optional package (i.e. if installation fails continue the Task package environment setup) 2022-06-16 23:18:55 +03:00
allegroai
f66e42ddb1 Fix optional priority packages to always compare lower-case package names 2022-06-16 23:18:31 +03:00
allegroai
d9856d5de5 Add Python 3.10 support 2022-06-16 23:16:06 +03:00
Niels ten Boom
24177cc5a9 Support private repos from requirements.txt file (#107)
* support private repos
* fix double indices
2022-06-15 10:26:24 +03:00
allegroai
178af0dee8 Bump PyJWT version due to "Key confusion through non-blocklisted public key formats" vulnerability 2022-05-25 16:41:26 +03:00
allegroai
51eb0a713c Version bump 2022-05-12 23:31:54 +03:00
allegroai
249aa006cb Make sure that if we have "setuptools" in the original required packages, we preserve the line in the pip freeze list 2022-05-12 23:31:32 +03:00
allegroai
c08e2ac0bb Fix clearml.conf access in non-root containers 2022-05-05 12:23:11 +03:00
allegroai
335ef91d8e Fix git unsafe directory issue (disable check on cached vcs folder) 2022-05-05 12:22:40 +03:00
allegroai
6c7a639673 Fix broken pytorch setuptools incompatibility (force setuptools < 59 if torch is below 1.11) 2022-05-05 12:22:13 +03:00
allegroai
5f77cad5ac Fix error message 2022-04-27 15:36:39 +03:00
allegroai
0228ae0494 Set environment variables before expanding path 2022-04-27 15:14:16 +03:00
allegroai
165677e800 Version bump 2022-04-27 14:59:51 +03:00
allegroai
2e5298b737 Add support for use-owner-token in k8s glue 2022-04-27 14:59:27 +03:00
allegroai
c9ffb8a053 Version bump 2022-04-20 08:57:16 +03:00
allegroai
2466eed23f Fix dynamic GPUs with "all" GPUs on the same worker 2022-04-20 08:56:22 +03:00
allegroai
6e31171d31 Version bump to v1.2.3 2022-04-14 22:39:38 +03:00
allegroai
592254709e Fix typo 2022-04-14 22:38:19 +03:00
allegroai
e43f31eb80 Version bump 2022-04-13 10:02:25 +03:00
allegroai
f50ba005b5 Protect dynamic GPUs from failing to parse worker GPU index 2022-04-13 10:01:50 +03:00
allegroai
1011544533 Fix copy breaks agent and nulls the worker name 2022-04-13 10:01:12 +03:00
allegroai
6572023173 Fix avoid reinstall pytorch package if the same version is already installed 2022-04-09 14:18:38 +03:00
allegroai
9c7e2aacd0 Fix PYTHONPATH is overwritten when executing a task (append to it instead) 2022-04-09 14:17:49 +03:00
Allegro AI
715f102f6d Update README.md 2022-04-01 17:48:27 +03:00
allegroai
5446aed9cf Version bump to v1.2.2 2022-03-30 20:48:28 +03:00
allegroai
b94ec85461 Fix update should run with -y 2022-03-30 20:48:11 +03:00
allegroai
f55f4f7535 Version bump 2022-03-30 20:11:13 +03:00
allegroai
c87da3a079 Fix apt-get update fail causes apt-get install to not be executed 2022-03-30 20:10:57 +03:00
allegroai
c3590a53a8 Fix CLEARML_AGENT_SKIP_PIP_VENV_INSTALL fails to find python executable 2022-03-30 20:10:08 +03:00
allegroai
a4315722ab Version bump to v1.2.1 2022-03-28 18:13:20 +03:00
allegroai
c901bd331c Fix git packages are installed even if commit is given and is preinstalled when using cached virtual environment 2022-03-28 18:11:46 +03:00
allegroai
df97f170a2 Fix clearml-agent init
Use app.clear.ml as default server
Add git token references
2022-03-24 22:08:06 +02:00
allegroai
a30a2dad66 Add git personal token docs 2022-03-24 22:07:15 +02:00
allegroai
2432f5bb68 Add CLEARML_AGENT_PROPAGATE_EXITCODE, set to 1 to let clearml-agent execution return a non-zero exit code on failure (note: by default we keep the return code 0; the exception is the k8s glue with non-restarting Pods, where users would want visibility into failing Tasks; do not use unless you know what to expect from k8s) 2022-03-24 22:04:25 +02:00
allegroai
341086d86a Fix vcs packages are reinstalled when same commit version is already installed 2022-03-24 22:03:25 +02:00
allegroai
1163c96438 Add agent.package_manager.force_original_requirements allowing to only use the "org_pip" coming from dev execution (using this prevents editing the installed packages from the UI) 2022-03-24 22:00:33 +02:00
allegroai
4c120d7cd0 Add ability to override container LOCAL_PYTHON, add auto python support (max 3.15) 2022-03-24 21:58:07 +02:00
Jan Stratil
966a9758b8 Add condition to requirements for typing package (python < 3.5) (#103)
- According to the maintainer of the typing package, it is recommended
  to install the typing package with a condition on the python version,
  since for python 3.5 and later the typing package is redundant (it is
  in the stdlib).
- The typing package can cause some issues, so NOT installing it can solve
  some of them.

Co-authored-by: Jan Stratil <jan.stratil@innovatrics.com>
2022-03-23 15:03:37 +02:00
allegroai
f58071fc74 Fix README 2022-03-20 23:24:07 +02:00
allegroai
8712c5e636 Fix PyTorch aarch64 and windows support 2022-03-16 17:40:21 +02:00
allegroai
a51f9bed49 Version bump 2022-03-15 10:04:45 +02:00
allegroai
531e514003 Add custom build script support
Add extra configurations when starting daemon
Propagate token to docker in case credentials are not available
2022-03-15 10:04:25 +02:00
allegroai
2cd9e706c8 Fix user-provided " is unnecessarily replaced to \\" 2022-03-15 10:02:28 +02:00
Idan Tene
e3e6a1dda8 Fix virtualenv python interpreter used (#98)
* Add virtualenv version logging
* Force using requested python interpreter
2022-02-27 11:25:25 +02:00
Andrey Okhotnikov
92b5ce61a0 Add additional k8s-glue dockerfiles (#94) 2022-02-21 15:59:50 +02:00
pollfly
36073ad488 Fix links (#100) 2022-02-17 12:04:11 +02:00
allegroai
d89d0f9ff5 Fix pathlib2 six conflict, version bump 2022-02-09 18:29:04 +02:00
allegroai
14c48d0a78 Fix FORCE_LOCAL_CLEARML_AGENT_WHEEL when running from a Windows host 2022-02-09 18:28:17 +02:00
allegroai
b1ee3e105b Version bump 2022-02-07 20:05:03 +02:00
allegroai
1f53c4fd1b Fix agent fails to check out code from main branch when branch/commit is not explicitly specified 2022-02-07 20:04:08 +02:00
allegroai
bfed3ccf4d Fix agent attempts to check out code when in standalone mode 2022-02-07 20:03:08 +02:00
pollfly
d521482409 Add spaces to help menu (#96) 2022-02-06 12:45:21 +02:00
allegroai
53eba5658f Fix conda package manager listed packages with local links (@ file://) should ignore the local package if it does not exist
Fix cuda patch version support in conda
2022-02-02 16:33:07 +02:00
allegroai
bb64e4a850 Fix hide_docker_command_env_vars mode to include URL passwords and handle env vars containing docker commands 2022-02-02 16:30:34 +02:00
allegroai
771690d5c0 Fix ENV_API_DEFAULT_REQ_METHOD no default value causes ValueError if not specified 2022-01-31 12:39:39 +02:00
pollfly
d39e30995a Fix links (#93) 2022-01-27 12:15:36 +02:00
allegroai
363aaeaba8 Fix symbolic links not copied from cached VCS into working copy. On Windows this will result in default copied content instead of the original symbolic link (issue #89) 2022-01-23 10:42:11 +02:00
allegroai
fa1307e62c Add agent.poetry_version to specify poetry version (and force installation of poetry if missing) 2022-01-23 10:40:05 +02:00
allegroai
e7c9e9695b Fix using deprecated abc support 2022-01-23 10:39:13 +02:00
Mal Miller
bf07b7f76d Add environment variable for request method (#91)
* Add environment variable for default request method
2022-01-12 20:29:17 +02:00
allegroai
5afb604e3d Fix default_python set to None 2022-01-07 15:12:27 +02:00
allegroai
b3e8be6296 Add agent.force_git_root_python_path configuration setting to force adding the git repository root folder to the PYTHONPATH (if set, the working directory is not added to the PYTHONPATH) 2022-01-07 15:11:59 +02:00
allegroai
2cb452b1c2 Version bump 2021-12-29 13:21:31 +02:00
allegroai
938fcc4530 Add build --force-docker command line argument to allow ignoring task container data 2021-12-29 13:21:25 +02:00
allegroai
73625bf00f Version bump 2021-12-21 14:29:43 +02:00
allegroai
f41ed09dc1 Add support for custom docker image resolving 2021-12-21 14:29:43 +02:00
allegroai
f03c4576f7 Update default docker image 2021-12-21 14:29:43 +02:00
pshowbs
6c5087e425 Update S3 bucket verify option for minio (#83)
Use verify configuration option to skip verify or set ca bundle path
2021-11-06 14:40:35 +02:00
allegroai
5a6caf6399 Fix "git+git://" requirements 2021-10-29 22:58:28 +03:00
allegroai
a07053d961 Version bump to v1.1.1 2021-10-26 10:12:21 +03:00
allegroai
aa9a9a25fb version bump 2021-10-21 12:03:29 +03:00
allegroai
cd4a39d8fc Fix config example 2021-10-21 12:03:07 +03:00
allegroai
92e3f00435 Add support for truncating task log file after reporting to server 2021-10-21 12:02:31 +03:00
allegroai
a890e36a36 Fix PY2.7 support for pytorch 2021-10-19 10:47:09 +03:00
allegroai
bed94ee431 Add support for configuration env and files section 2021-10-19 10:46:43 +03:00
allegroai
175e99b12b Fix if queue tag default does not exist and --queue not specified, try queue name "default" 2021-10-16 23:21:45 +03:00
allegroai
2a941e3abf Fix --stop checking default queue tag (issue #80) 2021-10-16 23:21:12 +03:00
allegroai
3c8e0ae5db Improve PyJWT resiliency support 2021-10-10 09:08:36 +03:00
allegroai
e416ab526b Fix Python 3.5 compatibility 2021-09-26 00:05:08 +03:00
pollfly
e17246d8ea Fix docstring typos (#79)
* edit docstring typo

* fix typos
2021-09-14 18:42:18 +03:00
40 changed files with 2611 additions and 587 deletions

README.md
View File

@@ -8,15 +8,15 @@ ML-Ops scheduler & orchestration solution supporting Linux, macOS and Windows**
[![GitHub license](https://img.shields.io/github/license/allegroai/clearml-agent.svg)](https://img.shields.io/github/license/allegroai/clearml-agent.svg)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/clearml-agent.svg)](https://img.shields.io/pypi/pyversions/clearml-agent.svg)
[![PyPI version shields.io](https://img.shields.io/pypi/v/clearml-agent.svg)](https://img.shields.io/pypi/v/clearml-agent.svg)
[![PyPI Downloads](https://pepy.tech/badge/clearml-agent/month)](https://pypi.org/project/clearml-agent/)
[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/allegroai)](https://artifacthub.io/packages/search?repo=allegroai)
</div>
---
### ClearML-Agent
#### *Formerly known as Trains Agent*
* Run jobs (experiments) on any local or cloud based resource
* Implement optimized resource utilization policies
@@ -24,23 +24,31 @@ ML-Ops scheduler & orchestration solution supporting Linux, macOS and Windows**
* Launch-and-Forget service containers
* [Cloud autoscaling](https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler)
* [Customizable cleanup](https://clear.ml/docs/latest/docs/guides/services/cleanup_service)
* Advanced [pipeline building and execution](https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline)
It is a zero configuration fire-and-forget execution agent, providing a full ML/DL cluster solution.
**Full Automation in 5 steps**
1. ClearML Server [self-hosted](https://github.com/allegroai/clearml-server) or [free tier hosting](https://app.clear.ml)
2. `pip install clearml-agent` ([install](#installing-the-clearml-agent) the ClearML Agent on any GPU machine: on-premises / cloud / ...)
3. Create a [job](https://github.com/allegroai/clearml/docs/clearml-task.md) or add [ClearML](https://github.com/allegroai/clearml) to your code with just 2 lines
4. Change the [parameters](#using-the-clearml-agent) in the UI & schedule for [execution](#using-the-clearml-agent) (or automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))
5. :chart_with_downwards_trend: :chart_with_upwards_trend: :eyes: :beer:
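A minimal sketch of this flow from the command line (the queue name and a reachable ClearML server are assumptions):

```bash
# minimal setup sketch, assuming a running ClearML server
pip install clearml-agent
clearml-agent init                    # connect the agent to your server
clearml-agent daemon --queue default  # start pulling enqueued jobs
```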
"All the Deep/Machine-Learning DevOps your research needs, and then some... Because ain't nobody got time for that"
**Try ClearML now** [Self Hosted](https://github.com/allegroai/clearml-server) or [Free tier Hosting](https://app.clear.ml)
<a href="https://app.clear.ml"><img src="https://github.com/allegroai/clearml-agent/blob/master/docs/screenshots.gif?raw=true" width="100%"></a>
### Simple, Flexible Experiment Orchestration
**The ClearML Agent was built to address the DL/ML R&D DevOps needs:**
* Easily add & remove machines from the cluster
@@ -56,18 +64,23 @@ It is a zero configuration fire-and-forget execution agent, providing a full ML/
*epsilon - Because we are :triangular_ruler: and nothing is really zero work
### Kubernetes Integration (Optional)
We think Kubernetes is awesome, but it should be a choice. We designed `clearml-agent` so you can run bare-metal or inside a pod with any mix that fits your environment.
Find Dockerfiles in the [docker](./docker) dir and a helm Chart in https://github.com/allegroai/clearml-helm-charts
#### Benefits of integrating existing K8s with ClearML-Agent
- ClearML-Agent adds the missing scheduling capabilities to K8s
- Allowing for more flexible automation from code
- A programmatic interface for easier learning curve (and debugging)
- Seamless integration with ML/DL experiment manager
- Web UI for customization, scheduling & prioritization of jobs
**Two K8s integration flavours**
- Spin ClearML-Agent as a long-lasting service pod
- use [clearml-agent](https://hub.docker.com/r/allegroai/clearml-agent) docker image
- map docker socket into the pod (soon replaced by [podman](https://github.com/containers/podman))
@@ -75,57 +88,66 @@ We designed `clearml-agent` so you can run bare-metal or inside a pod with any m
- benefits: full use of the ClearML scheduling, no need to worry about wrong container images / lost pods etc.
- downside: Sibling containers
- Kubernetes Glue, map ClearML jobs directly to K8s jobs
- Run the [clearml-k8s glue](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py) on a K8s cpu node
- The clearml-k8s glue pulls jobs from the ClearML job execution queue and prepares a K8s job (based on provided yaml template)
- Inside the pod itself the clearml-agent will install the job (experiment) environment and spin and monitor the experiment's process
- benefits: Kubernetes full view of all running jobs in the system
- downside: No real scheduling (k8s scheduler), no docker image verification (post-mortem only)
### Using the ClearML Agent
**Full scale HPC with a click of a button**
The ClearML Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the job and monitors its progress.
Any 'Draft' experiment can be scheduled for execution by a ClearML agent.
A previously run experiment can be put into 'Draft' state by either of two methods:
* Using the **'Reset'** action from the experiment right-click context menu in the ClearML UI - This will clear any results and artifacts the previous run had created.
* Using the **'Clone'** action from the experiment right-click context menu in the ClearML UI - This will create a new 'Draft' experiment with the same configuration as the original experiment.
An experiment is scheduled for execution using the **'Enqueue'** action from the experiment right-click context menu in the ClearML UI and selecting the execution queue.
See [creating an experiment and enqueuing it for execution](#from-scratch).
Once an experiment is enqueued, it will be picked up and executed by a ClearML agent monitoring this queue.
The ClearML UI Workers & Queues page provides ongoing execution information:
- Workers Tab: Monitor your cluster
  - Review available resources
  - Monitor machine statistics (CPU / GPU / Disk / Network)
- Queues Tab:
  - Control the scheduling order of jobs
  - Cancel or abort job execution
  - Move jobs between execution queues
#### What The ClearML Agent Actually Does
The ClearML Agent executes experiments using the following process:
- Create a new virtual environment (or launch the selected docker image)
- Clone the code into the virtual-environment (or inside the docker)
- Install python packages based on the package requirements listed for the experiment
  - Special note for PyTorch: The ClearML Agent will automatically select the torch packages based on the CUDA_VERSION environment variable of the machine
- Execute the code, while monitoring the process
- Log all stdout/stderr in the ClearML UI, including the cloning and installation process, for easy debugging
- Monitor the execution and allow you to manually abort the job using the ClearML UI (or, in the unfortunate case of a code crash, catch the error and signal the experiment has failed)
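This flow can also be exercised for a single task; a hedged sketch, assuming the `execute` subcommand and a placeholder task ID:

```bash
# build the environment and run one specific task in the foreground
# (the task ID below is a placeholder)
clearml-agent execute --id aabbcc112233 --docker
```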
#### System Design & Flow
<img src="https://github.com/allegroai/clearml-agent/blob/master/docs/clearml_architecture.png" width="100%" alt="clearml-architecture">
#### Installing the ClearML Agent
```bash
@@ -135,6 +157,7 @@ pip install clearml-agent
#### ClearML Agent Usage Examples
Full Interface and capabilities are available with
```bash
clearml-agent --help
clearml-agent daemon --help
@@ -146,7 +169,8 @@ clearml-agent daemon --help
clearml-agent init
```
Note: The ClearML Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default ClearML Agent cache folder is `~/.clearml`
See full details in your configuration file at `~/clearml.conf`
@@ -156,29 +180,36 @@ They are designed to share the same configuration file, see example [here](docs/
#### Running the ClearML Agent
For debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen
```bash
clearml-agent daemon --queue default --foreground
```
For actual service mode, all the stdout will be stored automatically into a temporary file (no need to pipe)
Notice: with `--detached` flag, the *clearml-agent* will be running in the background
```bash
clearml-agent daemon --detached --queue default
```
GPU allocation is controlled via the standard OS environment `NVIDIA_VISIBLE_DEVICES` or the `--gpus` flag (or disabled with `--cpu-only`).
If no flag is set and the `NVIDIA_VISIBLE_DEVICES` variable doesn't exist, all GPUs will be allocated for the `clearml-agent` <br>
If the `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES="none"`, no GPU will be allocated for the `clearml-agent`
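For example, a short sketch of the CPU-only cases above (the queue name is an assumption):

```bash
# no GPU allocated - either form should behave the same
clearml-agent daemon --queue default --cpu-only
NVIDIA_VISIBLE_DEVICES="none" clearml-agent daemon --queue default
```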
Example: spin two agents, one per gpu on the same machine:
Notice: with `--detached` flag, the *clearml-agent* will be running in the background
```bash
clearml-agent daemon --detached --gpus 0 --queue default
clearml-agent daemon --detached --gpus 1 --queue default
```
Example: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent
```bash
clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu
clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu
@@ -187,23 +218,29 @@ clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu
##### Starting the ClearML Agent in docker mode
For debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen
```bash
clearml-agent daemon --queue default --docker --foreground
```
For actual service mode, all the stdout will be stored automatically into a file (no need to pipe)
Notice: with `--detached` flag, the *clearml-agent* will be running in the background
```bash
clearml-agent daemon --detached --queue default --docker
```
Example: spin two agents, one per gpu on the same machine, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:
```bash
clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
clearml-agent daemon --detached --gpus 1 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
```
Example: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:
```bash
clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
@@ -214,55 +251,61 @@ clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda
Priority Queues are also supported, example use case:
High priority queue: `important_jobs`; low priority queue: `default`
```bash
clearml-agent daemon --queue important_jobs default
```
The **ClearML Agent** will first try to pull jobs from the `important_jobs` queue, and only then will it fetch a job from the `default` queue.
Adding queues, managing job order within a queue, and moving jobs between queues are all available using the Web UI; see the example on our [free server](https://app.clear.ml/workers-and-queues/queues)
##### Stopping the ClearML Agent
To stop a **ClearML Agent** running in the background, run the same command line used to start the agent with `--stop` appended. For example, to stop the first of the same-machine, single-GPU agents shown above:
```bash
clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --stop
```
### How do I create an experiment on the ClearML Server? <a name="from-scratch"></a>
* Integrate [ClearML](https://github.com/allegroai/clearml) with your code
* Execute the code on your machine (Manually / PyCharm / Jupyter Notebook)
* As your code is running, **ClearML** creates an experiment logging all the necessary execution information:
  - Git repository link and commit ID (or an entire jupyter notebook)
  - Git diff (we're not saying you never commit and push, but still...)
  - Python packages used by your code (including specific versions used)
  - Hyper-Parameters
  - Input Artifacts
You now have a 'template' of your experiment with everything required for automated execution
* In the ClearML UI, right-click on the experiment and select 'clone'. A copy of your experiment will be created.
* You now have a new draft experiment cloned from your original experiment, feel free to edit it
  - Change the Hyper-Parameters
  - Switch to the latest code base of the repository
  - Update package versions
  - Select a specific docker image to run in (see docker execution mode section)
  - Or simply change nothing to run the same experiment again...
* Schedule the newly created experiment for execution: Right-click the experiment and select 'enqueue'
### ClearML-Agent Services Mode <a name="services"></a>
ClearML-Agent Services is a special mode of ClearML-Agent that provides the ability to launch long-lasting jobs that previously had to be executed on local / dedicated machines. It allows a single agent to launch multiple dockers (Tasks) for different use cases. To name a few use cases: auto-scaler service (spinning instances when the need arises and the budget allows), Controllers (implementing pipelines and more sophisticated DevOps logic), Optimizer (such as Hyper-parameter Optimization or sweeping), and Application (such as interactive Bokeh apps for increased data transparency).
ClearML-Agent Services mode will spin **any** task enqueued into the specified queue. Every task launched by ClearML-Agent Services will be registered as a new node in the system, providing tracking and transparency capabilities.
Currently, clearml-agent in services mode supports CPU-only configuration. ClearML-Agent services mode can be launched alongside GPU agents.
```bash
clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
@@ -270,22 +313,27 @@ clearml-agent daemon --services-mode --detached --queue services --create-queue
**Note**: It is the user's responsibility to make sure the proper tasks are pushed into the specified queue.
### AutoML and Orchestration Pipelines <a name="automl-pipes"></a>
The ClearML Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the ClearML package.
Sample AutoML & Orchestration examples can be found in the ClearML [example/automation](https://github.com/allegroai/clearml/tree/master/examples/automation) folder.
AutoML examples
- [Toy Keras training experiment](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
  - In order to create an experiment-template in the system, this code must be executed once manually
- [Random Search over the above Keras experiment-template](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)
  - This example will create multiple copies of the Keras experiment-template, with different hyper-parameter combinations
Experiment Pipeline examples
- [First step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)
  - This example will "process data", and once done, will launch a copy of the 'second step' experiment-template
- [Second step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py)
- In order to create an experiment-template in the system, this code must be executed once manually
### License

View File

@@ -12,7 +12,7 @@ from clearml_agent.definitions import FileBuffering, CONFIG_FILE
from clearml_agent.helper.base import reverse_home_folder_expansion, chain_map, named_temporary_file
from clearml_agent.helper.process import ExitStatus
from . import interface, session, definitions, commands
from .errors import ConfigFileNotFound, Sigterm, APIError
from .errors import ConfigFileNotFound, Sigterm, APIError, CustomBuildScriptFailed
from .helper.trace import PackageTrace
from .interface import get_parser
@@ -44,6 +44,8 @@ def run_command(parser, args, command_name):
debug = command._session.debug_mode
func = getattr(command, command_name)
return func(**args_dict)
except CustomBuildScriptFailed as e:
command_class.exit(e.message, e.errno)
except ConfigFileNotFound:
message = 'Cannot find configuration file in "{}".\n' \
'To create a configuration file, run:\n' \

View File

@@ -11,6 +11,11 @@
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
# **Notice**: GitHub personal token is equivalent to password, you can put it directly into `git_pass`
# To learn how to generate a git token on GitHub/Bitbucket/GitLab:
# https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
# https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
# https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
# git_user: ""
# git_pass: ""
# git_host: ""
@@ -30,6 +35,22 @@
# specific python version and the system supports multiple python the agent will use the requested python version)
# ignore_requested_python_version: true
# Force the root folder of the git repository (instead of the working directory) into the PYTHONPATH
# default false, only the working directory will be added to the PYTHONPATH
# force_git_root_python_path: false
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetching repositories
# it solves passing user/token to git submodules.
# this is a safer way to ensure multiple users using the same repository will
# not accidentally leak credentials
# Only supported on Linux systems, it will be the default in future releases
# enable_git_ask_pass: false
# in docker mode, if the container's entrypoint automatically activates a virtual environment
# use the activated virtual environment and install everything there
# set to False to disable, and always create a new venv inheriting from the system_site_packages
# docker_use_activated_venv: true
# select python package manager:
# currently supported: pip, conda and poetry
# if "pip" or "conda" are used, the agent installs the required packages
@@ -44,6 +65,8 @@
# specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
pip_version: "<20.2",
# specify poetry version to use (examples "<2", "==1.1.1", "", empty string will install the latest version)
# poetry_version: "<2",
# virtual environment inherits packages from system
system_site_packages: false,
@@ -67,7 +90,7 @@
# set the optional priority packages to be installed before the rest of the required packages,
# In case a package installation fails, the package will be ignored,
# and the virtual environment process will continue
# priority_optional_packages: ["pygobject", ]
priority_optional_packages: ["pygobject", ]
# set the post packages to be installed after all the rest of the required packages
# post_packages: ["horovod", ]
@@ -156,7 +179,7 @@
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]
@@ -201,24 +224,99 @@
hide_docker_command_env_vars {
enabled: true
extra_keys: []
parse_embedded_urls: true
}
# Maximum execution time (in seconds) for Task's abort function call
abort_callback_max_timeout: 1800
# allow setting internal mount points inside the docker,
# especially useful for non-root docker container images.
docker_internal_mounts {
sdk_cache: "/clearml_agent_cache"
apt_cache: "/var/cache/apt/archives"
ssh_folder: "/root/.ssh"
ssh_folder: "~/.ssh"
ssh_ro_folder: "/.ssh"
pip_cache: "/root/.cache/pip"
poetry_cache: "/root/.cache/pypoetry"
vcs_cache: "/root/.clearml/vcs-cache"
venv_build: "/root/.clearml/venvs-builds"
venv_build: "~/.clearml/venvs-builds"
pip_download: "/root/.clearml/pip-download-cache"
}
# Name docker containers created by the daemon using the following string format (supported from Docker 0.6.5)
# Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 charaters)
# Note: resulting name must start with an alpha-numeric character and continue with a alpha-numeric characters,
# Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 characters)
# Note: resulting name must start with an alphanumeric character and continue with alphanumeric characters,
# underscores (_), dots (.) and/or dashes (-)
#docker_container_name_format: "clearml-id-{task_id}-{rand_string:.8}"
# Apply top-level environment section from configuration into os.environ
apply_environment: true
# Top-level environment section is in the form of:
# environment {
# key: value
# ...
# }
# and is applied to the OS environment as `key=value` for each key/value pair
# Apply top-level files section from configuration into local file system
apply_files: true
# Top-level files section allows auto-generating files at designated paths with predefined contents
# and target format. Options include:
# contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
# format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
# base64-encoded contents string, otherwise ignored
# path: the target file's path, may include ~ and in-place env vars
# target_format: format used to encode contents before writing into the target file. Supported values are json,
# yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
# overwrite: overwrite the target file in case it exists. Default is true.
#
# Example:
# files {
# myfile1 {
# contents: "The quick brown fox jumped over the lazy dog"
# path: "/tmp/fox.txt"
# }
# myjsonfile {
# contents: {
# some {
# nested {
# value: [1, 2, 3, 4]
# }
# }
# }
# path: "/tmp/test.json"
# target_format: json
# }
# }
# Specifies a custom environment setup script to be executed instead of installing a virtual environment.
# If provided, this script is executed following Git cloning. The script command may include environment variables and
# will be expanded before execution (e.g. "$CLEARML_GIT_ROOT/script.sh").
# The script can also be specified using the CLEARML_AGENT_CUSTOM_BUILD_SCRIPT environment variable.
#
# When running the script, the following environment variables will be set:
# - CLEARML_CUSTOM_BUILD_TASK_CONFIG_JSON: specifies a path to a temporary file containing the complete task
# contents in JSON format
# - CLEARML_TASK_SCRIPT_ENTRY: task entrypoint script as defined in the task's script section
# - CLEARML_TASK_WORKING_DIR: task working directory as defined in the task's script section
# - CLEARML_VENV_PATH: path to the agent's default virtual environment path (as defined in the configuration)
# - CLEARML_GIT_ROOT: path to the cloned Git repository
# - CLEARML_CUSTOM_BUILD_OUTPUT: a path to a non-existing file that may be created by the script. If created,
# this file must be in the following JSON format:
# ```json
# {
# "binary": "/absolute/path/to/python-executable",
# "entry_point": "/absolute/path/to/task-entrypoint-script",
# "working_dir": "/absolute/path/to/task-working/dir"
# }
# ```
# If provided, the agent will use these instead of the predefined task script section to execute the task and will
# skip virtual environment creation.
#
# In case the custom script returns with a non-zero exit code, the agent will fail with the same exit code.
# In case the custom script is specified but does not exist, or if the custom script does not write valid content
# into the file specified in CLEARML_CUSTOM_BUILD_OUTPUT, the agent will emit a warning and continue with the
# standard flow.
custom_build_script: ""
}
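A hedged sketch of such a build script, using only the environment variables documented above (the venv layout and the requirements file path are assumptions):

```bash
#!/bin/bash
# hypothetical custom build script; replaces the agent's venv installation
set -e
python3 -m venv "$CLEARML_VENV_PATH"
"$CLEARML_VENV_PATH/bin/pip" install -r "$CLEARML_GIT_ROOT/requirements.txt"
# hand the interpreter and entry point back to the agent
cat > "$CLEARML_CUSTOM_BUILD_OUTPUT" <<EOF
{
  "binary": "$CLEARML_VENV_PATH/bin/python",
  "entry_point": "$CLEARML_GIT_ROOT/$CLEARML_TASK_SCRIPT_ENTRY",
  "working_dir": "$CLEARML_GIT_ROOT/$CLEARML_TASK_WORKING_DIR"
}
EOF
```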

View File

@@ -13,6 +13,19 @@ ENV_HOST_VERIFY_CERT = EnvEntry("CLEARML_API_HOST_VERIFY_CERT", "TRAINS_API_HOST
ENV_CONDA_ENV_PACKAGE = EnvEntry("CLEARML_CONDA_ENV_PACKAGE", "TRAINS_CONDA_ENV_PACKAGE")
ENV_NO_DEFAULT_SERVER = EnvEntry("CLEARML_NO_DEFAULT_SERVER", "TRAINS_NO_DEFAULT_SERVER", type=bool, default=True)
ENV_DISABLE_VAULT_SUPPORT = EnvEntry('CLEARML_AGENT_DISABLE_VAULT_SUPPORT', type=bool)
ENV_ENABLE_ENV_CONFIG_SECTION = EnvEntry('CLEARML_AGENT_ENABLE_ENV_CONFIG_SECTION', type=bool)
ENV_ENABLE_FILES_CONFIG_SECTION = EnvEntry('CLEARML_AGENT_ENABLE_FILES_CONFIG_SECTION', type=bool)
ENV_VENV_CONFIGURED = EnvEntry('VIRTUAL_ENV', type=str)
ENV_PROPAGATE_EXITCODE = EnvEntry("CLEARML_AGENT_PROPAGATE_EXITCODE", type=bool, default=False)
ENV_INITIAL_CONNECT_RETRY_OVERRIDE = EnvEntry(
'CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE', default=True, converter=safe_text_to_bool
)
"""
Experimental option to set the request method for all API requests and auth login.
This could be useful when GET requests with payloads are blocked by a server as
POST requests can be used instead.
However, this has not been rigorously tested and may have unintended consequences.
"""
ENV_API_DEFAULT_REQ_METHOD = EnvEntry("CLEARML_API_DEFAULT_REQ_METHOD", default="GET")
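For example (any case is accepted; see the validation in the request code below):

```bash
# route all API requests, including auth login, through POST
export CLEARML_API_DEFAULT_REQ_METHOD=POST
```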

View File

@@ -5,10 +5,17 @@ import six
from .apimodel import ApiModel
from .datamodel import DataModel
from .defs import ENV_API_DEFAULT_REQ_METHOD
if ENV_API_DEFAULT_REQ_METHOD.get().upper() not in ("GET", "POST"):
raise ValueError(
"CLEARML_API_DEFAULT_REQ_METHOD environment variable must be 'get' or 'post' (any case is allowed)."
)
class Request(ApiModel):
_method = 'get'
_method = ENV_API_DEFAULT_REQ_METHOD.get(default="get")
def __init__(self, **kwargs):
if kwargs:

View File

@@ -15,7 +15,7 @@ from six.moves.urllib.parse import urlparse, urlunparse
from .callresult import CallResult
from .defs import ENV_VERBOSE, ENV_HOST, ENV_ACCESS_KEY, ENV_SECRET_KEY, ENV_WEB_HOST, ENV_FILES_HOST, ENV_AUTH_TOKEN, \
ENV_NO_DEFAULT_SERVER, ENV_DISABLE_VAULT_SUPPORT, ENV_INITIAL_CONNECT_RETRY_OVERRIDE
ENV_NO_DEFAULT_SERVER, ENV_DISABLE_VAULT_SUPPORT, ENV_INITIAL_CONNECT_RETRY_OVERRIDE, ENV_API_DEFAULT_REQ_METHOD
from .request import Request, BatchRequest
from .token_manager import TokenManager
from ..config import load
@@ -142,7 +142,7 @@ class Session(TokenManager):
"Could not find host server definition "
"(missing `~/clearml.conf` or Environment CLEARML_API_HOST)\n"
"To get started with ClearML: setup your own `clearml-server`, "
"or create a free account at https://app.community.clear.ml and run `clearml-agent init`"
"or create a free account at https://app.clear.ml and run `clearml-agent init`"
)
self.__host = host.strip("/")
@@ -183,8 +183,6 @@ class Session(TokenManager):
# notice: this is across the board warning omission
urllib_log_warning_setup(total_retries=http_retries_config.get('total', 0), display_warning_after=3)
self._load_vaults()
def _setup_session(self, http_retries_config, initial_session=False, default_initial_connect_override=None):
# type: (dict, bool, Optional[bool]) -> (dict, requests.Session)
http_retries_config = http_retries_config or self.config.get(
@@ -208,9 +206,9 @@ class Session(TokenManager):
http_retries_config = dict(**http_retries_config)
http_retries_config['connect'] = connect_retries
return http_retries_config, get_http_session_with_retry(**http_retries_config)
return http_retries_config, get_http_session_with_retry(config=self.config or None, **http_retries_config)
def _load_vaults(self):
def load_vaults(self):
if not self.check_min_api_version("2.15") or self.feature_set == "basic":
return
@@ -242,6 +240,12 @@ class Session(TokenManager):
except Exception as ex:
print("Failed getting vaults: {}".format(ex))
def verify_feature_set(self, feature_set):
if isinstance(feature_set, str):
feature_set = [feature_set]
if self.feature_set not in feature_set:
raise ValueError('ClearML-server does not support requested feature set {}'.format(feature_set))
def _send_request(
self,
service,
@@ -611,6 +615,7 @@ class Session(TokenManager):
try:
data = {"expiration_sec": exp} if exp else {}
res = self._send_request(
method=ENV_API_DEFAULT_REQ_METHOD.get(default="get"),
service="auth",
action="login",
auth=auth,

View File

@@ -87,10 +87,16 @@ class TokenManager(object):
@classmethod
def get_decoded_token(cls, token, verify=False):
""" Get token expiration time. If not present, assume forever """
if hasattr(jwt, '__version__') and jwt.__version__[0] == '1':
return jwt.decode(
token,
verify=verify,
algorithms=get_default_algorithms(),
)
return jwt.decode(
token,
verify=verify,
options=dict(verify_signature=False),
options=dict(verify_signature=verify),
algorithms=get_default_algorithms(),
)

View File

@@ -82,7 +82,7 @@ class Config(object):
relative_to=None,
app=None,
is_server=False,
**_,
**_
):
self._app = app
self._verbose = verbose
@@ -214,7 +214,7 @@ class Config(object):
.lower()
)
result = ConfigTree.merge_configs(
result, ConfigFactory.parse_string(f"{path}: {os.environ[key]}")
result, ConfigFactory.parse_string("{}: {}".format(path, os.environ[key]))
)
return result

View File

@@ -1,3 +1,14 @@
import base64
import os
from os.path import expandvars, expanduser
from pathlib import Path
from typing import List, TYPE_CHECKING
from pyhocon import HOCONConverter, ConfigTree
if TYPE_CHECKING:
from .config import Config
def get_items(cls):
""" get key/value items from an enum-like class (members represent enumeration key/value) """
@@ -7,3 +18,95 @@ def get_items(cls):
def get_options(cls):
""" get options from an enum-like class (members represent enumeration key/value) """
return get_items(cls).values()
def apply_environment(config):
# type: (Config) -> List[str]
env_vars = config.get("environment", None)
if not env_vars:
return []
if isinstance(env_vars, (list, tuple)):
env_vars = dict(env_vars)
keys = list(filter(None, env_vars.keys()))
for key in keys:
os.environ[str(key)] = str(env_vars[key] or "")
return keys
def apply_files(config):
# type: (Config) -> None
files = config.get("files", None)
if not files:
return
if isinstance(files, (list, tuple)):
files = dict(files)
print("Creating files from configuration")
for key, data in files.items():
path = data.get("path")
fmt = data.get("format", "string")
target_fmt = data.get("target_format", "string")
overwrite = bool(data.get("overwrite", True))
contents = data.get("contents")
target = Path(expanduser(expandvars(path)))
# noinspection PyBroadException
try:
if target.is_dir():
print("Skipped [{}]: is a directory {}".format(key, target))
continue
if not overwrite and target.is_file():
print("Skipped [{}]: file exists {}".format(key, target))
continue
except Exception as ex:
print("Skipped [{}]: can't access {} ({})".format(key, target, ex))
continue
if contents:
try:
if fmt == "base64":
contents = base64.b64decode(contents)
if target_fmt != "bytes":
contents = contents.decode("utf-8")
except Exception as ex:
print("Skipped [{}]: failed decoding {} ({})".format(key, fmt, ex))
continue
# noinspection PyBroadException
try:
target.parent.mkdir(parents=True, exist_ok=True)
except Exception as ex:
print("Skipped [{}]: failed creating path {} ({})".format(key, target.parent, ex))
continue
try:
if target_fmt == "bytes":
try:
target.write_bytes(contents)
except TypeError:
# simpler error so the user won't get confused
raise TypeError("a bytes-like object is required")
else:
try:
if target_fmt == "json":
text = HOCONConverter.to_json(contents)
elif target_fmt in ("yaml", "yml"):
text = HOCONConverter.to_yaml(contents)
else:
if isinstance(contents, ConfigTree):
contents = contents.as_plain_ordered_dict()
text = str(contents)
except Exception as ex:
print("Skipped [{}]: failed encoding to {} ({})".format(key, target_fmt, ex))
continue
target.write_text(text)
print("Saved [{}]: {}".format(key, target))
except Exception as ex:
print("Skipped [{}]: failed saving file {} ({})".format(key, target, ex))
continue

View File

@@ -347,7 +347,7 @@ class ServiceCommandSection(BaseCommandSection):
except AttributeError:
raise NameResolutionError('Name resolution unavailable for {}'.format(service))
request = request_cls.from_dict(dict(name=name, only_fields=['name', 'id']))
request = request_cls.from_dict(dict(name=re.escape(name), only_fields=['name', 'id']))
# from_dict will ignore unrecognised keyword arguments - not all GetAll's have only_fields
response = getattr(self._session.send_api(request), service)
matches = [db_object for db_object in response if name.lower() == db_object.name.lower()]

View File

@@ -11,10 +11,10 @@ from clearml_agent.backend_config.defs import LOCAL_CONFIG_FILES
description = """
Please create new clearml credentials through the profile page in your `clearml-server` web app,
or create a free account at https://app.community.clear.ml/profile
Please create new clearml credentials through the settings page in your `clearml-server` web app,
or create a free account at https://app.clear.ml/settings/webapp-configuration
In the profile page, press "Create new credentials", then press "Copy to clipboard".
In the settings > workspace page, press "Create new credentials", then press "Copy to clipboard".
Paste copied configuration here:
"""
@@ -27,9 +27,9 @@ except Exception:
host_description = """
Editing configuration file: {CONFIG_FILE}
Enter the url of the clearml-server's Web service, for example: {HOST}
Enter the url of the clearml-server's Web service, for example: {HOST} or https://app.clear.ml
""".format(
CONFIG_FILE=LOCAL_CONFIG_FILES[0],
CONFIG_FILE=LOCAL_CONFIG_FILES[-1],
HOST=def_host,
)
@@ -84,7 +84,7 @@ def main():
host = input_url('API Host', api_server)
else:
print(host_description)
host = input_url('WEB Host', '')
host = input_url('WEB Host', 'https://app.clear.ml')
parsed_host = verify_url(host)
api_host, files_host, web_host = parse_host(parsed_host, allow_input=True)
@@ -116,9 +116,15 @@ def main():
print('Enter git username for repository cloning (leave blank for SSH key authentication): [] ', end='')
git_user = input()
if git_user.strip():
print('Enter password for user \'{}\': '.format(git_user), end='')
print(
"Git personal token is equivalent to a password, to learn how to generate a token:\n"
" GitHub: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token\n" # noqa
" Bitbucket: https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/\n"
" GitLab: https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html\n"
)
print('Enter git personal token for user \'{}\': '.format(git_user), end='')
git_pass = input()
print('Git repository cloning will be using user={} password={}'.format(git_user, git_pass))
print('Git repository cloning will be using user={} token={}'.format(git_user, git_pass))
else:
git_user = None
git_pass = None
@@ -157,7 +163,7 @@ def main():
' api_server: %s\n' \
' web_server: %s\n' \
' files_server: %s\n' \
' # Credentials are generated using the webapp, %s/profile\n' \
' # Credentials are generated using the webapp, %s/settings\n' \
' # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY\n' \
' credentials {"access_key": "%s", "secret_key": "%s"}\n' \
'}\n\n' % (api_host, web_host, files_host,

View File

@@ -0,0 +1,166 @@
import json
import re
import shlex
from clearml_agent.helper.package.requirements import (
RequirementsManager, MarkerRequirement,
compare_version_rules, )
def resolve_default_container(session, task_id, container_config):
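# Pick a default docker image/arguments for the task by matching it against
# agent.default_docker.match_rules: each rule's regexes (project, repository,
# branch, binary, container) and package version rules are checked, and the
# first matching rule supplies the defaults (explicit task settings win).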
container_lookup = session.config.get('agent.default_docker.match_rules', None)
if not session.check_min_api_version("2.13") or not container_lookup:
return container_config
# check backend support before sending any more requests (because they will fail and crash the Task)
try:
session.verify_feature_set('advanced')
except ValueError:
return container_config
result = session.send_request(
service='tasks',
action='get_all',
version='2.14',
json={'id': [task_id],
'only_fields': ['script.requirements', 'script.binary',
'script.repository', 'script.branch',
'project', 'container'],
'search_hidden': True},
method='get',
async_enable=False,
)
try:
task_info = result.json()['data']['tasks'][0] if result.ok else {}
except (ValueError, TypeError):
return container_config
from clearml_agent.external.requirements_parser.requirement import Requirement
# store tasks repository
repository = task_info.get('script', {}).get('repository') or ''
branch = task_info.get('script', {}).get('branch') or ''
binary = task_info.get('script', {}).get('binary') or ''
requested_container = task_info.get('container', {})
# get project full path
project_full_name = ''
if task_info.get('project', None):
result = session.send_request(
service='projects',
action='get_all',
version='2.13',
json={
'id': [task_info.get('project')],
'only_fields': ['name'],
},
method='get',
async_enable=False,
)
try:
if result.ok:
project_full_name = result.json()['data']['projects'][0]['name'] or ''
except (ValueError, TypeError):
pass
task_packages_lookup = {}
for entry in container_lookup:
match = entry.get('match', None)
if not match:
continue
if match.get('project', None):
# noinspection PyBroadException
try:
if not re.search(match.get('project', None), project_full_name):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('project', None), entry))
continue
if match.get('script.repository', None):
# noinspection PyBroadException
try:
if not re.search(match.get('script.repository', None), repository):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('script.repository', None), entry))
continue
if match.get('script.branch', None):
# noinspection PyBroadException
try:
if not re.search(match.get('script.branch', None), branch):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('script.branch', None), entry))
continue
if match.get('script.binary', None):
# noinspection PyBroadException
try:
if not re.search(match.get('script.binary', None), binary):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('script.binary', None), entry))
continue
if match.get('container', None):
# noinspection PyBroadException
try:
if not re.search(match.get('container', None), requested_container.get('image', '')):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('container', None), entry))
continue
matched = True
for req_section in ['script.requirements.pip', 'script.requirements.conda']:
if not match.get(req_section, None):
continue
match_pip_reqs = [MarkerRequirement(Requirement.parse('{} {}'.format(k, v)))
for k, v in match.get(req_section, None).items()]
if not task_packages_lookup.get(req_section):
req_section_parts = req_section.split('.')
task_packages_lookup[req_section] = \
RequirementsManager.parse_requirements_section_to_marker_requirements(
requirements=task_info.get(req_section_parts[0], {}).get(
req_section_parts[1], {}).get(req_section_parts[2], None)
)
matched_all_reqs = True
for mr in match_pip_reqs:
matched_req = False
for pr in task_packages_lookup[req_section]:
if mr.req.name != pr.req.name:
continue
if compare_version_rules(mr.specs, pr.specs):
matched_req = True
break
if not matched_req:
matched_all_reqs = False
break
# if we have a match, check the next requirements section
if matched_all_reqs:
continue
# no match, stop checking this rule
matched = False
break
if matched:
if not container_config.get('container'):
container_config['container'] = entry.get('image', None)
if not container_config.get('arguments'):
container_config['arguments'] = entry.get('arguments', None)
container_config['arguments'] = shlex.split(str(container_config.get('arguments') or '').strip())
print('Matching default container with rule:\n{}'.format(json.dumps(entry)))
return container_config
return container_config
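For reference, a hypothetical `agent.default_docker.match_rules` entry evaluated by this resolver could look like the following sketch (field names follow the lookups above; all match values are regular expressions, and every concrete value here is illustrative only):

```
agent.default_docker {
    image: "python:3.9-bullseye"
    match_rules: [
        {
            image: "nvidia/cuda:11.2.2-runtime-ubuntu20.04"
            arguments: "--ipc=host"
            match: {
                # every specified field must match for the rule to apply
                project: "^examples/"
                script {
                    repository: "github\\.com/allegroai"
                    branch: "^main$"
                    requirements { pip { torch: ">=1.10" } }
                }
            }
        }
    ]
}
```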

File diff suppressed because it is too large

View File

@@ -126,6 +126,7 @@ DEFAULT_VENV_UPDATE_URL = (
"https://raw.githubusercontent.com/Yelp/venv-update/v3.2.4/venv_update.py"
)
WORKING_REPOSITORY_DIR = "task_repository"
WORKING_STANDALONE_DIR = "code"
DEFAULT_VCS_CACHE = normalize_path(CONFIG_DIR, "vcs-cache")
PIP_EXTRA_INDICES = [
]
@@ -134,6 +135,7 @@ ENV_DOCKER_IMAGE = EnvironmentConfig('CLEARML_DOCKER_IMAGE', 'TRAINS_DOCKER_IMAG
ENV_WORKER_ID = EnvironmentConfig('CLEARML_WORKER_ID', 'TRAINS_WORKER_ID')
ENV_WORKER_TAGS = EnvironmentConfig('CLEARML_WORKER_TAGS')
ENV_AGENT_SKIP_PIP_VENV_INSTALL = EnvironmentConfig('CLEARML_AGENT_SKIP_PIP_VENV_INSTALL')
ENV_AGENT_SKIP_PYTHON_ENV_INSTALL = EnvironmentConfig('CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL', type=bool)
ENV_DOCKER_SKIP_GPUS_FLAG = EnvironmentConfig('CLEARML_DOCKER_SKIP_GPUS_FLAG', 'TRAINS_DOCKER_SKIP_GPUS_FLAG')
ENV_AGENT_GIT_USER = EnvironmentConfig('CLEARML_AGENT_GIT_USER', 'TRAINS_AGENT_GIT_USER')
ENV_AGENT_GIT_PASS = EnvironmentConfig('CLEARML_AGENT_GIT_PASS', 'TRAINS_AGENT_GIT_PASS')
@@ -146,6 +148,39 @@ ENV_DOCKER_HOST_MOUNT = EnvironmentConfig('CLEARML_AGENT_K8S_HOST_MOUNT', 'CLEAR
'TRAINS_AGENT_K8S_HOST_MOUNT', 'TRAINS_AGENT_DOCKER_HOST_MOUNT')
ENV_VENV_CACHE_PATH = EnvironmentConfig('CLEARML_AGENT_VENV_CACHE_PATH')
ENV_EXTRA_DOCKER_ARGS = EnvironmentConfig('CLEARML_AGENT_EXTRA_DOCKER_ARGS', type=list)
ENV_DEBUG_INFO = EnvironmentConfig('CLEARML_AGENT_DEBUG_INFO')
ENV_CUSTOM_BUILD_SCRIPT = EnvironmentConfig('CLEARML_AGENT_CUSTOM_BUILD_SCRIPT')
"""
Specifies a custom environment setup script to be executed instead of installing a virtual environment.
If provided, this script is executed after Git cloning. The script command may include environment variables,
which will be expanded before execution (e.g. "$CLEARML_GIT_ROOT/script.sh").
The script can also be specified using the `agent.custom_build_script` configuration setting.
When running the script, the following environment variables will be set:
- CLEARML_CUSTOM_BUILD_TASK_CONFIG_JSON: a path to a temporary file containing the complete task
contents in JSON format
- CLEARML_TASK_SCRIPT_ENTRY: task entrypoint script as defined in the task's script section
- CLEARML_TASK_WORKING_DIR: task working directory as defined in the task's script section
- CLEARML_VENV_PATH: path to the agent's default virtual environment path (as defined in the configuration)
- CLEARML_GIT_ROOT: path to the cloned Git repository
- CLEARML_CUSTOM_BUILD_OUTPUT: a path to a non-existing file that may be created by the script. If created,
this file must be in the following JSON format:
```json
{
"binary": "/absolute/path/to/python-executable",
"entry_point": "/absolute/path/to/task-entrypoint-script",
"working_dir": "/absolute/path/to/task-working/dir"
}
```
If provided, the agent will use these instead of the predefined task script section to execute the task and will
skip virtual environment creation.
If the custom script exits with a non-zero exit code, the agent will fail with the same exit code.
If the custom script is specified but does not exist, or does not write valid content into the file specified
by CLEARML_CUSTOM_BUILD_OUTPUT, the agent will emit a warning and continue with the
standard flow.
"""
class FileBuffering(IntEnum):

View File

@@ -84,3 +84,13 @@ class MissingPackageError(CommandFailedError):
def __str__(self):
return '{self.__class__.__name__}: ' \
'"{self.name}" package is required. Please run "pip install {self.name}"'.format(self=self)
class CustomBuildScriptFailed(CommandFailedError):
def __init__(self, errno, *args, **kwargs):
super(CustomBuildScriptFailed, self).__init__(*args, **kwargs)
self.errno = errno
class SkippedCustomBuildScript(CommandFailedError):
pass

View File

@@ -1,6 +1,9 @@
import os
import re
import warnings
from clearml_agent.definitions import PIP_EXTRA_INDICES
from .requirement import Requirement
@@ -42,9 +45,14 @@ def parse(reqstr, cwd=None):
yield requirement
elif line.startswith('-f') or line.startswith('--find-links') or \
line.startswith('-i') or line.startswith('--index-url') or \
line.startswith('--extra-index-url') or \
line.startswith('--no-index'):
warnings.warn('Private repos not supported. Skipping.')
elif line.startswith('--extra-index-url'):
extra_index = line[len('--extra-index-url'):].strip()
extra_index = re.sub(r"\s+#.*$", "", extra_index) # strip comments
if extra_index and extra_index not in PIP_EXTRA_INDICES:
PIP_EXTRA_INDICES.append(extra_index)
print(f"appended {extra_index} to list of extra pip indices")
continue
elif line.startswith('-Z') or line.startswith('--always-unzip'):
warnings.warn('Unused option --always-unzip. Skipping.')

View File

@@ -0,0 +1,7 @@
from clearml_agent.definitions import EnvironmentConfig
ENV_START_AGENT_SCRIPT_PATH = EnvironmentConfig('CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH')
"""
Path to use when creating the bash script that runs the agent inside the scheduled pod's docker container.
The script is appended to the specified file.
"""

View File

@@ -11,6 +11,7 @@ import subprocess
import tempfile
from copy import deepcopy
from pathlib import Path
from pprint import pformat
from threading import Thread
from time import sleep
from typing import Text, List, Callable, Any, Collection, Optional, Union
@@ -18,7 +19,7 @@ from typing import Text, List, Callable, Any, Collection, Optional, Union
import yaml
from clearml_agent.commands.events import Events
from clearml_agent.commands.worker import Worker, get_task_container, set_task_container
from clearml_agent.commands.worker import Worker, get_task_container, set_task_container, get_next_task
from clearml_agent.definitions import ENV_DOCKER_IMAGE
from clearml_agent.errors import APIError
from clearml_agent.helper.base import safe_remove_file
@@ -27,6 +28,8 @@ from clearml_agent.helper.process import get_bash_output
from clearml_agent.helper.resource_monitor import ResourceMonitor
from clearml_agent.interface.base import ObjectID
from .definitions import ENV_START_AGENT_SCRIPT_PATH
class K8sIntegration(Worker):
K8S_PENDING_QUEUE = "k8s_scheduler"
@@ -69,7 +72,7 @@ class K8sIntegration(Worker):
"apt-get update",
"apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0",
"declare LOCAL_PYTHON",
"for i in {{10..5}}; do which python3.$i && python3.$i -m pip --version && "
"[ ! -z $LOCAL_PYTHON ] || for i in {{15..5}}; do which python3.$i && python3.$i -m pip --version && "
"export LOCAL_PYTHON=$(which python3.$i) && break ; done",
"[ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip",
"[ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3",
@@ -119,7 +122,7 @@ class K8sIntegration(Worker):
when scheduling a task to run in a pod. Callable can receive an optional pod number and should return
a dictionary of user properties (name and value). Signature is [[Optional[int]], Dict[str,str]]
:param str overrides_yaml: YAML file containing the overrides for the pod (optional)
:param str template_yaml: YAML file containing the template for the pod (optional).
If provided, the pod is scheduled with kubectl apply and overrides are ignored; otherwise kubectl run is used.
:param str clearml_conf_file: clearml.conf file to be used by the pod itself (optional)
:param str extra_bash_init_script: Additional bash script to run before starting the Task inside the container
@@ -128,6 +131,7 @@ class K8sIntegration(Worker):
"""
super(K8sIntegration, self).__init__()
self.k8s_pending_queue_name = k8s_pending_queue_name or self.K8S_PENDING_QUEUE
self.k8s_pending_queue_id = None
self.kubectl_cmd = kubectl_cmd or self.KUBECTL_RUN_CMD
self.container_bash_script = container_bash_script or self.CONTAINER_BASH_SCRIPT
# Always use system packages, because we will be running inside a docker container
@@ -135,7 +139,8 @@ class K8sIntegration(Worker):
# Add debug logging
if debug:
self.log.logger.disabled = False
self.log.logger.setLevel(logging.INFO)
self.log.logger.setLevel(logging.DEBUG)
self.log.logger.addHandler(logging.StreamHandler())
self.ports_mode = ports_mode
self.num_of_services = num_of_services
self.base_pod_num = base_pod_num
@@ -152,8 +157,7 @@ class K8sIntegration(Worker):
self.pod_requests = []
self.max_pods_limit = max_pods_limit if not self.ports_mode else None
if overrides_yaml:
with open(os.path.expandvars(os.path.expanduser(str(overrides_yaml))), 'rt') as f:
overrides = yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
overrides = self._load_template_file(overrides_yaml)
if overrides:
containers = overrides.get('spec', {}).get('containers', [])
for c in containers:
@@ -174,8 +178,7 @@ class K8sIntegration(Worker):
self.log.warning('Removing containers section: {}'.format(overrides['spec'].pop('containers')))
self.overrides_json_string = json.dumps(overrides)
if template_yaml:
with open(os.path.expandvars(os.path.expanduser(str(template_yaml))), 'rt') as f:
self.template_dict = yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
self.template_dict = self._load_template_file(template_yaml)
clearml_conf_file = clearml_conf_file or kwargs.get('trains_conf_file')
@@ -194,6 +197,11 @@ class K8sIntegration(Worker):
_check_pod_thread.daemon = True
_check_pod_thread.start()
@staticmethod
def _load_template_file(path):
with open(os.path.expandvars(os.path.expanduser(str(path))), 'rt') as f:
return yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
@staticmethod
def _get_path(d, *path, default=None):
try:
@@ -203,13 +211,27 @@ class K8sIntegration(Worker):
except (IndexError, KeyError):
return default
def _get_kubectl_options(self, command, extra_labels=None):
labels = [self._get_agent_label()] + (list(extra_labels) if extra_labels else [])
return {
"-l": ",".join(labels),
"-n": str(self.namespace),
"-o": "json"
}
def get_kubectl_command(self, command, extra_labels=None):
opts = self._get_kubectl_options(command, extra_labels)
return 'kubectl {command} {opts}'.format(
command=command, opts=" ".join(x for item in opts.items() for x in item)
)
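# Illustrative result, assuming agent label "clearml-agent=glue" and namespace "clearml":
#   self.get_kubectl_command("get pods")
#   -> 'kubectl get pods -l clearml-agent=glue -n clearml -o json'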
def _monitor_hanging_pods_daemon(self):
last_tasks_msgs = {} # last msg updated for every task
while True:
output = get_bash_output('kubectl get pods -n {namespace} -o=JSON'.format(
namespace=self.namespace
))
kubectl_cmd = self.get_kubectl_command("get pods")
self.log.debug("Detecting hanging pods: {}".format(kubectl_cmd))
output = get_bash_output(kubectl_cmd)
output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
try:
output_config = json.loads(output)
@@ -231,6 +253,10 @@ class K8sIntegration(Worker):
if not task_id:
continue
namespace = pod.get('metadata', {}).get('namespace', None)
if not namespace:
continue
task_ids.add(task_id)
msg = None
@@ -250,7 +276,7 @@ class K8sIntegration(Worker):
msg = reason + (" ({})".format(message) if message else "")
if reason == 'ImagePullBackOff':
delete_pod_cmd = 'kubectl delete pods {} -n {}'.format(pod_name, self.namespace)
delete_pod_cmd = 'kubectl delete pods {} -n {}'.format(pod_name, namespace)
get_bash_output(delete_pod_cmd)
try:
self._session.api_client.tasks.failed(
@@ -336,13 +362,11 @@ class K8sIntegration(Worker):
return self._agent_label
def _get_number_used_pods(self):
def _get_used_pods(self):
# noinspection PyBroadException
try:
kubectl_cmd_new = "kubectl get pods -l {agent_label} -n {namespace} -o json".format(
agent_label=self._get_agent_label(),
namespace=self.namespace,
)
kubectl_cmd_new = self.get_kubectl_command("get pods")
self.log.debug("Getting used pods: {}".format(kubectl_cmd_new))
process = subprocess.Popen(kubectl_cmd_new.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()
output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
@@ -350,36 +374,39 @@ class K8sIntegration(Worker):
if not output:
# No such pod exists, so we can use the pod_number we found
return 0
return 0, {}
try:
current_pod_count = len(json.loads(output).get("items", []))
except (ValueError, TypeError) as ex:
return -1
items = json.loads(output).get("items", [])
current_pod_count = len(items)
namespaces = {item["metadata"]["namespace"] for item in items}
except (KeyError, ValueError, TypeError, AttributeError) as ex:
print("Failed parsing used pods command response for cleanup: {}".format(ex))
return -1, {}
return current_pod_count
return current_pod_count, namespaces
except Exception as ex:
print('Failed getting number of used pods: {}'.format(ex))
return -2
print('Failed obtaining used pods information: {}'.format(ex))
return -2, {}
def run_one_task(self, queue: Text, task_id: Text, worker_args=None, **_):
def run_one_task(self, queue: Text, task_id: Text, worker_args=None, task_session=None, **_):
print('Pulling task {} launching on kubernetes cluster'.format(task_id))
task_data = self._session.api_client.tasks.get_all(id=[task_id])[0]
# push task into the k8s queue, so we have visibility on pending tasks in the k8s scheduler
try:
print('Pushing task {} into temporary pending queue'.format(task_id))
res = self._session.api_client.tasks.stop(task_id, force=True)
_ = self._session.api_client.tasks.stop(task_id, force=True)
res = self._session.api_client.tasks.enqueue(
task_id,
queue=self.k8s_pending_queue_name,
queue=self.k8s_pending_queue_id,
status_reason='k8s pending scheduler',
)
if res.meta.result_code != 200:
raise Exception(res.meta.result_msg)
except Exception as e:
self.log.error("ERROR: Could not push back task [{}] to k8s pending queue [{}], error: {}".format(
task_id, self.k8s_pending_queue_name, e))
self.log.error("ERROR: Could not push back task [{}] to k8s pending queue {} [{}], error: {}".format(
task_id, self.k8s_pending_queue_name, self.k8s_pending_queue_id, e))
return
container = get_task_container(self._session, task_id)
@@ -398,11 +425,19 @@ class K8sIntegration(Worker):
self.conf_file_content
or Path(self._session._config_file).read_text()
).encode("ascii")
create_clearml_conf = "echo '{}' | base64 --decode >> ~/clearml.conf".format(
create_clearml_conf = ["echo '{}' | base64 --decode >> ~/clearml.conf".format(
base64.b64encode(
hocon_config_encoded
).decode('ascii')
)
)]
if task_session:
create_clearml_conf.append(
"export CLEARML_AUTH_TOKEN=$(echo '{}' | base64 --decode)".format(
base64.b64encode(task_session.token.encode("ascii")).decode('ascii')
)
)
if self.ports_mode:
print("Kubernetes looking for available pod to use")
@@ -418,39 +453,36 @@ class K8sIntegration(Worker):
pod_number = self.base_pod_num
while self.ports_mode or self.max_pods_limit:
pod_number = self.base_pod_num + pod_count
if self.ports_mode:
kubectl_cmd_new = "kubectl get pods -l {pod_label},{agent_label} -n {namespace}".format(
pod_label=self.LIMIT_POD_LABEL.format(pod_number=pod_number),
agent_label=self._get_agent_label(),
namespace=self.namespace,
)
else:
kubectl_cmd_new = "kubectl get pods -l {agent_label} -n {namespace} -o json".format(
agent_label=self._get_agent_label(),
namespace=self.namespace,
)
kubectl_cmd_new = self.get_kubectl_command(
"get pods",
extra_labels=[self.LIMIT_POD_LABEL.format(pod_number=pod_number)] if self.ports_mode else None
)
self.log.debug("Looking for a free pod/port: {}".format(kubectl_cmd_new))
process = subprocess.Popen(kubectl_cmd_new.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()
output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
error = '' if not error else error if isinstance(error, str) else error.decode('utf-8')
if not output:
# No such pod exist so we can use the pod_number we found
try:
items_count = len(json.loads(output).get("items", []))
except (ValueError, TypeError) as ex:
self.log.warning(
"K8S Glue pods monitor: Failed parsing kubectl output:\n{}\ntask '{}' "
"will be enqueued back to queue '{}'\nEx: {}".format(
output, task_id, queue, ex
)
)
self._session.api_client.tasks.stop(task_id, force=True)
self._session.api_client.tasks.enqueue(task_id, queue=queue, status_reason='kubectl parsing error')
return
if not items_count:
# No such pod exists, so we can use the pod_number we found (result exists but with no items)
break
if self.max_pods_limit:
try:
current_pod_count = len(json.loads(output).get("items", []))
except (ValueError, TypeError) as ex:
self.log.warning(
"K8S Glue pods monitor: Failed parsing kubectl output:\n{}\ntask '{}' "
"will be enqueued back to queue '{}'\nEx: {}".format(
output, task_id, queue, ex
)
)
self._session.api_client.tasks.stop(task_id, force=True)
self._session.api_client.tasks.enqueue(task_id, queue=queue, status_reason='kubectl parsing error')
return
current_pod_count = items_count
max_count = self.max_pods_limit
else:
current_pod_count = pod_count
@@ -475,10 +507,9 @@ class K8sIntegration(Worker):
break
pod_count += 1
labels = ([self.LIMIT_POD_LABEL.format(pod_number=pod_number)] if self.ports_mode else []) + \
[self._get_agent_label()]
labels.append("clearml-agent-queue={}".format(self._safe_k8s_label_value(queue)))
labels.append("clearml-agent-queue-name={}".format(self._safe_k8s_label_value(queue_name)))
labels = self._get_pod_labels(queue, queue_name)
if self.ports_mode:
labels.append(self.LIMIT_POD_LABEL.format(pod_number=pod_number))
if self.ports_mode:
print("Kubernetes scheduling task id={} on pod={} (pod_count={})".format(task_id, pod_number, pod_count))
@@ -495,8 +526,14 @@ class K8sIntegration(Worker):
queue=queue
)
if self.template_dict:
output, error = self._kubectl_apply(**kubectl_kwargs)
try:
template = self._resolve_template(task_session, task_data, queue)
except Exception as ex:
print("ERROR: Failed resolving template (skipping): {}".format(ex))
return
if template:
output, error = self._kubectl_apply(template=template, **kubectl_kwargs)
else:
output, error = self._kubectl_run(task_data=task_data, **kubectl_kwargs)
@@ -532,6 +569,13 @@ class K8sIntegration(Worker):
**user_props
)
def _get_pod_labels(self, queue, queue_name):
return [
self._get_agent_label(),
"clearml-agent-queue={}".format(self._safe_k8s_label_value(queue)),
"clearml-agent-queue-name={}".format(self._safe_k8s_label_value(queue_name))
]
def _get_docker_args(self, docker_args, flags, target=None, convert=None):
# type: (List[str], Collection[str], Optional[str], Callable[[str], Any]) -> Union[dict, List[str]]
"""
@@ -558,8 +602,16 @@ class K8sIntegration(Worker):
return {target: results} if results else {}
return results
def _kubectl_apply(self, create_clearml_conf, docker_image, docker_args, docker_bash, labels, queue, task_id):
template = deepcopy(self.template_dict)
def _kubectl_apply(
self, create_clearml_conf, docker_image, docker_args, docker_bash, labels, queue, task_id, template=None
):
template = template or deepcopy(self.template_dict)
try:
namespace = template['metadata']['namespace'] or self.namespace
except (KeyError, TypeError, AttributeError):
namespace = self.namespace
template.setdefault('apiVersion', 'v1')
template['kind'] = 'Pod'
template.setdefault('metadata', {})
@@ -594,19 +646,25 @@ class K8sIntegration(Worker):
extra_docker_bash_script=extra_docker_bash_script)
for line in container_bash_script])
create_init_script = \
"echo '{}' | base64 --decode >> ~/__start_agent__.sh ; " \
"/bin/bash ~/__start_agent__.sh".format(
base64.b64encode(
extra_bash_commands = list(create_clearml_conf or [])
start_agent_script_path = ENV_START_AGENT_SCRIPT_PATH.get() or "~/__start_agent__.sh"
extra_bash_commands.append(
"echo '{content}' | base64 --decode >> {script_path} ; /bin/bash {script_path}".format(
content=base64.b64encode(
script_encoded.encode('ascii')
).decode('ascii'))
).decode('ascii'),
script_path=start_agent_script_path
)
)
# Notice: we always leave with exit code 0, so pods are never restarted
container = self._merge_containers(
container,
dict(name=name, image=docker_image,
command=['/bin/bash'],
args=['-c', '{} ; {} ; exit 0'.format(create_clearml_conf, create_init_script)])
args=['-c', '{} ; exit 0'.format(' ; '.join(extra_bash_commands))])
)
if template['spec']['containers']:
@@ -623,11 +681,13 @@ class K8sIntegration(Worker):
with open(yaml_file, 'wt') as f:
yaml.dump(template, f)
self.log.debug("Applying template:\n{}".format(pformat(template, indent=2)))
kubectl_cmd = self.KUBECTL_APPLY_CMD.format(
task_id=task_id,
docker_image=docker_image,
queue_id=queue,
namespace=self.namespace
namespace=namespace
)
# make sure we provide a list
if isinstance(kubectl_cmd, str):
@@ -685,7 +745,7 @@ class K8sIntegration(Worker):
"--",
"/bin/sh",
"-c",
"{} ; {}".format(create_clearml_conf, container_bash_script.format(
"{} ; {}".format(" ; ".join(create_clearml_conf or []), container_bash_script.format(
extra_bash_init_cmd=self.extra_bash_init_script or "",
extra_docker_bash_script=docker_bash or "",
task_id=task_id
@@ -709,26 +769,29 @@ class K8sIntegration(Worker):
events_service = self.get_service(Events)
# make sure we have a k8s pending queue
# noinspection PyBroadException
try:
self._session.api_client.queues.create(self.k8s_pending_queue_name)
except Exception:
pass
# get queue id
self.k8s_pending_queue_name = self._resolve_name(self.k8s_pending_queue_name, "queues")
if not self.k8s_pending_queue_id:
resolved_ids = self._resolve_queue_names([self.k8s_pending_queue_name], create_if_missing=True)
if not resolved_ids:
raise ValueError(
"Failed resolving or creating k8s pending queue {}".format(self.k8s_pending_queue_name)
)
self.k8s_pending_queue_id = resolved_ids[0]
_last_machine_update_ts = 0
while True:
# Get used pods and namespaces
current_pods, namespaces = self._get_used_pods()
# check if we have a pod limit, and if so, whether we hit it
if self.max_pods_limit:
current_pods = self._get_number_used_pods()
if current_pods >= self.max_pods_limit:
print("Maximum pod limit reached {}/{}, sleeping for {:.1f} seconds".format(
current_pods, self.max_pods_limit, self._polling_interval))
# delete old completed / failed pods
get_bash_output(
self.KUBECTL_DELETE_CMD.format(namespace=self.namespace, selector=self._get_agent_label())
)
for namespace in namespaces:
kubectl_cmd = self.KUBECTL_DELETE_CMD.format(namespace=namespace, selector=self._get_agent_label())
self.log.debug("Deleting old/failed pods due to pod limit: {}".format(kubectl_cmd))
get_bash_output(kubectl_cmd)
# go to sleep
sleep(self._polling_interval)
continue
@@ -736,20 +799,23 @@ class K8sIntegration(Worker):
# iterate over queues (priority style, queues[0] is highest)
for queue in queues:
# delete old completed / failed pods
get_bash_output(
self.KUBECTL_DELETE_CMD.format(namespace=self.namespace, selector=self._get_agent_label())
)
for namespace in namespaces:
kubectl_cmd = self.KUBECTL_DELETE_CMD.format(namespace=namespace, selector=self._get_agent_label())
self.log.debug("Deleting old/failed pods: {}".format(kubectl_cmd))
get_bash_output(kubectl_cmd)
# get next task in queue
try:
response = self._session.api_client.queues.get_next_task(queue=queue)
response = self._get_next_task(queue=queue, get_task_info=self._impersonate_as_task_owner)
except Exception as e:
print("Warning: Could not access task queue [{}], error: {}".format(queue, e))
continue
else:
if not response:
continue
try:
task_id = response.entry.task
except AttributeError:
task_id = response["entry"]["task"]
except (KeyError, TypeError, AttributeError):
print("No tasks in queue {}".format(queue))
continue
events_service.send_log_events(
@@ -761,8 +827,26 @@ class K8sIntegration(Worker):
level="INFO",
)
task_session = None
if self._impersonate_as_task_owner:
try:
task_user = response["task_info"]["user"]
task_company = response["task_info"]["company"]
except (KeyError, TypeError, AttributeError):
print("Error: cannot retrieve owner user for the task '{}', skipping".format(task_id))
continue
task_session = self.get_task_session(task_user, task_company)
if not task_session:
print(
"Error: Could not login as the user '{}' for the task '{}', skipping".format(
task_user, task_id
)
)
continue
self.report_monitor(ResourceMonitor.StatusReport(queues=queues, queue=queue, task=task_id))
self.run_one_task(queue, task_id, worker_params)
self.run_one_task(queue, task_id, worker_params, task_session)
self.report_monitor(ResourceMonitor.StatusReport(queues=self.queues))
break
else:
@@ -773,7 +857,7 @@ class K8sIntegration(Worker):
if self._session.config["agent.reload_config"]:
self.reload_config()
def k8s_daemon(self, queue):
def k8s_daemon(self, queue, **kwargs):
"""
Start the k8s Glue service.
This service will be pulling tasks from *queue* and scheduling them for execution using kubectl.
@@ -784,8 +868,19 @@ class K8sIntegration(Worker):
:param list(str) queue: queue name to pull from
"""
return self.daemon(queues=[ObjectID(name=queue)] if queue else None,
log_level=logging.INFO, foreground=True, docker=False)
return self.daemon(
queues=[ObjectID(name=queue)] if queue else None,
log_level=logging.INFO, foreground=True, docker=False, **kwargs,
)
def _get_next_task(self, queue, get_task_info):
return get_next_task(
self._session, queue=queue, get_task_info=get_task_info
)
def _resolve_template(self, task_session, task_data, queue):
if self.template_dict:
return deepcopy(self.template_dict)
@classmethod
def get_ssh_server_bash(cls, ssh_port_number):

View File

@@ -204,10 +204,13 @@ def get_python_path(script_dir, entry_point, package_api, is_conda_env=False):
["-c", "import sys; print('{}'.join(sys.path))".format(python_path_sep)])
org_python_path = python_path_cmd.get_output(cwd=script_dir)
# Add path of the script directory and executable directory
python_path = '{}{python_path_sep}{}{python_path_sep}'.format(
Path(script_dir).absolute().as_posix(),
(Path(script_dir) / Path(entry_point)).parent.absolute().as_posix(),
python_path_sep=python_path_sep)
python_path = '{}{python_path_sep}'.format(
Path(script_dir).absolute().as_posix(), python_path_sep=python_path_sep)
if entry_point:
python_path += '{}{python_path_sep}'.format(
(Path(script_dir) / Path(entry_point)).parent.absolute().as_posix(),
python_path_sep=python_path_sep)
if is_windows_platform():
python_path = python_path.replace('/', '\\')
@@ -503,6 +506,38 @@ def is_conda(config):
return config['agent.package_manager.type'].lower() == 'conda'
def convert_cuda_version_to_float_single_digit_str(cuda_version):
"""
Convert a cuda_version (string/float/int) into its float representation, e.g. 11.4
Notice: the returned value is a string with a single decimal digit only!
:return str:
"""
cuda_version = str(cuda_version or 0)
# if we have patch version we parse it here
cuda_version_parts = [int(v) for v in cuda_version.split('.')]
if len(cuda_version_parts) > 1 or cuda_version_parts[0] < 60:
cuda_version = 10 * cuda_version_parts[0]
if len(cuda_version_parts) > 1:
cuda_version += float(".{:d}".format(cuda_version_parts[1]))*10
cuda_version_full = "{:.1f}".format(float(cuda_version) / 10.)
else:
cuda_version = cuda_version_parts[0]
cuda_version_full = "{:.1f}".format(float(cuda_version) / 10.)
return cuda_version_full
def convert_cuda_version_to_int_10_base_str(cuda_version):
"""
Convert a cuda_version (string/float/int) into a 10-base integer representation, e.g. "112" for CUDA 11.2
Returned as a string
:return str:
"""
cuda_version = convert_cuda_version_to_float_single_digit_str(cuda_version)
return str(int(float(cuda_version)*10))
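# Illustrative conversions (quick sanity checks of the parsing above):
#   convert_cuda_version_to_float_single_digit_str("11.2") -> "11.2"
#   convert_cuda_version_to_float_single_digit_str(112) -> "11.2"
#   convert_cuda_version_to_int_10_base_str("11.2") -> "112"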
class NonStrictAttrs(object):
@classmethod

View File

@@ -2,7 +2,7 @@ from __future__ import unicode_literals, print_function
import csv
import sys
from collections import Iterable
from collections.abc import Iterable
from typing import List, Dict, Text, Any
from attr import attrs, attrib

View File

@@ -19,7 +19,9 @@ from clearml_agent.external.requirements_parser import parse
from clearml_agent.external.requirements_parser.requirement import Requirement
from clearml_agent.errors import CommandFailedError
from clearml_agent.helper.base import rm_tree, NonStrictAttrs, select_for_platform, is_windows_platform, ExecutionInfo
from clearml_agent.helper.base import (
rm_tree, NonStrictAttrs, select_for_platform, is_windows_platform, ExecutionInfo,
convert_cuda_version_to_float_single_digit_str, convert_cuda_version_to_int_10_base_str, )
from clearml_agent.helper.process import Argv, Executable, DEVNULL, CommandSequence, PathLike
from clearml_agent.helper.package.requirements import SimpleVersion
from clearml_agent.session import Session
@@ -167,7 +169,7 @@ class CondaAPI(PackageManager):
raise ValueError("Could not restore Conda environment, cannot find {}".format(
self.conda_pre_build_env_path))
output = Argv(
command = Argv(
self.conda,
"create",
"--yes",
@@ -175,7 +177,9 @@ class CondaAPI(PackageManager):
"--prefix",
self.path,
"python={}".format(self.python),
).get_output(stderr=DEVNULL)
)
print('Executing Conda: {}'.format(command.serialize()))
output = command.get_output(stderr=DEVNULL)
match = re.search(
r"\W*(.*activate) ({})".format(re.escape(str(self.path))), output
)
@@ -189,14 +193,6 @@ class CondaAPI(PackageManager):
if conda_env.is_file() and not is_windows_platform():
self.source = self.pip.source = CommandSequence(('source', conda_env.as_posix()), self.source)
# install cuda toolkit
# noinspection PyBroadException
try:
cuda_version = float(int(self.session.config['agent.cuda_version'])) / 10.0
if cuda_version > 0:
self._install('cudatoolkit={:.1f}'.format(cuda_version))
except Exception:
pass
return self
def _init_existing_environment(self, conda_pre_build_env_path):
@@ -428,7 +424,7 @@ class CondaAPI(PackageManager):
finally:
PackageManager._selected_manager = self
self.requirements_manager.post_install(self.session)
self.requirements_manager.post_install(self.session, package_manager=self)
def load_requirements(self, requirements):
# if we are in read only mode, do not uninstall anything
@@ -456,9 +452,18 @@ class CondaAPI(PackageManager):
requirements['conda'] = requirements['conda'].split('\n')
has_torch = False
has_matplotlib = False
has_cudatoolkit = False
cuda_version_full = 0
# noinspection PyBroadException
try:
cuda_version = int(self.session.config.get('agent.cuda_version', 0))
except:
# notice this is an integer version: 112 (means 11.2)
cuda_version = str(self.session.config.get('agent.cuda_version', "")).strip()
if not cuda_version:
cuda_version = 0
else:
cuda_version_full = convert_cuda_version_to_float_single_digit_str(cuda_version)
cuda_version = int(convert_cuda_version_to_int_10_base_str(cuda_version))
except Exception:
cuda_version = 0
# notice 'conda' entry with empty string is a valid conda requirements list, it means pip only
@@ -475,6 +480,7 @@ class CondaAPI(PackageManager):
continue
m = MarkerRequirement(marker[0])
m.validate_local_file_ref()
# conda does not support version control links
if m.vcs:
pip_requirements.append(m)
@@ -488,6 +494,19 @@ class CondaAPI(PackageManager):
if '.' not in m.specs[0][1]:
continue
if m.name.lower() == 'cudatoolkit':
# skip cuda if we are running on CPU
if not cuda_version:
continue
has_cudatoolkit = True
# cuda version, only major.minor
requested_cuda_version = '.'.join(m.specs[0][1].split('.')[:2])
# make sure that the cuda_version we support can install the requested cuda (major version)
if int(float(requested_cuda_version)) > int(float(cuda_version)/10.0):
continue
m.specs = [(m.specs[0][0], str(requested_cuda_version)), ]
conda_supported_req_names.append(m.name.lower())
if m.req.name.lower() == 'matplotlib':
has_matplotlib = True
@@ -504,6 +523,11 @@ class CondaAPI(PackageManager):
reqs.append(m)
if not has_cudatoolkit and cuda_version:
m = MarkerRequirement(Requirement.parse("cudatoolkit == {}".format(cuda_version_full)))
has_cudatoolkit = True
reqs.append(m)
# if we have a conda list, the rest should be installed with pip,
# this means any experiment that was executed with pip environment,
# will be installed using pip
@@ -517,9 +541,9 @@ class CondaAPI(PackageManager):
continue
m = MarkerRequirement(marker[0])
# skip over local files (we cannot change the version to a local file)
if m.local_file:
continue
# remove the local file reference if the file does not exist (keep the package name)
m.validate_local_file_ref()
m_name = (m.name or '').lower()
if m_name in conda_supported_req_names:
# this package is in the conda list,
@@ -559,8 +583,12 @@ class CondaAPI(PackageManager):
# change _ to - in name but not the prefix _ (as this is conda prefix)
if r.name and not r.name.startswith('_') and not requirements.get('conda', None):
r.name = r.name.replace('_', '-')
# remove .post from version numbers, it fails ~= version, and change == to ~=
if r.specs and r.specs[0]:
if has_cudatoolkit and r.specs and len(r.specs[0]) > 1 and r.name == 'cudatoolkit':
# select specific cuda version if it came from the requirements
r.specs = [(r.specs[0][0].replace('==', '='), r.specs[0][1].split('.post')[0])]
elif r.specs and r.specs[0] and len(r.specs[0]) > 1:
# remove .post from version numbers it fails with ~= version, and change == to ~=
r.specs = [(r.specs[0][0].replace('==', '~='), r.specs[0][1].split('.post')[0])]
while reqs:
@@ -614,7 +642,7 @@ class CondaAPI(PackageManager):
finally:
PackageManager._selected_manager = self
self.requirements_manager.post_install(self.session)
self.requirements_manager.post_install(self.session, package_manager=self)
return True
def _parse_conda_result_bad_packges(self, result_dict):

View File

@@ -46,11 +46,10 @@ class ExternalRequirements(SimpleSubstitution):
post_install_req = self.post_install_req
self.post_install_req = []
for req in post_install_req:
try:
freeze_base = PackageManager.out_of_scope_freeze() or ''
except:
freeze_base = ''
if self.is_already_installed(req):
print("No need to reinstall \'{}\' from VCS, "
"the exact same version is already installed".format(req.name))
continue
req_line = self._add_vcs_credentials(req, session)
# if we have older pip version we have to make sure we replace back the package name with the
@@ -96,7 +95,8 @@ class ExternalRequirements(SimpleSubstitution):
vcs._set_ssh_url()
new_req_line = 'git+{}{}{}'.format(
'' if scheme and '://' in vcs.url else scheme,
vcs.url_with_auth, fragment
vcs_url if session.config.get('agent.force_git_ssh_protocol', None) else vcs.url_with_auth,
fragment
)
if new_req_line != req_line:
furl_line = furl(new_req_line)
@@ -175,5 +175,11 @@ class OnlyExternalRequirements(ExternalRequirements):
# Do not store the skipped requirements
# mark skip package
if super(OnlyExternalRequirements, self).match(req):
if self.is_already_installed(req):
print("No need to reinstall \'{}\' from VCS, "
"the exact same version is already installed".format(req.name))
return Text('')
return self._add_vcs_credentials(req, self._session)
return Text('')

View File

@@ -12,7 +12,7 @@ from ..requirements import RequirementsManager
class VirtualenvPip(SystemPip, PackageManager):
def __init__(self, session, python, requirements_manager, path, interpreter=None, execution_info=None, **kwargs):
# type: (Session, float, RequirementsManager, PathLike, PathLike, ExecutionInfo, Any) -> ()
# type: (Session, str, RequirementsManager, PathLike, PathLike, ExecutionInfo, Any) -> ()
"""
Program interface to virtualenv pip.
Must be given either path to virtualenv or source command.
@@ -39,7 +39,7 @@ class VirtualenvPip(SystemPip, PackageManager):
if isinstance(requirements, dict) and requirements.get("pip"):
requirements["pip"] = self.requirements_manager.replace(requirements["pip"])
super(VirtualenvPip, self).load_requirements(requirements)
self.requirements_manager.post_install(self.session)
self.requirements_manager.post_install(self.session, package_manager=self)
def create_flags(self):
"""

View File

@@ -5,6 +5,7 @@ import attr
import sys
import os
from pathlib2 import Path
from clearml_agent.helper.process import Argv, DEVNULL, check_if_command_exists
from clearml_agent.session import Session, POETRY
@@ -81,6 +82,32 @@ class PoetryConfig:
@_guard_enabled
def initialize(self, cwd=None):
if not self._initialized:
if self.session.config.get("agent.package_manager.poetry_version", None) is not None:
version = str(self.session.config.get("agent.package_manager.poetry_version"))
print('Upgrading Poetry package {}'.format(version))
# first upgrade pip if we need to
try:
from clearml_agent.helper.package.pip_api.venv import VirtualenvPip
pip = VirtualenvPip(
session=self.session, python=self._python,
requirements_manager=None, path=None, interpreter=self._python)
pip.upgrade_pip()
except Exception as ex:
self.log.warning("failed upgrading pip: {}".format(ex))
# now install poetry
try:
version = version.replace(' ', '')
if ('=' in version) or ('~' in version) or ('<' in version) or ('>' in version):
pass  # version already includes a specifier (e.g. "==1.2.1"), use it as-is
elif version:
version = "==" + version
argv = Argv(self._python, "-m", "pip", "install", "poetry{}".format(version),
"--upgrade", "--disable-pip-version-check")
print(argv.get_output())
except Exception as ex:
self.log.warning("failed upgrading poetry: {}".format(ex))
self._initialized = True
try:
self._config("--local", "virtualenvs.in-project", "true", cwd=cwd)

View File

@@ -1,3 +1,4 @@
import re
from typing import Text
from .base import PackageManager
@@ -11,13 +12,14 @@ class PriorityPackageRequirement(SimpleSubstitution):
def __init__(self, *args, **kwargs):
super(PriorityPackageRequirement, self).__init__(*args, **kwargs)
self._replaced_packages = {}
# check if we need to replace the packages:
priority_packages = self.config.get('agent.package_manager.priority_packages', None)
if priority_packages:
self.__class__.name = priority_packages
self.__class__.name = [p.lower() for p in priority_packages]
priority_optional_packages = self.config.get('agent.package_manager.priority_optional_packages', None)
if priority_optional_packages:
self.__class__.optional_package_names = priority_optional_packages
self.__class__.optional_package_names = [p.lower() for p in priority_optional_packages]
def match(self, req):
# match both Cython & cython
@@ -28,7 +30,9 @@ class PriorityPackageRequirement(SimpleSubstitution):
Replace a requirement
:raises: ValueError if version is pre-release
"""
if req.name in self.optional_package_names:
self._replaced_packages[req.name] = req.line
if req.name.lower() in self.optional_package_names:
# noinspection PyBroadException
try:
if PackageManager.out_of_scope_install_package(str(req)):
@@ -39,6 +43,41 @@ class PriorityPackageRequirement(SimpleSubstitution):
PackageManager.out_of_scope_install_package(str(req))
return Text(req)
def replace_back(self, list_of_requirements):
"""
:param list_of_requirements: {'pip': ['a==1.0', ]}
:return: {'pip': ['a==1.0', ]}
"""
# if we replaced setuptools, it means someone requested it, and since freeze will not contain it,
# we need to add it manually
if not self._replaced_packages or "setuptools" not in self._replaced_packages:
return list_of_requirements
try:
for k, lines in list_of_requirements.items():
# k is either pip/conda
if k not in ('pip', 'conda'):
continue
for i, line in enumerate(lines):
if not line or line.lstrip().startswith('#'):
continue
parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
if not parts:
continue
# if we found setuptools, do nothing
if parts[0] == "setuptools":
return list_of_requirements
# if we are here it means we have not found setuptools
# we should add it:
if "pip" in list_of_requirements:
list_of_requirements["pip"] = [self._replaced_packages["setuptools"]] + list_of_requirements["pip"]
except Exception as ex: # noqa
return list_of_requirements
return list_of_requirements
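# Illustrative: if the task originally requested "setuptools==59.5.0" (recorded in
# self._replaced_packages) and the freeze result lacks it, then
#   replace_back({'pip': ['numpy==1.21.0']}) -> {'pip': ['setuptools==59.5.0', 'numpy==1.21.0']}
# (package versions above are hypothetical)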
class PackageCollectorRequirement(SimpleSubstitution):
"""

View File

@@ -2,17 +2,19 @@ from __future__ import unicode_literals
import re
import sys
import platform
from furl import furl
import urllib.parse
from operator import itemgetter
from html.parser import HTMLParser
from typing import Text
from typing import Text, Optional, Dict
import attr
import requests
import six
from .requirements import SimpleSubstitution, FatalSpecsResolutionError, SimpleVersion
from .requirements import SimpleSubstitution, FatalSpecsResolutionError, SimpleVersion, MarkerRequirement
from ...external.requirements_parser.requirement import Requirement
OS_TO_WHEEL_NAME = {"linux": "linux_x86_64", "windows": "win_amd64"}
@@ -174,36 +176,43 @@ class PytorchRequirement(SimpleSubstitution):
self.log = self._session.get_logger(__name__)
self.package_manager = self.config["agent.package_manager.type"].lower()
self.os = os_name or self.get_platform()
self.cuda = "cuda{}".format(self.cuda_version).lower()
self.python_version_string = str(self.config["agent.default_python"])
self.python_major_minor_str = '.'.join(self.python_version_string.split('.')[:2])
if '.' not in self.python_major_minor_str:
raise PytorchResolutionError(
"invalid python version {!r} defined in configuration file, key 'agent.default_python': "
"must have both major and minor parts of the version (for example: '3.7')".format(
self.python_version_string
)
)
self.python = "python{}".format(self.python_major_minor_str)
self.exceptions = [
PytorchResolutionError(message)
for message in (
None,
'cuda version "{}" is not supported'.format(self.cuda),
'python version "{}" is not supported'.format(
self.python_version_string
),
)
]
try:
self.validate_python_version()
except PytorchResolutionError as e:
self.log.warn("will not be able to install pytorch wheels: %s", e.args[0])
self.cuda = None
self.python_version_string = None
self.python_major_minor_str = None
self.python = None
self._fix_setuptools = None
self.exceptions = []
self._original_req = []
def _init_python_ver_cuda_ver(self):
if self.cuda is None:
self.cuda = "cuda{}".format(self.cuda_version).lower()
if self.python_version_string is None:
self.python_version_string = str(self.config["agent.default_python"])
if self.python_major_minor_str is None:
self.python_major_minor_str = '.'.join(self.python_version_string.split('.')[:2])
if '.' not in self.python_major_minor_str:
raise PytorchResolutionError(
"invalid python version {!r} defined in configuration file, key 'agent.default_python': "
"must have both major and minor parts of the version (for example: '3.7')".format(
self.python_version_string
)
)
if self.python is None:
self.python = "python{}".format(self.python_major_minor_str)
if not self.exceptions:
self.exceptions = [
PytorchResolutionError(message)
for message in (
None,
'cuda version "{}" is not supported'.format(self.cuda),
'python version "{}" is not supported'.format(
self.python_version_string
),
)
]
@property
def is_conda(self):
return self.package_manager == "conda"
@@ -216,6 +225,8 @@ class PytorchRequirement(SimpleSubstitution):
"""
Make sure the python version has both major and minor parts, as required for choosing the pytorch wheel
"""
self._init_python_ver_cuda_ver()
if self.is_pip and not self.python_major_minor_str:
raise PytorchResolutionError(
"invalid python version {!r} defined in configuration file, key 'agent.default_python': "
@@ -237,10 +248,15 @@ class PytorchRequirement(SimpleSubstitution):
return "macos"
raise RuntimeError("unrecognized OS")
@staticmethod
def get_arch():
return str(platform.machine()).lower()
def _get_link_from_torch_page(self, req, torch_url):
links_parser = LinksHTMLParser()
links_parser.feed(requests.get(torch_url, timeout=10).text)
platform_wheel = "win" if self.get_platform() == "windows" else self.get_platform()
arch_wheel = self.get_arch()
py_ver = self.python_major_minor_str.replace('.', '')
url = None
last_v = None
@@ -261,7 +277,13 @@ class PytorchRequirement(SimpleSubstitution):
continue
if len(parts) < 3 or not parts[2].endswith(py_ver):
continue
if len(parts) < 5 or platform_wheel not in parts[4]:
if len(parts) < 5 or platform_wheel not in parts[4].lower():
continue
if len(parts) < 5 or arch_wheel not in parts[4].lower():
continue
# yes this is for linux python 2.7 support, this is the only python 2.7 we support...
if py_ver and py_ver[0] == '2' and len(parts) > 3 and not parts[3].endswith('u'):
continue
# update the closest matched version (from above)
if not closest_v:
@@ -291,18 +313,21 @@ class PytorchRequirement(SimpleSubstitution):
def get_url_for_platform(self, req):
# check if package is already installed with system packages
self.validate_python_version()
# noinspection PyBroadException
try:
if self.config.get("agent.package_manager.system_site_packages", None):
from pip._internal.commands.show import search_packages_info
installed_torch = list(search_packages_info([req.name]))
# notice the comparison order, the first part will make sure we have a valid installed package
if installed_torch and installed_torch[0]['version'] and \
req.compare_version(installed_torch[0]['version']):
installed_torch_version = (getattr(installed_torch[0], 'version', None) or installed_torch[0]['version']) \
if installed_torch else None
if installed_torch and installed_torch_version and \
req.compare_version(installed_torch_version):
print('PyTorch: requested "{}" version {}, using pre-installed version {}'.format(
req.name, req.specs[0] if req.specs else 'unspecified', installed_torch[0]['version']))
req.name, req.specs[0] if req.specs else 'unspecified', installed_torch_version))
# package already installed, do nothing
req.specs = [('==', str(installed_torch[0]['version']))]
req.specs = [('==', str(installed_torch_version))]
return '{} {} {}'.format(req.name, req.specs[0][0], req.specs[0][1]), True
except Exception:
pass
@@ -343,6 +368,10 @@ class PytorchRequirement(SimpleSubstitution):
else:
print('Trying PyTorch CUDA version {} support'.format(torch_url_key))
# work around PyTorch incompatibility with newer setuptools (older torch wheels require setuptools < 59)
if closest_matched_version and SimpleVersion.compare_versions(closest_matched_version, "<", "1.11.0"):
self._fix_setuptools = "setuptools < 59"
if not url:
url = PytorchWheel(
torch_version=fix_version(version),
@@ -483,7 +512,7 @@ class PytorchRequirement(SimpleSubstitution):
for i, line in enumerate(lines):
if not line or line.lstrip().startswith('#'):
continue
parts = [p for p in re.split('\s|=|\.|<|>|~|!|@|#', line) if p]
parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
if not parts:
continue
for req, new_req in self._original_req:
@@ -505,6 +534,16 @@ class PytorchRequirement(SimpleSubstitution):
return list_of_requirements
def post_scan_add_req(self): # type: () -> Optional[MarkerRequirement]
"""
Allows the RequirementSubstitution to add an extra line/requirements after
the initial requirements scan is completed.
Called only once per requirements.txt object
"""
if self._fix_setuptools:
return MarkerRequirement(Requirement.parse(self._fix_setuptools))
return None
MAP = {
"windows": {
"cuda100": {

View File

@@ -14,8 +14,12 @@ from pathlib2 import Path
from pyhocon import ConfigTree
import six
from six.moves.urllib.parse import unquote
import logging
from clearml_agent.definitions import PIP_EXTRA_INDICES
from clearml_agent.helper.base import warning, is_conda, which, join_lines, is_windows_platform
from clearml_agent.helper.base import (
warning, is_conda, which, join_lines, is_windows_platform,
convert_cuda_version_to_int_10_base_str, )
from clearml_agent.helper.process import Argv, PathLike
from clearml_agent.helper.gpu.gpustat import get_driver_cuda_version
from clearml_agent.session import Session, normalize_cuda_version
@@ -153,6 +157,33 @@ class MarkerRequirement(object):
return SimpleVersion.compare_versions(
version_a=requested_version, op=op, version_b=version, num_parts=num_parts)
def remove_local_file_ref(self):
if not self.local_file or self.vcs or self.editable or self.path:
return False
parts = re.split(r"@\s*{}".format(self.req.uri), self.req.line)
# if we did not find anything, do nothing
if len(parts) < 2:
return False
self.req.line = ''.join(parts).strip()
self.req.uri = None
self.req.local_file = False
return True
def validate_local_file_ref(self):
# if local file does not exist, remove the reference to it
if self.vcs or self.editable or self.path or not self.local_file or not self.name or \
not self.uri or not self.uri.startswith("file://"):
return
local_path = Path(self.uri[len("file://"):])
if not local_path.exists():
local_path = Path(unquote(self.uri)[len("file://"):])
if not local_path.exists():
line = self.line
if self.remove_local_file_ref():
# print warning
logging.getLogger(__name__).warning(
'Local file not found [{}], references removed'.format(line))
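# Illustrative: a requirement line such as
#   mypackage @ file:///tmp/build/mypackage-1.0-py3-none-any.whl
# is reduced to plain "mypackage" when the referenced file no longer exists
# (package name and path above are hypothetical)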
class SimpleVersion:
_sub_versions_pep440 = ['a', 'b', 'rc', '.post', '.dev', '+', ]
@@ -208,7 +239,11 @@ class SimpleVersion:
if not version_b:
return True
if not num_parts:
num_parts = max(len(version_a.split('.')), len(version_b.split('.')), )
if op == '~=':
num_parts = len(version_b.split('.')) - 1
num_parts = max(num_parts, 2)
op = '=='
ignore_sub_versions = True
@@ -245,6 +280,16 @@ class SimpleVersion:
return version_a_key < version_b_key
raise ValueError('Unrecognized comparison operator [{}]'.format(op))
@classmethod
def max_version(cls, version_a, version_b):
return version_a if cls.compare_versions(
version_a=version_a, op='>=', version_b=version_b, num_parts=None) else version_b
@classmethod
def min_version(cls, version_a, version_b):
return version_a if cls.compare_versions(
version_a=version_a, op='<=', version_b=version_b, num_parts=None) else version_b
@staticmethod
def _parse_letter_version(
letter, # type: str
@@ -313,6 +358,77 @@ class SimpleVersion:
return ()
def compare_version_rules(specs_a, specs_b):
# specs_a/b are a list of tuples: [('==', '1.2.3'), ] or [('>=', '1.2'), ('<', '1.3')]
# section definition:
class Section(object):
def __init__(self, left=None, left_eq=False, right=None, right_eq=False):
self.left, self.left_eq, self.right, self.right_eq = left, left_eq, right, right_eq
# first create a list of in/out sections for each spec
# >, >= set the left (lower) bound
# <, <= set the right (upper) bound
# ~= x.y.z is converted to: >= x.y and < x.y+1
# ==/=== are converted to: >= and <=
# != x.y.z would split a section into: left < x.y.z and right > x.y.z
def create_section(specs):
section = Section()
for op, v in specs:
a = section
if op == '>':
a.left = v
a.left_eq = False
elif op == '>=':
a.left = v
a.left_eq = True
elif op == '<':
a.right = v
a.right_eq = False
elif op == '<=':
a.right = v
a.right_eq = True
elif op == '==':
a.left = v
a.left_eq = True
a.right = v
a.right_eq = True
elif op == '~=':
new_v = v.split('.')
a_left = '.'.join(new_v[:-1])
a.left = a_left if not a.left else SimpleVersion.max_version(a_left, a.left)
a.left_eq = True
a_right = '.'.join(new_v[:-2] + [str(int(new_v[-2])+1)])
a.right = a_right if not a.right else SimpleVersion.min_version(a_right, a.right)
a.right_eq = False if a.right == a_right else a.right_eq
return section
section_a = create_section(specs_a)
section_b = create_section(specs_b)
i = Section()
# compute the intersection of the two sections
if section_a.left == section_b.left:
i.left = section_a.left
i.left_eq = section_a.left_eq and section_b.left_eq
else:
i.left = SimpleVersion.max_version(section_a.left, section_b.left)
i.left_eq = section_a.left_eq if i.left == section_a.left else section_b.left_eq
if section_a.right == section_b.right:
i.right = section_a.right
i.right_eq = section_a.right_eq and section_b.right_eq
else:
i.right = SimpleVersion.min_version(section_a.right, section_b.right)
i.right_eq = section_a.right_eq if i.right == section_a.right else section_b.right_eq
# return True if the intersection is non-empty, i.e. some version satisfies both rule sets
valid = True
valid &= SimpleVersion.compare_versions(
version_a=i.left, op='<=' if i.left_eq else '<', version_b=i.right, num_parts=None)
valid &= SimpleVersion.compare_versions(
version_a=i.right, op='>=' if i.right_eq else '>', version_b=i.left, num_parts=None)
return valid
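# Illustrative checks of the intersection logic above:
#   compare_version_rules([('>=', '1.2'), ('<', '2.0')], [('~=', '1.4')]) -> True
#   compare_version_rules([('==', '1.2')], [('>=', '1.3')]) -> False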
@six.add_metaclass(ABCMeta)
class RequirementSubstitution(object):
@@ -324,6 +440,7 @@ class RequirementSubstitution(object):
self.config = session.config # type: ConfigTree
self.suffix = '.post{config[agent.cuda_version]}.dev{config[agent.cudnn_version]}'.format(config=self.config)
self.package_manager = self.config['agent.package_manager.type']
self._is_already_installed_cb = None
@abstractmethod
def match(self, req): # type: (MarkerRequirement) -> bool
@@ -339,6 +456,20 @@ class RequirementSubstitution(object):
"""
pass
def set_is_already_installed_cb(self, cb):
self._is_already_installed_cb = cb
def is_already_installed(self, req):
if not self._is_already_installed_cb:
return False
# noinspection PyBroadException
try:
return self._is_already_installed_cb(req)
except BaseException as ex:
# debug could not resolve something
print("Warning: Requirements post install callback exception (check if package installed): {}".format(ex))
return False
def post_scan_add_req(self): # type: () -> Optional[MarkerRequirement]
"""
Allows the RequirementSubstitution to add an extra line/requirements after
@@ -363,7 +494,7 @@ class RequirementSubstitution(object):
@property
def cuda_version(self):
return self.config['agent.cuda_version']
return convert_cuda_version_to_int_10_base_str(self.config['agent.cuda_version'])
@property
def cudnn_version(self):
@@ -449,6 +580,7 @@ class RequirementsManager(object):
cache_dir=pip_cache_dir.as_posix())
self._base_interpreter = base_interpreter
self._cwd = None
self._installed_parsed_packages = set()
def register(self, cls): # type: (Type[RequirementSubstitution]) -> None
self.handlers.append(cls(self._session))
@@ -468,20 +600,9 @@ class RequirementsManager(object):
return None
def replace(self, requirements): # type: (Text) -> Text
def safe_parse(req_str):
# noinspection PyBroadException
try:
return list(parse(req_str, cwd=self._cwd))
except Exception as ex:
return [Requirement(req_str)]
parsed_requirements = self.parse_requirements_section_to_marker_requirements(
requirements=requirements, cwd=self._cwd)
parsed_requirements = tuple(
map(
MarkerRequirement,
[r for line in (requirements.splitlines() if isinstance(requirements, six.text_type) else requirements)
for r in safe_parse(line)]
)
)
if not parsed_requirements:
# return the original requirements just in case
return requirements
@@ -510,14 +631,29 @@ class RequirementsManager(object):
result = list(result)
# let handlers add extra requirements after the initial scan (post_scan_add_req callback)
double_req_set = None
for h in self.handlers:
req = h.post_scan_add_req()
if req:
result.append(req.tostr())
reqs = h.post_scan_add_req()
if reqs:
if double_req_set is None:
def safe_parse_name(line):
try:
return Requirement.parse(line).name
except: # noqa
return None
double_req_set = set([safe_parse_name(r) for r in result if r])
for r in (reqs if isinstance(reqs, (tuple, list)) else [reqs]):
if r and (not r.name or r.name not in double_req_set):
result.append(r.tostr())
elif r:
print("SKIPPING additional auto installed package: \"{}\"".format(r))
return join_lines(result)
def post_install(self, session):
def post_install(self, session, package_manager=None):
if package_manager:
self.update_installed_packages_state(package_manager.freeze())
for h in self.handlers:
try:
h.post_install(session)
@@ -539,6 +675,34 @@ class RequirementsManager(object):
def get_interpreter(self):
return self._base_interpreter
def update_installed_packages_state(self, requirements):
"""
Updates internal Installed Packages objects, so that later we can detect
if we already have a pre-installed package
:param requirements: the output of a freeze() call, i.e. dict {'pip': "package==version"}
"""
requirements = requirements if not isinstance(requirements, dict) else requirements.get("pip")
self._installed_parsed_packages = self.parse_requirements_section_to_marker_requirements(
requirements=requirements, cwd=self._cwd)
for h in self.handlers:
h.set_is_already_installed_cb(self._callback_is_already_installed)
def _callback_is_already_installed(self, req):
for p in (self._installed_parsed_packages or []):
if p.name != req.name:
continue
# if this is a version-control package, only return True if both the installed and requested packages specify a commit ID
if req.vcs:
return p.vcs and req.revision and req.revision == p.revision
if not req.specs and not p.specs:
return True
# return if this is the same version
return req.specs and p.specs and req.compare_version(p, op="==")
return False
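A rough standalone equivalent of this check, using the packaging library instead of the agent's internal MarkerRequirement objects (a sketch under that assumption, not the agent's implementation):

# Sketch: is a requested requirement already satisfied by pip freeze output?
# Mirrors the logic above: only an exact version pin counts as a match.
from packaging.requirements import Requirement
from packaging.version import Version

def already_installed(requested_line, freeze_lines):
    requested = Requirement(requested_line)
    pinned = {s.version for s in requested.specifier if s.operator == "=="}
    for line in freeze_lines:
        name, _, version = line.partition("==")
        if name.strip().lower() != requested.name.lower():
            continue
        # as in the code above: no pin, or a different version, is a miss
        return bool(pinned) and any(
            Version(version.strip()) == Version(p) for p in pinned)
    return False

freeze = ["numpy==1.21.6", "requests==2.25.1"]
print(already_installed("numpy==1.21.6", freeze))  # True
print(already_installed("numpy>=1.20", freeze))    # False (not an exact pin)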
@staticmethod
def get_cuda_version(config): # type: (ConfigTree) -> (Text, Text)
# we assume os.environ already updated the config['agent.cuda_version'] & config['agent.cudnn_version']
@@ -614,3 +778,29 @@ class RequirementsManager(object):
return (normalize_cuda_version(cuda_version or 0),
normalize_cuda_version(cudnn_version or 0))
@staticmethod
def parse_requirements_section_to_marker_requirements(requirements, cwd=None):
def safe_parse(req_str):
# noinspection PyBroadException
try:
return list(parse(req_str, cwd=cwd))
except Exception as ex:
return [Requirement(req_str)]
def create_req(x):
r = MarkerRequirement(x)
r.validate_local_file_ref()
return r
if not requirements:
return tuple()
parsed_requirements = tuple(
map(
create_req,
[r for line in (requirements.splitlines() if isinstance(requirements, str) else requirements)
for r in safe_parse(line)]
)
)
return parsed_requirements
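The same line-by-line pattern can be sketched with the packaging library standing in for the agent's parse()/MarkerRequirement helpers (the fallback handling here is simplified):

# Sketch of the parsing pattern above: split a requirements section into
# lines and keep unparsable ones (pip options, VCS links) as raw text.
from packaging.requirements import InvalidRequirement, Requirement

def parse_requirements_text(requirements):
    lines = requirements.splitlines() if isinstance(requirements, str) else requirements
    parsed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            parsed.append(Requirement(line))
        except InvalidRequirement:
            parsed.append(line)
    return tuple(parsed)

print(parse_requirements_text("numpy==1.21.6\n# a comment\nclearml>=1.6"))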

View File

@@ -1,7 +1,11 @@
import abc
import os
import re
import shutil
import stat
import subprocess
import sys
import tempfile
from distutils.spawn import find_executable
from hashlib import md5
from os import environ
@@ -23,7 +27,7 @@ from clearml_agent.helper.base import (
rm_tree,
ExecutionInfo,
normalize_path,
create_file_if_not_exists,
create_file_if_not_exists, safe_remove_file,
)
from clearml_agent.helper.os.locks import FileLock
from clearml_agent.helper.process import DEVNULL, Argv, PathLike, COMMAND_SUCCESS
@@ -108,7 +112,7 @@ class VCS(object):
)
self.url = url
self.location = Text(location)
self.revision = revision
self._revision = revision
self.log = self.session.get_logger(__name__)
@property
@@ -118,6 +122,13 @@ class VCS(object):
"""
return self.add_auth(self.session.config, self.url)
@property
def url_without_auth(self):
"""
Return URL without configured user/password
"""
return self.add_auth(self.session.config, self.url, reset_auth=True)
@abc.abstractmethod
def executable_name(self):
"""
@@ -349,7 +360,9 @@ class VCS(object):
If not in debug mode, filter VCS password from output.
"""
self._set_ssh_url()
clone_command = ("clone", self.url_with_auth, self.location) + self.clone_flags
# if we are on Linux, there is no need for the full auth URL because we use GIT_ASKPASS
url = self.url_without_auth if self._use_ask_pass else self.url_with_auth
clone_command = ("clone", url, self.location) + self.clone_flags
# clone all branches regardless of when we want to later checkout
# if branch:
# clone_command += ("-b", branch)
@@ -357,40 +370,41 @@ class VCS(object):
self.call(*clone_command)
return
def normalize_output(result):
"""
Returns result string without user's password.
NOTE: ``self.get_stderr``'s result might or might not have the same type as ``e.output`` in case of error.
"""
string_type = (
ensure_text
if isinstance(result, six.text_type)
else ensure_binary
)
return result.replace(
string_type(self.url),
string_type(furl(self.url).remove(password=True).tostr()),
)
def print_output(output):
print(ensure_text(output))
try:
print_output(normalize_output(self.get_stderr(*clone_command)))
self._print_output(self._normalize_output(self.get_stderr(*clone_command)))
except subprocess.CalledProcessError as e:
# In Python 3, subprocess.CalledProcessError has a `stderr` attribute,
# but since stderr is redirected to `subprocess.PIPE` it will appear in the usual `output` attribute
if e.output:
e.output = normalize_output(e.output)
print_output(e.output)
e.output = self._normalize_output(e.output)
self._print_output(e.output)
raise
def _normalize_output(self, result):
"""
Returns result string without user's password.
NOTE: ``self.get_stderr``'s result might or might not have the same type as ``e.output`` in case of error.
"""
string_type = (
ensure_text
if isinstance(result, six.text_type)
else ensure_binary
)
return result.replace(
string_type(self.url),
string_type(furl(self.url).remove(password=True).tostr()),
)
@staticmethod
def _print_output(output):
print(ensure_text(output))
def checkout(self):
# type: () -> None
"""
Checkout repository at specified revision
"""
self.call("checkout", self.revision, *self.checkout_flags, cwd=self.location)
self.call("checkout", self._revision, *self.checkout_flags, cwd=self.location)
@abc.abstractmethod
def pull(self):
@@ -473,16 +487,18 @@ class VCS(object):
return Argv(self.executable_name, *argv)
@classmethod
def add_auth(cls, config, url):
def add_auth(cls, config, url, reset_auth=False):
"""
Add username and password to URL if missing from URL and present in config.
Does not modify ssh URLs.
:param reset_auth: If True, remove the user/pass from the URL (default: False)
"""
try:
parsed_url = furl(url)
except ValueError:
return url
if parsed_url.scheme in ["", "ssh"] or parsed_url.scheme.startswith("git"):
if parsed_url.scheme in ["", "ssh"] or (parsed_url.scheme or '').startswith("git"):
return parsed_url.url
config_user = ENV_AGENT_GIT_USER.get() or config.get("agent.{}_user".format(cls.executable_name), None)
config_pass = ENV_AGENT_GIT_PASS.get() or config.get("agent.{}_pass".format(cls.executable_name), None)
@@ -493,7 +509,10 @@ class VCS(object):
and config_pass
and (not config_domain or config_domain.lower() == parsed_url.host)
):
parsed_url.set(username=config_user, password=config_pass)
if reset_auth:
parsed_url.set(username=None, password=None)
else:
parsed_url.set(username=config_user, password=config_pass)
return parsed_url.url
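Since add_auth is the pivot of the credential handling in this change, a standalone sketch of the same furl-based rewriting may help (URL and credentials are placeholders):

# Sketch of the URL credential injection/stripping above, using furl directly.
from furl import furl

def with_auth(url, user, password, reset=False):
    parsed = furl(url)
    if parsed.scheme in ("", "ssh") or (parsed.scheme or "").startswith("git"):
        return url  # never rewrite ssh/git URLs
    if reset:
        parsed.set(username=None, password=None)
    else:
        parsed.set(username=user, password=password)
    return parsed.url

print(with_auth("https://github.com/org/repo.git", "bot", "token123"))
# -> https://bot:token123@github.com/org/repo.git
print(with_auth("https://bot:token123@github.com/org/repo.git", None, None, reset=True))
# -> https://github.com/org/repo.git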
@abc.abstractmethod
@@ -519,7 +538,7 @@ class VCS(object):
class Git(VCS):
executable_name = "git"
main_branch = "master"
main_branch = ("master", "main")
clone_flags = ("--quiet", "--recursive")
checkout_flags = ("--force",)
COMMAND_ENV = {
@@ -529,9 +548,22 @@ class Git(VCS):
"GIT_SSH_COMMAND": "ssh -oBatchMode=yes",
}
def __init__(self, *args, **kwargs):
super(Git, self).__init__(*args, **kwargs)
self._use_ask_pass = False if not self.session.config.get('agent.enable_git_ask_pass', None) \
else sys.platform == "linux"
try:
self.call("config", "--global", "--replace-all", "safe.directory", "*", cwd=self.location)
except: # noqa
pass
@staticmethod
def remote_branch_name(branch):
return "origin/{}".format(branch)
return [
"origin/{}".format(b) for b in ([branch] if isinstance(branch, str) else branch)
]
def executable_not_found_error_help(self):
return 'Cannot find "{}" executable. {}'.format(
@@ -549,11 +581,79 @@ class Git(VCS):
def pull(self):
self.call("fetch", "--all", "--recurse-submodules", cwd=self.location)
def _git_pass_auth_wrapper(self, func, *args, **kwargs):
try:
url_with_auth = furl(self.url_with_auth)
password = url_with_auth.password if url_with_auth else None
username = url_with_auth.username if url_with_auth else None
except: # noqa
password = None
username = None
# if this is not Linux or we do not have a password, just run as-is
if not self._use_ask_pass or not password or not username:
return func(*args, **kwargs)
# create the password file
fp, pass_file = tempfile.mkstemp(prefix='clearml_git_', suffix='.sh')
os.close(fp)
with open(pass_file, 'wt') as f:
# get first letter only (username / password are the argument options)
# then echo the correct information
f.writelines([
'#!/bin/bash\n',
'c="$1"\n',
'c="${c%"${c#?}"}"\n',
'if [ "$c" == "u" ] || [ "$c" == "U" ]; then echo "{}"; else echo "{}"; fi\n'.format(
username.replace('"', '\\"'), password.replace('"', '\\"')
)
])
# mark executable
st = os.stat(pass_file)
os.chmod(pass_file, st.st_mode | stat.S_IEXEC)
# let GIT use it
self.COMMAND_ENV["GIT_ASKPASS"] = pass_file
# call git command
try:
ret = func(*args, **kwargs)
finally:
# delete temp password file
self.COMMAND_ENV.pop("GIT_ASKPASS", None)
safe_remove_file(pass_file)
return ret
def get_stderr(self, *argv, **kwargs):
"""
Wrapper with git password authentication
"""
return self._git_pass_auth_wrapper(super(Git, self).get_stderr, *argv, **kwargs)
def call_with_stdin(self, *argv, **kwargs):
"""
Wrapper with git password authentication
"""
return self._git_pass_auth_wrapper(super(Git, self).call_with_stdin, *argv, **kwargs)
def call(self, *argv, **kwargs):
"""
Wrapper with git password authentication
"""
return self._git_pass_auth_wrapper(super(Git, self).call, *argv, **kwargs)
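GIT_ASKPASS points git at a helper program it invokes whenever it needs a username or password; the wrapper above writes a throw-away script answering both prompts, so credentials never appear on the git command line. A standalone sketch of the mechanism (repository URL and credentials are placeholders):

# Sketch of the GIT_ASKPASS flow used by the wrapper above.
import os
import stat
import subprocess
import tempfile

username, password = "bot", "personal-access-token"  # placeholders

fd, askpass = tempfile.mkstemp(prefix="git_askpass_", suffix=".sh")
os.close(fd)
with open(askpass, "wt") as f:
    # git calls the script with a prompt ("Username for ..." / "Password
    # for ..."); answer based on the first letter of that prompt
    f.write('#!/bin/bash\n'
            'c="$1"\n'
            'c="${c%"${c#?}"}"\n'
            'if [ "$c" == "u" ] || [ "$c" == "U" ]; then echo "' + username +
            '"; else echo "' + password + '"; fi\n')
os.chmod(askpass, os.stat(askpass).st_mode | stat.S_IEXEC)

env = dict(os.environ, GIT_ASKPASS=askpass, GIT_TERMINAL_PROMPT="0")
try:
    subprocess.check_call(
        ["git", "clone", "https://example.com/org/repo.git"], env=env)
finally:
    os.remove(askpass)  # never leave the credentials helper behind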
def checkout(self): # type: () -> None
"""
Checkout repository at specified revision
"""
self.call("checkout", self.revision, *self.checkout_flags, cwd=self.location)
revisions = [self._revision] if isinstance(self._revision, str) else self._revision
for i, revision in enumerate(revisions):
try:
self.call("checkout", revision, *self.checkout_flags, cwd=self.location)
break
except subprocess.CalledProcessError:
if i == len(revisions) - 1:
raise
try:
self.call("submodule", "update", "--recursive", cwd=self.location)
except: # noqa
@@ -593,7 +693,7 @@ class Hg(VCS):
"pull",
self.url_with_auth,
cwd=self.location,
*(("-r", self.revision) if self.revision else ())
*(("-r", self._revision) if self._revision else ())
)
info_commands = dict(
@@ -663,7 +763,9 @@ def clone_repository_cached(session, execution, destination):
vcs.pull()
rm_tree(destination)
shutil.copytree(Text(cached_repo_path), Text(clone_folder))
shutil.copytree(Text(cached_repo_path), Text(clone_folder),
symlinks=select_for_platform(linux=True, windows=False),
ignore_dangling_symlinks=True)
if not clone_folder.is_dir():
raise CommandFailedError(
"copying of repository failed: from {} to {}".format(
@@ -671,9 +773,9 @@ def clone_repository_cached(session, execution, destination):
)
)
# checkout in the newly copied destination
vcs.location = Text(clone_folder)
vcs.checkout()
repo_info = vcs.get_repository_copy_info(clone_folder)

View File

@@ -82,7 +82,7 @@ class ResourceMonitor(object):
if not worker_tags and ENV_WORKER_TAGS.get():
worker_tags = shlex.split(ENV_WORKER_TAGS.get())
self._worker_tags = worker_tags
if os.environ.get('NVIDIA_VISIBLE_DEVICES') == 'none':
if Session.get_nvidia_visible_env() == 'none':
# NVIDIA_VISIBLE_DEVICES set to none, marks cpu_only flag
# active_gpus == False means no GPU reporting
self._active_gpus = False
@@ -92,10 +92,9 @@ class ResourceMonitor(object):
# None means no filtering, report all gpus
self._active_gpus = None
try:
active_gpus = os.environ.get('NVIDIA_VISIBLE_DEVICES', '') or \
os.environ.get('CUDA_VISIBLE_DEVICES', '')
active_gpus = Session.get_nvidia_visible_env() or ""
if active_gpus:
self._active_gpus = [int(g.strip()) for g in active_gpus.split(',')]
self._active_gpus = [g.strip() for g in active_gpus.split(',')]
except Exception:
pass
@@ -263,7 +262,7 @@ class ResourceMonitor(object):
gpu_stat = self._gpustat.new_query()
for i, g in enumerate(gpu_stat.gpus):
# only monitor the active GPUs; if none were selected, monitor everything
if self._active_gpus and i not in self._active_gpus:
if self._active_gpus and str(i) not in self._active_gpus:
continue
stats["gpu_temperature_{:d}".format(i)] = g["temperature.gpu"]
stats["gpu_utilization_{:d}".format(i)] = g["utilization.gpu"]

View File

@@ -22,7 +22,7 @@ WORKER_ARGS = {
'help': 'git username for repository access',
},
'--git-pass': {
'help': 'git password for repository access',
'help': 'git password (personal access tokens) for repository access',
},
'--log-level': {
'help': 'SDK log level',
@@ -99,12 +99,14 @@ DAEMON_ARGS = dict({
'aliases': ['-d'],
},
'--stop': {
'help': 'Stop the running agent (based on the same set of arguments)',
'action': 'store_true',
'help': 'Stop the running agent (based on the same set of arguments). '
'Optional: provide a list of specific local worker IDs to stop',
'nargs': '*',
'default': False,
},
'--dynamic-gpus': {
'help': 'Allow to dynamically allocate gpus based on queue properties, '
'configure with \'--queues <queue_name>=<num_gpus>\'.'
'configure with \'--queue <queue_name>=<num_gpus>\'.'
' Example: \'--dynamic-gpus --gpus 0-3 --queue dual_gpus=2 single_gpu=1\''
' Example Opportunistic: \'--dynamic-gpus --gpus 0-3 --queue dual_gpus=2 max_quad_gpus=1-4 \'',
'action': 'store_true',
@@ -165,7 +167,7 @@ COMMANDS = {
},
'--docker': {
'help': 'Run execution task inside a docker (v19.03 and above). Optional args <image> <arguments> or '
'specify default docker image in agent.default_docker.image / agent.default_docker.arguments'
'specify default docker image in agent.default_docker.image / agent.default_docker.arguments. '
'Use --gpus/--cpu-only (or set NVIDIA_VISIBLE_DEVICES) to limit gpu visibility for docker',
'nargs': '*',
'default': False,
@@ -199,11 +201,18 @@ COMMANDS = {
},
'--docker': {
'help': 'Build the experiment inside a docker (v19.03 and above). Optional args <image> <arguments> or '
'specify default docker image in agent.default_docker.image / agent.default_docker.arguments'
'specify default docker image in agent.default_docker.image / agent.default_docker.arguments. '
'Use --gpus/--cpu-only (or set NVIDIA_VISIBLE_DEVICES) to limit gpu visibility for docker',
'nargs': '*',
'default': False,
},
'--force-docker': {
'help': 'Force using the agent-specified docker image (either explicitly in the --docker argument or '
'using the agent\'s default docker image). If provided, the agent will not use any docker '
'container information stored on the task itself (default False)',
'default': False,
'action': 'store_true',
},
'--python-version': {
'help': 'Virtual environment python version to use',
},

View File

@@ -76,7 +76,7 @@ class Session(_Session):
cpu_only = kwargs.get('cpu_only')
if cpu_only:
os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = 'none'
Session.set_nvidia_visible_env('none')
if kwargs.get('gpus') and not os.environ.get('KUBERNETES_SERVICE_HOST') \
and not os.environ.get('KUBERNETES_PORT'):
@@ -85,7 +85,7 @@ class Session(_Session):
os.environ.pop('CUDA_VISIBLE_DEVICES', None)
os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
else:
os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
Session.set_nvidia_visible_env(kwargs.get('gpus'))
if kwargs.get('only_load_config'):
from clearml_agent.backend_api.config import load
@@ -229,26 +229,35 @@ class Session(_Session):
except:
pass
def print_configuration(self, remove_secret_keys=("secret", "pass", "token", "account_key")):
def print_configuration(
self,
remove_secret_keys=("secret", "pass", "token", "account_key", "contents"),
skip_value_keys=("environment", )
):
# remove all the secrets from the print
def recursive_remove_secrets(dictionary, secret_keys=()):
def recursive_remove_secrets(dictionary, secret_keys=(), empty_keys=()):
for k in list(dictionary):
for s in secret_keys:
if s in k:
dictionary.pop(k)
break
for s in empty_keys:
if s == k:
dictionary[k] = {key: '****' for key in dictionary[k]} \
if isinstance(dictionary[k], dict) else '****'
break
if isinstance(dictionary.get(k, None), dict):
recursive_remove_secrets(dictionary[k], secret_keys=secret_keys)
recursive_remove_secrets(dictionary[k], secret_keys=secret_keys, empty_keys=empty_keys)
elif isinstance(dictionary.get(k, None), (list, tuple)):
for item in dictionary[k]:
if isinstance(item, dict):
recursive_remove_secrets(item, secret_keys=secret_keys)
recursive_remove_secrets(item, secret_keys=secret_keys, empty_keys=empty_keys)
config = deepcopy(self.config.to_dict())
# remove the env variable, it's not important
config.pop('env', None)
if remove_secret_keys:
recursive_remove_secrets(config, secret_keys=remove_secret_keys)
if remove_secret_keys or skip_value_keys:
recursive_remove_secrets(config, secret_keys=remove_secret_keys, empty_keys=skip_value_keys)
# remove logging.loggers.urllib3.level from the print
try:
config['logging']['loggers']['urllib3'].pop('level', None)
@@ -318,6 +327,23 @@ class Session(_Session):
def command(self, *args):
return Argv(*args, log=self.get_logger(Argv.__module__))
@staticmethod
def set_nvidia_visible_env(gpus):
if not gpus:
gpus = ""
visible_env = gpus.replace(".", ":") if isinstance(gpus, str) else \
','.join(str(g).replace(".", ":") for g in gpus)
os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = visible_env
@staticmethod
def get_nvidia_visible_env():
visible_env = os.environ.get('NVIDIA_VISIBLE_DEVICES') or os.environ.get('CUDA_VISIBLE_DEVICES')
if visible_env is None:
return None
visible_env = str(visible_env).replace(":", ".")
return visible_env
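These two helpers normalize MIG-style GPU selectors: users may write GPU 0 slice 1 as "0.1" or "0:1", NVIDIA_VISIBLE_DEVICES expects the colon form, and the agent keeps the dot form internally. A condensed sketch of the round trip:

# Sketch of the dot/colon normalization above ("0.1" = GPU 0, MIG slice 1).
def to_env_form(gpus):
    if not gpus:
        return ""
    if isinstance(gpus, str):
        return gpus.replace(".", ":")
    return ",".join(str(g).replace(".", ":") for g in gpus)

def from_env_form(visible):
    return str(visible).replace(":", ".")

print(to_env_form(["0.1", "0.2"]))  # -> 0:1,0:2
print(from_env_form("0:1,0:2"))     # -> 0.1,0.2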
@attr.s
class TrainsAgentLogger(object):

View File

@@ -1 +1 @@
__version__ = '1.1.0'
__version__ = '1.3.0'

View File

@@ -171,7 +171,7 @@ agent {
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]

View File

@@ -0,0 +1,75 @@
ARG TAG=3.7.12-alpine3.15
FROM python:${TAG} as build
RUN apk add --no-cache \
gcc \
musl-dev \
libffi-dev
RUN python3 \
-m pip \
install \
--prefix=/install \
--no-cache-dir \
-U \
clearml-agent \
"cryptography>=2.9"
FROM python:${TAG} as target
WORKDIR /app
ARG KUBECTL_VERSION=1.22.4
# Not sure about these ENV vars
# ENV LC_ALL=en_US.UTF-8
# ENV LANG=en_US.UTF-8
# ENV LANGUAGE=en_US.UTF-8
# ENV PYTHONIOENCODING=UTF-8
COPY --from=build /install /usr/local
ADD https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl /usr/bin/
RUN chmod +x /usr/bin/kubectl
RUN apk add --no-cache \
bash
COPY k8s_glue_example.py .
# AWS CLI
# https://github.com/kyleknap/aws-cli/blob/source-proposal/proposals/source-install.md#alpine-linux
# https://github.com/aws/aws-cli/issues/4685
# https://github.com/aws/aws-cli/pull/6352
# https://github.com/GoogleCloudPlatform/cloud-sdk-docker/blob/master/alpine/Dockerfile
FROM target as gcp
ARG CLOUD_SDK_VERSION=371.0.0
ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
ENV PATH /google-cloud-sdk/bin:$PATH
WORKDIR /
RUN apk --no-cache add \
curl \
python3 \
py3-crcmod \
py3-openssl \
bash \
libc6-compat \
openssh-client \
git \
gnupg \
&& curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
tar xzf google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
rm google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
gcloud config set core/disable_usage_reporting true && \
gcloud config set component_manager/disable_update_check true && \
gcloud config set metrics/environment github_docker_image && \
gcloud --version
WORKDIR /app

View File

@@ -0,0 +1,82 @@
ARG TAG=3.7.12-slim-bullseye
FROM python:${TAG} as target
ARG KUBECTL_VERSION=1.22.4
WORKDIR /app
RUN python3 \
-m pip \
install \
--no-cache-dir \
-U \
clearml-agent \
"cryptography>=2.9"
# Not sure about these ENV vars
# ENV LC_ALL=en_US.UTF-8
# ENV LANG=en_US.UTF-8
# ENV LANGUAGE=en_US.UTF-8
# ENV PYTHONIOENCODING=UTF-8
ADD https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl /usr/bin/
RUN chmod +x /usr/bin/kubectl
COPY k8s_glue_example.py .
CMD ["python3", "k8s_glue_example.py"]
FROM target as aws
# https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
# https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html
RUN apt-get update -qqy && \
apt-get install -qqy \
unzip && \
rm -rf /var/lib/apt/lists/*
ADD https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip awscliv2.zip
ADD https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/aws-iam-authenticator /usr/local/bin/aws-iam-authenticator
RUN unzip awscliv2.zip && \
./aws/install && \
rm -r awscliv2.zip aws/ && \
chmod +x /usr/local/bin/aws-iam-authenticator && \
aws --version && \
aws-iam-authenticator version
# https://github.com/GoogleCloudPlatform/cloud-sdk-docker/blob/master/debian_slim/Dockerfile
FROM target as gcp
ARG CLOUD_SDK_VERSION=371.0.0
ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
ENV PATH "$PATH:/opt/google-cloud-sdk/bin/"
ARG INSTALL_COMPONENTS
RUN mkdir -p /usr/share/man/man1/
RUN apt-get update -qqy && \
apt-get install -qqy \
curl \
gcc \
python3-dev \
python3-pip \
apt-transport-https \
lsb-release \
openssh-client \
git \
gnupg && \
rm -rf /var/lib/apt/lists/* && \
pip3 install -U crcmod && \
export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
apt-get update && apt-get install -y google-cloud-sdk=${CLOUD_SDK_VERSION}-0 $INSTALL_COMPONENTS && \
gcloud config set core/disable_usage_reporting true && \
gcloud config set component_manager/disable_update_check true && \
gcloud config set metrics/environment github_docker_image && \
gcloud --version

View File

@@ -0,0 +1,94 @@
"""
This example assumes you have preconfigured services with selectors in the form of
"ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022.
The K8sIntegration component will label each pod accordingly.
"""
from argparse import ArgumentParser
from clearml_agent.glue.k8s import K8sIntegration
def parse_args():
parser = ArgumentParser()
group = parser.add_mutually_exclusive_group()
parser.add_argument(
"--queue", type=str, help="Queue to pull tasks from"
)
group.add_argument(
"--ports-mode", action='store_true', default=False,
help="Ports-Mode will add a label to the pod which can be used as service, in order to expose ports"
"Should not be used with max-pods"
)
parser.add_argument(
"--num-of-services", type=int, default=20,
help="Specify the number of k8s services to be used. Use only with ports-mode."
)
parser.add_argument(
"--base-port", type=int,
help="Used in conjunction with ports-mode, specifies the base port exposed by the services. "
"For pod #X, the port will be <base-port>+X. Note that pod number is calculated based on base-pod-num"
"e.g. if base-port=20000 and base-pod-num=3, the port for the first pod will be 20003"
)
parser.add_argument(
"--base-pod-num", type=int, default=1,
help="Used in conjunction with ports-mode and base-port, specifies the base pod number to be used by the "
"service (default: %(default)s)"
)
parser.add_argument(
"--gateway-address", type=str, default=None,
help="Used in conjunction with ports-mode, specify the external address of the k8s ingress / ELB"
)
parser.add_argument(
"--pod-clearml-conf", type=str,
help="Configuration file to be used by the pod itself (if not provided, current configuration is used)"
)
parser.add_argument(
"--overrides-yaml", type=str,
help="YAML file containing pod overrides to be used when launching a new pod"
)
parser.add_argument(
"--template-yaml", type=str,
help="YAML file containing pod template. If provided pod will be scheduled with kubectl apply "
"and overrides are ignored, otherwise it will be scheduled with kubectl run"
)
parser.add_argument(
"--ssh-server-port", type=int, default=0,
help="If non-zero, every pod will also start an SSH server on the selected port (default: zero, not active)"
)
parser.add_argument(
"--namespace", type=str,
help="Specify the namespace in which pods will be created (default: %(default)s)", default="clearml"
)
group.add_argument(
"--max-pods", type=int,
help="Limit the maximum number of pods that this service can run at the same time."
"Should not be used with ports-mode"
)
return parser.parse_args()
def main():
args = parse_args()
user_props_cb = None
if args.ports_mode and args.base_port:
def k8s_user_props_cb(pod_number=0):
user_prop = {"k8s-pod-port": args.base_port + pod_number}
if args.gateway_address:
user_prop["k8s-gateway-address"] = args.gateway_address
return user_prop
user_props_cb = k8s_user_props_cb
k8s = K8sIntegration(
ports_mode=args.ports_mode, num_of_services=args.num_of_services, base_pod_num=args.base_pod_num,
user_props_cb=user_props_cb, overrides_yaml=args.overrides_yaml, clearml_conf_file=args.pod_clearml_conf,
template_yaml=args.template_yaml, extra_bash_init_script=K8sIntegration.get_ssh_server_bash(
ssh_port_number=args.ssh_server_port) if args.ssh_server_port else None,
namespace=args.namespace, max_pods_limit=args.max_pods or None,
)
k8s.k8s_daemon(args.queue)
if __name__ == "__main__":
main()
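For reference, a typical invocation of this example (queue name, gateway address and namespace are placeholders) might look like:

python3 k8s_glue_example.py --queue k8s_gpu --ports-mode --base-port 20000 --gateway-address 1.2.3.4 --namespace clearml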

View File

@@ -4,7 +4,7 @@ api {
web_server: https://demoapp.demo.clear.ml
files_server: https://demofiles.demo.clear.ml
# Credentials are generated in the webapp, https://demoapp.demo.clear.ml/profile
# Credentials are generated in the webapp, https://app.clear.ml/settings/workspace-configuration
# Overridden with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
credentials {"access_key": "EGRTCO8JMSIGI6S39GTP43NFWXDQOW", "secret_key": "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"}
@@ -15,6 +15,11 @@ api {
agent {
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
# **Notice**: a GitHub personal access token is equivalent to a password; you can put it directly into `git_pass`
# To learn how to generate a git token on GitHub/Bitbucket/GitLab:
# https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
# https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
# https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
git_user=""
git_pass=""
# Limit credentials to a single domain, for example: github.com,
@@ -29,12 +34,12 @@ agent {
# force_git_ssh_user: git
# unique name of this worker, if None, created based on hostname:process_id
# Overridden with os environment: CLEARML_WORKER_NAME
# Overridden with os environment: CLEARML_WORKER_ID
# worker_id: "clearml-agent-machine1:gpu0"
worker_id: ""
# worker name, replaces the hostname when creating a unique name for this worker
# Overridden with os environment: CLEARML_WORKER_ID
# Overridden with os environment: CLEARML_WORKER_NAME
# worker_name: "clearml-agent-machine1"
worker_name: ""
@@ -60,6 +65,8 @@ agent {
# specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
# pip_version: "<20"
# specify poetry version to use (examples "<2", "==1.1.1", "", empty string will install the latest version)
# poetry_version: "<2",
# virtual environment inherits packages from the system
system_site_packages: false,
@@ -155,10 +162,57 @@ agent {
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host"]
# lookup table rules for default container
# first matched rule will be picked, according to rule order
# enterprise version only
# match_rules: [
# {
# image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
# arguments: "-e define=value"
# match: {
# script{
# # Optional: must match all requirements (not partial)
# requirements: {
# # version selection matching PEP-440
# pip: {
# tensorflow: "~=2.6"
# },
# }
# # Optional: matching based on regular expression, example: "^exact_match$"
# repository: "/my_repository/"
# branch: "main"
# binary: "python3.6"
# }
# # Optional: matching based on regular expression, example: "^exact_match$"
# project: "project/sub_project"
# }
# },
# {
# image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
# arguments: "-e define=value"
# match: {
# # must match all requirements (not partial)
# script{
# requirements: {
# conda: {
# torch: ">=2.6,<2.8"
# }
# }
# # no repository matching required
# repository: ""
# }
# # no container image matching required (allow to replace one requested container with another)
# container: ""
# # no repository matching required
# project: ""
# }
# },
# ]
}
# set the OS environments based on the Task's Environment section before launching the Task process.
@@ -179,6 +233,7 @@ agent {
hide_docker_command_env_vars {
enabled: true
extra_keys: []
parse_embedded_urls: true
}
# allow to set internal mount points inside the docker,
@@ -190,14 +245,14 @@ agent {
# pip_cache: "/root/.cache/pip"
# poetry_cache: "/root/.cache/pypoetry"
# vcs_cache: "/root/.clearml/vcs-cache"
# venv_build: "/root/.clearml/venvs-builds"
# venv_build: "~/.clearml/venvs-builds"
# pip_download: "/root/.clearml/pip-download-cache"
# }
# Name docker containers created by the daemon using the following string format (supported from Docker 0.6.5)
# Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 charaters)
# Note: resulting name must start with an alpha-numeric character and
# continue with a alpha-numeric characters, underscores (_), dots (.) and/or dashes (-)
# Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 characters)
# Note: resulting name must start with an alphanumeric character and
# continue with alphanumeric characters, underscores (_), dots (.) and/or dashes (-)
# docker_container_name_format: "clearml-id-{task_id}-{rand_string:.8}"
}
@@ -270,6 +325,11 @@ sdk {
key: ""
secret: ""
region: ""
# Or enable credentials chain to let Boto3 pick the right credentials.
# This includes picking credentials from environment variables,
# credential file and IAM role using metadata service.
# Refer to the latest Boto3 docs
use_credentials_chain: false
credentials: [
# specifies key/secret credentials to use when handling s3 urls (read or write)
@@ -285,6 +345,7 @@ sdk {
# secret: "12345678"
# multipart: false
# secure: false
# verify: /path/to/ca/bundle.crt OR false to not verify
# }
]
}
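When use_credentials_chain is enabled, the agent does not pass explicit keys and lets Boto3 resolve credentials through its default chain (environment variables, shared credentials file, IAM role via the metadata service). A sketch of what this amounts to in plain Boto3 (bucket and key names are placeholders):

# Sketch: no aws_access_key_id/aws_secret_access_key passed -- boto3 walks
# its default credential chain (env vars, ~/.aws/credentials, IAM role).
import boto3

s3 = boto3.client("s3")
s3.download_file("my-bucket", "remote/model.bin", "/tmp/model.bin")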
@@ -359,5 +420,45 @@ sdk {
log_stdout: True
}
}
# Apply top-level environment section from configuration into os.environ
apply_environment: true
# Top-level environment section is in the form of:
# environment {
# key: value
# ...
# }
# and is applied to the OS environment as `key=value` for each key/value pair
# Apply top-level files section from configuration into local file system
apply_files: true
# Top-level files section allows auto-generating files at designated paths with predefined contents
# and target format. Options include:
# contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
# format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
# base64-encoded contents string, otherwise ignored
# path: the target file's path, may include ~ and in-place env vars
# target_format: format used to encode contents before writing into the target file. Supported values are json,
# yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
# overwrite: overwrite the target file in case it exists. Default is true.
#
# Example:
# files {
# myfile1 {
# contents: "The quick brown fox jumped over the lazy dog"
# path: "/tmp/fox.txt"
# }
# myjsonfile {
# contents: {
# some {
# nested {
# value: [1, 2, 3, 4]
# }
# }
# }
# path: "/tmp/test.json"
# target_format: json
# }
# }
}
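To make the files-section options concrete, here is a short sketch of what applying a single entry amounts to (field names match the options documented above; the function itself is illustrative, not the SDK implementation):

# Sketch: apply one files-section entry (yaml target_format omitted).
import base64
import json
import os

def apply_file_entry(entry):
    path = os.path.expanduser(os.path.expandvars(entry["path"]))
    if os.path.exists(path) and not entry.get("overwrite", True):
        return
    contents = entry["contents"]
    if entry.get("format") == "base64":
        contents = base64.b64decode(contents)
    target_format = entry.get("target_format", "text")
    if target_format == "bytes":
        with open(path, "wb") as f:
            f.write(contents if isinstance(contents, bytes) else str(contents).encode())
    elif target_format == "json":
        with open(path, "wt") as f:
            json.dump(contents, f)
    else:  # default: plain text
        with open(path, "wt") as f:
            f.write(str(contents))

apply_file_entry({"contents": "The quick brown fox", "path": "/tmp/fox.txt"})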

View File

@@ -65,6 +65,10 @@ def parse_args():
help="Limit the maximum number of pods that this service can run at the same time."
"Should not be used with ports-mode"
)
parser.add_argument(
"--use-owner-token", action="store_true", default=False,
help="Generate and use task owner token for the execution of each task"
)
return parser.parse_args()
@@ -87,7 +91,7 @@ def main():
ssh_port_number=args.ssh_server_port) if args.ssh_server_port else None,
namespace=args.namespace, max_pods_limit=args.max_pods or None,
)
k8s.k8s_daemon(args.queue)
k8s.k8s_daemon(args.queue, use_owner_token=args.use_owner_token)
if __name__ == "__main__":

View File

@@ -8,10 +8,10 @@ psutil>=3.4.2,<5.9.0
pyhocon>=0.3.38,<0.4.0
pyparsing>=2.0.3,<2.5.0
python-dateutil>=2.4.2,<2.9.0
pyjwt>=1.6.4,<2.1.0
pyjwt>=2.4.0,<2.5.0
PyYAML>=3.12,<5.5.0
requests>=2.20.0,<2.26.0
six>=1.11.0,<1.16.0
typing>=3.6.4,<3.8.0
six>=1.13.0,<1.16.0
typing>=3.6.4,<3.8.0 ; python_version < '3.5'
urllib3>=1.21.1,<1.27.0
virtualenv>=16,<21

View File

@@ -61,6 +61,7 @@ setup(
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
'License :: OSI Approved :: Apache Software License',
],