Compare commits

1 Commits

Author: allegroai
SHA1: 7022df2670
Message: Fix pathlib2 six conflict; Version bump to v1.1.2
Date: 2022-02-09 18:08:56 +02:00
52 changed files with 870 additions and 4688 deletions

200
README.md

@@ -8,47 +8,39 @@ ML-Ops scheduler & orchestration solution supporting Linux, macOS and Windows**
[![GitHub license](https://img.shields.io/github/license/allegroai/clearml-agent.svg)](https://img.shields.io/github/license/allegroai/clearml-agent.svg)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/clearml-agent.svg)](https://img.shields.io/pypi/pyversions/clearml-agent.svg)
[![PyPI version shields.io](https://img.shields.io/pypi/v/clearml-agent.svg)](https://img.shields.io/pypi/v/clearml-agent.svg)
[![PyPI Downloads](https://pepy.tech/badge/clearml-agent/month)](https://pypi.org/project/clearml-agent/)
[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/allegroai)](https://artifacthub.io/packages/search?repo=allegroai)
</div>
---
### ClearML-Agent
#### *Formerly known as Trains Agent*
* Run jobs (experiments) on any local or cloud based resource
* Implement optimized resource utilization policies
* Deploy execution environments with either virtualenv or fully docker containerized with zero effort
* Launch-and-Forget service containers
* [Cloud autoscaling](https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler)
* [Customizable cleanup](https://clear.ml/docs/latest/docs/guides/services/cleanup_service)
*
Advanced [pipeline building and execution](https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline)
* Advanced [pipeline building and execution](https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline)
It is a zero configuration fire-and-forget execution agent, providing a full ML/DL cluster solution.
**Full Automation in 5 steps**
1. ClearML Server [self-hosted](https://github.com/allegroai/clearml-server)
or [free tier hosting](https://app.clear.ml)
2. `pip install clearml-agent` ([install](#installing-the-clearml-agent) the ClearML Agent on any GPU machine:
on-premises / cloud / ...)
3. Create a [job](https://github.com/allegroai/clearml/docs/clearml-task.md) or
Add [ClearML](https://github.com/allegroai/clearml) to your code with just 2 lines
4. Change the [parameters](#using-the-clearml-agent) in the UI & schedule for [execution](#using-the-clearml-agent) (or
automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))
1. ClearML Server [self-hosted](https://github.com/allegroai/clearml-server) or [free tier hosting](https://app.community.clear.ml)
2. `pip install clearml-agent` ([install](#installing-the-clearml-agent) the ClearML Agent on any GPU machine: on-premises / cloud / ...)
3. Create a [job](https://github.com/allegroai/clearml/docs/clearml-task.md) or Add [ClearML](https://github.com/allegroai/clearml) to your code with just 2 lines
4. Change the [parameters](#using-the-clearml-agent) in the UI & schedule for [execution](#using-the-clearml-agent) (or automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))
5. :chart_with_downwards_trend: :chart_with_upwards_trend: :eyes: :beer:
"All the Deep/Machine-Learning DevOps your research needs, and then some... Because ain't nobody got time for that"
**Try ClearML now** [Self Hosted](https://github.com/allegroai/clearml-server)
or [Free tier Hosting](https://app.clear.ml)
<a href="https://app.clear.ml"><img src="https://github.com/allegroai/clearml-agent/blob/master/docs/screenshots.gif?raw=true" width="100%"></a>
**Try ClearML now** [Self Hosted](https://github.com/allegroai/clearml-server) or [Free tier Hosting](https://app.community.clear.ml)
<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/clearml-agent/blob/master/docs/screenshots.gif?raw=true" width="100%"></a>
### Simple, Flexible Experiment Orchestration
**The ClearML Agent was built to address the DL/ML R&D DevOps needs:**
* Easily add & remove machines from the cluster
@@ -64,23 +56,18 @@ or [Free tier Hosting](https://app.clear.ml)
*epsilon - Because we are :triangular_ruler: and nothing is really zero work
### Kubernetes Integration (Optional)
We think Kubernetes is awesome, but it should be a choice. We designed `clearml-agent` so you can run bare-metal or
inside a pod with any mix that fits your environment.
Find Dockerfiles in the [docker](./docker) dir and a helm Chart in https://github.com/allegroai/clearml-helm-charts
#### Benefits of integrating existing K8s with ClearML-Agent
We think Kubernetes is awesome, but it should be a choice.
We designed `clearml-agent` so you can run bare-metal or inside a pod with any mix that fits your environment.
#### Benefits of integrating existing K8s with ClearML-Agent
- ClearML-Agent adds the missing scheduling capabilities to K8s
- Allowing for more flexible automation from code
- A programmatic interface for easier learning curve (and debugging)
- Seamless integration with ML/DL experiment manager
- Web UI for customization, scheduling & prioritization of jobs
**Two K8s integration flavours**
- Web UI for customization, scheduling & prioritization of jobs
**Two K8s integration flavours**
- Spin ClearML-Agent as a long-lasting service pod
- use [clearml-agent](https://hub.docker.com/r/allegroai/clearml-agent) docker image
- map docker socket into the pod (soon replaced by [podman](https://github.com/containers/podman))
@@ -88,66 +75,57 @@ Find Dockerfiles in the [docker](./docker) dir and a helm Chart in https://githu
- benefits: full use of the ClearML scheduling, no need to worry about wrong container images / lost pods etc.
- downside: Sibling containers
- Kubernetes Glue, map ClearML jobs directly to K8s jobs
- Run the [clearml-k8s glue](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py) on
a K8s cpu node
- The clearml-k8s glue pulls jobs from the ClearML job execution queue and prepares a K8s job (based on provided
yaml template)
- Inside the pod itself the clearml-agent will install the job (experiment) environment and spin and monitor the
experiment's process
- Run the [clearml-k8s glue](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py) on a K8s cpu node
- The clearml-k8s glue pulls jobs from the ClearML job execution queue and prepares a K8s job (based on provided yaml template)
- Inside the pod itself the clearml-agent will install the job (experiment) environment and spin and monitor the experiment's process
- benefits: Kubernetes full view of all running jobs in the system
- downside: No real scheduling (k8s scheduler), no docker image verification (post-mortem only)
- downside: No real scheduling (k8s scheduler), no docker image verification (post-mortem only)
### Using the ClearML Agent
**Full scale HPC with a click of a button**
The ClearML Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the
job and monitors its progress.
The ClearML Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the job and monitors its progress.
Any 'Draft' experiment can be scheduled for execution by a ClearML agent.
A previously run experiment can be put into 'Draft' state by either of two methods:
* Using the **'Reset'** action from the experiment right-click context menu in the
ClearML UI - This will clear any results and artifacts the previous run had created.
* Using the **'Clone'** action from the experiment right-click context menu in the
ClearML UI - This will create a new 'Draft' experiment with the same configuration as the original experiment.
* Using the **'Reset'** action from the experiment right-click context menu in the ClearML UI - This will clear any
results and artifacts the previous run had created.
* Using the **'Clone'** action from the experiment right-click context menu in the ClearML UI - This will create a new '
Draft' experiment with the same configuration as the original experiment.
An experiment is scheduled for execution using the **'Enqueue'** action from the experiment right-click context menu in
the ClearML UI and selecting the execution queue.
An experiment is scheduled for execution using the **'Enqueue'** action from the experiment
right-click context menu in the ClearML UI and selecting the execution queue.
See [creating an experiment and enqueuing it for execution](#from-scratch).
Once an experiment is enqueued, it will be picked up and executed by a ClearML agent monitoring this queue.
The ClearML UI Workers & Queues page provides ongoing execution information:
- Workers Tab: Monitor your cluster
- Workers Tab: Monitor your cluster
- Review available resources
- Monitor machines statistics (CPU / GPU / Disk / Network)
- Queues Tab:
- Queues Tab:
- Control the scheduling order of jobs
- Cancel or abort job execution
- Move jobs between execution queues
#### What The ClearML Agent Actually Does
The ClearML Agent executes experiments using the following process:
- Create a new virtual environment (or launch the selected docker image)
- Clone the code into the virtual-environment (or inside the docker)
- Install python packages based on the package requirements listed for the experiment
- Special note for PyTorch: The ClearML Agent will automatically select the torch packages based on the CUDA_VERSION
environment variable of the machine
- Execute the code, while monitoring the process
- Log all stdout/stderr in the ClearML UI, including the cloning and installation process, for easy debugging
- Monitor the execution and allow you to manually abort the job using the ClearML UI (or, in the unfortunate case of a
code crash, catch the error and signal the experiment has failed)
- Create a new virtual environment (or launch the selected docker image)
- Clone the code into the virtual-environment (or inside the docker)
- Install python packages based on the package requirements listed for the experiment
- Special note for PyTorch: The ClearML Agent will automatically select the
torch packages based on the CUDA_VERSION environment variable of the machine
- Execute the code, while monitoring the process
- Log all stdout/stderr in the ClearML UI, including the cloning and installation process, for easy debugging
- Monitor the execution and allow you to manually abort the job using the ClearML UI (or, in the unfortunate case of a code crash, catch the error and signal the experiment has failed)
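The steps listed above can be sketched as an ordered command plan. This is a minimal illustration, not the agent's actual implementation: the function name `plan_execution`, its parameters, and the directory names are hypothetical, and the sketch only builds the commands (it does not run them or handle docker mode, caching, or the PyTorch/CUDA package resolution).

```python
import os
import sys


def plan_execution(repo_url, commit, requirements, entry_point,
                   venv_dir="venv", code_dir="code"):
    """Return the ordered list of commands for one job (hypothetical sketch).

    Mirrors the documented flow: create a virtual environment, clone the
    code, install the listed packages, then execute the entry point.
    """
    # Interpreter inside the new virtual environment (POSIX layout assumed;
    # on Windows the interpreter lives under Scripts\python.exe).
    venv_python = os.path.join(os.path.abspath(venv_dir), "bin", "python")

    steps = [
        [sys.executable, "-m", "venv", venv_dir],         # new virtual environment
        ["git", "clone", repo_url, code_dir],             # clone the code
        ["git", "-C", code_dir, "checkout", commit],      # pin the recorded commit
    ]
    if requirements:
        # Install the package requirements recorded for the experiment.
        steps.append([venv_python, "-m", "pip", "install", *requirements])
    # Finally run the experiment script with the environment's interpreter.
    steps.append([venv_python, os.path.join(code_dir, entry_point)])
    return steps
```

A caller would pass each command to `subprocess.run` in order while capturing stdout/stderr, which corresponds to the logging and monitoring steps in the list.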
#### System Design & Flow
<img src="https://github.com/allegroai/clearml-agent/blob/master/docs/clearml_architecture.png" width="100%" alt="clearml-architecture">
#### Installing the ClearML Agent
```bash
@@ -157,7 +135,6 @@ pip install clearml-agent
#### ClearML Agent Usage Examples
Full Interface and capabilities are available with
```bash
clearml-agent --help
clearml-agent daemon --help
@@ -169,8 +146,7 @@ clearml-agent daemon --help
clearml-agent init
```
Note: The ClearML Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default
ClearML Agent cache folder is `~/.clearml`
Note: The ClearML Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default ClearML Agent cache folder is `~/.clearml`
See full details in your configuration file at `~/clearml.conf`
@@ -180,36 +156,29 @@ They are designed to share the same configuration file, see example [here](docs/
#### Running the ClearML Agent
For debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen
```bash
clearml-agent daemon --queue default --foreground
```
For actual service mode, all the stdout will be stored automatically into a temporary file (no need to pipe)
Notice: with `--detached` flag, the *clearml-agent* will be running in the background
```bash
clearml-agent daemon --detached --queue default
```
GPU allocation is controlled via the standard OS environment `NVIDIA_VISIBLE_DEVICES` or `--gpus` flag (or disabled
with `--cpu-only`).
GPU allocation is controlled via the standard OS environment `NVIDIA_VISIBLE_DEVICES` or `--gpus` flag (or disabled with `--cpu-only`).
If no flag is set, and the `NVIDIA_VISIBLE_DEVICES` variable doesn't exist, all GPUs will be allocated for
the `clearml-agent` <br>
If the `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES="none"`, no GPU will be allocated for
the `clearml-agent`
If no flag is set, and the `NVIDIA_VISIBLE_DEVICES` variable doesn't exist, all GPUs will be allocated for the `clearml-agent` <br>
If the `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES` is an empty string (""), no GPU will be allocated for the `clearml-agent`
Example: spin two agents, one per gpu on the same machine:
Notice: with `--detached` flag, the *clearml-agent* will be running in the background
```bash
clearml-agent daemon --detached --gpus 0 --queue default
clearml-agent daemon --detached --gpus 1 --queue default
```
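The GPU allocation rules described above can be sketched as a small decision function. This is an illustration only, not the agent's code: `resolve_gpus` and its parameters are hypothetical names, the precedence of `--gpus` over the environment variable is an assumption, and both the `"none"` and empty-string wordings from the two README versions are treated as "no GPU".

```python
import os


def resolve_gpus(gpus_flag=None, cpu_only=False, env=None):
    """Return "all", "none", or an explicit device list such as "0,1".

    Hypothetical sketch of the documented rules; not the agent's implementation.
    """
    env = os.environ if env is None else env

    if cpu_only:                       # --cpu-only disables GPU allocation
        return "none"
    if gpus_flag is not None:          # --gpus 0,1 (assumed to win over the env var)
        return gpus_flag

    visible = env.get("NVIDIA_VISIBLE_DEVICES")
    if visible is None:                # no flag, no variable: allocate everything
        return "all"
    if visible.strip().lower() in ("", "none"):
        return "none"                  # empty string or "none": no GPU
    return visible                     # otherwise use the variable as given
```

For example, `resolve_gpus(gpus_flag="0")` corresponds to the first agent in the two-agent example above.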
Example: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent
```bash
clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu
clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu
@@ -218,29 +187,23 @@ clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu
##### Starting the ClearML Agent in docker mode
For debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen
```bash
clearml-agent daemon --queue default --docker --foreground
```
For actual service mode, all the stdout will be stored automatically into a file (no need to pipe)
Notice: with `--detached` flag, the *clearml-agent* will be running in the background
```bash
clearml-agent daemon --detached --queue default --docker
```
Example: spin two agents, one per gpu on the same machine, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
docker:
Example: spin two agents, one per gpu on the same machine, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:
```bash
clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
clearml-agent daemon --detached --gpus 1 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
```
Example: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent, with default nvidia/cuda:
10.1-cudnn7-runtime-ubuntu18.04 docker:
Example: spin two agents, pulling from dedicated `dual_gpu` queue, two GPUs per agent, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:
```bash
clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
@@ -251,61 +214,55 @@ clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda
Priority Queues are also supported, example use case:
High priority queue: `important_jobs` Low priority queue: `default`
```bash
clearml-agent daemon --queue important_jobs default
```
The **ClearML Agent** will first try to pull jobs from the `important_jobs` queue, only then it will fetch a job from the `default` queue.
The **ClearML Agent** will first try to pull jobs from the `important_jobs` queue, only then it will fetch a job from
the `default` queue.
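The priority behaviour described above amounts to scanning the queues in the order given on the command line and taking the first available job. A minimal sketch, assuming in-memory queues; `pull_next_job` and the `queues` mapping are hypothetical and stand in for the server-side queues the agent actually polls:

```python
from collections import deque


def pull_next_job(queues, priority_order):
    """Return (queue_name, job) from the highest-priority non-empty queue.

    `queues` maps queue names to deques of pending jobs; `priority_order`
    lists queue names from highest to lowest priority. Returns None when
    every queue is empty. Hypothetical sketch, not the agent's code.
    """
    for name in priority_order:
        pending = queues.get(name)
        if pending:                      # skip missing or empty queues
            return name, pending.popleft()
    return None


queues = {
    "important_jobs": deque(["job-A"]),
    "default": deque(["job-B", "job-C"]),
}
order = ["important_jobs", "default"]
first = pull_next_job(queues, order)     # taken from important_jobs
second = pull_next_job(queues, order)    # important_jobs is empty, falls back to default
```

A job in `default` is only fetched once `important_jobs` has nothing pending, which matches the order of the queue names passed to `--queue`.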
Adding queues, managing job order within a queue and moving jobs between queues, is available using the Web UI, see
example on our [free server](https://app.clear.ml/workers-and-queues/queues)
Adding queues, managing job order within a queue and moving jobs between queues, is available using the Web UI, see example on our [free server](https://app.community.clear.ml/workers-and-queues/queues)
##### Stopping the ClearML Agent
To stop a **ClearML Agent** running in the background, run the same command line used to start the agent with `--stop`
appended. For example, to stop the first of the above shown same machine, single gpu agents:
To stop a **ClearML Agent** running in the background, run the same command line used to start the agent with `--stop` appended.
For example, to stop the first of the above shown same machine, single gpu agents:
```bash
clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --stop
```
### How do I create an experiment on the ClearML Server? <a name="from-scratch"></a>
* Integrate [ClearML](https://github.com/allegroai/clearml) with your code
* Execute the code on your machine (Manually / PyCharm / Jupyter Notebook)
* As your code is running, **ClearML** creates an experiment logging all the necessary execution information:
- Git repository link and commit ID (or an entire jupyter notebook)
- Git diff (we're not saying you never commit and push, but still...)
- Python packages used by your code (including specific versions used)
- Hyper-Parameters
- Input Artifacts
- Git repository link and commit ID (or an entire jupyter notebook)
- Git diff (we're not saying you never commit and push, but still...)
- Python packages used by your code (including specific versions used)
- Hyper-Parameters
- Input Artifacts
You now have a 'template' of your experiment with everything required for automated execution
* In the ClearML UI, Right-click on the experiment and select 'clone'. A copy of your experiment will be created.
* In the ClearML UI, Right click on the experiment and select 'clone'. A copy of your experiment will be created.
* You now have a new draft experiment cloned from your original experiment, feel free to edit it
- Change the Hyper-Parameters
- Switch to the latest code base of the repository
- Update package versions
- Select a specific docker image to run in (see docker execution mode section)
- Or simply change nothing to run the same experiment again...
- Change the Hyper-Parameters
- Switch to the latest code base of the repository
- Update package versions
- Select a specific docker image to run in (see docker execution mode section)
- Or simply change nothing to run the same experiment again...
* Schedule the newly created experiment for execution: Right-click the experiment and select 'enqueue'
### ClearML-Agent Services Mode <a name="services"></a>
ClearML-Agent Services is a special mode of ClearML-Agent that provides the ability to launch long-lasting jobs that
previously had to be executed on local / dedicated machines. It allows a single agent to launch multiple dockers (Tasks)
for different use cases. To name a few use cases, auto-scaler service (spinning instances when the need arises and the
budget allows), Controllers (Implementing pipelines and more sophisticated DevOps logic), Optimizer (such as
Hyper-parameter Optimization or sweeping), and Application (such as interactive Bokeh apps for increased data
transparency)
ClearML-Agent Services is a special mode of ClearML-Agent that provides the ability to launch long-lasting jobs
that previously had to be executed on local / dedicated machines. It allows a single agent to
launch multiple dockers (Tasks) for different use cases. To name a few use cases, auto-scaler service (spinning instances
when the need arises and the budget allows), Controllers (Implementing pipelines and more sophisticated DevOps logic),
Optimizer (such as Hyper-parameter Optimization or sweeping), and Application (such as interactive Bokeh apps for
increased data transparency)
ClearML-Agent Services mode will spin **any** task enqueued into the specified queue. Every task launched by
ClearML-Agent Services will be registered as a new node in the system, providing tracking and transparency capabilities.
Currently clearml-agent in services-mode supports cpu only configuration. ClearML-agent services mode can be launched
alongside GPU agents.
ClearML-Agent Services mode will spin **any** task enqueued into the specified queue.
Every task launched by ClearML-Agent Services will be registered as a new node in the system,
providing tracking and transparency capabilities.
Currently clearml-agent in services-mode supports cpu only configuration. ClearML-agent services mode can be launched alongside GPU agents.
```bash
clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
@@ -313,27 +270,22 @@ clearml-agent daemon --services-mode --detached --queue services --create-queue
**Note**: It is the user's responsibility to make sure the proper tasks are pushed into the specified queue.
### AutoML and Orchestration Pipelines <a name="automl-pipes"></a>
The ClearML Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the ClearML package.
The ClearML Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the
ClearML package.
Sample AutoML & Orchestration examples can be found in the
ClearML [example/automation](https://github.com/allegroai/clearml/tree/master/examples/automation) folder.
Sample AutoML & Orchestration examples can be found in the ClearML [example/automation](https://github.com/allegroai/clearml/tree/master/examples/automation) folder.
AutoML examples
- [Toy Keras training experiment](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
- [Toy Keras training experiment](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
- In order to create an experiment-template in the system, this code must be executed once manually
- [Random Search over the above Keras experiment-template](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)
- This example will create multiple copies of the Keras experiment-template, with different hyper-parameter
combinations
- [Random Search over the above Keras experiment-template](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)
- This example will create multiple copies of the Keras experiment-template, with different hyper-parameter combinations
Experiment Pipeline examples
- [First step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)
- [First step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)
- This example will "process data", and once done, will launch a copy of the 'second step' experiment-template
- [Second step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py)
- [Second step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py)
- In order to create an experiment-template in the system, this code must be executed once manually
### License

View File

@@ -12,7 +12,7 @@ from clearml_agent.definitions import FileBuffering, CONFIG_FILE
from clearml_agent.helper.base import reverse_home_folder_expansion, chain_map, named_temporary_file
from clearml_agent.helper.process import ExitStatus
from . import interface, session, definitions, commands
from .errors import ConfigFileNotFound, Sigterm, APIError, CustomBuildScriptFailed
from .errors import ConfigFileNotFound, Sigterm, APIError
from .helper.trace import PackageTrace
from .interface import get_parser
@@ -44,8 +44,6 @@ def run_command(parser, args, command_name):
debug = command._session.debug_mode
func = getattr(command, command_name)
return func(**args_dict)
except CustomBuildScriptFailed as e:
command_class.exit(e.message, e.errno)
except ConfigFileNotFound:
message = 'Cannot find configuration file in "{}".\n' \
'To create a configuration file, run:\n' \

View File

@@ -11,15 +11,8 @@
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
# **Notice**: GitHub personal token is equivalent to password, you can put it directly into `git_pass`
# To learn how to generate git token GitHub/Bitbucket/GitLab:
# https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
# https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
# https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
# git_user: ""
# git_pass: ""
# Limit credentials to a single domain, for example: github.com,
# all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
# git_host: ""
# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
@@ -37,22 +30,6 @@
# specific python version and the system supports multiple python the agent will use the requested python version)
# ignore_requested_python_version: true
# Force the root folder of the git repository (instead of the working directory) into the PYTHONPATH
# default false, only the working directory will be added to the PYTHONPATH
# force_git_root_python_path: false
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
# it solves passing user/token to git submodules.
# this is a safer way to ensure multiple users using the same repository will
# not accidentally leak credentials
# Only supported on Linux systems, it will be the default in future releases
# enable_git_ask_pass: false
# in docker mode, if container's entrypoint automatically activated a virtual environment
# use the activated virtual environment and install everything there
# set to False to disable, and always create a new venv inheriting from the system_site_packages
# docker_use_activated_venv: true
# select python package manager:
# currently supported: pip, conda and poetry
# if "pip" or "conda" are used, the agent installs the required packages
@@ -65,10 +42,8 @@
# supported options: pip, conda, poetry
type: pip,
# specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
pip_version: "<21",
# specify poetry version to use (examples "<2", "==1.1.1", "", empty string will install the latest version)
# poetry_version: "<2",
# specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
pip_version: "<20.2",
# virtual environment inherits packages from system
system_site_packages: false,
@@ -77,7 +52,7 @@
force_upgrade: false,
# additional artifact repositories to use when installing python packages
# extra_index_url: ["https://allegroai.jfrog.io/clearml/api/pypi/public/simple"]
# extra_index_url: ["https://allegroai.jfrog.io/clearmlai/api/pypi/public/simple"]
# additional conda channels to use when installing with conda package manager
conda_channels: ["pytorch", "conda-forge", "defaults", ]
@@ -92,7 +67,7 @@
# set the optional priority packages to be installed before the rest of the required packages,
# In case a package installation fails, the package will be ignored,
# and the virtual environment process will continue
priority_optional_packages: ["pygobject", ]
# priority_optional_packages: ["pygobject", ]
# set the post packages to be installed after all the rest of the required packages
# post_packages: ["horovod", ]
@@ -117,7 +92,7 @@
# minimum required free space to allow for cache entry, disable by passing 0 or negative value
free_space_threshold_gb: 2.0
# unmark to enable virtual environment caching
path: ~/.clearml/venvs-cache
# path: ~/.clearml/venvs-cache
},
# cached git clone folder
@@ -139,12 +114,6 @@
},
translate_ssh: true,
# set "disable_ssh_mount: true" to disable the automatic mount of ~/.ssh folder into the docker containers
# default is false, automatically mounts ~/.ssh
# Must be set to True if using "clearml-session" with this agent!
# disable_ssh_mount: false
# reload configuration file every daemon execution
reload_config: false,
@@ -187,7 +156,7 @@
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"
image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]
@@ -217,8 +186,8 @@
# default is True, report a single \r line in a sequence of consecutive lines, per 5 seconds.
# suppress_carriage_return: true
# CUDA versions used for Conda setup & solving PyTorch wheel packages
# Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
# cuda versions used for solving pytorch wheel packages
# should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
# cuda_version: 10.1
# cudnn_version: 7.6
@@ -232,31 +201,26 @@
hide_docker_command_env_vars {
enabled: true
extra_keys: []
parse_embedded_urls: true
}
# Maximum execution time (in seconds) for Task's abort function call
abort_callback_max_timeout: 1800
# allow to set internal mount points inside the docker,
# especially useful for non-root docker container images.
docker_internal_mounts {
sdk_cache: "/clearml_agent_cache"
apt_cache: "/var/cache/apt/archives"
ssh_folder: "~/.ssh"
ssh_ro_folder: "/.ssh"
ssh_folder: "/root/.ssh"
pip_cache: "/root/.cache/pip"
poetry_cache: "/root/.cache/pypoetry"
vcs_cache: "/root/.clearml/vcs-cache"
venv_build: "~/.clearml/venvs-builds"
venv_build: "/root/.clearml/venvs-builds"
pip_download: "/root/.clearml/pip-download-cache"
}
# Name docker containers created by the daemon using the following string format (supported from Docker 0.6.5)
# Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 characters)
# Note: resulting name must start with an alphanumeric character and
# continue with alphanumeric characters, underscores (_), dots (.) and/or dashes (-)
# docker_container_name_format: "clearml-id-{task_id}-{rand_string:.8}"
# Note: resulting name must start with an alphanumeric character and continue with alphanumeric characters,
# underscores (_), dots (.) and/or dashes (-)
#docker_container_name_format: "clearml-id-{task_id}-{rand_string:.8}"
# Apply top-level environment section from configuration into os.environ
apply_environment: true
@@ -297,39 +261,4 @@
# target_format: json
# }
# }
# Specifies a custom environment setup script to be executed instead of installing a virtual environment.
# If provided, this script is executed following Git cloning. The script command may include environment variables,
# which are expanded before execution (e.g. "$CLEARML_GIT_ROOT/script.sh").
# The script can also be specified using the CLEARML_AGENT_CUSTOM_BUILD_SCRIPT environment variable.
#
# When running the script, the following environment variables will be set:
# - CLEARML_CUSTOM_BUILD_TASK_CONFIG_JSON: specifies a path to a temporary file containing the complete task
# contents in JSON format
# - CLEARML_TASK_SCRIPT_ENTRY: task entrypoint script as defined in the task's script section
# - CLEARML_TASK_WORKING_DIR: task working directory as defined in the task's script section
# - CLEARML_VENV_PATH: path to the agent's default virtual environment path (as defined in the configuration)
# - CLEARML_GIT_ROOT: path to the cloned Git repository
# - CLEARML_CUSTOM_BUILD_OUTPUT: a path to a non-existing file that may be created by the script. If created,
# this file must be in the following JSON format:
# ```json
# {
# "binary": "/absolute/path/to/python-executable",
# "entry_point": "/absolute/path/to/task-entrypoint-script",
# "working_dir": "/absolute/path/to/task-working/dir"
# }
# ```
# If provided, the agent will use these instead of the predefined task script section to execute the task and will
# skip virtual environment creation.
#
# In case the custom script returns with a non-zero exit code, the agent will fail with the same exit code.
# In case the custom script is specified but does not exist, or if the custom script does not write valid content
# into the file specified in CLEARML_CUSTOM_BUILD_OUTPUT, the agent will emit a warning and continue with the
# standard flow.
custom_build_script: ""
# Crash on exception: by default when encountering an exception while running a task,
# the agent will catch the exception, log it and continue running.
# Set this to `true` to propagate exceptions and crash the agent.
# crash_on_exception: true
}
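
A minimal sketch of a custom build script that follows the contract described in the `custom_build_script` comment above. The environment variable names and the three JSON keys are taken from that comment. The helper name `write_build_output`, the use of `sys.executable` as the interpreter, and the assumption that the working directory and entry point are relative to the cloned repository are illustrative only and are not confirmed by this diff.

```python
import json
import os
import sys


def write_build_output(environ=None):
    """Write the JSON file the agent reads after the custom build script runs.

    Hypothetical helper: only the variable names and the JSON keys come from
    the configuration comment; the path handling is an assumption.
    """
    environ = os.environ if environ is None else environ
    output_path = environ["CLEARML_CUSTOM_BUILD_OUTPUT"]
    git_root = environ.get("CLEARML_GIT_ROOT", "")
    # Assumption: working dir and entry point are relative to the cloned repository
    working_dir = os.path.join(git_root, environ.get("CLEARML_TASK_WORKING_DIR", ""))
    entry_point = os.path.join(working_dir, environ.get("CLEARML_TASK_SCRIPT_ENTRY", ""))
    result = {
        "binary": sys.executable,  # assumption: reuse the interpreter running this script
        "entry_point": os.path.abspath(entry_point),
        "working_dir": os.path.abspath(working_dir),
    }
    with open(output_path, "w") as f:
        json.dump(result, f)
    return result


if __name__ == "__main__":
    # Only act when the agent-provided variable is present
    if "CLEARML_CUSTOM_BUILD_OUTPUT" in os.environ:
        write_build_output()
```

A script like this would exit with code 0 on success; per the comment above, a non-zero exit code makes the agent fail with the same code.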

View File

@@ -28,9 +28,6 @@
pool_maxsize: 512
pool_connections: 512
# Override the default http method, use "put" if working behind GCP load balancer (default: "get")
# default_method: "get"
}
auth {

View File

@@ -4,7 +4,7 @@ import re
import attr
import six
from clearml_agent.external import pyhocon
import pyhocon
from .action import Action

View File

@@ -15,17 +15,6 @@ ENV_NO_DEFAULT_SERVER = EnvEntry("CLEARML_NO_DEFAULT_SERVER", "TRAINS_NO_DEFAULT
ENV_DISABLE_VAULT_SUPPORT = EnvEntry('CLEARML_AGENT_DISABLE_VAULT_SUPPORT', type=bool)
ENV_ENABLE_ENV_CONFIG_SECTION = EnvEntry('CLEARML_AGENT_ENABLE_ENV_CONFIG_SECTION', type=bool)
ENV_ENABLE_FILES_CONFIG_SECTION = EnvEntry('CLEARML_AGENT_ENABLE_FILES_CONFIG_SECTION', type=bool)
ENV_VENV_CONFIGURED = EnvEntry('VIRTUAL_ENV', type=str)
ENV_PROPAGATE_EXITCODE = EnvEntry("CLEARML_AGENT_PROPAGATE_EXITCODE", type=bool, default=False)
ENV_INITIAL_CONNECT_RETRY_OVERRIDE = EnvEntry(
'CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE', default=True, converter=safe_text_to_bool
)
"""
Experimental option to set the request method for all API requests and auth login.
This could be useful when GET requests with payloads are blocked by a server as
POST requests can be used instead.
However, this has not been rigorously tested and may have unintended consequences.
"""
ENV_API_DEFAULT_REQ_METHOD = EnvEntry("CLEARML_API_DEFAULT_REQ_METHOD", default="GET")
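
A minimal sketch of how the default request method could be resolved and validated, mirroring the precedence visible in this diff (environment variable first, then the `api.http.default_method` setting, then `get`). The helper `resolve_default_method` and the use of plain dictionaries in place of the agent's environment and configuration objects are assumptions made for illustration.

```python
import os

ALLOWED_METHODS = ("GET", "POST", "PUT")


def resolve_default_method(environ=None, config=None):
    """Pick the default request method: environment first, then config, then "get".

    Hypothetical helper; `config` is a flat dict here, not the agent's config object.
    """
    environ = os.environ if environ is None else environ
    config = config or {}
    value = (
        environ.get("CLEARML_API_DEFAULT_REQ_METHOD")
        or config.get("api.http.default_method")
        or "get"
    )
    value = str(value).strip()
    if value.upper() not in ALLOWED_METHODS:
        raise ValueError(
            "default request method must be one of {} (any case is allowed)".format(
                ", ".join(ALLOWED_METHODS)
            )
        )
    return value
```

Validating in one place keeps the accepted values and the error message in sync.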

View File

@@ -5,18 +5,10 @@ import six
from .apimodel import ApiModel
from .datamodel import DataModel
from .defs import ENV_API_DEFAULT_REQ_METHOD
if ENV_API_DEFAULT_REQ_METHOD.get().upper() not in ("GET", "POST", "PUT"):
raise ValueError(
"CLEARML_API_DEFAULT_REQ_METHOD environment variable must be 'get', 'post' or 'put' (any case is allowed)."
)
class Request(ApiModel):
def_method = ENV_API_DEFAULT_REQ_METHOD.get(default="get")
_method = ENV_API_DEFAULT_REQ_METHOD.get(default="get")
_method = 'get'
def __init__(self, **kwargs):
if kwargs:

View File

@@ -9,15 +9,13 @@ from typing import Optional
import jwt
import requests
import six
from pyhocon import ConfigTree, ConfigFactory
from requests.auth import HTTPBasicAuth
from six.moves.urllib.parse import urlparse, urlunparse
from clearml_agent.external.pyhocon import ConfigTree, ConfigFactory
from .callresult import CallResult
from .defs import (
ENV_VERBOSE, ENV_HOST, ENV_ACCESS_KEY, ENV_SECRET_KEY, ENV_WEB_HOST, ENV_FILES_HOST, ENV_AUTH_TOKEN,
ENV_NO_DEFAULT_SERVER, ENV_DISABLE_VAULT_SUPPORT, ENV_INITIAL_CONNECT_RETRY_OVERRIDE, ENV_API_DEFAULT_REQ_METHOD, )
from .defs import ENV_VERBOSE, ENV_HOST, ENV_ACCESS_KEY, ENV_SECRET_KEY, ENV_WEB_HOST, ENV_FILES_HOST, ENV_AUTH_TOKEN, \
ENV_NO_DEFAULT_SERVER, ENV_DISABLE_VAULT_SUPPORT, ENV_INITIAL_CONNECT_RETRY_OVERRIDE
from .request import Request, BatchRequest
from .token_manager import TokenManager
from ..config import load
@@ -112,19 +110,6 @@ class Session(TokenManager):
self._logger = logger
self.__auth_token = None
if ENV_API_DEFAULT_REQ_METHOD.get(default=None):
# Make sure we update the config object, so we pass it into the new containers when we map them
self.config.put("api.http.default_method", ENV_API_DEFAULT_REQ_METHOD.get())
# note: the default value of Request.def_method is already set from the OS environment
elif self.config.get("api.http.default_method", None):
def_method = str(self.config.get("api.http.default_method", None)).strip()
if def_method.upper() not in ("GET", "POST", "PUT"):
raise ValueError(
"api.http.default_method variable must be 'get', 'post' or 'put' (any case is allowed)."
)
Request.def_method = def_method
Request._method = Request.def_method
if ENV_AUTH_TOKEN.get(
value_cb=lambda key, value: print("Using environment access token {}=********".format(key))
):
@@ -157,7 +142,7 @@ class Session(TokenManager):
"Could not find host server definition "
"(missing `~/clearml.conf` or Environment CLEARML_API_HOST)\n"
"To get started with ClearML: setup your own `clearml-server`, "
"or create a free account at https://app.clear.ml and run `clearml-agent init`"
"or create a free account at https://app.community.clear.ml and run `clearml-agent init`"
)
self.__host = host.strip("/")
@@ -221,7 +206,7 @@ class Session(TokenManager):
http_retries_config = dict(**http_retries_config)
http_retries_config['connect'] = connect_retries
return http_retries_config, get_http_session_with_retry(config=self.config or None, **http_retries_config)
return http_retries_config, get_http_session_with_retry(**http_retries_config)
def load_vaults(self):
if not self.check_min_api_version("2.15") or self.feature_set == "basic":
@@ -255,18 +240,12 @@ class Session(TokenManager):
except Exception as ex:
print("Failed getting vaults: {}".format(ex))
def verify_feature_set(self, feature_set):
if isinstance(feature_set, str):
feature_set = [feature_set]
if self.feature_set not in feature_set:
raise ValueError('ClearML-server does not support requested feature set {}'.format(feature_set))
def _send_request(
self,
service,
action,
version=None,
method=Request.def_method,
method="get",
headers=None,
auth=None,
data=None,
@@ -343,7 +322,7 @@ class Session(TokenManager):
service,
action,
version=None,
method=Request.def_method,
method="get",
headers=None,
data=None,
json=None,
@@ -386,7 +365,7 @@ class Session(TokenManager):
headers=None,
data=None,
json=None,
method=Request.def_method,
method="get",
):
"""
Send a raw batch API request. Batch requests always use application/json-lines content type.
@@ -630,7 +609,6 @@ class Session(TokenManager):
try:
data = {"expiration_sec": exp} if exp else {}
res = self._send_request(
method=Request.def_method,
service="auth",
action="login",
auth=auth,

View File

@@ -7,8 +7,10 @@ import sys
from os.path import expanduser
from typing import Any
import pyhocon
import six
from pathlib2 import Path
from pyhocon import ConfigTree, ConfigFactory
from pyparsing import (
ParseFatalException,
ParseException,
@@ -16,9 +18,6 @@ from pyparsing import (
ParseSyntaxException,
)
from clearml_agent.external import pyhocon
from clearml_agent.external.pyhocon import ConfigTree, ConfigFactory
from .defs import (
Environment,
DEFAULT_CONFIG_FOLDER,
@@ -192,20 +191,16 @@ class Config(object):
config, self._read_extra_env_config_values(), copy_trees=True
)
config = self.resolve_override_configs(config)
if self._overrides_configs:
config = functools.reduce(
lambda cfg, override: ConfigTree.merge_configs(cfg, override, copy_trees=True),
self._overrides_configs,
config,
)
config["env"] = env
return config
def resolve_override_configs(self, initial=None):
if not self._overrides_configs:
return initial
return functools.reduce(
lambda cfg, override: ConfigTree.merge_configs(cfg, override, copy_trees=True),
self._overrides_configs,
initial or ConfigTree(),
)
def _read_extra_env_config_values(self) -> ConfigTree:
""" Loads extra configuration from environment-injected values """
result = ConfigTree()
@@ -294,9 +289,6 @@ class Config(object):
)
return value
def put(self, key, value):
self._config.put(key, value)
def to_dict(self):
return self._config.as_plain_ordered_dict()
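
The override folding in `resolve_override_configs` above can be sketched with plain dictionaries. This is an illustration of the `functools.reduce` pattern only: `merge_dicts` is a hypothetical stand-in for `ConfigTree.merge_configs` and does not claim to reproduce pyhocon's merge semantics.

```python
import copy
import functools


def merge_dicts(base, override):
    """Recursively merge `override` into a deep copy of `base`; override wins on conflicts."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_dicts(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_overrides(overrides, initial=None):
    """Fold a list of override dicts onto `initial`, left to right."""
    if not overrides:
        return initial
    return functools.reduce(merge_dicts, overrides, initial or {})
```

Because each step copies its input, later overrides take precedence without mutating the initial configuration.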

View File

@@ -14,14 +14,6 @@ except ImportError:
ConverterType = TypeVar("ConverterType", bound=Callable[[Any], Any])
def text_to_int(value, default=0):
# type: (Any, int) -> int
try:
return int(value)
except (ValueError, TypeError):
return default
def base64_to_text(value):
# type: (Any) -> Text
return base64.b64decode(value).decode("utf-8")

View File

@@ -4,7 +4,7 @@ from os.path import expandvars, expanduser
from pathlib import Path
from typing import List, TYPE_CHECKING
from clearml_agent.external.pyhocon import HOCONConverter, ConfigTree
from pyhocon import HOCONConverter, ConfigTree
if TYPE_CHECKING:
from .config import Config

View File

@@ -347,7 +347,7 @@ class ServiceCommandSection(BaseCommandSection):
except AttributeError:
raise NameResolutionError('Name resolution unavailable for {}'.format(service))
request = request_cls.from_dict(dict(name=re.escape(name), only_fields=['name', 'id']))
request = request_cls.from_dict(dict(name=name, only_fields=['name', 'id']))
# from_dict will ignore unrecognised keyword arguments - not all GetAll's have only_fields
response = getattr(self._session.send_api(request), service)
matches = [db_object for db_object in response if name.lower() == db_object.name.lower()]
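
A small sketch of why escaping the name can matter, assuming the backend interprets the `name` filter as a regular expression (this diff does not confirm that). The function `find_by_name` is hypothetical and emulates the server-side filter locally with `re.search`, followed by the same case-insensitive exact comparison used above.

```python
import re


def find_by_name(names, query, escape=True):
    """Emulate a regex-based name filter followed by a case-insensitive exact match."""
    pattern = re.escape(query) if escape else query
    # Assumed server-side step: keep names matching the pattern
    candidates = [n for n in names if re.search(pattern, n, flags=re.IGNORECASE)]
    # Client-side step: keep only exact (case-insensitive) matches
    return [n for n in candidates if n.lower() == query.lower()]
```

Under that assumption, a name containing regex metacharacters such as parentheses would only be found when the query is escaped.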

View File

@@ -1,20 +1,20 @@
from __future__ import print_function
from six.moves import input
from pyhocon import ConfigFactory, ConfigMissingException
from pathlib2 import Path
from six.moves.urllib.parse import urlparse
from clearml_agent.external.pyhocon import ConfigFactory, ConfigMissingException
from clearml_agent.backend_api.session import Session
from clearml_agent.backend_api.session.defs import ENV_HOST
from clearml_agent.backend_config.defs import LOCAL_CONFIG_FILES
description = """
Please create new clearml credentials through the settings page in your `clearml-server` web app,
or create a free account at https://app.clear.ml/settings/webapp-configuration
Please create new clearml credentials through the profile page in your `clearml-server` web app,
or create a free account at https://app.community.clear.ml/profile
In the settings > workspace page, press "Create new credentials", then press "Copy to clipboard".
In the profile page, press "Create new credentials", then press "Copy to clipboard".
Paste copied configuration here:
"""
@@ -27,9 +27,9 @@ except Exception:
host_description = """
Editing configuration file: {CONFIG_FILE}
Enter the url of the clearml-server's Web service, for example: {HOST} or https://app.clear.ml
Enter the url of the clearml-server's Web service, for example: {HOST}
""".format(
CONFIG_FILE=LOCAL_CONFIG_FILES[-1],
CONFIG_FILE=LOCAL_CONFIG_FILES[0],
HOST=def_host,
)
@@ -84,7 +84,7 @@ def main():
host = input_url('API Host', api_server)
else:
print(host_description)
host = input_url('WEB Host', 'https://app.clear.ml')
host = input_url('WEB Host', '')
parsed_host = verify_url(host)
api_host, files_host, web_host = parse_host(parsed_host, allow_input=True)
@@ -116,15 +116,9 @@ def main():
print('Enter git username for repository cloning (leave blank for SSH key authentication): [] ', end='')
git_user = input()
if git_user.strip():
print(
"A Git personal token is equivalent to a password. To learn how to generate a token, see:\n"
" GitHub: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token\n" # noqa
" Bitbucket: https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/\n"
" GitLab: https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html\n"
)
print('Enter git personal token for user \'{}\': '.format(git_user), end='')
print('Enter password for user \'{}\': '.format(git_user), end='')
git_pass = input()
print('Git repository cloning will be using user={} token={}'.format(git_user, git_pass))
print('Git repository cloning will be using user={} password={}'.format(git_user, git_pass))
else:
git_user = None
git_pass = None
@@ -163,7 +157,7 @@ def main():
' api_server: %s\n' \
' web_server: %s\n' \
' files_server: %s\n' \
' # Credentials are generated using the webapp, %s/settings\n' \
' # Credentials are generated using the webapp, %s/profile\n' \
' # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY\n' \
' credentials {"access_key": "%s", "secret_key": "%s"}\n' \
'}\n\n' % (api_host, web_host, files_host,

View File

@@ -3,6 +3,8 @@ from __future__ import print_function
import json
import time
from future.builtins import super
from clearml_agent.commands.base import ServiceCommandSection
from clearml_agent.helper.base import return_list

View File

@@ -1,168 +0,0 @@
import json
import re
import shlex
from clearml_agent.backend_api.session import Request
from clearml_agent.helper.package.requirements import (
RequirementsManager, MarkerRequirement,
compare_version_rules, )
def resolve_default_container(session, task_id, container_config):
container_lookup = session.config.get('agent.default_docker.match_rules', None)
if not session.check_min_api_version("2.13") or not container_lookup:
return container_config
# check backend support before sending any more requests (because they will fail and crash the Task)
try:
session.verify_feature_set('advanced')
except ValueError:
return container_config
result = session.send_request(
service='tasks',
action='get_all',
version='2.14',
json={'id': [task_id],
'only_fields': ['script.requirements', 'script.binary',
'script.repository', 'script.branch',
'project', 'container'],
'search_hidden': True},
method=Request.def_method,
async_enable=False,
)
try:
task_info = result.json()['data']['tasks'][0] if result.ok else {}
except (ValueError, TypeError):
return container_config
from clearml_agent.external.requirements_parser.requirement import Requirement
# store the task's repository
repository = task_info.get('script', {}).get('repository') or ''
branch = task_info.get('script', {}).get('branch') or ''
binary = task_info.get('script', {}).get('binary') or ''
requested_container = task_info.get('container', {})
# get project full path
project_full_name = ''
if task_info.get('project', None):
result = session.send_request(
service='projects',
action='get_all',
version='2.13',
json={
'id': [task_info.get('project')],
'only_fields': ['name'],
},
method=Request.def_method,
async_enable=False,
)
try:
if result.ok:
project_full_name = result.json()['data']['projects'][0]['name'] or ''
except (ValueError, TypeError):
pass
task_packages_lookup = {}
for entry in container_lookup:
match = entry.get('match', None)
if not match:
continue
if match.get('project', None):
# noinspection PyBroadException
try:
if not re.search(match.get('project', None), project_full_name):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('project', None), entry))
continue
if match.get('script.repository', None):
# noinspection PyBroadException
try:
if not re.search(match.get('script.repository', None), repository):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('script.repository', None), entry))
continue
if match.get('script.branch', None):
# noinspection PyBroadException
try:
if not re.search(match.get('script.branch', None), branch):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('script.branch', None), entry))
continue
if match.get('script.binary', None):
# noinspection PyBroadException
try:
if not re.search(match.get('script.binary', None), binary):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('script.binary', None), entry))
continue
if match.get('container', None):
# noinspection PyBroadException
try:
if not re.search(match.get('container', None), requested_container.get('image', '')):
continue
except Exception:
print('Failed parsing regular expression \"{}\" in rule: {}'.format(
match.get('container', None), entry))
continue
matched = True
for req_section in ['script.requirements.pip', 'script.requirements.conda']:
if not match.get(req_section, None):
continue
match_pip_reqs = [MarkerRequirement(Requirement.parse('{} {}'.format(k, v)))
for k, v in match.get(req_section, None).items()]
if not task_packages_lookup.get(req_section):
req_section_parts = req_section.split('.')
task_packages_lookup[req_section] = \
RequirementsManager.parse_requirements_section_to_marker_requirements(
requirements=task_info.get(req_section_parts[0], {}).get(
req_section_parts[1], {}).get(req_section_parts[2], None)
)
matched_all_reqs = True
for mr in match_pip_reqs:
matched_req = False
for pr in task_packages_lookup[req_section]:
if mr.req.name != pr.req.name:
continue
if compare_version_rules(mr.specs, pr.specs):
matched_req = True
break
if not matched_req:
matched_all_reqs = False
break
# if we have a match, check the next section
if matched_all_reqs:
continue
# no match, stop
matched = False
break
if matched:
if not container_config.get('container'):
container_config['container'] = entry.get('image', None)
if not container_config.get('arguments'):
container_config['arguments'] = entry.get('arguments', None)
container_config['arguments'] = shlex.split(str(container_config.get('arguments') or '').strip())
print('Matching default container with rule:\n{}'.format(json.dumps(entry)))
return container_config
return container_config
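
The rule matching in `resolve_default_container` above can be condensed into a short sketch. It covers only the regular-expression part of the rules and leaves out the backend requests and the package version comparison. The function `first_matching_rule` and the flat `task_fields` dictionary are illustrative assumptions, not part of the agent.

```python
import re


def first_matching_rule(rules, task_fields):
    """Return the first rule whose every `match` pattern is found in the task's fields."""
    for rule in rules:
        match = rule.get("match")
        if not match:
            continue  # rules without a match section are skipped, as in the diff
        try:
            matched = all(
                re.search(pattern, task_fields.get(field) or "")
                for field, pattern in match.items()
            )
        except re.error:
            # An invalid pattern disqualifies the rule instead of aborting the lookup
            print("Failed parsing regular expression in rule: {}".format(rule))
            continue
        if matched:
            return rule
    return None
```

Rules are evaluated in order, so more specific rules would be listed before general ones.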

File diff suppressed because it is too large

View File

@@ -1,6 +1,6 @@
import six
from pyhocon import ConfigTree
from clearml_agent.external.pyhocon import ConfigTree
import six
from clearml_agent.helper.base import Singleton

View File

@@ -87,7 +87,6 @@ ENVIRONMENT_CONFIG = {
"agent.cpu_only": EnvironmentConfig(
names=("CLEARML_CPU_ONLY", "TRAINS_CPU_ONLY", "CPU_ONLY"), type=bool
),
"agent.crash_on_exception": EnvironmentConfig("CLEAMRL_AGENT_CRASH_ON_EXCEPTION", type=bool),
"sdk.aws.s3.key": EnvironmentConfig("AWS_ACCESS_KEY_ID"),
"sdk.aws.s3.secret": ENV_AWS_SECRET_KEY,
"sdk.aws.s3.region": EnvironmentConfig("AWS_DEFAULT_REGION"),
@@ -127,7 +126,6 @@ DEFAULT_VENV_UPDATE_URL = (
"https://raw.githubusercontent.com/Yelp/venv-update/v3.2.4/venv_update.py"
)
WORKING_REPOSITORY_DIR = "task_repository"
WORKING_STANDALONE_DIR = "code"
DEFAULT_VCS_CACHE = normalize_path(CONFIG_DIR, "vcs-cache")
PIP_EXTRA_INDICES = [
]
@@ -136,7 +134,6 @@ ENV_DOCKER_IMAGE = EnvironmentConfig('CLEARML_DOCKER_IMAGE', 'TRAINS_DOCKER_IMAG
ENV_WORKER_ID = EnvironmentConfig('CLEARML_WORKER_ID', 'TRAINS_WORKER_ID')
ENV_WORKER_TAGS = EnvironmentConfig('CLEARML_WORKER_TAGS')
ENV_AGENT_SKIP_PIP_VENV_INSTALL = EnvironmentConfig('CLEARML_AGENT_SKIP_PIP_VENV_INSTALL')
ENV_AGENT_SKIP_PYTHON_ENV_INSTALL = EnvironmentConfig('CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL', type=bool)
ENV_DOCKER_SKIP_GPUS_FLAG = EnvironmentConfig('CLEARML_DOCKER_SKIP_GPUS_FLAG', 'TRAINS_DOCKER_SKIP_GPUS_FLAG')
ENV_AGENT_GIT_USER = EnvironmentConfig('CLEARML_AGENT_GIT_USER', 'TRAINS_AGENT_GIT_USER')
ENV_AGENT_GIT_PASS = EnvironmentConfig('CLEARML_AGENT_GIT_PASS', 'TRAINS_AGENT_GIT_PASS')
@@ -149,42 +146,6 @@ ENV_DOCKER_HOST_MOUNT = EnvironmentConfig('CLEARML_AGENT_K8S_HOST_MOUNT', 'CLEAR
'TRAINS_AGENT_K8S_HOST_MOUNT', 'TRAINS_AGENT_DOCKER_HOST_MOUNT')
ENV_VENV_CACHE_PATH = EnvironmentConfig('CLEARML_AGENT_VENV_CACHE_PATH')
ENV_EXTRA_DOCKER_ARGS = EnvironmentConfig('CLEARML_AGENT_EXTRA_DOCKER_ARGS', type=list)
ENV_DEBUG_INFO = EnvironmentConfig('CLEARML_AGENT_DEBUG_INFO')
ENV_CHILD_AGENTS_COUNT_CMD = EnvironmentConfig('CLEARML_AGENT_CHILD_AGENTS_COUNT_CMD')
ENV_DOCKER_ARGS_FILTERS = EnvironmentConfig('CLEARML_AGENT_DOCKER_ARGS_FILTERS')
ENV_DOCKER_ARGS_HIDE_ENV = EnvironmentConfig('CLEARML_AGENT_DOCKER_ARGS_HIDE_ENV')
ENV_CUSTOM_BUILD_SCRIPT = EnvironmentConfig('CLEARML_AGENT_CUSTOM_BUILD_SCRIPT')
"""
Specifies a custom environment setup script to be executed instead of installing a virtual environment.
If provided, this script is executed following Git cloning. The script command may include environment variables,
which are expanded before execution (e.g. "$CLEARML_GIT_ROOT/script.sh").
The script can also be specified using the `agent.custom_build_script` configuration setting.
When running the script, the following environment variables will be set:
- CLEARML_CUSTOM_BUILD_TASK_CONFIG_JSON: path to a temporary file containing the complete task
contents in JSON format
- CLEARML_TASK_SCRIPT_ENTRY: task entrypoint script as defined in the task's script section
- CLEARML_TASK_WORKING_DIR: task working directory as defined in the task's script section
- CLEARML_VENV_PATH: the agent's default virtual environment path (as defined in the configuration)
- CLEARML_GIT_ROOT: path to the cloned Git repository
- CLEARML_CUSTOM_BUILD_OUTPUT: a path to a non-existing file that may be created by the script. If created,
this file must be in the following JSON format:
```json
{
"binary": "/absolute/path/to/python-executable",
"entry_point": "/absolute/path/to/task-entrypoint-script",
"working_dir": "/absolute/path/to/task-working/dir"
}
```
If provided, the agent will use these instead of the predefined task script section to execute the task and will
skip virtual environment creation.
In case the custom script returns with a non-zero exit code, the agent will fail with the same exit code.
In case the custom script is specified but does not exist, or if the custom script does not write valid content
into the file specified in CLEARML_CUSTOM_BUILD_OUTPUT, the agent will emit a warning and continue with the
standard flow.
"""
class FileBuffering(IntEnum):

View File

@@ -84,13 +84,3 @@ class MissingPackageError(CommandFailedError):
def __str__(self):
return '{self.__class__.__name__}: ' \
'"{self.name}" package is required. Please run "pip install {self.name}"'.format(self=self)
class CustomBuildScriptFailed(CommandFailedError):
def __init__(self, errno, *args, **kwargs):
super(CustomBuildScriptFailed, self).__init__(*args, **kwargs)
self.errno = errno
class SkippedCustomBuildScript(CommandFailedError):
pass

View File

@@ -1,5 +0,0 @@
from .config_parser import ConfigParser, ConfigFactory, ConfigMissingException
from .config_tree import ConfigTree
from .converter import HOCONConverter
__all__ = ["ConfigParser", "ConfigFactory", "ConfigMissingException", "ConfigTree", "HOCONConverter"]

View File

@@ -1,762 +0,0 @@
import itertools
import re
import os
import socket
import contextlib
import codecs
from datetime import timedelta
from pyparsing import Forward, Keyword, QuotedString, Word, Literal, Suppress, Regex, Optional, SkipTo, ZeroOrMore, \
Group, lineno, col, TokenConverter, replaceWith, alphanums, alphas8bit, ParseSyntaxException, StringEnd
from pyparsing import ParserElement
from .config_tree import ConfigTree, ConfigSubstitution, ConfigList, ConfigValues, ConfigUnquotedString, \
ConfigInclude, NoneValue, ConfigQuotedString
from .exceptions import ConfigSubstitutionException, ConfigMissingException, ConfigException
import logging
import copy
use_urllib2 = False
try:
# For Python 3.0 and later
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
except ImportError: # pragma: no cover
# Fall back to Python 2's urllib2
from urllib2 import urlopen, HTTPError, URLError
use_urllib2 = True
try:
basestring
except NameError: # pragma: no cover
basestring = str
unicode = str
logger = logging.getLogger(__name__)
#
# Substitution Defaults
#
class DEFAULT_SUBSTITUTION(object):
pass
class MANDATORY_SUBSTITUTION(object):
pass
class NO_SUBSTITUTION(object):
pass
class STR_SUBSTITUTION(object):
pass
def period(period_value, period_unit):
try:
from dateutil.relativedelta import relativedelta as period_impl
except Exception:
from datetime import timedelta as period_impl
if period_unit == 'nanoseconds':
period_unit = 'microseconds'
period_value = int(period_value / 1000)
arguments = dict(zip((period_unit,), (period_value,)))
if period_unit == 'milliseconds':
return timedelta(**arguments)
return period_impl(**arguments)
class ConfigFactory(object):
@classmethod
def parse_file(cls, filename, encoding='utf-8', required=True, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
"""Parse file
:param filename: filename
:type filename: basestring
:param encoding: file encoding
:type encoding: basestring
:param required: if true, raise an exception when the file cannot be loaded
:type required: boolean
:param resolve: if true, resolve substitutions
:type resolve: boolean
:param unresolved_value: value assigned to an unresolved substitution.
If overridden with a default value, every unresolved value is replaced by that default.
If it is set to pyhocon.STR_SUBSTITUTION, the value is replaced by its
substitution expression (e.g., ${x})
:type unresolved_value: class
:return: Config object
:type return: Config
"""
try:
with codecs.open(filename, 'r', encoding=encoding) as fd:
content = fd.read()
return cls.parse_string(content, os.path.dirname(filename), resolve, unresolved_value)
except IOError as e:
if required:
raise e
logger.warning('Cannot include file %s. File does not exist or cannot be read.', filename)
return []
@classmethod
def parse_URL(cls, url, timeout=None, resolve=True, required=False, unresolved_value=DEFAULT_SUBSTITUTION):
"""Parse URL
:param url: url to parse
:type url: basestring
:param resolve: if true, resolve substitutions
:type resolve: boolean
:param unresolved_value: value assigned to an unresolved substitution.
If overridden with a default value, every unresolved value is replaced by that default.
If it is set to pyhocon.STR_SUBSTITUTION, the value is replaced by
its substitution expression (e.g., ${x})
:type unresolved_value: class
:return: Config object or []
:type return: Config or list
"""
socket_timeout = socket._GLOBAL_DEFAULT_TIMEOUT if timeout is None else timeout
try:
with contextlib.closing(urlopen(url, timeout=socket_timeout)) as fd:
content = fd.read() if use_urllib2 else fd.read().decode('utf-8')
return cls.parse_string(content, os.path.dirname(url), resolve, unresolved_value)
except (HTTPError, URLError) as e:
logger.warning('Cannot include url %s. Resource is inaccessible.', url)
if required:
raise e
else:
return []
@classmethod
def parse_string(cls, content, basedir=None, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
"""Parse string
:param content: content to parse
:type content: basestring
:param resolve: if true, resolve substitutions
:type resolve: boolean
:param unresolved_value: value assigned to an unresolved substitution.
If overridden with a default value, every unresolved value is replaced by that default.
If it is set to pyhocon.STR_SUBSTITUTION, the value is replaced by
its substitution expression (e.g., ${x})
:type unresolved_value: class
:return: Config object
:type return: Config
"""
return ConfigParser().parse(content, basedir, resolve, unresolved_value)
@classmethod
def from_dict(cls, dictionary, root=False):
"""Convert dictionary (and ordered dictionary) into a ConfigTree
:param dictionary: dictionary to convert
:type dictionary: dict
:return: Config object
:type return: Config
"""
def create_tree(value):
if isinstance(value, dict):
res = ConfigTree(root=root)
for key, child_value in value.items():
res.put(key, create_tree(child_value))
return res
if isinstance(value, list):
return [create_tree(v) for v in value]
else:
return value
return create_tree(dictionary)
class ConfigParser(object):
"""
Parse HOCON files: https://github.com/typesafehub/config/blob/master/HOCON.md
"""
REPLACEMENTS = {
'\\\\': '\\',
'\\\n': '\n',
'\\n': '\n',
'\\r': '\r',
'\\t': '\t',
'\\=': '=',
'\\#': '#',
'\\!': '!',
'\\"': '"',
}
period_type_map = {
'nanoseconds': ['ns', 'nano', 'nanos', 'nanosecond', 'nanoseconds'],
'microseconds': ['us', 'micro', 'micros', 'microsecond', 'microseconds'],
'milliseconds': ['ms', 'milli', 'millis', 'millisecond', 'milliseconds'],
'seconds': ['s', 'second', 'seconds'],
'minutes': ['m', 'minute', 'minutes'],
'hours': ['h', 'hour', 'hours'],
'weeks': ['w', 'week', 'weeks'],
'days': ['d', 'day', 'days'],
}
optional_period_type_map = {
'months': ['mo', 'month', 'months'], # 'm' from hocon spec removed. conflicts with minutes syntax.
'years': ['y', 'year', 'years']
}
supported_period_map = None
@classmethod
def get_supported_period_type_map(cls):
if cls.supported_period_map is None:
cls.supported_period_map = {}
cls.supported_period_map.update(cls.period_type_map)
try:
from dateutil import relativedelta
if relativedelta is not None:
cls.supported_period_map.update(cls.optional_period_type_map)
except Exception:
pass
return cls.supported_period_map
@classmethod
def parse(cls, content, basedir=None, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
"""Parse HOCON content
:param content: HOCON content to parse
:type content: basestring
:param resolve: if true, resolve substitutions
:type resolve: boolean
:param unresolved_value: value assigned to an unresolved substitution.
If overridden with a default value, every unresolved value is replaced by that default.
If it is set to pyhocon.STR_SUBSTITUTION, the value is replaced by
its substitution expression (e.g., ${x})
:type unresolved_value: class
:return: a ConfigTree or a list
"""
unescape_pattern = re.compile(r'\\.')
def replace_escape_sequence(match):
value = match.group(0)
return cls.REPLACEMENTS.get(value, value)
def norm_string(value):
return unescape_pattern.sub(replace_escape_sequence, value)
def unescape_string(tokens):
return ConfigUnquotedString(norm_string(tokens[0]))
def parse_multi_string(tokens):
# remove the first and last 3 "
return tokens[0][3: -3]
def convert_number(tokens):
n = tokens[0]
try:
return int(n, 10)
except ValueError:
return float(n)
def safe_convert_number(tokens):
n = tokens[0]
try:
return int(n, 10)
except ValueError:
try:
return float(n)
except ValueError:
return n
def convert_period(tokens):
period_value = int(tokens.value)
period_identifier = tokens.unit
period_unit = next((single_unit for single_unit, values
in cls.get_supported_period_type_map().items()
if period_identifier in values))
return period(period_value, period_unit)
# ${path} or ${?path} for optional substitution
SUBSTITUTION_PATTERN = r"\$\{(?P<optional>\?)?(?P<variable>[^}]+)\}(?P<ws>[ \t]*)"
def create_substitution(instring, loc, token):
# remove the ${ and }
match = re.match(SUBSTITUTION_PATTERN, token[0])
variable = match.group('variable')
ws = match.group('ws')
optional = match.group('optional') == '?'
substitution = ConfigSubstitution(variable, optional, ws, instring, loc)
return substitution
# double-quoted string, optionally followed by trailing whitespace
STRING_PATTERN = '"(?P<value>(?:[^"\\\\]|\\\\.)*)"(?P<ws>[ \t]*)'
def create_quoted_string(instring, loc, token):
# strip the surrounding quotes and unescape the value
match = re.match(STRING_PATTERN, token[0])
value = norm_string(match.group('value'))
ws = match.group('ws')
return ConfigQuotedString(value, ws, instring, loc)
def include_config(instring, loc, token):
url = None
file = None
required = False
if token[0] == 'required':
required = True
final_tokens = token[1:]
else:
final_tokens = token
if len(final_tokens) == 1: # include "test"
value = final_tokens[0].value if isinstance(final_tokens[0], ConfigQuotedString) else final_tokens[0]
if value.startswith("http://") or value.startswith("https://") or value.startswith("file://"):
url = value
else:
file = value
elif len(final_tokens) == 2: # include url("test") or file("test")
value = final_tokens[1].value if isinstance(final_tokens[1], ConfigQuotedString) else final_tokens[1]
if final_tokens[0] == 'url':
url = value
else:
file = value
if url is not None:
logger.debug('Loading config from url %s', url)
obj = ConfigFactory.parse_URL(
url,
resolve=False,
required=required,
unresolved_value=NO_SUBSTITUTION
)
elif file is not None:
path = file if basedir is None else os.path.join(basedir, file)
logger.debug('Loading config from file %s', path)
obj = ConfigFactory.parse_file(
path,
resolve=False,
required=required,
unresolved_value=NO_SUBSTITUTION
)
else:
raise ConfigException('No file or URL specified at: {loc}: {instring}', loc=loc, instring=instring)
return ConfigInclude(obj if isinstance(obj, list) else obj.items())
@contextlib.contextmanager
def set_default_white_spaces():
default = ParserElement.DEFAULT_WHITE_CHARS
ParserElement.setDefaultWhitespaceChars(' \t')
yield
ParserElement.setDefaultWhitespaceChars(default)
with set_default_white_spaces():
assign_expr = Forward()
true_expr = Keyword("true", caseless=True).setParseAction(replaceWith(True))
false_expr = Keyword("false", caseless=True).setParseAction(replaceWith(False))
null_expr = Keyword("null", caseless=True).setParseAction(replaceWith(NoneValue()))
# key = QuotedString('"', escChar='\\', unquoteResults=False) | Word(alphanums + alphas8bit + '._- /')
regexp_numbers = r'[+-]?(\d*\.\d+|\d+(\.\d+)?)([eE][+\-]?\d+)?(?=$|[ \t]*([\$\}\],#\n\r]|//))'
key = QuotedString('"', escChar='\\', unquoteResults=False) | \
Regex(regexp_numbers, re.DOTALL).setParseAction(safe_convert_number) | \
Word(alphanums + alphas8bit + '._- /')
eol = Word('\n\r').suppress()
eol_comma = Word('\n\r,').suppress()
comment = (Literal('#') | Literal('//')) - SkipTo(eol | StringEnd())
comment_eol = Suppress(Optional(eol_comma) + comment)
comment_no_comma_eol = (comment | eol).suppress()
number_expr = Regex(regexp_numbers, re.DOTALL).setParseAction(convert_number)
period_types = itertools.chain.from_iterable(cls.get_supported_period_type_map().values())
period_expr = Regex(r'(?P<value>\d+)\s*(?P<unit>' + '|'.join(period_types) + ')$'
).setParseAction(convert_period)
# multi line string using """
# Using fix described in http://pyparsing.wikispaces.com/share/view/3778969
multiline_string = Regex('""".*?"*"""', re.DOTALL | re.UNICODE).setParseAction(parse_multi_string)
# single quoted line string
quoted_string = Regex(r'"(?:[^"\\\n]|\\.)*"[ \t]*', re.UNICODE).setParseAction(create_quoted_string)
# unquoted string that takes the rest of the line until an optional comment
# we support .properties multiline support which is like this:
# line1 \
# line2 \
# so a backslash precedes the \n
unquoted_string = Regex(r'(?:[^^`+?!@*&"\[\{\s\]\}#,=\$\\]|\\.)+[ \t]*',
re.UNICODE).setParseAction(unescape_string)
substitution_expr = Regex(r'[ \t]*\$\{[^\}]+\}[ \t]*').setParseAction(create_substitution)
string_expr = multiline_string | quoted_string | unquoted_string
value_expr = period_expr | number_expr | true_expr | false_expr | null_expr | string_expr
include_content = (quoted_string | ((Keyword('url') | Keyword(
'file')) - Literal('(').suppress() - quoted_string - Literal(')').suppress()))
include_expr = (
Keyword("include", caseless=True).suppress() + (
include_content | (
Keyword("required") - Literal('(').suppress() - include_content - Literal(')').suppress()
)
)
).setParseAction(include_config)
root_dict_expr = Forward()
dict_expr = Forward()
list_expr = Forward()
multi_value_expr = ZeroOrMore(comment_eol | include_expr | substitution_expr |
dict_expr | list_expr | value_expr | (Literal('\\') - eol).suppress())
# for a dictionary : or = is optional
# last zeroOrMore is because we can have t = {a:4} {b: 6} {c: 7} which is dictionary concatenation
inside_dict_expr = ConfigTreeParser(ZeroOrMore(comment_eol | include_expr | assign_expr | eol_comma))
inside_root_dict_expr = ConfigTreeParser(ZeroOrMore(
comment_eol | include_expr | assign_expr | eol_comma), root=True)
dict_expr << Suppress('{') - inside_dict_expr - Suppress('}')
root_dict_expr << Suppress('{') - inside_root_dict_expr - Suppress('}')
list_entry = ConcatenatedValueParser(multi_value_expr)
list_expr << Suppress('[') - ListParser(list_entry - ZeroOrMore(eol_comma - list_entry)) - Suppress(']')
# special case when we have a value assignment where the string can potentially be the remainder of the line
assign_expr << Group(key - ZeroOrMore(comment_no_comma_eol) -
(dict_expr | (Literal('=') | Literal(':') | Literal('+=')) -
ZeroOrMore(comment_no_comma_eol) - ConcatenatedValueParser(multi_value_expr)))
# the file can be { ... } where {} can be omitted or []
config_expr = ZeroOrMore(comment_eol | eol) + (list_expr | root_dict_expr |
inside_root_dict_expr) + ZeroOrMore(comment_eol | eol_comma)
config = config_expr.parseString(content, parseAll=True)[0]
if resolve:
allow_unresolved = resolve and unresolved_value is not DEFAULT_SUBSTITUTION and \
unresolved_value is not MANDATORY_SUBSTITUTION
has_unresolved = cls.resolve_substitutions(config, allow_unresolved)
if has_unresolved and unresolved_value is MANDATORY_SUBSTITUTION:
raise ConfigSubstitutionException(
'resolve cannot be set to True and unresolved_value to MANDATORY_SUBSTITUTION')
if unresolved_value is not NO_SUBSTITUTION and unresolved_value is not DEFAULT_SUBSTITUTION:
cls.unresolve_substitutions_to_value(config, unresolved_value)
return config
@classmethod
def _resolve_variable(cls, config, substitution):
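"""Resolve a substitution against the config, falling back to environment variables
:param config: root config tree in which the variable is looked up
:param substitution: ConfigSubstitution to resolve
:return: (is_resolved, resolved_variable)
"""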
variable = substitution.variable
try:
return True, config.get(variable)
except ConfigMissingException:
# default to environment variable
value = os.environ.get(variable)
if value is None:
if substitution.optional:
return False, None
else:
raise ConfigSubstitutionException(
"Cannot resolve variable ${{{variable}}} (line: {line}, col: {col})".format(
variable=variable,
line=lineno(substitution.loc, substitution.instring),
col=col(substitution.loc, substitution.instring)))
elif isinstance(value, ConfigList) or isinstance(value, ConfigTree):
raise ConfigSubstitutionException(
"Cannot substitute variable ${{{variable}}} because it does not point to a "
"string, int, float, boolean or null {type} (line:{line}, col: {col})".format(
variable=variable,
type=value.__class__.__name__,
line=lineno(substitution.loc, substitution.instring),
col=col(substitution.loc, substitution.instring)))
return True, value
@classmethod
def _fixup_self_references(cls, config, accept_unresolved=False):
if isinstance(config, ConfigTree) and config.root:
for key in config: # Traverse history of element
history = config.history[key]
previous_item = history[0]
for current_item in history[1:]:
for substitution in cls._find_substitutions(current_item):
prop_path = ConfigTree.parse_key(substitution.variable)
if len(prop_path) > 1 and config.get(substitution.variable, None) is not None:
continue # If value is present in latest version, don't do anything
if prop_path[0] == key:
if isinstance(previous_item, ConfigValues) and not accept_unresolved:
# We hit a dead end, we cannot evaluate
raise ConfigSubstitutionException(
"Property {variable} cannot be substituted. Check for cycles.".format(
variable=substitution.variable
)
)
else:
value = previous_item if len(
prop_path) == 1 else previous_item.get(".".join(prop_path[1:]))
_, _, current_item = cls._do_substitute(substitution, value)
previous_item = current_item
if len(history) == 1:
for substitution in cls._find_substitutions(previous_item):
prop_path = ConfigTree.parse_key(substitution.variable)
if len(prop_path) > 1 and config.get(substitution.variable, None) is not None:
continue # If value is present in latest version, don't do anything
if prop_path[0] == key and substitution.optional:
cls._do_substitute(substitution, None)
if prop_path[0] == key:
value = os.environ.get(key)
if value is not None:
cls._do_substitute(substitution, value)
continue
if substitution.optional: # special case, when self optional referencing without existing
cls._do_substitute(substitution, None)
# traverse config to find all the substitutions
@classmethod
def _find_substitutions(cls, item):
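"""Recursively collect the substitutions found in item
:param item: ConfigValues, ConfigTree, list or plain value to inspect
:return: list of substitutions found in item and its children
:type return: list
"""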
if isinstance(item, ConfigValues):
return item.get_substitutions()
substitutions = []
elements = []
if isinstance(item, ConfigTree):
elements = item.values()
elif isinstance(item, list):
elements = item
for child in elements:
substitutions += cls._find_substitutions(child)
return substitutions
@classmethod
def _do_substitute(cls, substitution, resolved_value, is_optional_resolved=True):
unresolved = False
new_substitutions = []
if isinstance(resolved_value, ConfigValues):
resolved_value = resolved_value.transform()
if isinstance(resolved_value, ConfigValues):
unresolved = True
result = resolved_value
else:
# replace token by substitution
config_values = substitution.parent
# if it is a string, then add the extra ws that was present in the original string after the substitution
formatted_resolved_value = resolved_value \
if resolved_value is None \
or isinstance(resolved_value, (dict, list)) \
or substitution.index == len(config_values.tokens) - 1 \
else (str(resolved_value) + substitution.ws)
# use a deepcopy of resolved_value to avoid mutation
config_values.put(substitution.index, copy.deepcopy(formatted_resolved_value))
transformation = config_values.transform()
result = config_values.overriden_value \
if transformation is None and not is_optional_resolved \
else transformation
if result is None and config_values.key in config_values.parent:
del config_values.parent[config_values.key]
else:
config_values.parent[config_values.key] = result
s = cls._find_substitutions(result)
if s:
new_substitutions = s
unresolved = True
return (unresolved, new_substitutions, result)
@classmethod
def _final_fixup(cls, item):
if isinstance(item, ConfigValues):
return item.transform()
elif isinstance(item, list):
return [cls._final_fixup(child) for child in item]
elif isinstance(item, ConfigTree):
items = list(item.items())
for key, child in items:
item[key] = cls._final_fixup(child)
return item
@classmethod
def unresolve_substitutions_to_value(cls, config, unresolved_value=STR_SUBSTITUTION):
for substitution in cls._find_substitutions(config):
if unresolved_value is STR_SUBSTITUTION:
value = substitution.raw_str()
elif unresolved_value is None:
value = NoneValue()
else:
value = unresolved_value
cls._do_substitute(substitution, value, False)
cls._final_fixup(config)
@classmethod
def resolve_substitutions(cls, config, accept_unresolved=False):
has_unresolved = False
cls._fixup_self_references(config, accept_unresolved)
substitutions = cls._find_substitutions(config)
if len(substitutions) > 0:
unresolved = True
any_unresolved = True
_substitutions = []
cache = {}
while any_unresolved and len(substitutions) > 0 and set(substitutions) != set(_substitutions):
unresolved = False
any_unresolved = True
_substitutions = substitutions[:]
for substitution in _substitutions:
is_optional_resolved, resolved_value = cls._resolve_variable(config, substitution)
# if the substitution is optional
if not is_optional_resolved and substitution.optional:
resolved_value = None
if isinstance(resolved_value, ConfigValues):
parents = cache.get(resolved_value)
if parents is None:
parents = []
link = resolved_value
while isinstance(link, ConfigValues):
parents.append(link)
link = link.overriden_value
cache[resolved_value] = parents
if isinstance(resolved_value, ConfigValues) \
and substitution.parent in parents \
and hasattr(substitution.parent, 'overriden_value') \
and substitution.parent.overriden_value:
# self resolution, backtrack
resolved_value = substitution.parent.overriden_value
unresolved, new_substitutions, result = cls._do_substitute(
substitution, resolved_value, is_optional_resolved)
any_unresolved = unresolved or any_unresolved
substitutions.extend(new_substitutions)
if not isinstance(result, ConfigValues):
substitutions.remove(substitution)
cls._final_fixup(config)
if unresolved:
has_unresolved = True
if not accept_unresolved:
raise ConfigSubstitutionException("Cannot resolve {variables}. Check for cycles.".format(
variables=', '.join('${{{variable}}}: (line: {line}, col: {col})'.format(
variable=substitution.variable,
line=lineno(substitution.loc, substitution.instring),
col=col(substitution.loc, substitution.instring)) for substitution in substitutions)))
cls._final_fixup(config)
return has_unresolved
class ListParser(TokenConverter):
"""Parse a list [elt1, elt2, ...]
"""
def __init__(self, expr=None):
super(ListParser, self).__init__(expr)
self.saveAsList = True
def postParse(self, instring, loc, token_list):
"""Create a list from the tokens
:param instring:
:param loc:
:param token_list:
:return:
"""
cleaned_token_list = [token for tokens in (token.tokens if isinstance(token, ConfigInclude) else [token]
for token in token_list if token != '')
for token in tokens]
config_list = ConfigList(cleaned_token_list)
return [config_list]
class ConcatenatedValueParser(TokenConverter):
def __init__(self, expr=None):
super(ConcatenatedValueParser, self).__init__(expr)
self.parent = None
self.key = None
def postParse(self, instring, loc, token_list):
config_values = ConfigValues(token_list, instring, loc)
return [config_values.transform()]
class ConfigTreeParser(TokenConverter):
"""
Parse a config tree from tokens
"""
def __init__(self, expr=None, root=False):
super(ConfigTreeParser, self).__init__(expr)
self.root = root
self.saveAsList = True
def postParse(self, instring, loc, token_list):
"""Create ConfigTree from tokens
:param instring:
:param loc:
:param token_list:
:return:
"""
config_tree = ConfigTree(root=self.root)
for element in token_list:
expanded_tokens = element.tokens if isinstance(element, ConfigInclude) else [element]
for tokens in expanded_tokens:
# key, value1 (optional), ...
key = tokens[0].strip() if isinstance(tokens[0], (unicode, basestring)) else tokens[0]
operator = '='
if len(tokens) == 3 and tokens[1].strip() in [':', '=', '+=']:
operator = tokens[1].strip()
values = tokens[2:]
elif len(tokens) == 2:
values = tokens[1:]
else:
raise ParseSyntaxException("Unknown tokens {tokens} received".format(tokens=tokens))
# empty string
if len(values) == 0:
config_tree.put(key, '')
else:
value = values[0]
if isinstance(value, list) and operator == "+=":
value = ConfigValues([ConfigSubstitution(key, True, '', False, loc), value], False, loc)
config_tree.put(key, value, False)
elif isinstance(value, unicode) and operator == "+=":
value = ConfigValues([ConfigSubstitution(key, True, '', True, loc), ' ' + value], True, loc)
config_tree.put(key, value, False)
elif isinstance(value, list):
config_tree.put(key, value, False)
else:
existing_value = config_tree.get(key, None)
if isinstance(value, ConfigTree) and not isinstance(existing_value, list):
# Only Tree has to be merged with tree
config_tree.put(key, value, True)
elif isinstance(value, ConfigValues):
conf_value = value
value.parent = config_tree
value.key = key
if isinstance(existing_value, list) or isinstance(existing_value, ConfigTree):
config_tree.put(key, conf_value, True)
else:
config_tree.put(key, conf_value, False)
else:
config_tree.put(key, value, False)
return config_tree
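
The lookup order used by `_resolve_variable` above (config first, then environment variables, then the optional escape hatch) can be sketched on its own. This is a minimal illustration, not the library's code: it assumes a flat `dict` in place of `ConfigTree`, and `resolve_variable` and `MissingVariable` are hypothetical names introduced only for this example.

```python
import os


class MissingVariable(Exception):
    """Hypothetical stand-in for ConfigSubstitutionException."""


def resolve_variable(config, variable, optional=False, environ=None):
    """Return (is_resolved, value) using the same order as _resolve_variable.

    Assumption: `config` is a flat dict, so dotted paths are not handled here.
    """
    environ = os.environ if environ is None else environ
    # 1. A value defined in the config always wins.
    if variable in config:
        return True, config[variable]
    # 2. Otherwise fall back to an environment variable of the same name.
    value = environ.get(variable)
    if value is not None:
        return True, value
    # 3. ${?name} style substitutions may stay unresolved without an error.
    if optional:
        return False, None
    raise MissingVariable("Cannot resolve variable ${%s}" % variable)


if __name__ == "__main__":
    print(resolve_variable({"port": 8080}, "port"))
    print(resolve_variable({}, "missing", optional=True, environ={}))
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.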


@@ -1,608 +0,0 @@
from collections import OrderedDict
from pyparsing import lineno
from pyparsing import col
try:
basestring
except NameError: # pragma: no cover
basestring = str
unicode = str
import re
import copy
from .exceptions import ConfigException, ConfigWrongTypeException, ConfigMissingException
class UndefinedKey(object):
pass
class NonExistentKey(object):
pass
class NoneValue(object):
pass
class ConfigTree(OrderedDict):
KEY_SEP = '.'
def __init__(self, *args, **kwds):
self.root = kwds.pop('root') if 'root' in kwds else False
if self.root:
self.history = {}
super(ConfigTree, self).__init__(*args, **kwds)
for key, value in self.items():
if isinstance(value, ConfigValues):
value.parent = self
value.key = key
@staticmethod
def merge_configs(a, b, copy_trees=False):
"""Merge config b into a
:param a: target config
:type a: ConfigTree
:param b: source config
:type b: ConfigTree
:return: merged config a
"""
for key, value in b.items():
# if key is in both a and b and both values are dictionary then merge it otherwise override it
if key in a and isinstance(a[key], ConfigTree) and isinstance(b[key], ConfigTree):
if copy_trees:
a[key] = a[key].copy()
ConfigTree.merge_configs(a[key], b[key], copy_trees=copy_trees)
else:
if isinstance(value, ConfigValues):
value.parent = a
value.key = key
if key in a:
value.overriden_value = a[key]
a[key] = value
if a.root:
if b.root:
a.history[key] = a.history.get(key, []) + b.history.get(key, [value])
else:
a.history[key] = a.history.get(key, []) + [value]
return a
def _put(self, key_path, value, append=False):
key_elt = key_path[0]
if len(key_path) == 1:
# if value to set does not exist, override
# if they are both configs then merge
# if not then override
if key_elt in self and isinstance(self[key_elt], ConfigTree) and isinstance(value, ConfigTree):
if self.root:
new_value = ConfigTree.merge_configs(ConfigTree(), self[key_elt], copy_trees=True)
new_value = ConfigTree.merge_configs(new_value, value, copy_trees=True)
self._push_history(key_elt, new_value)
self[key_elt] = new_value
else:
ConfigTree.merge_configs(self[key_elt], value)
elif append:
# If we have t=1
# and we try to put t.a=5 then t is replaced by {a: 5}
l_value = self.get(key_elt, None)
if isinstance(l_value, ConfigValues):
l_value.tokens.append(value)
l_value.recompute()
elif isinstance(l_value, ConfigTree) and isinstance(value, ConfigValues):
value.overriden_value = l_value
value.tokens.insert(0, l_value)
value.recompute()
value.parent = self
value.key = key_elt
self._push_history(key_elt, value)
self[key_elt] = value
elif isinstance(l_value, list) and isinstance(value, ConfigValues):
self._push_history(key_elt, value)
value.overriden_value = l_value
value.parent = self
value.key = key_elt
self[key_elt] = value
elif isinstance(l_value, list):
self[key_elt] = l_value + value
self._push_history(key_elt, l_value)
elif l_value is None:
self._push_history(key_elt, value)
self[key_elt] = value
else:
raise ConfigWrongTypeException(
u"Cannot concatenate the list {key}: {value} to {prev_value} of {type}".format(
key='.'.join(key_path),
value=value,
prev_value=l_value,
type=l_value.__class__.__name__)
)
else:
# if there was an override, keep the overridden value
if isinstance(value, ConfigValues):
value.parent = self
value.key = key_elt
value.overriden_value = self.get(key_elt, None)
self._push_history(key_elt, value)
self[key_elt] = value
else:
next_config_tree = super(ConfigTree, self).get(key_elt)
if not isinstance(next_config_tree, ConfigTree):
# create a new dictionary or overwrite a previous value
next_config_tree = ConfigTree()
self._push_history(key_elt, next_config_tree)
self[key_elt] = next_config_tree
next_config_tree._put(key_path[1:], value, append)
def _push_history(self, key, value):
if self.root:
hist = self.history.get(key)
if hist is None:
hist = self.history[key] = []
hist.append(value)
def _get(self, key_path, key_index=0, default=UndefinedKey):
key_elt = key_path[key_index]
elt = super(ConfigTree, self).get(key_elt, UndefinedKey)
if elt is UndefinedKey:
if default is UndefinedKey:
raise ConfigMissingException(u"No configuration setting found for key {key}".format(
key='.'.join(key_path[: key_index + 1])))
else:
return default
if key_index == len(key_path) - 1:
if isinstance(elt, NoneValue):
return None
elif isinstance(elt, list):
return [None if isinstance(x, NoneValue) else x for x in elt]
else:
return elt
elif isinstance(elt, ConfigTree):
return elt._get(key_path, key_index + 1, default)
else:
if default is UndefinedKey:
raise ConfigWrongTypeException(
u"{key} has type {type} rather than dict".format(key='.'.join(key_path[:key_index + 1]),
type=type(elt).__name__))
else:
return default
@staticmethod
def parse_key(string):
"""
Split a key into path elements:
- a.b.c => a, b, c
- a."b.c" => a, QuotedKey("b.c") if . is any of the special characters: $}[]:=+#`^?!@*&.
- "a" => a
- a.b."c" => a, b, c (special case)
:param string: either string key (parse '.' as sub-key) or int / float as regular keys
:return:
"""
if isinstance(string, (int, float)):
return [string]
special_characters = '$}[]:=+#`^?!@*&.'
tokens = re.findall(
r'"[^"]+"|[^{special_characters}]+'.format(special_characters=re.escape(special_characters)),
string)
def contains_special_character(token):
return any((c in special_characters) for c in token)
return [token if contains_special_character(token) else token.strip('"') for token in tokens]
def put(self, key, value, append=False):
"""Put a value in the tree (dot separated)
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param value: value to put
"""
self._put(ConfigTree.parse_key(key), value, append)
def get(self, key, default=UndefinedKey):
"""Get a value from the tree
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param default: default value if key not found
:type default: object
:return: value in the tree located at key
"""
return self._get(ConfigTree.parse_key(key), 0, default)
def get_string(self, key, default=UndefinedKey):
"""Return string representation of value found at key
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param default: default value if key not found
:type default: basestring
:return: string value
:type return: basestring
"""
value = self.get(key, default)
if value is None:
return None
string_value = unicode(value)
if isinstance(value, bool):
string_value = string_value.lower()
return string_value
def pop(self, key, default=UndefinedKey):
"""Remove specified key and return the corresponding value.
If key is not found, default is returned if given, otherwise ConfigMissingException is raised
This method assumes the user wants to remove the last value in the chain so it parses via parse_key
and pops the last value out of the dict.
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param default: default value if key not found
:type default: object
:return: value in the tree located at key
"""
if default != UndefinedKey and key not in self:
return default
value = self.get(key, UndefinedKey)
lst = ConfigTree.parse_key(key)
parent = self.KEY_SEP.join(lst[0:-1])
child = lst[-1]
if parent:
self.get(parent).__delitem__(child)
else:
self.__delitem__(child)
return value
def get_int(self, key, default=UndefinedKey):
"""Return int representation of value found at key
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param default: default value if key not found
:type default: int
:return: int value
:type return: int
"""
value = self.get(key, default)
try:
return int(value) if value is not None else None
except (TypeError, ValueError):
raise ConfigException(
u"{key} has type '{type}' rather than 'int'".format(key=key, type=type(value).__name__))
def get_float(self, key, default=UndefinedKey):
"""Return float representation of value found at key
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param default: default value if key not found
:type default: float
:return: float value
:type return: float
"""
value = self.get(key, default)
try:
return float(value) if value is not None else None
except (TypeError, ValueError):
raise ConfigException(
u"{key} has type '{type}' rather than 'float'".format(key=key, type=type(value).__name__))
def get_bool(self, key, default=UndefinedKey):
"""Return boolean representation of value found at key
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param default: default value if key not found
:type default: bool
:return: boolean value
:type return: bool
"""
# String conversions as per API-recommendations:
# https://github.com/typesafehub/config/blob/master/HOCON.md#automatic-type-conversions
bool_conversions = {
None: None,
'true': True, 'yes': True, 'on': True,
'false': False, 'no': False, 'off': False
}
string_value = self.get_string(key, default)
if string_value is not None:
string_value = string_value.lower()
try:
return bool_conversions[string_value]
except KeyError:
raise ConfigException(
u"{key} does not translate to a Boolean value".format(key=key))
def get_list(self, key, default=UndefinedKey):
"""Return list representation of value found at key
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param default: default value if key not found
:type default: list
:return: list value
:type return: list
"""
value = self.get(key, default)
if isinstance(value, list):
return value
elif isinstance(value, ConfigTree):
lst = []
for k, v in sorted(value.items(), key=lambda kv: kv[0]):
if re.match(r'^(?:0|[1-9][0-9]*)$', k):
lst.append(v)
else:
raise ConfigException(u"{key} does not translate to a list".format(key=key))
return lst
elif value is None:
return None
else:
raise ConfigException(
u"{key} has type '{type}' rather than 'list'".format(key=key, type=type(value).__name__))
def get_config(self, key, default=UndefinedKey):
"""Return tree config representation of value found at key
:param key: key to use (dot separated). E.g., a.b.c
:type key: basestring
:param default: default value if key not found
:type default: config
:return: config value
:type return: ConfigTree
"""
value = self.get(key, default)
if isinstance(value, dict):
return value
elif value is None:
return None
else:
raise ConfigException(
u"{key} has type '{type}' rather than 'config'".format(key=key, type=type(value).__name__))
def __getitem__(self, item):
val = self.get(item)
if val is UndefinedKey:
raise KeyError(item)
return val
try:
from collections import _OrderedDictItemsView
except ImportError: # pragma: nocover
pass
else:
def items(self): # pragma: nocover
return self._OrderedDictItemsView(self)
def __getattr__(self, item):
val = self.get(item, NonExistentKey)
if val is NonExistentKey:
return super(ConfigTree, self).__getattr__(item)
return val
def __contains__(self, item):
return self._get(self.parse_key(item), default=NoneValue) is not NoneValue
def with_fallback(self, config, resolve=True):
"""
return a new config with fallback on config
:param config: config or filename of the config to fallback on
:param resolve: resolve substitutions
:return: new config with fallback on config
"""
if isinstance(config, ConfigTree):
result = ConfigTree.merge_configs(copy.deepcopy(config), copy.deepcopy(self))
else:
from . import ConfigFactory
result = ConfigTree.merge_configs(ConfigFactory.parse_file(config, resolve=False), copy.deepcopy(self))
if resolve:
from . import ConfigParser
ConfigParser.resolve_substitutions(result)
return result
def as_plain_ordered_dict(self):
"""return a deep copy of this config as a plain OrderedDict
The config tree should be fully resolved.
This is useful to get an object with no special semantics such as path expansion for the keys.
In particular this means that keys that contain dots are not surrounded with '"' in the plain OrderedDict.
:return: this config as an OrderedDict
:type return: OrderedDict
"""
def plain_value(v):
if isinstance(v, list):
return [plain_value(e) for e in v]
elif isinstance(v, ConfigTree):
return v.as_plain_ordered_dict()
else:
if isinstance(v, ConfigValues):
raise ConfigException("The config tree contains unresolved elements")
return v
return OrderedDict((key.strip('"') if isinstance(key, (unicode, basestring)) else key, plain_value(value))
for key, value in self.items())
class ConfigList(list):
def __init__(self, iterable=()):
new_list = list(iterable)
super(ConfigList, self).__init__(new_list)
for index, value in enumerate(new_list):
if isinstance(value, ConfigValues):
value.parent = self
value.key = index
class ConfigInclude(object):
def __init__(self, tokens):
self.tokens = tokens
class ConfigValues(object):
def __init__(self, tokens, instring, loc):
self.tokens = tokens
self.parent = None
self.key = None
self._instring = instring
self._loc = loc
self.overriden_value = None
self.recompute()
def recompute(self):
for index, token in enumerate(self.tokens):
if isinstance(token, ConfigSubstitution):
token.parent = self
token.index = index
# no value return empty string
if len(self.tokens) == 0:
self.tokens = ['']
# if the last token is an unquoted string then right strip it
if isinstance(self.tokens[-1], ConfigUnquotedString):
# rstrip only whitespaces, not \n\r because they would have been used escaped
self.tokens[-1] = self.tokens[-1].rstrip(' \t')
def has_substitution(self):
return len(self.get_substitutions()) > 0
def get_substitutions(self):
lst = []
node = self
while node:
lst = [token for token in node.tokens if isinstance(token, ConfigSubstitution)] + lst
if hasattr(node, 'overriden_value'):
node = node.overriden_value
if not isinstance(node, ConfigValues):
break
else:
break
return lst
def transform(self):
def determine_type(token):
return ConfigTree if isinstance(token, ConfigTree) else ConfigList if isinstance(token, list) else str
def format_str(v, last=False):
if isinstance(v, ConfigQuotedString):
return v.value + ('' if last else v.ws)
else:
return '' if v is None else unicode(v)
if self.has_substitution():
return self
# remove None tokens
tokens = [token for token in self.tokens if token is not None]
if not tokens:
return None
# check if all tokens are compatible
first_tok_type = determine_type(tokens[0])
for index, token in enumerate(tokens[1:]):
tok_type = determine_type(token)
if first_tok_type is not tok_type:
raise ConfigWrongTypeException(
"Token '{token}' of type {tok_type} (index {index}) must be of type {req_tok_type} "
"(line: {line}, col: {col})".format(
token=token,
index=index + 1,
tok_type=tok_type.__name__,
req_tok_type=first_tok_type.__name__,
line=lineno(self._loc, self._instring),
col=col(self._loc, self._instring)))
if first_tok_type is ConfigTree:
child = []
if hasattr(self, 'overriden_value'):
node = self.overriden_value
while node:
if isinstance(node, ConfigValues):
value = node.transform()
if isinstance(value, ConfigTree):
child.append(value)
else:
break
elif isinstance(node, ConfigTree):
child.append(node)
else:
break
if hasattr(node, 'overriden_value'):
node = node.overriden_value
else:
break
result = ConfigTree()
for conf in reversed(child):
ConfigTree.merge_configs(result, conf, copy_trees=True)
for token in tokens:
ConfigTree.merge_configs(result, token, copy_trees=True)
return result
elif first_tok_type is ConfigList:
result = []
main_index = 0
for sublist in tokens:
sublist_result = ConfigList()
for token in sublist:
if isinstance(token, ConfigValues):
token.parent = result
token.key = main_index
main_index += 1
sublist_result.append(token)
result.extend(sublist_result)
return result
else:
if len(tokens) == 1:
if isinstance(tokens[0], ConfigQuotedString):
return tokens[0].value
return tokens[0]
else:
return ''.join(format_str(token) for token in tokens[:-1]) + format_str(tokens[-1], True)
def put(self, index, value):
self.tokens[index] = value
def __repr__(self): # pragma: no cover
return '[ConfigValues: ' + ','.join(str(o) for o in self.tokens) + ']'
class ConfigSubstitution(object):
def __init__(self, variable, optional, ws, instring, loc):
self.variable = variable
self.optional = optional
self.ws = ws
self.index = None
self.parent = None
self.instring = instring
self.loc = loc
def __repr__(self): # pragma: no cover
return '[ConfigSubstitution: ' + self.variable + ']'
class ConfigUnquotedString(unicode):
def __new__(cls, value):
return super(ConfigUnquotedString, cls).__new__(cls, value)
class ConfigQuotedString(object):
def __init__(self, value, ws, instring, loc):
self.value = value
self.ws = ws
self.instring = instring
self.loc = loc
def __repr__(self): # pragma: no cover
return '[ConfigQuotedString: ' + self.value + ']'
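
The key-splitting rule documented in `ConfigTree.parse_key` above can be tried in isolation. The sketch below copies the regular expression from that method into a standalone function; `split_key` is a hypothetical name used only here, and nothing else from the library is required.

```python
import re

# Same special characters that parse_key treats as separators.
SPECIAL_CHARACTERS = '$}[]:=+#`^?!@*&.'


def split_key(key):
    """Split a dotted key into path elements, keeping quoted segments whole."""
    if isinstance(key, (int, float)):
        # Numeric keys are used as-is and never split.
        return [key]
    pattern = r'"[^"]+"|[^{special}]+'.format(special=re.escape(SPECIAL_CHARACTERS))
    tokens = re.findall(pattern, key)
    # Quotes are dropped unless the segment needs them to protect a special character.
    return [
        token if any(c in SPECIAL_CHARACTERS for c in token) else token.strip('"')
        for token in tokens
    ]


if __name__ == "__main__":
    for example in ('a.b.c', 'a."b.c"', '"a"', 'a.b."c"', 5):
        print(repr(example), '->', split_key(example))
```

Quoted segments that contain a dot keep their quotes, which is how a later lookup can tell `a."b.c"` (two levels) apart from `a.b.c` (three levels).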


@@ -1,329 +0,0 @@
import json
import re
import sys
from . import ConfigFactory
from .config_tree import ConfigQuotedString
from .config_tree import ConfigSubstitution
from .config_tree import ConfigTree
from .config_tree import ConfigValues
from .config_tree import NoneValue
try:
basestring
except NameError:
basestring = str
unicode = str
class HOCONConverter(object):
_number_re = r'[+-]?(\d*\.\d+|\d+(\.\d+)?)([eE][+\-]?\d+)?(?=$|[ \t]*([\$\}\],#\n\r]|//))'
_number_re_matcher = re.compile(_number_re)
@classmethod
def to_json(cls, config, compact=False, indent=2, level=0):
"""Convert HOCON input into a JSON output
:return: JSON string representation
:type return: basestring
"""
lines = ""
if isinstance(config, ConfigTree):
if len(config) == 0:
lines += '{}'
else:
lines += '{\n'
bet_lines = []
for key, item in config.items():
bet_lines.append('{indent}"{key}": {value}'.format(
indent=''.rjust((level + 1) * indent, ' '),
key=key.strip('"'), # for dotted keys enclosed with "" to not be interpreted as nested key
value=cls.to_json(item, compact, indent, level + 1))
)
lines += ',\n'.join(bet_lines)
lines += '\n{indent}}}'.format(indent=''.rjust(level * indent, ' '))
elif isinstance(config, list):
if len(config) == 0:
lines += '[]'
else:
lines += '[\n'
bet_lines = []
for item in config:
bet_lines.append('{indent}{value}'.format(
indent=''.rjust((level + 1) * indent, ' '),
value=cls.to_json(item, compact, indent, level + 1))
)
lines += ',\n'.join(bet_lines)
lines += '\n{indent}]'.format(indent=''.rjust(level * indent, ' '))
elif isinstance(config, basestring):
lines = json.dumps(config)
elif config is None or isinstance(config, NoneValue):
lines = 'null'
elif config is True:
lines = 'true'
elif config is False:
lines = 'false'
else:
lines = str(config)
return lines
@staticmethod
def _auto_indent(lines, section):
# noinspection PyBroadException
try:
indent = len(lines) - lines.rindex('\n')
except Exception:
indent = len(lines)
# noinspection PyBroadException
try:
section_indent = section.index('\n')
except Exception:
section_indent = len(section)
if section_indent < 3:
return lines + section
indent = '\n' + ''.rjust(indent, ' ')
return lines + indent.join([sec.strip() for sec in section.split('\n')])
# indent = ''.rjust(indent, ' ')
# return lines + section.replace('\n', '\n'+indent)
@classmethod
def to_hocon(cls, config, compact=False, indent=2, level=0):
"""Convert HOCON input into a HOCON output
:return: HOCON string representation
:type return: basestring
"""
lines = ""
if isinstance(config, ConfigTree):
if len(config) == 0:
lines += '{}'
else:
if level > 0: # don't display { at root level
lines += '{\n'
bet_lines = []
for key, item in config.items():
if compact:
full_key = key
while isinstance(item, ConfigTree) and len(item) == 1:
key, item = next(iter(item.items()))
full_key += '.' + key
else:
full_key = key
if isinstance(full_key, float) or \
(isinstance(full_key, (basestring, unicode)) and cls._number_re_matcher.match(full_key)):
# if the key is a string that can be cast to float, make sure we quote it
full_key = '\"{}\"'.format(full_key)
bet_line = ('{indent}{key}{assign_sign} '.format(
indent=''.rjust(level * indent, ' '),
key=full_key,
assign_sign='' if isinstance(item, dict) else ' =',)
)
value_line = cls.to_hocon(item, compact, indent, level + 1)
if isinstance(item, (list, tuple)):
bet_lines.append(cls._auto_indent(bet_line, value_line))
else:
bet_lines.append(bet_line + value_line)
lines += '\n'.join(bet_lines)
if level > 0: # don't display { at root level
lines += '\n{indent}}}'.format(indent=''.rjust((level - 1) * indent, ' '))
elif isinstance(config, (list, tuple)):
if len(config) == 0:
lines += '[]'
else:
# lines += '[\n'
lines += '['
bet_lines = []
base_len = len(lines)
skip_comma = False
for i, item in enumerate(config):
if 0 < i and not skip_comma:
# if not isinstance(item, (str, int, float)):
# lines += ',\n{indent}'.format(indent=''.rjust(level * indent, ' '))
# else:
# lines += ', '
lines += ', '
skip_comma = False
new_line = cls.to_hocon(item, compact, indent, level + 1)
lines += new_line
if '\n' in new_line or len(lines) - base_len > 80:
if i < len(config) - 1:
lines += ',\n{indent}'.format(indent=''.rjust(level * indent, ' '))
base_len = len(lines)
skip_comma = True
# bet_lines.append('{value}'.format(value=cls.to_hocon(item, compact, indent, level + 1)))
# lines += '\n'.join(bet_lines)
# lines += ', '.join(bet_lines)
# lines += '\n{indent}]'.format(indent=''.rjust((level - 1) * indent, ' '))
lines += ']'
elif isinstance(config, basestring):
if '\n' in config and len(config) > 1:
lines = '"""{value}"""'.format(value=config) # multilines
else:
lines = '"{value}"'.format(value=cls.__escape_string(config))
elif isinstance(config, ConfigValues):
lines = ''.join(cls.to_hocon(o, compact, indent, level) for o in config.tokens)
elif isinstance(config, ConfigSubstitution):
lines = '${'
if config.optional:
lines += '?'
lines += config.variable + '}' + config.ws
elif isinstance(config, ConfigQuotedString):
if '\n' in config.value and len(config.value) > 1:
lines = '"""{value}"""'.format(value=config.value) # multilines
else:
lines = '"{value}"'.format(value=cls.__escape_string(config.value))
elif config is None or isinstance(config, NoneValue):
lines = 'null'
elif config is True:
lines = 'true'
elif config is False:
lines = 'false'
else:
lines = str(config)
return lines
@classmethod
def to_yaml(cls, config, compact=False, indent=2, level=0):
"""Convert HOCON input into a YAML output
:return: YAML string representation
:type return: basestring
"""
lines = ""
if isinstance(config, ConfigTree):
if len(config) > 0:
if level > 0:
lines += '\n'
bet_lines = []
for key, item in config.items():
bet_lines.append('{indent}{key}: {value}'.format(
indent=''.rjust(level * indent, ' '),
key=key.strip('"'), # for dotted keys enclosed with "" to not be interpreted as nested key,
value=cls.to_yaml(item, compact, indent, level + 1))
)
lines += '\n'.join(bet_lines)
elif isinstance(config, list):
config_list = [line for line in config if line is not None]
if len(config_list) == 0:
lines += '[]'
else:
lines += '\n'
bet_lines = []
for item in config_list:
bet_lines.append('{indent}- {value}'.format(indent=''.rjust(level * indent, ' '),
value=cls.to_yaml(item, compact, indent, level + 1)))
lines += '\n'.join(bet_lines)
elif isinstance(config, basestring):
# if it contains a \n then it's multiline
lines = config.split('\n')
if len(lines) == 1:
lines = config
else:
lines = '|\n' + '\n'.join([line.rjust(level * indent, ' ') for line in lines])
elif config is None or isinstance(config, NoneValue):
lines = 'null'
elif config is True:
lines = 'true'
elif config is False:
lines = 'false'
else:
lines = str(config)
return lines
@classmethod
def to_properties(cls, config, compact=False, indent=2, key_stack=[]):
"""Convert HOCON input into a .properties output
:return: .properties string representation
:type return: basestring
"""
def escape_value(value):
return value.replace('=', '\\=').replace('!', '\\!').replace('#', '\\#').replace('\n', '\\\n')
stripped_key_stack = [key.strip('"') for key in key_stack]
lines = []
if isinstance(config, ConfigTree):
for key, item in config.items():
if item is not None:
lines.append(cls.to_properties(item, compact, indent, stripped_key_stack + [key]))
elif isinstance(config, list):
for index, item in enumerate(config):
if item is not None:
lines.append(cls.to_properties(item, compact, indent, stripped_key_stack + [str(index)]))
elif isinstance(config, basestring):
lines.append('.'.join(stripped_key_stack) + ' = ' + escape_value(config))
elif config is True:
lines.append('.'.join(stripped_key_stack) + ' = true')
elif config is False:
lines.append('.'.join(stripped_key_stack) + ' = false')
elif config is None or isinstance(config, NoneValue):
pass
else:
lines.append('.'.join(stripped_key_stack) + ' = ' + str(config))
return '\n'.join([line for line in lines if len(line) > 0])
@classmethod
def convert(cls, config, output_format='json', indent=2, compact=False):
converters = {
'json': cls.to_json,
'properties': cls.to_properties,
'yaml': cls.to_yaml,
'hocon': cls.to_hocon,
}
if output_format in converters:
return converters[output_format](config, compact, indent)
else:
raise Exception("Invalid format '{format}'. Format must be 'json', 'properties', 'yaml' or 'hocon'".format(
format=output_format))
@classmethod
def convert_from_file(cls, input_file=None, output_file=None, output_format='json', indent=2, compact=False):
"""Convert to json, properties, yaml or hocon
:param input_file: input file, if not specified stdin
:param output_file: output file, if not specified stdout
:param output_format: json, properties, yaml or hocon
:return: None; the converted string is printed to stdout or written to output_file
"""
if input_file is None:
content = sys.stdin.read()
config = ConfigFactory.parse_string(content)
else:
config = ConfigFactory.parse_file(input_file)
res = cls.convert(config, output_format, indent, compact)
if output_file is None:
print(res)
else:
with open(output_file, "w") as fd:
fd.write(res)
@classmethod
def __escape_match(cls, match):
char = match.group(0)
return {
'\b': r'\b',
'\t': r'\t',
'\n': r'\n',
'\f': r'\f',
'\r': r'\r',
'"': r'\"',
'\\': r'\\',
}.get(char) or (r'\u%04x' % ord(char))
@classmethod
def __escape_string(cls, string):
return re.sub(r'[\x00-\x1F"\\]', cls.__escape_match, string)
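
The escaping done by the two private helpers above can be tried in isolation. The sketch below reproduces the same regular expression and lookup table as module-level functions. The names `escape_string` and `_ESCAPES` are chosen for illustration and are not part of the converter.

```python
import re

# Short escapes for the characters that have one; every other control
# character falls back to a \uXXXX escape.
_ESCAPES = {
    '\b': r'\b',
    '\t': r'\t',
    '\n': r'\n',
    '\f': r'\f',
    '\r': r'\r',
    '"': r'\"',
    '\\': r'\\',
}


def _escape_match(match):
    char = match.group(0)
    return _ESCAPES.get(char) or (r'\u%04x' % ord(char))


def escape_string(string):
    # Only control characters, double quotes and backslashes are rewritten.
    return re.sub(r'[\x00-\x1F"\\]', _escape_match, string)


print(escape_string('say "hi"\n'))
```

Because the replacement is a callable, `re.sub` inserts its return value literally, so the backslashes in the table are not reinterpreted as regex escapes.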

View File

@@ -1,17 +0,0 @@
class ConfigException(Exception):
def __init__(self, message, ex=None):
super(ConfigException, self).__init__(message)
self._exception = ex
class ConfigMissingException(ConfigException, KeyError):
pass
class ConfigSubstitutionException(ConfigException):
pass
class ConfigWrongTypeException(ConfigException):
pass
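
Because `ConfigMissingException` also derives from `KeyError`, callers can treat a missing configuration key like a missing dictionary key. The sketch below copies the two class definitions so that it runs on its own. `lookup` and `get_or_default` are hypothetical helpers, not functions from the package.

```python
class ConfigException(Exception):
    def __init__(self, message, ex=None):
        super(ConfigException, self).__init__(message)
        self._exception = ex


class ConfigMissingException(ConfigException, KeyError):
    pass


def lookup(settings, key):
    # Hypothetical helper: raise the config-specific error for unknown keys.
    if key not in settings:
        raise ConfigMissingException("No configuration setting found for key {}".format(key))
    return settings[key]


def get_or_default(settings, key, default=None):
    # A plain `except KeyError` is enough to catch ConfigMissingException.
    try:
        return lookup(settings, key)
    except KeyError:
        return default


print(get_or_default({"agent": 1}, "missing", default="fallback"))
```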

View File

@@ -1,9 +1,6 @@
import os
import re
import warnings
from clearml_agent.definitions import PIP_EXTRA_INDICES
from .requirement import Requirement
@@ -45,14 +42,9 @@ def parse(reqstr, cwd=None):
yield requirement
elif line.startswith('-f') or line.startswith('--find-links') or \
line.startswith('-i') or line.startswith('--index-url') or \
line.startswith('--extra-index-url') or \
line.startswith('--no-index'):
warnings.warn('Private repos not supported. Skipping.')
elif line.startswith('--extra-index-url'):
extra_index = line[len('--extra-index-url'):].strip()
extra_index = re.sub(r"\s+#.*$", "", extra_index) # strip comments
if extra_index and extra_index not in PIP_EXTRA_INDICES:
PIP_EXTRA_INDICES.append(extra_index)
print(f"appended {extra_index} to list of extra pip indices")
continue
elif line.startswith('-Z') or line.startswith('--always-unzip'):
warnings.warn('Unused option --always-unzip. Skipping.')
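
The `--extra-index-url` branch in this hunk strips the option name and any trailing comment, then records the URL once. A standalone sketch of that logic follows. `collect_extra_indices` is a hypothetical function, and the local `known` list stands in for the module-level `PIP_EXTRA_INDICES`.

```python
import re


def collect_extra_indices(lines, known=None):
    """Return the known index URLs plus any new ones found in `lines`."""
    known = list(known or [])
    for line in lines:
        line = line.strip()
        if not line.startswith('--extra-index-url'):
            continue
        extra_index = line[len('--extra-index-url'):].strip()
        extra_index = re.sub(r"\s+#.*$", "", extra_index)  # strip trailing comments
        if extra_index and extra_index not in known:
            known.append(extra_index)
    return known


requirements = [
    "numpy>=1.10",
    "--extra-index-url https://example.com/simple  # private mirror",
    "--extra-index-url https://example.com/simple",
]
print(collect_extra_indices(requirements))
```

Like the hunk above, this sketch only handles the space-separated form. A line written as `--extra-index-url=<url>` would keep its leading `=`.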

View File

@@ -1,7 +0,0 @@
from clearml_agent.definitions import EnvironmentConfig
ENV_START_AGENT_SCRIPT_PATH = EnvironmentConfig('CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH')
"""
Script path to use when creating the bash script to run the agent inside the scheduled pod's docker container.
Script will be appended to the specified file.
"""

View File

@@ -9,43 +9,43 @@ import os
import re
import subprocess
import tempfile
from collections import defaultdict
from copy import deepcopy
from pathlib import Path
from pprint import pformat
from threading import Thread
from time import sleep, time
from typing import Text, List, Callable, Any, Collection, Optional, Union, Iterable, Dict, Tuple, Set
from time import sleep
from typing import Text, List, Callable, Any, Collection, Optional, Union
import yaml
from clearml_agent.backend_api.session import Request
from clearml_agent.commands.events import Events
from clearml_agent.commands.worker import Worker, get_task_container, set_task_container, get_next_task
from clearml_agent.definitions import ENV_DOCKER_IMAGE, ENV_AGENT_GIT_USER, ENV_AGENT_GIT_PASS
from clearml_agent.commands.worker import Worker, get_task_container, set_task_container
from clearml_agent.definitions import ENV_DOCKER_IMAGE
from clearml_agent.errors import APIError
from clearml_agent.glue.definitions import ENV_START_AGENT_SCRIPT_PATH
from clearml_agent.helper.base import safe_remove_file
from clearml_agent.helper.dicts import merge_dicts
from clearml_agent.helper.process import get_bash_output, stringify_bash_output
from clearml_agent.helper.process import get_bash_output
from clearml_agent.helper.resource_monitor import ResourceMonitor
from clearml_agent.interface.base import ObjectID
class K8sIntegration(Worker):
K8S_PENDING_QUEUE = "k8s_scheduler"
K8S_DEFAULT_NAMESPACE = "clearml"
AGENT_LABEL = "CLEARML=agent"
LIMIT_POD_LABEL = "ai.allegro.agent.serial=pod-{pod_number}"
KUBECTL_APPLY_CMD = "kubectl apply --namespace={namespace} -f"
KUBECTL_RUN_CMD = "kubectl run clearml-id-{task_id} " \
"--image {docker_image} {docker_args} " \
"--restart=Never " \
"--namespace={namespace}"
KUBECTL_DELETE_CMD = "kubectl delete pods " \
"-l={agent_label} " \
"--selector={selector} " \
"--field-selector=status.phase!=Pending,status.phase!=Running " \
"--namespace={namespace} " \
"--output name"
"--namespace={namespace}"
BASH_INSTALL_SSH_CMD = [
"apt-get update",
@@ -62,9 +62,6 @@ class K8sIntegration(Worker):
'echo "ldconfig" >> /etc/profile',
"/usr/sbin/sshd -p {port}"]
DEFAULT_EXECUTION_AGENT_ARGS = os.getenv("K8S_GLUE_DEF_EXEC_AGENT_ARGS", "--full-monitoring --require-queue")
POD_AGENT_INSTALL_ARGS = os.getenv("K8S_GLUE_POD_AGENT_INSTALL_ARGS", "")
CONTAINER_BASH_SCRIPT = [
"export DEBIAN_FRONTEND='noninteractive'",
"echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean",
@@ -72,24 +69,22 @@ class K8sIntegration(Worker):
"apt-get update",
"apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0",
"declare LOCAL_PYTHON",
"[ ! -z $LOCAL_PYTHON ] || for i in {{15..5}}; do which python3.$i && python3.$i -m pip --version && "
"for i in {{10..5}}; do which python3.$i && python3.$i -m pip --version && "
"export LOCAL_PYTHON=$(which python3.$i) && break ; done",
"[ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip",
"[ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3",
"$LOCAL_PYTHON -m pip install clearml-agent",
"{extra_bash_init_cmd}",
"$LOCAL_PYTHON -m pip install clearml-agent{agent_install_args}",
"{extra_docker_bash_script}",
"$LOCAL_PYTHON -m clearml_agent execute {default_execution_agent_args} --id {task_id}"
"$LOCAL_PYTHON -m clearml_agent execute --full-monitoring --require-queue --id {task_id}"
]
DEFAULT_POD_NAME_PREFIX = "clearml-id-"
DEFAULT_LIMIT_POD_LABEL = "ai.allegro.agent.serial=pod-{pod_number}"
_edit_hyperparams_version = "2.9"
def __init__(
self,
k8s_pending_queue_name=None,
kubectl_cmd=None,
container_bash_script=None,
debug=False,
ports_mode=False,
@@ -102,14 +97,15 @@ class K8sIntegration(Worker):
extra_bash_init_script=None,
namespace=None,
max_pods_limit=None,
pod_name_prefix=None,
limit_pod_label=None,
**kwargs
):
"""
Initialize the k8s integration glue layer daemon
:param str k8s_pending_queue_name: queue name to use when task is pending in the k8s scheduler
:param str|callable kubectl_cmd: kubectl command line str, supports formatting (default: KUBECTL_RUN_CMD)
example: "task={task_id} image={docker_image} queue_id={queue_id}"
or a callable function: kubectl_cmd(task_id, docker_image, docker_args, queue_id, task_data)
:param str container_bash_script: container bash script to be executed in k8s (default: CONTAINER_BASH_SCRIPT)
Note that format() is called on this string, so literal curly brackets must be doubled: { -> {{
Format arguments passed: {task_id} and {extra_bash_init_cmd}
@@ -123,7 +119,7 @@ class K8sIntegration(Worker):
when scheduling a task to run in a pod. Callable can receive an optional pod number and should return
a dictionary of user properties (name and value). Signature is [[Optional[int]], Dict[str,str]]
:param str overrides_yaml: YAML file containing the overrides for the pod (optional)
:param str template_yaml: YAML file containing the template for the pod (optional).
:param str template_yaml: YAML file containing the template for the pod (optional).
If provided the pod is scheduled with kubectl apply and overrides are ignored, otherwise with kubectl run.
:param str clearml_conf_file: clearml.conf file to be used by the pod itself (optional)
:param str extra_bash_init_script: Additional bash script to run before starting the Task inside the container
@@ -131,18 +127,15 @@ class K8sIntegration(Worker):
:param int max_pods_limit: Maximum number of pods that K8S glue can run at the same time
"""
super(K8sIntegration, self).__init__()
self.pod_name_prefix = pod_name_prefix or self.DEFAULT_POD_NAME_PREFIX
self.limit_pod_label = limit_pod_label or self.DEFAULT_LIMIT_POD_LABEL
self.k8s_pending_queue_name = k8s_pending_queue_name or self.K8S_PENDING_QUEUE
self.k8s_pending_queue_id = None
self.kubectl_cmd = kubectl_cmd or self.KUBECTL_RUN_CMD
self.container_bash_script = container_bash_script or self.CONTAINER_BASH_SCRIPT
# Always use system packages, because we will be running inside a docker container
self._session.config.put("agent.package_manager.system_site_packages", True)
# Add debug logging
if debug:
self.log.logger.disabled = False
self.log.logger.setLevel(logging.DEBUG)
self.log.logger.addHandler(logging.StreamHandler())
self.log.logger.setLevel(logging.INFO)
self.ports_mode = ports_mode
self.num_of_services = num_of_services
self.base_pod_num = base_pod_num
@@ -158,60 +151,49 @@ class K8sIntegration(Worker):
self.pod_limits = []
self.pod_requests = []
self.max_pods_limit = max_pods_limit if not self.ports_mode else None
self._load_overrides_yaml(overrides_yaml)
if overrides_yaml:
with open(os.path.expandvars(os.path.expanduser(str(overrides_yaml))), 'rt') as f:
overrides = yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
if overrides:
containers = overrides.get('spec', {}).get('containers', [])
for c in containers:
resources = {str(k).lower(): v for k, v in c.get('resources', {}).items()}
if not resources:
continue
if resources.get('limits'):
self.pod_limits += ['{}={}'.format(k, v) for k, v in resources['limits'].items()]
if resources.get('requests'):
self.pod_requests += ['{}={}'.format(k, v) for k, v in resources['requests'].items()]
# remove double entries
self.pod_limits = list(set(self.pod_limits))
self.pod_requests = list(set(self.pod_requests))
if self.pod_limits or self.pod_requests:
self.log.warning('Found pod container requests={} limits={}'.format(
self.pod_requests, self.pod_limits))
if containers:
self.log.warning('Removing containers section: {}'.format(overrides['spec'].pop('containers')))
self.overrides_json_string = json.dumps(overrides)
if template_yaml:
self.template_dict = self._load_template_file(template_yaml)
with open(os.path.expandvars(os.path.expanduser(str(template_yaml))), 'rt') as f:
self.template_dict = yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
clearml_conf_file = clearml_conf_file or kwargs.get('trains_conf_file')
if clearml_conf_file:
with open(os.path.expandvars(os.path.expanduser(str(clearml_conf_file))), 'rt') as f:
self.conf_file_content = f.read()
# make sure we use system packages!
self.conf_file_content += '\nagent.package_manager.system_site_packages=true\n'
self._agent_label = None
self._monitor_hanging_pods()
self._min_cleanup_interval_per_ns_sec = 1.0
self._last_pod_cleanup_per_ns = defaultdict(lambda: 0.)
def _load_overrides_yaml(self, overrides_yaml):
if not overrides_yaml:
return
overrides = self._load_template_file(overrides_yaml)
if not overrides:
return
containers = overrides.get('spec', {}).get('containers', [])
for c in containers:
resources = {str(k).lower(): v for k, v in c.get('resources', {}).items()}
if not resources:
continue
if resources.get('limits'):
self.pod_limits += ['{}={}'.format(k, v) for k, v in resources['limits'].items()]
if resources.get('requests'):
self.pod_requests += ['{}={}'.format(k, v) for k, v in resources['requests'].items()]
# remove double entries
self.pod_limits = list(set(self.pod_limits))
self.pod_requests = list(set(self.pod_requests))
if self.pod_limits or self.pod_requests:
self.log.warning('Found pod container requests={} limits={}'.format(
self.pod_requests, self.pod_limits))
if containers:
self.log.warning('Removing containers section: {}'.format(overrides['spec'].pop('containers')))
self.overrides_json_string = json.dumps(overrides)
def _monitor_hanging_pods(self):
_check_pod_thread = Thread(target=self._monitor_hanging_pods_daemon)
_check_pod_thread.daemon = True
_check_pod_thread.start()
@staticmethod
def _load_template_file(path):
with open(os.path.expandvars(os.path.expanduser(str(path))), 'rt') as f:
return yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
@staticmethod
def _get_path(d, *path, default=None):
try:
@@ -221,33 +203,14 @@ class K8sIntegration(Worker):
except (IndexError, KeyError):
return default
def _get_kubectl_options(self, command, extra_labels=None, filters=None, output="json", labels=None):
# type: (str, Iterable[str], Iterable[str], str, Iterable[str]) -> Dict
if not labels:
labels = [self._get_agent_label()]
labels = list(labels) + (list(extra_labels) if extra_labels else [])
d = {
"-l": ",".join(labels),
"-n": str(self.namespace),
"-o": output,
}
if filters:
d["--field-selector"] = ",".join(filters)
return d
def get_kubectl_command(self, command, output="json", **args):
opts = self._get_kubectl_options(command, output=output, **args)
return 'kubectl {command} {opts}'.format(
command=command, opts=" ".join(x for item in opts.items() for x in item)
)
def _monitor_hanging_pods_daemon(self):
last_tasks_msgs = {} # last msg updated for every task
while True:
kubectl_cmd = self.get_kubectl_command("get pods", filters=["status.phase=Pending"])
self.log.debug("Detecting hanging pods: {}".format(kubectl_cmd))
output = stringify_bash_output(get_bash_output(kubectl_cmd))
output = get_bash_output('kubectl get pods -n {namespace} -o=JSON'.format(
namespace=self.namespace
))
output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
try:
output_config = json.loads(output)
except Exception as ex:
@@ -257,6 +220,9 @@ class K8sIntegration(Worker):
pods = output_config.get('items', [])
task_ids = set()
for pod in pods:
if self._get_path(pod, 'status', 'phase') != "Pending":
continue
pod_name = pod.get('metadata', {}).get('name', None)
if not pod_name:
continue
@@ -265,10 +231,6 @@ class K8sIntegration(Worker):
if not task_id:
continue
namespace = pod.get('metadata', {}).get('namespace', None)
if not namespace:
continue
task_ids.add(task_id)
msg = None
@@ -288,11 +250,9 @@ class K8sIntegration(Worker):
msg = reason + (" ({})".format(message) if message else "")
if reason == 'ImagePullBackOff':
delete_pod_cmd = 'kubectl delete pods {} -n {}'.format(pod_name, namespace)
self.log.debug(" - deleting pod due to ImagePullBackOff: {}".format(delete_pod_cmd))
delete_pod_cmd = 'kubectl delete pods {} -n {}'.format(pod_name, self.namespace)
get_bash_output(delete_pod_cmd)
try:
self.log.debug(" - Detecting hanging pods: {}".format(kubectl_cmd))
self._session.api_client.tasks.failed(
task=task_id,
status_reason="K8S glue error: {}".format(msg),
@@ -313,7 +273,7 @@ class K8sIntegration(Worker):
service='tasks',
action='update',
json={"task": task_id, "status_message": "K8S glue status: {}".format(msg)},
method=Request.def_method,
method='get',
async_enable=False,
)
if not result.ok:
@@ -324,8 +284,8 @@ class K8sIntegration(Worker):
last_tasks_msgs[task_id] = msg
except Exception as ex:
self.log.warning(
'K8S Glue pods monitor: Failed setting status message for task "{}"\nMSG: {}\nEX: {}'.format(
task_id, msg, ex
'K8S Glue pods monitor: Failed setting status message for task "{}"\nEX: {}'.format(
task_id, ex
)
)
@@ -334,8 +294,7 @@ class K8sIntegration(Worker):
sleep(self._polling_interval)
def _set_task_user_properties(self, task_id: str, task_session=None, **properties: str):
session = task_session or self._session
def _set_task_user_properties(self, task_id: str, **properties: str):
if self._edit_hyperparams_support is not True:
# either not supported or never tested
if self._edit_hyperparams_support == self._session.api_version:
@@ -346,7 +305,7 @@ class K8sIntegration(Worker):
self._edit_hyperparams_support = self._session.api_version
return
try:
session.get(
self._session.get(
service="tasks",
action="edit_hyper_params",
task=task_id,
@@ -377,108 +336,73 @@ class K8sIntegration(Worker):
return self._agent_label
def _get_used_pods(self):
# type: () -> Tuple[int, Set[str]]
def _get_number_used_pods(self):
# noinspection PyBroadException
try:
kubectl_cmd = self.get_kubectl_command(
"get pods",
output="jsonpath=\"{range .items[*]}{.metadata.name}{' '}{.metadata.namespace}{'\\n'}{end}\""
kubectl_cmd_new = "kubectl get pods -l {agent_label} -n {namespace} -o json".format(
agent_label=self._get_agent_label(),
namespace=self.namespace,
)
self.log.debug("Getting used pods: {}".format(kubectl_cmd))
output = stringify_bash_output(get_bash_output(kubectl_cmd, raise_error=True))
process = subprocess.Popen(kubectl_cmd_new.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()
output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
error = '' if not error else error if isinstance(error, str) else error.decode('utf-8')
if not output:
# No such pod exists, so we can use the pod_number we found
return 0, set([])
return 0
try:
items = output.splitlines()
current_pod_count = len(items)
namespaces = {item.rpartition(" ")[-1] for item in items}
self.log.debug(" - found {} pods in namespaces {}".format(current_pod_count, ", ".join(namespaces)))
except (KeyError, ValueError, TypeError, AttributeError) as ex:
print("Failed parsing used pods command response for cleanup: {}".format(ex))
return -1, set([])
current_pod_count = len(json.loads(output).get("items", []))
except (ValueError, TypeError) as ex:
return -1
return current_pod_count, namespaces
return current_pod_count
except Exception as ex:
print('Failed obtaining used pods information: {}'.format(ex))
return -2, set([])
print('Failed getting number of used pods: {}'.format(ex))
return -2
def _is_same_tenant(self, task_session):
if not task_session or task_session is self._session:
return True
# noinspection PyStatementEffect
try:
tenant = self._session.get_decoded_token(self._session.token, verify=False)["tenant"]
task_tenant = task_session.get_decoded_token(task_session.token, verify=False)["tenant"]
return tenant == task_tenant
except Exception as ex:
print("ERROR: Failed getting tenant for task session: {}".format(ex))
def run_one_task(self, queue: Text, task_id: Text, worker_args=None, task_session=None, **_):
def run_one_task(self, queue: Text, task_id: Text, worker_args=None, **_):
print('Pulling task {} launching on kubernetes cluster'.format(task_id))
session = task_session or self._session
task_data = session.api_client.tasks.get_all(id=[task_id])[0]
task_data = self._session.api_client.tasks.get_all(id=[task_id])[0]
# push task into the k8s queue, so we have visibility on pending tasks in the k8s scheduler
if self._is_same_tenant(task_session):
try:
print('Pushing task {} into temporary pending queue'.format(task_id))
_ = session.api_client.tasks.stop(task_id, force=True)
try:
print('Pushing task {} into temporary pending queue'.format(task_id))
res = self._session.api_client.tasks.stop(task_id, force=True)
res = self._session.api_client.tasks.enqueue(
task_id,
queue=self.k8s_pending_queue_name,
status_reason='k8s pending scheduler',
)
if res.meta.result_code != 200:
raise Exception(res.meta.result_msg)
except Exception as e:
self.log.error("ERROR: Could not push back task [{}] to k8s pending queue [{}], error: {}".format(
task_id, self.k8s_pending_queue_name, e))
return
res = self._session.api_client.tasks.enqueue(
task_id,
queue=self.k8s_pending_queue_id,
status_reason='k8s pending scheduler',
)
if res.meta.result_code != 200:
raise Exception(res.meta.result_msg)
except Exception as e:
self.log.error("ERROR: Could not push back task [{}] to k8s pending queue {} [{}], error: {}".format(
task_id, self.k8s_pending_queue_name, self.k8s_pending_queue_id, e))
return
container = get_task_container(session, task_id)
container = get_task_container(self._session, task_id)
if not container.get('image'):
container['image'] = str(
ENV_DOCKER_IMAGE.get() or session.config.get("agent.default_docker.image", "nvidia/cuda")
ENV_DOCKER_IMAGE.get() or self._session.config.get("agent.default_docker.image", "nvidia/cuda")
)
container['arguments'] = session.config.get("agent.default_docker.arguments", None)
container['arguments'] = self._session.config.get("agent.default_docker.arguments", None)
set_task_container(
session, task_id, docker_image=container['image'], docker_arguments=container['arguments']
self._session, task_id, docker_image=container['image'], docker_arguments=container['arguments']
)
# get the clearml.conf encoded file, make sure we use system packages!
git_user = ENV_AGENT_GIT_USER.get() or self._session.config.get("agent.git_user", None)
git_pass = ENV_AGENT_GIT_PASS.get() or self._session.config.get("agent.git_pass", None)
extra_config_values = [
'agent.package_manager.system_site_packages: true',
'agent.git_user: "{}"'.format(git_user) if git_user else '',
'agent.git_pass: "{}"'.format(git_pass) if git_pass else '',
]
# get the clearml.conf encoded file
# noinspection PyProtectedMember
config_content = (
self.conf_file_content or Path(session._config_file).read_text() or ""
) + '\n{}\n'.format('\n'.join(x for x in extra_config_values if x))
hocon_config_encoded = config_content.encode("ascii")
create_clearml_conf = ["echo '{}' | base64 --decode >> ~/clearml.conf".format(
hocon_config_encoded = (
self.conf_file_content
or Path(self._session._config_file).read_text()
).encode("ascii")
create_clearml_conf = "echo '{}' | base64 --decode >> ~/clearml.conf".format(
base64.b64encode(
hocon_config_encoded
).decode('ascii')
)]
if task_session:
create_clearml_conf.append(
"export CLEARML_AUTH_TOKEN=$(echo '{}' | base64 --decode)".format(
base64.b64encode(task_session.token.encode("ascii")).decode('ascii')
)
)
)
if self.ports_mode:
print("Kubernetes looking for available pod to use")
@@ -494,40 +418,39 @@ class K8sIntegration(Worker):
pod_number = self.base_pod_num
while self.ports_mode or self.max_pods_limit:
pod_number = self.base_pod_num + pod_count
kubectl_cmd_new = self.get_kubectl_command(
"get pods",
extra_labels=[self.limit_pod_label.format(pod_number=pod_number)] if self.ports_mode else None
)
self.log.debug("Looking for a free pod/port: {}".format(kubectl_cmd_new))
if self.ports_mode:
kubectl_cmd_new = "kubectl get pods -l {pod_label},{agent_label} -n {namespace}".format(
pod_label=self.LIMIT_POD_LABEL.format(pod_number=pod_number),
agent_label=self._get_agent_label(),
namespace=self.namespace,
)
else:
kubectl_cmd_new = "kubectl get pods -l {agent_label} -n {namespace} -o json".format(
agent_label=self._get_agent_label(),
namespace=self.namespace,
)
process = subprocess.Popen(kubectl_cmd_new.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()
output = stringify_bash_output(output)
error = stringify_bash_output(error)
output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
error = '' if not error else error if isinstance(error, str) else error.decode('utf-8')
try:
items_count = len(json.loads(output).get("items", []))
except (ValueError, TypeError) as ex:
self.log.warning(
"K8S Glue pods monitor: Failed parsing kubectl output:\n{}\ntask '{}' "
"will be enqueued back to queue '{}'\nEx: {}".format(
output, task_id, queue, ex
)
)
session.api_client.tasks.stop(task_id, force=True)
# noinspection PyBroadException
try:
self._session.api_client.tasks.enqueue(task_id, queue=queue, status_reason='kubectl parsing error')
except:
self.log.warning("Failed enqueuing task to queue '{}'".format(queue))
return
if not items_count:
# No such pod exists, so we can use the pod_number we found (result exists but with no items)
if not output:
# No such pod exists, so we can use the pod_number we found
break
if self.max_pods_limit:
current_pod_count = items_count
try:
current_pod_count = len(json.loads(output).get("items", []))
except (ValueError, TypeError) as ex:
self.log.warning(
"K8S Glue pods monitor: Failed parsing kubectl output:\n{}\ntask '{}' "
"will be enqueued back to queue '{}'\nEx: {}".format(
output, task_id, queue, ex
)
)
self._session.api_client.tasks.stop(task_id, force=True)
self._session.api_client.tasks.enqueue(task_id, queue=queue, status_reason='kubectl parsing error')
return
max_count = self.max_pods_limit
else:
current_pod_count = pod_count
@@ -543,54 +466,42 @@ class K8sIntegration(Worker):
task_id, queue
)
)
session.api_client.tasks.stop(task_id, force=True)
# noinspection PyBroadException
try:
self._session.api_client.tasks.enqueue(
task_id, queue=queue, status_reason='k8s max pod limit (no free k8s service)'
)
except:
self.log.warning("Failed enqueuing task to queue '{}'".format(queue))
self._session.api_client.tasks.stop(task_id, force=True)
self._session.api_client.tasks.enqueue(
task_id, queue=queue, status_reason='k8s max pod limit (no free k8s service)')
return
elif self.max_pods_limit:
# max pods limit hasn't been reached yet, so we can create the pod
break
pod_count += 1
labels = self._get_pod_labels(queue, queue_name)
if self.ports_mode:
labels.append(self.limit_pod_label.format(pod_number=pod_number))
labels = ([self.LIMIT_POD_LABEL.format(pod_number=pod_number)] if self.ports_mode else []) + \
[self._get_agent_label()]
labels.append("clearml-agent-queue={}".format(self._safe_k8s_label_value(queue)))
labels.append("clearml-agent-queue-name={}".format(self._safe_k8s_label_value(queue_name)))
if self.ports_mode:
print("Kubernetes scheduling task id={} on pod={} (pod_count={})".format(task_id, pod_number, pod_count))
else:
print("Kubernetes scheduling task id={}".format(task_id))
try:
template = self._resolve_template(task_session, task_data, queue)
except Exception as ex:
print("ERROR: Failed resolving template (skipping): {}".format(ex))
return
kubectl_kwargs = dict(
create_clearml_conf=create_clearml_conf,
labels=labels,
docker_image=container['image'],
docker_args=container['arguments'],
docker_bash=container.get('setup_shell_script'),
task_id=task_id,
queue=queue
)
try:
namespace = template['metadata']['namespace'] or self.namespace
except (KeyError, TypeError, AttributeError):
namespace = self.namespace
if template:
output, error = self._kubectl_apply(
template=template,
pod_number=pod_number,
create_clearml_conf=create_clearml_conf,
labels=labels,
docker_image=container['image'],
docker_args=container['arguments'],
docker_bash=container.get('setup_shell_script'),
task_id=task_id,
queue=queue,
namespace=namespace,
)
if self.template_dict:
output, error = self._kubectl_apply(**kubectl_kwargs)
else:
output, error = self._kubectl_run(task_data=task_data, **kubectl_kwargs)
error = '' if not error else (error if isinstance(error, str) else error.decode('utf-8'))
output = '' if not output else (output if isinstance(output, str) else output.decode('utf-8'))
print('kubectl output:\n{}\n{}'.format(error, output))
if error:
send_log = "Running kubectl encountered an error: {}".format(error)
@@ -618,17 +529,9 @@ class K8sIntegration(Worker):
if user_props:
self._set_task_user_properties(
task_id=task_id,
task_session=task_session,
**user_props
)
def _get_pod_labels(self, queue, queue_name):
return [
self._get_agent_label(),
"clearml-agent-queue={}".format(self._safe_k8s_label_value(queue)),
"clearml-agent-queue-name={}".format(self._safe_k8s_label_value(queue_name))
]
def _get_docker_args(self, docker_args, flags, target=None, convert=None):
# type: (List[str], Collection[str], Optional[str], Callable[[str], Any]) -> Union[dict, List[str]]
"""
@@ -655,23 +558,12 @@ class K8sIntegration(Worker):
return {target: results} if results else {}
return results
def _kubectl_apply(
self,
create_clearml_conf,
docker_image,
docker_args,
docker_bash,
labels,
queue,
task_id,
namespace,
template=None,
pod_number=None
):
def _kubectl_apply(self, create_clearml_conf, docker_image, docker_args, docker_bash, labels, queue, task_id):
template = deepcopy(self.template_dict)
template.setdefault('apiVersion', 'v1')
template['kind'] = 'Pod'
template.setdefault('metadata', {})
name = self.pod_name_prefix + str(task_id)
name = 'clearml-id-{task_id}'.format(task_id=task_id)
template['metadata']['name'] = name
template.setdefault('spec', {})
template['spec'].setdefault('containers', [])
@@ -699,30 +591,22 @@ class K8sIntegration(Worker):
['#!/bin/bash', ] +
[line.format(extra_bash_init_cmd=self.extra_bash_init_script or '',
task_id=task_id,
extra_docker_bash_script=extra_docker_bash_script,
default_execution_agent_args=self.DEFAULT_EXECUTION_AGENT_ARGS,
agent_install_args=self.POD_AGENT_INSTALL_ARGS)
extra_docker_bash_script=extra_docker_bash_script)
for line in container_bash_script])
extra_bash_commands = list(create_clearml_conf or [])
start_agent_script_path = ENV_START_AGENT_SCRIPT_PATH.get() or "~/__start_agent__.sh"
extra_bash_commands.append(
"echo '{content}' | base64 --decode >> {script_path} ; /bin/bash {script_path}".format(
content=base64.b64encode(
create_init_script = \
"echo '{}' | base64 --decode >> ~/__start_agent__.sh ; " \
"/bin/bash ~/__start_agent__.sh".format(
base64.b64encode(
script_encoded.encode('ascii')
).decode('ascii'),
script_path=start_agent_script_path
)
)
).decode('ascii'))
# Notice: we always leave with exit code 0, so pods are never restarted
container = self._merge_containers(
container,
dict(name=name, image=docker_image,
command=['/bin/bash'],
args=['-c', '{} ; exit 0'.format(' ; '.join(extra_bash_commands))])
args=['-c', '{} ; {} ; exit 0'.format(create_clearml_conf, create_init_script)])
)
if template['spec']['containers']:
@@ -739,13 +623,11 @@ class K8sIntegration(Worker):
with open(yaml_file, 'wt') as f:
yaml.dump(template, f)
self.log.debug("Applying template:\n{}".format(pformat(template, indent=2)))
kubectl_cmd = self.KUBECTL_APPLY_CMD.format(
task_id=task_id,
docker_image=docker_image,
queue_id=queue,
namespace=namespace
namespace=self.namespace
)
# make sure we provide a list
if isinstance(kubectl_cmd, str):
@@ -761,34 +643,57 @@ class K8sIntegration(Worker):
finally:
safe_remove_file(yaml_file)
return stringify_bash_output(output), stringify_bash_output(error)
return output, error
def _cleanup_old_pods(self, namespaces, extra_msg=None):
# type: (Iterable[str], Optional[str]) -> Dict[str, List[str]]
self.log.debug("Cleaning up pods")
deleted_pods = defaultdict(list)
for namespace in namespaces:
if time() - self._last_pod_cleanup_per_ns[namespace] < self._min_cleanup_interval_per_ns_sec:
# Do not try to cleanup the same namespace too quickly
continue
kubectl_cmd = self.KUBECTL_DELETE_CMD.format(namespace=namespace, agent_label=self._get_agent_label())
self.log.debug("Deleting old/failed pods{} for ns {}: {}".format(
extra_msg or "", namespace, kubectl_cmd
))
try:
res = get_bash_output(kubectl_cmd, raise_error=True)
lines = [
line for line in
(r.strip().rpartition("/")[-1] for r in res.splitlines())
if line.startswith(self.pod_name_prefix)
]
self.log.debug(" - deleted pod(s) %s", ", ".join(lines))
deleted_pods[namespace].extend(lines)
except Exception as ex:
self.log.error("Failed deleting old/failed pods for ns %s: %s", namespace, str(ex))
finally:
self._last_pod_cleanup_per_ns[namespace] = time()
return deleted_pods
def _kubectl_run(
self, create_clearml_conf, docker_image, docker_args, docker_bash, labels, queue, task_data, task_id
):
if callable(self.kubectl_cmd):
kubectl_cmd = self.kubectl_cmd(task_id, docker_image, docker_args, queue, task_data)
else:
kubectl_cmd = self.kubectl_cmd.format(
task_id=task_id,
docker_image=docker_image,
docker_args=" ".join(self._get_docker_args(
docker_args, flags={"-e", "--env"}, convert=lambda env: '--env={}'.format(env))
),
queue_id=queue,
namespace=self.namespace,
)
# make sure we provide a list
if isinstance(kubectl_cmd, str):
kubectl_cmd = kubectl_cmd.split()
if self.overrides_json_string:
kubectl_cmd += ['--overrides=' + self.overrides_json_string]
if self.pod_limits:
kubectl_cmd += ['--limits', ",".join(self.pod_limits)]
if self.pod_requests:
kubectl_cmd += ['--requests', ",".join(self.pod_requests)]
if self._docker_force_pull and not any(x.startswith("--image-pull-policy=") for x in kubectl_cmd):
kubectl_cmd += ["--image-pull-policy='always'"]
container_bash_script = [self.container_bash_script] if isinstance(self.container_bash_script, str) \
else self.container_bash_script
container_bash_script = ' ; '.join(container_bash_script)
kubectl_cmd += [
"--labels=" + ",".join(labels),
"--command",
"--",
"/bin/sh",
"-c",
"{} ; {}".format(create_clearml_conf, container_bash_script.format(
extra_bash_init_cmd=self.extra_bash_init_script or "",
extra_docker_bash_script=docker_bash or "",
task_id=task_id
)),
]
process = subprocess.Popen(kubectl_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()
return output, error
def run_tasks_loop(self, queues: List[Text], worker_params, **kwargs):
"""
@@ -804,29 +709,26 @@ class K8sIntegration(Worker):
events_service = self.get_service(Events)
# make sure we have a k8s pending queue
if not self.k8s_pending_queue_id:
resolved_ids = self._resolve_queue_names([self.k8s_pending_queue_name], create_if_missing=True)
if not resolved_ids:
raise ValueError(
"Failed resolving or creating k8s pending queue {}".format(self.k8s_pending_queue_name)
)
self.k8s_pending_queue_id = resolved_ids[0]
# noinspection PyBroadException
try:
self._session.api_client.queues.create(self.k8s_pending_queue_name)
except Exception:
pass
# get queue id
self.k8s_pending_queue_name = self._resolve_name(self.k8s_pending_queue_name, "queues")
_last_machine_update_ts = 0
while True:
# Get used pods and namespaces
current_pods, namespaces = self._get_used_pods()
# just in case there are no pods, make sure we look at our base namespace
namespaces.add(self.namespace)
# check if have pod limit, then check if we hit it.
if self.max_pods_limit:
current_pods = self._get_number_used_pods()
if current_pods >= self.max_pods_limit:
print("Maximum pod limit reached {}/{}, sleeping for {:.1f} seconds".format(
current_pods, self.max_pods_limit, self._polling_interval))
# delete old completed / failed pods
self._cleanup_old_pods(namespaces, " due to pod limit")
get_bash_output(
self.KUBECTL_DELETE_CMD.format(namespace=self.namespace, selector=self._get_agent_label())
)
# go to sleep
sleep(self._polling_interval)
continue
@@ -834,41 +736,22 @@ class K8sIntegration(Worker):
# iterate over queues (priority style, queues[0] is highest)
for queue in queues:
# delete old completed / failed pods
self._cleanup_old_pods(namespaces)
get_bash_output(
self.KUBECTL_DELETE_CMD.format(namespace=self.namespace, selector=self._get_agent_label())
)
# get next task in queue
try:
response = self._get_next_task(queue=queue, get_task_info=self._impersonate_as_task_owner)
response = self._session.api_client.queues.get_next_task(queue=queue)
except Exception as e:
print("Warning: Could not access task queue [{}], error: {}".format(queue, e))
continue
else:
if not response:
continue
try:
task_id = response["entry"]["task"]
except (KeyError, TypeError, AttributeError):
task_id = response.entry.task
except AttributeError:
print("No tasks in queue {}".format(queue))
continue
task_session = None
if self._impersonate_as_task_owner:
try:
task_user = response["task_info"]["user"]
task_company = response["task_info"]["company"]
except (KeyError, TypeError, AttributeError):
print("Error: cannot retrieve owner user for the task '{}', skipping".format(task_id))
continue
task_session = self.get_task_session(task_user, task_company)
if not task_session:
print(
"Error: Could not login as the user '{}' for the task '{}', skipping".format(
task_user, task_id
)
)
continue
events_service.send_log_events(
self.worker_id,
task_id=task_id,
@@ -876,11 +759,10 @@ class K8sIntegration(Worker):
task_id, queue, self.worker_id
),
level="INFO",
session=task_session,
)
self.report_monitor(ResourceMonitor.StatusReport(queues=queues, queue=queue, task=task_id))
self.run_one_task(queue, task_id, worker_params, task_session)
self.run_one_task(queue, task_id, worker_params)
self.report_monitor(ResourceMonitor.StatusReport(queues=self.queues))
break
else:
@@ -891,7 +773,7 @@ class K8sIntegration(Worker):
if self._session.config["agent.reload_config"]:
self.reload_config()
def k8s_daemon(self, queue, **kwargs):
def k8s_daemon(self, queue):
"""
Start the k8s Glue service.
This service will be pulling tasks from *queue* and scheduling them for execution using kubectl.
@@ -902,19 +784,8 @@ class K8sIntegration(Worker):
:param list(str) queue: queue name to pull from
"""
return self.daemon(
queues=[ObjectID(name=queue)] if queue else None,
log_level=logging.INFO, foreground=True, docker=False, **kwargs,
)
def _get_next_task(self, queue, get_task_info):
return get_next_task(
self._session, queue=queue, get_task_info=get_task_info
)
def _resolve_template(self, task_session, task_data, queue):
if self.template_dict:
return deepcopy(self.template_dict)
return self.daemon(queues=[ObjectID(name=queue)] if queue else None,
log_level=logging.INFO, foreground=True, docker=False)
@classmethod
def get_ssh_server_bash(cls, ssh_port_number):
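
Review note on the `_kubectl_apply` hunks above: both sides of the diff ship the agent start-up script into the pod by base64-encoding it and decoding it again in the container shell. The sketch below shows only that round trip. The helper name `build_bootstrap_command` is hypothetical, and the default path simply mirrors the `~/__start_agent__.sh` fallback visible in the diff.

```python
import base64


def build_bootstrap_command(script_lines, script_path="~/__start_agent__.sh"):
    """Return a shell one-liner that recreates the script in the pod and runs it.

    Hypothetical helper for illustration only; it is not part of clearml-agent.
    """
    script = "\n".join(script_lines)
    # base64 output contains no quotes or whitespace, so it is safe inside '...'
    encoded = base64.b64encode(script.encode("ascii")).decode("ascii")
    return "echo '{content}' | base64 --decode >> {path} ; /bin/bash {path}".format(
        content=encoded, path=script_path
    )


if __name__ == "__main__":
    print(build_bootstrap_command(["#!/bin/bash", "echo hello"]))
```

Encoding the script avoids quoting problems when the command is later wrapped in `/bin/bash -c '...'`, which is presumably why the diff takes this route instead of embedding the script text directly.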


@@ -20,13 +20,13 @@ from typing import Text, Dict, Any, Optional, AnyStr, IO, Union
import attr
import furl
import pyhocon
import yaml
from attr import fields_dict
from pathlib2 import Path
import six
from six.moves import reduce
from clearml_agent.external import pyhocon
from clearml_agent.errors import CommandFailedError
from clearml_agent.helper.dicts import filter_keys
@@ -204,13 +204,10 @@ def get_python_path(script_dir, entry_point, package_api, is_conda_env=False):
["-c", "import sys; print('{}'.join(sys.path))".format(python_path_sep)])
org_python_path = python_path_cmd.get_output(cwd=script_dir)
# Add path of the script directory and executable directory
python_path = '{}{python_path_sep}'.format(
Path(script_dir).absolute().as_posix(), python_path_sep=python_path_sep)
if entry_point:
python_path += '{}{python_path_sep}'.format(
(Path(script_dir) / Path(entry_point)).parent.absolute().as_posix(),
python_path_sep=python_path_sep)
python_path = '{}{python_path_sep}{}{python_path_sep}'.format(
Path(script_dir).absolute().as_posix(),
(Path(script_dir) / Path(entry_point)).parent.absolute().as_posix(),
python_path_sep=python_path_sep)
if is_windows_platform():
python_path = python_path.replace('/', '\\')
@@ -506,38 +503,6 @@ def is_conda(config):
return config['agent.package_manager.type'].lower() == 'conda'
def convert_cuda_version_to_float_single_digit_str(cuda_version):
"""
Convert a cuda_version (string/float/int) into a float representation, e.g. 11.4
Notice returns String Single digit only!
:return str:
"""
cuda_version = str(cuda_version or 0)
# if we have patch version we parse it here
cuda_version_parts = [int(v) for v in cuda_version.split('.')]
if len(cuda_version_parts) > 1 or cuda_version_parts[0] < 60:
cuda_version = 10 * cuda_version_parts[0]
if len(cuda_version_parts) > 1:
cuda_version += float(".{:d}".format(cuda_version_parts[1]))*10
cuda_version_full = "{:.1f}".format(float(cuda_version) / 10.)
else:
cuda_version = cuda_version_parts[0]
cuda_version_full = "{:.1f}".format(float(cuda_version) / 10.)
return cuda_version_full
def convert_cuda_version_to_int_10_base_str(cuda_version):
"""
Convert a cuda_version (string/float/int) into an integer version, e.g. 112 for cuda 11.2
Return string
:return str:
"""
cuda_version = convert_cuda_version_to_float_single_digit_str(cuda_version)
return str(int(float(cuda_version)*10))
class NonStrictAttrs(object):
@classmethod
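
Review note on `convert_cuda_version_to_float_single_digit_str` in the hunk above: the helper accepts either a dotted version (`"11.2"`) or the integer form used in `agent.cuda_version` (`112`). The standalone sketch below restates that logic so the normalization is easy to check. The function name is hypothetical, and the indentation is reconstructed from the flattened diff, which is an assumption.

```python
def cuda_version_to_major_minor(cuda_version):
    """Normalize a CUDA version (str/float/int) to a 'major.minor' string.

    Illustrative restatement of the helper in the diff, not the original code.
    """
    parts = [int(v) for v in str(cuda_version or 0).split(".")]
    if len(parts) > 1 or parts[0] < 60:
        # dotted form ("11.2") or a bare major version ("11")
        value = 10 * parts[0]
        if len(parts) > 1:
            value += float(".{:d}".format(parts[1])) * 10
    else:
        # integer form where 112 means 11.2
        value = parts[0]
    return "{:.1f}".format(float(value) / 10.0)


if __name__ == "__main__":
    for raw in ("11.2", 112, "11", None, "11.2.1"):
        print(repr(raw), "->", cuda_version_to_major_minor(raw))
```

The `< 60` threshold is what separates a bare major version such as `11` from the integer form such as `112`. Patch components (`"11.2.1"`) are dropped.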


@@ -2,7 +2,7 @@ from __future__ import unicode_literals, print_function
import csv
import sys
from collections.abc import Iterable
from collections import Iterable
from typing import List, Dict, Text, Any
from attr import attrs, attrib
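
Review note: the hunk above shows both spellings of the `Iterable` import. The `collections.Iterable` alias was removed in Python 3.10, while `collections.abc` does not exist on Python 2. Code that has to run on both usually guards the import, as in this minimal sketch (not taken from this repository):

```python
try:
    # Python 3.3+; the only location that works on Python 3.10 and later
    from collections.abc import Iterable
except ImportError:
    # Python 2 fallback
    from collections import Iterable


def is_iterable(value):
    """Return True for any object that implements the iteration protocol."""
    return isinstance(value, Iterable)


if __name__ == "__main__":
    print(is_iterable([1, 2, 3]), is_iterable(42))
```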


@@ -1,96 +0,0 @@
import re
import shlex
from typing import Tuple, List, TYPE_CHECKING
from urllib.parse import urlunparse, urlparse
from clearml_agent.definitions import (
ENV_AGENT_GIT_PASS,
ENV_AGENT_SECRET_KEY,
ENV_AWS_SECRET_KEY,
ENV_AZURE_ACCOUNT_KEY,
ENV_AGENT_AUTH_TOKEN,
ENV_DOCKER_IMAGE,
ENV_DOCKER_ARGS_HIDE_ENV,
)
if TYPE_CHECKING:
from clearml_agent.session import Session
class DockerArgsSanitizer:
@classmethod
def sanitize_docker_command(cls, session, docker_command):
# type: (Session, List[str]) -> List[str]
if not docker_command:
return docker_command
enabled = (
session.config.get('agent.hide_docker_command_env_vars.enabled', False) or ENV_DOCKER_ARGS_HIDE_ENV.get()
)
if not enabled:
return docker_command
keys = set(session.config.get('agent.hide_docker_command_env_vars.extra_keys', []))
if ENV_DOCKER_ARGS_HIDE_ENV.get():
keys.update(shlex.split(ENV_DOCKER_ARGS_HIDE_ENV.get().strip()))
keys.update(
ENV_AGENT_GIT_PASS.vars,
ENV_AGENT_SECRET_KEY.vars,
ENV_AWS_SECRET_KEY.vars,
ENV_AZURE_ACCOUNT_KEY.vars,
ENV_AGENT_AUTH_TOKEN.vars,
)
parse_embedded_urls = bool(session.config.get(
'agent.hide_docker_command_env_vars.parse_embedded_urls', True
))
skip_next = False
result = docker_command[:]
for i, item in enumerate(docker_command):
if skip_next:
skip_next = False
continue
try:
if item in ("-e", "--env"):
key, sep, val = result[i + 1].partition("=")
if not sep:
continue
if key in ENV_DOCKER_IMAGE.vars:
# special case - this contains a complete docker command
val = " ".join(cls.sanitize_docker_command(session, re.split(r"\s", val)))
elif key in keys:
val = "********"
elif parse_embedded_urls:
val = cls._sanitize_urls(val)[0]
result[i + 1] = "{}={}".format(key, val)
skip_next = True
elif parse_embedded_urls and not item.startswith("-"):
item, changed = cls._sanitize_urls(item)
if changed:
result[i] = item
except (KeyError, TypeError):
pass
return result
@staticmethod
def _sanitize_urls(s: str) -> Tuple[str, bool]:
""" Replaces passwords in URLs with asterisks """
regex = re.compile("^([^:]*:)[^@]+(.*)$")
tokens = re.split(r"\s", s)
changed = False
for k in range(len(tokens)):
if "@" in tokens[k]:
res = urlparse(tokens[k])
if regex.match(res.netloc):
changed = True
tokens[k] = urlunparse((
res.scheme,
regex.sub("\\1********\\2", res.netloc),
res.path,
res.params,
res.query,
res.fragment
))
return " ".join(tokens) if changed else s, changed
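
Review note on `_sanitize_urls` above: the method masks the password part of any `user:password@host` URL found in a whitespace-separated string. The standalone sketch below reuses the same regular expression and the same `urlparse`/`urlunparse` round trip. The function name `mask_url_passwords` is hypothetical.

```python
import re
from urllib.parse import urlparse, urlunparse

# Same pattern as in the diff: keep "user:", mask up to "@", keep the rest
_CREDENTIALS = re.compile(r"^([^:]*:)[^@]+(.*)$")


def mask_url_passwords(text):
    """Return (sanitized_text, changed); illustrative restatement of the diff."""
    tokens = re.split(r"\s", text)
    changed = False
    for i, token in enumerate(tokens):
        if "@" not in token:
            continue
        parsed = urlparse(token)
        if _CREDENTIALS.match(parsed.netloc):
            changed = True
            masked = _CREDENTIALS.sub(r"\1********\2", parsed.netloc)
            tokens[i] = urlunparse(parsed._replace(netloc=masked))
    return (" ".join(tokens) if changed else text), changed


if __name__ == "__main__":
    print(mask_url_passwords("git clone https://user:secret@github.com/org/repo.git"))
```

Tokens without a URL scheme (for example a plain e-mail address) have an empty `netloc` after parsing, so they pass through unchanged.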


@@ -213,13 +213,6 @@ class PackageManager(object):
return
return self._get_cache_manager().get_last_copied_entry()
def is_cached_enabled(self):
if not self._cache_manager:
cache_folder = ENV_VENV_CACHE_PATH.get() or self.session.config.get(self._config_cache_folder, None)
if not cache_folder:
return False
return True
@classmethod
def _generate_reqs_hash_keys(cls, requirements_list, docker_cmd, python_version, cuda_version):
# type: (Union[Dict, List[Dict]], Optional[Union[dict, str]], Optional[str], Optional[str]) -> List[str]


@@ -19,9 +19,7 @@ from clearml_agent.external.requirements_parser import parse
from clearml_agent.external.requirements_parser.requirement import Requirement
from clearml_agent.errors import CommandFailedError
from clearml_agent.helper.base import (
rm_tree, NonStrictAttrs, select_for_platform, is_windows_platform, ExecutionInfo,
convert_cuda_version_to_float_single_digit_str, convert_cuda_version_to_int_10_base_str, )
from clearml_agent.helper.base import rm_tree, NonStrictAttrs, select_for_platform, is_windows_platform, ExecutionInfo
from clearml_agent.helper.process import Argv, Executable, DEVNULL, CommandSequence, PathLike
from clearml_agent.helper.package.requirements import SimpleVersion
from clearml_agent.session import Session
@@ -169,7 +167,7 @@ class CondaAPI(PackageManager):
raise ValueError("Could not restore Conda environment, cannot find {}".format(
self.conda_pre_build_env_path))
command = Argv(
output = Argv(
self.conda,
"create",
"--yes",
@@ -177,9 +175,7 @@ class CondaAPI(PackageManager):
"--prefix",
self.path,
"python={}".format(self.python),
)
print('Executing Conda: {}'.format(command.serialize()))
output = command.get_output(stderr=DEVNULL)
).get_output(stderr=DEVNULL)
match = re.search(
r"\W*(.*activate) ({})".format(re.escape(str(self.path))), output
)
@@ -193,6 +189,14 @@ class CondaAPI(PackageManager):
if conda_env.is_file() and not is_windows_platform():
self.source = self.pip.source = CommandSequence(('source', conda_env.as_posix()), self.source)
# install cuda toolkit
# noinspection PyBroadException
try:
cuda_version = float(int(self.session.config['agent.cuda_version'])) / 10.0
if cuda_version > 0:
self._install('cudatoolkit={:.1f}'.format(cuda_version))
except Exception:
pass
return self
def _init_existing_environment(self, conda_pre_build_env_path):
@@ -424,7 +428,7 @@ class CondaAPI(PackageManager):
finally:
PackageManager._selected_manager = self
self.requirements_manager.post_install(self.session, package_manager=self)
self.requirements_manager.post_install(self.session)
def load_requirements(self, requirements):
# if we are in read only mode, do not uninstall anything
@@ -452,18 +456,9 @@ class CondaAPI(PackageManager):
requirements['conda'] = requirements['conda'].split('\n')
has_torch = False
has_matplotlib = False
has_cudatoolkit = False
cuda_version_full = 0
# noinspection PyBroadException
try:
# notice this is an integer version: 112 (means 11.2)
cuda_version = str(self.session.config.get('agent.cuda_version', "")).strip()
if not cuda_version:
cuda_version = 0
else:
cuda_version_full = convert_cuda_version_to_float_single_digit_str(cuda_version)
cuda_version = int(convert_cuda_version_to_int_10_base_str(cuda_version))
except Exception:
cuda_version = int(self.session.config.get('agent.cuda_version', 0))
except:
cuda_version = 0
# notice 'conda' entry with empty string is a valid conda requirements list, it means pip only
@@ -480,7 +475,6 @@ class CondaAPI(PackageManager):
continue
m = MarkerRequirement(marker[0])
m.validate_local_file_ref()
# conda does not support version control links
if m.vcs:
pip_requirements.append(m)
@@ -494,19 +488,6 @@ class CondaAPI(PackageManager):
if '.' not in m.specs[0][1]:
continue
if m.name.lower() == 'cudatoolkit':
# skip cuda if we are running on CPU
if not cuda_version:
continue
has_cudatoolkit = True
# cuda version, only major.minor
requested_cuda_version = '.'.join(m.specs[0][1].split('.')[:2])
# make sure that the cuda_version we support can install the requested cuda (major version)
if int(float(requested_cuda_version)) > int(float(cuda_version)/10.0):
continue
m.specs = [(m.specs[0][0], str(requested_cuda_version)), ]
conda_supported_req_names.append(m.name.lower())
if m.req.name.lower() == 'matplotlib':
has_matplotlib = True
@@ -523,11 +504,6 @@ class CondaAPI(PackageManager):
reqs.append(m)
if not has_cudatoolkit and cuda_version:
m = MarkerRequirement(Requirement.parse("cudatoolkit == {}".format(cuda_version_full)))
has_cudatoolkit = True
reqs.append(m)
# if we have a conda list, the rest should be installed with pip,
# this means any experiment that was executed with pip environment,
# will be installed using pip
@@ -541,9 +517,9 @@ class CondaAPI(PackageManager):
continue
m = MarkerRequirement(marker[0])
# remove local files reference if it does not exist (leave the package name)
m.validate_local_file_ref()
# skip over local files (we cannot change the version to a local file)
if m.local_file:
continue
m_name = (m.name or '').lower()
if m_name in conda_supported_req_names:
# this package is in the conda list,
@@ -583,12 +559,8 @@ class CondaAPI(PackageManager):
# change _ to - in name but not the prefix _ (as this is conda prefix)
if r.name and not r.name.startswith('_') and not requirements.get('conda', None):
r.name = r.name.replace('_', '-')
if has_cudatoolkit and r.specs and len(r.specs[0]) > 1 and r.name == 'cudatoolkit':
# select specific cuda version if it came from the requirements
r.specs = [(r.specs[0][0].replace('==', '='), r.specs[0][1].split('.post')[0])]
elif r.specs and r.specs[0] and len(r.specs[0]) > 1:
# remove .post from version numbers it fails with ~= version, and change == to ~=
# remove .post from version numbers, it fails ~= version, and change == to ~=
if r.specs and r.specs[0]:
r.specs = [(r.specs[0][0].replace('==', '~='), r.specs[0][1].split('.post')[0])]
while reqs:
@@ -642,7 +614,7 @@ class CondaAPI(PackageManager):
finally:
PackageManager._selected_manager = self
self.requirements_manager.post_install(self.session, package_manager=self)
self.requirements_manager.post_install(self.session)
return True
def _parse_conda_result_bad_packges(self, result_dict):


@@ -46,10 +46,11 @@ class ExternalRequirements(SimpleSubstitution):
post_install_req = self.post_install_req
self.post_install_req = []
for req in post_install_req:
if self.is_already_installed(req):
print("No need to reinstall \'{}\' from VCS, "
"the exact same version is already installed".format(req.name))
continue
try:
freeze_base = PackageManager.out_of_scope_freeze() or ''
except:
freeze_base = ''
req_line = self._add_vcs_credentials(req, session)
# if we have older pip version we have to make sure we replace back the package name with the
@@ -95,8 +96,7 @@ class ExternalRequirements(SimpleSubstitution):
vcs._set_ssh_url()
new_req_line = 'git+{}{}{}'.format(
'' if scheme and '://' in vcs.url else scheme,
vcs_url if session.config.get('agent.force_git_ssh_protocol', None) else vcs.url_with_auth,
fragment
vcs.url_with_auth, fragment
)
if new_req_line != req_line:
furl_line = furl(new_req_line)
@@ -175,11 +175,5 @@ class OnlyExternalRequirements(ExternalRequirements):
# Do not store the skipped requirements
# mark skip package
if super(OnlyExternalRequirements, self).match(req):
if self.is_already_installed(req):
print("No need to reinstall \'{}\' from VCS, "
"the exact same version is already installed".format(req.name))
return Text('')
return self._add_vcs_credentials(req, self._session)
return Text('')


@@ -12,7 +12,7 @@ from ..requirements import RequirementsManager
class VirtualenvPip(SystemPip, PackageManager):
def __init__(self, session, python, requirements_manager, path, interpreter=None, execution_info=None, **kwargs):
# type: (Session, str, RequirementsManager, PathLike, PathLike, ExecutionInfo, Any) -> ()
# type: (Session, float, RequirementsManager, PathLike, PathLike, ExecutionInfo, Any) -> ()
"""
Program interface to virtualenv pip.
Must be given either path to virtualenv or source command.
@@ -39,7 +39,7 @@ class VirtualenvPip(SystemPip, PackageManager):
if isinstance(requirements, dict) and requirements.get("pip"):
requirements["pip"] = self.requirements_manager.replace(requirements["pip"])
super(VirtualenvPip, self).load_requirements(requirements)
self.requirements_manager.post_install(self.session, package_manager=self)
self.requirements_manager.post_install(self.session)
def create_flags(self):
"""


@@ -5,7 +5,6 @@ import attr
import sys
import os
from pathlib2 import Path
from clearml_agent.helper.process import Argv, DEVNULL, check_if_command_exists
from clearml_agent.session import Session, POETRY
@@ -82,32 +81,6 @@ class PoetryConfig:
@_guard_enabled
def initialize(self, cwd=None):
if not self._initialized:
if self.session.config.get("agent.package_manager.poetry_version", None) is not None:
version = str(self.session.config.get("agent.package_manager.poetry_version"))
print('Upgrading Poetry package {}'.format(version))
# first upgrade pip if we need to
try:
from clearml_agent.helper.package.pip_api.venv import VirtualenvPip
pip = VirtualenvPip(
session=self.session, python=self._python,
requirements_manager=None, path=None, interpreter=self._python)
pip.upgrade_pip()
except Exception as ex:
self.log.warning("failed upgrading pip: {}".format(ex))
# now install poetry
try:
version = version.replace(' ', '')
if ('=' in version) or ('~' in version) or ('<' in version) or ('>' in version):
version = version
elif version:
version = "==" + version
argv = Argv(self._python, "-m", "pip", "install", "poetry{}".format(version),
"--upgrade", "--disable-pip-version-check")
print(argv.get_output())
except Exception as ex:
self.log.warning("failed upgrading poetry: {}".format(ex))
self._initialized = True
try:
self._config("--local", "virtualenvs.in-project", "true", cwd=cwd)

View File

@@ -1,4 +1,3 @@
import re
from typing import Text
from .base import PackageManager
@@ -12,14 +11,13 @@ class PriorityPackageRequirement(SimpleSubstitution):
def __init__(self, *args, **kwargs):
super(PriorityPackageRequirement, self).__init__(*args, **kwargs)
self._replaced_packages = {}
# check if we need to replace the packages:
priority_packages = self.config.get('agent.package_manager.priority_packages', None)
if priority_packages:
self.__class__.name = [p.lower() for p in priority_packages]
self.__class__.name = priority_packages
priority_optional_packages = self.config.get('agent.package_manager.priority_optional_packages', None)
if priority_optional_packages:
self.__class__.optional_package_names = [p.lower() for p in priority_optional_packages]
self.__class__.optional_package_names = priority_optional_packages
def match(self, req):
# match both Cython & cython
@@ -30,9 +28,7 @@ class PriorityPackageRequirement(SimpleSubstitution):
Replace a requirement
:raises: ValueError if version is pre-release
"""
self._replaced_packages[req.name] = req.line
if req.name.lower() in self.optional_package_names:
if req.name in self.optional_package_names:
# noinspection PyBroadException
try:
if PackageManager.out_of_scope_install_package(str(req)):
@@ -43,41 +39,6 @@ class PriorityPackageRequirement(SimpleSubstitution):
PackageManager.out_of_scope_install_package(str(req))
return Text(req)
def replace_back(self, list_of_requirements):
"""
:param list_of_requirements: {'pip': ['a==1.0', ]}
:return: {'pip': ['a==1.0', ]}
"""
# if we replaced setuptools, it means someone requested it, and since freeze will not contain it,
# we need to add it manually
if not self._replaced_packages or "setuptools" not in self._replaced_packages:
return list_of_requirements
try:
for k, lines in list_of_requirements.items():
# k is either pip/conda
if k not in ('pip', 'conda'):
continue
for i, line in enumerate(lines):
if not line or line.lstrip().startswith('#'):
continue
parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
if not parts:
continue
# if we found setuptools, do nothing
if parts[0] == "setuptools":
return list_of_requirements
# if we are here it means we have not found setuptools
# we should add it:
if "pip" in list_of_requirements:
list_of_requirements["pip"] = [self._replaced_packages["setuptools"]] + list_of_requirements["pip"]
except Exception as ex: # noqa
return list_of_requirements
return list_of_requirements
class PackageCollectorRequirement(SimpleSubstitution):
"""


@@ -2,21 +2,17 @@ from __future__ import unicode_literals
import re
import sys
import platform
from furl import furl
import urllib.parse
from operator import itemgetter
from html.parser import HTMLParser
from typing import Text, Optional, Dict
from typing import Text
import attr
import requests
import six
from .requirements import (
SimpleSubstitution, FatalSpecsResolutionError, SimpleVersion, MarkerRequirement,
compare_version_rules, )
from ...external.requirements_parser.requirement import Requirement
from .requirements import SimpleSubstitution, FatalSpecsResolutionError, SimpleVersion
OS_TO_WHEEL_NAME = {"linux": "linux_x86_64", "windows": "win_amd64"}
@@ -55,16 +51,17 @@ class PytorchWheel(object):
python = attr.ib(type=str, converter=lambda x: str(x).replace(".", ""))
torch_version = attr.ib(type=str, converter=fix_version)
url_template_prefix = "http://download.pytorch.org/whl/"
url_template = "{0.cuda_version}/torch-{0.torch_version}" \
"-cp{0.python}-cp{0.python}m{0.unicode}-{0.os_name}.whl"
url_template = (
"http://download.pytorch.org/whl/"
"{0.cuda_version}/torch-{0.torch_version}-cp{0.python}-cp{0.python}m{0.unicode}-{0.os_name}.whl"
)
def __attrs_post_init__(self):
self.unicode = "u" if self.python.startswith("2") else ""
def make_url(self):
# type: () -> Text
return (self.url_template_prefix + self.url_template).format(self)
return self.url_template.format(self)
class PytorchResolutionError(FatalSpecsResolutionError):
@@ -171,72 +168,41 @@ class PytorchRequirement(SimpleSubstitution):
name = "torch"
packages = ("torch", "torchvision", "torchaudio", "torchcsprng", "torchtext")
extra_index_url_template = 'https://download.pytorch.org/whl/cu{}/'
nightly_extra_index_url_template = 'https://download.pytorch.org/whl/nightly/cu{}/'
torch_index_url_lookup = {}
def __init__(self, *args, **kwargs):
os_name = kwargs.pop("os_override", None)
super(PytorchRequirement, self).__init__(*args, **kwargs)
self.log = self._session.get_logger(__name__)
self.package_manager = self.config["agent.package_manager.type"].lower()
self.os = os_name or self.get_platform()
self.cuda = None
self.python_version_string = None
self.python_major_minor_str = None
self.python = None
self._fix_setuptools = None
self.exceptions = []
self.cuda = "cuda{}".format(self.cuda_version).lower()
self.python_version_string = str(self.config["agent.default_python"])
self.python_major_minor_str = '.'.join(self.python_version_string.split('.')[:2])
if '.' not in self.python_major_minor_str:
raise PytorchResolutionError(
"invalid python version {!r} defined in configuration file, key 'agent.default_python': "
"must have both major and minor parts of the version (for example: '3.7')".format(
self.python_version_string
)
)
self.python = "python{}".format(self.python_major_minor_str)
self.exceptions = [
PytorchResolutionError(message)
for message in (
None,
'cuda version "{}" is not supported'.format(self.cuda),
'python version "{}" is not supported'.format(
self.python_version_string
),
)
]
try:
self.validate_python_version()
except PytorchResolutionError as e:
self.log.warn("will not be able to install pytorch wheels: %s", e.args[0])
self._original_req = []
# allow override pytorch lookup pages
if self.config.get("agent.package_manager.extra_index_url_template", None):
self.extra_index_url_template = \
self.config.get("agent.package_manager.extra_index_url_template", None)
if self.config.get("agent.package_manager.nightly_extra_index_url_template", None):
self.nightly_extra_index_url_template = \
self.config.get("agent.package_manager.nightly_extra_index_url_template", None)
# allow override pytorch lookup pages
if self.config.get("agent.package_manager.torch_page", None):
SimplePytorchRequirement.page_lookup_template = \
self.config.get("agent.package_manager.torch_page", None)
if self.config.get("agent.package_manager.torch_nightly_page", None):
SimplePytorchRequirement.nightly_page_lookup_template = \
self.config.get("agent.package_manager.torch_nightly_page", None)
if self.config.get("agent.package_manager.torch_url_template_prefix", None):
PytorchWheel.url_template_prefix = \
self.config.get("agent.package_manager.torch_url_template_prefix", None)
if self.config.get("agent.package_manager.torch_url_template", None):
PytorchWheel.url_template = \
self.config.get("agent.package_manager.torch_url_template", None)
def _init_python_ver_cuda_ver(self):
if self.cuda is None:
self.cuda = "cuda{}".format(self.cuda_version).lower()
if self.python_version_string is None:
self.python_version_string = str(self.config["agent.default_python"])
if self.python_major_minor_str is None:
self.python_major_minor_str = '.'.join(self.python_version_string.split('.')[:2])
if '.' not in self.python_major_minor_str:
raise PytorchResolutionError(
"invalid python version {!r} defined in configuration file, key 'agent.default_python': "
"must have both major and minor parts of the version (for example: '3.7')".format(
self.python_version_string
)
)
if self.python is None:
self.python = "python{}".format(self.python_major_minor_str)
if not self.exceptions:
self.exceptions = [
PytorchResolutionError(message)
for message in (
None,
'cuda version "{}" is not supported'.format(self.cuda),
'python version "{}" is not supported'.format(
self.python_version_string
),
)
]
@property
def is_conda(self):
@@ -250,8 +216,6 @@ class PytorchRequirement(SimpleSubstitution):
"""
Make sure python version has both major and minor versions as required for choosing pytorch wheel
"""
self._init_python_ver_cuda_ver()
if self.is_pip and not self.python_major_minor_str:
raise PytorchResolutionError(
"invalid python version {!r} defined in configuration file, key 'agent.default_python': "
@@ -273,15 +237,10 @@ class PytorchRequirement(SimpleSubstitution):
return "macos"
raise RuntimeError("unrecognized OS")
@staticmethod
def get_arch():
return str(platform.machine()).lower()
def _get_link_from_torch_page(self, req, torch_url):
links_parser = LinksHTMLParser()
links_parser.feed(requests.get(torch_url, timeout=10).text)
platform_wheel = "win" if self.get_platform() == "windows" else self.get_platform()
arch_wheel = self.get_arch()
py_ver = self.python_major_minor_str.replace('.', '')
url = None
last_v = None
@@ -302,11 +261,8 @@ class PytorchRequirement(SimpleSubstitution):
continue
if len(parts) < 3 or not parts[2].endswith(py_ver):
continue
if len(parts) < 5 or platform_wheel not in parts[4].lower():
if len(parts) < 5 or platform_wheel not in parts[4]:
continue
if len(parts) < 5 or arch_wheel not in parts[4].lower():
continue
# yes this is for linux python 2.7 support, this is the only python 2.7 we support...
if py_ver and py_ver[0] == '2' and len(parts) > 3 and not parts[3].endswith('u'):
continue
@@ -338,21 +294,18 @@ class PytorchRequirement(SimpleSubstitution):
def get_url_for_platform(self, req):
# check if package is already installed with system packages
self.validate_python_version()
# noinspection PyBroadException
try:
if self.config.get("agent.package_manager.system_site_packages", None):
from pip._internal.commands.show import search_packages_info
installed_torch = list(search_packages_info([req.name]))
# notice the comparison order, the first part will make sure we have a valid installed package
installed_torch_version = (getattr(installed_torch[0], 'version', None) or installed_torch[0]['version']) \
if installed_torch else None
if installed_torch and installed_torch_version and \
req.compare_version(installed_torch_version):
if installed_torch and installed_torch[0]['version'] and \
req.compare_version(installed_torch[0]['version']):
print('PyTorch: requested "{}" version {}, using pre-installed version {}'.format(
req.name, req.specs[0] if req.specs else 'unspecified', installed_torch_version))
req.name, req.specs[0] if req.specs else 'unspecified', installed_torch[0]['version']))
# package already installed, do nothing
req.specs = [('==', str(installed_torch_version))]
req.specs = [('==', str(installed_torch[0]['version']))]
return '{} {} {}'.format(req.name, req.specs[0][0], req.specs[0][1]), True
except Exception:
pass
@@ -393,11 +346,6 @@ class PytorchRequirement(SimpleSubstitution):
else:
print('Trying PyTorch CUDA version {} support'.format(torch_url_key))
# fix broken pytorch setuptools incompatibility
if req.name == "torch" and closest_matched_version and \
SimpleVersion.compare_versions(closest_matched_version, "<", "1.11.0"):
self._fix_setuptools = "setuptools < 59"
if not url:
url = PytorchWheel(
torch_version=fix_version(version),
@@ -475,36 +423,6 @@ class PytorchRequirement(SimpleSubstitution):
return self.match_version(req, base).replace(" ", "\n")
def replace(self, req):
# check if package is already installed with system packages
self.validate_python_version()
# try to check if we can just use the new index URL, if we do not we will revert to old method
try:
extra_index_url = self.get_torch_index_url(self.cuda_version)
if extra_index_url:
# check if the torch version cannot be above 1.11 , we need to fix setup tools
try:
if req.name == "torch" and not compare_version_rules(req.specs, [(">=", "1.11.0")]):
self._fix_setuptools = "setuptools < 59"
except Exception: # noqa
pass
# now we just need to add the correct extra index url for the cuda version
self.set_add_install_extra_index(extra_index_url[0])
if req.specs and len(req.specs) == 1 and req.specs[0][0] == "==":
# remove any +cu extension and let pip resolve that
line = "{} {}".format(req.name, req.format_specs(max_num_parts=3))
if req.marker:
line += " ; {}".format(req.marker)
else:
# return the original line
line = req.line
return line
except Exception: # noqa
pass
try:
new_req = self._replace(req)
if new_req:
@@ -568,7 +486,7 @@ class PytorchRequirement(SimpleSubstitution):
for i, line in enumerate(lines):
if not line or line.lstrip().startswith('#'):
continue
parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
parts = [p for p in re.split('\s|=|\.|<|>|~|!|@|#', line) if p]
if not parts:
continue
for req, new_req in self._original_req:
@@ -590,61 +508,6 @@ class PytorchRequirement(SimpleSubstitution):
return list_of_requirements
def post_scan_add_req(self): # type: () -> Optional[MarkerRequirement]
"""
Allows the RequirementSubstitution to add an extra line/requirements after
the initial requirements scan is completed.
Called only once per requirements.txt object
"""
if self._fix_setuptools:
return MarkerRequirement(Requirement.parse(self._fix_setuptools))
return None
@classmethod
def get_torch_index_url(cls, cuda_version, nightly=False):
# noinspection PyBroadException
try:
cuda = int(cuda_version)
except Exception:
cuda = 0
if nightly:
for c in range(cuda, max(-1, cuda-15), -1):
# then try the nightly builds, it might be there...
torch_url = cls.nightly_extra_index_url_template.format(c)
# noinspection PyBroadException
try:
if requests.get(torch_url, timeout=10).ok:
print('Torch nightly CUDA {} index page found'.format(c))
cls.torch_index_url_lookup[c] = torch_url
return cls.torch_index_url_lookup[c], c
except Exception:
pass
return
# first check if key is valid
if cuda in cls.torch_index_url_lookup:
return cls.torch_index_url_lookup[cuda], cuda
# then try a new cuda version page
for c in range(cuda, max(-1, cuda-15), -1):
torch_url = cls.extra_index_url_template.format(c)
# noinspection PyBroadException
try:
if requests.get(torch_url, timeout=10).ok:
print('Torch CUDA {} index page found'.format(c))
cls.torch_index_url_lookup[c] = torch_url
return cls.torch_index_url_lookup[c], c
except Exception:
pass
keys = sorted(cls.torch_index_url_lookup.keys(), reverse=True)
for k in keys:
if k <= cuda:
return cls.torch_index_url_lookup[k], k
# return default - zero
return cls.torch_index_url_lookup[0], 0
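
The descending CUDA-version probe in `get_torch_index_url` above can be sketched offline roughly as follows. This is a minimal illustration, not the agent's implementation: `find_index_url`, `page_exists`, `cache` and `max_steps` are hypothetical names, the HTTP request is replaced by an injected callable, and the final fallback returns `None` where the diffed code indexes key `0` of the lookup table.

```python
INDEX_URL_TEMPLATE = "https://download.pytorch.org/whl/cu{}/"  # same template string as in the diff


def find_index_url(cuda_version, page_exists, cache, max_steps=15):
    """Return (url, cuda) for the closest index page at or below cuda_version.

    page_exists is a hypothetical callable(url) -> bool standing in for the
    requests.get(...).ok check; cache plays the role of torch_index_url_lookup.
    """
    try:
        cuda = int(cuda_version)
    except (TypeError, ValueError):
        cuda = 0

    # a previously resolved version is returned without probing again
    if cuda in cache:
        return cache[cuda], cuda

    # walk downwards from the requested version, at most max_steps candidates
    for candidate in range(cuda, max(-1, cuda - max_steps), -1):
        url = INDEX_URL_TEMPLATE.format(candidate)
        if page_exists(url):
            cache[candidate] = url
            return url, candidate

    # otherwise reuse the highest cached version that does not exceed the request
    for known in sorted(cache, reverse=True):
        if known <= cuda:
            return cache[known], known

    # assumption: signal "nothing found" explicitly instead of raising KeyError
    return None
```

Injecting `page_exists` keeps the lookup order testable without network access; a real caller would presumably wrap the HTTP check used in the diff.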
MAP = {
"windows": {
"cuda100": {


@@ -11,15 +11,11 @@ from os import path
from typing import Text, List, Type, Optional, Tuple, Dict
from pathlib2 import Path
from clearml_agent.external.pyhocon import ConfigTree
from pyhocon import ConfigTree
import six
from six.moves.urllib.parse import unquote
import logging
from clearml_agent.definitions import PIP_EXTRA_INDICES
from clearml_agent.helper.base import (
warning, is_conda, which, join_lines, is_windows_platform,
convert_cuda_version_to_int_10_base_str, )
from clearml_agent.helper.base import warning, is_conda, which, join_lines, is_windows_platform
from clearml_agent.helper.process import Argv, PathLike
from clearml_agent.helper.gpu.gpustat import get_driver_cuda_version
from clearml_agent.session import Session, normalize_cuda_version
@@ -100,8 +96,7 @@ class MarkerRequirement(object):
return ','.join(starmap(operator.add, self.specs))
op, version = self.specs[0]
# noinspection PyProtectedMember
for v in SimpleVersion._sub_versions_pep440:
for v in self._sub_versions_pep440:
version = version.replace(v, '.')
if num_parts:
version = (version.strip('.').split('.') + ['0'] * num_parts)[:max_num_parts]
@@ -158,33 +153,6 @@ class MarkerRequirement(object):
return SimpleVersion.compare_versions(
version_a=requested_version, op=op, version_b=version, num_parts=num_parts)
def remove_local_file_ref(self):
if not self.local_file or self.vcs or self.editable or self.path:
return False
parts = re.split(r"@\s*{}".format(self.req.uri), self.req.line)
# if we did not find anything do nothing
if len(parts) < 2:
return False
self.req.line = ''.join(parts).strip()
self.req.uri = None
self.req.local_file = False
return True
def validate_local_file_ref(self):
# if local file does not exist, remove the reference to it
if self.vcs or self.editable or self.path or not self.local_file or not self.name or \
not self.uri or not self.uri.startswith("file://"):
return
local_path = Path(self.uri[len("file://"):])
if not local_path.exists():
local_path = Path(unquote(self.uri)[len("file://"):])
if not local_path.exists():
line = self.line
if self.remove_local_file_ref():
# print warning
logging.getLogger(__name__).warning(
'Local file not found [{}], references removed'.format(line))
class SimpleVersion:
_sub_versions_pep440 = ['a', 'b', 'rc', '.post', '.dev', '+', ]
@@ -240,11 +208,7 @@ class SimpleVersion:
if not version_b:
return True
if not num_parts:
num_parts = max(len(version_a.split('.')), len(version_b.split('.')), )
if op == '~=':
num_parts = len(version_b.split('.')) - 1
num_parts = max(num_parts, 2)
op = '=='
ignore_sub_versions = True
@@ -279,20 +243,8 @@ class SimpleVersion:
return version_a_key > version_b_key
if op == '<':
return version_a_key < version_b_key
if op == '!=':
return version_a_key != version_b_key
raise ValueError('Unrecognized comparison operator [{}]'.format(op))
@classmethod
def max_version(cls, version_a, version_b):
return version_a if cls.compare_versions(
version_a=version_a, op='>=', version_b=version_b, num_parts=None) else version_b
@classmethod
def min_version(cls, version_a, version_b):
return version_a if cls.compare_versions(
version_a=version_a, op='<=', version_b=version_b, num_parts=None) else version_b
@staticmethod
def _parse_letter_version(
letter, # type: str
@@ -361,94 +313,17 @@ class SimpleVersion:
return ()
def compare_version_rules(specs_a, specs_b):
# specs_a/b are a list of tuples: [('==', '1.2.3'), ] or [('>=', '1.2'), ('<', '1.3')]
# section definition:
class Section(object):
def __init__(self, left="-999999999", left_eq=False, right="999999999", right_eq=False):
self.left, self.left_eq, self.right, self.right_eq = left, left_eq, right, right_eq
# first create a list of in/out sections for each spec
# >, >= are left rule
# <, <= are right rule
# ~= x.y.z is converted to: >= x.y and < x.y+1
# ==/=== are converted to: >= and <=
# != x.y.z will split a section into: left < x.y.z and right > x.y.z
def create_section(specs):
section = Section()
for op, v in specs:
a = section
if op == '>':
a.left = v
a.left_eq = False
elif op == '>=':
a.left = v
a.left_eq = True
elif op == '<':
a.right = v
a.right_eq = False
elif op == '<=':
a.right = v
a.right_eq = True
elif op == '==':
a.left = v
a.left_eq = True
a.right = v
a.right_eq = True
elif op == '~=':
new_v = v.split('.')
a_left = '.'.join(new_v[:-1])
a.left = a_left if not a.left else SimpleVersion.max_version(a_left, a.left)
a.left_eq = True
a_right = '.'.join(new_v[:-2] + [str(int(new_v[-2])+1)])
a.right = a_right if not a.right else SimpleVersion.min_version(a_right, a.right)
a.right_eq = False if a.right == a_right else a.right_eq
return section
section_a = create_section(specs_a)
section_b = create_section(specs_b)
i = Section()
# then we have a list of sections for spec A/B
if section_a.left == section_b.left:
i.left = section_a.left
i.left_eq = section_a.left_eq and section_b.left_eq
else:
i.left = SimpleVersion.max_version(section_a.left, section_b.left)
i.left_eq = section_a.left_eq if i.left == section_a.left else section_b.left_eq
if section_a.right == section_b.right:
i.right = section_a.right
i.right_eq = section_a.right_eq and section_b.right_eq
else:
i.right = SimpleVersion.min_version(section_a.right, section_b.right)
i.right_eq = section_a.right_eq if i.right == section_a.right else section_b.right_eq
# return true if any section from A intersects a section from B
valid = True
valid &= SimpleVersion.compare_versions(
version_a=i.left, op='<=' if i.left_eq else '<', version_b=i.right, num_parts=None)
valid &= SimpleVersion.compare_versions(
version_a=i.right, op='>=' if i.left_eq else '>', version_b=i.left, num_parts=None)
return valid
@six.add_metaclass(ABCMeta)
class RequirementSubstitution(object):
_pip_extra_index_url = PIP_EXTRA_INDICES
@classmethod
def set_add_install_extra_index(cls, extra_index_url):
if extra_index_url not in cls._pip_extra_index_url:
cls._pip_extra_index_url.append(extra_index_url)
def __init__(self, session):
# type: (Session) -> ()
self._session = session
self.config = session.config # type: ConfigTree
self.suffix = '.post{config[agent.cuda_version]}.dev{config[agent.cudnn_version]}'.format(config=self.config)
self.package_manager = self.config['agent.package_manager.type']
self._is_already_installed_cb = None
@abstractmethod
def match(self, req): # type: (MarkerRequirement) -> bool
@@ -464,20 +339,6 @@ class RequirementSubstitution(object):
"""
pass
def set_is_already_installed_cb(self, cb):
self._is_already_installed_cb = cb
def is_already_installed(self, req):
if not self._is_already_installed_cb:
return False
# noinspection PyBroadException
try:
return self._is_already_installed_cb(req)
except BaseException as ex:
# debug could not resolve something
print("Warning: Requirements post install callback exception (check if package installed): {}".format(ex))
return False
def post_scan_add_req(self): # type: () -> Optional[MarkerRequirement]
"""
Allows the RequirementSubstitution to add an extra line/requirements after
@@ -502,7 +363,7 @@ class RequirementSubstitution(object):
@property
def cuda_version(self):
return convert_cuda_version_to_int_10_base_str(self.config['agent.cuda_version'])
return self.config['agent.cuda_version']
@property
def cudnn_version(self):
@@ -588,7 +449,6 @@ class RequirementsManager(object):
cache_dir=pip_cache_dir.as_posix())
self._base_interpreter = base_interpreter
self._cwd = None
self._installed_parsed_packages = set()
def register(self, cls): # type: (Type[RequirementSubstitution]) -> None
self.handlers.append(cls(self._session))
@@ -608,9 +468,20 @@ class RequirementsManager(object):
return None
def replace(self, requirements): # type: (Text) -> Text
parsed_requirements = self.parse_requirements_section_to_marker_requirements(
requirements=requirements, cwd=self._cwd)
def safe_parse(req_str):
# noinspection PyBroadException
try:
return list(parse(req_str, cwd=self._cwd))
except Exception as ex:
return [Requirement(req_str)]
parsed_requirements = tuple(
map(
MarkerRequirement,
[r for line in (requirements.splitlines() if isinstance(requirements, six.text_type) else requirements)
for r in safe_parse(line)]
)
)
if not parsed_requirements:
# return the original requirements just in case
return requirements
@@ -639,29 +510,14 @@ class RequirementsManager(object):
result = list(result)
# add post scan add requirements call back
double_req_set = None
for h in self.handlers:
reqs = h.post_scan_add_req()
if reqs:
if double_req_set is None:
def safe_parse_name(line):
try:
return Requirement.parse(line).name
except: # noqa
return None
double_req_set = set([safe_parse_name(r) for r in result if r])
for r in (reqs if isinstance(reqs, (tuple, list)) else [reqs]):
if r and (not r.name or r.name not in double_req_set):
result.append(r.tostr())
elif r:
print("SKIPPING additional auto installed package: \"{}\"".format(r))
req = h.post_scan_add_req()
if req:
result.append(req.tostr())
return join_lines(result)
def post_install(self, session, package_manager=None):
if package_manager:
self.update_installed_packages_state(package_manager.freeze())
def post_install(self, session):
for h in self.handlers:
try:
h.post_install(session)
@@ -683,34 +539,6 @@ class RequirementsManager(object):
def get_interpreter(self):
return self._base_interpreter
def update_installed_packages_state(self, requirements):
"""
Updates internal Installed Packages objects, so that later we can detect
if we already have a pre-installed package
:param requirements: is the output of a freeze() call, i.e. dict {'pip': "package==version"}
"""
requirements = requirements if not isinstance(requirements, dict) else requirements.get("pip")
self._installed_parsed_packages = self.parse_requirements_section_to_marker_requirements(
requirements=requirements, cwd=self._cwd)
for h in self.handlers:
h.set_is_already_installed_cb(self._callback_is_already_installed)
def _callback_is_already_installed(self, req):
for p in (self._installed_parsed_packages or []):
if p.name != req.name:
continue
# if this is version control package, only return true of both installed and requests specify commit ID
if req.vcs:
return p.vcs and req.revision and req.revision == p.revision
if not req.specs and not p.specs:
return True
# return if this is the same version
return req.specs and p.specs and req.compare_version(p, op="==")
return False
@staticmethod
def get_cuda_version(config): # type: (ConfigTree) -> (Text, Text)
# we assume os.environ already updated the config['agent.cuda_version'] & config['agent.cudnn_version']
@@ -786,29 +614,3 @@ class RequirementsManager(object):
return (normalize_cuda_version(cuda_version or 0),
normalize_cuda_version(cudnn_version or 0))
@staticmethod
def parse_requirements_section_to_marker_requirements(requirements, cwd=None):
def safe_parse(req_str):
# noinspection PyBroadException
try:
return list(parse(req_str, cwd=cwd))
except Exception as ex:
return [Requirement(req_str)]
def create_req(x):
r = MarkerRequirement(x)
r.validate_local_file_ref()
return r
if not requirements:
return tuple()
parsed_requirements = tuple(
map(
create_req,
[r for line in (requirements.splitlines() if isinstance(requirements, str) else requirements)
for r in safe_parse(line)]
)
)
return parsed_requirements
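
The interval idea behind `compare_version_rules` in this file (turn each spec list into a lower and an upper bound, then check that the two ranges intersect) can be sketched as below. This is a simplified illustration under stated assumptions: versions are plain dotted integers padded to three parts, only `>`, `>=`, `<`, `<=` and `==` are handled (no `~=` or `!=`), and all names (`parse_version`, `to_interval`, `rules_overlap`) are hypothetical rather than part of the agent.

```python
def parse_version(text, width=3):
    """'1.11' -> (1, 11, 0); assumes purely numeric dotted versions."""
    parts = [int(p) for p in text.split(".")]
    return tuple((parts + [0] * width)[:width])


def to_interval(specs):
    """Collapse [('>=', '1.2'), ('<', '1.3')] into (low, low_inc, high, high_inc)."""
    low, low_inc, high, high_inc = None, False, None, False
    for op, version in specs:
        key = parse_version(version)
        if op in (">", ">="):
            low, low_inc = key, op == ">="
        elif op in ("<", "<="):
            high, high_inc = key, op == "<="
        elif op == "==":
            low, low_inc, high, high_inc = key, True, key, True
    return low, low_inc, high, high_inc


def _tighter(x, x_inc, y, y_inc, pick_max):
    """Pick the more restrictive of two bounds; None means unbounded."""
    if x is None:
        return y, y_inc
    if y is None:
        return x, x_inc
    if x == y:
        return x, x_inc and y_inc
    return (x, x_inc) if (x > y) == pick_max else (y, y_inc)


def rules_overlap(specs_a, specs_b):
    """True if some version satisfies both spec lists."""
    a, b = to_interval(specs_a), to_interval(specs_b)
    low, low_inc = _tighter(a[0], a[1], b[0], b[1], pick_max=True)
    high, high_inc = _tighter(a[2], a[3], b[2], b[3], pick_max=False)
    if low is None or high is None:
        return True
    if low < high:
        return True
    return low == high and low_inc and high_inc
```

With this sketch, a check in the spirit of `compare_version_rules(req.specs, [(">=", "1.11.0")])` from the PyTorch hunk becomes `rules_overlap(req_specs, [(">=", "1.11.0")])`, where `req_specs` is a list of `(operator, version)` tuples.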


@@ -16,6 +16,7 @@ from typing import Union, Text, Sequence, Any, TypeVar, Callable
import psutil
from furl import furl
from future.builtins import super
from pathlib2 import Path
import six
@@ -25,7 +26,7 @@ from clearml_agent.helper.base import bash_c, is_windows_platform, select_for_pl
PathLike = Union[Text, Path]
def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False, raise_error=False):
def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False):
try:
output = (
subprocess.check_output(
@@ -37,16 +38,10 @@ def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False, rai
.strip()
)
except subprocess.CalledProcessError:
if raise_error:
raise
output = None
return output if not strip or not output else output.strip()
def stringify_bash_output(value):
return '' if not value else (value if isinstance(value, str) else value.decode('utf-8'))
def terminate_process(pid, timeout=10., ignore_zombie=True, include_children=False):
# noinspection PyBroadException
try:


@@ -1,11 +1,7 @@
import abc
import os
import re
import shutil
import stat
import subprocess
import sys
import tempfile
from distutils.spawn import find_executable
from hashlib import md5
from os import environ
@@ -27,7 +23,7 @@ from clearml_agent.helper.base import (
rm_tree,
ExecutionInfo,
normalize_path,
create_file_if_not_exists, safe_remove_file,
create_file_if_not_exists,
)
from clearml_agent.helper.os.locks import FileLock
from clearml_agent.helper.process import DEVNULL, Argv, PathLike, COMMAND_SUCCESS
@@ -112,7 +108,7 @@ class VCS(object):
)
self.url = url
self.location = Text(location)
self._revision = revision
self.revision = revision
self.log = self.session.get_logger(__name__)
@property
@@ -122,13 +118,6 @@ class VCS(object):
"""
return self.add_auth(self.session.config, self.url)
@property
def url_without_auth(self):
"""
Return URL without configured user/password
"""
return self.add_auth(self.session.config, self.url, reset_auth=True)
@abc.abstractmethod
def executable_name(self):
"""
@@ -360,9 +349,7 @@ class VCS(object):
If not in debug mode, filter VCS password from output.
"""
self._set_ssh_url()
# if we are on linux no need for the full auth url because we use GIT_ASKPASS
url = self.url_without_auth if self._use_ask_pass else self.url_with_auth
clone_command = ("clone", url, self.location) + self.clone_flags
clone_command = ("clone", self.url_with_auth, self.location) + self.clone_flags
# clone all branches regardless of when we want to later checkout
# if branch:
# clone_command += ("-b", branch)
@@ -370,41 +357,40 @@ class VCS(object):
self.call(*clone_command)
return
def normalize_output(result):
"""
Returns result string without user's password.
NOTE: ``self.get_stderr``'s result might or might not have the same type as ``e.output`` in case of error.
"""
string_type = (
ensure_text
if isinstance(result, six.text_type)
else ensure_binary
)
return result.replace(
string_type(self.url),
string_type(furl(self.url).remove(password=True).tostr()),
)
def print_output(output):
print(ensure_text(output))
try:
self._print_output(self._normalize_output(self.get_stderr(*clone_command)))
print_output(normalize_output(self.get_stderr(*clone_command)))
except subprocess.CalledProcessError as e:
# In Python 3, subprocess.CalledProcessError has a `stderr` attribute,
# but since stderr is redirect to `subprocess.PIPE` it will appear in the usual `output` attribute
if e.output:
e.output = self._normalize_output(e.output)
self._print_output(e.output)
e.output = normalize_output(e.output)
print_output(e.output)
raise
def _normalize_output(self, result):
"""
Returns result string without user's password.
NOTE: ``self.get_stderr``'s result might or might not have the same type as ``e.output`` in case of error.
"""
string_type = (
ensure_text
if isinstance(result, six.text_type)
else ensure_binary
)
return result.replace(
string_type(self.url),
string_type(furl(self.url).remove(password=True).tostr()),
)
@staticmethod
def _print_output(output):
print(ensure_text(output))
def checkout(self):
# type: () -> None
"""
Checkout repository at specified revision
"""
self.call("checkout", self._revision, *self.checkout_flags, cwd=self.location)
self.call("checkout", self.revision, *self.checkout_flags, cwd=self.location)
@abc.abstractmethod
def pull(self):
@@ -487,18 +473,16 @@ class VCS(object):
return Argv(self.executable_name, *argv)
@classmethod
def add_auth(cls, config, url, reset_auth=False):
def add_auth(cls, config, url):
"""
Add username and password to URL if missing from URL and present in config.
Does not modify ssh URLs.
:param reset_auth: If true remove the user/pass from the URL (default False)
"""
try:
parsed_url = furl(url)
except ValueError:
return url
if parsed_url.scheme in ["", "ssh"] or (parsed_url.scheme or '').startswith("git"):
if parsed_url.scheme in ["", "ssh"] or parsed_url.scheme.startswith("git"):
return parsed_url.url
config_user = ENV_AGENT_GIT_USER.get() or config.get("agent.{}_user".format(cls.executable_name), None)
config_pass = ENV_AGENT_GIT_PASS.get() or config.get("agent.{}_pass".format(cls.executable_name), None)
@@ -509,10 +493,7 @@ class VCS(object):
and config_pass
and (not config_domain or config_domain.lower() == parsed_url.host)
):
if reset_auth:
parsed_url.set(username=None, password=None)
else:
parsed_url.set(username=config_user, password=config_pass)
parsed_url.set(username=config_user, password=config_pass)
return parsed_url.url
@abc.abstractmethod
@@ -538,7 +519,7 @@ class VCS(object):
class Git(VCS):
executable_name = "git"
main_branch = ("master", "main")
main_branch = "master"
clone_flags = ("--quiet", "--recursive")
checkout_flags = ("--force",)
COMMAND_ENV = {
@@ -548,22 +529,9 @@ class Git(VCS):
"GIT_SSH_COMMAND": "ssh -oBatchMode=yes",
}
def __init__(self, *args, **kwargs):
super(Git, self).__init__(*args, **kwargs)
self._use_ask_pass = False if not self.session.config.get('agent.enable_git_ask_pass', None) \
else sys.platform == "linux"
try:
self.call("config", "--global", "--replace-all", "safe.directory", "*", cwd=self.location)
except: # noqa
pass
@staticmethod
def remote_branch_name(branch):
return [
"origin/{}".format(b) for b in ([branch] if isinstance(branch, str) else branch)
]
return "origin/{}".format(branch)
def executable_not_found_error_help(self):
return 'Cannot find "{}" executable. {}'.format(
@@ -581,79 +549,11 @@ class Git(VCS):
def pull(self):
self.call("fetch", "--all", "--recurse-submodules", cwd=self.location)
def _git_pass_auth_wrapper(self, func, *args, **kwargs):
try:
url_with_auth = furl(self.url_with_auth)
password = url_with_auth.password if url_with_auth else None
username = url_with_auth.username if url_with_auth else None
except: # noqa
password = None
username = None
# if this is not linux or we do not have a password, just run as is
if not self._use_ask_pass or not password or not username:
return func(*args, **kwargs)
# create the password file
fp, pass_file = tempfile.mkstemp(prefix='clearml_git_', suffix='.sh')
os.close(fp)
with open(pass_file, 'wt') as f:
# get first letter only (username / password are the argument options)
# then echo the correct information
f.writelines([
'#!/bin/bash\n',
'c="$1"\n',
'c="${c%"${c#?}"}"\n',
'if [ "$c" == "u" ] || [ "$c" == "U" ]; then echo "{}"; else echo "{}"; fi\n'.format(
username.replace('"', '\\"'), password.replace('"', '\\"')
)
])
# mark executable
st = os.stat(pass_file)
os.chmod(pass_file, st.st_mode | stat.S_IEXEC)
# let GIT use it
self.COMMAND_ENV["GIT_ASKPASS"] = pass_file
# call git command
try:
ret = func(*args, **kwargs)
finally:
# delete temp password file
self.COMMAND_ENV.pop("GIT_ASKPASS", None)
safe_remove_file(pass_file)
return ret
def get_stderr(self, *argv, **kwargs):
"""
Wrapper with git password authentication
"""
return self._git_pass_auth_wrapper(super(Git, self).get_stderr, *argv, **kwargs)
def call_with_stdin(self, *argv, **kwargs):
"""
Wrapper with git password authentication
"""
return self._git_pass_auth_wrapper(super(Git, self).call_with_stdin, *argv, **kwargs)
def call(self, *argv, **kwargs):
"""
Wrapper with git password authentication
"""
return self._git_pass_auth_wrapper(super(Git, self).call, *argv, **kwargs)
def checkout(self): # type: () -> None
"""
Checkout repository at specified revision
"""
revisions = [self._revision] if isinstance(self._revision, str) else self._revision
for i, revision in enumerate(revisions):
try:
self.call("checkout", revision, *self.checkout_flags, cwd=self.location)
break
except subprocess.CalledProcessError:
if i == len(revisions) - 1:
raise
self.call("checkout", self.revision, *self.checkout_flags, cwd=self.location)
try:
self.call("submodule", "update", "--recursive", cwd=self.location)
except: # noqa
@@ -693,7 +593,7 @@ class Hg(VCS):
"pull",
self.url_with_auth,
cwd=self.location,
*(("-r", self._revision) if self._revision else ())
*(("-r", self.revision) if self.revision else ())
)
info_commands = dict(
@@ -763,9 +663,7 @@ def clone_repository_cached(session, execution, destination):
vcs.pull()
rm_tree(destination)
shutil.copytree(Text(cached_repo_path), Text(clone_folder),
symlinks=select_for_platform(linux=True, windows=False),
ignore_dangling_symlinks=True)
shutil.copytree(Text(cached_repo_path), Text(clone_folder))
if not clone_folder.is_dir():
raise CommandFailedError(
"copying of repository failed: from {} to {}".format(
@@ -773,9 +671,9 @@ def clone_repository_cached(session, execution, destination):
)
)
# checkout in the newly copy destination
vcs.location = Text(clone_folder)
vcs.checkout()
# checkout in the newly copy destination
vcs.location = Text(clone_folder)
vcs.checkout()
repo_info = vcs.get_repository_copy_info(clone_folder)
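
The password filtering done by `normalize_output` / `_normalize_output` in the clone hunk can be illustrated with the standard library alone. The diffed code uses `furl(...).remove(password=True)`; the sketch below swaps in `urllib.parse` so it has no third-party dependency, handles text only (the original also handles bytes), and uses hypothetical helper names `strip_password` and `redact`.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_password(url):
    """Return url with the password removed, keeping the username if present."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port is not None:
        host = "{}:{}".format(host, parts.port)
    netloc = "{}@{}".format(parts.username, host) if parts.username else host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def redact(output, url):
    """Replace every occurrence of the credentialed url in command output."""
    return output.replace(url, strip_password(url))
```

For example, `redact(stderr_text, "https://user:secret@github.com/org/repo.git")` leaves `https://user@github.com/org/repo.git` in the printed message.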


@@ -82,7 +82,7 @@ class ResourceMonitor(object):
if not worker_tags and ENV_WORKER_TAGS.get():
worker_tags = shlex.split(ENV_WORKER_TAGS.get())
self._worker_tags = worker_tags
if Session.get_nvidia_visible_env() == 'none':
if os.environ.get('NVIDIA_VISIBLE_DEVICES') == 'none':
# NVIDIA_VISIBLE_DEVICES set to none, marks cpu_only flag
# active_gpus == False means no GPU reporting
self._active_gpus = False
@@ -92,10 +92,10 @@ class ResourceMonitor(object):
# None means no filtering, report all gpus
self._active_gpus = None
try:
active_gpus = Session.get_nvidia_visible_env()
# None means no filtering, report all gpus
if active_gpus and active_gpus != "all":
self._active_gpus = [g.strip() for g in str(active_gpus).split(',')]
active_gpus = os.environ.get('NVIDIA_VISIBLE_DEVICES', '') or \
os.environ.get('CUDA_VISIBLE_DEVICES', '')
if active_gpus:
self._active_gpus = [int(g.strip()) for g in active_gpus.split(',')]
except Exception:
pass
@@ -263,7 +263,7 @@ class ResourceMonitor(object):
gpu_stat = self._gpustat.new_query()
for i, g in enumerate(gpu_stat.gpus):
# only monitor the active gpu's, if none were selected, monitor everything
if self._active_gpus and str(i) not in self._active_gpus:
if self._active_gpus and i not in self._active_gpus:
continue
stats["gpu_temperature_{:d}".format(i)] = g["temperature.gpu"]
stats["gpu_utilization_{:d}".format(i)] = g["utilization.gpu"]
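
The three states that `_active_gpus` takes in these hunks (`False` for no GPU reporting, `None` for no filtering, or a list of selected devices) can be summarised in a small sketch. The helper names `parse_visible_gpus` and `is_monitored` are hypothetical, and entries are kept as strings, matching the `str(i) not in self._active_gpus` side of the diff rather than the `int(...)` side.

```python
def parse_visible_gpus(value):
    """Map a NVIDIA_VISIBLE_DEVICES-style string to an _active_gpus-like value.

    Returns False for 'none' (CPU only), None for empty/'all' (report every GPU),
    otherwise a list of device identifiers as strings.
    """
    text = "" if value is None else str(value).strip()
    if text.lower() == "none":
        return False
    if not text or text.lower() == "all":
        return None
    return [g.strip() for g in text.split(",") if g.strip()]


def is_monitored(index, active_gpus):
    """Decide whether the GPU at position index should be reported."""
    if active_gpus is False:
        return False
    return not active_gpus or str(index) in active_gpus
```

A caller would typically pass `os.environ.get("NVIDIA_VISIBLE_DEVICES")` as `value`.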


@@ -22,7 +22,7 @@ WORKER_ARGS = {
'help': 'git username for repository access',
},
'--git-pass': {
'help': 'git password (personal access tokens) for repository access',
'help': 'git password for repository access',
},
'--log-level': {
'help': 'SDK log level',
@@ -99,10 +99,8 @@ DAEMON_ARGS = dict({
'aliases': ['-d'],
},
'--stop': {
'help': 'Stop the running agent (based on the same set of arguments). '
'Optional: provide a list of specific local worker IDs to stop',
'nargs': '*',
'default': False,
'help': 'Stop the running agent (based on the same set of arguments)',
'action': 'store_true',
},
'--dynamic-gpus': {
'help': 'Allow to dynamically allocate gpus based on queue properties, '
@@ -167,7 +165,7 @@ COMMANDS = {
},
'--docker': {
'help': 'Run execution task inside a docker (v19.03 and above). Optional args <image> <arguments> or '
'specify default docker image in agent.default_docker.image / agent.default_docker.arguments '
'specify default docker image in agent.default_docker.image / agent.default_docker.arguments'
'use --gpus/--cpu-only (or set NVIDIA_VISIBLE_DEVICES) to limit gpu visibility for docker',
'nargs': '*',
'default': False,
@@ -201,18 +199,11 @@ COMMANDS = {
},
'--docker': {
'help': 'Build the experiment inside a docker (v19.03 and above). Optional args <image> <arguments> or '
'specify default docker image in agent.default_docker.image / agent.default_docker.arguments '
'specify default docker image in agent.default_docker.image / agent.default_docker.arguments'
'use --gpus/--cpu-only (or set NVIDIA_VISIBLE_DEVICES) to limit gpu visibility for docker',
'nargs': '*',
'default': False,
},
'--force-docker': {
'help': 'Force using the agent-specified docker image (either explicitly in the --docker argument or '
'using the agent\'s default docker image). If provided, the agent will not use any docker '
'container information stored on the task itself (default False)',
'default': False,
'action': 'store_true',
},
'--python-version': {
'help': 'Virtual environment python version to use',
},
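The two `--stop` definitions in this hunk behave differently once parsed. A minimal argparse sketch of the difference, using a hypothetical stand-alone parser (`build_parser` and the worker IDs are illustrative, not part of clearml-agent):

```python
from argparse import ArgumentParser


def build_parser(list_style):
    # Hypothetical parser holding only the --stop flag, for illustration.
    parser = ArgumentParser()
    if list_style:
        # Variant accepting optional worker IDs: nargs='*' with default=False
        parser.add_argument("--stop", nargs="*", default=False)
    else:
        # Plain boolean variant
        parser.add_argument("--stop", action="store_true")
    return parser


list_parser = build_parser(list_style=True)
flag_parser = build_parser(list_style=False)

not_given = list_parser.parse_args([]).stop          # flag absent
bare = list_parser.parse_args(["--stop"]).stop       # flag without IDs
with_ids = list_parser.parse_args(["--stop", "worker:0", "worker:1"]).stop
flag = flag_parser.parse_args(["--stop"]).stop       # boolean variant
```

With the list-style variant a bare `--stop` yields an empty list, which is falsy, so calling code has to compare against `False` explicitly rather than test truthiness.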

@@ -10,8 +10,8 @@ from typing import Any, Callable
import attr
from pathlib2 import Path
from pyhocon import ConfigFactory, HOCONConverter, ConfigTree
from clearml_agent.external.pyhocon import ConfigFactory, HOCONConverter, ConfigTree
from clearml_agent.backend_api.session import Session as _Session, Request
from clearml_agent.backend_api.session.client import APIClient
from clearml_agent.backend_config.defs import LOCAL_CONFIG_FILE_OVERRIDE_VAR, LOCAL_CONFIG_FILES
@@ -19,7 +19,6 @@ from clearml_agent.definitions import ENVIRONMENT_CONFIG, ENV_TASK_EXECUTE_AS_US
from clearml_agent.errors import APIError
from clearml_agent.helper.base import HOCONEncoder
from clearml_agent.helper.process import Argv
from clearml_agent.helper.docker_args import DockerArgsSanitizer
from .version import __version__
POETRY = "poetry"
@@ -77,7 +76,7 @@ class Session(_Session):
cpu_only = kwargs.get('cpu_only')
if cpu_only:
Session.set_nvidia_visible_env('none')
os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = 'none'
if kwargs.get('gpus') and not os.environ.get('KUBERNETES_SERVICE_HOST') \
and not os.environ.get('KUBERNETES_PORT'):
@@ -86,7 +85,7 @@ class Session(_Session):
os.environ.pop('CUDA_VISIBLE_DEVICES', None)
os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
else:
Session.set_nvidia_visible_env(kwargs.get('gpus'))
os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
if kwargs.get('only_load_config'):
from clearml_agent.backend_api.config import load
@@ -106,7 +105,7 @@ class Session(_Session):
if os.path.exists(os.path.expanduser(os.path.expandvars(f))):
self._config_file = f
break
self._api_client = None
self.api_client = APIClient(session=self, api_version="2.5")
# HACK make sure we have python version to execute,
# if nothing was specified, use the one that runs us
def_python = ConfigValue(self.config, "agent.default_python")
@@ -133,7 +132,7 @@ class Session(_Session):
# override with environment variables
# cuda_version & cudnn_version are overridden with os.environ here, and normalized in the next section
for config_key, env_config in ENVIRONMENT_CONFIG.items():
# check if the property is of a list:
# check if the propery is of a list:
if config_key.endswith('.0'):
if all(not i.get() for i in env_config.values()):
continue
@@ -167,16 +166,6 @@ class Session(_Session):
if not kwargs.get('only_load_config'):
self.create_cache_folders()
@property
def api_client(self):
if self._api_client is None:
self._api_client = APIClient(session=self, api_version="2.5")
return self._api_client
@api_client.setter
def api_client(self, value):
self._api_client = value
@staticmethod
def get_logger(name):
logger = logging.getLogger(name)
@@ -240,38 +229,26 @@ class Session(_Session):
except:
pass
def print_configuration(
self,
remove_secret_keys=("secret", "pass", "token", "account_key", "contents"),
skip_value_keys=("environment", ),
docker_args_sanitize_keys=("extra_docker_arguments", ),
):
def print_configuration(self, remove_secret_keys=("secret", "pass", "token", "account_key")):
# remove all the secrets from the print
def recursive_remove_secrets(dictionary, secret_keys=(), empty_keys=()):
def recursive_remove_secrets(dictionary, secret_keys=()):
for k in list(dictionary):
for s in secret_keys:
if s in k:
dictionary.pop(k)
break
for s in empty_keys:
if s == k:
dictionary[k] = {key: '****' for key in dictionary[k]} \
if isinstance(dictionary[k], dict) else '****'
break
if isinstance(dictionary.get(k, None), dict):
recursive_remove_secrets(dictionary[k], secret_keys=secret_keys, empty_keys=empty_keys)
recursive_remove_secrets(dictionary[k], secret_keys=secret_keys)
elif isinstance(dictionary.get(k, None), (list, tuple)):
if k in (docker_args_sanitize_keys or []):
dictionary[k] = DockerArgsSanitizer.sanitize_docker_command(self, dictionary[k])
for item in dictionary[k]:
if isinstance(item, dict):
recursive_remove_secrets(item, secret_keys=secret_keys, empty_keys=empty_keys)
recursive_remove_secrets(item, secret_keys=secret_keys)
config = deepcopy(self.config.to_dict())
# remove the env variable, it's not important
config.pop('env', None)
if remove_secret_keys or skip_value_keys or docker_args_sanitize_keys:
recursive_remove_secrets(config, secret_keys=remove_secret_keys, empty_keys=skip_value_keys)
if remove_secret_keys:
recursive_remove_secrets(config, secret_keys=remove_secret_keys)
# remove logging.loggers.urllib3.level from the print
try:
config['logging']['loggers']['urllib3'].pop('level', None)
@@ -302,7 +279,7 @@ class Session(_Session):
def get(self, service, action, version=None, headers=None,
data=None, json=None, async_enable=False, **kwargs):
return self._manual_request(service=service, action=action,
version=version, method=Request.def_method, headers=headers,
version=version, method="get", headers=headers,
data=data, async_enable=async_enable,
json=json or kwargs)
@@ -313,7 +290,7 @@ class Session(_Session):
data=data, async_enable=async_enable,
json=json or kwargs)
def _manual_request(self, service, action, version=None, method=Request.def_method, headers=None,
def _manual_request(self, service, action, version=None, method="get", headers=None,
data=None, json=None, async_enable=False, **kwargs):
res = self.send_request(service=service, action=action,
@@ -341,23 +318,6 @@ class Session(_Session):
def command(self, *args):
return Argv(*args, log=self.get_logger(Argv.__module__))
@staticmethod
def set_nvidia_visible_env(gpus):
if not gpus:
gpus = ""
visible_env = gpus.replace(".", ":") if isinstance(gpus, str) else \
','.join(str(g).replace(".", ":") for g in gpus)
os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = visible_env
@staticmethod
def get_nvidia_visible_env():
visible_env = os.environ.get('NVIDIA_VISIBLE_DEVICES') or os.environ.get('CUDA_VISIBLE_DEVICES')
if visible_env is None:
return None
visible_env = str(visible_env).replace(":", ".")
return visible_env
@attr.s
class TrainsAgentLogger(object):
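The simpler `recursive_remove_secrets` variant in the `print_configuration` hunk above can be exercised in isolation. The sketch below is a stand-alone approximation (the function name `remove_secrets` and the sample configuration are made up for illustration); it drops any key whose name contains one of the secret substrings and recurses into nested dicts and lists of dicts.

```python
from copy import deepcopy


def remove_secrets(dictionary, secret_keys=("secret", "pass", "token", "account_key")):
    # Drop every key whose name contains one of the secret substrings,
    # then recurse into nested dicts and into dicts stored inside lists.
    for k in list(dictionary):
        if any(s in k for s in secret_keys):
            dictionary.pop(k)
            continue
        value = dictionary[k]
        if isinstance(value, dict):
            remove_secrets(value, secret_keys)
        elif isinstance(value, (list, tuple)):
            for item in value:
                if isinstance(item, dict):
                    remove_secrets(item, secret_keys)
    return dictionary


# Made-up configuration used only to demonstrate the filtering
config = {
    "api": {"credentials": {"access_key": "A", "secret_key": "B"}},
    "agent": {"git_user": "me", "git_pass": "hunter2"},
    "sdk": {"aws": {"s3": {"credentials": [{"bucket": "b", "secret": "s"}]}}},
}
printable = remove_secrets(deepcopy(config))
```

Working on a deep copy, as `print_configuration` does, keeps the live configuration intact while the printed copy loses its secrets.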

@@ -1 +1 @@
__version__ = '1.5.0'
__version__ = '1.1.2'

@@ -57,8 +57,8 @@ agent {
# supported options: pip, conda, poetry
type: pip,
# specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
pip_version: "<21",
# specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
pip_version: "<20.2",
# virtual environment inherits packages from system
system_site_packages: false,
@@ -171,7 +171,7 @@ agent {
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"
image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]

@@ -1,75 +0,0 @@
ARG TAG=3.7.12-alpine3.15
FROM python:${TAG} as build
RUN apk add --no-cache \
gcc \
musl-dev \
libffi-dev
RUN python3 \
-m pip \
install \
--prefix=/install \
--no-cache-dir \
-U \
clearml-agent \
"cryptography>=2.9"
FROM python:${TAG} as target
WORKDIR /app
ARG KUBECTL_VERSION=1.22.4
# Not sure about these ENV vars
# ENV LC_ALL=en_US.UTF-8
# ENV LANG=en_US.UTF-8
# ENV LANGUAGE=en_US.UTF-8
# ENV PYTHONIOENCODING=UTF-8
COPY --from=build /install /usr/local
ADD https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl /usr/bin/
RUN chmod +x /usr/bin/kubectl
RUN apk add --no-cache \
bash
COPY k8s_glue_example.py .
# AWS CLI
# https://github.com/kyleknap/aws-cli/blob/source-proposal/proposals/source-install.md#alpine-linux
# https://github.com/aws/aws-cli/issues/4685
# https://github.com/aws/aws-cli/pull/6352
# https://github.com/GoogleCloudPlatform/cloud-sdk-docker/blob/master/alpine/Dockerfile
FROM target as gcp
ARG CLOUD_SDK_VERSION=371.0.0
ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
ENV PATH /google-cloud-sdk/bin:$PATH
WORKDIR /
RUN apk --no-cache add \
curl \
python3 \
py3-crcmod \
py3-openssl \
bash \
libc6-compat \
openssh-client \
git \
gnupg \
&& curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
tar xzf google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
rm google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
gcloud config set core/disable_usage_reporting true && \
gcloud config set component_manager/disable_update_check true && \
gcloud config set metrics/environment github_docker_image && \
gcloud --version
WORKDIR /app

@@ -1,82 +0,0 @@
ARG TAG=3.7.12-slim-bullseye
FROM python:${TAG} as target
ARG KUBECTL_VERSION=1.22.4
WORKDIR /app
RUN python3 \
-m pip \
install \
--no-cache-dir \
-U \
clearml-agent \
"cryptography>=2.9"
# Not sure about these ENV vars
# ENV LC_ALL=en_US.UTF-8
# ENV LANG=en_US.UTF-8
# ENV LANGUAGE=en_US.UTF-8
# ENV PYTHONIOENCODING=UTF-8
ADD https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl /usr/bin/
RUN chmod +x /usr/bin/kubectl
COPY k8s_glue_example.py .
CMD ["python3", "k8s_glue_example.py"]
FROM target as aws
# https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
# https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html
RUN apt-get update -qqy && \
apt-get install -qqy \
unzip && \
rm -rf /var/lib/apt/lists/*
ADD https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip awscliv2.zip
ADD https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/aws-iam-authenticator /usr/local/bin/aws-iam-authenticator
RUN unzip awscliv2.zip && \
./aws/install && \
rm -r awscliv2.zip aws/ && \
chmod +x /usr/local/bin/aws-iam-authenticator && \
aws --version && \
aws-iam-authenticator version
# https://github.com/GoogleCloudPlatform/cloud-sdk-docker/blob/master/debian_slim/Dockerfile
FROM target as gcp
ARG CLOUD_SDK_VERSION=371.0.0
ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
ENV PATH "$PATH:/opt/google-cloud-sdk/bin/"
ARG INSTALL_COMPONENTS
RUN mkdir -p /usr/share/man/man1/
RUN apt-get update -qqy && \
apt-get install -qqy \
curl \
gcc \
python3-dev \
python3-pip \
apt-transport-https \
lsb-release \
openssh-client \
git \
gnupg && \
rm -rf /var/lib/apt/lists/* && \
pip3 install -U crcmod && \
export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
apt-get update && apt-get install -y google-cloud-sdk=${CLOUD_SDK_VERSION}-0 $INSTALL_COMPONENTS && \
gcloud config set core/disable_usage_reporting true && \
gcloud config set component_manager/disable_update_check true && \
gcloud config set metrics/environment github_docker_image && \
gcloud --version

@@ -1,94 +0,0 @@
"""
This example assumes you have preconfigured services with selectors in the form of
"ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022.
The K8sIntegration component will label each pod accordingly.
"""
from argparse import ArgumentParser
from clearml_agent.glue.k8s import K8sIntegration
def parse_args():
parser = ArgumentParser()
group = parser.add_mutually_exclusive_group()
parser.add_argument(
"--queue", type=str, help="Queue to pull tasks from"
)
group.add_argument(
"--ports-mode", action='store_true', default=False,
help="Ports-Mode will add a label to the pod which can be used as service, in order to expose ports. "
"Should not be used with max-pods"
)
parser.add_argument(
"--num-of-services", type=int, default=20,
help="Specify the number of k8s services to be used. Use only with ports-mode."
)
parser.add_argument(
"--base-port", type=int,
help="Used in conjunction with ports-mode, specifies the base port exposed by the services. "
"For pod #X, the port will be <base-port>+X. Note that pod number is calculated based on base-pod-num, "
"e.g. if base-port=20000 and base-pod-num=3, the port for the first pod will be 20003"
)
parser.add_argument(
"--base-pod-num", type=int, default=1,
help="Used in conjunction with ports-mode and base-port, specifies the base pod number to be used by the "
"service (default: %(default)s)"
)
parser.add_argument(
"--gateway-address", type=str, default=None,
help="Used in conjunction with ports-mode, specify the external address of the k8s ingress / ELB"
)
parser.add_argument(
"--pod-clearml-conf", type=str,
help="Configuration file to be used by the pod itself (if not provided, current configuration is used)"
)
parser.add_argument(
"--overrides-yaml", type=str,
help="YAML file containing pod overrides to be used when launching a new pod"
)
parser.add_argument(
"--template-yaml", type=str,
help="YAML file containing pod template. If provided pod will be scheduled with kubectl apply "
"and overrides are ignored, otherwise it will be scheduled with kubectl run"
)
parser.add_argument(
"--ssh-server-port", type=int, default=0,
help="If non-zero, every pod will also start an SSH server on the selected port (default: zero, not active)"
)
parser.add_argument(
"--namespace", type=str,
help="Specify the namespace in which pods will be created (default: %(default)s)", default="clearml"
)
group.add_argument(
"--max-pods", type=int,
help="Limit the maximum number of pods that this service can run at the same time. "
"Should not be used with ports-mode"
)
return parser.parse_args()
def main():
args = parse_args()
user_props_cb = None
if args.ports_mode and args.base_port:
def k8s_user_props_cb(pod_number=0):
user_prop = {"k8s-pod-port": args.base_port + pod_number}
if args.gateway_address:
user_prop["k8s-gateway-address"] = args.gateway_address
return user_prop
user_props_cb = k8s_user_props_cb
k8s = K8sIntegration(
ports_mode=args.ports_mode, num_of_services=args.num_of_services, base_pod_num=args.base_pod_num,
user_props_cb=user_props_cb, overrides_yaml=args.overrides_yaml, clearml_conf_file=args.pod_clearml_conf,
template_yaml=args.template_yaml, extra_bash_init_script=K8sIntegration.get_ssh_server_bash(
ssh_port_number=args.ssh_server_port) if args.ssh_server_port else None,
namespace=args.namespace, max_pods_limit=args.max_pods or None,
)
k8s.k8s_daemon(args.queue)
if __name__ == "__main__":
main()
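The port arithmetic described by the `--base-port` help text and implemented in `k8s_user_props_cb` can be checked in isolation. A minimal sketch, assuming hypothetical helper names (`pod_port`, `user_props`) and a placeholder gateway address:

```python
def pod_port(base_port, pod_number):
    # Port exposed for a given pod: <base-port> + pod number
    return base_port + pod_number


def user_props(base_port, pod_number, gateway_address=None):
    # Mirrors the user properties built by k8s_user_props_cb in the example
    props = {"k8s-pod-port": pod_port(base_port, pod_number)}
    if gateway_address:
        props["k8s-gateway-address"] = gateway_address
    return props


# With base-port=20000 and base-pod-num=3, the first pod is pod number 3
first = user_props(base_port=20000, pod_number=3)
# Placeholder gateway address, for illustration only
exposed = user_props(base_port=20000, pod_number=4, gateway_address="gateway.example.com")
```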

@@ -4,7 +4,7 @@ api {
web_server: https://demoapp.demo.clear.ml
files_server: https://demofiles.demo.clear.ml
# Credentials are generated in the webapp, https://app.clear.ml/settings/workspace-configuration
# Credentials are generated in the webapp, https://demoapp.demo.clear.ml/profile
# Overridden with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
credentials {"access_key": "EGRTCO8JMSIGI6S39GTP43NFWXDQOW", "secret_key": "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"}
@@ -13,27 +13,13 @@ api {
}
agent {
# unique name of this worker, if None, created based on hostname:process_id
# Override with os environment: CLEARML_WORKER_ID
# worker_id: "clearml-agent-machine1:gpu0"
worker_id: ""
# worker name, replaces the hostname when creating a unique name for this worker
# Override with os environment: CLEARML_WORKER_NAME
# worker_name: "clearml-agent-machine1"
worker_name: ""
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
# **Notice**: GitHub personal token is equivalent to password, you can put it directly into `git_pass`
# To learn how to generate git token GitHub/Bitbucket/GitLab:
# https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
# https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
# https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
# git_user: ""
# git_pass: ""
git_user=""
git_pass=""
# Limit credentials to a single domain, for example: github.com,
# all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
# git_host: ""
git_host=""
# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
force_git_ssh_protocol: false
@@ -42,6 +28,16 @@ agent {
# Force a specific SSH username when converting http to ssh links (the default username is 'git')
# force_git_ssh_user: git
# unique name of this worker, if None, created based on hostname:process_id
# Overridden with os environment: CLEARML_WORKER_NAME
# worker_id: "clearml-agent-machine1:gpu0"
worker_id: ""
# worker name, replaces the hostname when creating a unique name for this worker
# Overridden with os environment: CLEARML_WORKER_ID
# worker_name: "clearml-agent-machine1"
worker_name: ""
# Set the python version to use when creating the virtual environment and launching the experiment
# Example values: "/usr/bin/python3" or "/usr/local/bin/python3.6"
# The default is the python executing the clearml_agent
@@ -50,22 +46,6 @@ agent {
# specific python version and the system supports multiple python the agent will use the requested python version)
# ignore_requested_python_version: true
# Force the root folder of the git repository (instead of the working directory) into the PYTHONPATH
# default false, only the working directory will be added to the PYTHONPATH
# force_git_root_python_path: false
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
# it solves passing user/token to git submodules.
# this is a safer way to ensure multiple users using the same repository will
# not accidentally leak credentials
# Only supported on Linux systems, it will be the default in future releases
# enable_git_ask_pass: false
# in docker mode, if container's entrypoint automatically activated a virtual environment
# use the activated virtual environment and install everything there
# set to False to disable, and always create a new venv inheriting from the system_site_packages
# docker_use_activated_venv: true
# select python package manager:
# currently supported: pip, conda and poetry
# if "pip" or "conda" are used, the agent installs the required packages
@@ -78,10 +58,8 @@ agent {
# supported options: pip, conda, poetry
type: pip,
# specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
# pip_version: "<21"
# specify poetry version to use (examples "<2", "==1.1.1", "", empty string will install the latest version)
# poetry_version: "<2",
# specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
# pip_version: "<20"
# virtual environment inherits packages from system
system_site_packages: false,
@@ -128,7 +106,7 @@ agent {
# minimum required free space to allow for cache entry, disable by passing 0 or negative value
free_space_threshold_gb: 2.0
# unmark to enable virtual environment caching
path: ~/.clearml/venvs-cache
# path: ~/.clearml/venvs-cache
},
# cached git clone folder
@@ -151,12 +129,6 @@ agent {
},
translate_ssh: true,
# set "disable_ssh_mount: true" to disable the automatic mount of ~/.ssh folder into the docker containers
# default is false, automatically mounts ~/.ssh
# Must be set to True if using "clearml-session" with this agent!
# disable_ssh_mount: false
# reload configuration file every daemon execution
reload_config: false,
@@ -183,64 +155,17 @@ agent {
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"
image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host"]
# lookup table rules for default container
# first matched rule will be picked, according to rule order
# enterprise version only
# match_rules: [
# {
# image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
# arguments: "-e define=value"
# match: {
# script{
# # Optional: must match all requirements (not partial)
# requirements: {
# # version selection matching PEP-440
# pip: {
# tensorflow: "~=2.6"
# },
# }
# # Optional: matching based on regular expression, example: "^exact_match$"
# repository: "/my_repository/"
# branch: "main"
# binary: "python3.6"
# }
# # Optional: matching based on regular expression, example: "^exact_match$"
# project: "project/sub_project"
# }
# },
# {
# image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
# arguments: "-e define=value"
# match: {
# # must match all requirements (not partial)
# script{
# requirements: {
# conda: {
# torch: ">=2.6,<2.8"
# }
# }
# # no repository matching required
# repository: ""
# }
# # no container image matching required (allow to replace one requested container with another)
# container: ""
# # no repository matching required
# project: ""
# }
# },
# ]
}
# set the OS environments based on the Task's Environment section before launching the Task process.
enable_task_env: false
# CUDA versions used for Conda setup & solving PyTorch wheel packages
# Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
# it Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
# cuda_version: 10.1
# cudnn_version: 7.6
@@ -254,7 +179,6 @@ agent {
hide_docker_command_env_vars {
enabled: true
extra_keys: []
parse_embedded_urls: true
}
# allow to set internal mount points inside the docker,
@@ -266,7 +190,7 @@ agent {
# pip_cache: "/root/.cache/pip"
# poetry_cache: "/root/.cache/pypoetry"
# vcs_cache: "/root/.clearml/vcs-cache"
# venv_build: "~/.clearml/venvs-builds"
# venv_build: "/root/.clearml/venvs-builds"
# pip_download: "/root/.clearml/pip-download-cache"
# }
@@ -346,11 +270,6 @@ sdk {
key: ""
secret: ""
region: ""
# Or enable credentials chain to let Boto3 pick the right credentials.
# This includes picking credentials from environment variables,
# credential file and IAM role using metadata service.
# Refer to the latest Boto3 docs
use_credentials_chain: false
credentials: [
# specifies key/secret credentials to use when handling s3 urls (read or write)
@@ -366,7 +285,6 @@ sdk {
# secret: "12345678"
# multipart: false
# secure: false
# verify: /path/to/ca/bundle.crt OR false to not verify
# }
]
}
@@ -444,46 +362,42 @@ sdk {
# Apply top-level environment section from configuration into os.environ
apply_environment: true
# Top-level environment section is in the form of:
# environment {
# key: value
# ...
# }
# and is applied to the OS environment as `key=value` for each key/value pair
# Apply top-level files section from configuration into local file system
apply_files: true
# Top-level files section allows auto-generating files at designated paths with a predefined contents
# and target format. Options include:
# contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
# format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
# base64-encoded contents string, otherwise ignored
# path: the target file's path, may include ~ and inplace env vars
# target_format: format used to encode contents before writing into the target file. Supported values are json,
# yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
# overwrite: overwrite the target file in case it exists. Default is true.
#
# Example:
# files {
# myfile1 {
# contents: "The quick brown fox jumped over the lazy dog"
# path: "/tmp/fox.txt"
# }
# myjsonfile {
# contents: {
# some {
# nested {
# value: [1, 2, 3, 4]
# }
# }
# }
# path: "/tmp/test.json"
# target_format: json
# }
# }
}
# Environment section (top-level) is applied to the OS environment as `key=value` for each key/value pair
# * enable/disable with `agent.apply_environment` OR `sdk.apply_environment`
# Example:
#
# environment {
# key_a: value_a
# key_b: value_b
# }
# Files section (top-level) allows auto-generating files at designated paths with
# predefined content and target format.
# * enable/disable with `agent.apply_files` OR `sdk.apply_files`
# Files content options include:
# contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
# format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
# base64-encoded contents string, otherwise ignored
# path: the target file's path, may include ~ and inplace env vars
# target_format: format used to encode contents before writing into the target file. Supported values are json,
# yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
# overwrite: overwrite the target file in case it exists. Default is true.
#
# Example:
# files {
# myfile1 {
# contents: "The quick brown fox jumped over the lazy dog"
# path: "/tmp/fox.txt"
# }
# myjsonfile {
# contents: {
# some {
# nested {
# value: [1, 2, 3, 4]
# }
# }
# }
# path: "/tmp/test.json"
# target_format: json
# }
# }
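To make the `files` options documented above concrete, here is one possible interpretation of a single entry, written as a stand-alone sketch. It is not the agent's implementation: `write_file_entry` is a hypothetical name, only the `base64`, `json`, `bytes` and plain-text cases are covered, and the temporary directory stands in for the `/tmp` paths of the example.

```python
import base64
import json
import os
import tempfile


def write_file_entry(entry):
    """Hypothetical reading of one `files` entry; not the agent's own code."""
    # path may include ~ and environment variables
    path = os.path.expanduser(os.path.expandvars(entry["path"]))
    # overwrite defaults to true
    if os.path.exists(path) and not entry.get("overwrite", True):
        return path
    contents = entry.get("contents")
    if entry.get("format") == "base64":
        contents = base64.b64decode(contents)
    target_format = entry.get("target_format")
    if target_format == "bytes":
        # binary mode
        data = contents if isinstance(contents, bytes) else str(contents).encode("utf-8")
        mode = "wb"
    else:
        # text mode (default), optionally encoded as JSON
        if isinstance(contents, bytes):
            contents = contents.decode("utf-8")
        data = json.dumps(contents) if target_format == "json" else str(contents)
        mode = "w"
    with open(path, mode) as f:
        f.write(data)
    return path


tmp_dir = tempfile.mkdtemp()
fox_path = write_file_entry({
    "contents": "The quick brown fox jumped over the lazy dog",
    "path": os.path.join(tmp_dir, "fox.txt"),
})
json_path = write_file_entry({
    "contents": {"some": {"nested": {"value": [1, 2, 3, 4]}}},
    "path": os.path.join(tmp_dir, "test.json"),
    "target_format": "json",
})
```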

@@ -65,10 +65,6 @@ def parse_args():
help="Limit the maximum number of pods that this service can run at the same time. "
"Should not be used with ports-mode"
)
parser.add_argument(
"--use-owner-token", action="store_true", default=False,
help="Generate and use task owner token for the execution of each task"
)
return parser.parse_args()
@@ -91,7 +87,7 @@ def main():
ssh_port_number=args.ssh_server_port) if args.ssh_server_port else None,
namespace=args.namespace, max_pods_limit=args.max_pods or None,
)
k8s.k8s_daemon(args.queue, use_owner_token=args.use_owner_token)
k8s.k8s_daemon(args.queue)
if __name__ == "__main__":

@@ -1,15 +1,17 @@
attrs>=18.0,<20.4.0
enum34>=0.9,<1.2.0 ; python_version < '3.6'
furl>=2.0.0,<2.2.0
future>=0.16.0,<0.19.0
jsonschema>=2.6.0,<3.3.0
pathlib2>=2.3.0,<2.4.0
psutil>=3.4.2,<5.10.0
psutil>=3.4.2,<5.9.0
pyhocon>=0.3.38,<0.4.0
pyparsing>=2.0.3,<2.5.0
python-dateutil>=2.4.2,<2.9.0
pyjwt>=2.4.0,<2.5.0
PyYAML>=3.12,<6.1
requests>=2.20.0,<2.29.0
pyjwt>=1.6.4,<2.1.0
PyYAML>=3.12,<5.5.0
requests>=2.20.0,<2.26.0
six>=1.13.0,<1.16.0
typing>=3.6.4,<3.8.0 ; python_version < '3.5'
typing>=3.6.4,<3.8.0
urllib3>=1.21.1,<1.27.0
virtualenv>=16,<21

View File

@@ -61,7 +61,6 @@ setup(
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
'License :: OSI Approved :: Apache Software License',
],