Mirror of https://github.com/clearml/clearml-agent (synced 2025-06-26)
Compare commits — 110 commits (`3fe92a92ba` through `249aa006cb`)
README.md (196 changed lines)
@@ -9,14 +9,14 @@ ML-Ops scheduler & orchestration solution supporting Linux, macOS and Windows**

![Supported Python versions](https://img.shields.io/pypi/pyversions/clearml-agent.svg)
![PyPI version](https://img.shields.io/pypi/v/clearml-agent.svg)
[PyPI project](https://pypi.org/project/clearml-agent/)
[Artifact Hub](https://artifacthub.io/packages/search?repo=allegroai)

</div>

---

### ClearML-Agent

#### *Formerly known as Trains Agent*

* Run jobs (experiments) on any local or cloud based resource
* Implement optimized resource utilization policies
@@ -24,23 +24,31 @@ ML-Ops scheduler & orchestration solution supporting Linux, macOS and Windows**

* Launch-and-Forget service containers
* [Cloud autoscaling](https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler)
* [Customizable cleanup](https://clear.ml/docs/latest/docs/guides/services/cleanup_service)
* Advanced [pipeline building and execution](https://clear.ml/docs/latest/docs/guides/frameworks/pytorch/notebooks/table/tabular_training_pipeline)

It is a zero configuration fire-and-forget execution agent, providing a full ML/DL cluster solution.
**Full Automation in 5 steps**

1. ClearML Server [self-hosted](https://github.com/allegroai/clearml-server) or [free tier hosting](https://app.clear.ml)
2. `pip install clearml-agent` ([install](#installing-the-clearml-agent) the ClearML Agent on any GPU machine: on-premises / cloud / ...)
3. Create a [job](https://github.com/allegroai/clearml/docs/clearml-task.md) or Add [ClearML](https://github.com/allegroai/clearml) to your code with just 2 lines (see the snippet below)
4. Change the [parameters](#using-the-clearml-agent) in the UI & schedule for [execution](#using-the-clearml-agent) (or automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))
5. :chart_with_downwards_trend: :chart_with_upwards_trend: :eyes: :beer:
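For reference, the "2 lines" in step 3 typically look like this (the project and task names here are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="my experiment")
```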
"All the Deep/Machine-Learning DevOps your research needs, and then some... Because ain't nobody got time for that"

**Try ClearML now** [Self Hosted](https://github.com/allegroai/clearml-server) or [Free tier Hosting](https://app.clear.ml)

<a href="https://app.clear.ml"><img src="https://github.com/allegroai/clearml-agent/blob/master/docs/screenshots.gif?raw=true" width="100%"></a>

### Simple, Flexible Experiment Orchestration

**The ClearML Agent was built to address the DL/ML R&D DevOps needs:**

* Easily add & remove machines from the cluster
@@ -56,20 +64,23 @@ It is a zero configuration fire-and-forget execution agent, providing a full ML/

*epsilon - Because we are :triangular_ruler: and nothing is really zero work

### Kubernetes Integration (Optional)

We think Kubernetes is awesome, but it should be a choice. We designed `clearml-agent` so you can run bare-metal or inside a pod with any mix that fits your environment.

Find Dockerfiles in the [docker](./docker) dir and a helm Chart in https://github.com/allegroai/clearml-helm-charts

#### Benefits of integrating existing K8s with ClearML-Agent

- ClearML-Agent adds the missing scheduling capabilities to K8s
- Allowing for more flexible automation from code
- A programmatic interface for easier learning curve (and debugging)
- Seamless integration with ML/DL experiment manager
- Web UI for customization, scheduling & prioritization of jobs

**Two K8s integration flavours**

- Spin ClearML-Agent as a long-lasting service pod
  - use [clearml-agent](https://hub.docker.com/r/allegroai/clearml-agent) docker image
  - map docker socket into the pod (soon replaced by [podman](https://github.com/containers/podman))
@@ -77,57 +88,66 @@ Find Dockerfiles in the [docker](./docker) dir and a helm Chart in https://githu

  - benefits: full use of the ClearML scheduling, no need to worry about wrong container images / lost pods etc.
  - downside: Sibling containers
- Kubernetes Glue, map ClearML jobs directly to K8s jobs
  - Run the [clearml-k8s glue](https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py) on a K8s cpu node
  - The clearml-k8s glue pulls jobs from the ClearML job execution queue and prepares a K8s job (based on provided yaml template)
  - Inside the pod itself the clearml-agent will install the job (experiment) environment and spin and monitor the experiment's process
  - benefits: Kubernetes full view of all running jobs in the system
  - downside: No real scheduling (k8s scheduler), no docker image verification (post-mortem only)
### Using the ClearML Agent

**Full scale HPC with a click of a button**

The ClearML Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the job and monitors its progress.

Any 'Draft' experiment can be scheduled for execution by a ClearML agent.

A previously run experiment can be put into 'Draft' state by either of two methods:
* Using the **'Reset'** action from the experiment right-click context menu in the ClearML UI - This will clear any results and artifacts the previous run had created.
* Using the **'Clone'** action from the experiment right-click context menu in the ClearML UI - This will create a new 'Draft' experiment with the same configuration as the original experiment.

An experiment is scheduled for execution using the **'Enqueue'** action from the experiment right-click context menu in the ClearML UI and selecting the execution queue.

See [creating an experiment and enqueuing it for execution](#from-scratch).

Once an experiment is enqueued, it will be picked up and executed by a ClearML agent monitoring this queue.

The ClearML UI Workers & Queues page provides ongoing execution information:
- Workers Tab: Monitor your cluster
  - Review available resources
  - Monitor machines statistics (CPU / GPU / Disk / Network)
- Queues Tab:
  - Control the scheduling order of jobs
  - Cancel or abort job execution
  - Move jobs between execution queues
#### What The ClearML Agent Actually Does

The ClearML Agent executes experiments using the following process (a rough sketch in code follows the list):
- Create a new virtual environment (or launch the selected docker image)
- Clone the code into the virtual-environment (or inside the docker)
- Install python packages based on the package requirements listed for the experiment
  - Special note for PyTorch: The ClearML Agent will automatically select the torch packages based on the CUDA_VERSION environment variable of the machine
- Execute the code, while monitoring the process
- Log all stdout/stderr in the ClearML UI, including the cloning and installation process, for easy debugging
- Monitor the execution and allow you to manually abort the job using the ClearML UI (or, in the unfortunate case of a code crash, catch the error and signal the experiment has failed)
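This sketch is illustrative only — the helper below and its paths are hypothetical, not the agent's actual implementation:

```python
import os
import subprocess
import venv

def run_experiment(repo_url, commit, requirements, entry_point):
    venv.create("task_venv", with_pip=True)                         # fresh virtual environment
    subprocess.check_call(["git", "clone", repo_url, "task_repo"])  # clone the code
    subprocess.check_call(["git", "-C", "task_repo", "checkout", commit])
    pip = os.path.abspath("task_venv/bin/pip")
    python = os.path.abspath("task_venv/bin/python")
    subprocess.check_call([pip, "install"] + list(requirements))    # install requirements
    # run the entry point; the real agent also streams stdout/stderr to the
    # ClearML UI and watches for manual abort requests while the process runs
    proc = subprocess.Popen([python, entry_point], cwd="task_repo")
    return proc.wait() == 0                                         # False -> experiment failed
```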
#### System Design & Flow

<img src="https://github.com/allegroai/clearml-agent/blob/master/docs/clearml_architecture.png" width="100%" alt="clearml-architecture">

#### Installing the ClearML Agent

```bash
pip install clearml-agent
```
@@ -137,6 +157,7 @@ pip install clearml-agent

#### ClearML Agent Usage Examples

Full Interface and capabilities are available with

```bash
clearml-agent --help
clearml-agent daemon --help
```

@@ -148,7 +169,8 @@ clearml-agent daemon --help

```bash
clearml-agent init
```
Note: The ClearML Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default ClearML Agent cache folder is `~/.clearml`

See full details in your configuration file at `~/clearml.conf`
@@ -158,29 +180,36 @@ They are designed to share the same configuration file, see example [here](docs/

#### Running the ClearML Agent

For debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen

```bash
clearml-agent daemon --queue default --foreground
```

For actual service mode, all the stdout will be stored automatically into a temporary file (no need to pipe)
Notice: with `--detached` flag, the *clearml-agent* will be running in the background

```bash
clearml-agent daemon --detached --queue default
```
GPU allocation is controlled via the standard OS environment `NVIDIA_VISIBLE_DEVICES` or `--gpus` flag (or disabled with `--cpu-only`).

If no flag is set, and `NVIDIA_VISIBLE_DEVICES` variable doesn't exist, all GPU's will be allocated for the `clearml-agent` <br>
If `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES` is an empty string (""), no gpu will be allocated for the `clearml-agent`
If `--cpu-only` flag is set, or `NVIDIA_VISIBLE_DEVICES="none"`, no gpu will be allocated for the `clearml-agent`
Example: spin two agents, one per gpu on the same machine:
Notice: with `--detached` flag, the *clearml-agent* will be running in the background

```bash
clearml-agent daemon --detached --gpus 0 --queue default
clearml-agent daemon --detached --gpus 1 --queue default
```

Example: spin two agents, pulling from dedicated `dual_gpu` queue, two gpu's per agent

```bash
clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu
clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu
```
@@ -189,23 +218,29 @@ clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu

##### Starting the ClearML Agent in docker mode

For debug and experimentation, start the ClearML agent in `foreground` mode, where all the output is printed to screen

```bash
clearml-agent daemon --queue default --docker --foreground
```

For actual service mode, all the stdout will be stored automatically into a file (no need to pipe)
Notice: with `--detached` flag, the *clearml-agent* will be running in the background

```bash
clearml-agent daemon --detached --queue default --docker
```
Example: spin two agents, one per gpu on the same machine, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:

```bash
clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
clearml-agent daemon --detached --gpus 1 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
```

Example: spin two agents, pulling from dedicated `dual_gpu` queue, two gpu's per agent, with default nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 docker:

```bash
clearml-agent daemon --detached --gpus 0,1 --queue dual_gpu --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
```
@@ -216,55 +251,61 @@ clearml-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda

Priority Queues are also supported, example use case:

High priority queue: `important_jobs`; low priority queue: `default`

```bash
clearml-agent daemon --queue important_jobs default
```

The **ClearML Agent** will first try to pull jobs from the `important_jobs` queue, only then it will fetch a job from the `default` queue.

Adding queues, managing job order within a queue and moving jobs between queues, is available using the Web UI, see example on our [free server](https://app.clear.ml/workers-and-queues/queues)
##### Stopping the ClearML Agent

To stop a **ClearML Agent** running in the background, run the same command line used to start the agent with `--stop` appended. For example, to stop the first of the above shown same machine, single gpu agents:

```bash
clearml-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --stop
```
### How do I create an experiment on the ClearML Server? <a name="from-scratch"></a>

* Integrate [ClearML](https://github.com/allegroai/clearml) with your code
* Execute the code on your machine (Manually / PyCharm / Jupyter Notebook)
* As your code is running, **ClearML** creates an experiment logging all the necessary execution information:
  - Git repository link and commit ID (or an entire jupyter notebook)
  - Git diff (we’re not saying you never commit and push, but still...)
  - Python packages used by your code (including specific versions used)
  - Hyper-Parameters
  - Input Artifacts

  You now have a 'template' of your experiment with everything required for automated execution

* In the ClearML UI, Right-click on the experiment and select 'clone'. A copy of your experiment will be created.
* You now have a new draft experiment cloned from your original experiment, feel free to edit it
  - Change the Hyper-Parameters
  - Switch to the latest code base of the repository
  - Update package versions
  - Select a specific docker image to run in (see docker execution mode section)
  - Or simply change nothing to run the same experiment again...
* Schedule the newly created experiment for execution: Right-click the experiment and select 'enqueue'
### ClearML-Agent Services Mode <a name="services"></a>

ClearML-Agent Services is a special mode of ClearML-Agent that provides the ability to launch long-lasting jobs that previously had to be executed on local / dedicated machines. It allows a single agent to launch multiple dockers (Tasks) for different use cases. To name a few use cases: auto-scaler service (spinning instances when the need arises and the budget allows), Controllers (implementing pipelines and more sophisticated DevOps logic), Optimizer (such as Hyper-parameter Optimization or sweeping), and Application (such as interactive Bokeh apps for increased data transparency).

ClearML-Agent Services mode will spin **any** task enqueued into the specified queue. Every task launched by ClearML-Agent Services will be registered as a new node in the system, providing tracking and transparency capabilities. Currently clearml-agent in services-mode supports cpu only configuration. ClearML-agent services mode can be launched alongside GPU agents.

```bash
clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
```
@@ -272,22 +313,27 @@ clearml-agent daemon --services-mode --detached --queue services --create-queue

**Note**: It is the user's responsibility to make sure the proper tasks are pushed into the specified queue.
### AutoML and Orchestration Pipelines <a name="automl-pipes"></a>

The ClearML Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the ClearML package.

Sample AutoML & Orchestration examples can be found in the ClearML [example/automation](https://github.com/allegroai/clearml/tree/master/examples/automation) folder.

AutoML examples
- [Toy Keras training experiment](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
  - In order to create an experiment-template in the system, this code must be executed once manually
- [Random Search over the above Keras experiment-template](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)
  - This example will create multiple copies of the Keras experiment-template, with different hyper-parameter combinations

Experiment Pipeline examples
- [First step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)
  - This example will "process data", and once done, will launch a copy of the 'second step' experiment-template
- [Second step experiment](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py)
  - In order to create an experiment-template in the system, this code must be executed once manually

### License
@@ -18,6 +18,8 @@
    # https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
    # git_user: ""
    # git_pass: ""

    # Limit credentials to a single domain, for example: github.com,
    # all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
    # git_host: ""

    # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
@@ -39,6 +41,13 @@
    # default false, only the working directory will be added to the PYTHONPATH
    # force_git_root_python_path: false

    # if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
    # it solves passing user/token to git submodules.
    # this is a safer way to ensure multiple users using the same repository will
    # not accidentally leak credentials
    # Only supported on Linux systems, it will be the default in future releases
    # enable_git_ask_pass: false

    # in docker mode, if container's entrypoint automatically activated a virtual environment
    # use the activated virtual environment and install everything there
    # set to False to disable, and always create a new venv inheriting from the system_site_packages
@@ -56,19 +65,20 @@
        # supported options: pip, conda, poetry
        type: pip,

        # specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
        pip_version: "<20.2",
        # specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
        pip_version: ["<20.2 ; python_version < '3.10'", "<22.3 ; python_version >= '3.10'"],

        # specify poetry version to use (examples "<2", "==1.1.1", "", empty string will install the latest version)
        # poetry_version: "<2",
        # poetry_install_extra_args: ["-v"]

        # virtual environment inherits packages from system
        system_site_packages: false,

        # install with --upgrade
        force_upgrade: false,

        # additional artifact repositories to use when installing python packages
        # extra_index_url: ["https://allegroai.jfrog.io/clearmlai/api/pypi/public/simple"]
        # extra_index_url: ["https://allegroai.jfrog.io/clearml/api/pypi/public/simple"]

        # additional conda channels to use when installing with conda package manager
        conda_channels: ["pytorch", "conda-forge", "defaults", ]
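The bracketed `pip_version` form above pairs each version rule with a PEP 508 environment marker, so the pin can depend on the running Python. A sketch of how such markers evaluate, using the `packaging` library (the selection loop is illustrative, not the agent's actual code):

```python
from packaging.markers import Marker

rules = ["<20.2 ; python_version < '3.10'", "<22.3 ; python_version >= '3.10'"]
for rule in rules:
    version_rule, _, marker = rule.partition(";")
    if Marker(marker.strip()).evaluate():  # True for the current interpreter?
        print("pip version rule in effect:", version_rule.strip())
```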
@@ -83,7 +93,7 @@
        # set the optional priority packages to be installed before the rest of the required packages,
        # In case a package installation fails, the package will be ignored,
        # and the virtual environment process will continue
        # priority_optional_packages: ["pygobject", ]
        priority_optional_packages: ["pygobject", ]

        # set the post packages to be installed after all the rest of the required packages
        # post_packages: ["horovod", ]
@@ -96,6 +106,10 @@
        # set to True to support torch nightly build installation,
        # notice: torch nightly builds are ephemeral and are deleted from time to time
        torch_nightly: false,

        # if set to true, the agent will look for the "poetry.lock" file
        # in the passed current working directory instead of the repository's root directory.
        poetry_files_from_repo_working_dir: false
    },

    # target folder for virtual environments builds, created when executing experiment
@@ -108,7 +122,7 @@
        # minimum required free space to allow for cache entry, disable by passing 0 or negative value
        free_space_threshold_gb: 2.0
        # unmark to enable virtual environment caching
        # path: ~/.clearml/venvs-cache
        path: ~/.clearml/venvs-cache
    },

    # cached git clone folder
@@ -130,6 +144,12 @@
    },

    translate_ssh: true,

    # set "disable_ssh_mount: true" to disable the automatic mount of ~/.ssh folder into the docker containers
    # default is false, automatically mounts ~/.ssh
    # Must be set to True if using "clearml-session" with this agent!
    # disable_ssh_mount: false

    # reload configuration file every daemon execution
    reload_config: false,
@@ -202,8 +222,8 @@
    # default is True, report a single \r line in a sequence of consecutive lines, per 5 seconds.
    # suppress_carriage_return: true

    # cuda versions used for solving pytorch wheel packages
    # should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
    # CUDA versions used for Conda setup & solving PyTorch wheel packages
    # Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
    # cuda_version: 10.1
    # cudnn_version: 7.6
@@ -220,24 +240,28 @@
        parse_embedded_urls: true
    }

    # Maximum execution time (in seconds) for Task's abort function call
    abort_callback_max_timeout: 1800

    # allow to set internal mount points inside the docker,
    # especially useful for non-root docker container images.
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        apt_cache: "/var/cache/apt/archives"
        ssh_folder: "/root/.ssh"
        ssh_folder: "~/.ssh"
        ssh_ro_folder: "/.ssh"
        pip_cache: "/root/.cache/pip"
        poetry_cache: "/root/.cache/pypoetry"
        vcs_cache: "/root/.clearml/vcs-cache"
        venv_build: "/root/.clearml/venvs-builds"
        venv_build: "~/.clearml/venvs-builds"
        pip_download: "/root/.clearml/pip-download-cache"
    }

    # Name docker containers created by the daemon using the following string format (supported from Docker 0.6.5)
    # Allowed variables are task_id, worker_id and rand_string (random lower-case letters string, up to 32 characters)
    # Note: resulting name must start with an alphanumeric character and
    # continue with alphanumeric characters, underscores (_), dots (.) and/or dashes (-)
    # docker_container_name_format: "clearml-id-{task_id}-{rand_string:.8}"

    # Apply top-level environment section from configuration into os.environ
    apply_environment: true
@@ -308,4 +332,57 @@
    # into the file specified in CLEARML_CUSTOM_BUILD_OUTPUT, the agent will emit a warning and continue with the
    # standard flow.
    custom_build_script: ""

    # Crash on exception: by default when encountering an exception while running a task,
    # the agent will catch the exception, log it and continue running.
    # Set this to `true` to propagate exceptions and crash the agent.
    # crash_on_exception: true

    # Disable task docker override. If true, the agent will use the default docker image and ignore any docker image
    # and arguments specified in the task's container section (setup shell script from the task container section will
    # be used in any case, if specified).
    disable_task_docker_override: false

    # Choose the default docker based on the Task properties,
    # Examples: 'script.requirements', 'script.binary', 'script.repository', 'script.branch', 'project'
    # Notice: Matching is done via regular expression, for example "^searchme$" will match exactly the "searchme" string
    #
    # "default_docker": {
    #     "image": "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04",
    #     # optional arguments to pass to docker image
    #     # arguments: ["--ipc=host", ]
    #     "match_rules": [
    #         {
    #             "image": "sample_container:tag",
    #             "arguments": "-e VALUE=1 --ipc=host",
    #             "match": {
    #                 "script": {
    #                     "requirements": {
    #                         "pip": {
    #                             "tensorflow": "~=1.6"
    #                         }
    #                     },
    #                     "repository": "",
    #                     "branch": "master"
    #                 },
    #                 "project": "example"
    #             }
    #         },
    #         {
    #             "image": "better_container:tag",
    #             "arguments": "",
    #             "match": {
    #                 "container": "replace_me_please"
    #             }
    #         },
    #         {
    #             "image": "another_container:tag",
    #             "arguments": "",
    #             "match": {
    #                 "project": "^examples", # anything that starts with "examples", e.g. "examples", "examples/sub_project"
    #             }
    #         }
    #     ]
    # },
    #
}
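As the `match_rules` comments above note, matching is done via regular expressions, so anchoring matters. A quick illustration with Python's `re` (the names are made up):

```python
import re

# "^examples" is an anchored prefix match, not an exact match:
assert re.search("^examples", "examples")
assert re.search("^examples", "examples/sub_project")

# "^searchme$" anchors both ends, so only the exact string matches:
assert re.search("^searchme$", "searchme")
assert not re.search("^searchme$", "searchme/sub_project")
```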
@@ -28,6 +28,9 @@

    pool_maxsize: 512
    pool_connections: 512

    # Override the default http method, use "put" if working behind GCP load balancer (default: "get")
    # default_method: "get"
}

auth {
@@ -4,7 +4,7 @@ import re

import attr
import six

import pyhocon
from clearml_agent.external import pyhocon

from .action import Action
@@ -66,11 +66,16 @@ class DataModel(object):
        }

    def validate(self, schema=None):
        jsonschema.validate(
            self.to_dict(),
            schema or self._schema,
            types=dict(array=(list, tuple), integer=six.integer_types),
        schema = schema or self._schema
        validator = jsonschema.validators.validator_for(schema)
        validator_cls = jsonschema.validators.extend(
            validator=validator,
            type_checker=validator.TYPE_CHECKER.redefine_many({
                "array": lambda s, instance: isinstance(instance, (list, tuple)),
                "integer": lambda s, instance: isinstance(instance, six.integer_types),
            }),
        )
        jsonschema.validate(self.to_dict(), schema, cls=validator_cls)

    def __repr__(self):
        return '<{}.{}: {}>'.format(
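For context on the change above: jsonschema deprecated and later removed the `types=` keyword, so custom container types are now declared through a `TYPE_CHECKER`. A standalone sketch of the same pattern:

```python
import jsonschema
from jsonschema import validators

schema = {"type": "object", "properties": {"ids": {"type": "array"}}}
base = validators.validator_for(schema)
extended = validators.extend(
    base,
    type_checker=base.TYPE_CHECKER.redefine(
        "array", lambda checker, instance: isinstance(instance, (list, tuple))
    ),
)
# Tuples now pass "array" validation; the stock validator accepts only lists.
jsonschema.validate({"ids": (1, 2)}, schema, cls=extended)
```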
@@ -8,13 +8,14 @@ from .datamodel import DataModel
from .defs import ENV_API_DEFAULT_REQ_METHOD


if ENV_API_DEFAULT_REQ_METHOD.get().upper() not in ("GET", "POST"):
if ENV_API_DEFAULT_REQ_METHOD.get().upper() not in ("GET", "POST", "PUT"):
    raise ValueError(
        "CLEARML_API_DEFAULT_REQ_METHOD environment variable must be 'get' or 'post' (any case is allowed)."
    )


class Request(ApiModel):
    def_method = ENV_API_DEFAULT_REQ_METHOD.get(default="get")
    _method = ENV_API_DEFAULT_REQ_METHOD.get(default="get")

    def __init__(self, **kwargs):
@@ -2,20 +2,25 @@
import json as json_lib
import os
import sys
import time
import types
from random import SystemRandom
from socket import gethostname
from typing import Optional

import jwt
import requests
import six
from pyhocon import ConfigTree, ConfigFactory
from requests import RequestException
from requests.auth import HTTPBasicAuth
from six.moves.urllib.parse import urlparse, urlunparse

from clearml_agent.external.pyhocon import ConfigTree, ConfigFactory

from .callresult import CallResult
from .defs import ENV_VERBOSE, ENV_HOST, ENV_ACCESS_KEY, ENV_SECRET_KEY, ENV_WEB_HOST, ENV_FILES_HOST, ENV_AUTH_TOKEN, \
    ENV_NO_DEFAULT_SERVER, ENV_DISABLE_VAULT_SUPPORT, ENV_INITIAL_CONNECT_RETRY_OVERRIDE, ENV_API_DEFAULT_REQ_METHOD
from .defs import (
    ENV_VERBOSE, ENV_HOST, ENV_ACCESS_KEY, ENV_SECRET_KEY, ENV_WEB_HOST, ENV_FILES_HOST, ENV_AUTH_TOKEN,
    ENV_NO_DEFAULT_SERVER, ENV_DISABLE_VAULT_SUPPORT, ENV_INITIAL_CONNECT_RETRY_OVERRIDE, ENV_API_DEFAULT_REQ_METHOD, )
from .request import Request, BatchRequest
from .token_manager import TokenManager
from ..config import load
@@ -24,6 +29,9 @@ from ...backend_config.environment import backward_compatibility_support
from ...version import __version__


sys_random = SystemRandom()


class LoginError(Exception):
    pass
@@ -47,6 +55,7 @@ class Session(TokenManager):
    _session_initial_retry_connect_override = 4
    _write_session_data_size = 15000
    _write_session_timeout = (30.0, 30.)
    _request_exception_retry_timeout = (2.0, 3.0)

    api_version = '2.1'
    feature_set = 'basic'
@@ -109,6 +118,9 @@ class Session(TokenManager):
        self._verbose = verbose if verbose is not None else ENV_VERBOSE.get()
        self._logger = logger
        self.__auth_token = None
        self._propagate_exceptions_on_send = True

        self.update_default_api_method()

        if ENV_AUTH_TOKEN.get(
            value_cb=lambda key, value: print("Using environment access token {}=********".format(key))
@@ -163,6 +175,10 @@
        )
        # try to connect with the server
        self.refresh_token()

        # for resilience, from now on we won't allow propagating exceptions when sending requests
        self._propagate_exceptions_on_send = False

        # create the default session with many retries
        http_retries_config, self.__http_session = self._setup_session(http_retries_config)
@@ -208,7 +224,22 @@

        return http_retries_config, get_http_session_with_retry(config=self.config or None, **http_retries_config)

    def update_default_api_method(self):
        if ENV_API_DEFAULT_REQ_METHOD.get(default=None):
            # Make sure we update the config object, so we pass it into the new containers when we map them
            self.config.put("api.http.default_method", ENV_API_DEFAULT_REQ_METHOD.get())
            # notice the default setting of Request.def_method are already set by the OS environment
        elif self.config.get("api.http.default_method", None):
            def_method = str(self.config.get("api.http.default_method", None)).strip()
            if def_method.upper() not in ("GET", "POST", "PUT"):
                raise ValueError(
                    "api.http.default_method variable must be 'get', 'post' or 'put' (any case is allowed)."
                )
            Request.def_method = def_method
            Request._method = Request.def_method

    def load_vaults(self):
        # () -> Optional[bool]
        if not self.check_min_api_version("2.15") or self.feature_set == "basic":
            return
@@ -229,12 +260,14 @@

        # noinspection PyBroadException
        try:
            res = self.send_request("users", "get_vaults", json={"enabled": True, "types": ["config"]})
            # Use params and not data/json otherwise payload might be dropped if we're using GET with a strict firewall
            res = self.send_request("users", "get_vaults", params="enabled=true&types=config&types=config")
            if res.ok:
                vaults = res.json().get("data", {}).get("vaults", [])
                data = list(filter(None, map(parse, vaults)))
                if data:
                    self.config.set_overrides(*data)
                return True
            elif res.status_code != 404:
                raise Exception(res.json().get("meta", {}).get("result_msg", res.text))
        except Exception as ex:
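The switch from `json=` to `params=` above moves the payload from the request body into the query string, which strict firewalls and load balancers are less likely to drop on GET requests. A minimal illustration with `requests` (the URL is a placeholder):

```python
import requests

# Body payload: may be stripped by middleboxes when the method is GET.
requests.get("https://api.example.com/users.get_vaults",
             json={"enabled": True, "types": ["config"]})

# Query-string payload: survives strict proxies; repeating a key ("types")
# is the query-string way to pass a list.
requests.get("https://api.example.com/users.get_vaults",
             params="enabled=true&types=config")
```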
@@ -251,12 +284,13 @@
        service,
        action,
        version=None,
        method="get",
        method=Request.def_method,
        headers=None,
        auth=None,
        data=None,
        json=None,
        refresh_token_if_unauthorized=True,
        params=None,
    ):
        """ Internal implementation for making a raw API request.
            - Constructs the api endpoint name
@@ -280,6 +314,7 @@
            if version
            else "{host}/{service}.{action}"
        ).format(**locals())

        while True:
            if data and len(data) > self._write_session_data_size:
                timeout = self._write_session_timeout
@@ -287,16 +322,29 @@
                timeout = self._session_initial_timeout
            else:
                timeout = self._session_timeout
            res = self.__http_session.request(
                method, url, headers=headers, auth=auth, data=data, json=json, timeout=timeout)

            try:
                res = self.__http_session.request(
                    method, url, headers=headers, auth=auth, data=data, json=json, timeout=timeout, params=params)
            except RequestException as ex:
                if self._propagate_exceptions_on_send:
                    raise
                sleep_time = sys_random.uniform(*self._request_exception_retry_timeout)
                self._logger.error(
                    "{} exception sending {} {}: {} (retrying in {:.1f}sec)".format(
                        type(ex).__name__, method.upper(), url, str(ex), sleep_time
                    )
                )
                time.sleep(sleep_time)
                continue

            if (
                refresh_token_if_unauthorized
                and res.status_code == requests.codes.unauthorized
                and not token_refreshed_on_error
            ):
                # it seems we're unauthorized, so we'll try to refresh our token once in case permissions changed
                # since the last time we got the token, and try again
                self.refresh_token()
                token_refreshed_on_error = True
                # try again
@@ -328,11 +376,12 @@
        service,
        action,
        version=None,
        method="get",
        method=Request.def_method,
        headers=None,
        data=None,
        json=None,
        async_enable=False,
        params=None,
    ):
        """
        Send a raw API request.
@@ -345,6 +394,7 @@
            content type will be application/json)
        :param data: Dictionary, bytes, or file-like object to send in the request body
        :param async_enable: whether request is asynchronous
        :param params: additional query parameters
        :return: requests Response instance
        """
        headers = self.add_auth_headers(
@@ -361,6 +411,7 @@
            headers=headers,
            data=data,
            json=json,
            params=params,
        )

    def send_request_batch(
@@ -371,7 +422,7 @@
        headers=None,
        data=None,
        json=None,
        method="get",
        method=Request.def_method,
    ):
        """
        Send a raw batch API request. Batch requests always use application/json-lines content type.
@@ -613,15 +664,14 @@

        res = None
        try:
            data = {"expiration_sec": exp} if exp else {}
            res = self._send_request(
                method=ENV_API_DEFAULT_REQ_METHOD.get(default="get"),
                method=Request.def_method,
                service="auth",
                action="login",
                auth=auth,
                json=data,
                headers=headers,
                refresh_token_if_unauthorized=False,
                params={"expiration_sec": exp} if exp else {},
            )
            try:
                resp = res.json()
@@ -660,3 +710,13 @@
        return "{self.__class__.__name__}[{self.host}, {self.access_key}/{secret_key}]".format(
            self=self, secret_key=self.secret_key[:5] + "*" * (len(self.secret_key) - 5)
        )

    @property
    def propagate_exceptions_on_send(self):
        # type: () -> bool
        return self._propagate_exceptions_on_send

    @propagate_exceptions_on_send.setter
    def propagate_exceptions_on_send(self, value):
        # type: (bool) -> None
        self._propagate_exceptions_on_send = value
@@ -7,10 +7,8 @@ import sys
from os.path import expanduser
from typing import Any

import pyhocon
import six
from pathlib2 import Path
from pyhocon import ConfigTree, ConfigFactory
from pyparsing import (
    ParseFatalException,
    ParseException,
@@ -18,6 +16,9 @@ from pyparsing import (
    ParseSyntaxException,
)

from clearml_agent.external import pyhocon
from clearml_agent.external.pyhocon import ConfigTree, ConfigFactory

from .defs import (
    Environment,
    DEFAULT_CONFIG_FOLDER,
@@ -191,16 +192,20 @@ class Config(object):
            config, self._read_extra_env_config_values(), copy_trees=True
        )

        if self._overrides_configs:
            config = functools.reduce(
                lambda cfg, override: ConfigTree.merge_configs(cfg, override, copy_trees=True),
                self._overrides_configs,
                config,
            )
        config = self.resolve_override_configs(config)

        config["env"] = env
        return config

    def resolve_override_configs(self, initial=None):
        if not self._overrides_configs:
            return initial
        return functools.reduce(
            lambda cfg, override: ConfigTree.merge_configs(cfg, override, copy_trees=True),
            self._overrides_configs,
            initial or ConfigTree(),
        )

    def _read_extra_env_config_values(self) -> ConfigTree:
        """ Loads extra configuration from environment-injected values """
        result = ConfigTree()
@@ -289,6 +294,9 @@ class Config(object):
        )
        return value

    def put(self, key, value):
        self._config.put(key, value)

    def to_dict(self):
        return self._config.as_plain_ordered_dict()
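The extracted `resolve_override_configs` folds every override tree onto a base config, with later overrides winning. The same `functools.reduce` pattern in isolation, with plain dicts standing in for `ConfigTree.merge_configs`:

```python
import functools

merge = lambda base, override: {**base, **override}  # toy stand-in for ConfigTree.merge_configs
overrides = [{"a": 1}, {"a": 2, "b": 3}]

result = functools.reduce(merge, overrides, {"a": 0, "c": 9})
assert result == {"a": 2, "b": 3, "c": 9}  # later overrides take precedence
```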
@@ -14,6 +14,14 @@ except ImportError:

ConverterType = TypeVar("ConverterType", bound=Callable[[Any], Any])


def text_to_int(value, default=0):
    # type: (Any, int) -> int
    try:
        return int(value)
    except (ValueError, TypeError):
        return default


def base64_to_text(value):
    # type: (Any) -> Text
    return base64.b64decode(value).decode("utf-8")
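A usage sketch for the `text_to_int` helper added above:

```python
# assuming text_to_int as defined above is in scope
assert text_to_int("42") == 42               # numeric strings parse normally
assert text_to_int(None) == 0                # TypeError falls back to the default
assert text_to_int("n/a", default=-1) == -1  # ValueError falls back to an explicit default
```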
@@ -4,7 +4,7 @@ from os.path import expandvars, expanduser
from pathlib import Path
from typing import List, TYPE_CHECKING

from pyhocon import HOCONConverter, ConfigTree
from clearml_agent.external.pyhocon import HOCONConverter, ConfigTree

if TYPE_CHECKING:
    from .config import Config
@@ -118,13 +118,15 @@ class ServiceCommandSection(BaseCommandSection):
        """ The name of the REST service used by this command """
        pass

    def get(self, endpoint, *args, session=None, **kwargs):
    def get(self, endpoint, *args, service=None, session=None, **kwargs):
        session = session or self._session
        return session.get(service=self.service, action=endpoint, *args, **kwargs)
        service = service or self.service
        return session.get(service=service, action=endpoint, *args, **kwargs)

    def post(self, endpoint, *args, session=None, **kwargs):
    def post(self, endpoint, *args, service=None, session=None, **kwargs):
        session = session or self._session
        return session.post(service=self.service, action=endpoint, *args, **kwargs)
        service = service or self.service
        return session.post(service=service, action=endpoint, *args, **kwargs)

    def get_with_act_as(self, endpoint, *args, **kwargs):
        return self._session.get_with_act_as(service=self.service, action=endpoint, *args, **kwargs)
@@ -347,7 +349,7 @@ class ServiceCommandSection(BaseCommandSection):
        except AttributeError:
            raise NameResolutionError('Name resolution unavailable for {}'.format(service))

        request = request_cls.from_dict(dict(name=name, only_fields=['name', 'id']))
        request = request_cls.from_dict(dict(name=re.escape(name), only_fields=['name', 'id']))
        # from_dict will ignore unrecognised keyword arguments - not all GetAll's have only_fields
        response = getattr(self._session.send_api(request), service)
        matches = [db_object for db_object in response if name.lower() == db_object.name.lower()]
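The `re.escape(name)` change above matters because the backend treats the `name` filter as a regular expression; unescaped metacharacters in an experiment name would otherwise match unrelated names. For example:

```python
import re

# Without escaping, "[v1.2]" is a character class, so the raw name would
# also match unrelated strings such as "run 2":
assert re.fullmatch("run [v1.2]", "run 2")

# re.escape makes the lookup literal:
assert re.fullmatch(re.escape("run [v1.2]"), "run [v1.2]")
assert not re.fullmatch(re.escape("run [v1.2]"), "run 2")
```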
@@ -1,14 +1,15 @@
from __future__ import print_function

from six.moves import input
from pyhocon import ConfigFactory, ConfigMissingException
from typing import Dict, Optional

from pathlib2 import Path
from six.moves import input
from six.moves.urllib.parse import urlparse

from clearml_agent.backend_api.session import Session
from clearml_agent.backend_api.session.defs import ENV_HOST
from clearml_agent.backend_config.defs import LOCAL_CONFIG_FILES

from clearml_agent.external.pyhocon import ConfigFactory, ConfigMissingException

description = """
Please create new clearml credentials through the settings page in your `clearml-server` web app,
@@ -112,6 +113,21 @@ def main():
        print('Exiting setup without creating configuration file')
        return
    selection = input_options(
        'Default Output URI (used to automatically store models and artifacts)',
        {'N': 'None', 'S': 'ClearML Server', 'C': 'Custom'},
        default='None'
    )
    if selection == 'Custom':
        print('Custom Default Output URI: ', end='')
        default_output_uri = input().strip()
    elif selection == "ClearML Server":
        default_output_uri = files_host
    else:
        default_output_uri = None

    print('\nDefault Output URI: {}'.format(default_output_uri if default_output_uri else 'not set'))

    # get GIT User/Pass for cloning
    print('Enter git username for repository cloning (leave blank for SSH key authentication): [] ', end='')
    git_user = input()
@@ -122,7 +138,7 @@ def main():
            " Bitbucket: https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/\n"
            " GitLab: https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html\n"
        )
        print('Enter git password token for user \'{}\': '.format(git_user), end='')
        print('Enter git personal token for user \'{}\': '.format(git_user), end='')
        git_pass = input()
        print('Git repository cloning will be using user={} token={}'.format(git_user, git_pass))
    else:
@@ -179,6 +195,13 @@ def main():
                'agent.package_manager.extra_index_url= ' \
                '[\n{}\n]\n\n'.format("\n".join(map("\"{}\"".format, extra_index_urls)))
            f.write(extra_index_str)
            if default_output_uri:
                default_output_url_str = '# Default Task output_uri. if output_uri is not provided to Task.init, ' \
                    'default_output_uri will be used instead.\n' \
                    'sdk.development.default_output_uri="{}"\n' \
                    '\n'.format(default_output_uri.strip('"'))
                f.write(default_output_url_str)
                default_conf = default_conf.replace('default_output_uri: ""', '# default_output_uri: ""')
            f.write(default_conf)
    except Exception:
        print('Error! Could not write configuration file at: {}'.format(str(conf_file)))
@@ -305,6 +328,25 @@ def input_url(host_type, host=None):
    return host
def input_options(message, options, default=None):
    # type: (str, Dict[str, str], Optional[str]) -> str
    options_msg = "/".join(
        "".join(('(' + c.upper() + ')') if c == o else c for c in option)
        for o, option in options.items()
    )
    if default:
        options_msg += " [{}]".format(default)
    while True:
        print('{}: {} '.format(message, options_msg), end='')
        res = input().strip()
        if not res:
            return default
        elif res.lower() in options:
            return options[res.lower()]
        elif res.upper() in options:
            return options[res.upper()]


def input_host_port(host_type, parsed_host):
    print('Enter port for {} host '.format(host_type), end='')
    replace_port = input().lower()
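A usage sketch for the `input_options` helper added above (interactive behavior shown as comments):

```python
# assuming input_options as defined above is in scope
choice = input_options(
    'Default Output URI (used to automatically store models and artifacts)',
    {'N': 'None', 'S': 'ClearML Server', 'C': 'Custom'},
    default='None',
)
# Prompt: "Default Output URI (...): (N)one/ClearML (S)erver/(C)ustom [None] "
# Enter   -> returns "None" (the default)
# "s"/"S" -> returns "ClearML Server"
```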
@@ -3,8 +3,6 @@ from __future__ import print_function
import json
import time

from future.builtins import super

from clearml_agent.commands.base import ServiceCommandSection
from clearml_agent.helper.base import return_list
@@ -1,6 +1,8 @@
import json
import re
import shlex

from clearml_agent.backend_api.session import Request
from clearml_agent.helper.package.requirements import (
    RequirementsManager, MarkerRequirement,
    compare_version_rules, )
@@ -26,7 +28,7 @@ def resolve_default_container(session, task_id, container_config):
            'script.repository', 'script.branch',
            'project', 'container'],
            'search_hidden': True},
        method='get',
        method=Request.def_method,
        async_enable=False,
    )
    try:
@@ -53,7 +55,7 @@ def resolve_default_container(session, task_id, container_config):
            'id': [task_info.get('project')],
            'only_fields': ['name'],
        },
        method='get',
        method=Request.def_method,
        async_enable=False,
    )
    try:
File diff suppressed because it is too large.
@@ -1,6 +1,6 @@
from pyhocon import ConfigTree

import six

from clearml_agent.external.pyhocon import ConfigTree
from clearml_agent.helper.base import Singleton
@@ -5,9 +5,9 @@ from enum import IntEnum
from os import getenv, environ
from typing import Text, Optional, Union, Tuple, Any

import six
from pathlib2 import Path

import six
from clearml_agent.helper.base import normalize_path

PROGRAM_NAME = "clearml-agent"
@@ -69,41 +69,65 @@ ENV_AWS_SECRET_KEY = EnvironmentConfig("AWS_SECRET_ACCESS_KEY")
ENV_AZURE_ACCOUNT_KEY = EnvironmentConfig("AZURE_STORAGE_KEY")

ENVIRONMENT_CONFIG = {
    "api.api_server": EnvironmentConfig("CLEARML_API_HOST", "TRAINS_API_HOST", ),
    "api.files_server": EnvironmentConfig("CLEARML_FILES_HOST", "TRAINS_FILES_HOST", ),
    "api.web_server": EnvironmentConfig("CLEARML_WEB_HOST", "TRAINS_WEB_HOST", ),
    "api.api_server": EnvironmentConfig(
        "CLEARML_API_HOST",
        "TRAINS_API_HOST",
    ),
    "api.files_server": EnvironmentConfig(
        "CLEARML_FILES_HOST",
        "TRAINS_FILES_HOST",
    ),
    "api.web_server": EnvironmentConfig(
        "CLEARML_WEB_HOST",
        "TRAINS_WEB_HOST",
    ),
    "api.credentials.access_key": EnvironmentConfig(
        "CLEARML_API_ACCESS_KEY", "TRAINS_API_ACCESS_KEY",
        "CLEARML_API_ACCESS_KEY",
        "TRAINS_API_ACCESS_KEY",
    ),
    "api.credentials.secret_key": ENV_AGENT_SECRET_KEY,
    "agent.worker_name": EnvironmentConfig("CLEARML_WORKER_NAME", "TRAINS_WORKER_NAME", ),
    "agent.worker_id": EnvironmentConfig("CLEARML_WORKER_ID", "TRAINS_WORKER_ID", ),
    "agent.cuda_version": EnvironmentConfig(
        "CLEARML_CUDA_VERSION", "TRAINS_CUDA_VERSION", "CUDA_VERSION"
    "agent.worker_name": EnvironmentConfig(
        "CLEARML_WORKER_NAME",
        "TRAINS_WORKER_NAME",
    ),
    "agent.cudnn_version": EnvironmentConfig(
        "CLEARML_CUDNN_VERSION", "TRAINS_CUDNN_VERSION", "CUDNN_VERSION"
    ),
    "agent.cpu_only": EnvironmentConfig(
        names=("CLEARML_CPU_ONLY", "TRAINS_CPU_ONLY", "CPU_ONLY"), type=bool
    "agent.worker_id": EnvironmentConfig(
        "CLEARML_WORKER_ID",
        "TRAINS_WORKER_ID",
    ),
    "agent.cuda_version": EnvironmentConfig("CLEARML_CUDA_VERSION", "TRAINS_CUDA_VERSION", "CUDA_VERSION"),
    "agent.cudnn_version": EnvironmentConfig("CLEARML_CUDNN_VERSION", "TRAINS_CUDNN_VERSION", "CUDNN_VERSION"),
    "agent.cpu_only": EnvironmentConfig(names=("CLEARML_CPU_ONLY", "TRAINS_CPU_ONLY", "CPU_ONLY"), type=bool),
    "agent.crash_on_exception": EnvironmentConfig("CLEAMRL_AGENT_CRASH_ON_EXCEPTION", type=bool),
    "sdk.aws.s3.key": EnvironmentConfig("AWS_ACCESS_KEY_ID"),
    "sdk.aws.s3.secret": ENV_AWS_SECRET_KEY,
    "sdk.aws.s3.region": EnvironmentConfig("AWS_DEFAULT_REGION"),
    "sdk.azure.storage.containers.0": {'account_name': EnvironmentConfig("AZURE_STORAGE_ACCOUNT"),
                                       'account_key': ENV_AZURE_ACCOUNT_KEY},
    "sdk.azure.storage.containers.0": {
        "account_name": EnvironmentConfig("AZURE_STORAGE_ACCOUNT"),
        "account_key": ENV_AZURE_ACCOUNT_KEY,
    },
    "sdk.google.storage.credentials_json": EnvironmentConfig("GOOGLE_APPLICATION_CREDENTIALS"),
}

ENVIRONMENT_SDK_PARAMS = {
    "task_id": ("CLEARML_TASK_ID", "TRAINS_TASK_ID", ),
    "config_file": ("CLEARML_CONFIG_FILE", "TRAINS_CONFIG_FILE", ),
    "log_level": ("CLEARML_LOG_LEVEL", "TRAINS_LOG_LEVEL", ),
    "log_to_backend": ("CLEARML_LOG_TASK_TO_BACKEND", "TRAINS_LOG_TASK_TO_BACKEND", ),
    "task_id": (
        "CLEARML_TASK_ID",
        "TRAINS_TASK_ID",
    ),
    "config_file": (
        "CLEARML_CONFIG_FILE",
        "TRAINS_CONFIG_FILE",
    ),
    "log_level": (
        "CLEARML_LOG_LEVEL",
        "TRAINS_LOG_LEVEL",
    ),
    "log_to_backend": (
        "CLEARML_LOG_TASK_TO_BACKEND",
        "TRAINS_LOG_TASK_TO_BACKEND",
    ),
}

ENVIRONMENT_BACKWARD_COMPATIBLE = EnvironmentConfig(
    names=("CLEARML_AGENT_ALG_ENV", "TRAINS_AGENT_ALG_ENV"), type=bool)
ENVIRONMENT_BACKWARD_COMPATIBLE = EnvironmentConfig(names=("CLEARML_AGENT_ALG_ENV", "TRAINS_AGENT_ALG_ENV"), type=bool)

VIRTUAL_ENVIRONMENT_PATH = {
    "python2": normalize_path(CONFIG_DIR, "py2venv"),
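The `EnvironmentConfig` class itself is not part of this hunk; a minimal sketch of the first-match-wins semantics the mapping above implies (the `lookup` helper and its bool parsing are assumptions, not the real implementation):

```python
import os

# Each config key is backed by an ordered list of environment variable names;
# the first one that is set wins, which is how the CLEARML_* names can shadow
# their legacy TRAINS_* counterparts.
def lookup(*names, type=str):
    for name in names:
        value = os.getenv(name)
        if value is not None:
            if type is bool:
                return value.strip().lower() in ("1", "true", "yes", "on")
            return type(value)
    return None

api_host = lookup("CLEARML_API_HOST", "TRAINS_API_HOST")
cpu_only = lookup("CLEARML_CPU_ONLY", "TRAINS_CPU_ONLY", "CPU_ONLY", type=bool)
```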
@@ -122,34 +146,61 @@ TOKEN_EXPIRATION_SECONDS = int(timedelta(days=2).total_seconds())

METADATA_EXTENSION = ".json"

DEFAULT_VENV_UPDATE_URL = (
    "https://raw.githubusercontent.com/Yelp/venv-update/v3.2.4/venv_update.py"
)
DEFAULT_VENV_UPDATE_URL = "https://raw.githubusercontent.com/Yelp/venv-update/v3.2.4/venv_update.py"
WORKING_REPOSITORY_DIR = "task_repository"
WORKING_STANDALONE_DIR = "code"
DEFAULT_VCS_CACHE = normalize_path(CONFIG_DIR, "vcs-cache")
PIP_EXTRA_INDICES = [
]
PIP_EXTRA_INDICES = []
DEFAULT_PIP_DOWNLOAD_CACHE = normalize_path(CONFIG_DIR, "pip-download-cache")
ENV_DOCKER_IMAGE = EnvironmentConfig('CLEARML_DOCKER_IMAGE', 'TRAINS_DOCKER_IMAGE')
ENV_WORKER_ID = EnvironmentConfig('CLEARML_WORKER_ID', 'TRAINS_WORKER_ID')
ENV_WORKER_TAGS = EnvironmentConfig('CLEARML_WORKER_TAGS')
ENV_AGENT_SKIP_PIP_VENV_INSTALL = EnvironmentConfig('CLEARML_AGENT_SKIP_PIP_VENV_INSTALL')
ENV_AGENT_SKIP_PYTHON_ENV_INSTALL = EnvironmentConfig('CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL', type=bool)
ENV_DOCKER_SKIP_GPUS_FLAG = EnvironmentConfig('CLEARML_DOCKER_SKIP_GPUS_FLAG', 'TRAINS_DOCKER_SKIP_GPUS_FLAG')
ENV_AGENT_GIT_USER = EnvironmentConfig('CLEARML_AGENT_GIT_USER', 'TRAINS_AGENT_GIT_USER')
ENV_AGENT_GIT_PASS = EnvironmentConfig('CLEARML_AGENT_GIT_PASS', 'TRAINS_AGENT_GIT_PASS')
ENV_AGENT_GIT_HOST = EnvironmentConfig('CLEARML_AGENT_GIT_HOST', 'TRAINS_AGENT_GIT_HOST')
ENV_AGENT_DISABLE_SSH_MOUNT = EnvironmentConfig('CLEARML_AGENT_DISABLE_SSH_MOUNT', type=bool)
ENV_SSH_AUTH_SOCK = EnvironmentConfig('SSH_AUTH_SOCK')
ENV_TASK_EXECUTE_AS_USER = EnvironmentConfig('CLEARML_AGENT_EXEC_USER', 'TRAINS_AGENT_EXEC_USER')
ENV_TASK_EXTRA_PYTHON_PATH = EnvironmentConfig('CLEARML_AGENT_EXTRA_PYTHON_PATH', 'TRAINS_AGENT_EXTRA_PYTHON_PATH')
ENV_DOCKER_HOST_MOUNT = EnvironmentConfig('CLEARML_AGENT_K8S_HOST_MOUNT', 'CLEARML_AGENT_DOCKER_HOST_MOUNT',
                                          'TRAINS_AGENT_K8S_HOST_MOUNT', 'TRAINS_AGENT_DOCKER_HOST_MOUNT')
ENV_VENV_CACHE_PATH = EnvironmentConfig('CLEARML_AGENT_VENV_CACHE_PATH')
ENV_EXTRA_DOCKER_ARGS = EnvironmentConfig('CLEARML_AGENT_EXTRA_DOCKER_ARGS', type=list)
ENV_DOCKER_IMAGE = EnvironmentConfig("CLEARML_DOCKER_IMAGE", "TRAINS_DOCKER_IMAGE")
ENV_WORKER_ID = EnvironmentConfig("CLEARML_WORKER_ID", "TRAINS_WORKER_ID")
ENV_WORKER_TAGS = EnvironmentConfig("CLEARML_WORKER_TAGS")
ENV_AGENT_SKIP_PIP_VENV_INSTALL = EnvironmentConfig("CLEARML_AGENT_SKIP_PIP_VENV_INSTALL")
ENV_AGENT_SKIP_PYTHON_ENV_INSTALL = EnvironmentConfig("CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL", type=bool)
ENV_DOCKER_SKIP_GPUS_FLAG = EnvironmentConfig("CLEARML_DOCKER_SKIP_GPUS_FLAG", "TRAINS_DOCKER_SKIP_GPUS_FLAG")
ENV_AGENT_GIT_USER = EnvironmentConfig("CLEARML_AGENT_GIT_USER", "TRAINS_AGENT_GIT_USER")
ENV_AGENT_GIT_PASS = EnvironmentConfig("CLEARML_AGENT_GIT_PASS", "TRAINS_AGENT_GIT_PASS")
ENV_AGENT_GIT_HOST = EnvironmentConfig("CLEARML_AGENT_GIT_HOST", "TRAINS_AGENT_GIT_HOST")
ENV_AGENT_DISABLE_SSH_MOUNT = EnvironmentConfig("CLEARML_AGENT_DISABLE_SSH_MOUNT", type=bool)
ENV_SSH_AUTH_SOCK = EnvironmentConfig("SSH_AUTH_SOCK")
ENV_TASK_EXECUTE_AS_USER = EnvironmentConfig("CLEARML_AGENT_EXEC_USER", "TRAINS_AGENT_EXEC_USER")
ENV_TASK_EXTRA_PYTHON_PATH = EnvironmentConfig("CLEARML_AGENT_EXTRA_PYTHON_PATH", "TRAINS_AGENT_EXTRA_PYTHON_PATH")
ENV_DOCKER_HOST_MOUNT = EnvironmentConfig(
    "CLEARML_AGENT_K8S_HOST_MOUNT",
    "CLEARML_AGENT_DOCKER_HOST_MOUNT",
    "TRAINS_AGENT_K8S_HOST_MOUNT",
    "TRAINS_AGENT_DOCKER_HOST_MOUNT",
)
ENV_VENV_CACHE_PATH = EnvironmentConfig("CLEARML_AGENT_VENV_CACHE_PATH")
ENV_EXTRA_DOCKER_ARGS = EnvironmentConfig("CLEARML_AGENT_EXTRA_DOCKER_ARGS", type=list)
ENV_DEBUG_INFO = EnvironmentConfig("CLEARML_AGENT_DEBUG_INFO")
ENV_CHILD_AGENTS_COUNT_CMD = EnvironmentConfig("CLEARML_AGENT_CHILD_AGENTS_COUNT_CMD")
ENV_DOCKER_ARGS_FILTERS = EnvironmentConfig("CLEARML_AGENT_DOCKER_ARGS_FILTERS")
ENV_DOCKER_ARGS_HIDE_ENV = EnvironmentConfig("CLEARML_AGENT_DOCKER_ARGS_HIDE_ENV")

ENV_CUSTOM_BUILD_SCRIPT = EnvironmentConfig('CLEARML_AGENT_CUSTOM_BUILD_SCRIPT')
ENV_SERVICES_DOCKER_RESTART = EnvironmentConfig("CLEARML_AGENT_SERVICES_DOCKER_RESTART")
"""
Specify a restart value for services agent task containers.
Note that when a restart value is provided, task containers will not be run with the '--rm' flag and will
not be cleaned up automatically when completed (this will need to be done externally using the
'docker container prune' command to free up resources).
Value format for this env var is "<restart-value>;<task-selector>", where:
- <restart-value> can be any valid restart value for docker-run (see https://docs.docker.com/engine/reference/commandline/run/#restart)
- <task-selector> is optional, allowing this behaviour to be restricted to specific tasks. The format is:
  "<path-to-task-field>=<value>" where:
  * <path-to-task-field> is a dot-separated path to a task field (e.g. "container.image")
  * <value> is optional. If not provided, the restart policy will be applied for the task container if the
    path provided exists. If provided, the restart policy will be applied if the value matches the value
    obtained from the task (value parsing and comparison is based on the type of value obtained from the task)
For example:
  CLEARML_AGENT_SERVICES_DOCKER_RESTART=unless-stopped
  CLEARML_AGENT_SERVICES_DOCKER_RESTART=unless-stopped;container.image=some-image
"""
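A minimal sketch of parsing the value format documented above (the `parse_restart_spec` helper is hypothetical and not part of this diff):

```python
def parse_restart_spec(value):
    # "unless-stopped;container.image=some-image" ->
    #   ("unless-stopped", "container.image", "some-image")
    restart, _, selector = value.partition(";")
    field_path, _, expected = selector.partition("=") if selector else ("", "", "")
    return restart, field_path or None, expected or None

print(parse_restart_spec("unless-stopped"))
print(parse_restart_spec("unless-stopped;container.image=some-image"))
```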

ENV_FORCE_SYSTEM_SITE_PACKAGES = EnvironmentConfig("CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES", type=bool)
""" Force system_site_packages: true when running tasks in containers (i.e. docker mode or k8s glue) """

ENV_CUSTOM_BUILD_SCRIPT = EnvironmentConfig("CLEARML_AGENT_CUSTOM_BUILD_SCRIPT")
"""
Specifies a custom environment setup script to be executed instead of installing a virtual environment.
If provided, this script is executed following Git cloning. Script command may include environment variable and
clearml_agent/external/pyhocon/__init__.py (vendored, new file, 5 lines)
@@ -0,0 +1,5 @@
from .config_parser import ConfigParser, ConfigFactory, ConfigMissingException
from .config_tree import ConfigTree
from .converter import HOCONConverter

__all__ = ["ConfigParser", "ConfigFactory", "ConfigMissingException", "ConfigTree", "HOCONConverter"]
clearml_agent/external/pyhocon/config_parser.py (vendored, new file, 762 lines)
@@ -0,0 +1,762 @@
import itertools
import re
import os
import socket
import contextlib
import codecs
from datetime import timedelta

from pyparsing import Forward, Keyword, QuotedString, Word, Literal, Suppress, Regex, Optional, SkipTo, ZeroOrMore, \
    Group, lineno, col, TokenConverter, replaceWith, alphanums, alphas8bit, ParseSyntaxException, StringEnd
from pyparsing import ParserElement
from .config_tree import ConfigTree, ConfigSubstitution, ConfigList, ConfigValues, ConfigUnquotedString, \
    ConfigInclude, NoneValue, ConfigQuotedString
from .exceptions import ConfigSubstitutionException, ConfigMissingException, ConfigException
import logging
import copy

use_urllib2 = False
try:
    # For Python 3.0 and later
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError
except ImportError:  # pragma: no cover
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen, HTTPError, URLError

    use_urllib2 = True
try:
    basestring
except NameError:  # pragma: no cover
    basestring = str
    unicode = str

logger = logging.getLogger(__name__)

#
# Substitution Defaults
#


class DEFAULT_SUBSTITUTION(object):
    pass


class MANDATORY_SUBSTITUTION(object):
    pass


class NO_SUBSTITUTION(object):
    pass


class STR_SUBSTITUTION(object):
    pass


def period(period_value, period_unit):
    try:
        from dateutil.relativedelta import relativedelta as period_impl
    except Exception:
        from datetime import timedelta as period_impl

    if period_unit == 'nanoseconds':
        period_unit = 'microseconds'
        period_value = int(period_value / 1000)

    arguments = dict(zip((period_unit,), (period_value,)))

    if period_unit == 'milliseconds':
        return timedelta(**arguments)

    return period_impl(**arguments)
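For reference, what `period()` produces for HOCON duration literals (a sketch; the concrete type depends on whether python-dateutil is installed):

```python
print(period(5, 'minutes'))         # relativedelta(minutes=+5), or timedelta without dateutil
print(period(1500, 'nanoseconds'))  # nanoseconds are down-converted: 1500 ns -> 1 us
print(period(500, 'milliseconds'))  # milliseconds always yield datetime.timedelta
```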
class ConfigFactory(object):

    @classmethod
    def parse_file(cls, filename, encoding='utf-8', required=True, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
        """Parse file

        :param filename: filename
        :type filename: basestring
        :param encoding: file encoding
        :type encoding: basestring
        :param required: If true, raises an exception if can't load file
        :type required: boolean
        :param resolve: if true, resolve substitutions
        :type resolve: boolean
        :param unresolved_value: value assigned to unresolved substitutions.
            If overridden with a default value, it will replace all unresolved values with that default.
            If it is set to pyhocon.STR_SUBSTITUTION then it will replace the value by its
            substitution expression (e.g., ${x})
        :type unresolved_value: class
        :return: Config object
        :type return: Config
        """
        try:
            with codecs.open(filename, 'r', encoding=encoding) as fd:
                content = fd.read()
            return cls.parse_string(content, os.path.dirname(filename), resolve, unresolved_value)
        except IOError as e:
            if required:
                raise e
            logger.warn('Cannot include file %s. File does not exist or cannot be read.', filename)
            return []

    @classmethod
    def parse_URL(cls, url, timeout=None, resolve=True, required=False, unresolved_value=DEFAULT_SUBSTITUTION):
        """Parse URL

        :param url: url to parse
        :type url: basestring
        :param resolve: if true, resolve substitutions
        :type resolve: boolean
        :param unresolved_value: value assigned to unresolved substitutions.
            If overridden with a default value, it will replace all unresolved values with that default.
            If it is set to pyhocon.STR_SUBSTITUTION then it will replace the value by
            its substitution expression (e.g., ${x})
        :type unresolved_value: class
        :return: Config object or []
        :type return: Config or list
        """
        socket_timeout = socket._GLOBAL_DEFAULT_TIMEOUT if timeout is None else timeout

        try:
            with contextlib.closing(urlopen(url, timeout=socket_timeout)) as fd:
                content = fd.read() if use_urllib2 else fd.read().decode('utf-8')
            return cls.parse_string(content, os.path.dirname(url), resolve, unresolved_value)
        except (HTTPError, URLError) as e:
            logger.warn('Cannot include url %s. Resource is inaccessible.', url)
            if required:
                raise e
            else:
                return []

    @classmethod
    def parse_string(cls, content, basedir=None, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
        """Parse string

        :param content: content to parse
        :type content: basestring
        :param resolve: if true, resolve substitutions
        :type resolve: boolean
        :param unresolved_value: value assigned to unresolved substitutions.
            If overridden with a default value, it will replace all unresolved values with that default.
            If it is set to pyhocon.STR_SUBSTITUTION then it will replace the value by
            its substitution expression (e.g., ${x})
        :type unresolved_value: class
        :return: Config object
        :type return: Config
        """
        return ConfigParser().parse(content, basedir, resolve, unresolved_value)

    @classmethod
    def from_dict(cls, dictionary, root=False):
        """Convert dictionary (and ordered dictionary) into a ConfigTree
        :param dictionary: dictionary to convert
        :type dictionary: dict
        :return: Config object
        :type return: Config
        """

        def create_tree(value):
            if isinstance(value, dict):
                res = ConfigTree(root=root)
                for key, child_value in value.items():
                    res.put(key, create_tree(child_value))
                return res
            if isinstance(value, list):
                return [create_tree(v) for v in value]
            else:
                return value

        return create_tree(dictionary)
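Usage sketch for `from_dict`, assuming the dotted-key accessors defined in the sibling `config_tree` module:

```python
tree = ConfigFactory.from_dict({"agent": {"worker_id": "gpu-0", "cpu_only": False}})
print(tree.get("agent.worker_id"))      # 'gpu-0'
print(tree.get_bool("agent.cpu_only"))  # False
```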


class ConfigParser(object):
    """
    Parse HOCON files: https://github.com/typesafehub/config/blob/master/HOCON.md
    """

    REPLACEMENTS = {
        '\\\\': '\\',
        '\\\n': '\n',
        '\\n': '\n',
        '\\r': '\r',
        '\\t': '\t',
        '\\=': '=',
        '\\#': '#',
        '\\!': '!',
        '\\"': '"',
    }

    period_type_map = {
        'nanoseconds': ['ns', 'nano', 'nanos', 'nanosecond', 'nanoseconds'],
        'microseconds': ['us', 'micro', 'micros', 'microsecond', 'microseconds'],
        'milliseconds': ['ms', 'milli', 'millis', 'millisecond', 'milliseconds'],
        'seconds': ['s', 'second', 'seconds'],
        'minutes': ['m', 'minute', 'minutes'],
        'hours': ['h', 'hour', 'hours'],
        'weeks': ['w', 'week', 'weeks'],
        'days': ['d', 'day', 'days'],
    }

    optional_period_type_map = {
        'months': ['mo', 'month', 'months'],  # 'm' from hocon spec removed. conflicts with minutes syntax.
        'years': ['y', 'year', 'years']
    }

    supported_period_map = None

    @classmethod
    def get_supported_period_type_map(cls):
        if cls.supported_period_map is None:
            cls.supported_period_map = {}
            cls.supported_period_map.update(cls.period_type_map)

            try:
                from dateutil import relativedelta

                if relativedelta is not None:
                    cls.supported_period_map.update(cls.optional_period_type_map)
            except Exception:
                pass

        return cls.supported_period_map

    @classmethod
    def parse(cls, content, basedir=None, resolve=True, unresolved_value=DEFAULT_SUBSTITUTION):
        """parse a HOCON content

        :param content: HOCON content to parse
        :type content: basestring
        :param resolve: if true, resolve substitutions
        :type resolve: boolean
        :param unresolved_value: value assigned to unresolved substitutions.
            If overridden with a default value, it will replace all unresolved values with that default.
            If it is set to pyhocon.STR_SUBSTITUTION then it will replace the value by
            its substitution expression (e.g., ${x})
        :type unresolved_value: class
        :return: a ConfigTree or a list
        """

        unescape_pattern = re.compile(r'\\.')

        def replace_escape_sequence(match):
            value = match.group(0)
            return cls.REPLACEMENTS.get(value, value)

        def norm_string(value):
            return unescape_pattern.sub(replace_escape_sequence, value)

        def unescape_string(tokens):
            return ConfigUnquotedString(norm_string(tokens[0]))

        def parse_multi_string(tokens):
            # remove the first and last 3 "
            return tokens[0][3: -3]

        def convert_number(tokens):
            n = tokens[0]
            try:
                return int(n, 10)
            except ValueError:
                return float(n)

        def safe_convert_number(tokens):
            n = tokens[0]
            try:
                return int(n, 10)
            except ValueError:
                try:
                    return float(n)
                except ValueError:
                    return n

        def convert_period(tokens):

            period_value = int(tokens.value)
            period_identifier = tokens.unit

            period_unit = next((single_unit for single_unit, values
                                in cls.get_supported_period_type_map().items()
                                if period_identifier in values))

            return period(period_value, period_unit)

        # ${path} or ${?path} for optional substitution
        SUBSTITUTION_PATTERN = r"\$\{(?P<optional>\?)?(?P<variable>[^}]+)\}(?P<ws>[ \t]*)"

        def create_substitution(instring, loc, token):
            # remove the ${ and }
            match = re.match(SUBSTITUTION_PATTERN, token[0])
            variable = match.group('variable')
            ws = match.group('ws')
            optional = match.group('optional') == '?'
            substitution = ConfigSubstitution(variable, optional, ws, instring, loc)
            return substitution

        # "value" with optional trailing whitespace captured
        STRING_PATTERN = '"(?P<value>(?:[^"\\\\]|\\\\.)*)"(?P<ws>[ \t]*)'

        def create_quoted_string(instring, loc, token):
            # strip the surrounding quotes and keep the trailing whitespace
            match = re.match(STRING_PATTERN, token[0])
            value = norm_string(match.group('value'))
            ws = match.group('ws')
            return ConfigQuotedString(value, ws, instring, loc)

        def include_config(instring, loc, token):
            url = None
            file = None
            required = False

            if token[0] == 'required':
                required = True
                final_tokens = token[1:]
            else:
                final_tokens = token

            if len(final_tokens) == 1:  # include "test"
                value = final_tokens[0].value if isinstance(final_tokens[0], ConfigQuotedString) else final_tokens[0]
                if value.startswith("http://") or value.startswith("https://") or value.startswith("file://"):
                    url = value
                else:
                    file = value
            elif len(final_tokens) == 2:  # include url("test") or file("test")
                value = final_tokens[1].value if isinstance(token[1], ConfigQuotedString) else final_tokens[1]
                if final_tokens[0] == 'url':
                    url = value
                else:
                    file = value

            if url is not None:
                logger.debug('Loading config from url %s', url)
                obj = ConfigFactory.parse_URL(
                    url,
                    resolve=False,
                    required=required,
                    unresolved_value=NO_SUBSTITUTION
                )
            elif file is not None:
                path = file if basedir is None else os.path.join(basedir, file)
                logger.debug('Loading config from file %s', path)
                obj = ConfigFactory.parse_file(
                    path,
                    resolve=False,
                    required=required,
                    unresolved_value=NO_SUBSTITUTION
                )
            else:
                raise ConfigException('No file or URL specified at: {loc}: {instring}', loc=loc, instring=instring)

            return ConfigInclude(obj if isinstance(obj, list) else obj.items())

        @contextlib.contextmanager
        def set_default_white_spaces():
            default = ParserElement.DEFAULT_WHITE_CHARS
            ParserElement.setDefaultWhitespaceChars(' \t')
            yield
            ParserElement.setDefaultWhitespaceChars(default)

        with set_default_white_spaces():
            assign_expr = Forward()
            true_expr = Keyword("true", caseless=True).setParseAction(replaceWith(True))
            false_expr = Keyword("false", caseless=True).setParseAction(replaceWith(False))
            null_expr = Keyword("null", caseless=True).setParseAction(replaceWith(NoneValue()))
            # key = QuotedString('"', escChar='\\', unquoteResults=False) | Word(alphanums + alphas8bit + '._- /')
            regexp_numbers = r'[+-]?(\d*\.\d+|\d+(\.\d+)?)([eE][+\-]?\d+)?(?=$|[ \t]*([\$\}\],#\n\r]|//))'
            key = QuotedString('"', escChar='\\', unquoteResults=False) | \
                Regex(regexp_numbers, re.DOTALL).setParseAction(safe_convert_number) | \
                Word(alphanums + alphas8bit + '._- /')

            eol = Word('\n\r').suppress()
            eol_comma = Word('\n\r,').suppress()
            comment = (Literal('#') | Literal('//')) - SkipTo(eol | StringEnd())
            comment_eol = Suppress(Optional(eol_comma) + comment)
            comment_no_comma_eol = (comment | eol).suppress()
            number_expr = Regex(regexp_numbers, re.DOTALL).setParseAction(convert_number)

            period_types = itertools.chain.from_iterable(cls.get_supported_period_type_map().values())
            period_expr = Regex(r'(?P<value>\d+)\s*(?P<unit>' + '|'.join(period_types) + ')$'
                                ).setParseAction(convert_period)

            # multi line string using """
            # Using fix described in http://pyparsing.wikispaces.com/share/view/3778969
            multiline_string = Regex('""".*?"*"""', re.DOTALL | re.UNICODE).setParseAction(parse_multi_string)
            # single quoted line string
            quoted_string = Regex(r'"(?:[^"\\\n]|\\.)*"[ \t]*', re.UNICODE).setParseAction(create_quoted_string)
            # unquoted string that takes the rest of the line until an optional comment
            # we support .properties multiline support which is like this:
            # line1 \
            # line2 \
            # so a backslash precedes the \n
            unquoted_string = Regex(r'(?:[^^`+?!@*&"\[\{\s\]\}#,=\$\\]|\\.)+[ \t]*',
                                    re.UNICODE).setParseAction(unescape_string)
            substitution_expr = Regex(r'[ \t]*\$\{[^\}]+\}[ \t]*').setParseAction(create_substitution)
            string_expr = multiline_string | quoted_string | unquoted_string

            value_expr = period_expr | number_expr | true_expr | false_expr | null_expr | string_expr

            include_content = (quoted_string | ((Keyword('url') | Keyword(
                'file')) - Literal('(').suppress() - quoted_string - Literal(')').suppress()))
            include_expr = (
                Keyword("include", caseless=True).suppress() + (
                    include_content | (
                        Keyword("required") - Literal('(').suppress() - include_content - Literal(')').suppress()
                    )
                )
            ).setParseAction(include_config)

            root_dict_expr = Forward()
            dict_expr = Forward()
            list_expr = Forward()
            multi_value_expr = ZeroOrMore(comment_eol | include_expr | substitution_expr |
                                          dict_expr | list_expr | value_expr | (Literal('\\') - eol).suppress())
            # for a dictionary : or = is optional
            # last zeroOrMore is because we can have t = {a:4} {b: 6} {c: 7} which is dictionary concatenation
            inside_dict_expr = ConfigTreeParser(ZeroOrMore(comment_eol | include_expr | assign_expr | eol_comma))
            inside_root_dict_expr = ConfigTreeParser(ZeroOrMore(
                comment_eol | include_expr | assign_expr | eol_comma), root=True)
            dict_expr << Suppress('{') - inside_dict_expr - Suppress('}')
            root_dict_expr << Suppress('{') - inside_root_dict_expr - Suppress('}')
            list_entry = ConcatenatedValueParser(multi_value_expr)
            list_expr << Suppress('[') - ListParser(list_entry - ZeroOrMore(eol_comma - list_entry)) - Suppress(']')

            # special case when we have a value assignment where the string can potentially be the remainder of the line
            assign_expr << Group(key - ZeroOrMore(comment_no_comma_eol) -
                                 (dict_expr | (Literal('=') | Literal(':') | Literal('+=')) -
                                  ZeroOrMore(comment_no_comma_eol) - ConcatenatedValueParser(multi_value_expr)))

            # the file can be { ... } where {} can be omitted or []
            config_expr = ZeroOrMore(comment_eol | eol) + (list_expr | root_dict_expr |
                                                           inside_root_dict_expr) + ZeroOrMore(comment_eol | eol_comma)
            config = config_expr.parseString(content, parseAll=True)[0]

            if resolve:
                allow_unresolved = resolve and unresolved_value is not DEFAULT_SUBSTITUTION and \
                    unresolved_value is not MANDATORY_SUBSTITUTION
                has_unresolved = cls.resolve_substitutions(config, allow_unresolved)
                if has_unresolved and unresolved_value is MANDATORY_SUBSTITUTION:
                    raise ConfigSubstitutionException(
                        'resolve cannot be set to True and unresolved_value to MANDATORY_SUBSTITUTION')

            if unresolved_value is not NO_SUBSTITUTION and unresolved_value is not DEFAULT_SUBSTITUTION:
                cls.unresolve_substitutions_to_value(config, unresolved_value)
        return config
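A quick sanity check of the grammar and substitution resolution above (illustrative only):

```python
conf = ConfigFactory.parse_string("""
base_dir = /opt/clearml
cache_dir = ${base_dir}/cache
maybe = ${?UNDEFINED_KEY}
""")
print(conf.get('cache_dir'))    # '/opt/clearml/cache'
print(conf.get('maybe', None))  # None - an unresolved optional substitution is dropped
```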
    @classmethod
    def _resolve_variable(cls, config, substitution):
        """
        :param config:
        :param substitution:
        :return: (is_resolved, resolved_variable)
        """
        variable = substitution.variable
        try:
            return True, config.get(variable)
        except ConfigMissingException:
            # default to environment variable
            value = os.environ.get(variable)

            if value is None:
                if substitution.optional:
                    return False, None
                else:
                    raise ConfigSubstitutionException(
                        "Cannot resolve variable ${{{variable}}} (line: {line}, col: {col})".format(
                            variable=variable,
                            line=lineno(substitution.loc, substitution.instring),
                            col=col(substitution.loc, substitution.instring)))
            elif isinstance(value, ConfigList) or isinstance(value, ConfigTree):
                raise ConfigSubstitutionException(
                    "Cannot substitute variable ${{{variable}}} because it does not point to a "
                    "string, int, float, boolean or null {type} (line:{line}, col: {col})".format(
                        variable=variable,
                        type=value.__class__.__name__,
                        line=lineno(substitution.loc, substitution.instring),
                        col=col(substitution.loc, substitution.instring)))
            return True, value

    @classmethod
    def _fixup_self_references(cls, config, accept_unresolved=False):
        if isinstance(config, ConfigTree) and config.root:
            for key in config:  # Traverse history of element
                history = config.history[key]
                previous_item = history[0]
                for current_item in history[1:]:
                    for substitution in cls._find_substitutions(current_item):
                        prop_path = ConfigTree.parse_key(substitution.variable)
                        if len(prop_path) > 1 and config.get(substitution.variable, None) is not None:
                            continue  # If value is present in latest version, don't do anything
                        if prop_path[0] == key:
                            if isinstance(previous_item, ConfigValues) and not accept_unresolved:
                                # We hit a dead end, we cannot evaluate
                                raise ConfigSubstitutionException(
                                    "Property {variable} cannot be substituted. Check for cycles.".format(
                                        variable=substitution.variable
                                    )
                                )
                            else:
                                value = previous_item if len(
                                    prop_path) == 1 else previous_item.get(".".join(prop_path[1:]))
                                _, _, current_item = cls._do_substitute(substitution, value)
                    previous_item = current_item

                if len(history) == 1:
                    for substitution in cls._find_substitutions(previous_item):
                        prop_path = ConfigTree.parse_key(substitution.variable)
                        if len(prop_path) > 1 and config.get(substitution.variable, None) is not None:
                            continue  # If value is present in latest version, don't do anything
                        if prop_path[0] == key and substitution.optional:
                            cls._do_substitute(substitution, None)
                        if prop_path[0] == key:
                            value = os.environ.get(key)
                            if value is not None:
                                cls._do_substitute(substitution, value)
                                continue
                            if substitution.optional:  # special case, when self optional referencing without existing
                                cls._do_substitute(substitution, None)

    # traverse config to find all the substitutions
    @classmethod
    def _find_substitutions(cls, item):
        """Collect all substitution tokens contained in an item

        :return: list of substitutions
        :type return: list
        """
        if isinstance(item, ConfigValues):
            return item.get_substitutions()

        substitutions = []
        elements = []
        if isinstance(item, ConfigTree):
            elements = item.values()
        elif isinstance(item, list):
            elements = item

        for child in elements:
            substitutions += cls._find_substitutions(child)
        return substitutions

    @classmethod
    def _do_substitute(cls, substitution, resolved_value, is_optional_resolved=True):
        unresolved = False
        new_substitutions = []
        if isinstance(resolved_value, ConfigValues):
            resolved_value = resolved_value.transform()
        if isinstance(resolved_value, ConfigValues):
            unresolved = True
            result = resolved_value
        else:
            # replace token by substitution
            config_values = substitution.parent
            # if it is a string, then add the extra ws that was present in the original string after the substitution
            formatted_resolved_value = resolved_value \
                if resolved_value is None \
                or isinstance(resolved_value, (dict, list)) \
                or substitution.index == len(config_values.tokens) - 1 \
                else (str(resolved_value) + substitution.ws)
            # use a deepcopy of resolved_value to avoid mutation
            config_values.put(substitution.index, copy.deepcopy(formatted_resolved_value))
            transformation = config_values.transform()
            result = config_values.overriden_value \
                if transformation is None and not is_optional_resolved \
                else transformation

            if result is None and config_values.key in config_values.parent:
                del config_values.parent[config_values.key]
            else:
                config_values.parent[config_values.key] = result
                s = cls._find_substitutions(result)
                if s:
                    new_substitutions = s
                    unresolved = True

        return (unresolved, new_substitutions, result)

    @classmethod
    def _final_fixup(cls, item):
        if isinstance(item, ConfigValues):
            return item.transform()
        elif isinstance(item, list):
            return list([cls._final_fixup(child) for child in item])
        elif isinstance(item, ConfigTree):
            items = list(item.items())
            for key, child in items:
                item[key] = cls._final_fixup(child)
        return item

    @classmethod
    def unresolve_substitutions_to_value(cls, config, unresolved_value=STR_SUBSTITUTION):
        for substitution in cls._find_substitutions(config):
            if unresolved_value is STR_SUBSTITUTION:
                value = substitution.raw_str()
            elif unresolved_value is None:
                value = NoneValue()
            else:
                value = unresolved_value
            cls._do_substitute(substitution, value, False)
        cls._final_fixup(config)

    @classmethod
    def resolve_substitutions(cls, config, accept_unresolved=False):
        has_unresolved = False
        cls._fixup_self_references(config, accept_unresolved)
        substitutions = cls._find_substitutions(config)
        if len(substitutions) > 0:
            unresolved = True
            any_unresolved = True
            _substitutions = []
            cache = {}
            while any_unresolved and len(substitutions) > 0 and set(substitutions) != set(_substitutions):
                unresolved = False
                any_unresolved = True
                _substitutions = substitutions[:]

                for substitution in _substitutions:
                    is_optional_resolved, resolved_value = cls._resolve_variable(config, substitution)

                    # if the substitution is optional
                    if not is_optional_resolved and substitution.optional:
                        resolved_value = None
                    if isinstance(resolved_value, ConfigValues):
                        parents = cache.get(resolved_value)
                        if parents is None:
                            parents = []
                            link = resolved_value
                            while isinstance(link, ConfigValues):
                                parents.append(link)
                                link = link.overriden_value
                            cache[resolved_value] = parents

                    if isinstance(resolved_value, ConfigValues) \
                            and substitution.parent in parents \
                            and hasattr(substitution.parent, 'overriden_value') \
                            and substitution.parent.overriden_value:

                        # self resolution, backtrack
                        resolved_value = substitution.parent.overriden_value

                    unresolved, new_substitutions, result = cls._do_substitute(
                        substitution, resolved_value, is_optional_resolved)
                    any_unresolved = unresolved or any_unresolved
                    substitutions.extend(new_substitutions)
                    if not isinstance(result, ConfigValues):
                        substitutions.remove(substitution)

            cls._final_fixup(config)
            if unresolved:
                has_unresolved = True
                if not accept_unresolved:
                    raise ConfigSubstitutionException("Cannot resolve {variables}. Check for cycles.".format(
                        variables=', '.join('${{{variable}}}: (line: {line}, col: {col})'.format(
                            variable=substitution.variable,
                            line=lineno(substitution.loc, substitution.instring),
                            col=col(substitution.loc, substitution.instring)) for substitution in substitutions)))

        cls._final_fixup(config)
        return has_unresolved


class ListParser(TokenConverter):
    """Parse a list [elt1, elt2, ...]
    """

    def __init__(self, expr=None):
        super(ListParser, self).__init__(expr)
        self.saveAsList = True

    def postParse(self, instring, loc, token_list):
        """Create a list from the tokens

        :param instring:
        :param loc:
        :param token_list:
        :return:
        """
        cleaned_token_list = [token for tokens in (token.tokens if isinstance(token, ConfigInclude) else [token]
                                                   for token in token_list if token != '')
                              for token in tokens]
        config_list = ConfigList(cleaned_token_list)
        return [config_list]


class ConcatenatedValueParser(TokenConverter):
    def __init__(self, expr=None):
        super(ConcatenatedValueParser, self).__init__(expr)
        self.parent = None
        self.key = None

    def postParse(self, instring, loc, token_list):
        config_values = ConfigValues(token_list, instring, loc)
        return [config_values.transform()]


class ConfigTreeParser(TokenConverter):
    """
    Parse a config tree from tokens
    """

    def __init__(self, expr=None, root=False):
        super(ConfigTreeParser, self).__init__(expr)
        self.root = root
        self.saveAsList = True

    def postParse(self, instring, loc, token_list):
        """Create ConfigTree from tokens

        :param instring:
        :param loc:
        :param token_list:
        :return:
        """
        config_tree = ConfigTree(root=self.root)
        for element in token_list:
            expanded_tokens = element.tokens if isinstance(element, ConfigInclude) else [element]

            for tokens in expanded_tokens:
                # key, value1 (optional), ...
                key = tokens[0].strip() if isinstance(tokens[0], (unicode, basestring)) else tokens[0]
                operator = '='
                if len(tokens) == 3 and tokens[1].strip() in [':', '=', '+=']:
                    operator = tokens[1].strip()
                    values = tokens[2:]
                elif len(tokens) == 2:
                    values = tokens[1:]
                else:
                    raise ParseSyntaxException("Unknown tokens {tokens} received".format(tokens=tokens))
                # empty string
                if len(values) == 0:
                    config_tree.put(key, '')
                else:
                    value = values[0]
                    if isinstance(value, list) and operator == "+=":
                        value = ConfigValues([ConfigSubstitution(key, True, '', False, loc), value], False, loc)
                        config_tree.put(key, value, False)
                    elif isinstance(value, unicode) and operator == "+=":
                        value = ConfigValues([ConfigSubstitution(key, True, '', True, loc), ' ' + value], True, loc)
                        config_tree.put(key, value, False)
                    elif isinstance(value, list):
                        config_tree.put(key, value, False)
                    else:
                        existing_value = config_tree.get(key, None)
                        if isinstance(value, ConfigTree) and not isinstance(existing_value, list):
                            # Only Tree has to be merged with tree
                            config_tree.put(key, value, True)
                        elif isinstance(value, ConfigValues):
                            conf_value = value
                            value.parent = config_tree
                            value.key = key
                            if isinstance(existing_value, list) or isinstance(existing_value, ConfigTree):
                                config_tree.put(key, conf_value, True)
                            else:
                                config_tree.put(key, conf_value, False)
                        else:
                            config_tree.put(key, value, False)
        return config_tree
clearml_agent/external/pyhocon/config_tree.py (vendored, new file, 608 lines)
@@ -0,0 +1,608 @@
from collections import OrderedDict
from pyparsing import lineno
from pyparsing import col
try:
    basestring
except NameError:  # pragma: no cover
    basestring = str
    unicode = str

import re
import copy
from .exceptions import ConfigException, ConfigWrongTypeException, ConfigMissingException


class UndefinedKey(object):
    pass


class NonExistentKey(object):
    pass


class NoneValue(object):
    pass


class ConfigTree(OrderedDict):
    KEY_SEP = '.'

    def __init__(self, *args, **kwds):
        self.root = kwds.pop('root') if 'root' in kwds else False
        if self.root:
            self.history = {}
        super(ConfigTree, self).__init__(*args, **kwds)
        for key, value in self.items():
            if isinstance(value, ConfigValues):
                value.parent = self
                value.index = key

    @staticmethod
    def merge_configs(a, b, copy_trees=False):
        """Merge config b into a

        :param a: target config
        :type a: ConfigTree
        :param b: source config
        :type b: ConfigTree
        :return: merged config a
        """
        for key, value in b.items():
            # if key is in both a and b and both values are dictionary then merge it otherwise override it
            if key in a and isinstance(a[key], ConfigTree) and isinstance(b[key], ConfigTree):
                if copy_trees:
                    a[key] = a[key].copy()
                ConfigTree.merge_configs(a[key], b[key], copy_trees=copy_trees)
            else:
                if isinstance(value, ConfigValues):
                    value.parent = a
                    value.key = key
                    if key in a:
                        value.overriden_value = a[key]
                a[key] = value
                if a.root:
                    if b.root:
                        a.history[key] = a.history.get(key, []) + b.history.get(key, [value])
                    else:
                        a.history[key] = a.history.get(key, []) + [value]

        return a
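Merge semantics in one example (a sketch, assuming `ConfigFactory` from the sibling `config_parser` module): nested trees are merged key by key, scalars are overridden by the source config.

```python
a = ConfigFactory.parse_string('agent { cpu_only = false, cache = /tmp }')
b = ConfigFactory.parse_string('agent { cpu_only = true }')
merged = ConfigTree.merge_configs(a, b)
print(merged.get('agent.cpu_only'))  # True   (b overrides the scalar)
print(merged.get('agent.cache'))     # '/tmp' (keys missing from b survive)
```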
    def _put(self, key_path, value, append=False):
        key_elt = key_path[0]
        if len(key_path) == 1:
            # if value to set does not exist, override
            # if they are both configs then merge
            # if not then override
            if key_elt in self and isinstance(self[key_elt], ConfigTree) and isinstance(value, ConfigTree):
                if self.root:
                    new_value = ConfigTree.merge_configs(ConfigTree(), self[key_elt], copy_trees=True)
                    new_value = ConfigTree.merge_configs(new_value, value, copy_trees=True)
                    self._push_history(key_elt, new_value)
                    self[key_elt] = new_value
                else:
                    ConfigTree.merge_configs(self[key_elt], value)
            elif append:
                # If we have t=1
                # and we try to put t.a=5 then t is replaced by {a: 5}
                l_value = self.get(key_elt, None)
                if isinstance(l_value, ConfigValues):
                    l_value.tokens.append(value)
                    l_value.recompute()
                elif isinstance(l_value, ConfigTree) and isinstance(value, ConfigValues):
                    value.overriden_value = l_value
                    value.tokens.insert(0, l_value)
                    value.recompute()
                    value.parent = self
                    value.key = key_elt
                    self._push_history(key_elt, value)
                    self[key_elt] = value
                elif isinstance(l_value, list) and isinstance(value, ConfigValues):
                    self._push_history(key_elt, value)
                    value.overriden_value = l_value
                    value.parent = self
                    value.key = key_elt
                    self[key_elt] = value
                elif isinstance(l_value, list):
                    self[key_elt] = l_value + value
                    self._push_history(key_elt, l_value)
                elif l_value is None:
                    self._push_history(key_elt, value)
                    self[key_elt] = value

                else:
                    raise ConfigWrongTypeException(
                        u"Cannot concatenate the list {key}: {value} to {prev_value} of {type}".format(
                            key='.'.join(key_path),
                            value=value,
                            prev_value=l_value,
                            type=l_value.__class__.__name__)
                    )
            else:
                # if there was an override keep the override value
                if isinstance(value, ConfigValues):
                    value.parent = self
                    value.key = key_elt
                    value.overriden_value = self.get(key_elt, None)
                self._push_history(key_elt, value)
                self[key_elt] = value
        else:
            next_config_tree = super(ConfigTree, self).get(key_elt)
            if not isinstance(next_config_tree, ConfigTree):
                # create a new dictionary or overwrite a previous value
                next_config_tree = ConfigTree()
                self._push_history(key_elt, next_config_tree)
                self[key_elt] = next_config_tree
            next_config_tree._put(key_path[1:], value, append)

    def _push_history(self, key, value):
        if self.root:
            hist = self.history.get(key)
            if hist is None:
                hist = self.history[key] = []
            hist.append(value)

    def _get(self, key_path, key_index=0, default=UndefinedKey):
        key_elt = key_path[key_index]
        elt = super(ConfigTree, self).get(key_elt, UndefinedKey)

        if elt is UndefinedKey:
            if default is UndefinedKey:
                raise ConfigMissingException(u"No configuration setting found for key {key}".format(
                    key='.'.join(key_path[: key_index + 1])))
            else:
                return default

        if key_index == len(key_path) - 1:
            if isinstance(elt, NoneValue):
                return None
            elif isinstance(elt, list):
                return [None if isinstance(x, NoneValue) else x for x in elt]
            else:
                return elt
        elif isinstance(elt, ConfigTree):
            return elt._get(key_path, key_index + 1, default)
        else:
            if default is UndefinedKey:
                raise ConfigWrongTypeException(
                    u"{key} has type {type} rather than dict".format(key='.'.join(key_path[:key_index + 1]),
                                                                     type=type(elt).__name__))
            else:
                return default

    @staticmethod
    def parse_key(string):
        """
        Split a key into path elements:
        - a.b.c => a, b, c
        - a."b.c" => a, QuotedKey("b.c") if . is any of the special characters: $}[]:=+#`^?!@*&.
        - "a" => a
        - a.b."c" => a, b, c (special case)
        :param string: either string key (parse '.' as sub-key) or int / float as regular keys
        :return:
        """
        if isinstance(string, (int, float)):
            return [string]

        special_characters = '$}[]:=+#`^?!@*&.'
        tokens = re.findall(
            r'"[^"]+"|[^{special_characters}]+'.format(special_characters=re.escape(special_characters)),
            string)

        def contains_special_character(token):
            return any((c in special_characters) for c in token)

        return [token if contains_special_character(token) else token.strip('"') for token in tokens]

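What `parse_key` returns for the docstring's cases (illustrative):

```python
print(ConfigTree.parse_key('a.b.c'))    # ['a', 'b', 'c']
print(ConfigTree.parse_key('a."b.c"'))  # ['a', '"b.c"'] - quoted segment kept whole
print(ConfigTree.parse_key(0))          # [0] - numeric keys pass through untouched
```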
    def put(self, key, value, append=False):
        """Put a value in the tree (dot separated)

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param value: value to put
        """
        self._put(ConfigTree.parse_key(key), value, append)

    def get(self, key, default=UndefinedKey):
        """Get a value from the tree

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param default: default value if key not found
        :type default: object
        :return: value in the tree located at key
        """
        return self._get(ConfigTree.parse_key(key), 0, default)

    def get_string(self, key, default=UndefinedKey):
        """Return string representation of value found at key

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param default: default value if key not found
        :type default: basestring
        :return: string value
        :type return: basestring
        """
        value = self.get(key, default)
        if value is None:
            return None

        string_value = unicode(value)
        if isinstance(value, bool):
            string_value = string_value.lower()
        return string_value

    def pop(self, key, default=UndefinedKey):
        """Remove specified key and return the corresponding value.
        If key is not found, default is returned if given, otherwise ConfigMissingException is raised

        This method assumes the user wants to remove the last value in the chain so it parses via parse_key
        and pops the last value out of the dict.

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param default: default value if key not found
        :type default: object
        :return: value in the tree located at key
        """
        if default != UndefinedKey and key not in self:
            return default

        value = self.get(key, UndefinedKey)
        lst = ConfigTree.parse_key(key)
        parent = self.KEY_SEP.join(lst[0:-1])
        child = lst[-1]

        if parent:
            self.get(parent).__delitem__(child)
        else:
            self.__delitem__(child)
        return value
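`pop` removes only the leaf key, leaving the parent tree intact. A sketch (assuming `ConfigFactory` for brevity):

```python
conf = ConfigFactory.parse_string('agent { worker_id = gpu-0, queue = default }')
print(conf.pop('agent.worker_id'))        # 'gpu-0'
print(conf.get('agent.worker_id', None))  # None - the key is gone
print(conf.get('agent.queue'))            # 'default' - siblings untouched
```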
    def get_int(self, key, default=UndefinedKey):
        """Return int representation of value found at key

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param default: default value if key not found
        :type default: int
        :return: int value
        :type return: int
        """
        value = self.get(key, default)
        try:
            return int(value) if value is not None else None
        except (TypeError, ValueError):
            raise ConfigException(
                u"{key} has type '{type}' rather than 'int'".format(key=key, type=type(value).__name__))

    def get_float(self, key, default=UndefinedKey):
        """Return float representation of value found at key

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param default: default value if key not found
        :type default: float
        :return: float value
        :type return: float
        """
        value = self.get(key, default)
        try:
            return float(value) if value is not None else None
        except (TypeError, ValueError):
            raise ConfigException(
                u"{key} has type '{type}' rather than 'float'".format(key=key, type=type(value).__name__))

    def get_bool(self, key, default=UndefinedKey):
        """Return boolean representation of value found at key

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param default: default value if key not found
        :type default: bool
        :return: boolean value
        :type return: bool
        """

        # String conversions as per API-recommendations:
        # https://github.com/typesafehub/config/blob/master/HOCON.md#automatic-type-conversions
        bool_conversions = {
            None: None,
            'true': True, 'yes': True, 'on': True,
            'false': False, 'no': False, 'off': False
        }
        string_value = self.get_string(key, default)
        if string_value is not None:
            string_value = string_value.lower()
        try:
            return bool_conversions[string_value]
        except KeyError:
            raise ConfigException(
                u"{key} does not translate to a Boolean value".format(key=key))
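`get_bool` accepts the HOCON spellings in the conversion table above (sketch, assuming `ConfigFactory`):

```python
conf = ConfigFactory.parse_string('a = yes\nb = off\nc = true')
print(conf.get_bool('a'), conf.get_bool('b'), conf.get_bool('c'))  # True False True
```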
    def get_list(self, key, default=UndefinedKey):
        """Return list representation of value found at key

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param default: default value if key not found
        :type default: list
        :return: list value
        :type return: list
        """
        value = self.get(key, default)
        if isinstance(value, list):
            return value
        elif isinstance(value, ConfigTree):
            lst = []
            for k, v in sorted(value.items(), key=lambda kv: kv[0]):
                if re.match('^[1-9][0-9]*$|0', k):
                    lst.append(v)
                else:
                    raise ConfigException(u"{key} does not translate to a list".format(key=key))
            return lst
        elif value is None:
            return None
        else:
            raise ConfigException(
                u"{key} has type '{type}' rather than 'list'".format(key=key, type=type(value).__name__))
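A tree whose keys are all numeric indices is also accepted as a list. A sketch using `from_dict` so the index keys stay strings (an assumption about key handling worth noting):

```python
conf = ConfigFactory.from_dict({'args': {'0': '--verbose', '1': '--gpus=all'}})
print(conf.get_list('args'))  # ['--verbose', '--gpus=all']
```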
    def get_config(self, key, default=UndefinedKey):
        """Return tree config representation of value found at key

        :param key: key to use (dot separated). E.g., a.b.c
        :type key: basestring
        :param default: default value if key not found
        :type default: config
        :return: config value
        :type return: ConfigTree
        """
        value = self.get(key, default)
        if isinstance(value, dict):
            return value
        elif value is None:
            return None
        else:
            raise ConfigException(
                u"{key} has type '{type}' rather than 'config'".format(key=key, type=type(value).__name__))

    def __getitem__(self, item):
        val = self.get(item)
        if val is UndefinedKey:
            raise KeyError(item)
        return val

    try:
        from collections import _OrderedDictItemsView
    except ImportError:  # pragma: nocover
        pass
    else:
        def items(self):  # pragma: nocover
            return self._OrderedDictItemsView(self)

    def __getattr__(self, item):
        val = self.get(item, NonExistentKey)
        if val is NonExistentKey:
            return super(ConfigTree, self).__getattr__(item)
        return val

    def __contains__(self, item):
        return self._get(self.parse_key(item), default=NoneValue) is not NoneValue

    def with_fallback(self, config, resolve=True):
        """
        return a new config with fallback on config
        :param config: config or filename of the config to fallback on
        :param resolve: resolve substitutions
        :return: new config with fallback on config
        """
        if isinstance(config, ConfigTree):
            result = ConfigTree.merge_configs(copy.deepcopy(config), copy.deepcopy(self))
        else:
            from . import ConfigFactory
            result = ConfigTree.merge_configs(ConfigFactory.parse_file(config, resolve=False), copy.deepcopy(self))

        if resolve:
            from . import ConfigParser
            ConfigParser.resolve_substitutions(result)
        return result
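`with_fallback` layers `self` on top of another config: values missing here are taken from the fallback, values present here win. A sketch:

```python
defaults = ConfigFactory.parse_string('agent { queue = default, cpu_only = false }')
local = ConfigFactory.parse_string('agent { queue = gpu }')
effective = local.with_fallback(defaults)
print(effective.get('agent.queue'))          # 'gpu'  (local wins)
print(effective.get_bool('agent.cpu_only'))  # False  (taken from the fallback)
```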
    def as_plain_ordered_dict(self):
        """return a deep copy of this config as a plain OrderedDict

        The config tree should be fully resolved.

        This is useful to get an object with no special semantics such as path expansion for the keys.
        In particular this means that keys that contain dots are not surrounded with '"' in the plain OrderedDict.

        :return: this config as an OrderedDict
        :type return: OrderedDict
        """
        def plain_value(v):
            if isinstance(v, list):
                return [plain_value(e) for e in v]
            elif isinstance(v, ConfigTree):
                return v.as_plain_ordered_dict()
            else:
                if isinstance(v, ConfigValues):
                    raise ConfigException("The config tree contains unresolved elements")
                return v

        return OrderedDict((key.strip('"') if isinstance(key, (unicode, basestring)) else key, plain_value(value))
                           for key, value in self.items())
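A sketch of the quoted-key stripping described in the docstring (assuming `ConfigFactory.parse_string` keeps the quoted key intact in the tree, as documented for `parse_key`):

```python
conf = ConfigFactory.parse_string('server { "host.name" = localhost, port = 8080 }')
plain = conf.as_plain_ordered_dict()
print(plain['server']['host.name'], plain['server']['port'])  # localhost 8080
```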
class ConfigList(list):
    def __init__(self, iterable=[]):
        new_list = list(iterable)
        super(ConfigList, self).__init__(new_list)
        for index, value in enumerate(new_list):
            if isinstance(value, ConfigValues):
                value.parent = self
                value.key = index


class ConfigInclude(object):
    def __init__(self, tokens):
        self.tokens = tokens


class ConfigValues(object):
    def __init__(self, tokens, instring, loc):
        self.tokens = tokens
        self.parent = None
        self.key = None
        self._instring = instring
        self._loc = loc
        self.overriden_value = None
        self.recompute()

    def recompute(self):
        for index, token in enumerate(self.tokens):
            if isinstance(token, ConfigSubstitution):
                token.parent = self
                token.index = index

        # no value return empty string
        if len(self.tokens) == 0:
            self.tokens = ['']

        # if the last token is an unquoted string then right strip it
        if isinstance(self.tokens[-1], ConfigUnquotedString):
            # rstrip only whitespaces, not \n\r because they would have been used escaped
            self.tokens[-1] = self.tokens[-1].rstrip(' \t')

    def has_substitution(self):
        return len(self.get_substitutions()) > 0

    def get_substitutions(self):
        lst = []
        node = self
        while node:
            lst = [token for token in node.tokens if isinstance(token, ConfigSubstitution)] + lst
            if hasattr(node, 'overriden_value'):
                node = node.overriden_value
                if not isinstance(node, ConfigValues):
                    break
            else:
                break
        return lst

    def transform(self):
        def determine_type(token):
            return ConfigTree if isinstance(token, ConfigTree) else ConfigList if isinstance(token, list) else str

        def format_str(v, last=False):
            if isinstance(v, ConfigQuotedString):
                return v.value + ('' if last else v.ws)
            else:
                return '' if v is None else unicode(v)

        if self.has_substitution():
            return self

        # remove None tokens
        tokens = [token for token in self.tokens if token is not None]

        if not tokens:
            return None

        # check if all tokens are compatible
        first_tok_type = determine_type(tokens[0])
        for index, token in enumerate(tokens[1:]):
            tok_type = determine_type(token)
            if first_tok_type is not tok_type:
                raise ConfigWrongTypeException(
                    "Token '{token}' of type {tok_type} (index {index}) must be of type {req_tok_type} "
                    "(line: {line}, col: {col})".format(
                        token=token,
                        index=index + 1,
                        tok_type=tok_type.__name__,
                        req_tok_type=first_tok_type.__name__,
                        line=lineno(self._loc, self._instring),
                        col=col(self._loc, self._instring)))

        if first_tok_type is ConfigTree:
            child = []
            if hasattr(self, 'overriden_value'):
                node = self.overriden_value
                while node:
                    if isinstance(node, ConfigValues):
                        value = node.transform()
                        if isinstance(value, ConfigTree):
                            child.append(value)
                        else:
                            break
                    elif isinstance(node, ConfigTree):
                        child.append(node)
                    else:
                        break
                    if hasattr(node, 'overriden_value'):
                        node = node.overriden_value
                    else:
                        break

            result = ConfigTree()
            for conf in reversed(child):
                ConfigTree.merge_configs(result, conf, copy_trees=True)
            for token in tokens:
                ConfigTree.merge_configs(result, token, copy_trees=True)
            return result
        elif first_tok_type is ConfigList:
            result = []
            main_index = 0
            for sublist in tokens:
                sublist_result = ConfigList()
|
||||
for token in sublist:
|
||||
if isinstance(token, ConfigValues):
|
||||
token.parent = result
|
||||
token.key = main_index
|
||||
main_index += 1
|
||||
sublist_result.append(token)
|
||||
result.extend(sublist_result)
|
||||
return result
|
||||
else:
|
||||
if len(tokens) == 1:
|
||||
if isinstance(tokens[0], ConfigQuotedString):
|
||||
return tokens[0].value
|
||||
return tokens[0]
|
||||
else:
|
||||
return ''.join(format_str(token) for token in tokens[:-1]) + format_str(tokens[-1], True)
|
||||
|
||||
def put(self, index, value):
|
||||
self.tokens[index] = value
|
||||
|
||||
def __repr__(self): # pragma: no cover
|
||||
return '[ConfigValues: ' + ','.join(str(o) for o in self.tokens) + ']'
|
||||
|
||||
|
||||
class ConfigSubstitution(object):
|
||||
def __init__(self, variable, optional, ws, instring, loc):
|
||||
self.variable = variable
|
||||
self.optional = optional
|
||||
self.ws = ws
|
||||
self.index = None
|
||||
self.parent = None
|
||||
self.instring = instring
|
||||
self.loc = loc
|
||||
|
||||
def __repr__(self): # pragma: no cover
|
||||
return '[ConfigSubstitution: ' + self.variable + ']'
|
||||
|
||||
|
||||
class ConfigUnquotedString(unicode):
|
||||
def __new__(cls, value):
|
||||
return super(ConfigUnquotedString, cls).__new__(cls, value)
|
||||
|
||||
|
||||
class ConfigQuotedString(object):
|
||||
def __init__(self, value, ws, instring, loc):
|
||||
self.value = value
|
||||
self.ws = ws
|
||||
self.instring = instring
|
||||
self.loc = loc
|
||||
|
||||
def __repr__(self): # pragma: no cover
|
||||
return '[ConfigQuotedString: ' + self.value + ']'
|
||||
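Taken together, these `ConfigTree` helpers behave like a nested `OrderedDict` with dotted-key access. A minimal sketch of how they are typically exercised (assuming the vendored package is importable as `clearml_agent.external.pyhocon`):

```python
from clearml_agent.external.pyhocon import ConfigFactory

conf = ConfigFactory.parse_string('agent { worker_id = "w1", cpu_only = true }')

# dotted-key access resolves through nested trees
assert conf['agent.worker_id'] == 'w1'

# get_config() returns the nested ConfigTree, raising ConfigException for non-dict values
agent_conf = conf.get_config('agent')

# with_fallback() merges another config underneath this one: self wins, fallback fills gaps
merged = conf.with_fallback(ConfigFactory.parse_string('agent { cpu_only = false, queues = ["default"] }'))
assert merged['agent.cpu_only'] is True
assert merged['agent.queues'] == ['default']

# as_plain_ordered_dict() strips the HOCON semantics for plain-dict consumers
plain = merged.as_plain_ordered_dict()
```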
clearml_agent/external/pyhocon/converter.py (vendored, new file, 329 lines)
@@ -0,0 +1,329 @@
import json
import re
import sys

from . import ConfigFactory
from .config_tree import ConfigQuotedString
from .config_tree import ConfigSubstitution
from .config_tree import ConfigTree
from .config_tree import ConfigValues
from .config_tree import NoneValue


try:
    basestring
except NameError:
    basestring = str
    unicode = str


class HOCONConverter(object):
    _number_re = r'[+-]?(\d*\.\d+|\d+(\.\d+)?)([eE][+\-]?\d+)?(?=$|[ \t]*([\$\}\],#\n\r]|//))'
    _number_re_matcher = re.compile(_number_re)

    @classmethod
    def to_json(cls, config, compact=False, indent=2, level=0):
        """Convert HOCON input into a JSON output

        :return: JSON string representation
        :type return: basestring
        """
        lines = ""
        if isinstance(config, ConfigTree):
            if len(config) == 0:
                lines += '{}'
            else:
                lines += '{\n'
                bet_lines = []
                for key, item in config.items():
                    bet_lines.append('{indent}"{key}": {value}'.format(
                        indent=''.rjust((level + 1) * indent, ' '),
                        key=key.strip('"'),  # for dotted keys enclosed with "" to not be interpreted as nested key
                        value=cls.to_json(item, compact, indent, level + 1))
                    )
                lines += ',\n'.join(bet_lines)
                lines += '\n{indent}}}'.format(indent=''.rjust(level * indent, ' '))
        elif isinstance(config, list):
            if len(config) == 0:
                lines += '[]'
            else:
                lines += '[\n'
                bet_lines = []
                for item in config:
                    bet_lines.append('{indent}{value}'.format(
                        indent=''.rjust((level + 1) * indent, ' '),
                        value=cls.to_json(item, compact, indent, level + 1))
                    )
                lines += ',\n'.join(bet_lines)
                lines += '\n{indent}]'.format(indent=''.rjust(level * indent, ' '))
        elif isinstance(config, basestring):
            lines = json.dumps(config)
        elif config is None or isinstance(config, NoneValue):
            lines = 'null'
        elif config is True:
            lines = 'true'
        elif config is False:
            lines = 'false'
        else:
            lines = str(config)
        return lines

    @staticmethod
    def _auto_indent(lines, section):
        # noinspection PyBroadException
        try:
            indent = len(lines) - lines.rindex('\n')
        except Exception:
            indent = len(lines)
        # noinspection PyBroadException
        try:
            section_indent = section.index('\n')
        except Exception:
            section_indent = len(section)
        if section_indent < 3:
            return lines + section

        indent = '\n' + ''.rjust(indent, ' ')
        return lines + indent.join([sec.strip() for sec in section.split('\n')])
        # indent = ''.rjust(indent, ' ')
        # return lines + section.replace('\n', '\n'+indent)

    @classmethod
    def to_hocon(cls, config, compact=False, indent=2, level=0):
        """Convert HOCON input into a HOCON output

        :return: HOCON string representation
        :type return: basestring
        """
        lines = ""
        if isinstance(config, ConfigTree):
            if len(config) == 0:
                lines += '{}'
            else:
                if level > 0:  # don't display { at root level
                    lines += '{\n'
                bet_lines = []

                for key, item in config.items():
                    if compact:
                        full_key = key
                        while isinstance(item, ConfigTree) and len(item) == 1:
                            key, item = next(iter(item.items()))
                            full_key += '.' + key
                    else:
                        full_key = key

                    if isinstance(full_key, float) or \
                            (isinstance(full_key, (basestring, unicode)) and cls._number_re_matcher.match(full_key)):
                        # if key can be casted to float, and it is a string, make sure we quote it
                        full_key = '\"{}\"'.format(full_key)

                    bet_line = ('{indent}{key}{assign_sign} '.format(
                        indent=''.rjust(level * indent, ' '),
                        key=full_key,
                        assign_sign='' if isinstance(item, dict) else ' =',)
                    )
                    value_line = cls.to_hocon(item, compact, indent, level + 1)
                    if isinstance(item, (list, tuple)):
                        bet_lines.append(cls._auto_indent(bet_line, value_line))
                    else:
                        bet_lines.append(bet_line + value_line)
                lines += '\n'.join(bet_lines)

                if level > 0:  # don't display { at root level
                    lines += '\n{indent}}}'.format(indent=''.rjust((level - 1) * indent, ' '))
        elif isinstance(config, (list, tuple)):
            if len(config) == 0:
                lines += '[]'
            else:
                # lines += '[\n'
                lines += '['
                bet_lines = []
                base_len = len(lines)
                skip_comma = False
                for i, item in enumerate(config):
                    if 0 < i and not skip_comma:
                        # if not isinstance(item, (str, int, float)):
                        #     lines += ',\n{indent}'.format(indent=''.rjust(level * indent, ' '))
                        # else:
                        #     lines += ', '
                        lines += ', '

                    skip_comma = False
                    new_line = cls.to_hocon(item, compact, indent, level + 1)
                    lines += new_line
                    if '\n' in new_line or len(lines) - base_len > 80:
                        if i < len(config) - 1:
                            lines += ',\n{indent}'.format(indent=''.rjust(level * indent, ' '))
                            base_len = len(lines)
                            skip_comma = True
                    # bet_lines.append('{value}'.format(value=cls.to_hocon(item, compact, indent, level + 1)))

                # lines += '\n'.join(bet_lines)
                # lines += ', '.join(bet_lines)

                # lines += '\n{indent}]'.format(indent=''.rjust((level - 1) * indent, ' '))
                lines += ']'
        elif isinstance(config, basestring):
            if '\n' in config and len(config) > 1:
                lines = '"""{value}"""'.format(value=config)  # multilines
            else:
                lines = '"{value}"'.format(value=cls.__escape_string(config))
        elif isinstance(config, ConfigValues):
            lines = ''.join(cls.to_hocon(o, compact, indent, level) for o in config.tokens)
        elif isinstance(config, ConfigSubstitution):
            lines = '${'
            if config.optional:
                lines += '?'
            lines += config.variable + '}' + config.ws
        elif isinstance(config, ConfigQuotedString):
            if '\n' in config.value and len(config.value) > 1:
                lines = '"""{value}"""'.format(value=config.value)  # multilines
            else:
                lines = '"{value}"'.format(value=cls.__escape_string(config.value))
        elif config is None or isinstance(config, NoneValue):
            lines = 'null'
        elif config is True:
            lines = 'true'
        elif config is False:
            lines = 'false'
        else:
            lines = str(config)
        return lines

    @classmethod
    def to_yaml(cls, config, compact=False, indent=2, level=0):
        """Convert HOCON input into a YAML output

        :return: YAML string representation
        :type return: basestring
        """
        lines = ""
        if isinstance(config, ConfigTree):
            if len(config) > 0:
                if level > 0:
                    lines += '\n'
                bet_lines = []
                for key, item in config.items():
                    bet_lines.append('{indent}{key}: {value}'.format(
                        indent=''.rjust(level * indent, ' '),
                        key=key.strip('"'),  # for dotted keys enclosed with "" to not be interpreted as nested key
                        value=cls.to_yaml(item, compact, indent, level + 1))
                    )
                lines += '\n'.join(bet_lines)
        elif isinstance(config, list):
            config_list = [line for line in config if line is not None]
            if len(config_list) == 0:
                lines += '[]'
            else:
                lines += '\n'
                bet_lines = []
                for item in config_list:
                    bet_lines.append('{indent}- {value}'.format(indent=''.rjust(level * indent, ' '),
                                                                value=cls.to_yaml(item, compact, indent, level + 1)))
                lines += '\n'.join(bet_lines)
        elif isinstance(config, basestring):
            # if it contains a \n then it's multiline
            lines = config.split('\n')
            if len(lines) == 1:
                lines = config
            else:
                lines = '|\n' + '\n'.join([line.rjust(level * indent, ' ') for line in lines])
        elif config is None or isinstance(config, NoneValue):
            lines = 'null'
        elif config is True:
            lines = 'true'
        elif config is False:
            lines = 'false'
        else:
            lines = str(config)
        return lines

    @classmethod
    def to_properties(cls, config, compact=False, indent=2, key_stack=[]):
        """Convert HOCON input into a .properties output

        :return: .properties string representation
        :type return: basestring
        """

        def escape_value(value):
            return value.replace('=', '\\=').replace('!', '\\!').replace('#', '\\#').replace('\n', '\\\n')

        stripped_key_stack = [key.strip('"') for key in key_stack]
        lines = []
        if isinstance(config, ConfigTree):
            for key, item in config.items():
                if item is not None:
                    lines.append(cls.to_properties(item, compact, indent, stripped_key_stack + [key]))
        elif isinstance(config, list):
            for index, item in enumerate(config):
                if item is not None:
                    lines.append(cls.to_properties(item, compact, indent, stripped_key_stack + [str(index)]))
        elif isinstance(config, basestring):
            lines.append('.'.join(stripped_key_stack) + ' = ' + escape_value(config))
        elif config is True:
            lines.append('.'.join(stripped_key_stack) + ' = true')
        elif config is False:
            lines.append('.'.join(stripped_key_stack) + ' = false')
        elif config is None or isinstance(config, NoneValue):
            pass
        else:
            lines.append('.'.join(stripped_key_stack) + ' = ' + str(config))
        return '\n'.join([line for line in lines if len(line) > 0])

    @classmethod
    def convert(cls, config, output_format='json', indent=2, compact=False):
        converters = {
            'json': cls.to_json,
            'properties': cls.to_properties,
            'yaml': cls.to_yaml,
            'hocon': cls.to_hocon,
        }

        if output_format in converters:
            return converters[output_format](config, compact, indent)
        else:
            raise Exception("Invalid format '{format}'. Format must be 'json', 'properties', 'yaml' or 'hocon'".format(
                format=output_format))

    @classmethod
    def convert_from_file(cls, input_file=None, output_file=None, output_format='json', indent=2, compact=False):
        """Convert to json, properties or yaml

        :param input_file: input file, if not specified stdin
        :param output_file: output file, if not specified stdout
        :param output_format: json, properties or yaml
        :return: json, properties or yaml string representation
        """

        if input_file is None:
            content = sys.stdin.read()
            config = ConfigFactory.parse_string(content)
        else:
            config = ConfigFactory.parse_file(input_file)

        res = cls.convert(config, output_format, indent, compact)
        if output_file is None:
            print(res)
        else:
            with open(output_file, "w") as fd:
                fd.write(res)

    @classmethod
    def __escape_match(cls, match):
        char = match.group(0)
        return {
            '\b': r'\b',
            '\t': r'\t',
            '\n': r'\n',
            '\f': r'\f',
            '\r': r'\r',
            '"': r'\"',
            '\\': r'\\',
        }.get(char) or (r'\u%04x' % ord(char))

    @classmethod
    def __escape_string(cls, string):
        return re.sub(r'[\x00-\x1F"\\]', cls.__escape_match, string)
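A short sketch of how the converter is driven end to end (import paths assumed from the vendored file locations above):

```python
from clearml_agent.external.pyhocon import ConfigFactory
from clearml_agent.external.pyhocon.converter import HOCONConverter

conf = ConfigFactory.parse_string('agent { queues = ["default", "gpu"], cpu_only = false }')

print(HOCONConverter.convert(conf, output_format='json'))   # pretty-printed JSON
print(HOCONConverter.convert(conf, output_format='hocon'))  # HOCON, no braces at root level
print(HOCONConverter.convert(conf, output_format='yaml'))   # indentation-based YAML
```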
clearml_agent/external/pyhocon/exceptions.py (vendored, new file, 17 lines)
@@ -0,0 +1,17 @@
class ConfigException(Exception):

    def __init__(self, message, ex=None):
        super(ConfigException, self).__init__(message)
        self._exception = ex


class ConfigMissingException(ConfigException, KeyError):
    pass


class ConfigSubstitutionException(ConfigException):
    pass


class ConfigWrongTypeException(ConfigException):
    pass
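Because `ConfigMissingException` subclasses both `ConfigException` and `KeyError`, callers can treat a missing key either as a config error or as a plain mapping miss; a minimal sketch:

```python
from clearml_agent.external.pyhocon import ConfigFactory
from clearml_agent.external.pyhocon.exceptions import ConfigMissingException

conf = ConfigFactory.parse_string('agent { worker_id = "w1" }')

# config-aware code catches the specific exception
try:
    value = conf.get('agent.docker_image')
except ConfigMissingException:
    value = 'nvidia/cuda'

# dict-style code that only knows about KeyError works unchanged
try:
    value = conf['agent.docker_image']
except KeyError:
    value = 'nvidia/cuda'
```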
@@ -1,6 +1,9 @@
import os
import re
import warnings

from clearml_agent.definitions import PIP_EXTRA_INDICES

from .requirement import Requirement

@@ -42,9 +45,14 @@ def parse(reqstr, cwd=None):
                yield requirement
        elif line.startswith('-f') or line.startswith('--find-links') or \
                line.startswith('-i') or line.startswith('--index-url') or \
                line.startswith('--extra-index-url') or \
                line.startswith('--no-index'):
            warnings.warn('Private repos not supported. Skipping.')
        elif line.startswith('--extra-index-url'):
            extra_index = line[len('--extra-index-url'):].strip()
            extra_index = re.sub(r"\s+#.*$", "", extra_index)  # strip comments
            if extra_index and extra_index not in PIP_EXTRA_INDICES:
                PIP_EXTRA_INDICES.append(extra_index)
                print(f"appended {extra_index} to list of extra pip indices")
            continue
        elif line.startswith('-Z') or line.startswith('--always-unzip'):
            warnings.warn('Unused option --always-unzip. Skipping.')
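This diff teaches the vendored requirements parser to collect `--extra-index-url` entries into `PIP_EXTRA_INDICES` instead of discarding them. A minimal sketch of the intended effect (the `parse` module path is an assumption inferred from the diff context):

```python
from clearml_agent.definitions import PIP_EXTRA_INDICES
from clearml_agent.external.requirements_parser.parser import parse  # assumed module path

reqs = """
clearml>=1.0
--extra-index-url https://pypi.example.com/simple  # private index
"""

packages = list(parse(reqs))  # yields Requirement objects for the real package lines
# the index URL (comment stripped) should now be captured for later pip invocations
assert 'https://pypi.example.com/simple' in PIP_EXTRA_INDICES
```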
clearml_agent/glue/definitions.py (new file, 7 lines)
@@ -0,0 +1,7 @@
from clearml_agent.definitions import EnvironmentConfig

ENV_START_AGENT_SCRIPT_PATH = EnvironmentConfig('CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH')
"""
Script path to use when creating the bash script to run the agent inside the scheduled pod's docker container.
Script will be appended to the specified file.
"""
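As used later in the k8s glue, the environment config resolves at runtime with a hardcoded fallback; a minimal sketch (assuming `EnvironmentConfig.get()` reads the variable at call time and returns `None` when unset):

```python
import os

from clearml_agent.glue.definitions import ENV_START_AGENT_SCRIPT_PATH

os.environ['CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH'] = '/tmp/start_agent.sh'

# mirrors the lookup in K8sIntegration._kubectl_apply()
start_agent_script_path = ENV_START_AGENT_SCRIPT_PATH.get() or "~/__start_agent__.sh"
# expected: '/tmp/start_agent.sh'
```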
@@ -9,21 +9,30 @@ import os
import re
import subprocess
import tempfile
from collections import defaultdict
from copy import deepcopy
from pathlib import Path
from pprint import pformat
from threading import Thread
from time import sleep
from typing import Text, List, Callable, Any, Collection, Optional, Union
from time import sleep, time
from typing import Text, List, Callable, Any, Collection, Optional, Union, Iterable, Dict, Tuple, Set

import yaml

from clearml_agent.backend_api.session import Request
from clearml_agent.commands.events import Events
from clearml_agent.commands.worker import Worker, get_task_container, set_task_container, get_next_task
from clearml_agent.definitions import ENV_DOCKER_IMAGE
from clearml_agent.definitions import (
    ENV_DOCKER_IMAGE,
    ENV_AGENT_GIT_USER,
    ENV_AGENT_GIT_PASS,
    ENV_FORCE_SYSTEM_SITE_PACKAGES,
)
from clearml_agent.errors import APIError
from clearml_agent.glue.definitions import ENV_START_AGENT_SCRIPT_PATH
from clearml_agent.helper.base import safe_remove_file
from clearml_agent.helper.dicts import merge_dicts
from clearml_agent.helper.process import get_bash_output
from clearml_agent.helper.process import get_bash_output, stringify_bash_output
from clearml_agent.helper.resource_monitor import ResourceMonitor
from clearml_agent.interface.base import ObjectID

@@ -33,19 +42,14 @@ class K8sIntegration(Worker):

    K8S_DEFAULT_NAMESPACE = "clearml"
    AGENT_LABEL = "CLEARML=agent"
    LIMIT_POD_LABEL = "ai.allegro.agent.serial=pod-{pod_number}"

    KUBECTL_APPLY_CMD = "kubectl apply --namespace={namespace} -f"

    KUBECTL_RUN_CMD = "kubectl run clearml-id-{task_id} " \
                      "--image {docker_image} {docker_args} " \
                      "--restart=Never " \
                      "--namespace={namespace}"

    KUBECTL_DELETE_CMD = "kubectl delete pods " \
                         "--selector={selector} " \
                         "--field-selector=status.phase!=Pending,status.phase!=Running " \
                         "--namespace={namespace}"
    KUBECTL_CLEANUP_DELETE_CMD = "kubectl delete pods " \
                                 "-l={agent_label} " \
                                 "--field-selector=status.phase!=Pending,status.phase!=Running " \
                                 "--namespace={namespace} " \
                                 "--output name"

    BASH_INSTALL_SSH_CMD = [
        "apt-get update",
@@ -62,6 +66,9 @@ class K8sIntegration(Worker):
        'echo "ldconfig" >> /etc/profile',
        "/usr/sbin/sshd -p {port}"]

    DEFAULT_EXECUTION_AGENT_ARGS = os.getenv("K8S_GLUE_DEF_EXEC_AGENT_ARGS", "--full-monitoring --require-queue")
    POD_AGENT_INSTALL_ARGS = os.getenv("K8S_GLUE_POD_AGENT_INSTALL_ARGS", "")

    CONTAINER_BASH_SCRIPT = [
        "export DEBIAN_FRONTEND='noninteractive'",
        "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean",
@@ -73,18 +80,20 @@ class K8sIntegration(Worker):
        "export LOCAL_PYTHON=$(which python3.$i) && break ; done",
        "[ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip",
        "[ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3",
        "$LOCAL_PYTHON -m pip install clearml-agent",
        "{extra_bash_init_cmd}",
        "$LOCAL_PYTHON -m pip install clearml-agent{agent_install_args}",
        "{extra_docker_bash_script}",
        "$LOCAL_PYTHON -m clearml_agent execute --full-monitoring --require-queue --id {task_id}"
        "$LOCAL_PYTHON -m clearml_agent execute {default_execution_agent_args} --id {task_id}"
    ]

    DEFAULT_POD_NAME_PREFIX = "clearml-id-"
    DEFAULT_LIMIT_POD_LABEL = "ai.allegro.agent.serial=pod-{pod_number}"

    _edit_hyperparams_version = "2.9"

    def __init__(
            self,
            k8s_pending_queue_name=None,
            kubectl_cmd=None,
            container_bash_script=None,
            debug=False,
            ports_mode=False,
@@ -97,15 +106,14 @@ class K8sIntegration(Worker):
            extra_bash_init_script=None,
            namespace=None,
            max_pods_limit=None,
            pod_name_prefix=None,
            limit_pod_label=None,
            **kwargs
    ):
        """
        Initialize the k8s integration glue layer daemon

        :param str k8s_pending_queue_name: queue name to use when task is pending in the k8s scheduler
        :param str|callable kubectl_cmd: kubectl command line str, supports formatting (default: KUBECTL_RUN_CMD)
            example: "task={task_id} image={docker_image} queue_id={queue_id}"
            or a callable function: kubectl_cmd(task_id, docker_image, docker_args, queue_id, task_data)
        :param str container_bash_script: container bash script to be executed in k8s (default: CONTAINER_BASH_SCRIPT)
            Notice this string will use format() call, if you have curly brackets they should be doubled { -> {{
            Format arguments passed: {task_id} and {extra_bash_init_cmd}
@@ -119,7 +127,7 @@ class K8sIntegration(Worker):
            when scheduling a task to run in a pod. Callable can receive an optional pod number and should return
            a dictionary of user properties (name and value). Signature is [[Optional[int]], Dict[str,str]]
        :param str overrides_yaml: YAML file containing the overrides for the pod (optional)
        :param str template_yaml: YAML file containing the template for the pod (optional).
            If provided the pod is scheduled with kubectl apply and overrides are ignored, otherwise with kubectl run.
        :param str clearml_conf_file: clearml.conf file to be used by the pod itself (optional)
        :param str extra_bash_init_script: Additional bash script to run before starting the Task inside the container
@@ -127,15 +135,21 @@ class K8sIntegration(Worker):
        :param int max_pods_limit: Maximum number of pods that K8S glue can run at the same time
        """
        super(K8sIntegration, self).__init__()
        self.pod_name_prefix = pod_name_prefix or self.DEFAULT_POD_NAME_PREFIX
        self.limit_pod_label = limit_pod_label or self.DEFAULT_LIMIT_POD_LABEL
        self.k8s_pending_queue_name = k8s_pending_queue_name or self.K8S_PENDING_QUEUE
        self.kubectl_cmd = kubectl_cmd or self.KUBECTL_RUN_CMD
        self.k8s_pending_queue_id = None
        self.container_bash_script = container_bash_script or self.CONTAINER_BASH_SCRIPT
        # Always do system packages, because we will be running inside a docker
        self._session.config.put("agent.package_manager.system_site_packages", True)
        force_system_packages = ENV_FORCE_SYSTEM_SITE_PACKAGES.get()
        self._force_system_site_packages = force_system_packages if force_system_packages is not None else True
        if self._force_system_site_packages:
            # Use system packages, because we will be running inside a docker
            self._session.config.put("agent.package_manager.system_site_packages", True)
        # Add debug logging
        if debug:
            self.log.logger.disabled = False
            self.log.logger.setLevel(logging.INFO)
            self.log.logger.setLevel(logging.DEBUG)
            self.log.logger.addHandler(logging.StreamHandler())
        self.ports_mode = ports_mode
        self.num_of_services = num_of_services
        self.base_pod_num = base_pod_num
@@ -151,66 +165,87 @@ class K8sIntegration(Worker):
        self.pod_limits = []
        self.pod_requests = []
        self.max_pods_limit = max_pods_limit if not self.ports_mode else None
        if overrides_yaml:
            with open(os.path.expandvars(os.path.expanduser(str(overrides_yaml))), 'rt') as f:
                overrides = yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
            if overrides:
                containers = overrides.get('spec', {}).get('containers', [])
                for c in containers:
                    resources = {str(k).lower(): v for k, v in c.get('resources', {}).items()}
                    if not resources:
                        continue
                    if resources.get('limits'):
                        self.pod_limits += ['{}={}'.format(k, v) for k, v in resources['limits'].items()]
                    if resources.get('requests'):
                        self.pod_requests += ['{}={}'.format(k, v) for k, v in resources['requests'].items()]
                # remove double entries
                self.pod_limits = list(set(self.pod_limits))
                self.pod_requests = list(set(self.pod_requests))
                if self.pod_limits or self.pod_requests:
                    self.log.warning('Found pod container requests={} limits={}'.format(
                        self.pod_limits, self.pod_requests))
                if containers:
                    self.log.warning('Removing containers section: {}'.format(overrides['spec'].pop('containers')))
                self.overrides_json_string = json.dumps(overrides)

        self._load_overrides_yaml(overrides_yaml)

        if template_yaml:
            with open(os.path.expandvars(os.path.expanduser(str(template_yaml))), 'rt') as f:
                self.template_dict = yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
            self.template_dict = self._load_template_file(template_yaml)

        clearml_conf_file = clearml_conf_file or kwargs.get('trains_conf_file')

        if clearml_conf_file:
            with open(os.path.expandvars(os.path.expanduser(str(clearml_conf_file))), 'rt') as f:
                self.conf_file_content = f.read()
            # make sure we use system packages!
            self.conf_file_content += '\nagent.package_manager.system_site_packages=true\n'

        self._agent_label = None

        self._monitor_hanging_pods()

        self._min_cleanup_interval_per_ns_sec = 1.0
        self._last_pod_cleanup_per_ns = defaultdict(lambda: 0.)

    def _load_overrides_yaml(self, overrides_yaml):
        if not overrides_yaml:
            return
        overrides = self._load_template_file(overrides_yaml)
        if not overrides:
            return
        containers = overrides.get('spec', {}).get('containers', [])
        for c in containers:
            resources = {str(k).lower(): v for k, v in c.get('resources', {}).items()}
            if not resources:
                continue
            if resources.get('limits'):
                self.pod_limits += ['{}={}'.format(k, v) for k, v in resources['limits'].items()]
            if resources.get('requests'):
                self.pod_requests += ['{}={}'.format(k, v) for k, v in resources['requests'].items()]
        # remove double entries
        self.pod_limits = list(set(self.pod_limits))
        self.pod_requests = list(set(self.pod_requests))
        if self.pod_limits or self.pod_requests:
            self.log.warning('Found pod container requests={} limits={}'.format(
                self.pod_limits, self.pod_requests))
        if containers:
            self.log.warning('Removing containers section: {}'.format(overrides['spec'].pop('containers')))
        self.overrides_json_string = json.dumps(overrides)

    def _monitor_hanging_pods(self):
        _check_pod_thread = Thread(target=self._monitor_hanging_pods_daemon)
        _check_pod_thread.daemon = True
        _check_pod_thread.start()

    @staticmethod
    def _get_path(d, *path, default=None):
        try:
            return functools.reduce(
                lambda a, b: a[b], path, d
            )
        except (IndexError, KeyError):
            return default

    @staticmethod
    def _load_template_file(path):
        with open(os.path.expandvars(os.path.expanduser(str(path))), 'rt') as f:
            return yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))

    def _get_kubectl_options(self, command, extra_labels=None, filters=None, output="json", labels=None):
        # type: (str, Iterable[str], Iterable[str], str, Iterable[str]) -> Dict
        if not labels:
            labels = [self._get_agent_label()]
        labels = list(labels) + (list(extra_labels) if extra_labels else [])
        d = {
            "-l": ",".join(labels),
            "-n": str(self.namespace),
            "-o": output,
        }
        if filters:
            d["--field-selector"] = ",".join(filters)
        return d

    def get_kubectl_command(self, command, output="json", **args):
        opts = self._get_kubectl_options(command, output=output, **args)
        return 'kubectl {command} {opts}'.format(
            command=command, opts=" ".join(x for item in opts.items() for x in item)
        )
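For orientation, a minimal sketch of the command string `get_kubectl_command()` assembles from the options dict; the label and namespace values below are illustrative:

```python
opts = {
    "-l": "CLEARML=agent",                       # agent label
    "-n": "clearml",                             # namespace
    "-o": "json",                                # output format
    "--field-selector": "status.phase=Pending",  # optional filters
}
# flatten (flag, value) pairs into a single space-separated string
cmd = 'kubectl {command} {opts}'.format(
    command="get pods", opts=" ".join(x for item in opts.items() for x in item)
)
assert cmd == 'kubectl get pods -l CLEARML=agent -n clearml -o json --field-selector status.phase=Pending'
```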
    def _monitor_hanging_pods_daemon(self):
        last_tasks_msgs = {}  # last msg updated for every task

        while True:
            output = get_bash_output('kubectl get pods -n {namespace} -o=JSON'.format(
                namespace=self.namespace
            ))
            output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
            kubectl_cmd = self.get_kubectl_command("get pods", filters=["status.phase=Pending"])
            self.log.debug("Detecting hanging pods: {}".format(kubectl_cmd))
            output = stringify_bash_output(get_bash_output(kubectl_cmd))
            try:
                output_config = json.loads(output)
            except Exception as ex:
@@ -218,11 +253,8 @@ class K8sIntegration(Worker):
                sleep(self._polling_interval)
                continue
            pods = output_config.get('items', [])
            task_ids = set()
            task_id_to_details = dict()
            for pod in pods:
                if self._get_path(pod, 'status', 'phase') != "Pending":
                    continue

                pod_name = pod.get('metadata', {}).get('name', None)
                if not pod_name:
                    continue
@@ -231,7 +263,11 @@ class K8sIntegration(Worker):
                if not task_id:
                    continue

                task_ids.add(task_id)
                namespace = pod.get('metadata', {}).get('namespace', None)
                if not namespace:
                    continue

                task_id_to_details[task_id] = (pod_name, namespace)

                msg = None

@@ -250,9 +286,11 @@ class K8sIntegration(Worker):
                    msg = reason + (" ({})".format(message) if message else "")

                    if reason == 'ImagePullBackOff':
                        delete_pod_cmd = 'kubectl delete pods {} -n {}'.format(pod_name, self.namespace)
                        delete_pod_cmd = 'kubectl delete pods {} -n {}'.format(pod_name, namespace)
                        self.log.debug(" - deleting pod due to ImagePullBackOff: {}".format(delete_pod_cmd))
                        get_bash_output(delete_pod_cmd)
                        try:
                            self.log.debug(" - Detecting hanging pods: {}".format(kubectl_cmd))
                            self._session.api_client.tasks.failed(
                                task=task_id,
                                status_reason="K8S glue error: {}".format(msg),
@@ -273,7 +311,7 @@ class K8sIntegration(Worker):
                        service='tasks',
                        action='update',
                        json={"task": task_id, "status_message": "K8S glue status: {}".format(msg)},
                        method='get',
                        method=Request.def_method,
                        async_enable=False,
                    )
                    if not result.ok:
@@ -284,17 +322,51 @@ class K8sIntegration(Worker):
                    last_tasks_msgs[task_id] = msg
                except Exception as ex:
                    self.log.warning(
                        'K8S Glue pods monitor: Failed setting status message for task "{}"\nEX: {}'.format(
                            task_id, ex
                        'K8S Glue pods monitor: Failed setting status message for task "{}"\nMSG: {}\nEX: {}'.format(
                            task_id, msg, ex
                        )
                    )

            if task_id_to_details:
                try:
                    result = self._session.get(
                        service='tasks',
                        action='get_all',
                        json={"id": list(task_id_to_details), "status": ["stopped"], "only_fields": ["id"]},
                        method=Request.def_method,
                        async_enable=False,
                    )
                    aborted_task_ids = list(filter(None, (task.get("id") for task in result["tasks"])))

                    for task_id in aborted_task_ids:
                        pod_name, namespace = task_id_to_details.get(task_id)
                        if not pod_name:
                            self.log.error("Failed locating aborted task {} in pending pods list".format(task_id))
                            continue
                        self.log.info(
                            "K8S Glue pods monitor: task {} was aborted but its pod {} is still pending, "
                            "deleting pod".format(task_id, pod_name)
                        )

                        kubectl_cmd = "kubectl delete pod {pod_name} --output name {namespace}".format(
                            namespace=f"--namespace={namespace}" if namespace else "", pod_name=pod_name,
                        ).strip()
                        self.log.debug("Deleting aborted task pending pod: {}".format(kubectl_cmd))
                        output = stringify_bash_output(get_bash_output(kubectl_cmd))
                        if not output:
                            self.log.warning("K8S Glue pods monitor: failed deleting pod {}".format(pod_name))
                except Exception as ex:
                    self.log.warning(
                        'K8S Glue pods monitor: failed checking aborted tasks for hanging pods: {}'.format(ex)
                    )

            # clean up any last message for a task that wasn't seen as a pod
            last_tasks_msgs = {k: v for k, v in last_tasks_msgs.items() if k in task_ids}
            last_tasks_msgs = {k: v for k, v in last_tasks_msgs.items() if k in task_id_to_details}

            sleep(self._polling_interval)

    def _set_task_user_properties(self, task_id: str, **properties: str):
    def _set_task_user_properties(self, task_id: str, task_session=None, **properties: str):
        session = task_session or self._session
        if self._edit_hyperparams_support is not True:
            # either not supported or never tested
            if self._edit_hyperparams_support == self._session.api_version:
@@ -305,7 +377,7 @@ class K8sIntegration(Worker):
                self._edit_hyperparams_support = self._session.api_version
                return
        try:
            self._session.get(
            session.get(
                service="tasks",
                action="edit_hyper_params",
                task=task_id,
@@ -336,68 +408,95 @@ class K8sIntegration(Worker):

        return self._agent_label

    def _get_number_used_pods(self):
    def _get_used_pods(self):
        # type: () -> Tuple[int, Set[str]]
        # noinspection PyBroadException
        try:
            kubectl_cmd_new = "kubectl get pods -l {agent_label} -n {namespace} -o json".format(
                agent_label=self._get_agent_label(),
                namespace=self.namespace,
            kubectl_cmd = self.get_kubectl_command(
                "get pods",
                output="jsonpath=\"{range .items[*]}{.metadata.name}{' '}{.metadata.namespace}{'\\n'}{end}\""
            )
            process = subprocess.Popen(kubectl_cmd_new.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            output, error = process.communicate()
            output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
            error = '' if not error else error if isinstance(error, str) else error.decode('utf-8')
            self.log.debug("Getting used pods: {}".format(kubectl_cmd))
            output = stringify_bash_output(get_bash_output(kubectl_cmd, raise_error=True))

            if not output:
                # No such pod exist so we can use the pod_number we found
                return 0
                return 0, set([])

            try:
                current_pod_count = len(json.loads(output).get("items", []))
            except (ValueError, TypeError) as ex:
                return -1
                items = output.splitlines()
                current_pod_count = len(items)
                namespaces = {item.rpartition(" ")[-1] for item in items}
                self.log.debug(" - found {} pods in namespaces {}".format(current_pod_count, ", ".join(namespaces)))
            except (KeyError, ValueError, TypeError, AttributeError) as ex:
                print("Failed parsing used pods command response for cleanup: {}".format(ex))
                return -1, set([])

            return current_pod_count
            return current_pod_count, namespaces
        except Exception as ex:
            print('Failed getting number of used pods: {}'.format(ex))
            return -2
            print('Failed obtaining used pods information: {}'.format(ex))
            return -2, set([])
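The jsonpath template prints one `<pod-name> <namespace>` pair per line, so the parsing in `_get_used_pods()` is plain string splitting; a minimal sketch with illustrative output:

```python
# illustrative kubectl jsonpath output: one "<pod-name> <namespace>" per line
output = "clearml-id-aaa111 clearml\nclearml-id-bbb222 team-a"

items = output.splitlines()
current_pod_count = len(items)                             # 2
namespaces = {item.rpartition(" ")[-1] for item in items}  # {'clearml', 'team-a'}
```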
    def _is_same_tenant(self, task_session):
        if not task_session or task_session is self._session:
            return True
        # noinspection PyStatementEffect
        try:
            tenant = self._session.get_decoded_token(self._session.token, verify=False)["tenant"]
            task_tenant = task_session.get_decoded_token(task_session.token, verify=False)["tenant"]
            return tenant == task_tenant
        except Exception as ex:
            print("ERROR: Failed getting tenant for task session: {}".format(ex))

    def run_one_task(self, queue: Text, task_id: Text, worker_args=None, task_session=None, **_):
        print('Pulling task {} launching on kubernetes cluster'.format(task_id))
        task_data = self._session.api_client.tasks.get_all(id=[task_id])[0]
        session = task_session or self._session
        task_data = session.api_client.tasks.get_all(id=[task_id])[0]

        # push task into the k8s queue, so we have visibility on pending tasks in the k8s scheduler
        try:
            print('Pushing task {} into temporary pending queue'.format(task_id))
            res = self._session.api_client.tasks.stop(task_id, force=True)
            res = self._session.api_client.tasks.enqueue(
                task_id,
                queue=self.k8s_pending_queue_name,
                status_reason='k8s pending scheduler',
            )
            if res.meta.result_code != 200:
                raise Exception(res.meta.result_msg)
        except Exception as e:
            self.log.error("ERROR: Could not push back task [{}] to k8s pending queue [{}], error: {}".format(
                task_id, self.k8s_pending_queue_name, e))
            return
        if self._is_same_tenant(task_session):
            try:
                print('Pushing task {} into temporary pending queue'.format(task_id))
                _ = session.api_client.tasks.stop(task_id, force=True)

                container = get_task_container(self._session, task_id)
                res = self._session.api_client.tasks.enqueue(
                    task_id,
                    queue=self.k8s_pending_queue_id,
                    status_reason='k8s pending scheduler',
                )
                if res.meta.result_code != 200:
                    raise Exception(res.meta.result_msg)
            except Exception as e:
                self.log.error("ERROR: Could not push back task [{}] to k8s pending queue {} [{}], error: {}".format(
                    task_id, self.k8s_pending_queue_name, self.k8s_pending_queue_id, e))
                return

        container = get_task_container(session, task_id)
        if not container.get('image'):
            container['image'] = str(
                ENV_DOCKER_IMAGE.get() or self._session.config.get("agent.default_docker.image", "nvidia/cuda")
                ENV_DOCKER_IMAGE.get() or session.config.get("agent.default_docker.image", "nvidia/cuda")
            )
            container['arguments'] = self._session.config.get("agent.default_docker.arguments", None)
            container['arguments'] = session.config.get("agent.default_docker.arguments", None)
            set_task_container(
                self._session, task_id, docker_image=container['image'], docker_arguments=container['arguments']
                session, task_id, docker_image=container['image'], docker_arguments=container['arguments']
            )

        # get the clearml.conf encoded file
        # get the clearml.conf encoded file, make sure we use system packages!

        git_user = ENV_AGENT_GIT_USER.get() or self._session.config.get("agent.git_user", None)
        git_pass = ENV_AGENT_GIT_PASS.get() or self._session.config.get("agent.git_pass", None)
        extra_config_values = [
            'agent.package_manager.system_site_packages: true' if self._force_system_site_packages else '',
            'agent.git_user: "{}"'.format(git_user) if git_user else '',
            'agent.git_pass: "{}"'.format(git_pass) if git_pass else '',
        ]

        # noinspection PyProtectedMember
        hocon_config_encoded = (
            self.conf_file_content
            or Path(self._session._config_file).read_text()
        ).encode("ascii")
        config_content = (
            self.conf_file_content or (session._config_file and Path(session._config_file).read_text()) or ""
        ) + '\n{}\n'.format('\n'.join(x for x in extra_config_values if x))

        hocon_config_encoded = config_content.encode("ascii")

        create_clearml_conf = ["echo '{}' | base64 --decode >> ~/clearml.conf".format(
            base64.b64encode(
@@ -426,39 +525,40 @@ class K8sIntegration(Worker):
        pod_number = self.base_pod_num
        while self.ports_mode or self.max_pods_limit:
            pod_number = self.base_pod_num + pod_count
            if self.ports_mode:
                kubectl_cmd_new = "kubectl get pods -l {pod_label},{agent_label} -n {namespace}".format(
                    pod_label=self.LIMIT_POD_LABEL.format(pod_number=pod_number),
                    agent_label=self._get_agent_label(),
                    namespace=self.namespace,
                )
            else:
                kubectl_cmd_new = "kubectl get pods -l {agent_label} -n {namespace} -o json".format(
                    agent_label=self._get_agent_label(),
                    namespace=self.namespace,
                )

            kubectl_cmd_new = self.get_kubectl_command(
                "get pods",
                extra_labels=[self.limit_pod_label.format(pod_number=pod_number)] if self.ports_mode else None
            )
            self.log.debug("Looking for a free pod/port: {}".format(kubectl_cmd_new))
            process = subprocess.Popen(kubectl_cmd_new.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            output, error = process.communicate()
            output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
            error = '' if not error else error if isinstance(error, str) else error.decode('utf-8')
            output = stringify_bash_output(output)
            error = stringify_bash_output(error)

            if not output:
                # No such pod exist so we can use the pod_number we found
            try:
                items_count = len(json.loads(output).get("items", []))
            except (ValueError, TypeError) as ex:
                self.log.warning(
                    "K8S Glue pods monitor: Failed parsing kubectl output:\n{}\ntask '{}' "
                    "will be enqueued back to queue '{}'\nEx: {}".format(
                        output, task_id, queue, ex
                    )
                )
                session.api_client.tasks.stop(task_id, force=True)
                # noinspection PyBroadException
                try:
                    self._session.api_client.tasks.enqueue(task_id, queue=queue, status_reason='kubectl parsing error')
                except:
                    self.log.warning("Failed enqueuing task to queue '{}'".format(queue))
                return

            if not items_count:
                # No such pod exist so we can use the pod_number we found (result exists but with no items)
                break

            if self.max_pods_limit:
                try:
                    current_pod_count = len(json.loads(output).get("items", []))
                except (ValueError, TypeError) as ex:
                    self.log.warning(
                        "K8S Glue pods monitor: Failed parsing kubectl output:\n{}\ntask '{}' "
                        "will be enqueued back to queue '{}'\nEx: {}".format(
                            output, task_id, queue, ex
                        )
                    )
                    self._session.api_client.tasks.stop(task_id, force=True)
                    self._session.api_client.tasks.enqueue(task_id, queue=queue, status_reason='kubectl parsing error')
                    return
                current_pod_count = items_count
                max_count = self.max_pods_limit
            else:
                current_pod_count = pod_count
@@ -474,42 +574,54 @@ class K8sIntegration(Worker):
                        task_id, queue
                    )
                )
                self._session.api_client.tasks.stop(task_id, force=True)
                self._session.api_client.tasks.enqueue(
                    task_id, queue=queue, status_reason='k8s max pod limit (no free k8s service)')
                session.api_client.tasks.stop(task_id, force=True)
                # noinspection PyBroadException
                try:
                    self._session.api_client.tasks.enqueue(
                        task_id, queue=queue, status_reason='k8s max pod limit (no free k8s service)'
                    )
                except:
                    self.log.warning("Failed enqueuing task to queue '{}'".format(queue))
                return
            elif self.max_pods_limit:
                # max pods limit hasn't been reached yet, so we can create the pod
                break
            pod_count += 1

        labels = ([self.LIMIT_POD_LABEL.format(pod_number=pod_number)] if self.ports_mode else []) + \
                 [self._get_agent_label()]
        labels.append("clearml-agent-queue={}".format(self._safe_k8s_label_value(queue)))
        labels.append("clearml-agent-queue-name={}".format(self._safe_k8s_label_value(queue_name)))
        labels = self._get_pod_labels(queue, queue_name)
        if self.ports_mode:
            labels.append(self.limit_pod_label.format(pod_number=pod_number))

        if self.ports_mode:
            print("Kubernetes scheduling task id={} on pod={} (pod_count={})".format(task_id, pod_number, pod_count))
        else:
            print("Kubernetes scheduling task id={}".format(task_id))

        kubectl_kwargs = dict(
            create_clearml_conf=create_clearml_conf,
            labels=labels,
            docker_image=container['image'],
            docker_args=container['arguments'],
            docker_bash=container.get('setup_shell_script'),
            task_id=task_id,
            queue=queue
        )
        try:
            template = self._resolve_template(task_session, task_data, queue)
        except Exception as ex:
            print("ERROR: Failed resolving template (skipping): {}".format(ex))
            return

        if self.template_dict:
            output, error = self._kubectl_apply(**kubectl_kwargs)
        else:
            output, error = self._kubectl_run(task_data=task_data, **kubectl_kwargs)
        try:
            namespace = template['metadata']['namespace'] or self.namespace
        except (KeyError, TypeError, AttributeError):
            namespace = self.namespace

        if template:
            output, error = self._kubectl_apply(
                template=template,
                pod_number=pod_number,
                create_clearml_conf=create_clearml_conf,
                labels=labels,
                docker_image=container['image'],
                docker_args=container['arguments'],
                docker_bash=container.get('setup_shell_script'),
                task_id=task_id,
                queue=queue,
                namespace=namespace,
            )

        error = '' if not error else (error if isinstance(error, str) else error.decode('utf-8'))
        output = '' if not output else (output if isinstance(output, str) else output.decode('utf-8'))
        print('kubectl output:\n{}\n{}'.format(error, output))
        if error:
            send_log = "Running kubectl encountered an error: {}".format(error)
@@ -523,6 +635,7 @@ class K8sIntegration(Worker):
                "k8s-pod-number": pod_number,
                "k8s-pod-label": labels[0],
                "k8s-internal-pod-count": pod_count,
                "k8s-agent": self._get_agent_label(),
            }
        )

@@ -537,9 +650,17 @@ class K8sIntegration(Worker):
        if user_props:
            self._set_task_user_properties(
                task_id=task_id,
                task_session=task_session,
                **user_props
            )

    def _get_pod_labels(self, queue, queue_name):
        return [
            self._get_agent_label(),
            "clearml-agent-queue={}".format(self._safe_k8s_label_value(queue)),
            "clearml-agent-queue-name={}".format(self._safe_k8s_label_value(queue_name))
        ]

    def _get_docker_args(self, docker_args, flags, target=None, convert=None):
        # type: (List[str], Collection[str], Optional[str], Callable[[str], Any]) -> Union[dict, List[str]]
        """
@@ -566,12 +687,23 @@ class K8sIntegration(Worker):
        return {target: results} if results else {}
        return results

    def _kubectl_apply(self, create_clearml_conf, docker_image, docker_args, docker_bash, labels, queue, task_id):
        template = deepcopy(self.template_dict)
    def _kubectl_apply(
        self,
        create_clearml_conf,
        docker_image,
        docker_args,
        docker_bash,
        labels,
        queue,
        task_id,
        namespace,
        template=None,
        pod_number=None
    ):
        template.setdefault('apiVersion', 'v1')
        template['kind'] = 'Pod'
        template.setdefault('metadata', {})
        name = 'clearml-id-{task_id}'.format(task_id=task_id)
        name = self.pod_name_prefix + str(task_id)
        template['metadata']['name'] = name
        template.setdefault('spec', {})
        template['spec'].setdefault('containers', [])
@@ -599,17 +731,22 @@ class K8sIntegration(Worker):
            ['#!/bin/bash', ] +
            [line.format(extra_bash_init_cmd=self.extra_bash_init_script or '',
                         task_id=task_id,
                         extra_docker_bash_script=extra_docker_bash_script)
                         extra_docker_bash_script=extra_docker_bash_script,
                         default_execution_agent_args=self.DEFAULT_EXECUTION_AGENT_ARGS,
                         agent_install_args=self.POD_AGENT_INSTALL_ARGS)
             for line in container_bash_script])

        extra_bash_commands = list(create_clearml_conf or [])

        start_agent_script_path = ENV_START_AGENT_SCRIPT_PATH.get() or "~/__start_agent__.sh"

        extra_bash_commands.append(
            "echo '{}' | base64 --decode >> ~/__start_agent__.sh ; "
            "/bin/bash ~/__start_agent__.sh".format(
                base64.b64encode(
            "echo '{content}' | base64 --decode >> {script_path} ; /bin/bash {script_path}".format(
                content=base64.b64encode(
                    script_encoded.encode('ascii')
                ).decode('ascii'))
                ).decode('ascii'),
                script_path=start_agent_script_path
            )
        )

        # Notice: we always leave with exit code 0, so pods are never restarted
@@ -634,11 +771,13 @@ class K8sIntegration(Worker):
        with open(yaml_file, 'wt') as f:
            yaml.dump(template, f)

        self.log.debug("Applying template:\n{}".format(pformat(template, indent=2)))

        kubectl_cmd = self.KUBECTL_APPLY_CMD.format(
            task_id=task_id,
            docker_image=docker_image,
            queue_id=queue,
            namespace=self.namespace
            namespace=namespace
        )
        # make sure we provide a list
        if isinstance(kubectl_cmd, str):
@@ -654,57 +793,81 @@ class K8sIntegration(Worker):
        finally:
            safe_remove_file(yaml_file)

        return output, error
        return stringify_bash_output(output), stringify_bash_output(error)

    def _kubectl_run(
        self, create_clearml_conf, docker_image, docker_args, docker_bash, labels, queue, task_data, task_id
    ):
        if callable(self.kubectl_cmd):
            kubectl_cmd = self.kubectl_cmd(task_id, docker_image, docker_args, queue, task_data)
        else:
            kubectl_cmd = self.kubectl_cmd.format(
                task_id=task_id,
                docker_image=docker_image,
                docker_args=" ".join(self._get_docker_args(
                    docker_args, flags={"-e", "--env"}, convert=lambda env: '--env={}'.format(env))
                ),
                queue_id=queue,
                namespace=self.namespace,
    def _cleanup_old_pods(self, namespaces, extra_msg=None):
        # type: (Iterable[str], Optional[str]) -> Dict[str, List[str]]
        self.log.debug("Cleaning up pods")
        deleted_pods = defaultdict(list)
        for namespace in namespaces:
            if time() - self._last_pod_cleanup_per_ns[namespace] < self._min_cleanup_interval_per_ns_sec:
                # Do not try to cleanup the same namespace too quickly
                continue
            kubectl_cmd = self.KUBECTL_CLEANUP_DELETE_CMD.format(
                namespace=namespace, agent_label=self._get_agent_label()
            )
            # make sure we provide a list
            if isinstance(kubectl_cmd, str):
                kubectl_cmd = kubectl_cmd.split()
            self.log.debug("Deleting old/failed pods{} for ns {}: {}".format(
                extra_msg or "", namespace, kubectl_cmd
            ))
            try:
                res = get_bash_output(kubectl_cmd, raise_error=True)
                lines = [
                    line for line in
                    (r.strip().rpartition("/")[-1] for r in res.splitlines())
                    if line.startswith(self.pod_name_prefix)
                ]
                self.log.debug(" - deleted pod(s) %s", ", ".join(lines))
                deleted_pods[namespace].extend(lines)
            except Exception as ex:
                self.log.error("Failed deleting old/failed pods for ns %s: %s", namespace, str(ex))
            finally:
                self._last_pod_cleanup_per_ns[namespace] = time()

        if self.overrides_json_string:
            kubectl_cmd += ['--overrides=' + self.overrides_json_string]
        # Locate tasks belonging to deleted pods that are still marked as pending or running
        tasks_to_abort = []
        try:
            task_ids = list(filter(None, (
                pod_name[len(self.pod_name_prefix):].strip()
                for pod_names in deleted_pods.values()
                for pod_name in pod_names
            )))
            if task_ids:
                result = self._session.get(
                    service='tasks',
                    action='get_all',
                    json={"id": task_ids, "status": ["in_progress", "queued"], "only_fields": ["id", "status"]},
                    method=Request.def_method,
                )
                tasks_to_abort = result["tasks"]
        except Exception as ex:
            self.log.warning('Failed getting running tasks for deleted pods: {}'.format(ex))

        if self.pod_limits:
            kubectl_cmd += ['--limits', ",".join(self.pod_limits)]
        if self.pod_requests:
            kubectl_cmd += ['--requests', ",".join(self.pod_requests)]
        for task in tasks_to_abort:
            task_id = task.get("id")
            status = task.get("status")
            if not task_id or not status:
                self.log.warning('Failed getting task information: id={}, status={}'.format(task_id, status))
                continue
            try:
                if status == "queued":
                    self._session.get(
                        service='tasks',
                        action='dequeue',
                        json={"task": task_id, "force": True, "status_reason": "Pod deleted (not pending or running)",
                              "status_message": "Pod deleted by agent {}".format(self.worker_id or "unknown")},
                        method=Request.def_method,
                    )
                self._session.get(
                    service='tasks',
                    action='failed',
                    json={"task": task_id, "force": True, "status_reason": "Pod deleted (not pending or running)",
                          "status_message": "Pod deleted by agent {}".format(self.worker_id or "unknown")},
                    method=Request.def_method,
                )
            except Exception as ex:
                self.log.warning('Failed setting task {} to status "failed": {}'.format(task_id, ex))

        if self._docker_force_pull and not any(x.startswith("--image-pull-policy=") for x in kubectl_cmd):
            kubectl_cmd += ["--image-pull-policy='always'"]

        container_bash_script = [self.container_bash_script] if isinstance(self.container_bash_script, str) \
            else self.container_bash_script
        container_bash_script = ' ; '.join(container_bash_script)

        kubectl_cmd += [
            "--labels=" + ",".join(labels),
            "--command",
            "--",
            "/bin/sh",
            "-c",
            "{} ; {}".format(" ; ".join(create_clearml_conf or []), container_bash_script.format(
                extra_bash_init_cmd=self.extra_bash_init_script or "",
                extra_docker_bash_script=docker_bash or "",
                task_id=task_id
            )),
        ]
        process = subprocess.Popen(kubectl_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = process.communicate()
        return output, error
        return deleted_pods
def run_tasks_loop(self, queues: List[Text], worker_params, **kwargs):
    """
@@ -720,26 +883,29 @@ class K8sIntegration(Worker):
        events_service = self.get_service(Events)

        # make sure we have a k8s pending queue
        # noinspection PyBroadException
        try:
            self._session.api_client.queues.create(self.k8s_pending_queue_name)
        except Exception:
            pass
        # get queue id
        self.k8s_pending_queue_name = self._resolve_name(self.k8s_pending_queue_name, "queues")
        if not self.k8s_pending_queue_id:
            resolved_ids = self._resolve_queue_names([self.k8s_pending_queue_name], create_if_missing=True)
            if not resolved_ids:
                raise ValueError(
                    "Failed resolving or creating k8s pending queue {}".format(self.k8s_pending_queue_name)
                )
            self.k8s_pending_queue_id = resolved_ids[0]

        _last_machine_update_ts = 0
        while True:
            # Get used pods and namespaces
            current_pods, namespaces = self._get_used_pods()

            # just in case there are no pods, make sure we look at our base namespace
            namespaces.add(self.namespace)

            # check if we have a pod limit, then check if we hit it.
            if self.max_pods_limit:
                current_pods = self._get_number_used_pods()
                if current_pods >= self.max_pods_limit:
                    print("Maximum pod limit reached {}/{}, sleeping for {:.1f} seconds".format(
                        current_pods, self.max_pods_limit, self._polling_interval))
                    # delete old completed / failed pods
                    get_bash_output(
                        self.KUBECTL_DELETE_CMD.format(namespace=self.namespace, selector=self._get_agent_label())
                    )
                    self._cleanup_old_pods(namespaces, " due to pod limit")
                    # go to sleep
                    sleep(self._polling_interval)
                    continue
@@ -747,32 +913,22 @@ class K8sIntegration(Worker):
            # iterate over queues (priority style, queues[0] is highest)
            for queue in queues:
                # delete old completed / failed pods
                get_bash_output(
                    self.KUBECTL_DELETE_CMD.format(namespace=self.namespace, selector=self._get_agent_label())
                )
                self._cleanup_old_pods(namespaces)

                # get next task in queue
                try:
                    response = get_next_task(
                        self._session, queue=queue, get_task_info=self._impersonate_as_task_owner
                    )
                    response = self._get_next_task(queue=queue, get_task_info=self._impersonate_as_task_owner)
                except Exception as e:
                    print("Warning: Could not access task queue [{}], error: {}".format(queue, e))
                    continue
                else:
                    if not response:
                        continue
                    try:
                        task_id = response["entry"]["task"]
                    except (KeyError, TypeError, AttributeError):
                        print("No tasks in queue {}".format(queue))
                        continue
                    events_service.send_log_events(
                        self.worker_id,
                        task_id=task_id,
                        lines="task {} pulled from {} by worker {}".format(
                            task_id, queue, self.worker_id
                        ),
                        level="INFO",
                    )

                    task_session = None
                    if self._impersonate_as_task_owner:
@@ -792,6 +948,16 @@ class K8sIntegration(Worker):
                        )
                        continue

                    events_service.send_log_events(
                        self.worker_id,
                        task_id=task_id,
                        lines="task {} pulled from {} by worker {}".format(
                            task_id, queue, self.worker_id
                        ),
                        level="INFO",
                        session=task_session,
                    )

                    self.report_monitor(ResourceMonitor.StatusReport(queues=queues, queue=queue, task=task_id))
                    self.run_one_task(queue, task_id, worker_params, task_session)
                    self.report_monitor(ResourceMonitor.StatusReport(queues=self.queues))
@@ -820,6 +986,15 @@ class K8sIntegration(Worker):
            log_level=logging.INFO, foreground=True, docker=False, **kwargs,
        )

    def _get_next_task(self, queue, get_task_info):
        return get_next_task(
            self._session, queue=queue, get_task_info=get_task_info
        )

    def _resolve_template(self, task_session, task_data, queue):
        if self.template_dict:
            return deepcopy(self.template_dict)

    @classmethod
    def get_ssh_server_bash(cls, ssh_port_number):
        return ' ; '.join(line.format(port=ssh_port_number) for line in cls.BASH_INSTALL_SSH_CMD)
@@ -845,5 +1020,6 @@ class K8sIntegration(Worker):
        value = re.sub(r'^[^A-Za-z0-9]+', '', value)  # strip leading non-alphanumeric chars
        value = re.sub(r'[^A-Za-z0-9]+$', '', value)  # strip trailing non-alphanumeric chars
        value = re.sub(r'\W+', '-', value)  # allow only word chars (this removed "." which is supported, but nvm)
        value = re.sub(r'_+', '-', value)  # "_" is not allowed as well
        value = re.sub(r'-+', '-', value)  # don't leave messy "--" after replacing previous chars
        return value[:63]
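The label-sanitizing hunk above maps arbitrary strings onto the Kubernetes label-value charset (alphanumerics and dashes, capped at 63 characters). A standalone, runnable sketch of the same regex chain; the function name `sanitize_label` is illustrative, not part of the agent API:

```python
import re

def sanitize_label(value: str) -> str:
    value = re.sub(r'^[^A-Za-z0-9]+', '', value)   # strip leading non-alphanumeric chars
    value = re.sub(r'[^A-Za-z0-9]+$', '', value)   # strip trailing non-alphanumeric chars
    value = re.sub(r'\W+', '-', value)             # collapse runs of non-word chars into a dash
    value = re.sub(r'_+', '-', value)              # "_" is a word char, so replace it separately
    value = re.sub(r'-+', '-', value)              # collapse any "--" left over
    return value[:63]                              # k8s label values are limited to 63 chars

print(sanitize_label("__clearml agent: gpu/0__"))  # -> clearml-agent-gpu-0
```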
@@ -20,13 +20,13 @@ from typing import Text, Dict, Any, Optional, AnyStr, IO, Union

import attr
import furl
import pyhocon
import yaml
from attr import fields_dict
from pathlib2 import Path

import six
from six.moves import reduce
from clearml_agent.external import pyhocon
from clearml_agent.errors import CommandFailedError
from clearml_agent.helper.dicts import filter_keys

96 clearml_agent/helper/docker_args.py Normal file
@@ -0,0 +1,96 @@
import re
import shlex
from typing import Tuple, List, TYPE_CHECKING
from urllib.parse import urlunparse, urlparse

from clearml_agent.definitions import (
    ENV_AGENT_GIT_PASS,
    ENV_AGENT_SECRET_KEY,
    ENV_AWS_SECRET_KEY,
    ENV_AZURE_ACCOUNT_KEY,
    ENV_AGENT_AUTH_TOKEN,
    ENV_DOCKER_IMAGE,
    ENV_DOCKER_ARGS_HIDE_ENV,
)

if TYPE_CHECKING:
    from clearml_agent.session import Session


class DockerArgsSanitizer:
    @classmethod
    def sanitize_docker_command(cls, session, docker_command):
        # type: (Session, List[str]) -> List[str]
        if not docker_command:
            return docker_command

        enabled = (
            session.config.get('agent.hide_docker_command_env_vars.enabled', False) or ENV_DOCKER_ARGS_HIDE_ENV.get()
        )
        if not enabled:
            return docker_command

        keys = set(session.config.get('agent.hide_docker_command_env_vars.extra_keys', []))
        if ENV_DOCKER_ARGS_HIDE_ENV.get():
            keys.update(shlex.split(ENV_DOCKER_ARGS_HIDE_ENV.get().strip()))
        keys.update(
            ENV_AGENT_GIT_PASS.vars,
            ENV_AGENT_SECRET_KEY.vars,
            ENV_AWS_SECRET_KEY.vars,
            ENV_AZURE_ACCOUNT_KEY.vars,
            ENV_AGENT_AUTH_TOKEN.vars,
        )

        parse_embedded_urls = bool(session.config.get(
            'agent.hide_docker_command_env_vars.parse_embedded_urls', True
        ))

        skip_next = False
        result = docker_command[:]
        for i, item in enumerate(docker_command):
            if skip_next:
                skip_next = False
                continue
            try:
                if item in ("-e", "--env"):
                    key, sep, val = result[i + 1].partition("=")
                    if not sep:
                        continue
                    if key in ENV_DOCKER_IMAGE.vars:
                        # special case - this contains a complete docker command
                        val = " ".join(cls.sanitize_docker_command(session, re.split(r"\s", val)))
                    elif key in keys:
                        val = "********"
                    elif parse_embedded_urls:
                        val = cls._sanitize_urls(val)[0]
                    result[i + 1] = "{}={}".format(key, val)
                    skip_next = True
                elif parse_embedded_urls and not item.startswith("-"):
                    item, changed = cls._sanitize_urls(item)
                    if changed:
                        result[i] = item
            except (KeyError, TypeError):
                pass

        return result

    @staticmethod
    def _sanitize_urls(s: str) -> Tuple[str, bool]:
        """ Replaces passwords in URLs with asterisks """
        regex = re.compile("^([^:]*:)[^@]+(.*)$")
        tokens = re.split(r"\s", s)
        changed = False
        for k in range(len(tokens)):
            if "@" in tokens[k]:
                res = urlparse(tokens[k])
                if regex.match(res.netloc):
                    changed = True
                    tokens[k] = urlunparse((
                        res.scheme,
                        regex.sub("\\1********\\2", res.netloc),
                        res.path,
                        res.params,
                        res.query,
                        res.fragment
                    ))
        return " ".join(tokens) if changed else s, changed
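`_sanitize_urls` replaces the password part of any `scheme://user:pass@host` token with asterisks. A minimal self-contained sketch of that behavior, using only the standard library; the function name `mask_url_passwords` is illustrative:

```python
import re
from urllib.parse import urlparse, urlunparse

def mask_url_passwords(s: str) -> str:
    netloc_re = re.compile(r"^([^:]*:)[^@]+(.*)$")  # "user:" + password + "@host..."
    tokens = re.split(r"\s", s)
    for i, tok in enumerate(tokens):
        if "@" in tok:
            parsed = urlparse(tok)
            if netloc_re.match(parsed.netloc):
                tokens[i] = urlunparse((
                    parsed.scheme,
                    netloc_re.sub(r"\1********\2", parsed.netloc),
                    parsed.path, parsed.params, parsed.query, parsed.fragment,
                ))
    return " ".join(tokens)

print(mask_url_passwords("https://user:secret@example.com/repo.git"))
# -> https://user:********@example.com/repo.git
```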
@@ -80,7 +80,12 @@ class PackageManager(object):

    def upgrade_pip(self):
        result = self._install(
            select_for_platform(windows='pip{}', linux='pip{}').format(self.get_pip_version()), "--upgrade")
            *select_for_platform(
                windows=self.get_pip_versions(),
                linux=self.get_pip_versions()
            ),
            "--upgrade"
        )
        packages = self.run_with_env(('list',), output=True).splitlines()
        # p.split is ('pip', 'x.y.z')
        pip = [p.split() for p in packages if len(p.split()) == 2 and p.split()[0] == 'pip']
@@ -157,15 +162,26 @@ class PackageManager(object):
    def set_pip_version(cls, version):
        if not version:
            return
        version = version.replace(' ', '')
        if ('=' in version) or ('~' in version) or ('<' in version) or ('>' in version):
            cls._pip_version = version

        if isinstance(version, (list, tuple)):
            versions = version
        else:
            cls._pip_version = "=="+version
            versions = [version]

        cls._pip_version = []
        for version in versions:
            version = version.strip()
            if ('=' in version) or ('~' in version) or ('<' in version) or ('>' in version):
                cls._pip_version.append(version)
            else:
                cls._pip_version.append("==" + version)

    @classmethod
    def get_pip_version(cls):
        return cls._pip_version or ''
    def get_pip_versions(cls, pip="pip", wrap=''):
        return [
            (wrap + pip + version + wrap)
            for version in cls._pip_version or [pip]
        ]
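`set_pip_version` now accepts a list as well as a single string, and `get_pip_versions` expands each entry into a pip requirement specifier. A hypothetical illustration of the normalization rule (any entry without a comparison operator gets an implicit `==`):

```python
def normalize_pip_versions(version):
    # accept a single string or a list/tuple, as set_pip_version does
    versions = version if isinstance(version, (list, tuple)) else [version]
    normalized = []
    for v in versions:
        v = v.strip()
        # keep explicit specifiers, otherwise pin exactly
        normalized.append(v if any(c in v for c in '=~<>') else '==' + v)
    return normalized

print(normalize_pip_versions(["<20.2 ; python_version < '3.10'", "22.3"]))
# -> ["<20.2 ; python_version < '3.10'", '==22.3']
```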

    def get_cached_venv(self, requirements, docker_cmd, python_version, cuda_version, destination_folder):
        # type: (Dict, Optional[Union[dict, str]], Optional[str], Optional[str], Path) -> Optional[Path]
@@ -176,8 +192,13 @@ class PackageManager(object):
        if not self._get_cache_manager():
            return None

        keys = self._generate_reqs_hash_keys(requirements, docker_cmd, python_version, cuda_version)
        return self._get_cache_manager().copy_cached_entry(keys, destination_folder)
        try:
            keys = self._generate_reqs_hash_keys(requirements, docker_cmd, python_version, cuda_version)
            return self._get_cache_manager().copy_cached_entry(keys, destination_folder)
        except Exception as ex:
            print("WARNING: Failed accessing venvs cache at {}: {}".format(destination_folder, ex))
            print("WARNING: Skipping venv cache - folder not accessible!")
            return None

    def add_cached_venv(
            self,
@@ -194,9 +215,15 @@ class PackageManager(object):
        """
        if not self._get_cache_manager():
            return
        keys = self._generate_reqs_hash_keys(requirements, docker_cmd, python_version, cuda_version)
        return self._get_cache_manager().add_entry(
            keys=keys, source_folder=source_folder, exclude_sub_folders=exclude_sub_folders)

        try:
            keys = self._generate_reqs_hash_keys(requirements, docker_cmd, python_version, cuda_version)
            return self._get_cache_manager().add_entry(
                keys=keys, source_folder=source_folder, exclude_sub_folders=exclude_sub_folders)
        except Exception as ex:
            print("WARNING: Failed accessing venvs cache at {}: {}".format(source_folder, ex))
            print("WARNING: Skipping venv cache - folder not accessible!")
            return None

    def get_cache_folder(self):
        # type: () -> Optional[Path]
@@ -213,6 +240,13 @@ class PackageManager(object):
            return
        return self._get_cache_manager().get_last_copied_entry()

    def is_cached_enabled(self):
        if not self._cache_manager:
            cache_folder = ENV_VENV_CACHE_PATH.get() or self.session.config.get(self._config_cache_folder, None)
            if not cache_folder:
                return False
        return True

    @classmethod
    def _generate_reqs_hash_keys(cls, requirements_list, docker_cmd, python_version, cuda_version):
        # type: (Union[Dict, List[Dict]], Optional[Union[dict, str]], Optional[str], Optional[str]) -> List[str]
@@ -257,12 +291,19 @@ class PackageManager(object):

    def _get_cache_manager(self):
        if not self._cache_manager:
            cache_folder = ENV_VENV_CACHE_PATH.get() or self.session.config.get(self._config_cache_folder, None)
            if not cache_folder:
            cache_folder = None
            try:
                cache_folder = ENV_VENV_CACHE_PATH.get() or self.session.config.get(self._config_cache_folder, None)
                if not cache_folder:
                    return None

                max_entries = int(self.session.config.get(self._config_cache_max_entries, 10))
                free_space_threshold = float(self.session.config.get(self._config_cache_free_space_threshold, 0))
                self._cache_manager = FolderCache(
                    cache_folder, max_cache_entries=max_entries, min_free_space_gb=free_space_threshold)
            except Exception as ex:
                print("WARNING: Failed accessing venvs cache at {}: {}".format(cache_folder, ex))
                print("WARNING: Skipping venv cache - folder not accessible!")
                return None

            max_entries = int(self.session.config.get(self._config_cache_max_entries, 10))
            free_space_threshold = float(self.session.config.get(self._config_cache_free_space_threshold, 0))
            self._cache_manager = FolderCache(
                cache_folder, max_cache_entries=max_entries, min_free_space_gb=free_space_threshold)
        return self._cache_manager
@@ -135,7 +135,12 @@ class CondaAPI(PackageManager):
        if self.env_read_only:
            print('Conda environment in read-only mode, skipping pip upgrade.')
            return ''
        return self._install(select_for_platform(windows='pip{}', linux='pip{}').format(self.pip.get_pip_version()))
        return self._install(
            *select_for_platform(
                windows=self.pip.get_pip_versions(),
                linux=self.pip.get_pip_versions()
            )
        )

    def create(self):
        """

@@ -50,6 +50,14 @@ class ExternalRequirements(SimpleSubstitution):
                print("No need to reinstall \'{}\' from VCS, "
                      "the exact same version is already installed".format(req.name))
                continue

            if not req.pip_new_version:
                # noinspection PyBroadException
                try:
                    freeze_base = PackageManager.out_of_scope_freeze() or dict(pip=[])
                except Exception:
                    freeze_base = dict(pip=[])

            req_line = self._add_vcs_credentials(req, session)

            # if we have older pip version we have to make sure we replace back the package name with the
@@ -58,14 +66,14 @@ class ExternalRequirements(SimpleSubstitution):
                PackageManager.out_of_scope_install_package(req_line, "--no-deps")
                # noinspection PyBroadException
                try:
                    freeze_post = PackageManager.out_of_scope_freeze() or ''
                    freeze_post = PackageManager.out_of_scope_freeze() or dict(pip=[])
                    package_name = list(set(freeze_post['pip']) - set(freeze_base['pip']))
                    if package_name and package_name[0] not in self.post_install_req_lookup:
                        self.post_install_req_lookup[package_name[0]] = req.req.line
                except Exception:
                    pass

            # no need to force reinstall, pip will always rebuilt if the package comes from git
            # no need to force reinstall, pip will always rebuild if the package comes from git
            # and make sure the required packages are installed (if they are not it will install them)
            if not PackageManager.out_of_scope_install_package(req_line):
                raise ValueError("Failed installing GIT/HTTPs package \'{}\'".format(req_line))
@@ -86,16 +94,18 @@ class ExternalRequirements(SimpleSubstitution):
            vcs_url = vcs_url[::-1].replace(fragment[::-1], '', 1)[::-1]
        # remove ssh:// or git:// prefix for git detection and credentials
        scheme = ''
        full_vcs_url = vcs_url
        if vcs_url and (vcs_url.startswith('ssh://') or vcs_url.startswith('git://')):
            scheme = 'ssh://'  # notice git:// is actually ssh://
            vcs_url = vcs_url[6:]

        from ..repo import Git
        vcs = Git(session=session, url=vcs_url, location=None, revision=None)
        vcs = Git(session=session, url=full_vcs_url, location=None, revision=None)
        vcs._set_ssh_url()
        new_req_line = 'git+{}{}{}'.format(
            '' if scheme and '://' in vcs.url else scheme,
            vcs.url_with_auth, fragment
            vcs_url if session.config.get('agent.force_git_ssh_protocol', None) else vcs.url_with_auth,
            fragment
        )
        if new_req_line != req_line:
            furl_line = furl(new_req_line)

@@ -69,6 +69,11 @@ class PoetryConfig:
            path = path.replace(':'+sys.base_prefix, ':'+sys.real_prefix, 1)
            kwargs['env']['PATH'] = path

        if self.session and self.session.config:
            extra_args = self.session.config.get("agent.package_manager.poetry_install_extra_args", None)
            if extra_args:
                args = args + tuple(extra_args)

        if check_if_command_exists("poetry"):
            argv = Argv("poetry", *args)
        else:
@@ -1,3 +1,4 @@
import re
from typing import Text

from .base import PackageManager
@@ -11,13 +12,14 @@ class PriorityPackageRequirement(SimpleSubstitution):

    def __init__(self, *args, **kwargs):
        super(PriorityPackageRequirement, self).__init__(*args, **kwargs)
        self._replaced_packages = {}
        # check if we need to replace the packages:
        priority_packages = self.config.get('agent.package_manager.priority_packages', None)
        if priority_packages:
            self.__class__.name = priority_packages
            self.__class__.name = [p.lower() for p in priority_packages]
        priority_optional_packages = self.config.get('agent.package_manager.priority_optional_packages', None)
        if priority_optional_packages:
            self.__class__.optional_package_names = priority_optional_packages
            self.__class__.optional_package_names = [p.lower() for p in priority_optional_packages]

    def match(self, req):
        # match both Cython & cython
@@ -28,7 +30,9 @@ class PriorityPackageRequirement(SimpleSubstitution):
        Replace a requirement
        :raises: ValueError if version is pre-release
        """
        if req.name in self.optional_package_names:
            self._replaced_packages[req.name] = req.line

        if req.name.lower() in self.optional_package_names:
            # noinspection PyBroadException
            try:
                if PackageManager.out_of_scope_install_package(str(req)):
@@ -39,6 +43,41 @@ class PriorityPackageRequirement(SimpleSubstitution):
        PackageManager.out_of_scope_install_package(str(req))
        return Text(req)

    def replace_back(self, list_of_requirements):
        """
        :param list_of_requirements: {'pip': ['a==1.0', ]}
        :return: {'pip': ['a==1.0', ]}
        """
        # if we replaced setuptools, it means someone requested it, and since freeze will not contain it,
        # we need to add it manually
        if not self._replaced_packages or "setuptools" not in self._replaced_packages:
            return list_of_requirements

        try:
            for k, lines in list_of_requirements.items():
                # k is either pip/conda
                if k not in ('pip', 'conda'):
                    continue
                for i, line in enumerate(lines):
                    if not line or line.lstrip().startswith('#'):
                        continue
                    parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
                    if not parts:
                        continue
                    # if we found setuptools, do nothing
                    if parts[0] == "setuptools":
                        return list_of_requirements

            # if we are here it means we have not found setuptools
            # we should add it:
            if "pip" in list_of_requirements:
                list_of_requirements["pip"] = [self._replaced_packages["setuptools"]] + list_of_requirements["pip"]

        except Exception as ex:  # noqa
            return list_of_requirements

        return list_of_requirements
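The `replace_back` logic above re-adds `setuptools` to the freeze output when it was substituted during install, since `pip freeze` will not list it. A condensed, runnable sketch of that re-injection; the helper name `ensure_setuptools` is hypothetical:

```python
import re

def ensure_setuptools(freeze, replaced):
    # freeze: {'pip': ['a==1.0', ...]}, replaced: {'setuptools': 'setuptools==59.5.0'}
    if "setuptools" not in replaced:
        return freeze
    for line in freeze.get("pip", []):
        parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
        if parts and parts[0] == "setuptools":
            return freeze  # already listed, nothing to do
    freeze["pip"] = [replaced["setuptools"]] + freeze["pip"]
    return freeze

print(ensure_setuptools({"pip": ["attrs==21.4.0"]}, {"setuptools": "setuptools==59.5.0"}))
# -> {'pip': ['setuptools==59.5.0', 'attrs==21.4.0']}
```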
class PackageCollectorRequirement(SimpleSubstitution):
    """

@@ -7,13 +7,15 @@ from furl import furl
import urllib.parse
from operator import itemgetter
from html.parser import HTMLParser
from typing import Text, Optional
from typing import Text, Optional, Dict

import attr
import requests

import six
from .requirements import SimpleSubstitution, FatalSpecsResolutionError, SimpleVersion, MarkerRequirement
from .requirements import (
    SimpleSubstitution, FatalSpecsResolutionError, SimpleVersion, MarkerRequirement,
    compare_version_rules, )
from ...external.requirements_parser.requirement import Requirement

OS_TO_WHEEL_NAME = {"linux": "linux_x86_64", "windows": "win_amd64"}
@@ -53,17 +55,16 @@ class PytorchWheel(object):
    python = attr.ib(type=str, converter=lambda x: str(x).replace(".", ""))
    torch_version = attr.ib(type=str, converter=fix_version)

    url_template = (
        "http://download.pytorch.org/whl/"
        "{0.cuda_version}/torch-{0.torch_version}-cp{0.python}-cp{0.python}m{0.unicode}-{0.os_name}.whl"
    )
    url_template_prefix = "http://download.pytorch.org/whl/"
    url_template = "{0.cuda_version}/torch-{0.torch_version}" \
                   "-cp{0.python}-cp{0.python}m{0.unicode}-{0.os_name}.whl"

    def __attrs_post_init__(self):
        self.unicode = "u" if self.python.startswith("2") else ""

    def make_url(self):
        # type: () -> Text
        return self.url_template.format(self)
        return (self.url_template_prefix + self.url_template).format(self)
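Splitting `url_template` into a prefix plus a path lets a configuration key override only the host part, for example to point at an internal mirror. A self-contained sketch of the same `str.format` pattern; the class name `Wheel` and the sample values are illustrative:

```python
class Wheel:
    # same prefix/path split as the hunk above; values here are the defaults
    url_template_prefix = "http://download.pytorch.org/whl/"
    url_template = "{0.cuda_version}/torch-{0.torch_version}-cp{0.python}-cp{0.python}m{0.unicode}-{0.os_name}.whl"

    def __init__(self, cuda_version, torch_version, python, os_name):
        self.cuda_version, self.torch_version = cuda_version, torch_version
        self.python, self.os_name = python, os_name
        self.unicode = "u" if python.startswith("2") else ""

    def make_url(self):
        return (self.url_template_prefix + self.url_template).format(self)

print(Wheel("cu101", "1.4.0", "37", "linux_x86_64").make_url())
# -> http://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl
```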

class PytorchResolutionError(FatalSpecsResolutionError):
@@ -170,6 +171,10 @@ class PytorchRequirement(SimpleSubstitution):
    name = "torch"
    packages = ("torch", "torchvision", "torchaudio", "torchcsprng", "torchtext")

    extra_index_url_template = 'https://download.pytorch.org/whl/cu{}/'
    nightly_extra_index_url_template = 'https://download.pytorch.org/whl/nightly/cu{}/'
    torch_index_url_lookup = {}

    def __init__(self, *args, **kwargs):
        os_name = kwargs.pop("os_override", None)
        super(PytorchRequirement, self).__init__(*args, **kwargs)
@@ -183,6 +188,26 @@ class PytorchRequirement(SimpleSubstitution):
        self._fix_setuptools = None
        self.exceptions = []
        self._original_req = []
        # allow override pytorch lookup pages
        if self.config.get("agent.package_manager.extra_index_url_template", None):
            self.extra_index_url_template = \
                self.config.get("agent.package_manager.extra_index_url_template", None)
        if self.config.get("agent.package_manager.nightly_extra_index_url_template", None):
            self.nightly_extra_index_url_template = \
                self.config.get("agent.package_manager.nightly_extra_index_url_template", None)
        # allow override pytorch lookup pages
        if self.config.get("agent.package_manager.torch_page", None):
            SimplePytorchRequirement.page_lookup_template = \
                self.config.get("agent.package_manager.torch_page", None)
        if self.config.get("agent.package_manager.torch_nightly_page", None):
            SimplePytorchRequirement.nightly_page_lookup_template = \
                self.config.get("agent.package_manager.torch_nightly_page", None)
        if self.config.get("agent.package_manager.torch_url_template_prefix", None):
            PytorchWheel.url_template_prefix = \
                self.config.get("agent.package_manager.torch_url_template_prefix", None)
        if self.config.get("agent.package_manager.torch_url_template", None):
            PytorchWheel.url_template = \
                self.config.get("agent.package_manager.torch_url_template", None)

    def _init_python_ver_cuda_ver(self):
        if self.cuda is None:
@@ -369,7 +394,8 @@ class PytorchRequirement(SimpleSubstitution):
            print('Trying PyTorch CUDA version {} support'.format(torch_url_key))

        # fix broken pytorch setuptools incompatibility
        if closest_matched_version and SimpleVersion.compare_versions(closest_matched_version, "<", "1.11.0"):
        if req.name == "torch" and closest_matched_version and \
                SimpleVersion.compare_versions(closest_matched_version, "<", "1.11.0"):
            self._fix_setuptools = "setuptools < 59"

        if not url:
@@ -449,6 +475,44 @@ class PytorchRequirement(SimpleSubstitution):
        return self.match_version(req, base).replace(" ", "\n")

    def replace(self, req):
        # check if package is already installed with system packages
        self.validate_python_version()

        # try to check if we can just use the new index URL, if we do not we will revert to old method
        try:
            extra_index_url = self.get_torch_index_url(self.cuda_version)
            if extra_index_url:
                # if the torch version cannot be above 1.11, we need to fix setuptools
                try:
                    if req.name == "torch" and not compare_version_rules(req.specs, [(">=", "1.11.0")]):
                        self._fix_setuptools = "setuptools < 59"
                except Exception:  # noqa
                    pass
                # now we just need to add the correct extra index url for the cuda version
                self.set_add_install_extra_index(extra_index_url[0])

                if req.specs and len(req.specs) == 1 and req.specs[0][0] == "==":
                    # remove any +cu extension and let pip resolve that
                    # and add .* if we have 3 parts version to deal with nvidia container 'a' version
                    # i.e. "1.13.0" -> "1.13.0.*" so it should match preinstalled "1.13.0a0+936e930"
                    spec_3_parts = req.format_specs(num_parts=3)
                    spec_max3_parts = req.format_specs(max_num_parts=3)
                    if spec_3_parts == spec_max3_parts and not spec_max3_parts.endswith("*"):
                        line = "{} {}.*".format(req.name, spec_max3_parts)
                    else:
                        line = "{} {}".format(req.name, spec_max3_parts)

                    if req.marker:
                        line += " ; {}".format(req.marker)
                else:
                    # return the original line
                    line = req.line

                return line

        except Exception:  # noqa
            pass

        try:
            new_req = self._replace(req)
            if new_req:
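The `.*` widening in the hunk above exists so that a pinned `torch==1.13.0` also matches vendor builds such as `1.13.0a0+936e930` preinstalled in NVIDIA containers. Shown on a bare string, as a simplification of the `format_specs`-based logic:

```python
# A pinned 3-part "==" spec gains ".*" so pip also accepts local/vendor builds.
spec = "==1.13.0"
widened = spec + ".*" if spec.count(".") == 2 and not spec.endswith("*") else spec
print("torch " + widened)  # -> torch ==1.13.0.*  (also matches 1.13.0a0+936e930)
```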
@@ -512,7 +576,7 @@ class PytorchRequirement(SimpleSubstitution):
            for i, line in enumerate(lines):
                if not line or line.lstrip().startswith('#'):
                    continue
                parts = [p for p in re.split('\s|=|\.|<|>|~|!|@|#', line) if p]
                parts = [p for p in re.split(r'\s|=|\.|<|>|~|!|@|#', line) if p]
                if not parts:
                    continue
                for req, new_req in self._original_req:
@@ -544,6 +608,51 @@ class PytorchRequirement(SimpleSubstitution):
            return MarkerRequirement(Requirement.parse(self._fix_setuptools))
        return None

    @classmethod
    def get_torch_index_url(cls, cuda_version, nightly=False):
        # noinspection PyBroadException
        try:
            cuda = int(cuda_version)
        except Exception:
            cuda = 0

        if nightly:
            for c in range(cuda, max(-1, cuda-15), -1):
                # then try the nightly builds, it might be there...
                torch_url = cls.nightly_extra_index_url_template.format(c)
                # noinspection PyBroadException
                try:
                    if requests.get(torch_url, timeout=10).ok:
                        print('Torch nightly CUDA {} index page found'.format(c))
                        cls.torch_index_url_lookup[c] = torch_url
                        return cls.torch_index_url_lookup[c], c
                except Exception:
                    pass
            return

        # first check if key is valid
        if cuda in cls.torch_index_url_lookup:
            return cls.torch_index_url_lookup[cuda], cuda

        # then try a new cuda version page
        for c in range(cuda, max(-1, cuda-15), -1):
            torch_url = cls.extra_index_url_template.format(c)
            # noinspection PyBroadException
            try:
                if requests.get(torch_url, timeout=10).ok:
                    print('Torch CUDA {} index page found'.format(c))
                    cls.torch_index_url_lookup[c] = torch_url
                    return cls.torch_index_url_lookup[c], c
            except Exception:
                pass

        keys = sorted(cls.torch_index_url_lookup.keys(), reverse=True)
        for k in keys:
            if k <= cuda:
                return cls.torch_index_url_lookup[k], k
        # return default - zero
        return cls.torch_index_url_lookup[0], 0
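A standalone sketch of the CUDA index-page probe in `get_torch_index_url`: starting from the requested CUDA version, it walks downward until a published index page answers. Network access is assumed and `find_torch_index` is an illustrative name, not the agent's API:

```python
import requests

def find_torch_index(cuda_version, template="https://download.pytorch.org/whl/cu{}/"):
    # walk down from the requested CUDA version until a published index answers
    for c in range(cuda_version, max(-1, cuda_version - 15), -1):
        url = template.format(c)
        try:
            if requests.get(url, timeout=10).ok:
                return url, c
        except requests.RequestException:
            pass
    return None, None

# find_torch_index(117) probes cu117, cu116, ... and returns the first live index page
```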

MAP = {
    "windows": {
        "cuda100": {

@@ -11,9 +11,10 @@ from os import path
from typing import Text, List, Type, Optional, Tuple, Dict

from pathlib2 import Path
from pyhocon import ConfigTree
from clearml_agent.external.pyhocon import ConfigTree

import six
from six.moves.urllib.parse import unquote
import logging
from clearml_agent.definitions import PIP_EXTRA_INDICES
from clearml_agent.helper.base import (
@@ -99,7 +100,8 @@ class MarkerRequirement(object):
            return ','.join(starmap(operator.add, self.specs))

        op, version = self.specs[0]
        for v in self._sub_versions_pep440:
        # noinspection PyProtectedMember
        for v in SimpleVersion._sub_versions_pep440:
            version = version.replace(v, '.')
        if num_parts:
            version = (version.strip('.').split('.') + ['0'] * num_parts)[:max_num_parts]
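The padding step at the end of the hunk above normalizes short versions to a fixed number of parts. For example, with `num_parts=3`:

```python
# "1.13" padded to three parts -> "1.13.0"
version, num_parts, max_num_parts = "1.13", 3, 3
parts = (version.strip('.').split('.') + ['0'] * num_parts)[:max_num_parts]
print('.'.join(parts))  # -> 1.13.0
```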
@@ -175,11 +177,13 @@ class MarkerRequirement(object):
            return
        local_path = Path(self.uri[len("file://"):])
        if not local_path.exists():
            line = self.line
            if self.remove_local_file_ref():
                # print warning
                logging.getLogger(__name__).warning(
                    'Local file not found [{}], references removed'.format(line))
        local_path = Path(unquote(self.uri)[len("file://"):])
        if not local_path.exists():
            line = self.line
            if self.remove_local_file_ref():
                # print warning
                logging.getLogger(__name__).warning(
                    'Local file not found [{}], references removed'.format(line))


class SimpleVersion:
@@ -275,6 +279,8 @@ class SimpleVersion:
            return version_a_key > version_b_key
        if op == '<':
            return version_a_key < version_b_key
        if op == '!=':
            return version_a_key != version_b_key
        raise ValueError('Unrecognized comparison operator [{}]'.format(op))

    @classmethod
@@ -359,7 +365,7 @@ def compare_version_rules(specs_a, specs_b):
    # specs_a/b are a list of tuples: [('==', '1.2.3'), ] or [('>=', '1.2'), ('<', '1.3')]
    # section definition:
    class Section(object):
        def __init__(self, left=None, left_eq=False, right=None, right_eq=False):
        def __init__(self, left="-999999999", left_eq=False, right="999999999", right_eq=False):
            self.left, self.left_eq, self.right, self.right_eq = left, left_eq, right, right_eq
    # first create a list of in/out sections for each spec
    # >, >= are left rule
@@ -431,6 +437,11 @@ class RequirementSubstitution(object):

    _pip_extra_index_url = PIP_EXTRA_INDICES

    @classmethod
    def set_add_install_extra_index(cls, extra_index_url):
        if extra_index_url not in cls._pip_extra_index_url:
            cls._pip_extra_index_url.append(extra_index_url)

    def __init__(self, session):
        # type: (Session) -> ()
        self._session = session

@@ -1,3 +1,4 @@
from tempfile import mkdtemp
from typing import Text

from furl import furl
@@ -20,7 +21,16 @@ class RequirementsTranslator(object):
        config = session.config
        self.cache_dir = cache_dir or Path(config["agent.pip_download_cache.path"]).expanduser().as_posix()
        self.enabled = config["agent.pip_download_cache.enabled"]
        Path(self.cache_dir).mkdir(parents=True, exist_ok=True)
        # noinspection PyBroadException
        try:
            Path(self.cache_dir).mkdir(parents=True, exist_ok=True)
        except Exception:
            temp_cache_folder = mkdtemp(prefix='pip_download_cache.')
            print("Failed creating pip download cache folder at `{}` reverting to `{}`".format(
                self.cache_dir, temp_cache_folder))
            self.cache_dir = temp_cache_folder
            Path(self.cache_dir).mkdir(parents=True, exist_ok=True)

        self.config = Config()
        self.pip = SystemPip(interpreter=interpreter, session=self._session)
        self._translate_back = {}
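The constructor above now degrades gracefully when the configured pip download cache cannot be created. A minimal sketch of the fallback-to-tempdir pattern, using the standard-library `pathlib` instead of `pathlib2`; the function name `ensure_cache_dir` is hypothetical:

```python
from pathlib import Path
from tempfile import mkdtemp

def ensure_cache_dir(preferred):
    try:
        Path(preferred).mkdir(parents=True, exist_ok=True)
        return preferred
    except Exception:
        # fall back to a fresh temporary folder instead of failing the run
        fallback = mkdtemp(prefix='pip_download_cache.')
        print("Failed creating `{}`, reverting to `{}`".format(preferred, fallback))
        return fallback
```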

@@ -16,7 +16,6 @@ from typing import Union, Text, Sequence, Any, TypeVar, Callable

import psutil
from furl import furl
from future.builtins import super
from pathlib2 import Path

import six
@@ -26,7 +25,7 @@ from clearml_agent.helper.base import bash_c, is_windows_platform, select_for_pl
PathLike = Union[Text, Path]


def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False):
def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False, raise_error=False):
    try:
        output = (
            subprocess.check_output(
@@ -38,10 +37,16 @@ def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False):
            .strip()
        )
    except subprocess.CalledProcessError:
        if raise_error:
            raise
        output = None
    return output if not strip or not output else output.strip()
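With the new `raise_error` flag, callers such as `_cleanup_old_pods` can surface the `CalledProcessError` instead of receiving `None`. A reduced sketch of the control flow; `get_output` is an illustrative stand-in for the real helper:

```python
import subprocess

def get_output(cmd, raise_error=False):
    try:
        return subprocess.check_output(cmd, shell=True, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError:
        if raise_error:
            raise   # new behavior: let the caller log/handle the failure
        return None  # old behavior: fail silently

print(get_output("exit 1"))               # -> None
# get_output("exit 1", raise_error=True)  # -> raises CalledProcessError
```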


def stringify_bash_output(value):
    return '' if not value else (value if isinstance(value, str) else value.decode('utf-8'))


def terminate_process(pid, timeout=10., ignore_zombie=True, include_children=False):
    # noinspection PyBroadException
    try:
@@ -112,10 +117,11 @@ def terminate_all_child_processes(pid=None, timeout=10., include_parent=True):


def get_docker_id(docker_cmd_contains):
    # noinspection PyBroadException
    try:
        containers_running = get_bash_output(cmd='docker ps --no-trunc --format \"{{.ID}}: {{.Command}}\"')
        for docker_line in containers_running.split('\n'):
            parts = docker_line.split(':')
            parts = docker_line.split(':', 1)
            if docker_cmd_contains in parts[-1]:
                # we found our docker, return it
                return parts[0]
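The `split(':', 1)` change matters because a container command may itself contain colons (URIs, timestamps); only the first colon separates the ID from the command. A quick check with made-up sample data:

```python
line = "3f2a1b: python train.py --uri s3://bucket/data"
assert line.split(':', 1)[-1] == " python train.py --uri s3://bucket/data"  # new: full command kept
assert line.split(':')[-1] == "//bucket/data"                               # old: match text was truncated
```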

@@ -1,7 +1,11 @@
import abc
import os
import re
import shutil
import stat
import subprocess
import sys
import tempfile
from distutils.spawn import find_executable
from hashlib import md5
from os import environ
@@ -23,7 +27,7 @@ from clearml_agent.helper.base import (
    rm_tree,
    ExecutionInfo,
    normalize_path,
    create_file_if_not_exists,
    create_file_if_not_exists, safe_remove_file,
)
from clearml_agent.helper.os.locks import FileLock
from clearml_agent.helper.process import DEVNULL, Argv, PathLike, COMMAND_SUCCESS
@@ -118,6 +122,13 @@ class VCS(object):
        """
        return self.add_auth(self.session.config, self.url)

    @property
    def url_without_auth(self):
        """
        Return URL without configured user/password
        """
        return self.add_auth(self.session.config, self.url, reset_auth=True)

    @abc.abstractmethod
    def executable_name(self):
        """
@@ -349,7 +360,9 @@ class VCS(object):
        If not in debug mode, filter VCS password from output.
        """
        self._set_ssh_url()
        clone_command = ("clone", self.url_with_auth, self.location) + self.clone_flags
        # if we are on linux no need for the full auth url because we use GIT_ASKPASS
        url = self.url_without_auth if self._use_ask_pass else self.url_with_auth
        clone_command = ("clone", url, self.location) + self.clone_flags
        # clone all branches regardless of when we want to later checkout
        # if branch:
        #     clone_command += ("-b", branch)
@@ -357,34 +370,35 @@ class VCS(object):
            self.call(*clone_command)
            return

        def normalize_output(result):
            """
            Returns result string without user's password.
            NOTE: ``self.get_stderr``'s result might or might not have the same type as ``e.output`` in case of error.
            """
            string_type = (
                ensure_text
                if isinstance(result, six.text_type)
                else ensure_binary
            )
            return result.replace(
                string_type(self.url),
                string_type(furl(self.url).remove(password=True).tostr()),
            )

        def print_output(output):
            print(ensure_text(output))

        try:
            print_output(normalize_output(self.get_stderr(*clone_command)))
            self._print_output(self._normalize_output(self.get_stderr(*clone_command)))
        except subprocess.CalledProcessError as e:
            # In Python 3, subprocess.CalledProcessError has a `stderr` attribute,
            # but since stderr is redirected to `subprocess.PIPE` it will appear in the usual `output` attribute
            if e.output:
                e.output = normalize_output(e.output)
                print_output(e.output)
                e.output = self._normalize_output(e.output)
                self._print_output(e.output)
            raise

    def _normalize_output(self, result):
        """
        Returns result string without user's password.
        NOTE: ``self.get_stderr``'s result might or might not have the same type as ``e.output`` in case of error.
        """
        string_type = (
            ensure_text
            if isinstance(result, six.text_type)
            else ensure_binary
        )
        return result.replace(
            string_type(self.url),
            string_type(furl(self.url).remove(password=True).tostr()),
        )

    @staticmethod
    def _print_output(output):
        print(ensure_text(output))

    def checkout(self):
        # type: () -> None
        """
@@ -473,10 +487,12 @@ class VCS(object):
        return Argv(self.executable_name, *argv)

    @classmethod
    def add_auth(cls, config, url):
    def add_auth(cls, config, url, reset_auth=False):
        """
        Add username and password to URL if missing from URL and present in config.
        Does not modify ssh URLs.

        :param reset_auth: If true remove the user/pass from the URL (default False)
        """
        try:
            parsed_url = furl(url)
@@ -493,7 +509,10 @@ class VCS(object):
            and config_pass
            and (not config_domain or config_domain.lower() == parsed_url.host)
        ):
            parsed_url.set(username=config_user, password=config_pass)
            if reset_auth:
                parsed_url.set(username=None, password=None)
            else:
                parsed_url.set(username=config_user, password=config_pass)
        return parsed_url.url
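`add_auth` gains a `reset_auth` flag that strips credentials instead of injecting them, backing the new `url_without_auth` property. A stripped-down sketch using furl, as the code above does; this is a standalone function under simplified assumptions, not the agent's classmethod:

```python
from furl import furl

def add_auth(url, user=None, password=None, reset_auth=False):
    parsed = furl(url)
    if reset_auth:
        parsed.set(username=None, password=None)   # remove any embedded credentials
    elif user and password:
        parsed.set(username=user, password=password)
    return parsed.url

print(add_auth("https://user:token@github.com/org/repo.git", reset_auth=True))
# -> https://github.com/org/repo.git
```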

    @abc.abstractmethod
@@ -531,6 +550,10 @@ class Git(VCS):

    def __init__(self, *args, **kwargs):
        super(Git, self).__init__(*args, **kwargs)

        self._use_ask_pass = False if not self.session.config.get('agent.enable_git_ask_pass', None) \
            else sys.platform == "linux"

        try:
            self.call("config", "--global", "--replace-all", "safe.directory", "*", cwd=self.location)
        except:  # noqa
@@ -558,6 +581,66 @@ class Git(VCS):
    def pull(self):
        self.call("fetch", "--all", "--recurse-submodules", cwd=self.location)

    def _git_pass_auth_wrapper(self, func, *args, **kwargs):
        try:
            url_with_auth = furl(self.url_with_auth)
            password = url_with_auth.password if url_with_auth else None
            username = url_with_auth.username if url_with_auth else None
        except:  # noqa
            password = None
            username = None

        # if this is not linux or we do not have a password, just run as is
        if not self._use_ask_pass or not password or not username:
            return func(*args, **kwargs)

        # create the password file
        fp, pass_file = tempfile.mkstemp(prefix='clearml_git_', suffix='.sh')
        os.close(fp)
        with open(pass_file, 'wt') as f:
            # get first letter only (username / password are the argument options)
            # then echo the correct information
            f.writelines([
                '#!/bin/bash\n',
                'c="$1"\n',
                'c="${c%"${c#?}"}"\n',
                'if [ "$c" == "u" ] || [ "$c" == "U" ]; then echo "{}"; else echo "{}"; fi\n'.format(
                    username.replace('"', '\\"'), password.replace('"', '\\"')
                )
            ])
        # mark executable
        st = os.stat(pass_file)
        os.chmod(pass_file, st.st_mode | stat.S_IEXEC)
        # let GIT use it
        self.COMMAND_ENV["GIT_ASKPASS"] = pass_file
        # call git command
        try:
            ret = func(*args, **kwargs)
        finally:
            # delete temp password file
            self.COMMAND_ENV.pop("GIT_ASKPASS", None)
            safe_remove_file(pass_file)

        return ret

    def get_stderr(self, *argv, **kwargs):
        """
        Wrapper with git password authentication
        """
        return self._git_pass_auth_wrapper(super(Git, self).get_stderr, *argv, **kwargs)

    def call_with_stdin(self, *argv, **kwargs):
        """
        Wrapper with git password authentication
        """
        return self._git_pass_auth_wrapper(super(Git, self).call_with_stdin, *argv, **kwargs)

    def call(self, *argv, **kwargs):
        """
        Wrapper with git password authentication
        """
        return self._git_pass_auth_wrapper(super(Git, self).call, *argv, **kwargs)
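A self-contained sketch of the GIT_ASKPASS trick used by `_git_pass_auth_wrapper`: git invokes the helper once with a prompt starting with "U" (username) and once with "P" (password), so the script only needs the first letter of its argument. Linux and bash are assumed, and `write_askpass` is an illustrative name:

```python
import os
import stat
import tempfile

def write_askpass(username: str, password: str) -> str:
    fd, path = tempfile.mkstemp(prefix="askpass_", suffix=".sh")
    os.close(fd)
    with open(path, "wt") as f:
        # keep only the first character of git's prompt, then echo the right value
        f.write('#!/bin/bash\nc="$1"\nc="${c%"${c#?}"}"\n'
                'if [ "$c" == "u" ] || [ "$c" == "U" ]; then echo "{}"; else echo "{}"; fi\n'
                .format(username.replace('"', '\\"'), password.replace('"', '\\"')))
    os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)  # make it executable
    return path  # export GIT_ASKPASS=<path> before calling git, remove afterwards
```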

    def checkout(self):  # type: () -> None
        """
        Checkout repository at specified revision

@@ -82,7 +82,7 @@ class ResourceMonitor(object):
        if not worker_tags and ENV_WORKER_TAGS.get():
            worker_tags = shlex.split(ENV_WORKER_TAGS.get())
        self._worker_tags = worker_tags
        if os.environ.get('NVIDIA_VISIBLE_DEVICES') == 'none':
        if Session.get_nvidia_visible_env() == 'none':
            # NVIDIA_VISIBLE_DEVICES set to none, marks cpu_only flag
            # active_gpus == False means no GPU reporting
            self._active_gpus = False
@@ -92,10 +92,10 @@ class ResourceMonitor(object):
            # None means no filtering, report all gpus
            self._active_gpus = None
            try:
                active_gpus = os.environ.get('NVIDIA_VISIBLE_DEVICES', '') or \
                    os.environ.get('CUDA_VISIBLE_DEVICES', '')
                if active_gpus:
                    self._active_gpus = [int(g.strip()) for g in active_gpus.split(',')]
                active_gpus = Session.get_nvidia_visible_env()
                # None means no filtering, report all gpus
                if active_gpus and active_gpus != "all":
                    self._active_gpus = [g.strip() for g in str(active_gpus).split(',')]
            except Exception:
                pass

@@ -263,7 +263,7 @@ class ResourceMonitor(object):
        gpu_stat = self._gpustat.new_query()
        for i, g in enumerate(gpu_stat.gpus):
            # only monitor the active gpu's, if none were selected, monitor everything
            if self._active_gpus and i not in self._active_gpus:
            if self._active_gpus and str(i) not in self._active_gpus:
                continue
            stats["gpu_temperature_{:d}".format(i)] = g["temperature.gpu"]
            stats["gpu_utilization_{:d}".format(i)] = g["utilization.gpu"]

@@ -22,7 +22,7 @@ WORKER_ARGS = {
        'help': 'git username for repository access',
    },
    '--git-pass': {
        'help': 'git password for repository access',
        'help': 'git password (personal access tokens) for repository access',
    },
    '--log-level': {
        'help': 'SDK log level',

@@ -10,8 +10,8 @@ from typing import Any, Callable

import attr
from pathlib2 import Path
from pyhocon import ConfigFactory, HOCONConverter, ConfigTree

from clearml_agent.external.pyhocon import ConfigFactory, HOCONConverter, ConfigTree
from clearml_agent.backend_api.session import Session as _Session, Request
from clearml_agent.backend_api.session.client import APIClient
from clearml_agent.backend_config.defs import LOCAL_CONFIG_FILE_OVERRIDE_VAR, LOCAL_CONFIG_FILES
@@ -19,6 +19,7 @@ from clearml_agent.definitions import ENVIRONMENT_CONFIG, ENV_TASK_EXECUTE_AS_US
from clearml_agent.errors import APIError
from clearml_agent.helper.base import HOCONEncoder
from clearml_agent.helper.process import Argv
from clearml_agent.helper.docker_args import DockerArgsSanitizer
from .version import __version__

POETRY = "poetry"
@@ -76,7 +77,7 @@ class Session(_Session):

        cpu_only = kwargs.get('cpu_only')
        if cpu_only:
            os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = 'none'
            Session.set_nvidia_visible_env('none')

        if kwargs.get('gpus') and not os.environ.get('KUBERNETES_SERVICE_HOST') \
                and not os.environ.get('KUBERNETES_PORT'):
@@ -85,7 +86,7 @@ class Session(_Session):
                os.environ.pop('CUDA_VISIBLE_DEVICES', None)
                os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
            else:
                os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
            Session.set_nvidia_visible_env(kwargs.get('gpus'))

        if kwargs.get('only_load_config'):
            from clearml_agent.backend_api.config import load
@@ -105,7 +106,7 @@ class Session(_Session):
            if os.path.exists(os.path.expanduser(os.path.expandvars(f))):
                self._config_file = f
                break
        self.api_client = APIClient(session=self, api_version="2.5")
        self._api_client = None
        # HACK make sure we have python version to execute,
        # if nothing was specified, use the one that runs us
        def_python = ConfigValue(self.config, "agent.default_python")
@@ -132,7 +133,7 @@ class Session(_Session):
        # override with environment variables
        # cuda_version & cudnn_version are overridden with os.environ here, and normalized in the next section
        for config_key, env_config in ENVIRONMENT_CONFIG.items():
            # check if the propery is of a list:
            # check if the property is of a list:
            if config_key.endswith('.0'):
                if all(not i.get() for i in env_config.values()):
                    continue
@@ -166,6 +167,16 @@ class Session(_Session):
        if not kwargs.get('only_load_config'):
            self.create_cache_folders()

    @property
    def api_client(self):
        if self._api_client is None:
            self._api_client = APIClient(session=self, api_version="2.5")
        return self._api_client

    @api_client.setter
    def api_client(self, value):
        self._api_client = value

    @staticmethod
    def get_logger(name):
        logger = logging.getLogger(name)
@@ -232,7 +243,8 @@ class Session(_Session):
    def print_configuration(
            self,
            remove_secret_keys=("secret", "pass", "token", "account_key", "contents"),
            skip_value_keys=("environment", )
            skip_value_keys=("environment", ),
            docker_args_sanitize_keys=("extra_docker_arguments", ),
    ):
        # remove all the secrets from the print
        def recursive_remove_secrets(dictionary, secret_keys=(), empty_keys=()):
@@ -249,6 +261,8 @@ class Session(_Session):
            if isinstance(dictionary.get(k, None), dict):
                recursive_remove_secrets(dictionary[k], secret_keys=secret_keys, empty_keys=empty_keys)
            elif isinstance(dictionary.get(k, None), (list, tuple)):
                if k in (docker_args_sanitize_keys or []):
                    dictionary[k] = DockerArgsSanitizer.sanitize_docker_command(self, dictionary[k])
                for item in dictionary[k]:
                    if isinstance(item, dict):
                        recursive_remove_secrets(item, secret_keys=secret_keys, empty_keys=empty_keys)
@@ -256,7 +270,7 @@ class Session(_Session):
        config = deepcopy(self.config.to_dict())
        # remove the env variable, it's not important
        config.pop('env', None)
        if remove_secret_keys or skip_value_keys:
        if remove_secret_keys or skip_value_keys or docker_args_sanitize_keys:
            recursive_remove_secrets(config, secret_keys=remove_secret_keys, empty_keys=skip_value_keys)
        # remove logging.loggers.urllib3.level from the print
        try:
@@ -288,7 +302,7 @@ class Session(_Session):
    def get(self, service, action, version=None, headers=None,
            data=None, json=None, async_enable=False, **kwargs):
        return self._manual_request(service=service, action=action,
                                    version=version, method="get", headers=headers,
                                    version=version, method=Request.def_method, headers=headers,
                                    data=data, async_enable=async_enable,
                                    json=json or kwargs)

@@ -299,7 +313,7 @@ class Session(_Session):
                                    data=data, async_enable=async_enable,
                                    json=json or kwargs)

    def _manual_request(self, service, action, version=None, method="get", headers=None,
    def _manual_request(self, service, action, version=None, method=Request.def_method, headers=None,
                        data=None, json=None, async_enable=False, **kwargs):

        res = self.send_request(service=service, action=action,
@@ -327,6 +341,23 @@ class Session(_Session):
    def command(self, *args):
        return Argv(*args, log=self.get_logger(Argv.__module__))

    @staticmethod
    def set_nvidia_visible_env(gpus):
        if not gpus:
            gpus = ""
        visible_env = gpus.replace(".", ":") if isinstance(gpus, str) else \
            ','.join(str(g).replace(".", ":") for g in gpus)

        os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = visible_env

    @staticmethod
    def get_nvidia_visible_env():
        visible_env = os.environ.get('NVIDIA_VISIBLE_DEVICES') or os.environ.get('CUDA_VISIBLE_DEVICES')
        if visible_env is None:
            return None
        visible_env = str(visible_env).replace(":", ".")
        return visible_env
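`set_nvidia_visible_env` / `get_nvidia_visible_env` translate between the agent's internal dotted MIG notation (`0.1`) and the `NVIDIA_VISIBLE_DEVICES` colon notation (`0:1`). The mapping in isolation; `to_env` is an illustrative name:

```python
def to_env(gpus):
    gpus = gpus or ""
    return gpus.replace(".", ":") if isinstance(gpus, str) else \
        ",".join(str(g).replace(".", ":") for g in gpus)

assert to_env("0.1,2") == "0:1,2"            # MIG slice 0.1 plus plain GPU 2
assert to_env([0, 1]) == "0,1"               # lists are joined with commas
assert "0:1,2".replace(":", ".") == "0.1,2"  # the reverse mapping in get_nvidia_visible_env
```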

@attr.s
class TrainsAgentLogger(object):

@@ -1 +1 @@
__version__ = '1.2.4rc1'
__version__ = '1.5.2'

@@ -57,8 +57,8 @@ agent {
    # supported options: pip, conda, poetry
    type: pip,

    # specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
    pip_version: "<20.2",
    # specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
    pip_version: "<21",

    # virtual environment inherits packages from the system
    system_site_packages: false,

@@ -1,16 +1,36 @@
#!/bin/sh
#!/bin/bash +x

CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-$TRAINS_FILES_HOST}
if [ -n "$SHUTDOWN_IF_NO_ACCESS_KEY" ] && [ -z "$CLEARML_API_ACCESS_KEY" ] && [ -z "$TRAINS_API_ACCESS_KEY" ]; then
  echo "CLEARML_API_ACCESS_KEY was not provided, service will not be started"
  exit 0
fi

export CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-$TRAINS_FILES_HOST}

if [ -z "$CLEARML_FILES_HOST" ]; then
  CLEARML_HOST_IP=${CLEARML_HOST_IP:-${TRAINS_HOST_IP:-$(curl -s https://ifconfig.me/ip)}}
fi

CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-${TRAINS_FILES_HOST:-"http://$CLEARML_HOST_IP:8081"}}
CLEARML_WEB_HOST=${CLEARML_WEB_HOST:-${TRAINS_WEB_HOST:-"http://$CLEARML_HOST_IP:8080"}}
CLEARML_API_HOST=${CLEARML_API_HOST:-${TRAINS_API_HOST:-"http://$CLEARML_HOST_IP:8008"}}
export CLEARML_FILES_HOST=${CLEARML_FILES_HOST:-${TRAINS_FILES_HOST:-"http://$CLEARML_HOST_IP:8081"}}
export CLEARML_WEB_HOST=${CLEARML_WEB_HOST:-${TRAINS_WEB_HOST:-"http://$CLEARML_HOST_IP:8080"}}
export CLEARML_API_HOST=${CLEARML_API_HOST:-${TRAINS_API_HOST:-"http://$CLEARML_HOST_IP:8008"}}

echo $CLEARML_FILES_HOST $CLEARML_WEB_HOST $CLEARML_API_HOST 1>&2

python3 -m pip install -q -U "clearml-agent${CLEARML_AGENT_UPDATE_VERSION:-$TRAINS_AGENT_UPDATE_VERSION}"
clearml-agent daemon --services-mode --queue services --create-queue --docker "${CLEARML_AGENT_DEFAULT_BASE_DOCKER:-$TRAINS_AGENT_DEFAULT_BASE_DOCKER}" --cpu-only ${CLEARML_AGENT_EXTRA_ARGS:-$TRAINS_AGENT_EXTRA_ARGS}
if [[ "$CLEARML_AGENT_UPDATE_VERSION" =~ ^[0-9]{1,3}\.[0-9]{1,3}(\.[0-9]{1,3}([a-zA-Z]{1,3}[0-9]{1,3})?)?$ ]]
then
  CLEARML_AGENT_UPDATE_VERSION="==$CLEARML_AGENT_UPDATE_VERSION"
fi

DAEMON_OPTIONS=${CLEARML_AGENT_DAEMON_OPTIONS:---services-mode --create-queue}
QUEUES=${CLEARML_AGENT_QUEUES:-services}

if [ -z "$CLEARML_AGENT_NO_UPDATE" ]; then
  if [ -n "$CLEARML_AGENT_UPDATE_REPO" ]; then
    python3 -m pip install -q -U $CLEARML_AGENT_UPDATE_REPO
  else
    python3 -m pip install -q -U "clearml-agent${CLEARML_AGENT_UPDATE_VERSION:-$TRAINS_AGENT_UPDATE_VERSION}"
  fi
fi

clearml-agent daemon $DAEMON_OPTIONS --queue $QUEUES --docker "${CLEARML_AGENT_DEFAULT_BASE_DOCKER:-$TRAINS_AGENT_DEFAULT_BASE_DOCKER}" --cpu-only ${CLEARML_AGENT_EXTRA_ARGS:-$TRAINS_AGENT_EXTRA_ARGS}
@@ -13,6 +13,15 @@ api {
|
||||
}
|
||||
|
||||
agent {
|
||||
# unique name of this worker, if None, created based on hostname:process_id
|
||||
# Override with os environment: CLEARML_WORKER_ID
|
||||
# worker_id: "clearml-agent-machine1:gpu0"
|
||||
worker_id: ""
|
||||
|
||||
# worker name, replaces the hostname when creating a unique name for this worker
|
||||
# Override with os environment: CLEARML_WORKER_NAME
|
||||
# worker_name: "clearml-agent-machine1"
|
||||
worker_name: ""
|
||||
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
|
||||
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
|
||||
# **Notice**: GitHub personal token is equivalent to password, you can put it directly into `git_pass`
|
||||
@@ -20,11 +29,11 @@ agent {
|
||||
# https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
|
||||
# https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
|
||||
# https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
|
||||
git_user=""
|
||||
git_pass=""
|
||||
# git_user: ""
|
||||
# git_pass: ""
|
||||
# Limit credentials to a single domain, for example: github.com,
|
||||
# all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
|
||||
git_host=""
|
||||
# git_host: ""
|
||||
|
||||
# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
|
||||
force_git_ssh_protocol: false
|
||||
@@ -33,16 +42,6 @@ agent {
|
||||
# Force a specific SSH username when converting http to ssh links (the default username is 'git')
|
||||
# force_git_ssh_user: git
|
||||
|
||||
# unique name of this worker, if None, created based on hostname:process_id
|
||||
# Overridden with os environment: CLEARML_WORKER_ID
|
||||
# worker_id: "clearml-agent-machine1:gpu0"
|
||||
worker_id: ""
|
||||
|
||||
# worker name, replaces the hostname when creating a unique name for this worker
|
||||
# Overridden with os environment: CLEARML_WORKER_NAME
|
||||
# worker_name: "clearml-agent-machine1"
|
||||
worker_name: ""
|
||||
|
||||
# Set the python version to use when creating the virtual environment and launching the experiment
|
||||
# Example values: "/usr/bin/python3" or "/usr/local/bin/python3.6"
|
||||
# The default is the python executing the clearml_agent
|
||||
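As the comments above note, both identifiers can be overridden per process through `CLEARML_WORKER_ID` / `CLEARML_WORKER_NAME` instead of editing the file. A minimal sketch; the queue name and GPU index are placeholders:

```bash
# Run one daemon with an explicit worker identity; "default" and gpu 0 are
# illustrative values, not defaults mandated by the configuration.
CLEARML_WORKER_ID="clearml-agent-machine1:gpu0" \
CLEARML_WORKER_NAME="clearml-agent-machine1" \
  clearml-agent daemon --queue default --gpus 0
```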
`clearml.conf` reference: repository handling and package manager:

```diff
@@ -51,6 +50,22 @@ agent {
     # specific python version and the system supports multiple python the agent will use the requested python version)
     # ignore_requested_python_version: true
 
+    # Force the root folder of the git repository (instead of the working directory) into the PYTHONPATH
+    # default false, only the working directory will be added to the PYTHONPATH
+    # force_git_root_python_path: false
+
+    # if set, use GIT_ASKPASS to pass user/pass when cloning / fetching repositories
+    # it solves passing user/token to git submodules.
+    # this is a safer way to ensure multiple users using the same repository will
+    # not accidentally leak credentials
+    # Only supported on Linux systems, it will be the default in future releases
+    # enable_git_ask_pass: false
+
+    # in docker mode, if the container's entrypoint automatically activated a virtual environment
+    # use the activated virtual environment and install everything there
+    # set to False to disable, and always create a new venv inheriting from the system_site_packages
+    # docker_use_activated_venv: true
+
     # select python package manager:
     # currently supported: pip, conda and poetry
     # if "pip" or "conda" are used, the agent installs the required packages
@@ -63,10 +78,11 @@ agent {
         # supported options: pip, conda, poetry
         type: pip,
 
-        # specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
-        # pip_version: "<20"
+        # specify pip version to use (examples "<20.2", "==19.3.1", "", empty string will install the latest version)
+        # pip_version: ["<20.2 ; python_version < '3.10'", "<22.3 ; python_version >= '3.10'"]
         # specify poetry version to use (examples "<2", "==1.1.1", "", empty string will install the latest version)
         # poetry_version: "<2",
+        # poetry_install_extra_args: ["-v"]
 
         # virtual environment inherits packages from system
         system_site_packages: false,
```
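Each entry in the `pip_version` list is a regular requirement specifier carrying a PEP 508 environment marker, letting the agent pin a different pip per interpreter. The equivalent manual commands, for illustration only:

```bash
# pip skips a requirement whose environment marker does not match the running
# interpreter, so exactly one of these pins takes effect per Python version.
python3 -m pip install "pip<20.2 ; python_version < '3.10'"
python3 -m pip install "pip<22.3 ; python_version >= '3.10'"
```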
`clearml.conf` reference: caching, SSH mounts, and CUDA detection:

```diff
@@ -113,7 +129,7 @@ agent {
         # minimum required free space to allow for cache entry, disable by passing 0 or negative value
         free_space_threshold_gb: 2.0
         # unmark to enable virtual environment caching
-        # path: ~/.clearml/venvs-cache
+        path: ~/.clearml/venvs-cache
     },
 
     # cached git clone folder
@@ -136,6 +152,12 @@ agent {
     },
 
     translate_ssh: true,
 
+    # set "disable_ssh_mount: true" to disable the automatic mount of ~/.ssh folder into the docker containers
+    # default is false, automatically mounts ~/.ssh
+    # Must be set to True if using "clearml-session" with this agent!
+    # disable_ssh_mount: false
+
     # reload configuration file every daemon execution
     reload_config: false,
@@ -219,7 +241,7 @@ agent {
     enable_task_env: false
 
     # CUDA versions used for Conda setup & solving PyTorch wheel packages
-    # it Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
+    # Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
     # cuda_version: 10.1
     # cudnn_version: 7.6
@@ -245,7 +267,7 @@ agent {
     # pip_cache: "/root/.cache/pip"
     # poetry_cache: "/root/.cache/pypoetry"
     # vcs_cache: "/root/.clearml/vcs-cache"
-    # venv_build: "/root/.clearml/venvs-builds"
+    # venv_build: "~/.clearml/venvs-builds"
     # pip_download: "/root/.clearml/pip-download-cache"
     # }
```
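As the comment states, the detected CUDA/cuDNN versions can be forced via the `CUDA_VERSION` / `CUDNN_VERSION` environment variables when auto-detection guesses wrong; a sketch with illustrative values:

```bash
# Pin the CUDA/cuDNN versions the agent reports and resolves PyTorch wheels
# against; "11.2", "8.1" and the queue name are placeholders.
CUDA_VERSION=11.2 CUDNN_VERSION=8.1 clearml-agent daemon --queue default
```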
`clearml.conf` reference: S3 credentials chain:

```diff
@@ -325,6 +347,11 @@ sdk {
             key: ""
             secret: ""
             region: ""
+            # Or enable credentials chain to let Boto3 pick the right credentials.
+            # This includes picking credentials from environment variables,
+            # credential file and IAM role using metadata service.
+            # Refer to the latest Boto3 docs
+            use_credentials_chain: false
 
             credentials: [
                 # specifies key/secret credentials to use when handling s3 urls (read or write)
```
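With `use_credentials_chain: true`, the `key`/`secret` fields can stay empty and Boto3 resolves credentials itself, e.g. from the standard AWS environment variables, the shared credentials file, or an IAM role. A sketch using environment variables; all values are placeholders:

```bash
# Boto3's default chain picks these up without any key/secret in clearml.conf
# (assumes use_credentials_chain: true); the values are placeholders.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret"
export AWS_DEFAULT_REGION="us-east-1"
clearml-agent daemon --queue default
```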
`clearml.conf` reference: top-level `environment` and `files` sections:

```diff
@@ -418,42 +445,46 @@ sdk {
 
     # Apply top-level environment section from configuration into os.environ
     apply_environment: true
-    # Top-level environment section is in the form of:
-    # environment {
-    #   key: value
-    #   ...
-    # }
-    # and is applied to the OS environment as `key=value` for each key/value pair
 
     # Apply top-level files section from configuration into local file system
     apply_files: true
-    # Top-level files section allows auto-generating files at designated paths with a predefined contents
-    # and target format. Options include:
-    #   contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
-    #   format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
-    #           base64-encoded contents string, otherwise ignored
-    #   path: the target file's path, may include ~ and inplace env vars
-    #   target_format: format used to encode contents before writing into the target file. Supported values are json,
-    #                  yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
-    #   overwrite: overwrite the target file in case it exists. Default is true.
-    #
-    # Example:
-    #   files {
-    #     myfile1 {
-    #       contents: "The quick brown fox jumped over the lazy dog"
-    #       path: "/tmp/fox.txt"
-    #     }
-    #     myjsonfile {
-    #       contents: {
-    #         some {
-    #           nested {
-    #             value: [1, 2, 3, 4]
-    #           }
-    #         }
-    #       }
-    #       path: "/tmp/test.json"
-    #       target_format: json
-    #     }
-    #   }
 }
 
+# Environment section (top-level) is applied to the OS environment as `key=value` for each key/value pair
+# * enable/disable with `agent.apply_environment` OR `sdk.apply_environment`
+# Example:
+#
+# environment {
+#   key_a: value_a
+#   key_b: value_b
+# }
+
+# Files section (top-level) allows auto-generating files at designated paths with
+# predefined content and target format.
+# * enable/disable with `agent.apply_files` OR `sdk.apply_files`
+# Files content options include:
+#   contents: the target file's content, typically a string (or any base type int/float/list/dict etc.)
+#   format: a custom format for the contents. Currently supported value is `base64` to automatically decode a
+#           base64-encoded contents string, otherwise ignored
+#   path: the target file's path, may include ~ and inplace env vars
+#   target_format: format used to encode contents before writing into the target file. Supported values are json,
+#                  yaml, yml and bytes (in which case the file will be written in binary mode). Default is text mode.
+#   overwrite: overwrite the target file in case it exists. Default is true.
+#
+# Example:
+# files {
+#   myfile1 {
+#     contents: "The quick brown fox jumped over the lazy dog"
+#     path: "/tmp/fox.txt"
+#   }
+#   myjsonfile {
+#     contents: {
+#       some {
+#         nested {
+#           value: [1, 2, 3, 4]
+#         }
+#       }
+#     }
+#     path: "/tmp/test.json"
+#     target_format: json
+#   }
+# }
```
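When `apply_files` is enabled, the `files` example above would be materialized at startup; the expected on-disk result, approximately (the exact JSON formatting is an assumption):

```bash
# Illustrative check of what the files example writes; exact JSON whitespace
# and key ordering may differ.
cat /tmp/fox.txt
# The quick brown fox jumped over the lazy dog
cat /tmp/test.json
# {"some": {"nested": {"value": [1, 2, 3, 4]}}}
```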
`requirements.txt`:

```diff
@@ -1,17 +1,15 @@
-attrs>=18.0,<20.4.0
+attrs>=18.0,<23.0.0
 enum34>=0.9,<1.2.0 ; python_version < '3.6'
 furl>=2.0.0,<2.2.0
 future>=0.16.0,<0.19.0
-jsonschema>=2.6.0,<3.3.0
+jsonschema>=2.6.0,<5.0.0
-pathlib2>=2.3.0,<2.4.0
-psutil>=3.4.2,<5.9.0
-pyhocon>=0.3.38,<0.4.0
-pyparsing>=2.0.3,<2.5.0
+psutil>=3.4.2,<5.10.0
+pyparsing>=2.0.3,<3.1.0
 python-dateutil>=2.4.2,<2.9.0
-pyjwt>=1.6.4,<2.1.0
-PyYAML>=3.12,<5.5.0
-requests>=2.20.0,<2.26.0
-six>=1.13.0,<1.16.0
+pyjwt>=2.4.0,<2.7.0
+PyYAML>=3.12,<6.1
+requests>=2.20.0,<2.29.0
+six>=1.13.0,<1.17.0
 typing>=3.6.4,<3.8.0 ; python_version < '3.5'
 urllib3>=1.21.1,<1.27.0
 virtualenv>=16,<21
```
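To sanity-check an existing environment against the loosened upper bounds, the updated pins can be installed and the dependency tree verified; the subset of packages below is arbitrary:

```bash
# Install a few of the updated pins, then let pip verify that the installed
# set has no conflicting requirements.
python3 -m pip install "pyjwt>=2.4.0,<2.7.0" "PyYAML>=3.12,<6.1" "requests>=2.20.0,<2.29.0"
python3 -m pip check
```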