Compare commits

...

104 Commits

Author SHA1 Message Date
allegroai
090327234a Version bump to v0.16.3 2020-12-22 20:18:30 +02:00
allegroai
3620c3a12d Update the PyJWT requirement (v2.0.0 breaks the interface) as well as other requirements' version constraints 2020-12-22 20:18:14 +02:00
allegroai
9a3f950ac6 Fix queue name to conform to the k8s naming standard 2020-12-13 16:21:29 +02:00
allegroai
0b36cb0f85 Change k8s pod naming scheme to include queue name 2020-12-10 14:19:19 +02:00
allegroai
dd42423482 Version bump to v0.16.2 2020-12-10 13:02:19 +02:00
allegroai
69eb25db1f Fix running trains-agent from a conda environment - conda.sh not found in the first conda PATH match 2020-12-10 09:53:18 +02:00
allegroai
a41ea52f87 Add multiple packages support 2020-12-10 09:52:00 +02:00
allegroai
259113c989 Add PackageCollectorRequirement to allow multiple entries of the same package 2020-12-06 12:16:56 +02:00
allegroai
1afa3a3914 Add torchcsprng and torchtext to PyTorch resolving. Improve debug prints on auto cuda version resolving. 2020-12-06 12:15:12 +02:00
allegroai
448e23825c Fix: a requirements dict with a null pip entry should be treated as None, in which case we install from requirements.txt 2020-12-06 12:14:22 +02:00
allegroai
b0c0f41f62 Allow zero-context diffs (useful when blind-patching a repository) 2020-12-06 12:13:28 +02:00
allegroai
d2c5fb6512 Add K8s glue example --gateway-address option, setting the properties/k8s-gateway-address user property on all Tasks 2020-12-06 12:12:42 +02:00
allegroai
b89cf4ec23 version bump 2020-11-29 23:17:50 +02:00
allegroai
74b646af9e Add passing TRAINS_DOCKER_IMAGE into the docker for interactive sessions 2020-11-29 23:16:40 +02:00
allegroai
0cf485f7a9 Improve k8s nvidia container integration 2020-11-26 01:15:49 +02:00
allegroai
ea63e4f66e Add --ssh-server-port to k8s glue service 2020-11-26 01:15:20 +02:00
allegroai
58eb5fbd5f Fix torch CUDA 11.1 support 2020-11-26 01:14:36 +02:00
allegroai
a8c543ef7b Fix nvidia pytorch dockers support 2020-11-25 16:45:09 +02:00
allegroai
64e198a57a Fix nvidia docker support on some linux distros (SUSE) 2020-11-25 16:44:37 +02:00
allegroai
de332b9e6b Document '--stop' usage 2020-11-19 12:36:58 +02:00
allegroai
60eeff292d version bump 2020-11-11 17:11:51 +02:00
allegroai
52f30b306a Fix git diffs ending with an empty line causing a corrupt diff-apply message 2020-11-11 17:11:28 +02:00
allegroai
6df0f81ca0 Fix ValueError in str.startswith() when uid is None. Fix str.split() (should operate on the filename itself, not the full path). 2020-11-11 16:32:47 +02:00
allegroai
40b3c1502d Add extra_bash_init_script to k8s glue. Default config is the raw config file (not created at runtime) 2020-11-11 16:31:25 +02:00
allegroai
a61265effe Improve detection of the conda executable 2020-11-11 16:29:50 +02:00
allegroai
92efea6b76 Add agent.package_manager.force_repo_requirements_txt. If True, the Task's "Installed Packages" are ignored and only the repository's requirements.txt is used 2020-11-11 16:29:00 +02:00
allegroai
216b3e2179 Allow specifying the cudatoolkit version in "installed packages" when using Conda as the package manager (trains issue #229) 2020-10-30 10:06:02 +02:00
allegroai
293a92f486 Improve k8s glue: add --template-yaml 2020-10-23 01:28:22 +03:00
allegroai
6bad2b5352 Fix support for non-ASCII git diffs 2020-10-23 01:27:59 +03:00
allegroai
a09a638b9c Improve k8s glue layer 2020-10-22 18:09:56 +03:00
allegroai
24f57270ed version bump 2020-10-22 18:09:23 +03:00
allegroai
1b7964ce98 Add k8s option to select an external trains.conf file for the pod itself 2020-10-21 19:04:38 +03:00
allegroai
5a510882b8 Ignore the SSH_AUTH_SOCK environment variable. Only check whether git_user/git_pass are configured; if they are not, leave the links as they are 2020-10-21 19:02:29 +03:00
allegroai
601ed03198 Add support for k8s pod custom user properties 2020-10-20 23:48:02 +03:00
allegroai
90fe4570b9 Show k8s pod number in task's User Properties configuration section 2020-10-20 23:27:04 +03:00
allegroai
92fc8e838f Add K8s glue support for a limited number of services exposing ports 2020-10-20 14:17:30 +03:00
allegroai
89a3020c5e Fix ubuntu/debian support by making sure not to ask for input (fix tzdata install) 2020-10-15 23:32:17 +03:00
allegroai
fc3e47b67e Add suppress_carriage_return to the documentation
Add docker_preprocess_bash_script to allow adding a preprocessing bash script
Fix multiple python versions installed in the same docker by selecting the highest python version found inside the docker
Fix conda_env_as_base_docker not being set to False in docker mode
2020-10-15 23:31:01 +03:00
allegroai
b2a80ca314 Fix Trains examples references 2020-10-15 23:28:53 +03:00
allegroai
14655f19a0 Fix conda PYTHONPATH (point only to code, not to venv) 2020-10-15 23:26:58 +03:00
allegroai
47092c47db Fix apply git diff from submodule only 2020-10-15 23:26:52 +03:00
allegroai
8e6fce8d63 Add conda support for read-only pre-built environments (pass the conda folder as docker_cmd on the Task).
Fix conda restore of a pre-built tar.gz file; fix the conda prefix by calling conda-unpack on the unzipped conda env.
2020-10-15 23:25:57 +03:00
allegroai
3c514e3418 Make sure TRAINS_AGENT_K8S_HOST_MOUNT is used only once per mount 2020-10-15 23:24:51 +03:00
allegroai
8a425b100b Fix k8s glue script to use the trains-agent default docker script 2020-10-15 23:24:21 +03:00
allegroai
eb942cfedd Add agent.package_manager.conda_env_as_base_docker allowing "docker_cmd" to contain link to a full pre-packaged conda environment (conda-pack outputs a tar.gz). Use TRAINS_CONDA_ENV_PACKAGE to specify conda tar.gz file. 2020-10-15 23:23:46 +03:00
Allegro AI
0a7fc06108 Merge pull request #31 from eliorc/master
Fix broken links in README.md
2020-10-14 16:13:40 +03:00
Elior Cohen
0ae35afa76 📝 Broken links in README.md 2020-10-14 10:43:33 +03:00
allegroai
a2156e73bf Fix conda pip freeze to be consistent with trains 0.16.3 2020-10-11 11:25:35 +03:00
allegroai
9fe77f3c28 Fix conda environment support for the trains 0.16.3 full env. Add agent.package_manager.conda_full_env_update to allow conda to update the requirements back (default is false, to preserve previous behavior) 2020-10-11 11:24:52 +03:00
allegroai
6f078afafd Add Requirement.clone() 2020-10-11 11:21:49 +03:00
allegroai
15f4aa613e Suppress "\r" when reading the current chunk of a file. Add agent.suppress_carriage_return (default True) to preserve previous behavior. 2020-10-11 11:21:08 +03:00
allegroai
7cd9fa6c41 Version bump to v0.16.1 2020-10-05 18:27:07 +03:00
allegroai
234d5fac2c When forcing the SSH protocol, only enforce it on git_host if provided; otherwise apply it everywhere 2020-10-05 18:26:21 +03:00
allegroai
6cbfb96ff8 Rename git_domain to git_host 2020-10-05 11:25:03 +03:00
allegroai
6e54e55c31 Add agent.force_git_ssh_port to control https-to-ssh link conversion for non-standard SSH ports 2020-10-04 19:42:44 +03:00
allegroai
3ff85b7b85 Replace the package version back for conda and pip 2020-10-04 19:41:26 +03:00
allegroai
5640489f57 Replace torch version on pre-installed local file 2020-10-04 19:40:39 +03:00
allegroai
8135a6facf Add agent.git_domain setting to limit git credential usage to a specific domain (env var TRAINS_AGENT_GIT_DOMAIN is also supported) 2020-10-04 19:40:04 +03:00
allegroai
b6ae4f211d Fix "package @ " should processed by us (pip will not test pre-installed version of the package compared with the link) 2020-10-04 19:38:33 +03:00
allegroai
a56f032ec4 Fix torch support to not change back the same link 2020-10-04 19:37:12 +03:00
allegroai
075736de20 Translate downloaded URL back to original link when new pip version is installed (otherwise we end up with file:///... links) 2020-10-04 19:36:14 +03:00
allegroai
d8543c892e When new pip version is installed, no need to install git packages twice (pip freeze will detect the correct git link version) 2020-10-04 19:35:26 +03:00
allegroai
ca0870b048 Allow parsing of "package @ scheme://link" lines in requirements 2020-10-04 19:34:32 +03:00
allegroai
c7a739fafa Add support for detecting new pip version (20+) supporting @ in requirements 2020-10-04 19:33:52 +03:00
allegroai
7170296162 Remove warning on '.' (same as an empty working directory) 2020-10-04 19:32:48 +03:00
allegroai
3bed0ef33c Add protection against bad file name parsing in git diff apply 2020-10-04 19:31:48 +03:00
allegroai
d419fa1e4f Update torch version after using system pre-installed version 2020-10-04 19:29:47 +03:00
allegroai
31a56c71bd Add preliminary agent uptime/downtime support 2020-09-29 19:34:51 +03:00
allegroai
28f47419b0 Fix incorrect check for spaces in the current execution folder (only check cache folders) 2020-09-15 20:26:02 +03:00
allegroai
6a24da2849 Add post_packages and post_optional_packages to control packages installed after all the rest (e.g. horovod)
Rename CythonReq to PriorityPackageRequirement and HorovodReq to PostRequirement
2020-09-15 20:20:55 +03:00
allegroai
782668fd21 Add sdk.metrics.plot_max_num_digits to reduce plot storage size 2020-09-05 16:37:17 +03:00
allegroai
aaf8d802e7 Update documentation 2020-09-05 16:37:17 +03:00
allegroai
ca89a1e322 Fix pre-installed packages being ignored when installing a git package wheel. Reinstalling a git+http link is enough to make sure all requirements are met/installed (trains issue #196) 2020-09-05 16:37:17 +03:00
allegroai
121dec2a62 Version bump to v0.16.0 2020-08-10 17:28:00 +03:00
allegroai
4aacf9005e Fix GPU Windows monitoring support (Trains Issue #177) 2020-08-10 08:07:51 +03:00
allegroai
6b333202e9 Sync generated conf file with latest Trains 2020-08-08 14:44:45 +03:00
allegroai
ce6831368f Fix GPU monitoring on Windows machines 2020-08-08 14:43:25 +03:00
allegroai
e4111c830b Fix GIT user/pass in requirements and support for '-e git+http' lines 2020-07-30 14:30:23 +03:00
allegroai
52c1772b04 Embed requirement_parser in trains-agent instead of using it as a dependency. Fix requirement_parser to support 'package @ git+http' lines 2020-07-30 14:29:37 +03:00
allegroai
699d13bbb3 Fix: a task status change to 'queued' should also never happen during Task runtime 2020-07-14 23:42:11 +03:00
allegroai
2c8d7d3d9a Fix --debug to set all specified loggers to DEBUG
Add set_urllib_log_level; in debug mode, set the urllib log level to DEBUG
2020-07-11 01:45:46 +03:00
allegroai
b13cc1e8e7 Add error message when Trains API Server is not accessible on startup 2020-07-11 01:44:45 +03:00
allegroai
17d2bf2a3e Change daemon --stop without any specific flag to terminate agents in lexicographic worker-id order 2020-07-11 01:43:54 +03:00
allegroai
94997f9c88 Add daemon --order-fairness for round-robin queue pulling
Add daemon --stop to terminate a running agent (assuming all other arguments are the same)
Clean up all log files on termination unless executed with --debug
2020-07-11 01:42:56 +03:00
allegroai
c6d998c4df Add terminate process and rmtree utilities 2020-07-11 01:40:50 +03:00
allegroai
f8ea445339 Fix docker to use UTF-8 encoding, so prints won't break it 2020-07-11 01:40:14 +03:00
allegroai
712efa208b version bump 2020-07-06 21:09:21 +03:00
allegroai
09b6b6a9de Fix non-root docker image usage
Fix broken trains-agent build
Improve support for dockers with a preinstalled conda env
Improve trains-agent docker spin-up
2020-07-06 21:09:11 +03:00
allegroai
98ff9a50e6 Changed agent.docker_init_bash_script default value in comment 2020-07-06 21:05:55 +03:00
allegroai
1f4d358316 Changed default docker image from nvidia/cuda to "nvidia/cuda:10.1-runtime-ubuntu18.04" to support cudnn frameworks (TF) 2020-07-02 01:35:57 +03:00
allegroai
f693fa165c Fix .git-credentials and .gitconfig mapping into docker
Add agent.docker_init_bash_script to allow finer control over the docker startup script
2020-07-02 01:33:13 +03:00
allegroai
c43084825c Version bump to v0.15.1 2020-06-21 23:23:44 +03:00
allegroai
f1abee91dd Add FORCE_LOCAL_TRAINS_AGENT_WHEEL to force the install of local trains agent wheel into the docker image 2020-06-21 23:23:26 +03:00
allegroai
c6b04edc34 version bump 2020-06-18 01:55:30 +03:00
allegroai
50b847f4f7 Add trains-agent dockerfile 2020-06-18 01:55:24 +03:00
allegroai
1f53a06299 Add agent.force_git_ssh_protocol option to force all git links to ssh:// (issue #16)
Add git user/pass credentials for pip git packages (git+http and git+ssh) (issue #22)
2020-06-18 01:55:14 +03:00
allegroai
257dd95401 Add a warning on --gpus without a detected CUDA version (see issue #24) 2020-06-18 01:52:58 +03:00
allegroai
1736d205bb Documentation 2020-06-18 00:31:44 +03:00
allegroai
6fef58df6c Embed jsonmodels 2.4 into trains-agent 2020-06-18 00:30:40 +03:00
allegroai
473a8de8bb Fix trains-agent init (max two verification retries, then print error) 2020-06-11 15:39:38 +03:00
Allegro AI
ff6272f48f Merge pull request #23 from H4dr1en/patch-1
remove six and pathlib2 dependencies from setup.py
2020-06-05 19:20:09 +03:00
H4dr1en
1b5bcebd10 remove six and pathlib2 dependencies from setup.py 2020-06-05 18:01:35 +02:00
Allegro AI
c4344d3afd Update README.md 2020-06-02 01:02:34 +03:00
Allegro AI
45a44b087a Update README.md 2020-06-02 00:58:52 +03:00
58 changed files with 4085 additions and 509 deletions

README.md (129 lines changed)

@@ -1,4 +1,4 @@
# TRAINS Agent
# Allegro Trains Agent
## Deep Learning DevOps For Everyone - Now supporting all platforms (Linux, macOS, and Windows)
"All the Deep-Learning DevOps your research needs, and then some... Because ain't nobody got time for that"
@@ -10,27 +10,27 @@
### Help improve Trains by filling our 2-min [user survey](https://allegro.ai/lp/trains-user-survey/)
**TRAINS Agent is an AI experiment cluster solution.**
**Trains Agent is an AI experiment cluster solution.**
It is a zero configuration fire-and-forget execution agent, which combined with trains-server provides a full AI cluster solution.
**Full AutoML in 5 steps**
1. Install the [TRAINS server](https://github.com/allegroai/trains-agent) (or use our [open server](https://demoapp.trains.allegro.ai))
2. `pip install trains-agent` ([install](#installing-the-trains-agent) the TRAINS agent on any GPU machine: on-premises / cloud / ...)
3. Add [TRAINS](https://github.com/allegroai/trains) to your code with just 2 lines & run it once (on your machine / laptop)
1. Install the [Trains Server](https://github.com/allegroai/trains-agent) (or use our [open server](https://demoapp.trains.allegro.ai))
2. `pip install trains-agent` ([install](#installing-the-trains-agent) the Trains Agent on any GPU machine: on-premises / cloud / ...)
3. Add [Trains](https://github.com/allegroai/trains) to your code with just 2 lines & run it once (on your machine / laptop)
4. Change the [parameters](#using-the-trains-agent) in the UI & schedule for [execution](#using-the-trains-agent) (or automate with an [AutoML pipeline](#automl-and-orchestration-pipelines-))
5. :chart_with_downwards_trend: :chart_with_upwards_trend: :eyes: :beer:
**Using the TRAINS agent, you can now set up a dynamic cluster with \*epsilon DevOps**
**Using the Trains Agent, you can now set up a dynamic cluster with \*epsilon DevOps**
*epsilon - Because we are scientists :triangular_ruler: and nothing is really zero work
(Experience TRAINS live at [https://demoapp.trains.allegro.ai](https://demoapp.trains.allegro.ai))
(Experience Trains live at [https://demoapp.trains.allegro.ai](https://demoapp.trains.allegro.ai))
<a href="https://demoapp.trains.allegro.ai"><img src="https://raw.githubusercontent.com/allegroai/trains-agent/9f1e86c1ca45c984ee13edc9353c7b10c55d7257/docs/screenshots.gif" width="100%"></a>
## Simple, Flexible Experiment Orchestration
**The TRAINS Agent was built to address the DL/ML R&D DevOps needs:**
**The Trains Agent was built to address the DL/ML R&D DevOps needs:**
* Easily add & remove machines from the cluster
* Reuse machines without the need for any dedicated containers or images
@@ -51,30 +51,30 @@ If you are considering K8S for your research, also consider that you will soon b
In our experience, handling and building the environments, having to package every experiment in a docker, managing those hundreds (or more) containers and building pipelines on top of it all, is very complicated (also, it's usually out of scope for the research team, and overwhelming even for the DevOps team).
We feel there has to be a better way, that can be just as powerful for R&D and at the same time allow integration with K8S **when the need arises**.
(If you already have a K8S cluster for AI, detailed instructions on how to integrate TRAINS into your K8S cluster are *coming soon*.)
(If you already have a K8S cluster for AI, detailed instructions on how to integrate Trains into your K8S cluster are [here](https://github.com/allegroai/trains-server-k8s/tree/master/trains-server-chart) with included [helm chart](https://github.com/allegroai/trains-server-helm))
## Using the TRAINS Agent
## Using the Trains Agent
**Full scale HPC with a click of a button**
TRAINS Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the job and monitors its progress.
The Trains Agent is a job scheduler that listens on job queue(s), pulls jobs, sets the job environments, executes the job and monitors its progress.
Any 'Draft' experiment can be scheduled for execution by a TRAINS agent.
Any 'Draft' experiment can be scheduled for execution by a Trains agent.
A previously run experiment can be put into 'Draft' state by either of two methods:
* Using the **'Reset'** action from the experiment right-click context menu in the
TRAINS UI - This will clear any results and artifacts the previous run had created.
Trains UI - This will clear any results and artifacts the previous run had created.
* Using the **'Clone'** action from the experiment right-click context menu in the
TRAINS UI - This will create a new 'Draft' experiment with the same configuration as the original experiment.
Trains UI - This will create a new 'Draft' experiment with the same configuration as the original experiment.
An experiment is scheduled for execution using the **'Enqueue'** action from the experiment
right-click context menu in the TRAINS UI and selecting the execution queue.
right-click context menu in the Trains UI and selecting the execution queue.
See [creating an experiment and enqueuing it for execution](#from-scratch).
Once an experiment is enqueued, it will be picked up and executed by a TRAINS agent monitoring this queue.
Once an experiment is enqueued, it will be picked up and executed by a Trains agent monitoring this queue.
The TRAINS UI Workers & Queues page provides ongoing execution information:
The Trains UI Workers & Queues page provides ongoing execution information:
- Workers Tab: Monitor your cluster
- Review available resources
- Monitor machines statistics (CPU / GPU / Disk / Network)
@@ -83,16 +83,16 @@ The TRAINS UI Workers & Queues page provides ongoing execution information:
- Cancel or abort job execution
- Move jobs between execution queues
### What The TRAINS Agent Actually Does
The TRAINS agent executes experiments using the following process:
### What The Trains Agent Actually Does
The Trains Agent executes experiments using the following process:
- Create a new virtual environment (or launch the selected docker image)
- Clone the code into the virtual-environment (or inside the docker)
- Install python packages based on the package requirements listed for the experiment
- Special note for PyTorch: The TRAINS agent will automatically select the
- Special note for PyTorch: The Trains Agent will automatically select the
torch packages based on the CUDA_VERSION environment variable of the machine
- Execute the code, while monitoring the process
- Log all stdout/stderr in the TRAINS UI, including the cloning and installation process, for easy debugging
- Monitor the execution and allow you to manually abort the job using the TRAINS UI (or, in the unfortunate case of a code crash, catch the error and signal the experiment has failed)
- Log all stdout/stderr in the Trains UI, including the cloning and installation process, for easy debugging
- Monitor the execution and allow you to manually abort the job using the Trains UI (or, in the unfortunate case of a code crash, catch the error and signal the experiment has failed)
### System Design & Flow
```text
@@ -100,24 +100,24 @@ The TRAINS agent executes experiments using the following process:
| GPU Machine |
Development Machine | |
+------------------------+ | +-------------+ |
| Data Scientist's | +--------------+ | |TRAINS Agent | |
| Data Scientist's | +--------------+ | |Trains Agent | |
| DL/ML Code | | WEB UI | | | | |
| | | | | | +---------+ | |
| | | | | | | DL/ML | | |
| | +--------------+ | | | Code | | |
| | User Clones Exp #1 / . . . . . . . / | | | | | |
| +-------------------+ | into Exp #2 / . . . . . . . / | | +---------+ | |
| | TRAINS | | +---------------/-_____________-/ | | | |
| | Trains | | +---------------/-_____________-/ | | | |
| +---------+---------+ | | | | ^ | |
+-----------|------------+ | | +------|------+ |
| | +--------|--------+
Auto-Magically | |
Creates Exp #1 | The TRAINS Agent
Creates Exp #1 | The Trains Agent
\ User Change Hyper-Parameters Pulls Exp #2, setup the
| | environment & clone code.
| | Start execution with the
+------------|------------+ | +--------------------+ new set of Hyper-Parameters.
| +---------v---------+ | | | TRAINS-SERVER | |
| +---------v---------+ | | | Trains Server | |
| | Experiment #1 | | | | | |
| +-------------------+ | | | Execution Queue | |
| || | | | | |
@@ -128,17 +128,17 @@ Development Machine |
| | ------------->---------------+ | |
| | User Send Exp #2 | |Execute Exp #2 +--------------------+
| | For Execution | +---------------+ |
| TRAINS-SERVER | | |
| Trains Server | | |
+-------------------------+ +--------------------+
```
### Installing the TRAINS Agent
### Installing the Trains Agent
```bash
pip install trains-agent
```
### TRAINS Agent Usage Examples
### Trains Agent Usage Examples
Full Interface and capabilities are available with
```bash
@@ -146,22 +146,22 @@ trains-agent --help
trains-agent daemon --help
```
### Configuring the TRAINS Agent
### Configuring the Trains Agent
```bash
trains-agent init
```
Note: The TRAINS agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default TRAINS Agent cache folder is `~/.trains`
Note: The Trains Agent uses a cache folder to cache pip packages, apt packages and cloned repositories. The default Trains Agent cache folder is `~/.trains`
See full details in your configuration file at `~/trains.conf`
Note: The **TRAINS agent** extends the **TRAINS** configuration file `~/trains.conf`
Note: The **Trains agent** extends the **Trains** configuration file `~/trains.conf`
They are designed to share the same configuration file, see example [here](docs/trains.conf)
### Running the TRAINS Agent
### Running the Trains Agent
For debug and experimentation, start the TRAINS agent in `foreground` mode, where all the output is printed to screen
For debug and experimentation, start the Trains agent in `foreground` mode, where all the output is printed to screen
```bash
trains-agent daemon --queue default --foreground
```
@@ -190,9 +190,9 @@ trains-agent daemon --detached --gpus 0,1 --queue dual_gpu
trains-agent daemon --detached --gpus 2,3 --queue dual_gpu
```
#### Starting the TRAINS Agent in docker mode
#### Starting the Trains Agent in docker mode
For debug and experimentation, start the TRAINS agent in `foreground` mode, where all the output is printed to screen
For debug and experimentation, start the Trains agent in `foreground` mode, where all the output is printed to screen
```bash
trains-agent daemon --queue default --docker --foreground
```
@@ -215,7 +215,7 @@ trains-agent daemon --detached --gpus 0,1 --queue dual_gpu --docker nvidia/cuda
trains-agent daemon --detached --gpus 2,3 --queue dual_gpu --docker nvidia/cuda
```
#### Starting the TRAINS Agent - Priority Queues
#### Starting the Trains Agent - Priority Queues
Priority Queues are also supported, example use case:
@@ -223,14 +223,22 @@ High priority queue: `important_jobs` Low priority queue: `default`
```bash
trains-agent daemon --queue important_jobs default
```
The **TRAINS agent** will first try to pull jobs from the `important_jobs` queue, only then it will fetch a job from the `default` queue.
The **Trains Agent** will first try to pull jobs from the `important_jobs` queue, only then it will fetch a job from the `default` queue.
Adding queues, managing job order within a queue and moving jobs between queues, is available using the Web UI, see example on our [open server](https://demoapp.trains.allegro.ai/workers-and-queues/queues)
# How do I create an experiment on the TRAINS server? <a name="from-scratch"></a>
* Integrate [TRAINS](https://github.com/allegroai/trains) with your code
#### Stopping the Trains Agent
To stop a **Trains Agent** running in the background, run the same command line used to start the agent with `--stop` appended.
For example, to stop the first of the same-machine, single-GPU agents shown above:
```bash
trains-agent daemon --detached --gpus 0 --queue default --docker nvidia/cuda --stop
```
## How do I create an experiment on the Trains Server? <a name="from-scratch"></a>
* Integrate [Trains](https://github.com/allegroai/trains) with your code
* Execute the code on your machine (Manually / PyCharm / Jupyter Notebook)
* As your code is running, **TRAINS** creates an experiment logging all the necessary execution information:
* As your code is running, **Trains** creates an experiment logging all the necessary execution information:
- Git repository link and commit ID (or an entire jupyter notebook)
- Git diff (we're not saying you never commit and push, but still...)
- Python packages used by your code (including specific versions used)
@@ -239,7 +247,7 @@ Adding queues, managing job order within a queue and moving jobs between queues,
You now have a 'template' of your experiment with everything required for automated execution
* In the TRAINS UI, Right click on the experiment and select 'clone'. A copy of your experiment will be created.
* In the Trains UI, Right click on the experiment and select 'clone'. A copy of your experiment will be created.
* You now have a new draft experiment cloned from your original experiment, feel free to edit it
- Change the Hyper-Parameters
- Switch to the latest code base of the repository
@@ -248,23 +256,44 @@ Adding queues, managing job order within a queue and moving jobs between queues,
- Or simply change nothing to run the same experiment again...
* Schedule the newly created experiment for execution: Right-click the experiment and select 'enqueue'
# AutoML and Orchestration Pipelines <a name="automl-pipes"></a>
The TRAINS Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the TRAINS package.
## Trains-Agent Services Mode <a name="services"></a>
Sample AutoML & Orchestration examples can be found in the TRAINS [example/automl](https://github.com/allegroai/trains/tree/master/examples/automl) folder.
Trains-Agent Services is a special mode of Trains-Agent that provides the ability to launch long-lasting jobs
that previously had to be executed on local / dedicated machines. It allows a single agent to
launch multiple dockers (Tasks) for different use cases. To name a few: an auto-scaler service (spinning up instances
when the need arises and the budget allows), Controllers (implementing pipelines and more sophisticated DevOps logic),
Optimizers (such as hyper-parameter optimization or sweeping), and Applications (such as interactive Bokeh apps for
increased data transparency)
Trains-Agent Services mode will spin **any** task enqueued into the specified queue.
Every task launched by Trains-Agent Services will be registered as a new node in the system,
providing tracking and transparency capabilities.
Currently, trains-agent in services mode supports a CPU-only configuration. Trains-agent services mode can be launched alongside GPU agents.
```bash
trains-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
```
**Note**: It is the user's responsibility to make sure the proper tasks are pushed into the specified queue.
## AutoML and Orchestration Pipelines <a name="automl-pipes"></a>
The Trains Agent can also be used to implement AutoML orchestration and Experiment Pipelines in conjunction with the Trains package.
Sample AutoML & Orchestration examples can be found in the Trains [example/automation](https://github.com/allegroai/trains/tree/master/examples/automation) folder.
AutoML examples
- [Toy Keras training experiment](https://github.com/allegroai/trains/blob/master/examples/automl/automl_base_template_keras_simple.py)
- [Toy Keras training experiment](https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
- In order to create an experiment-template in the system, this code must be executed once manually
- [Random Search over the above Keras experiment-template](https://github.com/allegroai/trains/blob/master/examples/automl/automl_random_search_example.py)
- [Random Search over the above Keras experiment-template](https://github.com/allegroai/trains/blob/master/examples/automation/manual_random_param_search_example.py)
- This example will create multiple copies of the Keras experiment-template, with different hyper-parameter combinations
Experiment Pipeline examples
- [First step experiment](https://github.com/allegroai/trains/blob/master/examples/automl/task_piping_example.py)
- [First step experiment](https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py)
- This example will "process data", and once done, will launch a copy of the 'second step' experiment-template
- [Second step experiment](https://github.com/allegroai/trains/blob/master/examples/automl/toy_base_task.py)
- [Second step experiment](https://github.com/allegroai/trains/blob/master/examples/automation/toy_base_task.py)
- In order to create an experiment-template in the system, this code must be executed once manually
# License
## License
Apache License, Version 2.0 (see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0.html) for more information)
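For context, the "2 lines" of Trains integration the README keeps referring to are just the SDK import and the Task.init call; a minimal sketch (project and task names are placeholders):

```python
from trains import Task

# Registers this run as an experiment on the trains-server; the repository link,
# uncommitted diff and installed packages are captured automatically.
task = Task.init(project_name="examples", task_name="my experiment")
```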

docker/agent/Dockerfile (new file, 18 lines)

@@ -0,0 +1,18 @@
# syntax = docker/dockerfile
FROM nvidia/cuda
WORKDIR /usr/agent
COPY . /usr/agent
RUN apt-get update
RUN apt-get dist-upgrade -y
RUN apt-get install -y curl python3-pip git
RUN curl -sSL https://get.docker.com/ | sh
RUN python3 -m pip install -U pip
RUN python3 -m pip install trains-agent
RUN python3 -m pip install -U "cryptography>=2.9"
ENV TRAINS_DOCKER_SKIP_GPUS_FLAG=1
ENTRYPOINT ["/usr/agent/entrypoint.sh"]
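A build sketch for this agent image (the image tag is an assumption, not part of the diff):

```bash
# build from the repository root, which the Dockerfile copies into /usr/agent
docker build -t trains-agent -f docker/agent/Dockerfile .
```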

docker/agent/entrypoint.sh (new executable file, 19 lines)

@@ -0,0 +1,19 @@
#!/bin/sh
LOWER_PIP_UPDATE_VERSION="$(echo "$PIP_UPDATE_VERSION" | tr '[:upper:]' '[:lower:]')"
LOWER_TRAINS_AGENT_UPDATE_VERSION="$(echo "$TRAINS_AGENT_UPDATE_VERSION" | tr '[:upper:]' '[:lower:]')"
if [ "$LOWER_PIP_UPDATE_VERSION" = "yes" ] || [ "$LOWER_PIP_UPDATE_VERSION" = "true" ] ; then
python3 -m pip install -U pip
elif [ ! -z "$LOWER_PIP_UPDATE_VERSION" ] ; then
python3 -m pip install pip$LOWER_PIP_UPDATE_VERSION ;
fi
echo "TRAINS_AGENT_UPDATE_VERSION = $LOWER_TRAINS_AGENT_UPDATE_VERSION"
if [ "$LOWER_TRAINS_AGENT_UPDATE_VERSION" = "yes" ] || [ "$LOWER_TRAINS_AGENT_UPDATE_VERSION" = "true" ] ; then
python3 -m pip install trains-agent -U
elif [ ! -z "$LOWER_TRAINS_AGENT_UPDATE_VERSION" ] ; then
python3 -m pip install trains-agent$LOWER_TRAINS_AGENT_UPDATE_VERSION ;
fi
python3 -m trains_agent daemon --docker "$TRAINS_AGENT_DEFAULT_BASE_DOCKER" --force-current-version $TRAINS_AGENT_EXTRA_ARGS
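The entrypoint is driven entirely by environment variables; a run sketch using the variables the script actually reads (the values, image tag, and TRAINS_API_HOST address are illustrative assumptions):

```bash
docker run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e PIP_UPDATE_VERSION=true \
    -e TRAINS_AGENT_UPDATE_VERSION="==0.16.3" \
    -e TRAINS_AGENT_DEFAULT_BASE_DOCKER="nvidia/cuda:10.1-runtime-ubuntu18.04" \
    -e TRAINS_AGENT_EXTRA_ARGS="--queue default" \
    -e TRAINS_API_HOST=http://192.168.1.2:8008 \
    trains-agent
```

Note how the script accepts both boolean-ish values ("yes"/"true" upgrades to the latest release) and pip version specifiers (any other non-empty value is appended verbatim, e.g. "==0.16.3").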


@@ -0,0 +1,25 @@
# syntax = docker/dockerfile
FROM ubuntu:18.04
WORKDIR /usr/agent
COPY . /usr/agent
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
ENV PYTHONIOENCODING=UTF-8
RUN apt-get update
RUN apt-get dist-upgrade -y
RUN apt-get install -y locales
RUN locale-gen en_US.UTF-8
RUN apt-get install -y curl python3-pip git
RUN curl -sSL https://get.docker.com/ | sh
RUN python3 -m pip install -U pip
RUN python3 -m pip install trains-agent
RUN python3 -m pip install -U "cryptography>=2.9"
ENTRYPOINT ["/usr/agent/entrypoint.sh"]
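The file name is not shown in this view; judging by the ENTRYPOINT and the services entrypoint.sh that follows, this is presumably docker/services/Dockerfile (a hypothetical path). A build sketch under that assumption:

```bash
# hypothetical path and tag; the compare view does not name this file
docker build -t trains-agent-services -f docker/services/Dockerfile .
```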

docker/services/entrypoint.sh (new executable file, 14 lines)

@@ -0,0 +1,14 @@
#!/bin/sh
if [ -z "$TRAINS_FILES_HOST" ]; then
TRAINS_HOST_IP=${TRAINS_HOST_IP:-$(curl -s https://ifconfig.me/ip)}
fi
TRAINS_FILES_HOST=${TRAINS_FILES_HOST:-"http://$TRAINS_HOST_IP:8081"}
TRAINS_WEB_HOST=${TRAINS_WEB_HOST:-"http://$TRAINS_HOST_IP:8080"}
TRAINS_API_HOST=${TRAINS_API_HOST:-"http://$TRAINS_HOST_IP:8008"}
echo $TRAINS_FILES_HOST $TRAINS_WEB_HOST $TRAINS_API_HOST 1>&2
python3 -m pip install -q -U "trains-agent${TRAINS_AGENT_UPDATE_VERSION}"
trains-agent daemon --services-mode --queue services --create-queue --docker "$TRAINS_AGENT_DEFAULT_BASE_DOCKER" --cpu-only $TRAINS_AGENT_EXTRA_ARGS
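When TRAINS_FILES_HOST is unset, the script derives the files/web/api endpoints from a single TRAINS_HOST_IP (falling back to the machine's external IP via ifconfig.me); a run sketch (addresses and the image tag are illustrative):

```bash
docker run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e TRAINS_HOST_IP=192.168.1.2 \
    -e TRAINS_AGENT_DEFAULT_BASE_DOCKER=ubuntu:18.04 \
    trains-agent-services
```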


@@ -13,11 +13,18 @@ api {
}
agent {
# Set GIT user/pass credentials
# leave blank for GIT SSH credentials
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
git_user=""
git_pass=""
# Limit credentials to a single domain, for example: github.com,
# all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
git_host=""
# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
force_git_ssh_protocol: false
# Force a specific SSH port when converting http to ssh links (the domain is kept the same)
# force_git_ssh_port: ""
# unique name of this worker, if None, created based on hostname:process_id
# Overridden with os environment: TRAINS_WORKER_NAME
@@ -55,6 +62,24 @@ agent {
# additional conda channels to use when installing with conda package manager
conda_channels: ["pytorch", "conda-forge", ]
# conda_full_env_update: false
# conda_env_as_base_docker: false
# set the priority packages to be installed before the rest of the required packages
# priority_packages: ["cython", "numpy", "setuptools", ]
# set the optional priority packages to be installed before the rest of the required packages,
# In case a package installation fails, the package will be ignored,
# and the virtual environment process will continue
# priority_optional_packages: ["pygobject", ]
# set the post packages to be installed after all the rest of the required packages
# post_packages: ["horovod", ]
# set the optional post packages to be installed after all the rest of the required packages,
# In case a package installation fails, the package will be ignored,
# and the virtual environment process will continue
# post_optional_packages: []
# set to True to support torch nightly build installation,
# notice: torch nightly builds are ephemeral and are deleted from time to time
@@ -104,11 +129,16 @@ agent {
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda"
image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host"]
}
# CUDA versions used for Conda setup & solving PyTorch wheel packages
# it Should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
# cuda_version: 10.1
# cudnn_version: 7.6
}
sdk {
@@ -139,6 +169,9 @@ sdk {
# X images are stored in the upload destination for each matplotlib plot title.
matplotlib_untitled_history_size: 100
# Limit the number of digits after the dot in plot reporting (reducing plot report size)
# plot_max_num_digits: 5
# Settings for generated debug images
images {
format: JPEG
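Pulling the new options in this file together, an illustrative trains.conf fragment (values are examples, not defaults):

```
agent {
    # convert git http links to SSH, on a non-standard port
    force_git_ssh_protocol: true
    force_git_ssh_port: 2222

    package_manager {
        # install horovod only after all other requirements are in place
        post_packages: ["horovod", ]
    }

    default_docker: {
        image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
    }
}
```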


@@ -0,0 +1,75 @@
"""
This example assumes you have preconfigured services with selectors in the form of
"ai.allegro.agent.serial=pod-<number>" and a targetPort of 10022.
The K8sIntegration component will label each pod accordingly.
"""
from argparse import ArgumentParser
from trains_agent.glue.k8s import K8sIntegration
def parse_args():
parser = ArgumentParser()
parser.add_argument(
"--queue", type=str, help="Queue to pull tasks from"
)
parser.add_argument(
"--ports-mode", action='store_true', default=False,
help="Ports-Mode will add a label to the pod which can be used as service, in order to expose ports"
)
parser.add_argument(
"--num-of-services", type=int, default=20,
help="Specify the number of k8s services to be used. Use only with ports-mode."
)
parser.add_argument(
"--base-port", type=int,
help="Used in conjunction with ports-mode, specifies the base port exposed by the services. "
"For pod #X, the port will be <base-port>+X"
)
parser.add_argument(
"--gateway-address", type=str, default=None,
help="Used in conjunction with ports-mode, specify the external address of the k8s ingress / ELB"
)
parser.add_argument(
"--pod-trains-conf", type=str,
help="Configuration file to be used by the pod itself (if not provided, current configuration is used)"
)
parser.add_argument(
"--overrides-yaml", type=str,
help="YAML file containing pod overrides to be used when launching a new pod"
)
parser.add_argument(
"--template-yaml", type=str,
help="YAML file containing pod template. If provided pod will be scheduled with kubectl apply "
"and overrides are ignored, otherwise it will be scheduled with kubectl run"
)
parser.add_argument(
"--ssh-server-port", type=int, default=0,
help="If non-zero, every pod will also start an SSH server on the selected port (default: zero, not active)"
)
return parser.parse_args()
def main():
args = parse_args()
user_props_cb = None
if args.ports_mode and args.base_port:
def k8s_user_props_cb(pod_number=0):
user_prop = {"k8s-pod-port": args.base_port + pod_number}
if args.gateway_address:
user_prop["k8s-gateway-address"] = args.gateway_address
return user_prop
user_props_cb = k8s_user_props_cb
k8s = K8sIntegration(
ports_mode=args.ports_mode, num_of_services=args.num_of_services, user_props_cb=user_props_cb,
overrides_yaml=args.overrides_yaml, trains_conf_file=args.pod_trains_conf, template_yaml=args.template_yaml,
extra_bash_init_script=K8sIntegration.get_ssh_server_bash(
ssh_port_number=args.ssh_server_port) if args.ssh_server_port else None
)
k8s.k8s_daemon(args.queue)
if __name__ == "__main__":
main()
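A possible invocation of this example, using only the flags defined in the argparse above (queue name and port values are placeholders; the script path is assumed):

```bash
python examples/k8s_glue_example.py \
    --queue k8s_scheduler \
    --ports-mode --num-of-services 10 --base-port 30000 \
    --ssh-server-port 10022
```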


@@ -1,22 +1,20 @@
attrs>=18.0
enum34>=0.9 ; python_version < '3.6'
furl>=2.0.0
future>=0.16.0
humanfriendly>=2.1
jsonmodels>=2.2
jsonschema>=2.6.0
pathlib2>=2.3.0
psutil>=3.4.2
pyhocon>=0.3.38
pyparsing>=2.0.3
python-dateutil>=2.4.2
pyjwt>=1.6.4
PyYAML>=3.12
requests-file>=1.4.2
requests>=2.20.0
requirements_parser>=0.2.0
six>=1.11.0
tqdm>=4.19.5
typing>=3.6.4
urllib3>=1.21.1
attrs>=18.0,<20.4.0
enum34>=0.9,<1.2.0 ; python_version < '3.6'
furl>=2.0.0,<2.2.0
future>=0.16.0,<0.19.0
humanfriendly>=2.1,<9.2
jsonschema>=2.6.0,<3.3.0
pathlib2>=2.3.0,<2.4.0
psutil>=3.4.2,<5.9.0
pyhocon>=0.3.38,<0.4.0
pyparsing>=2.0.3,<2.5.0
python-dateutil>=2.4.2,<2.9.0
pyjwt>=1.6.4,<2.0.0
PyYAML>=3.12,<5.4.0
requests-file>=1.4.2,<1.6.0
requests>=2.20.0,<2.26.0
six>=1.11.0,<1.16.0
tqdm>=4.19.5,<4.55.0
typing>=3.6.4,<3.8.0
urllib3>=1.21.1,<1.27.0
virtualenv>=16,<20
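The new pyjwt>=1.6.4,<2.0.0 upper bound matches the "PyJWT v2.0.0 breaks interface" commit above. A sketch of what changed in PyJWT 2.0 (based on PyJWT's published changes, shown here for context):

```python
import jwt  # PyJWT

token = jwt.encode({"some": "payload"}, "secret", algorithm="HS256")
# PyJWT < 2.0 returns bytes here; 2.0 returns str, which breaks callers
# that call token.decode() on the result.

payload = jwt.decode(token, "secret", algorithms=["HS256"])
# PyJWT 2.0 also made the `algorithms` argument mandatory in decode().
```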


@@ -4,28 +4,31 @@ TRAINS-AGENT DevOps for machine/deep learning
https://github.com/allegroai/trains-agent
"""
import os.path
# Always prefer setuptools over distutils
from setuptools import setup, find_packages
from six import exec_
from pathlib2 import Path
def read_text(filepath):
with open(filepath, "r") as f:
return f.read()
here = Path(__file__).resolve().parent
here = os.path.dirname(__file__)
# Get the long description from the README file
long_description = (here / 'README.md').read_text()
long_description = read_text(os.path.join(here, 'README.md'))
def read_version_string():
result = {}
exec_((here / 'trains_agent/version.py').read_text(), result)
return result['__version__']
def read_version_string(version_file):
for line in read_text(version_file).splitlines():
if line.startswith('__version__'):
delim = '"' if '"' in line else "'"
return line.split(delim)[1]
else:
raise RuntimeError("Unable to find version string.")
version = read_version_string()
requirements = (here / 'requirements.txt').read_text().splitlines()
version = read_version_string("trains_agent/version.py")
requirements = read_text(os.path.join(here, 'requirements.txt')).splitlines()
setup(
name='trains_agent',


@@ -30,6 +30,6 @@ from trains_agent.helper.repo import VCS
),
)
def test(url, expected):
result = VCS.resolve_ssh_url(url)
result = VCS.replace_ssh_url(url)
expected = expected or url
assert result == expected


@@ -9,9 +9,16 @@
# worker_name: "trains-agent-machine1"
worker_name: ""
# Set GIT user/pass credentials for cloning code, leave blank for GIT SSH credentials.
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
# git_user: ""
# git_pass: ""
# git_host: ""
# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
force_git_ssh_protocol: false
# Force a specific SSH port when converting http to ssh links (the domain is kept the same)
# force_git_ssh_port: 0
# Set the python version to use when creating the virtual environment and launching the experiment
# Example values: "/usr/bin/python3" or "/usr/local/bin/python3.6"
@@ -40,6 +47,26 @@
# additional conda channels to use when installing with conda package manager
conda_channels: ["defaults", "conda-forge", "pytorch", ]
# If set to true, Task's "installed packages" are ignored,
# and the repository's "requirements.txt" is used instead
# force_repo_requirements_txt: false
# set the priority packages to be installed before the rest of the required packages
# priority_packages: ["cython", "numpy", "setuptools", ]
# set the optional priority packages to be installed before the rest of the required packages,
# In case a package installation fails, the package will be ignored,
# and the virtual environment process will continue
# priority_optional_packages: ["pygobject", ]
# set the post packages to be installed after all the rest of the required packages
# post_packages: ["horovod", ]
# set the optional post packages to be installed after all the rest of the required packages,
# In case a package installation fails, the package will be ignored,
# and the virtual environment process will continue
# post_optional_packages: []
# set to True to support torch nightly build installation,
# notice: torch nightly builds are ephemeral and are deleted from time to time
torch_nightly: false,
@@ -82,18 +109,55 @@
# optional shell script to run in docker when started before the experiment is started
# extra_docker_shell_script: ["apt-get install -y bindfs", ]
# optional uptime configuration, make sure to use only one of 'uptime/downtime' and not both.
# If uptime is specified, agent will actively poll (and execute) tasks in the time-spans defined here.
# Outside of the specified time-spans, the agent will be idle.
# Defined using a list of items of the format: "<hours> <days>".
# hours - use values 0-23, single values would count as start hour and end at midnight.
# days - use days in abbreviated format (SUN-SAT)
# use '-' for ranges and ',' to separate singular values.
# for example, to enable the workers every Sunday and Tuesday between 17:00-20:00 set uptime to:
# uptime: ["17-20 SUN,TUE"]
# optional downtime configuration, can be used only when uptime is not used.
# If downtime is specified, agent will be idle in the time-spans defined here.
# Outside of the specified time-spans, the agent will actively poll (and execute) tasks.
# Use the same format as described above for uptime
# downtime: []
# set to true in order to force "docker pull" before running an experiment using a docker image.
# This makes sure the docker image is updated.
docker_force_pull: false
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda"
image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]
}
# set the initial bash script to execute at the startup of any docker.
# all lines will be executed regardless of their exit code.
# {python_single_digit} is translated to 'python3' or 'python2' according to requested python version
# docker_init_bash_script = [
# "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean",
# "chown -R root /root/.cache/pip",
# "apt-get update",
# "apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0",
# "(which {python_single_digit} && {python_single_digit} -m pip --version) || apt-get install -y {python_single_digit}-pip",
# ]
# set the preprocessing bash script to execute at the startup of any docker.
# all lines will be executed regardless of their exit code.
# docker_preprocess_bash_script = [
# "echo \"starting docker\"",
#]
# If False replace \r with \n and display full console output
# default is True, report a single \r line in a sequence of consecutive lines, per 5 seconds.
# suppress_carriage_return: true
# cuda versions used for solving pytorch wheel packages
# should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
# cuda_version: 10.1
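For example, following the "<hours> <days>" format documented above, an agent restricted to weekday working hours would use (an illustrative value, not part of the diff):

```
agent {
    # poll and execute tasks only Monday through Friday, 09:00-17:00; idle otherwise
    uptime: ["9-17 MON-FRI"]
}
```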


@@ -31,12 +31,18 @@
# X images are stored in the upload destination for each matplotlib plot title.
matplotlib_untitled_history_size: 100
# Limit the number of digits after the dot in plot reporting (reducing plot report size)
# plot_max_num_digits: 5
# Settings for generated debug images
images {
format: JPEG
quality: 87
subsampling: 0
}
# Support plot-per-graph fully matching Tensorboard behavior (i.e. if this is set to true, each series should have its own graph)
tensorboard_single_series_per_graph: false
}
network {
@@ -117,11 +123,11 @@
log {
# debugging feature: set this to true to make null log propagate messages to root logger (so they appear in stdout)
null_log_propagate: False
null_log_propagate: false
task_log_buffer_capacity: 66
# disable urllib info and lower levels
disable_urllib3_info: True
disable_urllib3_info: true
}
development {
@@ -131,14 +137,30 @@
task_reuse_time_window_in_hours: 72.0
# Run VCS repository detection asynchronously
vcs_repo_detect_async: True
vcs_repo_detect_async: true
# Store uncommitted git/hg source code diff in experiment manifest when training in development mode
# This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
store_uncommitted_code_diff_on_train: True
store_uncommitted_code_diff: true
# Support stopping an experiment in case it was externally stopped, status was changed or task was reset
support_stopping: True
support_stopping: true
# Default Task output_uri. if output_uri is not provided to Task.init, default_output_uri will be used instead.
default_output_uri: ""
# Default auto generated requirements optimize for smaller requirements
# If True, analyze the entire repository regardless of the entry point.
# If False, first analyze the entry point script, if it does not contain other to local files,
# do not analyze the entire repository.
force_analyze_entire_repo: false
# If set to true, *trains* update message will not be printed to the console
# this value can be overwritten with os environment variable TRAINS_SUPPRESS_UPDATE_MESSAGE=1
suppress_update_message: false
# If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
detect_with_pip_freeze: false
# Development mode worker
worker {
@@ -149,7 +171,11 @@
ping_period_sec: 30
# Log all stdout & stderr
log_stdout: True
log_stdout: true
# compatibility feature, report memory usage for the entire machine
# default (false), report only on the running process and its sub-processes
report_global_mem_used: false
}
}
}
}


@@ -8,3 +8,4 @@ ENV_ACCESS_KEY = EnvEntry("TRAINS_API_ACCESS_KEY", "TRAINS_API_ACCESS_KEY")
ENV_SECRET_KEY = EnvEntry("TRAINS_API_SECRET_KEY", "TRAINS_API_SECRET_KEY")
ENV_VERBOSE = EnvEntry("TRAINS_API_VERBOSE", "TRAINS_API_VERBOSE", type=bool, default=False)
ENV_HOST_VERIFY_CERT = EnvEntry("TRAINS_API_HOST_VERIFY_CERT", "TRAINS_API_HOST_VERIFY_CERT", type=bool, default=True)
ENV_CONDA_ENV_PACKAGE = EnvEntry("TRAINS_CONDA_ENV_PACKAGE", "TRAINS_CONDA_ENV_PACKAGE")
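The new TRAINS_CONDA_ENV_PACKAGE entry pairs with the conda_env_as_base_docker commit above: it points the agent at a conda-pack'ed environment instead of building a fresh one. A preparation sketch (environment name and paths are placeholders):

```bash
# pack an existing conda environment into a relocatable tar.gz (requires conda-pack)
conda pack -n my_env -o /opt/envs/my_env.tar.gz

# tell the agent to restore this environment instead of creating a new venv
export TRAINS_CONDA_ENV_PACKAGE=/opt/envs/my_env.tar.gz
```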


@@ -0,0 +1,9 @@
# coding: utf-8
__author__ = 'Szczepan Cieślik'
__email__ = 'szczepan.cieslik@gmail.com'
__version__ = '2.4'
from . import models
from . import fields
from . import errors
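The embedded jsonmodels package is used exactly like the upstream library; a minimal sketch of its declarative models:

```python
from jsonmodels import models, fields, errors

class Worker(models.Base):
    name = fields.StringField(required=True)
    gpus = fields.IntField()

w = Worker(name="agent-1", gpus=2)
print(w.to_struct())  # {'name': 'agent-1', 'gpus': 2}

try:
    Worker(gpus=1).validate()  # 'name' is required
except errors.ValidationError as e:
    print(e)
```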


@@ -0,0 +1,230 @@
"""Builders to generate in memory representation of model and fields tree."""
from __future__ import absolute_import
from collections import defaultdict
import six
from . import errors
from .fields import NotSet
class Builder(object):
def __init__(self, parent=None, nullable=False, default=NotSet):
self.parent = parent
self.types_builders = {}
self.types_count = defaultdict(int)
self.definitions = set()
self.nullable = nullable
self.default = default
@property
def has_default(self):
return self.default is not NotSet
def register_type(self, type, builder):
if self.parent:
return self.parent.register_type(type, builder)
self.types_count[type] += 1
if type not in self.types_builders:
self.types_builders[type] = builder
def get_builder(self, type):
if self.parent:
return self.parent.get_builder(type)
return self.types_builders[type]
def count_type(self, type):
if self.parent:
return self.parent.count_type(type)
return self.types_count[type]
@staticmethod
def maybe_build(value):
return value.build() if isinstance(value, Builder) else value
def add_definition(self, builder):
if self.parent:
return self.parent.add_definition(builder)
self.definitions.add(builder)
class ObjectBuilder(Builder):
def __init__(self, model_type, *args, **kwargs):
super(ObjectBuilder, self).__init__(*args, **kwargs)
self.properties = {}
self.required = []
self.type = model_type
self.register_type(self.type, self)
def add_field(self, name, field, schema):
_apply_validators_modifications(schema, field)
self.properties[name] = schema
if field.required:
self.required.append(name)
def build(self):
builder = self.get_builder(self.type)
if self.is_definition and not self.is_root:
self.add_definition(builder)
[self.maybe_build(value) for _, value in self.properties.items()]
return '#/definitions/{name}'.format(name=self.type_name)
else:
return builder.build_definition(nullable=self.nullable)
@property
def type_name(self):
module_name = '{module}.{name}'.format(
module=self.type.__module__,
name=self.type.__name__,
)
return module_name.replace('.', '_').lower()
def build_definition(self, add_defintitions=True, nullable=False):
properties = dict(
(name, self.maybe_build(value))
for name, value
in self.properties.items()
)
schema = {
'type': 'object',
'additionalProperties': False,
'properties': properties,
}
if self.required:
schema['required'] = self.required
if self.definitions and add_defintitions:
schema['definitions'] = dict(
(builder.type_name, builder.build_definition(False, False))
for builder in self.definitions
)
return schema
@property
def is_definition(self):
if self.count_type(self.type) > 1:
return True
elif self.parent:
return self.parent.is_definition
else:
return False
@property
def is_root(self):
return not bool(self.parent)
def _apply_validators_modifications(field_schema, field):
for validator in field.validators:
try:
validator.modify_schema(field_schema)
except AttributeError:
pass
class PrimitiveBuilder(Builder):
def __init__(self, type, *args, **kwargs):
super(PrimitiveBuilder, self).__init__(*args, **kwargs)
self.type = type
def build(self):
schema = {}
if issubclass(self.type, six.string_types):
obj_type = 'string'
elif issubclass(self.type, bool):
obj_type = 'boolean'
elif issubclass(self.type, int):
obj_type = 'number'
elif issubclass(self.type, float):
obj_type = 'number'
else:
raise errors.FieldNotSupported(
"Can't specify value schema!", self.type
)
if self.nullable:
obj_type = [obj_type, 'null']
schema['type'] = obj_type
if self.has_default:
schema["default"] = self.default
return schema
class ListBuilder(Builder):
def __init__(self, *args, **kwargs):
super(ListBuilder, self).__init__(*args, **kwargs)
self.schemas = []
def add_type_schema(self, schema):
self.schemas.append(schema)
def build(self):
schema = {'type': 'array'}
if self.nullable:
self.add_type_schema({'type': 'null'})
if self.has_default:
schema["default"] = [self.to_struct(i) for i in self.default]
schemas = [self.maybe_build(s) for s in self.schemas]
if len(schemas) == 1:
items = schemas[0]
else:
items = {'oneOf': schemas}
schema['items'] = items
return schema
@property
def is_definition(self):
return self.parent.is_definition
@staticmethod
def to_struct(item):
from .models import Base
if isinstance(item, Base):
return item.to_struct()
return item
class EmbeddedBuilder(Builder):
def __init__(self, *args, **kwargs):
super(EmbeddedBuilder, self).__init__(*args, **kwargs)
self.schemas = []
def add_type_schema(self, schema):
self.schemas.append(schema)
def build(self):
if self.nullable:
self.add_type_schema({'type': 'null'})
schemas = [self.maybe_build(schema) for schema in self.schemas]
if len(schemas) == 1:
schema = schemas[0]
else:
schema = {'oneOf': schemas}
if self.has_default:
# The default value of EmbeddedField is expected to be an instance
# of a subclass of models.Base, thus have `to_struct`
schema["default"] = self.default.to_struct()
return schema
@property
def is_definition(self):
return self.parent.is_definition
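These builders back jsonmodels' JSON-schema generation; continuing the Worker sketch above (to_json_schema is the public entry point in upstream jsonmodels):

```python
schema = Worker.to_json_schema()
# -> {'type': 'object', 'additionalProperties': False,
#     'properties': {'name': {'type': 'string'}, 'gpus': {'type': 'number'}},
#     'required': ['name']}
# note: per PrimitiveBuilder above, int fields map to JSON-schema 'number'
```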


@@ -0,0 +1,21 @@
class ModelCollection(list):
"""`ModelCollection` is list which validates stored values.
Validation is made with use of field passed to `__init__` at each point,
when new value is assigned.
"""
def __init__(self, field):
self.field = field
def append(self, value):
self.field.validate_single_value(value)
super(ModelCollection, self).append(value)
def __setitem__(self, key, value):
self.field.validate_single_value(value)
super(ModelCollection, self).__setitem__(key, value)
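ModelCollection is what ListField returns as its default value, so items are validated the moment they are appended or assigned; a sketch:

```python
from jsonmodels import models, fields, errors

class Cluster(models.Base):
    queues = fields.ListField(items_types=[str])

c = Cluster()
c.queues.append("default")  # validated on append by ModelCollection
try:
    c.queues.append(42)      # wrong item type
except errors.ValidationError as e:
    print(e)
```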


@@ -0,0 +1,15 @@
class ValidationError(RuntimeError):
pass
class FieldNotFound(RuntimeError):
pass
class FieldNotSupported(ValueError):
pass


@@ -0,0 +1,488 @@
import datetime
import re
from weakref import WeakKeyDictionary
import six
from dateutil.parser import parse
from .errors import ValidationError
from .collections import ModelCollection
# unique marker for "no default value specified". None is not good enough since
# it is a completely valid default value.
NotSet = object()
class BaseField(object):
"""Base class for all fields."""
types = None
def __init__(
self,
required=False,
nullable=False,
help_text=None,
validators=None,
default=NotSet,
name=None):
self.memory = WeakKeyDictionary()
self.required = required
self.help_text = help_text
self.nullable = nullable
self._assign_validators(validators)
self.name = name
self._validate_name()
if default is not NotSet:
self.validate(default)
self._default = default
@property
def has_default(self):
return self._default is not NotSet
def _assign_validators(self, validators):
if validators and not isinstance(validators, list):
validators = [validators]
self.validators = validators or []
def __set__(self, instance, value):
self._finish_initialization(type(instance))
value = self.parse_value(value)
self.validate(value)
self.memory[instance._cache_key] = value
def __get__(self, instance, owner=None):
if instance is None:
self._finish_initialization(owner)
return self
self._finish_initialization(type(instance))
self._check_value(instance)
return self.memory[instance._cache_key]
def _finish_initialization(self, owner):
pass
def _check_value(self, obj):
if obj._cache_key not in self.memory:
self.__set__(obj, self.get_default_value())
def validate_for_object(self, obj):
value = self.__get__(obj)
self.validate(value)
def validate(self, value):
self._check_types()
self._validate_against_types(value)
self._check_against_required(value)
self._validate_with_custom_validators(value)
def _check_against_required(self, value):
if value is None and self.required:
raise ValidationError('Field is required!')
def _validate_against_types(self, value):
if value is not None and not isinstance(value, self.types):
raise ValidationError(
'Value is wrong, expected type "{types}"'.format(
types=', '.join([t.__name__ for t in self.types])
),
value,
)
def _check_types(self):
if self.types is None:
raise ValidationError(
'Field "{type}" is not usable, try '
'different field type.'.format(type=type(self).__name__))
def to_struct(self, value):
"""Cast value to Python structure."""
return value
def parse_value(self, value):
"""Parse value from primitive to desired format.
Each field can parse value to form it wants it to be (like string or
int).
"""
return value
def _validate_with_custom_validators(self, value):
if value is None and self.nullable:
return
for validator in self.validators:
try:
validator.validate(value)
except AttributeError:
validator(value)
def get_default_value(self):
"""Get default value for field.
Each field can specify its default.
"""
return self._default if self.has_default else None
def _validate_name(self):
if self.name is None:
return
if not re.match(r'^[A-Za-z_](([\w\-]*)?\w+)?$', self.name):
raise ValueError('Wrong name', self.name)
def structure_name(self, default):
return self.name if self.name is not None else default
class StringField(BaseField):
"""String field."""
types = six.string_types
class IntField(BaseField):
"""Integer field."""
types = (int,)
def parse_value(self, value):
"""Cast value to `int`, e.g. from string or long"""
parsed = super(IntField, self).parse_value(value)
if parsed is None:
return parsed
return int(parsed)
class FloatField(BaseField):
"""Float field."""
types = (float, int)
class BoolField(BaseField):
"""Bool field."""
types = (bool,)
def parse_value(self, value):
"""Cast value to `bool`."""
parsed = super(BoolField, self).parse_value(value)
return bool(parsed) if parsed is not None else None
class ListField(BaseField):
"""List field."""
types = (list,)
def __init__(self, items_types=None, *args, **kwargs):
"""Init.
`ListField` is **never required**. If you want to control the number
of items, use validators.
"""
self._assign_types(items_types)
super(ListField, self).__init__(*args, **kwargs)
self.required = False
def get_default_value(self):
default = super(ListField, self).get_default_value()
if default is None:
return ModelCollection(self)
return default
def _assign_types(self, items_types):
if items_types:
try:
self.items_types = tuple(items_types)
except TypeError:
self.items_types = items_types,
else:
self.items_types = tuple()
types = []
for type_ in self.items_types:
if isinstance(type_, six.string_types):
types.append(_LazyType(type_))
else:
types.append(type_)
self.items_types = tuple(types)
def validate(self, value):
super(ListField, self).validate(value)
if len(self.items_types) == 0:
return
for item in value:
self.validate_single_value(item)
def validate_single_value(self, item):
if len(self.items_types) == 0:
return
if not isinstance(item, self.items_types):
raise ValidationError(
'All items must be instances '
'of "{types}", and not "{type}".'.format(
types=', '.join([t.__name__ for t in self.items_types]),
type=type(item).__name__,
))
def parse_value(self, values):
"""Cast value to proper collection."""
result = self.get_default_value()
if not values:
return result
if not isinstance(values, list):
return values
return [self._cast_value(value) for value in values]
def _cast_value(self, value):
if isinstance(value, self.items_types):
return value
else:
if len(self.items_types) != 1:
tpl = 'Cannot decide which type to choose from "{types}".'
raise ValidationError(
tpl.format(
types=', '.join([t.__name__ for t in self.items_types])
)
)
return self.items_types[0](**value)
def _finish_initialization(self, owner):
super(ListField, self)._finish_initialization(owner)
types = []
for type_ in self.items_types:
if isinstance(type_, _LazyType):
types.append(type_.evaluate(owner))
else:
types.append(type_)
self.items_types = tuple(types)
def _elem_to_struct(self, value):
try:
return value.to_struct()
except AttributeError:
return value
def to_struct(self, values):
return [self._elem_to_struct(v) for v in values]
class EmbeddedField(BaseField):
"""Field for embedded models."""
def __init__(self, model_types, *args, **kwargs):
self._assign_model_types(model_types)
super(EmbeddedField, self).__init__(*args, **kwargs)
def _assign_model_types(self, model_types):
if not isinstance(model_types, (list, tuple)):
model_types = (model_types,)
types = []
for type_ in model_types:
if isinstance(type_, six.string_types):
types.append(_LazyType(type_))
else:
types.append(type_)
self.types = tuple(types)
def _finish_initialization(self, owner):
super(EmbeddedField, self)._finish_initialization(owner)
types = []
for type_ in self.types:
if isinstance(type_, _LazyType):
types.append(type_.evaluate(owner))
else:
types.append(type_)
self.types = tuple(types)
def validate(self, value):
super(EmbeddedField, self).validate(value)
try:
value.validate()
except AttributeError:
pass
def parse_value(self, value):
"""Parse value to proper model type."""
if not isinstance(value, dict):
return value
embed_type = self._get_embed_type()
return embed_type(**value)
def _get_embed_type(self):
if len(self.types) != 1:
raise ValidationError(
'Cannot decide which type to choose from "{types}".'.format(
types=', '.join([t.__name__ for t in self.types])
)
)
return self.types[0]
def to_struct(self, value):
return value.to_struct()
class _LazyType(object):
def __init__(self, path):
self.path = path
def evaluate(self, base_cls):
module, type_name = _evaluate_path(self.path, base_cls)
return _import(module, type_name)
def _evaluate_path(relative_path, base_cls):
base_module = base_cls.__module__
modules = _get_modules(relative_path, base_module)
type_name = modules.pop()
module = '.'.join(modules)
if not module:
module = base_module
return module, type_name
def _get_modules(relative_path, base_module):
canonical_path = relative_path.lstrip('.')
canonical_modules = canonical_path.split('.')
if not relative_path.startswith('.'):
return canonical_modules
parents_amount = len(relative_path) - len(canonical_path)
parent_modules = base_module.split('.')
parents_amount = max(0, parents_amount - 1)
if parents_amount > len(parent_modules):
raise ValueError("Can't evaluate path '{}'".format(relative_path))
return parent_modules[:parents_amount * -1] + canonical_modules
def _import(module_name, type_name):
module = __import__(module_name, fromlist=[type_name])
try:
return getattr(module, type_name)
except AttributeError:
raise ValueError(
"Can't find type '{}.{}'.".format(module_name, type_name))
class TimeField(StringField):
"""Time field."""
types = (datetime.time,)
def __init__(self, str_format=None, *args, **kwargs):
"""Init.
:param str str_format: Format to cast time to (if `None`, casts to
ISO 8601 format).
"""
self.str_format = str_format
super(TimeField, self).__init__(*args, **kwargs)
def to_struct(self, value):
"""Cast `time` object to string."""
if self.str_format:
return value.strftime(self.str_format)
return value.isoformat()
def parse_value(self, value):
"""Parse string into instance of `time`."""
if value is None:
return value
if isinstance(value, datetime.time):
return value
return parse(value).timetz()
class DateField(StringField):
"""Date field."""
types = (datetime.date,)
default_format = '%Y-%m-%d'
def __init__(self, str_format=None, *args, **kwargs):
"""Init.
:param str str_format: Format to cast date to (if `None`, casts to
%Y-%m-%d format).
"""
self.str_format = str_format
super(DateField, self).__init__(*args, **kwargs)
def to_struct(self, value):
"""Cast `date` object to string."""
if self.str_format:
return value.strftime(self.str_format)
return value.strftime(self.default_format)
def parse_value(self, value):
"""Parse string into instance of `date`."""
if value is None:
return value
if isinstance(value, datetime.date):
return value
return parse(value).date()
class DateTimeField(StringField):
"""Datetime field."""
types = (datetime.datetime,)
def __init__(self, str_format=None, *args, **kwargs):
"""Init.
:param str str_format: Format to cast datetime to (if `None`, casts
to ISO 8601 format).
"""
self.str_format = str_format
super(DateTimeField, self).__init__(*args, **kwargs)
def to_struct(self, value):
"""Cast `datetime` object to string."""
if self.str_format:
return value.strftime(self.str_format)
return value.isoformat()
def parse_value(self, value):
"""Parse string into instance of `datetime`."""
if isinstance(value, datetime.datetime):
return value
if value:
return parse(value)
else:
return None
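
To illustrate (not part of the diff): fields can be exercised standalone, since parse_value and validate do not require a model instance. Imports assume the upstream jsonmodels layout.

from jsonmodels import fields
from jsonmodels.errors import ValidationError

age = fields.IntField()
print(age.parse_value('42'))    # -> 42, cast from string
age.validate(42)                # int matches types, passes
try:
    age.validate('42')          # str does not match types=(int,)
except ValidationError:
    pass

when = fields.DateTimeField()
print(when.parse_value('2020-12-22 20:18:30'))   # parsed by dateutil into a datetime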


@@ -0,0 +1,154 @@
import six
from . import parsers, errors
from .fields import BaseField
from .errors import ValidationError
class JsonmodelMeta(type):
def __new__(cls, name, bases, attributes):
cls.validate_fields(attributes)
return super(cls, cls).__new__(cls, name, bases, attributes)
@staticmethod
def validate_fields(attributes):
fields = {
key: value for key, value in attributes.items()
if isinstance(value, BaseField)
}
taken_names = set()
for name, field in fields.items():
structure_name = field.structure_name(name)
if structure_name in taken_names:
raise ValueError('Name taken', structure_name, name)
taken_names.add(structure_name)
class Base(six.with_metaclass(JsonmodelMeta, object)):
"""Base class for all models."""
def __init__(self, **kwargs):
self._cache_key = _CacheKey()
self.populate(**kwargs)
def populate(self, **values):
"""Populate values to fields. Skip non-existing."""
values = values.copy()
fields = list(self.iterate_with_name())
for _, structure_name, field in fields:
if structure_name in values:
field.__set__(self, values.pop(structure_name))
for name, _, field in fields:
if name in values:
field.__set__(self, values.pop(name))
def get_field(self, field_name):
"""Get field associated with given attribute."""
for attr_name, field in self:
if field_name == attr_name:
return field
raise errors.FieldNotFound('Field not found', field_name)
def __iter__(self):
"""Iterate through fields and values."""
for name, field in self.iterate_over_fields():
yield name, field
def validate(self):
"""Explicitly validate all the fields."""
for name, field in self:
try:
field.validate_for_object(self)
except ValidationError as error:
raise ValidationError(
"Error for field '{name}'.".format(name=name),
error,
)
@classmethod
def iterate_over_fields(cls):
"""Iterate through fields as `(attribute_name, field_instance)`."""
for attr in dir(cls):
clsattr = getattr(cls, attr)
if isinstance(clsattr, BaseField):
yield attr, clsattr
@classmethod
def iterate_with_name(cls):
"""Iterate over fields, but also give `structure_name`.
Format is `(attribute_name, structure_name, field_instance)`.
The structure name is the name under which the value appears in the
generated structure and schema (in primitives), and only there.
"""
for attr_name, field in cls.iterate_over_fields():
structure_name = field.structure_name(attr_name)
yield attr_name, structure_name, field
def to_struct(self):
"""Cast model to Python structure."""
return parsers.to_struct(self)
@classmethod
def to_json_schema(cls):
"""Generate JSON schema for model."""
return parsers.to_json_schema(cls)
def __repr__(self):
attrs = {}
for name, _ in self:
try:
attr = getattr(self, name)
if attr is not None:
attrs[name] = repr(attr)
except ValidationError:
pass
return '{class_name}({fields})'.format(
class_name=self.__class__.__name__,
fields=', '.join(
'{0[0]}={0[1]}'.format(x) for x in sorted(attrs.items())
),
)
def __str__(self):
return '{name} object'.format(name=self.__class__.__name__)
def __setattr__(self, name, value):
try:
return super(Base, self).__setattr__(name, value)
except ValidationError as error:
raise ValidationError(
"Error for field '{name}'.".format(name=name),
error
)
def __eq__(self, other):
if type(other) is not type(self):
return False
for name, _ in self.iterate_over_fields():
try:
our = getattr(self, name)
except errors.ValidationError:
our = None
try:
their = getattr(other, name)
except errors.ValidationError:
their = None
if our != their:
return False
return True
def __ne__(self, other):
return not (self == other)
class _CacheKey(object):
"""Object to identify model in memory."""


@@ -0,0 +1,106 @@
"""Parsers to change model structure into different ones."""
import inspect
from . import fields, builders, errors
def to_struct(model):
"""Cast instance of model to python structure.
:param model: Model to be casted.
:rtype: ``dict``
"""
model.validate()
resp = {}
for _, name, field in model.iterate_with_name():
value = field.__get__(model)
if value is None:
continue
value = field.to_struct(value)
resp[name] = value
return resp
def to_json_schema(cls):
"""Generate JSON schema for given class.
:param cls: Class to be cast.
:rtype: ``dict``
"""
builder = build_json_schema(cls)
return builder.build()
def build_json_schema(value, parent_builder=None):
from .models import Base
cls = value if inspect.isclass(value) else value.__class__
if issubclass(cls, Base):
return build_json_schema_object(cls, parent_builder)
else:
return build_json_schema_primitive(cls, parent_builder)
def build_json_schema_object(cls, parent_builder=None):
builder = builders.ObjectBuilder(cls, parent_builder)
if builder.count_type(builder.type) > 1:
return builder
for _, name, field in cls.iterate_with_name():
if isinstance(field, fields.EmbeddedField):
builder.add_field(name, field, _parse_embedded(field, builder))
elif isinstance(field, fields.ListField):
builder.add_field(name, field, _parse_list(field, builder))
else:
builder.add_field(
name, field, _create_primitive_field_schema(field))
return builder
def _parse_list(field, parent_builder):
builder = builders.ListBuilder(
parent_builder, field.nullable, default=field._default)
for type_ in field.items_types:
builder.add_type_schema(build_json_schema(type_, builder))
return builder
def _parse_embedded(field, parent_builder):
builder = builders.EmbeddedBuilder(
parent_builder, field.nullable, default=field._default)
for type_ in field.types:
builder.add_type_schema(build_json_schema(type_, builder))
return builder
def build_json_schema_primitive(cls, parent_builder):
builder = builders.PrimitiveBuilder(cls, parent_builder)
return builder
def _create_primitive_field_schema(field):
if isinstance(field, fields.StringField):
obj_type = 'string'
elif isinstance(field, fields.IntField):
obj_type = 'number'
elif isinstance(field, fields.FloatField):
obj_type = 'float'
elif isinstance(field, fields.BoolField):
obj_type = 'boolean'
else:
raise errors.FieldNotSupported(
'Field {field} is not supported!'.format(
field=type(field).__name__))
if field.nullable:
obj_type = [obj_type, 'null']
schema = {'type': obj_type}
if field.has_default:
schema["default"] = field._default
return schema
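
To illustrate (not part of the diff): reusing the hypothetical Person model from the previous sketch, to_json_schema produces roughly the following (note IntField maps to 'number' per _create_primitive_field_schema above).

print(Person.to_json_schema())
# approximately:
# {'type': 'object', 'additionalProperties': False,
#  'properties': {'name': {'type': 'string'},
#                 'age': {'type': 'number'}},
#  'required': ['name']}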


@@ -0,0 +1,156 @@
from __future__ import absolute_import
import six
import re
from collections import namedtuple
SCALAR_TYPES = tuple(list(six.string_types) + [int, float, bool])
ECMA_TO_PYTHON_FLAGS = {
'i': re.I,
'm': re.M,
}
PYTHON_TO_ECMA_FLAGS = dict(
(value, key) for key, value in ECMA_TO_PYTHON_FLAGS.items()
)
PythonRegex = namedtuple('PythonRegex', ['regex', 'flags'])
def _normalize_string_type(value):
if isinstance(value, six.string_types):
return six.text_type(value)
else:
return value
def _compare_dicts(one, two):
if len(one) != len(two):
return False
for key, value in one.items():
if key not in one or key not in two:
return False
if not compare_schemas(one[key], two[key]):
return False
return True
def _compare_lists(one, two):
if len(one) != len(two):
return False
they_match = False
for first_item in one:
for second_item in two:
if they_match:
continue
they_match = compare_schemas(first_item, second_item)
return they_match
def _assert_same_types(one, two):
if not isinstance(one, type(two)) or not isinstance(two, type(one)):
raise RuntimeError('Types mismatch! "{type1}" and "{type2}".'.format(
type1=type(one).__name__, type2=type(two).__name__))
def compare_schemas(one, two):
"""Compare two structures that represents JSON schemas.
For comparison you can't use normal comparison, because in JSON schema
lists DO NOT keep order (and Python lists do), so this must be taken into
account during comparison.
Note this wont check all configurations, only first one that seems to
match, which can lead to wrong results.
:param one: First schema to compare.
:param two: Second schema to compare.
:rtype: `bool`
"""
one = _normalize_string_type(one)
two = _normalize_string_type(two)
_assert_same_types(one, two)
if isinstance(one, list):
return _compare_lists(one, two)
elif isinstance(one, dict):
return _compare_dicts(one, two)
elif isinstance(one, SCALAR_TYPES):
return one == two
elif one is None:
return one is two
else:
raise RuntimeError('Not allowed type "{type}"'.format(
type=type(one).__name__))
def is_ecma_regex(regex):
"""Check if given regex is of type ECMA 262 or not.
:rtype: bool
"""
parts = regex.split('/')
if len(parts) == 1:
return False
if len(parts) < 3:
raise ValueError('Given regex is neither an ECMA regex nor a Python regex.')
parts.pop()
parts.append('')
raw_regex = '/'.join(parts)
if raw_regex.startswith('/') and raw_regex.endswith('/'):
return True
return False
def convert_ecma_regex_to_python(value):
"""Convert ECMA 262 regex to Python tuple with regex and flags.
If given value is already Python regex it will be returned unchanged.
:param string value: ECMA regex.
:return: 2-tuple with `regex` and `flags`
:rtype: namedtuple
"""
if not is_ecma_regex(value):
return PythonRegex(value, [])
parts = value.split('/')
flags = parts.pop()
try:
result_flags = [ECMA_TO_PYTHON_FLAGS[f] for f in flags]
except KeyError:
raise ValueError('Wrong flags "{}".'.format(flags))
return PythonRegex('/'.join(parts[1:]), result_flags)
def convert_python_regex_to_ecma(value, flags=[]):
"""Convert Python regex to ECMA 262 regex.
If given value is already ECMA regex it will be returned unchanged.
:param string value: Python regex.
:param list flags: List of flags (allowed flags: `re.I`, `re.M`)
:return: ECMA 262 regex
:rtype: str
"""
if is_ecma_regex(value):
return value
result_flags = [PYTHON_TO_ECMA_FLAGS[f] for f in flags]
result_flags = ''.join(result_flags)
return '/{value}/{flags}'.format(value=value, flags=result_flags)
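
To illustrate (not part of the diff): the schema comparison and regex conversion helpers above, assuming the upstream jsonmodels layout.

import re
from jsonmodels import utilities

print(utilities.compare_schemas({'type': ['null', 'string']},
                                {'type': ['string', 'null']}))  # True: list order ignored
print(utilities.is_ecma_regex('/^foo$/i'))                      # True
print(utilities.convert_ecma_regex_to_python('/^foo$/i'))       # PythonRegex('^foo$', [re.I])
print(utilities.convert_python_regex_to_ecma('^foo$', [re.I]))  # '/^foo$/i'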


@@ -0,0 +1,202 @@
"""Predefined validators."""
import re
from six.moves import reduce
from .errors import ValidationError
from . import utilities
class Min(object):
"""Validator for minimum value."""
def __init__(self, minimum_value, exclusive=False):
"""Init.
:param minimum_value: Minimum value for validator.
:param bool exclusive: If `True`, then the validated value must be
strictly greater than the given minimum (the bound itself is not allowed).
"""
self.minimum_value = minimum_value
self.exclusive = exclusive
def validate(self, value):
"""Validate value."""
if self.exclusive:
if value <= self.minimum_value:
tpl = "'{value}' is lower or equal than minimum ('{min}')."
raise ValidationError(
tpl.format(value=value, min=self.minimum_value))
else:
if value < self.minimum_value:
raise ValidationError(
"'{value}' is lower than minimum ('{min}').".format(
value=value, min=self.minimum_value))
def modify_schema(self, field_schema):
"""Modify field schema."""
field_schema['minimum'] = self.minimum_value
if self.exclusive:
field_schema['exclusiveMinimum'] = True
class Max(object):
"""Validator for maximum value."""
def __init__(self, maximum_value, exclusive=False):
"""Init.
:param maximum_value: Maximum value for validator.
:param bool exclusive: If `True`, then the validated value must be
strictly lower than the given maximum (the bound itself is not allowed).
"""
self.maximum_value = maximum_value
self.exclusive = exclusive
def validate(self, value):
"""Validate value."""
if self.exclusive:
if value >= self.maximum_value:
tpl = "'{val}' is bigger or equal than maximum ('{max}')."
raise ValidationError(
tpl.format(val=value, max=self.maximum_value))
else:
if value > self.maximum_value:
raise ValidationError(
"'{value}' is bigger than maximum ('{max}').".format(
value=value, max=self.maximum_value))
def modify_schema(self, field_schema):
"""Modify field schema."""
field_schema['maximum'] = self.maximum_value
if self.exclusive:
field_schema['exclusiveMaximum'] = True
class Regex(object):
"""Validator for regular expressions."""
FLAGS = {
'ignorecase': re.I,
'multiline': re.M,
}
def __init__(self, pattern, **flags):
"""Init.
Note that if the given pattern is an ECMA regex, the given flags are
**completely ignored** and taken from the regex itself.
:param string pattern: Pattern of regex.
:param bool flags: Flags used for the regex matching.
Allowed flag names are in the `FLAGS` attribute. The flag value
does not matter as long as it evaluates to True.
Flags with False values will be ignored.
Invalid flags will be ignored.
"""
if utilities.is_ecma_regex(pattern):
result = utilities.convert_ecma_regex_to_python(pattern)
self.pattern, self.flags = result
else:
self.pattern = pattern
self.flags = [self.FLAGS[key] for key, value in flags.items()
if key in self.FLAGS and value]
def validate(self, value):
"""Validate value."""
flags = self._calculate_flags()
try:
result = re.search(self.pattern, value, flags)
except TypeError as te:
raise ValidationError(*te.args)
if not result:
raise ValidationError(
'Value "{value}" did not match pattern "{pattern}".'.format(
value=value, pattern=self.pattern
))
def _calculate_flags(self):
return reduce(lambda x, y: x | y, self.flags, 0)
def modify_schema(self, field_schema):
"""Modify field schema."""
field_schema['pattern'] = utilities.convert_python_regex_to_ecma(
self.pattern, self.flags)
class Length(object):
"""Validator for length."""
def __init__(self, minimum_value=None, maximum_value=None):
"""Init.
Note that if neither `minimum_value` nor `maximum_value` is
specified, `ValueError` will be raised.
:param int minimum_value: Minimum value (optional).
:param int maximum_value: Maximum value (optional).
"""
if minimum_value is None and maximum_value is None:
raise ValueError(
"Either 'minimum_value' or 'maximum_value' must be specified.")
self.minimum_value = minimum_value
self.maximum_value = maximum_value
def validate(self, value):
"""Validate value."""
len_ = len(value)
if self.minimum_value is not None and len_ < self.minimum_value:
tpl = "Value '{val}' length is lower than allowed minimum '{min}'."
raise ValidationError(tpl.format(
val=value, min=self.minimum_value
))
if self.maximum_value is not None and len_ > self.maximum_value:
raise ValidationError(
"Value '{val}' length is bigger than "
"allowed maximum '{max}'.".format(
val=value,
max=self.maximum_value,
))
def modify_schema(self, field_schema):
"""Modify field schema."""
if self.minimum_value:
field_schema['minLength'] = self.minimum_value
if self.maximum_value:
field_schema['maxLength'] = self.maximum_value
class Enum(object):
"""Validator for enums."""
def __init__(self, *choices):
"""Init.
:param [] choices: Valid choices for the field.
"""
self.choices = list(choices)
def validate(self, value):
if value not in self.choices:
tpl = "Value '{val}' is not a valid choice."
raise ValidationError(tpl.format(val=value))
def modify_schema(self, field_schema):
field_schema['enum'] = self.choices
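
To illustrate (not part of the diff): attaching the validators above to fields, assuming the upstream jsonmodels layout; Server is a hypothetical example model.

from jsonmodels import models, fields, validators
from jsonmodels.errors import ValidationError

class Server(models.Base):
    port = fields.IntField(validators=[validators.Min(1), validators.Max(65535)])
    env = fields.StringField(validators=validators.Enum('dev', 'prod'))

Server(port=8008, env='prod')    # passes all validators
try:
    Server(port=0, env='prod')   # Min(1) rejects 0
except ValidationError:
    pass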


@@ -1,10 +1,8 @@
import requests
import six
import jsonmodels.models
import jsonmodels.fields
import jsonmodels.errors
from . import jsonmodels
from .apimodel import ApiModel
from .datamodel import NonStrictDataModelMixin


@@ -40,6 +40,7 @@ class Session(TokenManager):
_session_requests = 0
_session_initial_timeout = (3.0, 10.)
_session_timeout = (10.0, 30.)
_session_initial_connect_retry = 4
_write_session_data_size = 15000
_write_session_timeout = (30.0, 30.)
@@ -85,6 +86,7 @@ class Session(TokenManager):
initialize_logging=True,
client=None,
config=None,
http_retries_config=None,
**kwargs
):
# add backward compatibility support for old environment variables
@@ -95,7 +97,7 @@ class Session(TokenManager):
else:
self.config = load()
if initialize_logging:
self.config.initialize_logging()
self.config.initialize_logging(debug=kwargs.get('debug', False))
token_expiration_threshold_sec = self.config.get(
"auth.token_expiration_threshold_sec", 60
@@ -129,11 +131,10 @@ class Session(TokenManager):
raise ValueError("host is required in init or config")
self.__host = host.strip("/")
http_retries_config = self.config.get(
http_retries_config = http_retries_config or self.config.get(
"api.http.retries", ConfigTree()
).as_plain_ordered_dict()
http_retries_config["status_forcelist"] = self._retry_codes
self.__http_session = get_http_session_with_retry(**http_retries_config)
self.__worker = worker or gethostname()
@@ -143,7 +144,14 @@ class Session(TokenManager):
self.client = client or "api-{}".format(__version__)
# limit the reconnect retries, so we get an error if we are starting the session
http_no_retries_config = dict(**http_retries_config)
http_no_retries_config['connect'] = self._session_initial_connect_retry
self.__http_session = get_http_session_with_retry(**http_no_retries_config)
# try to connect with the server
self.refresh_token()
# create the default session with many retries
self.__http_session = get_http_session_with_retry(**http_retries_config)
# update api version from server response
try:
@@ -426,16 +434,15 @@ class Session(TokenManager):
@classmethod
def get_api_server_host(cls, config=None):
if not config:
from ...config import config_obj
config = config_obj
return None
return ENV_HOST.get(default=(config.get("api.api_server", None) or
config.get("api.host", None) or cls.default_host))
@classmethod
def get_app_server_host(cls, config=None):
if not config:
from ...config import config_obj
config = config_obj
return None
# get from config/environment
web_host = ENV_WEB_HOST.get(default=config.get("api.web_server", None))
@@ -462,8 +469,8 @@ class Session(TokenManager):
@classmethod
def get_files_server_host(cls, config=None):
if not config:
from ...config import config_obj
config = config_obj
return None
# get from config/environment
files_host = ENV_FILES_HOST.get(default=(config.get("api.files_server", None)))
if files_host:
@@ -545,6 +552,9 @@ class Session(TokenManager):
else:
raise LoginError("Response data mismatch: No 'token' in 'data' value from res, receive : {}, "
"exception: {}".format(res, ex))
except requests.ConnectionError as ex:
raise ValueError('Connection Error: it seems *api_server* is misconfigured. '
'Is this the TRAINS API server {} ?'.format('/'.join(ex.request.url.split('/')[:3])))
except Exception as ex:
raise LoginError('Unrecognized Authentication Error: {} {}'.format(type(ex), ex))
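
To illustrate the new http_retries_config override (a sketch; only the keyword arguments shown in this diff are assumed, and the import path may differ in the vendored layout):

from trains_agent.backend_api.session import Session

session = Session(
    api_key='ACCESS_KEY', secret_key='SECRET_KEY',   # placeholder credentials
    host='http://localhost:8008',                    # the api server, not the web server
    http_retries_config={'total': 2},                # fail fast instead of the configured retries
)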


@@ -190,7 +190,7 @@ class Config(object):
def reload(self):
self.replace(self._reload())
def initialize_logging(self):
def initialize_logging(self, debug=False):
logging_config = self._config.get("logging", None)
if not logging_config:
return False
@@ -217,6 +217,8 @@ class Config(object):
)
for logger in loggers:
handlers = logger.get("handlers", None)
if debug:
logger['level'] = 'DEBUG'
if not handlers:
continue
logger["handlers"] = [h for h in handlers if h not in deleted]


@@ -46,6 +46,15 @@ class Environment(object):
local = 'local'
class UptimeConf(object):
min_api_version = "2.10"
queue_tag_on = "force_workers:on"
queue_tag_off = "force_workers:off"
worker_key = "force"
worker_value_off = ["off"]
worker_value_on = ["on"]
CONFIG_FILE_EXTENSION = '.conf'


@@ -142,6 +142,7 @@ def main():
with open(str(conf_file), 'wt') as f:
header = '# TRAINS-AGENT configuration file\n' \
'api {\n' \
' # Notice: \'host\' is the api server (default port 8008), not the web server.\n' \
' api_server: %s\n' \
' web_server: %s\n' \
' files_server: %s\n' \
@@ -233,7 +234,8 @@ def verify_credentials(api_host, credentials):
try:
print('Verifying credentials ...')
if api_host:
Session(api_key=credentials['access_key'], secret_key=credentials['secret_key'], host=api_host)
Session(api_key=credentials['access_key'], secret_key=credentials['secret_key'], host=api_host,
http_retries_config={"total": 2})
print('Credentials verified!')
return True
else:
@@ -275,7 +277,7 @@ def read_manual_credentials():
def input_url(host_type, host=None):
while True:
print('{} configured to: [{}] '.format(host_type, host), end='')
print('{} configured to: {}'.format(host_type, '[{}] '.format(host) if host else ''), end='')
parse_input = input()
if host and (not parse_input or parse_input.lower() == 'yes' or parse_input.lower() == 'y'):
break
@@ -289,11 +291,12 @@ def input_url(host_type, host=None):
def input_host_port(host_type, parsed_host):
print('Enter port for {} host '.format(host_type), end='')
replace_port = input().lower()
return parsed_host.scheme + "://" + parsed_host.netloc + (':{}'.format(replace_port) if replace_port else '') + \
parsed_host.path
return parsed_host.scheme + "://" + parsed_host.netloc + (
':{}'.format(replace_port) if replace_port else '') + parsed_host.path
def verify_url(parse_input):
# noinspection PyBroadException
try:
if not parse_input.startswith('http://') and not parse_input.startswith('https://'):
# if we have a specific port, use http prefix, otherwise assume https

File diff suppressed because it is too large


@@ -122,6 +122,7 @@ PIP_EXTRA_INDICES = [
DEFAULT_PIP_DOWNLOAD_CACHE = normalize_path(CONFIG_DIR, "pip-download-cache")
ENV_AGENT_GIT_USER = EnvironmentConfig('TRAINS_AGENT_GIT_USER')
ENV_AGENT_GIT_PASS = EnvironmentConfig('TRAINS_AGENT_GIT_PASS')
ENV_AGENT_GIT_HOST = EnvironmentConfig('TRAINS_AGENT_GIT_HOST')
ENV_TASK_EXECUTE_AS_USER = 'TRAINS_AGENT_EXEC_USER'
ENV_TASK_EXTRA_PYTHON_PATH = 'TRAINS_AGENT_EXTRA_PYTHON_PATH'
ENV_DOCKER_HOST_MOUNT = EnvironmentConfig('TRAINS_AGENT_K8S_HOST_MOUNT', 'TRAINS_AGENT_DOCKER_HOST_MOUNT')

trains_agent/external/__init__.py (vendored, new empty file)


@@ -0,0 +1,22 @@
from .parser import parse # noqa
_MAJOR = 0
_MINOR = 2
_PATCH = 0
def version_tuple():
'''
Returns a 3-tuple of ints that represent the version
'''
return (_MAJOR, _MINOR, _PATCH)
def version():
'''
Returns a string representation of the version
'''
return '%d.%d.%d' % (version_tuple())
__version__ = version()
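
For reference, with the constants above the helpers evaluate to:

print(version_tuple())   # -> (0, 2, 0)
print(version())         # -> '0.2.0'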


@@ -0,0 +1,44 @@
import re
# Copied from pip
# https://github.com/pypa/pip/blob/281eb61b09d87765d7c2b92f6982b3fe76ccb0af/pip/index.py#L947
HASH_ALGORITHMS = set(['sha1', 'sha224', 'sha384', 'sha256', 'sha512', 'md5'])
extras_require_search = re.compile(
r'(?P<name>.+)\[(?P<extras>[^\]]+)\]').search
def parse_fragment(fragment_string):
"""Takes a fragment string nd returns a dict of the components"""
fragment_string = fragment_string.lstrip('#')
try:
return dict(
key_value_string.split('=')
for key_value_string in fragment_string.split('&')
)
except ValueError:
raise ValueError(
'Invalid fragment string {fragment_string}'.format(
fragment_string=fragment_string
)
)
def get_hash_info(d):
"""Returns the first matching hashlib name and value from a dict"""
for key in d.keys():
if key.lower() in HASH_ALGORITHMS:
return key, d[key]
return None, None
def parse_extras_require(egg):
if egg is not None:
match = extras_require_search(egg)
if match is not None:
name = match.group('name')
extras = match.group('extras')
return name, [extra.strip() for extra in extras.split(',')]
return egg, []
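
To illustrate (not part of the diff): the fragment helpers above applied to a pip-style URL fragment; the package name and hash are placeholders.

print(parse_fragment('#egg=mypkg[extra1,extra2]&sha256=abc123'))
# -> {'egg': 'mypkg[extra1,extra2]', 'sha256': 'abc123'}
print(parse_extras_require('mypkg[extra1,extra2]'))   # -> ('mypkg', ['extra1', 'extra2'])
print(get_hash_info({'sha256': 'abc123'}))            # -> ('sha256', 'abc123')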


@@ -0,0 +1,50 @@
import os
import warnings
from .requirement import Requirement
def parse(reqstr):
"""
Parse a requirements file into a list of Requirements
See: pip/req.py:parse_requirements()
:param reqstr: a string or file like object containing requirements
:returns: a *generator* of Requirement objects
"""
filename = getattr(reqstr, 'name', None)
try:
# Python 2.x compatibility
if not isinstance(reqstr, basestring):
reqstr = reqstr.read()
except NameError:
# Python 3.x only
if not isinstance(reqstr, str):
reqstr = reqstr.read()
for line in reqstr.splitlines():
line = line.strip()
if line == '':
continue
elif not line or line.startswith('#'):
# comments are lines that start with # only
continue
elif line.startswith('-r') or line.startswith('--requirement'):
_, new_filename = line.split()
new_file_path = os.path.join(os.path.dirname(filename or '.'),
new_filename)
with open(new_file_path) as f:
for requirement in parse(f):
yield requirement
elif line.startswith('-f') or line.startswith('--find-links') or \
line.startswith('-i') or line.startswith('--index-url') or \
line.startswith('--extra-index-url') or \
line.startswith('--no-index'):
warnings.warn('Private repos not supported. Skipping.')
continue
elif line.startswith('-Z') or line.startswith('--always-unzip'):
warnings.warn('Unused option --always-unzip. Skipping.')
continue
else:
yield Requirement.parse(line)
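
To illustrate (not part of the diff): parse() accepts a string or file-like object and yields Requirement objects lazily.

reqs = list(parse('six>=1.13\n# a comment\ntorch==1.7.1'))
print([r.name for r in reqs])   # -> ['six', 'torch']
print(reqs[1].specs)            # -> [('==', '1.7.1')]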


@@ -0,0 +1,241 @@
from __future__ import unicode_literals
import re
from pkg_resources import Requirement as Req
from .fragment import get_hash_info, parse_fragment, parse_extras_require
from .vcs import VCS, VCS_SCHEMES
URI_REGEX = re.compile(
r'^(?P<scheme>https?|file|ftps?)://(?P<path>[^#]+)'
r'(#(?P<fragment>\S+))?'
)
VCS_REGEX = re.compile(
r'^(?P<scheme>{0})://'.format(r'|'.join(
[scheme.replace('+', r'\+') for scheme in VCS_SCHEMES])) +
r'((?P<login>[^/@]+)@)?'
r'(?P<path>[^#@]+)'
r'(@(?P<revision>[^#]+))?'
r'(#(?P<fragment>\S+))?'
)
# This matches just about everything
LOCAL_REGEX = re.compile(
r'^((?P<scheme>file)://)?'
r'(?P<path>[^#]+)' +
r'(#(?P<fragment>\S+))?'
)
class Requirement(object):
"""
Represents a single requirement.
Typically instances of this class are created with ``Requirement.parse``.
For local file requirements, there's no verification that the file
exists. This class attempts to be *dict-like*.
See: http://www.pip-installer.org/en/latest/logic.html
**Members**:
* ``line`` - the actual requirement line being parsed
* ``editable`` - a boolean whether this requirement is "editable"
* ``local_file`` - a boolean whether this requirement is a local file/path
* ``specifier`` - a boolean whether this requirement used a requirement
specifier (eg. "django>=1.5" or "requirements")
* ``vcs`` - a string specifying the version control system
* ``revision`` - a version control system specifier
* ``name`` - the name of the requirement
* ``uri`` - the URI if this requirement was specified by URI
* ``subdirectory`` - the subdirectory fragment of the URI
* ``path`` - the local path to the requirement
* ``hash_name`` - the type of hashing algorithm indicated in the line
* ``hash`` - the hash value indicated by the requirement line
* ``extras`` - a list of extras for this requirement
(eg. "mymodule[extra1, extra2]")
* ``specs`` - a list of specs for this requirement
(eg. "mymodule>1.5,<1.6" => [('>', '1.5'), ('<', '1.6')])
"""
def __init__(self, line):
# Do not call this private method
self.line = line
self.editable = False
self.local_file = False
self.specifier = False
self.vcs = None
self.name = None
self.subdirectory = None
self.uri = None
self.path = None
self.revision = None
self.hash_name = None
self.hash = None
self.extras = []
self.specs = []
def __repr__(self):
return '<Requirement: "{0}">'.format(self.line)
def __getitem__(self, key):
return getattr(self, key)
def keys(self):
return self.__dict__.keys()
@classmethod
def parse_editable(cls, line):
"""
Parses a Requirement from an "editable" requirement which is either
a local project path or a VCS project URI.
See: pip/req.py:from_editable()
:param line: an "editable" requirement
:returns: a Requirement instance for the given line
:raises: ValueError on an invalid requirement
"""
req = cls('-e {0}'.format(line))
req.editable = True
vcs_match = VCS_REGEX.match(line)
local_match = LOCAL_REGEX.match(line)
if vcs_match is not None:
groups = vcs_match.groupdict()
if groups.get('login'):
req.uri = '{scheme}://{login}@{path}'.format(**groups)
else:
req.uri = '{scheme}://{path}'.format(**groups)
req.revision = groups['revision']
if groups['fragment']:
fragment = parse_fragment(groups['fragment'])
egg = fragment.get('egg')
req.name, req.extras = parse_extras_require(egg)
req.hash_name, req.hash = get_hash_info(fragment)
req.subdirectory = fragment.get('subdirectory')
for vcs in VCS:
if req.uri.startswith(vcs):
req.vcs = vcs
else:
assert local_match is not None, 'This should match everything'
groups = local_match.groupdict()
req.local_file = True
if groups['fragment']:
fragment = parse_fragment(groups['fragment'])
egg = fragment.get('egg')
req.name, req.extras = parse_extras_require(egg)
req.hash_name, req.hash = get_hash_info(fragment)
req.subdirectory = fragment.get('subdirectory')
req.path = groups['path']
return req
@classmethod
def parse_line(cls, line):
"""
Parses a Requirement from a non-editable requirement.
See: pip/req.py:from_line()
:param line: a "non-editable" requirement
:returns: a Requirement instance for the given line
:raises: ValueError on an invalid requirement
"""
req = cls(line)
vcs_match = VCS_REGEX.match(line)
uri_match = URI_REGEX.match(line)
local_match = LOCAL_REGEX.match(line)
if vcs_match is not None:
groups = vcs_match.groupdict()
if groups.get('login'):
req.uri = '{scheme}://{login}@{path}'.format(**groups)
else:
req.uri = '{scheme}://{path}'.format(**groups)
req.revision = groups['revision']
if groups['fragment']:
fragment = parse_fragment(groups['fragment'])
egg = fragment.get('egg')
req.name, req.extras = parse_extras_require(egg)
req.hash_name, req.hash = get_hash_info(fragment)
req.subdirectory = fragment.get('subdirectory')
for vcs in VCS:
if req.uri.startswith(vcs):
req.vcs = vcs
elif uri_match is not None:
groups = uri_match.groupdict()
req.uri = '{scheme}://{path}'.format(**groups)
if groups['fragment']:
fragment = parse_fragment(groups['fragment'])
egg = fragment.get('egg')
req.name, req.extras = parse_extras_require(egg)
req.hash_name, req.hash = get_hash_info(fragment)
req.subdirectory = fragment.get('subdirectory')
if groups['scheme'] == 'file':
req.local_file = True
elif '#egg=' in line:
# Assume a local file match
assert local_match is not None, 'This should match everything'
groups = local_match.groupdict()
req.local_file = True
if groups['fragment']:
fragment = parse_fragment(groups['fragment'])
egg = fragment.get('egg')
req.name, req.extras = parse_extras_require(egg)
req.hash_name, req.hash = get_hash_info(fragment)
req.subdirectory = fragment.get('subdirectory')
req.path = groups['path']
else:
# This is a requirement specifier.
# Delegate to pkg_resources and hope for the best
req.specifier = True
pkg_req = Req.parse(line)
req.name = pkg_req.unsafe_name
req.extras = list(pkg_req.extras)
req.specs = pkg_req.specs
return req
@classmethod
def parse(cls, line):
"""
Parses a Requirement from a line of a requirement file.
:param line: a line of a requirement file
:returns: a Requirement instance for the given line
:raises: ValueError on an invalid requirement
"""
line = line.lstrip()
if line.startswith('-e') or line.startswith('--editable'):
# Editable installs are either a local project path
# or a VCS project URI
return cls.parse_editable(
re.sub(r'^(-e|--editable=?)\s*', '', line))
elif '@' in line and ('#' not in line or line.index('#') > line.index('@')):
# Allegro bug fix: support 'name @ git+' entries
name, uri = line.split('@', 1)
name = name.strip()
uri = uri.strip()
# noinspection PyBroadException
try:
# check if the name is valid & parsed
Req.parse(name)
# if we are here, name is a valid package name, check if the vcs part is valid
if VCS_REGEX.match(uri):
req = cls.parse_line(uri)
req.name = name
return req
elif URI_REGEX.match(uri):
req = cls.parse_line(uri)
req.name = name
req.line = line
return req
except Exception:
pass
return cls.parse_line(line)
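
To illustrate (not part of the diff): the Allegro fix above lets PEP 508-style 'name @ vcs+url' lines keep their package name.

r = Requirement.parse('trains @ git+https://github.com/allegroai/trains.git@master')
print(r.name)       # -> 'trains'
print(r.vcs)        # -> 'git'
print(r.revision)   # -> 'master'
print(r.uri)        # -> 'git+https://github.com/allegroai/trains.git'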


@@ -0,0 +1,30 @@
from __future__ import unicode_literals
VCS = [
'git',
'hg',
'svn',
'bzr',
]
VCS_SCHEMES = [
'git',
'git+https',
'git+ssh',
'git+git',
'hg+http',
'hg+https',
'hg+static-http',
'hg+ssh',
'svn',
'svn+svn',
'svn+http',
'svn+https',
'svn+ssh',
'bzr+http',
'bzr+https',
'bzr+ssh',
'bzr+sftp',
'bzr+ftp',
'bzr+lp',
]


@@ -1,45 +1,115 @@
from __future__ import print_function, division, unicode_literals
import base64
import logging
import os
import re
import subprocess
import tempfile
from copy import deepcopy
import yaml
import json
from time import sleep
from typing import Text, List
from pyhocon import HOCONConverter
from trains_agent.commands.events import Events
from trains_agent.commands.worker import Worker
from trains_agent.errors import APIError
from trains_agent.helper.base import safe_remove_file
from trains_agent.helper.dicts import merge_dicts
from trains_agent.helper.process import get_bash_output
from trains_agent.helper.resource_monitor import ResourceMonitor
from trains_agent.interface.base import ObjectID
class K8sIntegration(Worker):
K8S_PENDING_QUEUE = "k8s_scheduler"
KUBECTL_RUN_CMD = "kubectl run trains_id_{task_id} " \
KUBECTL_APPLY_CMD = "kubectl apply -f"
KUBECTL_RUN_CMD = "kubectl run trains-{queue_name}-id-{task_id} " \
"--image {docker_image} " \
"--restart=Never --replicas=1 " \
"--generator=run-pod/v1"
"--generator=run-pod/v1 " \
"--namespace=trains"
KUBECTL_DELETE_CMD = "kubectl delete pods " \
"--selector=TRAINS=agent " \
"--field-selector=status.phase!=Pending,status.phase!=Running"
"--field-selector=status.phase!=Pending,status.phase!=Running " \
"--namespace=trains"
CONTAINER_BASH_SCRIPT = "apt-get install -y git python-pip && " \
"pip install trains-agent && " \
"python -u -m trains_agent execute --full-monitoring --require-queue --id {}"
BASH_INSTALL_SSH_CMD = [
"apt-get install -y openssh-server",
"mkdir -p /var/run/sshd",
"echo 'root:training' | chpasswd",
"echo 'PermitRootLogin yes' >> /etc/ssh/sshd_config",
"sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config",
r"sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd",
"echo 'AcceptEnv TRAINS_API_ACCESS_KEY TRAINS_API_SECRET_KEY' >> /etc/ssh/sshd_config",
'echo "export VISIBLE=now" >> /etc/profile',
'echo "export PATH=$PATH" >> /etc/profile',
'echo "ldconfig" >> /etc/profile',
"/usr/sbin/sshd -p {port}"]
def __init__(self, k8s_pending_queue_name=None, kubectl_cmd=None, container_bash_script=None, debug=False):
CONTAINER_BASH_SCRIPT = [
"export DEBIAN_FRONTEND='noninteractive'",
"echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean",
"chown -R root /root/.cache/pip",
"apt-get update",
"apt-get install -y git libsm6 libxext6 libxrender-dev libglib2.0-0",
"declare LOCAL_PYTHON",
"for i in {{10..5}}; do which python3.$i && python3.$i -m pip --version && "
"export LOCAL_PYTHON=$(which python3.$i) && break ; done",
"[ ! -z $LOCAL_PYTHON ] || apt-get install -y python3-pip",
"[ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3",
"$LOCAL_PYTHON -m pip install trains-agent",
"{extra_bash_init_cmd}",
"$LOCAL_PYTHON -m trains_agent execute --full-monitoring --require-queue --id {task_id}"
]
AGENT_LABEL = "TRAINS=agent"
LIMIT_POD_LABEL = "ai.allegro.agent.serial=pod-{pod_number}"
_edit_hyperparams_version = "2.9"
def __init__(
self,
k8s_pending_queue_name=None,
kubectl_cmd=None,
container_bash_script=None,
debug=False,
ports_mode=False,
num_of_services=20,
user_props_cb=None,
overrides_yaml=None,
template_yaml=None,
trains_conf_file=None,
extra_bash_init_script=None,
):
"""
Initialize the k8s integration glue layer daemon
:param str k8s_pending_queue_name: queue name to use when task is pending in the k8s scheduler
:param str|callable kubectl_cmd: kubectl command line str, supports formating (default: KUBECTL_RUN_CMD)
:param str|callable kubectl_cmd: kubectl command line str, supports formatting (default: KUBECTL_RUN_CMD)
example: "task={task_id} image={docker_image} queue_id={queue_id}"
or a callable function: kubectl_cmd(task_id, docker_image, queue_id, task_data)
:param str container_bash_script: container bash script to be executed in k8s (default: CONTAINER_BASH_SCRIPT)
Notice this string will use format() call, if you have curly brackets they should be doubled { -> {{
Format arguments passed: {task_id} and {extra_bash_init_cmd}
:param bool debug: Switch logging on
:param bool ports_mode: Adds a label to each pod which can be used in services in order to expose ports.
Requires the `num_of_services` parameter.
:param int num_of_services: Number of k8s services configured in the cluster. Required if `ports_mode` is True.
(default: 20)
:param callable user_props_cb: An Optional callable allowing additional user properties to be specified
when scheduling a task to run in a pod. Callable can receive an optional pod number and should return
a dictionary of user properties (name and value). Signature is [[Optional[int]], Dict[str,str]]
:param str overrides_yaml: YAML file containing the overrides for the pod (optional)
:param str template_yaml: YAML file containing the template for the pod (optional).
If provided the pod is scheduled with kubectl apply and overrides are ignored, otherwise with kubectl run.
:param str trains_conf_file: trains.conf file to be used by the pod itself (optional)
:param str extra_bash_init_script: Additional bash script to run before starting the Task inside the container
"""
super(K8sIntegration, self).__init__()
self.k8s_pending_queue_name = k8s_pending_queue_name or self.K8S_PENDING_QUEUE
@@ -51,12 +121,88 @@ class K8sIntegration(Worker):
if debug:
self.log.logger.disabled = False
self.log.logger.setLevel(logging.INFO)
self.ports_mode = ports_mode
self.num_of_services = num_of_services
self._edit_hyperparams_support = None
self._user_props_cb = user_props_cb
self.trains_conf_file = None
self.overrides_json_string = None
self.template_dict = None
self.extra_bash_init_script = extra_bash_init_script or None
if self.extra_bash_init_script and not isinstance(self.extra_bash_init_script, str):
self.extra_bash_init_script = ' ; '.join(self.extra_bash_init_script) # noqa
self.pod_limits = []
self.pod_requests = []
if overrides_yaml:
with open(os.path.expandvars(os.path.expanduser(str(overrides_yaml))), 'rt') as f:
overrides = yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
if overrides:
containers = overrides.get('spec', {}).get('containers', [])
for c in containers:
resources = {str(k).lower(): v for k, v in c.get('resources', {}).items()}
if not resources:
continue
if resources.get('limits'):
self.pod_limits += ['{}={}'.format(k, v) for k, v in resources['limits'].items()]
if resources.get('requests'):
self.pod_requests += ['{}={}'.format(k, v) for k, v in resources['requests'].items()]
# remove double entries
self.pod_limits = list(set(self.pod_limits))
self.pod_requests = list(set(self.pod_requests))
if self.pod_limits or self.pod_requests:
self.log.warning('Found pod container requests={} limits={}'.format(
self.pod_limits, self.pod_requests))
if containers:
self.log.warning('Removing containers section: {}'.format(overrides['spec'].pop('containers')))
self.overrides_json_string = json.dumps(overrides)
if template_yaml:
with open(os.path.expandvars(os.path.expanduser(str(template_yaml))), 'rt') as f:
self.template_dict = yaml.load(f, Loader=getattr(yaml, 'FullLoader', None))
def run_one_task(self, queue: Text, task_id: Text, worker_args=None):
if trains_conf_file:
with open(os.path.expandvars(os.path.expanduser(str(trains_conf_file))), 'rt') as f:
self.trains_conf_file = f.read()
# make sure we use system packages!
self.trains_conf_file += '\nagent.package_manager.system_site_packages=true\n'
def _set_task_user_properties(self, task_id: str, **properties: str):
if self._edit_hyperparams_support is not True:
# either not supported or never tested
if self._edit_hyperparams_support == self._session.api_version:
# tested against latest api_version, not supported
return
if not self._session.check_min_api_version(self._edit_hyperparams_version):
# not supported due to insufficient api_version
self._edit_hyperparams_support = self._session.api_version
return
try:
self._session.get(
service="tasks",
action="edit_hyper_params",
task=task_id,
hyperparams=[
{
"section": "properties",
"name": k,
"value": str(v),
}
for k, v in properties.items()
],
)
# definitely supported
self._edit_hyperparams_support = True
except APIError as error:
if error.code == 404:
self._edit_hyperparams_support = self._session.api_version
def run_one_task(self, queue: Text, task_id: Text, worker_args=None, **_):
print('Pulling task {} launching on kubernetes cluster'.format(task_id))
task_data = self._session.api_client.tasks.get_all(id=[task_id])[0]
# push task into the k8s queue, so we have visibility on pending tasks in the k8s scheduler
try:
print('Pushing task {} into temporary pending queue'.format(task_id))
self._session.api_client.tasks.reset(task_id)
self._session.api_client.tasks.enqueue(task_id, queue=self.k8s_pending_queue_name,
status_reason='k8s pending scheduler')
except Exception as e:
@@ -65,36 +211,231 @@ class K8sIntegration(Worker):
return
if task_data.execution.docker_cmd:
docker_image = task_data.execution.docker_cmd
docker_parts = task_data.execution.docker_cmd
else:
docker_image = str(os.environ.get("TRAINS_DOCKER_IMAGE") or
docker_parts = str(os.environ.get("TRAINS_DOCKER_IMAGE") or
self._session.config.get("agent.default_docker.image", "nvidia/cuda"))
# take the first part, this is the docker image name (not arguments)
docker_image = docker_image.split()[0]
docker_parts = docker_parts.split()
docker_image = docker_parts[0]
docker_args = docker_parts[1:] if len(docker_parts) > 1 else []
create_trains_conf = "echo '{}' >> ~/trains.conf && ".format(
HOCONConverter.to_hocon(self._session.config._config))
# get the trains.conf encoded file
# noinspection PyProtectedMember
hocon_config_encoded = (self.trains_conf_file or self._session._config_file).encode('ascii')
create_trains_conf = "echo '{}' | base64 --decode >> ~/trains.conf".format(
base64.b64encode(
hocon_config_encoded
).decode('ascii')
)
if callable(self.kubectl_cmd):
kubectl_cmd = self.kubectl_cmd(task_id, docker_image, queue, task_data)
if self.ports_mode:
print("Kubernetes looking for available pod to use")
# noinspection PyBroadException
try:
queue_name = self._session.api_client.queues.get_by_id(queue=queue).name
except Exception:
queue_name = 'k8s'
# conform queue name to k8s standards
safe_queue_name = queue_name.lower().strip()
safe_queue_name = re.sub(r'\W+', '', safe_queue_name).replace('_', '').replace('-', '')
# Search for a free pod number
pod_number = 1
while self.ports_mode:
kubectl_cmd_new = "kubectl get pods -l {pod_label},{agent_label} -n trains".format(
pod_label=self.LIMIT_POD_LABEL.format(pod_number=pod_number),
agent_label=self.AGENT_LABEL
)
process = subprocess.Popen(kubectl_cmd_new.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()
output = '' if not output else output if isinstance(output, str) else output.decode('utf-8')
error = '' if not error else error if isinstance(error, str) else error.decode('utf-8')
if not output:
# No such pod exists, so we can use the pod_number we found
break
if pod_number >= self.num_of_services:
# All pod numbers are taken, exit
self.log.warning(
"kubectl last result: {}\n{}\nAll k8s services are in use, task '{}' "
"will be enqueued back to queue '{}'".format(
error, output, task_id, queue
)
)
self._session.api_client.tasks.reset(task_id)
self._session.api_client.tasks.enqueue(
task_id, queue=queue, status_reason='k8s max pod limit (no free k8s service)')
return
pod_number += 1
labels = ([self.LIMIT_POD_LABEL.format(pod_number=pod_number)] if self.ports_mode else []) + [self.AGENT_LABEL]
if self.ports_mode:
print("Kubernetes scheduling task id={} on pod={}".format(task_id, pod_number))
else:
kubectl_cmd = self.kubectl_cmd.format(task_id=task_id, docker_image=docker_image, queue_id=queue)
print("Kubernetes scheduling task id={}".format(task_id))
# make sure we gave a list
if self.template_dict:
output, error = self._kubectl_apply(
create_trains_conf=create_trains_conf,
labels=labels, docker_image=docker_image, docker_args=docker_args,
task_id=task_id, queue=queue, queue_name=safe_queue_name)
else:
output, error = self._kubectl_run(
create_trains_conf=create_trains_conf,
labels=labels, docker_image=docker_image,
task_data=task_data,
task_id=task_id, queue=queue, queue_name=safe_queue_name)
error = '' if not error else (error if isinstance(error, str) else error.decode('utf-8'))
output = '' if not output else (output if isinstance(output, str) else output.decode('utf-8'))
print('kubectl output:\n{}\n{}'.format(error, output))
if error:
self.log.error("Running kubectl encountered an error: {}".format(error))
user_props = {"k8s-queue": str(queue_name)}
if self.ports_mode:
user_props.update({"k8s-pod-number": pod_number, "k8s-pod-label": labels[0]})
if self._user_props_cb:
# noinspection PyBroadException
try:
custom_props = self._user_props_cb(pod_number) if self.ports_mode else self._user_props_cb()
user_props.update(custom_props)
except Exception:
pass
if user_props:
self._set_task_user_properties(
task_id=task_id,
**user_props
)
def _parse_docker_args(self, docker_args):
# type: (list) -> dict
kube_args = {'env': []}
while docker_args:
cmd = docker_args.pop(0).strip()
if cmd in ('-e', '--env',):
env = docker_args.pop(0).strip()
key, value = env.split('=', 1)
kube_args['env'] += [{'name': key, 'value': value}]
else:
self.log.warning('skipping docker argument {} (only -e --env supported)'.format(cmd))
return kube_args
def _kubectl_apply(self, create_trains_conf, docker_image, docker_args, labels, queue, task_id, queue_name):
template = deepcopy(self.template_dict)
template.setdefault('apiVersion', 'v1')
template['kind'] = 'Pod'
template.setdefault('metadata', {})
name = 'trains-{queue}-id-{task_id}'.format(queue=queue_name, task_id=task_id)
template['metadata']['name'] = name
template.setdefault('spec', {})
template['spec'].setdefault('containers', [])
if labels:
labels_dict = dict(pair.split('=', 1) for pair in labels)
template['metadata'].setdefault('labels', {})
template['metadata']['labels'].update(labels_dict)
container = self._parse_docker_args(docker_args)
container_bash_script = [self.container_bash_script] if isinstance(self.container_bash_script, str) \
else self.container_bash_script
script_encoded = '\n'.join(
['#!/bin/bash', ] +
[line.format(extra_bash_init_cmd=self.extra_bash_init_script or '', task_id=task_id)
for line in container_bash_script])
create_init_script = \
"echo '{}' | base64 --decode >> ~/__start_agent__.sh ; " \
"/bin/bash ~/__start_agent__.sh".format(
base64.b64encode(
script_encoded.encode('ascii')
).decode('ascii'))
container = merge_dicts(
container,
dict(name=name, image=docker_image,
command=['/bin/bash'],
args=['-c', '{} ; {}'.format(create_trains_conf, create_init_script)])
)
if template['spec']['containers']:
template['spec']['containers'][0] = merge_dicts(template['spec']['containers'][0], container)
else:
template['spec']['containers'].append(container)
fp, yaml_file = tempfile.mkstemp(prefix='trains_k8stmpl_', suffix='.yml')
os.close(fp)
with open(yaml_file, 'wt') as f:
yaml.dump(template, f)
kubectl_cmd = self.KUBECTL_APPLY_CMD.format(
task_id=task_id,
docker_image=docker_image,
queue_id=queue,
)
# make sure we provide a list
if isinstance(kubectl_cmd, str):
kubectl_cmd = kubectl_cmd.split()
kubectl_cmd += ["--labels=TRAINS=agent", "--command", "--", "/bin/sh", "-c",
create_trains_conf + self.container_bash_script.format(task_id)]
# add the template file at the end
kubectl_cmd += [yaml_file]
try:
process = subprocess.Popen(kubectl_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()
except Exception as ex:
return None, str(ex)
finally:
safe_remove_file(yaml_file)
return output, error
def _kubectl_run(self, create_trains_conf, docker_image, labels, queue, task_data, task_id, queue_name):
if callable(self.kubectl_cmd):
kubectl_cmd = self.kubectl_cmd(task_id, docker_image, queue, task_data, queue_name)
else:
kubectl_cmd = self.kubectl_cmd.format(
queue_name=queue_name,
task_id=task_id,
docker_image=docker_image,
queue_id=queue
)
# make sure we provide a list
if isinstance(kubectl_cmd, str):
kubectl_cmd = kubectl_cmd.split()
if self.overrides_json_string:
kubectl_cmd += ['--overrides=' + self.overrides_json_string]
if self.pod_limits:
kubectl_cmd += ['--limits', ",".join(self.pod_limits)]
if self.pod_requests:
kubectl_cmd += ['--requests', ",".join(self.pod_requests)]
container_bash_script = [self.container_bash_script] if isinstance(self.container_bash_script, str) \
else self.container_bash_script
container_bash_script = ' ; '.join(container_bash_script)
kubectl_cmd += [
"--labels=" + ",".join(labels),
"--command",
"--",
"/bin/sh",
"-c",
"{} ; {}".format(create_trains_conf, container_bash_script.format(
extra_bash_init_cmd=self.extra_bash_init_script, task_id=task_id)),
]
process = subprocess.Popen(kubectl_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()
self.log.info("K8s scheduling experiment task id={}".format(task_id))
if error:
self.log.error("Running kubectl encountered an error: {}".format(
error if isinstance(error, str) else error.decode()))
return output, error
def run_tasks_loop(self, queues: List[Text], worker_params):
def run_tasks_loop(self, queues: List[Text], worker_params, **kwargs):
"""
:summary: Pull and run tasks from queues.
:description: 1. Go through ``queues`` in order.
@@ -108,6 +449,7 @@ class K8sIntegration(Worker):
events_service = self.get_service(Events)
# make sure we have a k8s pending queue
# noinspection PyBroadException
try:
self._session.api_client.queues.create(self.k8s_pending_queue_name)
except Exception:
@@ -119,7 +461,7 @@ class K8sIntegration(Worker):
while True:
# iterate over queues (priority style, queues[0] is highest)
for queue in queues:
# delete old completed /failed pods
# delete old completed / failed pods
get_bash_output(self.KUBECTL_DELETE_CMD)
# get next task in queue
@@ -155,15 +497,20 @@ class K8sIntegration(Worker):
if self._session.config["agent.reload_config"]:
self.reload_config()
def k8s_daemon(self, queues):
def k8s_daemon(self, queue):
"""
Start the k8s Glue service.
This service will be pulling tasks from *queues* and scheduling them for execution using kubectl.
This service will be pulling tasks from *queue* and scheduling them for execution using kubectl.
Notice all scheduled tasks are pushed back into K8S_PENDING_QUEUE,
and popped when execution actually starts. This creates full visibility into the k8s scheduler.
Manually popping a task from the K8S_PENDING_QUEUE
will cause the k8s scheduler to skip the execution once the scheduled task needs to be executed
:param list(str) queues: List of queue names to pull from
:param list(str) queue: queue name to pull from
"""
return self.daemon(queues=queues, log_level=logging.INFO, foreground=True, docker=False)
return self.daemon(queues=[ObjectID(name=queue)] if queue else None,
log_level=logging.INFO, foreground=True, docker=False)
@classmethod
def get_ssh_server_bash(cls, ssh_port_number):
return ' ; '.join(line.format(port=ssh_port_number) for line in cls.BASH_INSTALL_SSH_CMD)
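A usage sketch of the glue service (import path and argument names inferred from this diff, not an authoritative API):
from trains_agent.glue.k8s import K8sIntegration  # import path assumed
k8s = K8sIntegration(
    extra_bash_init_script='echo "pod starting"',  # injected into the container bash script
    ssh_server_port=10022,                         # the new --ssh-server-port option
)
k8s.k8s_daemon(queue='k8s_glue_queue')             # pull tasks and schedule pods via kubectl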

View File

@@ -173,14 +173,32 @@ def normalize_path(*paths):
def safe_remove_file(filename, error_message=None):
# noinspection PyBroadException
try:
os.remove(filename)
if filename:
os.remove(filename)
except Exception:
if error_message:
print(error_message)
def get_python_path(script_dir, entry_point, package_api):
def safe_remove_tree(filename):
if not filename:
return
# noinspection PyBroadException
try:
shutil.rmtree(filename, ignore_errors=True)
except Exception:
pass
# noinspection PyBroadException
try:
os.remove(filename)
except Exception:
pass
def get_python_path(script_dir, entry_point, package_api, is_conda_env=False):
# noinspection PyBroadException
try:
python_path_sep = ';' if is_windows_platform() else ':'
python_path_cmd = package_api.get_python_command(
@@ -192,9 +210,9 @@ def get_python_path(script_dir, entry_point, package_api):
(Path(script_dir) / Path(entry_point)).parent.absolute().as_posix(),
python_path_sep=python_path_sep)
if is_windows_platform():
return python_path.replace('/', '\\') + org_python_path
python_path = python_path.replace('/', '\\')
return python_path + org_python_path
return python_path if is_conda_env else (python_path + org_python_path)
except Exception:
return None
@@ -442,9 +460,9 @@ def chain_map(*args):
return reduce(lambda x, y: x.update(y) or x, args, {})
def check_directory_path(path):
def check_directory_path(path, check_whitespace_in_path=True):
message = 'Could not create directory "{}": {}'
if not is_windows_platform():
if not is_windows_platform() and check_whitespace_in_path:
match = re.search(r'\s', path)
if match:
raise CommandFailedError(
@@ -537,6 +555,7 @@ class ExecutionInfo(NonStrictAttrs):
branch = nullable_string
version_num = nullable_string
tag = nullable_string
docker_cmd = nullable_string
@classmethod
def from_task(cls, task_info):
@@ -554,4 +573,24 @@ class ExecutionInfo(NonStrictAttrs):
execution.entry_point = entry_point
execution.working_dir = working_dir or ""
# noinspection PyBroadException
try:
execution.docker_cmd = task_info.execution.docker_cmd
except Exception:
pass
return execution
class safe_furl(furl.furl):
@property
def port(self):
return self._port
@port.setter
def port(self, port):
"""
Any port value is valid
"""
self._port = port
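A short illustration (not part of the diff) of why the setter is overridden: the base furl class validates assigned port values and applies default-port logic, while safe_furl stores whatever it is given, so helpers that rewrite repository URLs can really clear an https port.
u = safe_furl("ssh://git@github.example.com/org/repo.git")
u.port = None   # stored verbatim; no validation or default-port fallback
print(u.url)    # -> ssh://git@github.example.com/org/repo.git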

View File

@@ -9,7 +9,7 @@ from attr import attrs, attrib
import six
from six import binary_type, text_type
from trains_agent.helper.base import nonstrict_in_place_sort, create_tree
from trains_agent.helper.base import nonstrict_in_place_sort
def print_text(text, newline=True):
@@ -22,15 +22,21 @@ def print_text(text, newline=True):
sys.stdout.write(data)
def decode_binary_lines(binary_lines, encoding='utf-8'):
def decode_binary_lines(binary_lines, encoding='utf-8', replace_cr=False, overwrite_cr=False):
# decode per line; if decoding fails, skip the line
lines = []
for b in binary_lines:
# noinspection PyBroadException
try:
l = b.decode(encoding=encoding, errors='replace').replace('\r', '\n')
except:
l = ''
lines.append(l + '\n' if l and l[-1] != '\n' else l)
line = b.decode(encoding=encoding, errors='replace')
if replace_cr:
line = line.replace('\r', '\n')
elif overwrite_cr:
cr_lines = line.split('\r')
line = cr_lines[-1] if cr_lines[-1] or len(cr_lines) < 2 else cr_lines[-2]
except Exception:
line = ''
lines.append(line + '\n' if not line or line[-1] != '\n' else line)
return lines
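Examples (not part of the diff) of the two new modes; with overwrite_cr a carriage-return progress bar keeps only its last visible update, mimicking what a terminal would display:
decode_binary_lines([b'epoch 1: 10%\repoch 1: 99%\r'], overwrite_cr=True)
# -> ['epoch 1: 99%\n']
decode_binary_lines([b'a\rb'], replace_cr=True)
# -> ['a\nb\n']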

View File

@@ -3,3 +3,15 @@ from typing import Callable, Dict, Any
def filter_keys(filter_, dct): # type: (Callable[[Any], bool], Dict) -> Dict
return {key: value for key, value in dct.items() if filter_(key)}
def merge_dicts(dict1, dict2):
""" Recursively merges dict2 into dict1 """
if not isinstance(dict1, dict) or not isinstance(dict2, dict):
return dict2
for k in dict2:
if k in dict1:
dict1[k] = merge_dicts(dict1[k], dict2[k])
else:
dict1[k] = dict2[k]
return dict1
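Usage example (not part of the diff): nested keys are merged recursively rather than overwritten wholesale, which is what lets the k8s glue overlay its container settings onto a user-supplied pod template:
base = {'spec': {'containers': [], 'restartPolicy': 'Never'}}
override = {'spec': {'containers': ['agent']}, 'metadata': {'name': 'pod'}}
merge_dicts(base, override)
# -> {'spec': {'containers': ['agent'], 'restartPolicy': 'Never'}, 'metadata': {'name': 'pod'}}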

View File

@@ -200,24 +200,30 @@ class GPUStatCollection(object):
GPUStatCollection.global_processes[nv_process.pid] = \
psutil.Process(pid=nv_process.pid)
ps_process = GPUStatCollection.global_processes[nv_process.pid]
process['username'] = ps_process.username()
# cmdline returns full path;
# as in `ps -o comm`, get short cmdnames.
_cmdline = ps_process.cmdline()
if not _cmdline:
# sometimes, zombie or unknown (e.g. [kworker/8:2H])
process['command'] = '?'
process['full_command'] = ['?']
else:
process['command'] = os.path.basename(_cmdline[0])
process['full_command'] = _cmdline
# Bytes to MBytes
process['gpu_memory_usage'] = nv_process.usedGpuMemory // MB
process['cpu_percent'] = ps_process.cpu_percent()
process['cpu_memory_usage'] = \
round((ps_process.memory_percent() / 100.0) *
psutil.virtual_memory().total)
process['pid'] = nv_process.pid
# noinspection PyBroadException
try:
# we do not actually use these, so no point in collecting them
# process['username'] = ps_process.username()
# # cmdline returns full path;
# # as in `ps -o comm`, get short cmdnames.
# _cmdline = ps_process.cmdline()
# if not _cmdline:
# # sometimes, zombie or unknown (e.g. [kworker/8:2H])
# process['command'] = '?'
# process['full_command'] = ['?']
# else:
# process['command'] = os.path.basename(_cmdline[0])
# process['full_command'] = _cmdline
# process['cpu_percent'] = ps_process.cpu_percent()
# process['cpu_memory_usage'] = \
# round((ps_process.memory_percent() / 100.0) *
# psutil.virtual_memory().total)
# Bytes to MBytes
process['gpu_memory_usage'] = nv_process.usedGpuMemory // MB
except Exception:
# insufficient permissions
pass
return process
if not GPUStatCollection._gpu_device_info.get(index):
@@ -285,12 +291,13 @@ class GPUStatCollection(object):
# e.g. nvidia-smi reset or reboot the system
pass
# TODO: Do not block if full process info is not requested
time.sleep(0.1)
for process in processes:
pid = process['pid']
cache_process = GPUStatCollection.global_processes[pid]
process['cpu_percent'] = cache_process.cpu_percent()
# we do not actually use these, so no point in collecting them
# # TODO: Do not block if full process info is not requested
# time.sleep(0.1)
# for process in processes:
# pid = process['pid']
# cache_process = GPUStatCollection.global_processes[pid]
# process['cpu_percent'] = cache_process.cpu_percent()
index = N.nvmlDeviceGetIndex(handle)
gpu_info = {

View File

@@ -5,7 +5,7 @@ from contextlib import contextmanager
from typing import Text, Iterable, Union
import six
from trains_agent.helper.base import mkstemp, safe_remove_file, join_lines
from trains_agent.helper.base import mkstemp, safe_remove_file, join_lines, select_for_platform
from trains_agent.helper.process import Executable, Argv, PathLike
@@ -66,7 +66,20 @@ class PackageManager(object):
pass
def upgrade_pip(self):
return self._install("pip"+self.get_pip_version(), "--upgrade")
result = self._install(
select_for_platform(windows='"pip{}"', linux='pip{}').format(self.get_pip_version()), "--upgrade")
packages = self.run_with_env(('list',), output=True).splitlines()
# p.split is ('pip', 'x.y.z')
pip = [p.split() for p in packages if len(p.split()) == 2 and p.split()[0] == 'pip']
if pip:
# noinspection PyBroadException
try:
from .requirements import MarkerRequirement
pip = pip[0][1].split('.')
MarkerRequirement.pip_new_version = bool(int(pip[0]) >= 20)
except Exception:
pass
return result
def get_python_command(self, extra=()):
# type: (...) -> Executable
@@ -111,10 +124,12 @@ class PackageManager(object):
def out_of_scope_install_package(cls, package_name, *args):
if PackageManager._selected_manager is not None:
try:
return PackageManager._selected_manager._install(package_name, *args)
result = PackageManager._selected_manager._install(package_name, *args)
if result not in (0, None, True):
return False
except Exception:
pass
return
return False
return True
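With this change the call finally reports failure, so callers can react instead of silently continuing; a small sketch (the package name is illustrative):
if not PackageManager.out_of_scope_install_package('Cython', '--upgrade'):
    print('warning: failed installing Cython')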
@classmethod
def out_of_scope_freeze(cls):

View File

@@ -2,8 +2,9 @@ from __future__ import unicode_literals
import json
import re
import shutil
import os
import subprocess
from collections import OrderedDict
from distutils.spawn import find_executable
from functools import partial
from itertools import chain
@@ -14,17 +15,18 @@ import yaml
from time import time
from attr import attrs, attrib, Factory
from pathlib2 import Path
from requirements import parse
from requirements.requirement import Requirement
from trains_agent.external.requirements_parser import parse
from trains_agent.external.requirements_parser.requirement import Requirement
from trains_agent.errors import CommandFailedError
from trains_agent.helper.base import rm_tree, NonStrictAttrs, select_for_platform, is_windows_platform
from trains_agent.helper.base import rm_tree, NonStrictAttrs, select_for_platform, is_windows_platform, ExecutionInfo
from trains_agent.helper.process import Argv, Executable, DEVNULL, CommandSequence, PathLike
from trains_agent.helper.package.requirements import SimpleVersion
from trains_agent.session import Session
from .base import PackageManager
from .pip_api.venv import VirtualenvPip
from .requirements import RequirementsManager, MarkerRequirement
from ...backend_api.session.defs import ENV_CONDA_ENV_PACKAGE
package_normalize = partial(re.compile(r"""\[version=['"](.*)['"]\]""").sub, r"\1")
@@ -40,8 +42,8 @@ def _package_diff(path, packages):
class CondaPip(VirtualenvPip):
def __init__(self, source=None, *args, **kwargs):
super(CondaPip, self).__init__(*args, interpreter=Path(kwargs.get('path'), "python.exe") \
if is_windows_platform() and kwargs.get('path') else None, **kwargs)
super(CondaPip, self).__init__(*args, interpreter=Path(kwargs.get('path'), "python.exe")
if is_windows_platform() and kwargs.get('path') else None, **kwargs)
self.source = source
def run_with_env(self, command, output=False, **kwargs):
@@ -61,8 +63,8 @@ class CondaAPI(PackageManager):
MINIMUM_VERSION = "4.3.30"
def __init__(self, session, path, python, requirements_manager):
# type: (Session, PathLike, float, RequirementsManager) -> None
def __init__(self, session, path, python, requirements_manager, execution_info=None, **kwargs):
# type: (Session, PathLike, float, RequirementsManager, ExecutionInfo, Any) -> None
"""
:param python: base python version to use (e.g python3.6)
:param path: path of env
@@ -72,7 +74,15 @@ class CondaAPI(PackageManager):
self.source = None
self.requirements_manager = requirements_manager
self.path = path
self.env_read_only = False
self.extra_channels = self.session.config.get('agent.package_manager.conda_channels', [])
self.conda_env_as_base_docker = \
self.session.config.get('agent.package_manager.conda_env_as_base_docker', None) or \
bool(ENV_CONDA_ENV_PACKAGE.get())
if ENV_CONDA_ENV_PACKAGE.get():
self.conda_pre_build_env_path = ENV_CONDA_ENV_PACKAGE.get()
else:
self.conda_pre_build_env_path = execution_info.docker_cmd if execution_info else None
self.pip = CondaPip(
session=self.session,
source=self.source,
@@ -80,10 +90,15 @@ class CondaAPI(PackageManager):
requirements_manager=self.requirements_manager,
path=self.path,
)
self.conda = (
find_executable("conda")
or Argv(select_for_platform(windows="where", linux="which"), "conda").get_output(shell=True).strip()
)
try:
self.conda = (
find_executable("conda") or
Argv(select_for_platform(windows="where", linux="which"), "conda").get_output(
shell=select_for_platform(windows=True, linux=False)).strip()
)
except Exception:
raise ValueError("ERROR: package manager \"conda\" selected, "
"but \'conda\' executable could not be located")
try:
output = Argv(self.conda, "--version").get_output(stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as ex:
@@ -111,13 +126,58 @@ class CondaAPI(PackageManager):
def bin(self):
return self.pip.bin
# noinspection SpellCheckingInspection
def upgrade_pip(self):
return self._install("pip" + self.pip.get_pip_version())
# do not change pip version if a pre-built environment is used
if self.env_read_only:
print('Conda environment in read-only mode, skipping pip upgrade.')
return ''
return self._install(select_for_platform(windows='"pip{}"', linux='pip{}').format(self.pip.get_pip_version()))
def create(self):
"""
Create a new environment
"""
if self.conda_env_as_base_docker and self.conda_pre_build_env_path:
if Path(self.conda_pre_build_env_path).is_dir():
print("Using pre-existing Conda environment from {}".format(self.conda_pre_build_env_path))
self.path = Path(self.conda_pre_build_env_path)
self.source = ("conda", "activate", self.path.as_posix())
self.pip = CondaPip(
session=self.session,
source=self.source,
python=self.python,
requirements_manager=self.requirements_manager,
path=self.path,
)
conda_env = self._get_conda_sh()
self.source = self.pip.source = CommandSequence(('source', conda_env.as_posix()), self.source)
self.env_read_only = True
return self
elif Path(self.conda_pre_build_env_path).is_file():
print("Restoring Conda environment from {}".format(self.conda_pre_build_env_path))
tar_path = find_executable("tar")
self.path.mkdir(parents=True, exist_ok=True)
output = Argv(
tar_path,
"-xzf",
self.conda_pre_build_env_path,
"-C",
self.path,
).get_output()
self.source = self.pip.source = ("conda", "activate", self.path.as_posix())
conda_env = self._get_conda_sh()
self.source = self.pip.source = CommandSequence(('source', conda_env.as_posix()), self.source)
# unpack cleanup
print("Fixing prefix in Conda environment {}".format(self.path))
CommandSequence(('source', conda_env.as_posix()),
((self.path / 'bin' / 'conda-unpack').as_posix(), )).get_output()
return self
else:
raise ValueError("Could not restore Conda environment, cannot find {}".format(
self.conda_pre_build_env_path))
output = Argv(
self.conda,
"create",
@@ -133,13 +193,15 @@ class CondaAPI(PackageManager):
self.source = self.pip.source = (
tuple(match.group(1).split()) + (match.group(2),)
if match
else ("activate", self.path)
else ("conda", "activate", self.path.as_posix())
)
conda_env = Path(self.conda).parent.parent / 'etc' / 'profile.d' / 'conda.sh'
conda_env = self._get_conda_sh()
if conda_env.is_file() and not is_windows_platform():
self.source = self.pip.source = CommandSequence(('source', conda_env.as_posix()), self.source)
# install cuda toolkit
# noinspection PyBroadException
try:
cuda_version = float(int(self.session.config['agent.cuda_version'])) / 10.0
if cuda_version > 0:
@@ -181,6 +243,10 @@ class CondaAPI(PackageManager):
def _install(self, *args):
# type: (*PathLike) -> ()
# if we are in read only mode, do not install anything
if self.env_read_only:
print('Conda environment in read-only mode, skipping package install: {}'.format(args))
return
channels_args = tuple(
chain.from_iterable(("-c", channel) for channel in self.extra_channels)
)
@@ -208,6 +274,10 @@ class CondaAPI(PackageManager):
return self._install(*packages)
def uninstall_packages(self, *packages):
# if we are in read only mode, do not uninstall anything
if self.env_read_only:
print('Conda environment in read-only mode, skipping package uninstall: {}'.format(packages))
return ''
return self._run_command(("uninstall", "-p", self.path))
def install_from_file(self, path):
@@ -226,23 +296,158 @@ class CondaAPI(PackageManager):
with self.temp_file("pip_reqs", pip_packages) as reqs:
self.pip.install_from_file(reqs)
def freeze(self):
def freeze(self, freeze_full_environment=False):
requirements = self.pip.freeze()
req_lines = []
conda_lines = []
# noinspection PyBroadException
try:
conda_packages = json.loads(self._run_command((self.conda, "list", "--json", "-p", self.path), raw=True))
conda_packages_txt = []
requirements_pip = [r.split('==')[0].strip().lower() for r in requirements['pip']]
for pkg in conda_packages:
# skip if this is a pypi package or it is not a python package at all
if pkg['channel'] == 'pypi' or pkg['name'].lower() not in requirements_pip:
pip_lines = requirements['pip']
conda_packages_json = json.loads(
self._run_command((self.conda, "list", "--json", "-p", self.path), raw=True))
for r in conda_packages_json:
# check if this is a pypi package, if it is, leave it outside
if not r.get('channel') or r.get('channel') == 'pypi':
name = (r['name'].replace('-', '_'), r['name'])
pip_req_line = [l for l in pip_lines
if l.split('==', 1)[0].strip() in name or l.split('@', 1)[0].strip() in name]
if pip_req_line and \
('@' not in pip_req_line[0] or
not pip_req_line[0].split('@', 1)[1].strip().startswith('file://')):
req_lines.append(pip_req_line[0])
continue
req_lines.append(
'{}=={}'.format(name[1], r['version']) if r.get('version') else '{}'.format(name[1]))
continue
conda_packages_txt.append('{0}{1}{2}'.format(pkg['name'], '==', pkg['version']))
requirements['conda'] = conda_packages_txt
except:
# check if we have it in our required packages
name = r['name']
# hack: support the different pytorch/torch naming conventions
if name == 'pytorch':
name = 'torch'
# skip over packages with _
if name.startswith('_'):
continue
conda_lines.append('{}=={}'.format(name, r['version']) if r.get('version') else '{}'.format(name))
# make sure we see the conda packages, put them into the pip as well
if conda_lines:
req_lines = ['# Conda Packages', ''] + conda_lines + ['', '# pip Packages', ''] + req_lines
requirements['pip'] = req_lines
requirements['conda'] = conda_lines
except Exception:
pass
if freeze_full_environment:
# noinspection PyBroadException
try:
conda_env_json = json.loads(
self._run_command((self.conda, "env", "export", "--json", "-p", self.path), raw=True))
conda_env_json.pop('name', None)
conda_env_json.pop('prefix', None)
conda_env_json.pop('channels', None)
requirements['conda_env_json'] = json.dumps(conda_env_json)
except Exception:
pass
return requirements
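A sketch (not part of the diff; package names are illustrative) of the freeze() result shape after this change:
# {
#     'pip': ['# Conda Packages', '', 'numpy==1.19.2', '', '# pip Packages', '', 'requests==2.25.0'],
#     'conda': ['numpy==1.19.2'],
#     'conda_env_json': '{...}',   # only when freeze_full_environment=True
# }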
def _load_conda_full_env(self, conda_env_dict, requirements):
# noinspection PyBroadException
try:
cuda_version = int(self.session.config.get('agent.cuda_version', 0))
except Exception:
cuda_version = 0
conda_env_dict['channels'] = self.extra_channels
if 'dependencies' not in conda_env_dict:
conda_env_dict['dependencies'] = []
new_dependencies = OrderedDict()
pip_requirements = None
for line in conda_env_dict['dependencies']:
if isinstance(line, dict):
pip_requirements = line.pop('pip', None)
continue
name = line.strip().split('=', 1)[0].lower()
if name == 'pip':
continue
elif name == 'python':
line = 'python={}'.format('.'.join(line.split('=')[1].split('.')[:2]))
elif name == 'tensorflow-gpu' and cuda_version == 0:
line = 'tensorflow={}'.format(line.split('=')[1])
elif name == 'tensorflow' and cuda_version > 0:
line = 'tensorflow-gpu={}'.format(line.split('=')[1])
elif name in ('cupti', 'cudnn'):
# cudatoolkit should pull them based on the cudatoolkit version
continue
elif name.startswith('_'):
continue
new_dependencies[line.split('=', 1)[0].strip()] = line
# fix packages:
conda_env_dict['dependencies'] = list(new_dependencies.values())
with self.temp_file("conda_env", yaml.dump(conda_env_dict), suffix=".yml") as name:
print('Conda: Trying to install requirements:\n{}'.format(conda_env_dict['dependencies']))
result = self._run_command(
("env", "update", "-p", self.path, "--file", name)
)
# check if we need to remove specific packages
bad_req = self._parse_conda_result_bad_packges(result)
if bad_req:
print('failed installing the following conda packages: {}'.format(bad_req))
return False
if pip_requirements:
# create a list of vcs packages that we need to replace in the pip section
vcs_reqs = {}
if 'pip' in requirements:
pip_lines = requirements['pip'].splitlines() \
if isinstance(requirements['pip'], six.string_types) else requirements['pip']
for line in pip_lines:
try:
marker = list(parse(line))
except Exception:
marker = None
if not marker:
continue
m = MarkerRequirement(marker[0])
if m.vcs:
vcs_reqs[m.name] = m
try:
pip_req_str = [str(vcs_reqs.get(r.split('=', 1)[0], r)) for r in pip_requirements
if not r.startswith('pip=') and not r.startswith('virtualenv=')]
print('Conda: Installing requirements: step 2 - using pip:\n{}'.format(pip_req_str))
PackageManager._selected_manager = self.pip
self.pip.load_requirements({'pip': '\n'.join(pip_req_str)})
except Exception as e:
print(e)
raise e
finally:
PackageManager._selected_manager = self
self.requirements_manager.post_install(self.session)
def load_requirements(self, requirements):
# if we are in read only mode, do not uninstall anything
if self.env_read_only:
print('Conda environment in read-only mode, skipping requirements installation.')
return None
# if we have a full conda environment, use it and pass the pip to pip
if requirements.get('conda_env_json'):
# noinspection PyBroadException
try:
conda_env_json = json.loads(requirements.get('conda_env_json'))
print('Conda restoring full yaml environment')
return self._load_conda_full_env(conda_env_json, requirements)
except Exception:
print('Could not load fully stored conda environment, falling back to requirements')
# create new environment file
conda_env = dict()
conda_env['channels'] = self.extra_channels
@@ -276,6 +481,15 @@ class CondaAPI(PackageManager):
if m.vcs:
pip_requirements.append(m)
continue
# Skip over pip and virtualenv
if m.name in ('pip', 'virtualenv', ):
continue
# python version, only major.minor
if m.name == 'python' and m.specs:
m.specs = [(m.specs[0][0], '.'.join(m.specs[0][1].split('.')[:2])), ]
if '.' not in m.specs[0][1]:
continue
conda_supported_req_names.append(m.name.lower())
if m.req.name.lower() == 'matplotlib':
has_matplotlib = True
@@ -303,15 +517,20 @@ class CondaAPI(PackageManager):
continue
m = MarkerRequirement(marker[0])
# skip over local files (we cannot change the version to a local file)
if m.local_file:
continue
m_name = m.name.lower()
if m_name in conda_supported_req_names:
# this package is in the conda list,
# make sure that if we changed version and we match it in conda
conda_supported_req_names.remove(m_name)
## conda_supported_req_names.remove(m_name)
for cr in reqs:
if m_name == cr.name.lower():
if m_name.lower().replace('_', '-') == cr.name.lower().replace('_', '-'):
# match versions
cr.specs = m.specs
# # conda always likes "-" not "_" but only on pypi packages
# cr.name = cr.name.lower().replace('_', '-')
break
else:
# not in conda, it is a pip package
@@ -319,29 +538,39 @@ class CondaAPI(PackageManager):
if m_name == 'matplotlib':
has_matplotlib = True
# remove any leftover conda packages (they were removed from the pip list)
if conda_supported_req_names:
reqs = [r for r in reqs if r.name.lower() not in conda_supported_req_names]
# Conda requirements Hacks:
if has_matplotlib:
reqs.append(MarkerRequirement(Requirement.parse('graphviz')))
reqs.append(MarkerRequirement(Requirement.parse('python-graphviz')))
reqs.append(MarkerRequirement(Requirement.parse('kiwisolver')))
# remove specific cudatoolkit, it should have been preinstalled.
# allow overriding the default cudatoolkit, but not the derivative packages; cudatoolkit should pull them
reqs = [r for r in reqs if r.name not in ('cudnn', 'cupti')]
if has_torch and cuda_version == 0:
reqs.append(MarkerRequirement(Requirement.parse('cpuonly')))
# make sure we have no double entries
reqs = list(OrderedDict((r.name, r) for r in reqs).values())
# conform conda packages (version/name)
for r in reqs:
# change _ to - in name but not the prefix _ (as this is conda prefix)
if not r.name.startswith('_') and not requirements.get('conda', None):
r.name = r.name.replace('_', '-')
# remove .post from version numbers (it breaks "~=" version matching), and change == to ~=
if r.specs and r.specs[0]:
r.specs = [(r.specs[0][0].replace('==', '~='), r.specs[0][1].split('.post')[0])]
# conda always likes "-" not "_"
r.req.name = r.req.name.replace('_', '-')
while reqs:
# notice: we give conda more freedom in version selection, to help it choose the best combination
conda_env['dependencies'] = [r.tostr() for r in reqs]
def clean_ver(ar):
if not ar.specs:
return ar.tostr()
ar.specs = [(ar.specs[0][0], ar.specs[0][1] + '.0' if '.' not in ar.specs[0][1] else ar.specs[0][1])]
return ar.tostr()
conda_env['dependencies'] = [clean_ver(r) for r in reqs]
with self.temp_file("conda_env", yaml.dump(conda_env), suffix=".yml") as name:
print('Conda: Trying to install requirements:\n{}'.format(conda_env['dependencies']))
result = self._run_command(
@@ -371,14 +600,17 @@ class CondaAPI(PackageManager):
if pip_requirements:
try:
pip_req_str = [r.tostr() for r in pip_requirements]
pip_req_str = [r.tostr() for r in pip_requirements if r.name not in ('pip', 'virtualenv', )]
print('Conda: Installing requirements: step 2 - using pip:\n{}'.format(pip_req_str))
self.pip.load_requirements('\n'.join(pip_req_str))
PackageManager._selected_manager = self.pip
self.pip.load_requirements({'pip': '\n'.join(pip_req_str)})
except Exception as e:
print(e)
raise e
finally:
PackageManager._selected_manager = self
self.requirements_manager.post_install()
self.requirements_manager.post_install(self.session)
return True
def _parse_conda_result_bad_packges(self, result_dict):
@@ -441,8 +673,22 @@ class CondaAPI(PackageManager):
def get_python_command(self, extra=()):
return CommandSequence(self.source, self.pip.get_python_command(extra=extra))
def _get_conda_sh(self):
# type: () -> Path
base_conda_env = Path(self.conda).parent.parent / 'etc' / 'profile.d' / 'conda.sh'
if base_conda_env.is_file():
return base_conda_env
for path in os.environ.get('PATH', '').split(select_for_platform(windows=';', linux=':')):
conda = find_executable("conda", path=path)
if not conda:
continue
conda_env = Path(conda).parent.parent / 'etc' / 'profile.d' / 'conda.sh'
if conda_env.is_file():
return conda_env
return base_conda_env
# enable hashing with cmp=False because pdb fails on unhashable exceptions
# enable hashing with cmp=False because pdb fails on un-hashable exceptions
exception = attrs(str=True, cmp=False)

View File

@@ -1,25 +0,0 @@
from typing import Text
from .base import PackageManager
from .requirements import SimpleSubstitution
class CythonRequirement(SimpleSubstitution):
name = ("cython", "numpy", )
def __init__(self, *args, **kwargs):
super(CythonRequirement, self).__init__(*args, **kwargs)
def match(self, req):
# match both Cython & cython
return req.name and req.name.lower() in self.name
def replace(self, req):
"""
Replace a requirement
:raises: ValueError if version is pre-release
"""
# install Cython before
PackageManager.out_of_scope_install_package(str(req))
return Text(req)

View File

@@ -1,8 +1,10 @@
import re
from collections import OrderedDict
from typing import Text
from .base import PackageManager
from .requirements import SimpleSubstitution
from ..base import safe_furl as furl
class ExternalRequirements(SimpleSubstitution):
@@ -20,9 +22,11 @@ class ExternalRequirements(SimpleSubstitution):
return False
if not req.req or not req.req.line or not req.req.line.strip() or req.req.line.strip().startswith('#'):
return False
if req.pip_new_version and not (req.req.editable or req.req.vcs):
return False
return True
def post_install(self):
def post_install(self, session):
post_install_req = self.post_install_req
self.post_install_req = []
for req in post_install_req:
@@ -30,15 +34,50 @@ class ExternalRequirements(SimpleSubstitution):
freeze_base = PackageManager.out_of_scope_freeze() or ''
except:
freeze_base = ''
PackageManager.out_of_scope_install_package(req.tostr(markers=False), "--no-deps")
try:
freeze_post = PackageManager.out_of_scope_freeze() or ''
package_name = list(set(freeze_post['pip']) - set(freeze_base['pip']))
if package_name and package_name[0] not in self.post_install_req_lookup:
self.post_install_req_lookup[package_name[0]] = req.req.line
except:
pass
PackageManager.out_of_scope_install_package(req.tostr(markers=False), "--ignore-installed")
req_line = req.tostr(markers=False)
if req_line.strip().startswith('-e ') or req_line.strip().startswith('--editable'):
req_line = re.sub(r'^(-e|--editable=?)\s*', '', req_line, count=1)
if req.req.vcs and req_line.startswith('git+'):
try:
url_no_frag = furl(req_line)
url_no_frag.set(fragment=None)
# reverse replace
fragment = req_line[::-1].replace(url_no_frag.url[::-1], '', 1)[::-1]
vcs_url = req_line[4:]
# reverse replace
vcs_url = vcs_url[::-1].replace(fragment[::-1], '', 1)[::-1]
from ..repo import Git
vcs = Git(session=session, url=vcs_url, location=None, revision=None)
vcs._set_ssh_url()
new_req_line = 'git+{}{}'.format(vcs.url_with_auth, fragment)
if new_req_line != req_line:
furl_line = furl(new_req_line)
print('Replacing original pip vcs \'{}\' with \'{}\''.format(
req_line,
furl_line.set(password='xxxxxx').tostr() if furl_line.password else new_req_line))
req_line = new_req_line
except Exception:
print('WARNING: Failed parsing pip git install, using original line {}'.format(req_line))
# if we have an older pip version, we have to make sure we replace back the package name with the
# git repository link. In new versions this is supported and we get "package @ git+https://..."
if not req.pip_new_version:
PackageManager.out_of_scope_install_package(req_line, "--no-deps")
# noinspection PyBroadException
try:
freeze_post = PackageManager.out_of_scope_freeze() or ''
package_name = list(set(freeze_post['pip']) - set(freeze_base['pip']))
if package_name and package_name[0] not in self.post_install_req_lookup:
self.post_install_req_lookup[package_name[0]] = req.req.line
except Exception:
pass
# no need to force reinstall; pip will always rebuild if the package comes from git
# and make sure the required packages are installed (if they are not it will install them)
if not PackageManager.out_of_scope_install_package(req_line):
raise ValueError("Failed installing GIT/HTTPs package \'{}\'".format(req_line))
def replace(self, req):
"""
@@ -51,10 +90,17 @@ class ExternalRequirements(SimpleSubstitution):
return Text('')
def replace_back(self, list_of_requirements):
if 'pip' in list_of_requirements:
original_requirements = list_of_requirements['pip']
list_of_requirements['pip'] = [r for r in original_requirements
if r not in self.post_install_req_lookup]
list_of_requirements['pip'] += [self.post_install_req_lookup.get(r, '')
for r in self.post_install_req_lookup.keys() if r in original_requirements]
if not list_of_requirements:
return list_of_requirements
for k in list_of_requirements:
# k is either pip/conda
if k not in ('pip', 'conda'):
continue
original_requirements = list_of_requirements[k]
list_of_requirements[k] = [r for r in original_requirements
if r not in self.post_install_req_lookup]
list_of_requirements[k] += [self.post_install_req_lookup.get(r, '')
for r in self.post_install_req_lookup.keys() if r in original_requirements]
return list_of_requirements

View File

@@ -1,32 +0,0 @@
from typing import Text
from .base import PackageManager
from .requirements import SimpleSubstitution
class HorovodRequirement(SimpleSubstitution):
name = "horovod"
def __init__(self, *args, **kwargs):
super(HorovodRequirement, self).__init__(*args, **kwargs)
self.post_install_req = None
def match(self, req):
# match both horovod
return req.name and self.name == req.name.lower()
def post_install(self):
if self.post_install_req:
PackageManager.out_of_scope_install_package(self.post_install_req.tostr(markers=False))
self.post_install_req = None
def replace(self, req):
"""
Replace a requirement
:raises: ValueError if version is pre-release
"""
# Store in post req install, and return nothing
self.post_install_req = req
# mark skip package, we will install it in post install hook
return Text('')

View File

@@ -1,6 +1,8 @@
from typing import Any
from pathlib2 import Path
from trains_agent.helper.base import select_for_platform, rm_tree
from trains_agent.helper.base import select_for_platform, rm_tree, ExecutionInfo
from trains_agent.helper.package.base import PackageManager
from trains_agent.helper.process import Argv, PathLike
from trains_agent.session import Session
@@ -9,8 +11,8 @@ from ..requirements import RequirementsManager
class VirtualenvPip(SystemPip, PackageManager):
def __init__(self, session, python, requirements_manager, path, interpreter=None):
# type: (Session, float, RequirementsManager, PathLike, PathLike) -> ()
def __init__(self, session, python, requirements_manager, path, interpreter=None, execution_info=None, **kwargs):
# type: (Session, float, RequirementsManager, PathLike, PathLike, ExecutionInfo, Any) -> ()
"""
Program interface to virtualenv pip.
Must be given either path to virtualenv or source command.
@@ -37,7 +39,7 @@ class VirtualenvPip(SystemPip, PackageManager):
if isinstance(requirements, dict) and requirements.get("pip"):
requirements["pip"] = self.requirements_manager.replace(requirements["pip"])
super(VirtualenvPip, self).load_requirements(requirements)
self.requirements_manager.post_install()
self.requirements_manager.post_install(self.session)
def create_flags(self):
"""

View File

@@ -0,0 +1,48 @@
from typing import Text
from .base import PackageManager
from .requirements import SimpleSubstitution
class PostRequirement(SimpleSubstitution):
name = ("horovod", )
optional_package_names = tuple()
def __init__(self, *args, **kwargs):
super(PostRequirement, self).__init__(*args, **kwargs)
self.post_install_req = []
# check if we need to replace the packages:
post_packages = self.config.get('agent.package_manager.post_packages', None)
if post_packages:
self.__class__.name = post_packages
post_optional_packages = self.config.get('agent.package_manager.post_optional_packages', None)
if post_optional_packages:
self.__class__.optional_package_names = post_optional_packages
def match(self, req):
# match required or optional post-install packages
return req.name and (req.name.lower() in self.name or req.name.lower() in self.optional_package_names)
def post_install(self, session):
for req in self.post_install_req:
if req.name in self.optional_package_names:
# noinspection PyBroadException
try:
PackageManager.out_of_scope_install_package(req.tostr(markers=False))
except Exception:
pass
else:
PackageManager.out_of_scope_install_package(req.tostr(markers=False))
self.post_install_req = []
def replace(self, req):
"""
Replace a requirement
:raises: ValueError if version is pre-release
"""
# Store in post req install, and return nothing
self.post_install_req.append(req)
# mark skip package, we will install it in post install hook
return Text('')

View File

@@ -0,0 +1,75 @@
from typing import Text
from .base import PackageManager
from .requirements import SimpleSubstitution
class PriorityPackageRequirement(SimpleSubstitution):
name = ("cython", "numpy", "setuptools", )
optional_package_names = tuple()
def __init__(self, *args, **kwargs):
super(PriorityPackageRequirement, self).__init__(*args, **kwargs)
# check if we need to replace the packages:
priority_packages = self.config.get('agent.package_manager.priority_packages', None)
if priority_packages:
self.__class__.name = priority_packages
priority_optional_packages = self.config.get('agent.package_manager.priority_optional_packages', None)
if priority_optional_packages:
self.__class__.optional_package_names = priority_optional_packages
def match(self, req):
# match both Cython & cython
return req.name and (req.name.lower() in self.name or req.name.lower() in self.optional_package_names)
def replace(self, req):
"""
Replace a requirement
:raises: ValueError if version is pre-release
"""
if req.name in self.optional_package_names:
# noinspection PyBroadException
try:
if PackageManager.out_of_scope_install_package(str(req)):
return Text(req)
except Exception:
pass
return Text('')
PackageManager.out_of_scope_install_package(str(req))
return Text(req)
class PackageCollectorRequirement(SimpleSubstitution):
"""
This RequirementSubstitution class allows multiple entries of the same
package; only the last one (by order) is actually used.
"""
name = tuple()
def __init__(self, session, collect_package):
super(PackageCollectorRequirement, self).__init__(session)
self._collect_packages = collect_package or tuple()
self._last_req = None
def match(self, req):
# match package names
return req.name and req.name.lower() in self._collect_packages
def replace(self, req):
"""
Replace a requirement
:raises: ValueError if version is pre-release
"""
self._last_req = req.clone()
return ''
def post_scan_add_req(self):
"""
Allows the RequirementSubstitution to add an extra line/requirements after
the initial requirements scan is completed.
Called only once per requirements.txt object
"""
last_req = self._last_req
self._last_req = None
return last_req
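A sketch (hypothetical session and requirement objects) of the collector in action, matching the "multiple entries of the same package" commit:
collector = PackageCollectorRequirement(session, ('torch',))
collector.replace(torch_150_req)        # swallowed, returns '' and remembers a clone
collector.replace(torch_171_req)        # swallowed, replaces the remembered clone
final = collector.post_scan_add_req()   # -> clone of torch_171_req, appended after the scan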

View File

@@ -82,6 +82,8 @@ class SimplePytorchRequirement(SimpleSubstitution):
92: 'https://download.pytorch.org/whl/cu92/torch_stable.html',
100: 'https://download.pytorch.org/whl/cu100/torch_stable.html',
101: 'https://download.pytorch.org/whl/cu101/torch_stable.html',
102: 'https://download.pytorch.org/whl/cu102/torch_stable.html',
110: 'https://download.pytorch.org/whl/cu110/torch_stable.html',
}
def __init__(self, *args, **kwargs):
@@ -117,20 +119,24 @@ class SimplePytorchRequirement(SimpleSubstitution):
@classmethod
def get_torch_page(cls, cuda_version, nightly=False):
# noinspection PyBroadException
try:
cuda = int(cuda_version)
except:
except Exception:
cuda = 0
if nightly:
# then try the nightly builds, it might be there...
torch_url = cls.nightly_page_lookup_template.format(cuda)
try:
if requests.get(torch_url, timeout=10).ok:
cls.torch_page_lookup[cuda] = torch_url
return cls.torch_page_lookup[cuda], cuda
except Exception:
pass
for c in range(cuda, max(-1, cuda-15), -1):
# then try the nightly builds, it might be there...
torch_url = cls.nightly_page_lookup_template.format(c)
# noinspection PyBroadException
try:
if requests.get(torch_url, timeout=10).ok:
print('Torch nightly CUDA {} download page found'.format(c))
cls.torch_page_lookup[c] = torch_url
return cls.torch_page_lookup[c], c
except Exception:
pass
return
# first check if key is valid
@@ -138,13 +144,16 @@ class SimplePytorchRequirement(SimpleSubstitution):
return cls.torch_page_lookup[cuda], cuda
# then try a new cuda version page
torch_url = cls.page_lookup_template.format(cuda)
try:
if requests.get(torch_url, timeout=10).ok:
cls.torch_page_lookup[cuda] = torch_url
return cls.torch_page_lookup[cuda], cuda
except Exception:
pass
for c in range(cuda, max(-1, cuda-15), -1):
torch_url = cls.page_lookup_template.format(c)
# noinspection PyBroadException
try:
if requests.get(torch_url, timeout=10).ok:
print('Torch CUDA {} download page found'.format(c))
cls.torch_page_lookup[c] = torch_url
return cls.torch_page_lookup[c], c
except Exception:
pass
keys = sorted(cls.torch_page_lookup.keys(), reverse=True)
for k in keys:
@@ -157,7 +166,7 @@ class SimplePytorchRequirement(SimpleSubstitution):
class PytorchRequirement(SimpleSubstitution):
name = "torch"
packages = ("torch", "torchvision", "torchaudio")
packages = ("torch", "torchvision", "torchaudio", "torchcsprng", "torchtext")
def __init__(self, *args, **kwargs):
os_name = kwargs.pop("os_override", None)
@@ -235,6 +244,7 @@ class PytorchRequirement(SimpleSubstitution):
py_ver = self.python_major_minor_str.replace('.', '')
url = None
last_v = None
closest_v = None
# search for our package
for l in links_parser.links:
parts = l.split('/')[-1].split('-')
@@ -244,73 +254,94 @@ class PytorchRequirement(SimpleSubstitution):
continue
# version (ignore +cpu +cu92 etc. + is %2B in the file link)
# version: ignore .postX suffix (treat as a regular version)
# noinspection PyBroadException
try:
v = str(parts[1].split('%')[0].split('+')[0])
except Exception:
continue
if len(parts) < 3 or not parts[2].endswith(py_ver):
continue
if len(parts) < 5 or platform_wheel not in parts[4]:
continue
# update the closest matched version (from above)
if not closest_v:
closest_v = v
elif SimpleVersion.compare_versions(
version_a=closest_v, op='>=', version_b=v, num_parts=3) and \
SimpleVersion.compare_versions(
version_a=v, op='>=', version_b=req.specs[0][1], num_parts=3):
closest_v = v
# check if this an actual match
if not req.compare_version(v) or \
(last_v and SimpleVersion.compare_versions(last_v, '>', v, ignore_sub_versions=False)):
continue
if not parts[2].endswith(py_ver):
continue
if platform_wheel not in parts[4]:
continue
url = '/'.join(torch_url.split('/')[:-1] + l.split('/'))
last_v = v
# if we found an exact match, use it
# noinspection PyBroadException
try:
if req.specs[0][0] == '==' and \
SimpleVersion.compare_versions(req.specs[0][1], '==', v, ignore_sub_versions=False):
break
except:
except Exception:
pass
return url
return url, last_v or closest_v
def get_url_for_platform(self, req):
# check if package is already installed with system packages
# noinspection PyBroadException
try:
if self.config.get("agent.package_manager.system_site_packages"):
if self.config.get("agent.package_manager.system_site_packages", None):
from pip._internal.commands.show import search_packages_info
installed_torch = list(search_packages_info([req.name]))
# notice the comparision order, the first part will make sure we have a valid installed package
if installed_torch[0]['version'] and req.compare_version(installed_torch[0]['version']):
# notice the comparison order, the first part will make sure we have a valid installed package
if installed_torch and installed_torch[0]['version'] and \
req.compare_version(installed_torch[0]['version']):
print('PyTorch: requested "{}" version {}, using pre-installed version {}'.format(
req.name, req.specs[0] if req.specs else 'unspecified', installed_torch[0]['version']))
# package already installed, do nothing
return str(req), True
except:
req.specs = [('==', str(installed_torch[0]['version']))]
return '{} {} {}'.format(req.name, req.specs[0][0], req.specs[0][1]), True
except Exception:
pass
# make sure we have a specific version to retrieve
if not req.specs:
req.specs = [('>', '0')]
# noinspection PyBroadException
try:
req.specs[0] = (req.specs[0][0], req.specs[0][1].split('+')[0])
except:
except Exception:
pass
op, version = req.specs[0]
# assert op == "=="
torch_url, torch_url_key = SimplePytorchRequirement.get_torch_page(self.cuda_version)
url = self._get_link_from_torch_page(req, torch_url)
if not url and self.config.get("agent.package_manager.torch_nightly"):
url, closest_matched_version = self._get_link_from_torch_page(req, torch_url)
if not url and self.config.get("agent.package_manager.torch_nightly", None):
torch_url, torch_url_key = SimplePytorchRequirement.get_torch_page(self.cuda_version, nightly=True)
url = self._get_link_from_torch_page(req, torch_url)
url, closest_matched_version = self._get_link_from_torch_page(req, torch_url)
# try one more time, with a lower cuda version (never fallback to CPU):
while not url and torch_url_key > 0:
previous_cuda_key = torch_url_key
print('Warning, could not locate PyTorch {} matching CUDA version {}, best candidate {}\n'.format(
req, previous_cuda_key, closest_matched_version))
url, closest_matched_version = self._get_link_from_torch_page(req, torch_url)
if url:
break
torch_url, torch_url_key = SimplePytorchRequirement.get_torch_page(int(torch_url_key)-1)
# never fallback to CPU
if torch_url_key < 1:
print('Warning! Could not locate PyTorch version {} matching CUDA version {}'.format(
req, previous_cuda_key))
raise ValueError('Could not locate PyTorch version {} matching CUDA version {}'.format(
req, self.cuda_version))
print('Warning! Could not locate PyTorch version {} matching CUDA version {}, trying CUDA version {}'.format(
req, previous_cuda_key, torch_url_key))
url = self._get_link_from_torch_page(req, torch_url)
print(
'Error! Could not locate PyTorch version {} matching CUDA version {}'.format(
req, previous_cuda_key))
raise ValueError(
'Could not locate PyTorch version {} matching CUDA version {}'.format(req, self.cuda_version))
else:
print('Trying PyTorch CUDA version {} support'.format(torch_url_key))
if not url:
url = PytorchWheel(
@@ -322,6 +353,8 @@ class PytorchRequirement(SimpleSubstitution):
if url:
# normalize url (sometimes we will get ../ which we should not)
url = '/'.join(url.split('/')[:3]) + urllib.parse.quote(str(furl(url).path.normalize()))
# print found
print('Found PyTorch version {} matching CUDA version {}'.format(req, torch_url_key))
self.log.debug("checking url: %s", url)
return url, requests.head(url, timeout=10).ok
@@ -457,7 +490,13 @@ class PytorchRequirement(SimpleSubstitution):
if req.req.name == parts[0]:
# support for pip >= 20.1
if '@' in line:
lines[i] = '{} # {}'.format(str(req), str(new_req))
# skip if we have nothing to add
if str(req).strip() != str(new_req).strip():
# if this is local file and use the version detection
if req.local_file:
lines[i] = '{}'.format(str(new_req))
else:
lines[i] = '{} # {}'.format(str(req), str(new_req))
else:
lines[i] = '{} # {}'.format(line, str(new_req))
break

View File

@@ -4,7 +4,7 @@ import operator
import os
import re
from abc import ABCMeta, abstractmethod
from copy import deepcopy
from copy import deepcopy, copy
from itertools import chain, starmap
from operator import itemgetter
from os import path
@@ -12,15 +12,15 @@ from typing import Text, List, Type, Optional, Tuple, Dict
from pathlib2 import Path
from pyhocon import ConfigTree
from requirements import parse
# noinspection PyPackageRequirements
from requirements.requirement import Requirement
import six
from trains_agent.definitions import PIP_EXTRA_INDICES
from trains_agent.helper.base import warning, is_conda, which, join_lines, is_windows_platform
from trains_agent.helper.process import Argv, PathLike
from trains_agent.session import Session, normalize_cuda_version
from trains_agent.external.requirements_parser import parse
from trains_agent.external.requirements_parser.requirement import Requirement
from .translator import RequirementsTranslator
@@ -35,6 +35,10 @@ class FatalSpecsResolutionError(Exception):
@six.python_2_unicode_compatible
class MarkerRequirement(object):
# if True, pip version is above 20.x and supports "package @ scheme://link"
# default is True
pip_new_version = True
def __init__(self, req): # type: (Requirement) -> None
self.req = req
@@ -54,7 +58,21 @@ class MarkerRequirement(object):
if self.specifier:
parts.append(self.format_specs())
elif self.vcs:
# leave the line as is, let pip handle it
if self.line:
return self.line
else:
# let's build the line manually
parts = [
self.uri,
'@{}'.format(self.revision) if self.revision else '',
'#subdirectory={}'.format(self.subdirectory) if self.subdirectory else ''
]
elif self.pip_new_version and self.uri and self.name and self.line and self.local_file:
# package @ file:///example.com/somewheel.whl
# leave the line as is, let pip handle it
return self.line
else:
parts = [self.uri]
@@ -63,6 +81,9 @@ class MarkerRequirement(object):
return ''.join(parts)
def clone(self):
return MarkerRequirement(copy(self.req))
__str__ = tostr
def __repr__(self):
@@ -128,7 +149,8 @@ class MarkerRequirement(object):
version = self.specs[0][1]
op = (op or self.specs[0][0]).strip()
return SimpleVersion.compare_versions(requested_version, op, version)
return SimpleVersion.compare_versions(
version_a=requested_version, op=op, version_b=version, num_parts=num_parts)
class SimpleVersion:
@@ -167,7 +189,7 @@ class SimpleVersion:
_regex = re.compile(r"^\s*" + VERSION_PATTERN + r"\s*$", re.VERBOSE | re.IGNORECASE)
@classmethod
def compare_versions(cls, version_a, op, version_b, ignore_sub_versions=True):
def compare_versions(cls, version_a, op, version_b, ignore_sub_versions=True, num_parts=3):
"""
Compare two versions based on the op operator
returns bool(version_a op version_b)
@@ -178,12 +200,12 @@ class SimpleVersion:
:param str version_b:
:param bool ignore_sub_versions: if true compare only major.minor.patch
(ignore a/b/rc/post/dev in the comparison)
:param int num_parts: number of parts to compare, split by . (dot)
:return bool: version_a op version_b
"""
if not version_b:
return True
num_parts = 3
if op == '~=':
num_parts = max(num_parts, 2)
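Illustration (not part of the diff), assuming the comparison is truncated to num_parts dot-separated fields:
SimpleVersion.compare_versions('1.7.1', '==', '1.7.0', num_parts=2)   # True: compares 1.7 == 1.7
SimpleVersion.compare_versions('1.7.1', '==', '1.7.0', num_parts=3)   # False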
@@ -316,7 +338,15 @@ class RequirementSubstitution(object):
"""
pass
def post_install(self):
def post_scan_add_req(self): # type: () -> Optional[MarkerRequirement]
"""
Allows the RequirementSubstitution to add an extra line/requirements after
the initial requirements scan is completed.
Called only once per requirements.txt object
"""
return None
def post_install(self, session):
pass
@classmethod
@@ -470,16 +500,28 @@ class RequirementsManager(object):
)
if not conda:
result = map(self.translator.translate, result)
result = list(result)
# add post scan add requirements call back
for h in self.handlers:
req = h.post_scan_add_req()
if req:
result.append(req.tostr())
return join_lines(result)
def post_install(self):
def post_install(self, session):
for h in self.handlers:
try:
h.post_install()
h.post_install(session)
except Exception as ex:
print('RequirementsManager handler {} raised exception: {}'.format(h, ex))
raise
def replace_back(self, requirements):
if self.translator:
requirements = self.translator.replace_back(requirements)
for h in self.handlers:
try:
requirements = h.replace_back(requirements)

View File

@@ -23,6 +23,7 @@ class RequirementsTranslator(object):
Path(self.cache_dir).mkdir(parents=True, exist_ok=True)
self.config = Config()
self.pip = SystemPip(interpreter=interpreter, session=self._session)
self._translate_back = {}
def download(self, url):
self.pip.download_package(url, cache_dir=self.cache_dir)
@@ -60,4 +61,30 @@ class RequirementsTranslator(object):
except Exception:
command.error('Could not download wheel name of "{}"'.format(line))
return line
self._translate_back[str(downloaded)] = line
return downloaded
def replace_back(self, requirements):
if not requirements:
return requirements
for k in requirements:
# k is either pip/conda
if k not in ('pip', 'conda'):
continue
original_requirements = requirements[k]
new_requirements = []
for line in original_requirements:
local_file = [d for d in self._translate_back.keys() if d in line]
if local_file:
local_file = local_file[0]
new_requirements.append(line.replace(local_file, self._translate_back[local_file]))
else:
new_requirements.append(line)
requirements[k] = new_requirements
return requirements
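A sketch (hypothetical URL and cache path) of the round trip this enables: a line rewritten to a locally cached wheel during install is mapped back to its original URL when the installed packages are reported:
line = 'https://files.example.com/wheels/mypkg-1.0-py3-none-any.whl'
downloaded = translator.translate(line)         # e.g. a local path inside the download cache
translator.replace_back({'pip': [downloaded]})
# -> {'pip': [line]}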

View File

@@ -11,6 +11,7 @@ from copy import deepcopy
from distutils.spawn import find_executable
from itertools import chain, repeat, islice
from os.path import devnull
from time import sleep
from typing import Union, Text, Sequence, Any, TypeVar, Callable
import psutil
@@ -41,6 +42,30 @@ def get_bash_output(cmd, strip=False, stderr=subprocess.STDOUT, stdin=False):
return output if not strip or not output else output.strip()
def terminate_process(pid, timeout=10.):
# noinspection PyBroadException
try:
proc = psutil.Process(pid)
proc.terminate()
cnt = 0
while proc.is_running() and cnt < timeout:
sleep(1.)
cnt += 1
proc.terminate()
cnt = 0
while proc.is_running() and cnt < timeout:
sleep(1.)
cnt += 1
proc.kill()
except Exception:
pass
# noinspection PyBroadException
try:
return not psutil.Process(pid).is_running()
except Exception:
return True
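Usage sketch (not part of the diff): terminate first, escalate to kill, and report whether the process is actually gone:
import subprocess
proc = subprocess.Popen(['sleep', '60'])
stopped = terminate_process(proc.pid, timeout=5.)
print('process stopped:', stopped)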
def kill_all_child_processes(pid=None):
# get current process if pid not provided
include_parent = True

View File

@@ -5,7 +5,7 @@ import subprocess
from distutils.spawn import find_executable
from hashlib import md5
from os import environ, getenv
from typing import Text, Sequence, Mapping, Iterable, TypeVar, Callable, Tuple
from typing import Text, Sequence, Mapping, Iterable, TypeVar, Callable, Tuple, Optional
import attr
from furl import furl
@@ -13,7 +13,7 @@ from pathlib2 import Path
import six
from trains_agent.definitions import ENV_AGENT_GIT_USER, ENV_AGENT_GIT_PASS
from trains_agent.definitions import ENV_AGENT_GIT_USER, ENV_AGENT_GIT_PASS, ENV_AGENT_GIT_HOST
from trains_agent.helper.console import ensure_text, ensure_binary
from trains_agent.errors import CommandFailedError
from trains_agent.helper.base import (
@@ -97,7 +97,7 @@ class VCS(object):
:param session: program session
:param url: repository url
:param location: (desired) clone location
:param: desired clone revision
:param revision: desired clone revision
"""
self.session = session
self.log = self.session.get_logger(
@@ -150,12 +150,23 @@ class VCS(object):
"""
Apply patch repository at `location`
"""
self.log.info("applying diff to %s", location)
self.log.info("applying diff to %s" % location)
for match in filter(
None, map(self.PATCH_ADDED_FILE_RE.match, patch_content.splitlines())
):
create_file_if_not_exists(normalize_path(location, match.group("path")))
# noinspection PyBroadException
try:
for match in filter(
None, map(self.PATCH_ADDED_FILE_RE.match, patch_content.splitlines())
):
file_path = None
# noinspection PyBroadException
try:
file_path = normalize_path(location, match.group("path"))
create_file_if_not_exists(file_path)
except Exception:
if file_path:
self.log.warning("failed creating file for git diff (%s)" % file_path)
except Exception:
pass
return_code, errors = self.call_with_stdin(
patch_content, *self.patch_base, cwd=location
@@ -208,7 +219,7 @@ class VCS(object):
)
@classmethod
def resolve_ssh_url(cls, url):
def replace_ssh_url(cls, url):
# type: (Text) -> Text
"""
Replace SSH URL with HTTPS URL when applicable
@@ -242,20 +253,58 @@ class VCS(object):
).url
return url
@classmethod
def replace_http_url(cls, url, port=None):
# type: (Text, Optional[int]) -> Text
"""
Replace HTTPS URL with SSH URL when applicable
"""
parsed_url = furl(url)
if parsed_url.scheme == "https":
parsed_url.scheme = "ssh"
parsed_url.username = "git"
parsed_url.password = None
# make sure there is no port in the final url (safe_furl support)
# the original port was an https port, and we do not know if there is a different ssh port,
# so we have to clear the original port specified (https) and use the default ssh scheme port.
parsed_url.port = port or None
url = parsed_url.url
return url
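Examples (not part of the diff, assuming furl omits the default port when serializing) of the rewrite, with and without an explicit ssh port:
VCS.replace_http_url('https://github.com/allegroai/trains-agent.git')
# -> 'ssh://git@github.com/allegroai/trains-agent.git'
VCS.replace_http_url('https://github.com/allegroai/trains-agent.git', port=2222)
# -> 'ssh://git@github.com:2222/allegroai/trains-agent.git'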
def _set_ssh_url(self):
"""
Replace instance URL with SSH substitution result and report to log.
According to ``man ssh-add``, ``SSH_AUTH_SOCK`` must be set in order for ``ssh-add`` to work.
"""
if self.session.config.get('agent.force_git_ssh_protocol', None) and self.url:
parsed_url = furl(self.url)
# only apply to a specific domain (if requested)
config_domain = \
ENV_AGENT_GIT_HOST.get() or self.session.config.get("agent.git_host", None)
if config_domain and config_domain != parsed_url.host:
return
if parsed_url.scheme == "https":
new_url = self.replace_http_url(
self.url, port=self.session.config.get('agent.force_git_ssh_port', None))
if new_url != self.url:
print("Using SSH credentials - replacing https url '{}' with ssh url '{}'".format(
self.url, new_url))
self.url = new_url
return
if not self.session.config.agent.translate_ssh:
return
ssh_agent_variable = "SSH_AUTH_SOCK"
if not getenv(ssh_agent_variable) and (
(ENV_AGENT_GIT_USER.get() or self.session.config.get('agent.git_user', None)) and
(ENV_AGENT_GIT_PASS.get() or self.session.config.get('agent.git_pass', None))
):
new_url = self.resolve_ssh_url(self.url)
# if we have git_user / git_pass replace ssh credentials with https authentication
if (ENV_AGENT_GIT_USER.get() or self.session.config.get('agent.git_user', None)) and \
(ENV_AGENT_GIT_PASS.get() or self.session.config.get('agent.git_pass', None)):
# only apply to a specific domain (if requested)
config_domain = \
ENV_AGENT_GIT_HOST.get() or self.session.config.get("git_host", None)
if config_domain and config_domain != furl(self.url).host:
return
new_url = self.replace_ssh_url(self.url)
if new_url != self.url:
print("Using user/pass credentials - replacing ssh url '{}' with https url '{}'".format(
self.url, new_url))
@@ -332,9 +381,10 @@ class VCS(object):
"""
Run command with `input_` as stdin
"""
input_ = input_.encode("latin1")
if not input_.endswith(b"\n"):
input_ += b"\n"
input_ = input_.encode("utf-8")
# always add extra empty line
# (there is no downside, and it fixes the empty line at the end of a patch causing a corrupt diff apply message)
input_ += b"\n"
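# Note (not part of the diff): latin-1 cannot encode characters above U+00FF,
# so encoding the patch as UTF-8 keeps non-ASCII content in git diffs intact.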
process = self._call_subprocess(
subprocess.Popen,
argv,
@@ -396,15 +446,20 @@ class VCS(object):
Add username and password to URL if missing from URL and present in config.
Does not modify ssh URLs.
"""
parsed_url = furl(url)
try:
parsed_url = furl(url)
except ValueError:
return url
if parsed_url.scheme in ["", "ssh"] or parsed_url.scheme.startswith("git"):
return parsed_url.url
config_user = ENV_AGENT_GIT_USER.get() or config.get("agent.{}_user".format(cls.executable_name), None)
config_pass = ENV_AGENT_GIT_PASS.get() or config.get("agent.{}_pass".format(cls.executable_name), None)
config_domain = ENV_AGENT_GIT_HOST.get() or config.get("agent.{}_host".format(cls.executable_name), None)
if (
(not (parsed_url.username and parsed_url.password))
and config_user
and config_pass
and (not config_domain or config_domain.lower() == parsed_url.host)
):
parsed_url.set(username=config_user, password=config_pass)
return parsed_url.url
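# Illustration (not part of the diff): with agent.git_user/agent.git_pass set
# (and agent.git_host unset or equal to the URL host), the method above turns
# "https://github.com/allegroai/trains-agent.git" into
# "https://<git_user>:<git_pass>@github.com/allegroai/trains-agent.git";
# ssh:// and scheme-less URLs are returned unchanged.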
@@ -479,7 +534,7 @@ class Git(VCS):
root=Argv(executable_name, "rev-parse", "--show-toplevel"),
)
patch_base = ("apply",)
patch_base = ("apply", "--unidiff-zero", )
class Hg(VCS):

View File

@@ -0,0 +1,170 @@
import re
from datetime import datetime, timedelta
from typing import List, Tuple, Optional
from trains_agent.backend_config.defs import UptimeConf
DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]
PATTERN = re.compile(r"^(?P<hours>[^\s]+)\s(?P<days>[^\s]+)")
def check_runtime(ranges_list, is_uptime=True):
# type: (List[str], bool) -> bool
for entry in ranges_list:
days_list = get_days_list(entry)
if not check_day(days_list):
continue
hours_list = get_hours_list(entry)
if check_hour(hours_list):
return is_uptime
return not is_uptime
def check_hour(hours):
# type: (List[str]) -> bool
return datetime.now().hour in hours
def check_day(days):
# type: (List[str]) -> bool
return datetime.now().strftime("%a").upper() in days
def get_days_list(entry):
# type: (str) -> List[str]
days_intervals = PATTERN.match(entry)["days"].split(",")
days_total = []
for days in days_intervals:
start, end = days.split("-") if "-" in days else (days, days)
try:
days_total.extend(
[*range(DAYS.index(start.upper()), DAYS.index(end.upper()) + 1)]
)
except ValueError:
print(
"Warning: days interval '{}' is invalid, use intervals of the format <start>-<end>."
" make sure to use the abbreviated format SUN-SAT".format(days)
)
continue
return [DAYS[day] for day in days_total]
def get_hours_list(entry):
# type: (str) -> List[str]
hours_intervals = PATTERN.match(entry)["hours"].split(",")
hours_total = []
for hours in hours_intervals:
start, end = get_start_end_hours(hours)
if start is None or end is None:
continue
hours_total.extend([*range(start, end)])
return hours_total
def get_start_end_hours(hours):
# type: (str) -> Tuple[int, int]
try:
start, end = (
tuple(map(int, hours.split("-"))) if "-" in hours else (int(hours), 24)
)
except Exception as ex:
print(
"Warning: hours interval '{}' is invalid, use intervals of the format <start>-<end>".format(
hours, ex
)
)
start, end = (None, None)
if end == 0:
end = 24
return start, end
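# Illustration (not part of the file): get_start_end_hours("17-20") -> (17, 20),
# get_start_end_hours("17") -> (17, 24), get_start_end_hours("9-0") -> (9, 24);
# malformed intervals print a warning and return (None, None).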
def print_uptime_properties(
ranges_list, queues_info, runtime_properties, is_uptime=True
):
# type: (List[str], List[dict], List[dict], bool) -> None
if ranges_list:
uptime_string = ["Working hours {} configurations".format("uptime" if is_uptime else "downtime")]
uptime_string.extend(get_uptime_string(entry) for entry in ranges_list)
else:
uptime_string = ["No uptime/downtime configurations found"]
is_server_forced, server_string = get_runtime_properties_string(runtime_properties)
is_queue_forced, queues_string = get_queues_tags_string(queues_info)
res = list(
filter(
len,
[
"\n".join(uptime_string),
"\nCurrently forced {}".format(is_queue_forced or is_server_forced)
if is_queue_forced or is_server_forced
else " ",
server_string,
queues_string,
],
)
)
print("\n".join(res))
def get_uptime_string(entry):
# type: (str) -> str
res = []
days_list = get_days_list(entry)
hours_intervals = PATTERN.match(entry)["hours"].split(",")
for hours in hours_intervals:
start, end = get_start_end_hours(hours)
if start is None or end is None or start == end:
continue
res.append(
" - {}:00-{}:59 on {}".format(start, end - 1, ' and '.join(days_list))
)
return "\n".join(res)
def get_runtime_properties_string(runtime_properties):
# type: (List[dict]) -> Tuple[Optional[str], str]
server_string = []
force_flag = next(
(prop for prop in runtime_properties if prop["key"] == UptimeConf.worker_key),
None,
)
is_server_forced = None
if force_flag:
is_server_forced = force_flag["value"].upper()
expiry_hour = (
(datetime.now() + timedelta(seconds=force_flag["expiry"])).strftime("%H:%M")
if force_flag["expiry"]
else None
)
expires = " expires at {}".format(expiry_hour) if expiry_hour else ""
server_string.append(
" - Server runtime property '{}: {}'{}".format(force_flag['key'], force_flag['value'], expires)
)
return is_server_forced, "\n".join(server_string)
def get_queues_tags_string(queues_info):
# type: (List[dict]) -> Tuple[Optional[str], str]
queues_string = []
is_queue_forced = None
for queue in queues_info:
if "force_workers:off" in queue.get("tags", []):
tag = "force_workers:off"
is_queue_forced = is_queue_forced or "OFF"
elif "force_workers:on" in queue.get("tags", []):
tag = "force_workers:on"
is_queue_forced = "ON"
else:
tag = None
tagged = " (tagged '{}')'".format(tag) if tag else ""
queues_string.append(
" - Listening to queue '{}'{}".format(queue.get('name', ''), tagged)
)
return is_queue_forced, "\n".join(queues_string)
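A minimal usage sketch of the new schedule helpers (the file's path is not shown in this view, so the import below is an assumption):
from trains_agent.helper.uptime import check_runtime  # assumed module path
# "17-20 TUE,WED" reads as: hours 17-20 (i.e. 17:00-19:59) on Tuesday and Wednesday
print(check_runtime(["17-20 TUE,WED"], is_uptime=True))   # True only inside that window
print(check_runtime(["17-20 TUE,WED"], is_uptime=False))  # downtime semantics: False inside, True outside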

View File

@@ -4,6 +4,8 @@ from time import sleep
from glob import glob
from tempfile import gettempdir, NamedTemporaryFile
from typing import List, Tuple, Optional
from trains_agent.definitions import ENV_DOCKER_HOST_MOUNT
from trains_agent.helper.base import warning
@@ -37,6 +39,10 @@ class Singleton(object):
except:
pass
@classmethod
def get_lock_filename(cls):
return os.path.join(cls._get_temp_folder(), cls._lock_file_name)
@classmethod
def register_instance(cls, unique_worker_id=None, worker_name=None, api_client=None, allow_double=False):
"""
@@ -47,7 +53,7 @@ class Singleton(object):
:return: (str worker_id, int slot_number) Returns None if an instance is already running
"""
# try to lock file
lock_file = os.path.join(cls._get_temp_folder(), cls._lock_file_name)
lock_file = cls.get_lock_filename()
timeout = 0
while os.path.exists(lock_file):
if timeout > cls._lock_timeout:
@@ -79,30 +85,42 @@ class Singleton(object):
return ret
@classmethod
def _register_instance(cls, unique_worker_id=None, worker_name=None, api_client=None, allow_double=False):
if cls.worker_id:
return cls.worker_id, cls.instance_slot
# make sure we have a unique name
instance_num = 0
def get_running_pids(cls):
# type: () -> List[Tuple[int, Optional[str], Optional[int], str]]
temp_folder = cls._get_temp_folder()
files = glob(os.path.join(temp_folder, cls.prefix + cls.sep + '*' + cls.ext))
slots = {}
pids = []
for file in files:
parts = file.split(cls.sep)
parts = os.path.basename(file).split(cls.sep)
# noinspection PyBroadException
try:
pid = int(parts[1])
if not psutil.pid_exists(pid):
pid = -1
except Exception:
# something is wrong - mark with a non-existing pid so the stale file gets deleted later
pid = -1
uid, slot = None, None
# noinspection PyBroadException
try:
with open(file, 'r') as f:
uid, slot = str(f.read()).split('\n')
slot = int(slot)
except Exception:
pass
pids.append((pid, uid, slot, file))
return pids
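# Note (not part of the diff): each tuple is (pid, uid, slot, file); pid == -1
# marks a dead or unreadable lock file, which _register_instance removes below.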
@classmethod
def _register_instance(cls, unique_worker_id=None, worker_name=None, api_client=None, allow_double=False):
if cls.worker_id:
return cls.worker_id, cls.instance_slot
# make sure we have a unique name
instance_num = 0
slots = {}
for pid, uid, slot, file in cls.get_running_pids():
worker = None
if api_client and ENV_DOCKER_HOST_MOUNT.get() and uid:
try:
@@ -111,7 +129,7 @@ class Singleton(object):
worker = None
# count active instances and delete dead files
if not worker and not psutil.pid_exists(pid):
if not worker and pid < 0:
# delete the file
try:
os.remove(file)
@@ -165,3 +183,9 @@ class Singleton(object):
@classmethod
def get_slot(cls):
return cls.instance_slot or 0
@classmethod
def get_pid_file(cls):
if not cls._pid_file:
return None
return cls._pid_file.name

View File

@@ -68,6 +68,10 @@ DAEMON_ARGS = dict({
'dest': 'queues',
'type': foreign_object_id('queues'),
},
'--order-fairness': {
'help': 'Pull from each queue in a round-robin order, instead of priority order.',
'action': 'store_true',
},
'--standalone-mode': {
'help': 'Do not use any network connections, assume everything is pre-installed',
'action': 'store_true',
@@ -85,7 +89,28 @@ DAEMON_ARGS = dict({
'action': 'store_true',
'aliases': ['-d'],
},
'--stop': {
'help': 'Stop the running agent (based on the same set of arguments)',
'action': 'store_true',
},
'--uptime': {
'help': 'Specify uptime for trains-agent in "<hours> <days>" format. For example, use "17-20 TUE" to set '
'Tuesday\'s uptime to 17-20. '
'Note: Make sure to have only one of the uptime/downtime configurations, not both.',
'nargs': '*',
'default': None,
},
'--downtime': {
'help': 'Specify downtime for trains-agent in "<hours> <days>" format. For example, use "09-13 TUE" to set '
'Tuesday\'s downtime to 09-13. '
'Note: Make sure to have only one of the uptime/downtime configurations, not both.',
'nargs': '*',
'default': None,
},
'--status': {
'help': 'Print the worker\'s schedule (uptime properties, server\'s runtime properties and listening queues)',
'action': 'store_true',
},
}, **WORKER_ARGS)
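For example (a sketch; flag names outside this hunk are assumed from the trains-agent CLI):
trains-agent daemon --queue default --uptime "09-18 SUN-THU"
trains-agent daemon --queue default --status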
COMMANDS = {

View File

@@ -73,9 +73,11 @@ class Session(_Session):
os.environ[LOCAL_CONFIG_FILE_OVERRIDE_VAR] = config_file
if not Path(config_file).is_file():
raise ValueError("Could not open configuration file: {}".format(config_file))
cpu_only = kwargs.get('cpu_only')
if cpu_only:
os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = 'none'
if kwargs.get('gpus') and not os.environ.get('KUBERNETES_SERVICE_HOST') \
and not os.environ.get('KUBERNETES_PORT'):
# CUDA_VISIBLE_DEVICES does not support 'all'
@@ -84,6 +86,7 @@ class Session(_Session):
os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
else:
os.environ['CUDA_VISIBLE_DEVICES'] = os.environ['NVIDIA_VISIBLE_DEVICES'] = kwargs.get('gpus')
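# Illustration (assumption: the lines elided between the two hunks branch on
# gpus == 'all'): --cpu-only sets CUDA_VISIBLE_DEVICES=NVIDIA_VISIBLE_DEVICES=none,
# --gpus all outside k8s sets only NVIDIA_VISIBLE_DEVICES (CUDA_VISIBLE_DEVICES
# has no 'all' value), and --gpus 0,1 sets both variables to "0,1".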
if kwargs.get('only_load_config'):
from trains_agent.backend_api.config import load
self.config = load()

View File

@@ -1 +1 @@
__version__ = '0.15.0'
__version__ = '0.16.3'