mirror of
https://github.com/clearml/clearml
synced 2025-06-26 18:16:07 +00:00
clearml initial version 0.17.0
This commit is contained in:
245
README.md
@@ -1,143 +1,134 @@
# Allegro Trains - new name is coming soon ;)

## Auto-Magical Experiment Manager, Version Control and ML-Ops for AI

<div align="center">

## :confetti_ball: Now with Full ML/DL DevOps - See [TRAINS AGENT](https://github.com/allegroai/trains-agent) and [Services](https://github.com/allegroai/trains-server#trains-agent-services--)

## :station: [Documentation is here!](https://allegro.ai/docs) `wubba lubba dub dub` and a [Slack Channel](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY) :train2:

## Features: [AWS autoscaler wizard](https://allegro.ai/docs/examples/services/aws_autoscaler/aws_autoscaler/) :robot: [Hyper-Parameter Optimization](https://allegro.ai/docs/examples/optimization/hyper-parameter-optimization/examples_hyperparam_opt/) and :electric_plug: [Pipeline Controllers](https://allegro.ai/docs/examples/pipeline/pipeline_controller/)

<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/trains/blob/master/docs/clearml-logo.svg?raw=true" width="250px"></a>

"Because it’s a jungle out there"

**ClearML - Auto-Magical Suite of tools to streamline your ML workflow
Experiment Manager, ML-Ops and Data-Management**

[![GitHub license](https://img.shields.io/github/license/allegroai/clearml.svg)](https://img.shields.io/github/license/allegroai/clearml.svg)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/clearml.svg)](https://img.shields.io/pypi/pyversions/clearml.svg)
[![PyPI version](https://img.shields.io/pypi/v/clearml.svg)](https://img.shields.io/pypi/v/clearml.svg)
[![PyPI status](https://img.shields.io/pypi/status/clearml.svg)](https://pypi.python.org/pypi/clearml/)
[](https://optuna.org)
[](https://join.slack.com/t/allegroai-trains/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg)

### :point_right: Help improve Trains by filling our 2-min [user survey](https://allegro.ai/lp/trains-user-survey/)

</div>

---
### ClearML

#### *Formerly known as Allegro Trains*

ClearML is an ML/DL development and production suite containing three main modules:

- [Experiment Manager](#clearml-experiment-management) - Automagical experiment tracking, environments and results
- [ML-Ops](https://github.com/allegroai/trains-agent) - Automation, Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)
- [Data-Management](https://github.com/allegroai/clearml/doc/clearml-data.md) - Fully differentiable data management & version control solution on top of object-storage (S3/GS/Azure/NAS)

Instrumenting these components is the **ClearML-server**, see [Self-Hosting]() & [Free tier Hosting]()

---
<div align="center">

**[Signup](https://app.community.clear.ml) & [Start using](https://allegro.ai/clearml/docs/getting_started/getting_started/) in under 2 minutes**

</div>

---

<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/trains/blob/master/docs/webapp_screenshots.gif?raw=true" width="100%"></a>

## ClearML Experiment Manager

**Adding only 2 lines to your code gets you the following**

* Complete experiment setup log
    * Full source control info including non-committed local changes
    * Execution environment (including specific packages & versions)
    * Hyper-parameters
        * ArgParser for command line parameters with currently used values
        * Explicit parameters dictionary
        * Tensorflow Defines (absl-py)
        * Hydra configuration and overrides
    * Initial model weights file
* Full experiment output automatic capture
    * stdout and stderr
    * Resource Monitoring (CPU/GPU utilization, temperature, IO, network, etc.)
    * Model snapshots (with optional automatic upload to central storage: Shared folder, S3, GS, Azure, Http)
    * Artifacts log & store (Shared folder, S3, GS, Azure, Http)
    * Tensorboard/TensorboardX scalars, metrics, histograms, **images, audio and video samples**
    * [Matplotlib & Seaborn](https://github.com/allegroai/trains/tree/master/examples/frameworks/matplotlib)
    * [ClearML Explicit Logging](https://allegro.ai/clearml/docs/examples/reporting/) interface for complete flexibility
* Extensive platform support and integrations
    * Supported ML/DL frameworks: [PyTorch](https://github.com/allegroai/trains/tree/master/examples/frameworks/pytorch) (incl' ignite/lightning), [Tensorflow](https://github.com/allegroai/trains/tree/master/examples/frameworks/tensorflow), [Keras](https://github.com/allegroai/trains/tree/master/examples/frameworks/keras), [AutoKeras](https://github.com/allegroai/trains/tree/master/examples/frameworks/autokeras), [XGBoost](https://github.com/allegroai/trains/tree/master/examples/frameworks/xgboost) and [Scikit-Learn](https://github.com/allegroai/trains/tree/master/examples/frameworks/scikit-learn)
    * Seamless integration (including version control) with **Jupyter Notebook** and [*PyCharm* remote debugging](https://github.com/allegroai/trains-pycharm-plugin)

#### [Start using ClearML](https://allegro.ai/clearml/docs/getting_started/getting_started/)

```bash
pip install clearml
```

Add two lines to your code:
```python
from clearml import Task
task = Task.init(project_name='examples', task_name='hello world')
```

And you are done! Everything your process outputs is now automagically logged into ClearML.
<br>Next step, automation! **Learn more about ClearML's two-click automation [here]()**

## ClearML Architecture

The ClearML run-time components:

* The ClearML Python Package, for integrating ClearML into your existing scripts by adding just two lines of code, and for optionally extending your experiments and other workflows with ClearML's powerful and versatile set of classes and methods.
* The ClearML Server, storing experiment, model, and workflow data and supporting the Web UI experiment manager and ML-Ops automation for reproducibility and tuning. It is available as a hosted service, and as open source for you to deploy your own ClearML Server.
* The ClearML Agent, for ML-Ops orchestration, experiment and workflow reproducibility, and scalability.

<img src="https://allegro.ai/clearml/docs/img/ClearML_Architecture.png" width="100%" alt="clearml-architecture">

## Additional Modules

- [clearml-session](https://github.com/allegroai/clearml-session) - **Launch remote JupyterLab / VSCode-server inside any docker, on Cloud/On-Prem machines**
- [clearml-task](https://github.com/allegroai/clearml/doc/clearml-task.md) - Run any codebase on remote machines with full remote logging of Tensorboard, Matplotlib & Console outputs
- [clearml-data](https://github.com/allegroai/clearml/doc/clearml-data.md) - **CLI for managing and versioning your datasets, including creating / uploading / downloading of data from S3/GS/Azure/NAS**
- [AWS Auto-Scaler](examples/services/aws-autoscaler/aws_autoscaler.py) - Automatically spin up EC2 instances based on your workloads, with a preconfigured budget! No need for K8s!
- [Hyper-Parameter Optimization](examples/services/hyper-parameter-optimization/hyper_parameter_optimizer.py) - Optimize any code with a black-box approach and state-of-the-art Bayesian optimization algorithms
- [Automation Pipeline](examples/pipeline/pipeline_controller.py) - Build pipelines based on existing experiments / jobs; supports building pipelines of pipelines!
- [Slack Integration](examples/services/monitoring/slack_alerts.py) - Report experiments progress / failure directly to Slack (fully customizable!)
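The Hyper-Parameter Optimization service above treats your training code as a black box and searches over parameter ranges. As a rough conceptual sketch (a plain-Python random-search stand-in, not the ClearML `HyperParameterOptimizer` API; the objective function and parameter names are made up):

```python
import random

def objective(params):
    # Hypothetical black-box metric: lower is better.
    return (params["lr"] - 0.01) ** 2 + params["layers"]

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Sample from a uniform range and a discrete set, mirroring the
        # idea behind UniformParameterRange / DiscreteParameterRange.
        params = {
            "lr": rng.uniform(0.001, 0.1),
            "layers": rng.choice([1, 2, 4, 8]),
        }
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = random_search(100)
```

In the real service, evaluating a configuration means cloning and executing a base experiment with the sampled hyper-parameters rather than calling a local function.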
## Why ClearML?

ClearML is our solution to a problem we share with countless other researchers and developers in the machine learning/deep learning universe: Training production-grade deep learning models is a glorious but messy process.
ClearML tracks and controls the process by associating code version control, research projects, performance metrics, and model provenance.

We designed ClearML specifically to require effortless integration so that teams can preserve their existing methods and practices.

**We have a demo server up and running at [https://demoapp.trains.allegro.ai](https://demoapp.trains.allegro.ai).**

### :steam_locomotive: [Getting Started Tutorial](https://allegro.ai/blog/setting-up-allegro-ai-platform/) :rocket:

**You can try out Trains and [test your code](#integrate-trains), with no additional setup.**

<a href="https://demoapp.trains.allegro.ai"><img src="https://github.com/allegroai/trains/blob/master/docs/webapp_screenshots.gif?raw=true" width="100%"></a>

## Trains Automatically Logs Everything
**With only two lines of code, this is what you are getting:**

* Git repository, branch, commit id, entry point and local git diff
* Python environment (including specific packages & versions)
* stdout and stderr
* Resource Monitoring (CPU/GPU utilization, temperature, IO, network, etc.)
* Hyper-parameters
    * ArgParser for command line parameters with currently used values
    * Explicit parameters dictionary
    * Tensorflow Defines (absl-py)
* Initial model weights file
* Model snapshots (with optional automatic upload to central storage: Shared folder, S3, GS, Azure, Http)
* Artifacts log & store (Shared folder, S3, GS, Azure, Http)
* Tensorboard/TensorboardX scalars, metrics, histograms, **images, audio and video**
* [Matplotlib & Seaborn](https://github.com/allegroai/trains/tree/master/examples/frameworks/matplotlib)
* Supported frameworks: [PyTorch](https://github.com/allegroai/trains/tree/master/examples/frameworks/pytorch), [Tensorflow](https://github.com/allegroai/trains/tree/master/examples/frameworks/tensorflow), [Keras](https://github.com/allegroai/trains/tree/master/examples/frameworks/keras), [AutoKeras](https://github.com/allegroai/trains/tree/master/examples/frameworks/autokeras), [XGBoost](https://github.com/allegroai/trains/tree/master/examples/frameworks/xgboost) and [Scikit-Learn](https://github.com/allegroai/trains/tree/master/examples/frameworks/scikit-learn) (MxNet is coming soon)
* Seamless integration (including version control) with **Jupyter Notebook** and [*PyCharm* remote debugging](https://github.com/allegroai/trains-pycharm-plugin)

**Additionally, log data explicitly using [Trains Explicit Logging](https://allegro.ai/docs/examples/reporting/).**

## Using Trains <a name="using-trains"></a>

Trains is a two-part solution:

1. Trains [python package](https://pypi.org/project/trains/) auto-magically connects with your code

    **Trains requires only two lines of code for full integration.**

    To connect your code with Trains:

    - Install Trains <a name="integrate-trains"></a>

          pip install trains

    <details>
    <summary>Add optional cloud storage support (S3/GoogleStorage/Azure):</summary>

    ```bash
    pip install trains[s3]
    pip install trains[gs]
    pip install trains[azure]
    ```

    </details>

- Add the following lines to your code

        from trains import Task
        task = Task.init(project_name="my project", task_name="my task")

    * If project_name is not provided, the repository name will be used instead
    * If task_name (experiment) is not provided, the current filename will be used instead

- Run your code. When Trains connects to the server, a link is printed. For example:

        Trains Results page:
        https://demoapp.trains.allegro.ai/projects/76e5e2d45e914f52880621fe64601e85/experiments/241f06ae0f5c4b27b8ce8b64890ce152/output/log

- Open the link and view your experiment parameters, model and tensorboard metrics

**See examples [here](https://allegro.ai/docs/examples/examples_overview/)**

2. [Trains Server](https://github.com/allegroai/trains-server) for logging, querying, control and UI ([Web-App](https://github.com/allegroai/trains-web))

    **We already have a demo server up and running for you at [https://demoapp.trains.allegro.ai](https://demoapp.trains.allegro.ai).**

    **You can try out Trains without installing your own *trains-server*: just add the two lines of code, and your code will automatically connect to the Trains demo server.**

    *Note that the demo server resets every 24 hours and all of the logged data is deleted.*

    When you are ready to use your own Trains server, go ahead and [install *trains-server*](https://github.com/allegroai/trains-server).

    <img src="https://github.com/allegroai/trains/blob/master/docs/system_diagram.png?raw=true" width="50%">

## Configuring Your Own Trains server <a name="configuration"></a>

1. Install and run *trains-server* (see [Installing the Trains Server](https://github.com/allegroai/trains-server))

2. Run the initial configuration wizard for your Trains installation and follow the instructions to set up the Trains package
(http://**_trains-server-ip_**:__port__ and user credentials)

        trains-init

After installing and configuring, you can access your configuration file at `~/trains.conf`

Sample configuration file available [here](https://github.com/allegroai/trains/blob/master/docs/trains.conf).
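For orientation, a self-hosted `~/trains.conf` might look roughly like the following. This is an illustrative fragment only; the host names, ports, and credential values below are placeholders you would replace with your own server's details:

```
{
    # server endpoints; replace with your own trains-server address
    api_server: "http://trains-server-ip:8008"
    web_server: "http://trains-server-ip:8080"
    files_server: "http://trains-server-ip:8081"

    # credentials generated in the web UI profile page
    credentials {
        access_key: "generated-access-key"
        secret_key: "generated-secret-key"
    }
}
```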

- Use it on a daily basis to boost collaboration and visibility in your team
- Create a remote job from any experiment with a click of a button
- Automate processes and create pipelines to collect your experimentation logs, outputs, and data
- Store all your data on any object-storage solution, with the simplest interface possible
- Make your data transparent by cataloging it all on the ClearML platform

We believe ClearML is ground-breaking. We wish to establish new standards of true seamless integration between
experiment management, ML-Ops and data management.

## Who We Are

ClearML is supported by the team behind *allegro.ai*,
where we build deep learning pipelines and infrastructure for enterprise companies.

We built ClearML to track and control the glorious but messy process of training production-grade deep learning models.
We are committed to vigorously supporting and expanding the capabilities of ClearML.

## Why Are We Releasing Trains?

We believe Trains is ground-breaking. We wish to establish new standards of experiment management in
deep learning and ML. Only the greater community can help us do that.

We promise to always be backwardly compatible, making sure all your logs, data and pipelines
will always upgrade with you.

## License

@@ -145,19 +136,19 @@ Apache License, Version 2.0 (see the [LICENSE](https://www.apache.org/licenses/L

## Documentation, Community & Support

More information in the [official documentation](https://allegro.ai/clearml/docs) and [on YouTube](https://www.youtube.com/c/AllegroAI).

For examples and use cases, check the [examples folder](https://github.com/allegroai/trains/tree/master/examples) and [corresponding documentation](https://allegro.ai/clearml/docs/examples/examples_overview/).

If you have any questions: post on our [Slack Channel](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY), or tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/trains) with the '**trains**' tag.

For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains/issues).

Additionally, you can always find us at *clearml@allegro.ai*

## Contributing

See the ClearML [Guidelines for Contributing](https://github.com/allegroai/trains/blob/master/docs/contributing.md).

_May the force (and the goddess of learning rates) be with you!_

@@ -1,4 +1,4 @@
""" ClearML open SDK """

from .version import __version__
from .task import Task
@@ -6,5 +6,7 @@ from .model import InputModel, OutputModel, Model
from .logger import Logger
from .storage import StorageManager
from .errors import UsageError
from .datasets import Dataset

__all__ = ["__version__", "Task", "InputModel", "OutputModel", "Model", "Logger",
           "StorageManager", "UsageError", "Dataset"]

@@ -1,6 +1,7 @@
from .parameters import UniformParameterRange, DiscreteParameterRange, UniformIntegerParameterRange, ParameterSet
from .optimization import GridSearch, RandomSearch, HyperParameterOptimizer, Objective
from .job import TrainsJob
from .controller import PipelineController

__all__ = ["UniformParameterRange", "DiscreteParameterRange", "UniformIntegerParameterRange", "ParameterSet",
           "GridSearch", "RandomSearch", "HyperParameterOptimizer", "Objective", "TrainsJob", "PipelineController"]

@@ -102,15 +102,15 @@ class AutoScaler(object):

    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        """
        Creates a new worker for clearml (cloud-specific implementation).
        First, create an instance in the cloud and install some required packages.
        Then, define clearml-agent environment variables and run clearml-agent for the specified queue.
        NOTE: - Will wait until instance is running
              - This implementation assumes the instance image already has docker installed

        :param str resource: resource name, as defined in self.resource_configurations and self.queues.
        :param str worker_id_prefix: worker name prefix
        :param str queue_name: clearml queue to listen to
        """
        pass

@@ -137,17 +137,17 @@ class AutoScaler(object):
           minutes would be removed.
        """

        # Worker's id in clearml is composed of prefix, name, instance_type and cloud_id, separated by ':'
        workers_pattern = re.compile(
            r"^(?P<prefix>[^:]+):(?P<name>[^:]+):(?P<instance_type>[^:]+):(?P<cloud_id>[^:]+)"
)
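The worker-id pattern above can be exercised on its own (a standalone sketch; the worker id string below is made up):

```python
import re

# Same pattern as workers_pattern above: four ':'-separated fields
# captured as named groups.
workers_pattern = re.compile(
    r"^(?P<prefix>[^:]+):(?P<name>[^:]+):(?P<instance_type>[^:]+):(?P<cloud_id>[^:]+)"
)

match = workers_pattern.match("aws_asg:worker:m5.xlarge:i-0abc123")
fields = match.groupdict()
```

`fields["cloud_id"]` is what the autoscaler extracts to decide which cloud instance to spin down.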

        # Set up the environment variables for clearml
        os.environ["CLEARML_API_HOST"] = self.api_server
        os.environ["CLEARML_WEB_HOST"] = self.web_server
        os.environ["CLEARML_FILES_HOST"] = self.files_server
        os.environ["CLEARML_API_ACCESS_KEY"] = self.access_key
        os.environ["CLEARML_API_SECRET_KEY"] = self.secret_key
        api_client = APIClient()

        # Verify the requested queues exist and create those that don't exist
@@ -234,7 +234,7 @@ class AutoScaler(object):
            # skip resource types that might be needed
            if resources in required_idle_resources:
                continue
            # Remove from both aws and clearml all instances that are idle for longer than MAX_IDLE_TIME_MIN
            if time() - timestamp > self.max_idle_time_min * 60.0:
                cloud_id = workers_pattern.match(worker.id)["cloud_id"]
                self.spin_down_worker(cloud_id)

@@ -31,15 +31,15 @@ class AwsAutoScaler(AutoScaler):

    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        """
        Creates a new worker for clearml.
        First, create an instance in the cloud and install some required packages.
        Then, define clearml-agent environment variables and run clearml-agent for the specified queue.
        NOTE: - Will wait until instance is running
              - This implementation assumes the instance image already has docker installed

        :param str resource: resource name, as defined in BUDGET and QUEUES.
        :param str worker_id_prefix: worker name prefix
        :param str queue_name: clearml queue to listen to
        """
        resource_conf = self.resource_configurations[resource]
        # Add worker type and AWS instance type to the worker name.
@@ -50,7 +50,7 @@ class AwsAutoScaler(AutoScaler):
        )

        # user_data script will automatically run when the instance is started. it will install the required packages
        # for clearml-agent, configure it using environment variables and run clearml-agent on the required queue
        user_data = """#!/bin/bash
        sudo apt-get update
        sudo apt-get install -y python3-dev
@@ -60,22 +60,22 @@ class AwsAutoScaler(AutoScaler):
        sudo apt-get install -y build-essential
        python3 -m pip install -U pip
        python3 -m pip install virtualenv
        python3 -m virtualenv clearml_agent_venv
        source clearml_agent_venv/bin/activate
        python -m pip install clearml-agent
        echo 'agent.git_user=\"{git_user}\"' >> /root/clearml.conf
        echo 'agent.git_pass=\"{git_pass}\"' >> /root/clearml.conf
        echo "{clearml_conf}" >> /root/clearml.conf
        export CLEARML_API_HOST={api_server}
        export CLEARML_WEB_HOST={web_server}
        export CLEARML_FILES_HOST={files_server}
        export DYNAMIC_INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`
        export CLEARML_WORKER_ID={worker_id}:$DYNAMIC_INSTANCE_ID
        export CLEARML_API_ACCESS_KEY='{access_key}'
        export CLEARML_API_SECRET_KEY='{secret_key}'
        {bash_script}
        source ~/.bashrc
        python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker}
        shutdown
        """.format(
            api_server=self.api_server,

@@ -87,7 +87,7 @@ class AwsAutoScaler(AutoScaler):
            queue=queue_name,
            git_user=self.git_user or "",
            git_pass=self.git_pass or "",
            clearml_conf='\\"'.join(self.extra_trains_conf.split('"')),
            bash_script=self.extra_vm_bash_script,
            docker="--docker '{}'".format(self.default_docker_image)
            if self.default_docker_image

@@ -17,7 +17,7 @@ class PipelineController(object):
    """
    Pipeline controller.
    Pipeline is a DAG of base tasks; each task will be cloned (arguments changed as required), executed and monitored.
    The pipeline process (task) itself can be executed manually or by the clearml-agent services queue.
    Notice: The pipeline controller lives as long as the pipeline itself is being executed.
    """
_tag = 'pipeline'
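The "DAG of base tasks" idea can be sketched with a tiny stand-in scheduler (plain Python, not the ClearML API; the step names below are made up):

```python
from collections import deque

# Hypothetical pipeline: each step lists the steps it depends on.
steps = {
    "prepare_data": [],
    "train": ["prepare_data"],
    "evaluate": ["train"],
    "report": ["evaluate", "prepare_data"],
}

def execution_order(steps):
    # Kahn's algorithm: repeatedly "launch" steps whose parents finished.
    pending = {name: set(parents) for name, parents in steps.items()}
    ready = deque(sorted(n for n, p in pending.items() if not p))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for other, parents in pending.items():
            if name in parents:
                parents.discard(name)
                if not parents and other not in order and other not in ready:
                    ready.append(other)
    return order

order = execution_order(steps)
```

The real controller does the same dependency bookkeeping, except that "launching" a step means cloning its base task and enqueuing it for an agent.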

@@ -601,7 +601,7 @@ class PipelineController(object):
            print('Parameters:\n{}'.format(self._nodes[name].job.task_parameter_override))
            self._running_nodes.append(name)
        else:
            getLogger('clearml.automation.controller').error(
                'ERROR: Failed launching step \'{}\': {}'.format(name, self._nodes[name]))

        # update current state (in configuration, so that we could later continue an aborted pipeline)

@@ -8,7 +8,7 @@ from ..task import Task
from ..backend_api.services import tasks as tasks_service


logger = getLogger('clearml.automation.job')


class TrainsJob(object):

@@ -22,7 +22,7 @@ class Monitor(object):
        self._project_ids = None
        self._projects = None
        self._projects_refresh_timestamp = None
        self._clearml_apiclient = None

    def set_projects(self, project_names=None, project_names_re=None, project_ids=None):
        # type: (Optional[Sequence[str]], Optional[Sequence[str]], Optional[Sequence[str]]) -> ()
@@ -167,10 +167,10 @@ class Monitor(object):
    def _get_api_client(self):
        # type: () -> APIClient
        """
        Return an APIClient object to directly query the clearml-server

        :return: APIClient object
        """
        if not self._clearml_apiclient:
            self._clearml_apiclient = APIClient()
return self._clearml_apiclient
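The lazy-initialization pattern above (create the client on first use, then reuse it) can be isolated as a small sketch (plain Python; `make_client` is a stand-in for `APIClient`):

```python
class Monitor:
    def __init__(self, make_client):
        self._make_client = make_client
        self._client = None  # created on first use, then cached

    def get_client(self):
        # Build the client lazily so constructing a Monitor is cheap
        # and no server connection is attempted until it is needed.
        if self._client is None:
            self._client = self._make_client()
        return self._client

calls = []
m = Monitor(lambda: calls.append("init") or object())
first = m.get_client()
second = m.get_client()
```

Both calls return the same object, and the factory runs exactly once.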

@@ -15,7 +15,7 @@ from ..logger import Logger
from ..backend_api.services import workers as workers_service, tasks as tasks_services
from ..task import Task

logger = getLogger('clearml.automation.optimization')


try:

@@ -878,9 +878,9 @@ class HyperParameterOptimizer(object):
            :linenos:
            :caption: Example

            from clearml import Task
            from clearml.automation import UniformParameterRange, DiscreteParameterRange
            from clearml.automation import GridSearch, RandomSearch, HyperParameterOptimizer

            task = Task.init('examples', 'HyperParameterOptimizer example')
            an_optimizer = HyperParameterOptimizer(

@@ -8,7 +8,7 @@ from ..utilities.check_updates import Version


class ApiServiceProxy(object):
    _main_services_module = "clearml.backend_api.services"
    _available_versions = None

    def __init__(self, module):

@@ -1,16 +1,16 @@
{
    version: 1.5
    # default api_server: https://demoapi.clearml.allegro.ai
    api_server: ""
    # default web_server: https://demoapp.clearml.allegro.ai
    web_server: ""
    # default files_server: https://demofiles.clearml.allegro.ai
    files_server: ""

    # verify host ssl certificate, set to False only if you have a very good reason
    verify_certificate: True

    # default demoapi.clearml.allegro.ai credentials
    credentials {
        access_key: ""
        secret_key: ""

@@ -107,15 +107,15 @@ class StrictSession(Session):
            init()
            return

        original = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get() or None
        try:
            LOCAL_CONFIG_FILE_OVERRIDE_VAR.set(str(config_file))
            init()
        finally:
            if original is None:
                LOCAL_CONFIG_FILE_OVERRIDE_VAR.pop()
            else:
                LOCAL_CONFIG_FILE_OVERRIDE_VAR.set(original)

    def send(self, request, *args, **kwargs):
result = super(StrictSession, self).send(request, *args, **kwargs)
|
||||
@@ -560,4 +560,4 @@ class APIClient(object):
|
||||
for name, module in services.items()
|
||||
},
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
@@ -2,12 +2,14 @@ from ...backend_config import EnvEntry
from ...backend_config.converters import safe_text_to_bool


-ENV_HOST = EnvEntry("TRAINS_API_HOST", "ALG_API_HOST")
-ENV_WEB_HOST = EnvEntry("TRAINS_WEB_HOST", "ALG_WEB_HOST")
-ENV_FILES_HOST = EnvEntry("TRAINS_FILES_HOST", "ALG_FILES_HOST")
-ENV_ACCESS_KEY = EnvEntry("TRAINS_API_ACCESS_KEY", "ALG_API_ACCESS_KEY")
-ENV_SECRET_KEY = EnvEntry("TRAINS_API_SECRET_KEY", "ALG_API_SECRET_KEY")
-ENV_VERBOSE = EnvEntry("TRAINS_API_VERBOSE", "ALG_API_VERBOSE", type=bool, default=False)
-ENV_HOST_VERIFY_CERT = EnvEntry("TRAINS_API_HOST_VERIFY_CERT", "ALG_API_HOST_VERIFY_CERT", type=bool, default=True)
-ENV_OFFLINE_MODE = EnvEntry("TRAINS_OFFLINE_MODE", "ALG_OFFLINE_MODE", type=bool, converter=safe_text_to_bool)
-ENV_TRAINS_NO_DEFAULT_SERVER = EnvEntry("TRAINS_NO_DEFAULT_SERVER", "ALG_NO_DEFAULT_SERVER", type=bool, default=False)
+ENV_HOST = EnvEntry("CLEARML_API_HOST", "TRAINS_API_HOST")
+ENV_WEB_HOST = EnvEntry("CLEARML_WEB_HOST", "TRAINS_WEB_HOST")
+ENV_FILES_HOST = EnvEntry("CLEARML_FILES_HOST", "TRAINS_FILES_HOST")
+ENV_ACCESS_KEY = EnvEntry("CLEARML_API_ACCESS_KEY", "TRAINS_API_ACCESS_KEY")
+ENV_SECRET_KEY = EnvEntry("CLEARML_API_SECRET_KEY", "TRAINS_API_SECRET_KEY")
+ENV_VERBOSE = EnvEntry("CLEARML_API_VERBOSE", "TRAINS_API_VERBOSE", type=bool, default=False)
+ENV_HOST_VERIFY_CERT = EnvEntry("CLEARML_API_HOST_VERIFY_CERT", "TRAINS_API_HOST_VERIFY_CERT",
+                                type=bool, default=True)
+ENV_OFFLINE_MODE = EnvEntry("CLEARML_OFFLINE_MODE", "TRAINS_OFFLINE_MODE", type=bool, converter=safe_text_to_bool)
+ENV_TRAINS_NO_DEFAULT_SERVER = EnvEntry("CLEARML_NO_DEFAULT_SERVER", "TRAINS_NO_DEFAULT_SERVER",
+                                        type=bool, default=False)
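The renamed entries above put the new CLEARML_-prefixed variable first and keep the old TRAINS_ name as a fallback alias, so existing deployments keep working. A minimal sketch of that lookup order (`resolve_env` is a hypothetical helper, not the real `EnvEntry` API):

```python
import os

def resolve_env(*keys, default=None):
    # Return the first non-empty value among the given environment keys.
    # New CLEARML_* names are listed first, legacy TRAINS_* names after.
    for key in keys:
        value = os.environ.get(key, "").strip()
        if value:
            return value
    return default

# Legacy name is honored only while the new one is unset
os.environ["TRAINS_API_HOST"] = "http://legacy:8008"
assert resolve_env("CLEARML_API_HOST", "TRAINS_API_HOST") == "http://legacy:8008"
os.environ["CLEARML_API_HOST"] = "http://new:8008"
assert resolve_env("CLEARML_API_HOST", "TRAINS_API_HOST") == "http://new:8008"
```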
@@ -36,12 +36,12 @@ class MaxRequestSizeError(Exception):


class Session(TokenManager):
-    """ TRAINS API Session class. """
+    """ ClearML API Session class. """

    _AUTHORIZATION_HEADER = "Authorization"
-    _WORKER_HEADER = "X-Trains-Worker"
-    _ASYNC_HEADER = "X-Trains-Async"
-    _CLIENT_HEADER = "X-Trains-Client"
+    _WORKER_HEADER = ("X-ClearML-Worker", "X-Trains-Worker", )
+    _ASYNC_HEADER = ("X-ClearML-Async", "X-Trains-Async", )
+    _CLIENT_HEADER = ("X-ClearML-Client", "X-Trains-Client", )

    _async_status_code = 202
    _session_requests = 0

@@ -57,10 +57,10 @@ class Session(TokenManager):
    _client = [(__package__.partition(".")[0], __version__)]

    api_version = '2.1'
-    default_demo_host = "https://demoapi.trains.allegro.ai"
+    default_demo_host = "https://demoapi.demo.clear.ml"
    default_host = default_demo_host
-    default_web = "https://demoapp.trains.allegro.ai"
-    default_files = "https://demofiles.trains.allegro.ai"
+    default_web = "https://demoapp.demo.clear.ml"
+    default_files = "https://demofiles.demo.clear.ml"
    default_key = "EGRTCO8JMSIGI6S39GTP43NFWXDQOW"
    default_secret = "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"
    force_max_api_version = None

@@ -177,8 +177,8 @@ class Session(TokenManager):
            if not api_version:
                api_version = '2.2' if token_dict.get('env', '') == 'prod' else Session.api_version
            if token_dict.get('server_version'):
-                if not any(True for c in Session._client if c[0] == 'trains-server'):
-                    Session._client.append(('trains-server', token_dict.get('server_version'), ))
+                if not any(True for c in Session._client if c[0] == 'clearml-server'):
+                    Session._client.append(('clearml-server', token_dict.get('server_version'), ))

            Session.api_version = str(api_version)
        except (jwt.DecodeError, ValueError):

@@ -218,10 +218,13 @@ class Session(TokenManager):
        if self._offline_mode:
            return None

        res = None
        host = self.host
        headers = headers.copy() if headers else {}
-        headers[self._WORKER_HEADER] = self.worker
-        headers[self._CLIENT_HEADER] = self.client
+        for h in self._WORKER_HEADER:
+            headers[h] = self.worker
+        for h in self._CLIENT_HEADER:
+            headers[h] = self.client

        token_refreshed_on_error = False
        url = (

@@ -308,7 +311,8 @@ class Session(TokenManager):
            headers.copy() if headers else {}
        )
        if async_enable:
-            headers[self._ASYNC_HEADER] = "1"
+            for h in self._ASYNC_HEADER:
+                headers[h] = "1"
        return self._send_request(
            service=service,
            action=action,

@@ -508,7 +512,7 @@ class Session(TokenManager):
        if parsed.port == 8008:
            return host.replace(':8008', ':8080', 1)

-        raise ValueError('Could not detect TRAINS web application server')
+        raise ValueError('Could not detect ClearML web application server')

    @classmethod
    def get_files_server_host(cls, config=None):

@@ -624,7 +628,7 @@ class Session(TokenManager):
            # check if this is a misconfigured api server (getting 200 without the data section)
            if res and res.status_code == 200:
                raise ValueError('It seems *api_server* is misconfigured. '
-                                 'Is this the TRAINS API server {} ?'.format(self.host))
+                                 'Is this the ClearML API server {} ?'.format(self.host))
            else:
                raise LoginError("Response data mismatch: No 'token' in 'data' value from res, receive : {}, "
                                 "exception: {}".format(res, ex))
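Because the header constants are now tuples of aliases, each request is stamped with both the new X-ClearML-* and the legacy X-Trains-* header names, so either server generation can identify the client. A small sketch of the loop (`build_headers` is a hypothetical stand-in for the logic inside `Session._send_request`):

```python
WORKER_HEADER = ("X-ClearML-Worker", "X-Trains-Worker")
CLIENT_HEADER = ("X-ClearML-Client", "X-Trains-Client")

def build_headers(worker, client, headers=None):
    # Every alias gets the same value, so both new (ClearML) and
    # old (Trains) servers can read the worker/client identity.
    headers = dict(headers) if headers else {}
    for h in WORKER_HEADER:
        headers[h] = worker
    for h in CLIENT_HEADER:
        headers[h] = client
    return headers

headers = build_headers("worker-01", "clearml-0.17.0")
assert headers["X-ClearML-Worker"] == headers["X-Trains-Worker"] == "worker-01"
```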
@@ -14,7 +14,7 @@ if six.PY3:
    from functools import lru_cache
elif six.PY2:
    # python 2 support
-    from backports.functools_lru_cache import lru_cache
+    from backports.functools_lru_cache import lru_cache  # noqa


__disable_certificate_verification_warning = 0

@@ -139,7 +139,7 @@ def get_http_session_with_retry(
    if not session.verify and __disable_certificate_verification_warning < 2:
        # show warning
        __disable_certificate_verification_warning += 1
-        logging.getLogger('trains').warning(
+        logging.getLogger('clearml').warning(
            msg='InsecureRequestWarning: Certificate verification is disabled! Adding '
                'certificate verification is strongly advised. See: '
                'https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings')
@@ -88,7 +88,7 @@ class Config(object):
        self._folder_name = config_folder or DEFAULT_CONFIG_FOLDER
        self._roots = []
        self._config = ConfigTree()
-        self._env = env or os.environ.get("TRAINS_ENV", Environment.default)
+        self._env = env or os.environ.get("CLEARML_ENV", os.environ.get("TRAINS_ENV", Environment.default))
        self.config_paths = set()
        self.is_server = is_server

@@ -139,7 +139,7 @@ class Config(object):
        else:
            env_config_paths = []

-        env_config_path_override = os.environ.get(ENV_CONFIG_PATH_OVERRIDE_VAR)
+        env_config_path_override = ENV_CONFIG_PATH_OVERRIDE_VAR.get()
        if env_config_path_override:
            env_config_paths = [expanduser(env_config_path_override)]

@@ -166,7 +166,7 @@ class Config(object):
        )

        local_config_files = LOCAL_CONFIG_FILES
-        local_config_override = os.environ.get(LOCAL_CONFIG_FILE_OVERRIDE_VAR)
+        local_config_override = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get()
        if local_config_override:
            local_config_files = [expanduser(local_config_override)]
@@ -1,7 +1,10 @@
from os.path import expanduser
from pathlib2 import Path

-ENV_VAR = 'TRAINS_ENV'
+from .environment import EnvEntry
+
+
+ENV_VAR = 'CLEARML_ENV'
""" Name of system environment variable that can be used to specify the config environment name """

@@ -25,15 +28,16 @@ LOCAL_CONFIG_PATHS = [

LOCAL_CONFIG_FILES = [
-    expanduser('~/trains.conf'),  # used for workstation configuration (end-users, workers)
+    expanduser('~/clearml.conf'),  # used for workstation configuration (end-users, workers)
]
""" Local config files (not paths) """


-LOCAL_CONFIG_FILE_OVERRIDE_VAR = 'TRAINS_CONFIG_FILE'
+LOCAL_CONFIG_FILE_OVERRIDE_VAR = EnvEntry("CLEARML_CONFIG_FILE", "TRAINS_CONFIG_FILE")
""" Local config file override environment variable. If this is set, no other local config files will be used. """


-ENV_CONFIG_PATH_OVERRIDE_VAR = 'TRAINS_CONFIG_PATH'
+ENV_CONFIG_PATH_OVERRIDE_VAR = EnvEntry("CLEARML_CONFIG_PATH", "TRAINS_CONFIG_PATH")
"""
Environment-related config path override environment variable. If this is set, no other env config path will be used.
"""
@@ -85,9 +85,10 @@ class Entry(object):
        return self.get_pair(default=default, converter=converter)[1]

    def set(self, value):
-        # type: (Any, Any) -> (Text, Any)
-        key, _ = self.get_pair(default=None, converter=None)
-        self._set(key, str(value))
+        # type: (Any) -> ()
+        # key, _ = self.get_pair(default=None, converter=None)
+        for k in self.keys:
+            self._set(k, str(value))

    def _set(self, key, value):
        # type: (Text, Text) -> None
@@ -15,6 +15,10 @@ class EnvEntry(Entry):
        super(EnvEntry, self).__init__(key, *more_keys, **kwargs)
        self._ignore_errors = kwargs.pop('ignore_errors', False)

+    def pop(self):
+        for k in self.keys:
+            environ.pop(k, None)
+
    def _get(self, key):
        value = getenv(key, "").strip()
        return value or NotSet
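Turning the override variables into EnvEntry objects gives them get/set/pop semantics that span all of their aliases, which is what the StrictSession code above now relies on. A simplified sketch of the behavior (`MiniEnvEntry` is illustrative, not the actual class):

```python
import os

class MiniEnvEntry(object):
    # Simplified stand-in for backend_config's EnvEntry:
    # one logical setting backed by several environment-variable aliases.
    def __init__(self, *keys):
        self.keys = keys

    def get(self):
        # First non-empty alias wins (new name listed before the legacy one)
        for k in self.keys:
            value = os.getenv(k, "").strip()
            if value:
                return value
        return None

    def set(self, value):
        # Write-through to every alias, as the new Entry.set does
        for k in self.keys:
            os.environ[k] = str(value)

    def pop(self):
        for k in self.keys:
            os.environ.pop(k, None)

entry = MiniEnvEntry("CLEARML_CONFIG_FILE", "TRAINS_CONFIG_FILE")
entry.set("~/clearml.conf")
assert os.environ["TRAINS_CONFIG_FILE"] == "~/clearml.conf"
entry.pop()
assert entry.get() is None
```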
@@ -5,11 +5,11 @@ from pathlib2 import Path


def logger(path=None):
-    name = "trains"
+    name = "clearml"
    if path:
        p = Path(path)
        module = (p.parent if p.stem.startswith('_') else p).stem
-        name = "trains.%s" % module
+        name = "clearml.%s" % module
    return logging.getLogger(name)
@@ -50,6 +50,7 @@ class MetricsEventAdapter(object):
        url = attr.attrib(default=None)

        exception = attr.attrib(default=None)
+        retries = attr.attrib(default=None)

        delete_local_file = attr.attrib(default=True)
        """ Local file path, if exists, delete the file after upload completed """

@@ -198,6 +199,7 @@ class UploadEvent(MetricsEventAdapter):
    _format = '.' + str(config.get('metrics.images.format', 'JPEG')).upper().lstrip('.')
    _quality = int(config.get('metrics.images.quality', 87))
    _subsampling = int(config.get('metrics.images.subsampling', 0))
+    _upload_retries = 3

    _metric_counters = {}
    _metric_counters_lock = Lock()

@@ -253,7 +255,7 @@ class UploadEvent(MetricsEventAdapter):
            self._upload_filename += filename_ext

        self._override_storage_key_prefix = kwargs.pop('override_storage_key_prefix', None)

+        self.retries = self._upload_retries
        super(UploadEvent, self).__init__(metric, variant, iter=iter, **kwargs)

    @classmethod

@@ -334,6 +336,7 @@ class UploadEvent(MetricsEventAdapter):
            key_prop='key',
            upload_uri=self._upload_uri,
            delete_local_file=local_file if self._delete_after_upload else None,
+            retries=self.retries,
        )

    def get_target_full_upload_uri(self, storage_uri, storage_key_prefix=None, quote_uri=True):
@@ -165,10 +165,11 @@ class Metrics(InterfaceBase):

        try:
            storage = self._get_storage(upload_uri)
+            retries = getattr(e, 'retries', None) or self._file_upload_retries
            if isinstance(e.stream, Path):
-                url = storage.upload(e.stream.as_posix(), e.url, retries=self._file_upload_retries)
+                url = storage.upload(e.stream.as_posix(), e.url, retries=retries)
            else:
-                url = storage.upload_from_stream(e.stream, e.url, retries=self._file_upload_retries)
+                url = storage.upload_from_stream(e.stream, e.url, retries=retries)
            e.event.update(url=url)
        except Exception as exp:
            log.warning("Failed uploading to {} ({})".format(
@@ -178,7 +178,7 @@ class Reporter(InterfaceBase, AbstractContextManager, SetupUploadMixin, AsyncMan
        self._report(ev)

    def report_matplotlib(self, title, series, figure, iter, force_save_as_image=False, logger=None):
-        from trains.binding.matplotlib_bind import PatchedMatplotlib
+        from clearml.binding.matplotlib_bind import PatchedMatplotlib
        PatchedMatplotlib.report_figure(
            title=title,
            series=series,
@@ -62,7 +62,7 @@ class TaskHandler(BufferingHandler):
        if self._connect_logger and not TaskHandler.__once:
            base_logger = getLogger()
            if len(base_logger.handlers) == 1 and isinstance(base_logger.handlers[0], TaskHandler):
-                if record.name != 'console' and not record.name.startswith('trains.'):
+                if record.name != 'console' and not record.name.startswith('clearml.'):
                    base_logger.removeHandler(self)
                    basicConfig()
                    base_logger.addHandler(self)

@@ -149,7 +149,7 @@ class TaskHandler(BufferingHandler):
            self._last_event = None
            batch_requests = events.AddBatchRequest(requests=[events.AddRequest(e) for e in record_events if e])
        except Exception:
-            self.__log_stderr("WARNING: trains.log - Failed logging task to backend ({:d} lines)".format(len(buffer)))
+            self.__log_stderr("WARNING: clearml.log - Failed logging task to backend ({:d} lines)".format(len(buffer)))
            batch_requests = None

        if batch_requests and batch_requests.requests:

@@ -253,7 +253,7 @@ class TaskHandler(BufferingHandler):
        write = sys.stderr._original_write if hasattr(sys.stderr, '_original_write') else sys.stderr.write
        write('{asctime} - {name} - {levelname} - {message}\n'.format(
            asctime=Formatter().formatTime(makeLogRecord({})),
-            name='trains.log', levelname=getLevelName(level), message=msg))
+            name='clearml.log', levelname=getLevelName(level), message=msg))

    @classmethod
    def report_offline_session(cls, task, folder):
clearml/backend_interface/task/populate.py (new file, 317 lines)
@@ -0,0 +1,317 @@
import json
import os
from functools import reduce
from logging import getLogger
from typing import Optional, Sequence

from six.moves.urllib.parse import urlparse

from pathlib2 import Path

from ...task import Task
from .repo import ScriptInfo


class CreateAndPopulate(object):
    def __init__(
            self,
            project_name=None,  # Optional[str]
            task_name=None,  # Optional[str]
            task_type=None,  # Optional[str]
            repo=None,  # Optional[str]
            branch=None,  # Optional[str]
            commit=None,  # Optional[str]
            script=None,  # Optional[str]
            working_directory=None,  # Optional[str]
            packages=None,  # Optional[Sequence[str]]
            requirements_file=None,  # Optional[Union[str, Path]]
            docker=None,  # Optional[str]
            base_task_id=None,  # Optional[str]
            add_task_init_call=True,  # bool
            raise_on_missing_entries=False,  # bool
    ):
        # type: (...) -> None
        """
        Create a new Task from an existing code base.
        If the code does not already contain a call to Task.init, pass add_task_init_call=True,
        and the code will be patched in remote execution (i.e. when executed by `clearml-agent`)

        :param project_name: Set the project name for the task. Required if base_task_id is None.
        :param task_name: Set the name of the remote task. Required if base_task_id is None.
        :param task_type: Optional, The task type to be created. Supported values: 'training', 'testing', 'inference',
            'data_processing', 'application', 'monitor', 'controller', 'optimizer', 'service', 'qc', 'custom'
        :param repo: Remote URL for the repository to use, or path to local copy of the git repository
            Example: 'https://github.com/allegroai/clearml.git' or '~/project/repo'
        :param branch: Select specific repository branch/tag (implies the latest commit from the branch)
        :param commit: Select specific commit id to use (default: latest commit,
            or when used with local repository matching the local commit id)
        :param script: Specify the entry point script for the remote execution. When used in tandem with
            remote git repository the script should be a relative path inside the repository,
            for example: './source/train.py' . When used with local repository path it supports a
            direct path to a file inside the local repository itself, for example: '~/project/source/train.py'
        :param working_directory: Working directory to launch the script from. Default: repository root folder.
            Relative to repo root or local folder.
        :param packages: Manually specify a list of required packages. Example: ["tqdm>=2.1", "scikit-learn"]
        :param requirements_file: Specify requirements.txt file to install when setting the session.
            If not provided, the requirements.txt from the repository will be used.
        :param docker: Select the docker image to be executed in by the remote session
        :param base_task_id: Use a pre-existing task in the system, instead of a local repo/script.
            Essentially clones an existing task and overrides arguments/requirements.
        :param add_task_init_call: If True, a 'Task.init()' call is added to the script entry point in remote execution.
        :param raise_on_missing_entries: If True raise ValueError on missing entries when populating
        """
        if len(urlparse(repo).scheme) <= 1:
            folder = repo
            repo = None
        else:
            folder = None

        if raise_on_missing_entries and not base_task_id:
            if not script:
                raise ValueError("Entry point script not provided")
            if not repo and not folder and not Path(script).is_file():
                raise ValueError("Repository or script must be provided")
        if raise_on_missing_entries and commit and branch:
            raise ValueError(
                "Specify either a branch/tag or specific commit id, not both (either --commit or --branch)")
        if raise_on_missing_entries and not folder and working_directory and working_directory.startswith('/'):
            raise ValueError("working directory \'{}\', must be relative to repository root")

        if requirements_file and not Path(requirements_file).is_file():
            raise ValueError("requirements file could not be found \'{}\'")

        self.folder = folder
        self.commit = commit
        self.branch = branch
        self.repo = repo
        self.script = script
        self.cwd = working_directory
        assert not packages or isinstance(packages, (tuple, list))
        self.packages = list(packages) if packages else None
        self.requirements_file = Path(requirements_file) if requirements_file else None
        self.base_task_id = base_task_id
        self.docker = docker
        self.add_task_init_call = add_task_init_call
        self.project_name = project_name
        self.task_name = task_name
        self.task_type = task_type
        self.task = None
        self.raise_on_missing_entries = raise_on_missing_entries

    def create_task(self):
        # type: () -> Task
        """
        Create the new populated Task

        :return: newly created Task object
        """
        local_entry_file = None
        repo_info = None
        if self.folder or (self.script and Path(self.script).is_file()):
            self.folder = os.path.expandvars(os.path.expanduser(self.folder)) if self.folder else None
            self.script = os.path.expandvars(os.path.expanduser(self.script)) if self.script else None
            self.cwd = os.path.expandvars(os.path.expanduser(self.cwd)) if self.cwd else None
            if Path(self.script).is_file():
                entry_point = self.script
            else:
                entry_point = (Path(self.folder) / self.script).as_posix()
            entry_point = os.path.abspath(entry_point)
            if not os.path.isfile(entry_point):
                raise ValueError("Script entrypoint file \'{}\' could not be found".format(entry_point))

            local_entry_file = entry_point
            repo_info, requirements = ScriptInfo.get(
                filepaths=[entry_point],
                log=getLogger(),
                create_requirements=False, uncommitted_from_remote=True)

        # check if we have no repository and no requirements raise error
        if self.raise_on_missing_entries and not self.requirements_file and not self.repo and (
                not repo_info or not repo_info.script or not repo_info.script.get('repository')):
            raise ValueError("Standalone script detected \'{}\', but no requirements provided".format(self.script))

        if self.base_task_id:
            print('Cloning task {}'.format(self.base_task_id))
            task = Task.clone(source_task=self.base_task_id, project=Task.get_project_id(self.project_name))
        else:
            # noinspection PyProtectedMember
            task = Task._create(task_name=self.task_name, project_name=self.project_name, task_type=self.task_type)
        # if there is nothing to populate, return
        if not any([
            self.folder, self.commit, self.branch, self.repo, self.script, self.cwd,
            self.packages, self.requirements_file, self.base_task_id, self.docker
        ]):
            return task

        task_state = task.export_task()
        if 'script' not in task_state:
            task_state['script'] = {}

        if repo_info:
            task_state['script']['repository'] = repo_info.script['repository']
            task_state['script']['version_num'] = repo_info.script['version_num']
            task_state['script']['branch'] = repo_info.script['branch']
            task_state['script']['diff'] = repo_info.script['diff'] or ''
            task_state['script']['working_dir'] = repo_info.script['working_dir']
            task_state['script']['entry_point'] = repo_info.script['entry_point']
            task_state['script']['binary'] = repo_info.script['binary']
            task_state['script']['requirements'] = {}
            if self.cwd:
                self.cwd = self.cwd
                cwd = self.cwd if Path(self.cwd).is_dir() else (
                    Path(repo_info.script['repo_root']) / self.cwd).as_posix()
                if not Path(cwd).is_dir():
                    raise ValueError("Working directory \'{}\' could not be found".format(cwd))
                cwd = Path(cwd).relative_to(repo_info.script['repo_root']).as_posix()
                entry_point = \
                    Path(repo_info.script['repo_root']) / repo_info.script['working_dir'] / repo_info.script[
                        'entry_point']
                entry_point = entry_point.relative_to(cwd).as_posix()
                task_state['script']['entry_point'] = entry_point
                task_state['script']['working_dir'] = cwd
        elif self.repo:
            # normalize backslashes and remove first one
            entry_point = '/'.join([p for p in self.script.split('/') if p and p != '.'])
            cwd = '/'.join([p for p in (self.cwd or '.').split('/') if p and p != '.'])
            if cwd and entry_point.startswith(cwd + '/'):
                entry_point = entry_point[len(cwd) + 1:]
            task_state['script']['repository'] = self.repo
            task_state['script']['version_num'] = self.commit or None
            task_state['script']['branch'] = self.branch or None
            task_state['script']['diff'] = ''
            task_state['script']['working_dir'] = cwd or '.'
            task_state['script']['entry_point'] = entry_point

        # update requirements
        reqs = []
        if self.requirements_file:
            with open(self.requirements_file.as_posix(), 'rt') as f:
                reqs = [line.strip() for line in f.readlines()]
        if self.packages:
            reqs += self.packages
        if reqs:
            # make sure we have clearml.
            clearml_found = False
            for line in reqs:
                if line.strip().startswith('#'):
                    continue
                package = reduce(lambda a, b: a.split(b)[0], "#;@=~<>", line).strip()
                if package == 'clearml':
                    clearml_found = True
                    break
            if not clearml_found:
                reqs.append('clearml')
            task_state['script']['requirements'] = {'pip': '\n'.join(reqs)}
        elif not self.repo and repo_info:
            # we are in local mode, make sure we have "requirements.txt" it is a must
            reqs_txt_file = Path(repo_info.script['repo_root']) / "requirements.txt"
            if self.raise_on_missing_entries and not reqs_txt_file.is_file():
                raise ValueError(
                    "requirements.txt not found [{}] "
                    "Use --requirements or --packages".format(reqs_txt_file.as_posix()))

        if self.add_task_init_call:
            script_entry = os.path.abspath('/' + task_state['script']['working_dir'] +
                                           '/' + task_state['script']['entry_point'])
            idx_a = 0
            # find the right entry for the patch if we have a local file (basically after __future__
            if local_entry_file:
                with open(local_entry_file, 'rt') as f:
                    lines = f.readlines()
                future_found = -1
                for i, line in enumerate(lines):
                    tokens = [t.strip() for t in line.split(' ') if t.strip()]
                    if tokens and tokens[0] in ('import', 'from',):
                        if '__future__' in line:
                            future_found = i
                        else:
                            break
                if future_found >= 0:
                    idx_a = future_found + 1

            task_init_patch = ''
            # if we do not have requirements, add clearml to the requirements.txt
            if not reqs:
                task_init_patch += \
                    "diff --git a/requirements.txt b/requirements.txt\n" \
                    "--- a/requirements.txt\n" \
                    "+++ b/requirements.txt\n" \
                    "@@ -0,0 +1,1 @@\n" \
                    "+clearml\n"

            task_init_patch += \
                "diff --git a{script_entry} b{script_entry}\n" \
                "--- a{script_entry}\n" \
                "+++ b{script_entry}\n" \
                "@@ -{idx_a},0 +{idx_b},3 @@\n" \
                "+from clearml import Task\n" \
                "+Task.init()\n" \
                "+\n".format(
                    script_entry=script_entry, idx_a=idx_a, idx_b=idx_a + 1)

            task_state['script']['diff'] = task_init_patch + task_state['script']['diff']

        # set base docker image if provided
        if self.docker:
            task.set_base_docker(self.docker)

        if task_state['script']['repository']:
            repo_details = {k: v for k, v in task_state['script'].items()
                            if v and k not in ('diff', 'requirements', 'binary')}
            print('Repository Detected\n{}'.format(json.dumps(repo_details, indent=2)))
        else:
            print('Standalone script detected\n Script: {}\n: Requirements: {}'.format(
                self.script, task_state['script']['requirements'].get('pip', [])))

        if task_state['script'].get('requirements') and task_state['script']['requirements'].get('pip'):
            print('Requirements:\n requirements.txt: {}\n Additional Packages:{}'.format(
                self.requirements_file.as_posix().name if self.requirements_file else '', self.packages))
        if self.docker:
            print('Base docker image: {}'.format(self.docker))

        # update the Task
        task.update_task(task_state)
        self.task = task
        return task

    def update_task_args(self, args=None):
        # type: (Optional[Sequence[str]]) -> ()
        """
        Update the newly created Task argparse Arguments
        If called before Task created, used for argument verification

        :param args: Arguments to pass to the remote execution, list of string pairs (argument, value) or
            list of strings '<argument>=<value>'. Example: ['lr=0.003', (batch_size, 64)]
        """
        if not args:
            return

        # check args are in format <key>=<value>
        args_list = []
        for a in args:
            if isinstance(a, (list, tuple)):
                assert len(a) == 2
                args_list.append(a)
                continue
            try:
                parts = a.split('=', 1)
                assert len(parts) == 2
                args_list.append(parts)
            except Exception:
                raise ValueError(
                    "Failed parsing argument \'{}\', arguments must be in \'<key>=<value>\' format")

        if not self.task:
            return

        task_params = self.task.get_parameters()
        args_list = {'Args/{}'.format(k): v for k, v in args_list}
        task_params.update(args_list)
        self.task.set_parameters(task_params)

    def get_id(self):
        # type: () -> Optional[str]
        """
        :return: Return the created Task id (str)
        """
        return self.task.id if self.task else None
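One subtle line in `create_task` above is the requirement-name extraction: `reduce` successively splits the requirement string on each specifier character in "#;@=~<>", leaving just the bare package name. For example:

```python
from functools import reduce

# Same expression as in create_task: strip version specifiers,
# markers, and trailing comments from a requirements.txt line.
line = "clearml==0.17.0  # pinned for reproducibility"
package = reduce(lambda a, b: a.split(b)[0], "#;@=~<>", line).strip()
assert package == "clearml"
```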
@@ -52,21 +52,21 @@ class ScriptRequirements(object):
        try:
            # noinspection PyPackageRequirements,PyUnresolvedReferences
            import boto3  # noqa: F401
-            modules.add('boto3', 'trains.storage', 0)
+            modules.add('boto3', 'clearml.storage', 0)
        except Exception:
            pass
        # noinspection PyBroadException
        try:
            # noinspection PyPackageRequirements,PyUnresolvedReferences
            from google.cloud import storage  # noqa: F401
-            modules.add('google_cloud_storage', 'trains.storage', 0)
+            modules.add('google_cloud_storage', 'clearml.storage', 0)
        except Exception:
            pass
        # noinspection PyBroadException
        try:
            # noinspection PyPackageRequirements,PyUnresolvedReferences
            from azure.storage.blob import ContentSettings  # noqa: F401
-            modules.add('azure_storage_blob', 'trains.storage', 0)
+            modules.add('azure_storage_blob', 'clearml.storage', 0)
        except Exception:
            pass

@@ -100,7 +100,7 @@ class ScriptRequirements(object):
            from ..task import Task
            # noinspection PyProtectedMember
            for package, version in Task._force_requirements.items():
-                modules.add(package, 'trains', 0)
+                modules.add(package, 'clearml', 0)
        except Exception:
            pass

@@ -265,7 +265,7 @@ class _JupyterObserver(object):

    @classmethod
    def _daemon(cls, jupyter_notebook_filename):
-        from trains import Task
+        from clearml import Task

        # load jupyter notebook package
        # noinspection PyBroadException
@@ -715,12 +715,12 @@ class ScriptInfo(object):
            jupyter_filepath=jupyter_filepath,
        )

-        if repo_info.modified:
-            messages.append(
-                "======> WARNING! UNCOMMITTED CHANGES IN REPOSITORY {} <======".format(
-                    script_info.get("repository", "")
-                )
-            )
+        # if repo_info.modified:
+        #     messages.append(
+        #         "======> WARNING! UNCOMMITTED CHANGES IN REPOSITORY {} <======".format(
+        #             script_info.get("repository", "")
+        #         )
+        #     )

        if not any(script_info.values()):
            script_info = None
@@ -27,6 +27,7 @@ from six.moves.urllib.parse import quote
|
||||
|
||||
from ...utilities.locks import RLock as FileRLock
|
||||
from ...utilities.attrs import readonly
|
||||
from ...utilities.proxy_object import verify_basic_type
|
||||
from ...binding.artifacts import Artifacts
|
||||
from ...backend_interface.task.development.worker import DevWorker
|
||||
from ...backend_api import Session
|
||||
@@ -144,9 +145,9 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         self.__reporter = None
         self._curr_label_stats = {}
         self._raise_on_validation_errors = raise_on_validation_errors
-        self._parameters_allowed_types = (
+        self._parameters_allowed_types = tuple(set(
             six.string_types + six.integer_types + (six.text_type, float, list, tuple, dict, type(None))
-        )
+        ))
         self._app_server = None
         self._files_server = None
         self._initial_iteration_offset = 0
@@ -216,7 +217,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
                 )
             else:
                 self.get_logger().report_text(
-                    'TRAINS new version available: upgrade to v{} is recommended!'.format(
+                    'ClearML new version available: upgrade to v{} is recommended!'.format(
                         latest_version[0]),
                 )
         except Exception:
@@ -296,8 +297,8 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         if task_type.value not in (self.TaskTypes.training, self.TaskTypes.testing) and \
                 not Session.check_min_api_version('2.8'):
             print('WARNING: Changing task type to "{}" : '
-                  'trains-server does not support task type "{}", '
-                  'please upgrade trains-server.'.format(self.TaskTypes.training, task_type.value))
+                  'clearml-server does not support task type "{}", '
+                  'please upgrade clearml-server.'.format(self.TaskTypes.training, task_type.value))
             task_type = self.TaskTypes.training

         project_id = None
@@ -402,7 +403,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         # type: () -> str
         """
         The Task's status. To keep the Task updated.
-        Trains reloads the Task status information only, when this value is accessed.
+        ClearML reloads the Task status information only, when this value is accessed.

         return str: TaskStatusEnum status
         """
@@ -445,7 +446,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
     def reload(self):
         # type: () -> ()
         """
-        Reload current Task's state from trains-server.
+        Reload current Task's state from clearml-server.
         Refresh all task's fields, including artifacts / models / parameters etc.
         """
         return super(Task, self).reload()
@@ -628,9 +629,9 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
     ):
         # type: (...) -> str
         """
-        Update the Task's output model weights file. First, Trains uploads the file to the preconfigured output
+        Update the Task's output model weights file. First, ClearML uploads the file to the preconfigured output
         destination (see the Task's ``output.destination`` property or call the ``setup_upload`` method),
-        then Trains updates the model object associated with the Task an API call. The API call uses with the URI
+        then ClearML updates the model object associated with the Task an API call. The API call uses with the URI
         of the uploaded file, and other values provided by additional arguments.

         :param str model_file: The path to the updated model weights file.
@@ -684,19 +685,19 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         Set a new input model for the Task. The model must be "ready" (status is ``Published``) to be used as the
         Task's input model.

-        :param model_id: The Id of the model on the **Trains Server** (backend). If ``model_name`` is not specified,
+        :param model_id: The Id of the model on the **ClearML Server** (backend). If ``model_name`` is not specified,
             then ``model_id`` must be specified.
-        :param model_name: The model name. The name is used to locate an existing model in the **Trains Server**
+        :param model_name: The model name. The name is used to locate an existing model in the **ClearML Server**
             (backend). If ``model_id`` is not specified, then ``model_name`` must be specified.
         :param update_task_design: Update the Task's design

-            - ``True`` - Trains copies the Task's model design from the input model.
-            - ``False`` - Trains does not copy the Task's model design from the input model.
+            - ``True`` - ClearML copies the Task's model design from the input model.
+            - ``False`` - ClearML does not copy the Task's model design from the input model.

         :param update_task_labels: Update the Task's label enumeration

-            - ``True`` - Trains copies the Task's label enumeration from the input model.
-            - ``False`` - Trains does not copy the Task's label enumeration from the input model.
+            - ``True`` - ClearML copies the Task's label enumeration from the input model.
+            - ``False`` - ClearML does not copy the Task's label enumeration from the input model.
         """
         if model_id is None and not model_name:
             raise ValueError('Expected one of [model_id, model_name]')
@@ -749,7 +750,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         i.e. {'Args/param': 'value'} is the argument "param" from section "Args"

         :param backwards_compatibility: If True (default) parameters without section name
-            (API version < 2.9, trains-server < 0.16) will be at dict root level.
+            (API version < 2.9, clearml-server < 0.16) will be at dict root level.
             If False, parameters without section name, will be nested under "Args/" key.

         :return: dict of the task parameters, all flattened to key/value.
@@ -838,14 +839,15 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         not_allowed = {
             k: type(v).__name__
             for k, v in new_parameters.items()
-            if not isinstance(v, self._parameters_allowed_types)
+            if not verify_basic_type(v, self._parameters_allowed_types)
         }
         if not_allowed:
-            raise ValueError(
-                "Only builtin types ({}) are allowed for values (got {})".format(
-                    ', '.join(t.__name__ for t in self._parameters_allowed_types),
-                    ', '.join('%s=>%s' % p for p in not_allowed.items())),
+            self.log.warning(
+                "Skipping parameter: {}, only builtin types are supported ({})".format(
+                    ', '.join('%s[%s]' % p for p in not_allowed.items()),
+                    ', '.join(t.__name__ for t in self._parameters_allowed_types))
             )
+            new_parameters = {k: v for k, v in new_parameters.items() if k not in not_allowed}

         use_hyperparams = Session.check_min_api_version('2.9')
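The hunk above changes the behavior when a parameter value is not a plain builtin type: instead of raising `ValueError`, the offending entries are logged and dropped. A minimal self-contained sketch of that filtering logic (the function name and `ALLOWED_TYPES` tuple here are illustrative stand-ins, not the actual ClearML internals):

```python
import logging

# Illustrative stand-in for the Task's allowed parameter value types
ALLOWED_TYPES = (str, int, float, bool, list, tuple, dict, type(None))


def filter_parameters(new_parameters, log=logging.getLogger("task")):
    """Drop values that are not plain builtin types, warning instead of raising."""
    not_allowed = {
        k: type(v).__name__
        for k, v in new_parameters.items()
        if not isinstance(v, ALLOWED_TYPES)
    }
    if not_allowed:
        log.warning(
            "Skipping parameter: %s, only builtin types are supported (%s)",
            ", ".join("%s[%s]" % p for p in not_allowed.items()),
            ", ".join(t.__name__ for t in ALLOWED_TYPES),
        )
    # keep only the values that passed the type check
    return {k: v for k, v in new_parameters.items() if k not in not_allowed}


params = {"lr": 0.1, "opt": object(), "tags": ["a", "b"]}
print(sorted(filter_parameters(params)))  # the non-builtin 'opt' entry is dropped
```

The design trade-off is the same as in the diff: a bad value no longer aborts the whole `set_parameters` call, at the cost of silently (well, loudly in the log) losing it.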
@@ -958,7 +960,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         :return: True if the parameter was deleted successfully
         """
         if not Session.check_min_api_version('2.9'):
-            raise ValueError("Delete hyper parameter is not supported by your trains-server, "
+            raise ValueError("Delete hyper parameter is not supported by your clearml-server, "
                              "upgrade to the latest version")
         with self._edit_lock:
             paramkey = tasks.ParamKey(section=name.split('/', 1)[0], name=name.split('/', 1)[1])
@@ -1011,7 +1013,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         # type: (str) -> ()
         """
         Set the base docker image for this experiment
-        If provided, this value will be used by trains-agent to execute this experiment
+        If provided, this value will be used by clearml-agent to execute this experiment
         inside the provided docker image.
         When running remotely the call is ignored
         """
@@ -1275,7 +1277,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         # type: () -> str
         """
         Return the Task results & outputs web page address.
-        For example: https://demoapp.trains.allegro.ai/projects/216431/experiments/60763e04/output/log
+        For example: https://demoapp.demo.clear.ml/projects/216431/experiments/60763e04/output/log

         :return: http/s URL link.
         """
@@ -1428,7 +1430,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
     def running_locally():
         # type: () -> bool
         """
-        Is the task running locally (i.e., ``trains-agent`` is not executing it)
+        Is the task running locally (i.e., ``clearml-agent`` is not executing it)

         :return: True, if the task is running locally. False, if the task is not running locally.
@@ -1637,7 +1639,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         mutually_exclusive(config_dict=config_dict, config_text=config_text, _check_none=True)

         if not Session.check_min_api_version('2.9'):
-            raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
+            raise ValueError("Multiple configurations is not supported with the current 'clearml-server', "
                              "please upgrade to the latest version")

         if description:
@@ -1661,7 +1663,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         return None if configuration name is not valid.
         """
         if not Session.check_min_api_version('2.9'):
-            raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
+            raise ValueError("Multiple configurations is not supported with the current 'clearml-server', "
                              "please upgrade to the latest version")

         configuration = self.data.configuration or {}
@@ -1725,6 +1727,22 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         """

         session = session if session else cls._get_default_session()
+        use_clone_api = Session.check_min_api_version('2.9')
+        if use_clone_api:
+            res = cls._send(
+                session=session, log=log,
+                req=tasks.CloneRequest(
+                    task=cloned_task_id,
+                    new_task_name=name,
+                    new_task_tags=tags,
+                    new_task_comment=comment,
+                    new_task_parent=parent,
+                    new_task_project=project,
+                    execution_overrides=execution_overrides,
+                )
+            )
+            cloned_task_id = res.response.id
+            return cloned_task_id
+
         res = cls._send(session=session, log=log, req=tasks.GetByIdRequest(task=cloned_task_id))
         task = res.response.task
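The hunk above adds a fast path: when the server speaks API 2.9 or later, cloning is delegated to the server-side `tasks.clone` endpoint, otherwise execution falls through to the older fetch-and-copy path. A hedged sketch of that version-gated fallback pattern (the function and its callable parameters are hypothetical; the real gate is `Session.check_min_api_version('2.9')`):

```python
def clone_task(task_id, server_api_version, clone_via_api, clone_manually):
    """Use the server-side clone endpoint when the API version allows it,
    otherwise fall back to the manual copy path (illustrative sketch)."""
    def version_tuple(v):
        # numeric comparison, so '2.10' correctly ranks above '2.9'
        return tuple(int(x) for x in v.split('.'))

    if version_tuple(server_api_version) >= (2, 9):
        return clone_via_api(task_id)
    return clone_manually(task_id)


new_id = clone_task('abc', '2.10', lambda t: t + '-clone', lambda t: t + '-manual')
```

Comparing version tuples rather than strings matters here: a plain string comparison would rank `'2.10'` below `'2.9'`.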
@@ -1858,7 +1876,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
         if not PROC_MASTER_ID_ENV_VAR.get() or len(PROC_MASTER_ID_ENV_VAR.get().split(':')) < 2:
             self.__edit_lock = RLock()
         elif PROC_MASTER_ID_ENV_VAR.get().split(':')[1] == str(self.id):
-            filename = os.path.join(gettempdir(), 'trains_{}.lock'.format(self.id))
+            filename = os.path.join(gettempdir(), 'clearml_{}.lock'.format(self.id))
             # no need to remove previous file lock if we have a dead process, it will automatically release the lock.
             # # noinspection PyBroadException
             # try:
@@ -187,7 +187,7 @@ class Artifact(object):
         :raise: Raises error if local copy not found.
         :return: A local path to a downloaded copy of the artifact.
         """
-        from trains.storage import StorageManager
+        from clearml.storage import StorageManager
         local_copy = StorageManager.get_local_copy(
             remote_url=self.url,
             extract_archive=extract_archive and self.type == 'archive',
@@ -308,7 +308,7 @@ class Artifacts(object):
                      delete_after_upload=False, auto_pickle=True, wait_on_upload=False):
         # type: (str, Optional[object], Optional[dict], Optional[str], bool, bool, bool) -> bool
         if not Session.check_min_api_version('2.3'):
-            LoggerRoot.get_base_logger().warning('Artifacts not supported by your TRAINS-server version, '
+            LoggerRoot.get_base_logger().warning('Artifacts not supported by your ClearML-server version, '
                                                  'please upgrade to the latest server version')
             return False
@@ -648,7 +648,7 @@ class Artifacts(object):
                 return
             self._last_artifacts_upload[name] = current_sha2

-            # If old trains-server, upload as debug image
+            # If old clearml-server, upload as debug image
             if not Session.check_min_api_version('2.3'):
                 logger.report_image(title='artifacts', series=name, local_path=local_csv.as_posix(),
                                     delete_after_upload=True, iteration=self._task.get_last_iteration(),
@@ -698,7 +698,7 @@ class Artifacts(object):
         """
         Upload local file and return uri of the uploaded file (uploading in the background)
         """
-        from trains.storage import StorageManager
+        from clearml.storage import StorageManager

         upload_uri = self._task.output_uri or self._task.get_logger().get_default_upload_destination()
         if not isinstance(local_file, Path):
@@ -715,7 +715,7 @@ class Artifacts(object):
         # send for upload
         # noinspection PyProtectedMember
         if wait_on_upload:
-            StorageManager.upload_file(local_file.as_posix(), uri)
+            StorageManager.upload_file(local_file.as_posix(), uri, wait_for_upload=True, retries=ev.retries)
         if delete_after_upload:
             try:
                 os.unlink(local_file.as_posix())
@@ -44,7 +44,7 @@ class EnvironmentBind(object):
             match = match.strip()
             if match == '*':
                 env_param.update({k: os.environ.get(k) for k in os.environ
-                                  if not k.startswith('TRAINS_') and not k.startswith('ALG_')})
+                                  if not k.startswith('TRAINS_') and not k.startswith('CLEARML_')})
             elif match.endswith('*'):
                 match = match.strip('*')
                 env_param.update({k: os.environ.get(k) for k in os.environ if k.startswith(match)})
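The hunk above adjusts which environment variables the wildcard rules exclude: a bare `*` captures everything except the SDK's own `TRAINS_`/`CLEARML_` configuration variables, while `PREFIX*` captures by prefix. A self-contained sketch of that matching logic (the function name and exact-match branch are illustrative additions, not the ClearML API):

```python
import os


def bind_env(matches, environ=None):
    """Collect environment variables according to wildcard match rules,
    mirroring the hunk above (illustrative sketch)."""
    environ = os.environ if environ is None else environ
    env_param = {}
    for match in matches:
        match = match.strip()
        if match == '*':
            # everything except the SDK's own configuration variables
            env_param.update({k: environ.get(k) for k in environ
                              if not k.startswith('TRAINS_') and not k.startswith('CLEARML_')})
        elif match.endswith('*'):
            prefix = match.rstrip('*')
            env_param.update({k: environ.get(k) for k in environ if k.startswith(prefix)})
        elif match in environ:
            env_param[match] = environ.get(match)
    return env_param


demo = {'AWS_KEY': '1', 'CLEARML_API': 'x', 'HOME': '/root'}
print(bind_env(['AWS_*'], demo))  # only the AWS_-prefixed variable is captured
```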
@@ -114,7 +114,7 @@ class WeightsFileHandler(object):
         Add a pre-save/load callback for weights files and return its handle. If the callback was already added,
         return the existing handle.

-        Use this callback to modify the weights filename registered in the Trains Server. In case Trains is
+        Use this callback to modify the weights filename registered in the ClearML Server. In case ClearML is
         configured to upload the weights file, this will affect the uploaded filename as well.
         Callback returning None will disable the tracking of the current call Model save,
         it will not disable saving it to disk, just the logging/tracking/uploading.
@@ -422,7 +422,7 @@ class WeightsFileHandler(object):
             # HACK: if pytorch-lightning is used, remove the temp '.part' file extension
             if sys.modules.get('pytorch_lightning') and target_filename.lower().endswith('.part'):
                 target_filename = target_filename[:-len('.part')]
-            fd, temp_file = mkstemp(prefix='.trains.upload_model_', suffix='.tmp')
+            fd, temp_file = mkstemp(prefix='.clearml.upload_model_', suffix='.tmp')
             os.close(fd)
             shutil.copy(files[0], temp_file)
             trains_out_model.update_weights(
@@ -192,7 +192,7 @@ class WeightsGradientHistHelper(object):
 class EventTrainsWriter(object):
     """
     TF SummaryWriter implementation that converts the tensorboard's summary into
-    Trains events and reports the events (metrics) for an Trains task (logger).
+    ClearML events and reports the events (metrics) for an ClearML task (logger).
     """
     _add_lock = threading.RLock()
     _series_name_lookup = {}
@@ -298,8 +298,8 @@ class EventTrainsWriter(object):
     def __init__(self, logger, logdir=None, report_freq=100, image_report_freq=None,
                  histogram_update_freq_multiplier=10, histogram_granularity=50, max_keep_images=None):
         """
-        Create a compatible Trains backend to the TensorFlow SummaryToEventTransformer
-        Everything will be serialized directly to the Trains backend, instead of to the standard TF FileWriter
+        Create a compatible ClearML backend to the TensorFlow SummaryToEventTransformer
+        Everything will be serialized directly to the ClearML backend, instead of to the standard TF FileWriter

         :param logger: The task.logger to use for sending the metrics (def: task.get_logger())
         :param report_freq: How often to update the statistics values
@@ -846,7 +846,7 @@ class PatchSummaryToEventTransformer(object):
             if PatchSummaryToEventTransformer.__original_getattribute is None:
                 PatchSummaryToEventTransformer.__original_getattribute = SummaryToEventTransformer.__getattribute__
                 SummaryToEventTransformer.__getattribute__ = PatchSummaryToEventTransformer._patched_getattribute
-                setattr(SummaryToEventTransformer, 'trains',
+                setattr(SummaryToEventTransformer, 'clearml',
                         property(PatchSummaryToEventTransformer.trains_object))
         except Exception as ex:
             LoggerRoot.get_base_logger(TensorflowBinding).debug(str(ex))
@@ -859,7 +859,7 @@ class PatchSummaryToEventTransformer(object):
                 from torch.utils.tensorboard.writer import FileWriter as FileWriterT  # noqa
                 PatchSummaryToEventTransformer._original_add_eventT = FileWriterT.add_event
                 FileWriterT.add_event = PatchSummaryToEventTransformer._patched_add_eventT
-                setattr(FileWriterT, 'trains', None)
+                setattr(FileWriterT, 'clearml', None)
             except ImportError:
                 # this is a new version of TensorflowX
                 pass
@@ -875,7 +875,7 @@ class PatchSummaryToEventTransformer(object):
                 PatchSummaryToEventTransformer.__original_getattributeX = \
                     SummaryToEventTransformerX.__getattribute__
                 SummaryToEventTransformerX.__getattribute__ = PatchSummaryToEventTransformer._patched_getattributeX
-                setattr(SummaryToEventTransformerX, 'trains',
+                setattr(SummaryToEventTransformerX, 'clearml',
                         property(PatchSummaryToEventTransformer.trains_object))
             except ImportError:
                 # this is a new version of TensorflowX
@@ -890,7 +890,7 @@ class PatchSummaryToEventTransformer(object):
                 from tensorboardX.writer import FileWriter as FileWriterX  # noqa
                 PatchSummaryToEventTransformer._original_add_eventX = FileWriterX.add_event
                 FileWriterX.add_event = PatchSummaryToEventTransformer._patched_add_eventX
-                setattr(FileWriterX, 'trains', None)
+                setattr(FileWriterX, 'clearml', None)
             except ImportError:
                 # this is a new version of TensorflowX
                 pass
@@ -899,38 +899,38 @@ class PatchSummaryToEventTransformer(object):

     @staticmethod
     def _patched_add_eventT(self, *args, **kwargs):
-        if not hasattr(self, 'trains') or not PatchSummaryToEventTransformer.__main_task:
+        if not hasattr(self, 'clearml') or not PatchSummaryToEventTransformer.__main_task:
             return PatchSummaryToEventTransformer._original_add_eventT(self, *args, **kwargs)
-        if not self.trains:
+        if not self.clearml:  # noqa
             # noinspection PyBroadException
             try:
                 logdir = self.get_logdir()
             except Exception:
                 logdir = None
-            self.trains = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
-                                            logdir=logdir, **PatchSummaryToEventTransformer.defaults_dict)
+            self.clearml = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
+                                             logdir=logdir, **PatchSummaryToEventTransformer.defaults_dict)
         # noinspection PyBroadException
         try:
-            self.trains.add_event(*args, **kwargs)
+            self.clearml.add_event(*args, **kwargs)
         except Exception:
             pass
         return PatchSummaryToEventTransformer._original_add_eventT(self, *args, **kwargs)

     @staticmethod
     def _patched_add_eventX(self, *args, **kwargs):
-        if not hasattr(self, 'trains') or not PatchSummaryToEventTransformer.__main_task:
+        if not hasattr(self, 'clearml') or not PatchSummaryToEventTransformer.__main_task:
             return PatchSummaryToEventTransformer._original_add_eventX(self, *args, **kwargs)
-        if not self.trains:
+        if not self.clearml:
             # noinspection PyBroadException
             try:
                 logdir = self.get_logdir()
             except Exception:
                 logdir = None
-            self.trains = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
-                                            logdir=logdir, **PatchSummaryToEventTransformer.defaults_dict)
+            self.clearml = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
+                                             logdir=logdir, **PatchSummaryToEventTransformer.defaults_dict)
         # noinspection PyBroadException
         try:
-            self.trains.add_event(*args, **kwargs)
+            self.clearml.add_event(*args, **kwargs)
         except Exception:
             pass
         return PatchSummaryToEventTransformer._original_add_eventX(self, *args, **kwargs)
@@ -947,17 +947,17 @@ class PatchSummaryToEventTransformer(object):

     @staticmethod
     def _patched_getattribute_(self, attr, get_base):
-        # no main task, zero chance we have an Trains event logger
+        # no main task, zero chance we have an ClearML event logger
         if PatchSummaryToEventTransformer.__main_task is None:
             return get_base(self, attr)

-        # check if we already have an Trains event logger
+        # check if we already have an ClearML event logger
         __dict__ = get_base(self, '__dict__')
         if 'event_writer' not in __dict__ or \
                 isinstance(__dict__['event_writer'], (ProxyEventsWriter, EventTrainsWriter)):
             return get_base(self, attr)

-        # patch the events writer field, and add a double Event Logger (Trains and original)
+        # patch the events writer field, and add a double Event Logger (ClearML and original)
         base_eventwriter = __dict__['event_writer']
         # noinspection PyBroadException
         try:
@@ -1062,7 +1062,7 @@ class PatchModelCheckPointCallback(object):
             if PatchModelCheckPointCallback.__original_getattribute is None and callbacks is not None:
                 PatchModelCheckPointCallback.__original_getattribute = callbacks.ModelCheckpoint.__getattribute__
                 callbacks.ModelCheckpoint.__getattribute__ = PatchModelCheckPointCallback._patched_getattribute
-                setattr(callbacks.ModelCheckpoint, 'trains',
+                setattr(callbacks.ModelCheckpoint, 'clearml',
                         property(PatchModelCheckPointCallback.trains_object))

         except Exception as ex:
@@ -1072,17 +1072,17 @@ class PatchModelCheckPointCallback(object):
     def _patched_getattribute(self, attr):
         get_base = PatchModelCheckPointCallback.__original_getattribute

-        # no main task, zero chance we have an Trains event logger
+        # no main task, zero chance we have an ClearML event logger
         if PatchModelCheckPointCallback.__main_task is None:
             return get_base(self, attr)

-        # check if we already have an Trains event logger
+        # check if we already have an ClearML event logger
         __dict__ = get_base(self, '__dict__')
         if 'model' not in __dict__ or \
                 isinstance(__dict__['model'], _ModelAdapter):
             return get_base(self, attr)

-        # patch the events writer field, and add a double Event Logger (Trains and original)
+        # patch the events writer field, and add a double Event Logger (ClearML and original)
         base_model = __dict__['model']
         defaults_dict = __dict__.get('_trains_defaults') or PatchModelCheckPointCallback.defaults_dict
         output_model = OutputModel(
clearml/cli/__init__.py (new file, 1 line)
@@ -0,0 +1 @@
+

clearml/cli/config/__init__.py (new file, 1 line)
@@ -0,0 +1 @@
+
@@ -1,4 +1,4 @@
-""" Trains configuration wizard"""
+""" ClearML configuration wizard"""
 from __future__ import print_function

 import argparse
@@ -8,22 +8,23 @@ from pathlib2 import Path
 from six.moves import input
 from six.moves.urllib.parse import urlparse

-from trains.backend_api.session import Session
-from trains.backend_api.session.defs import ENV_HOST
-from trains.backend_config.defs import LOCAL_CONFIG_FILES, LOCAL_CONFIG_FILE_OVERRIDE_VAR
-from trains.config import config_obj
-from trains.utilities.pyhocon import ConfigFactory, ConfigMissingException
+from clearml.backend_api.session import Session
+from clearml.backend_api.session.defs import ENV_HOST
+from clearml.backend_config.defs import LOCAL_CONFIG_FILES, LOCAL_CONFIG_FILE_OVERRIDE_VAR
+from clearml.config import config_obj
+from clearml.utilities.pyhocon import ConfigFactory, ConfigMissingException

 description = "\n" \
-    "Please create new trains credentials through the profile page in " \
-    "your trains web app (e.g. http://localhost:8080/profile)\n" \
+    "Please create new clearml credentials through the profile page in " \
+    "your clearml web app (e.g. http://localhost:8080/profile) \n" \
+    "Or with the free hosted service at https://app.community.clear.ml/profile\n" \
     "In the profile page, press \"Create new credentials\", then press \"Copy to clipboard\".\n" \
     "\n" \
     "Paste copied configuration here:\n"

 host_description = """
 Editing configuration file: {CONFIG_FILE}
-Enter the url of the trains-server's Web service, for example: {HOST}
+Enter the url of the clearml-server's Web service, for example: {HOST}
 """

 # noinspection PyBroadException
@@ -40,7 +41,12 @@ def validate_file(string):


 def main():
-    default_config_file = os.getenv(LOCAL_CONFIG_FILE_OVERRIDE_VAR) or LOCAL_CONFIG_FILES[0]
+    default_config_file = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get()
+    if not default_config_file:
+        for f in LOCAL_CONFIG_FILES:
+            default_config_file = f
+            if os.path.exists(os.path.expanduser(os.path.expandvars(f))):
+                break

     p = argparse.ArgumentParser(description=__doc__)
     p.add_argument(
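The hunk above replaces a fixed default with a search: honor the override variable when set, otherwise pick the first candidate config file that exists on disk, falling back to the last candidate if none do. A self-contained sketch of that selection logic (the function name and parameters are illustrative stand-ins for `LOCAL_CONFIG_FILE_OVERRIDE_VAR` / `LOCAL_CONFIG_FILES`):

```python
import os


def pick_default_config(override, candidates):
    """Return the override path if set; otherwise the first candidate that
    exists on disk, falling back to the last candidate (sketch of the hunk above)."""
    if override:
        return override
    default = None
    for f in candidates:
        default = f  # keep the current candidate as the fallback
        if os.path.exists(os.path.expanduser(os.path.expandvars(f))):
            break  # found an existing file, use it
    return default
```

Note the fallback behavior: if no candidate exists, the wizard ends up creating the *last* path in the list, since `default` is assigned before the existence check.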
@@ -51,16 +57,20 @@ def main():

     args = p.parse_args()

-    print('TRAINS SDK setup process')
+    print('ClearML SDK setup process')

-    conf_file = Path(args.file).absolute()
+    conf_file = Path(os.path.expanduser(args.file)).absolute()
     if conf_file.exists() and conf_file.is_file() and conf_file.stat().st_size > 0:
         print('Configuration file already exists: {}'.format(str(conf_file)))
         print('Leaving setup, feel free to edit the configuration file.')
         return
     print(description, end='')
     sentinel = ''
-    parse_input = '\n'.join(iter(input, sentinel))
+    parse_input = ''
+    for line in iter(input, sentinel):
+        parse_input += line+'\n'
+        if line.rstrip() == '}':
+            break
     credentials = None
     api_server = None
     web_server = None
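The hunk above changes how the pasted credentials block is read: instead of consuming input until an empty line, the loop now also stops as soon as a line closing the HOCON block with `}` arrives, so a paste without a trailing blank line still terminates. A self-contained sketch of that loop (the function name is illustrative; the wizard reads from `input` via `iter(input, sentinel)`):

```python
def read_credentials_block(lines):
    """Accumulate pasted configuration lines until an empty line (the sentinel)
    or a line closing the block with '}' (sketch of the wizard's input loop)."""
    parse_input = ''
    for line in lines:
        if line == '':  # sentinel: an empty line still ends input
            break
        parse_input += line + '\n'
        if line.rstrip() == '}':
            break  # block closed, stop without requiring a blank line
    return parse_input


block = read_credentials_block(['api {', '  credentials {...}', '}', 'ignored'])
```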
@@ -104,7 +114,7 @@ def main():

     files_host = input_url('File Store Host', files_host)

-    print('\nTRAINS Hosts configuration:\nWeb App: {}\nAPI: {}\nFile Store: {}\n'.format(
+    print('\nClearML Hosts configuration:\nWeb App: {}\nAPI: {}\nFile Store: {}\n'.format(
         web_host, api_host, files_host))

     retry = 1
@@ -121,7 +131,7 @@ def main():

     # noinspection PyBroadException
     try:
-        default_sdk_conf = Path(__file__).parent.absolute() / 'sdk.conf'
+        default_sdk_conf = Path(__file__).absolute().parents[2] / 'config/default/sdk.conf'
         with open(str(default_sdk_conf), 'rt') as f:
             default_sdk = f.read()
     except Exception:
@@ -130,14 +140,14 @@ def main():
     # noinspection PyBroadException
     try:
         with open(str(conf_file), 'wt') as f:
-            header = '# TRAINS SDK configuration file\n' \
+            header = '# ClearML SDK configuration file\n' \
                      'api {\n' \
                      '    # Notice: \'host\' is the api server (default port 8008), not the web server.\n' \
                      '    api_server: %s\n' \
                      '    web_server: %s\n' \
                      '    files_server: %s\n' \
                      '    # Credentials are generated using the webapp, %s/profile\n' \
-                     '    # Override with os environment: TRAINS_API_ACCESS_KEY / TRAINS_API_SECRET_KEY\n' \
+                     '    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY\n' \
                      '    credentials {"access_key": "%s", "secret_key": "%s"}\n' \
                      '}\n' \
                      'sdk ' % (api_host, web_host, files_host,
@@ -149,7 +159,7 @@ def main():
         return

     print('\nNew configuration stored in {}'.format(str(conf_file)))
-    print('TRAINS setup completed successfully.')
+    print('ClearML setup completed successfully.')


 def parse_host(parsed_host, allow_input=True):
@@ -290,7 +300,7 @@ def verify_url(parse_input):
         parsed_host = None
     except Exception:
         parsed_host = None
-        print('Could not parse url {}\nEnter your trains-server host: '.format(parse_input), end='')
+        print('Could not parse url {}\nEnter your clearml-server host: '.format(parse_input), end='')
     return parsed_host
clearml/cli/task/__init__.py (new file, 1 line)
@@ -0,0 +1 @@
+

clearml/cli/task/__main__.py (new file, 119 lines)
@@ -0,0 +1,119 @@
+from argparse import ArgumentParser
+
+from pathlib2 import Path
+
+from clearml.backend_interface.task.populate import CreateAndPopulate
+from clearml import Task
+
+
+def setup_parser(parser):
+    parser.add_argument('--version', action='store_true', default=None,
+                        help='Display the clearml-task utility version')
+    parser.add_argument('--project', type=str, default=None,
+                        help='Required: set the project name for the task. '
+                             'If --base-task-id is used, this arguments is optional.')
+    parser.add_argument('--name', type=str, default=None, required=True,
+                        help='Required: select a name for the remote task')
+    parser.add_argument('--repo', type=str, default=None,
+                        help='remote URL for the repository to use. '
+                             'Example: --repo https://github.com/allegroai/clearml.git')
+    parser.add_argument('--branch', type=str, default=None,
+                        help='Select specific repository branch/tag (implies the latest commit from the branch)')
+    parser.add_argument('--commit', type=str, default=None,
+                        help='Select specific commit id to use (default: latest commit, '
+                             'or when used with local repository matching the local commit id)')
+    parser.add_argument('--folder', type=str, default=None,
+                        help='Remotely execute the code in the local folder. '
+                             'Notice! It assumes a git repository already exists. '
+                             'Current state of the repo (commit id and uncommitted changes) is logged '
+                             'and will be replicated on the remote machine')
+    parser.add_argument('--script', type=str, default=None,
+                        help='Specify the entry point script for the remote execution. '
+                             'When used in tandem with --repo the script should be a relative path inside '
+                             'the repository, for example: --script source/train.py .'
+                             'When used with --folder it supports a direct path to a file inside the local '
+                             'repository itself, for example: --script ~/project/source/train.py')
+    parser.add_argument('--cwd', type=str, default=None,
+                        help='Working directory to launch the script from. Default: repository root folder. '
+                             'Relative to repo root or local folder')
+    parser.add_argument('--args', default=None, nargs='*',
+                        help='Arguments to pass to the remote execution, list of <argument>=<value> strings.'
+                             'Currently only argparse arguments are supported. '
+                             'Example: --args lr=0.003 batch_size=64')
+    parser.add_argument('--queue', type=str, default=None,
+                        help='Select the queue to launch the task. '
+                             'If not provided a Task will be created but it will not be launched.')
+    parser.add_argument('--requirements', type=str, default=None,
+                        help='Specify requirements.txt file to install when setting the session. '
+                             'If not provided, the requirements.txt from the repository will be used.')
+    parser.add_argument('--packages', default=None, nargs='*',
+                        help='Manually specify a list of required packages. '
+                             'Example: --packages "tqdm>=2.1" "scikit-learn"')
+    parser.add_argument('--docker', type=str, default=None,
+                        help='Select the docker image to use in the remote session')
+    parser.add_argument('--skip-task-init', action='store_true', default=None,
+                        help='If set, Task.init() call is not added to the entry point, and is assumed '
+                             'to be called in within the script. Default: add Task.init() call entry point script')
+    parser.add_argument('--base-task-id', type=str, default=None,
+                        help='Use a pre-existing task in the system, instead of a local repo/script. '
+                             'Essentially clones an existing task and overrides arguments/requirements.')
+
+
+def cli():
+    title = 'ClearML launch - launch any codebase on remote machine running clearml-agent'
+    print(title)
+    parser = ArgumentParser(description=title)
+    setup_parser(parser)
+
+    # get the args
+    args = parser.parse_args()
+
+    if args.version:
+        from ...version import __version__
+        print('Version {}'.format(__version__))
+        exit(0)
+
+    create_populate = CreateAndPopulate(
+        project_name=args.project,
+        task_name=args.name,
+        repo=args.repo or args.folder,
+        branch=args.branch,
+        commit=args.commit,
+        script=args.script,
+        working_directory=args.cwd,
+        packages=args.packages,
+        requirements_file=args.requirements,
+        base_task_id=args.base_task_id,
+        add_task_init_call=not args.skip_task_init,
+        raise_on_missing_entries=True,
+    )
+    # verify args
+    create_populate.update_task_args(args.args)
+
+    print('Creating new task')
+    create_populate.create_task()
+    # update Task args
+    create_populate.update_task_args(args.args)
+
+    print('New task created id={}'.format(create_populate.get_id()))
+    if not args.queue:
+        print('Warning: No queue was provided, leaving task in draft-mode.')
+        exit(0)
+
+    Task.enqueue(create_populate.task, queue_name=args.queue)
+    print('Task id={} sent for execution on queue {}'.format(create_populate.get_id(), args.queue))
+    print('Execution log at: {}'.format(create_populate.task.get_output_log_web_page()))
|
||||
|
||||
|
||||
def main():
|
||||
try:
|
||||
cli()
|
||||
except KeyboardInterrupt:
|
||||
print('\nUser aborted')
|
||||
except Exception as ex:
|
||||
print('\nError: {}'.format(ex))
|
||||
exit(1)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
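The `--args` values above are plain `<name>=<value>` strings that `CreateAndPopulate.update_task_args` consumes. A minimal sketch of the parsing idea (a hypothetical helper for illustration, not the actual implementation):

```python
def parse_task_args(arg_strings):
    """Split '<name>=<value>' strings into a dict mapping name -> value."""
    overrides = {}
    for item in arg_strings or []:
        # split only on the first '=', so values may themselves contain '='
        name, _, value = item.partition('=')
        overrides[name] = value
    return overrides

print(parse_task_args(['lr=0.003', 'batch_size=64']))
```

Values stay strings here; matching them back to the original argparse types is left to the task-population logic.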
@@ -135,7 +135,10 @@ def dev_worker_name():
 def __set_is_master_node():
     # noinspection PyBroadException
     try:
-        force_master_node = os.environ.pop('TRAINS_FORCE_MASTER_NODE', None)
+        # pop both, set the first
+        env_a = os.environ.pop('CLEARML_FORCE_MASTER_NODE', None)
+        env_b = os.environ.pop('TRAINS_FORCE_MASTER_NODE', None)
+        force_master_node = env_a or env_b
     except Exception:
         force_master_node = None
@@ -2,7 +2,7 @@
 version: 1
 disable_existing_loggers: 0
 loggers {
-  trains {
+  clearml {
     level: INFO
   }
   boto {
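The HOCON snippet above sets the renamed `clearml` logger to INFO. For illustration only, the same effect with the stdlib `logging` module would look like:

```python
import logging
import logging.config

# mirror of the HOCON logger section above
logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'loggers': {
        'clearml': {'level': 'INFO'},
    },
})

print(logging.getLevelName(logging.getLogger('clearml').getEffectiveLevel()))
```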
@@ -1,10 +1,10 @@
 {
-    # TRAINS - default SDK configuration
+    # ClearML - default SDK configuration

     storage {
         cache {
             # Defaults to system temp folder / cache
-            default_base_dir: "~/.trains/cache"
+            default_base_dir: "~/.clearml/cache"
         }

         direct_access: [
@@ -93,7 +93,7 @@
     google.storage {
         # # Default project and credentials file
         # # Will be used when no bucket configuration is found
-        # project: "trains"
+        # project: "clearml"
         # credentials_json: "/path/to/credentials.json"

         # # Specific credentials per bucket and sub directory
@@ -101,7 +101,7 @@
         # {
         #     bucket: "my-bucket"
         #     subdir: "path/in/bucket" # Not required
-        #     project: "trains"
+        #     project: "clearml"
         #     credentials_json: "/path/to/credentials.json"
         # },
         # ]
@@ -109,7 +109,7 @@
     azure.storage {
         # containers: [
         #     {
-        #         account_name: "trains"
+        #         account_name: "clearml"
         #         account_key: "secret"
         #         # container_name:
         #     }
@@ -150,8 +150,8 @@
     # do not analyze the entire repository.
     force_analyze_entire_repo: false

-    # If set to true, *trains* update message will not be printed to the console
-    # this value can be overwritten with os environment variable TRAINS_SUPPRESS_UPDATE_MESSAGE=1
+    # If set to true, *clearml* update message will not be printed to the console
+    # this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
     suppress_update_message: false

     # If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
@@ -161,7 +161,7 @@
     # of the Hyper-Parameters.
     # multiple selected variables are supported including the suffix '*'.
     # For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
-    # This value can be overwritten with os environment variable TRAINS_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
+    # This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
     # Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
     log_os_environments: []
@@ -5,28 +5,28 @@ from ..backend_config.converters import base64_to_text, or_
 from pathlib2 import Path

 SESSION_CACHE_FILE = ".session.json"
-DEFAULT_CACHE_DIR = str(Path(tempfile.gettempdir()) / "trains_cache")
+DEFAULT_CACHE_DIR = str(Path(tempfile.gettempdir()) / "clearml_cache")

-TASK_ID_ENV_VAR = EnvEntry("TRAINS_TASK_ID", "ALG_TASK_ID")
-DOCKER_IMAGE_ENV_VAR = EnvEntry("TRAINS_DOCKER_IMAGE", "ALG_DOCKER_IMAGE")
-LOG_TO_BACKEND_ENV_VAR = EnvEntry("TRAINS_LOG_TASK_TO_BACKEND", "ALG_LOG_TASK_TO_BACKEND", type=bool)
-NODE_ID_ENV_VAR = EnvEntry("TRAINS_NODE_ID", "ALG_NODE_ID", type=int)
-PROC_MASTER_ID_ENV_VAR = EnvEntry("TRAINS_PROC_MASTER_ID", "ALG_PROC_MASTER_ID", type=str)
-LOG_STDERR_REDIRECT_LEVEL = EnvEntry("TRAINS_LOG_STDERR_REDIRECT_LEVEL", "ALG_LOG_STDERR_REDIRECT_LEVEL")
-DEV_WORKER_NAME = EnvEntry("TRAINS_WORKER_NAME", "ALG_WORKER_NAME")
-DEV_TASK_NO_REUSE = EnvEntry("TRAINS_TASK_NO_REUSE", "ALG_TASK_NO_REUSE", type=bool)
-TASK_LOG_ENVIRONMENT = EnvEntry("TRAINS_LOG_ENVIRONMENT", "ALG_LOG_ENVIRONMENT", type=str)
-TRAINS_CACHE_DIR = EnvEntry("TRAINS_CACHE_DIR", "ALG_CACHE_DIR")
+TASK_ID_ENV_VAR = EnvEntry("CLEARML_TASK_ID", "TRAINS_TASK_ID")
+DOCKER_IMAGE_ENV_VAR = EnvEntry("CLEARML_DOCKER_IMAGE", "TRAINS_DOCKER_IMAGE")
+LOG_TO_BACKEND_ENV_VAR = EnvEntry("CLEARML_LOG_TASK_TO_BACKEND", "TRAINS_LOG_TASK_TO_BACKEND", type=bool)
+NODE_ID_ENV_VAR = EnvEntry("CLEARML_NODE_ID", "TRAINS_NODE_ID", type=int)
+PROC_MASTER_ID_ENV_VAR = EnvEntry("CLEARML_PROC_MASTER_ID", "TRAINS_PROC_MASTER_ID", type=str)
+LOG_STDERR_REDIRECT_LEVEL = EnvEntry("CLEARML_LOG_STDERR_REDIRECT_LEVEL", "TRAINS_LOG_STDERR_REDIRECT_LEVEL")
+DEV_WORKER_NAME = EnvEntry("CLEARML_WORKER_NAME", "TRAINS_WORKER_NAME")
+DEV_TASK_NO_REUSE = EnvEntry("CLEARML_TASK_NO_REUSE", "TRAINS_TASK_NO_REUSE", type=bool)
+TASK_LOG_ENVIRONMENT = EnvEntry("CLEARML_LOG_ENVIRONMENT", "TRAINS_LOG_ENVIRONMENT", type=str)
+TRAINS_CACHE_DIR = EnvEntry("CLEARML_CACHE_DIR", "TRAINS_CACHE_DIR")

-LOG_LEVEL_ENV_VAR = EnvEntry("TRAINS_LOG_LEVEL", "ALG_LOG_LEVEL", converter=or_(int, str))
+LOG_LEVEL_ENV_VAR = EnvEntry("CLEARML_LOG_LEVEL", "TRAINS_LOG_LEVEL", converter=or_(int, str))

-SUPPRESS_UPDATE_MESSAGE_ENV_VAR = EnvEntry("TRAINS_SUPPRESS_UPDATE_MESSAGE", "ALG_SUPPRESS_UPDATE_MESSAGE", type=bool)
+SUPPRESS_UPDATE_MESSAGE_ENV_VAR = EnvEntry("CLEARML_SUPPRESS_UPDATE_MESSAGE", "TRAINS_SUPPRESS_UPDATE_MESSAGE", type=bool)

 # Repository detection
-VCS_REPO_TYPE = EnvEntry("TRAINS_VCS_REPO_TYPE", "ALG_VCS_REPO_TYPE", default="git")
-VCS_REPOSITORY_URL = EnvEntry("TRAINS_VCS_REPO_URL", "ALG_VCS_REPO_URL")
-VCS_COMMIT_ID = EnvEntry("TRAINS_VCS_COMMIT_ID", "ALG_VCS_COMMIT_ID")
-VCS_BRANCH = EnvEntry("TRAINS_VCS_BRANCH", "ALG_VCS_BRANCH")
-VCS_ROOT = EnvEntry("TRAINS_VCS_ROOT", "ALG_VCS_ROOT")
-VCS_STATUS = EnvEntry("TRAINS_VCS_STATUS", "ALG_VCS_STATUS", converter=base64_to_text)
-VCS_DIFF = EnvEntry("TRAINS_VCS_DIFF", "ALG_VCS_DIFF", converter=base64_to_text)
+VCS_REPO_TYPE = EnvEntry("CLEARML_VCS_REPO_TYPE", "TRAINS_VCS_REPO_TYPE", default="git")
+VCS_REPOSITORY_URL = EnvEntry("CLEARML_VCS_REPO_URL", "TRAINS_VCS_REPO_URL")
+VCS_COMMIT_ID = EnvEntry("CLEARML_VCS_COMMIT_ID", "TRAINS_VCS_COMMIT_ID")
+VCS_BRANCH = EnvEntry("CLEARML_VCS_BRANCH", "TRAINS_VCS_BRANCH")
+VCS_ROOT = EnvEntry("CLEARML_VCS_ROOT", "TRAINS_VCS_ROOT")
+VCS_STATUS = EnvEntry("CLEARML_VCS_STATUS", "TRAINS_VCS_STATUS", converter=base64_to_text)
+VCS_DIFF = EnvEntry("CLEARML_VCS_DIFF", "TRAINS_VCS_DIFF", converter=base64_to_text)
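Each renamed `EnvEntry` keeps the old `TRAINS_*` name as a second, fallback key. `EnvEntry`'s exact semantics are internal to clearml, but assuming it resolves the first environment variable that is set, the lookup amounts to:

```python
import os

def lookup_env(*keys, default=None):
    """Return the value of the first environment variable in keys that is set."""
    for key in keys:
        if key in os.environ:
            return os.environ[key]
    return default

# old TRAINS_* name still works as a fallback
os.environ['TRAINS_TASK_ID'] = 'old-id'
print(lookup_env('CLEARML_TASK_ID', 'TRAINS_TASK_ID'))  # 'old-id'

# the new CLEARML_* name wins when both are set
os.environ['CLEARML_TASK_ID'] = 'new-id'
print(lookup_env('CLEARML_TASK_ID', 'TRAINS_TASK_ID'))  # 'new-id'
```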
 6  clearml/datasets/__init__.py  Normal file
@@ -0,0 +1,6 @@
+from .dataset import FileEntry, Dataset
+
+__all__ = [
+    "FileEntry",
+    "Dataset",
+]

 1244  clearml/datasets/dataset.py  Normal file
 File diff suppressed because it is too large
@@ -56,7 +56,7 @@ class LoggerRoot(object):
         # avoid nested imports
         from ..config import get_log_redirect_level

-        LoggerRoot.__base_logger = logging.getLogger('trains')
+        LoggerRoot.__base_logger = logging.getLogger('clearml')
         level = level if level is not None else default_level
         LoggerRoot.__base_logger.setLevel(level)
@@ -145,11 +145,11 @@ def _patch_module(module, prefix='', basepath=None, basemodule=None, exclude_pre
     prefix += module.__name__.split('.')[-1] + '.'

     # Do not patch low level network layer
-    if prefix.startswith('trains.backend_api.session.') and prefix != 'trains.backend_api.session.':
+    if prefix.startswith('clearml.backend_api.session.') and prefix != 'clearml.backend_api.session.':
         if not prefix.endswith('.Session.') and '.token_manager.' not in prefix:
             # print('SKIPPING: {}'.format(prefix))
             return
-    if prefix.startswith('trains.backend_api.services.'):
+    if prefix.startswith('clearml.backend_api.services.'):
         return

     for skip in exclude_prefixes:
@@ -208,7 +208,7 @@ def _patch_module(module, prefix='', basepath=None, basemodule=None, exclude_pre

 def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
     """
-    DEBUG ONLY - Add full Trains package code trace
+    DEBUG ONLY - Add full ClearML package code trace
     Output trace to filename or stream, default is sys.stderr
     Trace level
     -2: Trace function and arguments and returned call
@@ -244,7 +244,7 @@ def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
         __stream_flush = None

     from ..version import __version__
-    msg = 'Trains v{} - Starting Trace\n\n'.format(__version__)
+    msg = 'ClearML v{} - Starting Trace\n\n'.format(__version__)
     # print to actual stderr
     stderr_write(msg)
     # store to stream
@@ -252,7 +252,7 @@ def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
     __stream_write('{:9}:{:5}:{:8}: {:14}\n'.format('seconds', 'pid', 'tid', 'self'))
     __stream_write('{:9}:{:5}:{:8}:{:15}\n'.format('-' * 9, '-' * 5, '-' * 8, '-' * 15))
     __trace_start = time.time()
-    _patch_module('trains', exclude_prefixes=exclude_prefixes or [], only_prefix=only_prefix or [])
+    _patch_module('clearml', exclude_prefixes=exclude_prefixes or [], only_prefix=only_prefix or [])


 def trace_level(level=1):
@@ -343,7 +343,7 @@ def end_of_program():


 if __name__ == '__main__':
-    # from trains import Task
+    # from clearml import Task
     # task = Task.init(project_name="examples", task_name="trace test")
     # trace_trains('_trace.txt', level=2)
     print_traced_files('_trace_*.txt', lines_per_tid=10)
@@ -1,3 +1,3 @@
 class UsageError(RuntimeError):
-    """ An exception raised for illegal usage of trains objects"""
+    """ An exception raised for illegal usage of clearml objects"""
     pass
 4  clearml/external/kerastuner.py  vendored
@@ -14,7 +14,7 @@ try:
 except ImportError:
     pd = None
     from logging import getLogger
-    getLogger('trains.external.kerastuner').warning(
+    getLogger('clearml.external.kerastuner').warning(
         'Pandas is not installed, summary table reporting will be skipped.')


@@ -26,7 +26,7 @@ class TrainsTunerLogger(Logger):
         super(TrainsTunerLogger, self).__init__()
         self.task = task or Task.current_task()
         if not self.task:
-            raise ValueError("Trains Task could not be found, pass in TrainsTunerLogger or "
+            raise ValueError("ClearML Task could not be found, pass in TrainsTunerLogger or "
                              "call Task.init before initializing TrainsTunerLogger")
         self._summary = pd.DataFrame() if pd else None
@@ -36,14 +36,14 @@ if TYPE_CHECKING:

 class Logger(object):
     """
-    The ``Logger`` class is the Trains console log and metric statistics interface, and contains methods for explicit
+    The ``Logger`` class is the ClearML console log and metric statistics interface, and contains methods for explicit
     reporting.

-    Explicit reporting extends Trains automagical capturing of inputs and output. Explicit reporting
+    Explicit reporting extends ClearML automagical capturing of inputs and output. Explicit reporting
     methods include scalar plots, line plots, histograms, confusion matrices, 2D and 3D scatter
     diagrams, text logging, tables, and image uploading and reporting.

-    In the **Trains Web-App (UI)**, ``Logger`` output appears in the **RESULTS** tab, **LOG**, **SCALARS**,
+    In the **ClearML Web-App (UI)**, ``Logger`` output appears in the **RESULTS** tab, **LOG**, **SCALARS**,
     **PLOTS**, and **DEBUG SAMPLES** sub-tabs. When you compare experiments, ``Logger`` output appears in the
     comparisons.

@@ -90,7 +90,7 @@ class Logger(object):
         if self._connect_logging:
             StdStreamPatch.patch_logging_formatter(self)
         elif not self._connect_std_streams:
-            # make sure that at least the main trains logger is connect
+            # make sure that at least the main clearml logger is connected
             base_logger = LoggerRoot.get_base_logger()
             if base_logger and base_logger.handlers:
                 StdStreamPatch.patch_logging_formatter(self, base_logger.handlers[0])
@@ -126,7 +126,7 @@ class Logger(object):

             logger.report_text('log some text', level=logging.DEBUG, print_console=False)

-        You can view the reported text in the **Trains Web-App (UI)**, **RESULTS** tab, **LOG** sub-tab.
+        You can view the reported text in the **ClearML Web-App (UI)**, **RESULTS** tab, **LOG** sub-tab.

         :param str msg: The text to log.
         :param int level: The log level from the Python ``logging`` package. The default value is ``logging.INFO``.
@@ -151,7 +151,7 @@ class Logger(object):
             scalar_series = [random.randint(0,10) for i in range(10)]
             logger.report_scalar(title='scalar metrics', series='series', value=scalar_series[iteration], iteration=0)

-        You can view the scalar plots in the **Trains Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab.
+        You can view the scalar plots in the **ClearML Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab.

         :param str title: The title (metric) of the plot. Plot more than one scalar series on the same plot by using
             the same ``title`` for each call to this method.
@@ -190,7 +190,7 @@ class Logger(object):
             logger.report_vector(title='vector example', series='vector series', values=vector_series, iteration=0,
                 labels=['A','B'], xaxis='X axis label', yaxis='Y axis label')

-        You can view the vectors plots in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
+        You can view the vectors plots in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.

         :param str title: The title (metric) of the plot.
         :param str series: The series name (variant) of the reported histogram.
@@ -237,7 +237,7 @@ class Logger(object):
             logger.report_histogram(title='histogram example', series='histogram series',
                 values=vector_series, iteration=0, labels=['A','B'], xaxis='X axis label', yaxis='Y axis label')

-        You can view the reported histograms in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
+        You can view the reported histograms in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.

         :param str title: The title (metric) of the plot.
         :param str series: The series name (variant) of the reported histogram.
@@ -305,7 +305,7 @@ class Logger(object):

             logger.report_table(title='table example',series='pandas DataFrame',iteration=0,table_plot=df)

-        You can view the reported tables in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
+        You can view the reported tables in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.

         :param str title: The title (metric) of the table.
         :param str series: The series name (variant) of the reported table.
@@ -1022,8 +1022,8 @@ class Logger(object):
         The images are uploaded separately. A link to each image is reported.

         .. note::
-            Credentials for the destination storage are specified in the Trains configuration file,
-            ``~/trains.conf``.
+            Credentials for the destination storage are specified in the ClearML configuration file,
+            ``~/clearml.conf``.

         :param str uri: example: 's3://bucket/directory/' or 'file:///tmp/debug/'

@@ -1150,7 +1150,7 @@ class Logger(object):
             The values are:

             - ``True`` - Scalars without specific titles are grouped together in the "Scalars" plot, preserving
-              backward compatibility with Trains automagical behavior.
+              backward compatibility with ClearML automagical behavior.
             - ``False`` - TensorBoard scalars without titles get a title/series with the same tag. (default)
         :type group_scalars: bool
         """
@@ -1214,7 +1214,7 @@ class Logger(object):
             try:
                 # make sure we are writing to the original stdout
                 StdStreamPatch.stderr_original_write(
-                    'trains.Logger failed sending log [level {}]: "{}"\n'.format(level, msg))
+                    'clearml.Logger failed sending log [level {}]: "{}"\n'.format(level, msg))
             except Exception:
                 pass
         else:
@@ -472,9 +472,9 @@ class InputModel(Model):
         framework, and indicate whether to immediately set the model's status to ``Published``.
         The model is read-only.

-        The **Trains Server** (backend) may already store the model's URL. If the input model's URL is not
-        stored, meaning the model is new, then it is imported and Trains stores its metadata.
-        If the URL is already stored, the import process stops, Trains issues a warning message, and Trains
+        The **ClearML Server** (backend) may already store the model's URL. If the input model's URL is not
+        stored, meaning the model is new, then it is imported and ClearML stores its metadata.
+        If the URL is already stored, the import process stops, ClearML issues a warning message, and ClearML
         reuses the model.

         In your Python experiment script, after importing the model, you can connect it to the main execution
@@ -482,12 +482,12 @@ class InputModel(Model):
         network.

         .. note::
-            Using the **Trains Web-App** (user interface), you can reuse imported models and switch models in
+            Using the **ClearML Web-App** (user interface), you can reuse imported models and switch models in
             experiments.

-        :param str weights_url: A valid URL for the initial weights file. If the **Trains Web-App** (backend)
+        :param str weights_url: A valid URL for the initial weights file. If the **ClearML Web-App** (backend)
             already stores the metadata of a model with the same URL, that existing model is returned
-            and Trains ignores all other parameters.
+            and ClearML ignores all other parameters.

             For example:

@@ -715,7 +715,7 @@ class InputModel(Model):
     def __init__(self, model_id):
         # type: (str) -> None
         """
-        :param str model_id: The Trains Id (system UUID) of the input model whose metadata the **Trains Server**
+        :param str model_id: The ClearML Id (system UUID) of the input model whose metadata the **ClearML Server**
             (backend) stores.
         """
         super(InputModel, self).__init__(model_id)
@@ -731,16 +731,16 @@ class InputModel(Model):
         Connect the current model to a Task object, if the model is preexisting. Preexisting models include:

         - Imported models (InputModel objects created using the :meth:`Logger.import_model` method).
-        - Models whose metadata is already in the Trains platform, meaning the InputModel object is instantiated
-          from the ``InputModel`` class specifying the the model's Trains Id as an argument.
-        - Models whose origin is not Trains that are used to create an InputModel object. For example,
+        - Models whose metadata is already in the ClearML platform, meaning the InputModel object is instantiated
+          from the ``InputModel`` class specifying the model's ClearML Id as an argument.
+        - Models whose origin is not ClearML that are used to create an InputModel object. For example,
           models created using TensorFlow models.

         When the experiment is executed remotely in a worker, the input model already specified in the experiment is
         used.

         .. note::
-            The **Trains Web-App** allows you to switch one input model for another and then enqueue the experiment
+            The **ClearML Web-App** allows you to switch one input model for another and then enqueue the experiment
             to execute in a worker.

         :param object task: A Task object.
@@ -789,7 +789,7 @@ class OutputModel(BaseModel):

     .. note::
         When executing a Task (experiment) remotely in a worker, you can modify the model configuration and / or model's
-        label enumeration using the **Trains Web-App**.
+        label enumeration using the **ClearML Web-App**.
     """

     @property
@@ -990,7 +990,7 @@ class OutputModel(BaseModel):
         Connect the current model to a Task object, if the model is a preexisting model. Preexisting models include:

         - Imported models.
-        - Models whose metadata the **Trains Server** (backend) is already storing.
+        - Models whose metadata the **ClearML Server** (backend) is already storing.
         - Models from another source, such as frameworks like TensorFlow.

         :param object task: A Task object.
@@ -1044,8 +1044,8 @@ class OutputModel(BaseModel):
         Using this method, files uploads are separate and then a link to each is stored in the model object.

         .. note::
-            For storage requiring credentials, the credentials are stored in the Trains configuration file,
-            ``~/trains.conf``.
+            For storage requiring credentials, the credentials are stored in the ClearML configuration file,
+            ``~/clearml.conf``.

         :param str uri: The URI of the upload storage destination.
@@ -58,10 +58,10 @@ class CacheManager(object):
         return cached_file

     @staticmethod
-    def upload_file(local_file, remote_url, wait_for_upload=True):
+    def upload_file(local_file, remote_url, wait_for_upload=True, retries=1):
         helper = StorageHelper.get(remote_url)
         result = helper.upload(
-            local_file, remote_url, async_enable=not wait_for_upload
+            local_file, remote_url, async_enable=not wait_for_upload, retries=retries,
         )
         CacheManager._add_remote_url(remote_url, local_file)
         return result

@@ -681,6 +681,7 @@ class StorageHelper(object):
         # try to get file size
         try:
             if isinstance(self._driver, _HttpDriver) and obj:
+                obj = self._driver._get_download_object(obj)
                 total_size_mb = float(obj.headers.get('Content-Length', 0)) / (1024 * 1024)
             elif hasattr(obj, 'size'):
                 size = obj.size
@@ -785,12 +786,12 @@ class StorageHelper(object):
     def check_write_permissions(self, dest_path=None):
         # create a temporary file, then delete it
         base_url = dest_path or self._base_url
-        dest_path = base_url + '/.trains.test'
+        dest_path = base_url + '/.clearml.test'
        # do not check http/s connection permissions
         if dest_path.startswith('http'):
             return True
         try:
-            self.upload_from_stream(stream=six.BytesIO(b'trains'), dest_path=dest_path)
+            self.upload_from_stream(stream=six.BytesIO(b'clearml'), dest_path=dest_path)
             self.delete(path=dest_path)
         except Exception:
             raise ValueError('Insufficient permissions for {}'.format(base_url))
@@ -1024,6 +1025,11 @@ class _HttpDriver(_Driver):
             return self._default_backend_session.add_auth_headers({})
         return None

+    class _HttpSessionHandle(object):
+        def __init__(self, url, is_stream, container_name, object_name):
+            self.url, self.is_stream, self.container_name, self.object_name = \
+                url, is_stream, container_name, object_name
+
     def __init__(self, retries=5):
         self._retries = retries
         self._containers = {}
@@ -1055,24 +1061,39 @@ class _HttpDriver(_Driver):
     def list_container_objects(self, *args, **kwargs):
         raise NotImplementedError('List is not implemented for http protocol')

-    def delete_object(self, *args, **kwargs):
-        raise NotImplementedError('Delete is not implemented for http protocol')
+    def delete_object(self, obj, *args, **kwargs):
+        assert isinstance(obj, self._HttpSessionHandle)
+        container = self._containers[obj.container_name]
+        res = container.session.delete(obj.url, headers=container.get_headers(obj.url))
+        if res.status_code != requests.codes.ok:
+            raise ValueError('Failed deleting object %s (%d): %s' % (obj.object_name, res.status_code, res.text))
+        return res

     def get_object(self, container_name, object_name, *args, **kwargs):
         container = self._containers[container_name]
-        # set stream flag before get request
-        container.session.stream = kwargs.get('stream', True)
+        is_stream = kwargs.get('stream', True)
         url = ''.join((container_name, object_name.lstrip('/')))
-        res = container.session.get(url, timeout=self.timeout, headers=container.get_headers(url))
-        if res.status_code != requests.codes.ok:
-            raise ValueError('Failed getting object %s (%d): %s' % (object_name, res.status_code, res.text))
-        return res
+        return self._HttpSessionHandle(url, is_stream, container_name, object_name)
+
+    def _get_download_object(self, obj):
+        # bypass for session result
+        if not isinstance(obj, self._HttpSessionHandle):
+            return obj
+
+        container = self._containers[obj.container_name]
+        # set stream flag before we send the request
+        container.session.stream = obj.is_stream
+        res = container.session.get(obj.url, timeout=self.timeout, headers=container.get_headers(obj.url))
+        if res.status_code != requests.codes.ok:
+            raise ValueError('Failed getting object %s (%d): %s' % (obj.object_name, res.status_code, res.text))
+        return res

     def download_object_as_stream(self, obj, chunk_size=64 * 1024, **_):
         # return iterable object
+        obj = self._get_download_object(obj)
         return obj.iter_content(chunk_size=chunk_size)

     def download_object(self, obj, local_path, overwrite_existing=True, delete_on_failure=True, callback=None, **_):
+        obj = self._get_download_object(obj)
         p = Path(local_path)
         if not overwrite_existing and p.is_file():
             log.warning('failed saving after download: overwrite=False and file exists (%s)' % str(p))

@@ -48,8 +48,8 @@ class StorageManager(object):

     @classmethod
     def upload_file(
-        cls, local_file, remote_url, wait_for_upload=True
-    ):  # type: (str, str, bool) -> str
+        cls, local_file, remote_url, wait_for_upload=True, retries=1
+    ):  # type: (str, str, bool, int) -> str
         """
         Upload a local file to a remote location. remote url is the final destination of the uploaded file.

@@ -64,12 +64,14 @@ class StorageManager(object):
         :param str local_file: Full path of a local file to be uploaded
         :param str remote_url: Full path or remote url to upload to (including file name)
         :param bool wait_for_upload: If False, return immediately and upload in the background. Default True.
+        :param int retries: Number of retries before failing to upload file, default 1.
         :return: Newly uploaded remote URL.
         """
         return CacheManager.get_cache_manager().upload_file(
             local_file=local_file,
             remote_url=remote_url,
             wait_for_upload=wait_for_upload,
             retries=retries,
         )

     @classmethod
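The new `retries` parameter threads from `StorageManager.upload_file` through `CacheManager` down to `StorageHelper.upload`, where the retry handling itself lives. The general pattern it enables is just a bounded retry loop; a generic sketch (not the actual helper code):

```python
def upload_with_retries(upload_fn, retries=1):
    """Call upload_fn until it succeeds, re-raising the last error after the final attempt."""
    last_error = None
    for _attempt in range(max(1, retries)):
        try:
            return upload_fn()
        except Exception as ex:
            last_error = ex
    raise last_error

# demo: a fake upload that fails twice, then succeeds on the third attempt
attempts = []
def flaky_upload():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError('transient failure')
    return 'remote://bucket/file'

print(upload_with_retries(flaky_upload, retries=3))  # 'remote://bucket/file'
```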
@@ -1,6 +1,6 @@
 import hashlib
 import sys
-from typing import Optional
+from typing import Optional, Union

 from six.moves.urllib.parse import quote, urlparse, urlunparse
 import six
@@ -72,8 +72,23 @@ def sha256sum(filename, skip_header=0, block_size=65536):
     return h.hexdigest(), file_hash.hexdigest() if skip_header else None


+def md5text(text, seed=1337):
+    # type: (str, Union[int, str]) -> str
+    """
+    Return the md5 hash of a string.
+    Do not use this hash for security; if needed, use something stronger like SHA-2.
+
+    :param text: string to hash
+    :param seed: use prefix seed for hashing
+    :return: md5 string
+    """
+    h = hashlib.md5()
+    h.update((str(seed) + str(text)).encode('utf-8'))
+    return h.hexdigest()
+
+
 def is_windows():
     """
     :return: True if currently running on windows OS
     """
-    return sys.platform == 'win32'
+    return sys.platform == 'win32'
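The new `md5text` helper simply prefixes the seed before hashing, so its output can be reproduced directly with `hashlib` (restated here for illustration):

```python
import hashlib

def md5text(text, seed=1337):
    # identical logic to the helper added in the diff above
    h = hashlib.md5()
    h.update((str(seed) + str(text)).encode('utf-8'))
    return h.hexdigest()

# equivalent one-liner: hash the seed-prefixed string
direct = hashlib.md5('1337hello'.encode('utf-8')).hexdigest()
print(md5text('hello') == direct)  # True
```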
275
clearml/task.py
275
clearml/task.py
@@ -77,8 +77,8 @@ class Task(_Task):
     configuration, label enumeration, models, and other artifacts.

     The term "main execution Task" refers to the Task context for current running experiment. Python experiment scripts
-    can create one, and only one, main execution Task. It is a traceable, and after a script runs and Trains stores
-    the Task in the **Trains Server** (backend), it is modifiable, reproducible, executable by a worker, and you
+    can create one, and only one, main execution Task. It is a traceable, and after a script runs and ClearML stores
+    the Task in the **ClearML Server** (backend), it is modifiable, reproducible, executable by a worker, and you
     can duplicate it for further experimentation.

     The ``Task`` class and its methods allow you to create and manage experiments, as well as perform
@@ -93,7 +93,7 @@ class Task(_Task):
     - Create a new reproducible Task - :meth:`Task.init`

     .. important::
-        In some cases, ``Task.init`` may return a Task object which is already stored in **Trains Server** (already
+        In some cases, ``Task.init`` may return a Task object which is already stored in **ClearML Server** (already
         initialized), instead of creating a new Task. For a detailed explanation of those cases, see the ``Task.init``
         method.
@@ -102,17 +102,17 @@ class Task(_Task):
     - Get another (different) Task - :meth:`Task.get_task`

     .. note::
-        The **Trains** documentation often refers to a Task as, "Task (experiment)".
+        The **ClearML** documentation often refers to a Task as, "Task (experiment)".

-        "Task" refers to the class in the Trains Python Client Package, the object in your Python experiment script,
-        and the entity with which **Trains Server** and **Trains Agent** work.
+        "Task" refers to the class in the ClearML Python Client Package, the object in your Python experiment script,
+        and the entity with which **ClearML Server** and **ClearML Agent** work.

         "Experiment" refers to your deep learning solution, including its connected components, inputs, and outputs,
-        and is the experiment you can view, analyze, compare, modify, duplicate, and manage using the Trains
+        and is the experiment you can view, analyze, compare, modify, duplicate, and manage using the ClearML
         **Web-App** (UI).

         Therefore, a "Task" is effectively an "experiment", and "Task (experiment)" encompasses its usage throughout
-        the Trains.
+        the ClearML.

         The exception to this Task behavior is sub-tasks (non-reproducible Tasks), which do not use the main execution
         Task. Creating a sub-task always creates a new Task with a new Task ID.
@@ -197,7 +197,7 @@ class Task(_Task):
        Creates a new Task (experiment) if:

        - The Task never ran before. No Task with the same ``task_name`` and ``project_name`` is stored in
-          **Trains Server**.
+          **ClearML Server**.
        - The Task has run before (the same ``task_name`` and ``project_name``), and (a) it stored models and / or
          artifacts, or (b) its status is Published , or (c) it is Archived.
        - A new Task is forced by calling ``Task.init`` with ``reuse_last_task_id=False``.
@@ -215,7 +215,7 @@ class Task(_Task):

        .. code-block:: py

-            from trains import Task
+            from clearml import Task
             task = Task.init('myProject', 'myTask')

        If this code runs again, it will not create a new Task. It does not store a model or artifact,
@@ -285,7 +285,7 @@ class Task(_Task):
            This is equivalent to `continue_last_task=True` and `reuse_last_task_id=a_task_id_string`.

        :param str output_uri: The default location for output models and other artifacts. In the default location,
-            Trains creates a subfolder for the output. The subfolder structure is the following:
+            ClearML creates a subfolder for the output. The subfolder structure is the following:

            <output destination name> / <project name> / <task name>.< Task ID>

@@ -297,9 +297,9 @@ class Task(_Task):
            - Azure Storage: ``azure://company.blob.core.windows.net/folder/``

            .. important::
-                For cloud storage, you must install the **Trains** package for your cloud storage type,
+                For cloud storage, you must install the **ClearML** package for your cloud storage type,
                 and then configure your storage credentials. For detailed information, see
-                `Trains Python Client Extras <./references/trains_extras_storage/>`_ in the "Trains Python Client
+                `ClearML Python Client Extras <./references/clearml_extras_storage/>`_ in the "ClearML Python Client
                 Reference" section.

        :param auto_connect_arg_parser: Automatically connect an argparse object to the Task
@@ -324,7 +324,7 @@ class Task(_Task):

        :param auto_connect_frameworks: Automatically connect frameworks This includes patching MatplotLib, XGBoost,
            scikit-learn, Keras callbacks, and TensorBoard/X to serialize plots, graphs, and the model location to
-            the **Trains Server** (backend), in addition to original output destination.
+            the **ClearML Server** (backend), in addition to original output destination.

            The values are:

@@ -342,7 +342,7 @@ class Task(_Task):
                'xgboost': True, 'scikit': True, 'fastai': True, 'lightgbm': True, 'hydra': True}

        :param bool auto_resource_monitoring: Automatically create machine resource monitoring plots
-            These plots appear in the **Trains Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab,
+            These plots appear in the **ClearML Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab,
            with a title of **:resource monitor:**.

            The values are:
@@ -409,6 +409,7 @@ class Task(_Task):
            # create a new logger (to catch stdout/err)
            cls.__main_task._logger = None
            cls.__main_task.__reporter = None
+            # noinspection PyProtectedMember
            cls.__main_task._get_logger(auto_connect_streams=auto_connect_streams)
            cls.__main_task._artifacts_manager = Artifacts(cls.__main_task)
            # unregister signal hooks, they cause subprocess to hang
@@ -569,10 +570,10 @@ class Task(_Task):
        # show the debug metrics page in the log, it is very convenient
        if not is_sub_process_task_id:
            if cls._offline_mode:
-                logger.report_text('TRAINS running in offline mode, session stored in {}'.format(
+                logger.report_text('ClearML running in offline mode, session stored in {}'.format(
                    task.get_offline_mode_folder()))
            else:
-                logger.report_text('TRAINS results page: {}'.format(task.get_output_log_web_page()))
+                logger.report_text('ClearML results page: {}'.format(task.get_output_log_web_page()))
        # Make sure we start the dev worker if required, otherwise it will only be started when we write
        # something to the log.
        task._dev_mode_task_start()
@@ -580,55 +581,76 @@ class Task(_Task):
        return task

    @classmethod
-    def create(cls, project_name=None, task_name=None, task_type=TaskTypes.training):
-        # type: (Optional[str], Optional[str], Task.TaskTypes) -> Task
+    def create(
+            cls,
+            project_name=None,  # Optional[str]
+            task_name=None,  # Optional[str]
+            task_type=None,  # Optional[str]
+            repo=None,  # Optional[str]
+            branch=None,  # Optional[str]
+            commit=None,  # Optional[str]
+            script=None,  # Optional[str]
+            working_directory=None,  # Optional[str]
+            packages=None,  # Optional[Sequence[str]]
+            requirements_file=None,  # Optional[Union[str, Path]]
+            docker=None,  # Optional[str]
+            base_task_id=None,  # Optional[str]
+            add_task_init_call=True,  # bool
+    ):
+        # type: (...) -> Task
        """
-        Create a new, non-reproducible Task (experiment). This is called a sub-task.
+        Manually create and populate a new Task (experiment) in the system.
+        If the code does not already contain a call to ``Task.init``, pass add_task_init_call=True,
+        and the code will be patched in remote execution (i.e. when executed by `clearml-agent`).

        .. note::
-           This method always creates a new, non-reproducible Task. To create a reproducible Task, call the
-           :meth:`Task.init` method. To reference another Task, call the :meth:`Task.get_task` method.
+           This method **always** creates a new Task.
+           Use the :meth:`Task.init` method to automatically create and populate the task for the running process.
+           To reference an existing Task, call the :meth:`Task.get_task` method.

-        :param str project_name: The name of the project in which the experiment will be created.
-            If ``project_name`` is ``None``, and the main execution Task is initialized (see :meth:`Task.init`),
-            then the main execution Task's project is used. Otherwise, if the project does
-            not exist, it is created. (Optional)
-        :param str task_name: The name of Task (experiment).
-        :param TaskTypes task_type: The task type.
+        :param project_name: Set the project name for the task. Required if base_task_id is None.
+        :param task_name: Set the name of the remote task. Required if base_task_id is None.
+        :param task_type: Optional, the task type to be created. Supported values: 'training', 'testing', 'inference',
+            'data_processing', 'application', 'monitor', 'controller', 'optimizer', 'service', 'qc', 'custom'
+        :param repo: Remote URL for the repository to use, or path to local copy of the git repository
+            Example: 'https://github.com/allegroai/clearml.git' or '~/project/repo'
+        :param branch: Select specific repository branch/tag (implies the latest commit from the branch)
+        :param commit: Select specific commit id to use (default: latest commit,
+            or when used with local repository matching the local commit id)
+        :param script: Specify the entry point script for the remote execution. When used in tandem with
+            remote git repository the script should be a relative path inside the repository,
+            for example: './source/train.py' . When used with local repository path it supports a
+            direct path to a file inside the local repository itself, for example: '~/project/source/train.py'
+        :param working_directory: Working directory to launch the script from. Default: repository root folder.
+            Relative to repo root or local folder.
+        :param packages: Manually specify a list of required packages. Example: ["tqdm>=2.1", "scikit-learn"]
+        :param requirements_file: Specify requirements.txt file to install when setting the session.
+            If not provided, the requirements.txt from the repository will be used.
+        :param docker: Select the docker image to be executed in by the remote session
+        :param base_task_id: Use a pre-existing task in the system, instead of a local repo/script.
+            Essentially clones an existing task and overrides arguments/requirements.
+        :param add_task_init_call: If True, a 'Task.init()' call is added to the script entry point in remote execution.

-        Valid task types:
-
-        - ``TaskTypes.training`` (default)
-        - ``TaskTypes.testing``
-        - ``TaskTypes.inference``
-        - ``TaskTypes.data_processing``
-        - ``TaskTypes.application``
-        - ``TaskTypes.monitor``
-        - ``TaskTypes.controller``
-        - ``TaskTypes.optimizer``
-        - ``TaskTypes.service``
-        - ``TaskTypes.qc``
-        - ``TaskTypes.custom``
-
-        :return: A new experiment.
+        :return: The newly created Task (experiment)
        """
-        if not project_name:
+        if not project_name and not base_task_id:
            if not cls.__main_task:
                raise ValueError("Please provide project_name, no global task context found "
                                 "(Task.current_task hasn't been called)")
            project_name = cls.__main_task.get_project_name()
+        from .backend_interface.task.populate import CreateAndPopulate
+        manual_populate = CreateAndPopulate(
+            project_name=project_name, task_name=task_name, task_type=task_type,
+            repo=repo, branch=branch, commit=commit,
+            script=script, working_directory=working_directory,
+            packages=packages, requirements_file=requirements_file,
+            docker=docker,
+            base_task_id=base_task_id,
+            add_task_init_call=add_task_init_call,
+            raise_on_missing_entries=False,
+        )
+        task = manual_populate.create_task()

-        try:
-            task = cls(
-                private=cls.__create_protection,
-                project_name=project_name,
-                task_name=task_name,
-                task_type=task_type,
-                log_to_backend=False,
-                force_create=True,
-            )
-        except Exception:
-            raise
        return task

    @classmethod
@@ -721,7 +743,7 @@ class Task(_Task):
            helper = StorageHelper.get(value)
            if not helper:
                raise ValueError("Could not get access credentials for '{}' "
-                                ", check configuration file ~/trains.conf".format(value))
+                                ", check configuration file ~/clearml.conf".format(value))
            helper.check_write_permissions(value)
            self.storage_uri = value

@@ -758,7 +780,7 @@ class Task(_Task):
        """
        Get a Logger object for reporting, for this task context. You can view all Logger report output associated with
        the Task for which this method is called, including metrics, plots, text, tables, and images, in the
-        **Trains Web-App (UI)**.
+        **ClearML Web-App (UI)**.

        :return: The Logger object for the current Task (experiment).
        """
@@ -796,8 +818,8 @@ class Task(_Task):
        """
        assert isinstance(source_task, (six.string_types, Task))
        if not Session.check_min_api_version('2.4'):
-            raise ValueError("Trains-server does not support DevOps features, "
-                             "upgrade trains-server to 0.12.0 or above")
+            raise ValueError("ClearML-server does not support DevOps features, "
+                             "upgrade clearml-server to 0.12.0 or above")

        task_id = source_task if isinstance(source_task, six.string_types) else source_task.id
        if not parent:
@@ -820,7 +842,7 @@ class Task(_Task):

        .. note::
           A worker daemon must be listening at the queue for the worker to fetch the Task and execute it,
-           see `Use Case Examples <../trains_agent_ref/#use-case-examples>`_ on the "Trains Agent
+           see `Use Case Examples <../clearml_agent_ref/#use-case-examples>`_ on the "ClearML Agent
           Reference page.

        :param Task/str task: The Task to enqueue. Specify a Task object or Task ID.
@@ -859,8 +881,8 @@ class Task(_Task):
        """
        assert isinstance(task, (six.string_types, Task))
        if not Session.check_min_api_version('2.4'):
-            raise ValueError("Trains-server does not support DevOps features, "
-                             "upgrade trains-server to 0.12.0 or above")
+            raise ValueError("ClearML-server does not support DevOps features, "
+                             "upgrade clearml-server to 0.12.0 or above")

        # make sure we have either a name or an id
        mutually_exclusive(queue_name=queue_name, queue_id=queue_id)
@@ -923,8 +945,8 @@ class Task(_Task):
        """
        assert isinstance(task, (six.string_types, Task))
        if not Session.check_min_api_version('2.4'):
-            raise ValueError("Trains-server does not support DevOps features, "
-                             "upgrade trains-server to 0.12.0 or above")
+            raise ValueError("ClearML-server does not support DevOps features, "
+                             "upgrade clearml-server to 0.12.0 or above")

        task_id = task if isinstance(task, six.string_types) else task.id
        session = cls._get_default_session()
@@ -990,7 +1012,7 @@ class Task(_Task):
            name = self._default_configuration_section_name

        if not multi_config_support and name and name != self._default_configuration_section_name:
-            raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
+            raise ValueError("Multiple configurations is not supported with the current 'clearml-server', "
                             "please upgrade to the latest version")

        for mutable_type, method in dispatch:
@@ -1024,11 +1046,11 @@ class Task(_Task):
        :param configuration: The configuration. This is usually the configuration used in the model training process.
            Specify one of the following:

-            - A dictionary - A dictionary containing the configuration. Trains stores the configuration in
-                the **Trains Server** (backend), in a HOCON format (JSON-like format) which is editable.
-            - A ``pathlib2.Path`` string - A path to the configuration file. Trains stores the content of the file.
+            - A dictionary - A dictionary containing the configuration. ClearML stores the configuration in
+                the **ClearML Server** (backend), in a HOCON format (JSON-like format) which is editable.
+            - A ``pathlib2.Path`` string - A path to the configuration file. ClearML stores the content of the file.
                A local path must be relative path. When executing a Task remotely in a worker, the contents brought
-                from the **Trains Server** (backend) overwrites the contents of the file.
+                from the **ClearML Server** (backend) overwrites the contents of the file.

        :param str name: Configuration section name. default: 'General'
            Allowing users to store multiple configuration dicts/files
@@ -1038,10 +1060,10 @@ class Task(_Task):
        :return: If a dictionary is specified, then a dictionary is returned. If pathlib2.Path / string is
            specified, then a path to a local configuration file is returned. Configuration object.
        """
-        pathlib_Path = None
+        pathlib_Path = None  # noqa
        if not isinstance(configuration, (dict, Path, six.string_types)):
            try:
-                from pathlib import Path as pathlib_Path
+                from pathlib import Path as pathlib_Path  # noqa
            except ImportError:
                pass
        if not pathlib_Path or not isinstance(configuration, pathlib_Path):
@@ -1053,7 +1075,7 @@ class Task(_Task):
            name = self._default_configuration_section_name

        if not multi_config_support and name and name != self._default_configuration_section_name:
-            raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
+            raise ValueError("Multiple configurations is not supported with the current 'clearml-server', "
                             "please upgrade to the latest version")

        # parameter dictionary
@@ -1141,7 +1163,7 @@ class Task(_Task):
            return configuration

        configuration_path = Path(configuration)
-        fd, local_filename = mkstemp(prefix='trains_task_config_',
+        fd, local_filename = mkstemp(prefix='clearml_task_config_',
                                     suffix=configuration_path.suffixes[-1] if
                                     configuration_path.suffixes else '.txt')
        os.write(fd, configuration_text.encode('utf-8'))
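The hunk above materializes configuration text into a temporary file whose suffix follows the original configuration file (falling back to `.txt`). That pattern can be sketched standalone with the standard library (the function name is illustrative, not the library's API):

```python
import os
from pathlib import Path
from tempfile import mkstemp


def write_config_copy(configuration_path, configuration_text):
    """Write configuration text to a temp file, keeping the original file's suffix."""
    suffixes = Path(configuration_path).suffixes
    # mkstemp creates the file securely and returns an open descriptor plus its path
    fd, local_filename = mkstemp(
        prefix='clearml_task_config_',
        suffix=suffixes[-1] if suffixes else '.txt')
    try:
        os.write(fd, configuration_text.encode('utf-8'))
    finally:
        os.close(fd)
    return local_filename
```

A caller gets back a path such as `/tmp/clearml_task_config_xxxx.yaml` for a `model.yaml` input, or `...xxxx.txt` when the source file has no extension.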
@@ -1187,7 +1209,7 @@ class Task(_Task):
        """
        Get a Logger object for reporting, for this task context. You can view all Logger report output associated with
        the Task for which this method is called, including metrics, plots, text, tables, and images, in the
-        **Trains Web-App (UI)**.
+        **ClearML Web-App (UI)**.

        :return: The Logger for the Task (experiment).
        """
@@ -1247,7 +1269,7 @@ class Task(_Task):
    def reset(self, set_started_on_success=False, force=False):
        # type: (bool, bool) -> None
        """
-        Reset a Task. Trains reloads a Task after a successful reset.
+        Reset a Task. ClearML reloads a Task after a successful reset.
        When a worker executes a Task remotely, the Task does not reset unless
        the ``force`` parameter is set to ``True`` (this avoids accidentally clearing logs and metrics).

@@ -1290,16 +1312,16 @@ class Task(_Task):
        # type: (str, pandas.DataFrame, Dict, Union[bool, Sequence[str]]) -> None
        """
        Register (add) an artifact for the current Task. Registered artifacts are dynamically synchronized with the
-        **Trains Server** (backend). If a registered artifact is updated, the update is stored in the
-        **Trains Server** (backend). Registered artifacts are primarily used for Data Audition.
+        **ClearML Server** (backend). If a registered artifact is updated, the update is stored in the
+        **ClearML Server** (backend). Registered artifacts are primarily used for Data Audition.

        The currently supported registered artifact object type is a pandas.DataFrame.

        See also :meth:`Task.unregister_artifact` and :meth:`Task.get_registered_artifacts`.

        .. note::
-           Trains also supports uploaded artifacts which are one-time uploads of static artifacts that are not
-           dynamically synchronized with the **Trains Server** (backend). These static artifacts include
+           ClearML also supports uploaded artifacts which are one-time uploads of static artifacts that are not
+           dynamically synchronized with the **ClearML Server** (backend). These static artifacts include
           additional object types. For more information, see :meth:`Task.upload_artifact`.

        :param str name: The name of the artifact.
@@ -1308,7 +1330,7 @@ class Task(_Task):
            If an artifact with the same name was previously registered, it is overwritten.
        :param object artifact: The artifact object.
        :param dict metadata: A dictionary of key-value pairs for any metadata. This dictionary appears with the
-            experiment in the **Trains Web-App (UI)**, **ARTIFACTS** tab.
+            experiment in the **ClearML Web-App (UI)**, **ARTIFACTS** tab.
        :param uniqueness_columns: A Sequence of columns for artifact uniqueness comparison criteria, or the default
            value of ``True``. If ``True``, the artifact uniqueness comparison criteria is all the columns,
            which is the same as ``artifact.columns``.
@@ -1323,13 +1345,13 @@ class Task(_Task):
    def unregister_artifact(self, name):
        # type: (str) -> None
        """
-        Unregister (remove) a registered artifact. This removes the artifact from the watch list that Trains uses
-        to synchronize artifacts with the **Trains Server** (backend).
+        Unregister (remove) a registered artifact. This removes the artifact from the watch list that ClearML uses
+        to synchronize artifacts with the **ClearML Server** (backend).

        .. important::
-           - Calling this method does not remove the artifact from a Task. It only stops Trains from
+           - Calling this method does not remove the artifact from a Task. It only stops ClearML from
             monitoring the artifact.
-           - When this method is called, Trains immediately takes the last snapshot of the artifact.
+           - When this method is called, ClearML immediately takes the last snapshot of the artifact.
        """
        self._artifacts_manager.unregister_artifact(name=name)

@@ -1361,12 +1383,12 @@ class Task(_Task):

        The currently supported upload (static) artifact types include:

-        - string / pathlib2.Path - A path to artifact file. If a wildcard or a folder is specified, then Trains
+        - string / pathlib2.Path - A path to artifact file. If a wildcard or a folder is specified, then ClearML
          creates and uploads a ZIP file.
-        - dict - Trains stores a dictionary as ``.json`` file and uploads it.
-        - pandas.DataFrame - Trains stores a pandas.DataFrame as ``.csv.gz`` (compressed CSV) file and uploads it.
-        - numpy.ndarray - Trains stores a numpy.ndarray as ``.npz`` file and uploads it.
-        - PIL.Image - Trains stores a PIL.Image as ``.png`` file and uploads it.
+        - dict - ClearML stores a dictionary as ``.json`` file and uploads it.
+        - pandas.DataFrame - ClearML stores a pandas.DataFrame as ``.csv.gz`` (compressed CSV) file and uploads it.
+        - numpy.ndarray - ClearML stores a numpy.ndarray as ``.npz`` file and uploads it.
+        - PIL.Image - ClearML stores a PIL.Image as ``.png`` file and uploads it.
        - Any - If called with auto_pickle=True, the object will be pickled and uploaded.

        :param str name: The artifact name.
@@ -1376,7 +1398,7 @@ class Task(_Task):

        :param object artifact_object: The artifact object.
        :param dict metadata: A dictionary of key-value pairs for any metadata. This dictionary appears with the
-            experiment in the **Trains Web-App (UI)**, **ARTIFACTS** tab.
+            experiment in the **ClearML Web-App (UI)**, **ARTIFACTS** tab.
        :param bool delete_after_upload: After the upload, delete the local copy of the artifact

            - ``True`` - Delete the local copy of the artifact.
@@ -1416,7 +1438,7 @@ class Task(_Task):

        .. code-block:: py

-            {'input': [trains.Model()], 'output': [trains.Model()]}
+            {'input': [clearml.Model()], 'output': [clearml.Model()]}

        """
        task_models = {'input': self._get_models(model_type='input'),
@@ -1510,7 +1532,7 @@ class Task(_Task):

        .. note::
           The maximum reported iteration is not in the local cache. This method
-           sends a request to the **Trains Server** (backend).
+           sends a request to the **ClearML Server** (backend).

        :return: The last reported iteration number.
        """
@@ -1704,7 +1726,7 @@ class Task(_Task):
        # type: (str) -> ()
        """
        Set the base docker image for this experiment
-        If provided, this value will be used by trains-agent to execute this experiment
+        If provided, this value will be used by clearml-agent to execute this experiment
        inside the provided docker image.
        """
        if not self.running_locally() and self.is_main_task():
@@ -1732,12 +1754,12 @@ class Task(_Task):
    def execute_remotely(self, queue_name=None, clone=False, exit_process=True):
        # type: (Optional[str], bool, bool) -> Optional[Task]
        """
-        If task is running locally (i.e., not by ``trains-agent``), then clone the Task and enqueue it for remote
+        If task is running locally (i.e., not by ``clearml-agent``), then clone the Task and enqueue it for remote
        execution; or, stop the execution of the current Task, reset its state, and enqueue it. If ``exit==True``,
        *exit* this process.

        .. note::
-           If the task is running remotely (i.e., ``trains-agent`` is executing it), this call is a no-op
+           If the task is running remotely (i.e., ``clearml-agent`` is executing it), this call is a no-op
           (i.e., does nothing).

        :param queue_name: The queue name used for enqueueing the task. If ``None``, this call exits the process
@@ -2006,12 +2028,12 @@ class Task(_Task):
        :param session_folder_zip: Path to a folder containing the session, or zip-file of the session folder.
        :return: Newly created task ID (str)
        """
-        print('TRAINS: Importing offline session from {}'.format(session_folder_zip))
+        print('ClearML: Importing offline session from {}'.format(session_folder_zip))

        temp_folder = None
        if Path(session_folder_zip).is_file():
            # unzip the file:
-            temp_folder = mkdtemp(prefix='trains-offline-')
+            temp_folder = mkdtemp(prefix='clearml-offline-')
            ZipFile(session_folder_zip).extractall(path=temp_folder)
            session_folder_zip = temp_folder
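The `import_offline_session` hunk accepts either a session folder or a zip archive of one, and normalizes both to a folder path before reading the session. That folder-or-zip normalization can be sketched with the standard library alone (the function name is illustrative):

```python
from pathlib import Path
from tempfile import mkdtemp
from zipfile import ZipFile


def resolve_session_folder(session_folder_zip):
    """Accept either a session folder or a zip of one; always return a folder path."""
    if Path(session_folder_zip).is_file():
        # unzip the archive into a scratch folder and work from there
        temp_folder = mkdtemp(prefix='clearml-offline-')
        ZipFile(session_folder_zip).extractall(path=temp_folder)
        return temp_folder
    return str(session_folder_zip)
```

Callers can then treat the return value uniformly as a directory, regardless of whether the user passed a packed or unpacked session.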
@@ -2053,7 +2075,7 @@ class Task(_Task):
|
||||
# metrics
|
||||
Metrics.report_offline_session(task, session_folder)
|
||||
# print imported results page
|
||||
print('TRAINS results page: {}'.format(task.get_output_log_web_page()))
|
||||
print('ClearML results page: {}'.format(task.get_output_log_web_page()))
|
||||
task.completed()
|
||||
# close task
|
||||
task.close()
|
||||
@@ -2072,10 +2094,10 @@ class Task(_Task):
|
||||
def set_credentials(cls, api_host=None, web_host=None, files_host=None, key=None, secret=None, host=None):
|
||||
# type: (Optional[str], Optional[str], Optional[str], Optional[str], Optional[str], Optional[str]) -> ()
|
||||
"""
|
||||
Set new default **Trains Server** (backend) host and credentials.
|
||||
Set new default **ClearML Server** (backend) host and credentials.
|
||||
|
||||
These credentials will be overridden by either OS environment variables, or the Trains configuration
|
||||
file, ``trains.conf``.
|
||||
These credentials will be overridden by either OS environment variables, or the ClearML configuration
|
||||
file, ``clearml.conf``.
|
||||
|
||||
.. warning::
|
||||
Credentials must be set before initializing a Task object.
|
||||
@@ -2114,6 +2136,40 @@ class Task(_Task):
|
||||
Session.default_web = web_host or ''
|
||||
Session.default_files = files_host or ''
|
||||
|
||||
@classmethod
|
||||
def _create(cls, project_name=None, task_name=None, task_type=TaskTypes.training):
|
||||
# type: (Optional[str], Optional[str], Task.TaskTypes) -> Task
|
||||
"""
|
||||
Create a new unpopulated Task (experiment).
|
||||
|
||||
:param str project_name: The name of the project in which the experiment will be created.
|
||||
If ``project_name`` is ``None``, and the main execution Task is initialized (see :meth:`Task.init`),
|
||||
then the main execution Task's project is used. Otherwise, if the project does
|
||||
not exist, it is created. (Optional)
|
||||
:param str task_name: The name of Task (experiment).
|
||||
:param TaskTypes task_type: The task type.
|
||||
|
||||
:return: The newly created task created.
|
||||
"""
|
||||
if not project_name:
|
||||
if not cls.__main_task:
|
||||
raise ValueError("Please provide project_name, no global task context found "
|
||||
"(Task.current_task hasn't been called)")
|
||||
project_name = cls.__main_task.get_project_name()
|
||||
|
||||
try:
|
||||
task = cls(
|
||||
private=cls.__create_protection,
|
||||
project_name=project_name,
|
||||
task_name=task_name,
|
||||
task_type=task_type,
|
||||
log_to_backend=False,
|
||||
force_create=True,
|
||||
)
|
||||
except Exception:
|
||||
raise
|
||||
return task
|
||||
|
||||
def _set_model_config(self, config_text=None, config_dict=None):
|
||||
# type: (Optional[str], Optional[Mapping]) -> None
|
||||
"""
|
||||
@@ -2285,15 +2341,15 @@ class Task(_Task):
|
||||
# force update of base logger to this current task (this is the main logger task)
|
||||
logger = task._get_logger(auto_connect_streams=auto_connect_streams)
|
||||
if closed_old_task:
|
||||
-            logger.report_text('TRAINS Task: Closing old development task id={}'.format(default_task.get('id')))
+            logger.report_text('ClearML Task: Closing old development task id={}'.format(default_task.get('id')))
         # print warning, reusing/creating a task
         if default_task_id and not continue_last_task:
-            logger.report_text('TRAINS Task: overwriting (reusing) task id=%s' % task.id)
+            logger.report_text('ClearML Task: overwriting (reusing) task id=%s' % task.id)
         elif default_task_id and continue_last_task:
-            logger.report_text('TRAINS Task: continuing previous task id=%s '
+            logger.report_text('ClearML Task: continuing previous task id=%s '
                                'Notice this run will not be reproducible!' % task.id)
         else:
-            logger.report_text('TRAINS Task: created new task id=%s' % task.id)
+            logger.report_text('ClearML Task: created new task id=%s' % task.id)

         # update current repository and put warning into logs
         if detect_repo:
@@ -2567,8 +2623,7 @@ class Task(_Task):
             self._kill_all_child_processes(send_kill=False)
             time.sleep(2.0)
             self._kill_all_child_processes(send_kill=True)
-            # noinspection PyProtectedMember
-            os._exit(1)
+            os._exit(1)  # noqa

     @staticmethod
     def _kill_all_child_processes(send_kill=False):
@@ -2800,7 +2855,7 @@ class Task(_Task):
                 if filename.is_file():
                     relative_file_name = filename.relative_to(offline_folder).as_posix()
                     zf.write(filename.as_posix(), arcname=relative_file_name)
-            print('TRAINS Task: Offline session stored in {}'.format(zip_file))
+            print('ClearML Task: Offline session stored in {}'.format(zip_file))
         except Exception:
             pass

@@ -3179,8 +3234,8 @@ class Task(_Task):
                 task_data.get('type') not in (cls.TaskTypes.training, cls.TaskTypes.testing) and \
                 not Session.check_min_api_version(2.8):
             print('WARNING: Changing task type to "{}" : '
-                  'trains-server does not support task type "{}", '
-                  'please upgrade trains-server.'.format(cls.TaskTypes.training, task_data['type'].value))
+                  'clearml-server does not support task type "{}", '
+                  'please upgrade clearml-server.'.format(cls.TaskTypes.training, task_data['type'].value))
             task_data['type'] = cls.TaskTypes.training

         compares = (
@@ -46,7 +46,7 @@ class PatchArgumentParser:
         from ..config import running_remotely, get_remote_task_id
         if running_remotely():
             # this will cause the current_task() to set PatchArgumentParser._current_task
-            from trains import Task
+            from clearml import Task
             # noinspection PyBroadException
             try:
                 current_task = Task.get_task(task_id=get_remote_task_id())
@@ -27,10 +27,10 @@ class CheckPackageUpdates(object):
         cls._package_version_checked = True
         client, version = Session._client[0]
         version = Version(version)
-        is_demo = 'https://demoapi.trains.allegro.ai/'.startswith(Session.get_api_server_host())
+        is_demo = 'https://demoapi.demo.clear.ml/'.startswith(Session.get_api_server_host())

         update_server_releases = requests.get(
-            'https://updates.trains.allegro.ai/updates',
+            'https://updates.clear.ml/updates',
             json={"demo": is_demo,
                   "versions": {c: str(v) for c, v in Session._client},
                   "CI": str(os.environ.get('CI', ''))},
@@ -62,13 +62,13 @@ class CheckPackageUpdates(object):
     @staticmethod
     def get_version_from_updates_server(cur_version):
         """
-        Get the latest version for trains from updates server
-        :param cur_version: The current running version of trains
+        Get the latest version for clearml from updates server
+        :param cur_version: The current running version of clearml
         :type cur_version: Version
         """
         try:
-            _ = requests.get('https://updates.trains.allegro.ai/updates',
-                             data=json.dumps({"versions": {"trains": str(cur_version)}}),
+            _ = requests.get('https://updates.clear.ml/updates',
+                             data=json.dumps({"versions": {"clearml": str(cur_version)}}),
                              timeout=1.0)
             return
         except Exception:
@@ -77,6 +77,19 @@ class ProxyDictPreWrite(dict):
         return self._set_callback((prefix + '.' + key_value[0], key_value[1],))


+def verify_basic_type(a_dict_list, basic_types=None):
+    basic_types = (float, int, bool, six.string_types, ) if not basic_types else \
+        tuple(b for b in basic_types if b not in (list, tuple, dict))
+
+    if isinstance(a_dict_list, basic_types):
+        return True
+    if isinstance(a_dict_list, (list, tuple)):
+        return all(verify_basic_type(v) for v in a_dict_list)
+    elif isinstance(a_dict_list, dict):
+        return all(verify_basic_type(k) for k in a_dict_list.keys()) and \
+            all(verify_basic_type(v) for v in a_dict_list.values())
+
+
 def flatten_dictionary(a_dict, prefix=''):
     flat_dict = {}
     sep = '/'
@@ -88,7 +101,11 @@ def flatten_dictionary(a_dict, prefix=''):
         elif isinstance(v, (list, tuple)) and all([isinstance(i, basic_types) for i in v]):
             flat_dict[prefix + k] = v
         elif isinstance(v, dict):
-            flat_dict.update(flatten_dictionary(v, prefix=prefix + k + sep))
+            nested_flat_dict = flatten_dictionary(v, prefix=prefix + k + sep)
+            if nested_flat_dict:
+                flat_dict.update(nested_flat_dict)
+            else:
+                flat_dict[k] = {}
         else:
             # this is a mixture of list and dict, or any other object,
             # leave it as is, we have nothing to do with it.
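To see what the flattening change above does, here is a self-contained sketch (a hypothetical standalone reimplementation for illustration only; the real function lives inside clearml's dict utilities and uses `six.string_types`): empty nested dicts are now kept in the output instead of being silently dropped.

``` python
def flatten_dictionary(a_dict, prefix=''):
    # Flatten nested dicts into a single level, joining keys with '/'
    basic_types = (float, int, bool, str)
    flat_dict = {}
    sep = '/'
    for k, v in a_dict.items():
        k = str(k)
        if isinstance(v, basic_types):
            flat_dict[prefix + k] = v
        elif isinstance(v, (list, tuple)) and all(isinstance(i, basic_types) for i in v):
            flat_dict[prefix + k] = v
        elif isinstance(v, dict):
            nested_flat_dict = flatten_dictionary(v, prefix=prefix + k + sep)
            if nested_flat_dict:
                flat_dict.update(nested_flat_dict)
            else:
                # the change above: keep empty dicts instead of dropping them
                flat_dict[k] = {}
        # anything else (mixed list/dict, arbitrary objects) is left out, as in the original
    return flat_dict


print(flatten_dictionary({'a': 1, 'b': {'c': 2, 'd': {}}}))
```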
@@ -41,7 +41,7 @@ class ResourceMonitor(object):
         self._last_process_pool = {}
         self._last_process_id_list = []
         if not self._gpustat:
-            self._task.get_logger().report_text('TRAINS Monitor: GPU monitoring is not available')
+            self._task.get_logger().report_text('ClearML Monitor: GPU monitoring is not available')
         else:  # if running_remotely():
             try:
                 active_gpus = os.environ.get('NVIDIA_VISIBLE_DEVICES', '') or \
@@ -105,13 +105,13 @@ class ResourceMonitor(object):
                 if IsTensorboardInit.tensorboard_used():
                     fallback_to_sec_as_iterations = False
                 elif seconds_since_started >= self.wait_for_first_iteration:
-                    self._task.get_logger().report_text('TRAINS Monitor: Could not detect iteration reporting, '
+                    self._task.get_logger().report_text('ClearML Monitor: Could not detect iteration reporting, '
                                                         'falling back to iterations as seconds-from-start')
                     fallback_to_sec_as_iterations = True
             elif fallback_to_sec_as_iterations is True and seconds_since_started <= self.max_check_first_iteration:
                 if self._check_logger_reported():
                     fallback_to_sec_as_iterations = False
-                    self._task.get_logger().report_text('TRAINS Monitor: Reporting detected, '
+                    self._task.get_logger().report_text('ClearML Monitor: Reporting detected, '
                                                         'reverting back to iteration based reporting')

             clear_readouts = True
@@ -231,7 +231,7 @@ class ResourceMonitor(object):
             # something happened and we can't use gpu stats,
             self._gpustat_fail += 1
             if self._gpustat_fail >= 3:
-                self._task.get_logger().report_text('TRAINS Monitor: GPU monitoring failed getting GPU reading, '
+                self._task.get_logger().report_text('ClearML Monitor: GPU monitoring failed getting GPU reading, '
                                                     'switching off GPU monitoring')
                 self._gpustat = None
@@ -12,7 +12,7 @@ def make_deterministic(seed=1337, cudnn_deterministic=False):
     Ensure deterministic behavior across PyTorch using the provided random seed.
     This function makes sure that torch, numpy and random use the same random seed.

-    When using trains's task, call this function using the task's random seed like so:
+    When using clearml's task, call this function using the task's random seed like so:
         make_deterministic(task.get_random_seed())

     :param int seed: Seed number

@@ -1 +1 @@
-__version__ = '0.16.4'
+__version__ = '0.17.0rc0'
136	docs/clearml-task.md	Normal file
@@ -0,0 +1,136 @@
# `clearml-task` - Execute ANY python code on a remote machine

If you are already familiar with `clearml`, you can think of `clearml-task` as a way to create a Task/experiment
from any script, without adding even a single line of code to the original codebase.

`clearml-task` allows a user to **take any python code/repository and launch it on a remote machine**.

The remote execution is fully monitored; all outputs - including console / tensorboard / matplotlib -
are logged in real-time into the ClearML UI.

## What does it do?

`clearml-task` creates a new experiment on your `clearml-server`; it populates the experiment's environment with:

* repository/commit/branch, as specified by the command-line invocation.
* optional: the base docker image to be used as the underlying environment
* optional: alternative python requirements, in case `requirements.txt` is not found inside the repository.

Once the new experiment is created and populated, it is enqueued to the selected execution queue.

When the experiment is executed on the remote machine (by an available `clearml-agent`), all console outputs
are logged in real-time, alongside your TensorBoard and matplotlib.

### Use-cases for `clearml-task` remote execution

- You have off-the-shelf code, and you want to launch it on a remote machine with a specific resource (i.e., GPU)
- You want to run [hyper-parameter optimization]() on a codebase that is not yet connected with `clearml`
- You want to create a [pipeline]() from an assortment of scripts, and you need to create Tasks for those scripts
- Sometimes, you just want to run some code on a remote machine, either using an on-prem cluster or on the cloud...

### Prerequisites

- A single python script, or an up-to-date repository containing the codebase.
- `clearml-agent` running on at least one machine (to execute the experiment)

## Tutorial

### Launching a job from a repository

We will be launching this [script](https://github.com/allegroai/trains/blob/master/examples/frameworks/scikit-learn/sklearn_matplotlib_example.py) on a remote machine. The following are the command-line options we will be using:
- First, we have to give the experiment a name and select a project (`--project examples --name remote_test`)
- Then, we select the repository with our code. If we do not specify a branch / commit, it will take the latest commit
from the master branch (`--repo https://github.com/allegroai/clearml.git`)
- Lastly, we specify which script in the repository to run (`--script examples/frameworks/scikit-learn/sklearn_matplotlib_example.py`)

Notice that by default, the execution working directory is the root of the repository. To change it, add `--cwd <folder>`.

If we additionally need to pass arguments to our script, we use the `--args` switch.
The names of the arguments should match the argparse arguments, without the '--' prefix
(e.g. instead of `--key=value`, use `--args key=value`).

``` bash
clearml-task --project examples --name remote_test --repo https://github.com/allegroai/clearml.git \
--script examples/frameworks/scikit-learn/sklearn_matplotlib_example.py \
--queue single_gpu
```
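As a quick illustration of the `--args` mapping described above, here is a hedged sketch (a hypothetical helper, not the actual `clearml-task` implementation) that converts `key=value` strings back into the argv the script's own argparse would receive:

``` python
import argparse


def args_to_argv(args_list):
    # turn clearml-task style 'key=value' strings into '--key value' argv pairs
    argv = []
    for kv in args_list:
        key, _, value = kv.partition('=')
        argv += ['--' + key, value]
    return argv


# a stand-in for the remote script's own argparse section
parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.001)
parser.add_argument('--batch_size', type=int, default=32)

# what `--args lr=0.003 batch_size=64` translates to
parsed = parser.parse_args(args_to_argv(['lr=0.003', 'batch_size=64']))
print(parsed.lr, parsed.batch_size)  # 0.003 64
```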
### Launching a job from a local script

We will be launching a single local script file (no git repo needed) on a remote machine.

- First, we have to give the experiment a name and select a project (`--project examples --name remote_test`)
- Then, we select the script file on our machine, `--script /path/to/my/script.py`
- If we need specific packages, we can specify them manually with `--packages "tqdm>=4" "torch>1.0"`,
or we can pass a requirements file with `--requirements /path/to/my/requirements.txt`
- As in the repository case, if we need to pass arguments to `argparse`, we add `--args key=value`
- If we have a docker container with the entire environment we want our script to run inside,
we add e.g. `--docker nvcr.io/nvidia/pytorch:20.11-py3`

Note: In this example, the exact version of PyTorch to install will be resolved by the `clearml-agent`, depending on the CUDA environment available at runtime.

``` bash
clearml-task --project examples --name remote_test --script /path/to/my/script.py \
--packages "tqdm>=4" "torch>1.0" --args verbose=true \
--queue dual_gpu
```
### CLI options

``` bash
clearml-task --help
```

``` console
ClearML launch - launch any codebase on remote machines running clearml-agent

optional arguments:
  -h, --help            show this help message and exit
  --version             Display the Allegro.ai utility version
  --project PROJECT     Required: set the project name for the task. If
                        --base-task-id is used, this argument is optional.
  --name NAME           Required: select a name for the remote task
  --repo REPO           remote URL for the repository to use. Example: --repo
                        https://github.com/allegroai/clearml.git
  --branch BRANCH       Select specific repository branch/tag (implies the
                        latest commit from the branch)
  --commit COMMIT       Select specific commit id to use (default: latest
                        commit, or when used with local repository matching
                        the local commit id)
  --folder FOLDER       Remotely execute the code in the local folder. Notice!
                        It assumes a git repository already exists. Current
                        state of the repo (commit id and uncommitted changes)
                        is logged and will be replicated on the remote machine
  --script SCRIPT       Specify the entry point script for the remote
                        execution. When used in tandem with --repo the script
                        should be a relative path inside the repository, for
                        example: --script source/train.py. When used with
                        --folder it supports a direct path to a file inside
                        the local repository itself, for example: --script
                        ~/project/source/train.py
  --cwd CWD             Working directory to launch the script from. Default:
                        repository root folder. Relative to repo root or local
                        folder
  --args [ARGS [ARGS ...]]
                        Arguments to pass to the remote execution, list of
                        <argument>=<value> strings. Currently only argparse
                        arguments are supported. Example: --args lr=0.003
                        batch_size=64
  --queue QUEUE         Select the queue to launch the task. If not provided a
                        Task will be created but it will not be launched.
  --requirements REQUIREMENTS
                        Specify requirements.txt file to install when setting
                        the session. If not provided, the requirements.txt
                        from the repository will be used.
  --packages [PACKAGES [PACKAGES ...]]
                        Manually specify a list of required packages. Example:
                        --packages "tqdm>=2.1" "scikit-learn"
  --docker DOCKER       Select the docker image to use in the remote session
  --skip-task-init      If set, Task.init() call is not added to the entry
                        point, and is assumed to be called within the
                        script. Default: add Task.init() call to the entry
                        point script
  --base-task-id BASE_TASK_ID
                        Use a pre-existing task in the system, instead of a local repo/script.
                        Essentially clones an existing task and overrides arguments/requirements.

```
196	docs/clearml.conf	Normal file
@@ -0,0 +1,196 @@
# ClearML SDK configuration file
api {
    # web_server on port 8080
    web_server: "http://localhost:8080"

    # Notice: 'api_server' is the api server (default port 8008), not the web server.
    api_server: "http://localhost:8008"

    # file server on port 8081
    files_server: "http://localhost:8081"

    # Credentials are generated using the webapp, http://localhost:8080/profile
    credentials {"access_key": "EGRTCO8JMSIGI6S39GTP43NFWXDQOW", "secret_key": "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"}

    # verify host ssl certificate, set to False only if you have a very good reason
    verify_certificate: True
}
sdk {
    # ClearML - default SDK configuration

    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.clearml/cache"
        }
    }

    metrics {
        # History size for debug files per metric/variant. For each metric/variant combination with an attached file
        # (e.g. debug image event), file names for the uploaded files will be recycled in such a way that no more than
        # X files are stored in the upload destination for each metric/variant combination.
        file_history_size: 100

        # Max history size for matplotlib imshow files per plot title.
        # File names for the uploaded images will be recycled in such a way that no more than
        # X images are stored in the upload destination for each matplotlib plot title.
        matplotlib_untitled_history_size: 100

        # Limit the number of digits after the dot in plot reporting (reducing plot report size)
        # plot_max_num_digits: 5

        # Settings for generated debug images
        images {
            format: JPEG
            quality: 87
            subsampling: 0
        }

        # Support plot-per-graph fully matching Tensorboard behavior (i.e. if this is set to true, each series should have its own graph)
        tensorboard_single_series_per_graph: false
    }

    network {
        metrics {
            # Number of threads allocated to uploading files (typically debug images) when transmitting metrics for
            # a specific iteration
            file_upload_threads: 4

            # Warn about upload starvation if no uploads were made in the specified period while file-bearing events keep
            # being sent for upload
            file_upload_starvation_warning_sec: 120
        }

        iteration {
            # Max number of retries when getting frames if the server returned an error (http code 500)
            max_retries_on_server_error: 5
            # Backoff factor for consecutive retry attempts.
            # SDK will wait for {backoff factor} * (2 ^ ({number of total retries} - 1)) between retries.
            retry_backoff_factor_sec: 10
        }
    }
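The backoff formula in the comment above yields the following waits; a minimal sketch (plain Python for illustration, not SDK code), assuming the default `retry_backoff_factor_sec: 10`:

``` python
def retry_delays(backoff_factor_sec=10, max_retries=5):
    # wait before retry N (1-based) is: factor * 2 ** (N - 1)
    return [backoff_factor_sec * (2 ** (n - 1)) for n in range(1, max_retries + 1)]


print(retry_delays())  # [10, 20, 40, 80, 160]
```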
    aws {
        s3 {
            # S3 credentials, used for read/write access by various SDK elements

            # default, used for any bucket not specified below
            key: ""
            secret: ""
            region: ""

            credentials: [
                # specifies key/secret credentials to use when handling s3 urls (read or write)
                # {
                #     bucket: "my-bucket-name"
                #     key: "my-access-key"
                #     secret: "my-secret-key"
                # },
                # {
                #     # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
                #     host: "my-minio-host:9000"
                #     key: "12345678"
                #     secret: "12345678"
                #     multipart: false
                #     secure: false
                # }
            ]
        }
        boto3 {
            pool_connections: 512
            max_multipart_concurrency: 16
        }
    }
    google.storage {
        # # Default project and credentials file
        # # Will be used when no bucket configuration is found
        # project: "clearml"
        # credentials_json: "/path/to/credentials.json"

        # # Specific credentials per bucket and sub directory
        # credentials = [
        #     {
        #         bucket: "my-bucket"
        #         subdir: "path/in/bucket"  # Not required
        #         project: "clearml"
        #         credentials_json: "/path/to/credentials.json"
        #     },
        # ]
    }
    azure.storage {
        # containers: [
        #     {
        #         account_name: "clearml"
        #         account_key: "secret"
        #         # container_name:
        #     }
        # ]
    }

    log {
        # debugging feature: set this to true to make the null log propagate messages to the root logger (so they appear in stdout)
        null_log_propagate: false
        task_log_buffer_capacity: 66

        # disable urllib info and lower levels
        disable_urllib3_info: true
    }

    development {
        # Development-mode options

        # dev task reuse window
        task_reuse_time_window_in_hours: 72.0

        # Run VCS repository detection asynchronously
        vcs_repo_detect_async: true

        # Store uncommitted git/hg source code diff in experiment manifest when training in development mode
        # This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
        store_uncommitted_code_diff: true
        store_code_diff_from_remote: false

        # Support stopping an experiment in case it was externally stopped, its status was changed, or the task was reset
        support_stopping: true

        # Default Task output_uri. If output_uri is not provided to Task.init, default_output_uri will be used instead.
        default_output_uri: ""

        # Default auto-generated requirements optimize for smaller requirements
        # If True, analyze the entire repository regardless of the entry point.
        # If False, first analyze the entry point script; if it does not reference other local files,
        # do not analyze the entire repository.
        force_analyze_entire_repo: false

        # If set to true, the *clearml* update message will not be printed to the console
        # this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
        suppress_update_message: false

        # If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
        detect_with_pip_freeze: false
        detect_with_conda_freeze: false

        # Log specific environment variables. OS environments are enlisted in the "Environment" section
        # of the Hyper-Parameters.
        # Multiple selected variables are supported, including the suffix '*'.
        # For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
        # This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
        # Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
        log_os_environments: []

        # Development mode worker
        worker {
            # Status report period in seconds
            report_period_sec: 2

            # ping to the server - check connectivity
            ping_period_sec: 30

            # Log all stdout & stderr
            log_stdout: true

            # compatibility feature: report memory usage for the entire machine
            # default (false): report only on the running process and its sub-processes
            report_global_mem_used: false
        }
    }
}
@@ -2,39 +2,40 @@

 Firstly, we thank you for taking the time to contribute!

-The following is a set of guidelines for contributing to TRAINS.
+Contribution comes in many forms:
+* Reporting [issues](https://github.com/allegroai/clearml/issues) you've come upon
+* Participating in issue discussions in the [issue tracker](https://github.com/allegroai/clearml/issues) and the [ClearML community slack space](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY)
+* Suggesting new features or enhancements
+* Implementing new features or fixing outstanding issues
+
+The following is a set of guidelines for contributing to ClearML.
 These are primarily guidelines, not rules.
 Use your best judgment and feel free to propose changes to this document in a pull request.

-## Reporting Bugs
+## Reporting Issues

-This section guides you through submitting a bug report for TRAINS.
-By following these guidelines, you
-help maintainers and the community understand your report, reproduce the behavior, and find related reports.
+By following these guidelines, you help maintainers and the community understand your report, reproduce the behavior, and find related reports.

-Before creating bug reports, please check whether the bug you want to report already appears [here](link to issues).
-You may discover that you do not need to create a bug report.
-When you are creating a bug report, please include as much detail as possible.
+Before reporting an issue, please check whether it already appears [here](https://github.com/allegroai/clearml/issues).
+If it does, join the on-going discussion instead.

 **Note**: If you find a **Closed** issue that may be the same issue which you are currently experiencing,
 then open a **New** issue and include a link to the original (Closed) issue in the body of your new one.

-Explain the problem and include additional details to help maintainers reproduce the problem:
+When reporting an issue, please include as much detail as possible: explain the problem and include additional details to help maintainers reproduce the problem:

 * **Use a clear and descriptive title** for the issue to identify the problem.
 * **Describe the exact steps necessary to reproduce the problem** in as much detail as possible. Please do not just summarize what you did. Make sure to explain how you did it.
 * **Provide the specific environment setup.** Include the `pip freeze` output, specific environment variables, Python version, and other relevant information.
 * **Provide specific examples to demonstrate the steps.** Include links to files or GitHub projects, or copy/paste snippets which you use in those examples.
-* **If you are reporting any TRAINS crash,** include a crash report with a stack trace from the operating system. Make sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
+* **If you are reporting any ClearML crash,** include a crash report with a stack trace from the operating system. Make sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
 a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/), or just put it in a [gist](https://gist.github.com/) (and provide link to that gist).
 * **Describe the behavior you observed after following the steps** and the exact problem with that behavior.
 * **Explain which behavior you expected to see and why.**
 * **For Web-App issues, please include screenshots and animated GIFs** which recreate the described steps and clearly demonstrate the problem. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.

-## Suggesting Enhancements
+## Suggesting New Features and Enhancements

 This section guides you through submitting an enhancement suggestion for TRAINS, including
 completely new features and minor improvements to existing functionality.
 By following these guidelines, you help maintainers and the community understand your suggestion and find related suggestions.

 Enhancement suggestions are tracked as GitHub issues. After you determine which repository your enhancement suggestion is related to, create an issue on that repository and provide the following:
@@ -43,12 +44,18 @@ Enhancement suggestions are tracked as GitHub issues. After you determine which
 * **A step-by-step description of the suggested enhancement** in as much detail as possible.
 * **Specific examples to demonstrate the steps.** Include copy/pasteable snippets which you use in those examples as [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
 * **Describe the current behavior and explain which behavior you expected to see instead and why.**
-* **Include screenshots or animated GIFs** which help you demonstrate the steps or point out the part of TRAINS which the suggestion is related to. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
+* **Include screenshots or animated GIFs** which help you demonstrate the steps or point out the part of ClearML which the suggestion is related to. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.

+## Pull Requests
+
+Before you submit a new PR:
+
+* Verify the work you plan to merge addresses an existing [issue](https://github.com/allegroai/clearml/issues) (If not, open a new one)
+* Check related discussions in the [ClearML slack community](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY) (Or start your own discussion on the `#clearml-dev` channel)
+* Make sure your code conforms to the ClearML coding standards by running:
+  `flake8 --max-line-length=120 --statistics --show-source --extend-ignore=E501 ./trains*`
+
+In your PR include:
+* A reference to the issue it addresses
+* A brief description of the approach you've taken in the implementation
BIN	docs/dataset_screenshots.gif	Normal file
Binary file not shown. After: Width | Height | Size: 447 KiB
139	docs/datasets.md	Normal file
@@ -0,0 +1,139 @@
# ClearML introducing Dataset management!

## Decoupling Data from Code - The Dataset Paradigm

### The ultimate goal of `clearml-data` is to transform datasets into configuration parameters
Just like any other argument, the dataset argument should retrieve a full local copy of the
dataset to be used by the experiment.
This means datasets can be efficiently retrieved by any machine in a reproducible way.
Together this creates a full version control solution for all your data,
one that is both machine and environment agnostic.


### Design Goals: Simple / Agnostic / File-based / Efficient

## Key Concepts:
1) **Dataset** is a **collection of files**: e.g. a folder with all subdirectories and files included in the dataset
2) **Differential storage**: efficient storage / network
3) **Flexible**: support addition / removal / merge of files and datasets
4) **Descriptive, transparent & searchable**: support projects, names, descriptions, tags and searchable fields
5) **Simple interface** (CLI and programmatic)
6) **Accessible**: get a copy of the dataset files from anywhere on any machine

### Workflow:

#### Simple dataset creation with CLI:

- Create a dataset
``` bash
clearml-data create --project <my_project> --name <my_dataset_name>
```
- Add local files to the dataset
``` bash
clearml-data add --id <dataset_id_from_previous_command> --files ~/datasets/best_dataset/
```
- Upload files (Optional: specify storage `--storage` `s3://bucket` or `gs://` or `azure://` or `/mnt/shared/`)
``` bash
clearml-data upload --id <dataset_id>
```
- Close dataset
``` bash
clearml-data close --id <dataset_id>
```

#### Integrating datasets into your code:
|
||||
``` python
|
||||
from argparse import ArgumentParser
|
||||
from clearml import Dataset
|
||||
|
||||
# adding command line interface, so it is easy to use
|
||||
parser = ArgumentParser()
|
||||
parser.add_argument('--dataset', default='aayyzz', type=str, help='Dataset ID to train on')
|
||||
args = parser.parse_args()
|
||||
|
||||
# creating a task, so that later we could override the argparse from UI
|
||||
task = Task.init(project_name='examples', task_name='dataset demo')
|
||||
|
||||
# getting a local copy of the dataset
|
||||
dataset_folder = Datset.get(dataset_id=args.dataset).get_local_copy()
|
||||
|
||||
# go over the files in `dataset_folder` and train your model
|
||||
```
|
||||

#### Modifying a dataset with CLI:

- Create a new dataset (specify the parent dataset ID)

``` bash
clearml-data create --name <improved_dataset> --parents <existing_dataset_id>
```
- Get a mutable copy of the current dataset

``` bash
clearml-data get --id <created_dataset_id> --copy ~/datasets/working_dataset
```
- Change / add / remove files in the dataset folder

``` bash
vim ~/datasets/working_dataset/everything.csv
```
- Sync local changes

``` bash
clearml-data sync --id <created_dataset_id> --folder ~/datasets/working_dataset
```
- Upload files (optional: use `--storage` to specify a storage target, e.g. `s3://bucket`, `gs://`, `azure://`, or `/mnt/shared/`)

``` bash
clearml-data upload --id <created_dataset_id>
```
- Close dataset

``` bash
clearml-data close --id <created_dataset_id>
```
#### Command Line Interface Summary:

- **`search`** Search datasets based on project / name / description / tag, etc.
- **`list`** List the file directory content of a dataset (no need to download a copy of the dataset)
- **`verify`** Verify a local copy of a dataset (verify the dataset files' SHA2 hashes)
- **`create`** Create a new dataset (supports extending/inheriting multiple parents)
- **`delete`** Delete a dataset
- **`add`** Add local files to a dataset
- **`sync`** Sync a dataset with a local folder (the local folder being the source of truth)
- **`remove`** Remove files from a dataset (no need to download a copy of the dataset)
- **`get`** Get a local copy of a dataset (either read-only with `--link`, or writable with `--copy`)
- **`upload`** Upload the dataset (use `--storage` to specify a storage target such as S3/GS/Azure/Folder; default: the file server)

#### Under the hood (how it all works):

Each dataset instance stores the collection of files added or modified relative to the previous version (parent).
When a copy of the dataset is requested, all parent datasets in the graph are downloaded, then merged into a new
folder together with all the changes introduced along the dataset DAG.

Implementation details:

A dataset's differential snapshot is stored as a single zip file, for efficiency in both storage and network
bandwidth. A local cache is built into the process, making sure datasets are downloaded only once.
The dataset stores the SHA2 hash of every file it contains.
To speed up dataset fetching, only file sizes are verified automatically;
the SHA2 hashes are verified only on the user's request.
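The size-first, hash-on-request verification described above can be sketched as follows. This is a minimal illustration with hypothetical helper names and a hypothetical manifest layout (not the actual clearml implementation), using SHA-256, a member of the SHA-2 family:

``` python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open('rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_local_copy(folder: Path, manifest: dict, deep: bool = False) -> bool:
    """Check a local dataset copy against a manifest of {relative_path: (size, sha256)}.

    By default only file sizes are compared (fast). Pass deep=True to also
    verify the SHA-256 hash of every file (slow, so done only on request).
    """
    for rel_path, (size, sha) in manifest.items():
        local = folder / rel_path
        if not local.is_file() or local.stat().st_size != size:
            return False
        if deep and sha256_of(local) != sha:
            return False
    return True
```

Note how a same-size corruption slips past the fast check and is only caught by the deep (hash) check, which is why hash verification is available on demand.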
The design supports multiple parents per dataset, essentially merging all parents in order.
To improve storage and speed for deep dataset DAGs, dataset squashing was introduced: a user can squash
a dataset, merging down all the changes introduced along the DAG and creating a new, flat version with no parent datasets.

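The parent-merge and squash logic above can be illustrated with a small sketch. The data structures here are hypothetical stand-ins (not the actual clearml code): each version maps file paths to content, and materializing a version applies the chain of parents in order, with later changes overriding earlier ones.

``` python
from typing import Dict, List, Optional


class DatasetVersion:
    """A differential snapshot: files added/changed vs. the parents, plus removals."""

    def __init__(self, changed: Dict[str, str], removed: Optional[List[str]] = None,
                 parents: Optional[List["DatasetVersion"]] = None):
        self.changed = changed        # relative path -> file content (stand-in for a zip entry)
        self.removed = removed or []  # paths deleted in this version
        self.parents = parents or []  # merge order matters: later parents override earlier ones

    def materialize(self) -> Dict[str, str]:
        """Merge all parents in order, then apply this version's own change-set."""
        files: Dict[str, str] = {}
        for parent in self.parents:
            files.update(parent.materialize())
        files.update(self.changed)
        for path in self.removed:
            files.pop(path, None)
        return files

    def squash(self) -> "DatasetVersion":
        """Flatten the whole DAG into a single parent-less version."""
        return DatasetVersion(changed=self.materialize())
```

For example, a child that changes `a.csv` and removes `b.csv` materializes to just the new `a.csv`, and squashing it yields an equivalent version with an empty parent list.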
### Datasets UI:

A dataset is represented as a special `Task` in the system. <br>
It is of type `data-processing`, with a special tag `dataset`.

- The full log (calls / CLI) of the dataset creation process can be found in the "Execution" section.
- A listing of the dataset's differential snapshot, with a summary of the files added / modified / removed and the details
of each file in the snapshot (location / size / hash), is available in the Artifacts section.
- The full dataset listing (all files included) is available in the Configuration section under `Dataset Content`.
This allows you to quickly compare the contents of two datasets and visually see the difference.
- The dataset genealogy DAG and a change-set summary table are visualized under Results / Plots.

<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/clearml/blob/master/docs/dataset_screenshots.gif?raw=true" width="80%"></a>
@@ -1,6 +1,6 @@
|
||||
# TRAINS Explicit Logging
|
||||
# ClearML Explicit Logging
|
||||
|
||||
Using the **TRAINS** [Logger](https://github.com/allegroai/trains/blob/master/trains/logger.py) module and other **TRAINS** features, you can explicitly log any of the following:
|
||||
Using the **ClearML** [Logger](https://github.com/allegroai/clearml/blob/master/clearml/logger.py) module and other **ClearML** features, you can explicitly log any of the following:
|
||||
|
||||
* Report graphs and images
|
||||
* [Scalar metrics](#scalar-metrics)
|
||||
@@ -19,10 +19,10 @@ Using the **TRAINS** [Logger](https://github.com/allegroai/trains/blob/master/tr
|
||||
* Message logging
|
||||
* [Reporting text without formatting](#reporting-text-without-formatting)
|
||||
|
||||
Additionally, the **TRAINS** Logger module provides methods that allow you to do the following:
|
||||
Additionally, the **ClearML** Logger module provides methods that allow you to do the following:
|
||||
|
||||
* Get the [current logger]()
|
||||
* Override the TRAINS configuration file with a [default upload destination]() for images and files
|
||||
* Override the ClearML configuration file with a [default upload destination]() for images and files
|
||||
|
||||
## Graphs and Images
|
||||
|
||||
@@ -30,7 +30,7 @@ Additionally, the **TRAINS** Logger module provides methods that allow you to do
|
||||
|
||||
Use to report scalar metrics by iteration as a line plot.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scalar_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -101,7 +101,7 @@ def report_scalar(self, title, series, value, iteration)
|
||||
|
||||
Use to report any data by iteration as a histogram.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -199,7 +199,7 @@ def report_histogram(self, title, series, values, iteration, labels=None, xlabel
|
||||
|
||||
Use to report any data by iteration as a single or multiple line plot.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -325,7 +325,7 @@ def report_line_plot(self, title, series, iteration, xaxis, yaxis, mode='lines',
|
||||
|
||||
Use to report any vector data as a 2D scatter diagram.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -461,7 +461,7 @@ def report_scatter2d(self, title, series, scatter, iteration, xaxis=None, yaxis=
|
||||
|
||||
Use to report any array data as a 3D scatter diagram.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -597,7 +597,7 @@ def report_scatter3d(self, title, series, scatter, iteration, labels=None, mode=
|
||||
|
||||
Use to report a heat-map matrix as a confusion matrix. You can also plot a heat-map as a [surface diagram](#surface-diagrams).
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -689,7 +689,7 @@ def report_confusion_matrix(self, title, series, matrix, iteration, xlabels=None
|
||||
|
||||
Use to plot a heat-map matrix as a surface diagram. You can also plot a heat-map as a [confusion matrix](#confusion-matrices).
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -818,10 +818,10 @@ def report_surface(self, title, series, matrix, iteration, xlabels=None, ylabels
|
||||
|
||||
### Images
|
||||
|
||||
Use to report an image and upload its contents to the bucket specified in the **TRAINS** configuration file,
|
||||
Use to report an image and upload its contents to the bucket specified in the **ClearML** configuration file,
|
||||
or a [a default upload destination](#set-default-upload-destination), if you set a default.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/manual_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -929,7 +929,7 @@ def report_image(self, title, series, iteration, local_path=None, matrix=None, m
|
||||
|
||||
### Logging Experiment Parameter Dictionaries
|
||||
|
||||
In order for **TRAINS** to log a dictionary of parameters, use the `Task.connect` method.
|
||||
In order for **ClearML** to log a dictionary of parameters, use the `Task.connect` method.
|
||||
|
||||
For example, to log the hyper-parameters <code>learning_rate</code>, <code>batch_size</code>, <code>display_step</code>, <code>model_path</code>, <code>n_hidden_1</code>, and <code>n_hidden_2</code>:
|
||||
|
||||
@@ -938,27 +938,27 @@ For example, to log the hyper-parameters <code>learning_rate</code>, <code>batch
|
||||
parameters_dict = { 'learning_rate': 0.001, 'batch_size': 100, 'display_step': 1,
|
||||
'model_path': "/tmp/model.ckpt", 'n_hidden_1': 256, 'n_hidden_2': 256 }
|
||||
|
||||
# Connect the dictionary to your TRAINS Task
|
||||
# Connect the dictionary to your ClearML Task
|
||||
parameters_dict = Task.current_task().connect(parameters_dict)
|
||||
```
|
||||
|
||||
### Specifying Environment Variables to Track
|
||||
|
||||
Set the `TRAINS_LOG_ENVIRONMENT` environment variable to make **TRAINS** log either:
|
||||
Set the `CLEARML_LOG_ENVIRONMENT` environment variable to make **ClearML** log either:
|
||||
|
||||
* All environment variables
|
||||
|
||||
export TRAINS_LOG_ENVIRONMENT="*"
|
||||
export CLEARML_LOG_ENVIRONMENT="*"
|
||||
|
||||
* Specific environment variables
|
||||
|
||||
For example, log `PWD` and `PYTHONPATH`
|
||||
|
||||
export TRAINS_LOG_ENVIRONMENT="PWD,PYTHONPATH"
|
||||
export CLEARML_LOG_ENVIRONMENT="PWD,PYTHONPATH"
|
||||
|
||||
* No environment variables
|
||||
|
||||
export TRAINS_LOG_ENVIRONMENT=
|
||||
export CLEARML_LOG_ENVIRONMENT=
|
||||
|
||||
## Logging Messages
|
||||
|
||||
@@ -972,7 +972,7 @@ Use the methods in this section to log various types of messages. The method nam
|
||||
def debug(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1010,7 +1010,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def info(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1048,7 +1048,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def warn(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:<a name="log_arguments"></a>
|
||||
|
||||
@@ -1087,7 +1087,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def error(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1125,7 +1125,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def critical(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1163,7 +1163,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def fatal(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1201,7 +1201,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def console(self, msg, level=logging.INFO, omit_console=False, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1279,7 +1279,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def report_text(self, msg, level=logging.INFO, print_console=False, *args, **_)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1371,7 +1371,7 @@ None.
|
||||
Use to specify the default destination storage location used for uploading images.
|
||||
Images are uploaded and a link to the image is reported.
|
||||
|
||||
Credentials for the storage location are in the global configuration file (for example, on Linux, <code>~/trains.conf</code>).
|
||||
Credentials for the storage location are in the global configuration file (for example, on Linux, <code>~/clearml.conf</code>).
|
||||
|
||||
**Method**:
|
||||
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# TRAINS SDK configuration file
|
||||
# ClearML SDK configuration file - Please use ~/clearml.conf
|
||||
api {
|
||||
# web_server on port 8080
|
||||
web_server: "http://localhost:8080"
|
||||
@@ -16,12 +16,12 @@ api {
|
||||
verify_certificate: True
|
||||
}
|
||||
sdk {
|
||||
# TRAINS - default SDK configuration
|
||||
# ClearML - default SDK configuration
|
||||
|
||||
storage {
|
||||
cache {
|
||||
# Defaults to system temp folder / cache
|
||||
default_base_dir: "~/.trains/cache"
|
||||
default_base_dir: "~/.clearml/cache"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -103,7 +103,7 @@ sdk {
|
||||
google.storage {
|
||||
# # Default project and credentials file
|
||||
# # Will be used when no bucket configuration is found
|
||||
# project: "trains"
|
||||
# project: "clearml"
|
||||
# credentials_json: "/path/to/credentials.json"
|
||||
|
||||
# # Specific credentials per bucket and sub directory
|
||||
@@ -111,7 +111,7 @@ sdk {
|
||||
# {
|
||||
# bucket: "my-bucket"
|
||||
# subdir: "path/in/bucket" # Not required
|
||||
# project: "trains"
|
||||
# project: "clearml"
|
||||
# credentials_json: "/path/to/credentials.json"
|
||||
# },
|
||||
# ]
|
||||
@@ -119,7 +119,7 @@ sdk {
|
||||
azure.storage {
|
||||
# containers: [
|
||||
# {
|
||||
# account_name: "trains"
|
||||
# account_name: "clearml"
|
||||
# account_key: "secret"
|
||||
# # container_name:
|
||||
# }
|
||||
@@ -161,8 +161,8 @@ sdk {
|
||||
# do not analyze the entire repository.
|
||||
force_analyze_entire_repo: false
|
||||
|
||||
# If set to true, *trains* update message will not be printed to the console
|
||||
# this value can be overwritten with os environment variable TRAINS_SUPPRESS_UPDATE_MESSAGE=1
|
||||
# If set to true, *clearml* update message will not be printed to the console
|
||||
# this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
|
||||
suppress_update_message: false
|
||||
|
||||
# If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
|
||||
@@ -173,7 +173,7 @@ sdk {
|
||||
# of the Hyper-Parameters.
|
||||
# multiple selected variables are supported including the suffix '*'.
|
||||
# For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
|
||||
# This value can be overwritten with os environment variable TRAINS_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
|
||||
# This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
|
||||
# Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
|
||||
log_os_environments: []
|
||||
|
||||
|
||||
@@ -1,8 +1,8 @@
|
||||
from random import sample
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
# Connecting TRAINS
|
||||
# Connecting ClearML
|
||||
task = Task.init(project_name='examples', task_name='Random Hyper-Parameter Search Example', task_type=Task.TaskTypes.optimizer)
|
||||
|
||||
# Create a hyper-parameter dictionary for the task
|
||||
|
||||
@@ -1 +1 @@
|
||||
trains
|
||||
clearml
|
||||
@@ -1,4 +1,4 @@
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
from time import sleep
|
||||
|
||||
# Initialize the Task Pipe's first Task used to start the Task Pipe
|
||||
@@ -35,7 +35,7 @@ cloned_task_parameters = cloned_task.get_parameters()
|
||||
cloned_task_parameters[param['param_name']] = param['param_name_new_value']
|
||||
cloned_task.set_parameters(cloned_task_parameters)
|
||||
|
||||
# Enqueue the Task for execution. The enqueued Task must already exist in the trains platform
|
||||
# Enqueue the Task for execution. The enqueued Task must already exist in the clearml platform
|
||||
print('Enqueue next step in pipeline to queue: {}'.format(param['execution_queue_name']))
|
||||
Task.enqueue(cloned_task.id, queue_name=param['execution_queue_name'])
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# This Task is the base task that we will be executing as a second step (see task_piping.py)
|
||||
# In order to make sure this experiment is registered in the platform, you must execute it once.
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
# Initialize the task pipe's first task used to start the task pipe
|
||||
task = Task.init('examples', 'Toy Base Task')
|
||||
|
||||
@@ -1,3 +1,3 @@
|
||||
torch>=1.1.0
|
||||
torchvision>=0.3.0
|
||||
trains
|
||||
clearml
|
||||
@@ -1,4 +1,4 @@
|
||||
# TRAINS - example of multiple sub-processes interacting and reporting to a single master experiment
|
||||
# ClearML - example of multiple sub-processes interacting and reporting to a single master experiment
|
||||
|
||||
import multiprocessing
|
||||
import os
|
||||
@@ -7,7 +7,7 @@ import sys
|
||||
import time
|
||||
from argparse import ArgumentParser
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
# fake data for us to "process"
|
||||
data = (
|
||||
@@ -51,7 +51,7 @@ if __name__ == '__main__':
|
||||
|
||||
# We have to initialize the task in the master process,
|
||||
# it will make sure that any sub-process calling Task.init will get the master task object
|
||||
# notice that we exclude the `counter` argument, so we can launch multiple sub-processes with trains-agent
|
||||
# notice that we exclude the `counter` argument, so we can launch multiple sub-processes with clearml-agent
|
||||
# otherwise, the `counter` will always be set to the original value.
|
||||
task = Task.init('examples', 'Popen example', auto_connect_arg_parser={'counter': False})
|
||||
|
||||
|
||||
@@ -3,7 +3,7 @@ import numpy as np
|
||||
import tensorflow as tf
|
||||
from tensorflow import keras
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
task = Task.init(project_name="autokeras", task_name="autokeras imdb example with scalars")
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
# Please read this https://github.com/keras-team/autokeras#installation before making changes
|
||||
autokeras
|
||||
tensorflow>=2.3.0
|
||||
trains
|
||||
clearml
|
||||
git+https://github.com/keras-team/keras-tuner.git@1.0.2rc2
|
||||
|
||||
@@ -1,10 +1,10 @@
|
||||
# TRAINS - Fastai with Tensorboard example code, automatic logging the model and Tensorboard outputs
|
||||
# ClearML - Fastai with Tensorboard example code, automatic logging the model and Tensorboard outputs
|
||||
#
|
||||
|
||||
from fastai.callbacks.tensorboard import LearnerTensorboardWriter
|
||||
from fastai.vision import * # Quick access to computer vision functionality
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
task = Task.init(project_name="example", task_name="fastai with tensorboard callback")
|
||||
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
fastai
|
||||
tensorboard
|
||||
tensorboardX
|
||||
trains
|
||||
clearml
|
||||
|
||||
@@ -15,12 +15,12 @@ from ignite.utils import setup_logger
|
||||
from torch.utils.tensorboard import SummaryWriter
|
||||
from tqdm import tqdm
|
||||
|
||||
from trains import Task, StorageManager
|
||||
from clearml import Task, StorageManager
|
||||
|
||||
# Trains Initializations
|
||||
# ClearML Initializations
|
||||
task = Task.init(project_name='Image Example', task_name='image classification CIFAR10')
|
||||
params = {'number_of_epochs': 20, 'batch_size': 64, 'dropout': 0.25, 'base_lr': 0.001, 'momentum': 0.9, 'loss_report': 100}
|
||||
params = task.connect(params) # enabling configuration override by trains
|
||||
params = task.connect(params) # enabling configuration override by clearml
|
||||
print(params) # printing actual configuration (after override in remote mode)
|
||||
|
||||
manager = StorageManager()
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# TRAINS - Keras with Tensorboard example code, automatic logging model and Tensorboard outputs
|
||||
# ClearML - Keras with Tensorboard example code, automatic logging model and Tensorboard outputs
|
||||
#
|
||||
# Train a simple deep NN on the MNIST dataset.
|
||||
# Gets to 98.40% test accuracy after 20 epochs
|
||||
@@ -19,7 +19,7 @@ from tensorflow.keras.layers import Activation, Dense
|
||||
from tensorflow.keras.models import Sequential
|
||||
from tensorflow.keras.optimizers import RMSprop
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
|
||||
class TensorBoardImage(TensorBoard):
|
||||
@@ -89,7 +89,7 @@ model.compile(loss='categorical_crossentropy',
|
||||
optimizer=RMSprop(),
|
||||
metrics=['accuracy'])
|
||||
|
||||
# Connecting TRAINS
|
||||
# Connecting ClearML
|
||||
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example')
|
||||
|
||||
# To set your own configuration:
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# TRAINS - Keras with Tensorboard example code, automatic logging model and Tensorboard outputs
|
||||
# ClearML - Keras with Tensorboard example code, automatic logging model and Tensorboard outputs
|
||||
#
|
||||
# Train a simple deep NN on the MNIST dataset.
|
||||
# Gets to 98.40% test accuracy after 20 epochs
|
||||
@@ -18,7 +18,7 @@ from keras.layers.core import Dense, Activation
|
||||
from keras.optimizers import RMSprop
|
||||
from keras.utils import np_utils
|
||||
import tensorflow as tf
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
|
||||
class TensorBoardImage(TensorBoard):
|
||||
@@ -88,7 +88,7 @@ model.compile(loss='categorical_crossentropy',
|
||||
optimizer=RMSprop(),
|
||||
metrics=['accuracy'])
|
||||
|
||||
# Connecting TRAINS
|
||||
# Connecting ClearML
|
||||
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example')
|
||||
task.connect_configuration({'test': 1337, 'nested': {'key': 'value', 'number': 1}})
|
||||
|
||||
|
||||
@@ -1,2 +1,2 @@
|
||||
trains
|
||||
clearml
|
||||
Keras>=2.2.4
|
||||
|
||||
@@ -1,11 +1,11 @@
|
||||
# TRAINS - Example of manual model configuration and uploading
|
||||
# ClearML - Example of manual model configuration and uploading
|
||||
#
|
||||
import os
|
||||
from tempfile import gettempdir
|
||||
|
||||
from keras import Input, layers, Model
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
|
||||
task = Task.init(project_name='examples', task_name='Model configuration and upload')
|
||||
|
||||
@@ -1,3 +1,3 @@
|
||||
Keras
|
||||
tensorflow>=2.0
|
||||
trains
|
||||
clearml
|
||||
@@ -3,9 +3,9 @@
|
||||
import kerastuner as kt
|
||||
import tensorflow as tf
|
||||
import tensorflow_datasets as tfds
|
||||
from trains.external.kerastuner import TrainsTunerLogger
|
||||
from clearml.external.kerastuner import TrainsTunerLogger
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
physical_devices = tf.config.list_physical_devices('GPU')
|
||||
if physical_devices:
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
keras-tuner
|
||||
tensorflow>=2.0
|
||||
tensorflow-datasets
|
||||
trains
|
||||
clearml
|
||||
@@ -1,4 +1,4 @@
|
||||
lightgbm
|
||||
scikit-learn
|
||||
pandas
|
||||
trains
|
||||
clearml
|
||||
@@ -1,10 +1,10 @@
|
||||
# TRAINS - Example of LightGBM integration
|
||||
# ClearML - Example of LightGBM integration
|
||||
#
|
||||
import lightgbm as lgb
|
||||
import pandas as pd
|
||||
from sklearn.metrics import mean_squared_error
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
task = Task.init(project_name="examples", task_name="LIGHTgbm")
|
||||
|
||||
|
||||
@@ -1,9 +1,9 @@
|
||||
# TRAINS - Example of Matplotlib and Seaborn integration and reporting
|
||||
# ClearML - Example of Matplotlib and Seaborn integration and reporting
|
||||
#
|
||||
import numpy as np
|
||||
import matplotlib.pyplot as plt
|
||||
import seaborn as sns
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
|
||||
task = Task.init(project_name='examples', task_name='Matplotlib example')
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
matplotlib >= 3.1.1 ; python_version >= '3.6'
|
||||
matplotlib >= 2.2.4 ; python_version < '3.6'
|
||||
seaborn
|
||||
trains
|
||||
clearml
|
||||
@@ -1,10 +1,10 @@
|
||||
# TRAINS - Example of manual model configuration and uploading
|
||||
# ClearML - Example of manual model configuration and uploading
|
||||
#
|
||||
import os
|
||||
from tempfile import gettempdir
|
||||
|
||||
import torch
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
|
||||
task = Task.init(project_name='examples', task_name='Model configuration and upload')
|
||||
|
||||
@@ -1,4 +1,4 @@
-# TRAINS - example of TRAINS torch distributed support
+# ClearML - example of ClearML torch distributed support
 # notice all nodes will be reporting to the master Task (experiment)

 import os
@@ -15,7 +15,7 @@ import torch.nn.functional as F
 from torch import optim
 from torchvision import datasets, transforms

-from trains import Task
+from clearml import Task


 local_dataset_path = './MNIST_data'
@@ -150,7 +150,7 @@ if __name__ == "__main__":

     # We have to initialize the task in the master process,
     # it will make sure that any sub-process calling Task.init will get the master task object
-    # notice that we exclude the `rank` argument, so we can launch multiple sub-processes with trains-agent
+    # notice that we exclude the `rank` argument, so we can launch multiple sub-processes with clearml-agent
     # otherwise, the `rank` will always be set to the original value.
     task = Task.init("examples", "test torch distributed", auto_connect_arg_parser={'rank': False})

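The `auto_connect_arg_parser={'rank': False}` line in the hunk above keeps the per-process `rank` argument out of the experiment's recorded hyper-parameters, so relaunched sub-processes can each receive their own rank. Conceptually, the exclusion behaves like this stdlib-only sketch; `connectable_args` is a hypothetical helper for illustration, not part of the ClearML API:

```python
# Stdlib-only sketch of the argument-exclusion idea behind
# `auto_connect_arg_parser={'rank': False}`: every parsed argument would be
# recorded with the experiment except those whose exclusion flag is False.
# `connectable_args` is a hypothetical helper, not ClearML API.
import argparse


def connectable_args(args, exclude):
    """Return the argparse namespace entries that should be recorded,
    dropping any argument explicitly excluded with False."""
    return {k: v for k, v in vars(args).items() if exclude.get(k, True)}


parser = argparse.ArgumentParser()
parser.add_argument('--rank', type=int, default=0)
parser.add_argument('--lr', type=float, default=0.01)
args = parser.parse_args(['--rank', '3'])

recorded = connectable_args(args, exclude={'rank': False})
print(recorded)  # {'lr': 0.01} - `rank` stays local to each sub-process
```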
@@ -1,4 +1,4 @@
-# TRAINS - Example of Pytorch and matplotlib integration and reporting
+# ClearML - Example of Pytorch and matplotlib integration and reporting
 #
 """
 Neural Transfer Using PyTorch
@@ -60,7 +60,7 @@ import torchvision.transforms as transforms
 import torchvision.models as models

 import copy
-from trains import Task
+from clearml import Task


 task = Task.init(project_name='examples', task_name='pytorch with matplotlib example', task_type=Task.TaskTypes.testing)
@@ -1,4 +1,4 @@
-# TRAINS - Example of Pytorch mnist training integration
+# ClearML - Example of Pytorch mnist training integration
 #
 from __future__ import print_function
 import argparse
@@ -11,7 +11,7 @@ import torch.nn.functional as F
 import torch.optim as optim
 from torchvision import datasets, transforms

-from trains import Task, Logger
+from clearml import Task, Logger


 class Net(nn.Module):
@@ -1,4 +1,4 @@
-# TRAINS - Example of pytorch with tensorboard>=v1.14
+# ClearML - Example of pytorch with tensorboard>=v1.14
 #
 from __future__ import print_function

@@ -14,7 +14,7 @@ from torchvision import datasets, transforms
 from torch.autograd import Variable
 from torch.utils.tensorboard import SummaryWriter

-from trains import Task
+from clearml import Task


 class Net(nn.Module):
@@ -99,7 +99,7 @@ def main():
     parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                         help='how many batches to wait before logging training status')
     args = parser.parse_args()
-    task = Task.init(project_name='examples', task_name='pytorch with tensorboard')  # noqa: F841
+    Task.init(project_name='examples', task_name='pytorch with tensorboard')
     writer = SummaryWriter('runs')
     writer.add_text('TEXT', 'This is some text', 0)
     args.cuda = not args.no_cuda and torch.cuda.is_available()
@@ -3,4 +3,4 @@ tensorboardX
 tensorboard>=1.14.0
 torch>=1.1.0
 torchvision>=0.3.0
-trains
+clearml
@@ -5,7 +5,7 @@ import numpy as np
 from PIL import Image
 from torch.utils.tensorboard import SummaryWriter

-from trains import Task
+from clearml import Task
 task = Task.init(project_name='examples', task_name='pytorch tensorboard toy example')


@@ -2,4 +2,4 @@ joblib>=0.13.2
 matplotlib >= 3.1.1 ; python_version >= '3.6'
 matplotlib >= 2.2.4 ; python_version < '3.6'
 scikit-learn
-trains
+clearml
@@ -10,7 +10,7 @@ import numpy as np
 import matplotlib.pyplot as plt


-from trains import Task
+from clearml import Task

 task = Task.init(project_name="examples", task_name="scikit-learn joblib example")

||||
@@ -6,7 +6,7 @@ from sklearn.model_selection import learning_curve
|
||||
from sklearn.naive_bayes import GaussianNB
|
||||
from sklearn.svm import SVC
|
||||
|
||||
from trains import Task
|
||||
from clearml import Task
|
||||
|
||||
|
||||
def plot_learning_curve(estimator, title, X, y, axes=None, ylim=None, cv=None, n_jobs=None,
|
||||
|
||||
@@ -1,4 +1,4 @@
-# TRAINS - Example of pytorch with tensorboardX
+# ClearML - Example of pytorch with tensorboardX
 #
 from __future__ import print_function

@@ -14,7 +14,7 @@ from tensorboardX import SummaryWriter
 from torch.autograd import Variable
 from torchvision import datasets, transforms

-from trains import Task
+from clearml import Task


 class Net(nn.Module):
@@ -1,4 +1,4 @@
 tensorboardX>=1.8
 torch>=1.1.0
 torchvision>=0.3.0
-trains
+clearml
Some files were not shown because too many files have changed in this diff.
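Every hunk above applies the same mechanical change: `trains` becomes `clearml` in imports, comments, agent references, and requirements pins. The rename can be sketched as a small stdlib rewriter; the substitution table mirrors the patterns seen in this diff, and file I/O is left out for brevity:

```python
# Sketch of the trains -> clearml rename performed by this commit.
# The patterns cover the cases visible in the hunks above; this is an
# illustration, not the tool the maintainers actually used.
import re

RENAMES = [
    (r'\bfrom trains import\b', 'from clearml import'),  # python imports
    (r'\btrains-agent\b', 'clearml-agent'),              # agent references in comments
    (r'^trains$', 'clearml'),                            # requirements.txt pin
]


def migrate(text):
    """Apply the trains -> clearml renames to a blob of text, line-aware."""
    for pattern, replacement in RENAMES:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


print(migrate('from trains import Task'))  # from clearml import Task
print(migrate('trains'))                   # clearml
```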