clearml initial version 0.17.0

allegroai
2020-12-22 23:25:37 +02:00
parent a460df1e68
commit d327f2e2b9
145 changed files with 3136 additions and 794 deletions

README.md

@@ -1,143 +1,134 @@
# Allegro Trains - new name is coming soon ;)
## Auto-Magical Experiment Manager, Version Control and ML-Ops for AI
<div align="center">
## :confetti_ball: Now with Full ML/DL DevOps - See [TRAINS AGENT](https://github.com/allegroai/trains-agent) and [Services](https://github.com/allegroai/trains-server#trains-agent-services--)
## :station: [Documentation is here!](https://allegro.ai/docs) `wubba lubba dub dub` and a [Slack Channel](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY) :train2:
## Features: [AWS autoscaler wizard](https://allegro.ai/docs/examples/services/aws_autoscaler/aws_autoscaler/) :robot: [Hyper-Parameter Optimization](https://allegro.ai/docs/examples/optimization/hyper-parameter-optimization/examples_hyperparam_opt/) and :electric_plug: [Pipeline Controllers](https://allegro.ai/docs/examples/pipeline/pipeline_controller/)
<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/trains/blob/master/docs/clearml-logo.svg?raw=true" width="250px"></a>
"Because it's a jungle out there"
[![GitHub license](https://img.shields.io/github/license/allegroai/trains.svg)](https://img.shields.io/github/license/allegroai/trains.svg)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/trains.svg)](https://img.shields.io/pypi/pyversions/trains.svg)
[![PyPI version shields.io](https://img.shields.io/pypi/v/trains.svg)](https://img.shields.io/pypi/v/trains.svg)
[![PyPI status](https://img.shields.io/pypi/status/trains.svg)](https://pypi.python.org/pypi/trains/)
**ClearML - Auto-Magical Suite of tools to streamline your ML workflow
Experiment Manager, ML-Ops and Data-Management**
[![GitHub license](https://img.shields.io/github/license/allegroai/trains.svg)](https://img.shields.io/github/license/allegroai/clearml.svg)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/clearml.svg)](https://img.shields.io/pypi/pyversions/clearml.svg)
[![PyPI version shields.io](https://img.shields.io/pypi/v/clearml.svg)](https://img.shields.io/pypi/v/clearml.svg)
[![PyPI status](https://img.shields.io/pypi/status/clearml.svg)](https://pypi.python.org/pypi/clearml/)
[![Optuna](https://img.shields.io/badge/Optuna-integrated-blue)](https://optuna.org)
[![Slack Channel](https://img.shields.io/badge/slack-%23trains--community-blueviolet?logo=slack)](https://join.slack.com/t/allegroai-trains/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg)
[![Slack Channel](https://img.shields.io/badge/slack-%23clearml--community-blueviolet?logo=slack)](https://join.slack.com/t/allegroai-trains/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg)
### :point_right: Help improve Trains by filling our 2-min [user survey](https://allegro.ai/lp/trains-user-survey/)
</div>
Trains is our solution to a problem we share with countless other researchers and developers in the machine
---
### ClearML
#### *Formerly known as Allegro Trains*
ClearML is an ML/DL development and production suite. It contains three main modules:
- [Experiment Manager](#clearml-experiment-management) - Automagical experiment tracking, environments and results
- [ML-Ops](https://github.com/allegroai/trains-agent) - Automation, Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)
- [Data-Management](https://github.com/allegroai/clearml/doc/clearml-data.md) - Fully differentiable data management & version control solution on top of object-storage
(S3/GS/Azure/NAS)
Instrumenting these components is the **ClearML-server**, see [Self-Hosting]() & [Free tier Hosting]()
---
<div align="center">
**[Signup](https://app.community.clear.ml) & [Start using](https://allegro.ai/clearml/docs/getting_started/getting_started/) in under 2 minutes**
</div>
---
<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/trains/blob/master/docs/webapp_screenshots.gif?raw=true" width="100%"></a>
## ClearML Experiment Manager
**Adding only 2 lines to your code gets you the following**
* Complete experiment setup log
* Full source control info including non-committed local changes
* Execution environment (including specific packages & versions)
* Hyper-parameters
* ArgParser for command line parameters with currently used values
* Explicit parameters dictionary
* Tensorflow Defines (absl-py)
* Hydra configuration and overrides
* Initial model weights file
* Full experiment output automatic capture
* stdout and stderr
* Resource Monitoring (CPU/GPU utilization, temperature, IO, network, etc.)
* Model snapshots (With optional automatic upload to central storage: Shared folder, S3, GS, Azure, Http)
* Artifacts log & store (Shared folder, S3, GS, Azure, Http)
* Tensorboard/TensorboardX scalars, metrics, histograms, **images, audio and video samples**
* [Matplotlib & Seaborn](https://github.com/allegroai/trains/tree/master/examples/frameworks/matplotlib)
* [ClearML Explicit Logging](https://allegro.ai/clearml/docs/examples/reporting/) interface for complete flexibility.
* Extensive platform support and integrations
* Supported ML/DL frameworks: [PyTorch](https://github.com/allegroai/trains/tree/master/examples/frameworks/pytorch)(incl' ignite/lightning), [Tensorflow](https://github.com/allegroai/trains/tree/master/examples/frameworks/tensorflow), [Keras](https://github.com/allegroai/trains/tree/master/examples/frameworks/keras), [AutoKeras](https://github.com/allegroai/trains/tree/master/examples/frameworks/autokeras), [XGBoost](https://github.com/allegroai/trains/tree/master/examples/frameworks/xgboost) and [Scikit-Learn](https://github.com/allegroai/trains/tree/master/examples/frameworks/scikit-learn)
* Seamless integration (including version control) with **Jupyter Notebook**
and [*PyCharm* remote debugging](https://github.com/allegroai/trains-pycharm-plugin)
#### [Start using ClearML](https://allegro.ai/clearml/docs/getting_started/getting_started/)
```bash
pip install clearml
```
Add two lines to your code:
```python
from clearml import Task
task = Task.init(project_name='examples', task_name='hello world')
```
You are done; everything your process outputs is now automagically logged into ClearML.
<br>Next step, automation! **Learn more about ClearML's two-click automation [here]()**
## ClearML Architecture
The ClearML run-time components:
* The ClearML Python Package for integrating ClearML into your existing scripts by adding just two lines of code, and optionally extending your experiments and other workflows with ClearML's powerful and versatile set of classes and methods.
* The ClearML Server storing experiment, model, and workflow data, and supporting the Web UI experiment manager, and ML-Ops automation for reproducibility and tuning. It is available as a hosted service and open source for you to deploy your own ClearML Server.
* The ClearML Agent for ML-Ops orchestration, experiment and workflow reproducibility, and scalability.
<img src="https://allegro.ai/clearml/docs/img/ClearML_Architecture.png" width="100%" alt="clearml-architecture">
## Additional Modules
- [clearml-session](https://github.com/allegroai/clearml-session) - **Launch remote JupyterLab / VSCode-server inside any docker, on Cloud/On-Prem machines**
- [clearml-task](https://github.com/allegroai/clearml/doc/clearml-task.md) - Run any codebase on remote machines with full remote logging of Tensorboard, Matplotlib & Console outputs
- [clearml-data](https://github.com/allegroai/clearml/doc/clearml-data.md) - **CLI for managing and versioning your datasets, including creating / uploading / downloading of data from S3/GS/Azure/NAS**
- [AWS Auto-Scaler](examples/services/aws-autoscaler/aws_autoscaler.py) - Automatically spin up EC2 instances based on your workloads with a preconfigured budget! No need for K8s!
- [Hyper-Parameter Optimization](examples/services/hyper-parameter-optimization/hyper_parameter_optimizer.py) - Optimize any code with a black-box approach and state-of-the-art Bayesian optimization algorithms
- [Automation Pipeline](examples/pipeline/pipeline_controller.py) - Build pipelines based on existing experiments / jobs, supports building pipelines of pipelines!
- [Slack Integration](examples/services/monitoring/slack_alerts.py) - Report experiments progress / failure directly to Slack (fully customizable!)
## Why ClearML?
ClearML is our solution to a problem we share with countless other researchers and developers in the machine
learning/deep learning universe: Training production-grade deep learning models is a glorious but messy process.
Trains tracks and controls the process by associating code version control, research projects,
ClearML tracks and controls the process by associating code version control, research projects,
performance metrics, and model provenance.
We designed Trains specifically to require effortless integration so that teams can preserve their existing methods
and practices. Use it on a daily basis to boost collaboration and visibility, or use it to automatically collect
your experimentation logs, outputs, and data to one centralized server.
We designed ClearML specifically to require effortless integration so that teams can preserve their existing methods
and practices.
**We have a demo server up and running at [https://demoapp.trains.allegro.ai](https://demoapp.trains.allegro.ai).**
### :steam_locomotive: [Getting Started Tutorial](https://allegro.ai/blog/setting-up-allegro-ai-platform/) :rocket:
**You can try out Trains and [test your code](#integrate-trains), with no additional setup.**
<a href="https://demoapp.trains.allegro.ai"><img src="https://github.com/allegroai/trains/blob/master/docs/webapp_screenshots.gif?raw=true" width="100%"></a>
## Trains Automatically Logs Everything
**With only two lines of code, this is what you are getting:**
* Git repository, branch, commit id, entry point and local git diff
* Python environment (including specific packages & versions)
* stdout and stderr
* Resource Monitoring (CPU/GPU utilization, temperature, IO, network, etc.)
* Hyper-parameters
* ArgParser for command line parameters with currently used values
* Explicit parameters dictionary
* Tensorflow Defines (absl-py)
* Initial model weights file
* Model snapshots (With optional automatic upload to central storage: Shared folder, S3, GS, Azure, Http)
* Artifacts log & store (Shared folder, S3, GS, Azure, Http)
* Tensorboard/TensorboardX scalars, metrics, histograms, **images, audio and video**
* [Matplotlib & Seaborn](https://github.com/allegroai/trains/tree/master/examples/frameworks/matplotlib)
* Supported frameworks: [PyTorch](https://github.com/allegroai/trains/tree/master/examples/frameworks/pytorch), [Tensorflow](https://github.com/allegroai/trains/tree/master/examples/frameworks/tensorflow), [Keras](https://github.com/allegroai/trains/tree/master/examples/frameworks/keras), [AutoKeras](https://github.com/allegroai/trains/tree/master/examples/frameworks/autokeras), [XGBoost](https://github.com/allegroai/trains/tree/master/examples/frameworks/xgboost) and [Scikit-Learn](https://github.com/allegroai/trains/tree/master/examples/frameworks/scikit-learn) (MxNet is coming soon)
* Seamless integration (including version control) with **Jupyter Notebook**
and [*PyCharm* remote debugging](https://github.com/allegroai/trains-pycharm-plugin)
**Additionally, log data explicitly using [Trains Explicit Logging](https://allegro.ai/docs/examples/reporting/).**
## Using Trains <a name="using-trains"></a>
Trains is a two part solution:
1. Trains [python package](https://pypi.org/project/trains/) auto-magically connects with your code
**Trains requires only two lines of code for full integration.**
To connect your code with Trains:
- Install Trains <a name="integrate-trains"></a>
pip install trains
<details>
<summary>Add optional cloud storage support (S3/GoogleStorage/Azure):</summary>
```bash
pip install trains[s3]
pip install trains[gs]
pip install trains[azure]
```
</details>
- Add the following lines to your code
from trains import Task
task = Task.init(project_name="my project", task_name="my task")
* If project_name is not provided, the repository name will be used instead
* If task_name (experiment) is not provided, the current filename will be used instead
- Run your code. When Trains connects to the server, a link is printed. For example
Trains Results page:
https://demoapp.trains.allegro.ai/projects/76e5e2d45e914f52880621fe64601e85/experiments/241f06ae0f5c4b27b8ce8b64890ce152/output/log
- Open the link and view your experiment parameters, model and tensorboard metrics
**See examples [here](https://allegro.ai/docs/examples/examples_overview/)**
2. [Trains Server](https://github.com/allegroai/trains-server) for logging, querying, control and UI ([Web-App](https://github.com/allegroai/trains-web))
**We already have a demo server up and running for you at [https://demoapp.trains.allegro.ai](https://demoapp.trains.allegro.ai).**
**You can try out Trains without the need to install your own *trains-server*, just add the two lines of code, and it will automatically connect to the Trains demo-server.**
*Note that the demo server resets every 24 hours and all of the logged data is deleted.*
When you are ready to use your own Trains server, go ahead and [install *trains-server*](https://github.com/allegroai/trains-server).
<img src="https://github.com/allegroai/trains/blob/master/docs/system_diagram.png?raw=true" width="50%">
## Configuring Your Own Trains server <a name="configuration"></a>
1. Install and run *trains-server* (see [Installing the Trains Server](https://github.com/allegroai/trains-server))
2. Run the initial configuration wizard for your Trains installation and follow the instructions to set up the Trains package
(http://**_trains-server-ip_**:__port__ and user credentials)
trains-init
After installing and configuring, you can access your configuration file at `~/trains.conf`
Sample configuration file available [here](https://github.com/allegroai/trains/blob/master/docs/trains.conf).
- Use it on a daily basis to boost collaboration and visibility in your team
- Create a remote job from any experiment with a click of a button
- Automate processes and create pipelines to collect your experimentation logs, outputs, and data
- Store all your data on any object-storage solution, with the simplest interface possible
- Make your data transparent by cataloging it all on the ClearML platform
We believe ClearML is ground-breaking. We wish to establish new standards of true seamless integration between
experiment management, ML-Ops and data management.
## Who We Are
Trains is supported by the same team behind *allegro.ai*,
ClearML is supported by the team behind *allegro.ai*,
where we build deep learning pipelines and infrastructure for enterprise companies.
We built Trains to track and control the glorious but messy process of training production-grade deep learning models.
We are committed to vigorously supporting and expanding the capabilities of Trains.
We built ClearML to track and control the glorious but messy process of training production-grade deep learning models.
We are committed to vigorously supporting and expanding the capabilities of ClearML.
## Why Are We Releasing Trains?
We believe Trains is ground-breaking. We wish to establish new standards of experiment management in
deep-learning and ML. Only the greater community can help us do that.
We promise to always be backwardly compatible. If you start working with Trains today,
even though this project is currently in the beta stage, your logs and data will always upgrade with you.
We promise to always be backwardly compatible, making sure all your logs, data and pipelines
will always upgrade with you.
## License
@@ -145,19 +136,19 @@ Apache License, Version 2.0 (see the [LICENSE](https://www.apache.org/licenses/L
## Documentation, Community & Support
More information in the [official documentation](https://allegro.ai/docs) and [on YouTube](https://www.youtube.com/c/AllegroAI).
More information in the [official documentation](https://allegro.ai/clearml/docs) and [on YouTube](https://www.youtube.com/c/AllegroAI).
For examples and use cases, check the [examples folder](https://github.com/allegroai/trains/tree/master/examples) and [corresponding documentation](https://allegro.ai/docs/examples/examples_overview/).
For examples and use cases, check the [examples folder](https://github.com/allegroai/trains/tree/master/examples) and [corresponding documentation](https://allegro.ai/clearml/docs/examples/examples_overview/).
If you have any questions: post on our [Slack Channel](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY), or tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/trains) with '**trains**' tag.
For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains/issues).
Additionally, you can always find us at *trains@allegro.ai*
Additionally, you can always find us at *clearml@allegro.ai*
## Contributing
See the Trains [Guidelines for Contributing](https://github.com/allegroai/trains/blob/master/docs/contributing.md).
See the ClearML [Guidelines for Contributing](https://github.com/allegroai/trains/blob/master/docs/contributing.md).
_May the force (and the goddess of learning rates) be with you!_


@@ -1,4 +1,4 @@
""" TRAINS open SDK """
""" ClearML open SDK """
from .version import __version__
from .task import Task
@@ -6,5 +6,7 @@ from .model import InputModel, OutputModel, Model
from .logger import Logger
from .storage import StorageManager
from .errors import UsageError
from .datasets import Dataset
__all__ = ["__version__", "Task", "InputModel", "OutputModel", "Model", "Logger", "StorageManager", "UsageError"]
__all__ = ["__version__", "Task", "InputModel", "OutputModel", "Model", "Logger",
"StorageManager", "UsageError", "Dataset"]


@@ -1,6 +1,7 @@
from .parameters import UniformParameterRange, DiscreteParameterRange, UniformIntegerParameterRange, ParameterSet
from .optimization import GridSearch, RandomSearch, HyperParameterOptimizer, Objective
from .job import TrainsJob
from .controller import PipelineController
__all__ = ["UniformParameterRange", "DiscreteParameterRange", "UniformIntegerParameterRange", "ParameterSet",
"GridSearch", "RandomSearch", "HyperParameterOptimizer", "Objective", "TrainsJob"]
"GridSearch", "RandomSearch", "HyperParameterOptimizer", "Objective", "TrainsJob", "PipelineController"]


@@ -102,15 +102,15 @@ class AutoScaler(object):
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
"""
Creates a new worker for trains (cloud-specific implementation).
Creates a new worker for clearml (cloud-specific implementation).
First, create an instance in the cloud and install some required packages.
Then, define trains-agent environment variables and run trains-agent for the specified queue.
Then, define clearml-agent environment variables and run clearml-agent for the specified queue.
NOTE: - Will wait until instance is running
- This implementation assumes the instance image already has docker installed
:param str resource: resource name, as defined in self.resource_configurations and self.queues.
:param str worker_id_prefix: worker name prefix
:param str queue_name: trains queue to listen to
:param str queue_name: clearml queue to listen to
"""
pass
@@ -137,17 +137,17 @@ class AutoScaler(object):
minutes would be removed.
"""
# Worker's id in trains would be composed from prefix, name, instance_type and cloud_id separated by ';'
# Worker's id in clearml is composed of prefix, name, instance_type and cloud_id, separated by ':'
workers_pattern = re.compile(
r"^(?P<prefix>[^:]+):(?P<name>[^:]+):(?P<instance_type>[^:]+):(?P<cloud_id>[^:]+)"
)
# Set up the environment variables for trains
os.environ["TRAINS_API_HOST"] = self.api_server
os.environ["TRAINS_WEB_HOST"] = self.web_server
os.environ["TRAINS_FILES_HOST"] = self.files_server
os.environ["TRAINS_API_ACCESS_KEY"] = self.access_key
os.environ["TRAINS_API_SECRET_KEY"] = self.secret_key
# Set up the environment variables for clearml
os.environ["CLEARML_API_HOST"] = self.api_server
os.environ["CLEARML_WEB_HOST"] = self.web_server
os.environ["CLEARML_FILES_HOST"] = self.files_server
os.environ["CLEARML_API_ACCESS_KEY"] = self.access_key
os.environ["CLEARML_API_SECRET_KEY"] = self.secret_key
api_client = APIClient()
# Verify the requested queues exist and create those that don't exist
@@ -234,7 +234,7 @@ class AutoScaler(object):
# skip resource types that might be needed
if resources in required_idle_resources:
continue
# Remove from both aws and trains all instances that are idle for longer than MAX_IDLE_TIME_MIN
# Remove from both aws and clearml all instances that are idle for longer than MAX_IDLE_TIME_MIN
if time() - timestamp > self.max_idle_time_min * 60.0:
cloud_id = workers_pattern.match(worker.id)["cloud_id"]
self.spin_down_worker(cloud_id)
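As an illustration of the worker-id format, here is a quick sketch (the id values below are made up) of how the pattern above extracts the cloud instance id that `spin_down_worker` receives:

```python
import re

# Same pattern as above: "prefix:name:instance_type:cloud_id", colon-separated.
workers_pattern = re.compile(
    r"^(?P<prefix>[^:]+):(?P<name>[^:]+):(?P<instance_type>[^:]+):(?P<cloud_id>[^:]+)"
)

# Hypothetical worker id reported by an auto-scaled agent.
match = workers_pattern.match("aws_asg:gpu_worker:m5.xlarge:i-0abc123def4567890")
print(match["instance_type"])  # -> m5.xlarge
print(match["cloud_id"])       # -> i-0abc123def4567890
```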


@@ -31,15 +31,15 @@ class AwsAutoScaler(AutoScaler):
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
"""
Creates a new worker for trains.
Creates a new worker for clearml.
First, create an instance in the cloud and install some required packages.
Then, define trains-agent environment variables and run trains-agent for the specified queue.
Then, define clearml-agent environment variables and run clearml-agent for the specified queue.
NOTE: - Will wait until instance is running
- This implementation assumes the instance image already has docker installed
:param str resource: resource name, as defined in BUDGET and QUEUES.
:param str worker_id_prefix: worker name prefix
:param str queue_name: trains queue to listen to
:param str queue_name: clearml queue to listen to
"""
resource_conf = self.resource_configurations[resource]
# Add worker type and AWS instance type to the worker name.
@@ -50,7 +50,7 @@ class AwsAutoScaler(AutoScaler):
)
# user_data script will automatically run when the instance is started. it will install the required packages
# for trains-agent configure it using environment variables and run trains-agent on the required queue
# for clearml-agent, configure it using environment variables, and run clearml-agent on the required queue
user_data = """#!/bin/bash
sudo apt-get update
sudo apt-get install -y python3-dev
@@ -60,22 +60,22 @@ class AwsAutoScaler(AutoScaler):
sudo apt-get install -y build-essential
python3 -m pip install -U pip
python3 -m pip install virtualenv
python3 -m virtualenv trains_agent_venv
source trains_agent_venv/bin/activate
python -m pip install trains-agent
echo 'agent.git_user=\"{git_user}\"' >> /root/trains.conf
echo 'agent.git_pass=\"{git_pass}\"' >> /root/trains.conf
echo "{trains_conf}" >> /root/trains.conf
export TRAINS_API_HOST={api_server}
export TRAINS_WEB_HOST={web_server}
export TRAINS_FILES_HOST={files_server}
python3 -m virtualenv clearml_agent_venv
source clearml_agent_venv/bin/activate
python -m pip install clearml-agent
echo 'agent.git_user=\"{git_user}\"' >> /root/clearml.conf
echo 'agent.git_pass=\"{git_pass}\"' >> /root/clearml.conf
echo "{clearml_conf}" >> /root/clearml.conf
export CLEARML_API_HOST={api_server}
export CLEARML_WEB_HOST={web_server}
export CLEARML_FILES_HOST={files_server}
export DYNAMIC_INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`
export TRAINS_WORKER_ID={worker_id}:$DYNAMIC_INSTANCE_ID
export TRAINS_API_ACCESS_KEY='{access_key}'
export TRAINS_API_SECRET_KEY='{secret_key}'
export CLEARML_WORKER_ID={worker_id}:$DYNAMIC_INSTANCE_ID
export CLEARML_API_ACCESS_KEY='{access_key}'
export CLEARML_API_SECRET_KEY='{secret_key}'
{bash_script}
source ~/.bashrc
python -m trains_agent --config-file '/root/trains.conf' daemon --queue '{queue}' {docker}
python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker}
shutdown
""".format(
api_server=self.api_server,
@@ -87,7 +87,7 @@ class AwsAutoScaler(AutoScaler):
queue=queue_name,
git_user=self.git_user or "",
git_pass=self.git_pass or "",
trains_conf='\\"'.join(self.extra_trains_conf.split('"')),
clearml_conf='\\"'.join(self.extra_trains_conf.split('"')),
bash_script=self.extra_vm_bash_script,
docker="--docker '{}'".format(self.default_docker_image)
if self.default_docker_image
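For clarity, a small sketch of the quote-escaping applied to `clearml_conf` above: each double quote in the extra configuration text gets a backslash prefix so it survives interpolation into the double-quoted `echo` line of the `user_data` script (the configuration value shown is made up):

```python
# Made-up extra configuration text containing double quotes.
extra_clearml_conf = 'agent.git_user: "bob"'

# Same transform as above: split on '"' and re-join with a backslash-quote.
escaped = '\\"'.join(extra_clearml_conf.split('"'))
print(escaped)  # -> agent.git_user: \"bob\"
```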


@@ -17,7 +17,7 @@ class PipelineController(object):
"""
Pipeline controller.
Pipeline is a DAG of base tasks; each task will be cloned (arguments changed as required), executed, and monitored.
The pipeline process (task) itself can be executed manually or by the trains-agent services queue.
The pipeline process (task) itself can be executed manually or by the clearml-agent services queue.
Notice: The pipeline controller lives as long as the pipeline itself is being executed.
"""
_tag = 'pipeline'
@@ -601,7 +601,7 @@ class PipelineController(object):
print('Parameters:\n{}'.format(self._nodes[name].job.task_parameter_override))
self._running_nodes.append(name)
else:
getLogger('trains.automation.controller').error(
getLogger('clearml.automation.controller').error(
'ERROR: Failed launching step \'{}\': {}'.format(name, self._nodes[name]))
# update current state (in configuration, so that we could later continue an aborted pipeline)


@@ -8,7 +8,7 @@ from ..task import Task
from ..backend_api.services import tasks as tasks_service
logger = getLogger('trains.automation.job')
logger = getLogger('clearml.automation.job')
class TrainsJob(object):


@@ -22,7 +22,7 @@ class Monitor(object):
self._project_ids = None
self._projects = None
self._projects_refresh_timestamp = None
self._trains_apiclient = None
self._clearml_apiclient = None
def set_projects(self, project_names=None, project_names_re=None, project_ids=None):
# type: (Optional[Sequence[str]], Optional[Sequence[str]], Optional[Sequence[str]]) -> ()
@@ -167,10 +167,10 @@ class Monitor(object):
def _get_api_client(self):
# type: () -> APIClient
"""
Return an APIClient object to directly query the trains-server
Return an APIClient object to directly query the clearml-server
:return: APIClient object
"""
if not self._trains_apiclient:
self._trains_apiclient = APIClient()
return self._trains_apiclient
if not self._clearml_apiclient:
self._clearml_apiclient = APIClient()
return self._clearml_apiclient


@@ -15,7 +15,7 @@ from ..logger import Logger
from ..backend_api.services import workers as workers_service, tasks as tasks_services
from ..task import Task
logger = getLogger('trains.automation.optimization')
logger = getLogger('clearml.automation.optimization')
try:
@@ -878,9 +878,9 @@ class HyperParameterOptimizer(object):
:linenos:
:caption: Example
from trains import Task
from trains.automation import UniformParameterRange, DiscreteParameterRange
from trains.automation import GridSearch, RandomSearch, HyperParameterOptimizer
from clearml import Task
from clearml.automation import UniformParameterRange, DiscreteParameterRange
from clearml.automation import GridSearch, RandomSearch, HyperParameterOptimizer
task = Task.init('examples', 'HyperParameterOptimizer example')
an_optimizer = HyperParameterOptimizer(


@@ -8,7 +8,7 @@ from ..utilities.check_updates import Version
class ApiServiceProxy(object):
_main_services_module = "trains.backend_api.services"
_main_services_module = "clearml.backend_api.services"
_available_versions = None
def __init__(self, module):


@@ -1,16 +1,16 @@
{
version: 1.5
# default api_server: https://demoapi.trains.allegro.ai
# default api_server: https://demoapi.clearml.allegro.ai
api_server: ""
# default web_server: https://demoapp.trains.allegro.ai
# default web_server: https://demoapp.clearml.allegro.ai
web_server: ""
# default files_server: https://demofiles.trains.allegro.ai
# default files_server: https://demofiles.clearml.allegro.ai
files_server: ""
# verify host ssl certificate, set to False only if you have a very good reason
verify_certificate: True
# default demoapi.trains.allegro.ai credentials
# default demoapi.clearml.allegro.ai credentials
credentials {
access_key: ""
secret_key: ""


@@ -107,15 +107,15 @@ class StrictSession(Session):
init()
return
original = os.environ.get(LOCAL_CONFIG_FILE_OVERRIDE_VAR, None)
original = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get() or None
try:
os.environ[LOCAL_CONFIG_FILE_OVERRIDE_VAR] = str(config_file)
LOCAL_CONFIG_FILE_OVERRIDE_VAR.set(str(config_file))
init()
finally:
if original is None:
os.environ.pop(LOCAL_CONFIG_FILE_OVERRIDE_VAR, None)
LOCAL_CONFIG_FILE_OVERRIDE_VAR.pop()
else:
os.environ[LOCAL_CONFIG_FILE_OVERRIDE_VAR] = original
LOCAL_CONFIG_FILE_OVERRIDE_VAR.set(original)
def send(self, request, *args, **kwargs):
result = super(StrictSession, self).send(request, *args, **kwargs)
@@ -560,4 +560,4 @@ class APIClient(object):
for name, module in services.items()
},
)
)
)


@@ -2,12 +2,14 @@ from ...backend_config import EnvEntry
from ...backend_config.converters import safe_text_to_bool
ENV_HOST = EnvEntry("TRAINS_API_HOST", "ALG_API_HOST")
ENV_WEB_HOST = EnvEntry("TRAINS_WEB_HOST", "ALG_WEB_HOST")
ENV_FILES_HOST = EnvEntry("TRAINS_FILES_HOST", "ALG_FILES_HOST")
ENV_ACCESS_KEY = EnvEntry("TRAINS_API_ACCESS_KEY", "ALG_API_ACCESS_KEY")
ENV_SECRET_KEY = EnvEntry("TRAINS_API_SECRET_KEY", "ALG_API_SECRET_KEY")
ENV_VERBOSE = EnvEntry("TRAINS_API_VERBOSE", "ALG_API_VERBOSE", type=bool, default=False)
ENV_HOST_VERIFY_CERT = EnvEntry("TRAINS_API_HOST_VERIFY_CERT", "ALG_API_HOST_VERIFY_CERT", type=bool, default=True)
ENV_OFFLINE_MODE = EnvEntry("TRAINS_OFFLINE_MODE", "ALG_OFFLINE_MODE", type=bool, converter=safe_text_to_bool)
ENV_TRAINS_NO_DEFAULT_SERVER = EnvEntry("TRAINS_NO_DEFAULT_SERVER", "ALG_NO_DEFAULT_SERVER", type=bool, default=False)
ENV_HOST = EnvEntry("CLEARML_API_HOST", "TRAINS_API_HOST")
ENV_WEB_HOST = EnvEntry("CLEARML_WEB_HOST", "TRAINS_WEB_HOST")
ENV_FILES_HOST = EnvEntry("CLEARML_FILES_HOST", "TRAINS_FILES_HOST")
ENV_ACCESS_KEY = EnvEntry("CLEARML_API_ACCESS_KEY", "TRAINS_API_ACCESS_KEY")
ENV_SECRET_KEY = EnvEntry("CLEARML_API_SECRET_KEY", "TRAINS_API_SECRET_KEY")
ENV_VERBOSE = EnvEntry("CLEARML_API_VERBOSE", "TRAINS_API_VERBOSE", type=bool, default=False)
ENV_HOST_VERIFY_CERT = EnvEntry("CLEARML_API_HOST_VERIFY_CERT", "TRAINS_API_HOST_VERIFY_CERT",
type=bool, default=True)
ENV_OFFLINE_MODE = EnvEntry("CLEARML_OFFLINE_MODE", "TRAINS_OFFLINE_MODE", type=bool, converter=safe_text_to_bool)
ENV_TRAINS_NO_DEFAULT_SERVER = EnvEntry("CLEARML_NO_DEFAULT_SERVER", "TRAINS_NO_DEFAULT_SERVER",
type=bool, default=False)
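The entries above keep the old `TRAINS_*` names as fallbacks for the new `CLEARML_*` variables. A hypothetical minimal sketch of that lookup order (not the actual `EnvEntry` implementation, which also handles types, converters and defaults):

```python
import os

def env_entry(*keys, default=None):
    # The first defined variable wins, so CLEARML_* overrides its TRAINS_* alias.
    for key in keys:
        if key in os.environ:
            return os.environ[key]
    return default

os.environ["TRAINS_API_HOST"] = "http://legacy-server:8008"
print(env_entry("CLEARML_API_HOST", "TRAINS_API_HOST"))  # -> http://legacy-server:8008

os.environ["CLEARML_API_HOST"] = "http://new-server:8008"
print(env_entry("CLEARML_API_HOST", "TRAINS_API_HOST"))  # -> http://new-server:8008
```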


@@ -36,12 +36,12 @@ class MaxRequestSizeError(Exception):
class Session(TokenManager):
""" TRAINS API Session class. """
""" ClearML API Session class. """
_AUTHORIZATION_HEADER = "Authorization"
_WORKER_HEADER = "X-Trains-Worker"
_ASYNC_HEADER = "X-Trains-Async"
_CLIENT_HEADER = "X-Trains-Client"
_WORKER_HEADER = ("X-ClearML-Worker", "X-Trains-Worker", )
_ASYNC_HEADER = ("X-ClearML-Async", "X-Trains-Async", )
_CLIENT_HEADER = ("X-ClearML-Client", "X-Trains-Client", )
_async_status_code = 202
_session_requests = 0
@@ -57,10 +57,10 @@ class Session(TokenManager):
_client = [(__package__.partition(".")[0], __version__)]
api_version = '2.1'
default_demo_host = "https://demoapi.trains.allegro.ai"
default_demo_host = "https://demoapi.demo.clear.ml"
default_host = default_demo_host
default_web = "https://demoapp.trains.allegro.ai"
default_files = "https://demofiles.trains.allegro.ai"
default_web = "https://demoapp.demo.clear.ml"
default_files = "https://demofiles.demo.clear.ml"
default_key = "EGRTCO8JMSIGI6S39GTP43NFWXDQOW"
default_secret = "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"
force_max_api_version = None
@@ -177,8 +177,8 @@ class Session(TokenManager):
if not api_version:
api_version = '2.2' if token_dict.get('env', '') == 'prod' else Session.api_version
if token_dict.get('server_version'):
if not any(True for c in Session._client if c[0] == 'trains-server'):
Session._client.append(('trains-server', token_dict.get('server_version'), ))
if not any(True for c in Session._client if c[0] == 'clearml-server'):
Session._client.append(('clearml-server', token_dict.get('server_version'), ))
Session.api_version = str(api_version)
except (jwt.DecodeError, ValueError):
@@ -218,10 +218,13 @@ class Session(TokenManager):
if self._offline_mode:
return None
res = None
host = self.host
headers = headers.copy() if headers else {}
headers[self._WORKER_HEADER] = self.worker
headers[self._CLIENT_HEADER] = self.client
for h in self._WORKER_HEADER:
headers[h] = self.worker
for h in self._CLIENT_HEADER:
headers[h] = self.client
token_refreshed_on_error = False
url = (
@@ -308,7 +311,8 @@ class Session(TokenManager):
headers.copy() if headers else {}
)
if async_enable:
headers[self._ASYNC_HEADER] = "1"
for h in self._ASYNC_HEADER:
headers[h] = "1"
return self._send_request(
service=service,
action=action,
@@ -508,7 +512,7 @@ class Session(TokenManager):
if parsed.port == 8008:
return host.replace(':8008', ':8080', 1)
raise ValueError('Could not detect TRAINS web application server')
raise ValueError('Could not detect ClearML web application server')
@classmethod
def get_files_server_host(cls, config=None):
@@ -624,7 +628,7 @@ class Session(TokenManager):
# check if this is a misconfigured api server (getting 200 without the data section)
if res and res.status_code == 200:
raise ValueError('It seems *api_server* is misconfigured. '
'Is this the TRAINS API server {} ?'.format(self.host))
'Is this the ClearML API server {} ?'.format(self.host))
else:
raise LoginError("Response data mismatch: No 'token' in 'data' value from res, receive : {}, "
"exception: {}".format(res, ex))


@@ -14,7 +14,7 @@ if six.PY3:
from functools import lru_cache
elif six.PY2:
# python 2 support
from backports.functools_lru_cache import lru_cache
from backports.functools_lru_cache import lru_cache # noqa
__disable_certificate_verification_warning = 0
@@ -139,7 +139,7 @@ def get_http_session_with_retry(
if not session.verify and __disable_certificate_verification_warning < 2:
# show warning
__disable_certificate_verification_warning += 1
logging.getLogger('trains').warning(
logging.getLogger('clearml').warning(
msg='InsecureRequestWarning: Certificate verification is disabled! Adding '
'certificate verification is strongly advised. See: '
'https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings')

View File

@@ -88,7 +88,7 @@ class Config(object):
self._folder_name = config_folder or DEFAULT_CONFIG_FOLDER
self._roots = []
self._config = ConfigTree()
self._env = env or os.environ.get("TRAINS_ENV", Environment.default)
self._env = env or os.environ.get("CLEARML_ENV", os.environ.get("TRAINS_ENV", Environment.default))
self.config_paths = set()
self.is_server = is_server
@@ -139,7 +139,7 @@ class Config(object):
else:
env_config_paths = []
env_config_path_override = os.environ.get(ENV_CONFIG_PATH_OVERRIDE_VAR)
env_config_path_override = ENV_CONFIG_PATH_OVERRIDE_VAR.get()
if env_config_path_override:
env_config_paths = [expanduser(env_config_path_override)]
@@ -166,7 +166,7 @@ class Config(object):
)
local_config_files = LOCAL_CONFIG_FILES
local_config_override = os.environ.get(LOCAL_CONFIG_FILE_OVERRIDE_VAR)
local_config_override = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get()
if local_config_override:
local_config_files = [expanduser(local_config_override)]


@@ -1,7 +1,10 @@
from os.path import expanduser
from pathlib2 import Path
ENV_VAR = 'TRAINS_ENV'
from .environment import EnvEntry
ENV_VAR = 'CLEARML_ENV'
""" Name of system environment variable that can be used to specify the config environment name """
@@ -25,15 +28,16 @@ LOCAL_CONFIG_PATHS = [
LOCAL_CONFIG_FILES = [
expanduser('~/trains.conf'), # used for workstation configuration (end-users, workers)
expanduser('~/clearml.conf'), # used for workstation configuration (end-users, workers)
]
""" Local config files (not paths) """
LOCAL_CONFIG_FILE_OVERRIDE_VAR = 'TRAINS_CONFIG_FILE'
LOCAL_CONFIG_FILE_OVERRIDE_VAR = EnvEntry("CLEARML_CONFIG_FILE", "TRAINS_CONFIG_FILE")
""" Local config file override environment variable. If this is set, no other local config files will be used. """
ENV_CONFIG_PATH_OVERRIDE_VAR = 'TRAINS_CONFIG_PATH'
ENV_CONFIG_PATH_OVERRIDE_VAR = EnvEntry("CLEARML_CONFIG_PATH", "TRAINS_CONFIG_PATH")
"""
Environment-related config path override environment variable. If this is set, no other env config path will be used.
"""


@@ -85,9 +85,10 @@ class Entry(object):
return self.get_pair(default=default, converter=converter)[1]
def set(self, value):
# type: (Any, Any) -> (Text, Any)
key, _ = self.get_pair(default=None, converter=None)
self._set(key, str(value))
# type: (Any) -> ()
# key, _ = self.get_pair(default=None, converter=None)
for k in self.keys:
self._set(k, str(value))
def _set(self, key, value):
# type: (Text, Text) -> None


@@ -15,6 +15,10 @@ class EnvEntry(Entry):
super(EnvEntry, self).__init__(key, *more_keys, **kwargs)
self._ignore_errors = kwargs.pop('ignore_errors', False)
def pop(self):
for k in self.keys:
environ.pop(k, None)
def _get(self, key):
value = getenv(key, "").strip()
return value or NotSet
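`Entry.set` now writes every alias and the new `EnvEntry.pop` clears every alias, so `CLEARML_*` and `TRAINS_*` readers never disagree. A minimal sketch of both behaviors (hypothetical helpers, not the class methods themselves):

```python
import os

def set_env_aliases(keys, value):
    # Write the value under every alias so both CLEARML_* and TRAINS_* readers agree.
    for key in keys:
        os.environ[key] = str(value)

def pop_env_aliases(keys):
    # Remove the variable under every alias, mirroring EnvEntry.pop().
    for key in keys:
        os.environ.pop(key, None)
```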


@@ -5,11 +5,11 @@ from pathlib2 import Path
def logger(path=None):
name = "trains"
name = "clearml"
if path:
p = Path(path)
module = (p.parent if p.stem.startswith('_') else p).stem
name = "trains.%s" % module
name = "clearml.%s" % module
return logging.getLogger(name)
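The naming scheme above derives a child logger from a module path, with private modules (stem starting with `_`) falling back to their parent folder name. A worked sketch of just the name derivation (using the stdlib `pathlib`, whereas the original uses `pathlib2` for Python 2 support):

```python
from pathlib import Path

def logger_name(path=None):
    # Derive the child-logger name the same way logger() does:
    # private modules (stem starting with '_') use the parent folder name.
    name = "clearml"
    if path:
        p = Path(path)
        module = (p.parent if p.stem.startswith('_') else p).stem
        name = "clearml.%s" % module
    return name
```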


@@ -50,6 +50,7 @@ class MetricsEventAdapter(object):
url = attr.attrib(default=None)
exception = attr.attrib(default=None)
retries = attr.attrib(default=None)
delete_local_file = attr.attrib(default=True)
""" Local file path, if exists, delete the file after upload completed """
@@ -198,6 +199,7 @@ class UploadEvent(MetricsEventAdapter):
_format = '.' + str(config.get('metrics.images.format', 'JPEG')).upper().lstrip('.')
_quality = int(config.get('metrics.images.quality', 87))
_subsampling = int(config.get('metrics.images.subsampling', 0))
_upload_retries = 3
_metric_counters = {}
_metric_counters_lock = Lock()
@@ -253,7 +255,7 @@ class UploadEvent(MetricsEventAdapter):
self._upload_filename += filename_ext
self._override_storage_key_prefix = kwargs.pop('override_storage_key_prefix', None)
self.retries = self._upload_retries
super(UploadEvent, self).__init__(metric, variant, iter=iter, **kwargs)
@classmethod
@@ -334,6 +336,7 @@ class UploadEvent(MetricsEventAdapter):
key_prop='key',
upload_uri=self._upload_uri,
delete_local_file=local_file if self._delete_after_upload else None,
retries=self.retries,
)
def get_target_full_upload_uri(self, storage_uri, storage_key_prefix=None, quote_uri=True):


@@ -165,10 +165,11 @@ class Metrics(InterfaceBase):
try:
storage = self._get_storage(upload_uri)
retries = getattr(e, 'retries', None) or self._file_upload_retries
if isinstance(e.stream, Path):
url = storage.upload(e.stream.as_posix(), e.url, retries=self._file_upload_retries)
url = storage.upload(e.stream.as_posix(), e.url, retries=retries)
else:
url = storage.upload_from_stream(e.stream, e.url, retries=self._file_upload_retries)
url = storage.upload_from_stream(e.stream, e.url, retries=retries)
e.event.update(url=url)
except Exception as exp:
log.warning("Failed uploading to {} ({})".format(


@@ -178,7 +178,7 @@ class Reporter(InterfaceBase, AbstractContextManager, SetupUploadMixin, AsyncMan
self._report(ev)
def report_matplotlib(self, title, series, figure, iter, force_save_as_image=False, logger=None):
from trains.binding.matplotlib_bind import PatchedMatplotlib
from clearml.binding.matplotlib_bind import PatchedMatplotlib
PatchedMatplotlib.report_figure(
title=title,
series=series,


@@ -62,7 +62,7 @@ class TaskHandler(BufferingHandler):
if self._connect_logger and not TaskHandler.__once:
base_logger = getLogger()
if len(base_logger.handlers) == 1 and isinstance(base_logger.handlers[0], TaskHandler):
if record.name != 'console' and not record.name.startswith('trains.'):
if record.name != 'console' and not record.name.startswith('clearml.'):
base_logger.removeHandler(self)
basicConfig()
base_logger.addHandler(self)
@@ -149,7 +149,7 @@ class TaskHandler(BufferingHandler):
self._last_event = None
batch_requests = events.AddBatchRequest(requests=[events.AddRequest(e) for e in record_events if e])
except Exception:
self.__log_stderr("WARNING: trains.log - Failed logging task to backend ({:d} lines)".format(len(buffer)))
self.__log_stderr("WARNING: clearml.log - Failed logging task to backend ({:d} lines)".format(len(buffer)))
batch_requests = None
if batch_requests and batch_requests.requests:
@@ -253,7 +253,7 @@ class TaskHandler(BufferingHandler):
write = sys.stderr._original_write if hasattr(sys.stderr, '_original_write') else sys.stderr.write
write('{asctime} - {name} - {levelname} - {message}\n'.format(
asctime=Formatter().formatTime(makeLogRecord({})),
name='trains.log', levelname=getLevelName(level), message=msg))
name='clearml.log', levelname=getLevelName(level), message=msg))
@classmethod
def report_offline_session(cls, task, folder):


@@ -0,0 +1,317 @@
import json
import os
from functools import reduce
from logging import getLogger
from typing import Optional, Sequence
from six.moves.urllib.parse import urlparse
from pathlib2 import Path
from ...task import Task
from .repo import ScriptInfo
class CreateAndPopulate(object):
def __init__(
self,
project_name=None, # Optional[str]
task_name=None, # Optional[str]
task_type=None, # Optional[str]
repo=None, # Optional[str]
branch=None, # Optional[str]
commit=None, # Optional[str]
script=None, # Optional[str]
working_directory=None, # Optional[str]
packages=None, # Optional[Sequence[str]]
requirements_file=None, # Optional[Union[str, Path]]
docker=None, # Optional[str]
base_task_id=None, # Optional[str]
add_task_init_call=True, # bool
raise_on_missing_entries=False, # bool
):
# type: (...) -> None
"""
Create a new Task from an existing code base.
If the code does not already contain a call to Task.init, pass add_task_init_call=True,
and the code will be patched in remote execution (i.e. when executed by `clearml-agent`).
:param project_name: Set the project name for the task. Required if base_task_id is None.
:param task_name: Set the name of the remote task. Required if base_task_id is None.
:param task_type: Optional, The task type to be created. Supported values: 'training', 'testing', 'inference',
'data_processing', 'application', 'monitor', 'controller', 'optimizer', 'service', 'qc', 'custom'
:param repo: Remote URL for the repository to use, or path to local copy of the git repository
Example: 'https://github.com/allegroai/clearml.git' or '~/project/repo'
:param branch: Select specific repository branch/tag (implies the latest commit from the branch)
:param commit: Select specific commit id to use (default: latest commit,
or when used with local repository matching the local commit id)
:param script: Specify the entry point script for the remote execution. When used in tandem with
remote git repository the script should be a relative path inside the repository,
for example: './source/train.py' . When used with local repository path it supports a
direct path to a file inside the local repository itself, for example: '~/project/source/train.py'
:param working_directory: Working directory to launch the script from. Default: repository root folder.
Relative to repo root or local folder.
:param packages: Manually specify a list of required packages. Example: ["tqdm>=2.1", "scikit-learn"]
:param requirements_file: Specify a requirements.txt file to install when setting up the session.
If not provided, the requirements.txt from the repository will be used.
:param docker: Select the docker image to be executed in by the remote session
:param base_task_id: Use a pre-existing task in the system, instead of a local repo/script.
Essentially clones an existing task and overrides arguments/requirements.
:param add_task_init_call: If True, a 'Task.init()' call is added to the script entry point in remote execution.
:param raise_on_missing_entries: If True, raise ValueError on missing entries when populating
"""
if len(urlparse(repo).scheme) <= 1:
folder = repo
repo = None
else:
folder = None
if raise_on_missing_entries and not base_task_id:
if not script:
raise ValueError("Entry point script not provided")
if not repo and not folder and not Path(script).is_file():
raise ValueError("Repository or script must be provided")
if raise_on_missing_entries and commit and branch:
raise ValueError(
"Specify either a branch/tag or specific commit id, not both (either --commit or --branch)")
if raise_on_missing_entries and not folder and working_directory and working_directory.startswith('/'):
raise ValueError("working directory \'{}\', must be relative to repository root")
if requirements_file and not Path(requirements_file).is_file():
raise ValueError("requirements file could not be found \'{}\'")
self.folder = folder
self.commit = commit
self.branch = branch
self.repo = repo
self.script = script
self.cwd = working_directory
assert not packages or isinstance(packages, (tuple, list))
self.packages = list(packages) if packages else None
self.requirements_file = Path(requirements_file) if requirements_file else None
self.base_task_id = base_task_id
self.docker = docker
self.add_task_init_call = add_task_init_call
self.project_name = project_name
self.task_name = task_name
self.task_type = task_type
self.task = None
self.raise_on_missing_entries = raise_on_missing_entries
def create_task(self):
# type: () -> Task
"""
Create the new populated Task
:return: newly created Task object
"""
local_entry_file = None
repo_info = None
if self.folder or (self.script and Path(self.script).is_file()):
self.folder = os.path.expandvars(os.path.expanduser(self.folder)) if self.folder else None
self.script = os.path.expandvars(os.path.expanduser(self.script)) if self.script else None
self.cwd = os.path.expandvars(os.path.expanduser(self.cwd)) if self.cwd else None
if Path(self.script).is_file():
entry_point = self.script
else:
entry_point = (Path(self.folder) / self.script).as_posix()
entry_point = os.path.abspath(entry_point)
if not os.path.isfile(entry_point):
raise ValueError("Script entrypoint file \'{}\' could not be found".format(entry_point))
local_entry_file = entry_point
repo_info, requirements = ScriptInfo.get(
filepaths=[entry_point],
log=getLogger(),
create_requirements=False, uncommitted_from_remote=True)
# check if we have no repository and no requirements raise error
if self.raise_on_missing_entries and not self.requirements_file and not self.repo and (
not repo_info or not repo_info.script or not repo_info.script.get('repository')):
raise ValueError("Standalone script detected \'{}\', but no requirements provided".format(self.script))
if self.base_task_id:
print('Cloning task {}'.format(self.base_task_id))
task = Task.clone(source_task=self.base_task_id, project=Task.get_project_id(self.project_name))
else:
# noinspection PyProtectedMember
task = Task._create(task_name=self.task_name, project_name=self.project_name, task_type=self.task_type)
# if there is nothing to populate, return
if not any([
self.folder, self.commit, self.branch, self.repo, self.script, self.cwd,
self.packages, self.requirements_file, self.base_task_id, self.docker
]):
return task
task_state = task.export_task()
if 'script' not in task_state:
task_state['script'] = {}
if repo_info:
task_state['script']['repository'] = repo_info.script['repository']
task_state['script']['version_num'] = repo_info.script['version_num']
task_state['script']['branch'] = repo_info.script['branch']
task_state['script']['diff'] = repo_info.script['diff'] or ''
task_state['script']['working_dir'] = repo_info.script['working_dir']
task_state['script']['entry_point'] = repo_info.script['entry_point']
task_state['script']['binary'] = repo_info.script['binary']
task_state['script']['requirements'] = {}
if self.cwd:
cwd = self.cwd if Path(self.cwd).is_dir() else (
Path(repo_info.script['repo_root']) / self.cwd).as_posix()
if not Path(cwd).is_dir():
raise ValueError("Working directory \'{}\' could not be found".format(cwd))
cwd = Path(cwd).relative_to(repo_info.script['repo_root']).as_posix()
entry_point = \
Path(repo_info.script['repo_root']) / repo_info.script['working_dir'] / repo_info.script[
'entry_point']
entry_point = entry_point.relative_to(cwd).as_posix()
task_state['script']['entry_point'] = entry_point
task_state['script']['working_dir'] = cwd
elif self.repo:
# normalize backslashes and remove first one
entry_point = '/'.join([p for p in self.script.split('/') if p and p != '.'])
cwd = '/'.join([p for p in (self.cwd or '.').split('/') if p and p != '.'])
if cwd and entry_point.startswith(cwd + '/'):
entry_point = entry_point[len(cwd) + 1:]
task_state['script']['repository'] = self.repo
task_state['script']['version_num'] = self.commit or None
task_state['script']['branch'] = self.branch or None
task_state['script']['diff'] = ''
task_state['script']['working_dir'] = cwd or '.'
task_state['script']['entry_point'] = entry_point
# update requirements
reqs = []
if self.requirements_file:
with open(self.requirements_file.as_posix(), 'rt') as f:
reqs = [line.strip() for line in f.readlines()]
if self.packages:
reqs += self.packages
if reqs:
# make sure we have clearml.
clearml_found = False
for line in reqs:
if line.strip().startswith('#'):
continue
package = reduce(lambda a, b: a.split(b)[0], "#;@=~<>", line).strip()
if package == 'clearml':
clearml_found = True
break
if not clearml_found:
reqs.append('clearml')
task_state['script']['requirements'] = {'pip': '\n'.join(reqs)}
elif not self.repo and repo_info:
# we are in local mode, make sure we have "requirements.txt" it is a must
reqs_txt_file = Path(repo_info.script['repo_root']) / "requirements.txt"
if self.raise_on_missing_entries and not reqs_txt_file.is_file():
raise ValueError(
"requirements.txt not found [{}] "
"Use --requirements or --packages".format(reqs_txt_file.as_posix()))
if self.add_task_init_call:
script_entry = os.path.abspath('/' + task_state['script']['working_dir'] +
'/' + task_state['script']['entry_point'])
idx_a = 0
# find the right entry point for the patch if we have a local file (basically after the __future__ imports)
if local_entry_file:
with open(local_entry_file, 'rt') as f:
lines = f.readlines()
future_found = -1
for i, line in enumerate(lines):
tokens = [t.strip() for t in line.split(' ') if t.strip()]
if tokens and tokens[0] in ('import', 'from',):
if '__future__' in line:
future_found = i
else:
break
if future_found >= 0:
idx_a = future_found + 1
task_init_patch = ''
# if we do not have requirements, add clearml to the requirements.txt
if not reqs:
task_init_patch += \
"diff --git a/requirements.txt b/requirements.txt\n" \
"--- a/requirements.txt\n" \
"+++ b/requirements.txt\n" \
"@@ -0,0 +1,1 @@\n" \
"+clearml\n"
task_init_patch += \
"diff --git a{script_entry} b{script_entry}\n" \
"--- a{script_entry}\n" \
"+++ b{script_entry}\n" \
"@@ -{idx_a},0 +{idx_b},3 @@\n" \
"+from clearml import Task\n" \
"+Task.init()\n" \
"+\n".format(
script_entry=script_entry, idx_a=idx_a, idx_b=idx_a + 1)
task_state['script']['diff'] = task_init_patch + task_state['script']['diff']
# set base docker image if provided
if self.docker:
task.set_base_docker(self.docker)
if task_state['script']['repository']:
repo_details = {k: v for k, v in task_state['script'].items()
if v and k not in ('diff', 'requirements', 'binary')}
print('Repository Detected\n{}'.format(json.dumps(repo_details, indent=2)))
else:
print('Standalone script detected\n Script: {}\n Requirements: {}'.format(
self.script, task_state['script']['requirements'].get('pip', [])))
if task_state['script'].get('requirements') and task_state['script']['requirements'].get('pip'):
print('Requirements:\n requirements.txt: {}\n Additional Packages:{}'.format(
self.requirements_file.as_posix() if self.requirements_file else '', self.packages))
if self.docker:
print('Base docker image: {}'.format(self.docker))
# update the Task
task.update_task(task_state)
self.task = task
return task
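The requirements merge in `create_task` uses a `reduce()` over specifier characters to recover the bare package name from each requirements line (so that `clearml` is detected whether pinned, commented, or URL-pinned). The trick isolated as a hypothetical helper:

```python
from functools import reduce

def base_package_name(requirement_line):
    # Successively split on comment, marker, URL and version-specifier
    # characters, keeping only the leading package name (same reduce() trick
    # as in create_task above).
    return reduce(lambda a, b: a.split(b)[0], "#;@=~<>", requirement_line).strip()
```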
def update_task_args(self, args=None):
# type: (Optional[Sequence[str]]) -> ()
"""
Update the newly created Task argparse Arguments
If called before the Task is created, it is used for argument verification.
:param args: Arguments to pass to the remote execution, list of string pairs (argument, value) or
list of strings '<argument>=<value>'. Example: ['lr=0.003', ('batch_size', 64)]
"""
if not args:
return
# check args are in format <key>=<value>
args_list = []
for a in args:
if isinstance(a, (list, tuple)):
assert len(a) == 2
args_list.append(a)
continue
try:
parts = a.split('=', 1)
assert len(parts) == 2
args_list.append(parts)
except Exception:
raise ValueError(
"Failed parsing argument \'{}\', arguments must be in \'<key>=<value>\' format")
if not self.task:
return
task_params = self.task.get_parameters()
args_list = {'Args/{}'.format(k): v for k, v in args_list}
task_params.update(args_list)
self.task.set_parameters(task_params)
def get_id(self):
# type: () -> Optional[str]
"""
:return: Return the created Task id (str)
"""
return self.task.id if self.task else None
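The `update_task_args` parsing above accepts either `(argument, value)` pairs or `'<key>=<value>'` strings and prefixes each key with the `Args/` section. A standalone sketch of that parsing (a hypothetical function, not the method itself):

```python
def parse_task_args(args):
    # Mirror update_task_args(): accept (key, value) pairs or '<key>=<value>'
    # strings, and prefix each key with the 'Args/' hyper-parameter section.
    parsed = []
    for a in args:
        if isinstance(a, (list, tuple)):
            if len(a) != 2:
                raise ValueError("Expected (argument, value) pair, got {!r}".format(a))
            parsed.append(tuple(a))
            continue
        parts = a.split('=', 1)
        if len(parts) != 2:
            raise ValueError("Failed parsing argument {!r}, arguments must be "
                             "in '<key>=<value>' format".format(a))
        parsed.append(tuple(parts))
    return {'Args/{}'.format(k): v for k, v in parsed}
```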


@@ -52,21 +52,21 @@ class ScriptRequirements(object):
try:
# noinspection PyPackageRequirements,PyUnresolvedReferences
import boto3 # noqa: F401
modules.add('boto3', 'trains.storage', 0)
modules.add('boto3', 'clearml.storage', 0)
except Exception:
pass
# noinspection PyBroadException
try:
# noinspection PyPackageRequirements,PyUnresolvedReferences
from google.cloud import storage # noqa: F401
modules.add('google_cloud_storage', 'trains.storage', 0)
modules.add('google_cloud_storage', 'clearml.storage', 0)
except Exception:
pass
# noinspection PyBroadException
try:
# noinspection PyPackageRequirements,PyUnresolvedReferences
from azure.storage.blob import ContentSettings # noqa: F401
modules.add('azure_storage_blob', 'trains.storage', 0)
modules.add('azure_storage_blob', 'clearml.storage', 0)
except Exception:
pass
@@ -100,7 +100,7 @@ class ScriptRequirements(object):
from ..task import Task
# noinspection PyProtectedMember
for package, version in Task._force_requirements.items():
modules.add(package, 'trains', 0)
modules.add(package, 'clearml', 0)
except Exception:
pass
@@ -265,7 +265,7 @@ class _JupyterObserver(object):
@classmethod
def _daemon(cls, jupyter_notebook_filename):
from trains import Task
from clearml import Task
# load jupyter notebook package
# noinspection PyBroadException
@@ -715,12 +715,12 @@ class ScriptInfo(object):
jupyter_filepath=jupyter_filepath,
)
if repo_info.modified:
messages.append(
"======> WARNING! UNCOMMITTED CHANGES IN REPOSITORY {} <======".format(
script_info.get("repository", "")
)
)
# if repo_info.modified:
# messages.append(
# "======> WARNING! UNCOMMITTED CHANGES IN REPOSITORY {} <======".format(
# script_info.get("repository", "")
# )
# )
if not any(script_info.values()):
script_info = None


@@ -27,6 +27,7 @@ from six.moves.urllib.parse import quote
from ...utilities.locks import RLock as FileRLock
from ...utilities.attrs import readonly
from ...utilities.proxy_object import verify_basic_type
from ...binding.artifacts import Artifacts
from ...backend_interface.task.development.worker import DevWorker
from ...backend_api import Session
@@ -144,9 +145,9 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
self.__reporter = None
self._curr_label_stats = {}
self._raise_on_validation_errors = raise_on_validation_errors
self._parameters_allowed_types = (
self._parameters_allowed_types = tuple(set(
six.string_types + six.integer_types + (six.text_type, float, list, tuple, dict, type(None))
)
))
self._app_server = None
self._files_server = None
self._initial_iteration_offset = 0
@@ -216,7 +217,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
)
else:
self.get_logger().report_text(
'TRAINS new version available: upgrade to v{} is recommended!'.format(
'ClearML new version available: upgrade to v{} is recommended!'.format(
latest_version[0]),
)
except Exception:
@@ -296,8 +297,8 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
if task_type.value not in (self.TaskTypes.training, self.TaskTypes.testing) and \
not Session.check_min_api_version('2.8'):
print('WARNING: Changing task type to "{}" : '
'trains-server does not support task type "{}", '
'please upgrade trains-server.'.format(self.TaskTypes.training, task_type.value))
'clearml-server does not support task type "{}", '
'please upgrade clearml-server.'.format(self.TaskTypes.training, task_type.value))
task_type = self.TaskTypes.training
project_id = None
@@ -402,7 +403,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
# type: () -> str
"""
The Task's status. To keep the Task updated.
Trains reloads the Task status information only, when this value is accessed.
ClearML reloads the Task status information only when this value is accessed.
return str: TaskStatusEnum status
"""
@@ -445,7 +446,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
def reload(self):
# type: () -> ()
"""
Reload current Task's state from trains-server.
Reload current Task's state from clearml-server.
Refresh all task's fields, including artifacts / models / parameters etc.
"""
return super(Task, self).reload()
@@ -628,9 +629,9 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
):
# type: (...) -> str
"""
Update the Task's output model weights file. First, Trains uploads the file to the preconfigured output
Update the Task's output model weights file. First, ClearML uploads the file to the preconfigured output
destination (see the Task's ``output.destination`` property or call the ``setup_upload`` method),
then Trains updates the model object associated with the Task an API call. The API call uses with the URI
then ClearML updates the model object associated with the Task in an API call. The API call uses the URI
of the uploaded file, and other values provided by additional arguments.
:param str model_file: The path to the updated model weights file.
@@ -684,19 +685,19 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
Set a new input model for the Task. The model must be "ready" (status is ``Published``) to be used as the
Task's input model.
:param model_id: The Id of the model on the **Trains Server** (backend). If ``model_name`` is not specified,
:param model_id: The Id of the model on the **ClearML Server** (backend). If ``model_name`` is not specified,
then ``model_id`` must be specified.
:param model_name: The model name. The name is used to locate an existing model in the **Trains Server**
:param model_name: The model name. The name is used to locate an existing model in the **ClearML Server**
(backend). If ``model_id`` is not specified, then ``model_name`` must be specified.
:param update_task_design: Update the Task's design
- ``True`` - Trains copies the Task's model design from the input model.
- ``False`` - Trains does not copy the Task's model design from the input model.
- ``True`` - ClearML copies the Task's model design from the input model.
- ``False`` - ClearML does not copy the Task's model design from the input model.
:param update_task_labels: Update the Task's label enumeration
- ``True`` - Trains copies the Task's label enumeration from the input model.
- ``False`` - Trains does not copy the Task's label enumeration from the input model.
- ``True`` - ClearML copies the Task's label enumeration from the input model.
- ``False`` - ClearML does not copy the Task's label enumeration from the input model.
"""
if model_id is None and not model_name:
raise ValueError('Expected one of [model_id, model_name]')
@@ -749,7 +750,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
i.e. {'Args/param': 'value'} is the argument "param" from section "Args"
:param backwards_compatibility: If True (default) parameters without section name
(API version < 2.9, trains-server < 0.16) will be at dict root level.
(API version < 2.9, clearml-server < 0.16) will be at dict root level.
If False, parameters without section name, will be nested under "Args/" key.
:return: dict of the task parameters, all flattened to key/value.
@@ -838,14 +839,15 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
not_allowed = {
k: type(v).__name__
for k, v in new_parameters.items()
if not isinstance(v, self._parameters_allowed_types)
if not verify_basic_type(v, self._parameters_allowed_types)
}
if not_allowed:
raise ValueError(
"Only builtin types ({}) are allowed for values (got {})".format(
', '.join(t.__name__ for t in self._parameters_allowed_types),
', '.join('%s=>%s' % p for p in not_allowed.items())),
self.log.warning(
"Skipping parameter: {}, only builtin types are supported ({})".format(
', '.join('%s[%s]' % p for p in not_allowed.items()),
', '.join(t.__name__ for t in self._parameters_allowed_types))
)
new_parameters = {k: v for k, v in new_parameters.items() if k not in not_allowed}
use_hyperparams = Session.check_min_api_version('2.9')
@@ -958,7 +960,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
:return: True if the parameter was deleted successfully
"""
if not Session.check_min_api_version('2.9'):
raise ValueError("Delete hyper parameter is not supported by your trains-server, "
raise ValueError("Delete hyper parameter is not supported by your clearml-server, "
"upgrade to the latest version")
with self._edit_lock:
paramkey = tasks.ParamKey(section=name.split('/', 1)[0], name=name.split('/', 1)[1])
@@ -1011,7 +1013,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
# type: (str) -> ()
"""
Set the base docker image for this experiment
If provided, this value will be used by trains-agent to execute this experiment
If provided, this value will be used by clearml-agent to execute this experiment
inside the provided docker image.
When running remotely the call is ignored
"""
@@ -1275,7 +1277,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
# type: () -> str
"""
Return the Task results & outputs web page address.
For example: https://demoapp.trains.allegro.ai/projects/216431/experiments/60763e04/output/log
For example: https://demoapp.demo.clear.ml/projects/216431/experiments/60763e04/output/log
:return: http/s URL link.
"""
@@ -1428,7 +1430,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
def running_locally():
# type: () -> bool
"""
Is the task running locally (i.e., ``trains-agent`` is not executing it)
Is the task running locally (i.e., ``clearml-agent`` is not executing it)
:return: True, if the task is running locally. False, if the task is not running locally.
@@ -1637,7 +1639,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
mutually_exclusive(config_dict=config_dict, config_text=config_text, _check_none=True)
if not Session.check_min_api_version('2.9'):
raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
raise ValueError("Multiple configurations is not supported with the current 'clearml-server', "
"please upgrade to the latest version")
if description:
@@ -1661,7 +1663,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
return None if configuration name is not valid.
"""
if not Session.check_min_api_version('2.9'):
raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
raise ValueError("Multiple configurations is not supported with the current 'clearml-server', "
"please upgrade to the latest version")
configuration = self.data.configuration or {}
@@ -1725,6 +1727,22 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
"""
session = session if session else cls._get_default_session()
use_clone_api = Session.check_min_api_version('2.9')
if use_clone_api:
res = cls._send(
session=session, log=log,
req=tasks.CloneRequest(
task=cloned_task_id,
new_task_name=name,
new_task_tags=tags,
new_task_comment=comment,
new_task_parent=parent,
new_task_project=project,
execution_overrides=execution_overrides,
)
)
cloned_task_id = res.response.id
return cloned_task_id
res = cls._send(session=session, log=log, req=tasks.GetByIdRequest(task=cloned_task_id))
task = res.response.task
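Both the configuration methods and the clone path above gate behavior on `Session.check_min_api_version('2.9')`, falling back to an older code path when the server is too old. A minimal sketch of such a numeric version gate (a hypothetical helper, not the clearml implementation) — note that comparing as tuples matters, since a plain string compare would rank '2.10' below '2.9':

```python
def meets_min_api_version(server_version, required):
    """Return True when `server_version` is at least `required`.

    Both arguments are "major.minor" version strings, e.g. "2.9".
    """
    def as_tuple(version):
        # "2.10" -> (2, 10), so numeric minor versions compare correctly
        return tuple(int(part) for part in version.split('.'))
    return as_tuple(server_version) >= as_tuple(required)
```

A caller can then branch on the capability: use the server-side `CloneRequest` when the check passes, otherwise fetch the task and re-create it client-side.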
@@ -1858,7 +1876,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
if not PROC_MASTER_ID_ENV_VAR.get() or len(PROC_MASTER_ID_ENV_VAR.get().split(':')) < 2:
self.__edit_lock = RLock()
elif PROC_MASTER_ID_ENV_VAR.get().split(':')[1] == str(self.id):
filename = os.path.join(gettempdir(), 'trains_{}.lock'.format(self.id))
filename = os.path.join(gettempdir(), 'clearml_{}.lock'.format(self.id))
# no need to remove previous file lock if we have a dead process, it will automatically release the lock.
# # noinspection PyBroadException
# try:


@@ -187,7 +187,7 @@ class Artifact(object):
:raise: Raises error if local copy not found.
:return: A local path to a downloaded copy of the artifact.
"""
from trains.storage import StorageManager
from clearml.storage import StorageManager
local_copy = StorageManager.get_local_copy(
remote_url=self.url,
extract_archive=extract_archive and self.type == 'archive',
@@ -308,7 +308,7 @@ class Artifacts(object):
delete_after_upload=False, auto_pickle=True, wait_on_upload=False):
# type: (str, Optional[object], Optional[dict], Optional[str], bool, bool, bool) -> bool
if not Session.check_min_api_version('2.3'):
LoggerRoot.get_base_logger().warning('Artifacts not supported by your TRAINS-server version, '
LoggerRoot.get_base_logger().warning('Artifacts not supported by your ClearML-server version, '
'please upgrade to the latest server version')
return False
@@ -648,7 +648,7 @@ class Artifacts(object):
return
self._last_artifacts_upload[name] = current_sha2
# If old trains-server, upload as debug image
# If old clearml-server, upload as debug image
if not Session.check_min_api_version('2.3'):
logger.report_image(title='artifacts', series=name, local_path=local_csv.as_posix(),
delete_after_upload=True, iteration=self._task.get_last_iteration(),
@@ -698,7 +698,7 @@ class Artifacts(object):
"""
Upload local file and return uri of the uploaded file (uploading in the background)
"""
from trains.storage import StorageManager
from clearml.storage import StorageManager
upload_uri = self._task.output_uri or self._task.get_logger().get_default_upload_destination()
if not isinstance(local_file, Path):
@@ -715,7 +715,7 @@ class Artifacts(object):
# send for upload
# noinspection PyProtectedMember
if wait_on_upload:
StorageManager.upload_file(local_file.as_posix(), uri)
StorageManager.upload_file(local_file.as_posix(), uri, wait_for_upload=True, retries=ev.retries)
if delete_after_upload:
try:
os.unlink(local_file.as_posix())


@@ -44,7 +44,7 @@ class EnvironmentBind(object):
match = match.strip()
if match == '*':
env_param.update({k: os.environ.get(k) for k in os.environ
if not k.startswith('TRAINS_') and not k.startswith('ALG_')})
if not k.startswith('TRAINS_') and not k.startswith('CLEARML_')})
elif match.endswith('*'):
match = match.strip('*')
env_param.update({k: os.environ.get(k) for k in os.environ if k.startswith(match)})
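The matching rules above can be sketched as a standalone helper (hypothetical name, stdlib only): `'*'` captures every variable except ClearML's own (`TRAINS_`/`CLEARML_` prefixes), `'PREFIX*'` captures variables starting with `PREFIX`, and a plain name captures that single variable.

```python
import os

def collect_env(matches, environ=None):
    """Collect environment variables according to the wildcard rules above."""
    environ = os.environ if environ is None else environ
    collected = {}
    for match in (m.strip() for m in matches):
        if match == '*':
            # everything except ClearML's own configuration variables
            collected.update({k: v for k, v in environ.items()
                              if not k.startswith('TRAINS_') and not k.startswith('CLEARML_')})
        elif match.endswith('*'):
            prefix = match.rstrip('*')
            collected.update({k: v for k, v in environ.items() if k.startswith(prefix)})
        elif match in environ:
            collected[match] = environ[match]
    return collected
```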


@@ -114,7 +114,7 @@ class WeightsFileHandler(object):
Add a pre-save/load callback for weights files and return its handle. If the callback was already added,
return the existing handle.
Use this callback to modify the weights filename registered in the Trains Server. In case Trains is
Use this callback to modify the weights filename registered in the ClearML Server. In case ClearML is
configured to upload the weights file, this will affect the uploaded filename as well.
A callback returning None will disable tracking of the current model save call;
it will not disable saving the model to disk, only the logging/tracking/uploading.
@@ -422,7 +422,7 @@ class WeightsFileHandler(object):
# HACK: if pytorch-lightning is used, remove the temp '.part' file extension
if sys.modules.get('pytorch_lightning') and target_filename.lower().endswith('.part'):
target_filename = target_filename[:-len('.part')]
fd, temp_file = mkstemp(prefix='.trains.upload_model_', suffix='.tmp')
fd, temp_file = mkstemp(prefix='.clearml.upload_model_', suffix='.tmp')
os.close(fd)
shutil.copy(files[0], temp_file)
trains_out_model.update_weights(


@@ -192,7 +192,7 @@ class WeightsGradientHistHelper(object):
class EventTrainsWriter(object):
"""
TF SummaryWriter implementation that converts the tensorboard's summary into
Trains events and reports the events (metrics) for an Trains task (logger).
ClearML events and reports the events (metrics) for a ClearML task (logger).
"""
_add_lock = threading.RLock()
_series_name_lookup = {}
@@ -298,8 +298,8 @@ class EventTrainsWriter(object):
def __init__(self, logger, logdir=None, report_freq=100, image_report_freq=None,
histogram_update_freq_multiplier=10, histogram_granularity=50, max_keep_images=None):
"""
Create a compatible Trains backend to the TensorFlow SummaryToEventTransformer
Everything will be serialized directly to the Trains backend, instead of to the standard TF FileWriter
Create a compatible ClearML backend to the TensorFlow SummaryToEventTransformer
Everything will be serialized directly to the ClearML backend, instead of to the standard TF FileWriter
:param logger: The task.logger to use for sending the metrics (def: task.get_logger())
:param report_freq: How often to update the statistics values
@@ -846,7 +846,7 @@ class PatchSummaryToEventTransformer(object):
if PatchSummaryToEventTransformer.__original_getattribute is None:
PatchSummaryToEventTransformer.__original_getattribute = SummaryToEventTransformer.__getattribute__
SummaryToEventTransformer.__getattribute__ = PatchSummaryToEventTransformer._patched_getattribute
setattr(SummaryToEventTransformer, 'trains',
setattr(SummaryToEventTransformer, 'clearml',
property(PatchSummaryToEventTransformer.trains_object))
except Exception as ex:
LoggerRoot.get_base_logger(TensorflowBinding).debug(str(ex))
@@ -859,7 +859,7 @@ class PatchSummaryToEventTransformer(object):
from torch.utils.tensorboard.writer import FileWriter as FileWriterT # noqa
PatchSummaryToEventTransformer._original_add_eventT = FileWriterT.add_event
FileWriterT.add_event = PatchSummaryToEventTransformer._patched_add_eventT
setattr(FileWriterT, 'trains', None)
setattr(FileWriterT, 'clearml', None)
except ImportError:
# this is a new version of TensorflowX
pass
@@ -875,7 +875,7 @@ class PatchSummaryToEventTransformer(object):
PatchSummaryToEventTransformer.__original_getattributeX = \
SummaryToEventTransformerX.__getattribute__
SummaryToEventTransformerX.__getattribute__ = PatchSummaryToEventTransformer._patched_getattributeX
setattr(SummaryToEventTransformerX, 'trains',
setattr(SummaryToEventTransformerX, 'clearml',
property(PatchSummaryToEventTransformer.trains_object))
except ImportError:
# this is a new version of TensorflowX
@@ -890,7 +890,7 @@ class PatchSummaryToEventTransformer(object):
from tensorboardX.writer import FileWriter as FileWriterX # noqa
PatchSummaryToEventTransformer._original_add_eventX = FileWriterX.add_event
FileWriterX.add_event = PatchSummaryToEventTransformer._patched_add_eventX
setattr(FileWriterX, 'trains', None)
setattr(FileWriterX, 'clearml', None)
except ImportError:
# this is a new version of TensorflowX
pass
@@ -899,38 +899,38 @@ class PatchSummaryToEventTransformer(object):
@staticmethod
def _patched_add_eventT(self, *args, **kwargs):
if not hasattr(self, 'trains') or not PatchSummaryToEventTransformer.__main_task:
if not hasattr(self, 'clearml') or not PatchSummaryToEventTransformer.__main_task:
return PatchSummaryToEventTransformer._original_add_eventT(self, *args, **kwargs)
if not self.trains:
if not self.clearml: # noqa
# noinspection PyBroadException
try:
logdir = self.get_logdir()
except Exception:
logdir = None
self.trains = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
self.clearml = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
logdir=logdir, **PatchSummaryToEventTransformer.defaults_dict)
# noinspection PyBroadException
try:
self.trains.add_event(*args, **kwargs)
self.clearml.add_event(*args, **kwargs)
except Exception:
pass
return PatchSummaryToEventTransformer._original_add_eventT(self, *args, **kwargs)
@staticmethod
def _patched_add_eventX(self, *args, **kwargs):
if not hasattr(self, 'trains') or not PatchSummaryToEventTransformer.__main_task:
if not hasattr(self, 'clearml') or not PatchSummaryToEventTransformer.__main_task:
return PatchSummaryToEventTransformer._original_add_eventX(self, *args, **kwargs)
if not self.trains:
if not self.clearml:
# noinspection PyBroadException
try:
logdir = self.get_logdir()
except Exception:
logdir = None
self.trains = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
self.clearml = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
logdir=logdir, **PatchSummaryToEventTransformer.defaults_dict)
# noinspection PyBroadException
try:
self.trains.add_event(*args, **kwargs)
self.clearml.add_event(*args, **kwargs)
except Exception:
pass
return PatchSummaryToEventTransformer._original_add_eventX(self, *args, **kwargs)
@@ -947,17 +947,17 @@ class PatchSummaryToEventTransformer(object):
@staticmethod
def _patched_getattribute_(self, attr, get_base):
# no main task, zero chance we have an Trains event logger
# no main task, zero chance we have a ClearML event logger
if PatchSummaryToEventTransformer.__main_task is None:
return get_base(self, attr)
# check if we already have an Trains event logger
# check if we already have a ClearML event logger
__dict__ = get_base(self, '__dict__')
if 'event_writer' not in __dict__ or \
isinstance(__dict__['event_writer'], (ProxyEventsWriter, EventTrainsWriter)):
return get_base(self, attr)
# patch the events writer field, and add a double Event Logger (Trains and original)
# patch the events writer field, and add a double Event Logger (ClearML and original)
base_eventwriter = __dict__['event_writer']
# noinspection PyBroadException
try:
@@ -1062,7 +1062,7 @@ class PatchModelCheckPointCallback(object):
if PatchModelCheckPointCallback.__original_getattribute is None and callbacks is not None:
PatchModelCheckPointCallback.__original_getattribute = callbacks.ModelCheckpoint.__getattribute__
callbacks.ModelCheckpoint.__getattribute__ = PatchModelCheckPointCallback._patched_getattribute
setattr(callbacks.ModelCheckpoint, 'trains',
setattr(callbacks.ModelCheckpoint, 'clearml',
property(PatchModelCheckPointCallback.trains_object))
except Exception as ex:
@@ -1072,17 +1072,17 @@ class PatchModelCheckPointCallback(object):
def _patched_getattribute(self, attr):
get_base = PatchModelCheckPointCallback.__original_getattribute
# no main task, zero chance we have an Trains event logger
# no main task, zero chance we have a ClearML event logger
if PatchModelCheckPointCallback.__main_task is None:
return get_base(self, attr)
# check if we already have an Trains event logger
# check if we already have a ClearML event logger
__dict__ = get_base(self, '__dict__')
if 'model' not in __dict__ or \
isinstance(__dict__['model'], _ModelAdapter):
return get_base(self, attr)
# patch the events writer field, and add a double Event Logger (Trains and original)
# patch the events writer field, and add a double Event Logger (ClearML and original)
base_model = __dict__['model']
defaults_dict = __dict__.get('_trains_defaults') or PatchModelCheckPointCallback.defaults_dict
output_model = OutputModel(

clearml/cli/__init__.py Normal file

@@ -0,0 +1 @@


@@ -0,0 +1 @@


@@ -1,4 +1,4 @@
""" Trains configuration wizard"""
""" ClearML configuration wizard"""
from __future__ import print_function
import argparse
@@ -8,22 +8,23 @@ from pathlib2 import Path
from six.moves import input
from six.moves.urllib.parse import urlparse
from trains.backend_api.session import Session
from trains.backend_api.session.defs import ENV_HOST
from trains.backend_config.defs import LOCAL_CONFIG_FILES, LOCAL_CONFIG_FILE_OVERRIDE_VAR
from trains.config import config_obj
from trains.utilities.pyhocon import ConfigFactory, ConfigMissingException
from clearml.backend_api.session import Session
from clearml.backend_api.session.defs import ENV_HOST
from clearml.backend_config.defs import LOCAL_CONFIG_FILES, LOCAL_CONFIG_FILE_OVERRIDE_VAR
from clearml.config import config_obj
from clearml.utilities.pyhocon import ConfigFactory, ConfigMissingException
description = "\n" \
"Please create new trains credentials through the profile page in " \
"your trains web app (e.g. http://localhost:8080/profile)\n" \
"Please create new clearml credentials through the profile page in " \
"your clearml web app (e.g. http://localhost:8080/profile) \n"\
"Or with the free hosted service at https://app.community.clear.ml/profile\n" \
"In the profile page, press \"Create new credentials\", then press \"Copy to clipboard\".\n" \
"\n" \
"Paste copied configuration here:\n"
host_description = """
Editing configuration file: {CONFIG_FILE}
Enter the url of the trains-server's Web service, for example: {HOST}
Enter the url of the clearml-server's Web service, for example: {HOST}
"""
# noinspection PyBroadException
@@ -40,7 +41,12 @@ def validate_file(string):
def main():
default_config_file = os.getenv(LOCAL_CONFIG_FILE_OVERRIDE_VAR) or LOCAL_CONFIG_FILES[0]
default_config_file = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get()
if not default_config_file:
for f in LOCAL_CONFIG_FILES:
default_config_file = f
if os.path.exists(os.path.expanduser(os.path.expandvars(f))):
break
p = argparse.ArgumentParser(description=__doc__)
p.add_argument(
@@ -51,16 +57,20 @@ def main():
args = p.parse_args()
print('TRAINS SDK setup process')
print('ClearML SDK setup process')
conf_file = Path(args.file).absolute()
conf_file = Path(os.path.expanduser(args.file)).absolute()
if conf_file.exists() and conf_file.is_file() and conf_file.stat().st_size > 0:
print('Configuration file already exists: {}'.format(str(conf_file)))
print('Leaving setup, feel free to edit the configuration file.')
return
print(description, end='')
sentinel = ''
parse_input = '\n'.join(iter(input, sentinel))
parse_input = ''
for line in iter(input, sentinel):
parse_input += line+'\n'
if line.rstrip() == '}':
break
credentials = None
api_server = None
web_server = None
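The paste-capture change above replaces the plain sentinel read with an early stop once the pasted credentials block is closed by a `}` line. A minimal sketch of that loop, operating on any iterable of lines:

```python
def read_pasted_config(lines):
    """Accumulate pasted lines until an empty line (sentinel) or a closing '}' line."""
    captured = ''
    for line in lines:
        if line == '':            # sentinel: an empty line ends the paste
            break
        captured += line + '\n'
        if line.rstrip() == '}':  # the pasted block is closed, stop early
            break
    return captured
```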
@@ -104,7 +114,7 @@ def main():
files_host = input_url('File Store Host', files_host)
print('\nTRAINS Hosts configuration:\nWeb App: {}\nAPI: {}\nFile Store: {}\n'.format(
print('\nClearML Hosts configuration:\nWeb App: {}\nAPI: {}\nFile Store: {}\n'.format(
web_host, api_host, files_host))
retry = 1
@@ -121,7 +131,7 @@ def main():
# noinspection PyBroadException
try:
default_sdk_conf = Path(__file__).parent.absolute() / 'sdk.conf'
default_sdk_conf = Path(__file__).absolute().parents[2] / 'config/default/sdk.conf'
with open(str(default_sdk_conf), 'rt') as f:
default_sdk = f.read()
except Exception:
@@ -130,14 +140,14 @@ def main():
# noinspection PyBroadException
try:
with open(str(conf_file), 'wt') as f:
header = '# TRAINS SDK configuration file\n' \
header = '# ClearML SDK configuration file\n' \
'api {\n' \
' # Notice: \'host\' is the api server (default port 8008), not the web server.\n' \
' api_server: %s\n' \
' web_server: %s\n' \
' files_server: %s\n' \
' # Credentials are generated using the webapp, %s/profile\n' \
' # Override with os environment: TRAINS_API_ACCESS_KEY / TRAINS_API_SECRET_KEY\n' \
' # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY\n' \
' credentials {"access_key": "%s", "secret_key": "%s"}\n' \
'}\n' \
'sdk ' % (api_host, web_host, files_host,
@@ -149,7 +159,7 @@ def main():
return
print('\nNew configuration stored in {}'.format(str(conf_file)))
print('TRAINS setup completed successfully.')
print('ClearML setup completed successfully.')
def parse_host(parsed_host, allow_input=True):
@@ -290,7 +300,7 @@ def verify_url(parse_input):
parsed_host = None
except Exception:
parsed_host = None
print('Could not parse url {}\nEnter your trains-server host: '.format(parse_input), end='')
print('Could not parse url {}\nEnter your clearml-server host: '.format(parse_input), end='')
return parsed_host


@@ -0,0 +1 @@


@@ -0,0 +1,119 @@
from argparse import ArgumentParser
from pathlib2 import Path
from clearml.backend_interface.task.populate import CreateAndPopulate
from clearml import Task
def setup_parser(parser):
parser.add_argument('--version', action='store_true', default=None,
help='Display the clearml-task utility version')
parser.add_argument('--project', type=str, default=None,
help='Required: set the project name for the task. '
'If --base-task-id is used, this argument is optional.')
parser.add_argument('--name', type=str, default=None, required=True,
help='Required: select a name for the remote task')
parser.add_argument('--repo', type=str, default=None,
help='remote URL for the repository to use. '
'Example: --repo https://github.com/allegroai/clearml.git')
parser.add_argument('--branch', type=str, default=None,
help='Select specific repository branch/tag (implies the latest commit from the branch)')
parser.add_argument('--commit', type=str, default=None,
help='Select specific commit id to use (default: latest commit, '
'or when used with local repository matching the local commit id)')
parser.add_argument('--folder', type=str, default=None,
help='Remotely execute the code in the local folder. '
'Notice! It assumes a git repository already exists. '
'Current state of the repo (commit id and uncommitted changes) is logged '
'and will be replicated on the remote machine')
parser.add_argument('--script', type=str, default=None,
help='Specify the entry point script for the remote execution. '
'When used in tandem with --repo the script should be a relative path inside '
'the repository, for example: --script source/train.py. '
'When used with --folder it supports a direct path to a file inside the local '
'repository itself, for example: --script ~/project/source/train.py')
parser.add_argument('--cwd', type=str, default=None,
help='Working directory to launch the script from. Default: repository root folder. '
'Relative to repo root or local folder')
parser.add_argument('--args', default=None, nargs='*',
help='Arguments to pass to the remote execution, list of <argument>=<value> strings. '
'Currently only argparse arguments are supported. '
'Example: --args lr=0.003 batch_size=64')
parser.add_argument('--queue', type=str, default=None,
help='Select the queue to launch the task. '
'If not provided a Task will be created but it will not be launched.')
parser.add_argument('--requirements', type=str, default=None,
help='Specify requirements.txt file to install when setting the session. '
'If not provided, the requirements.txt from the repository will be used.')
parser.add_argument('--packages', default=None, nargs='*',
help='Manually specify a list of required packages. '
'Example: --packages "tqdm>=2.1" "scikit-learn"')
parser.add_argument('--docker', type=str, default=None,
help='Select the docker image to use in the remote session')
parser.add_argument('--skip-task-init', action='store_true', default=None,
help='If set, Task.init() call is not added to the entry point, and is assumed '
'to be called within the script. Default: add a Task.init() call to the entry point script')
parser.add_argument('--base-task-id', type=str, default=None,
help='Use a pre-existing task in the system, instead of a local repo/script. '
'Essentially clones an existing task and overrides arguments/requirements.')
def cli():
title = 'ClearML launch - launch any codebase on remote machine running clearml-agent'
print(title)
parser = ArgumentParser(description=title)
setup_parser(parser)
# get the args
args = parser.parse_args()
if args.version:
from ...version import __version__
print('Version {}'.format(__version__))
exit(0)
create_populate = CreateAndPopulate(
project_name=args.project,
task_name=args.name,
repo=args.repo or args.folder,
branch=args.branch,
commit=args.commit,
script=args.script,
working_directory=args.cwd,
packages=args.packages,
requirements_file=args.requirements,
base_task_id=args.base_task_id,
add_task_init_call=not args.skip_task_init,
raise_on_missing_entries=True,
)
# verify args
create_populate.update_task_args(args.args)
print('Creating new task')
create_populate.create_task()
# update Task args
create_populate.update_task_args(args.args)
print('New task created id={}'.format(create_populate.get_id()))
if not args.queue:
print('Warning: No queue was provided, leaving task in draft-mode.')
exit(0)
Task.enqueue(create_populate.task, queue_name=args.queue)
print('Task id={} sent for execution on queue {}'.format(create_populate.get_id(), args.queue))
print('Execution log at: {}'.format(create_populate.task.get_output_log_web_page()))
def main():
try:
cli()
except KeyboardInterrupt:
print('\nUser aborted')
except Exception as ex:
print('\nError: {}'.format(ex))
exit(1)
if __name__ == '__main__':
main()


@@ -135,7 +135,10 @@ def dev_worker_name():
def __set_is_master_node():
# noinspection PyBroadException
try:
force_master_node = os.environ.pop('TRAINS_FORCE_MASTER_NODE', None)
# pop both; use the first one that is set
env_a = os.environ.pop('CLEARML_FORCE_MASTER_NODE', None)
env_b = os.environ.pop('TRAINS_FORCE_MASTER_NODE', None)
force_master_node = env_a or env_b
except Exception:
force_master_node = None
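The dual-variable pop above is the rename-transition pattern: honor both the new `CLEARML_*` name and the legacy `TRAINS_*` alias, let the new name win, and remove both from the environment so child processes do not inherit them. A minimal sketch (hypothetical helper name):

```python
import os

def pop_force_master_node(environ=None):
    """Pop both the new and legacy force-master-node variables; new name wins."""
    environ = os.environ if environ is None else environ
    new = environ.pop('CLEARML_FORCE_MASTER_NODE', None)
    old = environ.pop('TRAINS_FORCE_MASTER_NODE', None)
    return new or old
```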


@@ -2,7 +2,7 @@
version: 1
disable_existing_loggers: 0
loggers {
trains {
clearml {
level: INFO
}
boto {


@@ -1,10 +1,10 @@
{
# TRAINS - default SDK configuration
# ClearML - default SDK configuration
storage {
cache {
# Defaults to system temp folder / cache
default_base_dir: "~/.trains/cache"
default_base_dir: "~/.clearml/cache"
}
direct_access: [
@@ -93,7 +93,7 @@
google.storage {
# # Default project and credentials file
# # Will be used when no bucket configuration is found
# project: "trains"
# project: "clearml"
# credentials_json: "/path/to/credentials.json"
# # Specific credentials per bucket and sub directory
@@ -101,7 +101,7 @@
# {
# bucket: "my-bucket"
# subdir: "path/in/bucket" # Not required
# project: "trains"
# project: "clearml"
# credentials_json: "/path/to/credentials.json"
# },
# ]
@@ -109,7 +109,7 @@
azure.storage {
# containers: [
# {
# account_name: "trains"
# account_name: "clearml"
# account_key: "secret"
# # container_name:
# }
@@ -150,8 +150,8 @@
# do not analyze the entire repository.
force_analyze_entire_repo: false
# If set to true, *trains* update message will not be printed to the console
# this value can be overwritten with os environment variable TRAINS_SUPPRESS_UPDATE_MESSAGE=1
# If set to true, *clearml* update message will not be printed to the console
# this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
suppress_update_message: false
# If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
@@ -161,7 +161,7 @@
# of the Hyper-Parameters.
# multiple selected variables are supported including the suffix '*'.
# For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
# This value can be overwritten with os environment variable TRAINS_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
# This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
# Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
log_os_environments: []

View File

@@ -5,28 +5,28 @@ from ..backend_config.converters import base64_to_text, or_
from pathlib2 import Path
SESSION_CACHE_FILE = ".session.json"
DEFAULT_CACHE_DIR = str(Path(tempfile.gettempdir()) / "trains_cache")
DEFAULT_CACHE_DIR = str(Path(tempfile.gettempdir()) / "clearml_cache")
TASK_ID_ENV_VAR = EnvEntry("TRAINS_TASK_ID", "ALG_TASK_ID")
DOCKER_IMAGE_ENV_VAR = EnvEntry("TRAINS_DOCKER_IMAGE", "ALG_DOCKER_IMAGE")
LOG_TO_BACKEND_ENV_VAR = EnvEntry("TRAINS_LOG_TASK_TO_BACKEND", "ALG_LOG_TASK_TO_BACKEND", type=bool)
NODE_ID_ENV_VAR = EnvEntry("TRAINS_NODE_ID", "ALG_NODE_ID", type=int)
PROC_MASTER_ID_ENV_VAR = EnvEntry("TRAINS_PROC_MASTER_ID", "ALG_PROC_MASTER_ID", type=str)
LOG_STDERR_REDIRECT_LEVEL = EnvEntry("TRAINS_LOG_STDERR_REDIRECT_LEVEL", "ALG_LOG_STDERR_REDIRECT_LEVEL")
DEV_WORKER_NAME = EnvEntry("TRAINS_WORKER_NAME", "ALG_WORKER_NAME")
DEV_TASK_NO_REUSE = EnvEntry("TRAINS_TASK_NO_REUSE", "ALG_TASK_NO_REUSE", type=bool)
TASK_LOG_ENVIRONMENT = EnvEntry("TRAINS_LOG_ENVIRONMENT", "ALG_LOG_ENVIRONMENT", type=str)
TRAINS_CACHE_DIR = EnvEntry("TRAINS_CACHE_DIR", "ALG_CACHE_DIR")
TASK_ID_ENV_VAR = EnvEntry("CLEARML_TASK_ID", "TRAINS_TASK_ID")
DOCKER_IMAGE_ENV_VAR = EnvEntry("CLEARML_DOCKER_IMAGE", "TRAINS_DOCKER_IMAGE")
LOG_TO_BACKEND_ENV_VAR = EnvEntry("CLEARML_LOG_TASK_TO_BACKEND", "TRAINS_LOG_TASK_TO_BACKEND", type=bool)
NODE_ID_ENV_VAR = EnvEntry("CLEARML_NODE_ID", "TRAINS_NODE_ID", type=int)
PROC_MASTER_ID_ENV_VAR = EnvEntry("CLEARML_PROC_MASTER_ID", "TRAINS_PROC_MASTER_ID", type=str)
LOG_STDERR_REDIRECT_LEVEL = EnvEntry("CLEARML_LOG_STDERR_REDIRECT_LEVEL", "TRAINS_LOG_STDERR_REDIRECT_LEVEL")
DEV_WORKER_NAME = EnvEntry("CLEARML_WORKER_NAME", "TRAINS_WORKER_NAME")
DEV_TASK_NO_REUSE = EnvEntry("CLEARML_TASK_NO_REUSE", "TRAINS_TASK_NO_REUSE", type=bool)
TASK_LOG_ENVIRONMENT = EnvEntry("CLEARML_LOG_ENVIRONMENT", "TRAINS_LOG_ENVIRONMENT", type=str)
TRAINS_CACHE_DIR = EnvEntry("CLEARML_CACHE_DIR", "TRAINS_CACHE_DIR")
LOG_LEVEL_ENV_VAR = EnvEntry("TRAINS_LOG_LEVEL", "ALG_LOG_LEVEL", converter=or_(int, str))
LOG_LEVEL_ENV_VAR = EnvEntry("CLEARML_LOG_LEVEL", "TRAINS_LOG_LEVEL", converter=or_(int, str))
SUPPRESS_UPDATE_MESSAGE_ENV_VAR = EnvEntry("TRAINS_SUPPRESS_UPDATE_MESSAGE", "ALG_SUPPRESS_UPDATE_MESSAGE", type=bool)
SUPPRESS_UPDATE_MESSAGE_ENV_VAR = EnvEntry("CLEARML_SUPPRESS_UPDATE_MESSAGE", "TRAINS_SUPPRESS_UPDATE_MESSAGE", type=bool)
# Repository detection
VCS_REPO_TYPE = EnvEntry("TRAINS_VCS_REPO_TYPE", "ALG_VCS_REPO_TYPE", default="git")
VCS_REPOSITORY_URL = EnvEntry("TRAINS_VCS_REPO_URL", "ALG_VCS_REPO_URL")
VCS_COMMIT_ID = EnvEntry("TRAINS_VCS_COMMIT_ID", "ALG_VCS_COMMIT_ID")
VCS_BRANCH = EnvEntry("TRAINS_VCS_BRANCH", "ALG_VCS_BRANCH")
VCS_ROOT = EnvEntry("TRAINS_VCS_ROOT", "ALG_VCS_ROOT")
VCS_STATUS = EnvEntry("TRAINS_VCS_STATUS", "ALG_VCS_STATUS", converter=base64_to_text)
VCS_DIFF = EnvEntry("TRAINS_VCS_DIFF", "ALG_VCS_DIFF", converter=base64_to_text)
VCS_REPO_TYPE = EnvEntry("CLEARML_VCS_REPO_TYPE", "TRAINS_VCS_REPO_TYPE", default="git")
VCS_REPOSITORY_URL = EnvEntry("CLEARML_VCS_REPO_URL", "TRAINS_VCS_REPO_URL")
VCS_COMMIT_ID = EnvEntry("CLEARML_VCS_COMMIT_ID", "TRAINS_VCS_COMMIT_ID")
VCS_BRANCH = EnvEntry("CLEARML_VCS_BRANCH", "TRAINS_VCS_BRANCH")
VCS_ROOT = EnvEntry("CLEARML_VCS_ROOT", "TRAINS_VCS_ROOT")
VCS_STATUS = EnvEntry("CLEARML_VCS_STATUS", "TRAINS_VCS_STATUS", converter=base64_to_text)
VCS_DIFF = EnvEntry("CLEARML_VCS_DIFF", "TRAINS_VCS_DIFF", converter=base64_to_text)
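Every `EnvEntry` above now lists the `CLEARML_*` name first with the old `TRAINS_*` name as a fallback alias. A minimal sketch of that lookup order (a hypothetical class, not the real clearml `EnvEntry`, which also supports converters): the first name found in the environment wins, so `CLEARML_*` takes precedence.

```python
import os

class EnvEntrySketch:
    """First-match environment lookup with a type conversion and a default."""

    def __init__(self, *names, default=None, type=str):
        self.names = names
        self.default = default
        self.type = type

    def get(self, environ=None):
        environ = os.environ if environ is None else environ
        for name in self.names:
            if name in environ:
                value = environ[name]
                if self.type is bool:
                    # accept the usual truthy spellings
                    return value.strip().lower() in ('1', 'true', 'yes', 'on')
                return self.type(value)
        return self.default
```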


@@ -0,0 +1,6 @@
from .dataset import FileEntry, Dataset
__all__ = [
"FileEntry",
"Dataset",
]

clearml/datasets/dataset.py Normal file

File diff suppressed because it is too large.


@@ -56,7 +56,7 @@ class LoggerRoot(object):
# avoid nested imports
from ..config import get_log_redirect_level
LoggerRoot.__base_logger = logging.getLogger('trains')
LoggerRoot.__base_logger = logging.getLogger('clearml')
level = level if level is not None else default_level
LoggerRoot.__base_logger.setLevel(level)


@@ -145,11 +145,11 @@ def _patch_module(module, prefix='', basepath=None, basemodule=None, exclude_pre
prefix += module.__name__.split('.')[-1] + '.'
# Do not patch low level network layer
if prefix.startswith('trains.backend_api.session.') and prefix != 'trains.backend_api.session.':
if prefix.startswith('clearml.backend_api.session.') and prefix != 'clearml.backend_api.session.':
if not prefix.endswith('.Session.') and '.token_manager.' not in prefix:
# print('SKIPPING: {}'.format(prefix))
return
if prefix.startswith('trains.backend_api.services.'):
if prefix.startswith('clearml.backend_api.services.'):
return
for skip in exclude_prefixes:
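The prefix checks above keep the debug tracer away from the low-level network layer: everything under `clearml.backend_api.session.` is skipped except the `Session` class itself and the token manager, and all generated service stubs under `clearml.backend_api.services.` are skipped outright. A sketch of that filter as a predicate (hypothetical function name):

```python
def should_patch(prefix):
    """Return True when a module prefix is eligible for trace patching."""
    if prefix.startswith('clearml.backend_api.session.') and prefix != 'clearml.backend_api.session.':
        # skip the network layer, except the Session class and the token manager
        if not prefix.endswith('.Session.') and '.token_manager.' not in prefix:
            return False
    if prefix.startswith('clearml.backend_api.services.'):
        return False
    return True
```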
@@ -208,7 +208,7 @@ def _patch_module(module, prefix='', basepath=None, basemodule=None, exclude_pre
def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
"""
DEBUG ONLY - Add full Trains package code trace
DEBUG ONLY - Add full ClearML package code trace
Output trace to filename or stream, default is sys.stderr
Trace level
-2: Trace function and arguments and returned call
@@ -244,7 +244,7 @@ def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
__stream_flush = None
from ..version import __version__
msg = 'Trains v{} - Starting Trace\n\n'.format(__version__)
msg = 'ClearML v{} - Starting Trace\n\n'.format(__version__)
# print to actual stderr
stderr_write(msg)
# store to stream
@@ -252,7 +252,7 @@ def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
__stream_write('{:9}:{:5}:{:8}: {:14}\n'.format('seconds', 'pid', 'tid', 'self'))
__stream_write('{:9}:{:5}:{:8}:{:15}\n'.format('-' * 9, '-' * 5, '-' * 8, '-' * 15))
__trace_start = time.time()
_patch_module('trains', exclude_prefixes=exclude_prefixes or [], only_prefix=only_prefix or [])
_patch_module('clearml', exclude_prefixes=exclude_prefixes or [], only_prefix=only_prefix or [])
def trace_level(level=1):
@@ -343,7 +343,7 @@ def end_of_program():
if __name__ == '__main__':
# from trains import Task
# from clearml import Task
# task = Task.init(project_name="examples", task_name="trace test")
# trace_trains('_trace.txt', level=2)
print_traced_files('_trace_*.txt', lines_per_tid=10)


@@ -1,3 +1,3 @@
class UsageError(RuntimeError):
""" An exception raised for illegal usage of trains objects"""
""" An exception raised for illegal usage of clearml objects"""
pass


@@ -14,7 +14,7 @@ try:
except ImportError:
pd = None
from logging import getLogger
getLogger('trains.external.kerastuner').warning(
getLogger('clearml.external.kerastuner').warning(
'Pandas is not installed, summary table reporting will be skipped.')
@@ -26,7 +26,7 @@ class TrainsTunerLogger(Logger):
super(TrainsTunerLogger, self).__init__()
self.task = task or Task.current_task()
if not self.task:
raise ValueError("Trains Task could not be found, pass in TrainsTunerLogger or "
raise ValueError("ClearML Task could not be found, pass in TrainsTunerLogger or "
"call Task.init before initializing TrainsTunerLogger")
self._summary = pd.DataFrame() if pd else None


@@ -36,14 +36,14 @@ if TYPE_CHECKING:
class Logger(object):
"""
The ``Logger`` class is the Trains console log and metric statistics interface, and contains methods for explicit
The ``Logger`` class is the ClearML console log and metric statistics interface, and contains methods for explicit
reporting.
Explicit reporting extends Trains automagical capturing of inputs and output. Explicit reporting
Explicit reporting extends ClearML's automagical capturing of inputs and outputs. Explicit reporting
methods include scalar plots, line plots, histograms, confusion matrices, 2D and 3D scatter
diagrams, text logging, tables, and image uploading and reporting.
In the **Trains Web-App (UI)**, ``Logger`` output appears in the **RESULTS** tab, **LOG**, **SCALARS**,
In the **ClearML Web-App (UI)**, ``Logger`` output appears in the **RESULTS** tab, **LOG**, **SCALARS**,
**PLOTS**, and **DEBUG SAMPLES** sub-tabs. When you compare experiments, ``Logger`` output appears in the
comparisons.
@@ -90,7 +90,7 @@ class Logger(object):
if self._connect_logging:
StdStreamPatch.patch_logging_formatter(self)
elif not self._connect_std_streams:
# make sure that at least the main trains logger is connect
# make sure that at least the main clearml logger is connected
base_logger = LoggerRoot.get_base_logger()
if base_logger and base_logger.handlers:
StdStreamPatch.patch_logging_formatter(self, base_logger.handlers[0])
@@ -126,7 +126,7 @@ class Logger(object):
logger.report_text('log some text', level=logging.DEBUG, print_console=False)
You can view the reported text in the **Trains Web-App (UI)**, **RESULTS** tab, **LOG** sub-tab.
You can view the reported text in the **ClearML Web-App (UI)**, **RESULTS** tab, **LOG** sub-tab.
:param str msg: The text to log.
:param int level: The log level from the Python ``logging`` package. The default value is ``logging.INFO``.
@@ -151,7 +151,7 @@ class Logger(object):
scalar_series = [random.randint(0,10) for i in range(10)]
logger.report_scalar(title='scalar metrics', series='series', value=scalar_series[iteration], iteration=0)
You can view the scalar plots in the **Trains Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab.
You can view the scalar plots in the **ClearML Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab.
:param str title: The title (metric) of the plot. Plot more than one scalar series on the same plot by using
the same ``title`` for each call to this method.
@@ -190,7 +190,7 @@ class Logger(object):
logger.report_vector(title='vector example', series='vector series', values=vector_series, iteration=0,
labels=['A','B'], xaxis='X axis label', yaxis='Y axis label')
You can view the vectors plots in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
You can view the vectors plots in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
:param str title: The title (metric) of the plot.
:param str series: The series name (variant) of the reported histogram.
@@ -237,7 +237,7 @@ class Logger(object):
logger.report_histogram(title='histogram example', series='histogram series',
values=vector_series, iteration=0, labels=['A','B'], xaxis='X axis label', yaxis='Y axis label')
You can view the reported histograms in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
You can view the reported histograms in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
:param str title: The title (metric) of the plot.
:param str series: The series name (variant) of the reported histogram.
@@ -305,7 +305,7 @@ class Logger(object):
logger.report_table(title='table example',series='pandas DataFrame',iteration=0,table_plot=df)
You can view the reported tables in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
You can view the reported tables in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
:param str title: The title (metric) of the table.
:param str series: The series name (variant) of the reported table.
@@ -1022,8 +1022,8 @@ class Logger(object):
The images are uploaded separately. A link to each image is reported.
.. note::
Credentials for the destination storage are specified in the Trains configuration file,
``~/trains.conf``.
Credentials for the destination storage are specified in the ClearML configuration file,
``~/clearml.conf``.
:param str uri: example: 's3://bucket/directory/' or 'file:///tmp/debug/'
@@ -1150,7 +1150,7 @@ class Logger(object):
The values are:
- ``True`` - Scalars without specific titles are grouped together in the "Scalars" plot, preserving
backward compatibility with Trains automagical behavior.
backward compatibility with ClearML automagical behavior.
- ``False`` - TensorBoard scalars without titles get a title/series with the same tag. (default)
:type group_scalars: bool
"""
@@ -1214,7 +1214,7 @@ class Logger(object):
try:
# make sure we are writing to the original stdout
StdStreamPatch.stderr_original_write(
'trains.Logger failed sending log [level {}]: "{}"\n'.format(level, msg))
'clearml.Logger failed sending log [level {}]: "{}"\n'.format(level, msg))
except Exception:
pass
else:

View File

@@ -472,9 +472,9 @@ class InputModel(Model):
framework, and indicate whether to immediately set the model's status to ``Published``.
The model is read-only.
The **Trains Server** (backend) may already store the model's URL. If the input model's URL is not
stored, meaning the model is new, then it is imported and Trains stores its metadata.
If the URL is already stored, the import process stops, Trains issues a warning message, and Trains
The **ClearML Server** (backend) may already store the model's URL. If the input model's URL is not
stored, meaning the model is new, then it is imported and ClearML stores its metadata.
If the URL is already stored, the import process stops, ClearML issues a warning message, and ClearML
reuses the model.
In your Python experiment script, after importing the model, you can connect it to the main execution
@@ -482,12 +482,12 @@ class InputModel(Model):
network.
.. note::
Using the **Trains Web-App** (user interface), you can reuse imported models and switch models in
Using the **ClearML Web-App** (user interface), you can reuse imported models and switch models in
experiments.
:param str weights_url: A valid URL for the initial weights file. If the **Trains Web-App** (backend)
:param str weights_url: A valid URL for the initial weights file. If the **ClearML Web-App** (backend)
already stores the metadata of a model with the same URL, that existing model is returned
and Trains ignores all other parameters.
and ClearML ignores all other parameters.
For example:
@@ -715,7 +715,7 @@ class InputModel(Model):
def __init__(self, model_id):
# type: (str) -> None
"""
:param str model_id: The Trains Id (system UUID) of the input model whose metadata the **Trains Server**
:param str model_id: The ClearML Id (system UUID) of the input model whose metadata the **ClearML Server**
(backend) stores.
"""
super(InputModel, self).__init__(model_id)
@@ -731,16 +731,16 @@ class InputModel(Model):
Connect the current model to a Task object, if the model is preexisting. Preexisting models include:
- Imported models (InputModel objects created using the :meth:`Logger.import_model` method).
- Models whose metadata is already in the Trains platform, meaning the InputModel object is instantiated
from the ``InputModel`` class specifying the the model's Trains Id as an argument.
- Models whose origin is not Trains that are used to create an InputModel object. For example,
- Models whose metadata is already in the ClearML platform, meaning the InputModel object is instantiated
from the ``InputModel`` class specifying the model's ClearML Id as an argument.
- Models whose origin is not ClearML that are used to create an InputModel object. For example,
models created using TensorFlow models.
When the experiment is executed remotely in a worker, the input model already specified in the experiment is
used.
.. note::
The **Trains Web-App** allows you to switch one input model for another and then enqueue the experiment
The **ClearML Web-App** allows you to switch one input model for another and then enqueue the experiment
to execute in a worker.
:param object task: A Task object.
@@ -789,7 +789,7 @@ class OutputModel(BaseModel):
.. note::
When executing a Task (experiment) remotely in a worker, you can modify the model configuration and / or model's
label enumeration using the **Trains Web-App**.
label enumeration using the **ClearML Web-App**.
"""
@property
@@ -990,7 +990,7 @@ class OutputModel(BaseModel):
Connect the current model to a Task object, if the model is a preexisting model. Preexisting models include:
- Imported models.
- Models whose metadata the **Trains Server** (backend) is already storing.
- Models whose metadata the **ClearML Server** (backend) is already storing.
- Models from another source, such as frameworks like TensorFlow.
:param object task: A Task object.
@@ -1044,8 +1044,8 @@ class OutputModel(BaseModel):
Using this method, files uploads are separate and then a link to each is stored in the model object.
.. note::
For storage requiring credentials, the credentials are stored in the Trains configuration file,
``~/trains.conf``.
For storage requiring credentials, the credentials are stored in the ClearML configuration file,
``~/clearml.conf``.
:param str uri: The URI of the upload storage destination.

View File

@@ -58,10 +58,10 @@ class CacheManager(object):
return cached_file
@staticmethod
def upload_file(local_file, remote_url, wait_for_upload=True):
def upload_file(local_file, remote_url, wait_for_upload=True, retries=1):
helper = StorageHelper.get(remote_url)
result = helper.upload(
local_file, remote_url, async_enable=not wait_for_upload
local_file, remote_url, async_enable=not wait_for_upload, retries=retries,
)
CacheManager._add_remote_url(remote_url, local_file)
return result

View File

@@ -681,6 +681,7 @@ class StorageHelper(object):
# try to get file size
try:
if isinstance(self._driver, _HttpDriver) and obj:
obj = self._driver._get_download_object(obj)
total_size_mb = float(obj.headers.get('Content-Length', 0)) / (1024 * 1024)
elif hasattr(obj, 'size'):
size = obj.size
@@ -785,12 +786,12 @@ class StorageHelper(object):
def check_write_permissions(self, dest_path=None):
# create a temporary file, then delete it
base_url = dest_path or self._base_url
dest_path = base_url + '/.trains.test'
dest_path = base_url + '/.clearml.test'
# do not check http/s connection permissions
if dest_path.startswith('http'):
return True
try:
self.upload_from_stream(stream=six.BytesIO(b'trains'), dest_path=dest_path)
self.upload_from_stream(stream=six.BytesIO(b'clearml'), dest_path=dest_path)
self.delete(path=dest_path)
except Exception:
raise ValueError('Insufficient permissions for {}'.format(base_url))
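The probe-upload-then-delete pattern used by `check_write_permissions` above can be sketched generically. This is a minimal local-filesystem version of the same idea (names and error text are illustrative, not clearml's actual implementation):

```python
import os
import tempfile
import uuid


def check_write_permissions(base_dir):
    """Probe write access by creating a throwaway file, then deleting it."""
    probe = os.path.join(base_dir, '.clearml.test-{}'.format(uuid.uuid4().hex))
    try:
        with open(probe, 'wb') as f:
            f.write(b'clearml')
        os.remove(probe)
    except OSError:
        raise ValueError('Insufficient permissions for {}'.format(base_dir))
    return True


# a freshly created temp directory is writable, so the probe succeeds
assert check_write_permissions(tempfile.mkdtemp()) is True
```

As in the real helper, failure at any step (create, write, or delete) is collapsed into a single `ValueError` naming the destination.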
@@ -1024,6 +1025,11 @@ class _HttpDriver(_Driver):
return self._default_backend_session.add_auth_headers({})
return None
class _HttpSessionHandle(object):
def __init__(self, url, is_stream, container_name, object_name):
self.url, self.is_stream, self.container_name, self.object_name = \
url, is_stream, container_name, object_name
def __init__(self, retries=5):
self._retries = retries
self._containers = {}
@@ -1055,24 +1061,39 @@ class _HttpDriver(_Driver):
def list_container_objects(self, *args, **kwargs):
raise NotImplementedError('List is not implemented for http protocol')
def delete_object(self, *args, **kwargs):
raise NotImplementedError('Delete is not implemented for http protocol')
def delete_object(self, obj, *args, **kwargs):
assert isinstance(obj, self._HttpSessionHandle)
container = self._containers[obj.container_name]
res = container.session.delete(obj.url, headers=container.get_headers(obj.url))
if res.status_code != requests.codes.ok:
raise ValueError('Failed deleting object %s (%d): %s' % (obj.object_name, res.status_code, res.text))
return res
def get_object(self, container_name, object_name, *args, **kwargs):
container = self._containers[container_name]
# set stream flag before get request
container.session.stream = kwargs.get('stream', True)
is_stream = kwargs.get('stream', True)
url = ''.join((container_name, object_name.lstrip('/')))
res = container.session.get(url, timeout=self.timeout, headers=container.get_headers(url))
return self._HttpSessionHandle(url, is_stream, container_name, object_name)
def _get_download_object(self, obj):
# bypass for session result
if not isinstance(obj, self._HttpSessionHandle):
return obj
container = self._containers[obj.container_name]
# set stream flag before we send the request
container.session.stream = obj.is_stream
res = container.session.get(obj.url, timeout=self.timeout, headers=container.get_headers(obj.url))
if res.status_code != requests.codes.ok:
raise ValueError('Failed getting object %s (%d): %s' % (object_name, res.status_code, res.text))
raise ValueError('Failed getting object %s (%d): %s' % (obj.object_name, res.status_code, res.text))
return res
def download_object_as_stream(self, obj, chunk_size=64 * 1024, **_):
# return iterable object
obj = self._get_download_object(obj)
return obj.iter_content(chunk_size=chunk_size)
def download_object(self, obj, local_path, overwrite_existing=True, delete_on_failure=True, callback=None, **_):
obj = self._get_download_object(obj)
p = Path(local_path)
if not overwrite_existing and p.is_file():
log.warning('failed saving after download: overwrite=False and file exists (%s)' % str(p))
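The refactor above makes `get_object()` cheap: it now returns a lightweight `_HttpSessionHandle` that only records what to fetch, and the actual HTTP GET is deferred to `_get_download_object()` when the bytes are really needed. A minimal sketch of that deferred-fetch pattern, with a plain function standing in for the real `requests` session (class and method names simplified):

```python
class SessionHandle(object):
    """Lightweight handle: records what to fetch without fetching it."""
    def __init__(self, url, is_stream):
        self.url, self.is_stream = url, is_stream


class Driver(object):
    def __init__(self, fetch):
        self._fetch = fetch  # injected fetch function, stands in for an HTTP session
        self.fetch_count = 0

    def get_object(self, url, stream=True):
        # cheap: no network I/O happens here
        return SessionHandle(url, stream)

    def _get_download_object(self, obj):
        # the expensive GET only happens when the object is actually needed
        if not isinstance(obj, SessionHandle):
            return obj  # bypass for already-fetched results
        self.fetch_count += 1
        return self._fetch(obj.url)


driver = Driver(fetch=lambda url: 'payload-for-{}'.format(url))
handle = driver.get_object('http://example.com/model.bin')
assert driver.fetch_count == 0  # nothing downloaded yet
data = driver._get_download_object(handle)
```

This is why the size-probe hunk earlier also calls `_get_download_object(obj)` first: a handle has no `Content-Length` headers until the request is actually issued.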

View File

@@ -48,8 +48,8 @@ class StorageManager(object):
@classmethod
def upload_file(
cls, local_file, remote_url, wait_for_upload=True
): # type: (str, str, bool) -> str
cls, local_file, remote_url, wait_for_upload=True, retries=1
): # type: (str, str, bool, int) -> str
"""
Upload a local file to a remote location. The remote URL is the final destination of the uploaded file.
@@ -64,12 +64,14 @@ class StorageManager(object):
:param str local_file: Full path of a local file to be uploaded
:param str remote_url: Full path or remote url to upload to (including file name)
:param bool wait_for_upload: If False, return immediately and upload in the background. Default True.
:param int retries: Number of retries before failing to upload file, default 1.
:return: Newly uploaded remote URL.
"""
return CacheManager.get_cache_manager().upload_file(
local_file=local_file,
remote_url=remote_url,
wait_for_upload=wait_for_upload,
retries=retries,
)
@classmethod
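The new `retries` argument means the upload is re-attempted before the call fails. A generic sketch of retry-until-success semantics (a hypothetical helper, assuming `retries` counts total attempts; not clearml's internal implementation):

```python
def upload_with_retries(upload_fn, retries=1):
    """Call upload_fn(); re-attempt up to `retries` times total on failure."""
    last_error = None
    for _attempt in range(max(1, retries)):
        try:
            return upload_fn()
        except Exception as ex:
            last_error = ex
    raise last_error


# a flaky upload that fails twice, then succeeds on the third attempt
attempts = {'n': 0}


def flaky_upload():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise IOError('transient network error')
    return 's3://bucket/model.pkl'


url = upload_with_retries(flaky_upload, retries=3)
```

With `retries=1` (the documented default) a single failure still propagates immediately, which matches the previous behavior.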

View File

@@ -1,6 +1,6 @@
import hashlib
import sys
from typing import Optional
from typing import Optional, Union
from six.moves.urllib.parse import quote, urlparse, urlunparse
import six
@@ -72,8 +72,23 @@ def sha256sum(filename, skip_header=0, block_size=65536):
return h.hexdigest(), file_hash.hexdigest() if skip_header else None
def md5text(text, seed=1337):
# type: (str, Union[int, str]) -> str
"""
Return md5 hash of a string
Do not use this hash for security; if needed, use something stronger like SHA2
:param text: string to hash
:param seed: use prefix seed for hashing
:return: md5 string
"""
h = hashlib.md5()
h.update((str(seed) + str(text)).encode('utf-8'))
return h.hexdigest()
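A quick, self-contained check of `md5text`'s seeded-hash behavior (function reproduced from the hunk above):

```python
import hashlib


def md5text(text, seed=1337):
    """Return the md5 hex digest of str(seed) + str(text) (not for security use)."""
    h = hashlib.md5()
    h.update((str(seed) + str(text)).encode('utf-8'))
    return h.hexdigest()


digest = md5text('hello')
```

The seed is simply prefixed to the input, so the same text hashed with a different seed yields a different digest.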
def is_windows():
"""
:return: True if currently running on windows OS
"""
return sys.platform == 'win32'
return sys.platform == 'win32'

View File

@@ -77,8 +77,8 @@ class Task(_Task):
configuration, label enumeration, models, and other artifacts.
The term "main execution Task" refers to the Task context for the currently running experiment. Python experiment scripts
can create one, and only one, main execution Task. It is a traceable, and after a script runs and Trains stores
the Task in the **Trains Server** (backend), it is modifiable, reproducible, executable by a worker, and you
can create one, and only one, main execution Task. It is traceable, and after a script runs and ClearML stores
the Task in the **ClearML Server** (backend), it is modifiable, reproducible, executable by a worker, and you
can duplicate it for further experimentation.
The ``Task`` class and its methods allow you to create and manage experiments, as well as perform
@@ -93,7 +93,7 @@ class Task(_Task):
- Create a new reproducible Task - :meth:`Task.init`
.. important::
In some cases, ``Task.init`` may return a Task object which is already stored in **Trains Server** (already
In some cases, ``Task.init`` may return a Task object which is already stored in **ClearML Server** (already
initialized), instead of creating a new Task. For a detailed explanation of those cases, see the ``Task.init``
method.
@@ -102,17 +102,17 @@ class Task(_Task):
- Get another (different) Task - :meth:`Task.get_task`
.. note::
The **Trains** documentation often refers to a Task as, "Task (experiment)".
The **ClearML** documentation often refers to a Task as, "Task (experiment)".
"Task" refers to the class in the Trains Python Client Package, the object in your Python experiment script,
and the entity with which **Trains Server** and **Trains Agent** work.
"Task" refers to the class in the ClearML Python Client Package, the object in your Python experiment script,
and the entity with which **ClearML Server** and **ClearML Agent** work.
"Experiment" refers to your deep learning solution, including its connected components, inputs, and outputs,
and is the experiment you can view, analyze, compare, modify, duplicate, and manage using the Trains
and is the experiment you can view, analyze, compare, modify, duplicate, and manage using the ClearML
**Web-App** (UI).
Therefore, a "Task" is effectively an "experiment", and "Task (experiment)" encompasses its usage throughout
the Trains.
ClearML.
The exception to this Task behavior is sub-tasks (non-reproducible Tasks), which do not use the main execution
Task. Creating a sub-task always creates a new Task with a new Task ID.
@@ -197,7 +197,7 @@ class Task(_Task):
Creates a new Task (experiment) if:
- The Task never ran before. No Task with the same ``task_name`` and ``project_name`` is stored in
**Trains Server**.
**ClearML Server**.
- The Task has run before (the same ``task_name`` and ``project_name``), and (a) it stored models and / or
artifacts, or (b) its status is Published , or (c) it is Archived.
- A new Task is forced by calling ``Task.init`` with ``reuse_last_task_id=False``.
@@ -215,7 +215,7 @@ class Task(_Task):
.. code-block:: py
from trains import Task
from clearml import Task
task = Task.init('myProject', 'myTask')
If this code runs again, it will not create a new Task. It does not store a model or artifact,
@@ -285,7 +285,7 @@ class Task(_Task):
This is equivalent to `continue_last_task=True` and `reuse_last_task_id=a_task_id_string`.
:param str output_uri: The default location for output models and other artifacts. In the default location,
Trains creates a subfolder for the output. The subfolder structure is the following:
ClearML creates a subfolder for the output. The subfolder structure is the following:
<output destination name> / <project name> / <task name>.< Task ID>
@@ -297,9 +297,9 @@ class Task(_Task):
- Azure Storage: ``azure://company.blob.core.windows.net/folder/``
.. important::
For cloud storage, you must install the **Trains** package for your cloud storage type,
For cloud storage, you must install the **ClearML** package for your cloud storage type,
and then configure your storage credentials. For detailed information, see
`Trains Python Client Extras <./references/trains_extras_storage/>`_ in the "Trains Python Client
`ClearML Python Client Extras <./references/clearml_extras_storage/>`_ in the "ClearML Python Client
Reference" section.
:param auto_connect_arg_parser: Automatically connect an argparse object to the Task
@@ -324,7 +324,7 @@ class Task(_Task):
:param auto_connect_frameworks: Automatically connect frameworks This includes patching MatplotLib, XGBoost,
scikit-learn, Keras callbacks, and TensorBoard/X to serialize plots, graphs, and the model location to
the **Trains Server** (backend), in addition to original output destination.
the **ClearML Server** (backend), in addition to original output destination.
The values are:
@@ -342,7 +342,7 @@ class Task(_Task):
'xgboost': True, 'scikit': True, 'fastai': True, 'lightgbm': True, 'hydra': True}
:param bool auto_resource_monitoring: Automatically create machine resource monitoring plots
These plots appear in in the **Trains Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab,
These plots appear in the **ClearML Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab,
with a title of **:resource monitor:**.
The values are:
@@ -409,6 +409,7 @@ class Task(_Task):
# create a new logger (to catch stdout/err)
cls.__main_task._logger = None
cls.__main_task.__reporter = None
# noinspection PyProtectedMember
cls.__main_task._get_logger(auto_connect_streams=auto_connect_streams)
cls.__main_task._artifacts_manager = Artifacts(cls.__main_task)
# unregister signal hooks, they cause subprocess to hang
@@ -569,10 +570,10 @@ class Task(_Task):
# show the debug metrics page in the log, it is very convenient
if not is_sub_process_task_id:
if cls._offline_mode:
logger.report_text('TRAINS running in offline mode, session stored in {}'.format(
logger.report_text('ClearML running in offline mode, session stored in {}'.format(
task.get_offline_mode_folder()))
else:
logger.report_text('TRAINS results page: {}'.format(task.get_output_log_web_page()))
logger.report_text('ClearML results page: {}'.format(task.get_output_log_web_page()))
# Make sure we start the dev worker if required, otherwise it will only be started when we write
# something to the log.
task._dev_mode_task_start()
@@ -580,55 +581,76 @@ class Task(_Task):
return task
@classmethod
def create(cls, project_name=None, task_name=None, task_type=TaskTypes.training):
# type: (Optional[str], Optional[str], Task.TaskTypes) -> Task
def create(
cls,
project_name=None, # Optional[str]
task_name=None, # Optional[str]
task_type=None, # Optional[str]
repo=None, # Optional[str]
branch=None, # Optional[str]
commit=None, # Optional[str]
script=None, # Optional[str]
working_directory=None, # Optional[str]
packages=None, # Optional[Sequence[str]]
requirements_file=None, # Optional[Union[str, Path]]
docker=None, # Optional[str]
base_task_id=None, # Optional[str]
add_task_init_call=True, # bool
):
# type: (...) -> Task
"""
Create a new, non-reproducible Task (experiment). This is called a sub-task.
Manually create and populate a new Task (experiment) in the system.
If the code does not already contain a call to ``Task.init``, pass add_task_init_call=True,
and the code will be patched in remote execution (i.e. when executed by `clearml-agent`).
.. note::
This method always creates a new, non-reproducible Task. To create a reproducible Task, call the
:meth:`Task.init` method. To reference another Task, call the :meth:`Task.get_task` method .
This method **always** creates a new Task.
Use :meth:`Task.init` method to automatically create and populate task for the running process.
To reference an existing Task, call the :meth:`Task.get_task` method.
:param str project_name: The name of the project in which the experiment will be created.
If ``project_name`` is ``None``, and the main execution Task is initialized (see :meth:`Task.init`),
then the main execution Task's project is used. Otherwise, if the project does
not exist, it is created. (Optional)
:param str task_name: The name of Task (experiment).
:param TaskTypes task_type: The task type.
:param project_name: Set the project name for the task. Required if base_task_id is None.
:param task_name: Set the name of the remote task. Required if base_task_id is None.
:param task_type: Optional, The task type to be created. Supported values: 'training', 'testing', 'inference',
'data_processing', 'application', 'monitor', 'controller', 'optimizer', 'service', 'qc', 'custom'
:param repo: Remote URL for the repository to use, or path to local copy of the git repository
Example: 'https://github.com/allegroai/clearml.git' or '~/project/repo'
:param branch: Select specific repository branch/tag (implies the latest commit from the branch)
:param commit: Select specific commit id to use (default: latest commit,
or when used with local repository matching the local commit id)
:param script: Specify the entry point script for the remote execution. When used in tandem with
remote git repository the script should be a relative path inside the repository,
for example: './source/train.py' . When used with local repository path it supports a
direct path to a file inside the local repository itself, for example: '~/project/source/train.py'
:param working_directory: Working directory to launch the script from. Default: repository root folder.
Relative to repo root or local folder.
:param packages: Manually specify a list of required packages. Example: ["tqdm>=2.1", "scikit-learn"]
:param requirements_file: Specify requirements.txt file to install when setting the session.
If not provided, the requirements.txt from the repository will be used.
:param docker: Select the docker image to be executed in by the remote session
:param base_task_id: Use a pre-existing task in the system, instead of a local repo/script.
Essentially clones an existing task and overrides arguments/requirements.
:param add_task_init_call: If True, a 'Task.init()' call is added to the script entry point in remote execution.
Valid task types:
- ``TaskTypes.training`` (default)
- ``TaskTypes.testing``
- ``TaskTypes.inference``
- ``TaskTypes.data_processing``
- ``TaskTypes.application``
- ``TaskTypes.monitor``
- ``TaskTypes.controller``
- ``TaskTypes.optimizer``
- ``TaskTypes.service``
- ``TaskTypes.qc``
- ``TaskTypes.custom``
:return: A new experiment.
:return: The newly created Task (experiment)
"""
if not project_name:
if not project_name and not base_task_id:
if not cls.__main_task:
raise ValueError("Please provide project_name, no global task context found "
"(Task.current_task hasn't been called)")
project_name = cls.__main_task.get_project_name()
from .backend_interface.task.populate import CreateAndPopulate
manual_populate = CreateAndPopulate(
project_name=project_name, task_name=task_name, task_type=task_type,
repo=repo, branch=branch, commit=commit,
script=script, working_directory=working_directory,
packages=packages, requirements_file=requirements_file,
docker=docker,
base_task_id=base_task_id,
add_task_init_call=add_task_init_call,
raise_on_missing_entries=False,
)
task = manual_populate.create_task()
try:
task = cls(
private=cls.__create_protection,
project_name=project_name,
task_name=task_name,
task_type=task_type,
log_to_backend=False,
force_create=True,
)
except Exception:
raise
return task
@classmethod
@@ -721,7 +743,7 @@ class Task(_Task):
helper = StorageHelper.get(value)
if not helper:
raise ValueError("Could not get access credentials for '{}' "
", check configuration file ~/trains.conf".format(value))
", check configuration file ~/clearml.conf".format(value))
helper.check_write_permissions(value)
self.storage_uri = value
@@ -758,7 +780,7 @@ class Task(_Task):
"""
Get a Logger object for reporting, for this task context. You can view all Logger report output associated with
the Task for which this method is called, including metrics, plots, text, tables, and images, in the
**Trains Web-App (UI)**.
**ClearML Web-App (UI)**.
:return: The Logger object for the current Task (experiment).
"""
@@ -796,8 +818,8 @@ class Task(_Task):
"""
assert isinstance(source_task, (six.string_types, Task))
if not Session.check_min_api_version('2.4'):
raise ValueError("Trains-server does not support DevOps features, "
"upgrade trains-server to 0.12.0 or above")
raise ValueError("ClearML-server does not support DevOps features, "
"upgrade clearml-server to 0.12.0 or above")
task_id = source_task if isinstance(source_task, six.string_types) else source_task.id
if not parent:
@@ -820,7 +842,7 @@ class Task(_Task):
.. note::
A worker daemon must be listening at the queue for the worker to fetch the Task and execute it,
see `Use Case Examples <../trains_agent_ref/#use-case-examples>`_ on the "Trains Agent
see `Use Case Examples <../clearml_agent_ref/#use-case-examples>`_ on the "ClearML Agent
Reference page.
:param Task/str task: The Task to enqueue. Specify a Task object or Task ID.
@@ -859,8 +881,8 @@ class Task(_Task):
"""
assert isinstance(task, (six.string_types, Task))
if not Session.check_min_api_version('2.4'):
raise ValueError("Trains-server does not support DevOps features, "
"upgrade trains-server to 0.12.0 or above")
raise ValueError("ClearML-server does not support DevOps features, "
"upgrade clearml-server to 0.12.0 or above")
# make sure we have either name or id
mutually_exclusive(queue_name=queue_name, queue_id=queue_id)
@@ -923,8 +945,8 @@ class Task(_Task):
"""
assert isinstance(task, (six.string_types, Task))
if not Session.check_min_api_version('2.4'):
raise ValueError("Trains-server does not support DevOps features, "
"upgrade trains-server to 0.12.0 or above")
raise ValueError("ClearML-server does not support DevOps features, "
"upgrade clearml-server to 0.12.0 or above")
task_id = task if isinstance(task, six.string_types) else task.id
session = cls._get_default_session()
@@ -990,7 +1012,7 @@ class Task(_Task):
name = self._default_configuration_section_name
if not multi_config_support and name and name != self._default_configuration_section_name:
raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
raise ValueError("Multiple configurations are not supported with the current 'clearml-server', "
"please upgrade to the latest version")
for mutable_type, method in dispatch:
@@ -1024,11 +1046,11 @@ class Task(_Task):
:param configuration: The configuration. This is usually the configuration used in the model training process.
Specify one of the following:
- A dictionary - A dictionary containing the configuration. Trains stores the configuration in
the **Trains Server** (backend), in a HOCON format (JSON-like format) which is editable.
- A ``pathlib2.Path`` string - A path to the configuration file. Trains stores the content of the file.
- A dictionary - A dictionary containing the configuration. ClearML stores the configuration in
the **ClearML Server** (backend), in a HOCON format (JSON-like format) which is editable.
- A ``pathlib2.Path`` string - A path to the configuration file. ClearML stores the content of the file.
A local path must be a relative path. When executing a Task remotely in a worker, the contents brought
from the **Trains Server** (backend) overwrites the contents of the file.
from the **ClearML Server** (backend) overwrites the contents of the file.
:param str name: Configuration section name. default: 'General'
Allowing users to store multiple configuration dicts/files
@@ -1038,10 +1060,10 @@ class Task(_Task):
:return: If a dictionary is specified, then a dictionary is returned. If pathlib2.Path / string is
specified, then a path to a local configuration file is returned. Configuration object.
"""
pathlib_Path = None
pathlib_Path = None # noqa
if not isinstance(configuration, (dict, Path, six.string_types)):
try:
from pathlib import Path as pathlib_Path
from pathlib import Path as pathlib_Path # noqa
except ImportError:
pass
if not pathlib_Path or not isinstance(configuration, pathlib_Path):
@@ -1053,7 +1075,7 @@ class Task(_Task):
name = self._default_configuration_section_name
if not multi_config_support and name and name != self._default_configuration_section_name:
raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
raise ValueError("Multiple configurations are not supported with the current 'clearml-server', "
"please upgrade to the latest version")
# parameter dictionary
@@ -1141,7 +1163,7 @@ class Task(_Task):
return configuration
configuration_path = Path(configuration)
fd, local_filename = mkstemp(prefix='trains_task_config_',
fd, local_filename = mkstemp(prefix='clearml_task_config_',
suffix=configuration_path.suffixes[-1] if
configuration_path.suffixes else '.txt')
os.write(fd, configuration_text.encode('utf-8'))
@@ -1187,7 +1209,7 @@ class Task(_Task):
"""
Get a Logger object for reporting, for this task context. You can view all Logger report output associated with
the Task for which this method is called, including metrics, plots, text, tables, and images, in the
**Trains Web-App (UI)**.
**ClearML Web-App (UI)**.
:return: The Logger for the Task (experiment).
"""
@@ -1247,7 +1269,7 @@ class Task(_Task):
def reset(self, set_started_on_success=False, force=False):
# type: (bool, bool) -> None
"""
Reset a Task. Trains reloads a Task after a successful reset.
Reset a Task. ClearML reloads a Task after a successful reset.
When a worker executes a Task remotely, the Task does not reset unless
the ``force`` parameter is set to ``True`` (this avoids accidentally clearing logs and metrics).
@@ -1290,16 +1312,16 @@ class Task(_Task):
# type: (str, pandas.DataFrame, Dict, Union[bool, Sequence[str]]) -> None
"""
Register (add) an artifact for the current Task. Registered artifacts are dynamically synchronized with the
**Trains Server** (backend). If a registered artifact is updated, the update is stored in the
**Trains Server** (backend). Registered artifacts are primarily used for Data Audition.
**ClearML Server** (backend). If a registered artifact is updated, the update is stored in the
**ClearML Server** (backend). Registered artifacts are primarily used for Data Audition.
The currently supported registered artifact object type is a pandas.DataFrame.
See also :meth:`Task.unregister_artifact` and :meth:`Task.get_registered_artifacts`.
.. note::
Trains also supports uploaded artifacts which are one-time uploads of static artifacts that are not
dynamically synchronized with the **Trains Server** (backend). These static artifacts include
ClearML also supports uploaded artifacts which are one-time uploads of static artifacts that are not
dynamically synchronized with the **ClearML Server** (backend). These static artifacts include
additional object types. For more information, see :meth:`Task.upload_artifact`.
:param str name: The name of the artifact.
@@ -1308,7 +1330,7 @@ class Task(_Task):
If an artifact with the same name was previously registered, it is overwritten.
:param object artifact: The artifact object.
:param dict metadata: A dictionary of key-value pairs for any metadata. This dictionary appears with the
experiment in the **Trains Web-App (UI)**, **ARTIFACTS** tab.
experiment in the **ClearML Web-App (UI)**, **ARTIFACTS** tab.
:param uniqueness_columns: A Sequence of columns for artifact uniqueness comparison criteria, or the default
value of ``True``. If ``True``, the artifact uniqueness comparison criteria is all the columns,
which is the same as ``artifact.columns``.
@@ -1323,13 +1345,13 @@ class Task(_Task):
def unregister_artifact(self, name):
# type: (str) -> None
"""
Unregister (remove) a registered artifact. This removes the artifact from the watch list that Trains uses
to synchronize artifacts with the **Trains Server** (backend).
Unregister (remove) a registered artifact. This removes the artifact from the watch list that ClearML uses
to synchronize artifacts with the **ClearML Server** (backend).
.. important::
- Calling this method does not remove the artifact from a Task. It only stops Trains from
- Calling this method does not remove the artifact from a Task. It only stops ClearML from
monitoring the artifact.
- When this method is called, Trains immediately takes the last snapshot of the artifact.
- When this method is called, ClearML immediately takes the last snapshot of the artifact.
"""
self._artifacts_manager.unregister_artifact(name=name)
@@ -1361,12 +1383,12 @@ class Task(_Task):
The currently supported upload (static) artifact types include:
- string / pathlib2.Path - A path to artifact file. If a wildcard or a folder is specified, then Trains
- string / pathlib2.Path - A path to artifact file. If a wildcard or a folder is specified, then ClearML
creates and uploads a ZIP file.
- dict - Trains stores a dictionary as ``.json`` file and uploads it.
- pandas.DataFrame - Trains stores a pandas.DataFrame as ``.csv.gz`` (compressed CSV) file and uploads it.
- numpy.ndarray - Trains stores a numpy.ndarray as ``.npz`` file and uploads it.
- PIL.Image - Trains stores a PIL.Image as ``.png`` file and uploads it.
- dict - ClearML stores a dictionary as ``.json`` file and uploads it.
- pandas.DataFrame - ClearML stores a pandas.DataFrame as ``.csv.gz`` (compressed CSV) file and uploads it.
- numpy.ndarray - ClearML stores a numpy.ndarray as ``.npz`` file and uploads it.
- PIL.Image - ClearML stores a PIL.Image as ``.png`` file and uploads it.
- Any - If called with auto_pickle=True, the object will be pickled and uploaded.
:param str name: The artifact name.
@@ -1376,7 +1398,7 @@ class Task(_Task):
:param object artifact_object: The artifact object.
:param dict metadata: A dictionary of key-value pairs for any metadata. This dictionary appears with the
experiment in the **Trains Web-App (UI)**, **ARTIFACTS** tab.
experiment in the **ClearML Web-App (UI)**, **ARTIFACTS** tab.
:param bool delete_after_upload: After the upload, delete the local copy of the artifact
- ``True`` - Delete the local copy of the artifact.
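For instance, the dict-to-``.json`` convention listed above can be sketched with the standard library (a hedged illustration of the stated behavior, not ClearML's actual uploader; ``store_dict_artifact`` is a hypothetical name):

```python
import json
import os
import tempfile

def store_dict_artifact(name, artifact_object):
    """Mimic the listed convention: a dict artifact is serialized to a .json file."""
    fd, path = tempfile.mkstemp(prefix='clearml_artifact_', suffix='.json')
    with os.fdopen(fd, 'w') as f:
        json.dump(artifact_object, f)
    return path

# Round-trip check: the stored file deserializes back to the original dict.
path = store_dict_artifact('config', {'lr': 0.003, 'epochs': 10})
with open(path) as f:
    assert json.load(f) == {'lr': 0.003, 'epochs': 10}
os.remove(path)
```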
@@ -1416,7 +1438,7 @@ class Task(_Task):
.. code-block:: py
{'input': [trains.Model()], 'output': [trains.Model()]}
{'input': [clearml.Model()], 'output': [clearml.Model()]}
"""
task_models = {'input': self._get_models(model_type='input'),
@@ -1510,7 +1532,7 @@ class Task(_Task):
.. note::
The maximum reported iteration is not in the local cache. This method
sends a request to the **Trains Server** (backend).
sends a request to the **ClearML Server** (backend).
:return: The last reported iteration number.
"""
@@ -1704,7 +1726,7 @@ class Task(_Task):
# type: (str) -> ()
"""
Set the base docker image for this experiment
If provided, this value will be used by trains-agent to execute this experiment
If provided, this value will be used by clearml-agent to execute this experiment
inside the provided docker image.
"""
if not self.running_locally() and self.is_main_task():
@@ -1732,12 +1754,12 @@ class Task(_Task):
def execute_remotely(self, queue_name=None, clone=False, exit_process=True):
# type: (Optional[str], bool, bool) -> Optional[Task]
"""
If task is running locally (i.e., not by ``trains-agent``), then clone the Task and enqueue it for remote
If task is running locally (i.e., not by ``clearml-agent``), then clone the Task and enqueue it for remote
execution; or, stop the execution of the current Task, reset its state, and enqueue it. If ``exit_process=True``,
*exit* this process.
.. note::
If the task is running remotely (i.e., ``trains-agent`` is executing it), this call is a no-op
If the task is running remotely (i.e., ``clearml-agent`` is executing it), this call is a no-op
(i.e., does nothing).
:param queue_name: The queue name used for enqueueing the task. If ``None``, this call exits the process
@@ -2006,12 +2028,12 @@ class Task(_Task):
:param session_folder_zip: Path to a folder containing the session, or zip-file of the session folder.
:return: Newly created task ID (str)
"""
print('TRAINS: Importing offline session from {}'.format(session_folder_zip))
print('ClearML: Importing offline session from {}'.format(session_folder_zip))
temp_folder = None
if Path(session_folder_zip).is_file():
# unzip the file:
temp_folder = mkdtemp(prefix='trains-offline-')
temp_folder = mkdtemp(prefix='clearml-offline-')
ZipFile(session_folder_zip).extractall(path=temp_folder)
session_folder_zip = temp_folder
@@ -2053,7 +2075,7 @@ class Task(_Task):
# metrics
Metrics.report_offline_session(task, session_folder)
# print imported results page
print('TRAINS results page: {}'.format(task.get_output_log_web_page()))
print('ClearML results page: {}'.format(task.get_output_log_web_page()))
task.completed()
# close task
task.close()
@@ -2072,10 +2094,10 @@ class Task(_Task):
def set_credentials(cls, api_host=None, web_host=None, files_host=None, key=None, secret=None, host=None):
# type: (Optional[str], Optional[str], Optional[str], Optional[str], Optional[str], Optional[str]) -> ()
"""
Set new default **Trains Server** (backend) host and credentials.
Set new default **ClearML Server** (backend) host and credentials.
These credentials will be overridden by either OS environment variables, or the Trains configuration
file, ``trains.conf``.
These credentials will be overridden by either OS environment variables, or the ClearML configuration
file, ``clearml.conf``.
.. warning::
Credentials must be set before initializing a Task object.
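Equivalently, the same host and credentials can be set persistently in ``clearml.conf`` rather than in code (the values below are placeholders):

```
api {
    web_server: "http://localhost:8080"
    api_server: "http://localhost:8008"
    files_server: "http://localhost:8081"
    credentials {"access_key": "<your-access-key>", "secret_key": "<your-secret-key>"}
}
```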
@@ -2114,6 +2136,40 @@ class Task(_Task):
Session.default_web = web_host or ''
Session.default_files = files_host or ''
@classmethod
def _create(cls, project_name=None, task_name=None, task_type=TaskTypes.training):
# type: (Optional[str], Optional[str], Task.TaskTypes) -> Task
"""
Create a new unpopulated Task (experiment).
:param str project_name: The name of the project in which the experiment will be created.
If ``project_name`` is ``None``, and the main execution Task is initialized (see :meth:`Task.init`),
then the main execution Task's project is used. Otherwise, if the project does
not exist, it is created. (Optional)
:param str task_name: The name of Task (experiment).
:param TaskTypes task_type: The task type.
:return: The newly created Task (experiment).
"""
if not project_name:
if not cls.__main_task:
raise ValueError("Please provide project_name, no global task context found "
"(Task.current_task hasn't been called)")
project_name = cls.__main_task.get_project_name()
task = cls(
private=cls.__create_protection,
project_name=project_name,
task_name=task_name,
task_type=task_type,
log_to_backend=False,
force_create=True,
)
return task
def _set_model_config(self, config_text=None, config_dict=None):
# type: (Optional[str], Optional[Mapping]) -> None
"""
@@ -2285,15 +2341,15 @@ class Task(_Task):
# force update of base logger to this current task (this is the main logger task)
logger = task._get_logger(auto_connect_streams=auto_connect_streams)
if closed_old_task:
logger.report_text('TRAINS Task: Closing old development task id={}'.format(default_task.get('id')))
logger.report_text('ClearML Task: Closing old development task id={}'.format(default_task.get('id')))
# print warning, reusing/creating a task
if default_task_id and not continue_last_task:
logger.report_text('TRAINS Task: overwriting (reusing) task id=%s' % task.id)
logger.report_text('ClearML Task: overwriting (reusing) task id=%s' % task.id)
elif default_task_id and continue_last_task:
logger.report_text('TRAINS Task: continuing previous task id=%s '
logger.report_text('ClearML Task: continuing previous task id=%s '
'Notice this run will not be reproducible!' % task.id)
else:
logger.report_text('TRAINS Task: created new task id=%s' % task.id)
logger.report_text('ClearML Task: created new task id=%s' % task.id)
# update current repository and put warning into logs
if detect_repo:
@@ -2567,8 +2623,7 @@ class Task(_Task):
self._kill_all_child_processes(send_kill=False)
time.sleep(2.0)
self._kill_all_child_processes(send_kill=True)
# noinspection PyProtectedMember
os._exit(1)
os._exit(1) # noqa
@staticmethod
def _kill_all_child_processes(send_kill=False):
@@ -2800,7 +2855,7 @@ class Task(_Task):
if filename.is_file():
relative_file_name = filename.relative_to(offline_folder).as_posix()
zf.write(filename.as_posix(), arcname=relative_file_name)
print('TRAINS Task: Offline session stored in {}'.format(zip_file))
print('ClearML Task: Offline session stored in {}'.format(zip_file))
except Exception:
pass
@@ -3179,8 +3234,8 @@ class Task(_Task):
task_data.get('type') not in (cls.TaskTypes.training, cls.TaskTypes.testing) and \
not Session.check_min_api_version(2.8):
print('WARNING: Changing task type to "{}" : '
'trains-server does not support task type "{}", '
'please upgrade trains-server.'.format(cls.TaskTypes.training, task_data['type'].value))
'clearml-server does not support task type "{}", '
'please upgrade clearml-server.'.format(cls.TaskTypes.training, task_data['type'].value))
task_data['type'] = cls.TaskTypes.training
compares = (

View File

@@ -46,7 +46,7 @@ class PatchArgumentParser:
from ..config import running_remotely, get_remote_task_id
if running_remotely():
# this will cause the current_task() to set PatchArgumentParser._current_task
from trains import Task
from clearml import Task
# noinspection PyBroadException
try:
current_task = Task.get_task(task_id=get_remote_task_id())

View File

@@ -27,10 +27,10 @@ class CheckPackageUpdates(object):
cls._package_version_checked = True
client, version = Session._client[0]
version = Version(version)
is_demo = 'https://demoapi.trains.allegro.ai/'.startswith(Session.get_api_server_host())
is_demo = 'https://demoapi.demo.clear.ml/'.startswith(Session.get_api_server_host())
update_server_releases = requests.get(
'https://updates.trains.allegro.ai/updates',
'https://updates.clear.ml/updates',
json={"demo": is_demo,
"versions": {c: str(v) for c, v in Session._client},
"CI": str(os.environ.get('CI', ''))},
@@ -62,13 +62,13 @@ class CheckPackageUpdates(object):
@staticmethod
def get_version_from_updates_server(cur_version):
"""
Get the latest version for trains from updates server
:param cur_version: The current running version of trains
Get the latest version for clearml from updates server
:param cur_version: The current running version of clearml
:type cur_version: Version
"""
try:
_ = requests.get('https://updates.trains.allegro.ai/updates',
data=json.dumps({"versions": {"trains": str(cur_version)}}),
_ = requests.get('https://updates.clear.ml/updates',
data=json.dumps({"versions": {"clearml": str(cur_version)}}),
timeout=1.0)
return
except Exception:

View File

@@ -77,6 +77,19 @@ class ProxyDictPreWrite(dict):
return self._set_callback((prefix + '.' + key_value[0], key_value[1],))
def verify_basic_type(a_dict_list, basic_types=None):
basic_types = (float, int, bool, six.string_types, ) if not basic_types else \
tuple(b for b in basic_types if b not in (list, tuple, dict))
if isinstance(a_dict_list, basic_types):
return True
if isinstance(a_dict_list, (list, tuple)):
return all(verify_basic_type(v) for v in a_dict_list)
elif isinstance(a_dict_list, dict):
return all(verify_basic_type(k) for k in a_dict_list.keys()) and \
all(verify_basic_type(v) for v in a_dict_list.values())
def flatten_dictionary(a_dict, prefix=''):
flat_dict = {}
sep = '/'
@@ -88,7 +101,11 @@ def flatten_dictionary(a_dict, prefix=''):
elif isinstance(v, (list, tuple)) and all([isinstance(i, basic_types) for i in v]):
flat_dict[prefix + k] = v
elif isinstance(v, dict):
flat_dict.update(flatten_dictionary(v, prefix=prefix + k + sep))
nested_flat_dict = flatten_dictionary(v, prefix=prefix + k + sep)
if nested_flat_dict:
flat_dict.update(nested_flat_dict)
else:
flat_dict[k] = {}
else:
# this is a mixture of list and dict, or any other object,
# leave it as is, we have nothing to do with it.
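The empty-nested-dict handling added above can be exercised with a self-contained sketch of `flatten_dictionary` (an illustrative reimplementation with scalar leaves and the '/' separator, not the actual module):

```python
def flatten_dictionary(a_dict, prefix=''):
    """Flatten nested dicts into {'outer/inner': value}; keep empty nested dicts visible."""
    flat_dict = {}
    sep = '/'
    basic_types = (float, int, bool, str)
    for k, v in a_dict.items():
        k = str(k)
        if isinstance(v, basic_types):
            flat_dict[prefix + k] = v
        elif isinstance(v, (list, tuple)) and all(isinstance(i, basic_types) for i in v):
            flat_dict[prefix + k] = v
        elif isinstance(v, dict):
            nested = flatten_dictionary(v, prefix=prefix + k + sep)
            if nested:
                flat_dict.update(nested)
            else:
                flat_dict[k] = {}  # the fix above: an empty nested dict is kept, not dropped
        else:
            # mixture of list and dict, or any other object: left as-is
            flat_dict[prefix + k] = v
    return flat_dict

assert flatten_dictionary({'a': 1, 'b': {'c': 2, 'd': {}}}) == {'a': 1, 'b/c': 2, 'd': {}}
```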

View File

@@ -41,7 +41,7 @@ class ResourceMonitor(object):
self._last_process_pool = {}
self._last_process_id_list = []
if not self._gpustat:
self._task.get_logger().report_text('TRAINS Monitor: GPU monitoring is not available')
self._task.get_logger().report_text('ClearML Monitor: GPU monitoring is not available')
else: # if running_remotely():
try:
active_gpus = os.environ.get('NVIDIA_VISIBLE_DEVICES', '') or \
@@ -105,13 +105,13 @@ class ResourceMonitor(object):
if IsTensorboardInit.tensorboard_used():
fallback_to_sec_as_iterations = False
elif seconds_since_started >= self.wait_for_first_iteration:
self._task.get_logger().report_text('TRAINS Monitor: Could not detect iteration reporting, '
self._task.get_logger().report_text('ClearML Monitor: Could not detect iteration reporting, '
'falling back to iterations as seconds-from-start')
fallback_to_sec_as_iterations = True
elif fallback_to_sec_as_iterations is True and seconds_since_started <= self.max_check_first_iteration:
if self._check_logger_reported():
fallback_to_sec_as_iterations = False
self._task.get_logger().report_text('TRAINS Monitor: Reporting detected, '
self._task.get_logger().report_text('ClearML Monitor: Reporting detected, '
'reverting back to iteration based reporting')
clear_readouts = True
@@ -231,7 +231,7 @@ class ResourceMonitor(object):
# something happened and we can't use gpu stats,
self._gpustat_fail += 1
if self._gpustat_fail >= 3:
self._task.get_logger().report_text('TRAINS Monitor: GPU monitoring failed getting GPU reading, '
self._task.get_logger().report_text('ClearML Monitor: GPU monitoring failed getting GPU reading, '
'switching off GPU monitoring')
self._gpustat = None

View File

@@ -12,7 +12,7 @@ def make_deterministic(seed=1337, cudnn_deterministic=False):
Ensure deterministic behavior across PyTorch using the provided random seed.
This function makes sure that torch, numpy and random use the same random seed.
When using trains's task, call this function using the task's random seed like so:
When using clearml's task, call this function using the task's random seed like so:
make_deterministic(task.get_random_seed())
:param int seed: Seed number
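A framework-free sketch of the same idea, using only the stdlib `random` module (the real helper also seeds numpy and torch, and can force cudnn into deterministic mode):

```python
import random

def make_deterministic(seed=1337):
    # Seed every RNG you use from one place so reruns reproduce exactly.
    # This sketch covers only the stdlib; the ClearML helper seeds more.
    random.seed(seed)

make_deterministic(42)
first_run = [random.randint(0, 100) for _ in range(5)]
make_deterministic(42)
second_run = [random.randint(0, 100) for _ in range(5)]
assert first_run == second_run  # identical sequences after re-seeding
```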

View File

@@ -1 +1 @@
__version__ = '0.16.4'
__version__ = '0.17.0rc0'

136
docs/clearml-task.md Normal file
View File

@@ -0,0 +1,136 @@
# `clearml-task` - Execute ANY python code on a remote machine
If you are already familiar with `clearml`, then you can think of `clearml-task` as a way to create a Task/experiment
from any script without the need to add even a single line of code to the original codebase.
`clearml-task` allows a user to **take any python code/repository and launch it on a remote machine**.
The remote execution is fully monitored; all outputs, including console, TensorBoard, and matplotlib,
are logged in real time into the ClearML UI
## What does it do?
`clearml-task` creates a new experiment on your `clearml-server`; it populates the experiment's environment with:
* repository/commit/branch, as specified by the command-line invocation.
* optional: the base docker image to be used as the underlying environment
* optional: alternative python requirements, in case `requirements.txt` is not found inside the repository.
Once the new experiment is created and populated, it will enqueue the experiment to the selected execution queue.
When the experiment is executed on the remote machine (by an available `clearml-agent`), all console outputs
will be logged in real time, alongside your TensorBoard and matplotlib outputs.
### Use-cases for `clearml-task` remote execution
- You have off-the-shelf code, and you want to launch it on a remote machine with a specific resource (e.g., a GPU)
- You want to run [hyper-parameter optimization]() on a codebase that is not yet connected to `clearml`
- You want to create a [pipeline]() from an assortment of scripts, and you need to create Tasks for those scripts
- Sometimes, you just want to run some code on a remote machine, either using an on-prem cluster or on the cloud...
### Prerequisites
- A single python script, or an up-to-date repository containing the codebase.
- `clearml-agent` running on at least one machine (to execute the experiment)
## Tutorial
### Launching a job from a repository
We will be launching this [script](https://github.com/allegroai/trains/blob/master/examples/frameworks/scikit-learn/sklearn_matplotlib_example.py) on a remote machine. The following are the command-line options we will be using:
- First, we have to give the experiment a name and select a project (`--project examples --name remote_test`)
- Then, we select the repository with our code. If we do not specify branch / commit, it will take the latest commit
from the master branch (`--repo https://github.com/allegroai/clearml.git`)
- Lastly, we need to specify which script in the repository needs to be run (`--script examples/frameworks/scikit-learn/sklearn_matplotlib_example.py`)
Notice that by default, the execution working directory will be the root of the repository. If we need to change it, add `--cwd <folder>`
If we additionally need to pass an argument to our scripts, use the `--args` switch.
The names of the arguments should match the argparse argument names, without the '--' prefix
(e.g. instead of `--key=value`, use `--args key=value`).
``` bash
clearml-task --project examples --name remote_test --repo https://github.com/allegroai/clearml.git
--script examples/frameworks/scikit-learn/sklearn_matplotlib_example.py
--queue single_gpu
```
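The `--args key=value` mapping described above can be sketched as follows (a hypothetical helper, not the actual `clearml-task` implementation):

```python
def args_to_argv(args):
    """Turn clearml-task style ['key=value', ...] pairs into argparse-style argv."""
    argv = []
    for pair in args:
        key, _, value = pair.partition('=')
        argv.extend(['--' + key, value])  # re-add the '--' prefix stripped on the CLI
    return argv

assert args_to_argv(['lr=0.003', 'batch_size=64']) == ['--lr', '0.003', '--batch_size', '64']
```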
### Launching a job from a local script
We will be launching a single local script file (no git repo needed) on a remote machine.
- First, we have to give the experiment a name and select a project (`--project examples --name remote_test`)
- Then, we select the script file on our machine, `--script /path/to/my/script.py`
- If we need specific packages, we can specify them manually with `--packages "tqdm>=4" "torch>1.0"`
or we can pass a requirements file `--requirements /path/to/my/requirements.txt`
- Same as in the repo case, if we need to pass arguments to `argparse` we can add `--args key=value`
- If we have a docker container with an entire environment we want our script to run inside,
add e.g., `--docker nvcr.io/nvidia/pytorch:20.11-py3`
Note: In this example, the exact version of PyTorch to install will be resolved by the `clearml-agent` depending on the CUDA environment available at runtime.
``` bash
clearml-task --project examples --name remote_test --script /path/to/my/script.py
--packages "tqdm>=4" "torch>1.0" --args verbose=true
--queue dual_gpu
```
### CLI options
``` bash
clearml-task --help
```
``` console
ClearML launch - launch any codebase on remote machines running clearml-agent
optional arguments:
-h, --help show this help message and exit
--version Display the Allegro.ai utility version
--project PROJECT Required: set the project name for the task. If
--base-task-id is used, this arguments is optional.
--name NAME Required: select a name for the remote task
--repo REPO remote URL for the repository to use. Example: --repo
https://github.com/allegroai/clearml.git
--branch BRANCH Select specific repository branch/tag (implies the
latest commit from the branch)
--commit COMMIT Select specific commit id to use (default: latest
commit, or when used with local repository matching
the local commit id)
--folder FOLDER Remotely execute the code in the local folder. Notice!
It assumes a git repository already exists. Current
state of the repo (commit id and uncommitted changes)
is logged and will be replicated on the remote machine
--script SCRIPT Specify the entry point script for the remote
execution. When used in tandem with --repo the script
should be a relative path inside the repository, for
example: --script source/train.py .When used with
--folder it supports a direct path to a file inside
the local repository itself, for example: --script
~/project/source/train.py
--cwd CWD Working directory to launch the script from. Default:
repository root folder. Relative to repo root or local
folder
--args [ARGS [ARGS ...]]
Arguments to pass to the remote execution, list of
<argument>=<value> strings.Currently only argparse
arguments are supported. Example: --args lr=0.003
batch_size=64
--queue QUEUE Select the queue to launch the task. If not provided a
Task will be created but it will not be launched.
--requirements REQUIREMENTS
Specify requirements.txt file to install when setting
the session. If not provided, the requirements.txt
from the repository will be used.
--packages [PACKAGES [PACKAGES ...]]
Manually specify a list of required packages. Example:
--packages "tqdm>=2.1" "scikit-learn"
--docker DOCKER Select the docker image to use in the remote session
--skip-task-init If set, Task.init() call is not added to the entry
point, and is assumed to be called in within the
script. Default: add Task.init() call entry point
script
--base-task-id BASE_TASK_ID
Use a pre-existing task in the system, instead of a local repo/script.
Essentially clones an existing task and overrides arguments/requirements.
```

196
docs/clearml.conf Normal file
View File

@@ -0,0 +1,196 @@
# ClearML SDK configuration file
api {
# web_server on port 8080
web_server: "http://localhost:8080"
# Notice: 'api_server' is the api server (default port 8008), not the web server.
api_server: "http://localhost:8008"
# file server on port 8081
files_server: "http://localhost:8081"
# Credentials are generated using the webapp, http://localhost:8080/profile
credentials {"access_key": "EGRTCO8JMSIGI6S39GTP43NFWXDQOW", "secret_key": "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"}
# verify host ssl certificate, set to False only if you have a very good reason
verify_certificate: True
}
sdk {
# ClearML - default SDK configuration
storage {
cache {
# Defaults to system temp folder / cache
default_base_dir: "~/.clearml/cache"
}
}
metrics {
# History size for debug files per metric/variant. For each metric/variant combination with an attached file
# (e.g. debug image event), file names for the uploaded files will be recycled in such a way that no more than
# X files are stored in the upload destination for each metric/variant combination.
file_history_size: 100
# Max history size for matplotlib imshow files per plot title.
# File names for the uploaded images will be recycled in such a way that no more than
# X images are stored in the upload destination for each matplotlib plot title.
matplotlib_untitled_history_size: 100
# Limit the number of digits after the dot in plot reporting (reducing plot report size)
# plot_max_num_digits: 5
# Settings for generated debug images
images {
format: JPEG
quality: 87
subsampling: 0
}
# Support plot-per-graph fully matching Tensorboard behavior (i.e. if this is set to true, each series should have its own graph)
tensorboard_single_series_per_graph: false
}
network {
metrics {
# Number of threads allocated to uploading files (typically debug images) when transmitting metrics for
# a specific iteration
file_upload_threads: 4
# Warn about upload starvation if no uploads were made in specified period while file-bearing events keep
# being sent for upload
file_upload_starvation_warning_sec: 120
}
iteration {
# Max number of retries when getting frames if the server returned an error (http code 500)
max_retries_on_server_error: 5
# Backoff factor for consecutive retry attempts.
# SDK will wait for {backoff factor} * (2 ^ ({number of total retries} - 1)) between retries.
retry_backoff_factor_sec: 10
}
}
aws {
s3 {
# S3 credentials, used for read/write access by various SDK elements
# default, used for any bucket not specified below
key: ""
secret: ""
region: ""
credentials: [
# specifies key/secret credentials to use when handling s3 urls (read or write)
# {
# bucket: "my-bucket-name"
# key: "my-access-key"
# secret: "my-secret-key"
# },
# {
# # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
# host: "my-minio-host:9000"
# key: "12345678"
# secret: "12345678"
# multipart: false
# secure: false
# }
]
}
boto3 {
pool_connections: 512
max_multipart_concurrency: 16
}
}
google.storage {
# # Default project and credentials file
# # Will be used when no bucket configuration is found
# project: "clearml"
# credentials_json: "/path/to/credentials.json"
# # Specific credentials per bucket and sub directory
# credentials = [
# {
# bucket: "my-bucket"
# subdir: "path/in/bucket" # Not required
# project: "clearml"
# credentials_json: "/path/to/credentials.json"
# },
# ]
}
azure.storage {
# containers: [
# {
# account_name: "clearml"
# account_key: "secret"
# # container_name:
# }
# ]
}
log {
# debugging feature: set this to true to make null log propagate messages to root logger (so they appear in stdout)
null_log_propagate: false
task_log_buffer_capacity: 66
# disable urllib info and lower levels
disable_urllib3_info: true
}
development {
# Development-mode options
# dev task reuse window
task_reuse_time_window_in_hours: 72.0
# Run VCS repository detection asynchronously
vcs_repo_detect_async: true
# Store uncommitted git/hg source code diff in experiment manifest when training in development mode
# This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
store_uncommitted_code_diff: true
store_code_diff_from_remote: false
# Support stopping an experiment in case it was externally stopped, status was changed or task was reset
support_stopping: true
# Default Task output_uri. if output_uri is not provided to Task.init, default_output_uri will be used instead.
default_output_uri: ""
# Default auto generated requirements optimize for smaller requirements
# If True, analyze the entire repository regardless of the entry point.
# If False, first analyze the entry point script; if it does not reference other local files,
# do not analyze the entire repository.
force_analyze_entire_repo: false
# If set to true, *clearml* update message will not be printed to the console
# this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
suppress_update_message: false
# If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
detect_with_pip_freeze: false
detect_with_conda_freeze: false
# Log specific environment variables. OS environments are enlisted in the "Environment" section
# of the Hyper-Parameters.
# multiple selected variables are supported including the suffix '*'.
# For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
# This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
# Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
log_os_environments: []
# Development mode worker
worker {
# Status report period in seconds
report_period_sec: 2
# ping to the server - check connectivity
ping_period_sec: 30
# Log all stdout & stderr
log_stdout: true
# compatibility feature, report memory usage for the entire machine
# default (false), report only on the running process and its sub-processes
report_global_mem_used: false
}
}
}
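For reference, the retry back-off formula documented in the `network.iteration` section above, `{backoff factor} * (2 ^ ({number of total retries} - 1))`, yields the following schedule (a small illustrative computation):

```python
def backoff_delays(factor_sec=10, max_retries=5):
    """Wait times (seconds) between consecutive retries, per the documented formula."""
    return [factor_sec * (2 ** (n - 1)) for n in range(1, max_retries + 1)]

# With the defaults above (retry_backoff_factor_sec: 10, max_retries: 5):
assert backoff_delays() == [10, 20, 40, 80, 160]
```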

View File

@@ -2,39 +2,40 @@
Firstly, we thank you for taking the time to contribute!
The following is a set of guidelines for contributing to TRAINS.
Contribution comes in many forms:
* Reporting [issues](https://github.com/allegroai/clearml/issues) you've come upon
* Participating in issue discussions in the [issue tracker](https://github.com/allegroai/clearml/issues) and the [ClearML community slack space](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY)
* Suggesting new features or enhancements
* Implementing new features or fixing outstanding issues
The following is a set of guidelines for contributing to ClearML.
These are primarily guidelines, not rules.
Use your best judgment and feel free to propose changes to this document in a pull request.
## Reporting Bugs
## Reporting Issues
This section guides you through submitting a bug report for ClearML.
By following these guidelines, you
help maintainers and the community understand your report, reproduce the behavior, and find related reports.
By following these guidelines, you help maintainers and the community understand your report, reproduce the behavior, and find related reports.
Before creating bug reports, please check whether the bug you want to report already appears [here](link to issues).
You may discover that you do not need to create a bug report.
When you are creating a bug report, please include as much detail as possible.
Before reporting an issue, please check whether it already appears [here](https://github.com/allegroai/clearml/issues).
If it does, join the on-going discussion instead.
**Note**: If you find a **Closed** issue that may be the same issue which you are currently experiencing,
then open a **New** issue and include a link to the original (Closed) issue in the body of your new one.
Explain the problem and include additional details to help maintainers reproduce the problem:
When reporting an issue, please include as much detail as possible: explain the problem and include additional details to help maintainers reproduce the problem:
* **Use a clear and descriptive title** for the issue to identify the problem.
* **Describe the exact steps necessary to reproduce the problem** in as much detail as possible. Please do not just summarize what you did. Make sure to explain how you did it.
* **Provide the specific environment setup.** Include the `pip freeze` output, specific environment variables, Python version, and other relevant information.
* **Provide specific examples to demonstrate the steps.** Include links to files or GitHub projects, or copy/paste snippets which you use in those examples.
* **If you are reporting any ClearML crash,** include a crash report with a stack trace from the operating system. Make sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/), or just put it in a [gist](https://gist.github.com/) (and provide link to that gist).
* **Describe the behavior you observed after following the steps** and the exact problem with that behavior.
* **Explain which behavior you expected to see and why.**
* **For Web-App issues, please include screenshots and animated GIFs** which recreate the described steps and clearly demonstrate the problem. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
## Suggesting New Features and Enhancements
This section guides you through submitting an enhancement suggestion for ClearML, including
completely new features and minor improvements to existing functionality.
By following these guidelines, you help maintainers and the community understand your suggestion and find related suggestions.
Enhancement suggestions are tracked as GitHub issues. After you determine which repository your enhancement suggestion is related to, create an issue on that repository and provide the following:
@@ -43,12 +44,18 @@ Enhancement suggestions are tracked as GitHub issues. After you determine which
* **A step-by-step description of the suggested enhancement** in as much detail as possible.
* **Specific examples to demonstrate the steps.** Include copy/pasteable snippets which you use in those examples as [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
* **Describe the current behavior and explain which behavior you expected to see instead and why.**
* **Include screenshots or animated GIFs** which help you demonstrate the steps or point out the part of ClearML which the suggestion is related to. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
## Pull Requests
Before you submit a new PR:
* Verify the work you plan to merge addresses an existing [issue](https://github.com/allegroai/clearml/issues) (If not, open a new one)
* Check related discussions in the [ClearML slack community](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY) (Or start your own discussion on the `#clearml-dev` channel)
* Make sure your code conforms to the ClearML coding standards by running:
`flake8 --max-line-length=120 --statistics --show-source --extend-ignore=E501 ./clearml*`
In your PR include:
* A reference to the issue it addresses
* A brief description of the approach you've taken for implementing

docs/datasets.md Normal file
@@ -0,0 +1,139 @@
# ClearML introducing Dataset management!
## Decoupling Data from Code - The Dataset Paradigm
### The ultimate goal of `clearml-data` is to transform datasets into configuration parameters
Just like any other argument, the dataset argument should retrieve a full local copy of the
dataset to be used by the experiment.
This means datasets can be efficiently retrieved by any machine in a reproducible way.
Together, this creates a full version control solution for all your data
that is both machine and environment agnostic.
### Design Goals : Simple / Agnostic / File-based / Efficient
## Key Concepts:
1) **Dataset** is a **collection of files** : e.g. folder with all subdirectories and files included in the dataset
2) **Differential storage** : Efficient storage / network
3) **Flexible**: support addition / removal / merge of files and datasets
4) **Descriptive, transparent & searchable**: support projects, names, descriptions, tags and searchable fields
5) **Simple interface** (CLI and programmatic)
6) **Accessible**: get a copy of the dataset files from anywhere on any machine
### Workflow:
#### Simple dataset creation with CLI:
- Create a dataset
``` bash
clearml-data create --project <my_project> --name <my_dataset_name>
```
- Add local files to the dataset
``` bash
clearml-data add --id <dataset_id_from_previous_command> --files ~/datasets/best_dataset/
```
- Upload files (Optional: specify storage `--storage` `s3://bucket` or `gs://` or `azure://` or `/mnt/shared/`)
``` bash
clearml-data upload --id <dataset_id>
```
- Close dataset
``` bash
clearml-data close --id <dataset_id>
```
#### Integrating datasets into your code:
``` python
from argparse import ArgumentParser
from clearml import Dataset, Task
# adding command line interface, so it is easy to use
parser = ArgumentParser()
parser.add_argument('--dataset', default='aayyzz', type=str, help='Dataset ID to train on')
args = parser.parse_args()
# creating a task, so that later we could override the argparse from UI
task = Task.init(project_name='examples', task_name='dataset demo')
# getting a local copy of the dataset
dataset_folder = Dataset.get(dataset_id=args.dataset).get_local_copy()
# go over the files in `dataset_folder` and train your model
```
#### Modifying a dataset with CLI:
- Create a new dataset (specify the parent dataset id)
``` bash
clearml-data create --name <improved_dataset> --parents <existing_dataset_id>
```
- Get a mutable copy of the current dataset
``` bash
clearml-data get --id <created_dataset_id> --copy ~/datasets/working_dataset
```
- Change / add / remove files from the dataset folder
``` bash
vim ~/datasets/working_dataset/everything.csv
```
- Sync local changes
``` bash
clearml-data sync --id <created_dataset_id> --folder ~/datasets/working_dataset
```
- Upload files (Optional: specify storage `--storage` `s3://bucket` or `gs://` or `azure://` or `/mnt/shared/`)
``` bash
clearml-data upload --id <created_dataset_id>
```
- Close dataset
``` bash
clearml-data close --id <created_dataset_id>
```
#### Command Line Interface Summary:
- **`search`** Search a dataset based on project / name / description / tag etc.
- **`list`** List the file directory content of a dataset (no need to download a copy of the dataset)
- **`verify`** Verify a local copy of a dataset (verify the dataset files SHA2 hash)
- **`create`** Create a new dataset (support extending/inheriting multiple parents)
- **`delete`** Delete a dataset
- **`add`** Add local files to a dataset
- **`sync`** Sync dataset with a local folder (source-of-truth being the local folder)
- **`remove`** Remove files from dataset (no need to download a copy of the dataset)
- **`get`** Get a local copy of the dataset (either readonly --link, or writable --copy)
- **`upload`** Upload the dataset (use --storage to specify storage target such as S3/GS/Azure/Folder, default: file server)
#### Under the hood (how it all works):
Each dataset instance stores the collection of files added/modified from the previous version (parent).
When a copy of the dataset is requested, all parent datasets in the graph are downloaded, and a new folder
is assembled containing all the changes introduced along the dataset DAG.
Implementation details:
Dataset differential snapshot is stored in a single zip file for efficiency in storage and network
bandwidth. Local cache is built into the process making sure datasets are downloaded only once.
Dataset contains SHA2 hash of all the files in the dataset.
To speed up dataset fetching, only the file sizes are verified automatically;
the SHA2 hashes are verified only on the user's request.
The design supports multiple parents per dataset, essentially merging all parents based on order.
To improve deep dataset DAG storage and speed, dataset squashing was introduced. A user can squash
a dataset, merging down all changes introduced in the DAG, creating a new flat version without parent datasets.
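The parent-replay and squash behavior described above can be illustrated with a small, self-contained sketch (the data structures and function names here are illustrative, not the actual implementation):

```python
def resolve_dataset(versions, dataset_id):
    """Merge the differential change-sets of a dataset and all its parents.

    `versions` maps dataset id -> {'parents': [...], 'files': {path: content}}.
    Parents are replayed in order, so later parents (and finally the dataset's
    own change-set) override earlier ones; None content marks a removed file.
    """
    merged = {}
    for parent_id in versions[dataset_id]['parents']:
        merged.update(resolve_dataset(versions, parent_id))
    for path, content in versions[dataset_id]['files'].items():
        if content is None:
            merged.pop(path, None)   # file removed in this version
        else:
            merged[path] = content
    return merged


def squash(versions, dataset_id):
    """Create a flat version with no parents but identical resolved content."""
    return {'parents': [], 'files': resolve_dataset(versions, dataset_id)}
```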
### Datasets UI:
A dataset is represented as a special `Task` in the system. <br>
It is of type `data-processing` with a special tag `dataset`.
- Full log (calls / CLI) of the dataset creation process can be found in the "Execution" section.
- A listing of the dataset differential snapshot, a summary of the files added / modified / removed, and details of the files
in the differential snapshot (location / size / hash) are available in the Artifacts section.
- The full dataset listing (all files included) is available in the Configuration section under `Dataset Content`.
This allows you to quickly compare two dataset contents and visually see the difference.
- The dataset genealogy DAG and change-set summary table are visualized under Results / Plots
<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/clearml/blob/master/docs/dataset_screenshots.gif?raw=true" width="80%"></a>

@@ -1,6 +1,6 @@
# ClearML Explicit Logging
Using the **ClearML** [Logger](https://github.com/allegroai/clearml/blob/master/clearml/logger.py) module and other **ClearML** features, you can explicitly log any of the following:
* Report graphs and images
* [Scalar metrics](#scalar-metrics)
@@ -19,10 +19,10 @@ Using the **TRAINS** [Logger](https://github.com/allegroai/trains/blob/master/tr
* Message logging
* [Reporting text without formatting](#reporting-text-without-formatting)
Additionally, the **ClearML** Logger module provides methods that allow you to do the following:
* Get the [current logger]()
* Override the ClearML configuration file with a [default upload destination]() for images and files
## Graphs and Images
@@ -30,7 +30,7 @@ Additionally, the **TRAINS** Logger module provides methods that allow you to do
Use to report scalar metrics by iteration as a line plot.
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py)) with the following method.
**Method**:
@@ -101,7 +101,7 @@ def report_scalar(self, title, series, value, iteration)
Use to report any data by iteration as a histogram.
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
**Method**:
@@ -199,7 +199,7 @@ def report_histogram(self, title, series, values, iteration, labels=None, xlabel
Use to report any data by iteration as a single or multiple line plot.
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
**Method**:
@@ -325,7 +325,7 @@ def report_line_plot(self, title, series, iteration, xaxis, yaxis, mode='lines',
Use to report any vector data as a 2D scatter diagram.
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
**Method**:
@@ -461,7 +461,7 @@ def report_scatter2d(self, title, series, scatter, iteration, xaxis=None, yaxis=
Use to report any array data as a 3D scatter diagram.
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
**Method**:
@@ -597,7 +597,7 @@ def report_scatter3d(self, title, series, scatter, iteration, labels=None, mode=
Use to report a heat-map matrix as a confusion matrix. You can also plot a heat-map as a [surface diagram](#surface-diagrams).
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
**Method**:
@@ -689,7 +689,7 @@ def report_confusion_matrix(self, title, series, matrix, iteration, xlabels=None
Use to plot a heat-map matrix as a surface diagram. You can also plot a heat-map as a [confusion matrix](#confusion-matrices).
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
**Method**:
@@ -818,10 +818,10 @@ def report_surface(self, title, series, matrix, iteration, xlabels=None, ylabels
### Images
Use to report an image and upload its contents to the bucket specified in the **ClearML** configuration file,
or a [a default upload destination](#set-default-upload-destination), if you set a default.
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/manual_reporting.py)) with the following method.
**Method**:
@@ -929,7 +929,7 @@ def report_image(self, title, series, iteration, local_path=None, matrix=None, m
### Logging Experiment Parameter Dictionaries
In order for **ClearML** to log a dictionary of parameters, use the `Task.connect` method.
For example, to log the hyper-parameters <code>learning_rate</code>, <code>batch_size</code>, <code>display_step</code>, <code>model_path</code>, <code>n_hidden_1</code>, and <code>n_hidden_2</code>:
@@ -938,27 +938,27 @@ For example, to log the hyper-parameters <code>learning_rate</code>, <code>batch
parameters_dict = { 'learning_rate': 0.001, 'batch_size': 100, 'display_step': 1,
'model_path': "/tmp/model.ckpt", 'n_hidden_1': 256, 'n_hidden_2': 256 }
# Connect the dictionary to your ClearML Task
parameters_dict = Task.current_task().connect(parameters_dict)
```
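Conceptually, `Task.connect` registers the default values on a fresh run and, when the experiment is re-executed remotely, replaces them with any values overridden in the UI. A plain-Python sketch of that override semantic (illustrative only; `connect_parameters` and its ignore-unknown-keys behavior are assumptions, not the actual implementation):

```python
def connect_parameters(defaults, ui_overrides=None):
    """Merge default hyper-parameters with UI overrides.

    Defaults are used as-is on a fresh run; on a remote re-run, any key
    overridden in the UI takes precedence. Unknown override keys are
    ignored in this sketch, so stale UI entries cannot inject parameters.
    """
    params = dict(defaults)
    for key, value in (ui_overrides or {}).items():
        if key in params:
            params[key] = value
    return params
```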
### Specifying Environment Variables to Track
By setting the `CLEARML_LOG_ENVIRONMENT` environment variable, make **ClearML** log either:
* All environment variables
export CLEARML_LOG_ENVIRONMENT="*"
* Specific environment variables
For example, log `PWD` and `PYTHONPATH`
export CLEARML_LOG_ENVIRONMENT="PWD,PYTHONPATH"
* No environment variables
export CLEARML_LOG_ENVIRONMENT=
## Logging Messages
@@ -972,7 +972,7 @@ Use the methods in this section to log various types of messages. The method nam
def debug(self, msg, *args, **kwargs)
```
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
**Arguments**:
@@ -1010,7 +1010,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
def info(self, msg, *args, **kwargs)
```
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
**Arguments**:
@@ -1048,7 +1048,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
def warn(self, msg, *args, **kwargs)
```
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
**Arguments**:<a name="log_arguments"></a>
@@ -1087,7 +1087,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
def error(self, msg, *args, **kwargs)
```
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
**Arguments**:
@@ -1125,7 +1125,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
def critical(self, msg, *args, **kwargs)
```
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
**Arguments**:
@@ -1163,7 +1163,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
def fatal(self, msg, *args, **kwargs)
```
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
**Arguments**:
@@ -1201,7 +1201,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
def console(self, msg, level=logging.INFO, omit_console=False, *args, **kwargs)
```
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
**Arguments**:
@@ -1279,7 +1279,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
def report_text(self, msg, level=logging.INFO, print_console=False, *args, **_)
```
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
**Arguments**:
@@ -1371,7 +1371,7 @@ None.
Use to specify the default destination storage location used for uploading images.
Images are uploaded and a link to the image is reported.
Credentials for the storage location are in the global configuration file (for example, on Linux, <code>~/clearml.conf</code>).
**Method**:

@@ -1,4 +1,4 @@
# ClearML SDK configuration file - Please use ~/clearml.conf
api {
# web_server on port 8080
web_server: "http://localhost:8080"
@@ -16,12 +16,12 @@ api {
verify_certificate: True
}
sdk {
# ClearML - default SDK configuration
storage {
cache {
# Defaults to system temp folder / cache
default_base_dir: "~/.clearml/cache"
}
}
@@ -103,7 +103,7 @@ sdk {
google.storage {
# # Default project and credentials file
# # Will be used when no bucket configuration is found
# project: "clearml"
# credentials_json: "/path/to/credentials.json"
# # Specific credentials per bucket and sub directory
@@ -111,7 +111,7 @@ sdk {
# {
# bucket: "my-bucket"
# subdir: "path/in/bucket" # Not required
# project: "clearml"
# credentials_json: "/path/to/credentials.json"
# },
# ]
@@ -119,7 +119,7 @@ sdk {
azure.storage {
# containers: [
# {
# account_name: "clearml"
# account_key: "secret"
# # container_name:
# }
@@ -161,8 +161,8 @@ sdk {
# do not analyze the entire repository.
force_analyze_entire_repo: false
# If set to true, *clearml* update message will not be printed to the console
# this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
suppress_update_message: false
# If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
@@ -173,7 +173,7 @@ sdk {
# of the Hyper-Parameters.
# multiple selected variables are supported including the suffix '*'.
# For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
# This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
# Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
log_os_environments: []

@@ -1,8 +1,8 @@
from random import sample
from clearml import Task
# Connecting ClearML
task = Task.init(project_name='examples', task_name='Random Hyper-Parameter Search Example', task_type=Task.TaskTypes.optimizer)
# Create a hyper-parameter dictionary for the task

@@ -1 +1 @@
clearml

@@ -1,4 +1,4 @@
from clearml import Task
from time import sleep
# Initialize the Task Pipe's first Task used to start the Task Pipe
@@ -35,7 +35,7 @@ cloned_task_parameters = cloned_task.get_parameters()
cloned_task_parameters[param['param_name']] = param['param_name_new_value']
cloned_task.set_parameters(cloned_task_parameters)
# Enqueue the Task for execution. The enqueued Task must already exist in the clearml platform
print('Enqueue next step in pipeline to queue: {}'.format(param['execution_queue_name']))
Task.enqueue(cloned_task.id, queue_name=param['execution_queue_name'])

@@ -1,7 +1,7 @@
# This Task is the base task that we will be executing as a second step (see task_piping.py)
# In order to make sure this experiment is registered in the platform, you must execute it once.
from clearml import Task
# Initialize the task pipe's first task used to start the task pipe
task = Task.init('examples', 'Toy Base Task')

@@ -1,3 +1,3 @@
torch>=1.1.0
torchvision>=0.3.0
clearml

@@ -1,4 +1,4 @@
# ClearML - example of multiple sub-processes interacting and reporting to a single master experiment
import multiprocessing
import os
@@ -7,7 +7,7 @@ import sys
import time
from argparse import ArgumentParser
from clearml import Task
# fake data for us to "process"
data = (
@@ -51,7 +51,7 @@ if __name__ == '__main__':
# We have to initialize the task in the master process,
# it will make sure that any sub-process calling Task.init will get the master task object
# notice that we exclude the `counter` argument, so we can launch multiple sub-processes with clearml-agent
# otherwise, the `counter` will always be set to the original value.
task = Task.init('examples', 'Popen example', auto_connect_arg_parser={'counter': False})

@@ -3,7 +3,7 @@ import numpy as np
import tensorflow as tf
from tensorflow import keras
from clearml import Task
task = Task.init(project_name="autokeras", task_name="autokeras imdb example with scalars")

@@ -1,5 +1,5 @@
# Please read this https://github.com/keras-team/autokeras#installation before making changes
autokeras
tensorflow>=2.3.0
clearml
git+https://github.com/keras-team/keras-tuner.git@1.0.2rc2

@@ -1,10 +1,10 @@
# ClearML - Fastai with Tensorboard example code, automatic logging the model and Tensorboard outputs
#
from fastai.callbacks.tensorboard import LearnerTensorboardWriter
from fastai.vision import * # Quick access to computer vision functionality
from clearml import Task
task = Task.init(project_name="example", task_name="fastai with tensorboard callback")

@@ -1,4 +1,4 @@
fastai
tensorboard
tensorboardX
clearml

@@ -15,12 +15,12 @@ from ignite.utils import setup_logger
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm
from clearml import Task, StorageManager
# ClearML Initializations
task = Task.init(project_name='Image Example', task_name='image classification CIFAR10')
params = {'number_of_epochs': 20, 'batch_size': 64, 'dropout': 0.25, 'base_lr': 0.001, 'momentum': 0.9, 'loss_report': 100}
params = task.connect(params) # enabling configuration override by clearml
print(params) # printing actual configuration (after override in remote mode)
manager = StorageManager()

@@ -1,4 +1,4 @@
# ClearML - Keras with Tensorboard example code, automatic logging model and Tensorboard outputs
#
# Train a simple deep NN on the MNIST dataset.
# Gets to 98.40% test accuracy after 20 epochs
@@ -19,7 +19,7 @@ from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop
from clearml import Task
class TensorBoardImage(TensorBoard):
@@ -89,7 +89,7 @@ model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(),
metrics=['accuracy'])
# Connecting ClearML
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example')
# To set your own configuration:

@@ -1,4 +1,4 @@
# ClearML - Keras with Tensorboard example code, automatic logging model and Tensorboard outputs
#
# Train a simple deep NN on the MNIST dataset.
# Gets to 98.40% test accuracy after 20 epochs
@@ -18,7 +18,7 @@ from keras.layers.core import Dense, Activation
from keras.optimizers import RMSprop
from keras.utils import np_utils
import tensorflow as tf
from clearml import Task
class TensorBoardImage(TensorBoard):
@@ -88,7 +88,7 @@ model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(),
metrics=['accuracy'])
# Connecting ClearML
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example')
task.connect_configuration({'test': 1337, 'nested': {'key': 'value', 'number': 1}})

@@ -1,2 +1,2 @@
clearml
Keras>=2.2.4

@@ -1,11 +1,11 @@
# ClearML - Example of manual model configuration and uploading
#
import os
from tempfile import gettempdir
from keras import Input, layers, Model
from clearml import Task
task = Task.init(project_name='examples', task_name='Model configuration and upload')

@@ -1,3 +1,3 @@
Keras
tensorflow>=2.0
clearml

@@ -3,9 +3,9 @@
import kerastuner as kt
import tensorflow as tf
import tensorflow_datasets as tfds
from clearml.external.kerastuner import TrainsTunerLogger
from clearml import Task
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:

@@ -1,4 +1,4 @@
keras-tuner
tensorflow>=2.0
tensorflow-datasets
clearml

@@ -1,4 +1,4 @@
lightgbm
scikit-learn
pandas
clearml

@@ -1,10 +1,10 @@
# ClearML - Example of LightGBM integration
#
import lightgbm as lgb
import pandas as pd
from sklearn.metrics import mean_squared_error
from trains import Task
from clearml import Task
task = Task.init(project_name="examples", task_name="LIGHTgbm")

View File

@@ -1,9 +1,9 @@
-# TRAINS - Example of Matplotlib and Seaborn integration and reporting
+# ClearML - Example of Matplotlib and Seaborn integration and reporting
 #
 import numpy as np
 import matplotlib.pyplot as plt
 import seaborn as sns
-from trains import Task
+from clearml import Task
 task = Task.init(project_name='examples', task_name='Matplotlib example')

View File

@@ -1,4 +1,4 @@
 matplotlib >= 3.1.1 ; python_version >= '3.6'
 matplotlib >= 2.2.4 ; python_version < '3.6'
 seaborn
-trains
+clearml

View File

@@ -1,10 +1,10 @@
-# TRAINS - Example of manual model configuration and uploading
+# ClearML - Example of manual model configuration and uploading
 #
 import os
 from tempfile import gettempdir
 import torch
-from trains import Task
+from clearml import Task
 task = Task.init(project_name='examples', task_name='Model configuration and upload')

View File

@@ -1,4 +1,4 @@
-# TRAINS - example of TRAINS torch distributed support
+# ClearML - example of ClearML torch distributed support
 # notice all nodes will be reporting to the master Task (experiment)
 import os
@@ -15,7 +15,7 @@ import torch.nn.functional as F
 from torch import optim
 from torchvision import datasets, transforms
-from trains import Task
+from clearml import Task
 local_dataset_path = './MNIST_data'
@@ -150,7 +150,7 @@ if __name__ == "__main__":
 # We have to initialize the task in the master process,
 # it will make sure that any sub-process calling Task.init will get the master task object
-# notice that we exclude the `rank` argument, so we can launch multiple sub-processes with trains-agent
+# notice that we exclude the `rank` argument, so we can launch multiple sub-processes with clearml-agent
 # otherwise, the `rank` will always be set to the original value.
 task = Task.init("examples", "test torch distributed", auto_connect_arg_parser={'rank': False})
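The reasoning in the comments above — why `rank` must be excluded from the auto-connected arguments — can be sketched without clearml installed. When an agent re-executes a task it restores the recorded argument values; if `rank` were restored too, every spawned worker would see the master's rank instead of its own. Everything here (`apply_recorded`, the `recorded` dict) is an illustrative toy, not clearml's actual implementation:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--rank', type=int, default=0)
parser.add_argument('--lr', type=float, default=0.01)

recorded = {'rank': 0, 'lr': 0.1}   # argument values captured on the master process
excluded = {'rank'}                 # per-process args we refuse to override

def apply_recorded(args, recorded, excluded):
    """Overwrite parsed args with recorded values, skipping excluded names."""
    for name, value in recorded.items():
        if name not in excluded:
            setattr(args, name, value)
    return args

# A worker launched with its own rank keeps it; shared args follow the record.
worker_args = parser.parse_args(['--rank', '3'])
worker_args = apply_recorded(worker_args, recorded, excluded)
print(worker_args.rank, worker_args.lr)  # -> 3 0.1
```

Without the exclusion, `apply_recorded` would reset every worker's `rank` to 0 — which is exactly the failure mode the `auto_connect_arg_parser={'rank': False}` line guards against.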

View File

@@ -1,4 +1,4 @@
-# TRAINS - Example of Pytorch and matplotlib integration and reporting
+# ClearML - Example of Pytorch and matplotlib integration and reporting
 #
 """
 Neural Transfer Using PyTorch
@@ -60,7 +60,7 @@ import torchvision.transforms as transforms
 import torchvision.models as models
 import copy
-from trains import Task
+from clearml import Task
 task = Task.init(project_name='examples', task_name='pytorch with matplotlib example', task_type=Task.TaskTypes.testing)

View File

@@ -1,4 +1,4 @@
-# TRAINS - Example of Pytorch mnist training integration
+# ClearML - Example of Pytorch mnist training integration
 #
 from __future__ import print_function
 import argparse
@@ -11,7 +11,7 @@ import torch.nn.functional as F
 import torch.optim as optim
 from torchvision import datasets, transforms
-from trains import Task, Logger
+from clearml import Task, Logger
 class Net(nn.Module):

View File

@@ -1,4 +1,4 @@
-# TRAINS - Example of pytorch with tensorboard>=v1.14
+# ClearML - Example of pytorch with tensorboard>=v1.14
 #
 from __future__ import print_function
@@ -14,7 +14,7 @@ from torchvision import datasets, transforms
 from torch.autograd import Variable
 from torch.utils.tensorboard import SummaryWriter
-from trains import Task
+from clearml import Task
 class Net(nn.Module):
@@ -99,7 +99,7 @@ def main():
 parser.add_argument('--log-interval', type=int, default=10, metavar='N',
 help='how many batches to wait before logging training status')
 args = parser.parse_args()
-task = Task.init(project_name='examples', task_name='pytorch with tensorboard') # noqa: F841
+Task.init(project_name='examples', task_name='pytorch with tensorboard')
 writer = SummaryWriter('runs')
 writer.add_text('TEXT', 'This is some text', 0)
 args.cuda = not args.no_cuda and torch.cuda.is_available()

View File

@@ -3,4 +3,4 @@ tensorboardX
 tensorboard>=1.14.0
 torch>=1.1.0
 torchvision>=0.3.0
-trains
+clearml

View File

@@ -5,7 +5,7 @@ import numpy as np
 from PIL import Image
 from torch.utils.tensorboard import SummaryWriter
-from trains import Task
+from clearml import Task
 task = Task.init(project_name='examples', task_name='pytorch tensorboard toy example')

View File

@@ -2,4 +2,4 @@ joblib>=0.13.2
 matplotlib >= 3.1.1 ; python_version >= '3.6'
 matplotlib >= 2.2.4 ; python_version < '3.6'
 scikit-learn
-trains
+clearml

View File

@@ -10,7 +10,7 @@ import numpy as np
 import matplotlib.pyplot as plt
-from trains import Task
+from clearml import Task
 task = Task.init(project_name="examples", task_name="scikit-learn joblib example")

View File

@@ -6,7 +6,7 @@ from sklearn.model_selection import learning_curve
 from sklearn.naive_bayes import GaussianNB
 from sklearn.svm import SVC
-from trains import Task
+from clearml import Task
 def plot_learning_curve(estimator, title, X, y, axes=None, ylim=None, cv=None, n_jobs=None,

View File

@@ -1,4 +1,4 @@
-# TRAINS - Example of pytorch with tensorboardX
+# ClearML - Example of pytorch with tensorboardX
 #
 from __future__ import print_function
@@ -14,7 +14,7 @@ from tensorboardX import SummaryWriter
 from torch.autograd import Variable
 from torchvision import datasets, transforms
-from trains import Task
+from clearml import Task
 class Net(nn.Module):

View File

@@ -1,4 +1,4 @@
 tensorboardX>=1.8
 torch>=1.1.0
 torchvision>=0.3.0
-trains
+clearml

Some files were not shown because too many files have changed in this diff.