diff --git a/README.md b/README.md
index 349e239d..96517a0a 100644
--- a/README.md
+++ b/README.md
@@ -1,143 +1,134 @@
-# Allegro Trains - new name is coming soon ;)
-## Auto-Magical Experiment Manager, Version Control and ML-Ops for AI
+
-## :confetti_ball: Now with Full ML/DL DevOps - See [TRAINS AGENT](https://github.com/allegroai/trains-agent) and [Services](https://github.com/allegroai/trains-server#trains-agent-services--)
-## :station: [Documentation is here!](https://allegro.ai/docs) `wubba lubba dub dub` and a [Slack Channel](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY) :train2:
-## Features: [AWS autoscaler wizard](https://allegro.ai/docs/examples/services/aws_autoscaler/aws_autoscaler/) :robot: [Hyper-Parameter Optimization](https://allegro.ai/docs/examples/optimization/hyper-parameter-optimization/examples_hyperparam_opt/) and :electric_plug: [Pipeline Controllers](https://allegro.ai/docs/examples/pipeline/pipeline_controller/)
+

-"Because it’s a jungle out there"
-[](https://img.shields.io/github/license/allegroai/trains.svg)
-[](https://img.shields.io/pypi/pyversions/trains.svg)
-[](https://img.shields.io/pypi/v/trains.svg)
-[](https://pypi.python.org/pypi/trains/)
+**ClearML - Auto-Magical Suite of tools to streamline your ML workflow
+Experiment Manager, ML-Ops and Data-Management**
+
+[](https://img.shields.io/github/license/allegroai/clearml.svg)
+[](https://img.shields.io/pypi/pyversions/clearml.svg)
+[](https://img.shields.io/pypi/v/clearml.svg)
+[](https://pypi.python.org/pypi/clearml/)
[](https://optuna.org)
-[](https://join.slack.com/t/allegroai-trains/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg)
+[](https://join.slack.com/t/allegroai-trains/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg)
-### :point_right: Help improve Trains by filling our 2-min [user survey](https://allegro.ai/lp/trains-user-survey/)
+
-Trains is our solution to a problem we share with countless other researchers and developers in the machine
+---
+### ClearML
+#### *Formerly known as Allegro Trains*
+ClearML is an ML/DL development and production suite. It contains three main modules:
+
+- [Experiment Manager](#clearml-experiment-management) - Automagical experiment tracking, environments and results
+- [ML-Ops](https://github.com/allegroai/trains-agent) - Automation, Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)
+- [Data-Management](https://github.com/allegroai/clearml/doc/clearml-data.md) - Fully differentiable data management & version control solution on top of object-storage
+ (S3/GS/Azure/NAS)
+
+
+Instrumenting these components is the **ClearML-server**; see [Self-Hosting]() & [Free tier Hosting]()
+
+
+---
+
+
+**[Signup](https://app.community.clear.ml) & [Start using](https://allegro.ai/clearml/docs/getting_started/getting_started/) in under 2 minutes**
+
+
+
+---
+
+
+## ClearML Experiment Manager
+
+**Adding only 2 lines to your code gets you the following**
+
+* Complete experiment setup log
+ * Full source control info including non-committed local changes
+ * Execution environment (including specific packages & versions)
+ * Hyper-parameters
+ * ArgParser for command line parameters with currently used values
+ * Explicit parameters dictionary
+ * Tensorflow Defines (absl-py)
+ * Hydra configuration and overrides
+ * Initial model weights file
+* Full experiment output automatic capture
+ * stdout and stderr
+ * Resource Monitoring (CPU/GPU utilization, temperature, IO, network, etc.)
+ * Model snapshots (With optional automatic upload to central storage: Shared folder, S3, GS, Azure, Http)
+ * Artifacts log & store (Shared folder, S3, GS, Azure, Http)
+ * Tensorboard/TensorboardX scalars, metrics, histograms, **images, audio and video samples**
+ * [Matplotlib & Seaborn](https://github.com/allegroai/trains/tree/master/examples/frameworks/matplotlib)
+ * [ClearML Explicit Logging](https://allegro.ai/clearml/docs/examples/reporting/) interface for complete flexibility.
+* Extensive platform support and integrations
+ * Supported ML/DL frameworks: [PyTorch](https://github.com/allegroai/trains/tree/master/examples/frameworks/pytorch)(incl' ignite/lightning), [Tensorflow](https://github.com/allegroai/trains/tree/master/examples/frameworks/tensorflow), [Keras](https://github.com/allegroai/trains/tree/master/examples/frameworks/keras), [AutoKeras](https://github.com/allegroai/trains/tree/master/examples/frameworks/autokeras), [XGBoost](https://github.com/allegroai/trains/tree/master/examples/frameworks/xgboost) and [Scikit-Learn](https://github.com/allegroai/trains/tree/master/examples/frameworks/scikit-learn)
+ * Seamless integration (including version control) with **Jupyter Notebook**
+ and [*PyCharm* remote debugging](https://github.com/allegroai/trains-pycharm-plugin)
+
+#### [Start using ClearML](https://allegro.ai/clearml/docs/getting_started/getting_started/)
+
+```bash
+pip install clearml
+```
+
+Add two lines to your code:
+```python
+from clearml import Task
+task = Task.init(project_name='examples', task_name='hello world')
+```
+
+You are done; everything your process outputs is now automagically logged into ClearML.
+
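+Beyond the automatic capture, you can also report metrics explicitly through the task's logger.
+A minimal sketch (the project and metric names below are arbitrary), using the same
+[ClearML Explicit Logging](https://allegro.ai/clearml/docs/examples/reporting/) interface listed above:
+
+```python
+from clearml import Task
+
+task = Task.init(project_name='examples', task_name='explicit logging')
+logger = task.get_logger()
+
+# report a scalar series point: title / series / value / iteration
+for iteration in range(10):
+    logger.report_scalar(title='loss', series='train', value=1.0 / (iteration + 1), iteration=iteration)
+```
+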
+Next step, automation! **Learn more about ClearML's two-click automation [here]()**
+
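+As a taste of that automation, here is a minimal pipeline-controller sketch. The project and task
+names are hypothetical placeholders for experiments that already exist in your workspace:
+
+```python
+from clearml import Task
+from clearml.automation import PipelineController
+
+# the pipeline controller itself is a Task, so it shows up in the UI like any other experiment
+task = Task.init(project_name='examples', task_name='pipeline demo',
+                 task_type=Task.TaskTypes.controller)
+
+pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=True)
+pipe.add_step(name='stage_data', base_task_project='examples',
+              base_task_name='dataset creation')
+pipe.add_step(name='stage_train', parents=['stage_data'],
+              base_task_project='examples', base_task_name='model training',
+              parameter_override={'General/dataset_url': '${stage_data.artifacts.dataset.url}'})
+
+pipe.start()  # clone the referenced tasks and launch them according to the DAG
+pipe.wait()   # block until the whole pipeline completes
+pipe.stop()
+```
+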
+## ClearML Architecture
+
+The ClearML run-time components:
+
+* The ClearML Python Package for integrating ClearML into your existing scripts by adding just two lines of code, and optionally extending your experiments and other workflows with ClearML's powerful and versatile set of classes and methods.
+* The ClearML Server, which stores experiment, model, and workflow data and supports the Web UI experiment manager and ML-Ops automation for reproducibility and tuning. It is available as a hosted service, or as open source for you to deploy your own ClearML Server (a short environment-variable sketch for self-hosted servers follows this list).
+* The ClearML Agent for ML-Ops orchestration, experiment and workflow reproducibility, and scalability.
+
+
+
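+When running against your own ClearML Server, the SDK can be pointed at it either through `clearml.conf`
+or through environment variables. A minimal sketch (host name and ports below are placeholders for your
+own deployment):
+
+```python
+import os
+
+# the CLEARML_* variables are read by the SDK (the older TRAINS_* names are still honored as fallbacks)
+os.environ['CLEARML_API_HOST'] = 'http://my-clearml-server:8008'    # API server
+os.environ['CLEARML_WEB_HOST'] = 'http://my-clearml-server:8080'    # Web UI
+os.environ['CLEARML_FILES_HOST'] = 'http://my-clearml-server:8081'  # files server
+
+from clearml import Task
+task = Task.init(project_name='examples', task_name='self hosted server')
+```
+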
+## Additional Modules
+
+- [clearml-session](https://github.com/allegroai/clearml-session) - **Launch remote JupyterLab / VSCode-server inside any docker, on Cloud/On-Prem machines**
+- [clearml-task](https://github.com/allegroai/clearml/doc/clearml-task.md) - Run any codebase on remote machines with full remote logging of Tensorboard, Matplotlib & Console outputs
+- [clearml-data](https://github.com/allegroai/clearml/doc/clearml-data.md) - **CLI for managing and versioning your datasets, including creating / uploading / downloading of data from S3/GS/Azure/NAS** (see the Python `Dataset` sketch after this list)
+- [AWS Auto-Scaler](examples/services/aws-autoscaler/aws_autoscaler.py) - Automatically spin up EC2 instances based on your workloads with a preconfigured budget! No need for K8s!
+- [Hyper-Parameter Optimization](examples/services/hyper-parameter-optimization/hyper_parameter_optimizer.py) - Optimize any code with a black-box approach and state-of-the-art Bayesian optimization algorithms
+- [Automation Pipeline](examples/pipeline/pipeline_controller.py) - Build pipelines based on existing experiments / jobs; supports building pipelines of pipelines!
+- [Slack Integration](examples/services/monitoring/slack_alerts.py) - Report experiment progress / failures directly to Slack (fully customizable!)
+
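+The data-management module is also exposed directly in Python via the `Dataset` class. A minimal sketch,
+assuming a local `./data` folder and hypothetical project / dataset names:
+
+```python
+from clearml import Dataset
+
+# create a new dataset version, attach local files and upload them to the configured storage
+dataset = Dataset.create(dataset_name='sample dataset', dataset_project='examples')
+dataset.add_files(path='./data')
+dataset.upload()
+dataset.finalize()
+
+# later, anywhere else: fetch a cached, read-only local copy of the dataset
+local_path = Dataset.get(dataset_project='examples', dataset_name='sample dataset').get_local_copy()
+```
+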
+## Why ClearML?
+
+ClearML is our solution to a problem we share with countless other researchers and developers in the machine
learning/deep learning universe: Training production-grade deep learning models is a glorious but messy process.
-Trains tracks and controls the process by associating code version control, research projects,
+ClearML tracks and controls the process by associating code version control, research projects,
performance metrics, and model provenance.
-We designed Trains specifically to require effortless integration so that teams can preserve their existing methods
-and practices. Use it on a daily basis to boost collaboration and visibility, or use it to automatically collect
-your experimentation logs, outputs, and data to one centralized server.
+We designed ClearML specifically to require effortless integration so that teams can preserve their existing methods
+and practices.
-**We have a demo server up and running at [https://demoapp.trains.allegro.ai](https://demoapp.trains.allegro.ai).**
-
-### :steam_locomotive: [Getting Started Tutorial](https://allegro.ai/blog/setting-up-allegro-ai-platform/) :rocket:
-
-**You can try out Trains and [test your code](#integrate-trains), with no additional setup.**
-
-
-
-## Trains Automatically Logs Everything
-**With only two lines of code, this is what you are getting:**
-
-* Git repository, branch, commit id, entry point and local git diff
-* Python environment (including specific packages & versions)
-* stdout and stderr
-* Resource Monitoring (CPU/GPU utilization, temperature, IO, network, etc.)
-* Hyper-parameters
- * ArgParser for command line parameters with currently used values
- * Explicit parameters dictionary
- * Tensorflow Defines (absl-py)
-* Initial model weights file
-* Model snapshots (With optional automatic upload to central storage: Shared folder, S3, GS, Azure, Http)
-* Artifacts log & store (Shared folder, S3, GS, Azure, Http)
-* Tensorboard/TensorboardX scalars, metrics, histograms, **images, audio and video**
-* [Matplotlib & Seaborn](https://github.com/allegroai/trains/tree/master/examples/frameworks/matplotlib)
-* Supported frameworks: [PyTorch](https://github.com/allegroai/trains/tree/master/examples/frameworks/pytorch), [Tensorflow](https://github.com/allegroai/trains/tree/master/examples/frameworks/tensorflow), [Keras](https://github.com/allegroai/trains/tree/master/examples/frameworks/keras), [AutoKeras](https://github.com/allegroai/trains/tree/master/examples/frameworks/autokeras), [XGBoost](https://github.com/allegroai/trains/tree/master/examples/frameworks/xgboost) and [Scikit-Learn](https://github.com/allegroai/trains/tree/master/examples/frameworks/scikit-learn) (MxNet is coming soon)
-* Seamless integration (including version control) with **Jupyter Notebook**
- and [*PyCharm* remote debugging](https://github.com/allegroai/trains-pycharm-plugin)
-
-**Additionally, log data explicitly using [Trains Explicit Logging](https://allegro.ai/docs/examples/reporting/).**
-
-## Using Trains
-
-Trains is a two part solution:
-
-1. Trains [python package](https://pypi.org/project/trains/) auto-magically connects with your code
-
- **Trains requires only two lines of code for full integration.**
-
- To connect your code with Trains:
-
- - Install Trains
-
- pip install trains
-
- Add optional cloud storage support (S3/GoogleStorage/Azure):
-
- ```bash
- pip install trains[s3]
- pip install trains[gs]
- pip install trains[azure]
- ```
-
-
-
- - Add the following lines to your code
-
- from trains import Task
- task = Task.init(project_name="my project", task_name="my task")
-
- * If project_name is not provided, the repository name will be used instead
- * If task_name (experiment) is not provided, the current filename will be used instead
-
- - Run your code. When Trains connects to the server, a link is printed. For example
-
- Trains Results page:
- https://demoapp.trains.allegro.ai/projects/76e5e2d45e914f52880621fe64601e85/experiments/241f06ae0f5c4b27b8ce8b64890ce152/output/log
-
- - Open the link and view your experiment parameters, model and tensorboard metrics
-
- **See examples [here](https://allegro.ai/docs/examples/examples_overview/)**
-
-2. [Trains Server](https://github.com/allegroai/trains-server) for logging, querying, control and UI ([Web-App](https://github.com/allegroai/trains-web))
-
- **We already have a demo server up and running for you at [https://demoapp.trains.allegro.ai](https://demoapp.trains.allegro.ai).**
-
- **You can try out Trains without the need to install your own *trains-server*, just add the two lines of code, and it will automatically connect to the Trains demo-server.**
-
- *Note that the demo server resets every 24 hours and all of the logged data is deleted.*
-
- When you are ready to use your own Trains server, go ahead and [install *trains-server*](https://github.com/allegroai/trains-server).
-
-
-
-
-## Configuring Your Own Trains server
-
-1. Install and run *trains-server* (see [Installing the Trains Server](https://github.com/allegroai/trains-server))
-
-2. Run the initial configuration wizard for your Trains installation and follow the instructions to setup Trains package
-(http://**_trains-server-ip_**:__port__ and user credentials)
-
- trains-init
-
-After installing and configuring, you can access your configuration file at `~/trains.conf`
-
-Sample configuration file available [here](https://github.com/allegroai/trains/blob/master/docs/trains.conf).
+ - Use it on a daily basis to boost collaboration and visibility in your team
+ - Create a remote job from any experiment with a click of a button
+ - Automate processes and create pipelines to collect your experimentation logs, outputs, and data
+ - Store all your data on any object-storage solution, with the simplest interface possible
+ - Make your data transparent by cataloging it all on the ClearML platform
+
+We believe ClearML is ground-breaking. We wish to establish new standards of truly seamless integration between
+experiment management, ML-Ops and data management.
## Who We Are
-Trains is supported by the same team behind *allegro.ai*,
+ClearML is supported by the team behind *allegro.ai*,
where we build deep learning pipelines and infrastructure for enterprise companies.
-We built Trains to track and control the glorious but messy process of training production-grade deep learning models.
-We are committed to vigorously supporting and expanding the capabilities of Trains.
+We built ClearML to track and control the glorious but messy process of training production-grade deep learning models.
+We are committed to vigorously supporting and expanding the capabilities of ClearML.
-## Why Are We Releasing Trains?
-
-We believe Trains is ground-breaking. We wish to establish new standards of experiment management in
-deep-learning and ML. Only the greater community can help us do that.
-
-We promise to always be backwardly compatible. If you start working with Trains today,
-even though this project is currently in the beta stage, your logs and data will always upgrade with you.
+We promise to always be backward compatible, making sure all your logs, data and pipelines
+will always upgrade with you.
## License
@@ -145,19 +136,19 @@ Apache License, Version 2.0 (see the [LICENSE](https://www.apache.org/licenses/L
## Documentation, Community & Support
-More information in the [official documentation](https://allegro.ai/docs) and [on YouTube](https://www.youtube.com/c/AllegroAI).
+More information is available in the [official documentation](https://allegro.ai/clearml/docs) and [on YouTube](https://www.youtube.com/c/AllegroAI).
-For examples and use cases, check the [examples folder](https://github.com/allegroai/trains/tree/master/examples) and [corresponding documentation](https://allegro.ai/docs/examples/examples_overview/).
+For examples and use cases, check the [examples folder](https://github.com/allegroai/trains/tree/master/examples) and [corresponding documentation](https://allegro.ai/clearml/docs/examples/examples_overview/).
If you have any questions: post on our [Slack Channel](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY), or tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/trains) with '**trains**' tag.
For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains/issues).
-Additionally, you can always find us at *trains@allegro.ai*
+Additionally, you can always find us at *clearml@allegro.ai*
## Contributing
-See the Trains [Guidelines for Contributing](https://github.com/allegroai/trains/blob/master/docs/contributing.md).
+See the ClearML [Guidelines for Contributing](https://github.com/allegroai/trains/blob/master/docs/contributing.md).
_May the force (and the goddess of learning rates) be with you!_
diff --git a/clearml/__init__.py b/clearml/__init__.py
index c05221e3..0ad97be4 100644
--- a/clearml/__init__.py
+++ b/clearml/__init__.py
@@ -1,4 +1,4 @@
-""" TRAINS open SDK """
+""" ClearML open SDK """
from .version import __version__
from .task import Task
@@ -6,5 +6,7 @@ from .model import InputModel, OutputModel, Model
from .logger import Logger
from .storage import StorageManager
from .errors import UsageError
+from .datasets import Dataset
-__all__ = ["__version__", "Task", "InputModel", "OutputModel", "Model", "Logger", "StorageManager", "UsageError"]
+__all__ = ["__version__", "Task", "InputModel", "OutputModel", "Model", "Logger",
+ "StorageManager", "UsageError", "Dataset"]
diff --git a/clearml/automation/__init__.py b/clearml/automation/__init__.py
index d2ac6d46..71b68351 100644
--- a/clearml/automation/__init__.py
+++ b/clearml/automation/__init__.py
@@ -1,6 +1,7 @@
from .parameters import UniformParameterRange, DiscreteParameterRange, UniformIntegerParameterRange, ParameterSet
from .optimization import GridSearch, RandomSearch, HyperParameterOptimizer, Objective
from .job import TrainsJob
+from .controller import PipelineController
__all__ = ["UniformParameterRange", "DiscreteParameterRange", "UniformIntegerParameterRange", "ParameterSet",
- "GridSearch", "RandomSearch", "HyperParameterOptimizer", "Objective", "TrainsJob"]
+ "GridSearch", "RandomSearch", "HyperParameterOptimizer", "Objective", "TrainsJob", "PipelineController"]
diff --git a/clearml/automation/auto_scaler.py b/clearml/automation/auto_scaler.py
index b35d0d64..66b51d4a 100644
--- a/clearml/automation/auto_scaler.py
+++ b/clearml/automation/auto_scaler.py
@@ -102,15 +102,15 @@ class AutoScaler(object):
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
"""
- Creates a new worker for trains (cloud-specific implementation).
+ Creates a new worker for clearml (cloud-specific implementation).
First, create an instance in the cloud and install some required packages.
- Then, define trains-agent environment variables and run trains-agent for the specified queue.
+ Then, define clearml-agent environment variables and run clearml-agent for the specified queue.
NOTE: - Will wait until instance is running
- This implementation assumes the instance image already has docker installed
:param str resource: resource name, as defined in self.resource_configurations and self.queues.
:param str worker_id_prefix: worker name prefix
- :param str queue_name: trains queue to listen to
+ :param str queue_name: clearml queue to listen to
"""
pass
@@ -137,17 +137,17 @@ class AutoScaler(object):
minutes would be removed.
"""
- # Worker's id in trains would be composed from prefix, name, instance_type and cloud_id separated by ';'
+ # Worker's id in clearml is composed of prefix, name, instance_type and cloud_id, separated by ':'
workers_pattern = re.compile(
r"^(?P[^:]+):(?P[^:]+):(?P[^:]+):(?P[^:]+)"
)
- # Set up the environment variables for trains
- os.environ["TRAINS_API_HOST"] = self.api_server
- os.environ["TRAINS_WEB_HOST"] = self.web_server
- os.environ["TRAINS_FILES_HOST"] = self.files_server
- os.environ["TRAINS_API_ACCESS_KEY"] = self.access_key
- os.environ["TRAINS_API_SECRET_KEY"] = self.secret_key
+ # Set up the environment variables for clearml
+ os.environ["CLEARML_API_HOST"] = self.api_server
+ os.environ["CLEARML_WEB_HOST"] = self.web_server
+ os.environ["CLEARML_FILES_HOST"] = self.files_server
+ os.environ["CLEARML_API_ACCESS_KEY"] = self.access_key
+ os.environ["CLEARML_API_SECRET_KEY"] = self.secret_key
api_client = APIClient()
# Verify the requested queues exist and create those that doesn't exist
@@ -234,7 +234,7 @@ class AutoScaler(object):
# skip resource types that might be needed
if resources in required_idle_resources:
continue
- # Remove from both aws and trains all instances that are idle for longer than MAX_IDLE_TIME_MIN
+ # Remove from both aws and clearml all instances that are idle for longer than MAX_IDLE_TIME_MIN
if time() - timestamp > self.max_idle_time_min * 60.0:
cloud_id = workers_pattern.match(worker.id)["cloud_id"]
self.spin_down_worker(cloud_id)
diff --git a/clearml/automation/aws_auto_scaler.py b/clearml/automation/aws_auto_scaler.py
index 0c8c4b2d..13c142c8 100644
--- a/clearml/automation/aws_auto_scaler.py
+++ b/clearml/automation/aws_auto_scaler.py
@@ -31,15 +31,15 @@ class AwsAutoScaler(AutoScaler):
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
"""
- Creates a new worker for trains.
+ Creates a new worker for clearml.
First, create an instance in the cloud and install some required packages.
- Then, define trains-agent environment variables and run trains-agent for the specified queue.
+ Then, define clearml-agent environment variables and run clearml-agent for the specified queue.
NOTE: - Will wait until instance is running
- This implementation assumes the instance image already has docker installed
:param str resource: resource name, as defined in BUDGET and QUEUES.
:param str worker_id_prefix: worker name prefix
- :param str queue_name: trains queue to listen to
+ :param str queue_name: clearml queue to listen to
"""
resource_conf = self.resource_configurations[resource]
# Add worker type and AWS instance type to the worker name.
@@ -50,7 +50,7 @@ class AwsAutoScaler(AutoScaler):
)
# user_data script will automatically run when the instance is started. it will install the required packages
- # for trains-agent configure it using environment variables and run trains-agent on the required queue
+ # for clearml-agent configure it using environment variables and run clearml-agent on the required queue
user_data = """#!/bin/bash
sudo apt-get update
sudo apt-get install -y python3-dev
@@ -60,22 +60,22 @@ class AwsAutoScaler(AutoScaler):
sudo apt-get install -y build-essential
python3 -m pip install -U pip
python3 -m pip install virtualenv
- python3 -m virtualenv trains_agent_venv
- source trains_agent_venv/bin/activate
- python -m pip install trains-agent
- echo 'agent.git_user=\"{git_user}\"' >> /root/trains.conf
- echo 'agent.git_pass=\"{git_pass}\"' >> /root/trains.conf
- echo "{trains_conf}" >> /root/trains.conf
- export TRAINS_API_HOST={api_server}
- export TRAINS_WEB_HOST={web_server}
- export TRAINS_FILES_HOST={files_server}
+ python3 -m virtualenv clearml_agent_venv
+ source clearml_agent_venv/bin/activate
+ python -m pip install clearml-agent
+ echo 'agent.git_user=\"{git_user}\"' >> /root/clearml.conf
+ echo 'agent.git_pass=\"{git_pass}\"' >> /root/clearml.conf
+ echo "{clearml_conf}" >> /root/clearml.conf
+ export CLEARML_API_HOST={api_server}
+ export CLEARML_WEB_HOST={web_server}
+ export CLEARML_FILES_HOST={files_server}
export DYNAMIC_INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`
- export TRAINS_WORKER_ID={worker_id}:$DYNAMIC_INSTANCE_ID
- export TRAINS_API_ACCESS_KEY='{access_key}'
- export TRAINS_API_SECRET_KEY='{secret_key}'
+ export CLEARML_WORKER_ID={worker_id}:$DYNAMIC_INSTANCE_ID
+ export CLEARML_API_ACCESS_KEY='{access_key}'
+ export CLEARML_API_SECRET_KEY='{secret_key}'
{bash_script}
source ~/.bashrc
- python -m trains_agent --config-file '/root/trains.conf' daemon --queue '{queue}' {docker}
+ python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker}
shutdown
""".format(
api_server=self.api_server,
@@ -87,7 +87,7 @@ class AwsAutoScaler(AutoScaler):
queue=queue_name,
git_user=self.git_user or "",
git_pass=self.git_pass or "",
- trains_conf='\\"'.join(self.extra_trains_conf.split('"')),
+ clearml_conf='\\"'.join(self.extra_trains_conf.split('"')),
bash_script=self.extra_vm_bash_script,
docker="--docker '{}'".format(self.default_docker_image)
if self.default_docker_image
diff --git a/clearml/automation/controller.py b/clearml/automation/controller.py
index 0625b2b2..365e41ac 100644
--- a/clearml/automation/controller.py
+++ b/clearml/automation/controller.py
@@ -17,7 +17,7 @@ class PipelineController(object):
"""
Pipeline controller.
Pipeline is a DAG of base tasks, each task will be cloned (arguments changed as required) executed and monitored
- The pipeline process (task) itself can be executed manually or by the trains-agent services queue.
+ The pipeline process (task) itself can be executed manually or by the clearml-agent services queue.
Notice: The pipeline controller lives as long as the pipeline itself is being executed.
"""
_tag = 'pipeline'
@@ -601,7 +601,7 @@ class PipelineController(object):
print('Parameters:\n{}'.format(self._nodes[name].job.task_parameter_override))
self._running_nodes.append(name)
else:
- getLogger('trains.automation.controller').error(
+ getLogger('clearml.automation.controller').error(
'ERROR: Failed launching step \'{}\': {}'.format(name, self._nodes[name]))
# update current state (in configuration, so that we could later continue an aborted pipeline)
diff --git a/clearml/automation/job.py b/clearml/automation/job.py
index cc2d9a92..344b976b 100644
--- a/clearml/automation/job.py
+++ b/clearml/automation/job.py
@@ -8,7 +8,7 @@ from ..task import Task
from ..backend_api.services import tasks as tasks_service
-logger = getLogger('trains.automation.job')
+logger = getLogger('clearml.automation.job')
class TrainsJob(object):
diff --git a/clearml/automation/monitor.py b/clearml/automation/monitor.py
index 7c751500..dff7820a 100644
--- a/clearml/automation/monitor.py
+++ b/clearml/automation/monitor.py
@@ -22,7 +22,7 @@ class Monitor(object):
self._project_ids = None
self._projects = None
self._projects_refresh_timestamp = None
- self._trains_apiclient = None
+ self._clearml_apiclient = None
def set_projects(self, project_names=None, project_names_re=None, project_ids=None):
# type: (Optional[Sequence[str]], Optional[Sequence[str]], Optional[Sequence[str]]) -> ()
@@ -167,10 +167,10 @@ class Monitor(object):
def _get_api_client(self):
# type: () -> APIClient
"""
- Return an APIClient object to directly query the trains-server
+ Return an APIClient object to directly query the clearml-server
:return: APIClient object
"""
- if not self._trains_apiclient:
- self._trains_apiclient = APIClient()
- return self._trains_apiclient
+ if not self._clearml_apiclient:
+ self._clearml_apiclient = APIClient()
+ return self._clearml_apiclient
diff --git a/clearml/automation/optimization.py b/clearml/automation/optimization.py
index ac78c42f..f860642c 100644
--- a/clearml/automation/optimization.py
+++ b/clearml/automation/optimization.py
@@ -15,7 +15,7 @@ from ..logger import Logger
from ..backend_api.services import workers as workers_service, tasks as tasks_services
from ..task import Task
-logger = getLogger('trains.automation.optimization')
+logger = getLogger('clearml.automation.optimization')
try:
@@ -878,9 +878,9 @@ class HyperParameterOptimizer(object):
:linenos:
:caption: Example
- from trains import Task
- from trains.automation import UniformParameterRange, DiscreteParameterRange
- from trains.automation import GridSearch, RandomSearch, HyperParameterOptimizer
+ from clearml import Task
+ from clearml.automation import UniformParameterRange, DiscreteParameterRange
+ from clearml.automation import GridSearch, RandomSearch, HyperParameterOptimizer
task = Task.init('examples', 'HyperParameterOptimizer example')
an_optimizer = HyperParameterOptimizer(
diff --git a/clearml/backend_api/api_proxy.py b/clearml/backend_api/api_proxy.py
index 164392e0..22281cfd 100644
--- a/clearml/backend_api/api_proxy.py
+++ b/clearml/backend_api/api_proxy.py
@@ -8,7 +8,7 @@ from ..utilities.check_updates import Version
class ApiServiceProxy(object):
- _main_services_module = "trains.backend_api.services"
+ _main_services_module = "clearml.backend_api.services"
_available_versions = None
def __init__(self, module):
diff --git a/clearml/backend_api/config/default/api.conf b/clearml/backend_api/config/default/api.conf
index e4d88bee..721f6434 100644
--- a/clearml/backend_api/config/default/api.conf
+++ b/clearml/backend_api/config/default/api.conf
@@ -1,16 +1,16 @@
{
version: 1.5
- # default api_server: https://demoapi.trains.allegro.ai
+ # default api_server: https://demoapi.clearml.allegro.ai
api_server: ""
- # default web_server: https://demoapp.trains.allegro.ai
+ # default web_server: https://demoapp.clearml.allegro.ai
web_server: ""
- # default files_server: https://demofiles.trains.allegro.ai
+ # default files_server: https://demofiles.clearml.allegro.ai
files_server: ""
# verify host ssl certificate, set to False only if you have a very good reason
verify_certificate: True
- # default demoapi.trains.allegro.ai credentials
+ # default demoapi.clearml.allegro.ai credentials
credentials {
access_key: ""
secret_key: ""
diff --git a/clearml/backend_api/session/client/client.py b/clearml/backend_api/session/client/client.py
index 1eefa199..69ce9add 100644
--- a/clearml/backend_api/session/client/client.py
+++ b/clearml/backend_api/session/client/client.py
@@ -107,15 +107,15 @@ class StrictSession(Session):
init()
return
- original = os.environ.get(LOCAL_CONFIG_FILE_OVERRIDE_VAR, None)
+ original = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get() or None
try:
- os.environ[LOCAL_CONFIG_FILE_OVERRIDE_VAR] = str(config_file)
+ LOCAL_CONFIG_FILE_OVERRIDE_VAR.set(str(config_file))
init()
finally:
if original is None:
- os.environ.pop(LOCAL_CONFIG_FILE_OVERRIDE_VAR, None)
+ LOCAL_CONFIG_FILE_OVERRIDE_VAR.pop()
else:
- os.environ[LOCAL_CONFIG_FILE_OVERRIDE_VAR] = original
+ LOCAL_CONFIG_FILE_OVERRIDE_VAR.set(original)
def send(self, request, *args, **kwargs):
result = super(StrictSession, self).send(request, *args, **kwargs)
@@ -560,4 +560,4 @@ class APIClient(object):
for name, module in services.items()
},
)
- )
\ No newline at end of file
+ )
diff --git a/clearml/backend_api/session/defs.py b/clearml/backend_api/session/defs.py
index ff78f812..8571ab5d 100644
--- a/clearml/backend_api/session/defs.py
+++ b/clearml/backend_api/session/defs.py
@@ -2,12 +2,14 @@ from ...backend_config import EnvEntry
from ...backend_config.converters import safe_text_to_bool
-ENV_HOST = EnvEntry("TRAINS_API_HOST", "ALG_API_HOST")
-ENV_WEB_HOST = EnvEntry("TRAINS_WEB_HOST", "ALG_WEB_HOST")
-ENV_FILES_HOST = EnvEntry("TRAINS_FILES_HOST", "ALG_FILES_HOST")
-ENV_ACCESS_KEY = EnvEntry("TRAINS_API_ACCESS_KEY", "ALG_API_ACCESS_KEY")
-ENV_SECRET_KEY = EnvEntry("TRAINS_API_SECRET_KEY", "ALG_API_SECRET_KEY")
-ENV_VERBOSE = EnvEntry("TRAINS_API_VERBOSE", "ALG_API_VERBOSE", type=bool, default=False)
-ENV_HOST_VERIFY_CERT = EnvEntry("TRAINS_API_HOST_VERIFY_CERT", "ALG_API_HOST_VERIFY_CERT", type=bool, default=True)
-ENV_OFFLINE_MODE = EnvEntry("TRAINS_OFFLINE_MODE", "ALG_OFFLINE_MODE", type=bool, converter=safe_text_to_bool)
-ENV_TRAINS_NO_DEFAULT_SERVER = EnvEntry("TRAINS_NO_DEFAULT_SERVER", "ALG_NO_DEFAULT_SERVER", type=bool, default=False)
+ENV_HOST = EnvEntry("CLEARML_API_HOST", "TRAINS_API_HOST")
+ENV_WEB_HOST = EnvEntry("CLEARML_WEB_HOST", "TRAINS_WEB_HOST")
+ENV_FILES_HOST = EnvEntry("CLEARML_FILES_HOST", "TRAINS_FILES_HOST")
+ENV_ACCESS_KEY = EnvEntry("CLEARML_API_ACCESS_KEY", "TRAINS_API_ACCESS_KEY")
+ENV_SECRET_KEY = EnvEntry("CLEARML_API_SECRET_KEY", "TRAINS_API_SECRET_KEY")
+ENV_VERBOSE = EnvEntry("CLEARML_API_VERBOSE", "TRAINS_API_VERBOSE", type=bool, default=False)
+ENV_HOST_VERIFY_CERT = EnvEntry("CLEARML_API_HOST_VERIFY_CERT", "TRAINS_API_HOST_VERIFY_CERT",
+ type=bool, default=True)
+ENV_OFFLINE_MODE = EnvEntry("CLEARML_OFFLINE_MODE", "TRAINS_OFFLINE_MODE", type=bool, converter=safe_text_to_bool)
+ENV_TRAINS_NO_DEFAULT_SERVER = EnvEntry("CLEARML_NO_DEFAULT_SERVER", "TRAINS_NO_DEFAULT_SERVER",
+ type=bool, default=False)
diff --git a/clearml/backend_api/session/session.py b/clearml/backend_api/session/session.py
index 34a0efe7..e6da6650 100644
--- a/clearml/backend_api/session/session.py
+++ b/clearml/backend_api/session/session.py
@@ -36,12 +36,12 @@ class MaxRequestSizeError(Exception):
class Session(TokenManager):
- """ TRAINS API Session class. """
+ """ ClearML API Session class. """
_AUTHORIZATION_HEADER = "Authorization"
- _WORKER_HEADER = "X-Trains-Worker"
- _ASYNC_HEADER = "X-Trains-Async"
- _CLIENT_HEADER = "X-Trains-Client"
+ _WORKER_HEADER = ("X-ClearML-Worker", "X-Trains-Worker", )
+ _ASYNC_HEADER = ("X-ClearML-Async", "X-Trains-Async", )
+ _CLIENT_HEADER = ("X-ClearML-Client", "X-Trains-Client", )
_async_status_code = 202
_session_requests = 0
@@ -57,10 +57,10 @@ class Session(TokenManager):
_client = [(__package__.partition(".")[0], __version__)]
api_version = '2.1'
- default_demo_host = "https://demoapi.trains.allegro.ai"
+ default_demo_host = "https://demoapi.demo.clear.ml"
default_host = default_demo_host
- default_web = "https://demoapp.trains.allegro.ai"
- default_files = "https://demofiles.trains.allegro.ai"
+ default_web = "https://demoapp.demo.clear.ml"
+ default_files = "https://demofiles.demo.clear.ml"
default_key = "EGRTCO8JMSIGI6S39GTP43NFWXDQOW"
default_secret = "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"
force_max_api_version = None
@@ -177,8 +177,8 @@ class Session(TokenManager):
if not api_version:
api_version = '2.2' if token_dict.get('env', '') == 'prod' else Session.api_version
if token_dict.get('server_version'):
- if not any(True for c in Session._client if c[0] == 'trains-server'):
- Session._client.append(('trains-server', token_dict.get('server_version'), ))
+ if not any(True for c in Session._client if c[0] == 'clearml-server'):
+ Session._client.append(('clearml-server', token_dict.get('server_version'), ))
Session.api_version = str(api_version)
except (jwt.DecodeError, ValueError):
@@ -218,10 +218,13 @@ class Session(TokenManager):
if self._offline_mode:
return None
+ res = None
host = self.host
headers = headers.copy() if headers else {}
- headers[self._WORKER_HEADER] = self.worker
- headers[self._CLIENT_HEADER] = self.client
+ for h in self._WORKER_HEADER:
+ headers[h] = self.worker
+ for h in self._CLIENT_HEADER:
+ headers[h] = self.client
token_refreshed_on_error = False
url = (
@@ -308,7 +311,8 @@ class Session(TokenManager):
headers.copy() if headers else {}
)
if async_enable:
- headers[self._ASYNC_HEADER] = "1"
+ for h in self._ASYNC_HEADER:
+ headers[h] = "1"
return self._send_request(
service=service,
action=action,
@@ -508,7 +512,7 @@ class Session(TokenManager):
if parsed.port == 8008:
return host.replace(':8008', ':8080', 1)
- raise ValueError('Could not detect TRAINS web application server')
+ raise ValueError('Could not detect ClearML web application server')
@classmethod
def get_files_server_host(cls, config=None):
@@ -624,7 +628,7 @@ class Session(TokenManager):
# check if this is a misconfigured api server (getting 200 without the data section)
if res and res.status_code == 200:
raise ValueError('It seems *api_server* is misconfigured. '
- 'Is this the TRAINS API server {} ?'.format(self.host))
+ 'Is this the ClearML API server {} ?'.format(self.host))
else:
raise LoginError("Response data mismatch: No 'token' in 'data' value from res, receive : {}, "
"exception: {}".format(res, ex))
diff --git a/clearml/backend_api/utils.py b/clearml/backend_api/utils.py
index 84d19211..ab35d473 100644
--- a/clearml/backend_api/utils.py
+++ b/clearml/backend_api/utils.py
@@ -14,7 +14,7 @@ if six.PY3:
from functools import lru_cache
elif six.PY2:
# python 2 support
- from backports.functools_lru_cache import lru_cache
+ from backports.functools_lru_cache import lru_cache # noqa
__disable_certificate_verification_warning = 0
@@ -139,7 +139,7 @@ def get_http_session_with_retry(
if not session.verify and __disable_certificate_verification_warning < 2:
# show warning
__disable_certificate_verification_warning += 1
- logging.getLogger('trains').warning(
+ logging.getLogger('clearml').warning(
msg='InsecureRequestWarning: Certificate verification is disabled! Adding '
'certificate verification is strongly advised. See: '
'https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings')
diff --git a/clearml/backend_config/config.py b/clearml/backend_config/config.py
index 52d344b2..b79cb314 100644
--- a/clearml/backend_config/config.py
+++ b/clearml/backend_config/config.py
@@ -88,7 +88,7 @@ class Config(object):
self._folder_name = config_folder or DEFAULT_CONFIG_FOLDER
self._roots = []
self._config = ConfigTree()
- self._env = env or os.environ.get("TRAINS_ENV", Environment.default)
+ self._env = env or os.environ.get("CLEARML_ENV", os.environ.get("TRAINS_ENV", Environment.default))
self.config_paths = set()
self.is_server = is_server
@@ -139,7 +139,7 @@ class Config(object):
else:
env_config_paths = []
- env_config_path_override = os.environ.get(ENV_CONFIG_PATH_OVERRIDE_VAR)
+ env_config_path_override = ENV_CONFIG_PATH_OVERRIDE_VAR.get()
if env_config_path_override:
env_config_paths = [expanduser(env_config_path_override)]
@@ -166,7 +166,7 @@ class Config(object):
)
local_config_files = LOCAL_CONFIG_FILES
- local_config_override = os.environ.get(LOCAL_CONFIG_FILE_OVERRIDE_VAR)
+ local_config_override = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get()
if local_config_override:
local_config_files = [expanduser(local_config_override)]
diff --git a/clearml/backend_config/defs.py b/clearml/backend_config/defs.py
index f503602c..3d0e145e 100644
--- a/clearml/backend_config/defs.py
+++ b/clearml/backend_config/defs.py
@@ -1,7 +1,10 @@
from os.path import expanduser
from pathlib2 import Path
-ENV_VAR = 'TRAINS_ENV'
+from .environment import EnvEntry
+
+
+ENV_VAR = 'CLEARML_ENV'
""" Name of system environment variable that can be used to specify the config environment name """
@@ -25,15 +28,16 @@ LOCAL_CONFIG_PATHS = [
LOCAL_CONFIG_FILES = [
expanduser('~/trains.conf'), # used for workstation configuration (end-users, workers)
+ expanduser('~/clearml.conf'), # used for workstation configuration (end-users, workers)
]
""" Local config files (not paths) """
-LOCAL_CONFIG_FILE_OVERRIDE_VAR = 'TRAINS_CONFIG_FILE'
+LOCAL_CONFIG_FILE_OVERRIDE_VAR = EnvEntry("CLEARML_CONFIG_FILE", "TRAINS_CONFIG_FILE")
""" Local config file override environment variable. If this is set, no other local config files will be used. """
-ENV_CONFIG_PATH_OVERRIDE_VAR = 'TRAINS_CONFIG_PATH'
+ENV_CONFIG_PATH_OVERRIDE_VAR = EnvEntry("CLEARML_CONFIG_PATH", "TRAINS_CONFIG_PATH")
"""
Environment-related config path override environment variable. If this is set, no other env config path will be used.
"""
diff --git a/clearml/backend_config/entry.py b/clearml/backend_config/entry.py
index 2ca3251c..709d538f 100644
--- a/clearml/backend_config/entry.py
+++ b/clearml/backend_config/entry.py
@@ -85,9 +85,10 @@ class Entry(object):
return self.get_pair(default=default, converter=converter)[1]
def set(self, value):
- # type: (Any, Any) -> (Text, Any)
- key, _ = self.get_pair(default=None, converter=None)
- self._set(key, str(value))
+ # type: (Any) -> ()
+ # key, _ = self.get_pair(default=None, converter=None)
+ for k in self.keys:
+ self._set(k, str(value))
def _set(self, key, value):
# type: (Text, Text) -> None
diff --git a/clearml/backend_config/environment.py b/clearml/backend_config/environment.py
index 915bceb7..ca389b5d 100644
--- a/clearml/backend_config/environment.py
+++ b/clearml/backend_config/environment.py
@@ -15,6 +15,10 @@ class EnvEntry(Entry):
super(EnvEntry, self).__init__(key, *more_keys, **kwargs)
self._ignore_errors = kwargs.pop('ignore_errors', False)
+ def pop(self):
+ for k in self.keys:
+ environ.pop(k, None)
+
def _get(self, key):
value = getenv(key, "").strip()
return value or NotSet
diff --git a/clearml/backend_config/log.py b/clearml/backend_config/log.py
index 6dad47bc..5cd6643f 100644
--- a/clearml/backend_config/log.py
+++ b/clearml/backend_config/log.py
@@ -5,11 +5,11 @@ from pathlib2 import Path
def logger(path=None):
- name = "trains"
+ name = "clearml"
if path:
p = Path(path)
module = (p.parent if p.stem.startswith('_') else p).stem
- name = "trains.%s" % module
+ name = "clearml.%s" % module
return logging.getLogger(name)
diff --git a/clearml/backend_interface/metrics/events.py b/clearml/backend_interface/metrics/events.py
index 1640af88..8b84fb29 100644
--- a/clearml/backend_interface/metrics/events.py
+++ b/clearml/backend_interface/metrics/events.py
@@ -50,6 +50,7 @@ class MetricsEventAdapter(object):
url = attr.attrib(default=None)
exception = attr.attrib(default=None)
+ retries = attr.attrib(default=None)
delete_local_file = attr.attrib(default=True)
""" Local file path, if exists, delete the file after upload completed """
@@ -198,6 +199,7 @@ class UploadEvent(MetricsEventAdapter):
_format = '.' + str(config.get('metrics.images.format', 'JPEG')).upper().lstrip('.')
_quality = int(config.get('metrics.images.quality', 87))
_subsampling = int(config.get('metrics.images.subsampling', 0))
+ _upload_retries = 3
_metric_counters = {}
_metric_counters_lock = Lock()
@@ -253,7 +255,7 @@ class UploadEvent(MetricsEventAdapter):
self._upload_filename += filename_ext
self._override_storage_key_prefix = kwargs.pop('override_storage_key_prefix', None)
-
+ self.retries = self._upload_retries
super(UploadEvent, self).__init__(metric, variant, iter=iter, **kwargs)
@classmethod
@@ -334,6 +336,7 @@ class UploadEvent(MetricsEventAdapter):
key_prop='key',
upload_uri=self._upload_uri,
delete_local_file=local_file if self._delete_after_upload else None,
+ retries=self.retries,
)
def get_target_full_upload_uri(self, storage_uri, storage_key_prefix=None, quote_uri=True):
diff --git a/clearml/backend_interface/metrics/interface.py b/clearml/backend_interface/metrics/interface.py
index f3022e38..c37927f3 100644
--- a/clearml/backend_interface/metrics/interface.py
+++ b/clearml/backend_interface/metrics/interface.py
@@ -165,10 +165,11 @@ class Metrics(InterfaceBase):
try:
storage = self._get_storage(upload_uri)
+ retries = getattr(e, 'retries', None) or self._file_upload_retries
if isinstance(e.stream, Path):
- url = storage.upload(e.stream.as_posix(), e.url, retries=self._file_upload_retries)
+ url = storage.upload(e.stream.as_posix(), e.url, retries=retries)
else:
- url = storage.upload_from_stream(e.stream, e.url, retries=self._file_upload_retries)
+ url = storage.upload_from_stream(e.stream, e.url, retries=retries)
e.event.update(url=url)
except Exception as exp:
log.warning("Failed uploading to {} ({})".format(
diff --git a/clearml/backend_interface/metrics/reporter.py b/clearml/backend_interface/metrics/reporter.py
index 949f6f09..7a4639e8 100644
--- a/clearml/backend_interface/metrics/reporter.py
+++ b/clearml/backend_interface/metrics/reporter.py
@@ -178,7 +178,7 @@ class Reporter(InterfaceBase, AbstractContextManager, SetupUploadMixin, AsyncMan
self._report(ev)
def report_matplotlib(self, title, series, figure, iter, force_save_as_image=False, logger=None):
- from trains.binding.matplotlib_bind import PatchedMatplotlib
+ from clearml.binding.matplotlib_bind import PatchedMatplotlib
PatchedMatplotlib.report_figure(
title=title,
series=series,
diff --git a/clearml/backend_interface/task/log.py b/clearml/backend_interface/task/log.py
index bda1bfb8..06ab451b 100644
--- a/clearml/backend_interface/task/log.py
+++ b/clearml/backend_interface/task/log.py
@@ -62,7 +62,7 @@ class TaskHandler(BufferingHandler):
if self._connect_logger and not TaskHandler.__once:
base_logger = getLogger()
if len(base_logger.handlers) == 1 and isinstance(base_logger.handlers[0], TaskHandler):
- if record.name != 'console' and not record.name.startswith('trains.'):
+ if record.name != 'console' and not record.name.startswith('clearml.'):
base_logger.removeHandler(self)
basicConfig()
base_logger.addHandler(self)
@@ -149,7 +149,7 @@ class TaskHandler(BufferingHandler):
self._last_event = None
batch_requests = events.AddBatchRequest(requests=[events.AddRequest(e) for e in record_events if e])
except Exception:
- self.__log_stderr("WARNING: trains.log - Failed logging task to backend ({:d} lines)".format(len(buffer)))
+ self.__log_stderr("WARNING: clearml.log - Failed logging task to backend ({:d} lines)".format(len(buffer)))
batch_requests = None
if batch_requests and batch_requests.requests:
@@ -253,7 +253,7 @@ class TaskHandler(BufferingHandler):
write = sys.stderr._original_write if hasattr(sys.stderr, '_original_write') else sys.stderr.write
write('{asctime} - {name} - {levelname} - {message}\n'.format(
asctime=Formatter().formatTime(makeLogRecord({})),
- name='trains.log', levelname=getLevelName(level), message=msg))
+ name='clearml.log', levelname=getLevelName(level), message=msg))
@classmethod
def report_offline_session(cls, task, folder):
diff --git a/clearml/backend_interface/task/populate.py b/clearml/backend_interface/task/populate.py
new file mode 100644
index 00000000..cd1239fb
--- /dev/null
+++ b/clearml/backend_interface/task/populate.py
@@ -0,0 +1,317 @@
+import json
+import os
+from functools import reduce
+from logging import getLogger
+from typing import Optional, Sequence
+
+from six.moves.urllib.parse import urlparse
+
+from pathlib2 import Path
+
+from ...task import Task
+from .repo import ScriptInfo
+
+
+class CreateAndPopulate(object):
+ def __init__(
+ self,
+ project_name=None, # Optional[str]
+ task_name=None, # Optional[str]
+ task_type=None, # Optional[str]
+ repo=None, # Optional[str]
+ branch=None, # Optional[str]
+ commit=None, # Optional[str]
+ script=None, # Optional[str]
+ working_directory=None, # Optional[str]
+ packages=None, # Optional[Sequence[str]]
+ requirements_file=None, # Optional[Union[str, Path]]
+ docker=None, # Optional[str]
+ base_task_id=None, # Optional[str]
+ add_task_init_call=True, # bool
+ raise_on_missing_entries=False, # bool
+ ):
+ # type: (...) -> None
+ """
+ Create a new Task from an existing code base.
+ If the code does not already contain a call to Task.init, pass add_task_init_call=True,
+ and the code will be patched in remote execution (i.e. when executed by `clearml-agent`).
+
+ :param project_name: Set the project name for the task. Required if base_task_id is None.
+ :param task_name: Set the name of the remote task. Required if base_task_id is None.
+ :param task_type: Optional, The task type to be created. Supported values: 'training', 'testing', 'inference',
+ 'data_processing', 'application', 'monitor', 'controller', 'optimizer', 'service', 'qc', 'custom'
+ :param repo: Remote URL for the repository to use, or path to local copy of the git repository
+ Example: 'https://github.com/allegroai/clearml.git' or '~/project/repo'
+ :param branch: Select specific repository branch/tag (implies the latest commit from the branch)
+ :param commit: Select specific commit id to use (default: latest commit,
+ or when used with local repository matching the local commit id)
+ :param script: Specify the entry point script for the remote execution. When used in tandem with
+ remote git repository the script should be a relative path inside the repository,
+ for example: './source/train.py' . When used with local repository path it supports a
+ direct path to a file inside the local repository itself, for example: '~/project/source/train.py'
+ :param working_directory: Working directory to launch the script from. Default: repository root folder.
+ Relative to repo root or local folder.
+ :param packages: Manually specify a list of required packages. Example: ["tqdm>=2.1", "scikit-learn"]
+ :param requirements_file: Specify a requirements.txt file to install when setting up the session.
+ If not provided, the requirements.txt from the repository will be used.
+ :param docker: Select the docker image to be executed in by the remote session
+ :param base_task_id: Use a pre-existing task in the system, instead of a local repo/script.
+ Essentially clones an existing task and overrides arguments/requirements.
+ :param add_task_init_call: If True, a 'Task.init()' call is added to the script entry point in remote execution.
+ :param raise_on_missing_entries: If True raise ValueError on missing entries when populating
+ """
+ if len(urlparse(repo).scheme) <= 1:
+ folder = repo
+ repo = None
+ else:
+ folder = None
+
+ if raise_on_missing_entries and not base_task_id:
+ if not script:
+ raise ValueError("Entry point script not provided")
+ if not repo and not folder and not Path(script).is_file():
+ raise ValueError("Repository or script must be provided")
+ if raise_on_missing_entries and commit and branch:
+ raise ValueError(
+ "Specify either a branch/tag or specific commit id, not both (either --commit or --branch)")
+ if raise_on_missing_entries and not folder and working_directory and working_directory.startswith('/'):
+ raise ValueError("working directory \'{}\', must be relative to repository root")
+
+ if requirements_file and not Path(requirements_file).is_file():
+ raise ValueError("requirements file could not be found \'{}\'")
+
+ self.folder = folder
+ self.commit = commit
+ self.branch = branch
+ self.repo = repo
+ self.script = script
+ self.cwd = working_directory
+ assert not packages or isinstance(packages, (tuple, list))
+ self.packages = list(packages) if packages else None
+ self.requirements_file = Path(requirements_file) if requirements_file else None
+ self.base_task_id = base_task_id
+ self.docker = docker
+ self.add_task_init_call = add_task_init_call
+ self.project_name = project_name
+ self.task_name = task_name
+ self.task_type = task_type
+ self.task = None
+ self.raise_on_missing_entries = raise_on_missing_entries
+
+ def create_task(self):
+ # type: () -> Task
+ """
+ Create the new populated Task
+
+ :return: newly created Task object
+ """
+ local_entry_file = None
+ repo_info = None
+ if self.folder or (self.script and Path(self.script).is_file()):
+ self.folder = os.path.expandvars(os.path.expanduser(self.folder)) if self.folder else None
+ self.script = os.path.expandvars(os.path.expanduser(self.script)) if self.script else None
+ self.cwd = os.path.expandvars(os.path.expanduser(self.cwd)) if self.cwd else None
+ if Path(self.script).is_file():
+ entry_point = self.script
+ else:
+ entry_point = (Path(self.folder) / self.script).as_posix()
+ entry_point = os.path.abspath(entry_point)
+ if not os.path.isfile(entry_point):
+ raise ValueError("Script entrypoint file \'{}\' could not be found".format(entry_point))
+
+ local_entry_file = entry_point
+ repo_info, requirements = ScriptInfo.get(
+ filepaths=[entry_point],
+ log=getLogger(),
+ create_requirements=False, uncommitted_from_remote=True)
+
+ # if we have no repository and no requirements, raise an error
+ if self.raise_on_missing_entries and not self.requirements_file and not self.repo and (
+ not repo_info or not repo_info.script or not repo_info.script.get('repository')):
+ raise ValueError("Standalone script detected \'{}\', but no requirements provided".format(self.script))
+
+ if self.base_task_id:
+ print('Cloning task {}'.format(self.base_task_id))
+ task = Task.clone(source_task=self.base_task_id, project=Task.get_project_id(self.project_name))
+ else:
+ # noinspection PyProtectedMember
+ task = Task._create(task_name=self.task_name, project_name=self.project_name, task_type=self.task_type)
+ # if there is nothing to populate, return
+ if not any([
+ self.folder, self.commit, self.branch, self.repo, self.script, self.cwd,
+ self.packages, self.requirements_file, self.base_task_id, self.docker
+ ]):
+ return task
+
+ task_state = task.export_task()
+ if 'script' not in task_state:
+ task_state['script'] = {}
+
+ if repo_info:
+ task_state['script']['repository'] = repo_info.script['repository']
+ task_state['script']['version_num'] = repo_info.script['version_num']
+ task_state['script']['branch'] = repo_info.script['branch']
+ task_state['script']['diff'] = repo_info.script['diff'] or ''
+ task_state['script']['working_dir'] = repo_info.script['working_dir']
+ task_state['script']['entry_point'] = repo_info.script['entry_point']
+ task_state['script']['binary'] = repo_info.script['binary']
+ task_state['script']['requirements'] = {}
+ if self.cwd:
+ self.cwd = self.cwd
+ cwd = self.cwd if Path(self.cwd).is_dir() else (
+ Path(repo_info.script['repo_root']) / self.cwd).as_posix()
+ if not Path(cwd).is_dir():
+ raise ValueError("Working directory \'{}\' could not be found".format(cwd))
+ cwd = Path(cwd).relative_to(repo_info.script['repo_root']).as_posix()
+ entry_point = \
+ Path(repo_info.script['repo_root']) / repo_info.script['working_dir'] / repo_info.script[
+ 'entry_point']
+ entry_point = entry_point.relative_to(cwd).as_posix()
+ task_state['script']['entry_point'] = entry_point
+ task_state['script']['working_dir'] = cwd
+ elif self.repo:
+ # normalize path separators and strip leading './' components
+ entry_point = '/'.join([p for p in self.script.split('/') if p and p != '.'])
+ cwd = '/'.join([p for p in (self.cwd or '.').split('/') if p and p != '.'])
+ if cwd and entry_point.startswith(cwd + '/'):
+ entry_point = entry_point[len(cwd) + 1:]
+ task_state['script']['repository'] = self.repo
+ task_state['script']['version_num'] = self.commit or None
+ task_state['script']['branch'] = self.branch or None
+ task_state['script']['diff'] = ''
+ task_state['script']['working_dir'] = cwd or '.'
+ task_state['script']['entry_point'] = entry_point
+
+ # update requirements
+ reqs = []
+ if self.requirements_file:
+ with open(self.requirements_file.as_posix(), 'rt') as f:
+ reqs = [line.strip() for line in f.readlines()]
+ if self.packages:
+ reqs += self.packages
+ if reqs:
+ # make sure we have clearml.
+ clearml_found = False
+ for line in reqs:
+ if line.strip().startswith('#'):
+ continue
+ package = reduce(lambda a, b: a.split(b)[0], "#;@=~<>", line).strip()
+ if package == 'clearml':
+ clearml_found = True
+ break
+ if not clearml_found:
+ reqs.append('clearml')
+ task_state['script']['requirements'] = {'pip': '\n'.join(reqs)}
+ elif not self.repo and repo_info:
+ # we are in local mode; a "requirements.txt" in the repository root is a must
+ reqs_txt_file = Path(repo_info.script['repo_root']) / "requirements.txt"
+ if self.raise_on_missing_entries and not reqs_txt_file.is_file():
+ raise ValueError(
+ "requirements.txt not found [{}] "
+ "Use --requirements or --packages".format(reqs_txt_file.as_posix()))
+
+ if self.add_task_init_call:
+ script_entry = os.path.abspath('/' + task_state['script']['working_dir'] +
+ '/' + task_state['script']['entry_point'])
+ idx_a = 0
+ # find the right entry line for the patch if we have a local file (basically right after any __future__ imports)
+ if local_entry_file:
+ with open(local_entry_file, 'rt') as f:
+ lines = f.readlines()
+ future_found = -1
+ for i, line in enumerate(lines):
+ tokens = [t.strip() for t in line.split(' ') if t.strip()]
+ if tokens and tokens[0] in ('import', 'from',):
+ if '__future__' in line:
+ future_found = i
+ else:
+ break
+ if future_found >= 0:
+ idx_a = future_found + 1
+
+ task_init_patch = ''
+ # if we do not have requirements, add clearml to the requirements.txt
+ if not reqs:
+ task_init_patch += \
+ "diff --git a/requirements.txt b/requirements.txt\n" \
+ "--- a/requirements.txt\n" \
+ "+++ b/requirements.txt\n" \
+ "@@ -0,0 +1,1 @@\n" \
+ "+clearml\n"
+
+ task_init_patch += \
+ "diff --git a{script_entry} b{script_entry}\n" \
+ "--- a{script_entry}\n" \
+ "+++ b{script_entry}\n" \
+ "@@ -{idx_a},0 +{idx_b},3 @@\n" \
+ "+from clearml import Task\n" \
+ "+Task.init()\n" \
+ "+\n".format(
+ script_entry=script_entry, idx_a=idx_a, idx_b=idx_a + 1)
+
+ task_state['script']['diff'] = task_init_patch + task_state['script']['diff']
+
+ # set base docker image if provided
+ if self.docker:
+ task.set_base_docker(self.docker)
+
+ if task_state['script']['repository']:
+ repo_details = {k: v for k, v in task_state['script'].items()
+ if v and k not in ('diff', 'requirements', 'binary')}
+ print('Repository Detected\n{}'.format(json.dumps(repo_details, indent=2)))
+ else:
+ print('Standalone script detected\n Script: {}\n Requirements: {}'.format(
+ self.script, task_state['script']['requirements'].get('pip', [])))
+
+ if task_state['script'].get('requirements') and task_state['script']['requirements'].get('pip'):
+ print('Requirements:\n requirements.txt: {}\n Additional Packages: {}'.format(
+ self.requirements_file.as_posix() if self.requirements_file else '', self.packages))
+ if self.docker:
+ print('Base docker image: {}'.format(self.docker))
+
+ # update the Task
+ task.update_task(task_state)
+ self.task = task
+ return task
+
+ def update_task_args(self, args=None):
+ # type: (Optional[Sequence[str]]) -> ()
+ """
+ Update the newly created Task argparse Arguments
+ If called before the Task is created, the arguments are only verified (not stored).
+
+ :param args: Arguments to pass to the remote execution, list of string pairs (argument, value) or
+ list of strings '<argument>=<value>'. Example: ['lr=0.003', ('batch_size', 64)]
+ """
+ if not args:
+ return
+
+ # check args are in '<key>=<value>' format
+ args_list = []
+ for a in args:
+ if isinstance(a, (list, tuple)):
+ assert len(a) == 2
+ args_list.append(a)
+ continue
+ try:
+ parts = a.split('=', 1)
+ assert len(parts) == 2
+ args_list.append(parts)
+ except Exception:
+ raise ValueError(
+ "Failed parsing argument \'{}\', arguments must be in \'=\' format")
+
+ if not self.task:
+ return
+
+ task_params = self.task.get_parameters()
+ args_list = {'Args/{}'.format(k): v for k, v in args_list}
+ task_params.update(args_list)
+ self.task.set_parameters(task_params)
+
+ def get_id(self):
+ # type: () -> Optional[str]
+ """
+ :return: Return the created Task id (str)
+ """
+ return self.task.id if self.task else None
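
For reference, a minimal sketch of driving the `CreateAndPopulate` class above programmatically, which is the same flow the `clearml-task` CLI added further below wires up from its command-line arguments. The repository URL, script path, packages and queue name are illustrative, not taken from this diff:

```python
from clearml import Task
from clearml.backend_interface.task.populate import CreateAndPopulate

# create a draft Task from an existing repository + entry point script
cp = CreateAndPopulate(
    project_name='examples',
    task_name='remote execution test',
    repo='https://github.com/allegroai/clearml.git',
    script='examples/reporting/scalar_reporting.py',  # illustrative path
    packages=['clearml'],
    add_task_init_call=True,
    raise_on_missing_entries=True,
)
cp.create_task()
# arguments may be '<key>=<value>' strings or (key, value) pairs
cp.update_task_args(['lr=0.003', ('batch_size', 64)])
print('created task id:', cp.get_id())
# hand the draft Task to a clearml-agent listening on the queue
Task.enqueue(cp.task, queue_name='default')
```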
diff --git a/clearml/backend_interface/task/repo/scriptinfo.py b/clearml/backend_interface/task/repo/scriptinfo.py
index 977fa426..3f0dd2fc 100644
--- a/clearml/backend_interface/task/repo/scriptinfo.py
+++ b/clearml/backend_interface/task/repo/scriptinfo.py
@@ -52,21 +52,21 @@ class ScriptRequirements(object):
try:
# noinspection PyPackageRequirements,PyUnresolvedReferences
import boto3 # noqa: F401
- modules.add('boto3', 'trains.storage', 0)
+ modules.add('boto3', 'clearml.storage', 0)
except Exception:
pass
# noinspection PyBroadException
try:
# noinspection PyPackageRequirements,PyUnresolvedReferences
from google.cloud import storage # noqa: F401
- modules.add('google_cloud_storage', 'trains.storage', 0)
+ modules.add('google_cloud_storage', 'clearml.storage', 0)
except Exception:
pass
# noinspection PyBroadException
try:
# noinspection PyPackageRequirements,PyUnresolvedReferences
from azure.storage.blob import ContentSettings # noqa: F401
- modules.add('azure_storage_blob', 'trains.storage', 0)
+ modules.add('azure_storage_blob', 'clearml.storage', 0)
except Exception:
pass
@@ -100,7 +100,7 @@ class ScriptRequirements(object):
from ..task import Task
# noinspection PyProtectedMember
for package, version in Task._force_requirements.items():
- modules.add(package, 'trains', 0)
+ modules.add(package, 'clearml', 0)
except Exception:
pass
@@ -265,7 +265,7 @@ class _JupyterObserver(object):
@classmethod
def _daemon(cls, jupyter_notebook_filename):
- from trains import Task
+ from clearml import Task
# load jupyter notebook package
# noinspection PyBroadException
@@ -715,12 +715,12 @@ class ScriptInfo(object):
jupyter_filepath=jupyter_filepath,
)
- if repo_info.modified:
- messages.append(
- "======> WARNING! UNCOMMITTED CHANGES IN REPOSITORY {} <======".format(
- script_info.get("repository", "")
- )
- )
+ # if repo_info.modified:
+ # messages.append(
+ # "======> WARNING! UNCOMMITTED CHANGES IN REPOSITORY {} <======".format(
+ # script_info.get("repository", "")
+ # )
+ # )
if not any(script_info.values()):
script_info = None
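
The hunk above now registers every package found in `Task._force_requirements` under the `clearml` marker. A small sketch, assuming the public `Task.add_requirements()` helper is what feeds that mapping and is called before `Task.init()`:

```python
from clearml import Task

# assumption: Task.add_requirements() populates Task._force_requirements, so the
# package is recorded with the Task even if the analyzer never sees it imported
Task.add_requirements('scikit-learn')
task = Task.init(project_name='examples', task_name='forced requirements')
```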
diff --git a/clearml/backend_interface/task/task.py b/clearml/backend_interface/task/task.py
index 96ad742c..c9944e88 100644
--- a/clearml/backend_interface/task/task.py
+++ b/clearml/backend_interface/task/task.py
@@ -27,6 +27,7 @@ from six.moves.urllib.parse import quote
from ...utilities.locks import RLock as FileRLock
from ...utilities.attrs import readonly
+from ...utilities.proxy_object import verify_basic_type
from ...binding.artifacts import Artifacts
from ...backend_interface.task.development.worker import DevWorker
from ...backend_api import Session
@@ -144,9 +145,9 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
self.__reporter = None
self._curr_label_stats = {}
self._raise_on_validation_errors = raise_on_validation_errors
- self._parameters_allowed_types = (
+ self._parameters_allowed_types = tuple(set(
six.string_types + six.integer_types + (six.text_type, float, list, tuple, dict, type(None))
- )
+ ))
self._app_server = None
self._files_server = None
self._initial_iteration_offset = 0
@@ -216,7 +217,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
)
else:
self.get_logger().report_text(
- 'TRAINS new version available: upgrade to v{} is recommended!'.format(
+ 'ClearML new version available: upgrade to v{} is recommended!'.format(
latest_version[0]),
)
except Exception:
@@ -296,8 +297,8 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
if task_type.value not in (self.TaskTypes.training, self.TaskTypes.testing) and \
not Session.check_min_api_version('2.8'):
print('WARNING: Changing task type to "{}" : '
- 'trains-server does not support task type "{}", '
- 'please upgrade trains-server.'.format(self.TaskTypes.training, task_type.value))
+ 'clearml-server does not support task type "{}", '
+ 'please upgrade clearml-server.'.format(self.TaskTypes.training, task_type.value))
task_type = self.TaskTypes.training
project_id = None
@@ -402,7 +403,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
# type: () -> str
"""
The Task's status. To keep the Task updated.
- Trains reloads the Task status information only, when this value is accessed.
+ ClearML reloads the Task status information only, when this value is accessed.
return str: TaskStatusEnum status
"""
@@ -445,7 +446,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
def reload(self):
# type: () -> ()
"""
- Reload current Task's state from trains-server.
+ Reload current Task's state from clearml-server.
Refresh all task's fields, including artifacts / models / parameters etc.
"""
return super(Task, self).reload()
@@ -628,9 +629,9 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
):
# type: (...) -> str
"""
- Update the Task's output model weights file. First, Trains uploads the file to the preconfigured output
+ Update the Task's output model weights file. First, ClearML uploads the file to the preconfigured output
destination (see the Task's ``output.destination`` property or call the ``setup_upload`` method),
- then Trains updates the model object associated with the Task an API call. The API call uses with the URI
+ then ClearML updates the model object associated with the Task in an API call. The API call uses the URI
of the uploaded file, and other values provided by additional arguments.
:param str model_file: The path to the updated model weights file.
@@ -684,19 +685,19 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
Set a new input model for the Task. The model must be "ready" (status is ``Published``) to be used as the
Task's input model.
- :param model_id: The Id of the model on the **Trains Server** (backend). If ``model_name`` is not specified,
+ :param model_id: The Id of the model on the **ClearML Server** (backend). If ``model_name`` is not specified,
then ``model_id`` must be specified.
- :param model_name: The model name. The name is used to locate an existing model in the **Trains Server**
+ :param model_name: The model name. The name is used to locate an existing model in the **ClearML Server**
(backend). If ``model_id`` is not specified, then ``model_name`` must be specified.
:param update_task_design: Update the Task's design
- - ``True`` - Trains copies the Task's model design from the input model.
- - ``False`` - Trains does not copy the Task's model design from the input model.
+ - ``True`` - ClearML copies the Task's model design from the input model.
+ - ``False`` - ClearML does not copy the Task's model design from the input model.
:param update_task_labels: Update the Task's label enumeration
- - ``True`` - Trains copies the Task's label enumeration from the input model.
- - ``False`` - Trains does not copy the Task's label enumeration from the input model.
+ - ``True`` - ClearML copies the Task's label enumeration from the input model.
+ - ``False`` - ClearML does not copy the Task's label enumeration from the input model.
"""
if model_id is None and not model_name:
raise ValueError('Expected one of [model_id, model_name]')
@@ -749,7 +750,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
i.e. {'Args/param': 'value'} is the argument "param" from section "Args"
:param backwards_compatibility: If True (default) parameters without section name
- (API version < 2.9, trains-server < 0.16) will be at dict root level.
+ (API version < 2.9, clearml-server < 0.16) will be at dict root level.
If False, parameters without section name, will be nested under "Args/" key.
:return: dict of the task parameters, all flattened to key/value.
@@ -838,14 +839,15 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
not_allowed = {
k: type(v).__name__
for k, v in new_parameters.items()
- if not isinstance(v, self._parameters_allowed_types)
+ if not verify_basic_type(v, self._parameters_allowed_types)
}
if not_allowed:
- raise ValueError(
- "Only builtin types ({}) are allowed for values (got {})".format(
- ', '.join(t.__name__ for t in self._parameters_allowed_types),
- ', '.join('%s=>%s' % p for p in not_allowed.items())),
+ self.log.warning(
+ "Skipping parameter: {}, only builtin types are supported ({})".format(
+ ', '.join('%s[%s]' % p for p in not_allowed.items()),
+ ', '.join(t.__name__ for t in self._parameters_allowed_types))
)
+ new_parameters = {k: v for k, v in new_parameters.items() if k not in not_allowed}
use_hyperparams = Session.check_min_api_version('2.9')
@@ -958,7 +960,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
:return: True if the parameter was deleted successfully
"""
if not Session.check_min_api_version('2.9'):
- raise ValueError("Delete hyper parameter is not supported by your trains-server, "
+ raise ValueError("Delete hyper parameter is not supported by your clearml-server, "
"upgrade to the latest version")
with self._edit_lock:
paramkey = tasks.ParamKey(section=name.split('/', 1)[0], name=name.split('/', 1)[1])
@@ -1011,7 +1013,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
# type: (str) -> ()
"""
Set the base docker image for this experiment
- If provided, this value will be used by trains-agent to execute this experiment
+ If provided, this value will be used by clearml-agent to execute this experiment
inside the provided docker image.
When running remotely the call is ignored
"""
@@ -1275,7 +1277,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
# type: () -> str
"""
Return the Task results & outputs web page address.
- For example: https://demoapp.trains.allegro.ai/projects/216431/experiments/60763e04/output/log
+ For example: https://demoapp.demo.clear.ml/projects/216431/experiments/60763e04/output/log
:return: http/s URL link.
"""
@@ -1428,7 +1430,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
def running_locally():
# type: () -> bool
"""
- Is the task running locally (i.e., ``trains-agent`` is not executing it)
+ Is the task running locally (i.e., ``clearml-agent`` is not executing it)
:return: True, if the task is running locally. False, if the task is not running locally.
@@ -1637,7 +1639,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
mutually_exclusive(config_dict=config_dict, config_text=config_text, _check_none=True)
if not Session.check_min_api_version('2.9'):
- raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
+ raise ValueError("Multiple configurations is not supported with the current 'clearml-server', "
"please upgrade to the latest version")
if description:
@@ -1661,7 +1663,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
return None if configuration name is not valid.
"""
if not Session.check_min_api_version('2.9'):
- raise ValueError("Multiple configurations is not supported with the current 'trains-server', "
+ raise ValueError("Multiple configurations is not supported with the current 'clearml-server', "
"please upgrade to the latest version")
configuration = self.data.configuration or {}
@@ -1725,6 +1727,22 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
"""
session = session if session else cls._get_default_session()
+ use_clone_api = Session.check_min_api_version('2.9')
+ if use_clone_api:
+ res = cls._send(
+ session=session, log=log,
+ req=tasks.CloneRequest(
+ task=cloned_task_id,
+ new_task_name=name,
+ new_task_tags=tags,
+ new_task_comment=comment,
+ new_task_parent=parent,
+ new_task_project=project,
+ execution_overrides=execution_overrides,
+ )
+ )
+ cloned_task_id = res.response.id
+ return cloned_task_id
res = cls._send(session=session, log=log, req=tasks.GetByIdRequest(task=cloned_task_id))
task = res.response.task
@@ -1858,7 +1876,7 @@ class Task(IdObjectBase, AccessMixin, SetupUploadMixin):
if not PROC_MASTER_ID_ENV_VAR.get() or len(PROC_MASTER_ID_ENV_VAR.get().split(':')) < 2:
self.__edit_lock = RLock()
elif PROC_MASTER_ID_ENV_VAR.get().split(':')[1] == str(self.id):
- filename = os.path.join(gettempdir(), 'trains_{}.lock'.format(self.id))
+ filename = os.path.join(gettempdir(), 'clearml_{}.lock'.format(self.id))
# no need to remove previous file lock if we have a dead process, it will automatically release the lock.
# # noinspection PyBroadException
# try:
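
One of the hunks above relaxes parameter validation: `set_parameters()` now logs a warning and drops values that are not builtin types, instead of raising `ValueError`. A short sketch of the new behavior (the custom class is purely illustrative):

```python
from clearml import Task


class NotSerializable(object):
    pass


task = Task.init(project_name='examples', task_name='parameter types')
# 'Args/lr' is stored; 'Args/model' is skipped with a warning instead of
# aborting the whole set_parameters() call as before this change
task.set_parameters({'Args/lr': 0.001, 'Args/model': NotSerializable()})
```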
diff --git a/clearml/binding/artifacts.py b/clearml/binding/artifacts.py
index 85f6588c..ffa043f5 100644
--- a/clearml/binding/artifacts.py
+++ b/clearml/binding/artifacts.py
@@ -187,7 +187,7 @@ class Artifact(object):
:raise: Raises error if local copy not found.
:return: A local path to a downloaded copy of the artifact.
"""
- from trains.storage import StorageManager
+ from clearml.storage import StorageManager
local_copy = StorageManager.get_local_copy(
remote_url=self.url,
extract_archive=extract_archive and self.type == 'archive',
@@ -308,7 +308,7 @@ class Artifacts(object):
delete_after_upload=False, auto_pickle=True, wait_on_upload=False):
# type: (str, Optional[object], Optional[dict], Optional[str], bool, bool, bool) -> bool
if not Session.check_min_api_version('2.3'):
- LoggerRoot.get_base_logger().warning('Artifacts not supported by your TRAINS-server version, '
+ LoggerRoot.get_base_logger().warning('Artifacts not supported by your ClearML-server version, '
'please upgrade to the latest server version')
return False
@@ -648,7 +648,7 @@ class Artifacts(object):
return
self._last_artifacts_upload[name] = current_sha2
- # If old trains-server, upload as debug image
+ # If old clearml-server, upload as debug image
if not Session.check_min_api_version('2.3'):
logger.report_image(title='artifacts', series=name, local_path=local_csv.as_posix(),
delete_after_upload=True, iteration=self._task.get_last_iteration(),
@@ -698,7 +698,7 @@ class Artifacts(object):
"""
Upload local file and return uri of the uploaded file (uploading in the background)
"""
- from trains.storage import StorageManager
+ from clearml.storage import StorageManager
upload_uri = self._task.output_uri or self._task.get_logger().get_default_upload_destination()
if not isinstance(local_file, Path):
@@ -715,7 +715,7 @@ class Artifacts(object):
# send for upload
# noinspection PyProtectedMember
if wait_on_upload:
- StorageManager.upload_file(local_file.as_posix(), uri)
+ StorageManager.upload_file(local_file.as_posix(), uri, wait_for_upload=True, retries=ev.retries)
if delete_after_upload:
try:
os.unlink(local_file.as_posix())
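
The artifacts changes above route uploads through `StorageManager.upload_file()` with an explicit wait. A minimal usage sketch of the `wait_on_upload` flag as exposed through `Task.upload_artifact()` (the artifact name and file path are illustrative):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact upload')
# wait_on_upload=True returns only after the file is actually stored,
# instead of uploading in the background
task.upload_artifact(name='results', artifact_object='results.csv', wait_on_upload=True)
```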
diff --git a/clearml/binding/environ_bind.py b/clearml/binding/environ_bind.py
index b2902b7f..af52343c 100644
--- a/clearml/binding/environ_bind.py
+++ b/clearml/binding/environ_bind.py
@@ -44,7 +44,7 @@ class EnvironmentBind(object):
match = match.strip()
if match == '*':
env_param.update({k: os.environ.get(k) for k in os.environ
- if not k.startswith('TRAINS_') and not k.startswith('ALG_')})
+ if not k.startswith('TRAINS_') and not k.startswith('CLEARML_')})
elif match.endswith('*'):
match = match.strip('*')
env_param.update({k: os.environ.get(k) for k in os.environ if k.startswith(match)})
diff --git a/clearml/binding/frameworks/__init__.py b/clearml/binding/frameworks/__init__.py
index b658f8b2..2a24ab83 100644
--- a/clearml/binding/frameworks/__init__.py
+++ b/clearml/binding/frameworks/__init__.py
@@ -114,7 +114,7 @@ class WeightsFileHandler(object):
Add a pre-save/load callback for weights files and return its handle. If the callback was already added,
return the existing handle.
- Use this callback to modify the weights filename registered in the Trains Server. In case Trains is
+ Use this callback to modify the weights filename registered in the ClearML Server. In case ClearML is
configured to upload the weights file, this will affect the uploaded filename as well.
Callback returning None will disable the tracking of the current call Model save,
it will not disable saving it to disk, just the logging/tracking/uploading.
@@ -422,7 +422,7 @@ class WeightsFileHandler(object):
# HACK: if pytorch-lightning is used, remove the temp '.part' file extension
if sys.modules.get('pytorch_lightning') and target_filename.lower().endswith('.part'):
target_filename = target_filename[:-len('.part')]
- fd, temp_file = mkstemp(prefix='.trains.upload_model_', suffix='.tmp')
+ fd, temp_file = mkstemp(prefix='.clearml.upload_model_', suffix='.tmp')
os.close(fd)
shutil.copy(files[0], temp_file)
trains_out_model.update_weights(
diff --git a/clearml/binding/frameworks/tensorflow_bind.py b/clearml/binding/frameworks/tensorflow_bind.py
index a9b00267..8012636f 100644
--- a/clearml/binding/frameworks/tensorflow_bind.py
+++ b/clearml/binding/frameworks/tensorflow_bind.py
@@ -192,7 +192,7 @@ class WeightsGradientHistHelper(object):
class EventTrainsWriter(object):
"""
TF SummaryWriter implementation that converts the tensorboard's summary into
- Trains events and reports the events (metrics) for an Trains task (logger).
+ ClearML events and reports the events (metrics) for a ClearML task (logger).
"""
_add_lock = threading.RLock()
_series_name_lookup = {}
@@ -298,8 +298,8 @@ class EventTrainsWriter(object):
def __init__(self, logger, logdir=None, report_freq=100, image_report_freq=None,
histogram_update_freq_multiplier=10, histogram_granularity=50, max_keep_images=None):
"""
- Create a compatible Trains backend to the TensorFlow SummaryToEventTransformer
- Everything will be serialized directly to the Trains backend, instead of to the standard TF FileWriter
+ Create a compatible ClearML backend to the TensorFlow SummaryToEventTransformer
+ Everything will be serialized directly to the ClearML backend, instead of to the standard TF FileWriter
:param logger: The task.logger to use for sending the metrics (def: task.get_logger())
:param report_freq: How often to update the statistics values
@@ -846,7 +846,7 @@ class PatchSummaryToEventTransformer(object):
if PatchSummaryToEventTransformer.__original_getattribute is None:
PatchSummaryToEventTransformer.__original_getattribute = SummaryToEventTransformer.__getattribute__
SummaryToEventTransformer.__getattribute__ = PatchSummaryToEventTransformer._patched_getattribute
- setattr(SummaryToEventTransformer, 'trains',
+ setattr(SummaryToEventTransformer, 'clearml',
property(PatchSummaryToEventTransformer.trains_object))
except Exception as ex:
LoggerRoot.get_base_logger(TensorflowBinding).debug(str(ex))
@@ -859,7 +859,7 @@ class PatchSummaryToEventTransformer(object):
from torch.utils.tensorboard.writer import FileWriter as FileWriterT # noqa
PatchSummaryToEventTransformer._original_add_eventT = FileWriterT.add_event
FileWriterT.add_event = PatchSummaryToEventTransformer._patched_add_eventT
- setattr(FileWriterT, 'trains', None)
+ setattr(FileWriterT, 'clearml', None)
except ImportError:
# this is a new version of TensorflowX
pass
@@ -875,7 +875,7 @@ class PatchSummaryToEventTransformer(object):
PatchSummaryToEventTransformer.__original_getattributeX = \
SummaryToEventTransformerX.__getattribute__
SummaryToEventTransformerX.__getattribute__ = PatchSummaryToEventTransformer._patched_getattributeX
- setattr(SummaryToEventTransformerX, 'trains',
+ setattr(SummaryToEventTransformerX, 'clearml',
property(PatchSummaryToEventTransformer.trains_object))
except ImportError:
# this is a new version of TensorflowX
@@ -890,7 +890,7 @@ class PatchSummaryToEventTransformer(object):
from tensorboardX.writer import FileWriter as FileWriterX # noqa
PatchSummaryToEventTransformer._original_add_eventX = FileWriterX.add_event
FileWriterX.add_event = PatchSummaryToEventTransformer._patched_add_eventX
- setattr(FileWriterX, 'trains', None)
+ setattr(FileWriterX, 'clearml', None)
except ImportError:
# this is a new version of TensorflowX
pass
@@ -899,38 +899,38 @@ class PatchSummaryToEventTransformer(object):
@staticmethod
def _patched_add_eventT(self, *args, **kwargs):
- if not hasattr(self, 'trains') or not PatchSummaryToEventTransformer.__main_task:
+ if not hasattr(self, 'clearml') or not PatchSummaryToEventTransformer.__main_task:
return PatchSummaryToEventTransformer._original_add_eventT(self, *args, **kwargs)
- if not self.trains:
+ if not self.clearml: # noqa
# noinspection PyBroadException
try:
logdir = self.get_logdir()
except Exception:
logdir = None
- self.trains = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
+ self.clearml = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
logdir=logdir, **PatchSummaryToEventTransformer.defaults_dict)
# noinspection PyBroadException
try:
- self.trains.add_event(*args, **kwargs)
+ self.clearml.add_event(*args, **kwargs)
except Exception:
pass
return PatchSummaryToEventTransformer._original_add_eventT(self, *args, **kwargs)
@staticmethod
def _patched_add_eventX(self, *args, **kwargs):
- if not hasattr(self, 'trains') or not PatchSummaryToEventTransformer.__main_task:
+ if not hasattr(self, 'clearml') or not PatchSummaryToEventTransformer.__main_task:
return PatchSummaryToEventTransformer._original_add_eventX(self, *args, **kwargs)
- if not self.trains:
+ if not self.clearml:
# noinspection PyBroadException
try:
logdir = self.get_logdir()
except Exception:
logdir = None
- self.trains = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
+ self.clearml = EventTrainsWriter(PatchSummaryToEventTransformer.__main_task.get_logger(),
logdir=logdir, **PatchSummaryToEventTransformer.defaults_dict)
# noinspection PyBroadException
try:
- self.trains.add_event(*args, **kwargs)
+ self.clearml.add_event(*args, **kwargs)
except Exception:
pass
return PatchSummaryToEventTransformer._original_add_eventX(self, *args, **kwargs)
@@ -947,17 +947,17 @@ class PatchSummaryToEventTransformer(object):
@staticmethod
def _patched_getattribute_(self, attr, get_base):
- # no main task, zero chance we have an Trains event logger
+ # no main task, zero chance we have a ClearML event logger
if PatchSummaryToEventTransformer.__main_task is None:
return get_base(self, attr)
- # check if we already have an Trains event logger
+ # check if we already have a ClearML event logger
__dict__ = get_base(self, '__dict__')
if 'event_writer' not in __dict__ or \
isinstance(__dict__['event_writer'], (ProxyEventsWriter, EventTrainsWriter)):
return get_base(self, attr)
- # patch the events writer field, and add a double Event Logger (Trains and original)
+ # patch the events writer field, and add a double Event Logger (ClearML and original)
base_eventwriter = __dict__['event_writer']
# noinspection PyBroadException
try:
@@ -1062,7 +1062,7 @@ class PatchModelCheckPointCallback(object):
if PatchModelCheckPointCallback.__original_getattribute is None and callbacks is not None:
PatchModelCheckPointCallback.__original_getattribute = callbacks.ModelCheckpoint.__getattribute__
callbacks.ModelCheckpoint.__getattribute__ = PatchModelCheckPointCallback._patched_getattribute
- setattr(callbacks.ModelCheckpoint, 'trains',
+ setattr(callbacks.ModelCheckpoint, 'clearml',
property(PatchModelCheckPointCallback.trains_object))
except Exception as ex:
@@ -1072,17 +1072,17 @@ class PatchModelCheckPointCallback(object):
def _patched_getattribute(self, attr):
get_base = PatchModelCheckPointCallback.__original_getattribute
- # no main task, zero chance we have an Trains event logger
+ # no main task, zero chance we have a ClearML event logger
if PatchModelCheckPointCallback.__main_task is None:
return get_base(self, attr)
- # check if we already have an Trains event logger
+ # check if we already have a ClearML event logger
__dict__ = get_base(self, '__dict__')
if 'model' not in __dict__ or \
isinstance(__dict__['model'], _ModelAdapter):
return get_base(self, attr)
- # patch the events writer field, and add a double Event Logger (Trains and original)
+ # patch the events writer field, and add a double Event Logger (ClearML and original)
base_model = __dict__['model']
defaults_dict = __dict__.get('_trains_defaults') or PatchModelCheckPointCallback.defaults_dict
output_model = OutputModel(
diff --git a/clearml/cli/__init__.py b/clearml/cli/__init__.py
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/clearml/cli/__init__.py
@@ -0,0 +1 @@
+
diff --git a/clearml/cli/config/__init__.py b/clearml/cli/config/__init__.py
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/clearml/cli/config/__init__.py
@@ -0,0 +1 @@
+
diff --git a/clearml/config/default/__main__.py b/clearml/cli/config/__main__.py
similarity index 87%
rename from clearml/config/default/__main__.py
rename to clearml/cli/config/__main__.py
index 410bacf6..ae4c2abf 100644
--- a/clearml/config/default/__main__.py
+++ b/clearml/cli/config/__main__.py
@@ -1,4 +1,4 @@
-""" Trains configuration wizard"""
+""" ClearML configuration wizard"""
from __future__ import print_function
import argparse
@@ -8,22 +8,23 @@ from pathlib2 import Path
from six.moves import input
from six.moves.urllib.parse import urlparse
-from trains.backend_api.session import Session
-from trains.backend_api.session.defs import ENV_HOST
-from trains.backend_config.defs import LOCAL_CONFIG_FILES, LOCAL_CONFIG_FILE_OVERRIDE_VAR
-from trains.config import config_obj
-from trains.utilities.pyhocon import ConfigFactory, ConfigMissingException
+from clearml.backend_api.session import Session
+from clearml.backend_api.session.defs import ENV_HOST
+from clearml.backend_config.defs import LOCAL_CONFIG_FILES, LOCAL_CONFIG_FILE_OVERRIDE_VAR
+from clearml.config import config_obj
+from clearml.utilities.pyhocon import ConfigFactory, ConfigMissingException
description = "\n" \
- "Please create new trains credentials through the profile page in " \
- "your trains web app (e.g. http://localhost:8080/profile)\n" \
+ "Please create new clearml credentials through the profile page in " \
+ "your clearml web app (e.g. http://localhost:8080/profile) \n"\
+ "Or with the free hosted service at https://app.community.clear.ml/profile\n" \
"In the profile page, press \"Create new credentials\", then press \"Copy to clipboard\".\n" \
"\n" \
"Paste copied configuration here:\n"
host_description = """
Editing configuration file: {CONFIG_FILE}
-Enter the url of the trains-server's Web service, for example: {HOST}
+Enter the url of the clearml-server's Web service, for example: {HOST}
"""
# noinspection PyBroadException
@@ -40,7 +41,12 @@ def validate_file(string):
def main():
- default_config_file = os.getenv(LOCAL_CONFIG_FILE_OVERRIDE_VAR) or LOCAL_CONFIG_FILES[0]
+ default_config_file = LOCAL_CONFIG_FILE_OVERRIDE_VAR.get()
+ if not default_config_file:
+ for f in LOCAL_CONFIG_FILES:
+ default_config_file = f
+ if os.path.exists(os.path.expanduser(os.path.expandvars(f))):
+ break
p = argparse.ArgumentParser(description=__doc__)
p.add_argument(
@@ -51,16 +57,20 @@ def main():
args = p.parse_args()
- print('TRAINS SDK setup process')
+ print('ClearML SDK setup process')
- conf_file = Path(args.file).absolute()
+ conf_file = Path(os.path.expanduser(args.file)).absolute()
if conf_file.exists() and conf_file.is_file() and conf_file.stat().st_size > 0:
print('Configuration file already exists: {}'.format(str(conf_file)))
print('Leaving setup, feel free to edit the configuration file.')
return
print(description, end='')
sentinel = ''
- parse_input = '\n'.join(iter(input, sentinel))
+ parse_input = ''
+ for line in iter(input, sentinel):
+ parse_input += line+'\n'
+ if line.rstrip() == '}':
+ break
credentials = None
api_server = None
web_server = None
@@ -104,7 +114,7 @@ def main():
files_host = input_url('File Store Host', files_host)
- print('\nTRAINS Hosts configuration:\nWeb App: {}\nAPI: {}\nFile Store: {}\n'.format(
+ print('\nClearML Hosts configuration:\nWeb App: {}\nAPI: {}\nFile Store: {}\n'.format(
web_host, api_host, files_host))
retry = 1
@@ -121,7 +131,7 @@ def main():
# noinspection PyBroadException
try:
- default_sdk_conf = Path(__file__).parent.absolute() / 'sdk.conf'
+ default_sdk_conf = Path(__file__).absolute().parents[2] / 'config/default/sdk.conf'
with open(str(default_sdk_conf), 'rt') as f:
default_sdk = f.read()
except Exception:
@@ -130,14 +140,14 @@ def main():
# noinspection PyBroadException
try:
with open(str(conf_file), 'wt') as f:
- header = '# TRAINS SDK configuration file\n' \
+ header = '# ClearML SDK configuration file\n' \
'api {\n' \
' # Notice: \'host\' is the api server (default port 8008), not the web server.\n' \
' api_server: %s\n' \
' web_server: %s\n' \
' files_server: %s\n' \
' # Credentials are generated using the webapp, %s/profile\n' \
- ' # Override with os environment: TRAINS_API_ACCESS_KEY / TRAINS_API_SECRET_KEY\n' \
+ ' # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY\n' \
' credentials {"access_key": "%s", "secret_key": "%s"}\n' \
'}\n' \
'sdk ' % (api_host, web_host, files_host,
@@ -149,7 +159,7 @@ def main():
return
print('\nNew configuration stored in {}'.format(str(conf_file)))
- print('TRAINS setup completed successfully.')
+ print('ClearML setup completed successfully.')
def parse_host(parsed_host, allow_input=True):
@@ -290,7 +300,7 @@ def verify_url(parse_input):
parsed_host = None
except Exception:
parsed_host = None
- print('Could not parse url {}\nEnter your trains-server host: '.format(parse_input), end='')
+ print('Could not parse url {}\nEnter your clearml-server host: '.format(parse_input), end='')
return parsed_host
diff --git a/clearml/cli/task/__init__.py b/clearml/cli/task/__init__.py
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/clearml/cli/task/__init__.py
@@ -0,0 +1 @@
+
diff --git a/clearml/cli/task/__main__.py b/clearml/cli/task/__main__.py
new file mode 100644
index 00000000..61a6ec9e
--- /dev/null
+++ b/clearml/cli/task/__main__.py
@@ -0,0 +1,119 @@
+from argparse import ArgumentParser
+
+from pathlib2 import Path
+
+from clearml.backend_interface.task.populate import CreateAndPopulate
+from clearml import Task
+
+
+def setup_parser(parser):
+ parser.add_argument('--version', action='store_true', default=None,
+ help='Display the clearml-task utility version')
+ parser.add_argument('--project', type=str, default=None,
+ help='Required: set the project name for the task. '
+ 'If --base-task-id is used, this argument is optional.')
+ parser.add_argument('--name', type=str, default=None, required=True,
+ help='Required: select a name for the remote task')
+ parser.add_argument('--repo', type=str, default=None,
+ help='remote URL for the repository to use. '
+ 'Example: --repo https://github.com/allegroai/clearml.git')
+ parser.add_argument('--branch', type=str, default=None,
+ help='Select specific repository branch/tag (implies the latest commit from the branch)')
+ parser.add_argument('--commit', type=str, default=None,
+ help='Select specific commit id to use (default: latest commit, '
+ 'or when used with local repository matching the local commit id)')
+ parser.add_argument('--folder', type=str, default=None,
+ help='Remotely execute the code in the local folder. '
+ 'Notice! It assumes a git repository already exists. '
+ 'Current state of the repo (commit id and uncommitted changes) is logged '
+ 'and will be replicated on the remote machine')
+ parser.add_argument('--script', type=str, default=None,
+ help='Specify the entry point script for the remote execution. '
+ 'When used in tandem with --repo the script should be a relative path inside '
+ 'the repository, for example: --script source/train.py. '
+ 'When used with --folder it supports a direct path to a file inside the local '
+ 'repository itself, for example: --script ~/project/source/train.py')
+ parser.add_argument('--cwd', type=str, default=None,
+ help='Working directory to launch the script from. Default: repository root folder. '
+ 'Relative to repo root or local folder')
+ parser.add_argument('--args', default=None, nargs='*',
+ help='Arguments to pass to the remote execution, list of <argument>=<value> strings. '
+ 'Currently only argparse arguments are supported. '
+ 'Example: --args lr=0.003 batch_size=64')
+ parser.add_argument('--queue', type=str, default=None,
+ help='Select the queue to launch the task. '
+ 'If not provided a Task will be created but it will not be launched.')
+ parser.add_argument('--requirements', type=str, default=None,
+ help='Specify requirements.txt file to install when setting up the session. '
+ 'If not provided, the requirements.txt from the repository will be used.')
+ parser.add_argument('--packages', default=None, nargs='*',
+ help='Manually specify a list of required packages. '
+ 'Example: --packages "tqdm>=2.1" "scikit-learn"')
+ parser.add_argument('--docker', type=str, default=None,
+ help='Select the docker image to use in the remote session')
+ parser.add_argument('--skip-task-init', action='store_true', default=None,
+ help='If set, Task.init() call is not added to the entry point, and is assumed '
+ 'to be called within the script. Default: add a Task.init() call to the entry point script')
+ parser.add_argument('--base-task-id', type=str, default=None,
+ help='Use a pre-existing task in the system, instead of a local repo/script. '
+ 'Essentially clones an existing task and overrides arguments/requirements.')
+
+
+def cli():
+ title = 'ClearML launch - launch any codebase on remote machine running clearml-agent'
+ print(title)
+ parser = ArgumentParser(description=title)
+ setup_parser(parser)
+
+ # get the args
+ args = parser.parse_args()
+
+ if args.version:
+ from ...version import __version__
+ print('Version {}'.format(__version__))
+ exit(0)
+
+ create_populate = CreateAndPopulate(
+ project_name=args.project,
+ task_name=args.name,
+ repo=args.repo or args.folder,
+ branch=args.branch,
+ commit=args.commit,
+ script=args.script,
+ working_directory=args.cwd,
+ packages=args.packages,
+ requirements_file=args.requirements,
+ base_task_id=args.base_task_id,
+ add_task_init_call=not args.skip_task_init,
+ raise_on_missing_entries=True,
+ )
+ # verify args
+ create_populate.update_task_args(args.args)
+
+ print('Creating new task')
+ create_populate.create_task()
+ # update Task args
+ create_populate.update_task_args(args.args)
+
+ print('New task created id={}'.format(create_populate.get_id()))
+ if not args.queue:
+ print('Warning: No queue was provided, leaving task in draft-mode.')
+ exit(0)
+
+ Task.enqueue(create_populate.task, queue_name=args.queue)
+ print('Task id={} sent for execution on queue {}'.format(create_populate.get_id(), args.queue))
+ print('Execution log at: {}'.format(create_populate.task.get_output_log_web_page()))
+
+
+def main():
+ try:
+ cli()
+ except KeyboardInterrupt:
+ print('\nUser aborted')
+ except Exception as ex:
+ print('\nError: {}'.format(ex))
+ exit(1)
+
+
+if __name__ == '__main__':
+ main()
diff --git a/clearml/config/__init__.py b/clearml/config/__init__.py
index 02295e06..107f8135 100644
--- a/clearml/config/__init__.py
+++ b/clearml/config/__init__.py
@@ -135,7 +135,10 @@ def dev_worker_name():
def __set_is_master_node():
# noinspection PyBroadException
try:
- force_master_node = os.environ.pop('TRAINS_FORCE_MASTER_NODE', None)
+ # pop both env vars, use the first one that is set
+ env_a = os.environ.pop('CLEARML_FORCE_MASTER_NODE', None)
+ env_b = os.environ.pop('TRAINS_FORCE_MASTER_NODE', None)
+ force_master_node = env_a or env_b
except Exception:
force_master_node = None
diff --git a/clearml/config/default/logging.conf b/clearml/config/default/logging.conf
index 6b3cb071..458695fa 100644
--- a/clearml/config/default/logging.conf
+++ b/clearml/config/default/logging.conf
@@ -2,7 +2,7 @@
version: 1
disable_existing_loggers: 0
loggers {
- trains {
+ clearml {
level: INFO
}
boto {
diff --git a/clearml/config/default/sdk.conf b/clearml/config/default/sdk.conf
index 9f481049..20c19a07 100644
--- a/clearml/config/default/sdk.conf
+++ b/clearml/config/default/sdk.conf
@@ -1,10 +1,10 @@
{
- # TRAINS - default SDK configuration
+ # ClearML - default SDK configuration
storage {
cache {
# Defaults to system temp folder / cache
- default_base_dir: "~/.trains/cache"
+ default_base_dir: "~/.clearml/cache"
}
direct_access: [
@@ -93,7 +93,7 @@
google.storage {
# # Default project and credentials file
# # Will be used when no bucket configuration is found
- # project: "trains"
+ # project: "clearml"
# credentials_json: "/path/to/credentials.json"
# # Specific credentials per bucket and sub directory
@@ -101,7 +101,7 @@
# {
# bucket: "my-bucket"
# subdir: "path/in/bucket" # Not required
- # project: "trains"
+ # project: "clearml"
# credentials_json: "/path/to/credentials.json"
# },
# ]
@@ -109,7 +109,7 @@
azure.storage {
# containers: [
# {
- # account_name: "trains"
+ # account_name: "clearml"
# account_key: "secret"
# # container_name:
# }
@@ -150,8 +150,8 @@
# do not analyze the entire repository.
force_analyze_entire_repo: false
- # If set to true, *trains* update message will not be printed to the console
- # this value can be overwritten with os environment variable TRAINS_SUPPRESS_UPDATE_MESSAGE=1
+ # If set to true, *clearml* update message will not be printed to the console
+ # this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
suppress_update_message: false
# If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
@@ -161,7 +161,7 @@
# of the Hyper-Parameters.
# multiple selected variables are supported including the suffix '*'.
# For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
- # This value can be overwritten with os environment variable TRAINS_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
+ # This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
# Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
log_os_environments: []
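
A small sketch of the environment overrides named in the comments above, set before `Task.init()` is called; the list format follows the example given in the config comment:

```python
import os

# suppress the "new version available" console message
os.environ['CLEARML_SUPPRESS_UPDATE_MESSAGE'] = '1'
# log selected OS environment variables with the task
os.environ['CLEARML_LOG_ENVIRONMENT'] = '[AWS_*, CUDA_VERSION]'
```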
diff --git a/clearml/config/defs.py b/clearml/config/defs.py
index 87a9cb5e..8fe85ac4 100644
--- a/clearml/config/defs.py
+++ b/clearml/config/defs.py
@@ -5,28 +5,28 @@ from ..backend_config.converters import base64_to_text, or_
from pathlib2 import Path
SESSION_CACHE_FILE = ".session.json"
-DEFAULT_CACHE_DIR = str(Path(tempfile.gettempdir()) / "trains_cache")
+DEFAULT_CACHE_DIR = str(Path(tempfile.gettempdir()) / "clearml_cache")
-TASK_ID_ENV_VAR = EnvEntry("TRAINS_TASK_ID", "ALG_TASK_ID")
-DOCKER_IMAGE_ENV_VAR = EnvEntry("TRAINS_DOCKER_IMAGE", "ALG_DOCKER_IMAGE")
-LOG_TO_BACKEND_ENV_VAR = EnvEntry("TRAINS_LOG_TASK_TO_BACKEND", "ALG_LOG_TASK_TO_BACKEND", type=bool)
-NODE_ID_ENV_VAR = EnvEntry("TRAINS_NODE_ID", "ALG_NODE_ID", type=int)
-PROC_MASTER_ID_ENV_VAR = EnvEntry("TRAINS_PROC_MASTER_ID", "ALG_PROC_MASTER_ID", type=str)
-LOG_STDERR_REDIRECT_LEVEL = EnvEntry("TRAINS_LOG_STDERR_REDIRECT_LEVEL", "ALG_LOG_STDERR_REDIRECT_LEVEL")
-DEV_WORKER_NAME = EnvEntry("TRAINS_WORKER_NAME", "ALG_WORKER_NAME")
-DEV_TASK_NO_REUSE = EnvEntry("TRAINS_TASK_NO_REUSE", "ALG_TASK_NO_REUSE", type=bool)
-TASK_LOG_ENVIRONMENT = EnvEntry("TRAINS_LOG_ENVIRONMENT", "ALG_LOG_ENVIRONMENT", type=str)
-TRAINS_CACHE_DIR = EnvEntry("TRAINS_CACHE_DIR", "ALG_CACHE_DIR")
+TASK_ID_ENV_VAR = EnvEntry("CLEARML_TASK_ID", "TRAINS_TASK_ID")
+DOCKER_IMAGE_ENV_VAR = EnvEntry("CLEARML_DOCKER_IMAGE", "TRAINS_DOCKER_IMAGE")
+LOG_TO_BACKEND_ENV_VAR = EnvEntry("CLEARML_LOG_TASK_TO_BACKEND", "TRAINS_LOG_TASK_TO_BACKEND", type=bool)
+NODE_ID_ENV_VAR = EnvEntry("CLEARML_NODE_ID", "TRAINS_NODE_ID", type=int)
+PROC_MASTER_ID_ENV_VAR = EnvEntry("CLEARML_PROC_MASTER_ID", "TRAINS_PROC_MASTER_ID", type=str)
+LOG_STDERR_REDIRECT_LEVEL = EnvEntry("CLEARML_LOG_STDERR_REDIRECT_LEVEL", "TRAINS_LOG_STDERR_REDIRECT_LEVEL")
+DEV_WORKER_NAME = EnvEntry("CLEARML_WORKER_NAME", "TRAINS_WORKER_NAME")
+DEV_TASK_NO_REUSE = EnvEntry("CLEARML_TASK_NO_REUSE", "TRAINS_TASK_NO_REUSE", type=bool)
+TASK_LOG_ENVIRONMENT = EnvEntry("CLEARML_LOG_ENVIRONMENT", "TRAINS_LOG_ENVIRONMENT", type=str)
+TRAINS_CACHE_DIR = EnvEntry("CLEARML_CACHE_DIR", "TRAINS_CACHE_DIR")
-LOG_LEVEL_ENV_VAR = EnvEntry("TRAINS_LOG_LEVEL", "ALG_LOG_LEVEL", converter=or_(int, str))
+LOG_LEVEL_ENV_VAR = EnvEntry("CLEARML_LOG_LEVEL", "TRAINS_LOG_LEVEL", converter=or_(int, str))
-SUPPRESS_UPDATE_MESSAGE_ENV_VAR = EnvEntry("TRAINS_SUPPRESS_UPDATE_MESSAGE", "ALG_SUPPRESS_UPDATE_MESSAGE", type=bool)
+SUPPRESS_UPDATE_MESSAGE_ENV_VAR = EnvEntry("CLEARML_SUPPRESS_UPDATE_MESSAGE", "TRAINS_SUPPRESS_UPDATE_MESSAGE", type=bool)
# Repository detection
-VCS_REPO_TYPE = EnvEntry("TRAINS_VCS_REPO_TYPE", "ALG_VCS_REPO_TYPE", default="git")
-VCS_REPOSITORY_URL = EnvEntry("TRAINS_VCS_REPO_URL", "ALG_VCS_REPO_URL")
-VCS_COMMIT_ID = EnvEntry("TRAINS_VCS_COMMIT_ID", "ALG_VCS_COMMIT_ID")
-VCS_BRANCH = EnvEntry("TRAINS_VCS_BRANCH", "ALG_VCS_BRANCH")
-VCS_ROOT = EnvEntry("TRAINS_VCS_ROOT", "ALG_VCS_ROOT")
-VCS_STATUS = EnvEntry("TRAINS_VCS_STATUS", "ALG_VCS_STATUS", converter=base64_to_text)
-VCS_DIFF = EnvEntry("TRAINS_VCS_DIFF", "ALG_VCS_DIFF", converter=base64_to_text)
+VCS_REPO_TYPE = EnvEntry("CLEARML_VCS_REPO_TYPE", "TRAINS_VCS_REPO_TYPE", default="git")
+VCS_REPOSITORY_URL = EnvEntry("CLEARML_VCS_REPO_URL", "TRAINS_VCS_REPO_URL")
+VCS_COMMIT_ID = EnvEntry("CLEARML_VCS_COMMIT_ID", "TRAINS_VCS_COMMIT_ID")
+VCS_BRANCH = EnvEntry("CLEARML_VCS_BRANCH", "TRAINS_VCS_BRANCH")
+VCS_ROOT = EnvEntry("CLEARML_VCS_ROOT", "TRAINS_VCS_ROOT")
+VCS_STATUS = EnvEntry("CLEARML_VCS_STATUS", "TRAINS_VCS_STATUS", converter=base64_to_text)
+VCS_DIFF = EnvEntry("CLEARML_VCS_DIFF", "TRAINS_VCS_DIFF", converter=base64_to_text)
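
Each renamed entry keeps the old name as a second lookup key, so (assuming `EnvEntry` checks its keys in order) both spellings keep working, with the `CLEARML_*` name taking precedence:

```python
import os

# preferred new name
os.environ['CLEARML_LOG_LEVEL'] = 'DEBUG'
# legacy name, still honored as a fallback:
# os.environ['TRAINS_LOG_LEVEL'] = 'DEBUG'
```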
diff --git a/clearml/datasets/__init__.py b/clearml/datasets/__init__.py
new file mode 100644
index 00000000..84f8c2a8
--- /dev/null
+++ b/clearml/datasets/__init__.py
@@ -0,0 +1,6 @@
+from .dataset import FileEntry, Dataset
+
+__all__ = [
+ "FileEntry",
+ "Dataset",
+]
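
A minimal end-to-end sketch of the new `Dataset` API defined in the file below, using only methods that appear in this diff; the project, name and paths are illustrative, and `Dataset.get()` is assumed to accept a `dataset_id`, as hinted by the `Dataset.create(...)` / `Dataset.get(...)` docstring:

```python
from clearml import Dataset

# create a new dataset version, register local files, upload and finalize
ds = Dataset.create(dataset_project='examples', dataset_name='my_dataset')
ds.add_files(path='/data/my_dataset', wildcard='*.jpg', recursive=True)
ds.upload()    # compress pending files and store them as a Task artifact
ds.finalize()  # fails if upload() was not called first

# later: fetch a read-only local copy of the entire dataset
ds = Dataset.get(dataset_id=ds.id)
local_folder = ds.get_local_copy()
print(local_folder)
print(ds.list_files()[:5])  # relative paths registered in this version
```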
diff --git a/clearml/datasets/dataset.py b/clearml/datasets/dataset.py
new file mode 100644
index 00000000..a8de4c00
--- /dev/null
+++ b/clearml/datasets/dataset.py
@@ -0,0 +1,1244 @@
+import json
+import os
+import shutil
+from copy import deepcopy, copy
+from fnmatch import fnmatch
+from multiprocessing import cpu_count
+from multiprocessing.pool import ThreadPool
+from tempfile import mkstemp, mkdtemp
+from typing import Union, Optional, Sequence, List, Dict, Any, Mapping
+from zipfile import ZipFile, ZIP_DEFLATED
+
+import humanfriendly
+from attr import attrs, attrib
+from pathlib2 import Path
+
+from .. import Task, StorageManager
+from ..backend_api.session.client import APIClient
+from ..backend_interface.util import mutually_exclusive, exact_match_regex
+from ..debugging.log import LoggerRoot
+from ..storage.helper import StorageHelper
+from ..storage.cache import CacheManager
+from ..storage.util import sha256sum, is_windows, md5text
+
+try:
+ from pathlib import Path as _Path # noqa
+except ImportError:
+ _Path = None
+
+
+@attrs
+class FileEntry:
+ relative_path = attrib(default=None, type=str)
+ hash = attrib(default=None, type=str)
+ parent_dataset_id = attrib(default=None, type=str)
+ size = attrib(default=None, type=int)
+ # cleared when file is uploaded.
+ local_path = attrib(default=None, type=str)
+
+ def as_dict(self):
+ # type: () -> Dict
+ state = dict(relative_path=self.relative_path, hash=self.hash,
+ parent_dataset_id=self.parent_dataset_id, size=self.size,
+ **dict([('local_path', self.local_path)] if self.local_path else ()))
+ return state
+
+
+class Dataset(object):
+ __private_magic = 42 * 1337
+ __state_entry_name = 'state'
+ __data_entry_name = 'data'
+ __cache_context = 'datasets'
+ __tag = 'dataset'
+ __cache_folder_prefix = 'ds_'
+ __dataset_folder_template = CacheManager.set_context_folder_lookup(__cache_context, "{0}_archive_{1}")
+ __preview_max_file_entries = 15000
+ __preview_max_size = 5 * 1024 * 1024
+
+ def __init__(self, _private, task=None, dataset_project=None, dataset_name=None):
+ # type: (int, Optional[Task], Optional[str], Optional[str]) -> ()
+ """
+ Do not use directly! Use Dataset.create(...) or Dataset.get(...) instead.
+ """
+ assert _private == self.__private_magic
+ # key for the dataset file entries are the relative path within the data
+ self._dataset_file_entries = {} # type: Dict[str, FileEntry]
+ # this will create a graph of all the dependencies we have, each entry lists its own direct parents
+ self._dependency_graph = {} # type: Dict[str, List[str]]
+ if not task:
+ task = Task.create(
+ project_name=dataset_project, task_name=dataset_name, task_type=Task.TaskTypes.data_processing)
+ task.set_system_tags((task.get_system_tags() or []) + [self.__tag])
+ task.mark_started()
+ # generate the script section
+ script = \
+ 'from clearml import Dataset\n\n' \
+ 'ds = Dataset.create(dataset_project=\'{dataset_project}\', dataset_name=\'{dataset_name}\')\n'.format(
+ dataset_project=dataset_project, dataset_name=dataset_name)
+ task.data.script.diff = script
+ task.data.script.working_dir = '.'
+ task.data.script.entry_point = 'register_dataset.py'
+ from clearml import __version__
+ task.data.script.requirements = {'pip': 'clearml == {}\n'.format(__version__)}
+ # noinspection PyProtectedMember
+ task._edit(script=task.data.script)
+
+ # store current dataset Task
+ self._task = task
+ # store current dataset id
+ self._id = task.id
+ # store the folder where the dataset was downloaded to
+ self._local_base_folder = None # type: Optional[Path]
+ # dirty flag, set True by any function call changing the dataset (regardless of whether it did anything)
+ self._dirty = False
+
+ @property
+ def id(self):
+ # type: () -> str
+ return self._id
+
+ @property
+ def file_entries(self):
+ # type: () -> List[FileEntry]
+ return list(self._dataset_file_entries.values())
+
+ @property
+ def file_entries_dict(self):
+ # type: () -> Mapping[str, FileEntry]
+ """
+ Notice this call returns an internal representation, do not modify!
+ :return: dict with relative file path as key, and FileEntry as value
+ """
+ return self._dataset_file_entries
+
+ @property
+ def project(self):
+ # type: () -> str
+ return self._task.get_project_name()
+
+ @property
+ def name(self):
+ # type: () -> str
+ return self._task.name
+
+ @property
+ def tags(self):
+ # type: () -> List[str]
+ return self._task.get_tags() or []
+
+ @tags.setter
+ def tags(self, values):
+ # type: (List[str]) -> ()
+ self._task.set_tags(values or [])
+
+ def add_files(self,
+ path, # type: Union[str, Path, _Path]
+ wildcard=None, # type: Optional[Union[str, Sequence[str]]]
+ local_base_folder=None, # type: Optional[str]
+ dataset_path=None, # type: Optional[str]
+ recursive=True, # type: bool
+ verbose=False # type: bool
+ ):
+ # type: (...) -> ()
+ """
+ Add a folder/file to the current dataset: calculate each file hash,
+ compare against the parent version, and mark new/modified files to be uploaded
+
+ :param path: Add a folder/file to the dataset
+ :param wildcard: add only specific set of files.
+ Wildcard matching, can be a single string or a list of wildcards)
+ :param local_base_folder: files will be located based on their relative path from local_base_folder
+ :param dataset_path: where in the dataset the folder/files should be located
+ :param recursive: If True match all wildcard files recursively
+ :param verbose: If True print to console files added/modified
+ :return: number of files added
+ """
+ self._dirty = True
+ self._task.get_logger().report_text(
+ 'Adding files to dataset: {}'.format(
+ dict(path=path, wildcard=wildcard, local_base_folder=local_base_folder,
+ dataset_path=dataset_path, recursive=recursive, verbose=verbose)),
+ print_console=False)
+
+ num_added = self._add_files(
+ path=path, wildcard=wildcard, local_base_folder=local_base_folder,
+ dataset_path=dataset_path, recursive=recursive, verbose=verbose)
+
+ # update the task script
+ self._add_script_call(
+ 'add_files', path=path, wildcard=wildcard, local_base_folder=local_base_folder,
+ dataset_path=dataset_path, recursive=recursive)
+
+ self._serialize()
+
+ return num_added
+
+ def remove_files(self, dataset_path=None, recursive=True, verbose=False):
+ # type: (Optional[str], bool, bool) -> int
+ """
+ Remove files from the current dataset based on their relative path
+ (or a wildcard pattern) inside the dataset
+
+ :param dataset_path: Remove files from the dataset.
+ The path is always relative to the dataset (e.g 'folder/file.bin')
+ :param recursive: If True match all wildcard files recursively
+ :param verbose: If True print to console files removed
+ :return: Number of files removed
+ """
+ self._dirty = True
+ self._task.get_logger().report_text(
+ 'Removing files from dataset: {}'.format(
+ dict(dataset_path=dataset_path, recursive=recursive, verbose=verbose)),
+ print_console=False)
+
+ if dataset_path and dataset_path.startswith('/'):
+ dataset_path = dataset_path[1:]
+
+ num_files = len(self._dataset_file_entries)
+ org_files = list(self._dataset_file_entries.keys()) if verbose else None
+
+ if not recursive:
+ self._dataset_file_entries = {
+ k: v for k, v in self._dataset_file_entries.items()
+ if not fnmatch(k + '/', dataset_path + '/')}
+ else:
+ wildcard = dataset_path.split('/')[-1]
+ path = dataset_path[:-len(wildcard)] + '*'
+
+ self._dataset_file_entries = {
+ k: v for k, v in self._dataset_file_entries.items()
+ if not (fnmatch(k, path) and fnmatch(k, '*/' + wildcard))}
+
+ if verbose and org_files:
+ for f in org_files:
+ if f not in self._dataset_file_entries:
+ self._task.get_logger().report_text('Remove {}'.format(f))
+
+ # update the task script
+ self._add_script_call(
+ 'remove_files', dataset_path=dataset_path, recursive=recursive)
+
+ self._serialize()
+
+ return num_files - len(self._dataset_file_entries)
+
+ def sync_folder(self, local_path, dataset_path=None, verbose=False):
+ # type: (Union[Path, _Path, str], Union[Path, _Path, str], bool) -> (int, int)
+ """
+ Synchronize the dataset with a local folder. The dataset is synchronized from the
+ relative_base_folder (default: dataset root) and deeper with the specified local path.
+
+ :param local_path: Local folder to sync (assumes all files and recursive)
+ :param dataset_path: Target dataset path to sync with (default the root of the dataset)
+ :param verbose: If true print to console files added/modified/removed
+ :return: number of files removed, number of files modified/added
+ """
+ def filter_f(f):
+ keep = (not f.relative_path.startswith(relative_prefix) or
+ (local_path / f.relative_path[len(relative_prefix):]).is_file())
+ if not keep and verbose:
+ self._task.get_logger().report_text('Remove {}'.format(f.relative_path))
+ return keep
+
+ self._task.get_logger().report_text(
+ 'Syncing local copy with dataset: {}'.format(
+ dict(local_path=local_path, dataset_path=dataset_path, verbose=verbose)),
+ print_console=False)
+
+ self._dirty = True
+ local_path = Path(local_path)
+
+ # Path().as_posix() will never end with /
+ relative_prefix = (Path(dataset_path).as_posix() + '/') if dataset_path else ''
+
+ # remove files
+ num_files = len(self._dataset_file_entries)
+ self._dataset_file_entries = {
+ k: f for k, f in self._dataset_file_entries.items() if filter_f(f)}
+ removed_files = num_files - len(self._dataset_file_entries)
+
+ # add remaining files
+ added_files = self._add_files(path=local_path, dataset_path=dataset_path, recursive=True, verbose=verbose)
+
+ if verbose:
+ self._task.get_logger().report_text(
+ 'Syncing folder {} : {} files removed, {} added / modified'.format(
+ local_path.as_posix(), removed_files, added_files))
+
+ # update the task script
+ self._add_script_call(
+ 'sync_folder', local_path=local_path, dataset_path=dataset_path)
+
+ self._serialize()
+ return removed_files, added_files
+
+ def upload(self, show_progress=True, verbose=False, output_url=None, compression=None):
+ # type: (bool, bool, Optional[str], Optional[str]) -> ()
+ """
+ Start file uploading, the function returns when all files are uploaded.
+
+ :param show_progress: If True show upload progress bar
+ :param verbose: If True print verbose progress report
+ :param output_url: Target storage for the compressed dataset (default: file server)
+ Examples: `s3://bucket/data`, `gs://bucket/data` , `azure://bucket/data` , `/mnt/share/data`
+ :param compression: Compression algorithm for the Zipped dataset file (default: ZIP_DEFLATED)
+ """
+ # set output_url
+ if output_url:
+ self._task.output_uri = output_url
+
+ self._task.get_logger().report_text(
+ 'Uploading dataset files: {}'.format(
+ dict(show_progress=show_progress, verbose=verbose, output_url=output_url, compression=compression)),
+ print_console=False)
+
+ fd, zip_file = mkstemp(
+ prefix='dataset.{}.'.format(self._id), suffix='.zip'
+ )
+ archive_preview = ''
+ count = 0
+ try:
+ with ZipFile(zip_file, 'w', allowZip64=True, compression=compression or ZIP_DEFLATED) as zf:
+ for file_entry in self._dataset_file_entries.values():
+ if not file_entry.local_path:
+ # file is located in a different version
+ continue
+ filename = Path(file_entry.local_path)
+ if not filename.is_file():
+ LoggerRoot.get_base_logger().warning(
+ "Could not store dataset file {}. File skipped".format(file_entry.local_path))
+ # mark for removal
+ file_entry.relative_path = None
+ continue
+ if verbose:
+ self._task.get_logger().report_text('Compressing {}'.format(filename.as_posix()))
+
+ relative_file_name = file_entry.relative_path
+ zf.write(filename.as_posix(), arcname=relative_file_name)
+ archive_preview += '{} - {}\n'.format(
+ relative_file_name, humanfriendly.format_size(filename.stat().st_size))
+ file_entry.local_path = None
+ count += 1
+ except Exception as e:
+ # failed uploading folder:
+ LoggerRoot.get_base_logger().warning(
+ 'Exception {}\nFailed zipping dataset.'.format(e))
+ return False
+ finally:
+ os.close(fd)
+
+ zip_file = Path(zip_file)
+
+ if not count:
+ zip_file.unlink()
+ LoggerRoot.get_base_logger().warning('No pending files, upload aborted')
+ return False
+
+ archive_preview = 'Dataset archive content [{} files]:\n'.format(count) + archive_preview
+
+ # noinspection PyBroadException
+ try:
+ # let's try to rename it
+ new_zip_file = zip_file.parent / 'dataset.{}.zip'.format(self._id)
+ zip_file.rename(new_zip_file)
+ zip_file = new_zip_file
+ except Exception:
+ pass
+ # remove files that could not be zipped (marked above with a None relative path)
+ self._dataset_file_entries = {k: v for k, v in self._dataset_file_entries.items()
+ if v.relative_path is not None}
+ # start upload
+ zip_file_size = humanfriendly.format_size(Path(zip_file).stat().st_size)
+ self._task.get_logger().report_text(
+ 'Uploading compressed dataset changes ({} files, total {})'.format(count, zip_file_size))
+ self._task.upload_artifact(
+ name=self.__data_entry_name, artifact_object=Path(zip_file), preview=archive_preview,
+ delete_after_upload=True, wait_on_upload=True)
+ self._task.get_logger().report_text('Upload completed ({})'.format(zip_file_size))
+
+ self._add_script_call(
+ 'upload', show_progress=show_progress, verbose=verbose, output_url=output_url, compression=compression)
+
+ self._dirty = False
+ self._serialize()
+
+ def finalize(self, verbose=False, raise_on_error=True):
+ # type: (bool, bool) -> bool
+ """
+ Finalize the dataset and publish the dataset Task.
+ If files are still pending upload, raise an exception (or return False)
+
+ :param verbose: If True print verbose progress report
+ :param raise_on_error: If True raise exception if dataset finalizing failed
+ """
+ # check we do not have files waiting for upload.
+ if self._dirty:
+ if raise_on_error:
+ raise ValueError("Cannot finalize dataset, pending uploads. Call Dataset.upload(...)")
+ return False
+
+ self._task.get_logger().report_text('Finalizing dataset', print_console=False)
+
+ # make sure we have no redundant parent versions
+ self._serialize()
+ self._add_script_call('finalize')
+ if verbose:
+ print('Updating statistics and genealogy')
+ self._report_dataset_genealogy()
+ hashed_nodes = [self._get_dataset_id_hash(k) for k in self._dependency_graph.keys()]
+ self._task.comment = 'Dependencies: {}\n'.format(hashed_nodes)
+ self._task.close()
+ self._task.completed()
+ return True
+
+ def is_final(self):
+ # type: () -> bool
+ """
+ Return True if the dataset was finalized and cannot be changed any more.
+
+ :return: True if the dataset is final
+ """
+ return self._task.get_status() not in (
+ Task.TaskStatusEnum.in_progress, Task.TaskStatusEnum.created, Task.TaskStatusEnum.failed)
+
+ def get_local_copy(self, use_soft_links=None, raise_on_error=True):
+ # type: (bool, bool) -> str
+ """
+ Return a base folder with a read-only (immutable) local copy of the entire dataset,
+ downloading and copying / soft-linking files from all the parent dataset versions
+
+ :param use_soft_links: If True, use soft links. Default: False on Windows, True on POSIX systems
+ :param raise_on_error: If True raise exception if dataset merging failed on any file
+ :return: A base folder for the entire dataset
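+
+ A minimal usage sketch for fetching a read-only copy (the dataset id below is a placeholder):
+
+ .. code-block:: py
+
+ ds = Dataset.get(dataset_id='<dataset_id>')
+ local_folder = ds.get_local_copy()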
+ """
+ assert self._id
+ if not self._task:
+ self._task = Task.get_task(task_id=self._id)
+
+ # now let's merge the parents
+ target_folder = self._merge_datasets(use_soft_links=use_soft_links, raise_on_error=raise_on_error)
+ return target_folder
+
+ def get_mutable_local_copy(self, target_folder, overwrite=False, raise_on_error=True):
+ # type: (Union[Path, _Path, str], bool, bool) -> Optional[str]
+ """
+ Return a base folder with a writable (mutable) local copy of the entire dataset,
+ downloading and copying files from all the parent dataset versions
+
+ :param target_folder: Target folder for the writable copy
+ :param overwrite: If True, recursively delete the target folder before creating a copy.
+ If False (default) and target folder contains files, raise exception or return None
+ :param raise_on_error: If True raise exception if dataset merging failed on any file
+ :return: The target folder containing the entire dataset
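+
+ A minimal usage sketch for creating a writable copy (project, dataset name and target path below are placeholders):
+
+ .. code-block:: py
+
+ ds = Dataset.get(dataset_project='data', dataset_name='images')
+ writable_copy = ds.get_mutable_local_copy(target_folder='/tmp/my_dataset_copy', overwrite=True)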
+ """
+ assert self._id
+ target_folder = Path(target_folder)
+ target_folder.mkdir(parents=True, exist_ok=True)
+ # noinspection PyBroadException
+ try:
+ target_folder.rmdir()
+ except Exception:
+ if not overwrite:
+ if raise_on_error:
+ raise ValueError("Target folder {} already contains files".format(target_folder.as_posix()))
+ else:
+ return None
+ shutil.rmtree(target_folder.as_posix())
+
+ ro_folder = self.get_local_copy(raise_on_error=raise_on_error)
+ shutil.copytree(ro_folder, target_folder.as_posix())
+ return target_folder.as_posix()
+
+ def list_files(self, dataset_path=None, recursive=True, dataset_id=None):
+ # type: (Optional[str], bool, Optional[str]) -> List[str]
+ """
+ Return a list of files in the current dataset.
+ If dataset_id is given, return a list of files that remained unchanged since the specified dataset version
+
+ :param dataset_path: Only match files matching the dataset_path (including wildcards).
+ Example: folder/sub/*.json
+ :param recursive: If True (default), match dataset_path recursively
+ :param dataset_id: Filter list based on the dataset id containing the latest version of the file.
+ Default: None, do not filter files based on parent dataset.
+ :return: List of files with relative path
+ (files might not be available locally until get_local_copy() is called)
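+
+ A minimal usage sketch (assumes ``ds`` is a Dataset object; the wildcard below is only an example):
+
+ .. code-block:: py
+
+ json_files = ds.list_files(dataset_path='annotations/*.json')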
+ """
+ files = self._dataset_file_entries.keys() if not dataset_id else \
+ [k for k, v in self._dataset_file_entries.items() if v.parent_dataset_id == dataset_id]
+
+ if not dataset_path:
+ return sorted(files)
+
+ if dataset_path.startswith('/'):
+ dataset_path = dataset_path[1:]
+
+ if not recursive:
+ return sorted([k for k in files if fnmatch(k + '/', dataset_path + '/')])
+
+ wildcard = dataset_path.split('/')[-1]
+ path = dataset_path[:-len(wildcard)] + '*'
+ return sorted([k for k in files if fnmatch(k, path) and fnmatch(k, '*/' + wildcard)])
+
+ def list_removed_files(self, dataset_id=None):
+ # type: (str) -> List[str]
+ """
+ Return a list of files removed when compared to a specific dataset version
+
+ :param dataset_id: Dataset id (str) to compare against. If None is given, compare against the parent datasets
+ :return: List of files with relative path
+ (files might not be available locally until get_local_copy() is called)
+ """
+ datasets = self._dependency_graph[self._id] if not dataset_id else [dataset_id]
+ unified_list = set()
+ for ds_id in datasets:
+ dataset = self.get(dataset_id=ds_id)
+ unified_list |= set(dataset._dataset_file_entries.keys())
+
+ removed_list = [f for f in unified_list if f not in self._dataset_file_entries]
+ return sorted(removed_list)
+
+ def list_modified_files(self, dataset_id=None):
+ # type: (str) -> List[str]
+ """
+ Return a list of files modified when compared to a specific dataset version
+
+ :param dataset_id: Dataset id (str) to compare against. If None is given, compare against the parent datasets
+ :return: List of files with relative path
+ (files might not be available locally until get_local_copy() is called)
+ """
+ datasets = self._dependency_graph[self._id] if not dataset_id else [dataset_id]
+ unified_list = dict()
+ for ds_id in datasets:
+ dataset = self.get(dataset_id=ds_id)
+ unified_list.update(dict((k, v.hash) for k, v in dataset._dataset_file_entries.items()))
+
+ modified_list = [k for k, v in self._dataset_file_entries.items()
+ if k in unified_list and v.hash != unified_list[k]]
+ return sorted(modified_list)
+
+ def list_added_files(self, dataset_id=None):
+ # type: (str) -> List[str]
+ """
+ Return a list of files added when compared to a specific dataset version
+
+ :param dataset_id: Dataset id (str) to compare against. If None is given, compare against the parent datasets
+ :return: List of files with relative path
+ (files might not be available locally until get_local_copy() is called)
+ """
+ datasets = self._dependency_graph[self._id] if not dataset_id else [dataset_id]
+ unified_list = set()
+ for ds_id in datasets:
+ dataset = self.get(dataset_id=ds_id)
+ unified_list |= set(dataset._dataset_file_entries.keys())
+
+ added_list = [f for f in self._dataset_file_entries.keys() if f not in unified_list]
+ return sorted(added_list)
+
+ def get_dependency_graph(self):
+ """
+ Return the DAG of the dataset dependencies (all previous dataset versions and their parents).
+ Example:
+ {
+ 'current_dataset_id': ['parent_1_id', 'parent_2_id'],
+ 'parent_2_id': ['parent_1_id'],
+ 'parent_1_id': [],
+ }
+
+ :return: dict representing the genealogy dag graph of the current dataset
+ """
+ return deepcopy(self._dependency_graph)
+
+ def verify_dataset_hash(self, local_copy_path=None, skip_hash=False, verbose=False):
+ # type: (Optional[str], bool, bool) -> List[str]
+ """
+ Verify the current copy of the dataset against the stored hash
+
+ :param local_copy_path: Specify a local path containing a copy of the dataset.
+ If not provided, the cached folder is used
+ :param skip_hash: If True, skip hash checks and verify file size only
+ :param verbose: If True print errors while testing dataset files hash
+ :return: List of files with unmatched hashes
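+
+ A minimal usage sketch (assumes ``ds`` is a Dataset object with a local copy available):
+
+ .. code-block:: py
+
+ mismatched_files = ds.verify_dataset_hash(skip_hash=False, verbose=True)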
+ """
+ local_path = local_copy_path or self.get_local_copy()
+
+ def compare(file_entry):
+ file_entry_copy = copy(file_entry)
+ file_entry_copy.local_path = (Path(local_path) / file_entry.relative_path).as_posix()
+ if skip_hash:
+ file_entry_copy.size = Path(file_entry_copy.local_path).stat().st_size
+ if file_entry_copy.size != file_entry.size:
+ if verbose:
+ print('Error: file size mismatch {} expected size {} current {}'.format(
+ file_entry.relative_path, file_entry.size, file_entry_copy.size))
+ return file_entry
+ else:
+ self._calc_file_hash(file_entry_copy)
+ if file_entry_copy.hash != file_entry.hash:
+ if verbose:
+ print('Error: hash mismatch {} expected size/hash {}/{} recalculated {}/{}'.format(
+ file_entry.relative_path,
+ file_entry.size, file_entry.hash,
+ file_entry_copy.size, file_entry_copy.hash))
+ return file_entry
+
+ return None
+
+ pool = ThreadPool(cpu_count() * 2)
+ matching_errors = pool.map(compare, self._dataset_file_entries.values())
+ pool.close()
+ return [f.relative_path for f in matching_errors if f is not None]
+
+ @classmethod
+ def create(cls, dataset_project, dataset_name, parent_datasets=None):
+ # type: (str, str, Optional[Sequence[Union[str, Dataset]]]) -> Dataset
+ """
+ Create a new dataset. Multiple dataset parents are supported.
+ Merging of parent datasets is done based on the order,
+ where each one can override overlapping files in the previous parent
+
+ :param dataset_project: Project containing the dataset
+ :param dataset_name: Name of the new dataset
+ :param parent_datasets: Expand a parent dataset by adding/removing files
+ :return: Newly created Dataset object
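+
+ A minimal sketch of the typical create / add / upload / finalize flow
+ (project, dataset name, local path and parent id below are placeholders; ``add_files`` is assumed to accept a local folder path):
+
+ .. code-block:: py
+
+ ds = Dataset.create(dataset_project='data', dataset_name='images-v2', parent_datasets=['<parent_dataset_id>'])
+ ds.add_files(path='/path/to/new_images')
+ ds.upload()
+ ds.finalize()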
+ """
+ parent_datasets = [cls.get(dataset_id=p) if isinstance(p, str) else p for p in (parent_datasets or [])]
+ if any(not p.is_final() for p in parent_datasets):
+ raise ValueError("Cannot inherit from a parent that was not finalized/closed")
+
+ # merge datasets according to order
+ dataset_file_entries = {}
+ dependency_graph = {}
+ for p in parent_datasets:
+ dataset_file_entries.update(deepcopy(p._dataset_file_entries))
+ dependency_graph.update(deepcopy(p._dependency_graph))
+ instance = cls(_private=cls.__private_magic, dataset_project=dataset_project, dataset_name=dataset_name)
+ instance._task.get_logger().report_text('Dataset created', print_console=False)
+ instance._dataset_file_entries = dataset_file_entries
+ instance._dependency_graph = dependency_graph
+ instance._dependency_graph[instance._id] = [p._id for p in parent_datasets]
+ instance._serialize()
+ instance._task.flush(wait_for_uploads=True)
+ return instance
+
+ @classmethod
+ def delete(cls, dataset_id=None, dataset_project=None, dataset_name=None, force=False):
+ # type: (Optional[str], Optional[str], Optional[str], bool) -> ()
+ """
+ Delete a dataset, raise exception if dataset is used by other dataset versions.
+ Use force=True to forcefully delete the dataset
+
+ :param dataset_id: Dataset id to delete
+ :param dataset_project: Project containing the dataset
+ :param dataset_name: Name of the dataset to delete
+ :param force: If True delete even if other datasets depend on the specified dataset version
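+
+ A minimal usage sketch (the dataset id below is a placeholder):
+
+ .. code-block:: py
+
+ Dataset.delete(dataset_id='<dataset_id>', force=False)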
+ """
+ mutually_exclusive(dataset_id=dataset_id, dataset_project=dataset_project)
+ mutually_exclusive(dataset_id=dataset_id, dataset_name=dataset_name)
+ if not dataset_id:
+ tasks = Task.get_tasks(
+ project_name=dataset_project,
+ task_name=exact_match_regex(dataset_name) if dataset_name else None,
+ task_filter=dict(
+ system_tags=[cls.__tag],
+ type=[str(Task.TaskTypes.data_processing)],
+ page_size=2, page=0,)
+ )
+ if not tasks:
+ raise ValueError("Dataset project={} name={} could not be found".format(dataset_project, dataset_name))
+ if len(tasks) > 1:
+ raise ValueError("Too many datasets matching project={} name={}".format(dataset_project, dataset_name))
+ dataset_id = tasks[0].id
+
+ # check if someone is using the datasets
+ if not force:
+ # noinspection PyProtectedMember
+ dependencies = Task._query_tasks(
+ system_tags=[cls.__tag],
+ type=[str(Task.TaskTypes.data_processing)],
+ only_fields=['created', 'id', 'name'],
+ search_text='{}'.format(cls._get_dataset_id_hash(dataset_id))
+ )
+ # filter us out
+ if dependencies:
+ dependencies = [d for d in dependencies if d.id != dataset_id]
+ if dependencies:
+ raise ValueError("Dataset id={} is used by datasets: {}".format(
+ dataset_id, [d.id for d in dependencies]))
+
+ client = APIClient()
+ # notice the force here is a must, since the state is never draft
+ # noinspection PyBroadException
+ try:
+ t = client.tasks.get_by_id(dataset_id)
+ except Exception:
+ t = None
+ if not t:
+ raise ValueError("Dataset id={} could not be found".format(dataset_id))
+ if str(t.type) != str(Task.TaskTypes.data_processing) or cls.__tag not in t.system_tags:
+ raise ValueError("Dataset id={} is not of type Dataset".format(dataset_id))
+
+ task = Task.get_task(task_id=dataset_id)
+ # first delete all the artifacts from the dataset
+ for artifact in task.artifacts.values():
+ h = StorageHelper.get(artifact.url)
+ # noinspection PyBroadException
+ try:
+ h.delete(artifact.url)
+ except Exception as ex:
+ LoggerRoot.get_base_logger().warning('Failed deleting remote file \'{}\': {}'.format(
+ artifact.url, ex))
+
+ # now delete the actual task
+ client.tasks.delete(task=dataset_id, force=True)
+
+ @classmethod
+ def get(cls, dataset_id=None, dataset_project=None, dataset_name=None, only_completed=False, only_published=False):
+ # type: (Optional[str], Optional[str], Optional[str], bool , bool) -> Dataset
+ """
+ Get a specific Dataset. If only dataset_project is given, return the last Dataset in the Dataset project
+
+ :param dataset_id: Requested Dataset ID
+ :param dataset_project: Requested Dataset project name
+ :param dataset_name: Requested Dataset name
+ :param only_completed: Return only if the requested dataset is completed or published
+ :param only_published: Return only if the requested dataset is published
+ :return: Dataset object
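+
+ A minimal usage sketch (project and dataset names below are placeholders):
+
+ .. code-block:: py
+
+ ds = Dataset.get(dataset_project='data', dataset_name='images', only_completed=True)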
+ """
+ mutually_exclusive(dataset_id=dataset_id, dataset_project=dataset_project)
+ mutually_exclusive(dataset_id=dataset_id, dataset_name=dataset_name)
+ tasks = Task.get_tasks(
+ task_ids=[dataset_id] if dataset_id else None,
+ project_name=dataset_project,
+ task_name=exact_match_regex(dataset_name) if dataset_name else None,
+ task_filter=dict(
+ system_tags=[cls.__tag, '-archived'], order_by=['-created'],
+ type=[str(Task.TaskTypes.data_processing)],
+ page_size=1, page=0,
+ status=['publish'] if only_published else
+ ['publish', 'stopped', 'completed', 'closed'] if only_completed else None)
+ )
+ if not tasks:
+ raise ValueError('Could not find Dataset {} {}'.format(
+ 'id' if dataset_id else 'project/name',
+ dataset_id if dataset_id else (dataset_project, dataset_name)))
+ task = tasks[0]
+ if task.status == 'created':
+ raise ValueError('Dataset id={} is in draft mode, delete and recreate it'.format(task.id))
+ force_download = False if task.status in ('stopped', 'published', 'closed', 'completed') else True
+ local_state_file = StorageManager.get_local_copy(
+ remote_url=task.artifacts[cls.__state_entry_name].url, cache_context=cls.__cache_context,
+ extract_archive=False, name=task.id, force_download=force_download)
+ if not local_state_file:
+ raise ValueError('Could not load Dataset id={} state'.format(task.id))
+
+ instance = cls._deserialize(local_state_file, task)
+ # remove the local copy of the state file, just in case
+ if force_download:
+ os.unlink(local_state_file)
+
+ return instance
+
+ @classmethod
+ def squash(cls, dataset_name, dataset_ids=None, dataset_project_name_pairs=None, output_url=None):
+ # type: (str, Optional[Sequence[Union[str, Dataset]]],Optional[Sequence[(str, str)]], Optional[str]) -> Dataset
+ """
+ Generate a new dataset from the squashed set of dataset versions.
+ If a single version is given, it will be squashed to the root (i.e. a single standalone version is created).
+ If a set of versions is given, the versions' diff is squashed into a single version
+
+ :param dataset_name: Target name for the newly generated squashed dataset
+ :param dataset_ids: List of dataset Ids (or objects) to squash. Notice order does matter.
+ The versions are merged from first to last.
+ :param dataset_project_name_pairs: List of pairs (project_name, dataset_name) to squash.
+ Notice order does matter. The versions are merged from first to last.
+ :param output_url: Target storage for the compressed dataset (default: file server)
+ Examples: `s3://bucket/data`, `gs://bucket/data` , `azure://bucket/data` , `/mnt/share/data`
+ :return: Newly created dataset object.
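+
+ A minimal usage sketch (the dataset ids below are placeholders):
+
+ .. code-block:: py
+
+ squashed_ds = Dataset.squash(dataset_name='images-squashed', dataset_ids=['<version_1_id>', '<version_2_id>'])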
+ """
+ mutually_exclusive(dataset_ids=dataset_ids, dataset_project_name_pairs=dataset_project_name_pairs)
+ datasets = [cls.get(dataset_id=d) for d in dataset_ids] if dataset_ids else \
+ [cls.get(dataset_project=pair[0], dataset_name=pair[1]) for pair in dataset_project_name_pairs]
+ # single dataset to squash, squash it all.
+ if len(datasets) == 1:
+ temp_folder = datasets[0].get_local_copy()
+ parents = set()
+ else:
+ parents = None
+ temp_folder = Path(mkdtemp(prefix='squash-datasets.'))
+ pool = ThreadPool()
+ for ds in datasets:
+ base_folder = Path(ds._extract_dataset_archive())
+ files = [f.relative_path for f in ds.file_entries if f.parent_dataset_id == ds.id]
+ pool.map(
+ lambda x:
+ (temp_folder / x).parent.mkdir(parents=True, exist_ok=True) or
+ shutil.copy((base_folder / x).as_posix(), (temp_folder / x).as_posix(), follow_symlinks=True),
+ files)
+ parents = set(ds._get_parents()) if parents is None else (parents & set(ds._get_parents()))
+ pool.close()
+
+ squashed_ds = cls.create(
+ dataset_project=datasets[0].project, dataset_name=dataset_name, parent_datasets=list(parents))
+ squashed_ds._task.get_logger().report_text('Squashing dataset', print_console=False)
+ squashed_ds.add_files(temp_folder)
+ squashed_ds.upload(output_url=output_url)
+ squashed_ds.finalize()
+ return squashed_ds
+
+ @classmethod
+ def list_datasets(cls, dataset_project=None, partial_name=None, tags=None, ids=None, only_completed=True):
+ # type: (Optional[str], Optional[str], Optional[Sequence[str]], Optional[Sequence[str]], bool) -> List[dict]
+ """
+ Query the list of datasets in the system
+
+ :param dataset_project: Specify dataset project name
+ :param partial_name: Specify partial match to a dataset name
+ :param tags: Specify user tags
+ :param ids: List specific dataset based on IDs list
+ :param only_completed: If False, also return datasets that are still in progress (uploading / being edited, etc.)
+ :return: List of dictionaries with dataset information
+ Example: [{'name': name, 'project': project name, 'id': dataset_id, 'created': date_created},]
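+
+ A minimal usage sketch (the project name below is a placeholder):
+
+ .. code-block:: py
+
+ dataset_list = Dataset.list_datasets(dataset_project='data', only_completed=True)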
+ """
+ # noinspection PyProtectedMember
+ datasets = Task._query_tasks(
+ task_ids=ids or None, project_name=dataset_project or None,
+ task_name=partial_name,
+ system_tags=[cls.__tag],
+ type=[str(Task.TaskTypes.data_processing)],
+ tags=tags or None,
+ status=['stopped', 'published', 'completed', 'closed'] if only_completed else None,
+ only_fields=['created', 'id', 'name', 'project']
+ )
+ project_ids = {d.project for d in datasets}
+ # noinspection PyProtectedMember
+ project_id_lookup = {d: Task._get_project_name(d) for d in project_ids}
+ return [
+ {'name': d.name, 'created': d.created, 'project': project_id_lookup[d.project], 'id': d.id}
+ for d in datasets
+ ]
+
+ def _add_files(self,
+ path, # type: Union[str, Path, _Path]
+ wildcard=None, # type: Optional[Union[str, Sequence[str]]]
+ local_base_folder=None, # type: Optional[str]
+ dataset_path=None, # type: Optional[str]
+ recursive=True, # type: bool
+ verbose=False # type: bool
+ ):
+ # type: (...) -> int
+ """
+ Add a folder to the current dataset: calculate file hashes,
+ compare against the parent dataset, and mark files to be uploaded
+
+ :param path: Add a folder/file to the dataset
+ :param wildcard: Add only a specific set of files.
+ Wildcard matching; can be a single string or a list of wildcard patterns
+ :param local_base_folder: files will be located based on their relative path from local_base_folder
+ :param dataset_path: where in the dataset the folder/files should be located
+ :param recursive: If True match all wildcard files recursively
+ :param verbose: If True, print added files to the console
+ """
+ if dataset_path and dataset_path.startswith('/'):
+ dataset_path = dataset_path[1:]
+ path = Path(path)
+ local_base_folder = Path(local_base_folder or path)
+ wildcard = wildcard or '*'
+ # single file, no need for threading
+ if path.is_file():
+ file_entry = self._calc_file_hash(
+ FileEntry(local_path=path.absolute().as_posix(),
+ relative_path=Path(dataset_path or '.') / path.relative_to(local_base_folder),
+ parent_dataset_id=self._id))
+ file_entries = [file_entry]
+ else:
+ # if not a folder raise exception
+ if not path.is_dir():
+ raise ValueError("Could not find file/folder \'{}\'", path.as_posix())
+
+ # prepare a list of files
+ files = list(path.rglob(wildcard)) if recursive else list(path.glob(wildcard))
+ file_entries = [
+ FileEntry(
+ parent_dataset_id=self._id,
+ local_path=f.absolute().as_posix(),
+ relative_path=(Path(dataset_path or '.') / f.relative_to(local_base_folder)).as_posix())
+ for f in files if f.is_file()]
+ self._task.get_logger().report_text('Generating SHA2 hash for {} files'.format(len(file_entries)))
+ pool = ThreadPool(cpu_count() * 2)
+ pool.map(self._calc_file_hash, file_entries)
+ pool.close()
+ self._task.get_logger().report_text('Hash generation completed')
+
+ # merge back into the dataset
+ count = 0
+ for f in file_entries:
+ ds_cur_f = self._dataset_file_entries.get(f.relative_path)
+ if not ds_cur_f:
+ if verbose:
+ self._task.get_logger().report_text('Add {}'.format(f.relative_path))
+ self._dataset_file_entries[f.relative_path] = f
+ count += 1
+ elif ds_cur_f.hash != f.hash:
+ if verbose:
+ self._task.get_logger().report_text('Modified {}'.format(f.relative_path))
+ self._dataset_file_entries[f.relative_path] = f
+ count += 1
+ elif f.parent_dataset_id == self._id and ds_cur_f.parent_dataset_id == self._id:
+ if verbose:
+ self._task.get_logger().report_text('Re-Added {}'.format(f.relative_path))
+ self._dataset_file_entries[f.relative_path] = f
+ count += 1
+ else:
+ if verbose:
+ self._task.get_logger().report_text('Unchanged {}'.format(f.relative_path))
+
+ return count
+
+ def _update_dependency_graph(self):
+ """
+ Update the dependency graph based on the current self._dataset_file_entries state
+ :return:
+ """
+ # collect all dataset versions
+ used_dataset_versions = set(f.parent_dataset_id for f in self._dataset_file_entries.values())
+ used_dataset_versions.add(self._id)
+ current_parents = self._dependency_graph.get(self._id)
+ # remove parent versions we no longer need from the main version list
+ # per version, remove unnecessary parent versions, if we do not need them
+ self._dependency_graph = {k: [p for p in parents if p in used_dataset_versions]
+ for k, parents in self._dependency_graph.items() if k in used_dataset_versions}
+ # make sure we do not remove our parents, for genealogy's sake
+ self._dependency_graph[self._id] = current_parents
+
+ def _serialize(self):
+ """
+ Store the current state of the Dataset as a Task artifact, to be used for later deserialization
+ """
+ self._update_dependency_graph()
+
+ state = dict(
+ dataset_file_entries=[f.as_dict() for f in self._dataset_file_entries.values()],
+ dependency_graph=self._dependency_graph,
+ id=self._id,
+ dirty=self._dirty,
+ )
+ modified_files = [f['size'] for f in state['dataset_file_entries'] if f.get('parent_dataset_id') == self._id]
+ preview = \
+ 'Dataset state\n' \
+ 'Files added/modified: {0} - total size {1}\n' \
+ 'Current dependency graph: {2}\n'.format(
+ len(modified_files), humanfriendly.format_size(sum(modified_files)),
+ json.dumps(self._dependency_graph, indent=2, sort_keys=True))
+ # store as artifact of the Task.
+ self._task.upload_artifact(
+ name=self.__state_entry_name, artifact_object=state, preview=preview, wait_on_upload=True)
+
+ def _download_dataset_archive(self):
+ """
+ Download the dataset archive, return a link to locally stored zip file
+ :return: Path to locally stored zip file
+ """
+ pass # TODO: implement
+
+ def _extract_dataset_archive(self):
+ """
+ Download the dataset archive, and extract the zip content to a cached folder.
+ Notice no merging is done.
+
+ :return: Path to a local storage extracted archive
+ """
+ if not self._task:
+ self._task = Task.get_task(task_id=self._id)
+ local_zip = StorageManager.get_local_copy(
+ remote_url=self._task.artifacts[self.__data_entry_name].url, cache_context=self.__cache_context,
+ extract_archive=False, name=self._id)
+ if not local_zip:
+ raise ValueError("Could not download dataset id={}".format(self._id))
+ local_folder = (Path(local_zip).parent / self._get_cache_folder_name()).as_posix()
+ # if we got here, we need to clear the target folder
+ shutil.rmtree(local_folder, ignore_errors=True)
+ # noinspection PyProtectedMember
+ local_folder = StorageManager._extract_to_cache(
+ cached_file=local_zip, name=self._id,
+ cache_context=self.__cache_context, target_folder=local_folder)
+ return local_folder
+
+ def _merge_datasets(self, use_soft_links=None, raise_on_error=True):
+ # type: (bool, bool) -> str
+ """
+ Download and copy / soft-link files from all the parent dataset versions
+ :param use_soft_links: If True, use soft links. Default: False on Windows, True on POSIX systems
+ :param raise_on_error: If True raise exception if dataset merging failed on any file
+ :return: the target folder
+ """
+ if use_soft_links is None:
+ use_soft_links = False if is_windows() else True
+
+ # check if we already have everything
+ target_base_folder, target_base_size = CacheManager.get_cache_manager(
+ cache_context=self.__cache_context).get_cache_file(local_filename=self._get_cache_folder_name())
+ if target_base_folder and target_base_size is not None:
+ target_base_folder = Path(target_base_folder)
+ # check dataset file size, if we have a full match no need for parent dataset download / merge
+ verified = True
+ # noinspection PyBroadException
+ try:
+ for f in self._dataset_file_entries.values():
+ if (target_base_folder / f.relative_path).stat().st_size != f.size:
+ verified = False
+ break
+ except Exception:
+ verified = False
+
+ if verified:
+ return target_base_folder.as_posix()
+ else:
+ LoggerRoot.get_base_logger().info('Dataset needs refreshing, fetching all parent datasets')
+
+ # first get our dataset
+ target_base_folder = Path(self._extract_dataset_archive())
+ target_base_folder.touch()
+
+ # create thread pool
+ pool = ThreadPool(cpu_count() * 2)
+ for dataset_version_id in self._get_dependencies_by_order():
+ # make sure we skip over empty dependencies
+ if dataset_version_id not in self._dependency_graph:
+ continue
+
+ ds = Dataset.get(dataset_id=dataset_version_id)
+ ds_base_folder = Path(ds._extract_dataset_archive())
+ ds_base_folder.touch()
+
+ def copy_file(file_entry):
+ if file_entry.parent_dataset_id != dataset_version_id:
+ return
+ source = (ds_base_folder / file_entry.relative_path).as_posix()
+ target = (target_base_folder / file_entry.relative_path).as_posix()
+ try:
+ # make sure we can overwrite the target file
+ # noinspection PyBroadException
+ try:
+ os.unlink(target)
+ except Exception:
+ Path(target).parent.mkdir(parents=True, exist_ok=True)
+
+ # copy / link
+ if use_soft_links:
+ if not os.path.isfile(source):
+ raise ValueError("Extracted file missing {}".format(source))
+ os.symlink(source, target)
+ else:
+ shutil.copy2(source, target, follow_symlinks=True)
+ except Exception as ex:
+ LoggerRoot.get_base_logger().warning('{}\nFailed {} file {} to {}'.format(
+ ex, 'linking' if use_soft_links else 'copying', source, target))
+ return ex
+
+ return None
+
+ errors = pool.map(copy_file, self._dataset_file_entries.values())
+ if raise_on_error and any(errors):
+ raise ValueError("Dataset merging failed: {}".format([e for e in errors if e is not None]))
+
+ pool.close()
+ return target_base_folder.as_posix()
+
+ def _get_dependencies_by_order(self, include_unused=False):
+ # type: (bool) -> List[str]
+ """
+ Return the dataset dependencies by order of application (from the last to the current)
+ :param bool include_unused: If True include unused datasets in the dependencies
+ :return: list of str representing the dataset ids
+ """
+ roots = [self._id]
+ dependencies = []
+ # noinspection DuplicatedCode
+ while roots:
+ r = roots.pop(0)
+ dependencies.append(r)
+ # add the parents of the current node, only if the parents are in the general graph node list
+ if include_unused and r not in self._dependency_graph:
+ roots.extend(list(reversed(
+ [p for p in self.get(dataset_id=r)._get_parents() if p not in roots])))
+ else:
+ roots.extend(list(reversed(
+ [p for p in self._dependency_graph.get(r, [])
+ if p not in roots and (include_unused or (p in self._dependency_graph))])))
+
+ # make sure we cover leftovers
+ leftovers = set(self._dependency_graph.keys()) - set(dependencies)
+ if leftovers:
+ roots = list(leftovers)
+ # noinspection DuplicatedCode
+ while roots:
+ r = roots.pop(0)
+ dependencies.append(r)
+ # add the parents of the current node, only if the parents are in the general graph node list
+ if include_unused and r not in self._dependency_graph:
+ roots.extend(list(reversed(
+ [p for p in self.get(dataset_id=r)._get_parents() if p not in roots])))
+ else:
+ roots.extend(list(reversed(
+ [p for p in self._dependency_graph.get(r, [])
+ if p not in roots and (include_unused or (p in self._dependency_graph))])))
+
+ # skip our id
+ return list(reversed(dependencies[1:]))
+
+ def _get_parents(self):
+ # type: () -> Sequence[str]
+ """
+ Return a list of direct parent datasets (str)
+ :return: list of dataset ids
+ """
+ return self._dependency_graph[self.id]
+
+ @classmethod
+ def _deserialize(cls, stored_state, task):
+ # type: (Union[dict, str, Path, _Path], Task) -> Dataset
+ """
+ reload a dataset state from the stored_state object
+ :param task: Task object associated with the dataset
+ :return: A Dataset object
+ """
+ assert isinstance(stored_state, (dict, str, Path, _Path))
+
+ if isinstance(stored_state, (str, Path, _Path)):
+ stored_state_file = Path(stored_state).as_posix()
+ with open(stored_state_file, 'rt') as f:
+ stored_state = json.load(f)
+
+ instance = cls(_private=cls.__private_magic, task=task)
+ # assert instance._id == stored_state['id'] # They should match
+ instance._dependency_graph = stored_state['dependency_graph']
+ instance._dirty = stored_state.get('dirty', False)
+ instance._dataset_file_entries = {
+ s['relative_path']: FileEntry(**s) for s in stored_state['dataset_file_entries']}
+ return instance
+
+ @staticmethod
+ def _calc_file_hash(file_entry):
+ # calculate hash
+ file_entry.hash, _ = sha256sum(file_entry.local_path)
+ file_entry.size = Path(file_entry.local_path).stat().st_size
+ return file_entry
+
+ @classmethod
+ def _get_dataset_id_hash(cls, dataset_id):
+ # type: (str) -> str
+ """
+ Return the hash used to search for the dataset id in text fields.
+ This is not a strong hash and is only used for defining dependencies.
+ :param dataset_id: Dataset ID to hash
+ :return: Hash string used for dependency lookup
+ """
+ return 'dsh{}'.format(md5text(dataset_id))
+
+ def _get_cache_folder_name(self):
+ return '{}{}'.format(self.__cache_folder_prefix, self._id)
+
+ def _add_script_call(self, func_name, **kwargs):
+ # type: (str, **Any) -> ()
+ args = ', '.join('\n {}={}'.format(k, '\''+str(v)+'\'' if isinstance(v, (str, Path, _Path)) else v)
+ for k, v in kwargs.items())
+ if args:
+ args += '\n'
+ line = 'ds.{}({})\n'.format(func_name, args)
+ self._task.data.script.diff += line
+ # noinspection PyProtectedMember
+ self._task._edit(script=self._task.data.script)
+
+ def _report_dataset_genealogy(self):
+ sankey_node = dict(
+ label=[],
+ color=[],
+ customdata=[],
+ hovertemplate='%{customdata}',
+ hoverlabel={"align": "left"},
+ )
+ sankey_link = dict(
+ source=[],
+ target=[],
+ value=[],
+ hovertemplate='',
+ )
+ # get DAG nodes
+ nodes = self._get_dependencies_by_order(include_unused=True) + [self.id]
+ # dataset name lookup
+ # noinspection PyProtectedMember
+ node_names = {t.id: t.name for t in Task._query_tasks(task_ids=nodes, only_fields=['id', 'name'])}
+ node_details = {}
+ # Generate table and details
+ table_values = [["Dataset id", "name", "removed", "modified", "added", "size"]]
+ for node in nodes:
+ count = 0
+ size = 0
+ for f in self._dataset_file_entries.values():
+ if f.parent_dataset_id == node:
+ count += 1
+ size += f.size
+ removed = len(self.list_removed_files(node))
+ modified = len(self.list_modified_files(node))
+ table_values += [[node, node_names.get(node, ''),
+ removed, modified, count-modified, humanfriendly.format_size(size)]]
+ node_details[node] = [removed, modified, count-modified, humanfriendly.format_size(size)]
+
+ # create DAG
+ visited = []
+ for idx, node in enumerate(nodes):
+ visited.append(node)
+ if node in self._dependency_graph:
+ parents = [visited.index(p) for p in self._dependency_graph[node] or [] if p in visited]
+ else:
+ parents = [visited.index(p) for p in self.get(dataset_id=node)._get_parents() or [] if p in visited]
+
+ sankey_node['color'].append("mediumpurple" if node == self.id else "lightblue")
+ sankey_node['label'].append('{}'.format(node))
+ sankey_node['customdata'].append(
+ "name {}
removed {}
modified {}
added {}
size {}".format(
+ node_names.get(node, ''), *node_details[node]))
+
+ for p in parents:
+ sankey_link['source'].append(p)
+ sankey_link['target'].append(idx)
+ sankey_link['value'].append(max(1, node_details[visited[p]][-2]))
+
+ # create the sankey graph
+ dag_flow = dict(
+ link=sankey_link,
+ node=sankey_node,
+ textfont=dict(color='rgba(0,0,0,255)', size=10),
+ type='sankey',
+ orientation='h'
+ )
+ fig = dict(data=[dag_flow], layout={'xaxis': {'visible': False}, 'yaxis': {'visible': False}})
+
+ if len(nodes) > 1:
+ self._task.get_logger().report_plotly(
+ title='Dataset Genealogy', series='', iteration=0, figure=fig)
+
+ # report detailed table
+ self._task.get_logger().report_table(
+ title='Dataset Summary', series='Details', iteration=0, table_plot=table_values,
+ extra_layout={"title": "Files by parent dataset id"})
+
+ # report the detailed content of the dataset as configuration,
+ # this allows for easy version comparison in the UI
+ dataset_details = None
+ if len(self._dataset_file_entries) < self.__preview_max_file_entries:
+ file_entries = sorted(self._dataset_file_entries.values(), key=lambda x: x.relative_path)
+ dataset_details = \
+ 'File Name - File Size - Hash (SHA2)\n' +\
+ '\n'.join('{} - {} - {}'.format(f.relative_path, f.size, f.hash) for f in file_entries)
+ # too large to store
+ if not dataset_details or len(dataset_details) > self.__preview_max_size:
+ dataset_details = 'Dataset content is too large to preview'
+
+ # noinspection PyProtectedMember
+ self._task._set_configuration(
+ name='Dataset Content', description='Dataset content preview',
+ config_type='read-only',
+ config_text=dataset_details
+ )
diff --git a/clearml/debugging/log.py b/clearml/debugging/log.py
index 3fbb8d6a..366735a3 100644
--- a/clearml/debugging/log.py
+++ b/clearml/debugging/log.py
@@ -56,7 +56,7 @@ class LoggerRoot(object):
# avoid nested imports
from ..config import get_log_redirect_level
- LoggerRoot.__base_logger = logging.getLogger('trains')
+ LoggerRoot.__base_logger = logging.getLogger('clearml')
level = level if level is not None else default_level
LoggerRoot.__base_logger.setLevel(level)
diff --git a/clearml/debugging/trace.py b/clearml/debugging/trace.py
index e060a54c..5ca1be0b 100644
--- a/clearml/debugging/trace.py
+++ b/clearml/debugging/trace.py
@@ -145,11 +145,11 @@ def _patch_module(module, prefix='', basepath=None, basemodule=None, exclude_pre
prefix += module.__name__.split('.')[-1] + '.'
# Do not patch low level network layer
- if prefix.startswith('trains.backend_api.session.') and prefix != 'trains.backend_api.session.':
+ if prefix.startswith('clearml.backend_api.session.') and prefix != 'clearml.backend_api.session.':
if not prefix.endswith('.Session.') and '.token_manager.' not in prefix:
# print('SKIPPING: {}'.format(prefix))
return
- if prefix.startswith('trains.backend_api.services.'):
+ if prefix.startswith('clearml.backend_api.services.'):
return
for skip in exclude_prefixes:
@@ -208,7 +208,7 @@ def _patch_module(module, prefix='', basepath=None, basemodule=None, exclude_pre
def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
"""
- DEBUG ONLY - Add full Trains package code trace
+ DEBUG ONLY - Add full ClearML package code trace
Output trace to filename or stream, default is sys.stderr
Trace level
-2: Trace function and arguments and returned call
@@ -244,7 +244,7 @@ def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
__stream_flush = None
from ..version import __version__
- msg = 'Trains v{} - Starting Trace\n\n'.format(__version__)
+ msg = 'ClearML v{} - Starting Trace\n\n'.format(__version__)
# print to actual stderr
stderr_write(msg)
# store to stream
@@ -252,7 +252,7 @@ def trace_trains(stream=None, level=1, exclude_prefixes=[], only_prefix=[]):
__stream_write('{:9}:{:5}:{:8}: {:14}\n'.format('seconds', 'pid', 'tid', 'self'))
__stream_write('{:9}:{:5}:{:8}:{:15}\n'.format('-' * 9, '-' * 5, '-' * 8, '-' * 15))
__trace_start = time.time()
- _patch_module('trains', exclude_prefixes=exclude_prefixes or [], only_prefix=only_prefix or [])
+ _patch_module('clearml', exclude_prefixes=exclude_prefixes or [], only_prefix=only_prefix or [])
def trace_level(level=1):
@@ -343,7 +343,7 @@ def end_of_program():
if __name__ == '__main__':
- # from trains import Task
+ # from clearml import Task
# task = Task.init(project_name="examples", task_name="trace test")
# trace_trains('_trace.txt', level=2)
print_traced_files('_trace_*.txt', lines_per_tid=10)
diff --git a/clearml/errors.py b/clearml/errors.py
index f3ef762d..8f1ce8c3 100644
--- a/clearml/errors.py
+++ b/clearml/errors.py
@@ -1,3 +1,3 @@
class UsageError(RuntimeError):
- """ An exception raised for illegal usage of trains objects"""
+ """ An exception raised for illegal usage of clearml objects"""
pass
diff --git a/clearml/external/kerastuner.py b/clearml/external/kerastuner.py
index ffbb30fe..60e1bf3b 100644
--- a/clearml/external/kerastuner.py
+++ b/clearml/external/kerastuner.py
@@ -14,7 +14,7 @@ try:
except ImportError:
pd = None
from logging import getLogger
- getLogger('trains.external.kerastuner').warning(
+ getLogger('clearml.external.kerastuner').warning(
'Pandas is not installed, summary table reporting will be skipped.')
@@ -26,7 +26,7 @@ class TrainsTunerLogger(Logger):
super(TrainsTunerLogger, self).__init__()
self.task = task or Task.current_task()
if not self.task:
- raise ValueError("Trains Task could not be found, pass in TrainsTunerLogger or "
+ raise ValueError("ClearML Task could not be found, pass in TrainsTunerLogger or "
"call Task.init before initializing TrainsTunerLogger")
self._summary = pd.DataFrame() if pd else None
diff --git a/clearml/logger.py b/clearml/logger.py
index 69a8604b..11e447c6 100644
--- a/clearml/logger.py
+++ b/clearml/logger.py
@@ -36,14 +36,14 @@ if TYPE_CHECKING:
class Logger(object):
"""
- The ``Logger`` class is the Trains console log and metric statistics interface, and contains methods for explicit
+ The ``Logger`` class is the ClearML console log and metric statistics interface, and contains methods for explicit
reporting.
- Explicit reporting extends Trains automagical capturing of inputs and output. Explicit reporting
+ Explicit reporting extends ClearML automagical capturing of inputs and output. Explicit reporting
methods include scalar plots, line plots, histograms, confusion matrices, 2D and 3D scatter
diagrams, text logging, tables, and image uploading and reporting.
- In the **Trains Web-App (UI)**, ``Logger`` output appears in the **RESULTS** tab, **LOG**, **SCALARS**,
+ In the **ClearML Web-App (UI)**, ``Logger`` output appears in the **RESULTS** tab, **LOG**, **SCALARS**,
**PLOTS**, and **DEBUG SAMPLES** sub-tabs. When you compare experiments, ``Logger`` output appears in the
comparisons.
@@ -90,7 +90,7 @@ class Logger(object):
if self._connect_logging:
StdStreamPatch.patch_logging_formatter(self)
elif not self._connect_std_streams:
- # make sure that at least the main trains logger is connect
+ # make sure that at least the main clearml logger is connect
base_logger = LoggerRoot.get_base_logger()
if base_logger and base_logger.handlers:
StdStreamPatch.patch_logging_formatter(self, base_logger.handlers[0])
@@ -126,7 +126,7 @@ class Logger(object):
logger.report_text('log some text', level=logging.DEBUG, print_console=False)
- You can view the reported text in the **Trains Web-App (UI)**, **RESULTS** tab, **LOG** sub-tab.
+ You can view the reported text in the **ClearML Web-App (UI)**, **RESULTS** tab, **LOG** sub-tab.
:param str msg: The text to log.
:param int level: The log level from the Python ``logging`` package. The default value is ``logging.INFO``.
@@ -151,7 +151,7 @@ class Logger(object):
scalar_series = [random.randint(0,10) for i in range(10)]
logger.report_scalar(title='scalar metrics','series', value=scalar_series[iteration], iteration=0)
- You can view the scalar plots in the **Trains Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab.
+ You can view the scalar plots in the **ClearML Web-App (UI)**, **RESULTS** tab, **SCALARS** sub-tab.
:param str title: The title (metric) of the plot. Plot more than one scalar series on the same plot by using
the same ``title`` for each call to this method.
@@ -190,7 +190,7 @@ class Logger(object):
logger.report_vector(title='vector example', series='vector series', values=vector_series, iteration=0,
labels=['A','B'], xaxis='X axis label', yaxis='Y axis label')
- You can view the vectors plots in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
+ You can view the vectors plots in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
:param str title: The title (metric) of the plot.
:param str series: The series name (variant) of the reported histogram.
@@ -237,7 +237,7 @@ class Logger(object):
logger.report_histogram(title='histogram example', series='histogram series',
values=vector_series, iteration=0, labels=['A','B'], xaxis='X axis label', yaxis='Y axis label')
- You can view the reported histograms in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
+ You can view the reported histograms in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
:param str title: The title (metric) of the plot.
:param str series: The series name (variant) of the reported histogram.
@@ -305,7 +305,7 @@ class Logger(object):
logger.report_table(title='table example',series='pandas DataFrame',iteration=0,table_plot=df)
- You can view the reported tables in the **Trains Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
+ You can view the reported tables in the **ClearML Web-App (UI)**, **RESULTS** tab, **PLOTS** sub-tab.
:param str title: The title (metric) of the table.
:param str series: The series name (variant) of the reported table.
@@ -1022,8 +1022,8 @@ class Logger(object):
The images are uploaded separately. A link to each image is reported.
.. note::
- Credentials for the destination storage are specified in the Trains configuration file,
- ``~/trains.conf``.
+ Credentials for the destination storage are specified in the ClearML configuration file,
+ ``~/clearml.conf``.
:param str uri: example: 's3://bucket/directory/' or 'file:///tmp/debug/'
@@ -1150,7 +1150,7 @@ class Logger(object):
The values are:
- ``True`` - Scalars without specific titles are grouped together in the "Scalars" plot, preserving
- backward compatibility with Trains automagical behavior.
+ backward compatibility with ClearML automagical behavior.
- ``False`` - TensorBoard scalars without titles get a title/series with the same tag. (default)
:type group_scalars: bool
"""
@@ -1214,7 +1214,7 @@ class Logger(object):
try:
# make sure we are writing to the original stdout
StdStreamPatch.stderr_original_write(
- 'trains.Logger failed sending log [level {}]: "{}"\n'.format(level, msg))
+ 'clearml.Logger failed sending log [level {}]: "{}"\n'.format(level, msg))
except Exception:
pass
else:
diff --git a/clearml/model.py b/clearml/model.py
index f13064d4..667ed0d9 100644
--- a/clearml/model.py
+++ b/clearml/model.py
@@ -472,9 +472,9 @@ class InputModel(Model):
framework, and indicate whether to immediately set the model's status to ``Published``.
The model is read-only.
- The **Trains Server** (backend) may already store the model's URL. If the input model's URL is not
- stored, meaning the model is new, then it is imported and Trains stores its metadata.
- If the URL is already stored, the import process stops, Trains issues a warning message, and Trains
+ The **ClearML Server** (backend) may already store the model's URL. If the input model's URL is not
+ stored, meaning the model is new, then it is imported and ClearML stores its metadata.
+ If the URL is already stored, the import process stops, ClearML issues a warning message, and ClearML
reuses the model.
In your Python experiment script, after importing the model, you can connect it to the main execution
@@ -482,12 +482,12 @@ class InputModel(Model):
network.
.. note::
- Using the **Trains Web-App** (user interface), you can reuse imported models and switch models in
+ Using the **ClearML Web-App** (user interface), you can reuse imported models and switch models in
experiments.
- :param str weights_url: A valid URL for the initial weights file. If the **Trains Web-App** (backend)
+ :param str weights_url: A valid URL for the initial weights file. If the **ClearML Web-App** (backend)
already stores the metadata of a model with the same URL, that existing model is returned
- and Trains ignores all other parameters.
+ and ClearML ignores all other parameters.
For example:
@@ -715,7 +715,7 @@ class InputModel(Model):
def __init__(self, model_id):
# type: (str) -> None
"""
- :param str model_id: The Trains Id (system UUID) of the input model whose metadata the **Trains Server**
+ :param str model_id: The ClearML Id (system UUID) of the input model whose metadata the **ClearML Server**
(backend) stores.
"""
super(InputModel, self).__init__(model_id)
@@ -731,16 +731,16 @@ class InputModel(Model):
Connect the current model to a Task object, if the model is preexisting. Preexisting models include:
- Imported models (InputModel objects created using the :meth:`Logger.import_model` method).
- - Models whose metadata is already in the Trains platform, meaning the InputModel object is instantiated
- from the ``InputModel`` class specifying the the model's Trains Id as an argument.
- - Models whose origin is not Trains that are used to create an InputModel object. For example,
+ - Models whose metadata is already in the ClearML platform, meaning the InputModel object is instantiated
+ from the ``InputModel`` class specifying the the model's ClearML Id as an argument.
+ - Models whose origin is not ClearML that are used to create an InputModel object. For example,
models created using TensorFlow models.
When the experiment is executed remotely in a worker, the input model already specified in the experiment is
used.
.. note::
- The **Trains Web-App** allows you to switch one input model for another and then enqueue the experiment
+ The **ClearML Web-App** allows you to switch one input model for another and then enqueue the experiment
to execute in a worker.
:param object task: A Task object.
@@ -789,7 +789,7 @@ class OutputModel(BaseModel):
.. note::
When executing a Task (experiment) remotely in a worker, you can modify the model configuration and / or model's
- label enumeration using the **Trains Web-App**.
+ label enumeration using the **ClearML Web-App**.
"""
@property
@@ -990,7 +990,7 @@ class OutputModel(BaseModel):
Connect the current model to a Task object, if the model is a preexisting model. Preexisting models include:
- Imported models.
- - Models whose metadata the **Trains Server** (backend) is already storing.
+ - Models whose metadata the **ClearML Server** (backend) is already storing.
- Models from another source, such as frameworks like TensorFlow.
:param object task: A Task object.
@@ -1044,8 +1044,8 @@ class OutputModel(BaseModel):
Using this method, files uploads are separate and then a link to each is stored in the model object.
.. note::
- For storage requiring credentials, the credentials are stored in the Trains configuration file,
- ``~/trains.conf``.
+ For storage requiring credentials, the credentials are stored in the ClearML configuration file,
+ ``~/clearml.conf``.
:param str uri: The URI of the upload storage destination.
diff --git a/clearml/storage/cache.py b/clearml/storage/cache.py
index 3e1af1e1..3a80dc8f 100644
--- a/clearml/storage/cache.py
+++ b/clearml/storage/cache.py
@@ -58,10 +58,10 @@ class CacheManager(object):
return cached_file
@staticmethod
- def upload_file(local_file, remote_url, wait_for_upload=True):
+ def upload_file(local_file, remote_url, wait_for_upload=True, retries=1):
helper = StorageHelper.get(remote_url)
result = helper.upload(
- local_file, remote_url, async_enable=not wait_for_upload
+ local_file, remote_url, async_enable=not wait_for_upload, retries=retries,
)
CacheManager._add_remote_url(remote_url, local_file)
return result
diff --git a/clearml/storage/helper.py b/clearml/storage/helper.py
index f8804284..efaa9487 100644
--- a/clearml/storage/helper.py
+++ b/clearml/storage/helper.py
@@ -681,6 +681,7 @@ class StorageHelper(object):
# try to get file size
try:
if isinstance(self._driver, _HttpDriver) and obj:
+ obj = self._driver._get_download_object(obj)
total_size_mb = float(obj.headers.get('Content-Length', 0)) / (1024 * 1024)
elif hasattr(obj, 'size'):
size = obj.size
@@ -785,12 +786,12 @@ class StorageHelper(object):
def check_write_permissions(self, dest_path=None):
# create a temporary file, then de;ete it
base_url = dest_path or self._base_url
- dest_path = base_url + '/.trains.test'
+ dest_path = base_url + '/.clearml.test'
# do not check http/s connection permissions
if dest_path.startswith('http'):
return True
try:
- self.upload_from_stream(stream=six.BytesIO(b'trains'), dest_path=dest_path)
+ self.upload_from_stream(stream=six.BytesIO(b'clearml'), dest_path=dest_path)
self.delete(path=dest_path)
except Exception:
raise ValueError('Insufficient permissions for {}'.format(base_url))
@@ -1024,6 +1025,11 @@ class _HttpDriver(_Driver):
return self._default_backend_session.add_auth_headers({})
return None
+ class _HttpSessionHandle(object):
+ def __init__(self, url, is_stream, container_name, object_name):
+ self.url, self.is_stream, self.container_name, self.object_name = \
+ url, is_stream, container_name, object_name
+
def __init__(self, retries=5):
self._retries = retries
self._containers = {}
@@ -1055,24 +1061,39 @@ class _HttpDriver(_Driver):
def list_container_objects(self, *args, **kwargs):
raise NotImplementedError('List is not implemented for http protocol')
- def delete_object(self, *args, **kwargs):
- raise NotImplementedError('Delete is not implemented for http protocol')
+ def delete_object(self, obj, *args, **kwargs):
+ assert isinstance(obj, self._HttpSessionHandle)
+ container = self._containers[obj.container_name]
+ res = container.session.delete(obj.url, headers=container.get_headers(obj.url))
+ if res.status_code != requests.codes.ok:
+ raise ValueError('Failed deleting object %s (%d): %s' % (obj.object_name, res.status_code, res.text))
+ return res
def get_object(self, container_name, object_name, *args, **kwargs):
- container = self._containers[container_name]
- # set stream flag before get request
- container.session.stream = kwargs.get('stream', True)
+ is_stream = kwargs.get('stream', True)
url = ''.join((container_name, object_name.lstrip('/')))
- res = container.session.get(url, timeout=self.timeout, headers=container.get_headers(url))
+ return self._HttpSessionHandle(url, is_stream, container_name, object_name)
+
+ def _get_download_object(self, obj):
+ # bypass for session result
+ if not isinstance(obj, self._HttpSessionHandle):
+ return obj
+
+ container = self._containers[obj.container_name]
+ # set stream flag before we send the request
+ container.session.stream = obj.is_stream
+ res = container.session.get(obj.url, timeout=self.timeout, headers=container.get_headers(obj.url))
if res.status_code != requests.codes.ok:
- raise ValueError('Failed getting object %s (%d): %s' % (object_name, res.status_code, res.text))
+ raise ValueError('Failed getting object %s (%d): %s' % (obj.object_name, res.status_code, res.text))
return res
def download_object_as_stream(self, obj, chunk_size=64 * 1024, **_):
# return iterable object
+ obj = self._get_download_object(obj)
return obj.iter_content(chunk_size=chunk_size)
def download_object(self, obj, local_path, overwrite_existing=True, delete_on_failure=True, callback=None, **_):
+ obj = self._get_download_object(obj)
p = Path(local_path)
if not overwrite_existing and p.is_file():
log.warning('failed saving after download: overwrite=False and file exists (%s)' % str(p))
diff --git a/clearml/storage/manager.py b/clearml/storage/manager.py
index 13d13f8f..c3426b30 100644
--- a/clearml/storage/manager.py
+++ b/clearml/storage/manager.py
@@ -48,8 +48,8 @@ class StorageManager(object):
@classmethod
def upload_file(
- cls, local_file, remote_url, wait_for_upload=True
- ): # type: (str, str, bool) -> str
+ cls, local_file, remote_url, wait_for_upload=True, retries=1
+ ): # type: (str, str, bool, int) -> str
"""
Upload a local file to a remote location. remote url is the finale destination of the uploaded file.
@@ -64,12 +64,14 @@ class StorageManager(object):
:param str local_file: Full path of a local file to be uploaded
:param str remote_url: Full path or remote url to upload to (including file name)
:param bool wait_for_upload: If False, return immediately and upload in the background. Default True.
+ :param int retries: Number of retries before failing to upload file, default 1.
:return: Newly uploaded remote URL.
"""
return CacheManager.get_cache_manager().upload_file(
local_file=local_file,
remote_url=remote_url,
wait_for_upload=wait_for_upload,
+ retries=retries,
)
@classmethod
diff --git a/clearml/storage/util.py b/clearml/storage/util.py
index ce53889b..9875e0d2 100644
--- a/clearml/storage/util.py
+++ b/clearml/storage/util.py
@@ -1,6 +1,6 @@
import hashlib
import sys
-from typing import Optional
+from typing import Optional, Union
from six.moves.urllib.parse import quote, urlparse, urlunparse
import six
@@ -72,8 +72,23 @@ def sha256sum(filename, skip_header=0, block_size=65536):
return h.hexdigest(), file_hash.hexdigest() if skip_header else None
+def md5text(text, seed=1337):
+ # type: (str, Union[int, str]) -> str
+ """
+ Return md5 hash of a string
+ Do not use this hash for security, if needed use something stronger like SHA2
+
+ :param text: string to hash
+ :param seed: use prefix seed for hashing
+ :return: md5 string
+ """
+ h = hashlib.md5()
+ h.update((str(seed) + str(text)).encode('utf-8'))
+ return h.hexdigest()
+
+
def is_windows():
"""
:return: True if currently running on windows OS
"""
- return sys.platform == 'win32'
\ No newline at end of file
+ return sys.platform == 'win32'
diff --git a/clearml/task.py b/clearml/task.py
index ee70255b..12b3a9ef 100644
--- a/clearml/task.py
+++ b/clearml/task.py
@@ -77,8 +77,8 @@ class Task(_Task):
configuration, label enumeration, models, and other artifacts.
The term "main execution Task" refers to the Task context for current running experiment. Python experiment scripts
- can create one, and only one, main execution Task. It is a traceable, and after a script runs and Trains stores
- the Task in the **Trains Server** (backend), it is modifiable, reproducible, executable by a worker, and you
+ can create one, and only one, main execution Task. It is a traceable, and after a script runs and ClearML stores
+ the Task in the **ClearML Server** (backend), it is modifiable, reproducible, executable by a worker, and you
can duplicate it for further experimentation.
The ``Task`` class and its methods allow you to create and manage experiments, as well as perform
@@ -93,7 +93,7 @@ class Task(_Task):
- Create a new reproducible Task - :meth:`Task.init`
.. important::
- In some cases, ``Task.init`` may return a Task object which is already stored in **Trains Server** (already
+ In some cases, ``Task.init`` may return a Task object which is already stored in **ClearML Server** (already
initialized), instead of creating a new Task. For a detailed explanation of those cases, see the ``Task.init``
method.
@@ -102,17 +102,17 @@ class Task(_Task):
- Get another (different) Task - :meth:`Task.get_task`
.. note::
- The **Trains** documentation often refers to a Task as, "Task (experiment)".
+ The **ClearML** documentation often refers to a Task as, "Task (experiment)".
- "Task" refers to the class in the Trains Python Client Package, the object in your Python experiment script,
- and the entity with which **Trains Server** and **Trains Agent** work.
+ "Task" refers to the class in the ClearML Python Client Package, the object in your Python experiment script,
+ and the entity with which **ClearML Server** and **ClearML Agent** work.
"Experiment" refers to your deep learning solution, including its connected components, inputs, and outputs,
- and is the experiment you can view, analyze, compare, modify, duplicate, and manage using the Trains
+ and is the experiment you can view, analyze, compare, modify, duplicate, and manage using the ClearML
**Web-App** (UI).
Therefore, a "Task" is effectively an "experiment", and "Task (experiment)" encompasses its usage throughout
- the Trains.
+ the ClearML.
The exception to this Task behavior is sub-tasks (non-reproducible Tasks), which do not use the main execution
Task. Creating a sub-task always creates a new Task with a new Task ID.
@@ -197,7 +197,7 @@ class Task(_Task):
Creates a new Task (experiment) if:
- The Task never ran before. No Task with the same ``task_name`` and ``project_name`` is stored in
- **Trains Server**.
+ **ClearML Server**.
- The Task has run before (the same ``task_name`` and ``project_name``), and (a) it stored models and / or
artifacts, or (b) its status is Published , or (c) it is Archived.
- A new Task is forced by calling ``Task.init`` with ``reuse_last_task_id=False``.
@@ -215,7 +215,7 @@ class Task(_Task):
.. code-block:: py
- from trains import Task
+ from clearml import Task
task = Task.init('myProject', 'myTask')
If this code runs again, it will not create a new Task. It does not store a model or artifact,
@@ -285,7 +285,7 @@ class Task(_Task):
This is equivalent to `continue_last_task=True` and `reuse_last_task_id=a_task_id_string`.
:param str output_uri: The default location for output models and other artifacts. In the default location,
- Trains creates a subfolder for the output. The subfolder structure is the following:
+ ClearML creates a subfolder for the output. The subfolder structure is the following: