Initial commit

allegroai
2021-05-14 02:48:51 +03:00
parent dc5a4e8a0d
commit 77c9a91a95
645 changed files with 37481 additions and 14 deletions

.eslintrc.js (new file, 60 lines)

@@ -0,0 +1,60 @@
/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 *
 * @format
 */

const OFF = 0;
const WARNING = 1;
const ERROR = 2;

module.exports = {
  root: true,
  env: {
    browser: true,
    commonjs: true,
    jest: true,
    node: true,
  },
  parser: 'babel-eslint',
  parserOptions: {
    allowImportExportEverywhere: true,
  },
  extends: ['airbnb', 'prettier', 'prettier/react'],
  plugins: ['react-hooks', 'header'],
  rules: {
    // Ignore certain webpack aliases because they can't be resolved.
    'import/no-unresolved': [
      ERROR,
      {ignore: ['^@theme', '^@docusaurus', '^@generated']},
    ],
    'import/extensions': OFF,
    'header/header': [
      ERROR,
      'block',
      [
        '*',
        ' * Copyright (c) Facebook, Inc. and its affiliates.',
        ' *',
        ' * This source code is licensed under the MIT license found in the',
        ' * LICENSE file in the root directory of this source tree.',
        ' *',
        // Unfortunately eslint-plugin-header doesn't support optional lines.
        // If you want to enforce your website JS files to have @flow or @format,
        // modify these lines accordingly.
        {
          pattern: '.* @format',
        },
        ' ',
      ],
    ],
    'react/jsx-closing-bracket-location': OFF, // Conflicts with Prettier.
    'react/jsx-filename-extension': OFF,
    'react-hooks/rules-of-hooks': ERROR,
    'react/prop-types': OFF, // PropTypes aren't used much these days.
  },
};

.gitignore (vendored; 39 lines changed)

@@ -1,14 +1,25 @@
.architect
bootstrap.css
bootstrap.js
bootstrap.json
bootstrap.jsonp
build/
classic.json
classic.jsonp
ext/
modern.json
modern.jsonp
resources/sass/.sass-cache/
resources/.arch-internal-preview.css
.arch-internal-preview.css
# Dependencies
/node_modules

# Production
/build
/.idea

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# ESLint
.eslintcache

.prettierignore (new file, 3 lines)

@@ -0,0 +1,3 @@
node_modules
build
.docusaurus

.prettierrc (new file, 9 lines)

@@ -0,0 +1,9 @@
{
  "arrowParens": "always",
  "bracketSpacing": false,
  "jsxBracketSameLine": true,
  "printWidth": 80,
  "proseWrap": "never",
  "singleQuote": true,
  "trailingComma": "all"
}

.stylelintrc.js (new file, 13 lines)

@@ -0,0 +1,13 @@
/**
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
module.exports = {
  plugins: ['stylelint-copyright'],
  rules: {
    'docusaurus/copyright-header': true,
  },
};

LICENSE (new file, 557 lines)

@@ -0,0 +1,557 @@
Server Side Public License
VERSION 1, OCTOBER 16, 2018
Copyright © 2019 allegro.ai, Inc.
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
TERMS AND CONDITIONS
0. Definitions.
“This License” refers to Server Side Public License.
“Copyright” also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
“The Program” refers to any copyrightable work licensed under this
License. Each licensee is addressed as “you”. “Licensees” and
“recipients” may be individuals or organizations.
To “modify” a work means to copy from or adapt all or part of the work in
a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a “modified version” of the
earlier work or a work “based on” the earlier work.
A “covered work” means either the unmodified Program or a work based on
the Program.
To “propagate” a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To “convey” a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through a
computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays “Appropriate Legal Notices” to the
extent that it includes a convenient and prominently visible feature that
(1) displays an appropriate copyright notice, and (2) tells the user that
there is no warranty for the work (except to the extent that warranties
are provided), that licensees may convey the work under this License, and
how to view a copy of this License. If the interface presents a list of
user commands or options, such as a menu, a prominent item in the list
meets this criterion.
1. Source Code.
The “source code” for a work means the preferred form of the work for
making modifications to it. “Object code” means any non-source form of a
work.
A “Standard Interface” means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that is
widely used among developers working in that language. The “System
Libraries” of an executable work include anything, other than the work as
a whole, that (a) is included in the normal form of packaging a Major
Component, but which is not part of that Major Component, and (b) serves
only to enable use of the work with that Major Component, or to implement
a Standard Interface for which an implementation is available to the
public in source code form. A “Major Component”, in this context, means a
major essential component (kernel, window system, and so on) of the
specific operating system (if any) on which the executable work runs, or
a compiler used to produce the work, or an object code interpreter used
to run it.
The “Corresponding Source” for a work in object code form means all the
source code needed to generate, install, and (for an executable work) run
the object code and to modify the work, including scripts to control
those activities. However, it does not include the work's System
Libraries, or general-purpose tools or generally available free programs
which are used unmodified in performing those activities but which are
not part of the work. For example, Corresponding Source includes
interface definition files associated with source files for the work, and
the source code for shared libraries and dynamically linked subprograms
that the work is specifically designed to require, such as by intimate
data communication or control flow between those subprograms and other
parts of the work.
The Corresponding Source need not include anything that users can
regenerate automatically from other parts of the Corresponding Source.
The Corresponding Source for a work in source code form is that same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program, subject to section 13. The
output from running a covered work is covered by this License only if the
output, given its content, constitutes a covered work. This License
acknowledges your rights of fair use or other equivalent, as provided by
copyright law. Subject to section 13, you may make, run and propagate
covered works that you do not convey, without conditions so long as your
license otherwise remains in force. You may convey covered works to
others for the sole purpose of having them make modifications exclusively
for you, or provide you with facilities for running those works, provided
that you comply with the terms of this License in conveying all
material for which you do not control copyright. Those thus making or
running the covered works for you must do so exclusively on your
behalf, under your direction and control, on terms that prohibit them
from making any copies of your copyrighted material outside their
relationship with you.
Conveying under any other circumstances is permitted solely under the
conditions stated below. Sublicensing is not allowed; section 10 makes it
unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article 11
of the WIPO copyright treaty adopted on 20 December 1996, or similar laws
prohibiting or restricting circumvention of such measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention is
effected by exercising rights under this License with respect to the
covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's users,
your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice; keep
intact all notices stating that this License and any non-permissive terms
added in accord with section 7 apply to the code; keep intact all notices
of the absence of any warranty; and give all recipients a copy of this
License along with the Program. You may charge any price or no price for
each copy that you convey, and you may offer support or warranty
protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the terms
of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified it,
and giving a relevant date.
b) The work must carry prominent notices stating that it is released
under this License and any conditions added under section 7. This
requirement modifies the requirement in section 4 to “keep intact all
notices”.
c) You must license the entire work, as a whole, under this License to
anyone who comes into possession of a copy. This License will therefore
apply, along with any applicable section 7 additional terms, to the
whole of the work, and all its parts, regardless of how they are
packaged. This License gives no permission to license the work in any
other way, but it does not invalidate such permission if you have
separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your work
need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work, and
which are not combined with it such as to form a larger program, in or on
a volume of a storage or distribution medium, is called an “aggregate” if
the compilation and its resulting copyright are not used to limit the
access or legal rights of the compilation's users beyond what the
individual works permit. Inclusion of a covered work in an aggregate does
not cause this License to apply to the other parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms of
sections 4 and 5, provided that you also convey the machine-readable
Corresponding Source under the terms of this License, in one of these
ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium customarily
used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a written
offer, valid for at least three years and valid for as long as you
offer spare parts or customer support for that product model, to give
anyone who possesses the object code either (1) a copy of the
Corresponding Source for all the software in the product that is
covered by this License, on a durable physical medium customarily used
for software interchange, for a price no more than your reasonable cost
of physically performing this conveying of source, or (2) access to
copy the Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This alternative is
allowed only occasionally and noncommercially, and only if you received
the object code with such an offer, in accord with subsection 6b.
d) Convey the object code by offering access from a designated place
(gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to copy
the object code is a network server, the Corresponding Source may be on
a different server (operated by you or a third party) that supports
equivalent copying facilities, provided you maintain clear directions
next to the object code saying where to find the Corresponding Source.
Regardless of what server hosts the Corresponding Source, you remain
obligated to ensure that it is available for as long as needed to
satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided you
inform other peers where the object code and Corresponding Source of
the work are being offered to the general public at no charge under
subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be included
in conveying the object code work.
A “User Product” is either (1) a “consumer product”, which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, “normally used” refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
“Installation Information” for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as part
of a transaction in which the right of possession and use of the User
Product is transferred to the recipient in perpetuity or for a fixed term
(regardless of how the transaction is characterized), the Corresponding
Source conveyed under this section must be accompanied by the
Installation Information. But this requirement does not apply if neither
you nor any third party retains the ability to install modified object
code on the User Product (for example, the work has been installed in
ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access
to a network may be denied when the modification itself materially
and adversely affects the operation of the network or violates the
rules and protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided, in
accord with this section must be in a format that is publicly documented
(and with an implementation available to the public in source code form),
and must require no special password or key for unpacking, reading or
copying.
7. Additional Terms.
“Additional permissions” are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall be
treated as though they were included in this License, to the extent that
they are valid under applicable law. If additional permissions apply only
to part of the Program, that part may be used separately under those
permissions, but the entire Program remains governed by this License
without regard to the additional permissions. When you convey a copy of
a covered work, you may at your option remove any additional permissions
from that copy, or from any part of it. (Additional permissions may be
written to require their own removal in certain cases when you modify the
work.) You may place additional permissions on material, added by you to
a covered work, for which you have or can give appropriate copyright
permission.
Notwithstanding any other provision of this License, for material you add
to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some trade
names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that material
by anyone who conveys the material (or modified versions of it) with
contractual assumptions of liability to the recipient, for any
liability that these contractual assumptions directly impose on those
licensors and authors.
All other non-permissive additional terms are considered “further
restrictions” within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further restriction,
you may remove that term. If a license document contains a further
restriction but permits relicensing or conveying under this License, you
may add to a covered work material governed by the terms of that license
document, provided that the further restriction does not survive such
relicensing or conveying.
If you add terms to a covered work in accord with this section, you must
place, in the relevant source files, a statement of the additional terms
that apply to those files, or a notice indicating where to find the
applicable terms. Additional terms, permissive or non-permissive, may be
stated in the form of a separately written license, or stated as
exceptions; the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or modify
it is void, and will automatically terminate your rights under this
License (including any patent licenses granted under the third paragraph
of section 11).
However, if you cease all violation of this License, then your license
from a particular copyright holder is reinstated (a) provisionally,
unless and until the copyright holder explicitly and finally terminates
your license, and (b) permanently, if the copyright holder fails to
notify you of the violation by some reasonable means prior to 60 days
after the cessation.
Moreover, your license from a particular copyright holder is reinstated
permanently if the copyright holder notifies you of the violation by some
reasonable means, this is the first time you have received notice of
violation of this License (for any work) from that copyright holder, and
you cure the violation prior to 30 days after your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or run a
copy of the Program. Ancillary propagation of a covered work occurring
solely as a consequence of using peer-to-peer transmission to receive a
copy likewise does not require acceptance. However, nothing other than
this License grants you permission to propagate or modify any covered
work. These actions infringe copyright if you do not accept this License.
Therefore, by modifying or propagating a covered work, you indicate your
acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically receives
a license from the original licensors, to run, modify and propagate that
work, subject to this License. You are not responsible for enforcing
compliance by third parties with this License.
An “entity transaction” is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered work
results from an entity transaction, each party to that transaction who
receives a copy of the work also receives whatever licenses to the work
the party's predecessor in interest had or could give under the previous
paragraph, plus a right to possession of the Corresponding Source of the
work from the predecessor in interest, if the predecessor has it or can
get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the rights
granted or affirmed under this License. For example, you may not impose a
license fee, royalty, or other charge for exercise of rights granted
under this License, and you may not initiate litigation (including a
cross-claim or counterclaim in a lawsuit) alleging that any patent claim
is infringed by making, using, selling, offering for sale, or importing
the Program or any portion of it.
11. Patents.
A “contributor” is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The work
thus licensed is called the contributor's “contributor version”.
A contributor's “essential patent claims” are all patent claims owned or
controlled by the contributor, whether already acquired or hereafter
acquired, that would be infringed by some manner, permitted by this
License, of making, using, or selling its contributor version, but do not
include claims that would be infringed only as a consequence of further
modification of the contributor version. For purposes of this definition,
“control” includes the right to grant patent sublicenses in a manner
consistent with the requirements of this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to make,
use, sell, offer for sale, import and otherwise run, modify and propagate
the contents of its contributor version.
In the following three paragraphs, a “patent license” is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To “grant” such a patent license to a party
means to make such an agreement or commitment not to enforce a patent
against the party.
If you convey a covered work, knowingly relying on a patent license, and
the Corresponding Source of the work is not available for anyone to copy,
free of charge and under the terms of this License, through a publicly
available network server or other readily accessible means, then you must
either (1) cause the Corresponding Source to be so available, or (2)
arrange to deprive yourself of the benefit of the patent license for this
particular work, or (3) arrange, in a manner consistent with the
requirements of this License, to extend the patent license to downstream
recipients. “Knowingly relying” means you have actual knowledge that, but
for the patent license, your conveying the covered work in a country, or
your recipient's use of the covered work in a country, would infringe
one or more identifiable patents in that country that you have reason
to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties receiving
the covered work authorizing them to use, propagate, modify or convey a
specific copy of the covered work, then the patent license you grant is
automatically extended to all recipients of the covered work and works
based on it.
A patent license is “discriminatory” if it does not include within the
scope of its coverage, prohibits the exercise of, or is conditioned on
the non-exercise of one or more of the rights that are specifically
granted under this License. You may not convey a covered work if you are
a party to an arrangement with a third party that is in the business of
distributing software, under which you make payment to the third party
based on the extent of your activity of conveying the work, and under
which the third party grants, to any of the parties who would receive the
covered work from you, a discriminatory patent license (a) in connection
with copies of the covered work conveyed by you (or copies made from
those copies), or (b) primarily for and in connection with specific
products or compilations that contain the covered work, unless you
entered into that arrangement, or that patent license was granted, prior
to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting any
implied license or other defenses to infringement that may otherwise be
available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot use,
propagate or convey a covered work so as to satisfy simultaneously your
obligations under this License and any other pertinent obligations, then
as a consequence you may not use, propagate or convey it at all. For
example, if you agree to terms that obligate you to collect a royalty for
further conveying from those to whom you convey the Program, the only way
you could satisfy both those terms and this License would be to refrain
entirely from conveying the Program.
13. Offering the Program as a Service.
If you make the functionality of the Program or a modified version
available to third parties as a service, you must make the Service Source
Code available via network download to everyone at no charge, under the
terms of this License. Making the functionality of the Program or
modified version available to third parties as a service includes,
without limitation, enabling third parties to interact with the
functionality of the Program or modified version remotely through a
computer network, offering a service the value of which entirely or
primarily derives from the value of the Program or modified version, or
offering a service that accomplishes for users the primary purpose of the
Program or modified version.
“Service Source Code” means the Corresponding Source for the Program or
the modified version, and the Corresponding Source for all programs that
you use to make the Program or modified version available as a service,
including, without limitation, management software, user interfaces,
application program interfaces, automation software, monitoring software,
backup software, storage software and hosting software, all such that a
user could run an instance of the service using the Service Source Code
you make available.
14. Revised Versions of this License.
MongoDB, Inc. may publish revised and/or new versions of the Server Side
Public License from time to time. Such new versions will be similar in
spirit to the present version, but may differ in detail to address new
problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies that a certain numbered version of the Server Side Public
License “or any later version” applies to it, you have the option of
following the terms and conditions either of that numbered version or of
any later version published by MongoDB, Inc. If the Program does not
specify a version number of the Server Side Public License, you may
choose any version ever published by MongoDB, Inc.
If the Program specifies that a proxy can decide which future versions of
the Server Side Public License can be used, that proxy's public statement
of acceptance of a version permanently authorizes you to choose that
version for the Program.
Later license versions may give you additional or different permissions.
However, no additional obligations are imposed on any author or copyright
holder as a result of your choosing to follow a later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING
ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF
THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO
LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided above
cannot be given local legal effect according to their terms, reviewing
courts shall apply local law that most closely approximates an absolute
waiver of all civil liability in connection with the Program, unless a
warranty or assumption of liability accompanies a copy of the Program in
return for a fee.
END OF TERMS AND CONDITIONS

README.md Normal file
<div align="center">
<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/clearml/blob/master/docs/clearml-logo.svg?raw=true" width="250px"></a>
**ClearML - Auto-Magical Suite of tools to streamline your ML workflow<br/>Experiment Manager, ML-Ops and Data-Management**
</div>
# ClearML Documentation Website
The ClearML documentation website is built using [Docusaurus 2](https://v2.docusaurus.io/), a modern static website generator.
## Contributing (yes please!)
**PRs are always welcome** :heart:
Good PR examples:
* If you see something that is inaccurate or missing
* A topic that interests you is not addressed
* You feel that a guide would have made your life easier
* Anything you know / experienced that might also help other community members
* Setup and use-case examples
_May the force (and the goddess of learning rates) be with you!_

babel.config.js Normal file
/**
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*
* @format
*/
module.exports = {
presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
};

---
title: ClearML Session
---
Machine Learning and Deep Learning development is sometimes more challenging than traditional software development. If
you are working on an average laptop or computer, and you have a sizeable dataset that requires significant computation,
your local machine may not be able to provide you with the resources for an effective workflow.
If you can run and debug your code on your own machine, congrats you are lucky! Continue doing that, then clone your code
in the UI and send it for long-term training on a remote machine.
**If you are not that lucky**, this section is for you :)
## What Does ClearML Session Do?
`clearml-session` is a feature that allows you to launch a JupyterLab or VS Code session and execute code on a remote
machine that better meets your resource needs. The feature provides local links, which can be used to access
JupyterLab and VS Code on the remote machine over a secure, encrypted SSH connection.
![image](../img/clearml_session_jupyter.png)
## How it Works
ClearML lets you leverage a resource (e.g. a GPU or CPU machine) by utilizing the [ClearML Agent](../clearml_agent).
A ClearML Agent runs on the target machine, and ClearML Session instructs it to execute the JupyterLab / VS Code server for remote development.
After entering a `clearml-session` command with all
specifications:
1. `clearml-session` creates a new [Task](../fundamentals/task.md) that is responsible for setting up the SSH and
JupyterLab / VSCode environment, according to your specifications, on the host machine.
1. The Task is enqueued to the queue ClearML Agent listens to and then executed by it. It will download the appropriate server and execute it.
1. Once the Agent finishes the initial setup of the interactive Task, the local `clearml-session` connects to the host
   machine via SSH, and tunnels both SSH and JupyterLab over the SSH connection. If a specific Docker was specified, the
   JupyterLab environment will run inside the Docker.
1. The CLI outputs access links to the remote JupyterLab and VSCode sessions:
```console
Interactive session is running:
SSH: ssh root@localhost -p 8022 [password: c5d19b3c0fa9784ba4f6aeb568c1e036a4fc2a4bc7f9bfc54a2c198d64ceb9c8]
Jupyter Lab URL: http://localhost:8878/?token=ff7e5e8b9e5493a01b1a72530d18181320630b95f442b419
VSCode server available at http://localhost:8898/
```
Notice the links point to 'localhost', since all communication with the remote server is done over a secure SSH connection.
1. Now start working on the code as if you're running on the target machine itself!
## Features
### Running in Docker
To run a session inside a Docker container, use the `--docker` flag and enter the docker image to use in the interactive
session.
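For example, a session inside a CUDA container might be launched like this sketch (the image name is illustrative, borrowed from the default used elsewhere in these docs):

```shell
# Launch the interactive session inside a Docker container (image name is illustrative)
clearml-session --docker nvidia/cuda:10.1-runtime-ubuntu18.04
```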
### Passing requirements
`clearml-session` can download required Python packages. If the code to be executed in the remote session has
required packages, they can be specified. If there is a `requirements.txt` file, it can be attached to the
command using `--requirements </file/location.txt>`. Alternatively, packages can be entered manually, using `--packages "<package_name>"`
(for example `--packages "keras" "clearml"`).
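As a sketch, either form might look like the following (the file path is illustrative):

```shell
# Attach a requirements file...
clearml-session --requirements ./requirements.txt

# ...or list the packages explicitly
clearml-session --packages "keras" "clearml"
```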
### Passing Git credentials
To send the local `.git-credentials` file to the interactive session, add the `--git-credentials` flag and set it to `True`.
This way, git references can be tracked, including untracked changes.
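A minimal sketch of the command:

```shell
# Forward the local .git-credentials file to the remote session
clearml-session --git-credentials True
```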
### Re-launching and shutting down sessions
If a `clearml-session` was launched locally and is still running on a remote machine, users can easily reconnect to it.
To reconnect to a previous session, execute `clearml-session` with no additional flags, and the option of reconnecting
to an existing session will show up:
```console
Connect to active session id=c7302b564aa945408aaa40ac5c69399c [Y]/n?
```
If multiple sessions were launched from a local machine and are still active, choose the session to reconnect to:
```console
Active sessions:
0*] 2021-05-09 12:24:11 id=ed48fb83ad76430686b1abdbaa6eb1dd
1] 2021-05-09 12:06:48 id=009eb34abde74182a8be82f62af032ea
Connect to session [0-1] or 'N' to skip
```
To shut down a remote session, which will free the `clearml-agent` and close the CLI, enter "Shutdown". Once a session
is shut down, there is no option to reconnect to it.
### Connecting to existing session
If a `clearml-session` is running remotely, it's possible to continue working on the session from any machine. Starting a
session initializes a Task with a unique ID in the ClearML Server. To connect to an existing session:
1. Go to the ClearML UI, find the interactive session Task (by default it's in project "DevOps").
1. Click on the ID button to the right of the Task name, and copy the unique ID.
1. Enter the following command: `clearml-session --attach <session_id>`.
1. Click on the JupyterLab / VSCode link that is output, or connect directly to the SSH session.
### Starting a debugging session
Previously executed experiments in the ClearML system can be debugged on a remote interactive session.
Provide `clearml-session` with the ID of a Task to debug; `clearml-session` then clones the experiment's git repository and
replicates the environment on a remote machine. The code can then be interactively executed and debugged in JupyterLab / VSCode.
:::note
The Task must be connected to a git repository, since currently single script debugging is not supported.
:::
1. In the **ClearML web UI**, find the experiment (Task) that needs debugging.
1. Click on the ID button next to the Task name, and copy the unique ID.
1. Enter the following command: `clearml-session --debugging-session <experiment_id_here>`
1. Click on the JupyterLab / VSCode link, or connect directly to the SSH session.
1. In JupyterLab / VSCode, access the experiment's repository in the `environment/task_repository` folder.

docs/apps/clearml_task.md Normal file
---
title: ClearML Task
---
ClearML Task is ClearML's Zero Code Integration Module. Using only the command line and **zero** additional lines of code,
you can easily track your work and integrate ClearML with your existing code.
`clearml-task` automatically integrates ClearML into any script or **any** python repository. `clearml-task` has the option
to send the Task to a queue, where a **ClearML Agent** listening to the queue will fetch the Task and execute it on a
remote or local machine. It's even possible to provide command line arguments, Python module dependencies, and a requirements.txt file!
## How Does ClearML Task Work?
1. Execute `clearml-task`, pointing it to your script or repository, and optionally an execution queue.
1. `clearml-task` does its magic! It creates a new experiment on the [ClearML Server](../deploying_clearml/clearml_server.md),
and, if a queue was specified, it sends the experiment to the queue to be fetched and executed by a **ClearML Agent**.
1. The command line will provide you with a link to your Task's page in the ClearML web UI,
where you will be able to view the Task's details.
## Features and Options
### Docker
Specify a Docker container to run the code in with the `--docker <docker_image>` flag.
The ClearML Agent will pull it from Docker Hub or a Docker artifactory automatically.
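A sketch of such a command might look like the following (the project, task, script, and image names are assumptions):

```shell
# Run the script remotely inside a Docker container (names are illustrative)
clearml-task --project examples --name docker_example \
    --script train.py \
    --docker nvidia/cuda:10.1-runtime-ubuntu18.04
```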
### Package Dependencies
If the local script requires packages to be installed, or the remote repository doesn't have a requirements.txt file,
manually specify the required python packages using<br/>
`--packages "<package_name>"`, for example `--packages "keras" "tensorflow>2.2"`.
### Queue
Tasks are passed to ClearML Agents via [Queues](../fundamentals/agents_and_queues.md). Specify a queue to enqueue the Task to.
If a queue isn't chosen in the `clearml-task` command, the Task will not be executed; it will be left in draft mode,
and can be enqueued at a later point.
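For instance, enqueuing to a queue named "default" might look like this sketch (the project, task, and script names are assumptions):

```shell
# Send the Task straight to the "default" queue; omit --queue to keep it in draft mode
clearml-task --project examples --name queued_example --script train.py --queue default
```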
### Branch and Working Directory
To execute a specific branch and commit ID, rather than the latest commit in the master branch, pass the
`--branch <branch_name> --commit <commit_id>` flags.
If unspecified, `clearml-task` will use the latest commit from the master branch.
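Put together, running a specific branch and commit might be sketched as follows (the repository URL, branch, commit, and names are illustrative):

```shell
# Execute a specific branch and commit instead of the latest master commit
clearml-task --project examples --name branch_example \
    --repo https://github.com/user/repo.git \
    --branch dev --commit 1234567890abcdef \
    --script train.py
```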
Learn how to use the `clearml-task` feature [here](../guides/clearml-task/clearml_task_tutorial.md).

---
title: Remote Pycharm Debugging
---

docs/clearml_agent.md Normal file
---
title: ClearML Agent
---
**ClearML Agent** is a virtual environment and execution manager for DL / ML solutions on GPU machines. It integrates with the **ClearML Python Package** and **ClearML Server** to provide a full AI cluster solution. <br/>
Its main focus is:
- Reproducing experiments, including their complete environments.
- Scaling workflows on multiple target machines.
**ClearML Agent** executes an experiment or other workflow by reproducing the state of the code from the original machine
to a remote machine, and executing the code as follows:
1. **ClearML Agent** creates a new Python virtual environment (for every experiment).
1. In the new Python virtual environment, **ClearML Agent** installs the required Python package versions.
1. **ClearML Agent** clones the Git repository based on the definition stored in the experiment.
1. **ClearML Agent** applies the uncommitted changes to the newly cloned code.
1. Once the state of the code is reproduced on a remote machine, **ClearML Agent** runs the Python script based on the
working directory and entry point stored in the experiment. It executes with logging and monitoring.
1. While the Task is executing, and anytime after, track the experiment and visualize results in the **ClearML Web UI**.
Continue using **ClearML Agent** once it is running on a target machine. Reproduce experiments and execute
automated workflows in one (or both) of the following ways:
* Programmatically
* By using the **ClearML Web UI** (without directly working with code), by enqueuing experiments
to the queue that a **ClearML Agent** is listening to.
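The programmatic path might be sketched as follows (a minimal sketch; the queue name is an assumption, and a configured ClearML server with the `clearml` package installed is required):

```python
def clone_and_enqueue(task_id: str, queue_name: str = "default") -> str:
    """Clone an existing experiment and send the clone to an agent's queue."""
    from clearml import Task  # requires the clearml package and a configured server

    template = Task.get_task(task_id=task_id)     # fetch the original experiment
    cloned = Task.clone(source_task=template)     # create an editable copy
    Task.enqueue(cloned, queue_name=queue_name)   # an agent listening on the queue will execute it
    return cloned.id
```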
For more information, see [ClearML Agent Reference](references/clearml_agent_ref.md),
and [configuration options](configs/clearml_conf.md#agent-section).
## Installation
:::note
If **ClearML** was previously configured, follow [this](clearml_agent#adding-clearml-agent-to-a-configuration-file) to add
ClearML Agent specific configurations
:::
To install ClearML Agent, execute
```bash
pip install clearml-agent
```
## Configuration
1. In a terminal session, execute
```bash
clearml-agent init
```
The setup wizard prompts for **ClearML** credentials (see [here](webapp/webapp_profile.md#creating-clearml-credentials) about obtaining credentials).
CLEARML-AGENT setup process
Please create new clearml credentials through the profile page in your clearml web app (e.g., https://demoapp.demo.clear.ml/profile)
In the profile page, press "Create new credentials", then press "Copy to clipboard".
Paste copied configuration here:
If the setup wizard's response indicates that a configuration file already exists, follow the instructions [here](#adding-clearml-agent-to-a-configuration-file).
The wizard does not edit or overwrite existing configuration files.
1. At the command prompt `Paste copied configuration here:`, copy and paste the **ClearML** credentials and press **Enter**.
The setup wizard confirms the credentials.
Detected credentials key="********************" secret="*******"
1. Press **Enter** to accept the default server URL, which is detected from the credentials, or enter a ClearML web server URL.
   A secure protocol, https, must be used. **Do not use http.**
WEB Host configured to: [https://app.community.clear.ml]
:::note
If you are using a self-hosted ClearML Server, the default URL will use your domain.
:::
1. Do the same for the API server and file server URLs.
1. The wizard responds with your configuration:
CLEARML Hosts configuration:
Web App: https://app.community.clear.ml
API: https://demoapi.clearml.allegro.ai
File Store: https://demofiles.clearml.allegro.ai
Verifying credentials ...
Credentials verified!
1. Enter your Git username and password. Leave blank for SSH key authentication or when only using public repositories.<br/>
This is needed for cloning repositories by the agent.
Enter git username for repository cloning (leave blank for SSH key authentication): []
Enter password for user '<username>':
The setup wizard confirms your git credentials.
Git repository cloning will be using user=<username> password=<password>
1. Enter an additional artifact repository, or press **Enter** if not required.<br/>
This is needed for installing Python packages not found in pypi.
Enter additional artifact repository (extra-index-url) to use when installing python packages (leave blank if not required):
The setup wizard completes.
New configuration stored in /home/<username>/clearml.conf
CLEARML-AGENT setup completed successfully.
The configuration file location depends upon the operating system:
* Linux - `~/clearml.conf`
* Mac - `$HOME/clearml.conf`
* Windows - `\User\<username>\clearml.conf`
1. Optionally, configure **ClearML** options for **ClearML Agent** (default docker, package manager, etc.). See the [ClearML Configuration Reference](configs/clearml_conf.md).
### Adding ClearML Agent to a configuration file
In case a `clearml.conf` file already exists, add a few ClearML Agent specific configurations to it.<br/>
**Adding ClearML Agent to a ClearML configuration file:**
1. Open the **ClearML** configuration file for editing. Depending upon the operating system, it is:
* Linux - `~/clearml.conf`
* Mac - `$HOME/clearml.conf`
* Windows - `\User\<username>\clearml.conf`
1. After the `api` section, add the following `agent` section:
agent {
# Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
# leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
git_user: ""
git_pass: ""
# Limit credentials to a single domain, for example: github.com,
# all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
git_host=""
# Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
force_git_ssh_protocol: false
# Force a specific SSH port when converting http to ssh links (the domain is kept the same)
# force_git_ssh_port: 0
# Force a specific SSH username when converting http to ssh links (the default username is 'git')
# force_git_ssh_user: git
# unique name of this worker, if None, created based on hostname:process_id
# Override with os environment: CLEARML_WORKER_ID
# worker_id: "clearml-agent-machine1:gpu0"
worker_id: ""
# worker name, replaces the hostname when creating a unique name for this worker
# Override with os environment: CLEARML_WORKER_ID
# worker_name: "clearml-agent-machine1"
worker_name: ""
# Set the python version to use when creating the virtual environment and launching the experiment
# Example values: "/usr/bin/python3" or "/usr/local/bin/python3.6"
# The default is the python executing the clearml_agent
python_binary: ""
# select python package manager:
# currently supported pip and conda
# poetry is used if pip selected and repository contains poetry.lock file
package_manager: {
# supported options: pip, conda, poetry
type: pip,
# specify pip version to use (examples "<20", "==19.3.1", "", empty string will install the latest version)
pip_version: "<20.2",
# virtual environment inherits packages from system
system_site_packages: false,
# install with --upgrade
force_upgrade: false,
# additional artifact repositories to use when installing python packages
# extra_index_url: ["https://allegroai.jfrog.io/clearmlai/api/pypi/public/simple"]
extra_index_url: []
# additional conda channels to use when installing with conda package manager
conda_channels: ["defaults", "conda-forge", "pytorch", ]
# conda_full_env_update: false
# conda_env_as_base_docker: false
# set the priority packages to be installed before the rest of the required packages
# priority_packages: ["cython", "numpy", "setuptools", ]
# set the optional priority packages to be installed before the rest of the required packages,
# In case a package installation fails, the package will be ignored,
# and the virtual environment process will continue
# priority_optional_packages: ["pygobject", ]
# set the post packages to be installed after all the rest of the required packages
# post_packages: ["horovod", ]
# set the optional post packages to be installed after all the rest of the required packages,
# In case a package installation fails, the package will be ignored,
# and the virtual environment process will continue
# post_optional_packages: []
# set to True to support torch nightly build installation,
# notice: torch nightly builds are ephemeral and are deleted from time to time
torch_nightly: false,
},
# target folder for virtual environments builds, created when executing experiment
venvs_dir = ~/.clearml/venvs-builds
# cached virtual environment folder
venvs_cache: {
# maximum number of cached venvs
max_entries: 10
# minimum required free space to allow for cache entry, disable by passing 0 or negative value
free_space_threshold_gb: 2.0
# unmark to enable virtual environment caching
# path: ~/.clearml/venvs-cache
},
# cached git clone folder
vcs_cache: {
enabled: true,
path: ~/.clearml/vcs-cache
},
# use venv-update in order to accelerate python virtual environment building
# Still in beta, turned off by default
venv_update: {
enabled: false,
},
# cached folder for specific python package download (mostly pytorch versions)
pip_download_cache {
enabled: true,
path: ~/.clearml/pip-download-cache
},
translate_ssh: true,
# reload configuration file every daemon execution
reload_config: false,
# pip cache folder mapped into docker, used for python package caching
docker_pip_cache = ~/.clearml/pip-cache
# apt cache folder mapped into docker, used for ubuntu package caching
docker_apt_cache = ~/.clearml/apt-cache
# optional arguments to pass to docker image
# these are local for this agent and will not be updated in the experiment's docker_cmd section
# extra_docker_arguments: ["--ipc=host", "-v", "/mnt/host/data:/mnt/data"]
# optional shell script to run in docker when started before the experiment is started
# extra_docker_shell_script: ["apt-get install -y bindfs", ]
# Install the required packages for opencv libraries (libsm6 libxext6 libxrender-dev libglib2.0-0),
# for backwards compatibility reasons, true as default,
# change to false to skip installation and decrease docker spin up time
# docker_install_opencv_libs: true
# set to true in order to force "docker pull" before running an experiment using a docker image.
# This makes sure the docker image is updated.
docker_force_pull: false
default_docker: {
# default docker image to use when running in docker mode
image: "nvidia/cuda:10.1-runtime-ubuntu18.04"
# optional arguments to pass to docker image
# arguments: ["--ipc=host", ]
}
# set the OS environments based on the Task's Environment section before launching the Task process.
enable_task_env: false
# CUDA versions used for Conda setup & solving PyTorch wheel packages
# It should be detected automatically. Override with os environment CUDA_VERSION / CUDNN_VERSION
# cuda_version: 10.1
# cudnn_version: 7.6
}
1. Save the configuration.
## Execution
### Simple Execution
#### Executing an Agent
To execute an agent listening to a queue, run:
```bash
clearml-agent daemon --queue <queue_name>
```
#### Executing in Background
To execute an agent in the background, run:
```bash
clearml-agent daemon --queue <execution_queue_to_pull_from> --detached
```
#### Stopping Agents
To stop an agent running in the background, run:
```bash
clearml-agent daemon <arguments> --stop
```
#### Allocating Resources
To specify GPUs associated with the agent, add the `--gpus` flag.
To execute multiple agents on the same machine (usually assigning GPU for the different agents), run:
```bash
clearml-agent daemon --detached --queue default --gpus 0
clearml-agent daemon --detached --queue default --gpus 1
```
To allocate more than one GPU, provide a list of allocated GPUs:
```bash
clearml-agent daemon --gpus 0,1 --queue dual_gpu &
```
#### Queue Prioritization
A single agent can listen to multiple queues. The priority is set by their order.
```bash
clearml-agent daemon --detached --queue high_q low_q --gpus 0
```
This ensures the agent first tries to pull a Task from the “high_q” queue, and only if it is empty, the agent will try to pull
from the “low_q” queue.
To make sure an agent pulls from all queues equally, add the `--order-fairness` flag.
```bash
clearml-agent daemon --detached --queue group_a group_b --order-fairness --gpus 0
```
This makes sure the agent pulls from the “group_a” queue, then from “group_b”, then back to “group_a”, and so on. It ensures
that “group_a” and “group_b” cannot starve one another of resources.
### Explicit Task execution
ClearML Agent can also execute specific tasks directly, without listening to a queue.
#### Execute a Task without queue
Execute a Task with a `clearml-agent` worker without a queue.
```bash
clearml-agent execute --id <task-id>
```
#### Clone a Task and execute the cloned Task
Clone the specified Task and execute the cloned Task with a `clearml-agent` worker without a queue.
```bash
clearml-agent execute --id <task-id> --clone
```
#### Execute Task inside a Docker
Execute a Task with a `clearml-agent` worker using a Docker container without a queue.
```bash
clearml-agent execute --id <task-id> --docker
```
### Debugging
* Run a `clearml-agent` daemon in foreground mode, sending all output to the console.
```bash
clearml-agent daemon --queue default --foreground
```
## Building Docker Containers
### Task Container
Build a Docker container that when launched executes a specific experiment, or a clone (copy) of that experiment.
- Build a Docker container that at launch will execute a specific Task.
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point reuse_task
```
- Build a Docker container that at launch will clone a Task specified by Task ID, and will execute the newly cloned Task.
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point clone_task
```
- Run built Docker by executing:
```bash
docker run <new-docker-name>
```
### Base Docker Container
Build a Docker container according to the execution environment of a specific Task.
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name>
```
It's possible to add the Docker container as the base Docker image to a Task (experiment), using one of the following methods:
- Using the **ClearML Web UI** - See [Base Docker image](webapp/webapp_exp_tuning.md#base-docker-image) on the "Tuning
Experiments" page.
- In the **ClearML** configuration file - Use the **ClearML** configuration file [agent.default_docker](configs/clearml_conf.md#agentdefault_docker)
options.
## Execution Environments
ClearML Agent supports executing tasks in multiple environments.
### PIP Mode
By default, ClearML Agent works in PIP Mode, in which it uses [pip](https://en.wikipedia.org/wiki/Pip_(package_manager))
as the package manager. When ClearML runs, it will create a virtual environment
(or reuse an existing one, see [here](clearml_agent.md#virtual-environment-reuse)).
Task dependencies (Python packages) will be installed in the virtual environment.
### Conda Mode
This mode is similar to the PIP mode but uses [Conda](https://docs.conda.io/en/latest/) as the package
manager. To enable Conda mode, edit the `clearml.conf` file, and modify the `type: pip` to `type: conda` in the “package_manager” section.
If extra conda channels are needed, look for “conda_channels” under “package_manager”, and add the missing channel.
### Poetry Mode
This mode is similar to the PIP mode but uses [Poetry](https://python-poetry.org/) as the package manager.
To enable Poetry mode, edit the `clearml.conf` file, and modify the `type: pip` to `type: poetry` in the “package_manager”
section.
### Docker Mode
:::note
Docker Mode is only supported on Linux.<br/>
Docker Mode requires Docker service v19.03 or higher installed.
:::
When executing the ClearML Agent in Docker mode, it will:
1. Run the provided Docker container
1. Install ClearML Agent in the container
1. Execute the Task in the container, and monitor the process.
ClearML Agent uses the provided default Docker container, which can be overridden from the UI.
All ClearML Agent flags (such as `--gpus` and `--foreground`) are applicable to Docker mode as well.
To execute ClearML Agent in Docker mode, run:
```bash
clearml-agent daemon --queue <execution_queue_to_pull_from> --docker [optional default docker image to use]
```
To use the current `clearml-agent` version in the Docker container, instead of the latest `clearml-agent` version that is
automatically installed, run:
```bash
clearml-agent daemon --queue default --docker --force-current-version
```
For Kubernetes, specify a host mount on the daemon host. Do not use the host mount inside the Docker container.
Set the environment variable `CLEARML_AGENT_K8S_HOST_MOUNT`.
For example:
```
CLEARML_AGENT_K8S_HOST_MOUNT=/mnt/host/data:/root/.clearml
```
## Environment Caching
ClearML Agent caches virtual environments so when running experiments multiple times, there's no need to spend time reinstalling
pre-installed packages. To make use of the cached virtual environments, enable the virtual environment reuse mechanism.
#### Virtual Environment Reuse
The virtual environment reuse feature may reduce experiment startup time dramatically.
By default, ClearML uses the package manager's environment caching. This means that even if no
new packages need to be installed, checking the list of packages can take a long time.
ClearML has a virtual environment reuse mechanism which, when enabled, allows using environments as-is without resolving
installed packages. This means that when executing multiple experiments with the same package dependencies,
the same environment will be used.
:::note
ClearML does not support environment reuse when using Poetry package manager
:::
To enable environment reuse, modify the `clearml.conf` file and unmark the venvs_cache section.
```
venvs_cache: {
# maximum number of cached venvs
max_entries: 10
# minimum required free space to allow for cache entry, disable by passing 0 or negative value
free_space_threshold_gb: 2.0
# unmark to enable virtual environment caching
# path: ~/.clearml/venvs-cache
},
```
## Services Mode
The ClearML Agent Services Mode executes an Agent that can execute multiple Tasks. This is useful for Tasks that are mostly
idling, such as periodic cleanup services, or a [pipeline controller](references/sdk/automation_controller_pipelinecontroller.md).
Launch a service Task like any other Task, by enqueuing it to the appropriate queue.
:::note
The default `clearml-server` configuration already runs a single `clearml-agent` in services mode that listens to the “services” queue.
:::
To run a `clearml-agent` in services mode, run:
```bash
clearml-agent daemon --services-mode --queue services --create-queue --docker <docker_name> --cpu-only
```
:::note
`services-mode` currently only supports Docker mode. Each service is spun up in its own Docker container.
:::
:::warning
Do not enqueue training or inference Tasks into the services queue. They will put an unnecessary load on the server.
:::

docs/clearml_data.md Normal file
---
title: ClearML Data
---
In Machine Learning, you are very likely dealing with a gargantuan amount of data that you need to put in a dataset,
which you then need to be able to share, reproduce, and track.
ClearML Data Management solves two important challenges:
- Accessibility - Making data easily accessible from every machine,
- Versioning - Linking data and experiments for better **traceability**.
**We believe Data is not code**. It should not be stored in a git tree, because progress on datasets is not always linear.
Moreover, it can be difficult and inefficient to find, in a git tree, the commit associated with a certain version of a dataset.
Data usage in experiments needs to be highly observable and understandable, and not just by data scientists.
`clearml-data` allows you to easily create new, flexible datasets, in which users can add and remove files. These datasets can
be retrieved simply from any machine with physical or network access to the data. Additionally, datasets can be set up to
inherit from other datasets, so data lineages can be created, and users can track when and how their data changes.
`clearml-data` utilizes existing object storage like S3/GS/Azure and even plain file system shares.
Datasets are stored in a binary differential format, optimizing storage and network usage. Local copies
of datasets are always cached, so the same data never needs to be downloaded twice.
ClearML-data offers two interfaces:
- `clearml-data` - CLI utility for creating, uploading, and managing datasets.
- `clearml.Dataset` - A python interface for creating, retrieving, managing, and using datasets.
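As a sketch, the Python interface side of the same flow might look like this (a minimal sketch; it assumes a configured ClearML server, the `clearml` package installed, and an existing local `data_folder`):

```python
from clearml import Dataset

# Create a new dataset version, stage local files, and upload them
ds = Dataset.create(dataset_project="dataset_example", dataset_name="initial_version")
ds.add_files(path="data_folder")  # stage local files for upload
ds.upload()                       # push the files to the configured storage
ds.finalize()                     # close the version so it can be consumed
print(ds.id)                      # the unique ID used to retrieve the dataset later
```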
## Creating a Dataset
Using the `clearml-data` CLI, users can create datasets using the following commands:
```bash
clearml-data create --project dataset_example --name initial_version
clearml-data add --files data_folder
```
The commands will do the following:
1. Start a Data Processing Task called "initial_version" in the "dataset_example" project
1. The CLI will return a unique ID for the dataset
1. All the files from the "data_folder" folder will be added to the dataset and uploaded
by default to the [ClearML server](deploying_clearml/clearml_server.md).
:::note
`clearml-data` is stateful and remembers the last created dataset so there's no need to specify a specific dataset ID unless
we want to work on another dataset.
:::
## Using a Dataset
Now in our python code, we can access and use the created dataset from anywhere:
```python
from clearml import Dataset
local_path = Dataset.get(dataset_id='dataset_id_from_previous_command').get_local_copy()
```
We have all our files in the same folder structure under `local_path`. It is that simple!<br/>
The next step is to set the dataset_id as a parameter for our code and voilà! We can now train on any dataset we have in
the system.
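One way to wire that up is a small argument-parsing sketch (the argument name is an assumption; fetching the dataset itself requires a configured ClearML server and the `clearml` package):

```python
import argparse

def parse_args(argv=None):
    # the dataset to train on is chosen at run time instead of being hard-coded
    parser = argparse.ArgumentParser(description="train on a ClearML dataset")
    parser.add_argument("--dataset-id", required=True, help="ClearML dataset ID")
    return parser.parse_args(argv)

def fetch_dataset(dataset_id: str) -> str:
    """Return a cached local copy of the dataset (needs a configured ClearML server)."""
    from clearml import Dataset  # imported lazily so argument parsing works offline
    return Dataset.get(dataset_id=dataset_id).get_local_copy()

# usage (on a configured machine):
#   args = parse_args()
#   local_path = fetch_dataset(args.dataset_id)
```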
## Setup
`clearml-data` comes built-in with our `clearml` python package! Just check out the [getting started](getting_started/ds/ds_first_steps.md) guide for more info!
## Usage
### CLI
It's possible to manage datasets (create / modify / upload / delete) with the `clearml-data` command line tool.
#### Creating a Dataset
```bash
clearml-data create --project <project_name> --name <dataset_name> --parents <existing_dataset_id>
```
Creates a new dataset. <br/>
**Parameters**
|Name|Description|Optional|
|---|---|---|
|name |Dataset's name| <img src="/icons/ico-optional-no.svg" className="icon size-md center-md" /> |
|project|Dataset's project| <img src="/icons/ico-optional-no.svg" className="icon size-md center-md" /> |
|parents|IDs of the dataset's parents. The dataset inherits all of its parents' content. Multiple parents can be entered, but they are merged in the order they were entered| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|tags |Dataset user tags. The dataset can be labeled, which can be useful for organizing datasets| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
:::important
`clearml-data` works in a stateful mode, so once a new dataset is created, subsequent commands
do not require the `--id` flag.
:::
<br/>
#### Add Files to Dataset
```bash
clearml-data add --id <dataset_id> --files <filenames/folders_to_add>
```
It's possible to add individual files or complete folders.<br/>
**Parameters**
|Name|Description|Optional|
|---|---|---|
|id | Dataset's ID. Default: previously created / accessed dataset| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|files|Files / folders to add. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json` | <img src="/icons/ico-optional-no.svg" className="icon size-md center-md" /> |
|dataset-folder | Dataset base folder to add the files to in the dataset. Default: dataset root| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|non-recursive | Disable recursive scan of files | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|verbose | Verbose reporting | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
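Wildcard selection follows standard shell-style globbing. The snippet below shows how a pattern like `*.jpg` matches file names, using Python's `fnmatch`, which implements the same rules:

```python
from fnmatch import fnmatch

files = ["data/img_001.jpg", "data/img_002.png", "data/labels.json"]

# Keep only files matching the wildcard, as `--files 'data/*.jpg'` would.
jpgs = [f for f in files if fnmatch(f, "data/*.jpg")]
```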
<br/>
#### Remove Files From Dataset
```bash
clearml-data remove --id <dataset_id_to_remove_from> --files <filenames/folders_to_remove>
```
**Parameters**
|Name|Description|Optional|
|---|---|---|
|id | Dataset's ID. Default: previously created / accessed dataset| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|files | Files / folders to remove (wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`). Notice: file path is the path within the dataset, not the local path.| <img src="/icons/ico-optional-no.svg" className="icon size-md center-md" /> |
|non-recursive | Disable recursive scan of files | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|verbose | Verbose reporting | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### Finalize Dataset
```bash
clearml-data close --id <dataset_id>
```
Finalizes the dataset and makes it ready to be consumed.
It automatically uploads all files that were not previously uploaded.
Once a dataset is finalized, it can no longer be modified.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|id| Dataset's ID. Default: previously created / accessed dataset| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|storage| Remote storage to use for the dataset files. Default: files_server | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|disable-upload | Disable automatic upload when closing the dataset | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|verbose | Verbose reporting | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### Upload Dataset Content
```bash
clearml-data upload [--id <dataset_id>] [--storage <upload_destination>]
```
Uploads added files to [ClearML Server](deploying_clearml/clearml_server.md) by default. It's possible to specify a different storage
medium by entering an upload destination, such as `s3://bucket`, `gs://`, `azure://`, `/mnt/shared/`.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|id| Dataset's ID. Default: previously created / accessed dataset| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|storage| Remote storage to use for the dataset files. Default: files_server | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|verbose | Verbose reporting | <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### Sync Local Folder
```bash
clearml-data sync [--id <dataset_id>] --folder <folder_location> [--parents '<parent_id>']
```
This option syncs a folder's content with ClearML. It is useful when a user maintains a single point of truth (i.e. a folder) that
is updated from time to time.
When an update should be reflected in ClearML, users can call `clearml-data sync`, create a new dataset, and point it at the folder;
all changes (file additions, modifications, and removals) will be reflected in ClearML.
This command also uploads the data and finalizes the dataset automatically.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|id| Dataset's ID. Default: previously created / accessed dataset| <img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" /> |
|folder|Local folder to sync. Wildcard selection is supported, for example: `~/data/*.jpg ~/data/json`|<img src="/icons/ico-optional-no.svg" className="icon size-md center-md" />|
|storage|Remote storage to use for the dataset files. Default: files_server |<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|parents|IDs of the dataset's parents (i.e. merge all parents). All modifications made to the folder since the parents were synced will be reflected in the dataset|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|project|If creating a new dataset, specify the dataset's project name|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|name|If creating a new dataset, specify the dataset's name|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|tags|Dataset user tags|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|skip-close|Do not auto close dataset after syncing folders|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|verbose | Verbose reporting |<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### List Dataset Content
```bash
clearml-data list [--id <dataset_id>]
```
**Parameters**
|Name|Description|Optional|
|---|---|---|
|id|Dataset ID whose contents will be shown (alternatively, use project / name combination). Default: previously accessed dataset|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|project|Specify dataset project name (if used instead of ID, dataset name is also required)|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|name|Specify dataset name (if used instead of ID, dataset project is also required)|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|filter|Filter files based on folder / wildcard. Multiple filters are supported. Example: `folder/date_*.json folder/sub-folder`|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|modified|Only list file changes (add / remove / modify) introduced in this version|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### Delete a Dataset
```bash
clearml-data delete [--id <dataset_id_to_delete>]
```
Deletes an entire dataset from ClearML. This can also be used to delete a newly created dataset.
This does not work on datasets with children.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|id|ID of dataset to be deleted. Default: previously created / accessed dataset that hasn't been finalized yet|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|force|Force dataset deletion even if other dataset versions depend on it|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### Search for a Dataset
```bash
clearml-data search [--name <name>] [--project <project_name>] [--tags <tag>]
```
Lists all datasets in the system that match the search request.
Datasets can be searched by project, name, ID, and tags.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|ids|A list of dataset IDs|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|project|The project name of the datasets|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|name|A dataset name or a partial name to filter datasets by|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|tags|A list of dataset user tags|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
### Python API
All API commands should be imported with<br/> `from clearml import Dataset`
#### `Dataset.get(dataset_id='<DS_ID>').get_local_copy()`
Returns a path to the dataset in the local cache, downloading the dataset first if it is not already cached.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|use_soft_links|If True, use soft links. Default: False on Windows, True on Posix systems|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|raise_on_error|If True, raise exception if dataset merging failed on any file|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### `Dataset.get(dataset_id='<DS_ID>').get_mutable_local_copy()`
Downloads the dataset to a specific folder (non-cached). If the folder already has contents, specify whether to overwrite
its contents with the dataset contents.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|target_folder|Local target folder for the writable copy of the dataset|<img src="/icons/ico-optional-no.svg" className="icon size-md center-md" />|
|overwrite|If True, recursively delete the contents of the target folder before creating a copy of the dataset. If False (default) and target folder contains files, raise exception or return None|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|raise_on_error|If True, raise exception if dataset merging failed on any file|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
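The `overwrite` semantics can be modeled as: refuse to write into a non-empty target unless told to wipe it first. This is a simplified sketch of the documented behavior, with a hypothetical helper name:

```python
import shutil
import tempfile
from pathlib import Path


def prepare_target(target: Path, overwrite: bool) -> Path:
    """Mimic get_mutable_local_copy()'s target-folder handling."""
    if target.exists() and any(target.iterdir()):
        if not overwrite:
            raise RuntimeError(f"{target} is not empty; pass overwrite=True")
        shutil.rmtree(target)  # recursively delete the existing contents
    target.mkdir(parents=True, exist_ok=True)
    return target


base = Path(tempfile.mkdtemp())
(base / "old.txt").write_text("stale")
try:
    prepare_target(base, overwrite=False)
    conflicted = False
except RuntimeError:
    conflicted = True
prepare_target(base, overwrite=True)  # wipes and recreates the folder
```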
<br/>
#### `Dataset.create()`
Create a new dataset.
Parent datasets can be specified, and the new dataset inherits all of its parents' content. Multiple dataset parents can
be listed. Parent datasets are merged in the order they appear in the list, where each parent can override overlapping files
from the previous parent dataset.
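The merge order can be modeled as successive dictionary updates, where later parents win on overlapping files. A minimal sketch of the documented semantics:

```python
def merge_parents(parents):
    """Merge parent datasets in list order; later parents win on conflicts."""
    merged = {}
    for parent in parents:  # each parent maps file path -> version/content
        merged.update(parent)
    return merged


parent_a = {"train.csv": "v1", "labels.json": "v1"}
parent_b = {"train.csv": "v2"}  # overrides parent_a's train.csv

merged = merge_parents([parent_a, parent_b])
```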
**Parameters**
|Name|Description|Optional|
|---|---|---|
|dataset_name|Name of the new dataset|<img src="/icons/ico-optional-no.svg" className="icon size-md center-md" />|
|dataset_project|The project containing the dataset. If not specified, infer project name from parent datasets. If there is no parent dataset, then this value is required|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|parent_datasets|Expand a parent dataset by adding / removing files|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|use_current_task|If True, the dataset is created on the current Task. Default: False|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### `Dataset.add_files()`
Add files or folder into the current dataset.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|path|Add a folder / file to the dataset|<img src="/icons/ico-optional-no.svg" className="icon size-md center-md" />|
|wildcard|Add only a specific set of files based on wildcard matching. Wildcard matching can be a single string or a list of wildcards, for example: `~/data/*.jpg`, `~/data/json`|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|local_base_folder|Files will be located based on their relative path from local_base_folder|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|dataset_path|Where in the dataset the folder / files should be located|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|recursive|If True, match all wildcard files recursively|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|verbose| If True, print to console files added / modified|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
<br/>
#### `Dataset.upload()`
Starts uploading the files. The function returns only when all files have been uploaded.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|show_progress|If True, show upload progress bar|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|verbose|If True, print verbose progress report|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|output_url|Target storage for the compressed dataset (default: file server). Examples: `s3://bucket/data`, `gs://bucket/data` , `azure://bucket/data`, `/mnt/share/data` |<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|compression|Compression algorithm for the Zipped dataset file (default: ZIP_DEFLATED)|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
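`ZIP_DEFLATED` is the standard zlib-based zip compression, the same constant exposed by Python's `zipfile` module:

```python
import io
import zipfile

buf = io.BytesIO()
# Default dataset compression: a zip archive using the DEFLATE algorithm.
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data/sample.txt", "some dataset content " * 100)

# Reading the archive back restores the original file content.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    restored = zf.read("data/sample.txt").decode()
```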
<br/>
#### `Dataset.finalize()`
Closes the dataset and marks it as *Completed*. After a dataset has been closed, it can no longer be modified.
Before closing a dataset, its files must first be uploaded.
**Parameters**
|Name|Description|Optional|
|---|---|---|
|verbose|If True, print verbose progress report|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
|raise_on_error|If True, raise exception if dataset finalizing failed|<img src="/icons/ico-optional-yes.svg" className="icon size-md center-md" />|
---
title: ClearML SDK
---
The **ClearML Python Package** supports the [automatic logging](fundamentals/logger.md#automatic-reporting)
that documents the experiment for you, and an extensive set of powerful features and functionality you can use to improve experimentation and other workflows, and to get more out of **ClearML**.
The **ClearML Python Package** collects data from scripts including the Git repository (branch, commit ID, and uncommitted changes), working directory and entry point, hyperparameters, initial weights model, model snapshots (checkpoints), output model, other artifacts, metrics, logs, other reported data (from libraries and visualization toolkits), and debug samples.
In conjunction with the **ClearML Hosted Service** (or self-hosted **ClearML Server**) and **ClearML Agent**, the **ClearML Python Package** allows you and your teammates to collaborate programmatically and using the **ClearML Web UI**.
## Modules
* [Task](references/sdk/task.md) - The `task` module contains the `task.Task` class which is the code template for all `Task` features and functionality, including collecting data from scripts, storing that data in a `Task` object, automatic bindings with frameworks (TensorFlow/TensorBoard, PyTorch, Keras, Fastai, scikit-learn), libraries (Pandas, Plotly, AutoKeras), and visualization tools (Matplotlib, Seaborn), and a robust set of methods for Task execution, cloning, connecting parameter dictionaries, configurations, models, working with storage, and more.
* [Logger](references/sdk/logger.md) - The `logger` module contains the `logger.Logger` class which is the **ClearML** console log and metric statistics interface, and contains methods for explicit reporting, setting an upload destination in storage for debug samples, logger cache control, and TensorBoard support in addition to **ClearML** automatic TensorBoard logging.
* [Model](references/sdk/model_model.md) - The `model` module contains three classes: `model.Model` which represents an existing model in **ClearML** that can be loaded and connected to a Task, `model.InputModel` which represents an existing model that you can load into **ClearML**, and `model.OutputModel` which represents the experiment output model that is always connected to the Task.
* [Automation](references/sdk/automation_controller_pipelinecontroller.md) - The `automation` module contains classes supporting hyperparameter optimization, including Optuna, HpBandSter, grid searching, random searching, you own customized search strategies, and resource budgeting for searches; the AWS autoscaler; pipeline controllers; and Task monitoring.
* [StorageManager](references/sdk/storage.md) - The `storage` module contains the `storage.manager.StorageManager` class which provides support for downloading and uploading from storage, including folders, S3, Google Cloud Storage, Azure Storage, and http(s).
* [Dataset](references/sdk/dataset) - The `dataset` module contains classes that help manage datasets. Users can create, modify, and delete datasets, as well as retrieve them for use in their code.
## Examples
**ClearML** example scripts are located in the [examples folder](https://github.com/allegroai/clearml/tree/master/examples) of the `clearml` GitHub repository. They are pre-loaded in the **ClearML Hosted Service** under the `ClearML Examples` project, and can be viewed, cloned, and edited in the **ClearML Web UI**. The examples are each explained in this documentation's [examples section](guides/main.md).
---
title: Community Resources
---
## Join the ClearML Conversation
For feature requests or bug reports, see **ClearML** [GitHub Issues](https://github.com/allegroai/trains/issues).
If you have any questions, post on the **allegroai-clearml** [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg).
Or, tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/clearml) with the **clearml** tag.
You can always find us at [clearml@allegro.ai](mailto:clearml@allegro.ai?subject=ClearML).
## Allegro AI Resources
Read the [Allegro Blogs](https://allegro.ai/blog/).
Subscribe to the **ClearML** [Youtube Channel](https://www.youtube.com/c/ClearML) and view the tutorials, presentations, and discussions.
Join us on Twitter [@allegroAI](https://twitter.com/clearmlapp) for **ClearML** announcements and community discussions.
Follow **ClearML** on [LinkedIn](https://www.linkedin.com/company/clearml).
## Guidelines for Contributing
Firstly, we thank you for taking the time to contribute!
Contribution comes in many forms:
* Reporting [issues](https://github.com/allegroai/clearml/issues) you've come upon
* Participating in issue discussions in the [issue tracker](https://github.com/allegroai/clearml/issues) and the
[ClearML community slack space](https://join.slack.com/t/clearml/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg)
* Suggesting new features or enhancements
* Implementing new features or fixing outstanding issues
The list above is primarily guidelines, not rules. Use your best judgment and feel free to propose changes to this document in a pull request.
## Reporting Issues
By following these guidelines, you help maintainers and the community understand your report, reproduce the behavior, and find related reports.
Before reporting an issue, please check whether it already appears [here](https://github.com/allegroai/clearml/issues). If
it does, join the ongoing discussion instead.
:::note
If you find a **Closed** issue that may be the same issue that you are currently experiencing, then open a **New** issue
and include a link to the original (Closed) issue in the body of your new one.
:::
When reporting an issue, please include as much detail as possible; explain the problem and include additional details to
help maintainers reproduce the problem:
* **Use a clear and descriptive title** for the issue to identify the problem.
* **Describe the exact steps necessary to reproduce the problem** in as much detail as possible. Please do not just summarize what you did. Make sure to explain how you did it.
* **Provide the specific environment setup.** Include the ``pip freeze`` output, specific environment variables, Python version, and other relevant information.
* **Provide specific examples to demonstrate the steps.** Include links to files or GitHub projects, or copy / paste snippets which you use in those examples.
* **If you are reporting any ClearML crash,** include a crash report with a stack trace from the operating system. Make
sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests), or just put it in
a [gist](https://gist.github.com) (and provide a link to that gist).
* **Describe the behavior you observed after following the steps** and the exact problem with that behavior.
* **Explain which behavior you expected to see and why.**
* **For Web-App issues, please include screenshots and animated GIFs** that recreate the described steps and clearly demonstrate
the problem. You can use [LICEcap](https://www.cockos.com/licecap) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast)
or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
## Suggesting New Features and Enhancements
By following these guidelines, you help maintainers and the community understand your suggestion and find related suggestions.
Enhancement suggestions are tracked as GitHub issues. After you determine which repository your enhancement suggestion is related to, create an issue on that repository and provide the following:
* **A clear and descriptive title** for the issue to identify the suggestion.
* **A step-by-step description of the suggested enhancement** in as much detail as possible.
* **Specific examples to demonstrate the steps.** Include copy / pasteable snippets which you use in those examples as
[Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
* **Describe the current behavior and explain which behavior you expected to see instead and why.**
* **Include screenshots or animated GIFs** that help you demonstrate the steps or point out the part of ClearML which the
suggestion is related to. You can use [LICEcap](https://www.cockos.com/licecap) to record GIFs on macOS and Windows, and
[silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
## Pull Requests
Before you submit a new PR:
* Verify that the work you plan to merge addresses an existing [issue](https://github.com/allegroai/clearml/issues) (If not, open a new one)
* Check related discussions in the [ClearML slack community](https://join.slack.com/t/clearml/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg)
(Or start your own discussion on the ``#clearml-dev`` channel)
* Make sure your code conforms to the ClearML coding standards by running:
  ```bash
  flake8 --max-line-length=120 --statistics --show-source --extend-ignore=E501 ./clearml*
  ```
In your PR include:
* A reference to the issue it addresses
* A brief description of your implementation approach
---
title: Configuration File
---
This reference page provides detailed information about the configurable options for **ClearML** and **ClearML Agent**.
**ClearML** and **ClearML Agent** use the same configuration file `clearml.conf`.
This reference page is organized by configuration file section:
* [agent](#agent) - Contains **ClearML Agent** configuration options. If **ClearML Agent** was not installed, the configuration
file will not have an `agent` section.
* [api](#api) - Contains **ClearML** and **ClearML Agent** configuration options for **ClearML Server**.
* [sdk](#sdk) - Contains **ClearML** and **ClearML Agent** configuration options for **ClearML Python Package** and **ClearML Server**.
An example configuration file is located [here](https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf),
in the **ClearML Agent** GitHub repository.
### Editing your configuration file
To add, change, or delete options, edit your configuration file.
**To edit your ClearML configuration file:**
1. Open the configuration file for editing, depending upon your operating system:
* Linux - `~/clearml.conf`
* Mac - `$HOME/clearml.conf`
* Windows - `\User\<username>\clearml.conf`
1. In the required section (sections listed on this page), add, modify, or remove required options.
1. Save the configuration file.
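For example, setting the agent's Python binary and Git credentials might look like the following fragment (the option names are documented below; the values are illustrative):

```
agent {
    # Python interpreter used when building virtual environments
    python_binary: "/usr/bin/python3"

    # HTTPS Git credentials (leave blank when using SSH keys)
    git_user: "my-git-user"
    git_pass: "my-git-token"
}
```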
<a class="tr_top_negative" name="agent"></a>
### agent section
**`agent`** (*dict*)
* Dictionary of top-level **ClearML Agent** options to configure **ClearML Agent** for Git credentials, package managers, cache management, workers, and Docker for workers.
---
**`agent.cuda_version`** (*float*)
* The CUDA version to use.
* If specified, this is the CUDA version used.
* If not specified, the CUDA version is automatically detected.
Alternatively, override this option with the environment variable `CUDA_VERSION`.
---
**`agent.cudnn_version`** (*float*)
* The cuDNN version to use.
* If specified, this is the cuDNN version used.
* If not specified, the cuDNN version is automatically detected.
Alternatively, override this option with the environment variable `CUDNN_VERSION`.
---
**`agent.docker_apt_cache`** (*string*)
* The apt (Linux package tool) cache folder for mapping Ubuntu package caching into Docker.
---
**`agent.docker_force_pull`** (*bool*)
* Always update the Docker image by forcing a Docker `pull` before running an experiment.
The values are:
* `true` - Always update the Docker image.
* `false` - Do not always update.
---
**`agent.docker_pip_cache`** (*string*)
* The pip (Python package tool) cache folder for mapping Python package caching into Docker.
---
**`agent.enable_task_env`** (*bool*)
* Set the OS environments based on the Task's Environment section before launching the Task process.
---
**`agent.extra_docker_arguments`** (*[string]*)
* Optional arguments to pass to the Docker image. These are local for this agent, and will not be updated in the experiment's `docker_cmd` section. For example, ` ["--ipc=host", ]`.
---
**`agent.extra_docker_shell_script`** (*[string]*)
* An optional shell script to run in the Docker, when the Docker starts, before the experiment starts. For example, `["apt-get install -y bindfs", ]`
---
**`agent.force_git_ssh_protocol`** (*bool*)
* Force Git protocol to use SSH regardless of the Git URL. This assumes the Git user/pass are blank.
The values are:
* `true` - Force
* `false` - Do not force
---
**`agent.force_git_ssh_port`** (*integer*)
* Force a specific SSH port when converting HTTP to SSH links. The domain remains unchanged.
---
**`agent.force_git_ssh_user`** (*string*)
* Force a specific SSH username when converting HTTP to SSH links (the default username is 'git')
---
**`agent.git_host`** (*string*)
* Limit Git credentials usage to this host. The environment variable `CLEARML_AGENT_GIT_HOST` overrides this configuration option.
---
**`agent.git_pass`** (*string*)
* Git repository password.
* If using Git SSH credentials, do not specify this option.
* If not using Git SSH credentials, use this option to specify a Git password for cloning your repositories.
---
**`agent.git_user`** (*string*)
* Git repository username.
* If using Git SSH credentials, do not specify this option.
* If not using Git SSH credentials, use this option to specify a Git username for cloning your repositories.
---
**`agent.python_binary`** (*string*)
* Set the Python version to use when creating the virtual environment, and when launching the experiment. For example, `/usr/bin/python3` or `/usr/local/bin/python3.6`.
---
**`agent.reload_config`** (*bool*)
* Indicates whether to reload the configuration each time the worker daemon is executed.
---
**`agent.translate_ssh`** (*bool*)
* Translate HTTPS communication to SSH
---
**`agent.venvs_dir`** (*string*)
* The target folder for virtual environments builds that are created when executing an experiment.
---
**`agent.worker_id`** (*string*)
* When creating a worker, assign the worker a name.
* If specified, a unique name for the worker. For example, `clearml-agent-machine1:gpu0`.
* If not specified, the following is used: `<hostname>:<process_id>`.
For example, `MyHost:12345`.
Alternatively, specify the environment variable `CLEARML_WORKER_ID` to override this worker ID.
---
**`agent.worker_name`** (*string*)
* Use to replace the hostname when creating a worker, if `agent.worker_id` is not specified. For example, if `worker_name`
is `MyMachine` and the process ID is `12345`, then the worker is named `MyMachine:12345`.
Alternatively, specify the environment variable `CLEARML_WORKER_NAME` to override this worker name.
<br/>
#### agent.default_docker
<a class="tr_top_negative" name="agent_default_docker"></a>
**`agent.default_docker`** (*dict*)
* Dictionary containing the default options for workers in Docker mode.
---
**`agent.default_docker.arguments`** (*string*)
* If running a worker in Docker mode, this option specifies the options to pass to the Docker container.
---
**`agent.default_docker.image`** (*string*)
* If running a worker in Docker mode, this option specifies the default Docker image to use.
<br/>
#### agent.package_manager
**`agent.package_manager`** (*dict*)
* Dictionary containing the options for the Python package manager. The currently supported package managers are pip, conda,
and, if the repository contains a poetry.lock file, poetry.
---
**`agent.package_manager.conda_channels`** (*[string]*)
* If conda is used, then this is list of conda channels to use when installing Python packages.
---
**`agent.package_manager.conda_full_env_update`** (*bool*)
* Enables updating the conda environment (disabled by default, as an update might break the environment)
---
**`agent.package_manager.conda_env_as_base_docker`** (*bool*)
* Uses the conda environment for execution (similar to a Docker container)
___
**`agent.package_manager.extra_index_url`** (*[string]*)
* A list of URLs for additional artifact repositories when installing Python packages.
---
**`agent.package_manager.force_upgrade`** (*bool*)
* Indicates whether to force an upgrade of Python packages.
The values are:
* `true` - Force
* `false` - Do not force
---
**`agent.package_manager.pip_version`** (*string*)
* The `pip` version to use. For example, `"<20"`, `"==19.3.1"`, `""` (empty string will install the latest version).
---
**`agent.package_manager.post_optional_packages`** (*string*)
* A list of optional packages that will be installed after the required packages. If the installation of an optional post
package fails, the package is ignored, and the virtual environment process continues.
---
**`agent.package_manager.post_packages`** (*[string]*)
* A list of packages that will be installed after the required packages.
___
**`agent.package_manager.priority_optional_packages`** (*[string]*)
* A list of optional priority packages to be installed before the rest of the required packages, but in case a
package installation fails, the package will be ignored, and the virtual environment process will continue.
---
**`agent.package_manager.priority_packages`** (*[string]*)
* A list of packages with priority to be installed before the rest of the required packages. For example: `["cython", "numpy", "setuptools", ]`
---
**`agent.package_manager.system_site_packages`** (*bool*)
* Indicates whether Python packages for virtual environments are inherited from the system when building a virtual environment
for an experiment.
The values are:
* `true` - Inherit
* `false` - Do not inherit (load Python packages)
---
**`agent.package_manager.torch_nightly`** (*bool*)
* Indicates whether to support installing PyTorch Nightly builds.
The values are:
* `true` - If a stable `torch` wheel is not found, install the nightly build.
* `false` - Do not install.
:::note
Torch Nightly builds are ephemeral and are deleted from time to time.
:::
---
**`agent.package_manager.type`** (*string*)
* Indicates the type of Python package manager to use.
The values are:
* `pip` - use pip as the package manager or, if a `poetry.lock` file exists in the repository, use poetry as the package manager
* `conda` - use conda as the package manager
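Taken together, the package-manager options above might appear in `clearml.conf` as in this sketch (all values are illustrative, and the extra index URL is a placeholder):

```editorconfig
agent {
    package_manager {
        # pip, or conda if a conda environment is preferred
        type: pip,
        # pin pip itself; an empty string installs the latest version
        pip_version: "<20",
        # additional artifact repositories searched when installing packages
        extra_index_url: ["https://packages.example.com/simple"],
        # installed before the rest of the required packages
        priority_packages: ["cython", "numpy", "setuptools"],
        # installed after the required packages
        post_packages: ["horovod"],
    }
}
```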
<br/>
#### agent.pip_download_cache
**`agent.pip_download_cache`** (*dict*)
* Dictionary containing pip download cache options.
---
**`agent.pip_download_cache.enabled`** (*bool*)
* Indicates whether to use a specific cache folder for Python package downloads.
The values are:
* `true` - Use a specific folder which is specified in the option `agent.pip_download_cache.path`
* `false` - Do not use a specific folder.
---
**`agent.pip_download_cache.path`** (*string*)
* If `agent.pip_download_cache.enabled` is `true`, then this specifies the cache folder.
<br/>
#### agent.vcs_cache
**`agent.vcs_cache`** (*dict*)
* Dictionary containing version control system clone cache folder.
---
**`agent.vcs_cache.enabled`** (*bool*)
* Indicates whether the version control system cache is used.
The values are:
* `true` - Use cache
* `false` - Do not use cache
---
**`agent.vcs_cache.path`** (*string*)
* The version control system cache clone folder when executing experiments.
<br/>
#### agent.venvs_cache
**`agent.venvs_cache`** (*dict*)
* Dictionary containing virtual environment cache options.
---
**`agent.venvs_cache.free_space_threshold_gb`** (*integer*)
* Minimum required free space to allow for cache entry.
* Disable minimum by passing 0 or negative value.
---
**`agent.venvs_cache.max_entries`** (*integer*)
* Maximum number of cached virtual environments.
---
**`agent.venvs_cache.path`** (*string*)
* Folder of the virtual environment cache.
* Uncomment to enable virtual environment caching.
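For example, a minimal venv-cache configuration might look like the following sketch (values are illustrative; defining `path` is what enables the cache):

```editorconfig
agent {
    venvs_cache {
        # keep at most 10 cached virtual environments
        max_entries: 10,
        # require at least 2 GB free; 0 or a negative value disables the check
        free_space_threshold_gb: 2.0,
        # the cache folder; commenting this out disables venv caching
        path: ~/.clearml/venvs-cache
    }
}
```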
<br/>
#### agent.venv_update
**`agent.venv_update`** (*dict*)
* Dictionary containing virtual environment update options.
---
**`agent.venv_update.enabled`** (*bool*)
* Indicates whether to use accelerated Python virtual environment building (this is a beta feature).
The values are:
* `true` - Accelerate
* `false` - Do not accelerate (default value)
<a class="tr_top_negative" name="api"></a>
### api section
**`api`** (*dict*)
* Dictionary of configuration options for the **ClearML Server** API, web, and file servers and credentials.
---
**`api.api_server`** (*string*)
* The URL of your **ClearML** API server. For example, `https://api.MyDomain.com`.
---
**`api.web_server`** (*string*)
* The URL of your **ClearML** web server. For example, `https://app.MyDomain.com`.
---
**`api.files_server`** (*string*)
* The URL of your **ClearML** file server. For example, `https://files.MyDomain.com`.
:::warning
For ``api.api_server``, ``api.web_server``, and ``api.files_server``, you must use a secure protocol, "https". Do not use "http".
:::
<br/>
#### api.credentials
**`api.credentials`** (*dict*)
* Dictionary of API credentials.
Alternatively, specify the environment variables `CLEARML_API_ACCESS_KEY` / `CLEARML_API_SECRET_KEY` to override these keys.
---
**`api.credentials.access_key`** (*string*)
* Your **ClearML** access key.
---
**`api.credentials.secret_key`** (*string*)
* Your **ClearML** secret key.
---
**`api.verify_certificate`** (*bool*)
* Indicates whether to verify the host SSL certificate.
The values are:
* `true` - Verify
* `false` - Do not verify.
:::warning
Set to `false` only if required.
:::
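Putting the options above together, an `api` section might look like this (the URLs and keys are placeholders):

```editorconfig
api {
    # server URLs must use https
    api_server: https://api.MyDomain.com
    web_server: https://app.MyDomain.com
    files_server: https://files.MyDomain.com
    credentials {
        access_key: "YOUR-ACCESS-KEY"
        secret_key: "YOUR-SECRET-KEY"
    }
    # verify the host SSL certificate
    verify_certificate: true
}
```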
<a class="tr_top_negative" name="sdk"></a>
<br/>
### sdk section
**`sdk`** (*dict*)
* Dictionary that contains configuration options for the **ClearML Python Package** and related options, including storage,
metrics, network, AWS S3 buckets and credentials, Google Cloud Storage, Azure Storage, log, and development.
<br/>
#### sdk.aws
**`sdk.aws`** (*dict*)
* Dictionary with AWS storage options.
<br/>
##### sdk.aws.boto3
**`sdk.aws.boto3`** (*dict*)
* Dictionary of AWS Storage, Boto3 options.
---
**`sdk.aws.boto3.pool_connections`** (*integer*)
* For AWS Boto3, the maximum number of Boto3 pool connections.
---
**`sdk.aws.boto3.max_multipart_concurrency`** (*integer*)
* For AWS Boto3, the maximum number of threads making requests for a transfer.
<br/>
##### sdk.aws.s3
**`sdk.aws.s3`** (*dict*)
* Dictionary of AWS Storage, AWS S3 options.
---
**`sdk.aws.s3.key`** (*string*)
* For AWS S3, the default access key for any bucket that is not specified in the `sdk.aws.s3.credentials` section.
---
**`sdk.aws.s3.region`** (*string*)
* For AWS S3, the default region name for any bucket that is not specified in the `sdk.aws.s3.credentials` section.
---
**`sdk.aws.s3.secret`** (*string*)
* For AWS S3, the default secret access key for any bucket that is not specified in the `sdk.aws.s3.credentials` section.
<br/>
###### sdk.aws.s3.credentials
**`sdk.aws.s3.credentials`** (*[dict]*)
* For AWS S3, a list of dictionaries, where each dictionary contains the credentials for an individual S3 bucket, or for all buckets on an individual host.
---
**`sdk.aws.s3.credentials.bucket`** (*string*)
* For AWS S3, if specifying credentials for individual buckets, then this is the bucket name for an individual bucket.
:::note
See the [AWS documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html) for restrictions
and limitations on bucket naming.
:::
---
**`sdk.aws.s3.credentials.host`** (*string*)
* For AWS S3, if specifying credentials for individual buckets by host, then this option is the host URL and optionally the port number.
---
**`sdk.aws.s3.credentials.key`** (*string*)
* For AWS S3:
    * If specifying credentials for an individual bucket, then this is the access key for the bucket.
    * If specifying credentials for individual buckets by host, then this is the access key for all buckets on the host.
---
**`sdk.aws.s3.credentials.multipart`** (*bool*)
* For AWS S3, if specifying credentials for individual buckets by host, then this indicates whether to allow multipart upload of a single object (object as a set of parts).
The values are:
* `true` - Enabled
* `false` - Disabled
---
**`sdk.aws.s3.credentials.secret`** (*string*)
* For AWS S3:
    * If specifying credentials for a specific bucket, then this is the secret key for the bucket.
    * If specifying credentials for individual buckets by host, then this is the secret key for all buckets on the host.
---
**`sdk.aws.s3.credentials.secure`** (*bool*)
* For AWS S3, if specifying credentials for individual buckets by host, then this indicates whether the host is secure.
The values are:
* `true` - Secure
* `false` - Not secure
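As a sketch, default S3 credentials combined with per-bucket and per-host entries might look like this (all names, keys, and the host are placeholders):

```editorconfig
sdk {
    aws {
        s3 {
            # defaults for any bucket not listed in `credentials`
            key: "DEFAULT-ACCESS-KEY"
            secret: "DEFAULT-SECRET-KEY"
            region: "us-east-1"
            credentials: [
                {
                    # credentials for one specific bucket
                    bucket: "my-bucket"
                    key: "BUCKET-ACCESS-KEY"
                    secret: "BUCKET-SECRET-KEY"
                },
                {
                    # credentials for all buckets on a specific host
                    host: "my-s3-host.example.com:9000"
                    key: "HOST-ACCESS-KEY"
                    secret: "HOST-SECRET-KEY"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```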
<br/>
#### sdk.azure.storage
**`sdk.azure.storage.containers`** (*[dict]*)
* List of dictionaries, each dictionary contains credentials for an Azure Storage container.
---
**`sdk.azure.storage.containers.account_key`** (*string*)
* For Azure Storage, this is the credentials key.
---
**`sdk.azure.storage.containers.account_name`** (*string*)
* For Azure Storage, this is the account name.
---
**`sdk.azure.storage.containers.container_name`** (*string*)
* For Azure Storage, this is the container name.
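An Azure Storage configuration using the fields above might look like the following sketch (the account name, key, and container name are placeholders):

```editorconfig
sdk {
    azure.storage {
        containers: [
            {
                account_name: "myaccount"
                account_key: "ACCOUNT-KEY"
                container_name: "my-container"
            }
        ]
    }
}
```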
<br/>
#### sdk.development
**`sdk.development`** (*dict*)
* Dictionary of development mode options.
---
**`sdk.development.default_output_uri`** (*string*) <a class="tr_top_negative" id="config_default_output_uri"></a>
* The default output destination for model checkpoints (snapshots) and artifacts. If the `output_uri` parameter is not provided
when calling the `Task.init` method, then use the destination in `default_output_uri`.
---
**`sdk.development.store_uncommitted_code_diff_on_train`** (*bool*)
* For development mode, indicates whether to store the uncommitted `git diff` or `hg diff` in the experiment manifest.
The values are:
* `true` - Store the `diff` in the `script.requirements.diff` section
* `false` - Do not store the diff.
---
**`sdk.development.support_stopping`** (*bool*)
* For development mode, indicates whether to allow stopping an experiment if the experiment was aborted externally, its status was changed, or it was reset.
The values are:
* `true` - Allow
* `false` - Do not allow
---
**`sdk.development.task_reuse_time_window_in_hours`** (*float*)
* For development mode, the number of hours after which an experiment with the same project name and experiment name is reused.
---
**`sdk.development.vcs_repo_detect_async`** (*bool*)
* For development mode, indicates whether to run version control repository detection asynchronously.
The values are:
* `true` - Run asynchronously
* `false` - Do not run asynchronously
<br/>
##### sdk.development.worker
**`sdk.development.worker`** (*dict*)
* Dictionary of development mode options for workers.
---
**`sdk.development.worker.log_stdout`** (*bool*)
* For development mode workers, indicates whether all stdout and stderr messages are logged.
The values are:
* `true` - Log all
* `false` - Do not log all
---
**`sdk.development.worker.ping_period_sec`** (*integer*)
* For development mode workers, the interval in seconds for a worker to ping the server to test connectivity.
---
**`sdk.development.worker.report_period_sec`** (*integer*)
* For development mode workers, the interval in seconds for a development mode **ClearML** worker to report.
<br/>
#### sdk.google.storage
**`sdk.google.storage`** (*dict*)
* Dictionary of Google Cloud Storage credentials.
---
**`sdk.google.storage.project`** (*string*)
* For Google Cloud Storage, the name of the project.
---
**`sdk.google.storage.credentials_json`** (*string*)
* For Google Cloud Storage, the file path for the default Google storage credentials JSON file.
<br/>
##### sdk.google.storage.credentials
**`sdk.google.storage.credentials`** (*[dict]*)
* A list of dictionaries, with specific credentials per bucket and sub-directory.
---
**`sdk.google.storage.credentials.bucket`** (*string*)
* For Google Cloud Storage, if specifying credentials by the individual bucket, the name of the bucket.
---
**`sdk.google.storage.credentials.credentials_json`** (*string*)
* For Google Cloud Storage, if specifying credentials by the individual bucket, the file path for the default Google storage credentials JSON file.
---
**`sdk.google.storage.credentials.project`** (*string*)
* For Google Cloud Storage, if specifying credentials by the individual bucket, the name of the project.
---
**`sdk.google.storage.credentials.subdir`** (*string*)
* For Google Cloud Storage, if specifying credentials by the individual bucket, a subdirectory within the bucket.
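A Google Cloud Storage configuration combining the default and per-bucket credentials above might look like this sketch (the project, bucket, and file paths are placeholders):

```editorconfig
sdk {
    google.storage {
        # default credentials for any bucket not listed below
        project: "my-project"
        credentials_json: "/path/to/credentials.json"
        credentials: [
            {
                # credentials for a specific bucket and sub-directory
                bucket: "my-bucket"
                subdir: "datasets"
                project: "my-project"
                credentials_json: "/path/to/bucket_credentials.json"
            }
        ]
    }
}
```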
<br/>
#### sdk.log
**`sdk.log`** (*dict*)
* Dictionary of log options.
---
**`sdk.log.disable_urllib3_info`** (*bool*)
* Indicates whether to disable `urllib3` info messages.
The values are:
* `true` - Disable
* `false` - Do not disable
---
**`sdk.log.null_log_propagate`** (*bool*)
* As a debugging feature, indicates whether to allow null log messages to propagate to the root logger (so they appear as stdout).
The values are:
* `true` - Allow
* `false` - Do not allow
---
**`sdk.log.task_log_buffer_capacity`** (*integer*)
* The maximum capacity of the log buffer.
<br/>
#### sdk.metrics
**`sdk.metrics`** (*dict*)
* Dictionary of metrics options.
---
**`sdk.metrics.file_history_size`** (*integer*)
* The history size for debug files per metric / variant combination.
* For each metric / variant combination, `file_history_size` indicates the number of files stored in the upload destination.
* File names are recycled, so that `file_history_size` is the maximum number of files stored at any time.
---
**`sdk.metrics.matplotlib_untitled_history_size`** (*integer*)
* The maximum history size for `matplotlib` `imshow` files per plot title.
* File names for the uploaded images are recycled so that the number of images stored in the upload destination for each matplotlib plot title
will not exceed the value of `matplotlib_untitled_history_size`.
---
**`sdk.metrics.plot_max_num_digits`** (*integer*)
* The maximum number of digits after the decimal point in plot reporting. This can reduce the report size.
---
**`sdk.metrics.tensorboard_single_series_per_graph`** (*bool*)
* Indicates whether plots appear using TensorBoard behavior where each series is plotted in its own graph (plot-per-graph).
The values are:
* `true` - Support TensorBoard behavior
* `false` - Do not
<br/>
##### sdk.metrics.images
**`sdk.metrics.images`** (*dict*)
* Dictionary of metrics images options.
---
**`sdk.metrics.images.format`** (*string*)
* The image file format for generated debug images (e.g., JPEG).
---
**`sdk.metrics.images.quality`** (*integer*)
* The image quality for generated debug images.
---
**`sdk.metrics.images.subsampling`** (*integer*)
* The image subsampling for generated debug images.
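A sketch of a metrics configuration using the options above (all values are illustrative, not recommended defaults):

```editorconfig
sdk {
    metrics {
        # keep at most 5 debug files per metric/variant combination
        file_history_size: 5
        # report at most 5 digits after the decimal point in plots
        plot_max_num_digits: 5
        images {
            format: JPEG
            quality: 87
            subsampling: 0
        }
    }
}
```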
<br/>
#### sdk.network
**`sdk.network.iteration`** (*dict*)
* Dictionary of network iteration options.
---
**`sdk.network.iteration.max_retries_on_server_error`** (*integer*)
* For retries when getting frames from the server, if the server returned an error (http code 500), then this is the maximum number of retries.
---
**`sdk.network.iteration.retry_backoff_factor_sec`**
* For retries when getting frames from the server, this is the backoff factor for consecutive retry attempts. It is used to
determine the number of seconds between retries, calculated as {backoff factor} * (2 ^ ({number of total retries} - 1)).
<br/>
##### sdk.network.metrics
**`sdk.network.metrics`** (*dict*)
* Dictionary of network metrics options.
---
**`sdk.network.metrics.file_upload_starvation_warning_sec`** (*integer*)
* The number of seconds before a warning is issued that file-bearing events are sent for upload, but no uploads occur.
---
**`sdk.network.metrics.file_upload_threads`** (*integer*)
* The number of threads allocated to uploading files when transmitting metrics for a specific iteration.
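The network options above might be tuned together, as in this sketch (values are illustrative):

```editorconfig
sdk {
    network {
        iteration {
            # retry up to 5 times on an HTTP 500 from the server
            max_retries_on_server_error: 5
            # wait {10} * (2 ^ (retries - 1)) seconds between retries
            retry_backoff_factor_sec: 10
        }
        metrics {
            # threads used to upload files for a reported iteration
            file_upload_threads: 4
            # warn if file events queue for 120 s with no upload progress
            file_upload_starvation_warning_sec: 120
        }
    }
}
```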
<br/>
#### sdk.storage
**`sdk.storage`** (*dict*)
* Dictionary of storage options.
<br/>
##### sdk.storage.cache
**`sdk.storage.cache`** (*dict*)
* Dictionary of storage cache options.
---
**`sdk.storage.cache.default_base_dir`** (*string*)
* The default base directory for caching. The default is the system temp folder for caching.
<br/>
##### sdk.storage.direct_access
**`sdk.storage.direct_access`** (*dict*)
* Dictionary of storage direct access options.
---
**`sdk.storage.direct_access.url`** (*string*)
* Specify a list of direct access objects using glob patterns, which match sets of files using wildcards. Direct access
objects are not downloaded or cached, and any download request will return a direct reference.
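A storage configuration sketch combining the cache and direct-access options above (the cache path and glob pattern are illustrative, and the `{ url: ... }` entry form is an assumption):

```editorconfig
sdk {
    storage {
        cache {
            # cached artifacts land here instead of the system temp folder
            default_base_dir: "~/.clearml/cache"
        }
        direct_access: [
            # hypothetical glob: local file:// objects are referenced in
            # place rather than downloaded into the cache
            { url: "file://*" }
        ]
    }
}
```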
---
title: Environment Variables
---
:::info
ClearML's environment variables override the `clearml.conf` file and SDK settings.
:::
## ClearML SDK Variables
### General
|Name|Description|
|---|---|
|**CLEARML_LOG_ENVIRONMENT** | List of Environment variables to log|
|**CLEARML_TASK_NO_REUSE** | Control Task reuse|
|**CLEARML_CACHE_DIR** | Sets the location of the cache directory|
|**CLEARML_DOCKER_IMAGE** | Sets the default docker image to run from|
|**CLEARML_LOG_LEVEL** | Sets the ClearML package's log verbosity (`debug` \ `warning` \ `error` \ `info`)|
|**CLEARML_SUPPRESS_UPDATE_MESSAGE** | Suppresses the message that notifies users of new ClearML package version|
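As a sketch of how these variables are used, the following shell fragment overrides the log level and cache location for the current session before running a ClearML script (the values shown are illustrative, not defaults):

```shell
# Override clearml.conf settings for the current shell session.
# Variable names are taken from the tables on this page; values are examples.
export CLEARML_LOG_LEVEL=debug
export CLEARML_CACHE_DIR=/tmp/clearml_cache
export CLEARML_SUPPRESS_UPDATE_MESSAGE=1
echo "log level: $CLEARML_LOG_LEVEL"
```

Any ClearML script launched from this shell inherits these settings.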
### VCS
Overrides Repository Auto-logging
|Name|Description|
|---|---|
|**CLEARML_VCS_REPO_URL**| Repository's URL|
|**CLEARML_VCS_COMMIT_ID**| Repository's Commit ID|
|**CLEARML_VCS_BRANCH**| Repository's Branch|
|**CLEARML_VCS_ROOT**| Repository's Root directory|
### Server Connection
|Name|Description|
|---|---|
|**CLEARML_API_HOST** | Sets the API Server URL|
|**CLEARML_WEB_HOST** | Sets the Web UI Server URL|
|**CLEARML_FILES_HOST** | Sets the File Server URL|
|**CLEARML_API_ACCESS_KEY** | Sets the Server's Public Access Key|
|**CLEARML_API_SECRET_KEY** | Sets the Server's Private Access Key|
|**CLEARML_API_HOST_VERIFY_CERT**| Enables \ Disables server certificate verification (if behind a firewall)|
|**CLEARML_OFFLINE_MODE** | Sets Offline mode|
## ClearML Agent Variables
|Name|Description|
|---|---|
|**CLEARML_DOCKER_IMAGE** | Default ClearML Agent docker image|
|**CLEARML_WORKER_NAME** | Sets the Worker's name|
|**CLEARML_WORKER_ID** | Sets the Worker ID|
|**CLEARML_CUDA_VERSION** | Sets the CUDA version to be used|
|**CLEARML_CUDNN_VERSION** | Sets the CUDNN version to be used|
|**CLEARML_CPU_ONLY** | Force CPU only mode|
|**CLEARML_DOCKER_SKIP_GPUS_FLAG**| Skips the GPUs flag (support for Docker v18)|
|**CLEARML_AGENT_GIT_USER** | Sets the Git user for ClearML Agent|
|**CLEARML_AGENT_GIT_PASS** | Sets the Git password for ClearML Agent|
|**CLEARML_AGENT_GIT_HOST** | Sets Git host (only sending login to this host)|
|**CLEARML_AGENT_EXEC_USER**| User for Agent executing tasks (root by default)|
|**CLEARML_AGENT_EXTRA_PYTHON_PATH**| Sets extra python path|
|**CLEARML_AGENT_K8S_HOST_MOUNT / CLEARML_AGENT_DOCKER_HOST_MOUNT**| Specifies Agent's mount point for Docker \ K8s|
---
title: Configuring ClearML for Your ClearML Server
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
The **ClearML** configuration file created during initialization contains the host URLs of the **ClearML Server** and the
**ClearML** credentials, allowing the code to integrate with the server. Later, **ClearML** can be tailored to fit requirements
by setting [configuration options](../configs/clearml_conf.md).
**To configure ClearML for your ClearML Server:**
1. If not installed already, install `clearml` (see [install](../getting_started/ds/ds_first_steps.md))
1. In a terminal session, run the **ClearML** setup wizard.
```bash
clearml-init
```
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Learn about creating multiple ClearML configuration files</summary>
<div className="cml-expansion-panel-content">
Additional **ClearML** configuration files can be created, for example, to use inside Docker containers when executing
a Task.
Use the `--file` option for `clearml-init`.

    clearml-init --file MyOtherClearML.conf

and then specify it using the ``CLEARML_CONFIG_FILE`` environment variable inside the container:

    CLEARML_CONFIG_FILE = MyOtherClearML.conf

For more information about running experiments inside Docker containers, see [ClearML Agent Execution](../clearml_agent#execution)
and [ClearML Agent Reference](../references/clearml_agent_ref.md).
</div>
</details>
<br/>
If the setup wizard's response indicates that a configuration file already exists, follow the instructions
[here](#add-clearml-to-a-configuration-file). The wizard does not edit or overwrite existing configuration files.
1. The setup wizard prompts for **ClearML** credentials.

       ClearML SDK setup process
       Please create new clearml credentials through the profile page in your clearml web app (e.g. http://localhost:8080/profile)
       Or with the free hosted service at https://app.community.clear.ml/profile
       In the profile page, press "Create new credentials", then press "Copy to clipboard".
       Paste copied configuration here:
1. Get **ClearML** credentials. Open the **ClearML Web UI** in a browser. On the **PROFILE** page, click
**Create new credentials** **>** **Copy to clipboard**.
1. At the command prompt `Paste copied configuration here:`, copy and paste the **ClearML** credentials.
The setup wizard confirms the credentials.

       Detected credentials key="********************" secret="*******"
1. Enter the **ClearML Server** web server URL, or press **Enter** to accept the default which is detected from the
credentials.

       WEB Host configured to: [https://app.<your-domain>]
1. Enter the **ClearML Server** API server URL, or press **Enter** to accept the default value which is based on the previous response:

       API Host configured to: [https://api.<your-domain>]
1. Enter the **ClearML Server** file server URL, or press **Enter** to accept the default value which is based on the previous response:

       File Store Host configured to: [files.<your-domain>]
The wizard responds with a configuration and directs to the **ClearML Server**.

       CLEARML Hosts configuration:
       Web App: https://app.<your-domain>
       API: https://api.<your-domain>
       File Store: https://files.<your-domain>
       Verifying credentials ...
       Credentials verified!
       New configuration stored in /home/<username>/clearml.conf
       CLEARML setup completed successfully.
<br/>
The configuration file's location depends upon the operating system:
* Linux - `~/clearml.conf`
* macOS - `$HOME/clearml.conf`
* Windows - `\User\<username>\clearml.conf`
## Add ClearML to a configuration file
The setup wizard may indicate that a configuration file already exists. For example, if a **ClearML Agent** was previously
configured, then a configuration file was created. The wizard does not edit or overwrite existing configuration files.
The host URLs for the **ClearML Server** are required:
* **ClearML Server** web server
* **ClearML Server** API server
* **ClearML Server** file server
These may be localhost, the domain, or a sub-domain of the domain.
**To add ClearML settings to an existing ClearML configuration file:**
1. Open the **ClearML** configuration file for editing. Depending upon the operating system, it is:
* Linux - `~/clearml.conf`
* macOS - `$HOME/clearml.conf`
* Windows - `\User\<username>\clearml.conf`
1. In the `sdk.development` section, add the logging of environment variables option (see ``log_os_environments`` in an
[example configuration file](https://github.com/allegroai/clearml/blob/master/docs/clearml.conf#L178)).
```editorconfig
# Log specific environment variables. OS environments are enlisted in the "Environment" section
# of the Hyper-Parameters.
# multiple selected variables are supported including the suffix '*'.
# For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
# This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
# Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
log_os_environments: []
```
1. Save the **ClearML** configuration file. **ClearML** is now configured for the **ClearML Server**.
---
title: ClearML Server
---
## What is ClearML Server?
The **ClearML Server** is the backend service infrastructure for **ClearML**. It allows multiple users to collaborate and manage their experiments by working seamlessly with the **ClearML Python Package** and [**ClearML Agent**](clearml_agent.md). **ClearML Server** is composed of the following:
* Web server including the **ClearML Web UI**, which is the user interface for tracking, comparing, and managing experiments.
* API server, which provides a RESTful API for:
    * Documenting and logging experiments, including information, statistics, and results.
    * Querying experiment history, logs, and results.
* File server, which stores media and models, making them easily accessible using the **ClearML Web UI**.
The [**ClearML Hosted Service**](https://app.community.clear.ml) is essentially the **ClearML Server** maintained for you.
![image](../img/ClearML_Server_Diagram.png)
**ClearML Web UI** is the **ClearML** user interface and is part of **ClearML Server**.
Use the **ClearML Web UI** to:
* Track experiments
* Compare experiments
* Manage experiments
For detailed information about the **ClearML Web UI**, see [User Interface](../webapp/webapp_home.md).
ClearML Server also comes with a [services agent](../clearml_agent.md#services-mode) preinstalled.
## Deployment
The **ClearML Server** can be deployed in any of the formats listed below. Once deployed, configure the server for web login
authentication, sub-domains, and load balancers, and use any of its many configuration settings.
**To deploy your own ClearML Server:**
1. Deploy ``clearml-server`` using any of the available formats, which include:
* Pre-built [AWS EC2 AMIs](clearml_server_aws_ec2_ami.md)
* Pre-built [Google Cloud Platform custom images](clearml_server_gcp.md)
* Pre-built Docker images for [Linux](clearml_server_linux_mac.md), [macOS](clearml_server_linux_mac.md), and
[Windows 10](clearml_server_win.md)
* [Kubernetes](clearml_server_kubernetes.md) and [Kubernetes using Helm](clearml_server_kubernetes_helm.md)
1. Optionally, [configure ClearML Server](clearml_server_config.md) for additional features, including sub-domains and load balancers,
web login authentication, and the non-responsive task watchdog.
1. [Configure ClearML for ClearML Server](clearml_config_for_clearml_server.md)
## Updating
When necessary, upgrade your ClearML Server on any of the available formats:
* [AWS EC2 AMIs](upgrade_server_aws_ec2_ami.md)
* [Google Cloud Platform](upgrade_server_gcp.md)
* [Linux or MacOS](upgrade_server_linux_mac.md)
* [Windows 10](upgrade_server_win.md)
* [Kubernetes](upgrade_server_kubernetes.md) and [Kubernetes using Helm](upgrade_server_kubernetes_helm.md).
If you are using v0.15 or older, [upgrade to ClearML Server](clearml_server_es7_migration.md).
---
title: AWS EC2 AMIs
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
Deployment of **ClearML Server** on AWS is easily performed using AWS AMIs, which are available in the AWS Marketplace catalog
and in the AWS community AMI catalog.
* AWS Marketplace ClearML Server is coming soon - Preconfigured with unique initial access credentials. Until it arrives,
use [AWS Marketplace Trains Server](https://aws.amazon.com/marketplace/pp/B085D8W5NM) with the instructions on the page.
* [ClearML Server community AMIs](#clearml-server-aws-community-amis) - Configured by default without authentication to allow quick access and onboarding.
After deploying either type of AMI, configure the **ClearML Server** instance to provide the authentication scheme that
best matches the workflow.
For information about upgrading a **ClearML Server** in an AWS instance, see [here](upgrade_server_aws_ec2_ami.md).
:::important
If **ClearML Server** is being reinstalled, we recommend clearing browser cookies for **ClearML Server**. For example,
for Firefox, go to Developer Tools > Storage > Cookies, and for Chrome, go to Developer Tools > Application > Cookies,
and delete all cookies under the **ClearML Server** URL.
:::
## Launching
:::warning
By default, **ClearML Server** deploys as an open network. To restrict **ClearML Server** access, follow the instructions
in the [Security](clearml_server_security.md) page.
:::
The minimum recommended amount of RAM is 8 GB. For example, a t3.large or t3a.large EC2 instance type would accommodate the recommended RAM size.
### AWS community AMIs
**To launch a ClearML Server AWS community AMI:**
* Use one of the [ClearML Server AWS community AMIs](#clearml-server-aws-community-amis) and see:
* The AWS Knowledge Center page, [How do I launch an EC2 instance from a custom Amazon Machine Image (AMI)?](https://aws.amazon.com/premiumsupport/knowledge-center/launch-instance-custom-ami/)
* Detailed instructions in the AWS Documentation for [Launching an Instance Using the Launch Instance Wizard](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launching-instance.html).
### AWS Marketplace AMIs
**To launch a ClearML Server AWS Marketplace AMI through the AWS Marketplace website:**
1. Open the AWS Marketplace for the [Allegro AI ClearML Server](https://aws.amazon.com/marketplace/pp/B085D8W5NM).
1. In the heading area, click **Continue to Subscribe**.
1. On the **Subscribe to software** page, click **Accept Terms**, and then click **Continue to Configuration**.
1. On the **Configure this software** page, complete the following:
1. In the **Fulfillment Option** list, select **64-bit (x86) Amazon Machine Image (AMI)**.
1. In the **Software Version** list, select your **ClearML Server** version. For example, **0.13.0 (Mar 02, 2020)**.
1. In the **Region** list, select your region.
1. Click **Continue to Launch**.
1. On the **Launch this software** page, in the **Choose Action** list, select either of the following options, and perform the steps for that option:
* **Launch through EC2**:
1. Click **Launch**.
1. Follow the instructions on the [How do I launch an EC2 instance from a custom Amazon Machine Image (AMI)?](https://aws.amazon.com/premiumsupport/knowledge-center/launch-instance-custom-ami/) AWS documentation page.
* **Launch from Website**:
1. Select required settings: EC2 Instance Type, VPC Settings, Subnet Settings, Security Group Settings, and Key Pair Settings.
1. Click **Launch**.
1. On the **Launch this software** page, note your Instance ID. You can use it later to search for your instance in the EC2 Console.
## Accessing ClearML Server
Once deployed, **ClearML Server** exposes the following services:
* Web server on `TCP port 8080`
* API server on `TCP port 8008`
* File Server on `TCP port 8081`
**To locate the ClearML Server address:**
1. Go to the AWS EC2 Console.
1. In the **Details** tab, **Public DNS (IPv4)** shows the **ClearML Server** address.
**To access the ClearML Server Web-App (UI):**
* Direct browser to its web server URL: `http://<Server Address>:8080`
**To SSH into ClearML Server:**
* Log into the AWS AMI using the default username `ec2-user`. Control the SSH credentials from the AWS management console.
### Logging in to the Web-App (UI)
**To log in to the ClearML Web-App (UI):**
* If **ClearML Server** was launched from an AWS Community AMI, enter any name.
* If **ClearML Server** was launched through the AWS Marketplace, enter the preconfigured default login credentials, which
are:
* **clearml-user** (the default username).
* The **ClearML Server** EC2 instance ID (the default password).
If needed, modify the default login behavior to match your workflow policy. See [Configuring Web Login Authentication](clearml_server_config.md#web-login-authentication)
on the "Configuring Your Own ClearML Server" page.
## Storage configuration
The pre-built **ClearML Server** storage configuration is the following:
* MongoDB: `/opt/clearml/data/mongo/`
* Elasticsearch: `/opt/clearml/data/elastic_7/`
* File Server: `/mnt/fileserver/`
## Backing up and restoring data and configuration
:::note
If data is being moved between a **Trains Server** and a **ClearML Server** installation, make sure to use the correct paths
for backup and restore (`/opt/trains` and `/opt/clearml`, respectively).
:::
The commands in this section are examples for backing up and restoring data and configuration.
If data and configuration folders are in `/opt/clearml`, then archive all data into `~/clearml_backup_data.tgz`, and
configuration into `~/clearml_backup_config.tgz`:
```bash
sudo tar czvf ~/clearml_backup_data.tgz -C /opt/clearml/data .
sudo tar czvf ~/clearml_backup_config.tgz -C /opt/clearml/config .
```
**If data and configuration need to be restored**:
1. Verify you have the backup files.
1. Replace any existing data with the backup data:
```bash
sudo rm -fR /opt/clearml/data/* /opt/clearml/config/*
sudo tar -xzf ~/clearml_backup_data.tgz -C /opt/clearml/data
sudo tar -xzf ~/clearml_backup_config.tgz -C /opt/clearml/config
```
1. Grant access to the data:
```bash
sudo chown -R 1000:1000 /opt/clearml
```
## ClearML Server AWS community AMIs
The following sections contain lists of AMI Image IDs, per region, for each released **ClearML Server** version.
### Latest version
#### v1.0.0
* **eu-north-1** : ami-0d6b1781328f44b21
* **ap-south-1** : ami-03d18434eb00ba0d4
* **eu-west-3** : ami-0ca027ed4205e7d67
* **eu-west-2** : ami-04304fe1639f8324f
* **eu-west-1** : ami-06260010b2e24b438
* **ap-northeast-3** : ami-0d16f3c2176cf8639
* **ap-northeast-2** : ami-0a3a2e08cec3e2709
* **ap-northeast-1** : ami-04c2c71b7bcecf6af
* **sa-east-1** : ami-00c86a9d8b5b87239
* **ca-central-1** : ami-0889a860b58dd8d88
* **ap-southeast-1** : ami-0a9ac9925ab98a270
* **ap-southeast-2** : ami-01735e0de7b1a13f2
* **eu-central-1** : ami-0b93523a0f9ec5e2b
* **us-east-2** : ami-0fa34e08b01eadb96
* **us-west-1** : ami-0a8cb65f6856dd561
* **us-west-2** : ami-0eb1b443c591054fe
* **us-east-1** : ami-07ed6a6bbb63799cc
## Next Step
* [Configuring ClearML for ClearML Server](clearml_config_for_clearml_server.md).

---
title: Configuring ClearML Server
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
This page describes the **ClearML Server** [deployment](#clearml-server-deployment-configuration) and [feature](#clearml-server-feature-configurations) configurations. Namely, it contains instructions on how to configure **ClearML Server** for:
* [Sub-domains and load balancers](#sub-domains-and-load-balancers) - An AWS load balancing example.
* [Opening Elasticsearch, MongoDB, and Redis for External Access](#opening-elasticsearch-mongodb-and-redis-for-external-access).
* [Web login authentication](#web-login-authentication) - Create and manage users and passwords.
* [Using hashed passwords](#using-hashed-passwords) - Option to use hashed passwords instead of plain-text passwords.
* [Non-responsive Task watchdog](#non-responsive-task-watchdog) - For inactive experiments.
For all configuration options, see the [ClearML Configuration Reference](../configs/clearml_conf.md) page.
:::important
We recommend using the latest version of **ClearML Server**.
:::
## ClearML Server deployment configuration
**ClearML Server** supports two deployment configurations: single IP (domain) and sub-domains.
### Single IP (domain) configuration
Single IP (domain) with the following open ports:
* Web application on port `8080`
* API service on port `8008`
* File storage service on port `8081`
### Sub-domain configuration
Sub-domain configuration with default http/s ports (`80` or `443`):
* Web application on sub-domain: `app.*.*`
* API service on sub-domain: `api.*.*`
* File storage service on sub-domain: `files.*.*`
When [configuring sub-domains](#sub-domains-and-load-balancers) for **ClearML Server**, the sub-domains map to the ports
internally configured for the **ClearML Server** Docker containers. As a result, the **ClearML Server** containers remain
accessible even if, for example, some type of port forwarding is implemented.
:::important
The sub-domain labels ``app``, ``api``, and ``files`` must be used.
:::
For example, if a domain is called `mydomain.com` and a sub-domain named `clearml.mydomain.com` is created, use the following:
* `app.clearml.mydomain.com` (web server)
* `api.clearml.mydomain.com` (API server)
* `files.clearml.mydomain.com` (file server)
Accessing the **ClearML Web UI** with `app.clearml.mydomain.com` will automatically send API requests to `api.clearml.mydomain.com`.
## ClearML Server Feature Configurations
**ClearML Server** features can be configured using either configuration files or environment variables.
### Configuration files
The **ClearML Server** uses the following configuration files:
* `apiserver.conf`
* `hosts.conf`
* `logging.conf`
* `secure.conf`
* `services.conf`
When starting up, the **ClearML Server** will look for these configuration files, in the `/opt/clearml/config` directory
(this path can be modified using the `CLEARML_CONFIG_DIR` environment variable).
The default configuration files are in the [clearml-server](https://github.com/allegroai/clearml-server/tree/master/apiserver/config/default) repository.
:::note
Within the default structure, the `services.conf` file is represented by a subdirectory with service-specific `.conf` files.
If `services.conf` is used to configure the server, any setting related to a file under the `services` subdirectory can
simply be represented by a key within the `services.conf` file.
For example, to override `multi_task_histogram_limit` that appears in the `default/services/tasks.conf`, the `services.conf` file should contain:
```
tasks {
"multi_task_histogram_limit": <new-value>
}
```
:::
### Environment variables
The **ClearML Server** supports several fixed environment variables that affect its behavior,
as well as dynamic environment variables that can be used to override any configuration file setting.
#### Fixed environment variables
General:
* `CLEARML_CONFIG_DIR` allows overriding the default directory where the server looks for configuration files. Multiple directories can be specified (in the same format used for specifying the system's `PATH` env var)
Database service overrides:
* `CLEARML_MONGODB_SERVICE_HOST` allows overriding the hostname for the MongoDB service
* `CLEARML_MONGODB_SERVICE_PORT` allows overriding the port for the MongoDB service
* `CLEARML_ELASTIC_SERVICE_HOST` allows overriding the hostname for the ElasticSearch service
* `CLEARML_ELASTIC_SERVICE_PORT` allows overriding the port for the ElasticSearch service
* `CLEARML_REDIS_SERVICE_HOST` allows overriding the hostname for the Redis service
* `CLEARML_REDIS_SERVICE_PORT` allows overriding the port for the Redis service
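As an illustrative sketch only, a deployment that hosts MongoDB and Redis outside the docker network might export these variables before starting the server (the host names below are placeholders, not real services):

```shell
# Hypothetical host names -- substitute your own managed database endpoints
export CLEARML_MONGODB_SERVICE_HOST="mongo.internal.example.com"
export CLEARML_MONGODB_SERVICE_PORT="27017"
export CLEARML_REDIS_SERVICE_HOST="redis.internal.example.com"
export CLEARML_REDIS_SERVICE_PORT="6379"
echo "MongoDB at ${CLEARML_MONGODB_SERVICE_HOST}:${CLEARML_MONGODB_SERVICE_PORT}"
```

The variables must be visible to the server processes (for example, set in the docker-compose environment), not only in an interactive shell.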
#### Dynamic environment variables
Dynamic environment variables can be used to override any configuration setting that appears in the configuration files.
The environment variable's name should be `CLEARML__<configuration-path>`, where `<configuration-path>` represents the full path
to the configuration field being set, including the configuration file name. Elements of the configuration path
should be separated by `__` (double underscore).
For example, given the default `secure.conf` file contents:
```
...
credentials {
apiserver {
role: "system"
user_key: "default-key"
user_secret: "default-secret"
}
...
}
```
the default secret for the system's apiserver component can be overridden by setting the following environment variable:
`CLEARML__SECURE__CREDENTIALS__APISERVER__USER_SECRET="my-new-secret"`
:::note
* Since configuration fields may contain JSON-parsable values, make sure to always quote strings (otherwise the server
might fail to parse them)
* In order to comply with environment variables standards, it is also recommended to use only upper-case characters in
environment variable keys. For this reason, ClearML Server will always convert the configuration path specified in the
dynamic environment variable's key to lower-case before overriding configuration values with the environment variable value.
:::
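The naming rule above can be sketched in shell. The snippet below is purely illustrative (not part of ClearML): it joins the upper-cased path elements, starting with the configuration file name, with double underscores:

```shell
# Build the dynamic override variable name for
# secure.conf -> credentials -> apiserver -> user_secret
var_name="CLEARML"
for part in secure credentials apiserver user_secret; do
  var_name="${var_name}__$(echo "$part" | tr '[:lower:]' '[:upper:]')"
done
echo "$var_name"   # CLEARML__SECURE__CREDENTIALS__APISERVER__USER_SECRET
```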
## Configuration procedures
### Sub-domains and load balancers
To illustrate this configuration, we provide the following example based on AWS load balancing:
1. In the **ClearML Server** `/opt/clearml/config/apiserver.conf` file, add the following `auth.cookies` section:
auth {
cookies {
httponly: true
secure: true
domain: ".clearml.mydomain.com"
max_age: 99999999999
}
}
1. Use the following load balancer configuration:
* Listeners:
* Optional: HTTP listener, that redirects all traffic to HTTPS.
* HTTPS listener for `app.` forwarded to `AppTargetGroup`
* HTTPS listener for `api.` forwarded to `ApiTargetGroup`
* HTTPS listener for `files.` forwarded to `FilesTargetGroup`
* Target groups:
* `AppTargetGroup`: HTTP based target group, port `8080`
* `ApiTargetGroup`: HTTP based target group, port `8008`
* `FilesTargetGroup`: HTTP based target group, port `8081`
* Security and routing:
* Load balancer: make sure the load balancers are able to receive traffic from the relevant IP addresses (Security
groups and Subnets definitions).
* Instances: make sure the load balancers are able to access the instances, using the relevant ports (Security
groups definitions).
1. Restart **ClearML Server**.
### Opening Elasticsearch, MongoDB, and Redis for external access
For improved security, the ports for **ClearML Server** Elasticsearch, MongoDB, and Redis servers are not exposed by default;
they are only open internally in the docker network. If external access is needed, open these ports (but make sure to
understand the security risks involved with doing so).
:::warning
Opening the ports for Elasticsearch, MongoDB, and Redis for external access may pose a security concern and is not recommended
unless you know what you're doing. Network security measures, such as firewall configuration, should be considered when
opening ports for external access.
:::
To open external access to the Elasticsearch, MongoDB, and Redis ports:
1. Shut down **ClearML Server** by executing the following command (which assumes the configuration file is in the environment path):
docker-compose down
1. Edit the `docker-compose.yml` file as follows:
* In the `elasticsearch` section, add the two lines:
ports:
- "9200:9200"
* In the `mongo` section, add the two lines:
ports:
- "27017:27017"
* In the `redis` section, add the two lines:
ports:
- "6379:6379"
1. Start up **ClearML Server**:
docker-compose -f docker-compose.yml pull
docker-compose -f docker-compose.yml up -d
### Web Login Authentication
Web login authentication can be configured on the **ClearML Server** so that only users who provide credentials (a username
and password) can access the **ClearML** system. By default, without web login authentication, **ClearML Server** does not
restrict access.
**To add web login authentication to the ClearML Server:**
1. In **ClearML Server** `/opt/clearml/config/apiserver.conf`, add the `auth.fixed_users` section and specify the users.
For example:
auth {
# Fixed users login credentials
# No other user will be able to login
fixed_users {
enabled: true
pass_hashed: false
users: [
{
username: "jane"
password: "12345678"
name: "Jane Doe"
},
{
username: "john"
password: "12345678"
name: "John Doe"
},
]
}
}
1. Restart **ClearML Server**.
### Using hashed passwords
You can also use hashed passwords instead of plain-text passwords. To do that:
- Set `pass_hashed: true`
- Use a base64-encoded hashed password in the `password` field instead of a plain-text password. Assuming Jane's plain-text password is `123456`, use the following bash command to generate the base64-encoded hashed password:
```bash
> python3 -c 'import bcrypt,base64; print(base64.b64encode(bcrypt.hashpw("123456".encode(), bcrypt.gensalt())))'
b'JDJiJDEyJDk3OHBFcHFlNEsxTkFoZDlPcGZsbC5sU1pmM3huZ1RpeHc0ay5WUjlzTzN5WE1WRXJrUmhp'
```
- Use the command's output as the user's password. Resulting `apiserver.conf` file should look as follows:
auth {
# Fixed users login credentials
# No other user will be able to login
fixed_users {
enabled: true
pass_hashed: true
users: [
{
username: "jane"
password: "JDJiJDEyJDk3OHBFcHFlNEsxTkFoZDlPcGZsbC5sU1pmM3huZ1RpeHc0ay5WUjlzTzN5WE1WRXJrUmhp"
name: "Jane Doe"
}
]
}
}
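As a sanity check on a stored value, note that the `password` field holds the base64 encoding of the bcrypt hash string itself, so decoding it should recover a string starting with a bcrypt prefix such as `$2b$12$`. An illustrative check:

```shell
# base64-encoded bcrypt hash from the example above
encoded="JDJiJDEyJDk3OHBFcHFlNEsxTkFoZDlPcGZsbC5sU1pmM3huZ1RpeHc0ay5WUjlzTzN5WE1WRXJrUmhp"
# Decoding recovers the raw bcrypt hash, which begins with "$2b$"
echo "$encoded" | base64 -d
```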
### Non-responsive Task watchdog
The non-responsive experiment watchdog monitors experiments that have not been updated for a specified time interval, and
marks them as `aborted`. The non-responsive experiment watchdog is always active.
Modify the following settings for the watchdog:
* The time threshold (in seconds) of experiment inactivity (default value is 7200 seconds (2 hours)).
* The time interval (in seconds) between watchdog cycles.
**To configure the non-responsive watchdog for the ClearML Server:**
1. In the **ClearML Server** `/opt/clearml/config/services.conf` file, add or edit the `tasks.non_responsive_tasks_watchdog`
and specify the watchdog settings.
For example:
tasks {
non_responsive_tasks_watchdog {
# In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
threshold_sec: 7200
# Watchdog will sleep for this number of seconds after each cycle
watch_interval_sec: 900
}
}
1. Restart **ClearML Server**.
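As an alternative sketch, relying on the dynamic environment variables described earlier on this page (and assuming the watchdog settings live under `services.conf`), the same values could be overridden without editing the file:

```shell
# Hypothetical: stop tasks after 1 hour of inactivity, checking every 5 minutes
export CLEARML__SERVICES__TASKS__NON_RESPONSIVE_TASKS_WATCHDOG__THRESHOLD_SEC="3600"
export CLEARML__SERVICES__TASKS__NON_RESPONSIVE_TASKS_WATCHDOG__WATCH_INTERVAL_SEC="300"
echo "$CLEARML__SERVICES__TASKS__NON_RESPONSIVE_TASKS_WATCHDOG__THRESHOLD_SEC"
```

As with configuration file edits, restart **ClearML Server** for the overrides to take effect.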

---
title: Upgrading Server from v0.15 or Older to ClearML Server
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
In v0.16, the Elasticsearch subsystem of **Trains Server** was upgraded from version 5.6 to version 7.6. This change necessitates
the migration of the database contents to accommodate the change in index structure across the different versions.
This page provides the instructions to carry out the migration process. Follow this process if using **Trains Server**
version 0.15 or older and are upgrading to **ClearML Server**.
The migration process makes use of a script that automatically performs the following:
* Backs up the existing **Trains Server** Elasticsearch data.
* Launches a pair of Elasticsearch 5 and Elasticsearch 7 migration containers.
* Copies the Elasticsearch indices using the migration containers.
* Terminates the migration containers.
* Renames the original data directory to avoid accidental reuse.
:::warning
Once the migration process completes successfully, the data is no longer accessible to the older version of Trains Server,
and **ClearML Server** needs to be installed.
:::
### Prerequisites
* Read/write permissions for the default **Trains Server** data directory `/opt/clearml/data` and its subdirectories, or,
if this default directory is not used, the permissions for the directory and subdirectories that are used.
* A minimum of 8GB system RAM.
* Minimum free disk space of at least 30% plus two times the size of the data.
* Python version >=2.7 or >=3.6, and Python accessible from the command-line as `python`.
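As a rough worked example of the free-space requirement (interpreting "30%" as 30% of the total disk size, so adjust to your own reading): with 10 GB of Elasticsearch data on a 100 GB disk, the migration would need roughly 2 x 10 + 0.3 x 100 = 50 GB free:

```shell
# Illustrative arithmetic only -- assumes "30%" refers to total disk size
data_gb=10
disk_gb=100
required_gb=$(( 2 * data_gb + disk_gb * 30 / 100 ))
echo "required free space: ${required_gb} GB"   # required free space: 50 GB
```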
### Migrating the data
**To migrate the data:**
1. If the **Trains Server** is up, shut it down:
* **Linux and macOS**
docker-compose -f /opt/trains/docker-compose.yml down
* **Windows**
docker-compose -f c:\opt\trains\docker-compose-win10.yml down
* **Kubernetes**
kubectl delete -k overlays/current_version
* **Kubernetes using Helm**
helm del --purge trains-server
kubectl delete namespace trains
1. For **Kubernetes** and **Kubernetes using Helm**, connect to the node in the Kubernetes cluster labeled `app=trains`.
1. Download the migration package archive.
curl -L -O https://github.com/allegroai/clearml-server/releases/download/0.16.0/trains-server-0.16.0-migration.zip
If the file needs to be downloaded manually, use this direct link: [trains-server-0.16.0-migration.zip](https://github.com/allegroai/clearml-server/releases/download/0.16.0/trains-server-0.16.0-migration.zip).
1. Extract the archive.
unzip trains-server-0.16.0-migration.zip -d /opt/trains
1. Migrate the data.
* **Linux, macOS, and Windows** - if managing own containers.
Run the migration script. If elevated privileges are used to run Docker (`sudo` in Linux, or admin in Windows),
then use elevated privileges to run the migration script.
python elastic_upgrade.py [-s|--source <source_path>] [-t|--target <target_path>] [-n|--no-backup] [-p|--parallel]
The following optional command line parameters can be used to control the execution of the migration script:
* `<source_path>` - The path to the Elasticsearch data directory in the current **Trains Server** deployment.
If not specified, uses the default value of `/opt/trains/data/elastic` (or `c:\opt\trains\data\elastic` in Windows)
* `<target_path>` - The path to the target directory where the migrated Elasticsearch 7 data will be written.
If not specified, uses the default value of `/opt/trains/data/elastic_7` (or `c:\opt\trains\data\elastic_7` in Windows)
* `--no-backup` - Skip creating a backup of the existing Elasticsearch data directory before performing the migration.
If not specified, takes on the default value of `False` (performs a backup)
* `--parallel` - Copy several indices in parallel to utilize more CPU cores. If not specified, parallel indexing is turned off.
* **Kubernetes**
1. Clone the `trains-server-k8s` repository and change to the new `trains-server-k8s/upgrade-elastic` directory:
git clone https://github.com/allegroai/clearml-server-k8s.git && cd clearml-server-k8s/upgrade-elastic
1. Create the `upgrade-elastic` namespace and deployments:
kubectl apply -k overlays/current_version
Wait for the job to be completed. To check if it's completed, run:
kubectl get jobs -n upgrade-elastic
* **Kubernetes using Helm**
1. Add the `clearml-server` repository to the Helm client:
helm repo add allegroai https://allegroai.github.io/clearml-server-helm/
Confirm the `clearml-server` repository is now in the Helm client.
helm search clearml
The `helm search` results must include `allegroai/upgrade-elastic-helm`.
1. Install `upgrade-elastic-helm` on the cluster:
helm install allegroai/upgrade-elastic-helm --namespace=upgrade-elastic --name upgrade
An upgrade-elastic `namespace` is created in the cluster, and the upgrade is deployed in it.
Wait for the job to complete. To check if it completed, execute the following command:
kubectl get jobs -n upgrade-elastic
### Finishing up
To finish up:
1. Verify the data migration
1. Conclude the upgrade.
#### Step 1. Verifying the data migration
Upon successful completion, the migration script renames the original **Trains Server** directory, which contains the now
migrated data, and prints a completion message:
Renaming the source directory /opt/trains/data/elastic to /opt/trains/data/elastic_migrated_<date_time>.
Upgrade completed.
All console output during the execution of the migration script is saved to a log file in the directory where the migration script executes:
<path_to_script>/upgrade_to_7_<date_time>.log
If the migration script does not complete successfully, the migration script prints the error.
:::important
For help in resolving migration issues, check the **allegro-clearml** [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg),
[GitHub Issues](https://github.com/allegroai/clearml-server/issues), and the **ClearML Server** sections of the [FAQ](../faq.md).
:::
#### Step 2. Completing the installation
After verifying the data migration completed successfully, conclude the **ClearML Server** installation process.
##### Linux or macOS
For Linux or macOS, conclude with the steps in this section. For other deployment formats, see [below](#other-deployment-formats).
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Important: Upgrading from v0.14 or older</summary>
<div className="cml-expansion-panel-content">
For Linux only, if upgrading from **Trains Server** v0.14 or older, configure the **ClearML Agent Services**.
* If ``CLEARML_HOST_IP`` is not provided, then **ClearML Agent Services** will use the external public address of the
**ClearML Server**.
* If ``CLEARML_AGENT_GIT_USER`` / ``CLEARML_AGENT_GIT_PASS`` are not provided, then **ClearML Agent Services** will
not be able to access any private repositories for running service tasks.
export CLEARML_HOST_IP=server_host_ip_here
export CLEARML_AGENT_GIT_USER=git_username_here
export CLEARML_AGENT_GIT_PASS=git_password_here
:::note
For backwards compatibility, the environment variables ``TRAINS_HOST_IP``, ``TRAINS_AGENT_GIT_USER``, and ``TRAINS_AGENT_GIT_PASS`` are supported.
:::
</div>
</details>
1. We recommend backing up data and, if the configuration folder is not empty, backing up the configuration.
For example, if the data and configuration folders are in `/opt/trains`, then archive all data into `~/trains_backup_data.tgz`,
and the configuration into `~/trains_backup_config.tgz`:
sudo tar czvf ~/trains_backup_data.tgz -C /opt/trains/data .
sudo tar czvf ~/trains_backup_config.tgz -C /opt/trains/config .
1. Rename `/opt/trains` and its subdirectories to `/opt/clearml`.
sudo mv /opt/trains /opt/clearml
1. Download the latest `docker-compose.yml` file.
curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
1. Start up **ClearML Server**. This automatically pulls the latest **ClearML Server** build.
docker-compose -f /opt/clearml/docker-compose.yml pull
docker-compose -f /opt/clearml/docker-compose.yml up -d
If issues arise during the upgrade, see the FAQ page, [How do I fix Docker upgrade errors?](../faq#common-docker-upgrade-errors).
##### Other deployment formats
To conclude the upgrade for deployment formats other than Linux, follow their upgrade instructions:
* [AWS EC2 AMIs](upgrade_server_aws_ec2_ami.md)
* [Google Cloud Platform custom images](upgrade_server_gcp.md)
* [Linux and macOS](upgrade_server_linux_mac.md)
* [Windows](upgrade_server_win.md)
* [Kubernetes](upgrade_server_kubernetes.md)
* [Kubernetes Using Helm](upgrade_server_kubernetes_helm.md).

---
title: Google Cloud Platform
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
Deploy **ClearML Server** on the Google Cloud Platform (GCP) using one of the pre-built GCP Custom Images. **ClearML**
provides custom images for each released version of **ClearML Server**. For a list of the pre-built custom images, see
[ClearML Server GCP Custom Image](#clearml-server-gcp-custom-image).
After deploying **ClearML Server**, configure the **ClearML Python Package** for it, see [Configuring ClearML for ClearML Server](clearml_config_for_clearml_server.md).
For information about upgrading **ClearML Server** on GCP, see [here](upgrade_server_gcp.md).
:::important
If **ClearML Server** is being reinstalled, we recommend clearing browser cookies for **ClearML Server**. For example,
for Firefox, go to Developer Tools > Storage > Cookies, and for Chrome, go to Developer Tools > Application > Cookies,
and delete all cookies under the **ClearML Server** URL.
:::
## Default ClearML Server service ports
After deploying **ClearML Server**, the services expose the following node ports:
* Web server on `8080`
* API server on `8008`
* File Server on `8081`
## Default ClearML Server storage paths
The persistent storage configuration:
* MongoDB: `/opt/clearml/data/mongo/`
* Elasticsearch: `/opt/clearml/data/elastic_7/`
* File Server: `/mnt/fileserver/`
## Importing the Custom Image to your GCP account
Before launching an instance using a **ClearML Server** GCP Custom Image, import the image to the custom images list.
:::note
No upload of the image file is required. We provide links to image files stored in Google Storage.
:::
**To import the image to your custom images list:**
1. In the Cloud Console, go to the [Images](https://console.cloud.google.com/compute/images) page.
1. At the top of the page, click **Create image**.
1. In **Name**, specify a unique name for the image.
1. Optionally, specify an image family for the new image, or configure specific encryption settings for the image.
1. In the **Source** menu, select **Cloud Storage file**.
1. Enter the **ClearML Server** image bucket path (see [ClearML Server GCP Custom Image](#clearml-server-gcp-custom-image)),
for example: `allegro-files/clearml-server/clearml-server.tar.gz`.
1. Click **Create** to import the image. The process can take several minutes depending on the size of the boot disk image.
For more information see [Import the image to your custom images list](https://cloud.google.com/compute/docs/import/import-existing-image#import_image) in the [Compute Engine Documentation](https://cloud.google.com/compute/docs).
## Launching
:::warning
By default, **ClearML Server** launches with unrestricted access. To restrict **ClearML Server** access, follow the
instructions in the [Security](clearml_server_security.md) page.
:::
To launch **ClearML Server** using a GCP Custom Image, see [Manually importing virtual disks](https://cloud.google.com/compute/docs/import/import-existing-image#overview) in the [Compute Engine documentation](https://cloud.google.com/compute/docs). For more information on Custom Images, see [Custom Images](https://cloud.google.com/compute/docs/images#custom_images) in the Compute Engine documentation.
The minimum requirements for **ClearML Server** are:
* 2 vCPUs
* 7.5GB RAM
## Restarting
**To restart ClearML Server Docker deployment:**
* Stop and then restart the Docker containers by executing the following commands:
docker-compose -f /opt/clearml/docker-compose.yml down
docker-compose -f /opt/clearml/docker-compose.yml up -d
## Backing up and restoring data and configuration
The commands in this section are an example of how to back up and restore data and configuration.
If data and configuration folders are in `/opt/clearml`, then archive all data into `~/clearml_backup_data.tgz`, and
configuration into `~/clearml_backup_config.tgz`:
sudo tar czvf ~/clearml_backup_data.tgz -C /opt/clearml/data .
sudo tar czvf ~/clearml_backup_config.tgz -C /opt/clearml/config .
If the data and the configuration need to be restored:
1. Verify you have the backup files.
1. Replace any existing data with the backup data:
sudo rm -fR /opt/clearml/data/* /opt/clearml/config/*
sudo tar -xzf ~/clearml_backup_data.tgz -C /opt/clearml/data
sudo tar -xzf ~/clearml_backup_config.tgz -C /opt/clearml/config
1. Grant access to the data:
sudo chown -R 1000:1000 /opt/clearml
## ClearML Server GCP Custom Image
The following section contains a list of Custom Image URLs (exported in different formats) for each released **ClearML Server** version.
### Latest version - v0.17.0
- [https://storage.googleapis.com/allegro-files/clearml-server/clearml-server.tar.gz](https://storage.googleapis.com/allegro-files/clearml-server/clearml-server.tar.gz)
### All release versions
- v0.17.0 - [https://storage.googleapis.com/allegro-files/clearml-server/clearml-server-0-17-0.tar.gz](https://storage.googleapis.com/allegro-files/clearml-server/clearml-server-0-17-0.tar.gz)
## Next Step
* [Configuring ClearML for ClearML Server](clearml_config_for_clearml_server.md).

---
title: Kubernetes
---
:::important
This documentation page applies to deploying your own open source **ClearML Server**. It does not apply to **ClearML Hosted Service** users.
:::
This page describes the prerequisites and procedures for manually deploying **ClearML Server** to Kubernetes clusters,
the resulting port mappings, and how to access **ClearML Server**.
To deploy **ClearML Server** to Kubernetes using Helm, see [Deploying ClearML Server: Kubernetes using Helm](clearml_server_kubernetes_helm.md).
For more information about upgrading **ClearML Server** in a Kubernetes Cluster, see [here](upgrade_server_kubernetes.md).
:::important
If **ClearML Server** is being reinstalled, we recommend clearing browser cookies for **ClearML Server**. For example,
for Firefox, go to Developer Tools > Storage > Cookies, and for Chrome, go to Developer Tools > Application > Cookies,
and delete all cookies under the **ClearML Server** URL.
:::
## Prerequisites
* A Kubernetes cluster.
* `kubectl` installed and configured (see [Install and Set Up kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) in the Kubernetes documentation).
* One node labeled `app=clearml`.
:::warning
ClearML Server deployment uses node storage. If more than one node is labeled as ``app=clearml``, and the server is later
redeployed or updated, then **ClearML Server** may not locate all the data.
:::
## Deploying
:::warning
By default, **ClearML Server** launches with unrestricted access. To restrict **ClearML Server** access, follow the instructions
in the [Security](clearml_server_security.md) page.
:::
### Step 1: Modify Elasticsearch default values in the Docker configuration file
Before deploying **ClearML Server** in a Kubernetes cluster, modify several Elasticsearch settings in the Docker configuration.
For more information, see [Install Elasticsearch with Docker](https://www.elastic.co/guide/en/elasticsearch/reference/master/docker.html#_notes_for_production_use_and_defaults)
in the Elasticsearch documentation and [Daemon configuration file](https://docs.docker.com/config/daemon/) in the Docker documentation.
**To modify Elasticsearch default values in the Docker configuration file:**
1. Connect to the node in the Kubernetes cluster labeled `app=clearml`.
1. Create or edit (if one exists) the `/etc/docker/daemon.json` file, and add or modify the `default-ulimits` section
as the following example shows:
{
"default-ulimits": {
"nofile": {
"name": "nofile",
"hard": 65536,
"soft": 1024
},
"memlock":
{
"name": "memlock",
"soft": -1,
"hard": -1
}
}
}
1. Elasticsearch requires that the `vm.max_map_count` kernel setting, which is the maximum number of memory map areas a process can use, be set to at least `262144`.
For CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19.x, use the following commands to set `vm.max_map_count`:
echo "vm.max_map_count=262144" > /tmp/99-clearml.conf
sudo mv /tmp/99-clearml.conf /etc/sysctl.d/99-clearml.conf
sudo sysctl -w vm.max_map_count=262144
1. Restart docker:
sudo service docker restart
### Step 2. Deploy ClearML Server in the Kubernetes Cluster
After modifying several Elasticsearch settings in the Docker configuration (see Step 1 above), deploy **ClearML
Server**.
**To deploy ClearML Server in Kubernetes Clusters:**
1. Clone the `clearml-server-k8s` repository and change to the new `clearml-server-k8s` directory:
git clone https://github.com/allegroai/clearml-server-k8s.git && cd clearml-server-k8s/clearml-server-k8s
1. Create the clearml `namespace` and deployments:
kubectl apply -k overlays/current_version
:::note
This installs the templates for the current ``clearml-server`` version and updates patch versions whenever the deployment is restarted (or reinstalled).
:::
To use the latest version, which is **_not recommended_**:
kubectl apply -k base
## Port Mapping
After deploying **ClearML Server**, the services expose the following node ports:
* API server on `30008`.
* Web server on `30080`.
* File server on `30081`.
## Accessing ClearML Server
**To access the ClearML Server, do the following:**
1. Create domain records.
* Create records for the **ClearML Server** web server, file server, and API access using the following rules:
* `app.<your_domain_name>`
* `files.<your_domain_name>`
* `api.<your_domain_name>`
For example:
* `app.clearml.mydomainname.com`
* `files.clearml.mydomainname.com`
* `api.clearml.mydomainname.com`
1. Point the created records to the load balancer.
1. Configure the load balancer to redirect traffic coming from the records:
* `app.<your_domain_name>` should be redirected to k8s cluster nodes on port `30080`
* `files.<your_domain_name>` should be redirected to k8s cluster nodes on port `30081`
* `api.<your_domain_name>` should be redirected to k8s cluster nodes on port `30008`
## Next Step
* [Configuring ClearML for ClearML Server](clearml_config_for_clearml_server.md).

---
title: Kubernetes Using Helm
---
:::important
This documentation page applies to deploying your own open source **ClearML Server**. It does not apply to **ClearML Hosted Service** users.
:::
:::warning
If **ClearML Server** is being reinstalled, we recommend clearing browser cookies for **ClearML Server**. For example,
for Firefox, go to Developer Tools > Storage > Cookies, and for Chrome, go to Developer Tools > Application > Cookies,
and delete all cookies under the **ClearML Server** URL.
:::
For information about upgrading **ClearML Server** in Kubernetes Clusters using Helm, see [here](upgrade_server_kubernetes_helm.md).
## Prerequisites
* A Kubernetes cluster.
* `kubectl` installed and configured (see [Install and Set Up kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) in the Kubernetes documentation).
* `helm` is installed (see [Installing Helm](https://helm.sh/docs/using_helm.html#installing-helm) in the Helm documentation).
* One node labeled `app=clearml`.
:::warning
ClearML Server deployment uses node storage. If more than one node is labeled as ``app=clearml``, and the server is later
redeployed or updated, then **ClearML Server** may not locate all the data.
:::
## Deploying
:::warning
By default, **ClearML Server** launches with unrestricted access. To restrict **ClearML Server** access, follow the
instructions in the [Security](clearml_server_security.md) page.
:::
### Step 1: Modify Elasticsearch default values in the Docker configuration file
Before deploying **ClearML Server** in a Kubernetes cluster, modify several Elasticsearch settings in the Docker configuration.
For more information, see [Install Elasticsearch with Docker](https://www.elastic.co/guide/en/elasticsearch/reference/master/docker.html#_notes_for_production_use_and_defaults)
in the Elasticsearch documentation and [Daemon configuration file](https://docs.docker.com/config/daemon/) in the Docker documentation.
**To modify Elasticsearch default values in the Docker configuration file:**
1. Connect to the node in the Kubernetes cluster labeled `app=clearml`.
1. Create or edit (if one exists) the `/etc/docker/daemon.json` file, and add or modify the `default-ulimits` section as
the following example shows:
{
"default-ulimits": {
"nofile": {
"name": "nofile",
"hard": 65536,
"soft": 1024
},
"memlock":
{
"name": "memlock",
"soft": -1,
"hard": -1
}
}
}
1. Elasticsearch requires that the `vm.max_map_count` kernel setting, which is the maximum number of memory map areas a
process can use, be set to at least `262144`.
For CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19.x, use the following commands to set `vm.max_map_count`:
echo "vm.max_map_count=262144" > /tmp/99-clearml.conf
sudo mv /tmp/99-clearml.conf /etc/sysctl.d/99-clearml.conf
sudo sysctl -w vm.max_map_count=262144
1. Restart docker:
sudo service docker restart
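The host-level changes above can be sanity-checked before redeploying; a minimal sketch, assuming `python3` and `sysctl` are available on the node (`/tmp/daemon.json` is only a staging path):

```shell
# Stage the ulimit settings from step 2 and validate the JSON before
# moving it into place; an invalid /etc/docker/daemon.json prevents
# the Docker daemon from starting.
cat > /tmp/daemon.json <<'EOF'
{
    "default-ulimits": {
        "nofile": {"name": "nofile", "hard": 65536, "soft": 1024},
        "memlock": {"name": "memlock", "soft": -1, "hard": -1}
    }
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "daemon.json OK"
# then: sudo mv /tmp/daemon.json /etc/docker/daemon.json

# Confirm the kernel picked up the step-3 setting; Elasticsearch
# refuses to start when vm.max_map_count is below 262144.
current=$(sysctl -n vm.max_map_count 2>/dev/null || echo 0)
if [ "$current" -ge 262144 ]; then
    echo "vm.max_map_count OK ($current)"
else
    echo "vm.max_map_count too low ($current)" >&2
fi
```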
### Step 2: Deploy ClearML Server in Kubernetes using Helm
After modifying several Elasticsearch settings in the Docker configuration (see Step 1 above), deploy **ClearML Server**.
**To deploy ClearML Server in Kubernetes using Helm:**
1. Add the clearml-server repository to Helm:
helm repo add allegroai https://allegroai.github.io/clearml-server-helm/
1. Confirm the clearml-server repository is now in Helm:
helm search clearml
The helm search results must include `allegroai/clearml-server-chart`.
1. Install `clearml-server-chart` on your cluster:
helm install allegroai/clearml-server-chart --namespace=clearml --name clearml-server
A `clearml` namespace is created in the cluster, and clearml-server is deployed in it.
## Port Mapping
After **ClearML Server** is deployed, the services expose the following node ports:
* API server on `30008`.
* Web server on `30080`.
* File server on `30081`.
The node ports map to the following container ports:
* `30080` maps to `clearml-webserver` container on port `8080`
* `30008` maps to `clearml-apiserver` container on port `8008`
* `30081` maps to `clearml-fileserver` container on port `8081`
:::important
We recommend using the container ports (``8080``, ``8008``, and ``8081``), or a load balancer (see the next section, [Accessing ClearML Server](#accessing)).
:::
## Accessing ClearML Server
**To access ClearML Server:**
* Create a load balancer and domain with records pointing to **ClearML Server** using the following rules, which **ClearML**
uses to translate domain names:
* The record to access the **ClearML Web UI**:
*app.<your domain name>.*
For example, `app.clearml.mydomainname.com` points to your node on port `30080`.
* The record to access the **ClearML** API:
*api.<your domain name>.*
For example, `api.clearml.mydomainname.com` points to your node on port `30008`.
* The record to access the **ClearML** file server:
*files.<your domain name>.*
For example, `files.clearml.mydomainname.com` points to your node on port `30081`.
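Once the records and load balancer are in place, a quick reachability check can be scripted. The record names below are hypothetical examples (following the sample domain above), and the sketch assumes `getent` and `nc` are available:

```shell
# Probe each example record and its node port; unresolvable or
# unreachable records are reported to stderr rather than aborting.
check() {
    host="$1"; port="$2"
    if getent hosts "$host" > /dev/null && nc -z -w 3 "$host" "$port"; then
        echo "$host:$port reachable"
    else
        echo "$host:$port NOT reachable" >&2
    fi
}
check app.clearml.mydomainname.com   30080
check api.clearml.mydomainname.com   30008
check files.clearml.mydomainname.com 30081
```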
## Next Step
* [Configuring ClearML for ClearML Server](clearml_config_for_clearml_server.md).
---
title: Linux and macOS
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
Deploy the **ClearML Server** in Linux or macOS using the pre-built Docker image.
For **ClearML** docker images, including previous versions, see [https://hub.docker.com/r/allegroai/clearml](https://hub.docker.com/r/allegroai/clearml).
However, pulling the **ClearML** Docker image directly is not required; the docker-compose YAML file included
in the instructions on this page pulls it for you.
For information about upgrading **ClearML Server** in Linux or macOS, see [here](upgrade_server_linux_mac.md).
:::important
If **ClearML Server** is being reinstalled, we recommend clearing browser cookies for **ClearML Server**. For example,
for Firefox, go to Developer Tools > Storage > Cookies, and for Chrome, go to Developer Tools > Application > Cookies,
and delete all cookies under the **ClearML Server** URL.
:::
## Prerequisites
For Linux users only:
* The Linux distribution must support Docker. For more information, see this [explanation](https://docs.docker.com/engine/install/) in the Docker documentation.
* Be logged in as a user with `sudo` privileges.
* Use `bash` for all command-line instructions in this installation.
* The ports `8080`, `8081`, and `8008` must be available for the **ClearML Server** services.
## Deploying
:::warning
By default, **ClearML Server** launches with unrestricted access. To restrict **ClearML Server** access, follow the
instructions in the [Security](clearml_server_security.md) page.
:::
**To launch ClearML Server on Linux or macOS:**
1. Install Docker. The instructions depend upon the operating system:
* Linux - see [Docker for Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
* macOS - see [Docker for OS X](https://docs.docker.com/docker-for-mac/install/).
1. Verify the Docker CE installation. Execute the command:
docker run hello-world
The expected output is:
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.
1. For macOS only, increase the memory allocation in Docker Desktop to `8GB`.
1. In the top status bar, click the Docker icon.
1. Click **Preferences** **>** **Resources** **>** **Advanced**, and then set the memory to at least `8192`.
1. Click **Apply**.
1. For Linux only, install `docker-compose`. Execute the following commands (for more information, see [Install Docker Compose](https://docs.docker.com/compose/install/) in the Docker documentation):
sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
1. Increase `vm.max_map_count` for Elasticsearch in Docker. Execute the following commands, depending upon the operating system:
* Linux:
echo "vm.max_map_count=262144" > /tmp/99-clearml.conf
sudo mv /tmp/99-clearml.conf /etc/sysctl.d/99-clearml.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart
* macOS:
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty
sysctl -w vm.max_map_count=262144
1. Remove any previous installation of **ClearML Server**.
**This clears all existing ClearML SDK databases.**
sudo rm -R /opt/clearml/
1. Create local directories for the databases and storage.
sudo mkdir -p /opt/clearml/data/elastic_7
sudo mkdir -p /opt/clearml/data/mongo/db
sudo mkdir -p /opt/clearml/data/mongo/configdb
sudo mkdir -p /opt/clearml/data/redis
sudo mkdir -p /opt/clearml/logs
sudo mkdir -p /opt/clearml/config
sudo mkdir -p /opt/clearml/data/fileserver
1. For macOS only do the following:
1. Open the Docker app.
1. Select **Preferences**.
1. On the **File Sharing** tab, add `/opt/clearml`.
1. Grant access to the Docker containers, depending upon the operating system:
* Linux:
sudo chown -R 1000:1000 /opt/clearml
* macOS:
sudo chown -R $(whoami):staff /opt/clearml
1. Download the **ClearML Server** docker-compose YAML file.
sudo curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
1. For Linux only, configure the **ClearML Agent Services**. If `CLEARML_HOST_IP` is not provided, then **ClearML Agent Services** will use the external public address of the **ClearML Server**. If `CLEARML_AGENT_GIT_USER` / `CLEARML_AGENT_GIT_PASS` are not provided, then **ClearML Agent Services** will not be able to access any private repositories for running service tasks.
export CLEARML_HOST_IP=server_host_ip_here
export CLEARML_AGENT_GIT_USER=git_username_here
export CLEARML_AGENT_GIT_PASS=git_password_here
1. Run `docker-compose` with the downloaded configuration file.
docker-compose -f /opt/clearml/docker-compose.yml up -d
The server is now running on [http://localhost:8080](http://localhost:8080).
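The directory-creation step above can also be expressed as a small helper, so the same layout can be rehearsed under a scratch root before running it (with `sudo`) against `/opt/clearml`. The helper name below is ours, not part of ClearML:

```shell
# Create the ClearML Server storage layout under an arbitrary root.
make_clearml_dirs() {
    root="$1"
    for d in data/elastic_7 data/mongo/db data/mongo/configdb \
             data/redis data/fileserver logs config; do
        mkdir -p "$root/$d"
    done
}

make_clearml_dirs /tmp/clearml_layout_check   # rehearsal in a scratch root
# on the server: run the same loop as root (or prefix mkdir with sudo)
# with root=/opt/clearml
```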
## Port mapping
After deploying **ClearML Server**, the services expose the following ports:
* Web server on port `8080`
* API server on port `8008`
* File server on port `8081`
## Restarting
**To restart ClearML Server Docker deployment:**
* Stop and then restart the Docker containers by executing the following commands:
docker-compose -f /opt/clearml/docker-compose.yml down
docker-compose -f /opt/clearml/docker-compose.yml up -d
## Backing up and restoring data and configuration
The commands in this section are examples of how to back up and restore the data and configuration.
If the data and configuration folders are in `/opt/clearml`, then archive all data into `~/clearml_backup_data.tgz`, and
configuration into `~/clearml_backup_config.tgz`:
sudo tar czvf ~/clearml_backup_data.tgz -C /opt/clearml/data .
sudo tar czvf ~/clearml_backup_config.tgz -C /opt/clearml/config .
If needed, restore data and configuration by doing the following:
1. Verify the existence of backup files.
1. Replace any existing data with the backup data:
sudo rm -fR /opt/clearml/data/* /opt/clearml/config/*
sudo tar -xzf ~/clearml_backup_data.tgz -C /opt/clearml/data
sudo tar -xzf ~/clearml_backup_config.tgz -C /opt/clearml/config
1. Grant access to the data, depending upon the operating system:
* Linux:
sudo chown -R 1000:1000 /opt/clearml
* macOS:
sudo chown -R $(whoami):staff /opt/clearml
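The backup and restore commands above can be rehearsed end to end against a scratch directory before touching `/opt/clearml`; the paths under `/tmp` below are for illustration only:

```shell
# Round trip: create sample data, back it up, wipe it, restore it.
src=/tmp/clearml_demo/data
mkdir -p "$src" && echo "experiment-123" > "$src/marker"

tar czf /tmp/clearml_backup_data.tgz -C "$src" .    # back up
rm -rf "$src" && mkdir -p "$src"                    # simulate data loss
tar -xzf /tmp/clearml_backup_data.tgz -C "$src"     # restore

cat "$src/marker"    # → experiment-123
```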
## Next Step
* [Configuring ClearML for ClearML Server](clearml_config_for_clearml_server.md).
---
title: Securing ClearML Server
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
To ensure your deployment is properly secured, we recommend following these best practices.
## Network Security
If the deployment is in an open network that allows public access, only allow access to the specific ports used by
**ClearML Server** (see [ClearML Server configurations](clearml_server_config.md#clearml-server-deployment-configuration)).
If HTTPS access is configured for the instance, allow access to port `443`.
For improved security, the ports for **ClearML Server** Elasticsearch, MongoDB, and Redis servers are not exposed by
default; they are only open internally in the docker network.
## User Access Security
Configure **ClearML Server** to use Web Login authentication, which requires a username and password for user access
(see [Web Login Authentication](clearml_server_config.md#web-login-authentication)).
## Server Credentials and Secrets
By default, **ClearML Server** comes with values designed to allow you to set it up quickly and start working
with the ClearML SDK.
However, this also means that the **server must be secured** by either preventing any external access, or by changing
defaults so that the server's credentials are not publicly known.
The **ClearML Server** default secrets can be found [here](https://github.com/allegroai/clearml-server/blob/master/apiserver/config/default/secure.conf), and can be changed using the `secure.conf` configuration file or using environment variables
(see [ClearML Server Feature Configurations](clearml_server_config.md#clearml-server-feature-configurations)).
Specifically, the relevant settings are:
* `secure.http.session_secret.apiserver`
* `secure.auth.token_secret`
* `secure.credentials.apiserver.user_key`
* `secure.credentials.apiserver.user_secret`
* `secure.credentials.webserver.user_key` (automatically revoked by the server if using [Web Login Authentication](clearml_server_config.md#web-login-authentication))
* `secure.credentials.webserver.user_secret` (automatically revoked by the server if using [Web Login Authentication](./clearml_server_config.md#web-login-authentication))
* `secure.credentials.tests.user_key`
* `secure.credentials.tests.user_secret`
:::note
Securing the ClearML Server also means using [Web Login Authentication](clearml_server_config.md#web-login-authentication),
since the default "free access" login is inherently insecure (and will not work once ``secure.credentials.webserver.user_key``
and ``secure.credentials.webserver.user_secret`` values are changed).
:::
### Example: Using Environment Variables
To set new values for these settings, use the following environment variables:
* `CLEARML__SECURE__HTTP__SESSION_SECRET__APISERVER="new-secret-string"`
* `CLEARML__SECURE__AUTH__TOKEN_SECRET="new-secret-string"`
* `CLEARML__SECURE__CREDENTIALS__APISERVER__USER_KEY="new-key-string"`
* `CLEARML__SECURE__CREDENTIALS__APISERVER__USER_SECRET="new-secret-string"`
* `CLEARML__SECURE__CREDENTIALS__WEBSERVER__USER_KEY="new-key-string"`
* `CLEARML__SECURE__CREDENTIALS__WEBSERVER__USER_SECRET="new-secret-string"`
* `CLEARML__SECURE__CREDENTIALS__TESTS__USER_KEY="new-key-string"`
* `CLEARML__SECURE__CREDENTIALS__TESTS__USER_SECRET="new-secret-string"`
### Example: Using Docker Compose
If used in `docker-compose.yml`, these variables should be specified for the `apiserver` service, under the `environment` section as follows:
```yaml
version: "3.6"
services:
apiserver:
...
environment:
...
CLEARML__SECURE__HTTP__SESSION_SECRET__APISERVER: "new-secret-string"
CLEARML__SECURE__AUTH__TOKEN_SECRET: "new-secret-string"
CLEARML__SECURE__CREDENTIALS__APISERVER__USER_KEY: "new-key-string"
CLEARML__SECURE__CREDENTIALS__APISERVER__USER_SECRET: "new-secret-string"
CLEARML__SECURE__CREDENTIALS__WEBSERVER__USER_KEY: "new-key-string"
CLEARML__SECURE__CREDENTIALS__WEBSERVER__USER_SECRET: "new-secret-string"
CLEARML__SECURE__CREDENTIALS__TESTS__USER_KEY: "new-key-string"
CLEARML__SECURE__CREDENTIALS__TESTS__USER_SECRET: "new-secret-string"
...
```
:::important
When generating new user keys and secrets, make sure to use sufficiently long strings (we use 30 chars for keys and 50-60
chars for secrets). See [here](https://github.com/allegroai/clearml-server/blob/master/apiserver/service_repo/auth/utils.py)
for Python example code to generate these strings.
:::
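As a hedged shell alternative to that Python helper, random keys and secrets of the recommended lengths can also be drawn from `/dev/urandom` (the `gen_token` name is ours, for illustration):

```shell
# Generate an alphanumeric key (30 chars) and secret (50 chars),
# matching the length recommendation above.
gen_token() {
    tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$1"
}
user_key=$(gen_token 30)
user_secret=$(gen_token 50)
echo "key:    $user_key"
echo "secret: $user_secret"
```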

---
title: Windows 10
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
For Windows, we recommend launching the pre-built Docker image on a Linux virtual machine (see [Deploying ClearML Server: Linux or macOS](clearml_server_linux_mac.md)).
However, **ClearML Server** can be launched on Windows 10, using Docker Desktop for Windows (see the Docker [System Requirements](https://docs.docker.com/docker-for-windows/install/#system-requirements)).
For information about upgrading **ClearML Server** on Windows, see [here](upgrade_server_win.md).
:::important
If **ClearML Server** is being reinstalled, we recommend clearing browser cookies for **ClearML Server**. For example,
for Firefox, go to Developer Tools > Storage > Cookies, and for Chrome, go to Developer Tools > Application > Cookies,
and delete all cookies under the **ClearML Server** URL.
:::
## Deploying
:::warning
By default, **ClearML Server** launches with unrestricted access. To restrict **ClearML Server** access, follow the instructions in the [Security](clearml_server_security.md) page.
:::
**To deploy ClearML Server on Windows 10:**
1. Install the Docker Desktop for Windows application by either:
* Following the [Install Docker Desktop on Windows](https://docs.docker.com/docker-for-windows/install/) instructions.
* Running the Docker installation [wizard](https://hub.docker.com/?overlay=onboarding).
1. Increase the memory allocation in Docker Desktop to `4GB`.
1. In the Windows notification area (system tray), right click the Docker icon.
1. Click **Settings** **>** **Advanced**, and then set the memory to at least `4096`.
1. Click **Apply**.
1. Remove any previous installation of **ClearML Server**.
**This clears all existing ClearML SDK databases.**
rmdir c:\opt\clearml /s
1. Create local directories for data and logs. Open PowerShell and execute the following commands:
cd c:
mkdir c:\opt\clearml\data
mkdir c:\opt\clearml\logs
1. Save the **ClearML Server** docker-compose YAML file.
curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose-win10.yml -o c:\opt\clearml\docker-compose-win10.yml
1. Run `docker-compose`. In PowerShell, execute the following commands:
docker-compose -f c:\opt\clearml\docker-compose-win10.yml up
The server is now running on [http://localhost:8080](http://localhost:8080).
## Port mapping
After deploying **ClearML Server**, the services expose the following node ports:
* Web server on port `8080`
* API server on port `8008`
* File server on port `8081`
## Restarting
**To restart ClearML Server Docker deployment:**
* Stop and then restart the Docker containers by executing the following commands:
docker-compose -f c:\opt\clearml\docker-compose-win10.yml down
docker-compose -f c:\opt\clearml\docker-compose-win10.yml up -d
## Next Step
* [Configuring ClearML for ClearML Server](clearml_config_for_clearml_server.md).
---
title: AWS EC2 AMIs
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
:::note
For upgrade purposes, the terms **Trains Server** and **ClearML Server** are interchangeable.
:::
The sections below contain the steps to upgrade **ClearML Server** on the [same AWS instance](#upgrading-on-the-same-aws-instance), and
to upgrade and migrate to a [new AWS instance](#upgrading-and-migrating-to-a-new-aws-instance).
### Upgrading on the same AWS instance
This section contains the steps to upgrade **ClearML Server** on the same AWS instance.
:::warning
Some legacy **Trains Server** AMIs provided an auto-upgrade on restart capability. This functionality is now deprecated.
:::
**To upgrade your ClearML Server AWS AMI:**
1. Shut down **ClearML Server** by executing the following command (which assumes the configuration file is in the environment path):
docker-compose -f /opt/clearml/docker-compose.yml down
If you are upgrading from **Trains Server**, use this command:
docker-compose -f /opt/trains/docker-compose.yml down
1. We recommend [backing up your data](clearml_server_aws_ec2_ami.md#backing-up-and-restoring-data-and-configuration) and,
if your configuration folder is not empty, backing up your configuration.
1. If upgrading from **Trains Server** version 0.15 or older, a data migration is required before upgrading.
First follow these [data migration instructions](clearml_server_es7_migration.md), and then continue this upgrade.
1. If upgrading from **Trains Server** to **ClearML Server**, rename `/opt/trains` to `/opt/clearml`.
1. Download the latest `docker-compose.yml` file. Execute the following command:
sudo curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
1. Start up **ClearML Server**. This automatically pulls the latest **ClearML Server** build.
docker-compose -f /opt/clearml/docker-compose.yml pull
docker-compose -f /opt/clearml/docker-compose.yml up -d
### Upgrading and migrating to a new AWS instance
This section contains the steps to upgrade **ClearML Server** on the new AWS instance.
**To migrate and to upgrade your ClearML Server AWS AMI:**
1. Shut down **ClearML Server** by executing the following command (which assumes the configuration file is in the environment path):
docker-compose down
1. On the old AWS instance, [backup your data](clearml_server_aws_ec2_ami.md#backing-up-and-restoring-data-and-configuration)
and, if your configuration folder is not empty, backup your configuration.
1. If upgrading from **Trains Server** version 0.15 or older, a data migration is required before upgrading. First follow
these [data migration instructions](clearml_server_es7_migration.md), and then continue this upgrade.
1. On the new AWS instance, [restore your data](clearml_server_aws_ec2_ami.md#backing-up-and-restoring-data-and-configuration) and, if the configuration folder is not empty, restore the
configuration.
1. Start up **ClearML Server**. This automatically pulls the latest **ClearML Server** build.
docker-compose -f docker-compose.yml pull
docker-compose -f docker-compose.yml up -d
---
title: Google Cloud Platform
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
**To upgrade ClearML Server Docker deployment:**
1. Shut down the docker containers with the following command:
docker-compose -f docker-compose.yml down
1. If upgrading from **Trains Server** version 0.15 or older to **ClearML Server**, do the following:
1. A data migration is required before upgrading. First follow these [data migration instructions](clearml_server_es7_migration.md),
and then continue this upgrade.
1. Rename `/opt/trains` and its subdirectories to `/opt/clearml`.
sudo mv /opt/trains /opt/clearml
1. We recommend [backing up data](clearml_server_gcp.md#backing-up-and-restoring-data-and-configuration) and, if the configuration folder is
not empty, backing up the configuration.
1. Download the latest `docker-compose.yml` file.
curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
1. Start up **ClearML Server**. This automatically pulls the latest **ClearML Server** build.
docker-compose -f /opt/clearml/docker-compose.yml pull
docker-compose -f /opt/clearml/docker-compose.yml up -d
If issues arise during your upgrade, see the FAQ page, [How do I fix Docker upgrade errors?](../faq.md#common-docker-upgrade-errors).
---
title: Kubernetes
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
:::note
We strongly encourage you to keep your **ClearML Server** up to date by upgrading to the current release.
:::
**To update the ClearML Server in Kubernetes clusters:**
1. If a **ClearML Server** was previously deployed, delete old deployments using the following command:
kubectl delete -f .
1. If upgrading from **Trains Server** version 0.15 or older to **ClearML Server**, a data migration is required before
upgrading. First follow these [data migration instructions](clearml_server_es7_migration.md), and then continue this
upgrade.
1. Edit the YAML file that needs to be updated, and then run the following command:
kubectl apply -f <file you edited>.yaml
---
title: Kubernetes Helm
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
:::note
We strongly encourage you to keep the **ClearML Server** up to date by upgrading to the current release.
:::
1. Upgrade using a new or updated `values.yaml` file:
helm upgrade clearml-server allegroai/clearml-server-chart -f new-values.yaml
1. If **ClearML Server** was previously deployed, first delete old deployments using the following command:
helm delete --purge clearml-server
1. If upgrading from **Trains Server** version 0.15 or older to **ClearML Server**, a data migration is required before
upgrading. First follow these [data migration instructions](clearml_server_es7_migration.md), and then continue this upgrade.
1. Upgrade the deployment to match the repository version:
helm upgrade clearml-server allegroai/clearml-server-chart
---
title: Linux or macOS
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
<br/>
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Important: Upgrading from v0.14 or older</summary>
<div className="cml-expansion-panel-content">
For Linux only, if upgrading from <strong>Trains Server</strong> v0.14 or older, configure the <strong>ClearML Agent Services</strong>.
* If ``CLEARML_HOST_IP`` is not provided, then **ClearML Agent Services** will use the external public address of the **ClearML Server**.
* If ``CLEARML_AGENT_GIT_USER`` / ``CLEARML_AGENT_GIT_PASS`` are not provided, then **ClearML Agent Services** will not be able to access any private repositories for running service tasks.
export CLEARML_HOST_IP=server_host_ip_here
export CLEARML_AGENT_GIT_USER=git_username_here
export CLEARML_AGENT_GIT_PASS=git_password_here
:::note
For backwards compatibility, the environment variables ``TRAINS_HOST_IP``, ``TRAINS_AGENT_GIT_USER``, and ``TRAINS_AGENT_GIT_PASS`` are supported.
:::
</div>
</details>
<br/>
**To upgrade ClearML Server Docker deployment:**
1. Shut down **ClearML Server**. Execute the following command (which assumes the configuration file is in the environment path):
docker-compose -f docker-compose.yml down
1. If upgrading from **Trains Server** version 0.15 or older to **ClearML Server**, a data migration is required before
upgrading. First follow these [data migration instructions](clearml_server_es7_migration.md), and then continue this upgrade.
1. We recommend [backing up data](clearml_server_linux_mac.md#backing-up-and-restoring-data-and-configuration) and, if the configuration folder is
not empty, backing up the configuration.
1. If upgrading from **Trains Server** to **ClearML Server**, rename `/opt/trains` and its subdirectories to `/opt/clearml`.
sudo mv /opt/trains /opt/clearml
1. Download the latest `docker-compose.yml` file.
curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
1. Start up **ClearML Server**. This automatically pulls the latest **ClearML Server** build.
docker-compose -f /opt/clearml/docker-compose.yml pull
docker-compose -f /opt/clearml/docker-compose.yml up -d
If issues arise during your upgrade, see the FAQ page, [How do I fix Docker upgrade errors?](../faq.md#common-docker-upgrade-errors).
---
title: Windows
---
:::important
This documentation page applies to deploying your own open source ClearML Server. It does not apply to ClearML Hosted Service users.
:::
**To upgrade ClearML Server Docker deployment:**
1. Shut down the docker containers.
1. Execute one of the following commands, depending upon the version that is being upgraded:
* Upgrading **ClearML Server** version:
docker-compose -f c:\opt\clearml\docker-compose-win10.yml down
* Upgrading from **Trains Server** to **ClearML Server**:
docker-compose -f c:\opt\trains\docker-compose-win10.yml down
1. If upgrading from **Trains Server** version 0.15 or older to **ClearML Server**, a data migration is required before
upgrading. First follow these [data migration instructions](clearml_server_es7_migration.md), and then continue this upgrade.
1. We recommend backing up data and, if the configuration folder is not empty, backing up the configuration.
:::note
For example, if the configuration is in ``c:\opt\clearml``, then backup ``c:\opt\clearml\config`` and ``c:\opt\clearml\data``.
Before restoring, remove the old artifacts in ``c:\opt\clearml\config`` and ``c:\opt\clearml\data``, and then restore.
:::
1. If upgrading from **Trains Server** to **ClearML Server**, rename `c:\opt\trains` and its subdirectories to `c:\opt\clearml`.
1. Download the latest `docker-compose.yml` file.
curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose-win10.yml -o c:\opt\clearml\docker-compose-win10.yml
1. Start up **ClearML Server**. This automatically pulls the latest **ClearML Server** build.
docker-compose -f c:\opt\clearml\docker-compose-win10.yml pull
docker-compose -f c:\opt\clearml\docker-compose-win10.yml up -d
If issues arise during your upgrade, see the FAQ page, [How do I fix Docker upgrade errors?](../faq.md#common-docker-upgrade-errors).
---
title: FAQ
---
**General Information**
* [How do I know a new version came out?](#new-version-auto-update)
**Models**
* [How can I sort models by a certain metric?](#custom-columns)
* [Can I store more information on the models?](#store-more-model-info)
* [Can I store the model configuration file as well?](#store-model-configuration)
* [I am training multiple models at the same time, but I only see one of them. What happened?](#only-last-model-appears)
* [Can I log input and output models manually?](#manually-log-models)
**Experiments**
* [I noticed I keep getting the message "warning: uncommitted code". What does it mean?](#uncommitted-code-warning)
* [I do not use argparse for hyperparameters. Do you have a solution?](#dont-want-argparser)
* [I noticed that all of my experiments appear as "Training". Are there other options?](#other-experiment-types)
* [Sometimes I see experiments as running when in fact they are not. What's going on?](#experiment-running-but-stopped)
* [My code throws an exception, but my experiment status is not "Failed". What happened?](#exception-not-failed)
* [CERTIFICATE_VERIFY_FAILED - When I run my experiment, I get an SSL Connection error. Do you have a solution?](#ssl-connection-error)
* [How do I modify experiment names once they have been created?](#modify_exp_names)
* [Using Conda and the "typing" package, I get the error "AttributeError: type object 'Callable' has no attribute '_abc_registry'". How do I fix this?](#typing)
* [My ClearML Server disk space usage is too high. What can I do about this?](#delete_exp)
* [Can I change the random seed my experiment uses?](#random_see)
* [In the Web UI, I can't access files that my experiment stored. Why not?](#access_files)
* [I get the message "ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start". What does it mean?](#resource_monitoring)
* [Can I control what ClearML automatically logs?](#controlling_logging)
**Graphs and Logs**
* [The first log lines are missing from the experiment log tab. Where did they go?](#first-log-lines-missing)
* [Can I create a graph comparing hyperparameters vs model accuracy?](#compare-graph-parameters)
* [I want to add more graphs, not just with TensorBoard. Is this supported?](#more-graph-types)
* [How can I report more than one scatter 2D series on the same plot?](#multiple-scatter2D)
**GIT and Storage**
* [Is there something ClearML can do about uncommitted code running?](#help-uncommitted-code)
* [I read there is a feature for centralized model storage. How do I use it?](#centralized-model-storage)
* [When using PyCharm to remotely debug a machine, the Git repo is not detected. Do you have a solution?](#pycharm-remote-debug-detect-git)
**Remote Debugging (ClearML PyCharm Plugin)**
* [I am using your ClearML PyCharm Plugin for remote debugging. I get the message "clearml.Task - INFO - Repository and package analysis timed out (10.0 sec), giving up". What should I do?](#package_thread)
**Jupyter**
* [I am using Jupyter Notebook. Is this supported?](#jupyter-notebook)
**scikit-learn**
* [Can I use ClearML with scikit-learn?](#use-scikit-learn)
**ClearML Configuration**
* [How do I explicitly specify the ClearML configuration file to be used?](#change-config-path)
* [How can I override ClearML credentials from the OS environment?](#credentials-os-env)
* [How can I track OS environment variables with experiments?](#track-env-vars)
**ClearML Hosted Service**
* [I run my script, but my experiment is not in the ClearML Hosted Service Web UI. How do I fix this?](#hosted-service-no-config)
**ClearML Server Deployment**
* How do I deploy **ClearML Server** on:
* [Standalone Linux Ubuntu systems?](#Ubuntu)
* [macOS?](#Ubuntu)
* [Windows 10?](#docker_compose_win10)
* [AWS EC2 AMIs?](#aws_ec2_amis)
* [Google Cloud Platform?](#google_cloud_platform)
* [How do I restart ClearML Server?](#restart)
* [Can I deploy ClearML Server on Kubernetes clusters?](#kubernetes)
* [Can I create a Helm Chart for ClearML Server Kubernetes deployment?](#helm)
* [My Docker cannot load a local host directory on SELinux?](#selinux)
**ClearML Server Configuration**
* [How do I configure ClearML Server for sub-domains and load balancers?](#sub-domains)
* [Can I add web login authentication to ClearML Server?](#web-auth)
* [Can I modify the non-responsive task watchdog settings?](#watchdog)
**ClearML Server Troubleshooting**
* [I did a reinstall. Why can't I create credentials in the Web-App (UI)?](#clearml-server-reinstall-cookies)
* [How do I fix Docker upgrade errors?](#common-docker-upgrade-errors)
* [Why is web login authentication not working?](#port-conflict)
* [How do I bypass a proxy configuration to access my local ClearML Server?](#proxy-localhost)
* [The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?](#elastic_watermark)
* [Why is my ClearML Web-App (UI) not showing any data?](#web-ui-empty)
**ClearML Agent**
* [How can I execute ClearML Agent without installing packages each time?](#system_site_packages)
**ClearML API**
* [How can I use the ClearML API to fetch data?](#api)
## General Information
**How do I know a new version came out? <a className="tr_top_negative" id="new-version-auto-update"></a>**
Starting with **ClearML** v0.9.3, **ClearML** issues a new version release notification, which appears in the log and is
output to the console, when a Python experiment script is run.
For example, when a new **ClearML Python Package** version is available, the notification is:
```
CLEARML new package available: UPGRADE to vX.Y.Z is recommended!
```
When a new **ClearML Server** version is available, the notification is:
```
CLEARML-SERVER new version available: upgrade to vX.Y is recommended!
```
## Models
**How can I sort models by a certain metric?** <a id="custom-columns"></a>
**ClearML** associates models with the experiments that created them. To sort experiments by a metric, in the ClearML Web UI,
add a [custom column](webapp/webapp_exp_table.md#customizing-the-experiments-table) in the experiments table and sort by
that metric column.
<br/>
**Can I store more information on the models?** <a id="store-more-model-info"></a>
Yes! For example, you can use the [Task.set_model_label_enumeration](references/sdk/task.md#set_model_label_enumerationenumerationnone)
method to store label enumeration:
```python
Task.current_task().set_model_label_enumeration({"label": 0})
```
For more information about `Task` class methods, see the [Task Class](fundamentals/task.md) reference page.
<br/>
**Can I store the model configuration file as well?** <a id="store-model-configuration"></a>
Yes! Use the [Task.set_model_config](references/sdk/task.md#set_model_configconfig_textnone-config_dictnone)
method:
```python
Task.current_task().set_model_config("a very long text with the configuration file's content")
```
<br/>
**I am training multiple models at the same time, but I only see one of them. What happened?** <a id="only-last-model-appears"></a>
Currently, in the experiment info panel, **ClearML** shows only the last associated model. In the **ClearML Web UI**,
on the Projects page, the **Models** tab shows all models.
This will be improved in a future version.
<br/>
**Can I log input and output models manually?** <a id="manually-log-models"></a>
Yes! Use the [InputModel.import_model](references/sdk/model_inputmodel.md#inputmodelimport_model)
and [Task.connect](references/sdk/task.md#connect) methods to manually connect an input model. Use the
[OutputModel.update_weights](references/sdk/model_outputmodel.md#update_weights)
method to manually connect a model weights file.
```python
input_model = InputModel.import_model(link_to_initial_model_file)
Task.current_task().connect(input_model)

OutputModel(Task.current_task()).update_weights(link_to_new_model_file_here)
```
For more information about models, see [InputModel](references/sdk/model_inputmodel.md)
and [OutputModel](references/sdk/model_outputmodel.md) classes.
## Experiments
**I noticed I keep getting the message "warning: uncommitted code". What does it mean?** <a id="uncommitted-code-warning"></a>
This message is only a warning. **ClearML** not only detects your current repository and git commit, but also warns you
if you are using uncommitted code. **ClearML** does this because uncommitted code means this experiment will be difficult
to reproduce. You can see uncommitted changes in the **ClearML Web UI**, in the **EXECUTION** tab of the experiment info panel.
**I do not use argparse for hyperparameters. Do you have a solution?** <a id="dont-want-argparser"></a>
Yes! **ClearML** supports connecting hyperparameter dictionaries to experiments, using the [Task.connect](fundamentals/hyperparameters#connecting-objects) method.
For example, to log the hyperparameters `learning_rate`, `batch_size`, `display_step`,
`model_path`, `n_hidden_1`, and `n_hidden_2`:
```python
# Create a dictionary of parameters
parameters_dict = {'learning_rate': 0.001, 'batch_size': 100, 'display_step': 1,
                   'model_path': "/tmp/model.ckpt", 'n_hidden_1': 256, 'n_hidden_2': 256}

# Connect the dictionary to your ClearML Task
parameters_dict = Task.current_task().connect(parameters_dict)
```
<br/>
**I noticed that all of my experiments appear as "Training". Are there other options?** <a id="other-experiment-types"></a>
Yes! When creating experiments and calling [Task.init](fundamentals/task.md#usage),
you can provide an experiment type. **ClearML** supports [multiple experiment types](fundamentals/task.md#task-types). For example:
```python
task = Task.init(project_name, task_name, Task.TaskTypes.testing)
```
<br/>
**Sometimes I see experiments as running when in fact they are not. What's going on?** <a id="experiment-running-but-stopped"></a>
**ClearML** monitors your Python process. When the process exits properly, **ClearML** closes the experiment. When the process crashes and terminates abnormally, it sometimes misses the stop signal. In this case, you can safely right-click the experiment in the Web-App and abort it.
<br/>
**My code throws an exception, but my experiment status is not "Failed". What happened?** <a id="exception-not-failed"></a>
This issue was resolved in **Trains** v0.9.2. Upgrade to **ClearML** by executing the following command:
```shell
pip install -U clearml
```
<a id="ssl-connection-error"></a>
<br/>
**When I run my experiment, I get an SSL Connection error CERTIFICATE_VERIFY_FAILED. Do you have a solution?**
Your firewall may be preventing the connection. Try one of the following solutions:
* Direct python "requests" to use the enterprise certificate file by setting the OS environment variables CURL_CA_BUNDLE or REQUESTS_CA_BUNDLE. For a detailed discussion of this topic, see [https://stackoverflow.com/questions/48391750/disable-python-requests-ssl-validation-for-an-imported-module](https://stackoverflow.com/questions/48391750/disable-python-requests-ssl-validation-for-an-imported-module).
* Disable certificate verification
:::warning
For security reasons, it is not recommended to disable certificate verification
:::
1. Upgrade **ClearML** to the current version:
   ```shell
   pip install -U clearml
   ```
1. Create a new `clearml.conf` configuration file (see a [sample configuration file](https://github.com/allegroai/clearml/blob/master/docs/clearml.conf)), containing:
   ```
   api { verify_certificate = False }
   ```
1. Copy the new `clearml.conf` file to:
* Linux - `~/clearml.conf`
* Mac - `$HOME/clearml.conf`
* Windows - `\Users\<username>\clearml.conf`
<a id="modify_exp_names"></a>
<br/>
**How do I modify experiment names once they have been created?**
An experiment's name is a user-controlled property, which can be accessed via the `Task.name` variable. This allows you to use meaningful naming schemes for easy filtering and comparison of experiments.
For example, to distinguish between different experiments, you can append the task ID to the task name:
```python
task = Task.init('examples', 'train')
task.name += ' {}'.format(task.id)
```
Or, append the Task ID post-execution:
```python
tasks = Task.get_tasks(project_name='examples', task_name='train')
for t in tasks:
    t.name += ' {}'.format(t.id)
```
Another example is to append a specific hyperparameter and its value to each task's name:
```python
tasks = Task.get_tasks(project_name='examples', task_name='my_automl_experiment')
for t in tasks:
    params = t.get_parameters()
    if 'my_secret_parameter' in params:
        t.name += ' my_secret_parameter={}'.format(params['my_secret_parameter'])
```
This naming scheme is useful when creating automation pipelines that rely on a naming convention.
<a id="typing"></a>
<br/>
**Using Conda and the "typing" package, I get the error "AttributeError: type object 'Callable' has no attribute '_abc_registry'". How do I fix this?**
Conda and the [typing](https://pypi.org/project/typing/) package may have some compatibility issues.
However, [since Python 3.5](https://docs.python.org/3.5/library/typing.html), the `typing` package is part of the standard library.
To resolve the error, uninstall `typing` and rerun your script. If this does not fix the issue, create a [new ClearML issue](https://github.com/allegroai/clearml/issues/new), including the full error and your environment details.
<a id="delete_exp"></a>
<br/>
**My ClearML Server disk space usage is too high. What can I do about this?**
We designed the **ClearML** open source suite, including **ClearML Server**, to ensure experiment traceability. For this reason, the **ClearML Web UI** does not include a feature to delete experiments. The **ClearML Web UI** does allow you to archive experiments so that they appear only in the Archive area.
In rare instances, however, such as high disk usage for a privately-hosted **ClearML Server** because Elasticsearch is indexing unwanted experiments, you may choose to delete an experiment.
You can use the `APIClient` provided by **ClearML Agent** and
`client.tasks.delete()` to delete an experiment.
:::warning
You cannot undo the deletion of an experiment.
:::
For example, the following script deletes an experiment whose Task ID is `123456789`.
```python
from clearml_agent import APIClient

client = APIClient()
client.tasks.delete(task='123456789')
```
<a id="random_see"></a>
<br/>
**Can I change the random seed my experiment uses?**
Yes! By default, **ClearML** initializes Tasks with a default seed. You can change that seed by calling the [make_deterministic](https://github.com/allegroai/clearml/blob/2f5b519cd8c4df9d3db397604f5b8097c23ccc40/trains/utilities/seed.py) method.
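For example, a minimal sketch of seeding a task; the import path and signature below follow the linked source file and may differ between versions, so treat them as assumptions:

```python
from clearml import Task
# Assumed import path, based on the linked source file; in older
# (trains) versions the module lives under trains.utilities.seed
from clearml.utilities.seed import make_deterministic

task = Task.init(project_name='examples', task_name='seeded experiment')
# Seed Python's random, NumPy, and (if installed) PyTorch in one call
make_deterministic(seed=42)
```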
<a id="access_files"></a>
<br/>
**In the Web UI, I can't access files that my experiment stored. Why not?**
**ClearML** stores file locations. The machine running your browser must have access to the location where the machine
that ran the Task stored the file. This applies to debug samples and artifacts. If, for example, the machine running the browser does not have access, you may see "Unable to load image", instead of the image.
<a id="resource_monitoring"></a>
<br/>
**I get the message "ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start". What does it mean?**
If metric reporting begins within the first three minutes, **ClearML** reports resource monitoring by iteration. Otherwise,
it reports resource monitoring by seconds from start, and logs a message:
```
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
```
However, if metric reporting begins after three minutes and up to thirty minutes, resource monitoring reverts to
reporting by iteration, and **ClearML** logs a message:

```
ClearML Monitor: Reporting detected, reverting back to iteration based reporting
```

After thirty minutes, the reporting mode remains unchanged.
<br/>
**Can I control what ClearML automatically logs?** <a id="controlling_logging"></a>
Yes! **ClearML** allows you to control automatic logging for `stdout`, `stderr`, and frameworks.
When initializing a Task by calling the `Task.init` method, provide the `auto_connect_frameworks` parameter to control
framework logging, and the `auto_connect_streams` parameter to control `stdout`, `stderr`, and standard logging. The
values are `True`, `False`, and a dictionary for fine-grain control. See [Task.init](references/sdk/task.md#classmethod-initproject_namenone-task_namenone-task_typetasktypestraining-training-tagsnone-reuse_last_task_idtrue-continue_last_taskfalse-output_urinone-auto_connect_arg_parsertrue-auto_connect_frameworkstrue-auto_resource_monitoringtrue-auto_connect_streamstrue).
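For example, a sketch of fine-grained control; the dictionary key shown is illustrative, so check the `Task.init` reference for the supported framework names:

```python
from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='selective logging',
    # Keep framework auto-logging, but skip Matplotlib figures
    auto_connect_frameworks={'matplotlib': False},
    # Disable stdout/stderr/logging capture entirely
    auto_connect_streams=False,
)
```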
## Graphs and Logs
**The first log lines are missing from the experiment log tab. Where did they go?** <a id="first-log-lines-missing"></a>
Due to speed/optimization issues, we opted to display only the last several hundred log lines.
You can always download the full log as a file using the **ClearML Web UI**. In the **ClearML Web UI** **>** experiment
info panel **>** **RESULTS** tab **>** **LOG** sub-tab, use the **Download full log** feature.
<br/>
**Can I create a graph comparing hyperparameters vs. model accuracy?** <a id="compare-graph-parameters"></a>
Yes! You can manually create a plot with a single point X-axis for the hyperparameter value, and Y-axis for the accuracy.
For example:
```python
number_layers = 10
accuracy = 0.95
Task.current_task().get_logger().report_scatter2d(
    "performance", "accuracy", iteration=0,
    mode='markers', scatter=[(number_layers, accuracy)])
```
This assumes the hyperparameter is `number_layers`, with a current value of `10`, and the trained model's `accuracy` is `0.95`. The experiment comparison graph shows:
![image](img/clearml_faq_screenshots/compare_plots.png)
Another option is a histogram chart:
```python
number_layers = 10
accuracy = 0.95
Task.current_task().get_logger().report_vector(
    "performance", "accuracy", iteration=0, labels=['accuracy'],
    values=[accuracy], xlabels=['number_layers %d' % number_layers])
```
![image](img/clearml_faq_screenshots/compare_plots_hist.png)
<br/>
**I want to add more graphs, not just with TensorBoard. Is this supported?** <a id="more-graph-types"></a>
Yes! The [Logger](fundamentals/logger.md) module includes methods for explicit reporting. For examples of explicit reporting, see the [Explicit Reporting](guides/reporting/explicit_reporting.md)
tutorial, which includes a list of methods for explicit reporting.
<br/>
**How can I report more than one scatter 2D series on the same plot?** <a id="multiple-scatter2D"></a>
The [`Logger.report_scatter2d()`](references/sdk/logger.md#report_scatter2dtitle-series-scatter-iteration-xaxisnone-yaxisnone-labelsnone-modelines-commentnone-extra_layoutnone)
method reports all series with the same `title` and `iteration` parameter values on the same plot.
For example, the following two scatter2D series are reported on the same plot, because both have a `title` of `example_scatter` and an `iteration` of `1`:
```python
import numpy as np

scatter2d_1 = np.hstack((np.atleast_2d(np.arange(0, 10)).T, np.random.randint(10, size=(10, 1))))
logger.report_scatter2d("example_scatter", "series_1", iteration=1, scatter=scatter2d_1,
                        xaxis="title x", yaxis="title y")

scatter2d_2 = np.hstack((np.atleast_2d(np.arange(0, 10)).T, np.random.randint(10, size=(10, 1))))
logger.report_scatter2d("example_scatter", "series_2", iteration=1, scatter=scatter2d_2,
                        xaxis="title x", yaxis="title y")
```
## GIT and Storage
**Is there something ClearML can do about uncommitted code running?** <a id="help-uncommitted-code"></a>
Yes! **ClearML** stores the git diff as part of the experiment's information. You can view the git diff in the **ClearML Web UI**,
experiment info panel **>** **EXECUTION** tab.
<br/>
**I read there is a feature for centralized model storage. How do I use it?** <a id="centralized-model-storage"></a>
When calling [Task.init](references/sdk/task.md#classmethod-initproject_namenone-task_namenone-task_typetasktypestraining-training-tagsnone-reuse_last_task_idtrue-continue_last_taskfalse-output_urinone-auto_connect_arg_parsertrue-auto_connect_frameworkstrue-auto_resource_monitoringtrue-auto_connect_streamstrue),
providing the `output_uri` parameter allows you to specify the location in which model checkpoints (snapshots) will be stored.
For example, to store model checkpoints (snapshots) in `/mnt/shared/folder`:
```python
task = Task.init(project_name, task_name, output_uri="/mnt/shared/folder")
```
**ClearML** will copy all stored snapshots into a subfolder under `/mnt/shared/folder`. The subfolder's name will contain
the experiment's ID. If the experiment's ID is `6ea4f0b56d994320a713aeaf13a86d9d`, the following folder will be used:
`/mnt/shared/folder/task.6ea4f0b56d994320a713aeaf13a86d9d/models/`
**ClearML** supports other storage types for `output_uri`, including:
```python
# AWS S3 bucket
task = Task.init(project_name, task_name, output_uri="s3://bucket-name/folder")

# Google Cloud Storage bucket
task = Task.init(project_name, task_name, output_uri="gs://bucket-name/folder")
```
To use Cloud storage with **ClearML**, configure the storage credentials in your `~/clearml.conf`. For detailed information,
see [ClearML Configuration Reference](configs/clearml_conf.md).
<a id="pycharm-remote-debug-detect-git"></a>
<br/>
**When using PyCharm to remotely debug a machine, the Git repo is not detected. Do you have a solution?**
Yes! Since this is such a common occurrence, we created a PyCharm plugin that allows a remote debugger to grab your local
repository / commit ID. For detailed information about using the plugin, see the [ClearML PyCharm Plugin](guides/ide/integration_pycharm.md).
## Jupyter
**I am using Jupyter Notebook. Is this supported?** <a id="jupyter-notebook"></a>
Yes! You can run **ClearML** in Jupyter Notebooks using either of the following:
* Option 1: Install **ClearML** on your Jupyter Notebook host machine
* Option 2: Install **ClearML** in your Jupyter Notebook and connect using **ClearML** credentials
**Option 1: Install ClearML on your Jupyter host machine**
1. Connect to your Jupyter host machine.
1. Install the **ClearML Python Package**.
   ```shell
   pip install clearml
   ```
1. Run the **ClearML** initialize wizard.
   ```shell
   clearml-init
   ```
1. In your Jupyter Notebook, you can now use **ClearML**.
**Option 2: Install ClearML in your Jupyter Notebook**
1. In the **ClearML Web UI**, Profile page, create credentials and copy your access key and secret key. You will need them in step 3.
1. Install the **ClearML Python Package**.
   ```shell
   pip install clearml
   ```
1. Use the [Task.set_credentials](references/sdk/task.md#classmethod-set_credentialsapi_hostnone-web_hostnone-files_hostnone-keynone-secretnone-store_conf_filefalse)
method to specify the host, port, access key and secret key (see step 1).
   ```python
   # Set your credentials using the ClearML API server URI and port, access_key, and secret_key
   Task.set_credentials(host='http://localhost:8008', key='<access_key>', secret='<secret_key>')
   ```
:::note
`host` is the API server (default port `8008`), not the web server (default port `8080`).
:::
1. You can now use **ClearML**.
   ```python
   # Create a task and start training
   task = Task.init('jupyter project', 'my notebook')
   ```
<a id="commit-git-in-jupyter"></a>
<br/>
## Remote Debugging (ClearML PyCharm Plugin)
**I am using your ClearML PyCharm Plugin for remote debugging. I get the message "clearml.Task - INFO - Repository and
package analysis timed out (10.0 sec), giving up". What should I do?**<a id="package_thread"></a>
**ClearML** uses a background thread to analyze the script. This includes package requirements. At the end of the execution
of the script, if the background thread is still running, **ClearML** allows the thread another 10 seconds to complete.
If the thread does not complete, it times out.
This can occur for scripts that do not import any packages, for example short test scripts.
To fix this issue, you could import the `time` package and add a `time.sleep(20)` statement to the end of your script.
## scikit-learn
**Can I use ClearML with scikit-learn?** <a id="use-scikit-learn"></a>
Yes! `scikit-learn` is supported. Everything you do is logged. **ClearML** automatically logs models which are stored using `joblib`.
See the scikit-learn examples with [Matplotlib](guides/frameworks/scikit-learn/sklearn_matplotlib_example.md) and [Joblib](guides/frameworks/scikit-learn/sklearn_joblib_example.md).
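As a sketch (assuming scikit-learn and ClearML are installed), saving a fitted model with `joblib` after calling `Task.init` is enough for **ClearML** to register it as an output model:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from clearml import Task

task = Task.init(project_name='examples', task_name='sklearn joblib example')

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
# The joblib.dump call is intercepted and logged as an output model
joblib.dump(model, 'model.pkl')
```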
## ClearML Configuration
**How do I explicitly specify the ClearML configuration file to be used?** <a id="change-config-path"></a>
To override the default configuration file location, set the `CLEARML_CONFIG_FILE` OS environment variable.
For example:
```shell
export CLEARML_CONFIG_FILE="/home/user/myclearml.conf"
```
<br/>
**How can I override ClearML credentials from the OS environment?** <a id="credentials-os-env"></a>
To override your configuration file / defaults, set the following OS environment variables:
```shell
export CLEARML_API_ACCESS_KEY="key_here"
export CLEARML_API_SECRET_KEY="secret_here"
export CLEARML_API_HOST="http://localhost:8008"
```
<br/>
**How can I track OS environment variables with experiments?** <a id="track-env-vars"></a>
Set the OS environment variable `CLEARML_LOG_ENVIRONMENT` with the variables you need to track, either:
* All environment variables:
  ```shell
  export CLEARML_LOG_ENVIRONMENT="*"
  ```
* Specific environment variables, for example, log `PWD` and `PYTHONPATH`:
  ```shell
  export CLEARML_LOG_ENVIRONMENT="PWD,PYTHONPATH"
  ```
* No environment variables:
  ```shell
  export CLEARML_LOG_ENVIRONMENT=
  ```
## ClearML Hosted Service
**I run my script, but my experiment is not in the ClearML Hosted Service Web UI. How do I fix this?** <a id="hosted-service-no-config"></a>
If you joined the **ClearML Hosted Service** and run a script, but your experiment does not appear in Web UI, you may not have configured **ClearML** for the hosted service. Run the **ClearML** setup wizard. It will request your hosted service **ClearML** credentials and create the **ClearML** configuration you need.
```shell
pip install clearml
clearml-init
```
## ClearML Server Deployment
**How do I deploy ClearML Server on stand-alone Linux Ubuntu or macOS systems?** <a id="Ubuntu"></a>
For detailed instructions, see [Deploying ClearML Server: Linux or macOS](deploying_clearml/clearml_server_linux_mac.md)
in the "Deploying ClearML" section.
<br/>
**How do I deploy ClearML Server on Windows 10?** <a id="docker_compose_win10"></a>
For detailed instructions, see [Deploying ClearML Server: Windows 10](deploying_clearml/clearml_server_win.md) in the
"Deploying ClearML" section.
<br/>
**How do I deploy ClearML Server on AWS EC2 AMIs?** <a id="aws_ec2_amis"></a>
For detailed instructions, see [Deploying ClearML Server: AWS EC2 AMIs](deploying_clearml/clearml_server_aws_ec2_ami.md)
in the "Deploying ClearML" section.
<br/>
**How do I deploy ClearML Server on the Google Cloud Platform?** <a id="google_cloud_platform"></a>
For detailed instructions, see [Deploying ClearML Server: Google Cloud Platform](deploying_clearml/clearml_server_gcp.md)
in the "Deploying ClearML" section.
<br/>
**How do I restart ClearML Server?** <a id="restart"></a>
For detailed instructions, see the "Restarting" section of the documentation page for your deployment format. For example,
if you deployed to Linux, see [Restarting](deploying_clearml/clearml_server_linux_mac.md#restarting) on the "Deploying ClearML Server: Linux or macOS" page.
<br/>
**Can I deploy ClearML Server on Kubernetes clusters?** <a id="kubernetes"></a>
Yes! ClearML Server supports Kubernetes. For detailed instructions, see [Deploying ClearML Server: Kubernetes](deploying_clearml/clearml_server_kubernetes.md)
in the "Deploying ClearML" section.
<br/>
**Can I create a Helm Chart for ClearML Server Kubernetes deployment?** <a id="helm"></a>
Yes! You can create a Helm Chart of **ClearML Server** Kubernetes deployment. For detailed instructions,
see [Deploying ClearML Server: Kubernetes using Helm](deploying_clearml/clearml_server_kubernetes_helm.md) in the "Deploying ClearML" section.
<br/>
**My Docker cannot load a local host directory on SELinux?** <a id="selinux"></a>
If you are using SELinux, run the following command (see this [discussion](https://stackoverflow.com/a/24334000)):
```shell
chcon -Rt svirt_sandbox_file_t /opt/clearml
```
## ClearML Server Configuration
**How do I configure ClearML Server for sub-domains and load balancers?** <a id="sub-domains"></a>
For detailed instructions, see [Configuring Sub-domains and load balancers](deploying_clearml/clearml_server_config.md#sub-domains-and-load-balancers)
on the "Configuring Your Own ClearML Server" page.
<br/>
**Can I add web login authentication to ClearML Server?** <a id="web-auth"></a>
By default, anyone can log in to the **ClearML Server** Web-App. You can configure the **ClearML Server** to allow only a specific set of users to access the system.
For detailed instructions, see [Web Login Authentication](deploying_clearml/clearml_server_config.md#web-login-authentication)
on the "Configuring Your Own ClearML Server" page in the "Deploying ClearML" section.
<br/>
**Can I modify the non-responsive task watchdog settings?** <a id="watchdog"></a>
The non-responsive experiment watchdog monitors experiments that were not updated for a specified time interval, and
marks them as `aborted`. The watchdog is always active.
You can modify the following settings for the watchdog:
* The time threshold (in seconds) of task inactivity (default: 7200 seconds, which is 2 hours).
* The time interval (in seconds) between watchdog cycles.
For detailed instructions, see [Modifying non-responsive Task watchdog settings](deploying_clearml/clearml_server_config.md#non-responsive-task-watchdog) on the "Configuring Your Own ClearML Server" page.
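As an illustrative sketch only, the watchdog is configured on the server side; the key names and file location below are assumptions, so verify them against the linked "Configuring Your Own ClearML Server" page:

```
# Illustrative API server configuration fragment; verify key names
# against the linked configuration page
tasks {
    non_responsive_tasks_watchdog {
        # Mark tasks aborted after this much inactivity (seconds)
        threshold_sec: 7200
        # Time between watchdog cycles (seconds)
        watch_interval_sec: 900
    }
}
```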
## ClearML Server Troubleshooting
**I did a reinstall. Why can't I create credentials in the Web-App (UI)?** <a id="clearml-server-reinstall-cookies"></a>
The issue is likely your browser cookies for **ClearML Server**. We recommend clearing your browser cookies for **ClearML Server**.
For example:
* For Firefox - go to Developer Tools > Storage > Cookies > delete all cookies under the **ClearML Server** URL.
* For Chrome - Developer Tools > Application > Cookies > delete all cookies under the **ClearML Server** URL.
<br/>
**How do I fix Docker upgrade errors?** <a id="common-docker-upgrade-errors"></a>
To resolve the Docker error:

```
... The container name "/trains-???" is already in use by ...
```

try removing the old containers (note that this removes **all** containers on the machine):

```shell
docker rm -f $(docker ps -a -q)
```
<br/>
**Why is web login authentication not working?** <a className="tr_top_negative" id="port-conflict"></a>
A port conflict between the **ClearML Server** MongoDB and / or Elastic instances, and other instances running on your system may prevent web login authentication from working correctly.
**ClearML Server** uses the following default ports which may be in conflict with other instances:
* MongoDB port `27017`
* Elastic port `9200`
You can check for port conflicts in the logs in `/opt/clearml/log`.
If a port conflict occurs, change the MongoDB and / or Elastic ports in the `docker-compose.yml`, and then run the Docker compose commands to restart the **ClearML Server** instance.
To change the MongoDB and / or Elastic ports for your **ClearML Server**, do the following:
1. Edit the `docker-compose.yml` file.
1. In the `services/trainsserver/environment` section, add the following environment variable(s):

   * For MongoDB:

     ```
     MONGODB_SERVICE_PORT: <new-mongodb-port>
     ```

   * For Elastic:

     ```
     ELASTIC_SERVICE_PORT: <new-elasticsearch-port>
     ```

   For example:

   ```
   MONGODB_SERVICE_PORT: 27018
   ELASTIC_SERVICE_PORT: 9201
   ```
1. For MongoDB, in the `services/mongo/ports` section, expose the new MongoDB port:

   ```
   <new-mongodb-port>:27017
   ```

   For example:

   ```
   27018:27017
   ```
1. For Elastic, in the `services/elasticsearch/ports` section, expose the new Elastic port:

   ```
   <new-elasticsearch-port>:9200
   ```

   For example:

   ```
   9201:9200
   ```
1. Restart **ClearML Server**, see [Restarting ClearML Server](#restart).
<br/>
**How do I bypass a proxy configuration to access my local ClearML Server?** <a className="tr_top_negative" id="proxy-localhost"></a>
A proxy server may block access to **ClearML Server** configured for `localhost`.
To fix this, you may allow bypassing of your proxy server to `localhost` using a system environment variable, and configure **ClearML** for **ClearML Server** using it.
Do the following:
1. Allow bypassing of your proxy server to `localhost` using a system environment variable, for example:

   ```
   NO_PROXY=localhost
   ```
1. If a **ClearML** configuration file (`clearml.conf`) exists, delete it.
1. Open a terminal session.
1. In the terminal session, set the system environment variable to `127.0.0.1`, for example:

   * Linux:

     ```shell
     no_proxy=127.0.0.1
     NO_PROXY=127.0.0.1
     ```

   * Windows:

     ```shell
     set no_proxy=127.0.0.1
     set NO_PROXY=127.0.0.1
     ```
1. Run the **ClearML** wizard `clearml-init` to configure **ClearML** for **ClearML Server**. It will prompt you to open the **ClearML Web UI** at [http://127.0.0.1:8080/](http://127.0.0.1:8080/) and create new **ClearML** credentials.
The wizard completes with:
```
Verifying credentials ...
Credentials verified!
New configuration stored in /home/<username>/clearml.conf
ClearML setup completed successfully.
```
<a className="tr_top_negative" id="elastic_watermark"></a>
<br/>
**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?**
The **ClearML Server** will return HTTP error responses (5XX, or 4XX) when some of its [backend components](deploying_clearml/clearml_server.md)
are failing.
A common cause of such a failure is low available disk space: the Elasticsearch service used by your server goes
into read-only mode when it hits the Elasticsearch flood watermark (by default, 95% of disk space used).
This can be readily fixed by making more disk space available to the Elasticsearch service (either freeing up disk
space, or, if using dynamic cloud storage, increasing the disk size).
:::note
A likely indication of this situation is the message *"\[FORBIDDEN/12/index read-only / allow delete (api)]"*
appearing in your ClearML logs.
:::
<br/>
**Why is my ClearML Web-App (UI) not showing any data?** <a className="tr_top_negative" id="web-ui-empty"></a>
If your **ClearML Web-App (UI)** does not show anything, it may be an error authenticating with the server. Try clearing the application cookies for the site in your browser's developer tools.
## ClearML Agent
**How can I execute ClearML Agent without installing packages each time?** <a className="tr_top_negative" id="system_site_packages"></a>
Instead of installing the Python packages in the virtual environment created by **ClearML Agent**, you can optimize execution
time by inheriting the packages from your global site-packages directory. In the **ClearML** configuration file, set the
configuration option `agent.package_manager.system_site_packages` to `true`.
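For example, in `~/clearml.conf` on the agent machine:

```
agent {
    package_manager {
        # Inherit packages from the system site-packages instead of
        # reinstalling them in each new virtual environment
        system_site_packages: true
    }
}
```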
## ClearML API
**How can I use the ClearML API to fetch data?** <a className="tr_top_negative" id="api"></a>
To fetch data using the **ClearML API**, create an authenticated session and send requests for data using the **ClearML API** services and methods. The responses to the requests contain your data.
For example, to get the metrics for an experiment and to print metrics as a histogram:
1. Start an authenticated session.
1. Send a request for all projects named `examples` using the `projects` service `GetAllRequest` method.
1. From the response, get the IDs of all those projects named `examples`.
1. Send a request for all experiments (tasks) with those project IDs using the `tasks` service `GetAllRequest` method.
1. From the response, get the data for the experiment (task) ID `11` and print the experiment name.
1. Send a request for a metrics histogram for experiment (task) ID `11` using the `events` service `ScalarMetricsIterHistogramRequest` method and print the histogram.
```python
# Import Session from the trains backend_api
from trains.backend_api import Session
# Import the services for tasks, events, and projects
from trains.backend_api.services import tasks, events, projects

# Create an authenticated session
session = Session()

# Get projects matching the project name 'examples'
res = session.send(projects.GetAllRequest(name='examples'))
# Get all the project IDs matching the project name 'examples'
projects_id = [p.id for p in res.response.projects]
print('project ids: {}'.format(projects_id))

# Get all the experiments/tasks
res = session.send(tasks.GetAllRequest(project=projects_id))

# Do your work
# For example, get the experiment whose ID is '11'
task = res.response.tasks[11]
print('task name: {}'.format(task.name))

# For example, for experiment ID '11', get the experiment metric values
res = session.send(events.ScalarMetricsIterHistogramRequest(
    task=task.id,
))
scalars = res.response_data
print('scalars {}'.format(scalars))
```

---
title: Agent & Queue
---
Two major components of MLOps are experiment reproducibility and the ability to scale work to multiple machines. ClearML Agent,
coupled with execution queues, addresses both of these needs.
The Agent is the base for **Automation** in ClearML and can be leveraged to build automated pipelines, services (such as alerts) and more.
## What does a ClearML Agent do?
An agent (also referred to as a Worker) allows users to execute code on any machine it's installed on, scaling data science work beyond one's own machine.
ClearML Agent not only clones the code, applies uncommitted changes, and tracks experiment metrics and the machine's status, but it also recreates the entire execution environment, whether by pulling a docker container or installing the specified packages.
Once the environment is set up and the code is cloned, the script is executed by the Agent, which reports metrics and monitors the machine it runs on.
The Agent also allows code parameters to be modified on the fly without code changes. This is the basis for [Hyper Parameter Optimization](https://github.com/allegroai/clearml/tree/master/examples/optimization/hyper-parameter-optimization).
An agent can be associated with specific GPUs, so a machine with 8 GPUs can execute code only on a few GPUs or all the GPUs together.
## What is a Queue?
A queue is a list of Task IDs to be executed. You can configure a specific agent or agents to listen to a certain queue,
and to execute all Tasks pushed to that queue one after the other.
The Agent can also listen to multiple queues, according to one of the following options:
* The Agent pulls first from the high-priority queue, then from the low-priority queue.
* The Agent pulls in a round-robin fashion (i.e. each queue has the same priority).
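The two pulling policies can be sketched in a few lines (a toy illustration, not ClearML Agent's actual implementation):

```python
from collections import deque

def pull_priority(queues):
    """Strict priority: always take from the first non-empty queue."""
    for queue in queues:
        if queue:
            return queue.popleft()
    return None

def pull_round_robin(queues, start):
    """Round-robin: try each queue once, starting where the last pull left off."""
    n = len(queues)
    for i in range(n):
        queue = queues[(start + i) % n]
        if queue:
            return queue.popleft(), (start + i + 1) % n
    return None, start

high, low = deque(['task1', 'task2']), deque(['task3'])
print(pull_priority([high, low]))  # task1 - the high-priority queue is drained first
```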
## Resource management
Installing an Agent on a machine allows it to monitor the machine's status (GPU \ CPU \ Memory \ Network \ Disk IO).
When managing multiple machines, this gives users an overview of their entire HW resources: the status of each machine, the expected workload
on each machine, and so on.
![image](../img/agents_queues_resource_management.png)
You can organize your queues according to resource usage. Say you have a single-GPU machine. You can create a queue called
"single-gpu-queue" and assign the machine's agent, as well as other single-GPU agents to that queue. This way you will know
that Tasks assigned to that queue will be executed by a single GPU machine.
While the agents are up and running in your machines, you can access these resources from any machine by enqueueing a
Task to one of your queues, according to the amount of resources you want to allocate to the Task.
With queues and ClearML Agent, you can easily add and remove machines from the cluster, and you can
reuse machines without the need for any dedicated containers or images.
## Additional features
Agents can be deployed bare-metal, with multiple instances allocating
specific GPUs to the agents. They can also be deployed as dockers in a Kubernetes cluster.
The Agent has three running modes:
- Docker mode: The agent spins up a docker image based on the Task's definition. Inside the docker, the agent clones
  the specified repository/code, applies the original execution's uncommitted changes, installs the required python packages,
  and starts executing the code while monitoring it.
- Virtual Environment Mode: The agent creates a new virtual environment for the experiment, installs the required python
packages based on the Task specification, clones the code repository, applies the uncommitted changes and finally
executes the code while monitoring it.
- Conda Environment Mode: Similar to the Virtual Environment mode, except that it uses a combination of conda install and
  pip instead of pip alone. Notice this mode is quite brittle due to Conda's package version support matrix.
## Services Agent & Queue
The ClearML Agent, in its default setup, spins a single Task per Agent. It's possible to run multiple agents on the same machine,
but each one will execute a single Task at a time.<br/>
This setup is suited for compute-heavy Tasks that might take some time to complete.
Some tasks, mainly control tasks (like a pipeline controller) or services (like an archive cleanup service), are mostly idle and only implement a thin control logic.<br/>
This is where *services mode* comes into play. An agent running in services mode spins multiple Tasks at the same time; each Task registers itself as a sub-agent (visible in the Workers tab in the UI).
Some examples of suitable tasks are:
- [Pipeline controller](https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_controller.py) - Implementing the pipeline scheduling and logic
- [Hyper-Parameter Optimization](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py) - Implementing an active selection of experiments
- [Control Service](https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py) - AWS Autoscaler for example
- [External services](https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py) - Such as Slack integration alert service
By default, [ClearML Server](../deploying_clearml/clearml_server.md) comes with an Agent running on the machine that runs it. It also comes with a Services queue.

---
title: Artifacts & Models
---
ClearML allows easy storage of experiments' output products as **artifacts** that can later be easily accessed
and used, through the web UI or programmatically.
A few examples of artifacts are:
* Model snapshot \ weights file
* Data preprocessing
* Feature representation of data
* and more!
## Artifacts
### Logging Artifacts
To log any type of artifact to a Task, use the `upload_artifact()` method. For example:
* Upload a local file containing the preprocessing results of the data.
```python
task.upload_artifact(name='data', artifact_object='/path/to/preprocess_data.csv')
```
* Upload an entire folder with all its content by passing the folder, which will be zipped and uploaded as a single
zip file:
```python
task.upload_artifact(name='folder', artifact_object='/path/to/folder/')
```
* Upload an instance of an object; Numpy/Pandas/PIL objects are converted to npz/csv.gz/jpg formats accordingly. If the
object type is unknown, it is pickled and uploaded.
```python
person_dict = {'name': 'Erik', 'age': 30}
task.upload_artifact(name='person dictionary', artifact_object=person_dict)
```
See more details in the artifacts [example](../guides/reporting/artifacts.md).
### Using Artifacts
To access a Task's artifact in order to use it:
1. Get the Task that created the artifact (see more details on [querying](task.md#querying--searching-tasks)
Tasks).
1. Retrieve all the Task's artifacts with the `artifacts` property, which is essentially a dictionary,
where the key is the artifact name, and the value is the artifact itself.
1. Access a specific artifact using one of the following methods:
   - Access files by calling `get_local_copy()`, which caches the files for later use and returns a path to the cached
   file.
   - Access object artifacts by using the `get()` method, which returns the Python object.
The code below demonstrates how to access a file artifact using the previously generated preprocessed data:
```python
# get instance of Task that created artifact, using Task ID
preprocess_task = Task.get_task(task_id='the_preprocessing_task_id')
# access artifact
local_csv = preprocess_task.artifacts['data'].get_local_copy()
```
See more details in the using artifacts [example](https://github.com/allegroai/clearml/blob/master/examples/reporting/using_artifacts_example.py).
### List of Supported Artifacts
- Numpy array (as npz file)
- Pandas dataframe
- PIL (converted to jpg)
- Files and folders
- Python objects (pickled)
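The type dispatch described above can be sketched as follows (a hypothetical helper, not ClearML's actual code; the NumPy/Pandas/PIL branches are omitted for brevity):

```python
import pickle
from pathlib import Path

def serialize_artifact(artifact_object):
    """Return a (format, payload) pair following the dispatch described above."""
    if isinstance(artifact_object, (str, Path)) and Path(artifact_object).exists():
        # files are uploaded as-is (a folder would be zipped first)
        return 'file', Path(artifact_object).read_bytes()
    # unknown object types fall back to pickle
    return 'pickle', pickle.dumps(artifact_object)

fmt, payload = serialize_artifact({'name': 'Erik', 'age': 30})
assert fmt == 'pickle' and pickle.loads(payload) == {'name': 'Erik', 'age': 30}
```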
## Models
Models are a special kind of artifact. Unlike regular artifacts, which are accessed with the creating Task's ID, Models
are entities with their own unique ID. This makes Models standalone entries that can be used as an artifactory interface.
### Logging Models (weights file)
When models are saved (for instance, by calling the `torch.save()` method), ClearML automatically logs the models and all
snapshot paths.
![image](../img/fundamentals_artifacts_logging_models.png)
See model storage examples, [TF](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py),
[PyTorch](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py),
[Keras](https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py),
[Scikit-Learn](https://github.com/allegroai/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_joblib_example.py).
### Using Models
Loading a previously trained model is quite similar to loading artifacts.
```python
prev_task = Task.get_task(task_id='the_training_task')
last_snapshot = prev_task.models['output'][-1]
local_weights_path = last_snapshot.get_local_copy()
```
1. Get the instance of the Task that created the original weights files.
2. Query the Task for its output models (a list of snapshots).
3. Get the latest snapshot (if using Tensorflow, the snapshots are stored in a folder, so `local_weights_path` will point to a folder containing the requested snapshot).
Notice that when one of the frameworks loads its weights file, the running Task is automatically updated, with
its "Input Model" pointing directly to the original training Task's model. This makes it easy to get the full genealogy
of every trained and used model in your system!
Models loaded by a framework appear under the "Input Models" section, under the Artifacts tab in the ClearML UI.
### Setting Upload Destination
ClearML automatically captures the storage path of Models created by frameworks such as TF, Pytorch, and scikit-learn. By default,
it stores the local path they are saved to.
To automatically store all models created by a specific experiment, modify the `Task.init()` call as follows:
```python
task = Task.init(project_name='examples', task_name='storing model', output_uri='s3://my_models/')
```
To automatically store all created models from all experiments in a specific storage medium, edit the `clearml.conf` (see
[ClearML Configuration Reference](../configs/clearml_conf#sdkdevelopment)) and set `sdk.development.default_output_uri` to the desired
storage (see [Storage](../integrations/storage.md)).
This is especially helpful when using [clearml-agent](../clearml_agent.md) to execute code.
### List of Supported Frameworks
- Tensorflow
- Keras
- Pytorch
- scikit-learn (only using joblib)
- XGBoost (only using joblib)

---
title: Hyperparameter optimization
---
## What is HyperParameter Optimization?
Hyperparameters are variables that directly control the behaviors of training algorithms, and have a significant effect on
the performance of the resulting machine learning models. Finding the hyperparameter values that yield the best
performing models can be complicated. Manually adjusting hyperparameters over the course of many training trials can be
slow and tedious. Luckily, **hyperparameter optimization** can be automated and boosted using **ClearML**'s
`HyperParameterOptimizer` class.
## What does ClearML's `HyperParameterOptimizer` do?
The `HyperParameterOptimizer` class does the following:
* Clones the base experiment that needs to be optimized
* Changes arguments based on an optimizer strategy that is specified
* Tries to minimize / maximize defined objectives.
The `HyperParameterOptimizer` class contains **ClearML**'s hyperparameter optimization modules. Its modular design enables
using different optimizers, including existing software frameworks, enabling simple, accurate, and fast hyperparameter
optimization.
**The optimizers include:**
* **Optuna** - `automation.optuna.optuna.OptimizerOptuna`. Optuna is the default optimizer in ClearML. It makes use of
different samplers such as grid search, random, bayesian, and evolutionary algorithms.
For more information, see the [Optuna](https://optuna.readthedocs.io/en/latest/)
documentation.
* **BOHB** - `automation.hpbandster.bandster.OptimizerBOHB`. BOHB performs robust and efficient hyperparameter optimization
at scale by combining the speed of Hyperband searches with the guidance and guarantees of convergence of Bayesian Optimization.
For more information about HpBandSter BOHB, see the [HpBandSter](https://automl.github.io/HpBandSter/build/html/index.html)
documentation.
* **Random** uniform sampling of hyperparameters - `automation.optimization.RandomSearch`.
* **Full grid** sampling strategy of every hyperparameter combination - `automation.optimization.GridSearch`.
* **Custom** - `automation.optimization.SearchStrategy` - use a custom class that inherits from the ClearML automation base strategy class.
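Conceptually, stepped ranges (such as the `UniformIntegerParameterRange` used later on this page) define discrete grids, and a full-grid strategy tries every combination. A minimal stand-in, assuming the ranges are inclusive of the maximum (not the actual ClearML classes):

```python
from itertools import product

def integer_range(min_value, max_value, step_size):
    """The discrete values spanned by a stepped integer range (inclusive)."""
    return list(range(min_value, max_value + 1, step_size))

epochs = integer_range(2, 12, 2)       # [2, 4, 6, 8, 10, 12]
batch_sizes = integer_range(2, 16, 2)  # [2, 4, 6, 8, 10, 12, 14, 16]

# a full grid search would try every (epochs, batch_size) combination
grid = list(product(epochs, batch_sizes))
print(len(grid))  # 6 * 8 = 48 combinations
```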
Make use of **ClearML**'s hyperparameter optimization capabilities by:
* Initializing an Optimizer Task, which will record and monitor arguments, execution details, results, and more.
* Instantiating a `HyperParameterOptimizer`, where the following is specified:
* Task to optimize
* Hyperparameters to optimize
* Metric to optimize
* Optimizer class (optimization strategy) where the optimization configuration and resources budget are defined
* And more.
* Enqueuing the Task to be executed by a `clearml-agent` or multiple agents on a remote machine.
* Monitoring the optimization process and viewing the summarized results in the **ClearML web UI**
**ClearML**'s approach to hyperparameter optimization is scalable, easy to set up and to manage, and it makes it easy to
compare results.
## Defining a hyperparameter optimization search example
1. Import ClearML's automation modules:
```python
from clearml.automation import UniformParameterRange, UniformIntegerParameterRange
from clearml.automation import HyperParameterOptimizer
from clearml.automation.optuna import OptimizerOptuna
```
1. Initialize the Task, which will be stored in ClearML Server when the code runs. After the code runs at least once,
it can be reproduced and tuned:
```python
from clearml import Task
task = Task.init(project_name='Hyper-Parameter Optimization',
task_name='Automatic Hyper-Parameter Optimization',
task_type=Task.TaskTypes.optimizer,
reuse_last_task_id=False)
```
1. Define the optimization configuration and resources budget:
```python
optimizer = HyperParameterOptimizer(
# specifying the Task to be optimized, Task must be in system already so it can be cloned
base_task_id=TEMPLATE_TASK_ID,
# setting the hyper-parameters to optimize
hyper_parameters=[
UniformIntegerParameterRange('number_of_epochs', min_value=2, max_value=12, step_size=2),
UniformIntegerParameterRange('batch_size', min_value=2, max_value=16, step_size=2),
UniformParameterRange('dropout', min_value=0, max_value=0.5, step_size=0.05),
UniformParameterRange('base_lr', min_value=0.00025, max_value=0.01, step_size=0.00025),
],
# setting the objective metric we want to maximize/minimize
objective_metric_title='accuracy',
objective_metric_series='total',
objective_metric_sign='max',
# setting optimizer
optimizer_class=OptimizerOptuna,
# configuring optimization parameters
execution_queue='default',
max_number_of_concurrent_tasks=2,
optimization_time_limit=60.,
compute_time_limit=120,
total_max_jobs=20,
min_iteration_per_job=15000,
max_iteration_per_job=150000,
)
```
For further information about the `HyperParameterOptimizer` arguments, see the [Automation module reference](../references/sdk/hpo_optimization_hyperparameteroptimizer.md).
1. Make sure one or more agents are listening to the queue defined above (`execution_queue='default'`). See [ClearML Agent](../clearml_agent.md).
1. Start the hyperparameter optimization process:
```python
optimizer.set_report_period(1) # setting the time gap between two consecutive reports
optimizer.start()
optimizer.wait() # wait until process is done
optimizer.stop() # make sure background optimization stopped
```
1. Take a look at the summarized results of the optimization in the **ClearML web UI**, on the optimizer Task's experiment page.
You can also look at the results of a specific experiment, or [compare](../webapp/webapp_exp_comparing.md) the results
of several experiments.

---
title: Hyperparameters
---
Hyperparameters are the configuration options given to a script.
ClearML logs hyperparameters used in experiments from multiple different sources.
In ClearML, parameters are split into 3 sections:
- User Properties - A modifiable section that can be edited post-execution.
- Hyperparameters - Individual parameters for configuration.
- Configuration Objects - Usually configuration files (Json \ YAML) or Python objects.
These sections are further broken down into sub-sections (General \ Args \ TF_Define) for convenience.
![image](../img/hyperparameters_sections.png)
## Argument Parser
Parameters passed to experiments using Python's built-in `argparse` module are automatically captured by ClearML, so no code
changes are needed.
```python
from clearml import Task
import argparse
parser = argparse.ArgumentParser(description="Script Argparser")
parser.add_argument("-lr", default=0.001, help="Initial learning rate")
parser.add_argument("-epochs", default= 10, help="Total number of epochs")
args = parser.parse_args()
task = Task.init(project_name="examples",task_name="argparser logging")
```
## Connecting Objects
Users can directly connect objects, such as dictionaries or even custom classes, to Tasks.
All class members will be automatically fetched and logged by ClearML.
* Connecting a class:
```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

me = Person('Erik', 5)
task = Task.init(project_name='examples', task_name='argparser')
task.connect(me)
```
* Connecting a dictionary:
```python
task = Task.init(project_name='examples', task_name='dictionary logging')
params_dictionary = {'epochs': 3, 'lr': 0.4}
task.connect(params_dictionary)
```
## User Properties
User properties are an editable key / value store, which enables adding information to an experiment,
making it easier to search / filter. User properties are like parameters that can also be added after a Task's execution; they
can also be displayed as customized columns in an [experiment table](../webapp/webapp_exp_table.md).
For example:
```python
task.set_user_properties({"name": "backbone",
"description": "network type",
"value": "great"})
```
The above example adds to the Task a user property named "backbone", with the description "network type", and
the value "great".
## Environment Variables
:::important
Relying on environment variables makes an experiment not fully reproducible, since ClearML Agent can't reproduce them at
runtime.
:::
Environment variables can be logged by modifying the [clearml.conf](../configs/clearml_conf) file. Set the `log_os_environments`
parameter to specify which variables to log:
`log_os_environments: ["AWS_*", "CUDA_VERSION"]`
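The wildcard patterns presumably behave like standard shell globs; a sketch of such matching using Python's `fnmatch` (illustrative, not ClearML's code):

```python
from fnmatch import fnmatch

def matching_env_vars(environ, patterns):
    """Select the environment variables whose names match any glob pattern."""
    return {name: value for name, value in environ.items()
            if any(fnmatch(name, pattern) for pattern in patterns)}

env = {'AWS_REGION': 'us-east-1', 'CUDA_VERSION': '11.1', 'HOME': '/root'}
print(matching_env_vars(env, ['AWS_*', 'CUDA_VERSION']))
# {'AWS_REGION': 'us-east-1', 'CUDA_VERSION': '11.1'}
```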
It's also possible to specify environment variables using the `CLEARML_LOG_ENVIRONMENT` environment variable.
:::note
The `CLEARML_LOG_ENVIRONMENT` variable always overrides the `clearml.conf` file.
:::
## TF Defines
ClearML automatically captures TF_Define parameters, which are used as configuration options for Tensorflow.
## Hydra
[Hydra](https://github.com/facebookresearch/hydra) is a module developed by Facebook AI Research to manage experiments'
parameters. Hydra offers the best of both worlds, managing configurations with files while making parameters overridable at runtime.
ClearML logs the OmegaConf object, which holds all the configuration files, as well as overridden values.
Check out the [example code](https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py),
which demonstrates the creation of a configuration object to which configuration values can be added and overridden using the
command line.
## Configuration Objects
Configuration objects are dictionaries or configuration files connected to the Task. Unlike Hyperparameters, these are saved as a whole and not
divided into individual parameters.
To connect a configuration dictionary:
```python
model_config_dict = {
'value': 13.37,
'dict': {'sub_value': 'string', 'sub_integer': 11},
'list_of_ints': [1, 2, 3, 4],
}
model_config_dict = task.connect_configuration(name='dictionary', configuration=model_config_dict)
```
To connect a configuration file:
```python
config_file_yaml = task.connect_configuration(name="yaml file", configuration='path/to/configuration/file.yaml', )
```
Configuration objects can be split into categories in the Configuration section.
The `name` argument is the name of the section that the object will go into. If a section name is not specified, the default section is *General*.
See [here](https://github.com/allegroai/clearml/blob/master/examples/reporting/model_config.py) for a detailed example.
## Manual Parameter Access
### Manual Parameter Input
In addition to connecting a dictionary or a class to log hyperparameters, users can also use the `set_parameters` method
to define parameters manually. Parameters are passed as a dictionary.
Additionally, parameters can be categorized, and each category appears in its own section in the hyperparameter tab of the web UI.
Specify a section by putting its name before the parameter, for example `'Args/epochs': 'value'` - 'epochs' will go into the
'Args' section. If a section isn't specified, the parameter will go into the *General* section by default.
Calling the `set_parameter` method will set a single parameter.
```python
task = Task.init(project_name='examples', task_name='parameters')
# override parameters with provided dictionary
task.set_parameters({'Args/epochs':7, 'lr': 0.5})
# setting a single parameter
task.set_parameter(name='decay',value=0.001)
```
:::warning
The `set_parameters` method will override any parameters already logged.
:::
### Adding Parameters
To update the parameters of an experiment, use the `set_parameters_as_dict` method. Arguments and values are passed as a dictionary.
Like in the `set_parameters` method, the dictionary can be nested, so the parameter's section can be specified.
```python
task = Task.get_task(task_id='123456789')
# add parameters
task.set_parameters_as_dict({'my_args/lr':0.3, 'epochs':10})
```
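The nested form maps onto the flat `'section/parameter'` naming; a sketch of that flattening (an illustration of the assumed convention, not ClearML's code):

```python
def flatten_params(params, prefix=''):
    """Flatten {'section': {'param': value}} into {'section/param': value}."""
    flat = {}
    for key, value in params.items():
        name = f'{prefix}/{key}' if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat

print(flatten_params({'my_args': {'lr': 0.3}, 'epochs': 10}))
# {'my_args/lr': 0.3, 'epochs': 10}
```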
### Accessing Parameters
To get all of a Task's parameters, use the `get_parameters()` method, which returns a dictionary of the parameters, including
their section.
```python
task = Task.get_task(project_name='examples', task_name='parameters')
# will print a flattened dictionary of the 'section/parameter': 'value' pairs. {'Args/epochs': '7', 'General/lr': '0.5'}
print(task.get_parameters())
```

---
title: Logger
---
The ClearML **Logger** object is used to report experiments' results such as metrics, graphs, and debug samples. It is a
member of the [Task](task.md) object.
ClearML integrates with the leading visualization libraries, and automatically captures reports to them.
## Types of Logged Results
In ClearML, there are four types of reports:
- Text - Mostly captured automatically from stdout and stderr but can be logged manually.
- Scalars - Time series data. X-axis is always a sequential number, usually iterations but can be epochs or others.
- Plots - General graphs and diagrams, such as histograms, confusion matrices, line plots, and custom plotly charts.
- Debug Samples - Images, audio, and videos. Can be reported per iteration.
![image](../img/fundamentals_logger_results.png)
## Automatic Reporting
ClearML automatically captures metrics reported to tools, such as Tensorboard and Matplotlib, with no additional code
necessary.
In addition, ClearML will capture and log everything written to standard output, from debug messages to errors to
library warning messages.
GPU, CPU, Memory and Network information is also automatically captured.
![image](../img/fundamentals_logger_cpu_monitoring.png)
### Supported packages
- [Tensorboard](https://www.tensorflow.org/tensorboard)
- [TensorboardX](https://github.com/lanpa/tensorboardX)
- [matplotlib](https://matplotlib.org/)
## Manual Reporting
ClearML also supports manually reporting multiple types of metrics and plots, such as line plots, histograms, and even plotly
charts.
The object used for reporting metrics is the **Logger**, and it is obtained by calling:
```python
logger = task.get_logger()
```
Check out all the available object types that can be reported in the example [here](../guides/reporting/scalar_reporting.md).
### Media reporting
ClearML also supports reporting media (such as audio, video and images) for every iteration.
This section is mostly used for debugging. It's recommended to use [artifacts](artifacts.md#artifacts) for storing script
outputs that would be used later on.
Only the last X results of each title \ series are saved to prevent overloading the server.
See details in [Logger.report_media](../references/sdk/logger.md#report_media).
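This "last X" retention can be pictured as a fixed-size buffer per title / series (illustrative only; the limit value is hypothetical):

```python
from collections import deque

MAX_SAMPLES_PER_SERIES = 5  # hypothetical limit; the real server-side value may differ

samples = deque(maxlen=MAX_SAMPLES_PER_SERIES)
for iteration in range(8):
    samples.append(f'debug_image_iter_{iteration}.jpg')

# only the 5 most recent samples survive (iterations 3 through 7)
print(list(samples))
```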
![image](../img/fundamentals_logger_reported_images.png)
Check out the Media Reporting [example](../guides/reporting/media_reporting).

---
title: Pipelines
---
Users can automate [Tasks](task) to run consecutively or according to some logic by putting the Tasks into a pipeline.
Tasks in a pipeline can leverage other tasks' work products such as artifacts and parameters.
Pipelines are controlled by a *Controller Task* that holds the logic of the pipeline execution steps.
## How do pipelines work?
Before running a pipeline, we need to configure a Controller Task, in which the pipeline is defined. The user decides the controlling logic, whether it be simple
([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) or complex custom logic.
Once the pipeline is running, it first clones existing Tasks (called templates) and then sends the cloned Tasks for execution
according to the pipeline's control logic.
![image](../img/fundamentals_pipeline.png)
## Simple DAG Pipelines
For simple, DAG-based logic, use the off-the-shelf `PipelineController` class to define the DAG (see an example [here](../guides/pipeline/pipeline_controller)). Once the `PipelineController` object is populated and configured,
we can start the pipeline. It launches the first steps, then waits until the pipeline is completed.
The pipeline control logic is processed in a background thread.
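The execution order of such a DAG can be sketched with the standard library's topological sorter (a conceptual illustration, not the `PipelineController` API):

```python
from graphlib import TopologicalSorter

# each pipeline step maps to the set of steps it depends on
dag = {
    'preprocess': set(),
    'train': {'preprocess'},
    'evaluate': {'train'},
    'report': {'evaluate', 'preprocess'},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # every step appears after all of its dependencies
```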
## Custom Pipelines
In cases where a DAG is insufficient (for example, when needing to launch one pipeline and then, if performance is inadequate, rerun it),
users can apply custom logic, using generic methods to enqueue Tasks, implemented in Python code.
The logic of the pipeline sits in a *Controller Task*.
Since a pipeline *Controller Task* is a Task on its own, it's possible to have pipelines running other pipelines.
This gives users greater degrees of freedom for automation.
Custom pipelines usually involve cloning existing Tasks (template Tasks), modifying their parameters, and manually enqueuing
them to queues (for execution by [agents](../clearml_agent.md)). Since it's possible to control a Task's execution (including
overriding hyperparameters and artifacts) and to get its output metrics, it's possible to create custom logic that controls inputs and acts upon outputs.
A simple Custom pipeline may look like this:
```python
task = Task.init('examples', 'Simple Controller Task', task_type=Task.TaskTypes.controller)
# Get a reference to the task to pipe to.
first_task = Task.get_task(project_name='PROJECT NAME', task_name='TASK NAME')
# Clone the task to pipe to. This creates a task with status Draft whose parameters can be modified.
cloned_first_task = Task.clone(source_task=first_task, name='Auto generated cloned task')
cloned_first_task.set_parameters({'key':val})
Task.enqueue(cloned_first_task.id, queue_name='QUEUE NAME')
# Here comes custom logic
#
#
###
# Get a reference to the task to pipe to.
next_task = Task.get_task(project_name='SECOND PROJECT NAME', task_name='SECOND TASK NAME')
# Clone the task to pipe to. This creates a task with status Draft whose parameters can be modified.
cloned_task = Task.clone(source_task=next_task, name='Second Cloned Task')
Task.enqueue(cloned_task.id, queue_name='QUEUE NAME')
```
See an example of custom pipelines [here](https://github.com/allegroai/clearml/tree/master/examples/automation).
:::note
We recommend enqueuing Pipeline Controller Tasks into a
[services](agents_and_queues#services-agent--queue) queue
:::

---
title: Task / Experiment
---
ClearML Task lies at the heart of ClearML's experiment manager. A Task is an object that holds
all the execution information: Code, Environment, Parameters, Artifacts and Results.
A Task is a single code execution session. To transform an existing script into a Task, one must call [Task.init()](../references/sdk/task.md#taskinit)
which creates a Task object that automatically captures:
* Git information
* Python environment
* Parameters in the code
* Uncommitted code
* Outputs of the execution (e.g. console outputs, Tensorboard, logs etc.)
Previously executed Tasks can be accessed and utilized with code. It's possible to copy a Task multiple times and modify its:
* Arguments
* Environment (e.g. repo commit ID, Python package)
* Configurations (e.g. command line arguments, configuration file etc.).
In ClearML, Tasks are organized into projects, and Tasks can be identified either by a project name & task name combination
or by a unique ID.
### Projects and Sub Projects
In ClearML, Tasks are organized into projects. Projects are logical entities (similar to folders) that group tasks. Users can decide
how to group tasks, but different models or objectives are usually grouped into different projects.
Projects can be further divided into sub-projects (and sub-sub-projects, etc.)
just like files and subdirectories on a computer, making experiment organization easier.
## Task sections
A Task is comprised of multiple sections, linked together for traceability.
After a Task has been initialized, it's possible to track, visualize, and, depending on its status, edit Task details, including:
* [Execution Information](#execution)
* [Configuration Parameters](#configuration)
* [Artifacts](#artifacts).
### Execution
The environment for executing the experiment.
#### Source code:
- Repository / Commit - Saves a reference to the git repository and specific commit ID of the current experiment.
- Script Path - Stores the entry point script for the experiment.
- Working Directory - The working directory for the current experiment. This is relative to the root git repository folder.
#### Uncommitted changes
Stores the uncommitted changes of the current experiment. If the experiment has no git repository, it will store the
entire experiment script file here (ClearML only stores a single file; when using more than a single script for
an experiment, please use git :smile:).
#### Installed packages
Stores a list of all the packages that the experiment is using, including the specific version of the packages.
Only directly imported packages will appear here. This is done to make sure the important packages and versions used
by the experiment are captured.
The section itself is fully compatible with the Python `requirements.txt` standard, and is fully editable.
#### Base docker image
Specify the required docker image for remote execution of the code (see [ClearML Agent](../clearml_agent)).
A remote machine will execute the entire experiment inside the requested docker.
It's also possible to add parameters for the docker execution. For example:
`nvcr.io/nvidia/pytorch:20.11-py3 --ipc=host`
#### Output destination
Storage target for automatically uploading all models / snapshots. This is applicable
mostly when an experiment is executed by an agent; read more on [Agents](../clearml_agent.md) and [Storage](../integrations/storage) integration.
### Configuration
Configurations are a set of arguments / dictionaries / files used to define the experiment (read more [here](hyperparameters)).
#### User properties
Editable key / value store, which enables adding information to an experiment after execution, making it easier to search / filter.
#### Hyperparameters
- Args - Command line arguments of the experiment process. `argparse` values are automatically detected and logged here.
- Environment - Specific [Environment variables](../configs/env_vars.md) to be logged.
- General - The default section name for a general-purpose dictionary of parameters. See the `name`
  parameter of [`Task.connect`](../references/sdk/task#connect).
- *user_section* - Custom sections for logged Python dictionaries & objects.
#### Configuration object:
- General - Default section for a dictionary or configuration file stored as a plain text configuration. Modifiable when executed
by an agent.
- *user_section* - Support for multiple configuration files (or dictionaries), name each configuration section. Modifiable
when executed by an agent.
### Artifacts
Artifacts are a way to store the outputs of an experiment, and later use those outputs as inputs in other processes.
See more information on [Artifacts](artifacts).
#### Models
- **Input Model** - Any model weights file loaded by the experiment will appear here.
- **Output Model** - Any stored weights file / model will be logged here. This is useful for searching and connecting output models to
inference pipelines for production automation.
### Results
Results recorded in the task. Supports text, graphs, plots, images, audio and more, including automatic reports by Tensorboard and Matplotlib.
See [logger](logger).
#### Console
Stdout and stderr outputs will appear here automatically.
#### Scalars
Any time-series graphs appear here such as Tensorboard scalar, scalar reporting from code and machine performance (CPU / GPU / Net etc.).
#### Plots
Non-time-series plots appear here, such as Tensorboard Histograms / Distributions and Matplotlib plots (with the exception of `imshow` plots). <br/>
It's also possible to report plots directly to ClearML (e.g. scatter 2d / 3d tables, generic plotly objects etc).
#### Debug samples
Any media (image / audio / html) is saved here.
Media reported to Tensorboard is saved here, as well as images shown with `matplotlib.pyplot.imshow`.<br/>
It's also possible to manually report media / links an experiment produces using the Logger interface. See [Logger.report_media](../references/sdk/logger.md#report_media).<br/>
## Usage
### Task Creation
`Task.init()` is the main method used to create Tasks in ClearML. It will create a Task, and populate it with:
* A link to the running git repository (including commit ID and local uncommitted changes)
* Python packages used (i.e. directly imported Python packages, and the versions available on the machine)
* Argparse arguments (default and specific to the current execution)
* Reports to Tensorboard & Matplotlib and model checkpoints.
```python
from clearml import Task
task = Task.init(
project_name='example',
task_name='task template',
task_type=None,
tags=None,
reuse_last_task_id=True,
continue_last_task=False,
output_uri=None,
auto_connect_arg_parser=True,
auto_connect_frameworks=True,
auto_resource_monitoring=True,
auto_connect_streams=True,
)
```
Once a Task is created, the Task object can be accessed from anywhere in the code by calling [`Task.current_task()`](../references/sdk/task.md#taskcurrent_task).
If multiple Tasks need to be created in the same process (for example, for logging multiple manual runs),
make sure to close a Task before initializing a new one. To close a task, simply call `task.close()`
(see example [here](https://github.com/allegroai/clearml/blob/master/examples/advanced/multiple_tasks_single_process.py)).
Projects can be divided into sub-projects, just like folders are broken into subfolders.
For example:
```python
Task.init(project_name='main_project/sub_project', task_name='test')
```
Nesting projects works on multiple levels. For example: `project_name=main_project/sub_project/sub_sub_project`
#### Task Reuse
Every `Task.init` call will create a new Task for the current execution.
In order to mitigate the clutter that a multitude of debugging Tasks might create, a Task will be reused if:
* The last time it was executed (on this machine) was under 72 hours ago (configurable, see
`sdk.development.task_reuse_time_window_in_hours` in the [`sdk.development` section](../configs/clearml_conf.md#sdkdevelopment) of
the ClearML configuration reference)
* The previous Task execution did not have any artifacts/models
It's possible to always create a new Task by passing `reuse_last_task_id=False`.
See full `Task.init` documentation [here](../references/sdk/task.md#taskinit).
### Empty Task Creation
A Task can also be created without the need to execute the code itself.
Unlike the runtime detections, all the environment and configuration details need to be provided explicitly.
For example:
```python
task = Task.create(
project_name='example',
task_name='task template',
repo='https://github.com/allegroai/clearml.git',
branch='master',
script='examples/reporting/html_reporting.py',
working_directory='.',
docker=None,
)
```
See [`Task.create`](../references/sdk/task.md#taskcreate) in the Python SDK reference.
### Accessing Tasks
A Task can be identified by its project and name, and by a unique identifier (UUID string). The name and project of
a Task can be changed after an experiment has been executed, but its ID can't be changed.
Programmatically, Task objects can be retrieved by querying the system based on either the Task ID or a project and name
combination. If a project / name combination is used, and multiple Tasks have the exact same name, the function will return
the *last modified Task*.
For example:
* Accessing a Task object with a Task ID:
```python
a_task = Task.get_task(task_id='123456deadbeef')
```
* Accessing a Task with a project / name:
```python
a_task = Task.get_task(project_name='examples', task_name='artifacts')
```
Once a Task object is obtained, it's possible to query the state of the Task, reported scalars, etc.
The Task's outputs, such as artifacts and models, can also be retrieved.
### Querying \ Searching Tasks
Searching and filtering Tasks can be done via the [web UI](../webapp/webapp_overview.md), but also programmatically.
Input search parameters into the `Task.get_tasks` method, which returns a list of Task objects that match the search.
For example:
```python
task_list = Task.get_tasks(
task_ids=None, # type Optional[Sequence[str]]
project_name=None, # Optional[str]
task_name=None, # Optional[str]
task_filter=None # Optional[Dict]
)
```
We can search for tasks by either their UUID or their project \ name combination.
It's possible to also filter Tasks by passing filtering rules to `task_filter`.
For example:
```python
task_filter={
# only Tasks with tag `included_tag` and without tag `excluded_tag`
'tags': ['included_tag', '-excluded_tag'],
# filter out archived Tasks
'system_tags': ['-archived'],
# only completed & published Tasks
'status': ['completed', 'published'],
# only training type Tasks
'type': ['training'],
# match text in Task comment or task name
'search_text': 'reg_exp_text'
}
```
### Cloning & Executing Tasks
Once a Task object is created, it can be copied (cloned). `Task.clone` returns a copy of the original Task (`source_task`).
By default, the cloned Task is added to the same project as the original, and it's called "Clone Of ORIGINAL_NAME", but
the name / project / comment of the cloned Task can be directly overridden.
```python
cloned = Task.clone(
source_task=task, # type: Optional[Union[Task, str]]
# override default name
name='newly created task', # type: Optional[str]
comment=None, # type: Optional[str]
# insert cloned Task into a different project
project=None, # type: Optional[str]
)
```
A cloned Task starts in [draft](#task-states-and-state-transitions) mode, so its Task configurations can be edited (see
[Task.set_parameters](../references/sdk/task.md#set_parameters)).
Once a Task is modified, launch it by pushing it into an execution queue, then a [ClearML Agent](../clearml_agent) will pull
it from the queue and execute the Task.
```python
Task.enqueue(
task=task, # type: Union[Task, str]
queue_name='default', # type: Optional[str]
queue_id=None # type: Optional[str]
)
```
See enqueue [example](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py).
### Advanced Remote Execution
A compelling workflow is:
1. Running code on the development machine for a few iterations, or just setting up the environment.
1. Moving the execution to a beefier remote machine for the actual training.
For example, to stop the current manual execution, and then re-run it on a remote machine, simply add the following
function call to the code:
```python
task.execute_remotely(
queue_name='default', # type: Optional[str]
clone=False, # type: bool
exit_process=True # type: bool
)
```
Once the function is called on the machine, it will stop the local process and enqueue the current Task into the *default*
queue. From there, an agent will be able to pick it up and launch it.
#### Remote Function Execution
A specific function can also be launched on a remote machine with `create_function_task`.
For example:
```python
def run_me_remotely(some_argument):
print(some_argument)
a_func_task = task.create_function_task(
func=run_me_remotely, # type: Callable
func_name='func_id_run_me_remotely', # type:Optional[str]
task_name='a func task', # type:Optional[str]
# everything below will be passed directly to our function as arguments
some_argument=123
)
```
Arguments passed to the function will be automatically logged under the `Function` section in the Hyperparameters tab.
Like any other arguments, they can be changed from the UI or programmatically.
:::note
Function Tasks must be created from within a regular Task, created by calling `Task.init()`
:::
## Task lifecycle
1. A Task is created when running the code. It collects the environment configuration of the runtime execution.
1. Results of the code execution (graphs, artifacts, etc.) are stored by the Task.
1. To execute a Task (in draft mode) on a remote machine, push the Task into an execution queue.
1. A `clearml-agent` can execute a Task on a remote machine:
1. The agent pulls the Task from the execution queue.
2. The agent sets the environment, runs the code, and collects the results.
1. An existing Task can be replicated (cloned). The environment / configuration is replicated, but the output results are
left empty (draft mode).
#### Task states and state transitions
The state of a Task represents its stage in the Task lifecycle. It indicates whether the Task is read-write (editable) or
read-only. For each state, a state transition indicates which actions can be performed on an experiment, and the new state
after performing an action.
The following table describes the Task states and state transitions.
| State | Description / Usage | State Transition |
|---|---|---|
| *Draft* | The experiment is editable. Only experiments in *Draft* mode are editable. The experiment is not running locally or remotely. | If the experiment is enqueued for a [worker](../fundamentals/agents_and_queues.md) to fetch and execute, the state becomes *Pending*. |
| *Pending* | The experiment was enqueued and is waiting in a queue for a worker to fetch and execute it. | If the experiment is dequeued, the state becomes *Draft*. |
| *Running* | The experiment is running locally or remotely. | If the experiment is manually or programmatically terminated, the state becomes *Aborted*. |
| *Completed* | The experiment ran and terminated successfully. | If the experiment is reset or cloned, the state of the cloned experiment or newly cloned experiment becomes *Draft*. Resetting deletes the logs and output of a previous run. Cloning creates an exact, editable copy. |
| *Failed* | The experiment ran and terminated with an error. | The same as *Completed*. |
| *Aborted* | The experiment ran, and was manually or programmatically terminated. | The same as *Completed*. |
| *Published* | The experiment is read-only. Publish an experiment to prevent changes to its inputs and outputs. | A *Published* experiment cannot be reset. If it is cloned, the state of the newly cloned experiment becomes *Draft*. |
## Task types
Tasks also have a *type* attribute, which denotes their purpose (Training / Testing / Data processing). This helps to further
organize projects and ensure Tasks are easy to search and find. The default Task type is *training*.
Available Task types are:
- Experimentation
- *training*, *testing*, *inference*
- Other workflows
- *controller*, *optimizer*
- *monitor*, *service*, *application*
- *data_processing*, *qc*
- *custom*

---
title: ClearML Modules
---
- **ClearML Python Package** (clearml) for integrating **ClearML** into your existing code-base.
- **ClearML Server** (clearml-server) storing experiment, model, and workflow data, and supporting the Web UI experiment manager. It is also the control plane for the ML-Ops.
- **ClearML Agent** (clearml-agent) The ML-Ops orchestration agent. Enabling experiment and workflow reproducibility, and scalability.
- **ClearML Data** (clearml-data) data management and versioning on top of file-systems/object-storage.
- **ClearML Session** (clearml-session) Launch remote instances of Jupyter Notebooks and VSCode, combined with the clearml-server control plane.
![clearml architecture](../img/clearml_architecture.png)

---
title: Best Practices
---
This section talks about what made us design ClearML the way we did, and how it reflects on ML \ DL workflows.
While ClearML was designed to fit into any workflow, we do feel that working as we describe below brings a lot of advantages, from organizing one's workflow
to preparing it to scale in the long term.
:::important
The below is only our opinion. ClearML was designed to fit into any workflow whether it conforms to our way or not!
:::
## Develop Locally
**Work on a machine that is easily manageable!**
During early stages of model development, while code is still being modified heavily, this is the usual setup we'd expect to see used by data scientists:
- A local development machine, usually a laptop (and usually using only CPU) with a fraction of the dataset for faster iterations - this is used for writing the training pipeline code, ensuring it can parse the data
and that there are no glaring bugs.
- A workstation with a GPU, usually with a limited amount of memory for small batch-sizes. This is used to train the model and ensure the model we chose makes sense and that the training
procedure works. Can be used to provide initial models for testing.
The setups mentioned above might be folded into each other, and that's great! If you have a GPU machine for each researcher, that's awesome!
The goal of this phase is to get the code, dataset, and environment set up so we can start digging to find the best model!
- [ClearML SDK](../../clearml_sdk.md) should be integrated into your code (Check out our [getting started](ds_first_steps.md)).
This helps visualize results and track progress.
- [ClearML Agent](../../clearml_agent.md) helps move your work to other machines without the hassle of rebuilding the environment every time,
while also providing a simple queue interface that lets you drop experiments to be executed one by one
(great for making sure the GPUs are churning during the weekend).
- [ClearML Session](../../apps/clearml_session.md) helps with developing on remote machines, just like you'd develop on your local laptop!
## Train Remotely
In this phase, we scale our training efforts, and try to come up with the best code \ parameter \ data combination that
yields the best performing model for our task!
- The real training (usually) should **not** be executed on your development machine.
- Training sessions should be launched and monitored from a web UI.
- You should continue coding while experiments are being executed without interrupting them.
- Stop optimizing your code because your machine struggles, and run it on a beefier machine (cloud \ on-prem).
Visualization and comparison dashboards help keep you sane! At this stage we usually have a docker container with all the binaries
that we need.
- [ClearML SDK](../../clearml_sdk.md) ensures that all the metrics, parameters and Models are automatically logged and can later be
accessed, [compared](../../webapp/webapp_exp_comparing.md) and [tracked](../../webapp/webapp_exp_track_visual.md).
- [ClearML Agent](../../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code,
applies code patches, manages parameters (including overriding them on the fly), executes the code, and queues multiple tasks.
It can even [build](../../clearml_agent.md#buildingdockercontainers) the docker container for you!
- [ClearML Pipelines](../../fundamentals/pipelines.md) ensure that steps run in the same order,
programmatically chaining tasks together, while giving an overview of the execution pipeline's status.<br/>
**Your entire environment should magically be able to run on any machine, without you working hard.**
## Track EVERYTHING
We believe that you should track everything! From obscure parameters to weird metrics, it's impossible to know what will end up
improving our results later on!
- Make sure experiments are reproducible! ClearML logs code, parameters, environment in a single, easily searchable place.
- Development is not linear. Configurations \ parameters should not be stored in your git repo:
they are temporary, and we constantly change them. But we still need to log them, because who knows, one day...
- Uncommitted changes to your code should be stored for later forensics in case that magic number actually saved the day. Not every line change should be committed.
- Mark potentially good experiments, make them the new baseline for comparison.
## Visibility Matters
While it's possible to track experiments with one tool, and pipeline them with another, we believe that having
everything under the same roof benefits you greatly!

---
title: First Steps
---
## Install ClearML
First, [sign up for free](https://app.community.clear.ml).
Install the clearml python package:
```bash
pip install clearml
```
Connect your computer to the server by [creating credentials](https://app.community.clear.ml/profile), then run the below and follow the setup instructions:
```bash
clearml-init
```
## Auto-log experiment
In ClearML, experiments are organized as [Tasks](../../fundamentals/task).
ClearML will automatically log your experiment and code once you integrate the ClearML [SDK](../../clearml_sdk.md) with your code.
At the beginning of your code, import the `clearml` package:
```python
from clearml import Task
```
:::note
To ensure full automatic logging it is recommended to import the ClearML package at the top of your entry script.
:::
Then initialize the Task object in your `main()` function, or at the beginning of the script:
```python
task = Task.init(project_name='great project', task_name='best experiment')
```
A Task name is not unique; it's possible to have multiple experiments with the same name.
If the project does not already exist, a new one will be created automatically.
**That's it!** You are done integrating ClearML with your code :)
Now, [command-line arguments](../../fundamentals/hyperparameters.md#argument-parser), [console output](../../fundamentals/logger#types-of-logged-results) as well as Tensorboard and Matplotlib will automatically be logged in the UI under the created Task.
<br/>
Sit back, relax, and watch your models converge :) or continue to see what else can be done with ClearML [here](ds_second_steps.md).

---
title: Next Steps
---
So, we've [already](ds_first_steps.md) installed ClearML's python package and run our first experiment!
Now, we'll learn how to track Hyperparameters, Artifacts and Metrics!
## Accessing Experiments
Every previously executed experiment is stored as a Task.
A Task has a project and a name, both can be changed after the experiment has been executed.
A Task is also automatically assigned an auto-generated unique identifier (UUID string) that cannot be changed and will always locate the same Task in the system.<br/>
It's possible to retrieve a Task object programmatically by querying the system based on either the Task ID,
or project & name combination. It's also possible to query tasks based on their properties, like Tags.
``` python
prev_task = Task.get_task(task_id='123456deadbeef')
```
Once we have a Task object we can query the state of the Task, get its Model, scalars, parameters, etc.
## Log Hyperparameters
For full reproducibility, it's paramount to save Hyperparameters for each experiment. Since Hyperparameters can have a substantial impact
on Model performance, saving and comparing these between experiments is sometimes the key to understanding model behavior.
ClearML supports logging `argparse` module arguments out of the box, so once it is integrated into the code, it will automatically log all parameters provided to the argument parser.<br/>
It's also possible to log parameter dictionaries (very useful when parsing an external config file and storing as a dict object),
whole configuration files or even custom objects or [Hydra](https://hydra.cc/docs/intro/) configurations!
```python
params_dictionary = {'epochs': 3, 'lr': 0.4}
task.connect(params_dictionary)
```
Check [this](../../fundamentals/hyperparameters.md) out for all Hyperparameter logging options.
## Log Artifacts
ClearML allows you to easily store the output products of an experiment - a Model snapshot \ weights file, preprocessed data, feature representations and more!
Essentially artifacts are files (or python objects) uploaded from a script and are stored alongside the Task.
These Artifacts can be easily accessed by the web UI or programmatically.
Artifacts can be stored anywhere, either on the ClearML server, or any object storage solution or shared folder.<br/>
See all [storage capabilities](../../integrations/storage).
### Adding artifacts
Uploading a local file containing the preprocessed results of the data:
```python
task.upload_artifact('/path/to/preprocess_data.csv', name='data')
```
We can also upload an entire folder with all its content by passing the folder (the folder will be zipped and uploaded as a single zip file)
```python
task.upload_artifact('/path/to/folder/', name='folder')
```
Lastly, we can upload an instance of an object; Numpy arrays / Pandas dataframes / PIL Images are supported, stored in npz / csv.gz / jpg formats respectively.
If the object type is unknown ClearML pickles it and uploads the pickle file.
```python
task.upload_artifact(my_numpy_matrix, name='features')
```
Check out all [artifact logging](../../fundamentals/artifacts.md) options.
### Using Artifacts
Logged Artifacts can be used by other Tasks, whether it's a pretrained Model or processed data.
To use an Artifact, first we have to get an instance of the Task that originally created it,
then we either download it and get its path, or get the Artifact object directly.<br/>
For example, using previously generated preprocessed data:
```python
preprocess_task = Task.get_task(task_id=preprocessing_task_id)
local_csv = preprocess_task.artifacts['data'].get_local_copy()
```
`task.artifacts` is a dictionary where the keys are the Artifact names and the values are the Artifact objects.
Calling `get_local_copy()` returns a local cached copy of the artifact,
meaning the next time we execute the code we will not need to download the artifact again.
Calling `get()` returns a deserialized pickled object.
Check out the [artifacts retrieval](https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py) example code.
### Models
Models are a special kind of artifact.
Models created by popular frameworks (such as PyTorch, Tensorflow, Scikit-learn) are automatically logged by ClearML.
All snapshots are automatically logged. In order to make sure we also automatically upload the model snapshot itself (instead of saving only its local path),
we need to pass a storage location for the model files to be uploaded to.
For example, uploading all snapshots to our S3 bucket:
```python
task = Task.init(project_name='examples', task_name='storing model', output_uri='s3://my_models/')
```
From now on, whenever the framework (TF / Keras / PyTorch etc.) stores a snapshot, the model file will automatically get uploaded to our bucket under a specific folder for the experiment.
Models loaded by a framework are also logged by the system; these models appear under the “Input Models” section, under the Artifacts tab.
Check out model snapshots examples for [TF](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py),
[PyTorch](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py),
[Keras](https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py),
[Scikit-Learn](https://github.com/allegroai/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_joblib_example.py).
#### Loading Models
Loading a previously trained model is quite similar to loading artifacts.
```python
prev_task = Task.get_task(task_id=the_training_task)
last_snapshot = prev_task.models['output'][-1]
local_weights_path = last_snapshot.get_local_copy()
```
Like before, we first have to get the instance of the Task that trained the original weights file; then we can query the Task for its output models (a list of snapshots), and get the latest snapshot.
:::note
When using Tensorflow, snapshots are stored in a folder, meaning `local_weights_path` will point to a folder containing the requested snapshot.
:::
As with Artifacts, all models are cached, meaning the next time we run this code, no model will need to be downloaded.
Once one of the frameworks loads the weights file, the running Task will automatically be updated, with an “Input Model” pointing directly to the original training Task's Model.
This feature allows you to easily get the full genealogy of every model trained and used by your system!
## Log Metrics
Full metrics logging is the key to finding the best performing model!
By default, everything that's reported to Tensorboard & Matplotlib is automatically captured and logged.<br/>
Since not all metrics are tracked that way, it's also possible to manually report metrics using the `logger` object.<br/>
It's possible to log everything, from time series data to confusion matrices to HTML, Audio and Video, to custom plotly graphs! Everything goes!<br/>
![image](../../img/report_plotly.png)
Once everything is neatly logged and displayed, using the [comparison tool](../../webapp/webapp_exp_comparing) makes it easy to find the best configuration!
## Track Experiments
The experiment table is a powerful tool for creating dashboards and views of your own projects, your team's projects, or the entire development effort.
![image](../../img/webapp_exp_table_01.png)
### Creating Leaderboards
The [experiments table](../../webapp/webapp_exp_table.md) can be customized to your own needs, adding desired views of parameters, metrics and tags.
It's possible to filter and sort based on parameters and metrics, so creating custom views is simple and flexible.
Create a dashboard for a project, presenting the latest Models and their accuracy scores, for immediate insights.
It can also be used as a live leaderboard, showing the best performing experiments' status, updated in real time.
This is helpful to monitor your projects' progress, and share it across the organization.<br/>
Any page is sharable by copying the URL from the address bar, allowing you to bookmark leaderboards or send an exact view of a specific experiment or a comparison view.<br/>
It's also possible to tag Tasks for visibility and filtering allowing you to add more information on the execution of the experiment.
Later you can search based on task name and tag in the search bar, and filter experiments based on their tags, parameters, status and more.
## What's next?
This covers the Basics of ClearML! Running through this guide we've learned how to log Parameters, Artifacts and Metrics!
If you want to learn more look at how we see the data science process in our [best practices](best_practices.md) page,
or check these pages out:
- Scale your work and deploy [ClearML Agents](../../clearml_agent.md)
- Develop on remote machines with [ClearML Session](../../apps/clearml_session.md)
- Structure your work and put it into [Pipelines](../../fundamentals/pipelines.md)
- Improve your experiments with [HyperParameter Optimization](https://github.com/allegroai/clearml/tree/master/examples/optimization/hyper-parameter-optimization)
- Check out ClearML's integrations to [external libraries](../../integrations/libraries.md).

---
id: main
title: What is ClearML?
slug: /
---
ClearML is an open source platform that automates and simplifies developing and managing machine learning solutions
for thousands of data science teams all over the world.
It is designed as an end-to-end MLOps suite allowing you to focus on developing your ML code & automation,
while ClearML ensures your work is reproducible and scalable.
## What can you do with ClearML?
- Track and upload metrics and models with only 2 lines of code
- Create a bot that sends you a Slack message whenever your model improves in accuracy
- Automatically scale AWS instances according to your resource needs
- Reproduce experiments with 3 mouse clicks
- Much More!
#### Who We Are
ClearML is supported by you :heart: and by the team behind [allegro.ai](https://www.allegro.ai) , where we build even more MLOps for enterprise companies.

---
title: Best Practices
---
In short - **automate everything** :)
From training models to data processing to deploying to production.
## Development - Preparing for Automation
Basically, track everything; there is nothing that isn't worth having visibility into.
If you are afraid of clutter, use the archive option, and set up your own cleanup service (see [how](../../guides/services/cleanup_service)).
- Track the code base. There is no reason not to add metrics to any process in your workflow, even if it is not directly ML. Visibility is key to iteratively improving your code and workflow.
- Create per-project [leader-boards](../../webapp/webapp_exp_track_visual.md) based on custom columns
(hyper parameters and performance accuracy), and bookmark them (full URL will always reproduce the same view & table).
- Share experiments with your colleagues and team-leaders.
Invite more people to see how your project is progressing, and suggest they add metric reporting of their own.
These metrics can later become part of your in-house monitoring solution; don't let good data go to waste :)
## Clone Tasks
In order to define a Task in ClearML, we have two options:
- Run the actual code with a `Task.init` call. This will create and auto-populate the Task in ClearML (including git repo, Python packages, command line, etc.).
- Register a local/remote code repository with `clearml-task`. See [details](../../apps/clearml_task.md).
Once we have a Task in ClearML, we can clone it and edit its definition in the UI, then launch it on one of our nodes with [ClearML Agent](../../clearml_agent.md).
## Advanced Automation
- Create daily/weekly cron jobs for retraining your best performing models.
- Create data monitoring & scheduling, and launch inference jobs to test performance on any newly arriving dataset.
- Once two or more experiments run one after the other, group them together into a [pipeline](../../fundamentals/pipelines.md).
## Manage your data
Use [ClearML Data](../../clearml_data.md) to version your data, then link it to running experiments for easy reproduction.
Make datasets machine agnostic (i.e. store the original dataset in a shared storage location, e.g. a shared folder, S3, GS, or Azure).
ClearML Data supports efficient dataset storage and caching, with delta-based (differential) and compressed storage.
## Scale Your Work
Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent machines (Remote or local) and manage
training workload with it. <br/>
Improve team collaboration with transparent resource monitoring; always know what is running where.
@@ -0,0 +1,146 @@
---
title: First Steps
---
:::note
This tutorial assumes that you've already [signed up](https://app.community.clear.ml) to ClearML
:::
MLOps is all about automation! We'll discuss the need for automation and the tools ClearML offers for automation, orchestration and tracking!<br/>
Effective MLOps relies on the ability to scale work beyond one's own computer. Moving off your own machine can be inefficient:
even assuming that you have all the drivers and applications installed, you still need to manage multiple Python environments
for different packages / package versions, or worse, manage different Docker images for different package versions.<br/>
On top of that, when working on remote machines, executing experiments, tracking what's running where, and making sure machines are fully utilized at all times
becomes a daunting task.<br/>
This can create overhead that derails you from the core work!
ClearML Agent was designed to deal with all of this and more! It is the module responsible for executing experiments
on remote machines: on premises or in the cloud!<br/>
It will set up the environment for the specific Task (inside a Docker container, or bare-metal), install the required Python packages, and execute & monitor the process itself.
## Spin up an Agent
First, let's install the agent!
```bash
pip install clearml-agent
```
Connect the Agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this:
```bash
clearml-init
```
:::note
If you've already created credentials, you can copy-paste the default agent section from [here](https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L15) (this is optional; if the section is not provided, the default values will be used)
:::
Start the agent's daemon. The agent will start pulling Tasks from the assigned queue (`default` in our case), and execute them one after the other.
```bash
clearml-agent daemon --queue default
```
## Clone an Experiment
Creating a new "job" to be executed is essentially cloning a Task in the system, then enqueueing it in one of the execution queues for an agent to execute.
When cloning a Task, we create another copy of the Task in *draft* mode, allowing us to edit the Task's environment definitions. <br/>
We can edit the git / code references, control the Python packages to be installed, specify the Docker container image to be used, or change the hyperparameters and configuration files.
Once we are done, we enqueue the Task in one of the execution queues to schedule it for execution.
Multiple agents can listen to the same queue (or even multiple queues), but only a single agent will pick up each Task for execution.
You can clone an experiment from our [examples](https://app.community.clear.ml/projects/764d8edf41474d77ad671db74583528d/experiments) project and enqueue it to a queue!
### Accessing Previously Executed Experiments
All executed Tasks in the system can be accessed based on the unique Task ID, or by searching for the Task based on its properties.
For example:
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
```
## Log Hyperparameters
Hyperparameters are an integral part of machine learning code, as they let you control the code without directly modifying it.<br/>
Hyperparameters can be added from anywhere in your code, and ClearML supports [multiple](../../fundamentals/hyperparameters.md) ways to obtain them!
ClearML also allows users to change and track hyperparameter values without changing the code itself.
When a cloned experiment is executed by an agent, it will override the default values with the new ones.
It's also possible to change cloned experiments' parameters programmatically.
For example:
```python
from clearml import Task
cloned_task = Task.clone(task_id='aabbcc')
cloned_task.set_parameter(name='internal/magic', value=42)
Task.enqueue(cloned_task, queue_name='default')
```
## Logging Artifacts
Artifacts are a great way to pass and reuse data between Tasks in the system.
From anywhere in the code you can upload [multiple](../../fundamentals/artifacts.md#logging-artifacts) types of data, objects and files.
Artifacts are the base of ClearML's [Data Management](../../clearml_data.md) solution and serve as a way to communicate complex objects between different
stages of a [pipeline](../../fundamentals/pipelines.md).
```python
import numpy as np
from clearml import Task
Task.current_task().upload_artifact(name='a_file', artifact_object='local_file.bin')
Task.current_task().upload_artifact(name='numpy', artifact_object=np.ones((4, 4)))
```
### Using Artifacts
Artifacts can be retrieved by [accessing](../../fundamentals/artifacts.md#using-artifacts) the Task that created them.
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
# artifact as a file
local_file = executed_task.artifacts['a_file'].get_local_copy()
# artifact as object
a_numpy = executed_task.artifacts['numpy'].get()
```
### Models
Models are a special type of artifact that's automatically logged.
Logging models into the model repository is the easiest way to integrate the development process directly with production.<br/>
Any model stored by the supported frameworks (Keras / TF / PyTorch / Joblib) will be automatically logged into ClearML.
Models can be automatically stored on a preferred storage medium (S3 bucket, Google Storage, etc.).
## Log Metrics
Log as many metrics from your processes as you can! It improves visibility into their progress.
Use the Logger class to report scalars and plots.
```python
from clearml import Logger
Logger.current_logger().report_scalar(title='metric', series='variant', value=13.37, iteration=counter)
```
You can later analyze the reported scalars:
```python
from clearml import Task
executed_task = Task.get_task(task_id='aabbcc')
# get a summary of the min/max/last value of all reported scalars
min_max_values = executed_task.get_last_scalar_metrics()
# get detailed graphs of all scalars
full_scalars = executed_task.get_reported_scalars()
```
## Track Experiments
You can also search and query Tasks in the system.
Use the `Task.get_tasks` call to retrieve Task objects, filtering on specific values of the Task - status, parameters, metrics and more!
```python
from clearml import Task
tasks = Task.get_tasks(project_name='examples', task_name='partial_name_match', task_filter={'status': 'in_progress'})
```
## Manage Your Data
Data is probably one of the biggest factors that determines the success of a project.
Associating the data a model used with the model's configuration, code and results (such as accuracy) is key to deducing meaningful insights into how
models behave. <br/>
[ClearML Data](../../clearml_data.md) allows you to version your data so it's never lost, fetch it from every machine with minimal code changes
and associate data to experiments results.
Logging data can be done via command line, or via code. If any preprocessing code is involved, ClearML logs it as well!<br/>
Once data is logged, it can be used by other experiments.
@@ -0,0 +1,103 @@
---
title: Next Steps
---
Once Tasks are defined and in the ClearML system, they can be chained together to create Pipelines.
Pipelines provide users with a greater level of abstraction and automation, with Tasks running one after the other.<br/>
Tasks can interface with other Tasks in the pipeline and leverage other Tasks' work products.<br/>
We'll go through a scenario where users create a Dataset, process the data, and then consume it with another task, all running as a pipeline.
## Building Tasks
### Dataset Creation
Let's assume we have some code that extracts data from a production Database into a local folder.
Our goal is to create an immutable copy of the data to be used by further steps:
```bash
clearml-data create --project data --name dataset
clearml-data sync --folder ./from_production
```
We could also add a Tag `latest` to the Dataset, marking it as the latest version.
### Preprocessing Data
The second step is to preprocess the data. First we need to access it, then we want to modify it,
and lastly we want to create a new version of the data.
```python
from clearml import Task, Dataset

# create a task for the data processing part
task = Task.init(project_name='data', task_name='ingest', task_type='data_processing')
# get the v1 dataset
dataset = Dataset.get(dataset_project='data', dataset_name='dataset_v1')
# get a local mutable copy of the dataset
dataset_folder = dataset.get_mutable_local_copy(target_folder='work_dataset', overwrite=True)
# change some files in the `./work_dataset` folder
...
# create a new version of the dataset with the modified files
new_dataset = Dataset.create(
dataset_project='data', dataset_name='dataset_v2',
parent_datasets=[dataset],
use_current_task=True, # this will make sure we have the creation code and the actual dataset artifacts on the same Task
)
new_dataset.sync_folder(local_path=dataset_folder)
new_dataset.upload()
new_dataset.finalize()
# now let's remove the previous dataset tag
dataset.tags = []
new_dataset.tags = ['latest']
```
We passed the `parent_datasets` argument when we created v2 of the Dataset; the new version inherits all of the parent's content.
This will not only help us trace back dataset changes with full genealogy, but will also make our storage more efficient,
as only the files that were changed or added relative to the parent versions are stored.
When we later need access to the Dataset, the files from all parent versions will be merged automatically
in a fully transparent process, as if they were always part of the requested Dataset.
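The storage saving can be pictured as a per-file hash comparison between a version and its parent: only files whose hash is new or changed need to be stored. A simplified illustration of the idea (this is a conceptual sketch, not ClearML's actual implementation):

```python
import hashlib

def file_hashes(files):
    # Map each file name to a hash of its content (ClearML uses a SHA2 hash per file).
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

def delta(parent, child):
    # Only files that are new, or whose content changed relative to the parent,
    # need to be stored for the child version; the rest is inherited.
    parent_h, child_h = file_hashes(parent), file_hashes(child)
    return {name for name, digest in child_h.items() if parent_h.get(name) != digest}

v1 = {"a.csv": b"1,2,3", "b.csv": b"4,5,6"}
v2 = {"a.csv": b"1,2,3", "b.csv": b"4,5,7", "c.csv": b"8"}
print(sorted(delta(v1, v2)))  # → ['b.csv', 'c.csv']
```

Only `b.csv` (modified) and `c.csv` (added) would be stored for v2; `a.csv` is inherited from v1.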
### Training
We can now train our model with the **latest** Dataset we have in the system.
We will do that by getting the instance of the Dataset based on the `latest` tag
(if by any chance we have two Datasets with the same tag we will get the newest).
Once we have the dataset we can request a local copy of the data. All local copy requests are cached,
which means that if we are accessing the same dataset multiple times we will not have any unnecessary downloads.
```python
from clearml import Task, Dataset

# create a task for the model training
task = Task.init(project_name='data', task_name='train', task_type='training')
# get the latest dataset with the tag `latest`
dataset = Dataset.get(dataset_tags='latest')
# get a cached copy of the Dataset files
dataset_folder = dataset.get_local_copy()
# train our model here
```
## Building the Pipeline
Now that we have the data creation step and the model training step, let's create a pipeline that, when executed,
will run the first step and then the second.
It is important to remember that pipelines are Tasks by themselves and can also be automated by other pipelines (i.e. pipelines of pipelines).
```python
from clearml.automation.controller import PipelineController

pipe = PipelineController(
always_create_task=True,
pipeline_project='data', pipeline_name='pipeline demo',
)
pipe.add_step(
name='step 1 data',
base_task_id='cbc84a74288e459c874b54998d650214', # Put the task ID here
)
pipe.add_step(
name='step 2 train',
parents=['step 1 data', ],
base_task_id='cbc84a74288e459c874b54998d650214', # Put the task ID here
)
```
We could also pass parameters from one step to another (for example, a `Task.id`).
See more in the full pipeline documentation [here](../../fundamentals/pipelines.md).
@@ -0,0 +1,32 @@
---
title: Manual Random Parameter Search
---
The [manual_random_param_search_example.py](https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py)
script demonstrates a random parameter search by automating the execution of an experiment multiple times, each time with
a different set of random hyperparameters.
This example accomplishes the automated random parameter search by doing the following:
1. Creating a template Task named `Keras HP optimization base`. To create it, run the [base_template_keras_simple.py](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/base_template_keras_simple.py)
script. This experiment must be executed first, so it will be stored in the server, and then it can be accessed, cloned,
and modified by another Task.
1. Creating a parameter dictionary, which is connected to the Task by calling [Task.connect](../../references/sdk/task.md#connect)
so that the parameters are logged by **ClearML**.
1. Adding the random search hyperparameters and parameters defining the search (e.g., the experiment name, and number of
times to run the experiment).
1. Creating a Task object referencing the template experiment, `Keras HP optimization base`. See [Task.get_task](../../references/sdk/task.md#taskget_task).
1. For each set of parameters:
1. Cloning the Task object. See [Task.clone](../../references/sdk/task.md#taskclone).
1. Getting the newly cloned Task's parameters. See [Task.get_parameters](../../references/sdk/task.md#get_parameters)
1. Setting the newly cloned Task's parameters to the search values in the parameter dictionary (Step 1). See [Task.set_parameters](../../references/sdk/task.md#set_parameters).
1. Enqueuing the newly cloned Task to execute. See [Task.enqueue](../../references/sdk/task.md#taskenqueue).
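The heart of the loop in step 5 is just drawing a random value for each hyperparameter and applying the resulting set to a fresh clone. A minimal sketch of the sampling part only (the parameter names and ranges here are hypothetical, not taken from the example script):

```python
import random

# Hypothetical search space; names and values are illustrative assumptions.
space = {
    'batch_size': [32, 64, 128],
    'layer_1': [128, 256, 512],
}

def sample_params(space, seed=None):
    # Draw one random value per hyperparameter.
    rng = random.Random(seed)
    return {name: rng.choice(values) for name, values in space.items()}

# One parameter set per cloned Task; each set would be applied to a clone
# with Task.set_parameters and the clone then enqueued for execution.
param_sets = [sample_params(space, seed=i) for i in range(3)]
for i, params in enumerate(param_sets):
    print('clone {}: {}'.format(i, params))
```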
When the example script runs, it creates an experiment named `Random Hyper-Parameter Search Example` which is associated
with the `examples` project. This starts the parameter search, and creates the experiments:
* `Keras HP optimization base 0`
* `Keras HP optimization base 1`
* `Keras HP optimization base 2`.
When these experiments are completed, their [results can be compared](../../webapp/webapp_exp_comparing.md).
@@ -0,0 +1,25 @@
---
title: Task Piping
---
The [task_piping_example.py](https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py)
example demonstrates:
1. Creating an instance of a Task from a template Task.
1. Customizing that instance by changing the value of a parameter
1. Enqueuing the customized instance for execution.
This example accomplishes a task pipe by doing the following:
1. Creating the template Task which is named `Toy Base Task`. It must be stored in **ClearML Server** before instances of
it can be created. To create it, run another **ClearML** example script, [toy_base_task.py](https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py).
The template Task has a parameter dictionary, which is connected to the Task: `{'Example_Param': 1}`.
1. Back in `task_piping_example.py`, creating a parameter dictionary, which is connected to the Task by calling [Task.connect](../../references/sdk/task.md#connect)
so that the parameters are logged by **ClearML**. The dictionary contains the name of the parameter from the template
Task that is going to be customized (`Example_Param`), as well as its new value.
1. Creating a Task object referencing the template Task. See [Task.get_task](../../references/sdk/task.md#taskget_task).
1. Creating an instance of the template Task by cloning it.
1. Getting the newly cloned Task's parameters. See [Task.get_parameters](../../references/sdk/task.md#get_parameters).
1. Setting the newly cloned Task's parameters to the search values in the parameter dictionary (Step 2). See [Task.set_parameters](../../references/sdk/task.md#set_parameters).
1. Enqueuing the newly cloned Task to execute. See [Task.enqueue](../../references/sdk/task.md#taskenqueue).
When the example script runs, it creates an instance of the template experiment, named `Auto generated cloned task`, which is associated with the `examples` project. In the instance, the value of the customized parameter, `Example_Param`, was changed to `3`. You can see it in **CONFIGURATIONS** **>** **HYPER PARAMETERS**.
@@ -0,0 +1,76 @@
---
title: ClearML Task Tutorial
---
In this tutorial, you will use `clearml-task` to execute this [script](https://github.com/allegroai/events/blob/master/webinar-0620/keras_mnist.py)
on a remote or local machine, both from a remote repository and from a local script.
### Prerequisites
- `clearml` Python package installed
- `clearml-agent` running on at least one machine (to execute the experiment), assigned to listen to the `default` queue
- [allegroai/events](https://github.com/allegroai/events) repository cloned (for local script execution)
### Executing code from a remote repository
``` bash
clearml-task --project keras_examples --name remote_test --repo https://github.com/allegroai/events.git --script /webinar-0620/keras_mnist.py --args batch_size=64 epochs=1 --queue default
```
Provide `clearml-task` with the following arguments:
1. `--project keras_examples --name remote_test` - The project and experiment name.
If the project entered doesn't exist, a new project will be created with the selected name.
1. `--repo https://github.com/allegroai/events.git` - The chosen repository's URL.
By default, `clearml-task` will use the latest commit from the master branch.
1. `--script /webinar-0620/keras_mnist.py` - The script to be executed.
1. `--args batch_size=64 epochs=1` - Arguments passed to the script.
These are passed to the script through its `argparse` object.
1. `--queue default` - Selected queue to send the experiment to.
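For `--args` to take effect, the target script must read those names through `argparse`. A rough sketch of what such a CLI might look like (only the flag names `batch_size` and `epochs` come from the command above; the defaults and help text are assumptions):

```python
import argparse

# Sketch of the CLI the target script is assumed to expose.
parser = argparse.ArgumentParser(description='Keras MNIST example')
parser.add_argument('--batch_size', type=int, default=128, help='training batch size')
parser.add_argument('--epochs', type=int, default=6, help='number of training epochs')

# This is what `--args batch_size=64 epochs=1` effectively resolves to:
args = parser.parse_args(['--batch_size', '64', '--epochs', '1'])
print(args.batch_size, args.epochs)  # → 64 1
```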
Now `clearml-task` does the rest of the heavy-lifting!
* It creates a new Task on the [ClearML Server](../../deploying_clearml/clearml_server.md).
* Then, the Task is enqueued in the selected execution queue, where it will be executed by an available
`clearml-agent` assigned to that queue.
Your output should look something like this:
```console
New task created id=2f96ee95b05d4693b360d0fcbb26b727
Task id=2f96ee95b05d4693b360d0fcbb26b727 sent for execution on queue default
Execution log at: https://app.community.clear.ml/projects/552d5399112d47029c146d5248570295/experiments/2f96ee95b05d4693b360d0fcbb26b727/output/log
```
:::note
**clearml-task** automatically finds the `requirements.txt` file in remote repositories.
If a remote repo does not have such a file, make sure to either add one with all the required Python packages,
or add the **`--packages '<package_name>'`** flag to the command.
:::
<br />
### Executing a local script
Using `clearml-task` to execute a local script is very similar to using it with a remote repo.
For this example, we will be using a local version of this [script](https://github.com/allegroai/events/blob/master/webinar-0620/keras_mnist.py).
1. Go to the root folder of the cloned [allegroai/events](https://github.com/allegroai/events) repository
1. Run `clearml-task` by executing:
``` bash
clearml-task --project keras --name local_test --script webinar-0620/keras_mnist.py --requirements webinar-0620/requirements.txt --args epochs=1 --queue default
```
Notice that the command is almost identical to executing code from a git repository. The only differences are:
* `--script webinar-0620/keras_mnist.py` - Pointing `clearml-task` to a local script.
* `--requirements webinar-0620/requirements.txt` - Manually specifying a *requirements.txt* file.
After executing `clearml-task`, a Task will be created according to the parameters entered. The Task will
be sent to a queue for execution.
@@ -0,0 +1,303 @@
---
title: Dataset Management Using CIFAR10
---
In this tutorial, we are going to use a CIFAR example: we'll manage the CIFAR dataset with `clearml-data`, and then replace our
current dataset read method with one that interfaces with `clearml-data`.
## Creating the Dataset
### Downloading the Data
Before we can register the CIFAR dataset with `clearml-data`, we need to obtain a local copy of it.
Execute this Python script to download the data:
```python
from clearml import StorageManager
# We're using the StorageManager to download the data for us!
# It's a neat little utility that helps us download
# files we need and cache them :)
manager = StorageManager()
dataset_path = manager.get_local_copy(remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz")
# make sure to copy the printed value
print("COPY THIS DATASET PATH: {}".format(dataset_path))
```
Expected response:
```bash
COPY THIS DATASET PATH: /home/erez/.clearml/cache/storage_manager/global/f2751d3a22ccb78db0e07874912b5c43.cifar-10-python_artifacts_archive_None
```
The script prints the path to the downloaded data. It'll be needed later on.
### Creating the Dataset
To create the dataset, in a CLI, execute:
```
clearml-data create --project cifar --name cifar_dataset
```
Expected response:
```
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=*********
```
Where \*\*\*\*\*\*\*\*\* is the dataset ID.
## Adding Files
Add the files we just downloaded to the dataset:
```
clearml-data add --files <dataset_path>
```
where `dataset_path` is the path that was printed earlier, which denotes the location of the downloaded dataset.
:::note
There's no need to specify a *dataset_id*, as the *clearml-data* session stores it.
:::
## Finalizing the Dataset
Run the close command to upload the files (they'll be uploaded to the file server by default):<br/>
```
clearml-data close
```
![image](../../img/examples_data_management_cifar_dataset.png)
## Using the Dataset
Now that we have a new dataset registered, we can consume it.
We'll use this [script](https://github.com/allegroai/clearml/blob/master/examples/frameworks/ignite/cifar_ignite.py)
as a base to train on the CIFAR dataset.
We replace the file load part with ClearML's Dataset object. The Dataset's `get_local_copy()` method will return a path
to the cached, downloaded dataset.
Then we provide the path to PyTorch's dataset object.
```python
from clearml import Dataset

dataset_id = "ee1c35f60f384e65bc800f42f0aca5ec"
dataset_path = Dataset.get(dataset_id=dataset_id).get_local_copy()
trainset = datasets.CIFAR10(root=dataset_path,
train=True,
download=False,
transform=transform)
```
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Full example code using dataset:</summary>
<div className="cml-expansion-panel-content">
```python
#These are the obligatory imports
from pathlib import Path
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from ignite.contrib.handlers import TensorboardLogger
from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.handlers import global_step_from_engine
from ignite.metrics import Accuracy, Loss, Recall
from ignite.utils import setup_logger
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm
from clearml import Task, StorageManager
# Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name='Image Example', task_name='image classification CIFAR10')
params = {'number_of_epochs': 20, 'batch_size': 64, 'dropout': 0.25, 'base_lr': 0.001, 'momentum': 0.9, 'loss_report': 100}
params = task.connect(params)  # enabling configuration override by clearml
print(params) # printing actual configuration (after override in remote mode)
# This is our original data retrieval code. It uses the StorageManager to download and cache our dataset.
'''
manager = StorageManager()
dataset_path = Path(manager.get_local_copy(remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"))
'''
# Let's now modify it to use the new Dataset API. You'll need to copy the created dataset id
# to the next variable
dataset_id = "ee1c35f60f384e65bc800f42f0aca5ec"
# The below gets the dataset and stores it in the cache. If you want to download the dataset regardless of whether it's in the
# cache, use Dataset.get(dataset_id).get_mutable_local_copy(path_to_download)
from clearml import Dataset
dataset_path = Dataset.get(dataset_id=dataset_id).get_local_copy()
# Dataset and Dataloader initializations
transform = transforms.Compose([transforms.ToTensor()])
trainset = datasets.CIFAR10(root=dataset_path,
train=True,
download=False,
transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
batch_size=params.get('batch_size', 4),
shuffle=True,
num_workers=10)
testset = datasets.CIFAR10(root=dataset_path,
train=False,
download=False,
transform=transform)
testloader = torch.utils.data.DataLoader(testset,
batch_size=params.get('batch_size', 4),
shuffle=False,
num_workers=10)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
tb_logger = TensorboardLogger(log_dir="cifar-output")
# Helper function to store predictions and scores using matplotlib
def predictions_gt_images_handler(engine, logger, *args, **kwargs):
x, _ = engine.state.batch
y_pred, y = engine.state.output
num_x = num_y = 4
le = num_x * num_y
fig = plt.figure(figsize=(20, 20))
trans = transforms.ToPILImage()
for idx in range(le):
preds = torch.argmax(F.softmax(y_pred[idx],dim=0))
probs = torch.max(F.softmax(y_pred[idx],dim=0))
ax = fig.add_subplot(num_x, num_y, idx + 1, xticks=[], yticks=[])
ax.imshow(trans(x[idx]))
ax.set_title("{0} {1:.1f}% (label: {2})".format(
classes[preds],
probs * 100,
classes[y[idx]]),
color=("green" if preds == y[idx] else "red")
)
logger.writer.add_figure('predictions vs actuals', figure=fig, global_step=engine.state.epoch)
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(3, 6, 3)
self.conv2 = nn.Conv2d(6, 16, 3)
self.pool = nn.MaxPool2d(2, 2)
self.fc1 = nn.Linear(16 * 6 * 6, 120)
self.fc2 = nn.Linear(120, 84)
        self.dropout = nn.Dropout(p=params.get('dropout', 0.25))
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 6 * 6)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(self.dropout(x))
return x
# Training
def run(epochs, lr, momentum, log_interval):
device = "cuda" if torch.cuda.is_available() else "cpu"
net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)
trainer = create_supervised_trainer(net, optimizer, criterion, device=device)
trainer.logger = setup_logger("trainer")
val_metrics = {"accuracy": Accuracy(),"loss": Loss(criterion), "recall": Recall()}
evaluator = create_supervised_evaluator(net, metrics=val_metrics, device=device)
evaluator.logger = setup_logger("evaluator")
# Attach handler to plot trainer's loss every 100 iterations
tb_logger.attach_output_handler(
trainer,
event_name=Events.ITERATION_COMPLETED(every=params.get('loss_report')),
tag="training",
output_transform=lambda loss: {"loss": loss},
)
# Attach handler to dump evaluator's metrics every epoch completed
for tag, evaluator in [("training", trainer), ("validation", evaluator)]:
tb_logger.attach_output_handler(
evaluator,
event_name=Events.EPOCH_COMPLETED,
tag=tag,
metric_names="all",
global_step_transform=global_step_from_engine(trainer),
)
# Attach function to build debug images and report every epoch end
tb_logger.attach(
evaluator,
log_handler=predictions_gt_images_handler,
event_name=Events.EPOCH_COMPLETED(once=1),
);
desc = "ITERATION - loss: {:.2f}"
pbar = tqdm(initial=0, leave=False, total=len(trainloader), desc=desc.format(0))
@trainer.on(Events.ITERATION_COMPLETED(every=log_interval))
def log_training_loss(engine):
pbar.desc = desc.format(engine.state.output)
pbar.update(log_interval)
@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(engine):
pbar.refresh()
evaluator.run(trainloader)
metrics = evaluator.state.metrics
avg_accuracy = metrics["accuracy"]
avg_nll = metrics["loss"]
tqdm.write(
"Training Results - Epoch: {} Avg accuracy: {:.2f} Avg loss: {:.2f}".format(
engine.state.epoch, avg_accuracy, avg_nll
)
)
@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(engine):
evaluator.run(testloader)
metrics = evaluator.state.metrics
avg_accuracy = metrics["accuracy"]
avg_nll = metrics["loss"]
tqdm.write(
"Validation Results - Epoch: {} Avg accuracy: {:.2f} Avg loss: {:.2f}".format(
engine.state.epoch, avg_accuracy, avg_nll
)
)
pbar.n = pbar.last_print_n = 0
@trainer.on(Events.EPOCH_COMPLETED | Events.COMPLETED)
def log_time():
tqdm.write(
"{} took {} seconds".format(trainer.last_event_name.name, trainer.state.times[trainer.last_event_name.name])
)
trainer.run(trainloader, max_epochs=epochs)
pbar.close()
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
print('Finished Training')
print('Task ID number is: {}'.format(task.id))
run(params.get('number_of_epochs'), params.get('base_lr'), params.get('momentum'), 10)
```
</div></details>
<br/><br/>
That's it! All you need to do now is run the full script.
@@ -0,0 +1,79 @@
---
title: Folder Sync
---
This example shows how to use the *clearml-data* folder sync function.
*clearml-data* folder sync mode is useful for cases when users have a single point of truth (i.e. a folder) that updates
from time to time. When the point of truth is updated, users can call `clearml-data sync` and the
changes (file addition, modification, or removal) will be reflected in ClearML.
## Creating Initial Version
### Prerequisites
First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. This contains all
the needed files.
1. Open terminal and change directory to the cloned repository's examples folder
`cd clearml/examples/reporting`
### Syncing a Folder
Create a dataset and sync the `data_samples` folder from the repo to ClearML:
```bash
clearml-data sync --project datasets --name sync_folder --folder data_samples
```
Expected response:
```
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0d8f5f3e5ebd4f849bfb218021be1ede
Syncing dataset id 0d8f5f3e5ebd4f849bfb218021be1ede to local folder data_samples
Generating SHA2 hash for 5 files
Hash generation completed
Sync completed: 0 files removed, 5 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (5 files, total 222.17 KB) to https://files.community.clear.ml
Upload completed (222.17 KB)
2021-05-04 09:57:56,809 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 09:57:57,581 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized
```
As can be seen, the `clearml-data sync` command creates the dataset, then uploads the files, and closes the dataset.
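The "Generating SHA2 hash" lines above reflect how `clearml-data` detects changes: each file is fingerprinted by a content hash. A minimal sketch of the idea using Python's standard `hashlib` (the helper function here is ours, not part of clearml):

```python
import hashlib
from pathlib import Path

def file_sha256(path):
    """Hash a file's content in chunks, so large files aren't read into memory at once."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Identical content yields identical hashes; any edit changes the hash
Path('a.txt').write_text('data data data\n')
Path('b.txt').write_text('data data data\n')
print(file_sha256('a.txt') == file_sha256('b.txt'))  # True
```

A sync run can then compare the current hashes against those stored in the previous dataset version to decide which files were added, modified, or removed.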
## Modifying Synced Folder
Now we'll modify the folder:
1. Add another line to one of the files in the `data_samples` folder.
1. Add a file to the `data_samples` folder.<br/>
   Run `echo "data data data" > data_samples/new_data.txt` (this creates the file `new_data.txt` and puts it in the `data_samples` folder).
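In a shell, the two modifications above might look like this (`data.csv` is an arbitrary choice of existing file to edit):

```bash
mkdir -p data_samples                              # no-op if the folder already exists
echo "1,2,3" >> data_samples/data.csv              # modify an existing file
echo "data data data" > data_samples/new_data.txt  # add a new file
```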
We'll repeat the process of creating a new dataset with the previous one as its parent, and syncing the folder.
```bash
clearml-data sync --project datasets --name second_ds --parents a1ddc8b0711b4178828f6c6e6e994b7c --folder data_samples
```
Expected response:
```
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0992dd6bae6144388e0f2ef131d9724a
Syncing dataset id 0992dd6bae6144388e0f2ef131d9724a to local folder data_samples
Generating SHA2 hash for 6 files
Hash generation completed
Sync completed: 0 files removed, 2 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (2 files, total 742 bytes) to https://files.community.clear.ml
Upload completed (742 bytes)
2021-05-04 10:05:42,353 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 10:05:43,106 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized
```
We can see that 2 files were added or modified, just as we expected!

---
title: Data Management Example
---
In this example we'll create a simple dataset and demonstrate basic actions on it.
## Prerequisites
First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. This contains all
the needed files.
1. Open a terminal and change directory to the cloned repository's examples folder:
`cd clearml/examples/reporting`
## Creating an Initial Dataset
1. To create the dataset, run this command:
```bash
clearml-data create --project datasets --name HelloDataset
```
Expected response:
```bash
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=24d05040f3e14fbfbed8edb1bf08a88c
```
1. Now let's add a folder. File addition is recursive, so it's enough to point at the folder
   to capture all files and subfolders:
```bash
clearml-data add --files data_samples
```
Expected response:
```bash
clearml-data - Dataset Management & Versioning CLI
Adding files/folder to dataset id 24d05040f3e14fbfbed8edb1bf08a88c
Generating SHA2 hash for 2 files
Hash generation completed
5 files added
```
:::note
After creating a dataset, we don't have to specify its ID when running commands such as *add*, *remove*, or *list*.
:::
1. Close the dataset - this command uploads the files. By default, the files are uploaded to the file server, but
this can be configured with the `--storage` flag to any of ClearML's supported storage mediums (see [storage](../../integrations/storage.md)).
The command also finalizes the dataset, making it immutable and ready to be consumed.
```bash
clearml-data close
```
Expected response:
```bash
clearml-data - Dataset Management & Versioning CLI
Finalizing dataset id 24d05040f3e14fbfbed8edb1bf08a88c
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (4 files, total 221.56 KB) to https://files.community.clear.ml
Upload completed (221.56 KB)
2021-05-04 09:32:03,388 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 09:32:04,067 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized
```
## Listing Dataset Content
To see that all the files were added to the created dataset, use `clearml-data list` and enter the ID of the dataset
that was just closed.
```bash
clearml-data list --id 24d05040f3e14fbfbed8edb1bf08a88c
```
Expected response:
```console
clearml-data - Dataset Management & Versioning CLI
List dataset content: 24d05040f3e14fbfbed8edb1bf08a88c
Listing dataset content
file name | size | hash
------------------------------------------------------------------------------------------------------------------------------------------------
dancing.jpg | 40,484 | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
Total 5 files, 248771 bytes
```
## Creating a Child Dataset
In ClearML Data, it's possible to create datasets that inherit the content of other datasets; these are called child datasets.
1. Create a new dataset, specifying the previously created one as its parent:
```bash
clearml-data create --project datasets --name HelloDataset-improved --parents 24d05040f3e14fbfbed8edb1bf08a88c
```
:::note
You'll need to input the Dataset ID you received when you created the dataset above
:::
1. Now, we want to add a new file.
* Create a new file: `echo "data data data" > new_data.txt` (this will create the file `new_data.txt`),
* Now add the file to the dataset:
```bash
clearml-data add --files new_data.txt
```
Which should return this output:
```console
clearml-data - Dataset Management & Versioning CLI
Adding files/folder to dataset id 8b68686a4af040d081027ba3cf6bbca6
1 file added
```
1. Let's also remove a file. We'll need to specify the file's full path (within the dataset, not locally) to remove it.
```bash
clearml-data remove --files data_samples/dancing.jpg
```
Expected response:
```bash
clearml-data - Dataset Management & Versioning CLI
Removing files/folder from dataset id 8b68686a4af040d081027ba3cf6bbca6
1 files removed
```
1. Close and finalize the dataset
```bash
clearml-data close
```
1. Let's take a look again at the files in the dataset:
```
clearml-data list --id 8b68686a4af040d081027ba3cf6bbca6
```
And we see that our changes have been made! `new_data.txt` has been added, and `dancing.jpg` has been removed.
```
file name | size | hash
------------------------------------------------------------------------------------------------------------------------------------------------
data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
new_data.txt | 15 | 6df986a2154902260a836febc5a32543f5337eac60560c57db99257a7e012051
picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
Total 5 files, 208302 bytes
```
By using `clearml-data`, a clear lineage is created for the data. As seen in this example, once a dataset is closed, the
only way to add or remove data is to create a new dataset with the previous dataset as its parent. This way, the data
is not reliant on the code and is reproducible.
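The lineage described above can be pictured as a diff between two file-hash manifests: a child dataset records only what changed relative to its parent. A toy illustration of the idea (not clearml internals):

```python
def diff_manifests(parent, child):
    """Compare two {file_name: content_hash} manifests, like two dataset versions."""
    added = [f for f in child if f not in parent]
    removed = [f for f in parent if f not in child]
    modified = [f for f in child if f in parent and child[f] != parent[f]]
    return added, removed, modified

parent = {'dancing.jpg': 'hash1', 'data.csv': 'hash2', 'sample.json': 'hash3'}
child = {'data.csv': 'hash2', 'sample.json': 'hash3b', 'new_data.txt': 'hash4'}
print(diff_manifests(parent, child))
# (['new_data.txt'], ['dancing.jpg'], ['sample.json'])
```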

---
title: PyTorch Distributed
---
The [pytorch_distributed_example.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_distributed_example.py)
script demonstrates integrating **ClearML** into code that uses the [PyTorch Distributed Communications Package](https://pytorch.org/docs/stable/distributed.html)
(`torch.distributed`).
The script initializes a main Task and spawns subprocesses, each for an instance of that Task.
The Task in each subprocess trains a neural network over a partitioned dataset (the torchvision built-in [MNIST](https://pytorch.org/vision/stable/datasets.html#mnist)
dataset), and reports (uploads) the following to the main Task:
* Artifacts - A dictionary containing different key-value pairs.
* Scalars - Loss reported as a scalar during training in each Task in a subprocess.
* Hyperparameters - Hyperparameters created in each Task are added to the hyperparameters in the main Task.
Each Task in a subprocess references the main Task by calling [Task.current_task](../../references/sdk/task#taskcurrent_task), which always returns
the main Task.
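Stripped of the ClearML and PyTorch specifics, this spawn-and-report structure is the standard Python multiprocessing pattern, roughly (a sketch, not the example's actual code):

```python
import multiprocessing as mp

def worker(rank, queue):
    # Each subprocess reports a result tagged with its rank, analogous to
    # each sub-Task uploading artifacts keyed by dist.get_rank()
    queue.put((rank, {'worker_rank': rank}))

def main(world_size=4):
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, queue)) for r in range(world_size)]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in procs)  # one report per subprocess
    for p in procs:
        p.join()
    return results

if __name__ == '__main__':
    print(sorted(main()))  # [0, 1, 2, 3]
```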
When the script runs, it creates an experiment named `test torch distributed`, which is associated with the `examples` project
in the **ClearML Web UI**.
## Artifacts
The example uploads a dictionary as an artifact in the main Task by calling the [Task.upload_artifact](../../references/sdk/task.md#upload_artifact)
method on [`Task.current_task`](../../references/sdk/task.md#taskcurrent_task) (the main Task). The dictionary contains the [`dist.rank`](https://pytorch.org/docs/stable/distributed.html#torch.distributed.get_rank)
of the subprocess, making each one unique.

```python
Task.current_task().upload_artifact(
    'temp {:02d}'.format(dist.get_rank()), artifact_object={'worker_rank': dist.get_rank()})
```
All of these artifacts appear in the main Task under **ARTIFACTS** **>** **OTHER**.
![image](../../img/examples_pytorch_distributed_example_09.png)
## Scalars
Loss is reported to the main Task by calling the [Logger.report_scalar](../../references/sdk/logger#report_scalar)
method on `Task.current_task().get_logger()`, which is the logger for the main Task. Since `Logger.report_scalar` is called
with the same title (`loss`), but a different series name (containing the subprocess' `rank`), all loss scalar series are
logged together.

```python
Task.current_task().get_logger().report_scalar(
    'loss', 'worker {:02d}'.format(dist.get_rank()), value=loss.item(), iteration=i)
```
The single scalar plot for loss appears in **RESULTS** **>** **SCALARS**.
![image](../../img/examples_pytorch_distributed_example_08.png)
## Hyperparameters
**ClearML** automatically logs the argparse command line options. Since the [Task.connect](../../references/sdk/task#connect)
method is called on `Task.current_task`, they are logged in the main Task. A different hyperparameter key is used in each
subprocess, so they do not overwrite each other in the main Task.
```python
param = {'worker_{}_stuff'.format(dist.get_rank()): 'some stuff ' + str(randint(0, 100))}
Task.current_task().connect(param)
```
All the hyperparameters appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS**.
![image](../../img/examples_pytorch_distributed_example_01.png)
![image](../../img/examples_pytorch_distributed_example_01a.png)
## Log
Output to the console, including the text messages printed from the main Task object and each subprocess, appears in **RESULTS** **>** **LOG**.
![image](../../img/examples_pytorch_distributed_example_06.png)

---
title: Subprocess
---
The [subprocess_example.py](https://github.com/allegroai/clearml/blob/master/examples/distributed/subprocess_example.py)
script demonstrates multiple subprocesses interacting and reporting to a main Task. The following happens in the script:
* This script initializes a main Task and spawns subprocesses, each for an instance of that Task.
* Each Task in a subprocess references the main Task by calling [Task.current_task](../../references/sdk/task#taskcurrent_task),
which always returns the main Task.
* The Task in each subprocess reports the following to the main Task:
* Hyperparameters - Additional, different hyperparameters.
* Log - Text logged to the console as the Task in each subprocess executes.
* When the script runs, it creates an experiment named `Popen example` which is associated with the `examples` project.
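Minus the ClearML reporting, the spawn pattern is plain `subprocess.Popen`; a rough sketch (the inline worker script stands in for the example's real one):

```python
import subprocess
import sys

# Stand-in worker script: each child prints a message tagged with its rank
child_code = "import sys; print('worker %s done' % sys.argv[1])"

procs = [
    subprocess.Popen([sys.executable, '-c', child_code, str(rank)],
                     stdout=subprocess.PIPE, text=True)
    for rank in range(3)
]
outputs = [p.communicate()[0].strip() for p in procs]
print(outputs)  # ['worker 0 done', 'worker 1 done', 'worker 2 done']
```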
## Hyperparameters
**ClearML** automatically logs the command line options defined with `argparse`. A parameter dictionary is logged by
connecting it to the Task using a call to the [Task.connect](../../references/sdk/task#connect) method.
```python
additional_parameters = {'stuff_' + str(randint(0, 100)): 'some stuff ' + str(randint(0, 100))}
Task.current_task().connect(additional_parameters)
```
Command line options appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **Args**.
![image](../../img/examples_subprocess_example_01.png)
Parameter dictionaries appear in **General**.
![image](../../img/examples_subprocess_example_01a.png)
## Log
Output to the console, including the text messages from the Task in each subprocess, appear in **RESULTS** **>** **LOG**.
![image](../../img/examples_subprocess_example_02.png)

---
title: Extra Docker Shell Script
---
When using `clearml-agent`, an agent recreates an entire execution environment, whether by pulling a Docker container or by
installing specified packages, and then executes the code on a remote machine. The agent takes into account required Python packages,
but sometimes, when using a Docker container, a user may need additional, non-Python tools.
## Tutorial
In this tutorial, we will learn how to use `extra_docker_shell_script`, which reconfigures an agent to execute
a shell script when a Docker container is started, but before an experiment is run.
## Prerequisites
* `clearml-agent` downloaded and configured - work on a machine which has access to the configuration file of the Agent
you want to configure
* Any code with a ClearML Task.
## Steps
1. Open your ClearML configuration file for editing. Depending upon your operating system, it is:
   * Linux - `~/clearml.conf`
   * Mac - `$HOME/clearml.conf`
   * Windows - `\User\<username>\clearml.conf`

   When you open the file, the first line should say: `# CLEARML-AGENT configuration file`
1. In the file, search for `extra_docker_shell_script:`, which is where we will put our extra script. If
   it is commented out, make sure to uncomment the line. We will use the example script that is already there: `["apt-get install -y bindfs", ]`.
1. Search for `docker_force_pull` in the document, and make sure that it is set to `true`, so that your Docker image will
   be updated.
1. Run the `clearml-agent` in Docker mode: `clearml-agent daemon --docker --queue default`. The agent will use the default
   CUDA/NVIDIA Docker image.
1. Enqueue any ClearML Task to the default queue, which the agent is now listening to. The agent pulls the Task and reproduces it,
   executing the `extra_docker_shell_script` that was put in the configuration file. Then the code is
   executed in the updated Docker container. If we look at the console output in the web UI, the third entry should start
   with `Executing: ['docker', 'run', '-t', '--gpus...'`, and towards the end of the entry, where the downloaded packages are
   mentioned, we can see the additional shell script `apt-get install -y bindfs`.
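Putting steps 2 and 3 together, the relevant part of the configuration file would look roughly like this (the exact surrounding structure in your `clearml.conf` may differ):

```
agent {
    # executed inside the Docker container before the experiment starts
    extra_docker_shell_script: ["apt-get install -y bindfs", ]

    # always pull the latest version of the Docker image
    docker_force_pull: true
}
```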

---
title: AutoKeras Imdb
---
The [autokeras_imdb_example.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/autokeras/autokeras_imdb_example.py) example
script demonstrates the integration of **ClearML** into code that uses [autokeras](https://github.com/keras-team/autokeras).
The script trains text classification networks on the Keras built-in [IMDB](https://keras.io/api/datasets/imdb/) dataset, using the autokeras [TextClassifier](https://autokeras.com/text_classifier/) class, and searches for the best model. It uses two TensorBoard callbacks, one for training and one for testing. **ClearML** automatically logs everything the code sends to TensorBoard. When the script runs, it creates an experiment named `autokeras imdb example with scalars`, which is associated with the `autokeras` project.
## Scalars
The loss and accuracy metric scalar plots appear in **RESULTS** **>** **SCALARS**, along with the resource utilization plots,
which are titled **:monitor: machine**.
![image](../../../img/examples_keras_14.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. They appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **TF_DEFINE**.
![image](../../../img/examples_keras_16.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../img/examples_keras_15.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the model info panel in the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_keras_18.png)
The model info panel contains the model details, including the model URL, framework, and snapshot locations.
![image](../../../img/examples_keras_17.png)

---
title: AutoKeras Integration
---
Integrate **ClearML** into code that uses [autokeras](https://github.com/keras-team/autokeras). Initialize a **ClearML**
Task in your code, and **ClearML** automatically logs scalars, plots, and images reported to TensorBoard, Matplotlib, Plotly,
and Seaborn, along with all other automatic logging and explicit reporting added to the code (see [Logging](../../../fundamentals/logger.md)).
**ClearML** allows you to:
* Visualize experiment results in the **ClearML Web UI**.
* Track and upload models.
* Track model performance and create tracking leaderboards.
* Rerun experiments, reproduce experiments on any target machine, and tune experiments.
* Compare experiments.
See the [AutoKeras](autokeras_imdb_example.md) example, which shows **ClearML** automatically logging:
* Scalars
* Hyperparameters
* The console log
* Models
Once these are logged, they can be visualized in the **ClearML Web UI**.
:::note
If you are not already using **ClearML**, see [Getting Started](/getting_started/ds/best_practices.md).
:::
## Adding ClearML to code
Add two lines of code:
```python
from clearml import Task
task = Task.init(project_name="myProject", task_name="myExperiment")
```
When the code runs, it initializes a Task in **ClearML Server**. A hyperlink to the experiment's log is output to the console.

```console
CLEARML Task: created new task id=c1f1dc6cf2ee4ec88cd1f6184344ca4e
CLEARML results page: https://app.clearml-master.hosted.allegro.ai/projects/1c7a45633c554b8294fa6dcc3b1f2d4d/experiments/c1f1dc6cf2ee4ec88cd1f6184344ca4e/output/log
```
Later in the code, define callbacks using TensorBoard, and **ClearML** logs TensorBoard scalars, histograms, and images.

---
title: Fastai
---
The [fastai_with_tensorboard.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/fastai/fastai_with_tensorboard.py)
example demonstrates the integration of **ClearML** into code that uses fastai and TensorBoard.
The example code does the following:
1. Trains a simple deep neural network on the fastai built-in MNIST dataset (see the [fast.ai](https://docs.fast.ai) documentation).
1. Uses the fastai `LearnerTensorboardWriter` callback, and **ClearML** automatically logs TensorBoard through the callback.
1. During script execution, creates an experiment named `fastai with tensorboard callback`, which is associated with the `examples` project.
## Scalars
**ClearML** automatically logs the scalar output to TensorBoard. The scalars appear in **RESULTS** **>** **SCALARS**.
![image](../../../img/examples_reporting_fastai_01.png)
## Plots
Histograms are output to TensorBoard. They appear in **RESULTS** **>** **PLOTS**.
![image](../../../img/examples_reporting_fastai_02.png)
## Logs
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../img/examples_reporting_fastai_03.png)

---
title: Keras with TensorBoard - Jupyter Notebook
---
The [ClearML_keras_TB_example.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/jupyter_keras_TB_example.ipynb)
example demonstrates **ClearML**'s automatic logging of code running in a Jupyter Notebook that uses Keras and TensorBoard.
The example script does the following:
1. Trains a simple deep neural network on the Keras built-in [MNIST](https://keras.io/api/datasets/mnist/#load_data-function)
dataset.
1. Builds a sequential model using a categorical crossentropy loss objective function.
1. Specifies accuracy as the metric, and uses two callbacks: a TensorBoard callback and a model checkpoint callback.
1. During script execution, creates an experiment named `Keras with TensorBoard example` which is associated with the `Colab notebooks` project.
:::note
In the ``clearml`` GitHub repository, this example includes a clickable icon to open the notebook in Google Colab.
:::
## Scalars
The loss and accuracy metric scalar plots appear in **RESULTS** **>** **SCALARS**, along with the resource utilization plots,
which are titled **:monitor: machine**.
![image](../../../img/examples_keras_01.png)
## Histograms
Histograms for layer density appear in **RESULTS** **>** **PLOTS**.
![image](../../../img/examples_keras_02.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions, which appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **TF_DEFINE**.
![image](../../../img/examples_keras_00a.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../img/keras_colab_01.png)
## Configuration objects
The configuration appears in **CONFIGURATIONS** **>** **CONFIGURATION OBJECTS** **>** **General**.
![image](../../../img/keras_colab_02.png)

---
title: Keras with Matplotlib - Jupyter Notebook
---
The [jupyter.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/jupyter.ipynb) example
demonstrates **ClearML**'s automatic logging of code running in a Jupyter Notebook that uses Keras and Matplotlib.
The example does the following:
1. Trains a simple deep neural network on the Keras built-in [MNIST](https://keras.io/api/datasets/mnist/#load_data-function)
dataset.
1. Builds a sequential model using a categorical crossentropy loss objective function.
1. Specifies accuracy as the metric, and uses two callbacks: a TensorBoard callback and a model checkpoint callback.
1. During script execution, creates an experiment named `notebook example` which is associated with the `examples` project.
## Scalars
The loss and accuracy metric scalar plots appear in **RESULTS** **>** **SCALARS**, along with the resource utilization plots, which are titled **:monitor: machine**.
![image](../../../img/examples_keras_jupyter_08.png)
## Plots
The example calls Matplotlib methods to create several sample plots, and TensorBoard methods to plot histograms for layer density.
They appear in **RESULTS** **>** **PLOTS**.
![image](../../../img/examples_keras_jupyter_03.png)
![image](../../../img/examples_keras_jupyter_03a.png)
![image](../../../img/examples_keras_jupyter_03b.png)
## Debug samples
The example calls Matplotlib methods to log debug sample images. They appear in **RESULTS** **>** **DEBUG SAMPLES**.
![image](../../../img/examples_keras_jupyter_04.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. In addition, a parameter dictionary is logged by connecting it to
the Task, using the [Task.connect](../../../references/sdk/task.md#connect) method.
```python
task_params = {'num_scatter_samples': 60, 'sin_max_value': 20, 'sin_steps': 30}
task_params = task.connect(task_params)
```
Later in the Jupyter Notebook, more parameters are added to the dictionary.
```python
task_params['batch_size'] = 128
task_params['nb_classes'] = 10
task_params['nb_epoch'] = 6
task_params['hidden_dim'] = 512
```
Parameter dictionaries appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **General**.
![image](../../../img/examples_keras_jupyter_20.png)
The TensorFlow Definitions appear in the **TF_DEFINE** subsection.
![image](../../../img/examples_keras_jupyter_21.png)
## Log
Text printed to the console for training appears in **RESULTS** **>** **LOG**.
![image](../../../img/examples_keras_jupyter_07.png)
## Artifacts
Model artifacts associated with the experiment appear in the experiment info panel (in the **EXPERIMENTS** tab), and in the model info panel (in the **MODELS** tab).
The experiment info panel shows model tracking, including the model name and design in **ARTIFACTS** **>** **Output Model**.
![image](../../../img/examples_keras_jupyter_23.png)
The model info panel contains the model details, including the model URL, framework, and snapshot locations.
![image](../../../img/examples_keras_jupyter_24.png)

---
title: Keras with TensorBoard
---
The [keras_tensorboard.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py)
example demonstrates the integration of **ClearML** into code which uses Keras and TensorBoard.
The example does the following:
1. Trains a simple deep neural network on the Keras built-in [MNIST](https://keras.io/api/datasets/mnist/#load_data-function)
dataset.
1. Builds a sequential model using a categorical crossentropy loss objective function.
1. Specifies accuracy as the metric, and uses two callbacks: a TensorBoard callback and a model checkpoint callback.
1. During script execution, it creates an experiment named `Keras with TensorBoard example` which is associated with the
`examples` project.
## Scalars
The loss and accuracy metric scalar plots appear in **RESULTS** **>** **SCALARS**, along with the resource utilization
plots, which are titled **:monitor: machine**.
![image](../../../img/examples_keras_01.png)
## Histograms
Histograms for layer density appear in **RESULTS** **>** **PLOTS**.
![image](../../../img/examples_keras_02.png)
## Hyperparameters
**ClearML** automatically logs command line options generated with `argparse`, and TensorFlow Definitions.
Command line options appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **Args**.
![image](../../../img/examples_keras_00.png)
TensorFlow Definitions appear in **TF_DEFINE**.
![image](../../../img/examples_keras_00a.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../img/examples_keras_03.png)
## Configuration objects
In the experiment code, a configuration dictionary is connected to the Task by calling the [Task.connect_configuration](../../../references/sdk/task.md#connect_configuration)
method.
```python
task.connect_configuration({'test': 1337, 'nested': {'key': 'value', 'number': 1}})
```
It appears in **CONFIGURATIONS** **>** **CONFIGURATION OBJECTS**.
![image](../../../img/examples_keras_00b.png)
---
title: Manual Model Upload
---
The [manual_model_upload.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/manual_model_upload.py)
example demonstrates **ClearML**'s tracking of a manually configured model created with Keras, including:
* Model checkpoints (snapshots)
* Hyperparameters
* Console output
When the script runs, it creates an experiment named `Model configuration and upload`, which is associated with the `examples` project.
Configure **ClearML** for model checkpoint (snapshot) storage in any of the following ways ([debug sample](../../../references/sdk/logger.md#set_default_upload_destination)
storage is different):
* In the configuration file, set [default_output_uri](../../../configs/clearml_conf.md#sdkdevelopment).
* In code, when [initializing a Task](../../../references/sdk/task.md#taskinit), use the `output_uri` parameter.
* In the **ClearML Web UI**, when [modifying an experiment](../../../webapp/webapp_exp_tuning.md#output-destination).
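For the first option, the relevant fragment of the configuration file would look roughly like this (the bucket URL is illustrative):

```
sdk {
    development {
        # default upload destination for model checkpoints (snapshots)
        default_output_uri: "s3://my-bucket/models"
    }
}
```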
## Configuration
This example shows two ways to connect a configuration, using the [Task.connect_configuration](../../../references/sdk/task.md#connect_configuration)
method.
* Connect a configuration file by providing the file's path. **ClearML Server** stores a copy of the file.
```python
# Connect a local configuration file
config_file = os.path.join('..', '..', 'reporting', 'data_samples', 'sample.json')
config_file = task.connect_configuration(config_file)
```
* Create a configuration dictionary and provide the dictionary.
```python
model_config_dict = {
'value': 13.37,
'dict': {'sub_value': 'string', 'sub_integer': 11},
'list_of_ints': [1, 2, 3, 4],
}
model_config_dict = task.connect_configuration(model_config_dict)
```
If the configuration changes, **ClearML** tracks it.
```python
model_config_dict['new value'] = 10
model_config_dict['value'] *= model_config_dict['new value']
```
The configuration appears in **CONFIGURATIONS** **>** **CONFIGURATION OBJECTS**.
![image](../../../img/examples_manual_model_upload_01.png)
## Artifacts
Model artifacts associated with the experiment appear in the experiment info panel (in the **EXPERIMENTS** tab), and in the model info panel (in the **MODELS** tab).
The experiment info panel shows model tracking, including the model name and design:
![image](../../../img/examples_manual_model_upload_02.png)
The model info panel contains the model details, including the model URL, framework, and snapshot locations.
![image](../../../img/examples_manual_model_upload_03.png)

---
title: Matplotlib - Jupyter Notebook
---
The [jupyter_matplotlib_example.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/matplotlib/jupyter_matplotlib_example.ipynb)
example demonstrates the integration of **ClearML** into code running in a Jupyter Notebook, which uses `matplotlib` to plot
scatter diagrams and show images. **ClearML** automatically logs the diagrams and images. When the script runs, ClearML
creates an experiment named `Matplotlib example` which is associated with the `Colab notebooks` project.
:::note
In the ``clearml`` GitHub repository, this example includes a clickable icon to open the notebook in Google Colab.
:::
## Plots
The scatter plots appear in the **ClearML Web UI**, in **RESULTS** **>** **PLOTS**.
![image](../../../img/examples_matplotlib_example_01.png)
![image](../../../img/examples_matplotlib_example_02.png)
![image](../../../img/examples_matplotlib_example_03.png)
## Debug samples
The images appear in **RESULTS** **>** **DEBUG SAMPLES**. Each debug sample image is associated with a metric.
![image](../../../img/examples_matplotlib_example_04.png)
View the debug sample in the image viewer.
![image](../../../img/examples_matplotlib_example_05.png)

---
title: Matplotlib
---
The [matplotlib_example.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/matplotlib/matplotlib_example.py)
example demonstrates integrating **ClearML** into code that uses `matplotlib` to plot scatter diagrams and show images.
**ClearML** automatically logs the diagrams and images. When the script runs, it creates an experiment named `Matplotlib example`,
which is associated with the `examples` project.
## Plots
The scatter plots appear in the **ClearML Web UI**, in **RESULTS** **>** **PLOTS**.
![image](../../../img/examples_matplotlib_example_01.png)
![image](../../../img/examples_matplotlib_example_02.png)
![image](../../../img/examples_matplotlib_example_03.png)
## Debug samples
The images appear in **RESULTS** **>** **DEBUG SAMPLES**. Each debug sample image is associated with a metric.
![image](../../../img/examples_matplotlib_example_04.png)
View the debug sample in the image viewer.
![image](../../../img/examples_matplotlib_example_05.png)

---
title: PyTorch Ignite Integration
---
Integrate **ClearML** into code that uses [ignite](https://github.com/pytorch/ignite) by using ignite's `ClearMLLogger` and the handlers that can be attached to it (see ignite's [handlers](https://github.com/pytorch/ignite/blob/master/ignite/contrib/handlers/trains_logger.py)).
:::note
If you are not already using **ClearML**, see our [Getting Started](/getting_started/ds/ds_first_steps.md).
:::
## Ignite ClearMLLogger
Integrate **ClearML** with the following steps:
1. Create an Ignite `ClearMLLogger` object. When the code runs, it connects to the **ClearML** backend and creates a Task (experiment) in **ClearML**:
```python
from ignite.contrib.handlers.clearml_logger import *
clearml_logger = ClearMLLogger(project_name="examples", task_name="ignite")
```
1. Later in the code, attach any of the **ClearML** handlers to the `ClearMLLogger` object.
For example, attach the `OutputHandler` and log training loss at each iteration:
```python
clearml_logger.attach(trainer,
log_handler=OutputHandler(tag="training",
output_transform=lambda loss: {"loss": loss}),
event_name=Events.ITERATION_COMPLETED)
```
### ClearMLLogger parameters
The following are the `ClearMLLogger` method parameters:
* `project_name` (optional[str]) The name of the project in which the experiment will be created. If the project does not exist, it is created. If `project_name` is `None`, the repository name becomes the project name.
* `task_name` (optional[str]) The name of Task (experiment). If `task_name` is `None`, the Python experiment scripts file name becomes the Task name.
* `task_type` (optional[str]) The type of the experiment.
The `task_type` values include:
* `TaskTypes.training` (default)
* `TaskTypes.train`
* `TaskTypes.testing`
* `TaskTypes.inference`
* `report_freq` (optional[int]) The histogram processing frequency (handles histogram values every X calls to the handler). Affects `GradsHistHandler` and `WeightsHistHandler`. Default value is `100`.
* `histogram_update_freq_multiplier` (optional[int]) The histogram report frequency (report first X histograms and once every X reports afterwards). Default value is `10`.
* `histogram_granularity` (optional[int]) Histogram sampling granularity. Default value is `50`.
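As a rough illustration of the naming defaults described above, here is a minimal plain-Python sketch (not the ClearML API) of how a missing `task_name` falls back to the experiment script's file name; `resolve_task_name` is a hypothetical helper for illustration only:

```python
import os

def resolve_task_name(task_name, script_path):
    # Mirrors the documented default: if task_name is None,
    # the Python experiment script's file name becomes the Task name.
    if task_name is not None:
        return task_name
    return os.path.splitext(os.path.basename(script_path))[0]

print(resolve_task_name(None, '/repo/examples/mnist_with_clearml_logger.py'))  # mnist_with_clearml_logger
print(resolve_task_name('ignite', '/repo/examples/mnist_with_clearml_logger.py'))  # ignite
```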
## Logging
### Ignite engine output and/or metrics
To log scalars from an Ignite engine's output and/or metrics, use the `OutputHandler`.
* Log training loss at each iteration:
```python
# Attach the logger to the trainer to log training loss at each iteration
clearml_logger.attach(trainer,
log_handler=OutputHandler(tag="training",
output_transform=lambda loss: {"loss": loss}),
event_name=Events.ITERATION_COMPLETED)
```
* Log metrics for training:
```python
# Attach the logger to the evaluator on the training dataset and log NLL, Accuracy metrics after each epoch
# We setup `global_step_transform=global_step_from_engine(trainer)` to take the epoch
# of the `trainer` instead of `train_evaluator`.
clearml_logger.attach(train_evaluator,
log_handler=OutputHandler(tag="training",
metric_names=["nll", "accuracy"],
global_step_transform=global_step_from_engine(trainer)),
event_name=Events.EPOCH_COMPLETED)
```
* Log metrics for validation:
```python
# Attach the logger to the evaluator on the validation dataset and log NLL, Accuracy metrics after
# each epoch. We setup `global_step_transform=global_step_from_engine(trainer)` to take the epoch of the
# `trainer` instead of `evaluator`.
clearml_logger.attach(evaluator,
log_handler=OutputHandler(tag="validation",
metric_names=["nll", "accuracy"],
global_step_transform=global_step_from_engine(trainer)),
event_name=Events.EPOCH_COMPLETED)
```
### Optimizer parameters
To log optimizer parameters, use `OptimizerParamsHandler`:
```python
# Attach the logger to the trainer to log optimizer's parameters, e.g., learning rate at each iteration
clearml_logger.attach(trainer,
log_handler=OptimizerParamsHandler(optimizer),
event_name=Events.ITERATION_STARTED)
```
### Model weights
To log model weights as scalars, use `WeightsScalarHandler`:
```python
# Attach the logger to the trainer to log model's weights norm after each iteration
clearml_logger.attach(trainer,
log_handler=WeightsScalarHandler(model, reduction=torch.norm),
event_name=Events.ITERATION_COMPLETED)
```
To log model weights as histograms, use `WeightsHistHandler`:
```python
# Attach the logger to the trainer to log model's weights norm after each iteration
clearml_logger.attach(trainer,
log_handler=WeightsHistHandler(model),
event_name=Events.ITERATION_COMPLETED)
```
## Model snapshots
To save model snapshots as **ClearML** artifacts, use `ClearMLSaver`:
```python
to_save = {"model": model}
handler = Checkpoint(to_save, ClearMLSaver(clearml_logger), n_saved=1,
score_function=lambda e: 123, score_name="acc",
filename_prefix="best",
global_step_transform=global_step_from_engine(trainer))
validation_evaluator.add_event_handler(Events.EVENT_COMPLETED, handler)
```
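The `score_function` above returns a constant (`123`) for demonstration. In practice, `Checkpoint` passes the engine to `score_function` and keeps the checkpoints with the highest scores, so a realistic version reads a metric from the engine's state. A minimal sketch, using a stand-in object instead of a real ignite engine:

```python
from types import SimpleNamespace

def score_function(engine):
    # Checkpoint keeps the n_saved checkpoints with the highest score,
    # so returning validation accuracy keeps the best-performing model.
    return engine.state.metrics['accuracy']

# Stand-in for an ignite engine, for illustration only
fake_engine = SimpleNamespace(state=SimpleNamespace(metrics={'accuracy': 0.93}))
print(score_function(fake_engine))  # 0.93
```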
## Visualizing experiment results
When the code with an ignite `ClearMLLogger` object and attached [handlers](https://github.com/pytorch/ignite/blob/master/ignite/contrib/handlers/trains_logger.py)
runs, the experiment results can be visualized in the **ClearML Web UI**.
The `ignite` repository contains an MNIST ClearMLLogger example, [mnist_with_clearml_logger.py](https://github.com/pytorch/ignite/blob/master/examples/contrib/mnist/mnist_with_clearml_logger.py).
Run this code and visualize the experiment results in the **ClearML Web UI**.
### Scalars
View the scalars, including training and validation metrics, in the experiment's page in the **ClearML Web UI**, under
**RESULTS** **>** **SCALARS**.
![image](../../../img/ignite_training.png)
![image](../../../img/ignite_validation.png)
### Model snapshots
To save model snapshots, use `ClearMLSaver`:
```python
handler = Checkpoint(
{"model": model},
ClearMLSaver(clearml_logger, dirname="~/.clearml/cache/"),
n_saved=1,
score_function=lambda e: 123,
score_name="acc",
filename_prefix="best",
global_step_transform=global_step_from_engine(trainer),
)
```
<br/>
View saved snapshots in the **ARTIFACTS** tab.
![image](../../../img/ignite_artifact.png)
To view the model, in the **ARTIFACTS** tab, click the model name (or download it).
![image](../../../img/ignite_model.png)

---
title: Manual Model Upload
---
The [manual_model_upload.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/manual_model_upload.py)
example demonstrates **ClearML**'s tracking of a manually configured model created with PyTorch, including model checkpoints
(snapshots), and output to the console. When the script runs, it creates an experiment named `Model configuration and upload`,
which is associated with the `examples` project.
Configure **ClearML** for model checkpoint (snapshot) storage in any of the following ways ([debug sample](../../../references/sdk/logger.md#set_default_upload_destination) storage is different):
* In the configuration file, set [default_output_uri](../../../configs/clearml_conf.md#sdkdevelopment).
* In code, when [initializing a Task](../../../references/sdk/task.md#taskinit), use the `output_uri` parameter.
* In the **ClearML Web UI**, when [modifying an experiment](../../../webapp/webapp_exp_tuning.md#output-destination).
## Configuration
This example shows two ways to connect a configuration, using the [Task.connect_configuration](../../../references/sdk/task.md#connect_configuration)
method.
* Connect a configuration file by providing the file's path. **ClearML Server** stores a copy of the file.
```python
# Connect a local configuration file
config_file = os.path.join('..', '..', 'reporting', 'data_samples', 'sample.json')
config_file = task.connect_configuration(config_file)
```
* Create a configuration dictionary and plug it into the method.
```python
model_config_dict = {
'value': 13.37,
'dict': {'sub_value': 'string', 'sub_integer': 11},
'list_of_ints': [1, 2, 3, 4],
}
model_config_dict = task.connect_configuration(model_config_dict)
```
If the configuration changes, **ClearML** tracks it.
```python
model_config_dict['new value'] = 10
model_config_dict['value'] *= model_config_dict['new value']
```
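Replaying that mutation on a plain dictionary shows the value ClearML would track (a standalone sketch with no ClearML calls):

```python
model_config_dict = {
    'value': 13.37,
    'dict': {'sub_value': 'string', 'sub_integer': 11},
    'list_of_ints': [1, 2, 3, 4],
}
# The same mutation the example performs after connect_configuration()
model_config_dict['new value'] = 10
model_config_dict['value'] *= model_config_dict['new value']
print(round(model_config_dict['value'], 2))  # 133.7
```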
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the info panel
in the **MODELS** tab.
The model info panel contains model details, including:
* Model design (which is also in the experiment info panel)
* Label enumeration
* Model URL
* Framework
* Snapshot locations.
### General model information
![image](../../../img/examples_pytorch_manual_model_upload_03.png)
### Model design
![image](../../../img/examples_pytorch_manual_model_upload_04.png)
### Label enumeration
Connect a label enumeration dictionary by calling the [Task.connect_label_enumeration](../../../references/sdk/task.md#connect_label_enumeration)
method.
```python
# store the label enumeration of the training model
labels = {'background': 0, 'cat': 1, 'dog': 2}
task.connect_label_enumeration(labels)
```
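Since the enumeration maps label names to integers, inverting it is a common companion step when decoding model outputs back to names (a plain-Python sketch, not part of the example script):

```python
labels = {'background': 0, 'cat': 1, 'dog': 2}

# Invert the enumeration to decode predicted class indices back to label names
index_to_label = {index: name for name, index in labels.items()}
predictions = [2, 1, 0]
print([index_to_label[p] for p in predictions])  # ['dog', 'cat', 'background']
```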
![image](../../../img/examples_pytorch_manual_model_upload_05.png)

---
title: Audio Classification - Jupyter Notebooks
---
The example [audio_classifier_UrbanSound8K.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/audio/audio_classifier_UrbanSound8K.ipynb) demonstrates integrating **ClearML** into a Jupyter Notebook which uses PyTorch, TensorBoard, and TorchVision to train a neural network on the UrbanSound8K dataset for audio classification. The example calls TensorBoard methods in training and testing to report scalars, audio debug samples, and spectrogram visualizations. The spectrogram visualizations are plotted by calling Matplotlib methods. The example also demonstrates connecting parameters to a Task and logging them. When the script runs, it creates an experiment named `audio classifier` which is associated with the `Audio Example` project.
## Scalars
The accuracy, learning rate, and training loss scalars are automatically logged, along with the resource utilization plots (titled **:monitor: machine**), and appear in **RESULTS** **>** **SCALARS**.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_03.png)
## Debug samples
The audio samples and spectrogram plots are automatically logged and appear in **RESULTS** **>** **DEBUG SAMPLES**.
### Audio samples
![image](../../../../../img/examples_audio_classification_UrbanSound8K_06.png)
By double-clicking a thumbnail, you can play an audio sample.
### Spectrogram visualizations
![image](../../../../../img/examples_audio_classification_UrbanSound8K_04.png)
By double-clicking a thumbnail, you can view a spectrogram plot in the image viewer.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_05.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. A parameter dictionary is logged by connecting it to the Task using
a call to the [Task.connect](../../../../../references/sdk/task.md#connect) method.
```python
configuration_dict = {'number_of_epochs': 10, 'batch_size': 4, 'dropout': 0.25, 'base_lr': 0.001}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
Parameter dictionaries appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **General**.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_01.png)
TensorFlow Definitions appear in the **TF_DEFINE** subsection.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_01a.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../../../img/examples_audio_classification_UrbanSound8K_02.png)

---
title: Audio Preprocessing - Jupyter Notebook
---
The example [audio_preprocessing_example.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/audio/audio_preprocessing_example.ipynb)
demonstrates integrating **ClearML** into a Jupyter Notebook which uses PyTorch and preprocesses audio samples. **ClearML** automatically logs the spectrogram visualizations reported by calling Matplotlib methods, and the audio samples reported by calling TensorBoard methods. The example also demonstrates connecting parameters to a Task and logging them. When the script runs, it creates an experiment named `data pre-processing`, which is associated with the `Audio Example` project.
## Plots
**ClearML** automatically logs the waveform which the example reports by calling a Matplotlib method. These appear in **RESULTS** **>** **PLOTS**.
![image](../../../../../img/examples_audio_preprocessing_example_08.png)
## Debug samples
**ClearML** automatically logs the audio samples which the example reports by calling TensorBoard methods, and the spectrogram visualizations reported by calling Matplotlib methods. They appear in **RESULTS** **>** **DEBUG SAMPLES**.
### Audio samples
You can play the audio samples by double clicking the audio thumbnail.
![image](../../../../../img/examples_audio_preprocessing_example_03.png)
### Spectrogram visualizations
![image](../../../../../img/examples_audio_preprocessing_example_06.png)
![image](../../../../../img/examples_audio_preprocessing_example_06a.png)
You can view the spectrogram visualizations in the **ClearML Web UI** image viewer.
![image](../../../../../img/examples_audio_preprocessing_example_07.png)

---
title: Image Hyperparameter Optimization - Jupyter Notebook
---
[hyperparameter_search.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/hyperparameter_search.ipynb)
demonstrates integrating **ClearML** into a Jupyter Notebook which performs automated hyperparameter optimization. This
is an example of **ClearML** [automation](../../../../../references/sdk/automation_controller_pipelinecontroller.md). It creates a **ClearML**
[HyperParameterOptimizer](../../../../../references/sdk/hpo_optimization_hyperparameteroptimizer.md)
object, which is a search controller. The search controller's search strategy optimizer is [OptimizerBOHB](../../../../../references/sdk/hpo_hpbandster_bandster_optimizerbohb.md).
The example maximizes total accuracy by finding an optimal batch size, base learning rate, and dropout. **ClearML**
automatically logs the optimization's top performing experiments.
The experiment whose hyperparameters are optimized is named `image_classification_CIFAR10`. It is created by running another
**ClearML** example, [image_classification_CIFAR10.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/image_classification_CIFAR10.ipynb), which must run before `hyperparameter_search.ipynb`.
When `hyperparameter_search.ipynb` runs, it creates an experiment named `Hyper-Parameter Optimization` which is associated
with the `Hyper-Parameter Search` project.
The optimizer Task, `Hyper-Parameter Optimization`, and the experiments appear individually in the **ClearML Web UI**.
## Optimizer Task
### Scalars
Scalars for total accuracy and remaining budget by iteration, and a plot of total accuracy by iteration appear in **RESULTS** **>** **SCALARS**. Remaining budget indicates the percentage of total iterations for all jobs left before that total is reached.
These scalars are reported automatically by **ClearML** from `HyperParameterOptimizer` when it runs.
![image](../../../../../img/examples_hyperparameter_search_04.png)
### Plots
A plot for the optimization of total accuracy by job appears in **RESULTS** **>** **PLOTS**.
This is also reported automatically by **ClearML** when `HyperParameterOptimizer` runs.
![image](../../../../../img/examples_hyperparameter_search_05.png)
### Hyperparameters
`HyperParameterOptimizer` hyperparameters, including the optimizer parameters appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS**.
These hyperparameters are those in the optimizer Task, where the `HyperParameterOptimizer` object is created.
```python
optimizer = HyperParameterOptimizer(
    base_task_id=TEMPLATE_TASK_ID,  # This is the experiment we want to optimize
    # here we define the hyper-parameters to optimize
    hyper_parameters=[
        UniformIntegerParameterRange('number_of_epochs', min_value=5, max_value=15, step_size=1),
        UniformIntegerParameterRange('batch_size', min_value=2, max_value=12, step_size=2),
        UniformParameterRange('dropout', min_value=0, max_value=0.5, step_size=0.05),
        UniformParameterRange('base_lr', min_value=0.0005, max_value=0.01, step_size=0.0005),
    ],
    # this is the objective metric we want to maximize/minimize
    objective_metric_title='accuracy',
    objective_metric_series='total',
    objective_metric_sign='max',  # maximize or minimize the objective metric
    max_number_of_concurrent_tasks=3,  # number of concurrent experiments
    # setting optimizer - clearml supports GridSearch, RandomSearch or OptimizerBOHB
    optimizer_class=OptimizerBOHB,  # can be replaced with GridSearch or RandomSearch
    execution_queue='default',  # queue to schedule the experiments for execution
    optimization_time_limit=30.,  # time limit for each experiment (optional, ignored by OptimizerBOHB)
    pool_period_min=1,  # check the experiments every x minutes
    # set the maximum number of experiments for the optimization.
    # OptimizerBOHB sets the total number of iterations as total_max_jobs * max_iteration_per_job
    total_max_jobs=12,
    # setting OptimizerBOHB configuration (ignored by other optimizers)
    min_iteration_per_job=15000,  # minimum number of iterations per experiment, until early stopping
    max_iteration_per_job=150000,  # maximum number of iterations per experiment
)
```
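To get a feel for the search space defined above, the integer ranges can be enumerated with plain Python. This is an illustration of the min/max/step semantics only, not the clearml API; whether the endpoint is included may differ in the actual implementation:

```python
def uniform_int_range(min_value, max_value, step_size):
    # Illustrative: candidate values from min_value to max_value (inclusive) in steps of step_size
    return list(range(min_value, max_value + 1, step_size))

print(uniform_int_range(2, 12, 2))  # batch_size candidates: [2, 4, 6, 8, 10, 12]
print(uniform_int_range(5, 15, 1))  # number_of_epochs candidates: 5 through 15
```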
![image](../../../../../img/examples_hyperparameter_search_01.png)
### Log
All console output from `Hyper-Parameter Optimization` appears in the **RESULTS** tab, **LOG** sub-tab.
![image](../../../../../img/examples_hyperparameter_search_03.png)
## Experiments comparison
**ClearML** automatically logs each job, meaning each experiment that executes with a set of hyperparameters, separately. Each appears as an individual experiment in the **ClearML Web UI**, where the Task name is `image_classification_CIFAR10` with the hyperparameter values appended.
For example:
`image_classification_CIFAR10: base_lr=0.0075 batch_size=12 dropout=0.05 number_of_epochs=6`
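The appended suffix is just the overridden hyperparameters rendered as `key=value` pairs. A sketch of how such a display name could be assembled (illustrative only, not clearml's actual naming code):

```python
base_name = 'image_classification_CIFAR10'
params = {'base_lr': 0.0075, 'batch_size': 12, 'dropout': 0.05, 'number_of_epochs': 6}

# Render the overridden hyperparameters as a "key=value" suffix, sorted for stability
suffix = ' '.join('{}={}'.format(k, v) for k, v in sorted(params.items()))
print('{}: {}'.format(base_name, suffix))
# image_classification_CIFAR10: base_lr=0.0075 batch_size=12 dropout=0.05 number_of_epochs=6
```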
Use the **ClearML Web UI** [experiment comparison](../../../../../webapp/webapp_exp_comparing.md) to visualize the following:
* Side by side hyperparameter value comparison
* Metric comparison by hyperparameter
* Scalars by specific values and series
* Plots
* Debug images
### Side by side hyperparameter value comparison
In the experiment comparison window, **HYPER PARAMETERS** tab, select **Values** in the list (to the right of **+ Add Experiment**), and hyperparameter differences appear with a different background color.
![image](../../../../../img/examples_hyperparameter_search_06.png)
### Metric comparison by hyperparameter
Select **Parallel Coordinates** in the list, click a **Performance Metric**, and then select the checkboxes of the hyperparameters.
![image](../../../../../img/examples_hyperparameter_search_07.png)
### Scalar values comparison
In the **SCALARS** tab, select **Last Values**, **Min Values**, or **Max Values**. Value differences appear with a different background color.
![image](../../../../../img/examples_hyperparameter_search_09.png)
### Scalar series comparison
Select **Graph**, and the scalar series for the jobs appear, where each scalar plot shows the series for all jobs.
![image](../../../../../img/examples_hyperparameter_search_08.png)
### Debug samples comparison
In the **DEBUG SAMPLES** tab, debug images appear.
![image](../../../../../img/examples_hyperparameter_search_10.png)

---
title: Image Classification - Jupyter Notebook
---
The example [image_classification_CIFAR10.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/image/image_classification_CIFAR10.ipynb)
demonstrates integrating **ClearML** into a Jupyter Notebook, which uses PyTorch, TensorBoard, and TorchVision to train a
neural network on the CIFAR10 dataset for image classification. **ClearML** automatically logs the example script's
calls to TensorBoard methods in training and testing which report scalars and image debug samples, as well as the model
and console log. In the example, we also demonstrate connecting parameters to a Task and logging them. When the script runs,
it creates an experiment named `image_classification_CIFAR10` which is associated with the `Image Example` project.
Another example optimizes the hyperparameters for this image classification example (see the [Hyperparameter Optimization - Jupyter Notebook](hyperparameter_search.md) documentation page). This image classification example must run before the hyperparameter optimization example.
## Scalars
The accuracy, accuracy per class, and training loss scalars are automatically logged, along with the resource utilization plots (titled **:monitor: machine**), and appear in **RESULTS** **>** **SCALARS**.
![image](../../../../../img/examples_image_classification_CIFAR10_05.png)
## Debug samples
The image samples are automatically logged and appear in **RESULTS** **>** **DEBUG SAMPLES**.
![image](../../../../../img/examples_image_classification_CIFAR10_07.png)
By double-clicking a thumbnail, you can view an image in the image viewer.
![image](../../../../../img/examples_image_classification_CIFAR10_06.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. A parameter dictionary is logged by connecting it to the Task using
a call to the [Task.connect](../../../../../references/sdk/task.md#connect) method.
```python
configuration_dict = {'number_of_epochs': 3, 'batch_size': 4, 'dropout': 0.25, 'base_lr': 0.001}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
Parameter dictionaries appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **General**.
![image](../../../../../img/examples_image_classification_CIFAR10_01.png)
TensorFlow Definitions appear in the **TF_DEFINE** subsection.
![image](../../../../../img/examples_image_classification_CIFAR10_01a.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../../../img/examples_image_classification_CIFAR10_04.png)

---
title: Tabular Data Downloading and Preprocessing - Jupyter Notebook
---
The [download_and_preprocessing.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/download_and_preprocessing.ipynb) example demonstrates **ClearML** storing preprocessed tabular data as artifacts, and explicitly reporting the tabular data in the **ClearML Web UI**. When the script runs, it creates an experiment named `tabular preprocessing` which is associated with the `Table Example` project.
This tabular data is prepared for another script, [train_tabular_predictor.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/train_tabular_predictor.ipynb), which trains a network with it.
## Artifacts
The example code preprocesses the downloaded data using Pandas DataFrames, and stores it as three artifacts:
* `Categories per column` - Number of unique values per column of data.
* `Outcome dictionary` - Label enumeration for training.
* `Processed data` - A dictionary containing the paths of the training and validation data.
Each artifact is uploaded by calling the [Task.upload_artifact](../../../../../references/sdk/task.md#upload_artifact)
method. Artifacts appear in the **ARTIFACTS** tab.
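As a rough illustration of what the `Categories per column` artifact contains, counting unique values per column can be done with plain Python (the rows below are hypothetical sample data; the example itself uses Pandas DataFrames):

```python
rows = [
    {'animal_type': 'Cat', 'color': 'Black', 'age': 2},
    {'animal_type': 'Dog', 'color': 'Brown', 'age': 2},
    {'animal_type': 'Dog', 'color': 'White', 'age': 5},
]

# Number of unique values per column, analogous to the artifact's content
categories_per_column = {col: len({row[col] for row in rows}) for col in rows[0]}
print(categories_per_column)  # {'animal_type': 2, 'color': 3, 'age': 2}
```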
![image](../../../../../img/download_and_preprocessing_02.png)
## Plots (tables)
The example code explicitly reports the data in Pandas DataFrames by calling the [Logger.report_table](../../../../../references/sdk/logger.md#report_table)
method.
For example, the raw data is read into a Pandas DataFrame named `train_set`, and the `head` of the DataFrame is reported.
```python
train_set = pd.read_csv(Path(path_to_ShelterAnimal) / 'train.csv')
Logger.current_logger().report_table(title='ClearMLet - raw', series='pandas DataFrame', iteration=0, table_plot=train_set.head())
```
The tables appear in **RESULTS** **>** **PLOTS**.
![image](../../../../../img/download_and_preprocessing_07.png)
## Hyperparameters
A parameter dictionary is logged by connecting it to the Task using a call to the [Task.connect](../../../../../references/sdk/task.md#connect)
method.
```python
logger = task.get_logger()
configuration_dict = {'test_size': 0.1, 'split_random_state': 0}
configuration_dict = task.connect(configuration_dict)
```
Parameter dictionaries appear in the **General** subsection.
![image](../../../../../img/download_and_preprocessing_01.png)
## Log
Output to the console appears in **RESULTS** **>** **LOG**.
![image](../../../../../img/download_and_preprocessing_06.png)

---
title: Tabular Data Pipeline with Concurrent Steps - Jupyter Notebook
---
This example demonstrates an ML pipeline which preprocesses data in two concurrent steps, trains two networks, where each
network's training depends upon the completion of its own preprocessed data, and picks the best model. It is implemented
using the [automation.controller.PipelineController](../../../../../references/sdk/automation_controller_pipelinecontroller.md)
class.
The pipeline uses four Tasks (each Task is created using a different notebook):
* The pipeline controller Task ([tabular_ml_pipeline.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/tabular_ml_pipeline.ipynb))
* A data preprocessing Task ([preprocessing_and_encoding.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/preprocessing_and_encoding.ipynb))
* A training Task ([train_tabular_predictor.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/train_tabular_predictor.ipynb))
* A better model comparison Task ([pick_best_model.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/pick_best_model.ipynb))
The `automation.controller.PipelineController` class includes functionality to create a pipeline controller, add steps to the pipeline, pass data from one step to another, control the dependencies of a step beginning only after other steps complete, run the pipeline, wait for it to complete, and cleanup afterwards.
In this pipeline example, the data preprocessing Task and training Task are each added to the pipeline twice (each is in two steps). When the pipeline runs, the data preprocessing Task and training Task are cloned twice, and the newly cloned Tasks execute. The Task they are cloned from, called the base Task, does not execute. The pipeline controller passes different data to each cloned Task by overriding parameters. In this way, the same Task can run more than once in the pipeline, but with different data.
:::note
The data download Task is not a step in the pipeline, see [download_and_split](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/download_and_split.ipynb).
:::
## Pipeline controller and steps
In this example, a pipeline controller object is created.
```python
pipe = PipelineController(default_execution_queue='dan_queue', add_pipeline_tags=True)
```
### Preprocessing step
Two preprocessing nodes are added to the pipeline. These steps will run concurrently.
```python
pipe.add_step(name='preprocessing_1', base_task_project='Tabular Example', base_task_name='tabular preprocessing',
              parameter_override={'General/data_task_id': '39fbf86fc4a341359ac6df4aa70ff91b',
                                  'General/fill_categorical_NA': 'True',
                                  'General/fill_numerical_NA': 'True'})
pipe.add_step(name='preprocessing_2', base_task_project='Tabular Example', base_task_name='tabular preprocessing',
              parameter_override={'General/data_task_id': '39fbf86fc4a341359ac6df4aa70ff91b',
                                  'General/fill_categorical_NA': 'False',
                                  'General/fill_numerical_NA': 'True'})
```
The preprocessing data Task fills in values of `NaN` data based on the values of the parameters named `fill_categorical_NA`
and `fill_numerical_NA`. It will connect a parameter dictionary to the Task which contains keys with those same names.
The pipeline will override the values of those keys when the pipeline executes the cloned Tasks of the base Task. In this way,
two sets of data are created in the pipeline.
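Conceptually, each override key of the form `General/<name>` replaces the matching key in the cloned Task's connected parameter dictionary. A simplified sketch of that merge (plain Python, not the PipelineController internals):

```python
def apply_override(connected_config, parameter_override):
    # Strip the "General/" section prefix and overwrite matching keys
    merged = dict(connected_config)
    for key, value in parameter_override.items():
        merged[key.split('/', 1)[1]] = value
    return merged

base = {'data_task_id': '', 'fill_categorical_NA': 'True', 'fill_numerical_NA': 'True'}
override = {'General/fill_categorical_NA': 'False'}
print(apply_override(base, override))  # fill_categorical_NA becomes 'False'
```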
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">ClearML tracks and reports the preprocessing step</summary>
<div className="cml-expansion-panel-content">
In the preprocessing data Task, the parameter values in ``data_task_id``, ``fill_categorical_NA``, and ``fill_numerical_NA`` are overridden.
```python
configuration_dict = {'data_task_id': '39fbf86fc4a341359ac6df4aa70ff91b',
                      'fill_categorical_NA': True, 'fill_numerical_NA': True}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
**ClearML** tracks and reports each instance of the preprocessing Task.
The raw data appears as a table in **RESULTS** **>** **PLOTS**.
These images are from one of the two preprocessing Tasks.
![image](../../../../../img/preprocessing_and_encoding_02.png)
The data after filling NA values is also reported.
![image](../../../../../img/preprocessing_and_encoding_03.png)
After an outcome dictionary (label enumeration) is created, it appears in **ARTIFACTS** **>** **OTHER** **>** **Outcome Dictionary**.
![image](../../../../../img/preprocessing_and_encoding_04.png)
The training and validation data are labeled with the encoding and reported as tables.
![image](../../../../../img/preprocessing_and_encoding_05.png)
The column categories are created and uploaded as artifacts, which appear in **ARTIFACTS** **>** **OTHER** **>** **Categories per Column**.
![image](../../../../../img/preprocessing_and_encoding_06.png)
Finally, the training data and validation data are stored as artifacts.
![image](../../../../../img/preprocessing_and_encoding_07.png)
</div>
</details>
### Training step
Each training node depends upon the completion of one preprocessing node. The parameter `parents` is a list of step names indicating all steps that must complete before the new step starts. In this case, `preprocessing_1` must complete before `train_1` begins, and `preprocessing_2` must complete before `train_2` begins.
The ID of a Task whose artifact contains a set of preprocessed data for training will be overridden using the `data_task_id` key. Its value takes the form `${<stage-name>.<part-of-Task>}`. In this case, `${preprocessing_1.id}` is the ID of one of the preprocessing node Tasks. In this way, each training Task consumes its own set of data.
```python
pipe.add_step(name='train_1', parents=['preprocessing_1'],
              base_task_project='Tabular Example', base_task_name='tabular prediction',
              parameter_override={'General/data_task_id': '${preprocessing_1.id}'})
pipe.add_step(name='train_2', parents=['preprocessing_2'],
              base_task_project='Tabular Example', base_task_name='tabular prediction',
              parameter_override={'General/data_task_id': '${preprocessing_2.id}'})
```
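A simplified sketch of how a `${<stage-name>.<part-of-Task>}` reference could be resolved once the parent step has run (illustrative only, not the actual PipelineController implementation):

```python
import re

def resolve_references(overrides, completed_steps):
    # Replace ${<step>.<field>} with the corresponding value from a finished step
    def substitute(match):
        step_name, field = match.group(1), match.group(2)
        return str(completed_steps[step_name][field])
    return {key: re.sub(r'\$\{(\w+)\.(\w+)\}', substitute, value)
            for key, value in overrides.items()}

completed = {'preprocessing_1': {'id': 'aaaa1111bbbb2222'}}
overrides = {'General/data_task_id': '${preprocessing_1.id}'}
print(resolve_references(overrides, completed))  # {'General/data_task_id': 'aaaa1111bbbb2222'}
```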
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">ClearML tracks and reports the training step</summary>
<div className="cml-expansion-panel-content">
In the training Task, the ``data_task_id`` parameter value is overridden. This allows the pipeline controller to pass a
different Task ID to each instance of training, where each Task has an artifact containing different data.
```python
configuration_dict = {'data_task_id': 'b605d76398f941e69fc91b43420151d2',
                      'number_of_epochs': 15, 'batch_size': 100, 'dropout': 0.3, 'base_lr': 0.1}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
**ClearML** tracks and reports the training step with each instance of the newly cloned and executed training Task.
**ClearML** automatically logs the training loss and learning rate. They appear in **RESULTS** **>** **SCALARS**.
The following images show one of the two training Tasks.
![image](../../../../../img/train_tabular_predictor_04.png)
Parameter dictionaries appear in the **General** subsection.
![image](../../../../../img/train_tabular_predictor_01.png)
The TensorFlow Definitions appear in the **TF_DEFINE** subsection.
![image](../../../../../img/train_tabular_predictor_02.png)
</div>
</details>
### Best model step
The best model step depends upon both training nodes completing, and uses the two training node Task IDs as its parameter override.
```python
pipe.add_step(name='pick_best', parents=['train_1', 'train_2'],
              base_task_project='Tabular Example', base_task_name='pick best model',
              parameter_override={'General/train_tasks_ids': '[${train_1.id}, ${train_2.id}]'})
```
The IDs of the training Tasks from the steps named `train_1` and `train_2` are passed to the best model Task. They take the form `${<stage-name>.<part-of-Task>}`.
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">ClearML tracks and reports the best model step</summary>
<div className="cml-expansion-panel-content">
In the best model Task, the `train_tasks_ids` parameter is overridden with the Task IDs of the two training tasks.
```python
configuration_dict = {'train_tasks_ids': ['c9bff3d15309487a9e5aaa00358ff091', 'c9bff3d15309487a9e5aaa00358ff091']}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
The log shows the Task ID and accuracy of the best model in **RESULTS** **>** **LOGS**.
![image](../../../../../img/tabular_training_pipeline_02.png)
**ARTIFACTS** **>** **Output Model** contains a link to the model details.
![image](../../../../../img/tabular_training_pipeline_03.png)
The model details appear in the **MODELS** table **>** **GENERAL**.
![image](../../../../../img/tabular_training_pipeline_04.png)
</div>
</details>
### Pipeline start, wait, and cleanup
Once all steps are added to the pipeline, start it, wait for it to complete, and finally clean up the pipeline processes.
```python
# Starting the pipeline (in the background)
pipe.start()
# Wait until pipeline terminates
pipe.wait()
# cleanup everything
pipe.stop()
```
<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">ClearML tracks and reports the pipeline's execution</summary>
<div className="cml-expansion-panel-content">
ClearML reports the pipeline with its steps in **RESULTS** **>** **PLOTS**.
![image](../../../../../img/tabular_training_pipeline_01.png)
By hovering over a step or path between nodes, you can view information about it.
![image](../../../../../img/tabular_training_pipeline_06.png)
</div>
</details>
## Running the pipeline
**To run the pipeline:**
1. Download the data by running the notebook [download_and_split.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/download_and_split.ipynb).
1. Run the script for each of the steps, if it has not already run once.
* [preprocessing_and_encoding.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/preprocessing_and_encoding.ipynb)
* [train_tabular_predictor.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/train_tabular_predictor.ipynb)
* [pick_best_model.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/pick_best_model.ipynb).
1. Run the pipeline controller in one of the following two ways:
* Run the notebook [tabular_ml_pipeline.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/table/tabular_ml_pipeline.ipynb).
* Remotely execute the Task - if the Task `tabular training pipeline`, which is associated with the project `Tabular Example`, already exists in **ClearML Server**, clone it and enqueue it to execute.
:::note
If you enqueue a Task, a worker must be listening to that queue for the Task to execute.
:::
---
title: Text Classification - Jupyter Notebook
---
The example [text_classification_AG_NEWS.ipynb](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/notebooks/text/text_classification_AG_NEWS.ipynb)
demonstrates the integration of **ClearML** into a Jupyter Notebook that trains a network to classify text in the `torchtext` [AG_NEWS](https://pytorch.org/text/stable/datasets.html#ag-news) dataset, and then applies the model to predict the classification of sample text. **ClearML** automatically logs the scalars and console output produced by the TensorBoard method calls. In the example, parameters are explicitly logged by connecting them to the Task. When the script runs, it creates an experiment named `text classifier`, which is associated with the `Text Example` project.
## Scalars
Accuracy, learning rate, and training loss appear in **RESULTS** **>** **SCALARS**, along with the resource utilization plots, which are titled **:monitor: machine**.
![image](../../../../../img/text_classification_AG_NEWS_03.png)
## Hyperparameters
**ClearML** automatically logs the command line options, because the example code uses `argparse`. A parameter dictionary
is logged by connecting it to the Task using a call to the [Task.connect](../../../../../references/sdk/task.md#connect)
method.
```python
configuration_dict = {'number_of_epochs': 6, 'batch_size': 16, 'ngrams': 2, 'base_lr': 1.0}
configuration_dict = task.connect(configuration_dict)  # enabling configuration override by clearml
```
Command line options appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **Args**.
![image](../../../../../img/text_classification_AG_NEWS_01.png)
Parameter dictionaries appear in the **General** subsection.
![image](../../../../../img/text_classification_AG_NEWS_01a.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../../../img/text_classification_AG_NEWS_02.png)
---
title: PyTorch Distributed
---
The [pytorch_distributed_example.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_distributed_example.py)
script demonstrates integrating **ClearML** into code that uses the [PyTorch Distributed Communications Package](https://pytorch.org/docs/stable/distributed.html)
(`torch.distributed`).
The script does the following:
1. It initializes a main Task and spawns subprocesses, each for an instance of that Task.
1. The Task in each subprocess trains a neural network over a partitioned dataset (the torchvision built-in [MNIST](https://pytorch.org/vision/stable/datasets.html#mnist)
dataset), and reports the following to the main Task:
* Artifacts - A dictionary containing different key-value pairs is uploaded from the Task in each subprocess to the main Task.
* Scalars - Loss reported as a scalar during training in each subprocess Task is logged in the main Task.
* Hyperparameters - Hyperparameters created in each subprocess Task are added to the main Task's hyperparameters.
Each Task in a subprocess references the main Task by calling [Task.current_task](../../../references/sdk/task.md#taskcurrent_task),
which always returns the main Task.
1. When the script runs, it creates an experiment named `test torch distributed` which is associated with the `examples` project in the **ClearML Web UI**.
### Artifacts
The example uploads a dictionary as an artifact in the main Task by calling the [Task.upload_artifact](../../../references/sdk/task.md#upload_artifact)
method on `Task.current_task()` (the main Task). The dictionary contains the `dist.get_rank()` of the subprocess, making each artifact unique.
```python
Task.current_task().upload_artifact(
    'temp {:02d}'.format(dist.get_rank()), artifact_object={'worker_rank': dist.get_rank()})
```
All of these artifacts appear in the main Task, **ARTIFACTS** **>** **OTHER**.
![image](../../../img/examples_pytorch_distributed_example_09.png)
## Scalars
We report loss to the main Task by calling the [Logger.report_scalar](../../../references/sdk/logger.md#report_scalar) method on `Task.current_task().get_logger()`, which returns the logger for the main Task. Since we call `Logger.report_scalar` with the same title (`loss`) but a different series name (containing the subprocess's `rank`), all loss scalar series are logged together.
```python
Task.current_task().get_logger().report_scalar(
    'loss', 'worker {:02d}'.format(dist.get_rank()), value=loss.item(), iteration=i)
```
The single scalar plot for loss appears in **RESULTS** **>** **SCALARS**.
![image](../../../img/examples_pytorch_distributed_example_08.png)
## Hyperparameters
**ClearML** automatically logs the command line options defined using `argparse`.
A parameter dictionary is logged by connecting it to the Task using a call to the [Task.connect](../../../references/sdk/task.md#connect)
method.
```python
param = {'worker_{}_stuff'.format(dist.get_rank()): 'some stuff ' + str(randint(0, 100))}
Task.current_task().connect(param)
```
Command line options appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **Args**.
![image](../../../img/examples_pytorch_distributed_example_01.png)
Parameter dictionaries appear in the **General** section of **HYPER PARAMETERS**.
```python
param = {'worker_{}_stuff'.format(dist.get_rank()): 'some stuff ' + str(randint(0, 100))}
Task.current_task().connect(param)
```
![image](../../../img/examples_pytorch_distributed_example_02.png)
## Log
Output to the console, including the text messages printed from the main Task object and each subprocess, appears in **RESULTS** **>** **LOG**.
![image](../../../img/examples_pytorch_distributed_example_06.png)
---
title: PyTorch with Matplotlib
---
The [pytorch_matplotlib.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_matplotlib.py)
example demonstrates the integration of **ClearML** into code that uses PyTorch and Matplotlib.
The example does the following:
1. The script calls Matplotlib methods to show images, each with a different title.
1. **ClearML** automatically logs the images as debug samples.
1. When the script runs, it creates an experiment named `pytorch with matplotlib example`, which is associated with the
`examples` project.
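The pattern the example relies on can be sketched as follows. This is a minimal illustration, not the example script itself: the figure contents and titles are invented, and the `Task.init` call is shown commented out because it requires a configured ClearML environment. With a Task active, ClearML hooks Matplotlib so that each displayed figure is captured as a debug sample:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# from clearml import Task
# task = Task.init(project_name='examples', task_name='pytorch with matplotlib example')

# Show several images, each with a different title; with a Task initialized,
# ClearML intercepts the Matplotlib calls and logs each figure as a debug sample.
for i in range(3):
    plt.figure()
    plt.imshow(np.random.rand(16, 16, 3))
    plt.title('random image {}'.format(i))

figure_count = len(plt.get_fignums())
```

The titles become the debug samples' metric/variant names in the UI, which is why each figure in the example gets a distinct title.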
The images shown in the example script's `imshow` function appear according to metric in **RESULTS** **>** **DEBUG SAMPLES**.
![image](../../../img/examples_pytorch_matplotlib_02.png)
Select a debug sample by metric.
![image](../../../img/examples_pytorch_matplotlib_02a.png)
Open the debug sample in the image viewer.
![image](../../../img/examples_pytorch_matplotlib_02b.png)
---
title: PyTorch MNIST
---
The [pytorch_mnist.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py) example
demonstrates the integration of **ClearML** into code that uses PyTorch.
The example script does the following:
* Trains a simple deep neural network on the PyTorch built-in [MNIST](https://pytorch.org/docs/stable/torchvision/datasets.html#mnist)
dataset.
* Uses **ClearML** automatic logging.
* Calls the [Logger.report_scalar](../../../references/sdk/logger.md#report_scalar) method to demonstrate explicit reporting,
which allows adding customized reporting to the code.
* Creates an experiment named `pytorch mnist train`, which is associated with the `examples` project.
## Scalars
In the example script's `train` function, the following code explicitly reports scalars to **ClearML**:
```python
Logger.current_logger().report_scalar(
"train", "loss", iteration=(epoch * len(train_loader) + batch_idx), value=loss.item())
```
In the `test` method, the code explicitly reports `loss` and `accuracy` scalars.
```python
Logger.current_logger().report_scalar(
"test", "loss", iteration=epoch, value=test_loss)
Logger.current_logger().report_scalar(
"test", "accuracy", iteration=epoch, value=(correct / len(test_loader.dataset)))
```
These scalars can be visualized in plots, which appear in the **ClearML web UI**, in the experiment's
page **>** **RESULTS** **>** **SCALARS**.
![image](../../../img/examples_pytorch_mnist_07.png)
## Hyperparameters
**ClearML** automatically logs command line options defined with `argparse`. They appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **Args**.
![image](../../../img/examples_pytorch_mnist_01.png)
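The mechanism can be sketched as follows (the flag names below are illustrative, not necessarily the script's actual ones). As long as `Task.init` is called before the arguments are parsed, ClearML records whatever `argparse` defines, including the defaults:

```python
import argparse

# from clearml import Task
# task = Task.init(project_name='examples', task_name='pytorch mnist train')

# Illustrative flags; with a Task active, ClearML records these under Args.
parser = argparse.ArgumentParser(description='illustrative training flags')
parser.add_argument('--batch-size', type=int, default=64)
parser.add_argument('--epochs', type=int, default=10)
parser.add_argument('--lr', type=float, default=0.01)
args = parser.parse_args([])  # parse defaults for the sketch
```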
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../img/examples_pytorch_mnist_06.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in
the info panel of the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_pytorch_mnist_02.png)
The model info panel contains the model details, including:
* Model URL
* Framework
* Snapshot locations.
![image](../../../img/examples_pytorch_mnist_03.png)
---
title: PyTorch with TensorBoard
---
The [pytorch_tensorboard.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py)
example demonstrates the integration of **ClearML** into code that uses PyTorch and TensorBoard.
The example does the following:
* Trains a simple deep neural network on the PyTorch built-in [MNIST](https://pytorch.org/vision/stable/datasets.html#mnist)
dataset.
* Creates a TensorBoard `SummaryWriter` object to log:
* Scalars during training.
* Scalars and debug samples during testing.
* A test text message to the console (a test message to demonstrate **ClearML**'s automatic logging).
* Creates an experiment named `pytorch with tensorboard`, which is associated with the `examples` project.
## Scalars
In the example script, the `train` and `test` functions call the TensorBoard `SummaryWriter.add_scalar` method to log loss.
These scalars, along with the resource utilization plots, which are titled **:monitor: machine**, appear in the experiment's page in the **ClearML web UI** under **RESULTS** **>** **SCALARS**.
![image](../../../img/examples_pytorch_tensorboard_07.png)
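The logging call follows the standard `SummaryWriter` pattern, sketched here with dummy loss values rather than the script's actual training loop; with a Task initialized earlier in the process, ClearML picks these scalars up automatically:

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Write TensorBoard events to a temporary directory for this sketch
logdir = tempfile.mkdtemp()
writer = SummaryWriter(logdir)

# Dummy losses standing in for the train/test loops' real values
for step, loss in enumerate([0.9, 0.5, 0.3]):
    writer.add_scalar('train/loss', loss, step)
writer.close()

event_files = os.listdir(logdir)  # TensorBoard event file(s) written here
```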
## Debug samples
**ClearML** automatically tracks images and text output to TensorFlow. They appear in **RESULTS** **>** **DEBUG SAMPLES**.
![image](../../../img/examples_pytorch_tensorboard_08.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. They appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **TF_DEFINE**.
![image](../../../img/examples_pytorch_tensorboard_01.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../img/examples_pytorch_tensorboard_06.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the info panel
of the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_pytorch_tensorboard_02.png)
The model info panel contains the model details, including:
* Model URL
* Framework
* Snapshot locations.
![image](../../../img/examples_pytorch_tensorboard_03.png)
---
title: PyTorch TensorBoardX
---
The [pytorch_tensorboardX.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorboardx/pytorch_tensorboardX.py)
example demonstrates the integration of **ClearML** into code that uses PyTorch and TensorBoardX.
The example does the following:
* Trains a simple deep neural network on the PyTorch built-in [MNIST](https://pytorch.org/vision/stable/datasets.html#mnist)
dataset.
* Creates a TensorBoardX `SummaryWriter` object to log:
* Scalars during training
* Scalars and debug samples during testing
* A test text message to the console (a test message to demonstrate **ClearML** automatic logging).
* Creates an experiment named `pytorch with tensorboardX`, which is associated with the `examples` project in the **ClearML Web UI**.
## Scalars
The loss and accuracy metric scalar plots, along with the resource utilization plots, which are titled **:monitor: machine**,
appear in the experiment's page in the **web UI**, under **RESULTS** **>** **SCALARS**.
![image](../../../img/examples_pytorch_tensorboardx_03.png)
## Hyperparameters
**ClearML** automatically logs command line options defined with `argparse`. They appear in **CONFIGURATIONS** **>**
**HYPER PARAMETERS** **>** **Args**.
![image](../../../img/examples_pytorch_tensorboardx_01.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../img/examples_pytorch_tensorboardx_02.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the info panel
of the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_pytorch_tensorboardx_04.png)
The model info panel contains the model details, including:
* Model URL
* Framework
* Snapshot locations.
![image](../../../img/examples_pytorch_tensorboardx_05.png)
---
title: PyTorch TensorBoard Toy
---
The [tensorboard_toy_pytorch.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/tensorboard_toy_pytorch.py)
example demonstrates the integration of **ClearML** into code, which creates a TensorBoard `SummaryWriter` object to log
debug sample images. When the script runs, it creates an experiment named `pytorch tensorboard toy example`, which is
associated with the `examples` project.
## Debug samples
The debug sample images appear according to metric, in the experiment page in the **ClearML web UI** under **RESULTS**
**>** **DEBUG SAMPLES**.
![image](../../../img/examples_tensorboard_toy_pytorch_02.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. They appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **TF_DEFINE**.
![image](../../../img/examples_tensorboard_toy_pytorch_00.png)
---
title: scikit-learn with Joblib
---
The [sklearn_joblib_example.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_joblib_example.py)
demonstrates the integration of **ClearML** into code that uses `scikit-learn` and `joblib` to store a model and model snapshots,
and `matplotlib` to create a scatter diagram. When the script runs, it creates an experiment named `scikit-learn joblib example`, which is associated with the `examples` project.
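The model-storage mechanism can be sketched as follows. This is a minimal stand-in rather than the example script: the model, dataset, and file name are invented, and the `Task.init` call is commented out because it requires a configured ClearML environment. With a Task active, ClearML hooks `joblib.dump` and registers each saved file as a model snapshot:

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# from clearml import Task
# task = Task.init(project_name='examples', task_name='scikit-learn joblib example')

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=500).fit(X, y)

# With a Task active, this dump is intercepted and tracked as a model snapshot
path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
joblib.dump(model, path)

# Reloading works as usual; ClearML tracking is transparent to the code
restored = joblib.load(path)
accuracy = restored.score(X, y)
```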
## Plots
**ClearML** automatically logs the scatter plot, which appears in the experiment's page in the **ClearML web UI**, under
**RESULTS** **>** **PLOTS**.
![image](../../../img/examples_sklearn_joblib_example_06.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the info panel
of the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_sklearn_joblib_example_01.png)
The model info panel contains the model details, including:
* Model URL
* Framework
* Snapshot locations.
![image](../../../img/examples_sklearn_joblib_example_02.png)
---
title: scikit-learn with Matplotlib
---
The [sklearn_matplotlib_example.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_matplotlib_example.py)
script demonstrates the integration of **ClearML** into code that uses `scikit-learn` and `matplotlib`.
The example does the following:
* Uses `scikit-learn` to determine cross-validated training and test scores.
* Uses `matplotlib` to plot the learning curves.
* Relies on **ClearML** to automatically log the scatter diagrams for the learning curves.
* Creates an experiment named `scikit-learn matplotlib example` which is associated with the `examples` project.
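The steps above can be sketched as follows. This is a simplified stand-in, not the example script: the estimator and dataset are illustrative choices. With a Task initialized, the resulting Matplotlib figure would be captured automatically:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# Cross-validated train/test scores at increasing training-set sizes
sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

plt.figure()
plt.plot(sizes, train_scores.mean(axis=1), 'o-', label='training score')
plt.plot(sizes, test_scores.mean(axis=1), 'o-', label='cross-validation score')
plt.xlabel('training examples')
plt.ylabel('score')
plt.legend()
# With a Task active, ClearML captures this figure as a plot
```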
## Plots
The learning curve plots appear in the **ClearML web UI** under **RESULTS** **>** **PLOTS**.
![image](../../../img/examples_sklearn_matplotlib_example_01.png)
---
title: TensorBoardX
---
The [pytorch_tensorboardX.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorboardx/pytorch_tensorboardX.py)
example demonstrates the integration of **ClearML** into code that uses PyTorch and TensorBoardX.
The script does the following:
1. Trains a simple deep neural network on the PyTorch built-in [MNIST](https://pytorch.org/docs/stable/torchvision/datasets.html#mnist) dataset.
1. Creates a TensorBoardX `SummaryWriter` object to log:
* Scalars during training
* Scalars and debug samples during testing
* A test text message to the console (a test message to demonstrate **ClearML**).
1. Creates an experiment named `pytorch with tensorboardX` which is associated with the `examples` project in the **ClearML Web UI**.
## Scalars
The loss and accuracy metric scalar plots appear in the experiment's page in the **ClearML web UI**, under
**RESULTS** **>** **SCALARS**. The page also includes resource utilization plots, which are titled **:monitor: machine**.
![image](../../../img/examples_pytorch_tensorboardx_03.png)
## Hyperparameters
**ClearML** automatically logs command line options defined with `argparse`. They appear in **CONFIGURATIONS** **>**
**HYPER PARAMETERS** **>** **Args**.
![image](../../../img/examples_pytorch_tensorboardx_01.png)
## Log
Text printed to the console for training progress, as well as all other console output, appear in **RESULTS** **>** **LOG**.
![image](../../../img/examples_pytorch_tensorboardx_02.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the info panel
of the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_pytorch_tensorboardx_04.png)
The model info panel contains the model details, including:
* Model URL
* Framework
* Snapshot locations.
![image](../../../img/examples_pytorch_tensorboardx_05.png)
---
title: Keras Tuner Integration
---
Integrate **ClearML** into code that uses [Keras Tuner](https://www.tensorflow.org/tutorials/keras/keras_tuner). By
specifying `ClearMLTunerLogger` (see [kerastuner.py](https://github.com/allegroai/clearml/blob/master/clearml/external/kerastuner.py))
as the Keras Tuner logger, **ClearML** automatically logs scalars and hyperparameter optimization.
## ClearMLTunerLogger
Take a look at [keras_tuner_cifar.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/kerastuner/keras_tuner_cifar.py)
example script, which demonstrates the integration of **ClearML** into code that uses Keras Tuner.
The script does the following:
1. Creates a `Hyperband` object, which uses Keras Tuner's `Hyperband` tuner. It finds the best hyperparameters to train a
network on a CIFAR10 dataset.
1. When the `Hyperband` object is created, instantiates a `ClearMLTunerLogger` object and assigns it to the `Hyperband` logger.
The `ClearMLTunerLogger` class provides the required binding for **ClearML** automatic logging.
```python
tuner = kt.Hyperband(
build_model,
project_name='kt examples',
logger=ClearMLTunerLogger(),
objective='val_accuracy',
max_epochs=10,
hyperband_iterations=6)
```
When the script runs, it logs:
* Tabular summary of hyperparameters tested and their metrics by trial ID
* Scalar plot showing metrics for all runs
* Summary plot
* Output model with configuration and snapshot location.
## Scalars
**ClearML** logs the scalars from training each network. They appear in the project's page in the **ClearML web UI**, under
**RESULTS** **>** **SCALARS**.
![image](../../../img/integration_keras_tuner_06.png)
## Summary of hyperparameter optimization
**ClearML** automatically logs the parameters of each experiment run in the hyperparameter search. They appear in tabular
form in **RESULTS** **>** **PLOTS**.
![image](../../../img/integration_keras_tuner_07.png)
## Artifacts
**ClearML** automatically stores the output model. It appears in **ARTIFACTS** **>** **Output Model**.
![image](../../../img/integration_keras_tuner_03.png)
Model details, such as snapshot locations, appear in the **MODELS** tab.
![image](../../../img/integration_keras_tuner_04.png)
The model configuration is stored with the model.
![image](../../../img/integration_keras_tuner_05.png)
## Configuration objects
### Hyperparameters
**ClearML** automatically logs the TensorFlow Definitions, which appear in **RESULTS** **>** **CONFIGURATION** **>** **HYPER PARAMETERS**.
![image](../../../img/integration_keras_tuner_01.png)
### Configuration
The Task configuration appears in **RESULTS** **>** **CONFIGURATION** **>** **General**.
![image](../../../img/integration_keras_tuner_02.png)
---
title: Manual Model Upload
---
The [manual_model_upload.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/manual_model_upload.py)
example demonstrates **ClearML**'s tracking of a manually configured model created with TensorFlow, including:
* Model checkpoints (snapshots)
* Hyperparameters
* Output to the console.
When the script runs, it creates an experiment named `Model configuration and upload`, which is associated with the `examples` project.
Configure **ClearML** for model checkpoint (model snapshot) storage in any of the following ways ([debug sample](../../../references/sdk/logger.md#set_default_upload_destination)
storage is configured differently):
* In the configuration file, set [default_output_uri](../../../configs/clearml_conf.md#sdkdevelopment).
* In code, when [initializing a Task](../../../references/sdk/task.md#taskinit), use the `output_uri` parameter.
* In the **ClearML Web UI**, when [modifying an experiment](../../../webapp/webapp_exp_tuning.md#output-destination).
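For example, the in-code option might look like the following sketch (the project and task names match this example, but the destination URI is a hypothetical bucket):

```python
from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='Model configuration and upload',
    output_uri='s3://my-bucket/model-checkpoints'  # hypothetical destination
)
```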
## Configuration
This example shows two ways to connect a configuration, using the [Task.connect_configuration](../../../references/sdk/task.md#connect_configuration) method:
* Connect a configuration file by providing the file's path. **ClearML Server** stores a copy of the file.
```python
# Connect a local configuration file
config_file = os.path.join('..', '..', 'reporting', 'data_samples', 'sample.json')
config_file = task.connect_configuration(config_file)
```
* Create a configuration dictionary and provide the dictionary.
```python
model_config_dict = {
'value': 13.37,
'dict': {'sub_value': 'string', 'sub_integer': 11},
'list_of_ints': [1, 2, 3, 4],
}
model_config_dict = task.connect_configuration(model_config_dict)
```
If the configuration changes, **ClearML** tracks it.
```python
model_config_dict['new value'] = 10
model_config_dict['value'] *= model_config_dict['new value']
```
The configuration appears in the experiment's page in the **ClearML web UI**, under **CONFIGURATIONS** **>**
**CONFIGURATION OBJECTS**.
![image](../../../img/examples_manual_model_upload_01.png)
The output model's configuration appears in **ARTIFACTS** **>** **Output Model**.
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the info panel
of the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_manual_model_upload_30.png)
The model info panel contains the model details, including:
* Model design
* Label enumeration
* Model URL
* Framework
* Snapshot locations.
### General model information
![image](../../../img/examples_pytorch_manual_model_upload_03.png)
### Label enumeration
Connect a label enumeration dictionary by calling the [Task.connect_label_enumeration](../../../references/sdk/task.md#connect_label_enumeration) method.
```python
# store the label enumeration of the training model
labels = {'background': 0, 'cat': 1, 'dog': 2}
task.connect_label_enumeration(labels)
```
![image](../../../img/examples_pytorch_manual_model_upload_05.png)
---
title: TensorBoard PR Curve
---
The [tensorboard_pr_curve.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/tensorboard_pr_curve.py)
example demonstrates the integration of **ClearML** into code that uses TensorFlow and TensorBoard.
The example script does the following:
* Creates three classes, R, G, and B, and generates colors within the RGB space from normal distributions. The true
label of each random color is associated with the normal distribution that generated it.
* Computes the probability that each color belongs to each class, using three other normal distributions.
* Generates PR curves using those probabilities.
* Creates a summary per class using [tensorboard.plugins.pr_curve.summary](https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/pr_curve/summary.py).
* Automatically logs the TensorBoard output, TensorFlow Definitions, and output to the console, using **ClearML**.
* When the script runs, creates an experiment named `tensorboard pr_curve`, which is associated with the `examples` project.
## Plots
In the **ClearML Web UI**, the PR Curve summaries appear in the experiment's page under **RESULTS** **>** **PLOTS**.
* Blue PR curves
![image](../../../img/examples_tensorboard_pr_curve_01.png)
* Green PR curves
![image](../../../img/examples_tensorboard_pr_curve_02.png)
* Red PR curves
![image](../../../img/examples_tensorboard_pr_curve_03.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. They appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **TF_DEFINE**.
![image](../../../img/examples_tensorboard_pr_curve_04.png)
## Log
All other console output appears in **RESULTS** **>** **LOG**.
![image](../../../img/examples_tensorboard_pr_curve_05.png)
---
title: TensorBoard Toy
---
The [tensorboard_toy.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/tensorboard_toy.py)
example demonstrates **ClearML**'s automatic logging of TensorBoard scalars, histograms, images, and text, as well as
all other console output and TensorFlow Definitions.
The script uses `tf.summary.create_file_writer` with the following:
* `tf.summary.histogram`
* `tf.summary.scalar`
* `tf.summary.text`
* `tf.summary.image`
When the script runs, it creates an experiment named `tensorboard toy example`, which is associated with the `examples`
project.
## Scalars
The `tf.summary.scalar` output appears in the experiment's page in the **ClearML web UI** under **RESULTS** **>**
**SCALARS**. Resource utilization plots, which are titled **:monitor: machine**, also appear in the **SCALARS** tab.
![image](../../../img/examples_tensorboard_toy_03.png)
## Plots
The `tf.summary.histogram` output appears in **RESULTS** **>** **PLOTS**.
![image](../../../img/examples_tensorboard_toy_04.png)
## Debug samples
**ClearML** automatically tracks images and text output to TensorFlow. They appear in **RESULTS** **>** **DEBUG SAMPLES**.
![image](../../../img/examples_tensorboard_toy_05.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. They appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>**
**TF_DEFINE**.
![image](../../../img/examples_tensorboard_toy_01.png)

---
title: TensorFlow MNIST
---
The [tensorflow_mnist.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py)
example demonstrates the integration of **ClearML** into code that uses TensorFlow and Keras to train a neural network on
the Keras built-in [MNIST](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/mnist) handwritten digits dataset.
The script builds a TensorFlow Keras model, and trains and tests it with the following:
* Loss objective function - [tf.keras.metrics.SparseCategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy)
* Accuracy metric - [tf.keras.metrics.SparseCategoricalAccuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SparseCategoricalAccuracy)
* Model checkpointing - [tf.train.Checkpoint](https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint?hl=ca) and [tf.train.CheckpointManager](https://www.tensorflow.org/api_docs/python/tf/train/CheckpointManager?hl=ca)
When the script runs, it creates an experiment named `Tensorflow v2 mnist with summaries`, which is associated with the
`examples` project.
## Scalars
The loss and accuracy metric scalar plots appear in the experiment's page in the **ClearML web UI** under **RESULTS**
**>** **SCALARS**. Resource utilization plots, which are titled **:monitor: machine**, also appear in the **SCALARS** tab.
![image](../../../img/examples_tensorflow_mnist_06.png)
## Hyperparameters
**ClearML** automatically logs TensorFlow Definitions. They appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS**
**>** **TF_DEFINE**.
![image](../../../img/examples_tensorflow_mnist_01.png)
## Log
All console output appears in **RESULTS** **>** **LOG**.
![image](../../../img/examples_tensorflow_mnist_05.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the info panel
of the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_tensorflow_mnist_03.png)
The model info panel contains the model details, including:
* Model design
* Label enumeration
* Model URL
* Framework
* Snapshot locations.
![image](../../../img/examples_tensorflow_mnist_10.png)

---
title: XGBoost
---
The [xgboost_sample.py](https://github.com/allegroai/clearml/blob/master/examples/frameworks/xgboost/xgboost_sample.py)
example demonstrates integrating **ClearML** into code that trains a network on the scikit-learn [iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris)
classification dataset, using XGBoost to do the following:
* Load a model ([xgboost.Booster.load_model](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.load_model))
* Save a model ([xgboost.Booster.save_model](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.save_model))
* Dump a model to JSON or text file ([xgboost.Booster.dump_model](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.dump_model))
* Plot feature importance ([xgboost.plot_importance](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.plot_importance))
* Plot a tree ([xgboost.plot_tree](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.plot_tree))
And using scikit-learn to score accuracy ([sklearn.metrics.accuracy_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html)).
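Conceptually, `accuracy_score` is just the fraction of predictions that match the true labels. A minimal pure-Python sketch of the idea (an illustration, not the scikit-learn implementation):

```python
# Conceptual equivalent of sklearn.metrics.accuracy_score:
# the fraction of predictions that match the true labels.
def accuracy(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
```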
**ClearML** automatically logs:
* Input model
* Output model
* Model checkpoints (snapshots)
* Feature importance plot
* Tree plot
* Output to console.
When the script runs, it creates an experiment named `XGBoost simple example`, which is associated with the `examples` project.
## Plots
The feature importance plot and tree plot appear in the experiment's page in the **ClearML web UI**, under **RESULTS** **>**
**PLOTS**.
![image](../../../img/examples_xgboost_sample_06.png)
## Log
All other console output appears in **RESULTS** **>** **LOG**.
![image](../../../img/examples_xgboost_sample_05.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in the info panel
of the **MODELS** tab.
The experiment info panel shows model tracking, including the model name and design (in this case, no design was stored).
![image](../../../img/examples_xgboost_sample_10.png)
The model info panel contains the model details, including:
* Model design
* Label enumeration
* Model URL
* Framework.
![image](../../../img/examples_xgboost_sample_03.png)

---
title: Integration for PyCharm
---
The **ClearML PyCharm plugin** enables syncing a local execution configuration to a remote execution machine:
* Sync local repository information to a remote debug machine.
* Multiple users can use the same resource for execution without compromising private credentials.
* Run the [ClearML Agent](../../fundamentals/agents_and_queues.md) on default VMs/Containers.
## Installation
**To install the ClearML PyCharm plugin, do the following:**
1. Download the latest plugin version from the [Releases page](https://github.com/allegroai/trains-pycharm-plugin/releases).
1. Install the plugin in PyCharm from local disk:
![image](../../img/examples_ide_pycharm.png)
## Optional: ClearML configuration parameters
:::warning
If you set ClearML configuration parameters (ClearML Server and ClearML credentials) in the plugin, they will override
the settings in the ClearML configuration file.
:::
**To set ClearML configuration parameters:**
1. In PyCharm, open **Settings**.
1. Click **Tools**.
1. Click **ClearML**.
1. Configure ClearML server information:
1. API server (for example: ``http://localhost:8008``)
1. Web server (for example: ``http://localhost:8080``)
1. File server (for example: ``http://localhost:8081``)
1. Add **ClearML** user credentials key/secret.
![image](../../img/clearml_pycharm_plugin/pycharm_config_params.png)

---
title: Remote Jupyter Tutorial
---
In this tutorial, we will learn how to launch a remote interactive Jupyter Notebook session using `clearml-session`.
We will use two machines: a local one, where we will use an interactive session of Jupyter, and a remote machine,
where a `clearml-agent` will run and spin up an instance of the remote session.
## Prerequisites
* `clearml-session` package installed (`pip install clearml-session`)
* At least one `clearml-agent` running on a **remote** host. See [installation details](../../clearml_agent.md#installation).
Configure the `clearml-agent` to listen to the `default` queue (`clearml-agent daemon --queue default`)
* An SSH client installed on the machine being used. To verify, open a terminal and execute `ssh`; if no error is received,
  you are good to go.
## Steps
1. Execute the `clearml-session` command with the following command line options:
```bash
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --packages "clearml" "tensorflow>=2.2" "keras" --queue default
```
* Enter a docker image `--docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04`
* Enter required python packages `--packages "clearml" "tensorflow>=2.2" "keras"`
* Specify the resource queue `--queue default`.
<br/>
:::note
There is an option to enter a project name using `--project <name>`. If no project is specified, the default project
name is "DevOps".
:::
1. After launching the command, the `clearml-agent` listening to the `default` queue spins up a remote Jupyter environment
with the requested specifications. It automatically connects to the docker on the remote machine.
The terminal should return output with the session's configuration details, which should look something like this:
```console
Interactive session config:
{
"base_task_id": null,
"git_credentials": false,
"jupyter_lab": true,
"password": "0879348ae41fb944004ff89b9103f09592ec799f39ae34e5b71afb46976d5c83",
"queue": "default",
"vscode_server": true
}
```
1. When the CLI asks `Launch interactive session [Y]/n?`, press `Y` or `Enter` to approve.
The terminal should output information regarding the status of the environment-building process, which should look
something like this:
```console
Creating new session
New session created [id=35c0af81ae6541589dbae1efb747f388]
Waiting for remote machine allocation [id=35c0af81ae6541589dbae1efb747f388]
.Status [queued]
...Status [in_progress]
Remote machine allocated
Setting remote environment [Task id=35c0af81ae6541589dbae1efb747f388]
Setup process details: https://app.community.clear.ml/projects/60893b87b0b642679fde00db96e90359/experiments/35c0af81ae6541589dbae1efb747f388/output/log
Waiting for environment setup to complete [usually about 20-30 seconds]
```
Then the CLI will output a link to the ready environment:
```console
Interactive session is running:
SSH: ssh root@localhost -p 8022 [password: c5d19b3c0fa9784ba4f6aeb568c1e036a4fc2a4bc7f9bfc54a2c198d64ceb9c8]
Jupyter Lab URL: http://localhost:8878/?token=ff7e5e8b9e5493a01b1a72530d18181320630b95f442b419
VSCode server available at http://localhost:8898/
```
1. Click the JupyterLab link, which will open the remote session.
1. Now, let's execute some code in the remote session! Open up a new Notebook.
1. In the first cell of the notebook, clone the [ClearML repository](https://github.com/allegroai/clearml):
```bash
!git clone https://github.com/allegroai/clearml.git
```
1. In the second cell of the notebook, run this [script](https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py)
from the repository that we cloned:
```bash
%run clearml/examples/frameworks/keras/keras_tensorboard.py
```
Look in the script, and notice that it makes use of ClearML, Keras, and TensorFlow, but we don't need to install these
packages in Jupyter, because we specified them in the `--packages` flag of `clearml-session`.
1. To shut down the remote session (which frees the `clearml-agent` and closes the CLI), enter "Shutdown".
```console
Connection is up and running
Enter "r" (or "reconnect") to reconnect the session (for example after suspend)
Ctrl-C (or "quit") to abort (remote session remains active)
or "Shutdown" to shutdown remote interactive session
```

---
id: guidemain
title: Examples
slug: /guides
---
To help you learn and use **ClearML**, we provide example scripts that demonstrate how to use ClearML's various features.
The example scripts are in the [examples](https://github.com/allegroai/clearml/tree/master/examples) folder of the GitHub `clearml`
repository. They are also pre-loaded in the **ClearML Server**.
Each examples folder in the GitHub ``clearml`` repository contains a ``requirements.txt`` file listing the dependencies for the example scripts in that folder.

---
title: Hyperparameter Optimization
---
The [hyper_parameter_optimizer.py](https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py)
example script demonstrates hyperparameter optimization, which is automated by **ClearML**.
<a class="tr_top_negative" name="strategy"></a>
## Set the search strategy for optimization
A search strategy is required for the optimization, as well as a search strategy optimizer class to implement that strategy.
The following search strategies can be used:
* Optuna hyperparameter optimization - [automation.optuna.optuna.OptimizerOptuna](../../../references/sdk/hpo_optuna_optuna_optimizeroptuna.md).
For more information about Optuna, see the [Optuna](https://optuna.org/) documentation.
* BOHB - [automation.hpbandster.bandster.OptimizerBOHB](../../../references/sdk/hpo_hpbandster_bandster_optimizerbohb.md).
BOHB performs robust and efficient hyperparameter optimization at scale by combining the speed of Hyperband searches
with the guidance and guarantees of convergence of Bayesian Optimization.
**ClearML** implements BOHB for automation with HpBandSter's [bohb.py](https://github.com/automl/HpBandSter/blob/master/hpbandster/optimizers/bohb.py).
For more information about HpBandSter BOHB, see the [HpBandSter](https://automl.github.io/HpBandSter/build/html/index.html)
documentation.
* Random uniform sampling of hyperparameters - [automation.optimization.RandomSearch](../../../references/sdk/hpo_optimization_randomsearch.md).
* Full grid search sampling of every hyperparameter combination - [automation.optimization.GridSearch](../../../references/sdk/hpo_optimization_gridsearch.md).
* Custom - Use a custom class and inherit from the **ClearML** automation base strategy class, `automation.optimization.SearchStrategy`.
The search strategy class that is chosen will be passed to the [automation.optimization.HyperParameterOptimizer](../../../references/sdk/hpo_optimization_hyperparameteroptimizer.md)
object later.
The example code attempts to import `OptimizerOptuna` for the search strategy. If `clearml.automation.optuna` is not
installed, it attempts to import `OptimizerBOHB`. If `clearml.automation.hpbandster` is not installed, it uses
the `RandomSearch` for the search strategy.
```python
import logging

from clearml.automation import RandomSearch

aSearchStrategy = None

if not aSearchStrategy:
    try:
        from clearml.automation.optuna import OptimizerOptuna
        aSearchStrategy = OptimizerOptuna
    except ImportError as ex:
        pass

if not aSearchStrategy:
    try:
        from clearml.automation.hpbandster import OptimizerBOHB
        aSearchStrategy = OptimizerBOHB
    except ImportError as ex:
        pass

if not aSearchStrategy:
    logging.getLogger().warning(
        'Apologies, it seems you do not have \'optuna\' or \'hpbandster\' installed, '
        'we will be using RandomSearch strategy instead')
    aSearchStrategy = RandomSearch
```
## Define a callback
When the optimization starts, a callback is provided that returns the best performing set of hyperparameters. In the script,
the `job_complete_callback` function returns the ID of `top_performance_job_id`.
```python
def job_complete_callback(
    job_id,                 # type: str
    objective_value,        # type: float
    objective_iteration,    # type: int
    job_parameters,         # type: dict
    top_performance_job_id  # type: str
):
    print('Job completed!', job_id, objective_value, objective_iteration, job_parameters)
    if job_id == top_performance_job_id:
        print('WOOT WOOT we broke the record! Objective reached {}'.format(objective_value))
```
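To see how such a callback behaves, here is a toy driver that simulates completed jobs and tracks the best objective so far (illustrative only; the job names and values are made up, and this is not ClearML's scheduling logic):

```python
# Toy driver simulating an optimizer invoking a completion callback.
def job_complete_callback(job_id, objective_value, objective_iteration,
                          job_parameters, top_performance_job_id):
    print('Job completed!', job_id, objective_value)
    if job_id == top_performance_job_id:
        print('New top performance: {}'.format(objective_value))

completed_jobs = [('job-1', 0.71), ('job-2', 0.89), ('job-3', 0.80)]
best_id, best_value = None, float('-inf')
for job_id, value in completed_jobs:
    if value > best_value:
        best_id, best_value = job_id, value
    job_complete_callback(job_id, value, 0, {}, best_id)
```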
## Initialize the optimization Task
Initialize the Task, which will be stored in **ClearML Server** when the code runs. After the code runs at least once, it
can be [reproduced](../../../webapp/webapp_exp_reproducing.md) and [tuned](../../../webapp/webapp_exp_tuning.md).
We set the Task type to optimizer, and create a new experiment (and Task object) each time the optimizer runs (`reuse_last_task_id=False`).
When the code runs, it creates an experiment named **Automatic Hyper-Parameter Optimization** that is associated with
the project **Hyper-Parameter Optimization**, which can be seen in the **ClearML Web UI**.
# Connecting CLEARML
task = Task.init(project_name='Hyper-Parameter Optimization',
task_name='Automatic Hyper-Parameter Optimization',
task_type=Task.TaskTypes.optimizer,
reuse_last_task_id=False)
## Set up the arguments
Create an arguments dictionary that contains the ID of the Task to optimize, and a Boolean indicating whether the
optimizer will run as a service (see [Running as a service](#running-as-a-service)).
In this example, an experiment named **Keras HP optimization base** is being optimized. The experiment must have run at
least once so that it is stored in **ClearML Server**, and, therefore, can be cloned.
Since the arguments dictionary is connected to the Task, after the code runs once, the `template_task_id` can be changed
to optimize a different experiment (see [tuning experiments](../../../webapp/webapp_exp_tuning.md)).
```python
# experiment template to optimize in the hyper-parameter optimization
args = {
    'template_task_id': None,
    'run_as_service': False,
}
args = task.connect(args)

# Get the template task experiment that we want to optimize
if not args['template_task_id']:
    args['template_task_id'] = Task.get_task(
        project_name='examples', task_name='Keras HP optimization base').id
```
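Conceptually, connecting the dictionary lets values stored in the server (for example, edited in the Web UI) override the code defaults when the Task is executed again. A toy illustration of that merge (the names and mechanics here are assumptions for illustration, not ClearML internals):

```python
# Toy illustration: server-side overrides replace code defaults,
# the way connected parameters behave on re-execution.
defaults = {'template_task_id': None, 'run_as_service': False}
ui_overrides = {'template_task_id': 'abc123'}  # e.g. a value edited in the Web UI

def connect(args, overrides):
    merged = dict(args)
    # only keys that exist in the code defaults are overridden
    merged.update({k: v for k, v in overrides.items() if k in merged})
    return merged

args = connect(defaults, ui_overrides)
print(args)
```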
## Instantiate the optimizer object
Instantiate an [automation.optimization.HyperParameterOptimizer](../../../references/sdk/hpo_optimization_hyperparameteroptimizer.md)
object, setting the optimization parameters, beginning with the ID of the experiment to optimize.
```python
an_optimizer = HyperParameterOptimizer(
    # This is the experiment we want to optimize
    base_task_id=args['template_task_id'],
```

Set the hyperparameter ranges to sample, instantiating them as **ClearML** automation objects using [automation.parameters.UniformIntegerParameterRange](../../../references/sdk/hpo_parameters_uniformintegerparameterrange.md)
and [automation.parameters.DiscreteParameterRange](../../../references/sdk/hpo_parameters_discreteparameterrange.md).

```python
    hyper_parameters=[
        UniformIntegerParameterRange('layer_1', min_value=128, max_value=512, step_size=128),
        UniformIntegerParameterRange('layer_2', min_value=128, max_value=512, step_size=128),
        DiscreteParameterRange('batch_size', values=[96, 128, 160]),
        DiscreteParameterRange('epochs', values=[30]),
    ],
```

Set the metric to optimize and the optimization objective.

```python
    objective_metric_title='val_acc',
    objective_metric_series='val_acc',
    objective_metric_sign='max',
```

Set the number of concurrent Tasks.

```python
    max_number_of_concurrent_tasks=2,
```

Set the optimization strategy (see [Set the search strategy for optimization](#set-the-search-strategy-for-optimization)).

```python
    optimizer_class=aSearchStrategy,
```

Specify the queue to use for remote execution. This is overridden if the optimizer runs as a service.

```python
    execution_queue='1xGPU',
```

Specify the remaining parameters, including the time limit per Task (minutes), the period for checking the optimization (minutes), the maximum number of jobs to launch, and the minimum and maximum number of iterations for each Task.

```python
    # Optional: Limit the execution time of a single experiment, in minutes.
    # (this is optional, and if using OptimizerBOHB, it is ignored)
    time_limit_per_job=10.,
    # Check the experiments every 6 seconds is way too often, we should probably set it to 5 min,
    # assuming a single experiment is usually hours...
    pool_period_min=0.1,
    # set the maximum number of jobs to launch for the optimization, default (None) unlimited
    # If OptimizerBOHB is used, it defines the maximum budget in terms of full jobs
    # basically the cumulative number of iterations will not exceed total_max_jobs * max_iteration_per_job
    total_max_jobs=10,
    # This is only applicable for OptimizerBOHB and ignored by the rest
    # set the minimum number of iterations for an experiment, before early stopping
    min_iteration_per_job=10,
    # Set the maximum number of iterations for an experiment to execute
    # (This is optional, unless using OptimizerBOHB where this is a must)
    max_iteration_per_job=30,
)
```
<a class="tr_top_negative" name="service"></a>
## Running as a service
The optimization can run as a service, if the `run_as_service` argument is set to `True`. For more information about
running as a service, see [ClearML Agent services container](../../../clearml_agent.md#services-mode)
on the "Concepts and Architecture" page.
```python
# if we are running as a service, just enqueue ourselves into the services queue and let it run the optimization
if args['run_as_service']:
    # if this code is executed by `clearml-agent` the function call does nothing.
    # if executed locally, the local process will be terminated, and a remote copy will be executed instead
    task.execute_remotely(queue_name='services', exit_process=True)
```
## Optimize
The optimizer is ready. Set the report period and start it, providing the callback method to report the best performance.
```python
# report every 12 seconds, this is way too often, but we are testing here
an_optimizer.set_report_period(0.2)
# start the optimization process, callback function to be called every time an experiment is completed
# this function returns immediately
an_optimizer.start(job_complete_callback=job_complete_callback)
```
Now that it is running:
1. Set a time limit for optimization
1. Wait
1. Get the best performance
1. Print the best performance
1. Stop the optimizer.
```python
# set the time limit for the optimization process (90 minutes)
an_optimizer.set_time_limit(in_minutes=90.0)
# wait until process is done (notice we are controlling the optimization process in the background)
an_optimizer.wait()
# optimization is completed, print the top performing experiment IDs
top_exp = an_optimizer.get_top_experiments(top_k=3)
print([t.id for t in top_exp])
# make sure background optimization stopped
an_optimizer.stop()
print('We are done, good bye')
```

---
title: Simple Pipeline - Serialized Data
---
The [pipeline_controller.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_controller.py)
example demonstrates a simple pipeline in **ClearML**.
This pipeline is composed of three steps:
1. Download data
1. Process data
1. Train a network.
It is implemented using the [automation.controller.PipelineController](../../references/sdk/automation_controller_pipelinecontroller.md)
class. This class includes functionality to:
* Create a pipeline controller
* Add steps to the pipeline
* Pass data from one step to another
* Control the dependencies of a step beginning only after other steps complete
* Run the pipeline
* Wait for the pipeline to complete
* Clean up after the pipeline completes execution
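The dependency control above can be pictured as a simple topological run order over the steps. A conceptual sketch in plain Python (an illustration using this example's step names, not ClearML's implementation):

```python
# Conceptual sketch: a step runs only after all its parents complete.
steps = {
    'stage_data': [],                  # no parents
    'stage_process': ['stage_data'],   # starts after stage_data
    'stage_train': ['stage_process'],  # starts after stage_process
}

def run_order(steps):
    done, order = set(), []
    while len(done) < len(steps):
        for name, parents in steps.items():
            if name not in done and all(p in done for p in parents):
                order.append(name)
                done.add(name)
    return order

print(run_order(steps))  # ['stage_data', 'stage_process', 'stage_train']
```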
This example implements the pipeline with four Tasks (each Task is created using a different script):
* **Controller Task** ([pipeline_controller.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_controller.py)) -
Creates a pipeline controller, adds the steps (Tasks) to the pipeline, runs the pipeline.
* **Step 1 Task** ([step1_dataset_artifact.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step1_dataset_artifact.py)) -
Downloads data and stores the data as an artifact.
* **Step 2 Task** ([step2_data_processing.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step2_data_processing.py)) -
Loads the stored data (from Step 1), processes it, and stores the processed data as artifacts.
* **Step 3 Task** ([step3_train_model.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step3_train_model.py)) -
Loads the processed data (from Step 2) and trains a network.
When the pipeline runs, the Step 1, Step 2, and Step 3 Tasks are cloned, and the newly cloned Tasks execute. The Tasks
they are cloned from, called the base Tasks, do not execute. This way, the pipeline can run multiple times. These
base Tasks must have already run at least once for them to be in **ClearML Server** and to be cloned. The controller Task
itself can be run from a development environment (by running the script), or cloned, and the cloned Task executed remotely (if the
controller Task has already run at least once and is in **ClearML Server**).
The sections below describe in more detail what happens in the controller Task and in each step Task.
## The pipeline controller
1. Create the pipeline controller object.
```python
pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)
```
1. Add Step 1. Call the [automation.controller.PipelineController.add_step](../../references/sdk/automation_controller_pipelinecontroller.md#add_step)
method.
```python
pipe.add_step(name='stage_data', base_task_project='examples', base_task_name='pipeline step 1 dataset artifact')
```
* `name` - The name of Step 1 (`stage_data`).
* `base_task_project` and `base_task_name` - The Step 1 base Task to clone (the cloned Task will be executed when the pipeline runs).
1. Add Step 2.
```python
pipe.add_step(name='stage_process', parents=['stage_data', ],
base_task_project='examples', base_task_name='pipeline step 2 process dataset',
parameter_override={'General/dataset_url': '${stage_data.artifacts.dataset.url}',
'General/test_size': 0.25})
```
* `name` - The name of Step 2 (`stage_process`).
* `base_task_project` and `base_task_name` - The Step 2 base Task to clone.
* `parents` - The start of Step 2 (`stage_process`) depends upon the completion of Step 1 (`stage_data`).
* `parameter_override` - Pass the URL of the data artifact from Step 1 to Step 2. Override the value of the parameter
whose key is `dataset_url` (in the parameter group named `General`). Override it with the URL of the artifact named `dataset`. Also override the test size.
:::important
Note the syntax of the ``parameter_override`` value: ``${stage_data.artifacts.dataset.url}`` references the URL of the ``dataset`` artifact produced by the ``stage_data`` step.
For other examples of ``parameter_override`` syntax, see [automation.controller.PipelineController.add_step](../../references/sdk/automation_controller_pipelinecontroller.md#add_step).
:::
1. Add Step 3.
```python
pipe.add_step(name='stage_train', parents=['stage_process', ],
base_task_project='examples', base_task_name='pipeline step 3 train model',
parameter_override={'General/dataset_task_id': '${stage_process.id}'})
```
* `name` - The name of Step 3 (`stage_train`).
* `parents` - The start of Step 3 (`stage_train`) depends upon the completion of Step 2 (`stage_process`).
* `parameter_override` - Pass the ID of the Step 2 Task to the Step 3 Task. This is the ID of the cloned Task, not the base Task.
1. Run the pipeline, wait for it to complete, and cleanup.
```python
# Starting the pipeline (in the background)
pipe.start()
# Wait until pipeline terminates
pipe.wait()
# cleanup everything
pipe.stop()
```
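The `${stage_data.artifacts.dataset.url}` references used in `parameter_override` can be pictured as template lookups into an earlier step's outputs. A toy resolver showing the idea (illustrative only; the nested context and URL are made up, and this is not ClearML's resolution mechanism):

```python
import re

# Toy resolver: replace ${a.b.c} references with values from a nested dict,
# the way a step reference resolves against an earlier step's outputs.
def resolve(template, context):
    def lookup(match):
        value = context
        for part in match.group(1).split('.'):
            value = value[part]
        return str(value)
    return re.sub(r'\$\{([^}]+)\}', lookup, template)

# Hypothetical outputs of the 'stage_data' step
context = {'stage_data': {'artifacts': {'dataset': {'url': 'https://files.example.com/iris_dataset.pkl'}}}}
print(resolve('${stage_data.artifacts.dataset.url}', context))
```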
## Step 1 - Downloading the data
In the Step 1 Task ([step1_dataset_artifact.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step1_dataset_artifact.py)):
1. Clone base Task and enqueue it for execution
```python
task.execute_remotely()
```
1. Download data and store it as an artifact named `dataset`. This is the same artifact name used in `parameter_override`
when the `add_step` method is called in the pipeline controller.
```python
# simulate local dataset, download one, so we have something local
local_iris_pkl = StorageManager.get_local_copy(
remote_url='https://github.com/allegroai/events/raw/master/odsc20-east/generic/iris_dataset.pkl')
# add and upload local file containing our toy dataset
task.upload_artifact('dataset', artifact_object=local_iris_pkl)
```
## Step 2 - Processing the data
In the Step 2 Task ([step2_data_processing.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step2_data_processing.py)):
1. Create a parameter dictionary and connect it to the Task.
```python
args = {
'dataset_task_id': '',
'dataset_url': '',
'random_state': 42,
'test_size': 0.2,
}
# store arguments, later we will be able to change them from outside the code
task.connect(args)
```
The parameter `dataset_url` is the same parameter name used by `parameter_override` when the `add_step` method is called in the pipeline controller.
1. Clone base Task and enqueue it for execution.
```python
task.execute_remotely()
```
1. Later in Step 2, the Task uses the URL in the parameter dictionary to get the data.
```python
iris_pickle = StorageManager.get_local_copy(remote_url=args['dataset_url'])
```
1. The Task processes the data and then stores the processed data as artifacts.
```python
task.upload_artifact('X_train', X_train)
task.upload_artifact('X_test', X_test)
task.upload_artifact('y_train', y_train)
task.upload_artifact('y_test', y_test)
```
## Step 3 - Training the network
In the Step 3 Task ([step3_train_model.py](https://github.com/allegroai/clearml/blob/master/examples/pipeline/step3_train_model.py)):
1. Create a parameter dictionary and connect it to the Task.
```python
# Arguments
args = {
'dataset_task_id': 'REPLACE_WITH_DATASET_TASK_ID',
}
task.connect(args)
```
The parameter `dataset_task_id` is later overridden by the ID of the Step 2 Task (cloned Task, not base Task).
1. Clone the Step 3 base Task and enqueue it.
```python
task.execute_remotely()
```
1. Use the Step 2 Task ID to get the processed data stored in artifacts.
```python
dataset_task = Task.get_task(task_id=args['dataset_task_id'])
X_train = dataset_task.artifacts['X_train'].get()
X_test = dataset_task.artifacts['X_test'].get()
y_train = dataset_task.artifacts['y_train'].get()
y_test = dataset_task.artifacts['y_test'].get()
```
1. Train the network and log plots, along with **ClearML** automatic logging.
## Running the pipeline
**To run the pipeline:**
1. Run the script for each of the steps, if the scripts have not run once before:
```bash
python step1_dataset_artifact.py
python step2_data_processing.py
python step3_train_model.py
```
1. Run the pipeline controller in one of the following two ways:
   * Run the script:
```bash
python pipeline_controller.py
```
* Remotely execute the Task - If the Task `pipeline demo` in the project `examples` already exists in **ClearML Server**, clone it and enqueue it to execute.
:::note
If you enqueue a Task, a worker must be listening to that queue for the Task to execute.
:::
A plot describing the pipeline appears in **RESULTS** **>** **PLOTS**. Hover over a step in the pipeline to view the name of the step and the parameters overridden by the step.
![image](../../img/pipeline_controller_01.png)

---
title: 3D Plots Reporting
---
The [3d_plots_reporting.py](https://github.com/allegroai/clearml/blob/master/examples/reporting/3d_plots_reporting.py)
example demonstrates reporting a series as a surface plot and as a 3D scatter plot.
When the script runs, it creates an experiment named `3D plot reporting`, which is associated with the `examples` project.
**ClearML** reports these plots in the **ClearML Web UI** **>** experiment page **>** **RESULTS** tab **>** **PLOTS** sub-tab.
## Surface plot
To plot a series as a surface plot, use the [Logger.report_surface](../../references/sdk/logger.md#report_surface)
method.
```python
# report 3d surface
surface = np.random.randint(10, size=(10, 10))
Logger.current_logger().report_surface(
    "example_surface",
    "series1",
    iteration=iteration,
    matrix=surface,
    xaxis="title X",
    yaxis="title Y",
    zaxis="title Z",
)
```
Visualize the reported surface plot in **RESULTS** **>** **PLOTS**.
![image](../../img/examples_reporting_01.png)
## 3D scatter plot
To plot a series as a 3-dimensional scatter plot, use the [Logger.report_scatter3d](../../references/sdk/logger.md#report_scatter3d)
method.
```python
# report 3d scatter plot
scatter3d = np.random.randint(10, size=(10, 3))
Logger.current_logger().report_scatter3d(
    "example_scatter_3d",
    "series_xyz",
    iteration=iteration,
    scatter=scatter3d,
    xaxis="title x",
    yaxis="title y",
    zaxis="title z",
)
```
Visualize the reported 3D scatter plot in **RESULTS** **>** **PLOTS**.
![image](../../img/examples_reporting_02.png)

---
title: Artifacts Reporting
---
The [artifacts.py](https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py) example demonstrates
uploading objects (other than models) to storage as experiment artifacts.
These artifacts include:
* Pandas DataFrames
* Local files
* Dictionaries
* Folders
* Numpy objects
* Image files
Artifacts can be uploaded and dynamically tracked, or uploaded without tracking.
<a name="configure_artifact_storage" class="tr_top_negative"></a>
Configure **ClearML** for uploading artifacts to any of the supported types of storage, which include local and shared folders,
S3 buckets, Google Cloud Storage, and Azure Storage ([debug sample storage](../../references/sdk/logger.md#set_default_upload_destination)
is different). Configure **ClearML** in any of the following ways:
* In the configuration file, set [default_output_uri](../../configs/clearml_conf.md#sdkdevelopment).
* In code, when [initializing a Task](../../references/sdk/task.md#taskinit), use the `output_uri` parameter.
* In the **ClearML Web UI**, when [modifying an experiment](../../webapp/webapp_exp_tuning.md#output-destination).
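For example, setting the upload destination in code when initializing the task (a minimal sketch; the S3 bucket path is a placeholder, not part of the example script):

```python
from clearml import Task

# All artifacts uploaded through this task go to the given destination.
# 's3://my-bucket/artifacts' is a placeholder; substitute your own storage path.
task = Task.init(
    project_name='examples',
    task_name='artifacts example',
    output_uri='s3://my-bucket/artifacts'
)
```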
When the script runs, it creates an experiment named `artifacts example`, which is associated with the `examples` project.
**ClearML** reports artifacts in the **ClearML Web UI** **>** experiment details **>** **ARTIFACTS** tab.
![image](../../img/examples_reporting_03.png)
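The snippets below use a `task` object; a minimal setup sketch, assuming the script begins by initializing the task named above:

```python
from clearml import Task

# Create the experiment; artifacts uploaded through this task are
# associated with it under the `examples` project
task = Task.init(project_name='examples', task_name='artifacts example')
```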
## Dynamically tracked artifacts
Currently, **ClearML** supports uploading and dynamically tracking Pandas DataFrames. Use the [Task.register_artifact](../../references/sdk/task.md#register_artifact)
method. If the Pandas DataFrame changes, **ClearML** uploads the changes. The updated artifact is associated with the experiment.
For example:
```python
df = pd.DataFrame(
    {
        'num_legs': [2, 4, 8, 0],
        'num_wings': [2, 0, 0, 0],
        'num_specimen_seen': [10, 2, 1, 8]
    },
    index=['falcon', 'dog', 'spider', 'fish']
)

# Register the Pandas object as an artifact to watch
# (it will be monitored in the background and automatically synced and uploaded)
task.register_artifact('train', df, metadata={'counting': 'legs', 'max legs': 69})
```
By changing the artifact, and calling the [Task.get_registered_artifacts](../../references/sdk/task.md#get_registered_artifacts)
method to retrieve it, we can see that **ClearML** tracked the change.
```python
# change the artifact object
df.sample(frac=0.5, replace=True, random_state=1)

# or access it from anywhere using the Task's get_registered_artifacts()
Task.current_task().get_registered_artifacts()['train'].sample(frac=0.5, replace=True, random_state=1)
```
## Artifacts without tracking
**ClearML** supports several types of objects that can be uploaded and are not tracked. Use the [Task.upload_artifact](../../references/sdk/task.md#upload_artifact)
method.
Artifacts without tracking include:
* Pandas DataFrames
* Local files
* Dictionaries (stored as JSON)
* Numpy objects (stored as NPZ files)
* Image files (stored as PNG files)
* Folders (stored as ZIP files)
* Wildcards (stored as ZIP files)
### Pandas DataFrames
```python
# add and upload pandas.DataFrame (one-time snapshot of the object)
task.upload_artifact('Pandas', artifact_object=df)
```
### Local files
```python
# add and upload local file artifact
task.upload_artifact('local file', artifact_object=os.path.join('data_samples', 'dancing.jpg'))
```
### Dictionaries
```python
# add and upload dictionary (stored as JSON)
task.upload_artifact('dictionary', df.to_dict())
```
### Numpy objects
```python
# add and upload Numpy object (stored as .npz file)
task.upload_artifact('Numpy Eye', np.eye(100, 100))
```
### Image files
```python
# add and upload image (stored as .png file)
im = Image.open(os.path.join('data_samples', 'dancing.jpg'))
task.upload_artifact('pillow_image', im)
```
### Folders
```python
# add and upload a folder; artifact_object should be the folder path
task.upload_artifact('local folder', artifact_object=os.path.join('data_samples'))
```
### Wildcards
```python
# add and upload a wildcard
task.upload_artifact('wildcard jpegs', artifact_object=os.path.join('data_samples', '*.jpg'))
```
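Once uploaded, artifacts can be retrieved programmatically from the task; a hedged sketch using the artifact names above (the task ID is a placeholder):

```python
from clearml import Task

# Fetch the task that uploaded the artifacts ('<task-id>' is a placeholder)
task = Task.get_task(task_id='<task-id>')

# Download the file artifact to a local path, and load the DataFrame back
local_jpg = task.artifacts['local file'].get_local_copy()
df = task.artifacts['Pandas'].get()
```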
---
title: Explicit Reporting - Jupyter Notebook
---
The [jupyter_logging_example.ipynb](https://github.com/allegroai/clearml/blob/master/examples/reporting/jupyter_logging_example.ipynb)
script demonstrates **ClearML** explicit reporting running in a Jupyter Notebook. All **ClearML**
explicit reporting works in Jupyter Notebook.
This example demonstrates several types of explicit reporting:
* Scalars
* Plots
* Media
:::note
In the ``clearml`` GitHub repository, this example includes a clickable icon to open the notebook in Google Colab.
:::
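The cells below use a `logger` object; a minimal setup sketch, assuming the notebook begins by initializing a task (the project and task names here are illustrative):

```python
from clearml import Task

# Initialize the task and obtain its logger; the reporting cells
# below call methods on this logger
task = Task.init(project_name='examples', task_name='jupyter logging example')
logger = task.get_logger()
```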
## Scalars
To report scalars, call the [Logger.report_scalar](../../references/sdk/logger.md#report_scalar)
method. The scalar plots appear in the **web UI** in **RESULTS** **>** **SCALARS**.
```python
# report two scalar series on two different graphs
for i in range(10):
    logger.report_scalar("graph A", "series A", iteration=i, value=1./(i+1))
    logger.report_scalar("graph B", "series B", iteration=i, value=10./(i+1))
```
![image](../../img/colab_explicit_reporting_01.png)
```python
# report two scalar series on the same graph
for i in range(10):
    logger.report_scalar("unified graph", "series A", iteration=i, value=1./(i+1))
    logger.report_scalar("unified graph", "series B", iteration=i, value=10./(i+1))
```
![image](../../img/colab_explicit_reporting_02.png)
## Plots
Plots appear in **RESULTS** **>** **PLOTS**.
### 2D Plots
Report 2D scatter plots by calling the [Logger.report_scatter2d](../../references/sdk/logger.md#report_scatter2d) method.
Use the `mode` parameter to plot data points as markers, or both lines and markers.
```python
scatter2d = np.hstack(
    (np.atleast_2d(np.arange(0, 10)).T, np.random.randint(10, size=(10, 1)))
)

# report 2d scatter plot with markers
logger.report_scatter2d(
    "example_scatter",
    "series_lines+markers",
    iteration=iteration,
    scatter=scatter2d,
    xaxis="title x",
    yaxis="title y",
    mode='lines+markers'
)
```
![image](../../img/colab_explicit_reporting_04.png)
### 3D Plots
To plot a series as a 3-dimensional scatter plot, use the [Logger.report_scatter3d](../../references/sdk/logger.md#report_scatter3d) method.
```python
# report 3d scatter plot
scatter3d = np.random.randint(10, size=(10, 3))
logger.report_scatter3d(
    "example_scatter_3d",
    "series_xyz",
    iteration=iteration,
    scatter=scatter3d,
    xaxis="title x",
    yaxis="title y",
    zaxis="title z",
)
```
![image](../../img/colab_explicit_reporting_05.png)
To plot a series as a surface plot, use the [Logger.report_surface](../../references/sdk/logger.md#report_surface)
method.
```python
# report 3d surface
surface = np.random.randint(10, size=(10, 10))
logger.report_surface(
    "example_surface",
    "series1",
    iteration=iteration,
    matrix=surface,
    xaxis="title X",
    yaxis="title Y",
    zaxis="title Z",
)
```
![image](../../img/colab_explicit_reporting_06.png)
### Confusion matrices
Report confusion matrices by calling the [Logger.report_matrix](../../references/sdk/logger.md#report_matrix)
method.
```python
# report confusion matrix
confusion = np.random.randint(10, size=(10, 10))
logger.report_matrix(
    "example_confusion",
    "ignored",
    iteration=iteration,
    matrix=confusion,
    xaxis="title X",
    yaxis="title Y",
)
```
![image](../../img/colab_explicit_reporting_03.png)
### Histograms
Report histograms by calling the [Logger.report_histogram](../../references/sdk/logger.md#report_histogram)
method. To report more than one series on the same plot, use the same `title` argument.
```python
# report a single histogram
histogram = np.random.randint(10, size=10)
logger.report_histogram(
    "single_histogram",
    "random histogram",
    iteration=iteration,
    values=histogram,
    xaxis="title x",
    yaxis="title y",
)
```
![image](../../img/colab_explicit_reporting_12.png)
```python
# report two histograms on the same plot
histogram1 = np.random.randint(13, size=10)
histogram2 = histogram1 * 0.75
logger.report_histogram(
    "two_histogram",
    "series 1",
    iteration=iteration,
    values=histogram1,
    xaxis="title x",
    yaxis="title y",
)
logger.report_histogram(
    "two_histogram",
    "series 2",
    iteration=iteration,
    values=histogram2,
    xaxis="title x",
    yaxis="title y",
)
```
![image](../../img/colab_explicit_reporting_07.png)
## Media
Report audio, HTML, image, and video by calling the [Logger.report_media](../../references/sdk/logger.md#report_media)
method using the `local_path` parameter. They appear in **RESULTS** **>** **DEBUG SAMPLES**.
The media for these examples is downloaded using the [StorageManager.get_local_copy](../../references/sdk/storage.md#storagemanagerget_local_copy)
method.
For example, to download an image:
```python
image_local_copy = StorageManager.get_local_copy(
    remote_url="https://pytorch.org/tutorials/_static/img/neural-style/picasso.jpg",
    name="picasso.jpg"
)
```
### Audio
```python
logger.report_media('audio', 'pink panther', iteration=1, local_path=audio_local_copy)
```
![image](../../img/colab_explicit_reporting_08.png)
### HTML
```python
logger.report_media("html", "url_html", iteration=1, url="https://allegro.ai/docs/index.html")
```
![image](../../img/colab_explicit_reporting_09.png)
### Images
```python
logger.report_image("image", "image from url", iteration=100, local_path=image_local_copy)
```
![image](../../img/colab_explicit_reporting_10.png)
### Video
```python
logger.report_media('video', 'big bunny', iteration=1, local_path=video_local_copy)
```
![image](../../img/colab_explicit_reporting_11.png)
## Text
Report text messages by calling the [Logger.report_text](../../references/sdk/logger.md#report_text) method.
```python
logger.report_text("hello, this is plain text")
```
![image](../../img/colab_explicit_reporting_13.png)