Merge branch 'allegroai:main' into main

pollfly
2021-08-18 15:42:29 +03:00
committed by GitHub
4 changed files with 33 additions and 6 deletions

@@ -12,12 +12,13 @@ in the UI and send it for long-term training on a remote machine.
**If you are not that lucky**, this section is for you :)
## What does ClearML Session do?
-`clearml-session` is a feature that allows to launch a session of Jupyterlab and VS Code, and to execute code on a remote
+`clearml-session` is a feature that allows to launch a session of JupyterLab and VS Code, and to execute code on a remote
machine that better meets resource needs. With this feature, local links are provided, which can be used to access
-JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection.
+JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection. By default, the JupyterLab and
+VS Code remote sessions use ports 8878 and 8898 respectively.
<details className="cml-expansion-panel screenshot">
-<summary className="cml-expansion-panel-summary">Jupyter-Lab Window</summary>
+<summary className="cml-expansion-panel-summary">JupyterLab Window</summary>
<div className="cml-expansion-panel-content">
![image](../img/session_jupyter.png)
@@ -138,7 +139,7 @@ The Task must be connected to a git repository, since currently single script de
| Command line options | Description | Default value |
|-----|---|---|
-| `--jupyter-lab` | Download a Jupyter-Lab environment | `true` |
+| `--jupyter-lab` | Download a JupyterLab environment | `true` |
| `--vscode-server` | Download a VSCode environment | `true` |
| `--public-ip` | Register the public IP of the remote machine (if you are running the session on a public cloud) | Session runs on the machine whose agent is executing the session|
| `--init-script` | Specify a BASH init script file to be executed when the interactive session is being set up | `none` or previously entered BASH script |

@@ -300,6 +300,7 @@ the watchdog marks them as `aborted`. The non-responsive experiment watchdog is
Modify the following settings for the watchdog:
* Watchdog status - enabled / disabled
* The time threshold (in seconds) of experiment inactivity (default value is 7200 seconds (2 hours)).
* The time interval (in seconds) between watchdog cycles.
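
As a rough illustration of the rule these settings control (not the ClearML Server's actual implementation), the Python sketch below flags in-progress tasks whose last update is at least the threshold old. The `Task` record, its field names, the status strings, and `find_non_responsive` are hypothetical names chosen for this example; only the 7200-second default comes from the text above.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD_SEC = 7200  # default from the settings above (2 hours)


@dataclass
class Task:
    # Hypothetical record; the field names are assumptions for this sketch.
    task_id: str
    status: str
    last_update: float  # seconds since the epoch


def find_non_responsive(tasks, now, threshold_sec=DEFAULT_THRESHOLD_SEC):
    """Return in-progress tasks not updated for at least threshold_sec seconds."""
    return [
        t for t in tasks
        if t.status == "in_progress" and now - t.last_update >= threshold_sec
    ]


now = 100_000.0
tasks = [
    Task("a", "in_progress", now - 8000),  # silent for longer than the threshold
    Task("b", "in_progress", now - 60),    # recently updated
    Task("c", "completed", now - 9000),    # old, but not in progress
]
stale_ids = [t.task_id for t in find_non_responsive(tasks, now)]
print(stale_ids)
```

A real watchdog would run a check like this once per cycle, at the interval set in the last bullet, and mark the returned tasks as `aborted`.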
@@ -312,6 +313,8 @@ Modify the following settings for the watchdog:
tasks {
non_responsive_tasks_watchdog {
enabled: true
# In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
threshold_sec: 7200

@@ -94,6 +94,7 @@ title: FAQ
* [How do I bypass a proxy configuration to access my local ClearML Server?](#proxy-localhost)
* [Trains is failing to update ClearML Server. I get an error 500 (or 400). How do I fix this?](#elastic_watermark)
* [Why is my Trains Web-App (UI) not showing any data?](#web-ui-empty)
* [Why can't I access my ClearML Server when I run my code in a virtual machine?](#vm_server)
**ClearML Agent**
@@ -816,7 +817,7 @@ Do the following:
<br/>
-**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?**
+**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?** <a id="elastic_watermark"></a>
The ClearML Server will return HTTP error responses (5XX, or 4XX) when some of its [backend components](deploying_clearml/clearml_server.md)
are failing.
@@ -839,6 +840,28 @@ A likely indication of this situation can be determined by searching your clearm
If your ClearML Web-App (UI) does not show anything, it may be an error authenticating with the server. Try clearing the application cookies for the site in your browser's developer tools.
**Why can't I access my ClearML Server when I run my code in a virtual machine?** <a id="vm_server"></a>
The network definitions inside a virtual machine (or container) are different from those of the host. The virtual machine's
and the server machine's IP addresses are different, so you have to make sure that the machine that is executing the
experiment can access the server's machine.
Make sure to have an independent configuration file for the virtual machine where you are running your experiments.
Edit the `api` section of your `clearml.conf` file and insert IP addresses of the server machine that are accessible
from the VM. It should look something like this:
```
api {
web_server: http://192.168.1.2:8080
api_server: http://192.168.1.2:8008
credentials {
"access_key" = "KEY"
"secret_key" = "SECRET"
}
}
```
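
Before editing `clearml.conf`, it can help to confirm that the addresses are actually reachable from inside the VM. The sketch below is one possible check using only the Python standard library; it is not part of ClearML. The helper names `host_and_port` and `is_reachable` are hypothetical, and the URLs are the placeholder addresses from the example above.

```python
import socket
from urllib.parse import urlparse

# Placeholder addresses copied from the example configuration above.
SERVERS = {
    "web_server": "http://192.168.1.2:8080",
    "api_server": "http://192.168.1.2:8008",
}


def host_and_port(url):
    """Split a URL into (host, port), using the scheme's default port if none is given."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def is_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Calling `is_reachable(url)` for each entry in `SERVERS` from within the VM shows whether the address is routable. A `False` result suggests the VM cannot reach that address, so a different server IP (or a change to the VM's network settings) is needed.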
## ClearML Agent
**How can I execute ClearML Agent without installing packages each time?** <a className="tr_top_negative" id="system_site_packages"></a>

@@ -30,7 +30,7 @@ pip install clearml-agent
Connect the Agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this:
```bash
-clearml-init
+clearml-agent init
```
:::note