From d24375299e0a1a504172e7cb1a80336e4e348231 Mon Sep 17 00:00:00 2001
From: pollfly <75068813+pollfly@users.noreply.github.com>
Date: Wed, 18 Aug 2021 10:03:41 +0300
Subject: [PATCH 1/4] Add VM connectivity FAQ (#37)

---
 docs/faq.md | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/docs/faq.md b/docs/faq.md
index b71fb78f..def29305 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -94,6 +94,7 @@ title: FAQ
 * [How do I bypass a proxy configuration to access my local ClearML Server?](#proxy-localhost)
 * [Trains is failing to update ClearML Server. I get an error 500 (or 400). How do I fix this?](#elastic_watermark)
 * [Why is my Trains Web-App (UI) not showing any data?](#web-ui-empty)
+* [Why can't I access my ClearML Server when I run my code in a virtual machine?](#vm_server)

 **ClearML Agent**

@@ -816,7 +817,7 @@ Do the following:
-**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?**
+**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?**

 The ClearML Server will return HTTP error responses (5XX, or 4XX) when some of its [backend components](deploying_clearml/clearml_server.md) are failing.

@@ -839,6 +840,28 @@ A likely indication of this situation can be determined by searching your clearm
 If your ClearML Web-App (UI) does not show anything, it may be an error authenticating with the server. Try clearing the application cookies for the site in your browser's developer tools.

+**Why can't I access my ClearML Server when I run my code in a virtual machine?**
+
+The network definitions inside a virtual machine (or container) are different from those of the host. The virtual machine's
+and the server machine's IP addresses are different, so you have to make sure that the machine that is executing the
+experiment can access the server's machine.
+
+Make sure to have an independent configuration file for the virtual machine where you are running your experiments.
+Edit the `api` section of your `clearml.conf` file and insert IP addresses of the server machine that are accessible
+from the VM. It should look something like this:
+
+```
+api {
+    web_server: http://192.168.1.2:8080
+    api_server: http://192.168.1.2:8008
+    credentials {
+        "access_key" = "KEY"
+        "secret_key" = "SECRET"
+    }
+}
+```
+
+
 ## ClearML Agent

 **How can I execute ClearML Agent without installing packages each time?**

From b2caa49be68a923d41b190fc1c0ff7afe8d9d6ee Mon Sep 17 00:00:00 2001
From: pollfly <75068813+pollfly@users.noreply.github.com>
Date: Wed, 18 Aug 2021 10:05:59 +0300
Subject: [PATCH 2/4] Add ClearML Session port details (#38)

---
 docs/apps/clearml_session.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/apps/clearml_session.md b/docs/apps/clearml_session.md
index d440b91b..fe9e1d42 100644
--- a/docs/apps/clearml_session.md
+++ b/docs/apps/clearml_session.md
@@ -12,12 +12,13 @@ in the UI and send it for long-term training on a remote machine.
 **If you are not that lucky**, this section is for you :)

 ## What does ClearML Session do?
-`clearml-session` is a feature that allows to launch a session of Jupyterlab and VS Code, and to execute code on a remote
+`clearml-session` is a feature that allows to launch a session of JupyterLab and VS Code, and to execute code on a remote
 machine that better meets resource needs. With this feature, local links are provided, which can be used to access
-JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection.
+JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection. By default, the JupyterLab and
+VS Code remote sessions use ports 8878 and 8898 respectively.
-Jupyter-Lab Window
+JupyterLab Window
 ![image](../img/session_jupyter.png)
@@ -138,7 +139,7 @@ The Task must be connected to a git repository, since currently single script de
 | Command line options | Description | Default value |
 |-----|---|---|
-| `--jupyter-lab` | Download a Jupyter-Lab environment | `true` |
+| `--jupyter-lab` | Download a JupyterLab environment | `true` |
 | `--vscode-server` | Download a VSCode environment | `true` |
 | `--public-ip` | Register the public IP of the remote machine (if you are running the session on a public cloud) | Session runs on the machine whose agent is executing the session|
 | `--init-script` | Specify a BASH init script file to be executed when the interactive session is being set up | `none` or previously entered BASH script |

From 94dc32a9638d37ec77c65824eb491654eee4eb26 Mon Sep 17 00:00:00 2001
From: pollfly <75068813+pollfly@users.noreply.github.com>
Date: Wed, 18 Aug 2021 10:29:24 +0300
Subject: [PATCH 3/4] add enabling/disabling docs for non-responsive watchdog (#39)

---
 docs/deploying_clearml/clearml_server_config.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/deploying_clearml/clearml_server_config.md b/docs/deploying_clearml/clearml_server_config.md
index 5eb4b600..3d15d9ed 100644
--- a/docs/deploying_clearml/clearml_server_config.md
+++ b/docs/deploying_clearml/clearml_server_config.md
@@ -300,6 +300,7 @@ the watchdog marks them as `aborted`. The non-responsive experiment watchdog is

 Modify the following settings for the watchdog:

+* Watchdog status - enabled / disabled
 * The time threshold (in seconds) of experiment inactivity (default value is 7200 seconds (2 hours)).
 * The time interval (in seconds) between watchdog cycles.

@@ -312,6 +313,8 @@ Modify the following settings for the watchdog:
 tasks {
     non_responsive_tasks_watchdog {
+        enabled: true
+
         # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
         threshold_sec: 7200

From 5b51117434468fa2bf17f6bd5ca13b6bedc3c719 Mon Sep 17 00:00:00 2001
From: Derek Chia
Date: Wed, 18 Aug 2021 18:45:40 +0800
Subject: [PATCH 4/4] Fix ClearML Agent initialization command (#40)

---
 docs/getting_started/mlops/mlops_first_steps.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/getting_started/mlops/mlops_first_steps.md b/docs/getting_started/mlops/mlops_first_steps.md
index 4b573b45..17fa158e 100644
--- a/docs/getting_started/mlops/mlops_first_steps.md
+++ b/docs/getting_started/mlops/mlops_first_steps.md
@@ -30,7 +30,7 @@ pip install clearml-agent
 Connect the Agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this:

 ```bash
-clearml-init
+clearml-agent init
 ```

 :::note