Mirror of https://github.com/clearml/clearml-agent, synced 2025-06-26 18:16:15 +00:00
Update example (#177)
* Edit README
* Edit README
* small edits
* update example
* update example
* update example
parent dd5d24b0ca
commit 2c7f091e57
@@ -5,27 +5,30 @@
 "metadata": {},
 "source": [
 "# Auto-Magically Spin AWS EC2 Instances On Demand \n",
-"# and Create a Dynamic Cluster Running *Trains-Agent*\n",
+"# and Create a Dynamic Cluster Running *ClearML-Agent*\n",
 "\n",
-"### Define your budget and execute the notebook, that's it\n",
-"### You now have a fully managed cluster on AWS 🎉 🎊 "
+"## Define your budget and execute the notebook, that's it\n",
+"## You now have a fully managed cluster on AWS 🎉 🎊"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**trains-agent**'s main goal is to quickly pull a job from an execution queue, setup the environment (as defined in the experiment, including git cloning, python packages etc.) then execute the experiment and monitor it.\n",
+"**clearml-agent**'s main goal is to quickly pull a job from an execution queue, set up the environment (as defined in the experiment, including git cloning, python packages etc.), then execute the experiment and monitor it.\n",
 "\n",
 "This notebook defines a cloud budget (currently only AWS is supported, but feel free to expand with PRs), and spins an instance the minute a job is waiting for execution. It will also spin down idle machines, saving you some $$$ :)\n",
 "\n",
-"Configuration steps\n",
+"> **Note:**\n",
+"> This is just an example of how you can use ClearML Agent to implement custom autoscaling. For a more structured autoscaler script, see [here](https://github.com/allegroai/clearml/blob/master/clearml/automation/auto_scaler.py).\n",
+"\n",
+"Configuration steps:\n",
 "- Define maximum budget to be used (instance type / number of instances).\n",
-"- Create new execution *queues* in the **trains-server**.\n",
-"- Define mapping between the created the *queues* and an instance budget.\n",
+"- Create new execution *queues* in the **clearml-server**.\n",
+"- Define mapping between the created *queues* and an instance budget.\n",
 "\n",
 "**TL;DR - This notebook:**\n",
-"- Will spin instances if there are jobs in the execution *queues*, until it will hit the budget limit. \n",
+"- Will spin instances if there are jobs in the execution *queues* until it hits the budget limit.\n",
 "- If machines are idle, it will spin them down.\n",
 "\n",
 "The controller implementation itself is stateless, meaning you can always re-execute the notebook if it stops for some reason.\n",
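The TL;DR describes the scaling rule only in words. A minimal sketch of one way to compute how many instances a queue needs, assuming idle workers absorb waiting jobs before new machines are started; `instances_to_spin_up` and its parameters are hypothetical and not part of the notebook:

```python
def instances_to_spin_up(pending_jobs: int, running: int, idle: int, max_instances: int) -> int:
    """Hypothetical helper, not taken from the notebook.

    pending_jobs:  jobs waiting in the execution queue
    running:       instances already up for this queue (busy or idle)
    idle:          running instances not currently executing a job
    max_instances: budget limit for this queue
    """
    # Idle workers pick up waiting jobs first; only the remainder needs new machines.
    needed = max(pending_jobs - idle, 0)
    # Never exceed the budget: only its unused part can be spent.
    available = max(max_instances - running, 0)
    return min(needed, available)


# 5 waiting jobs, 2 instances up (1 of them idle), budget of 4 -> room for 2 more
print(instances_to_spin_up(pending_jobs=5, running=2, idle=1, max_instances=4))
# 2
```

Capping by `max_instances - running` is what makes the budget a hard ceiling: once a queue reaches its limit, further jobs simply wait.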
@@ -39,7 +42,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"##### Install & import required packages"
+"### Install & import required packages"
 ]
 },
 {
@@ -48,7 +51,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!pip install trains-agent\n",
+"!pip install clearml-agent\n",
 "!pip install boto3"
 ]
 },
@@ -56,7 +59,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"##### Define AWS instance types and configuration (Instance Type, EBS, AMI etc.)"
+"### Define AWS instance types and configuration (Instance Type, EBS, AMI etc.)"
 ]
 },
 {
@@ -92,17 +95,17 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"##### Define machine budget per execution queue\n",
+"### Define machine budget per execution queue\n",
 "\n",
-"Now that we defined our budget, we need to connect it with the **Trains** cluster.\n",
+"Now that we have defined our budget, we need to connect it with the **ClearML** cluster.\n",
 "\n",
 "We map each queue to a resource type (instance type).\n",
 "\n",
-"Create two queues in the WebUI:\n",
-"- Browse to http://your_trains_server_ip:8080/workers-and-queues/queues\n",
+"Create two queues in the Web UI:\n",
+"- Browse to http://your_clearml_server_ip:8080/workers-and-queues/queues\n",
 "- Then click on the \"New Queue\" button and name your queues \"aws_normal\" and \"aws_high\" respectively\n",
 "\n",
-"The QUEUES dictionary hold the mapping between the queue name and the type/number of instances to spin connected to the specific queue.\n",
+"The QUEUES dictionary holds the mapping between each queue name and the type/number of instances to spin up for that queue.\n",
 "```\n",
 "QUEUES = {\n",
 " 'queue_name': [(\"instance-type-as-defined-in-RESOURCE_CONFIGURATIONS\", max_number_of_instances), ]\n",
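The notebook's sanity-check cell raises "A resource name can only appear in a single queue definition." A minimal sketch of that kind of check on a QUEUES mapping, plus a lookup against RESOURCE_CONFIGURATIONS; the resource names, the `instance_type` key, and `check_queues` are hypothetical examples, not the notebook's actual values:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical example values; real names must match your own RESOURCE_CONFIGURATIONS.
RESOURCE_CONFIGURATIONS: Dict[str, dict] = {
    "aws_m4.large": {"instance_type": "m4.large"},  # hypothetical key layout
    "aws_g4dn.xlarge": {"instance_type": "g4dn.xlarge"},
}
QUEUES: Dict[str, List[Tuple[str, int]]] = {
    "aws_normal": [("aws_m4.large", 3)],
    "aws_high": [("aws_g4dn.xlarge", 2)],
}


def check_queues(queues: Dict[str, List[Tuple[str, int]]], resources: Dict[str, dict]) -> None:
    """Raise ValueError if the queue-to-budget mapping is inconsistent."""
    names = [name for entries in queues.values() for name, _ in entries]
    # Each resource type may be budgeted for a single queue only.
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError(
            "A resource name can only appear in a single queue definition: {}".format(duplicates)
        )
    # Every resource referenced by a queue must have a configuration entry.
    unknown = sorted(set(names) - set(resources))
    if unknown:
        raise ValueError("Not defined in RESOURCE_CONFIGURATIONS: {}".format(unknown))


check_queues(QUEUES, RESOURCE_CONFIGURATIONS)  # passes silently for the example above
```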
@@ -116,7 +119,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Trains-Agent Queues - Machines budget per Queue\n",
+"# ClearML Agent Queues - Machines budget per Queue\n",
 "# Per queue: list of (machine type as defined in RESOURCE_CONFIGURATIONS,\n",
 "# max instances for the specific queue). Order machines from most preferred to least.\n",
 "QUEUES = {\n",
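The comment says to order machines from most preferred to least. A sketch of how such an ordered list could be consumed when a queue needs a new machine, assuming the controller takes the first type that still has budget left; `pick_resource` and the resource names are hypothetical:

```python
from typing import Dict, List, Optional, Tuple


def pick_resource(queue_resources: List[Tuple[str, int]], running: Dict[str, int]) -> Optional[str]:
    """Hypothetical helper: choose the machine type to spin up for one queue.

    queue_resources: the queue's QUEUES entry, ordered most to least preferred
    running:         number of instances currently up per resource name
    """
    for resource, max_instances in queue_resources:
        # Take the first (most preferred) type that still has budget left.
        if running.get(resource, 0) < max_instances:
            return resource
    # Every machine type for this queue is already at its limit.
    return None


budget = [("aws_g4dn.xlarge", 1), ("aws_p3.2xlarge", 2)]
print(pick_resource(budget, {"aws_g4dn.xlarge": 1}))
# aws_p3.2xlarge
```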
@@ -129,7 +132,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"##### Credentials for your AWS account, as well as for your **Trains-Server**"
+"### Credentials for your AWS account, as well as for your **ClearML Server**"
 ]
 },
 {
@@ -143,24 +146,25 @@
 "CLOUD_CREDENTIALS_SECRET = \"\"\n",
 "CLOUD_CREDENTIALS_REGION = \"us-east-1\"\n",
 "\n",
-"# TRAINS configuration\n",
-"TRAINS_SERVER_WEB_SERVER = \"http://localhost:8080\"\n",
-"TRAINS_SERVER_API_SERVER = \"http://localhost:8008\"\n",
-"TRAINS_SERVER_FILES_SERVER = \"http://localhost:8081\"\n",
-"# TRAINS credentials\n",
-"TRAINS_ACCESS_KEY = \"\"\n",
-"TRAINS_SECRET_KEY = \"\"\n",
-"# Git User/Pass to be used by trains-agent,\n",
+"# CLEARML configuration\n",
+"CLEARML_WEB_SERVER = \"http://localhost:8080\"\n",
+"CLEARML_API_SERVER = \"http://localhost:8008\"\n",
+"CLEARML_FILES_SERVER = \"http://localhost:8081\"\n",
+"# CLEARML credentials\n",
+"CLEARML_API_ACCESS_KEY = \"\"\n",
+"CLEARML_API_SECRET_KEY = \"\"\n",
+"# Git User/Pass to be used by clearml-agent,\n",
 "# leave empty if image already contains git ssh-key\n",
-"TRAINS_GIT_USER = \"\"\n",
-"TRAINS_GIT_PASS = \"\"\n",
+"CLEARML_AGENT_GIT_USER = \"\"\n",
+"CLEARML_AGENT_GIT_PASS = \"\"\n",
 "\n",
-"# Additional fields for trains.conf file created on the remote instance\n",
+"# Additional fields for clearml.conf file created on the remote instance\n",
 "# for example: 'agent.default_docker.image: \"nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04\"'\n",
-"EXTRA_TRAINS_CONF = \"\"\"\n",
+"\n",
+"EXTRA_CLEARML_CONF = \"\"\"\n",
 "\"\"\"\n",
 "\n",
-"# Bash script to run on instances before running trains-agent\n",
+"# Bash script to run on instances before running clearml-agent\n",
 "# Example: \"\"\"\n",
 "# echo \"This is the first line\"\n",
 "# echo \"This is the second line\"\n",
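Several of these settings default to empty strings, and the controller only fails later when it talks to AWS or the ClearML server. A minimal pre-flight sketch over the setting names visible in this hunk; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical, and the git user/pass are left out because the notebook allows them to be empty:

```python
from typing import Dict, List

# Setting names as they appear in the credentials cell of this diff.
# CLEARML_AGENT_GIT_USER / CLEARML_AGENT_GIT_PASS are deliberately excluded:
# the notebook says to leave them empty if the image already has a git ssh-key.
REQUIRED_SETTINGS = (
    "CLOUD_CREDENTIALS_SECRET",
    "CLOUD_CREDENTIALS_REGION",
    "CLEARML_WEB_SERVER",
    "CLEARML_API_SERVER",
    "CLEARML_FILES_SERVER",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_settings(settings: Dict[str, object]) -> List[str]:
    """Hypothetical check: names of required settings that are unset or blank."""
    return [name for name in REQUIRED_SETTINGS if not str(settings.get(name) or "").strip()]
```

Inside the notebook, calling `missing_settings(globals())` after the credentials cell would list whatever is still empty.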
@@ -168,9 +172,9 @@
 "EXTRA_BASH_SCRIPT = \"\"\"\n",
 "\"\"\"\n",
 "\n",
-"# Default docker for trains-agent when running in docker mode (requires docker v19.03 and above). \n",
-"# Leave empty to run trains-agent in non-docker mode.\n",
-"DEFAULT_DOCKER_IMAGE = \"nvidia/cuda\""
+"# Default docker for clearml-agent when running in docker mode (requires docker v19.03 and above).\n",
+"# Leave empty to run clearml-agent in non-docker mode.\n",
+"CLEARML_AGENT_DOCKER_IMAGE = \"nvidia/cuda\""
 ]
 },
 {
@@ -192,7 +196,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"##### Import Packages and Budget Definition Sanity Check"
+"### Import Packages and Budget Definition Sanity Check"
 ]
 },
 {
@@ -209,7 +213,7 @@
 "from time import sleep, time\n",
 "\n",
 "import boto3\n",
-"from trains_agent.backend_api.session.client import APIClient"
+"from clearml_agent.backend_api.session.client import APIClient"
 ]
 },
 {
@@ -227,36 +231,36 @@
 " \"A resource name can only appear in a single queue definition.\"\n",
 " )\n",
 "\n",
-"# Encode EXTRA_TRAINS_CONF for later bash script usage\n",
-"EXTRA_TRAINS_CONF_ENCODED = \"\\\\\\\"\".join(EXTRA_TRAINS_CONF.split(\"\\\"\"))"
+"# Encode EXTRA_CLEARML_CONF for later bash script usage\n",
+"EXTRA_CLEARML_CONF_ENCODED = \"\\\\\\\"\".join(EXTRA_CLEARML_CONF.split(\"\\\"\"))"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"##### Cloud specific implementation of spin up/down - currently supports AWS only"
+"### Cloud specific implementation of spin up/down - currently supports AWS only"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
 "# Cloud-specific implementation (currently, only AWS EC2 is supported)\n",
 "def spin_up_worker(resource, worker_id_prefix, queue_name):\n",
 " \"\"\"\n",
-" Creates a new worker for trains.\n",
+" Creates a new worker for clearml.\n",
 " First, create an instance in the cloud and install some required packages.\n",
-" Then, define trains-agent environment variables and run \n",
-" trains-agent for the specified queue.\n",
+" Then, define clearml-agent environment variables and run\n",
+" clearml-agent for the specified queue.\n",
 " NOTE: - Will wait until instance is running\n",
 " - This implementation assumes the instance image already has docker installed\n",
 "\n",
 " :param str resource: resource name, as defined in BUDGET and QUEUES.\n",
 " :param str worker_id_prefix: worker name prefix\n",
-" :param str queue_name: trains queue to listen to\n",
+" :param str queue_name: clearml queue to listen to\n",
 " \"\"\"\n",
 " resource_conf = RESOURCE_CONFIGURATIONS[resource]\n",
 " # Add worker type and AWS instance type to the worker name.\n",
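`EXTRA_CLEARML_CONF_ENCODED` is substituted into `echo "{clearml_conf}" >> /root/clearml.conf` in the instance's user_data script, so each double quote must be escaped for bash. The split/join expression turns every `"` into `\"`; a small sketch (the function name is hypothetical) showing that it is equivalent to `str.replace`:

```python
def encode_for_double_quotes(text: str) -> str:
    # Same expression as the notebook: every '"' becomes '\"'.
    return "\\\"".join(text.split("\""))


conf = 'agent.default_docker.image: "nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04"'
encoded = encode_for_double_quotes(conf)
print(encoded)
# agent.default_docker.image: \"nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04\"

# The split/join form is equivalent to a plain replace.
assert encoded == conf.replace('"', '\\"')
```

Only double quotes are escaped. A `$`, backtick, or backslash inside EXTRA_CLEARML_CONF would still be interpreted by bash within the double-quoted `echo` argument.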
@@ -267,8 +271,8 @@
 " )\n",
 "\n",
 " # user_data script will automatically run when the instance is started. \n",
-" # It will install the required packages for trains-agent configure it using \n",
-" # environment variables and run trains-agent on the required queue\n",
+" # It will install the required packages for clearml-agent, configure it using\n",
+" # environment variables and run clearml-agent on the required queue\n",
 " user_data = \"\"\"#!/bin/bash\n",
 " sudo apt-get update\n",
 " sudo apt-get install -y python3-dev\n",
@@ -278,36 +282,36 @@
 " sudo apt-get install -y build-essential\n",
 " python3 -m pip install -U pip\n",
 " python3 -m pip install virtualenv\n",
-" python3 -m virtualenv trains_agent_venv\n",
-" source trains_agent_venv/bin/activate\n",
-" python -m pip install trains-agent\n",
-" echo 'agent.git_user=\\\"{git_user}\\\"' >> /root/trains.conf\n",
-" echo 'agent.git_pass=\\\"{git_pass}\\\"' >> /root/trains.conf\n",
-" echo \"{trains_conf}\" >> /root/trains.conf\n",
-" export TRAINS_API_HOST={api_server}\n",
-" export TRAINS_WEB_HOST={web_server}\n",
-" export TRAINS_FILES_HOST={files_server}\n",
+" python3 -m virtualenv clearml_agent_venv\n",
+" source clearml_agent_venv/bin/activate\n",
+" python -m pip install clearml-agent\n",
+" echo 'agent.git_user=\\\"{git_user}\\\"' >> /root/clearml.conf\n",
+" echo 'agent.git_pass=\\\"{git_pass}\\\"' >> /root/clearml.conf\n",
+" echo \"{clearml_conf}\" >> /root/clearml.conf\n",
+" export CLEARML_API_HOST={api_server}\n",
+" export CLEARML_WEB_HOST={web_server}\n",
+" export CLEARML_FILES_HOST={files_server}\n",
 " export DYNAMIC_INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`\n",
-" export TRAINS_WORKER_ID={worker_id}:$DYNAMIC_INSTANCE_ID\n",
-" export TRAINS_API_ACCESS_KEY='{access_key}'\n",
-" export TRAINS_API_SECRET_KEY='{secret_key}'\n",
+" export CLEARML_WORKER_ID={worker_id}:$DYNAMIC_INSTANCE_ID\n",
+" export CLEARML_API_ACCESS_KEY='{access_key}'\n",
+" export CLEARML_API_SECRET_KEY='{secret_key}'\n",
 " {bash_script}\n",
 " source ~/.bashrc\n",
-" python -m trains_agent --config-file '/root/trains.conf' daemon --queue '{queue}' {docker}\n",
+" python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue '{queue}' {docker}\n",
 " shutdown\n",
 " \"\"\".format(\n",
-" api_server=TRAINS_SERVER_API_SERVER,\n",
-" web_server=TRAINS_SERVER_WEB_SERVER,\n",
-" files_server=TRAINS_SERVER_FILES_SERVER,\n",
+" api_server=CLEARML_API_SERVER,\n",
+" web_server=CLEARML_WEB_SERVER,\n",
+" files_server=CLEARML_FILES_SERVER,\n",
 " worker_id=worker_id,\n",
-" access_key=TRAINS_ACCESS_KEY,\n",
-" secret_key=TRAINS_SECRET_KEY,\n",
+" access_key=CLEARML_API_ACCESS_KEY,\n",
+" secret_key=CLEARML_API_SECRET_KEY,\n",
 " queue=queue_name,\n",
-" git_user=TRAINS_GIT_USER,\n",
-" git_pass=TRAINS_GIT_PASS,\n",
-" trains_conf=EXTRA_TRAINS_CONF_ENCODED,\n",
+" git_user=CLEARML_AGENT_GIT_USER,\n",
+" git_pass=CLEARML_AGENT_GIT_PASS,\n",
+" clearml_conf=EXTRA_CLEARML_CONF_ENCODED,\n",
 " bash_script=EXTRA_BASH_SCRIPT,\n",
-" docker=\"--docker '{}'\".format(DEFAULT_DOCKER_IMAGE) if DEFAULT_DOCKER_IMAGE else \"\"\n",
+" docker=\"--docker '{}'\".format(CLEARML_AGENT_DOCKER_IMAGE) if CLEARML_AGENT_DOCKER_IMAGE else \"\"\n",
 " )\n",
 "\n",
 " ec2 = boto3.client(\n",
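The `docker=` argument decides whether the agent runs in docker or non-docker mode. A sketch of the resulting daemon command line, reusing the notebook's conditional expression; `docker_flag` and `daemon_command` are hypothetical helpers, and the `rstrip()` is an addition the notebook does not make (bash ignores the trailing space anyway):

```python
def docker_flag(image: str) -> str:
    # Same expression as the notebook: add --docker only when an image is configured;
    # an empty CLEARML_AGENT_DOCKER_IMAGE means non-docker mode.
    return "--docker '{}'".format(image) if image else ""


def daemon_command(queue: str, image: str, config_file: str = "/root/clearml.conf") -> str:
    """Hypothetical helper rebuilding the last command of the user_data script."""
    command = "python -m clearml_agent --config-file '{}' daemon --queue '{}' {}".format(
        config_file, queue, docker_flag(image)
    )
    # Drop the trailing space left when no --docker flag is added.
    return command.rstrip()


print(daemon_command("aws_high", "nvidia/cuda"))
# python -m clearml_agent --config-file '/root/clearml.conf' daemon --queue 'aws_high' --docker 'nvidia/cuda'
```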
@@ -405,7 +409,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"###### Controller Implementation and Logic"
+"#### Controller Implementation and Logic"
 ]
 },
 {
@@ -430,18 +434,18 @@
 "\n",
 " # Internal definitions\n",
 " workers_prefix = \"dynamic_aws\"\n",
-" # Worker's id in trains would be composed from:\n",
+" # Worker's ID in clearml is composed of:\n",
 " # prefix, name, instance_type and cloud_id separated by ':'\n",
 " workers_pattern = re.compile(\n",
 " r\"^(?P<prefix>[^:]+):(?P<name>[^:]+):(?P<instance_type>[^:]+):(?P<cloud_id>[^:]+)\"\n",
 " )\n",
 "\n",
-" # Set up the environment variables for trains\n",
-" os.environ[\"TRAINS_API_HOST\"] = TRAINS_SERVER_API_SERVER\n",
-" os.environ[\"TRAINS_WEB_HOST\"] = TRAINS_SERVER_WEB_SERVER\n",
-" os.environ[\"TRAINS_FILES_HOST\"] = TRAINS_SERVER_FILES_SERVER\n",
-" os.environ[\"TRAINS_API_ACCESS_KEY\"] = TRAINS_ACCESS_KEY\n",
-" os.environ[\"TRAINS_API_SECRET_KEY\"] = TRAINS_SECRET_KEY\n",
+" # Set up the environment variables for clearml\n",
+" os.environ[\"CLEARML_API_HOST\"] = CLEARML_API_SERVER\n",
+" os.environ[\"CLEARML_WEB_HOST\"] = CLEARML_WEB_SERVER\n",
+" os.environ[\"CLEARML_FILES_HOST\"] = CLEARML_FILES_SERVER\n",
+" os.environ[\"CLEARML_API_ACCESS_KEY\"] = CLEARML_API_ACCESS_KEY\n",
+" os.environ[\"CLEARML_API_SECRET_KEY\"] = CLEARML_API_SECRET_KEY\n",
 " api_client = APIClient()\n",
 "\n",
 " # Verify the requested queues exist and create those that don't exist\n",
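Each dynamic instance registers as `{worker_id}:$DYNAMIC_INSTANCE_ID`, and the controller recovers the EC2 instance ID through `workers_pattern`. A sketch applying the same regular expression to a made-up worker ID; the `dynamic_aws:aws_normal:m4.large` layout is an assumption, since the code that builds `worker_id` is outside this diff, and `parse_worker_id` is hypothetical:

```python
import re
from typing import Dict, Optional

# Same pattern as the controller: four ':'-separated fields.
WORKERS_PATTERN = re.compile(
    r"^(?P<prefix>[^:]+):(?P<name>[^:]+):(?P<instance_type>[^:]+):(?P<cloud_id>[^:]+)"
)


def parse_worker_id(worker_id: str) -> Optional[Dict[str, str]]:
    """Return the named fields, or None for workers the pattern does not match."""
    match = WORKERS_PATTERN.match(worker_id)
    return match.groupdict() if match else None


# Hypothetical worker ID with a made-up EC2 instance ID.
print(parse_worker_id("dynamic_aws:aws_normal:m4.large:i-0123456789abcdef0")["cloud_id"])
# i-0123456789abcdef0
print(parse_worker_id("my-laptop:gpu0"))
# None
```

Because the pattern has no `$` anchor, an ID with more than four `:`-separated fields still matches, and `cloud_id` is then the fourth field.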
@@ -520,7 +524,7 @@
 " # skip resource types that might be needed\n",
 " if resources in required_idle_resources:\n",
 " continue\n",
-" # Remove from both aws and trains all instances that are \n",
+" # Remove from both aws and clearml all instances that are\n",
 " # idle for longer than MAX_IDLE_TIME_MIN\n",
 " if time() - timestamp > MAX_IDLE_TIME_MIN * 60.0:\n",
 " cloud_id = workers_pattern.match(worker.id)[\"cloud_id\"]\n",
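The controller stops an instance once it has been idle for more than `MAX_IDLE_TIME_MIN` minutes, skipping resource types that waiting jobs may still need. A sketch of the timeout comparison only, assuming a dict of idle-since timestamps in seconds (the notebook's own bookkeeping is outside this hunk); `idle_workers_to_stop` is hypothetical:

```python
from time import time
from typing import Dict, List, Optional


def idle_workers_to_stop(
    idle_since: Dict[str, float], max_idle_time_min: float, now: Optional[float] = None
) -> List[str]:
    """Hypothetical helper: worker IDs idle for longer than max_idle_time_min minutes.

    idle_since maps worker ID -> timestamp (seconds) at which the worker became idle.
    """
    now = time() if now is None else now
    # Same comparison as the controller: idle seconds vs. minutes * 60.
    return sorted(w for w, since in idle_since.items() if now - since > max_idle_time_min * 60.0)


# 5-minute limit at t=1000 s: w1 has been idle 900 s, w2 only 100 s.
print(idle_workers_to_stop({"w1": 100.0, "w2": 900.0}, max_idle_time_min=5, now=1000.0))
# ['w1']
```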
@@ -535,7 +539,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"##### Execute Forever* (the controller is stateless, so you can always re-execute the notebook)"
+"### Execute Forever* (the controller is stateless, so you can always re-execute the notebook)"
 ]
 },
 {
|