Add CI/CD example (#815)

* Add clearml ci/cd example

* Add section on how to set up CI/CD example
Victor Sonck 2022-11-08 16:29:49 +01:00 committed by GitHub
parent 18a4065b2b
commit 51af6e833d
9 changed files with 320 additions and 0 deletions

examples/cicd/README.md Normal file
@@ -0,0 +1,49 @@
# GitHub CI/CD Examples
![Green is Good](images/checks_green.png)
This repository serves as an example of how you can use various ClearML features to help with common CI/CD tasks.
Three distinct jobs are shown in `run_clearml_checks.yml`, each of which has a corresponding Python file containing most of the logic.
## Authentication
### ClearML
To connect to the ClearML server and fetch the necessary data, the GitHub runner needs to be authenticated. You can do that by adding the following keys to your GitHub Secrets:
```
CLEARML_API_ACCESS_KEY
CLEARML_API_SECRET_KEY
CLEARML_API_HOST
```
You can find the values for each of these keys either in the `clearml.conf` file that was generated by running `clearml-init` (if you already have credentials), or by creating new credentials in the web UI.
![WebUI Credentials Screenshot](images/credentials.png)
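Before wiring up the jobs, it can be worth checking that all three values actually reach the runner's environment. Below is a minimal sketch (not part of this example's files); the variable names are the same keys listed above:
```python
# Minimal sanity check (sketch): confirm the three ClearML secrets are
# exposed as environment variables on the runner before any job logic runs.
import os

for key in ("CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY", "CLEARML_API_HOST"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing required environment variable: {key}")
```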
### GitHub
We'll also need a GitHub authentication token if we want to be able to post a new comment on the open PR (only needed for the first job).
You can find more info about creating a GitHub personal access token [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).
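To verify the token before relying on it in the workflow, here is a short sketch using the same `github3.py` library the scripts below use. It assumes the token is exported as `GH_TOKEN`, as in `run_clearml_checks.yml`:
```python
# Sketch: verify that the GH_TOKEN secret can authenticate against GitHub.
import os

from github3 import login

gh = login(token=os.getenv("GH_TOKEN"))
print(f"Authenticated as: {gh.me().login}")
```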
## Setup Workflows
To run this workflow, copy the contents of this `cicd` example folder into a new git repository. Move `run_clearml_checks.yml` to `.github/workflows/run_clearml_checks.yml` so GitHub can pick it up, as shown in the layout sketch below. Then add your ClearML and, optionally, your GitHub credentials to the repository secrets, using the keys specified above or in `run_clearml_checks.yml`. Finally, open a PR on the new repository with a change, and watch the GitHub Actions spring to life!
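After moving things around, the new repository should look roughly like this (layout inferred from the files in this example):
```
your-repo/
├── .github/
│   └── workflows/
│       └── run_clearml_checks.yml
├── check_clearml_task_running.py
├── compare_models.py
├── task_stats_to_comment.py
├── task.py
└── requirements.txt
```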
## Job 1: Add scalars to an open PR
### When to use
Imagine you have a certain model training experiment versioned in git, and you created a new feature on a side branch. Now, when you open a PR to merge that branch, you want to make sure that this code has at least one successful task run in ClearML. To make that visible, we can use the SDK to get the latest model metric from that specific ClearML task and automatically post it on the PR as a comment, with a link to the original experiment in ClearML, of course.
### Technical details
The job simply starts a GitHub Actions instance and runs `task_stats_to_comment.py`, so the actual logic is all contained in that Python script. Make sure you include this script in your repository if you want to add this job to your own CI/CD.
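The two SDK calls doing the heavy lifting, in isolation (a sketch with a hypothetical task ID; the real script looks the task up by commit hash instead):
```python
# Sketch: fetch a task and read its latest scalar metrics.
from clearml import Task

task = Task.get_task(task_id="abc123")  # hypothetical ID; the script queries by commit
print(task.get_last_scalar_metrics())   # nested dict: {title: {series: {last/min/max}}}
print(task.get_output_log_web_page())   # link back to the experiment in ClearML
```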
## Job 2: Compare model performance
### When to use
The second job is similar to the first, but now we want to make sure that we never merge a code change that will worsen the model's performance. So we can again get the ClearML task corresponding to the current PR, but this time compare the model metrics to those of the previous best ClearML task. We'll only allow the pipeline to succeed if the metrics are equal or better. In this way, we can guarantee the quality of our main branch.
### Technical details
Similar to Job 1, we have put all the logic into the `compare_models.py` file. Please note: this script imports a function from Job 1's script, so if you only want to include this job in your own project, make sure to copy that function over as well.
## Job 3: Check if code is remotely runnable by the ClearML Agent
### When to use
Usually it's a good idea to develop your code on your local computer and only later use the ClearML agent to remotely train the model for real. To make sure this is always possible, we can automatically set up the code from the PR on a ClearML agent and listen to the output. If the agent starts reporting iterations, it means the code is remotely runnable without issues. With this check, we can make sure that every commit on our main branch is ready for remote training.
### Technical details
In this job, we run one more command apart from the accompanying Python file (`check_clearml_task_running.py`): the `clearml-task` command. We can use this command to remotely launch an existing repository. In this case, we remotely launch the `task.py` file and then capture the task ID from the console output using [ripgrep](https://github.com/BurntSushi/ripgrep). We then pass the task ID to the Python script, which polls its status and progress.
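If you want to try the polling half locally, you can call the same function the workflow uses. A sketch, assuming a hypothetical task ID from a previous `clearml-task` launch:
```python
# Sketch: poll an already-launched ClearML task outside of CI.
from check_clearml_task_running import check_task_status

check_task_status("abc123", timeout=300)  # "abc123" is a hypothetical task ID
```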

examples/cicd/check_clearml_task_running.py Normal file
@@ -0,0 +1,35 @@
import sys
import time

from clearml import Task


def check_task_status(task_id, timeout=600):
    """Make sure the task can run by checking for iteration reports."""
    # Get the task object
    task = Task.get_task(task_id=task_id)
    start_time = time.time()
    if task:
        while time.time() - start_time < timeout:
            task_status = task.get_status()
            print(task_status)
            print(task.get_last_iteration())
            if task_status == 'queued':
                # If queued, just reset the timeout timer
                start_time = time.time()
            if task_status in ['failed', 'stopped']:
                raise ValueError("Task did not run correctly, check logs in webUI.")
            elif task_status == 'in_progress':
                # Try to get the first iteration metric
                if task.get_last_iteration() > 0:
                    # At least one iteration was reported, so the code is remotely
                    # runnable. Stop and archive the task to keep the workspace clean.
                    task.mark_stopped()
                    task.set_archived(True)
                    return True
            time.sleep(5)
        raise ValueError('Triggered Timeout!')
    else:
        # Raise instead of returning a message, so the CI step fails visibly.
        raise ValueError(f"Can not find task {task_id}")


if __name__ == '__main__':
    check_task_status(sys.argv[1])

examples/cicd/compare_models.py Normal file
@@ -0,0 +1,30 @@
import os

from clearml import Task

from task_stats_to_comment import get_clearml_task_of_current_commit


def compare_and_tag_task(commit_hash):
    """Compare current performance to best previous performance and only allow equal or better."""
    current_task = get_clearml_task_of_current_commit(commit_hash)
    best_task = Task.get_task(project_name='Github CICD Video', task_name='cicd_test', tags=['Best Performance'])
    if best_task:
        best_metric = max(
            best_task.get_reported_scalars().get('Performance Metric').get('Series 1').get('y')
        )
        current_metric = max(
            current_task.get_reported_scalars().get('Performance Metric').get('Series 1').get('y')
        )
        print(f"Best metric in the system is: {best_metric} and current metric is {current_metric}")
        if current_metric >= best_metric:
            print("This means current metric is better or equal! Tagging as such.")
            current_task.add_tags(['Best Performance'])
        else:
            print("This means current metric is worse! Not tagging.")
    else:
        # No best task exists yet: the current task becomes the baseline.
        current_task.add_tags(['Best Performance'])


if __name__ == '__main__':
    print(f"Running on commit hash: {os.getenv('COMMIT_ID')}")
    compare_and_tag_task(os.getenv('COMMIT_ID'))

examples/cicd/task.py Normal file
@@ -0,0 +1,23 @@
"""This is a dummy ClearML task. This should be replaced by your own experiment code."""
import time
import random
from clearml import Task
from tqdm import tqdm
task = Task.init(
project_name='Github CICD Video',
task_name='dummy_task',
reuse_last_task_id=False
)
random.seed()
for i in tqdm(range(10)):
task.get_logger().report_scalar(
title="Performance Metric",
series="Series 1",
iteration=i,
value=random.randint(0, 100)
)
time.sleep(1)

examples/cicd/images/checks_green.png Normal file
Binary file not shown (168 KiB).

examples/cicd/images/credentials.png Normal file
Binary file not shown (60 KiB).

examples/cicd/requirements.txt Normal file
@@ -0,0 +1,5 @@
tqdm==4.64.1
clearml==1.7.2
github3.py==3.2.0
tabulate==0.9.0
pandas==1.5.1

examples/cicd/run_clearml_checks.yml Normal file
@@ -0,0 +1,73 @@
name: ClearML Checks
on:
  pull_request:
    branches: [ main ]
    types: [ assigned, opened, edited, reopened, synchronize ]

jobs:
  task-stats-to-comment:
    env:
      CLEARML_API_ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
      CLEARML_API_SECRET_KEY: ${{ secrets.SECRET_KEY }}
      CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
      GH_TOKEN: ${{ secrets.GH_TOKEN }}
      COMMIT_ID: ${{ github.event.pull_request.head.sha }}
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install ClearML
        run: |
          python -m pip install --upgrade pip clearml pandas tabulate github3.py Jinja2
      - name: Start the task
        id: launch_task
        run: |
          python task_stats_to_comment.py

  compare-models:
    env:
      CLEARML_API_ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
      CLEARML_API_SECRET_KEY: ${{ secrets.SECRET_KEY }}
      CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
      COMMIT_ID: ${{ github.event.pull_request.head.sha }}
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install ClearML
        run: |
          python -m pip install --upgrade pip clearml pandas tabulate github3.py Jinja2
      - name: Start the task
        id: launch_task
        run: |
          python compare_models.py

  test-remote-runnable:
    env:
      CLEARML_API_ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
      CLEARML_API_SECRET_KEY: ${{ secrets.SECRET_KEY }}
      CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install ClearML
        run: |
          python -m pip install --upgrade pip clearml
          sudo apt-get update && sudo apt-get install -y ripgrep
      - name: Start the task
        id: launch_task
        run: |
          echo "TASK_ID=$(
            clearml-task --project 'Github CICD Video' --name cicd_test --branch ${{ github.head_ref }} --script task.py --requirements requirements.txt --skip-task-init --queue default | rg -o 'Task id=(.*) sent' -r '$1'
          )" >> $GITHUB_OUTPUT
      - name: Poll for task progress
        run: python check_clearml_task_running.py "${{ steps.launch_task.outputs.TASK_ID }}"

examples/cicd/task_stats_to_comment.py Normal file
@@ -0,0 +1,105 @@
import json
import os

import pandas as pd
from clearml import Task
from github3 import login
from tabulate import tabulate


def create_output_tables(retrieve_scalars_dict):
    """Extract data from ClearML into format for tabulation."""
    data = []
    for graph_title, graph_values in retrieve_scalars_dict.items():
        graph_data = []
        for series, series_values in graph_values.items():
            graph_data.append((graph_title, series, *series_values.values()))
        data += graph_data
    return sorted(data, key=lambda output: (output[0], output[1]))


def create_comment_output(task, status):
    """Create a markdown table from a ClearML task's output scalars."""
    retrieve_scalars_dict = task.get_last_scalar_metrics()
    if retrieve_scalars_dict:
        scalars_tables = create_output_tables(retrieve_scalars_dict)
        df = pd.DataFrame(data=scalars_tables, columns=["Title", "Series", "Last", "Min", "Max"])
        df.style.set_caption(f"Last scalars metrics for task {task.task_id}, task status {status}")
        table = tabulate(df, tablefmt="github", headers="keys", showindex=False)
        return table


def create_stats_comment(project_stats):
    """Create a comment on the current PR containing the ClearML task stats."""
    payload_fname = os.getenv('GITHUB_EVENT_PATH')
    with open(payload_fname, 'r') as f:
        payload = json.load(f)
        print(payload)
    owner, repo = payload.get("repository", {}).get("full_name", "").split("/")
    if owner and repo:
        gh = login(token=os.getenv("GH_TOKEN"))
        if gh:
            pull_request = gh.pull_request(owner, repo, payload.get("number"))
            if pull_request:
                pull_request.create_comment(project_stats)
            else:
                print(f'Can not comment PR, {payload.get("number")}')
        else:
            print(f"Can not log in to gh, {os.getenv('GH_TOKEN')}")


def get_task_stats(task):
    """Get the comment markdown for a stats table based on the task object."""
    task_status = task.get_status()
    # Try to get the task stats
    if task_status == "completed":
        table = create_comment_output(task, task_status)
        if table:
            return f"Results\n\n{table}\n\n" \
                   f"You can view full task results [here]({task.get_output_log_web_page()})"
        else:
            return (f"Something went wrong when creating the task table. "
                    f"Check full task [here]({task.get_output_log_web_page()})")
    # Update the user about the task status, can not get any stats
    else:
        return f"Task is in {task_status} status, this should not happen!"


def get_clearml_task_of_current_commit(commit_id):
    """Find the ClearML task that corresponds to the exact codebase in the commit ID."""
    # Get the ID and diff of all tasks based on the current commit hash, ordered by newest
    tasks = Task.query_tasks(
        task_filter={
            'order_by': ['-last_update'],
            '_all_': dict(fields=['script.version_num'],
                          pattern=commit_id),
            'status': ['completed']
        },
        additional_return_fields=['script.diff']
    )
    # If there are tasks, check which one has no diff: aka which one was run with the exact
    # code that is staged in this PR.
    if tasks:
        for task in tasks:
            if not task['script.diff']:
                return Task.get_task(task_id=task['id'])
    # If no task was run yet with the exact PR code, raise an error and block the PR.
    raise ValueError("No task based on this code was found in ClearML. "
                     "Make sure to run it at least once before merging.")


if __name__ == '__main__':
    # Main check: Does a ClearML task exist for this specific commit?
    print(f"Running on commit hash: {os.getenv('COMMIT_ID')}")
    task_obj = get_clearml_task_of_current_commit(os.getenv('COMMIT_ID'))
    # If the task exists, we can tag it as such, so we know in the interface which one it is.
    task_obj.add_tags(['main_branch'])
    # Let's also add the task metrics to the PR automatically.
    # Get the metrics from the task and create a comment on the PR.
    stats = get_task_stats(task_obj)
    create_stats_comment(stats)