Add CI/CD example (#815)
* Add clearml ci/cd example
* Add section on how to set up CI/CD example
parent 18a4065b2b
commit 51af6e833d
49 examples/cicd/README.md Normal file
@@ -0,0 +1,49 @@
# GitHub CI/CD Examples

![Checks green](images/checks_green.png)

This repository serves as an example of how one can use various ClearML features to help with common CI/CD tasks.

3 distinct jobs are shown in `run_clearml_checks.yml`, each with a corresponding Python file in which most of the logic resides.

## Authentication

### ClearML

In order to connect to the ClearML server and fetch the necessary data, the GitHub runner needs to be authenticated. You can do that by adding the following keys to your GitHub Secrets:

```
CLEARML_API_ACCESS_KEY
CLEARML_API_SECRET_KEY
CLEARML_API_HOST
```
You can find the values for each of these keys either in the `clearml.conf` file that was generated by running `clearml-init` (if you already have keys), or by creating new credentials in the webUI.
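To sanity-check these values before locking them into GitHub Secrets, you could run a snippet like the one below locally, with the same three environment variables set. This is our own illustration rather than part of the example; any cheap authenticated SDK call would do.

```
# Hypothetical local sanity check for the ClearML credentials.
import os

from clearml import Task

for key in ("CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY", "CLEARML_API_HOST"):
    assert os.getenv(key), f"{key} is not set"

# Any authenticated SDK call confirms the keys work; here we just list a few tasks.
print(Task.get_tasks()[:3])
```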
![Credentials in webUI](images/credentials.png)

### GitHub

We'll need a GitHub authentication token too, if we want to be able to post a new comment on the open PR (only needed for the first job).
You can find more info about creating a GitHub personal access token [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).
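As with the ClearML keys, you can verify the token locally before adding it as a secret. Below is a minimal sketch using github3.py, the same library the first job relies on; it assumes the token is exported as `GH_TOKEN`.

```
# Hypothetical token check: me() returns the authenticated user if the token is valid.
import os

from github3 import login

gh = login(token=os.getenv("GH_TOKEN"))
print(gh.me())
```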
## Setup Workflows
In order to run this workflow, copy the contents of this `cicd` example folder into a new git repository. Move `run_clearml_checks.yml` to `.github/workflows/run_clearml_checks.yml` so GitHub can pick it up (see the layout sketch below). Then add your ClearML and, optionally, your GitHub credentials to the repository secrets, using the keys specified above or in `run_clearml_checks.yml`. Finally, open a PR on the new repository with a change, and watch the GitHub Actions spring to life!
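For reference, the resulting repository should look roughly like this (our sketch of the intended layout; the `images/` folder is only needed if you keep this README):

```
.github/
    workflows/
        run_clearml_checks.yml
check_remotely_runnable.py
compare_models.py
example_task.py
requirements.txt
task_stats_to_comment.py
```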
## Job 1: Add scalars to an open PR

### When to use
Imagine you have a certain model training experiment versioned in git and you have created a new feature on a side branch. When you open a PR to merge that branch, you want to make sure that this code has at least one successful task run in ClearML. To make that visible, we can use the SDK to get the latest model metrics from that specific ClearML task and automatically post them on the PR as a comment, with a link to the original experiment in ClearML of course.
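The core of this job boils down to two SDK calls: find the task, then read its last reported scalars. A minimal sketch of the idea, using a hypothetical task ID (the real script, included in this example, resolves the task from the commit hash instead):

```
# Sketch only: "abc123" is a placeholder task ID.
from clearml import Task

task = Task.get_task(task_id="abc123")
# Returns a dict like {metric_title: {series: {"last": ..., "min": ..., "max": ...}}}
print(task.get_last_scalar_metrics())
```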
### Technical details

The job simply starts a GitHub Actions instance and runs `task_stats_to_comment.py`, so the actual logic is all contained in that Python script. Make sure you include this script in your repository if you want to add this job to your own CI/CD.

## Job 2: Compare model performance

### When to use

The second job is similar to the first, but now we want to make sure that we never merge a code change that worsens the model's performance. So we can again get the ClearML task corresponding to the current PR, but this time compare the model metrics to those of the previous best ClearML task. We only allow the pipeline to succeed if the metrics are equal or better. In this way we can guarantee the quality of our main branch.

### Technical details

Similarly to Job 1, we have put all the logic into the `compare_models.py` file. Please note: this script imports a function from Job 1, so if you only want to include this job in your own project, make sure to copy that function over as well.

## Job 3: Check if code is remotely runnable by the ClearML Agent

### When to use

Usually it's a good idea to develop your code on your local computer and only later use the ClearML agent to remotely train the model for real. To make sure this is always possible, we can automatically set up the code from the PR on a ClearML agent and listen to the output. If the agent starts reporting iterations, it means the code is remotely runnable without issues. With this check, we can make sure that every commit on our main branch is ready for remote training.

### Technical details
In this job, we run one more command apart from the accompanying Python file (`check_remotely_runnable.py`): the `clearml-task` command. We can use this command to remotely launch an existing repository. In this case we remotely launch the `example_task.py` file and then capture the task ID from the console output using [ripgrep](https://github.com/BurntSushi/ripgrep). Then we can pass the task ID to the Python script to poll its status and progress.
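As an aside, if you'd rather not scrape console output with ripgrep, the same launch can be sketched with the SDK instead of the CLI. `Task.create` and `Task.enqueue` are standard ClearML SDK calls, but this variant is our own illustration, not what the workflow actually runs; the repository URL and branch below are placeholders.

```
# Hypothetical SDK-based alternative to the clearml-task + ripgrep combination.
from clearml import Task

task = Task.create(
    project_name='Github CICD Video',
    task_name='cicd_test',
    repo='https://github.com/your-org/your-repo',  # placeholder repository
    branch='your-pr-branch',                       # placeholder branch
    script='example_task.py',
    requirements_file='requirements.txt',
    add_task_init_call=False,  # equivalent to --skip-task-init
)
Task.enqueue(task, queue_name='default')
print(task.id)  # the task ID is available directly, no console scraping needed
```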
35 examples/cicd/check_remotely_runnable.py Normal file
@@ -0,0 +1,35 @@
import sys
import time

from clearml import Task


def check_task_status(task_id, timeout=600):
    """Make sure the task can run by checking for iteration reports."""
    # Get the task object
    task = Task.get_task(task_id=task_id)
    start_time = time.time()
    if task:
        while time.time() - start_time < timeout:
            task_status = task.get_status()
            print(task_status)
            print(task.get_last_iteration())

            if task_status == 'queued':
                # If queued, just reset the timeout timer
                start_time = time.time()
            elif task_status in ['failed', 'stopped']:
                raise ValueError("Task did not run correctly, check logs in webUI.")
            elif task_status == 'in_progress':
                # Try to get the first iteration metric
                if task.get_last_iteration() > 0:
                    # The task is reporting iterations, so the code runs remotely:
                    # stop and archive it, we only needed the check
                    task.mark_stopped()
                    task.set_archived(True)
                    return True
            time.sleep(5)
        raise ValueError('Triggered timeout: the task never started reporting iterations.')
    else:
        # Fail loudly so the CI job does not silently pass
        raise ValueError(f"Can not find task {task_id}")


if __name__ == '__main__':
    check_task_status(sys.argv[1])
30 examples/cicd/compare_models.py Normal file
@@ -0,0 +1,30 @@
import os

from clearml import Task

from task_stats_to_comment import get_clearml_task_of_current_commit


def compare_and_tag_task(commit_hash):
    """Compare current performance to best previous performance and only allow equal or better."""
    current_task = get_clearml_task_of_current_commit(commit_hash)
    # Get the previous best task, if any, by looking for the tag
    best_task = Task.get_task(project_name='Github CICD Video', task_name='cicd_test', tags=['Best Performance'])
    if best_task:
        best_metric = max(
            best_task.get_reported_scalars().get('Performance Metric').get('Series 1').get('y')
        )
        current_metric = max(
            current_task.get_reported_scalars().get('Performance Metric').get('Series 1').get('y')
        )
        print(f"Best metric in the system is: {best_metric} and current metric is {current_metric}")
        if current_metric >= best_metric:
            print("This means current metric is better or equal! Tagging as such.")
            current_task.add_tags(['Best Performance'])
        else:
            print("This means current metric is worse! Not tagging.")
            # Fail the job so the PR check blocks merging (see README)
            raise ValueError("Current metric is worse than the best recorded metric.")
    else:
        # No previous best task exists, so the current one is the best by default
        current_task.add_tags(['Best Performance'])


if __name__ == '__main__':
    print(f"Running on commit hash: {os.getenv('COMMIT_ID')}")
    compare_and_tag_task(os.getenv('COMMIT_ID'))
23 examples/cicd/example_task.py Normal file
@@ -0,0 +1,23 @@
"""This is a dummy ClearML task. This should be replaced by your own experiment code."""
import time
import random

from clearml import Task
from tqdm import tqdm


task = Task.init(
    project_name='Github CICD Video',
    task_name='dummy_task',
    reuse_last_task_id=False
)

random.seed()

for i in tqdm(range(10)):
    # Report a random value as a dummy performance metric for each iteration
    task.get_logger().report_scalar(
        title="Performance Metric",
        series="Series 1",
        iteration=i,
        value=random.randint(0, 100)
    )
    time.sleep(1)
BIN examples/cicd/images/checks_green.png Normal file (binary file not shown, 168 KiB)
BIN examples/cicd/images/credentials.png Normal file (binary file not shown, 60 KiB)
5 examples/cicd/requirements.txt Normal file
@@ -0,0 +1,5 @@
tqdm==4.64.1
clearml==1.7.2
github3.py==3.2.0
tabulate==0.9.0
pandas==1.5.1
73 examples/cicd/run_clearml_checks.yml Normal file
@@ -0,0 +1,73 @@
name: ClearML Checks
on:
  pull_request:
    branches: [ main ]
    types: [ assigned, opened, edited, reopened, synchronize ]

jobs:
  task-stats-to-comment:
    env:
      CLEARML_API_ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
      CLEARML_API_SECRET_KEY: ${{ secrets.SECRET_KEY }}
      CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
      GH_TOKEN: ${{ secrets.GH_TOKEN }}
      COMMIT_ID: ${{ github.event.pull_request.head.sha }}
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install ClearML
        run: |
          python -m pip install --upgrade pip clearml pandas tabulate github3.py Jinja2
      - name: Start the task
        id: launch_task
        run: |
          python task_stats_to_comment.py

  compare-models:
    env:
      CLEARML_API_ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
      CLEARML_API_SECRET_KEY: ${{ secrets.SECRET_KEY }}
      CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
      COMMIT_ID: ${{ github.event.pull_request.head.sha }}
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install ClearML
        run: |
          python -m pip install --upgrade pip clearml pandas tabulate github3.py Jinja2
      - name: Start the task
        id: launch_task
        run: |
          python compare_models.py

  test-remote-runnable:
    env:
      CLEARML_API_ACCESS_KEY: ${{ secrets.ACCESS_KEY }}
      CLEARML_API_SECRET_KEY: ${{ secrets.SECRET_KEY }}
      CLEARML_API_HOST: ${{ secrets.CLEARML_API_HOST }}
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install ClearML
        run: |
          python -m pip install --upgrade pip clearml
          sudo apt-get update && sudo apt-get install -y ripgrep
      - name: Start the task
        id: launch_task
        run: |
          echo "TASK_ID=$(
            clearml-task --project 'Github CICD Video' --name cicd_test --branch ${{ github.head_ref }} --script example_task.py --requirements requirements.txt --skip-task-init --queue default | rg -o 'Task id=(.*) sent' -r '$1'
          )" >> $GITHUB_OUTPUT
      - name: Poll for task progress
        run: python check_remotely_runnable.py "${{ steps.launch_task.outputs.TASK_ID }}"
105 examples/cicd/task_stats_to_comment.py Normal file
@@ -0,0 +1,105 @@
import json
import os

import pandas as pd
from clearml import Task
from github3 import login
from tabulate import tabulate


def create_output_tables(retrieve_scalars_dict):
    """Extract data from ClearML into a format ready for tabulation."""
    data = []
    for graph_title, graph_values in retrieve_scalars_dict.items():
        graph_data = []
        for series, series_values in graph_values.items():
            graph_data.append((graph_title, series, *series_values.values()))
        data += graph_data
    return sorted(data, key=lambda output: (output[0], output[1]))


def create_comment_output(task, status):
    """Create a markdown table from a ClearML task's output scalars."""
    retrieve_scalars_dict = task.get_last_scalar_metrics()
    if retrieve_scalars_dict:
        scalars_tables = create_output_tables(retrieve_scalars_dict)
        df = pd.DataFrame(data=scalars_tables, columns=["Title", "Series", "Last", "Min", "Max"])
        # Prepend the caption to the table itself (a discarded df.style call has no effect)
        caption = f"Last scalar metrics for task {task.task_id}, task status {status}"
        table = tabulate(df, tablefmt="github", headers="keys", showindex=False)
        return f"{caption}\n\n{table}"


def create_stats_comment(project_stats):
    """Create a comment on the current PR containing the ClearML task stats."""
    payload_fname = os.getenv('GITHUB_EVENT_PATH')
    with open(payload_fname, 'r') as f:
        payload = json.load(f)
        print(payload)
    # partition always yields 3 parts, so unpacking is safe even on an empty string
    owner, _, repo = payload.get("repository", {}).get("full_name", "").partition("/")
    if owner and repo:
        gh = login(token=os.getenv("GH_TOKEN"))
        if gh:
            pull_request = gh.pull_request(owner, repo, payload.get("number"))
            if pull_request:
                pull_request.create_comment(project_stats)
            else:
                print(f'Can not comment PR, {payload.get("number")}')
        else:
            print(f"Can not log in to gh, {os.getenv('GH_TOKEN')}")


def get_task_stats(task):
    """Get the comment markdown for a stats table based on the task object."""
    task_status = task.get_status()
    # Try to get the task stats
    if task_status == "completed":
        table = create_comment_output(task, task_status)
        if table:
            return f"Results\n\n{table}\n\n" \
                   f"You can view full task results [here]({task.get_output_log_web_page()})"
        else:
            return (f"Something went wrong when creating the task table. "
                    f"Check full task [here]({task.get_output_log_web_page()})")
    # Update the user about the task status, can not get any stats
    else:
        return f"Task is in {task_status} status, this should not happen!"


def get_clearml_task_of_current_commit(commit_id):
    """Find the ClearML task that corresponds to the exact codebase in the commit ID."""
    # Get the ID and diff of all tasks based on the current commit hash, ordered by newest
    tasks = Task.query_tasks(
        task_filter={
            'order_by': ['-last_update'],
            '_all_': dict(fields=['script.version_num'],
                          pattern=commit_id
                          ),
            'status': ['completed']
        },
        additional_return_fields=['script.diff']
    )

    # If there are tasks, check which one has no diff: aka which one was run with the exact
    # code that is staged in this PR.
    if tasks:
        for task in tasks:
            if not task['script.diff']:
                return Task.get_task(task_id=task['id'])

    # If no task was run yet with the exact PR code, raise an error and block the PR.
    raise ValueError("No task based on this code was found in ClearML. "
                     "Make sure to run it at least once before merging.")


if __name__ == '__main__':
    # Main check: Does a ClearML task exist for this specific commit?
    print(f"Running on commit hash: {os.getenv('COMMIT_ID')}")
    task_obj = get_clearml_task_of_current_commit(os.getenv('COMMIT_ID'))

    # If the task exists, we can tag it as such, so we know in the interface which one it is.
    task_obj.add_tags(['main_branch'])

    # Let's also add the task metrics to the PR automatically.
    # Get the metrics from the task and create a comment on the PR.
    stats = get_task_stats(task_obj)
    create_stats_comment(stats)