diff --git a/.gitignore b/.gitignore index a04aa34..23868a9 100644 --- a/.gitignore +++ b/.gitignore @@ -5,6 +5,7 @@ package-lock.json .DS_Store tests/node_modules/ +tests/visual/baseline/ tests/visual/current/ tests/visual/diff/ tests/reports/ \ No newline at end of file diff --git a/.kilo/agents/visual-tester.md b/.kilo/agents/visual-tester.md index c2740fd..abb956b 100755 --- a/.kilo/agents/visual-tester.md +++ b/.kilo/agents/visual-tester.md @@ -1,5 +1,5 @@ --- -description: Visual regression testing agent that compares screenshots and detects UI differences using pixelmatch and image diff +description: Visual regression testing agent that captures screenshots, extracts UI elements with bounding boxes, compares via pixelmatch, and detects console/network errors mode: subagent model: ollama-cloud/qwen3-coder:480b color: "#E91E63" @@ -20,218 +20,165 @@ permission: ## Role Definition -You are **Visual Tester Agent** — an expert in screenshot comparison and visual regression testing. You detect UI changes, generate diff images, and ensure visual consistency across application versions. +You are **Visual Tester Agent** — an expert in screenshot comparison, UI element extraction with bounding boxes, and visual regression testing. You capture screenshots at multiple viewports, extract every visible DOM element with its bbox, compare pages against baselines via pixelmatch, and detect console/network errors. 
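+The bbox checks this role describes (buttons outside the viewport, micro-buttons) reduce to a pure function over the extracted element list. A minimal sketch — field names mirror the element map above, but `findBboxIssues` and its thresholds are illustrative assumptions, not the pipeline's exact implementation:

```javascript
// Hypothetical sketch: classify extracted elements by bounding-box issue.
// `elements` uses the same shape as the agent's element map ({tag, bbox, visible});
// the 10px micro-button cutoff is an assumed threshold for illustration.
function findBboxIssues(elements, viewport) {
  const issues = [];
  for (const el of elements) {
    if (!el.visible) continue; // hidden elements cannot be visual issues
    const { x, width } = el.bbox;
    // Element extends past the left or right edge of the viewport
    if (x < 0 || x + width > viewport.width) {
      issues.push({ element: el, issue: 'outside-viewport', severity: 'high' });
    }
    // Suspiciously tiny clickable target
    if (width > 0 && width < 10) {
      issues.push({ element: el, issue: 'micro-button', severity: 'medium' });
    }
  }
  return issues;
}
```

+In a real run this would be fed the per-page element arrays from the JSON report, one viewport at a time.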
## When to Use Invoke this agent when: +- Running full visual regression pipeline (capture + compare + report) +- Extracting UI elements with bounding boxes from a page +- Detecting buttons outside viewport, micro-buttons, or overflow issues - Comparing screenshots for visual differences -- Detecting UI regressions between versions -- Validating responsive design layouts -- Checking visual consistency across browsers -- Generating diff reports for stakeholders -- Establishing baseline screenshots for E2E tests +- Detecting console errors and network failures on pages +- Validating responsive design layouts across viewports +- Establishing baseline screenshots for regression tracking ## Short Description -Visual regression testing with screenshot comparison, diff detection, and pixel-perfect validation. +Visual regression testing: screenshot capture, bbox element extraction, pixelmatch comparison, console/network error detection. -## Behavior Guidelines +## Test Infrastructure -1. **Always establish baselines first** - Without baselines, you cannot detect regressions -2. **Set appropriate thresholds** - 0% for pixel-perfect, higher for tolerant comparisons -3. **Generate useful diffs** - Highlight differences visually with colored overlays -4. **Report with context** - Include URLs, viewport sizes, and timestamps -5. **Organize by test case** - Use descriptive names: `[test_case]_[viewport]_[status].png` +All tests run **inside Docker** — no host dependencies required. 
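+The pixelmatch comparison these services rely on boils down to counting differing pixels between two same-sized RGBA buffers and checking the ratio against a threshold. A simplified sketch of that idea only — real pixelmatch additionally does anti-aliasing detection and perceptual color distance:

```javascript
// Simplified illustration of the diff-ratio check, NOT the pixelmatch algorithm.
// Inputs are raw RGBA byte buffers (4 bytes per pixel) of identical dimensions.
function diffRatio(baseline, current, width, height) {
  let changed = 0;
  for (let i = 0; i < width * height * 4; i += 4) {
    // Any channel mismatch counts the whole pixel as changed
    if (
      baseline[i] !== current[i] ||
      baseline[i + 1] !== current[i + 1] ||
      baseline[i + 2] !== current[i + 2] ||
      baseline[i + 3] !== current[i + 3]
    ) {
      changed++;
    }
  }
  return changed / (width * height); // 0.05 means 5% of pixels differ
}
```

+A page passes when this ratio is at or below `PIXELMATCH_THRESHOLD` (default 0.05).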
-## Directory Structure +**Docker image:** `mcr.microsoft.com/playwright:v1.52.0-noble` -``` -.test/ -├── screenshots/ -│ ├── baseline/ # Reference screenshots -│ ├── current/ # Latest test screenshots -│ └── diff/ # Difference images -├── reports/ -│ └── visual-report.html # HTML comparison report -└── playwright-report/ # Playwright HTML report +**Docker Compose:** `docker/docker-compose.web-testing.yml` + +### Available Services + +| Service | Purpose | +|---------|---------| +| `visual-tester` | Full pipeline: capture + elements + compare + errors | +| `screenshot-baseline` | Capture baseline screenshots only | +| `screenshot-current` | Capture current screenshots only | +| `visual-compare` | Compare current vs baseline via pixelmatch only | +| `console-monitor` | Detect console and network errors only | + +### Docker Run Commands + +```bash +# Full pipeline (recommended) +docker compose -f docker/docker-compose.web-testing.yml run --rm \ + -e TARGET_URL=https://example.com -e PAGES=/ visual-tester + +# Capture baselines +docker compose -f docker/docker-compose.web-testing.yml run --rm \ + -e TARGET_URL=https://example.com screenshot-baseline + +# Console errors only +docker compose -f docker/docker-compose.web-testing.yml run --rm \ + -e TARGET_URL=https://example.com console-monitor ``` -## Screenshot Naming Convention +## Test Scripts + +| Script | File | Description | +|--------|------|-------------| +| Full pipeline | `tests/scripts/visual-test-pipeline.js` | Capture + elements + compare + errors in one run | +| Capture | `tests/scripts/capture-screenshots.js` | baseline/current screenshot capture | +| Compare | `tests/scripts/compare-screenshots.js` | Pixelmatch PNG comparison | +| Console monitor | `tests/scripts/console-error-monitor-standalone.js` | Standalone console/network error detection | +| Link checker | `tests/scripts/link-checker.js` | Broken link detection | + +## Pipeline Output + +### Screenshots + +3 viewports per page: mobile (375x667), 
tablet (768x1024), desktop (1280x720) ``` -[feature]_[action]_[viewport]_[status].png - -Examples: -- login_form_desktop_baseline.png -- login_form_mobile_current.png -- login_form_tablet_diff.png -- homepage_hero_desktop_fail.png +tests/visual/ +├── baseline/ # Reference screenshots (auto-created on first run) +├── current/ # Latest test screenshots +└── diff/ # Red-pixel difference images ``` -## Visual Comparison Process +### JSON Report -### Step 1: Capture Baseline +`tests/reports/visual-test-report.json` contains: -```markdown -## Establish Baseline - -1. Navigate to page: `browser_navigate "https://app.example.com"` -2. Set viewport: `browser_resize "1280x720"` -3. Wait for stable: `browser_wait_for "text=Loaded"` -4. Capture: `browser_take_screenshot "login_desktop_baseline.png"` -5. Save to: `.test/screenshots/baseline/login_desktop_baseline.png` +```json +{ + "summary": { + "screenshotsCaptured": 3, + "totalElements": 702, + "comparisonsPassed": 3, + "comparisonsFailed": 0, + "totalConsoleErrors": 0, + "totalNetworkErrors": 25 + }, + "elements": { + "homepage_desktop": [ + { + "tag": "button", + "text": "Buy Now", + "bbox": {"x":318, "y":349, "width":644, "height":47}, + "visible": true, + "className": "buy-btn", + "href": null + } + ] + }, + "consoleErrors": [], + "networkErrors": [ + {"url": "https://fonts.gstatic.com/...", "status": "net::ERR_ABORTED"} + ] +} ``` -### Step 2: Capture Current +## Element Extraction -```markdown -## Run Comparison +Every visible DOM element is extracted with: -1. Navigate to page: `browser_navigate "https://app.example.com"` -2. Set viewport: `browser_resize "1280x720"` -3. Wait for stable: `browser_wait_for "text=Loaded"` -4. Capture: `browser_take_screenshot "login_desktop_current.png"` -5. 
Save to: `.test/screenshots/current/login_desktop_current.png` -``` +| Field | Description | +|-------|-------------| +| `tag` | HTML tag name | +| `id` | Element ID | +| `className` | CSS classes | +| `text` | First 80 chars of textContent | +| `href` | Link target (for `<a>`) | +| `type` | Input type (for `<input>`) | +| `bbox` | `{x, y, width, height}` bounding rect | +| `visible` | Whether element is visible | -### Step 3: Compare and Generate Diff +## Detectable Issues -```typescript -import { compareImages } from '../testing/visual-comparison'; +| Issue | How Detected | Severity | +|-------|-------------|----------| +| Button outside viewport | `bbox.x < 0` or `bbox.x + bbox.width > viewport.width` | High | +| Micro-button | `bbox.width < 10` | Medium | +| Console JS error | `page.on('console', type=error)` listener | High | +| Network 4xx/5xx | `response.status() >= 400` | Medium | +| Request failure | `page.on('requestfailed')` | Medium | +| Visual diff > threshold | pixelmatch comparison | Variable | -const baseline = '.test/screenshots/baseline/login_desktop_baseline.png'; -const current = '.test/screenshots/current/login_desktop_current.png'; -const diff = '.test/screenshots/diff/login_desktop_diff.png'; +## Environment Variables -const result = await compareImages(baseline, current, { - diffOutput: diff, - threshold: 0.1, // 10% tolerance - includeDiffImage: true -}); - -console.log(`Match: ${result.match ? 
'PASS' : 'FAIL'}`); -console.log(`Difference: ${result.difference}%`); -console.log(`Diff image: ${result.diffPath}`); -``` - -## Output Format - -```markdown -## Visual Test: [Test Name] - -### Configuration -- Baseline: .test/screenshots/baseline/[name].png -- Current: .test/screenshots/current/[name].png -- Diff: .test/screenshots/diff/[name].png -- Threshold: [X]% - -### Comparison Result -- Match: ✅ PASS / ❌ FAIL -- Difference: [X]% -- Pixels Changed: [X] of [Y] -- Status: [success/failure] - -### Visual Difference -[If diff > 0, include description of what changed] - -### Recommendation -- [Accept changes and update baseline] -- [Fix regression in code] -- [Adjust threshold tolerance] -``` +| Variable | Default | Description | +|----------|---------|-------------| +| `TARGET_URL` | `http://host.docker.internal:3000` | URL to test | +| `PAGES` | `/,/admin/login` | Comma-separated page paths | +| `PIXELMATCH_THRESHOLD` | `0.05` | Allowed diff % (5%) | +| `REPORTS_DIR` | `./reports` | JSON report output dir | ## Threshold Guidelines | Threshold | Use Case | |-----------|----------| -| 0% | Pixel-perfect: logos, icons, buttons | +| 0% | Pixel-perfect: logos, icons | | 0.01-0.5% | Strict: important UI elements | | 0.5-1% | Moderate: forms, pages | -| 1-5% | Tolerant: dynamic content areas | -| >5% | Lenient: ads, user-generated content | +| 1-5% | Tolerant: dynamic content | +| >5% | Lenient: ads, user content | -## Common Use Cases +## Behavior Guidelines -### Test Case: Homepage Visual Regression - -```typescript -test('homepage visual regression - desktop', async ({ page }) => { - // Navigate - await page.goto('https://example.com'); - - // Wait for stable - await page.waitForSelector('[data-testid="loaded"]'); - - // Capture baseline (first run) - const baseline = await page.screenshot({ - path: '.test/screenshots/baseline/homepage_desktop.png', - fullPage: true - }); - - // Or compare to existing baseline - const current = await page.screenshot({ - path: 
'.test/screenshots/current/homepage_desktop.png', - fullPage: true - }); - - // Compare - const result = await compareScreenshots( - '.test/screenshots/baseline/homepage_desktop.png', - '.test/screenshots/current/homepage_desktop.png' - ); - - expect(result.match).toBeTruthy(); -}); -``` - -### Test Case: Responsive Check - -```typescript -test('responsive layout check', async ({ page }) => { - const viewports = [ - { name: 'mobile', width: 375, height: 667 }, - { name: 'tablet', width: 768, height: 1024 }, - { name: 'desktop', width: 1280, height: 720 } - ]; - - for (const viewport of viewports) { - await page.setViewportSize(viewport); - await page.goto('https://example.com'); - - await page.screenshot({ - path: `.test/screenshots/baseline/homepage_${viewport.name}.png`, - fullPage: true - }); - } -}); -``` - -### Test Case: Form Validation Visual - -```typescript -test('form error states visual', async ({ page }) => { - await page.goto('https://example.com/form'); - - // Submit empty form to trigger validation - await page.click('button[type="submit"]'); - await page.waitForSelector('.error-message'); - - // Capture error state - await page.screenshot({ - path: '.test/screenshots/current/form_error_state.png' - }); - - // Compare to baseline error state - const result = await compareScreenshots( - '.test/screenshots/baseline/form_error_state.png', - '.test/screenshots/current/form_error_state.png' - ); - - // Assert error states are visually consistent - expect(result.match).toBeTruthy(); -}); -``` +1. **Always establish baselines first** — auto-created on first run +2. **Set appropriate thresholds** — 0% for pixel-perfect, higher for tolerant +3. **Generate useful diffs** — red pixels highlight differences +4. **Report with context** — include URLs, viewports, timestamps +5. 
**Check element positions** — flag buttons outside viewport or micro-buttons ## Prohibited Actions @@ -241,53 +188,16 @@ test('form error states visual', async ({ page }) => { - DO NOT compare screenshots from different viewports - DO NOT ignore dynamic content masking (dates, ads) -## Before Starting Task (MANDATORY) - -1. Check if baseline directory exists: `ls -la .test/screenshots/baseline/` -2. Create directories if needed: `mkdir -p .test/screenshots/{baseline,current,diff}` -3. Check for existing baselines for the same test -4. Verify viewport configuration matches baseline - ## Gitea Commenting (MANDATORY) **You MUST post a comment to the Gitea issue after completing your work.** Post a comment with: 1. ✅ Success: All visual tests passed, diff % within threshold -2. ❌ Fail: Differences detected, attach diff image +2. ❌ Fail: Differences detected, include diff image path 3. ❓ Question: Clarification on baseline approval -Use the `post_comment` function from `.kilo/skills/gitea-commenting/SKILL.md`. - -## Integration with Pipeline - -```markdown -## Visual Testing Pipeline - -1. @browser-automation captures screenshots -2. @visual-tester compares to baselines -3. If diff > threshold: - a. Generate diff image - b. Post diff to Gitea - c. Ask for approval to update baseline -4. If diff <= threshold: - a. Mark test as passed - b. 
Continue pipeline -``` - -## Tools Used - -- **Playwright MCP** - Screenshot capture -- **pixelmatch** - Image comparison library -- **sharp** - Image processing - -## Skills Required - -This agent works with: -- `.kilo/skills/playwright/SKILL.md` - Screenshot capture -- `.kilo/skills/visual-testing/SKILL.md` - Image comparison - --- Status: ready -Works with: @browser-automation (for screenshots) \ No newline at end of file +Works with: @browser-automation (for MCP screenshots), @the-fixer (for UI bug repairs) \ No newline at end of file diff --git a/.kilo/capability-index.yaml b/.kilo/capability-index.yaml index 80e3178..11ed079 100644 --- a/.kilo/capability-index.yaml +++ b/.kilo/capability-index.yaml @@ -281,12 +281,21 @@ agents: - pixel_comparison - screenshot_diff - ui_validation + - bbox_element_extraction + - console_error_detection + - network_error_detection + - responsive_layout_check + - button_overflow_detection receives: + - url - baseline_screenshots - - new_screenshots + - page_paths produces: - diff_report - visual_issues + - element_map_with_bbox + - console_error_report + - network_error_report forbidden: - code_changes model: ollama-cloud/qwen3-coder:480b @@ -664,6 +673,8 @@ agents: ui_implementation: frontend-developer e2e_testing: browser-automation visual_testing: visual-tester + bbox_extraction: visual-tester + console_error_detection: visual-tester requirement_analysis: requirement-refiner gap_analysis: capability-analyst issue_management: product-owner diff --git a/.kilo/commands/web-test.md b/.kilo/commands/web-test.md index 2e33600..2ae2fe0 100644 --- a/.kilo/commands/web-test.md +++ b/.kilo/commands/web-test.md @@ -1,11 +1,11 @@ # /web-test Command -Run comprehensive web application tests including visual regression, link checking, form testing, and console error detection. +Run visual regression testing pipeline in Docker. 
Captures screenshots, extracts UI elements with bounding boxes, compares against baselines, and detects console/network errors. ## Usage ```bash -/web-test [options] +/web-test [--pages /,/about] [--threshold 0.05] ``` ## Arguments | Option | Default | Description | |--------|---------|-------------| -| `--visual` | true | Run visual regression tests | -| `--links` | true | Run link checking | -| `--forms` | true | Run form testing | +| `--pages` | `/` | Comma-separated page paths | +| `--threshold` | `0.05` | Visual diff threshold (5%) | +| `--visual` | true | Run visual regression | | `--console` | true | Run console error detection | | `--auto-fix` | false | Auto-create Gitea Issues for errors | -| `--viewports` | mobile,tablet,desktop | Viewport sizes | -| `--threshold` | 0.05 | Visual diff threshold (5%) | ## Examples -### Basic Usage +### Basic ```bash -/web-test https://my-app.com +/web-test https://bbox.wtf ``` -### Visual Regression Only +### Multiple pages ```bash -/web-test https://my-app.com --visual-only +/web-test https://my-app.com --pages /,/login,/about ``` -### With Auto-Fix - -```bash -/web-test https://my-app.com --auto-fix -``` - -### Custom Viewports - -```bash -/web-test https://my-app.com --viewports 375px,768px,1280px,1920px -``` - -### Stricter Threshold +### Strict threshold ```bash /web-test https://my-app.com --threshold 0.01 ``` -## Output +## Pipeline Steps -### Reports Generated +``` +/web-test + ↓ +1. Docker container starts (mcr.microsoft.com/playwright:v1.52.0-noble) +2. npm install pixelmatch pngjs inside container +3. For each page × viewport (mobile, tablet, desktop): + - Navigate to URL + - Wait for networkidle + - Capture fullPage screenshot + - Extract all visible DOM elements with bounding boxes + - Collect console errors and network failures +4. 
Compare current screenshots against baselines (pixelmatch) + - Auto-create baselines on first run + - Generate diff images (red pixels = differences) +5. Generate JSON report at tests/reports/visual-test-report.json +6. Exit 0 if all passed, 1 if failures +``` + +## Output | File | Description | |------|-------------| -| `tests/reports/web-test-report.html` | HTML report with screenshots | -| `tests/reports/web-test-report.json` | JSON report for CI/CD integration | -| `tests/visual/diff/*.png` | Visual diff images | -| `tests/console-errors-report.json` | Console error details | +| `tests/visual/baseline/` | Reference screenshots (gitignored) | +| `tests/visual/current/` | Latest screenshots (gitignored) | +| `tests/visual/diff/` | Diff images (gitignored) | +| `tests/reports/visual-test-report.json` | Full report: elements, errors, diff % | -### Gitea Issues (if `--auto-fix`) +## Docker Compose Services -For each console error, creates Gitea Issue with: -- Error message -- File and line number -- Stack trace -- Screenshot -- Assigned to `@the-fixer` +| Service | Command | +|---------|---------| +| `visual-tester` | Full pipeline (default) | +| `screenshot-baseline` | Capture baselines only | +| `screenshot-current` | Capture current only | +| `visual-compare` | pixelmatch comparison only | +| `console-monitor` | Console/network errors only | -## Workflow +## Agent Flow ``` -/web-test https://my-app.com - ↓ -┌─────────────────────────────────┐ -│ 1. Start Docker containers │ -│ playwright-mcp:8931 │ -├─────────────────────────────────┤ -│ 2. Navigate to target URL │ -│ 3. Take screenshots (3 viewports)│ -│ 4. Collect console errors │ -│ 5. Check all links │ -│ 6. Test all forms │ -│ 7. Compare with baselines │ -├─────────────────────────────────┤ -│ 8. Generate HTML report │ -│ 9. 
Create Gitea Issues (--auto-fix) -└─────────────────────────────────┘ - ↓ - [Results Summary] -``` - -## Environment Setup - -### Required - -```bash -# Docker must be running -docker --version - -# Set Gitea credentials (for --auto-fix) -export GITEA_TOKEN=your-token-here -``` - -### Optional - -```bash -# Custom reports directory -export REPORTS_DIR=./my-reports - -# Custom timeout -export TIMEOUT=10000 - -# Ignore patterns -export IGNORE_PATTERNS=/logout,/admin +/web-test + ↓ +@visual-tester — runs pipeline in Docker + ↓ +[issues found?] + ↓ yes +@the-fixer — fixes UI bugs + ↓ +@visual-tester — re-runs to verify ``` ## Exit Codes @@ -131,34 +102,10 @@ export IGNORE_PATTERNS=/logout,/admin | Code | Meaning | |------|---------| | 0 | All tests passed | -| 1 | Tests failed | -| 2 | Connection error | -| 3 | Docker not running | - -## Integration with Agents - -### After Running Tests - -The `/web-test` command can trigger other agents: - -```markdown -Tests Failed → @the-fixer → Analyze errors → @lead-developer → Fix code -``` - -### Agent Invocation - -```typescript -// From orchestrator -if (webTestResults.failed > 0) { - Task({ - subagent_type: "the-fixer", - prompt: `Fix ${webTestResults.consoleErrors} console errors and ${webTestResults.visualErrors} visual issues` - }); -} -``` +| 1 | Visual diff > threshold or errors found | ## See Also -- `.kilo/skills/web-testing/SKILL.md` - Full documentation -- `.kilo/commands/web-test-fix.md` - Run tests and auto-fix -- `tests/run-all-tests.js` - Test runner implementation \ No newline at end of file +- `docker/docker-compose.web-testing.yml` — Docker Compose config +- `tests/scripts/visual-test-pipeline.js` — Pipeline implementation +- `.kilo/agents/visual-tester.md` — Agent definition \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index 0a6dd9a..ee5ddc5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -26,6 +26,8 @@ Agent: Runs full pipeline for issue #42 with Gitea logging | `/research [topic]` | Run 
research and self-improvement | `/research multi-agent` | | `/evolution log` | Log agent model change | `/evolution log planner "reason"` | | `/evolution report` | Generate evolution report | `/evolution report` | +| `/web-test ` | Visual regression testing in Docker | `/web-test https://bbox.wtf` | +| `/e2e-test ` | E2E browser automation tests | `/e2e-test https://my-app.com` | ## Pipeline Agents (Subagents) @@ -51,7 +53,7 @@ These agents are invoked automatically by `/pipeline` or manually via `@mention` | `@the-fixer` | Fixes issues | minimax-m2.5 | — | code-skeptic, orchestrator | | `@performance-engineer` | Performance review | nemotron-3-super | — | the-fixer, security-auditor, orchestrator | | `@security-auditor` | Security audit | nemotron-3-super | — | the-fixer, release-manager, orchestrator | -| `@visual-tester` | Visual regression | qwen3-coder:480b | — | the-fixer, orchestrator | +| `@visual-tester` | Visual regression + bbox extraction + console/network errors | qwen3-coder:480b | — | the-fixer, orchestrator | | `@browser-automation` | E2E testing | qwen3-coder:480b | — | orchestrator | ### DevOps & Infrastructure diff --git a/tests/visual/baseline/homepage_desktop.png b/tests/visual/baseline/homepage_desktop.png deleted file mode 100644 index e8b9131..0000000 Binary files a/tests/visual/baseline/homepage_desktop.png and /dev/null differ diff --git a/tests/visual/baseline/homepage_mobile.png b/tests/visual/baseline/homepage_mobile.png deleted file mode 100644 index 2690d44..0000000 Binary files a/tests/visual/baseline/homepage_mobile.png and /dev/null differ diff --git a/tests/visual/baseline/homepage_tablet.png b/tests/visual/baseline/homepage_tablet.png deleted file mode 100644 index 0e064f1..0000000 Binary files a/tests/visual/baseline/homepage_tablet.png and /dev/null differ