A/B Benchmark: qwen3.5-122b vs glm-5.1 for evaluator #116

Open
opened 2026-05-25 14:08:23 +00:00 by NW · 0 comments

Context

The evaluator model was switched from glm-5.1 (IF=90) to qwen3.5-122b (IF=92, 12.4M pulls).

Task

Run 10 evaluation cycles on a completed pipeline issue.

Expected

Roughly a 4% score improvement and lower instruction drift with qwen3.5-122b compared to glm-5.1.
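
One way the expected improvement could be checked once both sets of cycles have been scored is sketched below. This is a minimal sketch, not the pipeline's actual tooling: the function names `relative_improvement` and `meets_expectation` are hypothetical, and it assumes each evaluation cycle yields a single numeric score and that "4%" means a relative change in the mean score.

```python
from statistics import mean


def relative_improvement(baseline_scores, candidate_scores):
    """Percent change in mean score from baseline to candidate.

    Hypothetical helper: assumes one numeric score per evaluation cycle.
    """
    if not baseline_scores or not candidate_scores:
        raise ValueError("both score lists must be non-empty")
    baseline_mean = mean(baseline_scores)
    if baseline_mean == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (mean(candidate_scores) - baseline_mean) / baseline_mean * 100.0


def meets_expectation(baseline_scores, candidate_scores, expected_pct=4.0):
    """True if the candidate's mean score beats the baseline by expected_pct or more."""
    return relative_improvement(baseline_scores, candidate_scores) >= expected_pct
```

Here `baseline_scores` would hold the 10 cycle scores from glm-5.1 and `candidate_scores` the 10 from qwen3.5-122b. With only 10 cycles per model the mean can be noisy, so a result near the 4% threshold is worth rerunning before drawing a conclusion.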

Refs: agent-evolution/data/model-research-2026-05-24.md

NW added this to the [Evolution] APAW Model Optimization May 2026 milestone 2026-05-25 14:08:23 +00:00

Reference: UniqueSoft/APAW#116