APAW/docker/Dockerfile.opencompass at ff87670d8c634f57b64682e344d24c4e0161149c

UniqueSoft/APAW

Deploy Bot 397d8367e9 feat: milestone 78 — objective model evolution from benchmark research

- Reassign 29/30 agents based on capability-analyst web research
- deepseek-v4-pro: 14 agents (coding SOTA: SWE-bench 80.6%, LiveCodeBench 93.5%)
- minimax-m3:cloud: 8 agents (agentic: BrowseComp 83.5%, 12h autonomous)
- glm-5.1: 4 agents (CyberGym 68.7% SOTA, sustained rounds)
- minimax-m2.5:cloud: 2 agents (frontend productivity, 2.2M pulls)
- kimi-k2.6: 1 agent (ONLY true multimodal)
- Add OpenCompass evaluation container (docker, scripts) for future objective runs
- Evidence saved to agent-evolution/data/research-report.json (598 lines, 6 models)

Data gaps honestly documented: minimax-m3/m2.5, qwen3-coder, kimi-k2.6 benchmark tables are image-only on Ollama.

2026-06-01 20:50:10 +01:00


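# Base image: Debian-based CPython 3.10.
FROM python:3.10
# Install the latest OpenCompass release without keeping pip's download cache.
RUN pip install --no-cache-dir -U opencompass
# Default working directory for the container.
WORKDIR /data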
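
The Dockerfile above defines no entrypoint and copies nothing into the image, so any evaluation data has to be mounted at run time. A minimal sketch of building and entering the container follows. It assumes the commands run from the repository root; the image tag `apaw-opencompass:local` and the helper names `build_image` and `run_shell` are hypothetical and do not appear in the repository.

```shell
#!/bin/sh
# Hypothetical image tag (an assumption, not defined in the repository).
IMAGE_TAG="${IMAGE_TAG:-apaw-opencompass:local}"
# Set DOCKER=echo to print the docker arguments instead of running them.
DOCKER="${DOCKER:-docker}"

# Build the image from the repository root using this Dockerfile.
build_image() {
    "$DOCKER" build -f docker/Dockerfile.opencompass -t "$IMAGE_TAG" .
}

# Open an interactive shell with the current directory mounted at /data,
# which is the WORKDIR set by the Dockerfile.
run_shell() {
    "$DOCKER" run --rm -it -v "$PWD:/data" "$IMAGE_TAG" bash
}
```

After sourcing the script, `build_image && run_shell` builds the image and drops into a shell in `/data`. Because the Dockerfile installs OpenCompass unpinned, each rebuild may pick up a newer release.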