APAW/docker/docker-compose.opencompass.yml
Deploy Bot 397d8367e9 feat: milestone 78 — objective model evolution from benchmark research
- Reassign 29/30 agents based on capability-analyst web research
- deepseek-v4-pro: 14 agents (coding SOTA: SWE-bench 80.6%, LiveCodeBench 93.5%)
- minimax-m3:cloud: 8 agents (agentic: BrowseComp 83.5%, 12h autonomous)
- glm-5.1: 4 agents (CyberGym 68.7% SOTA, sustained rounds)
- minimax-m2.5:cloud: 2 agents (frontend productivity, 2.2M pulls)
- kimi-k2.6: 1 agent (ONLY true multimodal)
- Add OpenCompass evaluation container (docker, scripts) for future objective runs
- Evidence saved to agent-evolution/data/research-report.json (598 lines, 6 models)

Data gaps honestly documented: minimax-m3/m2.5, qwen3-coder, kimi-k2.6 benchmark tables are image-only on Ollama.
2026-06-01 20:50:10 +01:00

29 lines
580 B
YAML

version: "3.8"
services:
  opencompass:
    build:
      context: ..
      dockerfile: docker/Dockerfile.opencompass
    container_name: opencompass
    environment:
      - OLLAMA_API_URL=http://ollama:11434
    volumes:
      - opencompass-data:/data
      - ../scripts/opencompass-setup.sh:/setup.sh:ro
      - ../scripts/opencompass-eval.sh:/eval.sh:ro
    networks:
      - ollama-net
    entrypoint: ["/bin/bash", "/eval.sh"]
    profiles:
      - eval
volumes:
  opencompass-data:
    driver: local
networks:
  ollama-net:
    external: true
    name: docker_ollama-net
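
Because the `opencompass` service sits under the `eval` profile and joins an external network, a plain `docker compose up` skips it, and startup fails if `docker_ollama-net` does not exist yet. The sketch below shows one way the service might be started. It assumes the commands run from the `APAW` repository root and that the Ollama stack has already created the network. The helper names `check_network` and `run_eval` are hypothetical and not part of the repository.

```shell
# Assumption: invoked from the APAW repository root.
COMPOSE_FILE="docker/docker-compose.opencompass.yml"
# Network name taken from the compose file's external network entry.
OLLAMA_NETWORK="docker_ollama-net"

# Hypothetical helper: succeeds only if the external network already exists.
check_network() {
  docker network inspect "$OLLAMA_NETWORK" >/dev/null 2>&1
}

# Hypothetical helper: enable the "eval" profile and build and start the service.
run_eval() {
  if ! check_network; then
    echo "network $OLLAMA_NETWORK not found; start the Ollama stack first" >&2
    return 1
  fi
  docker compose -f "$COMPOSE_FILE" --profile eval up --build opencompass
}
```

Calling `run_eval` would then build the image from `docker/Dockerfile.opencompass` and run `/eval.sh` as the container entrypoint.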