- Reassign 29/30 agents based on capability-analyst web research:
  - deepseek-v4-pro: 14 agents (coding SOTA: SWE-bench 80.6%, LiveCodeBench 93.5%)
  - minimax-m3☁️: 8 agents (agentic: BrowseComp 83.5%, 12h autonomous)
  - glm-5.1: 4 agents (CyberGym 68.7% SOTA, sustained rounds)
  - minimax-m2.5☁️: 2 agents (frontend productivity, 2.2M pulls)
  - kimi-k2.6: 1 agent (ONLY true multimodal model)
- Add OpenCompass evaluation container (docker, scripts) for future objective runs
- Evidence saved to agent-evolution/data/research-report.json (598 lines, 6 models)

Data gaps documented honestly: benchmark tables for minimax-m3/m2.5, qwen3-coder, and kimi-k2.6 are image-only on Ollama.
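The per-model counts above should add up to the "29/30" in the first bullet. As a quick sanity check, here is a minimal Python sketch that encodes the allocation as a hypothetical `ALLOCATION` mapping (an illustration only, not a file in this repository; the ☁️ markers are dropped from the keys) and assumes a pool of 30 agents, as "29/30" implies.

```python
# Hypothetical encoding of the agent allocation listed in this commit message.
# Model names and counts come from the message; the data structure is illustrative.
ALLOCATION = {
    "deepseek-v4-pro": 14,
    "minimax-m3": 8,
    "glm-5.1": 4,
    "minimax-m2.5": 2,
    "kimi-k2.6": 1,
}

TOTAL_AGENTS = 30  # pool size implied by "29/30" (assumption)


def summarize(allocation: dict[str, int], total: int) -> tuple[int, int]:
    """Return (assigned, not_reassigned) agent counts for an allocation."""
    assigned = sum(allocation.values())
    if assigned > total:
        raise ValueError(f"assigned {assigned} agents but only {total} exist")
    return assigned, total - assigned


if __name__ == "__main__":
    assigned, remaining = summarize(ALLOCATION, TOTAL_AGENTS)
    print(f"{assigned}/{TOTAL_AGENTS} agents reassigned, {remaining} not reassigned")
```

Running it prints `29/30 agents reassigned, 1 not reassigned`, matching the commit summary.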
29 lines
580 B
YAML
version: "3.8"

services:
  opencompass:
    build:
      context: ..  # parent of this file's directory; dockerfile path is relative to it
      dockerfile: docker/Dockerfile.opencompass
    container_name: opencompass
    environment:
      # Assumes an Ollama container reachable as "ollama" on ollama-net
      - OLLAMA_API_URL=http://ollama:11434
    volumes:
      - opencompass-data:/data
      - ../scripts/opencompass-setup.sh:/setup.sh:ro
      - ../scripts/opencompass-eval.sh:/eval.sh:ro
    networks:
      - ollama-net
    entrypoint: ["/bin/bash", "/eval.sh"]
    # Only started when the eval profile is active (e.g. --profile eval)
    profiles:
      - eval

volumes:
  opencompass-data:
    driver: local

networks:
  ollama-net:
    # Must already exist: Compose does not create external networks
    external: true
    name: docker_ollama-net
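Inside the container, the only link to Ollama is the `OLLAMA_API_URL` variable plus the shared `docker_ollama-net` network. The contents of `opencompass-eval.sh` are not shown here, so the following is a hypothetical pre-flight check in Python, assuming the target is an Ollama server exposing its standard `GET /api/tags` endpoint (which lists locally available models). The helper names `ollama_base_url`, `parse_model_names`, and `list_models` are illustrative, not part of this repository.

```python
import json
import os
import urllib.request

DEFAULT_URL = "http://ollama:11434"  # value injected by the compose file


def ollama_base_url() -> str:
    """Read the Ollama endpoint from the environment, dropping any trailing slash."""
    return os.environ.get("OLLAMA_API_URL", DEFAULT_URL).rstrip("/")


def parse_model_names(payload: str) -> list[str]:
    """Extract model names from an Ollama /api/tags JSON response body."""
    data = json.loads(payload)
    return [model["name"] for model in data.get("models", [])]


def list_models(timeout: float = 5.0) -> list[str]:
    """GET {OLLAMA_API_URL}/api/tags and return the model names it reports.

    Raises OSError (urllib.error.URLError is a subclass) if Ollama is unreachable,
    e.g. when the external network or the "ollama" container is missing.
    """
    url = f"{ollama_base_url()}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))
```

Calling `list_models()` before starting an evaluation would fail fast with a connection error if `docker_ollama-net` or the `ollama` container is absent, rather than partway through a benchmark run. Keeping the JSON parsing in its own function lets it be checked without a live server.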