- Reassign 29/30 agents based on capability-analyst web research
  - deepseek-v4-pro: 14 agents (coding SOTA: SWE-bench 80.6%, LiveCodeBench 93.5%)
  - minimax-m3☁️ 8 agents (agentic: BrowseComp 83.5%, 12h autonomous)
  - glm-5.1: 4 agents (CyberGym 68.7% SOTA, sustained rounds)
  - minimax-m2.5☁️ 2 agents (frontend productivity, 2.2M pulls)
  - kimi-k2.6: 1 agent (ONLY true multimodal)
- Add OpenCompass evaluation container (docker, scripts) for future objective runs
- Evidence saved to agent-evolution/data/research-report.json (598 lines, 6 models)

Data gaps honestly documented: minimax-m3/m2.5, qwen3-coder, kimi-k2.6 benchmark tables are image-only on Ollama.
38 lines
994 B
Bash
Executable File
#!/usr/bin/env bash
set -euo pipefail

# OpenCompass dataset setup script
# Downloads required datasets on first run

DATA_DIR="/data"
ZIP_URL="https://github.com/InternLM/opencompass/releases/download/0.2.2/OpenCompassData-core-20240207.zip"
ZIP_FILE="${DATA_DIR}/OpenCompassData-core-20240207.zip"
MARKER="${DATA_DIR}/.datasets_ready"

if [[ -f "$MARKER" ]]; then
    echo "Datasets already present (${MARKER} exists). Skipping download."
    exit 0
fi

echo "Downloading OpenCompass core datasets ..."
mkdir -p "$DATA_DIR"

if command -v wget >/dev/null 2>&1; then
    wget -q --show-progress -O "$ZIP_FILE" "$ZIP_URL" || {
        echo "Error: Failed to download datasets from ${ZIP_URL}" >&2
        exit 1
    }
else
    echo "Error: wget not found. Cannot download datasets." >&2
    exit 1
fi

echo "Extracting datasets ..."
unzip -q "$ZIP_FILE" -d "$DATA_DIR" || {
    echo "Error: Failed to extract ${ZIP_FILE}" >&2
    exit 1
}

touch "$MARKER"
echo "Datasets ready in ${DATA_DIR}."
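The script above relies on a marker file (`.datasets_ready`) to make the setup idempotent: the marker is written only after download and extraction succeed, so a failed run is retried on the next start. The sketch below isolates that guard so it can be exercised without network access. The function name `run_once_with_marker` and its arguments are hypothetical and not part of the original script; the setup step is passed in as an arbitrary command.

```shell
set -eu

# Hypothetical helper (an illustration, not part of the original script):
# run a setup command once per data directory, guarded by a marker file.
# Usage: run_once_with_marker <data_dir> <command> [args...]
run_once_with_marker() {
    data_dir="$1"
    shift
    marker="${data_dir}/.datasets_ready"

    # Marker present: an earlier run completed, so do nothing.
    if [ -f "$marker" ]; then
        echo "skip"
        return 0
    fi

    mkdir -p "$data_dir"

    # Run the supplied command; on failure, report and leave no marker
    # so that the next invocation retries.
    "$@" || {
        echo "Error: setup command failed: $*" >&2
        return 1
    }

    touch "$marker"
    echo "done"
}
```

Under these assumptions, `run_once_with_marker /data unzip -q "$ZIP_FILE" -d /data` would extract once and skip on later calls. Writing the marker last is the design choice that matters here: an interrupted download or extraction never leaves a directory that looks ready.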