Leaderboard

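Scores are averaged across the benchmark problems in the table (9 columns). A dash marks a problem with no reported score. The Avg values shown match the mean of the eight non-MAS columns.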

Framework	Contributor	Avg↓	Cloudcast	EPLB	LLM-SQL	MAS	Prism	Spot Multi-Reg	Spot Single-Reg	Telemetry Repair	Txn Scheduling	Date
Human SOTA	-	58.3	100.0	45.8	67.7	33.7	60.8	54.5	45.1	50.6	41.9	2025-06-01
AutoEvolve	ADRS Team	75.9	97.8	70.2	76.4	—	87.4	70.0	46.3	88.9	70.6	2025-12-06
GEPA	ADRS Team	73.6	96.6	70.2	67.7	—	87.4	62.2	51.4	85.5	67.7	2025-12-06
OpenEvolve	ADRS Team	72.9	92.9	62.0	72.5	—	87.4	66.7	42.5	88.9	70.0	2025-12-06
ShinkaEvolve	ADRS Team	69.8	72.0	66.4	68.5	—	87.4	63.6	45.6	86.5	68.2	2025-12-06
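
As a rough check on how an Avg value relates to the per-problem scores, the sketch below takes the mean of one row while skipping any unreported cell. The function name `average_score` and the use of `None` for a missing score are assumptions made for illustration; the leaderboard's actual aggregation and rounding rules are not stated on this page.

```python
from statistics import fmean

def average_score(cells):
    """Mean of the reported scores in one row; None marks an unreported score."""
    values = [c for c in cells if c is not None]
    if not values:
        raise ValueError("row has no reported scores")
    return fmean(values)

# Per-problem scores for the AutoEvolve row, in column order
# (Cloudcast, EPLB, LLM-SQL, MAS, Prism, Spot Multi-Reg,
#  Spot Single-Reg, Telemetry Repair, Txn Scheduling).
autoevolve = [97.8, 70.2, 76.4, None, 87.4, 70.0, 46.3, 88.9, 70.6]

print(f"{average_score(autoevolve):.2f}")
```

For this row the mean of the eight reported scores is about 75.95, close to the 75.9 shown in the Avg column; how the leaderboard rounds is not specified.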

Have a new framework or updated results? Submit via GitHub.

Thank you to the UC Berkeley Sky Computing Lab, our sponsors, and the ADRS community.