Leaderboard
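Scores are averaged across problems; the table covers 9 problems. The listed Avg values match the mean over the eight problems other than MAS, for which four of the five rows list no score.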
| Framework | Contributor | Avg | Cloudcast | EPLB | LLM-SQL | MAS | Prism | Spot Multi-Reg | Spot Single-Reg | Telemetry Repair | Txn Scheduling | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human SOTA | - | 58.3 | 100.0 | 45.8 | 67.7 | 33.7 | 60.8 | 54.5 | 45.1 | 50.6 | 41.9 | 2025-06-01 |
| AutoEvolve | ADRS Team | 75.9 | 97.8 | 70.2 | 76.4 | — | 87.4 | 70.0 | 46.3 | 88.9 | 70.6 | 2025-12-06 |
| GEPA | ADRS Team | 73.6 | 96.6 | 70.2 | 67.7 | — | 87.4 | 62.2 | 51.4 | 85.5 | 67.7 | 2025-12-06 |
| OpenEvolve | ADRS Team | 72.9 | 92.9 | 62.0 | 72.5 | — | 87.4 | 66.7 | 42.5 | 88.9 | 70.0 | 2025-12-06 |
| ShinkaEvolve | ADRS Team | 69.8 | 72.0 | 66.4 | 68.5 | — | 87.4 | 63.6 | 45.6 | 86.5 | 68.2 | 2025-12-06 |
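
The Avg column can be cross-checked against the per-problem scores. The sketch below is an illustration only, not the leaderboard's own scoring code: the names `PROBLEMS`, `ROWS`, `mean_excluding`, and `check_rows` are hypothetical, and the rule of skipping MAS is an assumption inferred from the numbers in the table rather than a documented policy. The tolerance allows for rounding to one decimal place.

```python
# Per-problem scores copied from the table above.
# None marks a cell that the table shows as a dash (no score listed).
PROBLEMS = [
    "Cloudcast", "EPLB", "LLM-SQL", "MAS", "Prism",
    "Spot Multi-Reg", "Spot Single-Reg", "Telemetry Repair", "Txn Scheduling",
]

# framework -> (listed Avg, scores in PROBLEMS order)
ROWS = {
    "Human SOTA":   (58.3, [100.0, 45.8, 67.7, 33.7, 60.8, 54.5, 45.1, 50.6, 41.9]),
    "AutoEvolve":   (75.9, [97.8, 70.2, 76.4, None, 87.4, 70.0, 46.3, 88.9, 70.6]),
    "GEPA":         (73.6, [96.6, 70.2, 67.7, None, 87.4, 62.2, 51.4, 85.5, 67.7]),
    "OpenEvolve":   (72.9, [92.9, 62.0, 72.5, None, 87.4, 66.7, 42.5, 88.9, 70.0]),
    "ShinkaEvolve": (69.8, [72.0, 66.4, 68.5, None, 87.4, 63.6, 45.6, 86.5, 68.2]),
}


def mean_excluding(scores, skip="MAS"):
    """Mean of the scores, ignoring the skipped problem and any missing cells."""
    kept = [
        score
        for name, score in zip(PROBLEMS, scores)
        if name != skip and score is not None
    ]
    return sum(kept) / len(kept)


def check_rows(tolerance=0.06):
    """Report, per framework, whether the recomputed mean matches the listed Avg."""
    return {
        name: abs(mean_excluding(scores) - listed) <= tolerance
        for name, (listed, scores) in ROWS.items()
    }


if __name__ == "__main__":
    for name, (listed, scores) in ROWS.items():
        print(f"{name}: listed {listed}, recomputed {mean_excluding(scores):.2f}")
```

Under this assumption every row agrees with its listed Avg. Including MAS for Human SOTA, the only row with a MAS score, would give a lower mean than the 58.3 shown.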
Submit Results
Have a new framework or updated results? Submit via GitHub.
Acknowledgements
Thank you to the UC Berkeley Sky Computing Lab, our sponsors, and the ADRS community.