
ADRS Leaderboard
Scores averaged across 9 problems
| ADRS Framework | Contributor | Average↓ | Cloudcast | EPLB | LLM-SQL | MAS | Prism | Spot Multi-Reg | Spot Single-Reg | Telemetry Repair | Txn Scheduling | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | % score | % score | % score | % score | % score | % score | % score | % score | % score | % score | |
| Human SOTA | - | 58.3 | 100.0 | 45.8 | 67.7 | 33.7 | 60.8 | 54.5 | 45.1 | 50.6 | 41.9 | 2025-06-01 |
| AutoEvolve | ADRS Team | 75.9 | 97.8 | 70.2 | 76.4 | - | 87.4 | 70.0 | 46.3 | 88.9 | 70.6 | 2025-12-06 |
| GEPA | ADRS Team | 73.6 | 96.6 | 70.2 | 67.7 | - | 87.4 | 62.2 | 51.4 | 85.5 | 67.7 | 2025-12-06 |
| OpenEvolve | ADRS Team | 72.9 | 92.9 | 62.0 | 72.5 | - | 87.4 | 66.7 | 42.5 | 88.9 | 70.0 | 2025-12-06 |
| ShinkaEvolve | ADRS Team | 69.8 | 72.0 | 66.4 | 68.5 | - | 87.4 | 63.6 | 45.6 | 86.5 | 68.2 | 2025-12-06 |
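
The page does not spell out how the Average column is computed. The sketch below recomputes it as a plain arithmetic mean of the per-problem scores, restricted to problems where every row has a score. That restriction is an assumption inferred from the table values, not something stated by the leaderboard: it leaves out MAS, which shows "-" for all four frameworks. Under this assumption the recomputed means agree with the published Average column to within 0.1 point for every row, including Human SOTA. The names `ROWS`, `REPORTED_AVERAGE`, and `average_over_shared_problems` are hypothetical and used only for illustration.

```python
from statistics import mean

PROBLEMS = [
    "Cloudcast", "EPLB", "LLM-SQL", "MAS", "Prism",
    "Spot Multi-Reg", "Spot Single-Reg", "Telemetry Repair", "Txn Scheduling",
]

# Per-problem % scores copied from the table; None stands for a "-" cell.
ROWS = {
    "Human SOTA":   [100.0, 45.8, 67.7, 33.7, 60.8, 54.5, 45.1, 50.6, 41.9],
    "AutoEvolve":   [97.8, 70.2, 76.4, None, 87.4, 70.0, 46.3, 88.9, 70.6],
    "GEPA":         [96.6, 70.2, 67.7, None, 87.4, 62.2, 51.4, 85.5, 67.7],
    "OpenEvolve":   [92.9, 62.0, 72.5, None, 87.4, 66.7, 42.5, 88.9, 70.0],
    "ShinkaEvolve": [72.0, 66.4, 68.5, None, 87.4, 63.6, 45.6, 86.5, 68.2],
}

# Average column as published in the table.
REPORTED_AVERAGE = {
    "Human SOTA": 58.3,
    "AutoEvolve": 75.9,
    "GEPA": 73.6,
    "OpenEvolve": 72.9,
    "ShinkaEvolve": 69.8,
}


def average_over_shared_problems(rows):
    """Mean score per row over the problems that every row reports.

    Assumption (not confirmed by the source): a problem with a missing
    score in any row is excluded from all averages.
    """
    shared = [
        i for i in range(len(PROBLEMS))
        if all(scores[i] is not None for scores in rows.values())
    ]
    return {
        name: mean(scores[i] for i in shared)
        for name, scores in rows.items()
    }


if __name__ == "__main__":
    computed = average_over_shared_problems(ROWS)
    for name, value in computed.items():
        print(f"{name}: computed {value:.2f}, reported {REPORTED_AVERAGE[name]}")
```

The small residual differences are consistent with the published per-problem scores having been rounded to one decimal before display, though the page does not confirm this.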
Submit Results!
Have a new ADRS framework or updated results? Add submissions here: github.com/UcbSkyADRS/ADRS-Leaderboard.
Acknowledgements
Thank you to the Berkeley Sky Computing Lab, our lab sponsors, and the ADRS community for supporting this project.