CentralGauge Benchmark Update: Why the Numbers Changed
A transparency report on significant fixes to the CentralGauge AL code benchmark infrastructure, including bugs in code extraction, broken tasks, and vague specs, along with updated LLM rankings.
Tags: ai, al, benchmark, centralgauge, developer-tools