CentralGauge - AL Code Benchmark for LLMs
Active
An open-source benchmark for evaluating LLM performance on AL code generation for Microsoft Dynamics 365 Business Central. It includes 56 tasks across three difficulty tiers and evaluates generated code through real compilation and test execution.
CentralGauge measures how well different LLMs generate AL code for Business Central. It uses curated YAML task definitions, compiles the generated code in a real BC container, runs test codeunits, and scores the results with a transparent, point-based system. It also supports parallel execution across multiple models, with cost tracking.