AL Corpus

A Rust CLI that extracts structured training datasets from AL codebases and detects anti-patterns in them, using tree-sitter parsing.

Stack: Rust · tree-sitter
Status: Active
Released: 2026
License: MIT

Overview

AL Corpus is a Rust CLI tool that parses AL codebases with tree-sitter-al and extracts structured JSONL datasets for LLM fine-tuning. It processes all .al files; extracts objects (tables, pages, codeunits, reports, enums) with full metadata; captures every procedure and trigger with its signature, parameters, variables, and call references; and then generates prompt/completion pairs from procedure signatures and bodies.
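As a rough illustration of the last step, the sketch below turns one extracted procedure into a single JSONL line. The `Procedure` struct, the `prompt`/`completion` field names, and the way the owning object is prepended to the prompt are all assumptions made for this example; the tool's actual schema is not documented on this page. The escaping is hand-written only to keep the sketch dependency-free.

```rust
/// Hypothetical shape of one extracted procedure (not the tool's real schema).
struct Procedure<'a> {
    object: &'a str,    // owning object, e.g. "codeunit 50100 Demo"
    signature: &'a str, // procedure header as written in the source
    body: &'a str,      // source text of the begin..end block
}

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSONL record: object plus signature form the prompt,
/// the procedure body is the completion. The result contains no raw newline.
fn to_jsonl_line(p: &Procedure) -> String {
    let prompt = format!("// {}\n{}", p.object, p.signature);
    format!(
        "{{\"prompt\":\"{}\",\"completion\":\"{}\"}}",
        json_escape(&prompt),
        json_escape(p.body)
    )
}
```

Because every newline in the source is escaped, each record occupies exactly one line of output, which is what lets downstream tools read the dataset line by line.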

It also includes an anti-pattern labeler that detects ten common AL mistakes, including CalcFields in loops, record operations in loops, missing SetLoadFields, unfiltered FindSet, and hardcoded record IDs. Each finding is flagged with a severity level.
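To make one of these checks concrete, here is a deliberately simplified sketch of the "CalcFields in loops" rule. It is a line-based heuristic that only tracks repeat..until loops, not the tree-sitter traversal the tool is described as using, and the `Finding` struct, the rule id, and the severity label are hypothetical names chosen for this example.

```rust
/// Hypothetical finding record; the field names and values are illustrative only.
#[derive(Debug, PartialEq)]
struct Finding {
    line: usize,            // 1-based line number in the source
    rule: &'static str,     // hypothetical rule id
    severity: &'static str, // hypothetical severity label
}

/// Flag CalcFields calls that appear inside a repeat..until loop.
/// Simplification: works on trimmed lines, ignores for/while/foreach loops,
/// and does not understand strings or block comments.
fn find_calcfields_in_loops(source: &str) -> Vec<Finding> {
    let mut depth = 0usize; // number of currently open repeat..until loops
    let mut findings = Vec::new();
    for (idx, raw) in source.lines().enumerate() {
        // AL keywords and identifiers are case-insensitive, so compare in lowercase.
        let line = raw.trim().to_ascii_lowercase();
        if line.starts_with("//") {
            continue; // skip single-line comments
        }
        if line.starts_with("repeat") {
            depth += 1;
        }
        if depth > 0 && line.contains("calcfields(") {
            findings.push(Finding {
                line: idx + 1,
                rule: "calcfields-in-loop",
                severity: "warning",
            });
        }
        if line.starts_with("until ") {
            depth = depth.saturating_sub(1);
        }
    }
    findings
}
```

A syntax-tree-based check would instead look for a call node nested under a loop node, which avoids the false positives and negatives a text scan like this one is prone to.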

It parses 15,000+ files in under a minute, and its output feeds directly into al-train for fine-tuning pipelines.

Highlights

Structured JSONL extraction for LLM fine-tuning
Prompt/completion pairs from procedure signatures
Anti-pattern labeler across 10 AL mistake categories
~50 seconds for 15,358 files, 227,912 procedures
Streaming output with constant memory use regardless of codebase size
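
The streaming highlight can be pictured with a minimal sketch like the one below: records are written to the output as soon as they are produced, so only the current record and a fixed-size buffer are held in memory. The function name `stream_jsonl` and its signature are assumptions for illustration, not the tool's actual code.

```rust
use std::io::{self, BufWriter, Write};

/// Write each record as its own line the moment the iterator yields it.
/// Nothing is collected into a Vec, so memory use does not grow with the
/// number of records. Returns how many records were written.
fn stream_jsonl<I, W>(records: I, sink: W) -> io::Result<usize>
where
    I: IntoIterator<Item = String>,
    W: Write,
{
    // BufWriter batches small writes into a fixed-size buffer.
    let mut out = BufWriter::new(sink);
    let mut written = 0;
    for record in records {
        out.write_all(record.as_bytes())?;
        out.write_all(b"\n")?;
        written += 1;
    }
    // Flush explicitly so write errors surface here instead of being lost on drop.
    out.flush()?;
    Ok(written)
}
```

Passing a lazy iterator (for example, one that parses a file and yields its records on demand) keeps the whole pipeline streaming; passing an already-collected Vec would work but would lose the memory benefit.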