λ-Lang (Lambda Lang)
A language-agnostic specification format for function contracts.
Designed collaboratively by Claude, GPT, and Gemini.
The Idea
LLMs are good at writing code. They’re bad at knowing what they forgot.
When you ask an LLM to implement process_payment, it generates something plausible. But:
- Did it handle the case where the charge succeeds but the database write fails?
- Is it clear what hits the network vs. what’s local?
- Does the error handling distinguish
CardDeclinedfromNetworkTimeout?
λ-Lang is a structured thinking step inserted before code generation:
Problem → LLM → λ-Lang Spec → Human Review → LLM → Code
The spec forces the LLM to articulate the contract before implementing it. Humans review a spec — not code. Then the spec transpiles to Python, TypeScript, Kotlin, or Rust.
What a Spec Looks Like
function charge_credit_card
input:
amount: Integer (min: 1, max: 1000000)
card_token: String (pattern: ^tok_[a-zA-Z0-9]+$)
idempotency_key: String
output: Result<PaymentConfirmation, PaymentError>
side_effects:
- http.call api.stripe.com
- database.write payments
requires:
- amount > 0
- card_token matches /^tok_/
ensures:
- on Ok: result.amount == amount
- on Err: database unchanged
examples:
- input: {amount: 1000, card_token: "tok_visa", idempotency_key: "abc123"}
output: {id: "ch_123", amount: 1000, status: "succeeded"}
expect: Ok
- input: {amount: -100, card_token: "tok_visa", idempotency_key: "abc123"}
output: null
expect: Err(InvalidAmount)
Everything explicit: inputs, outputs, side effects, preconditions, postconditions, and test cases. Nothing hidden.
Designed by Multiple AI Models
This is an experiment in collaborative AI design. Claude, GPT, and Gemini each proposed a syntax and semantics for λ-Lang, critiqued each other, and synthesized a final design.
The Proposals
Claude (Anthropic)
Proposed a hybrid synthesis that combined GPT's structured AST with Gemini's minimalist discipline and its own human-centric validation methodology. Advocated for Phase 0 human testing before any large-scale implementation. Honest self-critique: acknowledged that free-form condition strings couldn't be reliably transpiled.
Read proposal →GPT (OpenAI)
Contributed the core AST architecture: structured Expr nodes for conditions (instead of strings), structured Effect objects with category/action/target, and rich Example metadata including expectation type and comparison mode. The "AST-first" principle — the internal structure is authoritative, surface syntax is secondary — came from GPT.
Gemini (Google)
The minimalist watchdog. Pushed back on scope creep, insisted on removing the :implementation field (which would have undermined language-agnosticism), and proposed S-expression syntax as an unambiguous parsing target. Kept the type system simple when others wanted to add generics, refinement types, and dependent types immediately.
The synthesized consensus became the canonical spec.
Early Experimental Results
We’ve run two controlled experiments comparing direct generation vs. spec-first generation.
Experiment 1: Compound Interest Calculator
A pure function with validation and edge cases.
| Metric | Direct Generation | Spec-First |
|---|---|---|
| Test cases passing | 5/6 (83%) | 6/6 (100%) |
| Edge cases covered | 5 | 6 |
| Error specificity | Generic ValueError |
Typed: InvalidPrincipal, InvalidRate, InvalidYears |
| Postcondition checks | None | 1 |
Key finding: The baseline missed testing years = 0 entirely. The λ-Lang spec required explicit enumeration of all test cases, which caught it.
Experiment 2: Payment Processor with Refunds
A complex function with multiple side effects, 12+ error cases, idempotency, and transaction atomicity.
| Metric | Direct Generation | Spec-First |
|---|---|---|
| Correctness | All cases handled | All cases handled |
| Architecture | Class-based (OOP) | Function-based (FP) |
| Test suite | 52 tests (exploratory) | 23 tests (requirement-driven) |
| Documentation | Standard docstrings | Explicit “as per spec” traceability |
Key finding: Both implementations correct — but spec-first produced better architecture, more focused tests, and explicit traceability. Even for well-understood domains, the spec improves code quality.
Design Principles
Schema-first. Every function declares its complete type contract. No implicit types, no any.
Effect tracking. Side effects (database, network, filesystem) are declared in the signature. No surprises.
side_effects:
- database.read users
- http.call api.stripe.com
- logging.log audit
Design by contract. Preconditions and postconditions are first-class citizens.
Examples as tests. Every example in a spec becomes a unit test when transpiled.
Result types. Errors are values, not exceptions. Functions return Result<T, E>.
Representation-agnostic. The same spec can be written in markdown-adjacent syntax, JSON, or S-expressions — all parse to the same AST.
Status
Pre-alpha. Active development.
- Canonical AST dataclasses
- Markdown parser (working, tests passing)
- 2 validation experiments
- Python transpiler (scaffold generator)
- TypeScript transpiler
- CLI tool (
llang parse,llang transpile) - More experiments
Get Involved
git clone https://github.com/klutometis/llang
cd llang
uv sync
uv run pytest
GitHub → · Spec → · Experiments →