λ-Lang (Lambda Lang)

A language-agnostic specification format for function contracts.
Designed collaboratively by Claude, GPT, and Gemini.

The Idea

LLMs are good at writing code. They’re bad at knowing what they forgot.

When you ask an LLM to implement process_payment, it generates something plausible. But:

Did it handle the case where the charge succeeds but the database write fails?
Is it clear what hits the network vs. what’s local?
Does the error handling distinguish CardDeclined from NetworkTimeout?

λ-Lang is a structured thinking step inserted before code generation:

Problem → LLM → λ-Lang Spec → Human Review → LLM → Code

The spec forces the LLM to articulate the contract before implementing it. Humans review a spec — not code. Then the spec transpiles to Python, TypeScript, Kotlin, or Rust.

What a Spec Looks Like

function charge_credit_card

input:
  amount: Integer (min: 1, max: 1000000)
  card_token: String (pattern: ^tok_[a-zA-Z0-9]+$)
  idempotency_key: String

output: Result<PaymentConfirmation, PaymentError>

side_effects:
  - http.call api.stripe.com
  - database.write payments

requires:
  - amount > 0
  - card_token matches /^tok_/

ensures:
  - on Ok: result.amount == amount
  - on Err: database unchanged

examples:
  - input: {amount: 1000, card_token: "tok_visa", idempotency_key: "abc123"}
    output: {id: "ch_123", amount: 1000, status: "succeeded"}
    expect: Ok

  - input: {amount: -100, card_token: "tok_visa", idempotency_key: "abc123"}
    output: null
    expect: Err(InvalidAmount)

Everything explicit: inputs, outputs, side effects, preconditions, postconditions, and test cases. Nothing hidden.

Designed by Multiple AI Models

This is an experiment in collaborative AI design. Claude, GPT, and Gemini each proposed a syntax and semantics for λ-Lang, critiqued each other, and synthesized a final design.

The Proposals

Claude (Anthropic)

Proposed a hybrid synthesis that combined GPT's structured AST with Gemini's minimalist discipline and its own human-centric validation methodology. Advocated for Phase 0 human testing before any large-scale implementation. Honest self-critique: acknowledged that free-form condition strings couldn't be reliably transpiled.

Read proposal →

GPT (OpenAI)

Contributed the core AST architecture: structured Expr nodes for conditions (instead of strings), structured Effect objects with category/action/target, and rich Example metadata including expectation type and comparison mode. The "AST-first" principle — the internal structure is authoritative, surface syntax is secondary — came from GPT.

Read proposal →

Gemini (Google)

The minimalist watchdog. Pushed back on scope creep, insisted on removing the :implementation field (which would have undermined language-agnosticism), and proposed S-expression syntax as an unambiguous parsing target. Kept the type system simple when others wanted to add generics, refinement types, and dependent types immediately.

Read proposal →

The synthesized consensus became the canonical spec.

Early Experimental Results

We’ve run two controlled experiments comparing direct generation vs. spec-first generation.

Experiment 1: Compound Interest Calculator

A pure function with validation and edge cases.

Metric	Direct Generation	Spec-First
Test cases passing	5/6 (83%)	6/6 (100%)
Edge cases covered	5	6
Error specificity	Generic `ValueError`	Typed: `InvalidPrincipal`, `InvalidRate`, `InvalidYears`
Postcondition checks	None	1

Key finding: The baseline missed testing years = 0 entirely. The λ-Lang spec required explicit enumeration of all test cases, which caught it.

Experiment 2: Payment Processor with Refunds

A complex function with multiple side effects, 12+ error cases, idempotency, and transaction atomicity.

Metric	Direct Generation	Spec-First
Correctness	All cases handled	All cases handled
Architecture	Class-based (OOP)	Function-based (FP)
Test suite	52 tests (exploratory)	23 tests (requirement-driven)
Documentation	Standard docstrings	Explicit “as per spec” traceability

Key finding: Both implementations correct — but spec-first produced better architecture, more focused tests, and explicit traceability. Even for well-understood domains, the spec improves code quality.

Full analysis →

Design Principles

Schema-first. Every function declares its complete type contract. No implicit types, no any.

Effect tracking. Side effects (database, network, filesystem) are declared in the signature. No surprises.

side_effects:
  - database.read users
  - http.call api.stripe.com
  - logging.log audit

Design by contract. Preconditions and postconditions are first-class citizens.

Examples as tests. Every example in a spec becomes a unit test when transpiled.

Result types. Errors are values, not exceptions. Functions return Result<T, E>.

Representation-agnostic. The same spec can be written in markdown-adjacent syntax, JSON, or S-expressions — all parse to the same AST.

Status

Pre-alpha. Active development.

Canonical AST dataclasses
Markdown parser (working, tests passing)
2 validation experiments
Python transpiler (scaffold generator)
TypeScript transpiler
CLI tool (llang parse, llang transpile)
More experiments

Get Involved

git clone https://github.com/klutometis/llang
cd llang
uv sync
uv run pytest

GitHub → · Spec → · Experiments →