The Best Way to Learn Might Be Starting at the End
AI-leveraged learning lets you start with the application at the end. Curiosity guides what you learn; fundamentals backfill when you need them.

School teaches us progressively: the current lesson builds upon the previous one. Hence, most people's inadvertent takeaway is that prerequisites are requirements for learning. But this is just an artifact of constructing a curriculum that can be taught tractably at an industrial scale. Prerequisites are often seen as gatekeepers, but it's better to see them as activation energy. AI collapses that activation barrier by acting as a bridge, making curiosity and critical thinking sufficient to start learning anything you care about.
I discovered this bridge over that barrier recently when learning Lean, a proof-assistant programming language, to prove a requirement in a system called DBSP[1], a way to do incremental computation. In order to guarantee DBSP can represent incremental states that can be rolled back, it relies on a foundational assertion that must hold universally: The Z-set and the additive merge together must form an abelian group.
Surprisingly, many software engineers are allergic to math to the point of not using it as a precision tool in their work. The idea of proving a foundational core of their system is seen as too incomprehensible, too impractical, or too time-consuming. But it doesn't have to be this way.
This piece demonstrates how I went about learning Lean to prove the assertion above, showing how previously inaccessible topics are now accessible with a bit of agency and critical thinking. The activation energy to learn new topics is now much lower because we can direct AI to collect, summarize, and adjust the material for the learner. This distills learning a new subject to its naked core: working to deeply understand the central concepts and how they connect.
Lowering the Activation Energy
The Z-set and the additive merge together form an abelian group.
If these words mean nothing to you, here's a short primer:
- Z-set: A set is a bag of unique elements. A Z-set is a bag that can contain multiple instances of the same element. Think of it as a map/dictionary with the elements as keys and integers as values; the integer indicates how many instances of the element exist in the Z-set.
- Additive Merge: It's an operation that merges two maps by summing values for shared keys. Non-shared keys are just copied over.
- Abelian Group: It's a set with an operation where five rules hold:
- you can combine any two elements with the binary operation and still get an element from the set (closure)
- there’s an element that leaves any element unchanged when combined with it (identity)
- every element has an inverse that cancels it out into the identity (inverse)
- the grouping of operations doesn't change the result (associativity)
- the order of combining doesn't matter (commutativity)
In short, we're trying to prove that for Z-sets under the additive merge, these five properties always hold.
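To make this concrete, here's a small worked example with made-up counts, writing Z-sets in map notation and the merge as ⊕:
{apple ↦ 3, banana ↦ 2} ⊕ {apple ↦ 2, cherry ↦ -1} = {apple ↦ 5, banana ↦ 2, cherry ↦ -1}
The shared key apple sums to 3 + 2 = 5, while banana and cherry are copied over. Note that counts can be negative; that's exactly what the inverse property will rely on.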
Basic definitions are a good starting point to ask AI for help. If these explanations are still opaque, it's a perfect chance to ask AI to adjust the explanation in understandable terms. Give it your background and domain of expertise, and it'll find ways to bridge these concepts back to concepts you already understand.
But the point of this article is to skip this step and go straight to the proof! The written word is a static medium: an article can't adapt to each reader, so it only reads comprehensibly if the right prerequisites are already in place.
When using AI to learn, this constraint doesn't exist. Instead of grinding through the foundations and prerequisites before finally getting to the proof, I did the following:
- I scoped the problem to "prove Z-sets over additive merge form an abelian group"
- Then I asked AI: "Generate a Lean tutorial for this exact problem"
- I followed the tutorial toward a working proof, asking for explanations of concepts and jargon as I encountered them.
- I asked as many questions as I wanted about anything that didn't make sense.
- I had a working proof in a couple of hours, and a sufficient understanding to extend it.
This wasn't possible before, because there was no scalable way to produce bespoke educational material for every possible topic a learner might be interested in. The options were to find a tutorial close to the application of interest, or to map the lessons of an unrelated tutorial onto it.
Almost any Wikipedia article about a technical topic is useless for learning, because it's written to be accurate, not to teach. The experience is completely different with AI.
- If a different level of explanation is required for comprehension, we can ask the AI to ELI5 (explain like I'm five) for a bit to get the basics straight, before asking it to raise the level again.
- We can ask it to explain things in terms of already understood concepts. As we gain understanding, it can adjust to our current level of comprehension by refining and expanding its explanation.
- Finally, the tone and style of teaching are adjustable. The AI can play the straight shooter with no fluff, so we're not distracted. Or it can boost our confidence with verbal encouragement.
These are all the different ways we can leverage AI to lower the activation energy to learning a new domain. All we have to do is ask.
Prerequisites for skipping prerequisites
Though this piece is about skipping prerequisites for learning, this isn't for everyone, and the method has prerequisites of its own. It only works with basic knowledge of an adjacent field, vigilant critical thinking, genuine curiosity about the specific problem, and a domain with relatively easy-to-verify answers.
It's well known that LLMs hallucinate facts while sounding confident about them. Nowadays, this is tempered by grounding their context in source material. But that doesn't mean they can't sometimes reach the wrong conclusion. I've caught them a few times saying things that were slightly wrong or ill-informed.
First, it's an advantage to have expertise or training in an adjacent field: it makes it much easier to spot erroneous answers from the AI. If not caught, these errors accumulate in our mental model of the world. Despite the effort to learn, we can actively make ourselves stupider.
Second, it has to be a habit of mind to question the answers, even as a beginner in an unfamiliar domain. If something doesn't make sense, ask questions to clarify. Doubt the sycophantic agreement every step of the way. Double-check assertions by clicking on the quoted sources and doing independent searches. If anything is incongruent, confusing, or missing, ask why.
Third, a genuine curiosity about the problem goes a long way. It's an infinite source of good questions to fuel critical thinking. Genuine curiosity will unearth all the hidden corners and edges of our current understanding, which makes for good questions to challenge the AI and sharpen our understanding. In addition, it's motivation to keep going when hitting a wall. The drive compels us to ask the AI to approach a new concept from different angles when one approach doesn't work. Sometimes, the breakthrough is a shift in perspective, and the trick is to persevere long enough to find it.
Lastly, it must be a domain with relatively easy-to-verify answers. The nice thing about mathematical proofs as a domain is that AI-generated answers are verifiable by the proof assistant. In other domains without such clear-cut verifiable truth [2], we rely on consistency. The answers must be consistent within the conversation, consistent with what we know, and consistent with external independent sources.
Now, let's see how this worked in practice, for this example in Lean.
Example: Defining Z-set and additive merge
The AI-generated tutorial started with the definition of Z-sets and the additive merge in Lean.
-- ZSet is a function from elements to integers (coefficients)
def ZSet (α : Type) : Type :=
α → Int
-- Additive merge: add coefficients pointwise
def ZSet.add {α : Type} (zs1 zs2 : ZSet α) : ZSet α :=
fun x => zs1 x + zs2 x
Code from the ML-family of programming languages (Haskell, OCaml, Elm) can be unfamiliar and confusing. Beyond definitions, syntax is another area AI can easily help learners get up to speed. Again, due to the constraints of this medium, I'll give a little primer for what the AI would have replied to questions about syntax.
The first definition just says that a Z-set is a type that maps something of any type (α) to an integer. It is denoted as α → Int. The second definition just says that, in order to additively merge two Z-sets, we take the matching keys of each Z-set and add them up.
Another way to understand it is to leverage what we already know. Given more familiarity with TypeScript than Haskell, we can ask the AI to rewrite the Lean code in TypeScript. The TypeScript code can then be used as a Rosetta Stone to read and interpret the Lean code.
// ZSet<A> is a total function A -> number (ℤ). Return 0 for “not present”.
export type ZSet<A> = (x: A) => number;
function add<A>(zs1: ZSet<A>, zs2: ZSet<A>): ZSet<A> {
return (x: A) => zs1(x) + zs2(x);
}
As we read this code, we might notice an apparent incongruence. Earlier, the explanation described a Z-set as a map. Here, the TypeScript translation defines it as a function. Why is that?
This is where genuine curiosity kicks in, and we can ask the AI to explain: "What's the equivalent of a map or dictionary in Lean?" or "Why is a Z-set represented as a function of type α → Int instead of a Map?" The AI might quip: "In type theory, functions ARE maps."
If this is a new concept, it can be a bit of a perspective shift. Any pure function can be represented completely by a key-value store with infinite memory. In this view, functions are simply key-value stores we can write down without needing an infinite sheet of paper (or memory) to record the value of every key.
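For instance, here's a minimal Lean sketch of that perspective (inventory is a hypothetical name):
-- A function as a key-value store: every key we didn't write down
-- implicitly maps to 0, so no infinite table is required
def inventory : String → Int
  | "apple"  => 3
  | "banana" => -1
  | _        => 0
#eval inventory "apple"  -- 3
#eval inventory "pear"   -- 0 (the key is "not present")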
We didn't need to read introductions to type theory to happen upon this insight. Questions that arose naturally from curiosity and practicality taught us the concepts and conventions of Lean, along with a perspective shift on the equivalence of functions and maps.
Example: Stating the Abelian Group
Next, the AI-generated tutorial has us define what an abelian group is in Lean.[3]
-- Define our own minimal abelian group structure as a typeclass
class AbelianGroup (G : Type) where
add : G → G → G
zero : G
neg : G → G
add_zero : ∀ a : G, add a zero = a
zero_add : ∀ a : G, add zero a = a
add_inv : ∀ a : G, add a (neg a) = zero
add_assoc : ∀ a b c : G, add (add a b) c = add a (add b c)
add_comm : ∀ a b : G, add a b = add b a
Here, notice the two kinds of definitions. The first group consists of function signatures for which we need to provide a definition. The second group consists of properties that must be proven. If we provide proofs for the properties, then for any example of G and add, we can use the proofs to check that they indeed form an abelian group.
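As a sketch of what that checking looks like (assuming the AbelianGroup definition above is in scope), each proof field of a value becomes a lemma we can apply:
-- Each proof field is usable as a lemma about that particular G and add
example (g : AbelianGroup Int) (a b : Int) : g.add a b = g.add b a :=
  g.add_comm a b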
While writing this post, I realized I had a gap in my knowledge I wasn't aware of before. In the definition of an abelian group above, we require three functions to be implemented (add, zero, and neg). Yet abelian groups are uniquely defined by just the carrier set G and an add method. Why do we need to implement zero and neg?
So I started asking the AI those kinds of questions. It argued that while mathematicians set out to prove theorems with the smallest number of axioms, our goal is to build a program that's correct by construction. Hence, we don't need to derive everything axiomatically. If we can write down a definition from previously proven statements, then it's functionally equivalent to deriving it axiomatically.
Is this true? This characterization of what mathematicians and engineers care about aligns with my own experience. And considering the nature of their jobs, the perspective seems goal-aligned as well. Hence, it's likely a sufficient answer.
How would we axiomatically get the inverse? We can start with a + 0 = a and 0 + a = a to see if there's anything we can add to a that gives us a back again. If both hold, then an identity (0) exists. Then we can use a + (-a) = 0 to see if there's anything we can add to a that gives us the identity. If so, the inverse (-a) exists.
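For the curious, this axiomatic phrasing can itself be written down in Lean as existence statements. Here's a sketch (HasIdentity and HasInverses are names I made up for illustration):
-- The mathematician's phrasing: identity and inverses are asserted
-- to exist, rather than handed over as implementations
def HasIdentity {G : Type} (add : G → G → G) : Prop :=
  ∃ e : G, ∀ a : G, add a e = a ∧ add e a = a
def HasInverses {G : Type} (add : G → G → G) (e : G) : Prop :=
  ∀ a : G, ∃ b : G, add a b = e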
But for us programmers, we just need a definition. How can we define the inverse from what has already been proven? We can leverage the fact that the integers form an abelian group, and define the negation of a Z-set as negating the integer value at every key.
def ZSet.neg {α : Type} (zs : ZSet α) : ZSet α :=
fun x => -(zs x)
While we're not yet sure this will work in the proof, we didn't need to start from just the identity and the additive merge to define the inverse/negation.
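As a quick sanity check, the pointwise cancellation already goes through. This sketch reuses Int.add_right_neg, the already-proven fact about integers that also appears in the full proof below:
-- Pointwise, a Z-set plus its negation is 0 at every key
example {α : Type} (zs : ZSet α) (x : α) :
    (ZSet.add zs (ZSet.neg zs)) x = 0 :=
  Int.add_right_neg (zs x)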
This kind of struggle is what forces a deeper understanding, and it's exactly the kind of clarification that learning with an AI enables. Tutorials and YouTube videos don't know what you don't know, so they can't fill in the gaps. Human teachers can, but they're far more expensive, and it might not be practical to find one for the domain.
As an exercise for the reader, here are other questions to ask the AI, if curiosity grabbed your attention:
- Are there set and operation pairs that don't form an abelian group? What are some examples?
- An abelian group has five properties. Why are there two properties for the proof of identity, add_zero and zero_add? And why do we not prove the closure property?
- Why these five properties? Why are these five properties together significant enough to be given a name?
- How come the identity element is named zero? Wouldn't it be a misnomer if our binary operation were integer multiplication?
Example: Proving Associativity
We're not going to prove every property in the abelian group for a Z-set. Doing one is good enough to demonstrate the learning process with AI. However, I'll attach the complete proof at the end. In the AI-generated tutorial, the proof of associativity is:
instance : AbelianGroup (ZSet α)
where
-- ... other definitions
-- (a + b) + c = a + (b + c)
add_assoc := by
intro a b c
ext x
simp [ZSet.add, Int.add_assoc]
Here, there were a couple of keywords I didn't understand, so again, here's a primer of what the AI would have said:
- by enters tactic mode, where I'm telling Lean: "I'll construct this proof interactively, step by step."
- intro brings the quantified variables into the local context, so the goal becomes proving f(a, b, c) for arbitrary a, b, and c.
- ext applies the extensionality lemma, which says "two functions are equal if they are equal at every argument."
- simp is for simplification: it rewrites and substitutes ZSet.add and Int.add_assoc with their expanded definitions.
When doing proofs on paper, the intermediate expressions are written down and the transformation is implied. When doing proofs in Lean, the method transformations are written down and the intermediate expressions are implied.
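To see the contrast, here's the same associativity fact written in the "paper" style using Lean's calc blocks, where the intermediate expressions are explicit (a sketch, assuming the ZSet definitions above; it relies on ZSet.add unfolding definitionally):
-- Paper-style proof: intermediate expressions written down,
-- transformations implied
example {α : Type} (a b c : ZSet α) (x : α) :
    (ZSet.add (ZSet.add a b) c) x = (ZSet.add a (ZSet.add b c)) x :=
  calc (ZSet.add (ZSet.add a b) c) x = (a x + b x) + c x := rfl
    _ = a x + (b x + c x) := Int.add_assoc (a x) (b x) (c x)
    _ = (ZSet.add a (ZSet.add b c)) x := rfl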
One can't just blindly apply proof tactics. For a proof this simple, the AI isn't going to get it wrong. And even if the AI writes a verifiably incorrect proof, the beauty of AI + proof assistants is that the AI can use the proof assistant to self-correct until it gets the proof right.
But that doesn't mean the proof should be accepted blindly, any more than professional developers should blindly accept vibe-coded programs. To stand behind the results, we need to understand them: understanding a correct proof teaches us what to ask when a proof is wrong. And just as in code, even when the result is technically correct, we might want it written a different way for other reasons.
Initially, the AI proof used ring as a proof tactic. While correct, I wanted to understand why, and what the alternatives were. That's when I learned about simp, along with its pros and cons. In the end, I chose simp for this article because it's more transparent and shows what's going on with less magic.
This is much harder than following a tutorial in lockstep. It's why so many young programmers can do all the tutorials and watch all the YouTube videos they want and still not achieve proficiency. The struggle of thinking about why something doesn't work has to be experienced for learning to actually take place.
However, the struggle is with the core concepts, not with trivia like looking up what different keywords mean. We're doing real formal verification of a data structure from a real system, DBSP. The activation energy collapsed from "I'll never understand proof assistants" to "I have a working proof" in one afternoon.
A PhD is for thinking previously unthinkable thoughts
To be clear, I'm not saying AI-based learning replaces a PhD when it comes to learning Lean-based math proofs. When people say "you need a PhD to do X", they often mean one of two different, but conflated, things:
- Background collation: Months of reading papers, learning the jargon, and hunting down the prerequisites.
- Novel Research: Asking questions never asked, proving theorems never proven, and thinking previously unthinkable thoughts.
Most of the time people say that phrase, they mean it in the first sense: expressing the high activation energy required to break into learning anything outside their familiar domains. AI eliminates the barrier in the first sense, but not the second. You can't ask an AI to "solve my novel research problem", but you can ask it to "teach me formal verification for my specific problem."
What if you are doing novel research? That's where I make no claims about the efficacy of using AI to learn, except to warn that it's easy to fall into AI-induced psychosis. No matter how smart or careful you are, the easiest person to fool is yourself.
Here's an excerpt of a passage from a smart and accomplished founder:
At the heart of physics is the idea of a frame of reference. A frame is a boundary, something that separates this from that. A frame of reference is a boundary which contains a decoder and codebook that are used to interpret the boundary’s flux as a description of the world.
The codebook is a coordinate map which indicates what perspective the codebook should take interpreting the flux. In the discrete case, this means it decodes input states into output states. In the continuous case, it applies a projection operator onto a manifold defined by the decoder.
It goes on for a while in the same manner. A charitable interpretation is that they're trying to express something we don't yet have words for. But it reads like someone who took shallow definitions from physics and constructed a pet theory without precise definitions or internal consistency.
This is why answer verifiability matters. In Lean, wrong proofs don't compile. In novel physics theorizing, AI will happily construct elaborate castles in the air with you. It's not yet clear how best to leverage AI for the second meaning of "requiring a PhD"; that's why one of the prerequisites for this method is picking a domain where answers are relatively easy to verify. So for now, I'd stick with the first meaning when using AI to learn.
Trying it yourself
You can use the following prompt to generate yourself a tutorial for the proof. Adjust according to your tastes.
Then install Lean 4 and VSCode or use the browser version to work through the AI-generated tutorial.
Ask as many questions as you can think of to clarify your understanding. When stuck, ask: "Why did [X tactic] fail?" or "Explain [term] in the context of this proof." Try different angles to understand the concepts, from generating code in a familiar language to explaining the concepts in a domain you already understand.
Here is my full proof that Z-sets over the additive merge form an abelian group. It demonstrates all five properties of an abelian group: closure (by definition), the existence of an identity and inverses, associativity, and commutativity.
variable {α : Type} [DecidableEq α]
-- Define our own minimal abelian group structure (no mathlib)
class AbelianGroup (G : Type) where
add : G → G → G
zero : G
neg : G → G
add_assoc : ∀ a b c : G, add (add a b) c = add a (add b c)
add_zero : ∀ a : G, add a zero = a
zero_add : ∀ a : G, add zero a = a
add_inv : ∀ a : G, add a (neg a) = zero
add_comm : ∀ a b : G, add a b = add b a
-- Example that integers form our minimal abelian group:
example : AbelianGroup Int := {
add := fun a b => a + b,
zero := 0,
neg := fun a => -a,
add_assoc := Int.add_assoc,
add_comm := Int.add_comm,
add_zero := Int.add_zero,
zero_add := Int.zero_add,
add_inv := Int.add_right_neg,
}
-- ZSet is a function from elements to integers (coefficients)
def ZSet (α : Type) : Type :=
α → Int
-- Register extensionality theorem for ZSet
-- This tells Lean: "To prove two ZSets are equal, show they're equal pointwise"
@[ext]
theorem ZSet.ext {α : Type} (f g : ZSet α) : (∀ x, f x = g x) -> f = g :=
funext
-- empty ZSet (all coefs are 0)
def ZSet.empty {α : Type} : ZSet α :=
fun _ => 0
-- Add an element with coefficient
def ZSet.single (a : α) (n : Int) : ZSet α :=
fun x => if x = a then n else 0
-- Additive merge: add coefficients pointwise
def ZSet.add {α : Type} (zs1 zs2 : ZSet α) : ZSet α :=
fun x => zs1 x + zs2 x
-- Enable `+` notation for ZSets
instance {α : Type} : Add (ZSet α) where
add := ZSet.add
-- Additive inverse: negate all coefficients
def ZSet.neg {α : Type} (zs : ZSet α) : ZSet α :=
fun x => -(zs x)
-- Enable `-` unary operator for ZSets
instance {α : Type} : Neg (ZSet α) where
neg := ZSet.neg
-- Proof that ZSet under additive merge form an abelian group
instance : AbelianGroup (ZSet α)
where
add := ZSet.add
zero := ZSet.empty
neg := ZSet.neg
-- (a + b) + c = a + (b + c)
add_assoc := by
intro a b c
ext x
simp [ZSet.add, Int.add_assoc]
-- a + b = b + a
add_comm := by
intro a b
ext x
simp [ZSet.add, Int.add_comm]
-- a + 0 = a
add_zero := by
intro a
ext x
simp [ZSet.add, ZSet.empty]
-- 0 + a = a
zero_add := by
intro a
ext x
simp [ZSet.add, ZSet.empty]
-- a + (-a) = 0
add_inv := by
intro a
ext x
simp [ZSet.add, ZSet.neg, ZSet.empty]
rw [Int.add_right_neg]
example : (ZSet.single "a" 3 + ZSet.single "b" 2) "a" = 3 := by rfl
example : (ZSet.single "a" 3 + ZSet.single "a" 2) "a" = 5 := by rfl
example : (ZSet.single "a" (-3) + ZSet.single "a" 2) "a" = -1 := by rfl
example : ZSet.neg (ZSet.single "a" 3) "a" = (-3) := by rfl
Compare it to yours. What are the differences? Why do it one way versus the other? Then try other sets and binary operations and see whether they form abelian groups.
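For example, here's one candidate to try, sketched against the AbelianGroup definition above: booleans under XOR. The proofs use decide, which brute-force checks all cases of a finite type:
-- Booleans under XOR: false is the identity, and every element
-- is its own inverse
example : AbelianGroup Bool := {
  add := Bool.xor,
  zero := false,
  neg := id,
  add_assoc := by decide,
  add_zero := by decide,
  zero_add := by decide,
  add_inv := by decide,
  add_comm := by decide,
}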
What this enables
Through vibe coding, AI has already made a seismic shift in the way programmers work. It has increased the velocity of feature output and bug fixing for programmers who have figured out how to leverage it. But I hope it also creates a different kind of shift: one where programmers aren't afraid to branch out to the edges of what academia has discovered and bring it into the realm of real-world applications.
Distributed systems PRs could include Lean proofs of safety properties. CRDT libraries could ship with formal verification of merge semantics, not just unit tests. "Formally verified" is no longer out of reach for the average backend engineer; instead, it's a practical tool in their toolbox.
And I think this pattern generalizes across domains beyond formal verification. There are many specialized domains in programming, from GPU programming to embedded programming. Programmers who can straddle ecosystems cross-pollinate them, which means ideas from one domain can be tried in another, making good ideas spread faster. And if anything, a wider breadth of experience gives more perspective, which usually adds nuance and color to decision-making.
The bridge exists now where there was none before. We can cross it into adjacent domains to learn if we have the critical thinking, curiosity, and agency to try.
- DataBase Stream Processing, but also a pun on Digital Signal Processing ↩︎
- Please don't write in about Gödel's incompleteness theorems. They don't apply here, because we're not reasoning about anything that references itself. ↩︎
- An abelian group is already defined in Lean's math library (mathlib), but for the sake of the exercise, we define our own. Also, Lean 4 currently doesn't have tree shaking, so including the pre-written definition balloons the compiled executable by about 150+ MB. ↩︎