Deterministic Programming with LLMs


Introduction

If, like me, you regularly read programming blogs (and you do, otherwise you wouldn't be
seeing these words), then you are well aware that our industry is deep in the throes of a
drastic change: more than half of those blog articles are currently debating the best way
for our industry to adapt to the advent of LLMs (large language models) that are capable
of writing code.

Much has been said on the ethics of LLM coding, on the best approach for using it, and on
how to use AI agents effectively. In this essay, I would like to make one contribution to
this discussion: I would like to talk about how LLMs can be used in a deterministic way. I
am not trying to suggest this is the only way to use them, but I do think it is valuable to
think about it as one possible tool to use.

Mathematical Proof and LLMs

Before I look at my own industry, I want to look at a different field that is also being
affected by LLMs and AI, and try to learn something from what people there are doing.
Mathematicians have been confronting questions about how to use AI, and what they have
done is illuminating.

LLMs are actually quite capable at writing things that look like mathematical proofs.
In September 2024, Terence Tao (a Fields-medal-winning mathematician who has been
active in looking at the use of AI in math)
wrote that supervising an
LLM was like "trying to advise a mediocre, but not completely incompetent, (static
simulation of a) graduate student." Very few human beings are capable of acting like a
mediocre graduate student in math, and the LLMs have only gotten better in the past
year-and-a-half.

But ultimately, LLMs are a device for producing what looks very much like other
documents they have seen, and they are subject to hallucinations. This is
especially dangerous in math proofs. Proofs often depend on very subtle
differences and given a plausible sounding argument it is
easy
to fall into the trap of believing it. So it is not safe to accept LLM-written
proofs as correct, nor is it particularly safe to ask mathematicians to read an
LLM-written proof and expect them to catch all errors.

So mathematicians have turned to another computer-based tool: Lean (and other proof
systems). In principle, mathematical proofs take starting axioms and definitions, apply
logical inferences, and thereby reach a proven conclusion. Actual mathematicians don't
generally operate at this level; if they did, then even simple 5-page proofs would be
thousands of pages long, with too many steps for anyone to hope to follow and no way of
getting any insight from them. But computers can operate at that level. Lean produces
just such a rigorous step-by-step proof, but it is not widely used by professional
mathematicians because writing proofs in Lean is actually very challenging.
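For a taste of what such a machine-checked proof looks like, here is a deliberately toy
Lean 4 theorem (my own illustrative example, not one from the article). Lean will refuse
to accept it unless every inference step checks out against its axioms and definitions:

```lean
-- A trivial theorem: addition on natural numbers commutes.
-- The proof term `Nat.add_comm a b` is verified mechanically;
-- if the justification were wrong, compilation would fail.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Real research-level proofs are vastly harder to express this way, which is exactly why
Lean has remained a niche tool.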

You can see where this is going. In January of 2026, a team of people successfully
got an LLM to create a proof
of a meaningfully difficult problem never previously solved by humans. The problem was
one of a large set of problems posed by Paul Erdős, but by accident this one had been
written down incorrectly, which was only discovered a few months earlier.

The approach had a few steps: first, an LLM (ChatGPT) was asked to create an outline
of the proof. (This was an essential step, but — as predicted above — it
turned out that the argument actually had a few flaws.) Then another AI tool,
Aristotle, was able to patch up the logical
flaws in the argument and express the proof within Lean, so it could be verified. Finally,
ChatGPT was used again to read the Lean proof and write it in the format of a published
math proof, complete with references to existing literature.

Deterministic Tools in Programming

Let's get back to my own industry of software development, also known as "programming".
Today we have agents like Claude Code
or Gemini Code Assist that are capable of
"vibecoding" moderately
complex applications with minimal supervision. They are, arguably, at the level of a
mediocre junior developer. The existence of these tools is changing nearly everything in
our industry at a shocking pace as we try to figure out what software development will look
like after the invention of these coding agents. I want to talk specifically about LLMs
and determinism.

It is widely agreed that automated deployment scripts are better than deploying manually.
Writing a deployment script takes longer than doing a single deployment by hand, but over
many deployments the script will pay back that time spent, so scripted deployments save time
(in the long run). But that's not the most important reason to use automated deployments.

What is more important is that automated deployments are more reliable. Every time a
deployment is done by hand there is a chance of human error. The script may be hard to get
right in the first place, but it will consistently give the same results every time it is run.
Computer programs are incredibly good at being deterministic, at producing the exact same
result every time. Humans are less so.

And in this respect, LLMs lie somewhere between humans and computer programs. Unlike humans,
LLMs don't get bored or tired or impatient. But like humans — and unlike computer
programs — they do not produce the exact same results every time they are used.
This is fundamental to the way that LLMs operate: based on the "weights" derived from their
training data, they calculate the likelihood of possible next words to output, then
randomly select one (in proportion to its likelihood). This produces results that are
based on the sum total of the training data (most of human knowledge for the frontier models)
but which vary slightly each time the model is used.
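The sampling step described above can be sketched in a few lines of Python. This is an
illustrative toy, not any real model's decoder: scores are softmaxed into probabilities
and a token is drawn at random in proportion to its probability, which is exactly why two
runs can differ.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token by sampling in proportion to its likelihood.

    `logits` maps candidate tokens to raw scores (higher = more likely).
    Repeated calls can return different tokens -- that randomness is
    the non-determinism discussed in the text.
    """
    rng = rng or random.Random()
    tokens = list(logits)
    # Softmax with temperature: scale scores, exponentiate, normalize.
    scaled = [logits[t] / temperature for t in tokens]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(tokens, weights=probs, k=1)[0]

logits = {"deploy": 2.0, "ship": 1.5, "release": 0.5}
# Two differently-seeded runs may disagree -- that is the point.
first = sample_next_token(logits, rng=random.Random(1))
second = sample_next_token(logits, rng=random.Random(2))
```

Lowering the temperature sharpens the distribution toward the top-scoring token, but as
long as sampling is involved the output is stochastic, not deterministic.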

Asking an LLM to do deployments would be easier than writing and debugging a script, and it
would probably execute nearly as quickly as one. But it would be a poor way to do deployments
because it would not be deterministic: it might work 99 times out of 100, but still fail
occasionally.

When Determinism is Needed

Some tasks in programming are done just once, and others are done over and over again.
Determinism is important for the tasks that are done repeatedly. The simplest kind of work
done just once is a one-off task: migrate the data from system A to system B, import the
data from the spreadsheet, generate charts for the presentation, anything that you will
do once and then never need again. There is no need for determinism to guarantee the job
will be done identically every time if we only plan to do it once.

Nearly all of the code we create is written once in order to run many times. When we create
a login service for a web application, we only have to write the service once, but we expect
it to run every time a user logs in: development is "write once, use many times". The login
itself needs to be deterministic, but the process of writing it doesn't. But there IS a more
complex kind of "work done over and over": maintaining standards within the codebase.

A good example of this would be protecting against
injection attacks. Before
using a user-supplied string within something like an SQL query, an HTML page, or a command
line argument it is essential that the string be properly escaped. And this needs to be
done EVERY time that you insert a string. "Protect against injection attacks by escaping
user-supplied strings before using them" is not a one-and-done task, it is one that needs
to be done over and over.
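As a concrete illustration of the SQL case (the schema and values here are made up, and
parameterized queries are one standard way to have the escaping done for you every time),
compare the two approaches in Python with sqlite3:

```python
import sqlite3

# Toy in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

user_supplied = "alice' OR '1'='1"  # a classic injection attempt

# UNSAFE: string interpolation lets the quote break out of the literal,
# turning the WHERE clause into a tautology that matches every row:
#   query = f"SELECT * FROM users WHERE name = '{user_supplied}'"

# SAFE: a parameterized query; the driver handles the value correctly
# every single time, with no step for a human (or LLM) to forget.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_supplied,)
).fetchall()
# The injection attempt matches nothing, because the whole string is
# treated as a literal name rather than as SQL.
```

The unsafe interpolated version of the same query would return every user in the table.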

We know (from decades of experience trying to protect against injection attacks) that
humans (even experienced developers) aren't reliable enough to get this right 100% of the
time. And unfortunately, LLMs also cannot be considered reliable enough. We can
include sample code that properly escapes input strings, we can insert a reminder about
injection attacks into AGENTS.md or create a Claude Skill specifying how to escape strings,
but because of the stochastic nature of LLMs, that will never give us the deterministic
confidence that ALL of our strings will be properly sanitized. Better prompting cannot change
the fundamental nature of LLMs. And there are many practices that require consistent behavior:
following naming conventions, ensuring that every log message includes a stack trace, closing
every file in a finally block — there are lots of global practices we want to
enforce in our code.

The Solution is Code-Checking Code

Even the best humans are not fully deterministic, and over decades the software industry has
invented techniques for enforcing policies when we want them to be universal. Since LLMs
share the same limitation, we can use the same solution! Unlike humans and LLMs, programs
are extremely deterministic, so I recommend relying on them when we need
consistent, reliable behavior.

There are a number of ways to do this. We could encode the policy into the type system: if
we had two different types, "UserString" and "SanitizedString" we could get the compiler to
enforce the requirement that UserStrings must be sanitized when combining them into a
SanitizedString. Or, we could write a "lint" to enforce the use of our naming conventions
or to prefer our new logging framework rather than the one we are slowly replacing. We could
write a unit test which would scan the code to ensure that only approved libraries are
used. And because linters, tests, and compiler-enforced policies are run every single time
the code is built, there would be no risk that an LLM or a human programmer would
accidentally miss a case.
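Here is a minimal sketch of the type-system approach in Python, using the hypothetical
UserString/SanitizedString names from above. At runtime `NewType` is a no-op; the
enforcement comes from running a static type checker (such as mypy) on every build, which
rejects any call site that passes an unsanitized string:

```python
from typing import NewType
import html

# UserString is raw, untrusted input; SanitizedString has been escaped
# and is safe to embed in HTML. These names are the article's sketch.
UserString = NewType("UserString", str)
SanitizedString = NewType("SanitizedString", str)

def sanitize(raw: UserString) -> SanitizedString:
    """The one approved way to turn user input into embeddable text."""
    return SanitizedString(html.escape(raw))

def render_greeting(name: SanitizedString) -> str:
    # Declared to accept only SanitizedString; a type checker flags
    # any caller that passes a UserString here without sanitizing.
    return f"<p>Hello, {name}!</p>"

raw = UserString("<script>alert('hi')</script>")
page = render_greeting(sanitize(raw))
# render_greeting(raw)  # <- mypy would reject this line
```

In a language with a real compiler (Java, Rust, etc.) the same NewType pattern makes the
unsafe call a hard compile error rather than a lint failure.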

Creating code aids like this is great for determinism, but it requires extra work. Writing
new lint rules or transforming the code to use the NewType pattern may not require a lot of
creativity, but it does take a bit of time. And that is where the LLM comes in handy,
because agentic programming LLMs are very good at creating exactly this kind of tool. When
consistency is important, instead of asking your LLM to follow rules each time, ask your
LLM to build a program to enforce the rules, and incorporate it into your build chain.
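Such a tool can be very small. For instance, a "lint" can be nothing more than a unit
test that scans the source tree for a banned pattern; the legacy module name `oldlog`
below is invented for illustration:

```python
import pathlib
import re

# Fail the build if any source file still imports the legacy logging
# module we are migrating away from ("oldlog" is a made-up name).
BANNED = re.compile(r"^\s*import\s+oldlog\b", re.MULTILINE)

def find_violations(root="."):
    """Return the paths of all Python files that import the old module."""
    violations = []
    for path in pathlib.Path(root).rglob("*.py"):
        if BANNED.search(path.read_text(encoding="utf-8")):
            violations.append(str(path))
    return violations

def test_no_legacy_logging():
    # Run by the test suite on every build, so neither a human nor an
    # LLM can accidentally reintroduce the old library.
    assert find_violations("src") == []
```

Writing a rule like this once is exactly the kind of low-creativity, well-scoped task an
LLM agent handles well, and from then on the check itself is fully deterministic.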

This recommendation applies whenever there is a policy, something that needs to be done
consistently at many places within the code base. When something only needs to be done once
(whether it is creating the login screen or writing a lint to enforce some policy), asking
the LLM to write it is reasonable. (Just what level of human scrutiny it needs afterward is
a topic of intense debate. I review every line of code before committing it; I know not
everyone does so.)

Summary (In Case You Skipped the Rest)

LLMs may not get bored like human programmers, but they aren't fully deterministic. Whenever
there is a standard practice or policy that needs to be followed every time a certain type of
code is written, LLM coding agents might not get it right 100% of the time. You can prevent
that by writing lints, tests, or other non-stochastic programs to verify that the policy is
being followed and incorporating those into your build process; the LLM agent can help you
write them.

Posted Tue 24 February 2026
by mcherm
in Programming

---

[Original source](https://www.mcherm.com/deterministic-programming-with-llms.html)
