Functional Programming and Reliability: ADTs, Safety, Critical Infrastructure
Key topics
The debate around functional programming and its relation to reliability has sparked a lively discussion, with commenters weighing in on the role of strong type systems and Algebraic Data Types (ADTs) in achieving reliability. Some argue that these concepts are not exclusive to functional programming, pointing to examples like Rust and plain JavaScript, where similar techniques are used to achieve "correct-by-construction" code. As one commenter quipped, "In dynamic languages, you are the type system," highlighting the trade-offs between static and dynamic typing. The author acknowledges the points made, clarifying that functional programming's benefits extend beyond static typing and immutability.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 1h after posting
- Peak period: 65 comments (0-6h)
- Avg / period: 14.5
Based on 160 loaded comments
Key moments
- Story posted: Dec 27, 2025 at 7:05 PM EST (5d ago)
- First comment: Dec 27, 2025 at 8:23 PM EST (1h after posting)
- Peak activity: 65 comments in 0-6h (hottest window of the conversation)
- Latest activity: Dec 30, 2025 at 3:51 PM EST (3d ago)
I also ship code super fast. When I find bugs I just fix them on the spot. When I find variables named wrong, I just rename them. The result is that I often smash bugfixes and features and cleanup together and have a messy git history, but on the flip side you'll never find bugs or naming deceptions that I've left sitting for years. If something is wrong and I can reproduce it (usually easy in functional code), the debugger and I are going to get to the bottom of it, and quickly. Always and only forward!
Perhaps the most direct inspiration I took from there, though, was the wonderful "opaque types" feature that Flow supports (https://flow.org/en/docs/types/opaque-types/), which, for reasons known only to Hejlsberg and God, TypeScript has never adopted; thus most people are unfamiliar with that way of thinking.
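Flow's opaque types can be approximated in TypeScript with the common "branding" trick: intersecting a primitive with a phantom property so it can't be confused with a raw value. A minimal sketch — the names `UserId`, `parseUserId`, and `lookupUser` are invented for illustration:

```typescript
// Brand a string so that only code with access to the brand can mint one.
declare const brand: unique symbol;
type UserId = string & { readonly [brand]: "UserId" };

function parseUserId(raw: string): UserId | null {
  // Only this function mints a UserId; callers must go through validation.
  return /^[a-z0-9]{8}$/.test(raw) ? (raw as UserId) : null;
}

function lookupUser(id: UserId): string {
  return `user:${id}`;
}

const id = parseUserId("abcd1234");
if (id !== null) {
  lookupUser(id);          // OK: went through the parser
}
// lookupUser("abcd1234"); // compile error: a plain string is not a UserId
```

Unlike Flow's `opaque type`, the brand here is a convention enforced by keeping the symbol private to one module, not a first-class language feature.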
I’ve used languages with an approach like this. The difference in what I’ve used is that you separate the conventional part from the rest of the name with a space (or maybe a colon), then only refer to the value by the non-conventional part for the rest of the scope. Then the language enforces this convention for all of my co-workers! It’s pretty neat.
(I hope this makes the joke more obvious.)
In dynamic languages, you are the type system.
This comment is either severe snark or severe ignorance.
If you remove passes of a compiler, you, looking at it whilst running, are the compiler.
None of this seems that new. Even people who write TS code still write tests and still ship bugs and still have to think about good architecture patterns, like the ones in the linked post.
My angle was narrower: static types + ADTs improve the engineering loop (refactors, code review, test construction) by turning whole classes of mistakes into compiler errors. That’s not “what FP is”, it’s one very effective reliability layer that many FP ecosystems emphasize.
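That "mistakes become compiler errors" loop can be sketched with a discriminated union and an exhaustive switch; the `PaymentState` names are invented for illustration:

```typescript
type PaymentState =
  | { kind: "pending" }
  | { kind: "settled"; settledAt: string }
  | { kind: "failed"; reason: string };

function describe(s: PaymentState): string {
  switch (s.kind) {
    case "pending": return "awaiting settlement";
    case "settled": return `settled at ${s.settledAt}`;
    case "failed":  return `failed: ${s.reason}`;
    default: {
      // If a new variant is added to PaymentState, this assignment stops
      // compiling — the refactor surfaces as a compiler error, not in review.
      const _exhaustive: never = s;
      return _exhaustive;
    }
  }
}
```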
Even purity is not exclusive to FP; D and Nim also support separating pure from impure functions. And if you ask me, the reason not many other languages support this is that, in practice, it has been demonstrated again and again that it's just not nearly as useful as you might think. Effects, as in Unison and Flix, generalize the concept to cover much more than purity alone and may prove more useful in general-purpose programming, but the jury is still out.
In Unison, effects are called abilities, and they wrote a very good post explaining the differences!
https://www.unison-lang.org/docs/fundamentals/abilities/for-...
In summary: both have advantages and disadvantages. Which one is "better" depends on which factors you value more.
Rust is actually aligned with the point: it delivers major reliability wins via making invalid states harder to represent (enums, ownership/borrowing, pattern matching). That’s not “FP-first,” but it’s very compatible with functional style and the same invariants story.
If the TS example came off as “types instead of validation,” that’s on me to phrase better. The point wasn’t “types eliminate validation”; it’s “types make the shape explicit, so validation becomes harder to forget and easier to review.”
For example, using the input parsing scenario, a Java 1.0 tutorial in 1995 would have said that you should create a TimeDuration class which parses the input and throws an exception when given an invalid value like “30s”. If you say that reliability requires FP, how would you respond when they point out that their code also prevents running with an invalid value? That discussion can be far more educational, especially because it might avoid derails around specific issues which are really just restating the given that JavaScript had lots of footgun opportunities for the unwary developer, even compared to some languages their grandmother might have used.
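The comparison can be made concrete in TypeScript: the Java-tutorial style validates in a constructor and throws, while the typed-parse style moves the failure into the return type. Both reject `"30s"`; the names below are invented for illustration:

```typescript
// Classic OO style: the constructor is the gatekeeper and throws on bad input.
class TimeDuration {
  constructor(public readonly seconds: number) {
    if (!Number.isFinite(seconds) || seconds < 0) {
      throw new Error(`invalid duration: ${seconds}`);
    }
  }
}

// Typed-parse style: failure is a value the caller is forced to inspect.
function parseDuration(raw: string): TimeDuration | null {
  const n = Number(raw);              // Number("30s") is NaN, so "30s" is rejected
  return Number.isFinite(n) && n >= 0 ? new TimeDuration(n) : null;
}
```

Either way, downstream code never sees an invalid duration; the styles differ in whether the failure travels through the exception channel or the type system.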
This is important. I threw my hands up and gave up during the height of the Haskell craze. You'd see people here saying things like LISP wasn't real FP because it didn't match their Haskell-colored expectations. Meanwhile for decades LISP was *the* canonical example of FP.
Similar to you, now I talk about specific patterns and concepts instead of calling a language functional. Also, as so many of these patterns & concepts have found their way into mainstream languages now, that becomes even more useful.
Untyped FP languages can be productive, flexible, even elegant (I guess) but they are structurally incapable of expressing large classes of correctness claims that typed FP makes routine.
That doesn’t make them useless, just, you know. Inferior.
This reliability isn't achieved by being perfect 100% of the time. Things like being able to handle states where transactions don't line up allow payments to eventually be settled. Or, for telecom, allowing single parts of the system to fail without taking down the whole thing, or adding redundancy. Essentially, these types of businesses require fault tolerance. The real world is messy and there will always be faults, so investing heavily in correctness may not be worth it compared to investing in fault tolerance.
Savvy processors recognize the immutability of each API version published to Merchants, along with how long each must be supported, and employ FP techniques both in design and implementation of their Merchant gateways.
Of course, each bank's mainframes "on the rails" do not change unless absolutely necessary (and many times not even then).
My point is narrower: those mechanisms still benefit from making illegal transitions unrepresentable (e.g. explicit state machines) so your retries/idempotency don’t create new failure modes. It’s not correctness vs tolerance, it’s correctness inside tolerant designs.
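One minimal way to make the transition rules explicit — and retries idempotent — is a total step function over a closed set of states and events. The payment states and events below are invented for illustration:

```typescript
type State = "created" | "authorized" | "settled";
type Event = "authorize" | "settle";

// Every (state, event) pair has a defined outcome, so a replayed or
// out-of-order event can't invent a transition that was never specified.
function step(state: State, event: Event): State {
  if (state === "created" && event === "authorize") return "authorized";
  if (state === "authorized" && event === "settle") return "settled";
  return state; // idempotent: anything else is a no-op, not a surprise
}
```

A retry that re-delivers "settle" to an already-settled payment simply lands back in "settled", which is the invariant that keeps retries from creating new failure modes.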
Reliability should be simpler with FP but so much depends on correctness of the runtime and IO.
Erlang and the "run correctly or die" comes to mind as well. The system is either working or is off. When being off is fatal, Erlang seems to shrug and say "maybe next karmic cycle"
In hindsight I should have positioned types/ADTs as one layer in the reliability toolbox, not the toolbox.
Functional programming: no. Functional programming as in: the final program consists of piping functions together and calling the pipe. In my opinion, that tends to get in the way of complex error handling.
The problem being that raising Exceptions at a deep level and catching them at some higher level is not pure functional programming. So your code has to deal with all the cases. It is more reliable if you can do it, but large systems have way too many failure points to be able to handle them all in a way that is practical.
Not if you program it with a mutable god object to mimic creating a new big state; then you have exactly the same kind of issues.
Have you done any FP? That's not how you do FP.
Agree. In Java, Streams allow you to process collections in a functional style. This feature enables concise, expressive data manipulation with operations like map, filter, and reduce.
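The same pipeline shape exists on TypeScript arrays, which may make the Streams comparison concrete for non-Java readers — a small invented example:

```typescript
// Streams-style composition: keep evens, square them, sum the results.
const totalEvenSquares = [1, 2, 3, 4, 5]
  .filter(n => n % 2 === 0)        // [2, 4]
  .map(n => n * n)                 // [4, 16]
  .reduce((acc, n) => acc + n, 0); // 20
```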
Some people point out that Java's checked exceptions spoil the simplicity and elegance of Streams by forcing you to handle exceptions.
But that's not a reason to not have checked exceptions, it is a reason to not do functional style composition when methods can throw exceptions. Streams was invented for collections, which tend not to throw exceptions. If proper error handling is important don't do Streams.
`map` is a lot more than a fancy for-loop for lists and arrays; it's about abstracting away the entire idea of context. Java streams aren't a substitute for what you have in Haskell.
So yes, generating errors at a deep level and catching them at a higher one is a normal part of the system design; it's purely functional, and nothing strange happens, even in very large systems. You ADT the errors, pipe up entire families of them, and select what you handle. It's significantly easier than exceptions, in the sense that I can be sure when I've validated the error.
It's practical, and typically one dedicates less code to error handling than, say, your typical enterprise Java program that is terrified of runtime exceptions and null-checks every step of the way. In fact, I'd argue this is the main selling point of strongly typed FP.
You’ve just reinvented checked exceptions, good job.
dude .. wut?? Explain to me exactly how this is true, with a real world example.
From where I stand, untagged unions are useful in an extremely narrow set of circumstances. Tagged unions, on the other hand, are incredibly useful in a wide variety of applications.
A similar case applies to function parameters: In case of relaxed parameter requirements, changing a parameter from String to String|Null is trivial, but a change from String to Option<String> would necessitate changing all the call sites.
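The churn asymmetry can be shown in a few lines of TypeScript: widening a return type to `string | null` leaves pass-through callers untouched, while an Option forces unwrapping at every consumer. `Option` here is a hypothetical local type, not a library import:

```typescript
type Option<T> = { some: true; value: T } | { some: false };

function getNicknameNullable(): string | null { return null; }
function getNicknameOption(): Option<string> { return { some: false }; }

// Pass-through caller: compiles unchanged against `string | null`...
const forward = (n: string | null) => n;
forward(getNicknameNullable());

// ...but every consumer of the Option version must unwrap explicitly:
const opt = getNicknameOption();
const nickname = opt.some ? opt.value : "(none)";
```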
> From where I stand, untagged unions are useful in an extremely narrow set of circumstances. Tagged unions, on the other hand, are incredibly useful in a wide variety of applications.
Any real world example?
For API evolution, T | null can be a pragmatic “relax/strengthen contract” knob with less mechanical churn than Option<T> (because many call sites don’t care and just pass values through). That said, it also makes it easier to accidentally reintroduce nullability and harder to enforce handling consistently, the failure mode is “it compiles, but someone forgot the check.”
In practice, once the union has more than “nullable vs present”, people converge to discriminated unions ({ kind: "ok", ... } | { kind: "err", ... }) because the explicit tag buys exhaustiveness and avoids ambiguous narrowing. So I’d frame untagged unions as great for very narrow cases (nullability / simple widening), and tagged/discriminated unions as the reliability default for domain states.
For reliability, I’d rather pay the mechanical churn of Option<T> during API evolution than pay the ongoing risk tax of “nullable everywhere.”
My post argues for paying costs that are one-time and compiler-enforced (refactors) vs costs that are ongoing and human-enforced (remembering null checks).
Two places where I still see tagged/discriminated unions win in practice:
1. Scaling beyond nullability. Once the union has multiple variants with overlapping structure, “untagged” narrowing becomes either ambiguous or ends up reintroducing an implicit tag anyway (some sentinel field / predicate ladder). An explicit tag gives stable, intention-revealing narrowing + exhaustiveness.
2. Boundary reality. In languages like TypeScript (even with strictNullChecks), unions are routinely weakened by any, assertions, JSON boundaries, or library types. Tagged unions make the “which case is this?” explicit at the value level, so the invariant survives serialization/deserialization and cross-module boundaries more reliably.
So I’d summarize it as: T | null is a great ergonomic tool for one axis (presence/absence) when the type system is enforced end-to-end. For domain states, I still prefer explicit tags because they keep exhaustiveness and intent robust as the system grows.
If you’re thinking Scala 3 / a sound type system end-to-end, your point is stronger; my caution is mostly from TS-in-the-wild + messy boundaries.
Yeah, that's exactly why I want a tagged union; so when I make a change, the compiler tells me where I need to go to do updates to my system, instead of manually hunting around for all the sites.
---
The only time an untagged union is appropriate is when the tag accounts for an appreciable amount of memory in a system that churns through a shit-ton of data, and has a soft or hard realtime performance constraint. Other than that, there's just no reason to not use a tagged union, except "I'm lazy and don't want to", which, sometimes, is also a valid reason. But it'll probably come back to bite you, if it stays in there too long.
> Yeah, that's exactly why I want a tagged union; so when I make a change, the compiler tells me where I need to go to do updates to my system, instead of manually hunting around for all the sites.
You don't have to do anything manually. There is nothing to do. Changing the return type of a function from String|Null to String is completely safe, the compiler knows that, so you don't have to do any "manual hunting" at call sites.
I'm unfamiliar with typescript, so in that language I don't have an opinion either way, but in C, you pretty much always want the tag
If I have an untagged union in <language_of_your_choice>, and I'm iterating over an array of elements of type `Foo|Bar|Baz`, and I have to do a dynamic cast before accessing the element (runtime typecheck)... I believe that must actually be a tagged union under the hood, whether or not you call it one, right? I.e., how would the program possibly know at runtime what the type of a heterogeneous set of elements is without a tag value to tell it?
We aren't discussing unions in memory layout, but in type systems. This also clearly indicates you aren't qualified for this discussion.
Tagged unions/ADTs make the discriminant explicit, which is exactly why they tend to be reliability-friendly: exhaustive matches + explicit constructors reduce “guessing” and refactor breakage.
That said, I agree the ergonomics matter, TS-style discriminated unions are basically “tagged” too once you add a kind field, for example.
In intuitionistic logic (which is the most basic kind from which to view the Curry-Howard or "propositions-as-types" correspondence), a proof of "A or B" is exactly a choice of "left" or "right" disjunct together with a corresponding proof of either A or B. The "choice tag" is part of the "constructive data" telling us how to build our proof of "A or B". Translated back into the language of code, the type "A | B" would be exactly a tagged union.
Now I do see where you are coming from; under a set-theoretic interpretation with "implies" as "subset", "or" as "union", and "and" as "intersection", the fact that "A implies (A or B)" tells us that an element of the set A is also an element of the set "A union B".
However, this is not the interpretation that leads to a straightforward correspondence between logic and programming. For example, we would like "A and B" to correspond to the type of pairs of elements of A with elements of B, which is not at all the set-theoretic intersection. And while "(A and B) implies A", we do not want to say a value of type "(A, B)" also has type "A". (E.g., if a function expects an "A" and receives an "(A, A)", we are at an impasse.)
So "implies" should not be read programmatically as a subtyping relation; instead, "A implies B" tells us that there is a function taking a value of type A to a value of type B. In the case of "A implies (A or B)", that function takes its input and tags it as belonging to the left disjunct!
Another perspective I must mention: given a proof of "A or B" and another of "(A or B) implies C", how can we combine these into a simpler proof of just "C"? A proof of "(A or B) implies C" must contain both a proof a "A implies C" and a proof of "B implies C", and we could insert into one those proofs a proof of A or a proof of B. But we have to know which one we have! (This is a very short gloss of a much deeper story, where, under Curry-Howard, proof simplification corresponds to computation, and this is another way of describing a function call that does a case-analysis of a tagged union (or "sum type").)
Now "union and intersection" types with the set-theoretic properties you are hinting at have indeed been studied (see, for example, Section 15.7 of Pierce's "Types and Programming Languages"). But they do not replace the much more familiar "sum and product types", and do not play the central role of "or" and "and" in the correspondence of programming to logic.
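The constructive reading of "A or B" as a tagged sum can be sketched in a few lines of TypeScript; `Sum`, `inl`, and `match` are the conventional names, used here for illustration:

```typescript
// "A or B" as constructive data: a chosen disjunct plus its evidence.
type Sum<A, B> = { tag: "left"; value: A } | { tag: "right"; value: B };

// "A implies (A or B)": the function that injects its input *and tags it*.
const inl = <A, B>(a: A): Sum<A, B> => ({ tag: "left", value: a });
const inr = <A, B>(b: B): Sum<A, B> => ({ tag: "right", value: b });

// Case analysis: to use a proof of "A or B" we must inspect the tag,
// supplying both "A implies C" and "B implies C".
function match<A, B, C>(s: Sum<A, B>, onA: (a: A) => C, onB: (b: B) => C): C {
  return s.tag === "left" ? onA(s.value) : onB(s.value);
}
```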
Intersection types in TypeScript and Scala 3 do work like conventional intersections / conjunctions. Something of type A&B is of type A. For example, a Set&Iterable is a Set. This makes perfect sense and is coherent with how unions work. A&A is then obviously equivalent to A. I'm not sure where you see the problem.
Suffice it to say for now: there is an interpretation of logic that gives a tighter correspondence to programming than the set-theoretic one, under the name "Curry-Howard" or "propositions as types, proofs as programs", and which has been known and cherished by logicians, programming language theorists, and also category theorists for a long time. The logic is constructive as it must be: a program of type A tells us how to build a value of type A, a proof of proposition A tells us how to construct evidence for A. From here we get things like "a proof of A and B is a proof of A together with a proof of B" (the "BHK interpretation"), which connects "and" to product types...
I spoke up because I could not leave untouched the idea that "tagged unions are illogical". On the contrary, tagged unions (aka "disjoint unions", "sum types", "coproducts", etc.) arise forthwith from an interpretation of logic that is not the set-theoretical one, but is a more fruitful one from which programming language theory begins. You are not wrong that there is also a correspondence between (untagged) union and intersection types and a set-theoretical interpretation of propositional logic, and that union and intersection types can also be used in programming, but you are missing a much bigger and very beautiful picture (which you will find described in most any introductory course or text on PL theory).
I suspect you are still reading "A implies B" as "A is a subtype of B", derived from a set-theoretic interpretation of propositional logic. But the constructive interpretation is that a proof of "A implies B" is a method to take a proof of A and transform it into a proof of B. Computationally, a value of type "A implies B" (typically rewritten "A -> B") is a function that takes values of type A and returns values of type B.
Well, everything I've said here is standard and widely-taught; go forth and check if you're inclined to. A good introduction is the one by Philip Wadler, https://homepages.inf.ed.ac.uk/wadler/papers/propositions-as... (tagged unions appear in Section 3, though it's all worth reading). A much more to-the-point and programming-focused account is in this OCaml book: https://cs3110.github.io/textbook/chapters/adv/curry-howard.... (very little OCaml syntax is used). You can find countless more.
(One complicating aspect is that there doesn't yet exist a mainstream language with full set-theoretic type system. TypeScript and Scala 3 currently only support intersections and unions, but no complements, making certain complex types not definable.)
In practice, the types inhabiting Option/Result are almost always disjoint. So you spend time wrapping/unwrapping Some/Ok/Err for no value add.
Haha as someone who has worked in one of these domains using FP even - I wish the people in charge agreed with you!
Reliability is a cost center and Product-oriented Builders treat it as such.
There are constant data bugs in the feeds provided by major exchanges, market makers, etc, and so many iffy business rules that are basically all encoded in 100+ tab excel sheets.
Maybe this article focuses on a very specific niche of banking, but most of it is tied together with FTP and excel sheets.
I think the author would be shocked just how flaky a fundamental banking protocol like SWIFT is.
That’s exactly why I argue for stronger internal modeling: when the boundary is dirty, explicit state machines/ADTs + exhaustiveness + idempotency/reconciliation help ensure bad feeds don’t silently create invalid internal states.
Part of why I like “make invalid states unrepresentable” approaches is exactly that: it’s one of the few reliability investments that can pay back during feature work (safer refactors, fewer regressions), not only during incidents.
and this company is hugely successful. so i've learned that the biggest competition advantage in fintech is flagrant disregard for correctness and compliance.
i'm glad i have a csuite with the stones to execute that. i am way too principled.
I can understand treating social network sites as less critical, of course.
The reason money-related systems often get singled out is the combination of irreversibility and auditability: a bad state transition can mean incorrect balances/settlement, messy reconciliation, regulatory reporting, and long-tail customer harm that persists after the outage is over.
That said, my point isn’t “finance is special therefore FP.” It’s “build resilience and correctness by design early”, explicit state machines/invariants, idempotency/reconciliation, and making invalid states hard to represent. Doing this from the beginning also improves the developer experience: safer refactors, clearer reviews, fewer ‘tribal knowledge’ bugs.
In other areas like lost sales or failures of the system there is lot more arguments. On other hand if you are rich enough and can prove the other side is off by sufficiently large amount of money you can bring the hammer down with facts.
> [Most production incidents] are due to the code entering a state that should never have been possible.
I have never seen evidence that this is even remotely true, and I've been looking at software reliability research in the last few months.
Instead, it is more true that most production incidents are due to the system entering into one of thousands of unsafe states which were possible and latent in production potentially for years. In a sufficiently complex system—all interesting and important software projects—functional programming is not strong enough a tool to prevent even a sliver of potential accidents.
> Arguments that these degraded conditions should have been recognized before the overt accident are usually predicated on naïve notions of system performance. System operations are dynamic, with components (organizational, human, technical) failing and being replaced continuously. — https://how.complexsystems.fail/
OP says (your quote):
> [Most production incidents] are due to the code entering a state that should never have been possible.
You say:
> [...] it is more true that most production incidents are due to the system entering into one of thousands of unsafe states which were possible and latent in production potentially for years
I see you both agree that a broken system enters an "unsafe state" (your words) or a "state that should never have been possible" (OP's words).
"Unsafe state" and "state that should not have been possible" are, in practice in a real system, the same practical thing. I suspect you both would agree "system confuses a string for an integer and acts based on erroneous value" or "system acts on internal state that indicates the valve is both open and closed" would be states that a system should not be in. Outside pedantry, your descriptions are practically synonymous with each other.
Another way of casting it is like this. The goal may be:
1. Eliminate the possibility that code can enter an invalid state
2. Control the parameters of the system so that it remains in a safe condition
Those are very different goals.
I agree with you: no matter how good of a job the code (by construction or types or otherwise) does of “making unsafe states unrepresentable”, that in no way makes a real world complex system “safe” by itself.
Code can be structured so that valves may only be open OR closed, but nothing stops the real world from returning a sensor reading that says “the valve is <undefined>”. To remain a “safe” system, the system must deal with inconsistent states like “heisen-valves”.
But this isn't a falsifiable claim. We cannot possibly know if this is true or not.
- Not all of banking and telecom use functional programming or even static typing.
- Functional programming often leads to write-only incomprehensible code; the exact opposite of what you need to have a reliable system.
- There's no hard evidence that static typing improves reliability. Only vibes and feels.
I'm curious how you came to that conclusion?
https://pleiad.cl/papers/2012/kleinschmagerAl-icpc2012.pdf
https://www.deepdyve.com/lp/springer-journals/an-empirical-s...
So, we can safely disregard these papers. They got exactly the result that they sought out to get, and the papers were published because they confirmed the preexisting groupthink.
Medicine that involves testing human subject response to treatments is very different from the papers you’re citing and does involve falsifiable theses (usually, definitely not always).
My point about human subjects is in the context of the linked studies.
I’m not super interested in further debating human subjects in science generally
The first paper shown above was presented at ICPC, which has a history of bias in reviewing, and typically only assigns one or two reviewers in any case.
Which makes it not surprising that the paper itself doesn't really prove anything, except that the authors themselves are good at creating dissimilar situations.
Subjects had to modify existing systems, which were provided by the experimenters.
The experimenters deliberately removed any semblance of anything that might hint at types (comments, variable names, etc.) and did who the fuck knows what else to the dynamically typed code to make it difficult to work with.
They also provided their own IDE and full environment which the participants had to use.
Now, of course, we've all seen the graphs showing that for simple problems dynamic is better, and that there's a crossover point. Cooley's experiment is on the far left of that graph, so it certainly doesn't prove that strict static typing isn't better for large programs, but it's at least honest in its approach and results, using self-selected working practitioners (and there was never any shortage of working practitioners swearing by how much better VHDL is).
https://danluu.com/empirical-pl/
In the spectrum of runtime errors, statically typed languages mathematically and logically HAVE fewer errors. That by itself is the definition of more reliable. This isn't even a scientific thing related to falsifiability. It comes from pure mathematical logic.
It is definitely not vibes and feels. Not all of banking uses statically typed languages, but those that don't are, as a result, living with a less reliable system than the alternative, and that is a logical invariant.
This is so completely untrue that I'm confused as to why anyone would try to claim it. Type Confusion is an entire class of error and CVE that happens in statically typed languages. Java type shenanigans are endless if you want some fun but baseline you can cast to arbitrary types at runtime.
What I am saying is not untrue. It is definitive. Java just has a broken type system and it has warped your view. The article is more talking about type systems from functional programming languages where type errors are literally impossible.
You should check out elm. It’s one of the few languages (that is not a toy language and is deployed to production) where the type system is so strong that run time errors are impossible. You cannot crash an elm program because the type system doesn’t allow it. If you used that or Haskell for a while in a non trivial way it will give you deeper insight into why types matter.
> Ruby is so safe in its execution model that Syntax Errors don't invalidate the running program's soundness.
This isn’t safety. Safety is when the program doesn’t even run or compile with a syntax error. Imagine if programs with syntax errors still tried their best effort to run… now you have a program with unknown behavior because who knows what that program did with the syntax error? Did it ignore it? Did it try to correct it? Now imagine that ruby program controlling a plane. That’s not safe.
For this reason, Java is a bad example of a typed language. It gives static typing a bad rep because of its inflexible yet unreliable type system (only basic type inference, no ADTs, many things like presence of equality not checked at compile time, etc.). Something like OCaml or F# has a much more sound and capable type system.
There is nothing inevitable about the consequence you’re imagining because statically typed languages also reject correct programs.
How does a statically typed language rejecting a correct program affect reliability? The two concepts are orthogonal. You’re talking about flexibility of a language but the topic is on reliability.
Let me be clear: as long as a language is Turing complete, you can get it to accomplish virtually any task. In a statically typed language you have fewer ways to accomplish the same task than in a dynamically typed language, but both can accomplish virtually any task. By that logic, a dynamically typed language is categorically more flexible than a static one, but it is also categorically less reliable.
Because in some cases it will reject code that is simple and obviously correct, which will then need to be replaced by code that is less simple and less obviously correct (but which satisfies the type checker).
It is a strict upgrade in reliability. You're arguing for other benefits here, like readability and simplicity. The metric on topic is reliability and NOT other things like simplicity, expressiveness or readability. Additionally, like you said, it doesn't happen "most" of the time, so even IF we included those metrics in the topic of conversation your argument is not practical.
>You are paying for the extra guarantees on the code you can write by giving up lots of correct programs that you could otherwise have written.
Again the payment is orthogonal to the benefit. The benefit is reliability. The payment is simplicity, flexibility, expressiveness, and readability. For me, personally, (and you as you've seem to indicate) programs actually become more readable and more simple when you add types. Expressiveness and flexibility is actually a foot gun, but that's not an argument I'm making and you're free to feel differently.
Anyway, these "payments" are just more or less closer to opinion and "feels" and "vibes". My argument is that in the totality of possible errors, statically typed programs have provably LESS errors and thus are definitionally MORE reliable than untyped programs. I am saying that there is ZERO argument here, and that it is mathematical fact. No amount of side stepping out of the bounds of the metric "reliability" will change that.
>My argument is that in the totality of possible errors, statically typed programs have provably LESS errors and thus are definitionally MORE reliable than untyped programs. I am saying that there is ZERO argument here, and that it is mathematical fact. No amount of side stepping out of the bounds of the metric "reliability" will change that.
Making such broad statements about the real world with 100% confidence should already raise some eyebrows. Even through the lens of math and logic, it is unclear how to interpret your argument. Are you claiming that the sum of all possible errors in all runnable programs in a statically checked language is less than the sum of all possible errors in all runnable programs in an equivalent dynamically checked language? Both of those numbers are infinite; I remember from school that some infinities are greater than others, but I'm not sure how to prove that here. And if such a statement were true, how would it affect programs written in the real world?
Or is your claim that a randomly picked program from the set of all runnable statically checked programs is expected to have fewer errors than a randomly picked program from the set of all runnable dynamically checked programs? Even this statement doesn't seem trivial, due to correct programs being rejected by the type checker.
If your claim is about real-world programs being written, you also have to consider that their distribution over the set of all runnable programs is not random. Time, attention span, and other resources are often limited. Consider the act of twisting an already correct program in various ways to satisfy the type checker, and the time lost that could have been invested in further verifying the logic. The result will be much less clear-cut: more probabilistic, more situation-dependent, and so on.
If you take the same language and change only one thing by adding sound static type checking, the resulting system is strictly more reliable in the sense that it can fail in fewer ways at runtime. That statement is not empirical, probabilistic, or sociological. It follows directly from the definitions.
This is not a claim that statically typed programs are more correct, smarter, or more likely to produce the right answers. Logical errors exist equally in both systems. Static typing does not prevent bad algorithms. That entire line of argument is orthogonal and does not touch the claim being made.
The comparison is deliberately narrow and controlled. Hold everything constant: runtime, semantics, memory model, expressiveness, libraries. Change only one thing: add static type checking. Once you do this, all other dimensions disappear by construction. There is literally nothing left to compare except runtime failure behavior.
Under that transformation, something very simple happens. Any program that executes successfully in the statically typed variant also executes successfully in the dynamic one. But there exist programs that execute in the dynamic variant and fail at runtime due to type errors that cannot be executed at all in the statically typed variant. Those failures are removed from the set of possible outcomes. That is exactly what static typing means.
This is a set inclusion argument, not a statistical one. The set of possible runtime failure states in the statically typed system is a strict subset of the failure states in the dynamically typed system. No infinities need to be compared. No random sampling over “all programs” is required. No assumptions about developer skill, time pressure, or real-world workflows are involved. It is a property of the language definition itself.
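The set-inclusion point can be made concrete with a small sketch (Python, with mypy standing in for the sound checker; `total` is an illustrative name):

```python
# Dynamically, the bad call below runs and then fails mid-execution with
# a TypeError. With the annotation checked by a sound checker, the same
# call site is rejected before the program ever runs, removing that
# failure state from the set of possible runtime outcomes.
def total(prices: list[int]) -> int:
    return sum(prices)

try:
    total([1, 2, "3"])  # checker flags this line; Python executes it and fails
except TypeError:
    print("runtime type error")
```

Every call that type-checks also runs fine dynamically, but not vice versa: the typed variant's runtime failure states are a strict subset.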
Redefining “reliability” does not escape this conclusion. If reliability is expanded to include things like readability, velocity, team discipline, or development process, those factors are already equalized away by holding the language constant. They cannot distinguish the two systems because they are identical by assumption. Once everything except typing is fixed, the only remaining difference is how many ways the program can fail at runtime. At that point, reliability reduces to failure count whether one likes that framing or not, because there is literally nothing else left to compare.
So the claim that “there’s no hard evidence” or that this is “just vibes” is simply misplaced. No empirical evidence is needed because no empirical claim is being made. This is a deductive statement about failure modes. Between two otherwise identical systems, the one that admits fewer runtime failure states is more reliable. That conclusion is not controversial, and it is not optional. It follows directly from the setup.
Even if the two languages are identical except for the static types, it is clearly possible to write programs that have no runtime type errors in the dynamic language (I'll leave the proof as an exercise for the reader, but it is very clearly true), so there exist programs in any dynamic language that are equally reliable[1] to their static counterparts.
[1] I also disagree with your definition of reliability but I'm granting it for the sake of discussion.
The claim was about reliability and lack of empirical evidence. Once framed that way, definitions matter. My argument is purely ceteris paribus: take a language, hold everything constant, and add strict static type checking. Once you do that, every other comparison disappears by definition. Same runtime, same semantics, same memory model, same expressiveness. The only remaining difference is the runtime error set.
Static typing rejects at compile time a strict subset of programs that would otherwise run and fail with runtime type errors. That is not an empirical claim; it follows directly from the definition of static typing. This is not hypothetical either. TypeScript vs JavaScript, or Python vs Python with a sound type checker, are real examples of exactly this transformation. The error profile is identical except the typed variant admits fewer runtime failures.
Pointing out that some dynamic programs have no runtime type errors does not contradict this. It only shows that individual programs can be equally reliable. The asymmetry is at the language level: it is impossible to deploy a program with runtime type errors in a sound statically typed language, while it is always possible in a dynamically typed one. That strictly reduces the space of possible runtime failures.
Redefining “reliability” does not change the result. Suppose reliability is expanded to include readability, maintainability, developer skill, team discipline, or development velocity. Those may matter in general, but they are not variables in this comparison. By construction, everything except typing is held constant. There is literally nothing else left to compare. All non-type-related factors are identical by assumption. What remains is exactly one difference: the presence or absence of runtime type errors. At that point, reliability reduces to failure count not as a philosophical choice, but because there is no other dimension remaining.
Between two otherwise identical systems, the one that can fail in fewer ways at runtime is more reliable. That conclusion is not empirical, sociological, or debatable. It follows directly from the setup.
In what language?
… I mean, if we’re just making global assertions :)
Gimme the "write only", compiler-verified, exhaustively pattern-matched, non-mutable clump in a language I can mould freely any day of the week. I will provide the necessary semantic layer for progress, backed by compiler guarantees. That same poop pile in a lesser language may be unrecoverable.
> But this isn't a falsifiable claim.
Saying "this isn't falsifiable" is a wild claim. The claim "functional programming and static typing make things more reliable" is indeed falsifiable, as long as you admit a statistical understanding. The world is messy and experiments have noise, so what would you use if not statistics? Anecdotes? No. Purely deductive methods? No; we should not expect any single technique to be a silver bullet.
Good studies and analyses lay out a causal model and use strong methodologies to show that some factor has a meaningful impact on some metric of interest. I recommend this as a starting point [1].
[1]: https://yanirseroussi.com/2016/05/15/diving-deeper-into-caus...
Any insights as to how to get effective determinism without pure functions?
Pure functions win here because we only have to reason about the function's arguments. Without pure functions the state space explodes, so how would you identify equivalent states in practice?
Conclusion (can you argue against it?): by dramatically reducing the state space, pure functions are an obvious and probably essential component of reasoning about a program.
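A minimal sketch of that state-space point (Python; `area`, `area_scaled`, and `_scale` are illustrative names):

```python
# Pure: the result is a function of the arguments alone, so reasoning
# about it means reasoning about the argument space only.
def area(w: float, h: float) -> float:
    return w * h

# Impure: the result also depends on hidden mutable state, so the same
# call can yield different answers depending on program history.
_scale = 1.0

def area_scaled(w: float, h: float) -> float:
    return w * h * _scale

assert area(2, 3) == area(2, 3)  # always holds: same inputs, same output

first = area_scaled(2, 3)   # 6.0
_scale = 2.0                # someone mutates the hidden state
second = area_scaled(2, 3)  # 12.0 — same arguments, different result
print(first, second)
```

To reason about `area_scaled` you must track every write to `_scale` anywhere in the program; to reason about `area` you need only its two arguments.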
At least with many GUIs, the time it would take to know, document and/or test all combinations of UI behaviors quickly exceeds the time available to humans. It's at least tractable to start with vastly more states than could ever be measured, and then gate or special-case certain ones to try to make things minimally safe/sane/reliable for the user.
Pure functions/immutability help a lot because tests become representative and cheap. I'd only push back on "tests of all equivalent scenarios" being sufficient: the space explodes, and many real failures live at I/O, concurrency, and distributed-system boundaries. My intended claim is that FP/ADTs/types reduce the state space and improve the ROI of tests, not that they replace them.
But your take on modern Java is correct, and teams do adopt this style in internal projects for some workflows.