Formal Methods Only Solve Half My Problems
Key topics
A recent discussion dug into the limitations of formal methods, sparking debate about the effectiveness of tools like TLA+ in proving system correctness. Commenters pointed out that while these tools excel at verifying correctness, they fall short when it comes to assessing performance and latency. Some contributors offered alternative approaches, such as worst-case execution time (WCET) analysis and tools like Coq and Dafny, which support program extraction and verification. Others raised concerns about the gap between formal models and actual implementations, questioning how much assurance formal methods provide in the first place. As the conversation unfolded, it became clear that the quest for a more comprehensive solution remains an open and intriguing challenge.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 4d after posting
- Peak period: 25 comments in the 96-108h window
- Avg / period: 9.7 comments
- Based on 29 loaded comments
Key moments
- 01 Story posted: Jan 3, 2026 at 1:49 AM EST (7 days ago)
- 02 First comment: Jan 7, 2026 at 6:17 AM EST (4d after posting)
- 03 Peak activity: 25 comments in 96-108h (the hottest window of the conversation)
- 04 Latest activity: Jan 8, 2026 at 9:18 PM EST (1d ago)
It's like the quote attributed to Don Knuth: "Beware of bugs in the above code; I have only proved it correct, not tried it."
I only believe in formal methods where we always have a machine-validated way from model to implementation.
But one tool (like TLA+) can’t realistically support all formalisms for all types of analyses ¯\_(ツ)_/¯
> Distributed systems are notoriously hard to get right (i.e., guaranteeing correctness) as the programmer needs to reason about numerous control paths resulting from the myriad interleaving of events (or messages or failures). Unsurprisingly, programmers can easily introduce subtle errors when designing these systems. Moreover, it is extremely difficult to test distributed systems, as most control paths remain untested, and serious bugs lie dormant for months or even years after deployment.
> The P programming framework takes several steps towards addressing these challenges by providing a unified framework for modeling, specifying, implementing, testing, and verifying complex distributed systems.
It was last posted on HN about 2 years ago [1].
[0]: https://p-org.github.io/P/whatisP/
[1]: https://news.ycombinator.com/item?id=34273979
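The interleaving explosion that quote alludes to is easy to make concrete. A small illustrative sketch (mine, not part of P): even two short event sequences admit hundreds of valid orderings, and the count grows combinatorially with process count and length, which is why exhaustive testing is hopeless and tools like P instead explore the state space systematically.

```python
from math import comb

def interleavings(a, b):
    """Yield every interleaving of two event sequences that preserves each sequence's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Two 5-event processes already admit C(10, 5) = 252 distinct orderings;
# three 10-event processes admit 30!/(10!)^3, which is over 5 * 10^12.
print(comb(10, 5))                                       # 252
print(sum(1 for _ in interleavings("abcde", "vwxyz")))   # 252
```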
In large distributed systems, the best we can hope for is statistically acceptable behavior. You can always tailor a workload that will break a guarantee in the real world.
So you engineer with techniques that make it likely that the workloads you have characterized as realistic are handled with headroom, and you worry about graceful degradation under oversubscription (i.e. maintaining goodput). In my experience, that usually comes down to good load-balancing, auto-scaling and load-shedding.
Virtually all of the truly bad incidents I've seen in large-scale distributed systems are caused by an inability to recover back to steady-state after some kind of unexpected perturbation.
If I had to characterize problem number one, it's bad subscriber-service request patterns: subscribers that don't know how to back off properly, and services that don't provide back-pressure. The classic example is a subscriber that retries requests on a static schedule and gives up on requests that have been in-flight "too long", coupled with a service that continues to accept requests when oversubscribed.
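To make the client half of that anti-pattern concrete, here is a minimal sketch (my illustration, not code from the thread): retries use capped exponential backoff with full jitter and an overall deadline instead of a static schedule, so an oversubscribed service sees load fall off rather than a synchronized retry storm. The `send` callable and the parameter names are hypothetical.

```python
import random
import time

def call_with_backoff(request, send, deadline_s=2.0,
                      base_delay_s=0.05, max_delay_s=1.0):
    """Retry `send(request)` with capped exponential backoff and full jitter.

    `send` is a hypothetical transport function that raises on failure.
    The overall deadline bounds how long the caller keeps retrying,
    replacing the static retry schedule described above.
    """
    start = time.monotonic()
    attempt = 0
    while True:
        try:
            return send(request)
        except Exception:
            attempt += 1
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt))
            if time.monotonic() - start + delay > deadline_s:
                # Give up explicitly instead of hammering an oversubscribed service.
                raise TimeoutError("retry budget exhausted")
            time.sleep(delay)
```

The service side still has to reject work explicitly when oversubscribed; a bounded-queue sketch of that appears below.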
I personally couldn't care less about proving that an endpoint always responds in less than 100ms, say, but I care very much about understanding where the various saturation points in my systems are, what values I should set for limits like database connections, or what the effect of sporadic timeouts is, etc. I think that's more the point of this post (which you can see him talk about in other posts on his blog).
I think it often boils down to "know when you're going to start queuing, and how you will design the system to bound those queues". If you're not using that principle at the design stage, then I think you're already cooked.
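As a minimal sketch of that principle (my illustration, assuming a simple threaded service rather than anything from the post): the queue capacity is fixed at design time, and anything beyond it is shed at admission with an explicit rejection instead of being allowed to queue without bound.

```python
import queue
import threading

# Bounded admission queue: the capacity is the design-time answer to
# "when do we start queuing, and how large may that queue grow?"
REQUEST_QUEUE: "queue.Queue[str]" = queue.Queue(maxsize=100)

def admit(request: str) -> bool:
    """Accept a request only if there is room; otherwise shed load immediately."""
    try:
        REQUEST_QUEUE.put_nowait(request)
        return True
    except queue.Full:
        # Caller gets an explicit 'overloaded' signal rather than a silent timeout.
        return False

def worker(handle) -> None:
    """Drain the bounded queue; `handle` is a hypothetical request handler."""
    while True:
        request = REQUEST_QUEUE.get()
        handle(request)
        REQUEST_QUEUE.task_done()

# Hypothetical usage: one worker thread draining the queue.
threading.Thread(target=worker, args=(print,), daemon=True).start()
```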
I think simulation is definitely a promising direction.
I don't disagree with you that it's a real-time problem; I do, however, think that "just" is doing a lot of work there.
Agreed though, "just" is hiding quite a deep rabbit hole.
But, in practice, I've spent just as much time on issues introduced by perf/scalability limitations. And the post's thesis is correct: we don't have great tools for reasoning about this. This has been pretty much all I've been thinking about recently.
Tools like Lean and Rocq can do arbitrary math — the limit is your time and budget, not the tool.
These performance questions can be mathematically defined, so it is possible.
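As a toy illustration of that point (a sketch under my own assumptions, not something from the thread or from seL4): a cost model can be defined alongside the code it measures and the bound checked mechanically. The `cost` function below is a hypothetical step count for walking a list, written in Lean 4.

```lean
-- A hypothetical cost model: walking a list costs one step per element, plus one.
def cost : List Nat → Nat
  | []      => 1
  | _ :: xs => 1 + cost xs

-- A machine-checked performance bound: cost is linear in the input length.
theorem cost_le_length (xs : List Nat) : cost xs ≤ xs.length + 1 := by
  induction xs with
  | nil => simp [cost]
  | cons x xs ih =>
    simp only [cost, List.length_cons]
    omega
```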
And the seL4 kernel has latency guarantees based on similar proofs (at considerable cost).
A hammer is great for certain things but I don't expect it to make good coffee. I use other tools for that. However that doesn't make hammers deficient.
https://arxiv.org/abs/2205.15211
Already we have Resource Aware ML which
> automatically and statically computes resource-use bounds for OCaml programs
https://www.raml.co/about/