How to Make Things Slower So They Go Faster
Posted 4 months ago · Active 4 months ago
gojiberries.io · Tech · story
Key topics
Performance Optimization
Queueing Theory
System Design
The article discusses how introducing delays or slowing down certain processes can ultimately lead to faster overall performance, and the discussion revolves around various examples and applications of this concept.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion · First comment: 1d after posting · Peak period: 27 comments (Day 2) · Avg per period: 7.8
Based on 39 loaded comments
Key moments
- Story posted: Aug 24, 2025 at 1:10 AM EDT (4 months ago)
- First comment: Aug 25, 2025 at 11:19 AM EDT (1d after posting)
- Peak activity: 27 comments in Day 2 (hottest window of the conversation)
- Latest activity: Sep 1, 2025 at 2:08 PM EDT (4 months ago)
ID: 45001556 · Type: story · Last synced: 11/20/2025, 6:27:41 PM
For example, the paragraphs around the paragraph with "compute the exact Poisson tail (or use a Chernoff bound)" and that paragraph itself could be better illustrated with lines of math instead of mostly language.
I think you do need some math if you want to approach this probabilistically, but I agree that might not be the most accessible approach; a hard threshold calculation is easier to follow and maybe just as good.
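For the curious, the lines of math would look something like this. If the synchronized spike is modeled as Poisson arrivals with mean lambda over the window and the headroom in that window is c requests (my notation, not necessarily the article's), the exact tail and the usual Chernoff bound are

  P(N \ge c) = \sum_{k \ge c} e^{-\lambda} \frac{\lambda^{k}}{k!}
  \qquad\text{and}\qquad
  P(N \ge c) \le e^{-\lambda} \left( \frac{e\lambda}{c} \right)^{c} \quad (c > \lambda)

The sum is the "compute the exact Poisson tail" part; the inequality is the standard Chernoff bound for the Poisson upper tail.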
Particularly because distributed computer systems aren't pure math problems to be solved. Load often comes from usage, which is closer to random inputs than to predictable variables. Further, how load is processed depends on a bunch of things, from the OS scheduler to the current load on the network.
It can be hard to intuitively understand that a bottlenecked system processes the same load more slowly than an unconstrained one.
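One way to make it concrete, assuming the simplest M/M/1 queue model (a textbook simplification, not necessarily what the article uses): with service rate \mu and arrival rate \lambda, the mean time a request spends in the system is

  W = \frac{1}{\mu - \lambda}

At 50% utilization that's 2/\mu, at 90% it's 10/\mu, at 99% it's 100/\mu. Same work, served dramatically more slowly as the bottleneck tightens.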
I feel that I'm missing something obvious. Isn't this doc reinventing the wheel in terms of what very basic task queue systems do? It describes task queues and task prioritization, and how it supports tasks that cache user data. What am I missing?
I just call nanosleep(2) based upon the amount of data processed. This is set by a parameter file that contains the sleep time and the amount of data that determines when to sleep.
In programs I know will execute for a very long time, the parameters are re-read and adjusted during the run if the parameter file changes. Plus I catch cancel signals to create a restart file should the program be cancelled.
Why? On a laptop I have a program that reads an 8 billion record text file, matches it against a 1 billion record text file, and does some calculations based upon the data found between the two records.
So slowing it down prevents my laptop from overheating; it just runs quietly via a cron job.
I need something 100% portable between systems, what I do meets that requirement :)
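A minimal sketch of that kind of throttle, assuming the parameter file has already been read into two values (the names and numbers here are placeholders, not the actual program's):

  #define _POSIX_C_SOURCE 199309L
  #include <time.h>

  /* Hypothetical throttle parameters; in the setup described above they
     come from a parameter file and can be re-read during a long run. */
  static long sleep_ns = 50L * 1000 * 1000;   /* pause length: 50 ms            */
  static long records_per_sleep = 100000;     /* pause after this many records  */

  static void maybe_throttle(long records_done)
  {
      if (records_done % records_per_sleep == 0) {
          struct timespec ts = { sleep_ns / 1000000000L,
                                 sleep_ns % 1000000000L };
          nanosleep(&ts, NULL);   /* give the CPU (and the fans) a break */
      }
  }

  int main(void)
  {
      const long total_records = 1000000;   /* placeholder; the real job is billions */
      for (long i = 1; i <= total_records; i++) {
          /* ... read and match one record here ... */
          maybe_throttle(i);
      }
      return 0;
  }

The parameter re-reading and the restart file are omitted; the point is just that the sleep is proportional to the work done, which keeps the thermal load flat.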
I've mostly heard it in the context of building and construction videos where they are approaching a new skill or technique and have to remind themselves to slow down.
Going slowly and being careful leads to fewer mistakes, which will be a "smoother" process and ends up taking less time, whereas going too fast and making mistakes means work has to be redone and ultimately takes longer.
On rereading it, I see some parallels: When one is trying to go too fast, and is possibly becoming impatient with their progress, their mental queue fills up and processing suffers. If one accepts a slower pace, one's natural single-tasking capability will work better, and they will make better progress as a result.
And maybe it's just my selection bias working hard to confirm that he actually is saying what I want him to say!
Common to hear this in auto racing and probably a lot of other fields
There is a saying: “You don’t raise your level when performing. You fall to your level of practice.”
To the author of the article: I stopped reading after the first two sentences. I have no idea what you are talking about.
Imagine everyone in a particular timezone browsing Amazon as they sit down for their 9 to 5; or an outage occurring, and a number of automated systems (re)trying requests just as the service comes back up. These clients are all "acting almost together".
"In a service with capacity mu requests per second and background load lambda_0, the usable headroom is H = mu - lambda_0 > 0"
Subtract the typical, baseline load (lambda_0) from the max capacity (mu), and that gives you how much headroom (H) you have.
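With made-up numbers: if the service can handle 1000 requests per second and the steady background load is 800 requests per second, then

  H = \mu - \lambda_0 = 1000 - 800 = 200 \ \text{requests/second}

and a synchronized spike of much more than 200 extra requests in the same second is what tips the service over.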
The signal processing definition of headroom: the "space" between the normal operating level of a signal and the point at which the system can no longer handle it without distortion or clipping.
So headroom here can be thought of as "wiggle room", if that is a more intuitive term to you.
Or, if possible, make latency a feature (embrace the queue!). For service-to-service internal stuff, e.g. a request to hard delete something, this can always go through a queue.
And obviously you can scale up as the queue backs up.
I do love the maths tho!
https://en.wikipedia.org/wiki/Braess%27_paradox
https://en.wikipedia.org/wiki/Jevons_paradox
I guess it's the same underlying principle for both paradoxii.
In fact, increasing capacity can make the problems worse due to the new capacity being thought of as available by many people at the same time.
Also, the plural should be quantified when possible: one paradox, two tridox, three quatrodox...
A few fun videos covering this. I first saw Steve Mould's. He links to Up and Atom. Both are fun.
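For anyone who wants the numbers behind the Braess link, the standard textbook example (it's on the Wikipedia page) goes like this: 4000 drivers go from Start to End; one route is a T/100-minute bridge (T = cars on it) followed by a 45-minute road, the other is a 45-minute road followed by a T/100 bridge.

  \text{before: } \frac{2000}{100} + 45 = 65 \ \text{minutes on either route (drivers split evenly)}
  \text{after a free shortcut between the bridges: } \frac{4000}{100} + 0 + \frac{4000}{100} = 80 \ \text{minutes for everyone}

Everyone individually prefers the shortcut, and everyone ends up 15 minutes worse off.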
In a simple, ideal world, your developers can issue the same number of jobs as you have CPUs available. Until you run into jobs that take more memory than is available. Or that access more disk/network IO than is available.
So you set up temporary storage, or in-memory storage, or stagger the jobs so only a couple of them hit the disks at a time, and then you measure performance in groups of 4 or 8 to see when it falls off, or stand up an external caching server, or whatever else you can come up with to work within your budget and available resources.
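A sketch of the "only a couple of them hit the disks at a time" idea, using a counting semaphore to gate the disk-heavy phase (the limit of 2 and the job bodies are placeholders, not anything from this thread):

  #include <pthread.h>
  #include <semaphore.h>
  #include <stdio.h>

  #define NUM_JOBS      8   /* jobs ready to run               */
  #define MAX_DISK_JOBS 2   /* concurrent disk-heavy jobs allowed */

  static sem_t disk_slots;  /* counting semaphore gating disk access */

  static void *run_job(void *arg)
  {
      long id = (long)arg;

      /* CPU-bound part of the job can run freely here ... */

      sem_wait(&disk_slots);            /* at most MAX_DISK_JOBS get past this */
      printf("job %ld: doing disk I/O\n", id);
      /* ... disk-heavy phase ... */
      sem_post(&disk_slots);

      return NULL;
  }

  int main(void)
  {
      pthread_t jobs[NUM_JOBS];

      sem_init(&disk_slots, 0, MAX_DISK_JOBS);

      for (long i = 0; i < NUM_JOBS; i++)
          pthread_create(&jobs[i], NULL, run_job, (void *)i);
      for (int i = 0; i < NUM_JOBS; i++)
          pthread_join(jobs[i], NULL);

      sem_destroy(&disk_slots);
      return 0;
  }

Picking the limit is the "measure performance in groups of 4 or 8 to see when it falls off" part.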
If you do that, you're likely to have a latency on the order of almost a millisecond; putting the previous tokens in one end would get you the logits for the next at a rate of, let's say, 1000 tokens per second... impressive at current rates.
You could also take that same array and program in several latches along the way to synchronize data at selected points, enabling pipelining. This might produce a slight (10%) increase in latency, so a 10% or so loss in throughput for a single stream. However, it would allow you to have multiple independent streams flowing through the FPGAs. Instead of serving 1 customer at 1000 tokens/second, you might have 10 or more customers, each with their own 900 tokens/second.
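Rough arithmetic behind those numbers, using the figures from the comment above rather than anything measured:

  \text{one unpipelined stream: } 1/(1\ \text{ms}) = 1000 \ \text{tokens/s}
  \text{per pipelined stream: } 1000/1.1 \approx 900 \ \text{tokens/s}
  \text{10 streams in flight: } 10 \times 900 = 9000 \ \text{tokens/s aggregate}

A roughly 10% latency hit per stream buys about a 9x gain in total throughput.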
Parallelism and pipelining are the future of compute.