HTTP Caching, a Refresher
Key topics
The debate around HTTP caching's relevance in today's HTTPS-dominated landscape is heating up, with some arguing that widespread encryption has rendered caching obsolete. However, others counter that caching remains crucial, particularly in CDNs and internal browser caching, where it continues to improve performance and efficiency. As one commenter pointed out, enterprises still intercept HTTPS for inspection and logging, and may also cache responses at the point of interception, highlighting the ongoing importance of caching in certain contexts. The discussion reveals a nuanced consensus: while the landscape has changed, the fundamentals of HTTP caching remain relevant.
Snapshot generated from the HN discussion
Discussion Activity (based on 33 loaded comments)
- Status: active discussion
- First comment: 2h after posting
- Peak period: 13 comments in the 0-12h window
- Average per period: 8.3 comments
Key moments
- Story posted: Dec 23, 2025 at 2:41 PM EST (16 days ago)
- First comment: Dec 23, 2025 at 5:11 PM EST (2h after posting)
- Peak activity: 13 comments in the 0-12h window, the hottest stretch of the conversation
- Latest activity: Dec 28, 2025 at 4:38 AM EST (11 days ago)
I took a Django app that sits behind an Apache server, added Cache-Control and Vary headers using Django view decorators, and added Header directives for some static files that Apache was serving (a sketch of the decorator side follows the list below). This had two effects:
* I could add mod_cache to the Apache server and have common pages cached and served directly from Apache instead of going back to Django. Load testing with vegeta ( https://github.com/tsenart/vegeta ) shows the server can now handle several times more simultaneous traffic than before.
* Users' browsers now cache all the CSS/JS. As users move between HTML pages, the browser often makes only one request. Good for snappier page loads with less server load.
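A minimal sketch of the decorator side, assuming plain function-based Django views; the view names, max-age values, and header choices here are illustrative rather than the exact setup described above:

```python
# Hypothetical Django views showing the cache-control / vary decorator approach.
from django.http import HttpResponse
from django.views.decorators.cache import cache_control
from django.views.decorators.vary import vary_on_headers

@cache_control(public=True, max_age=300)      # cacheable by Apache's mod_cache for 5 minutes
@vary_on_headers("Accept-Encoding")           # keep compressed and uncompressed copies separate
def landing_page(request):
    return HttpResponse("<html>...</html>")

@cache_control(private=True, no_cache=True)   # per-user pages stay out of the shared cache
def dashboard(request):
    return HttpResponse("<html>...</html>")
```

mod_cache honours the resulting Cache-Control headers, which is what lets Apache answer repeat requests for the public pages without touching Django.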
But yeah, updating especially the sections on public vs private caches with regards to HTTPS would be good.
This website is chock-full of site operators raging mad at web crawlers created by people who didn't bother to implement proper caching mechanisms.
A Node server could negotiate HTTPS close to the user, do the caching, and open another HTTPS connection to your local server (or reuse an existing one).
HTTPS everywhere, with your CDN in the middle.
I previously experimented a bit with Squid Cache on my home network for web archival purposes, and set it up to intercept HTTPS. I then added the TLS certificate to the trust store on my client, and was able to intercept and cache HTTPS responses.
In the end, Squid Cache was a bit inflexible when it came to making sure the browsed data would be stored forever, which was my goal.
This Christmas I have been playing with using mitmproxy instead. I previously used mitmproxy for some debugging, and found out now that I might be able to use it for archival by adding a custom extension written in Python.
It’s working well so far. I browse HTTPS pages in Firefox and I persist URLs and timestamps in SQLite and write out request and response headers plus response body to disk.
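A rough sketch of what such an addon can look like, using mitmproxy's standard addon hooks; the archive directory, database path, and table schema are made up for illustration and are not the author's actual setup:

```python
# archive_addon.py - hypothetical mitmproxy addon that records responses for archival.
import hashlib
import sqlite3
import time
from pathlib import Path

from mitmproxy import http

ARCHIVE_DIR = Path("archive")   # illustrative location for response bodies
DB_PATH = "archive.db"          # illustrative SQLite database


class Archiver:
    def __init__(self):
        ARCHIVE_DIR.mkdir(exist_ok=True)
        self.db = sqlite3.connect(DB_PATH)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            " url TEXT, fetched_at REAL, status INTEGER,"
            " request_headers TEXT, response_headers TEXT, body_path TEXT)"
        )

    def response(self, flow: http.HTTPFlow) -> None:
        # Called by mitmproxy once a full response is available.
        body = flow.response.raw_content or b""
        body_path = ARCHIVE_DIR / hashlib.sha256(body).hexdigest()
        body_path.write_bytes(body)
        self.db.execute(
            "INSERT INTO responses VALUES (?, ?, ?, ?, ?, ?)",
            (
                flow.request.pretty_url,
                time.time(),
                flow.response.status_code,
                str(dict(flow.request.headers)),
                str(dict(flow.response.headers)),
                str(body_path),
            ),
        )
        self.db.commit()


addons = [Archiver()]
```

Run it with `mitmdump -s archive_addon.py` and point the browser at the proxy, with mitmproxy's CA certificate added to the trust store as described above.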
My main focus at the moment is archiving some video courses I paid for in the past, so that even if the site I bought them from ceases operation, I will still have them. After I finish with the video courses, I will move on to archiving the other digital things I've bought: VST plugins, sample packs, 3D assets, and so on.
And after that, I'll take another shot at archiving all the random pages on the open web that I've bookmarked.
For me, archiving things by using an intercepting proxy is the best way. I have various manually organised copies of files from all over the place, both paid stuff and openly accessible things. But having a sort of Internet Archive of my own with all of the associated pages where I bought things and all the JS and CSS and images surrounding things is the dream. And at the moment it seems to be working pretty well with this mitmproxy + custom Python extension setup.
I am also aware of various existing web scrapers and self-hosted internet archival systems and have tried a few of them. But for me, the system I'm building is ideal.
An optimal solution would involve the response listing which alternate content-types can be returned for that endpoint, and the cache considering the Accept header: if it sees a type from the alternates list ranked higher in the Accept header than whatever it has in cache, it forwards the request to the server. Once it has all the alternatives in cache, it can serve them according to Accept without hitting the server.
The closest existing header for this would be the Link header, with rel=alternate and the MIME type in the type parameter. It's not clear what the href would be, since it usually points to a different document, whereas here we want the same URL with a different MIME type. So this would clearly be an abuse of the header, but it could work.
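For reference, a repurposed Link header along those lines might look like the line below; the /report path is just a placeholder, and, as noted, having the href point back at (effectively) the same resource is the awkward part:

```
Link: </report>; rel="alternate"; type="application/json"
```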
And an optimal solution IMHO would be for the origin server to simply return a 302 to a specific resource, selected based on the value of the Accept header:
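A minimal sketch of that idea, assuming a Django-style view; the paths and the naive substring check on Accept are illustrative (a real implementation would parse and rank q-values):

```python
# Hypothetical origin view: 302 to a format-specific URL chosen from the Accept header.
from django.http import HttpResponseRedirect

def report(request):
    accept = request.headers.get("Accept", "")
    if "application/json" in accept:                  # naive check; real code would rank media types
        return HttpResponseRedirect("/report.json")   # HttpResponseRedirect issues a 302
    return HttpResponseRedirect("/report.html")
```

Each format-specific URL then has its own cache entry and its own Cache-Control, so no Vary gymnastics are needed on the cached responses themselves.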
I had thought about recommending that people just use an alternate link as intended, to point to an alternate format. I think that would work best using existing web standards as intended, but it has the downside of initially serving the original format regardless of the content type.
Why? It has no "Vary" header, and it's the one that's supposed to get cached anyhow.
> the cache MUST NOT use that stored response without revalidation unless all the presented request header fields nominated by that Vary field value match those fields in the original request
As usual, MUST NOT is more a suggestion than a rule that popular systems actually follow.
No need to be mean and assume the worst possible purpose :)
Hell, the author could probably have called it a primer and I think it'd have been fair.
I mean, do those <meta> tags really suggest someone who’s into SEO? Call me stale but what I really want is validation :-)
Wanted to highlight MDN's HTTP caching guide[0] that OP links in the conclusion. It's written at a higher level than the underlying reference material and has been a great resource I've turned to several times in the last few years.
[0]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Cac...
I think setting FileETag None solved it. With that setup, the browser won't use stale JS/CSS/whatever bundles; instead it always revalidates them against the server, and when it already has the correct asset from an earlier download, it gets a 304 and avoids re-downloading a lot of stuff. Pretty simple, and it works well for low-traffic setups.
It was surprisingly easy to mess up and end up with out-of-date versions of your translation bundles cached in the browser.
(nothing against other web servers, Apache2 was just a good fit for other reasons)
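A sketch of what that kind of Apache configuration might look like; the FilesMatch pattern and the explicit Cache-Control header are my assumptions, not necessarily the exact directives used:

```apache
# Drop ETags so conditional requests fall back to Last-Modified.
FileETag None

<FilesMatch "\.(js|css)$">
    # Force the browser to revalidate on every request; unchanged files come back as 304.
    Header set Cache-Control "no-cache"
</FilesMatch>
```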