Docker Systems Status: Full Service Disruption
Posted 3 months ago · Active 2 months ago
dockerstatus.com · Tech story · High profile
heated · negative · Debate · 70/100
Key topics
Docker
AWS Outage
Container Registry
Docker Hub experienced a full service disruption due to an AWS outage, causing widespread issues with builds and deployments, and sparking discussions on the reliability of public registries and potential mitigations.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 3m
Peak period: 118 comments (0-12h)
Avg / period: 33.5
Comment distribution: 134 data points
Based on 134 loaded comments
Key moments
- 01 Story posted: Oct 20, 2025 at 3:31 AM EDT (3 months ago)
- 02 First comment: Oct 20, 2025 at 3:34 AM EDT (3m after posting)
- 03 Peak activity: 118 comments in 0-12h (hottest window of the conversation)
- 04 Latest activity: Oct 27, 2025 at 10:07 AM EDT (2 months ago)
ID: 45640877 · Type: story · Last synced: 11/20/2025, 8:37:21 PM
Isn't everyone using multiple cloud providers nowadays? Why are they affected by a single cloud provider's outage?
True multi-cloud is not only very rare, it's an absolute pain to manage as soon as people start using any vendor-specific functionality.
It's also a pain in circumstances where things have the same name but act differently.
You'd be forgiven for believing that AWS IAM and GCP IAM are the same thing, for example, but in GCP an IAM Role is simply a list of permissions that you can attach to an identity; in AWS, an IAM Role is the identity itself.
Other examples: if you're coming from GCP, you'd be forgiven for assuming networks are global in AWS too, which will be annoying to fix later when you realise they're regional and you need to create peering connections.
Oh and while default firewall rules are stateful on both, if you dive into more advanced network security, the way rules are applied and processed can have subtle differences. The inherent global nature of the GCP VPC means firewall rules, by default, apply across all regions within that VPC, which requires a different mindset than AWS where rules are scoped more tightly to the region/subnet.
There's like, hundreds of these little details.
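To make the IAM contrast above concrete, here's a minimal sketch; the project, role, and account names are hypothetical, and the commands are the standard gcloud/aws CLI forms:

```
# GCP: a custom role is just a bundle of permissions, which you then bind to an identity.
gcloud iam roles create deployBot --project=my-project \
  --permissions=storage.objects.get,storage.objects.list
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:ci@my-project.iam.gserviceaccount.com" \
  --role="projects/my-project/roles/deployBot"

# AWS: the role *is* the identity; a trust policy and permissions get attached to it.
aws iam create-role --role-name deploy-bot \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name deploy-bot \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```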
> There’s like hundreds of these little issues
Exactly. If it were just a handful of things, that would be fine, but it's often as you describe.
Complex systems are hard.
On the other hand, it's pretty embarrassing at this point for something as fundamental as Docker to be in a single region. Most cloud providers make inter-region failover reasonably achievable.
There are multiple AWS services which are "global" in the sense that they are entirely hosted out of us-east-1.
Being multi-cloud does not come for free: time, engineers, knowledge and ultimately money.
Oh yes. All of them, in fact, especially if you count what key vendors host on.
> Why are they affected by single cloud provider outage?
Every workload is only on one cloud. N.B. that doesn't mean every workflow is on only one cloud, which is an important distinction, since the latter would be more stable.
No? I very much doubt anyone is doing that.
Wonder how many builds or redeployments this will break. Personally, nothing against Docker or Docker Hub of course, I find them to be useful.
Depends on the implementation, of course: I'm speaking to 'distribution/distribution', the reference implementation. Harbor or whatever else may behave differently, I have no idea.
Aside: seems Signal is also having issues. Damn.
Edit to add: This might spur on a few more to start doing that, but people are quick to forget/prioritise other areas. If this keeps happening then it will change.
It's not just about reducing your exposure to third parties who you (presumably) don't have a contract with, it's also good mitigation against potential supply chain attacks - especially if you go as far as building the base images from scratch.
https://github.com/actions/runner-images/issues/1445#issueco...
https://github.com/orgs/community/discussions/76636
Just engineering hygiene IMO.
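A minimal sketch of what that hygiene can look like, assuming a hypothetical internal registry (`registry.internal.example.com`):

```
# Hypothetical: build the base image yourself and host it on a registry you control,
# so builds never depend on Docker Hub being up (or untampered-with).
docker build -t registry.internal.example.com/base/debian:stable ./base-image
docker push registry.internal.example.com/base/debian:stable
# Downstream Dockerfiles then use:
#   FROM registry.internal.example.com/base/debian:stable
```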
That doesn't make sense unless you have some oddball setup where k8s is building the images you're running on the fly. There's no such thing as a "base image" for tasks running in k8s; there is just the image itself and its layers, which may come from some other image.
But it's not built by k8s. It's built by whatever is building your images and storing them in your registries. That's where you need your true base image caching.
So not agile!
Thankfully, AWS provides a docker.io mirror for those who can't wait:
In the error logs, the issue was mostly related to the authentication endpoint: https://auth.docker.io → "No server is available to handle this request"
After switching to the AWS mirror, everything built successfully without any issues.
Just had to change the registry prefix in the image references to point at the mirror instead. Hope this helps!
[0]: https://cloud.google.com/artifact-registry/docs/pull-cached-...
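The exact before/after was stripped out of the comment above; as a hedged reconstruction, the switch typically amounts to repointing Docker Hub references at a public mirror of the official images (the ECR Public and gcr.io paths below are my assumption of what was used):

```
# Assumed shape of the switch: repoint Docker Hub references at a public mirror
# of the Docker Official Images.
docker pull redis:7                                  # implicitly docker.io/library/redis:7
docker pull public.ecr.aws/docker/library/redis:7    # same image via AWS's ECR Public mirror
docker pull mirror.gcr.io/library/redis:7            # same image via Google's pull-through cache [0]
```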
If your image is not cached on one of these then you may be SOL.
Also, quay.io - another image hoster, from red hat - has been read-only all day today.
If you're going to have docker/container image dependencies, it's best to establish a solid hosting solution instead of riding whatever bus shows up.
- AWS
- Vercel
- Atlassian
- Cloudflare
- Docker
- Google (see downdetector)
- Microsoft (see downdetector)
What's going on?
For instance: When there's a widespread Verizon cellular outage, sites like downdetector will show a spike in Verizon reports.
But such sites will also show a spike in AT&T and T-Mobile reports. Even though those latter networks are completely unaffected by Verizon's back-end issues, the graphs of user reports are consistently shaped the same for all 3 carriers.
This is just because some of the users doing the reporting have no clue.
So when the observation is "AWS is in outage and people are reporting issues at Google, and Microsoft," then the last two are often just factors of people being people and reporting the wrong thing.
(You're hanging out on HN, so there's a very good chance that you know precisely what cell carrier you're using and can also discern the difference betwixt an Amazon, a Google, and a Microsoft. But lots of other people are not particularly adept at making these distinctions. It's normal and expected for some of them to be this way at all times.)
https://spegel.dev/
It is a huge deal if I can start investigating and deploying such a solution as a techie right away, compared to having to go through all the internal hoops for a software purchase.
Having the landing page explain the motivations of the authors vis-a-vis open source goes a long way to providing the context for whatever licensing is appearing in the source repos, and helps understand what the future steer for the project is likely to be.
There are loads of ostensibly open source projects out there whose real goal is to drive sales of associated software and services, often services without which the value of the open source components is reduced, especially in the developer tooling space.
No, but I also don't see why that matters a lot. Once you've adopted a third-party project as a dependency, you also implicitly sign up for whatever changes they make, or you prepare to stay on a static version with only the security fixes you apply yourself. These aren't exactly new problems, nor rocket science; we've been dealing with this sort of thing for decades already.
> There are loads of ostensibly open source projects out there whose real goal is to drive sales of associated software and services, often services without which the value of the open source components is reduced, especially in the developer tooling space.
Yeah, which is kind of terrible, but also kind of great. In the end, it's fairly easy to detect one way or another, with the biggest and reddest flag being VC funding with no public pricing.
Isn’t a big part of getting a project out there actually letting people know what it is? Especially if you’re trying to give a tool to the open source-valuing community. That’s a high priority for them. That’s like having a vegan menu and not saying you’re a vegan restaurant anywhere public facing.
Agree to disagree. It should be front and center the moment I find your tool IMO.
Kuik: https://github.com/enix/kube-image-keeper?tab=readme-ov-file...
Also, it looks like kuik uses CRDs to store information about where images are cached, while Spegel uses its own p2p solution to route traffic between nodes.
If you are running k3s in your homelab, you can enable Spegel with a flag, as it is an embedded feature (see the sketch after this comment).
P.S. Your blog could do with an RSS feed ;). I will track https://github.com/spegel-org/spegel/releases.atom for now
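On the k3s point above, a sketch of how the embedded Spegel mirror is enabled, to the best of my understanding (verify the flag name and file location against the k3s docs for your version):

```
# Assumed setup for the k3s embedded registry mirror (Spegel):
# 1) list the registries to mirror, 2) start the server with the flag.
cat >/etc/rancher/k3s/registries.yaml <<'EOF'
mirrors:
  docker.io:
EOF
k3s server --embedded-registry
```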
Ex: `docker pull ghcr.io/linuxcontainers/debian-slim:latest`
Google Container Registry provides a pull-through mirror, though: just prefix `mirror.gcr.io` and use `library` as the user for the Docker Official Images. For example, `mirror.gcr.io/library/redis` for https://hub.docker.com/_/redis.
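A sketch of wiring that mirror in at the daemon level rather than per image reference; note that Docker's `registry-mirrors` setting only applies to pulls that would otherwise go to docker.io:

```
# Use mirror.gcr.io as a pull-through cache for the whole daemon.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "registry-mirrors": ["https://mirror.gcr.io"]
}
EOF
sudo systemctl restart docker
docker pull redis   # an official image; served from the mirror when it's cached there
```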
I find that it better surfaces the best discussion when there are multiple threads (like in this example), and it keeps showing slightly older threads for longer when there's still discussion happening.
We know how critical Docker Hub and services are to millions of developers, and we’re sorry for the pain this is causing. Thank you for your patience as we work to resolve this incident. We’ll publish a post-mortem in the next few days once this incident is fully resolved and we have a remediation plan.
Docker got requests to allow you to configure a private registry as the default, but they selfishly denied the ability to do that:
https://stackoverflow.com/questions/33054369/how-to-change-t...
Red Hat created the docker-compatible podman, which lets you close that hole:
/etc/config/docker:
BLOCK_REGISTRY='--block-registry=all'
ADD_REGISTRY='--add-registry=registry.access.redhat.com'
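For comparison, a sketch of the modern podman/containers equivalent (the internal registry name is a placeholder):

```
# Podman (and CRI-O, Buildah) read /etc/containers/registries.conf; drop-ins let you
# decide which registry unqualified image names resolve against.
cat >/etc/containers/registries.conf.d/internal.conf <<'EOF'
unqualified-search-registries = ["registry.internal.example.com"]
EOF
podman pull redis   # now resolves against the internal registry, not docker.io
```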
Even if you could configure a default registry to point at something besides docker.io a lot of people, I'd say the vast majority, wouldn't have bothered. So they'd still be in the same spot.
And it's not hard to just tag images. I don't have a single image pulling from docker.io at work. Takes two seconds to slap <company-repo>/ at the front of the image name.
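A sketch of what that looks like in practice, with `registry.example.com` standing in for the company registry:

```
# Mirror an upstream image into the company registry, then reference only that copy.
docker pull docker.io/library/redis:7
docker tag docker.io/library/redis:7 registry.example.com/mirror/redis:7
docker push registry.example.com/mirror/redis:7
# Deploy manifests then point at registry.example.com/mirror/redis:7.
```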
For example, if you're on a team and you have documentation containing commands, but your docker config is outdated, you can accidentally pull from docker's global public registry.
A welcome change IMO would be removing default global registries entirely, since that makes it easier to tell where your image is coming from (but I severely doubt Docker would ever consider this, since the implicit default makes it fractionally easier to use their services).
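Docker doesn't offer that, but as an approximation of the idea, a sketch using podman's `short-name-mode` setting:

```
# In /etc/containers/registries.conf:
#   short-name-mode = "enforcing"
podman pull redis                      # rejected in non-interactive use: no registry named
podman pull docker.io/library/redis    # explicit, so the source is always unambiguous
```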
> [Monitoring] We are seeing error rates recovering across our SaaS services. We continue to monitor as we process our backlog.