Download Responsibly
blog.geofabrik.de
Key topics: OpenStreetMap, Data Download, Responsible Usage, Rate Limiting
The Geofabrik blog post 'Download responsibly' highlights the issue of excessive downloads from their OpenStreetMap data servers, sparking a discussion on responsible usage and potential solutions like rate limiting.
Snapshot generated from the HN discussion
Key moments
- Story posted: Sep 22, 2025 at 1:33 AM EDT
- First comment: Sep 22, 2025 at 2:21 AM EDT (48m after posting)
- Peak activity: 99 comments in 0-6h
- Latest activity: Sep 24, 2025 at 4:39 AM EDT
ID: 45329414 (story)
However, the practical evidence is to the contrary: AI companies are hammering every web server out there, ignoring conventions like robots.txt and re-downloading everything at pointlessly short intervals, annoying everyone and killing services.
Just a few recent examples from HN: https://news.ycombinator.com/item?id=45260793 https://news.ycombinator.com/item?id=45226206 https://news.ycombinator.com/item?id=45150919 https://news.ycombinator.com/item?id=42549624 https://news.ycombinator.com/item?id=43476337 https://news.ycombinator.com/item?id=35701565
Make people sign up if they want a url they can `curl` and then either block or charge users who download too much.
> Just the other day, one user has managed to download almost 10,000 copies of the italy-latest.osm.pbf file in 24 hours!
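A minimal sketch of the sign-up-plus-quota idea above, in Python: each API key gets its own token bucket, so the limit follows the account rather than an IP address. `TokenBucket`, `DownloadLimiter` and the quota numbers are hypothetical; this is not anything Geofabrik is known to run.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float  # largest burst of downloads allowed at once
    rate: float      # tokens regained per second
    tokens: float    # downloads currently available
    updated: float   # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class DownloadLimiter:
    """One bucket per API key (hypothetical design), created on first use."""

    def __init__(self, per_day: int, burst: int) -> None:
        self.rate = per_day / 86_400  # spread the daily quota over the day
        self.burst = burst
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, api_key: str, now: float) -> bool:
        if api_key not in self.buckets:
            self.buckets[api_key] = TokenBucket(
                self.burst, self.rate, float(self.burst), now
            )
        return self.buckets[api_key].allow(now)
```

Replaying the quoted case (10,000 requests in 24 hours, one every 8.64 s) against `DownloadLimiter(per_day=10, burst=5)` lets at most 15 of them through; the rest would be refused, or billed.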
No wonder many just turn caching entirely off at some point and never turn it back on.
Anybody can build a pipeline to get a task done (there are thousands of quick and shallow how-to blog posts), but doing it efficiently, so that it becomes a flywheel rather than a blocker for teams, is the hard part.
Not just caching but optimising job execution order and downstream dependencies too.
The faster it fails, the faster the developer feedback, and the faster a fix can be introduced.
I quite enjoy the work and am always learning new techniques to squeeze out extra performance or save time.
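For pipelines that re-download the same extract on every run, the cheapest fix is a local cache plus an HTTP conditional request. A sketch, assuming the server returns an `ETag` and honours `If-None-Match` (not verified for any particular download server). The fetcher is injected so the example runs without a network; its signature and the example.com URL are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical fetcher interface: (url, etag or None) -> (status, body, etag).
# A real one would send the cached ETag as an If-None-Match header and get
# status 304 with an empty body back when the file has not changed.
Fetcher = Callable[[str, Optional[str]], Tuple[int, bytes, Optional[str]]]


def cached_download(url: str, cache_dir: Path, fetch: Fetcher) -> bytes:
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()  # one cache entry per URL
    body_path = cache_dir / f"{key}.bin"
    meta_path = cache_dir / f"{key}.json"

    etag = None
    if body_path.exists() and meta_path.exists():
        etag = json.loads(meta_path.read_text()).get("etag")

    status, body, new_etag = fetch(url, etag)
    if status == 304 and etag is not None:
        return body_path.read_bytes()  # unchanged upstream: reuse the cached copy
    if status != 200:
        raise RuntimeError(f"download of {url} failed with HTTP {status}")
    body_path.write_bytes(body)
    meta_path.write_text(json.dumps({"etag": new_etag}))
    return body


if __name__ == "__main__":
    # Fake server: full body on the first request, 304 once the client has "v1".
    def fake_fetch(url: str, etag: Optional[str]) -> Tuple[int, bytes, Optional[str]]:
        if etag == '"v1"':
            return 304, b"", '"v1"'
        return 200, b"PBF-DATA", '"v1"'

    with tempfile.TemporaryDirectory() as tmp:
        for _ in range(3):
            data = cached_download(
                "https://example.com/italy-latest.osm.pbf", Path(tmp), fake_fetch
            )
            print(len(data))
```

Only the first run transfers the body; later runs cost one small request each, which is also what makes it safe to leave caching on.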
For example GMP blocked GitHub:
https://www.theregister.com/2023/06/28/microsofts_github_gmp...
This "emergency measure" is still in place, but there are mirrors available so it doesn't actually matter too much.
E.g. my SQLite project downloads code from the GitHub mirror rather than Fossil.
That way someone manually downloading the file is not impacted, but if you try to put the url in a script it won’t work.
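The comment doesn't say how such a script-hostile link would be produced. One common approach, assumed here, is a short-lived HMAC-signed URL that the download page generates on each visit: a human clicking it never notices, while a URL pasted into a cron job stops verifying once it expires. `SECRET`, `make_link`, `verify_link`, the path and the 10-minute TTL are all illustrative.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"server-side-secret"  # hypothetical key, never sent to clients


def _signature(path: str, expires: int) -> str:
    msg = f"{path}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_link(path: str, now: int, ttl: int = 600) -> str:
    """Link rendered on the download page; valid for `ttl` seconds."""
    expires = now + ttl
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_link(url: str, now: int) -> bool:
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # unsigned or malformed link
    if now > expires:
        return False  # a link hardcoded into a script fails from here on
    # Constant-time comparison; the signature also binds the path.
    return hmac.compare_digest(sig, _signature(parts.path, expires))
```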
Although they refer to IP ranges, the same principle applies on a smaller scale to a single IP address: (1) dynamic IP addresses get reallocated, and (2) entire buildings (universities, libraries, hotels, etc.) might share a single IP address.
Aside from accidentally affecting innocent users, you also open up the possibility of a DOS attack: the attacker just has to abuse the service from an IP address that he wants to deny access to.
Then it only takes one bad user on the same subnet to ruin the experience for everyone else. That sucks, and isn't working as intended, because the intent was to only punish the one abusive user.
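To make the collateral-damage argument concrete: a limiter keyed on a network prefix rather than a single address blocks everyone who shares that prefix. The /24 and /64 prefix lengths below are assumptions chosen for illustration; the code uses Python's standard `ipaddress` module.

```python
import ipaddress
from collections import Counter


def rate_key(addr: str, v4_prefix: int = 24, v6_prefix: int = 64) -> str:
    """Collapse an address to the network it is rate-limited under."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def blocked_keys(requests: list[str], limit: int) -> set[str]:
    """Networks whose combined request count exceeds `limit`."""
    counts = Counter(rate_key(addr) for addr in requests)
    return {key for key, n in counts.items() if n > limit}
```

One abusive host at 192.0.2.10 sending ten requests under a limit of five gets 192.0.2.0/24 blocked, and with it a neighbour at 192.0.2.77 who made a single request.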
Also, everyone go contribute/donate to OSM.
Shapefiles shouldn't be what you're after; Parquet can almost always do a better job unless you need to either edit something or use really advanced geometry not yet supported in Parquet.
Also, this is your best source for bulk OSM data: https://tech.marksblogg.com/overture-dec-2024-update.html
If you're using ArcGIS Pro, use this plugin: https://tech.marksblogg.com/overture-maps-esri-arcgis-pro.ht...
1. BitTorrent has a bad rep. Most people still associate it only with illegal downloads.
2. It requires slightly more complex firewall rules, and asking the network admin to put them in place might raise some eyebrows for reason 1. On very restrictive networks, they might not want to allow it at all, because it opens the door to, well, BitTorrent.
3. A BitTorrent client is more complicated than an HTTP client, and is not installed on most company computers or CI pipelines (for lack of need, and again reason 1). A lot of people just want to `curl` and be done with it.
4. A lot of people think they are required to seed, and for some reason that scares the hell out of them.
Overall, I think it is mostly 1, plus the fact that you can simply `curl` stuff and have everything working. It does sadden me that people do not understand how good a file transfer protocol BT is and how underused it is. I do remember some video game clients using BT for updates under the hood, and PeerTube uses WebTorrent, but BT is sadly not very popular.
Truly the last two open web titans.
Some of the reasons include lawyers sending out costly cease-and-desist letters even to "legitimate" users.
DMCA demands are, as far as I'm aware, completely automated and couldn't really cost much.
Making a copy of a thing does not violate copyright (eg you can photocopy a book that you possess even temporarily). Sharing a copy that you made can violate copyright.
It is like mixing up “it’s illegal to poison somebody with bleach” and “it’s illegal to own bleach”. The action you take makes a big difference
Also, as an aside, when you view a legitimately-purchased and downloaded video file that you have license to watch, the video player you use makes a copy from the disk to memory.
If I own a license to listen to Metallica - Enter Sandman.m4a that I bought on iTunes and in the download folder I screw up and I make
Metallica - Enter Sandman(1).m4a
Metallica - Enter Sandman(2).m4a
Metallica - Enter Sandman(3).m4a
How much money do I owe Lars Ulrich for doing that based on The Law of The Earth Everywhere But Switzerland?
Making copies of a book you legally own for personal use is an established fair use exception to copyright. However, making copies of a book that you borrowed from a library would be copyright infringement. Similarly, lending the copies you've made of a book to friends would technically void the fair use exception for your copies.
The copy that a playback device has to make of a copyrighted audio/video file for its basic functioning is typically mentioned explicitly in the license you buy, thus being an authorized copy for a specific purpose. If you make several copies of a file on your own system for personal use, then again you are likely within fair use exemptions, similar to copying a book case - though this is often a bit more complicated legally by the fact that you don't own a copy but a license to use the work in various ways, and some companies' licenses can theoretically prohibit even archival copies, which in turn may or may not be legal in various jurisdictions.
But in no jurisdiction is it legal to, for example, go with a portable photocopy machine into a bookstore and make copies of books you find in there, even if they are only for personal use: you first have to legally acquire an authorized copy from the rights holder. All other exemptions apply to what you do with that legally obtained copy.
This even means that you don't have any rights to use a fraudulent copy of a work, even if you legitimately believed you were obtaining a legal copy. For example, say a library legally bought a book from a shady bookstore that, unbeknownst to them, was selling counterfeit copies of a book. If the copyright holder finds out, they can legally force the library to pay them to continue offering this book, or to destroy it otherwise, along with any archival copies that they had made of this book. The library can of course seek to obtain reparations from the store that sold them the illegal copy, but they can't refuse to pay the legal copyright holder.
This is a very funny thing to say given that post is entirely correct, while you are wrong.
> Making a copy of a thing does not violate copyright
Yes it does, unless it's permitted under a designated copyright exemption by local law. For instance, you mention that the video player makes a copy from disk to memory, well that is explicitly permitted by Article 5(1) of the Copyright Directive 2001 in the EU as a use that is "temporary, transient or incidental and an integral and essential part of a technological process", as otherwise it would be illegal as by default, any action to copy is a breach of copyright. That's literally where the word comes from.
> If I own a license to listen to Metallica - Enter Sandman.m4a that I bought on iTunes and in the download folder I screw up and I make
> Metallica - Enter Sandman(1).m4a
> Metallica - Enter Sandman(2).m4a
> Metallica - Enter Sandman(3).m4a
In legal terms you do indeed owe him something, yes. It would probably be covered under the private copy exemptions in some EU territories, but only on the basis that blank media is taxed to pay rightsholders a royalty for these actions under the relevant collective management associations.
To answer your question with the only answer I know: Switzerland.
The only exception (sort of) is Switzerland. And the reason downloading copyrighted content you haven't bought for personal use is legal in Switzerland is because the government is essentially paying for it - there is a tax in Switzerland on empty media, the proceeds from which are distributed to copyright holders whose content is consumed in Switzerland, regardless of whether it is bought directly from the rights holder or otherwise.
Apparently the legal status of downloading copyrighted materials for personal use is also murky in Spain, where apparently at least one judge found that it is legal - but I don't know how solid the reasoning was or whether other judges would agree (being a civil law country, legal precedent is not binding in Spain to the same extent that it would be in the UK or USA).
No it isn't.
It's not a criminal offense, but if someone can sue you for it and win then it isn't "legal" under any technical or popular definition of the word.
Otherwise, absent any examples of enforcement or successful legal action, downloading movies is illegal in the US in the same way that blasphemy is illegal in Michigan.
https://www.legislature.mi.gov/Laws/MCL?objectName=MCL-750-1...
It could be argued that if you bought a movie, say on DVD, downloading another copy of it from an online source could fall under fair use, but this is more debatable.
[0] https://legalclarity.org/is-pirating-movies-illegal-what-are...
Even the Protecting Lawful Streaming Act of 2020 explicitly does not punish consumers of copyrighted content, only its distributors.
>Tillis stated that the bill is tailored to specifically target the websites themselves, and not "those who may use the sites nor those individuals who access pirated streams or unwittingly stream unauthorized copies of copyrighted works"
There are so many paragraphs in response to my “You can’t get in trouble for downloading movies in the US” post and none of them have any examples of people getting in trouble for downloading movies in the US.
Poland signed Berne convention in 1919, has "well regulated" copyright, but still downloading all media (except for software) for personal use is fully legal. Tax on "empty media" is in place as well.
Format shifting and personal copying are legal in Poland, but you as an individual still have to have legally obtained your original in the first place to exercise that right, and an illicit download certainly doesn't count. Taxing "empty media" is to compensate for those format-shifting rights, but it doesn't cover remuneration for acquiring media in the first place (and indeed no EU member state could operate such a scheme - they are prohibited by EU Directive 2001/29 https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=celex%3A...).
That's the Polish law, both the letter and the implementation. On at least one occasion the police issued an official statement saying exactly that.
I think no one was ever fined in Poland for incidental uploading while using the BitTorrent protocol to download. There are high-profile cases of people who were publishing large amounts of media files, especially commercially. A little more than a decade ago there was one case where a company tried to go after BitTorrent downloaders of 3 specific Polish movies. But I think it was ultimately thrown out or cheaply settled, because no case like that has been publicized since, and everybody who knows how to use BitTorrent does.
Again, it covers everything except for software that has more restrictive laws more similar to what you think the law is.
Tax on empties was set up a long time ago to support creators whose music is shared among friends directly. It was not intended to compensate for downloads. I think only Polish artists receive any money from this (I might be wrong on that), and the organization that distributes the money is highly inefficient. They tried to extend the tax to electronic devices, but nobody likes them, companies and people alike, so they haven't gotten far with this proposal for now.
Poland enjoys a lot of digital freedoms and is conscious of them and ready to defend them against ACTA, Chat Control and extend them with Stop Killing Games.
Like everywhere else where personal copies are legal and you can download them. If both conditions are true, then the mere fact that you are downloading something is not a sign that you are downloading pirated content.
OTOH there is also Spain where piracy with no direct monetary gain is tolerated and nobody goes after people torrenting.
Thank you.
Obviously illegal ≠ immoral, and being a free-software/libre advocate opposed to copyright, I am in favor of the free sharing of humanity's knowledge, and therefore supportive of piracy, but that doesn't change the perception in a corporate environment.
Your use of the stuff might not be at all malware-like, but in a corporate environment, if it isn't needed, it gets flagged to be checked up on in case it is not there for a good reason. I've been flagged for some of the tools I've played with, and this is fine: I have legitimate use for that sort of thing in my dealings with infrastructure, and there are flags ticked that say “Dave has good reason to have these tools installed, don't bother us about it again unless he fails to install security updates that are released for them”. I want those things flagged in case people who won't be doing the things I do end up with such stuff installed without their knowledge, so it can be dealt with (and they can be given more compulsory “don't just thoughtlessly click on every link in any email you receive, and carelessly type your credentials into resulting forms” training!).
So, I'm not using BT at work anymore.
A month later an IT admin came to ask what I might be doing with port 6881. Once I remembered, we went to the tracker's website and saw "imperial.ac.uk" had the top position for seeding, by far.
The admin said to leave running.
This can be read in two wildly different ways.
"S3 quietly deprecates BitTorrent support" - https://news.ycombinator.com/item?id=27524549
but you are already uploading while you are still downloading. and that can't be turned off. if seeding scares someone, then uploading should scare them too. so they are right, because they are required to upload.
For public trackers maybe.
Now, as a seeder, you may still be interested in those clients being able to download and reach whatever information you are seeding.
In the same vein, as a seeder, you may just not serve those clients. That's kind of the beauty of it. I understand that there may be some old-school/cultural "code of conduct", but really this is not a problem that needs a behavioral solution; it has a technical solution that happens to be already built in.
well, yes and no. legal issues aside (think about using bittorrent only for legal stuff), the whole point of bittorrent is that it works best if everyone uploads.
actually, allowing clients to disable uploading is almost an acknowledgement that illegal uses should be supported, because there are few reasons why legal uses should need to disable uploading.
and as an uploader i also don't want others not to upload. so while disabling upload is technically possible, it is also reasonable and not unlikely that connections from such clients could be rejected.
Almost every client lets you set an upload limit, which you can set to 0. The only upload bandwidth usage that cannot be deactivated is protocol overhead (but you can deactivate parts of BT, like the DHT).
Well, in many such situations data is provided for free, putting a huge burden on the other side. Even if it's a little less convenient, it makes the service a lot more sustainable. I imagine torrents for the free tier and direct download as a premium option would work perfectly.
This basically handles every problem stated. There's nothing to install on computers: it's just js running on the page. There's no firewall rules or port forwarding to setup, all handled by the stun/turn in webrtc. Users wouldn't necessarily even be aware they are uploading.
The advantage is that at least it's all built in. It's not a magic solution, but it's a pretty good solution, with fallbacks built in for when the networking gets in the way of the magic.
You know what has a bad rep? Big companies that use and trade my personal information like they own it. I'll start caring about copyrights when governments force these big companies to care about my information.
6. Service providers have little control over the service level of seeders and thus the user experience. And that's before you get malicious users.
I think it’s more a matter of how large the demand is for frequent downloads of very large files/sets, which leads to a questions of reliability and seeding volume, all versus the effort involved to develop the tooling and integrate it with various RCS and file syncing services.
Would something like Git LFS help here? I’m at the limit of my understanding for this.
Immutability of specific releases is great, but you also want a way to find new related releases/versions.
For some you want a name where the underlying resource can change, for others you want a hash of the actual resource. Which one you want depends on the application.
In the context of webpages, a domain lets you deploy new versions.
With a torrent file, a domain does not let you do that.
Please try to understand the comparison they're making instead of just saying "domains are not hashes" "domains do exist".
> For some you want a name where the underlying resource can change, for others you want a hash of the actual resource. Which one you want depends on the application.
Right.
And torrents don't give you the choice.
Not having the choice is much closer to "bug" than "feature".
Needing a new magnet link is fine. The old magnet link working indefinitely is great. Having no way to get from old magnet to new magnet is not as fine.
There are many other ways torrents get used, where people aren't looking at the website or there is no website.
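Why a magnet link cannot follow a new release: in BitTorrent v1 the link carries the SHA-1 of the bencoded `info` dictionary, which includes a hash of every piece, so changing one byte yields a different link, and nothing in the old link points to the new one. (BEP 46, cited at the end of this thread, is the mutable-item extension meant to bridge that gap.) The sketch below builds a minimal single-file `info` dict with a hand-rolled bencoder; real `.torrent` files can carry more fields, BitTorrent v2 uses SHA-256, and the 256 KiB piece length is an assumption.

```python
import hashlib
from urllib.parse import quote


def bencode(value) -> bytes:
    """Minimal bencoder: int, str/bytes, list and dict (keys sorted as raw bytes)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(name: str, data: bytes, piece_length: int = 2**18) -> str:
    # Concatenated SHA-1 digests of each fixed-size piece of the payload.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}
    return hashlib.sha1(bencode(info)).hexdigest()


def magnet(name: str, data: bytes) -> str:
    return f"magnet:?xt=urn:btih:{info_hash(name, data)}&dn={quote(name)}"
```

The same name with different bytes gives two unrelated links, which is exactly the trade-off being argued over: perfect integrity for a given link, and no built-in way to discover its successor.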
Everybody on HN should know how a domain works. I think most people on HN understand what a hash is and how a magnet link works. The fact that you can't easily replace the resource under a magnet link is a feature, not a bug. If you think for a bit about the consequences of being able to easily replace the resources associated with a magnet link, rather than just the 'convenience of being able to update a torrent', you'll see that this is not a simple thing at all.
Torrents are simply a different thing than 'the web' and to try to equate the one to the other is about as silly as trying to say that you can't use a screwdriver to put nails in the wall. They're different things. Analogies are supposed to be useful, not a demonstration of your complete lack of understanding of the underlying material.
I distribute some software that I wrote using a torrent with a magnet link, so I'm well aware of the limitations there, but these limitations are exactly why I picked using a torrent in the first place.
I didn't even go that far. I just said link to a new one.
You're the one that said replacing can be good! What is this.
TomTom did a few write ups on their contribution, this one is from 2023: https://www.tomtom.com/newsroom/behind-the-map/how-tomtom-ma...
If you have QGIS running, I did a walkthrough using the GeoParquet Downloader Plugin with the 2.75B Building dataset TUM released a few weeks ago. It can take any bounding box you have your workspace centred on and download the latest transport layers for Overture. No need for a custom URL, as it's one of the default data sources the plugin ships with. https://tech.marksblogg.com/building-footprints-gba.html
Additionally, as anyone who has tried to share an internet connection with someone heavily torrenting knows, the excessive number of connections means the overall quality of non-torrent traffic on the network goes down.
Not to mention, of course, that BitTorrent has a significant stigma attached to it.
The answer would have been a squid cache box before, but https makes that very difficult as you would have to install mitm certs on all devices.
For container images, yes you have pull through registries etc, but not only are these non-trivial to setup (as a service and for each client) the cloud providers charge quite a lot for storage making it difficult to justify when not having a check "works just fine".
The Linux distros (and CPAN and texlive etc) have had mirror networks for years that partially addresses these problems, and there was an OpenCaching project running that could have helped, but it is not really sustainable for the wide variety of content that would be cached outside of video media or packages that only appear on caches hours after publishing.
BitTorrent might seem seductive, but it just moves the problem, it doesn't solve it.
As a consumer, I pay the same for my data transfer regardless of the location of the endpoint though, and ISPs arrange peering accordingly. If this topology is common then I expect ISPs to adjust their arrangements to cater for it, just the same as any other topology.
Two eyeball networks (consumer/business ISPs) are unlikely to have large PNIs with each other across wide geographical areas to cover sudden bursts of traffic between them. They will, however, have substantial capacity to content networks (not just CDNs, but AWS/Google etc) which is what they will have built out.
BitTorrent turns fairly predictable "North/South" traffic where capacity can be planned in advance and handed off "hot potato" as quickly as possible, into what is essentially "East/West" with no clear consistency which would cause massive amounts of congestion and/or unused capacity as they have to carry it potentially over long distances they have not been used to, with no guarantee that this large flow will exist in a few weeks time.
If BitTorrent knew network topology, it could act smarter -- CDNs accept BGP feeds from carriers and ISPs so that they can steer the traffic, this isn't practical for BitTorrent!
it could surely be made to care about topology, but imho handing that problem to congestion control and routing mechanisms at lower levels works well enough and should not be a problem.
At the expense of other traffic. Do this experiment: find something large-ish to download over HTTP, perhaps an ISO or similar from Debian or FreeBSD. See what the speed is like, and try looking at a few websites.
Now have a large torrent active at the same time, and see how slow the HTTP download drops to, and how much slower the web is. Perhaps try a Twitch stream or YouTube video, and see how the quality suffers greatly and/or starts rebuffering.
Your HTTP download uses a single TCP connection, most websites will just use a single connection also (perhaps a few short-duration extra connections for js libraries on different domains etc). By comparison, BitTorrent will have dozens if not hundreds of connections open and so instead of sharing that connection in half (roughly) it is monopolising 95%+ of your connection.
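The "95%+" figure follows from per-connection sharing. Assuming an idealised bottleneck where every TCP connection gets an equal share (real congestion control, differing round-trip times and client-side throttling all bend this), one HTTP connection competing with N torrent connections gets 1/(1 + N) of the link:

```python
def http_share(torrent_connections: int, other_flows: int = 1) -> float:
    """Fraction of a bottleneck left to `other_flows` when every TCP
    connection gets an equal share (an idealisation of TCP fairness)."""
    return other_flows / (other_flows + torrent_connections)


if __name__ == "__main__":
    for n in (1, 19, 100):
        print(f"{n:>3} torrent connections -> HTTP gets {http_share(n):.1%}")
```

With 19 torrent connections the HTTP download is left with 5%; with a hundred, under 1%.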
The other main issue I forgot to mention is that on most cloud providers, downloading from the internet is free, uploading to the internet costs a lot... So not many on public cloud are going to want to start seeding torrents!
I had the same thoughts for some time now. It would be really nice to distribute software and containers this way. A lot of people have the same data locally and we could just share it.
[1]: https://www.bittorrent.org/beps/bep_0046.html
https://github.com/uber/kraken exists, using a modified BT protocol, but unless you are distributing quite large images to a very large number of nodes, a centralized registry is probably faster, simpler and cheaper