Landrecords – Cheap Nationwide Parcel Dataset Standardized Using Gemma3
Posted 5 months ago · Active 4 months ago
landrecords.us · Tech · Story
Sentiment: supportive, positive
Debate: 0/100
Key topics: Open Data · Standardization · GIS
A new nationwide parcel dataset, standardized using Gemma3, has been made available at low cost.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion · Avg 1.3 comments per period · Peak: 2 comments in the 24-30h window

Key moments
- Story posted: Aug 19, 2025 at 10:01 PM EDT (5 months ago)
- First comment: Aug 19, 2025 at 10:01 PM EDT (0s after posting)
- Peak activity: 2 comments in the 24-30h window (hottest window of the conversation)
- Latest activity: Aug 23, 2025 at 10:10 AM EDT (4 months ago)
ID: 44958031 · Type: story · Last synced: 11/20/2025, 11:47:20 AM
Because I don't have $100K+ to buy the US parcel dataset from Regrid or ReportAll, I bought a pair of L40s and a 30TB NVMe drive, and used them to collect and harmonize 155M parcels from over 3,100 US counties into a single dataset.
And because I don't have a couple dozen employees to feed like ReportAll, Regrid, and CoreLogic do, my goal is to resell this dataset at much lower prices than the current incumbents and make the data accessible to smaller projects and smaller budgets.
I ended up with close to 99% coverage of the United States.
The backend stack is a single server running Postgres, Gemma3 on Ollama, and a big pile of Python and PL/pgSQL. The website runs on Firebase with PMTiles as the mapping layer, and parcel file exports are served from Google Cloud Storage.
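For context, here is a minimal sketch of what a single harmonized Postgres table might look like as the target of that pipeline. The table and field names are illustrative assumptions, not the actual landrecords.us schema, and the geometry column assumes PostGIS is installed.

```python
# A minimal sketch of a harmonized parcel table. Field names here are
# illustrative assumptions, not the actual landrecords.us schema.
# Requires a PostGIS-enabled database for the geometry column.
import psycopg2

TARGET_SCHEMA = """
CREATE TABLE IF NOT EXISTS parcels (
    parcel_id    TEXT,                          -- county-assigned parcel number (APN)
    county_fips  CHAR(5),                       -- 5-digit state+county FIPS code
    owner_name   TEXT,
    situs_addr   TEXT,                          -- physical (situs) address
    geom         geometry(MultiPolygon, 4326),  -- parcel boundary in WGS84
    PRIMARY KEY (county_fips, parcel_id)
);
CREATE INDEX IF NOT EXISTS parcels_geom_idx ON parcels USING GIST (geom);
"""

# Hypothetical database name; adjust the DSN for your setup.
with psycopg2.connect("dbname=landrecords") as conn:
    with conn.cursor() as cur:
        cur.execute(TARGET_SCHEMA)
```

The composite primary key reflects that parcel numbers are only unique within a county, which is one reason a nationwide merge needs a harmonization step at all.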
My plan is to open-source a big portion of this system once I can clean it up, but my first priority was getting a product on the market and trying to make this self-sustaining.
If anyone is interested in any of the technical details or if you want to try to do this yourself, I'm happy to share anything you want to know.
I was able to automate a big chunk of this work by crawling county websites and looking for web services I could download the parcel data from.
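As a rough illustration of that discovery step: many county GIS portals expose their data through ArcGIS REST endpoints, so a crawler can probe a host's service catalog directly. This is a minimal sketch under that assumption; the host and paths are illustrative, and the post doesn't say which server types the real crawler targets.

```python
# Sketch: probe a county host for ArcGIS REST service catalogs and
# collect the URLs of services that could hold parcel layers.
import requests

# Common ArcGIS Server mount points; real deployments vary.
COMMON_PATHS = ["/arcgis/rest/services", "/server/rest/services"]

def find_services(host: str) -> list[str]:
    """Return URLs of services listed in a host's ArcGIS REST catalog."""
    found = []
    for path in COMMON_PATHS:
        try:
            resp = requests.get(f"https://{host}{path}",
                                params={"f": "json"}, timeout=10)
            catalog = resp.json()
        except (requests.RequestException, ValueError):
            continue  # host doesn't expose this endpoint, or non-JSON reply
        for svc in catalog.get("services", []):
            # Each entry looks like {"name": "Parcels", "type": "FeatureServer"}
            found.append(f"https://{host}{path}/{svc['name']}/{svc['type']}")
    return found

# Hypothetical example host
print(find_services("gis.example-county.gov"))
```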
But there is no agreed-upon schema standard -- every county stores its data in different formats, schemas, etc. About 50% of the effort in maintaining a dataset like this goes into maintaining the mappings from the source data to the target schema. That's where I'm making heavy use of LLMs; this turns out to be something they are very good at. I found Gemma3 to have the best balance of reliability, ease of use, and speed for my use case.
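As a rough illustration of the mapping step, here is a minimal sketch that asks a local Gemma3 model, via Ollama's REST API, to propose a source-to-target column mapping. The prompt, target fields, and sample columns are assumptions for the example; the post doesn't describe the actual prompts or schema.

```python
# Sketch: use gemma3 (served by Ollama on localhost) to map one county's
# raw column names onto a unified target schema.
import json
import requests

TARGET_FIELDS = ["parcel_id", "owner_name", "situs_addr"]  # assumed target schema

def propose_mapping(source_columns: list[str]) -> dict:
    """Ask the model for a JSON object mapping target fields to source columns."""
    prompt = (
        "Map these source columns to the target fields. "
        f"Source: {source_columns}. Target: {TARGET_FIELDS}. "
        'Reply with JSON like {"target_field": "source_column"}; '
        "use null when no source column matches."
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "gemma3", "prompt": prompt,
              "format": "json",  # constrain Ollama's output to valid JSON
              "stream": False},
        timeout=120,
    )
    return json.loads(resp.json()["response"])

# Column names as they might appear in one county's export
print(propose_mapping(["PARCELNO", "OWNER1", "SITE_ADDR", "ACREAGE"]))
```

A mapping like this only has to be generated once per county and then cached, which is what makes the LLM cost manageable even across 3,100+ distinct source schemas.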