On Having a Data Object
Posted 2 months ago · Active about 2 months ago
Source: natemeyvis.com · Tech story
Tone: calm / mixed · Debate: 40/100
Key topics: Software Development · Data Modeling · Code Organization
The article discusses the concept of a 'data object' in software development, sparking a discussion on its merits and potential drawbacks in code organization and data modeling.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement · First comment: 5 days after posting · Peak period: 8 comments (Day 6) · Average: 4 comments per period
[Comment distribution chart: 16 data points, based on 16 loaded comments]
Key moments
- Story posted: Oct 27, 2025 at 7:52 AM EDT (2 months ago)
- First comment: Nov 1, 2025 at 9:36 AM EDT (5 days after posting)
- Peak activity: 8 comments in Day 6, the hottest window of the conversation
- Latest activity: Nov 6, 2025 at 7:29 AM EST (about 2 months ago)
ID: 45719919 · Type: story · Last synced: 11/20/2025, 12:50:41 PM
Onboarding new programmers to your codebase and making the codebase simpler for developers to reason about are massive non-functional benefits. Unless you have a very strong reason to do things otherwise, follow the principle of "least surprise". In fact, vibe coding adds another layer to this: an LLM generally expects the most common pattern, so maintenance and testing will be orders of magnitude easier.
Two reasons to deviate: 1. The downsides of following the convention are severe, and 2. People often expect slightly different things, so you often don't get the benefits of following convention anyway.
You can (and should) also apply it selectively. E.g., for auth, I'd never want a single UserDTO used both for creating and for displaying a user: creating a user requires a password, a field you don't want present when retrieving a user, precisely to avoid mistakes.
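Translated into the map-and-spec style discussed further down this thread, that selective split might look like the sketch below; all the names (::create-user, ::user-view, view-of) are illustrative, not from the article:

```clojure
(ns example.user-shapes
  (:require [clojure.spec.alpha :as s]))

(s/def ::id uuid?)
(s/def ::email string?)
(s/def ::password string?)

;; Two deliberately different shapes for the two contexts:
;; creating a user requires a password...
(s/def ::create-user (s/keys :req [::email ::password]))
;; ...while the display shape requires only id and email.
(s/def ::user-view (s/keys :req [::id ::email]))

(defn view-of
  "Narrow a stored user map to the display shape, dropping the password."
  [user]
  (select-keys user [::id ::email]))
```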
I know DDD advocates would say that you're then not being true to DDD, but yes, that's business. It's very, very hard to get everyone to agree to the reduced velocity of 100% adherence to DDD for an extended period. In my experience it starts off as "this is great"; then people start hate-reviewing PRs for simple changes that have 28 new files (particularly in Java), and they quietly moan to the boss about being slowed down by DDD.
> 1. You should often be using different objects in different contexts.
This is because "data" are just "facts" that your application has observed, and different facts are relevant in different circumstances. The User class in my application may be very similar to the User class in your application (they may even have identical "login" implementations), but neither captures the "essence" of a "User", because the set of facts one could observe about Users is unbounded and combinatorially explosive. This holds for subsets of facts as well. Maybe our login method only cares about a User's email address and password, but to support all the other stuff in our app, we have to either:

1. Pass along every piece of data and behavior the entire app specifies, or
2. Create another data object that captures only the facts that login cares about (e.g. a LoginPayload object, a LoginUser object, a Credential object, etc.).
Option 1 is a nightmare because refactoring requires considering ALL usages of the object, regardless of whether the changes are relevant to the caller. Option 2 sucks because your object hierarchy grows combinatorially with the number of distinct _callers_. That's why it is so hard to refactor large systems programmed in this style.
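A minimal sketch of where option 2 can shrink to, in the map-based style the thread turns to below: instead of a new LoginPayload class per caller, a function simply declares which facts it reads. The names here are hypothetical:

```clojure
(ns example.login)

;; Hypothetical: login cares about exactly two facts. The destructuring
;; form documents that; callers may pass a full user map or a narrow one.
(defn login
  [{:keys [email password]}]
  (when (and email password)
    {:session-id (str (java.util.UUID/randomUUID))
     :email      email}))

;; Both calls work, and no LoginPayload/LoginUser class is needed:
(login {:email "a@b.com" :password "hunter2"})
(login {:email "a@b.com" :password "hunter2"
        :name "Ada" :roles #{:admin}})
```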
> 3. The classes get huge and painful.
The author observed the combinatorial explosion of facts!
If you have a rich information landscape that is relevant to your application, you are going to have a bad time if you try modeling it with Data Objects. Full stop.
See Rich Hickey's talks, but in particular this section about the shortcomings of data objects compared to plain data structures (maps in this case).
https://www.youtube.com/watch?v=aSEQfqNYNAc
I kinda like that. Suppose we do something like `let mut authn = UserLoginView::build(user_data_repository); let session = authn.login(user, pwd);`. You no longer get to have one monolithic user object (you need a separate UserDataRepository and UserLoginView), but the relationship between those two objects encodes exactly what the login process does and doesn't need to know about users. No action at a distance.
I've never used Clojure, but the impression I get of its "many functions operating over the same map" philosophy is that you trade away your ability to make structural guarantees about which functions depend on which fields. It's the opposite of the strong structural guarantees I love in Rust or Haskell.
You do make that trade-off if you use plain map keys like strings or unqualified keywords, but not if you use namespace-qualified keywords like ::my-namespace/id in combination with something like spec.alpha or malli; then you can easily make those structural guarantees, in a way that is more expressive than an ordinary type system.
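A minimal sketch of that claim with clojure.spec.alpha; the spec names are illustrative:

```clojure
(ns example.user
  (:require [clojure.spec.alpha :as s]))

;; Each fact gets a namespace-qualified name and its own spec.
(s/def ::email (s/and string? #(re-find #"@" %)))
(s/def ::password-hash string?)
(s/def ::display-name string?)

;; Shapes are named selections of facts. Any map carrying the required
;; keys conforms, so the guarantee is about fields, not about a class.
(s/def ::credentials (s/keys :req [::email ::password-hash]))
(s/def ::profile     (s/keys :req [::email ::display-name]))

(s/valid? ::credentials {::email "a@b.com"
                         ::password-hash "x"
                         ::unrelated "extra keys are fine"})
;; => true: the required facts are guaranteed, nothing else is demanded
```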
In the first case, it makes sense to unit test logins using every conceivable variation of `UserLoginView`s. In the second case, your surface area is much larger: `userDataMap` is full of details that are irrelevant to logins, so you end up testing only a small subset of the possible user-data variations. As the code ages and changes, it becomes harder and harder to assess at a glance whether your test data really represents all the test cases you need.
I worry that Clojure-style maps don't fix the problems pointed out by the article. In a codebase that passes around big dumb data objects representing important entities (incrementally processing them, updating fields, etc), the logic eventually gets tangled. Every function touches a wide assortment of fields, and your test data is full of details that are probably inconsequential but you can't tell without inspecting the functions. I don't see how Clojure solves this without its own UserLoginView-style abstraction.
With specs in place, you also get:
- generative testing (with clojure.spec.gen.alpha/generate)
- function instrumentation (with clojure.spec.test.alpha/instrument)
- automatic failure-case minimization (with clojure.spec.alpha/explain + explain-data)
- data normalization / coercion (with clojure.spec.alpha/conform)
- easier refactoring: you can change specs without changing data structures
- free serialization: maps already serialize, whereas you have to implement it for records
Plus you get to leverage the million other functions that already work on maps, because maps are the fundamental data structure in Clojure. You just don't have to create the intermediate record; let your data be data.
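A hedged sketch of three of the features listed above; the specs and the login function are made up for illustration:

```clojure
(ns example.spec-demo
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.gen.alpha :as gen]
            [clojure.spec.test.alpha :as stest]))

(s/def ::email string?)
(s/def ::password string?)
(s/def ::credentials (s/keys :req [::email ::password]))

;; Generative testing: conjure spec-conforming test data for free.
(gen/generate (s/gen ::credentials))
;; => a random but valid map, e.g. {::email "K3q" ::password "..."}

;; Instrumentation: check arguments at call time during development.
(defn login [creds] (select-keys creds [::email]))
(s/fdef login :args (s/cat :creds ::credentials))
(stest/instrument `login)
;; (login {::email "a@b.com"}) now throws: ::password is missing.

;; Explanation: a data description of exactly which fact failed and why.
(s/explain-data ::credentials {::email 42})
```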
Generally, I prefer to create functions for specific queries rather than for specific "entity" types, with the return type of each function matching the result of its query. This fits the reality that queries often involve multiple entity types.
My favourite application-layer database tool so far is https://www.jooq.org/ because it generates code from the database schema, allowing type-safe construction of queries. I find this makes queries easier to create and maintain. It is a relatively unopinionated power tool, with minimal attempts at "automagic" behaviour. I find myself missing jOOQ now that I am not working much with Java.
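jOOQ itself is Java-only, but the "one function per query, result shaped like the query" idea carries over; here is a hedged Clojure sketch using next.jdbc, with made-up table and column names:

```clojure
(ns example.queries
  (:require [next.jdbc :as jdbc]))

;; One function per query. The returned maps carry exactly the columns
;; the SELECT names (next.jdbc qualifies keys by table, driver permitting,
;; e.g. :users/email), rather than hydrating User and Order entities.
(defn users-with-recent-orders
  "Joins two entity types; the result shape belongs to the query,
   not to a User or an Order class."
  [ds since]
  (jdbc/execute! ds
    ["SELECT users.id, users.email, orders.total
        FROM users
        JOIN orders ON orders.user_id = users.id
       WHERE orders.created_at > ?"
     since]))

;; (users-with-recent-orders datasource #inst "2025-10-01")
;; => [{:users/id 1, :users/email "a@b.com", :orders/total 42.0M} ...]
```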