On Having a Data Object
Posted 2 months ago · Active about 2 months ago
Source: natemeyvis.com · Tech story
Tone: calm / mixed · Debate: 40/100
Key topics: Software Development · Data Modeling · Code Organization
The article discusses the concept of a 'data object' in software development, sparking a discussion on its merits and potential drawbacks in code organization and data modeling.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement · First comment: 5 days after posting · Peak period: 8 comments (Day 6) · Average: 4 comments per period
[Comment distribution chart: 16 data points, based on 16 loaded comments]
Key moments
- Story posted: Oct 27, 2025 at 7:52 AM EDT (2 months ago)
- First comment: Nov 1, 2025 at 9:36 AM EDT (5 days after posting)
- Peak activity: 8 comments in Day 6, the hottest window of the conversation
- Latest activity: Nov 6, 2025 at 7:29 AM EST (about 2 months ago)
ID: 45719919 · Type: story · Last synced: 11/20/2025, 12:50:41 PM
Onboarding new programmers to your codebase and making the codebase simpler for developers to reason about are massive non-functional benefits. Unless you have a very strong reason to do things otherwise, follow the principle of "least surprise". In fact, vibe coding adds another layer to this: an LLM generally expects the most common pattern, so maintenance and testing will be orders of magnitude easier.
Two reasons to deviate: 1. The downsides of following the convention are severe, and 2. People often expect slightly different things, so you often don't get the benefits of following convention anyway.
You can (and should) also apply it selectively. E.g., for auth, I'd never want a single UserDTO used both for creating and for displaying a user: creating a user requires a password, a field you don't want present when retrieving a user, precisely to avoid mistakes.
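Translated into the map-and-spec style discussed further down this thread, that selective split might look like the sketch below; all the names (::create-user, ::user-view, view-of) are illustrative, not from the article:

```clojure
(ns example.user-shapes
  (:require [clojure.spec.alpha :as s]))

(s/def ::id uuid?)
(s/def ::email string?)
(s/def ::password string?)

;; Two deliberately different shapes for the two contexts:
;; creating a user requires a password...
(s/def ::create-user (s/keys :req [::email ::password]))
;; ...while the display shape requires only id and email.
(s/def ::user-view (s/keys :req [::id ::email]))

(defn view-of
  "Narrow a stored user map to the display shape, dropping the password."
  [user]
  (select-keys user [::id ::email]))
```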
I know DDD advocates would say that you're then not being true to DDD, but yes, that's business. It's very, very hard to get everyone to agree to the reduced velocity of 100% adherence to DDD for an extended period. In my experience it starts off as "this is great"; then people start hate-reviewing PRs for simple changes that have 28 new files (particularly in Java), and they quietly moan to the boss about being slowed down by DDD.
> 1. You should often be using different objects in different contexts.
This is because "data" are just "facts" that your application has observed, and different facts are relevant in different circumstances. The User class in my application may be very similar to the User class in your application (they may even have identical "login" implementations), but neither captures the "essence" of a "User", because the set of facts one could observe about Users is unbounded and combinatorially explosive. This holds for subsets of facts as well. Maybe our login method only cares about a User's email address and password, but to support all the other stuff in our app, we have to either:

1. Pass along every piece of data and behavior the entire app specifies, or
2. Create another data object that captures only the facts that login cares about (e.g. a LoginPayload object, a LoginUser object, a Credential object, etc.).
Option 1 is a nightmare because refactoring requires considering ALL usages of the object, regardless of whether the changes are relevant to the caller. Option 2 sucks because your object hierarchy grows combinatorially with the number of distinct _callers_. That's why it is so hard to refactor large systems programmed in this style.
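A minimal sketch of where option 2 can shrink to, in the map-based style the thread turns to below: instead of a new LoginPayload class per caller, a function simply declares which facts it reads. The names here are hypothetical:

```clojure
(ns example.login)

;; Hypothetical: login cares about exactly two facts. The destructuring
;; form documents that; callers may pass a full user map or a narrow one.
(defn login
  [{:keys [email password]}]
  (when (and email password)
    {:session-id (str (java.util.UUID/randomUUID))
     :email      email}))

;; Both calls work, and no LoginPayload/LoginUser class is needed:
(login {:email "a@b.com" :password "hunter2"})
(login {:email "a@b.com" :password "hunter2"
        :name "Ada" :roles #{:admin}})
```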
> 3. The classes get huge and painful.
The author observed the combinatorial explosion of facts!
If you have a rich information landscape that is relevant to your application, you are going to have a bad time if you try modeling it with Data Objects. Full stop.
See Rich Hickey's talks, but in particular this section about the shortcomings of data objects compared to plain data structures (maps in this case).
https://www.youtube.com/watch?v=aSEQfqNYNAc
I kinda like that. Suppose we do something like `let mut authn = UserLoginView::build(user_data_repository); let session = authn.login(user, pwd);`. You no longer get to have one monolithic user object (you need a separate UserDataRepository and UserLoginView), but the relationship between those two objects encodes exactly what the login process does and doesn't need to know about users. No action at a distance.
I've never used Clojure, but the impression I get of its "many functions operating over the same map" philosophy is that you trade away your ability to make structural guarantees about which functions depend on which fields. It's the opposite of the strong structural guarantees I love in Rust or Haskell.
You do make that trade-off if you use plain map keys like strings or unqualified keywords, but not if you use namespace-qualified keywords like ::my-namespace/id in combination with something like spec.alpha or malli; then you can easily make those structural guarantees, in a way that is more expressive than an ordinary type system.
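A minimal sketch of that claim with clojure.spec.alpha; the spec names are illustrative:

```clojure
(ns example.user
  (:require [clojure.spec.alpha :as s]))

;; Each fact gets a namespace-qualified name and its own spec.
(s/def ::email (s/and string? #(re-find #"@" %)))
(s/def ::password-hash string?)
(s/def ::display-name string?)

;; Shapes are named selections of facts. Any map carrying the required
;; keys conforms, so the guarantee is about fields, not about a class.
(s/def ::credentials (s/keys :req [::email ::password-hash]))
(s/def ::profile     (s/keys :req [::email ::display-name]))

(s/valid? ::credentials {::email "a@b.com"
                         ::password-hash "x"
                         ::unrelated "extra keys are fine"})
;; => true: the required facts are guaranteed, nothing else is demanded
```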
In the first case, it makes sense to unit test logins using every conceivable variation of `UserLoginView`s. In the second case, your surface area is much larger: `userDataMap` is full of details that are irrelevant to logins, so you end up testing only a small subset of the possible user-data variations. As the code ages and changes, it becomes harder and harder to assess at a glance whether your test data really represents all the test cases you need.
I worry that Clojure-style maps don't fix the problems pointed out by the article. In a codebase that passes around big dumb data objects representing important entities (incrementally processing them, updating fields, etc), the logic eventually gets tangled. Every function touches a wide assortment of fields, and your test data is full of details that are probably inconsequential but you can't tell without inspecting the functions. I don't see how Clojure solves this without its own UserLoginView-style abstraction.
With specs in place, you also get:
- generative testing (with clojure.spec.gen.alpha/generate)
- function instrumentation (with clojure.spec.test.alpha/instrument)
- automatic failure-case minimization (with clojure.spec.alpha/explain + explain-data)
- data normalization / coercion (with clojure.spec.alpha/conform)
- easier refactoring: you can change specs without changing data structures
- free serialization: maps already serialize, whereas you have to implement it for records
Plus you get to leverage the million other functions that already work on maps, because maps are the fundamental data structure in Clojure. You just don't have to create the intermediate record; let your data be data.
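A hedged sketch of three of the features listed above; the specs and the login function are made up for illustration:

```clojure
(ns example.spec-demo
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.gen.alpha :as gen]
            [clojure.spec.test.alpha :as stest]))

(s/def ::email string?)
(s/def ::password string?)
(s/def ::credentials (s/keys :req [::email ::password]))

;; Generative testing: conjure spec-conforming test data for free.
(gen/generate (s/gen ::credentials))
;; => a random but valid map, e.g. {::email "K3q" ::password "..."}

;; Instrumentation: check arguments at call time during development.
(defn login [creds] (select-keys creds [::email]))
(s/fdef login :args (s/cat :creds ::credentials))
(stest/instrument `login)
;; (login {::email "a@b.com"}) now throws: ::password is missing.

;; Explanation: a data description of exactly which fact failed and why.
(s/explain-data ::credentials {::email 42})
```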
Generally, I prefer to create functions for specific queries rather than for specific "entity" types, with the return type of each function matching the result of its query. This fits the reality that queries often involve multiple entity types.
My favourite application-layer database tool so far is https://www.jooq.org/ because it generates code from the database schema, allowing type-safe construction of queries. I find this makes queries easier to create and maintain. It is a relatively unopinionated power tool, with minimal attempts at "automagic" behaviour. I find myself missing jOOQ now that I am not working much with Java.
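jOOQ itself is Java-only, but the "one function per query, result shaped like the query" idea carries over; here is a hedged Clojure sketch using next.jdbc, with made-up table and column names:

```clojure
(ns example.queries
  (:require [next.jdbc :as jdbc]))

;; One function per query. The returned maps carry exactly the columns
;; the SELECT names (next.jdbc qualifies keys by table, driver permitting,
;; e.g. :users/email), rather than hydrating User and Order entities.
(defn users-with-recent-orders
  "Joins two entity types; the result shape belongs to the query,
   not to a User or an Order class."
  [ds since]
  (jdbc/execute! ds
    ["SELECT users.id, users.email, orders.total
        FROM users
        JOIN orders ON orders.user_id = users.id
       WHERE orders.created_at > ?"
     since]))

;; (users-with-recent-orders datasource #inst "2025-10-01")
;; => [{:users/id 1, :users/email "a@b.com", :orders/total 42.0M} ...]
```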