Protobuffers Are Wrong (2018)

4 months ago

5 replies

I'm more than a little curious what event caused such a strong objection to protobuffers. :D

I do tend to agree that they are bad. I also agree that people put a little too much credence in "came from Google." I can't bring myself to have this much anger towards it. Had to have been something that sparked this.

rimunroe

4 months ago

2 replies

I'm just a frontend developer so most of my exposure is just as an API consumer and not someone working on the service side of things. That said:

A few years ago I moved to a large company where protobufs were the standard way APIs were defined. When I first started working with the generated TypeScript code, I was confused as to why almost all fields on generated object types were marked as optional. I assumed it was due to the way people were choosing to define the API at first, but then I learned this was an intentional design choice on the part of protobufs.

We ended up having to write our own code to parse the responses from the "helpfully" generated TypeScript client's responses. This meant we had to also handle rejecting nonsensical responses where an actually required field wasn't present, which is exactly the sort of thing I'd want generated clients to do. I would expect having to do some transformation myself, but not to that degree. The generated client was essentially useless to us, and the protocol's looseness offered no discernible benefit over any other API format I've used.

I imagine some of my other complaints could be solved with better codegen tools, but I think fundamentally the looseness of the type system is a fatal issue for me.

4 months ago

3 replies

Yeah, as soon as you have a moderately complex type the generated code is basically useless. Honestly, ~80% of my gripes about protocol buffers could be alleviated by just allowing me to mark a message field as required.

iamdelirium

4 months ago

2 replies

You think you do but you really don't.

What happens if you mark a field as required and then you need to delete it in the future? You can't because if someone stored that proto somewhere and is no longer seeing the field, you just broke their code.

ozgrakkurt

4 months ago

1 reply

Maybe you don’t delete it then?

4 months ago

I mean, this is essentially the same lesson that database admins learn with nullable fields. Often it isn't the "deleting one is hard" so much as "adding one can be costly."

It isn't that you can't do it. But the code side of the equation is the cheap side.

4 months ago

1 reply

If you need to deserialize an old version then it's not a problem. The unknown field is just ignored during deserialization. The problem is adding a required field since some clients might be sending the old value during the rollout.

But in some situations you can be pretty confident that a field will be required always. And if you turn out to be wrong then it's not a huge deal. You add the new field as optional first (with all upgraded clients setting the value) and then once that is rolled out you make it required.

And if a field is in fact semantically required (like the API cannot process a request without the data in a field) then making it optional at the interface level doesn't really solve anything. The message will get deserialized but if the field is not set it's just an immediate error which doesn't seem much worse to me than a deserialization error.

iamdelirium

4 months ago

1 reply

1. Then it's not really required if it can be ignored.

2. This is the problem, software (and protos) can live for a long time). They might be used by other clients elsewhere that you don't control. What you thought might not required 10 years down the line is not anymore. What you "think" is not a huge deal then becomes a huge deal and can cause downtime.

3. You're mixing business logic and over the wire field requirement. If a message is required for an interface to function, you should be checking it anyway and returning the correct error. How is that change with proto supporting require?

4 months ago

> Then it's not really required if it can be ignored.

It can be required in v2 but not in v1 which was my point. If the client is running v2 while the server is still on v1 temporarily, then there is no problem. The server just ignores the new field until it is upgraded.

> This is the problem, software (and protos) can live for a long time). They might be used by other clients elsewhere that you don't control. What you thought might not required 10 years down the line is not anymore. What you "think" is not a huge deal then becomes a huge deal and can cause downtime.

Part of this is just that trying to create a format that is suitable both as an rpc wire serialization format and ALSO a format suitable for long term storage leads to something that is not great for either use case. But even taking that into account, RDBMS have been dealing with this problem for decades and every RDBMS lets you define fields as non-nullable.

> If a message is required for an interface to function, you should be checking it anyway and returning the correct error. How is that change with proto supporting require?

That's my point, you have to do that check in code which clutters the implementation with validation noise. That and you often can't use the wire message in your internal domain model since you now have to do that defensive null-check everywhere the object is used.

Aside from that, protocol buffers are an interface definition language so should be able to encode some of the validation logic at least (make invalid states unrepresentable and all that). If you are just looking at the proto IDL you have no way of knowing whether a field is really required or not because there is no way to specify that.

4 months ago

1 reply

To add to the sibling, I've seen this with Java enums a lot. People will add it so that the value is consumed using the enum as fast as they can. This works well as long as the value is not retrieved from data. As soon as you do that, you lose the ability to add to the possible values in a rolling release way. It can be very frustrating to know that we can't push a new producer of a value before we first change all consumers. Even if all consumers already use switch statements with default clauses to exhaustively cover behavior.

4 months ago

1 reply

But this is something you should be able to handle on a case-by-case basis. If you have a type which is stored durably as protobuf then adding required fields is much harder. But if you are just dealing with transient rpc messages then it can be done relatively easily in a two step process. First you add the field as optional and then once all producers are upgraded (and setting the new field), make it required. It's annoying for sure but still seems better than having everything optional always and needing to deal with that in application code everywhere.

4 months ago

1 reply

Largely true. If you are at Google scale, odds are you have mixed fleets deployed. Such that it is a bit of involved process. But it is well defined and doable. I think a lot of us would rather not do a dance we don't have to do?

4 months ago

Sure, you just have to balance that against the cost of a poorly specified API interface. The errors because clients aren't clear on what is really required or not, what they should consider an error if it is not defined, etc. And of course all the boilerplate code that you have to write to convert the interface model to an internal domain model you can actually use inside your code.

cherrycherry98

4 months ago

3 replies

Proto2 let you do this and the "required" keyword was removed because of the problems it introduces when evolving the schema in a system with many users that you don't necessarily control. Let's say you want to add a new required field, if your system receives messages from clients some clients may be sending you old data without the field and now the parse step fails because it detects a missing field. If you ever want to remove a required field you have the opposite problem, there will components that have to have those fields present just to satisfy the parser even if they're only interested in some other fields.

Philosophically, checking that a field is required or not is data validation and doesn't have anything to do with serialization. You can't specify that an integer falls into a certain valid range or that a string has a valid number of characters or is the correct format (e.g. if it's supposed to be an email or a phone number). The application code needs to do that kind of validation anyway. If something really is required then that should be the application's responsibility to deal with it appropriately if it's missing.

The Captn Proto docs also describe why being able to declare required fields is a bad idea: https://capnproto.org/faq.html#how-do-i-make-a-field-require...

rimunroe

4 months ago

1 reply

> Philosophically, checking that a field is required or not is data validation and doesn't have anything to do with serialization.

My issue is that people seem to like to use protobuf to describe the shape of APIs rather than just something to handle serialization. I think it's very bad at the describing API shapes.

4 months ago

I think it is somewhat of a natural failure of DRY taken to the extreme? People seem to want to get it so that they describe the API in a way that is then generated for clients and implementations.

It is amusing, in many ways. This is specifically part of what WSDL aspired to, but people were betrayed by the big companies not having a common ground for what shapes they would support in a description.

instig007

4 months ago

1 reply

> Let's say you want to add a new required field, if your system receives messages from clients some clients may be sending you old data without the field and now the parse step fails because it detects a missing field.

A parser has to (inherently) neither fail (compatibility mode) nor lose the new field (a passthrough mode), nor allow diverging (strict mode). The fact that capnproto/parser authors don't realize that the same single protocol can operate in three different scenarios (but strictly speaking: at boundaries vs in middleware) at the same time, should not result in your thinking that there are problems with required fields in protocols. This is one of the most bizzare kinds of FUD in the industry.

kentonv

4 months ago

1 reply

Hi, I'm the apparently-FUD-spreading Cap'n Proto author.

Sure! You could certainly imagine extending Protobuf or Cap'n Proto with a way to specify validation that only happens when you explicitly request it. You'd then have separate functions to parse vs. to validate a message, and then you can perform strict validation at the endpoints but skip it in middleware.

This is a perfectly valid feature idea which many people have entertained an even implemented successfully. But I tend to think it's not worth trying to do have this in the schema language because in order to support every kind of validation you might want, you end up needing a complete programming language. Plus different components might have different requirements and therefore need different validation (e.g. middleware vs. endpoints). In the end I think it is better to write any validation functions in your actual programming language. But I can certainly see where people might disagree.

lostdog

4 months ago

1 reply

It gets super frustrating to have to empty/null check fields everywhere you use them, especially for fields that are effectively required for the message to make sense.

A very common example I see is Vec3 (just x, y, z). In proto2 you should be checking for the presence of x,y,z every time you use them, and when you do that in math equations, the incessant existence checks completely obscure the math. Really, you want to validate the presence of these fields during the parse. But in practice, what I see is either just assuming the fields exist in code and crashing on null, or admitting that protos are too clunky to use, and immediately converting every proto into a mirror internal type. It really feels like there's a major design gap here.

Don't get me started on the moronic design of proto3, where every time you see Vec3(0,0,0) you get to wonder whether it's the right value or mistakenly unset.

kentonv

4 months ago

1 reply

> It gets super frustrating to have to empty/null check fields everywhere you use them, especially for fields that are effectively required for the message to make sense.

That's why Protobuf and Cap'n Proto have default values. You should not bother checking for presence of fields that are always supposed to be there. If the sender forgot to set a field, then they get the default value. That's their problem.

> just assuming the fields exist in code and crashing on null

There shouldn't be any nulls you can crash on. If your protobuf implementation is returning null rather than a default value, it's a bad implementation, not just frustrating to use but arguably insecure. No implementation of mine ever worked that way, for sure.

lostdog

4 months ago

1 reply

Sadly, the default values are an even bigger source of bugs. We just caught another one at $work where a field was never being filled in, but the default values made it look fine. It caused hidden failures later on.

It's an incredibly frustrating "feature" to deal with, and causes lots of problems in proto3.

kentonv

4 months ago

You can still verify presence explicitly if you want, with the `has` methods.

But if you don't check, it should return a default value rather than null. You don't want your server to crash on bad input.

https://github.com/connectrpc/connect-es

4 months ago

> Philosophically, checking that a field is required or not is data validation and doesn't have anything to do with serialization

But protocol buffers is not just a serialization format it is an interface definition language. And not being able to communicate that a field is required or not is very limiting. Sometimes things are required to process a message. If you need to add a new field but be able to process older versions of the message where the field wasn't required (or didn't exist) then you can just add it as optional.

I understand that in some situations you have very hard compatibility requirements and it makes sense to make everything optional and deal with it in application code, but adding a required attribute to fields doesn't stop you from doing that. You can still just make everything optional. You can even add a CI lint that prevents people from merging code with required fields. But making required fields illegal at the interface definition level just strikes me as killing a fly with a bazooka.

4 months ago

It used to be that there was no official TypeScript protobuf generator from Google and third-party generators sucked. Using protobufs from web browser or in nodejs was painful.

Couple years ago Connect released very good generator for TypeScript, we use in in production and it's great:

mike_hearn

4 months ago

1 reply

He says that in the article; he had to work on a "compiler" project that was much harder than it should have been because of protobuf's design choices.

4 months ago

Yeah, I saw that. I took that as something that happened in the past, though. Certainly colored a lot of the thinking, but feels like something more immediate had to have happened. :D

mrits

4 months ago

I've used them almost daily for 15 years. They are way down the list of things I'd want improved. It has been interesting to see the protobuffers killers die out every few years though

jandrese

4 months ago

As a developer I always see "came from Google" as a yellow flag.

Too often I find something mildly interesting, but then realize that in order for me to try to use it I need to set up a personal mirror of half of Google's tech stack to even get it to start.

https://en.wikipedia.org/wiki/Jeff_Dean

4 months ago

I feel like I could have written an article like this at various points. Probably while spending two hours trying to figure out a way to represent some protobuf type in a sane way internally.

ndr

4 months ago

7 replies

Not even before the first line ends you get "They’re clearly written by amateurs".

This is a rage bait, not worth the read.

jilles

4 months ago

4 replies

The best way to get your point across is by starting with ad-hominem attacks to assert your superior intelligence.

tshaddox

4 months ago

1 reply

IMO it's a pretty reasonable claim about experience level, not intelligence, and isn't at all an ad hominem attack because it's referring directly to the fundamental design choices of protocol buffers and thus is not at all a fallacy of irrelevance.

compiler-guy

4 months ago

Whatever else Jeff Dean and Sanjay Ghemawat are, and whatever mistakes they made in designing protobufs, they are not amateurs.

Not long after they designed and implemented protobuffers, they shared the ACM prize in computing, as well as many other similar honors. And the honors keep stacking up.

None of this means that protobufs are perfect (or even good), but it does mean they weren't amateurs when they did it.

https://en.wikipedia.org/wiki/Sanjay_Ghemawat

instig007

4 months ago

1 reply

Yeah, let's pretend that type algebra doesn't exist, and even if it does exist then it's not useful and definitely isn't practical in data protocols. Let's believe that the authors of protobuf considered everything, and since they aren't amateurs (by the virtue of having worked on protobuf at Google, presumably), every elaborated opinion that draws them as amateurs at applying type algebra in data protocol designs is a personal ad-hominem attack.

tptacek

4 months ago

1 reply

They're not amateurs by virtue of being some of the most senior engineers ever to work at Google. You don't get to play the "ad hominem" card while calling them names. This whole thread is embarrassing.

instig007

4 months ago

1 reply

Ok, "some of the most senior engineers ever to work at Google" don't seem to know that static bounds checking don't require dependent types: https://news.ycombinator.com/item?id=45150008

> You don't get to play the "ad hominem" card while calling them names

The entire article explains it at length why there's the impression, it's not ad-hominem.

tptacek

4 months ago

Previous threads on this story have spelled out specifically which Googlers were behind this design, and, again, it's embarrassing that anybody is trying to defend the hill of "protobuf's designers were amateurs". You can keep digging in if you want.

perching_aix

4 months ago

Is this in reference to the blogpost, the comment above, or your own comment? Cause it honestly works for all of them.

notmyjob

4 months ago

I disagree, unless you are in the majority.

jeffbee

4 months ago

1 reply

Yep, the article opens with a Hall of Fame-grade compound fallacy: a strawman refutation of a hypothetical ad hominem that nobody has argued.

You can kinda see how this author got bounced out of several major tech firms in one year or less, each, according to their linkedin.

omnicognate

4 months ago

It's a terrible attitude and I agree that sort of thing shouldn't be (and generally isn't) tolerated for long in a professional environment.

That said the article is full of technical detail and voices several serious shortcomings of protobuf that I've encountered myself, along with suggestions as to how it could be done better. It's a shame it comes packaged with unwarranted personal attacks.

BugsJustFindMe

4 months ago

2 replies

If only the article offered both detailed analyses of the problems and also solutions. Wait, it does! You should try reading it.

pphysch

4 months ago

Where's the download link for the solution? I must have missed it.

kiitos

4 months ago

it does not

btilly

4 months ago

1 reply

The reasons for that line get at a fundamental tension. As David Wheeler famously said, "All problems in computer science can be solved by another level of indirection, except for the problem of too many indirections."

Over time we accumulate cleverer and cleverer abstractions. And any abstraction that we've internalized, we stop seeing. It just becomes how we want to do things, and we have no sense of what cost we are imposing with others. Because all abstractions leak. And all abstractions pose a barrier for the maintenance programmer.

All of which leads to the problem that Brian Kernighan warned about with, "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?" Except that the person who will have to debug it is probably a maintenance programmer who doesn't know your abstractions.

One of the key pieces of wisdom that show through Google's approaches is that our industry's tendency towards abstraction is toxic. As much as any particular abstraction is powerful, allowing too many becomes its own problem. This is why, for example, Go was designed to strongly discourage over-abstraction.

Protobufs do exactly what it says on the tin. As long as you are using them in the straightforward way which they are intended for, they work great. All of his complaints boil down to, "I tried to do some meta-manipulation to generate new abstractions, and the design said I couldn't."

That isn't the result of them being written by amateurs. That's the result of them being written to incorporate a piece of engineering wisdom that most programmers think that they are smart enough to ignore. (My past self was definitely one of those programmers.)

Can the technology be abused? Do people do stupid things with them? Are there things that you might want to do that you can't? Absolutely. But if you KISS, they work great. And the more you keep it simple, the better they work. I consider that an incentive towards creating better engineered designs.

b_e_n_t_o_n

4 months ago

I think you nailed it. So many complaints about Go for example basically come down to "it didn't let me create X abstraction" and that's basically the point.

TZubiri

4 months ago

> if (m_foo = null)

Imagine calling google amateurs, and then the only code you write has a first year student error in failing to distinguish assignment from comparision operator.

There's a class of rant on the internet where programmers complain about increasingly foundational tech instead of admitting skill issues. If you go far deep into that hole, you end up rewriting the kernel in Rust.

IncreasePosts

4 months ago

It's written by amateurs, but solves problems that only Google(one of the biggest/most advanced tech companies in the world) has.

awalsh128

4 months ago

Yeah, there is a lot of snark in the article which undermines their argument.

dano

4 months ago

5 replies

It is a 7 year old article without specifying alternatives to an "already solved problem."

So HN, what are the best alternatives available today and why?

gsliepen

4 months ago

2 replies

Something like MessagePack or CBOR, and if you want versioning, just have a version field at the start. You don't require a schema to pack/unpack, which I personally think is a good thing.

mgaunard

4 months ago

Arrow is also becoming a good contender, with the extra benefit it is better optimized for data batches.

fmbb

4 months ago

> You don't require a schema to pack/unpack

Then it hardly solves the same problem Protobuf solves.

rapsey

4 months ago

1 reply

There are none, protobufs are great.

nicce

4 months ago

1 reply

Depends. ASN.1 is a beast and another industry standard, but unfortunately the best tooling is closed source.

4 months ago

There was ZERO PB tooling in 2000. Just write it for ASN.1 instead.

akavi

4 months ago

Mentioned above: https://github.com/stepchowfun/typical

mdhb

4 months ago

CBOR is probably the best and most standards compliant thing out there that I’m aware of.

It’s the new default in a lot of IOT specs, it’s the backbone for deep space communication networks etc..

Maintains interoperability with JSON. Is very much battle tested in very challenging environments.

https://github.com/stepchowfun/typical

4 months ago

Support across languages etc is much less mature but I find thrift serialization format to be much nicer than protobuf. The codegen somehow manages to produce types that look like types I would actually write compared to the monstrosities that protoc generates.

lalaithion

4 months ago

19 replies

Protocol buffers suck but so does everything else. Name another serialization declaration format that both (a) defines which changes can be make backwards-compatibly, and (b) has a linter that enforces backwards compatible changes.

Just with those two criteria you’re down to, like, six formats at most, of which Protocol Buffers is the most widely used.

And I know the article says no one uses the backwards compatible stuff but that’s bizarre to me – setting up N clients and a server that use protocol buffers to communicate and then being able to add fields to the schema and then deploy the servers and clients in any order is way nicer than it is with some other formats that force you to babysit deployment order.

The reason why protos suck is because remote procedure calls suck, and protos expose that suckage instead of trying to hide it until you trip on it. I hope the people working on protos, and other alternatives, continue to improve them, but they’re not worse than not using them today.

jitl

4 months ago

6 replies

Not widely used but I like Typical's approach

> Typical offers a new solution ("asymmetric" fields) to the classic problem of how to safely add or remove fields in record types without breaking compatibility. The concept of asymmetric fields also solves the dual problem of how to preserve compatibility when adding or removing cases in sum types.

cornstalks

4 months ago

1 reply

I've never heard of Typical but the fact they didn't repeat protobuf's sin regarding varint encoding (or use leb128 encoding...) makes me very interested! Thank you for sharing, I'm going to have to give it a spin.

zigzag312

4 months ago

1 reply

It looks similar to how vint64 lib encodes varints. Total length of varint can be determined via the first byte alone.

haberman

4 months ago

2 replies

I advocated for PrefixVarint (which seems equivalent to vint64 ) for WebAssembly, but it was decided against, in favor of LEB128: https://github.com/WebAssembly/design/issues/601

The recent CREL format for ELF also uses the more established LEB128: https://news.ycombinator.com/item?id=41222021

At this point I don't feel like I have a clear opinion about whether PrefixVarint is worth it, compared with LEB128.

kannanvijayan

4 months ago

1 reply

Varint encoding is something I've peeked at in various contexts. My personal bias is towards the prefix-style, as it feels faster to decode and the segregation of the meta-data from the payload data is nice.

But, the thing that tends to tip the scales is the fact that in almost all real world cases, small numbers dominate - as the github thread you linked relates in a comment.

The LEB128 fast-path is a single conditional with no data-dependencies:

  if ! (x & 0x80) { x }

Modern CPUs will characterize that branch really well and you'll pay almost zero cost for the fastpath which also happens to be the dominant path.

It's hard to beat.

yencabulator

4 months ago

SQLite format equivalent:

  if x <= 240 { x }

while strictly improving all other aspects (at least IMHO)

https://sqlite.org/src4/doc/trunk/www/varint.wiki

zigzag312

4 months ago

Just remember that XML was more established than JSON for a long time.

rkagerer

4 months ago

1 reply

More direct link to the juicy bit: https://github.com/stepchowfun/typical?tab=readme-ov-file#as...

An asymmetric field in a struct is considered required for the writer, but optional for the reader.

sdenton4

4 months ago

1 reply

That's a nice idea... But I believe the design direction of proto buffers was to make everything `optional`, because `required` tends to bite you later when you realize it should actually be optional.

bilkow

4 months ago

2 replies

My understanding is that asymmetric fields provide a migration path in case that happens, as stated in the docs:

> Unlike optional fields, an asymmetric field can safely be promoted to required and vice versa.

> [...]

> Suppose we now want to remove a required field. It may be unsafe to delete the field directly, since then clients might stop setting it before servers can handle its absence. But we can demote it to asymmetric, which forces servers to consider it optional and handle its potential absence, even though clients are still required to set it. Once that change has been rolled out (at least to servers), we can confidently delete the field (or demote it to optional), as the servers no longer rely on it.

yencabulator

4 months ago

> My understanding is that asymmetric fields provide a migration path in case that happens, as stated in the docs:

If you can assume you can churn a generation of fresh data soonish, and never again read the old data. For RPC sure, but someone like Google has petabytes of stored protobufs, so they don't pretend they can upgrade all the writers.

sdenton4

4 months ago

....or we can just say that everything is optional always, and leave it to the servers instead of the protocol to handle irregularities.

4 months ago

2 replies

Seems like a lot of effort to avoid adding a message version field. I’m not a web guy, so maybe I’m missing the point here, but I always embed a schema version field in my data.

vouwfietsman

4 months ago

3 replies

I get that.

The point is that its hard to prevent asymmetry in message versions if you are working with many communicating systems. Lets say four services inter-communicate with some protocol, it is extremely annoying to impose a deployment order where the producer of a message type is the last to upgrade the message schema, as this causes unnecessary dependencies between the release trains of these services. At the same time, one cannot simply say: "I don't know this message version, I will disregard it" because in live systems this will mean the systems go out of sync, data is lost, stuff breaks, etc.

There's probably more issues I haven't mentioned, but long story short: in live, interconnected systems, it becomes important to have intelligent message versioning, i.e: a version number is not enough.

4 months ago

1 reply

I think I see what you’re getting at? My mental model is client and server, but you’re implying a more complex topology where no one service is uniquely a server or a client. You’d like to insert a new version at an arbitrary position in the graph without worrying about dependencies or the operational complexity of doing a phased deployment. The result is that you try to maintain a principled, constructive ambiguity around the message schema, hence asymmetrical fields? I guess I’m still unconvinced and I may have started the argument wrong, but I can see a reasonable person doing it that way.

vouwfietsman

4 months ago

1 reply

Yes thats a big part, but even bigger is just the alignment of teams.

Imagine team A building feature XYZ Team B is building TUV

one of those features in each team deals with messages, the others are unrelated. At some point in time, both teams have to deploy.

If you have to sync them up just to get the protocol to work, thats an extra complexity in the already complex work of the teams.

If you can ignore this, great!

It becomes even more complex with rolling updates though: not all deployments of a service will have the new code immediately, because you want multiple to be online to scale on demand. This creates an immediate necessary ambiguity in the qeustion: "which version does this service accept?" because its not about the service anymore, but about the deployments.

4 months ago

1 reply

Ah, I see. Team A would like to deploy a new version of a service. It used to accept messages with schema S, but the new version accepts only S’ and not S. So the only thing you can do is define S’ so that it is ambiguous with S. Team B uses Team A’s service but doesn’t want to have to coordinate deployments with Team A.

I think the key source of my confusion was Team A not being able to continue supporting schema S once the new version is released. That certainly makes the problem harder.

vouwfietsman

4 months ago

Exactly!

kiitos

4 months ago

> Lets say four services inter-communicate with some protocol, it is extremely annoying to impose a deployment order where the producer of a message type is the last to upgrade the message schema

i don't know how you arrived at this conclusion

the protocol is the unifying substrate, it is the source of truth, the services are subservient to the protocol, it's not the other way around

also it's not just like each service has a single version, each instance of each service can have separate versions as well!

what you're describing as "annoying" is really just "reality", you can't hand-wave away the problems that reality presents

1718627440

4 months ago

> one cannot simply say: "I don't know this message version, I will disregard it" because in live systems this will mean the systems go out of sync, data is lost, stuff breaks, etc.

You already need to deal with lost messages, rejected messages, so just treat this case the same. If you have versions surely you have code to deal with mismatches and e.g. fail back to the older version.

vineyardmike

4 months ago

1 reply

Idk I generally think “magic numbers” are just extra effort. The main annoyance is adding if statements everywhere on version number instead of checking the data field you need being present.

It also really depends on the scope of the issue. Protos really excel at “rolling” updates and continuous changes instead of fixed APIs. For example, MicroserviceA calls MicroserviceB, but the teams do deployments different times of the week. Constant rolling of the version number for each change is annoying vs just checking for the new feature. Especially if you could have several active versions at a time.

It also frees you from actually propagating a single version number everywhere. If you own a bunch of API endpoints, you either need to put the version in the URL, which impacts every endpoint at once, or you need to put it in the request/response of every one.

4 months ago

I think this is only a problem if you’re using a weak data interchange library that can’t use the schema number field to discriminate a union. Because you really shouldn’t have to write that if statement yourself.

atombender

4 months ago

I'm really hoping Typical will catch on, as I quite like the design. One important gap right now is the lack of Go and Python support.

summerlight

4 months ago

This seems interesting. Still not sure if `required` is a good thing to have (for persistent data like log you cannot really guarantee some field's presence without schema versioning baked into the file itself) but for an intermediate wire use cases, this will help.

zigzag312

4 months ago

This actually looks quite interesting.

jnwatson

4 months ago

1 reply

ASN.1 implements message versioning in an extremely precise way. Implementing a linter would be trivial.

4 months ago

1 reply

This. Plus ASN.1 is pluggable as to encoding rules and has a large family of them:

  - BER/DER/CER (TLV)
  - OER and PER ("packed" -- no tags and
                 no lengths wherever
                 possible)
  - XER (XML!)
  - JER (JSON!)
  - GSER (textual representation)
  - you can add your own!
    (One could add one based on XDR,
     which would look a lot like OER/PER
     in a way.)

ASN.1 also gives you a way to do things like formalize typed holes.

Not looking at ASN.1, not even its history and evolution, when creating PB was a crime.

StopDisinfo910

4 months ago

2 replies

The people who wrote PB clearly knew ASN.1. It was the most famous IDL at the time. Do you assume they just came one morning and decided to write PB without taking a look at what existed?

Anyway, as stated PB does more than ASN.1. It specifies both the description format and the encoding. PB is ready to be used out of the box. You have a compact IDL and a performant encoding format without having to think about anything. You have to remember that PB was designed for internal Google use as a tool to solve their problems, not as a generic solution.

ASN.1 is extremely unwieldy in comparaison. It has accumulated a lot of cruft through the year. Plus they don’t provide a default implementation.

troupo

4 months ago

1 reply

> The people who wrote PB clearly knew ASN.1.

And your assumption is based on what exactly?

> It was the most famous IDL at the time.

Strange that at the same time (2001) people were busy implementing everyting in Java and XML, not ASN.1

> Do you assume they just came one morning and decided to write PB without taking a look at what existed?

Yes, that is a great assumption. Looking at what most companies do, this is an assumption bordering on prescience.

StopDisinfo910

4 months ago

> Strange that at the same time (2001) people were busy implementing everyting in Java and XML, not ASN.1

Yes. Meanwhile Google was designing an IDL with a default binary serialisation format. And this is not 2025 typical big corp, over staffed, fake HR levels heavy Google we are talking about. That’s Google in its heyday. I think you have answered your own comment.

4 months ago

1 reply

> Do you assume they just came one morning and decided to write PB without taking a look at what existed?

Considering how bad an imitation of 1984 ASN.1 PB's IDL is, and how bad an imitation of 1984 DER PB is, yes I assume that PB's creators did not in fact know ASN.1 well. They almost certainly knew of ASN.1, and they almost certainly did not know enough about it because all the worst mistakes in ASN.1 PB re-created while adding zero new ideas or functionality. It's a terrible shame.

StopDisinfo910

4 months ago

PB is not a bad imitation of 1984 ASN.1. ASN.1 is choke full of useless representations clearly there to serve what a committee thought the need of the telco industry should be.

I find it funny you are making it looks like a good and pleasant to use IDL. It’s a perfect example of design by committee at its worst.

PB is significantly more space efficient than DER by the way.

mattnewton

4 months ago

2 replies

Exactly, I think of protobuffers like I think of Java or Go - at least they weren’t writing it in C++.

Dragging your org away from using poorly specified json is often worth these papercuts IMO.

const_cast

4 months ago

1 reply

Protobufs are better but not best. Still, by far, the easiest thing to use and the safest is actual APIs. Like, in your application. Interfaces and stuff.

Obviously if your thing HAS to communicate over the network that's one thing, but a lot of applications don't. The distributed system micro service stuff is a choice.

Guys, distributed systems are hard. The extremely low API visibility combined with fragile network calls and unsafe, poorly specified API versioning means your stuff is going to break, and a lot.

Want a version controlled API? Just write in interface in C# or PHP or whatever.

4 months ago

1 reply

> Protobufs are better but not best.

This sort of comments doesn't add anything to the discussion unless you are able to point out what you believe to be the best. It reads as an unnecessary and unsubstantiated put-down.

const_cast

4 months ago

I... did.

anonymousiam

4 months ago

The original RPC code, from which Google derived their protobuf stuff was written in (pre-ANSI) C at Sun Microsystems.

mgaunard

4 months ago

2 replies

in the systems I built I didn't bother with backwards compatibility.

If you make any change, it's a new message type.

For compatibility you can coerce the new message to the old message and dual-publish.

jimbokun

4 months ago

1 reply

That only works if you control all the clients.

mgaunard

4 months ago

Dual-publishing makes it transparent to older clients.

Obviously you need to track when the old clients have been moved over so you can eventually retire the dual-publishing.

You could also do the conversion on the receiving side without a-priori information, but that would be extremely slow.

o11c

4 months ago

I prefer a little builtin backwards (and forwards!) compatibility (by always enforcing a length for each object, to be zero-padded or truncated as needed), but yes "don't fear adding new types" is an important lesson.

tyleo

4 months ago

1 reply

We use protocol buffers on a game and we use the back compat stuff all the time.

We include a version number with each release of the game. If we change a proto we add new fields and deprecate old ones and increment the version. We use the version number to run a series of steps on each proto to upgrade old fields to new ones.

swiftcoder

4 months ago

1 reply

> We use the version number to run a series of steps on each proto to upgrade old fields to new ones

It sounds like you've built your own back-compat functionality on top of protobuf?

The only functionality protobuf is giving you here is optional-by-default (and mandatory version numbers, but most wire formats require that)

tyleo

4 months ago

1 reply

Yeah, I’d probably say something more like, “we leverage protobuf built ins to make a slightly more advanced back compat system”

We do rename deprecated fields and often give new fields their names. We rely on the field number to make that work.

vkou

4 months ago

1 reply

> We do rename deprecated fields and often give new fields their names. We rely on the field number to make that work.

Why share names? Wouldn't it be safer to, well, not?

tyleo

4 months ago

The code becomes hard to read. You might need to change int health to float health. In that case “health” properly describes the idea. We’d change this to int DEPRECATED_health and float health.

Folks can argue that’s ugly but I’ve not seen one instance of someone confused.

maximilianburke

4 months ago

2 replies

Flatbuffers satisfies those requirements and doesn’t have varint shenanigans.

leoc

4 months ago

1 reply

What about Cap’n Proto https://capnproto.org/ ? (Don't know much about these things myself, but it's a name that usually comes up in these discussions.)

usrnm

4 months ago

2 replies

Cap'n'proto is not very nice to work with in C++, and I'd discourage anyone from using it from other programming languages, the implementations are just not there yet. We use both cnp and protobufs at work, and I vastly prefer protobufs, even for C++. I only wish they stayed the hell away from abseil, though.

porridgeraisin

4 months ago

1 reply

I always thought people had a positive view on abseil, never used it myself other than when tinkering on random projects. What's the main issue?

usrnm

4 months ago

2 replies

The thing is a huge pain to manage as a dependency, especially if you wander away from the official google-approved way of doing things. Protobuf went from a breeze to use to the single most common source of build issues in our cross-platform project the moment they added this dependency. It's so bad that many distros and package managers keep the pre-abseil version as a separate package, and many just prefer to get stuck with it rather than upgrade. Same with other google libraries that added abseil as a dependency, as far as I'm aware

mkoubaa

4 months ago

1 reply

I'd rather they just used the abseil headers they needed with the abseil license at the top than make it a build dependency.

The concept of a package is antithetical to C++ and no amount of tooling can fix that.

usrnm

4 months ago

1 reply

abseil is not header-only, though

mkoubaa

4 months ago

Skill issue

jjmarr

4 months ago

I like abseil besides the compile times. Not having to specialize my own hash when using maps is nice.

yencabulator

4 months ago

The developer experience of capnproto is pretty darn miserable. I replaced my Rust use of it with https://rkyv.org/ -- probably the biggest ergonomic improvement was a single validation after which the message is safe to look at, instead of errors on every code path. The biggest downside was loss of built-in per-message schema evolution; in my use case I can have one version number up front.

AYBABTME

4 months ago

But you can't trust flatbuffers sent from unknown senders.

stickfigure

4 months ago

2 replies

Backwards compatibility is just not an issue in self-describing structures like JSON, Java serialization, and (dating myself) Hessian. You can add fields and you can remove fields. That's enough to allow seamless migrations.

It's only positional protocols that have this problem.

jimbokun

4 months ago

1 reply

At the cost of much larger payloads.

stickfigure

4 months ago

With gzip encoding... not really.

dangets

4 months ago

1 reply

You can remove JSON fields at the cost of breaking your clients at runtime that expect those fields. Of course the same can happen with any deserialization libraries, but protobufs at least make it more explicit - and you may also be more easily able to track down consumers using older versions.

nomel

4 months ago

For the missing case, whenever I use json, I always start with a sane default struct, then overwrite those with the externally provided values. If a field is missing, it will be handled reasonably.

4 months ago

1 reply

> Just with those two criteria you’re down to, like, six formats at most, of which Protocol Buffers is the most widely used.

What I dislike the most about blog posts like this is that, although the blogger is very opinionated and critical of many things, the post dates back to 2018, protobuf is still dominant, and apparently during all these years the blogger failed to put together something that they felt was a better way to solve the problem. I mean, it's perfectly fine if they feel strongly about a topic. However, investing so much energy to criticize and even throw personal attacks on whoever contributed to the project feels pointless and an exercise in self promotion at the expense of shit-talking. Either you put something together that you feel implements your vision and rights some wrongs, or don't go out of your day to put down people. Not cool.

ardit33

4 months ago

1 reply

JSON exists, and when compressed it is pretty efficient. (not as efficient as protobuff though).

For client facing protocol Protobufs is a nightmare to use. For Machine to Machine services, it is ok-ish, yet personally I still don't like it.

When I was at Spotify we ditched it for client side apis (server to mobile/web), and never looked back. No one liked working with it.

4 months ago

1 reply

> JSON exists (...)

The blog post leads with the personal assertion that "ad-hoc and built by amateurs". Therefore I doubt that JSON, a data serialization language designed by trimming most of JavaScript out and to be parses with eval(), would meet the opinionated high bar.

Also, JSON is a data interchange language, and has no support for types beyond the notoriously ill-defined primitives. In contrast, protobuf is a data serialization language which supports specifying types. This means that for JSON, to start to come close to meet the requirements met by protobuf, would need to be paired with schema validation frameworks and custom configurable parsers. Which it definitely does not cover.

ardit33

4 months ago

1 reply

You must be young. XML and XML Schemas existed before JSON or Protobuf, and people ditched them for a good reason and JSON took over.

Protobuf is just another version of the old RPC/Java Beans, etc... of a binary format. Yes, it is more efficient data wise than JSON, but it is a PITA to work on and debug with.

https://protobuf.dev/programming-guides/encoding/

4 months ago

> You must be young. XML and XML Schemas existed before JSON or Protobuf, and people ditched them for a good reason and JSON took over.

I'm not sure you got the point. It's irrelevant how old JSON or XML (a non sequitur) are. The point is that one of the main features and selling points of protobuf is strong typing and model validation implemented at the parsing level. JSON does not support any of these, and you need to onboard more than one ad-hoc tool to have a shot at feature parity, which goes against the blogger's opinionated position on the topic.

naikrovek

4 months ago

1 reply

TLV style binary formats are all you need. The “Type” in that acronym is a 32-bit number which you can use to version all of your stuff so that files are backwards compatible. Software that reads these should read all versions of a particular type and write only the latest version.

Code for TLV is easy to write and to read, which makes viewing programs easy. TLV data is fast for computers to write and to read.

Protobuf is overused because people are fucking scared to death to write binary data. They don’t trust themselves to do it, which is just nonsense to me. It’s easy. It’s reliable. It’s fast.

oftenwrong

4 months ago

Protobuf is typically serialised using a TLV-style encoding.

A major value of protobuf is in its ecosystem of tools (codegen, lint, etc); it's not only an encoding. And you don't generally have to build or maintain any of it yourself, since it already exists and has significant industry investment.

tgma

4 months ago

> And I know the article says no one uses the backwards compatible stuff but that’s bizarre to me – setting up N clients and a server that use protocol buffers to communicate and then being able to add fields to the schema and then deploy the servers and clients in any order is way nicer than it is with some other formats that force you to babysit deployment order.

Yet the author has the audacity to call the authors of protobuf (originally Jeff Dean et al) "amateurs."

yearolinuxdsktp

4 months ago

I agree that saying that no-one uses backwards compatible stuff is bizarre. Rolling deploys, being able to function with a mixed deployment is often worth the backwards compatibility overhead for many reasons.

In Java, you can accomplish some of this with using of Jackson JSON serialization of plain objects, where there several ways in which changes can be made backwards-compatibly (e.g. in the recent years, post-deserialization hooks can be used to handle more complex cases), which satisfies (a). For (b), there’s no automatic linter. However, in practice, I found that writing tests that deserialize prior release’s serialized objects get you pretty far along the line of regression protection for major changes. Also it was pretty easy to write an automatic round-trip serialization tester to catch mistakes in the ser/deser chain. Finally, you stay away from non-schemable ser/deser (such as a method that handles any property name), which can be enforced w/ a linter, you can output the JSON schema of your objects to committed source. Then any time the generated schema changes, you can look for corresponding test coverage in code reviews.

I know that’s not the same as an automatic linter, but it gets you pretty far in practice. It does not absolve you from cross-release/upgrade testing, because serialization backwards-compatibility does not catch all backwards-compatibility bugs.

Additionally, Jackson has many techniques, such as unwrapping objects, which let you execute more complicated refactoring backwards-compatibly, such as extracting a set of fields into a sub-object.

I like that the same schema can be used to interact with your SPA web clients for your domain objects, giving you nice inspectable JSON. Things serialized to unprivileged clients can be filtered with views, such that sensitive fields are never serialized, for example.

You can generate TypeScript objects from this schema or generate clients for other languages (e.g. with Swagger). Granted it won’t port your custom migration deserialization hooks automatically, so you will either have to stay within a subset of backwards-compatible changes, or add custom code for each client.

You can also serialize your RPC comms to a binary format, such as Smile, which uses back-references for property names, should you need to reduce on-the-wire size.

It’s also nice to be able to define Jackson mix-ins to serialize classes from other libraries’ code or code that you can’t modify.

jcgrillo

4 months ago

As someone who has written many mapreduce jobs over years old protobufs I can confidently report the backwards compatibility made it possible at all.

noitpmeder

4 months ago

Not that I love it -- but SBE (Simple Binary Encoding) is a _decent_ solution in the realm of backwards/forwards compatibility.

the__alchemist

4 months ago

This is always the thing to look for; "What are the alternatives?", and/why aren't there better ones.

I don't understand most use cases of protobufs, including ones that informed their design. I use it for ESP-hosted, to communicate between two MCUs. It is the highest-friction serialization protocol I've seen, and is not very byte-efficient.

Maybe something like the specialized serialization libraries (bincode, postcard etc) would be easier? But I suspect I'm missing something about the abstraction that applies to networked systems, beyond serialization.

mkoubaa

4 months ago

Real ones know that serialization is what sucks.

tshaddox

4 months ago

> Name another serialization declaration format that both (a) defines which changes can be make backwards-compatibly, and (b) has a linter that enforces backwards compatible changes.

The article covers this in the section "The Lie of Backwards- and Forwards-Compatibility." My experience working with protocol buffers matches what the author describes in this section.

tomrod

4 months ago

> Name another serialization declaration format that both (a) defines which changes can be make backwards-compatibly, and (b) has a linter that enforces backwards compatible changes.

ASCII text (tongue in cheek here)

orochimaaru

4 months ago

Protobufs aren’t new. They’re really just rpc over https. I’ve used dce-rpc in 1997 which had IDL. I believe CORBA used IDL as well although I personally did not use it. There have been other attempts like ejb, etc. which are pretty much the same paradigm.

The biggest plus with protobuf is the social/financial side and not the technology side. It’s open source and free from proprietary hacks like previous solutions.

Apart from that, distributed systems of which rpc is a sub topic are hard in general. So the expectation would be that it sucks.

dlahoda

4 months ago

https://github.com/dfinity/candid/blob/master/spec/Candid.md

Analemma_

4 months ago

3 replies

The "no enums as map keys" thing enrages me constantly. Every protobuf project I've ever worked with either has stringly-typed maps all over the place because of this, or has to write its own function to parse Map<String, V> into Map<K, V> from the enums and then remember to call that right after deserialization, completely defeating the purpose of autogenerated types and deserializers. Why does Google put up with this? Surely it's the same inside their codebase.

riku_iki

4 months ago

1 reply

And v1 and v2 protos didn't even have maps.

Also, why you use string as a key and not int?

Arainach

4 months ago

1 reply

proto2 absolutely supported the map type.

riku_iki

4 months ago

It could be, it looks like there was some versions misalignment:

The maps syntax is only supported starting from v3.0.0. The "proto2" in the doc is referring to the syntax version, not protobuf release version. v3.0.0 supports both proto2 syntax and proto3 syntax while v2.6.1 only supports proto2 syntax. For all users, it's recommended to use v3.0.0-beta-1 instead of v2.6.1. https://stackoverflow.com/questions/50241452/using-maps-in-p...

dweis

4 months ago

I believe that the reason for this limitation is that not all languages can represent open enums cleanly to gracefully handle unknown enums upon schema skew.

Arainach

4 months ago

Maps are not a good fit for a wire protocol in my experience. Different languages often have different quirks around them, and they're non-trivial to represent in a type-safe way.

If a Map is truly necessary I find it better to just send a repeated Message { Key K, Value V } and then convert that to a map in the receiving end.

mountainriver

4 months ago

2 replies

> Protobuffers correspond to the data you want to send over the wire, which is often related but not identical to the actual data the application would like to work with

This sums up a lot of the issues I’ve seen with protobuf as well. It’s not an expressive enough language to be the core data model, yet people use it that way.

In general, if you don’t have extreme network needs, then protobuf seems to cause more harm than good. I’ve watched Go teams spend months of time implementing proto based systems with little to no gain over just REST.

nicce

4 months ago

2 replies

On the other hand, ASN.1 is very expressive and can cover pretty much anything, but Protobuff was created because people thought ASN.1 is too complex. I guess we can't have both.

jandrese

4 months ago

1 reply

"Those who cannot remember the past are condemned to repeat it" -- George Santayana

theamk

4 months ago

1 reply

Oh, I remember ASN.1 very well, and I would not want to repeat it again.

Protobufs have lots of problems, but at least they are better than ASN.1!

4 months ago

1 reply

Details please.

Things people say who know very little about ASN.1:

- it's bloated! (it's not)

- it's had lots of vulnerabilities! (mainly in hand-coded codecs)

- it's expensive (it's not -- it's free and has been for two decades)

- it's ugly (well, sure, but so is PB's IDL)

- the language is context-dependent, making it harder to write a parser for (this is quite true, but so what, it's not that big a deal)

The vulnerabilities were only ever in implementations, and almost entirely in cases of hand-coded codecs, and the thing that made many of these vulnerabilities possible was the use of tag-length-value encoding rules (BER/DER/CER) which, ironically, Protocol Buffers bloody is too.

If you have a different objections to ASN.1, please list them.

theamk

4 months ago

1 reply

Neither of those, the main problems are:

- There is no backward or forward compatibility by default.

(Sure, you can have every SEQUENCE have all fields OPTIONAL and ... at the end, but how many real-life schemas like that you have seen? Almost every ASN.1 you can find on the internet is static SEQUENCE, with no extensibility whatsoever)

- Tools are bad.

Yes, protoc can be a PITA to integrate into build system, but at least it (1) exists, (2) well-tested (3) supports many languages. Compared to ASN.1 where the good tooling is so rare, people routinely manually parse/generate the files!

- Honorable mention: using "tag" in TLV to describe only the type and not field name - that SEQUENCE(30) tag will be all over the place, and the contents will be wildly different. Compare to protobuf, where the "tag" is field index, and that's exactly what allows such a great forward/backward compatibility.

(Could ASN.1 fix those problems? Not sure. Yes, maybe one could write better tooling, but all the existing users know that extensibility is for the weak, and non-optional SEQUENCEs are the way to go. It is easier to write all-new format than try to change existing conventions.)

4 months ago

> - There is no backward or forward compatibility by default.

ASN.1 in 1984 had it. Later ASN.1 evolved to have a) explicit extensibility markers, and b) the `EXTENSIBILITY IMPLIED` module option that implies every SEQUENCE, SET, ENUM, and other things are extensible by default, as if they ended in `, ...`.

There are good reasons for this change:

- not all implementors had understood the intent, so not all had implemented "ignore unexpected new fields"

- sometimes you want non-extensible things

- you may actually want to record in the syntax all the sets of extensions

> - Tools are bad.

But there were zero -ZERO!- tools for PB when Google created PB. Don't you see that "the tools that existed were shit" is not a good argument for creating tools for a completely new thing instead?

> - Honorable mention: using "tag" in TLV to describe only the type and not field name - that SEQUENCE(30) tag will be all over the place, and the contents will be wildly different. Compare to protobuf, where the "tag" is field index, and that's exactly what allows such a great forward/backward compatibility.

In a TLV encoding you can very much use the "type" as the tag for every field sometimes, namely when there would be no ambiguity due to OPTIONAL fields being present or absent, and when you do have such ambiguities you can resort to manual tagging with field numbers or whatever you want. For example:

  Thing ::= SEQUENCE {
    a UTF8String,
    b UTF8String
  }

works even though both fields get the same tag (when using a TLV encoding) because both fields are required, while this is broken:

  Broken ::= SEQUENCE {
    a UTF8String OPTIONAL,
    b UTF8String
  }

and you would have to fix it with something like:

  Fixed ::= SEQUENCE {
    a [0] UTF8String OPTIONAL,
    b UTF8String
  }

What PB does is require the equivalent of manually applying what ASN.1 calls IMPLICIT tags to every field, which is silly and makes it harder to decode data w/o reference to the module that defines its schema (this last is sketchy anyways, and I don't think it is a huge advantage for the ASN.1 BER/DER way of doing things, though others will disagree).

> (Could ASN.1 fix those problems? Not sure. Yes, maybe one could write better tooling, but all the existing users know that extensibility is for the weak, and non-optional SEQUENCEs are the way to go. It is easier to write all-new format than try to change existing conventions.)

ASN.1 does not have these problems.

Better tooling does exist and can exist -- it's no different than writing PB tooling, at least for a subset of ASN.1, because ASN.1 does have many advanced features that PB lacks, and obviously implementing all of ASN.1 is more work than implementing all of PB.

> It is easier to write all-new format than try to change existing conventions.

Maybe, but only if you have a good handle on what came before.

I strongly recommend that you actually read x.680.