Protobuffers Are Wrong (2018)
There are some legit criticisms here (remember, protobuf is a more-than-20-year-old format with extremely high expectations of backward compatibility, so there are lots of unfixable issues), but in general this article reveals a fundamental misunderstanding of protobuf's goal. It is not a tool for elegant description of schemas/data, but for evolving distributed systems composed of thousands of services and storage systems. We're not talking about just code, but about petabytes of data potentially communicated through an incomprehensibly complex topology.
Significant previous discussions:
https://news.ycombinator.com/item?id=21871514 (211 comments)
https://news.ycombinator.com/item?id=18188519 (298 comments)
In particular, Kenton's rebuttal at https://news.ycombinator.com/item?id=18190005 is worth a read.
The type system woes are quite legit. I wonder how many of these were improved in FlatBuffers, which has a ton of other good things going for it, chiefly far lighter serialization demands.
Once you start doing RPC, the demands go way up. Cap'n Proto's ability to return & process future values is such an exciting twist, one that melds with where we are with local procedure calls: they return promise objects that we can pass around while work goes on.
Kenton popped up a couple months ago to mention maybe finding time for multi-party support, where I could, for example, request the address of a building & send the future result to a 3rd-party system. Now this is less just a way to serialize some stuff, & more a way to imagine connecting interesting systems.
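A rough local analogy of that promise-passing idea, using plain asyncio rather than Cap'n Proto (all names here are invented for illustration): one call's result travels onward as a pending future, and the downstream party awaits it while work continues.

```python
import asyncio

# Toy sketch (not Cap'n Proto): the result of one call is handed along
# as a future, so downstream work can be wired up before the value
# actually arrives.

async def lookup_address(building_id: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for a remote call
    return f"address-of-{building_id}"

async def notify_third_party(address_future: "asyncio.Future[str]") -> str:
    # The "third party" receives a pending result, not a concrete value.
    address = await address_future
    return f"notified with {address}"

async def main() -> str:
    # Kick off the lookup, then immediately pass the *pending* result onward.
    pending = asyncio.ensure_future(lookup_address(42))
    return await notify_third_party(pending)

print(asyncio.run(main()))  # → notified with address-of-42
```

In a real promise-pipelining RPC system the future would refer to a value on a remote machine, and the runtime would route the dependency without a round trip back to the caller.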
Missing the point. The whole point of protobufs is to be a stable wire format that's forward compatible. They can't have sum types because you can't add one of those in a forward compatible way; oneof works the way it does because that's the only way it can. (I do think repeated is overengineered and should just be a collection field, which would solve the problem of not composing with oneof, but that's honestly just a minor wart).
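A sketch of why that is (message and field names invented for illustration): on the wire, a oneof is just a set of ordinary optional fields with a last-one-wins rule, which is exactly what lets an old reader skip a newly added case instead of failing the way a strict sum type would.

```proto
syntax = "proto3";

message Payment {
  // An old binary that only knows `card` and `bank_transfer` treats a
  // newly added `wallet` case as an unknown field and skips it -- the
  // forward compatibility a closed sum type can't offer.
  oneof method {
    string card = 1;
    string bank_transfer = 2;
    string wallet = 3;  // added later; old readers skip it safely
  }
}
```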
Also if your language's bindings don't preserve the relationship between x and has_x and don't play nice with recursion schemes, use a better binding generator! You don't have to use the default one.
> They’re clearly written by amateurs
Protocol buffers were written by Jeff Dean and Sanjay Ghemawat. Whether you see issues with the implementation or not and whether they are fit for your use case or not are valid discussions, but if your argument starts with "omg these idiots couldn't design a simple product" then I'm already reading the rest of it with a huge grain of salt.
This was hard to read because of its glaring omission of existing engineering practices. It feels like the author only cares about programming purity and writing programs that transform metadata.
Making all fields in a message required makes messages into product types but loses compatibility with older or newer versions of the protocol. Auto-creating objects on read dramatically increases the chances that a field removal in a future version of the protocol will be handled properly. Auto-creating on write simplifies writing complex setters, etc.
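A toy illustration of that read behavior (this is a simulation, not the real protobuf runtime; the field names are made up): a reader that substitutes defaults for absent fields keeps working even when a newer writer stops emitting a field.

```python
# Toy model of default-on-read: any field missing from the wire falls
# back to its type's default, so removing a field never breaks readers.

DEFAULTS = {"name": "", "age": 0, "nickname": ""}

def read_message(wire: dict) -> dict:
    # Absent fields are auto-created with default values instead of
    # raising an error at access time.
    return {field: wire.get(field, default) for field, default in DEFAULTS.items()}

# A newer writer dropped "nickname" entirely; the read still succeeds.
msg = read_message({"name": "Ada", "age": 36})
print(repr(msg["nickname"]))  # → '' (the default), not a KeyError
```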
You could argue that those problems could be solved better (and I might agree; protobufs are one of my least favorite serialization protocols), but not even acknowledging the reasons, and pretending the designers didn't know better, makes the writer seem either ignorant or to be arguing in bad faith.
protobufs are a data exchange format. The schema needs to map clearly to the wire format because it is a notation for the wire format, not your object model du jour.
If the protobuf schema code generator had to translate these suggestions into efficient wire representations and efficient language representations, it would be more complex than half the compilers of the languages it targets.
I do mourn the days when a project could bang out a great purpose-built binary serialization format and actually use it. But half the people I hire today, and everyone on the team down the hallway that needs to use our API, can no longer do that. I'm lucky if they know how two's complement works.
I've worked with many serialization formats (XDR, SOAP XML w/ XML schema, CORBA IDL and IIOP, JSON with and without schema, pickle, and many more). Protocol buffers remind me of XDR (https://en.wikipedia.org/wiki/External_Data_Representation). Which was a great technology in the day; NFS and NIS were implemented using it.
I do agree with some of the schema modelling criticisms in this article, but the ultimate thing to understand is: protocol buffers were invented to allow Google to upgrade servers of different services (ads and search) asynchronously and still be able to pass messages between them that can be (at least partly) decoded. They were then adopted for wide-ranging data modelling and gained a number of needed features, but also evolved fairly poorly.
IIRC you still can't make a protocol buffer larger than 2GB because the Java implementation uses signed 32-bit integers for sizes (experts correct me if I'm wrong) and wouldn't change. This is a problem when working with large serialized DL models.
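The arithmetic behind that cap, assuming the size really is tracked in a Java `int` (a signed 32-bit type):

```python
# Java's int is signed 32-bit, so any length stored in one tops out at
# 2**31 - 1 bytes -- just under 2 GiB, far short of a multi-gigabyte
# serialized model.

java_int_max = 2**31 - 1
print(java_int_max)  # → 2147483647
print(java_int_max / 2**30)  # just under 2.0 GiB
```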
I was talking to Rob Pike over coffee one morning and he said everything could be built with just nested sequences of key/value pairs and no schema (i.e., placing all the burden for decoding a message semantically on the client), and I don't think he's completely wrong.
I find rolling my own serialization protocol easy enough, and I want some friction from using it: I've been burned when it is too easy and people start to use IPC where a function call would do just as well and be more efficient. Premature pessimization is a large problem these days.
Personally I avoid Map fields in protos. I don't find a huge amount of value for a Map<int, Foo> over a repeated Foo with an int key. That tends to be more flexible over time (since you can use different key fields) and it sidesteps all of the issues about composability of them.
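A sketch of that trade-off (message and field names invented for illustration): carrying the key inside the repeated message means a second candidate key can be added later without restructuring the container.

```proto
syntax = "proto3";

message Foo {
  int32 id = 1;     // the "map key", now just an ordinary field
  string name = 2;  // a later-added alternative key costs nothing
}

message FooSet {
  repeated Foo foos = 1;  // instead of: map<int32, Foo> foos = 1;
}
```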
I think required fields are fine, provided that you understand that "required" means "required forever". If you're already using protos this isn't exactly a brand new concept. When you use a field number that field number is assigned forever, you can't reuse it once that field is deprecated. It requires a bit more thoughtful design, and obviously not all fields should be required, but it has value in some places.
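The "assigned forever" rule the comment leans on is what proto's `reserved` keyword enforces (illustrative names):

```proto
syntax = "proto3";

message Account {
  // Field 2 once held a now-deleted field; reserving its number and
  // name stops anyone from reusing them with a different meaning.
  reserved 2;
  reserved "legacy_email";

  string id = 1;
  string email = 3;
}
```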
"They’re clearly written by amateurs, unbelievably ad-hoc, mired in gotchas, tricky to compile, and solve a problem that nobody but Google really has."
Quite a statement from someone that spent less than a year at Google.
Personally I really appreciate protobuf; it has saved me and teams I've worked with a ton of time and effort.
Putting this article on my ignore list.
I worked a lot with protocol buffers, and for the problem they claim to solve, they solve it well. I can't really grasp this criticism. That extreme type hell that Java coders like is not really a thing outside the Java world.
If it is actually so terrible, I wonder why no player in the industry (e.g. Microsoft or some other big player) comes up with a better protocol.
Protobuf is the worst serialization format and schema IDL, except for all the others.
The truth is, a lot of these critiques are valid. You could redesign protobuf from the ground up and improve it greatly. However, I think this article is bad. Firstly, it repeatedly implies that protobuf was designed by amateurs because of its design pitfalls, ignoring the fact that it's also clear that protobuf grew organically into what it is today rather than deliberately. Secondly, I do not feel it is giving enough credit to the idea that protobuf simply was not designed to be elegant in the first place. It's not a piece of modern art. It is not a Ferrari. It is a Ford F150.
What protobuf does have right is the stuff that's boring; it has a ton of language bindings that are mostly pretty efficient and complete. I'm not going to say it's all perfect, because it's not. But protobuf has an ecosystem that, say, Cap'n Proto does not. This matters more than the fact that protobuf is inelegant. Protobuf may be inelegant, but it does in fact get the job done for many of the use cases.
In my opinion, protobuf is actually good enough. It's not that ugly. Most programming languages you would use it from are more inelegant and incongruent than it is. I don't feel like modelling data or APIs in it is overly limited by this. Etc, etc. That said, if you want to make the most of protobuf, you'll want to make use of more than just protoc, and that's the benefit of having a rich ecosystem: there are plenty of good tools and libraries built around it. I'll admit that outside of Google, it's a bit harder to get going, but my experiences on side projects have made me feel that protobuf outside of Google is still a great idea. It is, in my opinion, significantly less of a pain in the ass than using OpenAPI based solutions or GraphQL, and much more flexible than language-specific tools like tRPC.
For a selection of what the ecosystem has to offer, I think this list is a good start.