I Spent a Year Making an Asn.1 Compiler in D
Posted 2 months ago · Active 2 months ago
bradley.chatha.dev · Tech · story · High profile
calm · mixed
Debate: 40/100
Key topics
Asn.1
D Programming Language
Compiler Development
The author spent a year developing an ASN.1 compiler in D and shares their experience, sparking a discussion about the complexities and challenges of working with ASN.1.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: N/A
Peak period: 64 comments in 0-6h
Avg / period: 20
Comment distribution: 160 data points
Based on 160 loaded comments
Key moments
- 01 Story posted: Oct 23, 2025 at 8:47 AM EDT (2 months ago)
- 02 First comment: Oct 23, 2025 at 8:47 AM EDT (0s after posting)
- 03 Peak activity: 64 comments in 0-6h (hottest window of the conversation)
- 04 Latest activity: Oct 25, 2025 at 2:17 PM EDT (2 months ago)
ID: 45681200 · Type: story · Last synced: 11/20/2025, 8:32:40 PM
So I threw a bunch of semi-related ramblings together and I'm daring to call it a blog post.
Sorry in advance since I will admit it's not the greatest quality, but it's really not easy to talk about so much with such brevity (especially since I've already forgotten a ton of stuff I wanted to talk about more deeply :( )
It was hilarious because clearly none of the people who were in favor had ever used ASN.1.
[1] https://news.ycombinator.com/user?id=cryptonector
[2] https://github.com/heimdal/heimdal/tree/master/lib/asn1
I especially like XDR, though maybe that's because I worked at Sun Microsystems :)
"Pointers" in XDR are really just `OPTIONAL` in ASN.1. Seems so silly to call them pointers. The reason they called them "pointers" is that that's how they represented optionality in the generated structures and code: if the field was present on the wire then the pointer is not null, and if it was absent the then pointer is null. And that's exactly what one does in ASN.1 tooling, though maybe with a host language Optional<> type rather than with pointers and null values. Whereas in hand-coded ASN.1 codecs one does sometimes see special values used as if the member had been `DEFAULT` rather than `OPTIONAL`.
The main problem is that to work with the data you need to understand the semantics of the magic object identifiers, and while things like the PKIX module can be found easily, the definitions for other, more obscure namespaces for extensions can be harder to locate, as they are scattered across documentation from various standardization organizations.
So, protobuf could very well have been transported in DER; the issue was probably more one of Google not seeing any value in interoperability and wanting to keep it simple (or worse, clashes from oblivious users re-using the wrong, less well documented namespaces).
You are truly a masochist and I salute you.
Completely ignoring the ASN.1 support for complicated structures, with more than one CVE linked to incorrect parsing of these text fields.
We _are_ using subject DNs for linking certs to their issuers, but though that's "free-form", we don't parse them, we only check for equality.
With wildcards.
The problems with PKI/PKIX all go back to terrible, awful, no good, very bad ideas about naming that people in the OSI/European world had in the 80s -- the whole x.400/x.500 naming style where they expected people to use something like street addresses as digital names. DNS already existed, but it seems almost like those folks didn't get the memo, or didn't like it.
What people forget is that you do not have to use the whole set of schema attributes.
And encrypted and/or signed email? That too, though very poorly, but the issue there is key management, and DAP/LDAP don't help because in the age of spam public directories are not a thing. Right now the best option for cryptographic security for email is hop-by-hop encryption using DANE for authentication in SMTP, with headers for requesting this, and headers for indicating whether received email transited with cryptographic protection all the way from sender to recipient.
As for the "ability to address groups in flexible ways", I'm not sure what that means, but I've never seen group distribution lists not be sufficient.
As for group addressing, distribution lists are pitiful in comparison especially on discovery side.
Anyway, ultimately the big issue is that the DAP schema is always presented as "oh you need all the details", when... you don't. And we never got to the point of really implementing things well outside the more expected use case where people do not, actually, use them directly but pick by name/function from a directory.
Oh I can't remember. Binary attachments have worked since I started using them long long ago. It worked at least in the mid-90s. Back then I was using both, Internet email and x.400 (HP OpenMail!), and x.400 was a massive pain (for me especially since I was one of the people who maintained a gateway between the two). I know what you're referring to: it took a long time for email to get "8-bit clean" / MIME because of the way SMTP works, but MIME was very much a thing by the mid-90s.
So it took a while if you count the days of UUCP email -- round it to two decades. But "by the mid-90s" was plenty good enough because that's when the Internet revolution hit big companies. Lack of binary attachments wasn't something that held back Internet adoption. As far as the public and corps are concerned the Internet only became a thing circa 1994 anyways.
> As for group addressing, distribution lists are pitiful in comparison especially on discovery side.
Discovery, meaning directories. Those are nice inside corporate networks, which is where you need this functionality, so I agree, and yes people use Exchange / Exchange 365 / Outlook for this sort of thing, though even mutt can do LDAP-based discovery (poorly, but yes). Outside corporate networks directories are only useful within academia and governments / government labs. Outside all of that no one wants directories because they would only encourage the spammers.
Really, I do.
In particular I like:
- that ASN.1 is generic, not specific to a given encoding rules (compare to XDR, which is both a syntax and a codec specification)
- that ASN.1 lets you get quite formal if you want to in your specifications
For example, RFC 5280 is the base PKIX spec, and if you look at RFCs 5911 and 5912 you'll see the same types (and those of other PKIX-related RFCs) with more formalisms. I use those formalisms in the ASN.1 tooling I maintain to implement a recursive, one-shot codec for certificates in all their glory.
- that ASN.1 has been through the whole evolution of "hey, TLV rules are all you need and you get extensibility for free!!1!" through "oh no, no that's not quite right is it" through "we should add extensibility functionality" and "hmm, tags should not really have to appear in modules, so let's add AUTOMATIC tagging" and "well, let's support lots of encoding rules, like non-TLV binary ones (PER, OER) and XML and JSON!".
Protocol Buffers is still stuck on TLV, all done badly by comparison to BER/DER.
(However, like many other formats (including JSON, XML, etc), ASN.1 can be badly used.)
If it was being used in homogenous environments the way protocol buffers typically are, where the schemas are usually more reasonable and both read and write side are owned by the same entity, it might not have gotten such a bad rap...
:) Glad to see someone else who's gone down this road as well.
[1] https://en.wikipedia.org/wiki/Aeronautical_Telecommunication...
(I worked in telecommunications when ASN.1 was a common thing)
For example, when I inherited a public key signature system (mainly retrieving certificates and feeding them to cryptographic primitives and downloading and checking certificate revocation lists) everything troublesome was left by dismissed consultants; there were libraries for dealing with ASN.1 and I only had to educate myself about the different messages and their structure, like with any other standard protocol.
The various WIP features, and switching focus of what might bring more people into the ecosystem, have given way to other languages.
Even C#, Java and C++ have gotten many of the features that were only available in D when Andrei Alexandrescu's book came out in 2011.
I imagine that the scope of its uses has shrunk as other languages caught up, and I don't think it's necessarily a good language for general enterprise stuff (unless you're dealing with C++), but for new projects it's still valid IMO. I think that the biggest field it could be used in is probably games too, especially if you're already writing a new engine from scratch. You could start with the GC and then ease off of it as the project develops in order to speed up development, for example. And D could always add newer features again too, tbh.
Another thing that Java and C# got to do since 2011 is that AOT is also part of the ecosystem and free of charge (Java has had commercial compilers for a while), so not even a native binary is the advantage you imagine.
First D has to finish what is already there in features that are almost done but not quite.
Do you have any other ideas about how D could stand out again?
That is why outside startups selling a specific product, most IT departments are polyglot.
For D to stand out, there must be a Rails, Docker like framework, something, that is getting such a buzz that makes early adopters want to go play with D.
However I don't see it happening in the LLM age, where at the whim of a prompt thoughts can be generated in whatever language, which is only a transition step until we start having agent runtimes.
C++ modules, well, they should have more closely copied D modules!
Worse is better approach tends to always win.
And still today, the first thought that comes to mind when I think D is "that language with proprietary compilers", even though there has apparently been some movement on that front? Not really worth looking into now that we have Go as an excellent GC'd compiled language and Rust as an excellent C++ replacement.
Having two different languages for those purposes seems like a better idea anyway than having one "optionally managed" language. I can't even imagine how that could possibly work in a way that doesn't just fragment the community.
> There are a lot of loud people in the D community who freak out and whine about the GC, and there are plenty more quiet ones who are happily getting things done without making much noise
This sounds like an admission that the community is fractured, except with a weirdly judgemental tone towards those who use D without a GC?
"Are you saying that if I'm using Rust in the Linux kernel, I can use any Rust library, including ones written with the assumption they will be running in userspace? If not, how does that not fracture the community?"
"Are you saying that if I'm using C++ in an embedded environment without runtime type information and exceptions, I can use any C++ library, including ones written with the assumption they can use RTTI/exceptions? If not, how does that not fracture the community?"
You can make this argument about a lot of languages and particular subsets/restrictions on them that are needed in specific circumstances. If you need to write GC-free code in D you can do it. Yes, it restricts what parts of the library ecosystem you can use, but that's no different from any other language that has wide adoption in a wide variety of applications. It turns out that in reality most applications don't need to be GC-free (the massive preponderance of GC languages is indicative of this) and GC makes them much easier and safer to write.
I think most people in the D community are tired of people (especially outsiders) constantly rehashing discussions about GC. It was a much more salient topic before the core language supported no-GC mode, but now that it does it's up to individuals to decide what the cost/benefit analysis is for writing GC vs no-GC code (including the availability of third-party libraries in each mode).
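As a concrete illustration of the no-GC mode mentioned above: D's `@nogc` attribute is the mechanism that makes the compiler statically reject anything that might allocate on the GC heap, including calls into libraries that aren't themselves `@nogc`. A minimal sketch (the function is invented for illustration):

```d
// @nogc: the compiler rejects GC allocations in here, including calls
// to functions that are not themselves @nogc.
@nogc nothrow pure
int sumSquares(const(int)[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x * x;
    return total;
}

// Neither of these would compile inside a @nogc function:
//   auto copy = xs.idup;        // GC allocation
//   auto buf  = new int[](10);  // GC allocation
```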
> If you need to write GC-free code in D you can do it.
This seems correct, with the emphasis. Plenty of people make it sound like the GC in D is no problem because it's optional, so if you don't want GC you can just write D without a GC. It's a bit like saying that the stdlib in Rust is no problem because you can just use no_std, or that exceptions in C++ are no problem because you can just use -fno-exceptions. All these things are naïve for the same reason; they lock you out of most of the ecosystem.
That's not what I'm saying, and who cares if it's fractured or not? Why should that influence your decision making?
There are people who complain loudly about the GC, and then there are lots of other people who do not complain loudly and also use D in many different interesting ways. Some use the GC, some don't. People get hyper fixated on the GC, but it isn't the only thing going on in the language.
Because, if I want to write code in D without the GC, it impacts me negatively if I can't use most of the libraries created by the community.
I will say there are a larger and larger number of no-GC libraries. Phobos is getting an overhaul which is going to seriously reduce the amount that the GC is used.
It is probably also worth reflecting for a moment why the GC is causing problems for you. If you're doing something where you need some hard-ish realtime guarantee, bear in mind that the GC only collects when you allocate.
It's also possible to roll your own D runtime. I believe people have done things like replace the GC with allocators. The interface to the GC is not unduly complicated. It may be possible to come up with a creative solution to your problem.
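For context on the "collects only when you allocate" point: collections can also be paused and triggered explicitly through `core.memory.GC`. A rough sketch of the usual pattern for latency-sensitive sections (the function and loop are illustrative only):

```d
import core.memory : GC;

void latencySensitiveLoop()
{
    GC.disable();                // no collections will run in this section
    scope (exit)
    {
        GC.enable();
        GC.collect();            // pay the collection cost at a time of our choosing
    }

    foreach (frame; 0 .. 1_000)
    {
        // per-frame work; avoid allocating, or allocate knowing that
        // nothing will be collected until after the loop
    }
}
```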
I'm happy with C++ and Rust for those tasks. For other tasks where I want a GC, I'm perfectly happy with some combination of Go, Python and Typescript. I'm realistically never going to look into D.
I think you're really overstepping with "AI is eaten by Python". I can imagine an AI stack without Python: llama.cpp (for inference, not training... isn't completely that, but most of the core functionality is not Python, and not GC'd at all). I cannot imagine an AI stack without CUDA + C++. Even the premier Python tools (PyTorch, vLLM) would be non-functional without these tools.
While some very common interfaces to AI require a GC'd language, I think if you deleted the non-GC parts you'd be completely stuck and have years of digging yourself out, but if you deleted the GC parts you could end up with a usable thing in very short order.
Thus now you can use PTX directly from Python, and with the new cu Tiles approach, you can write CUDA kernels in a Python subset.
Many of these tools get combined because that is what is already there, and the large majority of us don't want, or don't have the resources, to spend bootstrapping a whole new world.
Until there is some monetary advantage in doing so.
Wait, so are you, or are you not, saying that a GC-less D program can use libraries written with the assumption that there's a GC? The statement "there's nothing stopping [you] from not using the GC" implies that all libraries work with D-without-GC, otherwise the lack of libraries written for D-without-GC would be stopping you from not using the GC.
It's not all or nothing with the GC, as I explained in another reply. There are many libraries that use the GC, many that don't. If you're writing code with the assumption that you'll just be plugging together libraries to do all the heavy lifting, D may not be the right language for you. There is a definite DIY hacker mentality in the community. Flexibility in attitude and approach are rewarded.
Something else to consider is that the GC is often more useful for high level code while manual memory management is more useful for low level code. This is natural because you can always use non-GC (e.g. C) libraries from GC, but (as you point out) not necessarily the other way around. That's how the language is supposed to be used.
You can use the GC for a GUI or some other loose UI thing and drop down to tighter C style code for other things. The benefit is that you can do this in one language as opposed to using e.g. Python + C++. Debugging software written in a mixture of languages like this can be a nightmare. Maybe this feature is useful for you, maybe not. All depends on what you're trying to do.
It looks like you work in a ton of different domains. I think based on what I've written in response to you so far, it should be easy to see that D is likely a good fit for some things you work on and a bad fit for others. I don't see what the problem is.
The D community is full of really nice and interesting people who are fun to interact with. It's also got a smaller number of people who complain loudly about the GC. This latter contingent of people is perceived as being maybe a bit frustrating and unreasonable.
I don't care whether you check D out or not. But your initial foray into this thread was to cast shade on D by mentioning issues with proprietary compilers (hasn't been a thing in years), and insinuating that the community was fractured because of the GC. Since you clearly don't know that much about the language and have no vested interest in it, why not leave well enough alone instead of muddying the waters with misleading and biased commentary?
Stop making excuses or expecting others to do your work for you.
Nothing is preventing you.
Unfortunely those that cry about GCs are still quite vocal, at least we can now throw Rust into their way.
If you understand how it's hooked into, it's very easy to work with. There is only one area of the language related to closure context creation that can be unexpected.
I honestly think if Walter Bright (or anyone within D) invested in having a serious web framework for D, even if it's not part of the standard library, it could be worth its weight in gold. Right now there's only Vibe.d that stands out, but I have not seen it grow very much since its inception; it's very slow moving. Give me a feature-rich web framework in D comparable to Django or Rails and all my side projects will shift to D. The real issue is it needs to be batteries included, since D does not have dozens of OOTB libraries to fill in gaps with.
Look at Go as an example: built-in HTTP server library, production ready, it's not ultra fancy but it does the work.
There are plenty of people who aren't interested in using languages with proprietary toolchains. Those people typically don't use C#. The people who don't mind proprietary toolchains typically write software for an environment where D isn't relevant, such as .NET or the Apple world.
Currently we need to get a stackless coroutine into the language, actors for windowing event handling, reference counting and a better escape analysis story to make the experience really nice.
This work is not scheduled for PhobosV3 but a subset such as a web client with an event loop may be.
Lately I've been working on some exception handling improvements and start on the escape analysis DFA (but not on the escape analysis itself). So the work is progressing. Stackless coroutine proposal needs editing, but it is intended to be done at the start of next year for approval process.
One of the issues I've seen in the community is just that there aren't enough people in the community with enough interest and enough spare time to spend on a large project. Everyone in the core team is focused on working on the actual language (and day-jobs), while everyone else is doing their own sort of thing.
From your profile you seem to have a lot of experience in the field and in software in general, so I'd like to ask you if you have any other advice for getting the language un-stuck, especially with regards to the personnel issues. I think I'd like to take up your proposal for a web framework as well, but I don't really have any knowledge of web programming beyond the basics. Do you have any advice on where to start or what features/use case would be best as well?
I'd rather give that to languages like C# with Native AOT, or Swift (see chapter 5 of the GC Handbook).
D only lacks someone like Google to push it into mainstream no matter what, like Go got to benefit from Docker and Kubernetes.
But that's only the reference compiler, DMD. The other two compilers were fully open source (including gcc, which includes D) before that.
Fully disagree on your position that having all possibilities with one language is bad. When you have a nice language, it's nice to write with it for all things.
The main advantage of ASN.1 (specifically DER) in an HTTPS/PKI context is that it's a canonical encoding. To my understanding Protobuf isn't; I don't know about Thrift.
(A lot of hay is made about ASN.1 being bad, but it's really BER and other non-DER encodings of ASN.1 that make things painful. If you only read and write DER and limit yourself to the set of rules that occur in e.g. the Internet PKI RFCs, it's a relatively tractable and normal looking serialization format.)
You can parse DER perfectly well without a schema, it's a self-describing format. ASN.1 definitions give you shape enforcement, but any valid DER stream can be turned into an internal representation even if you don't know the intended structure ahead of time.
rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
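To illustrate the schema-less point, here is a minimal D sketch (not the author's compiler) of walking a DER stream purely from the TLV framing; it assumes low-tag-number form and only prints tag and length, since the semantics still require a schema:

```d
import std.stdio;

// Walk a DER byte stream without a schema, printing each TLV's tag and length.
// Assumes low-tag-number form (tag numbers < 31); DER always uses definite lengths.
void walk(const(ubyte)[] data, size_t depth = 0)
{
    while (data.length >= 2)
    {
        immutable ubyte tag = data[0];
        immutable bool constructed = (tag & 0x20) != 0;
        size_t len = data[1];
        size_t hdr = 2;
        if (len & 0x80)                          // long-form length
        {
            immutable size_t n = len & 0x7F;
            if (data.length < 2 + n) return;     // truncated input
            len = 0;
            foreach (b; data[2 .. 2 + n]) len = (len << 8) | b;
            hdr = 2 + n;
        }
        if (data.length < hdr + len) return;     // truncated input
        foreach (_; 0 .. depth) write("  ");
        writefln("tag 0x%02X (%s), length %s", tag,
                 constructed ? "constructed" : "primitive", len);
        if (constructed)
            walk(data[hdr .. hdr + len], depth + 1);  // descend into SEQUENCEs etc.
        data = data[hdr + len .. $];
    }
}
```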
> which is that this ends up being complex enough that basically every attempt to do so is full of memory safety issues.
Sort of -- DER gets a bad rap for two reasons:
1. OpenSSL had (has?) an exceptionally bad and permissive implementation of a DER parser/serializer.
2. Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER. This has caused an absolutely obscene amount of pain in PKI standards, which is why just about every modern PKI standard that uses ASN.1 bends over backwards to emphasize that all encodings must be DER and not BER.
(2) in particular is pernicious: the public Web PKI has successfully extirpated BER, but it still skulks around in private PKIs and more neglected corners of the Internet (like RFC 3161 TSAs) because of a long tail of OpenSSL (and other misbehaving implementation) usage.
Overall, DER itself is a mostly normal looking TLV encoding; it's not meaningfully more complicated than Protobuf or any other serialization form. The problem is that it gets mashed together with BER, and it has a legacy of buggy implementations. The latter is IMO more of a byproduct of ASN.1's era -- if Protobuf were invented in 1984, I imagine we'd see the same long tail of buggy parsers regardless of the quality of the design itself.
> rust-asn1[1] is a nice demonstration of this: you can deserialize into a structure if you know your structure AOT, or you can deserialize into the equivalent of a "value" wrapper that enumerates/enforces all valid encodings.
Almost. The "tag" of the data doesn't actually tell you the type of the data by itself (most of the time at least), so while you can say "there is something of length 10 here", you can't say if it's an integer or a string or an array.
Could you explain what you mean? The tag does indeed encode this: for an integer you'd see `INTEGER`, for a string you'd see `UTF8String` or similar, for an array you'd see `SEQUENCE OF`, etc.
You can verify this for yourself by using a schemaless decoder like Google's der-ascii[1]. For example, here's a decoded certificate[2] -- you get fields and types, you just don't get the semantics (e.g. "this number is a public key") associated with them because there's no schema.
[1]: https://github.com/google/der-ascii
[2]: https://github.com/google/der-ascii/blob/main/samples/cert.t...
So yeah, in that instance you do need a schema to make progress beyond "an object of some size is here in the stream."
Kerberos uses EXPLICIT tagging, and it uses context tags for every SEQUENCE member, so these extra tags and lengths add up, but yeah, dumpasn1 on a Kerberos PDU (if you have the plaintext of it) is more usable than on a PKIX value.
PER lacks type information, making encoding much more efficient as long as both sides of the connection have access to the schema.
In my experience it does tell you the type, but it depends on the schema. If implicit types are used, then it won't tell you the type of the data, but if you use explicit, or if it is neither implicit nor explicit, then it does tell you the type of the data. (However, if the data type is a sequence, then you might not lose much by using an implicit type; the DER format still tells you that it is constructed rather than primitive.)
(I don't think variable-length-lengths are that big of a deal in practice. That hasn't been a significant hurdle whenever I've needed to parse DER streams.)
This has the fun side effect that DER essentially allows you to process data ("give me the 4th integer and the 2nd string of every third optional item within the fifth list") without knowing what you're interpreting.
If the schema uses IMPLICIT tags then - unless I'm missing something - this isn't (easily) possible.
The most you'd be able to tell is whether the TLV contains a primitive or constructed value.
This is a pretty good resource on custom tagging, and goes over how IMPLICIT works: https://www.oss.com/asn1/resources/asn1-made-simple/asn1-qui...
> Because of OpenSSL's dominance, a lot of "DER" in the wild was really a mixture of DER and BER
:sweat: That might explain why some of the root certs on my machine appear to be BER encoded (barring decoder bugs, which is honestly more likely).
The wikipedia page on serialization formats[0] calls ASN.1 'information object system' style formalisms (which RFCs 5911 and 5912 make use of, and which Heimdal's ASN.1 makes productive use of) "references", which I think is a weird name.
[0] https://en.wikipedia.org/wiki/Comparison_of_data-serializati...
You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString. I'll note that three of those types have "Universal" / "General" in their name, and several more imply it.
How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME? Don't be fooled, all those types describe both a date _and_ time, if you just want a time then that's TIME-OF-DAY.
It's understandable how a standard with teletex roots got to this point, but it doesn't lead to good implementations when there is that much surface area to cover.
Implementing GeneralString in all its horror is a real pain, but also you'll never ever need it.
This generality in ASN.1 is largely due to it being created before Unicode.
> You need to populate a string? First look up whether it's a UTF8String, NumericString, PrintableString, TeletexString, VideotexString, IA5String, GraphicString, VisibleString, GeneralString, UniversalString, CHARACTER STRING, or BMPString.
They could be grouped into three groups: ASCII-based (IA5String, VisibleString, PrintableString, NumericString), Unicode-based (UTF8String, BMPString, UniversalString), and ISO-2022-based (TeletexString, VideotexString, GraphicString, GeneralString). (CHARACTER STRING allows arbitrary character sets and encodings, and does not fit into any of these groups. You are unlikely to need it, but it is there in case you do need it.)
IA5String is the most general ASCII-based type, and GeneralString is the most general ISO-2022-based type. For decoding, you can treat the other ASCII-based types as IA5String if you do not need to validate them, and you can treat GraphicString like GeneralString (for TeletexString and VideotexString, the initial state is different, so you will have to consider that). For the Unicode-based types, BMPString is UTF-16BE (although normally only BMP characters are allowed) and UniversalString is UTF-32BE.
When making your own formats, you might just use the most general ones and specify your own constraints, although you might prefer to use the more restrictive types if they are known to be suitable; I usually do (for example, PrintableString is suitable for domain names (as well as ICAO airport codes, etc) and VisibleString is suitable for URLs (as well as many other things)).
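A rough D sketch of that three-way grouping, keyed by the universal tag numbers (12 = UTF8String, 18 = NumericString, 19 = PrintableString, 20 = TeletexString, 21 = VideotexString, 22 = IA5String, 25 = GraphicString, 26 = VisibleString, 27 = GeneralString, 28 = UniversalString, 30 = BMPString); the enum and function are invented for illustration:

```d
enum StringFamily { ascii, unicode, iso2022, other }

StringFamily classify(uint universalTag)
{
    switch (universalTag)
    {
        case 18, 19, 22, 26: return StringFamily.ascii;    // ASCII-based
        case 12, 28, 30:     return StringFamily.unicode;  // Unicode-based
        case 20, 21, 25, 27: return StringFamily.iso2022;  // ISO-2022-based
        default:             return StringFamily.other;    // e.g. CHARACTER STRING (tag 29)
    }
}
```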
> How about a timestamp? Well, do you mean a TIME, UTCTime, GeneralizedTime, or DATE-TIME?
UTCTime probably should not be used for newer formats, since it is not Y2K compliant (although it may be necessary when dealing with older formats that use it, such as X.509); GeneralizedTime is better.
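For concreteness, the same instant written in both forms (illustrative values):

```d
// UTCTime has a two-digit year (hence the Y2K caveat), GeneralizedTime a four-digit one.
immutable utcTime         = "251223142500Z";    // YYMMDDHHMMSSZ
immutable generalizedTime = "20251223142500Z";  // YYYYMMDDHHMMSSZ
```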
In all of these cases, you only need to implement the types you are using in your program, not necessarily all of them.
(If needed, you can also use the "ASN.1X" that I made up which adds some additional nonstandard types, such as: BCD string, TRON string, key/value list, etc. Again, you will only need to implement the types that you are actually using in your program, which is probably not all of them.)
You can parse DER without using a schema (except for implicit types, although even then you can always parse the framing even if the value cannot necessarily be parsed; for this reason I only use implicit types for sequences and octet strings (and only if an implicit type is needed, which it often isn't), in my own formats). (The meaning of many fields will not be known without the schema, but that is also true of XML and JSON.)
I wrote a DER parser without handling the schema at all.
That's not really the problem. The problem is that DER is a tag-length-value encoding, which is quite redundant and inefficient and a total crutch that people who didn't see XDR first could not imagine not needing, but yeah, they really didn't need it. That crutch made it harder, not easier, to implement ASN.1/DER.
XML is no picnic either, by the way. JSON is much much simpler, and it's true you don't need a schema, but you end up wanting one anyways.
However, to have a sane interface for actually working with the data you do need a schema that can be compiled to a language specific notation.
There should be no need for a canonical encoding. 40 years ago people thought you needed that so you could re-encode a TBSCertificate and then validate a signature, but in reality you should keep the encoding as-received of that part of the Certificate. And so on.
no, not at all
they share some ideas, that doesn't make it "pretty much ASN.1". It's only "pretty much the same" if you argue all schema-based general purpose binary encoding formats are "pretty much the same".
ASN.1 also isn't "file" specific at all; its main use case is, and always has been, message exchange protocols.
(Strictly speaking ASN.1 is also not a single binary serialization format but 1. one schema language, 2. some rules for mapping things to some intermediate concepts, 3. a _docent_ different ways how to "exactly" serialize things. And in the 3rd point the difference can be pretty huge, from having something you can partially read even without schema (like protobuff) to more compact representations you can't read without a schema at all.)
At the implementation level they are different, but when integrating these protocols into applications, yeah, pretty much. Schema + data goes in, encoded data comes out, or the other way around. In the same way YAML and XML are pretty much the same, just different expressions of the same concepts. ASN.1 even comes with multiple expressions of exactly the same grammar, both in text form and binary form.
ASN.1 was one of the early standardised protocols in this space, though, and suffers from being used mostly in obscure or legacy protocols, often with proprietary libraries if you go beyond the PKI side of things.
ASN.1 isn't file specific, it was designed for use in telecoms after all, but encodings like DER work better inside of file formats than Protobuf and many protocols like it. Actually having a formal standard makes including it in file types a lot easier.
Yes, JOSE is still infinitely better than XmlSignatures and the canonical XML madness to allow signatures _inside_ the document to be signed.
- huge breaking change to the whole cert infrastructure
- this question was asked of the people who chose ASN.1 for X.509, and AFAIK they said today they would use protobuf. But I don't remember where I have that from.
- JOSE/JWT etc. aren't exactly that well regarded in the crypto community AFAIK, or designed with modern insights about how to best do such things (too much header malleability, too much crypto flexibility, too little deterministic encoding of JSON, too many imprecisely defined corner cases related to JSON, too much encoding overhead for keys and similar (which for some pq stuff can get into the 100KiB range)), and the argument of it being readable with a text editor falls apart if anything you care about is binary (keys, etc.) and often encrypted (producing binary). (And IMHO the plain-text argument also falls apart for most non-crypto stuff: if you add a base64 encoding anyway you already need dev tooling to read it, and whether your debug tooling does a base64 decode or a (maybe additional) data decode step isn't really relevant; same for viewing in an IDE, which can handle binary formats just fine, etc., but that's an off-topic discussion)
- if we look at some modern protocols designed by security specialists/cryptographers and have been standardized we often find other stuff (e.g. protobuf for some JWT alternatives or CBOR for HSK/AuthN related stuff).
That is true, but it's also true that JWT/JOSE is a market winner and "everywhere" today. Obviously, it's not a great one and not without flaws, and its "competition" is things like SAML which even more people hate, so it had a low bar to clear when it was first introduced.
> CBOR
CBOR is a good mention. I have met at least one person hoping a switch to CWT/COSE happens to help somewhat combat JWT bloat in the wild. With WebAuthN requiring CBOR, there's more of a chance to get an official browser CBOR API in JS. If browsers had an out-of-the-box CBOR.parse() and CBOR.stringify(), that would be interesting for a bunch of reasons (including maybe even making CWT more likely).
One of the fun things about CBOR though is that is shares the JSON data model and is intended to be a sibling encoding, so I'd also maybe argue that if CBOR ultimately wins that's still somewhat indirectly a "JSON win".
- ASN.1 is a set of a docent different binary encodings
- ASN.1's schema language is IMHO way better designed than Protobuf's, but also more complex, as it has more features
- ASN.1 can encode many more different data layouts (e.g. things where in Protobuf you have to use "tricks"), each being laid out in the output differently depending on the specific encoding format, annotations on the schema, and options during serialization
- ASN.1 has many ways to represent things more "compact" which all come with their own complexity (like bit mask encoded boolean maps)
overall the problem with ASN.1 is that it's absurdly over-engineered, leading to you needing to know many hundreds of pages across multiple standards documents just to implement one single encoding of the docent existing ones, and even then you might run into ambiguous, unclear definitions where you have to ask on the internet for clarification
if we ignore the schema languages for a moment most senior devs probably can write a crappy protobuf implementation over the weekend, but for ASN.1 you might not even be able to digest all relevant standards in that time :/
Realistically, if ASN.1 weren't as badly over-engineered and had shipped only with some of the more modern of its encoding formats, we probably would all be using ASN.1 for many things, including maybe your web server responses, and this probably would cut non image/video network bandwidth by 1/3 or more. But then the network is overloaded by image/video transmissions and similar, not other stuff, so I guess who cares???!???
ASN.1 has many encoding standards, but you don't need to implement them all, only the specific one for your needs.
ASN.1 has a standard and an easy to follow spec, which Protobuf doesn't.
In sum: I could cobble together a working ASN.1 implementation over a weekend. In contrast, getting to a clean-room working Protobuf library is a month's work.
Caveat: I have not had to deal with PKI stuff. My experience with ASN.1 is from LDAP, one of the easiest protocols to implement ever, IMO.
> to follow spec, which Protobuf doesn't.
I can't say so personally, but from what I heard from the coworker I helped, the spec isn't always easy to follow, as there are many edge cases where you can "guess" what they probably mean but they aren't precise enough. Though they had a surprising amount of success getting clarification from authoritative sources (I think some author or maintainer of the standard, but I'm not fully sure, it was a few years ago).
In general there seems to be a huge gap between writing something which works for some specific usage(s) of ASN.1 and something which works "in general/for all of the standard" (for the relevant encodings (mainly DER, but also at least part of non-DER BER, as far as I remember)).
> Protobuf doesn't.
yes, but its wire format is relatively simple and documented (not as a specification, but documented anyway), so getting something going which can (de-)serialize the wire format really isn't that hard, and the mapping from that to actual data types is also simpler (though it also won't work for arbitrary types due to dumb design limitations). I would be surprised if you needed a month of work to get something going there. Though if you want to reproduce all the tooling ecosystem or the special ways some libraries can interact with it (e.g. in-place edits etc.) it's a different project altogether. What I mean is just a (de-)serializer for the wire format with appropriate mapping to data types (objects, structs, whatever the language you use prefers).
But for Protobuf you kinda have to. Needing to parse the .proto files and conform to Google's codegen ideas is implied.
For ASN you just need to follow a spec, not a concrete implementation.
Yes, but you can use a subset of ASN.1. You don't have to implement all of x.680, let alone all of x.681, x.682, and x.683.
ASN.1 was not over-engineered in 1990. The things that kept it from ruling the world are:
- the ITU-T specs for it were _not_ free back then
- the syntax is context dependent, so using a LALR(1) parser generator to parse ASN.1 is difficult, though not really any more than it is to parse C with a LALR(1) parser generator, but yeah if it had had a LALR(1)-friendly syntax then ASN.1 would have been much easier to write tooling for
- competition from XDR, DCE/MS RPC, XML, JSON, Protocol Buffers, Flat Buffers, etc.
The over-engineering came later, as many lessons were learned from ASN.1's early years. Lessons that the rest of the pack mostly have not learned.
I had to look up https://www.merriam-webster.com/dictionary/docent
[1] the somewhat ironic part is that when it was discovered that using just passwords for authentication is not enough, the so-called "lightweight" LDAP got arguably more complex than X.500. The same thing happened to SNMP (another IETF protocol using ASN.1) being "Simple", for similar reasons.
CBOR is self-describing like JSON/XML meaning you don’t need a schema to parse it. It has better set of specific types for integers and binary data unlike JSON. It has an IANA database of tags and a canonical serialization form unlike MsgPack.
With the top level encoding solved, we could then go back to arguing about all the specific lower level encodings such as compressed vs uncompressed curve points, etc.
[1] https://datatracker.ietf.org/doc/rfc9804
If it were created today it would look a lot like OAuth JSON Web Tokens (JWT) and would use JSON instead of ASN.1/DER.
[1] https://github.com/PADL/ASN1Codable
[2] https://github.com/heimdal/heimdal/tree/master/lib/asn1
> which can transform ASN.1 into a much more parseable JSON AST
The sign of a person who's been hurt, and doesn't want others to feel the same pain :D
What I disagree is on the disdain being veiled. Seems very explicit to me.
Anyway, yeah, I hadn't heard about it before either, and it's great to know that somebody out there did solve that horrible problem already, and that we can use the library.
> ASN.1 is a... some would say baroque, perhaps obsolete, archaic even, "syntax" for expressing data type schemas, and also a set of "encoding rules" (ERs) that specify many ways to encode values of those types for interchange.
for me expressing disdain for ASN.1. On the contrary: I'm saying those who would say that are wrong:
> ASN.1 is a wheel that everyone loves to reinvent, and often badly. It's worth knowing a bit about it before reinventing this wheel badly yet again.
:)
(I worked with asn1c (not sure which fork) and had to hack in a custom allocator and 64-bit support. I shiver every time something needs attention in there)
Honestly any compiler project in pure C is pretty hardcore in my eyes, ASN.1 must amplify the sheer horror.
One memcpy made it like 30% faster overall.
* unit tests anywhere, so I usually write my methods/functions with unit tests following them immediately
* blocks like version(unittest) {} make it easy to exclude/include things that should only be compiled for testing (see the sketch after this list)
* enums, unions, asserts, contract programming are all great
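A minimal sketch of the inline-unittest pattern described in the list above (the function and helper are invented for illustration):

```d
int clamp(int value, int lo, int hi)
{
    return value < lo ? lo : (value > hi ? hi : value);
}

unittest
{
    assert(clamp(5, 0, 10) == 5);
    assert(clamp(-1, 0, 10) == 0);
    assert(clamp(42, 0, 10) == 10);
}

version(unittest)
{
    // Compiled only when building with -unittest.
    immutable int[] samples = [-1, 0, 5, 10, 42];
}
```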
I would say I didn't have to learn D much. Whatever I wanted to do with it, I would find in its docs or asked ChatGPT and there would always be a very nice way to do things.
From a philosophical/language-design standpoint, it ticks so many boxes. It had the potential to be wildly popular, had a few things gone differently.
If the language tooling and library ecosystem were on par with the titans of today, like Rust/Go, it really would be a powerhouse language.
Still much better than gccgo, which is kind of useless for anything beyond Go 1.18; no one is updating it any longer, and it may as well join gcj.
For regular common code that is indeed not an issue.
67 more comments available on Hacker News