Yaml Document From Hell (2023)
Posted 4 months ago, active 3 months ago
ruudvanasseldonk.com, tech story, high profile
Key topics
Yaml
Configuration Files
Data Serialization
Alternatives to Yaml
The article 'YAML document from hell' highlights the complexities and pitfalls of using YAML for configuration files, sparking a heated discussion among commenters about its shortcomings and potential alternatives.
Snapshot generated from the HN discussion (very active, based on 142 loaded comments)
Key moments
- Story posted: Sep 23, 2025 at 5:04 AM EDT
- First comment: Sep 23, 2025 at 5:57 AM EDT (52m after posting)
- Peak activity: 80 comments in the 0-6h window (average 14.2 per period)
- Latest activity: Sep 26, 2025 at 2:11 AM EDT
ID: 45344554, Type: story, Last synced: 11/20/2025, 6:12:35 PM
In a lot of the Ansible documentation, yes/no are used instead of true/false. When I saw this in the official docs, I used it, figuring this was the preferred convention in Ansible. These days it throws warnings or lint errors, so I'm updating it all over the place as I find it. Yet the Ansible documentation still commonly uses it.
(In case I haven't succeeded in hitting the right tone, this is intended to be good-natured jest and not snark.)
Now, JSON is more suited for machine-to-machine, but YAML works fairly well for humans. It's a pity, but a few domain-specific quirks don't really hurt, since you can't copy some bit of YAML and paste it in an entirely different config anyway.
PS campfire story? "When we were still working in the old building, deep down in the cellar, there was a colleague who had been there since the early days. Nobody saw him arrive at work or leave. It was as if he was always there. One of the things he had written was a custom parser ... FOR YAML!"
I did run into a project once with a very cool custom YAML parser to recommend how to recover from errors. I think you do have to type check all deserialization, and you should fail if you process a bool where you expect a string. Automatically fixing things can be very dangerous. But if you were going to do it, the way you described is the best way to do it.
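A minimal sketch of the "fail if you get a bool where you expect a string" idea, in Python. The `require_str` helper and the `loaded` dict are hypothetical; the dict stands in for whatever a YAML 1.1 parser hands back after turning an unquoted `no` into a boolean.

```python
def require_str(config: dict, key: str) -> str:
    """Return config[key], refusing anything that is not already a string."""
    value = config[key]
    if not isinstance(value, str):
        # Fail loudly instead of guessing what the author meant.
        raise TypeError(
            f"{key!r}: expected str, got {type(value).__name__} ({value!r})"
        )
    return value


# Hypothetical parser output: `country: no` came back as a boolean.
loaded = {"region": "eu", "country": False}

print(require_str(loaded, "region"))
try:
    require_str(loaded, "country")
except TypeError as error:
    print(f"rejected: {error}")
```

The point of raising rather than converting `False` back to `"no"` is that the original spelling is already lost by the time the check runs.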
> Well, JSON cannot represent ... NaN ...
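For what it's worth, here is how this plays out in Python's standard `json` module, as a small sketch. By default it writes the bare token `NaN`, which is not part of the JSON grammar, and `allow_nan=False` makes it raise instead. The `strict_dumps` wrapper is just a name picked for the example.

```python
import json
import math

# Default behaviour: emits the bare token NaN, which strict JSON parsers reject.
lenient = json.dumps({"ratio": float("nan")})

# Python's own loader accepts that token again, so the problem stays hidden
# until another implementation reads the file.
round_tripped = json.loads(lenient)["ratio"]


def strict_dumps(obj) -> str:
    """Serialize obj, raising ValueError on NaN or infinities."""
    return json.dumps(obj, allow_nan=False)


try:
    strict_dumps({"ratio": float("nan")})
    rejected = False
except ValueError:
    rejected = True

print(lenient)
print(math.isnan(round_tripped), rejected)
```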
Here's another horror story:
Ansible is a wonderful tool though, if you can excuse these idiosyncrasies.
The only advantage Ansible has is how easy it is to start with it - you don't need to deploy agents or even understand a lot about how it works.
Trouble is, it doesn't really scale. It's pretty slow when running against a bunch of machines, and large configurations get unwieldy quickly (be it because of YAML, where in large documents it's impossible to orient yourself and know what is where and at what level, or because of the structure of playbooks vs roles vs whatever, or because templating a whitespace-as-logic "language" is just hell). It's also fun to debug "missing X at line A, but the error can be somewhere else". Cool, thanks for the tip.
So it's pretty great to get started with, or at a home lab. Big organisations struggling with it is a bit weird.
What are the best practices along these lines? What's the "something better"?
There's an Ansible provider for Terraform so you can do the whole thing in there.
But once ansible is set, it's easy to achieve parallelism when provisioning multiple instances.
Problem is that it requires lots of back and forth over ssh, so the more latency you have between the control plane and the target hosts the slower it'll be.
And yeah... Debugging is a pain. I wish I could write ansible in an actual language instead of having to fight multiple layers of indirection with ansible, jinja2 and yaml.
Sometimes the tech world moves at warp speed, sometimes it just treads water.
The author didn’t even get into the weird stuff GitLab does with YAML too!
Yaml has its use cases where you want things json doesn't do, like recursion or anchors/aliases/tags. Or at least it has had - perhaps cue/dhall/hcl solve things better. Jsonnet is another. I haven't tried enough to test how much better they are.
Yeah, that was my first thought as well. I personally don't mind YAML, but I've also made a habit out of quoting strings. And, I mean, you're quoting both keys and strings in JSON, so you're still saving approx. 2 double quotes per key/value pair in YAML if that's a metric that's important to you.
So, sure, if you want to play it super safe, quote keys as well. But I'm personally fine with the trade-off in not quoting keys.
The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimiters. This is done using a combination of white-space delimiting for structure, & heuristic parsing for values. The latter is fundamentally flawed, but yaml fans think the flaws are a worthwhile trade-off. If you're going to bring delimiters in as a requirement, imho yaml loses its raison d'être.
Recursion/anchors/etc. on the other hand are optional extras that few use & some parsers don't even support. If they were the driving value of yaml they'd be more ubiquitous.
Disclaimer: I hate yaml & wish it didn't exist, but I do understand why it does & I frankly don't have a great suggestion for alternatives that would fill those needs. Toml is also flawed.
Along with a coworker, I wrote the package manager for Dart, which uses YAML for its main manifest file (pubspec.yaml). The lack of delimiters is kind of nice but wasn't instrumental in the choice to use YAML.
It's because JSON doesn't have comments.
If there was a JSON+comments that was specified and widely compatible, we would have used that. YAML really is a brittle nightmare, and the lack of delimiters causes problems as often as it solves them. We wrote a YAML parser from scratch and I still get the indentation on lists wrong sometimes.
But YAML lets you actually, you know, comment out a line of text in it temporarily, and that's really fucking handy. I think if Crockford had left comments in JSON, YAML would be dead.
This is a big plus but JSON5 has pretty widespread language library support - probably equal to that of YAML tbh (e.g. Swift has native JSON5 support, I don't know that anyone natively supports YAML). Any reason not to opt for it here?
Obviously, migrating to it now when there are thousands and thousands of packages and dozens of tools all reading pubspecs would be much more trouble than it's worth.
VS code defaults to complaining about trailing commas though (the warnings can be turned off though (it feels like a hack and they didn't properly document it though (it is an officially sanctioned procedure though))).
The various features it has for nesting and arrays make it convenient to write, but can make it harder to read. There is no canonical serialization of a TOML document, as far as I can tell, you could do it any number of ways.
So while TOML has its use for small config files you edit by hand, it doesn't really make sense for interchange, and it doesn't see much use outside of Rust afaik.
With JSON, YAML, XML and many other formats, the syntax for nesting has a visual appearance that matches the logical nesting. TOML does not. You have to maintain a mental model of the data structure, and slot the flat syntax into that structure.
Furthermore, there are multiple ways to express the same thing. It isn't always obvious which approach is more appropriate for a task, and mixing them creates a big mess. And the more nested the format becomes, with arrays of dicts, or dicts of arrays, the harder it is to follow.
While I have some minor annoyances with TOML, I counterintuitively consider it a strength of the format that nesting quickly becomes untenable, because it produces pressure on the designers of config file schemas to keep nesting to a minimum.
Maybe some projects have a legitimate need for something more complex, but IMO config files are at their best when they're just key-value pairs organized into sections.
Sampling bias, there are no complaints about it because no-one uses it (jk).
It's subjective of course but despite the name TOML never really seemed that 'Obvious' to me, in particular the spec for tables. I also think the leniency in the syntax isn't necessarily a good feature and serves to make it less 'Minimal' than its name suggests.
If we want to avoid quoting in particular, then we could use - for strings and anything else for non-strings. But the heuristics suck.
[1] the broken part was due to an ex-coworker that cheated his way out of GitOps and left basically "fake code" committed, and modified by hand (with Lens) the deployment to make it work
That’s effectively what jsonnet/cue/hcl do, though as a preprocessor instead of a postprocessor.
It's very fair to cry "why the hell do I need a linter for my trivial config file format", and these footguns are a valid reason to avoid YAML.
But overall YAML's sketchiness is a pretty easy problem to solve and if you have a good reason to keep/choose YAML, and a context where adding a linter is viable, it's not really a big deal IMO.
And as hinted in the post, there's really no well-established universal alternative. TOML is a good default but it's only usable for pretty straightforward stuff. I'm personally a fan of the "just use Nix" approach but you can't put a Nix interpreter everywhere. And Cue is way overpowered for most use cases.
I guess the tldr is that the takeaway isn't "don't use YAML" but just "beware of YAML footguns, know the alternatives".
>Many of the problems with yaml are caused by unquoted things that look like strings but behave differently. This is easy to avoid: always quote all strings.
IMO anything other than the basic types supported by JSON (number, true, false, null) ought to be parsed as a string. Or if you really insist, some kind of special syntax to make it clear it's not a string would probably be acceptable.
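A rough sketch of that rule in Python, assuming the resolver only ever sees one unquoted scalar at a time. `parse_scalar` is a hypothetical name and this is not how any particular YAML library resolves values. It leans on the standard `json` module to recognise the JSON literals and treats everything else as a string.

```python
import json


def _reject_constant(name: str):
    # json.loads accepts NaN/Infinity by default; they are not JSON literals.
    raise ValueError(name)


def parse_scalar(text: str):
    """Return a number, bool, or None only for JSON literals, else the text."""
    try:
        value = json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # includes json.JSONDecodeError
        return text
    if value is None or isinstance(value, (bool, int, float)):
        return value
    # Quoted strings, arrays, and objects are left as raw text in this sketch.
    return text


for sample in ["true", "no", "null", "42", "012", "22:22", "NaN"]:
    print(repr(sample), "->", repr(parse_scalar(sample)))
```

Under this rule `no`, `012`, and `22:22` all stay strings, because none of them is a valid JSON literal.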
https://news.ycombinator.com/item?id=34351503 , 566 points, 358 comments
If you're passing data between processes, and you still want the data to be human readable, then JSON is a good choice.
If you're writing a configuration file that's going to be edited by a human, then YAML is easier to look at and understand.
When you're on line 4000 of a YAML configuration file and the previous 70 lines have been at indentation level 6, and you see a blank line and another line at indentation level 4 (or is that 5? maybe 3?) then I strongly, strongly disagree that two '}' characters are more difficult to read than newlines, tabs, and spaces.
YAML is one of a family of languages born from the idea that punctuation is bad and therefore should be invisible. Not gone, because all of these languages still have punctuation. No, these characters that are critically important to the interpretation of the file must be invisible.
Code and markup is easy to read when it is easy to predict what the computer will do when it parses it. Invisible punctuation makes the files harder to read, not easier. The only thing easier in YAML is writing it in the first place, and we all know that "write-only" is an insult.
For quite some time I thought toml, but the way you can spread e.g. lists all over the document can also cause some headaches.
Dhall is exactly my kind of type fest but you can hit a hard brick wall because the type system is not as strong as you think.
On top of that, the grammar is quite difficult to parse. You need a parser that can keep several candidate parses running in parallel (like the classic `Parser a = Parser (String -> [(a, String)])` type) to disambiguate some of the gnarlier constructs (maybe around file paths, URLs, and record accesses? I forget). The problem with this is that it makes the parse errors downright inscrutable, because it's hard to know when the parse you actually intended was rejected by the parser when the only error you get was "Unexpected ','".
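For readers who haven't met that parser type, here is a toy Python rendering of the same "list of successes" idea. It is only a sketch with made-up combinator names (`literal`, `either`, `then`) and a made-up grammar, and has nothing to do with Dhall's actual parser.

```python
from typing import Any, Callable, List, Tuple

# A parser maps input text to every (value, remaining text) pair it can produce.
Parser = Callable[[str], List[Tuple[Any, str]]]


def literal(word: str) -> Parser:
    def parse(text: str):
        return [(word, text[len(word):])] if text.startswith(word) else []
    return parse


def either(left: Parser, right: Parser) -> Parser:
    def parse(text: str):
        # Keep both candidates alive instead of committing to one.
        return left(text) + right(text)
    return parse


def then(first: Parser, second: Parser) -> Parser:
    def parse(text: str):
        return [
            ((a, b), rest2)
            for a, rest1 in first(text)
            for b, rest2 in second(rest1)
        ]
    return parse


# Ambiguous toy grammar: ("a" | "ab") followed by ("bc" | "c").
grammar = then(
    either(literal("a"), literal("ab")),
    either(literal("bc"), literal("c")),
)

print(grammar("abc"))  # two complete parses survive
print(grammar("abd"))  # no parse survives
```

The failing case shows the error-reporting problem: the result is just an empty list, with no record of which candidate got furthest or where it died.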
Oh, and you can't multiply integers together, only naturals.
Maybe Nix in pure eval mode, absurd as that sounds?
I think the best thing for tools to do is to take and return JSON (possible exception: tools whose format is simple enough for old-school UNIX-style stdin/stdout file formats). Someone will come up with a good functional abstraction over JSON eventually, and until then you can make do with Dhall, YAML, or whatever else.
It doesn’t sound absurd, it’s pretty nice. What do you think about https://rcl-lang.org?
Gonna have to set aside some time to play with it compared to HCL where I spend a lot of time.
Given its general use around infrastructure, it'd be nice if it had IPv4 and IPv6 addresses as native types that get parsed.
This has resulted in a bunch of hacks (such as the count directive on terraform), so the end result is a frustrating mess.
Pkl seems syntactically beautiful and powerful, but having types and functions and loops makes it a lot more complicated than the dead-simple JSON data model that YAML is based on.
> RCL is a domain-specific language for generating configuration files and querying json documents. It extends json into a simple, gradually typed, functional programming language that resembles Python and Nix.
https://github.com/ruuda/rcl
https://rcl-lang.org
> A simple subset of yaml
Which already exists and is called StrictYAML. It's just strings, lists and dicts. No numbers. No booleans. No _countries_. No anchors. No JSON-compatible blocks. So, essentially it's what most of us think of as being proper YAML, without all the stupid/bad/overcomplicated stuff. Just bring your own schema and types where required.
https://hitchdev.com/strictyaml/
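The "everything is a string, bring your own schema" approach can be sketched in a few lines of plain Python. This is not StrictYAML's API; `apply_schema`, `parse_bool`, and the field names are all hypothetical, and `raw` stands in for an already-parsed mapping of strings.

```python
from typing import Any, Callable, Dict


def parse_bool(text: str) -> bool:
    """Accept exactly two spellings; anything else is an error."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def apply_schema(
    raw: Dict[str, str], schema: Dict[str, Callable[[str], Any]]
) -> Dict[str, Any]:
    """Convert each string field with the converter the schema names for it."""
    typed = {}
    for key, convert in schema.items():
        if key not in raw:
            raise KeyError(f"missing field: {key}")
        typed[key] = convert(raw[key])
    return typed


# Hypothetical parser output: every scalar is still a string.
raw = {"country": "no", "version": "1.10", "replicas": "3", "debug": "false"}
schema = {"country": str, "version": str, "replicas": int, "debug": parse_bool}

print(apply_schema(raw, schema))
```

Because the schema decides the type, `no` and `1.10` stay strings and only the fields declared as numbers or booleans get converted.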
This would be a massive breaking change for Kubernetes. There are piles and piles of YAML all around the open source world that would need updating. It would be very hard to adopt.
Also, quoting strings 100% of the time just looks ugly in my opinion. Not a big deal with autogenerated YAML, or YAML that I do not maintain, but for anything handwritten it's annoying.
For readability of large blocks of text that may or may not contain various special characters and newlines, the only other alternative we have seen was XML, but that is very verbose.
So what the author finds as a negative, the many string formats, are exactly what drew us to yaml in the first place.
Oh yeah, it is literally the best of a bad bunch in my opinion.
I'm hopeful of languages like CUE https://cuelang.org/
JSON is for computers. Writing and editing by hand is not great. Escaping things sucks. A simple multi line string or something gets really awkward.
XML goes too far the other way... it's annoyingly verbose to write by hand. Escaping can get annoying. It often allows you to represent data structures that are not easily representable in various languages.
INI sucks because it lacks a specification. It also sucks for nested data.
TOML fixes this by essentially specifying a better INI file. Much like an INI file, this falls apart at any real level of nesting.
EverythingElse is not widely supported.
When it comes to basic configs and stuff humans need to work with, I usually start with a basic K=V format. Writing a "parser" in any language usually takes about one minute and has no dependencies so is an easy win.
As soon as a use case grows beyond that (quoting, explicit typing, multiple lines, escapes, whatever) I just move to YAML. It's not the best, but it's easily available and the least bad from my point of view.
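For reference, the kind of one-minute K=V "parser" described above might look like the following in Python. It is a sketch under assumed conventions (one `KEY=VALUE` per line, `#` comments, no quoting or escapes), and `parse_kv` is a made-up name.

```python
def parse_kv(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and # comments are skipped."""
    config: dict[str, str] = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"line {number}: expected KEY=VALUE, got {raw!r}")
        # Values stay strings; typing is left to the caller.
        config[key.strip()] = value.strip()
    return config


sample = "# connection settings\nhost = example.org\nport=8080\n\n"
print(parse_kv(sample))
```

Anything that needs quoting, multiple lines, or explicit types is exactly where this stops being enough.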
I mean, this is just great:
```php
[
];
```
Obviously not a lot of support though... It's PHP.
Just a yucky standard all-around
I use block scalars constantly now, with liberal use of the trimming dashes all over the place.
Any time I need to preserve some indentation in my result, I always hate the formatting I’m left with, especially if there is logic involved.
Reality is, clunky XML is badly designed, or simply has no schema attached.
In the mean time, I’m very much enjoying KDL.
So, `key1` is a string and doesn't need to be quoted. `12345` as a key is interpreted as a string (because keys are strings) and doesn't need to be quoted. `"key 1"` has a space, so it needs to be quoted.
Use more quotes, use yamllint.
Like bash, more quotes and shellcheck.
None of the systems I've seen achieve all those goals at once.
YAML, while at first sight a good idea, is irredeemably broken and should be deprecated for further use.
JSONC (https://jsonc.org/) is backwards-compatible with JSON, and a good target for long-term future migration.
.INI format works well as a structured subject-predicate-object tuple store for simple use cases.
We're probably going to have to live with that indefinitely, until someone comes up with a proposal that is better.
What blew my mind was learning that the entire JSON grammar is included as a subset in the YAML grammar. So every valid JSON document is automatically a valid YAML document.
But you don't have to stop there. You could also mix and match the JSON grammar elements with the additional "proper YAML" ones - including comments.
So this means any* software that accepts a YAML config would also accept the config as JSON or JSON-with-comments instead. No ecosystem bootstrapping necessary!
(*or almost any, as long as they don't use dicts with non-string keys)
https://news.ycombinator.com/item?id=45335129
It was on the front page yesterday!
Human-oriented Markup Language
HUML is a simple, strict, serialization language for documents, datasets, and configuration. It prioritizes strict form for human-readability. It looks like YAML, but tries to avoid its complexity, ambiguity, and pitfalls.
Still love it.
https://github.com/rethinkdb/rethinkdb/blob/main/test/common...
The problem I was trying to solve was that our tests involved a lot of things that looked like dicts (in fact they were), so my YAML-like parser stops parsing things when it looks like we have hit test code. This took out so much escaping, and made it easy to copy-paste tests into a REPL when you were working on the test (and vice versa).
So it looks like YAML, but without most of the features, and without the footguns.
Alas, YAML is just about everywhere, so the chances for a replacement that'll be both better behaved and as ubiquitous are unfortunately slim.
I had no idea it was even so opinionated.
Mostly I use it for docker and k8s configuration, so I haven’t run into it yet I suppose
TIL that yaml and json do not have the same data model and there are yaml documents that are not representable as json...
More seriously: this is a good overview of the reasons I dislike YAML as a web configuration language. There's too much overlap between the "friendly" auto-type-determination in YAML and the symbols used in web tech, from colons to Norway having a TLD. It wouldn't be so bad if yaml parsers could use the expected type of each value as a hint, but that's not a feature in any parser I've met, so I'd rather just not use yaml for anything that's going to end up describing a web service.