Formatting Code Should Be Unnecessary
Posted 4 months ago · maxleiter.com
Key topics: Code Formatting, Programming Languages, Software Development
The article argues that code formatting is unnecessary with the right tooling, sparking a debate on the importance of formatting and the potential benefits and drawbacks of alternative approaches.
Snapshot generated from the HN discussion
Discussion activity: very active; 160 comments loaded, averaging 16 per period, peaking at 37 comments in the first three hours.
Story posted Sep 7, 2025 at 7:08 PM EDT; first comment 22 minutes later; latest activity Sep 9, 2025 at 10:34 AM EDT.
There's a scissor statement that cuts through the formatting debate: if initial space width were configurable in their editor of choice, would those who prefer tabs have any other arguments?
The Unix philosophy, on the other hand, only "thrives" if every other tool is designed around (and contains code to parse) "plain text".
And how did that work out for them?
This seems like one of the many cases where unix won out by being a lowest common denominator. Every platform can handle plain text.
The lowest common denominator rather is binary blobs. :-)
The goal of having every developer viewing the code with their own preferences just isn't that important. On every team I've been on, we just use a standard style guide, enforced by formatter, and while not everyone agrees with every rule, it just doesn't matter. You get used to it.
Arguing and obsessing about code formatting is simply useless bikeshedding.
https://astyle.sourceforge.net/astyle.html#_style=whitesmith
And then someone said: oh yeah? Hold my beer https://astyle.sourceforge.net/astyle.html#_style=pico
Unless it's an accessibility issue, and it is an accessibility issue sometimes.
Bah! So what is more important? Is the average convenience of the herd more important? The average of convenience, as if there were ever such a thing.
What if you really liked reading books in paper format, but were forced to read them on displays for... reasons?
What I would be curious about is tracing from errors back to the source code. Nearly every language I've used prints the line number and the offset on the line for an error. How that worked in the DIANA world would be interesting to learn.
[1]: https://github.com/Wilfred/difftastic
If we had a formatting tool that operated solely on AST, checked in code could be in a canonical form for a given AST. Editors could then parse the AST and display the source with a different formatting of the users choice, and convert to canonical form when writing the file to disk.
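A minimal sketch of that round trip, assuming Python as the language (ast.unparse needs Python 3.9+); the "canonical form" here is simply whatever ast.unparse emits:

```python
import ast

# Source with arbitrary user formatting.
messy = "def add( a,b ):\n        return (a+b)"

# Parse to an AST, then unparse: layout quirks and redundant
# parentheses disappear, leaving one canonical rendering.
canonical = ast.unparse(ast.parse(messy))
print(canonical)  # def add(a, b):
                  #     return a + b
```

An editor would run the reverse direction on load, re-rendering the same tree in the reader's preferred style.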
If we can’t progress our ecosystem because we are reliant on one very specific 50+ year old line parser, then that says more about the inflexibility of the industry to move forward than it does about the “new” ideas being presented.
Grep works great.
Except it already is a solved problem.
If languages compile to a common byte code then you just need one tool. You already see examples of this with things like the IR assembly produced by LLVM, various Microsoft languages that compile to CLR, and the different languages that target JVM.
There are also already common ways to create reusable parsing rules like LSP for IDEs and treesitter.
In fact there are already grep-like utilities that are based on treesitter.
So it's not only very possible to create language-agnostic, reusable tools; these tools already exist and are being used by a great many developers.
The problem raised in the article is that we just don't push these concepts hard enough these days, relying instead on outdated notions of what source code should look like.
> Grep works great
For LF-separated lists it does. But if it worked great for structured content then we wouldn’t be having this conversation to begin with.
So the real choice is either:
- new tool: grep with caching reverse-formatter filter.
- new tool: ast-grep with understanding of AST serialization format for your specific language.
At least in the first case, you still have fall back.
About grep and diff working on a textual representation of the AST: it would be like grepping JavaScript source code when the actual source code is TypeScript, or some more distant language that compiles to JavaScript (does anybody remember CoffeeScript?). We want to see only the source code we typed in.
By the way, add git diff to the list of tools that should work on the AST but show us the real source code.
> Everyone had their own pretty-printing settings for viewing [DIANA] however they wanted.
I'm still confused, because they specifically call the IR DIANA, and they talk about viewing the IR. It isn't clear to me if the IR is more like a bytecode or something, or more like the original source code with a little processing done to it. They also have a quote,
> Grady Booch summarizes it well: R1000 was effectively a DIANA machine. We didn't store source code: source code was simply a pretty-printing of the DIANA tree.
So maybe the other visualizations they could produce by transforming the IR were so nice that nobody even cared to look at the original Ada they'd written to generate it?
All the same tools can exist with a text backend, and you get grep/sed support for free too!
This becomes an issue with, say, CI, where maybe I add a gate to check something with grep. But whose format do I assume? My local format (which I used to test it locally) or the canonical one (which means I need to switch my local format to test it)?
You would use the format on disk for the grep. "Your format" only exists displayed in your editor.
Want it to look like C? Lisp? Pascal? Why not!
Yes, of course, because tab width is *dynamically* flexible, so initial space width isn't enough.
But for "dirty-width" indents, e.g. after some text that can vary in size (proportional fonts, or some special characters even in fixed fonts), you can't align with spaces, while a tab width can be auto-adjusted to match the other line.
You still work with text, the text just isn't the canonical stored representation. You get diffs to resolve only when structure is changed.
You get most of the same benefit with a pre-commit linter hook, though.
What happens when you stage the line `} else return {`? git doesn't let you stage specific AST nodes. It would also mean that you can't stage partial code (code that produces syntax errors).
You would still store text, and still check out text, just transformed text. You could still check in anything you want, including partial code, syntax errors, or any other arbitrary text. Diffs would work the same way they do now.
All of your examples work better for code with structural knowledge:
- grep: symbol search (I use it about 100x as often as a text grep) or https://github.com/ast-grep/ast-grep
- diff: https://semanticdiff.com (and others), i.e. hide noisy syntax-only changes and attempt to capture moved code. I say attempt because with projectional programming we could have a more expressive notion of code being moved
- sed: https://npmjs.com/package/@codemod/cli
- version control: I'd look towards languages like Unison to see what funky things we could do here, especially for libraries. A general example: no conflicts due to non-semantic changes (re-orderings, irrelevant whitespaces, etc.)
And there are abilities we lose completely by making text the source of truth, like a reliable version control for "this function moved to a new file".
But if you store ASTs, you _have_ to have support for each language in each of the tools (because each language has its own AST). This basically means a major chicken-and-egg problem: a new language won't be compatible with any of the tools, so adoption will be very low until the editor, diff, sed, etc. are all updated... and those tools won't be updated until the language is popular.
And you still don't get any advantages over text! For example, if you really cared about "this function moved to a new file" functionality, you could put a unique id after each function ("def myfunc{f8fa2bdd}...") and have your editor insert and hide them (a sketch of this follows below). This way the IDE can show a clean definition, but grep, git, etc. still work, just with extra noise.
In fact, I bet that any technology that people claim requires non-readable AST files can be implemented as text, with many extra upsides and no major downsides (with the obvious exception of truly graphical things: naive diffs on auto-generated images, graphs, or schematic files are not going to be very useful, no matter what kind of text format is used).
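A toy sketch of the unique-id idea above; the `fn-id` comment format is invented for illustration:

```python
import re
import uuid

ID_RE = re.compile(r"\s*# fn-id:[0-9a-f]{8}")

def tag(source: str) -> str:
    # Append an id comment to each def that doesn't carry one yet.
    out = []
    for line in source.splitlines():
        if line.lstrip().startswith("def ") and not ID_RE.search(line):
            line += f"  # fn-id:{uuid.uuid4().hex[:8]}"
        out.append(line)
    return "\n".join(out)

def hide(source: str) -> str:
    # What an editor would display: the same text minus the markers.
    return ID_RE.sub("", source)

tagged = tag("def myfunc():\n    pass")
print(tagged)        # def myfunc():  # fn-id:<8 hex chars>
print(hide(tagged))  # def myfunc():
```

The file on disk stays greppable and diffable; only the display layer changes.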
Want each person to see their own formatting style? Reformat to the person's style on load and format back to the project style on save (a sketch of this follows below). Modern formatters are so fast, people won't even notice.
Want fast semantic search? Maintain the binary cache files, but use text as source-of-truth.
Want better diff output? Same deal, parse and cache.
Want to have no files, but instead have function list and edit each one directly, a la Smalltalk? Maintain files transparently with text code - maybe one file per function, or one file per class, or one per project...
The reason people keep source code as text is that it's really a global maximum. The non-text format gives you a modest speedup, but at the expense of imposing incredible version-compatibility pain.
I'm also not saying we can just have all these good things: they are not free, and the costs are more spread out, and thus less obviously noticeable, than the ones projectional code imposes.
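A minimal sketch of the reformat-on-load, format-on-save idea from above, using black's actual Python API; the two line lengths stand in for "project style" vs. "my style":

```python
import black

PROJECT_MODE = black.Mode(line_length=88)  # canonical on-disk style
MY_MODE = black.Mode(line_length=120)      # the style I like to read

def on_load(disk_text: str) -> str:
    # Re-render the canonical text into my preferred view.
    return black.format_str(disk_text, mode=MY_MODE)

def on_save(buffer_text: str) -> str:
    # Normalize back to the project style before writing to disk.
    return black.format_str(buffer_text, mode=PROJECT_MODE)
```

The round trip is only stable if the formatter is idempotent, which is exactly the property tools like black advertise.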
If the runtime, then I bet almost no one will notice, especially if the appropriate caching is used.
If the programming-time cost, then sure, but it's not like you can avoid parsers altogether. If the parsers are not in the tools, they must be in the IDE. Factor out that parsing logic and make it a library all the tools can use (or a one-shot LSP server if you are in a language that has hard-to-use bindings).
Note that even with the AST-in-file approach, you _still_ need a library to read and write that AST; it's not like you can have a shared AST schema for multiple languages. So either way, tools like diff will need a wide variety of libraries linked in, one for each language they support. And at that point, there is not much difference between an AST reader and a code parser.
Cross-language libraries don't seem to be super common for this. The recovering-sense-from-text tools I named all use different parsers in their respective languages.
Again, reading (and yes, technically that's also parsing) an AST from a data-exchange-formatted file is orders of magnitude simpler. And for parsing such schemas there are battle-tested cross-language solutions, e.g. protobuf.
* [Difftastic](https://difftastic.wilfred.me.uk/) — my go-to diff tool for years
* [Nu shell](https://www.nushell.sh/) — a promising idea, but still lacking in design/implementation maturity
What I’d really like to see is a *viable projectional editor* and a broader shift from text-centric to data-centric tools.
The issue is that nearly everything we use today (editors, IDEs, coreutils) is built around text, and there’s no agreed-upon data interchange format. There have been attempts (Unison, JetBrains MCP, Nu shell), but none have gained real traction.
Rare “miracles” like the C++ --> Rust migration show paradigm shifts can happen. But a text → projectional transition would be even bigger. For that to succeed, someone influential would need to offer a *clear, opt-in migration path* where:
* some people stick with text-based tools,
* others move to semantic model editing,
* and both can interoperate in the same codebase.
What would be needed:
* Robust, data-native alternatives to [coreutils](https://wiki.archlinux.org/title/Core_utilities), operating directly on structured data (avoid serialize ↔ parse boundaries). Learn from Nushell's mistakes, and aim for future-compatible, stable, battle-tested tools.
* A more declarative-first mindset.
* Strong theoretical foundations for the new paradigm.
* Seamless conversion between text-based and semantic models.
* New tools that work with mainstream languages (not niche reinventions) and enforce correctness at construction time (no invalid programs).
* Integration of the semantic model with existing version control systems.
* Shared standards for semantic models across languages/tools (something on the scale of MCP or LSP — JetBrains' are better, but LSP won thanks to Microsoft's push).
* Dual compatibility in existing editors/IDEs (e.g. VSCode supporting both text files and semantic models).
* Integration of knowledge across many different projects to distill the best way forward -> for example, learn from Roslyn's semantic vs. syntax model, look into tree-sitter, check how difftastic does tree diffing, find tree regex engines, learn from S-expressions and Lisp-like languages, check Unison, adopt the Helix editor/vim editing model, see how it can be integrated with LSP and MCP, etc.
This isn't something you can brute-force — it needs careful planning and design before implementation. The train started on text rails and won't stop, so the only way forward is to *build an alternative track* and make switching both gradual and worthwhile. Unfortunately, that is pretty much impossible for an entity without enough influence.
https://docs.helix-editor.com/syntax-aware-motions.html
https://www.masteringemacs.org/article/combobulate-structure...
https://zed.dev/blog/syntax-aware-editing
Etc etc.
Without tools in mainstream editors, I don't see how it can push us forward instead of staying a niche barely anyone knows about.
And yet it didn't, it reversed. I think the fact that "plain text for all source files" actually won in the actual ecosystem wasn't just because too many developers had the wrong idea/short-sightedness -- because in fact most influential people wanted and believed in what you say. It's because there are real factors that make the level of investment required for the other paths unsustainable, at least compared to the text source path.
It's definitely related to the "victory" of Unix and Unix-style OSs, which is often understood as the victory of a philosophy of doing it cheaper, easier, simpler, faster, "good enough".
It's also got to do with how often languages and platforms change -- both change within a language/platform and languages/platforms rising and falling. Sometimes I wish this was less quick, I'm definitely a guy who wants to develop real expertise with a system by using it over a long time, and think you can work so much more effectively and productively when you have done such. But the actual speed of change of platforms and languages we see depends on reduced cost of tooling.
It's a really subtle difference, but I can't quite put my finger on why it is important. I think of all the little text files I've made over the decades that record information in various different ways, where the only real syntax they share is that they use short lines (80 columns) and use line orientation for semantics (a lah-dee-dah way of saying lots of lists!).
I have a lot of experience of being firmly ensconced in software engineering environments where the only resources being authored and edited were source code files.
But I’ve also had a lot of experience of the kind of admin / project / clerical work where you make up files as you go along. Teaching in a high school was a great place to practice that kind of thing.
Yes. Because YAML exists. And mixing tabs and spaces is horrible in it. And the rules are very finicky.
Optimal tab usage is to emit 2-4 spaces.
XSLT was a DIANA-like pre-parsed representation of DSSSL. Oh, how I miss DSSSL (a Scheme-based SGML transformation language). But no. DSSSL was a Lisp! With hygienic macros! "Yikes," they said, and went and invented XSLT.
The "logic" escapes me to this day.
No. Plain text it is. Human-readable. And grep/sed/diff-able.
Perhaps this is rather a design mistake in how UNIX handles things and is so focused on text.
Would a few decades help in universally having such a translator in all the tools?
I heard this many years ago, when we used Perforce. The Perforce consultant we dealt with told us this as an example of triggers. Back then, I was told that Google was a big Perforce shop (maybe just a part of Google, I dunno).
I have heard that this was one of the goals of developing IDLs. I think the vision was that you could have a dozen different programmers working in multiple languages (for example, C for the drivers, Haskell for the engine, and Lua for the UI). The code would be converted to a common IDL when submitted to configuration management, and then extracted from it when the user looks at it.
I can't see that working, but a lot of stuff that I used to think was crazy, has happened, so, who knows?
I was on an internal tools team doing distinctly unsexy LAMP-stack work, but all the documentation I ever saw talked about perforce/p4.
I had never heard of DIANA but I love old ideas being new again. (Plus you made me laugh)
I guess Lisp still has whitespace? That seems like the only meaningful way it isn't already just what the post is describing.
In theory a system could be made where this level of code isn't what's actually stored and is just a reverse pretty-print-with-my-preferences version of the code, as the post mentions. SBCL compiles my function when I enter it, I can ask SBCL to describe it back to me:
I can also ask SBCL to show me the disassembly; perhaps, again in theory, a system could be made where you can get and edit text at that level of abstraction before putting it back in. (SBCL does actually let you modify the compiled code directly if you felt the urge to do such a thing. You just get a pointer to the given origin address and offset and write away.)
But just going back to the Lisp source form, it's close enough that you could recover the original and format it a few different ways depending on different preferences. E.g. someone might prefer the first expression given to handler-case to be on the same line instead of a new line like I did. But to such a person, is that preference universal, or does it depend on the specific expressions involved? There are other not-strictly-formatting preferences at play here too, like the use of "cl-bcrypt" vs "bcrypt" as the package name; one could also arrange to have no explicit package name at all. My own preferences on both matters are context-sensitive. The closest universal preference I have around this general topic is that I really hate enforced format tools, even if they bent to my specific desires 100% of the time.
I'd say the closest modern renditions of what the post is talking about are expressed by node editors. Unreal's Blueprints or Blender's shader editor are two examples, ETL tools are another. But people tend to work at the node level (and may have formatting arguments about the node layout) rather than a pretty-printed text representation of the same data. I think in the ETL world it's perhaps more common to go under the hood a little and edit some text representation, which may be an XML file (and XML can be pretty-printed for many different preferences) or a series of SQL statements or something CSV or INI like... whether or not that text is a 'canonical' representation or a projection would depend on the tool.
That's true, but there is a very big difference between S-expressions stored as text and other programming languages stored as text because there is a standard representation of S-expressions as text, and Common Lisp provides functions that implement that standard in both directions (READ and PRINT) as part of its standard library. Furthermore, the standard ensures READ-PRINT equivalency, i.e. if you READ the result of PRINTing an object the result is an equivalent object. So there is a one-to-one mapping (modulo copying) between the text form and the internal representation. And, most importantly, the semantics of the language are defined on the internal representation and not the textual form. So if you wanted to store S-expressions in, say, a relational database rather than a text file, that would be an elementary exercise. This is why many CL implementations provide alternative serializations that can be rendered and parsed more efficiently than the standard one, which is designed to be human-readable.
This is in very stark contrast to nearly every other programming language, where the semantics are defined directly on the textual form. The language standard typically doesn't even require that an AST exist, let alone define a canonical form for it. Parsers for other languages are typically embedded deep inside compilers, and not provided as part of the standard library. Every one is bespoke, and they are often byzantine. There are no standard operations for manipulating an AST. If you want to write code that generates code, the output must be text, and the only way to run that code is to parse and compile it using the bespoke parser that is an opaque part of the language compiler. (Note that Python is a notable exception.)
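A small sketch of that exception: with Python's standard ast module, the tree itself, not text, is what gets inspected, rewritten, and executed:

```python
import ast

tree = ast.parse("print(1 + 2)")

# Rewrite the tree directly: turn every addition into multiplication.
for node in ast.walk(tree):
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        node.op = ast.Mult()

exec(compile(tree, "<ast>", "exec"))  # prints 2, i.e. 1 * 2
```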
By that I mean highlighting the diff between these:
With the diff highlighting that `car` changed to `cdr`, rather than just the raw lines being changed.
I'm pretty sure this exists, but it's uncommon (at least to me it's uncommon).
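A toy illustration of the concept (difftastic itself is far more sophisticated): diff two expressions represented as nested lists and report only the subtrees that differ:

```python
def sdiff(a, b, path=()):
    # Yield (path, left, right) for each innermost differing subtree.
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from sdiff(x, y, path + (i,))
    elif a != b:
        yield (path, a, b)

# (car x) vs (cdr x): only the changed symbol is reported.
print(list(sdiff(["car", "x"], ["cdr", "x"])))
# [((0,), 'car', 'cdr')]
```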
Also, structural diff is actually a very hard problem.
It doesn’t get much less formatted than Minified JavaScript, except maybe Perl or Brainfuck.
However 'if (x) == (1) {}' is totally fine with the formatter. As is an assignment of '(x) = (y)'.
It's actively annoying too, because, like, extra parentheses often have important meaning.
For example, consider the following code:
In that case, the code is obviously temporarily commented out (think of an `if` condition with one clause commented out, leaving the rest wrapped in now-redundant parentheses). But Go's formatting will make it so that if you comment it out like that, run fmt, then uncomment it and forget to re-add the parens, you get shot in the foot. I've hit that far more times than it's... uhh, I dunno, I guess removed parentheses I didn't want? I don't write them if I don't want them.
I wonder how many default formatting decisions are made this way (including go fmt, etc.).
Something between "everything fits on one short line" and "every argument gets its own line" would be nice too. Spreading a function definition or call across ten lines when it would fit on two or three doesn't feel like an automatic win.
https://naildrivin5.com/blog/2013/05/17/source-code-typograp...
Now explain a declaration like "char *argv[]"...
> We’ve also re-set the data type such that there is no space between char and * - the data type of both of these variables is “pointer to char”, so it makes more sense to put the space before the argument name, not in the middle the data type’s name (update: it should be pointed out that this only makes sense for a single declaration. A construct like char* a, b will create a pointer to char, a, and a regular char, b).
Ah, yes, the delusional C++ formatting style. At least it's nice that the update provides the explanation why it should be avoided.
You also don't think about dollars differently than other units, just because the sign goes before the number.
I wouldn't draw any conclusions about autoformatters from clang-format.
Status quo fallacy alert. Arguments are not forever mired in a current state of affairs. People can learn and can build tools to help them do better.
This could change quickly; e.g. if Claude or GitHub or (Your Team) decide to prioritize how source code looks.
(That said, it must be possible to make a more sophisticated formatter for the source code too.)
> Some of us even align other parts of our code, such as repeated inline comments
> Now, the arguments block forms a table of three columns. The modifiers make up the first column, the data types are aligned in the second column, and the names are in the third column
These feel like pretty trivial routines that can be encompassed by code formatting.
We can contrive more extreme examples, like the for loop, but super-custom formatting ("typesetting") like that has always made me feel awkward; it feels like it gives license for people to use all manner of arbitrary formatting. The author has some intent, but when you run into an inconsistent code base with lots of things going on, the variance doesn't feel informative or helpful: it sucks and it's a drain.
What's stored is perhaps more minimal, some kind of reference encoding, maybe prettier-ified for JS. The meat of this article to me is that it shouldn't matter: the IDE should let you view and edit as you like:
> Everyone had their own pretty-printing settings for viewing it however they wanted.
What's the point of such a heavy obfuscation of the intent, really? Let's take the first example.
If we are fine with the "lengthy" register, why not use character in full? Or, if we want something shorter, sign would actually be semantically more on point in general. And what's with the star to designate a pointer? Why not sign-pointer? Or pin for short, if we dare to use a pretty straightforward metaphor, so sign-pin. Ah yes, by the way, using "dot" (.) or "dash, greater than" (->) is such typographical nonsense.
And as a side note, *char brings nothing in readability compared to sign-pin-pin. Remember that most people read words, or even word sequences, as a whole. And let's compare **char to something like sign-pin-back-5.
And what about strcpy? Do we want to play code obfuscation games to look smart, being able to decode this pile of letters? What's wrong with string·copy, or even stringcopy (compare photocopy)? Or even simply copy? If we want to avoid a redundant identifier without relying on overloading through argument types, English is rich in synonyms: for example duplicate, replicate, reproduce.
Various parentheses could just as well be optional to ease code browsing if proper typography were already in place, and English already provides many adverbs/prepositions that could replace or complement them with linguistically more usual counterparts.
Speaking of prepositions, using from and to as identifiers for things that would be far more aptly described with nouns is a really confusing choice. What's wrong with origin/source and destination/target? It's also a bit counterproductive to put the identifier, which is the main point of interest, at the very end of its declaration statement.
Equals for assignment is really just an artifact standing in for a more relevant symbol like ← or ≔, since most keyboard layouts stem from disastrous design. But of course, using a more adequate symbol would be pushing for unnecessarily obscure notation.
The mandatory semicolon to end a statement is obviously also typographical nonsense.
If a parameter is to be left blank in for, we would obviously be better served by a separate control-flow construct than by any way of highlighting that it's deliberately not filled in.
So, packing it all together: given that in that case the parentheses and commas are purely ornamental, the compiler could just ignore them and would still have enough information.
But formatting still doesn't matter. Outside of whitespace-dependent languages, formatting is a subjective thing -- it's a people concern, not a computer concern. I can store my JavaScript as an AST if I want to.
It's such a cool idea, though I haven't spent much time using it in anger, so it's hard to say if it's a useful idea.
I'm just waiting for a breakthrough project to show that it's ready for wider adoption. Leaving text-based tooling is a big ask.
The principles behind Unison, for those who haven't read them yet: https://www.unison-lang.org/docs/the-big-idea/#richer-codeba...
> Each Unison definition is identified by a hash of its syntax tree.
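A toy approximation of that in Python, hashing the syntax tree so that formatting differences vanish (real Unison hashes are also independent of local names and of where dependencies live, which this ignores):

```python
import ast
import hashlib

def tree_hash(source: str) -> str:
    # ast.dump omits line/column info by default, so layout is ignored.
    dump = ast.dump(ast.parse(source))
    return hashlib.sha256(dump.encode()).hexdigest()[:12]

# Two formattings of the same function get the same identity.
assert tree_hash("def f(x): return x+1") == tree_hash(
    "def f(x):\n    return (x + 1)"
)
```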
Log statements, however, I think have an effectively unbounded length. Nothing I hate more than a stupid linter turning a sprinkling of logs into 7-line monsters. cargo fmt is especially bad about this. It's so bad.
Sent from my 49” G9 Ultrawide.
All that said, I'm interested in this 132 number: where does it come from?
Interesting here, perhaps, is that even back then it was recognized that different situations favored different display modes.
I'd forgotten that; now that was a fugly font. I don't think anyone ever used it (aside from the "Setup" banner on the settings screen).
I think the low pixel count was rather mitigated by the persistence of the phosphor, though; there are reproductions of the fonts that had to take this into account. See the stuff about font stretching here: https://vt100.net/dec/vt220/glyphs
Really suits each language, IMO. Although I could probably get away with 80, the habit of using Tailwind classes can get messy compared to 120.
16:9 is rarely what you want for anything that is mainly text.
What I actually want from a linter is “120, unless the trailing bits aren’t interesting in which case 140+ is fine”. The ideal rule isn’t hard and fast! It’s not pure science. There’s an art to it.
https://en.wikipedia.org/wiki/Line_length#cite_note-dykip-8
But someone will always have to either scroll horizontally or wrap the text. I’m speaking as someone who often views code on my phone, with a ~40 characters wide screen.
In typography, it’s well accepted that an average of ~66 chars per line increases readability of bulk text, with the theory being that short lines require you to mentally «jump» to the beginning of the next line frequently which interrupts flow, but long lines make it harder to mentally keep track of where you are in each line. There is however a difference between newspapers and books, since shorter ~40-char columns allows rapid skimming by moving your eyes down a column instead of zigzagging through the text.
But I don’t think these numbers translate directly to code, which is usually written with most lines indented (on the left) and most lines shorter than the maximum (few statements are so long). Depending on language, I could easily imagine a line length of 100 leading to an average of ~66 chars per line.
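That intuition is easy to check against a real file; a throwaway sketch, with "example.py" as a placeholder path:

```python
from pathlib import Path

lines = [l for l in Path("example.py").read_text().splitlines() if l.strip()]
print(f"max {max(map(len, lines))}, "
      f"mean {sum(map(len, lines)) / len(lines):.0f} chars per line")
```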
For C/C++ headers, I absolutely despise verbose doxygen bullshit comments spreading relatively straightforward functions across 10 lines of comments and args.
I want to be able to quickly skim function names and then read arguments only if deemed relevant. I don’t want to read every single word.
In my experience, with programming you rarely have lines of 140 printable characters. A lot of it is indentation. So it’s probably rarely a problem to find your way back on the next line.
I like splitting long text, as in log statements, into appropriate source lines, just like you would a Markdown paragraph. As in:
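Perhaps something like this, to give a Python rendition (the message and values are invented); adjacent string literals concatenate implicitly, so each phrase gets its own source line with no trailing operator:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)
user_id, delay = "u123", 30  # placeholder values

log.warning("Cache refresh failed for user %s; "
            "falling back to the stale entry and "
            "retrying in %d seconds.", user_id, delay)
```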
I agree that many formatters are bad about this, like introducing an indent for all but the first content line, or putting the concatenation operator at the front instead of the back, thereby also causing non-uniform alignment of the text content.
Some languages (Java) really need the extra horizontal space if you can afford it, and aren't too hard to read when soft-wrapped.
All of this seems doable; I just think that for the most part we don't care very much about our preferences, and it has very little impact on readability. It's definitely doable, though: we could view the code however we most wanted and have it stored in a different formatting. It might not be 100% round-trip stable, but that probably doesn't matter.
It's always better when the defaults can be overridden and formatting can be forced, and when we only format new and changed lines to reduce potential instability; but then again, go fmt doesn't really suffer from this, so it's possible to make things pretty reliable. It's simple, really: there is a default formatting and the code is stored that way, and our view of choice can then reformat the code as we want it. When it's stored, it's stored in the default.
If you want everyone to see their own preference of format, either write a script or get AI to format it for you.
But I'll also mention that this pretty much already exists. You can have whitespace options for git. I also imagine there's some setup using hooks that uses one formatter locally, and another for remote.
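One concrete version of that hook setup is git's clean/smudge filter mechanism, sketched here; the prettier commands are illustrative stand-ins for any formatter:

```
# .gitattributes: run every .js file through the "fmt" filter
*.js filter=fmt

# .git/config: canonical style going into the repo,
# personal style coming out into the working tree
[filter "fmt"]
    clean  = prettier --stdin-filepath %f
    smudge = prettier --stdin-filepath %f --print-width 120
```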
Also, the common IR already exists: it's just the AST. This was "solved" back in the day, when people were throwing whatever they could at the wall to see what stuck, since it was all so new. With the benefit of hindsight, I think we can say that it's not that good of an idea.
The project is dead enough that they no longer own the TLD for the company. As far as I know, the only remnants of the project are youtube recordings of demos held at conferences.
With things like treesitter and the like, I sometimes daydream about what an efficient and effective HCI for an AST or IR would look like.
Things like F#'s ordered compilation often make code reviews simpler for me, but that's because a piece of the intermediate form (dependency order) is exposed to me as a first-class item. I find it much simpler to reason about compared to small changes in code with laxer ordering requirements, where I often find myself jumping up and down and back and forth in a diff, and between all the related interfaces and abstract classes and implementations, to understand what effect the delta is having on the program as a whole.
Leave code formatting up to the primary owner of the file. It is pretty rare for code to have more than one person doing 95% of the edits on a file, so let that person own the formatting. In the rare case where there are shared files with shared edits, it is OK to mandate some sort of enforced format, but those cases are so rare that they generally aren't worth discussing. The proposed approach here ignores all the messy non-standard stuff that happens at the margins, the rules that are very hard to build in when codifying a personal coding style.
Let me have my messy desk and I'll let you have yours.
The plain text encoding itself exists in a process of incremental, path-dependent development from Morse Code signals to Unicode resulting in a "Gigantic Lookup Table" (GLUT, my coining) approach to symbolic comprehension. The assumption is useful - lots of features can "just work" by knowing that a particular bit pattern is always a particular symbol.
If we push up the abstraction level, we get a different set of symbols that are better suited to the app, but not equivalent GLUT tooling. Instead we usually get parsing of plain text as a transport. For example, CSV parsing. It is sloppy; it is also good enough.
Edit: XML is also a key example. It goes out of its way to respect the text-transport approach. There are dedicated XML editors, but people want to edit it as plain text, and they can't quite get there, because funny business with character encodings gets in the way, adding a bunch of ampersands and semicolons onto the symbols they want to edit. Thus we have ended up with "the CSV of hypertext documents", Markdown.
316 more comments available on Hacker News