Formatting Code Should Be Unnecessary
Posted 4 months ago · maxleiter.com
Key topics: Code Formatting, Programming Languages, Software Development
The article argues that code formatting is unnecessary with the right tooling, sparking a debate on the importance of formatting and the potential benefits and drawbacks of alternative approaches.
Snapshot generated from the HN discussion
Discussion activity: very active; 160 comments loaded, averaging 16 per period, peaking at 37 comments in the first three hours.
Story posted Sep 7, 2025 at 7:08 PM EDT; first comment 22 minutes later; latest activity Sep 9, 2025 at 10:34 AM EDT.
There's a scissor statement that cuts through the formatting debate: if initial space width were configurable in their editor of choice, would those who prefer tabs have any other arguments?
The Unix philosophy, on the other hand, only "thrives" if every other tool is designed around (and contains code to parse) "plain text".
And how did that work out for them?
This seems like one of the many cases where unix won out by being a lowest common denominator. Every platform can handle plain text.
The lowest common denominator rather is binary blobs. :-)
The goal of having every developer viewing the code with their own preferences just isn't that important. On every team I've been on, we just use a standard style guide, enforced by formatter, and while not everyone agrees with every rule, it just doesn't matter. You get used to it.
Arguing and obsessing about code formatting is simply useless bikeshedding.
https://astyle.sourceforge.net/astyle.html#_style=whitesmith
And then someone said: oh yeah? Hold my beer https://astyle.sourceforge.net/astyle.html#_style=pico
Unless it's an accessibility issue, and it is an accessibility issue sometimes.
Bah! So what is more important? Is the average convenience of the herd more important? The average of convenience, as if there were ever such a thing.
What if you really liked reading books in paper format, but were forced to read them on displays for... reasons?
What I would be curious about is tracing from errors back to the source code. Nearly every language I've used prints the line number and the offset on the line for an error. How that worked in the DIANA world would be interesting to learn.
[1]: https://github.com/Wilfred/difftastic
If we had a formatting tool that operated solely on AST, checked in code could be in a canonical form for a given AST. Editors could then parse the AST and display the source with a different formatting of the users choice, and convert to canonical form when writing the file to disk.
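A minimal sketch of that round trip, assuming Python as the language (ast.unparse needs Python 3.9+); the "canonical form" here is simply whatever ast.unparse emits:

```python
import ast

# Source with arbitrary user formatting.
messy = "def add( a,b ):\n        return (a+b)"

# Parse to an AST, then unparse: layout quirks and redundant
# parentheses disappear, leaving one canonical rendering.
canonical = ast.unparse(ast.parse(messy))
print(canonical)  # def add(a, b):
                  #     return a + b
```

An editor would run the reverse direction on load, re-rendering the same tree in the reader's preferred style.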
If we can’t progress our ecosystem because we are reliant on one very specific 50+ year old line parser, then that says more about the inflexibility of the industry to move forward than it does about the “new” ideas being presented.
Grep works great.
Except it already is a solved problem.
If languages compile to a common byte code then you just need one tool. You already see examples of this with things like the IR assembly produced by LLVM, various Microsoft languages that compile to CLR, and the different languages that target JVM.
There are also already common ways to create reusable parsing rules like LSP for IDEs and treesitter.
In fact there are already grep-like utilities that are based on treesitter.
So it's not only very possible to create language-agnostic, reusable tools; these tools already exist and are being used by a great many developers.
The problem raised in the article is that we just don't push these concepts hard enough these days, relying instead on outdated notions of what source code should look like.
> Grep works great
For LF-separated lists it does. But if it worked great for structured content then we wouldn’t be having this conversation to begin with.
So the real choice is either:
- new tool: grep with caching reverse-formatter filter.
- new tool: ast-grep with understanding of AST serialization format for your specific language.
At least in the first case, you still have fall back.
About grep and diff working on a textual representation of the AST: it would be like grepping JavaScript source code when the actual source code is TypeScript, or some more distant language that compiles to JavaScript (does anybody remember CoffeeScript?). We want to see only the source code we typed in.
By the way, add git diff to the list of tools that should work on the AST but show us the real source code.
> Everyone had their own pretty-printing settings for viewing [DIANA] however they wanted.
I'm still confused, because they specifically call the IR DIANA, and they talk about viewing the IR. It isn't clear to me if the IR is more like a bytecode or something, or more like the original source code with a little processing done to it. They also have a quote,
> Grady Booch summarizes it well: R1000 was effectively a DIANA machine. We didn't store source code: source code was simply a pretty-printing of the DIANA tree.
So maybe the other visualizations they could produce by transforming the IR were so nice that nobody even cared to look at the original Ada they'd written to generate it?
All the same tools can exist with a text backend, and you get grep/sed support for free too!
This becomes an issue with, say, CI, where maybe I add a gate to check something with grep. But whose format do I assume? My local format (which I used to test it locally) or the canonical one (which means I need to switch my local format to test it)?
You would use the format on disk for the grep. "Your format" only exists displayed in your editor.
Want it to look like C? Lisp? Pascal? Why not!
Yes, of course, because tab width is *dynamically* flexible, so initial space width isn't enough.
But for "dirty-width" indents, e.g. after some text that can vary in size (proportional fonts, or some special characters even in fixed fonts), you can't align with spaces, while a tab width can be auto-adjusted to match the other line.
You still work with text, the text just isn't the canonical stored representation. You get diffs to resolve only when structure is changed.
You get most of the same benefit with a pre-commit linter hook, though.
What happens when you stage the line `} else return {`? git doesn't let you stage specific AST nodes. It would also mean that you can't stage partial code (code that produces syntax errors).
You would still store text, and still check out text, just transformed text. You could still check in anything you want, including partial code, syntax errors, or any other arbitrary text. Diffs would work the same way they do now.
All of your examples work better for code with structural knowledge:
- grep: symbol search (I use it about 100x as often as a text grep) or https://github.com/ast-grep/ast-grep
- diff: https://semanticdiff.com (and others), i.e. hide noisy syntax-only changes and attempt to capture moved code. I say attempt because with projectional programming we could have a more expressive notion of code being moved
- sed: https://npmjs.com/package/@codemod/cli
- version control: I'd look towards languages like Unison to see what funky things we could do here, especially for libraries. A general example: no conflicts due to non-semantic changes (re-orderings, irrelevant whitespaces, etc.)
And there are abilities we lose completely by making text the source of truth, like a reliable version control for "this function moved to a new file".
But if you store ASTs, you _have_ to have support for each language in each of the tools (because each language has its own AST). This basically means a major chicken-and-egg problem: a new language won't be compatible with any of the tools, so adoption will be very low until the editor, diff, sed, etc. are all updated... and those tools won't be updated until the language is popular.
And you still don't get any advantages over text! For example, if you really cared about "this function moved to a new file" functionality, you could put a unique id after each function ("def myfunc{f8fa2bdd}...") and have your editor insert and hide them (a sketch of this follows below). This way the IDE can show a clean definition, but grep, git, etc. still work, just with extra noise.
In fact, I bet that any technology that people claim requires non-readable AST files can be implemented as text, with many extra upsides and no major downsides (with the obvious exception of truly graphical things: naive diffs on auto-generated images, graphs, or schematic files are not going to be very useful, no matter what kind of text format is used).
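A toy sketch of the unique-id idea above; the `fn-id` comment format is invented for illustration:

```python
import re
import uuid

ID_RE = re.compile(r"\s*# fn-id:[0-9a-f]{8}")

def tag(source: str) -> str:
    # Append an id comment to each def that doesn't carry one yet.
    out = []
    for line in source.splitlines():
        if line.lstrip().startswith("def ") and not ID_RE.search(line):
            line += f"  # fn-id:{uuid.uuid4().hex[:8]}"
        out.append(line)
    return "\n".join(out)

def hide(source: str) -> str:
    # What an editor would display: the same text minus the markers.
    return ID_RE.sub("", source)

tagged = tag("def myfunc():\n    pass")
print(tagged)        # def myfunc():  # fn-id:<8 hex chars>
print(hide(tagged))  # def myfunc():
```

The file on disk stays greppable and diffable; only the display layer changes.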
Want each person to see their own formatting style? Reformat to the person's style on load and format back to the project style on save (a sketch of this follows below). Modern formatters are so fast, people won't even notice.
Want fast semantic search? Maintain the binary cache files, but use text as source-of-truth.
Want better diff output? Same deal, parse and cache.
Want to have no files, but instead have function list and edit each one directly, a la Smalltalk? Maintain files transparently with text code - maybe one file per function, or one file per class, or one per project...
The reason people keep source code as text is that it's really a global maximum. The non-text format gives you a modest speedup, but at the expense of imposing incredible version-compatibility pain.
I'm also not saying we can just have all these good things: they are not free, and the costs are more spread out, and thus less obviously noticeable, than the ones projectional code imposes.
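A minimal sketch of the reformat-on-load, format-on-save idea from above, using black's actual Python API; the two line lengths stand in for "project style" vs. "my style":

```python
import black

PROJECT_MODE = black.Mode(line_length=88)  # canonical on-disk style
MY_MODE = black.Mode(line_length=120)      # the style I like to read

def on_load(disk_text: str) -> str:
    # Re-render the canonical text into my preferred view.
    return black.format_str(disk_text, mode=MY_MODE)

def on_save(buffer_text: str) -> str:
    # Normalize back to the project style before writing to disk.
    return black.format_str(buffer_text, mode=PROJECT_MODE)
```

The round trip is only stable if the formatter is idempotent, which is exactly the property tools like black advertise.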
If the runtime, then I bet almost no one will notice, especially if the appropriate caching is used.
If the programming-time cost, then sure, but it's not like you can avoid parsers altogether. If the parsers are not in the tools, they must be in the IDE. Factor out that parsing logic and make it a library all the tools can use (or a one-shot LSP server if you are in a language that has hard-to-use bindings).
Note that even with the AST-in-file approach, you _still_ need a library to read and write that AST; it's not like you can have a shared AST schema for multiple languages. So either way, tools like diff will need a wide variety of libraries linked in, one for each language they support. And at that point, there is not much difference between an AST reader and a code parser.
Cross-language libraries don't seem to be super common for this. The recovering-sense-from-text tools I named all use different parsers in their respective languages.
Again, reading (and yes, technically that's also parsing) an AST from a data-exchange-formatted file is orders of magnitude simpler. And for parsing such schemas there are battle-tested cross-language solutions, e.g. protobuf.
* [Difftastic](https://difftastic.wilfred.me.uk/) — my go-to diff tool for years
* [Nu shell](https://www.nushell.sh/) — a promising idea, but still lacking in design/implementation maturity
What I’d really like to see is a *viable projectional editor* and a broader shift from text-centric to data-centric tools.
The issue is that nearly everything we use today (editors, IDEs, coreutils) is built around text, and there’s no agreed-upon data interchange format. There have been attempts (Unison, JetBrains MCP, Nu shell), but none have gained real traction.
Rare “miracles” like the C++ --> Rust migration show paradigm shifts can happen. But a text → projectional transition would be even bigger. For that to succeed, someone influential would need to offer a *clear, opt-in migration path* where:
* some people stick with text-based tools,
* others move to semantic model editing,
* and both can interoperate in the same codebase.
What would be needed:
* Robust, data-native alternatives to [coreutils](https://wiki.archlinux.org/title/Core_utilities), operating directly on structured data (avoid serialize ↔ parse boundaries). Learn from Nushell's mistakes, and aim for future-compatible, stable, battle-tested tools.
* A more declarative-first mindset.
* Strong theoretical foundations for the new paradigm.
* Seamless conversion between text-based and semantic models.
* New tools that work with mainstream languages (not niche reinventions) and enforce correctness at construction time (no invalid programs).
* Integration of the semantic model with existing version control systems.
* Shared standards for semantic models across languages/tools (something on the scale of MCP or LSP — JetBrains' are better, but LSP won thanks to Microsoft's push).
* Dual compatibility in existing editors/IDEs (e.g. VSCode supporting both text files and semantic models).
* Integration of knowledge across many different projects to distill the best way forward -> for example, learn from Roslyn's semantic vs. syntax model, look into tree-sitter, check how difftastic does tree diffing, find tree regex engines, learn from S-expressions and Lisp-like languages, check Unison, adopt the Helix editor/vim editing model, see how it can be integrated with LSP and MCP, etc.
This isn't something you can brute-force — it needs careful planning and design before implementation. The train started on text rails and won't stop, so the only way forward is to *build an alternative track* and make switching both gradual and worthwhile. Unfortunately, that is pretty much impossible for an entity without enough influence.
https://docs.helix-editor.com/syntax-aware-motions.html
https://www.masteringemacs.org/article/combobulate-structure...
https://zed.dev/blog/syntax-aware-editing
Etc etc.
Without tools in mainstream editors, I don't see how it can push us forward instead of staying a niche barely anyone knows about.
And yet it didn't, it reversed. I think the fact that "plain text for all source files" actually won in the actual ecosystem wasn't just because too many developers had the wrong idea/short-sightedness -- because in fact most influential people wanted and believed in what you say. It's because there are real factors that make the level of investment required for the other paths unsustainable, at least compared to the text source path.
It's definitely related to the "victory" of Unix and Unix-style OSs, which is often understood as the victory of a philosophy of doing it cheaper, easier, simpler, faster, "good enough".
It's also got to do with how often languages and platforms change -- both change within a language/platform and languages/platforms rising and falling. Sometimes I wish this was less quick, I'm definitely a guy who wants to develop real expertise with a system by using it over a long time, and think you can work so much more effectively and productively when you have done such. But the actual speed of change of platforms and languages we see depends on reduced cost of tooling.
It's a really subtle difference, but I can't quite put my finger on why it is important. I think of all the little text files I've made over the decades that record information in various different ways, where the only real syntax they share is that they use short lines (80 columns) and use line orientation for semantics (a lah-dee-dah way of saying lots of lists!).
I have a lot of experience of being firmly ensconced in software engineering environments where the only resources being authored and edited were source code files.
But I’ve also had a lot of experience of the kind of admin / project / clerical work where you make up files as you go along. Teaching in a high school was a great place to practice that kind of thing.
Yes. Because YAML exists. And mixing tabs and spaces is horrible in it. And the rules are very finicky.
Optimal tab usage is to emit 2-4 spaces.
XSLT was a DIANA-like pre-parsed representation of DSSSL. Oh, how I miss DSSSL (a Scheme-based SGML transformation language). But no. DSSSL was a Lisp! With hygienic macros! "Yikes," they said, and went and invented XSLT.
The "logic" escapes me to this day.
No. Plain text it is. Human-readable. And grep/sed/diff-able.
Perhaps this is rather a design mistake in how UNIX handles things and is so focused on text.
Would a few decades help in universally having such a translator in all the tools?
I heard this many years ago, when we used Perforce. The Perforce consultant we dealt with told us this as an example of triggers. Back then, I was told that Google was a big Perforce shop (maybe just a part of Google, I dunno).
I have heard that this was one of the goals of developing IDLs. I think the vision was that you could have a dozen different programmers working in multiple languages (for example, C for the drivers, Haskell for the engine, and Lua for the UI). The code would be converted to a common IDL when submitted to configuration management, and then extracted from it when the user looks at it.
I can't see that working, but a lot of stuff that I used to think was crazy, has happened, so, who knows?
I was on an internal tools team doing distinctly unsexy LAMP-stack work, but all the documentation I ever saw talked about perforce/p4.
I had never heard of DIANA but I love old ideas being new again. (Plus you made me laugh)
I guess Lisp still has whitespace? That seems like the only meaningful way it isn't already just what the post is describing.
In theory a system could be made where this level of code isn't what's actually stored and is just a reverse pretty-print-with-my-preferences version of the code, as the post mentions. SBCL compiles my function when I enter it, I can ask SBCL to describe it back to me:
I can also ask SBCL to show me the disassembly; perhaps, again in theory, a system could be made where you can get and edit text at that level of abstraction before putting it back in. (SBCL does actually let you modify the compiled code directly if you felt the urge to do such a thing. You just get a pointer to the given origin address and offset and write away.)
But just going back to the Lisp source form, it's close enough that you could recover the original and format it a few different ways depending on different preferences. E.g. someone might prefer the first expression given to handler-case to be on the same line instead of a new line like I did. But to such a person, is that preference universal, or does it depend on the specific expressions involved? There are other not-strictly-formatting preferences at play here too, like the use of "cl-bcrypt" vs "bcrypt" as the package name; one could also arrange to have no explicit package name at all. My own preferences on both matters are context-sensitive. The closest universal preference I have around this general topic is that I really hate enforced format tools, even if they bent to my specific desires 100% of the time.
I'd say the closest modern renditions of what the post is talking about are expressed by node editors. Unreal's Blueprints or Blender's shader editor are two examples, ETL tools are another. But people tend to work at the node level (and may have formatting arguments about the node layout) rather than a pretty-printed text representation of the same data. I think in the ETL world it's perhaps more common to go under the hood a little and edit some text representation, which may be an XML file (and XML can be pretty-printed for many different preferences) or a series of SQL statements or something CSV or INI like... whether or not that text is a 'canonical' representation or a projection would depend on the tool.
That's true, but there is a very big difference between S-expressions stored as text and other programming languages stored as text because there is a standard representation of S-expressions as text, and Common Lisp provides functions that implement that standard in both directions (READ and PRINT) as part of its standard library. Furthermore, the standard ensures READ-PRINT equivalency, i.e. if you READ the result of PRINTing an object the result is an equivalent object. So there is a one-to-one mapping (modulo copying) between the text form and the internal representation. And, most importantly, the semantics of the language are defined on the internal representation and not the textual form. So if you wanted to store S-expressions in, say, a relational database rather than a text file, that would be an elementary exercise. This is why many CL implementations provide alternative serializations that can be rendered and parsed more efficiently than the standard one, which is designed to be human-readable.
This is in very stark contrast to nearly every other programming language, where the semantics are defined directly on the textual form. The language standard typically doesn't even require that an AST exist, let alone define a canonical form for it. Parsers for other languages are typically embedded deep inside compilers, and not provided as part of the standard library. Every one is bespoke, and they are often byzantine. There are no standard operations for manipulating an AST. If you want to write code that generates code, the output must be text, and the only way to run that code is to parse and compile it using the bespoke parser that is an opaque part of the language compiler. (Note that Python is a notable exception.)
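A small sketch of that exception: with Python's standard ast module, the tree itself, not text, is what gets inspected, rewritten, and executed:

```python
import ast

tree = ast.parse("print(1 + 2)")

# Rewrite the tree directly: turn every addition into multiplication.
for node in ast.walk(tree):
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        node.op = ast.Mult()

exec(compile(tree, "<ast>", "exec"))  # prints 2, i.e. 1 * 2
```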
By that I mean highlighting the diff between these:
With the diff highlighting that `car` changed to `cdr`, rather than just the raw lines being changed.
I'm pretty sure this exists, but it's uncommon (at least to me it's uncommon).
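A toy illustration of the concept (difftastic itself is far more sophisticated): diff two expressions represented as nested lists and report only the subtrees that differ:

```python
def sdiff(a, b, path=()):
    # Yield (path, left, right) for each innermost differing subtree.
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from sdiff(x, y, path + (i,))
    elif a != b:
        yield (path, a, b)

# (car x) vs (cdr x): only the changed symbol is reported.
print(list(sdiff(["car", "x"], ["cdr", "x"])))
# [((0,), 'car', 'cdr')]
```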
Also, structural diff is actually a very hard problem.
It doesn’t get much less formatted than Minified JavaScript, except maybe Perl or Brainfuck.
However 'if (x) == (1) {}' is totally fine with the formatter. As is an assignment of '(x) = (y)'.
It's actively annoying too, because, like, extra parentheses often have important meaning.
For example, consider the following code:
In that case, the code is obviously temporarily commented out (think of an `if` condition with one clause commented out, leaving the rest wrapped in now-redundant parentheses). But Go's formatting will make it so that if you comment it out like that, run fmt, then uncomment it and forget to re-add the parens, you get shot in the foot. I've hit that far more times than it's... uhh, I dunno, I guess removed parentheses I didn't want? I don't write them if I don't want them.
I wonder how many default formatting decisions are made this way (including go fmt, etc.).
Something between "everything fits on one short line" and "every argument gets its own line" would be nice too. Spreading a function definition or call across ten lines when it would fit on two or three doesn't feel like an automatic win.
https://naildrivin5.com/blog/2013/05/17/source-code-typograp...
Now explain a declaration like "char *argv[]"...
> We’ve also re-set the data type such that there is no space between char and * - the data type of both of these variables is “pointer to char”, so it makes more sense to put the space before the argument name, not in the middle the data type’s name (update: it should be pointed out that this only makes sense for a single declaration. A construct like char* a, b will create a pointer to char, a, and a regular char, b).
Ah, yes, the delusional C++ formatting style. At least it's nice that the update provides the explanation why it should be avoided.
You also don't think about dollars differently than other units, just because the sign goes before the number.
I wouldn't draw any conclusions about autoformatters from clang-format.
Status quo fallacy alert. Arguments are not forever mired in a current state of affairs. People can learn and can build tools to help them do better.
This could change quickly; e.g. if Claude or GitHub or (Your Team) decide to prioritize how source code looks.
(That said, it must be possible to make a more sophisticated formatter for the source code too.)
> Some of us even align other parts of our code, such as repeated inline comments
> Now, the arguments block forms a table of three columns. The modifiers make up the first column, the data types are aligned in the second column, and the names are in the third column
These feel like pretty trivial routines that can be encompassed by code formatting.
We can contrive more extreme examples, like the for loop, but super-custom formatting ("typesetting") like that has always made me feel awkward; it feels like it gives license for people to use all manner of arbitrary formatting. The author has some intent, but when you run into an inconsistent code base with lots of things going on, the variance doesn't feel informative or helpful: it sucks and it's a drain.
What's stored is perhaps more minimal, some kind of reference encoding, maybe prettier-ified for JS. The meat of this article to me is that it shouldn't matter: the IDE should let you view and edit as you like:
> Everyone had their own pretty-printing settings for viewing it however they wanted.
What's the point of such a heavy obfuscation of the intent, really? Let's take the first example.
If we are fine with the "lengthy" register, why not use character in full? Or, if we want something shorter, sign would actually be semantically more on point in general. And what's with the star to designate a pointer? Why not sign-pointer? Or pin for short, if we dare to use a pretty straightforward metaphor, so sign-pin. Ah yes, by the way, using "dot" (.) or "dash, greater than" (->) is such typographical nonsense.
And as a side note, *char brings nothing in readability compared to sign-pin-pin. Remember that most people read words, or even word sequences, as a whole. And let's compare **char to something like sign-pin-back-5.
And what about strcpy? Do we want to play code obfuscation games to look smart, being able to decode this pile of letters? What's wrong with string·copy, or even stringcopy (compare photocopy)? Or even simply copy? If we want to avoid a redundant identifier without relying on overloading through argument types, English is rich in synonyms: for example duplicate, replicate, reproduce.
Various parentheses could just as well be optional to ease code browsing if proper typography were already in place, and English already provides many adverbs/prepositions that could replace or complement them with linguistically more usual counterparts.
Speaking of prepositions, using from and to as identifiers for things that would be far more aptly described with nouns is a really confusing choice. What's wrong with origin/source and destination/target? It's also a bit counterproductive to put the identifier, which is the main point of interest, at the very end of its declaration statement.
Equals for assignment is really just an artifact standing in for a more relevant symbol like ← or ≔, since most keyboard layouts stem from disastrous design. But of course, using a more adequate symbol would be pushing for unnecessarily obscure notation.
The mandatory semicolon to end a statement is obviously also typographical nonsense.
If a parameter is to be left blank in for, we would obviously be better served by a separate control-flow construct than by any way of highlighting that it's deliberately not filled in.
So, packing it all together: given that in that case the parentheses and commas are purely ornamental, the compiler could just ignore them and would still have enough information.
But formatting still doesn't matter. Outside of whitespace-dependent languages, formatting is a subjective thing -- it's a people concern, not a computer concern. I can store my JavaScript as an AST if I want to.
It's such a cool idea, though I haven't spent much time using it in anger, so it's hard to say if it's a useful idea.
I'm just waiting for a breakthrough project to show that it's ready for wider adoption. Leaving text-based tooling is a big ask.
The principles behind Unison, for those who haven't read them yet: https://www.unison-lang.org/docs/the-big-idea/#richer-codeba...
> Each Unison definition is identified by a hash of its syntax tree.
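A toy approximation of that in Python, hashing the syntax tree so that formatting differences vanish (real Unison hashes are also independent of local names and of where dependencies live, which this ignores):

```python
import ast
import hashlib

def tree_hash(source: str) -> str:
    # ast.dump omits line/column info by default, so layout is ignored.
    dump = ast.dump(ast.parse(source))
    return hashlib.sha256(dump.encode()).hexdigest()[:12]

# Two formattings of the same function get the same identity.
assert tree_hash("def f(x): return x+1") == tree_hash(
    "def f(x):\n    return (x + 1)"
)
```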
Log statements, however, I think have an effectively unbounded length. Nothing I hate more than a stupid linter turning a sprinkling of logs into 7-line monsters. cargo fmt is especially bad about this. It's so bad.
Sent from my 49” G9 Ultrawide.
All that said, I'm interested in this 132 number: where does it come from?
Interesting here, perhaps, is that even back then it was recognized that different situations favored different display modes.
I'd forgotten that; now that was a fugly font. I don't think anyone ever used it (aside from the "Setup" banner on the settings screen).
I think the low pixel count was rather mitigated by the persistence of the phosphor, though; there are reproductions of the fonts that had to take this into account. See the stuff about font stretching here: https://vt100.net/dec/vt220/glyphs
Really suits each language, IMO. Although I could probably get away with 80, the habit of using Tailwind classes can get messy compared to 120.
16:9 is rarely what you want for anything that is mainly text.
What I actually want from a linter is “120, unless the trailing bits aren’t interesting in which case 140+ is fine”. The ideal rule isn’t hard and fast! It’s not pure science. There’s an art to it.
https://en.wikipedia.org/wiki/Line_length#cite_note-dykip-8
But someone will always have to either scroll horizontally or wrap the text. I’m speaking as someone who often views code on my phone, with a ~40 characters wide screen.
In typography, it’s well accepted that an average of ~66 chars per line increases readability of bulk text, with the theory being that short lines require you to mentally «jump» to the beginning of the next line frequently which interrupts flow, but long lines make it harder to mentally keep track of where you are in each line. There is however a difference between newspapers and books, since shorter ~40-char columns allows rapid skimming by moving your eyes down a column instead of zigzagging through the text.
But I don’t think these numbers translate directly to code, which is usually written with most lines indented (on the left) and most lines shorter than the maximum (few statements are so long). Depending on language, I could easily imagine a line length of 100 leading to an average of ~66 chars per line.
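That intuition is easy to check against a real file; a throwaway sketch, with "example.py" as a placeholder path:

```python
from pathlib import Path

lines = [l for l in Path("example.py").read_text().splitlines() if l.strip()]
print(f"max {max(map(len, lines))}, "
      f"mean {sum(map(len, lines)) / len(lines):.0f} chars per line")
```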
For C/C++ headers, I absolutely despise verbose doxygen bullshit comments spreading relatively straightforward functions across 10 lines of comments and args.
I want to be able to quickly skim function names and then read arguments only if deemed relevant. I don’t want to read every single word.
In my experience, with programming you rarely have lines of 140 printable characters. A lot of it is indentation. So it’s probably rarely a problem to find your way back on the next line.
I like splitting long text, as in log statements, into appropriate source lines, just like you would a Markdown paragraph. As in:
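Perhaps something like this, to give a Python rendition (the message and values are invented); adjacent string literals concatenate implicitly, so each phrase gets its own source line with no trailing operator:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)
user_id, delay = "u123", 30  # placeholder values

log.warning("Cache refresh failed for user %s; "
            "falling back to the stale entry and "
            "retrying in %d seconds.", user_id, delay)
```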
I agree that many formatters are bad about this, like introducing an indent for all but the first content line, or putting the concatenation operator at the front instead of the back, thereby also causing non-uniform alignment of the text content.
Some languages (Java) really need the extra horizontal space if you can afford it, and aren't too hard to read when soft-wrapped.
All of this seems doable; I just think that for the most part we don't care very much about our preferences, and it has very little impact on readability. It's definitely doable, though: we could view the code however we most wanted and have it stored in a different formatting. It might not be 100% round-trip stable, but that probably doesn't matter.
It's always better when the defaults can be overridden and formatting can be forced, and when we only format new and changed lines to reduce potential instability; but then again, go fmt doesn't really suffer from this, so it's possible to make things pretty reliable. It's simple, really: there is a default formatting and the code is stored that way, and our view of choice can then reformat the code as we want it. When it's stored, it's stored in the default.
If you want everyone to see their own preference of format, either write a script or get AI to format it for you.
But I'll also mention that this pretty much already exists. You can have whitespace options for git. I also imagine there's some setup using hooks that uses one formatter locally, and another for remote.
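One concrete version of that hook setup is git's clean/smudge filter mechanism, sketched here; the prettier commands are illustrative stand-ins for any formatter:

```
# .gitattributes: run every .js file through the "fmt" filter
*.js filter=fmt

# .git/config: canonical style going into the repo,
# personal style coming out into the working tree
[filter "fmt"]
    clean  = prettier --stdin-filepath %f
    smudge = prettier --stdin-filepath %f --print-width 120
```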
Also, the common IR already exists: it's just the AST. This was "solved" back in the day, when people were throwing whatever they could at the wall to see what stuck, since it was all so new. With the benefit of hindsight, I think we can say that it's not that good of an idea.
The project is dead enough that they no longer own the TLD for the company. As far as I know, the only remnants of the project are youtube recordings of demos held at conferences.
With things like treesitter and the like, I sometimes daydream about what an efficient and effective HCI for an AST or IR would look like.
Things like F#'s ordered compilation often make code reviews simpler for me, but that's because a piece of the intermediate form (dependency order) is exposed to me as a first-class item. I find it much simpler to reason about compared to small changes in code with laxer ordering requirements, where I often find myself jumping up and down and back and forth in a diff, and between all the related interfaces and abstract classes and implementations, to understand what effect the delta is having on the program as a whole.
Leave code formatting up to the primary owner of the file. It is pretty rare for code to have more than one person doing 95% of the edits on a file, so let that person own the formatting. In the rare case where there are shared files with shared edits, it is OK to mandate some sort of enforced format, but those cases are so rare that they generally aren't worth discussing. The proposed approach here ignores all the messy non-standard stuff that happens at the margins, the rules that are very hard to build in when codifying a personal coding style.
Let me have my messy desk and I'll let you have yours.
The plain text encoding itself exists in a process of incremental, path-dependent development from Morse Code signals to Unicode resulting in a "Gigantic Lookup Table" (GLUT, my coining) approach to symbolic comprehension. The assumption is useful - lots of features can "just work" by knowing that a particular bit pattern is always a particular symbol.
If we push up the abstraction level, we get a different set of symbols that are better suited to the app, but not equivalent GLUT tooling. Instead we usually get parsing of plain text as a transport. For example, CSV parsing. It is sloppy; it is also good enough.
Edit: XML is also a key example. It goes out of its way to respect the text-transport approach. There are dedicated XML editors, but people want to edit it as plain text, and they can't quite get there, because funny business with character encodings gets in the way, adding a bunch of ampersands and semicolons onto the symbols they want to edit. Thus we have ended up with "the CSV of hypertext documents", Markdown.
316 more comments available on Hacker News