Syntax Highlighting Is a Waste of an Information Channel (2020)
Source: buttondown.com
Key topics: Code Editors, Syntax Highlighting, Programming Productivity
The article argues that syntax highlighting is underutilized and proposes alternative uses for color in code editors, sparking a discussion on the effectiveness of different highlighting approaches.
Snapshot generated from the HN discussion (based on 160 loaded comments)
Story posted: Oct 12, 2025 at 8:48 PM EDT
First comment: Oct 12, 2025 at 10:00 PM EDT (1h after posting)
Peak activity: 66 comments in Day 5
Latest activity: Oct 22, 2025 at 12:01 AM EDT
ID: 45563576, Type: story
since it takes me effort to actually parse the colors, this is a constant distraction.
so I can read monochrome text just fine, but multi-colored text really slows me down.
Panic's "Nova" does this. It lets you pick your palette for the parentheses from about 30 choices. It also adds vertical lines along the left edge to show indentation level and lets you choose from the same palettes. There are three at the bottom of both palette lists designed for protanopia/deuteranopia/tritanopia.
I can't tell some apart, but because they've got a different colour in-between, it makes it easier to jump between start and end of expressions. Being able to box blocks in my head faster.
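The depth-based coloring described in the two comments above can be sketched in a few lines of Python. This is only an illustration: the three ANSI colors in `PALETTE` are an arbitrary assumption (not one of Nova's palettes), and the function ignores strings and comments.

```python
# Hypothetical palette: any cycle of visibly different ANSI colors would do.
PALETTE = ["\x1b[31m", "\x1b[33m", "\x1b[36m"]  # red, yellow, cyan
RESET = "\x1b[0m"


def rainbow(text):
    """Wrap each matched parenthesis in a color chosen by its nesting depth."""
    out, depth = [], 0
    for ch in text:
        if ch == "(":
            out.append(PALETTE[depth % len(PALETTE)] + ch + RESET)
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1  # a closer gets the same color as its opener
            out.append(PALETTE[depth % len(PALETTE)] + ch + RESET)
        else:
            out.append(ch)  # ordinary text and stray closers stay uncolored
    return "".join(out)


print(rainbow("(defun f (x) (+ x 1))"))
```

Because adjacent depths always use different palette entries, a pair can be picked out by the color that sits between it and its neighbors, which is the effect the commenter describes.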
That being said, I always have to tweak accessibility settings anyway. Change of font, change of size. Having to toggle off rainbow as well doesn't seem to really add to the large list of things.
I would actually prefer the opposite. Render the () characters as different matching glyph pairs. The space for distinctive asymmetric glyphs is a lot larger and not generally very loaded because people code 99.9% in ascii.
Interesting. But how many matching glyph pairs do we actually have in current fonts?
What editors like VSCode do instead is to highlight the matching parenthesis when the cursor is over the other. That way you don't have to hunt for a matching color.
And to make any unmatched opening/closing parenthesis bright red. The invalid state is made very clear.
Together, these are much more effective.
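Both behaviors mentioned here (jump to the partner of the bracket under the cursor, flag unmatched brackets) fall out of a single stack pass. The following is a minimal sketch, not how VSCode implements it; it does not skip brackets inside strings or comments.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def match_brackets(text):
    """Return (pairs, unmatched).

    pairs maps the index of every matched bracket to its partner's index;
    unmatched is a sorted list of indices of brackets with no partner.
    """
    stack = []  # indices of openers that are still waiting for a closer
    pairs, unmatched = {}, []
    for i, ch in enumerate(text):
        if ch in OPENERS:
            stack.append(i)
        elif ch in PAIRS:
            if stack and text[stack[-1]] == PAIRS[ch]:
                j = stack.pop()
                pairs[i], pairs[j] = j, i
            else:
                unmatched.append(i)  # closer with no fitting opener
    unmatched.extend(stack)  # openers that were never closed
    return pairs, sorted(unmatched)


pairs, bad = match_brackets("f(a[0]))")
print(pairs.get(1), bad)
```

An editor would look up `pairs[cursor_index]` to highlight the partner and paint every index in `unmatched` red.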
I've used the Kate editor for years; it has a short list of strings that are auto-highlighted... and that I use frequently. If only I could edit and add to that list... wherever it's located!
If only there were a way I could highlite -one- string, and then use a single key to move from that instance to the next!
And hlsearch isn't needed at all; that just controls whether or not the word + its search results get highlighted.
Other things it can show you via highlighting:
1. Bugs. The online static analysis will highlight code likely to be in error.
2. Dead code. It's rendered in grey.
3. Code that won't execute in this debugger session. Same.
4. Identifiers you chose to temporarily highlight.
5. Mutable vs immutable variables. Also: mutable variables that are never actually mutated.
6. The assertion that failed in the last unit test run.
You can also create your own smart highlighters using semantic search (it's sort of a grep for ASTs).
And a gazillion more. People still using plain programmer text editors are missing out on a lot of features.
From Wikipedia: "Coloring in colorForth has semantic meaning. Red words start a definition, and green words are compiled into the current definition... Yellow words are executed. The transition from green to yellow and back again can be used while defining words, to transition between compiling words into the current definition, executing words immediately (manipulating the data stack during compilation), and back again (adding the top of the data stack to the current definition) – in other words, precomputing a value during compilation (a functionality that other languages use macros or optimizing compilers for)"
If you're defining 'SWAP', you type it in red. If you want to execute it immediately, type it in yellow. If you want it to be compiled and executed when the code is run later, type it in green.
All that said, I'm one who appreciates information density! How about coloring branching code paths/call stacks?
My keyboard has a concept of "layers," which allows each key to map differently depending on the layer. I've seen this used to make a numpad or to have a QWERTY and DVORAK layer. What if highlighting was the same? Instead of competing for priority over the color channels, developers could explicitly swap layers?
The downside with broken syntax highlighting (and electric-indent!) is when the editor's parser is insufficient, as is often the case with basic online editors, and breaks with legitimate constructs (Emacs with certain C macros). Then I can't trust the highlighter and also I have less-legible code.
These days I rely on clangd driven autoindent (which is fast enough to do every line), but I still use emacs because it is so easy to tweak the interaction to clangd to work exactly as I prefer.
I was thinking about coloring logic/scope blocks as a way to help visualize scope and flow. Even if it required static analysis and a simple script, it could be useful when I need to debug.
I agree with the broader point of the article that color is underused, but the state of the art has moved way past what the author’s tools are currently configured to provide.
The author seemed to be unfamiliar with tree-sitter (first appeared in 2018) and incorrectly assumed Atom used TextMate.
Since then it's gotten much more popular and adopted by other editors.
Have I missed an important development in tree-sitter? Can it now do such things?
I used to reply that the color pens made it easier to keep context such as what teacher said was important, what I found difficult, when in the note I had an "aha!" moment, side comment from me, Q&A asked by student during lecture, or how certain things written down now is related to the point made earlier/later in the lecture/notes.
Text (note) is the content but our attention (at least mine) is not really made for plain text. There's so much more you can play with in visual information.
As an example, imagine writing an eyewitness report about something that happened. There are facts that can be backed up with primary sources, then there is one's own hearsay. The hearsay might be important for telling the whole story, however, it is only hearsay. It has less value than something that can be proven with other sources. So maybe there is a suitable colour for that.
I can mark up my HTML with articles, sections, asides and the rest of it, so that everything is very well structured, and yet none of that is visible - I might as well go for a 'sea of divs'. If I mark HTML up with classes, for example 'hearsay', then I can use that to do some colour things.
Where I speculate or advance a hypothesis, I can use different colours again.
Same with too much detail, I can put that in a colour that can make it easier for the reader to skip, furthermore, I can put that behind details/summary elements.
I need not tell the reader what the significance of the colours amounts to, however, if done well, it can be made to work without the reader knowing why, or needing to care about that. There is going to be an art to this, and pulling it off will require work, but I think it is fully doable.
I had best get busy!
I’ve often thought about stealing some of these techniques for note taking as it would be really useful to have extra context without having to write it all out longhand.
Check out lojban attitudinals for one example along these lines if you are interested. There are natural languages that do this as well but I can’t remember any specifics off the top of my head
To solve it we need to be able to describe the structured content of a document without rendering it, and that means we need an embedding language for code documents.
I hope this doesn't sound overly technical: I'm just borrowing ideas from web browsers. I think of my project as being the creation of a DOM for code documents. The DOM serves a similar function. A semantic HTML document has meaning independent of its rendered presentation and so it can be rendered many ways.
CSTML is my novel embedding language for code. You could think of it like a safe way to hold or serialize an arbitrary parse tree. Like HTML, a CSTML document has "inner text", which in this case is the source text of the program the parser saw. E.g. a tiny document might be `<Boolean> 'true' </>`. The parser injects node tags into the source text, creating what is essentially the perfect data stream to feed a syntax highlighter. To do the highlighting you print the string content of the document and use the control tags to decide on color. This is actually already how we syntax highlight the output from our own CLI as it happens. We use our streaming parser technology to parse our log output into a CSTML tag stream (in real time) and then we just swap out open and close node tags for ANSI escape codes, print the strings, and send that stream to stdout.
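The "swap tags for ANSI escape codes" step can be illustrated with a toy stream. Everything here is an assumption made for the sketch: the `(kind, value)` tuples, the node type names, and the color table are hypothetical and are not the real CSTML format or the BABLR API.

```python
# Hypothetical color table keyed by node type.
ANSI = {"Boolean": "\x1b[35m", "String": "\x1b[32m"}
RESET = "\x1b[0m"


def render(stream):
    """Turn a stream of ("open", type) / ("text", s) / ("close", None) events
    into a string with ANSI color codes in place of the node tags."""
    out = []
    stack = []  # color code of every node that is currently open ("" = none)
    for kind, value in stream:
        if kind == "open":
            stack.append(ANSI.get(value, ""))
            out.append(stack[-1])
        elif kind == "close":
            stack.pop()
            out.append(RESET)
            # Restore the nearest enclosing color, if there is one.
            out.append(next((c for c in reversed(stack) if c), ""))
        else:
            out.append(value)
    return "".join(out)


print(render([("open", "Boolean"), ("text", "true"), ("close", None)]))
```

Because the stream is consumed one event at a time, the same loop works on output that is still being produced, which matches the real-time use the comment mentions.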
Here's a more complicated document generated from a real parse: https://gist.github.com/conartist6/412920886d52cb3f4fdcb90e3...
Technically right now BABLR turns text into parse trees but it doesn't render the trees, so it doesn't have any firsthand concept of styling. If you print the content of a CSTML document to the terminal, you'll have to style it with ANSI codes. If you want to print the document to a web page, you'll have to style it with CSS. Right now we leave that part as an exercise to the user. The tree has the data needed to achieve any of the results you suggest, and as time goes on we will do better at providing higher level APIs that make it really easy to implement those kinds of code-semantic styling rules
We also think that at some point in the future we could run Tree-sitter grammars without first compiling them from JS to C or wasm.
Our major innovations over Tree-sitter are scripted grammars (no compile step), streaming parsing, and the idea that we are a standalone complete source of truth for an IDE, where Tree-sitter only wants to be half the story: it expects to sync with a text buffer where the text buffer is the source of truth.
So that XML-like tree would become the source of truth?
> where Tree-sitter only wants to be half the story: it expects to sync with a text buffer where the text buffer is the source of truth.
There are probably a ton of reasons for this, e.g. 1) The source of truth at the file system level is actually the bare text. 2) Performance reasons. 3) Stuff like git diff is easier to implement.
Because we're designing a new and radically simpler IDE we'll just skip the part where the source of truth is bare text on disk (i.e. the git working tree).
We still will be able to read and write flat text files from disk if the user needs to, but our reason for being is to see what kind of good things we could make happen if we cut out the middleman and make the IDE's state/history (i.e. undos) and the VCS state/history one and the same. To that end our in-memory representation of the CSTML trees is a reference-immutable btree with efficient copy-writes through deep structural reuse.
I would explicitly push back on one idea though, which is that our approach is monolithic. Yes our tech stack looks foreign from the outside (and it is) but inside it's quite nicely broken down into different layers and libraries with well-differentiated responsibilities. The core of the IDE is so incredibly lightweight that we embed it into our blog posts to parse and syntax highlight our code examples. That gives you a little hint of how we intend to get the tech into the hands of a lot more people. We intend to be able to give them a whole IDE that runs effortlessly in their web browser!
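"Structural reuse" in an immutable tree usually means path copying: an edit rebuilds only the nodes between the root and the change, and every other subtree is shared by reference between the old and new versions. The sketch below shows that idea on a plain n-ary tree. It is an assumption-laden illustration, not the B-tree the commenter describes, and `Node` and `replace_at` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Node(NamedTuple):
    label: str
    children: Tuple["Node", ...] = ()


def replace_at(root, path, new):
    """Return a new tree in which the node at `path` (child indices from the
    root) is replaced by `new`. The old tree is left untouched."""
    if not path:
        return new
    i = path[0]
    kids = root.children
    rebuilt = replace_at(kids[i], path[1:], new)
    # Only this node is copied; siblings are reused as-is.
    return root._replace(children=kids[:i] + (rebuilt,) + kids[i + 1:])


v1 = Node("root", (Node("a"), Node("b", (Node("c"),))))
v2 = replace_at(v1, (1, 0), Node("d"))
print(v2.children[0] is v1.children[0])
```

Keeping `v1` around after the edit costs almost nothing, which is what makes it plausible to treat undo history and version history as the same list of roots.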
[BEGIN: talking way beyond my ken] On the other hand, maybe you can make the case that, in aggregate, compute time and memory are saved by having this consolidated tree (rather than having each IDE feature make its own special-purpose tree-like structure). However, aggregate savings probably don't always help--I'm thinking raw editor latency (particularly in larger/more complex/more error-ridden contexts). Then maybe you're reduced to hacking around latency via stuff like optimistic updates, fudging transactions on your tree structure, or whatever.
Yeah, this is the sort of thing I was trying to gesture at when I mentioned "fudging transactions." But if you don't actually have to fudge stuff and can do proper partial evaluation, that's super cool! Presumably it's not too hard to build synchronization on top once you've got that nice foundation.
It's super that you're a UI person who wants things to feel fast! Imo that gives you a big advantage in terms of design sensibility compared to someone who's more deeply a data structures person. Best of luck!
Before that, I completed arguably the 2nd largest syntax highlighting: ISC Bind9 (most versions)
https://github.com/egberts/vim-syntax-nftables
https://github.com/egberts/vim-syntax-bind-named
My secret weapon was using a smaller highlighted syntax to project even faster completion of these larger syntax trees: EBNF
http://github.com/egberts/vim-syntax-ebnf
The real magic trick is that I used S-expression to pull up all the first-encounter/deeply-nested keywords touched to its Vim syntax 'nextgroup=', and region block-offs.
Basically said, I compiled complex EBNF into Vimscript, out of zeal and need for a pure-deterministic LL(1) syntax tree. (Vim regex is weird: within a single regex string you must order from the largest static pattern first to the most wildly wildcard pattern last).
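The ordering rule in the parenthetical applies to any regex engine whose alternation takes the first branch that matches rather than the longest. The sketch below demonstrates it with Python's `re` module instead of Vim regex (a deliberate substitution), and the keyword list is made up for the example.

```python
import re


def keyword_pattern(words):
    """Build one alternation with the longest literal first, so that a short
    keyword can never shadow a longer keyword it is a prefix of."""
    ordered = sorted(words, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered))


naive = re.compile("in|inet|inet6")  # short branch listed first
ordered = keyword_pattern(["in", "inet", "inet6"])

print(naive.match("inet6 addr").group())    # in
print(ordered.match("inet6 addr").group())  # inet6
```

Fully wildcarded branches would go after all literal ones for the same reason: they match almost anything, so placing them earlier hides the more specific alternatives.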
Rainbow nest braces, command/statement/keyword/unit/integer coloring.
For my next trick, I need to determine which route to go next (maybe HN can help me here).
- JetBrains' proprietary LSP
- VSCode textmate LSP?
- treesitter
- or something more LSP mainstream, if any.
Kinda disappointed that there is no holy grail for both syntax-highlighting and autocompletion.
Was looking forward to adding hint-hover as well.
The first thing you do in a JetBrains language is to write a lexer and parser for the target language. Your parser produces a syntax tree containing enough information to reconstruct the original document, and the IDE then operates on this semi-abstract syntax tree. When the IDE saves a file, it re-generates the contents from the semi-AST.
JetBrains' products are best understood as a refactoring engine (their original product) skinned with an editor.
Also just polished the EBNF to S-expression.
So a common glue (via 6 schema) has been found for VSCode, LSP, JetBrains and Vimscript.
with colors this is perfectly readable, because if, for and return appear in red, and other keywords in blue. so they stand out, making the structure more visible than without colors.
without colors i might prefer to write something like this. using braces around each block, and line breaks, to make each part stand out. without colors clearly the second is easier to read.
github uses a different color scheme but maybe you can get the idea:
https://github.com/pikelang/Pike/blob/fe4b7ef78cc26316e62e79...
I've never seen that language. Looks C-like (e.g. sizeof), but seems to have a harder type system.
A bit of a quibble, but the information is already "in the code", even when viewing it in a simple text-editor.
What you're describing is redundant display of information in the raw text, as opposed to a different redundant display the IDE does automatically on the fly when needed.
Sometimes that's useful, and other times it harms comprehension, like when you have dozens of integer variables sharing a prefix like `i_apples` and `i_pears`. An IDE lets you switch presentation modes without making changes.
Also languages are already redundant in a lot of places, because humans prefer it that way. For example you wouldn't really need to have types in a function signature, because it is already in the declaration. I think a lot of '(' or ',' could be omitted without it becoming ambiguous.
For source code, that suggests a spectrum between:
1. Important information is declared in just one spot, and it is difficult to make anything inconsistent. You rely heavily on analysis tools (IDE) to provide combined views of information in useful places so that you don't need to keep tracing back to the single source of truth.
2. Important information is duplicated in different areas, where any mismatch between primary and secondary sites makes potential for corruption. Tools are used heavily to block corrupt states from being loaded, or to "repair" bad secondary sites, and to recheck relationships to avoid saving corrupt states.
Personally I feel erring towards #1 is better: While unassisted-reading of the code becomes marginally harder, it doesn't ruin the ability to make unassisted-changes of the code.
Also I think it's a bit off the mark to think of it as being a wasted information channel. Redundancy is a feature of human languages, because our languages are not optimizing solely for density. A bit of redundancy helps our brains pattern match more consistently, almost like a form of forward error correction. Syntax highlighting is like that, at least for me, where it makes a big difference in seeing the structure at a glance, and more overly complex coloring rules thwart that for me. Like I don't want to be trying to match up rainbow shades of parens.
Every example past that was just worse for readability. I think you're right about density not being the only important metric here.
So, not much different than a search for regular expressions or a "show definition" tooltip
One of the big problems with a lot of the examples here is that, well, I spend most of my time on multimillion line codebases. If you want to pop stuff out to me, showing it in a different color is useless because it's not on the same screen; no, the way you give it to me is a macro that takes me to the next location of the thing of interest. And with a macro that lets me move to points of interest... the use of color is entirely redundant.
These days, I'm using very minimal highlighting (doric-themes [1], which is basically shades of one color and font-weight). I prefer to separate semantic units of code by whitespace. Then scanning becomes quick.
[0]: https://melpa.org/#/dumb-jump
[1]: https://elpa.gnu.org/packages/doric-themes.html
i feel like HN might be the only place i can explain the "AI syntax highlighting" angle and have somebody get it. the Codemaps UX isnt exactly tuned for the exact form factor of syntax highlighting, but the general idea of "hey you can selectively highlight code based on what you're currently trying to do, and btw also reorganize your filesystem accordingly" is kinda cool and would love ideas on how to best express it.
[1]: https://github.com/willcrichton/flowistry
Aliasing something to make it easier to M-x sounds like the next step after running out of letters to use with C-c? I have not reached that point yet, but that was another thing I never considered that could be useful to remember.
That is a big "unless". We aren't all working on pristine code written to high standards…
I'm surprised my comment was downvoted so bad for making an astute observation. It happens to me all the time. A lot of the users of this site suck.
https://news.ycombinator.com/newsguidelines.html:
"Please don't fulminate. Please don't sneer, including at the rest of the community."
"Please don't comment about the voting on comments. It never does any good, and it makes boring reading."
But since it's irritating you I had a quick look at your comment history.
I would say that the downvotes are because a high percentage of your comments are argumentative, and occasionally just flat out insulting. Within that your tone is often condescending and dismissive.
But that says nothing about providing the same value to the user in other languages via different means.
But that says nothing about whether it actually can exist.
Huh, where am I making that assumption?
> But that says nothing about whether it actually can exist.
Yes, I expressed no opinion on that.
You didn't, and you didn't express any opinion on whether or not it can exist, either - GP is arguing dishonestly to shill Rust.
There's no need to unleash a fuzzy and inefficient network on something designed to be deterministic and parseable.
And I said the same as you but that also it could highlight on more abstract things that aren't in the parse tree, I meant things like feature-relatedness etc. These variables are only used for intermediate logging stuff and get greyed out etc.
Eclipse CDT has a real-time static analyzer for the code you're working on though. It's not as naive as it looks from a distance.
In other words are the two pointers actually pointing to the same variable? There's no way to know. When you select one pointer, should the other also be highlighted?
But in Rust you cannot have two mutable references to a single variable so the above cannot happen.
Variable renaming is a much much simpler task than this. Of course it is archaic and has existed for a long time.
This statement is incorrect. You can't have pointers to a variable. You can have pointers to a memory address, and most people would be fine with a tool that is correct up to the level of being unable to differentiate that.
This is a disingenuous redefinition of what the GP is looking for in order to shill Rust.
It just highlights parts that interface with a symbol, right? And that's information the language server is supposed to expose.
You'd "just" have to make a x+1 mapping to highlight anything that's touching the symbol
To be clear, I'm just speculating here and not speaking with authority on the topic. It's possible I'm missing something which makes this approach infeasible in practice
The information channel is shareable.
It then goes on to talk about rainbow parentheses, which is information multiplexed on the color channel with the syntax highlighting.
https://www.janestreet.com/tech-talks/rust-for-everyone/
As I recall, he also explains how Rust is uniquely positioned to enable this kind of syntax formatting.
Give me proper indentation and the language's use of parentheses, semicolons, etc. and I'm good and I can find everything I'm looking for
Color could tell you whether your
has the right number of quotes and unquotes to refer to the correct evaluation level/context where var exists.
In run-of-the-mill blub code not doing any metaprogramming, indentation is more than visible enough for indicating levels of nesting.
Years ago I used to work on C++ code in a commercial editor called Source Insight [1]. At its default settings, it would do things like:
1. Show function and class names in HUGE fonts in declarations and definitions, so you always knew what was a declaration as opposed to a use
2. Show nested parentheses with the outermost parens being biggest and getting smaller as you got further in
3. Show comments in a proportional sans-serif font instead of a monospaced one so that you could tell where the comments were even if you have color blindness
Those features, along with having a C++ parser and code relationship visualizer much faster than the Visual Studio of the day without having to parse ahead of time (a la ctags), made Source Insight a near standard in my company. I still miss it on occasion.
[1] https://www.sourceinsight.com/feature-details/
Now I see it definitely made sense.
Re: comments in proportional font; while it's an interesting way to highlight them, the problem is that then precludes you from using ASCII art to diagram things in comments (or from reading such diagrams in existing codebases where they exist).
So many ways to focus attention and highlight related areas, but so few IDEs that do anything about it...
They are in my JetBrains IDE, IntelliJ. Just use proper tools instead of toy text editors? It's free and Open Source even.
(Some are difficult to implement with current systems, but there are ways to design systems so that they do work (possibly with an IDE rather than just an ordinary editor).)
That way the colour can be defined at write time, languages don't need to implement them, they can be like whitespace and you can use color for whatever you want. Colour would be ignored by both compile and runtime of course... unless it wasn't
For example, in a loop body, putting the cursor on a "continue" statement will highlight all other loop control statements along with the "for" statement that the pointed-at "continue" is associated with. This helps massively.
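The lookup behind that feature is a small tree query: find the loop that owns the statement under the cursor, then collect every `break`/`continue` that belongs to that same loop and not to a nested one. Below is a rough sketch for Python source using the standard `ast` module; the commenter's editor and language are not stated, so this is an analogy, and `related_lines` is a hypothetical helper.

```python
import ast

LOOPS = (ast.For, ast.While, ast.AsyncFor)
SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef)


def controls_of(loop):
    """Yield the break/continue nodes that are controlled by `loop` itself."""
    stack = list(loop.body)
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.Break, ast.Continue)):
            yield node
        elif isinstance(node, LOOPS):
            # A nested loop owns its body; only its else-clause is ours.
            stack.extend(node.orelse)
        elif not isinstance(node, SCOPES):
            stack.extend(ast.iter_child_nodes(node))


def related_lines(source, line):
    """Lines to highlight when the cursor is on the break/continue at `line`:
    the owning loop header first, then all of that loop's control statements."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, LOOPS):
            lines = sorted(c.lineno for c in controls_of(node))
            if line in lines:
                return [node.lineno] + lines
    return []


SRC = (
    "for x in xs:\n"
    "    if x:\n"
    "        continue\n"
    "    for y in ys:\n"
    "        break\n"
    "    break\n"
)
print(related_lines(SRC, 3))  # [1, 3, 6]
```

The inner loop's `break` on line 5 is left out when the cursor is on line 3, which is the disambiguation that makes the highlight useful in nested loops.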
Shame the author missed this.
To some extent syntax highlighting distinguishes bad syntax from good; it can be configured to flag deviations from token-level or even grammar-level syntax. For instance, not closing a string literal, or bungling a numeric token with nondigits.
In all the syntax that is good, it mainly just assigns colors to lexical categories, which have a role within syntax.
It does that while retaining some channel capacity for other functions, like matches for regex searches still being highlighted, and visual selections of text being highlighted. (The channel is not exclusively occupied by syntax coloring to the point that no other information can squeeze in there concurrently.)
https://snapwiki.miraheze.org/wiki/Snap!
>Zebra Coloring: This feature, introduced in BYOB 3.0, is representative of the careful attention to the user interface in BYOB/Snap!. When same-color blocks are nested, it's hard to see the borders between them. Zebra coloring assigns two colors to each palette category, the normal color and a lighter color. When same-color blocks are nested, the outermost one has the normal color, and an inner block takes the opposite color from the one just outside it. The text inside light color blocks is black, instead of white as usual.
Snap Manual, Page 11:
https://snap.berkeley.edu/snap/help/SnapManual.pdf#page=11
>The round block rounds 35.3905… to 35, and the + block adds 100 to that. (By the way, the round block is in the Operators palette, just like +, but in this script it’s a lighter color with black lettering because Snap! alternates light and dark versions of the palette colors when a block is nested inside another block from the same palette:
>This aid to readability is called zebra coloring.)
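The zebra rule quoted above is easy to state as a recursion: a block is drawn in the light shade only when its parent is in the same palette category and the parent is drawn in the normal shade. The following sketch assumes a hypothetical `(category, children)` tuple for blocks; it is not Snap!'s actual data model.

```python
def zebra(block, parent_category=None, parent_light=False):
    """Annotate a (category, children) block tree with "normal"/"light" shades.

    A block flips to the opposite shade of its parent only when both are in
    the same category; otherwise it starts again from the normal shade.
    """
    category, children = block
    light = category == parent_category and not parent_light
    shade = "light" if light else "normal"
    return (category, shade, [zebra(c, category, light) for c in children])


script = ("operators", [("operators", [("operators", []), ("motion", [])])])
print(zebra(script))
```

In the example the three nested `operators` blocks come out normal, light, normal, while the `motion` block keeps its normal shade because its own color already separates it from its parent.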
Snap Manual, Page 150:
https://snap.berkeley.edu/snap/help/SnapManual.pdf#page=150
>Non-goal: Emulate the terse APL syntax. It’s too bad, in a way; as noted above, the terseness of expressing a computation affects APL programmers’ sense of what’s difficult and what isn’t. But you can’t say “terse” and “block language” in the same sentence. Our whole raison d’être is to make it possible to build a program without having to memorize the syntax or the names of functions, and to allow those names to be long enough to be self-documenting. And APL’s syntax has its own issues, of which the biggest is that it’s hard to use functions with more than two inputs; because most mathematical dyadic functions use infix notation (the function symbol between the two inputs), the notion of “left argument” and “right argument” is universal in APL documentation. The thing people most complain about, that there is no operator precedence (like the multiplication-before-addition rule in normal arithmetic notation), really doesn’t turn out to be a problem. Function grouping is strictly right to left, so 2×3+4 means two times seven, not six plus four. That takes some getting used to, but it really doesn’t take long if you immerse yourself in APL. The reason is that there are too many infix operators for people to memorize a precedence table. But in any case, block notation eliminates the problem, especially with Snap!’s zebra coloring. You can see and control the grouping by which block is inside which other block’s input slot. Another problem with APL’s syntax is that it bends over backward not to have reserved words, as opposed to Fortran, its main competition back then. So the dyadic ○ “circular functions” function uses the left argument to select a trig function. 1○x is sin(x), 2○x is cos(x), and so on. ‾1○x is arcsin(x). What’s 0○x? Glad you asked; it’s √(1 − x²)!
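The right-to-left grouping in the quoted passage can be checked with a tiny evaluator. This is a sketch over a flat token list of numbers and dyadic operators only; it is not an APL interpreter, and `eval_rtl` is a hypothetical name.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "×": operator.mul, "÷": operator.truediv}


def eval_rtl(tokens):
    """Evaluate [number, op, number, op, ...] with no precedence, grouping
    strictly from the right, as the manual describes for APL."""
    result = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):  # visit operators right to left
        result = OPS[tokens[i]](tokens[i - 1], result)
    return result


print(eval_rtl([2, "×", 3, "+", 4]))  # 14: two times seven
print(2 * 3 + 4)                      # 10: conventional precedence
```

Subtraction shows the difference even without mixed operators: `eval_rtl([10, "-", 3, "-", 2])` groups as 10 − (3 − 2).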
I didn't actually find the uncoloured circle to be that much harder to spot.
Before looking at the concrete examples, I thought:
> Understanding the syntax similarly isn't hard scanning over the code. But if the words were coloured in a way that didn't correspond to syntax, I expect I would find it distracting, in the same way as those experiments with colour words written in a different colour from what the word indicates. Trying to make use of that second information channel isn't necessarily a good idea.
> I like the aesthetic of syntax-coloured code. But also, it's reassuring. I'm not using it to help understand the syntax, but to confirm that there's software in place that understands the syntax the same way I do.
After: the rainbow parentheses honestly don't help that much, not for a language like LISP anyway. The context highlighting seems to work much better. But then, as I go on through the examples... these are all really doing the same kind of thing that syntax highlighting does! They're about the structure of the code. And I'd have to shift my expectation, but once aligned, it's again something I'd perceive the same way — as something to confirm my understanding rather than eliciting that understanding.
Perhaps being able to switch between colouring modes would change that, but I don't know how quickly I could get used to that technique. And then, the kind of code where most of these things would help is smelly anyway. I already try to avoid this kind of nesting. Maybe import and argument highlighting (which could be used at the same time, along with highlighting for class attributes and closures in Python?), though...
It also highlights shadowed variables (in Go), private/exported functions, class members, package names, etc.
This kind of highlighting as a secondary information channel for compiler feedback is great. Color, weight, italics, underlines - all help increase information density when reading code.
One thing I am still asking for: I want to be able to clearly and obviously distinguish mutable state from immutable state. IntelliJ can't do it yet.
1. Performance 2. Developer ergonomics 3. Syntax highlighting 4. Semantic highlighting 5. LSP
that's a lot.
The only language I can think of is Assembly :-)
Neovim can use treesitter for it.
- Making variables, functions,... grey if they're not used
- Temporarily coloring variables + their usages when the cursor is on one
- Showing wiggled red underlines for syntax errors
Especially the unused code feature is wonderful.
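A crude version of the "grey out unused names" check can be written with Python's standard `ast` module. It is a sketch under strong assumptions: one flat namespace, no scopes, no attributes, no imports. Real IDEs resolve each name per scope.

```python
import ast


def unused_names(source):
    """Map each name that is assigned but never read to the first line where
    it is assigned. Scoping is ignored, so this is only an approximation."""
    stored, used = {}, set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                first = stored.get(node.id, node.lineno)
                stored[node.id] = min(first, node.lineno)
            else:
                used.add(node.id)  # any non-Store context counts as a use
    return {name: line for name, line in stored.items() if name not in used}


print(unused_names("a = 1\nb = 2\nprint(a)\n"))  # {'b': 2}
```

An editor would then render the returned lines' identifiers in grey instead of reporting them as errors.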
Also most modern IDEs already contextually highlight usages of what you’ve selected too
The colour channel is being well and truly used close to its maximum.
¹https://www.cs.cmu.edu/~emc/15817-s11/lect-2011-03-23.pdf
Here is another use case i found helpful for html: i use a different, dimmer color for the closing html tags.
I now only use my own tree-sitter syntax and my own neovim highlight colors, i literally stripped all the defaults.
I started writing code with literally white text on a black background, and i only added new highlight groups when my brain really needed it. This way every color becomes deliberate and i gradually add in just what i need