Hacker News Leaderboard
gally.net
The em-dash used to be a slightly snooty way for Mac users to announce themselves. Sad that the polarity of perception has reversed.
I’ve been typing em-dashes since I got my first MacBook in 2006 and I’m not going to let the AI companies take my beautiful punctuation away from me.
Feature request 2: Em-dash regular-dash ratio.
What's a “regular dash”?
Hyphen-minus (which isn't even a dash at all)? En-dash? Figure dash?
Compose-minus-minus-minus in X
It's one of the long-press punctuation marks on Android
Option-shift-minus on Mac
And the continuations… Honestly? They'll never <|im_end|>.
// • Chronic option-dash and option-shift-dash user, option-[ or option-shift-[ as well as option-] and option-shift-] — not to mention option-8 and option-; …
Idk, working in the AI space, I've started to write very succinctly and straight to the point, maybe as a counterweight to the often overly flattering, verbose forms of prose that the LLMs employ. I pay close attention to every word and try to never write more than is necessary.
But what if need more words for complicated idea?
Short message easy if just 'orange man good' or 'orange man bad' but what if want to explain reason also? Dumb down? What if discussion too dumb already?
But I'm not on a Mac right now so I don't know how to even make a real one at the moment other than that LaTeX method.
I guess on Windows it's Alt+0,1,5,1 on a numpad. Or you copy+paste from Character Map.
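Why 0151 in particular: with a leading zero, Windows Alt+numpad codes are typically looked up in the system's ANSI code page. In Windows-1252 (the usual one on US and Western European installs), byte 151 is the em dash and byte 150 is the en dash. Here is a quick check in Python, assuming cp1252 is the active code page. The helper name is made up for illustration:

```python
import unicodedata


def alt_code_char(code: int, codepage: str = "cp1252") -> str:
    """Decode an Alt+0nnn numpad code as a single byte in an ANSI code page.

    Assumes Windows-1252 (cp1252) by default; other code pages map these
    bytes to different characters.
    """
    return bytes([code]).decode(codepage)


for code in (150, 151):
    ch = alt_code_char(code)
    print(f"Alt+0{code} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

This prints `Alt+0150 -> U+2013 EN DASH` and `Alt+0151 -> U+2014 EM DASH`.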
I find it a bit sad that using proper typography is now frowned upon, but it seems that ship has sailed.
But British usage – instead – uses spaces, so an en-dash or an em-dash is acceptable.
https://github.com/andrewaylett/aylett.co.uk/blob/d338d35a3d...
Writing and publishing style guides like Hart's Rules (Oxford Style Guide) and the Chicago Manual of Style describe the 'em' dash used as a closed, or "no spaces", parenthetical dash.
In British use – Hart's Rules – writers will choose the 'en' dash with spaces as a parenthetical dash, where US writers/publishers choose the closed 'em' dash for the same thing.
IMO, 'en' and 'em' dashes are being conflated because smart-dash autocorrection makes it easy to turn (--) into an 'em' dash, while the 'en' dash, and an 'em' dash without autocorrection, both need a key combo.
In common everyday typing online, I think people simply use whatever is convenient and "good enough" -- a single hyphen as an 'en' dash, or two hyphens that may or may not autocorrect into an 'em' dash. I prefer a spaced two-hyphen 'em' dash, but I'm not a published writer, so I enjoy doing wild things like that.
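A minimal sketch of the hyphen-to-dash substitution being described, for illustration only; it is not any particular OS's or Markdown extension's implementation, and the function name is hypothetical. By default it turns "--" into an em dash, like the autocorrect behaviour mentioned above; `tex_style=True` switches to the TeX convention, where "--" is an en dash and "---" an em dash:

```python
import re

EM_DASH, EN_DASH = "\u2014", "\u2013"


def smart_dashes(text: str, tex_style: bool = False) -> str:
    """Replace runs of ASCII hyphens with typographic dashes.

    tex_style=False: "--" becomes an em dash (autocorrect-style).
    tex_style=True:  "--" becomes an en dash and "---" an em dash (TeX-style).
    Single hyphens, and runs the chosen mode has no mapping for, are left alone.
    """

    def repl(match: re.Match) -> str:
        run = match.group(0)
        if tex_style:
            return {2: EN_DASH, 3: EM_DASH}.get(len(run), run)
        return EM_DASH if len(run) == 2 else run

    # (?<!-) and (?!-) make each match a complete run of two or more hyphens.
    return re.sub(r"(?<!-)-{2,}(?!-)", repl, text)


print(smart_dashes("good enough -- a dash"))
print(smart_dashes("pages 1--2, yes---no", tex_style=True))
```

Keeping the run length explicit is what separates the two conventions: the same "--" means an em dash in one and an en dash in the other, which is exactly where the conflation comes from.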
≤ ≥ ≠ × — – “ ” ’ ° … ¹ ² ³ ™ • ♣ ♢ ♡ ♠
If you work in languages other than English but have a standard English keyboard layout, a compose key is handy for typing accents and non-English letters/ligatures too.
(Also provides access to the Greek alphabet.)
[0] The AltGrDead variant just means that the regular dead keys on US International are flipped; e.g. ' is no longer dead by default: I have to hit AltGr+' to make it dead (i.e. an acute accent (´)).
What's needed is a writing comparison before/after 2022 for these users. If there's a sudden 200% increase in the use of em-dashes from one month to the next, it's a very strong indicator that the user started LLMing their posts.
Whether this is interesting or not, well…
J/k:)
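A minimal sketch of that before/after comparison for a single user, assuming the comments are already available locally as (timestamp, text) pairs. The function names and thresholds are illustrative: `factor=3.0` encodes the "200% increase" above, and `floor` keeps a jump from 0% detectable without flagging tiny noise. Months with no comments are simply absent, so "previous" means the previous month the user actually posted in:

```python
from collections import defaultdict
from datetime import datetime

EM_DASH = "\u2014"


def monthly_em_rate(comments):
    """Map (year, month) -> fraction of that month's comments containing an em dash."""
    totals = defaultdict(int)
    with_em = defaultdict(int)
    for ts, text in comments:
        key = (ts.year, ts.month)
        totals[key] += 1
        if EM_DASH in text:
            with_em[key] += 1
    return {key: with_em[key] / totals[key] for key in sorted(totals)}


def sudden_jumps(rates, factor=3.0, floor=0.01):
    """Months whose rate is at least `factor` times the previous active month's.

    The previous rate is raised to at least `floor`, so a jump from 0% is
    still caught but a month at 0% followed by another at 0% is not.
    """
    months = list(rates)
    return [
        cur
        for prev, cur in zip(months, months[1:])
        if rates[cur] >= factor * max(rates[prev], floor)
    ]


comments = [
    (datetime(2022, 10, 3), "plain text - just a hyphen"),
    (datetime(2022, 10, 9), "one \u2014 dash"),
    (datetime(2022, 10, 20), "nothing here"),
    (datetime(2022, 10, 25), "nor here"),
    (datetime(2022, 12, 1), "now \u2014 everywhere"),
    (datetime(2022, 12, 2), "yes\u2014really"),
]
rates = monthly_em_rate(comments)
print(rates)  # {(2022, 10): 0.25, (2022, 12): 1.0}
print(sudden_jumps(rates))  # [(2022, 12)]
```

With real data, low-volume months make these rates very noisy, so a minimum number of comments per month would probably be needed before trusting any flag.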
https://console.cloud.google.com/bigquery?p=bigquery-public-...
Click the `+` (white on a blue background) in the tab bar at the top, whose tooltip says "SQL query", and type the following (I use the GoogleSQL pipe syntax (https://cloud.google.com/bigquery/docs/reference/standard-sq... / https://news.ycombinator.com/item?id=41347188) below, but you can also use standard SQL if you prefer):
FROM `bigquery-public-data.hacker_news.full`
|> WHERE type = 'comment' AND timestamp < '2022-11-30'
|> AGGREGATE COUNT(*) AS total, COUNTIF(text LIKE '%—%') AS with_em GROUP BY `by`
|> EXTEND with_em / total AS fraction_with_em
|> ORDER BY fraction_with_em DESC
|> WHERE total > 100 AND fraction_with_em > 0.1
(I'm in 47th place of the 516 results, with 0.29 of my comments (258 of 875) having an em dash in them.)

Edit: As you also asked about timestamps:
FROM `bigquery-public-data.hacker_news.full`
|> WHERE type = 'comment' AND timestamp < '2022-11-30'
|> EXTEND text LIKE '%—%' AS has_em
|> AGGREGATE
COUNT(*) AS total,
COUNTIF(has_em) AS with_em,
MIN(timestamp) AS first_comment_timestamp,
MIN(IF(has_em, timestamp, NULL)) AS first_em_timestamp,
TIMESTAMP_SECONDS(CAST(AVG(time) AS INT64)) AS avg_comment_timestamp,
TIMESTAMP_SECONDS(CAST(AVG(IF(has_em, time, NULL)) AS INT64)) AS avg_em_timestamp,
GROUP BY `by`
|> EXTEND with_em / total AS fraction_with_em
|> ORDER BY fraction_with_em DESC
|> WHERE total > 100 AND fraction_with_em > 0.1
For most people, the average timestamp is just the midpoint between when they started posting (with em dashes) and the cutoff date of 2022-11-30, and the top-place user zmgsabst stands out for having started only in late January 2022.

On the other hand, I don’t think o3 was ever a common choice among people copying from LLMs, so en dashes remain infrequent regardless.
Edit: And here’s me using fancy curly quotes. Maybe that’s an AI signal as well?
It’s an iOS vs. Android signal.
- *Less formatting*: I don't start every bullet point with bold text
- *Varying structure*: I don't start each list item with a one or two word summary, followed by a longer description of what I mean
- *Mobile differences*: I actually only use em dashes on my phone, since it's easy to type on Android, but I refrain from their use on desktop.
Very effective way to summarize reports, recommendations, or analysis. IME well-received and appreciated by those consuming complex info for the first time.
Still love the style, though one does need to soft-shoe it so as to not scream "this is LLM copypasta!"
A Vibe is not a Function—
Yet—how it compiles so—
An unseen kind of Language—
That only Coders—know—
Is it counting the number of em dashes, or the number of comments that contain at least one em dash?
You know, I am asking for...science(?).
I also wanted to point out that these could be Cantonese/Mandarin/Japanese/Southeast Asian users who use their local keymapping software, because many of them keep using their native punctuation symbols (e.g. the dot character, too) when they switch to an English keymap.
Check out what laptops usually look like over there; a lot of manufacturers build that right into the firmware.
Otherwise, it looks like the "race" is biased toward the number of comments posted.
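The two readings give different numbers. The `LIKE` patterns in the queries above match a comment once no matter how many dashes it contains, so they measure the second one. A small illustration with hypothetical helper names:

```python
EM_DASH = "\u2014"


def dash_count(comments):
    """Total number of em-dash characters across all comments."""
    return sum(text.count(EM_DASH) for text in comments)


def comments_with_dash(comments):
    """Number of comments containing at least one em dash (what COUNTIF(text LIKE ...) measures)."""
    return sum(1 for text in comments if EM_DASH in text)


sample = ["a \u2014 b \u2014 c", "no dash here", "one\u2014two"]
print(dash_count(sample), comments_with_dash(sample))  # 3 2
```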
Probably some autocomplete related software release.
https://daringfireball.net/2018/02/ios_messages_smart_punctu...
Many more symbols can be accessed with Alt Gr than on Windows.
- you can’t make a ?.. or !.. with it
- the spacing between the dots is awful in a lot of fonts
- it is hideous in monospace
- typing ellipsis properly is a very easy gesture (triple-tap the dot key), arguably easier than Alt Gr + . (depending on the keyboard)
But an ellipsis is separate from, and doesn't merge with, sentence-terminal punctuation, whether it's a period or something else. (When it replaces words at the end of a sentence, the terminal punctuation follows the ellipsis; when it's at the beginning of a sentence that follows another, the ellipsis follows the punctuation.) The constructs you say can't be formed with it aren't needed.
Meanwhile, there are a lot of languages and cultures, and somewhere all those characters were useful for something. My Atari had a very fun utility that gave you a compose key that could combine just about everything on the keyboard to access all those weird characters of the extended ASCII table. <compose>+ao would give you "a" with a ring on top (å), and <compose>+ae gave the Danish welded-together character that I can't even type any more on Windows.
The idea came from some unix thing I believe.
Compose ` e produces è
" a produces ä
v s produces š
v S produces Š
a e produces æ
C = produces €
l - produces £
- > produces →
( 1 ) produces ①
^ 1 produces ¹
_ 1 produces ₁
1 8 produces ⅛
- - - produces —
- - . produces –
. . produces …
. - produces ·
| - produces †
| = produces ‡
" < produces “
x x produces ×
m u produces µ
> = produces ≥
See /usr/share/X11/locale/en_US.UTF-8/Compose for the list and https://en.wikipedia.org/wiki/Compose_key

I have also configured Shift+Compose to send the code 'dead_greek' using ~/.Xmodmap:
keycode 135 = Multi_key dead_greek Multi_key Multi_key
Then I can type α, β, γ, Δ, Ε, Ζ easily, although I hardly ever need this nowadays.

More generally, any measurable feature of writing that underwent a significant change in frequency around that time would be interesting to look at. Looking at frequencies across the entire post dataset would suggest likely candidates, which individual people could then be tested against. There would be lots of confounding factors and red herrings though -- like the word "ChatGPT" itself!
You can't write CO₂ or m², use a fraction like ½, claim © or mention a price in Euros or Pounds Sterling.
You can't even write major American place names (San José, Oʻahu).
But anyway, I agree: there's no reason plain text shouldn't be rich.
(Like now)
It’s become a weird kind of witch hunting regarding blogs, too, and I have a 20+ year old site that renders all of its content using Markdown extensions that do the same (and that also convert dual hyphens to em dashes—something I’ve been typing for about as long).
On my desktop, the two hyphens remain literal, but on iOS I think they turn into an em dash, although I seem to get smart quotes more often than em dashes.
I remap a key to the right of Space to Compose and add various custom sequences. Before long, I was comfortably and casually typing dashes, curly quotes and more; in fact, it now takes conscious effort for me to limit myself to ASCII when typing prose. (When writing code, typing *, /, -, ' and " is easy. But when writing prose, I genuinely will write × or ÷ if it feels like the right one in that place, and likewise −, ‘/’ and “/”.)
On one previous laptop keyboard I mapped Menu, on my current one RAlt is more suitable.
When on Windows, I use WinCompose. On Linux, I used to just use it bare, which had advantages and disadvantages—apps implement a Compose key inconsistently, some messing things up related to includes and some handling overlapping sequences differently. More recently I wanted to be able to type Telugu and installed fcitx5 which is no longer mostly broken under Wayland like it was last time I tried, so now fcitx5 is handling the Compose sequences across the entire system, and working more consistently. Also I can use Ctrl+Alt+Shift+U and get a popup where I can search Unicode by code or description. Now if only that pesky popup would handle Shift+Space and Ctrl+Backspace itself rather than letting them fall through to the parent…
In my ~/.config/sway/config:
input * {
xkb_options "caps:backspace,compose:ralt"
}
(caps:backspace isn’t entirely relevant here, but it’s on the same line and I choose to mention it. When people are remapping Caps Lock, I’ve never understood why so many seem to choose to make it Escape. Just extend the left hand and slap the corner of the keyboard with the ring finger, it’s not a huge movement and is easy to reach and return. Backspace, however, tends to be needed at least as often (and yes, I say that despite using Vim), and is much harder to hit. In my mind, a far better candidate for shifting to that prime real estate.)

For my ~/.XCompose, I start with the defaults and one good set of additions, https://raw.githubusercontent.com/kragen/xcompose/master/dot...:
include "/usr/share/X11/locale/en_US.UTF-8/Compose"
include "/home/chris/.XCompose-kragen"
Then I add all kinds of additions. Lots of fine typography stuff like zero-width space and non-joiner, narrow no-break space, thin space… a few more hyphen/dash mappings… and lots of other things like nice emoji sequences, music notation stuff, Greek letters matching Vim digraphs, superscript ordinals (ˢᵗ, ⁿᵈ, ʳᵈ, ᵗʰ), the keyboard shortcut symbols macOS uses (⌘⌃⌥⇧⌫ and another dozen less common ones), control pictures like ␆, and a handful of other things.

When all’s said and done:
• Compose - - - gets me — EM DASH (stock)
• Compose - - . gets me – EN DASH (stock)
• Compose - - = gets me − MINUS SIGN (custom)
• Compose - - w gets me ⸺ TWO EM DASH (custom; w for wide)
• Compose - - W gets me ⸻ THREE EM DASH (custom; W for Wider)
The last two I use occasionally, the other three I use very frequently. I went through a phase of using HYPHEN and SOFT HYPHEN, now I seldom use them.
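Those five dashes are distinct code points that look alike in many fonts. One way to check what a sequence actually produced uses only Python's standard `unicodedata` module (characters are written as escapes so the source stays ASCII). Note that Unicode's official names for the last two are hyphenated: TWO-EM DASH and THREE-EM DASH. The `describe` helper is just for illustration:

```python
import unicodedata

# Characters produced by the five Compose sequences listed above.
DASHES = {
    "- - -": "\u2014",
    "- - .": "\u2013",
    "- - =": "\u2212",
    "- - w": "\u2e3a",
    "- - W": "\u2e3b",
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for seq, ch in DASHES.items():
    print(f"Compose {seq} -> {describe(ch)}")
```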
I also like to write &c. (italic where supported) for et cetera.
For quotation marks, I also use custom mappings:
<Multi_key> <semicolon> <semicolon> : "‘" U2018 # LEFT SINGLE QUOTATION MARK
<Multi_key> <apostrophe> <apostrophe> : "’" U2019 # RIGHT SINGLE QUOTATION MARK
<Multi_key> <colon> <colon> : "“" U201c # LEFT DOUBLE QUOTATION MARK
<Multi_key> <quotedbl> <quotedbl> : "”" U201d # RIGHT DOUBLE QUOTATION MARK
Think about how you physically type them, and I reckon these mappings make a lot of sense, very easy to type. Much better than the stock bindings (<' >' <" >") or kragen ones (`Space 'Space `` ''; or 6' 9' 6" 9").

—⁂—
(Oh yeah, that one’s <Multi_key> <h> <r> : "—⁂—".)
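If you maintain many such lines by hand, a tiny generator keeps the code point and the comment in sync with the character. This is a hypothetical helper, not part of any tool mentioned here. It assumes the `"char" Uxxxx # NAME` layout used in the quotation-mark mappings above and only handles single characters, since `unicodedata.name` needs exactly one code point:

```python
import unicodedata


def xcompose_line(keysyms, ch):
    """Build one ~/.XCompose entry: <Multi_key> <k1> <k2> ... : "c" Uxxxx # NAME

    `keysyms` are X keysym names such as "semicolon" or "quotedbl"; `ch` must be
    a single character. Backslashes and double quotes are escaped for the string.
    """
    keys = " ".join(f"<{k}>" for k in ["Multi_key", *keysyms])
    escaped = ch.replace("\\", "\\\\").replace('"', '\\"')
    return f'{keys} : "{escaped}" U{ord(ch):04x} # {unicodedata.name(ch)}'


for keysyms, ch in [
    (["semicolon", "semicolon"], "\u2018"),
    (["apostrophe", "apostrophe"], "\u2019"),
    (["colon", "colon"], "\u201c"),
    (["quotedbl", "quotedbl"], "\u201d"),
]:
    print(xcompose_line(keysyms, ch))
```

Lowercase hex (`{:04x}`) is used only to match the entries above, e.g. `U201c`.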
Now, I have one question I’d like answered. Overlapping sequences. If you have -> → and <- ← you’re fine, but when you add <-> ↔, I can’t find any way of using the <- sequence any more. Before fcitx5, some apps would ignore one or the other (in ways difficult to explain which I think involved the fact that some definitions came from includes), and some would let you terminate the sequence early and match the shorter one (e.g. Compose < - Enter). Is there some proper solution I’ve missed?
I have plans for an article on my keyboard arrangements, including sharing a full .XCompose, but I’m going to finish my next major revision to my website first. Because then I’ll be able to draw things instead of just writing.
—⁂—
On mobile, I think I use FUTO keyboard at present, which lets me access most of these things, but not elegantly. I want to make my own keyboard layout that lets me access the good stuff more easily, but I haven’t got to it yet.
Also: anyone want to join me in advocating for completion dictionaries and libraries to replace their ' apostrophes with ’, or at least to support both approaches equally? I’m fed up with not having this stuff, Vim is the only place where it was straightforward to get it about right, and mobile is just a mess.
X11 is likely walking a tree of .XCompose entries with each keypress. Once it gets to '<' and '-' it finds '←' and does not continue to consider your next '>'. So, you need to provide a way to walk a different path.
This works for me.
<Multi_key> <less> <period> <greater> : "↔"
It is like how EN DASH is "--." to be distinct from EM DASH's "---".

In general, we must consider the entirety of .XCompose when choosing new compose key bindings. Maybe there is some utility to help with that. For me, I removed 98% of the default Compose file entries, which makes manual checking feasible.
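A sketch of the kind of checking utility wished for above, assuming the bindings have already been parsed into tuples of keysym names (parsing the file is left out, and the function name is made up). Any sequence that is a proper prefix of another is exactly the ambiguous case from the question: depending on the input method, either the shorter or the longer one becomes unreachable. Sorting makes all extensions of a sequence contiguous, so the inner loop can stop at the first non-match:

```python
def find_prefix_conflicts(bindings):
    """Return (shorter, longer) pairs where `shorter` is a proper prefix of `longer`.

    `bindings` maps key sequences (tuples of keysym names typed after Multi_key)
    to their output strings.
    """
    seqs = sorted(bindings)
    conflicts = []
    for i, prefix in enumerate(seqs):
        for longer in seqs[i + 1:]:
            if longer[: len(prefix)] != prefix:
                break  # extensions of `prefix` are contiguous in sorted order
            conflicts.append((prefix, longer))
    return conflicts


bindings = {
    ("minus", "greater"): "\u2192",          # ->  right arrow
    ("less", "minus"): "\u2190",             # <-  left arrow
    ("less", "minus", "greater"): "\u2194",  # <-> left-right arrow
}
print(find_prefix_conflicts(bindings))
# [(('less', 'minus'), ('less', 'minus', 'greater'))]

# The workaround suggested above: give the double arrow its own path.
fixed = dict(bindings)
fixed[("less", "period", "greater")] = fixed.pop(("less", "minus", "greater"))
print(find_prefix_conflicts(fixed))  # []
```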
Some only let Compose < - (←) work, stopping and preventing Compose < - > (↔) from working. Others, if I remember correctly, let Compose < - Enter work to get ←.
Once an Input Method is involved, it can handle the Compose key, and that’s what fcitx5 is doing for me now, so that everything’s behaving the same… but that “same” is not what I reckon it should be.
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
This second version was vibe-coded with Codex CLI. I also tried Gemini CLI, but it didn’t work very well. The SQL scripts I ran at BigQuery were by Claude.
I am not a programmer or web designer, so I will leave these pages as they are, warts and all. It was a fun project, though. I never would have attempted something like this pre-vibe-coding.
SELECT
EXTRACT(YEAR FROM timestamp) AS year,
SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) AS withDash,
COUNT(*) AS total,
SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) / COUNT(*) AS fraction
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'comment'
GROUP BY year
ORDER BY year;
year  withDash    total  fraction
2006         0       12     0.000
2007        13    70858     0.000
2008       461   247922     0.001
2009      1497   491034     0.003
2010      3835   842438     0.005
2011      4719  1044913     0.005
2012      5648  1246782     0.005
2013      7881  1665185     0.005
2014      8400  1510814     0.006
2015      9967  1642912     0.006
2016     12081  2093612     0.006
2017     14530  2361709     0.006
2018     19246  2384086     0.008
2019     23662  2755063     0.009
2020     27316  3243173     0.008
2021     32863  3765921     0.009
2022     34657  4062159     0.009
2023     36611  4221940     0.009
2024     32543  3339861     0.010
2025     30608  2231919     0.014
So there's definitely been an increase.

Querying for the users who use "—" most as a proportion of all their comments:
SELECT
`by`,
SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) / COUNT(*) AS fraction,
COUNT(*) AS total,
MIN(timestamp) AS minTime,
MAX(timestamp) AS maxTime
FROM `bigquery-public-data.hacker_news.full`
WHERE
type = 'comment' AND
timestamp < '2022-11-30'
GROUP BY `by`
HAVING COUNT(*) > 100
ORDER BY fraction DESC
LIMIT 250;
zmgsabst uses them the most [1]; westoncb [2] is an older account that uses them fourth-most.

[0] https://console.cloud.google.com/marketplace/product/y-combi...
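For anyone without BigQuery access, the same ranking can be sketched in Python over a local export. This assumes (user, timestamp, text) tuples, which is a hypothetical format rather than the dataset's schema, and mirrors the WHERE, GROUP BY, HAVING, ORDER BY and LIMIT clauses of the query above. As with `text LIKE ...` on a NULL text, a missing text counts as "no em dash":

```python
from collections import Counter
from datetime import datetime

EM_DASH = "\u2014"
CUTOFF = datetime(2022, 11, 30)  # same cutoff as the queries above


def leaderboard(comments, min_total=100, limit=250):
    """Rank users by the fraction of their pre-cutoff comments containing an em dash.

    `comments` is an iterable of (user, timestamp, text) tuples.
    Returns (user, fraction, total) rows, highest fraction first.
    """
    totals, with_em = Counter(), Counter()
    for user, ts, text in comments:
        if ts >= CUTOFF:  # WHERE timestamp < '2022-11-30'
            continue
        totals[user] += 1
        if text and EM_DASH in text:
            with_em[user] += 1
    # HAVING COUNT(*) > min_total
    rows = [(user, with_em[user] / n, n) for user, n in totals.items() if n > min_total]
    rows.sort(key=lambda row: row[1], reverse=True)  # ORDER BY fraction DESC
    return rows[:limit]  # LIMIT
```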
ChatGPT always uses them without spaces—like this.
text LIKE '%—%' AND text NOT LIKE '% —%' AND text NOT LIKE '%— %'
puts westoncb in the lead, followed by mucholove, trebbble, _zzaw and lexcorvus.
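The same filter as a Python predicate, for checking individual comments. Like the SQL, it requires at least one em dash and rejects a comment if any em dash has an ASCII space on either side; tabs, newlines and no-break spaces are not considered. The function name is illustrative:

```python
EM_DASH = "\u2014"


def only_unspaced_em_dashes(text: str) -> bool:
    """True if `text` has an em dash and no em dash touches an ASCII space.

    Mirrors: LIKE '%<em dash>%' AND NOT LIKE '% <em dash>%' AND NOT LIKE '%<em dash> %'.
    """
    return (
        EM_DASH in text
        and " " + EM_DASH not in text
        and EM_DASH + " " not in text
    )


print(only_unspaced_em_dashes("without spaces\u2014like this"))  # True
print(only_unspaced_em_dashes("with spaces \u2014 like this"))   # False
```

Because a single spaced dash disqualifies the whole comment, users who mix both styles drop out entirely, which is worth keeping in mind when reading the resulting ranking.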