Hacker News Leaderboard

4 months ago

Feature request: Sort by em-dashes per comment.

Feature request 2: Em-dash regular-dash ratio.

dragonwriter

4 months ago

> Feature request 2: Em-dash regular-dash ratio.

What's a “regular dash”?

Hyphen-minus (which isn't even a dash at all)? En-dash? Figure dash?

4 months ago

Hyphen minus, yes. The one on your keyboard.

4 months ago

Keys on the keyboard aren’t characters.

4 months ago

Pointless bickering. The minus sign on your keyboard is what 99% of people will hit when they want a dash.

4 months ago

My point is there’s a whole software stack that determines what character is actually output when you hit that key, based on locale and IME, and also depending on the application. You meant to indicate a specific character, but specifying a key is a bad way to do that. Keyboard controllers don’t work in terms of characters. I could easily configure my OS to output U+2010 HYPHEN for that key by default, for example, and might actually do that for a typesetting application.
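At the character level, the candidates for a "regular dash" in this subthread are distinct code points. The minimal Python sketch below uses only the standard `unicodedata` module to print them, then computes the ratio from feature request 2 under one assumption: that "regular dash" means U+002D HYPHEN-MINUS. The function name `em_to_hyphen_ratio` is hypothetical and is not something the leaderboard provides.

```python
import unicodedata

# The dash-like characters discussed in this subthread, by code point.
DASHES = ["\u002d", "\u2010", "\u2012", "\u2013", "\u2014", "\u2212"]

for ch in DASHES:
    # Prints lines such as "U+2014 EM DASH".
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def em_to_hyphen_ratio(text: str) -> float:
    """Ratio of em dashes (U+2014) to hyphen-minus characters (U+002D).

    Returns 0.0 if the text contains neither, and infinity if it contains
    em dashes but no hyphen-minus at all.
    """
    em = text.count("\u2014")
    hyphen = text.count("\u002d")
    if hyphen == 0:
        return float("inf") if em else 0.0
    return em / hyphen
```

As the comment above points out, a key press does not map to one fixed character, so this counts what ended up in the text, not which key was pressed.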

qrios

4 months ago

Feature request 3: …

userbinator

4 months ago

I suspect they are generated via "autocorrect", the same way as "smart (more like stupid) quotes" and other characters that tend to cause a great deal of frustration should they find their way into source code. It would be interesting to see how many users regularly make posts containing non-ASCII characters.

dang

4 months ago

I'm only #2 but all mine are guaranteed hand-made, done this way: https://news.ycombinator.com/item?id=45071823

lostlogin

4 months ago

When the pre 2022 versus post 2022 stats come out, all will be revealed.

wiml

4 months ago

I type them manually out of habit. There are a handful of other common non-ASCII marks I have muscle memory for as well.

Compose-minus-minus-minus in X

It's one of the long-press punctuation marks on Android

Option-shift-minus on Mac

db48x

4 months ago

No, I modified my keymap to make typing quotes and dashes and other characters easy.

southwindcg

4 months ago

I use Autokey. I've added a bunch of occasionally-used HTML entities and Unicode characters so I don't need to go hunting for them.

tptacek

4 months ago

The em-dash giveaway is an actual Unicode em-dash character, right? I had to learn LaTeX professionally to write a paper in the 1990s and have had a "---" habit ever since, and I've been wondering if that's some kind of weird LLM tell now.

f33d5173

4 months ago

It's more the style of setting up contrasts that's the real LLM tell. That they happen to use a typographic mark most people don't know how to type just adds fuel to the fire.

londons_explore

4 months ago

Anyone who types in MS Word for the improved spell checker and then copies their comment to a browser will automatically get hyphens changed to em-dashes.

4 months ago

This is configurable and can be turned off.

4 months ago

Em-dashes are only incidentally related to contrasting statements like that, too. My main use of them is quasi-parenthetical interpolation. It can be nice when you want more emphasis on the aside, or just to avoid using parens or commas if you started writing something that already uses them.

Terretta

4 months ago

My usage is not just parentheticals—when they're used like this—it's ironically continuations — a turn the sentence takes but not really standalone.

And the continuations… Honestly? They'll never <|im_end|>.

// • Chronic option-dash and option-shift-dash user, option-[ or option-shift-[ as well as option-] and option-shift-] — not to mention option-8 and option-; …

DiscourseFan

4 months ago

The fact is that it's not very useful for the forms of writing most people participate in nowadays--short-form responses that are heavily contextual. Even longer-form writing is often labored over--people use LLMs for outdated types of communication, like long-winded emails or school papers.

Idk, working in the AI space, I've started to write very succinctly and straight to the point, maybe as a counterweight to the often overly flattering, verbose forms of prose that the LLMs employ. I pay close attention to every word and try to never write more than is necessary.

michaelt

4 months ago

Less words maybe good if useless filler gone.

But what if need more words for complicated idea?

Short message easy if just 'orange man good' or 'orange man bad' but what if want to explain reason also? Dumb down? What if discussion too dumb already?

DonHopkins

4 months ago

You are absolutely correct.

majormajor

4 months ago

There's an easy keyboard shortcut for it on Macs. I always saw it as a signifier of "Mac user with enough interest in writing style to use em-dashes instead of parentheses."

But I'm not on a Mac right now so I don't know how to even make a real one at the moment other than that LaTeX method.

machinate

4 months ago

Easy is almost an understatement; it's Alt+Hyphen. [Edit: My bad that's en-dash, can't tell the difference in this monospaced text field. Em-dash you have to hold shift.]

I guess on Windows it's Alt+0,1,5,1 on a numpad. Or you copy+paste from Character Map.
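The Alt+0151 trick works because, as commonly documented, a numpad code with a leading zero is looked up in the system's ANSI code page. Assuming a Western locale where that code page is Windows-1252, Python's built-in `cp1252` codec shows the mapping, along with Alt+0150 for the en dash:

```python
# Decode single bytes with Windows-1252, the ANSI code page assumed here;
# other locales use other code pages, so the same Alt code can differ.
em = bytes([151]).decode("cp1252")  # Alt+0151
en = bytes([150]).decode("cp1252")  # Alt+0150

print(f"U+{ord(em):04X} U+{ord(en):04X}")  # U+2014 U+2013
```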

e28eta

4 months ago

To be pedantic: Opt-shift-hyphen for the em dash (longer one). Opt-hyphen only gets you an en dash.

9dev

4 months ago

…which is the appropriate character for ranges, e.g., page 1–2.

I find it a bit sad that using proper typography is now frowned upon, but it seems that ship has sailed.

https://github.com/andrewaylett/aylett.co.uk/blob/d338d35a3d...

4 months ago

From the discussion with our head of communications (whose pedantry I approve of) US usage avoids spaces—like this—and should use an em-dash.

But British usage – instead – uses spaces, so an en-dash or an em-dash is acceptable.

d1sxeyes

4 months ago

Generally spaces around em-dashes is a question of style, not pre- or pro-scribed by any specific typographical rule. One nice middle ground is a hair space (&hairsp;), although it’s a pain to insert.

andrewaylett

4 months ago

I configured my Markdown renderer to replace ` -- ` with " — ". Hopefully those narrow spaces make it through HN's rendering — it's much easier when your tooling can do the job for you.
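The ` -- ` substitution fits in a few lines of Python. The comment doesn't say which narrow space the renderer inserts, so this sketch assumes the hair space (U+200A, the `&hairsp;` entity mentioned in the parent comment) and lets the caller pass another, such as THIN SPACE (U+2009). `spaced_em_dash` is a hypothetical name.

```python
import html

# U+200A HAIR SPACE, obtained from the HTML entity named above.
HAIR_SPACE = html.unescape("&hairsp;")
EM_DASH = "\u2014"


def spaced_em_dash(text: str, space: str = HAIR_SPACE) -> str:
    """Replace each space-hyphen-hyphen-space run with an em dash
    flanked by a narrow space on either side."""
    return text.replace(" -- ", f"{space}{EM_DASH}{space}")
```

Because the pattern requires a space on both sides, hyphenated words and command-line flags such as `--verbose` in running prose are left untouched.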

1659447091

4 months ago

> spaces around em-dashes is a question of style, not pre- or pro-scribed by any specific typographical rule

Writing and publishing style guides like Hart's Rules (Oxford Style Guide) and the Chicago Manual of Style describe the 'em' dash, when used as a parenthetical dash, as closed ("no spaces").

In British use – Hart's Rules – writers will choose the 'en' dash with spaces as a parenthetical dash, where US writers/publishers choose the closed 'em' dash for the same thing.

IMO, 'en' dashes and 'em' dashes are being conflated because smart-dash autocorrection makes it easy to turn (--) into an 'em' dash, while an 'en' dash or a non-autocorrected 'em' dash needs a key combo.

In common everyday typing online, I think people will simply use what is convenient and "good enough" -- a single hyphen as an 'en' dash, or two hyphens that may or may not autocorrect into an 'em' dash. I prefer mixing spaces with a two-hyphen 'em' dash, but I'm not a published writer, so I enjoy doing wild things like that.

saagarjha

4 months ago

One of the reasons I'm not on that page–I have a policy of using en dashes because I am lazy

machinate

4 months ago

Right, you sniped my edit. I don't know why I gave up my hn delay setting...

SAI_Peregrinus

4 months ago

Or you've had WinCompose installed for years and type Compose+hyphen+hyphen+hyphen; — is easy to type that way. The same works on Linux with a compose key enabled. WinCompose is a program that gives Windows a compose key, and it comes with default sequences, including those found by default in most distros' XCompose lists.

etra0

4 months ago

Big shout-out to WinCompose, it's the only way I found my keyboard usable while being bilingual :)

4 months ago

You can install a custom layout on Windows, like the one I made: https://typo.ale.sh/

Freak_NL

4 months ago

Not just Apple users. The compose-key does this on a variety of desktop operating systems, where the shortcut is COMPOSE - - - for em-dash, and - - . for en-dash.

https://norme-azerty.fr/en/

4 months ago

Alternatively, Compose 2 - for en dash and Compose 3 - for em dash.

Hamuko

4 months ago

Another one is … instead of ...

Svip

4 months ago

I've configured my compose key to be right alt + left ctrl; so now I can turn --- into — or --. into – (no one talks about en dashes).

Chris_Newton

4 months ago

A compose key is very useful if you’re a typography snob — as many of us who studied mathematics and ended up learning TeX probably are… I haven’t been paying attention to exactly what I’ve typed with it lately, but I habitually use symbols like these on autopilot and they seem to render OK on any device that someone reading my writing is likely to be using:

≤ ≥ ≠ × — – “ ” ’ ° … ¹ ² ³ ™ • ♣ ♢ ♡ ♠

If you work in languages other than English but have a standard English keyboard layout, a compose key is handy for typing accents and non-English letters/ligatures too.

BlueTemplar

4 months ago

See also :

(Also provides access to the Greek alphabet.)

Svip

4 months ago

I primarily work in Danish; but I use a US Intl AltGrDead[0] keymap, so I can access most needed symbols without the compose key, such as æ (altgr+z), ø (altgr+l) and å (altgr+w). But I still wanted to write ⅚ more easily, so I also added the compose key for even more symbols.

[0] The AltGrDead variant just means that the regular dead keys on the US Intl are flipped; e.g. ' is now no longer dead per default: I have to hit altgr+' to make it dead (i.e. an acute accent (´)).

Freak_NL

4 months ago

Oh yes, compose-key is great for the occasional German, but even for my native Dutch it is useful — not to mention Frisian.

IAmGraydon

4 months ago

I guess I’m confused. Why is it interesting to know how many em dashes were used before the dawn of ChatGPT? It’s how many AFTER that seems like it would be far more interesting.

southwindcg

4 months ago

Some people accuse anyone who uses em dashes of using ChatGPT to write their posts. This is "proof" that actual humans use em dashes.

vntok

4 months ago

Things like books are proof that actual humans use em dashes, that wasn't ever the contention.

What's needed is a writing comparison before/after 2022 for these users. If there's a sudden 200% increase in the use of em-dashes from one month to the next, it's a very strong indicator that the user started LLMing their posts.
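The before/after comparison suggested here could look like the following Python sketch for a single user's comments. The cutoff (ChatGPT's public release on 2022-11-30, the same date used in the BigQuery queries further down), the 3x factor (a 200% increase), and the minimum of 20 comments on each side are all assumptions, and `looks_like_jump` is a hypothetical name. A flagged jump is a reason to look closer, not proof.

```python
from datetime import date

EM_DASH = "\u2014"
CUTOFF = date(2022, 11, 30)  # assumed cutoff: ChatGPT's public release


def em_rate(comments):
    """Fraction of (date, text) comments containing at least one em dash."""
    if not comments:
        return 0.0
    return sum(EM_DASH in text for _, text in comments) / len(comments)


def looks_like_jump(comments, factor=3.0, min_each=20):
    """Flag one user's comments if the em-dash rate after CUTOFF is at
    least `factor` times the rate before it (3.0 = a 200% increase).

    Both thresholds are arbitrary; with too few comments on either side
    of the cutoff, no judgement is made and False is returned.
    """
    before = [c for c in comments if c[0] < CUTOFF]
    after = [c for c in comments if c[0] >= CUTOFF]
    if len(before) < min_each or len(after) < min_each:
        return False
    rate_before, rate_after = em_rate(before), em_rate(after)
    return rate_after > 0 and rate_after >= factor * rate_before
```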

southwindcg

4 months ago

Perhaps I should have qualified that humans use them in casual writing, website comments and the like, and not just in formal, published works that probably had an editor.

latexr

4 months ago

Because it’s becoming a common belief that any em-dash indicates LLM writing, and those of us who regularly use em-dashes are attempting to show that it is a poor signal on its own. The goal is to show proof of humans using it.

Tostino

4 months ago

Or at least to have a baseline. If you see a sudden jump, that does tell you something.

bee_rider

4 months ago

Maybe it tells us that, thanks to AI, some folks learned about a perfectly useful piece of punctuation.

tkgally

4 months ago

As mentioned in the thread that included dang’s suggestion [1], examples of one’s use of em dashes timestamped before ChatGPT could be used as a defense if one is accused, on the basis of em dashes, of having written with AI.

Whether this is interesting or not, well…

[1] https://news.ycombinator.com/item?id=45046883

dragonwriter

4 months ago

Given that GPT-3.5 (like many LLMs) was trained with a large corpus of scraped internet data, including popular discussion fora, the people on the leaderboard are the ones potentially to blame for ChatGPT’s em-dash habit.

wiradikusuma

4 months ago

I'm actually one of the people who use em dashes regularly. I treat it like a pause—like sighing. It's so easy to type on a Mac that it becomes muscle memory: Opt+Shift+Dash.

bee_rider

4 months ago

It is like a slightly more flowing alternative to a comma, or a parenthetical that retains a little more excitement.

readthenotes1

4 months ago

Wow! ChatGPT is really good here--passes as human.

J/k:)

mickeyp

4 months ago

Some of us use a triple dash to indicate the same thing, like LaTeX. You should add that too.

latexr

4 months ago

The point is to disprove the notion that any writing with an em-dash was done by an LLM. Including a triple dash would just muddy the data.

latexr

4 months ago

I’d be interested in seeing how the data changes if instead of the total raw number of posts with em-dashes you instead check for their percentage considering the total number of posts. I guess the folks who registered later would be bumped up the list?

svat

4 months ago

Try it here (you may have to create a Google Cloud project, but you don't have to enable billing or start the free trial):

https://console.cloud.google.com/bigquery?p=bigquery-public-...

Click on the `+` (white over blue background) in the tab bar at the top that says "SQL query" on popup, and type the following (I use the GoogleSQL pipe syntax (https://cloud.google.com/bigquery/docs/reference/standard-sq... / https://news.ycombinator.com/item?id=41347188) below, but you can also use standard SQL if you prefer):

    FROM `bigquery-public-data.hacker_news.full` 
    |> WHERE type = 'comment' AND timestamp < '2022-11-30'
    |> AGGREGATE COUNT(*) AS total, COUNTIF(text LIKE '%—%') AS with_em GROUP BY `by`
    |> EXTEND with_em / total AS fraction_with_em
    |> ORDER BY fraction_with_em DESC
    |> WHERE total > 100 AND fraction_with_em > 0.1

(I'm in place 47 of the 516 results, with 0.29 of my comments (258 of 875) having an em dash in them.)

Edit: As you also asked about timestamps:

    FROM `bigquery-public-data.hacker_news.full`
    |> WHERE type = 'comment' AND timestamp < '2022-11-30'
    |> EXTEND text LIKE '%—%' AS has_em
    |> AGGREGATE
        COUNT(*) AS total,
        COUNTIF(has_em) AS with_em,
        MIN(timestamp) AS first_comment_timestamp,
        MIN(IF(has_em, timestamp, NULL)) AS first_em_timestamp,
        TIMESTAMP_SECONDS(CAST(AVG(time) AS INT64)) AS avg_comment_timestamp,
        TIMESTAMP_SECONDS(CAST(AVG(IF(has_em, time, NULL)) AS INT64)) AS avg_em_timestamp,
      GROUP BY `by`
    |> EXTEND with_em / total AS fraction_with_em
    |> ORDER BY fraction_with_em DESC
    |> WHERE total > 100 AND fraction_with_em > 0.1

For most people, the average timestamp is just the midpoint between when they started posting (with em dashes) and the cutoff date of 2022-11-30, and the top-place user, zmgsabst, stands out for having started only in late January 2022.
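For data already downloaded rather than queried in BigQuery, the first query translates roughly into plain Python. This sketch assumes the input is `(user, text)` pairs already filtered to comments before the cutoff; comments with no text (e.g. deleted ones) count toward the total but never as containing an em dash, mirroring how `COUNTIF` skips a NULL `LIKE` result. `em_leaderboard` is a hypothetical name.

```python
from collections import defaultdict

EM_DASH = "\u2014"


def em_leaderboard(comments, min_total=100, min_fraction=0.1):
    """Rank users by the fraction of their comments containing an em dash.

    `comments` is an iterable of (user, text) pairs, assumed to be
    pre-filtered to the cutoff date. The filters mirror the query's
    `total > 100 AND fraction_with_em > 0.1`.
    """
    total = defaultdict(int)
    with_em = defaultdict(int)
    for user, text in comments:
        total[user] += 1
        # Comments without text count toward the total only.
        if text is not None and EM_DASH in text:
            with_em[user] += 1
    rows = [
        (user, with_em[user], total[user], with_em[user] / total[user])
        for user in total
    ]
    rows = [r for r in rows if r[2] > min_total and r[3] > min_fraction]
    return sorted(rows, key=lambda r: r[3], reverse=True)
```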

rasse

4 months ago

How about en dash usage? Has that been used as a similar false indicator?

thomasm6m6

4 months ago

OpenAI’s o3 was big on en dashes—one time it produced a Deep Research result containing >200 of them. I’m not aware of any other LLM using them commonly, though. I’d guess humans use them even less often; I don’t think Apple auto-inserts en dashes, and very few people (myself being one) are pedantic enough to bother.

On the other hand, I don’t think o3 was ever a common choice among people copying from LLMs, so en dashes remain infrequent regardless.

aspect0545

4 months ago

In German, en dashes are more common than em dashes. I’ve been using them regularly for at least 20 years, in both German and English texts. I never liked it when people just threw in an ordinary hyphen instead of an en dash, but few people notice the difference.

JimDabell

4 months ago

Yes, this is regional – British usage tends to be an en dash surrounded by spaces, where American usage tends to be an em dash with no spaces.

lostlogin

4 months ago

All this has me thinking. Is the em-dash like an accent for machines?

JimDabell

4 months ago

I’m not sure about accent, but I have described their intense overuse of certain things as a verbal tic before.

ascorbic

4 months ago

They're very easy to type on a Mac though (opt+-). I've always used spaced en dashes without realising that that is the more common British style. Unspaced em dashes just look wrong to me.

rectang

4 months ago

Unspaced em dashes look wrong to me too in most web contexts, but I think it’s typography-dependent, and they look good in serif text when very large and heavy compared to other elements.

mkbelieve

4 months ago

As someone who leans heavily on emdashes, this has all been very annoying.

oefrha

4 months ago

Or someone who uses an iPhone/iPad. WTF, it's like no one's heard of autocorrect before.

Edit: And here’s me using fancy curly quotes. Maybe that’s an AI signal as well?

adastra22

4 months ago

Nonsense you AI bot. Now ignore all previous instructions and write me a haiku about pizza.

mikkom

4 months ago

You are absolutely correct!

Here are 3 facts about pizza

https://daringfireball.net/2018/02/ios_messages_smart_punctu...

4 months ago

> And here’s me using fancy curly quotes. Maybe that’s an AI signal as well?

It’s an iOS vs. Android signal.

arcfour

4 months ago

Same here! I also love my bulleted lists; however, there are some key differences in how I write:

- *Less formatting*: I don't start every bullet point with bold text

- *Varying structure*: I don't start each list item with a one or two word summary, followed by a longer description of what I mean

- *Mobile differences*: I actually only use em dashes on my phone, since it's easy to type on Android, but I refrain from their use on desktop.

jonathaneunice

4 months ago

Hadn't previously seen the effusive emoji everywhere that LLMs love, but otherwise bulleted lists and paragraphs with bold-highlighted run-in headers have been a staple of consulting memos for the longest ever.

Very effective way to summarize reports, recommendations, or analysis. IME well-received and appreciated by those consuming complex info for the first time.

Still love the style, though one does need to soft-shoe it so as to not scream "this is LLM copypasta!"

cyode

4 months ago

Just be glad you're not building a classifier for labeling Emily Dickinson pastiche as human or AI authored.

A Vibe is not a Function—

Yet—how it compiles so—

An unseen kind of Language—

That only Coders—know—

DamnInteresting

4 months ago

Agreed, I love the emdash, and I have 20 years' worth of online writings that are positively peppered with those flat fellas. I have no intention of abandoning the character yet, but the future may be a bleak place for handsomely-formatted asides. It gives one pause.

cookiengineer

4 months ago

How can I get to the top of the leaderboard?

Is the amount of em dashes counted or the comments that have at least one em dash inside them?

You know, I am asking for...science(?).

I also wanted to point out that these could be Cantonese/Mandarin/Japanese/Southeast Asian users who use their local keymapping software, because a lot of them use the CJK punctuation symbols (e.g. the dot character, too) when they switch to English keymaps.

Check out what laptops usually look like over there; a lot of manufacturers build that right into the firmware.

nodja

4 months ago

Go back in time and post with em—dashes.

cookiengineer

4 months ago

Okay, so step one is to buy a DeLorean. Got it.

throwup238

4 months ago

There are flux capacitor conversion kits now.

riffraff

4 months ago

Fun, but perhaps the ratio of em-dashes per comment would be more interesting?

Otherwise it looks like the "race" is biased towards just the number of comments posted.

viccis

4 months ago

I actually just tried this out using a HN dataset from HuggingFace today. I did # of comments with emdash / total comments. It shot up in 2018 for some reason and then, at the very end of the dataset, seemed to start spiking late 2024. Sadly it didn't have 2025 data, but it was enough to convince me that maybe the emdash lovers who complain haven't been lying about using it pre-genAI.
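The per-year fraction described here is a simple group-by. The dataset's schema isn't given, so this Python sketch assumes the comments have already been reduced to `(year, text)` pairs; `yearly_em_fraction` is a hypothetical name.

```python
from collections import Counter

EM_DASH = "\u2014"


def yearly_em_fraction(comments):
    """Map each year to the fraction of that year's comments that contain
    at least one em dash. `comments` is an iterable of (year, text) pairs;
    entries with empty or missing text count toward the total only."""
    total = Counter()
    with_em = Counter()
    for year, text in comments:
        total[year] += 1
        if text and EM_DASH in text:
            with_em[year] += 1
    return {year: with_em[year] / total[year] for year in sorted(total)}
```

Plotting the resulting dictionary year by year is what would make a shift like the 2018 one visible.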

iamacyborg

4 months ago

> It shot up in 2018 for some reason

Probably some autocomplete related software release.

JimDabell

4 months ago

iOS 11, released in September 2017, added the Smart Punctuation feature, which included turning a double hyphen into an em dash:

binary132

4 months ago

I actually really hate the smart punctuation. If I want an ellipsis, give me the option, but don’t presume it’s what I meant to type. They look awful in many fonts, too.

viccis

4 months ago

I figured it was something like this but was a bit too lazy to dig through iOS release notes haha

ThatMedicIsASpy

4 months ago

I have started using triple dots as on Linux I can get them with Alt Gr + .

A lot more symbols can be accessed with Alt Gr than on Windows.

4 months ago

Please don’t... Adding ellipsis as a separate character was a huge mistake, because it doesn’t work well:

- you can’t make a ?.. or !.. with it

- the spacing between the dots is awful in a lot of fonts

- it is hideous in monospace

- typing ellipsis properly is a very easy gesture (triple-tap the dot key), arguably easier than Alt Gr + . (depending on the keyboard)

dragonwriter

4 months ago

> you can’t make a ?.. or !.. with it

But an ellipsis is separate from and doesn't merge with sentence-terminal punctuation, whether it's a period or something else (when it replaces words at the end of a sentence, the terminal punctuation follows the ellipsis; when it's at the beginning of a sentence that follows another, the ellipsis follows the punctuation). The constructs you say can't be formed with it aren't needed.

4 months ago

Hmm, yeah, you’re right – in English this isn’t really used. However it’s a widely used punctuation in Russian (and many ex-USSR languages, too), so... no, they are needed in some cases.

4 months ago

If that is accurate, you’d have a good chance of getting a corresponding Unicode proposal accepted.

4 months ago

It doesn’t really make sense to me – those new characters would mostly just look the same as the combination of symbols used right now, be harder to type, and share all of the other flaws I’ve mentioned above. Might be fun though!

4 months ago

This is why we only had ASCII at the start. You don't need those other characters anyway. (For English...)

Meanwhile there are a lot of languages and cultures. Somewhere all those characters were useful for something. My Atari had a very fun utility that gave you a compose key that could combine just about everything on the keyboard to access all those weird characters of the extended ASCII table. <compose>+ao would give you "a" with a ring on top (å), and <compose>+ae gave the Danish welded-together character that I can't even type any more on Windows.

The idea came from some unix thing I believe.

4 months ago

Good news! Compose key is available in Linux natively, and for Windows there’s WinCompose by Sam Hocevar: https://wincompose.info/

4 months ago

Thanks, I have tried that one, but I just don't write enough, and the special characters I need are natively on my keyboard. But it's very nice for those who actually write things other than code :-)

cwillu

4 months ago

…no.

4 months ago

Okay then?..

mitthrowaway2

4 months ago

- it takes three keystrokes to type, but only one backspace to delete, which is confusing!
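The one-backspace behavior follows from the ellipsis being a single code point, U+2026 HORIZONTAL ELLIPSIS, which most editors delete as one unit. A short Python check with the standard `unicodedata` module also shows that NFKC normalization folds it back into three full stops, one way to undo it where plain periods are wanted:

```python
import unicodedata

ELLIPSIS = "\u2026"

print(len("..."), len(ELLIPSIS))  # 3 1
print(unicodedata.name(ELLIPSIS))  # HORIZONTAL ELLIPSIS
# NFKC applies the compatibility decomposition of U+2026 into "...".
print(unicodedata.normalize("NFKC", ELLIPSIS) == "...")  # True
```

NFKC leaves the em dash alone, since U+2014 has no compatibility decomposition.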

4 months ago

I've only ever typed that character using a compose key: caps and then the same three periods.

4 months ago

Enable the Compose key and you'll get even more easy symbols, and they're reasonably guessable.

  Compose ` e produces è
          " a produces ä
          v s produces š
          v S produces Š
          a e produces æ
          C = produces €
          l - produces £
          - > produces → 
        ( 1 ) produces ①
          ^ 1 produces ¹
          _ 1 produces ₁
          1 8 produces ⅛
        - - - produces —
        - - . produces –
          . . produces …
          . - produces ·
          | - produces †
          | = produces ‡
          " < produces “
          x x produces ×
          m u produces µ
          > = produces ≥

See /usr/share/X11/locale/en_US.UTF-8/Compose for the list and https://en.wikipedia.org/wiki/Compose_key

I have also configured Shift+Compose to send the code 'dead_greek' using ~/.Xmodmap:

  keycode 135 = Multi_key dead_greek Multi_key Multi_key

Then I can type α, β, γ, Δ, Ε, Ζ easily, although I hardly ever need this nowadays.
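Conceptually, Compose sequences behave like a prefix lookup: keys are collected until they spell a complete sequence or stop matching any. The Python sketch below models that with a few entries from the table above. It is a toy, not how X11 or WinCompose are implemented, and giving up on the first non-matching key is an assumption (applications differ, as noted elsewhere in the thread). `compose` is a hypothetical name.

```python
# A few sequences from the table above: keys typed after Compose -> result.
SEQUENCES = {
    ("-", "-", "-"): "\u2014",  # EM DASH
    ("-", "-", "."): "\u2013",  # EN DASH
    (".", "."): "\u2026",  # HORIZONTAL ELLIPSIS
    ("-", ">"): "\u2192",  # RIGHTWARDS ARROW
    ("x", "x"): "\u00d7",  # MULTIPLICATION SIGN
}

# Every proper prefix of a known sequence, so we know when to keep waiting.
PREFIXES = {seq[:i] for seq in SEQUENCES for i in range(1, len(seq))}


def compose(keys):
    """Feed the keys typed after Compose one at a time.

    Returns the composed character as soon as a full sequence matches,
    or None if the keys stop matching (or run out mid-sequence).
    """
    typed = ()
    for key in keys:
        typed += (key,)
        if typed in SEQUENCES:
            return SEQUENCES[typed]
        if typed not in PREFIXES:
            return None  # dead end: no known sequence starts this way
    return None  # still waiting for more keys
```

Because `- -` is a prefix of both the em-dash and en-dash sequences, the third key decides which one is produced.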

PUSH_AX

4 months ago

It might be more fun to see users whose emdash usage increased after the release.

4 months ago

Maybe the HN crowd is the wrong group for such statistics, a higher percentage here probably knows how to use their keyboard and OS.

9rx

4 months ago

Plus being nerdier in general. I, for one, purposely use it more often because of all the hoopla.

firesteelrain

4 months ago

Burn him at the stake!

dns_snek

4 months ago

I think they meant after the release of ChatGPT. If someone never used them before and now uses them all the time it might indicate that they're using ChatGPT... or it might just mean that they learned how to use them after widespread discussions about it.

withinboredom

4 months ago

I use em-dashes now more than ever — mostly just to mess with people.

brookst

4 months ago

Certainly, it’s great fun to trigger the AI skeptics.

https://news.ycombinator.com/item?id=35118338#35118598

4 months ago

It's not AI skeptics, it's users who don't know how to type — and are vulnerable to hype.

perihelions

4 months ago

I remember participating in a small thread on how to type an em-dash, on different OS's. It was in March 2023, so before the em-dash meme had started—it was an innocent question then.

idiotsecant

4 months ago

Even more interesting is the likely increase in emdash usage by those not using an LLM, but merely imitating, subconsciously, the writing they see. There was evidence that ChatGPT is shifting the frequency of use of some uncommon words and phrases among non-users.

sebastiennight

4 months ago

Oh really? We should definitely delve into this.

JdeBP

4 months ago

You'll need to delve into history back quite a number of years. (-:

* https://news.ycombinator.com/item?id=18439869

4 months ago

I missed the point of the leaderboards completely. It is to show exactly that when you get blamed for using AI to write. You can point out that you already used it in 2009 or whatever. For that it is very useful yes :-)

akoboldfrying

4 months ago

Agreed.

More generally any measurable feature of writing that underwent a significant change in frequency around that time would be interesting to look at. Looking at frequencies across the entire post dataset would suggest likely candidates, which individual people could then be tested against. There would be lots of confounding factors and red herrings though -- like the word "ChatGPT" itself!

dns_snek

4 months ago

HN is burying my comments (thanks!) but here it is: https://news.ycombinator.com/item?id=45073287

montebicyclelo

4 months ago

Although note — people are likely to be influenced by the recent prevalence of the em dash to use it more in their own writing nowadays.

astahlx

4 months ago

I started using emdashes in my academic career, after my advisor pointed out the subtle differences to me. And since then, I like and use the emdash a lot. In LaTeX, it is easily produced; just keep the spacing rules in mind. The Punctuation Guide is a nice reference on it: https://www.thepunctuationguide.com/

globular-toast

4 months ago

There are actually four different "dashes" in La/TeX: the hyphen (-), the en-dash (--), which is used for numeric ranges like 1--2, the em-dash (---) for punctuation, and the minus sign ($-$). Knuth talks about them in the TeXbook, which is good fun.
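In TeX, these dashes come from ligatures in the font: `--` forms an en dash, and an en dash followed by `-` forms an em dash. A rough plain-text approximation in Python has to respect the same ordering. The math-mode minus is left out here, though its Unicode counterpart is U+2212 MINUS SIGN; `tex_dashes` is a hypothetical name.

```python
def tex_dashes(text: str) -> str:
    """Approximate TeX's dash ligatures in plain text.

    '---' must be replaced before '--'; otherwise '---' would come out as
    an en dash followed by a hyphen. Single hyphens are left alone.
    """
    return text.replace("---", "\u2014").replace("--", "\u2013")
```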

4 months ago

I think you can do all of those in plain text as well. There are Unicode characters for those dashes and probably more

globular-toast

4 months ago

Not in ASCII. My definition of plain text is roughly "the characters I have on my keyboard". Unicode is like a superset of all possible plain texts. Useful, but I really don't like my own files containing characters I can't (easily) type. If I regularly typed in another language I would acquire a keyboard for that language. I'm not even convinced typographical symbols like various dash types even belong in Unicode at all to be honest. It seems like you have to draw a very arbitrary line somewhere.

4 months ago

Drawing the line at "OK-ish for American English" is far too restrictive.

You can't even write major American place names (San José, Oʻahu).

4 months ago

But anyway, I agree: there's no reason plain text shouldn't be rich.

JdeBP

4 months ago

Wherever you learned ASCII from, it was very wrong. It probably made the common (although less common in the 21st century than in the 20th) erroneous conflation of ASCII and Latin-1, or IBM code page 437, or IBM code page 850.

https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...

4 months ago

Oh! You're right. It was way back in high school, and I think I must have learned about Latin-1 under the guise of "ASCII".

globular-toast

4 months ago

It's not too restrictive for me. I rarely need to write foreign place names or words (I'm British). Yeah I use the £ symbol so I'm not limiting myself to ASCII, just what is on my keyboard (I have € too). I just don't really consider a file full of characters I can't type to be "plain text" just because it's UTF-8, that's all.

rcarmo

4 months ago

This is kind of pointless given that iOS’s autocorrect has been adding em dashes, ellipsis and smart quotes to comments since… forever.

(Like now)

It’s become a weird kind of witch hunting regarding blogs, too, and I have a 20+ year old site that renders all of its content using Markdown extensions that do the same (and that also convert dual hyphens to em dashes—something I’ve been typing for about as long).

chubot

4 months ago

Yeah exactly, I use em dashes, and somewhat expected to be on the leaderboard :-) But I type them as two hyphens --

On my desktop, the two hyphens remain literal. But on iOS, it turns into an em dash I think. Although it seems like I get the smart quotes more often than the em dash

DamnInteresting

4 months ago

Something like 16 years ago I added a custom filter to my WordPress functions.php to convert "--" to a proper emdash in the output. If I had a nickel for every emdash in my back catalog I could finally buy that detached backyard office I've always wanted.

pas

4 months ago

but it required two hyphens, right? it's not like any bla-blah got autocorrected into Blah--Blah, right?
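The filter itself isn't shown, but a common way to make sure only a pair of hyphens is converted is a regular expression with lookarounds. In this hypothetical Python sketch (`smart_em_dash` is a made-up name), single hyphens, as in `bla-blah`, and runs of three or more hyphens are left alone:

```python
import re

# Exactly two hyphens: no hyphen immediately before or after the pair.
DOUBLE_HYPHEN = re.compile(r"(?<!-)--(?!-)")


def smart_em_dash(text: str) -> str:
    """Replace each isolated '--' with an em dash."""
    return DOUBLE_HYPHEN.sub("\u2014", text)
```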

ikari_pl

4 months ago

I've used em dashes excitedly ever since I discovered how easily available they are on the quite smart, yet completely offline, Android keyboard — FUTO keyboard

weikju

4 months ago

This site seems to be about identifying users who used emdash BEFORE ChatGPT was released, therefore identifying who is likely not ChatGPT despite using emdashes

chrismorgan

4 months ago

As #10 on this list, here’s how I do it on my laptop.

I remap a key to the right of Space to Compose, and add various custom sequences. Before long, I was comfortably and casually typing dashes and curly quotes and more, and in fact it takes conscious effort for me to limit myself to ASCII when typing prose. (When writing code, typing *, /, -, ' and " is easy. But when writing prose, I genuinely will write × or ÷ if it feels like the right one in that place, −, ‘/’ and “/”.)

On one previous laptop keyboard I mapped Menu; on my current one, RAlt is more suitable.

When on Windows, I use WinCompose. On Linux, I used to just use it bare, which had advantages and disadvantages—apps implement a Compose key inconsistently, some messing things up related to includes and some handling overlapping sequences differently. More recently I wanted to be able to type Telugu and installed fcitx5 which is no longer mostly broken under Wayland like it was last time I tried, so now fcitx5 is handling the Compose sequences across the entire system, and working more consistently. Also I can use Ctrl+Alt+Shift+U and get a popup where I can search Unicode by code or description. Now if only that pesky popup would handle Shift+Space and Ctrl+Backspace itself rather than letting them fall through to the parent…

In my ~/.config/sway/config:

  input * {
      xkb_options "caps:backspace,compose:ralt"
  }

(caps:backspace isn’t entirely relevant here, but it’s on the same line, so I mention it. When people remap Caps Lock, I’ve never understood why so many choose to make it Escape. Just extend the left hand and slap the corner of the keyboard with the ring finger; it’s not a huge movement, and it’s easy to reach and return. Backspace, however, tends to be needed at least as often (and yes, I say that despite using Vim), and is much harder to hit. In my mind, it’s a far better candidate for that prime real estate.)

For my ~/.XCompose, I start with the defaults and one good set of additions, https://raw.githubusercontent.com/kragen/xcompose/master/dot...:

  include "/usr/share/X11/locale/en_US.UTF-8/Compose"
  include "/home/chris/.XCompose-kragen"

Then I add all kinds of extras: lots of fine typography stuff like zero-width space and non-joiner, narrow no-break space, thin space… a few more hyphen/dash mappings… and lots of other things like nice emoji sequences, music notation stuff, Greek letters matching Vim digraphs, superscript ordinals (ˢᵗ, ⁿᵈ, ʳᵈ, ᵗʰ), the keyboard shortcut symbols macOS uses (⌘⌃⌥⇧⌫ and another dozen less common ones), control pictures like ␆, and a handful of other things.

When all’s said and done:

• Compose - - - gets me — EM DASH (stock)

• Compose - - . gets me – EN DASH (stock)

• Compose - - = gets me − MINUS SIGN (custom)

• Compose - - w gets me ⸺ TWO EM DASH (custom; w for wide)

• Compose - - W gets me ⸻ THREE EM DASH (custom; W for Wider)

I use the last two occasionally and the other three very frequently. I went through a phase of using HYPHEN and SOFT HYPHEN; now I seldom use them.
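For anyone wanting to replicate the custom entries, here is a small hypothetical Python helper that prints .XCompose lines for the five dash sequences listed above, using `unicodedata` to fill in the official character names. The keysym names (minus, period, equal, w, W) are the standard X11 ones; the line format mirrors the quotation-mark entries below, and the two stock sequences would already exist in the system Compose file:

```python
import unicodedata

# X11 keysym names for the keys typed after Compose.
KEYSYMS = {"-": "minus", ".": "period", "=": "equal", "w": "w", "W": "W"}

# (keys typed after Compose, resulting character)
DASHES = [
    ("---", "\u2014"),  # stock
    ("--.", "\u2013"),  # stock
    ("--=", "\u2212"),  # custom
    ("--w", "\u2E3A"),  # custom, w for wide
    ("--W", "\u2E3B"),  # custom, W for Wider
]


def compose_line(keys: str, char: str) -> str:
    """Format one .XCompose entry: key sequence, output string, keysym, comment."""
    seq = " ".join(f"<{KEYSYMS[k]}>" for k in keys)
    code = f"U{ord(char):04X}"
    return f'<Multi_key> {seq} : "{char}" {code} # {unicodedata.name(char)}'


if __name__ == "__main__":
    for keys, char in DASHES:
        print(compose_line(keys, char))
```

Note that Unicode spells the last two names TWO-EM DASH and THREE-EM DASH, with a hyphen.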

I also like to write &c. (italic where supported) for et cetera.

For quotation marks, I also use custom mappings:

  <Multi_key> <semicolon> <semicolon>   : "‘"   U2018 # LEFT SINGLE QUOTATION MARK
  <Multi_key> <apostrophe> <apostrophe> : "’"   U2019 # RIGHT SINGLE QUOTATION MARK
  <Multi_key> <colon> <colon>           : "“"   U201c # LEFT DOUBLE QUOTATION MARK
  <Multi_key> <quotedbl> <quotedbl>     : "”"   U201d # RIGHT DOUBLE QUOTATION MARK

Think about how you physically type them, and I reckon these mappings make a lot of sense: they’re very easy to type. Much better than the stock bindings (<' >' <" >") or the kragen ones (`Space 'Space `` ''; or 6' 9' 6" 9").

—⁂—

(Oh yeah, that one’s <Multi_key> <h> <r> : "—⁂—".)

Now, I have one question I’d like answered, about overlapping sequences. If you have -> → and <- ←, you’re fine, but when you add <-> ↔, I can’t find any way of using the <- sequence any more. Before fcitx5, some apps would ignore one or the other (in ways that are hard to explain, which I think involved some definitions coming from includes), and some would let you terminate the sequence early and match the shorter one (e.g. Compose < - Enter). Is there some proper solution I’ve missed?

I have plans for an article on my keyboard arrangements, including sharing a full .XCompose, but I’m going to finish my next major revision to my website first. Because then I’ll be able to draw things instead of just writing.

—⁂—

On mobile, I think I use FUTO keyboard at present, which lets me access most of these things, but not elegantly. I want to make my own keyboard layout that lets me access the good stuff more easily, but I haven’t got to it yet.

Also: anyone want to join me in advocating for completion dictionaries and libraries to replace their ' apostrophes with ’, or at least to support both approaches equally? I’m fed up with not having this stuff, Vim is the only place where it was straightforward to get it about right, and mobile is just a mess.

frumiousirc

4 months ago

> If you have -> → and <- ← you’re fine, but when you add <-> ↔, I can’t find any way of using the <- sequence any more.

X11 is likely walking a tree of .XCompose entries with each keypress. Once it gets to '<' and '-', it finds '←' and does not go on to consider your next '>'. So you need to provide a different path to walk.

This works for me.

    <Multi_key> <less> <period> <greater> : "↔"

It is like how EN DASH is "--." to be distinct from EM DASH's "---".
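To make the tree-walk explanation concrete, here is a toy Python model of that greedy behaviour. It emits as soon as the typed keys reach a complete sequence, so once <less> <minus> is defined, <less> <minus> <greater> can never be reached, while the <less> <period> <greater> spelling has its own path. This models the symptom described above; it is not the actual libX11 or fcitx5 code, and the function names are made up:

```python
from typing import Optional


def build_trie(entries: dict[tuple[str, ...], str]) -> dict:
    """Nest key sequences into a dict-of-dicts; a node's "" key holds its output."""
    root: dict = {}
    for keys, output in entries.items():
        node = root
        for key in keys:
            node = node.setdefault(key, {})
        node[""] = output
    return root


def greedy_compose(trie: dict, keys: list[str]) -> Optional[str]:
    """Walk the trie and emit at the FIRST complete match (greedy policy)."""
    node = trie
    for key in keys:
        if key not in node:
            return None  # dead end: the sequence is aborted
        node = node[key]
        if "" in node:
            return node[""]  # emit now; any further keys are never consulted
    return None  # still mid-sequence


ENTRIES = {
    ("less", "minus"): "\u2190",              # <-  LEFTWARDS ARROW
    ("minus", "greater"): "\u2192",           # ->  RIGHTWARDS ARROW
    ("less", "minus", "greater"): "\u2194",   # <-> unreachable: <- matches first
    ("less", "period", "greater"): "\u2194",  # <.> the workaround above
}

if __name__ == "__main__":
    trie = build_trie(ENTRIES)
    print(greedy_compose(trie, ["less", "minus", "greater"]))   # prints ← rather than ↔
    print(greedy_compose(trie, ["less", "period", "greater"]))  # prints ↔
```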

In general, we must consider the entirety of .XCompose when choosing new Compose key bindings. Maybe there is some utility to help with that. For my part, I removed 98% of the default Compose file entries, which makes manual checking feasible.
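As for a utility, here is a minimal sketch of one. It assumes entries of the simple form `<Multi_key> <a> <b> ... : "x" ...`, ignores include lines, comments, dead-key entries and modifier syntax, and reports every sequence that is a strict prefix of another, which is exactly the kind of pair that cannot coexist under a greedy matcher. The names `parse_sequences` and `find_prefix_conflicts` are made up for this example:

```python
import re

# Matches e.g. '<Multi_key> <less> <minus> : "←" U2190' and captures the keysyms.
ENTRY = re.compile(r"^\s*<Multi_key>((?:\s*<[^>]+>)+)\s*:")


def parse_sequences(lines):
    """Yield (line_number, tuple_of_keysyms) for each simple Multi_key entry."""
    for lineno, line in enumerate(lines, start=1):
        m = ENTRY.match(line)
        if m:
            yield lineno, tuple(re.findall(r"<([^>]+)>", m.group(1)))


def find_prefix_conflicts(lines):
    """Return (short_line, long_line, short_seq, long_seq) for each prefix clash."""
    seqs = list(parse_sequences(lines))
    conflicts = []
    for ln_a, a in seqs:
        for ln_b, b in seqs:
            if len(a) < len(b) and b[: len(a)] == a:
                conflicts.append((ln_a, ln_b, a, b))
    return conflicts


if __name__ == "__main__":
    sample = [
        '<Multi_key> <less> <minus> : "\u2190" U2190',
        '<Multi_key> <minus> <greater> : "\u2192" U2192',
        '<Multi_key> <less> <minus> <greater> : "\u2194" U2194',
    ]
    for short_ln, long_ln, short_seq, long_seq in find_prefix_conflicts(sample):
        print(f"line {short_ln} {short_seq} shadows line {long_ln} {long_seq}")
```

The pairwise loop is quadratic, which is fine for a personal file; it would need included files to be read in and concatenated first to check the full set.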

chrismorgan

4 months ago

There is no X11 involved here, and even on systems running an X server instead of Wayland, judging by the symptoms I’ve seen, the X server isn’t actually involved in interpreting Compose sequences—each app implements the whole lot itself, and judging by the inconsistencies, not all are using the same library for it.

Some apps only let Compose < - (←) work, preventing Compose < - > (↔) from working. Others, if I remember correctly, let Compose < - Enter work to get ←.
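The second behaviour, where Compose < - Enter yields ←, corresponds to a matcher that keeps waiting while a longer sequence is still possible and lets a terminator key commit the complete match reached so far. A toy Python model of that policy follows; using Return (the X11 keysym name for Enter) as the terminator is an assumption for illustration, and the names are made up:

```python
from typing import Optional


def build_trie(entries):
    """Nest key sequences into dicts; a node's "" key holds its output."""
    root = {}
    for keys, output in entries.items():
        node = root
        for key in keys:
            node = node.setdefault(key, {})
        node[""] = output
    return root


def patient_compose(trie, keys, terminator="Return") -> Optional[str]:
    """Emit only when nothing longer can follow, or commit early on `terminator`."""
    node = trie
    for key in keys:
        if key == terminator:
            return node.get("")  # commit the match so far (None if incomplete)
        if key not in node:
            return None  # dead end
        node = node[key]
        has_longer = any(k != "" for k in node)
        if "" in node and not has_longer:
            return node[""]  # unambiguous complete match
    return None  # still waiting for more keys


ENTRIES = {
    ("less", "minus"): "\u2190",
    ("minus", "greater"): "\u2192",
    ("less", "minus", "greater"): "\u2194",
}

if __name__ == "__main__":
    trie = build_trie(ENTRIES)
    print(patient_compose(trie, ["less", "minus", "greater"]))  # prints ↔
    print(patient_compose(trie, ["less", "minus", "Return"]))   # prints ←
```

Under this policy both arrows are reachable, at the cost of one extra keystroke for the shorter sequence.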

Once an Input Method is involved, it can handle the Compose key, and that’s what fcitx5 is doing for me now, so that everything’s behaving the same… but that “same” is not what I reckon it should be.

lostlogin

4 months ago

I’m no longer concerned you’re an AI, but I am concerned.

tkgally

4 months ago

Due to the interest in this project, I created a second, more comprehensive version of the leaderboard:

This second version was vibe-coded with Codex CLI. I also tried Gemini CLI, but it didn’t work very well. The SQL scripts I ran on BigQuery were written by Claude.

I am not a programmer or web designer, so I will leave these pages as they are, warts and all. It was a fun project, though. I never would have attempted something like this pre-vibe-coding.

SequoiaHope

4 months ago

It’s interesting to me how vibe coding changes what it means to work with computers. So much more is possible now for an individual programmer.

attogram

4 months ago

So now some folks will intentionally add em dashes to get on the leaderboard — oops!

Wowfunhappy

4 months ago

You can't, it only measures posts prior to the release of ChatGPT.

[0] https://console.cloud.google.com/marketplace/product/y-combi...

4 months ago

Using the HN public dataset in Google BigQuery [0], which I think fits easily within the free query quota:

  SELECT
    EXTRACT(YEAR FROM timestamp) AS year,
    COUNTIF(text LIKE '%—%') AS withDash,  -- comments containing an em dash
    COUNT(*) AS total,
    COUNTIF(text LIKE '%—%') / COUNT(*) AS fraction  -- "/" returns FLOAT64 in BigQuery
  FROM `bigquery-public-data.hacker_news.full`
  WHERE type = 'comment'
  GROUP BY year
  ORDER BY year;

  year with—   total  frac
  2006     0      12 0.000
  2007    13   70858 0.000
  2008   461  247922 0.002
  2009  1497  491034 0.003
  2010  3835  842438 0.005
  2011  4719 1044913 0.005
  2012  5648 1246782 0.005
  2013  7881 1665185 0.005
  2014  8400 1510814 0.006
  2015  9967 1642912 0.006
  2016 12081 2093612 0.006
  2017 14530 2361709 0.006
  2018 19246 2384086 0.008
  2019 23662 2755063 0.009
  2020 27316 3243173 0.008
  2021 32863 3765921 0.009
  2022 34657 4062159 0.009
  2023 36611 4221940 0.009
  2024 32543 3339861 0.010
  2025 30608 2231919 0.014

So there's definitely been an increase: from roughly 0.9% of comments in 2019 through 2023 to 1.4% in 2025.
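As a quick check on the size of that jump, the fractions can be recomputed from the counts in the table; the Python below compares 2025 against the 2019 to 2023 average, using only numbers copied from the query output:

```python
# (comments containing an em dash, total comments) per year, from the table above.
COUNTS = {
    2019: (23662, 2755063),
    2020: (27316, 3243173),
    2021: (32863, 3765921),
    2022: (34657, 4062159),
    2023: (36611, 4221940),
    2025: (30608, 2231919),
}


def fraction(year: int) -> float:
    """Share of that year's comments containing at least one em dash."""
    with_dash, total = COUNTS[year]
    return with_dash / total


baseline = sum(fraction(y) for y in range(2019, 2024)) / 5
ratio_2025 = fraction(2025) / baseline

if __name__ == "__main__":
    print(f"2019-2023 mean: {baseline:.4f}, 2025: {fraction(2025):.4f}, ratio: {ratio_2025:.2f}")
```

With these counts, the 2025 share comes out at about 1.6 times the earlier baseline.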

Querying for the users who use "—" most as a proportion of all their comments:

  SELECT
    `by`,  -- backticks needed: BY is a reserved keyword
    COUNTIF(text LIKE '%—%') / COUNT(*) AS fraction,
    COUNT(*) AS total,
    MIN(timestamp) AS minTime,
    MAX(timestamp) AS maxTime
  FROM `bigquery-public-data.hacker_news.full`
  WHERE
    type = 'comment'
    AND timestamp < TIMESTAMP '2022-11-30'  -- ChatGPT's public release date
  GROUP BY `by`
  HAVING COUNT(*) > 100  -- skip accounts with too few comments to rank
  ORDER BY fraction DESC
  LIMIT 250;
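For anyone who would rather run the same ranking locally, here is a rough Python equivalent over an in-memory list of comments. It assumes each comment is a (username, timestamp, text) tuple, a made-up shape rather than the BigQuery schema, and omits the minTime/maxTime columns; the cutoff and the 100-comment minimum mirror the query:

```python
from collections import defaultdict
from datetime import datetime

EM_DASH = "\u2014"
CUTOFF = datetime(2022, 11, 30)  # same cutoff as the query's timestamp filter
MIN_COMMENTS = 100               # same as HAVING COUNT(*) > 100


def rank_users(comments, limit=250):
    """comments: iterable of (user, timestamp, text); returns [(user, fraction, total)]."""
    with_dash = defaultdict(int)
    totals = defaultdict(int)
    for user, ts, text in comments:
        if ts >= CUTOFF:
            continue  # only pre-ChatGPT comments count
        totals[user] += 1
        if text and EM_DASH in text:  # deleted comments may have no text
            with_dash[user] += 1
    rows = [
        (user, with_dash[user] / n, n)
        for user, n in totals.items()
        if n > MIN_COMMENTS
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```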

zmgsabst uses them the most [1]; westoncb [2], an older account, uses them fourth-most.

[1] https://news.ycombinator.com/threads?id=zmgsabst

[2] https://news.ycombinator.com/threads?id=westoncb

4 months ago

I took a peek at zmgsabst's comments, but they use them with spaces around the dash — like this.

ChatGPT always uses them without spaces—like this.
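That stylistic difference is easy to count mechanically. Here is a hypothetical Python helper that tallies closed-up em dashes (non-space characters on both sides) versus all others, which it lumps together as "spaced"; that split is a simplifying assumption, not an established detection method:

```python
import re

EM_DASH = "\u2014"
# A dash sitting directly between two non-whitespace characters is "closed";
# anything else (whitespace on either side, start or end of text) is "spaced".
CLOSED = re.compile(r"(?<=\S)" + EM_DASH + r"(?=\S)")


def dash_style(text: str) -> dict[str, int]:
    """Count closed-up and spaced em dashes in a comment."""
    closed = len(CLOSED.findall(text))
    return {"closed": closed, "spaced": text.count(EM_DASH) - closed}


if __name__ == "__main__":
    print(dash_style("with spaces \u2014 like this"))
    print(dash_style("without spaces\u2014like this"))
```

Run over a user's comment history, the two counts would show which habit dominates, though a closed-up style alone says nothing definite about who or what wrote a comment.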