The macOS LC_COLLATE Hunt: Or Why Does Sort Order Differently on macOS and Linux (2020)
blog.zhimingwang.org
Key topics
Locale Settings
Sorting Algorithms
macOS vs Linux
The article discusses the differences in sort ordering between macOS and Linux due to LC_COLLATE settings, sparking a discussion on the complexities of locale-specific sorting and its implications.
Snapshot generated from the HN discussion
Key moments
- Story posted: Oct 19, 2025 at 9:01 AM EDT
- First comment: Oct 19, 2025 at 12:56 PM EDT (4h after posting)
- Peak activity: 6 comments in the 3-6h window
- Latest activity: Oct 21, 2025 at 10:51 AM EDT
ID: 45633815 | Type: story | Last synced: 11/20/2025, 7:40:50 PM
There's no default right answer to this, as the answer depends entirely on what you're sorting and how you want it sorted. Even for a given character set the "correct" alphabetical sorting is still locale dependent.
And even knowing all that, "correct" programmatic sorting might still be essentially impossible. Some digraphs may be sorted differently depending on the specific word. For example, A vs. Aa, where Aa can stand for Å. But Aa doesn't always mean Å, so good luck figuring that out.
Doing a dumb sort by character or byte values is obviously the wrong call for any diacritics, but the right call may also depend on the language.
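To make the contrast concrete, here is a minimal Python sketch comparing a plain code point sort with a naive "strip the diacritics and ignore case" key. The `fold_key` helper and the word list are hypothetical, made up for illustration; this is not a real locale collation, just one possible approximation.

```python
import unicodedata

def fold_key(s):
    # Hypothetical helper: decompose (NFD), drop combining marks, casefold.
    # The original string is kept as a tie-breaker so the order is total.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)

# Sample words, assumed to be stored in precomposed (NFC) form.
words = ["zebra", "Émile", "apple", "eagle", "Zoë", "éclair"]

# "Dumb" sort: Python compares strings code point by code point, so
# uppercase ASCII sorts before lowercase and accented letters sort last.
by_codepoint = sorted(words)

# Folded sort: accents and case are ignored for the primary comparison.
by_folded = sorted(words, key=fold_key)

print(by_codepoint)
print(by_folded)
```

The folded order looks reasonable for English or French, but it would be wrong for Swedish, where ö is a separate letter sorted after z. That is the point made above: the right key depends on the language, not just on the characters.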
It would have been reasonable to conclude the article a third of the way through and say: "Sorting is locale-dependent; if what you value is consistent behaviour between different OSs (instead of sorting based on the user's preferences), you need to implement the sorting yourself."
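One way to "implement the sorting yourself" is to sort on the raw UTF-8 bytes, which ignores the system's locale data entirely. The sketch below is only an illustration of that idea; `portable_key` and the file names are hypothetical, and byte order is chosen here for reproducibility, not because it is what a user would expect to see.

```python
def portable_key(name: str) -> bytes:
    # Hypothetical helper: compare by UTF-8 bytes, so the result does not
    # depend on LC_COLLATE or on which locale data the OS ships.
    return name.encode("utf-8")

# Made-up file names for illustration.
names = ["b.txt", "A.txt", "a.txt", "_x.txt", "B.txt", "ä.txt"]

portable = sorted(names, key=portable_key)
print(portable)
```

The trade-off is the one the thread describes: the output is identical everywhere, but uppercase sorts before lowercase and non-ASCII letters land at the end, which is rarely the order a reader of any particular language wants.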
The article does mention it but in passing.
[1]: https://www.youtube.com/watch?v=gd5uJ7Nlvvo
Pike matchbox.
And then a lot of languages are used in different countries with different rules.
My worry is that it would perform badly on really large directories... That said, for where it's a pain, it would be helpful to say the least.
The `locales-all` package works more like macOS. It's only a ~10MB download but unpacks to take ~250MB of disk space (these numbers will vary based on your libc version and packaging format).
There are a lot of sparse arrays and UTF32 character data in compiled locales.
Incidentally, the command to dump a locale's data is:
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...