Decoding UTF-8. Part III: Determining Sequence Length – a Lookup Table
Posted 4 months ago | Active 4 months ago
nemanjatrifunovic.substack.com | Tech | story
Key topics
UTF-8 Decoding
Lookup Tables
Performance Optimization
The article discusses optimizing UTF-8 decoding by using a lookup table to determine sequence length, with commenters exploring alternative approaches and potential optimizations.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion. First comment: 4d after posting. Peak period: 1 comment in 84-90h. Avg / period: 1.
Key moments
- Story posted: Sep 2, 2025 at 3:00 PM EDT
- First comment: Sep 6, 2025 at 8:43 AM EDT (4d after posting)
- Peak activity: 1 comment in 84-90h
- Latest activity: Sep 6, 2025 at 11:43 AM EDT
ID: 45107534 | Type: story | Last synced: 11/17/2025, 10:07:25 PM
I guess something like
could work, although I've not tested it for all cases or checked whether it's fast. The idea is to invert the bits, use a built-in operation to count leading zeros (i.e. leading ones in the original byte), and then do some math to achieve the same semantics as the lookup table.
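For illustration only, the invert-and-count idea described above might look roughly like the sketch below. This is not the commenter's snippet (which is not preserved here). The function name `utf8_length_from_lead` is made up, and `__builtin_clz` is assumed to be available (GCC and Clang; C++20 offers `std::countl_one` in `<bit>` for the same purpose).

```cpp
#include <cstdint>

// Hypothetical helper (name is an assumption): derives the sequence length
// from the number of leading 1 bits in the lead byte.
// Returns 1..4, or 0 for a continuation byte (10xxxxxx) or for a byte with
// five or more leading ones.
inline int utf8_length_from_lead(std::uint8_t lead) {
    // Invert the byte and move it into the top 8 bits of a 32-bit word.
    // The low 24 bits are forced to 1 so the argument is never zero,
    // because __builtin_clz(0) is undefined.
    std::uint32_t inverted =
        (static_cast<std::uint32_t>(static_cast<std::uint8_t>(~lead)) << 24)
        | 0x00FFFFFFu;

    // Leading zeros of the inverted byte == leading ones of the original.
    int ones = __builtin_clz(inverted);  // range 0..8

    if (ones == 0) return 1;                  // 0xxxxxxx: ASCII
    if (ones >= 2 && ones <= 4) return ones;  // 110xxxxx, 1110xxxx, 11110xxx
    return 0;                                 // 10xxxxxx or 11111xxx
}
```

This looks only at the bit pattern, so it reports 2 for any byte of the form 110xxxxx, including bytes that can never start a valid sequence.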
This is not compatible with the special cases that need to be checked (e.g. the start bytes 0xC0 and 0xC1 must be rejected).
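One way to reconcile the two comments, sketched under assumptions: keep the bit-counting step but reject the problematic lead bytes first. 0xC0 and 0xC1 could only start overlong encodings of ASCII, and 0xF5 to 0xFF would encode values above U+10FFFF or are not valid lead bytes at all. The name `utf8_checked_length` is hypothetical, `__builtin_clz` is a GCC/Clang builtin, and whether the extra comparisons still beat a table lookup is untested.

```cpp
#include <cstdint>

// Hypothetical helper (name is an assumption): like a plain leading-ones
// count, but returns 0 for lead bytes that can never begin a well-formed
// UTF-8 sequence.
inline int utf8_checked_length(std::uint8_t lead) {
    // Special cases that the bit pattern alone does not catch.
    if (lead == 0xC0 || lead == 0xC1 || lead > 0xF4) return 0;

    // Invert, shift into the top byte, and pad with ones so the argument
    // to __builtin_clz is never zero (which would be undefined).
    std::uint32_t inverted =
        (static_cast<std::uint32_t>(static_cast<std::uint8_t>(~lead)) << 24)
        | 0x00FFFFFFu;
    int ones = __builtin_clz(inverted);  // leading ones of the original byte

    if (ones == 0) return 1;  // ASCII
    if (ones == 1) return 0;  // continuation byte, not a lead byte
    return ones;              // 2, 3 or 4 (larger counts were rejected above)
}
```

This only validates the lead byte. A full decoder still has to check the continuation bytes, including the restricted second-byte ranges after 0xE0, 0xED, 0xF0 and 0xF4.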