Decoding UTF-8. Part III: Determining Sequence Length – a Lookup Table

Posted 4 months ago | Active 4 months ago

rbanffy

9 points

2 comments

nemanjatrifunovic.substack.com | Tech | story

Key topics

UTF-8 Decoding

Lookup Tables

Performance Optimization

The article discusses optimizing UTF-8 decoding by using a lookup table to determine sequence length, with commenters exploring alternative approaches and potential optimizations.

Snapshot generated from the HN discussion

Discussion Activity

Light discussion

Peak period: 84-90h

Key moments

Story posted: Sep 2, 2025 at 3:00 PM EDT
First comment: Sep 6, 2025 at 8:43 AM EDT (4d after posting)
Peak activity: 1 comment in the 84-90h window
Latest activity: Sep 6, 2025 at 11:43 AM EDT

Discussion (2 comments)

procaryote

4 months ago

1 reply

Right-shifting three bits would reduce the size of the lookup table to 32 slots
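To make the 32-slot idea concrete, here is a minimal sketch of a table indexed by the top five bits of the lead byte. The names `kLenByTop5` and `seq_len_top5` are made up for illustration and are not taken from the article; the table classifies bytes purely by their leading bit pattern, with 0 marking a byte that cannot start a sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-entry table indexed by (lead_byte >> 3).
   0 marks a byte that cannot start a sequence by bit pattern alone. */
static const uint8_t kLenByTop5[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x00..0x7F: 0xxxxxxx */
    0, 0, 0, 0, 0, 0, 0, 0,                         /* 0x80..0xBF: 10xxxxxx */
    2, 2, 2, 2,                                     /* 0xC0..0xDF: 110xxxxx */
    3, 3,                                           /* 0xE0..0xEF: 1110xxxx */
    4,                                              /* 0xF0..0xF7: 11110xxx */
    0                                               /* 0xF8..0xFF: 11111xxx */
};

/* Sequence length implied by the lead byte's top five bits. */
int seq_len_top5(uint8_t lead_byte) {
    return kLenByTop5[lead_byte >> 3];
}
```

Eight consecutive byte values share each slot, so the table can only express distinctions that are visible in the top five bits.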

I guess something like

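    const int extra_bits = (sizeof(int) - 1) * 8;
    /* mask to 8 bits after inverting; | 1 keeps clz away from 0, where it is undefined */
    int x = __builtin_clz((~lead_byte & 0xFF) | 1);
    return (x == extra_bits) + (x > 1 + extra_bits) * (x < 5 + extra_bits) * (x - extra_bits);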

could work, although I've not tested it for all cases or checked if it's fast

The idea there is to invert the bits, use a built-in operation to count leading zeros (i.e. leading ones in the original byte), and then do some math to achieve the same semantics as the lookup table.
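Since there are only 256 possible lead bytes, the bit-counting idea can be checked exhaustively. The following is a sketch of one variant, assuming GCC or Clang for `__builtin_clz`; the function names are hypothetical. It masks the inverted byte to 8 bits (the byte is promoted to `int` before `~` is applied) and sets the lowest bit so the builtin never receives 0, for which its result is undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: classify the lead byte by explicit bit-pattern tests. */
int seq_len_reference(uint8_t b) {
    if (b < 0x80) return 1;            /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* continuation byte or 0xF8..0xFF */
}

/* Branch-free variant: count the leading one bits of the byte via clz. */
int seq_len_clz(uint8_t b) {
    /* Invert, keep the low 8 bits, and set bit 0 so the argument is never 0. */
    unsigned inverted = ((unsigned)~b & 0xFFu) | 1u;
    /* unsigned is wider than a byte; subtract the extra leading zeros. */
    int extra_bits = (int)(sizeof(unsigned) - 1) * 8;
    int ones = __builtin_clz(inverted) - extra_bits;
    /* 0 ones -> 1, 1 one -> 0, 2..4 ones -> that count, 5 or more -> 0 */
    return (ones == 0) + (ones > 1) * (ones < 5) * ones;
}

/* Compare both functions over all 256 byte values; returns the mismatch count. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int b = 0; b < 256; ++b)
        mismatches += seq_len_clz((uint8_t)b) != seq_len_reference((uint8_t)b);
    return mismatches;
}
```

This only checks correctness against the bit-pattern reference; whether it beats a table lookup would need measuring on the target compiler and CPU.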

zahlman

4 months ago

> Right-shifting three bits

This is not compatible with the special cases that need to be checked (e.g. the start bytes 0xC0 and 0xC1 must be rejected).
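The objection can be illustrated with a short sketch (hypothetical helper names, not code from the article). After a right shift by three, 0xC0 and 0xC1 fall into the same slot as 0xC2, so a table indexed that way cannot reject the first two while accepting the third. A check that handles such bytes has to look at the whole byte, as below, which also rejects 0xF5 and above because those would encode values past U+10FFFF.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if two bytes land in the same slot of a table indexed by (byte >> 3). */
int same_slot_after_shift(uint8_t a, uint8_t b) {
    return (a >> 3) == (b >> 3);
}

/* Lead-byte length with the special cases rejected; 0 means "not a valid lead byte". */
int seq_len_strict(uint8_t b) {
    if (b == 0xC0 || b == 0xC1) return 0; /* could only start overlong encodings */
    if (b >= 0xF5) return 0;              /* would encode values above U+10FFFF */
    if (b < 0x80) return 1;               /* ASCII */
    if (b < 0xC0) return 0;               /* continuation byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    return 4;                             /* 0xF0..0xF4 */
}
```

A full 256-entry table can encode the same special cases directly, which is one reason to keep the larger table.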

ID: 45107534 | Type: story | Last synced: 11/17/2025, 10:07:25 PM