Nov 23, 2025 at 8:32 AM EST

How Proper Names Behave in Text Embedding Space

etoud

2 points

1 comment



Discussion (1 comment)

etoud

18h ago

I was debugging a RAG system and noticed that “semantic” dense retrievers were oddly good at matching author names, even though hybrid retrieval clearly worked better overall. This post builds a small diagnostic around synthetic (author, topic) queries and shows that proper names carry about half as much separation power as topics in embedding space. I then systematically “break” the names (masks, gibberish IDs, small edit-distance corruptions, formatting and layout changes) to see what survives, and find that most of the signal comes from surface form and exact-match bias rather than any deep notion of identity.
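For readers who want to try the same kind of name "breaking", here is a minimal Python sketch of three of the perturbations mentioned above (mask, gibberish ID, one-character corruption) and a simple separation score. It is not the code from the post. All function names, the placeholder token, and the definition of `separation` are assumptions made for illustration, and `toy_embed` is a character-trigram stand-in that should be swapped for a real encoder before drawing any conclusions.

```python
import hashlib
import math
import random
import string
from typing import Callable, List

Vector = List[float]


def mask_name(name: str, token: str = "[NAME]") -> str:
    """Replace the whole name with a fixed placeholder (token is an assumption)."""
    return token


def gibberish_id(name: str, length: int = 8) -> str:
    """Map a name to a stable, meaningless ID: the same name always gets the same ID."""
    return "id_" + hashlib.sha1(name.encode("utf-8")).hexdigest()[:length]


def corrupt_one_char(name: str, rng: random.Random) -> str:
    """Substitute one letter with a different one (edit distance of exactly 1)."""
    positions = [i for i, ch in enumerate(name) if ch.isalpha()]
    if not positions:
        return name
    i = rng.choice(positions)
    choices = [c for c in string.ascii_lowercase if c != name[i].lower()]
    return name[:i] + rng.choice(choices) + name[i + 1:]


def toy_embed(text: str, dim: int = 256) -> Vector:
    """Stand-in encoder (NOT a real model): hashed character trigrams, L2-normalised."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        digest = hashlib.md5(padded[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def separation(embed: Callable[[str], Vector], query: str,
               positive: str, negative: str) -> float:
    """Hypothetical score: how much closer the query is to positive than to negative."""
    q = embed(query)
    return cosine(q, embed(positive)) - cosine(q, embed(negative))


if __name__ == "__main__":
    rng = random.Random(0)
    author = "Ursula Vance"  # made-up name, for illustration only
    for label, variant in [
        ("original", author),
        ("masked", mask_name(author)),
        ("gibberish", gibberish_id(author)),
        ("typo", corrupt_one_char(author, rng)),
    ]:
        score = separation(
            toy_embed,
            query=f"{author} graph databases",
            positive=f"graph databases by {variant}",
            negative="graph databases by Someone Else",
        )
        print(f"{label:10s} {variant:20s} {score:+.3f}")
```

Because the stand-in encoder only sees character trigrams, it rewards surface overlap by construction. Passing a real dense encoder as `embed` is what would make the comparison between variants meaningful.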

ID: 46023432 | Type: story | Last synced: 11/23/2025, 6:43:48 PM

