Nov 23, 2025 at 8:32 AM EST
How Proper Names Behave in Text Embedding Space
Mood
informative
Sentiment
neutral
Category
research
Key topics
Text embedding
Natural language processing
Machine learning
Discussion Activity
Light discussion
First comment
N/A
Peak period
1 (Hour 1)
Avg / period
1
Comment distribution: 1 data point
Based on 1 loaded comment
Key moments
- 01 Story posted: Nov 23, 2025 at 8:32 AM EST (18h ago)
- 02 First comment: Nov 23, 2025 at 8:32 AM EST (0s after posting)
- 03 Peak activity: 1 comment in Hour 1 (hottest window of the conversation)
- 04 Latest activity: Nov 23, 2025 at 8:32 AM EST (18h ago)
Discussion (1 comment)
18h ago
I was debugging a RAG system and noticed that “semantic” dense retrievers were oddly good at author names, even when hybrid clearly worked better overall. This post builds a small diagnostic around synthetic (author, topic) queries and shows that proper names carry about half as much separation power as the topic in embedding space. Then I systematically “break” the names (masks, gibberish IDs, small edit-distance corruptions, formatting and layout changes) to see what survives, and find that most of the signal comes from surface form and exact-match bias rather than any deep notion of identity.
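The corruption families the comment describes (masking, gibberish IDs, small edit-distance perturbations) can be sketched as plain string transforms applied to the author slot of a synthetic query. The template, function names, and corruption parameters below are illustrative assumptions, not the post's actual code:

```python
import random
import string

def mask_name(name: str) -> str:
    # Replace the author name with a generic placeholder token (assumed token).
    return "[AUTHOR]"

def gibberish_id(name: str, rng: random.Random) -> str:
    # Replace the name with a random alphanumeric ID of the same length.
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in name)

def edit_corrupt(name: str, rng: random.Random, n_edits: int = 1) -> str:
    # Apply small edit-distance corruptions: swap, delete, or substitute a char.
    chars = list(name)
    for _ in range(n_edits):
        i = rng.randrange(len(chars))
        op = rng.choice(["swap", "delete", "substitute"])
        if op == "swap" and i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        elif op == "delete" and len(chars) > 1:
            del chars[i]
        else:
            chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars)

def make_queries(author: str, topic: str, seed: int = 0) -> dict:
    # Build the clean synthetic (author, topic) query plus corrupted variants.
    # Each variant would then be embedded and compared against the clean
    # query's retrieval behavior to see how much signal survives.
    rng = random.Random(seed)
    template = "papers by {name} about {topic}"  # assumed query template
    return {
        "clean": template.format(name=author, topic=topic),
        "masked": template.format(name=mask_name(author), topic=topic),
        "gibberish": template.format(name=gibberish_id(author, rng), topic=topic),
        "corrupted": template.format(name=edit_corrupt(author, rng), topic=topic),
    }
```

In a full diagnostic, each variant would be embedded with the retriever under test and the author-vs-topic separation measured (e.g. via cosine-similarity margins between matched and mismatched documents); if the "corrupted" variant loses most of its retrieval signal while "clean" keeps it, that points to surface-form exact-match bias rather than a deeper representation of identity.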
ID: 46023432
Type: story
Last synced: 11/23/2025, 6:43:48 PM
Read the primary article or dive into the live Hacker News thread when you're ready.