How Big Are Our Embeddings Now and Why?
Posted 4 months ago · Active 4 months ago
vickiboykis.com · Tech · story
Sentiment: calm, mixed · Debate: 60/100
Key topics
Embeddings
Large Language Models
Artificial Intelligence
NLP
The article examines the growing size of embeddings in AI models; the discussion centers on the reasons behind this trend, its implications, and potential future directions.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement
- First comment: 3d after posting
- Peak period: 9 comments in the 78-84h window
- Avg / period: 4.7 comments
- Comment distribution: 14 data points (based on 14 loaded comments)
Key moments
1. Story posted: Sep 2, 2025 at 7:45 AM EDT (4 months ago)
2. First comment: Sep 5, 2025 at 1:53 PM EDT (3d after posting)
3. Peak activity: 9 comments in the 78-84h window (hottest window of the conversation)
4. Latest activity: Sep 6, 2025 at 5:19 AM EDT (4 months ago)
ID: 45101797 · Type: story · Last synced: 11/20/2025, 2:38:27 PM
“With the constant upward pressure on embedding sizes not limited by having to train models in-house, it’s not clear where we’ll slow down: Qwen-3, along with many others is already at 4096”
But aren’t embedding models separate from the LLMs? The size of attention heads in LLMs, etc., isn’t inherently connected to how a lab might train and release an embedding model. I don’t really understand why growth in LLM size fundamentally puts upward pressure on embedding size, since the two aren’t intrinsically connected.
However, the embedding dimension sets the rank of the token representation space. Each layer can transform or refine those vectors, but it can’t expand their intrinsic capacity; a tall but narrow network is bottlenecked by that width. Width-first scaling tends to outperform pure depth scaling: you want enough representational richness per token before you start stacking more layers of processing.
So yeah, embedding size doesn’t have to scale up in lockstep with model size, but in practice it usually does, because once models grow deeper and more capable, narrow embeddings quickly become the limiting factor.
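A minimal numpy sketch of that width bottleneck (the dimensions and layer count below are arbitrary placeholders, not taken from any particular model): however many d_model → d_model layers you stack, the per-token representations can never exceed rank d_model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_layers = 512, 64, 24  # arbitrary toy sizes

# Token representations enter the stack at width d_model,
# so the matrix of all token vectors has rank at most d_model.
X = rng.normal(size=(n_tokens, d_model))

H = X
for _ in range(n_layers):
    W = rng.normal(size=(d_model, d_model))
    H = np.tanh(H @ W)  # every layer maps d_model -> d_model

print(np.linalg.matrix_rank(X))  # 64
print(np.linalg.matrix_rank(H))  # still at most 64: depth alone can't widen the space
```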
> As a quick review, embeddings are compressed numerical representations of a variety of features (text, images, audio) that we can use for machine learning tasks like search, recommendations, RAG, and classification.
Current standalone embedding models are not intrinsically connected to SotA LLM architectures (e.g. the Qwen reference) -- right? The article seems to mix the two ideas together.
So an old downward pressure on sizes (internal training costs and resource limits) is now weaker. And as long as LLMs are seeing benefits from larger embeddings, larger embedding models will become more common and available. (Of course, via truncation and the like, no one is forced to use embeddings larger than what works for them... but larger ones may keep becoming more common and available.)
As with LLMs, the bottleneck is still training data and the training regimen, but there's still demand for smaller embedding models due to both storage and compute concerns. EmbeddingGemma (https://huggingface.co/google/embeddinggemma-300m), released just yesterday, beats the 4096D Qwen-3 benchmarks at 768D, and its 128D equivalent via MRL beats many 768D embedding models.
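As background, MRL (Matryoshka Representation Learning) trains the model so that the leading dimensions carry most of the signal, which is what makes simple truncation viable. A rough sketch of the truncation step, using random vectors as stand-ins rather than actual EmbeddingGemma outputs:

```python
import numpy as np

def truncate_and_renormalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """MRL-style shrink: keep the leading `dim` components, then re-L2-normalize."""
    cut = emb[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

# Stand-ins for 768D unit-normalized embeddings (random here, not real model output).
rng = np.random.default_rng(1)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=-1, keepdims=True)

small = truncate_and_renormalize(full, 128)  # 6x less storage per vector

# With unit vectors, cosine similarity is just a dot product.
print(full @ full.T)    # pairwise similarities at 768D
print(small @ small.T)  # pairwise similarities at 128D (only meaningful for genuinely MRL-trained models)
```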
A few months ago I happened to play with OpenAI’s embeddings model (can’t remember which ones) and I was shocked to see that the cosine similarity of most texts was super close, even if the texts had nothing in common. It’s like the wide 0-1 range that USE (and later BERT) were giving me was compressed to perhaps a 0.2 one. Why is that? Does it mean those embeddings are not great for semantic similarity?
The absolute value of cosine similarity isn't critical (just the order when comparing multiple candidates), but if you finetune an embeddings model for a specific domain, the model will give a wider range of cosine similarity since it can learn which attributes specifically are similar/dissimilar.
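A tiny illustration of that point with made-up scores: retrieval only consumes the ordering, so a compressed similarity range ranks candidates exactly the same way a wide one does.

```python
import numpy as np

# Hypothetical cosine similarities of one query against four documents,
# from a model whose scores sit in a narrow band.
compressed = np.array([0.71, 0.78, 0.69, 0.74])
# The same documents scored by a model with a wider range.
wide = np.array([0.10, 0.85, 0.02, 0.45])

# Retrieval uses the ranking, which is identical in both cases.
print(np.argsort(-compressed))  # [1 3 0 2]
print(np.argsort(-wide))        # [1 3 0 2]
```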
But now, as with the OpenAI embeddings you're talking about, the embeddings are constrained and trained with retrieval in mind. The pairs are ordered closer together, which makes them easier to search.
* Small data (talk to your PDF on the fly, etc.): getting bigger and faster via cloud APIs
* Big data (RAG over large corpora): getting smaller, because we don't want to pay crazy fees for vector DB hosting, and doable because it's getting easier to find high-quality small embeddings; see the back-of-the-envelope numbers below.
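To make the storage side concrete, a quick footprint calculation (the 100M-chunk corpus size and raw float32 storage with no index overhead are my assumptions, not numbers from the thread):

```python
# Rough vector-store footprint for a hypothetical corpus of 100M chunks,
# stored as raw float32 with no index overhead (both assumptions, not thread data).
n_vectors = 100_000_000
bytes_per_float = 4

for dim in (4096, 768, 128):
    gb = n_vectors * dim * bytes_per_float / 1e9
    print(f"{dim:>4}D: {gb:,.0f} GB")
# 4096D: 1,638 GB | 768D: 307 GB | 128D: 51 GB
```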