A Linear-Time Alternative for Dimensionality Reduction and Fast Visualisation
A new linear-time alternative to t-SNE is drawing interest as a faster option for dimensionality reduction. The author, romanfll, built the tool to enable client-side dimensionality reduction in the browser, and commenters are digging into its inner workings and comparing it to existing methods such as UMAP and Landmark MDS. Some users, like lmeyerov, share their experience with UMAP, noting its performance is fine for interactive use at certain data sizes, while others, like threeducks, question the new tool's runtime and suggest potential optimizations.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 34m after posting
- Peak period: 21 comments in 0-6h
- Avg per period: 6.2
- Based on 37 loaded comments
Key moments
- Story posted: Dec 16, 2025 at 1:47 AM EST (20 days ago)
- First comment: Dec 16, 2025 at 2:21 AM EST (34m after posting)
- Peak activity: 21 comments in the 0-6h window
- Latest activity: Dec 19, 2025 at 1:17 AM EST (17 days ago)
This approach ("Sine Landmark Reduction") uses linearised trilateration—similar to GPS positioning—against a synthetic "sine skeleton" of landmarks.
The main trade-offs:
- It is O(N) and deterministic (it solves Ax=b instead of running iterative gradient descent).
- It forces the topology onto a loop structure, so it is less accurate than UMAP for complex manifolds (like Swiss rolls), but it guarantees a clean layout for user interfaces.
- It can project ~9k points (50 dims) to 3D in about 2 seconds on a laptop CPU.

Python implementation and math details are in the post. Happy to answer questions!
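For readers unfamiliar with linearised trilateration, here is a minimal sketch of the general idea: subtract one landmark's distance equation from the others so the quadratic term cancels, then solve a small shared linear system. The landmark choice, skeleton shape, and distance metric below are illustrative assumptions, not the author's implementation.

```
# Sketch of GPS-style linearised trilateration against K fixed landmarks.
# Assumptions: random high-dim landmarks, a circular target skeleton in 2D,
# plain Euclidean distances, no distance warping.
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(9_000, 50))                  # N = 9,000 points in 50 dims
k = 100                                           # fixed landmark count, independent of N

L_high = X[rng.choice(len(X), k, replace=False)]  # landmarks in the original space
t = np.linspace(0, 2 * np.pi, k, endpoint=False)
L_low = np.column_stack([np.cos(t), np.sin(t)])   # (k, 2) target loop skeleton

D = pairwise_distances(X, L_high)                 # (N, k) distances: O(N * k) work

# ||y - L_i||^2 = d_i^2 minus the same equation for landmark 0 cancels ||y||^2,
# leaving one shared linear system A y = b_point for every point.
A = 2.0 * (L_low[1:] - L_low[0])                                   # (k-1, 2)
c = np.sum(L_low[1:] ** 2, axis=1) - np.sum(L_low[0] ** 2)         # (k-1,)
B = D[:, :1] ** 2 - D[:, 1:] ** 2 + c                              # (N, k-1) right-hand sides
Y = B @ np.linalg.pinv(A).T                                        # (N, 2) embedding in one solve
print(Y.shape)                                                     # (9000, 2)
```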
I have a couple of questions for now: (1) I am confused by your last sentence. It seems you're saying embeddings are a substitute for clustering. My understanding is that you usually apply a clustering algorithm over embeddings - good embeddings just ensure that the grouping produced by the clustering algo "makes sense".
(2) Have you tried PaCMAP [1]? I found it to produce high-quality, quick results when I tried it. I haven't tried it in a while, though, and I vaguely remember it wouldn't install properly on my machine (a Mac) the last time I reached for it. Their group has some new stuff coming out too (on the linked page).
[1] https://github.com/YingfanWang/PaCMAP
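For context on (1), the usual pipeline is to reduce dimensionality first and then cluster the embedded points. A toy sketch of that pattern (the reducer, cluster count, and dataset are illustrative choices, not from the thread):

```
# Cluster over an embedding: reduce dimensionality first, then run a
# clustering algorithm on the low-dimensional coordinates.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = load_digits().data
emb = PCA(n_components=2, random_state=0).fit_transform(X)                   # embedding step
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(emb)   # clustering over it
print(labels[:20])
```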
My last sentence was about higher-value problems: we are finding it makes sense to go straight to GNNs, LLMs, etc., and embed multidimensional data that way rather than via UMAP dimensionality reductions. We can still use UMAP as a generic hammer to control further dimensionality reductions, but the 'hard' part would be handled by the model. With neural graph layouts, we can potentially even skip UMAP for that too.
Re: PaCMAP, we have been eyeing several new tools here, but so far haven't felt the need to move from UMAP to them. We'd need to see significant improvements, given that the quality engineering in UMAP has set the bar high. In theory I can imagine some tools doing better in the future, but their creators haven't made the engineering investment, so internally we'd rather stay with UMAP. We make our API pluggable, so you can pass in results from other tools, but we haven't heard much from others going that route.
Table stakes for our bigger users:
- parity or improvement on perf, for both CPU & GPU mode
- better support for learning (fit->transform) so we can embed billion+ scale data
- expose inferred similarity edges so we can do interactive and human-optimized graph viz, vs overplotted scatterplots
New frontiers:
- alignment tooling is fascinating, as we increasingly want to re-fit->embed over time as our environments change and compare, e.g., day-over-day analysis. This area is not well-defined yet, but it is common for anyone operational, so it seems ripe for innovation
- maybe better support for mixing input embeddings. This seems increasingly common in practice, and seems worth examining as special cases
Always happy to pair with folks in getting new plugins into the pygraphistry / graphistry community, so if/when ready, happy to help push a PR & demo through!
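On the fit->transform item in the table stakes above: the basic pattern already exists in umap-learn, though not at billion-point scale. A toy sketch with illustrative sizes:

```
# fit -> transform pattern: learn the embedding once, then project new data
# without refitting. Sizes here are toy, not billion-scale.
import numpy as np
import umap

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2_000, 50))
X_new = rng.normal(size=(500, 50))

reducer = umap.UMAP(n_components=2).fit(X_train)    # fit on the reference data
emb_new = reducer.transform(X_new)                  # embed fresh points into the same space
print(emb_new.shape)                                # (500, 2)
```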
It is probably not all the things you want, but AlignedUMAP can do some of this right now: https://umap-learn.readthedocs.io/en/latest/aligned_umap_bas...
If you want to do better than that, I would suggest that the quite new landmarked parametric UMAP options are actually very good for this: https://umap-learn.readthedocs.io/en/latest/transform_landma...
Training the parametric UMAP is a little more expensive, but the new landmark-based updating really does allow you to steadily update with new data and have new clusters appear as required. Happy to chat as always, so reach out if you haven't already looked at this and it seems interesting.
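For the day-over-day comparison case, AlignedUMAP's basic usage looks roughly like the sketch below. The data, relations, and parameters are placeholders; see the linked docs for the real walkthrough.

```
# Rough sketch of AlignedUMAP across two time slices; synthetic data and an
# identity row mapping stand in for real day-over-day snapshots.
import numpy as np
import umap

rng = np.random.default_rng(0)
day1 = rng.normal(size=(500, 20))
day2 = day1 + rng.normal(scale=0.1, size=(500, 20))   # drifted copy of day1

# relations[i] maps row indices of slice i to row indices of slice i+1
relations = [{i: i for i in range(500)}]

mapper = umap.AlignedUMAP(n_neighbors=15).fit([day1, day2], relations=relations)
emb_day1, emb_day2 = mapper.embeddings_                # aligned, comparable 2D layouts
print(emb_day1.shape, emb_day2.shape)
```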
My experience has been that as workloads get heavier, it's "cheaper" to push to an accelerated, dedicated inferencing server. This doesn't always work though; e.g., there's a world of difference between realtime video on phones and an interactive chat app.
Re: edges, I've been curious about the push by a few toward 'foundation GNNs', and it may be fun to compare UMAP on edges to those...
The technique actually supports both modes in the implementation (synthetic skeleton or random subsampling). However, for this browser visualisation, we default to the synthetic sine skeleton for two reasons:
1. Determinism: Random landmarks produce a different layout every time you calculate the projection. For a user interface, we needed the layout to be identical every time the user loads the data, without needing to cache a random seed.
2. Topology forcing: By using a fixed sine/loop skeleton, we implicitly 'unroll' the high-dimensional data onto a clean reduced structure. We found this easier for users to visually navigate compared to the unpredictable geometry that comes from a random subset.
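A minimal sketch of what those two landmark modes could look like; the exact skeleton construction is an assumption, not the author's implementation:

```
# Two ways to pick K landmarks: a deterministic synthetic sine skeleton vs a
# random subsample of the data (which changes the layout run to run unless
# the seed is cached).
import numpy as np

def synthetic_sine_landmarks(k, n_dims, scale=1.0):
    # Deterministic: k landmarks traced along a sine loop, identical on every load.
    t = np.linspace(0, 2 * np.pi, k, endpoint=False)
    phases = np.linspace(0, np.pi, n_dims, endpoint=False)
    return scale * np.sin(t[:, None] + phases[None, :])    # (k, n_dims)

def random_subsample_landmarks(X, k, seed=None):
    # Data-derived: a random subset of the points themselves.
    rng = np.random.default_rng(seed)
    return X[rng.choice(len(X), size=k, replace=False)]
```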
UMAP is not O(n^2); it is O(n log n).
To clarify: K is a fixed hyperparameter in this implementation, strictly independent of N. Whether we process 9k points or 90k points, we keep K at ~100. We found that increasing K yields diminishing returns very quickly. Since the landmarks are generated along a fixed synthetic topology, increasing K essentially just increases resolution along that specific curve, but once you have enough landmarks to define the curve's structure, adding more doesn't reveal new topology… it just adds computational cost to the distance matrix calculation. Re: sqrt(N): That is purely a coincidence!
Code:
```
import numpy as np
import time
import matplotlib.pyplot as plt
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.datasets import load_digits
from sklearn.metrics import pairwise_distances
from sklearn.manifold import TSNE

# UMAP is an optional dependency, only needed for the comparison run.
try:
    import umap
    HAS_UMAP = True
except ImportError:
    HAS_UMAP = False
    print("Warning: 'umap-learn' not installed. Comparison will be skipped.")


class SineLandmarkReduction(BaseEstimator, TransformerMixin):
    def __init__(self,
                 n_components=2,
                 n_landmarks=50,
                 mode='data_derived',        # 'sine' or 'data_derived'
                 distance_warping=1.0,
                 random_state=42):
        self.n_components = n_components
        self.n_landmarks = n_landmarks
        self.mode = mode
        self.p = distance_warping
        self.random_state = random_state
        self.rng = np.random.RandomState(random_state)


if HAS_UMAP:
    digits = load_digits()
    X = digits.data
    y = digits.target
else:
    print("Please install umap-learn to run the comparison: pip install umap-learn")
```

Would be nice to see it come back. Would love to browse for books and movies on maps again, rather than getting lists regurgitated at me.