Indexing 100M Vectors in 20 Minutes on PostgreSQL with 12GB RAM
Key topics
The impressive feat of indexing 100M vectors in 20 minutes on PostgreSQL with just 12GB RAM has sparked a lively debate about the need for cloud infrastructure. While some commenters, like duckbot3000, wonder if cloud is still necessary beyond remote backups, others, such as setr and riku_iki, point out the complexities of on-premises infrastructure and the value of cloud services in handling failovers and redundancy. The discussion highlights that cloud infrastructure simplifies tasks like disk redundancy and failover nodes, with some, like positron26, using it to streamline their operations, while others, like nwellinghoff, lament the limitations of managed cloud services, such as AWS RDS.
Snapshot generated from the HN discussion
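To see why the headline figure is striking, a back-of-envelope RAM estimate helps. This is a minimal sketch; the vector dimension (768) and float32 storage are assumptions for illustration, not figures from the article.

```python
# Back-of-envelope RAM estimate for 100M vectors.
# Dimension (768) and dtype (float32, 4 bytes) are assumptions --
# the article may use different parameters.

def raw_vector_bytes(n_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Bytes needed to hold just the raw vectors, ignoring all index overhead."""
    return n_vectors * dim * bytes_per_component

n = 100_000_000
gib = raw_vector_bytes(n, 768) / 2**30
print(f"Raw vectors alone: {gib:.0f} GiB")  # ~286 GiB, far above 12 GiB of RAM
```

Under these assumptions the raw vectors alone are over 20x the available RAM, which is why building the index without holding the dataset in memory is the interesting part.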
Discussion activity
- First comment: 10h after posting
- Peak period: 18 comments in the 120-132h window
- Average per period: 4.3
- Based on 26 loaded comments
Key moments
- Story posted: Dec 8, 2025 at 12:30 PM EST
- First comment: Dec 8, 2025 at 10:38 PM EST (10h after posting)
- Peak activity: 18 comments in the 120-132h window
- Latest activity: Dec 14, 2025 at 10:56 AM EST
Rent your VPS and add in extra volumes for like $10 per 100GB.
Netcup is underrated, but there are other providers listed on lowendbox/lowendtalk too, and I'm interested in trying out Hetzner sometime.
And that's not counting the fact that Netcup storage is RAID-backed (though Netcup is limited to 8TB on a VPS).
That works out to about €4.7/TB, roughly $4/TB, or €6/TB in a RAID 5 setup.
I do not understand why they are not using this new pricing model on their older servers, where the best you can get is around €10/TB (for the single 15TB U.2 drive).
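The per-TB figures in the comments above can be sanity-checked with a little arithmetic. The 4-disk RAID 5 layout below is an assumption; the thread does not say how many disks are involved.

```python
# Sanity-checking the per-TB prices quoted in the thread.
# The 4-disk RAID 5 layout is an assumption for illustration.

def price_per_tb(price: float, gb: float) -> float:
    """Normalize a price quoted per some number of GB to a per-TB price."""
    return price / (gb / 1000)

def raid5_cost_per_usable_tb(raw_cost_per_tb: float, n_disks: int) -> float:
    # RAID 5 loses one disk's worth of capacity to parity,
    # so usable capacity is (n-1)/n of raw capacity.
    return raw_cost_per_tb * n_disks / (n_disks - 1)

print(price_per_tb(10, 100))             # $10 per 100GB -> $100/TB for block volumes
print(raid5_cost_per_usable_tb(4.7, 4))  # ~6.27 EUR per usable TB, close to the ~6 quoted
```

With 4 disks, €4.7/TB raw comes out to about €6.3 per usable TB, which lines up with the €6/TB RAID 5 figure quoted above.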
Nice to know, but I was just guessing at what a reasonable price would be :-)
But realistically, I believe the major clouds (Google, AWS) likely have a more robust org and infrastructure for recovery than I can build and maintain.
Not every use case requires 100% uptime.
Actual active-active HA of your datastores is really hard to do (CAP theorem and all that). The majority of companies don't do it.
And yes, I agree, the PG failover setup (and especially dealing with a failure afterwards, to restore the ex-master) is beyond infuriating.
But it's not "pay 10x the amount while easily eating a 10x performance hit" infuriating :)
Engineers who started their career during the cloud craze and don't know anything else are also not the kind to rock the boat, lest the cash cow die and their whole "investment" in their career become useless.
Imagine paying $250+/mo for 32GB of RAM and 4 vCPUs. No wonder Amazon is swimming in cash; the markup on this is bonkers.
https://www.latitude.sh/pricing/c2-small-x86?gen=gen-2
Hetzner doesn't even have specs this low, from what I can tell!
https://www.hetzner.com/dedicated-rootserver/#cores_threads_...
Or, for €184/mo you can get one of their bare-metal GPU offerings with 64GB of RAM, an i5-13500, and an RTX 4000.
A year commitment takes 30% off.
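The cost gap described in these comments is easy to quantify from the figures quoted: roughly $250/mo for a 32GB/4-vCPU cloud VM versus €184/mo bare metal with a 30% yearly-commitment discount. Note the currencies differ, so this is an order-of-magnitude comparison, not an exact one.

```python
# Comparing the figures quoted in the thread. Currencies differ
# (USD vs EUR), so treat this as a rough comparison only.

cloud_monthly_usd = 250.0        # 32GB RAM, 4 vCPUs, managed cloud
bare_metal_monthly_eur = 184.0   # 64GB RAM, i5-13500, RTX 4000
yearly_discount = 0.30           # year-commitment discount quoted above

cloud_yearly_usd = cloud_monthly_usd * 12
bare_metal_committed_eur = bare_metal_monthly_eur * (1 - yearly_discount)

print(f"cloud per year: ${cloud_yearly_usd:.0f}")                    # $3000
print(f"bare metal per month, committed: {bare_metal_committed_eur:.2f} EUR")
```

Even ignoring exchange rates, the discounted bare-metal box with twice the RAM and a GPU costs about half the monthly price of the cloud VM.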
You must have the data upfront; you cannot build this incrementally.
There is also no mention of how this would handle updates, and from the description, even if updates are possible, the index will degrade over time, requiring a new batch indexing run.
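One common way to live with a batch-built index that degrades under writes is a churn-threshold rebuild policy. This is a minimal sketch of that idea; all names here are hypothetical, and the article describes no such mechanism.

```python
# Hypothetical rebuild-when-degraded policy for a batch-built index.
# The article describes no such mechanism; this only illustrates the
# "rebuild after enough churn" idea raised by the comment above.

class RebuildPolicy:
    def __init__(self, rebuild_ratio: float = 0.2):
        self.rebuild_ratio = rebuild_ratio  # rebuild once churn reaches 20%
        self.indexed_rows = 0               # rows covered by the last build
        self.churned_rows = 0               # inserts/updates since that build

    def on_build(self, total_rows: int) -> None:
        """Record a fresh batch build over total_rows rows."""
        self.indexed_rows = total_rows
        self.churned_rows = 0

    def on_write(self, n: int = 1) -> None:
        """Record n inserted or updated rows since the last build."""
        self.churned_rows += n

    def needs_rebuild(self) -> bool:
        if self.indexed_rows == 0:
            return self.churned_rows > 0
        return self.churned_rows / self.indexed_rows >= self.rebuild_ratio

policy = RebuildPolicy()
policy.on_build(100_000_000)
policy.on_write(25_000_000)
print(policy.needs_rebuild())  # True: 25% churn exceeds the 20% threshold
```

The trade-off is the one the comment points at: until the threshold triggers and a new 20-minute batch build runs, query quality on the churned fraction degrades.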