Key Takeaways
That is a defeatist take.
Just because the driver (the LLM) is unpredictable doesn't mean the car (the infrastructure) should have loose wheels.
We accept that models are probabilistic. We shouldn't accept that our databases are.
If the "brain" is fuzzy, the "notebook" it reads from shouldn't be rewriting itself based on which CPU it's running on. Adding system-level drift to model level hallucinations is just bad engineering.
If we ever want to graduate from "Chatbot Toys" to "Agentic Systems," we have to lock down the variables we actually control. The storage layer is one of them.
The problem Valori solves is downstream: Memory State.
We can accept 'Approximate Computing' for generating a probability distribution (the model's thought). We cannot accept it for storing and retrieving that state (the system's memory).
If I 'resign myself' to approximate memory, I can't build consensus, I can't audit decisions, and I can't sync state between nodes.
'Approximate Nearest Neighbor' (ANN) refers to the algorithm's recall trade-off, not an excuse for hardware-dependent non-determinism. Valori proves you can have approximate search that is still bit-perfectly reproducible. Correctness shouldn't be a casualty of the AI age.
However, that makes deterministic memory more critical, not less.
Right now, we have 'Double Non-Determinism':
The Model produces drifting floats.
The Vector DB (using f32) introduces more drift during indexing and search (different HNSW graph structures on different CPUs).
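To see where that second source of drift comes from, here is a small, self-contained Rust demo (illustrative only, not Valori code): the same f32 dot product accumulated in a different order, which is effectively what different SIMD widths, FMA units, and compilers do on different CPUs, can land on different bits.

```rust
// Illustrative only: the same f32 values summed in a different order
// (as different SIMD widths / compilers effectively do) can give different bits.
fn dot_sequential(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Simulate a 4-lane SIMD reduction: partial sums per lane, then combine.
fn dot_lanes(a: &[f32], b: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        lanes[i % 4] += x * y;
    }
    lanes.iter().sum()
}

fn main() {
    // Arbitrary test vectors; values chosen so rounding can differ between orders.
    let a: Vec<f32> = (0..1024).map(|i| ((i * 37 % 101) as f32) * 1e-3 + 1.0).collect();
    let b: Vec<f32> = (0..1024).map(|i| ((i * 53 % 97) as f32) * 1e-3 - 0.5).collect();
    let s = dot_sequential(&a, &b);
    let l = dot_lanes(&a, &b);
    // The two results usually differ in the low bits: same data, different answer.
    println!("{:?} vs {:?} (bits equal: {})", s, l, s.to_bits() == l.to_bits());
}
```

Those low-bit differences are exactly what changes which HNSW neighbors get linked, so the index itself diverges across machines.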
Valori acts as a Stabilization Boundary. We can't fix the GPU (yet), but once that vector hits our kernel, we normalize it to Q16.16 and freeze it. This guarantees that Input A + Database State B = Result C every single time, regardless of whether the server is x86 or ARM.
Without this boundary, you can't even audit where the drift came from.
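For intuition, here is a deliberately simplified Rust sketch of the idea (the names and rounding choices are illustrative, not the actual Valori kernel): quantize each incoming f32 component exactly once at the boundary, then keep all similarity math in integers, where addition is associative, so the same bytes in give the same bits out on x86 and ARM.

```rust
/// Hypothetical sketch of a Q16.16 "stabilization boundary" (not Valori's real API):
/// incoming f32 embeddings are quantized once, and all later math is integer-only.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Q16_16(i32); // 16 integer bits, 16 fractional bits

const FRAC_BITS: u32 = 16;
const SCALE: f64 = (1u32 << FRAC_BITS) as f64;

impl Q16_16 {
    /// Quantize at the boundary. Rounding happens exactly once, here.
    fn from_f32(x: f32) -> Self {
        let scaled = (x as f64 * SCALE).round();
        Q16_16(scaled.clamp(i32::MIN as f64, i32::MAX as f64) as i32)
    }
}

/// Integer dot product: exact products, exact i64 accumulation.
/// Integer adds are associative, so lane order and CPU no longer matter.
/// (Overflow/saturation handling that a real kernel would need is omitted.)
fn dot_q16(a: &[Q16_16], b: &[Q16_16]) -> i64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x.0 as i64) * (y.0 as i64)) // exact products
        .sum::<i64>()
        >> FRAC_BITS // rescale back toward Q16.16
}

fn main() {
    let v1: Vec<Q16_16> = [0.25f32, -1.5, 0.0039].iter().map(|&x| Q16_16::from_f32(x)).collect();
    let v2: Vec<Q16_16> = [2.0f32, 0.5, -3.25].iter().map(|&x| Q16_16::from_f32(x)).collect();
    // Same bytes in, same bits out, on any architecture.
    println!("score = {}", dot_q16(&v1, &v2));
}
```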
Why do you assume that? In my experience, the "memory" is never stable. You seem to have higher expectations of reliability than would be reasonable.
If you have proven that unreliability, that proof is actually interesting. But it seems less like a bug and more like an observation of how things work.
If SQLite returned slightly different rows depending on whether the server was running an Intel or AMD chip, we wouldn't call that "an observation of how things work." We would call it data corruption.
We have normalized this "unreliability" in AI because we treat embeddings as fuzzy probabilistic magic. But at the storage layer, they are just numbers.
If I am building a search bar? Sure, 0.99 vs 0.98 doesn't matter.
But if I am building a decentralized consensus network where 100 nodes need to sign a state root, or a regulatory audit trail for a financial agent, "memory drift" isn't a quirk, it's a system failure.
My "proof" isn't just that it breaks; it's that it doesn't have to. I replaced the f32 math with a fixed-point kernel (Valori) and got bit-perfect stability across architectures.
Non-determinism is not a law of physics. It’s just a tradeoff we got lazy about.
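To make the consensus point concrete, a hedged sketch (the function names and the stand-in hash are mine, not Valori's): because deterministic fixed-point scores are plain integers, every node can serialize a query result canonically and hash the same bytes. If the low bits of f32 scores drifted per CPU, the digests would disagree and there would be no shared state root to sign.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-node digest of a query result. With fixed-point scores the
/// bytes are identical on every node, so the digests (and any signature over
/// them) agree. With raw f32 scores, low-bit drift breaks this.
fn result_digest(doc_ids: &[u64], scores_q16: &[i64]) -> u64 {
    // Canonical byte encoding: little-endian ids, then little-endian scores.
    let mut bytes = Vec::with_capacity((doc_ids.len() + scores_q16.len()) * 8);
    for id in doc_ids {
        bytes.extend_from_slice(&id.to_le_bytes());
    }
    for s in scores_q16 {
        bytes.extend_from_slice(&s.to_le_bytes());
    }
    // Stand-in hash for the sketch; a real consensus system would use a
    // cryptographic hash (e.g. SHA-256) and sign the digest.
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

fn main() {
    // Two "nodes" computing the same deterministic query result.
    let ids = [42u64, 7, 19];
    let scores = [655_360i64, 98_304, -32_768]; // Q16.16 integers
    let node_a = result_digest(&ids, &scores);
    let node_b = result_digest(&ids, &scores);
    assert_eq!(node_a, node_b); // bit-identical input gives an identical digest
    println!("state digest: {node_a:#x}");
}
```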