BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-Scale Pretraining | Not Hacker News!