FinePDFs: 3T token dataset made from internet PDFs | Not Hacker News!