Wrote an in-depth blog on scaling modern transformers with n-D parallelism | Not Hacker News!