From multi-head to latent attention: The evolution of attention mechanisms