Understanding RL for model training, and future directions with GRAPE | Not Hacker News!