Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult | Not Hacker News!