Show HN: New eval from SWE-bench team evaluates LMs based on goals, not tickets | Not Hacker News!