Measuring What Matters: Construct Validity in Large Language Model Benchmarks | Not Hacker News!