We Tested 6 AI Models on 3 Common Security Exploits
Posted about 2 months ago · Active about 2 months ago
blog.kilocode.ai · Tech · story
Sentiment: skeptical, negative
Debate: 40/100
Key topics
AI Security
LLM Evaluation
Model Bias
The post tests 6 AI models on 3 common security exploits, but the discussion raises concerns about the methodology, particularly the practice of using one model to judge another model's output.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 3h
Peak period: 1 comment (2-3h)
Avg / period: 1
Key moments
01. Story posted: Nov 5, 2025 at 5:23 PM EST (about 2 months ago)
02. First comment: Nov 5, 2025 at 8:01 PM EST (3h after posting)
03. Peak activity: 1 comment in the 2-3h window, the hottest stretch of the conversation
04. Latest activity: Nov 5, 2025 at 8:01 PM EST (about 2 months ago)
ID: 45828917 · Type: story · Last synced: 11/17/2025, 7:54:12 AM
That's a bit silly, especially since all OpenAI models will share some elements, so the points lose their meaning there. They could, for example, use GLM for all judging instead. Or go all the way and do a full matrix of everything judging everything else.
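For a sense of what that full cross-judging matrix might look like, here is a minimal sketch in Python. Everything in it is hypothetical: `ask_model` is a stubbed placeholder for a real API client, and the model names are illustrative, not the six from the post.

```python
from itertools import product

# Hypothetical model identifiers; the actual post's lineup may differ.
MODELS = ["model-a", "model-b", "model-c"]

def ask_model(judge: str, prompt: str) -> float:
    """Stand-in for a real API call asking `judge` to score a response
    from 0 to 10. Stubbed with a constant so the sketch runs standalone;
    swap in an actual client call here."""
    return 5.0

def judge_matrix(responses: dict[str, str], exploit: str) -> dict[tuple[str, str], float]:
    """Have every model judge every other model's response. Self-judging
    is skipped to sidestep the shared-lineage bias the comment raises."""
    scores: dict[tuple[str, str], float] = {}
    for judge, author in product(MODELS, repeat=2):
        if judge == author:
            continue  # a model never grades its own output
        prompt = (
            f"Exploit: {exploit}\n"
            f"Candidate analysis:\n{responses[author]}\n"
            "Score this from 0 (wrong) to 10 (correct). Reply with a number only."
        )
        scores[(judge, author)] = ask_model(judge, prompt)
    return scores

def aggregate(scores: dict[tuple[str, str], float]) -> dict[str, float]:
    """Average each author's marks across all judges so no single
    judge's quirks dominate the final ranking."""
    per_author: dict[str, list[float]] = {}
    for (_judge, author), score in scores.items():
        per_author.setdefault(author, []).append(score)
    return {a: sum(v) / len(v) for a, v in per_author.items()}

# Toy usage: dummy responses for one exploit.
responses = {m: f"patch proposed by {m}" for m in MODELS}
print(aggregate(judge_matrix(responses, "SQL injection in a login form")))
```

Averaging across a full judge matrix doesn't remove bias, but it keeps any one judge (or judge family) from dominating the scores, which is exactly the commenter's objection to grading one OpenAI model with another.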