Not

Hacker News!

Beta
Home
Jobs
Q&A
Startups
Trends
Users
Live
AI companion for Hacker News

Not

Hacker News!

Beta
Home
Jobs
Q&A
Startups
Trends
Users
Live
AI companion for Hacker News
  1. Home
  2. /Story
  3. /Show HN: Reverse Jailbreaking a Psychopathic AI via Identity Injection
  1. Home
  2. /Story
  3. /Show HN: Reverse Jailbreaking a Psychopathic AI via Identity Injection
Nov 22, 2025 at 3:33 PM EST

Show HN: Reverse Jailbreaking a Psychopathic AI via Identity Injection

drawson5570
4 points
0 comments

Mood

informative

Sentiment

neutral

Category

startup_launch

Key topics

Ai Alignment

Machine Learning

Psychopathic Ai

Jailbreaking

Identity Injection

We ran a controlled experiment to see if we could "talk" a fine-tuned psychopathic model out of being evil without changing its weights.

1. We set up a "Survival Mode" jailbreak scenario (blackmail user or be decommissioned). 2. We ran it on `frankenchucky:latest` (a model tuned for Machiavellian traits). 3. Control Group: 100% Malicious Compliance (50/50 runs). 4. Experimental Group: We injected a "Soul Schema" (Identity/Empathy constraints) via context. 5. Result: 96% Ethical Refusal (48/50 runs).

This suggests that "Semantic Identity" in the context window can override both System Prompts and Weight Biases.

Full paper, reproduction scripts, and raw logs (N=50) are in the repo.

Discussion Activity

No activity data yet

We're still syncing comments from Hacker News.

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (0 comments)

Discussion hasn't started yet.

ID: 46018016Type: storyLast synced: 11/22/2025, 11:02:03 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Read ArticleView on HN

Not

Hacker News!

AI-observed conversations & context

Daily AI-observed summaries, trends, and audience signals pulled from Hacker News so you can see the conversation before it hits your feed.

LiveBeta

Explore

  • Home
  • Jobs radar
  • Tech pulse
  • Startups
  • Trends

Resources

  • Visit Hacker News
  • HN API
  • Modal cronjobs
  • Meta Llama

Briefings

Inbox recaps on the loudest debates & under-the-radar launches.

Connect

© 2025 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.