Visual Features Across Modalities: SVG and ASCII Art Cross-Modal Understanding
Posted 2 months ago | Active about 2 months ago
transformer-circuits.pub | Research | story
Key topics
AI Research
Multimodal Learning
Computer Vision
Researchers demonstrate a cross-modal understanding between SVG and ASCII art using transformer circuits, sparking interest in the HN community for its potential applications in AI research.
Snapshot generated from the HN discussion
Key moments
- Story posted: Oct 25, 2025 at 7:25 AM EDT
- First comment: Nov 4, 2025 at 4:07 AM EST (10 days after posting)
- Peak activity: 1 comment on day 10
- Latest activity: Nov 4, 2025 at 4:07 AM EST
ID: 45702993 | Type: story | Last synced: 11/20/2025, 3:10:53 PM
I'm much more interested in stuff like Venn diagrams and bipartite graphs than pictures of cats or pelicans riding bikes. It's similar to a code-generation problem in that the output is a new artifact that's one step away from the problem presentation, but it has some advantages: it's simpler than code, it's less likely to have exact-match training data, it usually has one correct answer, and it's easy to check. Try making Venn diagrams on a few circles with "exactly and only the following intersections" and gradually elaborating the spec.
This is a great way to get starter diagram boilerplate if that's what you're looking for. One-shot prompts for simple things are OK, sometimes. But it always completely falls apart when you try to iterate with small modifications: it introduces errors in parts that were previously correct, or ignores the requested changes. Maybe it's wrong to conclude anything from that, but to me this looks bad for the "they can reason!" argument and very bad for trusting complicated work in other domains that are harder to check. Haven't read TFA yet, but whether it confirms or denies my gut feeling here, hopefully it's going to add some perspective.
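To make the "easy to check" point concrete, here is a rough sketch of how an "exactly and only the following intersections" spec could be verified mechanically. Everything in it is an assumption for illustration: circles are given as hypothetical `(cx, cy, r)` tuples (as you might pull out of SVG `<circle>` elements), the names `occupied_regions` and `check_spec` are made up, and the grid sampling is a crude approximation that can miss regions thinner than `step`.

```python
from itertools import product


def occupied_regions(circles, step=0.05):
    """Sample a grid and return the non-empty regions found.

    `circles` maps a name to a (cx, cy, r) tuple. Each region is a frozenset
    of the circle names that contain some sampled point. Approximate: regions
    thinner than `step` can be missed.
    """
    x_min = min(cx - r for cx, _, r in circles.values())
    x_max = max(cx + r for cx, _, r in circles.values())
    y_min = min(cy - r for _, cy, r in circles.values())
    y_max = max(cy + r for _, cy, r in circles.values())
    nx = int((x_max - x_min) / step) + 1
    ny = int((y_max - y_min) / step) + 1

    regions = set()
    for i, j in product(range(nx), range(ny)):
        x, y = x_min + i * step, y_min + j * step
        inside = frozenset(
            name
            for name, (cx, cy, r) in circles.items()
            if (x - cx) ** 2 + (y - cy) ** 2 < r * r
        )
        if inside:  # ignore points outside every circle
            regions.add(inside)
    return regions


def check_spec(circles, required, step=0.05):
    """Compare the sampled regions with an 'exactly and only' spec."""
    found = occupied_regions(circles, step)
    want = {frozenset(region) for region in required}
    return {"missing": want - found, "unexpected": found - want}


# Hypothetical diagram: A and B overlap, C stands alone.
circles = {"A": (0.0, 0.0, 1.0), "B": (1.5, 0.0, 1.0), "C": (5.0, 0.0, 1.0)}
spec = [{"A"}, {"B"}, {"A", "B"}, {"C"}]
print(check_spec(circles, spec))
```

Two empty sets mean the diagram matches the spec. Re-running the same check after each requested modification is one way to catch the regressions described above, where a previously correct part quietly breaks.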