Visual Features Across Modalities: SVG and ASCII Art Cross-Modal Understanding

Posted 2 months ago · Active about 2 months ago

vismit2000

12 points

1 comment

transformer-circuits.pub · Research · story


Key topics: AI Research, Multimodal Learning, Computer Vision

Researchers examine cross-modal understanding of SVG and ASCII art in a language model, in a post published on transformer-circuits.pub. On HN the story drew a single comment, which focused on using diagram generation to test model reasoning.

Snapshot generated from the HN discussion

Discussion activity: light (peak on day 10)

Key moments

Story posted: Oct 25, 2025 at 7:25 AM EDT
First comment: Nov 4, 2025 at 4:07 AM EST, 10 days after posting
Peak activity: 1 comment on day 10
Latest activity: Nov 4, 2025 at 4:07 AM EST


Discussion (1 comment)

robot-wrangler

about 2 months ago

Generating and displaying diagrams in Mermaid, SVG, or CSS has become one of my go-to tests for reasoning. This seems fair because while SVG is admittedly syntactically difficult and maybe not emphasized in training, CSS is certainly a popular output target, and Mermaid is very simple. It seems like SOTA models should be able to draw and modify things that they "understand".

I'm much more interested in stuff like Venn diagrams and bipartite graphs than in pictures of cats or pelicans riding bikes. It's similar to a code-generation problem in that the output is a new artifact one step removed from the problem statement, but it has advantages: it's simpler than code, is less likely to have exact-match training data, usually has one correct answer, and is easy to check. Try making Venn diagrams with a few circles showing "exactly and only the following intersections", then gradually elaborate the spec.

This is a great way to get starter diagram boilerplate if that's what you're looking for. One-shot prompts for simple things are OK, sometimes. But it always falls apart completely when you try to iterate with small modifications: it introduces errors in parts that were previously correct, or ignores requested changes. Maybe it's wrong to conclude anything from that, but to me this looks bad for the "they can reason!" argument and very bad for trusting complicated work in other domains that are harder to check. Haven't read TFA yet, but whether it confirms or denies my gut here, hopefully it will add some perspective.
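The Venn-diagram test described in the comment can be scored mechanically, which is part of what makes it "easy to check". Below is a minimal Python sketch, not taken from the article or the commenter, under loudly stated assumptions: the model's SVG draws each set as a `<circle>` whose `id` names the set, with no transforms applied; the helper names `circles_from_svg`, `regions_present`, and `check_spec` are hypothetical; and regions are found by sampling a grid, so slivers thinner than the `step` spacing can be missed.

```python
import xml.etree.ElementTree as ET


def circles_from_svg(svg_text):
    """Extract (name, cx, cy, r) for every <circle> in an SVG string.

    Hypothetical convention: each circle's id attribute names its set,
    and no transforms are applied to the circles.
    """
    root = ET.fromstring(svg_text)
    circles = []
    for el in root.iter():
        # Strip any "{namespace}" prefix so namespaced and plain tags both match.
        if el.tag.split("}")[-1] == "circle":
            circles.append((el.get("id"),
                            float(el.get("cx", "0")),
                            float(el.get("cy", "0")),
                            float(el.get("r"))))
    return circles


def regions_present(circles, step=0.05):
    """Approximate the set of non-empty Venn regions by grid sampling.

    Each region is a frozenset of the circle names covering a sample point.
    Regions thinner than `step` may be missed.
    """
    xmin = min(cx - r for _, cx, _, r in circles)
    xmax = max(cx + r for _, cx, _, r in circles)
    ymin = min(cy - r for _, _, cy, r in circles)
    ymax = max(cy + r for _, _, cy, r in circles)
    found = set()
    for i in range(int((xmax - xmin) / step) + 1):
        x = xmin + i * step
        for j in range(int((ymax - ymin) / step) + 1):
            y = ymin + j * step
            inside = frozenset(name for name, cx, cy, r in circles
                               if (x - cx) ** 2 + (y - cy) ** 2 < r * r)
            if inside:  # skip points that lie outside every circle
                found.add(inside)
    return found


def check_spec(circles, required):
    """Compare drawn regions with a spec of exactly and only the allowed regions.

    Returns (missing, extra): required regions absent from the drawing,
    and drawn regions the spec does not allow.
    """
    want = {frozenset(region) for region in required}
    got = regions_present(circles)
    return want - got, got - want


# Usage: A and B overlap, C stands alone.
svg = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<circle id="A" cx="0" cy="0" r="1"/>'
       '<circle id="B" cx="1.5" cy="0" r="1"/>'
       '<circle id="C" cx="5" cy="0" r="1"/></svg>')
circles = circles_from_svg(svg)
missing, extra = check_spec(circles, [{"A"}, {"B"}, {"C"}, {"A", "B"}])
print(missing, extra)  # set() set()
```

Both sets coming back empty means the drawing has exactly and only the listed regions. Re-running the same check after each small modification would also flag the regressions the comment describes, where parts that were previously correct break.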


ID: 45702993 · Type: story · Last synced: 11/20/2025, 3:10:53 PM
