Experiment: Grid-Indexed Localization with LLMs for Bounding Boxes
Posted 4 months ago · github.com · Research story
Key topics
Large Language Models
DETRs
Localization
Key moments
- Story posted: Aug 26, 2025 at 5:36 PM EDT (4 months ago)
- First comment: Aug 26, 2025 at 5:36 PM EDT (0s after posting)
- Peak activity: 1 comment, in the opening window of the conversation
- Latest activity: Aug 26, 2025 at 5:36 PM EDT (4 months ago)
As a workaround, I tried a different approach: instead of asking the model for raw pixel coordinates, I divide the image into a grid and let the LLM reason in terms of grid cells (e.g. row/column indices). These grid indices are then mapped back into pixel coordinates.
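To make the mapping step concrete, here is a minimal sketch in JavaScript of how grid indices could be converted back to a pixel-space box. This is not the repo's actual code; the function name, the field names (rowStart/colStart/rowEnd/colEnd), and the 10x10 grid in the example are assumptions made for illustration.

```js
// Sketch: convert inclusive grid-cell indices (as returned by the model)
// back into a pixel-space bounding box for an image of known size.
function gridCellsToPixelBox(cells, imageWidth, imageHeight, gridRows, gridCols) {
  const cellW = imageWidth / gridCols;   // width of one grid cell in pixels
  const cellH = imageHeight / gridRows;  // height of one grid cell in pixels
  return {
    // top-left corner of the first cell in the span
    x: cells.colStart * cellW,
    y: cells.rowStart * cellH,
    // span extends to the bottom-right corner of the last cell (indices are inclusive)
    width: (cells.colEnd - cells.colStart + 1) * cellW,
    height: (cells.rowEnd - cells.rowStart + 1) * cellH,
  };
}

// Example: a 10x10 grid over a 1920x1080 image; the model says the object
// spans rows 2-4 and columns 5-7.
const box = gridCellsToPixelBox(
  { rowStart: 2, colStart: 5, rowEnd: 4, colEnd: 7 },
  1920, 1080, 10, 10
);
// box => { x: 960, y: 216, width: 576, height: 324 }
```

The coarseness of the grid sets the localization resolution, so there is a trade-off: finer grids give tighter boxes but put more indices back in play for the model to get wrong.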
This “grid-indexed” method doesn’t solve everything, but it seems to reduce randomness and make outputs more stable across providers (OpenAI, Anthropic, Gemini, etc.). It’s lightweight: just a single JS file plus an example HTML demo.
Code and README are here: https://github.com/IntelligenzaArtificiale/GILM-Grid-Indexed...
I’d be curious if others have tried similar approaches, or if anyone has ideas on how to improve robustness of bounding box detection with LLMs.