Do Not Write Docs Manually, Let Video2docs Do It for You
Posted 3 months ago · Active 3 months ago
video2docs.com · Tech · story
Sentiment: supportive, positive
Debate: 20/100
Key topics
Documentation
Automation
Video Processing
The post promotes a tool called video2docs that automatically generates documentation from videos, with the community showing interest and some discussion around its potential applications and limitations.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: N/A
Peak period: 1 comment in 0-2h
Avg / period: 1
Key moments
01. Story posted: Oct 24, 2025 at 3:59 PM EDT (3 months ago)
02. First comment: Oct 24, 2025 at 3:59 PM EDT (0s after posting)
03. Peak activity: 1 comment in 0-2h (hottest window of the conversation)
04. Latest activity: Oct 25, 2025 at 1:22 PM EDT (3 months ago)
Story ID: 45698504 · Type: story · Last synced: 11/17/2025, 9:14:05 AM
The idea is quite simple: recently I've had to write more and more docs (most often how-tos and guides for company systems) and got a bit tired of it. I decided it would be cool if I could just record a video of myself clicking through an app (or multiple apps, it doesn't matter) and then analyze the video content, even without audio narration. That's how video2docs was born! I plan to add audio analysis too, for even better documentation, but for now I'm happy with how it works without it.
You can choose from 10 LLM models for video analysis, pick a documentation style (tutorial, how-to, quickstart...), and, of course, choose whether to include screenshots in the generated Markdown docs. Yay, no need to take screenshots manually! :)
I hope someone else might find this useful. I will continue working on this project!
Is there anything you can share about the architecture or pipeline you used for it? A high-level overview would be enough.
I’m guessing you’re doing video-to-image, image-to-text, and then text-to-docs, right? Since not all of the models you mentioned are multimodal.
More or less. I have a Python worker that does the video processing: splitting the video into frames, deduplicating frames, running LLM analysis on the frames, and then generating docs from that information. Audio narration analysis will be added soon too!
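As a rough illustration of the deduplication step mentioned above, here's a minimal sketch that keeps a frame only when it differs enough from the last kept frame, using mean absolute pixel difference. This is an assumption about how such a step could work, not the project's actual implementation; `dedupe_frames` and the threshold value are hypothetical, and a real worker would likely decode frames with a library such as OpenCV or ffmpeg rather than use plain lists.

```python
def dedupe_frames(frames, threshold=0.02):
    """Drop frames that are nearly identical to the previously kept frame.

    `frames` is a list of grayscale frames, each a flat list of pixel
    intensities (0-255). A frame is kept when the mean absolute pixel
    difference from the last kept frame exceeds `threshold`, expressed
    as a fraction of the full 0-255 range.
    """
    kept = []
    last = None
    for frame in frames:
        if last is None:
            kept.append(frame)
            last = frame
            continue
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / (255 * len(frame))
        if diff > threshold:
            kept.append(frame)
            last = frame
    return kept

# Toy example: three 4-pixel "frames"; the middle one is a
# near-duplicate of the first and gets dropped.
frames = [
    [10, 10, 10, 10],       # scene A
    [10, 10, 11, 10],       # scene A again (tiny change)
    [200, 200, 200, 200],   # scene B (big change)
]
print(len(dedupe_frames(frames)))  # → 2
```

Only the surviving frames would then be sent to the LLM for analysis, which keeps API costs down for long recordings where the screen barely changes between frames.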