I Built a WhatsApp AI Assistant That Processes Images, Voice Notes, and PDFs

Posted about 2 months ago

elizabeth1212

1 point

1 comment

github.com · Tech story

AI · WhatsApp · Multimodal Processing

The author built a WhatsApp AI assistant that handles voice notes, images, PDFs, and regular text messages in a single conversation thread.

Snapshot generated from the HN discussion

Story posted: Nov 18, 2025 at 8:02 PM EST
First comment: Nov 18, 2025 at 8:02 PM EST (0s after posting)

Discussion (1 comment)

elizabeth1212 (Author)

about 2 months ago

I built this because I wanted a personal AI assistant that works where I already chat - WhatsApp. It handles:

- Voice notes → transcription + AI response
- Images → vision analysis + answers
- PDFs → extracts text + answers questions
- Regular text messages
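One way the per-type handling listed above could be organised is a small dispatch table keyed by message kind. This is only a sketch: the `IncomingMessage` type, the kind labels, and the placeholder handlers are assumptions made for illustration, not code from the project.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class IncomingMessage:
    kind: str     # hypothetical labels: "text", "audio", "image", "document"
    payload: str  # text body, or a reference to the downloaded media file


# Placeholder handlers: a real bot would call a transcription service,
# a vision model, or a PDF text extractor here.
def handle_text(msg: IncomingMessage) -> str:
    return msg.payload


def handle_audio(msg: IncomingMessage) -> str:
    return f"[transcript of {msg.payload}]"


def handle_image(msg: IncomingMessage) -> str:
    return f"[description of {msg.payload}]"


def handle_document(msg: IncomingMessage) -> str:
    return f"[text extracted from {msg.payload}]"


HANDLERS: Dict[str, Callable[[IncomingMessage], str]] = {
    "text": handle_text,
    "audio": handle_audio,
    "image": handle_image,
    "document": handle_document,
}


def to_prompt(msg: IncomingMessage) -> str:
    """Turn any supported message into plain text for the LLM."""
    handler = HANDLERS.get(msg.kind)
    if handler is None:
        raise ValueError(f"unsupported message kind: {msg.kind}")
    return handler(msg)
```

With this shape, supporting a new media type means adding one handler and one table entry, and everything downstream only ever sees text.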

The interesting parts:

- Multi-modal handling in one conversation thread
- Session management across message types
- Conversation history without a database (uses conversation context)
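The post does not say how history is kept without a database. One plausible reading is an in-memory store keyed by sender that holds only the most recent turns. The sketch below assumes that reading; `SessionStore`, `MAX_TURNS`, and the message dictionary shape are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List

MAX_TURNS = 20  # assumed cap so the context sent to the LLM stays bounded


class SessionStore:
    """Keeps recent messages per sender in memory (lost on restart)."""

    def __init__(self, max_turns: int = MAX_TURNS) -> None:
        # Each sender gets a bounded deque; old turns fall off the front.
        self._history: Dict[str, Deque[Dict[str, str]]] = defaultdict(
            lambda: deque(maxlen=max_turns)
        )

    def append(self, sender: str, role: str, content: str) -> None:
        self._history[sender].append({"role": role, "content": content})

    def context(self, sender: str) -> List[Dict[str, str]]:
        """Return the sender's recent turns, oldest first."""
        return list(self._history[sender])
```

Because voice notes, images, and PDFs would all be reduced to text before being appended, one history can span every message type in the thread. The trade-off of this approach is that history does not survive a process restart.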

The LLM integration is abstracted so you can plug in whatever provider you want.
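A provider abstraction like the one described could be expressed as a small interface that each backend implements. The sketch below is an assumption about the shape, not the project's actual code: `LLMProvider`, `complete`, and `EchoProvider` are hypothetical names, and the echo backend stands in for a real API client.

```python
from typing import Dict, List, Protocol


class LLMProvider(Protocol):
    """Anything with a matching complete() method counts as a provider."""

    def complete(self, messages: List[Dict[str, str]]) -> str:
        ...


class EchoProvider:
    """Stand-in backend for local testing; a real one would call an LLM API."""

    def complete(self, messages: List[Dict[str, str]]) -> str:
        return "echo: " + messages[-1]["content"]


def reply(provider: LLMProvider, messages: List[Dict[str, str]]) -> str:
    # The bot depends only on the interface, never on a concrete vendor.
    return provider.complete(messages)
```

Swapping providers then amounts to passing a different object to `reply`; the rest of the message-handling code does not change.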

ID: 45974558 · Type: story · Last synced: 11/19/2025, 1:05:43 AM
