Not Hacker News!

Home | Hiring | Products | Companies | Discussion | Q&A | Users
Posted Sep 1, 2025 at 12:40 PM EDT · Last activity 3 months ago

Lessons From Building an AI Data Analyst

by pedromnasc · 72 points · 12 comments

Mood: calm
Sentiment: positive
Category: other
Key topics: AI, Data Analysis, Data Science
Debate intensity: 20/100

The author shares lessons from building an AI data analyst, highlighting the challenges of moving from a demo to a production system. The discussion revolves around the complexities of data analysis and the role AI can play in it.

Snapshot generated from the HN discussion

Discussion Activity

Moderate engagement

First comment: N/A
Peak period: 8 comments (Day 3)
Avg / period: 6

[Comment distribution chart: 12 data points, based on 12 loaded comments]

Key moments

  1. Story posted: Sep 1, 2025 at 12:40 PM EDT (3 months ago)
  2. First comment: Sep 1, 2025 at 12:40 PM EDT (0s after posting)
  3. Peak activity: 8 comments in Day 3, the hottest window of the conversation
  4. Latest activity: Sep 4, 2025 at 2:55 AM EDT (3 months ago)


Discussion (12 comments)
attogram
3 months ago
1 reply
Great TL;DR section. Context is indeed the product.
blef
3 months ago
1 reply
I would also add that context and tools are the product. It is super important to tune the tools correctly, and the details matter (though you can argue that tools are context in some way).
pedromnasc (Author)
3 months ago
Exactly. The main thing is that it is easy to underestimate the impact of context and proper tools. Narrowing the search space by adding inductive biases into the system makes the multi-agent system not only more correct but also faster.
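
To make that point concrete, here is a minimal, hypothetical sketch of one way a tool can narrow an agent's search space; the metric and dimension names are invented for the example, and this is an illustration, not the commenter's actual system:

```python
# Hypothetical illustration: expose a constrained query tool to the agent
# instead of free-form SQL, so its search space is a small, vetted grid.
ALLOWED_METRICS = {"revenue": "SUM(amount)", "orders": "COUNT(*)"}
ALLOWED_DIMENSIONS = {"month", "region", "product"}

def run_metric_query(metric: str, dimension: str) -> str:
    """Tool surface for the agent; anything outside the vetted schema fails fast."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {metric}")
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # The agent composes vetted names rather than arbitrary SQL strings;
    # that restriction is the "inductive bias" that buys correctness and speed.
    return f"SELECT {dimension}, {ALLOWED_METRICS[metric]} FROM sales GROUP BY {dimension}"

print(run_metric_query("revenue", "region"))
```
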
loganfrederick
3 months ago
1 reply
The "Short Story" section definitely matches my experience at most companies, startups and bigger non-tech companies alike: They already have more data than they're aware of and know what to do with, and understanding what they have is the starting point before most analysis can be done.

Glad I read the post, as I hadn't heard of Malloy before. Excuse me if I missed the answer to this, but: how much of the Semantic Layer do you as Findly/Conversion Pattern implement on behalf of your users (and if so, I assume you have some process for auto-generating the Malloy models), or do your users have to input the semantics themselves?

pedromnasc (Author)
3 months ago
> The "Short Story" section definitely matches my experience at most companies, startups and bigger non-tech companies alike: They already have more data than they're aware of and know what to do with, and understanding what they have is the starting point before most analysis can be done.

Exactly. Most of them are concerned about the data they don't have, while in practice they already have plenty from which to generate good insights.

> Glad I read the post as I hadn't heard of Malloy before. Excuse me if I missed the answer to this, but: How much do you as Findly/Conversion Pattern implement the Semantic Layer on behalf of your users (and if so, I assume you have some process for auto-generating the Malloy models), or do your users have to do something to input the semantics themselves?

We do have an automatic semantic-layer generation framework, which works as a great starting point, but for the generic case you still have to manually edit and improve it based on the customer's internal context. Users can edit it themselves in our UI too, but it usually requires some level of help from us.

We also have a vertical product for commodity trading and shipping: https://www.darlinganalytics.ai/. In that case the semantic layer is much better defined, which makes setup way easier.
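
As a rough illustration of what such an auto-generated starting point could look like, here is a hedged sketch that drafts dimensions and measures from column types. The heuristics and names are invented for the example, not Findly's actual framework:

```python
# Hypothetical sketch: draft a semantic-layer starting point from column
# metadata using a crude type heuristic; humans then edit and improve it.
RAW_COLUMNS = {"orders": [("amount", "numeric"), ("region", "text"), ("created_at", "date")]}

def draft_semantic_layer(columns: dict) -> dict:
    layer = {}
    for table, cols in columns.items():
        dims, measures = [], {}
        for name, col_type in cols:
            if col_type == "numeric":
                measures[f"total_{name}"] = f"sum({name})"  # guess: numeric column => measure
            else:
                dims.append(name)  # guess: everything else => dimension
        layer[table] = {"dimensions": dims, "measures": measures}
    return layer

# A draft like this is only a starting point; customer-specific context still
# has to be edited in by hand, as described above.
print(draft_semantic_layer(RAW_COLUMNS))
```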

mrtimo
3 months ago
1 reply
Very cool to see Malloy mentioned here. Great stuff. There is an MCP server built into Malloy Publisher [1]. Perhaps useful to the author or others trying to do something similar to what the author describes. Directions on how to use the MCP server are here [2].

[1] https://github.com/malloydata/publisher
[2] https://github.com/malloydata/publisher/blob/main/docs/ai-ag...
pedromnasc (Author)
3 months ago
One big problem right now is that LLMs are not great at writing Malloy, so it is important to have an intermediate DSL. In the future, as language models evolve or someone fine-tunes a model that can write Malloy well, we will be able to build more autonomous agents.
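
To illustrate the intermediate-DSL idea, here is a minimal, hypothetical sketch: the model emits a small validated structure, and deterministic code compiles it to the query language. The schema, field names, and the emitted Malloy-style syntax are all invented for the example:

```python
# Hypothetical sketch: the LLM emits a tiny constrained DSL (a dict) that is
# validated against the semantic layer, then compiled to a Malloy-style query.
SCHEMA = {"orders": {"dimensions": {"region", "month"},
                     "measures": {"revenue": "sum(amount)"}}}

def compile_query(q: dict) -> str:
    table = SCHEMA[q["source"]]  # KeyError on an unknown source
    if q["group_by"] not in table["dimensions"]:
        raise ValueError(f"unknown dimension: {q['group_by']}")
    measure_expr = table["measures"][q["measure"]]  # KeyError on an unknown measure
    # The emitted syntax is illustrative Malloy, not guaranteed to be canonical.
    return (f"run: {q['source']} -> {{\n"
            f"  group_by: {q['group_by']}\n"
            f"  aggregate: {q['measure']} is {measure_expr}\n"
            f"}}")

print(compile_query({"source": "orders", "group_by": "region", "measure": "revenue"}))
```
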
pil0u
3 months ago
1 reply
Maybe I haven't found the right tool yet, but every time I see a product trying to solve data analysis with AI, the first example often deals with aggregated revenue over time or something similar.

This can be solved by a student after 3 days of learning SQL from scratch.

The article, while technical, remains pretty vague about the implementation and about what real business problem they managed to solve with such a framework.

Of course building on top of a semantic layer is good for LLMs, but that assumes (1) the semantic layer exists and (2) it is not a freaking mess. While tools like dbt have helped with (1), I have yet to see a clean, well-documented, lineage-perfect semantic layer.

djoldman
3 months ago
1 reply
Agreed.

Data curation is barely on the radar of most non-tech industries. Even in tech, it's rare to have any metadata.

This is a huge blocker to many efforts.

Someone somewhere has to go through every table and field and document where it came from, when, and what it actually means.

Very very few places do this.

"Oh yeah I go to gold.inventory. I think it updates every night. Columns? Should be pretty intuitive, just look at the names."

mrklol
3 months ago
But if there is documentation, an LLM could help and find things faster than a "new" human who isn't familiar with all the tables. I would rather see it as a helper, maybe even more so with good architecture and docs.
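
A minimal, hypothetical sketch of that helper idea, using plain keyword overlap in place of a real LLM or embeddings (the table names and docs are invented):

```python
# Hypothetical sketch: rank documented tables against a user question by
# keyword overlap; a real system would use embeddings or an LLM instead.
DOCS = {
    "gold.inventory": "nightly warehouse inventory snapshot by sku",
    "gold.orders": "customer orders with amount, region and order date",
    "gold.customers": "customer master data with signup date and segment",
}

def rank_tables(question: str) -> list[tuple[str, int]]:
    """Score each table by how many question words appear in its docs."""
    words = set(question.lower().split())
    scores = {t: len(words & set(d.split())) for t, d in DOCS.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank_tables("which region had the most orders"))  # gold.orders ranks first
```
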
pedromnasc (Author)
3 months ago
Hi all,

I wrote a post on some lessons from building an AI data analyst. The gap from a nice demo to a real production system is big, with a lot of yet-to-be-solved challenges.

Would love to share ideas with other builders in the space and to learn more about it.

PeterStuer
3 months ago
I gradually came to the conclusion that RAG is just a new term for ye old Knowledge Management techniques.
View full discussion on Hacker News
ID: 45094256 · Type: story · Last synced: 11/20/2025, 4:35:27 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Read Article | View on HN
Not Hacker News!
AI-observed conversations & context

Daily AI-observed summaries, trends, and audience signals pulled from Hacker News so you can see the conversation before it hits your feed.


Explore

  • Home
  • Hiring
  • Products
  • Companies
  • Discussion
  • Q&A

Resources

  • Visit Hacker News
  • HN API
  • Modal cronjobs
  • Meta Llama

Briefings

Inbox recaps on the loudest debates & under-the-radar launches.

Connect

© 2025 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.