Claude for Excel
Mood
heated
Sentiment
mixed
Category
other
Key topics
Anthropic's 'Claude for Excel' aims to integrate AI into Excel for tasks like debugging and formula creation, sparking debate among commenters about its potential benefits and risks.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 23m after posting
Peak period: 154 comments (Day 1)
Avg / period: 32 comments
Based on 160 loaded comments
Key moments
- Story posted: Oct 27, 2025 at 12:09 PM EDT (about 1 month ago)
- First comment: Oct 27, 2025 at 12:32 PM EDT (23m after posting)
- Peak activity: 154 comments in Day 1 (hottest window of the conversation)
- Latest activity: Nov 1, 2025 at 1:03 PM EDT (25 days ago)
LLMs are not deterministic.
I'd argue over the short term humans are more deterministic. I ask a human the same question multiple times and I get the same answer. I ask an LLM and each answer could be very different depending on its "temperature".
But I agree with the sentiment. It seems it is more important than ever to agree on what it means to understand something.
Only a matter of time before someone does it though.
AFAIK there is no 'git for Excel to diff and undo', especially not built-in (aka 'for free' both cost-wise and add-ons/macros not allowed security-wise).
My limited experience has been that it is difficult to keep LLMs from changing random things besides what they're asked to change, which could cause big problems if it goes undetected in Excel.
Unlike code, where everything is on display, all these formulas are hidden in their cells: you won't see the problem unless you click on the cell, so you'll have a hard time finding the cause.
Little stuff like splitting text more intelligently or following the formatting seen elsewhere would be very satisfying.
Same deal there -- the original author was a genius and was the only person who knew how it was set up or how it worked.
What I’m saying is that if you really believed we were 2, maybe 3 years tops from AGI or the singularity or whatever, you would spend 0 effort serving a domain that is already served by 3rd parties that are already using your models! An Excel wrapper for an LLM isn’t exactly cutting-edge AI research.
They’re desperate to find something that someone will pay a meaningful amount of money for that even remotely justifies their valuation and continued investment.
Being able to select a few rows and then use plain language to describe what I want done is a time saver, even though I could probably muddle through the formulas if I needed to.
- stop using the free plan
- don't use Gemini Flash for these tasks
- learn how to do things over time, and know that all AI models have improved significantly every few months
What's the month-over-month improvement if the current state is "creates entirely fake data that looks convincing"? As a user it's hard to tell when we've hit the point of this being a useful feature. The old-timey metric would normally be that when a company rolls out a new feature it's usually mostly functional; that doesn't appear to be the case here at all, so what's the sign?
It is an entire agent loop. You can ask it to build a multi sheet analysis of your favorite stock and it will. We are seeing a lot of early adopters use it for financial modeling, research automation, and internal reporting tasks that used to take hours.
To see something much more powerful on Google Sheets than Gemini for free, you can add "try@tabtabtab.ai" to your sheet, and make a comment tagging "try@tabtabtab.ai" and see it in action.
If that is too much just go to ttt.new!
I would’ve expected “make a vlookup or pivot table that tells me x” or “make this data look good for a slide deck” to be easier problems to solve.
For easy spreadsheet stuff (which is what 80% of average white-collar workers are doing when using Excel) I’d imagine the same approach. Try to do what I want, and even if you’re half wrong, the good 50% is still worth it and a better starting point.
Vibe coding an app is like vibe coding a “model in excel”. Sure you could try, but most people just need to vibe code a pivot table
Thousands of unreported COVID cases: https://news.ycombinator.com/item?id=24689247
Thousands of errors in genetics research papers: https://news.ycombinator.com/item?id=41540950
Wrong winner announced in national election: https://news.ycombinator.com/item?id=36197280
Countries across the world implement counter-productive economic austerity programs: https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt#Metho...
https://docs.claude.com/en/docs/about-claude/models/overview
Spend a few years in an insurance company, a manufacturing plant, or a hospital, and then the assertion that the frontier labs will figure it out appears patently absurd. (After all, it takes humans years to understand just a part of these institutions, and they have good-functioning memory.)
This belief that tier 5 is useless is itself a tell of a vulnerability: the LLMs are advancing fastest in domain-expertise-free generalized technical knowledge; if you have no domain expertise outside of tech, you are most vulnerable to their march of capability, and it is those with domain expertise who will rely increasingly less on those who have nothing to offer but generalized technical knowledge.
That OpenAI is now apparently striving to become the next big app-layer company could hint at George Hotz being right, but only if the bets work out. I‘m glad that there is competition at the frontier-lab tier.
I don’t think the frontier labs have the bandwidth or domain knowledge (or dare I say skills) to do tier 5 tasks well. Even their chat UIs leave a lot to be desired and that should be their core competency.
However I would think more of elite data centers rather than commodity data centers. That's because I see Tier 4 being deeply involved in their data centers and thinking of buying the chips to feed their data centers. I wouldn't be so inclined to throw in my opinion immediately if I found an article showing this ordering of the tiers, but being a tweet of a podcast it might have just been a rough draft.
My wife works in insurance operations - everyone she manages from the top down lives in Excel. For line employees a large percentage of their job is something like "Look at this internal system, export the data to excel, combine it with some other internal system, do some basic interpretation, verify it, make a recommendation". Computer Use + Excel Use isn't there yet...but these jobs are going to be the first on the chopping block as these integrations mature. No offense to these people but Sonnet 4.5 is already at the level where it would be able to replicate or beat the level of analysis they typically provide.
It's one thing to fudge the language in a report summary, it can be subjective, however numbers are not subjective. It's widely known LLMs are terrible at even basic maths.
Even Google's own AI summary admits it which I was surprised at, marketing won't be happy.
Yes, it is true that LLMs are often bad at math because they don't "understand" it as a logical system but rather process it as text, relying on pattern recognition from their training data.
- Log in to the internal system that handles customer policies
- Find all policies that were bound in the last 30 days
- Log in to the internal system that manages customer payments
- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.
- Flag any divergences above X% for accounting/finance to follow up on.
Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.
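The reconciliation the commenter describes can be sketched in plain Python; the field names and the 5% divergence threshold here are illustrative assumptions, not anything from an actual insurance system:

```python
# Sketch of the policy/payment reconciliation described above.
# Record shapes, field names, and the 5% tolerance are assumptions.

def reconcile(policies, payments, tolerance=0.05):
    """Flag policies whose payment diverges from the premium by more than `tolerance`."""
    paid = {p["policy_id"]: p["amount"] for p in payments}
    flagged = []
    for pol in policies:
        amount = paid.get(pol["policy_id"])
        if amount is None:
            flagged.append((pol["policy_id"], "no payment found"))
            continue
        divergence = abs(amount - pol["premium"]) / pol["premium"]
        if divergence > tolerance:
            flagged.append((pol["policy_id"], f"diverges {divergence:.1%}"))
    return flagged

policies = [{"policy_id": "A1", "premium": 1000.0},
            {"policy_id": "B2", "premium": 500.0}]
payments = [{"policy_id": "A1", "amount": 990.0}]  # B2 was never paid

print(reconcile(policies, payments))  # → [('B2', 'no payment found')]
```

This is essentially what the XLOOKUP-plus-IF-formula workflow computes; the point is that the matching logic is small and checkable even though assembling it by hand in Excel takes real time.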
Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.
The one thing LLMs should consistently do is ensure that formatting is correct, which will help greatly in the checking process. But no, I generally don't trust them to do sensible things with basic formulas. Not a week ago, GPT-5 got confused about whether a plus or a minus was necessary in the basic question "I'm 323 days old, when is my birthday?"
My concern would be more with how to check the work (ie, make sure that the formulas are correct and no columns are missed) because Excel hides all that. Unlike code, there's no easy way to generate the diff of a spreadsheet or rely on Git history. But that's different from the concerns that you have.
The UX of spreadsheet diffs is a hard one to solve because of how weird the calculation loops are and how complicated the relationship between fields might be.
I've never tried to solve this for a real end user before in a generic way - all my past work here was for internal ability to audit changes and rollback catastrophes. I took a lot of shortcuts by knowing which cells are input data vs various steps of calculations -- maybe part of your ux is being able to define that on a sheet by sheet basis? Then you could show how different data (same formulas) changed outputs or how different formulas (same data) did differently?
Spreadsheets are basically weird app platforms at this point so you might not be able to create a single experience that is both deep and generic. On the other hand maybe treating it as an app is the unlock? Get your AI to noodle on what the whole thing is for, then show diff between before and after stable states (after all calculation loops stabilize or are killed) side by side with actual diffs of actual formulas? I feel like Id want to see a diff as a live final spreadsheet and be able to click on changed cells and see up the chain of their calculations to the ancestors that were modified.
Fun problem that sounds extremely complicated. Good luck distilling it!
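A bare-bones version of the formula diff discussed above, assuming the two workbook states have already been read into `{cell: formula}` mappings (e.g. via a library like openpyxl), might look like this sketch:

```python
def diff_formulas(before, after):
    """Compare two {cell: formula} snapshots; report changed, added, and removed cells
    as cell -> (old_formula_or_None, new_formula_or_None)."""
    changes = {}
    for cell in before.keys() | after.keys():
        old, new = before.get(cell), after.get(cell)
        if old != new:
            changes[cell] = (old, new)
    return changes

# Hypothetical before/after snapshots of a sheet's formulas.
before = {"B2": "=SUM(A1:A10)", "C2": "=B2*0.05"}
after  = {"B2": "=SUM(A1:A12)", "C2": "=B2*0.05", "D2": "=C2+B2"}

for cell, (old, new) in sorted(diff_formulas(before, after).items()):
    print(f"{cell}: {old!r} -> {new!r}")
```

This only covers the easy half of the problem; the hard UX questions raised above (separating input cells from derived cells, tracing a changed output back through its ancestor calculations) sit on top of a cell-level diff like this.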
> Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.
This is a hot take. One I'm not sure many would agree with.
Excel is similar to coding in BASIC, a giant hairy ball of tangled wool.
The model ought to be calling out to some sort of tool to do the math—effectively writing code, which it can do. I'm surprised the major LLM frontends aren't always doing this by now.
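One common shape for such a tool call — a sketch only, not any vendor's actual implementation — is to have the model emit an arithmetic expression as text and evaluate it with a real engine, so the digits come from computation rather than token sampling:

```python
import ast
import operator

# Minimal safe arithmetic evaluator: the kind of "calculator tool" an LLM
# could call instead of doing digit-by-digit math in generated text.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def calc(expr: str) -> float:
    """Evaluate a plain arithmetic expression, rejecting anything else."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

print(calc("1234.5 * 0.0725"))  # deterministic, unlike sampled tokens
```

The `ast`-based whitelist matters: evaluating model-produced text with a bare `eval` would be an obvious security hole.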
In JavaScript (and I assume most other programming languages) this is the job of static analysis tools (like eslint, prettier, typescript, etc.). I’m not aware of any LLM-based tool that performs static analysis with results as good as the traditional tools. Is static analysis not a thing in the spreadsheet world? Are the tools that do static analysis on spreadsheets subpar, or do they have some disadvantage not seen in other programming languages? And if so, are LLMs any better?
LLMs are a lossy validation, and while they work sometimes, when they fail they usually do so 'silently'.
The more complicated the spreadsheet and the more dependencies it has, the greater the room for error. These are probabilistic machines. You can use them, I use them all the time for different things, but you need to treat them like employees you can't even trust to copy a bank account number correctly.
Besides, using AI is an exercise in a "trust but verify" approach to getting work done. If you asked a junior to do the task you'd check their output. Same goes for AI.
I hate smartsheet…
Excel or R. (Or more often, regex followed by pen and paper followed by more regex.)
Handing them regex would be like giving a monkey a bazooka
Actually, yes. This kind of management reporting is either (1) going to end up in the books and records of the company - big trouble if things have to be restated in the future or (2) support important decisions by leadership — who will be very much less than happy if analysis turns out to have been wrong.
A lot of what ties up the time of business analysts is ticking and tying everything to ensure that mistakes are not made and that analytics and interpretations are consistent from one period to the next. The math and queries are simple - the details and correctness are hard.
Sometimes there can be an advantage in leading or lagging some aspects of internal accounting data for a time period. Basically sitting on credits or debits to some accounts for a period of weeks. The tacit knowledge to know when to sit on a transaction and when to action it is generally not written down in formal terms.
I'm not sure how these shenanigans will translate into an ai driven system.
This worked famously well for Enron.
Take your own advice.
This is basic business and engineering 101.
Well said. Concise and essentially inarguable, at least to the extent it means LLMs are here to stay in the business world whether anyone likes it or not (barring the unforeseen, e.g. regulation or another pressure).
For example, if I ask you to tabulate orders via a query but you forgot to include an entire table, this is a major error of process but the query itself actually is consistently error-free.
Reducing error and mistakes is very much modeling where error can happen. I never trust an LLM to interpret data from a spreadsheet because I cannot verify every individual result, but I am willing to ask an LLM to write a macro that tabulates the data because I can verify the algorithm and the macro result will always be consistent.
Using Claude to interpret the data directly for me is scary because those kinds of errors are neither verifiable nor consistent. At least with the “missing table” example, that error may make the analysis completely bunk but once it is corrected, it is always correct.
Yeah, but it could be perfect, why are there humans in the loop at all? That is all just math!
I have personally worked with spreadsheet based financial models that use 100k+ rows x dozens of columns and involve 1000s of formulas that transform those data into the desired outputs. There was very little tolerance for mistakes.
That said, humans, working in these use cases, make mistakes >0% of the time. The question I often have with the incorporation of AI into human workflows is, will we eventually come to accept a certain level of error from them in the way we do for humans?
For cases where that is not available, we should use a human and never an LLM.
I had a big backlog of "nice to have scripts" I wanted to write for years, but couldn't find the time and energy for. A couple of months after I started using Claude Code, most of them exist.
Just a suspicion.
Not just in a spreadsheet — any kind of deterministic work at all.
Find me a reliable way around this. I don't think there is one. MCP/functions are a band-aid and not consistent enough when precision is important.
After almost three years of using LLMs, I have not found a single case where I didn't have to review the output, which takes as long or longer than doing it by hand.
ML/AI is not my domain, so my knowledge is not deep nor technical; this is just my experience. Do we need a new architecture to solve these problems?
In Excel, it's possible to just ad hoc adjust things and make it up as you go. It's not clean but very adaptable and flexible.
This is talking about applying LLMs to formula creation and references, which they are actually pretty good at. Definitely not about replacing the spreadsheet's calculation engine.
Why are we suddenly ok with giving every underpaid and exploited employee a foot gun and expect them to be responsible with it???
If your experience with the lowest, most-abused employees is better than mine, I envy you.
Rightly so! But LLMs can still make you faster. Just don't expect too much from it.
Spreadsheets work because the user sees the results of complex interconnected values and calculations. For the user, that complexity is hidden away and left in the background. The user just sees the results.
This would be a nightmare for most users to validate what changes an LLM made to a spreadsheet. There could be fundamental changes to a formula that could easily be hidden.
For me, that's the concern with spreadsheets and LLMs, and it's just as much a concern with spreadsheets themselves. Try collaborating with someone on a spreadsheet for modeling and you’ll know how frustrating it can be to figure out what changes were made.
High precision is possible because they can achieve it through multiple cross-validations.
Claude for Excel isn't doing maths. It's doing Excel. If the llm is bad at maths then teaching it to use a tool that's good at maths seems sensible.
I was thinking along the same lines, but I could not articulate as well as you did.
Spreadsheet work is deterministic; LLM output is probabilistic. The two should be distinguished.
Still, it's a productivity boost, which is always good.
Now, granted, that can also happen because Alex fat-fingered something in a cell, but that's something that's much easier to track down and reverse.
Privatized insurance will always find a way to pay out less if it can get away with it. It is just the nature of having the trifecta of profit motive, socialized risk, and light regulation.
It's the nature of everything. They agree to pay you for something. It's nothing specific to "profit motive" in the sense you mean it.
There are many other entity types — unions[1], cooperatives, public-sector companies, quasi-governmental entities, PBCs, nonprofits — that all offer insurance and can occasionally do it well.
We even have some in the US, and we don't consider them communism: the FDIC, or things like Social Security and unemployment insurance.
At some level, isn't government and taxation itself nothing but insurance? We agree to pay taxes to mitigate a variety of risks, including foreign invasion, or smaller things like getting robbed on the street.
[1] Historically worker collectives or unions self-organized to socialize the risks of both major work ending injuries or death.
Armies from ancient to modern times have operated because of this insurance. The two ingredients that made them not mercenaries: a form of long-term insurance benefit (education, pension, land, etc.) for themselves or their family members in the event of death, and sovereign immunity for their actions.
Source?
That's a feature, not a bug.
We also have to remember all claims aren't equal, i.e. some claims end up being way costlier than others. You can achieve similar % margin outcomes by adding a ton of friction: preconditions, multiple appeals processes, prior authorization for prior authorization, reviews by administrative doctors who have no expertise in the field being reviewed and don't have to disclose their identity, and so on.
While the U.S. system is the most extreme, or most evolved, it is not unique; it is what you get when you privatize insurance. Any country with private insurance has some lighter version of this and is on the same journey.
Not that public health systems or insurance a la the NHS in the UK, or Germany's, work well either: they are underfunded and mismanaged, with wait times of months to see a specialist, and so on.
We have to choose our poison. Unless you are rich, of course; then the U.S. system is by far the best, and people travel to the U.S. to get the kind of care that is not possible anywhere else.
I disagree with the statement that healthcare insurance is predominantly privatized in the US: Medicare and Medicaid, at least in 2023, outspent private plans for healthcare spending by about ~10% [1]; this is before accounting for government subsidies for private plans. And boy, does America have a very unique relationship with these programs.
https://www.healthsystemtracker.org/chart-collection/u-s-spe...
John Oliver had an excellent segment coincidentally yesterday on this topic.
While the government pays for it, it is not managed or run by the government, so how should we classify the program: public or private?
My takeaway is that as public health costs overtake private insurance, while doing a better job of controlling costs per enrollee, it makes more and more sense just to have the government insure everyone.
I can't see what argument the private insurers have in their favor.
So obviously the company that prioritizes accuracy of coverage decisions by spending money on extra labor to audit itself is wasting money. Which means insureds have to waste more time getting the payment for healthcare they need.
More compliance or reporting requirements usually tend to favor the larger existing players who can afford to do it and that is also used to make the life difficult and reject more claims for the end user.
It is the kind of thing that keeps you and me busy; major investors don't care about it at all. The cost of the compliance, or of the lack of it, is no more than a rounding error on the balance sheet, and the fines or penalties are puny and laughable.
The enormous profits year on year for decades now, the amount of consolidation allowed in the industry show that the industry is able to do mostly what they want pretty much, that is what I meant by light regulation.
https://riskandinsurance.com/us-pc-insurance-industry-posts-...
Meta alone made $62bln in 2024: https://investor.atmeta.com/investor-news/press-release-deta...
So it's weird to see folks on a tech site talking about how enormous all the profits are in health insurance, and citations with numbers would be helpful to the discussion.
I worked in insurance-related tech for some time, and the providers (hospitals, large physician groups) and employers who actually pay for insurance have signficant market power in most regions, limiting what insurers can charge.
Some people - normal people - understand the difference between the holistic experience of a mathematically informed opinion and an actual model.
It's just that normal people always wanted the holistic experience of an answer. Hardly anyone wants a right answer. They have an answer in their heads, and they want a defensible journey to that answer. That is the purpose of Excel in 95% of places it is used.
Lately people have been calling this "sycophancy." This was always the problem. Sycophancy is the product.
Claude Excel is leaning deeply into this garbage.
299 more comments available on Hacker News