Launch HN: Hypercubic (YC F25) – AI for COBOL and Mainframes
I would expect most of these systems to come with very carefully guarded access controls. It also strikes me as a uniquely difficult challenge to track down the decision-maker who is willing to take the risk on revamping these systems (AI or not). Curious to hear more about what you’ve learned here.
Also curious to hear how LLMs perform on a language like COBOL that likely doesn’t have many quality samples in the training data.
The decision makers we work with are typically modernization leaders and mainframe owners — usually director or VP level and above. There are a few major tailwinds helping us get into these enterprises:
1. The SMEs who understand these systems are retiring, so every year that passes makes the systems more opaque.
2. There’s intense top-down pressure across Fortune 500s to adopt AI initiatives.
3. Many of these companies are paying IBM 7–9 figures annually just to keep their mainframes running.
Modernization has always been a priority, but the perceived risk was enormous. With today’s LLMs, we’re finally able to reduce that risk in a meaningful way and make modernization feasible at scale.
You’re absolutely right about COBOL’s limited presence in training data compared to languages like Java or Python. Given that COBOL is highly structured and readable, current reasoning models get us to an acceptable level of performance where it's now valuable to use them for these tasks. For near-perfect accuracy (95%+), we see a large opportunity to build domain-specific frontier models purpose-built for these legacy systems.
That’s exactly the opportunity we have in front of us, and we aim to make it possible through our own frontier models and infra.
Here, that person is a manager who got demoted from ~500 reports to ~40 and then convinced his new boss that it would be good to reuse his team for his personal AI strategy, which will make him great again.
Using AI and the few different modalities of information that exist about these systems (existing code, docs, AI-driven interviews, and workflow capture), we can triangulate and extract that tribal knowledge.
2. You do a "line by line" reimplementation in Java (well, banks like it).
3. You run the test suite and track your progress.
4. When you get to 100 percent, you send the same traffic to both systems and shadow-run the new implementation. Depending on how that goes, you either give up, go back to rework the implementation, or finally switch over to the new system.
This is obviously super expensive and slow, in order to minimize any sort of risk for systems that usually handle billions or trillions of dollars.
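To make the shadow-run step concrete, here is a minimal sketch of what that comparison harness might look like, assuming a simple HTTP front end; the endpoint URLs and compared fields are hypothetical placeholders, not anything from this thread:

```python
# Minimal shadow-run sketch: send the same request to the legacy system and
# the new reimplementation, compare responses, and log any divergence.
# Endpoint URLs and the compared fields are hypothetical placeholders.
import logging

import requests

LEGACY_URL = "https://legacy.internal/api/transaction"   # hypothetical
CANDIDATE_URL = "https://new.internal/api/transaction"    # hypothetical
COMPARED_FIELDS = ["balance", "status", "fee"]            # hypothetical

log = logging.getLogger("shadow-run")

def shadow_call(payload: dict) -> dict:
    """Serve traffic from the legacy system while mirroring it to the new one."""
    legacy_resp = requests.post(LEGACY_URL, json=payload, timeout=5).json()
    try:
        candidate_resp = requests.post(CANDIDATE_URL, json=payload, timeout=5).json()
    except Exception:
        log.exception("candidate failed for payload %s", payload)
        return legacy_resp  # the legacy answer is still the source of truth

    diffs = {
        f: (legacy_resp.get(f), candidate_resp.get(f))
        for f in COMPARED_FIELDS
        if legacy_resp.get(f) != candidate_resp.get(f)
    }
    if diffs:
        log.warning("divergence on %s: %s", payload.get("id"), diffs)
    return legacy_resp  # callers only ever see the legacy result
```

Only after the divergence log stays empty for long enough would anyone seriously consider cutting over.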
Our focus is different: we’re using AI to understand these 40+ year-old black box systems and capture the knowledge of the SMEs who built and maintain them before they retire. There simply aren’t enough engineers left who can fully understand or maintain these systems, let alone modernize them.
The COBOL talent shortage has already been a challenge for many decades now, and it’s only becoming more severe.
There's a bunch of mainly legacy hospital and government (primarily VA) systems that run on it. And where there's big government systems, there's big government dollars.
Later I got into programming language theory, and took another look at MUMPS from that perspective. As a programming language, it’s truly terrible in ways that languages like COBOL and FORTRAN are not. Just as one example, “local” variables have an indefinite lifetime and are accessible throughout a process, i.e. they’re not scoped to functions. But you can dynamically hide/shadow and delete them. It would be hard to design a less tractable way of managing variables if you tried.
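For readers who have never touched MUMPS, here is a rough, purely illustrative Python analogue of that behaviour (not MUMPS syntax; every name here is made up): one process-wide symbol table that all routines share, with NEW-style shadowing and KILL-style deletion bolted on.

```python
# Rough Python analogue of MUMPS-style "locals": one process-wide symbol table
# that every routine reads and writes, with NEW-style shadowing and KILL-style
# deletion. Purely illustrative; names are made up.
from contextlib import contextmanager

SYMTAB = {}  # all "locals" live here for the lifetime of the process

def set_var(name, value):
    SYMTAB[name] = value

def get_var(name):
    return SYMTAB.get(name)

def kill(name):
    SYMTAB.pop(name, None)  # KILL: the variable is simply gone, for everyone

@contextmanager
def new(name):
    """NEW: shadow a variable for the duration of a call, then restore it."""
    had_it, old = name in SYMTAB, SYMTAB.get(name)
    SYMTAB.pop(name, None)
    try:
        yield
    finally:
        if had_it:
            SYMTAB[name] = old
        else:
            SYMTAB.pop(name, None)

def subroutine():
    # No parameters, no declarations: just reach into the shared table.
    set_var("X", get_var("X") + 1)

set_var("X", 41)
subroutine()
print(get_var("X"))        # 42 -- the "local" outlived the call that set it
with new("X"):
    set_var("X", 0)        # shadows the caller's X inside this scope only
print(get_var("X"))        # 42 again -- the shadow is gone
```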
MUMPS’ value proposition was how it handled persistent data as a built-in part of the language. In that sense it was a precursor to systems like dBASE, which were eventually supplanted by SQL databases. MUMPS was a pretty good persistent data management system coupled with a truly terrible programming language.
I was curious to ask you, as domain experts, if you could talk more to the "70% of the Fortune 500 still run on mainframes" stat you mentioned.
Where do these numbers come from? And also, does it mean that those 70% of Fortune 500s literally run/maintain 1k-1M+ LoC of COBOL? Or do these companies depend on a few downstream specialized providers (financial, aviation/logistics, etc.) which do rely on COBOL?
Like, is it COBOL all the way down, or is everything built in different ways, but basically on top of 3 companies, and those 3 companies are mostly doing COBOL?
Thanks!
Generalizing: if the company had enough need for it 30 years ago, was big enough to buy a mainframe, and the thing they used it for barely changed, chances are it’s still there, if the company is still there.
Banks absolutely do have it in house, in a dedicated secure site with a fence and a moat
If they were large enough to need compute 30-40+ years ago, they certainly have some mainframes running today. Think Walmart, United Airlines, JPMC, Geico, Coca Cola and so on.
Out of those, it looks like roughly 60 are actually using COBOL in-house, while job postings from the rest mostly ask for "experience dealing with legacy COBOL systems". The top ~40 users are the ones you would expect (big banks, insurers, telcos).
Of course this is a very LinkedIn/job postings lens on it, but in terms of gauging how big the addressable market for such a solution may be, I think it should do a decent job.
[0]: https://sumble.com - Not affiliated, I just quite like their product
We also sublease our mainframes to at least 3 other ventures, one of which is very outspoken about having left the mainframe behind. I guess that's true if you view outsourcing as (literally) leaving it behind with the competitor of your new system... It seems to be the same for most banks, none of which publicly have mainframes anymore, but for weird reasons they still hire people for them offshore.
Given that our (and IBM's!) services are not cheap, I think either a) our customers are horribly dysfunctional at anything but earning money slow and steady (...) or b) they actually might depend on those mainframe jobs. So if you are IBM, or a startup adding AI to IBM, I guess the numbers might add up to the claims.
There may be other general-purpose tools out there that overlap in some ways, but our focus is on vertically specializing in the mainframe ecosystem and building AI-native tooling specifically for the problems in this space.
Here's a talk about it:
https://www.youtube.com/watch?v=W8TSPED0alY
If you load the code referenced here, https://book.gtoolkit.com/analyzing-cobol--the-aws-carddemo-... , you can explore the demo used in the talk.
I'm sure you'll manage to figure out the LLM-integrations.
Edit: The Feenk folks also have a structured theory for why and how to do these things that they've spent a lot of time and experience on refining, visualising and developing tooling around.
I think it is a good idea for anyone working with large legacy systems to have such a theoretical foundation for how to communicate, approach problems and evaluate progress. Without it one is highly likely to make expensive decisions based on gut feeling and vague assumptions.
The only other player I've seen is Mechanical Orchard
Another proposed replacement about to fail now, after half the COBOL devs were laid off.
So if anyone needs a remote OpenVMS/HP NonStop or junior z/OS dev :D
The main reasons are the loss of institutional knowledge, the difficulty of untangling 20–30-year-old code that few understand, and, most importantly, ensuring the new system is a true 1:1 functional replica of the original via testing.
Modernization is an incredibly expensive process involving numerous SMEs, moving parts, and massive budgets. Leveraging AI creates an opportunity to make this process far more efficient and successful overall.
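One common way that 1:1 functional equivalence gets demonstrated is with a golden-master harness: capture the legacy system's outputs for a large set of recorded inputs, then require the rewrite to reproduce them exactly. A minimal sketch, assuming simple batch programs driven by input files (the directory layout and run_program hook are hypothetical):

```python
# Sketch of a golden-master harness: capture the legacy program's outputs for a
# fixed set of recorded inputs, then require the rewritten program to reproduce
# them byte for byte. File layout and the run_program hook are hypothetical.
import subprocess
from pathlib import Path

CASES_DIR = Path("captured_cases")    # one recorded input file per case (hypothetical)
GOLDEN_DIR = Path("golden_outputs")   # legacy outputs captured once (hypothetical)

def run_program(binary: str, input_file: Path) -> bytes:
    """Run a batch program on one input file and return its stdout."""
    return subprocess.run(
        [binary, str(input_file)], capture_output=True, check=True
    ).stdout

def capture_golden(legacy_binary: str) -> None:
    """Record the legacy system's output for every captured case."""
    GOLDEN_DIR.mkdir(exist_ok=True)
    for case in sorted(CASES_DIR.iterdir()):
        (GOLDEN_DIR / case.name).write_bytes(run_program(legacy_binary, case))

def check_candidate(candidate_binary: str) -> float:
    """Return the fraction of recorded cases the rewrite reproduces exactly."""
    cases = sorted(CASES_DIR.iterdir())
    passed = sum(
        run_program(candidate_binary, case) == (GOLDEN_DIR / case.name).read_bytes()
        for case in cases
    )
    return passed / len(cases)
```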
COBOL projects have millions of lines of code. Any prompt/reasoning will rapidly fill the context window of any model.
And you'll probably have better luck if your tokenization understands COBOL keywords.
You probably have better luck implementing a data miner that slowly digests all the code and requirements into a proprietary information retrieval solution or ontology that can help answer questions...
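As a rough illustration of that kind of pre-digestion, here is a toy sketch that splits COBOL sources into paragraph-sized chunks and builds an identifier index, so a question only ever pulls a handful of relevant chunks into a model's context; the regex is a simplistic placeholder, not a real COBOL parser:

```python
# Toy sketch of pre-digesting a huge COBOL codebase for retrieval: split each
# source file into paragraph-sized chunks keyed by paragraph name, then map
# every identifier to the chunks that mention it, so a question only pulls the
# few chunks that matter instead of millions of lines. The regex and index
# structure are simplistic placeholders, not a real COBOL parser.
import re
from collections import defaultdict
from pathlib import Path

# Crude approximation of a paragraph label: a name starting in area A, then a period.
PARAGRAPH = re.compile(r"^ {7}([A-Z0-9][A-Z0-9-]*)\s*\.\s*$", re.IGNORECASE)

def chunk_cobol(path: Path):
    """Yield (paragraph_name, source_text) chunks for one COBOL source file."""
    name, lines = "HEADER", []
    for line in path.read_text(errors="replace").splitlines():
        m = PARAGRAPH.match(line)
        if m:
            if lines:
                yield name, "\n".join(lines)
            name, lines = m.group(1).upper(), [line]
        else:
            lines.append(line)
    if lines:
        yield name, "\n".join(lines)

def build_index(src_root: Path):
    """Map every identifier-like token to the (file, paragraph) chunks using it."""
    index = defaultdict(set)
    for path in src_root.rglob("*.cbl"):
        for para, text in chunk_cobol(path):
            for token in set(re.findall(r"[A-Z0-9-]{3,}", text.upper())):
                index[token].add((path.name, para))
    return index

# index = build_index(Path("cobol_src"))
# index["CUST-BALANCE"] -> the few chunks worth showing to a model
```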
What an engineer tells you can be inaccurate, incomplete, outdated, etc.
Maybe 50-year-old COBOL programs are the original neural networks.
The real complexity lies in also understanding z/OS (the mainframe operating system), CICS, JCL, and the rest of the mainframe runtime; it’s an entirely parallel computing universe compared to the x86 space.
False, all those jobs were outsourced and offshored long ago.
This is a problem a compiler cannot fix, and is a very real problem.
> HyperDocs ingests COBOL, JCL, and PL/I codebases to generate documentation, architecture diagrams, and dependency graphs.
Lots of tools available that do this already without AI.
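For context, the non-AI version of this is mostly static scanning. Here is a toy sketch of pulling step/program/dataset relationships out of JCL with plain regexes; the patterns are simplified and ignore PROCs, symbolics, and overrides:

```python
# Toy sketch of non-AI JCL analysis: scan job decks for EXEC PGM= and DD DSN=
# statements to recover which programs a job runs and which datasets each step
# touches. Real JCL has far more syntax; these regexes only cover simple cases.
import re
from collections import defaultdict
from pathlib import Path

STEP = re.compile(r"^//(\S+)\s+EXEC\s+PGM=([A-Z0-9$#@]+)", re.IGNORECASE)
DSN = re.compile(r"\bDSN=([A-Z0-9.$#@()+-]+)", re.IGNORECASE)

def analyze_jcl(path: Path):
    """Return {step_name: {"program": ..., "datasets": [...]}} for one job deck."""
    steps = defaultdict(lambda: {"program": None, "datasets": []})
    current = None
    for line in path.read_text(errors="replace").splitlines():
        if line.startswith("//*"):          # JCL comment line
            continue
        m = STEP.match(line)
        if m:
            current = m.group(1).upper()
            steps[current]["program"] = m.group(2).upper()
            continue
        if current:
            for dsn in DSN.findall(line):
                steps[current]["datasets"].append(dsn.upper())
    return dict(steps)

# for step, info in analyze_jcl(Path("DAILYJOB.jcl")).items():
#     print(step, info["program"], "->", ", ".join(info["datasets"]))
```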
> The goal is to build digital “twins” of the experts on how they debug, architect, and maintain these systems in practice.
That will be a neat trick; will the output be more than a sparsely populated wiki?
My experience is there’s not a lot of will or money to follow these things through.
Edit to add: there was a lot of work around business rule extraction, automatic documentation, and static analysis of mainframe systems in the 90s leading up to Y2K, but it all fizzled out after that. You all should search the literature if you haven’t.
How do you consolidate this knowledge across disparate teams and organizational silos? How will you identify and reconcile subtle differences in terminology used across the organization?
Perhaps I misunderstood, but on your website you primarily identify technical implementors as SMEs. IME modernizing legacy data systems in high-stakes environments, the devil is more on the business side: e.g. disparate teams using the same term to refer to different concepts (and having that reflected in code), or the exact stakeholders of reports or data systems being unknown and unknowable. It's also in discerning whether a rule that is opaque to you is critical to some team or workflow whose business context you're missing, is simply not used anymore, or is itself implemented wrong.
Besides, both technical and non-technical stakeholders and SMEs lean heavily on heuristics to make decisions with the data they are looking at, but often struggle to articulate them explicitly. They don't think to mention certain conditions or filters because, for them, those are baked into the terminology, or it doesn't occur to them that the organization deals with broader data than what they interact with in their day-to-day.
And unfortunately in these settings, you don't get many chances to get it wrong -- trust is absolutely critical.
I am skeptical that what you will end up with at the end of the day will be a product, at least if your intent is to provide meaningful value to people who rely on these systems and solve the problems that keep them up at night. My feeling is that you will end up as primarily a consultancy, which makes sense given that the problem you are solving isn't primarily technical in nature, it just has technical components.
Sounds great but... I migrated a big COBOL codebase several years ago. The knowledge stored in the experts is 1/ very wide, 2/ full of special cases that pop up only a few times a year, and 3/ usually about complex cases involving analysing data, files that exist only on paper, etc. I strongly doubt an AI will be able to spot that.
The knowledge that is usually missing the most is not "how is this done", because spending a few hours on COBOL code is frankly not that hard. What's missing is the "why". And that is usually stored in laws, sub-sub-laws, etc. You'll have to ingest the code and the law and pray the AI can match them.
So in the end the AI will probably do 50% of the effort, but then you'll need 150% to understand the AI's work... So I'm not sure it balances out well.
But if it works, well, that's cool, because rewriting COBOL code is not exactly fun: devs don't want to do it, customers do it because they have to (not because it'll bring additional value), and the best possible outcome is the customer saying to you, "okay, we paid you 2 million and the new system does the same things as before we started" (the most likely outcome, which I faced, is "you rewrote the system and it's worse than before"). So if an AI can do it, well, cool.
(But then it means you'll fire the team that does the migration, which, although not fun and not rocket science, requires real expertise; it's not grunt work at all.)
The goal is to replace people to 'save money'. And I'm always amused at startup founders who so obviously never worked with real people in real environments (outside of the startup bubble) that they think people are too stupid to see this for what it is. I look forward to their explanation to their investors as to why their product didn't meet expectations: after taking 5-6 seconds to figure out what this new 'tool' was intended to do, the users spent all of their time figuring out how to feed it garbage so it didn't become their 'besty twin' replacement.
We’re curious to hear your thoughts and feedback, especially from anyone who’s worked with mainframes or tried to modernize legacy systems.
Lol.
Surely not at a rate faster than one year per year?