Evolving the OCaml Programming Language (2025) [pdf]

4 months ago

4 replies

I am the author of the talk here o/.

This talk is a _subjective_ take on how the OCaml programming language evolves, based on my observations over the last 10 years I've been involved with it. My aim/hope is to demystify the compiler development process and the tradeoffs involved and encourage more developers to take a shot at contributing to the OCaml compiler.

Happy to answer questions, but also, more importantly, hear your comments and criticisms around the compiler development process, ideas to make it more approachable, etc.

ofrzeta

4 months ago

3 replies

To be honest the story about the two closed PRs for dynamic arrays doesn't really inspire contributions :)

4 months ago

2 replies

You are right that the dynamic arrays story does not read like a straightforward “how to inspire contributions.” But part of what I wanted to do in the talk was to show things as they actually unfolded. In OCaml compiler development, there is a very strong emphasis on correctness and long-term stability. That can make contributions, especially to core language features, feel harder than they might in faster-moving ecosystems.

The dynamic arrays case is a good illustration. What began as a small PR grew into years of design iterations, debates about representation, performance, and multicore safety, and eventually a couple of thousand lines of code and more than 500 comments before it landed. From one perspective, that looks discouraging. From another, it shows the weight we place on getting things right, because once a feature ships, it is very hard to undo.

That tension, between wanting to be open and encouraging contributions but also needing to protect stability, is something I think we should be talking about openly. My hope is that by making the process more visible we can demystify it and help contributors understand not just what happened, but why.

sidkshatriya

4 months ago

3 replies

I think this is the tension in most software. If you want to have excellent and correct software it will take time.

And if you want more features with a "fix as you go approach" you will often have huge technical debt and get saddled with poor interfaces, often forever.

But, I think OCaml errs too much on the side of getting it right the first time. The result is that state of the art keeps moving far ahead. By the time OCaml "catches up" the field of programming languages has moved far ahead. So OCaml always remains the Jack of all trades and the master of none (IMHO).

I like the direction OxCaml is taking. But the problem is that no one has another 10 years to see its learnings get folded back into OCaml. There is a real chance that OxCaml may diverge so much that it becomes impractical to merge it into OCaml. Flambda2 is another great piece of software that may also take a long time to come into OCaml proper.

So I feel that things need to be sped up if OCaml is to become a bigger ecosystem. You can see that some big projects are moving away from OCaml -- Facebook, for instance, used to have their Python typechecker in OCaml. Their new one, Pyrefly, is in Rust. This could be an isolated story, no doubt.

sidkshatriya

4 months ago

3 replies

Now OCaml values adding features carefully to the language so that there is no future regret. But being slow and conservative has _not_ minimized regret. The "O" in OCaml, i.e. Objects (and Classes), is almost ignored nowadays. Jane Street, a large industrial user, seems to be actively against using the "O" part of OCaml.

So here we have gotten the worst of both worlds -- a language that is evolving slowly and a language that has large features that are almost soft discouraged. My primary language is Rust and not OCaml (mostly dabble in OCaml) so I may not fully know what I'm talking about when it comes to OCaml.

jact

4 months ago

1 reply

The distaste for the OCaml object system is mostly misplaced in the community. While first class modules can mostly replace them — sometimes you really need open recursion. Object types are also a very useful feature used by core libraries.

StopDisinfo910

4 months ago

OCaml objects are structurally typed, which can also be very nice. They definitely have their place.
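A minimal sketch of what structural typing means here, with made-up objects (`alice`, `server`) chosen purely for illustration: a function that calls `#name` accepts any object that has such a method, with no shared class or declared interface.

```ocaml
(* Any object with a [name] method returning a string is accepted.
   Inferred type: < name : string; .. > -> string *)
let greet o = "hello, " ^ o#name

(* Two unrelated immediate objects; neither belongs to a class. *)
let alice = object
  method name = "alice"
  method age = 30
end

let server = object
  method name = "db-1"
  method port = 5432
end

let greetings = [ greet alice; greet server ]
```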

spit2wind

4 months ago

1 reply

Interesting. I'm still working my way through Correct+Efficient+Beautiful. My takeaway so far has been that Modules _are_ the "O" part of OCaml. I guess there's something more "traditionally" an object? Does that mean there were modules in Caml (or whatever the predecessor was) and it was decided classes might be a good feature to add?

4 months ago

Yes, the O in OCaml extends the type system with structurally typed objects and a classic OOP class system.

They're conceptually nice, especially the in-place object syntax, but the class syntax feels tacked on and the overall implementation is very naive compared to a proper OOP runtime like CLR/JVM/JS.

I suggest reaching for them only when first-class modules aren't enough (i.e. you need open recursion). Even then you could sometimes get away with polymorphic variant constraints, but that's admittedly harder to read and understand.
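For readers unsure what "open recursion" refers to, here is a small sketch with hypothetical classes (`animal`, `dog`): a method calls another method through `self`, and a subclass can override the callee without touching the caller.

```ocaml
(* [describe] calls [self#name]; the call is resolved against the object the
   method actually runs in, so subclasses can change what it sees. *)
class animal = object (self)
  method name = "animal"
  method describe = "kind: " ^ self#name
end

class dog = object
  inherit animal
  (* Override only [name]; the inherited [describe] picks it up. *)
  method! name = "dog"
end

let base_description = (new animal)#describe
let dog_description = (new dog)#describe
```

A plain module or a record of functions has no `self` to dispatch through, so getting the same behavior there means threading the overriding function in by hand.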

4 months ago

> and a language that has large features that are almost soft discouraged

It's literally just objects, one large (and early!) feature. Arguably too large compared to the rest of the language: first-class modules or polymorphic variants can handle most of their use cases while being much simpler, and faster than the existing class system. (Objects and object types without the actual classes are maybe ok.)

The only other controversial feature I can think of is Seq and that's just because it can be allocation-heavy. Then again ordinary OCaml lists are not much cheaper (thankfully immutable arrays are already in for 5.4).

StopDisinfo910

4 months ago

1 reply

> By the time OCaml "catches up" the field of programming languages has moved far ahead.

Hard to reconcile with the fact that, twenty years ago, OCaml already had 90% of the features people like in Rust today, has a module system which is still better than Haskell, and is currently implementing a full effect system.

It is still pretty much ahead of every mainstream language.

sidkshatriya

4 months ago

1 reply

> a module system which is still better than Haskell

The module system though powerful is quite awkward and verbose. I personally prefer adhoc polymorphism (class/instance in Haskell, trait/implementation in Rust). That is really missed in OCaml and is likely to be missing for the next few years even though there have been (stalled) efforts like Modular implicits in the past.
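To make the verbosity point concrete, here is a rough sketch of emulating a `Show`-style type class with first-class modules. The names `SHOW`, `Show_int`, `Show_bool` and `show_list` are invented for this example; the point is only that the "instance" has to be passed by hand at every call site.

```ocaml
module type SHOW = sig
  type t
  val show : t -> string
end

module Show_int : SHOW with type t = int = struct
  type t = int
  let show = string_of_int
end

module Show_bool : SHOW with type t = bool = struct
  type t = bool
  let show = string_of_bool
end

(* The instance is an explicit argument; with type classes or traits the
   compiler would select it from the element type. *)
let show_list (type a) (module S : SHOW with type t = a) (xs : a list) =
  "[" ^ String.concat "; " (List.map S.show xs) ^ "]"

let ints = show_list (module Show_int) [ 1; 2; 3 ]
let bools = show_list (module Show_bool) [ true; false ]
```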

Haskell and Scala seem to have many features lacking in OCaml. Some of those features are excessive I'll admit and OCaml can argue that it is more minimalistic (which is also useful).

Yes, effects are definitely a cutting edge feature in OCaml. But they are untyped which is a big limitation I would say.
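A small sketch of what "untyped" means in practice, assuming OCaml 5 and its `Effect` module. The `Ask` effect is hypothetical. The function that performs it has the type `unit -> int`, which says nothing about the effect, so forgetting the handler is only detected at run time.

```ocaml
open Effect
open Effect.Deep

(* A hypothetical effect that asks the environment for an int. *)
type _ Effect.t += Ask : int Effect.t

(* The type is just [unit -> int]: nothing records that [Ask] is performed. *)
let double_answer () : int = 2 * perform Ask

(* With a handler installed, [Ask] is answered with 21. *)
let handled : int =
  try_with double_answer ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Ask -> Some (fun (k : (a, _) continuation) -> continue k 21)
        | _ -> None) }

(* Without a handler the code still type-checks; the omission surfaces as
   the [Effect.Unhandled] exception when the effect is performed. *)
let unhandled : string =
  match double_answer () with
  | n -> string_of_int n
  | exception Effect.Unhandled _ -> "unhandled effect"
```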

TL;DR -- OCaml does many things well. It's a good language. My main point is that the language needs to speed up its pace of evolution. OCaml's lunch is being eaten up by lower level and performance oriented languages like Rust. At the higher level it is being squeezed by Lean, Haskell, Fstar etc.

StopDisinfo910

4 months ago

> The module system though powerful is quite awkward and verbose.

Deeply disagree. It’s a lot easier to use and reason about than type classes. You can use Haskell if that’s what you want anyway. I’m glad OCaml isn’t Haskell.

> Yes, effects are definitely a cutting edge feature in OCaml. But they are untyped which is a big limitation I would say.

Personally I think it’s an insignificant limitation for a feature existing approximately nowhere else. Anyway I think we have safely killed your initial argument that OCaml was somehow lagging behind.

4 months ago

OCaml is doing just fine, thanks.

octachron

4 months ago

A point that I find missing in the timeline for dynamic arrays is that implementations of dynamic arrays have been available in libraries for more than twenty years.

However, none of the authors of those libraries were really happy with their own implementation because those implementations had to choose between performance, API usability or thread safety.

When I closed the student pull request (which was a naive implementation with no unsafe features), it was with the idea that it was unfair to expect a beginner to solve those issues.

The subsequent iterations explored different parts of the design space before the final iteration, which converged on safely using unsafe language features to reach a new local API optimum.
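To give a feel for the trade-off, here is one possible shape of a naive dynamic array written without unsafe features. This is an illustrative sketch, not the code from any of the pull requests, and all names are made up. Unused slots have to hold something, so this version boxes every element in an `option`, which costs an extra allocation and indirection per element; it also makes no attempt at thread safety.

```ocaml
(* Empty slots are [None]; filled slots are [Some x]. *)
type 'a t = { mutable data : 'a option array; mutable len : int }

let create () = { data = [||]; len = 0 }

let add_last (d : 'a t) (x : 'a) : unit =
  if d.len = Array.length d.data then begin
    (* Grow geometrically so that appends are amortised O(1). *)
    let grown = Array.make (max 1 (2 * d.len)) None in
    Array.blit d.data 0 grown 0 d.len;
    d.data <- grown
  end;
  d.data.(d.len) <- Some x;
  d.len <- d.len + 1

let get (d : 'a t) (i : int) : 'a =
  if i < 0 || i >= d.len then invalid_arg "get: index out of bounds";
  match d.data.(i) with
  | Some x -> x
  | None -> assert false (* slots below [len] are always filled *)

let length (d : 'a t) : int = d.len

(* Usage: append three elements and read them back. *)
let d : int t = create ()
let () = List.iter (add_last d) [ 10; 20; 30 ]
```

Avoiding the boxing typically means filling unused slots with a dummy value that the type system cannot vouch for, which is where unsafe features come in.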

klodolph

4 months ago

1 reply

Maybe what I read here is “this is how contributions go”…

Get the API right first. Make sure it’s correct, safe, and useful. Iterate on the performance afterwards.

IMO, a lot of contributions should take this shape.

4 months ago

It is often hard to see the shape of these things before a serious PR attempt is made. Each of the PRs reveals more of the shape of the problem being solved. Hard to skip them in practice, especially for new contributors.

aseipp

4 months ago

1 reply

I think it's just the nature of the beast, in this case. Serious "industrial" implementations of a programming language might stick around for a long time, and breaking things a lot can mar the appeal; getting it right the first time pays off in that case.

I think the acceptance threshold can be much lower in other kinds of tooling. "It is what it is", so to speak.

Quekid5

4 months ago

Add a sane deprecation process and this is much less of an issue -- see e.g. the Java language. Sure, it's not ideal to have multiple implementations of the 'same' data structure (if a better way is found, say)... but at least you aren't stalling everything and causing API interop issues for years and years.

4 months ago

2 replies

I'm at a conference at the moment so can't give a lengthy answer, but I'm the maintainer of virt-v2v, one large open source OCaml project (large if you include all the dependencies) which generates actual multi-millions in annual revenue, but is often overlooked in all this discussion of the OCaml ecosystem. Glad to talk by email some time.

[BTW we currently have open positions for two developers]

lambda_foo

4 months ago

1 reply

Thanks for sharing, I had no idea about this project.

Could you share some more details about where this project is used? Links to those open positions for OCaml developers would be interesting too, not for myself :-)

4 months ago

1 reply

For converting VMs that run on VMware to run on KVM. After Broadcom purchased VMware a couple of years ago & raised prices (in some cases up to 10x), lots and lots of businesses are desperate to leave VMware. The open positions can be found on the Red Hat website if you search roles for "virt-v2v".

yawaramin

4 months ago

1 reply

Found this one: https://redhat.wd5.myworkdayjobs.com/en-US/jobs/details/Prin...

No mention of OCaml though.

4 months ago

1 reply

Right, we don't mention it in the advert because doing so draws in a certain type of academic person that we don't want to hire. Also because the majority of the code is actually in other programming languages, and we can easily train up (e.g.) an ace C coder in OCaml if we need to.

yawaramin

4 months ago

On a different note, I am curious about something–does RH still maintain createrepo/mergerepo? Ie https://github.com/rpm-software-management/createrepo_c

I've been exploring converting them into OCaml as the C versions are very segfault-prone.

4 months ago

It was overlooked because I didn't know!

Great to hear about virt-v2v. I will reach out to you by email.

alabhyajindal

4 months ago

1 reply

This is not a direct comment on compiler development but on industrial projects in general: how do you begin contributing to something that is so large?

What should a beginner in compiler development, someone who has written a few compilers of their own, do to get involved in a project such as OCaml? I understand this issue is not specific to compilers, but is faced by any sufficiently large project. Still, I think it's an important issue. I believe there are many resources for people to get up and running in a field but not enough for them to make the next jump into industrial projects.

4 months ago

2 replies

(I think this advice applies to most large open source projects)

Make sure you have installed and are using the software. Ideally you'd have an ongoing interest in it because it's something you use regularly (whether personally or for work).

Read first, especially the documentation, guidelines to contributing, mailing lists / Github issues / however else the upstream maintainers engage with each other.

Start small. Actually a great place is just to go and fix spelling mistakes and typos in documentation, code, comments, etc. Follow the guidelines for contributing to the letter, even if they appear over-complicated at first.

After you've engaged with small patches, build up. Look through their issues and (since you're using the software every day) find something that is an "itch" that you want to "scratch", and attempt to fix that.

I don't really need to go further because either at some point in this process you'll have become discouraged (for good or bad reasons), or you'll have found your community and will want to contribute more and more.

nitnelave

4 months ago

Also, of course, talk to people. Pitch your PR idea before writing it, so you can avoid hearing "oh, there's a much simpler way" or "we can never merge this approach because of X"

alabhyajindal

4 months ago

Thank you - I appreciate it!

hoppp

4 months ago

Is there a video of the talk? The link is just a PDF, and I would love to hear it.

kubb

4 months ago

4 replies

For people using OCaml, there’s one thing that kinda discourages me in it, that is exceptions as part of the API in the standard library.

Because exceptions aren’t checked, this effectively means that a language designed for type safety has as much type safety as python, because it’s very easy to forget handling something, and get runtime errors.

How do you deal with this day to day? I assume it’s impossible to just believe that all the code you pull in doesn’t use exceptions?

4 months ago

1 reply

> as much type safety as python

There's no type unsafety from unchecked exceptions, because uncaught exceptions are not unsound. Even Haskell has them (error and undefined), because from a theoretical standpoint they're equivalent to reaching an infinite loop. (Now, recovering from an exception isn't unsound either, but it might mess with your usual mutable invariants.)

In more practical terms, concerning overall correctness, OCaml has been adding option-returning variants of those functions, so most exceptions raised from the stdlib nowadays are much more likely to be intended by the author.
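As a quick illustration, using the standard library's `List.find`, `List.find_opt` and `int_of_string_opt` (the sample data is arbitrary):

```ocaml
let xs = [ 1; 2; 3 ]

(* Exception-raising form: the result type [int] hides the failure case, so
   the caller has to remember to catch [Not_found]. *)
let first_big_or_zero =
  try List.find (fun x -> x > 10) xs with Not_found -> 0

(* Option-returning forms: the possible absence is part of the type. *)
let first_big : int option = List.find_opt (fun x -> x > 10) xs
let first_even : int option = List.find_opt (fun x -> x mod 2 = 0) xs
let parsed : int option = int_of_string_opt "12a"
```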

ux266478

4 months ago

2 replies

I don't think Haskell is a good language to model our idea of error handling off of. It's one of many bugbears I have with that language, that it uses the Maybe monad as an error type. It technically works, but doesn't provide a meaningful distinction between "This function might not return anything, and this is defined behavior" and "This function has a singularity". MonadError exists, but I can't think of anywhere it shows up without digging deep into dragon caves of the compiler. Everything a normal user is going to touch will deal exclusively in Maybes.

I'm not a fan of Rust as a language for many reasons, but I will give it credit for making proper usage of the Result monad. They could have abused Option the same way Haskell abuses Maybe, but they didn't.

akkad33

4 months ago

1 reply

> I'm not a fan of Rust as a language for many reasons, but I will give it credit for making proper usage of the Result

Rust also has exceptions aka panics

ux266478

4 months ago

So does Haskell.

4 months ago

I was just using Haskell's reputation to push back on the "as much type safety as python" hot take.

> [Haskell] doesn't provide a meaningful distinction between "This function might not return anything, and this is defined behavior" and "This function has a singularity"

I think Haskellers should fear divergence less, or push for SPARK-like static checking. In OCaml, the current trend would be to represent "not return anything" as None; and "has a singularity" by raising Invalid_argument or similar when the singularity check was considered a precondition, or returning Error (or an equivalent variant) for expected inputs.
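A sketch of that three-way convention, with function names invented for the example:

```ocaml
(* "Might not return anything": absence is a normal, defined outcome. *)
let first_negative (xs : int list) : int option =
  List.find_opt (fun x -> x < 0) xs

(* Precondition violation: the caller broke the contract, so raise. *)
let nth_root (n : int) (x : float) : float =
  if n <= 0 then invalid_arg "nth_root: n must be positive";
  x ** (1.0 /. float_of_int n)

(* Expected bad input: reported as a value the caller must inspect. *)
let parse_port (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some p when p > 0 && p < 65536 -> Ok p
  | Some _ -> Error "port out of range"
  | None -> Error "not a number"
```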

Usage of Result in OCaml is also growing, thankfully. It's part of the stdlib, and we can use binding operators (let* foo = result) to do the same as ? in Rust (or let! in F#). OCaml 5.4 is even adding a Result.Syntax module so we can just open it instead of defining (let*) ourselves.
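For anyone who hasn't seen binding operators, a minimal sketch that defines `let*` by hand on top of `Result.bind`. The helper functions are hypothetical.

```ocaml
(* [let*] short-circuits on the first [Error], much like [?] in Rust. *)
let ( let* ) = Result.bind

let parse_int (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an int: " ^ s)

let safe_div (a : int) (b : int) : (int, string) result =
  if b = 0 then Error "division by zero" else Ok (a / b)

let divide_strings (a : string) (b : string) : (int, string) result =
  let* x = parse_int a in
  let* y = parse_int b in
  safe_div x y
```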

On the other hand, Result doesn't give us backtraces, and composes badly with other combinators or imperative flow. In my current project I'm instead giving a try to an effectful result_scope/get_ok API, which composes better.

yodsanklai

4 months ago

> as much type safety as python

That's an exaggeration.

You can use error types / monads like you would in Rust/Haskell. When you use the Core standard library, you can use functions that don't throw exceptions. Those that do follow a specific naming convention (foobar_exn).

yawaramin

4 months ago

Every mainstream language has exceptions. Everyone knows how to use exceptions. They're easy to use and get the job done. OCaml suffers no type safety issues from the use of exceptions. It also has option and result types so people who need more control flow can use those. The OCaml standard library typically uses exceptions for real exceptional conditions like, e.g., trying to access a key that doesn't exist in a map. Even Rust has panics which are basically exceptions.

You criticized Haskell as not a great example of error handling. Well, Erlang/Elixir also have exceptions, and they are considered the industry leader in error recovery.

Exceptions are actually fine, it doesn't really take much to install handlers which take care of catching, logging, telemetry, re-raising etc. They mostly get a bad rep because of the latest fashions in the PL space.
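A rough sketch of such a handler, using only `Printexc` from the standard library. The `with_logging` and `lookup` helpers are hypothetical, and real telemetry would replace the `prerr_endline` call.

```ocaml
(* Run [f], log any exception, then re-raise it with its original backtrace
   so callers further up the stack can still react to it. *)
let with_logging (label : string) (f : unit -> 'a) : 'a =
  try f ()
  with exn ->
    let bt = Printexc.get_raw_backtrace () in
    prerr_endline (label ^ ": " ^ Printexc.to_string exn);
    Printexc.raise_with_backtrace exn bt

(* A boundary that turns the one expected exception into a value. *)
let lookup (tbl : (string, int) Hashtbl.t) (key : string) : (int, string) result =
  match with_logging "lookup" (fun () -> Hashtbl.find tbl key) with
  | v -> Ok v
  | exception Not_found -> Error ("missing key: " ^ key)
```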

johnisgood

4 months ago

Your comment does not make much sense even if it is true.

Factor (Forth-like language) implements even its own ":" (defines a word, i.e. a function) using the language itself, it is not builtin, same with "if", and so forth. Thus, "MEMO:" or locals[1] ("::") being implemented as a library does not mean it is a bad thing, on the contrary, in the case of Factor, it makes it quite powerful. The object system is entirely implemented in Factor, too. "Large chunks of functionality are not part of the core language, they are in just as library".[2]

And to compare OCaml's type system to Python's is straight out absurd.

[1] Locals are entirely implemented in Factor, too, which is only about ~500 lines of code. It is not part of the core language, and on top of that, there is no performance penalty whatsoever!

[2] See more here: https://www.youtube.com/watch?v=f_0QlhYlS8g.

zerr

4 months ago

3 replies

I remember the go-to alternative "standard" library was being developed by some bank from Wall Street. Is that still the case? I.e., do most people still use that third-party lib, or has the real standard library evolved since then?

4 months ago

1 reply

My impression is that most people, as in a majority, aren't using Jane Street's Base and Core. Maybe some or even many, but not most, and especially not in the FOSS ecosystem. I think this idea comes from so many learning materials using their libs, you feel kind of funneled towards them at the start.

But yes, the standard library has added many helper functions that were sorely needed during the last few years, and the upcoming 5.4 keeps adding more. Still not as many goodies as Jane Street's libraries, but nowadays I don't miss them as long as I can use just a few small libraries, mostly by dbunzli and c-cube.

aguluman

4 months ago

1 reply

Is stdlib the original, and are Base and Core extensions of it?