Mergiraf: Syntax-Aware Merging for Git

Posted 2 months ago · Active about 2 months ago

Velocifyer

161 points

45 comments

lwn.net · Tech story

supportive · positive

Debate

20/100

Key topics

Git

Merge Conflicts

Syntax-Aware Merging

Mergiraf is a syntax-aware merging tool for Git that can simplify developers' lives by automatically resolving some of the merge conflicts that Git's line-based merge leaves behind, and the community is generally enthusiastic about its potential benefits.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

15m

Peak period

Day 10

Comment distribution: 45 data points

Based on 45 loaded comments

Key moments

01 Story posted
Nov 3, 2025 at 9:54 AM EST
2 months ago
02 First comment
Nov 3, 2025 at 10:10 AM EST
15m after posting
03 Peak activity
30 comments in Day 10
Hottest window of the conversation
04 Latest activity
Nov 17, 2025 at 4:55 AM EST
about 2 months ago

Discussion (45 comments)

pavelai

2 months ago

2 replies

Very impressive enhancement, though not a panacea. It uses a tree-sitter-based approach to handle situations where two users change the same line of code. For example, if one changes a function name and the other adds a new argument, it will merge them without conflicts. It still has some trouble with complex cases, since it can't know the authors' intentions, but it can significantly simplify developers' lives. I'm not sure it will land in Git very soon, since that would require Git to know all the parsers you need. But it's definitely worth adding.

1718627440

about 2 months ago

1 reply

What does it do when the change in function name means that the number of spaces before each parameter (alignment) changes?

pavelai

about 2 months ago

1 reply

If one developer changed the function name and the other changed the alignment of parameters on the same line, this tool would recognize both changes and merge the line without conflicts. Git's regular algorithm would turn it into a conflict because the changes happened on the same line.

1718627440

about 2 months ago

1 reply

The idea is that the alignment and function name changes happen on the same side, since the alignment is caused by the function name. The other side, e.g., adds another parameter. Does the new parameter get the correct alignment, or that of the old function name?

pavelai

about 2 months ago

It would be merged into valid code. As for alignment in particular, I'm not sure.

Velocifyer (Author)

2 months ago

This is a separate tool that one can tell Git to use.

mnemonet

about 2 months ago

1 reply

This is a very interesting idea that could save a lot of time and pain in big projects.

The example shown reminds me of Zed's CRDTs [1] and their journey to build a fine-grained version control system for agentic development [2]. I imagine this work could prove useful to the Zed/Cursor team, and it likely shares a lot of functionality with DeltaDB [2].

- [1]: https://zed.dev/blog/crdts

- [2]: https://zed.dev/blog/sequoia-backs-zed

vinnyhaps

about 2 months ago

1 reply

I’m pretty sure one of the Zed founders wrote tree-sitter, so I’m sure there’s some overlap

It’s really cool to see tree-sitter unlock so many of these use cases. I love using [difftastic] as my diffing tool to get context-aware diffs. In the example from the article, the diff would highlight the `void` and `int` changes with a heavier red and green background, respectively.

[difftastic]: https://github.com/Wilfred/difftastic

conartist6

about 2 months ago

1 reply

Max Brunsfeld, in fact, yep. He moved to Zed from the Atom team.

But curiously, Zed hasn't been very interested in Tree-sitter. They don't seem to see it as having much strategic value to their company, which is odd because lots of other people do see it as a valuable platform. You have Tweag building code formatting on it, you had GitHub building stack graphs on it, you have Mergiraf. You even have some really "out there" stuff like the Software Evolution Library!

olejorgenb

about 2 months ago

1 reply

They use it quite a bit in Zed though. What do you count as "not very interested"?

conartist6

about 2 months ago

1 reply

It comes down to Tree-sitter being the heart of a semantic IDE. If you use Tree-sitter's data to apply a fix for a formatting problem or a lint error, you are making a semantic edit to your code: you aren't describing that change in terms of a line/column in a text buffer, but first in terms of the path to the node you wish to adjust in the syntax tree and the semantic rules used to target it.

Zed doesn't want to build a semantic IDE. They've said it a million times: they want to build a text editor, so they just aren't going to put the tree representation at the center of the experience. A text editor's UX is built around the text buffer, so that it emulates the experience of coding while sitting at a typewriter filling out punch cards. We can do better than the typewriter as the anchoring metaphor for all UX!

I think those projects I listed that build on top of Tree-sitter (all ignored by Zed) all see the potential of semantic changes and of Tree-sitter as a platform for making them.
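To make the "semantic edit" idea concrete, here is a small sketch that uses Python's standard `ast` module as a stand-in for Tree-sitter (an assumption for illustration; Tree-sitter's own API is not shown). The edit targets the function-definition node by its name rather than by a line and column in the text. `rename_function` is a hypothetical helper and needs Python 3.9+ for `ast.unparse`.

```python
import ast


def rename_function(source: str, old: str, new: str) -> str:
    """Rename a function definition by locating its syntax-tree node.

    Hypothetical helper: only the `def` is renamed; call sites are left alone.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Target the node by kind and name, not by its position in the buffer.
        if isinstance(node, ast.FunctionDef) and node.name == old:
            node.name = new
    # Regenerate source text from the edited tree.
    return ast.unparse(tree)
```

Note that `ast` is an abstract syntax tree, so `ast.unparse` discards comments and the original formatting; keeping every token is one reason tools in this space work on concrete syntax trees like the ones Tree-sitter produces.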

conartist6

about 2 months ago

Think about it. Tree-sitter is an IDE.

I don't mean a standalone syntax highlighter, I mean it's a whole environment in which you can write software and in which things integrate. An Integrated Development Environment.

But Zed doesn't want that product. That product, if they cared about owning it, would compete with Zed.

ltbarcly3

about 2 months ago

2 replies

claude "resolve merge conflicts"

littlestymaar

about 2 months ago

1 reply

Using 30s worth of H100 GPU instead of <10ms worth of an entry-level CPU, for a worse result.

Well done.

ltbarcly3

about 2 months ago

"Compositing text into graphical data to display it on a 2D array of millions of 32bit RGB pixels instead of just using a pencil and a 50 cent notebook."

Actually I've done this a hundred times now and it has yet to make a single mistake. I don't give a crap how much GPU it uses, grandpa.

Cthulhu_

about 2 months ago

1 reply

OK, I'm going to try and resolve these merge conflicts for you!

First, let me pull up the diff and git status

......

....

...

Hmm, that didn't quite work, let me try that again!

ltbarcly3

about 2 months ago

1 reply

I've resolved hundreds of conflicted merges this way and I don't remember it making a single mistake.

n4r9

about 2 months ago

1 reply

Might that be because LLMs are potentially negatively impacting your memory?

ltbarcly3

about 2 months ago

Yea that's probably it. Or you're wrong? One of those for sure.

sysguest

about 2 months ago

1 reply

finally...

I've been putting one argument per line to avoid most conflicts.

Cthulhu_

about 2 months ago

I've been doing some SQL again and one technique I learned years ago was having each thing on its own line, both to reduce churn in version control and allow for easier reordering and commenting out.

Instead of

    SELECT foo, bar, quux FROM baz WHERE storge = 'grault';

write

    SELECT
       foo
      ,bar
      ,quux
    FROM
      baz
    WHERE
      storge = 'grault'
    ;

It's pretty hideous in this example, but for bigger queries maintained over a long period of time it can be beneficial. At least I assume so; it's been nearly 20 years since I did anything more serious with SQL.

mentalgear

about 2 months ago

2 replies

- Related, in its fine-grained diffing approach: Git heatmap, a diff viewer for code reviews

> Heatmap color-codes every diff line/token by how much human attention it probably needs. Unlike PR-review bots, we try to flag not just by “is it a bug?” but by “is it worth a second look?” (examples: hard-coded secret, weird crypto mode, gnarly logic).

https://0github.com/

worldsayshi

about 2 months ago

1 reply

Hmm, it would be nice to just see a heatmap over how many times a line has been changed. There must be some easy-ish way to do that right?

Cthulhu_

about 2 months ago

I think you'd need to write a tool that goes through all revisions of a file and does a count, but if that's cached then it's doable. There are a few tools to view that per file, though, including some Git commands; it's a valuable way to determine which files are edited the most (see also the term "churn").
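A minimal per-file version of that count, assuming `git` is on the PATH: `git log --numstat --format=` prints added and deleted line counts for every file touched by every commit, and the sketch sums them. It measures per-file churn rather than the per-line heatmap asked about, and renamed files may appear under git's rename notation instead of being followed. The function names are hypothetical.

```python
import subprocess
from collections import Counter


def churn_from_numstat(numstat_output: str) -> Counter:
    """Sum added + deleted lines per path from `git log --numstat` output."""
    churn: Counter = Counter()
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank separator lines between commits
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        churn[path] += int(added) + int(deleted)
    return churn


def repo_churn(repo_dir: str = ".") -> Counter:
    """Run git in repo_dir and return per-file churn (raises if not a repo)."""
    out = subprocess.run(
        ["git", "log", "--numstat", "--format="],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return churn_from_numstat(out)
```

Usage: `repo_churn().most_common(10)` lists the ten most-edited files.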

Valodim

about 2 months ago

The idea is cool, but boy does it make you blind to anything the AI doesn't deem noteworthy. It comes down to whether you trust a human reviewer or the LLM more.

paulirish

about 2 months ago

2 replies

Have been using Mergiraf for the past 4 months. It's automatically solved about 70% of my conflicts and, luckily, I've never contested any of them. Pretty pleased.

goku12

about 2 months ago

1 reply

> luckily, I've never contested any of them.

That's to be expected. The philosophy behind git merges is that git will merge only if it is absolutely and unambiguously sure that the resolution is correct, that is, when there is only one solution for the merge. It will just throw its hands up and leave it to the developer if there is any ambiguity, i.e. if there's more than one way to do the merge.

Every single chunk of merge is a potential conflict. But have you ever contested the regular merge algorithm (ort by default) when it did work? Like when the merge was fully successful, or the successfully merged chunks within a conflicted merge? You can expect the same experience with any merge algorithm that sticks to the git philosophy of being a git [1]. Problems will happen only if they start using some complex heuristics or LLM or something unpredictable like that for the merge.

> It's automatically solved about 70% of my conflicts

At the risk of explaining the obvious, I'm going to try to explain this. (So please don't get angry at me if you already know this.) Imagine that you're trying to manually merge 2 branches without any sort of merge algorithm. For the first case, assume that you don't know the programming language (imagine that it's in some foreign script). All you have to go by is the record of how each line changed in each branch relative to their common ancestor. The best 'dumb' strategy you can use is the 3-way merge [2]. The referenced page illustrates this, and clearly shows the advantage of the 3-way merge algorithm over the traditional 2-way merge that we are all familiar with.
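A toy version of that 'dumb' line-based 3-way merge, for illustration only: it assumes the three versions have the same number of lines (edits only replace lines in place), whereas real implementations, git's included, first align the versions with a diff so insertions and deletions are handled too. `merge3` is a hypothetical name.

```python
from typing import List, Tuple


def merge3(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], int]:
    """Toy line-based 3-way merge; returns (merged lines, number of conflicts).

    Assumes every version has the same number of lines, so lines can be
    aligned by index instead of by a diff.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged: List[str] = []
    conflicts = 0
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)  # both sides agree, or only "ours" changed the line
        elif o == b:
            merged.append(t)  # only "theirs" changed the line
        else:
            conflicts += 1    # both sides changed the same line differently
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged, conflicts
```

With the example from the top of the thread (base `int foo(int a) {`, one side renaming `foo` to `bar`, the other adding `int b`), both sides change the single line, so this merge reports one conflict even though the two edits touch different tokens.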

But this method still has a disadvantage. You are looking at the source files simply as a bunch of lines, without knowledge of their more granular structures like the syntax. (Note: even the assumption that a file is made of lines may be wrong. That's why merges, and git in general, don't work well on binary files.) At best, all you can hope for is that the two branches don't contain any edits on the same or adjacent lines. You won't even know the order in which the lines should be arranged. Then you have a conflict: a merge that you're leaving for someone else to solve.

Now assume a second case. You know the programming language this time, but you have no idea what the program does; it's not your project. Even with that limitation, you'll still be able to do a better job than just comparing the lines blindly. The Mergiraf docs have a page full of these examples [3]. You can see how obvious the merges look; there is no way you can go wrong. See if you can resolve them just by looking at the lines. That's why mergiraf gives you much better results without any errors.
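A toy illustration of why knowing the syntax helps: if the signature from the top of the thread is already parsed into separate fields, the 3-way rule (keep a value both sides agree on, take whichever side changed it, fail if both changed it differently) can be applied per field instead of per line. The `Signature` type and the flat field-wise merge are hypothetical simplifications; Mergiraf itself matches Tree-sitter syntax trees and is considerably more involved.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class Signature:
    """Hypothetical pre-parsed function signature (not Mergiraf's data model)."""
    return_type: str
    name: str
    params: Tuple[str, ...]


def pick(base, ours, theirs):
    """Apply the 3-way rule to a single syntax element."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    raise ValueError(f"conflict: {ours!r} vs {theirs!r}")


def merge_signature(base: Signature, ours: Signature, theirs: Signature) -> Signature:
    # Merge each field on its own, so edits to different fields never collide.
    merged = {
        f.name: pick(getattr(base, f.name), getattr(ours, f.name), getattr(theirs, f.name))
        for f in fields(Signature)
    }
    return Signature(**merged)


def render(sig: Signature) -> str:
    """Print the signature back as C-like source text."""
    return f"{sig.return_type} {sig.name}({', '.join(sig.params)})"
```

Here a rename on one side and a new parameter on the other merge to `int bar(int a, int b)`. Because `params` is merged as one tuple, two sides adding different parameters would still conflict in this sketch; a real structured merge would descend into the parameter list.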

There is of course a deeper level of knowledge - the semantic level. The knowledge of what the program does. You need that knowledge to resolve 100% of the merges. And that ultimate merge algorithm is ... you.

> Pretty pleased.

Understandable. But I see a potential problem here. As you are aware, the files to submit to mergiraf are specified in the gitattributes file. There are two ways this can go wrong. First, someone else working on your repo may not have, or even know about, mergiraf. Second, and an even bigger problem, some people have global gitattributes files [4] where they place their default attributes. It's possible to set up mergiraf there, but if you do so, your colleagues may not have a clue why certain merges succeed for you but fail for all of them.

The above problem becomes a bigger issue because merge and rebase conflicts sometimes reappear in later merges or rebases. If that's something mergiraf can solve and you have it, then everything's fine. But if the conflict reappears for someone without mergiraf, they will have to repeat the manual resolution again and again. This happens because git simply won't commit a merge or rebase until we resolve the conflict manually, so git has no idea what we did in between to resolve it; that is not recorded anywhere. (Well, git-rerere [5] records it if we ask it to. But that's a local-only solution; everyone will still have to resolve it once on their own system.)

There is actually a known solution to the problem, called 'first-class conflicts' [6]. The idea is to record the conflicts and their resolutions in the repo itself (the same info that rerere stores, but in the shared repo). A conflict, once resolved, will not come back, because the structured information to resolve it is available in the repo. So not everyone needs mergiraf, and nobody needs to repeat a completed manual resolution. It has other advantages too: you can just continue working after a conflicted merge and leave the resolution for later, or send the conflicts to someone more specialized in that area of the code.
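The core of what gets recorded, whether locally by rerere or in the repo by a first-class-conflict system, can be sketched as a map from a conflict fingerprint to its resolution. This is a hypothetical in-memory model, not the storage format used by git rerere, Jujutsu, or Pijul. Sorting the two sides before hashing is a design choice in this sketch, so that "ours" and "theirs" swapping places (as they do between a merge and a rebase) still finds the same resolution.

```python
import hashlib
from typing import Dict, Optional


def conflict_id(ours: str, theirs: str) -> str:
    """Fingerprint a conflict; sides are sorted so their order doesn't matter."""
    first, second = sorted([ours, theirs])
    return hashlib.sha256((first + "\0" + second).encode("utf-8")).hexdigest()


class ResolutionStore:
    """Toy record of conflict resolutions (hypothetical, not git's format)."""

    def __init__(self) -> None:
        self._resolutions: Dict[str, str] = {}

    def record(self, ours: str, theirs: str, resolution: str) -> None:
        # Remember how a human resolved this exact pair of conflicting hunks.
        self._resolutions[conflict_id(ours, theirs)] = resolution

    def replay(self, ours: str, theirs: str) -> Optional[str]:
        # Return the earlier resolution if this conflict was seen before.
        return self._resolutions.get(conflict_id(ours, theirs))
```

In a first-class-conflict design, a store like this would live in the shared repository rather than on one developer's machine, which is the difference the comment above is pointing at.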

I have seen this feature in Jujutsu [6] and Pijul [7]. Git probably doesn't have it because the idea wasn't around when Git was developed. But Jujutsu uses the git repository format, and they somehow managed to implement first-class conflicts on it. Meanwhile, the concept is already there in git as rerere. So perhaps first-class conflicts are possible in Git too. It would be awesome to have that in Git. So if anybody who sees this knows how to do it, please, please take it up as a wish!

[1] https://github.com/git/git/blob/e83c5163316f89bfbde7d9ab23ca...

[2] https://blog.git-init.com/the-magic-of-3-way-merge/

[3] https://mergiraf.org/conflicts.html

[4] https://git-scm.com/docs/git-config#Documentation/git-config...

[5] https://git-scm.com/docs/git-rerere

[6] https://jj-vcs.github.io/jj/latest/conflicts/

[7] https://pijul.com/manual/why_pijul.html#modeling-conflicts

1718627440

about 2 months ago

1 reply

> But have you ever contested the regular merge algorithm (ort by default) when it did work?

Depends on what you mean by 'contested', but yes. You can have "merge conflicts" that are even correct as far as the syntax is concerned but are garbage on a semantic level.

goku12

about 2 months ago

1 reply

I'm not talking about the conflicts. I'm talking about the hunks that were resolved successfully. Sometimes they're part of successful merges. Sometimes they're part of conflicted merges where some other hunk was in conflict.

1718627440

about 2 months ago

Me too. A merge can be entirely without merge conflicts and still wrong, because it has (semantic or architectural) "merge conflicts".

Sesse__

about 2 months ago

This is my experience as well. Not a gamechanger, but definitely on the positive side.

gritzko

about 2 months ago

1 reply

> After extracting a list of every merge conflict in the kernel's Git history, I tried using Mergiraf to resolve them. 6,987 still resulted in conflicts, but 428 were resolved successfully. A much larger fraction of merge conflicts were still partially resolved.
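For scale, assuming the 6,987 unresolved and 428 resolved cases together make up all the kernel conflicts that were tried, the fully resolved share is:

$$\frac{428}{6987 + 428} = \frac{428}{7415} \approx 0.058 \approx 5.8\%$$

That is far below the roughly 70% reported above for one developer's day-to-day conflicts, though the quote notes that a much larger fraction of the kernel conflicts were partially resolved.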

bjackman

about 2 months ago

1 reply

Take this with a grain of salt as I haven't tested this claim, but I think C might be a pretty weak language for this tool because you can't really parse it without running the whole preprocessor, which it can't do:

https://codeberg.org/mergiraf/mergiraf/issues/612#issuecomme...

So I think in a more sensible language you might get much better results than this.

gritzko

about 2 months ago

Another aspect is the fact this repo reflects Torvalds’ view of the world. He operates in large-ish changesets.

scoodah

about 2 months ago

1 reply

Way back in the day, when I primarily wrote C#, I used a tool called SemanticMerge. It was pretty cool: it actually parsed the code and could pick up refactors like moving a method to a different class and whatnot. This reminds me of it a bit.

Cthulhu_

about 2 months ago

Yeah, the article mentions a similar project for Java; I'm a bit surprised / disappointed that there aren't more language-specific merge tools, tbh, or a super-tool with plugins for individual languages. Maybe this article will attract more attention, though.

1718627440

about 2 months ago

1 reply

> Therefore, this merge conflict can be resolved automatically by putting the lines in any order. The resulting merged program has the same behavior either way.

That means that if I, the programmer, care about the order, I must now review lines where no merge conflict is indicated. I am not sure I would like that.

PoignardAzur

about 2 months ago

Yeah, that's a bad example, there's a bunch of ways field order matters in Rust.

Import order would have been a better example (they're always supposed to be sorted).
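The import-order point can be sketched: if a block of imports is always kept sorted, each side's additions and removals can be merged as a set and the order recomputed afterwards, so there is nothing left to conflict over. This uses plain lexicographic sorting as an assumption; a formatter such as rustfmt may order `use` items differently. `merge_sorted_imports` is a hypothetical helper.

```python
from typing import List


def merge_sorted_imports(base: List[str], ours: List[str], theirs: List[str]) -> List[str]:
    """Merge an always-sorted import block by treating its lines as a set.

    A line survives if it was in base and neither side removed it,
    or if either side added it. Sorting then fixes the order, so it
    doesn't matter which side's additions came first.
    """
    base_s, ours_s, theirs_s = set(base), set(ours), set(theirs)
    added = (ours_s - base_s) | (theirs_s - base_s)
    removed = (base_s - ours_s) | (base_s - theirs_s)
    return sorted((base_s - removed) | added)
```

Unlike the struct-field case, this is safe precisely because the order carries no meaning: any order the merge picks would be re-sorted anyway.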

indentit

about 2 months ago

1 reply

I tried using Mergiraf a year or so ago and ran into so many weird problems, which I eventually tracked down to it, that I disabled and uninstalled it and never looked back. It was more hassle than it was worth.

0x7cfe

about 2 months ago

What kind of problems did you encounter? Could you provide an example?

James_K

about 2 months ago

Very interesting to see Tree-sitter starting to get used for more things.

virajk_31

about 2 months ago

I really liked the last section of your article, thanks for the numbers

jayd16

about 2 months ago

I wish there were a lot more syntax-aware merges built into git (et al.). Why are separate columns on the same row of a CSV, or multiple appends to a list (in any language where you don't want a trailing comma), so annoying to merge?
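A sketch of what a CSV-aware merge could do for the "separate columns on the same row" case: parse each version with Python's `csv` module and apply the 3-way rule cell by cell (keep a value both sides agree on, take whichever side changed it, fail if both changed it differently). Git's default text merge conflicts here because both edits touch the same line. The sketch assumes all three versions have the same rows and columns in the same order; added, removed, or reordered rows would need real alignment. `merge_csv_cells` is a hypothetical name.

```python
import csv
import io


def merge_csv_cells(base: str, ours: str, theirs: str) -> str:
    """Cell-by-cell 3-way merge; assumes identical row/column layout."""
    rows = [list(csv.reader(io.StringIO(text))) for text in (base, ours, theirs)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for b_row, o_row, t_row in zip(*rows):
        merged_row = []
        for b, o, t in zip(b_row, o_row, t_row):
            if o == t or t == b:
                merged_row.append(o)  # agreement, or only "ours" changed the cell
            elif o == b:
                merged_row.append(t)  # only "theirs" changed the cell
            else:
                raise ValueError(f"conflicting edits to cell: {o!r} vs {t!r}")
        writer.writerow(merged_row)
    return out.getvalue()
```

For example, one side capitalizing a name and the other changing a quantity on the same row merge cleanly, because they touch different cells.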

It could be so much better.

Valodim

about 2 months ago

FYI, it comes configured in jj by default. Just `jj resolve --tool mergiraf` and some conflicts go away :)

KuhaLeyka

about 2 months ago

Don't use it when you write code for critical infrastructure or aviation, please. :)

ID: 45799664 · Type: story · Last synced: 11/20/2025, 8:47:02 PM
