The Rsync Algorithm (1996) [pdf]

Posted8 days agoActive2d ago

vortex_ape

186 points

30 comments

andrew.cmu.eduResearchstory

informativeneutral

Debate

0/100

RsyncAlgorithmScience

Key topics

Rsync

Algorithm

Science

The 1996 paper on the rsync algorithm has sparked a lively debate about the state of security back then, with some commenters arguing it was "pretty fucking bad" [PunchyHamster], while others counter that certain systems and protocols were actually quite secure, citing the existence of PGP, OpenBSD, and Apache [axiolite]. The discussion reveals a nuanced picture, with some pointing out that while certain OSes and protocols were insecure, others were prioritizing security, and that even today, similar security issues persist, as seen in AWS and CloudFlare outage postmortems [gritzko]. As commenters reminisce about the past, they highlight the contrast between then and now, from 6-character passwords to the laborious process of generating entropy for SSH installations [jrpelkonen]. The thread feels relevant today as it underscores that, despite progress, some security challenges remain stubbornly persistent.

Snapshot generated from the HN discussion

Discussion Activity

Active discussion

First comment

Peak period

12-24h

Avg / period

7.1

Comment distribution50 data points

Loading chart...

Based on 50 loaded comments

Key moments

01Story posted
Jan 2, 2026 at 11:56 AM EST
8 days ago
Step 01
02First comment
Jan 2, 2026 at 2:56 PM EST
3h after posting
Step 02
03Peak activity
17 comments in 12-24h
Hottest window of the conversation
Step 03
04Latest activity
Jan 8, 2026 at 12:52 PM EST
2d ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (30 comments)

Showing 50 comments

doodlesdev

8 days ago

2 replies

Well-written, succinct.

This small document shows what computer science looked like to me when I was just getting started: a way to make computers more efficient and smarter, to solve real problems. I wish more people who claim to be "computer scientists" or "engineers" would actually work on real problems like this (efficient file sync) instead of having to spend time learning how to use the new React API or patching the f-up NextJS CVE that's affecting a multitude of services.

PunchyHamster

8 days ago

3 replies

to be fair level of security of systems back then was pretty fucking bad

observationist

8 days ago

3 replies

6 characters or fewer passwords, if there were passwords at all. Phreaking still worked into the 90s, and all sorts of really stupid things were done without really thinking about the security at all. They'd print out receipts with the entire credit or debit card number and information on it, or carbon copy the card with an impression, and you'd see these receipts blowing around parking lots, or find entire bags or dumpsters full of them. Knowing an IP address might be sufficient information to gain access to systems that should have been secured. It's pretty amazing that things functioned as well as they did, that society was as trusting and trustworthy as it was, that we were able to build as much as we did with as relatively a tiny level of exploitation that happened.

If the same level of vulnerability was as prevalent today as it was back then, civilization might collapse overnight.

SoftTalker

7 days ago

We still print the routing and account number in full on paper checks, and that's all that's needed to do an ECH transaction. Yet there's not an alarming level of ECH fraud.

mjevans

8 days ago

To be fair, back then it was relatively easy for anyone intelligent enough to be able to abuse any of that to have a well paying 'white collar' job with things like full health benefits, a pension, and more than sufficient income to support an entirely family SOLO. They even owned houses!

When your life is set like that why risk trying to defraud someone a the cost of a nice suit when that's something that can be done legally and written off as a business expense on taxes?

gritzko

8 days ago

Just read AWS or CloudFlare outage postmortems and you will see: are still there, in the happy land.

axiolite

8 days ago

2 replies

In 1996? OpenBSD and Apache had been around for a year. PGP had been around for several years. HTTPS was used where needed. SecurID tokens were common for organizations that cared about security.

Admittedly SSH wasn't around, but kerberos+rlogin and SSL+telnet was available. Organizations who cared about security would have SecurID tokens issued to their employees and required for login.

Dial-in over phone lines, and requiring a password, was much less discoverable or exploitable than services exposed to the internet, today.

wmf

8 days ago

1 reply

And every machine had 100 RCEs that you could discover with a few hours of effort.

axiolite

7 days ago

Even back in 1996, OpenBSD emphasized security. By 2000 they claimed "Three years without a remote hole in the default install!" at the very top of their website. Qmail was released in Dec 1995 and its security withstood scrutiny for quite a lot of years. I'd be interested in seeing just how many RCEs a modern security researcher could actually come up with from a 1996 release of BSDi, OpenBSD, Solaris, AIX, etc. I'd bet on just a handful.

I can understand how, if your whole world was Windows 3.1 and 95, you'd feel that way about security at the time.

jrpelkonen

7 days ago

1 reply

SSH was around, but not nearly as pervasive it is today. I have memories of having to shake my mouse around during the windows client installation to generate entropy. Fun times

axiolite

7 days ago

1 reply

I believe your recollection is off by several years...

What you're describing is PuttyGen. According to Wikipedia, the first Putty release was in 1999. Archive.org doesn't have any snapshots of the Putty website before 2000, so that checks-out.

The RSA patent didn't expire in the US until September 2000, so that's when free implementations like OpenSSH first became widely available. That's precisely when I started using it...

The original SSH was first released mid-1995. There would have been a small number of installations in 1996, but absolutely negligible. It was not well-known until later, circa 2000.

jbotz

7 days ago

2 replies

> There would have been a small number of installations in 1996, but absolutely negligible.

On HN there's always a good chance you're talking to some of the people involved in those "negligible" installations. I know that I submitted some patches to Tatu Ylönen for Ssh to compile on Ultrix, so that must have been in 1995 or early 1996 because after that I didn't have access to any Ultrix machines. I may have been an early adopter, but it didn't take long for ssh to take over the world, at least among Unix system administrators; at Usenix within a year everybody was using ssh because there wasn't any alternative and in terms of security it was a life-saver.

As for the RSA patent... I don't know what license the original Ssh was released under, but it was considered "freeware" when it came out and nobody cared about the US RSA patent. Maybe technically in the USA you shouldn't have used it? Nobody cared.

And the mouse-jiggling thing... not specifically a PuttyGen thing. On linux /dev/random device gave you a few bits at a time stingily, only after it had enough entropy, so it was common for programs that needed good randomness to ask you to jiggle the mouse because that was one of the sources of entropy, so bits random bits would come faster. I'm pretty sure that was still the case well into the Zips.

itsthecourier

6d ago

so I was running a SVN server in a decommissioned PC somewhere in a startup as an intern. whole company ends up using it and out of nowhere it used to freeze, I would go to check if it had rebooted or crashed and everything was fine.

it fixed by itself, without any fixes from my part. happened many times.

asked for help to a senior, guy ran strace and found a read waiting in /dev/random. and of course it solved by itself any time I checked because I was moving the mouse!

controversially but acceptably, we had linked it to urandom and move on

how fast that guy used strace and analyzed the syscalls inspired me to be better at linux

axiolite

6d ago

> it didn't take long for ssh to take over the world

That doesn't seem to be accurate. Wikipedia says, by the end of "2000 the number of users had grown to 2 million"

> everybody was using ssh because there wasn't any alternative

I already listed TWO of the most popular alternatives.

> the mouse-jiggling thing... not specifically a PuttyGen thing. On linux

Parent specifically said "windows client installation." Putty was very common on Windows. PuttyGen specifically and prominently told the user to move their mouse... etc. etc.

mmh0000

8 days ago

No, it wasn't.

Certain Operating Systems (M$) were very bad. Certain protocols were designed without security in mind (smtp, telnet).

But if you were a l33t hax0r back in the day, secure options for everything were available. By 1999 we had BSD Jails (i.e. Docker of today) and chroots (somewhat prone to bad setups), well before that.

https://en.wikipedia.org/wiki/OpenBSD#Security_record

  - Five years without a remote hole in the default install!

cobertos

8 days ago

1 reply

If only those who claim to be "managers" enabled those "engineers" to do such work, but it's not in their interest to their product, their bottom line, or their performance review. At least in their mind.

UqWBcuFx6NV4r

8 days ago

…what? IC developers are a huge, huge contributor to the sort of over-complicated engineering and stack churn that’s at the heart of what’s being described here. Take an iota of responsibility for yourself.

craftkiller

8 days ago

4 replies

I've been using this extensively recently. I was setting up virtual machines that boot a live ISO containing all the software for the machine. I build that ISO with nix. Sometimes I need to change a small config file, which would lead to generating a new 1.7GiB ISO, but 99.9% of that ISO is identical to the previous one. So I used rsync. Blew my mind when after a day of working on these images, uploading 1.7GiB ISO after 1.7GiB ISO, wireguard showed that I had only sent 600MiBs.

Fun surprise, rsync uses file size and modified time first to see if the files are identical. Nix sets the time to Jan 1st 1970 for reproducible builds, and I suspect the ISOs are padded out to the next sector. So rsync was not noticing the new ISO images when I made small changes to config files until I added the --checksum flag.

axiolite

7 days ago

1 reply

You'll find something like BorgBackup will be far more efficient than rsync.

SoftTalker

7 days ago

1 reply

But rsync is widely available, usually installed by default on linux or unix-like systems. You can just use it.

axiolite

6d ago

Borg is available for download as a standalone binary, easily dropped onto any Linux system even with very limited privs. And in the repos of every distro easily installed and kept up-to-date.

By avoiding that one step and using rsync instead, you're resigning yourself to "send 600MiBs" over the network for every tiny config change. Not a good trade-off.

yencabulator

6d ago

[delayed]

seb1204

8 days ago

In the past I downloaded daily diffs from iso which were only few MB. I then applied this diff to my iso from yesterday. Forgot the name of this tool though. I did this on my machine, if parent wants to update in a remote machine I'm not sure it works the same way.

alright2565

7 days ago

If it was a 100GB image on the other hand—good luck! It'd be faster to copy it from scratch every time than to use rsync.

teleforce

8 days ago

6 replies

Fun facts, the author of rsync, Andrew Tridgell, is also the one who reverse-engineered Microsoft SMB that laid the foundation for Samba [1].

How he did manage to avoid lawsuits from Microsoft is beyond me.

[1] Server Message Block:

https://en.wikipedia.org/wiki/Server_Message_Block

js2

8 days ago

1 reply

He also wrote a free BitKeeper client, antagonizing Larry McVoy, which is largely why we have git.

https://blog.brachiosoft.com/en/posts/git/

tekkk

7 days ago

Now that was an awesome blog post, thank you for linking!

webdevver

7 days ago

1 reply

>How he did manage to avoid lawsuits from Microsoft is beyond me.

MS probably chose not to shut down that effort on the basis that it was enabling the MS stack in Linux.

I wish I could dig up an internal presentation that was prepared in the 90s for Bill Gates at the time, which evaluated the threat posed by Linux to Microsoft. I think they were probably happy that Linux now had a reason to talk to Windows machines.

amiga386

7 days ago

1 reply

https://en.wikipedia.org/wiki/Halloween_documents -> https://www.gnu.org/software/fsfe/projects/ms-vs-eu/hallowee...

webdevver

6d ago

thats the one, thankyou for posting!

YesThatTom2

7 days ago

At first, MS didn’t mind as long as SAMBA only implemented the outdated older protocols.

Then they realized interoperability could make them more money, and they invited him and his team to Redmond for a week of working with MS engineers to understand the latest protocol versions. Oh wait, no, it was because the EU forced them. https://www.theregister.com/2007/12/21/samba_microsoft_agree...

oska

8 days ago

Australians might like to know he worked on rsync and Samba while a PhD student at the ANU

kvemkon

8 days ago

A protocol is not a software, it is needed for interoperability.

Similar with header files. Issues arise if there is a "misuse" to derive actually not a compatible but competing solution.

amiga386

7 days ago

He describes how he did it with a French Café analogy:

https://download.samba.org/pub/tridge/misc/french_cafe.txt

snvzz

8 days ago

1 reply

Besides Tridgell's venerable rsync, there exists a permissively licensed implementation[0] by openbsd.

0. https://www.openrsync.org/

irusensei

7 days ago

1 reply

Which is also the current version on MacOS. Thing is it doesn't seem to talk very well with the samba version of rsync. The OpenBSD implementation seems to be capped at the version 29 of the protocol.

When pulling data on MacOS from a Linux computer I would experiment hangs even when setting protocol versions to 29 or 28. What fixed for me was to just switch to the samba rsync program on MacOS.

ectospheno

5d ago

I believe openrsync exists just because of rpki.

https://news.ycombinator.com/item?id=43605846

Most openbsd people I know install the real version from ports.

imiric

8 days ago

1 reply

Rsync is one of my favorite programs. I use it daily. The CLI is a bit quirky (e.g. trailing slashes), but once you get used to it, it makes sense. And I really always use the same flags: `-avmLP`, with `-n` for dry runs.

One alternative I'd like to try is Google's abandoned CDC[1], which claims to be up to 30x faster than rsync in certain scenarios. Does anyone know if there is a maintained fork with full Linux support?

[1]: https://github.com/google/cdc-file-transfer

adrian_b

7 days ago

The same is true for me.

I always alias rsync to:

'/usr/bin/rsync --archive --xattrs --acls --hard-links --progress --rsh="ssh -p PORT -l USER"'

I almost never use any other program for file transfers between computers.

ssl-3

7 days ago

1 reply

The first time I got paid to use rsync was nearly 25 years ago. It provided for reasonably space-efficient, remote, versioned backups of a mail server, using hard links.

That mail server used maildir, which...for those who are not familiar: With maildir, each email message is a separate file on the disk. Thus, there were a lot of folders that had many thousands of files in them. Plus hardlinks for daily/weekly/whatever versions of each of those files.

At the time there were those who were very vocal about their opinion of using maildir in this kind of capacity, as it likened to abuse of the filesystem. And if that was stupid, then my use of hard links certainly multiplied that stupidity.

Perhaps I was simply not very smart at that time.

But it was actually fun to fit that together, and it was kind of amazing to watch rsync perform this job both automatically and without complaint between a pair of particularly not-fast (256kbps?) DOCSIS connections from Roadrunner.

It worked fine. Whenever I needed to go back in time for some reason, the information was reliably present at the other end with adequate granularity -- with just a couple of cron jobs, rsync, and maybe a little bit of bash script to automate it all.

xk3

7 days ago

1 reply

> there were a lot of folders that had many thousands of files in them

If you ever need to do something like this again, it's often faster to parallelize rsync. One tool that provides this is fpsync:

https://www.fpart.org/fpsync/

jraph

7 days ago

1 reply

And you'd probably use the snapshot feature of a filesystem like btrfs or zfs instead of hardlinks for deduplication :-)

xk3

2d ago

Yes and something like btrfs-send or zfs-send is probably faster than fpsync

linsomniac

7 days ago

1 reply

If you want to do similar for block devices: https://github.com/rolffokkens/bdsync

I use it to back up a few virtual machines that, in the event of a site loss, would be difficult to rebuild but also critical to getting our developers back to work. I take an LVM snapshot of the VM, then use bdsync to replicate it to our backup server, and from there I replicate it off to backblaze, then destroy the snapshot.

OrangeDelonge

5d ago

1 reply

How does this compare to drbd?

linsomniac

4d ago

DRBD is more of a live sync, and it's great stuff, as long as you set it up BEFORE you need it, and you need it frequently. If you want to keep a second copy of your data on another system, up to the second(ish), it's a great choice.

If, however, you just want a copy of a block device on another system, like for weekly backup (our case), it's probably overkill. Especially as to keep it truly consistent you need to run in the mode where writes are acked only once the remote AND local devices have it.

My VMs are running on ganeti, which has a mode where the backing device can be DRBD and written to another host. Which works great if you have the extra disc space and can deal with the latency. Also allows you to live migrate VMs between the two hosts.

In my case I ultimately want the copy off-site, so DRBD isn't really a great fit.

DRBD is very good stuff though, I've used it for decades for HA database servers and the like.

bix6

8 days ago

Funny timing, I just used this today while setting up my NAS

cyanydeez

6d ago

has anyone see rsync or similar implemented in WASM for the browser?

mickael-kerjean

7 days ago

In section 6: "tar files ... of the Linux kernel sources ... version ... 1.99.10 ... are approximately 24MB in size ... Out of the 2441 files in the 2.0.0 release 291 files had changed"

It never crossed my mind Linux at some point only had 2441 files and you could actually parse the code that went through a new version, that time has sailed

alex1138

7 days ago

1996 is not that long ago as Unix goes but it's fun to know as I browse with my Debian computer that I'm using something (derived from Unix but BSD also isn't original Unix either) that has a long tradition

View full discussion on Hacker News

ID: 46466734Type: storyLast synced: 1/3/2026, 6:00:43 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN