Memory Integrity Enforcement
Apple's new Memory Integrity Enforcement (MIE) technology aims to prevent memory corruption attacks, with commenters discussing its implications for security, performance, and user freedom.
Snapshot generated from the HN discussion
This is great, and a bit of a buried lede. Some of the economics of mercenary spyware depend on chains with interchangeable parts, and countermeasures targeting that property directly are interesting.
Not to mention the dynamic linker.
Well, Apple already routinely forces developers to recompile their applications, so if Apple wants to introduce something needing a compiler / toolchain update, they can do that easily. They also control the entire SoC from start to finish and, unlike pretty much everyone else, hold an ARM architecture license, so they can change whatever they want on the hardware side as well.
They also imply a very different system architecture.
Why would you need MTE if you have CHERI?
But here’s a reason to do both: CHERI’s UAF story isn’t great. Adding MTE means you get a probabilistic story at least
Overall my _personal_ opinion is that CHERI is a huge win at a huge cost, while MTE is a huge win at a low cost. But, there are definitely vulnerability classes that each system excels at.
And CHERI fixes it only optionally, if you accept having to change a lot more code
> We have used CHERI’s ISA facilities as a foundation to build a software object-capability model supporting orders of magnitude greater compartmentalization performance, and hence granularity, than current designs. We use capabilities to build a hardware-software domain-transition mechanism and programming model suitable for safe communication between mutually distrusting software
and https://github.com/CTSRD-CHERI/cheripedia/wiki/Colocation-Tu...
> Processes are Unix' natural compartments, and a lot of existing software makes use of that model. The problem is, they are heavy-weight; communication and context switching overhead make using them for fine-grained compartmentalisation impractical. Cocalls, being fast (order of magnitude slower than a function call, order of magnitude faster than a cheapest syscall), aim to fix that problem.
This functionality revolves around two functions: cocall(2) for the caller (client) side, and coaccept(2) for the callee (service) side. Underneath they are implemented using CHERI magic in the form of CInvoke / LDPBR CPU instruction to switch protection domains without the need to enter the kernel, but from the API user point of view they mostly look like ordinary system calls and follow the same conventions, errno et al.
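A loosely sketched illustration of the colocated-call model the quotes describe. The real prototypes live in CheriBSD's experimental colocation work; the names, signatures, and stub below are assumptions included only to show the intended shape of the API, not the actual interface:

```c
/*
 * Hypothetical sketch of the cocall/coaccept model quoted above.
 * The signatures and the stub are assumptions for illustration.
 */
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Stub standing in for the real cocall(2), which switches protection
 * domains in userspace via CInvoke/LDPBR instead of trapping into the
 * kernel, while keeping ordinary syscall conventions (return -1 and
 * set errno on failure). */
static int cocall(const char *service, void *buf, size_t len) {
    (void)service; (void)buf; (void)len;
    errno = ENOSYS;  /* no CHERI hardware in this sketch */
    return -1;
}

int main(void) {
    char msg[64] = "ping";
    /* Per the quoted numbers: roughly 10x the cost of a function
     * call, roughly 1/10 the cost of the cheapest syscall. */
    if (cocall("echo-service", msg, sizeof msg) == -1)
        perror("cocall");
    return 0;
}
```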
There's a decent chance that we get back whatever performance we pay for CHERI with interest as new systems architecture possibilities open up.
MTE helps us secure existing architectures. CHERI makes new architectures possible.
When I say that this optional feature would force you to change a lot more code, I’m comparing CHERI without intra-object overflow protection to CHERI with it.
Finally, 6 million lines of code is not that impressive. Real OSes are measured in billions
Sorry, I misinterpreted what you were saying. No, that's not with subobject bounds. If you want that then yes there is more incompatibility, because C does not have a good subobject memory model. That's not really because there's anything wrong with CHERI, it's just because the language itself is at odds in places with doing that kind of enforcement with any technology. But, if you're willing to incur that additional friction (as we do for our pure-capability kernel in CheriBSD), you can enable it, and it can protect against additional vulnerabilities that other security technologies fundamentally cannot. We even provide a sliding scale of subobject bounds enforcement, where each of the three levels restricts bounds in more cases at the expense of compatibility. The architecture gives you the flexibility to decide what software model you want to enforce with it.
> Finally, 6 million lines of code is not that impressive.
We have far more than that ported, that was just one case study done in a few months by one developer. FreeBSD alone is, by my very rough estimation cloc that excludes LLVM, about 14 million lines of C and C++ (yes, I'm not distinguishing architecture-specific code and all kinds of other considerations, but it's close enough and gives an order of magnitude for the purposes of this conversation), and we have FreeBSD ported. Not to mention our work on, say, Chromium and V8 (Chromium being another set of 10s of millions of lines of code, again tractable with the engineering effort of just a few members of our research group).
> Real OSes are measured in billions
Citation needed. The Linux kernel is only a bit over 40 million lines of code these days. Real systems may well approach the billions of lines of code running once you factor in all the libraries, daemons and applications running on top of it, but that is not all low-level OS code that needs the kind of porting an OS or runtime does. Even if it were a billion lines of code, though, extrapolating at 0.026% that would be 260 kLoC changed, which isn't that scary a number.
Even for V8, which is about the worst case you could possibly have (highly-stylised code written in a way that uses types in CHERI-unfriendly ways; a language runtime full of pointers; many (about 6?) different highly-optimised just-in-time compilers that embed deep knowledge of the ISAs and ABIs they are targeting and like to play games with pointers in the name of performance), we see (last I checked) ~0.8% LoC changed, or about 16k out of 2 million. The porting cost is real, but the numbers have never suggested to us it's at all intractable for industry.
I think it's two halves of the same coin and Apple chose the second half of the coin.
The two systems are largely orthogonal; I think if Apple chose to go from one to the other it will be a generational change rather than an incremental one. The advantage of MTE/MIE is you can do it incrementally by just changing the high bits the allocator supplies; CHERI requires a fundamental paradigm shift. Apple love paradigm shifts but there's no indication they're going to do one here; if they do, it will be a separate effort.
Maybe you've been confused by a description of how it works inside a processor. In early CHERI designs, capabilities were in different architectural processor registers from integers.
In recent CHERI designs, the same register numbers are used for capabilities and other registers. A micro-architecture could be designed to have either all registers be capability registers with the tag bit, or use register renaming to separate integer and capability registers.
I suppose a CHERI MCU for embedded systems with small memory could theoretically have tag pages in separate SRAM instead of caching main memory, but I have not seen that.
That’s strictly better, in theory.
(Not sure it’s practically better. You could make an argument that it’s not.)
There is a section in the technical reports that talks about garbage collection.
I don't think CHERI is currently being used with different privileged threads in the same address space.
With CHERI, there is nothing to guess. You either have a capability or you don't.
That's because the capability (tagged pointer) itself is what gives you the right to access memory. So you have to find all the capabilities pointing to a segment of memory and invalidate them. Remember, capabilities are meant to be copied.
Early work on CHERI (CHERIvoke) proposed a stop-the-world barrier to revoke capabilities by doing a full scan of the program's memory (ouch!) to find and invalidate any stale capabilities. Because that is so expensive, the scan is only performed after a certain threshold amount of memory has been freed. That threshold introduces a security / battery life trade-off.
That was followed by "Cornucopia", which proposed a concurrent in-kernel scan (with some per-page flags to reduce the number of pages scanned) followed by a shorter stop-the-world. In 2024 (just last year), "Reloaded" was proposed, which adds still more MMU hardware to nearly eliminate pauses, at the cost of 10% more memory traffic.
Unfortunately, the time between free and revocation introduces a short-but-not-zero window for UAF bugs/attacks. This time gap is even explicitly acknowledged in the Reloaded paper! Moreover, the Reloaded revocation algo requires blocking all threads of an application to ensure no dead capabilities are hidden in registers.
In contrast, with MTE, you just change the memory's tag on free, which immediately causes all formerly-valid pointers to the memory granule to become invalid. That's why you would want both: They're complementary.
* MTE gives truly instantaneous invalidation with zero battery impact, but only probabilistic spatial protections from attackers.
* CHERI gives deterministic spatial protection with eventually-consistent temporal invalidation semantics.
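To make the retag-on-free idea concrete, here is a minimal sketch using the ARM ACLE MTE intrinsics. The intrinsics themselves are real ACLE; the tiny two-function allocator shape around them is an assumption for illustration only:

```c
/*
 * "Retag on free" with ARM ACLE MTE intrinsics (compile with MTE
 * support, e.g. -march=armv8.5-a+memtag). Allocator shape is an
 * illustrative assumption.
 */
#include <arm_acle.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE 16  /* MTE tags memory in 16-byte granules */

/* Paint every granule of [p, p+len) with the tag carried in p. */
static void tag_region(void *p, size_t len) {
    for (size_t off = 0; off < len; off += GRANULE)
        __arm_mte_set_tag((char *)p + off);
}

/* Allocation: pick a random 4-bit tag, paint the memory with it,
 * and hand out the tagged pointer. */
void *mte_tag_alloc(void *raw, size_t len) {
    void *tagged = __arm_mte_create_random_tag(raw, 0);
    tag_region(tagged, len);
    return tagged;
}

/* Free: repaint with a fresh tag, excluding the old one, so every
 * stale pointer faults immediately on its next access. */
void mte_retag_free(void *tagged, size_t len) {
    uint64_t excl = __arm_mte_exclude_tag(tagged, 0);
    void *fresh = __arm_mte_create_random_tag(tagged, excl);
    tag_region(fresh, len);
}
```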
Yes, revocation is batched and asynchronous. This does mean that capabilities remain valid beyond the language-level lifetime of the allocation. However, that does not mean that, within that window, we have not dealt with any UAF attacks.

The vast majority of UAF attacks do not care about the fact that the memory has been freed, but rather that the memory has since been repurposed for something else (whether the allocator's own internal metadata or some other new allocation). Cornucopia (both versions) ensures that this does not happen until the next revocation pass; that is, it "quarantines" the memory. Effectively, when you call free, it's "as if" the free were deferred until revocation time. Therefore, if your capability is still valid, that memory is still only in use by you, and so the vast majority of attacks no longer work. This protects you against UAF in a similar way to how making free a no-op protects against most attacks.

This is not all attacks; very occasionally the bug is a result of something like undefined behaviour that follows, but I don't know if we've found even one real-world instance of a UAF that this approach isn't going to catch. I'm sure they exist, but the nuance is crucial here to be able to reason about the security of various models.
But yes, MTE+CHERI are complementary in this regard. We have drafted ideas for using MTE with CHERI, which would (a) let you immediately prevent access (noting though that the capability would remain valid for a while, still) (b) let you recycle memory with different MTE colours before needing to quarantine the memory (hoping that, by the time you run out of colours for that memory region, a revocation pass has reclaimed some of them). That is, in theory it both gives stronger protection and better performance. I say in theory because this is just a sketch of ideas, nobody has yet explored that research.
I also note that MTE does not fix the undefined behaviour problem; it will only trap when it sees a memory access, but vulnerabilities introduced due to compilers exploiting undefined behaviour for optimisation purposes may not perform a memory access with the pointer before it's too late.
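A small plain-C illustration (hypothetical names) of the reuse-based UAF pattern described above. The attack depends on the freed chunk being handed out again; quarantining until revocation removes exactly that step:

```c
/*
 * Reuse-based use-after-free pattern, illustrative only.
 */
#include <stdlib.h>
#include <string.h>

struct session {
    void (*on_close)(void);
    char name[24];
};

int main(void) {
    struct session *s = malloc(sizeof *s);
    if (!s) return 1;
    s->on_close = NULL;
    free(s);  /* s is now dangling */

    /* Attacker grooms the heap so a fresh allocation reuses s's
     * chunk, then fills it with controlled bytes... */
    char *spray = malloc(sizeof(struct session));
    if (spray) memset(spray, 0x41, sizeof(struct session));

    /* ...so a later s->on_close() call through the stale pointer
     * would jump to an attacker-chosen address. With quarantine,
     * spray receives different memory, and the stale read sees only
     * the old, harmless contents. (Not actually called here.) */
    (void)s;
    free(spray);
    return 0;
}
```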
I also think this argument is compelling because one exists in millions of consumer devices, soon to be more (MTE -> MIE), and one does not.
Okay a bit drastic, I don’t really know if this will affect them.
That's Apple and here is Google (who have been at memory safety since the early Chrome/Android days):
https://news.ycombinator.com/item?id=39671337

Google Security (e.g., TAG & Project Zero) do so much to tackle CVEs, but with MTE the mothership dropped the ball so hard.
AOSP's security posture is frustrating (as Google seemingly solely decides what's good and what's bad and imposes that decision on each of their 3bn users & ~1m developers, despite some in the security community, like Daniel Micay, urging them to reconsider). The steps Apple has been taking (in both empowering the developers and locking down its own OS) in response to Celebgate and Pegasus hacks has been commendable.
I do agree it is a pain not seeing this becoming widely adopted.
As for disabling JIT, it would have the same effect as early Androids, lagging behind Symbian devices, with applications that were wrappers around NDK code.
DVM tried to mitigate the slowness with JIT+SSA, but ART mixed in JIT+SSA alongside AOT+PGO (that is, a no JITing ART means a full AOT ART, unlike in DVM where the Interp takes over when in vmSafeMode). Even if the runtime will continue to lag in terms of power/performance efficiency wrt ObjC/Swift, Google should at least let the developers decide if they want to disallow JIT from creating executable memory regions inside their app's sandbox, like Apple does: https://developer.apple.com/documentation/security/hardened-...
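For concreteness, this is what "creating executable memory" looks like at the POSIX level; a no-JIT policy simply makes the mprotect step fail. (On Apple platforms the real path also involves MAP_JIT and the JIT entitlement; this generic sketch omits that.)

```c
/*
 * Generic JIT codegen at the syscall level: write code bytes into an
 * RW mapping, then flip it to RX. A no-JIT policy denies the flip.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    unsigned char code[] = {0xc0, 0x03, 0x5f, 0xd6};  /* AArch64 "ret" */

    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }

    memcpy(page, code, sizeof code);

    /* W^X discipline: drop write before adding execute. Under a
     * no-JIT policy this call is where the OS says no. */
    if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect");
        return 1;
    }
    __builtin___clear_cache((char *)page, (char *)page + sizeof code);
    ((void (*)(void))page)();  /* run the generated code */
    return 0;
}
```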
My lament is, Google did not push it through when it mattered as Apple here has (assuming FEAT_MTE4 is them solving similar problems to productize MTE for security).
> "some in the security community" being loud
Think the GrapheneOS authors deserve more respect. They aren't merely "loud", they shipped features that AOSP later incorporated.
(1) AOSP isn't dead, but Google just landed a huge blow to custom ROM developers: https://www.androidauthority.com/google-not-killing-aosp-356...
(2) Privacy-Focused GrapheneOS Warns Google Is Locking Down Android: https://cyberinsider.com/privacy-focused-grapheneos-warns-go...
(3) GrapheneOS exposes Google's empty promises on Android security updates: https://piunikaweb.com/2025/09/08/grapheneos-google-security...
https://xcancel.com/GrapheneOS/status/1964757878910136346
> ... Google recently made incredibly misguided changes to Android security updates. Android security patches are (now) almost entirely quarterly instead of monthly to make it easier for OEMs. They're giving OEMs 3-4 months of early access.. Google's existing system for distributing security patches to OEMs was already incredibly problematic. Extending 1 month of early access to 4 months is atrocious. This applies to all of the patches in the bulletins.
> ... The existing system should have been moving towards shorter broad disclosure of patches instead of 30 days. Moving in the opposite direction with 4 months of early access is extraordinarily irresponsible. ...Their 3-4 month embargo has an explicit exception for binary-only releases of patches. We're fully permitted to release the December 2025 patches this month in a release but not the source code.
> Nearly all OEMs were failing to ship the monthly security patch backports despite how straightforward it is. The backports alone are not even particularly complete patches. They're only the High and Critical severity Android patches and a small subset of external patches for the Linux kernel, etc. Getting the full Android patches requires the latest stable releases.
Hitting people with wrenches leaves marks that can be shown to the media and truth & reconciliation commissions. Wetwork and black-bagging dissidents leaves records: training, operational, evidence after the fact. And it hardly scales – no matter what the powers that be want you to think, history shows there are more Hugh Thompsons than Oskar Dirlewangers, even if it takes a few years to recognize what they've done.
If we improve security enough that our adversaries are _forced_ to break out the wrenches, that's a very meaningful improvement!
Yes: if you have half of a billion dollars in BTC, sure – you're a victim to the wrench, be it private or public. If you're a terrorist mastermind, you're likely going to Gitmo and will be placed in several stress positions by mean people until you say what they want to hear.
Extreme high-value targets always have been, and always will be, vulnerable to directed attacks. But these improvements are deeply significant for everyone who is not a high-value target – like me, and (possibly) you!
In my lifetime, the government has gone from "the feds can get a warrant to record me speaking, in my own voice, to anyone I dial over my phone" to "oh, he's using (e2e encrypted platform) – that's a massive amount more work if we can even break it". That means the spectrum of people who can be targeted is significantly lower than it used to be.
Spec-fiction example: consider what the NSA could do today, with whisper.cpp & no e2e encrypted calls.
Xbox One, 2013? Never hacked.
Nintendo Switch 2, 2025? According to reverse engineers, a flawlessly secure microkernel and secure monitor, built up over the Switch 1 generation. Meanwhile, NVIDIA's boot code is formally verified this time, written in the same language (Ada SPARK) used for nuclear reactors and airplanes, on a custom RISC-V chip.
iPhone? iOS 17 and 18 have never been jailbroken; now we introduce MIE.
There are still plenty of other flaws besides memory unsafety to exploit. I doubt that we'll see like a formally proven mainstream OS for a long time.
So far as you know. There's a reason they call them zero-day vulnerabilities.
For example, I might know of an unrelated exploit I'm sitting on because I don't want it fixed and so far it hasn't been.
I think the climate has become one of those "don't correct your adversary when they make mistakes" types of things versus an older culture of release clout.
Not publicly :)
Apple are definitely doing the best job that any firm ever has when it comes to mitigation, by a wide margin. Yet, we still see CVEs drop that are marked as used in the wild in exploit chains, so we know someone is still at it and still succeeding.
When it comes to the Xbox One, it’s an admirable job, in no small part because many of the brightest exploit developers from the Xbox 360 scene were employed to design and build the Xbox One security model. But even still, it’s still got little rips at the seams even in public: https://xboxoneresearch.github.io/games/2024/05/15/xbox-dump...
The 202X M-series don’t always have the same core revisions as the A-series. Sometimes they’re based on the cores from 202X-1.
Given how nice a feature it is I certainly hope it’s in the M5.
CHERI-Morello uses 129-bit capability objects to tag operations, has a parallel capability stack, capability pointers, and requires microarchitectural support for a tag storage memory. Basically with CHERI-Morello, your memory operations also need to provide a pointer to a capability object stored in the capability store. Everything that touches memory points to your capability, which tells the processor _what_ you can do with memory and the bounds of the memory you can touch. The capability store is literally a separate bus and memory that isn't accessible by programs, so there are no secrets: even if you leak the pointer to a capability, it doesn't matter, because it's not in a place that "user code" can ever touch. This is fine in theory, but it's incredibly expensive in practice.
MIE is a much simpler notion that seems to use N-bit (maybe 4?) tags to protect heap allocations, and uses the SPTM to protect tag space from kernel compromise. If it's exactly as in the article: heap allocations get a tag. Any load/store operation to the heap needs to provide the tag that was used for their allocation in the pointer. The tag store used by the kernel allocator is protected by SPTM so you can't just dump the tags.
If you combine MIE, SPTM, and PAC, you get close-ish to CHERI, but with independent building blocks. It's less robust, but also a less granular system with less overhead.
MIE is both probabilistic (N bits of entropy) and backed by a slightly weaker hardware protection (SPTM, which to my understanding is a bus firewall, vs. a separate bus). It also only protects heap allocations, although existing mitigations protect the stack and execution flow.
Going off of the VERY limited information in the post, my naive read is that the biggest vulnerability here will be tag collision. If you try enough times with enough heap spray, or can groom the heap repeatedly, you can probably collide a tag with however many bits of entropy are present in the system. But, because the model is synchronous, you will bus fault every time before that, unlike MTE, so you'll get caught, which is a big problem for nation-state attackers.
There is one stack, the normal program stack that's normal main memory.
> capability pointers
If you use pure-capability CHERI C/C++ then there is only one type of pointer to manage; they just are implemented as capabilities rather than integers. They're also just extensions of the existing integer registers; much as 64-bit systems extend 32-bit registers, CHERI capability registers extend the integer registers.
> requires microarchitectural support for a tag storage memory
Also true of MTE?
> your memory operations also need to provide a pointer to a capability object stored in the capability store
There is no "capability object stored in the capability store". The capability is just a thing that lives in main memory that you provide as your register operand to the memory instruction. Instead of `ldr x0, [x1]` to load from the address `x1` into `x0`, you do `ldr x0, [c1]` to load from the capability `c1`. But `c1` has all of the capability; there is no indirection. It sounds like you are thinking of classical capability systems that did have that kind of indirection, but an explicit design goal of CHERI is to not do that in order to be much more aligned with contemporary microarchitecture.
> The capability store is literally a separate bus and memory that isn't accessible by programs,
As above, there is no separate bus, and capabilities are not in separate memory. Everything lives in main memory and is accessed using the same bus. The only difference is there are now capability tags being stored alongside that data, with different schemes possible (wider SRAM, DRAM ECC bits, carving out a bit of main memory so the memory controller can store tags there and pretend to the rest of the system that memory itself stores tags). To anything interacting with the memory subsystem, there is one bus, and the tags flow with the data on it.
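A minimal sketch of what this looks like in pure-capability CHERI C, assuming a CHERI toolchain (e.g. for Morello) and the cheriintrin.h accessors; the pointer itself carries the bounds, nothing is looked up in a separate store, and the out-of-bounds store traps deterministically:

```c
/*
 * Pure-capability CHERI C sketch; traps by design at the last store.
 */
#include <cheriintrin.h>
#include <stdio.h>

int main(void) {
    char buf[16];
    char *p = buf;  /* capability: address + bounds + permissions + tag */

    printf("base=%#lx length=%zu tag=%d\n",
           (unsigned long)cheri_base_get(p),
           cheri_length_get(p),
           (int)cheri_tag_get(p));

    p[15] = 'x';  /* in bounds: fine */
    p[16] = 'x';  /* out of bounds: deterministic capability fault,
                     no matter what happens to sit next to buf */
    return 0;
}
```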
To the architecture, there is one access mechanism with the tag bit set and one separate mechanism with the tag bit unset, no?
I thought this was the whole difference: in MTE, there is a secret tag hidden in a “normal” pointer by the allocator, and in CHERI, there is a separate architectural route for tag=0 (normal memory) and tag=1 (capabilities memory), whether that separate route eventually goes to some partition of main memory, a separate store entirely, ECC bit stuffing, or whatever?
In MTE, you have the N-bit (typically 4) per-granule (typically 16 byte) "colour"/tag that is logically part of the memory but the exact storage details are abstracted by the implementation. In CHERI, you have the 1-bit capability tag that is logically part of the memory but the exact storage details are abstracted by the implementation. If you understand how MTE is able to store the colours to identify the different allocations in memory (the memory used for the allocations, not the pointers to the allocations) then you understand how CHERI stores the tags for its capabilities, because they are the same basic idea. The difference comes in how they're used: in MTE, they identify the allocation, which means you "paint" the whole allocation with the given "colour" at allocation time (malloc, new, alloca / stack variables, load time for globals), but in CHERI, they identify valid capabilities, and so only get set when you write a valid capability to that memory location (atomically and automatically). This leads to very different access patterns and densities (e.g. MTE must tag all data regardless of its type, whereas CHERI only tags pointers, meaning large chunks of plain data have large chunks of zero tag bits, so how you optimise your microarchitecture changes).
Perhaps you're getting confused with details about the "tag table + cache" implementation for how tags can be stored in commodity DRAM? For CHERI you really want 129-bit word (or some multiple thereof) memory, but commodity DRAM doesn't give you that. So as part of the memory controller (or just in front of it) you can put a "tag controller" which hides a small (< 1%) fraction of the memory and uses it to store the tags for the rest of the memory, with various caching tricks to make it go fast. But that is just the tag, and that is an implementation detail for how to pretend that your memory can tag data. You could equally have an implementation that uses wider DRAM (e.g. in the case of DRAM with ECC bits to spare). Both schemes have been implemented. But importantly memory is just 128+1-bit; the same 128 bits always store the data, whether it's some combination of integers and floats, or the raw bytes of a capability. In the former case, the 129th tag bit will be kept as 0, and in the latter case it will be kept as whatever the capability's tag is (hopefully 1).
MTE is 4 bits with 16 byte granularity. There's usually at least 1 tag reserved so there are 15 random tags. It's possible to dynamically exclude tags to have extra deterministic guarantees. GrapheneOS excludes the previous random tag and adjacent random tags so there are 3 dynamically excluded tags which were themselves random.
Linux kernel MTE integration for internal usage is not very security focused and has to be replaced with a security-focused implementation integrated with pKVM at some point. Google's recently launched Advanced Protection feature currently doesn't use kernel MTE.
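An assumption-level sketch (not hardened_malloc's actual code) of the tag-exclusion scheme described above: choose a random 4-bit tag, never reusing tag 0 (reserved), the slot's previous tag, or the tags of the two neighbouring slots, so linear overflows and immediate-reuse UAF fail deterministically rather than with 1-in-15 odds:

```c
/*
 * Illustrative tag selection with dynamic exclusion.
 */
#include <stdint.h>
#include <stdlib.h>

uint8_t choose_tag(uint8_t prev, uint8_t left, uint8_t right) {
    uint16_t excluded = 1u << 0;  /* tag 0 reserved for untagged use */
    excluded |= 1u << prev;       /* previous tag of this slot */
    excluded |= 1u << left;       /* tags of the adjacent slots */
    excluded |= 1u << right;

    uint8_t t;
    do {
        t = (uint8_t)(rand() & 0xf);  /* illustrative RNG only */
    } while (excluded & (1u << t));
    return t;
}
```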
* use synchronous exceptions (“precise-mode”), which means the faulted instruction cannot retire and cause damage
* re-tag allocations on free
It’s my understanding that this won’t protect you in the case where the attacker has a chance to try multiple times.
The approach would be something like: go out of bounds far enough to skip the directly adjacent object, or do a use-after-free with a lot of grooming, so that you get a chance of getting a matching tag. The probability of getting a matching tag is 1/16.
But this post doesn’t provide enough details for me to be super confident about what I’m saying. Time will tell! If this is successful then the remaining exploit chains will have to rely on logic bugs, which would be super painful for the bad guys
The main weakness is that MTE is only 4 bits... and it's not even 1/16 but typically 1/15 chance of bypassing it since a tag is usually reserved for metadata, free data, etc. The Linux kernel's standard implementation for in-kernel usage unnecessarily reserves more than 1 to make debugging easier. MTE clears the way for a more serious security focused memory tagging implementation with far more bits and other features. It provides a clear path to providing very strong protection against the main classes of vulnerabilities used in exploits, especially remote/proximity ones. It's a great feature but it's more what it leads to that's very impressive than the current 4 bit MTE. Getting rid of some known side channels doesn't make it into a memory safety implementation.
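Some quick arithmetic behind those odds, as a small self-contained program (assuming 15 usable tags and independent guesses); with synchronous MTE, every miss is a fault the defender can see before any damage is done:

```c
/*
 * Collision odds for 4-bit MTE with 15 usable tags. Link with -lm.
 */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double p = 1.0 / 15.0;  /* chance a single guess matches */
    printf("expected guesses until a match: %.0f\n", 1.0 / p);
    for (int n = 1; n <= 64; n *= 4) {
        double within = 1.0 - pow(1.0 - p, n);
        printf("%2d guesses: %5.1f%% chance of a match "
               "(every miss faulted loudly)\n", n, 100.0 * within);
    }
    return 0;
}
```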
Others are aware of where MTE needs improvement and are working on it for years. Cortex shipped MTE with a side channel issue which is better than not shipping it and it will get addressed. Apple has plenty of their own side channel vulnerabilities for their CPUs. Deterministic protections provided via MTE aren't negatively impacted by the side channel and also avoid depending on only 4 bits of entropy. The obvious way to use MTE is not the only way to use it.
GrapheneOS began using MTE in production right after the Pixel 8 provided a production quality implementation, which was significantly later than it could have been made available since Pixels aren't early adopters of new Cortex cores. On those cores, asynchronous MTE is near free and asymmetric is comparable to something like -fstack-protector-strong. Synchronous is relatively expensive, so making that perform better than the early Cortex cores providing MTE seems to be where Apple made a significant improvement. Apple has higher end, larger cores than the current line of Cortex cores. Qualcomm's MTE implementation will be available soon and will be an interesting comparison. We expect Android to heavily adopt it and therefore it will be made faster out of necessity. The security advantage of synchronous over asymmetric for userspace is questionable. It's clearer within the kernel, where little CPU time is spent on an end user device. We use synchronous in the kernel and asymmetric in userspace. We haven't offered full synchronous as an option mainly because we don't have any example of it making a difference. System calls act as a synchronization point in addition to reads. io_uring isn't available beyond a few core processes, etc.
Unsure about iOS, but back then, Webkit published their initial mitigations (like: Index masking, Pointer poisoning): https://webkit.org/blog/8048/what-spectre-and-meltdown-mean-...
I just want to address this part. Why shouldn't Apple advertise or market its achievements here? If they're effectively mitigating and/or frustrating real-world attacks, and this seems to eliminate a class of security bugs, why shouldn't they boast about it? It shows that security R&D is at the forefront of the products they build, which is an effective strategy for selling more product to the security-conscious consumer.
Not a shill, but a shareholder, and I invest in Apple because they're at the forefront of a lot of tech.
In practice, it is a 15/16 chance of detection of the exploit attempt. That is an extraordinarily high rate of detection, which will lead to a fix by Apple.
Net net, huge win. But I agree they come across as overstating the prevention aspect.
But what if the only thing available to purchase is 1/16 or 1/256? Then maybe it’s not so miserable
That makes the probability work against the attacker really well. But it’s not a guarantee
What we're essentially saying is that evading detection is now 14/15 of the battle, from the attacker's perspective. Those people are very clever
But yeah, this was supported for the longest time by IBM, basically. It's nice to see it's getting more widespread.
From https://www.devever.net/~hl/ppcas
> As such, they can principally be viewed as providing a performance enhancement for the IBM i operating system, which uses these instructions to keep track of pointer validity. It is the IBM i OS which enforces security invariants, for example by always following every pointer LQ with a TXER.
The first RS-64 with the PowerPC AS extensions came out in 1995.
From https://lwn.net/Articles/710668/
> If a rogue app attempts to access ADI enabled data pages, its access is blocked and processor generates an exception.
Yeah that sounds closer to ARM MTE. Thanks for the pointer
> Extensions provide no security. [...] The tagged memory extensions don't stop you from doing anything.
They're not so much general purpose computers anymore as they are locked down bank terminals.
More interesting is how to trace and debug code on such a CPU, because what a debugger often does is exactly that: patching an executable in RAM, peeking and poking inside it, etc. If such an interface exists, I wonder how it is protected; do you need extra physical wires like JTAG? If it does not exist, how do you even troubleshoot a program running on the target hardware?
I would respond by saying that sometimes I actually want a locked-down bank terminal (when I’m banking for example), and I appreciate the opportunity to buy one.
Computing hardware in general is way less expensive and more abundant than it used to be, so there are still many options in the marketplace for people to peek and poke into.
Yep, it's a valid use case. It's just not a general purpose computer. And it's a complete refutation of the ideals of Apple when it started out (see, 1984 commercial).
PAC may stop you from changing values - or at least you'd have to run code in the process to change them.
>With the introduction of the iPhone 17 lineup and iPhone Air, we’re excited to deliver Memory Integrity Enforcement: the industry’s first ever, comprehensive, always-on memory-safety protection covering key attack surfaces — including the kernel and over 70 userland processes — built on the Enhanced Memory Tagging Extension (EMTE) and supported by secure typed allocators and tag confidentiality protections.
Of course it is a little disappointing not to see GrapheneOS's efforts in implementing [1] and raising awareness [2] recognised by others but it is very encouraging to see Apple making a serious effort on this. Hopefully it spurs Google on to do the same in Pixel OS. It should also inspire confidence that GrapheneOS are generally among the leaders in creating a system that defends the device owner against unknown threats.
[1] https://grapheneos.org/releases#2023103000 [2] https://xcancel.com/GrapheneOS/status/1716946325277909087#m
As an outsider I am quite ignorant to what security developments these companies are considering and when the trade-offs are perhaps too compromising for them to make it to production. So I can't appreciate the scale of what Apple had to do to reach this stage, whereas with GrapheneOS I know they favour privacy/security on balance. I use that as a weak signal to gauge how committed Apple/Google/Microsoft are to realising those kinds of goals too.
Nice to hear it’s already in use in some forms.
And of course it seems pretty obvious that if this is in the new iPhones it’s going to be in the M5 or M6 chips.
Google set it up for usage on Pixels, and then later Samsung and others did too. Pixel 8 was the first device where it was actually usable and production quality. GrapheneOS began using it in production nearly immediately after it launched on the Pixel 8.
Qualcomm has to make their own implementation which has significantly delayed widespread availability. Exynos and MediaTek have it though.
Pixels are not the only Android devices with MTE anymore and haven't been for a while. We've tried it on a Samsung tablet which we would have liked to be able to support if Samsung allowed it and did a better job with updates.
GrapheneOS is not a 1 person project and not a hobby project. I wasn't the one to implement MTE for hardened_malloc and have not done most of the work on it. The work was primarily done by Dmitry Muhomor who is the lead developer of GrapheneOS and does much more development work on the OS than I do. That has been the case for years. GrapheneOS is not my personal project.
We've done a large amount of work on it including getting bugs fixed in Linux, AOSP and many third party apps. Our users are doing very broad testing of Android apps with MTE and reporting issues to developers. There's a specific crash reporting system we integrated for it to help users provide usable information to app developers. The hard part is getting apps to deal with their memory corruption bugs and eventually Google is going to need to push for that by enabling heap MTE by default at a new target API level. Ideally stack allocation MTE would also be used but it has a much higher cost than heap MTE which Apple and Google are unlikely to want to introduce for production use.
Android apps were historically largely written in Java which means they have far fewer memory corruption bugs than desktop software and MTE is far easier to deploy than it otherwise would be. Still, there are a lot of native libraries and certain kinds of apps such as AAA games with far more native code have much bigger issues with MTE.
There's widespread exploitation of Apple devices around the world by many governments, companies, etc. Apple and Google downplay it. The attacks are often not at all targeted but rather you visit a web page involving a specific political movement such as Catalan independence and get exploited via Safari or Chrome. That's not a highly targeted attack and is a typical example of how those exploits get deployed. The idea that they're solely used against specific individuals targeted by governments is simply not true. Apple and Google know that's the case but lead people to believe otherwise to promote their products as more safe than they are.
> I think SEAR is extremely aware of what real-world exploitation of iPhones looks like.
Doesn't seem that way based on their interactions with Citizen Lab and others.
It's often external parties finding exploits being used in the wild and reporting it to Apple and Google. Citizen Lab, Amnesty International, etc.
We regularly receive info from people working at or previously working at companies developing exploits and especially from people at organization using those exploits. A lot of our perspective on it is based on having documentation on capabilities, technical documents, etc. from this over a long period of time. Sometimes we even get access to outdated exploit code. It's major releases bringing lots of code churn, replaced components and new mitigations which seem to regularly break exploits rather than security patches. A lot of the vulnerabilities keep working for years and then suddenly the component they exploited was rewritten so it doesn't work anymore. There's not as much pressure on them to develop new exploits regularly as people seem to think.
My impression is that Apple's threat intelligence effort is similar in quality to Google's. Of course external parties also help but Apple also independently finds chains sometimes.
8th/9th generation Pixels are half of the devices we support. 7 years of support is the status quo now, but it was 3 years before the Pixel 6 raised it to 5, so the earlier devices aren't supported anymore.
GrapheneOS always uses it for the kernel, all of the base OS processes including apps with a couple exceptions, user installed apps opting into it and user installed apps solely written in Java/Kotlin which are very common on Android. For other user installed apps, there's a toggle for users to opt-in and most apps work with it already. For apps not known to work with it, there's a user-facing system for MTE crash reports and users can make an exception. Users can't disable it for base OS apps or apps which should work due to opting in or being pure Java/Kotlin.
Apple uses it for the kernel and parts of the base OS. They require opt-in by app developers and discourage doing it.
GrapheneOS is working on improvements to the kernel integration, Chromium PartitionAlloc integration and other aspects of it. We'll enable enforcement of tags for untagged memory once that's available, but we're also expanding the tagging. As an example, fully enabling stack allocation tagging has a more than acceptable performance cost for GrapheneOS but not Apple or Google. That's something we've been actively testing and will be deploying.
You can fix this insofar as you control the compiler and calls to malloc(), which you don't, because third party code may have wrappers around it.
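An illustrative (hypothetical) example of such a wrapper: a vendored library that sub-allocates from one big malloc'd arena, so a tagging allocator sees a single allocation where the program logically has many, and per-object tags can't separate them:

```c
/*
 * Hypothetical arena wrapper that hides allocation sites from the
 * system allocator (and hence from allocator-driven tagging).
 */
#include <stddef.h>
#include <stdlib.h>

static char *arena, *arena_cur, *arena_end;

int lib_init(size_t pool) {
    arena = arena_cur = malloc(pool);
    arena_end = arena ? arena + pool : NULL;
    return arena != NULL;
}

/* All of these "objects" live inside one tagged region: an overflow
 * from one into its neighbour never crosses a tag boundary. */
void *lib_alloc(size_t n) {
    size_t rounded = (n + 15) & ~(size_t)15;  /* 16-byte align */
    if (!arena || (size_t)(arena_end - arena_cur) < rounded)
        return NULL;
    void *p = arena_cur;
    arena_cur += rounded;
    return p;
}
```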