Airbus A320 – Intense Solar Radiation May Corrupt Data Critical for Flight
Key topics
Airbus issued a precautionary update regarding potential data corruption in A320 aircraft due to intense solar radiation, which could impact critical flight data. The update was met with minimal discussion on Hacker News. The original title was modified to better reflect the content.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 2 minutes after posting
- Peak period: 31 comments in the 12-15h window
- Avg / period: 12.3 comments
Key moments
- 01 Story posted: Nov 28, 2025 at 4:40 PM EST (about 1 month ago)
- 02 First comment: Nov 28, 2025 at 4:42 PM EST, 2 minutes after posting
- 03 Peak activity: 31 comments in 12-15h, the hottest window of the conversation
- 04 Latest activity: Nov 30, 2025 at 7:11 PM EST (about 1 month ago)
> At least 15 passengers were injured and taken to the hospital after a sudden drop in altitude on the flight from Mexico was forced to make an emergency landing in Florida, US aviation officials said at the time.
> The Thursday flight from Cancun was headed to Newark, New Jersey, when the altitude dropped, leading to the diversion to Tampa International Airport, the US Federal Aviation Administration said in a statement.
> Pilots reported “a flight control issue” and described injuries including a possible “laceration in the head,” according to air traffic audio recorded by LiveATC.net.
> Medical personnel met the passengers and crew on the ground at the airport. Between 15 and 20 people were taken to hospitals with non-life-threatening injuries, said Vivian Shedd, a spokesperson for Tampa Fire Rescue.
> Pablo Rojas, a Miami-based attorney who specialises in aviation law, said a “flight control issue” indicated that the aircraft wasn't responding to the pilots.
https://www.stuff.co.nz/travel/360903363/what-happened-fligh...
I’m surprised passengers are allowed to unbuckle for so much of each flight. You can get injured while buckled in, but that seems less common.
Only aviation professionals or recovering flight phobics like me who have watched every episode of Air Crash Investigation will take proactive safety measures of their own accord. To normies it's all just a pointless hassle.
Not just ignoring flight crew advice and common sense to generally stay buckled, for maybe a minor gain in comfort and convenience, but unbuckling even when the seat belt sign is on and common sense says being buckled in is the smart move. On my most recent flight I heard quite a few people unbuckling their seat belts while the plane was still rolling down the runway after landing. You couldn't wait 5 more minutes until the plane is at the gate?
Also: people clapping the second the back wheels touch down on landing is particularly hilarious to me, because it implies an acknowledgement of the precariousness of flying, but a complete ignorance of the fact that you're just entering the second most dangerous 30 seconds of the entire flight.
https://www.swpc.noaa.gov/noaa-scales-explanation
https://kauai.ccmc.gsfc.nasa.gov/CMEscoreboard/prediction/de...
The European Union Aviation Safety Agency [2] instruction describes the characteristics of the incident but not the date.
[1] https://www.theguardian.com/business/2025/nov/28/airbus-issu...
[2] https://ad.easa.europa.eu/ad/2025-0268-E
https://docs.oracle.com/cd/E19095-01/sf4810.srvr/816-5053-10...
https://en.wikipedia.org/wiki/Cosmic_ray
A hardware fix is the ultimate solution, but it might be possible to paper over the issue with software.
Mind you whatever came out of that project is rolling on the street today.
I still design this into many of the things I work on, especially if I’m working close to the metal on controller systems. At some point it becomes ridiculous / impossible, but I’m often thinking about how a system would handle memory corruption, bit flips, invalid sensor data, etc. These days, somebody should design a triple-redundant microcontroller that runs quorum on the GPIO at the hardware level. It could be a $0.30 part instead of a $0.10 one, but I would specify it just about everywhere. Adding $3 to BOM cost to categorically eliminate an entire class of failure would be ramrodded by legal into just about every medical device, PLC, critical automotive system, etc., one would think. Seems like a good gambit for a RISC-V startup, but what do I know.
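For illustration, a minimal sketch in C of the 2-of-3 quorum such a part might implement at the pin level; the `vote3` / `lane_disagreement` names and the idea of exposing a disagreement mask are assumptions for this sketch, not any real part's interface:

```c
#include <stdint.h>

/* 2-of-3 majority vote across three redundant GPIO samples.
 * Each output bit is set iff at least two of the three lanes agree it is set. */
static inline uint32_t vote3(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (b & c) | (a & c);
}

/* Flag the bits where any lane diverged from the majority, so the fault
 * can be logged or the offending lane taken offline. */
static inline uint32_t lane_disagreement(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t m = vote3(a, b, c);
    return (a ^ m) | (b ^ m) | (c ^ m);
}
```

The point of doing this in hardware rather than firmware is that a single upset in any one lane never reaches the pins, and the disagreement mask gives you a cheap health signal for free.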
One glider instructor talked about taking a stick with him in case of a panicking student, so he could hit them hard enough that they would let go of the controls.
There's a detailed breakdown here: https://admiralcloudberg.medium.com/the-long-way-down-the-cr...
I don't believe there was any issue identified with the software of the plane.
https://forums.raspberrypi.com/viewtopic.php?t=99167
https://forums.raspberrypi.com/viewtopic.php?f=28&t=99042
https://www.raspberrypi.com/news/xenon-death-flash-a-free-ph...
https://www.youtube.com/watch?v=wyptwlzRqaI
And of course you can block the type of radiation that caused problems for the rpi with a good piece of paper.
For manned spaceflight, NASA ups N from 3 to 5.
Other mitigations include completely disabling all CPU caches (with a big performance hit), and continuously refreshing the ECC in background.
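As a rough idea of what the background ECC refresh can look like when done in software (many systems have a hardware scrubber instead), here is a hedged sketch; it assumes a memory-mapped ECC region where a plain read returns corrected data, and the `scrub_step` name, the chunking scheme, and the absence of locking are all simplifications:

```c
#include <stdint.h>
#include <stddef.h>

/* Walk a RAM region in small increments from a periodic task, reading each
 * word and writing it back.  The read lets the ECC logic correct any single
 * bit error; the write-back stores the corrected value before a second,
 * uncorrectable error can accumulate in the same word. */
void scrub_step(volatile uint32_t *base, size_t words, size_t *cursor, size_t chunk)
{
    for (size_t i = 0; i < chunk; i++) {
        size_t idx = (*cursor + i) % words;
        uint32_t v = base[idx];   /* ECC-corrected read */
        base[idx] = v;            /* write back the corrected value */
    }
    *cursor = (*cursor + chunk) % words;
}
```

In a real system the read-modify-write would need to be made atomic with respect to other users of that memory, which is one reason dedicated scrubbing hardware is preferred.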
Eg. I could understand if each subsystem had its own actuators and they were designed so any 3 could aerodynamically override the other 2, but I don't think that's how it works in practice.
They do not. You just make the voting circuit much more reliable than the computing blocks.
As an example, the computing blocks could be CMOS, but the voting circuit made from discrete components, which are simply too large to be sensitive to particle strikes.
Unfortunately, discrete components are sensitive to total accumulated exposure (more so than nm-scale transistors), because their larger area gathers more events and suffers from diffusion.
Another example from the aviation world: many planes still have a mechanical connection from the control column to the control surfaces, because a mechanical connection is considered ideally reliable. Unfortunately, at least one catastrophe happened because one pilot blocked his column and the other could not overcome the blockage.
BTW, weird fact: modern planes don't have a rod physically connected to the engine, because the engine has its own computer, which emulates the behavior of an old piston-engine carburetor. On Boeing aircraft the thrust lever has an electronic actuator, so it is automatically driven to the position corresponding to the actual engine mode, but Airbus levers don't have such an actuator.
What I want to say is that big planes especially (and planes overall) are a weird mix of very conservative inherited mechanisms and new technologies.
It's interesting to me that triple-voting wasn't necessary on the older (rad-hard) processors. Every foundry in the world is steering toward CPUs with smaller and smaller feature sizes, because they are faster and consume less power, but the (very small) market for space-based processors wants large feature sizes. Because those aren't available anymore, TMR is the work-around.
In other cases all of the subsystems implement the comparison logic and "vote themselves out" if their outputs diverge from the others. A lot of aircraft control systems are structured more as primary/secondary/backup where there is a defined order of reversion in case of disagreement, rather than voting between equals.
But, more generally, it is very hard to eliminate all possible single points of failure in complex control systems, and there are many cases of previously unknown failure points appearing years or decades into service. Any sort of multi-drop shared data bus is very vulnerable to common failures, and this is a big part of the switch to ethernet-derived switched avionics systems (e.g. AFDX) from older multi-drop serial busses.
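A toy sketch of that primary/secondary/backup reversion idea, as opposed to voting between equals; the `channel_t` layout, the single `healthy` flag, and the pitch-command field are invented purely for illustration:

```c
#include <stdbool.h>

typedef struct {
    bool   healthy;    /* result of this channel's own self-test / cross-check */
    double pitch_cmd;  /* command this channel wants to issue */
} channel_t;

/* Defined order of reversion: always follow the highest-priority channel
 * that still reports healthy, rather than voting between equals.
 * ch[0] = primary, ch[1] = secondary, ch[2] = backup, ... */
double select_command(const channel_t ch[], int n, int *active_out)
{
    for (int i = 0; i < n; i++) {
        if (ch[i].healthy) {
            *active_out = i;
            return ch[i].pitch_cmd;
        }
    }
    *active_out = -1;   /* all channels failed: caller reverts to direct/mechanical backup */
    return 0.0;
}
```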
You can see much more data in the report.
Why would you assume they're not? I don't know about aircraft specifically, but there's plenty of hardware that uses components older than that. Microchip still makes 8051 clones 45 years after the 8051 was released.
There's a reason the airlines and manufacturers hem and haw about new models until the economics overwhelmingly make it worthwhile, and even then it can still be a shitshow. The MCAS issue is case in point of how introducing new tech can cause unexpected issues (made worse by Boeing's internal culture).
The 787 Dreamliner is also a good example of how hard it is. By all accounts it is a success, but it had some serious teething problems and there are still some concerns about the long-term wear and tear of the composite materials (though a lot of its problems weren't necessarily the application of new tech, but Boeing's simultaneous desire to overcomplicate the manufacturing pipeline via outsourcing and spreading out manufacturing).
They didn't say the design was brand new.
Because getting a new one certified is extremely expensive. And designing an aircraft with a new type certificate is unpopular with the airlines. Since pilots are locked into a single type at a time, a mixed fleet is less efficient.
Having a pilot switch type is very expensive, in the 50-100k per pilot range. And it comes with operational restrictions: you can't pair a newly trained (on type) captain with a newly trained first officer, so you need to manage all of this.
Significant internal hardware changes might indeed require a new/updated type certificate, but they generally wouldn't mean that pilots need to re-qualify or get a new type rating.
But to do that you'll still have to prove that the changes don't change any of the aircraft characteristics. And that's not just the normal handling but also any failure modes. Which is an expensive thing to do, so Airbus would normally not do this unless there is a strong reason to do it.
The crew is also trained on a lot of knowledge about the systems behind the interface, so they can figure out what might be wrong in case of problems. That doesn't include the software architecture itself, but it does include a lot of information on how redundancy between the systems works and what happens when one system's output is invalid. For example, how the fail-over logic works in case of a flight control computer failure, or how it responds to losing certain inputs, and how that affects automation capabilities: no autoland when X fails, no autopilot and degradation to alternate control law when Y fails, further degradation if X and Z fail at the same time. Sometimes this is also per "side", since not all computers are connected to all sensors.
2. Bigger changes than this are made all the time under the same type certificate. Many planes went from steam gauges to glass cockpits. The A320 added a new fuel tank with transfer valves, transfer logic, and new failure modes, and has completely changed control laws over the life of the type, etc.
[1] Honeywell actually bought full license et al from AMD and operates a fabless team that ensures they have stock and if necessary updates the chip.
Guessing that using previously certified stuff is an advantage
Wasn't the philosophy back then to run multiple independent (and often even designed and manufactured by different teams) computers and run a quorum algorithm at a very high level?
Maybe ECC was seen as redundant in that model?
Jeez, it would drive me _up the wall_. Let's say I could somewhat justify the security concerns, but this seems like it severely hampers the ability to design the system. And it seems like a safety concern.
Sometimes the solution is obvious, such that if you ask three engineers to solve it you’ll get three copies of the same solution, whereas that might not happen if they’re able to communicate.
I’m sure they knew what they were doing, but I wonder how they avoided that scenario.
Redundancy is a tool for reducing the probability of encountering statistical errors, which come from things like SEUs.
Dissimilarity is a tool for reducing the “probability” of encountering non-statistical errors — aka defects, bugs — but it’s a bit of a category error to discuss the probability of a non-probabilistic event; either the bug exists or it does not, at best you can talk about the state coverage that corresponds to its observability, but we don’t sample state space uniformly.
There has been a trend in the past few decades, somewhat informed by NASA studies, to favor redundancy as the (only, effective) tool for mitigating statistical errors, but to lean against heavy use of dissimilarity for software development in particular. This is because of a belief that (a) independent software teams implement the same bugs anyway and (b) an hour spent on duplication is better spent on testing. But at the absolute highest level of safety, where development hours are a relatively low cost compared to verification hours, I know it’s still used; and I don’t know how the hardware folks’ philosophy has evolved.
Providing errors are independent, it's better to have three subsystems with 99% reliability in a voting arrangement than one system with 99.9% reliability.
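The arithmetic behind that claim, assuming independent failures and a 2-of-3 voter that fails only when at least two units fail (the program below is just a worked example, not from the thread):

```c
#include <stdio.h>

int main(void)
{
    double p = 0.01;                       /* per-unit failure probability (99% reliable) */
    /* A 2-of-3 voted system fails when two or three units fail. */
    double p_tmr    = 3 * p * p * (1 - p) + p * p * p;
    double p_single = 0.001;               /* one 99.9%-reliable unit */

    printf("TMR failure probability:   %.6f\n", p_tmr);     /* ~0.000298 */
    printf("Single-unit failure prob.: %.6f\n", p_single);  /*  0.001000 */
    return 0;
}
```

With independent faults, the voted arrangement of "worse" parts comes out roughly three times more reliable than the single "better" one; the catch is the independence assumption, which is exactly what the dissimilarity discussion above is about.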
Otherwise, I can easily see teams doing parallel construction of the same techniques. So many developments seem to happen like this, due to everyone being primed by the same socio-technical environment...
It’s essentially a very intentional trade-off between groupthink and the wisdom of crowds, but it lands on a very different point on that scale than most other systems.
Arguably the track record of Airbus’s fly-by-wire does them some justice for that decision.
ECC memory usage in the past was heavily correlated with, well, way lower quality of hardware from chips to assembly, electromagnetic interference from unexpected sources, or even customer/field technician errors. Remember that an early-1980s single-user workstation might require an extensive check-and-fix cycle just from being moved around.
An aircraft component would eliminate all the major parts of that, through more thorough self-testing, careful sealed design, selection of high-grade parts, etc.
The possibility of space radiation causing considerable issues came up as fully digital fly-by-wire became more common in civilian usage, and has led over time to retrofitting with EDAC, but radiation-triggered SEU was deemed a low enough risk given the design of the system.
This does not match my experience (although, admittedly, I've been in the field only a couple decades -- the hardware under discussion predates that). The problem with SEU-induced bit flips is not that errors happen, but that errors with unbounded behavior happen -- consider a bit flip in the program counter, especially in an architecture with variable sized instructions. This drives requirements around error detection, not correction -- but the three main tools here are lockstep processor cores, parity on small memories, and SECDED on large memories. SECDED ECC here is important both because it can catch double errors that happen close together in time, and because memory scrubbing with single error correction allows multiple errors spaced in time to be considered separately. At the system level, the key insight is that detectable failures of a single ECU have to be handled anyway, because of non-transient statistical failures -- connector failures, tin whiskers, etc. The goal, then, is to convert substantially all failures to detectable failures, and then have defined failure behavior (often fail-silent). This leads to dual-dual redundancy architectures and similar, instead of triplex; each channel consists of two units that cross-check each other, and downstream units can assume that commands received from either channel are either correct or absent.
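A schematic sketch of that fail-silent, cross-checking channel idea; the tolerance-based comparison and the `channel_output_t` shape are illustrative assumptions (real systems may compare bit-exact outputs or use more elaborate monitors):

```c
#include <stdbool.h>
#include <math.h>

/* One channel = two lanes computing the same command independently.
 * If the lanes disagree beyond a tolerance, the channel goes silent;
 * downstream consumers treat its command as absent, never as wrong. */
typedef struct {
    bool   valid;
    double cmd;
} channel_output_t;

channel_output_t cross_check(double lane_a, double lane_b, double tol)
{
    channel_output_t out = { .valid = false, .cmd = 0.0 };
    if (fabs(lane_a - lane_b) <= tol) {
        out.valid = true;
        out.cmd   = 0.5 * (lane_a + lane_b);
    }
    return out;   /* invalid => fail-silent: no command is better than a bad one */
}
```

This is the "convert substantially all failures to detectable failures" idea in miniature: the consumer never has to decide whether a received command is trustworthy, only whether one is present.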
An under-appreciated point is also that the devices in question used to be rebooted pretty often, which triggered self-test routines in addition to the run-time tests - something that didn't catch anything in the case of the A330 in 2008, but was a factor in risk assessments missing certain things with the 787 some years later (and the newer A380/A350 recently).
"There was a limitation in the algorithm used by the A330/A340 flight control primary computers for processing angle of attack (AOA) data. This limitation meant that, in a very specific situation, multiple AOA spikes from only one of the three air data inertial reference units could result in a nose-down elevator command. [Significant safety issue]"
This is most likely what they will address.
Difference between it and ECC?
All of the value of your comment comes from the first sentence and the last two.
What you're doing here is half the job: consulting an LLM and sharing the output without verifying whether it is true. You're then saying 'okay everyone else, finish my job for me, specifically the hard part of it (the verification), while I did the easy part (asking a magic 8 ball)'.
From this perspective, your comment is disrespectful of others by asking them to finish your job, and of negative value because it could be totally hallucinated and false, and you didn't care enough about others to find out.
- EDAC is a term that encompasses anything used to detect and correct errors. While this almost always involves redundancy of some sort, _how_ it is done is unspecified.
- The term ECC used stand-alone refers specifically to adding redundancy to data in the form of an error-correcting code. But it is not a single algorithm - there are many ECC / FEC codes, from Hamming codes used on small chunks of data such as data stored in RAM, to block codes like Reed-Solomon more commonly used on file storage data.
- The term ECC memory could really just mean "EDAC" memory, but in practice, error correcting codes are _the_ way you'd do this from a cost perspective, so it works out. I don't think most systems would do triple redundancy on just the RAM -- at that point you'd run an independent microcontroller with the RAM to get higher-level TMR.
https://www.sciencedirect.com/science/article/abs/pii/S01419...
It’s confusing because EDAC and ECC seem to mean the same thing, but ECC is a term primarily used for memory integrity, whereas EDAC is a system-level concept.
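As a concrete (toy) instance of such a code, here is a Hamming(7,4) encoder/decoder that corrects any single flipped bit in a 7-bit codeword; real ECC memory uses wider SECDED codes, and the function names here are made up for the example:

```c
#include <stdint.h>
#include <stdio.h>

/* Toy Hamming(7,4): encode a 4-bit nibble into 7 bits; bit i of the codeword
 * is "position i+1" in the classic description (positions 1,2,4 are parity). */
static uint8_t hamming74_encode(uint8_t nibble)
{
    uint8_t d1 = (nibble >> 0) & 1, d2 = (nibble >> 1) & 1;
    uint8_t d3 = (nibble >> 2) & 1, d4 = (nibble >> 3) & 1;

    uint8_t p1 = d1 ^ d2 ^ d4;   /* covers positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* covers positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;   /* covers positions 4,5,6,7 */

    return (uint8_t)(p1 << 0 | p2 << 1 | d1 << 2 | p3 << 3 |
                     d2 << 4 | d3 << 5 | d4 << 6);
}

/* Returns the corrected nibble; *corrected_pos is 0 if no error was seen,
 * otherwise the 1-based bit position that was flipped and repaired. */
static uint8_t hamming74_decode(uint8_t cw, int *corrected_pos)
{
    uint8_t b[8];
    for (int i = 1; i <= 7; i++) b[i] = (cw >> (i - 1)) & 1;

    uint8_t s1 = b[1] ^ b[3] ^ b[5] ^ b[7];
    uint8_t s2 = b[2] ^ b[3] ^ b[6] ^ b[7];
    uint8_t s3 = b[4] ^ b[5] ^ b[6] ^ b[7];
    int syndrome = s1 | (s2 << 1) | (s3 << 2);

    if (syndrome != 0) b[syndrome] ^= 1;   /* flip the bad bit back */
    *corrected_pos = syndrome;

    return (uint8_t)(b[3] | (b[5] << 1) | (b[6] << 2) | (b[7] << 3));
}

int main(void)
{
    int pos;
    uint8_t cw = hamming74_encode(0xB);    /* data nibble 1011 */
    cw ^= 1 << 4;                          /* simulate a bit flip at position 5 */
    uint8_t out = hamming74_decode(cw, &pos);
    printf("recovered 0x%X, corrected bit position %d\n", out, pos); /* 0xB, 5 */
    return 0;
}
```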
I turn the page on the excuse sheet. "SOLAR FLARES" stares out at me. I'd better read up on that..."
How is it possible that this wouldn't impact upon flight schedules?
But as I hear, air carriers can buy planes in different configurations; for example, Emirates or Lufthansa always buy planes with all features included, but a small Asian airline might buy a limited configuration (even without some safety indicators).
So for Emirates or Lufthansa the fix will take one empty flight to the home airport, but a small airline will need to fly to some large maintenance base (or to the factory base) and wait in a queue there (you can find images online of Boeing's factory base with lots of grounded 737 MAXes from a few years ago).
So for Emirates or Lufthansa there will be minimal impact on flights (just like replacing a bus), but for small airlines things could be much worse.
So here’s everything you need to know about ELAC.
The ELAC System in the Airbus A320: The Brains Behind Pitch and Roll Control https://x.com/Turbinetraveler/status/1994498724513345637
"We take proactive measures, whereas our competitor only takes action after multiple fatal crashes!"
As far as I'm concerned it has not helped with their marketing.
It actually inspires a lot of confidence in people who can at least think economically, if not technically:
Grounding thousands of planes is very expensive (passengers get cash for that in at least the EU, and sometimes more than the ticket cost!), so doing it both shows that it’s probably a serious issue and it’s being taken seriously.
With that out of the way, being expensive does not preclude shoddy work. At the end of the day, the only difference between "they are so concerned about security that they are willing to lose millions[1]" and "their process must be so bad that they have no other choice but to lose millions before their death trap cost them ten times that" is how good your previous perception of their airplanes is.
I think that, had this exact same issue happened to Boeing, we would be having a very different conversation. As the current top comment suggests, it would probably be less "these things happen" and more "they cheaped out on the ECC".
[1] Disclaimer: I have no idea who loses money in this scenario, if it's also Airbus or if it's exclusively the airlines who bought them.
So the immediate cost to Airbus of grounding the fleet is quite low, whilst the downside of not grounding the fleet (risk of incident, lawsuits, reputation, etc.) could be substantial.
It sounds like the fix is fairly quick, so probably not as expensive as the MAX multi-month groundings.
I doubt anyone is going to sue. Repairs etc are a part of life when owning aircraft. So as long as Airbus makes this happen fast and smooth they’re probably ok
Airbus/Thales's fix in this case appears to add more error checking, and to restart the misbehaving component. https://bea.aero/fileadmin/user_upload/BEA2024-0404-BEA2025-...
("une supervision interne du composant à l’origine de la défaillance ; - un mécanisme de redémarrage automatique de ce composant dès lors que la défaillance est détectée)
Curious what a software change might have done in terms of resiliency. Maybe an incorrect memory setting, or some code path that isn't calculating things redundantly?
https://avherald.com/h?article=52f1ffc3&opt=0
"This identified vulnerability could lead in the worst case scenario to an uncommanded elevator movement that may result in exceeding the aircraft structural capability."
The actual bug was unsafe code somewhere else in the application corrupting the memory. The application worked fine, but the log message strings were being slightly corrupted. Just a random letter here and there being something it shouldn't be.
The question really should have been, if this was truly cosmic interference, why only this service and why was the problem appearing more than once over multiple versions of the application?
Cosmic rays are a great excuse for problems you don't yet understand. But in reality they are extremely rare, and it's like 99% a memory corruption bug caused by application code.
I won’t blame cosmic rays; more likely it was dying RAM. The NAS now runs ECC memory.
I jest, but, once upon a time I worked with an infallible developer. When my projects crashed and burned, I would assume that it was my lack of competence and take that as my starting point. However, my colleague would assume that it was a stray neutrino that had flipped a bit to trigger the failure, even if it was a reproducible error.
He would then work backwards from 93 million miles away to blame the client, blame the linux kernel, blame the device drivers and finally, once all of that and the 'three letter agencies' were eliminated, perhaps consider the problem was between his keyboard and his chair.
In all fairness, he was a genius, and, regarding the A320 situation, he would have been spot on!
If a radiation event caused some bit-flip, how would you realize that's what triggered an error? Or maybe the FDR does record when certain things go wrong? I'm thinking like, voting errors of the main flight computers?
Anyway, would be very interested to know!
"Had the same problem with low power CMOS 3 transistor memory cells used in implantable defibrillators in the 1990s. Needed software detection and correction upgrade for implanted devices, and radiation hardening for new devices. Issue was confirmed to be caused by solar radiation by flying devices between Sydney and Buenos Aires over the south pole multiple times, accumulating a statistically significant different error rate to control sample in Sydney."
The cause could have also been an extra check introduced in one of the routines - which backfired in this particular failure scenario.
Trapping for every known instance can be tricky and difficult. When things go wrong they tend to really go wrong.