A few years ago, Linus Torvalds, the creator of the Linux kernel, promised to rein in his gruff personality and behave more professionally when working with others. It seems Intel has made him break that commitment.
In early January, in a post on realworldtech, Torvalds repeatedly criticized Intel, calling it the main reason a key component is missing from consumer computers: ECC RAM (error-correcting code memory), or self-correcting RAM. Worse, he argues, because of Intel the average user is not even aware that this component exists.
If you follow server hardware, such as server CPUs and server motherboards, you will already know about ECC RAM. In essence, ECC RAM carries a small amount of extra memory that is used to detect and correct errors in the rest of the memory.
Currently, ECC RAM is more commonly found in server systems than in consumer computers.
Why is the ECC feature necessary?
In most ECC memory today, every 64-bit word stored in RAM is accompanied by 8 check bits. If a single bit flips (for example, a 0 becomes a 1 or a 1 becomes a 0), the error is detected and corrected automatically. If two bits flip in the same word, the error can be detected but not corrected. If three or more bits flip in the same word, the error may still be detected, but detection is not guaranteed.
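To make the "64 data bits plus 8 check bits" idea concrete, here is a minimal sketch of a single-error-correcting, double-error-detecting (SECDED) code built as a textbook extended Hamming code. This is an assumption for illustration only: real memory controllers implement their own codes in hardware, and the bit layout and the names `encode` and `decode` below are hypothetical.

```python
CODE_BITS = 72  # 64 data bits + 7 Hamming check bits + 1 overall parity bit
CHECK_POSITIONS = (1, 2, 4, 8, 16, 32, 64)  # power-of-two positions hold check bits


def _is_data_position(pos):
    # Position 0 is the overall parity bit; powers of two are check bits.
    return pos > 0 and (pos & (pos - 1)) != 0


def encode(word):
    """Encode a 64-bit integer as a list of 72 bits (illustrative layout)."""
    bits = [0] * CODE_BITS
    data_index = 0
    for pos in range(1, CODE_BITS):
        if _is_data_position(pos):
            bits[pos] = (word >> data_index) & 1
            data_index += 1
    # Each check bit makes the parity of the positions it covers even.
    for check in CHECK_POSITIONS:
        parity = 0
        for pos in range(1, CODE_BITS):
            if (pos & check) and pos != check:
                parity ^= bits[pos]
        bits[check] = parity
    # The overall parity bit makes the whole 72-bit codeword even.
    bits[0] = sum(bits[1:]) % 2
    return bits


def decode(bits):
    """Return (word, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    # XOR of the positions of all set bits: 0 for a valid codeword,
    # otherwise the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, CODE_BITS):
        if bits[pos]:
            syndrome ^= pos
    overall_ok = sum(bits) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok and syndrome < CODE_BITS:
        bits[syndrome] ^= 1  # odd parity: assume one flip and repair it
        status = "corrected"
    else:
        return None, "uncorrectable"  # e.g. two flips: detected, not fixable
    word = 0
    data_index = 0
    for pos in range(1, CODE_BITS):
        if _is_data_position(pos):
            word |= bits[pos] << data_index
            data_index += 1
    return word, status
```

Flipping any one of the 72 bits and calling `decode` returns the original word with status `corrected`, while flipping two bits returns `uncorrectable`. With three or more flips this sketch can report an error or silently "correct" to the wrong word, which is why detection in that case is not guaranteed.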
Bit flips have many causes, from cosmic rays to random hardware failure. According to a Google study of its own servers, about 32% of its machines (and 8% of its DIMMs) experience at least one memory error per year. Most of those are single-bit errors, and since Google uses server CPUs and ECC RAM, its machines can fix them on their own and keep running as usual.
Google's data also shows that single-bit flips are about 40 times more likely than multi-bit flips. Consumer computers, however, do not use ECC RAM the way Google's servers do, so on them these errors go undetected and can cause system instability and data corruption.
Bit flips are not always random
Not every fault in RAM is an accident caused by hardware failure or electromagnetic interference. In recent years, researchers have discovered more and more side-channel attacks that exploit the physical behavior of the hardware.
By inducing controlled bit flips in areas of RAM that an application can access, an attacker can infer or modify data values in areas of RAM that should be inaccessible to that application.
ECC RAM cannot mitigate the RAMBleed attack, which infers the values of data in neighboring memory locations. It can, however, generally stop a Rowhammer attack (which repeatedly accesses one location in DRAM to cause bit flips, and which can be carried out remotely) from doing anything worse than halting the system, with no data corrupted. (Most systems using ECC RAM are configured to stop working completely if an uncorrectable error is found.)
Rowhammer type attacks can be prevented using ECC RAM
But very few ordinary users are aware that ECC RAM exists, and according to Torvalds, that is Intel's fault.
There was a time when you could buy a regular CPU and still get ECC RAM support, but that was 15 years ago, with the 975X chipsets. Since then, Intel has offered the feature only on server CPUs such as Xeon. Intel's argument is simply that "consumers don't need ECC".
But as the reasons above show, Rowhammer-style attacks and bit-flip errors keep occurring, which suggests that ordinary users need ECC too, contrary to what Intel claims. Without support from Intel CPUs, which have dominated the market for many years, consumer ECC RAM has also all but disappeared.
Intel has caused Mr. Linus Torvalds to break his commitment to stay calm.
“And memory manufacturers will say it is because of economics and lower power. And they're lying bastards. Look at the Rowhammer attack to see that this problem has been around for generations, but these bastards are happy to sell bad hardware to users and call it an “attack”, when the problem was always there.”
“How many times have bit flips like those in Rowhammer attacks happened through sheer bad luck in an ordinary workload, with no attack involved? We will never know. Because Intel pushed that loss onto users.”
But why would Intel do that? According to Torvalds, it is how Intel segments its market. Consumer CPUs are cheaper and carry lower margins; by leaving out essential protection features such as ECC, Intel keeps them from competing with its more expensive, more profitable server CPUs when large businesses go shopping.
Cost is often assumed to be the reason ECC-capable hardware is rare in consumer machines. But look at ECC RAM itself: although it can be hard to find, its retail price is only about 20% higher than that of regular RAM. The real problem is that if the motherboard and CPU do not support it, installing ECC RAM brings no benefit.
AMD's Ryzen CPUs also support ECC in theory, albeit unofficially. Because the support is unofficial, there is no guarantee that other vendors, such as motherboard makers or computer OEMs, actually support the feature, even when they claim to. The only sure way to confirm that a Ryzen motherboard really supports ECC is to run an application that deliberately causes a bit-flip error and see whether the error is detected and corrected.
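As a rough illustration of how the result of such a test could be observed, the sketch below reads the corrected and uncorrected error counters that the Linux kernel's EDAC subsystem exposes under sysfs. It assumes a Linux system with an EDAC driver loaded for the memory controller; the function name `read_edac_counts` is hypothetical, and an empty result only means no EDAC memory controller was found, not that ECC is definitely absent.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected_errors, uncorrected_errors)}.

    Assumes the Linux EDAC sysfs layout (mc0, mc1, ... directories that
    each contain ce_count and ue_count files).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC memory controllers registered
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found.")
    for name, (corrected, uncorrected) in found.items():
        print(f"{name}: corrected={corrected} uncorrected={uncorrected}")
```

If the corrected-error counter rises after a bit-flip error has been induced, that is a good sign that ECC is working end to end on that system.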
Another source of hope for users is DDR5 RAM, which began rolling out in 2020. Besides improvements in capacity, bandwidth and power consumption, DDR5 notably builds error-correcting codes into the memory chips themselves. As this new generation of RAM becomes popular, wider support from other components such as motherboards is likely to follow.
Source: Ars Technica