Why is ECC necessary?
Intel has a vested interest in pushing deeper-pocketed businesses toward its more expensive—and profitable—server-grade CPUs rather than letting those entities effectively use the necessarily lower-margin consumer parts. Torvalds' argument here is that Intel's refusal to support ECC RAM in its consumer-targeted parts—along with its de facto near-monopoly in that space—is the real reason that ECC is nearly unavailable outside the server space.
Asked 12 years, 6 months ago by Jon Tackabury.
It might be worth revisiting, particularly the last sentence, in light of this. From the article: "[ This sounds like computers should be crashing constantly and data should become corrupted all the time.
Yet everyone seems to be doing pretty fine without ECC. That's because that article is false when it comes to the error rate. The actual error rate is lower by many orders of magnitude. See the relevant reddit thread. Whatever the error rate is, it also depends on what is affected. Chances are it's not something that causes a system crash.
It seems like an immeasurably small chance of improved stability. It makes some (read: still only a tiny bit of) sense on servers where you're crunching terabytes of data constantly, but on workstations maybe the only thing that gets close is high-end graphics rendering or video processing. I ran memtest86 several times overnight without any errors. That's how often a memory flip occurs. If lives depend on it, that would justify using ECC; otherwise I don't think this is a real issue.
If ECC was so important, so critical to the reliable function of computers, why isn't it built in to every desktop, laptop, and smartphone in the world by now?
Why is it optional? This smells awfully… enterprisey to me. I am not anti-insurance, nor am I anti-ECC. First, let's look at the Puget Systems reliability stats.
These guys build lots of commodity x86 gamer PCs, burn them in, and ship them. They helpfully track statistics on how many parts fail either from burn-in or later in customer use. Go ahead and read through the stats. For the last two years, CPU reliability has dramatically improved.
At the time we theorized that this should raise CPU failure rates, since there are more components on the CPU to break, but the data shows that it has actually increased reliability instead. Even though DDR4 is very new, reliability so far has been excellent. SSD reliability has dramatically improved recently. Modern commodity computer parts from reputable vendors are amazingly reliable. And their trends show that essential PC parts have gotten more reliable over time, not less. And doesn't this make sense from a financial standpoint?
How does it benefit you as a company to ship unreliable parts? That's money right out of your pocket and the reseller's pocket, plus time spent dealing with returns. We had a, uh, "spirited" discussion about this internally on our private Discourse instance. This is not a new debate by any means, but I was frustrated by the lack of data out there. In particular, I'm really questioning the difference between "soft" and "hard" memory errors:
But what is the nature of those errors? Are they soft errors — as is commonly believed — where a stray Alpha particle flips a bit? Or are they hard errors, where a bit gets stuck? I absolutely believe that hard errors are reasonably common.
I've seen it plenty. But a soft error where a bit of memory randomly flips? There are two types of soft errors, chip-level soft error and system-level soft error. Chip-level soft errors occur when the radioactive atoms in the chip's material decay and release alpha particles into the chip. Because an alpha particle contains a positive charge and kinetic energy, the particle can hit a memory cell and cause the cell to change state to a different value.
The atomic reaction is so tiny that it does not damage the actual structure of the chip. Outside of airplanes and spacecraft, I have a difficult time believing that soft errors happen with any frequency, otherwise most of the computing devices on the planet would be crashing left and right.
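The soft/hard distinction above can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: `SimulatedMemory` and `pattern_test` are hypothetical names invented here, and the pattern test is a simplified stand-in for the kind of write-then-read check a memory tester performs, not memtest86's actual algorithm. In this model a stuck bit (hard error) fails every time it is tested, while a one-off flip (soft error) corrupts the value it hits and then leaves no trace once the cell is rewritten.

```python
class SimulatedMemory:
    """Toy byte-addressed memory with optional stuck-at faults (hard errors)."""

    def __init__(self, size):
        self.cells = bytearray(size)
        self.stuck = {}  # address -> (bit index, stuck value)

    def add_stuck_bit(self, addr, bit, value):
        """Mark one bit of one cell as permanently stuck at 0 or 1."""
        self.stuck[addr] = (bit, value)

    def _apply_stuck(self, addr, byte):
        # Force the stuck bit (if any) to its stuck value.
        if addr in self.stuck:
            bit, value = self.stuck[addr]
            if value:
                byte |= 1 << bit
            else:
                byte &= ~(1 << bit) & 0xFF
        return byte

    def write(self, addr, byte):
        self.cells[addr] = self._apply_stuck(addr, byte & 0xFF)

    def read(self, addr):
        return self._apply_stuck(addr, self.cells[addr])

    def soft_flip(self, addr, bit):
        """Transient upset: flip the stored bit once; a later write overwrites it."""
        self.cells[addr] ^= 1 << bit


def pattern_test(mem, size, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern to every cell, read it back, return failing addresses."""
    bad = set()
    for pattern in patterns:
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            if mem.read(addr) != pattern:
                bad.add(addr)
    return bad


mem = SimulatedMemory(16)
mem.write(3, 0x2A)
mem.soft_flip(3, 0)            # soft error: the stored 0x2A silently becomes 0x2B
corrupted = mem.read(3)
mem.add_stuck_bit(9, 4, 1)     # hard error: bit 4 of cell 9 is stuck at 1
failing = pattern_test(mem, 16)
print(hex(corrupted), sorted(failing))  # prints: 0x2b [9]
```

In this model, a clean overnight pattern test says something about hard errors but nothing about whether a transient flip happened earlier, which is why it cannot settle the soft-error question on its own.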
I deeply distrust the anecdotal voodoo behind "but one of your computer's memory bits could flip, you'd never know, and corrupted data would be written!"
However, because of the extra processing required on the RAM chips, ECC may have a slight impact on memory performance. But with users prioritising the error minimisation and maximum uptime that ECC RAM provides, even if it does come with a minor performance hit, this is hardly a major issue.
The slight performance advantage that comes with non-ECC memory over ECC memory is outweighed by the potential risks of a harmful single-bit error occurring. For business-critical server applications, ECC memory is often well worth it: the short answer is yes. Not every error matters, but on a server handling sensitive customer details or financial transactions, even a single error holds the potential for catastrophe.
To protect against financial loss caused by corrupted data, or reputational damage caused by downtime in the aftermath of a system failure, ECC RAM is highly recommended for organisations that process large volumes of customer data online.
What are single-bit errors?
The causes of single-bit errors come in two main flavours:
Hard single-bit errors are caused by physical factors like temperature or power variation, and stress on the hardware.
Soft single-bit errors result from factors that are harder to observe, such as magnetic interference and even cosmic rays.
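To show how an error-correcting code can repair a single flipped bit, here is a minimal sketch using the classic Hamming(7,4) code. This is an illustration chosen for brevity, not the scheme a real ECC module uses: ECC memory typically protects each 64-bit word with 8 extra check bits using a wider single-error-correct, double-error-detect code. The function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data bits, syndrome); syndrome is the 1-based position of a flipped bit, or 0."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
stored = hamming74_encode(data)
stored[4] ^= 1                      # simulate a single-bit error at position 5
recovered, position = hamming74_decode(stored)
print(recovered, position)          # prints: [1, 0, 1, 1] 5
```

The plain Hamming(7,4) code shown here can only correct one flipped bit per codeword; two simultaneous flips would be miscorrected, which is why practical schemes add an extra parity bit to detect double errors.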