Intel confirms no recall for Raptor Lake CPUs, microcode won't fix affected units

josephcsible | 119 points

> Will Intel share specific manufacturing dates and serial number ranges for the oxidized processors so mission-critical businesses can selectively rip and replace?

> Intel will continue working with its customers on Via Oxidation-related reports and ensure that they are fully supported in the exchange process.

Intel is refusing to disclose serial number ranges of the fundamentally defective processors?

Follow-up question: How do owners of that series of CPU who suspect theirs is one of the defective units exchange it for a non-defective CPU before it fails?

neilv | a month ago

Interesting. I bought a 13900 (non-K) in mid-April this year for a new server build. It ran fine for a couple of weeks and then started randomly crashing. Having never had a CPU go bad on me before, and not having another one lying around to test with, it took me a long time to figure out what the issue was. Finally, by the end of May, I had ruled everything else out and RMA’d it. The system has been running fine ever since.

I assumed I had just got a bad unit. Now I’m wondering if this might have been the cause.

jpatters | a month ago

Prediction: Intel is stalling the recall until after the earnings report to avoid tanking the stock.

Glad I switched to using AMD, although there is some RUMINT indicating quality assurance troubles with the 9000 series. They were supposed to push out the new product by the end of July, but it was delayed to mid-August.

xyst | a month ago

The headline isn't clear, but the claim so far is that the microcode update will fix any CPUs that haven't begun exhibiting instability. Nothing can fix the ones that are already broken. For those that are somewhere in between, hope it fails within the warranty period I guess.

abracadaniel | a month ago

This article is just quoting the original source of this information, which would be a much better link: https://www.theverge.com/2024/7/26/24206529/intel-13th-14th-...

Dunedan | a month ago

I've only ever bought Intel, mostly because I perceived them to be more stable and reliable. I think next time, I will give AMD a try.

MaximilianEmel | a month ago

Given that some reports put failure rates as high as 50% for some models/conditions this may as well be a recall.

Havoc | a month ago

What a mess. Intel should be out in front of this, but they are going to kick the can down the road and hope some of the problem goes away.

No system integrator has the infrastructure to handle replacing vast numbers of CPUs, let alone entire systems where the CPU is soldered to the mainboard and cannot be replaced.

Also, there's no way this problem suddenly snuck up on them without warning. They had customers returning massive numbers of them, but Intel kept selling defective units instead of stopping production. They absolutely knew about this as soon as problems started popping up.

A microcode patch may make it stable, but if you have a dud, you have a dud. There's no way to patch around defects in manufacturing of this nature. Microcode can reduce frequencies, but a defective part is still defective.

How PG (Pat Gelsinger) still has a job is beyond me.

AmVess | a month ago

I’m out of the loop. If I bought a 13700KF last week, is there anything I need to know or do? BIOS update when one is available soon?

Waterluvian | a month ago

I feel like the language is intentionally vague and intended to link the voltage issue with the oxidation issue. Either way, I would not feel comfortable knowing my chip may randomly become unstable well before its expected end of life.

beart | a month ago

I just wanna say that the AMD / Intel battle is cyclical, and each of them will have its downturns.

What people ought to do is undervolt the chip by at least 0.05 V and reduce the clock speed.

cametosay | a month ago

Blacklisting Intel. We don't need them anyways, AMD already has plenty powerful chips that don't come broken.

Intel lost their engineering abilities years ago. Got eaten by sharks.

devwastaken | a month ago

Other than the chips that were significantly damaged before the updated microcode, why should this be treated differently from the Meltdown saga? Fixes for that required significant slowdowns, except in niche applications with trustworthy code or air-gapped systems where the vulnerability could be ignored. No recall. Class actions are ongoing, but everyone's mostly forgotten about them.

More conservative voltages will only lead to small (low single-digit) performance decreases, right? Isn't that a smaller performance hit than the Meltdown countermeasures? The only way I can imagine performance really tanking would be if Intel also has to severely down-rate supported memory speeds, because the ring bus can't handle higher memory speeds at slightly reduced voltages.

harshreality | a month ago

Maybe off topic, but I am old enough to remember when AMD was considered inferior and to be avoided. It is one point in the famous "Is Your Son a Computer Hacker?" article [1]:

> If your son has requested a new "processor" from a company called "AMD", this is genuine cause for alarm. AMD is a third-world based company who make inferior, "knock-off" copies of American processor chips

[1] https://gwern.net/doc/cs/security/2001-12-02-treginaldgibbon...

elashri | a month ago

The only thing keeping me on Intel (for my homelab) is QuickSync. If Plex supported AMD’s equivalent I’d drop Intel in a heartbeat.

joshstrange | a month ago

If this does not justify a recall, what would then?

mrjin | a month ago

Will multi-CPU systems be resilient, or will they fail if one CPU dies?

flemhans | a month ago

I've seen multiple rumor-style explanations to this issue, including:

  - it may have to do with oxidation of metallic copper deposited inside through-silicon vias (TSVs)
  - it may have to do with an improper connection between the die and the substrate (that hard green plate), causing higher resistance and temperature somewhere, which is, by the way, the true limiting factor for socket Tjmax
  - it could have to do with the VCORE values used to meet the performance target being too high for the on-die ring bus logic, including L1
  - it is possibly related to the big-little heterogeneous configuration and how voltages for the big cores and the little cores/uncore/IO are generated
  - I've heard they had an HVAC outage in one of their US fabs and it ruined some dies
  - Yeah, they just flew too close to the Sun, frankly
  - ...

I have absolutely no skin in this game, and my question is: are there any more plausible technobabbly stories around? It all sounds intriguing to me. Some of it could be correct or relevant.

numpad0 | a month ago

I'm starting to want the FTC to get involved with this

navjack27 | a month ago