Ahrefs 15TB SSDs Failure Rate Statistics 2022 Q4, 2023 Q1&Q2

Efim Mirochnik, Ahrefs
Aug 2, 2023

At the end of last year, we published Annualized Failure Rate (AFR) statistics for our large-capacity SSDs. We have added three full quarters of data since then. During this period, we launched a new data center in Virginia, US, where hundreds of new servers went into operation. This added about 5000 new Kioxia CD6-R drives to our review. Let's look at the stats and discuss the complications this addition brings.
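This post does not spell out how AFR is computed. A common definition (used, for example, in Backblaze's drive stats) divides the number of failures by the drive-years of operation in the period. The sketch below assumes that definition; the function name and the example numbers are hypothetical and are not taken from our tables.

```python
# Minimal AFR sketch, assuming the common drive-days definition.
# The exact formula behind the tables in this post is not stated here.

DAYS_PER_YEAR = 365


def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Return AFR in percent: failures per drive-year of operation, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / DAYS_PER_YEAR
    return failures / drive_years * 100.0


# Hypothetical example: 1000 drives running through a 91-day quarter, 2 failures.
afr = annualized_failure_rate(failures=2, drive_days=1000 * 91)
print(f"{afr:.2f}%")  # 0.80%
```

Counting drive-days rather than drives means a drive installed halfway through a quarter only contributes half a quarter of exposure.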

SSD AFR in 2022 Q4
SSD AFR in 2023 Q1
SSD AFR in 2023 Q2
Combined SSD AFR from 2021 Q4 till 2023 Q2

Here is the quarterly AFR graph from 2021 Q4 till 2023 Q2. In theory, over the long term the graph should resemble a bathtub curve. However, it seems our drives are still not old enough for the right side of the curve to start going up.

Combined SSD AFR from 2021 Q4 till 2023 Q2

You may notice a new column named 'Batch'. We added it because disks are not all installed at the same time. When a disk fails, we replace it, and the replacement starts at zero age because it has just been installed. Since our batches are relatively large and only a small number of drives are replacements, they don't affect the average age significantly.

But suppose we install a number of new drives comparable to, or even larger than, the number already in production. Such a large influx of new drives skews the statistics by pulling the average age toward zero.
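The effect described above is easy to see with a plain arithmetic mean. The sketch below uses a hypothetical batch of 1000 drives, all 18 months old; these numbers are illustrative only and do not come from our fleet.

```python
from statistics import mean

# Hypothetical batch (illustrative numbers): 1000 drives, all 18 months old.
ages_months = [18.0] * 1000
before = mean(ages_months)

# Ten drives fail and are replaced; each replacement starts at age 0.
ages_months = [0.0] * 10 + ages_months[10:]
after = mean(ages_months)
print(f"few replacements: {before:.2f} -> {after:.2f} months")  # 18.00 -> 17.82

# Contrast: a new batch five times larger than the existing one joins the fleet.
fleet = [18.0] * 1000 + [0.0] * 5000
with_new_batch = mean(fleet)
print(f"large new batch: {with_new_batch:.2f} months")  # 3.00
```

A handful of replacements barely moves the average, while a large fresh batch drags it toward zero, which is why a per-batch view becomes necessary.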

We run about 1000 Kioxia CD6-R drives in our Singapore data center. Roughly 5000 more of the same model were added at the new US location by the end of 2023 Q1. Such a massive addition would completely distort the stats on the graph: the average age of the CD6-R drives would drop from 16.5 months to 5.9 months in 2023 Q2. So we split the Kioxia drives into two batches, SG1 and US1. We'll keep tracking these batches separately and plot them on separate graphs, while all other drives remain in a single batch.
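As a back-of-the-envelope check, the combined average age is a drive-count-weighted mean of the batch averages. The sketch below assumes that 16.5 months is the SG1-only average age in 2023 Q2 and 5.9 months is the combined SG1+US1 average, and it uses the approximate counts of 1000 and 5000 drives. The implied US1 age is therefore derived from rounded figures, not a measured value, and the helper names are hypothetical.

```python
def combined_average_age(batches):
    """batches: list of (drive_count, average_age_months) pairs."""
    total_drives = sum(count for count, _ in batches)
    total_drive_months = sum(count * age for count, age in batches)
    return total_drive_months / total_drives


def implied_batch_age(combined_age, known_count, known_age, other_count):
    """Solve the weighted mean for the unknown batch's average age."""
    total = known_count + other_count
    return (combined_age * total - known_count * known_age) / other_count


# Approximate counts from the text: ~1000 SG1 drives and ~5000 US1 drives.
us1_age = implied_batch_age(combined_age=5.9, known_count=1000,
                            known_age=16.5, other_count=5000)
print(f"implied US1 average age: {us1_age:.2f} months")  # 3.78
print(f"combined: {combined_average_age([(1000, 16.5), (5000, us1_age)]):.1f} months")  # 5.9
```

Because US1 outnumbers SG1 five to one, its young age dominates the mean, which is the distortion that splitting the batches avoids.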

Longer-term results overview

Since the beginning of the review, the Samsung PM1733 has shown the best result, with an AFR of just 0.13%. The Micron 9300 is close behind at 0.14%, despite having many more drives and a much longer time in service. These good results may be explained by the fact that both models are positioned as enterprise-grade series.

The results for the data center-grade drives from the same brands are different, though. While the Samsung PM9A3 has a decent 0.36% failure rate, the Micron 7450 shows a rather worrying AFR of 0.97%. I would consider such a high failure rate more typical of an HDD than an SSD. Moreover, the Micron 7450 graph shows no downward trend in this rate. In fact, Micron drives reached an even higher AFR of 1.67% in the first 27 days of July 2023. We have contacted Micron about this and will work with them on the issue.

Overall, we have seen 79 drive failures over 7 quarters of review. That is a little more than 11 drives per quarter, or fewer than one failure per week. I would prefer the failure rate to be even lower, but even the current number of failures doesn't look too burdensome compared to the HDD era.
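The per-quarter and per-week figures above follow from simple division. This sketch assumes 13 weeks per quarter (52 weeks / 4), so 7 quarters is treated as 91 weeks.

```python
FAILURES = 79
QUARTERS = 7
WEEKS_PER_QUARTER = 52 / 4  # assumption: 13 weeks per quarter

per_quarter = FAILURES / QUARTERS
per_week = FAILURES / (QUARTERS * WEEKS_PER_QUARTER)
print(f"{per_quarter:.1f} failures per quarter, {per_week:.2f} per week")
# 11.3 failures per quarter, 0.87 per week
```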

Looking forward, we expect to add even bigger 30.72TB drives to the statistics.

One of the drives in our new US data center
