Are all SSDs Unreliable?


by russ_watters
Tags: ssds, unreliable
russ_watters
#1
Feb13-14, 03:45 PM
Mentor
P: 22,008
I have a Crucial M4 SSD that is 2.5 years old and still under warranty. The drive has a nasty habit: if it loses power, it disappears from my laptop. Crucial has a procedure for bringing it back to life prominently displayed on its support site:

1. Remove the drive from my laptop
2. Connect it to the power of my desktop for 20 minutes.
3. Remove power and wait 30 seconds.
4. Repeat step 2.
http://forum.crucial.com/t5/Solid-St...tem/ta-p/65215

The procedure works. Unfortunately, the result is often a corrupted Windows installation. Crucial has issued firmware updates attempting to mitigate the issue, but if anything it appears to be getting worse: it has happened 3 times in the past 4 months or so, and the drive has only been in my system for half of that time!

If you google "ssd disappear", the majority of the hits you get are about this issue on the M4.

Crucial won't replace it because they say it is normal(!?):
Justin _M : It is normal for an SSD to need to power cycle after a sudden power failure, but if it's locking up just after normal use that would be a cause of concern.

Russell Watters: I disagree. No computer is perfectly stable and most occasionally lock-up/crash. I've never owned another hard drive that required such a procedure to bring it back to life.

Russell Watters: I'm lucky enough to be computer savvy enough to know how to replace a hard drive (which I suppose is normal for a replacement HD buyer), otherwise I'd be screwed!

Justin _M : It's just due to the nature of the caching technology that this can happen when the computer crashes. You shouldn't need to reinstall Windows each time this happens, but the power cycle will restart the garbage controller back in the SSD so it can get back to work on actively maintaining itself.

Russell Watters: I still disagree and point out that Crucial wouldn't be trying to address the issue with firmware fixes if it was actually normal. I'll certainly check to see if this is common to other vendors' drives (note, I already own another drive from another manufacturer and have yet to have this happen). Crucial needs to fix its caching technology.

Russell Watters: In any case, I'm not really here to argue: are you telling me Crucial will not under any circumstances replace a drive that is displaying the "disappear" problem?

Justin _M : That's not entirely true, but if the troubleshooting fixes it, I wouldn't want to send out a new drive to you to make you think it will be different, when this is the nature of the SSD as a whole.
[emphasis added]
Digging further, though, they may not be wrong, but that is not an easy question to answer. Googling "kingston ssd disappear" turns up, aside from links where people say they are replacing their disappeared Crucial M4 with a new Kingston, some posts that describe the same issue happening with Kingston drives:
http://www.hardwarecanucks.com/forum...y-what-do.html
This involved a SandForce controller, and the forums suggest it was a firmware bug that was eventually fixed, but that nevertheless took down the company:
http://en.wikipedia.org/wiki/SandForce

But the M4 doesn't use the SandForce controller.

Worse, this paper implies that Crucial may be correct:
...but a report from the 11th Usenix Conference on File and Storage Technologies (FAST 13), given early this year, suggests most models have a fundamental problem with sudden power loss. While the paper came out in mid-February, I only recently came across it, after a reader asked if I'd look into a rather puzzling recovery program recommended by Crucial for its M4 SSD line....

Baffled, I began to poke at this further, then stumbled across the aforementioned report from early this year.

Researchers working with the University of Ohio rounded up 15 different SSDs from five different vendors, as well as a brace of HDDs, and put them through a series of tests designed to measure how they responded to sudden power failures. No vendors are identified, but the drives in question incorporate both MLC and SLC. Some (the SLC versions) are explicitly enterprise drives. Some include supercapacitors, which are designed to mitigate catastrophic power failure.

Of the 15 drives (10 different models, from five vendors), only one drive model, from one vendor, had no failures of any sort. One device failed completely (SSD #1), while one-third of SSD #3 became unusable due to metadata corruption. The other SSDs all exhibited various types of data corruption when they unexpectedly lost power, including the high-end enterprise SSDs with SLC NAND and supercapacitors. According to the research team, part of the problem is that virtually none of the devices actually behave as expected under fault conditions. While all the drives claim to use ECC RAM, for example, many exhibited single-bit errors of the kind that ECC is meant to prevent. While one of the two included hard drives also developed errors, the HDDs are both far cheaper and showed no sign of the disastrous failures that characterized the SSDs.
[emphasis added]
http://www.extremetech.com/computing...ling-your-ssds

I quote/link the article about the white paper instead of the paper itself because the paper is a bit over my head. I really don't know what to do here. I have a $400 SSD that I'd rather not have to throw in the trash, but I would also rather not waste a day reinstalling Windows every time a minor issue causes it to disappear and become corrupted.

Anyone have experience with this issue? Comments? Recommendations?
Chronos
#2
Feb13-14, 05:30 PM
Sci Advisor
PF Gold
P: 9,185
I'm at a disadvantage. I have a 256 GB Samsung SSD [~$200] and have not experienced such an issue. It comes with data migration software that can be used to clone the OS from a hard drive, which is considerably less aggravating than a fresh install.
Ben Niehoff
#3
Feb13-14, 05:58 PM
Sci Advisor
P: 1,563
I had this problem with a drive, but my solution was much simpler. What I did was:

1. Power on the laptop and press whichever key enters the BIOS setup screen.

2. Let it sit in BIOS, plugged in, for about 10 minutes. (This way, the drive is getting power, but is not in use).

At this point, either the drive shows up in BIOS (yay!), or it will after rebooting.

It turns out, however, that my drive was having other problems. It would die while my laptop was in sleep mode, even if my laptop was plugged in. I forget the exact details, but since I was using Linux, I would see errors in dmesg about the drive.

I asked Corsair to RMA the drive, and they did so quickly and painlessly. I have not noticed any problems since*. So you may just have a bad drive; I'd contact Crucial while you're still under warranty.

* However, I received the RMA around the same time I got my Surface Pro 2, so I haven't used that laptop as much since then either. In fact, since it has been sitting with no power for a few months now, I can go turn it on later and let you know if there is any issue with the drive booting up.


Some further suggestions: Do you have any regular method of data backup? I built myself a NAS server in mirror mode and I regularly copy things there, so generally speaking I am not concerned if the drive on my laptop fails, or if its OS gets corrupted, except for the temporary inconvenience.

For even more peace of mind, you can use Clonezilla to make a clone of your OS drive. I can tell you from experience that it works beautifully. While my SSD was being RMA'd, I cloned the OS to my old HDD and continued to use the laptop, and then cloned it back to the new SSD. You should use GParted to shrink your partitions slightly (it can shrink Windows partitions safely, including moving "immovable" files), because Clonezilla cannot clone a drive to a *smaller* drive, and not all 250 GB drives contain exactly the same number of bytes.

Edit to correct: My SSD is a Corsair, not Crucial. I was very impressed with Corsair's handling of the issue.

russ_watters
#4
Feb13-14, 06:54 PM
Mentor
P: 22,008

Quote by Chronos:
I'm at a disadvantage. I have a 256 Samsung SSD [~$200] and have not experienced such an issue.
Have you ever had to shut down your computer by holding down the power button due to a lock-up or failed shutdown?
It has data migration software that can be used to clone the OS from a hard drive, which is considerably less aggravating than a fresh install.
Yeah, if I start using the SSD again, I'm going to do something like that.
AlephZero
#5
Feb13-14, 07:32 PM
Engineering
Sci Advisor
HW Helper
Thanks
P: 6,386
It's interesting the way usage of SSDs has changed over time. Back in the early days of SSDs on supercomputers, nobody would have even thought about using them for "permanent" file storage. They were strictly for fast access to scratch files, and/or as another level of memory paging where the application could pre-fetch the data it knew would be needed next, rather than letting an OS's virtual memory logic keep trying to play catch-up with the CPU.

I don't have any experience either way with modern "consumer level" SSDs, though. But I suspect that for many users they are more about "my computer boots faster than yours" bragging rights than something actually useful, though the reduced power consumption and mechanical reliability are obviously real benefits if you need them.
Ben Niehoff
#6
Feb13-14, 07:56 PM
Sci Advisor
P: 1,563
Mechanical reliability is a big deal to me. I lost an HDD once because I was using my laptop on an airplane. Once I realized what was happening, I shut it down and I was able to get most things off of it later before the drive became unusable. Luckily this happened on the way home; my trip would have been a disaster if it had happened on the way out.

This is actually the main reason I bought my SSD.
Chronos
#7
Feb14-14, 01:37 AM
Sci Advisor
PF Gold
P: 9,185
Yes, I have had to hard boot many times, Russ. The Samsung has so far handled it with ease. Of course, that could change tomorrow.
Psinter
#8
Feb25-14, 01:15 AM
P: 72
Quote by russ_watters:
Anyone have experience with this issue? Comments? Recommendations?
I have a comment (useless, but I still want to comment). It's weird. When I go to Amazon I see many bad reviews for SSDs, while hard disks, which are cheaper, have far fewer negative reviews. Failing hard disks usually give warning signs and their data can often be recovered, but according to the reviews, SSDs appear to fail terminally in an instant, without warning. They also seem to need so many tricks to make them work. (I think things should just work, with no tricks like unplugging the power, waiting several minutes, and so on.)

SSDs are touted in the media as wonders of technology, so why do so many people say they die quickly? Why does the user have to unplug the power, wait, reinstall the operating system, and perform all sorts of other magic tricks with this technology? According to the reviews, they also appear not to last as long as hard disks. That's what has kept me from getting one. It really scares me to see so many people in those reviews saying their drives died within a few months, plus the fact that they have to do acrobatics and magic tricks with their computers to make them work.
Chronos
#9
Feb25-14, 01:56 AM
Sci Advisor
PF Gold
P: 9,185
The 'magic' of an SSD is its instant-on ability. It is also a critical weakness when it fails. It is not difficult to restore if you know what to do. The only 'trick' is to boot from a backup HDD with an uncorrupted OS. You can then clone the OS from the HDD to the SSD with suitable software.
rcgldr
#10
Feb28-14, 01:48 AM
HW Helper
P: 6,931
Part of the issue is the way SSDs try to distribute writes across the solid-state (SS) memory somewhat evenly (wear leveling), so they use a mapping scheme to map "logical" sectors onto "physical" sectors, and that map needs to be stored somewhere, usually also in the SS memory. If there's a power loss during a map update, a lot of data can be lost. I would assume they use some sort of self-archiving, like keeping dual maps and not reusing sectors removed from the previous map until the current map update has completed. I don't know how sophisticated the mapping schemes in current SSDs are.
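The dual-map idea described above can be sketched as a toy flash translation layer. This is a minimal illustration under stated assumptions, not how any real SSD firmware (Crucial's or anyone else's) works: the class name `ToyFTL`, the two map slots tagged with a sequence number and a CRC32 checksum, and the per-write wear counter are hypothetical simplifications (real NAND erases whole blocks, not single pages). It shows the two rules mentioned: the new map is always written over the older of two slots, and pages dropped from the map are not reused until that write completes, so a power loss mid-update falls back to the previous consistent map.

```python
import json
import zlib


class ToyFTL:
    """Hypothetical flash translation layer: maps logical sectors to physical pages."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages        # simulated flash pages
        self.erase_counts = [0] * num_pages    # toy wear counter per page
        self.map = {}                          # live logical -> physical map (in RAM)
        self.pending_free = set()              # pages replaced since the last commit
        self.map_slots = [None, None]          # two on-flash copies: (seq, payload, crc)
        self.seq = 0
        self.commit()

    def _free_pages(self):
        # Pages still referenced by the live map or the last committed map are off-limits.
        in_use = set(self.map.values()) | self.pending_free
        return [p for p in range(len(self.pages)) if p not in in_use]

    def write(self, lsn, data):
        free = self._free_pages()
        if not free:
            raise RuntimeError("no free pages; commit the map first")
        page = min(free, key=lambda p: self.erase_counts[p])  # wear leveling
        self.erase_counts[page] += 1           # simplification: count one erase per write
        self.pages[page] = data
        old = self.map.get(lsn)
        if old is not None:
            self.pending_free.add(old)         # keep the old copy until the map is committed
        self.map[lsn] = page

    def read(self, lsn):
        return self.pages[self.map[lsn]]

    def commit(self, power_fails=False):
        self.seq += 1
        payload = json.dumps(sorted(self.map.items()))
        checksum = zlib.crc32(payload.encode())
        slot = self.seq % 2                    # overwrite the older slot, never the newest
        if power_fails:
            # Simulate a torn write: only half the payload lands, so the checksum fails.
            self.map_slots[slot] = (self.seq, payload[: len(payload) // 2], checksum)
            return
        self.map_slots[slot] = (self.seq, payload, checksum)
        self.pending_free.clear()              # replaced pages may now be reused

    def recover(self):
        """After power loss, rebuild the map from the newest slot with a valid checksum."""
        valid = [s for s in self.map_slots
                 if s is not None and zlib.crc32(s[1].encode()) == s[2]]
        seq, payload, _ = max(valid, key=lambda s: s[0])
        self.map = {lsn: page for lsn, page in json.loads(payload)}
        self.seq = seq
        self.pending_free.clear()


ftl = ToyFTL(num_pages=8)
ftl.write(0, "A1")
ftl.commit()
ftl.write(0, "A2")             # the new copy goes to a different page
ftl.commit(power_fails=True)   # power is lost in the middle of the map update
ftl.recover()
print(ftl.read(0))             # A1
```

The trade-off is that writes made since the last successful map commit are lost after recovery, but the map never ends up pointing at pages that have been overwritten.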

Maybe someday the number of writes over the "lifespan" of SS memory will increase enough that a mapping scheme is no longer needed (or maybe some SSDs are already there?).
SixNein
#11
Feb28-14, 03:52 AM
PF Gold
P: 183
Quote by russ_watters:
Anyone have experience with this issue? Comments? Recommendations?
Change your settings for battery power.

http://www.dummies.com/how-to/conten...ows-7-or-.html

Increase the cutoff point so that the laptop doesn't run completely out of power.


An RMA may also be a good idea.

