SMART reporting and Hard Disk buzzing sound


Vanadium 50

Staff Emeritus
Science Advisor
Education Advisor
I have a drive that makes a buzzing sound every few minutes. Here's what SMART is telling me.

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   071   071   016    Pre-fail  Always       -       61213753
  2 Throughput_Performance  0x0005   139   139   054    Pre-fail  Offline      -       71
  3 Spin_Up_Time            0x0007   160   160   024    Pre-fail  Always       -       399 (Average 317)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       89
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       87
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
Amazingly, here's the SMART health report:

Code:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
How it thinks this drive is healthy with 61 million read errors (since Tuesday) is beyond me.

I'm doing a surface scan of its replacement now. I hope it finishes and is good before this one gives up the ghost. Does anyone think I am being too paranoid?
 

Vanadium 50

Toshiba. It's a DT01ACA300.
 

Vanadium 50

It is an internal drive: 3.5", 3 TB, SATA-3. It is the oldest of the four drives in the array, and has January 2014 on it. 24,799 power-on hours.

Oddly, the Raw_Read_Error_Rate dropped to zero - but is creeping upward again. It's at 28 now.
 
441
93
I wouldn't worry about it. Raw_Read_Error_Rate indicates the rate of errors during sector read operations. There are always some errors when reading, and these are handled by the drive's error-correction mechanisms. The RAW_VALUE field is supposed to be the number of read errors, but its format is vendor-specific (Seagate drives, for example, report a large composite number), so you can safely ignore this value.
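To illustrate how a raw value can look enormous without being an error count: for Seagate drives, the 48-bit raw Raw_Read_Error_Rate is widely reported by users (not in vendor documentation) to pack an error count into the upper 16 bits and a read-operation count into the lower 32 bits. The sketch below splits a value that way. The layout is an assumption, the helper name is hypothetical, and nothing here implies a Toshiba DT01ACA300 uses this layout; it only shows why a packed field reads as a huge decimal number.

```python
# Hypothetical helper: split a 48-bit SMART raw value assuming the
# community-reported Seagate layout (upper 16 bits = error count,
# lower 32 bits = operation count). This layout is NOT vendor-documented.
def split_packed_raw(raw: int) -> tuple[int, int]:
    if not 0 <= raw < (1 << 48):
        raise ValueError("SMART raw values are 48 bits wide")
    errors = raw >> 32             # upper 16 bits
    operations = raw & 0xFFFFFFFF  # lower 32 bits
    return errors, operations


if __name__ == "__main__":
    # 61213753 fits in 32 bits, so under this assumed layout the
    # "error" part (upper 16 bits) would be zero.
    print(split_packed_raw(61213753))
```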

The important thing here is to compare the Worst field with the Thresh field. Worst is the lowest normalized value the attribute has ever reached; if it drops to or below Thresh, the drive is considered failed.

https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
"(Vendor specific raw value.) Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number."
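The Worst-versus-Thresh comparison can be sketched as a small parser for `smartctl -A` style tables. This assumes the column order shown in the tables in this thread; the function names and status strings are hypothetical, and treating a threshold of 0 as "cannot fail" follows how such informational counters are usually handled.

```python
# Sketch: classify rows of `smartctl -A` style output by comparing the
# normalized VALUE and WORST columns against THRESH. Column order is assumed
# to match the tables posted in this thread.
from dataclasses import dataclass


@dataclass
class Attribute:
    id: int
    name: str
    value: int   # current normalized value
    worst: int   # lowest normalized value ever recorded
    thresh: int  # vendor failure threshold
    raw: str     # vendor-specific raw value, kept as text


def parse_attributes(table: str) -> list[Attribute]:
    attrs = []
    for line in table.splitlines():
        # maxsplit=9 keeps raw values such as "399 (Average 317)" intact
        fields = line.split(maxsplit=9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or non-attribute line
        attrs.append(Attribute(int(fields[0]), fields[1], int(fields[3]),
                               int(fields[4]), int(fields[5]), fields[9]))
    return attrs


def status(a: Attribute) -> str:
    if a.thresh == 0:
        return "no threshold"  # informational attribute, treated as never failing
    if a.value <= a.thresh:
        return "failing now"
    if a.worst <= a.thresh:
        return "failed in the past"
    return f"ok (headroom {a.worst - a.thresh})"


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   071   071   016    Pre-fail  Always       -       61213753
  3 Spin_Up_Time            0x0007   160   160   024    Pre-fail  Always       -       399 (Average 317)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       89
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       87
"""

if __name__ == "__main__":
    for attr in parse_attributes(SAMPLE):
        print(attr.name, "->", status(attr))
```

On the drive in this thread, every attribute with a threshold still has plenty of headroom, which is why the overall assessment reads PASSED.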


I suggest that you pull out the drive from the case and reseat it. Also reseat the SATA cable on both ends. See if that does anything to resolve the noise issue.
 

Vanadium 50

I suggest that you pull out the drive from the case and reseat it. Also reseat the SATA cable on both ends. See if that does anything to resolve the noise issue.
Done (it's in its own enclosure) and no difference.

While errors are normal, it seems to me that 61 million errors is excessive. Even if that number is meaningless, the other three drives are at 100 for Worst, and this drive is at 71. I'm also a bit concerned that 87 sectors have been reallocated since Tuesday. Reading every byte in use normally takes 2 hours, but on Tuesday it took 9. However, no data was lost (compared against the drive's mirror and the checksums).
 
441
93
The 61 million probably doesn't actually mean 61 million read errors. Remember, most vendors don't report this value, so it's probably just a meaningless number.

This is a 3 TB drive with 4K sectors, so we're talking about 87 sectors out of roughly 732 million. The surface scan you ran should mark those sectors as bad, and that should be the end of it. You're pretty far from the threshold for Reallocated Sectors.

Here's the output from one of my drives which is working just fine.
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   143   140   021    Pre-fail  Always       -       3825
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4454
  5 Reallocated_Sector_Ct   0x0033   147   147   140    Pre-fail  Always       -       417
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   026   026   000    Old_age   Always       -       54256
The reallocated count on this drive is 417. It's a 500 GB drive, so that's 417 out of roughly 122 million 4K sectors. This is still normal for a drive with over six years of power-on time. I should probably consider getting a replacement drive and keeping it on standby, because my Worst (147) is pretty close to my Thresh (140), but it's still fine.
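The sector arithmetic for both drives can be checked with a back-of-envelope sketch. It assumes decimal (marketing) capacities and 4 KiB sectors, since the drives' exact LBA counts are not given in the thread; the function names are hypothetical.

```python
# Back-of-envelope: what fraction of a drive's sectors have been reallocated?
# Assumes decimal capacities and 4 KiB sectors by default; exact LBA counts
# for these drives are not known here.
def total_sectors(capacity_bytes: int, sector_bytes: int = 4096) -> int:
    return capacity_bytes // sector_bytes


def reallocated_fraction(reallocated: int, capacity_bytes: int,
                         sector_bytes: int = 4096) -> float:
    return reallocated / total_sectors(capacity_bytes, sector_bytes)


if __name__ == "__main__":
    for label, count, capacity in [("3 TB", 87, 3 * 10**12),
                                   ("500 GB", 417, 500 * 10**9)]:
        n = total_sectors(capacity)
        frac = reallocated_fraction(count, capacity)
        print(f"{label}: {count} of {n:,} sectors = {frac:.2e}")
```

Both fractions come out at a few parts per million or less. Passing `sector_bytes=512` gives the count for a drive that uses 512-byte sectors instead.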

You said you had it back in its enclosure. This sounds to me like an external drive you're connecting over USB. Does the caddy you are using support USB3, and were you plugged into a USB2 port when you ran the test that took 9 hours? That would explain the roughly 4x longer scan, as USB2 is around 4 times slower than USB3.
 

Vanadium 50

The enclosure is SATA: it's an Icy Dock 4-drives-in-the-space-of-3 thing, and the nice thing is I get front access to the drives. The spare is going through badblocks now, and when it finishes, I'll swap it for the loud drive. Then I'll badblock the heck out of the suspect drive and, based on the output, decide whether I want to keep it.

The 9 hours was my weekly RAID verification. It's run weekly since April. It's typically 2-3 hours and runs Tuesday nights. My first sign that something was wrong was that in the morning it hadn't finished, and the LED for the drive in question was on solid. There were no errors the week before or in the previous weekly SMART long test. (I run a weekly SMART long test, a daily SMART short test, and a weekly RAID check-and-compare.)

Oh, and the drive is getting louder. And there's definitely a correlation between its LED and the sound.
 
441
93
This is definitely a red alert. Read it as long as you can and make a copy. If the head is already maladjusted ...
I also agree. If the drive is getting louder as it's spinning, then it's most likely approaching failure. Back up your data while you still have time.

The 9 hours was my weekly RAID verification. It's run weekly since April. It's typically 2-3 hours and runs Tuesday nights.
I'm curious, what is your RAID setup, and are you running the Intel RAID Volume Data Verify and Repair?
 

Vanadium 50

It's ZFS. I have four 3 TB drives, configured as two 3+3 mirrors.
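For a pool built from mirror vdevs like this one, usable capacity and guaranteed fault tolerance follow from simple rules: each mirror holds one copy's worth of data (limited by its smallest disk), and the pool survives only if every mirror keeps at least one disk. A minimal sketch, with a hypothetical function name:

```python
# Sketch: usable capacity and guaranteed fault tolerance of a pool built from
# mirror vdevs (ZFS spreads data across vdevs; each vdev mirrors its disks).
def mirror_pool_summary(vdevs: list[list[float]]) -> tuple[float, int]:
    # Each mirror stores one copy's worth of data, limited by its smallest disk.
    usable = sum(min(disks) for disks in vdevs)
    # A mirror can lose all but one disk. In the worst case every failure hits
    # the same (narrowest) mirror, so that mirror sets the guaranteed number.
    tolerated = min(len(disks) - 1 for disks in vdevs)
    return usable, tolerated


if __name__ == "__main__":
    # Four 3 TB drives as two two-way mirrors (sizes in TB).
    print(mirror_pool_summary([[3.0, 3.0], [3.0, 3.0]]))
```

For this layout that is 6 TB usable and one guaranteed drive failure; a second failure is survivable only if it lands in the other mirror.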
 
441
93
I suspect that the weekly verification of the checksums may have accelerated the drive failure. This is pretty heavy usage for consumer-grade hardware. I would suggest that you not run the checks so often. Drives these days are pretty reliable, and a mirrored array keeps a redundant copy to ensure the data is safe.
If the data is really important and you need the peace of mind, then perhaps run the checks once a month or once every other month, or invest in enterprise-grade hard drives that are meant to see heavy use.
 

Vanadium 50

If a drive can't stand up to 52 reads per year, that is a problem. After all, it holds my login area, so files like .bashrc get read many more than 50 times in a year.

I would much rather have a drive fail after a year in a way that the data is recoverable than have it last twice as long but lose data when it fails. Weekly scans protect against silent corruption. Besides, that's the general recommendation: weekly for consumer-grade disks, monthly for enterprise-grade ones, if possible. There are some very large pools out there, where even monthly scrubs would mean non-stop scrubbing. And to be fair, the system worked as intended: it alerted me to a probably failing drive in time to do something about it.
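The "non-stop scrubbing" point is just the ratio of scrub duration to the interval between scrubs. A minimal sketch (hypothetical function name; the 30-day scrub is an illustrative assumption, not a figure from the thread):

```python
# Fraction of wall-clock time a pool spends scrubbing: scrub duration divided
# by the interval between scrubs. A value of 1.0 or more means it never stops.
def scrub_duty_cycle(scrub_hours: float, interval_days: float) -> float:
    if interval_days <= 0:
        raise ValueError("interval must be positive")
    return scrub_hours / (interval_days * 24)


if __name__ == "__main__":
    # A 2-3 hour weekly scrub, as in this thread, uses roughly 1-2% of the week.
    print(f"{scrub_duty_cycle(2.5, 7):.1%}")
    # Hypothetical huge pool whose scrub takes 30 days: monthly is non-stop.
    print(f"{scrub_duty_cycle(30 * 24, 30):.1%}")
```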

The drive has been swapped, and it is resilvering now. This takes 2-3 hours. The original disk was hot when I took it out. Not warm like the other disks: hot. Like cookies fresh from the oven. I'm going to run a couple of read/write badblocks tests on it, and if it looks mostly OK, I may keep it around as an emergency spare, but right now I doubt that will be possible.
 

Vanadium 50

Update: the drive has been replaced. I erased the original drive and it reallocated another sector. It's much quieter in the external USB "toaster".

In its normal position, it was running at about 65 C. The other three drives (and now the replacement) range from about 38 to 41 C. In the external USB adapter, drives are vertical and have air on all four sides. Drives run at about 25 C at idle and in the high 30s under heavy load. The questionable drive idles at 40 C and runs at around 48-51 C under load. My conclusion is that something mechanical in the drive likely has more friction than it should, and it's only a matter of time before it goes. The immediate symptoms are second-order effects.
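The comparison made here, one drive running far hotter than its neighbours in the same enclosure, can be automated as a simple peer check: compare each drive with the median of the other drives. A minimal sketch; the 10 C margin and the drive labels are arbitrary assumptions for illustration.

```python
# Sketch: flag a drive whose temperature stands out from its neighbours in the
# same enclosure by comparing it with the median of the *other* drives.
# The default 10 C margin and the labels below are arbitrary examples.
from statistics import median


def hot_outliers(temps_c: dict[str, float], margin_c: float = 10.0) -> list[str]:
    flagged = []
    for name, temp in temps_c.items():
        others = [t for other, t in temps_c.items() if other != name]
        if others and temp - median(others) > margin_c:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Readings reported above: suspect drive ~65 C, the others 38-41 C.
    bay_temps = {"suspect": 65, "bay2": 38, "bay3": 40, "bay4": 41}
    print(hot_outliers(bay_temps))
```

Excluding each drive from its own baseline keeps a single very hot drive from dragging the reference temperature up.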

Oh, and no data loss. Swapping the drives and rebuilding the RAID was a 5-minute job, plus the time it took to rebuild.
 

russ_watters

Mentor
How it thinks this drive is healthy with 61 million read errors (since Tuesday) is beyond me.
It doesn't say "healthy", it says "passed" and "pre-fail". That's a C- in my book...
Anyone thin[k] I am being too paranoid?
Nope. I'm very paranoid about hard drive failures, and I think justifiably. A failed hard drive is the data equivalent of burning down your house: it's not about the money, it's about the potentially irreplaceable things you lose (if not properly backed up).

Since we're on the subject of failing hard drives, I'm going to whine a bit about my Crucial M4 SSD (again). After 6 months or so of being installed in my laptop, it turned into a brick for no apparent reason. Google informed me that it had a bug that made it brick after a certain number of hours of use (a counter overflow or something). Fixed with a firmware flash. Awesome. But then I found that if my laptop ever locked up and had to be hard reset for any reason, it would brick again. Google informed me that this was an "issue" with the drive's fault response system and could be recovered with a cumbersome series of 20-minute power cycles. Crucial didn't consider this to be a problem worthy of recall (since if it comes back to life it isn't really dead, right?), and since it was expensive, I put it into a media center PC. Well, last night it crashed again. I recovered it, but still, it is really annoying. [/rant]
 

Vanadium 50

I've had good luck with Crucial - or rather, I had some bad luck, but the company really made it right. It was failing main memory, and the replacement memory was also failing. It tested fine with one stick, but any more and it failed. Randomly. On different motherboards. They sent me a big pile of memory and asked me to find a pair that worked and send the rest back. Oh, and they sent me a memory fan as well.

Kingston on the other hand...I had a 128 GB SSD that failed. Eventually they agreed to replace it, and they replaced it with a 120 GB SSD. Their position was, "hey, close enough. Like it or lump it."
 
441
93
I've had absolutely rotten luck with Crucial and Samsung SSDs. 100% failure rate for me. I've tried 3 different ones over the last year, and all failed within a month or two. One other person at work, whom I helped swap a spinning 2.5" drive for an SSD, also had it fail pretty fast. I finally gave up and went for a RAID5 with spinny disks. I obviously had a run of really bad luck, but I really wish I could someday work up the courage to try an SSD RAID, lol.
 

Vanadium 50

Oddly, I have had very good luck with a Samsung SSD 840 EVO.

I've hammered on it pretty hard. I've written 24.50 TB to it. 22204 power on hours. No problems ever.

Maybe I had good luck because it's an @Evo.
 

russ_watters

Oddly, I have had very good luck with a Samsung SSD 840 EVO.
I had a Samsung SSD 840 Pro until a few minutes ago, when it turned into a brick. [sigh] Windows gave me the old "configuring windows, do not turn off your computer" troll. After two hours, I turned it off, and now the SSD won't detect.
 
