SMART reporting and Hard Disk buzzing sound

  1. Dec 29, 2016 #1

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    I have a drive that makes a buzzing sound every few minutes. Here's what SMART is telling me.

    Code (Text):
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   071   071   016    Pre-fail  Always       -       61213753
      2 Throughput_Performance  0x0005   139   139   054    Pre-fail  Offline      -       71
      3 Spin_Up_Time            0x0007   160   160   024    Pre-fail  Always       -       399 (Average 317)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       89
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       87
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
    Amazingly, here's the SMART health report:

    Code (Text):
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    How it thinks this drive is healthy with 61 million read errors (since Tuesday) is beyond me.
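
    (For reference, readings like these come from smartmontools; /dev/sdb below is just a placeholder for the suspect drive.)

    Code (Text):
    smartctl -A /dev/sdb    # vendor attribute table (Raw_Read_Error_Rate etc.)
    smartctl -H /dev/sdb    # overall-health self-assessment ("PASSED"/"FAILED")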

    I'm doing a surface scan of its replacement now. I hope it finishes and is good before this one gives up the ghost. Anyone think I am being too paranoid?
     
  3. Dec 29, 2016 #2

    fresh_42

    Staff: Mentor

    Not me. Who manufactured the drive?
     
  4. Dec 29, 2016 #3

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    Toshiba. It's a DT01ACA300.
     
  5. Dec 29, 2016 #4

    fresh_42

    Staff: Mentor

    Thanks. Looks like an internal drive. They seem to be far less reliable than external ones (in my experience).
     
  6. Dec 29, 2016 #5

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    It is an internal: 3.5", 3 TB, SATA-3. It is the oldest of the four drives in the array, with a January 2014 date on it and 24,799 powered-on hours.

    Oddly, the Raw_Read_Error_Rate dropped to zero - but is creeping upward again. It's at 28 now.
     
  7. Dec 29, 2016 #6
    I wouldn't worry about it. Raw_Read_Error_Rate is an indicator of the rate of errors during sector read operations. There are always some errors when reading, and the drive's error-correction mechanisms deal with them. The RAW_VALUE field is nominally the number of read errors, but that count is only really reported by Seagate drives, so you can safely ignore this value.

    The important thing to compare here is the Worst field against the Thresh field. If the Worst value drops below the Thresh value, the drive is considered failed.

    https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
    "(Vendor specific raw value.) Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number."


    I suggest you pull the drive out of the case and reseat it. Also reseat the SATA cable at both ends. See if that does anything to resolve the noise issue.
     
  8. Dec 29, 2016 #7

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    Done (it's in its own enclosure) and no difference.

    While errors are normal, it seems to me that 61 million errors is excessive. Even if that raw number is meaningless, the other three drives are at 100 for Worst, while this drive is at 71. I'm also a bit concerned that 87 sectors have been reallocated since Tuesday. Reading every byte in use normally takes 2 hours, but Tuesday it took 9. However, no data was lost (compared against the data on the drive's mirror and the checksums).
     
  9. Dec 29, 2016 #8
    The 61 million probably doesn't actually mean 61 million read errors. Remember, most vendors don't report this value so it's probably just a meaningless number.

    This is a 3 TB drive with 4 KiB sectors, so we're talking about 87 sectors out of roughly 805 million. The surface scan you ran should mark those sectors as bad, and that should be the end of it. You're still a long way from the threshold for Reallocated_Sector_Ct.
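
    A quick back-of-the-envelope check of that proportion (this assumes 4 KiB sectors and 3 TiB of capacity; the drive's exact sector count differs a bit):

    Code (Text):
    echo $(( 3 * 1024**4 / 4096 ))                             # ~805 million 4 KiB sectors
    awk 'BEGIN { printf "%.6f%%\n", 87 / 805306368 * 100 }'   # ~0.00001% reallocated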

    Here's the output from one of my drives which is working just fine.
    Code (Text):
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0027   143   140   021    Pre-fail  Always       -       3825
      4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4454
      5 Reallocated_Sector_Ct   0x0033   147   147   140    Pre-fail  Always       -       417
      7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   026   026   000    Old_age   Always       -       54256
     
    The reallocated count on this drive is 417; it's a 500 GB drive, so that's 417 out of roughly 131 million sectors. This is still normal for a 4-5 year old drive. I should probably get a replacement drive and keep it on standby, because my Worst is getting close to my Thresh, but it's still fine.

    You said you had it back in its enclosure. This sounds to me like an external drive you're connecting over USB. Does the caddy you are using support USB 3, and were you plugged into a USB 2 port when you did the test that took 9 hours? That would explain the roughly 4x longer scan, since USB 2 is several times slower than USB 3 in practice.
     
  10. Dec 29, 2016 #9

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    The enclosure is SATA - it's an Icy Dock 4-drives-in-the-space-of-3 thing, and the nice thing is I get front access to the drives. The spare is going through badblocks now, and when it finishes, I'll swap it for the loud drive. Then I'll badblocks the heck out of the suspect drive and decide, based on the output, whether I want to keep it or not.
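
    A destructive read-write badblocks pass looks roughly like this (/dev/sdX is a placeholder, and -w erases everything on the device, so only run it on a drive whose data is expendable):

    Code (Text):
    badblocks -wsv -b 4096 /dev/sdX    # write-mode test, show progress, 4 KiB blocks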

    The 9 hours was my weekly RAID verification. It's run weekly since April. It's typically 2-3 hours and runs Tuesday nights. My first sign something was wrong was that in the morning it hadn't finished, and the LED from the drive in question was on solid. There were no errors the week before or from the previous weekly SMART long test. (I do a weekly SMART long, a daily SMART short, and a weekly RAID check and compare)
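
    In cron terms, that kind of schedule amounts to something like this (pool name "tank", device names, and times below are placeholders; smartd can also schedule the self-tests):

    Code (Text):
    # Daily SMART short self-test at 01:00
    0 1 * * *   /usr/sbin/smartctl -t short /dev/sdb
    # Weekly SMART long self-test, Sundays at 02:00
    0 2 * * 0   /usr/sbin/smartctl -t long /dev/sdb
    # Weekly ZFS scrub (the RAID verification), Tuesday nights at 23:00
    0 23 * * 2  /usr/sbin/zpool scrub tank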

    Oh, and the drive is getting louder. And there's definitely a correlation between its LED and the sound.
     
  11. Dec 29, 2016 #10

    fresh_42

    Staff: Mentor

    This is definitely a red alert. Read it while you still can and make a copy. If the head is already maladjusted ...
     
  12. Dec 29, 2016 #11
    I agree as well. If the drive is getting louder as it's spinning, then it's most likely approaching failure. Back up your data while you still have time.

    I'm curious, what is your RAID setup, and are you running the Intel RAID Volume Data Verify and Repair?
     
  13. Dec 29, 2016 #12

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    It's ZFS. I have four 3 TB drives, configured as two 3+3 TB mirrors.
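
    In zpool terms, that layout is roughly what a command like this creates ("tank" and the device names are placeholders):

    Code (Text):
    zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd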
     
  14. Dec 30, 2016 #13
    I suspect that the weekly verification of the checksums may have accelerated the drive failure. This is pretty heavy usage for consumer-grade hardware. I would suggest that you not do the checks so often. Drives these days are pretty reliable, and a redundant array already keeps a second copy (or parity) to ensure the data is safe.
    If the data is really important and you need the peace of mind, then perhaps do the checks once a month or once every other month, or invest in enterprise-grade hard drives that are meant to see heavy use.
     
  15. Dec 30, 2016 #14

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    If a drive can't stand up to 52 reads per year, that is a problem. After all, it holds my login area, so files like .bashrc get read many more than 50 times in a year.

    I would much rather have a drive fail after a year in a way that the data is recoverable than last twice as long but lose data when it fails. Weekly scans protect against silent corruption. Besides, that's the general recommendation - weekly for consumer-grade disks, monthly for enterprise-grade, if possible. There are some very large pools out there, and monthly scrubs would mean non-stop scrubbing. And to be fair, the system worked as intended - it alerted me to a probable failing drive in time to do something about it.

    The drive has been swapped and is resilvering now. This takes 2-3 hours. The original disk was hot when I took it out. Not warm like the other disks - hot. Like fresh-from-the-oven cookies. I'm going to run a couple of R/W badblocks tests on it, and if it looks mostly OK, I may keep it around as an emergency spare, but right now I doubt that will be possible.
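
    For the record, the swap boils down to something like this on the ZFS side (pool name and device IDs below are placeholders):

    Code (Text):
    zpool replace tank ata-TOSHIBA_DT01ACA300_FAILING ata-NEW_REPLACEMENT_DISK
    zpool status tank    # shows resilver progress and any checksum errors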
     
  16. Jan 2, 2017 #15

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    Update: the drive has been replaced. I erased the original drive and it reallocated another sector. It's much quieter in the external USB "toaster".

    In its normal position, it was running about 65 C. The other three drives (and now the replacement) range from 38 to 41 C or so. In the external USB adapter, drives are vertical and have air on all four sides. Drives run at 25 C or so at idle and the high 30s under heavy load. The questionable drive idles at 40 C and is 48-51 C or so under load. My conclusion is that something mechanical in the drive likely has more friction than it should, and it's only a matter of time before it goes. The immediate symptoms are second-order effects.
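
    For anyone wanting to spot-check temperatures the same way, SMART attribute 194 is one place to look (smartmontools assumed; device names are placeholders, and some drives report temperature under attribute 190 instead):

    Code (Text):
    for d in /dev/sd{a,b,c,d}; do
        printf '%s: ' "$d"
        smartctl -A "$d" | awk '$1 == 194 { print $10 " C" }'
    done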

    Oh, and no data loss. Swapping the drives was a 5 minute job, plus the time it took the RAID to rebuild.
     
  17. Jan 2, 2017 #16

    russ_watters


    Staff: Mentor

    It doesn't say "healthy", it says "passed" and "pre-fail". That's a C- in my book...
    Nope. I'm very paranoid about hard drive failures, and I think justifiably so. A failed hard drive is the data equivalent of burning down your house: it's not about the money, it's about the potentially irreplaceable things you lose (if not properly backed up).

    Since we're on the subject of failing hard drives, I'm going to whine a bit about my Crucial M4 SSD (again). After 6 months or so of being installed in my laptop, it turned into a brick for no apparent reason. Google informed me that it had a bug that made it brick after a certain number of hours of use (a counter overflow or something). Fixed with a firmware flash. Awesome. But then I found that if my laptop ever locked up and had to be hard-reset for any reason, it would brick again. Google informed me that this was an "issue" with the drive's fault-response system and could be recovered from with a cumbersome series of 20-minute power cycles. Crucial didn't consider this a problem worthy of a recall (since if it comes back to life it isn't really dead, right?), and since it was expensive, I put it into a media center PC. Well, last night it crashed again. I recovered it, but still, it is really annoying. [/rant]
     
  18. Jan 2, 2017 #17

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    I've had good luck with Crucial - or rather, I had some bad luck, but the company really made it right. It was main memory that was failing, and the replacement memory was failing too. It tested fine with one stick, but with any more it failed. Randomly. On different motherboards. They sent me a big pile of memory and asked me to find a pair that worked and send the rest back. Oh, and they sent me a memory fan as well.

    Kingston, on the other hand... I had a 128 GB SSD that failed. Eventually they agreed to replace it, and they replaced it with a 120 GB SSD. Their position was, "hey, close enough. Like it or lump it."
     
  19. Jan 3, 2017 #18
    I've had absolutely rotten luck with Crucial and Samsung SSDs: a 100% failure rate for me. I've tried three different ones over the last year and all failed within a month or two. One other person at work, whose spinning 2.5" drive I helped swap for an SSD, also had it fail pretty fast. I finally gave up and went for a RAID5 with spinning disks. I obviously had a run of really bad luck, but I really wish I could someday work up the courage to try an SSD RAID, lol.
     
  20. Jan 6, 2017 #19

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    Oddly, I have had very good luck with a Samsung SSD 840 EVO.

    I've hammered on it pretty hard: 24.50 TB written, 22,204 power-on hours. No problems ever.
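
    (The written-total comes from SMART attribute 241, Total_LBAs_Written, which Samsung usually reports in 512-byte units; verify that for your model. Something roughly like this converts it, with /dev/sdX as a placeholder.)

    Code (Text):
    smartctl -A /dev/sdX | awk '$1 == 241 { printf "%.2f TB written\n", $10 * 512 / 1e12 }'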

    Maybe I had good luck because it's an @Evo.
     
  21. Jan 22, 2017 #20

    russ_watters


    Staff: Mentor

    I had a Samsung SSD 840 Pro until a few minutes ago, when it turned into a brick. [sigh] Windows gave me the old "configuring Windows, do not turn off your computer" troll. After two hours, I turned it off, and now the SSD isn't detected at all.
     