SMART reporting and Hard Disk buzzing sound

  1. Dec 29, 2016 #1

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    I have a drive that makes a buzzing sound every few minutes. Here's what SMART is telling me.

    Code (Text):
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   071   071   016    Pre-fail  Always       -       61213753
      2 Throughput_Performance  0x0005   139   139   054    Pre-fail  Offline      -       71
      3 Spin_Up_Time            0x0007   160   160   024    Pre-fail  Always       -       399 (Average 317)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       89
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       87
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
    Amazingly, here's the SMART health report:

    Code (Text):
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    How it thinks this drive is healthy with 61 million read errors (since Tuesday) is beyond me.

    I'm doing a surface scan of its replacement now. I hope it finishes, and is good, before this one gives up the ghost. Anyone think I am being too paranoid?
     
  3. Dec 29, 2016 #2

    fresh_42

    User Avatar
    2017 Award

    Staff: Mentor

    Not me. Who manufactured the drive?
     
  4. Dec 29, 2016 #3

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    Toshiba. It's a DT01ACA300.
     
  5. Dec 29, 2016 #4

    fresh_42

    User Avatar
    2017 Award

    Staff: Mentor

    Thanks. Looks like an internal drive. They seem to be far less reliable than external ones (in my experience).
     
  6. Dec 29, 2016 #5

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    It is an internal: 3.5", 3 TB, SATA-3. It is the oldest of the four drives in the array, with January 2014 on it and 24799 powered-on hours.

    Oddly, the Raw_Read_Error_Rate dropped to zero - but is creeping upward again. It's at 28 now.
     
  7. Dec 29, 2016 #6
    I wouldn't worry about it. Raw_Read_Error_Rate is an indicator of the rate of errors during sector read operations. There are always errors when attempting to read, and these are dealt with by the drive's error correction mechanisms. The RAW_VALUE field is supposed to be the number of read errors, but that value is only actually reported by Seagate drives, so you can safely ignore it.

    The important comparison here is between the Worst field and the Thresh field. If the Worst value drops below the Thresh value, then the drive is considered to have failed.

    https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
    "(Vendor specific raw value.) Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number."


    I suggest that you pull the drive out of the case and reseat it. Also reseat the SATA cable at both ends. See if that does anything to resolve the noise issue.
     
  8. Dec 29, 2016 #7

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    Done (it's in its own enclosure) and no difference.

    While errors are normal, it seems to me that 61 million errors is excessive. Even if that number is meaningless, the other three drives are at 100 for Worst and this drive is at 71. I'm also a bit concerned that 87 sectors have been reallocated since Tuesday. Reading every byte in use normally takes 2 hours, but Tuesday it took 9. However, no data was lost (as compared against the data on the drive's mirror and the checksums).
     
  9. Dec 29, 2016 #8
    The 61 million probably doesn't actually mean 61 million read errors. Remember, most vendors don't report this value so it's probably just a meaningless number.

    This is a 3 TB drive with 4 KiB sectors, so we're talking 87 sectors out of roughly 805 million. The surface scan you ran should mark those sectors as bad, and that should be the end of it. You're pretty far off the threshold for Reallocated Sectors.
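
    Rough arithmetic behind that number, assuming 3 TiB and 4 KiB sectors:

    Code (Text):
    # 3 TiB divided into 4 KiB sectors is about 805 million sectors (bash arithmetic)
    echo $(( 3 * 1024**4 / 4096 ))    # 805306368
    # 87 reallocated sectors out of ~805 million is roughly 0.00001% of the drive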

    Here's the output from one of my drives which is working just fine.
    Code (Text):
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0027   143   140   021    Pre-fail  Always       -       3825
      4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4454
      5 Reallocated_Sector_Ct   0x0033   147   147   140    Pre-fail  Always       -       417
      7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   026   026   000    Old_age   Always       -       54256
     
    The reallocated count on this drive is 417; it's a 500 GB drive, so that's 417 out of roughly 131 million sectors. This is still normal for a 4-5 year old drive. I should consider getting a replacement drive and keeping it on standby, because my Worst is pretty close to my Thresh, but it's still fine.

    You said you had it back in its enclosure. This sounds to me like an external drive you're connecting with USB. Does the caddy you are using support USB3, and were you plugged into a USB2 port when you did your test that took 9 hours? That would explain the 4x longer scan, as USB2 is around 4 times slower than USB3.
     
  10. Dec 29, 2016 #9

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    The enclosure is SATA - it's an Icy Dock 4-drives-in-the-space-of-3 thing, but the nice thing is I get front access to the drives. The spare is going through badblocks now, and when it finishes, I'll swap it for the loud drive. Then I'll badblock the heck out of the suspect drive and, based on the output, decide whether I want to keep it or not.
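
    For anyone following along, a destructive badblocks pass on a drive like this would look something like the sketch below; the device name and block size are assumptions, and -w erases everything on the disk:

    Code (Text):
    # write-mode test: writes four patterns and reads each one back
    # -w = destructive write test, -s = show progress, -v = verbose
    # -b 4096 = test in 4 KiB blocks to match the physical sector size
    badblocks -wsv -b 4096 /dev/sdX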

    The 9 hours was my weekly RAID verification. It's run weekly since April. It's typically 2-3 hours and runs Tuesday nights. My first sign something was wrong was that in the morning it hadn't finished, and the LED from the drive in question was on solid. There were no errors the week before or from the previous weekly SMART long test. (I do a weekly SMART long, a daily SMART short, and a weekly RAID check and compare)

    Oh, and the drive is getting louder. And there's definitely a correlation between its LED and the sound.
     
  11. Dec 29, 2016 #10

    fresh_42

    User Avatar
    2017 Award

    Staff: Mentor

    This is definitely a red alert. Read from it while you still can and make a copy. If the head is already maladjusted ...
     
  12. Dec 29, 2016 #11
    I also agree. If the drive is getting louder as it's spinning, then it's most likely approaching failure. Back up your data while you still have time.

    I'm curious, what is your RAID setup, and are you running the Intel RAID Volume Data Verify and Repair?
     
  13. Dec 29, 2016 #12

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    It's ZFS. I have four 3 TB drives, configured as two 3+3 TB mirror pairs.
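
    (For reference, a pool like that is built from two mirror vdevs; a minimal sketch, with the pool and disk names as placeholders:)

    Code (Text):
    # two 2-way mirrors striped together: ~6 TB usable from four 3 TB drives
    zpool create tank mirror sda sdb mirror sdc sdd
    zpool status tank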
     
  14. Dec 30, 2016 #13
    I suspect that the weekly verification of the checksums may have accelerated the drive failure. This is pretty heavy usage for consumer-grade hardware. I would suggest that you do not run the checks so often. Drives these days are pretty reliable, and a redundant array keeps extra copies (or parity) to ensure the data is safe.
    If the data is really important and you need the peace of mind, then perhaps do the checks once a month or once every other month, or invest in enterprise-grade hard drives that are meant to see heavy use.
     
  15. Dec 30, 2016 #14

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    If a drive can't stand up to 52 reads per year, that is a problem. After all, it holds my login area, so files like .bashrc get read many more than 50 times in a year.

    I would much rather have a drive fail after a year in a way that the data is recoverable than last twice as long but lose data when it fails. Weekly scans protect against silent corruption. Besides, that's the general recommendation - weekly for consumer-grade disks, monthly for enterprise-grade, if possible. There are some very large pools out there, and monthly scrubs would mean non-stop scrubbing. And to be fair, the system worked as intended - it alerted me to a probable failing drive in time to do something about it.
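
    A sketch of that kind of schedule as cron entries, assuming smartmontools and a pool named "tank" (in practice smartd.conf can also drive the self-tests):

    Code (Text):
    # weekly ZFS scrub, Tuesday night
    0 22 * * 2   zpool scrub tank
    # daily SMART short self-test and weekly long self-test
    0 3  * * *   smartctl -t short /dev/sdX
    0 4  * * 6   smartctl -t long /dev/sdX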

    The drive has been swapped, and the pool is resilvering now. This takes 2-3 hours. The original disk was hot when I took it out. Not warm like the other disks - hot. Like fresh-from-the-oven cookies. I'm going to run a couple of R/W badblocks tests on it, and if it looks mostly OK, I may keep it around as an emergency spare, but right now I doubt that will be possible.
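
    The swap itself is essentially a one-liner in ZFS; a sketch with placeholder pool and device names:

    Code (Text):
    # resilver onto the new disk, then watch progress
    zpool replace tank old_disk new_disk
    zpool status tank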
     
  16. Jan 2, 2017 #15

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    Update: the drive has been replaced. I erased the original drive and it reallocated another sector. It's much quieter in the external USB "toaster".

    In its normal position, it was running about 65 C. The other three drives (and now the replacement) range from 38 to 41 C or so. In the external USB adapter, drives are vertical and have air on all four sides; they run at 25 C or so at idle and in the high 30s under heavy load. The questionable drive idles at 40 C and runs 48-51 C or so under load. My conclusion is that something mechanical in the drive likely has more friction than it should, and it's only a matter of time before it goes. The immediate symptoms are second-order effects.
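
    (Drive temperature is reported as SMART attribute 194; it can be read with something along these lines, the device name again being a placeholder:)

    Code (Text):
    # attribute 194 (Temperature_Celsius) reports the current drive temperature
    smartctl -A /dev/sdX | grep -i temperature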

    Oh, and no data loss. Swapping the drives and rebuilding the RAID was a 5 minute job, plus the time it took to rebuild.
     
  17. Jan 2, 2017 #16

    russ_watters

    User Avatar

    Staff: Mentor

    It doesn't say "healthy", it says "passed" and "pre-fail". That's a C- in my book...
    Nope. I'm very paranoid about hard drive failures, and I think justifiably so. A failed hard drive is the data equivalent of burning down your house: it's not about the money, it's about the potentially irreplaceable things you lose (if not properly backed up).

    Since we're on the subject of failing hard drives, I'm going to whine a bit about my Crucial M4 SSD (again). After 6 months or so of being installed in my laptop, it turned into a brick for no apparent reason. Google informed me that it had a bug that made it brick after a certain number of hours of use (a counter overflow or something). Fixed with a firmware flash. Awesome. But then I found that if my laptop ever locked up and had to be hard reset for any reason, it would brick again. Google informed me that this was an "issue" with the drive's fault response system and could be recovered with a cumbersome series of 20-minute power cycles. Crucial didn't consider this to be a problem worthy of a recall (since if it comes back to life it isn't really dead, right?), and since it was expensive I put it into a media center PC. Well, last night it crashed again. I recovered it, but still, it is really annoying. [/rant]
     
  18. Jan 2, 2017 #17

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    I've had good luck with Crucial - or rather, I had some bad luck, but the company really made it right. It was failing main memory, and the replacement memory was also failing. It tested fine with one stick, but any more and it failed. Randomly. On different motherboards. They sent me a big pile of memory and asked me to find a pair that worked and send the rest back. Oh, and they sent me a memory fan as well.

    Kingston on the other hand...I had a 128 GB SSD that failed. Eventually they agreed to replace it, and they replaced it with a 120 GB SSD. Their position was, "hey, close enough. Like it or lump it."
     
  19. Jan 3, 2017 #18
    I've had absolutely rotten luck with Crucial and Samsung SSDs - a 100% failure rate for me. I've tried 3 different ones over the last year, and all failed within a month or two. Another person at work, whose spinning 2.5" drive I helped swap for an SSD, had it fail pretty fast too. I finally gave up and went for a RAID5 with spinning disks. I obviously had a run of really bad luck, but I really wish I could some day work up the courage to try an SSD RAID lol
     
  20. Jan 6, 2017 #19

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    Oddly, I have had very good luck with a Samsung SSD 840 EVO.

    I've hammered on it pretty hard. I've written 24.50 TB to it. 22204 power on hours. No problems ever.

    Maybe I had good luck because it's an @Evo.
     
  21. Jan 22, 2017 #20

    russ_watters

    User Avatar

    Staff: Mentor

    I had a Samsung SSD 840 Pro until a few minutes ago, when it turned into a brick. [sigh] Windows gave me the old "configuring Windows, do not turn off your computer" troll -- after two hours, I turned it off, and now the SSD isn't detected at all.
     
  22. Jan 23, 2017 #21

    Vanadium 50

    User Avatar
    Staff Emeritus
    Science Advisor
    Education Advisor
    2017 Award

    Did it eventually recover? Can you take it out and attach it to a USB dongle?
     
  23. Jan 23, 2017 #22

    russ_watters

    User Avatar

    Staff: Mentor

    Nope; I did and nope.

    Fortunately, I was speaking Samsung customer service's language, and they gave me an RMA after just a 30-second description of the problem. Normally I wouldn't go after $150 spent three years ago, but I want these guys to know that SSDs are waaaay too unreliable. Also fortunately, there was nothing of value on this drive; I used it only for programs and the OS. Virtually no data (that's mostly on spinning drives, RAID 1... though some is on my laptop on an SSD).
     
  24. Feb 6, 2017 #23
    That is what I have been saying for like a thousand years, but nobody believes me, russ! Nobody! :frown:

    There are white papers out there that clearly show that consumer-grade SSDs are a crappy technology despite the huge effort that is put into making them. The white papers are not written to negatively criticize the technology - they are objective - but if you read them you realize how crappy the consumer-grade technology is. One of the issues is that manufacturers try to fix hardware-quality shortcomings with software, because making a good-quality hardware product is too expensive (and physically larger), and consumers are unlikely to buy it unless the prices come down. So they reduce prices by reducing hardware quality, and you end up with all sorts of problems, especially in the software, which keeps growing and growing to make up for the hardware. As the software grows, more bugs appear.

    Vanadium 50, as soon as I hear buzzing sounds, I begin looking for a new drive. I'd recommend you ignore the SMART report and do the same.
     
  25. Feb 6, 2017 #24

    rcgldr

    User Avatar
    Homework Helper

    The raw read error count by itself doesn't mean much without also knowing the number of raw reads, which is what you'd need to get an actual raw read error rate. You would also want to know what raw read error rate corresponds to the drive's specified read failure rate (usually 1 unrecoverable error in 10^14 or more bits read). As hard drives increase in density, the normal trend is to expect higher raw read error rates, countered with stronger error correction, to maintain the stated read failure rate. Sector remapping algorithms are vendor specific. I had the impression that there was some SMART-related signature that indicated a drive was becoming marginal and should be backed up and replaced.
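
    To put that 1-in-10^14 figure in perspective, a rough back-of-the-envelope calculation for a full pass over a 3 TB drive:

    Code (Text):
    # one full read of a 3 TB drive: 3e12 bytes * 8 = 2.4e13 bits
    # at a spec of 1 unrecoverable error per 1e14 bits read:
    #   2.4e13 / 1e14 = 0.24 expected unrecoverable errors per pass,
    #   i.e. roughly one every four complete reads of the drive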

    One concern with SSDs is the sector mapping done to distribute writes somewhat evenly across the internal flash memory, combined with the garbage collection performed to repack the data, since overwritten sectors end up as duplicates in the flash with a flag obsoleting the "older" copies. Operating systems that use the TRIM command to note which sectors have been deleted help with garbage collection. It's a pretty complex process. During a power loss, it would be nice if the SSD had enough energy storage (a capacitor) to complete any ongoing garbage collection cycle.
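
    On Linux, whether TRIM is in play can be checked along these lines (standard util-linux tools; the device and mount point are placeholders):

    Code (Text):
    # non-zero DISC-GRAN / DISC-MAX means the device advertises discard/TRIM
    lsblk --discard /dev/sdX
    # manually trim a mounted filesystem
    fstrim -v /mountpoint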

    As noted by russ_watters, it's probably best to keep SSDs for the operating system and programs, but not data - only the stuff that doesn't change much (except for the near-daily updates of Windows Defender ...) - using image backups to regular hard drives just to be safe.
     