Best way of copying same data to several drives?

  1. May 23, 2017 #1

    ORF

    Hello

    I would like to copy a huge amount of data (10-20 TB) to 5-10 HDDs (internal/external).

    I thought about using the usual "cp/rsync" utilities, but that would take a long time.

    After a quick search, I found this Unix utility (mdadm):
    http://ask.metafilter.com/260329/best-way-to-clone-my-hard-drive-to-multiple-external-drives-at-once
    https://en.wikipedia.org/wiki/Mdadm#Non-RAID_configurations

    Are there any limitations or anything to worry about, or can it be used like "cp"?

    Thank you for your time :)

    Regards,
    ORF.
     
  2. jcsd
  3. May 23, 2017 #2
    I think the fastest way to copy data from one drive to many without special hardware is to use multicasting (supported by Clonezilla): the data is sent over the network once and received by all waiting clients. This would require a separate PC for each copy.
     
  4. May 25, 2017 #3
    Have you found a good solution yet? You could always write a program that will do this. Maybe a command-line tool that takes as its first argument the folder to copy over and then all the destination drives. It shouldn't take more than a couple dozen lines of code.
    By the way, are all the HDDs connected to the same PC? Are they all about equally fast, including the external ones?
    Also, where did you get 20 TB HDDs from?
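
A minimal sketch of the kind of command-line tool described above, assuming Python is available. It reads each source file once and writes every chunk to all destinations in turn. The script name `multicopy.py`, the function names, and the 4 MiB chunk size are illustrative assumptions, not anything from this thread, and the sketch ignores symlinks, permissions, and timestamps.

```python
import os
import sys

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary, untuned choice


def copy_file_to_many(src_path, dst_paths):
    """Read src_path once and write each chunk to every destination file."""
    outs = [open(p, "wb") for p in dst_paths]
    try:
        with open(src_path, "rb") as src:
            while True:
                chunk = src.read(CHUNK)
                if not chunk:
                    break
                for out in outs:
                    out.write(chunk)  # sequential writes, one per destination
    finally:
        for out in outs:
            out.close()


def copy_tree_to_many(src_root, dst_roots):
    """Mirror the directory tree under src_root into every destination root."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        for root in dst_roots:
            os.makedirs(os.path.normpath(os.path.join(root, rel)), exist_ok=True)
        for name in filenames:
            copy_file_to_many(
                os.path.join(dirpath, name),
                [os.path.normpath(os.path.join(root, rel, name)) for root in dst_roots],
            )


def main(argv):
    # First argument: folder to copy; remaining arguments: destination folders.
    if len(argv) < 2:
        print("usage: multicopy.py SOURCE_DIR DEST_DIR [DEST_DIR ...]")
        return 2
    copy_tree_to_many(argv[0], argv[1:])
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the writes inside the loop happen one after another, each chunk takes as long as all destinations combined; the benefit over running "cp" several times is only that the source is read once.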
     
  5. May 27, 2017 #4

    ORF

    Hello

    @stoomart: thank you. I will try it :)
    @DrZoidberg: No, I haven't found a good solution yet... The answers to your questions are: yes, all disks can be connected to the same PC. No, the internal ones are much (3-4 times) faster than the external ones.

    Thank you very much for your time :)

    Regards,
    ORF
     
  6. May 27, 2017 #5

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    In that case, you can run 3-4 copies at once. Your throughput is maximized when every drive is working at 100% capacity.

    20 TB is a lot of data. A drive can run at about 1 Gb/s sustained, so it will take 2 days or so just to read every bit on the source drive, divided by however much parallelism you are able to achieve. If the output drives are 3-4 times slower, it will take a week. It's going to take a long time no matter what.
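
The estimate above can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits per second) and a perfectly sustained rate; the function name `transfer_days` is made up for illustration.

```python
def transfer_days(size_tb, rate_gbit_per_s):
    """Days needed to move size_tb terabytes at a sustained rate in Gb/s."""
    bits = size_tb * 1e12 * 8               # terabytes -> bits (decimal units)
    seconds = bits / (rate_gbit_per_s * 1e9)
    return seconds / 86400                  # seconds -> days


# Reading 20 TB from the source at 1 Gb/s.
print(f"source read: {transfer_days(20, 1.0):.2f} days")

# Writing to drives that are 3x and 4x slower (assumed from the thread).
for factor in (3, 4):
    print(f"{factor}x slower: {transfer_days(20, 1.0 / factor):.2f} days")
```

Under these assumptions the source read comes out a little under 2 days, and the slower drives land between roughly 5.5 and 7.5 days, which matches the "2 days" and "a week" figures.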
     
  7. May 27, 2017 #6
    I did some quick testing with the commands 'tar' and 'tee' for single read/parallel write, which seems ideal if all drives can be connected simultaneously. One thing I noticed was that the writes proceeded only as fast as the slowest drive, but as @Vanadium 50 mentioned, the best-case scenario will still take at least a week with a single read from the source disk.
     
    Last edited: May 27, 2017
  8. May 27, 2017 #7

    Vanadium 50

    Staff Emeritus
    Science Advisor
    Education Advisor

    I don't know what will happen with tee if one drive falls behind. I suspect it will not be pretty.
     
  9. May 27, 2017 #8
    Using the command 'iotop' during my test, I saw the other drives slow down so that the same transfer rate was used for all drives.
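
One plausible reading of this observation is backpressure: when a single reader feeds several writers through bounded buffers, the reader stalls as soon as the slowest writer's buffer fills, so every destination ends up running at the slowest drive's rate. The sketch below is a Python analogue of that idea, not how 'tee' itself is implemented; the function name `fan_out` and the `max_pending` parameter are assumptions made for illustration.

```python
import queue
import threading


def fan_out(chunks, sinks, max_pending=8):
    """Send every chunk to every sink, using one writer thread per sink.

    Each sink has a bounded queue. When the slowest sink's queue is full,
    the reader blocks, which throttles all sinks to that sink's pace.
    Returns a list of (sink, exception) pairs for sinks that failed.
    """
    queues = [queue.Queue(maxsize=max_pending) for _ in sinks]
    errors = []

    def writer(q, sink):
        failed = False
        while True:
            chunk = q.get()
            if chunk is None:          # sentinel: no more data
                break
            if failed:
                continue               # keep draining so the reader never hangs
            try:
                sink.write(chunk)
            except OSError as exc:
                errors.append((sink, exc))
                failed = True

    threads = [
        threading.Thread(target=writer, args=(q, s))
        for q, s in zip(queues, sinks)
    ]
    for t in threads:
        t.start()
    for chunk in chunks:
        for q in queues:
            q.put(chunk)               # blocks while this sink's queue is full
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return errors
```

A sink here is any object with a `write` method, for example a file opened with `open(path, "wb")` on each destination drive. A failed sink is recorded and skipped rather than stopping the whole copy, which is a design choice of this sketch.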
     
  10. May 27, 2017 #9
    Yes, the tar/tee method sounds like a good solution. But maybe you want to do all the internal drives first and afterwards copy to the external ones.
     