Best way of copying same data to several drives?

  • Thread starter: ORF
  • Tags: Data
To copy a large amount of data (10-20TB) to multiple hard drives (5-10 internal/external), traditional methods like "cp" or "rsync" may be too slow. An alternative approach involves using multicasting utilities like Clonezilla, which can efficiently send data over a network to multiple clients simultaneously, though this requires a separate PC for each copy. For local transfers, connecting all drives to the same PC is recommended, with internal drives performing significantly faster (3-4 times) than external ones. Testing with commands like 'tar' and 'tee' shows that while they can facilitate parallel writing, the overall speed is limited by the slowest drive. It is suggested to prioritize copying to internal drives first before transferring data to external drives. Despite these methods, the transfer process will still be time-consuming, potentially taking up to a week depending on the configuration and speed of the drives involved.
ORF
Hello

I would like to copy a huge amount of data (10-20TB) to 5-10 HDDs (internal/external).

I thought about using the usual "cp/rsync" utilities, but that will take a long time.

After a quick search, I found this Unix utility:
http://ask.metafilter.com/260329/best-way-to-clone-my-hard-drive-to-multiple-external-drives-at-once
https://en.wikipedia.org/wiki/Mdadm#Non-RAID_configurations

Is there any limitation or anything to worry about, or can it be used like "cp"?

Thank you for your time :)

Regards,
ORF.
 
I think the fastest way to copy data from one drive to many without special hardware is multicasting (supported by Clonezilla): the data is sent over the network once and received by all waiting clients. This would require a separate PC for each copy.
 
Have you found a good solution yet? You could always write a program that will do this. Maybe a command-line tool that takes as its first argument the folder to copy over, followed by all the destination drives. It shouldn't take more than a couple dozen lines of code.
By the way, are all HDDs connected to the same PC? Are they all about equally fast, including the external ones?
Also, where did you get 20TB HDDs from?
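
A minimal Python sketch of such a tool, under some loud assumptions: the name `fan_out_copy` and the 4 MiB chunk size are made up for illustration, file permissions and timestamps are not preserved, and nothing is verified after writing. It reads each source file once and writes every chunk to all destinations before reading the next one.

```python
import os

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice for this sketch


def fan_out_copy(src_dir, dest_dirs):
    """Copy every file under src_dir into each directory in dest_dirs.

    Each source file is read once; every chunk is written to all
    destinations before the next chunk is read. Returns the file count.
    """
    copied = 0
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        # Recreate the current sub-directory on every destination.
        for dest in dest_dirs:
            os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in files:
            outs = [open(os.path.join(dest, rel, name), "wb") for dest in dest_dirs]
            try:
                with open(os.path.join(root, name), "rb") as src:
                    while chunk := src.read(CHUNK):
                        for out in outs:
                            out.write(chunk)
            finally:
                for out in outs:
                    out.close()
            copied += 1
    return copied
```

It could be called as `fan_out_copy("/data", ["/mnt/a", "/mnt/b"])`, where the paths are placeholders. Because the writes happen one after another in a single thread, no destination can get ahead of the slowest one.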
 
Hello

@stoomart: thank you. I will try it :)
@DrZoidberg: No, I haven't found a good solution yet... The answers to your questions are: yes, all disks can be connected to the same PC. No, the internal ones are much (3-4 times) faster than the external ones.

Thank you very much for your time :)

Regards,
ORF
 
ORF said:
No, the internal ones are much (3-4 times) faster than external ones.

In that case, you can run 3-4 copies at once. Your throughput is maximized when every drive is working at 100% capacity.

20 TB is a lot of data. A drive can run at about 1 Gb/s (roughly 125 MB/s) sustained, so it will take 2 days or so just to read every bit on the source drive, divided by however much parallelism you are able to achieve. If the output drives are 3-4 times slower, it will take a week. It's going to take a long time no matter what.
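
The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal terabytes, a constant sustained rate, and no filesystem overhead, and `transfer_days` is a name invented here.

```python
def transfer_days(size_tb, rate_gbit_per_s):
    """Days needed to move size_tb terabytes at rate_gbit_per_s gigabits/second."""
    size_bits = size_tb * 1e12 * 8                 # decimal terabytes -> bits
    seconds = size_bits / (rate_gbit_per_s * 1e9)  # bits / (bits per second)
    return seconds / 86400                         # seconds -> days


read_days = transfer_days(20, 1.0)                        # source read at 1 Gb/s
slow_days = [transfer_days(20, 1.0 / k) for k in (3, 4)]  # drives 3x and 4x slower

print(f"source read: {read_days:.1f} days")   # source read: 1.9 days
print(f"3x slower:   {slow_days[0]:.1f} days")  # 3x slower:   5.6 days
print(f"4x slower:   {slow_days[1]:.1f} days")  # 4x slower:   7.4 days
```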
 
ORF said:
yes, all disks can be connected to the same PC. No, the internal ones are much (3-4 times) faster than external ones.
I did some quick testing with the commands 'tar' and 'tee' for single read/parallel write, which seems ideal if all drives can be connected simultaneously. One thing I noticed was that the writes only went as fast as the slowest drive, but as @Vanadium 50 mentioned, the best-case scenario will still take at least a week with a single read from the source disk.
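
The exact commands from that test are not reproduced here. As a rough Python analogue of the same single read/parallel write idea, the sketch below produces one tar stream and duplicates it to several outputs. The names `FanOutWriter` and `tar_to_many` are invented for illustration, and it writes tar archives rather than unpacked files, so treat it as a sketch of the idea and not as the method that was tested.

```python
import tarfile


class FanOutWriter:
    """File-like object that repeats every write on several outputs (like tee)."""

    def __init__(self, outputs):
        self.outputs = outputs

    def write(self, data):
        for out in self.outputs:
            out.write(data)  # returns only once this output has accepted the data
        return len(data)


def tar_to_many(src_dir, archive_paths):
    """Read src_dir once and write the same tar stream to every archive path."""
    outputs = [open(path, "wb") for path in archive_paths]
    try:
        # "w|" is tarfile's uncompressed streaming mode; it only needs write().
        with tarfile.open(fileobj=FanOutWriter(outputs), mode="w|") as tar:
            tar.add(src_dir, arcname=".")
    finally:
        for out in outputs:
            out.close()
```

Every block of the stream has to be accepted by all outputs before the next block is produced, so no output can run ahead of the slowest one.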
 
I don't know what will happen with tee if one drive falls behind. I suspect it will not be pretty.
 
Vanadium 50 said:
I don't know what will happen with tee if one drive falls behind. I suspect it will not be pretty.
Using the command 'iotop' during my test, I saw the other drives slow down, so the same transfer rate was used for all drives.
 
Yes, the tar/tee method sounds like a good solution. But maybe you want to do all the internal drives first and afterwards copy to the external ones.
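
To make the trade-off concrete, here is a rough Python estimate. The rates are assumptions taken from the figures mentioned in this thread (1 Gb/s for internal drives, externals 3 times slower), and it assumes each pass runs at the speed of its slowest drive, as observed with iotop.

```python
def days(size_tb, rate_gbit_per_s):
    """Days to write size_tb decimal terabytes at rate_gbit_per_s gigabits/second."""
    return size_tb * 8e12 / (rate_gbit_per_s * 1e9) / 86400


SIZE_TB = 20
INTERNAL_GBIT_S = 1.0        # assumed sustained rate of the internal drives
EXTERNAL_GBIT_S = 1.0 / 3    # assumed: externals 3x slower

# One pass with all drives attached: everything is paced by the slowest drive.
one_pass_all = days(SIZE_TB, EXTERNAL_GBIT_S)

# Two passes: internal drives first, external drives afterwards.
internal_done = days(SIZE_TB, INTERNAL_GBIT_S)
external_done = internal_done + days(SIZE_TB, EXTERNAL_GBIT_S)

print(f"one pass, all drives done after {one_pass_all:.1f} days")   # 5.6 days
print(f"two passes, internals done after {internal_done:.1f} days")  # 1.9 days
print(f"two passes, externals done after {external_done:.1f} days")  # 7.4 days
```

Under these assumptions the internal copies are ready much sooner with two passes, but the whole job takes longer overall.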
 
