- TL;DR Summary
- Why aren't ditto blocks used more often and more transparently?
First, what is a ditto block? It is simply a copy of a block stored elsewhere on the disks. It provides an additional level of redundancy beyond RAID, and can speed up reads. Take a 3-disk RAID5 where D1 (data) is on drive 1, D2 is on drive 2 and P1 (parity) is on drive 3. If you ditto D2, P1 and D1 onto drives 1, 2 and 3 respectively, you can lose any two drives and still get your data back.
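To check the two-drive claim, here is a small Python sketch of that layout. The block contents, the `drives` dictionary and the helper names are made up for illustration, and parity is assumed to be a plain XOR of the two data blocks.

```python
from itertools import combinations


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks (toy RAID5 parity)."""
    return bytes(x ^ y for x, y in zip(a, b))


# Toy block contents (arbitrary values, for illustration only).
D1 = b"\x12\x34"
D2 = b"\xab\xcd"
P1 = xor(D1, D2)

# Each drive holds its primary block plus a ditto of another drive's block:
# drive 1: D1 + ditto D2, drive 2: D2 + ditto P1, drive 3: P1 + ditto D1.
drives = {
    1: {"D1": D1, "D2": D2},
    2: {"D2": D2, "P1": P1},
    3: {"P1": P1, "D1": D1},
}


def recover(surviving: dict) -> tuple:
    """Rebuild (D1, D2) from whatever blocks the surviving drives hold."""
    blocks = {}
    for contents in surviving.values():
        blocks.update(contents)
    # Reconstruct a missing data block from the other data block and parity.
    if "D1" not in blocks and {"D2", "P1"} <= blocks.keys():
        blocks["D1"] = xor(blocks["D2"], blocks["P1"])
    if "D2" not in blocks and {"D1", "P1"} <= blocks.keys():
        blocks["D2"] = xor(blocks["D1"], blocks["P1"])
    return blocks.get("D1"), blocks.get("D2")


def survives_any_two_failures(layout: dict) -> bool:
    """True if every possible pair of lost drives still leaves D1 and D2 recoverable."""
    for lost in combinations(layout, 2):
        surviving = {k: v for k, v in layout.items() if k not in lost}
        if recover(surviving) != (D1, D2):
            return False
    return True
```

Running `survives_any_two_failures(drives)` on the dittoed layout returns `True`, while the same check on the plain RAID5 layout (one block per drive) returns `False`.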
This is routinely done with critical data, such as the superblock.
So why isn't it done transparently with data? What's the downside? If the disk array is 40% full, I can use the remaining space for one more copy of everything, and two more copies of some things. As the array fills up, the number of copies is reduced.
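The arithmetic behind the 40% example can be sketched as follows. The function name `ditto_budget` and the block-count model are my own simplification: "used" counts every block the array already needs (data plus parity), and every block is the same size.

```python
def ditto_budget(total_blocks: int, used_blocks: int) -> tuple[int, int]:
    """Return (copies_of_everything, blocks_that_get_one_more_copy).

    copies_of_everything includes the original, so 2 means
    "the original plus one ditto of every block".
    """
    if not 0 < used_blocks <= total_blocks:
        raise ValueError("used_blocks must be between 1 and total_blocks")
    copies_of_everything = total_blocks // used_blocks
    # Space left over after the full copies can hold one extra copy of some blocks.
    extra_copies = total_blocks - copies_of_everything * used_blocks
    return copies_of_everything, extra_copies
```

For an array of 100 blocks with 40 in use, `ditto_budget(100, 40)` gives `(2, 20)`: two copies of everything, plus a third copy for 20 of the 40 blocks. At 60 blocks in use it drops to `(1, 40)`, meaning only some blocks can still be dittoed.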
I see two downsides, but both are easily fixed. One is that writes take longer, because you are writing N copies. Sure, but once you have one complete copy, you can report the write as successful and finish the replication in the background. The other is that you use up space N times faster, but again, if this is dynamic, N can decrease as the array fills up.
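The "acknowledge after the first copy, replicate later" idea might look roughly like this. It is only a toy in-memory model: `DittoStore`, its dictionaries standing in for disk regions, and the `flush` method are all hypothetical, not how any particular file system does it.

```python
import queue
import threading


class DittoStore:
    """Toy store: replicas[0] is the primary copy, the rest are dittos."""

    def __init__(self, n_copies: int):
        self.replicas = [dict() for _ in range(n_copies)]
        self._pending = queue.Queue()
        # Background worker that fills in the ditto copies.
        threading.Thread(target=self._replicate, daemon=True).start()

    def write(self, key, data) -> bool:
        self.replicas[0][key] = data      # first complete copy
        self._pending.put((key, data))    # dittos are written later
        return True                       # acknowledged before dittos exist

    def _replicate(self):
        while True:
            key, data = self._pending.get()
            for replica in self.replicas[1:]:
                replica[key] = data
            self._pending.task_done()

    def flush(self):
        """Block until every queued ditto copy has been written."""
        self._pending.join()
```

A caller would do `store = DittoStore(3)`, then `store.write("blk0", b"data")`, which returns immediately. Until `store.flush()` completes, that block exists only once, so in that window it has just the base RAID protection.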
So, why don't we see more of this out 'in the wild'?