Why aren't ditto blocks used more often and more transparently?

  • Thread starter: Vanadium 50

Discussion Overview

The discussion revolves around the use of ditto blocks in data storage systems, exploring their potential benefits and drawbacks. Participants examine the redundancy and speed advantages of ditto blocks compared to traditional RAID configurations, as well as the reasons for their limited adoption in practice.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants describe ditto blocks as copies of data stored elsewhere, providing redundancy and faster read speeds, particularly in RAID configurations.
  • Concerns are raised about the perceived threat to the Data-Backup Industrial Complex, suggesting that this may hinder the adoption of ditto blocks.
  • Participants discuss the impact of ditto blocks on write wear and tear, with some arguing that the effect is more pronounced when drives are nearly empty.
  • There is a suggestion that having more data on a drive may reduce security against bad actors, despite improving resilience against hardware failures.
  • Some participants propose that the management of duplication could be automated rather than done manually, to optimize storage efficiency.
  • Discussion includes the idea that separating paging and temporary files from data drives could provide additional insurance against drive failures.
  • There are differing opinions on the longevity of SSDs versus HDDs, with some arguing that SSDs can last as long as HDDs under normal workloads.
  • Concerns are raised about the implications of deduplication, which some argue makes data more fragile compared to duplication methods like ditto blocks.
  • Participants question the actual increase in write operations due to ditto blocks, suggesting that logical data management could mitigate wear on drives.

Areas of Agreement / Disagreement

Participants express a range of views on the benefits and drawbacks of ditto blocks, with no clear consensus on their practicality or security implications. The discussion remains unresolved regarding the optimal approach to data redundancy and the trade-offs involved.

Contextual Notes

Limitations include assumptions about drive wear, the impact of data management strategies on performance, and the varying definitions of security in the context of data storage.

Vanadium 50
TL;DR Summary: Why aren't ditto blocks used more often and more transparently?

First, what is a ditto block? It is simply a copy of a block stored elsewhere on the disks. It provides an additional level of redundancy beyond RAID, and faster read speeds. Take a 3-disk RAID5 where a stripe is laid out as D1 (data) on drive 1, D2 (data) on drive 2, and P1 (parity) on drive 3. If you dittoed D2, P1 and D1 onto those drives respectively, you could lose any two drives and still get your data back.
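
To make that concrete, here's a toy sketch in Python (my own illustration, not ZFS or any real RAID code): one stripe with XOR parity, dittoed as above, surviving the loss of any two of the three drives.

Code:
from itertools import combinations

# One RAID5 stripe: D1, D2 are data blocks, P1 = D1 XOR D2 is parity.
d1, d2 = b"\x01", b"\x02"
p1 = bytes(a ^ b for a, b in zip(d1, d2))

# drive -> blocks it holds: its primary block plus a ditto of another's.
drives = {
    1: {"D1": d1, "D2": d2},   # drive 1 dittoes D2
    2: {"D2": d2, "P1": p1},   # drive 2 dittoes P1
    3: {"P1": p1, "D1": d1},   # drive 3 dittoes D1
}

def recoverable(surviving):
    """True if D1 and D2 can be produced from the surviving drives."""
    blocks = {}
    for d in surviving:
        blocks.update(drives[d])
    # Rebuild a missing data block from the other one plus parity.
    if "D1" not in blocks and {"D2", "P1"}.issubset(blocks):
        blocks["D1"] = bytes(a ^ b for a, b in zip(blocks["D2"], blocks["P1"]))
    if "D2" not in blocks and {"D1", "P1"}.issubset(blocks):
        blocks["D2"] = bytes(a ^ b for a, b in zip(blocks["D1"], blocks["P1"]))
    return "D1" in blocks and "D2" in blocks

# Kill any two drives; the data always comes back.
for dead in combinations(drives, 2):
    alive = [d for d in drives if d not in dead]
    assert recoverable(alive), f"lost data with drives {dead} dead"
print("every two-drive failure is survivable")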

This is routinely done with critical data, such as the superblock.

So why isn't it done transparently with data? What's the downside? If the disk array is 40% full, I can use the remaining space for one more copy of everything, and two more copies of some things. As the array fills up, the number of copies is reduced.

I see two downsides, but both are easily fixed. One is that writes take longer, because you are writing N copies. Sure, but once you have one complete copy, you can report the write as successful and finish the replication in the background. The other is that you use up space N times faster, but again, if this is dynamic, N can decrease as the array fills up.
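
Here's a minimal sketch of the dynamic-N idea (the policy itself is made up for illustration; a real system would be smarter):

Code:
def copies_for(free_fraction, max_copies=3):
    """Total copies (original included) to keep at the current fill level."""
    # With a fraction f of the array free, roughly 1/(1-f) copies of
    # everything fit; hold one copy's worth in reserve so new writes
    # never fail for lack of space.
    if free_fraction >= 1.0:
        return max_copies
    affordable = int(1 / (1 - free_fraction))
    return max(1, min(max_copies, affordable))

for used in (0.10, 0.40, 0.60, 0.80, 0.95):
    print(f"{used:.0%} full -> keep {copies_for(1 - used)} copies")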

So, why don't we see more of this out 'in the wild'?
 
Vanadium 50 said:
TL;DR Summary: Why aren't ditto blocks used more often and more transparently?

So, why don't we see more of this out 'in the wild'?
Probably seen as a threat to the Data-Backup Industrial Complex.

There's another keyword : "dup profile".

Kneejerk reaction says that the radial portion of the read seek-time improves 25% (for two full, contiguous partitions, and not bothering to account for track-length differences on a platter). Rotational latency and settle-time stay the same, of course.

Sounds worth a shot on an HDD, if you don't mind doubling the write wear-n-tear. On an SSD you only get the security benefit : no improvement in access time.
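
If anyone wants to poke at that kneejerk number, here's a quick Monte Carlo of one toy version of it in Python (my assumptions, nobody else's: head and block offsets uniform over the stroke, the dup'd layout puts a full copy in each half at the same relative offset, and the head seeks to the radially closer copy; the percentage you get is quite sensitive to the model).

Code:
import random

def trial():
    head = random.random()            # head position, fraction of full stroke
    offset = random.random() / 2      # block's offset within one partition
    single = abs(head - offset * 2)   # one copy, spread over the whole disk
    dual = min(abs(head - offset),            # copy in the first half...
               abs(head - (offset + 0.5)))    # ...and in the second half
    return single, dual

n = 1_000_000
s = d = 0.0
for _ in range(n):
    a, b = trial()
    s += a
    d += b
print(f"single-copy mean radial seek: {s / n:.3f} of full stroke")
print(f"dual-copy mean radial seek:   {d / n:.3f} of full stroke")
print(f"improvement: {1 - d / s:.0%}")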
 
Ah, but does it really double the write wear and tear? If your drive is nearly empty, sure. As it fills up, it can't ditto so often. Indeed, the write wear-and-tear increase is at its largest when disks are nearly empty, so if you have a weak drive, better it fail now than later. And because your data has multiple copies, recovery from a bad drive goes faster.

I think resi8liency is a better word than security. It's actually less secure since the data can be recovered from fewer disks.
 
Vanadium 50 said:
I think resi8liency is a better word than security.
Was this a finger-slip or did you really mean 'r8y'?

In s/w dev we use terms like i18n and l10n. r8y works perfectly.
 
Finger slip.

But my point is that having more data on the drive makes it less secure against bad actors, although it is more robust against hardware failures.
 
There may be Backup parameters that can get you close, albeit without read-seek improvement.

(It's been a while since I bothered - M$'s forced mediocrity-at-best has taken my soul - but I'm one of those guys who likes to micromanage i/o parameters on a PC)
Vanadium 50 said:
Finger slip.

But my point is that having more data on the drive makes it less secure against bad actors, although it is more robust against hardware failures.
Actually, your point was wondering why dup'ing data wasn't done more, so... ?

Dup'ing data in the manner we've been talking about does not increase the security risk : the data is still on one disk, accessed by one controller, and managed by one chunk of software : there's no increase in the number of exploitable entry vectors.
 
Well, there are more vendors than Microsoft. :)

You could do this manually. Start with a 10-way mirror, and when that fills up, make it a 5-way, and when that fills up, make it a 3-way, and so on. But that seems like something better handled automatically.

The security hole that I was thinking of involved returning bad drives. If your drive has only every Nth bit on it, there's not so much risk. If you have a complete copy, in N different places, there is more. A past employer had a service contract where we didn't return bad drives for just that reason.
 
In my experience, the drive(s) used for paging and temporary files are the ones that fail most often.

The extra seek operations, at least for drives installed lying flat, seem to be the limiting factor in hardware lifetime.

So having the paging/temporary file drive(s) separated from your data is some extra insurance.

Tip: I've had luck recovering data from a failed drive by changing its orientation and doing an image copy.

Cheers,
Tom
 
I think in this day and age, page drives are moving towards SSDs. Why not? Indeed, you can make them cheap/fragile SSDs; if they fail, there's no real loss, just annoyance.

Actual data looks to be spinning rust for the foreseeable future. 20 TB drives are not uncommon in data centers (and you can pre-order 26 TB drives), and a robust 20 TB SSD would cost a fortune.

My experience is that drives follow the bathtub curve - infant mortality and old age are the problems. But if you can put 1000 hours on a drive, you can usually put 10,000 hours on a drive. So I don't see this increased "wear and tear" as a problem. And as I said, if this is going to happen, I'd rather have it happen early and with lots of replicas.

So why don't we see more of this?
 
  • #10
My first SSD drive, I turned paging off and ran all the download/temp/cache/spool directories in RAM, to keep from wearing it out. Worked fine.
 
  • #11
That just tells me you didn't need a page/swap area to begin with.

The "you'll write your SSD to death" meme is kind of oversold. You get 10.000 writes, right? Call it 8000 from a worse-than-average SSD. Filling a 1 TB HDD takes about an hour, and there are 8000 hours in a year. So you have a year of non-stop writing before the drive falls over. How many calendar years will this take? Five? Ten?

For normal workloads, SSDs will last roughly as long as HDDs. Can I come up with a workload that breaks SSDs? Sure - fill the drive up to 99% and hammer on that last 1%. Is this what people typically do? No.

So I don't think "You only get 9995 writes out of your SSD and not 10,000!" is a compelling argument.

Further, consider the excitement about the opposite design, deduplication - where if two files store the same block, only one is stored. This, as far as I can tell, has only one use case where it matters, makes the data more fragile, and hogs memory. That should scream out "go the other direction!"
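
To see the fragility concretely, here's a toy dedup store in Python (hypothetical, not any real filesystem): identical blocks collapse to one stored copy, so a single bad block damages every file that referenced it.

Code:
import hashlib

store = {}    # block hash -> the one stored copy
files = {}    # file name  -> list of block hashes

def write(name, blocks):
    files[name] = []
    for blk in blocks:
        h = hashlib.sha256(blk).hexdigest()
        store.setdefault(h, blk)          # dedup: store each block once
        files[name].append(h)

common = b"boilerplate header" * 100
write("a.doc", [common, b"body of a"])
write("b.doc", [common, b"body of b"])
print(sum(len(hs) for hs in files.values()), "logical blocks,",
      len(store), "stored")               # 4 logical, only 3 stored

# One latent sector error lands on the shared block...
del store[hashlib.sha256(common).hexdigest()]
damaged = [n for n, hs in files.items() if any(h not in store for h in hs)]
print("files damaged:", damaged)          # both files, not just one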
 
  • #12
Vanadium 50 said:
That just tells me you didn't need a page/swap area to begin with.
Well, yeah, which is why I turned it off ; had to keep an eye on things, of course.

Also, it did away with other usually-unnecessary disk-writes, such as print spooling, non-persistent temporary files, downloads which are going to be unpacked anyways, etc.

So, anyways, your objection to your proposal, on security grounds, is that the decreased lifetime of a disk means it gets binned faster, ie: more potential breaches.
 
  • #13
I don't think the number of writes is substantially higher with this implemented. Sure, if the drive is empty it can add a whole bunch of ditto blocks. But if the drive is 2/3 full, only half the data can be dittoed.

Additionally, in principle the system can move the drive head around in a more logical way ("elevator seeking") which might, again in theory, reduce wear on the stepper motors. In practice, I am not so sure. Modern drives do a lot of lying about their geometry to their hosts.
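
For reference, elevator seeking is easy to sketch (my own minimal version of SCAN ordering, not what any particular firmware actually does): sweep the head one way servicing requests in track order, then reverse, instead of chasing requests FIFO.

Code:
def elevator_order(head, requests, ascending=True):
    """Service pending track requests in one sweep, then the reverse sweep."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down if ascending else down + up

def travel(head, order):
    """Total head movement, in tracks, servicing requests in this order."""
    dist = 0
    for r in order:
        dist += abs(r - head)
        head = r
    return dist

pending = [10, 90, 60, 20]
print("FIFO travel:    ", travel(50, pending))                      # 190
print("elevator travel:", travel(50, elevator_order(50, pending)))  # 120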
 
  • #14
Vanadium 50 said:
The "you'll write your SSD to death" meme is kind of oversold.
I recall small embedded computers with flash drives and Linux dying after a week or so because someone forgot to disable sync when setting up the drive mounts. Of course, flash drives back then did not have the fancy life-prolonging firmware that modern SSDs have, so if one was just used as an HDD it died pretty quickly.
 
