Why aren't ditto blocks used more often and more transparently?

  • Thread starter Vanadium 50
  • #1
Vanadium 50
TL;DR Summary
Why aren't ditto blocks used more often and more transparently?

First, what is a ditto block? It is simply a copy of a block stored elsewhere on the disks. It provides an additional level of redundancy beyond RAID, and faster read speeds. Suppose you have a 3-disk RAID5 with data distributed as D1 (data) on drive 1, D2 on drive 2 and P1 (parity) on drive 3. If you ditto those blocks as D2, P1 and D1 on drives 1, 2 and 3 respectively, you could lose any two drives and still get your data back.
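To check the two-drive claim, here is a minimal Python sketch. It assumes simple XOR parity (P1 = D1 xor D2) and a toy in-memory layout; the names `drives`, `xor` and `recover` are made up for illustration and do not come from any real filesystem.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks (RAID5-style parity)."""
    return bytes(x ^ y for x, y in zip(a, b))

d1 = b"hello wo"
d2 = b"rld!!!!!"
p1 = xor(d1, d2)

# Primary RAID5 stripe (D1, D2, P1) plus the rotated ditto copy (D2, P1, D1).
drives = {
    1: {"D1": d1, "D2": d2},
    2: {"D2": d2, "P1": p1},
    3: {"P1": p1, "D1": d1},
}

def recover(surviving):
    """Rebuild (D1, D2) from whatever blocks the surviving drives hold."""
    blocks = {}
    for drive in surviving:
        blocks.update(drives[drive])
    # A missing data block is the XOR of the other data block and the parity.
    if "D1" not in blocks and {"D2", "P1"} <= blocks.keys():
        blocks["D1"] = xor(blocks["D2"], blocks["P1"])
    if "D2" not in blocks and {"D1", "P1"} <= blocks.keys():
        blocks["D2"] = xor(blocks["D1"], blocks["P1"])
    return blocks.get("D1"), blocks.get("D2")

# Losing any two drives leaves exactly one survivor; try each in turn.
for survivor in drives:
    assert recover([survivor]) == (d1, d2)
```

Each surviving drive holds either both data blocks or one data block plus parity, which is why a single survivor is enough in this layout.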

This is routinely done with critical data, such as the superblock.

So why isn't it done transparently with data? What's the downside? If the disk array is 40% full, I can use the remaining space for one more copy of everything, and two more copies of some things. As the array fills up, the number of copies is reduced.
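The arithmetic behind the 40% example can be sketched as follows. This is only a back-of-the-envelope model: it assumes every copy costs the same space as the original and ignores parity and metadata overhead. The function name `replication_plan` is hypothetical.

```python
from fractions import Fraction
from math import floor

def replication_plan(fill: Fraction):
    """fill = unique data / raw capacity.

    Returns (copies every block can have, fraction of blocks that can
    have one copy more than that).
    """
    if not 0 < fill <= 1:
        raise ValueError("fill must be in (0, 1]")
    total_copies = 1 / fill          # average copies per block that fit
    base = floor(total_copies)       # copies that every block gets
    return base, total_copies - base

# 40% full: every block gets 2 copies, and half the blocks get a 3rd.
print(replication_plan(Fraction(2, 5)))
```

Under these assumptions the number of copies falls smoothly as the array fills, reaching a single copy of everything when the array is full.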

I see two downsides, but both are easily fixed. One is that writes take longer, because you are writing N copies. Sure, but once you have one complete copy, you can report the write as successful and finish the replication in the background. The other is that you use up space N times faster, but again, if this is dynamic, N can decrease as the array fills up.
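The "acknowledge after the first copy, replicate later" idea could look roughly like this. It is a toy, single-threaded sketch with in-memory dicts standing in for disk regions; the `DittoStore` class and its methods are invented for illustration and say nothing about how any real filesystem schedules its writes.

```python
from collections import deque

class DittoStore:
    """Toy store: first copy is synchronous, ditto copies are deferred."""

    def __init__(self, copies: int):
        self.copies = copies
        self.replicas = [dict() for _ in range(copies)]
        self.pending = deque()  # queued (replica index, key, data) work

    def write(self, key, data) -> bool:
        self.replicas[0][key] = data            # one complete copy on "disk"
        for i in range(1, self.copies):
            self.pending.append((i, key, data))  # ditto copies come later
        return True                              # report success immediately

    def replicate_one(self) -> bool:
        """Do one unit of background work; return False when idle."""
        if not self.pending:
            return False
        i, key, data = self.pending.popleft()
        self.replicas[i][key] = data
        return True

    def copies_of(self, key) -> int:
        return sum(key in replica for replica in self.replicas)

store = DittoStore(copies=3)
store.write("block-7", b"payload")
assert store.copies_of("block-7") == 1   # acknowledged with a single copy
while store.replicate_one():
    pass
assert store.copies_of("block-7") == 3   # background work caught up
```

The window between the acknowledgement and the last background copy is the period in which the block is only as safe as the underlying RAID.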

So, why don't we see more of this out 'in the wild'?
 
  • #2
Vanadium 50 said:
TL;DR Summary: Why aren't ditto blocks used more often and more transparently?

So, why don't we see more of this out 'in the wild'?
Probably seen as a threat to the Data-Backup Industrial Complex.

There's another keyword : "dup profile".

Kneejerk reaction says that the radial portion of the read seek-time improves 25% (for two full, contiguous partitions, and not bothering to account for track-length differences on a platter). Rotational latency and settle-time stay the same, of course.

Sounds worth a shot on a HDD, if you don't mind doubling the write wear-n-tear. SSD you only get the security benefit : no improvement in access time.
 
  • #3
Ah, but does it really double the write wear and tear? If your drive is nearly empty, sure. As it fills up, it can't ditto so often. Indeed, the write wear-and-tear increase is at its largest when disks are nearly empty, so if you have a weak drive, better it fail now than later. Because your data has multiple copies, recovery from a bad drive goes faster.

I think resi8liency is a better word than security. It's actually less secure since the data can be recovered from fewer disks.
 
  • #4
Vanadium 50 said:
I think resi8liency is a better word than security.
Was this a finger-slip or did you really mean 'r8y'?

In s/w dev we use terms like i18n and l10n. r8y works perfectly.
 
  • #5
Finger slip.

But my point is that having more data on the drive makes it less secure against bad actors, although it is more robust against hardware failures.
 
  • #6
There may be Backup parameters that can get you close, albeit without read-seek improvement.

(It's been a while since I bothered - M$'s forced mediocrity-at-best has taken my soul - but I'm one of those guys who likes to micromanage i/o parameters on a PC)
Vanadium 50 said:
Finger slip.

But my point is that having more data on the drive makes it less secure against bad actors, although it is more robust against hardware failures.
Actually, your point was wondering why dup'ing data wasn't done more, so... ?

Dup'ing data in the manner we've been talking about does not increase the security risk : the data is still on one disk, accessed by one controller, and managed by one chunk of software : there's no increase in the number of exploitable entry vectors.
 
  • #7
Well, there are more vendors than Microsoft. :)

You could do this manually. Start with a 10-way mirror, and when that fills up, make it a 5-way, and when that fills up, make it a 3-way, and so on. But that seems like something better handled automatically.

The security hole that I was thinking of involved returning bad drives. If your drive has only every Nth bit on it, there's not so much risk. If you have a complete copy, in N different places, there is more. A past employer had a service contract where we didn't return bad drives for just that reason.
 
  • #8
In my experience, the drive(s) used for paging and temporary files are the ones that fail most often.

The extra seek operations, at least for drives installed lying flat, seem to be the limiting factor in hardware lifetime.

So having the paging/temporary file drive(s) separated from your data is some extra insurance.

Tip: I've had luck recovering data from a failed drive by changing its orientation and doing an image copy.

Cheers,
Tom
 
  • #9
I think in this day and age, page drives are moving towards SSDs. Why not? Indeed, you can make them cheap/fragile SSDs; if they fail, there's no real loss, just annoyance.

Actual data looks to be spinning rust for the foreseeable future. 20 TB drives are not uncommon in data centers (and you can pre-order 26 TB drives), and a robust 20 TB SSD would cost a fortune.

My experience is that drives follow the bathtub curve - infant mortality and old age are the problems. But if you can put 1000 hours on a drive, you can usually put 10,000 hours on a drive. So I don't see this increased "wear and tear" as a problem. And as I said, if this is going to happen, I'd rather have this happen early and with lots of replicas.

So why don't we see more of this?
 
  • #10
My first SSD drive, I turned paging off and ran all the download/temp/cache/spool directories in RAM, to keep from wearing it out. Worked fine.
 
  • #11
That just tells me you didn't need a page/swap area to begin with.

The "you'll write your SSD to death" meme is kind of oversold. You get 10,000 writes, right? Call it 8000 from a worse-than-average SSD. Filling a 1 TB HDD takes about an hour, and there are about 8000 hours in a year. So you have a year of non-stop writing before the drive falls over. How many calendar years will this take? Five? Ten?
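That estimate is easy to parameterize. A rough sketch, using the numbers from the post (8000 full-drive writes, one hour per full write) and a made-up `duty_cycle` parameter for the fraction of time the drive actually spends writing; it ignores write amplification and wear leveling details entirely.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, close to the "8000" used above

def years_to_wear_out(full_drive_writes: float,
                      hours_per_full_write: float,
                      duty_cycle: float = 1.0) -> float:
    """Calendar years until the write budget is spent.

    duty_cycle is the assumed fraction of wall-clock time spent writing
    (1.0 means writing flat out, around the clock).
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    busy_hours = full_drive_writes * hours_per_full_write
    return busy_hours / (HOURS_PER_YEAR * duty_cycle)

print(years_to_wear_out(8000, 1.0))        # non-stop writing: a bit under a year
print(years_to_wear_out(8000, 1.0, 0.1))   # writing 10% of the time: about nine years
```

With a 10% duty cycle, which is still a heavy workload, the result lands inside the "five to ten years" range guessed above.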

For normal workloads, SSDs will last roughly as long as HDDs. Can I come up with a workload that breaks SSDs? Sure - fill the drive up to 99% and hammer on that last 1%. Is this what people typically do? No.

So I don't think "You only get 9995 writes out of your SSD and not 10,000!" is a compelling argument.

Further, consider the excitement about the opposite design, deduplication - where if two files store the same block, only one is stored. This, as far as I can tell, has only one use case where it matters, makes the data more fragile, and hogs memory. That should scream out "go the other direction!"
 
  • #12
Vanadium 50 said:
That just tells me you didn't need a page/swap area to begin with.
Well, yeah, which is why I turned it off ; had to keep an eye on things, of course.

Also, it did away with other usually-unnecessary disk-writes, such as print spooling, non-persistent temporary files, downloads which are going to be unpacked anyways, etc.

So, anyways, your objection to your proposal, on security grounds, is that the decreased lifetime of a disk means it gets binned faster, ie: more potential breaches.
 
  • #13
I don't think the number of writes is substantially higher with this implemented. Sure, if the drive is empty it can add a whole bunch of ditto blocks. But if the drive is 2/3 full, only half the data can be dittoed.

Additionally, in principle the system can move the drive head around in a more logical way ("elevator seeking") which might, again in theory, reduce wear on the stepper motors. In practice, I am not so sure. Modern drives do a lot of lying about their geometry to their hosts.
 
  • #14
Vanadium 50 said:
The "you'll write your SSD to death" meme is kind of oversold.
I recall small embedded computers with flash drives and Linux dying after a week or so because someone forgot to disable sync when setting up the drive mounts. Of course, flash drives back then did not have the fancy life-prolonging firmware that modern SSDs have, so if one was just used as a HDD it died pretty quickly.
 
