File System Improvements in a Half Century

  • Thread starter: Vanadium 50
  • Tags: File System

Discussion Overview

The discussion revolves around the improvements made in file systems over the past 50 years, particularly in comparison to early systems like Multics, ITS, and Unix. Participants explore various advancements and features that have emerged, excluding capacity limits, and reflect on the evolution of file system technology.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants mention sparse files, deduplication, and copy-on-write as significant advancements, though they note that sparse files are niche and copy-on-write can lead to fragmentation.
  • Wear leveling for flash media is highlighted as an important improvement, with discussions on its relationship to file system functionality.
  • Journaling file systems are noted as a development that became common in the 1990s, with some participants recalling specific implementations like IBM JFS.
  • Snapshots are introduced as a new feature related to journaling, although the specifics of their implementation are not fully agreed upon.
  • Participants reflect on historical practices in file management, such as manual block allocation and the use of tape for file recovery, indicating a significant shift in file system management over time.
  • Some express the view that while many features have been around for a long time, recent innovations in storage technology seem to focus more on cloud and network solutions rather than standalone file systems.
  • There is a recognition that the challenges faced by file systems today differ from those of the past, with advancements in hardware influencing file system design and functionality.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the most significant improvements in file systems, and multiple competing views on the relevance and impact of various features remain present throughout the discussion.

Contextual Notes

Some limitations are noted, such as the dependence on specific hardware capabilities and the evolving nature of user requirements, which complicate the assessment of file system advancements.

Vanadium 50
Triggered by the "restore data from a deleted file" thread, I was thinking about what improvements have been made in file systems in the last 50 or so years. What was not present in Multics or ITS or early Unix?

I'm not talking about capacity limits. At the birth of the PC, one could have built a FAT that would have worked on today's disks - a million times bigger - but what would have been the point?

I can think of three candidates:
1. Sparse files
2. Deduplication
3. Copy-on-write

Sparse files are definitely a niche. Copy-on-write can promote fragmentation, which was an issue back in the day, so I could understand why it didn't happen. ("You want me to do what?") Deduplication is a good idea when it is, and a very, very bad idea at other times.
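For concreteness, here is a minimal sketch (Python on a sparse-aware file system such as ext4 or XFS; the path is hypothetical) of what a sparse file looks like from user space: seek past the end, write a little, and the apparent size and the allocated size diverge.

# Minimal sketch: create a sparse file by seeking past the end before writing.
# On file systems that support sparse files, the hole consumes no data blocks,
# so the apparent size and the allocated size diverge.
import os

path = "/tmp/sparse_demo.bin"   # hypothetical path
with open(path, "wb") as f:
    f.seek(1 << 30)             # jump 1 GiB forward without writing anything
    f.write(b"end")             # only this tail actually hits the disk

st = os.stat(path)
print("apparent size:", st.st_size)            # ~1 GiB
print("allocated size:", st.st_blocks * 512)   # a few KiB on a sparse-aware FS

os.remove(path)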

But I can't think of anything else. It's as if all the progress in CS kind of sidestepped the file systems.
 
Interesting question. I never knew how files were stored on mainframe disks. I knew they used terms like blinks and links for storage size, and that low-level random access was used to get data. This implied a table that managed the available sectors, their order, and the platters one could write on.

There was no notion of undeleting files. You could do file recovery, though, where a tape could be mounted to get an old copy of a file. You had to request it through the computer center help desk, and they did the magic recovery.

Source code management for programs was done via card decks and mag tape, with special control cards indicating what program to alter and what lines were to be replaced. This usually meant changes were seldom and small, and that a program redesign was required after the changes became unmanageable.

More on Multics, from a 1965 paper:

https://multicians.org/fjcc4.html
 
berkeman said:
Wear leveling for media that have limited endurance (like flash).
That's good. I originally thought of that as SSD firmware, but you need the file system to tell the drive what can and can't be trimmed.
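As a hedged illustration of that hand-off, the sketch below (Linux only, Python, hypothetical mount point, needs root) asks a mounted file system to discard its unused blocks via the FITRIM ioctl - the same mechanism the fstrim utility uses - since only the file system knows which blocks are actually free.

# Rough sketch: ask the file system to issue TRIM/discard for its free blocks.
# Only the file system knows which blocks are unused; the SSD firmware does not.
import fcntl, os, struct

FITRIM = 0xC0185879            # _IOWR('X', 121, struct fstrim_range) on Linux
MOUNT_POINT = "/mnt/data"      # hypothetical mount point; run as root

fd = os.open(MOUNT_POINT, os.O_RDONLY)
# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
arg = struct.pack("QQQ", 0, (1 << 64) - 1, 0)
result = fcntl.ioctl(fd, FITRIM, arg)
os.close(fd)

trimmed = struct.unpack("QQQ", result)[1]  # kernel rewrites len = bytes trimmed
print(f"trimmed about {trimmed} bytes")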

Filip Larsen said:
Wikipedia has a nice list
But most of those features have been around for a very long time.

jedishrfu said:
I never knew how files were stored on mainframe disks
I remember allocating blocks by hand. You'd tell the system you needed N blocks and it would give you the space. You could give it a mnemonic alias, but that was the closest thing to a file name.
 
Vanadium 50 said:
what improvements have been made in file systems in the last 50 or so years. What was not present in Multics or ITS or early Unix?
How about journaling filesystems? IIRC those didn't come into use until the 1990s.
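To make the idea concrete, here is a toy write-ahead sketch in Python (file names are made up, and it mimics no particular file system): log the intent and flush it before touching the data, so a crash can be replayed rather than leaving a half-finished update.

# Toy write-ahead journal: the intent record is flushed to stable storage
# before the data file is touched, so a crash mid-update can be replayed
# instead of leaving the file half-written.
import json, os

JOURNAL = "journal.log"     # hypothetical file names
DATAFILE = "data.txt"

def journaled_write(offset, payload: bytes):
    # 1. Log the intent and force it to disk.
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"op": "write", "off": offset,
                            "data": payload.decode()}) + "\n")
        j.flush()
        os.fsync(j.fileno())
    # 2. Apply the change to the data file.
    mode = "r+b" if os.path.exists(DATAFILE) else "w+b"
    with open(DATAFILE, mode) as d:
        d.seek(offset)
        d.write(payload)
        d.flush()
        os.fsync(d.fileno())
    # 3. Mark the record committed (a real journal would checkpoint/truncate).
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"op": "commit"}) + "\n")
        j.flush()
        os.fsync(j.fileno())

journaled_write(0, b"hello, journaling")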
 
I think I was using IBM JFS, at least a beta version, in the late '80s. But I don't think JFS was the first. CICS had a ROLLBACK command, and I think Tandem had something similar.

But I think there is one newer feature related to this: snapshots. They use journaling in a very different way.
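A toy sketch of why copy-on-write makes snapshots nearly free (an illustration only, not how ZFS or btrfs actually lay out data): a snapshot is just a saved copy of the block map, and data blocks stay shared until something is rewritten.

# Toy copy-on-write block store: a snapshot is a frozen copy of the block map;
# data blocks are shared until a later write replaces them.
class CowStore:
    def __init__(self):
        self.blocks = {}        # block_id -> bytes (the "disk")
        self.block_map = {}     # logical block number -> block_id
        self.next_id = 0

    def write(self, lbn, data: bytes):
        # Never overwrite in place: allocate a fresh block and repoint the map.
        self.blocks[self.next_id] = data
        self.block_map[lbn] = self.next_id
        self.next_id += 1

    def snapshot(self):
        # O(map size); no data is copied -- blocks stay shared.
        return dict(self.block_map)

    def read(self, lbn, block_map=None):
        bm = self.block_map if block_map is None else block_map
        return self.blocks[bm[lbn]]

store = CowStore()
store.write(0, b"version 1")
snap = store.snapshot()          # cheap: just the map
store.write(0, b"version 2")     # live copy diverges, snapshot is untouched
print(store.read(0))             # b'version 2'
print(store.read(0, snap))       # b'version 1'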
 
Wow, those bring back memories of IDS and ISP databases. IDS was the hot baby at the time, with very fast lookup, but it had the problem of getting entangled in its own data, requiring a custom program to unload and reload it.

ISP, for Indexed Sequential Processing (the GE name; IBM used ISAM), was a simpler paged database: as records were added they were inserted into a page, and when the page got full a new overflow page was allocated and linked to, and the record was written there.

A standard utility could follow the records sequentially, offloading them to tape and later reloading them back onto pages. A percentage was used to indicate how full a page should be, allowing for a nominal number of record inserts before an overflow page was allocated.
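A toy sketch of that overflow-page scheme (names and capacities are invented for illustration): records fill a home page up to a fill factor, later inserts use the remaining slack, and anything beyond that spills into a linked overflow page that a sequential scan follows.

# Toy indexed-sequential page layout: each home page holds records up to a
# fixed capacity; once full, inserts spill into a linked overflow page.
PAGE_CAPACITY = 4
FILL_FACTOR = 0.75              # load pages only 75% full to leave insert room

class Page:
    def __init__(self):
        self.records = []
        self.overflow = None    # link to an overflow Page, if any

    def insert(self, record):
        if len(self.records) < PAGE_CAPACITY:
            self.records.append(record)
        else:
            if self.overflow is None:
                self.overflow = Page()
            self.overflow.insert(record)

    def scan(self):
        # The "standard utility" view: follow records sequentially,
        # home page first, then any overflow pages.
        yield from self.records
        if self.overflow:
            yield from self.overflow.scan()

page = Page()
for key in range(int(PAGE_CAPACITY * FILL_FACTOR)):   # initial sequential load
    page.insert(("key", key))
page.insert(("key", 99))        # still fits in the home page
page.insert(("key", 100))       # ...this one goes to the overflow page
print(list(page.scan()))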
 
Vanadium 50 said:
But most of those features have been around for a very long time.
Yes, it was mainly just to have a nice overview of existing file systems in use. Last time I tried, some years ago, to get up to speed on the advanced features of newer file systems, I recall ZFS being prominent on the list for including most of the advanced features of the time, and that still seems to be the case.

I am no expert, but it somewhat seems file system technology has stabilized, and it's difficult to imagine disruptive new features being added. Storage innovation over the last decade or so has mostly been oriented towards cloud and network storage technologies, and less towards features for the "stand-alone" OS file system.
 
Vanadium 50 said:
Triggered by the "restore data from a deleted file" thread, I was thinking about what improvements have been made in file systems in the last 50 or so years. What was not present in Multics or ITS or early Unix?
I have been writing code professionally for more than 50 years, so...

The first improvement is that there are file systems. Fifty years ago, it was normal to allocate disk cylinders to different tasks as a method of optimizing seek times. I had a coworker who told me that his old boss told him to only access the disk through a subroutine the boss had written - because the hardware supported the feature of seeking to more cylinders than were available. I called that "lathe mode".

On the IBM 1620 we had at Lowell Tech, the 10 MB hard drive was arranged with code and file areas, but students were expected to keep their code on card decks and their output in printed form - paper tape was also available.

You mentioned "sparse files". In the '70's and early '80's, some systems would support B-Tree style random access files. So you could skip the file pointer around on writes and it would only allocate for sectors that were required.

As systems became multi-processing, with two or more processes writing to disk at the same time, the file systems needed to become re-entrant - and the file structures and system needed to mediate between applications that had no means of cooperation.

It's hard to separate file system changes from the changing requirements created by different hardware. The file systems are expected to handle fragmentation issues on their own. To guarantee a contiguous file, 50 years ago, you could allocate a file size before writing to it. Now, with SSD, it's not even an issue. The cost of fragmentation is imperceptible.
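Explicit preallocation still exists as a hint, though; here is a minimal Linux/Python sketch (hypothetical path) that reserves blocks up front, the modern descendant of declaring a file size before writing to it.

# Minimal sketch (Linux): reserve space for a file up front. The file system
# is free to place the extents however it likes; on an SSD that placement
# barely matters anymore.
import os

path = "/tmp/prealloc_demo.bin"                # hypothetical path
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.posix_fallocate(fd, 0, 100 * 1024 * 1024)   # reserve 100 MiB of real blocks
print(os.stat(path).st_blocks * 512)           # ~100 MiB actually allocated
os.close(fd)
os.remove(path)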
 
That's a very good point that problems today are different. Compared to ~40 years ago, CPUs are maybe 50,000x faster, but memory only 10x. Spinning disks are a million times bigger but only 1000x faster.

The same technology can serve different purposes. Back then, we compressed data to save space. Today we do it to save time. The reason I compress my data is not to get 10% more capacity - it's to get better speed: 10% more cache hits and 10% faster transfer times.
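A back-of-envelope sketch of that trade-off, with purely illustrative numbers (none of these throughputs are measurements): once decompression is much faster than the disk, shrinking the file by 10% shaves roughly 10% off the read time.

# Back-of-envelope: when decompression throughput dwarfs disk throughput,
# read time is dominated by how many bytes cross the disk interface, so a
# 10% smaller file reads roughly 10% faster. Assumed numbers, not measured.
file_size_gb = 10
disk_gbps = 0.5            # assumed disk throughput, GB/s
decompress_gbps = 5.0      # assumed lz4-class decompression speed, GB/s
ratio = 0.90               # compressed size / original size

plain_time = file_size_gb / disk_gbps
compressed_time = ((file_size_gb * ratio) / disk_gbps
                   + (file_size_gb * ratio) / decompress_gbps)

print(f"uncompressed read: {plain_time:.1f} s")    # 20.0 s
print(f"compressed read:   {compressed_time:.1f} s")  # 19.8 s
# The win grows as the ratio improves or as the decompressor gets even
# faster relative to the disk.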
Filip Larsen said:
ZFS
I like ZFS. I run it at home. I suspect that there is little it can do that GPFS cannot (except maybe be run at home in a practical manner).
 
