File System Improvements in a Half Century

  • Thread starter: Vanadium 50
  • Tags: File System

Discussion Overview

The discussion revolves around the improvements made in file systems over the past 50 years, particularly in comparison to early systems like Multics, ITS, and Unix. Participants explore various advancements and features that have emerged, excluding capacity limits, and reflect on the evolution of file system technology.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants mention sparse files, deduplication, and copy-on-write as significant advancements, though they note that sparse files are niche and copy-on-write can lead to fragmentation.
  • Wear leveling for flash media is highlighted as an important improvement, with discussions on its relationship to file system functionality.
  • Journaling file systems are noted as a development that became common in the 1990s, with some participants recalling specific implementations like IBM JFS.
  • Snapshots are introduced as a new feature related to journaling, although the specifics of their implementation are not fully agreed upon.
  • Participants reflect on historical practices in file management, such as manual block allocation and the use of tape for file recovery, indicating a significant shift in file system management over time.
  • Some express the view that while many features have been around for a long time, recent innovations in storage technology seem to focus more on cloud and network solutions rather than standalone file systems.
  • There is a recognition that the challenges faced by file systems today differ from those of the past, with advancements in hardware influencing file system design and functionality.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the most significant improvements in file systems, and multiple competing views on the relevance and impact of various features remain present throughout the discussion.

Contextual Notes

Some limitations are noted, such as the dependence on specific hardware capabilities and the evolving nature of user requirements, which complicate the assessment of file system advancements.

Vanadium 50
Triggered by the "restore data from a deleted file" thread, I was thinking about what improvements have been made in file systems in the last 50 or so years. What was not present in Multics or ITS or early Unix?

I'm not talking about capacity limits. At the birth of the PC, one could have built a FAT that would have worked on today's disks - a million times bigger - but what would have been the point?

I can think of three candidates:
1. Sparse files
2. Deduplication
3. Copy-on-write

Sparse files are definitely a niche. Copy-on-write can promote fragmentation, which was an issue back in the day, so I could understand why it didn't happen. ("You want me to do what?") Deduplication is a good idea when it is, and a very, very bad idea at other times.
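For concreteness, here is a minimal sketch (Python on a sparse-aware file system such as ext4 or XFS; the path is hypothetical) of what a sparse file looks like from user space: seek past the end, write a little, and the apparent size and the allocated size diverge.

# Minimal sketch: create a sparse file by seeking past the end before writing.
# On file systems that support sparse files, the hole consumes no data blocks,
# so the apparent size and the allocated size diverge.
import os

path = "/tmp/sparse_demo.bin"   # hypothetical path
with open(path, "wb") as f:
    f.seek(1 << 30)             # jump 1 GiB forward without writing anything
    f.write(b"end")             # only this tail actually hits the disk

st = os.stat(path)
print("apparent size:", st.st_size)            # ~1 GiB
print("allocated size:", st.st_blocks * 512)   # a few KiB on a sparse-aware FS

os.remove(path)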

But I can't think of anything else. It's as if all the progress in CS kind of sidestepped the file systems.
 
Interesting question. I never knew how files were stored on mainframe disks. I knew they used terms like blinks and links for storage size, and that low-level random access was used to get data. This implied a table that managed the available sectors, their order, and the platters one could write on.

There was no notion of undeleting files. You could do file recovery, though, where a tape could be mounted to get an old copy of a file. You had to request it through the computer center help desk, and they did the magic recovery.

Source code management for programs was done via card decks and mag tape, with special control cards indicating what program to alter and what lines were to be replaced. This usually meant changes were seldom and small, and that a program redesign was required after the changes became unmanageable.

More on Multics, from a 1965 paper:

https://multicians.org/fjcc4.html
 
berkeman said:
Wear leveling for media that have limited endurance (like flash).
That's good. I originally thought of that as SSD firmware, but you need the file system to tell the drive what can and can't be trimmed.
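As a hedged illustration of that hand-off, the sketch below (Linux only, Python, hypothetical mount point, needs root) asks a mounted file system to discard its unused blocks via the FITRIM ioctl - the same mechanism the fstrim utility uses - since only the file system knows which blocks are actually free.

# Rough sketch: ask the file system to issue TRIM/discard for its free blocks.
# Only the file system knows which blocks are unused; the SSD firmware does not.
import fcntl, os, struct

FITRIM = 0xC0185879            # _IOWR('X', 121, struct fstrim_range) on Linux
MOUNT_POINT = "/mnt/data"      # hypothetical mount point; run as root

fd = os.open(MOUNT_POINT, os.O_RDONLY)
# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
arg = struct.pack("QQQ", 0, (1 << 64) - 1, 0)
result = fcntl.ioctl(fd, FITRIM, arg)
os.close(fd)

trimmed = struct.unpack("QQQ", result)[1]  # kernel rewrites len = bytes trimmed
print(f"trimmed about {trimmed} bytes")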

Filip Larsen said:
Wikipedia has a nice list
But most of those features have been around for a very long time.

jedishrfu said:
I never knew how files were stored on mainframe disks
I remember allocating blocks by hand. You'd tell the system you needed N blocks and it would give you the space. You could give it a mnemonic alias, but that was the closest thing to a file name.
 
Vanadium 50 said:
what improvements have been made in file systems in the last 50 or so years. What was not present in Multics or ITS or early Unix?
How about journaling filesystems? IIRC those didn't come into use until the 1990s.
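To make the idea concrete, here is a toy write-ahead sketch in Python (file names are made up, and it mimics no particular file system): log the intent and flush it before touching the data, so a crash can be replayed rather than leaving a half-finished update.

# Toy write-ahead journal: the intent record is flushed to stable storage
# before the data file is touched, so a crash mid-update can be replayed
# instead of leaving the file half-written.
import json, os

JOURNAL = "journal.log"     # hypothetical file names
DATAFILE = "data.txt"

def journaled_write(offset, payload: bytes):
    # 1. Log the intent and force it to disk.
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"op": "write", "off": offset,
                            "data": payload.decode()}) + "\n")
        j.flush()
        os.fsync(j.fileno())
    # 2. Apply the change to the data file.
    mode = "r+b" if os.path.exists(DATAFILE) else "w+b"
    with open(DATAFILE, mode) as d:
        d.seek(offset)
        d.write(payload)
        d.flush()
        os.fsync(d.fileno())
    # 3. Mark the record committed (a real journal would checkpoint/truncate).
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"op": "commit"}) + "\n")
        j.flush()
        os.fsync(j.fileno())

journaled_write(0, b"hello, journaling")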
 
I think I was using IBM JFS, at least a beta version, in the late '80s. But I don't think JFS was the first. CICS had a ROLLBACK command, and I think Tandem had something similar.

But I think there is one newer feature related to this: snapshots. They use journaling in a very different way.
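A toy sketch of why copy-on-write makes snapshots nearly free (an illustration only, not how ZFS or btrfs actually lay out data): a snapshot is just a saved copy of the block map, and data blocks stay shared until something is rewritten.

# Toy copy-on-write block store: a snapshot is a frozen copy of the block map;
# data blocks are shared until a later write replaces them.
class CowStore:
    def __init__(self):
        self.blocks = {}        # block_id -> bytes (the "disk")
        self.block_map = {}     # logical block number -> block_id
        self.next_id = 0

    def write(self, lbn, data: bytes):
        # Never overwrite in place: allocate a fresh block and repoint the map.
        self.blocks[self.next_id] = data
        self.block_map[lbn] = self.next_id
        self.next_id += 1

    def snapshot(self):
        # O(map size); no data is copied -- blocks stay shared.
        return dict(self.block_map)

    def read(self, lbn, block_map=None):
        bm = self.block_map if block_map is None else block_map
        return self.blocks[bm[lbn]]

store = CowStore()
store.write(0, b"version 1")
snap = store.snapshot()          # cheap: just the map
store.write(0, b"version 2")     # live copy diverges, snapshot is untouched
print(store.read(0))             # b'version 2'
print(store.read(0, snap))       # b'version 1'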
 
Wow, those bring back memories of IDS and ISP databases. IDS was the hot baby at the time, with very fast lookup, but it had the problem of getting entangled in its own data, requiring a custom program to unload and reload it.

ISP, for Indexed Sequential Processing (the GE name; IBM used ISAM), was a simpler paged database: as records were added they were inserted into a page, and when the page got full a new overflow page was allocated and linked to, and the record was written there.

A standard utility could follow the records sequentially, offloading them to tape and later reloading them back onto pages. A percentage was used to indicate how full a page should be, allowing for a nominal number of record inserts before an overflow page was allocated.
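A toy sketch of that overflow-page scheme (names and capacities are invented for illustration): records fill a home page up to a fill factor, later inserts use the remaining slack, and anything beyond that spills into a linked overflow page that a sequential scan follows.

# Toy indexed-sequential page layout: each home page holds records up to a
# fixed capacity; once full, inserts spill into a linked overflow page.
PAGE_CAPACITY = 4
FILL_FACTOR = 0.75              # load pages only 75% full to leave insert room

class Page:
    def __init__(self):
        self.records = []
        self.overflow = None    # link to an overflow Page, if any

    def insert(self, record):
        if len(self.records) < PAGE_CAPACITY:
            self.records.append(record)
        else:
            if self.overflow is None:
                self.overflow = Page()
            self.overflow.insert(record)

    def scan(self):
        # The "standard utility" view: follow records sequentially,
        # home page first, then any overflow pages.
        yield from self.records
        if self.overflow:
            yield from self.overflow.scan()

page = Page()
for key in range(int(PAGE_CAPACITY * FILL_FACTOR)):   # initial sequential load
    page.insert(("key", key))
page.insert(("key", 99))        # still fits in the home page
page.insert(("key", 100))       # ...this one goes to the overflow page
print(list(page.scan()))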
 
Vanadium 50 said:
But most of those features have been around for a very long time.
Yes, it was mainly just to have a nice overview of existing file systems in use. Last time I tried, some years ago, to get up to speed on the advanced features of newer file systems, I recall ZFS being prominent on the list for including most of the advanced features of the time, and that still seems to be the case.

I am no expert, but it somewhat seems file system technology has stabilized, and it's difficult to imagine disruptive new features being added. Storage innovation over the last decade or so has mostly been oriented towards cloud and network storage technologies, and less towards features for the "stand-alone" OS file system.
 
Vanadium 50 said:
Triggered by the "restore data from a deleted file" thread, I was thinking about what improvements have been made in file systems in the last 50 or so years. What was not present in Multics or ITS or early Unix?
I have been writing code professionally for more than 50 years, so...

The first improvement is that there are file systems. Fifty years ago, it was normal to allocate disk cylinders to different tasks as a method of optimizing seek times. I had a coworker who told me that his old boss told him to only access the disk through a subroutine the boss had written - because the hardware supported the feature of seeking to more cylinders than were available. I called that "lathe mode".

On the IBM 1620 we had at Lowell Tech, the 10 MB hard drive was arranged with code and file areas, but students were expected to keep their code on card decks and their output in printed form - paper tape was also available.

You mentioned "sparse files". In the '70's and early '80's, some systems would support B-Tree style random access files. So you could skip the file pointer around on writes and it would only allocate for sectors that were required.

As systems became multi-processing, with two or more processes writing to disk at the same time, the file systems needed to become re-entrant - and the file structures and system needed to mediate between applications that had no means of cooperation.

It's hard to separate file system changes from the changing requirements created by different hardware. The file systems are expected to handle fragmentation issues on their own. To guarantee a contiguous file, 50 years ago, you could allocate a file size before writing to it. Now, with SSD, it's not even an issue. The cost of fragmentation is imperceptible.
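Explicit preallocation still exists as a hint, though; here is a minimal Linux/Python sketch (hypothetical path) that reserves blocks up front, the modern descendant of declaring a file size before writing to it.

# Minimal sketch (Linux): reserve space for a file up front. The file system
# is free to place the extents however it likes; on an SSD that placement
# barely matters anymore.
import os

path = "/tmp/prealloc_demo.bin"                # hypothetical path
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.posix_fallocate(fd, 0, 100 * 1024 * 1024)   # reserve 100 MiB of real blocks
print(os.stat(path).st_blocks * 512)           # ~100 MiB actually allocated
os.close(fd)
os.remove(path)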
 
That's a very good point that problems today are different. Compared to ~40 years ago, CPUs are maybe 50,000x faster, but memory only 10x. Spinning disks are a million times bigger but only 1000x faster.

The same technology can serve different purposes. Back then, we compressed data to save space. Today we do it to save time. The reason I compress my data is not to get 10% more capacity - it's to get better speed: 10% more cache hits and 10% faster transfer times.
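A back-of-envelope sketch of that trade-off, with purely illustrative numbers (none of these throughputs are measurements): once decompression is much faster than the disk, shrinking the file by 10% shaves roughly 10% off the read time.

# Back-of-envelope: when decompression throughput dwarfs disk throughput,
# read time is dominated by how many bytes cross the disk interface, so a
# 10% smaller file reads roughly 10% faster. Assumed numbers, not measured.
file_size_gb = 10
disk_gbps = 0.5            # assumed disk throughput, GB/s
decompress_gbps = 5.0      # assumed lz4-class decompression speed, GB/s
ratio = 0.90               # compressed size / original size

plain_time = file_size_gb / disk_gbps
compressed_time = ((file_size_gb * ratio) / disk_gbps
                   + (file_size_gb * ratio) / decompress_gbps)

print(f"uncompressed read: {plain_time:.1f} s")    # 20.0 s
print(f"compressed read:   {compressed_time:.1f} s")  # 19.8 s
# The win grows as the ratio improves or as the decompressor gets even
# faster relative to the disk.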
Filip Larsen said:
ZFS
I like ZFS. I run it at home. I suspect that there is little it can do that GPFS cannot (except maybe be run at home in a practical manner).
 
