Time complexity for an OS to find a file/directory?


Discussion Overview

The discussion revolves around the time complexity for an operating system to locate a file or directory, particularly in the context of systems like Windows and Unix/Linux. Participants explore various indexing methods and their implications on search times, considering both theoretical and practical aspects.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that the time complexity for finding a file could be O(1), particularly in ideal scenarios.
  • Others argue that operating systems have long used indexing, leading to a time complexity of roughly O(log N), which is slower than O(1).
  • A participant clarifies that indexing is done by hashing filenames and searching a binary tree of hashes, which allows for faster searches due to shorter hash lengths.
  • There is a mention of the potential for worst-case scenarios, such as when files are newly added and not yet indexed, or when using complex search patterns like regular expressions.
  • Some participants question whether the time complexity for binary tree searches is O(log n), and this is affirmed by others.
  • One participant notes that while hash table lookups can be O(1) in ideal conditions, real-world factors such as collisions can affect performance.

Areas of Agreement / Disagreement

Participants express differing views on the time complexity of file searches, with no consensus reached on a definitive answer. The discussion includes multiple competing models and acknowledges the complexity of real-world implementations.

Contextual Notes

The discussion highlights limitations such as the dependence on specific operating system implementations, the variability in indexing methods, and the impact of real-world conditions on theoretical performance metrics.

kolleamm
What is the time complexity for an OS to find a file? Is it O(1) time?

Let's say for example, you had a billion files in a single folder, and you wanted to load a file into a string in your program, would the system find a specific file right away, or would there be a longer wait?

Let's assume an OS like Windows.
 
kolleamm said:
What is the time complexity for an OS to find a file? Is it O(1) time?

Let's say for example, you had a billion files in a single folder, and you wanted to load a file into a string in your program, would the system find a specific file right away, or would there be a longer wait?

Let's assume an OS like Windows.
Even older OSes (Windows 2000 and NT 4 onward) have a utility to index files, so the time to find a file in the indexed database is ~O(log N), which is only slightly slower than O(1). Indexing is active by default and runs fully automatically in Windows 7 and Windows 10, but is absent in Windows 8.
Most Unix/Linux versions also use some form of indexing/hashing of the file tree.
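To illustrate that O(log N) behavior, here is a minimal sketch (toy data, not any actual filesystem's code) of looking up a name in a sorted index with binary search:

```python
import bisect

# Hypothetical sorted index of names; binary search costs O(log N)
# comparisons, versus O(N) for scanning an unsorted directory.
index = sorted(["alpha.txt", "beta.txt", "delta.txt", "gamma.txt"])

def find(name):
    i = bisect.bisect_left(index, name)  # O(log N) probe
    return i < len(index) and index[i] == name

print(find("beta.txt"))   # True
print(find("omega.txt"))  # False
```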
 
Likes: kolleamm
trurle said:
Even older OSes (Windows 2000 and NT 4 onward) have a utility to index files, so the time to find a file in the indexed database is ~O(log N), which is only slightly slower than O(1). Indexing is active by default and runs fully automatically since Windows 7.
Do you mean O(log n), which would be the time complexity for a binary tree search?
 
Likes: trurle
kolleamm said:
Do you mean O(log n), which would be the time complexity for a binary tree search?
Yes, exactly
 
trurle said:
Yes, exactly
Ah I see, and by indexing do you mean alphanumerical order? I'm guessing by the ASCII values of the characters?
 
kolleamm said:
Ah I see, and by indexing do you mean alphanumerical order? I'm guessing by the ASCII values of the characters?
As far as I know, indexing is done by hashing the filename and then searching a binary tree of hashes. Because hashes (4 to 16 bytes) are shorter than full-path filenames, the search is faster. Of course, this approach does not work for wildcard patterns, only for exact file names.
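A toy sketch of that scheme (hypothetical names and locations, with a sorted list standing in for the on-disk tree of hashes): each filename is reduced to a short fixed-width hash, and lookups binary-search the hashes rather than comparing long path strings:

```python
import bisect
import hashlib

def short_hash(name):
    # Fixed-width 8-byte key: cheaper to compare than a long path string.
    return hashlib.sha1(name.encode()).digest()[:8]

# A sorted list of (hash, location) pairs stands in for the on-disk tree.
index = []

def add(name, location):
    bisect.insort(index, (short_hash(name), location))

def lookup(name):
    h = short_hash(name)
    i = bisect.bisect_left(index, (h, ""))
    if i < len(index) and index[i][0] == h:
        return index[i][1]
    return None  # exact names only; a wildcard pattern cannot be hashed

add("report.txt", "inode 42")
add("notes.txt", "inode 7")
print(lookup("report.txt"))  # inode 42
```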
 
Likes: kolleamm
trurle said:
As far as I know, indexing is done by hashing the filename and then searching a binary tree of hashes. Because hashes (4 to 16 bytes) are shorter than filenames, the search is faster. Of course, this approach does not work for wildcard patterns, only for exact file names.
Wow I never knew, thanks for the info!
 
@kolleamm, the science of sorting and searching is very advanced. I recommend the classic book on the subject.

Donald Knuth's "The Art and Science of Computer Programming, Volume 3"
 
Likes: Klystron, QuantumQuest, FactChecker and 1 other person
anorlunda said:
Donald Knuth's "The Art and Science of Computer Programming, Volume 3"

"The Art of Computer Programming, Volume 3: Sorting and Searching", Donald Knuth, Addison-Wesley, Reading, Massachusetts, 1973, ISBN 0-201-03803-X

There are later editions.

'The Bible.' Highly recommended!

Cheers,
Tom
 
Likes: Klystron, FactChecker and anorlunda
  • #10
It's a huge text tree search. There are lots of possible searches, and even with indexing, lots of ways to index (and to keep an index updated). The worst case can be quite bad, especially if a file was just added and hasn't made it into the index yet, or if you are using regular expressions and there are a gazillion candidates for consideration.
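To illustrate why pattern searches hit that worst case (toy data, hypothetical names): a regular expression cannot be hashed, so the index is bypassed and every entry must be tested, an O(N) linear scan:

```python
import re

# Unindexed pattern search over hypothetical names: the hash index is
# useless here, so every entry must be tested -- an O(N) linear scan.
names = ["notes1.txt", "notes2.txt", "image.png", "draft_final.txt"]
pattern = re.compile(r"notes\d+\.txt")
matches = [n for n in names if pattern.fullmatch(n)]
print(matches)  # ['notes1.txt', 'notes2.txt']
```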
 
  • #11
But wouldn't a hash table lookup be O(1)?
 
  • #12
Well, yeah, in the 'ideal' implementations. Once you hit the real world, there are either collisions, or you are over-provisioning the algorithm, the storage space, or both.
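A toy chained hash table (illustrative only, not any OS's implementation) shows the degradation: average lookups are O(1), but keys that collide into the same bucket must be scanned linearly:

```python
# Toy hash table with separate chaining (illustrative only).
# Average-case lookup is O(1), but every key that collides into the
# same bucket adds one more step to the linear scan inside it.
class ChainedTable:
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:          # overwrite existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:       # linear scan within one bucket
            if k == key:
                return v
        return None

# Forcing a single bucket makes every insertion collide:
t = ChainedTable(nbuckets=1)
t.put("a.txt", 1)
t.put("b.txt", 2)
print(t.get("b.txt"))  # 2, found only after scanning the chain
```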
 
Likes: Vanadium 50 and DanielChin
