[C++] Quickly finding directory files using Boost.Filesystem

  • Context: C/C++ 
  • Thread starter: PPeter
  • Tags: files
SUMMARY

This discussion covers searching a directory of approximately 14,000 files with the Boost.Filesystem library in C++. The user, Peter, found directory iterators to be the only way to enumerate the entries and considered iterating over every file impractical. For a one-time search, filtering the entries by a file mask was suggested; since Boost.Filesystem does not appear to offer mask support on its own, Peter settled on Boost.Regex for the filtering. For repeated searches, storing the filenames in an array and applying a binary search was recommended, which works only when the beginning of the filename is known.

PREREQUISITES
  • Familiarity with C++ programming language
  • Understanding of Boost.Filesystem library
  • Knowledge of Boost.Regex for pattern matching
  • Basic concepts of binary search algorithms
NEXT STEPS
  • Use Boost.Regex to filter the entries returned by Boost.Filesystem directory iterators
  • Learn about binary search algorithms and their implementation in C++
  • Explore Boost.Filesystem API for advanced directory operations
  • Investigate performance optimization techniques for file handling in C++
USEFUL FOR

Software developers, particularly those working with C++ and file management, as well as anyone looking to optimize file searching in large directories using Boost libraries.

PPeter
I'm working on a program that copies PDFs from one directory to another using Boost.Filesystem. My problem is that the directory I'm grabbing files from contains about 14,000 files (not including files in subdirectories), so iterating through each one to find something isn't very practical.

I just started using the library, so I don't know the ins and outs. As far as I can see, the only way to get information from the directory is through the directory iterators (and, from what I've seen, I believe they automatically sort entries in alphanumeric order). I've thought about incrementing the iterator by more than 1, but everything I've read about doing that (for iterators in general) warns of trouble when it gets close to the end.

My initial plan was to keep breaking the directory entries into smaller segments to find what I'm looking for, i.e., jump to the middle entry between two limits, compare that filename with the one I'm looking for, then split the appropriate segment in half again, and keep going until it's found. I'm not sure whether something like this is even possible, or advisable, when using iterators. So if anyone has any suggestions or ideas, I'm open to hearing them.

Thanks,

Peter
 
jedishrfu
So you're searching for a file by name in a list of 14,000 files. If this is a one-time search, then the iterator is the only way, unless Boost handles file masks, in which case you could ask for a list of files matching the mask and then search the reduced list iteratively.

If it's a repeated operation, you could iterate through the list once and place each filename in a sorted array, to which you can apply the binary search scheme you mentioned earlier. This only works if you know the first few letters of the filename, though, not if the search string is embedded inside the filename.
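The repeated-search idea above can be sketched as follows. A directory iterator is a single-pass input iterator, and the Boost documentation leaves the order of entries unspecified, so the bisection cannot be done on the iterator directly; the names have to be copied into a container and sorted first. The sketch assumes that step has already happened and works on a plain `std::vector<std::string>`. The helper name `names_with_prefix` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns every name in `sorted_names` that begins
// with `prefix`. Precondition: `sorted_names` is sorted with the default
// operator< (for example via std::sort).
std::vector<std::string> names_with_prefix(
        const std::vector<std::string>& sorted_names,
        const std::string& prefix) {
    assert(std::is_sorted(sorted_names.begin(), sorted_names.end()));

    // Binary search for the first name that is not less than the prefix.
    auto first = std::lower_bound(sorted_names.begin(), sorted_names.end(),
                                  prefix);

    // Names sharing the prefix are contiguous in sorted order, so advance
    // until a name stops matching.
    auto last = first;
    while (last != sorted_names.end() &&
           last->compare(0, prefix.size(), prefix) == 0) {
        ++last;
    }
    return std::vector<std::string>(first, last);
}
```

Sorting 14,000 names once costs more than a single linear scan, so this pays off only when many lookups follow. The default string ordering is byte-wise and therefore case-sensitive, which may differ from the order a file browser shows.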

I found this API reference for Boost.Filesystem:

http://www.boost.org/doc/libs/1_34_0/libs/filesystem/doc/index.htm
 
Ah, this is very helpful. It doesn't seem that boost::filesystem has the functionality to do that on its own, but by using boost::regex to filter, it should be possible to speed things up dramatically.

Thanks
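A rough sketch of the regex filter mentioned in the reply above. To keep it free of external dependencies it uses `std::regex` from the standard library as a stand-in for Boost.Regex; that swap is an assumption of this example, although `boost::regex` offers a `regex_match` function under the same name. The helper name `filter_names` is hypothetical, and the names are assumed to have been collected from a directory iteration beforehand.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keeps the names that match `pattern` in full.
// Assumption: std::regex stands in for boost::regex.
std::vector<std::string> filter_names(const std::vector<std::string>& names,
                                      const std::regex& pattern) {
    std::vector<std::string> kept;
    std::copy_if(names.begin(), names.end(), std::back_inserter(kept),
                 [&pattern](const std::string& name) {
                     // regex_match requires the whole name to match.
                     return std::regex_match(name, pattern);
                 });
    return kept;
}
```

A call might look like `filter_names(names, std::regex(R"(invoice_\d+\.pdf)", std::regex::icase))`, where the pattern is only an illustration. The filter still examines every name once, so any gain comes from the smaller list that later steps work on rather than from a faster directory scan.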
 