
Verifying directory structure

  1. Nov 12, 2006 #1

    0rthodontist

    Science Advisor

    Are there standard tools for verifying that a directory structure takes a certain form? This is for when a program needs certain minimum files with certain name constraints to be in certain places and directories with certain name constraints to be in certain places. It would specify things like "there must be a file called files.dat in the home directory, and there must be some number of directories with numeric names, each of which has a file called configure.dat." It's not hard to write application-specific means of verifying this but it would be nice if there is a general framework.
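    For reference, this is roughly what the application-specific version of the example rules looks like. It is a minimal Python sketch under two assumptions: "home directory" means the application's root directory (passed in as `root`), and "some number" means at least `min_numeric_dirs`, a hypothetical parameter that defaults to 1.

```python
import os
import re

def check_layout(root, min_numeric_dirs=1):
    """Return a list of problems under root; an empty list means the layout is OK.

    Encodes the example rules from the post.  Assumption: "home directory"
    means the application's root directory, passed in as root.
    """
    problems = []
    if not os.path.isfile(os.path.join(root, "files.dat")):
        problems.append("missing files.dat")
    # Directories whose names are all ASCII digits, e.g. "1" or "042"
    numeric_dirs = [name for name in os.listdir(root)
                    if re.fullmatch(r"[0-9]+", name)
                    and os.path.isdir(os.path.join(root, name))]
    if len(numeric_dirs) < min_numeric_dirs:
        problems.append("found %d numeric directories, need at least %d"
                        % (len(numeric_dirs), min_numeric_dirs))
    # Each numeric directory must contain its own configure.dat
    for name in sorted(numeric_dirs):
        if not os.path.isfile(os.path.join(root, name, "configure.dat")):
            problems.append("missing %s" % os.path.join(name, "configure.dat"))
    return problems
```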
     
  3. Nov 12, 2006 #2

    jim mcnamara

    Science Advisor
    Gold Member

    No, no OS provides that. Most OSes have system calls (like sysconf() in UNIX) that tell you a lot about the system, but not about an application's expectations. And I don't know of a utility that does it. You can try scrounging around SourceForge and see.

    You'll probably have to build your own front end. It's just a matter of stat()-ing for the existence of some directories and a few files. You may want to require the existence of an init file (or rule file) or a database table that lists the location of these files/directories. Programs do this all the time.
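    The rule-file idea can be sketched in Python like this. The rule format is a hypothetical one: one `file <relpath>` or `dir <relpath>` entry per line, with `#` comments. The function `check_rules` is also hypothetical. It calls os.stat() on each listed path and tests the file type with the `stat` module's S_ISREG/S_ISDIR, which mirrors the stat() approach above.

```python
import os
import stat

def check_rules(root, rule_lines):
    """Return the rule lines that are not satisfied under root.

    Hypothetical rule format: one "file <relpath>" or "dir <relpath>" per
    line; blank lines and lines starting with "#" are ignored.
    """
    failures = []
    for line in rule_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        kind, relpath = line.split(None, 1)
        if kind not in ("file", "dir"):
            raise ValueError("unknown rule kind: %r" % kind)
        try:
            mode = os.stat(os.path.join(root, relpath)).st_mode
        except OSError:  # missing, unreadable, or a parent is not a directory
            failures.append(line)
            continue
        # S_ISREG / S_ISDIR test the file-type bits returned by stat()
        wanted = stat.S_ISDIR(mode) if kind == "dir" else stat.S_ISREG(mode)
        if not wanted:
            failures.append(line)
    return failures
```

    In practice `rule_lines` would come from reading the rule file, for example with `open(path).read().splitlines()`.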
     
  4. Nov 12, 2006 #3

    0rthodontist

    Science Advisor

    Well, it would be interesting if there is no current tool that does this, because it would certainly be useful and would not be terribly difficult to write. I was thinking that directory structure could be abstractly specified using regular expressions or something similar and then verified in a standard way.
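    Here is one way the regular-expression idea could look. It is a sketch only, not an existing tool. The spec is a hypothetical nested dict: each key is a regex that must fully match at least one entry name, and each value is either `FILE` or another dict describing that directory's contents. The spec below encodes the files.dat / numeric-directory example from the first post.

```python
import os
import re

FILE = "file"  # marker value: the entry must be a regular file

# Hypothetical spec format: each key is a regex that must fully match the
# name of at least one entry; each value is FILE or a nested spec dict
# describing a directory.  This encodes the example from the first post.
SPEC = {
    r"files\.dat": FILE,
    r"[0-9]+": {                  # numerically named directories...
        r"configure\.dat": FILE,  # ...each containing configure.dat
    },
}

def verify(path, spec):
    """Return a list of error strings; an empty list means path matches spec."""
    errors = []
    names = os.listdir(path)
    for pattern, sub in spec.items():
        matches = [n for n in names if re.fullmatch(pattern, n)]
        if not matches:
            errors.append("%s: nothing matches %r" % (path, pattern))
        for name in matches:
            full = os.path.join(path, name)
            if sub == FILE:
                if not os.path.isfile(full):
                    errors.append("%s: expected a file" % full)
            elif os.path.isdir(full):
                errors.extend(verify(full, sub))  # recurse into the sub-spec
            else:
                errors.append("%s: expected a directory" % full)
    return errors
```

    Entries that no pattern mentions are simply allowed, so this checks the "certain minimum files" requirement rather than an exact layout.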
     
  5. Nov 12, 2006 #4

    chroot

    Staff Emeritus
    Science Advisor
    Gold Member

    I don't see any reason to make an entire abstracted "framework" for a task that requires perhaps ten lines of code, is different for every program, and occupies the CPU for perhaps a couple of milliseconds.

    Unless you're scanning the filesystem for many thousands of files, this framework would be of little value. Of course, if you really need to store thousands of files on the filesystem, you probably have a pretty lousy design.

    Sometimes I sense that you have the "abstractaholism" suffered by many programmers -- particularly new ones. I went through it, too. It's characterized by a desire to modularize, generalize, and abstract even the simplest actions into fifteen cooperating classes.

    My best advice would be to spend your effort on the 20% of your code that consumes 80% of your resources -- memory, hard disk space, CPU time, man-hours spent coding, etc. -- and don't worry about the rest. Assuredly, the trivial task of looking for a few files is not worthy of this kind of attention.

    - Warren
     
  6. Nov 12, 2006 #5

    0rthodontist

    Science Advisor

    Optimization is not the aim. In fact, since it would parse regular expressions (or, I'm thinking now, expressions in Backus-Naur form), it would probably be slower by an insignificant amount. For most applications optimization is a very minor concern, which is why we have scripting languages.

    You can never have too much abstraction and re-usability available to you.
     
  7. Nov 12, 2006 #6

    chroot

    Staff Emeritus
    Science Advisor
    Gold Member

    Spoken like a true sufferer of abstractaholism. When you spend days writing and debugging the most general possible solution to looking for files in the filesystem -- when four calls to stat() would suffice for 90% of your programs -- you absolutely are using too much abstraction. Spend your energy on another part of your program that matters.

    There's no way at all that you could possibly argue that your grammar-parsing "framework" is anywhere close to the speed of the equivalent calls to stat(). In fact, the time (and memory) required just to load the regular expression library would be orders of magnitude greater than that of just calling stat().

    I remember those days... when I almost felt dirty for writing a line of code that actually did something.

    - Warren
     
  8. Nov 12, 2006 #7

    0rthodontist

    Science Advisor

    We are talking milliseconds here. Even a two-orders-of-magnitude difference wouldn't matter. Do you think a user cares that your program takes an extra .01 seconds to load a library?

    A good programmer builds up his own private library of programming tools. This looks like a good candidate, one that may be re-used many times.
     
  9. Nov 12, 2006 #8

    chroot

    Staff Emeritus
    Science Advisor
    Gold Member

    Then knock yourself out, man. You don't need my (or anyone else's) approval. I just think it's a pretty boring thing to spend so much time on. It doesn't appear (to me) to be the sort of task that's big enough, hard enough, or meaningful enough to be made into a reusable "framework."

    - Warren
     
  10. Nov 14, 2006 #9
    Python exposes lots of POSIX calls, mostly in the library module "os". It also has tools that work at a somewhat higher level than stat.

    Look for a Python module/function/class called "walk": it will walk a directory tree for you, performing a user-supplied function at each node. If you do it right, you should be able to carry some state with you. Even if you're not using Python, it should give you some ideas, for better or worse. Another option is to construct an object tree that mirrors a directory tree and use a Visitor pattern.
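    The callback version described here was `os.path.walk()` in Python 2. It was removed in Python 3, where `os.walk()` is a generator yielding `(dirpath, dirnames, filenames)` for each directory, so any state can just live in local variables. The sketch below is minimal and uses a hypothetical rule: numeric-named directories must contain configure.dat.

```python
import os
import re

def find_incomplete_numeric_dirs(root):
    """Walk the tree under root and return numeric-named directories that
    lack a configure.dat file (state is carried in the `missing` list)."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        name = os.path.basename(dirpath)
        if re.fullmatch(r"[0-9]+", name) and "configure.dat" not in filenames:
            missing.append(os.path.relpath(dirpath, root))
        # Modifying dirnames in place controls where os.walk descends next;
        # here hidden directories are skipped.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
    return sorted(missing)
```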

    As Warren points out, it's important to match the expense and complexity of developing something like this with the scale of the overall project and this component's role in that project.
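    The object-tree-plus-Visitor alternative could be sketched like this. The class and method names (`FileNode`, `DirNode`, `accept`, `visit_file`, `visit_dir`, `CountingVisitor`) are illustrative and don't come from any library. The tree is built once with os.listdir(), and each check is written as a separate visitor.

```python
import os

class FileNode:
    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        return visitor.visit_file(self)

class DirNode:
    def __init__(self, name, children):
        self.name = name
        self.children = children

    def accept(self, visitor):
        return visitor.visit_dir(self)

def build_tree(path):
    """Mirror the directory tree rooted at path as FileNode/DirNode objects."""
    children = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        children.append(build_tree(full) if os.path.isdir(full) else FileNode(name))
    return DirNode(os.path.basename(path), children)

class CountingVisitor:
    """Example visitor: counts files and directories, including the root."""
    def __init__(self):
        self.files = 0
        self.dirs = 0

    def visit_file(self, node):
        self.files += 1

    def visit_dir(self, node):
        self.dirs += 1
        for child in node.children:  # the visitor chooses how to recurse
            child.accept(self)
```

    Because recursion happens in `visit_dir`, a validating visitor could stop descending as soon as it finds a problem.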
     
  11. Nov 14, 2006 #10

    jim mcnamara

    Science Advisor
    Gold Member

    I agree with Warren. Checking file existence is a done deal, and creating some abstracted framework to do it is analogous to "circle-squaring" in the computer programming tools domain.

    If it were a reasonable problem, it would also be reasonable to assume tools already exist, even if they were poor. UNIX abounds with tools, especially since the advent of open source. And no tool I can locate does anything like this; I had a rummage at SourceForge and found nothing matching your requirements.

    On UNIX, you can integrate regex.h calls and ftw to create a custom file search.
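    For comparison, here is the same combination in Python. In this sketch os.walk() stands in for ftw() and the re module stands in for regcomp()/regexec(). The function name `find_files` is hypothetical. It uses re.fullmatch(), so the whole file name must match. POSIX regexec() finds a match anywhere in the string unless the pattern is anchored with ^ and $.

```python
import os
import re

def find_files(root, pattern):
    """Return paths (relative to root) of files whose names fully match pattern.

    Python analogue of combining ftw() with regcomp()/regexec():
    os.walk does the tree walk, re does the matching.
    """
    regex = re.compile(pattern)  # compile once, like regcomp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if regex.fullmatch(name):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)
```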
     
  12. Nov 14, 2006 #11

    0rthodontist

    Science Advisor

    You're right--because it doesn't exist, it's not useful.
     
  13. Nov 14, 2006 #12

    chroot

    Staff Emeritus
    Science Advisor
    Gold Member

    Sarcasm aside, if you think it's worth writing, then write it. It doesn't really matter if I, or anyone else, thinks it's useful.

    - Warren
     
  14. Nov 14, 2006 #13

    jim mcnamara

    Science Advisor
    Gold Member

    The point I didn't make clearly:

    This problem has been around since I started programming like 40+ years ago. It is not new. Since dealing with it on a "onesies" basis has been more than good enough for people who wrote lots of other tools, then maybe it's fair to conclude it ain't all that hard to do it the "hard" way.

    Alternatively stated, abstracting it isn't worth the effort. Or abstracting it and applying the abstraction takes much more time than doing it the old way. Ockham's razor, so to speak.

    However if you want to write it, do it. Put your effort where your feelings are. I'd like to see a very simple abstraction - one or two lines of C/C++ to call a library call or a class.

    My personal belief is that this isn't generally possible because of the required background data. How are you gonna specify all of the requirements to any call without gobs of arguments? Or some kind of data file?

    YMMV.
     
  15. Nov 14, 2006 #14

    0rthodontist

    Science Advisor

    Well, you would use a data file containing a BNF expression representing the directory structure. I can basically do it, except that I'm having trouble writing a BNF expression that the simpleparse module can parse.

    Basically I do
    Code (Text):

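    import os

    def tdir(dirname):
        """Serialize a directory tree: each file becomes (name) and each
        directory becomes [name ...contents...]."""
        parts = []
        # Note: os.listdir() returns names in arbitrary order
        for name in os.listdir(dirname):
            path = os.path.join(dirname, name)
            if os.path.isdir(path):
                parts.append('[' + name + tdir(path) + ']')  # recurse into subdirectory
            else:
                parts.append('(' + name + ')')
        return ''.join(parts)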
     
    which turns the directory structure into a single string, then make a BNF expression that says what directory structure you want. However I am having some difficulty getting this to work... the simpleparse module matches the string greedily and sometimes won't parse a correct string. It doesn't seem to do full BNF parsing. Specifically I need to parse matched parentheses and brackets to arbitrary depth.
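    The trouble is matched parentheses and brackets to arbitrary depth. One alternative to a parser-generator module is a small hand-written recursive-descent parser for the format tdir produces. This swaps in a different technique; it is not simpleparse. The sketch assumes that names never contain (, ), [ or ]. `parse_tree` and `_parse_items` are hypothetical names. Recursion handles nesting to any depth, and the result is a nested list. Checking that list against the expected structure is easier than matching a grammar against the flat string.

```python
import re

# Grammar assumed for the string tdir() builds:
#   tree ::= item*
#   item ::= "(" name ")"  |  "[" name tree "]"
# where a name is one or more characters other than ( ) [ ]
NAME = re.compile(r"[^()\[\]]+")

def parse_tree(s):
    """Parse a tdir()-style string into a list of entries.

    A file "(name)" becomes the string name; a directory "[name...]" becomes
    the tuple (name, children).  Raises ValueError on malformed input.
    """
    entries, pos = _parse_items(s, 0)
    if pos != len(s):
        raise ValueError("unexpected %r at position %d" % (s[pos], pos))
    return entries

def _parse_items(s, pos):
    """Parse items starting at pos; return (entries, position after them)."""
    entries = []
    while pos < len(s) and s[pos] in "([":
        opener = s[pos]
        m = NAME.match(s, pos + 1)
        if not m:
            raise ValueError("expected a name at position %d" % (pos + 1))
        name, pos = m.group(), m.end()
        if opener == "(":
            entry, closer = name, ")"
        else:
            children, pos = _parse_items(s, pos)  # recursion handles any depth
            entry, closer = (name, children), "]"
        if pos >= len(s) or s[pos] != closer:
            raise ValueError("expected %r at position %d" % (closer, pos))
        entries.append(entry)
        pos += 1
    return entries, pos
```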
     
    Last edited: Nov 14, 2006
  16. Nov 15, 2006 #15
    If you need some exotic filesystem support you can experiment with this
    http://www.eclipsezone.com/eclipse/forums/t83786.rhtml

    Otherwise, a simple try/catch block does the job on most of today's OSes that support Java.

    Code (Text):

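    import java.io.*;
    import java.util.Properties;

    public class ConfigCheck {
        public static void main(String[] args) {
            // Look for Config.properties in the current working directory
            File configFile = new File(System.getProperty("user.dir"), "Config.properties");
            Properties prop = new Properties();
            // try-with-resources closes the stream even if load() throws
            try (InputStream in = new FileInputStream(configFile)) {
                prop.load(in);
            } catch (FileNotFoundException e) {
                System.out.println("Oops, the config file cannot be found...");
            } catch (IOException e) {
                System.out.println("Error reading the config file: " + e.getMessage());
            }
        }
    }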

     
     
  17. Nov 15, 2006 #16

    0rthodontist

    Science Advisor

    No, I am having trouble with the simpleparse module, which parses the directory string according to the BNF grammar defining the directory structure.
     
  18. Nov 15, 2006 #17

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    tdir looks odd. Consider the following tree:

    dir0/file0
    dir0/dir1/file1
    dir0/dir1/file2

    tdir("dir0") produces:

    (file0)[dir1(file1)(file2)]

    Is that really what you wanted? I suppose it could be. Also, if you're matching too much, make sure you aren't allowing parentheses to be matched as text. E.g., I would use the regex

    [^\(\)\[\]]+

    to match individual directory and file names.
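    Here is a quick sketch of how that character class could split a tdir()-style string into delimiter and name tokens with re.findall(). The `tokenize` helper is hypothetical. Names exclude the four delimiters, so a name can never swallow a parenthesis.

```python
import re

# Hurkyl's pattern: one or more characters that are not ( ) [ ]
NAME = r"[^\(\)\[\]]+"

def tokenize(s):
    """Split a tdir()-style string into delimiter tokens and name tokens."""
    # Alternation: a single delimiter character, or a maximal run of name chars
    return re.findall(r"[\(\)\[\]]|" + NAME, s)
```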
     
  19. Nov 15, 2006 #18

    0rthodontist

    Science Advisor

    Yes, that's what I intended, so it is recursive. A file is a single name in round parentheses, and a directory is grouped in square brackets along with all of its files and subdirectories. Regexes can't match parentheses.
     
  20. Nov 15, 2006 #19

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    I mean the regex

    .+

    will happily match the string

    file1)(file2

    which is a possible source of problems. (Of course, I don't know exactly what problem you're having. :tongue:)
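    The difference is easy to see in Python. Greedy `.+` runs to the last `)` in the string. Either a lazy `.+?` or a character class that excludes the delimiters stops at the first one:

```python
import re

s = "(file1)(file2)"

# Greedy .+ grabs as much as it can, so the match ends at the LAST ")"
greedy = re.match(r"\((.+)\)", s).group(1)

# A lazy .+? stops at the first ")" that lets the pattern succeed
lazy = re.match(r"\((.+?)\)", s).group(1)

# Excluding the delimiters from the class also stops at the first ")"
strict = re.match(r"\(([^()\[\]]+)\)", s).group(1)

print(greedy)  # file1)(file2
print(lazy)    # file1
print(strict)  # file1
```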


    If you need to parse context-free grammars, have you considered using yacc or bison?
     
  21. Nov 16, 2006 #20

    chroot

    Staff Emeritus
    Science Advisor
    Gold Member

    Of course regexes can match parentheses. You just need to escape them, as you must with all metacharacters: ., *, +, etc.
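    For example, in Python the escaped `\(` and `\)` match literal parentheses, and re.escape() escapes a literal string for you. Escaping is a separate question from checking that brackets balance to arbitrary depth, which the directory string needs. Python's standard re module has no recursive patterns for that.

```python
import re

s = "(file0)[dir1(file1)(file2)]"

# Escaped \( and \) match literal parentheses; the group captures each file name
files = re.findall(r"\(([^()\[\]]*)\)", s)
print(files)  # ['file0', 'file1', 'file2']

# re.escape() escapes regex metacharacters in a literal string
pattern = re.escape("(files.dat)")
print(bool(re.fullmatch(pattern, "(files.dat)")))  # True
print(bool(re.fullmatch(pattern, "(filesXdat)")))  # False: "." is now literal
```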

    - Warren
     