
Bash script - Monitor File Writes?

  1. May 11, 2010 #1

    minger

    User Avatar
    Science Advisor

    I have a small problem that I would like to solve. I have an idea, but it's kind of half-assed and I'm not sure if there's a better way.

    The problem: I have a CFD solution that writes out a solution file every 5 minutes or so. I am currently writing out 32 total solution files, so when all 32 have been written, the next one overwrites the first flow file.

    What I would like to do is create a "convergence monitor". So, I want to take each solution file and compare it to the previous file. I want to then write the time and change to file.

    My half-assed approach is to just create a program that opens all 32 files and calculates the convergence for each file. I would then create a crontab entry that runs the executable every 5 minutes or so. This would seem to work, but it doesn't feel like an elegant way to solve it.

    My question is if there is a way to monitor the folder and simply run the program say when the last flow file is overwritten?
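
    If the system supports it, the directory can be watched directly instead of polled. This is a sketch, assuming a Linux box with the inotify-tools package installed; "flow0032.fast" and "my_program" are placeholders for the real file and executable names:

```shell
#!/bin/bash
# Block until the last flow file is rewritten, then run the
# convergence program; repeat. Assumes inotify-tools is installed.
# "flow0032.fast" and "my_program" are placeholder names.
while inotifywait -qq -e close_write flow0032.fast; do
    my_program
done
```

    With close_write the loop fires only when the solver closes the file, so a write in progress never triggers the monitor early.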
     
  3. May 11, 2010 #2
    Sure-- in bash? I dunno, something like this:

    Code (Text):
    for(( ; ; ))
    do
      if [ -e last_flow_file ]
      then
        my_program
      fi
      sleep 5
    done
    But, more appropriately, you'd write a forking monitor from your program, or a program wrapper. That way, when you're done, your monitor program won't still be running for weeks and weeks. The above obviously would just sit there indefinitely, until you (or the system) killed it.
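
    The wrapper idea above can be sketched without touching the solver itself: launch the solver from a small script and let the monitor loop stop on its own when the solver exits. "solver" and "my_program" are placeholder names:

```shell
#!/bin/bash
# Hypothetical wrapper: launch the solver in the background and
# recompute convergence while it is still running. The loop ends
# by itself once the solver process exits.
solver &                               # placeholder for the real CFD command
solver_pid=$!
while kill -0 "$solver_pid" 2>/dev/null; do
    my_program                         # placeholder convergence program
    sleep 300                          # match the ~5 minute write interval
done
```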

    DaveE
     
  4. May 11, 2010 #3

    minger

    User Avatar
    Science Advisor

    I agree that the proper way would be to put a wrapper or some subroutine in the program itself....however, you must understand that the CFD code we use/develop consists of hundreds of modules. Not only would I be hard pressed to find a good spot to put it, but I'd be scared to make a change that could screw things up.

    Not being really experienced at all with shell scripting, may I ask what the '-e' does? I can't seem to find it by Googling.

    Thanks a bunch.

    edit:

    OK, I think I found it; the -e in the if statement tests whether the file exists. This is a problem, though. For example, I am allowing 32 flow files to be written, so 5 minutes into the run I already have flow0001.fast through flow0032.fast written. They then get sequentially overwritten as the solution proceeds.

    So, I suppose what I really want is something that determines whether a file has recently been overwritten. My program right now handles all 32 files, so I only need to determine whether, e.g., flow0001.fast is "new". This could be determined by comparing its timestamp to flow0002's.

    Do you have a better solution? Thanks a lot.
     
    Last edited: May 11, 2010
  5. May 11, 2010 #4
    1) Use the python os module to monitor/spawn jobs?
    2) A new folder for every run?

    basically:
    savedir = "newset%d"
    runs = 0
    while runs < totalruns:
        os.chdir(workingdir)          # back to the top-level working dir
        os.mkdir(savedir % runs)      # newset0, newset1, ...
        os.chdir(savedir % runs)
        os.system(cmd)                # cmd = command that runs the program
        runs += 1
     
  6. May 11, 2010 #5

    minger

    User Avatar
    Science Advisor

    To be honest, I know zero python. However, when you say a new folder for every run, I'm assuming that you mean a new folder for each file write. The run I have going now will take at least 15000 time steps, and I am writing out files every 5 time steps. That means 3000 files will be written (only 32 at any time saved though).

    I cannot decrease the frequency the file writing because I need the 32 files to cover 2 complete cycles of a cyclic flow field (and yes, it takes that bloody long to converge).
     
  7. May 11, 2010 #6
    I've had runs where I needed to write out something like 1000 files, so you have my sympathy here. I'm just thinking of the safest way for you to avoid a race condition, and that's by keeping your writes in separate locations. If you have the file space, I don't see why that's an issue. You then write another script or two to parse/process all those folders; assuming you keep a consistent naming convention, it's not all that difficult.

    Actually, I just thought up a hack that may work. Commit the 32 files to some repo (svn, mercurial, RCS), and just recommit every run. Then you just look through the versions/run some diffs to see the changes.
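
    Taking git as one concrete choice (the svn/mercurial/RCS options above would work the same way), the snapshot hack might look like this; RUNDIR and the *.fast glob are assumptions about the real layout:

```shell
#!/bin/bash
# Sketch of the version-control hack with git: snapshot the flow
# files after every pass, then diff revisions later. RUNDIR and the
# *.fast glob are assumptions about the real layout.
RUNDIR="${RUNDIR:-.}"                # point this at the solver's output dir
cd "$RUNDIR" || exit 1
git init -q .                        # harmless if the repo already exists
git add -- *.fast
git commit -q -m "snapshot $(date +%s)" || true   # ignore "nothing to commit"
# Later, inspect how one file changed between snapshots:
#   git log --oneline -- flow0001.fast
#   git diff HEAD~1 HEAD -- flow0001.fast
```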

    *shrugs* As a language it's dead simple, but the bash equivalent is something like
    for (( i=0; ; i++ ))
    do
      cd "$workingdir" && mkdir "newset$i" && cd "newset$i"
      program
    done
     
    Last edited: May 12, 2010
  8. May 12, 2010 #7
    I haven't used them before, but this page lists some other file test operators that might be suitable, like these:

    -N
    file modified since it was last read

    f1 -nt f2
    file f1 is newer than f2

    So, I would guess the -nt check (for "newer than") is probably what you want, assuming that the files will appear in order. Something like this:

    Code (Text):
    for(( ; ; ))
    do
      if [ last_flow_file -nt next_to_last_flow_file ]
      then
        touch next_to_last_flow_file
        my_program
      fi
      sleep 5
    done
    I'm no bash programmer, but something along those lines might be worth a shot...

    DaveE
     
  9. May 12, 2010 #8

    minger

    User Avatar
    Science Advisor

    Dave, it sounds like that would work very nicely! In my case, I can sleep for much longer, but basically, yea, that's almost perfect.

    Thanks!
     
  10. May 12, 2010 #9
    Actually, one other note-- if the files are large and take a long time to write, it's possible that you start prematurely; essentially, that you might start running the aggregate program before the last file has really finished writing. Dunno if that's the case or not. If so, you might consider writing to a temporary file name until the write is finished, or modifying the bash script to sleep a few seconds prior to running the aggregate program (if the write time is predictable).
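
    One way to sketch that guard in bash, without renaming files, is to wait until the file's size stops changing before processing it. This is only a rough "write finished" test: a solver that pauses mid-write longer than the polling interval would still fool it. Names and the default interval are assumptions:

```shell
#!/bin/bash
# wait_until_stable FILE [INTERVAL]: return once FILE's byte count is
# unchanged across one polling interval, as a rough "write finished"
# test. The function name and default 2 s interval are assumptions.
wait_until_stable() {
    local file=$1 interval=${2:-2} prev=-1 cur
    while :; do
        cur=$(wc -c < "$file" 2>/dev/null || echo -1)
        if [ "$cur" -ge 0 ] && [ "$cur" -eq "$prev" ]; then
            return 0            # size stable for one interval
        fi
        prev=$cur
        sleep "$interval"
    done
}
# Example: wait_until_stable flow0032.fast 5 && my_program
```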

    DaveE
     
  11. May 12, 2010 #10

    minger

    User Avatar
    Science Advisor

    The files are kinda large. Fortunately they are unformatted, so they aren't ridiculous. As far as when the moons align and the program starts before the file is finished writing, well, that's fine. That just means that one data point will be screwed up. Considering the thousands I'll have, that's an acceptable loss.

    Thanks again,
     