Bash script - Monitor File Writes?

In summary: poll with a loop along the lines of "while [ last_flow_file -nt next_to_last_flow_file ]; do sleep N; done", where the sleep is roughly the amount of time it takes to write a file, plus a little bit just in case. That way, the next time the check fires (say, 5 minutes later), the file will either be done writing or will be done soon, so instead of sleeping for a fixed amount of time, the script effectively waits for the write to complete. Sorry to keep changing my mind, but I think this is the best approach.
  • #1
minger
Science Advisor
1,496
2
I have a small problem that I would like to solve. I have an idea, but it's kind of half-assed and I'm not sure if there's a better way.

The problem: I have a CFD solution that writes out a solution file every 5 minutes or so. I am currently writing out 32 total solution files, so when 32 files have been written, the next one overwrites the first flow file.

What I would like to do is create a "convergence monitor". So, I want to take each solution file and compare it to the previous file. I want to then write the time and change to file.

My half-assed approach is to just create a program that opens all 32 files and calculates the convergence for each file. I would then create a crontab entry that runs the executable every 5 minutes or so. This would seem to work, but it doesn't seem like an elegant way to solve it.
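For the record, the crontab entry for that approach would look something like the line below (the path to the convergence executable is hypothetical):

Code:
# run the convergence check every 5 minutes
*/5 * * * * /home/minger/bin/convergence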

My question is if there is a way to monitor the folder and simply run the program say when the last flow file is overwritten?
 
  • #2
minger said:
My question is if there is a way to monitor the folder and simply run the program say when the last flow file is overwritten?

Sure-- in bash? I dunno, something like this:

Code:
for (( ; ; ))                # loop forever
do
  if [ -e last_flow_file ]   # -e: true if the file exists
  then
    my_program
  fi
  sleep 5
done

But, more appropriately, you'd write a forking monitor from your program, or a program wrapper. That way, when you're done, your monitor program won't still be running for weeks and weeks. The above obviously would just sit there indefinitely, until you (or the system) killed it.
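A rough sketch of that wrapper idea (my_solver is a hypothetical name for whatever command launches the CFD run): start the solver in the background, watch for the flow file while it is alive, and exit when it finishes.

Code:
my_solver &                             # launch the solver in the background
solver_pid=$!
while kill -0 $solver_pid 2>/dev/null   # true while the solver is still running
do
  [ -e last_flow_file ] && my_program
  sleep 5
done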

DaveE
 
  • #3
I agree that the proper way would be to put a wrapper or some subroutine in the program itself...however, you have to understand that the CFD code we use/develop runs to hundreds of modules. Not only would I be hard pressed to find a good spot to put it, but I'd be scared to make a change that could just screw things up.

Not being really experienced at all with shell scripting, may I ask what the '-e' does? I can't seem to find it Googling.

Thanks a bunch.

edit:

OK, I think I found it; the -e in the if statement determines if the file exists. This is a problem though. For example, I am allowing 32 flow files to be written. So, 5 minutes into the run, I already have flow0001.fast through flow0032.fast written. They then get sequentially overwritten as the solution proceeds.

So, I suppose I rather want something to determine if a file has been recently overwritten. My program right now handles all 32 files, so I only need to determine if, e.g., flow0001.fast is "new". This could be determined by comparing its timestamp to flow0002.fast.

Do you have a better solution? Thanks a lot.
 
Last edited:
  • #4
minger said:
Do you have a better solution? Thanks a lot.
1) use the python os module to monitor/spawn jobs?
2) a new folder for every run?

basically:

Code:
import os

savedir = "newset%d"
runs = 0
while runs < totalruns:       # totalruns set elsewhere
    os.chdir(workingdir)      # back to the top-level working directory
    os.mkdir(savedir % runs)  # newset0, newset1, ...
    os.chdir(savedir % runs)
    os.system(cmd)            # cmd: the command line that runs the solver
    runs += 1
 
  • #5
story645 said:
1) use the python os module to monitor/spawn jobs?
2) a new folder for every run?

basically:

Code:
import os

savedir = "newset%d"
runs = 0
while runs < totalruns:       # totalruns set elsewhere
    os.chdir(workingdir)      # back to the top-level working directory
    os.mkdir(savedir % runs)  # newset0, newset1, ...
    os.chdir(savedir % runs)
    os.system(cmd)            # cmd: the command line that runs the solver
    runs += 1

To be honest, I know zero python. However, when you say a new folder for every run, I'm assuming that you mean a new folder for each file write. The run I have going now will take at least 15000 time steps, and I am writing out files every 5 time steps. That means 3000 files will be written (though only 32 are saved at any time).

I cannot decrease the frequency of the file writing because I need the 32 files to cover 2 complete cycles of a cyclic flow field (and yes, it takes that bloody long to converge).
 
  • #6
minger said:
That means 3000 files will be written (though only 32 are saved at any time).
I've had runs where I needed to write out something like 1000 files, so you have my sympathy here. I'm just thinking of the safest way for you to avoid a race condition, and that's by keeping your writes in separate locations. If you have the disk space, I don't see why that's an issue. You then write another script or two to parse/process all those folders; assuming you keep a consistent naming convention, it's not all that difficult.

Actually, I just thought up a hack that may work. Commit the 32 files to some repo (svn, mercurial, RCS), and just recommit every run. Then you just look through the versions/run some diffs to see the changes.
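A minimal sketch of that hack with Mercurial (assuming the repo was created once with "hg init" in the run directory; diffs are only human-readable if the files are text):

Code:
# after each write cycle, snapshot the current set of flow files
hg add flow*.fast          # no-op for files already tracked
hg commit -m "snapshot after latest write cycle"
# later, compare one file between two snapshots
hg diff -r 0 -r 1 flow0001.fast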

To be honest, I know zero python
*shrugs* As a language it's dead simple, but the bash equivalent is something like

Code:
run=0
while [ $run -lt $totalruns ]   # totalruns set elsewhere
do
  mkdir "$workingdir/newset$run" && cd "$workingdir/newset$run"
  my_program
  run=$((run + 1))
done
 
Last edited:
  • #7
minger said:
OK, I think I found it; the -e in the if statement determines if the file exists. This is a problem though. For example, I am allowing 32 flow files to be written. So, 5 minutes into the run, I already have flow0001.fast through flow0032.fast written. They then get sequentially overwritten as the solution proceeds.

I haven't used them before, but this page lists some other file test operators that might be suitable, like these:

-N : file modified since it was last read
f1 -nt f2 : file f1 is newer than f2

So, I would guess the -nt check (which presumably stands for "newer than") is what you want, assuming that the files will appear in order. Something like this:

Code:
for (( ; ; ))
do
  if [ last_flow_file -nt next_to_last_flow_file ]   # true once the last file has been rewritten
  then
    touch next_to_last_flow_file   # make this one newer again, so we fire only once per write
    my_program
  fi
  sleep 5
done

I'm no bash programmer, but something along those lines might be worth a shot...

DaveE
 
  • #8
Dave, it sounds like that would work very nicely! In my case, I can sleep for much longer, but basically, yea, that's almost perfect.

Thanks!
 
  • #9
Actually, one other note-- if the files are large and take a long time to write, it's possible that you start prematurely; that is, you might start running the aggregate program before the last file has really finished writing. Dunno if that's the case or not. If so, you might consider having the solver write to a temporary file name until it's finished, or modifying the bash script to sleep a few seconds prior to running the aggregate program (if the write takes a predictable amount of time).
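One hedged way to guard against a half-written file, assuming GNU stat is available (stat -c %s prints a file's size in bytes; BSD stat spells this differently): wait until the size stops changing before running the aggregate program.

Code:
# wait until last_flow_file has stopped growing
size=$(stat -c %s last_flow_file)
while :
do
  sleep 10
  newsize=$(stat -c %s last_flow_file)
  [ "$newsize" -eq "$size" ] && break   # unchanged for 10 s; assume the write finished
  size=$newsize
done
my_program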

DaveE
 
  • #10
The files are kinda large. Fortunately they are unformatted, so they aren't ridiculous. As for the case where the moons align and the program starts before the file is finished writing, well, that's fine. That just means one data point will be screwed up, and considering the thousands I'll have, that's an acceptable loss.

Thanks again,
 

1. How can I monitor a specific file for writes in a bash script?

To monitor a specific file for writes in a bash script, you can use the inotifywait command from the inotify-tools package. It watches a file or directory for changes, including writes, and the --format option controls what it prints for each event.
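A minimal sketch, assuming inotify-tools is installed and using one of the file names from this thread:

Code:
# block until flow0032.fast is written and closed, then print the file and event
inotifywait -e close_write --format '%w %e' flow0032.fast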

2. Can I monitor multiple files for writes at the same time?

Yes, inotifywait can monitor multiple files for writes simultaneously. You can pass multiple files as arguments to the command, let a shell wildcard expand to all matching files in a directory, or watch the directory itself.
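For example, with a shell glob (the shell expands it to one argument per file before inotifywait runs):

Code:
# watch every .fast file in the current directory, indefinitely
inotifywait -m -e close_write --format '%w %e' *.fast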

3. How often will the bash script check for file writes?

The script is notified as soon as a write happens: inotifywait does not poll on a timer but blocks on inotify events delivered by the kernel, so there is no check interval to configure. If you want to throttle how often you react, add a sleep to the loop that wraps the inotifywait call.

4. What happens if the file I am monitoring is deleted?

If the file you are monitoring is deleted, inotifywait reports a delete_self event and the watch on that file is gone. Your bash script will need to handle this and re-establish the watch if the file is recreated.

5. Can I perform other actions when a write occurs on the monitored file?

Yes, although inotifywait does not execute commands itself. The usual pattern is to run it in monitor mode and pipe its output into a while read loop that performs the action, or to call it once per iteration of a loop in your script. This allows you to perform additional actions, such as logging or sending notifications, when a write occurs.
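A hedged sketch of that pattern (my_program and convergence.log are hypothetical placeholders):

Code:
# run a handler every time any watched flow file finishes being written
inotifywait -m -e close_write --format '%w' flow*.fast |
while read -r file
do
  echo "$(date): $file rewritten" >> convergence.log
  my_program
done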
