
Fastest way to file diff

  1. Apr 8, 2008 #1
    I am making a Makefile generator and for that, I need something like:

    There is a list of files. When the generator is run, I need to know which of those files have changed since the last time the generator was run, so that unchanged files are left out of the compiling list. I was thinking of doing a file-size check first and, if the sizes match, a checksum. Would that do?

    Since the program I'll be making will be Windows-native, I can use anything from the Win32 API. Is there something that would let me check a file's last-modified date?

    Since the list of files could be HUGE, I need to know the fastest way to do this. Any help is appreciated.

  3. Apr 8, 2008 #2
    1- the generator reads the last time it was run (from a "special file")
    2- for each file in the list, the generator checks the last modified date
    3- at the end of the whole process, the generator should "touch" the "special file" in order to store the new timestamp
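
The three steps above can be sketched in Python as follows. This is a minimal sketch, not the poster's code: it reads modification times with `os.stat` (on Windows, `st_mtime` reflects the file's last-write time) instead of calling the Win32 API directly, and the function names `files_changed_since_stamp` / `touch_stamp` and the stamp file path are hypothetical.

```python
import os
from pathlib import Path


def files_changed_since_stamp(files, stamp_path):
    """Steps 1 and 2: return the files modified after the stamp file's timestamp.

    On the first run (no stamp file yet) every file counts as changed.
    """
    stamp = Path(stamp_path)
    if not stamp.exists():
        return list(files)
    last_run = stamp.stat().st_mtime  # when the generator last finished
    return [f for f in files if os.stat(f).st_mtime > last_run]


def touch_stamp(stamp_path):
    """Step 3: set the stamp file's modification time to now (creating it if needed)."""
    Path(stamp_path).touch()
```

Usage would be something like `changed = files_changed_since_stamp(sources, ".mkgen_stamp")`, then generating the Makefile, then `touch_stamp(".mkgen_stamp")`. One design caveat: touching the stamp at the end means a file edited while the generator is still running can be missed on the next run; recording the time at the start and writing that instead avoids it.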
  4. Apr 8, 2008 #3
    Well, I had more or less figured out that method, i.e. maintaining an index and using it to store the last-modified date or the MD5 checksum.

    But what I needed to know was how to get the last-modified date of a file. Also, if I use only the file size and checksum instead of the date, will that be fine, both collision-wise and speed-wise?
  5. Apr 9, 2008 #4
    In the Win32 API, I think the function you are looking for is GetFileTime().

    You can rely on the checksum alone if you want to, but it means more work, since every file has to be read in full to compute it. Also, there is no need to check the file size if you have a checksum.

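    The easiest approach is checking the timestamps, though. And there is no need to save the timestamps in an index file: just compare the "modified" time of each source file against the "modified" time of the binary built from it, which is what make itself does. (The binary's "created" time is less reliable, because overwriting an existing file can keep its original creation time.)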
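
The timestamp comparison suggested above can be sketched like this. It is a minimal sketch under stated assumptions: the `stale_sources` function and the source-to-object mapping (e.g. `a.c` to `a.obj`) are hypothetical, and it compares against the output file's last-modified time, the same rule make uses, rather than its creation time.

```python
import os


def stale_sources(source_to_object):
    """Return the sources whose output file is missing or older than the source.

    source_to_object maps each source path to the object/binary built from it
    (a hypothetical layout; adapt it to however the generator names its outputs).
    """
    stale = []
    for src, obj in source_to_object.items():
        # No output yet, or the source was written after the output: rebuild.
        if not os.path.exists(obj) or os.path.getmtime(src) > os.path.getmtime(obj):
            stale.append(src)
    return stale
```

No index file is needed here: the outputs themselves record when each source was last compiled.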

  6. Apr 9, 2008 #5
    Thanks for that. I'll look it up on MSDN.

    Actually, the plan was this: since a file-size check is much faster than a checksum, I'll do the size check first. If the sizes don't match, the file has definitely changed and goes on the modified list. If they do match, I'll compute a checksum.

    Also, combining the size check with the checksum can reduce the collision rate a bit, since a false match would then also require the two versions of the file to have the same size.
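
The size-then-checksum plan can be sketched as below. This is a minimal sketch, assuming an index stored as JSON that maps each path to `[size, md5]`; the index format and the names `md5_of`, `find_modified`, `load_index` and `save_index` are hypothetical. It uses Python's `hashlib.md5` and reads files in chunks so large files need not fit in memory.

```python
import hashlib
import json
import os


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_modified(files, index):
    """Compare files against index ({path: [size, md5]}); return (modified, new_index)."""
    modified, new_index = [], {}
    for path in files:
        size = os.path.getsize(path)
        old = index.get(path)  # [size, md5] from the previous run, or None
        if old is not None and old[0] == size:
            # Same size: only the checksum can tell whether the content changed.
            digest = md5_of(path)
            if digest != old[1]:
                modified.append(path)
        else:
            # New file or different size: modified, no comparison needed.
            modified.append(path)
            digest = md5_of(path)  # still needed so the next run has a checksum
        new_index[path] = [size, digest]
    return modified, new_index


def load_index(index_path):
    """Load the index from the previous run, or an empty one on the first run."""
    try:
        with open(index_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_index(index_path, index):
    with open(index_path, "w") as f:
        json.dump(index, f)
```

One consequence of keeping an index: a file whose size changed still has to be hashed once so the next run can compare against it, so the size check mainly short-circuits the comparison and adds the collision benefit mentioned above; the full read of every file remains.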

    EDIT: Got it here: http://msdn2.microsoft.com/en-us/library/ms724320(VS.85).aspx

    But there is a problem associated with it:

    hence I think I'll go with the file-size/checksum approach.
  7. Apr 9, 2008 #6
    Ok, that makes sense.
