What is the fastest way to check for file differences in a Makefile generator?

  • Thread starter: rohanprabhu
  • Tags: File

Discussion Overview

The discussion revolves around methods for efficiently checking file differences in a Makefile generator, focusing on determining which files have changed since the last run. Participants explore various approaches, including using file modification dates, file sizes, and checksums, with an emphasis on performance due to potentially large file lists.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant suggests using a filesize check followed by a checksum to determine file changes, arguing that filesize checks are faster.
  • Another participant proposes checking the last modified date of each file using the Win32 API function GetFileTime() as a straightforward method.
  • There is a discussion about the potential for checksum collisions and whether relying solely on checksums is sufficient.
  • A participant notes that not all file systems accurately record creation and last access times, which could affect the reliability of using timestamps for file comparison.
  • Some participants express uncertainty about the effectiveness of using only filesize and checksums versus incorporating timestamps.

Areas of Agreement / Disagreement

Participants present multiple competing views on the best approach to check for file changes, with no consensus reached on a single method. There is ongoing debate about the reliability and efficiency of different strategies.

Contextual Notes

Limitations include the variability in how different file systems handle timestamps, which may affect the choice of method for determining file changes.

rohanprabhu
I am making a Makefile generator, and for that I need something like this:

There is a list of files. When the generator is run, I need to know which of those files have changed since the last time the generator was run, so that unchanged files can be left out of the compile list. I was thinking of a filesize check first and, if the sizes match, a checksum. Would that do?

Since the program I'll be making is Windows-native, I can use anything from the Win32 API. Is there something that would let me check a file's last modified date?

Since the list of files could be HUGE, I need to know the fastest way to do this. Any help is appreciated.

thanks,
rohan
 
1- the generator reads the last time it was run (from a "special file")
2- for each file in the list, the generator checks the last modified date
3- at the end of the whole process, the generator should "touch" the "special file" in order to store the new timestamp
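
The three steps above can be sketched roughly as follows. This is only an illustration in Python rather than Win32 code; the stamp file name `.last_run` and both function names are made up for the example.

```python
import os

STAMP_FILE = ".last_run"  # hypothetical name for the "special file"


def changed_since_last_run(paths, stamp_file=STAMP_FILE):
    """Steps 1 and 2: return the paths modified after the stamp file was touched."""
    try:
        last_run = os.stat(stamp_file).st_mtime_ns
    except FileNotFoundError:
        # First run: there is no stamp yet, so treat every file as changed.
        return list(paths)
    return [p for p in paths if os.stat(p).st_mtime_ns > last_run]


def touch(stamp_file=STAMP_FILE):
    """Step 3: create the stamp file if needed and set its mtime to now."""
    with open(stamp_file, "a"):
        pass
    os.utime(stamp_file, None)
```

One caveat with touching the stamp at the very end: a file edited while the generator is running can end up older than the stamp and be missed on the next run. Recording the time at the start of the run is one way around that.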
 
Rogerio said:
1- the generator reads the last time it was run (from a "special file")
2- for each file in the list, the generator checks the last modified date
3- at the end of the whole process, the generator should "touch" the "special file" in order to store the new timestamp

Well, I pretty much had this method figured out, I mean maintaining an index and using it to store the last modified date or the MD5 checksum.

But what I needed to know was how to get the last modified date for a file. Also, if I don't use the date but purely filesize and checksum, will that be fine? [I mean collision-wise as well as speed-wise]
 
In the Win32 API I think the function you are looking for is GetFileTime()

You can go on checksum alone if you want to, but it means more work. Also, there is no need to check the filesize if you have a checksum.

The easiest approach is checking the timestamps, though. There is no need to save the timestamps in an index file; just compare the "modified" time of the source files against the "created" time of the binary.
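
A minimal sketch of that source-versus-binary comparison, in Python for illustration. Note one swap: it compares against the binary's modified time rather than its creation time, which is the comparison make itself performs. The function name `needs_rebuild` is hypothetical.

```python
import os


def needs_rebuild(source, target):
    """Return True if target is missing or older than source (by mtime)."""
    try:
        target_mtime = os.stat(target).st_mtime_ns
    except FileNotFoundError:
        # No binary yet, so the source has to be compiled.
        return True
    return os.stat(source).st_mtime_ns > target_mtime
```

This needs no index file at all, but it only works when each source maps to a known output file.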

k
 
kenewbie said:
In the Win32 API I think the function you are looking for is GetFileTime()

Thanks for that. I'll look on MSDN for it.

You can go on checksum alone if you want to, but it means more work. Also, there is no need to check the filesize if you have a checksum.

Actually, the plan was this: since a filesize check is much faster than a checksum, I'll do a filesize check first. If the filesizes don't match, then I definitely need to add the file to the modified list. But if they do match, I'll do a checksum.

Also, using a filesize check with checksums can reduce the checksum collision rate a bit...
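
The size-then-checksum plan could look something like this. It is a sketch in Python, not Win32 code, and it assumes the index stores a `(size, digest)` pair per file; that layout and the function names are invented for the example. MD5 is used only because it was mentioned earlier in the thread.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """MD5 of the file contents, read in chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(path):
    """Build the (size, digest) entry to store in the index for the next run."""
    return (os.path.getsize(path), file_digest(path))


def is_modified(path, old_entry):
    """old_entry is the (size, digest) pair from the previous run, or None."""
    if old_entry is None:
        return True  # file was not in the index before
    old_size, old_digest = old_entry
    if os.path.getsize(path) != old_size:
        return True  # cheap check: sizes differ, no need to read the contents
    return file_digest(path) != old_digest  # same size, so compare contents
```

A changed file still has to be hashed once to refresh its index entry, so the size shortcut mostly postpones that read rather than avoiding it.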

EDIT: Got it here: http://msdn2.microsoft.com/en-us/library/ms724320(VS.85).aspx

But there is a problem associated with it:

Not all file systems can record creation and last access times and not all file systems record them in the same manner. For example, on FAT, create time has a resolution of 10 milliseconds, write time has a resolution of 2 seconds, and access time has a resolution of 1 day (really, the access date). Therefore, the GetFileTime function may not return the same file time information set using SetFileTime. NTFS delays updates to the last access time for a file by up to one hour after the last access.
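
If timestamps were used anyway, one way to live with the coarse write-time resolution quoted above is to err on the side of rebuilding. The sketch below (Python, with hypothetical names) treats a file as changed when its modified time falls within one resolution step of the last run; the 2-second default comes from the FAT figure in the quote.

```python
import os

FAT_WRITE_RESOLUTION_NS = 2_000_000_000  # 2 seconds, per the quoted MSDN note


def possibly_changed(path, last_run_ns, resolution_ns=FAT_WRITE_RESOLUTION_NS):
    """Conservative check: count the file as changed if its mtime is later than
    the last run minus one resolution step."""
    return os.stat(path).st_mtime_ns > last_run_ns - resolution_ns
```

This can cause an occasional unnecessary recompile, but it should not skip a file that was written just before the previous run finished.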

Hence I think I'll go with the filesize/checksum approach.
 
rohanprabhu said:
Actually, the plan was this: since a filesize check is much faster than a checksum, I'll do a filesize check first. If the filesizes don't match, then I definitely need to add the file to the modified list. But if they do match, I'll do a checksum.

Ok, that makes sense.

k
 
