What is the fastest way to check for file differences in a Makefile generator?

  • Thread starter: rohanprabhu
  • Tags: File
AI Thread Summary
The discussion revolves around creating a Makefile generator that efficiently identifies changed files since its last run. The proposed method involves checking the last modified date of files, which can be accessed using the Win32 API function GetFileTime. The generator will read the last run timestamp from a special file, compare it against the modified dates of the files in the list, and update the timestamp after processing. While some participants suggest relying solely on checksums for file comparison, it is noted that checking file sizes first can enhance speed and reduce the likelihood of checksum collisions. The consensus leans towards a dual approach: performing a quick file size check initially, followed by a checksum verification if sizes match. Concerns about the reliability of file timestamps across different file systems, particularly with FAT and NTFS, are acknowledged, leading to a preference for the size/checksum method for consistency and efficiency.
rohanprabhu
I am making a Makefile generator and for that, I need something like:

There is a list of files. When the generator is run, I need to know which of those files have changed since the last time the generator was run, so that the unchanged ones are left out of the compile list. I was thinking of a filesize check first and, if the sizes match, a checksum. Would that do?

The program I'll be making will be Windows-native, so I can use anything from the Win32 API. Is there something that would let me check a file's last-modified date?

The list of files could be HUGE, so I need to know the fastest way to do this. Any help is appreciated.

thanks,
rohan
 
1- the generator reads the last time it was run (from a "special file")
2- for each file in the list, the generator checks the last modified date
3- at the end of the whole process, the generator should "touch" the "special file" in order to store the new timestamp
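
The three steps above can be sketched roughly as follows. This is only an illustration in Python rather than native Win32 code; the function name `changed_since_last_run` and the stamp-file path are hypothetical, and the stamp file's own modification time stands in for "the last time it was run".

```python
import os
from pathlib import Path


def changed_since_last_run(files, stamp_path):
    """Return the files modified after the stamp file was last touched."""
    stamp = Path(stamp_path)
    # Step 1: read the last run time. With no stamp file yet (first run),
    # every file counts as changed.
    last_run = stamp.stat().st_mtime if stamp.exists() else float("-inf")
    # Step 2: compare each file's last-modified time against it.
    changed = [f for f in files if os.path.getmtime(f) > last_run]
    # Step 3: "touch" the stamp file so its mtime records this run.
    stamp.touch()
    return changed
```

One caveat with this design: a file saved while the generator is still running could be missed on the next run, because the stamp is written at the end.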
 
Rogerio said:
1- the generator reads the last time it was run (from a "special file")
2- for each file in the list, the generator checks the last modified date
3- at the end of the whole process, the generator should "touch" the "special file" in order to store the new timestamp

Well, I had pretty much figured this method out, i.e. maintaining an index and using it to store the last-modified date or the MD5 checksum.

What I needed to know was how to get the last-modified date for a file. Also, if I don't use the date but purely filesize and checksum, will that be fine? [I mean collision-wise as well as speed-wise]
 
In the Win32 API, I think the function you are looking for is GetFileTime().

You can go on checksum alone if you want to, but it means more work. Also, there is no need to check the filesize if you have a checksum.

The easiest approach is checking the timestamps though. And there is no need to save the time stamps in an index-file, just compare the "modified" time of the source files against the "created" time of the binary.
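
A minimal sketch of that source-versus-binary comparison, in Python for brevity. The helper name `needs_rebuild` is hypothetical. Note one deliberate difference from the post: the sketch compares against the binary's modification time rather than its creation time, on the assumption that a binary overwritten in place can keep its original creation time on Windows.

```python
import os


def needs_rebuild(source, target):
    """True if the target is missing or older than the source."""
    if not os.path.exists(target):
        return True
    # Rebuild when the source was written more recently than the output.
    return os.path.getmtime(source) > os.path.getmtime(target)
```

With this approach no separate index file is needed, since the build output itself acts as the record of the last build.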

k
 
kenewbie said:
In the win32 API I think the function you are looking for is GetFileTime()

Thanks for that. I'll look it up on MSDN.

kenewbie said:
You can go on checksum alone if you want to, but it means more work. Also, there is no need to check the filesize if you have a checksum.

Actually, the plan was this: since a filesize check is much faster than a checksum, I'll first do a filesize check. If the filesizes don't match, then I definitely need to add the file to the modified list. But if they do match, I'll do a checksum.

Also, using a filesize check together with checksums can reduce the checksum collision rate a bit...
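
The size-then-checksum plan could look something like the sketch below. It is in Python for brevity; `file_digest`, `find_changed` and the in-memory `index` dict (path mapped to a size/digest pair from the previous run) are hypothetical names, and how the index is saved between runs is left out.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_changed(files, index):
    """Return changed files; index maps path -> (size, digest), updated in place."""
    changed = []
    for path in files:
        size = os.path.getsize(path)
        old = index.get(path)
        digest = file_digest(path)
        if old is not None and old[0] == size and old[1] == digest:
            continue  # same size and same checksum: treat as unchanged
        # New file, different size, or same size but different checksum.
        index[path] = (size, digest)
        changed.append(path)
    return changed
```

Worth noting: in this sketch every file is still read in full, because the index needs a fresh digest for changed files and unchanged files need the checksum comparison anyway. The size check only settles the decision early; it does not by itself avoid the hashing cost.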

EDIT: Got it here: http://msdn2.microsoft.com/en-us/library/ms724320(VS.85).aspx

But there is a problem with it, according to that page:

Not all file systems can record creation and last access times and not all file systems record them in the same manner. For example, on FAT, create time has a resolution of 10 milliseconds, write time has a resolution of 2 seconds, and access time has a resolution of 1 day (really, the access date). Therefore, the GetFileTime function may not return the same file time information set using SetFileTime. NTFS delays updates to the last access time for a file by up to one hour after the last access.

Hence I think I'll go with the filesize/checksum approach.
 
rohanprabhu said:
Actually, the plan was this: since a filesize check is much faster than a checksum, I'll first do a filesize check. If the filesizes don't match, then I definitely need to add the file to the modified list. But if they do match, I'll do a checksum.

Ok, that makes sense.

k
 