What is the fastest way to check for file differences in a Makefile generator?

  • Thread starter rohanprabhu
  • Tags
    File
In summary, the thread is about a Makefile generator that needs to know which files have changed since it was last run. One reply proposes comparing each file's last-modified date against a "special file" that the generator touches at the end of each run. Another reply points to the Win32 API function GetFileTime() for reading a file's last-modified date and suggests comparing the sources' modification times directly against the binary. The thread starter prefers a filesize check first, because it is much cheaper than a checksum, followed by a checksum only when the sizes match. After reading in the MSDN documentation that file systems record timestamps with different resolutions, the thread starter decides to go with the filesize-plus-checksum approach.
  • #1
rohanprabhu
I am making a Makefile generator, and for that I need something like this:

There is a list of files. When the generator is run, I need to know which of those files have changed since the last time the generator was run, so that unchanged files are left out of the compile list. I was thinking of a filesize check first and, if the sizes match, a checksum. Would that do?

Since the program I'll be making will be Windows-native, I can use anything from the Win32 API. Is there something there that would let me check a file's last-modified date?

Since the list of files could be HUGE, I need to know the fastest way to do this. Any help is appreciated.

thanks,
rohan
 
  • #2
1- the generator reads the last time it was run (from a "special file")
2- for each file in the list, the generator checks the last modified date
3- at the end of the whole process, the generator should "touch" the "special file" in order to store the new timestamp
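
The three steps above can be sketched roughly as follows. This is only a minimal illustration in Python (not the Win32 API the thread asks about); the function names and the idea of using the stamp file's own modification time as the "last run" time are assumptions made for the example.

```python
import os
from pathlib import Path


def changed_since_last_run(files, stamp_path):
    """Steps 1 and 2: return the files modified after the stamp file was last touched."""
    stamp = Path(stamp_path)
    # No stamp file means the generator has never run: treat every file as changed.
    last_run = stamp.stat().st_mtime if stamp.exists() else None
    return [f for f in files
            if last_run is None or os.stat(f).st_mtime > last_run]


def touch_stamp(stamp_path):
    """Step 3: update the stamp file so its modification time records this run."""
    Path(stamp_path).touch()
```

A generator would call changed_since_last_run() at startup, rebuild its output for the returned files, and call touch_stamp() only after everything succeeded, so that a failed run does not hide changes.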
 
  • #3
Rogerio said:
1- the generator reads the last time it was run (from a "special file")
2- for each file in the list, the generator checks the last modified date
3- at the end of the whole process, the generator should "touch" the "special file" in order to store the new timestamp

Well, I had pretty much figured this method out already, i.e. maintaining an index and using it to store the last-modified date or the MD5 checksum.

But what I needed to know was how to get the last-modified date for a file. Also, if I don't use the date but purely filesize and checksum, will that be fine? [I mean collision-wise as well as speed-wise]
 
  • #4
In the Win32 API, I think the function you are looking for is GetFileTime().

You can go on checksum alone if you want to, but it means more work. Also, there is no need to check the filesize if you have a checksum.

The easiest approach is checking the timestamps, though. And there is no need to save the timestamps in an index file: just compare the "modified" time of the source files against the "created" time of the binary.
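
A rough sketch of that source-versus-binary comparison, in Python rather than Win32 for brevity. Note one deliberate substitution: it compares against the binary's last-modified time instead of its creation time, because that is what os.stat() reports portably. The function name is made up for the example.

```python
import os


def needs_rebuild(source, target):
    """Return True if target is missing or older than source."""
    try:
        target_time = os.stat(target).st_mtime
    except FileNotFoundError:
        # The binary has never been built, so it must be (re)built.
        return True
    # Rebuild only when the source was modified after the binary was written.
    return os.stat(source).st_mtime > target_time
```

This is essentially the rule make itself applies to a target and its prerequisites, which is why no separate index file is needed.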

k
 
  • #5
kenewbie said:
In the win32 API I think the function you are looking for is GetFileTime()

Thanks for that. I'll look it up on MSDN.

kenewbie said:
You can go on checksum alone if you want to, but it means more work. Also, there is no need to check the filesize if you have a checksum.

Actually, the plan was that since a filesize check is much faster than a checksum, I'll do a filesize check first. If the filesizes don't match, then I definitely need to add the file to the modified list. But if they do match, I'll do a checksum.

Also, pairing a filesize check with the checksum can reduce the collision rate a bit, since both would have to match.
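
To make the plan concrete, here is a minimal sketch of the size-first, checksum-second check in Python. The in-memory index dictionary (path mapped to a size and MD5 pair) and all function names are assumptions for illustration; a real generator would persist the index between runs.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(path, index):
    """index maps path -> (size, md5 hex digest) recorded on the previous run."""
    size = os.path.getsize(path)
    previous = index.get(path)
    # Cheap test first: unknown file or different size means modified, no hashing needed.
    if previous is None or previous[0] != size:
        return True
    # Sizes match, so fall back to the expensive checksum comparison.
    return file_md5(path) != previous[1]


def record(path, index):
    """Store the current size and checksum for the next run."""
    index[path] = (os.path.getsize(path), file_md5(path))
```

The size test only saves work for files whose size changed; an unchanged file still has to be hashed in full on every run, which is the main cost of this approach compared with timestamps.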

EDIT: Got it here: http://msdn2.microsoft.com/en-us/library/ms724320(VS.85).aspx

But there is a problem with it, according to that page:

"Not all file systems can record creation and last access times and not all file systems record them in the same manner. For example, on FAT, create time has a resolution of 10 milliseconds, write time has a resolution of 2 seconds, and access time has a resolution of 1 day (really, the access date). Therefore, the GetFileTime function may not return the same file time information set using SetFileTime. NTFS delays updates to the last access time for a file by up to one hour after the last access."

Hence, I think I'll go with the filesize/checksum approach.
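
For comparison, if timestamps were used anyway, one common way to cope with the 2-second FAT write-time resolution quoted above is to treat times that differ by no more than that amount as equal. A minimal sketch, assuming a modification time recorded on a previous run; the constant and function name are illustrative only.

```python
import os

# Seconds; taken from the FAT write-time resolution in the quoted MSDN note.
FAT_WRITE_RESOLUTION = 2.0


def mtime_changed(path, recorded_mtime, tolerance=FAT_WRITE_RESOLUTION):
    """Return True if the file's modification time differs from the recorded
    one by more than the tolerance."""
    return abs(os.stat(path).st_mtime - recorded_mtime) > tolerance
```

The trade-off is that an edit made within the tolerance window of the recorded time goes unnoticed, which is one more argument for the checksum route.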
 
  • #6
rohanprabhu said:
Actually, the plan was that since a filesize check is much faster than a checksum, I'll do a filesize check first. If the filesizes don't match, then I definitely need to add the file to the modified list. But if they do match, I'll do a checksum.

Ok, that makes sense.

k
 

1. What is a "file diff" and why is it important?

A file diff, or file difference, is a comparison between two versions of a file to identify any changes made between them. It is important because it allows users to track changes, collaborate on a document, and identify potential errors or conflicts.

2. How does the "fastest way to file diff" differ from traditional methods?

The fastest way to file diff typically involves using specialized software or tools that automate the process of comparing files, rather than manually reviewing each line of code or text. This can save time and improve accuracy.

3. What factors should I consider when choosing a file diff tool?

Some factors to consider include the type of files you are comparing, the level of detail you need in the comparison, the complexity of the files, and the features offered by the tool such as side-by-side comparison or highlighting of differences.

4. Can file diff be used for non-coding files?

Yes, file diff can be used on any type of file, not just source code. Text-based files such as plain-text documents can be compared line by line, while for binary files such as images a comparison can at least tell you whether two versions differ.

5. Are there any limitations to using file diff?

One limitation of file diff is that its output becomes hard to interpret when a file has been heavily modified, and comparing very large files can be slow. It can also only report differences relative to the versions it is given, so those versions must be the right baseline. Additionally, line-based diff is not well suited to non-text files such as images or videos, where it can usually report only that the files differ, not what changed.
