Fastest way to file diff


Main Question or Discussion Point

I am making a Makefile generator and for that, I need something like:

There is a list of files. When the generator runs, I need to know which of those files have changed since the last run, so that the unchanged ones can be left out of the compile list. I was thinking of a filesize check first and, if the sizes match, a checksum. Would that do?

Since the program I'll be making is Windows-native, I can use anything from the Win32 API. Is there something there that would let me check a file's last-modified date?

The list of files could be HUGE, so I need to know the fastest way to do this. Any help is appreciated.

thanks,
rohan
 

Answers and Replies

1- the generator reads the last time it was run (from a "special file")
2- for each file in the list, the generator checks the last modified date
3- at the end of the whole process, the generator should "touch" the "special file" in order to store the new timestamp
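
The three steps above could be sketched roughly as follows. This is Python rather than Win32 code, purely to keep it short; the stamp file name `.lastrun` and both function names are made up for illustration and do not come from the thread.

```python
import os
from pathlib import Path

STAMP = Path(".lastrun")  # hypothetical name for the "special file"

def changed_since_last_run(files, stamp=STAMP):
    """Return the files whose last-modified time is newer than the stamp file."""
    # Step 1: read the time of the previous run (no stamp file = first run).
    last_run = stamp.stat().st_mtime if stamp.exists() else float("-inf")
    # Step 2: check the last-modified time of every file in the list.
    return [f for f in files if os.stat(f).st_mtime > last_run]

def mark_run_finished(stamp=STAMP):
    # Step 3: "touch" the stamp file so its own mtime records this run.
    stamp.touch()
```

On a first run there is no stamp file yet, so every file is reported as changed.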
 
Well, I pretty much had this method figured out: keep an index and use it to store the last-modified date or the MD5 checksum.

What I needed to know was how to get the last-modified date of a file. Also, if I don't use the date but purely filesize and checksum, will that be fine? [I mean collision-wise as well as speed-wise]
 
In the Win32 API, I think the function you are looking for is GetFileTime().
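
GetFileTime() reports times as FILETIME values: a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC), split into a low and a high 32-bit half. A small sketch of turning such a value into ordinary Unix seconds, with `os.stat` as a portable stand-in for the last-write time; the helper names are made up for illustration.

```python
import os

# A Win32 FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC,
# stored as two 32-bit halves (dwLowDateTime / dwHighDateTime).
TICKS_PER_SECOND = 10_000_000
EPOCH_DIFF_SECONDS = 11_644_473_600  # seconds from 1601-01-01 to 1970-01-01

def filetime_to_unix(low, high):
    """Convert the two FILETIME halves to seconds since the Unix epoch."""
    ticks = (high << 32) | low
    return ticks / TICKS_PER_SECOND - EPOCH_DIFF_SECONDS

def last_modified(path):
    # Portable stand-in for the last-write time GetFileTime() would report.
    return os.stat(path).st_mtime
```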

You can go on checksum alone if you want to, but it means more work. Also, there is no need to check the filesize if you have a checksum.

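The easiest approach is checking the timestamps though. And there is no need to save the timestamps in an index file: just compare the "modified" time of each source file against the timestamp of the binary built from it. (The binary's own "modified" time is the safer one to use, since that is what make itself compares.)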
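
A minimal sketch of that comparison, in Python for brevity. Note the assumption: it compares against the binary's last-modified time, the way make does, which is a substitution on this sketch's part; `needs_rebuild` is a hypothetical name.

```python
import os

def needs_rebuild(target, sources):
    """True if target is missing or older than any of its sources."""
    try:
        target_time = os.stat(target).st_mtime
    except FileNotFoundError:
        return True  # nothing built yet
    # Any source modified after the target was written makes it stale.
    return any(os.stat(src).st_mtime > target_time for src in sources)
```

With this approach the binary itself acts as the record of the last build, so no separate index file is needed.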

k
 
"In the Win32 API, I think the function you are looking for is GetFileTime()."
Thanks for that. I'll look it up on MSDN.

"You can go on checksum alone if you want to, but it means more work. Also, there is no need to check the filesize if you have a checksum."
Actually, the plan was this: since a filesize check is much faster than a checksum, I'll do the filesize check first. If the filesizes don't match, then I need to add the file to the modified list for sure. If they do match, I'll do a checksum.

Also, comparing the filesize alongside the checksum can reduce the chance of a checksum collision going unnoticed a bit, since two different files would then have to match in both.
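
The size-first plan could look roughly like this in Python. It is only a sketch: the function names are made up, MD5 is used because it is the checksum mentioned in the thread, and the stored size and checksum are assumed to come from whatever index the generator keeps.

```python
import hashlib
import os

def md5_of(path, chunk_size=1 << 16):
    """Hash the file in chunks so large files are never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def has_changed(path, old_size, old_md5):
    """Compare against the size and checksum stored on the previous run."""
    if os.stat(path).st_size != old_size:
        return True  # sizes differ: changed, no need to read the file
    return md5_of(path) != old_md5  # same size: fall back to the checksum
```

One thing to keep in mind: a changed file still has to be hashed at some point to refresh its stored checksum, so the size test mostly saves time when that refresh can be deferred.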

EDIT: Got it here: http://msdn2.microsoft.com/en-us/library/ms724320(VS.85).aspx

But there is a problem with it. From that page:

"Not all file systems can record creation and last access times and not all file systems record them in the same manner. For example, on FAT, create time has a resolution of 10 milliseconds, write time has a resolution of 2 seconds, and access time has a resolution of 1 day (really, the access date). Therefore, the GetFileTime function may not return the same file time information set using SetFileTime. NTFS delays updates to the last access time for a file by up to one hour after the last access."

Hence I think I'll go with the filesize/checksum approach.
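
For anyone who does want to stay with timestamps, one way to live with the 2-second write-time resolution quoted above is to compare with a tolerance instead of testing for exact equality. A sketch in Python, with a made-up helper name; the trade-off is that an edit made within that window of the stored time would be missed.

```python
FAT_WRITE_RESOLUTION = 2.0  # seconds, the FAT write-time resolution quoted above

def same_write_time(stored, current, resolution=FAT_WRITE_RESOLUTION):
    """Treat two write times as equal if they differ by less than the resolution."""
    return abs(stored - current) < resolution
```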
 
"Actually, the plan was this: since a filesize check is much faster than a checksum, I'll do the filesize check first. If the filesizes don't match, then I need to add the file to the modified list for sure. If they do match, I'll do a checksum."
OK, that makes sense.

k
 
