I'm trying to figure out the theory behind a simple plagiarism detection algorithm I'm making. The essence is a function f: S×S → [0, 1], where S is the set of all strings and the output is the plagiarism quotient: 0 means no plagiarism and 1 means a completely copied string (that is, f(s1, s2) = 1 if s1 = s2). This is analogous to a function that maps two real-valued vectors to the unit interval, something like norm(v1, v2). The only problem is that the strings can be of different lengths. In that case, I guess my algorithm will have to take the shorter string and check whether the longer string contains any substrings that look plagiarized from it. Any ideas?
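
To make the sliding-window idea concrete, here is a minimal sketch in Python. It is only one possible reading of the approach described above: the function name `plagiarism_quotient`, the `step` parameter, and the choice of `difflib.SequenceMatcher.ratio()` as the per-window similarity are all assumptions for illustration, not an established plagiarism metric. It slides a window the size of the shorter string across the longer one and keeps the best score.

```python
from difflib import SequenceMatcher


def plagiarism_quotient(s1: str, s2: str, step: int = 1) -> float:
    """Return a score in [0, 1]; 1 means the shorter string appears verbatim.

    Hypothetical helper: the name, the `step` parameter, and the use of
    SequenceMatcher.ratio() as the similarity are illustrative assumptions.
    """
    if s1 == s2:
        return 1.0  # f(s1, s2) = 1 if s1 = s2, including two empty strings
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0.0  # an empty string gives nothing to compare
    width = len(shorter)
    best = 0.0
    # Slide a window the size of the shorter string across the longer one.
    for start in range(0, len(longer) - width + 1, step):
        window = longer[start:start + width]
        # ratio() is 2*M/T: M matched characters, T total characters in both.
        score = SequenceMatcher(None, shorter, window, autojunk=False).ratio()
        best = max(best, score)
        if best == 1.0:
            break  # an exact copy was found; no higher score is possible
    return best
```

For example, `plagiarism_quotient("the quick brown fox", "I wrote that the quick brown fox jumps")` returns 1.0 because the shorter string occurs verbatim in the longer one. This brute-force scan is slow on long texts and only considers windows of one fixed width, so it is a starting point for experimenting rather than a finished detector.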