What sort of Math do I need for a plagiarism detection algorithm?

A plagiarism detection algorithm can be conceptualized as a function f:S×S→[0, 1], where S represents all strings and the output indicates the plagiarism quotient. The challenge arises from handling strings of different lengths, necessitating a method to check smaller strings against substrings of larger ones. The algorithm is designed to be recursive, with initial comparisons made when strings are of equal length. Techniques such as cluster analysis and similarity distance calculations, including the Levenshtein distance, are relevant for enhancing the detection process. Overall, the discussion emphasizes the mathematical foundations needed for effective plagiarism detection.
Jamin2112
I'm trying to figure out the theory behind a simple plagiarism detection algorithm I'm making. The essence is that it's a function

f:S×S→[0, 1]

where S is the set of all strings and [0, 1] is the plagiarism quotient with 0 being no plagiarism and 1 being a completely copied string (that is, f(s1,s2)=1 if s1=s2).

This is analogous to a function that maps two real-valued vectors to the unit interval, something like norm(v1, v2). The only problem here is that the strings can be of different lengths. In that case, I guess my algorithm will have to take the smaller string and check whether the larger string contains any substrings that look plagiarized from the smaller one.

Any ideas?
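One way to write down the unequal-length case described above, as a sketch only: assume some equal-length comparison g (a hypothetical name, not defined in the post) that already maps two strings of the same length into [0, 1], and slide a window of length |s1| across s2. The window length being exactly |s1| is an assumption; the post only says "substrings".

```latex
% Assumptions: |s_1| \le |s_2|; g is a hypothetical equal-length comparison.
% s_2[i \,..\, i+|s_1|) denotes the substring of s_2 starting at index i
% with length |s_1|.
\[
  f(s_1, s_2) \;=\; \max_{0 \,\le\, i \,\le\, |s_2| - |s_1|}
      g\bigl(s_1,\; s_2[\,i \,..\, i + |s_1|)\bigr),
  \qquad g : \{(a, b) \in S \times S : |a| = |b|\} \to [0, 1].
\]
% When |s_1| = |s_2| there is a single window, so f(s_1, s_2) = g(s_1, s_2),
% and the requirement f(s, s) = 1 reduces to g(s, s) = 1.
```

With this definition f stays in [0, 1] because it is a maximum of values of g, and swapping the arguments so that the shorter string comes first makes f symmetric whenever g is.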
 
Ok, I just started to make the function like I described above. It's recursive and I still need the Math stuff figured out. Suggestions welcome.
Code:
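#include <cstddef>
#include <string>
#include <utility>

double plgrsm_qtnt(std::string s1, std::string s2) {
	if (s1.length() == s2.length()) {
		// math/stat stuff goes here ...
		// Placeholder (an assumption, not the final measure): identical
		// strings score 1, as required by f(s1, s2) = 1 if s1 = s2.
		return s1 == s2 ? 1.0 : 0.0;
	}
	// Use s1 to be the smaller string if it isn't already:
	if (s1.length() > s2.length())
		std::swap(s1, s2);
	// Check s1 against every substring of s2 of length s1.length() and
	// return the largest quotient found.
	double max_qtnt = 0; // since plgrsm_qtnt falls in [0, 1]
	const std::size_t s1_len = s1.length(), s2_len = s2.length();
	for (std::size_t i = 0; i + s1_len <= s2_len; ++i) {
		// std::string::substr takes (position, count), not (begin, end)
		const std::string s2_sbstr = s2.substr(i, s1_len);
		const double this_qtnt = plgrsm_qtnt(s1, s2_sbstr);
		if (this_qtnt > max_qtnt)
			max_qtnt = this_qtnt;
	}
	return max_qtnt;
}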
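The summary at the top mentions the Levenshtein distance as one relevant similarity measure. As a minimal sketch of how that could produce a number in [0, 1], the block below computes the edit distance with the standard two-row dynamic program and normalizes it by the longer string's length. The function names `levenshtein` and `similarity` are made up for illustration, and the normalization is one possible choice, not something the thread settles on.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Levenshtein (edit) distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn a into b.
// Uses two rows of the dynamic-programming table instead of the full table.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Row 0: turning the empty prefix of a into b[0..j) takes j insertions.
    std::iota(prev.begin(), prev.end(), std::size_t{0});
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i; // turning a[0..i) into the empty string takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // delete a[i-1]
                                curr[j - 1] + 1,      // insert b[j-1]
                                prev[j - 1] + cost}); // substitute or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical normalization (an assumption): 1 means identical strings,
// 0 means every position of the longer string had to be edited.
double similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0)
        return 1.0; // two empty strings are identical
    return 1.0 - static_cast<double>(levenshtein(a, b)) /
                     static_cast<double>(longest);
}
```

Because the edit distance never exceeds the length of the longer string, `similarity` always lands in [0, 1] and returns 1 for identical strings, so it could stand in for the "math/stat stuff" in the equal-length branch. It runs in time proportional to the product of the two lengths, which adds up quickly when called once per window of a long string.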
 