What sort of Math do I need for a plagiarism detection algorithm?

In summary, the conversation discusses the theory behind a simple plagiarism detection algorithm, which involves a function that maps two strings to a plagiarism quotient. The algorithm checks for similarities between strings and uses cluster analysis to determine if the content is plagiarized. One suggested similarity algorithm is the Levenshtein distance.
  • #1
Jamin2112
I'm trying to figure out the theory behind a simple plagiarism detection algorithm I'm making. The essence is that it's a function

f:S×S→[0, 1]

where S is the set of all strings and the value in [0, 1] is the plagiarism quotient, with 0 meaning no plagiarism and 1 meaning a completely copied string (that is, f(s1,s2)=1 if s1=s2).

This is analogous to a function that maps two real-valued vectors to the unit interval, something like norm(v1, v2). The only problem is that the strings can be of different lengths. I guess in that case my algorithm will have to take the smaller string and check whether the larger string contains any substrings that look like copies of it.
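To make the vector analogy concrete, here is a rough sketch of one way to sidestep the length problem: turn each string into a vector of character n-gram counts and take the cosine of the angle between the two vectors. The names ngram_counts and cosine_similarity and the default n = 2 are my own choices for illustration, not anything settled. Since counts are never negative, the result lands in [0, 1].

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>

// Count every overlapping n-character substring ("n-gram") of s.
// Assumes n >= 1.
std::map<std::string, int> ngram_counts(const std::string& s, std::size_t n) {
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i + n <= s.length(); ++i)
        ++counts[s.substr(i, n)];
    return counts;
}

// Cosine of the angle between the two n-gram count vectors.
// Returns 0 if either string is shorter than n (its count vector is all zeros).
double cosine_similarity(const std::string& s1, const std::string& s2,
                         std::size_t n = 2) {
    const std::map<std::string, int> c1 = ngram_counts(s1, n);
    const std::map<std::string, int> c2 = ngram_counts(s2, n);
    double dot = 0.0, norm1 = 0.0, norm2 = 0.0;
    for (const auto& kv : c1) {
        norm1 += static_cast<double>(kv.second) * kv.second;
        const auto it = c2.find(kv.first);
        if (it != c2.end())
            dot += static_cast<double>(kv.second) * it->second;
    }
    for (const auto& kv : c2)
        norm2 += static_cast<double>(kv.second) * kv.second;
    if (norm1 == 0.0 || norm2 == 0.0)
        return 0.0;
    return dot / std::sqrt(norm1 * norm2);
}
```

One caveat with this sketch: a score of 1 only means the two strings have n-gram counts in the same proportions, not that they are equal, so it is weaker than the f(s1,s2)=1 if s1=s2 condition read as an "only if".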

Any ideas?
 
  • #2
Ok, I just started to make the function like I described above. It's recursive and I still need the Math stuff figured out. Suggestions welcome.
Code:
#include <cstddef>
#include <string>
#include <utility>

double plgrsm_qtnt(std::string s1, std::string s2) {
	if (s1.length() == s2.length()) {
		// math/stat stuff goes here ...
		// Placeholder until then: only an exact copy scores 1.
		return s1 == s2 ? 1.0 : 0.0;
	} else {
		// Make s1 the smaller string if it isn't already:
		if (s1.length() > s2.length())
			std::swap(s1, s2);
		// Check s1 against every substring of s2 that has the same length
		// as s1 and return the largest quotient found:
		double max_qtnt(0); // since plgrsm_qtnt falls in [0, 1]
		const std::size_t s1_len(s1.length()), s2_len(s2.length());
		for (std::size_t i(0); i + s1_len <= s2_len; ++i) {
			// std::string::substr takes (position, count), not (begin, end)
			const double this_qtnt = plgrsm_qtnt(s1, s2.substr(i, s1_len));
			if (this_qtnt > max_qtnt)
				max_qtnt = this_qtnt;
		}
		return max_qtnt;
	}
}
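
For the equal-length branch, one very simple candidate for the "math/stat stuff" is the fraction of positions where the two strings agree (1 minus the normalized Hamming distance). This is only a sketch under that assumption; matching_fraction is a hypothetical helper name, and a real quotient would probably need something less brittle.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): fraction of positions at which
// two equal-length strings hold the same character.
// Assumes a.length() == b.length(); two empty strings count as identical.
double matching_fraction(const std::string& a, const std::string& b) {
    if (a.empty())
        return 1.0;
    std::size_t matches = 0;
    for (std::size_t i = 0; i < a.length(); ++i)
        if (a[i] == b[i])
            ++matches;
    return static_cast<double>(matches) / static_cast<double>(a.length());
}
```

It satisfies f(s1,s2)=1 exactly when s1=s2 and stays in [0, 1], but a single inserted or deleted character shifts everything after it and drags the score down, which is the usual argument for an edit distance such as Levenshtein instead.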
 
  • #3

What sort of Math do I need for a plagiarism detection algorithm?

There are several areas of math that are important for developing a plagiarism detection algorithm, including probability and statistics, linear algebra, and graph theory. A solid understanding of these mathematical concepts is necessary for accurately analyzing and comparing large amounts of text data.

Do I need to be proficient in calculus for a plagiarism detection algorithm?

A basic understanding of calculus can help with certain aspects of a plagiarism detection algorithm, but it is not required. Most of the math involved is focused on discrete mathematics and data analysis rather than on calculus.

Is a background in computer science necessary for developing a plagiarism detection algorithm?

A background in computer science is definitely beneficial for developing a plagiarism detection algorithm. It is important to have a strong understanding of programming languages, algorithms, and data structures in order to effectively implement the mathematical concepts involved in the algorithm.

Can I develop a plagiarism detection algorithm without a strong background in math?

It is possible to develop a basic plagiarism detection algorithm without a strong background in math, as there are many tools and libraries available that can handle the mathematical calculations for you. However, a deeper understanding of the underlying math concepts can help improve the accuracy and effectiveness of the algorithm.

Are there any specific math techniques or algorithms that are commonly used in plagiarism detection?

Yes, there are several techniques and algorithms that are commonly used in plagiarism detection, such as cosine similarity, Jaccard similarity, and Levenshtein distance. These techniques use mathematical calculations to measure the similarity between texts and identify potential instances of plagiarism.
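
As an illustration of the last of these, here is a minimal sketch of Levenshtein distance turned into a similarity score in [0, 1]. Dividing by the length of the longer string is one common normalization and is an assumption here, as is the name levenshtein_similarity; it is not the only way to scale the distance.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn a into b.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    // prev[j] is the distance between the first i-1 characters of a and the
    // first j characters of b; curr[j] is the same for the first i characters.
    std::vector<std::size_t> prev(b.length() + 1), curr(b.length() + 1);
    for (std::size_t j = 0; j <= b.length(); ++j)
        prev[j] = j;  // from an empty prefix of a: j insertions
    for (std::size_t i = 1; i <= a.length(); ++i) {
        curr[0] = i;  // to an empty prefix of b: i deletions
        for (std::size_t j = 1; j <= b.length(); ++j) {
            const std::size_t substitution =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr);
    }
    return prev[b.length()];
}

// Assumed normalization: 1 - distance / length of the longer string.
// Two empty strings are treated as identical.
double levenshtein_similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.length(), b.length());
    if (longest == 0)
        return 1.0;
    return 1.0 - static_cast<double>(levenshtein(a, b)) /
                     static_cast<double>(longest);
}
```

Because the distance never exceeds the length of the longer string, the score stays in [0, 1], equals 1 only for identical strings, and works directly on strings of different lengths. The cost is time proportional to the product of the two lengths, which matters for long documents.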
