What sort of Math do I need for a plagiarism detection algorithm?

  1. Nov 12, 2013 #1
    I'm trying to figure out the theory behind a simple plagiarism detection algorithm I'm making. The essence is that it's a function

    f:S×S→[0, 1]

    where S is the set of all strings and [0, 1] is the range of the plagiarism quotient, with 0 meaning no plagiarism and 1 meaning a completely copied string (that is, f(s1, s2) = 1 if s1 = s2).
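
    One common way to get an f with exactly these properties (a standard technique, not something the thread itself proposes) is the Jaccard similarity of the two strings' sets of character n-grams ("shingles"). It always lies in [0, 1], equals 1 when s1 = s2, and needs no alignment, so differing lengths are not a problem. A minimal sketch, assuming character trigrams (n = 3) are an acceptable granularity; the names `shingles` and `jaccard_qtnt` are hypothetical:

```cpp
#include <cstddef>
#include <set>
#include <string>

// Collect the set of all length-n substrings ("shingles") of s.
// A string shorter than n contributes itself as a single shingle.
std::set<std::string> shingles(const std::string& s, std::size_t n) {
    std::set<std::string> out;
    if (s.size() < n) {
        out.insert(s);
        return out;
    }
    for (std::size_t i = 0; i + n <= s.size(); ++i)
        out.insert(s.substr(i, n));
    return out;
}

// Jaccard similarity: size of intersection / size of union of the shingle sets.
// The result lies in [0, 1] and is 1.0 whenever s1 == s2.
// Both sets are never empty, so the union is never empty either.
double jaccard_qtnt(const std::string& s1, const std::string& s2,
                    std::size_t n = 3) {
    const std::set<std::string> a = shingles(s1, n), b = shingles(s2, n);
    std::size_t common = 0;
    for (const std::string& g : a)
        if (b.count(g)) ++common;
    return static_cast<double>(common) /
           static_cast<double>(a.size() + b.size() - common);
}
```

    For example, "abcd" and "bcde" share one of their three distinct trigrams, giving 1/3. For prose, word-level shingles are often a better fit than character ones, since they ignore trivial overlaps like common letter sequences.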

    This is analogous to a function that maps two real-valued vectors to the unit interval, something like norm(v1, v2). The only problem is that the strings can have different lengths. In that case, I guess my algorithm will have to take the shorter string and check whether the longer string contains any substrings that are suspiciously similar to it.
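
    Unlike a vector norm, edit distance is defined directly on strings of different lengths, so it is one possible answer to the length problem (again a standard technique, not the thread's own). The Levenshtein distance d counts the minimum number of single-character insertions, deletions and substitutions turning one string into the other. Since d never exceeds the longer length, 1 - d / max(|s1|, |s2|) is a quotient in [0, 1]. A minimal sketch, assuming this normalization is acceptable; the names `levenshtein` and `edit_qtnt` are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance via the classic dynamic program, keeping only
// two rows of the table: O(|a| * |b|) time, O(|b|) memory.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // "" -> b[0, j)
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;  // a[0, i) -> ""
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);  // prev now holds row i
    }
    return prev[b.size()];
}

// Normalized similarity: 1 - d / max(|s1|, |s2|).
// Lies in [0, 1] and equals 1.0 exactly when s1 == s2.
double edit_qtnt(const std::string& s1, const std::string& s2) {
    const std::size_t longest = std::max(s1.size(), s2.size());
    if (longest == 0) return 1.0;  // two empty strings are identical
    return 1.0 - static_cast<double>(levenshtein(s1, s2)) / longest;
}
```

    For instance, "kitten" and "sitting" are 3 edits apart, giving 1 - 3/7. One design caveat: when a short passage is copied into a much longer text, the global distance is dominated by the extra text, so comparing the short string against windows of the long one (as described above) can still be worthwhile.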

    Any ideas?
  3. Nov 12, 2013 #2
    OK, I've started writing the function described above. It's recursive, and I still need to figure out the math. Suggestions welcome.

    Code (Text):

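    #include <cstddef>   // std::size_t
    #include <string>
    #include <utility>   // std::swap

    double plgrsm_qtnt(std::string s1, std::string s2) {
        if (s1.length() == s2.length()) {
            // math/stat stuff goes here ...
            // Placeholder until then: count only exact copies.
            return s1 == s2 ? 1.0 : 0.0;
        }
        // Use s1 to be the smaller string if it isn't already:
        if (s1.length() > s2.length())
            std::swap(s1, s2);
        // Check s1 against every substring of s2 with the same length as s1
        // and return the largest quotient found.
        double max_qtnt(0); // since plgrsm_qtnt falls in [0, 1]
        const std::size_t s1_len(s1.length()), s2_len(s2.length());
        for (std::size_t i(0); i + s1_len <= s2_len; ++i) {
            // substr(pos, count) takes a length, not an end index: window s2[i, i + s1_len)
            std::string s2_sbstr = s2.substr(i, s1_len);
            double this_qtnt = plgrsm_qtnt(s1, s2_sbstr);  // equal lengths now
            if (this_qtnt > max_qtnt)
                max_qtnt = this_qtnt;
        }
        return max_qtnt;
    }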
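
    One possible filler for the equal-length branch (an illustration, not the poster's choice) is the fraction of positions at which the two strings agree, i.e. one minus the normalized Hamming distance. Combined with the loop over windows, the function would then return the best-aligned fraction of matching characters. A minimal sketch; the name `hamming_qtnt` is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Fraction of positions where two equal-length strings agree
// (1 - normalized Hamming distance). Result is in [0, 1] and is 1.0
// exactly when s1 == s2. Precondition: s1.size() == s2.size().
double hamming_qtnt(const std::string& s1, const std::string& s2) {
    if (s1.empty()) return 1.0;  // two empty strings count as identical
    std::size_t same = 0;
    for (std::size_t k = 0; k < s1.size(); ++k)
        if (s1[k] == s2[k]) ++same;
    return static_cast<double>(same) / s1.size();
}
```

    This is cheap, but brittle: inserting or deleting a single character shifts every later position out of alignment, which is why alignment-free or edit-based measures are often preferred for text.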
  4. Nov 12, 2013 #3

    jim mcnamara

    Staff: Mentor