Is there a flaw in my longest common subsequence algorithm?

  • Context:
  • Thread starter Thread starter SlurrerOfSpeech
  • Start date Start date
  • Tags Tags
    Algorithm Subsequence
Click For Summary

Discussion Overview

The discussion revolves around a participant's implementation of an algorithm to solve the Common Child string problem, specifically focusing on the longest common subsequence (LCS) between two strings. The scope includes algorithm design, debugging, and performance considerations.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • The initial implementation of the LCS algorithm is shared, which passes several test cases but fails on a specific larger test case.
  • One participant suggests providing a smaller test case to help identify the error in the algorithm.
  • Another participant emphasizes the need for clearer explanations and comments in the code to aid understanding.
  • There is a recommendation to use a debugger to analyze the algorithm's behavior with smaller inputs.
  • A suggestion is made to test the algorithm with the strings "CYAT" and "CATY" to check for correctness.
  • A critique is offered regarding the brute force approach of the current algorithm, highlighting its inefficiency and suggesting the use of dynamic programming to improve performance.

Areas of Agreement / Disagreement

Participants express a consensus on the need for clearer code documentation and debugging strategies. However, there is disagreement regarding the algorithm's approach, with some advocating for a more efficient method while others focus on identifying specific errors in the current implementation.

Contextual Notes

The discussion highlights limitations in the current algorithm's approach, particularly its brute force nature leading to high time complexity. There are also unresolved issues regarding the specific error causing undercalculation in the larger test case.

SlurrerOfSpeech
Messages
141
Reaction score
11
What I have is

Code:
    /// <summary>
    /// Provides a solution to the Common Child string problem
    /// https://www.hackerrank.com/challenges/common-child/problem
    /// </summary>
    public static class CommonChild
    {
        public static int Solve(string first, string second)
        {
            int max = 0;
            for (int i = 0; i < first.Length; ++i)
            {
                int current = 0;
                for (int j = i, k = 0; j < first.Length; ++j)
                {
                    int m = k;
                    bool found = false;
                    for (; k < second.Length && !found; ++k)
                    {
                        if (first[j] == second[k])
                        {
                            current += 1;
                            found = true;
                        }
                    }
                    if (!found)
                    {
                        k = m;
                    }
                }
                if (current > max)
                {
                    max = current;
                }
            }
            return max;
        }
    }

Which is passing the basic test cases of mine and the sites's:

Code:
        [TestMethod]
        public void SampleTest0()
        {
            ExecuteTest(first: "HARRY", second: "SALLY", expected: 2);
        }

        [TestMethod]
        public void SampleTest1()
        {
            ExecuteTest(first: "AA", second: "BB", expected: 0);
        }

        [TestMethod]
        public void SampleTest2()
        {
            ExecuteTest(first: "SHINCHAN", second: "NOHARAAA", expected: 3);
        }

        [TestMethod]
        public void SampleTest3()
        {
            ExecuteTest(first: "ABCDEF", second: "FBDAMN", expected: 2);
        }

        [TestMethod]
        public void MyTest0()
        {
            ExecuteTest(first: "", second: "", expected: 0);
        }

        [TestMethod]
        public void MyTest1()
        {
            ExecuteTest(first: "", second: "A", expected: 0);
        }

        [TestMethod]
        public void MyTest2()
        {
            ExecuteTest(first: "A", second: "A", expected: 1);
        }

        [TestMethod]
        public void MyTest3()
        {
            ExecuteTest(first: "A", second: "B", expected: 0);
        }

        [TestMethod]
        public void MyTest4()
        {
            ExecuteTest(first: "A", second: "AB", expected: 1);
        }

        [TestMethod]
        public void MyTest5()
        {
            ExecuteTest(first: "A", second: "BA", expected: 1);
        }
        [TestMethod]
        public void MyTest6()
        {
            ExecuteTest(first: "AB", second: "BA", expected: 1);
        }

        [TestMethod]
        public void MyTest7()
        {
            ExecuteTest(first: "AB", second: "AB", expected: 2);
        }

        [TestMethod]
        public void MyTest8()
        {
            ExecuteTest(first: "AC", second: "ABC", expected: 2);
        }

        [TestMethod]
        public void MyTest9()
        {
            ExecuteTest(first: "AC", second: "ABCD", expected: 2);
        }

        [TestMethod]
        public void MyTest10()
        {
            ExecuteTest(first: "ACE", second: "ABCDEE", expected: 3);
        }

        [TestMethod]
        public void MyTest11()
        {
            ExecuteTest(first: "ACEF", second: "ABCDEE", expected: 3);
        }

        [TestMethod]
        public void MyTest12()
        {
            ExecuteTest(first: "ACEJJKLMNPST", second: "ABCDEEGHHHIPQRRS", expected: 5);
        }

        [TestMethod]
        public void MyTest13()
        {
            ExecuteTest(first: "ACEF", second: "BCDEEA", expected: 2);
        }

        [TestMethod]
        public void MyTest14()
        {
            ExecuteTest(first: "WYZXY", second: "YOOX", expected: 2);
        }

        [TestMethod]
        public void MyTest15()
        {
            ExecuteTest(first: "XXXXXXYXXX", second: "XYXXXXXXXX", expected: 9);
        }

But it is failing this one

Code:
        [TestMethod]
        public void TestCase1()
        {
            ExecuteTest
            (
                first: "WEWOUCUIDGCGTRMEZEPXZFEJWISRSBBSYXAYDFEJJDLEBVHHKS",
                second: "FDAGCXGKCTKWNECHMRXZWMLRYUCOCZHJRRJBOAJOQJZZVUYXIC",
                expected: 15
            );
        }

and I can't figure out why because the inputs are too big for me to see what's going on. Mine is undercalculating (10).
 
Technology news on Phys.org
Basically, I'm wondering if someone here has a smaller test case that can help me see my error
 
It helps if you explain what you want to achieve in plain English. It is not clear to me from the examples.
 
S_David said:
It helps if you explain what you want to achieve in plain English. It is not clear to me from the examples.
I agree. Rather than start with "What I have is" and then a bunch of code it would be helpful to specify what you're looking for. I think I figured out from your examples that the longest matching subsequences in HARRY and SALLY are AY, but neither I nor other readers should have to parse your code to see what you're doing, especially since you have not a single comment other than the summary at the top. I shouldn't have to follow a hyperlink to get the gist of what you're doing.

If you're having trouble with your longer example, take the first 11 characters of each string and watch what is happening in a debugger. Watching what happens for the first 9 characters might be a bit tedious, but would probably enlighten you as to what is causing the bug.
 
  • Like
Likes   Reactions: QuantumQuest
try
CYAT
CATY
in this order. (of course order shouldn't matter in a correct program)
 
SlurrerOfSpeech said:
... and I can't figure out why because the inputs are too big for me to see what's going on. Mine is undercalculating (10).

First, I'll totally agree to Mark44. It would be much more helpful for anyone reviewing your code but even more importantly for you to put the appropriate comments into your code. A summary comment doesn't tell much. So, the way it is, leaves you with two options. First, as has already been pointed out, you can use your debugger and see where the code fails. Second, you can write your program in some pseudocode form and see what operation(s) potentially fail and go from there. If you didn't write pseudocode first, including the algorithm you use and then wrote your code based on it as - although not recommended, happens to all programmers sometimes , you can do it now. I think that spotting the error(s) this way is preferable in order to get a complete view for any (feasible) input.

A second point is that, as I see reviewing your code, you 're using a brute force approach. This is not the optimal way to solve the longest common subsequence problem. It has ##O(2^n)## time complexity where ##n## is the number of characters of the first string. Why not use dynamic programming? It has ##O(2^n)## time complexity in the worst case i.e. when all characters of the two examined strings mismatch. Using memorization you can deal with the overlapping substructure problem and reduce it to ##O(mn)## where ##m##,##n## the length of the two strings respectively: ##[0...m - 1], [0...n - 1]##.
 

Similar threads

  • · Replies 9 ·
Replies
9
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
Replies
2
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
Replies
1
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
Replies
2
Views
2K