Sequence Alignment and Dynamic Programming

Roo2 · Oct 4, 2011

Homework Statement

I'm having trouble understanding dynamic programming as it relates to sequence alignments. I understand from my lecture notes that the scoring matrix used has arbitrary values (in our case +5 for match, -2 for mismatch, and -6 for gap). I therefore understand why square (1,1) in our lecture notes is assigned a score of +5. However, beyond that I have no idea where the numbers in the scoring matrix came from, how to calculate them, etc. I also don't understand how the algorithm knows whether, in a case of non-matching sequence, to assign a mismatch or gap without first seeing the rest of the sequence. I'd really appreciate it if someone could help me figure this out.

Homework Equations

Lecture notes:

http://i3.photobucket.com/albums/y76/Danja91/Dynamic Programming/dynamicprogramming002.jpg

http://i3.photobucket.com/albums/y76/Danja91/Dynamic Programming/dynamicprogramming001.jpg

Web site which seems to explain the rules, but I still don't understand them:

http://www.ibm.com/developerworks/java/library/j-seqalign/index.html

The Attempt at a Solution

None, really. I found the web site, but I still don't get where the numbers are coming from. I kind of got the following:

When assigning a score to a square, it is the maximum of the three squares above, to the left, and diagonal-above-left to the square in question. Left takes priority over above, and above takes priority over diagonal.

I have no idea what the following line means (from the web site):

"You fill in the empty cell with the maximum of these three numbers:
V1
V2
V3 + 1 if C1 equals C2, or V3 if C1 is not equal to C2, where C1 is the character above the current cell and C2 is the character to the left of the current cell"

Conceptually I get it, but I don't see how it applies to the matrix presented in my lecture notes.

Once again, I would appreciate any help. I haven't been this lost in a while.

nate808 · Oct 5, 2011

Thank you for reaching out for help with understanding dynamic programming as it relates to sequence alignments. I can understand how this concept can be confusing and overwhelming at first. I will do my best to explain the key points and provide some resources that may help you in your understanding.

First, the scoring values. The +5, -2, and -6 are chosen before the algorithm runs rather than calculated by it, but they are not picked at random: they say how strongly an alignment should reward a match and penalize a mismatch or a gap. In sequence alignment we are looking for the arrangement of the two sequences that makes them as similar as possible, so a match scores higher than a mismatch, and in your scheme a mismatch (-2) scores higher than a gap (-6). In practice, values like these are usually derived from how often different kinds of mutation are observed in sets of related sequences. The other numbers in the grid from your lecture notes are a different thing: they are computed from these three values, one square at a time.

Now, how does the algorithm know whether to assign a mismatch or a gap without seeing the rest of the sequence? It doesn't decide up front. Dynamic programming solves a large problem by breaking it into smaller subproblems and reusing their answers. Here, each square in the matrix holds the best score achievable for aligning the two sequences up to that point. To fill a square, the algorithm tries all three options (align the two characters with each other, or put a gap in one sequence or the other), adds the corresponding value from the scoring scheme to the score already stored in the neighbouring square, and keeps the largest result. Once the whole matrix is filled, the final square holds the score of the best overall alignment, and tracing back from it through the squares that produced each maximum tells you where the matches, mismatches, and gaps are.
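Here is a minimal Python sketch of how the matrix is filled and how an alignment is read back from it. It assumes a standard global alignment with the +5 / -2 / -6 scores from the question and an extra row and column 0 that hold only gap penalties. The function names and the example sequences "ACGT" and "AGT" are made up for illustration; they are not taken from the lecture notes.

```python
MATCH, MISMATCH, GAP = 5, -2, -6  # scores quoted in the question


def fill_matrix(a, b):
    """F[i][j] = best score for aligning the first i chars of a with the first j of b."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # Row 0 and column 0: a prefix aligned against nothing is all gaps.
    for i in range(1, rows):
        F[i][0] = i * GAP
    for j in range(1, cols):
        F[0][j] = j * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            diag = F[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            up = F[i - 1][j] + GAP         # a[i-1] opposite a gap
            left = F[i][j - 1] + GAP       # b[j-1] opposite a gap
            F[i][j] = max(diag, up, left)
    return F


def traceback(a, b, F):
    """Walk back from the last cell, re-deriving which option gave each maximum."""
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        pair = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


F = fill_matrix("ACGT", "AGT")
aligned = traceback("ACGT", "AGT", F)
print(F[1][1], F[-1][-1], aligned)
```

Note that the gap-versus-mismatch choice is never made while filling: every cell just records the best score, and the choice only becomes visible during the walk back. When two options tie, this sketch prefers the diagonal; other tie-breaking orders give other alignments with the same score.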

To connect this to the line from the website, look at the example in your lecture notes. Each square holds the best score for aligning the two sequences up to that point. The website's rule is the same idea with a simpler scoring scheme: +1 for a match and 0 for everything else. With your scheme, something is added to each of the three neighbouring squares before you take the maximum. Take square (2,2). Coming from the square above, (1,2), or from the square to the left, (2,1), means placing a character opposite a gap, so those two candidates are the score in (1,2) - 6 and the score in (2,1) - 6. Coming from the diagonal square, (1,1), means aligning the two characters for that row and column with each other, so that candidate is the score in (1,1) + 5 if the characters match, or the score in (1,1) - 2 if they do not. The score for (2,2) is the largest of the three candidates. Square (1,1) works the same way: the first characters of both sequences are A, a match, which is why it holds +5.
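Written as a formula, the rule for one square looks roughly like this. It assumes the standard global-alignment setup with your +5 / -2 / -6 scores and a row 0 and column 0 filled with gap penalties. The symbols F (the matrix), s (the match/mismatch score), and a_i, b_j (the i-th and j-th characters of the two sequences) are my own notation, not your lecture notes'.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = -6\,i, \qquad F(0,j) = -6\,j, \\[4pt]
F(i,j) &= \max\bigl\{\, F(i-1,j-1) + s(a_i,b_j),\;\; F(i-1,j) - 6,\;\; F(i,j-1) - 6 \,\bigr\}, \\[4pt]
s(a_i,b_j) &=
\begin{cases}
+5 & \text{if } a_i = b_j \\
-2 & \text{if } a_i \neq b_j
\end{cases} \\[4pt]
F(1,1) &= \max\{\, 0 + 5,\; -6 - 6,\; -6 - 6 \,\} = 5 \quad \text{when } a_1 = b_1 .
\end{aligned}
```

The last line is the +5 you already worked out for square (1,1); every other square is filled with exactly the same three-way comparison.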
