Sequence Alignment and Dynamic Programming

In summary: Lastly, we would look at the diagonal-above-left square, (1,1), which represents a gap in the alignment, so the score for (2,2) would be the score for (1,1) - 6 (the value for a gap in the scoring matrix). We would then choose the maximum of these three scores as the score for (2,2).I hope this helps to clarify the algorithm and how the values in the scoring matrix are used. I would also recommend checking out some other resources such as Khan Academy's video on dynamic programming and the book "Bioinformatics: Sequence
  • #1
Roo2
47
0

Homework Statement



I'm having trouble understanding dynamic programming as it relates to sequence alignments. I understand from my lecture notes that the scoring matrix used has arbitrary values (in our case +5 for match, -2 for mismatch, and -6 for gap). I therefore understand why square (1,1) in our lecture notes is assigned a score of +5. However, beyond that I have no idea where the numbers in the scoring matrix came from, how to calculate them, etc. I also don't understand how the algorithm knows whether, in a case of non-matching sequence, to assign a mismatch or gap without first seeing the rest of the sequence. I'd really appreciate it if someone could help me figure this out.

Homework Equations



Lecture notes:

http://i3.photobucket.com/albums/y76/Danja91/Dynamic Programming/dynamicprogramming002.jpg

http://i3.photobucket.com/albums/y76/Danja91/Dynamic Programming/dynamicprogramming001.jpg

Web site which seems to explain the rules, but I still don't understand them:

http://www.ibm.com/developerworks/java/library/j-seqalign/index.html

The Attempt at a Solution



None, really. I found the web site, but I still don't get where the numbers are coming from. I kind of got the following:

When assigning a score to a square, it is the maximum of the three squares above, to the left, and diagonal-above-left to the square in question. Left takes priority over above, and above takes priority over diagonal.

I have no idea what the following line means (from the web site)
"
You fill in the empty cell with the maximum of these three numbers:
V1
V2
V3 + 1 if C1 equals C2, or V3 if C1 is not equal to C2, where C1 is the character above the current cell and C2 is the character to the left of the current cell"

Conceptually I get it, but I don't see how it applies to the matrix presented in my lecture notes.

Once again, I would appreciate any help. I haven't been this lost in a while.
 
Physics news on Phys.org
  • #2

Thank you for reaching out for help with understanding dynamic programming as it relates to sequence alignments. I can understand how this concept can be confusing and overwhelming at first. I will do my best to explain the key points and provide some resources that may help you in your understanding.

Firstly, let's start with the scoring matrix. The values in the matrix are not arbitrary, they are based on the concept of similarity between two sequences. In sequence alignments, we are trying to find the best possible alignment between two sequences, which means we want to find the most similar regions between the two sequences. The values in the matrix reflect the likelihood of a match, mismatch, or gap occurring between two characters in the sequences. For example, a match would have a higher value than a mismatch, and a mismatch would have a higher value than a gap. These values are usually determined by studying the frequencies of different types of mutations in a given set of sequences.

Now, let's look at how the algorithm knows whether to assign a mismatch or gap. This is where the concept of dynamic programming comes in. Dynamic programming is a method for solving complex problems by breaking them down into smaller subproblems. In the case of sequence alignments, the algorithm is essentially creating a matrix of all possible alignments and assigning a score to each one. The score is calculated by adding up the values in the scoring matrix for each match, mismatch, or gap in the alignment. The algorithm then chooses the alignment with the highest score as the best alignment.

To understand the line from the website, let's look at the example in your lecture notes. In the matrix, the squares represent the score for the alignment of two sequences up to that point. So, for square (2,2), the score would be the maximum of the three squares above, to the left, and diagonal-above-left to that square. If we follow the algorithm, we would first look at the square above, which is (1,2). This square represents the alignment of the first character in both sequences, which would be a match since they are both A's. So, the score for square (2,2) would be the score for (1,2) + 5 (the value for a match in the scoring matrix). Similarly, we would look at the square to the left, (2,1), which represents a mismatch between the characters C and A, so the score for (2
 

What is sequence alignment?

Sequence alignment is a process of comparing two or more sequences of DNA, RNA, or protein to identify regions of similarity. It helps to understand the evolutionary relationship between different species and to predict the function of newly discovered genes or proteins.

What is dynamic programming in sequence alignment?

Dynamic programming is a method used in sequence alignment to find the optimal alignment between two sequences. It involves breaking down a large problem into smaller subproblems and solving them recursively. This helps to reduce the time and space complexity of the alignment process.

Why is sequence alignment important?

Sequence alignment helps in identifying conserved regions and variations in genetic sequences, which can provide insights into the function, structure, and evolution of genes and proteins. It is also used in disease diagnosis, drug design, and evolutionary studies.

What are the different types of sequence alignment?

The main types of sequence alignment are global alignment, local alignment, and multiple sequence alignment. Global alignment compares the entire length of the sequences, while local alignment focuses on identifying regions of similarity. Multiple sequence alignment aligns three or more sequences simultaneously.

What is the difference between pairwise and multiple sequence alignment?

Pairwise sequence alignment compares two sequences at a time, while multiple sequence alignment compares three or more sequences. Pairwise alignment is used to identify the best alignment between two sequences, whereas multiple sequence alignment is used to identify conserved regions among multiple sequences.

Similar threads

  • Programming and Computer Science
Replies
1
Views
1K
  • Engineering and Comp Sci Homework Help
Replies
10
Views
2K
  • STEM Academic Advising
Replies
13
Views
2K
  • Quantum Physics
Replies
6
Views
1K
  • Astronomy and Astrophysics
3
Replies
72
Views
6K
  • Introductory Physics Homework Help
Replies
2
Views
2K
  • Calculus and Beyond Homework Help
Replies
3
Views
1K
  • STEM Academic Advising
Replies
33
Views
7K
  • Beyond the Standard Models
Replies
14
Views
3K
  • Aerospace Engineering
Replies
5
Views
7K
Back
Top