DNA Sequence Alignment: Understanding the Smith-Waterman Algorithm

Discussion Overview

The discussion centers on the Smith-Waterman algorithm for DNA sequence alignment, particularly its mechanics, the role of substitution matrices like PAM and BLOSUM, and the interpretation of alignment scores. Participants express confusion about the algorithm's matrix entries and how substitution matrices apply to DNA sequences.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Homework-related

Main Points Raised

  • Rayne expresses confusion about how the entries of the MxN matrix in the Smith-Waterman algorithm are determined and questions the relevance of PAM and BLOSUM matrices for DNA sequences.
  • Some participants suggest resources like the NCBI BLAST help page for further understanding of substitution matrices.
  • Rayne questions how to interpret maximum scores from comparisons of multiple sequences using the Smith-Waterman algorithm to determine similarity.
  • Another participant requests programs for computing distance matrices from DNA or protein sequences.
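  • A later reply points to worked examples, notes that nucleotide substitutions can also be scored unequally (purine-purine and pyrimidine-pyrimidine changes are penalized less than purine-pyrimidine changes), and suggests setting substitution matrices aside while first learning the Smith-Waterman algorithm.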

Areas of Agreement / Disagreement

Participants generally agree that the Smith-Waterman algorithm is easier to grasp once substitution matrices are set aside. The question of how to interpret maximum alignment scores across several comparisons goes unanswered in the thread.

Contextual Notes

There are limitations in understanding the derivation of matrix entries and the specific conditions under which substitution matrices apply. The discussion does not resolve these uncertainties.

Who May Find This Useful

Individuals interested in bioinformatics, particularly those studying DNA sequence alignment and the Smith-Waterman algorithm, may find this discussion relevant.

wu_weidong
Hi all,
I'm interested in learning more about DNA sequence alignment and have been reading up on the topic online.

I'm more interested in the Smith-Waterman algorithm for local alignment, but I'm quite confused about how the algorithm works.

I know the algorithm works on an M×N matrix, where M and N are the lengths of the two DNA sequences, but I'm not sure how the entries of the matrix are computed. Also, I keep coming across the substitution matrices PAM and BLOSUM, but I thought they were mostly used for amino acid sequences and that their entries are predetermined. So how do they fit into the Smith-Waterman algorithm, where the DNA sequences differ from one comparison to the next?

Thank you.

Regards,
Rayne
 
I have another question. Say we compare sequence A with sequences B, C and D using the Smith-Waterman algorithm, and the maximum scores for the 3 comparisons are 1, 2 and 3 respectively. Does that mean sequences A and D are the most similar and therefore the most useful for future research? If not, how do we determine which 2 sequences are the most similar?

Thanks.
 
Hi,
Could anybody please point me to a program that computes a distance matrix from DNA or protein sequences?
 
sundus said:
Hi,
Could anybody please point me to a program that computes a distance matrix from DNA or protein sequences?

http://www.megasoftware.net/
 
http://en.wikipedia.org/wiki/Smith-Waterman_algorithm#Example

Your best bet is to work through this example.
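
To make the "how do the entries come about" part concrete, here is a minimal sketch of the matrix fill in Python. The function names and the scoring values (+3 for a match, -3 for a mismatch, -2 per gap) are illustrative assumptions, not anything fixed by the algorithm. Note that the matrix is usually built with one extra row and column of zeros, so it is (M+1)×(N+1) rather than M×N.

```python
def smith_waterman_matrix(a, b, match=3, mismatch=-3, gap=-2):
    """Fill the Smith-Waterman scoring matrix for sequences a and b.

    Scoring values are illustrative assumptions; a linear gap penalty is used.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay at zero: a local alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # a[i-1] aligned against a gap
                H[i][j - 1] + gap,    # b[j-1] aligned against a gap
            )
    return H


def best_local_score(H):
    """The local alignment score is the largest entry anywhere in the matrix."""
    return max(max(row) for row in H)


H = smith_waterman_matrix("TGTTACGG", "GGTTGACTA")
print(best_local_score(H))  # 13 with the assumed scoring above
```

Each entry depends only on its upper, left, and upper-left neighbours, so the matrix can be filled row by row. The traceback that recovers the actual alignment is left out to keep the sketch short.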

If you're new to sequence alignment, I suggest you start with Needleman-Wunsch, which does global alignment: it's simpler, and a precursor to Smith-Waterman.
http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm
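
For comparison, a rough sketch of the Needleman-Wunsch score computation, again in Python with assumed scoring values (+1 match, -1 mismatch, -1 per gap) and a hypothetical function name. The two differences from Smith-Waterman are that the first row and column are filled with accumulated gap penalties, and that entries are never reset to zero.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (scoring values are assumptions)."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                F[i - 1][j] + gap,    # a[i-1] aligned against a gap
                F[i][j - 1] + gap,    # b[j-1] aligned against a gap
            )
    # A global alignment must cover both sequences end to end,
    # so the score is read from the bottom-right corner.
    return F[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # 2: three matches, one gap
```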

If you're still stuck, try asking specific questions again, and I'll try to help you out.

As for substitution matrices: substitutions between A and G (both purines) or between C and T (both pyrimidines), known as transitions, are penalized less than a purine-to-pyrimidine change or vice versa (a transversion), just as substitutions between phenylalanine and tyrosine are penalized less in protein alignment because their side chains are similar.
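
A small sketch of what such a nucleotide scoring scheme could look like in Python. The function name and the numbers (+1 match, -1 for a purine-purine or pyrimidine-pyrimidine change, -2 for a purine-pyrimidine change) are made up for illustration; real schemes choose their own values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_substitution_score(x, y, match=1, transition=-1, transversion=-2):
    """Score one aligned pair of nucleotides (values are illustrative only)."""
    if x == y:
        return match
    # Both bases in the same chemical class: the milder penalty applies.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


print(dna_substitution_score("A", "G"))  # -1, purine to purine
print(dna_substitution_score("A", "C"))  # -2, purine to pyrimidine
```

A function like this simply takes the place of the fixed match/mismatch score when an alignment matrix is filled; the rest of the algorithm is unchanged.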

You come across PAM/BLOSUM because Smith-Waterman (and Needleman-Wunsch) can be used not only for nucleotide sequence alignment but for amino acid sequence alignment as well. All that being said, you really ought to ignore substitution matrices for now.
 
