DNA Sequence Alignment: Understanding Smith-Waterman Algorithm

In summary, the Smith-Waterman algorithm fills a score matrix indexed by the positions of the 2 DNA sequences, whose lengths are M and N, and it finds the most similar pair of subsequences by tracing back from the highest-scoring entry of that matrix.
  • #1
wu_weidong
Hi all,
I'm interested in learning more about DNA sequence alignment and have been reading up on the topic online.

I'm more interested in the Smith-Waterman algorithm for local alignment, but I'm quite confused about how the algorithm works.

I know the algorithm works on an MxN matrix, where M and N are the lengths of the 2 DNA sequences, but I'm not sure how the entries of the matrix are computed. Also, I keep coming across the substitution matrices PAM and BLOSUM, but I thought they were mostly used for amino acid sequences and that their entries are predetermined. So how do they fit into the Smith-Waterman algorithm, where the DNA sequences differ from one comparison to the next?

Thank you.

Regards,
Rayne
 
  • #3
I have another question. Say we compare sequence A with sequences B, C and D using the Smith-Waterman algorithm, and the maximum scores for the 3 comparisons are 1, 2 and 3 respectively. Does that mean sequences A and D are the most similar and therefore the most useful for future research? If not, how do we determine which 2 sequences are the most similar?

Thanks.
 
  • #4
Hi,
Could anybody please point me to a program that computes a distance matrix from DNA or protein sequences?
 
  • #5
sundus said:
Hi,
Could anybody please point me to a program that computes a distance matrix from DNA or protein sequences?

http://www.megasoftware.net/
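
For a rough idea of what such a program computes, here is a minimal Python sketch. It is not how MEGA works internally; it assumes the sequences are already aligned to equal length and uses the simple p-distance (the fraction of compared sites that differ). The function names `p_distance` and `distance_matrix` are made up for this illustration.

```python
def p_distance(s, t):
    """Fraction of differing sites between two pre-aligned sequences.

    Assumption: s and t have equal length, with '-' marking gaps.
    Sites where either sequence has a gap are skipped.
    """
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(s, t):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(seqs):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    names = list(seqs)
    return {a: {b: p_distance(seqs[a], seqs[b]) for b in names} for a in names}


if __name__ == "__main__":
    aligned = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "AC-TTCGA"}
    for name, row in distance_matrix(aligned).items():
        print(name, row)
```

Real packages offer model-corrected distances (for example Jukes-Cantor) on top of this; the sketch only shows the uncorrected case.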
 
  • #6
http://en.wikipedia.org/wiki/Smith-Waterman_algorithm#Example

Your best bet is to work through this example.

If you're new to sequence alignment, I suggest you start with Needleman-Wunsch, which does global alignment - it's simpler, and a precursor to Smith-Waterman.
http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm

If you're still stuck, try asking specific questions again, and I'll try to help you out.

As for substitution matrices: a DNA scoring scheme can penalize substitutions between A and G (purines) or between C and T (pyrimidines) less than a purine-to-pyrimidine substitution (or vice versa), just as PAM/BLOSUM penalize substitutions between phenylalanine and tyrosine less (similar side chains!). Either way, the entries are fixed per pair of residues, not per pair of sequences, so the same matrix is reused for every comparison.

You come across PAM/BLOSUM because Smith-Waterman (and Needleman-Wunsch) can be used not only for nucleotide sequence alignment but for amino acid sequence alignment as well. All that being said, you really ought to ignore substitution matrices for now.
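
If you do want to see what a DNA substitution matrix looks like, here is a minimal Python sketch of the purine/pyrimidine idea. The score values (+1, -1, -2) and the names `nucleotide_score` and `SCORES` are assumptions chosen for illustration, not values from any standard matrix.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def nucleotide_score(x, y, match=1, transition=-1, transversion=-2):
    """Score one pair of bases (illustrative values, not a standard matrix).

    A substitution within the same chemical class (A<->G or C<->T) is
    penalized less than one across classes.
    """
    if x == y:
        return match
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


# The "matrix" is a lookup table built once and reused for every pair
# of sequences that gets compared.
SCORES = {(x, y): nucleotide_score(x, y) for x in "ACGT" for y in "ACGT"}

if __name__ == "__main__":
    for x in "ACGT":
        print(x, [SCORES[(x, y)] for y in "ACGT"])
```

The table depends only on which two bases are paired, which is why it can be fixed in advance even though the sequences change.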
 

1. What is DNA sequence alignment?

DNA sequence alignment is a process of comparing two or more DNA sequences to identify similarities and differences between them. It is an important technique in genetics and molecular biology, as it helps us understand the relationships between different organisms and the genetic basis of inherited traits.

2. Why is DNA sequence alignment important?

DNA sequence alignment allows us to identify and analyze genetic variations, mutations, and evolutionary relationships between different organisms. It also helps us understand the functions of different genes and their role in diseases.

3. What is the Smith-Waterman algorithm?

The Smith-Waterman algorithm is a dynamic programming algorithm for local sequence alignment. It scores matches, mismatches and gaps between two sequences and finds the pair of subsequences whose alignment has the highest score.
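
Written as a recurrence, and assuming for simplicity a linear gap penalty $d > 0$ (the symbols $H$, $s$, $a$, $b$ and $d$ are notation introduced here, not taken from the thread), the score matrix for sequences $a$ of length $M$ and $b$ of length $N$ can be sketched as:

```latex
\begin{aligned}
H_{i,0} &= 0 \quad (0 \le i \le M), \qquad H_{0,j} = 0 \quad (0 \le j \le N),\\
H_{i,j} &= \max\bigl\{\, 0,\;
            H_{i-1,j-1} + s(a_i, b_j),\;
            H_{i-1,j} - d,\;
            H_{i,j-1} - d \,\bigr\}
            \quad (1 \le i \le M,\ 1 \le j \le N).
\end{aligned}
```

Here $s(a_i, b_j)$ is the match/mismatch (or substitution matrix) score for the two residues, and the score of the best local alignment is the largest entry $\max_{i,j} H_{i,j}$.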

4. How does the Smith-Waterman algorithm work?

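The algorithm fills a matrix in which each entry holds the best score of any local alignment ending at that pair of positions. Each entry is computed from its upper, left and upper-left neighbours, and negative values are reset to zero so that an alignment can start anywhere. The best local alignment is then recovered by tracing back from the highest-scoring entry until an entry with score zero is reached. Because of the reset to zero, the method can find well-conserved regions even when the sequences as a whole are dissimilar.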
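
The fill and traceback steps can be sketched in Python as follows. This is a minimal illustration, assuming a simple match/mismatch score and a linear gap penalty; the function name `smith_waterman` and the default values are choices made for this sketch, and production tools typically use substitution matrices and affine gap penalties instead.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    Assumptions of this sketch: fixed match/mismatch scores and a
    linear gap penalty (gap is a negative number added per gap symbol).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    # Row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("TGTTACGG", "GGTTGACTA")
    print(score)
    print(top)
    print(bottom)
```

Note that the matrix has (M+1) x (N+1) entries: the extra row and column of zeros give every alignment a place to start.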

5. What are some applications of the Smith-Waterman algorithm?

The Smith-Waterman algorithm is widely used in bioinformatics for DNA and protein sequence analysis, including identifying homologous sequences, predicting protein structures, and detecting mutations and genetic variations. It is also used in medical research for understanding disease mechanisms and developing personalized treatments.
