# DNA sequence alignment

Hi all,
I'm interested in learning more about DNA sequence alignment and have been reading up on the topic online.

I'm more interested in the Smith-Waterman algorithm for local alignment, but I'm quite confused about how the algorithm works.

I know the algorithm works on an M×N matrix, where M and N are the lengths of the two DNA sequences, but I'm not sure how the entries of the matrix are computed. Also, I keep coming across the substitution matrices PAM and BLOSUM, but I thought they're mostly used for amino acid sequences and that their entries are predetermined. So how do they fit into the Smith-Waterman algorithm, where the DNA sequences are different in every comparison?

Regards,
Rayne

I have another question. Say we compare sequence A with sequences B, C and D using the Smith-Waterman algorithm, and the maximum scores for the 3 comparisons are 1, 2 and 3 respectively. Does that mean sequences A and D are the most similar and therefore the most useful for future research? If not, how do we determine which 2 sequences are the most similar?

Hi,
Could anybody please point me to a program that computes a distance matrix from DNA or protein sequences?

http://en.wikipedia.org/wiki/Smith-Waterman_algorithm#Example

Your best bet is to work through this example.
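
In case it helps while working through that example, here is a rough Python sketch of how the matrix entries get filled in. It assumes a simple scoring scheme (match = +2, mismatch = -1, linear gap = -1); those numbers and the function name `smith_waterman_matrix` are just my choices for illustration, not something fixed by the algorithm. Note the matrix has one extra row and column of zeros, so it is really (M+1)×(N+1).

```python
def smith_waterman_matrix(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the local-alignment score matrix H for sequences a and b.

    Scoring values are illustrative assumptions, not canonical ones.
    Returns the matrix, the best score, and the cell where it occurs.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment here
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return H, best, best_pos


H, best, pos = smith_waterman_matrix("TTACGTT", "GGACGGG")
print(best, pos)
```

The best local alignment ends at the cell with the highest score; you get the alignment itself by tracing back from that cell until you hit a zero. The traceback is left out here to keep the sketch short.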

If you're new to sequence alignment, I suggest you start with Needleman-Wunsch, which does global alignment - it's simpler, and a precursor to Smith-Waterman.
http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm
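
For comparison, a minimal sketch of the Needleman-Wunsch score calculation might look like the following. Again the scoring values (match = +1, mismatch = -1, gap = -1) and the name `needleman_wunsch_score` are assumptions I picked for the example. The two things to notice are that the first row and column carry accumulating gap penalties, and that cells are allowed to go negative.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the global-alignment score of a and b (illustrative scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                F[i - 1][j] + gap,    # gap in b
                F[i][j - 1] + gap,    # gap in a
            )
    # The global score is in the bottom-right corner.
    return F[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

Smith-Waterman is essentially the same recurrence with two changes: every cell is clamped at zero, and the answer is the maximum over the whole matrix rather than the bottom-right cell.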

As for substitution matrices: for nucleotides, a scoring scheme can penalize substitutions between A and G (both purines) or between C and T (both pyrimidines) less than a purine-to-pyrimidine substitution (or vice versa), just as amino acid matrices penalize substitutions between phenylalanine and tyrosine less (similar side chains!).
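
To make that concrete, here is a small sketch of a nucleotide scoring function along those lines. The actual numbers (+2, -1, -2) and the name `nucleotide_score` are made up for illustration; real scoring schemes choose their own values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def nucleotide_score(x, y, match=2, transition=-1, transversion=-2):
    """Score a pair of nucleotides; the values are illustrative assumptions."""
    if x == y:
        return match
    # Purine<->purine or pyrimidine<->pyrimidine: the milder penalty.
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return transition
    # Purine<->pyrimidine: the harsher penalty.
    return transversion


print(nucleotide_score("A", "G"), nucleotide_score("A", "C"))
```

A function like this would simply take the place of the fixed match/mismatch score in the alignment recurrence. PAM and BLOSUM play the same role for amino acids: they are lookup tables for the score of a pair of residues, which is why their entries can be fixed in advance even though the sequences change from one comparison to the next.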

The reason you come across PAM/BLOSUM is that Smith-Waterman (and Needleman-Wunsch) can be used not only for nucleotide sequence alignment, but for amino acid sequence alignment as well. All that being said, you really ought to ignore substitution matrices for now.