DNA Sequence Alignment: Understanding the Smith-Waterman Algorithm

AI Thread Summary
The discussion centers on the Smith-Waterman algorithm for local DNA sequence alignment. The algorithm fills a scoring matrix whose dimensions are set by the lengths M and N of the two sequences. The original poster is unsure how the matrix entries are computed and how substitution matrices such as PAM and BLOSUM fit in. Those matrices were derived for amino acid sequences; they come up because Smith-Waterman can align proteins as well as nucleotides. The reply notes that substitution penalties depend on which nucleotides or amino acids are exchanged. A follow-up question asks how to interpret the scores from several pairwise comparisons when deciding which sequences are most similar. Recommendations include working through a specific example and starting with the simpler Needleman-Wunsch algorithm.
wu_weidong
Hi all,
I'm interested in learning more about DNA sequence alignment and have been reading up on the topic online.

I'm mainly interested in the Smith-Waterman algorithm for local alignment, but I'm quite confused about how it works.

I know the algorithm works on an M×N matrix, where M and N are the lengths of the two DNA sequences, but I'm not sure how the entries of the matrix are computed. Also, I keep coming across the substitution matrices PAM and BLOSUM, but I thought they were mostly used for amino acid sequences and that their entries were predetermined. So how do they fit into the Smith-Waterman algorithm, where the DNA sequences differ from one comparison to the next?

Thank you.

Regards,
Rayne
 
I have another question. Say we compare sequence A with sequences B, C and D using the Smith-Waterman algorithm, and the maximum scores for the three comparisons are 1, 2 and 3 respectively. Does that mean sequences A and D are the most similar and therefore the most useful for future research? If not, how do we determine which two sequences are the most similar?

Thanks.
 
Hi,
Could anybody point me to a program that computes a distance matrix from DNA or protein sequences?
 
http://www.megasoftware.net/
 
http://en.wikipedia.org/wiki/Smith-Waterman_algorithm#Example

Your best bet is to work through this example.
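If it helps to see where the matrix entries come from, here is a minimal Python sketch of the fill step. The scores (match=2, mismatch=-1, gap=-1) and the function name are my own assumptions for illustration, not values taken from the linked example. Note that the matrix is actually (M+1) x (N+1): there is an extra row and column of zeros at the start.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the local-alignment score matrix for sequences a and b.

    The scoring values are illustrative assumptions (linear gap penalty).
    Returns the matrix, the best score, and the cell where it occurs.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay at zero: an alignment can start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # restart: this is what makes it local
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return H, best, best_pos


H, best, pos = smith_waterman("TTACGTT", "GGACGGG")
print(best, pos)
```

Each entry depends only on its three neighbours (up, left, diagonal) and the score for the pair of letters at that position, so nothing about the matrix is predetermined by the sequences themselves. The best local alignment ends at the cell with the highest score; tracing back from there until you hit a zero recovers it.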

If you're new to sequence alignment, I suggest you start with Needleman-Wunsch - it does global rather than local alignment, but it's simpler, and a precursor to Smith-Waterman.
http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm

If you're still stuck, try asking specific questions again, and I'll try to help you out.

As for substitution matrices - substitutions between A and G (purines) or between C and T (pyrimidines), known as transitions, are often penalized less than a purine-to-pyrimidine change or vice versa (a transversion). It's the same idea as penalizing substitutions between phenylalanine and tyrosine less in proteins (similar side chains!).
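To make that concrete, here is a small Python sketch of a nucleotide scoring function that treats the two kinds of substitution differently. The function name and the numeric values are assumptions chosen for illustration, not a standard matrix.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(x, y, match=2, transition=-1, transversion=-2):
    """Score one aligned pair of nucleotides (illustrative values only)."""
    if x == y:
        return match
    # A<->G and C<->T stay within one chemical class: a transition.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


print(substitution_score("A", "G"))  # transition
print(substitution_score("A", "C"))  # transversion
```

A function like this could stand in for a flat mismatch penalty in whichever alignment routine you use. PAM and BLOSUM play the same role for proteins, just as a 20 x 20 lookup table instead of a two-class rule.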

The reason you come across PAM/BLOSUM is that Smith-Waterman (and Needleman-Wunsch) can be used not only for nucleotide sequence alignment but for amino acid sequence alignment as well. All that being said, you really ought to ignore substitution matrices for now.
 