MHB Corresponding character matching probability

  • Thread starter: vivek1
  • Tags: Probability
Summary
The discussion focuses on calculating the probability of matching k-mers in a dataset of 10,000 protein sequences, where the probability of each amino acid is taken to be its frequency in the dataset. The question is how likely it is that a k-mer "b" matches a k-mer "a" in at least r of its k positions. The reply concludes that, without specific numerical data, an exact probability cannot be given.
vivek1
I have a dataset of 10,000 protein sequences, where sequence i has length S_i and 1 <= i <= 10000. I extracted a k-mer "a" from the 1st sequence. The probability of occurrence of each amino acid (a character of a protein sequence) is given by its frequency in the dataset. If I choose a k-mer "b" from another sequence, what is the probability that "b" matches "a" in at least r of its k positions?
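A minimal sketch of one way to compute this, assuming the residues of "b" are drawn independently from the dataset-wide amino-acid frequencies (overlapping k-mers taken from real proteins are not truly independent, so treat the result as an approximation). Under that assumption, position j matches with probability p_j = f(a_j), the frequency of the j-th residue of "a". The number of matches M is then a sum of k independent Bernoulli variables with different p_j (a Poisson binomial distribution), and the answer is P(M >= r) = sum over m = r..k of P(M = m). The function names `residue_frequencies` and `prob_at_least_r_matches` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def residue_frequencies(sequences: Iterable[str]) -> Dict[str, float]:
    """Background frequency of each amino acid, pooled over all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq)  # count every character of every sequence
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def prob_at_least_r_matches(a: str, freqs: Dict[str, float], r: int) -> float:
    """P(a random k-mer b matches k-mer a in at least r of its k positions).

    Assumption: each residue of b is drawn independently from `freqs`,
    so position j matches with probability freqs[a[j]]. dist[m] holds
    P(exactly m matches among the positions processed so far).
    """
    dist: List[float] = [1.0]  # before any position: 0 matches with probability 1
    for aa in a:
        p = freqs.get(aa, 0.0)  # residues never seen in the dataset cannot match
        new = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            new[m] += q * (1.0 - p)  # this position does not match
            new[m + 1] += q * p      # this position matches
        dist = new
    # Sum the tail m = r..k; r > k gives an empty slice, i.e. probability 0.
    return sum(dist[max(r, 0):])


if __name__ == "__main__":
    seqs = ["ACDE", "EDCA"]  # toy data: A, C, D, E each have frequency 0.25
    freqs = residue_frequencies(seqs)
    print(prob_at_least_r_matches("ACDE", freqs, 1))  # 1 - 0.75**4 -> 0.68359375
```

The dynamic program costs O(k^2) per k-mer and does not need the p_j to be equal. If all positions happened to share the same match probability p, it reduces to the ordinary binomial tail sum over m = r..k of C(k, m) p^m (1 - p)^(k - m).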
 
I believe that would be the probability that k-mer a appears in the remaining 9999 sequences. Without numerical data we can't give an exact value.
 
