MHB Corresponding character matching probability

vivek1 · Jan 21, 2018

I have a protein dataset of 10000 sequences, where sequence i has length S_i, 1 <= i <= 10000. I extract a k-mer "a" from the first sequence. The probability of each amino acid (a character of a protein sequence) occurring is given by its frequency in the dataset. If I choose a k-mer "b" from another sequence, what is the probability that "b" matches "a" in at least r of the k positions?

Greg · Jan 21, 2018

I believe that would be the probability that k-mer a appears in the remaining 9999 sequences. Without numerical data we can't give an exact value.
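
One way to make the question concrete is sketched below. It rests on assumptions the thread does not state: the characters of "b" are drawn independently from the dataset-wide amino acid frequencies, and "b" is compared with "a" position by position with no shifting. Under those assumptions, position j matches with probability equal to the frequency of the j-th character of "a", so the number of matches is a sum of independent Bernoulli trials with unequal success probabilities, and the "at least r" probability can be accumulated one position at a time. The function names `residue_frequencies` and `prob_at_least_r_matches` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def residue_frequencies(sequences):
    """Estimate each character's probability as its relative frequency in the dataset."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def prob_at_least_r_matches(kmer_a, freqs, r):
    """P(at least r of the k positions match), assuming independent positions.

    Assumption: position j of a random k-mer equals kmer_a[j] with
    probability freqs[kmer_a[j]], independently of the other positions.
    """
    # dist[m] = probability of exactly m matches among the positions seen so far
    dist = [1.0]
    for aa in kmer_a:
        p = freqs.get(aa, 0.0)
        new = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            new[m] += q * (1.0 - p)  # this position does not match
            new[m + 1] += q * p      # this position matches
        dist = new
    return sum(dist[r:])


# Toy data (made up for illustration): A and B each have frequency 0.5
toy_freqs = residue_frequencies(["AAB", "ABB"])
toy_prob = prob_at_least_r_matches("AB", toy_freqs, 1)  # 1 - 0.5 * 0.5 = 0.75
```

This gives the probability for one randomly placed k-mer "b". Asking whether any k-mer in any of the other 9999 sequences matches is a different question, because overlapping k-mers within a sequence are not independent.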
