I have a dataset of protein, consisting of 10000 sequence each, having length Si
, where 1<=i<=10000. Now, I extracted k-mer "a" from the 1st sequence. The probability of occurrence of amino acid (character of protein sequence) is given by its frequency in the dataset. If I choose k-mer "b" from...