## Homework Statement

I have a 4-letter DNA sequence (AGGA) that appears 10 times in a strand that is 1027 letters long. The probability of finding this sequence at any random position is 1/256. What is the Z-score of this observation?

## Homework Equations

##Z=\frac{(X-E(X))}{\sqrt{Var(X)}}##

## The Attempt at a Solution

First I need to find ##E(X)##. A strand of 1027 letters has 1024 possible starting positions for a 4-letter sequence (I counted how many fit in a 10-letter strand and generalized to n-3). Multiplying the number of positions by the probability of finding the sequence at any one position gives 1024/256=4, so ##E(X)=4##. The observed count was 10, so all I still need is the square root of the variance, ##Var(X)=E(X^2)-E(X)^2##. I have ##E(X)##, so I just need to find ##E(X^2)##.
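The counting step above can be checked with a short Python sketch. The helper name `count_windows` is made up for illustration; the value 1/256 is taken directly from the problem statement rather than derived.

```python
from fractions import Fraction

def count_windows(strand_length: int, motif_length: int) -> int:
    """Number of starting positions where a motif of the given length fits."""
    return strand_length - motif_length + 1

# Sanity check on the small case used in the post: a 10-letter strand.
small_case = count_windows(10, 4)   # 7 positions, i.e. n - 3

# The actual strand: 1027 letters, 4-letter motif.
n_positions = count_windows(1027, 4)

# Per-position probability as given in the problem statement.
p = Fraction(1, 256)

# Expected count by linearity of expectation: positions times probability.
expected = n_positions * p

print(small_case, n_positions, expected)
```

Exact fractions are used so that the result comes out as exactly 4 instead of a rounded float.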

As before, this is where I struggle. I'm not given a distribution or anything. All I know is the observed frequency of the sequence in my current strand: 10/1024=0.009766. But I don't think this is what I should use to find the variance, as it would give me a negative result for the Z value. My other idea was to just say that for this strand the variance is 6 (because 10 is 6 away from the mean of 4). Then ##\sqrt{6}\approx 2.45##, giving ##\frac{10-4}{2.45}\approx 2.45##, which is more along the lines of what I expected.
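For comparison, here is a minimal sketch that evaluates the guess above next to one possible alternative. The alternative rests on an assumption that is not in the problem statement: each of the 1024 positions is treated as an independent trial with success probability 1/256, so the count would be binomial with variance ##np(1-p)##. Overlapping windows are not truly independent, so this is only an approximation.

```python
import math

n_positions = 1024      # starting positions in the 1027-letter strand
p = 1 / 256             # per-position probability from the problem statement
observed = 10           # observed number of AGGA occurrences
expected = n_positions * p

# Second idea from the post: take the variance to be the deviation, 6.
z_guess = (observed - expected) / math.sqrt(6)

# ASSUMPTION (not given in the problem): independent trials per position,
# so the count is binomial and Var(X) = n * p * (1 - p).
var_binom = n_positions * p * (1 - p)
z_binom = (observed - expected) / math.sqrt(var_binom)

print(f"guess:    Var = 6, Z = {z_guess:.3f}")
print(f"binomial: Var = {var_binom:.3f}, Z = {z_binom:.3f}")
```

Under the binomial assumption the variance comes out close to the mean of 4, so the resulting Z is somewhat larger than the one from the guess of 6.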

Is this second method correct or is there something else that I'm not figuring out here?

Thanks for the help.
