yevi
hmm Anova seems interesting
The discussion revolves around applying empirical tests to analyze DNA sequences for complexity and randomness. Participants explore various statistical methods, including frequency tests, runs tests, and chi-square tests, while considering how to convert DNA sequences into suitable formats for these analyses.
Participants express differing views on the best representation for DNA sequences (8-bit vs. 2-bit) and the implications of these choices on randomness tests. There is no consensus on the optimal approach or representation method, and the discussion remains unresolved regarding the impact of different representations on test outcomes.
Participants note that the tests discussed may not be statistically independent and that the terminology used in some references may differ from standard statistical definitions. There is also uncertainty regarding the sufficiency of bit representation for accurately assessing randomness.
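To make the 8-bit vs. 2-bit question concrete, here is a minimal sketch of both conversions. The 2-bit mapping (A=00, C=01, G=10, T=11) and the function names are assumptions for illustration; the thread does not settle on a particular mapping.

```python
# Hypothetical 2-bit mapping; any one-to-one assignment of bases to bit pairs would do.
TWO_BIT = {"A": "00", "C": "01", "G": "10", "T": "11"}


def to_2bit(seq):
    """Encode each base as two bits using the assumed mapping above."""
    return "".join(TWO_BIT[base] for base in seq.upper())


def to_8bit(seq):
    """Encode each base as the 8-bit ASCII code of its letter."""
    return "".join(format(ord(base), "08b") for base in seq.upper())


def ones_fraction(bits):
    """Fraction of 1 bits; a simple frequency test compares this to 0.5."""
    return bits.count("1") / len(bits)


if __name__ == "__main__":
    seq = "ACGT"
    print(to_2bit(seq), ones_fraction(to_2bit(seq)))
    print(to_8bit(seq), ones_fraction(to_8bit(seq)))
```

With this mapping the balanced sequence ACGT has a ones fraction of 0.5 in 2-bit form but 0.375 in 8-bit ASCII form, because the ASCII codes of the four letters are not balanced themselves. A bit-level frequency test on the 8-bit form therefore partly measures the encoding rather than the sequence.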
yevi said: I am not sure that circular ordering like this is suitable and can be used for run
You can use any ordering you like as long as you "trick" the test into treating the endpoints the same way as the interior points.
yevi said: hmm Anova seems interesting
Excel Analysis ToolPak Add-In (which, if installed, will show up as an item "Data Analysis" under the "Tools" menu) also has built-in ANOVA packages -- which I have never used because (for me) regression is more intuitive and practical.
Run test. A sequence may also be tested for "runs up" and "runs down."
This means we examine the length of monotone subsequences of the original
sequence, i.e., segments that are increasing or decreasing.
As an example of the precise definition of a run, consider the sequence of ten
numbers "1298536704"; putting a vertical line at the left and right and between
Xj and Xj+1 whenever Xj > Xj+1, we obtain | 1 2 9 | 8 | 5 | 3 6 7 | 0 4 |, which displays the "runs up": there is a run of length 3, followed by two runs of length 1, followed by another run of length 3, followed by a run of length 2.
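The "runs up" splitting in the quoted definition can be sketched in a few lines. The function name is made up for illustration, and this only computes the run lengths; it does not carry out the statistical test on them.

```python
def runs_up_lengths(values):
    """Return the lengths of the "runs up" in values.

    A run ends between X_j and X_{j+1} whenever X_j > X_{j+1},
    which is where the quoted definition places a vertical line.
    """
    if not values:
        return []
    lengths = []
    current = 1
    for prev, nxt in zip(values, values[1:]):
        if prev > nxt:
            # Descent found: close the current run and start a new one.
            lengths.append(current)
            current = 1
        else:
            current += 1
    lengths.append(current)  # close the final run
    return lengths


if __name__ == "__main__":
    digits = [int(c) for c in "1298536704"]
    print(runs_up_lengths(digits))
```

For the example sequence 1298536704 this gives run lengths 3, 1, 1, 3, 2, matching the quoted text.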
See also: http://www.statisticssolutions.com/Chi_square_test.htm
http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test said: The approximation to the chi-square distribution breaks down if expected frequencies are too low. It will normally be acceptable so long as no more than 10% of the events have expected frequencies below 5. Where there is only 1 degree of freedom, the approximation is not reliable if expected frequencies are below 10. In this case, a better approximation can be had by reducing the absolute value of each difference between observed and expected frequencies by 0.5 before squaring; this is called Yates' correction.
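As a rough sketch of what the quoted passage describes, the statistic with and without Yates' correction could be computed like this. The function names are hypothetical, and clamping the corrected difference at zero is my own assumption for the case where |O - E| is already below 0.5; the quote does not address that case.

```python
import math


def chi_square_stat(observed, expected, yates=False):
    """Pearson chi-square statistic, optionally with Yates' correction."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            # Reduce |O - E| by 0.5 before squaring.
            # Clamping at 0 is an assumption, not part of the quoted rule.
            diff = max(diff - 0.5, 0.0)
        total += diff * diff / e
    return total


def low_expected_share(expected, threshold=5):
    """Share of cells whose expected frequency is below threshold."""
    return sum(1 for e in expected if e < threshold) / len(expected)


def p_value_1dof(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    obs, exp = [60, 40], [50, 50]
    plain = chi_square_stat(obs, exp)
    corrected = chi_square_stat(obs, exp, yates=True)
    print(plain, p_value_1dof(plain))
    print(corrected, p_value_1dof(corrected))
```

In the two-cell example (60 and 40 observed against 50 and 50 expected) the statistic drops from 4.0 to 3.61 once the correction is applied, so the correction makes the test more conservative.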
yevi said: For frequency within a block test I prefer to use Chi-square (Pearson's), like I did in "standard" frequency test.
I understand. Chi-sq. is nonparametric, which some people take as an advantage. OTOH, the parametric regression/ANOVA approach lets you test many hypotheses simultaneously (jointly), including "difference-in-differences." In those respects the regression/ANOVA approach can be nested to an arbitrary depth.
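For the frequency-within-a-block approach, a minimal sketch might look like the following. It works on the bases directly rather than on bits, and it assumes all four bases are equally likely under the null hypothesis; both choices, along with the function name, are assumptions and not something the thread specifies.

```python
from collections import Counter


def block_chi_square(seq, block_size, alphabet="ACGT"):
    """Pearson chi-square statistic for each full block of seq.

    Assumes every symbol in alphabet is equally likely, so each one
    has expected count block_size / len(alphabet) per block.
    A trailing partial block is discarded.
    """
    expected = block_size / len(alphabet)
    stats = []
    for start in range(0, len(seq) - block_size + 1, block_size):
        counts = Counter(seq[start:start + block_size])
        stats.append(
            sum((counts[base] - expected) ** 2 / expected for base in alphabet)
        )
    return stats


if __name__ == "__main__":
    # Tiny blocks for illustration only; expected counts here are far too low
    # for the chi-square approximation to be trusted.
    print(block_chi_square("ACGTAAAA", 4))
```

Each per-block statistic has 3 degrees of freedom under these assumptions. In practice the block size should be large enough that the expected count per base stays comfortably above 5.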