A statistical question

1. Feb 17, 2017

Faiq

1. The problem statement, all variables and given/known data
Can someone tell me why, when testing for independence using a chi-square test, the expected frequency of a cell is given by the formula
$$\frac{\sum \text{row} \times \sum \text{column}}{\sum \text{total}}$$

2. Feb 17, 2017

BvU

Is this a human cell or a plant cell ? Or perhaps an excell ? Some more description might make your question a bit clearer, perhaps ?

3. Feb 17, 2017

Ray Vickson

In a two-way table, if $R_i$ is the total number in row $i$ and $C_j$ the total number in column $j$ then $f_i = R_i/N$ is the estimated probability of the event for row $i$ and $g_j = C_j/N$ is the estimated probability of the event for column $j$. Here, $N = \sum_i R_i = \sum_j C_j$ is the total number of observations. Under the hypothesis of independence between rows and columns, the estimated probability of the cell $(i,j)$ is $\bar{p}_{ij} = f_i \,g_j = R_i C_j/N^2.$ Thus, the expected frequency of cell $(i,j)$ is $E_{ij} = N \bar{p}_{ij} = R_i C_j/N.$
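
For anyone who wants to see the formula $E_{ij} = R_i C_j/N$ in action, here is a minimal Python sketch. The function name `expected_frequencies` and the counts in `observed` are made up for illustration; they are not from any particular data set or library.

```python
def expected_frequencies(table):
    """Return E[i][j] = R_i * C_j / N for a two-way table of counts."""
    row_totals = [sum(row) for row in table]        # R_i
    col_totals = [sum(col) for col in zip(*table)]  # C_j
    n = sum(row_totals)                             # N, the grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed counts: 2 rows, 3 columns.
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Both rows here have the same total (60), so under independence they get the same expected counts, split in proportion to the column totals 30, 40 and 50.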

4. Feb 17, 2017

haruspex

The null hypothesis is that the two attributes are independent. Call the attributes A (rows representing A and not A) and B (columns representing B and not B). If they are independent then the fraction having attribute A multiplied by the fraction having attribute B should approximately equal the fraction having attributes A and B. I.e. #(A & B) / total = ( #A / total)*( #B /total), so #(A & B) = #A * #B / total.
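
As a quick illustration with made-up numbers (not from any real data): suppose total = 100, #A = 40 and #B = 30. Under independence the expected count in the (A & B) cell would be
$$\#(A \,\&\, B) = \frac{\#A \times \#B}{\text{total}} = \frac{40 \times 30}{100} = 12,$$
which is the same as taking 40% of the 30 items that have attribute B.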

5. Feb 17, 2017

Faiq

Shouldn't it be $N = \sum_i R_i + \sum_j C_j$?

6. Feb 17, 2017

Ray Vickson

No. Try it for yourself on a simple example:
$$\begin{array}{ccc|l} & & &\text{tot.} \\ \hline 1 & 2 & 3 &6\\ 4 & 5 & 6 & 15 \\ 7 & 8 & 9 & 24\\ \hline 12 & 15 & 18 & 45\; \leftarrow \text{totals} \end{array}$$
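
The same check can be sketched in a few lines of Python, using the table above. The variable names are just illustrative. The row totals and the column totals each add up to the grand total on their own, so adding the two sums together would count every observation twice.

```python
table = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

row_totals = [sum(row) for row in table]        # R_i
col_totals = [sum(col) for col in zip(*table)]  # C_j
n = sum(sum(row) for row in table)              # N, counted cell by cell

print(row_totals, sum(row_totals))  # row totals and their sum
print(col_totals, sum(col_totals))  # column totals and their sum

# Adding both sums counts each cell twice, giving 2N rather than N.
double_counted = sum(row_totals) + sum(col_totals)
print(n, double_counted)
```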

Last edited: Feb 17, 2017