chi-square test | Improve Society

How to perform a chi-square test with cross tabulation?

You can perform a $\chi^2$ test on a cross tabulation with the formula below. For each cell, subtract the expected value (E) from the observed value (O), square the difference, divide it by the expected value, and then sum the results over all cells.

$\displaystyle\chi^2(df)=\sum\frac{(O-E)^2}{E}$

df: degrees of freedom
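As a minimal sketch of the summation above, the following Python function adds up $(O-E)^2/E$ over all cells. The function name and the example counts are hypothetical, chosen only for illustration; the expected values are assumed to have been calculated beforehand from the marginal totals.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2x2 table, flattened row by row: a, b, c, d
observed = [10, 20, 30, 40]
# Expected values derived from the marginal totals of that table
expected = [12.0, 18.0, 28.0, 42.0]

stat = chi_square_statistic(observed, expected)
print(round(stat, 4))
```

The cells are passed as flat lists, so the same function would also work for tables larger than two-by-two.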

The $\chi^2$ statistic follows the $\chi^2$ distribution. With 1 degree of freedom, the result is significant at p < 0.05 in a one-sided (upper-tail) test if $\chi^2$ exceeds 3.841, at p < 0.01 if $\chi^2$ exceeds 6.635, and at p < 0.001 if $\chi^2$ exceeds 10.828. In a two-tailed test, where the significance level is split equally between both tails, the upper critical value is 5.024 for p < 0.05 and 7.879 for p < 0.01.
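The critical values above can be checked without a statistics library. For 1 degree of freedom, the upper-tail probability of the $\chi^2$ distribution equals $\mathrm{erfc}(\sqrt{x/2})$, so the sketch below uses only the standard `math` module. The function name is hypothetical, and the shortcut holds only for df = 1.

```python
import math


def chi2_upper_tail_df1(x):
    """Upper-tail probability P(X > x) of the chi-square distribution, df = 1 only."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Critical values quoted in the text, paired with their upper-tail areas
for critical in (3.841, 6.635, 10.828, 5.024, 7.879):
    print(critical, round(chi2_upper_tail_df1(critical), 4))
```

The last two values give upper-tail areas of about 0.025 and 0.005, which is half of 0.05 and 0.01.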

	TRUE	FALSE	Marginal total
POSITIVE	a	b	a + b
NEGATIVE	c	d	c + d
Marginal total	a + c	b + d	N

$\displaystyle \begin{array}{rcl}\chi^2&=&(ad-bc)^2\times\frac{N}{(a+b)(c+d)(a+c)(b+d)}\vspace{0.2in}\\\chi^2(Yates)&=&\left(|ad-bc|-\frac{N}{2}\right)^2\times\frac{N}{(a+b)(c+d)(a+c)(b+d)}\end{array}$
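For a two-by-two table, the shortcut formula can be sketched in Python as follows. The function name and the `yates` flag are hypothetical. The continuity correction subtracts $N/2$ from $|ad-bc|$, which is the standard form of Yates' correction for this shortcut. Clamping the corrected difference at zero is an added assumption, so that the correction never pushes the difference past zero.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic of a 2x2 table with cells a, b (row 1) and c, d (row 2)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every marginal total must be greater than zero")
    difference = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction: subtract N/2, clamped at zero (assumption)
        difference = max(difference - n / 2.0, 0.0)
    return difference ** 2 * n / denominator


print(round(chi_square_2x2(10, 20, 30, 40), 4))
print(round(chi_square_2x2(10, 20, 30, 40, yates=True), 4))
```

Without the correction, the result agrees with the cell-by-cell sum of $(O-E)^2/E$ for the same table.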

When should you use Fisher's exact test instead of the chi-square test?

You should use Fisher's exact probability test instead of the chi-square test when the grand total of the cross tabulation is smaller than 20, or when one or more cells have an expected value smaller than 5. In this article, I would like to describe how to calculate the expected values. Each expected value is calculated from the marginal totals.

Consider the cross tabulation below:

	TRUE	FALSE	Marginal total
POSITIVE	a	b	a + b
NEGATIVE	c	d	c + d
Marginal total	a + c	b + d	N

The expected value of each cell is shown below:

	TRUE	FALSE	Marginal total
POSITIVE	(a + b)*(a + c)/N	(a + b)*(b + d)/N	a + b
NEGATIVE	(c + d)*(a + c)/N	(c + d)*(b + d)/N	c + d
Marginal total	a + c	b + d	N
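The expected values and the rule for choosing Fisher's exact test could be sketched in Python as below. Both function names are hypothetical. The thresholds (grand total smaller than 20, any expected value smaller than 5) are the ones stated in this article.

```python
def expected_values(a, b, c, d):
    """Expected value of each cell: (row total) * (column total) / N."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table must not be empty")
    row_totals = (a + b, c + d)
    column_totals = (a + c, b + d)
    return [[row * column / n for column in column_totals] for row in row_totals]


def should_use_fisher(a, b, c, d):
    """True if the grand total is below 20 or any expected value is below 5."""
    n = a + b + c + d
    cells = [e for row in expected_values(a, b, c, d) for e in row]
    return n < 20 or any(e < 5 for e in cells)


print(expected_values(10, 20, 30, 40))
print(should_use_fisher(10, 20, 30, 40))
print(should_use_fisher(3, 1, 1, 3))
```

In the second table the grand total is only 8, so the rule selects Fisher's exact test.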

How to calculate Fisher's exact test with logarithms?

The chi-square test is commonly used to compare proportions in a two-by-two table. However, you cannot use it when the total number is smaller than 20 or an expected value is smaller than 5.

Even when you cannot use the chi-square test, you can use Fisher's exact test to calculate an exact p-value. Although the test is reliable, it requires a large amount of calculation with factorials, which may overflow in software. The calculation becomes easy if you convert the factorials to logarithms first. Next, add or subtract the logarithms. Finally, raise e, the base of the natural logarithm, to the power of the result.

	TRUE	FALSE	Marginal total
POSITIVE	a	b	a + b
NEGATIVE	c	d	c + d
Marginal total	a + c	b + d	N

$\displaystyle \begin{array} {rcl} P &=& \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{N!a!b!c!d!}\vspace{0.2in}\\&=& \exp \left[ LN \left( \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{N!a!b!c!d!} \right) \right]\vspace{0.2in}\\ &=& \exp [ LN((a+b)!) + LN((c+d)!) + LN((a+c)!) + LN((b+d)!)\vspace{0.2in}\\& & - LN(N!) - LN(a!) - LN(b!) - LN(c!) - LN(d!) ]\end{array}$
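The logarithmic form above can be sketched in Python with `math.lgamma`, which returns $\ln\Gamma(x)$, so that $\ln(n!)$ is `lgamma(n + 1)`. The function names are hypothetical. Note that this gives the probability of the single observed table; a full p-value adds the probabilities of the tables that are as extreme or more extreme with the same marginal totals.

```python
import math


def log_factorial(n):
    """Natural logarithm of n!, computed without forming the factorial itself."""
    return math.lgamma(n + 1)


def fisher_table_probability(a, b, c, d):
    """Probability of one 2x2 table under fixed marginal totals."""
    n = a + b + c + d
    log_p = (
        log_factorial(a + b) + log_factorial(c + d)
        + log_factorial(a + c) + log_factorial(b + d)
        - log_factorial(n)
        - log_factorial(a) - log_factorial(b)
        - log_factorial(c) - log_factorial(d)
    )
    # Convert back from the logarithm only at the very end
    return math.exp(log_p)


print(round(fisher_table_probability(3, 1, 1, 3), 4))
```

Because only sums and differences of logarithms are formed, large counts such as `fisher_table_probability(500, 400, 450, 550)` do not produce huge intermediate factorial values.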
