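A few days ago a doctor friend messaged me: the editor handling his (SCI) paper had asked why some of the analyses used the chi-squared test while others used Fisher's method.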

(Probably not a top-tier journal, or the question would not have come up.)

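I had done the statistics for him, so naturally I had to see it through. I copied a few sentences from Wikipedia [1] and lightly modified the later part: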

Fisher's exact test is a statistical significance test used in the analysis of contingency tables. Although in practice it is employed when sample sizes are small, it is valid for all sample sizes.

With large samples, a chi-squared test can be a good choice, but the significance value it provides is only an approximation.

The approximation is inadequate when sample sizes are small, or the data are very unequally distributed among the cells of the table, resulting in the cell counts predicted on the null hypothesis (the “expected values”) being low. The usual rule of thumb for deciding whether the chi-squared approximation is good enough is that the chi-squared test is not suitable when the expected values in 20% or more of the cells of a contingency table are below 5.
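The rule of thumb above is easy to check by hand or in a few lines of code. Below is a minimal sketch in plain Python, assuming the table is given as a list of rows; the function names `expected_counts` and `chi2_approximation_ok` and the two tables are made up for illustration and are not from my friend's paper.

```python
def expected_counts(table):
    """Expected cell counts under the null hypothesis of independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi2_approximation_ok(table, min_expected=5.0, max_low_fraction=0.20):
    """Rule of thumb quoted above: treat the chi-squared test as unsuitable
    when 20% or more of the cells have an expected count below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    low = sum(1 for e in cells if e < min_expected)
    return low / len(cells) < max_low_fraction


# Hypothetical counts, for illustration only.
balanced = [[3, 7], [12, 8]]   # expected counts: 5, 5, 10, 10
sparse = [[1, 9], [11, 9]]     # expected counts: 4, 6, 8, 12

print(chi2_approximation_ok(balanced))  # no expected count is below 5
print(chi2_approximation_ok(sparse))    # 1 of 4 cells (25%) is below 5
```

Note that the check uses the expected counts, not the observed ones: the `balanced` table has an observed cell of 3 but still passes.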

Some may ask: why not use the continuity correction for the chi-squared test? That is what the statistics textbooks teach.

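In fact, the continuity correction for the chi-squared test (Yates's correction for continuity, 1934) overcorrects in most cases. Fisher's exact test is the better choice; in the past it was simply hard to carry out because of limited computing power, and that obstacle no longer exists.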

Here is an example in which applying the continuity correction to the chi-squared test reverses the conclusion:
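A rough sketch of such a case, in plain Python with only the standard library. The 2x2 table [[15, 5], [8, 12]] is a hypothetical one chosen for illustration, and the helper names `chi2_p_2x2` and `fisher_p_2x2` are my own. The code computes the chi-squared P value with and without Yates's correction, plus the two-sided Fisher P value (summing the probabilities of all tables no more likely than the observed one).

```python
from math import comb, erfc, sqrt


def chi2_p_2x2(a, b, c, d, yates=False):
    """P value of the chi-squared test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates's continuity correction shrinks |ad - bc| by n / 2.
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2))


def fisher_p_2x2(a, b, c, d):
    """Two-sided Fisher's exact test with all margins held fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def weight(k):
        # Unnormalised hypergeometric probability of k in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k)

    observed = weight(a)
    # Integer weights keep the "no more likely than observed" comparison exact.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


# Hypothetical table, for illustration only.
table = (15, 5, 8, 12)
p_plain = chi2_p_2x2(*table)
p_yates = chi2_p_2x2(*table, yates=True)
p_fisher = fisher_p_2x2(*table)

print(f"chi-squared, uncorrected: {p_plain:.4f}")
print(f"chi-squared, Yates:       {p_yates:.4f}")
print(f"Fisher's exact test:      {p_fisher:.4f}")
```

On this table the uncorrected P value falls below 0.05 while the Yates-corrected one lands just above it, so the verdict at α = 0.05 flips depending on whether the correction is applied. The Fisher P value here also comes out slightly above 0.05.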

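So when the P value from a chi-squared test falls near α (usually 0.05), check whether the chi-squared approximation is good enough, and if necessary use Fisher's exact test in place of the chi-squared test.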

[1] https://en.wikipedia.org/wiki/Fisher's_exact_test