The Analytics Compass Blog

Twice weekly articles to help SMB companies optimize business performance with data analytics and to improve their analytics expertise.

2 key assumptions to be aware of before applying the chi-square test


(contributed by Sangeetha Krishnan)

In previous articles we introduced a few common business analytics applications of the chi-square test, described the common steps in implementing it, and discussed the core workings of the technique. In this article we touch upon two very important assumptions behind the chi-square test that analysts must pay heed to.

1. Sample size assumption:

The chi-square test can be used to determine differences in proportions using a two-by-two contingency table. It is, however, important to understand that the chi-square test yields only an approximate p-value, to which a continuity correction (such as Yates's correction for two-by-two tables) is sometimes applied. The approximation only works well when your datasets are large enough. When sample sizes are small, as indicated by more than 20% of the contingency cells having expected values < 5, Fisher's exact test may be more appropriate. It belongs to a class of “exact tests”, so called because the significance of the deviation from the null hypothesis can be calculated exactly, rather than relying on an approximation.
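
As a rough illustration of the check described above, the sketch below computes the expected cell counts of a contingency table, reports the share of cells with an expected value below 5, and computes a two-sided Fisher's exact p-value for a two-by-two table. It uses only the Python standard library. The function names and the example table are hypothetical and chosen for illustration; the two-sided rule used here (summing the probabilities of all tables no more likely than the observed one) is one common convention, not the only one.

```python
from math import comb


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_of_small_cells(table, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small sample: rows are groups, columns are outcomes.
small_table = [[3, 1], [1, 3]]
if share_of_small_cells(small_table) > 0.20:
    print("Expected counts are small; Fisher's exact p-value:",
          fisher_exact_2x2(small_table))
```

For larger tables or production use, an established statistics library is likely a better choice than a hand-rolled version like this one.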

2. Independence assumption: 

Secondly, the chi-square test cannot be used on correlated data. When you are looking to test differences in proportions among matched pairs in a before/after scenario, an appropriate choice is McNemar's test. In essence, it is a chi-square goodness-of-fit test on the two discordant cells (the pairs whose outcome changed between the two measurements), with a null hypothesis stating that 50% of the changes go in each direction. This test requires the same subjects to be included in the before and after measurements, i.e. the pairs should be matched one-to-one.
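
To make the goodness-of-fit description concrete, here is a minimal sketch of McNemar's test using only the Python standard library. If b and c are the two discordant counts, each is expected to be (b + c) / 2 under the null hypothesis, and the goodness-of-fit statistic simplifies to (b - c)^2 / (b + c) with one degree of freedom. The function names and the counts in the example are hypothetical; no continuity correction is applied, and an exact binomial variant is included for small discordant counts.

```python
from math import comb, erfc, sqrt


def mcnemar_chi2(b, c):
    """Chi-square form of McNemar's test (no continuity correction).

    b and c are the discordant counts, e.g. yes-to-no and no-to-yes.
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    statistic = (b - c) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


def mcnemar_exact(b, c):
    """Exact two-sided p-value: smaller discordant count ~ Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical before/after study: 15 subjects switched one way, 5 the other.
statistic, p_value = mcnemar_chi2(15, 5)
print("chi-square statistic:", statistic, "approximate p-value:", p_value)
print("exact p-value:", mcnemar_exact(15, 5))
```

Note that the concordant pairs (subjects whose outcome did not change) do not enter the calculation at all; only the discordant counts matter.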

The chi-square test of independence is a very useful tool for any predictive analytics professional. What other types of business problems are best solved using these tools?


Information given is useful; thanks!
However, about the ratio, how do we know if the sample is large enough?
Could quantitative data be analyzed using chi-square?
Posted @ Wednesday, January 04, 2012 10:59 AM by mohammad shariati
independence and sample size
Posted @ Tuesday, October 23, 2012 5:07 AM by omon
What are the 4 assumptions underlying the use of chi-square?
Posted @ Tuesday, April 16, 2013 4:42 AM by preshate
Sample size is a perennial question in statistics. Most techniques presume at least 30 observations as a bare minimum.
Chi-square can be used for numerical or quantitative data: you simply convert the numbers into ranges. For example, if you have a variable with values 2.5, 3.1, 1.0, 5.6, 7.0, you can create 3 bins: less than 3.0, from 3.0 to 6.0, and more than 6.0.
Posted @ Wednesday, April 17, 2013 9:29 AM by Bala Deshpande