Nearly every textbook which discusses the normal approximation to the binomial distribution mentions the rule of thumb that the approximation can be used if $np\geq 5$ and $n(1-p)\geq 5$. Some books suggest $np(1-p)\geq 5$ instead. The same constant $5$ often shows up in discussions of when to merge cells in the $\chi^2$-test. None of the texts I found gives a justification or reference for this rule of thumb.


Where does this constant 5 come from? Why not 4 or 6 or 10? Where was this rule of thumb originally introduced?

normal-distribution binomial-distribution approximation
asked Apr 16 '16 at 11:34


4 Answers

Some possibilities are offered by the Wikipedia article on the Binomial distribution, under the section on Normal approximation, which currently includes the following comment (emphasis mine):

Another commonly used rule is that both values $np$ and $n(1-p)$ must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants.

Now, this is associated with ensuring that the normal approximation $x\sim N(\mu,\sigma)$ falls within the legal bounds for a binomial variable, $x\in[0,n]$.

To spell this out, if we parameterize the desired coverage probability in terms of a z-score $z>0$, then we have $$\mu \pm z\sigma \in [0,n] \implies z\sigma \leq \min[\,\mu \,,\, n - \mu \,] \implies z^2 \leq \min\left[\, \frac{\mu^2}{\sigma^2} \,,\, \frac{(n - \mu)^2}{\sigma^2}\,\right]$$ Using the Binomial moments $\mu=np$ and $\sigma^2=np(1-p)$, the bound becomes $z^2 \leq n\min\left[\,\frac{p}{1-p}\,,\,\frac{1-p}{p}\,\right]$, which is guaranteed whenever $$\min\!\big[\,p\,,\,1-p\,\big]\,n \geq z^2$$ (each ratio is at least as large as its numerator). So for this approach $z^2=5$ would correspond to a coverage probability of $$\Phi[\sqrt5\,]-\Phi[-\sqrt5\,]\approx 97.5\%$$ where $\Phi$ is the standard normal CDF.
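As a quick numerical check of the coverage figure, here is a minimal Python sketch. The function name `coverage` is my own, and it relies on the identity $\Phi[z]-\Phi[-z]=\operatorname{erf}(z/\sqrt2)$ rather than on any statistics library.

```python
import math

def coverage(z_squared):
    """Two-sided standard normal coverage for a threshold z^2.

    Returns Phi(z) - Phi(-z) with z = sqrt(z_squared), using the
    identity Phi(z) - Phi(-z) = erf(z / sqrt(2)).
    """
    z = math.sqrt(z_squared)
    return math.erf(z / math.sqrt(2))

if __name__ == "__main__":
    # Thresholds that show up in various rules of thumb.
    for threshold in (4, 5, 9, 10):
        print(f"z^2 = {threshold:2d}  coverage = {coverage(threshold):.4f}")
```

Running it for a few thresholds shows how smoothly the coverage changes between 4, 5, 9 and 10, which is consistent with the exact number being a matter of convention.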


So to the extent this coverage probability is "pretty" and 5 is a nice round number ... that might provide some justification perhaps? I do not have much experience with probability texts, so cannot say how common "5" is, vs. other "specific numbers" to use the phrasing of Wikipedia. My feeling is there is nothing really special about 5, and Wikipedia suggests 9 is common also (corresponding to a "pretty" $z$ of 3).
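To make the difference between the thresholds 5 and 9 concrete, the bounds condition $\mu \pm z\sigma \in [0,n]$ can be tested directly for a given $n$ and $p$. The sketch below is only an illustration; the helper name `interval_within_bounds` and the example values of $n$ and $p$ are my own choices, not taken from any text.

```python
import math

def interval_within_bounds(n, p, z):
    """Check whether mu +/- z*sigma lies inside [0, n] for Binomial(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu - z * sigma >= 0 and mu + z * sigma <= n

if __name__ == "__main__":
    # np = 5: the interval fits for z = sqrt(5) but not for z = 3.
    print(interval_within_bounds(100, 0.05, math.sqrt(5)))
    print(interval_within_bounds(100, 0.05, 3))
    # np = 9: the interval fits for z = 3 as well.
    print(interval_within_bounds(180, 0.05, 3))
```

With $p=0.05$, a sample of $n=100$ (so $np=5$) passes the check at $z=\sqrt5$ but fails at $z=3$, while $n=180$ (so $np=9$) passes at $z=3$.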