I just checked out this post on yhathq.com about conducting and interpreting linear regression in R. I thought it was okay, if a little expertnoobish (i.e., the author seems to want to explain what regression is at the same time as explaining how to do it in R… probably not useful for true beginners). AND THEN I got to this sentence:
So if a variables [sic] has 3 stars (***), then it means the probability that the variable is NOT relevant is 0%.
That’s so wrong it hurts. Problem #1 (and it’s not even the worst one!) is that “***” means “p<.001.” The astute mathematics student will note that “less than .001” and “zero” are not the same thing. Perhaps the author thinks that, since “<.001” is really small, it might as well be rounded down to zero, but that’s a huge mistake. In some studies, differences between .001 and .0001 are actually quite important. In physics, for example, a standard benchmark for really important, critical results is “five-sigma,” which means (I think) five standard errors from the null-hypothesized expected value. The difference between four-sigma and five-sigma is often of great concern, but this author would just call them both “zero.”
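For the record, here’s what R itself says the stars mean; a minimal sketch using the built-in cars dataset (not the data from the yhathq.com post), plus the four-sigma vs. five-sigma comparison as two-tailed p-values:

```r
# Fit a regression on R's built-in 'cars' dataset and look at the
# legend R prints under the coefficient table.
fit <- lm(dist ~ speed, data = cars)
summary(fit)
# The summary ends with:
#   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# i.e., '***' means p < .001 -- a small probability, not a zero one.

# Four-sigma vs. five-sigma, expressed as two-tailed p-values:
2 * pnorm(-4)  # ~6.3e-05
2 * pnorm(-5)  # ~5.7e-07
# Both would earn three stars, yet they differ by two orders of
# magnitude -- hardly the same thing as "0%".
```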
The more fundamental problem (even worse than thinking .0001 = 0) is either a fundamental misunderstanding of, or a deeply misguided attempt to oversimplify, the nature of p-values themselves. In the article, the next reference to p-values, tucked into a table of how to interpret regression output, demonstrates the problem once again:
Probability the variable is NOT relevant. You want this number to be as small as possible. If the number is really small, R will display it in scientific notation. In or [sic] example 2e-16 means that the odds that parent is meaningless is about 1/5000000000000000
That means “run away, screaming!” This is not, to the best of my knowledge, an accurate interpretation of a p-value*. Furthermore, the overall tone of imbuing p-values with such import in the interpretation process is arguably pretty misguided.
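Incidentally, that “2e-16” isn’t an arbitrary number: it’s the machine epsilon for double-precision arithmetic, which is the floor below which R’s print routines stop reporting exact p-values and start reporting a bound. A quick sketch:

```r
# Why "2e-16" keeps turning up in R regression output: it is the
# smallest meaningful increment of a double-precision number, and R's
# printing cuts displayed p-values off there.
.Machine$double.eps  # 2.220446e-16
format.pval(1e-300)  # prints as a bound ("<2.2e-16"-style), not a value
```

So a reported “2e-16” generally means “below R’s reporting floor,” which makes reading it as a literal probability of irrelevance even shakier.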
Perhaps someone should suspend that author’s driverR’s license for a while… heh heh… okay, that was lame. Anyway, don’t read that post except to learn how NOT to interpret p-values.
*FYI, my interpretation of what the author on yhathq.com has done is to forget that he or she is writing about a frequentist concept. A p-value’s interpretation, to be correct, has to be conditional on the null hypothesis being true, which this author has forgotten. Additionally, even if the null is true, the p-value only corresponds to the probability of observing data at least as extreme as the data in hand, not the “importance” or “relevance” of the data… there is no direct connection to effect sizes (which is what importance and relevance are about). p = .03 means that, if H0 is true, then (and only then) there is a 3% chance that results at least this extreme might still have been observed due to the vagaries of random sampling from the population (which, if we’re assuming H0 is true, is now, in the universe we are working in, the population exactly as the null hypothesis specifies it). And then there’s the other possibility… that the null hypothesis is actually false, in which case the p-value means nothing, because (a) it was calculated in reference to the mean under H0, which doesn’t exist in this universe, and (b) the effect you’re searching for is most certainly there (because the alternative hypothesis is true!).
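If that verbal account seems abstract, here’s a minimal simulation sketch of the conditional claim (a hypothetical setup of my own, not anything from the yhathq.com post): generate data for which H0 really is true, run many tests, and check how often p ≤ .03 turns up.

```r
# Simulate the frequentist meaning of p = .03: when H0 is true,
# results with p <= .03 should occur about 3% of the time.
set.seed(1)  # arbitrary seed, for reproducibility
pvals <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)
mean(pvals <= .03)  # ~0.03
# Note: the true effect is exactly zero in every replication, so none
# of these "significant" results signal relevance or importance.
```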
So I suspect the author believes him- or herself to be a Bayesian, despite evidence to the contrary, and thinks he or she is saying something about posterior probabilities. However, if I’m wrong in my criticism of the approach taken in this article, I hope someone will let me know.