I just checked out this post on yhathq.com about interpreting linear regression in R. I thought it was okay, if a little expertnoobish (i.e., the author seems to want to explain *what* regression is at the same time as explaining how to do it in R… probably not useful for true beginners). AND THEN I got to this sentence:

*So if a variables has 3 stars (***), then it means the probability that the variable is NOT relevant is 0%.*

That’s so wrong it hurts. Problem #1 (*and it’s not even the worst one!*) is that “***” means “*p<.001*.” The astute mathematics student will note that “less than .001” and “zero” are not the same thing. Perhaps the author thinks that, since “<.001” is really small, it might as well be rounded down to zero, but that’s a huge mistake. In some studies, differences between .001 and .0001 are actually quite important. In physics, for example, a standard benchmark for really important, critical results is “five-sigma,” which means (I think) five standard errors from the null-hypothesized expected value. The difference between four-sigma and five-sigma is often of great concern, but this author would just call them both “zero.”
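To see just how different two "basically zero" numbers can be, here is a quick sketch of my own (not from the article), computing two-sided normal tail probabilities for four-sigma and five-sigma results:

```python
# My own illustration (not the article's code): two-sided p-values for
# 4-sigma and 5-sigma results under a standard normal distribution.
import math

def two_sided_p(z):
    # P(|Z| >= z) for standard normal Z, via the complementary error function
    return math.erfc(z / math.sqrt(2))

p4 = two_sided_p(4)  # roughly 6.3e-05
p5 = two_sided_p(5)  # roughly 5.7e-07
# Both are far below .001, so both would get "***" in R's output --
# yet they differ by a factor of about 100, which is exactly the
# distinction that "rounds down to zero" erases.
```

Both values earn three stars, but a physicist would care a great deal about which one she had.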

The more fundamental problem (even worse than thinking .0001 = 0) is either a fundamental misunderstanding of, or a deeply misguided attempt to oversimplify, the nature of *p-*values themselves. In the article, the next reference to *p-*values, tucked into a table of how to interpret regression output, demonstrates the problem once again:

*6 | Variable p-value | Probability the variable is NOT relevant. You want this number to be as small as possible. If the number is really small, `R` will display it in scientific notation. In our example 2e-16 means that the odds that parent is meaningless is about 1⁄5,000,000,000,000,000.*

AAAAAAAAAAAAAAA!

That means “run away, screaming!” This is not, to the best of my knowledge, an accurate interpretation of a *p*-value*. Furthermore, the overall tone of imbuing *p*-values with such import in the interpretation process is arguably pretty misguided.

Perhaps someone should suspend that author’s driverR’s license for a while… heh heh… okay, that was lame. Anyway, don’t read that post except to learn how NOT to interpret *p*-values.

*FYI, my interpretation of what the author on yhathq.com has done is to forget that he or she is writing about a frequentist concept. A *p*-value's interpretation, to be correct, has to involve the conditional concept of the null hypothesis being true, which this author has forgotten. Additionally, even if the null is true, the *p*-value only corresponds to the *probability* of observing data at least as extreme as these, not the "importance" or "relevance" of the data… there is no direct connection to effect sizes (which is what *importance* and *relevance* are about). *p* = .03 means that, *if* H₀ is true, *then* (and only then) there is a 3% chance that results at least this extreme might still have been observed due to the vagaries of random sampling from the population (which, if we're assuming H₀ is true, is now, in the universe we are working in, the null-hypothesis specification of the population). And then there's the other possibility… that the null hypothesis is actually *false*, in which case the *p*-value means nothing, because (a) it was calculated in reference to the mean of H₀, which doesn't exist in this universe, and (b) the effect you're searching for is most certainly there (because the alternative hypothesis is true!).

So I suspect the author believes him- or herself to be a Bayesian, despite evidence to the contrary, and thinks he or she is saying something about posterior probabilities. However, if I’m wrong in my criticism of the approach taken in this article, I hope someone will let me know.