I've found this issue with t-tests and chi-squared tests in R, but I assume it applies generally to other tests as well. If I do:
a <- 1:10
b <- 100:110
t.test(a,b)
I get: t = -64.6472, df = 18.998, p-value < 2.2e-16. I know from the comments that 2.2e-16 is the value of .Machine$double.eps - the smallest floating point number x such that 1 + x != 1 - but of course R can represent numbers much smaller than that. I also know from the R FAQ that R rounds floats to 53 binary digits of accuracy.
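For illustration, roughly what I mean (assuming a standard IEEE-754 double build of R; the values in the comments are what I'd expect, not exact):
.Machine$double.eps   # ~2.22e-16: the smallest x such that 1 + x != 1
.Machine$double.xmin  # ~2.23e-308: the smallest positive normalised double
1e-300 > 0            # TRUE: R happily represents numbers far smaller than double.eps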
A few questions:
(1) Am I correct in reading that as 53 binary digits of precision, or are values in R smaller than .Machine$double.eps not calculated accurately?
(2) Why, when doing such calculations, does R not provide a means to display a smaller value for the p-value, even with some loss of precision?
(3) Is there a way to display a smaller p-value, even if I lose some precision? For a single test two significant decimal figures would be fine; for values I am going to Bonferroni-correct I'll need more. (When I say "lose some precision" I mean fewer than 53 binary digits.)
(4) Or am I completely mistaken, and is any p-value smaller than .Machine$double.eps wildly inaccurate?
(5) Is R just being honest where other stats packages are not?
In my field very small p-values are the norm; some examples: http://www.ncbi.nlm.nih.gov/pubmed/20154341 and http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002215. This is why I want to be able to represent such small p-values.
Thanks for your help, and sorry for such a tortuous question.
I'm puzzled by several things in the exchange of answers and comments here.
First of all, when I try the OP's original example I don't get a p value as small as the ones that are being debated here (several different 2.13.x versions and R-devel):
a <- 1:10
b <- 10:20
t.test(a,b)
## data: a and b
## t = -6.862, df = 18.998, p-value = 1.513e-06
Second, when I make the difference between groups much bigger, I do in fact get the results suggested by @eWizardII:
a <- 1:10
b <- 110:120
(t1 <- t.test(a,b))
## data: a and b
## t = -79.0935, df = 18.998, p-value < 2.2e-16
t1$p.value
## [1] 2.138461e-25
The behavior of the printed output in t.test is driven by its call to stats:::print.htest (which is also called by other statistical testing functions such as chisq.test, as noted by the OP), which in turn calls format.pval, which presents p-values less than its eps argument (.Machine$double.eps by default) as "< eps". I'm surprised to find myself disagreeing with such generally astute commenters ...
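As a minimal sketch of that chain (the exact strings may differ a little between R versions; the commented output is approximate):
p <- t1$p.value              # 2.138461e-25, still stored at full double precision
format.pval(p)               # "< 2.2204e-16" -- below the default eps, so printed as "< eps"
format.pval(p, eps = 1e-30)  # "2.1385e-25"   -- with a smaller eps the stored value is shown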
Finally, although it seems silly to worry about the precise value of a very small p value, the OP is correct that these values are often used as indices of strength of evidence in the bioinformatics literature -- for example, one might test 100,000 candidate genes and look at the distribution of resulting p values (search for "volcano plot" for one example of this sort of procedure).
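For what it's worth, here is a hedged sketch of how one might collect the full-precision p-values across many such tests rather than relying on the printed output; gene_exprs and group are made-up placeholders, not anyone's real data:
set.seed(1)
gene_exprs <- matrix(rnorm(1000 * 20), nrow = 1000)   # 1000 simulated "genes" x 20 samples
group <- rep(c("A", "B"), each = 10)
pvals <- apply(gene_exprs, 1, function(x)
    t.test(x[group == "A"], x[group == "B"])$p.value)
summary(pvals)   # numeric p-values, not the truncated "< 2.2e-16" display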