Why is the third quartile less than the mean in my data?

Ed Fine · Dec 6, 2012

I loaded a data set called gob into R and tried the handy summary function. Note that the 3rd quartile is less than the mean. How can this be? Is it the size of my data or something else like that?

I already tried passing in a large value for the digits parameter (e.g. 10), and that does not resolve the issue.

> summary(gob, digits=10)

   customer_id         100101.D            100199.D            100201.D        
 Min.   :   1083   Min.   :0.0000000   Min.   :0.0000000   Min.   :0.0000000  
 1st Qu.: 965928   1st Qu.:0.0000000   1st Qu.:0.0000000   1st Qu.:0.0000000  
 Median :2448738   Median :0.0000000   Median :0.0000000   Median :0.0000000  
 Mean   :2660101   Mean   :0.0010027   Mean   :0.0013348   Mean   :0.0000878  
 3rd Qu.:4133368   3rd Qu.:0.0000000   3rd Qu.:0.0000000   3rd Qu.:0.0000000  
 Max.   :6538193   Max.   :1.0000000   Max.   :1.0000000   Max.   :0.7520278  

Note that for gob$`100201.D` the mean is 0.0000878, but the 3rd Qu. is 0.

Answer

Didzis Elferts · Dec 6, 2012

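It is not a bug; your data just contains a lot of 0 values. The 3rd quartile is the value below which 75% of the observations fall, so when more than 75% of the values are 0, the 3rd quartile is 0, while the few nonzero values still pull the mean above 0. For example, if I make x with twelve 0s and one 1, the 3rd quartile is smaller than the mean: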

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,1)
summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.07692 0.00000 1.00000 
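
To see where the cutoff comes from, it may help to compare the share of zeros with 75% directly. This is only a minimal sketch on the same x as above; prop_zero and q3 are illustrative names, and quantile() is used with its default settings:

```r
x <- c(rep(0, 12), 1)              # twelve 0s and one 1, same values as above

prop_zero <- mean(x == 0)          # share of values equal to 0 (12 out of 13)
q3 <- unname(quantile(x, 0.75))    # 3rd quartile, default quantile type

prop_zero > 0.75                   # TRUE: more than 75% of the values are 0
q3                                 # 0
mean(x) > q3                       # TRUE: the single 1 pulls the mean above 0
```

Whenever the share of zeros is above 0.75, the 3rd quartile will be 0 no matter how large the remaining values are.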

Try table() on your column to see the distribution of values:

table(x)
x
 0  1 
12  1 
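
The same check could be run on a data frame column. The gob data itself is not shown in full, so the sketch below builds a small stand-in; gob_demo, vals, and all of the values in it are made up purely for illustration. Since a column name like 100201.D starts with a digit, it is accessed here with [[ ]] (backticks after $ would also work):

```r
# Hypothetical stand-in for gob: 999 zeros and a single nonzero value
gob_demo <- data.frame(
  customer_id = 1:1000,
  `100201.D`  = c(rep(0, 999), 0.0878),
  check.names = FALSE              # keep the non-syntactic column name as-is
)

vals <- gob_demo[["100201.D"]]     # same as gob_demo$`100201.D`

mean(vals == 0)                    # share of zeros in the column
quantile(vals, 0.75)               # 0 when more than 75% of the values are 0
mean(vals)                         # small, but still above 0
table(vals == 0)                   # counts of nonzero (FALSE) and zero (TRUE) values
```

For a continuous column, table(vals == 0) is usually easier to read than table(vals), which would list every distinct nonzero value separately.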