I'm very new to R (moving over from SPSS). I'm using RStudio on a Mac running Mavericks. Please answer my question in words of 2 syllables as this is my first real attempt at anything like this. I've worked through some basic tutorials and can make things work on all the sample data.
I have a data set with 64,000-ish rows and about 20 columns. I want to get the mean of the variable "hold_time", but whatever I try I get either NA, or NA with a warning message.
I have tried all of the following:
> summary(data_Apr_Jun$hold_time,na.rm=TRUE)
5 6 7 4 8 2 1 3 10
9596 9191 3192 1346 1145 977 940 655 534
11 9 12 0 13 15 14 16 17
490 444 249 128 106 86 73 68 40
98 118 121 128 125 97 101 188 86
31 29 28 28 27 27 26 26 26
102 105 113 81 119 139 127 134 152
25 25 25 25 24 24 23 23 23
18 69 96 106 110 111 120 190 76
23 23 23 22 22 22 22 22 22
82 132 135 156 166 94 115 116 117
22 21 21 21 21 21 20 20 20
142 153 165 19 93 100 104 112 126
20 20 20 20 20 19 19 19 19
131 138 143 157 177 189 61 87 103
19 19 19 19 19 19 19 19 18
108 148 176 212 54 56 64 74 79
18 18 18 18 18 18 18 18 18
99 107 129 163 168 171 178 226 236
18 17 17 17 17 17 17 17 17
59 71 78 95 114 122 123 130 (Other)
17 17 17 17 16 16 16 16 2739
NA's
29807
> mean(as.numeric(data_Apr_Jun$hold_time,NA.rm=TRUE))
[1] NA
> data_Apr_Jun$hold_time[data_Apr_Jun$hold_time=="NA"]<-0
> mean(as.numeric(data_Apr_Jun$hold_time))
[1] NA
> mean(data_Apr_Jun$hold_time)
[1] NA
Warning message:
In mean.default(data_Apr_Jun$hold_time) :
argument is not numeric or logical: returning NA
> mean(as.numeric(data_Apr_Jun$hold_time,na.rm=TRUE))
[1] NA
> colMeans(data_Apr_Jun$hold_time)
Error in colMeans(data_Apr_Jun$hold_time) :
'x' must be an array of at least two dimensions
> colMeans(data_Apr_Jun)
Error in colMeans(data_Apr_Jun) : 'x' must be numeric
> mean(data_Apr_Jun$hold_time,na.omit)
[1] NA
Warning message:
In mean.default(data_Apr_Jun$hold_time, na.omit) :
argument is not numeric or logical: returning NA
So even though I am removing the NAs they don't seem to be being removed. I am flummoxed.
Hello Rnovice, unfortunately there are several errors. Let's resolve them one by one:
> mean(as.numeric(data_Apr_Jun$hold_time,NA.rm=TRUE))
[1] NA
This is because you put na.rm in the wrong place. It should be
mean(as.numeric(data_Apr_Jun$hold_time),na.rm=TRUE)
na.rm is an argument of mean, not of as.numeric (watch the brackets). Also note that you wrote NA.rm: R is case sensitive, so it has to be na.rm.
==================================================================================
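A small sketch of the na.rm point, using made-up values rather than your real data. One assumption worth flagging: your summary() output lists a count for each value, which is what R prints for a factor. If hold_time really is a factor, as.numeric() on its own returns the internal level codes, so the sketch converts through as.character() first. The names hold_time, bad, codes, values and mean_hold are only for illustration.

```r
# Toy stand-in for data_Apr_Jun$hold_time (made-up values, assumed to be a factor)
hold_time <- factor(c("5", "6", NA, "10", "7"))

# na.rm placed inside as.numeric() is ignored there, so mean() still sees the NA
bad <- mean(as.numeric(hold_time, na.rm = TRUE))

# On a factor, as.numeric() gives the level codes, not the values you see printed
codes <- as.numeric(hold_time)

# Converting to character first recovers the real numbers
values <- as.numeric(as.character(hold_time))

# na.rm belongs to mean(), outside the as.numeric() brackets
mean_hold <- mean(values, na.rm = TRUE)
print(mean_hold)
```

If class(data_Apr_Jun$hold_time) reports "numeric" or "integer" instead, the as.character() step is not needed.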
> data_Apr_Jun$hold_time[data_Apr_Jun$hold_time=="NA"]<-0
Comparing with NA using == does not work in R, as I pointed out here:
Something weird about returning NAs
What you mean is
data_Apr_Jun$hold_time[which(is.na(data_Apr_Jun$hold_time))] <- 0
One more remark: =="NA" compares with the string "NA", not with a missing value. Try is.na("NA") and is.na(NA) to see the difference.
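Here is a short sketch of the difference, run on a made-up vector x (not your data). It shows that == never finds a missing value, while is.na() does.

```r
# Made-up numeric vector with one missing value
x <- c(3, NA, 8)

# Comparing with NA via == yields NA for every element, so nothing is selected
eq_na <- x == NA

# The string "NA" is ordinary text, not a missing value
string_is_missing <- is.na("NA")
real_is_missing <- is.na(NA)

# is.na() marks the missing entries, so this replacement works
x[is.na(x)] <- 0
print(x)
```

Keep in mind that setting the missing values to 0 pulls the mean down, whereas na.rm=TRUE simply leaves them out of the sum and the count.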
==================================================================================
colMeans(data_Apr_Jun$hold_time)
Error in colMeans(data_Apr_Jun$hold_time) :
'x' must be an array of at least two dimensions
Try data_Apr_Jun$hold_time on its own and you will see that it returns a vector. This is why a column-wise mean (computed by colMeans) makes no sense here.
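To see the vector versus table difference, here is a sketch on a tiny made-up data frame df. The column wait_time is invented for the example and is not from your data.

```r
# Tiny made-up data frame with two numeric columns
df <- data.frame(hold_time = c(5, 6, NA, 10), wait_time = c(1, 2, 3, 4))

# $ pulls out a single column as a plain vector, which has no dimensions
col <- df$hold_time
no_dims <- is.null(dim(col))

# For one vector, mean() is the right tool
m <- mean(col, na.rm = TRUE)

# colMeans() needs something with rows and columns, and every column must be numeric
all_means <- colMeans(df, na.rm = TRUE)

# Single brackets keep the column as a one-column data frame, so colMeans() accepts it
one_mean <- colMeans(df["hold_time"], na.rm = TRUE)
print(all_means)
```

This also hints at why colMeans(data_Apr_Jun) failed with 'x' must be numeric: at least one of the columns in the full data set is not numeric.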
Hope the rest is understandable and solvable with these hints.
One very important thing that you have already realized:
Use R! you are on the right track!