I know this is a basic question but for some strange reason I am unable to find an answer.
How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix or data frame to get a single value rather than a vector over rows or columns?
Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to mean and median.
For a matrix or array, as the others have stated, mean and median will return a single value. However, var will compute the covariances between the columns of a two-dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix will work, but is deprecated, returning the standard deviation of the columns. Even better, mad returns a single value on both a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using as.vector() first. Having fun yet?
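A minimal sketch of the coercion route for the matrix and array case. The objects m and a are made-up illustration data, not anything from the question:

```r
# Hypothetical example data: a 2-d matrix and a 3-d array
m <- matrix(1:6, nrow = 2)
a <- array(1:24, dim = c(2, 3, 4))

# mean() and median() already collapse everything to one value
mean_m   <- mean(m)     # 3.5
median_m <- median(m)   # 3.5

# var() on a 2-d matrix returns the covariance matrix of the columns
cov_m <- var(m)
dim(cov_m)              # 3 3

# Coercing to a plain vector first gives one value over all elements
var_all <- var(as.vector(m))   # 3.5
sd_all  <- sd(as.vector(m))
mad_all <- mad(as.vector(a))
```

Calling as.vector() first is harmless where the function would already return a single value, so it can be applied uniformly.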
For a data.frame, mean is deprecated, but will again act on the columns separately. median requires that you coerce to a vector first, or unlist. As before, var will return the covariances, and sd is again deprecated but will return the standard deviations of the columns. mad requires that you coerce to a vector or unlist. In general, if you want something to act on all values of a data.frame, just unlist it first.
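A minimal sketch of the unlist route, assuming every column is numeric. The data frame df and its column names are hypothetical:

```r
# Hypothetical data frame with two numeric columns
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# unlist() flattens all columns into one (named) numeric vector
vals <- unlist(df)

mean_all   <- mean(vals)     # 3.5
median_all <- median(vals)   # 3.5
var_all    <- var(vals)      # 3.5
sd_all     <- sd(vals)
mad_all    <- mad(vals)

# If some columns are not numeric, keep only the numeric ones first,
# because unlist() coerces mixed column types to a common type
num_vals <- unlist(df[sapply(df, is.numeric)])
```

With mixed column types, unlist() would coerce everything to a common type, which is why the last line filters with is.numeric before flattening.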
Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:
    o mean() for data frames and sd() for data frames and matrices are defunct.