When should one use a data.frame, and when is it better to use a matrix?
Both keep data in a rectangular format, so sometimes it's unclear.
Are there any general rules of thumb for when to use which data type?
Part of the answer is already contained in your question: you use data frames if the columns (variables) can be expected to be of different types (numeric, character, logical, etc.). Matrices are for data that are all of the same type.
Consequently, the choice between a matrix and a data.frame is only an open question if all of your data are of the same type.
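To make the type rule concrete, here is a minimal sketch. The column names and values are made up for illustration, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version:

```r
# A data frame keeps a separate type for each column.
d <- data.frame(id = 1:3,
                name = c("a", "b", "c"),
                ok = c(TRUE, FALSE, TRUE),
                stringsAsFactors = FALSE)
col_classes <- sapply(d, class)   # one class per column

# A matrix has a single type, so mixed input is coerced to a common one.
m <- cbind(id = 1:3, name = c("a", "b", "c"))
matrix_type <- typeof(m)          # the integers have become character strings
```

Once the numbers have been coerced to character, arithmetic on that matrix no longer works, which is why mixed-type data belongs in a data frame.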
The answer depends on what you are going to do with the data in the data.frame/matrix. If it is going to be passed to other functions, then the argument types those functions expect determine the choice.
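As a rough illustration (the data are invented), `lm()` takes a data frame through its `data` argument, and `as.matrix()` / `as.data.frame()` convert between the two structures when a function expects the other one:

```r
d <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))

# lm() looks up the variables named in the formula in the data frame
# passed as `data`.
fit <- lm(y ~ x, data = d)

# Convert explicitly when a function expects the other structure.
m  <- as.matrix(d)         # numeric matrix, column names preserved
d2 <- as.data.frame(m)     # and back to a data frame
```

Converting is cheap for small objects, so when the data are all of one type it is often easiest to store them in whichever form most of your downstream functions expect and convert at the few places that need the other.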
Also:
Matrices are more memory efficient:
m = matrix(1:4, 2, 2)
d = as.data.frame(m)
object.size(m)
# 216 bytes
object.size(d)
# 792 bytes
Matrices are a necessity if you plan to do any linear-algebra-type operations.
Data frames are more convenient if you frequently refer to their columns by name (via the compact $ operator).
Data frames are also IMHO better for reporting (printing) tabular information as you can apply formatting to each column separately.
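The last three points can be sketched as follows. All object and column names here are made up for illustration, and the formatting shown is just one way to do it:

```r
# Linear algebra: matrices work directly with solve() and %*%.
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled column by column
b <- c(1, 2)
x <- solve(A, b)                        # solves A %*% x = b
Ax <- A %*% x                           # matrix product, a 2 x 1 matrix

# Column access by name.
d <- data.frame(height = c(1.6, 1.8), weight = c(60, 80))
m <- as.matrix(d)
h_df <- d$height                        # `$` on a data frame
h_m  <- m[, "height"]                   # bracket indexing on a matrix

# Per-column formatting for a printed report.
report <- data.frame(item = c("a", "b"),
                     share = c(0.1234, 0.5),
                     count = c(1200, 35))
report$share <- sprintf("%.1f%%", 100 * report$share)   # as percentages
report$count <- format(report$count, big.mark = ",")    # thousands separator
print(report)
```

Note that `m$height` would fail, because `$` is not defined for matrices. Formatting each column differently also turns the columns into character strings, which a data frame allows column by column but which would force a matrix to become character throughout.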