I have a data set (call it DATA) with a variable, COLOR. The mode of COLOR is numeric and its class is factor. First, I'm a bit confused by the "numeric": when printed, the values of COLOR are not numeric. They are all character values, like White, Blue, or Black. Any clarification on this is appreciated.
Further, I need to write R code to return the levels of the COLOR variable, then determine the current reference level of this variable, and finally set the reference level of this variable to White. I tried using factor, but was entirely unsuccessful.
Thank you for taking the time to help.
mode(DATA$COLOR)
is "numeric"
because R internally stores a factor as a vector of integer codes (to save space), plus an associated vector of labels (the levels) corresponding to the code values; mode() reports integer storage as "numeric". When you print the factor, R automatically substitutes the corresponding label for each code.
f <- factor(c("orange","banana","apple"))
f
## [1] orange banana apple
## Levels: apple banana orange
str(f)
## Factor w/ 3 levels "apple","banana",..: 3 2 1
as.integer(f) ## drop the attributes to get the underlying integer codes (since R 4.1.0, c(f) returns a factor)
## [1] 3 2 1
attributes(f)
## $levels
## [1] "apple" "banana" "orange"
## $class
## [1] "factor"
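The two layers can also be pulled apart by hand. The following is a minimal sketch using a made-up color vector (the values are hypothetical, not the asker's data); it shows that indexing the labels by the codes reproduces what printing the factor displays.

```r
# Hypothetical stand-in for DATA$COLOR; the real data is not shown in the question
color <- factor(c("White", "Blue", "Black", "Blue"))

typeof(color)   # storage type of the underlying codes
mode(color)     # "numeric", because integer storage counts as numeric
class(color)    # "factor"

codes <- as.integer(color)  # the integer codes
labs  <- levels(color)      # the label lookup table

# Looking up each code in the label table gives back the printed values
labs[codes]
```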
... I need to write R code to return the levels of the COLOR variable ...
levels(DATA$COLOR)
... then determine the current reference level of this variable,
levels(DATA$COLOR)[1]
... and finally set the reference level of this variable to White.
DATA$COLOR <- relevel(DATA$COLOR, ref = "White")
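Put together, the three steps might look like the sketch below. The data frame is invented for illustration, since the real DATA is not shown in the question. It treats the first level as the reference level, which is what R's default treatment contrasts use. Because the question mentions trying factor(), the last lines show how that route can work as well, by passing the desired level order explicitly.

```r
# Hypothetical data frame standing in for the asker's DATA
DATA <- data.frame(COLOR = factor(c("Blue", "White", "Black", "White")))

levels(DATA$COLOR)      # all levels, sorted alphabetically by default
levels(DATA$COLOR)[1]   # current reference level (the first level)

# Move "White" to the front; the remaining levels keep their order
DATA$COLOR <- relevel(DATA$COLOR, ref = "White")
levels(DATA$COLOR)

# Alternative with factor(): spell out the full level order yourself
alt <- factor(DATA$COLOR, levels = c("White", "Black", "Blue"))
identical(levels(alt), levels(DATA$COLOR))
```

relevel() is the shorter option when only the reference level needs to change; factor() with levels = ... is handy when the whole ordering matters.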