Actually there are 2 questions, one is more advanced than the other.
corrplot()
but can deal with factors.I originally tried to use chisq.test()
then calculate the p-value and Cramer's V as correlation, but there too many columns to figure out.
So could anyone tell me if there is a quick way to create a "corrplot" that each cell contains the value of Cramer's V, while the colour is rendered by p-value. Or any other kind of similar plot.
Regarding Cramer's V, let's say tbl
is a 2-dimensional factor data frame.
chi2 <- chisq.test(tbl, correct=F)
Cramer_V <- sqrt(chi2$/nrow(tbl))
I prepared a test data frame with factors:
df <- data.frame(
group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
student = c('01', '01', '01', '02', '02', '01', '02'),
exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)
df <- data.frame(
group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
student = c('01', '01', '01', '02', '02', '01', '02'),
exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)
df$group <- factor(df$group, levels = c('A', 'B', 'C'), ordered = T)
df$student <- as.integer(df$student)
Here's a tidyverse
solution:
# example dataframe
df <- data.frame(
group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
student = c('01', '01', '01', '02', '02', '01', '02'),
exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)
library(tidyverse)
library(lsr)
# function to get chi square p value and Cramers V
f = function(x,y) {
tbl = df %>% select(x,y) %>% table()
chisq_pval = round(chisq.test(tbl)$p.value, 4)
cramV = round(cramersV(tbl), 4)
data.frame(x, y, chisq_pval, cramV) }
# create unique combinations of column names
# sorting will help getting a better plot (upper triangular)
df_comb = data.frame(t(combn(sort(names(df)), 2)), stringsAsFactors = F)
# apply function to each variable combination
df_res = map2_df(df_comb$X1, df_comb$X2, f)
# plot results
df_res %>%
ggplot(aes(x,y,fill=chisq_pval))+
geom_tile()+
geom_text(aes(x,y,label=cramV))+
scale_fill_gradient(low="red", high="yellow")+
theme_classic()
Note that I'm using lsr
package to calculate Cramers V using the cramersV
function.