Problem:
The apriori function of the arules package infers association rules from the input transactions and reports the support, confidence, and lift of each rule. The association rules are derived from frequent itemsets. I'd like to get the most frequent itemsets in the input transactions. Specifically, I'd like to get all itemsets with a given minimum support. The support of an itemset is the ratio of the number of transactions that contain the itemset to the total number of transactions.
Requirements:
The itemFrequency function provided by the arules package is not sufficient: it reports only itemsets with a single item. I'm interested in all itemsets of any length with a minimum support.
Example Input:
a,b
a,b,c
Program:
# The following is how I'm using apriori to infer the association rules.
library(package = "arules")
transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001))
WRITE(rules, file = "", sep = ",", quote = TRUE, col.names = NA)
Current Output:
"","rules","support","confidence","lift"
"1","{} => {c}",0.5,0.5,1
"2","{} => {b}",1,1,1
"3","{} => {a}",1,1,1
"4","{c} => {b}",0.5,1,1
"5","{b} => {c}",0.5,0.5,1
"6","{c} => {a}",0.5,1,1
"7","{a} => {c}",0.5,0.5,1
"8","{b} => {a}",1,1,1
"9","{a} => {b}",1,1,1
"10","{b,c} => {a}",0.5,1,1
"11","{a,c} => {b}",0.5,1,1
"12","{a,b} => {c}",0.5,0.5,1
Desired Output:
"itemset","support"
"{a}",1
"{a,b}",1
"{b}",1
"{a,b,c}",0.5
"{a,c}",0.5
"{b,c}",0.5
"{c}",0.5
I found the generatingItemsets function in the reference manual of the arules package.
library(package = "arules")
transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001))
# Each rule is generated by one itemset (LHS plus RHS); several rules share an itemset, so deduplicate.
itemsets <- unique(generatingItemsets(rules))
itemsets.df <- as(itemsets, "data.frame")
# Sort by descending support, then by itemset label.
frequentItemsets <- itemsets.df[with(itemsets.df, order(-support,items)),]
names(frequentItemsets)[1] <- "itemset"
write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)
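As a possible shortcut, apriori can also be asked to mine frequent itemsets directly through the target entry of its parameter list, which avoids going through rules and generatingItemsets. A minimal sketch, assuming the two example transactions are built in memory rather than read from stdin (the in-memory list is only for illustration):

```r
library(arules)

# Illustration only: the two example transactions, built in memory
# instead of being read from stdin.
transactions <- as(list(c("a", "b"), c("a", "b", "c")), "transactions")

# target = "frequent itemsets" makes apriori return itemsets instead of rules,
# so no confidence threshold is needed.
itemsets <- apriori(transactions,
                    parameter = list(target = "frequent itemsets",
                                     support = 0.001, minlen = 1))

itemsets.df <- as(itemsets, "data.frame")
# Keep the itemset label and its support; sort by descending support, then by label.
frequentItemsets <- itemsets.df[order(-itemsets.df$support,
                                      as.character(itemsets.df$items)),
                                c("items", "support")]
names(frequentItemsets)[1] <- "itemset"
write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)
```

For the example input this should list the same seven itemsets as the desired output. The exact row order among itemsets with equal support may depend on the locale used for sorting the labels.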