Sorting Association Rules in R

Brent Ferrier · Apr 8, 2014

I'm trying to accomplish the goals stated below and have oodles of errors. I've spent a lot of time trying to sort the rules and just print the top ten. I know how to print out the entire list.

Use R to explore generating rules in larger data files. Consider the Adult data (available in R with the data(Adult) command). Generate the association rules with a confidence threshold of 0.8.

  1. Print out the top 10 rules sorted by support. Consider using the inspect command along with sort and indexing into the sorted rules.
  2. Print out the top 10 rules sorted by confidence.
  3. Look at generating rules that are restricted to have income on the lhs of the rule. Note that income takes two values: small and large. Consider including the appearance parameter of the apriori function. Print the first 10 rules sorted by lift.

Here is my code so far:

library(arules)    
library(arulesViz)

data(Adult)
head(Adult)

rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.8))

top.support <- sort(rules, decreasing = TRUE, na.last = NA, by = "support")
top.ten.support <- sort.list(top.support, partial=10)
inspect(top.ten.support)

top.confidence <- sort(rules, decreasing = TRUE, na.last = NA, by = "confidence")
top.ten.confidence <- sort.list(top.support,partial=10)
inspect(top.ten.confidence)

rules2 <- apriori(Adult, parameter=list(supp = 0.5, conf = 0.8), appearance = income)

top.lift <- sort(rules2, decreasing = TRUE, na.last = NA, by = "lift")
top.ten.lift <- sort.list(top.lift, partial=10)
inspect(top.ten.lift)

Answer

rcs · Apr 8, 2014

1) Print out the top 10 rules sorted by support:

R> top.support <- sort(rules, decreasing = TRUE, na.last = NA, by = "support")
R> inspect(head(top.support, 10))  # or inspect(sort(top.support)[1:10])
   lhs                               rhs                            support confidence   lift
1  {}                             => {capital-loss=None}             0.9533     0.9533 1.0000
2  {}                             => {capital-gain=None}             0.9174     0.9174 1.0000
3  {}                             => {native-country=United-States}  0.8974     0.8974 1.0000
4  {capital-gain=None}            => {capital-loss=None}             0.8707     0.9491 0.9956
5  {capital-loss=None}            => {capital-gain=None}             0.8707     0.9133 0.9956
...
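The assignment also hints at "indexing into the sorted rules". As a minimal sketch (assuming the arules package is installed and using the same thresholds as above, supp = 0.5 and conf = 0.8), the same top-10-by-support list can be built by ordering the rules' quality data frame and then indexing the rules object with [. The names idx and top10 are only illustrative.

```r
library(arules)

data(Adult)
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.8))

# quality() returns a data frame with one row per rule (support, confidence, lift, ...).
# order() gives the rule positions from highest to lowest support.
idx <- order(quality(rules)$support, decreasing = TRUE)

# Keep the first 10 positions and index into the rules object with [.
top10 <- rules[head(idx, 10)]
inspect(top10)
```

This gives the same support values as inspect(head(top.support, 10)); only the order of rules with tied support may differ.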

2) Print out the top 10 rules sorted by confidence:

R> top.confidence <- sort(rules, decreasing = TRUE, na.last = NA, by = "confidence")
R> inspect(head(top.confidence, 10))
   lhs                               rhs                 support confidence   lift
1  {hours-per-week=Full-time}     => {capital-loss=None}  0.5607     0.9583 1.0052
2  {workclass=Private}            => {capital-loss=None}  0.6640     0.9565 1.0034
3  {workclass=Private,                                                            
    native-country=United-States} => {capital-loss=None}  0.5897     0.9555 1.0023
4  {capital-gain=None,                                                            
    hours-per-week=Full-time}     => {capital-loss=None}  0.5192     0.9551 1.0019
5  {workclass=Private,                                                            
    race=White}                   => {capital-loss=None}  0.5675     0.9550 1.0018
...
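Several rules in this list have nearly identical confidence, so ties are likely. A small sketch, assuming arules is installed and the same rules as above, that ranks by confidence and breaks ties by support using base R's order() on the quality data frame (the tie-breaking rule is an illustrative choice, not something the assignment asks for):

```r
library(arules)

data(Adult)
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.8))

q <- quality(rules)
# Rank by confidence first; rules with equal confidence are then ranked by support.
# Negating the numeric columns makes order() sort both keys in decreasing order.
idx <- order(-q$confidence, -q$support)

top.ten.confidence <- rules[head(idx, 10)]
inspect(top.ten.confidence)
```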

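3) Print the first 10 rules with income on the lhs, sorted by lift. The minimum support is lowered to 0.1 here because income=large occurs in only about 16% of the transactions (0.1457 / 0.9077, from the output below), so no rule with income=large on the lhs could reach a support of 0.5. The appearance restriction still allows rules with an empty lhs ({} => ...), so subset() with %pin% keeps only the rules whose lhs contains an income item: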

R> rules2 <- apriori(Adult, parameter=list(supp = 0.1, conf = 0.8),
                     appearance = list(lhs = c("income=small", "income=large"), 
                                       default = "rhs"))
R> top.lift <- sort(rules2, decreasing = TRUE, na.last = NA, by = "lift")
R> inspect(head(subset(top.lift, lhs %pin% "income"), 10))
lhs               rhs                                 support confidence  lift
1 {income=large} => {marital-status=Married-civ-spouse}  0.1370     0.8535 1.8627
2 {income=large} => {sex=Male}                           0.1364     0.8496 1.2710
3 {income=large} => {race=White}                         0.1457     0.9077 1.0615
4 {income=small} => {capital-gain=None}                  0.4849     0.9581 1.0444
5 {income=large} => {native-country=United-States}       0.1468     0.9146 1.0191
...
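Instead of filtering the result with subset() afterwards, one option is to stop apriori from producing rules with an empty lhs in the first place. This sketch assumes apriori's minlen parameter (minimum number of items in a rule, lhs and rhs together); setting it to 2 forces at least one item on the lhs, and the appearance restriction makes that item an income value. Depending on the data, fewer than 10 rules may remain.

```r
library(arules)

data(Adult)
# minlen = 2: each rule must contain at least two items (lhs + rhs), so rules
# with an empty lhs such as {} => {capital-loss=None} are not generated.
rules2 <- apriori(Adult,
                  parameter = list(supp = 0.1, conf = 0.8, minlen = 2),
                  appearance = list(lhs = c("income=small", "income=large"),
                                    default = "rhs"))

# Sort by lift and keep (at most) the first 10 rules.
top.ten.lift <- head(sort(rules2, decreasing = TRUE, by = "lift"), 10)
inspect(top.ten.lift)
```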