Association analysis with duplicate transactions using arules package in R

Babatunde Awosanya · Jun 17, 2013

I want to create a transaction object in basket format that I can reuse in my analyses. The data consist of 1001 transactions, each a comma-separated list of items. The first 10 transactions look like this:

hering,corned_b,olives,ham,turkey,bourbon,ice_crea
baguette,soda,hering,cracker,heineken,olives,corned_b
avocado,cracker,artichok,heineken,ham,turkey,sardines
olives,bourbon,coke,turkey,ice_crea,ham,peppers
hering,corned_b,apples,olives,steak,avocado,turkey
sardines,heineken,chicken,coke,ice_crea,peppers,ham
olives,bourbon,coke,turkey,ice_crea,heineken,apples
corned_b,peppers,bourbon,cracker,chicken,ice_crea,baguette
soda,olives,bourbon,cracker,heineken,peppers,baguette
corned_b,peppers,bourbon,cracker,chicken,bordeaux,hering
...

I noticed duplicated transactions in the data and removed them, but each time I try to read the transactions I get:

Error in asMethod(object) : can not coerce list with transactions with duplicated items

Here is my code:

data <- read.csv("AssociationsItemList.txt", header = FALSE)
data <- data[!duplicated(data), ]
pop <- NULL
for (i in 1:length(data)) {
  pop <- paste(pop, data[i], sep = "\n")
}
write(pop, file = "Trans", sep = ",")
transdata <- read.transactions("Trans", format = "basket", sep = ",")

I'm sure there's something little yet important I've missed. Kindly offer your assistance.

Answer

Vincent Zoonekynd · Jun 17, 2013

The problem is not duplicated transactions (the same row appearing twice) but duplicated items (the same item appearing twice within a single transaction, e.g., "olives" on line 4).
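To see which lines are affected before reading them, one option is to check each line for repeated items in base R. This is only a minimal sketch: the three sample lines are invented for illustration and are not taken from the original file.

```r
# Hypothetical sample: line 2 repeats "olives" (duplicated item);
# line 3 is identical to line 1 (duplicated transaction).
lines <- c(
  "hering,corned_b,olives",
  "olives,bourbon,olives",
  "hering,corned_b,olives"
)

# Split each line into its items.
items <- strsplit(lines, ",", fixed = TRUE)

# Duplicated items: the same item occurs more than once within one transaction.
has_dup_items <- vapply(items, function(x) anyDuplicated(x) > 0, logical(1))
which(has_dup_items)      # 2 for this sample

# Duplicated transactions: a whole line repeats an earlier line.
which(duplicated(lines))  # 3 for this sample
```

On the real data, `lines` would presumably come from something like `readLines("Trans")`.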

read.transactions has an rm.duplicates argument (FALSE by default) that removes duplicated items within each transaction.

transdata <- read.transactions("Trans", format = "basket", sep = ",", rm.duplicates = TRUE)
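If duplicated transactions should be dropped as well, the paste loop from the question can be avoided by working on the raw lines. The following is a sketch under stated assumptions: a small invented sample stands in for AssociationsItemList.txt, temporary files replace the real paths, and only lines that repeat an earlier line exactly are treated as duplicate transactions (the same items in a different order would not be caught).

```r
# Invented stand-in for AssociationsItemList.txt so the sketch runs on its own.
infile <- tempfile(fileext = ".txt")
writeLines(c("hering,corned_b,olives",
             "olives,bourbon,olives",
             "hering,corned_b,olives"), infile)

# Drop lines that exactly repeat an earlier line, then write one transaction per line.
raw <- readLines(infile)
outfile <- tempfile()
writeLines(unique(raw), outfile)

# Read the cleaned file; rm.duplicates = TRUE removes repeated items
# within a transaction (here, the second "olives" on line 2).
if (requireNamespace("arules", quietly = TRUE)) {
  transdata <- arules::read.transactions(outfile, format = "basket",
                                         sep = ",", rm.duplicates = TRUE)
  print(summary(transdata))
}
```

Writing the lines back with writeLines keeps one transaction per line, which is the layout the basket format expects.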