I want to create a transaction object in basket format that I can call anytime for my analyses. The data contains comma-separated items and has 1001 transactions. The first 10 transactions look like this:
hering,corned_b,olives,ham,turkey,bourbon,ice_crea
baguette,soda,hering,cracker,heineken,olives,corned_b
avocado,cracker,artichok,heineken,ham,turkey,sardines
olives,bourbon,coke,turkey,ice_crea,ham,peppers
hering,corned_b,apples,olives,steak,avocado,turkey
sardines,heineken,chicken,coke,ice_crea,peppers,ham
olives,bourbon,coke,turkey,ice_crea,heineken,apples
corned_b,peppers,bourbon,cracker,chicken,ice_crea,baguette
soda,olives,bourbon,cracker,heineken,peppers,baguette
corned_b,peppers,bourbon,cracker,chicken,bordeaux,hering
...
I observed that there are duplicated transactions in the data and removed them, but each time I try to read the transactions, I get:
Error in asMethod(object) : can not coerce list with transactions with duplicated items
Here is my code:
library(arules)  # provides read.transactions

data <- read.csv("AssociationsItemList.txt", header = FALSE)
data <- data[!duplicated(data), ]
pop <- NULL
for (i in 1:length(data)) {
  pop <- paste(pop, data[i], sep = "\n")
}
write(pop, file = "Trans", sep = ",")
transdata <- read.transactions("Trans", format = "basket", sep = ",")
I'm sure there's something small but important that I've missed. Kindly offer your assistance.
The problem is not with duplicated transactions (the same row appearing twice) but with duplicated items (the same item appearing twice within a single transaction, e.g. a row that lists "olives" twice).
read.transactions has an rm.duplicates argument to remove those duplicates:
read.transactions("Trans", format = "basket", sep=",", rm.duplicates=TRUE)
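For a complete run, something along the lines of the sketch below should work. It assumes the same file names as in the question; de-duplicating whole transactions is done on the raw lines with unique() instead of going through read.csv, and rm.duplicates = TRUE takes care of items repeated within a transaction (it also reports how many it dropped).

library(arules)

# Drop duplicated transactions (identical rows) on the raw basket lines
lines <- unique(readLines("AssociationsItemList.txt"))
writeLines(lines, "Trans")

# rm.duplicates = TRUE removes items repeated within a single transaction
transdata <- read.transactions("Trans", format = "basket", sep = ",", rm.duplicates = TRUE)

summary(transdata)        # number of transactions, items, density
inspect(transdata[1:5])   # eyeball the first few baskets

summary() should then show roughly 1001 transactions (fewer if any whole rows were duplicated), and inspect() lets you verify that the first baskets match your input file.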