I have read a CSV
file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in that column. For example:
platform_external_dbus 202 16 google 1
platform_external_dbus 202 16 space-ghost.verbum 1
platform_external_dbus 202 16 localhost 1
platform_external_dbus 202 16 users.sourceforge 8
platform_external_dbus 202 16 hughsie 1
I would like only one of these rows since the others have the same data in the first column.
For people who have come here to look for a general answer for duplicate row removal, use !duplicated()
:
a <- c(rep("A", 3), rep("B", 3), rep("C",2))
b <- c(1,1,2,4,1,1,2,2)
df <-data.frame(a,b)
duplicated(df)
[1] FALSE TRUE FALSE FALSE FALSE TRUE FALSE TRUE
> df[duplicated(df), ]
a b
2 A 1
6 B 1
8 C 2
> df[!duplicated(df), ]
a b
1 A 1
3 A 2
4 B 4
5 B 1
7 C 2
Answer from: Removing duplicated rows from R data frame