Remove/collapse consecutive duplicate values in sequence

Question 1

Remove/collapse consecutive duplicate values in sequence

r loops apply lag

Amarjeet · Dec 15, 2014 · Viewed 7.7k times · Source

Answer

Answer

One easy way is to use rle:

Here's your sample data:

x <- scan(what = character(), text = "a a a b c c d e a a b b b e e d d")
# Read 17 items

rle returns a list with two values: the run length ("lengths"), and the value that is repeated for that run ("values").

rle(x)$values
# [1] "a" "b" "c" "d" "e" "a" "b" "e" "d"

Update: For a `data.frame`

If you are working with a data.frame, try something like the following:

## Sample data
mydf <- data.frame(
  V1 = c("a", "a", "a", "b", "c", "c", "d", "e", 
         "a", "a", "b", "b", "e", "e", "d", "d"),
  V2 = c(1, 2, 3, 2, 4, 1, 3, 9, 
         4, 8, 10, 199, 2, 5, 4, 10)
)

## Use rle, as before
X <- rle(mydf$V1)
## Identify the rows you want to keep
Y <- cumsum(c(1, X$lengths[-length(X$lengths)]))
Y
# [1]  1  4  5  7  8  9 11 13 15
mydf[Y, ]
#    V1 V2
# 1   a  1
# 4   b  2
# 5   c  4
# 7   d  3
# 8   e  9
# 9   a  4
# 11  b 10
# 13  e  2
# 15  d  4

Update 2

The "data.table" package has a function rleid that lets you do this quite easily. Using mydf from above, try:

library(data.table)
as.data.table(mydf)[, .SD[1], by = rleid(V1)]
#    rleid V2
# 1:     1  1
# 2:     2  2
# 3:     3  4
# 4:     4  3
# 5:     5  9
# 6:     6  4
# 7:     7 10
# 8:     8  2
# 9:     9  4

Question 2

I have the following dataframe:

a a a b c c d e a a b b b e e d d

The required result should be

a b c d e a b e d

It means no two consecutive rows should have same value. How it can be done without using loop.

As my data set is quite huge, looping is taking lot of time to execute.

The dataframe structure is like the following

a 1 
a 2
a 3
b 2
c 4
c 1
d 3
e 9
a 4
a 8
b 10
b 199
e 2
e 5
d 4
d 10

Result:

a 1 
b 2
c 4
d 3
e 9
a 4
b 10
e 2
d 4

Its should delete the entire row.

Remove/collapse consecutive duplicate values in sequence

Answer

Update: For a data.frame

Update 2

Related questions

Update: For a `data.frame`