Java: Difference between two lists

Tony the Pony picture Tony the Pony · Jun 1, 2011 · Viewed 8.6k times · Source

My company's cat-herding application tracks a convoy of cats. Periodically, it needs to compare previousOrder to currentOrder (each is an ArrayList<Cat>) and notify the cat-wranglers of any changes.

Each cat is unique and can only appear once in each list (or not at all). Most of the time, the previousOrder and currentOrder lists have the same contents, in the same order, but any of the following can happen (from more frequent to less frequent):

  1. The order of cats is scrambled completely
  2. Cats individually move up or down in the list
  3. New cats join, at a specific point in the convoy
  4. Cats leave the convoy

This appears like an edit distance problem to me. Ideally, I am looking for an algorithm that determines the steps required to make previousOrder match currentOrder:

  • MOVE Fluffy to position 12
  • INSERT Snuggles at position 37
  • DELETE Mr. Chubbs
  • etc.

The algorithm should also recognize scenario #1, in which case the new order is communicated in its entirety.

What's the best approach for this?

(This post and that post pose similar questions, but they are both dealing with sorted lists. Mine are ordered, but unsorted.)

EDIT

The Levenshtein algorithm is a great suggestion, but I'm concerned about the time/space requirement of creating a matrix. My main goal is to determine and communicate the changes as quickly as possible. Something that is faster than finding the additions and sending message along the lines of "Here are the new cats, and here is the current order."

Answer

Rob Hruska picture Rob Hruska · Jun 1, 2011

Here's an algorithm I put together to merge two lists, old and new. It's not the most elegant or efficient, but it seems to work okay for the data I'm using it for.

new is the most updated list of data, and old is the out-of-date list that needs to get transformed into new. The algorithm performs its operations on the old list - removing, moving, and inserting items accordingly.

for(item in old)
    if (new does not contain item)
        remove item from old

for(item in new)
    if (item exists in old)
        if (position(item, old) == position(item, new))
            continue // next loop iteration
        else
            move old item to position(item, new)
    else
        insert new item into old at position(item, new)

The deletions are all done up front to make the positions of the items more predictable in the second loop.

The driving force behind this was to sync a list of data from the server with <table> rows in a browser DOM (using javascript). It was needed because we didn't want to redraw the entire table whenever the data changed; the differences between the lists were likely to be small and only affect one or two rows. It may not be the algorithm you're looking for for your data. If not, let me know and I'll delete this.

There are probably some optimizations that could be made for this. But it is performant and predictable enough for me and the data I'm working with.