My company's cat-herding application tracks a convoy of cats. Periodically, it needs to compare `previousOrder` to `currentOrder` (each is an `ArrayList<Cat>`) and notify the cat-wranglers of any changes.
Each cat is unique and appears at most once in each list (or not at all). Most of the time, the `previousOrder` and `currentOrder` lists have the same contents, in the same order, but any of the following can happen (from most frequent to least frequent):
This looks like an edit distance problem to me. Ideally, I am looking for an algorithm that determines the steps required to make `previousOrder` match `currentOrder`, e.g.:

- Move `Fluffy` to position 12
- Insert `Snuggles` at position 37
- Remove `Mr. Chubbs`
The algorithm should also recognize scenario #1, in which case the new order is communicated in its entirety.
What's the best approach for this?
(This post and that post pose similar questions, but they are both dealing with sorted lists. Mine are ordered, but unsorted.)
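For concreteness, the change notifications described above could be modeled as small message types. This is only a sketch; all the names here (`CatChange`, `Move`, `Insert`, `Remove`) are illustrative, not from any existing API:

```java
import java.util.List;

// Illustrative message shapes for the three kinds of change.
// Records give value semantics and a readable toString() for free.
sealed interface CatChange permits Move, Insert, Remove {}
record Move(String cat, int toPosition) implements CatChange {}
record Insert(String cat, int atPosition) implements CatChange {}
record Remove(String cat) implements CatChange {}

class ChangeDemo {
    public static void main(String[] args) {
        // The example delta from the question, as a list of messages.
        List<CatChange> delta = List.of(
                new Move("Fluffy", 12),
                new Insert("Snuggles", 37),
                new Remove("Mr. Chubbs"));
        delta.forEach(System.out::println); // e.g. Move[cat=Fluffy, toPosition=12]
    }
}
```

A wrangler-side consumer would then switch on the message type and apply each step in order.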
EDIT
The Levenshtein algorithm is a great suggestion, but I'm concerned about the time/space cost of building its matrix. My main goal is to determine and communicate the changes as quickly as possible — ideally something faster than finding the additions and sending a message along the lines of "Here are the new cats, and here is the current order."
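For that "new cats, plus the current order" baseline, the additions can be found in linear time with a single hash set — no edit-distance matrix at all. A sketch (the `additions` method and class name are mine):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class AdditionDiff {
    // O(n + m) time and O(n) extra space: one hash set instead of the
    // O(n * m) matrix a Levenshtein-style diff would allocate.
    public static <T> List<T> additions(List<T> previousOrder, List<T> currentOrder) {
        HashSet<T> seen = new HashSet<>(previousOrder);
        List<T> added = new ArrayList<>();
        for (T item : currentOrder) {
            if (!seen.contains(item)) {
                added.add(item); // in currentOrder but not previousOrder
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> prev = List.of("Fluffy", "Snuggles");
        List<String> curr = List.of("Snuggles", "Mr. Chubbs", "Fluffy");
        System.out.println(additions(prev, curr)); // [Mr. Chubbs]
    }
}
```

The same pass, with the lists swapped, finds the removals.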
Here's an algorithm I put together to merge two lists, `old` and `new`. It's not the most elegant or efficient, but it seems to work okay for the data I'm using it for.

`new` is the most up-to-date list of data, and `old` is the out-of-date list that needs to be transformed into `new`. The algorithm performs its operations on the `old` list — removing, moving, and inserting items accordingly.
    for (item in old)           // iterate over a copy, since we remove as we go
        if (new does not contain item)
            remove item from old

    for (item in new)
        if (item exists in old)
            if (position(item, old) == position(item, new))
                continue        // already in place; next loop iteration
            else
                move old item to position(item, new)
        else
            insert new item into old at position(item, new)
The deletions are all done up front to make the positions of the items more predictable in the second loop.
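A fairly literal Java translation of the pseudocode might look like the following (the method name `mergeInto` is mine, and the "move" is implemented as a remove followed by an insert):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class ListMerge {
    // Transforms oldList in place until it matches newList:
    // deletions first, then one pass of moves and inserts.
    public static <T> void mergeInto(List<T> oldList, List<T> newList) {
        // Pass 1: drop anything no longer present. The HashSet makes
        // each membership test O(1) instead of a linear scan.
        HashSet<T> wanted = new HashSet<>(newList);
        oldList.removeIf(item -> !wanted.contains(item));

        // Pass 2: walk newList left to right, fixing positions.
        for (int i = 0; i < newList.size(); i++) {
            T item = newList.get(i);
            int at = oldList.indexOf(item);
            if (at == i) continue;            // already in place
            if (at >= 0) oldList.remove(at);  // "move" = remove, then re-insert
            oldList.add(i, item);
        }
    }

    public static void main(String[] args) {
        List<String> oldList = new ArrayList<>(List.of("A", "B", "C", "D"));
        mergeInto(oldList, List.of("B", "E", "A"));
        System.out.println(oldList); // [B, E, A]
    }
}
```

Doing the deletions up front means that by the time pass 2 reaches index `i`, the first `i` elements of `oldList` already match `newList`, so each item only ever needs to move left into place or be inserted.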
The driving force behind this was to sync a list of data from the server with `<table>` rows in a browser DOM (using JavaScript). We didn't want to redraw the entire table whenever the data changed; the differences between the lists were likely to be small and affect only one or two rows. It may not be the algorithm you're looking for. If not, let me know and I'll delete this.
There are probably some optimizations that could be made, but it's performant and predictable enough for me and the data I'm working with.