Finding difference between strings in Javascript

Richard picture Richard · Nov 11, 2014 · Viewed 7.8k times · Source

I'd like to compare two strings (a before and after) and detect exactly where and what changed between them.

For any change, I want to know:

  1. Starting position of the change (inclusive, starting at 0)
  2. Ending position of the change (inclusive, starting at 0) relative to the previous text
  3. The "change"

Assume that strings will change in only one place at a time (for example, never "Bill" -> "Kiln").

Additionally, I need the start and end positions to reflect the type of change:

  • If deletion, the start and end position should be the start and end positions of the deleted text, respectively
  • If replacement, the start and end position should be the start and end positions of the "deleted" text, respectively (the change will be the "added" text)
  • If insertion, the start and end positions should be the same; the entry point of the text
  • If no change, let start and end positions remain zero, with an empty change

For example:

"0123456789" -> "03456789"  
Start: 1, End: 2, Change: "" (deletion)

"03456789" -> "0123456789"  
Start: 1, End: 1, Change: "12" (insertion)

"Hello World!" -> "Hello Aliens!"  
Start: 6, End: 10, Change: "Aliens" (replacement)

"Hi" -> "Hi"  
Start: 0, End: 0, Change: "" (no change)

I was able to somewhat detect the positions of the changed text, but it doesn't work in all cases because in order to do that accurately, I need to know what kind of change is made.

var OldText = "My edited string!";
var NewText = "My first string!";

var ChangeStart = 0;
var NewChangeEnd = 0;
var OldChangeEnd = 0;
console.log("Comparing start:");
for (var i = 0; i < NewText.length; i++) {
    console.log(i + ": " + NewText[i] + " -> " + OldText[i]);
    if (NewText[i] != OldText[i]) {
        ChangeStart = i;
        break;
    }
}
console.log("Comparing end:");
// "Addition"?
if (NewText.length > OldText.length) {
    for (var i = 1; i < NewText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// "Deletion"?
} else if (NewText.length < OldText.length) {
    for (var i = 1; i < OldText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// Same length...
} else {
    // Do something
}
console.log("Change start: " + ChangeStart);
console.log("NChange end : " + NewChangeEnd);
console.log("OChange end : " + OldChangeEnd);
console.log("Change: " + OldText.substring(ChangeStart, OldChangeEnd + 1));

How do I tell whether or not an insertion, deletion, or replacement took place?


I've searched and came up with a few other similar questions, but they don't seem to help.

Answer

Vivek Pradhan picture Vivek Pradhan · Nov 11, 2014

I have gone through your code and your logic for matching string makes sense to me. It logs ChangeStart, NewChangeEnd and OldChangeEnd correctly and the algorithm flows alright. You just want to know if an insertion, deletion or replacement took place. Here's how I would go about it.

First of all, you need to make sure that after you have got the first point of mis-match i.e. ChangeStart when you then traverse the strings from the end, the index shouldn't cross ChangeStart.

I'll give you an example. Consider the following strings:

 var NewText = "Hello Worllolds!";
 var OldText = "Hello Worlds!";

 ChangeStart -> 10 //Makes sense
 OldChangeEnd -> 8
 NewChangeEnd -> 11

 console.log("Change: " + NewText.substring(ChangeStart, NewChangeEnd + 1)); 
 //Ouputs "lo"

The problem in this case is when it starts matching from the back, the flow is something like this:

 Comparing end: 
  1(N: 12 O: 12: ! -> !) 
  2(N: 11 O: 11: s -> s) 
  3(N: 10 O: 10: d -> d)  -> You need to stop here!

 //Although there is not a mismatch, but we have reached ChangeStart and 
 //we have already established that characters from 0 -> ChangeStart-1 match
 //That is why it outputs "lo" instead of "lol"

Assuming, what I just said makes sense, you just need to modify your for loops like so:

 if (NewText.length > OldText.length) {
 for (var i = 1; i < NewText.length && ((OldText.length-i)>=ChangeStart); i++) {
  ...

    NewChangeEnd = NewText.length - i -1;
    OldChangeEnd = OldText.length - i -1;
  if(//Mismatch condition reached){
         //break..That code is fine.
    }
 }

This condition -> (OldText.length-i)>=ChangeStart takes care of the anomaly that I mentioned and therefore the for loop automatically terminates if this condition is reached. However, just as I mentioned there might be situations where this condition is reached before a mis-match is encountered like I just demonstrated. So you need to update values of NewChangeEnd and OldChangeEnd as 1 less than the matched value. In case of a mis-match, you store the values appropriately.

Instead of an else -if we could just wrap those two conditions in a situation where we know NewText.length > OldText.length is definitely not true i.e. it is either a replacement or a deletion. Again NewText.length > OldText.length also means it could be a replacement or an insertion as per your examples, which makes sense. So the else could be something like:

else {
for (var i = 1; i < OldText.length && ((OldText.length-i)>=ChangeStart); i++) { 

    ...
    NewChangeEnd = NewText.length - i -1;
    OldChangeEnd = OldText.length - i -1;
  if(//Mismatch condition reached){
         //break..That code is fine.
    }
 }

If you have understood the minor changes thus far, identifying the specific cases is really simple:

  1. Deletion - Condition -> ChangeStart > NewChangeEnd. Deleted string from ChangeStart -> OldChangeEnd.

Deleted text -> OldText.substring(ChangeStart, OldChangeEnd + 1);

  1. Insertion - Condition -> ChangeStart > OldChangeEnd. Inserted string at ChangeStart.

Inserted text -> NewText.substring(ChangeStart, NewChangeEnd + 1);

  1. Replacement - If NewText != OldText and the above two conditions are not met, then it is a replacement.

Text in old string that got replaced -> OldText.substring(ChangeStart, OldChangeEnd + 1);

The replacement text -> NewText.substring(ChangeStart, NewChangeEnd + 1);

Start and end positions in the OldText that got replaced -> ChangeStart -> OldChangeEnd

I have created a jsfiddle incorporating the changes that I have mentioned in your code. You might want to check it out. Hope it gets you started in the right direction.