Big O complexity to merge two lists

ggould75 picture ggould75 · May 11, 2013 · Viewed 9.5k times · Source

Given 2 singly linked lists already sorted, merge the lists.

Example:
list1: 1 2 3 5 7
list2: 0 4 6 7 10
---> 0 1 2 3 4 5 6 7 7 10

Despite from the fact that the solution is quite simple and there are several different implementations of the problem with or without using recursion (like this http://www.geeksforgeeks.org/merge-two-sorted-linked-lists/ see Method 3),

I was wondering what would be the O big complexity of this implementation:

  1. If one of the lists is empty just return the other
  2. Otherwise insert each node of the second list into the first one using the sortedInsert function which basically scan the list until the right position is found. Because the 2 lists are both already sorted there's no need to compare each time the node with all the nodes in the first list, I can start the comparison from the last added node.

ex: continuing with the previous example if 4 has been already added I can safely start the next comparison from 4:
list1: 0 1 2 3 4 5 7
list2: 6 7 10
now compare 6 with 4 instead with 1 2 3 4....

If I would compare one element with all the elements in the first list it would be O(m*n) with m=#list2 and n=#list1, but considering this "optimisation" what is the complexity?

Implementation:

// Insert a new node in a sorted list
int sortedInsert(struct ListNode **head, struct ListNode* newNode) {
    int headUpdated = 0;
    struct ListNode *current;

    // The list is empty or the new element has to be added as first element
    if (*head == NULL || (*head)->data >= newNode->data) {
        newNode->next = *head;
        *head = newNode;
        headUpdated = 1;
    }
    else {
        // Locate the node before the point of insertion
        current = *head;
        while (current->next != NULL && current->next->data < newNode->data)
            current = current->next;

        newNode->next = current->next;
        current->next = newNode;
    }

    return headUpdated;
}


struct ListNode *mergeLists(struct ListNode *head1, struct ListNode *head2) {
    if (head1 == NULL)
        return head2;

    if (head2 == NULL)
        return head1;

    // Store the node in the first list where to start the comparisons
    struct ListNode *first = head1;

    while (head2) {
        struct ListNode *head2Next = head2->next;

        printf("Adding %d starting comparison from %d\n", head2->data, first->data);
        if (sortedInsert(&first, head2))
            head1 = first;

        first = head2;
        head2 = head2Next;
    }

    return head1;
}

Answer

Daniel Fischer picture Daniel Fischer · May 12, 2013

Actually the merge algorithm you have here is O(m + n), not O(m * n).

Since you have a pointer to the last inserted node, and start looking for the place to insert the next node from that on, the total number of

current = current->next

operations in sortedInsert is at most m + n - 1 (length of result minus one). The number of insert operations (relinking the next pointers) is n (length of the second list). For each comparison

current->next->data < newNode->data

the next operation is either an insertion or a current = current->next, so the number of comparisons is at most

m + 2*n - 1

Let us say that the resulting list starts with m_0 elements from the first list, then n_1 elements from the second, then m_1 from the first, n_2 from the second, ..., n_r from the second, then finally m_r from the first. m_0 and m_r may be 0, all other numbers are strictly positive.

The first element of the n_1 block is compared to each element of the m_0 block and the first element of the m_1 block. All other elements of that block are compared to their predecessor in the second list, and the first element of the m_1 block [unless r = 1 and m_1 = 0, in which case there are fewer comparisons].

That makes m_0 + 1 + (n_1 - 1)*2 = m_0 + 2*n_1 - 1 comparisons (or fewer).

For the later n_k blocks, the first element is compared to the last element of the n_(k-1) block, all elements of the m_(k-1) block, and the first element of the m_k block [if that exists]. The further elements of the block are all compared to their predecessor in list 2, and the first element of the m_k block [if that exists], that makes

 1 + m_(k-1) + 1 + (n_k - 1)*2 = m_(k-1) + 2*n_k

comparisons (or fewer). Since all comparisons involve at least one element of the second list, the total number of comparisons is at most

m_0 + 2*n_1 - 1 + m_1 + 2*n_2 + m_2 + 2*n_3 + ... + m_(r-1) + 2*n_r <= m + 2*n - 1.

We can slightly improve it by intialising

    struct ListNode *first = head1;

and removing the

    if (!first)
        first = head1;

test from the loop.