I'm practicing algorithms, and one of my tasks is to count all longest increasing subsequences of a given sequence of n numbers, where 0 < n <= 10^6. An O(n^2) solution is not an option.
I have already implemented finding an LIS and its length (LIS Algorithm), but that algorithm replaces numbers with the lowest possible ones. As a result, it's impossible to tell whether subsequences ending with a replaced (bigger) number could still reach the longest length; otherwise I could just count those replacements, I guess.
Any ideas how to get this in about O(n log n)? I know that it should be solved using dynamic programming.
I implemented one solution and it works well, but it requires two nested loops (i in 1..n) x (j in 1..i-1).
So it's O(n^2), I think, and in any case it's too slow.
I even tried moving the numbers from the array into a binary tree (because in each i iteration I look for all numbers smaller than number[i] by going through elements i-1..1), but it was even slower.
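For reference, the two-nested-loops approach described above could look roughly like the sketch below. The names QuadraticLisCount and countLis are made up for illustration, and the use of long counters is an assumption (counts can grow quickly); it is only meant as a slow baseline to check faster solutions against.

```java
import java.util.Arrays;

class QuadraticLisCount {
    // Returns {count, length} of the longest strictly increasing subsequences.
    // Hypothetical helper for illustration; runs in O(n^2) time.
    static long[] countLis(int[] a) {
        int n = a.length;
        if (n == 0) return new long[] {0, 0};
        int[] len = new int[n];   // len[i]: length of the longest increasing sub ending at index i
        long[] cnt = new long[n]; // cnt[i]: number of subs of that length ending at index i
        int best = 0;
        for (int i = 0; i < n; i++) {
            len[i] = 1;
            cnt[i] = 1;
            for (int j = 0; j < i; j++) {
                if (a[j] < a[i]) {
                    if (len[j] + 1 > len[i]) {         // found a longer sub: restart the count
                        len[i] = len[j] + 1;
                        cnt[i] = cnt[j];
                    } else if (len[j] + 1 == len[i]) { // same length: accumulate
                        cnt[i] += cnt[j];
                    }
                }
            }
            best = Math.max(best, len[i]);
        }
        long total = 0;
        for (int i = 0; i < n; i++) {
            if (len[i] == best) total += cnt[i];
        }
        return new long[] {total, best};
    }

    public static void main(String[] args) {
        // prints [3, 3] for the first example test
        System.out.println(Arrays.toString(countLis(new int[] {1, 3, 2, 2, 4})));
    }
}
```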
Example tests:
1 3 2 2 4
result: 3 (1,3,4 | 1,2,4 | 1,2,4)
3 2 1
result: 3 (1 | 2 | 3)
16 5 8 6 1 10 5 2 15 3 2 4 1
result: 3 (5,8,10,15 | 5,6,10,15 | 1,2,3,4)
Full Java code of the improved LIS algorithm, which finds not only the length of the longest increasing subsequence but also the number of subsequences of that length, is below. I use generics to allow not only integers but any comparable type.
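import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// @Test is the JUnit annotation; import it from the JUnit version you use.
@Test
public void testLisNumberAndLength() {
    List<Integer> input = Arrays.asList(16, 5, 8, 6, 1, 10, 5, 2, 15, 3, 2, 4, 1);
    int[] result = lisNumberAndLength(input); // {count, length}
    System.out.println(String.format(
        "This sequence has %s longest increasing subsequences of length %s",
        result[0], result[1]
    ));
}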
/**
 * Body of the improved LIS algorithm.
 * Returns {number of longest increasing subsequences, their length}.
 */
public <T extends Comparable<T>> int[] lisNumberAndLength(List<T> input) {
    if (input.size() == 0)
        return new int[] {0, 0};
    List<List<Sub<T>>> subs = new ArrayList<>(); // row k holds subs of length k + 1
    List<Sub<T>> tails = new ArrayList<>();      // last value of every row, ascending
    for (T e : input) {
        int pos = search(tails, new Sub<>(e, 0), false); // row where the new sub is placed
        int sum = 1;
        if (pos > 0) {
            List<Sub<T>> pRow = subs.get(pos - 1); // previous row
            int index = search(pRow, new Sub<T>(e, 0), true); // index of the leftmost element that is <= e
            if (pRow.get(index).value.compareTo(e) < 0) {
                index--; // step back to the last element that is >= e
            }
            sum = pRow.get(pRow.size() - 1).sum; // sum of the tail element in the previous row
            if (index >= 0) {
                sum -= pRow.get(index).sum; // drop subs ending with values >= e
            }
        }
        if (pos >= subs.size()) { // add a new row
            List<Sub<T>> row = new ArrayList<>();
            row.add(new Sub<>(e, sum));
            subs.add(row);
            tails.add(new Sub<>(e, 0));
        } else { // add sub to an existing row
            List<Sub<T>> row = subs.get(pos);
            Sub<T> tail = row.get(row.size() - 1);
            if (tail.value.compareTo(e) == 0) { // same value: merge into the tail pair
                tail.sum += sum;
            } else {
                row.add(new Sub<>(e, tail.sum + sum));
                tails.set(pos, new Sub<>(e, 0));
            }
        }
    }
    List<Sub<T>> lastRow = subs.get(subs.size() - 1);
    Sub<T> last = lastRow.get(lastRow.size() - 1);
    return new int[]{last.sum, subs.size()};
}
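/**
 * Binary search in a sorted list without duplicates.
 * The list is ascending when reversed == false and descending when reversed == true.
 * Returns the index of the first element that is >= v (ascending) or <= v (descending),
 * or a.size() if there is no such element.
 */
public <T> int search(List<? extends Comparable<T>> a, T v, boolean reversed) {
    if (a.size() == 0)
        return 0;
    int sign = reversed ? -1 : 1; // flips the comparisons for descending lists
    int right = a.size() - 1;
    Comparable<T> vRight = a.get(right);
    if (vRight.compareTo(v) * sign < 0)
        return right + 1; // v belongs after the last element
    int left = 0;
    int pos = 0;
    Comparable<T> vPos;
    Comparable<T> vLeft = a.get(left);
    for (;;) {
        if (right - left <= 1) {
            if (vRight.compareTo(v) * sign >= 0 && vLeft.compareTo(v) * sign < 0)
                return right;
            else
                return left;
        }
        pos = (left + right) >>> 1;
        vPos = a.get(pos);
        // compareTo is used here because Sub does not override equals()
        if (vPos.compareTo(v) == 0) {
            return pos;
        } else if (vPos.compareTo(v) * sign > 0) {
            right = pos;
            vRight = vPos;
        } else {
            left = pos;
            vLeft = vPos;
        }
    }
}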
/**
 * Class for 'sub' pairs: the ending value and the running sum of counts in its row
 */
public static class Sub<T extends Comparable<T>> implements Comparable<Sub<T>> {
    T value;
    int sum; // note: an int can overflow on long inputs; a wider type may be needed
    public Sub(T value, int sum) {
        this.value = value;
        this.sum = sum;
    }
    @Override public String toString() {
        return String.format("(%s, %s)", value, sum);
    }
    @Override public int compareTo(Sub<T> another) {
        return this.value.compareTo(another.value);
    }
}
Since my explanation is rather long, I will call the initial sequence "seq" and any of its subsequences "sub". So the task is to count the longest increasing subs that can be obtained from the seq.
As I mentioned before, the idea is to keep counts of all possible longest subs obtained in previous steps. So let's create a numbered list of rows, where the number of each row equals the length of the subs stored in that row. And let's store subs as pairs of numbers (v, c), where "v" is the "value" of the ending element and "c" is the "count" of subs of the given length that end with "v". For example:
1: (16, 1) // this means that so far we have 1 sub of length 1 which ends with 16.
We will build this list step by step, taking elements from the initial sequence in order. At every step we will try to append the element to the longest sub it can extend, and record the changes.
Let's build the list using the sequence from your example, since it covers all possible cases:
16 5 8 6 1 10 5 2 15 3 2 4 1
First, take element 16. Our list is empty so far, so we just put one pair in it:
1: (16, 1) <=== one sub that ends with 16
Next is 5. It cannot be appended to a sub that ends with 16, so it starts a new sub of length 1. We create a pair (5, 1) and put it into row 1:
1: (16, 1)(5, 1)
Element 8 comes next. It cannot create the sub [16, 8] of length 2, but it can create the sub [5, 8]. This is where the algorithm comes in. First, we iterate over the rows from top to bottom, looking at the "value" of the last pair in each row, until we find a row whose last "value" is greater than or equal to our element; the element goes into that row. If our element is greater than the last "values" of all rows, then it extends existing sub(s) of the maximum length by one and starts a new row. So value 8 creates a new row, because it is greater than the last "values" of all rows so far (i.e. > 5):
1: (16, 1)(5, 1)
2: (8, ?) <=== need to work out how many longest subs ending with 8 can be obtained
Element 8 can continue 5, but cannot continue 16. So we search through the previous row, starting from its end, summing the "counts" of the pairs whose "value" is less than 8:
(16, 1)(5, 1)^ // sum = 0
(16, 1)^(5, 1) // sum = 1
^(16, 1)(5, 1) // value 16 >= 8: stop. count = sum = 1, so write 1 in pair next to 8
1: (16, 1)(5, 1)
2: (8, 1) <=== so far we have 1 sub of length 2 which ends with 8.
Why don't we store value 8 among the subs of length 1 (first row)? Because we need subs of the maximum possible length, and 8 can continue some previous subs. Every later number greater than 8 will also continue such a sub, so there is no need to keep 8 as the end of a sub shorter than it can be.
Next is 6. Searching from top to bottom by the last "values" in the rows:
1: (16, 1)(5, 1) <=== 5 < 6, go next
2: (8, 1)
1: (16, 1)(5, 1)
2: (8, 1) <=== 8 >= 6, so 6 should be put here
We found the row for 6; now we need to calculate its count:
take the previous row
(16, 1)(5, 1)^ // sum = 0
(16, 1)^(5, 1) // 5 < 6: sum = 1
^(16, 1)(5, 1) // 16 >= 6: stop, write count = sum = 1
1: (16, 1)(5, 1)
2: (8, 1)(6, 1)
After processing 1:
1: (16, 1)(5, 1)(1, 1) <===
2: (8, 1)(6, 1)
After processing 10:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)
3: (10, 2) <=== count is 2 because both "values" 8 and 6 from the previous row are less than 10, so we add up their "counts": 1 + 1
After processing 5:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1) <===
3: (10, 2)
After processing 2:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 1) <===
3: (10, 2)
After processing 15:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 1)
3: (10, 2)
4: (15, 2) <===
After processing 3:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 1)
3: (10, 2)(3, 1) <===
4: (15, 2)
After processing 2:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 2) <===
3: (10, 2)(3, 1)
4: (15, 2)
If, when searching the rows by last element, we find an equal element, we calculate its "count" again based on the previous row and add it to the existing "count".
After processing 4:
1: (16, 1)(5, 1)(1, 1)
2: (8, 1)(6, 1)(5, 1)(2, 2)
3: (10, 2)(3, 1)
4: (15, 2)(4, 1) <===
After processing 1:
1: (16, 1)(5, 1)(1, 2) <===
2: (8, 1)(6, 1)(5, 1)(2, 2)
3: (10, 2)(3, 1)
4: (15, 2)(4, 1)
So what do we have after processing the whole initial sequence? Looking at the last row, we see that we have 3 longest subs, each consisting of 4 elements: 2 end with 15 and 1 ends with 4.
On every iteration, when taking the next element from the initial sequence, we run 2 loops: the first iterates over the rows to find the place for the element, and the second sums the counts in the previous row. So for every element we make up to n iterations (worst cases: if the initial seq is in increasing order, we get a list of n rows with 1 pair in each row; if the seq is sorted in descending order, we get a list of 1 row with n elements). That gives O(n^2) complexity, which is not what we want.
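The unoptimized procedure traced above, with its two linear loops, might be sketched as follows. The class and method names (NaiveRowsLisCount, countLis, last) are hypothetical, and storing each pair as a long[] {value, count} is just a shortcut for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveRowsLisCount {
    // rows.get(r) holds pairs {value, count} for subs of length r + 1;
    // values inside a row are strictly decreasing.
    // Returns {count of longest increasing subs, their length}.
    static long[] countLis(int[] seq) {
        List<List<long[]>> rows = new ArrayList<>();
        for (int e : seq) {
            // Loop 1: find the first row whose last value is >= e
            int pos = 0;
            while (pos < rows.size() && last(rows.get(pos))[0] < e) pos++;

            // Loop 2: scan the previous row from its end, summing counts of values < e
            long count = 1;
            if (pos > 0) {
                count = 0;
                List<long[]> prev = rows.get(pos - 1);
                for (int i = prev.size() - 1; i >= 0 && prev.get(i)[0] < e; i--) {
                    count += prev.get(i)[1];
                }
            }

            if (pos == rows.size()) rows.add(new ArrayList<>()); // e starts a new row
            List<long[]> row = rows.get(pos);
            if (!row.isEmpty() && last(row)[0] == e) {
                last(row)[1] += count;            // equal value: add to the existing count
            } else {
                row.add(new long[] {e, count});   // new pair at the end of the row
            }
        }
        if (rows.isEmpty()) return new long[] {0, 0};
        long total = 0;
        for (long[] pair : rows.get(rows.size() - 1)) total += pair[1];
        return new long[] {total, rows.size()};
    }

    private static long[] last(List<long[]> row) {
        return row.get(row.size() - 1);
    }
}
```

For the example sequence this ends with the last row (15, 2)(4, 1), so the result is 3 subs of length 4.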
First, it is clear that in every intermediate state the rows are sorted in increasing order of their last "value". So instead of a linear scan we can use binary search, whose complexity is O(log n).
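As a minimal sketch of that row search, assuming the last "values" of the rows are kept in a separate ascending list (called tails here; RowSearch and findRow are illustrative names, not part of the code above):

```java
import java.util.List;

class RowSearch {
    // Returns the index of the first row whose last value is >= e,
    // or tails.size() if e is greater than every last value (e starts a new row).
    static int findRow(List<Integer> tails, int e) {
        int lo = 0, hi = tails.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (tails.get(mid) < e) {
                lo = mid + 1; // row mid ends with a smaller value: look further down
            } else {
                hi = mid;     // row mid is a candidate: keep it in range
            }
        }
        return lo;
    }
}
```

With the last values [5, 8] from the walk-through, searching for 6 gives index 1 (the second row), and searching for 10 gives 2, meaning a new row.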
Second, we don't need to sum the "counts" of subs by looping through the row elements every time. We can accumulate them as we go, when a new pair is added to the row, like:
1: (16, 1)(5, 2) <=== instead of 1, put 1 + "count" of previous element in the row
So the second number will no longer show the count of longest subs that end with the given value, but the total count of all longest subs that end with any element greater than or equal to the "value" of the pair.
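To make the relation between the two representations concrete, here is a small sketch that turns the "counts" of one row into running totals. RunningSums and toSums are hypothetical names; in the actual algorithm the totals are maintained incrementally rather than recomputed.

```java
class RunningSums {
    // sums[i] = counts[0] + ... + counts[i], i.e. the total for all pairs
    // at or to the left of position i (values greater than or equal to the i-th value).
    static long[] toSums(long[] counts) {
        long[] sums = new long[counts.length];
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += counts[i];
            sums[i] = running;
        }
        return sums;
    }
}
```

For row 2 of the example, (8, 1)(6, 1)(5, 1)(2, 2), the counts 1, 1, 1, 2 become the running totals 1, 2, 3, 5.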
Thus, "counts" are replaced by "sums". And instead of iterating over the elements of the previous row, we just perform a binary search (possible because the pairs in any row are always ordered by their "values") and take the "sum" for the new pair as the "sum" of the last element in the previous row, minus the "sum" of the element to the left of the found position in the previous row, plus the "sum" of the previous element in the current row.
So when processing 4:
1: (16, 1)(5, 2)(1, 3)
2: (8, 1)(6, 2)(5, 3)(2, 5)
3: (10, 2)(3, 3)
4: (15, 2) <=== room for (4, ?)
search in row 3 by "values" < 4:
3: (10, 2)^(3, 3)
4 will be paired with 3 - 2 + 2 = 3: ("sum" of the last pair of the previous row) - ("sum" of the pair to the left of the found position in the previous row) + ("sum" of the previous pair in the current row):
4: (15, 2)(4, 3)
In this case, the final count of all longest subs is the "sum" from the last pair of the last row of the list, i.e. 3, not 3 + 2.
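The lookup in the previous row can be sketched on plain arrays as below. This is only an illustration under assumptions: SumLookup and countSmaller are made-up names, the row is passed as two parallel arrays (values strictly decreasing, sums as running totals), and the row must not be empty.

```java
class SumLookup {
    // Returns how many subs in the previous row end with a value < e:
    // ("sum" of the last pair) - ("sum" of the pair left of the found position).
    static long countSmaller(int[] prevValues, long[] prevSums, int e) {
        // Binary search for the first index whose value is < e
        int lo = 0, hi = prevValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (prevValues[mid] >= e) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        long total = prevSums[prevSums.length - 1];
        long skipped = lo > 0 ? prevSums[lo - 1] : 0; // subs ending with values >= e
        return total - skipped;
    }
}
```

For element 4 and row 3 = (10, 2)(3, 3), countSmaller returns 3 - 2 = 1; adding the "sum" of the previous pair in row 4, which is 2, gives the new pair (4, 3).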
So, using binary search for both the row search and the sum search, we arrive at O(n log n) complexity.
As for memory: after processing the whole array we have at most n pairs, so memory consumption with dynamic arrays is O(n). Besides, when using dynamic arrays or collections, some additional time is needed to allocate and resize them, but most operations take O(1) time because we don't sort or rearrange anything during the process. So the complexity estimate seems to be final.