O(n log n) vs O(n) -- practical differences in time complexity

brain storm picture brain storm · Jan 31, 2014 · Viewed 90.1k times · Source

n log n > n -- but this is like a pseudo-linear relationship. If n=1 billion, log n ~ 30;

So n log n will be 30 billion, which is 30 X n, order of n. I am wondering if this time complexity difference between n log n and n are significant in real life.

Eg: A quick select on finding kth element in an unsorted array is O(n) using quickselect algorithm.

If I sort the array and find the kth element, it is O(n log n). To sort an array with 1 trillion elements, I will be 60 times slower if I do quicksort and index it.

Answer

Sergey Kalinichenko picture Sergey Kalinichenko · Jan 31, 2014

The main purpose of the Big-O notation is to let you do the estimates like the ones you did in your post, and decide for yourself if spending your effort coding a typically more advanced algorithm is worth the additional CPU cycles that you are going to buy with that code improvement. Depending on the circumstances, you may get a different answer, even when your data set is relatively small:

  • If you are running on a mobile device, and the algorithm represents a significant portion of the execution time, cutting down the use of CPU translates into extending the battery life
  • If you are running in an all-or-nothing competitive environment, such as a high-frequency trading system, a micro-optimization may differentiate between making money and losing money
  • When your profiling shows that the algorithm in question dominates the execution time in a server environment, switching to a faster algorithm may improve performance for all your clients.

Another thing the Big-O notation hides is the constant multiplication factor. For example, Quick Select has very reasonable multiplier, making the time savings from employing it on extremely large data sets well worth the trouble of implementing it.

Another thing that you need to keep in mind is the space complexity. Very often, algorithms with O(N*Log N) time complexity would have an O(Log N) space complexity. This may present a problem for extremely large data sets, for example when a recursive function runs on a system with a limited stack capacity.