So, lets say I have 100,000 float arrays with 100 elements each. I need the highest X number of values, BUT only if they are greater than Y. Any element not matching this should be set to 0. What would be the fastest way to do this in Python? Order must be maintained. Most of the elements are already set to 0.
sample variables:
array = [.06, .25, 0, .15, .5, 0, 0, 0.04, 0, 0]
highCountX = 3
lowValY = .1
expected result:
array = [0, .25, 0, .15, .5, 0, 0, 0, 0, 0]
This is a typical job for NumPy, which is very fast for these kinds of operations:
array_np = numpy.asarray(array)
low_values_flags = array_np < lowValY # Where values are low
array_np[low_values_flags] = 0 # All low values set to 0
Now, if you only need the highCountX largest elements, you can even "forget" the small elements (instead of setting them to 0 and sorting them) and only sort the list of large elements:
array_np = numpy.asarray(array)
print numpy.sort(array_np[array_np >= lowValY])[-highCountX:]
Of course, sorting the whole array if you only need a few elements might not be optimal. Depending on your needs, you might want to consider the standard heapq module.