I'm measuring the median and percentiles of a sample of data using Python.
import numpy as np
xmedian=np.median(data)
x25=np.percentile(data, 25)
x75=np.percentile(data, 75)
Do I have to call np.sort() on my data before computing the median?
According to the documentation of numpy.median, you don't have to sort the data manually before passing it to the function; it takes care of the ordering internally. The same holds for numpy.percentile. It is also good practice to read the function's source code and try to understand how it works.
An example showing that sorting beforehand is unnecessary:
In [1]: import numpy as np
In [2]: data = np.array([[ 10, 23, 1, 4, 5],
...: [ 2, 12, 5, 22, 14]])
In [3]: median = np.median(data) # Median of unsorted data
In [4]: median
Out[4]: 7.5
In [5]: data_sorted = np.sort(data, axis=None) # Flatten and fully sort the data
In [6]: median_sorted = np.median(data_sorted) # Median of the sorted data
In [7]: median_sorted
Out[7]: 7.5
In [8]: median == median_sorted # Check that they are equal
Out[8]: True
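The question also asks about the 25th and 75th percentiles, so here is a minimal sketch of the same check for np.percentile. The random sample, the seed, and the variable names (shuffled, presorted, and so on) are made up for illustration and are not from the question.

```python
import numpy as np

# Illustrative data only: a fixed seed keeps the run reproducible.
rng = np.random.default_rng(0)
shuffled = rng.normal(size=1001)   # unsorted sample
presorted = np.sort(shuffled)      # fully sorted copy of the same sample

# Median, 25th and 75th percentile of the unsorted sample.
stats_unsorted = np.array([
    np.median(shuffled),
    np.percentile(shuffled, 25),
    np.percentile(shuffled, 75),
])

# The same statistics computed on the pre-sorted copy.
stats_sorted = np.array([
    np.median(presorted),
    np.percentile(presorted, 25),
    np.percentile(presorted, 75),
])

# Sorting beforehand should make no difference to the results.
same = np.allclose(stats_unsorted, stats_sorted)
print(same)
```

If the three statistics agree, pre-sorting only adds work. If you need all three values at once, np.percentile(data, [25, 50, 75]) computes them in a single call.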