Say I have an arbitrary 2-D NumPy array that looks like this:
arr = np.array([[ 6.0, 12.0, 1.0],
                [ 7.0,  9.0, 1.0],
                [ 8.0,  7.0, 1.0],
                [ 4.0,  3.0, 2.0],
                [ 6.0,  1.0, 2.0],
                [ 2.0,  5.0, 2.0],
                [ 9.0,  4.0, 3.0],
                [ 2.0,  1.0, 4.0],
                [ 8.0,  4.0, 4.0],
                [ 3.0,  5.0, 4.0]])
What would be an efficient way to average the rows, grouped by the value in their third column?
The expected output would be:
result = [[ 7.0   9.33  1.0]
          [ 4.0   3.0   2.0]
          [ 9.0   4.0   3.0]
          [ 4.33  3.33  4.0]]
A compact option is numpy_indexed (disclaimer: I am its author), which performs this kind of grouping in a fully vectorized way:
import numpy_indexed as npi
# Group rows by the third column; mean() returns (unique keys, per-group means)
keys, result = npi.group_by(arr[:, 2]).mean(arr)
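If adding a dependency is not an option, the same grouping can be approximated in plain NumPy. The following is only a rough sketch, not how numpy_indexed works internally. The helper name `group_mean` and its `col` parameter are made up for illustration. It labels each row with `np.unique(..., return_inverse=True)`, accumulates per-group sums with `np.add.at`, and divides by the group sizes:

```python
import numpy as np

arr = np.array([[6.0, 12.0, 1.0],
                [7.0,  9.0, 1.0],
                [8.0,  7.0, 1.0],
                [4.0,  3.0, 2.0],
                [6.0,  1.0, 2.0],
                [2.0,  5.0, 2.0],
                [9.0,  4.0, 3.0],
                [2.0,  1.0, 4.0],
                [8.0,  4.0, 4.0],
                [3.0,  5.0, 4.0]])


def group_mean(a, col=2):
    """Average the rows of `a`, grouped by the value in column `col`.

    Hypothetical helper for illustration; rows do not need to be sorted.
    """
    # keys: sorted unique group values; inverse: group index of every row
    keys, inverse = np.unique(a[:, col], return_inverse=True)

    # Accumulate the rows of each group into its own row of `sums`
    sums = np.zeros((keys.size, a.shape[1]))
    np.add.at(sums, inverse, a)

    # Number of rows in each group
    counts = np.bincount(inverse, minlength=keys.size)

    # Broadcast the counts across columns to turn sums into means
    return sums / counts[:, None]


result = group_mean(arr)
print(result)
```

Because every row in a group shares the same key, the grouping column averages back to the key itself, so the result keeps the same three-column layout as the expected output. `np.add.at` is unbuffered and can be slow on large inputs; per-column `np.bincount(inverse, weights=...)` is a common alternative if that matters.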