Normalize numpy array columns in python

ahajib · Apr 15, 2015 · Viewed 111k times

I have a numpy array where each cell of a specific row represents a value for a feature. I store all of them in a 100*4 matrix.

A     B   C
1000  10  0.5
765   5   0.35
800   7   0.09  

Any idea how I can normalize the columns of this numpy.array so that each value is between 0 and 1?

My desired output is:

A     B    C
1     1    1
0.765 0.5  0.7
0.8   0.7  0.18 (which is 0.09/0.5)

Thanks in advance :)

Answer

ali_m · Apr 16, 2015

If I understand correctly, what you want to do is divide by the maximum value in each column. You can do this easily using broadcasting.

Starting with your example array:

import numpy as np

x = np.array([[1000,  10,   0.5],
              [ 765,   5,  0.35],
              [ 800,   7,  0.09]])

x_normed = x / x.max(axis=0)

print(x_normed)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]

x.max(axis=0) (or x.max(0) for short) takes the maximum over the 0th dimension, i.e. down the rows. This gives you a vector of shape (ncols,) containing the maximum value in each column. Dividing x by this vector broadcasts it across every row, so the maximum value in each column is scaled to 1.
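To make the broadcasting step concrete, here is a small sketch that prints the shapes involved. The row-wise variant at the end is not part of the original answer; it is an illustrative assumption showing that scaling each row by its own maximum needs keepdims=True so the shapes still line up.

```python
import numpy as np

x = np.array([[1000, 10, 0.5],
              [765, 5, 0.35],
              [800, 7, 0.09]])

# One maximum per column: shape (3,)
col_max = x.max(axis=0)
print(x.shape, col_max.shape)

# (3, 3) / (3,) broadcasts col_max across every row
x_normed = x / col_max

# Illustrative variant (an assumption, not from the answer above):
# scale each row by its own maximum. keepdims=True keeps the shape (3, 1),
# which broadcasts across columns instead of rows.
row_max = x.max(axis=1, keepdims=True)
x_row_normed = x / row_max
```

Without keepdims=True, row_max would have shape (3,) and the division would silently divide column by column instead of row by row for a square array like this one.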


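If x contains negative values, you would need to subtract the column minimum first:

x_normed = (x - x.min(axis=0)) / np.ptp(x, axis=0)

Here, np.ptp(x, axis=0) returns the "peak-to-peak" value (i.e. the range, max - min) along axis 0. The method form x.ptp(0) works on older NumPy versions but was removed in NumPy 2.0. This normalization guarantees that the minimum value in each column will be 0 and the maximum will be 1.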
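One case the formula does not handle is a column whose values are all equal: its range is 0, so the division produces nan. A minimal sketch of one way to guard against that is below. The helper name min_max_scale and the choice to map a constant column to 0 are assumptions made for illustration, not something the answer prescribes. It uses the function form np.ptp, which is available in both older and newer NumPy releases.

```python
import numpy as np

def min_max_scale(x):
    """Scale each column of a 2-D array to the range [0, 1].

    Hypothetical helper: constant columns (range 0) are mapped to 0
    rather than producing nan. That choice is an assumption.
    """
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = np.ptp(x, axis=0)          # max - min per column
    # Replace zero ranges with 1 so the division is well defined
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / safe_range

x = np.array([[-2.0, 5.0, 1.0],
              [ 0.0, 5.0, 3.0],
              [ 2.0, 5.0, 2.0]])

scaled = min_max_scale(x)
print(scaled)
```

Here the first column contains negative values and the second column is constant; both come out within [0, 1].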