I have the following code, which attempts to normalize the values of an m x n array (it will be used as input to a neural network, where m is the number of training examples and n is the number of features).
However, when I inspect the array in the interpreter after the script runs, I see that the values are not normalized; that is, they still have the original values. I guess this is because the assignment to the array variable inside the function is only seen within the function.
How can I do this normalization in place? Or do I have to return a new array from the normalize function?
import numpy

def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""
    dmin = array.min()
    dmax = array.max()
    array = imin + (imax - imin)*(array - dmin)/(dmax - dmin)
    print(array[0])

def main():
    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)
    for column in array.T:
        normalize(column)
    return array

if __name__ == "__main__":
    a = main()
If you want to apply mathematical operations to a numpy array in-place, you can simply use the standard in-place operators +=, -=, /=, etc. So for example:
>>> def foo(a):
...     a += 10
...
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> foo(a)
>>> a
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
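The original normalize has no effect because `array = ...` binds the local name to a newly created array instead of writing into the one the caller passed in. Here is a minimal sketch of the difference; the function names `rebind` and `write_through` are made up for illustration. Slice assignment is shown as another way to write into the existing buffer:

```python
import numpy

def rebind(a):
    # Builds a new array and binds the local name `a` to it;
    # the caller's array is left untouched.
    a = a + 10

def write_through(a):
    # Slice assignment copies the result into the existing buffer,
    # so the caller sees the change.
    a[...] = a + 10

x = numpy.arange(5)

rebind(x)
after_rebind = x.copy()   # unchanged: 0, 1, 2, 3, 4

write_through(x)
after_write = x.copy()    # changed: 10, 11, 12, 13, 14
```

Note that `a[...] = a + 10` still allocates a temporary array for the right-hand side, so unlike `a += 10` it should not be expected to save memory or time.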
The in-place version of these operations is a tad faster to boot, especially for larger arrays:
>>> def normalize_inplace(array, imin=-1, imax=1):
...     dmin = array.min()
...     dmax = array.max()
...     array -= dmin
...     array *= imax - imin
...     array /= dmax - dmin
...     array += imin
...
>>> def normalize_copy(array, imin=-1, imax=1):
...     dmin = array.min()
...     dmax = array.max()
...     return imin + (imax - imin) * (array - dmin) / (dmax - dmin)
...
>>> a = numpy.arange(10000, dtype='f')
>>> a = numpy.arange(10000, dtype='f')
>>> %timeit normalize_inplace(a)
10000 loops, best of 3: 144 us per loop
>>> %timeit normalize_copy(a)
10000 loops, best of 3: 146 us per loop
>>> a = numpy.arange(1000000, dtype='f')
>>> %timeit normalize_inplace(a)
100 loops, best of 3: 12.8 ms per loop
>>> %timeit normalize_copy(a)
100 loops, best of 3: 16.4 ms per loop
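To tie this back to the question, here is a sketch of normalize_inplace applied column by column to a 2-D array. The small `data` array is a made-up stand-in for the contents of test.csv. It relies on the fact that iterating over `data.T` yields views of the columns rather than copies:

```python
import numpy

def normalize_inplace(array, imin=-1, imax=1):
    # Rescale the values of `array` to [imin, imax] without rebinding the name.
    dmin = array.min()
    dmax = array.max()
    array -= dmin
    array *= imax - imin
    array /= dmax - dmin
    array += imin

# Hypothetical stand-in for numpy.loadtxt('test.csv', ...), which returns floats.
data = numpy.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 40.0]])

# Each `column` is a view into `data`, so the in-place operations
# modify `data` itself.
for column in data.T:
    normalize_inplace(column)
```

After the loop, each column of `data` spans the range from -1 to 1. Two caveats, as far as I know: the array needs a floating-point dtype, since `/=` on an integer array fails in recent NumPy versions because the float result cannot be cast back to integers; and a constant column (dmax equal to dmin) leads to a division by zero.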