I have a numpy array with shape (34799, 32, 32, 3), which means (num examples, width, height, channels).
Now I normalize the image data with the following code:
def normalize(x):
    return (x - 128) / 128
X_train_norm = normalize(X_train)
But the result does not seem right: the value of X_train[0][0][0] is [28 25 24], but the output of X_train_norm[0][0][0] is [1.21875 1.1953125 1.1875].
I use the following test code:
test = np.array([[[[28, 25, 24]]]])
print ((test - 128) / 128)
output:
[[[[-0.78125 -0.8046875 -0.8125 ]]]]
Why does the normalize function give the wrong result?
I think the images are loaded as a numpy array filled with uint8 bytes with values between 0 and 255.
If you perform a subtraction on a uint8 such that the result would be negative, a wraparound happens (modulo 256), like 123 - 128 == 251, and only then is the wrapped value divided by 128. For example:
>>> np.array([28,25,24], dtype=np.uint8) - 128
array([156, 153, 152], dtype=uint8)
and then, after the division, we get the reported values:
>>> (np.array([28,25,24], dtype=np.uint8) - 128)/128
array([1.21875 , 1.1953125, 1.1875 ])
To solve it, you can convert the array to a floating-point type with .astype(..) before subtracting:
def normalize(x):
    return (x.astype(float) - 128) / 128
Note that this has nothing to do with the fact that you use a function. If you had used the same expression directly on the original array, you would have gotten the same result.
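As a quick check of the points above, here is a minimal sketch that compares the original expression, the same expression wrapped in a function, and the version with the conversion. The array x is a hypothetical stand-in for a single pixel of X_train, and normalize_buggy is just an illustrative name for the function from the question; I am assuming the data really is uint8, which you can confirm with X_train.dtype.

```python
import numpy as np

# Hypothetical stand-in for one pixel of X_train, assumed to be uint8.
x = np.array([28, 25, 24], dtype=np.uint8)

def normalize_buggy(x):
    # Version from the question: the subtraction is done in uint8
    # and wraps around modulo 256 before the division.
    return (x - 128) / 128

def normalize(x):
    # Convert to float first, so the subtraction can go negative.
    return (x.astype(float) - 128) / 128

buggy = normalize_buggy(x)
inline = (x - 128) / 128   # same expression, no function involved
fixed = normalize(x)

print(x.dtype)                        # dtype of the input array
print(buggy)                          # wrapped-around values
print(np.array_equal(buggy, inline))  # function vs. inline expression
print(fixed)                          # values after the float conversion
```

If memory is a concern for the full (34799, 32, 32, 3) array, x.astype(np.float32) should work the same way here while using half the space of the default float.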