The arrays have the following shapes:
- dists: (500,5000)
- train: (5000,)
- test: (500,)
Why do the first two statements throw an error, whereas the third one works fine?
dists += train + test
Error: ValueError: operands could not be broadcast together with shapes (5000,) (500,)
dists += train.reshape(-1,1) + test.reshape(-1,1)
Error: ValueError: operands could not be broadcast together with shapes (5000,1) (500,1)
dists += train + test.reshape(-1,1)
This works fine! Why does this happen?
It's to do with NumPy's broadcasting rules. Quoting the NumPy manual:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
- they are equal, or
- one of them is 1
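The quoted rule can be written as a small shape-checking function. This is a minimal sketch, not NumPy's own implementation: `broadcast_shape` is a hypothetical helper written for illustration. It also applies a part of the same NumPy rule that the quote leaves out, namely that a missing leading dimension behaves as length 1. Its results can be cross-checked against `np.broadcast_shapes` (available in NumPy 1.20 and later).

```python
import numpy as np

def broadcast_shape(a, b):
    """Hypothetical helper: apply the broadcasting rule to two shape tuples."""
    # Pad the shorter shape on the left with 1s (missing leading dims act as 1).
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    result = []
    # Start with the trailing dimensions and work forward.
    for da, db in zip(reversed(a), reversed(b)):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(reversed(result))

for shapes in [((5000,), (500,)), ((5000, 1), (500, 1)), ((5000,), (500, 1))]:
    try:
        print(shapes, "->", broadcast_shape(*shapes))
    except ValueError as err:
        print(shapes, "->", err)
# ((5000,), (500,)) -> incompatible dimensions 5000 and 500
# ((5000, 1), (500, 1)) -> incompatible dimensions 5000 and 500
# ((5000,), (500, 1)) -> (500, 5000)
```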
The first statement throws an error because both arrays are 1-D, so NumPy compares their only (trailing) dimension: 5000 and 500 are unequal and neither is 1, so they cannot be broadcast together.
In the second statement, train.reshape(-1,1) has shape (5000,1) and test.reshape(-1,1) has shape (500,1). The trailing dimensions are both 1, so they are compatible, but NumPy then compares the next dimension, and since 5000 != 500 (and neither is 1), broadcasting fails there.
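Both failing statements can be reproduced with zero-filled stand-in arrays of the question's shapes (the values do not matter for broadcasting). In this sketch, `attempts` and `failed` are illustrative names, not part of the original code. Note that Python evaluates the right-hand side before `+=`, so the error comes from adding the two operands, before `dists` is ever touched.

```python
import numpy as np

# Stand-in arrays with the shapes from the question; values are irrelevant here.
dists = np.zeros((500, 5000))
train = np.zeros(5000)
test = np.zeros(500)

attempts = {
    "train + test": lambda: train + test,
    "train.reshape(-1,1) + test.reshape(-1,1)":
        lambda: train.reshape(-1, 1) + test.reshape(-1, 1),
}

failed = []
for label, expr in attempts.items():
    try:
        dists += expr()
    except ValueError as err:
        # The sum on the right-hand side cannot be broadcast, so += never runs.
        failed.append(label)
        print(f"{label}: {err}")
```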
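In the third case, your operands have shapes (5000,) and (500,1), and NumPy does allow broadcasting. The 1-D array is treated as having shape (1,5000). Comparing from the end, its 5000 pairs with the trailing 1 of (500,1), and its leading 1 pairs with 500; each length-1 dimension is stretched to match the other. The result therefore has shape (500,5000), exactly the shape of dists, so the in-place += succeeds. In effect, entry [i, j] receives test[i] + train[j].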
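To see what the third statement actually computes, here is a small sketch with scaled-down stand-in sizes (an assumption for readability: 3 test rows and 4 train columns instead of 500 and 5000). It checks the broadcast result against an explicit double loop and against `np.add.outer`, which computes the same "outer sum".

```python
import numpy as np

# Scaled-down stand-ins for test (500,), train (5000,) and dists (500, 5000).
train = np.array([10.0, 20.0, 30.0, 40.0])  # shape (4,), treated as (1, 4)
test = np.array([1.0, 2.0, 3.0])            # shape (3,), reshaped to (3, 1) below
dists = np.zeros((3, 4))

# (4,) with (3, 1) broadcasts to (3, 4), which matches dists, so += works.
dists += train + test.reshape(-1, 1)

# Reference: every entry should be train[j] + test[i].
expected = np.empty((3, 4))
for i in range(test.size):
    for j in range(train.size):
        expected[i, j] = train[j] + test[i]

print(dists)
# [[11. 21. 31. 41.]
#  [12. 22. 32. 42.]
#  [13. 23. 33. 43.]]
print(np.array_equal(dists, expected))                   # True
print(np.array_equal(dists, np.add.outer(test, train)))  # True
```

Which operand gets reshaped matters: reshaping train instead, as in train.reshape(-1,1) + test, would broadcast to (5000,500) in the original shapes, and that result could not be added in place to the (500,5000) dists.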
FWIW, the shape and broadcasting rules can be tricky at times, and I've often been confused by similar issues.