I am a beginner in neural networks, learning about perceptrons. My question is: why is the weight vector perpendicular to the decision boundary (hyperplane)? I have referred to many books, and all of them mention that the weight vector is perpendicular to the decision boundary, but none of them say why.
Can anyone give me an explanation or a reference to a book?
The weights are simply the coefficients that define a separating plane. For the moment, forget about neurons and just consider the geometric definition of a plane in N dimensions:
w1*x1 + w2*x2 + ... + wN*xN - w0 = 0
You can also think of this as being a dot product:
w*x - w0 = 0
where w and x are both length-N vectors. This equation holds for all points on the plane. Recall that we can multiply the above equation by a constant and it still holds, so we can define the constants such that the vector w has unit length. (Notice, too, that subtracting the plane equation for any two points a and b on the plane gives w*(a - b) = 0, so w is orthogonal to every vector lying within the plane; that is exactly the perpendicularity you are asking about.)

Now, take out a piece of paper and draw your x-y axes (x1 and x2 in the above equations). Next, draw a line (a "plane" in 2D) somewhere near the origin. w0 is simply the perpendicular distance from the origin to the plane, and w is the unit vector that points from the origin along that perpendicular. If you now draw a vector from the origin to any point on the plane, the dot product of that vector with the unit vector w will always be equal to w0, so the equation above holds, right? This is simply the geometric definition of a plane: a unit vector w defining the perpendicular to the plane, and the distance w0 from the origin to the plane.
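If it helps to see that concretely, here is a throwaway NumPy sketch; the particular w and w0 values are arbitrary choices of mine, not from any book:

```python
import numpy as np

# An arbitrary plane (a line in 2D): w*x - w0 = 0, with w a unit vector.
w = np.array([3.0, 4.0])
w = w / np.linalg.norm(w)   # normalize so w has unit length -> [0.6, 0.8]
w0 = 2.0                    # perpendicular distance from the origin

# Two points on the plane: start at the foot of the perpendicular (w0 * w)
# and slide along a direction d that is orthogonal to w.
d = np.array([-w[1], w[0]])         # w rotated by 90 degrees
a = w0 * w + 1.5 * d
b = w0 * w - 4.0 * d

print(np.dot(w, a))        # 2.0 -> w*x equals w0 for points on the plane
print(np.dot(w, b))        # 2.0
print(np.dot(w, a - b))    # 0.0 -> w is orthogonal to any vector in the plane
```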
Now, our neuron is simply representing the same plane as described above, but we just describe the variables a little differently. We'll call the components of x our "inputs", the components of w our "weights", and we'll call the distance w0 a "bias". That's all there is to it.
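So a bare neuron's pre-activation is literally the plane equation under new names. A minimal sketch, using a made-up weight vector and bias rather than trained ones (the function name is just for illustration):

```python
import numpy as np

def neuron_preactivation(inputs, weights, bias):
    # The neuron computes w*x - w0: same plane equation, different names.
    return np.dot(weights, inputs) - bias

weights = np.array([0.6, 0.8])   # the plane's unit normal w
bias = 2.0                       # the plane's distance from the origin, w0

# A point on the plane gives exactly zero.
print(neuron_preactivation(np.array([0.0, 2.5]), weights, bias))  # 0.0
```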
Getting a little beyond your actual question: we don't really care about points on the plane; we really want to know which side of the plane a point falls on. While w*x - w0 is exactly zero on the plane, it will have positive values for points on one side of the plane and negative values for points on the other side. That's where the neuron's activation function comes in, but that's beyond your actual question.
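For completeness, here is what that sign test looks like with a step activation bolted on, reusing the same made-up weights and bias as above:

```python
import numpy as np

def perceptron(inputs, weights, bias):
    # Step activation: classify by which side of the plane the point falls on.
    return 1 if np.dot(weights, inputs) - bias > 0 else 0

weights = np.array([0.6, 0.8])
bias = 2.0

print(perceptron(np.array([3.0, 3.0]), weights, bias))  # 1: positive side
print(perceptron(np.array([0.0, 0.0]), weights, bias))  # 0: the origin is on the negative side
```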