If I define my own method of determining the similarity between two input entities of my Support Vector Machine classifier, and thus define it as my kernel, how do I verify if it is indeed a valid kernel that I can use?
For example, if my inputs are strings and the kernel I choose is, let's say, some kind of string distance metric, how can I decide whether I can use it for my SVM? I know there are some conditions for a valid SVM kernel. Can anyone tell me what they are and how one goes about verifying them?
The most straightforward test is based on the following: a kernel function is valid if and only if it is symmetric and the kernel (Gram) matrix for every finite set of data points is positive semidefinite, that is, has only non-negative eigenvalues (Mercer's condition). You cannot check every possible set, but you can test the condition empirically by taking a reasonably large set of data points and checking whether it holds. For example, if you selected 2000 data samples at random, created their corresponding 2000x2000 kernel matrix, and observed that it had no negative eigenvalues (apart from tiny negative values attributable to floating-point round-off), then it is likely that you have a legitimate kernel, although this is evidence rather than proof. Conversely, if any eigenvalue is clearly negative, then the candidate kernel function is definitely not a valid kernel.
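As a rough illustration of that eigenvalue test, here is a minimal Python sketch using NumPy. The helper names (gram_matrix, looks_like_valid_kernel), the tolerance value, the word list, and the two example string kernels are all assumptions made up for this example, not part of any library. The first kernel counts shared character bigrams, which is an inner product of count vectors and so should pass. The second is a negated length difference, a distance-like score that should fail.

```python
from collections import Counter

import numpy as np


def gram_matrix(kernel, samples):
    """Build the n x n kernel matrix K[i, j] = kernel(samples[i], samples[j])."""
    n = len(samples)
    K = np.empty((n, n), dtype=float)
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(samples[i], samples[j])
    return K


def looks_like_valid_kernel(kernel, samples, tol=1e-8):
    """Empirical check on one sample set: symmetric, no clearly negative eigenvalue.

    Returns (passed, smallest_eigenvalue). Passing is evidence, not proof;
    failing shows the function is not a valid kernel.
    The tolerance `tol` is an arbitrary choice for this sketch.
    """
    K = gram_matrix(kernel, samples)
    if not np.allclose(K, K.T):
        return False, None  # not symmetric, so not a valid kernel
    eigenvalues = np.linalg.eigvalsh(K)  # eigenvalues of a symmetric matrix
    min_eig = float(eigenvalues.min())
    # Scale the tolerance so round-off on large matrices is not flagged.
    scale = max(1.0, float(np.abs(eigenvalues).max()))
    return bool(min_eig >= -tol * scale), min_eig


def bigram_kernel(s, t):
    """Hypothetical string kernel: inner product of character-bigram counts."""
    a = Counter(s[i:i + 2] for i in range(len(s) - 1))
    b = Counter(t[i:i + 2] for i in range(len(t) - 1))
    return float(sum(a[g] * b[g] for g in a))


def negative_length_distance(s, t):
    """Hypothetical distance-based score; it is not a valid kernel."""
    return -float(abs(len(s) - len(t)))


words = ["kernel", "kennel", "colonel", "svm", "support", "vector", "machine"]
print(looks_like_valid_kernel(bigram_kernel, words))
print(looks_like_valid_kernel(negative_length_distance, words))
```

The second kernel fails for a simple reason: its matrix has a zero diagonal, so its eigenvalues sum to zero, and since the matrix is not all zeros at least one eigenvalue must be negative. In practice you would run the check on a much larger random sample, as described above.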