I don't want sound-to-text software. What I need is the following:
Do you know of such a software library? An LGPL license would be most valuable to me, but I can go with a commercial license as well.
The audio clips will contain music, speech, sound effects, or any combination of these, so speech recognition is out of the question.
Architecture: C++, with C# for glue code, and CUDA if possible.
I have not found any libraries (yet), but I did find two interesting papers that may give you the terminology and background to refine your searches:
EDIT: Searching for "audio fingerprinting" led to a page listing implementations, both open source and commercial.
Here is an introduction to audio fingerprinting.
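To make the term concrete, here is a minimal C++ sketch of one well-known fingerprinting scheme: a simplified take on the approach published by Haitsma and Kalker (Philips). It is not code from any of the papers or implementations mentioned above, and the function names `bandEnergies`, `fingerprint` and `bitErrorRate` are hypothetical. The idea is to split the audio into overlapping frames, measure the energy in a set of frequency bands, and emit one 32-bit sub-fingerprint per frame from the signs of energy differences across both frequency and time. Two clips are compared by the fraction of differing bits. For brevity this assumes mono float samples, linearly spaced bands over the whole spectrum, and a naive DFT. The published scheme uses an FFT and logarithmically spaced bands over a limited frequency range.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Energy in each of `numBands` equal-width frequency bands of one frame.
// Uses a naive O(N^2) DFT for clarity; a real implementation would use an FFT.
std::vector<double> bandEnergies(const std::vector<float>& frame, int numBands) {
    const std::size_t n = frame.size();
    const std::size_t bins = n / 2;  // positive-frequency bins (assumes bins > 1)
    const double pi = std::acos(-1.0);
    std::vector<double> energy(numBands, 0.0);
    for (std::size_t k = 1; k < bins; ++k) {  // skip the DC bin
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            re += frame[t] * std::cos(angle);
            im -= frame[t] * std::sin(angle);
        }
        // Map bin k (1..bins-1) linearly onto band 0..numBands-1.
        const std::size_t band = (k - 1) * static_cast<std::size_t>(numBands) / (bins - 1);
        energy[band] += re * re + im * im;
    }
    return energy;
}

// One 32-bit sub-fingerprint per frame (after the first). Bit m is set when
// (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1]) > 0, using 33 bands.
std::vector<std::uint32_t> fingerprint(const std::vector<float>& samples,
                                       std::size_t frameSize, std::size_t hop) {
    const int numBands = 33;
    std::vector<std::uint32_t> result;
    std::vector<double> prev;
    for (std::size_t start = 0; start + frameSize <= samples.size(); start += hop) {
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(start);
        const std::vector<float> frame(first, first + static_cast<std::ptrdiff_t>(frameSize));
        std::vector<double> cur = bandEnergies(frame, numBands);
        if (!prev.empty()) {
            std::uint32_t bits = 0;
            for (int m = 0; m < 32; ++m) {
                const double d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1]);
                if (d > 0) bits |= (1u << m);
            }
            result.push_back(bits);
        }
        prev = std::move(cur);
    }
    return result;
}

// Fraction of differing bits over the overlapping part of two fingerprints:
// 0.0 means identical, values near 0.5 mean unrelated audio.
double bitErrorRate(const std::vector<std::uint32_t>& a,
                    const std::vector<std::uint32_t>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    if (n == 0) return 1.0;  // nothing to compare
    std::size_t diff = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t x = a[i] ^ b[i];
        while (x != 0) { x &= x - 1; ++diff; }  // count set bits
    }
    return static_cast<double>(diff) / (32.0 * static_cast<double>(n));
}
```

Usage would look like `bitErrorRate(fingerprint(clipA, 256, 128), fingerprint(clipB, 256, 128))`, with a match declared below some threshold you tune on your own data. Because each bit depends only on the sign of an energy difference, the fingerprint is unchanged by a uniform gain change. Also, the per-frame band energies are independent of each other, which is the part that would map most naturally onto CUDA.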