I've tried the WebSphinx application.
I've noticed that if I use wikipedia.org as the starting URL, it does not crawl any further.
So how do I actually crawl the entire Wikipedia? Can anyone give me some guidelines? Do I need to find specific URLs myself and supply multiple starting URLs?
Does anyone have suggestions for a good website with a tutorial on using WebSphinx's API?
If your goal is to crawl all of Wikipedia, you might want to look at the available database dumps. See http://download.wikimedia.org/.
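If you go the dump route, here is a rough sketch of how you might stream through a downloaded dump in Python instead of crawling. It assumes the dump is a bz2-compressed MediaWiki XML export in which each `<page>` element contains a `<title>` and a `<revision>` with a `<text>` element. The helper names (`iter_pages`, `iter_dump`) and the file name in the usage note are my own placeholders, not part of any Wikimedia tooling.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop the "{namespace}" prefix ElementTree puts on tags,
    # so the code does not depend on the export schema version.
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> in a MediaWiki XML export stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Release the finished page's children to keep memory use down.
            elem.clear()


def iter_dump(path):
    """Hypothetical convenience wrapper: read a .xml.bz2 dump from disk."""
    with bz2.open(path, "rb") as f:
        yield from iter_pages(f)
```

Usage would look something like `for title, text in iter_dump("pages-articles.xml.bz2"): ...`, where the file name stands in for whichever dump file you downloaded. The text you get back is raw wiki markup, so you would still need to extract links or plain text yourself.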