What's a good web crawler tool?

Glenn Slaven · Oct 7, 2008 · Viewed 55k times

I need to index a large number of web pages. What good web crawler utilities are there? I'd prefer something that .NET can talk to, but that's not a showstopper.

What I really need is something I can give a site URL to, and it will follow every link and store the content for indexing.

Answer

anjanb · Oct 7, 2008

HTTrack (http://www.httrack.com/) is a very good website copier. It works well, and I have been using it for a long time.

Nutch (http://lucene.apache.org/nutch/) is a web crawler (a crawler is the type of program you're looking for). It is built on Lucene, a top-notch search library.
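
If it helps to see what "give it a site URL, follow every link, store the content" boils down to, here is a rough sketch in Python using only the standard library. It is not how HTTrack or Nutch work internally, just a minimal breadth-first crawl limited to one host. The names `LinkParser`, `fetch`, `crawl`, and the `max_pages` limit are made up for this example, and it assumes pages are UTF-8 HTML.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its body as text (illustrative helper)."""
    with urlopen(url, timeout=10) as response:
        # Assumption: pages are UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl of a single host; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to download
        pages[url] = html  # "store the content" for a later indexing step
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            link, _ = urldefrag(urljoin(url, href))
            # Stay on the starting host and never queue a URL twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

You would call it as `pages = crawl("http://www.example.com/")` and hand the resulting dictionary to whatever indexer you use. The sketch ignores robots.txt, rate limiting, and non-HTML content, which is a large part of why a dedicated tool is the better choice for real work.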