I am trying to configure the whalebot crawler from the tar file whalebot-0.02.00.tar.gz. I have extracted it correctly with:
root@Admin1:~/dls# tar xvzf whalebot-0.02.00.tar.gz
After that I tried to configure it with:
root@Admin1:~/dls/whalebot# ./configure
It gives me the error:
bash: ./configure: No such file or directory
I have also run the command:
root@Admin1:~/dls/whalebot# cmake ./
It gives me the following result:
root@Admin1:~/dls/whalebot# cmake ./
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Boost version: 1.44.0
-- Found the following Boost libraries:
-- filesystem
-- system
-- thread
-- program_options
-- date_time
CMake Warning (dev) at webspider/CMakeLists.txt:25 (link_directories):
This command specifies the relative path
../statsem_string/bin
as a link directory.
Policy CMP0015 is not set: link_directories() treats paths relative to the
source dir. Run "cmake --help-policy CMP0015" for policy details. Use the
cmake_policy command to set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at webspider/CMakeLists.txt:25 (link_directories):
This command specifies the relative path
../3dparty/google-url
as a link directory.
Policy CMP0015 is not set: link_directories() treats paths relative to the
source dir. Run "cmake --help-policy CMP0015" for policy details. Use the
cmake_policy command to set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
HTMLCXX_LIBRARY
linked by target "whalebot" in directory /root/dls/whalebot/webspider
-- Configuring incomplete, errors occurred!
How do I proceed?
It appears that CMake is unable to find the htmlcxx library.
In the whalebot documentation, htmlcxx is listed as a dependency.
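One way to confirm this diagnosis is to look for the library file in the usual library directories. The following is only a sketch: find_library is a hypothetical helper, and the file name libhtmlcxx.a or libhtmlcxx.so is an assumption about how htmlcxx names its library.

```shell
# Hypothetical helper: print the first lib<name>.a or lib<name>.so found
# in the given directories; return non-zero if none is found.
find_library() {
    name="$1"
    shift
    for dir in "$@"; do
        for ext in a so; do
            if [ -e "$dir/lib$name.$ext" ]; then
                printf '%s\n' "$dir/lib$name.$ext"
                return 0
            fi
        done
    done
    return 1
}
```

For example, find_library htmlcxx /usr/lib /usr/local/lib prints the path of the library if it is present in either directory and prints nothing otherwise.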
You need to download htmlcxx, unzip it, then install it:
cd <path to unzipped htmlcxx>
./configure --enable-static=on --enable-shared=off
make
sudo make install
You may need to add #include <cstddef> to the top of html/tree.h to get it to build successfully. It will install to /usr/local/ by default.
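If you prefer not to edit the header by hand, the include can be prepended from the shell. This is a minimal sketch; add_cstddef_include is a hypothetical helper name, and it only adds the line when the file does not already contain it.

```shell
# Hypothetical helper: prepend "#include <cstddef>" to a header file
# unless the file already contains that line.
add_cstddef_include() {
    header="$1"
    if ! grep -q '^#include <cstddef>' "$header"; then
        tmp="$(mktemp)" || return 1
        { printf '#include <cstddef>\n'; cat "$header"; } > "$tmp"
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$header"
        rm -f "$tmp"
    fi
}
```

From the unzipped htmlcxx directory it would be called as add_cstddef_include html/tree.h before running make.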
You also need icu installed if you don't already have it:
sudo apt-get install libicu-dev
Finally, you can build and install whalebot. Again, the make step might fail if you have a reasonably up-to-date Boost installation. In line 57 of webspider/src/webspider_options.cpp, you need to replace boost::filesystem::initial_path().native_directory_string() with boost::filesystem::initial_path().string(). Then you should be good to build and install:
cd <path to unzipped whalebot>
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
sudo make install
This too will install to /usr/local/ by default.
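The source change to webspider/src/webspider_options.cpp described before the build steps can also be scripted with sed. This is a sketch only: use_path_string is a hypothetical helper, and it rewrites every .native_directory_string() call in the given file, not just the one on line 57, which is an assumption about what you want.

```shell
# Hypothetical helper: replace .native_directory_string() with .string()
# everywhere in the given source file.
use_path_string() {
    src="$1"
    tmp="$(mktemp)" || return 1
    if sed 's/\.native_directory_string()/.string()/g' "$src" > "$tmp"; then
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$src"
    fi
    rm -f "$tmp"
}
```

From the unzipped whalebot directory it would be called as use_path_string webspider/src/webspider_options.cpp before running cmake and make.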