I'm working on some Python code that extracts image data from an ECW file using GDAL (http://www.gdal.org/) and its Python bindings. GDAL was built from source to have ECW support.
The program is run on a cluster server that I ssh into. I have tested the program through the ssh terminal and it runs fine. However, I would now like to submit a job to the cluster using qsub, but it reports the following:
Traceback (most recent call last):
  File "./gdal-test.py", line 5, in <module>
    from osgeo import gdal
  File "/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/__init__.py", line 21, in <module>
    _gdal = swig_import_helper()
  File "/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/__init__.py", line 17, in swig_import_helper
    _mod = imp.load_module('_gdal', fp, pathname, description)
ImportError: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: undefined symbol: H5Eset_auto2
I did a bit more digging and tried using LD_DEBUG=symbols to work out where the difference was, but that's about as far as my knowledge/understanding has got me.
For reference, here's what happens with LD_DEBUG=symbols when running the code in the ssh terminal (piping through grep H5Eset_auto2 to reduce some of the output):
Symbol debug output for code running in ssh terminal:
11359: symbol=H5Eset_auto2; lookup in file=/usr/bin/python26 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libstdc++.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libgcc_s.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/bin/python26 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libstdc++.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libgcc_s.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
11359: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
11359: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
Symbol debug output for code submitted using qsub:
16915: symbol=H5Eset_auto2; lookup in file=/usr/bin/python26 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/home/h3/ctargett/.local/lib/python2.6/site-packages/GDAL-1.11.1-py2.6-linux-x86_64.egg/osgeo/_gdal.so [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libstdc++.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libm.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libgcc_s.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libpthread.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libdl.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libutil.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libjpeg.so.62 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpng12.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libpq.so.4 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libcurl.so.3 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libgssapi_krb5.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libkrb5.so.3 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libk5crypto.so.3 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libcom_err.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libidn.so.11 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libssl.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libcrypto.so.6 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSEcw.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSEcwC.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSCnet.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libNCSUtil.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/librt.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libxml2.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/mnt/aeropix/prgs/.local/lib/libz.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libcrypt.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libresolv.so.2 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libnsl.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/usr/lib64/libkrb5support.so.0 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libkeyutils.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libselinux.so.1 [0]
16915: symbol=H5Eset_auto2; lookup in file=/lib64/libsepol.so.1 [0]
16915: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: error: symbol lookup error: undefined symbol: H5Eset_auto2 (fatal)
ImportError: /mnt/aeropix/prgs/.local/lib/libgdal.so.1: undefined symbol: H5Eset_auto2
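The two traces above can also be compared mechanically rather than by eye. The following is only a minimal sketch, written for Python 3 (not the Python 2.6 on the cluster); the function names lookups and searched_after are made up for illustration. It rests on the assumption that the dynamic linker stops searching at the first file that defines the symbol, so any files listed after libhdf5.so.7 suggest that the copy of the library seen by that process did not define it.

```python
import re

# Matches LD_DEBUG=symbols lines such as:
#   16915: symbol=H5Eset_auto2; lookup in file=/lib64/libc.so.6 [0]
LOOKUP_RE = re.compile(r"^\s*(\d+):\s+symbol=([^;]+);\s+lookup in file=(\S+)")


def lookups(trace_text, symbol):
    """Return every file searched for `symbol`, in order, repeats included."""
    files = []
    for line in trace_text.splitlines():
        match = LOOKUP_RE.match(line)
        if match and match.group(2) == symbol:
            files.append(match.group(3))
    return files


def searched_after(trace_text, symbol, library):
    """Return the files searched after the last lookup in `library`.

    Returns None if `library` was never searched for `symbol`.
    An empty list means the search ended at `library`.
    """
    files = lookups(trace_text, symbol)
    if library not in files:
        return None
    last = len(files) - 1 - files[::-1].index(library)
    return files[last + 1:]
```

Fed the interactive trace, searched_after(trace, "H5Eset_auto2", "/mnt/aeropix/prgs/.local/lib/libhdf5.so.7") should come back empty, while the qsub trace should yield everything from /usr/lib64/libjpeg.so.62 onwards.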
I guess I'm not sure why it seems to stop looking in libgdal.so.1 when submitted using qsub, when it continues to look when just run in the terminal. I also note that the qsub job is able to correctly locate libhdf5.so.7 (which is where it should find H5Eset_auto2), as it can find a different symbol, H5Eprint:
16915: symbol=H5Eprint; lookup in file=/usr/lib64/libpython2.6.so.1.0 [0]
16915: symbol=H5Eprint; lookup in file=/mnt/aeropix/prgs/.local/lib/libgdal.so.1 [0]
16915: symbol=H5Eprint; lookup in file=/usr/lib64/libstdc++.so.6 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libm.so.6 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libgcc_s.so.1 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libpthread.so.0 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libc.so.6 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libdl.so.2 [0]
16915: symbol=H5Eprint; lookup in file=/lib64/libutil.so.1 [0]
16915: symbol=H5Eprint; lookup in file=/mnt/aeropix/prgs/.local/lib/libhdf5.so.7 [0]
Any pointers on this would be incredibly useful at this stage (I hope that's enough information - I'm more than happy to provide more information, I'm just not sure what else might be useful at this stage).
EDIT:
It seems that the contents of /usr/bin are different for jobs submitted using qsub (specifically, libtool is missing). This is being investigated.
After about two dozen comments to understand the situation, it turned out that libhdf5.so.7 was actually a symlink (with several levels of indirection) to a file that was not shared between the queued processes and the interactive processes. So even though the symlink itself lives on a shared filesystem, the file it points to does not, and as a result the two kinds of process were seeing different versions of the library.
For future reference: besides checking LD_LIBRARY_PATH, it's always a good idea to inspect a library with nm -D to see whether the symbols actually exist. In this case they existed in interactive mode but not when run in the queue. A quick md5sum revealed that the files were actually different.
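The same checks can be scripted so that one file runs unchanged both interactively and as a qsub job. This is a rough sketch rather than what was actually run in the comments: it assumes Python 3 and GNU nm on the PATH, and the helper names fingerprint and exports_symbol are invented here. It follows every level of symlink to the real file, hashes that file, and asks nm -D whether the symbol is defined.

```python
import hashlib
import os
import subprocess


def fingerprint(path):
    """Resolve all symlink levels and return (real path, MD5 of that file)."""
    real = os.path.realpath(path)
    digest = hashlib.md5()
    with open(real, "rb") as handle:
        # Read in 1 MiB chunks so large libraries are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return real, digest.hexdigest()


def exports_symbol(path, symbol):
    """Return True if `nm -D --defined-only` lists `symbol` for `path`."""
    output = subprocess.check_output(
        ["nm", "-D", "--defined-only", path], universal_newlines=True
    )
    for line in output.splitlines():
        fields = line.split()
        # The symbol name is the last field; drop any "@VERSION" suffix.
        if fields and fields[-1].split("@")[0] == symbol:
            return True
    return False
```

Printing fingerprint("/mnt/aeropix/prgs/.local/lib/libhdf5.so.7") and exports_symbol(the same path, "H5Eset_auto2") from the ssh terminal and from a queued job should show whether both environments end up at the same file. A different real path or a different hash points to the kind of mismatch described above.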