[Rivet] Sherpa fails to run on batch system

Holger Schulz holger.schulz at physik.hu-berlin.de
Wed Aug 27 11:44:44 BST 2008


Hi,

I am currently having trouble running Sherpa via Rivet on the DESY
batch farm. Once again, it seems to be a problem with loading libraries.

There are no problems with Pythia6: it works both when I run jobs
interactively and when I submit them to the batch farm (PBS).

However, Sherpa jobs submitted to the batch farm via Rivet segfault,
although the same jobs run smoothly interactively.

Both the interactive machines and the batch farm nodes are of type
sl5_amd64_gcc41.

Here is the gdb backtrace. I also enabled TRACE logging for AGILe.Loader:

AGILe.Loader: TRACE  Trying to load /afs/ifh.de/group/atlas/users/scratch/hschulz/Software/lib/libAGILeSherpa.so
AGILe.Loader: TRACE  Successfully loaded /afs/ifh.de/group/atlas/users/scratch/hschulz/Software/lib/libAGILeSherpa.so (0x1999ef90)
AGILe.Loader: TRACE  Setting AGILe module handle for /afs/ifh.de/group/atlas/users/scratch/hschulz/Software/lib/libAGILeSherpa.so (0x1999ef90)

Program received signal SIGSEGV, Segmentation fault.
0x0000003add278350 in strlen () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003add278350 in strlen () from /lib64/libc.so.6
#1  0x00002aaaaacee47d in AGILe::Loader::loadGenLibs () from /afs/ifh.de/group/atlas/users/scratch/hschulz/Software/lib/libAGILe.so.2
#2  0x000000000040b871 in Rivet::generate ()
#3  0x0000000000409348 in main ()
(gdb)


And this is what valgrind says:
AGILe.Loader: TRACE  Testing for /afs/cern.ch/sw/lcg/external/MCGenerators/lhapdf/5.4.0/slc5_amd64_gcc41/lib/libLHAPDF.so
AGILe.Loader: TRACE  Testing for /afs/cern.ch/sw/lcg/external/MCGenerators/lhapdf/5.4.0.2/lib/libLHAPDF.so
AGILe.Loader: TRACE  Testing for /afs/cern.ch/sw/lcg/external/MCGenerators/lhapdf/5.4.0/lib/libLHAPDF.so
==5208== Process terminating with default action of signal 11 (SIGSEGV)
==5208==  Access not within mapped region at address 0x0
==5208==    at 0x4A066C2: strlen (mc_replace_strmem.c:246)
==5208==    by 0x4E4B47C: AGILe::Loader::loadGenLibs(std::string const&) (in /afs/ifh.de/group/atlas/users/scratch/hschulz/Software/lib/libAGILe.so.2.0.0)
==5208==    by 0x40B870: Rivet::generate(Rivet::Configuration&, Rivet::Log&) (in /afs/ifh.de/group/atlas/users/scratch/hschulz/Software/bin/rivetgun)
==5208==    by 0x409347: main (in /afs/ifh.de/group/atlas/users/scratch/hschulz/Software/bin/rivetgun)
==5208==
==5208== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 5 from 1)
==5208== malloc/free: in use at exit: 274,773 bytes in 2,828 blocks.
==5208== malloc/free: 22,808 allocs, 19,980 frees, 1,353,078 bytes allocated.
==5208== For counts of detected errors, rerun with: -v
==5208== searching for pointers to 2,828 not-freed blocks.
==5208== checked 5,049,712 bytes.



Could this be due to some environment variables not being set correctly? 
I simply source my .zshrc in the batch script.
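
One way to test this hypothesis is to dump the environment in both settings
(for example with "env > some_file" in an interactive shell, and again inside
the batch script after sourcing .zshrc) and compare the two dumps. A minimal
Python sketch follows; the helper names are hypothetical, and it assumes one
NAME=value pair per line, so multi-line values would be mis-parsed.

```python
def parse_env_dump(text):
    """Parse the output of `env` (one NAME=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if sep and name:  # skip lines that contain no '=' at all
            env[name] = value
    return env


def diff_env(interactive, batch):
    """Compare two environment dicts.

    Returns (missing, changed): names set interactively but absent in the
    batch job, and names present in both with different values.
    """
    missing = sorted(set(interactive) - set(batch))
    changed = sorted(
        name for name in interactive
        if name in batch and interactive[name] != batch[name]
    )
    return missing, changed
```

Variables reported as missing in the batch dump would be the first suspects.
An unset variable read without a check yields a null pointer in C/C++, which
would be consistent with the strlen crash at address 0x0 in the backtrace.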

Any ideas? This leaves me absolutely clueless :(

Holger





