[Rivet] Segmentation fault when running agile-runmc on the grid

Andy Buckley andy.buckley at ed.ac.uk
Thu Oct 13 16:15:34 BST 2011


Hi Sara,

This complaint about a missing pdfset_ symbol is odd: that should 
definitely be defined in libLHAPDF:

andy at duality:~$ nm heplocal/lib/libLHAPDF.so | grep pdfset_
000d0f70 T finitpdfset_
00014340 T initpdfset_
00036870 T pdfset_

(the "T" means that the symbol is in the "text" section of the library, 
i.e. it is defined in the library file rather than just declared as 
something that will be eventually found elsewhere, which would show a "U")
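
As a rough illustration of reading that output, here is a small Python sketch that splits nm lines into defined and undefined symbols. The function name classify_symbols is made up for this example, and it assumes nm's default output format. The three libLHAPDF lines are the ones shown above; the "U dlopen" line is a hypothetical entry added only to show what an unresolved reference would look like.

```python
def classify_symbols(nm_output):
    """Split `nm` output into defined and undefined symbol names.

    Assumes nm's default layout: "<address> <type> <name>" for symbols
    defined in the file, and "U <name>" (no address) for undefined ones.
    """
    defined, undefined = {}, []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            _address, sym_type, name = fields
            defined[name] = sym_type
        elif len(fields) == 2 and fields[0] == "U":
            undefined.append(fields[1])
    return defined, undefined


# First three lines are from the nm output above; the last is hypothetical.
sample = """\
000d0f70 T finitpdfset_
00014340 T initpdfset_
00036870 T pdfset_
         U dlopen
"""

defined, undefined = classify_symbols(sample)
print(defined.get("pdfset_"))   # type letter for pdfset_, "T" here
print(undefined)                # names the library expects from elsewhere
```

Running the same function over the nm output of libpythia6.so should list pdfset_ among the undefined names, since that library expects libLHAPDF to supply it.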

You did the right thing to enable the TRACE output, and indeed you see 
that libLHAPDF cannot be loaded. My suspicion is that the job running on 
the Grid node has a different architecture or compiler environment, 
which is why that library cannot be loaded. For example, if the job 
running on the Grid is in a 32 bit environment but that library is 64 
bit, then indeed the dlopen library loading will fail. Can you check that?
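
One possible way to check this from inside the Grid job is sketched below in Python. It reads the ELF class byte of the library, compares it with the pointer width of the running process, and then tries to load the library so that the loader's own error text is visible. The helper names (elf_bits, process_bits, try_load, check_library) are invented for this sketch, and it assumes Python 3 with ctypes available, which is not the case for the Python 2.4 mentioned in the thread.

```python
import ctypes
import struct

ELF_MAGIC = b"\x7fELF"


def elf_bits(header):
    """Return 32 or 64 from the first bytes of an ELF file, else None."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    return {1: 32, 2: 64}.get(header[4])


def process_bits():
    """Pointer width of the running Python process, in bits."""
    return struct.calcsize("P") * 8


def try_load(path):
    """Try to load a shared library; return (ok, message)."""
    try:
        ctypes.CDLL(path)
    except OSError as err:
        # On Linux this text generally comes from the dynamic loader.
        return False, str(err)
    return True, "loaded"


def check_library(path):
    """Print the library's ELF class, the process width, and a load attempt."""
    with open(path, "rb") as f:
        lib_bits = elf_bits(f.read(5))
    print("library: %s-bit, process: %d-bit" % (lib_bits, process_bits()))
    print("load attempt: %s, %s" % try_load(path))
```

Calling check_library("/localgrid/salderwe/TEST/lib/libLHAPDF.so") both locally and in the Grid job, and comparing the two outputs, should show whether the bit widths differ. Note that the Python interpreter itself has to match the architecture that agile-runmc runs under for the load attempt to be representative.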

Cheers,
Andy


On 13/10/11 10:16, Sara Alderweireldt wrote:
> Hello,
>
> To follow up on this issue, which is still unsolved: I ran only agile
> (submitted to the grid), with external PDF and TRACE output. It seems to
> be finding and loading everything, except (as could be expected)
> libLHAPDF.so. In that case it finds the library, but can't successfully
> load it:
>
>     AGILe.Loader: TRACE Testing for
>     /localgrid/salderwe/TEST/lib/libLHAPDF.so
>     AGILe.Loader: TRACE Found /localgrid/salderwe/TEST/lib/libLHAPDF.so
>     AGILe.Loader: TRACE Trying to load
>     /localgrid/salderwe/TEST/lib/libLHAPDF.so
>     AGILe.Loader: TRACE Failed to load
>     /localgrid/salderwe/TEST/lib/libLHAPDF.so
>
> If I run the exact same command locally (m-machines in Brussels), the
> problem is gone:
>
>     AGILe.Loader: TRACE Testing for
>     /localgrid/salderwe/TEST/lib/libLHAPDF.so
>     AGILe.Loader: TRACE Found /localgrid/salderwe/TEST/lib/libLHAPDF.so
>     AGILe.Loader: TRACE Trying to load
>     /localgrid/salderwe/TEST/lib/libLHAPDF.so
>     AGILe.Loader: TRACE Successfully loaded
>     /localgrid/salderwe/TEST/lib/libLHAPDF.so (0xb478560)
>
> To be complete, the command I ran was:
>
>     agile-runmc Pythia6:425 -b LHC:7000 -n 10 -p PYTUNE=343 -o
>     test.hepmc -l AGILe.Loader=TRACE
>
> Given this output, I don't know whether I'm still posting this question
> to the right people, maybe I need LHAPDF support instead. In any case,
> it's really puzzling me. I'd be happy with any suggestion on how to move
> forward with this problem. Would it for instance be possible to get
> error messages from LHAPDF or the system in general on what exactly goes
> wrong with this loading of libLHAPDF?
>
> Best regards,
> Sara
>
> On 10/10/2011 10:24 AM, Sara Alderweireldt wrote:
>> Hello,
>>
>> I've been running agile+rivet locally for a while now, and this week
>> attempted moving my runs to the grid. I still access my own copy of
>> agile+rivet (which locally runs fine) and use a python script which
>> calls 'agile-runmc ... &' and 'rivet ...'. If I use the PYTUNE or
>> MSTP(52) flag to set an external PDF from lhapdf, I get a segmentation
>> fault when running on the grid, and no problem when running locally.
>> If I use internal PDFs included in pythia6, everything runs fine both
>> on the grid and locally.
>>
>> At some point, I also had this segmentation fault locally, and traced
>> it back with gdb (and a lot of manual print statements) to:
>> line 471: throw runtime_error((string("Failed to load libraries: ") +
>> dlerror()).c_str());
>> in AGILe-1.3.0/src/Core/Loader.cc. Recompiling both LHAPDF and pythia6
>> solved this.
>>
>> If I comment out the runtime_error and run on the grid, I get a python
>> symbol lookup error:
>> python: symbol lookup error: mydirs/libpythia6.so: undefined symbol:
>> pdfset_
>>
>> Do you have any idea what might cause this or what I could try to
>> trace it back further? I'm entirely puzzled by the fact that
>> everything is fine when processing locally and not when submitting to
>> the grid, both methods are accessing the same hard drive with the
>> agile & rivet distributions on it. I tried tracking from the
>> agile-runmc script and got to FPythia.cc which calls PYEVT (at which
>> point the symbol lookup error arrives, no events are produced), but I
>> can't figure out where it goes wrong exactly or how to solve it.
>>
>> I checked when running on the grid what the output of 'lhapdf-config
>> --pdfsets-path' is, and it is returning the correct folder and the
>> needed LHpdf file is there. If versions are relevant, I'm using agile
>> 1.3.0, rivet 1.6.0, pythia 6.425, lhapdf 5.8.6, python 2.4.3 and gcc
>> 4.1.2. I hope you can shed some light on this.
>>
>> Best regards and thanks in advance,
>> Sara
>>
>> --
>>
>> Sara Alderweireldt sara.alderweireldt at ua.ac.be
>> Universiteit Antwerpen Phone: +32 (0)3 265 3577
>> CGB.U.237 - Physics
>> Groenenborgerlaan 171
>> 2020 Antwerpen http://www.ua.ac.be/edf
>> Belgium
>>


-- 
Dr Andy Buckley
SUPA Advanced Research Fellow
Particle Physics Experiment Group, University of Edinburgh

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


