[Rivet] Segmentation fault when running agile-runmc on the grid
Sara Alderweireldt sara.alderweireldt at ua.ac.be
Fri Oct 14 13:05:13 BST 2011

Hi Andy,

I tried running the nm command before, and locally it returns a T, like
you report, whereas on the grid it returns a U. I don't know what causes
this difference, but now that you explain the symbols, it seems to be
significant.

locally: "000000000003e210 T pdfset_"
on the grid: " U pdfset_"

I checked again (and asked the grid admins) and in principle all
machines and worker nodes are supposed to have the same architecture and
compiler environment, all slc5 x86_64 with gcc 4.1.2 and python 2.4.3.

More ideas :)?

Cheers,
Sara

On 10/13/2011 05:15 PM, Andy Buckley wrote:
> Hi Sara,
>
> This complaint about a missing pdfset_ symbol is odd: that should
> definitely be defined in libLHAPDF:
>
> andy at duality:~$ nm heplocal/lib/libLHAPDF.so | grep pdfset_
> 000d0f70 T finitpdfset_
> 00014340 T initpdfset_
> 00036870 T pdfset_
>
> (the "T" means that the symbol is in the "text" section of the
> library, i.e. it is defined in the library file rather than just
> declared as something that will be eventually found elsewhere, which
> would show a "U")
>
> You did the right thing to enable the TRACE output, and indeed you see
> that libLHAPDF cannot be loaded. My suspicion is that the job running
> on the Grid node has a different architecture or compiler environment,
> which is why that library cannot be loaded. For example, if the job
> running on the Grid is in a 32 bit environment but that library is 64
> bit, then indeed the dlopen library loading will fail. Can you check
> that?
>
> Cheers,
> Andy
>
>
> On 13/10/11 10:16, Sara Alderweireldt wrote:
>> Hello,
>>
>> To continue on this issue, it's still unsolved, I ran only agile
>> (submitted to the grid), with external PDF and TRACE output. It seems to
>> be finding and loading everything, except (as could be expected)
>> libLHAPDF.so. In that case it finds the library, but can't successfully
>> load it:
>>
>> AGILe.Loader: TRACE Testing for
>> /localgrid/salderwe/TEST/lib/libLHAPDF.so
>> AGILe.Loader: TRACE Found /localgrid/salderwe/TEST/lib/libLHAPDF.so
>> AGILe.Loader: TRACE Trying to load
>> /localgrid/salderwe/TEST/lib/libLHAPDF.so
>> AGILe.Loader: TRACE Failed to load
>> /localgrid/salderwe/TEST/lib/libLHAPDF.so
>>
>> If I run the exact same command locally (m-machines in Brussels), the
>> problem is gone:
>>
>> AGILe.Loader: TRACE Testing for
>> /localgrid/salderwe/TEST/lib/libLHAPDF.so
>> AGILe.Loader: TRACE Found /localgrid/salderwe/TEST/lib/libLHAPDF.so
>> AGILe.Loader: TRACE Trying to load
>> /localgrid/salderwe/TEST/lib/libLHAPDF.so
>> AGILe.Loader: TRACE Successfully loaded
>> /localgrid/salderwe/TEST/lib/libLHAPDF.so (0xb478560)
>>
>> To be complete, the command I ran was:
>>
>> agile-runmc Pythia6:425 -b LHC:7000 -n 10 -p PYTUNE=343 -o
>> test.hepmc -l AGILe.Loader=TRACE
>>
>> Given this output, I don't know whether I'm still posting this question
>> to the right people, maybe I need LHAPDF support instead. In any case,
>> it's really puzzling me. I'd be happy with any suggestion on how to move
>> forward with this problem. Would it for instance be possible to get
>> error messages from LHAPDF or the system in general on what exactly goes
>> wrong with this loading of libLHAPDF?
>>
>> Best regards,
>> Sara
>>
>> On 10/10/2011 10:24 AM, Sara Alderweireldt wrote:
>>> Hello,
>>>
>>> I've been running agile+rivet locally for a while now, and this week
>>> attempted moving my runs to the grid. I still access my own copy of
>>> agile+rivet (which locally runs fine) and use a python script which
>>> calls 'agile-runmc ... &' and 'rivet ...'. If I use the PYTUNE or
>>> MSTP(52) flag to set an external PDF from lhapdf, I get a segmentation
>>> fault when running on the grid, and no problem when running locally.
>>> If I use internal PDFs included in pythia6, everything runs fine both
>>> on the grid and locally.
>>>
>>> At some point, I also had this segmentation fault locally, and traced
>>> it back with gdb (and a lot of manual print statements) to:
>>> /line 471 throw runtime_error((string("Failed to load libraries: ") +
>>> dlerror()).c_str());/
>>> in AGILe-1.3.0/src/Core/Loader.cc. Recompiling both LHAPDF and pythia6
>>> solved this.
>>>
>>> If I comment out the runtime_error and run on the grid, I get a python
>>> symbol lookup error:
>>> python: symbol lookup error: mydirs/libpythia6.so: undefined symbol:
>>> pdfset_
>>>
>>> Do you have any idea what might cause this or what I could try to
>>> trace it back further? I'm entirely puzzled by the fact that
>>> everything is fine when processing locally and not when submitting to
>>> the grid, both methods are accessing the same hard drive with the
>>> agile & rivet distributions on it. I tried tracking from the
>>> agile-runmc script and got to FPythia.cc which calls PYEVT (at which
>>> point the symbol lookup error arrives, no events are produced), but I
>>> can't figure out where it goes wrong exactly or how to solve it.
>>>
>>> I checked when running on the grid what the output of 'lhapdf-config
>>> --pdfsets-path' is, and it is returning the correct folder and the
>>> needed LHpdf file is there. If versions are relevant, I'm using agile
>>> 1.3.0, rivet 1.6.0, pythia 6.425, lhapdf 5.8.6, python 2.4.3 and gcc
>>> 4.1.2. I hope you can shed some light on this.
>>>
>>> Best regards and thanks in advance,
>>> Sara

--
Sara Alderweireldt
sara.alderweireldt at ua.ac.be

Universiteit Antwerpen
CGB.U.237 - Physics
Groenenborgerlaan 171
2020 Antwerpen
Belgium

Phone: +32 (0)3 265 3577
http://www.ua.ac.be/edf
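
The nm check discussed in this thread (a "T" entry means the symbol is defined in the file, a "U" entry means it is only referenced) can be scripted so that exactly the same test runs locally and inside a grid job. What follows is a minimal sketch, not part of AGILe, Rivet or LHAPDF. The function names parse_nm_line, symbol_status and check_library are made up for illustration, and it assumes binutils nm is on the PATH and a Python interpreter recent enough to provide subprocess.check_output, which the python 2.4.3 mentioned in the thread is not.

```python
import subprocess


def parse_nm_line(line):
    """Split one line of nm output into (address, type, name).

    nm prints no address for symbols that are undefined in the file,
    so such lines have two fields instead of three. Lines that do not
    look like symbol entries give None.
    """
    fields = line.split()
    if len(fields) == 3:
        address, sym_type, name = fields
        return address, sym_type, name
    if len(fields) == 2:
        sym_type, name = fields
        return None, sym_type, name
    return None


def symbol_status(nm_output, symbol):
    """Return 'defined', 'undefined' or 'absent' for an exact symbol name."""
    for line in nm_output.splitlines():
        parsed = parse_nm_line(line)
        if parsed is None or parsed[2] != symbol:
            continue
        address = parsed[0]
        # No address means the file only references the symbol (e.g. "U").
        return "undefined" if address is None else "defined"
    return "absent"


def check_library(path, symbol):
    """Run nm on a library file (hypothetical helper) and classify symbol."""
    output = subprocess.check_output(["nm", path])
    return symbol_status(output.decode("ascii", "replace"), symbol)
```

Calling check_library("/localgrid/salderwe/TEST/lib/libLHAPDF.so", "pdfset_") in both environments and comparing the two answers reproduces the manual comparison. Matching on the exact name avoids confusing pdfset_ with initpdfset_ or finitpdfset_, which a plain grep would also pick up.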