[Rivet] Bugfixes

Andy Buckley andy.buckley at cern.ch
Tue Dec 2 13:35:56 GMT 2014


On 02/12/14 13:20, Andrii Verbytskyi wrote:
> 
> Dear Andy,
> 
> 0)
>> Hmm, that's very strange. An autotools tarball shouldn't need the user
>> to have autotools installed at all: the configure etc. scripts should be
>> pure sh, and the Makefile.in templates just get transformed into proper
>> Makefiles by the scripts.
> 
> For some reason I cannot reproduce it.
> 
> 1)
>>> 1)
>>>>> 1) Compilation with
>>>>>  ./configure --prefix=/usr --libdir=/usr/lib64 --enable-root 
>>>>> doesn't work at all for me --  some Cython files are missing, but
>>>>> I don't need python, I'm interested in ROOT only, so I skip it.
>>>>
>>>> Ah, good point -- ROOT compatibility must have been disabled when I made
>>>> the tarball. We'll fix that with the next release.
>>> OK. I mean I don't need it and therefore I don't debug it.
>>
>> Oh, I thought you meant that the rootcompat.cpp file was missing. What
>> Cython files were reported as missing when Python wasn't disabled? It
>> worked for the LCG software group when they made their build for
>> experiment use. We do require a new version of Cython for anyone who
>> wants to rebuild the interface, but like the autotools scripts this
>> shouldn't be needed for just building the tarball.
>
> I suspect this file was used for generation of core.cpp
> "/home/andy/proj/hep/yoda/pyext/yoda/include/Functions.pyx";
> And if one touches the code make tries to rebuild core.cpp
> --> fail
> 
> core.pyx:43:0: 'include/Functions.pyx' not found
> 
> Error compiling Cython file:

Thanks very much; I'll look into that.


>> Thanks :-)
>>
>>> Also...
>>> This may be of interest to you: at the recent HERA workshop Rivet
>>> was mentioned multiple times, and some people (e.g. Hannes Jung)
>>> were trying to convince the others to submit their analyses to the
>>> Rivet repository.
>>> To me that is a VERY good idea, but it requires some manpower. Few
>>> people will do that. And if it requires learning some very new
>>> histogramming system like YODA, the chances of having the analysis in
>>> Rivet go even lower. Actually nobody at the HERA symposium (I've asked
>>> them!) was familiar with YODA, even Hannes, as far as I understood.
>>
>> Rivet uptake is actually increasing fairly well -- it's still something
>> that a minority of experimentalists have hands-on experience with, but a
>> surprising number have embraced it. And I've had quite a bit of feedback
>> about how nice the interface is, including praise for YODA: people are
>> surprised that they don't need to rewrite the workarounds that ROOT
>> requires!
>>
>> Unfortunately improving an interface means breaking compatibility with
>> bad designs, and all my experience with ROOT tells me it's a bad design.
>> From the Rivet user point of view, there is virtually no difference
>> between ROOT histogram objects and YODA ones, except that you don't have
>> to write workarounds for the ROOT global state or non-handling of
>> weighted fills and bin widths/areas. Internally, work that we are doing
>> for handling multi-weighted events would be much more difficult if we
>> had to deal with the ROOT system's approach to histogram object
>> ownership, uniqueness, and lifetimes. Having support for several
>> different histogram codes & formats in Rivet would be incoherent, which
>> is against our design aims, 
> 
> Coding aims are less important than scientific aims. If incoherent
> code does the work better, let it be.

We already have one of those in ROOT. We're trying to do things a bit
more coherently here, which takes time since it's a side project for all
of us, but it would be a shame to not aim high!

> Anyway, I do not understand how a root<->yoda converter script can
> bring "incoherence" to the whole project.

A converter script wouldn't (but we already have one of those, it just
uses the Python interfaces to YODA and ROOT for flexibility and code
decoupling). I thought you were arguing to support ROOT, YODA, and PAW
histogramming interfaces in Rivet analysis algorithms, and reading of
ROOT/PAW reference data files. That would be incoherent and hard to
maintain.

Note that any data submitted to HepData already needs to be converted
into a special text format (a more awkward one than YODA's, IMHO). It's
just one of the things that needs to be done to preserve analysis data
long-term. This could also benefit from a converter, but for the data
I've encoded, dumping the raw ROOT file wouldn't be a good data
representation, and some manual/scripting work is needed.

>> and since I've seen many cases where ROOT
>> files (an undocumented binary format) are only readable in particular
>> versions of ROOT, I think it would be a very dangerous choice for
>> long-term data storage.
> 
> That is the only realistic option.

It depends on the sort of data. I think it is reasonable to use one
format for (relatively) low-volume data like histograms and a different
one for high-volume ntuples (and object ntuples, i.e. event storage).
For the low-volume type I think the robustness and human readability of
a text-based format has a lot going for it.

Given the incompatibility of ROOT files between versions that I've
already seen happening, the longevity of our big datasets does concern
me. I'm not sure that forward compatibility is always guaranteed, and
the lack of documentation of the format is also worrying from a data
curation perspective.

>> Some will disagree with this but that's their prerogative :-)  We
>> thought long and hard about whether ROOT would meet Rivet's requirements
>> -- it's obviously easier to use something that already exists -- and
>> eventually decided that it did not, and we needed to make something new
>> that didn't have those downsides. That's where YODA started... and
>> continues to evolve, because doing histogramming really well turns out
>> to be quite hard!
>>
>>> So, having
>>> ROOT support would be a big deal for the preservation of old (e.g.
>>> HERA) results for the future. In the best case, having PAW support
>>> as well would be a very nice bonus...
>>
>> No :-)  Some code needs to die. If analysis data can't be dumped into a
>> simple text format (low edge, high edge, value, error) and the logic of
>> when to call "fill" isn't equally valid with another histogramming
>> package, then there are bigger problems than our level of ROOT support.
> 
> The whole idea of saving the results for the future is a hope that one
> day more science will be done with them. Nobody needs files, numbers,
> databases, new or advanced software and nice pictures without the
> ability to extract science out of them -- to do analysis and
> re-analysis. File format is the thing of least importance: it should
> just be common and convenient for people who do analysis.

Which is why we provide converters, but use a simple yet capable text
format for the reasons given above. If I really want long-term stability
and compatibility, then ROOT's format is too unstable: something like
HDF5 would be a more globally standard binary format. A text format will
always be fairly easy to decode even if the original package to
read/write it has died in the intervening years.
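
As a rough illustration of that last point, here is a minimal sketch of
the kind of plain-text dump mentioned earlier in the thread (low edge,
high edge, value, error). This is NOT YODA's actual file format or API:
the column layout, the "#" comment convention, and the function names
write_bins/read_bins are assumptions made up for the example. It uses
only the Python standard library, which is the point: nothing
package-specific is needed to decode the data.

```python
import io

def write_bins(bins, out):
    """Write (xlow, xhigh, value, error) tuples as whitespace-separated text.

    Hypothetical layout for illustration only, not the YODA format.
    """
    out.write("# xlow xhigh value error\n")
    for xlow, xhigh, val, err in bins:
        # repr() of a Python float round-trips exactly through float()
        out.write(f"{xlow!r} {xhigh!r} {val!r} {err!r}\n")

def read_bins(inp):
    """Read the text back, skipping blank lines and '#' comment lines."""
    bins = []
    for line in inp:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        xlow, xhigh, val, err = (float(tok) for tok in line.split())
        bins.append((xlow, xhigh, val, err))
    return bins

# Made-up example bins, including one with a different width
bins = [(0.0, 1.0, 12.5, 3.5), (1.0, 2.0, 7.25, 2.5), (2.0, 4.0, 1.0, 1.0)]

buf = io.StringIO()
write_bins(bins, buf)
text = buf.getvalue()

# Decode again using nothing but string splitting and float()
roundtrip = read_bins(io.StringIO(text))
```

The reader is a handful of lines and depends on no histogramming
package at all, so the same data could be recovered in any language
long after the original writer has gone.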

Andy

-- 
Dr Andy Buckley, Royal Society University Research Fellow
Particle Physics Expt Group, University of Glasgow / PH Dept, CERN

