[Rivet] [Yoda] YODA development

Andy Buckley andy.buckley at ed.ac.uk
Mon Nov 2 17:02:38 GMT 2009


Ben Waugh wrote:
> On 02/11/09 15:26, Andy Buckley wrote:

> To recap, I still think that:
>  * bin height = sum(w)
>  * error on bin height = sqrt(sum(w^2))
> This follows intuitively from something like:
>  w1(1+-1) + w2(1+-1) + ... + wn(1+-1)
>  = (w1 + w2 + ... + wn) +- sqrt(w1^2 + w2^2 + ... + wn^2)
> But yes, a definitive reference and/or a more rigorous derivation would 
> be nice.

Okay, I've convinced myself that this works now.
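A minimal Python sketch of the rule above, assuming each event contributes a weight w to one bin: the bin height is the running sum(w) and its error is sqrt(sum(w^2)). The class name `WeightedBin` is hypothetical, not a YODA or Rivet API.

```python
import math


class WeightedBin:
    """Hypothetical single histogram bin accumulating weighted fills."""

    def __init__(self) -> None:
        self.sum_w = 0.0   # running sum of weights: the bin height
        self.sum_w2 = 0.0  # running sum of squared weights: the error^2

    def fill(self, w: float = 1.0) -> None:
        self.sum_w += w
        self.sum_w2 += w * w

    @property
    def height(self) -> float:
        return self.sum_w

    @property
    def error(self) -> float:
        # Each fill is w_i * (1 +- 1); independent errors add in quadrature.
        return math.sqrt(self.sum_w2)


b = WeightedBin()
for w in (0.5, 2.0, 1.5):
    b.fill(w)
print(b.height, b.error)  # height 4.0, error sqrt(6.5)
```

With unit weights this reduces to the familiar sqrt(N) Poisson error.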

>>>> Except where the observable itself is signed... hmm.
>>>
>>> Disagreed! I don't think the sign of the observable makes any 
>>> difference.
>>
>> Okay, you're right. The problem here is that (A)EEC is implemented in
>> Rivet by use of an extra signed term multiplying the generator event
>> weight. If YODA is designed to do this properly, i.e. assuming that
>> "weight" implies a statistically meaningful measure (one which will be
>> non-negative with asymptotic statistics), then the Rivet implementation(s)
>> of (A)EEC will need to be updated not to abuse the weight.
> 
> I see. Yes, that could be seen as abuse!
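To illustrate why signed weights are troublesome, here is a sketch using a hypothetical helper (not from Rivet) that computes the effective number of entries, (sum w)^2 / sum(w^2), a common measure of the statistical power of a weighted sample. With non-negative weights it behaves like an event count; with a +-1 factor from the observable folded into the weight, as in an (A)EEC-style fill, sum(w) can cancel while sum(w^2) keeps growing, so the "count" collapses towards zero.

```python
def effective_entries(weights):
    """(sum w)^2 / sum(w^2); equals N for N unit weights."""
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return sum_w * sum_w / sum_w2 if sum_w2 > 0 else 0.0


positive = [1.0] * 100
# Hypothetical AEEC-like fill: the same events, but with a signed +-1 factor
# from the observable multiplied into the weight.
signed = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]

print(effective_entries(positive))  # 100.0
print(effective_entries(signed))    # 0.0
```

Keeping the physical weight non-negative, as suggested above, keeps quantities like this meaningful.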

Several other analyses, such as the CDF and UA* min bias measurements of
the invariant cross-section E d^3(sigma)/dp^3, also use "enhanced" weights
in this way, when converting that expression to a 1D form in d(pT) by
integrating over y (usually eta in practice) and phi. But I think only the
AEEC introduces negative weights in this way.
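A sketch of such an "enhanced" weight, assuming the standard relation d^3p/E = pT dpT dy dphi, so that E d^3(sigma)/dp^3 = d^3(sigma)/(pT dpT dy dphi). Integrating over phi (a factor 2 pi) and over a rapidity window of width delta_y means each particle can be filled into a pT histogram with weight w / (2 pi pT delta_y); delta_eta stands in for delta_y in practice. Both function names are hypothetical, and the actual Rivet CDF/UA* analyses may differ in detail. Note that this weight stays positive, unlike the AEEC case.

```python
import math


def invariant_xsec_weight(event_weight: float, pt: float, delta_y: float) -> float:
    """Per-particle weight for a pT histogram estimating E d^3sigma/dp^3,
    averaged over phi and a rapidity window of width delta_y.

    Uses d^3p / E = pT dpT dy dphi: dividing by 2*pi (phi integral),
    delta_y (y window) and pT turns dsigma/dpT into the invariant form.
    """
    return event_weight / (2.0 * math.pi * pt * delta_y)


def fill_pt_spectrum(edges, particle_pts, event_weight, delta_y):
    """Fill a hypothetical list-of-bins pT histogram; return bin heights.

    Overall normalisation (e.g. cross-section / sum of event weights)
    is left out of this sketch.
    """
    sums = [0.0] * (len(edges) - 1)
    for pt in particle_pts:
        for i in range(len(sums)):
            if edges[i] <= pt < edges[i + 1]:
                sums[i] += invariant_xsec_weight(event_weight, pt, delta_y)
                break
    # Convert bin areas to heights (per unit pT).
    return [s / (edges[i + 1] - edges[i]) for i, s in enumerate(sums)]
```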

>> Thanks for the discussion, by the way --- I think this has been a really
>> useful way to work out what YODA should be doing (and what ROOT, AIDA
>> etc. should have been doing, too, so that we didn't have to have this
>> discussion at all!)
> 
> My pleasure. It's good to have an excuse to think about these things 
> again now that my stats knowledge is getting a bit rusty. When I get a 
> chance I'll have to have a look and see what ROOT and AIDA have been 
> doing. Are they really getting this wrong?

AIDA doesn't even know the difference between bin heights and bin areas. 
As far as implementation goes, the only one I know in detail is LWH, 
which doesn't attempt any of this moments stuff. AIDA isn't really 
suitable for much, to be honest: pretty much every area of its API 
design manages to be either overengineered, awkward, or naïve... and 
sometimes all three ;)
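A minimal sketch of why the height/area distinction matters (plain Python; `bin_areas` and `bin_heights` are hypothetical helpers, not AIDA or LWH calls): evenly spaced values covering [0, 10), i.e. a uniform distribution, are filled into deliberately uneven bins. The areas (sums of weights) grow with bin width, while dividing by the width gives heights that are flat, as they should be.

```python
import bisect


def bin_areas(edges, values, weight=1.0):
    """Sum of weights in each bin [edges[i], edges[i+1]): the bin *area*."""
    areas = [0.0] * (len(edges) - 1)
    for x in values:
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(areas):
            areas[i] += weight
    return areas


def bin_heights(edges, areas):
    """Divide each bin area by its width: the bin *height* (a density)."""
    return [a / (edges[i + 1] - edges[i]) for i, a in enumerate(areas)]


edges = [0.0, 1.0, 5.0, 10.0]                        # non-uniform bins
values = [(i + 0.5) * 1e-3 for i in range(10_000)]   # evenly covers [0, 10)

areas = bin_areas(edges, values)
print(areas)                      # [1000.0, 4000.0, 5000.0]: shape set by binning
print(bin_heights(edges, areas))  # [1000.0, 1000.0, 1000.0]: flat
```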

ROOT manages to avoid the bin height/area confusion by just ambiguously
calling it BinContent. But notably, unless you call some pretty cryptic
methods, it will draw bins of non-uniform width with heights that
correspond to their areas, i.e. you can generate arbitrary shapes in
uniform distributions just by changing the binning ;)

You've made me look at the code to find the current error implementation,
and it actually seems okay to me, at least in the head version. The code
is visible here for histos

http://root.cern.ch/viewcvs/trunk/hist/hist/inc/TH1.h?revision=30558&view=markup

(most of the functionality is at TH1 base class level, even though TH2 
and TH3 inherit from it... that design never made any sense!) and here 
for profiles:

http://root.cern.ch/viewcvs/trunk/hist/hist/inc/TProfile.h?revision=28022&view=markup

(similarly, these all inherit from TH<N>D). They use sqrt(sum(w^2)) for
bin errors, and store the other moments at histogram level, which is
fine. I think this has been improved over time: it didn't look right
last time I checked, and in particular it looks like the profiles have
only been correct for weighted events in the last 7 months... ok, "last
time I checked" wasn't last week, but YODA's been on the back burner for
a long time ;)  I don't know why TProfiles store an extra vector (sorry,
"TArrayD") of sum(w^2) on top of the one that they inherit from TH1.
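For weighted profiles, one reason a per-bin sum(w^2) is useful alongside sum(w*y) and sum(w*y^2) is the error on the mean. A minimal sketch of one common convention (weighted spread of y divided by the square root of the effective number of entries); `ProfileBin` is hypothetical and this is not a claim about ROOT's internals.

```python
import math


class ProfileBin:
    """Hypothetical profile-histogram bin for weighted y fills."""

    def __init__(self) -> None:
        self.sum_w = 0.0
        self.sum_w2 = 0.0
        self.sum_wy = 0.0
        self.sum_wy2 = 0.0

    def fill(self, y: float, w: float = 1.0) -> None:
        self.sum_w += w
        self.sum_w2 += w * w
        self.sum_wy += w * y
        self.sum_wy2 += w * y * y

    def mean(self) -> float:
        return self.sum_wy / self.sum_w

    def effective_entries(self) -> float:
        # (sum w)^2 / sum(w^2): needs the per-bin sum(w^2).
        return self.sum_w ** 2 / self.sum_w2

    def stddev(self) -> float:
        # Weighted spread of y; max() guards against tiny negative rounding.
        var = self.sum_wy2 / self.sum_w - self.mean() ** 2
        return math.sqrt(max(var, 0.0))

    def error_on_mean(self) -> float:
        return self.stddev() / math.sqrt(self.effective_entries())
```

Scaling every weight by a constant leaves both the mean and its error unchanged, since the effective number of entries is scale-invariant.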

So we'll have some implementation differences from ROOT (e.g. I still
think the binwise distributions are neat enough that we needn't optimise
away a few kB of doubles), a much narrower scope (= less bloat), and (I
hope) an infinitely nicer API for our purposes. But it looks like ROOT
now actually gets the errors right (or at least has the information 
needed to do so) when combining histos... which is nice to know for LHC 
analyses!
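A sketch of the "binwise distribution" idea: each bin keeps sum(w), sum(w^2), sum(w*x) and sum(w*x^2), a few doubles per bin. Because every stored quantity is a plain sum over fills, combining two histograms (e.g. from separate runs) is bin-by-bin addition, and the combined error sqrt(sum(w^2)) comes out right automatically. `BinDistribution` and its fields are illustrative assumptions, not a description of YODA's API.

```python
import math
from dataclasses import dataclass


@dataclass
class BinDistribution:
    """Hypothetical per-bin moments: all plain sums, so they add on merging."""
    sum_w: float = 0.0
    sum_w2: float = 0.0
    sum_wx: float = 0.0
    sum_wx2: float = 0.0

    def fill(self, x: float, w: float = 1.0) -> None:
        self.sum_w += w
        self.sum_w2 += w * w
        self.sum_wx += w * x
        self.sum_wx2 += w * x * x

    def __add__(self, other: "BinDistribution") -> "BinDistribution":
        # Merging runs: every moment is a sum over fills, so just add them.
        return BinDistribution(
            self.sum_w + other.sum_w,
            self.sum_w2 + other.sum_w2,
            self.sum_wx + other.sum_wx,
            self.sum_wx2 + other.sum_wx2,
        )

    @property
    def error(self) -> float:
        return math.sqrt(self.sum_w2)

    @property
    def mean_x(self) -> float:
        # Weighted mean position of the fills within the bin.
        return self.sum_wx / self.sum_w


run_a = BinDistribution()
run_a.fill(0.2, 2.0)
run_b = BinDistribution()
run_b.fill(0.4)
run_b.fill(0.6)
merged = run_a + run_b
```

Here the merged error is sqrt(6): the two runs' errors, 2 and sqrt(2), combined in quadrature rather than added linearly.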

I wonder what would have happened if the LHC had been on time: would the 
errors have been wrong in publication plots? Are they wrong in some 
published Tevatron analyses? I suspect that closer scrutiny in the 
run-up to LHC data has helped find and solve the problems, which both 
increases my faith in HEP to eventually get things right and gives me 
one less ROOT feature to grumble about ;)

Andy

-- 
Dr Andy Buckley
SUPA Advanced Research Fellow
Particle Physics Experiment Group, University of Edinburgh
