[Rivet] [Yoda] YODA development
Andy Buckley andy.buckley at ed.ac.uk
Mon Nov 2 17:02:38 GMT 2009
Ben Waugh wrote:
> On 02/11/09 15:26, Andy Buckley wrote:
> To recap, I still think that:
> * bin height = sum(w)
> * error on bin height = sqrt(sum(w^2))
> This follows intuitively from something like:
>   w1(1+-1) + w2(1+-1) + ... wn(1+-1)
>   = (w1 + w2 + ... + wn) +- sqrt(w1^2 + w2^2 + ... wn^2)
> But yes, a definitive reference and/or a more rigorous derivation would
> be nice.

Okay, I've convinced myself that this works now.

>>>> Except where the observable itself is signed... hmm.
>>>
>>> Disagreed! I don't think the sign of the observable makes any
>>> difference.
>>
>> Okay, you're right. The problem here is that (A)EEC is implemented in
>> Rivet by use of an extra signed term multiplying the generator event
>> weight. If YODA is designed to do this properly, i.e. assuming that
>> "weight" implies a statistically meaningful measure (i.e. which will be
>> >= 0 with asymptotic statistics), then the Rivet implementation(s) of
>> (A)EEC will need to be updated not to abuse the weight.
>
> I see. Yes, that could be seen as abuse!

Several others, such as the CDF and UA* analyses of min bias phase
space 1/E d^3(sigma)/dp^3 also use "enhanced" weights in this way, in
converting that expression to a 1D form in d(pT) by integrating over y
(usually eta in practice) and phi. But I think only the AEEC introduces
negative weights in this way.

>> Thanks for the discussion, by the way --- I think this has been a really
>> useful way to work out what YODA should be doing (and what ROOT, AIDA
>> etc. should have been doing, too, so that we didn't have to have this
>> discussion at all!)
>
> My pleasure. It's good to have an excuse to think about these things
> again now that my stats knowledge is getting a bit rusty. When I get a
> chance I'll have to have a look and see what ROOT and AIDA have been
> doing. Are they really getting this wrong?

AIDA doesn't even know the difference between bin heights and bin areas.
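To make the recap quoted at the top of this message concrete, here is a
minimal sketch of a single weighted bin. The class and function names
(WeightedBin, merge) are hypothetical and are not YODA, ROOT or AIDA API;
bin width is ignored, so "height" here is simply sum(w) as in the recap.
It also shows that a signed weight leaves the error well defined, since
only w^2 enters it.

```python
import math


class WeightedBin:
    """Hypothetical single-bin accumulator (not a YODA or ROOT class)."""

    def __init__(self):
        self.sumw = 0.0   # running sum of weights, sum(w)
        self.sumw2 = 0.0  # running sum of squared weights, sum(w^2)

    def fill(self, weight=1.0):
        """Record one entry with the given weight."""
        self.sumw += weight
        self.sumw2 += weight * weight

    def height(self):
        """Bin height as in the recap: sum(w)."""
        return self.sumw

    def error(self):
        """Error on the bin height: sqrt(sum(w^2))."""
        return math.sqrt(self.sumw2)


def merge(a, b):
    """Combine two statistically independent bins by adding their sums."""
    out = WeightedBin()
    out.sumw = a.sumw + b.sumw
    out.sumw2 = a.sumw2 + b.sumw2
    return out


if __name__ == "__main__":
    b = WeightedBin()
    for w in (1.0, 2.0, 2.0):
        b.fill(w)
    print(b.height(), b.error())
```

Because only the two running sums are stored, combining bins from
independent runs is just addition of sum(w) and sum(w^2), which is why
keeping sum(w^2) per bin is enough to get the merged error right.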

As far as implementation goes, the only one I know in detail is LWH,
which doesn't attempt any of this moments stuff. AIDA isn't really
suitable for much, to be honest: pretty much every area of its API
design manages to be either overengineered, awkward, or naïve... and
sometimes all three ;)

ROOT manages to avoid the bin height/area confusion by just ambiguously
calling it BinContent --- but notably unless you call some pretty
cryptic methods it will draw bins of non-uniform width with heights
that correspond to their areas, i.e. you can generate arbitrary shapes
in uniform distributions just by changing the binning ;)

You've made me look at the code to find the current error
implementation, and it actually seems okay to me, at least in the head
version. The code is visible here for histos

  http://root.cern.ch/viewcvs/trunk/hist/hist/inc/TH1.h?revision=30558&view=markup

(most of the functionality is at TH1 base class level, even though TH2
and TH3 inherit from it... that design never made any sense!) and here
for profiles:

  http://root.cern.ch/viewcvs/trunk/hist/hist/inc/TProfile.h?revision=28022&view=markup

(similarly, these all inherit from TH<N>D). They use sum(w^2) for bin
errors, and store the other moments at histogram level, which is fine.

I think this has been improved over time: it didn't look right last
time I checked, and in particular it looks like the profiles have only
been correct for weighted events in the last 7 months... ok, not last
week but YODA's been on the back burner for a long time ;) I don't know
why TProfiles store an extra vector (sorry, "TArrayD") of sum(w^2) on
top of the one that they inherit from TH1.

So we'll have some implementation differences from ROOT (i.e. I still
think the binwise distributions are neat enough to not need to optimise
away a few kB of doubles), much narrower scope (= less bloat), and (I
hope) an infinitely nicer API for our purposes.
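
The height-versus-area point about non-uniform binning can be
illustrated with a short sketch. It assumes that the accumulated sum(w)
in each bin is treated as the bin's area, so that the drawn height is
sum(w) divided by the bin width. The function name bin_heights and the
example numbers are hypothetical and do not come from ROOT, AIDA or
YODA.

```python
def bin_heights(edges, sumw):
    """Convert per-bin sum(w), treated as areas, into heights.

    edges: the N+1 bin edges in increasing order
    sumw:  the N per-bin sums of weights
    """
    if len(edges) != len(sumw) + 1:
        raise ValueError("need exactly one more edge than bins")
    # Divide each bin's accumulated weight by that bin's width.
    return [w / (hi - lo) for lo, hi, w in zip(edges[:-1], edges[1:], sumw)]


if __name__ == "__main__":
    # Idealised flat distribution on [0, 4): 100 unit-weight entries per
    # unit of x, so the last bin (twice as wide) collects twice as many.
    edges = [0.0, 1.0, 2.0, 4.0]
    sumw = [100.0, 100.0, 200.0]
    print("drawn as areas:  ", sumw)
    print("drawn as heights:", bin_heights(edges, sumw))
```

Plotting sumw directly shows a step in the last bin that comes purely
from the binning, while the width-normalised heights are flat, as the
underlying distribution is.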

But it looks like ROOT now actually gets the errors right (or at least
has the information needed to do so) when combining histos... which is
nice to know for LHC analyses!

I wonder what would have happened if the LHC had been on time: would
the errors have been wrong in publication plots? Are they wrong in some
published Tevatron analyses? I suspect that closer scrutiny in the
run-up to LHC data has helped find and solve the problems, which both
increases my faith in HEP to eventually get things right and gives me
one less ROOT feature to grumble about ;)

Andy

-- 
Dr Andy Buckley
SUPA Advanced Research Fellow
Particle Physics Experiment Group, University of Edinburgh

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.