[Rivet] Efficiency histograms in Rivet 2

Andy Buckley andy.buckley at cern.ch
Tue Sep 30 18:12:44 BST 2014


Hi James... amazing timing given that I sent my belated reply to your
original email at almost exactly the same time! And also given that we
are _this_ close to releasing new Rivet+YODA versions: --><--

This was quite interesting. My unconsidered impression was that adding
positive weights would always increase N_eff... or, more nuanced, that
N_eff would always be higher for a superset of fills. But that's not
actually true, and it causes the problem you found when there is the
"right" sort of weight bias in how the numerator selection works.

You can do the algebra on this if you want to feel like a proper pen &
paper physicist again -- it's quite fun -- but here's an interesting
result: if I have an efficiency counter in a state S_1 = sum w and S_2 =
sum w^2, so that N_eff = S_1^2 / S_2, then the change in N_eff when
adding a new weight x is

Delta N_eff = x[ 2 S_1 S_2 + x(S_2 - S_1^2) ]  /  S_2(S_2 + x^2)
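(If you want the pen & paper step spelled out, it's just the difference of
the updated and original ratios, rearranged:)

  \[
  \Delta N_{\mathrm{eff}}
    = \frac{(S_1 + x)^2}{S_2 + x^2} - \frac{S_1^2}{S_2}
    = \frac{S_2 (S_1 + x)^2 - S_1^2 (S_2 + x^2)}{S_2 (S_2 + x^2)}
    = \frac{x \left[ 2 S_1 S_2 + x (S_2 - S_1^2) \right]}{S_2 (S_2 + x^2)}
  \]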

The denominator is positive for any situation other than a null (x=0)
fill on a null S_2 = 0 state, so we can ignore that. But the numerator
can be negative. Note that after more than one positive-weight fill
S_1^2 > S_2, so N_eff will decrease (in the case of only positive fills
so far) if

x > 2 S_1 S_2 / (S_1^2 - S_2)

and I verified this with an explicit toy example of filling weights 1,
then 2, then 8, where the last step decreases N_eff (the critical weight
in the last step would have been 7.5). Interesting! It seems a bit more
obvious once you note that the order of fills can't affect the final
result: you could engineer subsets of fills distributed to give high and
low N_eff, which must give the same answer when added together in either
order.
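If you want to reproduce those numbers, here's a throwaway standalone check
(plain C++, nothing YODA-specific; just the running sums defined above):

  #include <cstdio>

  int main() {
    // Running moments of the fill weights: S1 = sum(w), S2 = sum(w^2)
    double S1 = 0, S2 = 0;
    const double weights[] = {1., 2., 8.};
    for (double w : weights) {
      // Critical weight above which the next fill *lowers* N_eff; only
      // defined once S1^2 > S2, i.e. after more than one positive fill.
      if (S1*S1 > S2)
        std::printf("  critical weight for next fill = %g\n",
                    2*S1*S2 / (S1*S1 - S2));
      S1 += w;
      S2 += w*w;
      std::printf("fill w=%g  ->  N_eff = %g\n", w, S1*S1/S2);
    }
    return 0;
  }

It prints N_eff = 1, 1.8, 1.75362: the w=8 fill comes in above the 7.5
critical weight, so it reduces the effective entries despite being positive.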

Anyway, the upshot is that the assertion N_eff^acc <= N_eff^tot is simply
not guaranteed. Surprise! So this is one of those cases where N_entries is
the better variable. I'll update the YODA efficiency routines in time for
tonight's new release!
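Concretely, the plan is just to base the numerator-is-a-subset sanity check
on raw entry counts per bin rather than effective ones. Something along
these lines, sketched with a stand-in struct rather than the real YODA bin
types:

  #include <stdexcept>
  #include <vector>

  // Stand-in for the per-bin counts the real check would read from YODA bins.
  struct BinEntries {
    unsigned long accepted;  // raw number of fills in the numerator bin
    unsigned long total;     // raw number of fills in the denominator bin
  };

  // Raw entry counts only ever grow with filling, so accepted <= total really
  // does hold for a genuine subset -- unlike the effective-entries version.
  void checkEfficiencyConsistency(const std::vector<BinEntries>& bins) {
    for (const BinEntries& b : bins) {
      if (b.accepted > b.total)
        throw std::runtime_error("Attempt to calculate an efficiency when the "
                                 "numerator is not a subset of the denominator");
    }
  }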

Cheers, and I hope that was as illuminating for you as it was for me ;-)
Andy


On 30/09/14 17:09, James Robinson wrote:
> Dear Andy and David,
> 
> I've done a bit more digging here and the problematic check is on the
> effective number of entries.
> I have two Histo1DPtr objects, hist_denominator and hist_numerator, and
> I'm filling both histograms with
> 
>     hist->fill( value, weight );
> 
> where sometimes I fill only hist_denominator, and sometimes I fill both
> hist_numerator AND hist_denominator. If I print out every time a
> particular bin gets filled, you can see this.
> 
> 
> Rivet.Analysis.ATLAS_STDM_2012_17: INFO  Filling hist_denominator with
> weight 587738 at value = 3.00608
> Rivet.Analysis.ATLAS_STDM_2012_17: INFO    bin now contains actual
> entries: 4 / 16
> Rivet.Analysis.ATLAS_STDM_2012_17: INFO    bin now contains effective
> entries: 1.15024 / 1.195 => 0.962543
> Rivet.Analysis.ATLAS_STDM_2012_17: INFO  Filling hist_denominator with
> weight 107673 at value = 3.40704
> Rivet.Analysis.ATLAS_STDM_2012_17: INFO  Filling hist_numerator with
> weight 107673 at value = 3.40704
> Rivet.Analysis.ATLAS_STDM_2012_17: INFO    bin now contains actual
> entries: 5 / 17
> Rivet.Analysis.ATLAS_STDM_2012_17: INFO    bin now contains effective
> entries: 1.20029 / 1.19789 => 1.002
> 
> The second fill here is the problematic one, causing the efficiency to
> go above 1.0. Adding a new value to both the numerator and denominator
> increases the number of effective entries in both cases (as would be
> expected), but it increases effNumEntries in the numerator by more than
> in the denominator. This is what's causing the problem.
> 
> I don't think the logic in using sum(w)^2 / sum(w^2) to calculate
> nEntries is wrong, but I'm not sure exactly what's going on here.
> Any ideas?
> 
> I can give a more detailed breakdown of exactly what I fill these
> histograms with if needed (as you can see it's only 17 events).
> 
> Thanks,
> James
> 
> On 23 September 2014 20:16, James Robinson <james.robinson at cern.ch> wrote:
> 
>     Dear Andy,
> 
>     I'm getting an error from Rivet complaining about an invalid
>     efficiency calculation:
> 
>     Rivet_i              INFO Rivet_i finalizing
>     Rivet.Analysis.Handler: INFO  Finalising analyses
>     Rivet_i             FATAL  Standard std::exception is caught
>     Rivet_i             ERROR Attempt to calculate an efficiency when
>     the numerator is not a subset of the denominator
> 
>     However, if I get the code to print out the individual histograms,
>     bin-by-bin, I can't see any bin where the numerator is larger than
>     the denominator:
> 
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  There are 263 / 393 entries
>     in the histograms
> 
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 1.1726e+17,
>     total: 1.23204e+17
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 1.01373e+17,
>     total: 1.23149e+17
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 6.85239e+16,
>     total: 1.2764e+17
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 8.20248e+16,
>     total: 1.4139e+17
>
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 1.65158e+16,
>     total: 4.2562e+16
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 7.17599e+15,
>     total: 1.69049e+16
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 7.2618e+14,
>     total: 2.16647e+16
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 9.2832e+15,
>     total: 2.75894e+16
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 2.69685e+14,
>     total: 3.10438e+14
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 9.55894e+14,
>     total: 9.55894e+14
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 0, total:
>     1.66588e+15
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 0, total: 0
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 0, total: 0
>     Rivet.Analysis.ATLAS_STDM_2012_17: INFO  dy accepted: 0, total: 0
> 
>     Any idea what I'm doing wrong?
> 
>     Thanks,
>     James
> 
>     On 17 September 2014 11:13, Andy Buckley <andy.buckley at cern.ch> wrote:
> 
>         On 17/09/14 10:38, James Robinson wrote:
>         >
>         >
>         > On 17 September 2014 11:26, Andy Buckley <andy.buckley at cern.ch> wrote:
>         >
>         >     On 16/09/14 22:53, James Robinson wrote:
>         >     > Dear Andy and David,
>         >     >
>         >     > On 16 September 2014 20:58, Andy Buckley <andy.buckley at cern.ch> wrote:
>         >     >
>         >     >     In fact, there is even an efficiency(pass, tot) function that takes 1D
>         >     >     histos and returns a Scatter2D with appropriate binomial statistics
>         >     >     treatment!
>         >     >
>         >     >     Andy
>         >     >
>         >     >
>         >     > I think this is exactly what I want! The binomial error treatment is
>         >     > important for this analysis as the efficiencies tend towards 1.0 in some
>         >     > parts of phase space. What is the appropriate class called?
>         >
>         >     It's a function rather than a bound method (so call it as above):
>         >
>         >     https://yoda.hepforge.org/trac/browser/include/YODA/Histo1D.h#L426
>         >
>         >     We don't yet have the equivalent for 2D histos, but it should be trivial
>         >     to put together from the 1D version. I'll add that to my TODO list, but
>         >     maybe someone else fancies doing it? ;-)
>         >
>         >
>         > Ah OK. That makes sense. Presumably I still need to book it somehow to
>         > get it written out? Is there another method to register existing YODA
>         > objects? Or can the bookScatter2D() function accept an existing
>         > Scatter2D as its argument (instead of a bin descriptor)?
> 
>         Good points -- I should provide helper methods for doing this in
>         Rivet,
>         with automatic registration of the output Scatter, like we have for
>         YODA's divide(h_a,h_b) -> Rivet's divide(h_a,h_b,s_out). I'll
>         add those
>         to the 2.2.0 release candidate.
> 
>         For now I think you can use addAnalysisObject(aoptr) to register any
>         YODA analysis object. But it's made a little fiddly because YODA
>         returns
>         a stack-allocated object and what is needed is a new'd heap
>         object. You
>         might need to do
> 
>         Scatter2D s = efficiency(*tmph1, *tmph2);
>         s.setPath(histoPath("foo"));
>         addAnalysisObject(s.newclone());
> 
>         or similar. It's not very pretty, which is why I should provide the
>         cosmetic wrapper functions.
> 
>         Andy
> 
>         --
>         Dr Andy Buckley, Royal Society University Research Fellow
>         Particle Physics Expt Group, University of Glasgow / PH Dept, CERN
> 
> 
> 


-- 
Dr Andy Buckley, Royal Society University Research Fellow
Particle Physics Expt Group, University of Glasgow / PH Dept, CERN

