[Rivet] [Yoda] YODA development
Andy Buckley  andy.buckley at ed.ac.uk
Mon Nov 2 14:16:37 GMT 2009
Ben Waugh wrote:
> Hi Andy,
>
> Here are my further thoughts from random moments over the weekend,
> after I have tried to beat them into some kind of order. Can you give
> a concrete example of a generator and steering parameters where
> negative weights crop up? I would like to sanity check my arguments
> against a real case.

The real cases I can think of are MC@NLO, where the generator itself
can produce negative weights (although they are already unweighted by
modulus), and some signed observables like the AEEC. I'm motivated by
wanting to do something which doesn't just die or go crazy when
negative weights are encountered: the AEEC observable is a case where
they definitely will be.

> My first thoughts, while I still think correct as far as they go,
> were not based on a very clear picture of what we are trying to do.
> I think now I have better untangled in my head the various questions
> we are trying to answer for any given sample. In particular I may
> have been confusing the issues related to bin height and those
> related to means and variances of variables other than those used in
> the binning.

Yes, me too.

> On that topic, I can't see why it is necessary to use a
> "distribution" object (Dbn1D) to store the contents of each bin in a
> non-profile histogram. The fill method in this case is keeping
> information on the distribution (in the binning variable) of entries
> within the bin. There may be cases where this is useful, but if so it
> could be achieved using a profile histogram with the Y variable the
> same as the X (binned) variable. Using a Dbn1D for every histogram
> bin seems to me to be overkill, and the generalization to more than
> one dimension involves tracking all the covariances, i.e. O(m^2)
> quantities where m is the number of dimensions. Is there a use case
> for this, or is this a "nice to have" feature?

It became a very useful centralising design feature when I started
writing multiple classes that had to handle similar statistics. I
don't have a plan to make any 2D profile classes or similar, for which
a more extended object would be needed, so to some extent it is a
convenience feature... and I would certainly like to keep it, at least
for design/implementation/maintenance reasons. However, unless I'm
being dense, storing the various moments is required to keep the
whole-distribution moments correct under rebinning: it allows the
shape of the distribution within each bin to be stored.

> On 31/10/09 11:09, Andy Buckley wrote:
>>> The YODA approach is to do every stat combination based on storing
>>> weighted moments of the variable being filled, i.e. w, w^2, wx,
>>> wx^2, etc.
>
> Anything that is calculated exclusively from sums over the sample(s)
> of event-wise quantities must be invariant under re-partitioning. If
> we deviate from this we should worry that we are doing something
> inconsistent. I would also suspect that if we get e.g. an imaginary
> result for the error on a real quantity then we are also making a
> mistake. And of course we want to be correct, not just consistent!

Yes ;) I'm inclined to throw exceptions if statistical quantities are
requested from any distribution with sum(w) < 1.

>> #1. What I said above isn't quite right --- the perils of replying
>> in a hurry.

I spotted another of these from me: var(O) = <wO^2> - <wO>^2, rather
than a first term of <w^2 O^2> as I said before. So we don't need to
store the w^2 moments.
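To make that concrete, something like the following Python sketch is
what I have in mind. It is hypothetical, not the actual Dbn1D
interface: it assumes the <...> averages are normalised by N = sum(w),
and it includes the proposed sum(w) < 1 exception:

# A minimal sketch of a Dbn1D-style moment store (hypothetical, not
# the real YODA interface).
class Dbn1DSketch:
    def __init__(self):
        self.sumw = 0.0    # sum(w)
        self.sumwx = 0.0   # sum(w*x)
        self.sumwx2 = 0.0  # sum(w*x^2)

    def fill(self, x, w=1.0):
        self.sumw += w
        self.sumwx += w * x
        self.sumwx2 += w * x * x

    def __iadd__(self, other):
        # Merging bins (rebinning) just adds the stored moments, which
        # is what makes whole-distribution moments rebinning-safe.
        self.sumw += other.sumw
        self.sumwx += other.sumwx
        self.sumwx2 += other.sumwx2
        return self

    def mean(self):
        if self.sumw < 1:
            raise RuntimeError("sum(w) < 1: statistics undefined")
        return self.sumwx / self.sumw  # <x> = sum(wx)/sum(w)

    def variance(self):
        # var(x) = <wx^2> - <wx>^2 = sum(wx^2)/sum(w) - mean^2:
        # no w^2 moments needed, as noted above.
        m = self.mean()
        return self.sumwx2 / self.sumw - m * m

# Rebinning check: merging two sub-bins reproduces the one-bin moments.
whole, left, right = Dbn1DSketch(), Dbn1DSketch(), Dbn1DSketch()
for x, w in [(0.2, 1.0), (0.4, 2.0), (0.7, 1.0)]:
    whole.fill(x, w)
    (left if x < 0.5 else right).fill(x, w)
left += right
assert (left.mean(), left.variance()) == (whole.mean(), whole.variance())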
But I think we should store N_fill in Dbn1D so that, for example, the
average weight can be calculated: this is also safe under bin
combination.

>> The variance I calculated on the alternating +-1 weight examples
>> isn't the variance on the bin height, but rather the variance of
>> the distribution of weights themselves. It does, however, raise the
>> question of what is the "N" used to compute the averages in this
>> quantity: if N = sum(w), then we get into trouble with division
>> when the positive and negative weights are equal (i.e. sum(w) = 0).
>> The truth is that the distribution does have a width, since it
>> alternates between +1 and -1... so my feeling is that we should be
>> using N = number of fills. But this causes problems elsewhere:
>> maybe for our purposes the correct thing to do is N = sum(w) and
>> just say that the error is undefined for sum(w) = 0, as it would be
>> for N = 0. Hmm.
>
> You are talking here about calculating means, variances and errors
> of eventwise quantities, not bin heights, are you not? I'll try to
> deal with this under #3.

Yes, that's right. I realised that hasty mistake too, but it also
raised the question of what to use for N.

>> #2. The error on bin height is the binomial error sqrt(height). To
>> allow arbitrary partitionings is a pretty crucial invariant, so we
>> should be able to separate this alternating fill set into two equal
>> sets of purely +1 and purely -1 weights, and then recombine. This
>> clearly works for the positive weights, but for the negative
>> weights it requires taking the sqrt of a negative number. If we're
>> not worried about this --- just treat it as a complex quantity and
>> return mod(sqrt(sum(w))) when asked --- it all works fine. The
>> usual "sum in quadrature" rule works if you're more explicit about
>> having an imaginary error, but is really irrelevant from a
>> computational standpoint.
>
> The bin height in a histogram is the estimator of a probability or
> an expected number of events in a given region of phase space. If
> the probability density is zero in some region, then we would expect
> (in the long run) equal positive and negative weights to occur in
> that region. In most cases in fact we would expect no events to be
> generated in that region at all.
>
> If we do get events, then we should still calculate the error on the
> bin height from the usual sum(w^2), regardless of the sign of w. In
> our rather pathological toy model where weights are either +1 or -1,
> suppose we generate a sample of size such that we expect 2N events
> in a particular bin. We would expect in this bin to see N +- sqrt(N)
> events with weight +1 and N +- sqrt(N) with weight -1, giving a bin
> height of 0 +- sqrt(2N).
>
> Here I am using Poisson rather than binomial statistics. I haven't
> yet considered the difference in error calculation depending on
> whether we are dealing with known luminosity (but variable number of
> events) or known number of events (but variable fraction in each
> bin).

Okay, so I'm now a little uncertain about what stats should be used
for bin height errors... this way the error grows with the number of
events, while the mean remains at zero in the pathological case. I was
coming round to the conclusion that the binomial error makes sense if
calculated as sqrt(sum(w)), with a thrown exception if sum(w) < 1.
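For the record, here is a throwaway numerical check of that 0 +-
sqrt(2N) toy model (just a Python sketch, nothing to do with the
actual YODA code), showing the sqrt(sum(w^2)) error recombining in
quadrature across the all-positive and all-negative partitions, with
no negative square roots appearing:

import math

# 2N fills of alternating sign, with the sqrt(sum(w^2)) height error.
N = 1000
weights = [+1.0] * N + [-1.0] * N

height = sum(weights)                         # 0.0: weights cancel exactly
err = math.sqrt(sum(w * w for w in weights))  # sqrt(2N): grows with stats

# Partition into the purely-positive and purely-negative subsets, then
# recombine in quadrature: each partition's error uses sum(w^2), not
# sum(w), so no sqrt of a negative number is ever needed.
err_pos = math.sqrt(sum(w * w for w in weights[:N]))  # sqrt(N)
err_neg = math.sqrt(sum(w * w for w in weights[N:]))  # sqrt(N)
assert math.isclose(err, math.hypot(err_pos, err_neg))

print(f"height = {height} +- {err:.1f}")  # height = 0.0 +- 44.7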
>> #3. Things become more awkward when we start trying to compute
>> means and variances of observables, just x for example. The most
>> obvious problem comes when there are partitionings (or bins) which
>> have an overall negative sum(w). For the equal +-1 partitionings,
>> if they are all at x = 1, then the -1 set will have a mean of -1
>> (if we divide by N = #fills), which doesn't mean much but is
>> necessary for repartitioning invariance (the sum(wx) needs to be
>> additive). So the apparent central x value is skewed because
>> (-w)x = w(-x). But there are also problems when the two sets are
>> added together: they will have a mean of 0 with a variance of 1,
>> which doesn't represent anything meaningful. And what about when we
>> combine binwise stats to get moments for the whole histogram? Bins
>> with negative sum(w) will skew any moment distributions as if the x
>> value had been negative. Symmetry around x = 0 doesn't mean
>> anything for general distributions, so I think this is the wrong
>> approach. I think this is all resolved if we use N = sum(w), and
>> just accept that if sum(w) = 0 (and sum(w) < 0?) we can't produce a
>> meaningful answer.
>
> I think I'll have to think about this some more and come back to it.
> This is certainly a more difficult area than the bin heights.
> Certainly if the weights in some region (nearly) cancel out then the
> probability density in that region is (close to) zero and we cannot
> expect to calculate a reliable estimate of our quantity of interest.

Yes, and I don't think regions of phase space with overall negative
weights can have a very meaningful probabilistic interpretation
either. Except where the observable itself is signed... hmm.

> However, I think this particular toy case (if I understand it
> correctly) is actually an exception. If all events have x = 1,
> regardless of weight, then we can still say that x = 1 is a good
> estimate of the true value, if there is such a thing. Even for the
> sample with w = -1, the mean x is 1, not -1:
> mean(x) = sum(wx)/sum(w) = (-n)/(-n) = 1.

Yep, that's why I think N = sum(w) works okay... once my
<w^2 O^2> -> <w O^2> booboo is sorted.

>> Incidentally, another nice sanity check is that if the weights
>> aren't exactly balanced, say 20 -1s and 30 +1s, we should be able
>> to partition as [20+, 20-, 10+] or as [20-, 30+], or even just
>> [10+]. We should also be able to partition binwise, i.e. changing
>> bin boundaries shouldn't change the answers for the whole
>> histogram. Needs a bit more thought.
>
> I'll certainly give it some more thought, but I'm pretty sure
> partition invariance is guaranteed by using only eventwise
> quantities summed over events. Partition invariance doesn't mean
> that any given subset of the entire sample will itself give a
> defined value for any given estimator, even if the whole sample
> does.

Right. It's a one-way statement: different partitions of the weights
should all *combine* to give the same value, but decomposing the whole
into arbitrary partitions will not in general (and in practice ~all
the time) give equal values in the partitions. Except in the
infinite-stats limit, where all partitions are also equal, of
course... I'm sure measure theory has something to say about that sort
of asymptotic behaviour, but it's computationally a bit irrelevant!
Binning invariance is the same, and is why Dbn1D is useful in 1D
histos: the whole-distribution moments, when calculated from various
different binnings of the same data, should be the same, but that says
nothing about agreement between different bins.
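Here is that sanity check done numerically (again just a sketch): both
partitionings recombine to mean(x) = 1 with N = sum(w), even though a
[20+, 20-] sub-combination on its own would have sum(w) = 0 and an
undefined mean:

# 30 fills with w = +1 and 20 with w = -1, all at x = 1.
def moments(entries):
    """Return (sum(w), sum(w*x)) for a list of (w, x) pairs."""
    return (sum(w for w, _ in entries), sum(w * x for w, x in entries))

plus30 = [(+1.0, 1.0)] * 30
minus20 = [(-1.0, 1.0)] * 20

# Two different partitionings of the same 50 fills:
partition_a = [minus20, plus30]                    # [20-, 30+]
partition_b = [plus30[:20], minus20, plus30[20:]]  # [20+, 20-, 10+]

for partition in (partition_a, partition_b):
    sumw = sum(moments(part)[0] for part in partition)
    sumwx = sum(moments(part)[1] for part in partition)
    print(sumwx / sumw)  # mean(x) = 1.0 from both partitionings

# The net sums also equal those of just ten +1 fills ("even just [10+]"):
assert moments(plus30[:10]) == (10.0, 10.0)

# But an individual partition needn't be meaningful on its own: the
# [20+] and [20-] subsets together give sum(w) = 0 and an undefined mean.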
>> Anyway, just wanted to set that record straight, or at least a
>> *little* less crooked. Have a good weekend ;)
>
> I did. Hope you did too!

Yep, albeit with occasional mental blips when I tried to think about
this and my brain hurt! Went to Kirkcaldy in the midst of flooding and
30 ft waves breaking onto the main road, and tried to feel sorry for
Gordon Brown ;)

Andy

--
Dr Andy Buckley
SUPA Advanced Research Fellow
Particle Physics Experiment Group, University of Edinburgh

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.