[Rivet] Histogram normalisation
Frank Siegert frank.siegert at durham.ac.uk
Tue Oct 13 23:45:41 BST 2009
Following up on today's discussion, here is my understanding of our conclusions regarding the longer-term normalisation of histograms. Please add to it and correct me where I'm wrong.

(Note that all of the following refers only to distributions which are proportional to the cross section, i.e. not to profile histograms like N_charged vs. pT(leading jet), where normalisation is not an issue.)

Rivet's written-out histogram files should never be normalised to a fixed number, be that 1.0, the integral of the reference histogram, etc. Instead they should represent the actual cross section that went into the histogram, which would currently be achieved by finalising with

  scale(hist, crossSection()/sumOfWeights());

If we agree on that, this step should be automated, so that not every finalize() method has to do it.

If the reference data is normalised in a different way, this should be stored as extra information which is written out with the histo data, e.g. Norm=1.0 or Scale=1.0/780.0, where 780.0 could be a number determined during the event processing, like an inclusive cross section.

Now, when tuning with or plotting the histograms, at least two options should be accommodated for all histograms that don't have a fixed norm/scale stored as above:

1. Plot everything according to truth, without any scaling or k-factors. That's simple.

2. Something like a leading-order mode, since many of the generators that Rivet is used with are of LO accuracy, and for them one usually only cares about the shapes of distributions (because experiments normalise them to N(N)LO calculations anyway). This is tricky, because you don't want to normalise every histogram separately to data, but only to introduce one scaling factor per event sample or per analysis (?).
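A minimal sketch of what the automated step could look like, assuming a hypothetical stand-alone `Histo1D` type (not Rivet's or AIDA's actual histogram class) and a hypothetical `normaliseToCrossSection()` hook that the framework would run once after each analysis's own finalize(). A different reference-data normalisation is only recorded as an annotation, not applied to the contents:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical minimal histogram (not Rivet's API): per-bin sum of weights,
// per-bin sum of squared weights, and free-form string annotations.
struct Histo1D {
  std::vector<double> sumW, sumW2;
  std::map<std::string, std::string> annotations;
  explicit Histo1D(std::size_t nbins) : sumW(nbins, 0.0), sumW2(nbins, 0.0) {}
  void fill(std::size_t bin, double w) {
    sumW.at(bin) += w;
    sumW2.at(bin) += w * w;
  }
  // Scaling the contents by f scales the squared weights by f^2.
  void scale(double f) {
    for (double& x : sumW) x *= f;
    for (double& x : sumW2) x *= f * f;
  }
  double integral() const { return std::accumulate(sumW.begin(), sumW.end(), 0.0); }
};

// Hypothetical framework hook, run after every analysis's finalize(): turns raw
// weight sums into cross sections, so no analysis has to call
// scale(hist, crossSection()/sumOfWeights()) itself.
void normaliseToCrossSection(const std::vector<Histo1D*>& hists,
                             double crossSection, double sumOfWeights) {
  assert(sumOfWeights > 0.0);
  for (Histo1D* h : hists) h->scale(crossSection / sumOfWeights);
}

// If the reference data use a different normalisation, the analysis records it
// as metadata (e.g. Norm=1.0) instead of rescaling the histogram itself.
void setReferenceNorm(Histo1D& h, double norm) {
  h.annotations["Norm"] = std::to_string(norm);
}
```

For example, with crossSection() = 100 and four unit-weight events split over two bins, the histogram integral becomes 100, while a `Norm` annotation set via `setReferenceNorm(h, 1.0)` leaves the contents untouched for the plotting tool to act on.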
My temporary solution to this has been several lines like these in a make-plots.conf file:

  # pure QCD
  .*aida/CDF_2006_S6450792/.*::Scale=1.7
  .*aida/CDF_2007_S7057202/.*::Scale=1.7
  .*aida/CDF_2008_S7828950/.*::Scale=1.7
  .*aida/CDF_2008_S8093652/.*::Scale=1.7
  .*aida/D0_2008_S7662670/.*::Scale=1.7

and this has worked quite well for me. We'd want this to be automated, though, so maybe we could introduce an additional piece of information for each histogram called "KFactor", which would normally be set to 1.0. If an analysis thinks that a k-factor would be meaningful for a particular histogram, it could calculate it as accurately as possible and store it.

Of course, this would not always scale each histogram up to data, because the k-factor relates the total *inclusive* NLO and LO cross sections, while most histograms contain cross sections after significant cuts. As an example, consider a Z+jets analysis which plots histograms of pT(Z) and pT(3rd jet). If you properly introduce a k-factor, the fairly inclusive pT(Z) will normally get scaled to data, but the integral of pT(3rd jet) could be very different from data if your Monte Carlo is not able to describe the correct ratio of Z+3jet to inclusive Z events. Such differences have to be preserved in any case. So each analysis author has the option to provide a reasonable way to normalise a histogram for use with LO Monte Carlos.

Does that sound reasonable? Can we collect more use cases from actual analyses people have written, to discuss this?

One more issue, which we haven't mentioned today: eventually we want our plotting tools to be able to merge output files from separate, independent runs (to increase statistics by running many jobs in parallel, e.g. on the grid). For this we need some more information stored in the histogram files, namely the raw sum of weights in each bin (and the sum of squared weights), don't we? And the number of entries in a histogram?
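To make the KFactor proposal concrete, here is a sketch of how a plotting or tuning tool might choose the factor for a histogram that has no fixed Norm/Scale. `PlotMode`, `Annotations` and `plotScale()` are hypothetical names, not existing make-plots or Rivet functionality; a missing KFactor defaults to 1.0 as proposed:

```cpp
#include <cassert>
#include <map>
#include <string>

// The two plotting options from the discussion: (1) truth, (2) leading-order mode.
enum class PlotMode { Truth, LeadingOrder };

// Hypothetical per-histogram metadata, e.g. KFactor -> "1.7".
using Annotations = std::map<std::string, std::string>;

// Hypothetical helper (not an existing Rivet or make-plots function): the factor
// to apply to a histogram that has no fixed Norm/Scale annotation.
double plotScale(const Annotations& ann, PlotMode mode) {
  if (mode == PlotMode::Truth) return 1.0;  // option 1: no k-factors at all
  const auto it = ann.find("KFactor");
  if (it == ann.end()) return 1.0;          // proposed default when the analysis sets none
  const double k = std::stod(it->second);
  assert(k > 0.0);                          // a k-factor is a ratio of cross sections
  return k;
}
```

Because pT(Z) and pT(3rd jet) from the same Z+jets analysis would carry the same k-factor, LO mode multiplies both by the same number, so the Monte Carlo's Z+3jet/inclusive-Z ratio survives unchanged, which is the property argued for above.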
If we agree on a Rivet-wide scale(hist, crossSection()/sumOfWeights()), the sum of weights in each bin could be skipped by storing just that overall normalisation factor, but the squared ones are still needed for error estimation. I just wanted to mention this while we discuss which information we store with histograms.

Sorry for the long email; comments welcome.

Frank
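A sketch of such a merge, assuming each output file carries the raw per-bin sums of weights and squared weights plus the run's total sum of weights and its cross-section estimate. `RunHisto`, `MergedBin` and `mergeRuns()` are hypothetical, and combining the runs' cross sections as a sumOfWeights-weighted average is an assumption that only makes sense if all runs generated the same process with the same settings:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-run record: raw per-bin sums, the run's total sum of weights
// (over all events, not just this histogram) and its cross-section estimate.
struct RunHisto {
  std::vector<double> sumW, sumW2;
  double sumOfWeights;
  double crossSection;
};

struct MergedBin { double value, error; };

// Assumes identical process/settings in every run: add the raw sums, average the
// cross sections weighted by sumOfWeights, then apply
// crossSection / sumOfWeights once for the combined sample.
std::vector<MergedBin> mergeRuns(const std::vector<RunHisto>& runs) {
  assert(!runs.empty());
  const std::size_t nbins = runs.front().sumW.size();
  std::vector<double> sw(nbins, 0.0), sw2(nbins, 0.0);
  double totW = 0.0, xsNum = 0.0;
  for (const RunHisto& r : runs) {
    assert(r.sumW.size() == nbins && r.sumW2.size() == nbins);
    for (std::size_t i = 0; i < nbins; ++i) {
      sw[i] += r.sumW[i];
      sw2[i] += r.sumW2[i];
    }
    totW += r.sumOfWeights;
    xsNum += r.crossSection * r.sumOfWeights;
  }
  assert(totW > 0.0);
  const double xs = xsNum / totW;  // combined cross-section estimate
  const double f = xs / totW;      // the Rivet-wide scale factor for the merged sample
  std::vector<MergedBin> out(nbins);
  for (std::size_t i = 0; i < nbins; ++i)
    out[i] = {f * sw[i], f * std::sqrt(sw2[i])};  // error needs the squared weights
  return out;
}
```

With two identical runs (per-bin sumW = 2, sumW2 = 2, sumOfWeights = 4, cross section 100), the merged bin is still 50, but its error drops from 25·√2 to 25. The error is built only from the summed squared weights, which is why those must be written out.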