[Rivet-svn] r2125 - trunk/doc
blackhole at projects.hepforge.org
Tue Dec 1 22:57:21 GMT 2009
Author: buckley
Date: Tue Dec 1 22:57:20 2009
New Revision: 2125

Log:
Adding rant about how to do MC analyses ;-)

Modified:
   trunk/doc/rivet-manual.tex

Modified: trunk/doc/rivet-manual.tex
==============================================================================
--- trunk/doc/rivet-manual.tex Tue Dec 1 20:55:27 2009 (r2124)
+++ trunk/doc/rivet-manual.tex Tue Dec 1 22:57:20 2009 (r2125)
@@ -5,14 +5,17 @@
 \title{Rivet user manual\\ {\smaller \textsc{version \RivetVersion}}}
-\author{Andy Buckley\\ IPPP, Durham University, UK.\\ E-mail: \email{andy.buckley at durham.ac.uk}}
+\author{Andy Buckley\\ PPE Group, School of Physics, University of Edinburgh, UK.\\ E-mail: \email{andy.buckley at ed.ac.uk}}
 \author{Jonathan Butterworth\\ HEP Group, Dept. of Physics and Astronomy, UCL, London, UK.\\ E-mail: \email{J.Butterworth at ucl.ac.uk}}
 \author{Leif L\"onnblad\\ Theoretical Physics, Lund University, Sweden.\\ E-mail: \email{lonnblad at thep.lu.se}}
-\author{Hendrik Hoeth\\ Theoretical Physics, Lund University, Sweden.\\ E-mail: \email{hendrik.hoeth at cern.ch}}
+\author{Hendrik Hoeth\\ IPPP, Durham University, UK.\\ E-mail: \email{andy.buckley at durham.ac.uk}}
 \author{James Monk\\ HEP Group, Dept. of Physics and Astronomy, UCL, London, UK.\\ E-mail: \email{jmonk at hep.ucl.ac.uk}}
+\author{Holger Schulz\\ Institut f\"ur Physik, Berlin Humboldt University, Germany.\\ E-mail: \email{holger.schulz@@physik.hu-berlin.de}}
+%\author{Eike von Seggern\\ Institut f\"ur Physik, Berlin Humboldt University, Germany.\\ E-mail: \email{jan.eike.von.seggern@@physik.hu-berlin.de}}
 \author{Frank Siegert\\ IPPP, Durham University, UK.\\ E-mail: \email{frank.siegert at durham.ac.uk}}
 \author{Lars Sonnenschein\\ CERN, Gen\`eve 1206, Switzerland.\\ E-mail: \email{sonne at cern.ch}}
+
 \preprint{}
 %\preprint{\hepth{9912999}}
@@ -475,7 +478,152 @@
 might differ there from experiences with HZTool are the new histogramming system
 and the fact that we've used some object orientation concepts to make life a bit
 easier. The meaning of ``projections'', as applied to event analysis, will
-probably be less obvious. We'll discuss them now.
+probably be less obvious. We'll discuss them soon, but first a
+semi-philosophical aside on the ``right way'' to do physics analyses on and
+involving simulated data.
+
+
+\section{The science and art of physically valid MC analysis}
+
+The world of MC event generators is a wonderfully convenient one for
+experimentalists: we are provided with fully exclusive events whose most complex
+correlations can be explored and used to optimise analysis algorithms and some
+kinds of detector correction effects. It is absolutely true that the majority of
+data analyses and detector designs in modern collider physics would be very
+different without MC simulation.
+
+But it is very important to remember that it is just simulation: event
+generators encode much of known physics and phenomenologically explore the
+non-perturbative areas of QCD, but only unadulterated experiment can really tell
+us about how the world behaves. The richness and convenience of MC simulation
+can be seductive, and it is important that experimental use of MC strives to
+understand and minimise systematic biases which may result from use of simulated
+data, and to not ``unfold'' imperfect models when measuring the real world. The
+canonical example of the latter effect is the unfolding of hadronisation (a
+deeply non-perturbative and imperfectly-understood process) at the Tevatron (Run
+I), based on MC models. Publishing ``measured quarks'' is not physics --- much
+of the data thus published has proven of little use to either theory or
+experiment in the following years. In the future we must be alert to such
+temptation and avoid such gaffes --- and much more subtle ones.
+
+These concerns on how MC can be abused in treating measured data also apply to
+MC validation studies. A key observable in QCD tunings is the \pT of the \PZ
+boson, which has no phase space at exactly $\pT = 0$ but a very sharp peak at
+$\mathcal{O}(\unit{1-2}{\GeV})$. The exact location of this peak is mostly
+sensitive to the width parameter of a nucleon ``intrinsic \pT'' in MC
+generators, plus some soft initial state radiation and QED
+bremsstrahlung. Unfortunately, all the published Tevatron measurements of this
+observable have either ``unfolded'' the QED effects to the ``\PZ \pT'' as
+attached to the object in the HepMC/HEPEVT event record with a PDG ID code of
+23, or have used MC data to fill regions of phase space where the detector could
+not measure. Accordingly, it is very hard to make an accurate and portable MC
+analysis to fit this data, without similarly delving into the event record in
+search of ``the boson''. While common practice, this approach intrinsically
+limits the precision of measured data to the calculational order of the
+generator --- often not analytically well-defined. We can do better.
+
+Away from this philosophical propaganda (which nevertheless we hope strikes some
+chords in influential places\dots), there are also excellent pragmatic reasons
+for MC analyses to avoid treating the MC ``truth'' record as genuine truth. The
+key argument is portability: there is no MC generator which is the ideal choice
+for all scenarios, and an essential tool for understanding sub-leading
+variability in theoretical approaches to various areas of physics is to use
+several generators with similar leading accuracies but different sub-leading
+formalisms. While the HEPEVT record as written by HERWIG and PYTHIA has become
+familiar to many, there are many ambiguities in how it is filled, from the
+allowed graph structures to the particle content. Notably, the Sherpa event
+generator explicitly elides matrix element particles from the event record,
+perhaps driven by a desire to protect us from our baser analytical
+instincts. The Herwig++ event generator takes the almost antipodal approach of
+expressing different contributing Feynman diagram topologies in different ways
+(\emph{not} physically meaningful!) and seamlessly integrating shower emissions
+with the hard process particles. The general trend in MC simulation is to blur
+the practically-induced line between the sampled matrix element and the
+Markovian parton cascade, challenging many established assumptions about ``how
+MC works''. In short, if you want to ``find'' the \PZ to see what its \pT or
+$\eta$ spectrum looks like, many new generators may break your honed PYTHIA
+code\dots or silently give systematically wrong results. The unfortunate truth
+is that most of the event record is intended for generator debugging rather than
+physics interpretation.
+
+Fortunately, the situation is not altogether negative: in practice it is usually
+as easy to write a highly functional MC analysis using only final state
+particles and their physically meaningful on-shell decay parents. These are,
+since the release of HepMC 2.5, standardised to have status codes of 1 and 2
+respectively. \PZ-finding is then a matter of choosing decay lepton candidates,
+windowing their invariant mass around the known \PZ mass, and choosing the best
+\PZ candidate: effectively a simplified version of an experimental analysis of
+the same quantity. This is a generally good heuristic for a safe MC analysis!
+Note that since it's known that you will be running the analysis on signal
+events, and there are no detector effects to deal with, almost all the details
+that make a real analysis hard can be ignored. The one detail that is worth
+including is summing momentum from photons around the charged leptons, before
+mass-windowing: this physically corresponds to the indistinguishability of
+collinear energy deposits in trackers and calorimeters and would be the ideal
+published experimental measurement of Drell-Yan \pT for MC tuning. Note that
+similar analyses for \PW bosons have the luxury over a true experiment of being
+able to exactly identify the decay neutrino rather than having to mess around
+with missing energy. Similarly, detailed unstable hadron (or tau) reconstruction
+is unnecessary, due to the presence of these particles in the event record with
+status code 2. In short, writing an effective analysis which is automatically
+portable between generators is no harder than trying to decipher the variable
+structures and multiple particle copies of the debugging-level event
+objects. And of course Rivet provides lots of tools to do almost all the
+standard fiddly bits for you, so there's no excuse!\\[\lineskip]
+
+\noindent
+Good luck, and be careful!
+
+% While the event record "truth" structure may look very
+% compellingly like a history of the event processes, it is extremely important to
+% understand that this is not the case. For starters, such a picture is not
+% quantum mechanically robust: it is impossible to reconcile such a concept of a
+% single history with the true picture of contributing and interfering
+% amplitudes. A good example of this is in parton showers, where QM interference
+% leads to colour coherence. In the HERWIG-type parton showers, this colour
+% coherence is implemented as an angular-ordered series of emissions, while in
+% PYTHIA-type showers, an angular veto is instead applied. The exact history of
+% which particles are emitted in which order is not physically meaningful but
+% rather an artefact of the model used by the generator --- and is primarily
+% useful for generator authors' debugging rather than physics analysis. This is in
+% general true for all particles in the event without status codes of 1, 2 or 4.
+
+% Another problem is that the way in which the event internals are documented is
+% not well defined: it is for authors' use and as such they can do anything they
+% like with the "non-physics" entities stored within. Some examples:
+
+% * Sherpa does not write matrix element particles (i.e. W, Z, Higgs, ...) into the event record in most processes
+% * Herwig++ uses very low-mass Ws in its event record to represent very off-shell weak decay currents of B and D mesons (among others)
+% * In Drell-Yan events, Herwig++ sometimes calls the propagating boson a Z, and sometimes a photon, probabilistically depending on the no-mixing admixture terms
+% * Sherpa events (and maybe others) can have "bottleneck" particles through which everything flows. Asking if a particle has e.g. a b-quark (NB. an unphysical degree of freedom!) ancestor might always give the answer "yes", depending on the way that the event graph has been implemented
+% * Different generators do not even use the same status codes for "documentation" event record entries: newer ones tend to represent all internals as generator-specific particles, and emphasise their lack of physical meaning
+% * The generator-level barcodes do not have any reliable meaning: any use of them is based on HEPEVT conventions which may break, especially for new generators which have never used HEPEVT
+% * Many (all?) generators contain multiple copies of single internal particles, as bookkeeping tools for various stages of event processing. Determining which (if any) is physically meaningful (e.g. which boosts were or weren't applied, whether QED radiation was included, etc.) is not defined in a cross-generator way.
+% * The distinction between "matrix element" and "parton shower" is ill-defined: ideally everything would be emitted from the parton shower and indeed the trend is to head at least partially in this direction (cf. CKKW/POWHEG). You probably can't make a physically useful interpretation of the "hard process", even if particular event records allow you to identify such a thing.
+% * Quark and gluon jets aren't as simple as the names imply on the "truth" level: to perform colour neutralisation, jets must include more contributions than a single hard process parton. When you look at event graphs, it becomes hard to define these things on a truth level. Use an observable-based heuristic definition instead.
+
+% Hence, any truth-structure assumptions need to be checked and probably modified
+% when moving from one generator to another: clearly this can lead to
+% prohibitively large maintenance and development hurdles. The best approach,
+% whenever possible, is to only use truth information to access the particles
+% with status code 1 (and those with status = 2, if decays of physically
+% meaningful particles (i.e. hadrons) are being studied). In practice, this
+% adds relatively little to most analyses, and the portability of the analyses
+% is massively improved, allowing for wider and more reliable physics
+% studies. If you need to dig deeper, be very careful!
+
+% A final point of warning, more physical than the technicalities above, is that
+% the bosons or similar that are written in the event record are imperfect
+% calculational objects, rather than genuine truth. MC techniques are improving
+% all the time, and you should be extremely careful if planning to do something
+% like "unfolding" of QED radiation or initial state QCD radiation on data, based
+% on event record internals: the published end result must be meaningfully
+% comparable to future fixed order or N^kLL resummation calculations, particularly
+% in precision measurements. If not handled with obsessive care, you may end up
+% publishing a measurement of the internals of an MC generator rather than
+% physical truth! Again, this is not an absolute prohibition --- there are only
+% shades of grey in this area --- but just be very careful and stick to statuses
+% 1, 2 and 4 whenever possible.

 \section{Projections}
@@ -574,16 +722,15 @@
  type-safety, this proliferation of dynamic casting may worry you: the compiler
  can't possibly check if a projection of the requested name has been registered,
  nor whether the downcast to the requested concrete type is
- legal. These are very legitimate concerns!
+ legal. These are very legitimate concerns!\\

- In truth, we'd \emph{like} to have this level of extra safety but in the past,
- when projections were held as members of \code{ProjectionApplier} classes
- rather than in the central \code{ProjectionHandler} repository, the benefits
- of the strong typing were outweighed by more serious and subtle bugs relating
- to projection lifetime and object ``slicing''. At least when the current
- approach goes wrong it will throw an unmissable \emph{runtime} error every
- time that you run it (until it's fixed, of course!) rather than silently do
- the wrong thing, as was the previous behaviour.
+ In truth, we'd like to have this level of extra safety! But in the past, when
+ projections were held as members of \code{ProjectionApplier} classes rather
+ than in the central \code{ProjectionHandler} repository, the benefits of the
+ strong typing were outweighed by more serious and subtle bugs relating to
+ projection lifetime and object ``slicing''. At least when the current approach
+ goes wrong it will throw an unmissable \emph{runtime} error --- until it's
+ fixed, of course! --- rather than silently do the wrong thing.\\

  Our problems here are a microcosm of the perpetual language battle between
  strict and dynamic typing, runtime versus compile time errors. In practice,
@@ -595,7 +742,7 @@
  at runtime. By pushing \emph{some} checking to the domain of runtime errors,
  Rivet's code is (we believe) in practice safer, and certainly more clear and
  elegant. However, we believe that with runtime checking should come a culture
- of unit testing, which is not yet in place in Rivet.
+ of unit testing, which is not yet in place in Rivet.\\

  As a final thought, one reason for Rivet's internal complexity is that C++ is
  just not a very good language for this sort of thing: we are operating on the