[Rivet] Proposal for multi-weight/NLO counter-event support, and re-entry to the event loop / finalize steps
Frank Siegert  frank.siegert at cern.ch
Wed Jun 5 13:13:38 BST 2013
Hi Andy, all,

Thanks for the summary. I'm inlining some comments/questions below, but in
summary it looks very good to me.

> Here is the proposal for Rivet histogramming developments in the next
> couple of weeks at Les Houches, and beyond. We've moved the development
> repositories for Rivet and YODA to use hg rather than svn now, which
> will make exploratory development much easier.

This is the first time I hear about this (probably due to my absence from
last week's meeting)... does it mean that we all have to switch to hg for
version control now? Is the bootstrap script changed as well? I have only
just noticed that I haven't received svn commit messages or Jenkins
notifications for a week. Are those going to be transitioned, and how?

> A reminder of what we're trying to solve...
>
> 1. Merging of independent runs (for same or different processes)
> 2. Writing fully useful histograms *during* a run
> 3. Transparent handling of multi-weight events
> 4. Transparent handling of correlated events
>
> These facets are not orthogonal, but to make development feasible we do
> need to factorise as much as possible into small-ish steps, so that
> after each one we'll have a working system which we can test. Doing all
> of this in one big step seems a recipe for disaster!
>
> I'll try to go through these now in the order given, which I think is
> also the natural order for development.
>
>
> 1. RUN MERGING
>
> Actually, we can do simple run merging at the YODA level now, thanks to
> Dave M putting together the remaining Python += operators for YODA
> histograms, and a yodamerge script which uses them to combine multiple
> runs into one. It's very simple at the moment, but we will add a
> command-line way to specify weights for each input run: will that be
> enough to combine *different* processes by cross-section? Please try it
> out... it *seems* to be working nicely, given a simple test.

Cool, I'll be trying this out right away.
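For concreteness, here is a minimal sketch of what such weighted run merging could look like. The `Hist` class and `merge_runs` function are illustrative stand-ins, not the actual YODA API; the per-run weights would be e.g. cross-section/sum-of-weights factors when combining different processes:

```python
class Hist:
    """Toy stand-in for a YODA Histo1D: just a per-bin sum of weights."""
    def __init__(self, sumw=None):
        self.sumw = list(sumw or [])

    def __iadd__(self, other):
        # Bin-wise addition, as yodamerge does via the Python += operators
        self.sumw = [a + b for a, b in zip(self.sumw, other.sumw)]
        return self

    def scaled(self, factor):
        return Hist([w * factor for w in self.sumw])


def merge_runs(runs, weights=None):
    """Combine per-run histograms, optionally weighting each run,
    e.g. by cross-section for different processes."""
    weights = weights or [1.0] * len(runs)
    merged = Hist([0.0] * len(runs[0].sumw))
    for run, w in zip(runs, weights):
        merged += run.scaled(w)
    return merged
```

This only gives the right answer for histograms that receive at most a normalisation scale factor in finalize(), which is exactly the limitation discussed next.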
> This only works for histograms to which at most a normalisation scale
> factor has been applied, though.

How is the scale factor for a histogram treated in the merging, i.e. how
do I get the correct scale in the final histograms if the scale is only an
annotation (IIRC)?

> In the general case, arbitrary
> manipulations might be done to the histograms in the finalize step: to
> merge these from multiple runs we need to merge the data objects
> *before* finalization. This introduces significant challenges,
> especially since we don't want to introduce very unintuitive structures
> into the "user" analysis code. Our chosen approach is as follows:
>
> * Analyses should *register* every object that will be used in their
> finalize() method, in addition to those which are intended for
> plotting/comparison use. Registration will normally happen in init() but
> can also happen in finalize...

Is there a use case for an object being registered in finalize()? I'm
asking because, as far as I can tell, our proposed system should in
principle work without any finalize in the individual runs at all, right?
That would allow the only finalize to live directly in rivet-cmphistos,
which could then also take over the functionality of merging different
runs.

> this already happens, so effectively no
> change except that even intermediate histograms will now need to be
> registered. For analyses with cuts, registered YODA Counter objects will
> need to be used in place of doubles for weight counting. Access to the
> histos/counters in the analysis will be by pointers, as now, or by the
> registered path name, so the user's freedom to structure their analysis'
> data as they wish won't be impeded.
>
> * We will provide a way to declare on the booking methods whether the
> object being registered is to be "visible" in the final histograms, or
> if it is an interim data object to be used in preparing the final plots.
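As an illustration of the registration idea (hypothetical names, not the real Rivet interface): every object that finalize() needs, including plain weight sums, is booked under a path with a visibility flag, so a merging tool can find and combine it:

```python
class Counter:
    """Stand-in for a YODA Counter: a mergeable sum of weights,
    replacing a bare double for cut-flow weight counting."""
    def __init__(self, sumw=0.0):
        self.sumw = sumw

    def fill(self, weight=1.0):
        self.sumw += weight

    def __iadd__(self, other):
        self.sumw += other.sumw
        return self


class AnalysisSketch:
    """Hypothetical registration scheme: objects are booked by path
    in init(), with a flag marking them as final-plot or interim data."""
    def __init__(self):
        self._objects = {}

    def book(self, path, obj, visible=True):
        obj.visible = visible  # interim vs. final-plot object
        self._objects[path] = obj
        return obj

    def get(self, path):
        return self._objects[path]
```

Because the counter is registered rather than held as a private member double, a run-merging step can apply `+=` to it just like to a histogram.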
> This way *everything* needed to calculate the final plots in finalize()
> will also be written out to the .yoda file. This will inflate the file
> size, but this should not be a major problem. We can add a behavioural
> flag to disable this behaviour if that is thought to be important. The
> plotting scripts (i.e. rivet-cmphistos) should have a flag to plot the
> intermediate histos (default = only plot final/visible histos).

In principle, *if* we move finalize into rivet-cmphistos, one wouldn't
need the final histos in the rivet output at all. They would then only be
created as rivet-cmphistos output. I'm not sure whether we really want
this, but I'll use it as the basis for discussion here. Ideally we should
still allow for run merging at the C++ level though (I'm thinking of
MPI-parallelised generator runs with a direct Rivet interface, which would
want to merge their separate Rivet results before writing out).

> * Add the ability to "pre-load" the intermediate histograms after
> Rivet's init() step, but before the event loop starts, by supplying a
> .yoda data file... or multiple files. This is the key step for merging
> analyses with complex finalize() logic & manipulations: we perform the
> multiple runs in parallel, merge the histogram files (although only with
> an interest in the non-final histograms: the final ones will in general
> be mangled by this process), and restart the event loop -- perhaps for 0
> events -- with that combined state pre-loaded. The finalize() will then
> proceed using the aggregated intermediate data objects and write out
> consistent full-stats/all-processes physical ones.

Is there a use case for restarting the event loop? This would contradict
the move into the plotting tools:

> (It was even
> suggested that if we map everything nicely into Python, then this
> merge+preload+finalize step could be done transparently inside
> rivet-cmphistos, rather than via an explicit extra run of the rivet
> script.)

Exactly.
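The reason the merge must happen *before* finalize() can be seen in a toy example: a normalisation-style finalize is non-linear, so finalizing each run and then combining gives a different (wrong) answer from merging the intermediates first. All paths and names here are illustrative:

```python
def merge_intermediates(runs):
    """Sum the registered intermediate objects from several runs."""
    merged = {}
    for run in runs:
        for path, value in run.items():
            merged[path] = merged.get(path, 0.0) + value
    return merged


def finalize(objs):
    """Toy finalize: normalise a histo integral by a weight sum."""
    return objs["/TMP/MyAnalysis/h"] / objs["/TMP/MyAnalysis/sumw"]


# Two runs with different statistics:
run1 = {"/TMP/MyAnalysis/h": 2.0, "/TMP/MyAnalysis/sumw": 2.0}
run2 = {"/TMP/MyAnalysis/h": 6.0, "/TMP/MyAnalysis/sumw": 4.0}

correct = finalize(merge_intermediates([run1, run2]))  # 8/6
naive = 0.5 * (finalize(run1) + finalize(run2))        # (1.0 + 1.5)/2
```

The pre-load step described above is exactly the hook that lets `finalize` run once on the aggregated intermediates rather than once per run.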
In that case I would rename rivet-cmphistos to rivet-finalize or similar.

> Re. the temporary/invisible histo flagging: in the meeting it was
> suggested that we use a YODA "annotation" for this, but if we need to
> write out both the intermediate and final versions of e.g. histograms to
> be normalized, then they need to have different paths so we can
> distinguish them. I suggest a /TMP/ path prefix for intermediate histos,
> cf. the /REF/ that we already use to distinguish and relate MC and data
> histograms, and because it will alphabetically group histos in a fairly
> predictable way. We can build awareness of /TMP into rivet-cmphistos in
> the same way as we already do for /REF. Another benefit of a fully
> predictable path scheme is that we can switch the behaviour of the
> Analysis::get("name") function, so that in the analyze() method it
> returns the temporary histo, and in finalize() it returns the permanent
> one: I think this is necessary, and that in fact we will have to do some
> pre-finalize sleight of hand to switch the target of the histo pointers
> in the analysis to point at the permanent objects rather than the
> temporary ones! Thoughts on this?

This depends a lot on whether we want to have mixed intermediate/final
output files, or whether the stages are separated into
"rivet -> intermediate.yoda" and "rivet-finalize -> final.yoda".

> 2. MID-RUN HISTO WRITING
>
> This is semi-trivial given the above. All that we need to be careful
> about here is that finalize() doesn't disrupt the intermediate
> histograms, which the sleight-of-hand method should ensure. We can then
> do the pointer target switch, run finalize, switch back, and continue
> with the run: finalize() can be run any number of times. Maybe this gets
> released at the same time as the above, maybe it comes slightly later.

That would be trivial with the separation into rivet and rivet-finalize.
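A toy model of the stage-dependent lookup implied by the /TMP/ scheme (not the Rivet implementation): the same logical name resolves to the temporary object during the event loop and to the permanent one during finalize():

```python
class HistoRegistry:
    """Toy model of the proposed /TMP/ path convention with the
    pre-finalize "sleight of hand": one logical name maps to two
    objects, selected by the current run stage."""
    def __init__(self):
        self.objs = {}
        self.stage = "analyze"

    def book(self, path):
        self.objs["/TMP" + path] = []  # intermediate, filled in analyze()
        self.objs[path] = []           # permanent, produced in finalize()

    def get(self, path):
        if self.stage == "finalize":
            return self.objs[path]
        return self.objs["/TMP" + path]
```

Because finalize() only ever touches the permanent copies, it can be re-run any number of times without disturbing the intermediates, which is what mid-run histo writing needs.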
> 3. MULTI-WEIGHT EVENTS
>
> The proposal is that for multi-weight events, we don't just book one
> intermediate histogram per registered name (and then turn those into
> permanent histograms in finalize()), but that for every call of add() we
> book 1 temporary histo (which will be clear()ed after each event) and N
> intermediates. This requires a little bit of magic, as we'll only find
> out how many weights there are by looking at the first event, but we
> already do that in Run to work out the beam particles and sqrtS before
> init()ing the analyses. At the end of every event, the Analysis base
> class (or the AnalysisHandler) will sync the temporary histograms to the
> intermediates by looping over the weight vector of the event, and
> scaling the temporary by the weight before +=ing it to its intermediate.
> Note that this means that in the analysis code, ~all weights should be
> 1! This will require migration, but also makes for a beautiful
> simplification.

Oh yes, that sounds like a great simplification for histograms! Do we have
to keep in mind any type of object (besides histos, profiles, counters,
...) to which this doesn't trivially generalise? (I seem to remember there
was an analysis which deliberately filled with weight=1?)

> The finalize() code will need to be aware that
> operations apply to all intermediates rather than just one... this will
> require some thought, e.g. running finalize() once for each weight, with
> the appropriate weight-specific pointer switching.
>
> Again, we'll need distinct path structures to track and access these
> distinct-but-related data objects. I suggest a /.../.../FOO@WEIGHTNAME
> path syntax extension (perhaps with no @WEIGHTNAME part for the first
> weight, i.e. the nominal behaviour).
>
> Another release should probably happen at this point.
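The end-of-event sync described above might look like the following sketch, under the assumption that intermediates are keyed by a @WEIGHTNAME-style suffix (with the empty suffix for the nominal weight); none of this is actual Rivet/YODA code:

```python
def sync_event(tmp_bins, event_weights, intermediates):
    """End-of-event sync: the single temporary histogram, filled with
    weight ~1 in analyze(), is scaled by each entry of the event's
    weight vector, added to the matching intermediate, then cleared."""
    for wname, w in event_weights.items():
        inter = intermediates[wname]
        for i, t in enumerate(tmp_bins):
            inter[i] += w * t
    for i in range(len(tmp_bins)):
        tmp_bins[i] = 0.0  # ready for the next event


# One nominal weight (empty suffix) plus one systematic variation:
intermediates = {"": [0.0, 0.0], "@SCALE_UP": [0.0, 0.0]}
tmp = [1.0, 2.0]  # analyze() fills with weight 1
sync_event(tmp, {"": 0.5, "@SCALE_UP": 2.0}, intermediates)
```

Generalising `tmp` from a per-event to a per-super-event temporary is essentially all that the correlated-events case below adds.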
> 4. CORRELATED EVENTS
>
> Almost done. Handling NLO counter-events (where a group of events must
> be treated as correlated sub-events) requires that we make a temporary
> histo for each super-event block and then synchronise that super-event
> to the intermediate histos. We'll trigger on the super-event transitions
> by looking for a change of event number: if the event number remains the
> same between consecutive events, we assume that they are correlated
> sub-events.
>
> The machinery for multi-weight events deals with most of the issues, I
> think: we will already have per-event temporary, transient-only
> histograms, and these just need to be generalized a bit to become
> per-super-event temporaries. There are some open questions:
>
> * Fuzzy bin edges: counter-events could fall on either side of a bin
> boundary by an epsilon separation. We have the machinery to catch this
> and deal with it (average out the fills between the two bins or assign
> both fills to only one bin) because the YODA bins store their mean fill
> position. The abstraction of the temporary histograms makes this
> possible without having to put physics knowledge into YODA: phew.

I'm not sure this should be done (by default), as it impairs NLO accuracy
by definition. At the generator level one can implement slightly smarter
ways to deal with such misbinning effects, which use knowledge about the
singularity structure of the counter-events to decide whether such a
shifting of the weight is even necessary (still of course not NLO
accurate, but under better control). I think it's safer to leave it at
that...

> * What weight do we use to sync to the intermediate histos? The
> sub-events can have different weights -- in fact, they can have
> systematic weight vectors! Do we need to add a separate "event group
> weight" member to HepMC and LHE to separate systematics weights from NLO
> subtraction term weights? Or can we make do with the systematics weight
> vectors and a standardised procedure?

I'm not sure I understand this.
Every (counter-)event within the event group has its own independent
weight (or weight vector in the case of systematics variations), so this
structure will need to be propagated through to the temporary histograms.

> * In this scheme, sub-events must be consecutive and have the same event
> number so we can determine that they are to be correlated.

Yes, I think it's fine to assume that.

> MC@NLO
> doesn't do this: is that ok? (MC@NLO's events only have weights +-1, so
> I think it's ok to treat them as uncorrelated: in that case, we don't
> really have to include it in this "NLO" treatment at all.)

MC@NLO (and all NLO+PS matching for that matter) is completely different:
it does not have correlated counter-events for a real-emission event.
Instead those are combined with a very clever "trick" without impairing
NLO accuracy. So the correlated-NLO-events discussion does not apply to
NLO+PS at all, only to fixed-order NLO.

Cheers,
Frank