This sub-package contains the classes used for loading data from disk and
restructuring it for tuning.
The loading of files from disk is done in DataProxy (or
RivetDataProxy). Files are only read if necessary
and then cached. A TuneData object holds all information necessary for
a single tune or for comparing MC data (held in MCData) with reference
data. In essence, TuneData is a collection of BinProps objects (a dict keyed
by bin ID), one for each bin of the requested observables.
The data flow looks like this:
+-----+   +------+    +----+
| ref |   | ipol |    | mc |      (files on disk)
+-----+   +------+    +----+
   |         |           |
   |    +---------+ +--------+
   |    | IpolSet | | MCData |
   |    +---------+ +--------+
   |         |           |
  +-----------+          |
  | DataProxy |----------+
  +-----------+
        |
   +----------+
   | TuneData |   (list of BinProps)
   +----------+
        |
  +------------+
  | GoF object |   (this compares MC/interpolation with reference)
  +------------+
-
class professor.data.DataProxy
Bases: object
Central object for loading data from the file system.
Three types of data are handled:
- Reference data :
- TODO
- MC data :
- Different types of MC data can be stored. The MC data is stored in a
dict {type-ID => MCData}. Examples of type-IDs are ‘sample’
and ‘scan’.
- Interpolations :
- TODO
See also
- MCData
- Abstraction of an MC data subdirectory.
Methods
-
addMCData(mcdata, datatype)
Add MC data of given data type.
Add an MC data interface to the internal storage dictionary. If an
entry for datatype already exists, it will be overwritten!
Parameters : | mcdata : MCData (or subclass)
The MC data to add.
datatype : str
The MC data type, e.g. ‘sample’ or ‘scan’ or ‘tunes’.
|
Raises : | TypeError :
If mcdata has wrong type.
|
-
static getBinID(histo, ibin)
Get a canonical bin ID of the form Analysis/HistoID:BinIdx.
Parameters : | histo : Histo
Histogram.
ibin : int
Bin index.
|
-
static getBinIndex(binid)
Get the bin index from a canonical bin ID.
Parameters : | binid : str
The bin ID.
|
Returns : | index : int |
-
getInterpolationSet(ipolcls, runs)
Get an InterpolationSet.
This is loaded from disk on-the-fly.
Parameters : | ipolcls : class
The interpolation method class.
runs : list, str
The runs that are used as anchor points for the interpolation.
Can be a list of strings or a single string of colon-separated
run keys.
|
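The two accepted forms of the runs argument can be normalised to one list before further use. The following is a minimal sketch of that idea, not Professor's own code; the helper name normalize_runs is hypothetical.

```python
def normalize_runs(runs):
    """Return a list of run keys from a list or a colon-separated string.

    Hypothetical helper: illustrates the two input forms documented for
    the `runs` parameter, nothing more.
    """
    if isinstance(runs, str):
        # "000:001:002" -> ["000", "001", "002"]; empty pieces are dropped
        return [key for key in runs.split(":") if key]
    # Any other iterable of run keys is copied into a new list
    return list(runs)


from_string = normalize_runs("000:001:002")
from_list = normalize_runs(["000", "001"])
```

Both calls yield a plain list of run keys, so code further down only has to handle one representation.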
-
getIpolFilePath(ipolcls, runs, output=False)
Return the canonical path for an interpolation pickle.
Parameters : | ipolcls : class
The interpolation method class. Must have a ‘method’ attribute.
runs : list, str
The runs that are used as anchor points for the interpolation.
Can be a list of strings or a single string of colon-separated
run keys.
|
-
getIpolPath()
-
getMCData(datatype='sample', checkAIDA=True)
Get MC data of the given type.
Parameters : | datatype : str, optional
The MC data type, e.g. ‘sample’ or ‘scan’ (default is
‘sample’).
|
Returns : | mcdata : MCData
The datatype MC data.
|
Raises : | DataProxyError :
If no MC data of type datatype is available.
|
-
getOutputPath()
-
static getPathsFromCLOptions(opts)
Return a dict with the data paths specified on command line.
The dictionary has the following 4 keys:
Each value can be None, meaning that the respective
command-line option is not available or that a value could not be
constructed from e.g. DATADIR/mc.
-
getRefData()
Get the dictionary of all loaded reference histograms, indexed
by histogram path.
Returns : | refhistos : dict(path => histo.Histo)
The reference histograms.
|
-
getRefHisto(histopath)
Get a reference histogram.
Parameters : | histopath : str
A histogram path of the form ‘/Analysis/HistoID’.
|
Returns : | histogram : histo.Histo
The reference histogram.
|
Raises : | DataProxyError :
If self.refpath is not set.
KeyError :
If histopath is not available.
|
-
getRefPath()
-
getTuneData(withref=True, withmc=None, useipol=None, useruns=None, useobs=None)
Return a TuneData object with the desired data.
The kind of data that is given to TuneData can be steered via the
(optional) flags. Depending on the kind of computation (calculating
interpolation coefficients/minimising/...) different kinds of data
must be turned on.
This is the central data preparation function.
Parameters : | withref : bool, optional
Equip TuneData with reference data (the default is True).
withmc : {str, None}, optional
If not None, the type of MC data that is stored in the
TuneData, e.g. ‘sample’. The default is None.
useipol : {interpolation_class, None}, optional
If not None, the interpolation method class used for the
per-bin interpolations. Only the
method attribute is important because
this is used to construct the file name of the pickle file.
useruns : {list of str, None}, optional
The run numbers used for interpolation. Can be None if
withmc is given. In this case, all available MC runs are used.
useobs : {list of str, None}, optional
The observables to use. Can be None if withmc is given. In
this case, all available observables in the MC data are used.
|
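As an illustration of how the flags might be combined, the sketch below wraps two plausible calls to getTuneData. Which flags a given computation really needs is an assumption made here for illustration; the wrapper functions and the RecordingProxy stand-in are hypothetical and only echo the keyword arguments, so the example runs without Professor installed.

```python
def tune_data_for_minimisation(proxy, ipolcls, runs, observables):
    """Assumed setup for a minimisation: reference data plus interpolations."""
    return proxy.getTuneData(withref=True, withmc=None, useipol=ipolcls,
                             useruns=runs, useobs=observables)


def tune_data_for_interpolation(proxy, mctype="sample"):
    """Assumed setup for building interpolations: MC data only.

    useruns and useobs are left at None, so all runs and observables
    available in the MC data are used (as documented for withmc).
    """
    return proxy.getTuneData(withref=False, withmc=mctype)


class RecordingProxy(object):
    """Hypothetical stand-in for DataProxy that returns the kwargs it gets."""

    def getTuneData(self, **kwargs):
        return kwargs


proxy = RecordingProxy()
ipol_request = tune_data_for_interpolation(proxy)
```

With a real DataProxy the same calls would return TuneData objects instead of dictionaries.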
-
ipolpath
Base directory for interpolation set files
-
listInterpolationSets()
Return a list of all InterpolationSets in the ipol directory.
Raises : | DataProxyError :
If self.ipolpath is not set.
|
-
classmethod mkFromCLOptions(opts, checkAIDA=True)
Build a DataProxy from CL options that were prepared with
addDataCLOptions.
Only those paths for which the parser has a corresponding option are
set in the returned DataProxy.
-
outdir
Base directory for output
-
refpath
Base directory for reference data files
-
setDataPath(base)
Set data location paths rooted at base.
Sets the data location paths for reference data (base/ref),
MC sample (base/mc) and interpolation storage (base/ipol/).
Parameters : | base : str
Base path for data locations.
|
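The directory layout implied by setDataPath can be written down in a few lines. This is a stand-alone sketch that only derives the three documented locations; the function name data_paths and the dictionary keys are hypothetical.

```python
import os.path


def data_paths(base):
    """Return the documented sub-directories rooted at `base`.

    Hypothetical helper: mirrors base/ref, base/mc and base/ipol as
    described for setDataPath.
    """
    return {
        "ref": os.path.join(base, "ref"),    # reference data
        "mc": os.path.join(base, "mc"),      # MC sample
        "ipol": os.path.join(base, "ipol"),  # interpolation storage
    }


paths = data_paths("mytune")
```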
-
setIpolPath(path)
-
setMCPath(path, datatype='sample', checkAIDA=True)
Add MC data of given type rooted at path.
Parameters : | path : str
Base directory of the MC data.
datatype : str, optional
The type identifier of the MC data, e.g. ‘sample’ or
‘linescan’. The default is ‘sample’.
|
Raises : | IOTestFailed :
If path is not a readable directory.
|
-
setOutputPath(path)
-
setRefPath(path)
-
static splitBinID(binid)
Split a bin ID in observable and bin index.
Parameters : | binid : str
The bin ID.
|
Returns : | observable : str
index : int
|
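The bin-ID helpers (getBinID, getBinIndex, splitBinID) all rely on the same path:index convention. A minimal stand-alone sketch of that convention follows; the function names are hypothetical, and make_bin_id takes a histogram path string rather than a Histo object.

```python
def make_bin_id(histopath, ibin):
    """Join a histogram path and a bin index into 'path:index'."""
    return "%s:%d" % (histopath, ibin)


def split_bin_id(binid):
    """Split 'path:index' into (path, index), cutting at the last colon."""
    observable, index = binid.rsplit(":", 1)
    return observable, int(index)


def bin_index(binid):
    """Return only the integer bin index of a bin ID."""
    return split_bin_id(binid)[1]


binid = make_bin_id("/Analysis/HistoID", 3)
observable, index = split_bin_id(binid)
```

Splitting at the last colon keeps the sketch robust should a histogram path itself contain a colon.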
-
class professor.data.proxy.RivetDataProxy
Bases: professor.data.proxy.DataProxy
Data proxy that loads the reference data from the files distributed
with Rivet.
Methods
-
getRefData()
-
getRefHisto(histopath)
Get a reference histogram.
Parameters : | histopath : str
A histogram path of the form ‘/Analysis/HistoID’.
|
Returns : | histogram : histo.Histo
The reference histogram.
|
Raises : | KeyError :
If histopath is not available.
|
-
classmethod mkFromCLOptions(opts, checkAIDA=True)
Build a DataProxy from CL options that were prepared with
addDataCLOptions.
Only those paths for which the parser has a corresponding option are
set in the returned DataProxy.
See also
- addDataCLOptions
- Add a data location command-line option group to an OptionParser.
- getPathsFromCLOptions
- Get a dict of data-location paths from command line options.
-
class professor.data.MCData(base, checkAIDA=True)
Bases: object
Interface for a directory with MC generated data.
MCData abstracts a directory with MC generated data with a layout
following:
Data is read from the filesystem only if necessary.
Variables: |
- basepath – Directory path within which all runs are located (typically basepath/mc).
- availableruns – List of valid run names, based on a scan of valid run dirs found in basepath.
|
Methods
-
availablehistos
The available histogram names (sorted).
-
getAvailableObservables(filtered=True)
Get a sorted list with the available observables.
The observables are taken from the first available MC run data.
By default, only the observables containing valid numerical data
(i.e. no NaNs) are returned.
Parameters : | filtered : bool, optional
Return only histograms that contain valid (i.e. not NaN) data
(default). If set to False all available observables are
returned.
|
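The effect of the filtered flag can be illustrated on plain data. The sketch below assumes each observable is represented by a list of bin values, which is a simplification of the real Histo objects; available_observables is a hypothetical name.

```python
import math


def available_observables(histos, filtered=True):
    """Return the sorted observable names from {name: [bin values]}.

    With filtered=True, observables that contain any NaN are skipped.
    """
    names = []
    for name, values in histos.items():
        if filtered and any(math.isnan(v) for v in values):
            continue  # drop observables with invalid numerical data
        names.append(name)
    return sorted(names)


histos = {
    "/A/obs2": [1.0, 2.0],
    "/A/obs1": [0.5, float("nan")],
    "/A/obs0": [3.0],
}
valid_only = available_observables(histos)
everything = available_observables(histos, filtered=False)
```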
-
getParameterBounds(runs=None)
Get the extremal parameter bounds of runs.
Returns : | bounds : ParameterRange |
-
getParameterCmp(run=None)
-
getParameterNames()
-
getRunHistos(run, filtered=False)
Return the {obsname => Histo} dict for given run.
Parameters : | run : str
Run ID.
filtered : bool, optional
Return only histograms that contain valid (i.e. not NaN) data.
By default all histograms are returned (for the sake of speed).
|
Returns : | histograms : dict
Dictionary that maps histogram paths to Histo instances.
|
-
getRunParams(run, retall=False)
Get the run parameters.
Parameters : | run : str
Run ID.
|
Returns : | parameters : ParameterPoint
The parameter values.
|
-
getScanParam(run)
-
isValidRunDir(runid=None, runpath=None, checkAIDA=True)
Check that the run directory is valid.
Checks for an out.aida file and a used_params file.
The run can be specified by runid or runpath.
Parameters : | runid : str
The ID of the run, i.e. the subdirectory name.
runpath : str
The full path to the run directory. This is used in the
ManualMCData class.
|
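The check described above amounts to testing for two files in a directory. The sketch below is a stand-alone approximation; it assumes that checkAIDA=False skips the out.aida test, which the text does not state, and the function name is hypothetical.

```python
import os
import tempfile


def is_valid_run_dir(runpath, check_aida=True):
    """Return True if runpath looks like a usable run directory."""
    if not os.path.isdir(runpath):
        return False
    if not os.path.isfile(os.path.join(runpath, "used_params")):
        return False
    # Assumption: the histogram file is only required when check_aida is set
    if check_aida and not os.path.isfile(os.path.join(runpath, "out.aida")):
        return False
    return True


with tempfile.TemporaryDirectory() as rundir:
    open(os.path.join(rundir, "used_params"), "w").close()
    params_only = is_valid_run_dir(rundir)
    params_only_unchecked = is_valid_run_dir(rundir, check_aida=False)
    open(os.path.join(rundir, "out.aida"), "w").close()
    complete = is_valid_run_dir(rundir)
```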
-
loadAllRuns(loadhistos=True)
Load the data for all available runs.
Parameters : | loadhistos : bool, optional
Turn loading histogram data on (default) or off.
|
See also
- loadRun
- Load a single run.
- loadAllThreaded
- Load all runs threaded, useful if IO lags are huge, e.g. with network file storage.
-
loadAllThreaded(loadhistos=True, numthreads=8)
Load the data for all available runs (multi-threaded).
This is only useful if IO lags are huge. Otherwise the Python thread
overhead makes this more time-consuming than loadAllRuns.
Parameters : | loadhistos : bool, optional
Turn loading histogram data on (default) or off.
numthreads : int, optional
Number of threads (default: 8).
|
See also
- loadRun
- Load a single run.
- loadAllRuns
- Load all runs sequentially.
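Threaded loading of this kind can be sketched with the standard library's thread pool. This is not Professor's implementation; load_all_threaded and fake_load_run are hypothetical names, and the loader simply returns a string instead of reading files.

```python
from concurrent.futures import ThreadPoolExecutor


def load_all_threaded(runs, load_run, numthreads=8):
    """Call load_run(run) for every run on a thread pool.

    Returns a {run: result} dict. Threads only pay off when load_run
    spends most of its time waiting on IO.
    """
    with ThreadPoolExecutor(max_workers=numthreads) as pool:
        results = pool.map(load_run, runs)  # keeps the order of `runs`
        return dict(zip(runs, results))


def fake_load_run(run):
    """Stand-in for reading one run directory from disk."""
    return "data for run %s" % run


loaded = load_all_threaded(["000", "001", "002"], fake_load_run, numthreads=2)
```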
-
loadRun(run, loadhistos=True)
Load data for a run.
Parameters : | run : str
The run identifier to load.
loadhistos : bool, optional
Turn loading histogram data on (default) or off.
|
-
loadedruns
The currently loaded run numbers (sorted).
-
class professor.data.ManualMCData(runpathmap=None)
Bases: professor.data.mcdata.MCData
-
addRunPath(runid, path)
-
availableruns
-
getParameterCmp(runid=None)
-
loadRun(runid, loadhistos=True)
Load data for run.
-
class professor.data.TuneData(dataproxy, withref=True, withmc=None, useipol=None, useruns=None, useobs=None)
Bases: dict
Container for data for one choice of runs.
The bin IDs (e.g. /Path/To/Obs:index) are mapped to BinProps instances.
Attributes
runnums : list
Sorted list of run identifiers.
hasref, hasmc, hasipol : bool
Flags that are True if the object contains that type of data.
paramranges : ParameterRange, None
The range of parameters spanned by the used MC runs. Only available
if MC or ipol data was included.
Make a TuneData object with the desired data.
The kind of data that is given to TuneData can be steered via the
(optional) flags. Depending on the kind of computation (calculating
interpolation coefficients/minimising/...) different kinds of data
must be turned on.
This is the central data preparation function.
Parameters : | withref : bool, optional
Equip TuneData with reference data (the default is True).
withmc : str, optional
Equip TuneData with MC data of the given type, e.g. ‘sample’.
Use None to disable storing MC data. This is the default.
useipol : class, optional
The interpolation method given by the class or None (=> no
interpolation data is loaded). None is the default.
useruns : list of str
List of MC run numbers to use or None (=> use all runs from
the MC data given with withmc).
useobs : list of str
List of observables to use or None (=> use all observables from
the MC data given with withmc).
|
Raises : | ArgumentError :
If run numbers (if needed) or observables are not specified and
cannot be guessed.
|
Methods
-
applyObservableWeights(weightmanager)
Set the bin weights.
Parameters : | weightmanager : WeightManager |
-
filteredValues()
Return an iterator over the bin properties, skipping vetoed and
zero-weighted bins.
-
getBinIDs(obs)
List of all bin IDs for observable ‘obs’.
-
getBinProps(obs)
List of all BinProps for observable ‘obs’.
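The dict-of-bins layout and the three accessors above can be mimicked with a small dict subclass. This is a sketch under stated assumptions: Bin and BinTable are hypothetical stand-ins for BinProps and TuneData, and ordering the bin IDs by index is a choice made here, not documented behaviour.

```python
class Bin(object):
    """Hypothetical stand-in for BinProps: only a weight and a veto flag."""

    def __init__(self, weight=1.0, veto=False):
        self.weight = weight
        self.veto = veto


class BinTable(dict):
    """Hypothetical {bin ID: Bin} mapping that mimics the TuneData layout."""

    def getBinIDs(self, obs):
        ids = [b for b in self if b.rsplit(":", 1)[0] == obs]
        # Assumption: order by numeric bin index, not by string comparison
        return sorted(ids, key=lambda b: int(b.rsplit(":", 1)[1]))

    def getBinProps(self, obs):
        return [self[b] for b in self.getBinIDs(obs)]

    def filteredValues(self):
        # Skip vetoed bins and bins whose weight is zero
        return (p for p in self.values() if not p.veto and p.weight != 0.0)


table = BinTable()
table["/A/obs:0"] = Bin()
table["/A/obs:1"] = Bin(veto=True)
table["/A/obs:10"] = Bin(weight=0.0)
table["/A/obs:2"] = Bin()
table["/A/other:0"] = Bin()
obs_ids = table.getBinIDs("/A/obs")
usable = list(table.filteredValues())
```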
-
getInterpolationHisto(observable, params)
Interpolation-prediction for observable at params.
Parameters : | observable : str
Path of the observable.
params : MinimizationResult, ParameterPoint, dict
The values of MC model parameters where the interpolation is
evaluated.
|
Returns : | histogram : lighthisto.Histo
Interpolated histogram.
|
-
getObservables()
-
ipolmethod
The interpolation method.
Returns the interpolation method of the first bin. It is assumed
that all bin properties use the same interpolation method.
Raises : | DataProxyError :
If no interpolations were stored.
|
-
numParams()
-
observables
-
vetoEmptyErrors()
Veto bins with zero reference error.
TODO: This is a nasty hack for identifying bins that are broken for some reason.
We should get rid of it!
-
class professor.data.BinProps(refbin, mcdict, ipol, **kwargs)
Bases: object
Container for all data related to a bin needed to do a minimisation.
A container for all the variants on a distribution bin: its weight, its
reference value and errors, a collection of its simulated equivalents from a
set of MC runs, and an interpolation function for that bin, based on
optimising the fit to a sampling of MC points in the parameter space.
At the moment the following is stored:
Attributes
refbin : lighthisto.Bin
The reference bin.
mcdict : dict {str => lighthisto.Bin}
Map of run numbers to MC bins.
ipol
The interpolation for this bin.
veto : bool
Flag for vetoing this bin in the GoF calculation.
weight : float
sqrtweight : float
The weight of this bin in the GoF calculation.
binid : str
The bin ID of this bin, of the form ‘/Analysis/Observable:BinIndex’.
Methods
-
binid
-
getBinCenter()
-
getProperty(propname)
Get a generic name=value bin property. Return None if not found.
-
ipol
-
mcdict
-
refbin
-
setProperties(propdict)
Set a dictionary of generic name=value bin properties.
-
setProperty(propname, propvalue)
Set a generic name=value bin property.
-
setSqrtWeight(sw)
-
setWeight(w)
-
sqrtweight
-
veto
-
weight
-
class professor.data.WeightManager
Bases: object
This simple object loads observable weight/property files and stores a
dictionary of observable:Weight pairs.
Methods
-
addBinRangeWeight(observable, binrange=(-inf, inf), weight=1.0, **kwargs)
Set the weights for bins of ‘observable’ in ‘binrange’.
Parameters : | observable : str
Path of the observable.
binrange : tuple of floats
The x-value bin range.
weight : float
Weight for the bins.
kwargs : dict
Extra named arguments, stored as Weight properties under those names.
|
-
getWeight(obs, obsvalue=None)
-
loadWeightsFile(wfile)
-
classmethod mkFromFile(path)
-
observables
Indexing operator for weight lookup. Also usable as wm[“obsname”].
If obsvalue is not given or is None, this function returns a Weight object,
or None if no observable matches the obs string. If obsvalue
is given, return the numerical weight for that observable value, obtained via
the Weight.getWeight method.
-
class professor.data.Weight(obs)
Bases: object
A simple object that holds a dict with binrange:weight pairs.
The weights have been extended to be general bin properties since the first
design, and a reworking of this class design is probably overdue.
Methods
-
binRanges()
-
getProperties(bincenter)
Get the properties for a given observable value, excluding the “weight” property.
-
getWeight(bincenter)
Evaluate the weight for bincenter by iterating over the
binrange:weight definitions.
If the bincenter is outside all ranges, return 0.
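The lookup described here is a linear scan over range definitions with a fallback of zero. A minimal sketch, assuming closed intervals and that the first matching range wins (neither is stated in the text); RangeWeights is a hypothetical stand-in for Weight.

```python
class RangeWeights(object):
    """Hypothetical stand-in for Weight: ((low, high), weight) pairs."""

    def __init__(self):
        self._ranges = []

    def set_weight(self, binrange, weight):
        """Register a weight for the x-value range (low, high)."""
        self._ranges.append((binrange, weight))

    def get_weight(self, bincenter):
        """Return the weight of the first range containing bincenter."""
        for (low, high), weight in self._ranges:
            if low <= bincenter <= high:  # assumption: closed interval
                return weight
        return 0.0  # outside every range


weights = RangeWeights()
weights.set_weight((0.0, 1.0), 2.0)
weights.set_weight((2.0, 5.0), 0.5)
inside_first = weights.get_weight(0.5)
in_gap = weights.get_weight(1.5)
```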
-
setProperties(binrange, *args, **kwargs)
Set several properties at once, by supplying either a single dict
object or via keyword arguments.
-
setProperty(binrange, propname, propvalue)
Set a property for the given bin range. The numerical weight is the
most common property, for which the propname is “weight”.
-
setWeight(bincenter, weight)
Set the bin range weight property.
-
professor.data.addDataCLOptions(parser, ref=False, mc=False, ipol=False, scan=False)
Add data location options to command-line option parser.
Use the flags ref, mc, ... to include data locations as needed. Set
only those flags to True that are actually needed by a script to
keep the CL interface clean.
See also
- DataProxy.mkFromCLOptions
- Build a DataProxy instance from command line options.
- DataProxy.getPathsFromCLOptions
- Get a dict of data-location paths from command line options.
- addRunCombsCLOptions
- Add the standard CL option for loading lists of run combinations.
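The flag-controlled option group can be illustrated with the standard optparse module. The sketch below is not the Professor code: the function name add_data_options and every option name (--datadir, --refdir, --mcdir, --ipoldir) are hypothetical and chosen only for illustration.

```python
from optparse import OptionGroup, OptionParser


def add_data_options(parser, ref=False, mc=False, ipol=False):
    """Add a data-location option group; only requested options are created."""
    group = OptionGroup(parser, "Data locations")
    # Hypothetical option names; a base directory is always offered
    group.add_option("--datadir", dest="datadir", default=None,
                     help="base directory for all data locations")
    if ref:
        group.add_option("--refdir", dest="refdir", default=None,
                         help="directory with reference data")
    if mc:
        group.add_option("--mcdir", dest="mcdir", default=None,
                         help="directory with MC runs")
    if ipol:
        group.add_option("--ipoldir", dest="ipoldir", default=None,
                         help="directory with interpolation files")
    parser.add_option_group(group)
    return group


parser = OptionParser()
add_data_options(parser, ref=True, mc=True)
opts, args = parser.parse_args(["--datadir", "mytune", "--mcdir", "scratch-mc"])
```

Options that were not requested do not exist on the parsed result at all, while requested but unset options default to None.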
-
professor.data.addRunCombsCLOptions(parser)
Add run combination options to command-line option parser.
See also
- addDataCLOptions
- Add standard CL options for data, MC, ipol, etc. directories.