
Support to Convert .fit Results to CSV (or any format)#393

Open
kevScheuer wants to merge 39 commits into master from csv_converter

@kevScheuer
Contributor

This request merges a script and a set of classes that allow any AmpTools-based analysis to convert its .fit results into a comma-separated value (CSV) file. Several plotters already exist for analyzing fit results per bin, and they are well suited to studying angular distributions, but mass-independent fits must "stitch" together their fit results to observe how the amplitudes and phases behave across mass bins. In addition, the hundreds of fit results produced by bootstrap or randomized fits have no standard way to be aggregated. This CSV converter is designed to fill that gap in the analysis process. Below is a short description of each component added.

convert_to_csv

This is the primary script that users will interact with. A user with several fit results (result_1.fit, result_2.fit, ...) can simply execute

user@ifarm:~$ convert_to_csv -i dir/result_*.fit

and a CSV will be produced in which each row corresponds to one .fit file and the columns hold the AmpTools fit outputs, parameters, intensities, and phase differences.

This CSV can then be read into a Python Pandas dataframe, a ROOT tree or dataframe, or practically any programming language, and then plotted. The script is designed to be as generic as possible, so that any AmpTools-based analysis can use it. Some notable features of the script:

  • Information about the data the fit was run on (number of events, t bin info, mass bin info, etc.) is extracted with the --data-file flag. It reads the data files associated with the result (with optional weights and/or background files) and writes the extracted info to a CSV file
    • Different reactions are supported via the --lower-vertex-indices flag. This tells the ROOTDataConverter which 4-vector indices correspond to the upper or lower vertex, allowing the mass and $-t$ info to be calculated correctly
      • By default, the lower vertex is assumed to be simply a recoil proton
  • Can produce covariance, correlation, and normalization integral matrices
  • Identifies coherent sums according to the amplitude naming scheme, which can be set explicitly via --naming-scheme
    • See AmplitudeParser for more details
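
As a rough illustration of how the resulting CSV might be consumed downstream, the sketch below parses a small table and orders the rows by mass bin so that per-bin results can be "stitched" into one curve. The column names (`file`, `m_center`, `intensity`, `intensity_err`) and the numbers are invented for the example and are not the converter's actual headers; `pandas.read_csv` would serve the same purpose as the standard-library reader used here.

```python
import csv
import io

# Hypothetical CSV content: column names and values are illustrative
# assumptions, not the converter's real output.
SAMPLE_CSV = """file,m_center,intensity,intensity_err
dir/result_2.fit,1.25,410.0,22.0
dir/result_1.fit,1.15,350.0,20.0
dir/result_3.fit,1.35,290.0,18.0
"""


def load_sorted_by_mass(text):
    """Parse CSV text and return the rows ordered by mass-bin center."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        # csv yields strings, so convert the numeric columns explicitly.
        for key in ("m_center", "intensity", "intensity_err"):
            row[key] = float(row[key])
    return sorted(rows, key=lambda r: r["m_center"])


rows = load_sorted_by_mass(SAMPLE_CSV)
for r in rows:
    print(f'{r["m_center"]:.2f}  {r["intensity"]:.1f} +/- {r["intensity_err"]:.1f}')
```

Sorting on a bin-center column means the order in which the shell glob expanded the .fit files does not matter.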

FitConverter

Handles the .fit -> .csv conversion. This class stores:

  • Standard fit outputs (likelihood, events, status codes)
  • Parameters
  • Production Coefficients
  • Intensities of unique amplitudes (see here for more explanation)
  • Coherent sums of amplitudes by quantum number (see AmplitudeParser below)
  • Phase Differences between amplitudes

Only .fit -> .csv conversion is currently supported, but the class can easily be extended to any desired file format: all results of interest are stored in maps, so writing to CSV amounts to iterating over those maps.
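
The "iterate over the maps" idea can be sketched in a few lines. This is not the FitConverter implementation (which is C++); it only shows the pattern, with each fit result modeled as a dictionary and the amplitude names `S0+` and `D1+` chosen arbitrarily for illustration.

```python
import csv
import io


def write_rows(results, out):
    """Write one CSV row per fit result; each result is a dict of named values."""
    # Build the header as the union of all keys in first-seen order, so
    # results with different amplitude sets still share one header.
    header = []
    for res in results:
        for key in res:
            if key not in header:
                header.append(key)
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(results)


# Hypothetical fit results (names and values are made up).
results = [
    {"file": "result_1.fit", "likelihood": -1234.5, "S0+": 350.0},
    {"file": "result_2.fit", "likelihood": -1250.1, "S0+": 410.0, "D1+": 55.0},
]
buf = io.StringIO()
write_rows(results, buf)
print(buf.getvalue())
```

Because the writer only sees key/value pairs, supporting another output format would mean swapping the writer while leaving the maps untouched.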

ROOTDataConverter

This class is responsible for extracting the PWA-related information from a ROOT file. It stores:

  • Bin edges, centers, averages, and RMS values for $-t$, beam energy, and upper vertex masses
  • Number of events and detector efficiency

As with the FitConverter, output formats beyond CSV can be added. To get the info, the class uses the data and Monte Carlo files associated with the fit. If available, it also properly incorporates event weights or background files. As discussed above, the user specifies the 4-vector indices used to calculate the mass and $-t$ info.
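
To make the role of the vertex indices concrete, here is a minimal sketch of the kinematics, assuming a target at rest, 4-vectors stored as `(E, px, py, pz)` tuples, and a list holding only final-state particles. The function names and the index convention are assumptions for this example and may differ from what ROOTDataConverter does internally.

```python
import math


def add4(*vectors):
    """Component-wise sum of 4-vectors given as (E, px, py, pz)."""
    return tuple(sum(c) for c in zip(*vectors))


def sub4(a, b):
    """Component-wise difference a - b of two 4-vectors."""
    return tuple(x - y for x, y in zip(a, b))


def mass2(p):
    """Invariant mass squared, E^2 - |p|^2."""
    e, px, py, pz = p
    return e * e - px * px - py * py - pz * pz


def upper_vertex_mass(final_state, lower_indices):
    """Invariant mass of all final-state particles NOT in the lower vertex."""
    upper = [p for i, p in enumerate(final_state) if i not in lower_indices]
    return math.sqrt(max(mass2(add4(*upper)), 0.0))


def minus_t(final_state, lower_indices, target):
    """-t = -(p_target - p_lower)^2, with p_lower the summed lower vertex."""
    lower = add4(*[final_state[i] for i in lower_indices])
    return -mass2(sub4(target, lower))


M_P = 0.938272  # proton mass in GeV
e_recoil = math.sqrt(M_P**2 + 0.3**2 + 0.2**2)
final_state = [
    (e_recoil, 0.3, 0.0, 0.2),  # index 0: recoil proton (lower vertex)
    (0.5, 0.0, 0.0, 0.5),       # indices 1, 2: massless upper-vertex particles
    (0.5, 0.0, 0.0, -0.5),
]
target = (M_P, 0.0, 0.0, 0.0)
print(upper_vertex_mass(final_state, [0]), minus_t(final_state, [0], target))
```

For a single recoil proton the result reduces to $-t = 2 m_p (E_{\text{recoil}} - m_p)$, which gives a quick sanity check on the sign convention.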

AmplitudeParser

This was the biggest hurdle in generalizing the converter. Often we are interested not just in the individual amplitudes and phases but in their (in)coherent sums, such as "total reflectivity contribution" or "behavior of JL waves summed over the spin-projections". The problem is that these sums are typically defined manually, because the amplitudes (and thus their quantum numbers) are user defined. The only way to group them is to recognize the naming scheme of the amplitudes, but not everyone uses the same scheme.

This class tries to identify the amplitude naming scheme used, and defines a set of possible sums based off the quantum numbers given in the scheme. It currently supports:

  • JLme - the current recommended generic format
  • eJPmL - used for some vector-pseudoscalar analyses
  • Lme - common scheme for two-pseudoscalar analyses

It can easily be extended to other schemes by users.
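
One way such scheme-based grouping could work is sketched below for an eJPmL-style name. The character encoding used here (reflectivity `p`/`m`, a single digit for J, parity `p`/`m`, projection `m`/`0`/`p`, and a spectroscopic letter for L) is an assumption made for the example, as are the function names; the actual AmplitudeParser may encode and parse names differently.

```python
import re
from collections import defaultdict

# ASSUMED encoding of an eJPmL name such as "p1ppS":
#   e = reflectivity (p or m), J = one digit, P = parity (p or m),
#   m = spin projection (m, 0 or p), L = orbital letter (S, P, D, F, G).
EJPML = re.compile(
    r"^(?P<e>[pm])(?P<J>\d)(?P<P>[pm])(?P<m>[m0p])(?P<L>[SPDFG])$"
)


def parse(name):
    """Return the quantum numbers of a name, or None if it does not match."""
    match = EJPML.match(name)
    return match.groupdict() if match else None


def group_by(names, keys):
    """Group amplitude names that share the quantum numbers listed in keys."""
    groups = defaultdict(list)
    for name in names:
        qn = parse(name)
        if qn is None:
            continue  # name does not follow the assumed scheme
        groups[tuple(qn[k] for k in keys)].append(name)
    return dict(groups)


names = ["p1ppS", "p1p0S", "p1pmS", "m1ppS", "p1mpP", "notAnAmp"]
print(group_by(names, ("J", "P", "L")))  # JPL waves summed over e and m
print(group_by(names, ("e",)))           # total per reflectivity
```

Choosing which quantum numbers to keep in the key determines which sum is formed, so one parser can serve every grouping of interest.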

Updates from previous version

For those using the older standalone version of this script shown in the last tutorial, it's worth listing some key differences:

  • Fast - pure C++, instead of the clunky Python -> subprocess -> ROOT interpreter chain used before
  • Easy Start - no need to set up Python environments and ROOT paths; everything is immediately available in halld_sim now
  • Generalized - data files don't have to be called separately, and all types of analyses should now be supported.

Commit messages

Converter directory now reflects that other data converters may be added in the future, not just CSV.

The parameters are now saved with their errors.

The verbose flag now controls the amount of output during processing.

Previously, amplitudes sharing a common amp name in "reaction::sum::ampName" format were required to be constrained to each other. Now the mapping is saved for unique amplitude groups, e.g. "ampName", "sum::ampName", or the full "reaction::sum::ampName" strings.

Files are accessed so many times that it makes more sense to keep them loaded, so file loading now happens in the constructor. A background-file bool was also added to track whether background files are present. A template for getting the -t values has been added but is not yet implemented; it will also affect how the other distributions are handled.

The largest addition is a function that extracts the values of interest for the beam energy, which incorporates signal and background subtraction. To help with this, a min/max finder function was added to find a common min/max value for a branch across files.

A few other report lines were added, along with some fixes needed to compile properly.

Uses an RDataFrame method to compute t from the various 4-vector component branches, then fills a histogram with the t values. If background files are present, a background histogram is also computed and subtracted from the data histogram before statistics are calculated.

Aside from this, small reports and comments were added.
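
The subtract-then-summarize step described above can be illustrated without ROOT. The sketch below is a plain-Python stand-in, not the RDataFrame code from the commit: it fills weighted histograms on shared bin edges, subtracts background from data bin by bin, and computes the mean and RMS from bin centers. "RMS" is taken here in the ROOT sense of the standard deviation about the mean, which is an assumption about the intended definition.

```python
import math


def fill_histogram(values, weights, edges):
    """Weighted histogram; a value v lands in bin i if edges[i] <= v < edges[i+1]."""
    counts = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += w
                break
    return counts


def subtract(data, background):
    """Bin-by-bin background subtraction."""
    return [d - b for d, b in zip(data, background)]


def mean_and_rms(counts, edges):
    """Mean and standard deviation computed from bin centers."""
    centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    total = sum(counts)
    if total <= 0:
        raise ValueError("no events left after background subtraction")
    mean = sum(c * n for c, n in zip(centers, counts)) / total
    var = sum(n * (c - mean) ** 2 for c, n in zip(centers, counts)) / total
    return mean, math.sqrt(max(var, 0.0))


edges = [0.0, 0.2, 0.4, 0.6]                       # hypothetical -t bin edges
data = fill_histogram([0.1, 0.1, 0.3, 0.3, 0.3, 0.5], [1.0] * 6, edges)
bkg = fill_histogram([0.1, 0.5], [1.0] * 2, edges)
signal = subtract(data, bkg)
mean, rms = mean_and_rms(signal, edges)
print(signal, mean, rms)
```

Using identical edges for both histograms is what makes the bin-by-bin subtraction meaningful, which is presumably why a common min/max across files is needed.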

Removed the mass-branch arg, as the mass can be calculated from the labeled 4-vectors. The indices can now be set by the user.

Aside from this, the files have been formatted.

Having the functions return the created histogram:
1. Makes the purpose of the function easier to understand, and doesn't hide the map filling in the implementation
2. Allows the hist to be printed for debugging purposes

Also added a function to return the total number of events and its error.
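
For reference, a common way to compute a background-subtracted event count and its uncertainty from weighted events is sketched below. It assumes the data and background samples are statistically independent and uses the usual sum-of-squared-weights error; whether the added function uses exactly this definition is not stated in the commit message, and the function name is hypothetical.

```python
import math


def total_events(data_weights, background_weights=()):
    """Return (N, error) where N = sum(w_data) - sum(w_bkg).

    The error is sqrt(sum(w_data^2) + sum(w_bkg^2)), i.e. the variances
    of the two weighted sums added in quadrature.
    """
    n = sum(data_weights) - sum(background_weights)
    variance = sum(w * w for w in data_weights) + sum(
        w * w for w in background_weights
    )
    return n, math.sqrt(variance)


# Hypothetical event weights.
n, err = total_events([1.0, 1.0, 1.0, 0.5], [0.5, 0.5])
print(n, err)
```

With unit weights and no background this reduces to the familiar $N \pm \sqrt{N}$.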

In order to save the coherent sums, a new AmplitudeParser class was created to parse the amplitude names and categorize them into groups based on the quantum numbers they contain. This relies on known "naming schemes" for the amplitudes. Currently the most common schemes are supported, with instructions for how to add new schemes.

@gluex
gluex commented Jun 3, 2026
