Discussion Overview
The discussion concerns how to define a similarity metric for comparing weighted distribution functions that represent deviations from unknown background distributions. Participants explore several mathematical approaches and consider how such metrics would apply to empirical data and simulations.
Discussion Character
- Exploratory
- Technical explanation
- Debate/contested
Main Points Raised
- Some participants suggest metrics such as the Kullback-Leibler divergence, the Earth mover's distance, or the Bhattacharyya distance for comparing distributions, but are unsure how to extend them to weighted distributions that represent deviations from an unknown background.
- Participants ask what makes a similarity measure "good," stressing that the answer depends on which decisions will be based on it.
- One participant asks what "distribution" means here, whether empirical data or a theoretical model, and points out that deviations are hard to measure without knowing the background distribution.
- Another participant describes a scenario involving empirical data and suggests measuring deviations from the sample mean if the background distribution is unknown.
- Concerns are raised about the clarity of terminology, particularly regarding the distinction between probability distributions and histograms of empirical data.
- Participants discuss the complexity of the data format, suggesting that each datum may consist of multiple properties, including spatial coordinates and velocities, along with weights that could be positive or negative.
- A 7-dimensional joint distribution is mentioned. Because data of this dimensionality are hard to process and interpret, the proposal is to aggregate them into spatially distributed, weighted 2D or 3D velocity distributions that are easier to comprehend.
- One participant expresses uncertainty about the concept of "locally" aggregating distributions and seeks clarification on the meaning of weighted versus unweighted distributions.
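To make the aggregation and comparison steps above concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not a method proposed in the discussion. It assumes a hypothetical data layout of an (N, 3) position array, an (N, 3) velocity array, and an (N,) array of signed weights; it reads "locally" as "inside one cubic spatial cell," which is only one possible interpretation. All function names are hypothetical. The sketch builds a weighted (vx, vy) histogram per cell on shared bin edges and then applies the three suggested metrics. The Kullback-Leibler and Bhattacharyya metrics are only defined for non-negative, normalised histograms, so bins with net negative weight make them inapplicable. The Earth mover's distance shown is the 1D version on a velocity marginal, not a full 2D transport distance.

```python
import numpy as np

# Hypothetical layout: pos and vel are (N, 3) arrays, w is an (N,) array of
# per-datum weights that may be positive or negative.

def local_velocity_histogram(pos, vel, w, center, half_width, vx_edges, vy_edges):
    """Weighted (vx, vy) histogram of the data inside one cubic spatial cell."""
    inside = np.all(np.abs(pos - np.asarray(center)) <= half_width, axis=1)
    hist, _, _ = np.histogram2d(
        vel[inside, 0], vel[inside, 1],
        bins=[vx_edges, vy_edges],  # shared edges so cells are comparable bin by bin
        weights=w[inside],          # signed weights are summed per bin
    )
    return hist

def to_probabilities(hist):
    """Normalise a histogram; rejects negative bins, which KL/Bhattacharyya cannot handle."""
    hist = np.asarray(hist, dtype=float)
    if np.any(hist < 0) or hist.sum() <= 0:
        raise ValueError("need a non-negative histogram with positive total weight")
    return hist / hist.sum()

def kl_divergence(p_hist, q_hist):
    """KL(p || q) over common bins; infinite if p has mass where q has none."""
    p, q = to_probabilities(p_hist), to_probabilities(q_hist)
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def bhattacharyya_distance(p_hist, q_hist):
    """-ln of the Bhattacharyya coefficient sum(sqrt(p * q))."""
    p, q = to_probabilities(p_hist), to_probabilities(q_hist)
    bc = np.sum(np.sqrt(p * q))
    return float("inf") if bc == 0 else float(-np.log(bc))

def emd_1d(p_hist, q_hist, bin_width):
    """1D Earth mover's distance on equally spaced common bins: sum |CDF_p - CDF_q| * width."""
    p, q = to_probabilities(p_hist), to_probabilities(q_hist)
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * bin_width)

if __name__ == "__main__":
    # Synthetic toy data for illustration only.
    rng = np.random.default_rng(0)
    n = 10_000
    pos = rng.uniform(0.0, 1.0, size=(n, 3))
    vel = rng.normal(0.0, 1.0, size=(n, 3))
    vel[:, 0] += pos[:, 0]               # toy drift in vx that varies with x
    w = rng.uniform(0.5, 1.5, size=n)    # positive weights, so every metric applies
    edges = np.linspace(-4.0, 5.0, 19)
    h_a = local_velocity_histogram(pos, vel, w, (0.25, 0.5, 0.5), 0.25, edges, edges)
    h_b = local_velocity_histogram(pos, vel, w, (0.75, 0.5, 0.5), 0.25, edges, edges)
    print("KL(a || b):", kl_divergence(h_a, h_b))
    print("Bhattacharyya:", bhattacharyya_distance(h_a, h_b))
    # histogram2d puts vx on axis 0, so summing axis 1 gives the vx marginal.
    print("EMD on vx marginal:", emd_1d(h_a.sum(axis=1), h_b.sum(axis=1), edges[1] - edges[0]))
```

If the weights can be negative, `to_probabilities` raises an error rather than silently producing a meaningless value. This reflects the open question in the discussion about what signed weights mean for a "distribution."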
Areas of Agreement / Disagreement
Participants do not reach a consensus on the best similarity metric or the definitions of key terms. Multiple competing views and uncertainties remain regarding the nature of the distributions and the appropriate statistical measures to use.
Contextual Notes
Limitations include the ambiguity in defining "good" similarity measures, the unclear distinction between types of distributions, and the challenges posed by the high dimensionality of the data. There are unresolved questions about the nature of weights in the distributions and the implications of merging distributions.