Similarity between -/+ weighted distributions

Jarvis323 · Jul 11, 2018

Suppose that you have +/- elements aggregated into a weighted distribution function that represents some deviation from an unknown background distribution.

What would be a good similarity metric for comparing two such distributions (2D or 3D), if they each represent different perturbations from different backgrounds?

In a simple case where the distribution is known, I was looking into Kullback-Leibler divergence, Earth mover's distance, or Bhattacharyya distance, but I would first need to consider how to properly extend them to handle this problem (if that even makes sense).
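For reference, here is a minimal Python/NumPy sketch of two of the metrics named above, written for the ordinary case of non-negative histograms on a shared velocity grid. The function names are hypothetical. Both metrics are only defined once each histogram is normalized to a probability distribution, so the sketch rejects negative bins. That check is exactly where signed weights break them. Earth mover's distance needs an optimal-transport solver and is left out.

```python
import numpy as np

def _normalize(h):
    """Reject negative bins and scale a histogram so its entries sum to 1."""
    h = np.asarray(h, dtype=float)
    if np.any(h < 0):
        raise ValueError("metric requires non-negative bin values")
    total = h.sum()
    if total <= 0:
        raise ValueError("histogram has no mass")
    return h / total

def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(p || q); eps guards empty bins of q."""
    p, q = _normalize(p), _normalize(q)
    mask = p > 0  # bins where p == 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

def bhattacharyya_distance(p, q):
    """Minus the log of the Bhattacharyya coefficient sum(sqrt(p * q))."""
    p, q = _normalize(p), _normalize(q)
    return float(-np.log(np.sum(np.sqrt(p * q))))
```

Note that KL divergence is not symmetric, so for a merge decision one might use the sum D(p||q) + D(q||p) instead.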

Alternatively, I could just integrate their difference.
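Here is a minimal sketch of "integrating their difference" for two histograms sampled on the same bins. If you integrate the plain signed difference, positive and negative regions cancel. The sketch therefore integrates the absolute difference (L1) or, with `p=2`, the squared difference (L2). The function name and the assumption of uniform bin size are illustrative.

```python
import numpy as np

def integrated_difference(h1, h2, bin_area, p=1):
    """Approximate (integral of |f1 - f2|^p dv)^(1/p) on a shared grid.

    h1, h2: bin values (densities, possibly signed) on identical bins.
    bin_area: area (2D) or volume (3D) of one velocity bin, assumed uniform.
    """
    diff = np.abs(np.asarray(h1, dtype=float) - np.asarray(h2, dtype=float))
    return float((np.sum(diff ** p) * bin_area) ** (1.0 / p))
```

This works directly on signed weights, but it is sensitive to overall scale. Two bins with the same shape and different amplitudes will come out as dissimilar.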

What do you all think?

Stephen Tashi · Jul 12, 2018

Jarvis323 said:

What would be a good similarity metric for comparing two such distributions (2D or 3D), if they each represent different perturbations from different backgrounds?

You must define what you mean by "good" in order to pose a mathematical problem. If you can't articulate a mathematical definition of "good", perhaps you can explain "good" in the full context of a real-life problem. It is meaningless to speak of "good" or "bad" statistical measures unless it is clear what decisions will be made on the basis of those measures. Can you give an example of what decisions will be made using a similarity measure?

Can you give an example of two different distributions that you want the similarity measure to say are "similar"?

In a simple case where the distribution is known

It isn't clear whether you are using "distribution" to describe empirical data or whether you use "distribution" to mean an assumed model for the data.

Suppose that you have +/- elements aggregated into a weighted distribution function that represents some deviation from an unknown background distribution.

It's hard to imagine such a situation. One way to think about it is that my friend Eddie records deviations from a background distribution that he knows. Then Eddie gives me the data without telling me the background distribution. Unless I imagine that somebody knows the background distribution, I don't understand how anyone can measure deviations from it.

If all I have is empirical data then I can measure the deviation of each datum from the sample mean of the data. Is that what you mean by deviations from an unknown background distribution?

Jarvis323 · Jul 12, 2018

Maybe I can describe the problem more precisely and clarify it. The purpose of the similarity measure is to decide whether two distributions are close enough that they can be merged (aggregated into one distribution).

The distributions differ essentially in that they are conditioned differently. The full joint distribution is 7-dimensional. The 7th dimension is a weight that can be negative or positive, which is used in a computer simulation to perturb an evolving background distribution. Storing and processing the full 7D joint distribution, or the evolving background, is difficult because of the scale. An analysis is done on the data by binning the particles post hoc into spatial bins, then binning them into weighted 2D or 3D velocity distributions. We want to further locally aggregate these spatially distributed, weighted 2D velocity distributions into an overview that a human being can comprehend. At a minimum, the shape and scale should be preserved.

Stephen Tashi · Jul 12, 2018

Jarvis323 said:

The purpose of the similarity measure is to decide whether two distributions are close enough that they can be merged (aggregated into one distribution).

Your use of the word "distribution" is unclear. There are distributions in the sense of probability distributions. There are "distributions" in the sense of histograms of empirical data. My guess is that you are talking about histograms of empirical data. My guess is that by "aggregating" them, you mean taking two histograms of empirical data and combining them into one histogram. It isn't clear what criteria you want to use to decide that two histograms should be aggregated into one histogram. My guess is that you want a statistic to use in a hypothesis test about whether the two histograms come from the same probability distribution.
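If the hypothesis-test reading is the right one, a common choice for raw counts is a chi-square test of homogeneity. The sketch below uses SciPy's `chi2_contingency` and assumes two unweighted count histograms on identical bins. The function name is hypothetical. The test assumes non-negative counts, so it does not apply directly to signed weights.

```python
import numpy as np
from scipy.stats import chi2_contingency

def same_distribution_test(counts1, counts2):
    """Chi-square test of homogeneity for two count histograms on identical bins.

    Returns the p-value; a small value is evidence that the two samples
    were not drawn from the same distribution.
    """
    c1 = np.asarray(counts1).ravel()
    c2 = np.asarray(counts2).ravel()
    keep = (c1 + c2) > 0  # drop bins that are empty in both histograms
    table = np.vstack([c1[keep], c2[keep]])
    chi2, p_value, dof, expected = chi2_contingency(table)
    return float(p_value)
```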

The distributions are different essentially in that they are conditioned differently. The full joint distribution is 7 dimensional. The 7th dimension is a weight that can be negative or positive, which is used in a computer simulation in order to perturb an evolving background distribution.

In your field of study, the term "background distribution" may be well known. However, just from the terminology of mathematical statistics it isn't clear whether you are talking about a probability distribution or something computed from empirical data. One possibility is that as time passes, more data is collected and the estimate for some probability distribution is updated. Another possibility is that the computer program uses a changing background distribution specified by a theory and not updated by the data.

Storing and processing the full 7D joint distribution, or the evolving background is difficult because of the scale. An analysis is done on the data by binning the particles post-hoc into spatial bins, then binning them into weighted 2D or 3D velocity distributions.

I don't know what a "weighted" probability distribution would be versus an "unweighted" one. Can you give an example of each?

We want to further locally aggregate these spatially distributed, weighted 2D velocity distributions to show an overview which can be comprehended by a human being. The shape and scale should at least be preserved.

I don't understand what you mean by "locally" aggregate.

The first step in describing a problem in statistics is to state the format of the data and explain what it means. I'll guess the format and you can comment on it.

Each datum is a vector of 7 numbers. Each number quantifies a different property of something. Since you speak of "spatial binning" and "velocities", I'll assume the data has the format (x,y,z,vx,vy,vz,w) where the (x,y,z) is a spatial location and the v's are velocities. (It isn't clear what the "weight" w is. Presumably not "mass".)

Jarvis323 · Jul 12, 2018

I think it's quite complicated to explain all of the details (which I only partially understand myself). Maybe my previous explanation wasn't quite right. Also, I was trying to generalize enough that I might get some suggestions that I could evaluate more deeply on my own.

There is a particle distribution function, ##f(x,y,z,v_x,v_y,v_z,t)##.
https://en.wikipedia.org/wiki/Distribution_function

There are marker particles moving around in ##(x,y,z,v_x,v_y,v_z)## that carry time-varying weights. Their weights represent a change in the number of real particles near ##(x,y,z,v_x,v_y,v_z)## between time steps. In other words, they evolve the particle distribution function.

The data I have is just a large sample of the particles with their positions, velocities and weights. First we are binning them spatially, then binning their weights in ##(v_x,v_y,v_z)##. So each spatial bin has a weighted velocity histogram (although we could also use a GMM or something like that). These "distributions" represent an approximation of how many particles are leaving or entering the respective regions of phase-space, normalized by phase-space volume (change in particles per unit phase-space volume, by position and velocity).
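Here is a minimal sketch of the binning described above. It assumes the particle data is an (N, 7) array laid out as (x, y, z, vx, vy, vz, w), matching the format guessed earlier in the thread, and it sums out vz to give 2D velocity histograms. The column layout, function name and grid arguments are hypothetical. Each histogram is divided by the spatial cell volume times the velocity cell area, which is the size of the phase-space cell once vz has been integrated out.

```python
import numpy as np

def binned_velocity_histograms(particles, spatial_edges, v_edges):
    """Weighted (v_x, v_y) histogram per spatial bin, normalized by cell size.

    particles: array of shape (N, 7); hypothetical columns x, y, z, vx, vy, vz, w.
    spatial_edges: (x_edges, y_edges, z_edges) of the spatial grid.
    v_edges: (vx_edges, vy_edges) of the velocity grid; vz is summed over.
    Returns shape (nx, ny, nz, nvx, nvy); signed weights are kept as-is.
    """
    particles = np.asarray(particles, dtype=float)
    x_e, y_e, z_e = (np.asarray(e, dtype=float) for e in spatial_edges)
    vx_e, vy_e = (np.asarray(e, dtype=float) for e in v_edges)
    pos, vel, w = particles[:, 0:3], particles[:, 3:5], particles[:, 6]

    # Half-open spatial bin index per axis; out-of-range particles are dropped.
    shape = (len(x_e) - 1, len(y_e) - 1, len(z_e) - 1)
    idx = [np.searchsorted(e, pos[:, d], side="right") - 1
           for d, e in enumerate((x_e, y_e, z_e))]
    inside = np.all([(i >= 0) & (i < n) for i, n in zip(idx, shape)], axis=0)

    out = np.zeros(shape + (len(vx_e) - 1, len(vy_e) - 1))
    dx, dy, dz = np.diff(x_e), np.diff(y_e), np.diff(z_e)
    dv_area = np.outer(np.diff(vx_e), np.diff(vy_e))  # area of each (vx, vy) cell
    for i, j, k in set(zip(*(a[inside] for a in idx))):
        sel = inside & (idx[0] == i) & (idx[1] == j) & (idx[2] == k)
        h, _, _ = np.histogram2d(vel[sel, 0], vel[sel, 1],
                                 bins=[vx_e, vy_e], weights=w[sel])
        # Summed weight divided by spatial volume times velocity area.
        out[i, j, k] = h / (dx[i] * dy[j] * dz[k] * dv_area)
    return out
```

The per-bin loop is simple rather than fast. For large samples, a single `np.histogramdd` call over the five binned coordinates with `weights=` would avoid it.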

We want to merge similar spatial bins (to represent the merged bins instead with one histogram). So by locally merging, I mean merging adjacent spatial bins. I'm not completely clear on all of the insights that people will try to extract from the result, except that the shape and features are meaningful for studying the turbulence, self-organization phenomena, and for understanding and diagnosing issues regarding the degradation of simulation accuracy and performance.
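One simple way to merge adjacent spatial bins can be sketched under these assumptions:

- equal-sized cells on a 2D spatial grid
- the integrated L1 difference as the similarity measure
- a hypothetical threshold `tol`

Neighbouring bins whose difference falls below the threshold are joined with a union-find structure. Each merged region is then represented by the mean of its histograms. All names are illustrative.

```python
import numpy as np

def merge_similar_neighbors(hists, tol, bin_area=1.0):
    """Group adjacent spatial bins whose velocity histograms differ by less than tol.

    hists: array (nx, ny, nvx, nvy) of per-spatial-bin histograms (equal cells assumed).
    Returns (labels, merged): labels[i, j] is a region id, merged[r] is the
    mean histogram of region r.
    """
    nx, ny = hists.shape[:2]
    parent = list(range(nx * ny))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def dist(a, b):
        return np.sum(np.abs(a - b)) * bin_area  # integrated L1 difference

    for i in range(nx):
        for j in range(ny):
            for di, dj in ((1, 0), (0, 1)):  # each neighbour pair checked once
                ni, nj = i + di, j + dj
                if ni < nx and nj < ny and dist(hists[i, j], hists[ni, nj]) < tol:
                    ra, rb = find(i * ny + j), find(ni * ny + nj)
                    if ra != rb:
                        parent[rb] = ra
    roots = np.array([find(k) for k in range(nx * ny)])
    _, labels = np.unique(roots, return_inverse=True)
    labels = labels.reshape(nx, ny)
    merged = np.array([hists[labels == r].mean(axis=0) for r in range(labels.max() + 1)])
    return labels, merged
```

Each comparison uses the original, unmerged histograms. As a result, a chain of small differences can join bins whose shapes end up far apart. Re-checking each region against its merged histogram would be one way to guard against that.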

Stephen Tashi · Jul 13, 2018

You have greatly clarified the situation. In particular the link to the physical definition of a distribution function is helpful. However, applying statistics to a situation in a logical manner requires applying probability theory. In your data, it is not yet clear how probability enters the picture - if it does at all.

The physical "distribution function" defined in https://en.wikipedia.org/wiki/Distribution_function "gives the number of particles per unit volume in single-particle phase space." That definition says nothing about a probability. If we were to interpret "the number of particles" to mean "the average number of particles" we would give ourselves the option of introducing probability.

I don't yet understand if there is anything probabilistic about the behavior of the marker particles. If we are dealing with data from a deterministic process then defining a measure of whether two parts of it are "similar" is distinct from a concept of "similar" based on the idea that two random samples of data are "similar" if they are generated by the same underlying probability distribution.

You mentioned measuring the difference between two functions by least squares.

Alternatively, I could just integrate their difference.

That idea could be embellished. For example, if ##f_1(v_x,v_y)## and ##f_2(v_x,v_y)## are two functions, you could find the translation and rotation of one of them that make their least squares difference as small as possible, and define that minimum least squares difference to be the measure of similarity. Whether that makes sense depends on whether rotation and translation have some useful interpretation in the physics of the situation.
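Here is a brute-force sketch of the rotate-and-translate idea. It assumes ##f_1## and ##f_2## are sampled on the same square ##(v_x,v_y)## grid and that a coarse search over candidate angles and shifts is acceptable. It uses `scipy.ndimage.rotate` and `scipy.ndimage.shift` with linear interpolation. The function name and the search grids are hypothetical.

```python
import numpy as np
from scipy import ndimage

def aligned_l2_distance(f1, f2, angles_deg, shifts):
    """Smallest sum of squared differences between f1 and rotated/shifted copies of f2.

    f1, f2: 2D arrays on the same grid.
    angles_deg: candidate rotation angles in degrees.
    shifts: candidate (row, col) shifts in grid cells.
    Returns (best_value, best_angle, best_shift).
    """
    best = (np.inf, None, None)
    for ang in angles_deg:
        # Rotate about the array centre, keep the grid shape, fill outside with 0.
        r = ndimage.rotate(f2, ang, reshape=False, order=1, mode="constant", cval=0.0)
        for s in shifts:
            t = ndimage.shift(r, s, order=1, mode="constant", cval=0.0)
            val = float(np.sum((f1 - t) ** 2))
            if val < best[0]:
                best = (val, ang, tuple(s))
    return best
```

A continuous optimizer, such as `scipy.optimize.minimize` over the angle and shift, could refine the best grid point. Whether the minimizing rotation and translation mean anything physically is still the separate question raised above.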
