Summing simple histograms to recreate a more complex one

Master Sidoshi · Jul 19, 2017

I wouldn't be surprised if I've posted in the wrong section, because my reason for posting is to get help naming this problem, that being the first step to knowing where to look for a solution. I'm new to the forum, so I'm open to advice.

The problem: I have a complex histogram and a database of 600 or so less complex histograms. I know that all the 'ingredients' (less complex histograms) I need to recreate the complex one are in my database. Some histograms in my db have unique X/Y data pairs ('bars'), so they definitely belong in my 'reconstructed' histogram. The other bars will have to be recreated by summing various 'ingredients' from my database. I need to iteratively sum histograms from my database until I recreate the original histogram as closely as possible. What class of problem is this?

Real world application: I have a perfume and I want to recreate it. In the laboratory I acquire a spectrum of the perfume (wavelength vs. intensity, chemical shift vs. intensity, m/z vs. intensity, etc.; basically x/y data points or histograms). In my database I have spectra (histograms) of Sandalwood Extract, Vanilla, Jasmine, Patchouli, Neroli, etc. I want to sum the Vanilla, Jasmine, etc. 'histograms' in various combinations/iterations until I've recreated the original perfume histogram.

NB: The Vanilla, Jasmine and other 'ingredient' histograms will each have multiple 'peaks' or 'bars', say a dozen, which cannot vary in relative intensity (y-axis); those relative values are fixed. Lemon oil and orange oil will have some x-axis values that overlap, so their intensities (y values) at those x values will sum if both are used in the final solution. The final solution is a histogram with hundreds of peaks.

I don't even know where to begin looking for a solution as I don't know what the problem is called. The best tags I could come up with were 'iterative' and 'optimization'.

Nidum · Jul 19, 2017

This is very much like the animal feed mixing problem, where the relative proportions of numerous ingredients with known nutrition component profiles and costs are set so that the resultant feed stock has the required overall nutrition component profile and the cost of manufacture is minimised.

The actual optimisation calculations are done on a computer using linear programming methods.

Stephen Tashi · Jul 19, 2017

Master Sidoshi said:

I want to sum the Vanilla, Jasmine, etc. 'histograms' in various combinations/iterations until I've recreated the original perfume histogram.

A tempting way to model your problem is to say it amounts to finding a way to express a vector as a linear combination of other vectors, where all the coefficients in the linear combination are non-negative numbers representing the amount of each vector that is used in the combination.
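To make the vector picture concrete, here is a minimal sketch in Python. It assumes every histogram has been resampled onto one shared x-axis grid so that each ingredient becomes a column of a matrix, and it uses non-negative least squares (`scipy.optimize.nnls`) as one possible way to find the coefficients. The numbers are invented toy data, not real spectra, and the names `ingredients`, `true_amounts` and `target` are just for illustration.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy data: each column is one ingredient histogram on a shared
# 5-bin x-axis. Relative bar heights within a column are fixed.
ingredients = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.2, 0.3],
])

# Build a made-up "perfume" from known amounts so the result can be checked.
true_amounts = np.array([2.0, 0.0, 1.5])
target = ingredients @ true_amounts

# Solve: minimise ||ingredients @ x - target||_2 subject to x >= 0.
amounts, residual_norm = nnls(ingredients, target)

print("estimated amounts:", amounts)
print("residual norm:", residual_norm)
```

With measured data the residual will not be zero, so its size gives a rough indication of how well the chosen ingredients explain the target histogram.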

In many real-life problems of this nature, there is no unique solution unless you add other requirements. For example, if you assign a cost to each ingredient, you could ask for the least costly combination of the ingredients that produces the desired final histogram. That would cast the problem as a "linear programming" problem.
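As a small illustration of the linear programming version, the sketch below uses `scipy.optimize.linprog`. Everything in it is an assumption made up for the example: four ingredients on a 3-bin axis, where the fourth is a hypothetical pre-mixed blend equal to the sum of the first two, plus invented costs. It also demands an exact match (equality constraints), which noisy measured data would rarely allow; in practice you would probably need tolerances instead.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical ingredient histograms as columns (3 bins, 4 ingredients).
# Ingredient 4 equals ingredient 1 + ingredient 2, so recipes are not unique.
A = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
target = np.array([2.0, 2.0, 1.0])

# Invented cost per unit of each ingredient.
cost = np.array([3.0, 4.0, 1.0, 5.0])

# Minimise cost @ x subject to A @ x == target and x >= 0.
result = linprog(cost, A_eq=A, b_eq=target, bounds=(0, None), method="highs")

print("success:", result.success)
print("cheapest recipe:", result.x)
print("total cost:", result.fun)
```

Here any recipe of the form (2 - t, 2 - t, 1, t) with t between 0 and 2 reproduces the target exactly, and the cost function is what singles one of them out.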

If you don't have a simple function (such as total cost) to minimize, then you need to decide what to do if there are many possible ways of achieving the desired total histogram. Do you have other criteria that would make one solution more plausible or desirable than another? You might get some hints from the mathematics used in the statistical problem of representing a probability distribution as a "mixture" of other distributions.
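One rough way to see whether you are in the "many possible ways" situation is to check the rank of the matrix whose columns are the ingredient histograms. The sketch below uses invented toy data and hypothetical names (`A`, `recipe_a`, `recipe_b`). A rank smaller than the number of ingredients means some ingredient can be built from the others, so different recipes can give the same total histogram.

```python
import numpy as np

# Hypothetical ingredient histograms as columns (3 bins, 4 ingredients).
# The fourth column is the sum of the first two.
A = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

rank = np.linalg.matrix_rank(A)
n_ingredients = A.shape[1]
print("rank:", rank, "ingredients:", n_ingredients)

# Two different non-negative recipes...
recipe_a = np.array([2.0, 2.0, 1.0, 0.0])
recipe_b = np.array([0.0, 0.0, 1.0, 2.0])

# ...that produce exactly the same total histogram.
same_histogram = np.allclose(A @ recipe_a, A @ recipe_b)
print("same histogram:", same_histogram)
```

With 600 ingredients and only a few hundred bins, the rank is necessarily smaller than the number of ingredients, so some extra criterion of the kind discussed above would be needed.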

The problem is more complicated and more interesting if you have histograms of different physical properties, i.e. if you measure N different physical quantities, giving you N total histograms for the unknown substance and N histograms for each of the known substances.

You must also consider whether the physical properties are actually additive. For example, light emitted when the atoms of one compound are excited might be absorbed by another compound in solution with it.
