Summing simple histograms to recreate a more complex one

SUMMARY

The discussion centers on the challenge of reconstructing a complex histogram from a database of simpler histograms, specifically in the context of recreating a perfume's spectral profile. The user seeks to iteratively sum histograms of ingredients like Vanilla and Jasmine until the original histogram is closely approximated. This problem is identified as a linear programming challenge, akin to the animal feed mixing problem, where the goal is to find a cost-effective combination of ingredients that meets specific criteria. Key mathematical concepts include linear combinations and the potential need for additional constraints to ensure a unique solution.

PREREQUISITES
  • Understanding of linear programming techniques
  • Familiarity with histogram data representation
  • Knowledge of spectral analysis in chemistry
  • Basic principles of optimization problems
NEXT STEPS
  • Research linear programming methods for optimization problems
  • Explore techniques for histogram summation and reconstruction
  • Study the application of mixture models in statistical analysis
  • Investigate the mathematical representation of probability distributions
USEFUL FOR

Chemists, data scientists, and optimization specialists seeking to solve complex reconstruction problems involving histograms and spectral data.

Master Sidoshi
I wouldn't be surprised if I've posted in the wrong section because in fact the reason for posting is to get help naming this problem. That being the first step to knowing where to look for a solution. Newbie to the forum so open to advice.

The problem: I have a complex histogram and a database of 600 or so less complex histograms. I know that all the 'ingredients' (less complex histograms) I need to recreate the complex one are in my database. Some histograms in my db have unique X/Y data pairs ('bars'), so they definitely belong in my 'reconstructed' histogram; the other bars or X/Y data pairs will have to be recreated by summing various 'ingredients' from my database. I need to iteratively sum histograms from my database until I have recreated the original histogram as closely as possible. What class of problem is this?

Real world application: I have a perfume and I want to recreate it. In the laboratory I acquire a spectrum of the perfume (wavelength vs. intensity, chemical shift vs. intensity, m/z vs. intensity, etc.; basically x/y data points or histograms). In my database I have spectra (histograms) of Sandalwood Extract, Vanilla, Jasmine, Patchouli, Neroli, etc. I want to sum the Vanilla, Jasmine, etc. 'histograms' in various combinations/iterations until I've recreated the original perfume histogram.

NB: The Vanilla, Jasmine and other 'ingredient' histograms will each have multiple 'peaks' or 'bars', say a dozen, which cannot vary in relative intensity (y-axis); those relative values are fixed. Lemon oil and orange oil will have some x-axis values that overlap, so their intensities (y values) will sum at those x values if both are used in the final solution. The final solution is a histogram with hundreds of peaks.

I don't even know where to begin looking for a solution as I don't know what the problem is called. The best tags I could come up with were 'iterative' and 'optimization'.
 
This is very much like the animal feed mixing problem, where the relative proportions of numerous ingredients with known nutrition component profiles and costs are set so that the resultant feed stock has the required overall nutrition component profile and the cost of manufacture is minimised.

The actual optimisation calculations are done on a computer using linear programming methods.
 
Master Sidoshi said:
I want to sum the Vanilla, Jasmine, etc. 'histograms' in various combinations/iterations until I've recreated the original perfume histogram.

A tempting way to model your problem is to say it amounts to expressing a vector as a linear combination of other vectors, where all the coefficients in the linear combination are non-negative numbers representing the amount of each vector that is used in the combination.
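As a rough illustration of that model, here is a minimal sketch that fits non-negative coefficients by projected gradient descent on the squared reconstruction error. The ingredient names, the five-bin histograms and the function name `fit_nonnegative` are all made up for the example, and projected gradient descent is just one simple way to solve a non-negative least-squares problem, not necessarily the best one for 600 ingredients and hundreds of peaks.

```python
# Hypothetical toy data: every histogram uses the same 5 x-axis bins.
# The relative peak heights inside each ingredient are fixed; only the
# overall amount of each ingredient is allowed to vary.
ingredients = {
    "vanilla": [4.0, 1.0, 0.0, 0.0, 0.0],
    "lemon":   [0.0, 2.0, 3.0, 0.0, 0.0],
    "orange":  [0.0, 0.0, 3.0, 1.0, 2.0],
}

# Target built as 2.0 * vanilla + 0.5 * lemon (no orange), so the
# correct answer is known in advance for this illustration.
target = [8.0, 3.0, 1.5, 0.0, 0.0]


def fit_nonnegative(ingredients, target, iterations=5000):
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0.

    A has one column per ingredient histogram, b is the target histogram,
    and x holds the amount of each ingredient.
    """
    names = list(ingredients)
    columns = [ingredients[name] for name in names]

    # The sum of all squared entries bounds the largest eigenvalue of
    # A^T A, so its reciprocal is a conservative, stable step size.
    step = 1.0 / sum(v * v for col in columns for v in col)

    coeffs = [0.0] * len(columns)
    for _ in range(iterations):
        # Current mixture and its difference from the target, bin by bin.
        mix = [
            sum(c * col[i] for c, col in zip(coeffs, columns))
            for i in range(len(target))
        ]
        residual = [m - t for m, t in zip(mix, target)]

        # Gradient for each coefficient is (ingredient column) . residual.
        grads = [sum(v * r for v, r in zip(col, residual)) for col in columns]

        # Gradient step, then clip negatives back to zero (the projection).
        coeffs = [max(0.0, c - step * g) for c, g in zip(coeffs, grads)]

    return dict(zip(names, coeffs))


amounts = fit_nonnegative(ingredients, target)
for name, amount in amounts.items():
    print(f"{name}: {amount:.4f}")
```

Note that the lemon and orange histograms overlap in the third bin, which is the situation described for lemon oil and orange oil. The fit still separates them here because each also has peaks the other lacks.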

In many real-life problems of this nature, there is no unique solution unless you add other requirements. For example, if you assign a cost to each ingredient, you could ask for the least costly combination of the ingredients that produces the desired final histogram. That would cast the problem as a "linear programming" problem.
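Written out, the cost-minimising version might look like the following. The symbols are introduced here for illustration only: column j of A is the histogram of ingredient j, b is the target histogram, x_j is the amount of ingredient j and c_j is its assumed cost per unit.

```latex
\[
\begin{aligned}
\text{minimise}   \quad & c^{\mathsf{T}} x = \sum_{j} c_j x_j \\
\text{subject to} \quad & A x = b, \\
                        & x_j \ge 0 \quad \text{for every ingredient } j .
\end{aligned}
\]
```

With measured spectra the equality will rarely hold exactly, so in practice one would probably relax it to a tolerance band such as \( b - \varepsilon \le A x \le b + \varepsilon \), which is still a linear program.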

If you don't have a simple function (such as total cost) to minimize, then you need to decide what to do if there are many possible ways of achieving the desired total histogram. Do you have other criteria that would make one solution more plausible or desirable than another? You might get some hints from the mathematics used in the statistical problem of representing a probability distribution as a "mixture" of other distributions.

The problem is more complicated and more interesting if you have histograms of different physical properties, i.e. if you have N histograms of different physical quantities, giving you N total histograms for the unknown substance and N histograms for each of the known substances.
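One possible way to write that down, offered only as a sketch, is to require the same ingredient amounts x to explain all N measurements at once. Here A^{(k)} holds the ingredient histograms for physical quantity k, b^{(k)} is the unknown substance's histogram for that quantity, and the weights w_k are a hypothetical device for balancing measurements that have different units or noise levels.

```latex
\[
\min_{x \ge 0} \; \sum_{k=1}^{N} w_k \,
\bigl\lVert A^{(k)} x - b^{(k)} \bigr\rVert_2^{2}
\]
```

This assumes that every measurement responds linearly to the same amounts x, which is itself something that would need checking.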

You must also consider whether the physical properties are actually additive. For example, light emitted when the atoms of one compound are excited might be absorbed by another compound in solution with it.
 
