Comparing data sets of different size

  • Context: Graduate 
  • Thread starter: cepheid
  • Tags: data, data sets, sets

Discussion Overview

The discussion revolves around the comparison of two spectral data sets of the O-type star HD 93521, focusing on the challenges posed by differing data point densities and wavelength spacings. Participants explore methods for modifying the intrinsic spectrum to facilitate comparison with the observed spectrum, particularly through interpolation techniques.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes the need to compare an observed spectrum with a known intrinsic spectrum, noting the differences in data point density and spacing.
  • Concerns are raised about the validity of interpolating the intrinsic spectrum, with worries that this might introduce artifacts or misrepresent the actual flux values at certain wavelengths.
  • Another participant suggests that interpolation is a common and sensible approach for such comparisons, recommending linear interpolation to avoid artifacts associated with more complex methods like splines.
  • It is noted that while interpolation cannot guarantee the presence of information between measured points, it does not inherently add or remove information from the data set.
  • A participant emphasizes the importance of knowing the error bars for both data sets to accurately derive the probability range of calculated values.

Areas of Agreement / Disagreement

Participants generally agree that interpolation is a valid method for comparing the two spectra, although there are differing opinions on the implications of using interpolated data and the potential introduction of artifacts.

Contextual Notes

Participants express uncertainty about the effects of interpolation on spectral features and the reliability of the interpolated data points, highlighting the need for caution in interpreting results.

cepheid
I have two spectra that I'd like to compare. One is the observed spectrum of HD 93521, an O-type star commonly used as a spectrophotometric standard. The other is the known, intrinsic spectrum of this star. By dividing the observed spectrum by the intrinsic one, I will be able to deduce the combined atmospheric and instrumental response of the system on the night that the observing was done.

The problem I am having is more of a comp sci problem. The wavelength arrays for the two spectra have totally different spacings (bin sizes). The observational data span about 3111 to 5613 angstroms with 4175 evenly spaced data points. For the intrinsic spectrum, the wavelengths are NOT evenly spaced (because this is an amalgamation of multiple data sets for this star), and there are only 1919 data points in the wavelength range of interest.

What would be the best way of going about modifying the intrinsic spectrum so that it might be compared to the observed one? I could just interpolate, but then I am worried that I am basically just adding made up data points to the spectrum, and that I might destroy some features of the spectrum (and add some spurious other ones).
 
Okay, maybe some context is needed, since nobody has responded so far.

Here is a portion of the original spectrum: http://img2.imageshack.us/img2/2149/originalm.png

After I use IDL's INTERPOL function to interpolate between these data values so that I have fluxes corresponding to the OTHER wavelength array (the one with a larger number of data points of a different spacing), the result is as follows:

http://img190.imageshack.us/img190/8457/interpolated.png

(crosses are the interpolated data points).

Sure, it looks plausible, but I'm wondering how I could possibly use this spectrum for calibration? These extra data points are not measured, they are just interpolated from the original data. So there's nothing to say that they represent the actual flux values at those wavelengths. On the other hand, I can't think of any other obvious way to compare a spectrum with 1919 data points to one with 4175 data points at entirely different wavelength samples.
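
For illustration, the resampling and division described above can be sketched in plain Python (not the IDL actually used in this thread). The helper name `interp_linear` and every wavelength and flux value below are made up for the example. In practice, NumPy's `numpy.interp` performs the same linear interpolation.

```python
from bisect import bisect_right

def interp_linear(x_new, x, y):
    """Linearly interpolate the samples (x, y) onto x_new.

    x must be strictly increasing. Points outside [x[0], x[-1]] raise
    ValueError instead of being extrapolated.
    """
    out = []
    for xn in x_new:
        if xn < x[0] or xn > x[-1]:
            raise ValueError(f"{xn} lies outside the tabulated range")
        # Index of the left-hand neighbour, clamped so that i + 1 is valid
        i = min(bisect_right(x, xn) - 1, len(x) - 2)
        t = (xn - x[i]) / (x[i + 1] - x[i])
        out.append((1 - t) * y[i] + t * y[i + 1])
    return out

# Hypothetical toy data: an unevenly spaced "intrinsic" spectrum and an
# evenly spaced "observed" one on a different wavelength grid
wl_int = [3100.0, 3104.0, 3110.0, 3112.0, 3120.0]
fl_int = [10.0, 12.0, 9.0, 9.5, 11.0]
wl_obs = [3102.0, 3105.0, 3108.0, 3111.0, 3114.0]
fl_obs = [5.5, 5.75, 5.0, 4.625, 4.9375]

# Put the intrinsic fluxes on the observed wavelength grid, then divide
fl_int_on_obs = interp_linear(wl_obs, wl_int, fl_int)
response = [o / i for o, i in zip(fl_obs, fl_int_on_obs)]
```

In this toy case the observed fluxes were chosen to be half the interpolated intrinsic ones, so `response` comes out as 0.5 at every wavelength. Interpolating the coarser spectrum onto the finer grid is one choice; the reverse direction would work the same way.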
 
I don't know much about astronomy, but I do have some experience comparing data sets.

cepheid said:
So there's nothing to say that they represent the actual flux values at those wavelengths. On the other hand, I can't think of any other obvious way to compare a spectrum with 1919 data points to one with 4175 data points at entirely different wavelength samples.

Indeed, interpolation is as far as I know the only sensible method for problems of this type.
Just make sure you use a "safe" interpolation function that does not add any artifacts (e.g. splines should probably be avoided). Since you are only roughly doubling the number of points, you can probably just use linear interpolation. For the data you linked to, I wouldn't worry at all about interpolating to double the number of points.

Of course you can't be sure that there isn't any information "in between" the points; but that information isn't in the data set to start with (since it wasn't measured); by interpolating you are neither adding nor removing any information (but of course you need to keep in mind that the data has been processed).
Interpolation done right is really no different from e.g. filtering data (in the computer or during the measurement), averaging many data sets, smoothing or any of the other "tricks" we use to process data.
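
A small sketch of why linear interpolation counts as "safe" here. It uses a quadratic polynomial through three neighbouring points as a simpler stand-in for a spline (it is not a spline, but it overshoots in the same way next to a sharp feature). The function names and the three sample points are hypothetical.

```python
def linear(x, x0, y0, x1, y1):
    """Straight-line interpolation between (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)
    return (1 - t) * y0 + t * y1

def quadratic(x, pts):
    """Lagrange polynomial through three (x, y) points, evaluated at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(pts):
        w = 1.0
        for k, (xk, _) in enumerate(pts):
            if k != j:
                w *= (x - xk) / (xj - xk)
        total += w * yj
    return total

# Hypothetical samples: flat continuum (flux 1.0) dropping into an
# absorption feature (flux 0.0)
pts = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
x_mid = 0.5  # halfway between the two continuum samples

lin_val = linear(x_mid, *pts[0], *pts[1])   # stays on the continuum
quad_val = quadratic(x_mid, pts)            # bulges above the continuum
```

Here `lin_val` is 1.0, while `quad_val` is 1.125, which is higher than any measured flux. The linear result always lies between its two neighbouring samples, so it cannot create a bump like this.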
 
f95toli said:
I don't know much about astronomy, but I do have some experience comparing data sets.

Of course you can't be sure that there isn't any information "in between" the points; but that information isn't in the data set to start with (since it wasn't measured); by interpolating you are neither adding nor removing any information (but of course you need to keep in mind that the data has been processed).
Interpolation done right is really no different from e.g. filtering data (in the computer or during the measurement), averaging many data sets, smoothing or any of the other "tricks" we use to process data.

These comments are really helpful, thank you! I think that IDL's "INTERPOL" function defaults to linear interpolation unless a keyword is specified amongst the arguments to use quadratic, least squares quadratic, or something called cubic spline. I did not set such a keyword, so I am guessing that linear interpolation was used. I can certainly see what you mean about how this method does not add or subtract any spectral features; it merely smooths out any that might have been there but were undetectable at the spectral resolution of the instrument.
 
You need to know the error bars of both data sets to derive the probability range of calculated values. I agree with f95toli that interpolation is valid.
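
One way the error bars could be carried through, sketched in Python under stated assumptions: the flux errors are independent and Gaussian, and first-order (linearised) error propagation is adequate. The function names and all numbers are hypothetical. Neighbouring interpolated points share input samples, so their errors are correlated; this sketch ignores that.

```python
import math

def interp_sigma(t, s0, s1):
    """1-sigma error of the linear interpolant (1 - t) * f0 + t * f1,
    given independent errors s0 and s1 on the two neighbouring fluxes."""
    return math.hypot((1 - t) * s0, t * s1)

def ratio_with_sigma(obs, s_obs, intr, s_intr):
    """Ratio obs / intr and its first-order 1-sigma error, assuming the
    two fluxes have independent errors."""
    r = obs / intr
    return r, abs(r) * math.hypot(s_obs / obs, s_intr / intr)

# Hypothetical numbers: interpolate halfway between two intrinsic samples
s_int = interp_sigma(0.5, 0.3, 0.4)

# Hypothetical observed and (interpolated) intrinsic fluxes with errors
r, s_r = ratio_with_sigma(5.0, 0.3, 10.0, 0.8)
```

With these inputs `s_int` is 0.25, and the response is 0.5 with an error of 0.05. The interpolated point has a smaller error than either neighbour because it is a weighted average of the two.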
 
