Comparing data sets of different sizes

Discussion Overview

The discussion revolves around the challenge of comparing two data sets with different sizes and spacing of x values, specifically in the context of astronomical spectra. Participants explore methods for modifying one data set to facilitate comparison with another, while expressing concerns about the implications of interpolation.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant expresses concern about interpolating the second data set, fearing it may introduce spurious data points or obscure important features.
  • Another participant suggests that the relevance of the x array positions needs to be considered, indicating that zero-padding could be a potential approach, but it depends on the importance of data order.
  • There is a proposal to sample the larger x array to match the size of the smaller one, but this is met with skepticism regarding its usefulness if the wavelengths do not match exactly.
  • A participant raises questions about the filtering methods used during data collection, suggesting that understanding the filtering characteristics might inform better interpolation strategies.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the best approach to modify the second data set. There are competing views on the appropriateness of interpolation and the relevance of data point positions, indicating ongoing uncertainty in the discussion.

Contextual Notes

The discussion highlights the complexities of working with data sets that have different characteristics, including spacing and size, particularly in the context of astronomical measurements.

cepheid
I have two data sets, each having its own array of x values and its own corresponding array of y values. I want to divide the y-values of one data set by those of the other. The problem I am having is that the x arrays for the two sets have totally different spacings (bin sizes). One set has 4175 data points, evenly spaced. The other set has its x values NOT evenly spaced and there are only 1919 data points in it.

What would be the best way of going about modifying the second data set so that it might be compared to the first one? I could just interpolate, but then I am worried that I am basically just adding made up data points to the y values for the second set, and that I might destroy some features in it, or add spurious ones.
 
A few questions:

Are the positions in the x arrays relevant? You can perhaps zero-pad the smaller array to make it the same size as the larger one. This is a tricky question to answer because it is certainly doable, but we'd need to know more about the data and whether the order and positioning of the data points is important.

So essentially you have 4 arrays? x1, y1, x2, y2? And the y arrays are derived from the x arrays?

Can you maybe sample the larger x array and extract the number of values equal to the smaller x array? Then generate y arrays that are of equal size?
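
One way to read the sampling suggestion, as a rough sketch only: pick evenly spaced indices from the longer array so that it ends up with as many points as the shorter one. The arrays below are synthetic placeholders (the wavelength range and the sine-shaped intensities are assumptions, not data from this thread).

```python
import numpy as np

# Synthetic stand-in for the longer, evenly spaced data set (assumed values).
x1 = np.linspace(4000.0, 7000.0, 4175)
y1 = 1.0 + 0.1 * np.sin(x1 / 50.0)

n_small = 1919  # number of points in the shorter data set

# Evenly spaced integer indices from the first to the last element of x1.
idx = np.round(np.linspace(0, x1.size - 1, n_small)).astype(int)

x1_sub = x1[idx]
y1_sub = y1[idx]
```

This makes the array lengths agree, but the retained x values are still those of the first grid, so they will generally not line up with the x values of the second set.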
 
The positions in the x arrays are relevant. The x arrays are wavelengths. The y arrays are essentially intensities. So these are spectra. See the astronomy thread that I linked to for more details.

Yes, there are four arrays as you described. The y arrays are not derived from the x arrays. They are observed/measured intensities for each wavelength.

Sampling the larger x array may not be that useful, since the other data that I'm going to calibrate off these data are just as large. Also, what if none of the wavelengths in x1 exactly match those in x2?
 
Seems to me you're stuck interpolating. When the samples are taken, how is the filtering done: how steep are the ramps of the high and low pass band filters for each frequency range used in the sample gathering? The filters are in effect acting as interpolators already. When you say "evenly" spaced, is that linear, logarithmic, ...? How large is the range and domain of the sampled data set? Interpolation somewhat modeled after the filters might improve the results.
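
For what the plain interpolation route could look like, here is a minimal NumPy sketch. It assumes the four arrays are named x1, y1, x2, y2 as above, uses synthetic values in place of real spectra, and restricts the comparison to the wavelength range that both sets cover. np.interp does straight linear interpolation, which is only one possible choice and does not model any filter response.

```python
import numpy as np

# Synthetic stand-ins for the four arrays (assumed values, not real spectra).
rng = np.random.default_rng(0)
x1 = np.linspace(4000.0, 7000.0, 4175)            # evenly spaced wavelengths
y1 = 1.0 + 0.1 * np.sin(x1 / 50.0)                # intensities measured on x1
x2 = np.sort(rng.uniform(4100.0, 6900.0, 1919))   # unevenly spaced wavelengths
y2 = 2.0 + 0.2 * np.sin(x2 / 50.0)                # intensities measured on x2

# Keep only the part of the first grid that lies inside the range of x2.
# Outside that range np.interp would just repeat the end values of y2.
overlap = (x1 >= x2[0]) & (x1 <= x2[-1])
x_common = x1[overlap]

# Linearly interpolate the second spectrum onto the first grid.
y2_on_x1 = np.interp(x_common, x2, y2)

# Point-by-point ratio of the two spectra on the common grid.
ratio = y1[overlap] / y2_on_x1
```

Linear interpolation never produces a value outside the two neighbouring samples, so it cannot add new peaks or dips, but it also cannot recover structure between the samples of x2. Going the other way, np.interp(x2, x1, y1), would instead evaluate the denser set at the measured wavelengths of the sparser one, so no extra points are created in the second set.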
 
