Averaging Two Sets of Data with Different Lengths in Matlab

  • Context: MATLAB 
  • Thread starter: mathman44
  • Tags: Data, Matlab, Sets

Discussion Overview

The discussion revolves around how to average two sets of data in MATLAB when they have different lengths and correspond to the same variable "x". Participants explore various methods for averaging, including the possibility of using weighted means and subsets of data that match specific values of "x".

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant asks how to average two data sets of different lengths, emphasizing that simple averaging is not feasible due to differing scales.
  • Another participant suggests using a weighted mean and taking subsets of the longer data set when the values of "x" match those in the shorter set.
  • There is a proposal to use parametric fitting methods, such as linear regression, to find a best fit for the data sets, which could provide a more rigorous analysis.
  • Some participants note that the "x" values are identical at only a few points, which complicates the averaging process.
  • Concerns are raised about the quality of the data, including gaps and formatting issues that may affect the analysis.
  • One participant mentions the professor's suggestion for averaging but questions its validity given the different sizes of the data sets.

Areas of Agreement / Disagreement

Participants express differing views on the best approach to averaging the data sets, with no consensus reached on a single method. Some agree on the need to match "x" values, while others propose alternative statistical methods.

Contextual Notes

Participants highlight limitations in the data, including gaps and formatting issues that may affect the averaging process. The discussion also reflects uncertainty regarding the interpretation of the averaging task.

Who May Find This Useful

This discussion may be useful for individuals working with experimental data in physics or related fields, particularly those seeking methods for data analysis in MATLAB.

mathman44
Hi,

I have two sets of data: one is 472 data points long, the other 370. Both are functions of the same variable "x", and they share the same first and last values of "x".

I'm being asked to average the two sets of data, but obviously I can't just use "(y+y2)/2". I also can't cut the 472 data set to 370 and then average because the scale is off. What can I do? Can I have MATLAB average values only when they correspond to the same value of "x"?

Any help would be greatly appreciated. Thanks.
 
What kind of value is x? Scalar or vector?
 
Topher925 said:
What kind of value is x? Scalar or vector?

Both x and y are scalar.
 
mathman44 said:
What can I do? Can I have MATLAB average values only when they correspond to the same value of "x"?
What about a weighted mean?

Can I have MATLAB average values only when they correspond to the same value of "x"?
Yeah, you just take a subset of the 472 set when the x is equal to the x in the 370 set.
something like s1=a(a[1]==b[1])
 
Well, if they are scalars, then just sum all the values and divide by x1 + x2.
 
"I'm being asked to average the two sets of data..."

Are you sure you understood what you are asked to do? This could mean several things.
 
story645 said:
Yeah, you just take a subset of the 472 set when the x is equal to the x in the 370 set.
something like s1=a(a[1]==b[1])

This would work perfectly, do you happen to know the exact command?
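
A minimal MATLAB sketch of the "only average where x matches" idea, assuming each data set is held as a pair of column vectors. The names x1, y1, x2, y2 and the numbers are hypothetical stand-ins, not the data from this thread. It uses the built-in intersect, which returns the shared x values together with their indices in each set.

```matlab
% Hypothetical stand-in data: two sets sampled at different x values.
x1 = [0 1 2 3 4 5].';   y1 = [10 11 12 13 14 15].';
x2 = [0 2 4 5].';       y2 = [20 22 24 25].';

% intersect returns the common x values (sorted) and the indices
% of those values in x1 and x2 respectively.
[xCommon, i1, i2] = intersect(x1, x2);

% Average the two y values at each shared x.
yAvg = (y1(i1) + y2(i2)) / 2;
```

Note that intersect compares values exactly, so measured x values that differ only by rounding or instrument noise will not be treated as matches.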
 
story645 said:
Yeah, you just take a subset of the 472 set when the x is equal to the x in the 370 set.

GCD[472-1,370-1]==3. The x values are identical at exactly 4 points. This would entail ignoring most of the data.
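
The count above can be checked with a short MATLAB sketch. It assumes both sets are evenly spaced grids sharing the same first and last x, which is an idealization and may not hold for the real data. To avoid floating-point comparisons, each grid is placed on a common integer lattice; the variable names are illustrative only.

```matlab
n1 = 472;  n2 = 370;

% Greatest common divisor of the interval counts.
g = gcd(n1 - 1, n2 - 1);

% Integer positions of each grid on a lattice of (n1-1)*(n2-1) steps:
% point k of grid 1 sits at k*(n2-1), point k of grid 2 at k*(n1-1).
p1 = (0:n1-1) * (n2 - 1);
p2 = (0:n2-1) * (n1 - 1);

% Number of positions shared by both grids (endpoints included).
nShared = numel(intersect(p1, p2));
```

Under these assumptions nShared equals g + 1, since the grids coincide only at multiples of the least common multiple of the two step sizes.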
 
The x values are identical at many (75% + ball-parking it) points...
 
  • #10
If I understand you correctly, you have two data sets that are functions of x but don't necessarily contain the same x values. A rigorous analysis is a parametric fit, that is, a best fit to a function, typically found by minimizing the least-squares error. Maximum likelihood methods are also common. For example, if y depends linearly on x, then a linear regression to all of the data points gives the best-fit values of slope and intercept. The best estimate of y at any given x is then easily calculated. Other common functional forms are polynomials, exponentials, etc. Non-linear functions are more complicated to fit, of course.
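
As a rough illustration of the linear case described above, here is a MATLAB sketch that pools both sets and fits a straight line with the built-in polyfit. The data are made up (two lines with the same slope and slightly different offsets) and the variable names are hypothetical; the real data would need a model suited to its actual shape.

```matlab
% Hypothetical data: same underlying slope, slightly different offsets,
% sampled on grids of different lengths.
x1 = linspace(0, 4, 9).';   y1 = 2*x1 + 1.1;
x2 = linspace(0, 4, 6).';   y2 = 2*x2 + 0.9;

% Pool every point from both sets; no matching of x values is needed.
xAll = [x1; x2];
yAll = [y1; y2];

% Least-squares fit of a first-degree polynomial: p = [slope intercept].
p = polyfit(xAll, yAll, 1);

% Best estimate of y at an x that appears in neither set.
yEst = polyval(p, 2.5);
```

Because every point contributes equally to the fit, the longer set carries more weight in the result than the shorter one.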
 
  • #11
The x values are identical at exactly 4 points and some close matches.

In any case, one interpretation of the problem encoded in Mathematica is

( (Sum[ y1[[i]], {i, 472} ]/472.) + (Sum[ y2[[i]], {i, 370} ]/370.) )/2.
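
For comparison, a MATLAB sketch of the same interpretation (the mean of the two per-set means), alongside the count-weighted alternative in which every point from both sets is pooled. The vectors y1 and y2 are hypothetical constant-valued stand-ins for the real data, chosen so the difference between the two results is easy to see.

```matlab
% Hypothetical stand-ins with the thread's lengths.
y1 = 2 * ones(472, 1);
y2 = 4 * ones(370, 1);

% Mean of the two set means: each set counts equally.
meanOfMeans = (mean(y1) + mean(y2)) / 2;

% Pooled (count-weighted) mean: each point counts equally.
pooledMean = (sum(y1) + sum(y2)) / (numel(y1) + numel(y2));
```

The two numbers differ whenever the sets have different lengths and different means, so which one is "the average" depends on how the task is interpreted.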
 
  • #12
Phrak said:
The x values are identical at exactly 4 points and some close matches.

In any case, one interpretation of the problem encoded in Mathematica is

( (Sum[ y1[[i]], {i, 472} ]/472.) + (Sum[ y2[[i]], {i, 370} ]/370.) )/2.


They are non-linear sets of data. And why are the x values identical at only 4 points? Look at the sets of data, attached. A is the magnetic field. The prof suggested that we do "(Rxxmn + Rxxpl)/2", but it's evident he didn't realize the sets are different sizes...
 


  • #13
Good; it helps to know you are doing experimental physics, rather than an applied math abstraction. I'm still trying to interpret your text files and such.
 
  • #14
Each file has 443 lines of data. A seems to be your independent variable (or is intended to be, albeit with some instrument noise, maybe), ranging from -0.02 to 4.40 in increments of 0.01.

Where do you get the 472 and 370 counts?
Are the rxx's and rxy's your independent data for which you wish to find averages?
 
  • #15
Phrak said:
Are the rxx's and rxy's your independent data for which you wish to find averages?

Yes, I want to average the Rxx's and the Rxy's. The data was supplied by my prof.
 
  • #16
OK... Looking at the Minus.tex file, you have just as many Rxymn entries as Rxxmn entries: 443 each. The formatting is bad, so it looks like there are holes in the Rxxmn column of data that are due to mis-tabbing. Is this what you are talking about? The Rxypl and Rxxpl also have 443 entries apiece.
 
  • #17
? I see 371 entries for the Rxypl/Rxxpl.
 
  • #18
I see that now. The danged data is full of gaps.
 
