Other linear fitting than least squares


Discussion Overview

The discussion revolves around alternative methods for linear fitting beyond least squares, particularly in the context of analyzing experimental data. Participants explore issues related to data visibility, the appropriateness of linear models, and the implications of removing certain data points.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Experimental/applied

Main Points Raised

  • One participant expresses dissatisfaction with the least squares fit, noting low correlation coefficients and asking for alternative methods that take the distribution of the data into account.
  • Another participant suggests there may be an error in the least squares implementation and prompts for calculations of the fit values for different lines.
  • Concerns are raised about the validity of the red and blue lines, with suggestions that there may be issues in the participant's code.
  • A participant identifies that data hidden beyond axis limits could affect the fitting process and plans to correct this oversight.
  • Discussion includes the concept of leverage plots, with a suggestion to investigate a specific data point that may have a significant impact on the fit.
  • One participant questions the rationale behind fitting a straight line to the data, while another defends the approach based on the expectation of linear correspondence in experimental data.
  • Participants discuss the implications of removing zero data points, with one asserting that it is acceptable to discard data points that are known to represent failed experiments.
  • A later reply highlights the importance of analyzing experimental conditions that lead to outlier points in the dataset.

Areas of Agreement / Disagreement

Participants express differing views on the appropriateness of linear fitting methods and the handling of specific data points. There is no consensus on the best approach to take, and multiple competing perspectives remain throughout the discussion.

Contextual Notes

Participants note limitations related to data visibility and the potential impact of outlier points on the fitting process. The discussion reflects uncertainty regarding the best practices for data handling and model fitting in experimental contexts.

Felipe Lincoln
I'm analysing some data, and my task is to find the line that best fits it. Using least squares I get these dashed lines (red and blue), which have low correlation coefficients. Is there another method that takes into account how much of the data lies along the direction of a line?
graph.png
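
One commonly used alternative that is less sensitive to a few stray points is the Theil-Sen estimator, which takes the median of the slopes between all pairs of points. The sketch below is only an illustration on made-up data (not the data in the graph) and assumes SciPy is available; whether a robust fit is appropriate depends on why the stray points are there.

```python
import numpy as np
from scipy.stats import linregress, theilslopes

# Made-up data on the line y = 2x + 1, with one gross outlier at the end.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[-1] = 100.0

# Ordinary least squares: the single outlier drags the slope upward.
ols = linregress(x, y)

# Theil-Sen: median of pairwise slopes. Note the argument order (y first).
ts_slope, ts_intercept, ts_low, ts_high = theilslopes(y, x)

print(f"least squares: slope={ols.slope:.3f}, intercept={ols.intercept:.3f}")
print(f"Theil-Sen:     slope={ts_slope:.3f}, intercept={ts_intercept:.3f}")
```

On this toy data the Theil-Sen line stays on the trend followed by the bulk of the points, while the least-squares slope is pulled toward the outlier.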
 

It seems like you must be doing something wrong with your least-squares fit. Can you calculate the least-squares value for the red, blue, and black lines? It looks lower for the black line.
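
A minimal sketch of that check, assuming NumPy and using made-up data rather than the data in the graph: compute the sum of squared residuals for each candidate line. The helper name `sum_squared_residuals` is hypothetical. A correct least-squares line can never have a larger value than any other straight line on the same data.

```python
import numpy as np

def sum_squared_residuals(x, y, slope, intercept):
    """Sum of squared vertical distances from the points to y = slope*x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))

# Made-up illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

# Least-squares line from a degree-1 polynomial fit (slope first, then intercept).
ls_slope, ls_intercept = np.polyfit(x, y, 1)

sse_fit = sum_squared_residuals(x, y, ls_slope, ls_intercept)
sse_identity = sum_squared_residuals(x, y, 1.0, 0.0)  # the line y = x

print(f"fitted line: {sse_fit:.4f}")
print(f"y = x:       {sse_identity:.4f}")
```

If another line gives a smaller value than the supposed least-squares line on the same points, either the fit was computed incorrectly or it was computed on different points than the ones being looked at.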
 
Those red and blue lines don't look right. I think there must be something wrong in your code.
 
The black line is just the identity, y = x. The red and blue lines are the fits to my data.
I used scipy.stats.linregress and can't see what's wrong, but I'll take another look.
 
Oh, some of the data was hidden beyond my axis limits. Sorry for the mistake; I'll fix it and post the result.
graph2.png
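
A quick way to catch this kind of problem is to count how many points fall outside the plotted window before trusting the picture. The sketch below assumes NumPy and uses made-up numbers; `points_outside_limits` is a hypothetical helper, not part of any library.

```python
import numpy as np

def points_outside_limits(x, y, xlim, ylim):
    """Boolean mask that is True for points outside the plotted window."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = (x >= xlim[0]) & (x <= xlim[1]) & (y >= ylim[0]) & (y <= ylim[1])
    return ~inside

# Made-up data: the last point lies far to the right of the window.
x = np.array([0.5, 1.0, 1.5, 2.0, 40.0])
y = np.array([0.6, 0.9, 1.6, 2.1, 3.0])

hidden = points_outside_limits(x, y, xlim=(0, 3), ylim=(0, 3))
print(f"{hidden.sum()} of {hidden.size} points are outside the axes")
```

With Matplotlib, the limits could be taken from an existing axes object via `ax.get_xlim()` and `ax.get_ylim()`.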
 

So if you do a leverage plot, that one data point will probably have a huge leverage. I would check that point and see whether there is some error, such as a typo made when copying the data.
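
For a straight-line fit with an intercept, the leverage of point i is h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2, and the leverages sum to 2. A rough sketch with made-up x values, assuming NumPy; the 2p/n cutoff is only a common rule of thumb, not a strict test.

```python
import numpy as np

def leverage(x):
    """Leverage of each point in a simple linear regression with intercept."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return 1.0 / x.size + dev ** 2 / np.sum(dev ** 2)

# Made-up x values: the last point sits far from the rest.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 50.0])
h = leverage(x)

# Rule-of-thumb cutoff 2p/n, with p = 2 fitted parameters (slope and intercept).
threshold = 2 * 2 / x.size
flagged = np.flatnonzero(h > threshold)

print("leverages:", np.round(h, 3))
print("high-leverage indices:", flagged)
```

A high leverage only says the point has a lot of influence on the line because of its x position; whether the point is actually wrong still has to be checked against the experiment.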
 
Aha! So the standard least-squares regression is doing a good job on the entire data set. But you should consider that the data visible in your first post looks as if it follows a different rule from the entire set. If you can see the reason for that, you may want to analyse the data in sections that make more sense.
 
Why do you think fitting a straight line to that data (however it is done) would be a good idea?
 
Stephen Tashi said:
Why do you think fitting a straight line to that data (however it is done) would be a good idea?
These are experimental data that are expected to follow a linear relationship.
I just removed the zero data points, and this is what I get now. Thank you all.
graph.png
 

  • #10
Felipe Lincoln said:
I just removed the zero data points, and this is what I get now.
Do you have any experimental justification for that?
 
  • #11
Dale said:
Do you have any experimental justification for that?
Yes, sir. The zeros were generated by my code to represent experiments that failed and produced no data.
 
  • #12
Felipe Lincoln said:
Yes, sir. The zeros were generated by my code to represent experiments that failed and produced no data.
That is an excellent reason!

It is never a good idea to throw away data just because it makes your fit better, but if a data point is bad for some specific reason then throwing it out is acceptable.
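
As an illustration of discarding points for a stated reason rather than for a better fit, the sketch below masks out the placeholder zeros before calling scipy.stats.linregress. The data are made up, and it assumes that a y value of exactly 0.0 always means a failed run; if a genuine measurement could be zero, a separate flag (or NaN) would be a safer marker.

```python
import numpy as np
from scipy.stats import linregress

# Made-up data; y == 0.0 is assumed to mark a failed experiment with no data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 0.0, 2.9, 4.1, 0.0, 6.0])

# Keep only the runs that produced a measurement.
valid = y != 0.0

fit_all = linregress(x, y)
fit_valid = linregress(x[valid], y[valid])

print(f"all points:   slope={fit_all.slope:.3f}, r^2={fit_all.rvalue ** 2:.3f}")
print(f"valid points: slope={fit_valid.slope:.3f}, r^2={fit_valid.rvalue ** 2:.3f}")
print(f"dropped {np.count_nonzero(~valid)} failed runs")
```

Reporting how many points were dropped, and why, keeps the exclusion transparent.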
 
  • #13
FYI, you may want to also look into that data point with the high Rphenix. It looks like it has a very high leverage and it may have some other problem.
 
  • #14
Right! The next step in our research is to analyse which experimental conditions pushed some points away from the expected values. Thanks for your attention, Dale!
 
