Python: Help with best-fit line and outliers

  • Thread starter: DMT
  • Tags: Line, Python
SUMMARY

The discussion centers on the challenges of using Python's NumPy library, specifically the polyfit function, to calculate the best fit line for scatter plots when outliers distort the results. Users express frustration with outliers affecting the slope and y-intercept, leading to inaccurate representations of data. A suggestion is made that the issue lies within the data quality rather than the functionality of the polyfit tool itself. Additionally, there is a request for methods to account for errors in the data analysis process.

PREREQUISITES
  • Familiarity with Python programming language
  • Understanding of NumPy library and its polyfit function
  • Basic knowledge of scatter plots and linear regression
  • Awareness of data quality issues and their impact on statistical analysis
NEXT STEPS
  • Research methods for identifying and handling outliers in datasets
  • Learn about robust regression techniques to minimize the influence of outliers
  • Explore error analysis methods in Python for better data interpretation
  • Investigate alternative libraries such as SciPy for advanced fitting techniques
USEFUL FOR

Data scientists, statisticians, and Python developers who are working with scatter plots and seeking to improve the accuracy of their best fit line calculations while managing outliers effectively.

DMT
I've been having some trouble with outliers messing up the best-fit line on my scatter plot in Python. I'm using NumPy's polyfit function to calculate the slope and y-intercept of the best-fit line, but I always seem to get one or two points that throw off the slope enough to make a noticeable difference. I've already checked a few Python references and done a lengthy Google search, but haven't found a solution. Does anyone know of a good way to fix this problem without having to limit the interval or physically remove the bad points from my data?

Edit: Also, knowing a way to take errors into account would be very helpful as well.

Thanks!
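For the question about taking errors into account, one option that stays within polyfit is its `w` (weights) argument. The sketch below assumes each y value has a known 1-sigma uncertainty; the data, the `sigma` array, and the choice of which points are unreliable are all made up for illustration. Note that this only reduces the pull of an outlier if its stated uncertainty is actually large.

```python
import numpy as np

# Hypothetical data: a line y = 2x + 1 with per-point measurement errors.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)   # assumed 1-sigma uncertainty on each y value
sigma[[3, 15]] = 5.0           # two points assumed to be far less reliable
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

# For Gaussian uncertainties, polyfit expects weights of 1/sigma
# (not 1/sigma**2), because the weights multiply the residuals
# before they are squared.
coeffs, cov = np.polyfit(x, y, deg=1, w=1.0 / sigma, cov="unscaled")
slope, intercept = coeffs

# With cov="unscaled" the covariance matrix is taken directly from the
# supplied uncertainties; its diagonal gives the parameter variances.
slope_err, intercept_err = np.sqrt(np.diag(cov))

print(f"slope     = {slope:.3f} +/- {slope_err:.3f}")
print(f"intercept = {intercept:.3f} +/- {intercept_err:.3f}")
```

Passing `cov="unscaled"` is appropriate when the uncertainties are trusted in absolute terms; with the default scaling, polyfit rescales the covariance by the reduced chi-square of the fit instead.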
I have not used the polyfit function in Python, but I have used it a lot in Matlab. If you have points that are quite far from the best-fit line, the best I can say is that those points are not good points. If you are plotting the results of an experiment, they might come from a badly performed measurement. Python, like Matlab, will always try to give you the best-fit line for the data it is given. You said yourself that you haven't found anything on Google. To me, this suggests that the software is perfectly fine and the problem is in your data.
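If the suspect points have to stay in the data set, one technique not mentioned in this thread is to down-weight them rather than delete them. The sketch below is a minimal version of iteratively reweighted least squares with Huber weights, built on polyfit's `w` argument. The function name `robust_linefit`, the tuning constant, and the test data are illustrative assumptions, not something from NumPy or from the original posts.

```python
import numpy as np

def robust_linefit(x, y, n_iter=20, k=1.345):
    """Fit a line by iteratively reweighted least squares (Huber weights).

    Hypothetical helper: points with large residuals are down-weighted
    on each pass instead of being removed from the data.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x)
    for _ in range(n_iter):
        slope, intercept = np.polyfit(x, y, 1, w=w)
        resid = y - (slope * x + intercept)
        # Robust estimate of the scatter from the median absolute deviation.
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        if scale == 0:
            break
        u = np.abs(resid) / (k * scale)
        # Huber weight: 1 for small residuals, 1/u for large ones.
        huber = 1.0 / np.maximum(u, 1.0)
        # polyfit multiplies residuals by w before squaring, so pass sqrt.
        w = np.sqrt(huber)
    return slope, intercept

# Made-up example: y = 2x + 1 with small noise and two gross outliers.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, x.size)
y[2] += 25.0
y[27] -= 20.0

ols_slope, ols_intercept = np.polyfit(x, y, 1)
rob_slope, rob_intercept = robust_linefit(x, y)
print("ordinary least squares:", ols_slope, ols_intercept)
print("Huber-weighted fit:    ", rob_slope, rob_intercept)
```

This does not settle whether the outlying points are bad measurements; it only limits how much any single point can pull the line. Checking where those points came from is still worthwhile.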