Python: Help with bestfit line and outliers

DMT · Jul 7, 2010

I've been having trouble with outliers skewing the best-fit line on my scatter plot in Python. I'm using NumPy's polyfit function to calculate the slope and y-intercept of the best-fit line, but there are always one or two points that throw off the slope enough to make a noticeable difference. I've checked a few Python references and done a lengthy Google search, but haven't found a solution. Does anyone know a good way to fix this without limiting the interval or manually removing the bad points from my data?

Edit: A way to take the errors on the data points into account would also be very helpful.

Thanks!

Wrichik Basu · Sep 28, 2019

I have not used the polyfit function in Python, but I have used it a lot in MATLAB. If you have points that lie far from the best-fit line, the best I can say is that those points are not good points. If you are plotting data from an experiment, they might be the result of a badly performed measurement. Python, like MATLAB, will always give you the least-squares best-fit line for whatever data it is handed. You said yourself that you found nothing on Google, which suggests that the software is working fine and the problem is in your data.
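For readers who still want a fit that is less sensitive to a few stray points, here is a minimal sketch of one common workaround, iterative sigma clipping around np.polyfit. It is not something proposed in the posts above. The function name clipped_linear_fit, the 3-sigma threshold, and the synthetic data are all assumptions made for illustration. Per-point uncertainties, if available, are passed through polyfit's w argument as 1/sigma.

```python
import numpy as np

def clipped_linear_fit(x, y, yerr=None, n_sigma=3.0, max_iter=10):
    """Hypothetical helper: fit y = m*x + b, iteratively ignoring outliers.

    Points whose residual lies more than n_sigma robust standard
    deviations from the median residual are masked out, then the line
    is refit. Returns (m, b, keep), where keep is a boolean mask of the
    points used in the final fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit expects weights of 1/sigma (not 1/sigma**2)
    # for Gaussian uncertainties.
    w = None if yerr is None else 1.0 / np.asarray(yerr, dtype=float)

    def fit(mask):
        wk = None if w is None else w[mask]
        return np.polyfit(x[mask], y[mask], 1, w=wk)

    keep = np.ones(x.size, dtype=bool)
    for _ in range(max_iter):
        m, b = fit(keep)
        resid = y - (m * x + b)
        # Robust scatter estimate: 1.4826 * median absolute deviation,
        # computed only from the points currently kept.
        med = np.median(resid[keep])
        scale = 1.4826 * np.median(np.abs(resid[keep] - med))
        if scale == 0:
            break
        new_keep = np.abs(resid - med) <= n_sigma * scale
        # Stop if nothing changed or too few points would remain.
        if new_keep.sum() < 3 or np.array_equal(new_keep, keep):
            break
        keep = new_keep

    m, b = fit(keep)  # final fit with the final mask
    return m, b, keep

# Synthetic example (assumed data): a line y = 2x + 1 with small noise
# and two deliberately bad points.
rng = np.random.default_rng(0)
x = np.arange(20.0)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)
y[5] += 50.0
y[15] -= 40.0
yerr = np.full(x.size, 0.1)

m_plain, b_plain = np.polyfit(x, y, 1)          # ordinary fit, all points
m, b, keep = clipped_linear_fit(x, y, yerr=yerr)  # clipped, weighted fit
print("plain fit:  ", m_plain, b_plain)
print("clipped fit:", m, b, "points kept:", keep.sum())
```

Clipping still discards points, just automatically and by a stated rule, so it is worth plotting the rejected points (x[~keep], y[~keep]) to check that they really are suspect rather than a sign that a straight line is the wrong model.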
