Hi all, I would like to understand the theory for determining outliers in the following scenario.

Let's say I want to fit a linear model to data on house size vs. sale price for a particular location.

And let's say I have a fairly good linear relationship: as house size increases, so does price.

But then I have a mansion that only sold for $80,000.

Well, if it is just one house, I could safely ignore it as an outlier. But if, in my data, I have 50 mansions that all sold for under $100,000, I may have to suspect there is something about a large house that makes it very undesirable to the particular community, and that I should choose a model that reflects this.
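To make the single-mansion case concrete, here is a minimal sketch in Python. The sizes and prices are made-up numbers, and `fit_line` and `residual_sd` are hypothetical helper names, not part of any library. It fits the line to the ordinary houses only, then measures how far the mansion falls from the line in units of the residual standard deviation. This is only a rough gauge: it ignores the extra uncertainty of extrapolating far beyond the observed sizes, which a proper prediction interval would include.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residual_sd(xs, ys, slope, intercept):
    """Residual standard deviation with n - 2 degrees of freedom."""
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

# Hypothetical data: size in square feet, sale price in dollars.
sizes = [1000, 1500, 2000, 2500, 3000, 3500]
prices = [105000, 148000, 204000, 246000, 303000, 349000]

slope, intercept = fit_line(sizes, prices)
sd = residual_sd(sizes, prices, slope, intercept)

# The mansion from the example, held out of the fit (size is assumed).
mansion_size, mansion_price = 8000, 80000
predicted = intercept + slope * mansion_size
z = (mansion_price - predicted) / sd

print(f"predicted {predicted:,.0f}, actual {mansion_price:,}, "
      f"{z:.1f} residual SDs from the line")
```

The mansion is held out of the fit on purpose. A point that far to the right has high leverage, so if it were included it would drag the line toward itself and its own residual would look deceptively small.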

My question is: is there a mathematical method for deciding when to label data as significant or not?

I have thought about constructing a 95% confidence interval for a measure; in this case the measure would be the mean price of mansions. Clearly, if I only have one mansion in my data, I don't even know how to construct a 95% CI for such a small sample size, but if I had just one mansion, I would leave it out. If I had 50 mansions that sold for a low price, and I had a fairly tight 95% CI around this low price, then at some point I would say it is significant.
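The confidence-interval idea can be sketched as follows, again with made-up prices and a hypothetical helper name `mean_ci`. It uses the normal approximation (a multiplier of about 1.96), which is reasonable for 50 observations; for a small sample a Student-t quantile would be the more appropriate choice. With a single mansion the sample standard deviation is undefined, so no interval can be built at all, which matches the intuition above.

```python
import math
import statistics

def mean_ci(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean.

    Returns None for fewer than two values, because the sample
    standard deviation is undefined for a single observation.
    """
    n = len(values)
    if n < 2:
        return None
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem

# One mansion: no interval can be constructed.
print(mean_ci([80000]))

# Hypothetical data: 50 mansions that all sold for under $100,000.
mansion_prices = [70000 + 600 * i for i in range(50)]
low, high = mean_ci(mansion_prices)
print(f"95% CI for mean mansion price: {low:,.0f} to {high:,.0f}")
```

If the whole interval sits far below the price the linear model predicts for a house of that size, the low prices are unlikely to be chance, and the straight-line model is probably the wrong one for large houses.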

Any help on further understanding this would be much appreciated.

**Physics Forums | Science Articles, Homework Help, Discussion**


# Linear Regression of estimated measures / outliers
