Linear Regression of estimated measures / outliers

Hi all, I would like to understand the theory for determining outliers in the following scenario.

Let's say I am to fit a linear model to the data of house size v. sale price for a particular location.

And let's say I have a fairly good linear relationship: as house size increases, so does price.

But then I have a mansion that sold for only $80,000. Well, if it is just one house, I could safely ignore it as an outlier. But if, in my data, I have 50 mansions that all sold for under $100,000, I may have to suspect there is something about a large house that makes it very undesirable to that particular community, and that I should choose a model that reflects this.

My question is: is there a mathematical method for deciding when to label data as significant or not?

I have thought about creating the 95% confidence interval for a measure; in this case the measure would be the mean price of mansions. Clearly, if I only have one mansion in my data... well, I don't even know how to construct a 95% CI for such a small sample size, but if I had just that one, I would leave it out. If I had 50 mansions that sold for a low price, and I had a fairly tight 95% CI around this low price, then at some point I would say it is significant.
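To make the idea concrete, here is a minimal sketch of that interval in Python. The function name `mean_ci` and the 50 prices are made up for illustration, and the normal quantile is used as a large-sample stand-in for the t quantile, so treat it as an approximation rather than a definitive method. It also shows why a single mansion gives no interval at all: the sample standard deviation is undefined for one observation.

```python
import math
import statistics


def mean_ci(prices, level=0.95):
    """Approximate confidence interval for the mean price.

    Uses a normal quantile (large-sample approximation). Returns None
    when there are fewer than two observations, because the sample
    standard deviation cannot be computed in that case.
    """
    n = len(prices)
    if n < 2:
        return None
    mean = statistics.fmean(prices)
    std_err = statistics.stdev(prices) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical data: 50 mansions priced from $70,000 to $94,500.
mansion_prices = [70_000 + 500 * i for i in range(50)]

one_mansion = mean_ci([80_000])        # no interval possible
fifty_mansions = mean_ci(mansion_prices)
print(one_mansion, fifty_mansions)
```

With the made-up prices, the whole interval sits below $100,000, which is the "tight CI around a low price" situation described above.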

Any help on further understanding this would be much appreciated.

Stephen Tashi
I can't tell whether you just want someone to give you a ton of links about methods of removing outliers or whether you are trying to solve a specific problem.

To get an answer to a specific problem, you must have a considerable amount of "given" information (which, in real-world problems, means you must make assumptions). You must also be able to state clearly what you are trying to accomplish.

Based on other threads, many non-statisticians who mention "confidence intervals" in their posts are really talking about "credible intervals" or "prediction intervals". So I hesitate to comment on the method you outlined till that is cleared up.
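As a rough illustration of one of those distinctions, the sketch below fits a simple least-squares line and computes both a confidence interval for the mean price at a given size and a prediction interval for a single new sale at that size. The function name `regression_intervals` and the data are hypothetical, and a normal quantile is used in place of the t quantile with n - 2 degrees of freedom, which is only reasonable for larger samples. Credible intervals are a Bayesian notion and are not covered here.

```python
import math
import statistics


def regression_intervals(x, y, x0, level=0.95):
    """Return (fit, confidence interval, prediction interval) at x0.

    Simple linear regression by least squares. The normal quantile is an
    assumption standing in for the t quantile with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    fit = intercept + slope * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx

    ci_half = z * s * math.sqrt(leverage)      # uncertainty in the mean price
    pi_half = z * s * math.sqrt(1 + leverage)  # adds scatter of a single sale
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


# Hypothetical data: sizes in square feet, prices with alternating +/- $5,000 noise.
sizes = list(range(1000, 3100, 100))
prices = [100 * size + (-1) ** i * 5000 for i, size in enumerate(sizes)]

fit, ci, pi = regression_intervals(sizes, prices, 2000)
print(fit, ci, pi)
```

The prediction interval is always the wider of the two, since it has to cover the variation of an individual house around the line, not just the uncertainty in where the line is.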

* investigate it to see whether there was some error in transcription (writing $80,000 rather than $800,000, for instance)