Regression with uncertain data

In summary, you have about 500 observations of average wages by area, modeled as dependent on area-level parameters such as taxes, education, and age. Between about 5% and 50% of each area's residents work in other areas, but you don't know who they are or where they work. You want to approximate a model of people who live and work in the same area, and the current regression is weak (R squared around 0.3).
  • #1
mrburns404
So I have this set of statistical data, which is not completely relevant to what I want to model, and I would like to compensate for that somehow since I do not have the more precise data.

I have about 500 observations of average wages in certain areas, which are modeled as dependent on several other parameters (taxes in the area, education of people living in the area, age, etc.). The problem is that for each of those areas I know what percentage of people (from about 5% up to 50%) travel to other areas to work there (and of course get paid by that area's standard) while still living in the home area (and of course contributing to the parameters of the home area).

Any ideas on how to deal with this kind of problem? I was thinking about weighted regressions, but I got kind of stuck since they use standard deviations, which is different from what I have.

PS: I am working with regressions in Excel, but any help would be appreciated.
 
  • #2
Can you confirm whether this description of your problem is correct?

Ideally your model would be that a person's wages are a function of where they work, or possibly a function of where they work and where they live, but you only have data on where they live. You know what percentage of people work in different areas than where they live, but you don't know who they are or where they are working.

For example you have regions A, B and C. You know the wages of people living in region A, and you also know 10% of people living in A work in B or C, but you don't know who they are or how many work in B and how many in C?
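To make the example concrete, the situation can be written as a simple identity. The symbols are assumed notation, not part of the original posts: $\bar{w}_A$ is the observed average wage of everyone living in A, $p_A$ is the fraction of A's residents who work elsewhere, $\mu_A$ is the unobserved average wage of those who live and work in A, and $\nu_A$ is the unobserved average wage of A's commuters.

```latex
\bar{w}_A = (1 - p_A)\,\mu_A + p_A\,\nu_A
\quad\Longrightarrow\quad
\mu_A = \frac{\bar{w}_A - p_A\,\nu_A}{1 - p_A}
```

With $p_A = 0.10$ this reads $\bar{w}_A = 0.9\,\mu_A + 0.1\,\nu_A$. Because $\nu_A$ is not known, $\mu_A$ cannot be recovered from the data alone; some extra assumption about what commuters earn would be needed.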
 
  • #3
Yes, that seems pretty accurate.

Ideally, a model without those "travelling" people would be enough, if there were statistical data on people who live and work in the same area. So I am trying to somehow reduce the data, which include everyone, to this ideal model by using the "uncertainty coefficient" I have for each area: the percentage of people traveling to work elsewhere (more travel = less reliable data).

I am still getting meaningful results but the regression is very weak, R squared is ~0.3 or so.
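One possible way to act on the "more travel = less reliable data" idea is a weighted least squares fit in which each area's weight shrinks as its commuting share grows. The sketch below is only an illustration under stated assumptions: the weight `1 - p` is a heuristic stand-in for the usual inverse-variance weight, the function names are hypothetical, and the data are synthetic rather than the poster's. Weighting of this kind reduces the influence of noisy areas; it does not remove any systematic bias caused by commuters earning another area's wages.

```python
import numpy as np

def commuter_weights(commute_fraction):
    """Hypothetical weighting rule: weight = share of residents working locally."""
    p = np.asarray(commute_fraction, dtype=float)
    return 1.0 - p

def weighted_least_squares(X, y, weights):
    """Fit y ~ intercept + X by weighted least squares.

    Returns the coefficient vector with the intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # add intercept column
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns WLS into an ordinary least squares problem.
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic stand-in data (assumption): 500 areas, two predictors.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
p = rng.uniform(0.05, 0.50, size=n)            # commuting share per area
true_beta = np.array([10.0, 2.0, -1.0])
noise = rng.normal(size=n) * (0.5 + 4.0 * p)   # more commuting -> noisier wage
y = true_beta[0] + X @ true_beta[1:] + noise

beta = weighted_least_squares(X, y, commuter_weights(p))
print(beta)
```

In Excel, the same fit can be obtained by multiplying every column (including a column of ones for the intercept) and the wage column by the square root of the weight, then running an ordinary regression without a constant term.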
 

1. What is regression with uncertain data?

Regression with uncertain data is a statistical method used to analyze the relationship between a dependent variable and one or more independent variables, while taking into account the uncertainty or variability in the data. This uncertainty can arise from measurement errors, missing data, or other sources.

2. How is regression with uncertain data different from traditional regression?

Ordinary least squares regression assumes that the independent variables are measured without error and that the noise in the dependent variable has the same variance for every observation. Regression with uncertain data relaxes these assumptions, for example by weighting observations according to their reliability or by modeling the measurement error explicitly.

3. What are the benefits of using regression with uncertain data?

Using regression with uncertain data allows for more robust and reliable results, as it takes into account the uncertainty in the data. It also provides a more realistic representation of the relationship between variables.

4. How is uncertainty quantified in regression with uncertain data?

Uncertainty is quantified using methods such as confidence intervals, standard errors, and probabilistic modeling. These measures provide a range of values within which the true value of the relationship between variables is likely to fall.
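As a rough illustration of the first two measures, the sketch below computes standard errors and approximate 95% confidence intervals for ordinary least squares coefficients. It assumes independent errors with constant variance and uses the normal quantile 1.96, which is reasonable for large samples; for small samples a t quantile would be more appropriate. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def ols_with_uncertainty(X, y, z=1.96):
    """OLS fit returning coefficients, standard errors, and approximate CIs.

    Assumes homoscedastic, independent errors; z=1.96 gives ~95% intervals
    under a large-sample normal approximation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # intercept first
    n, k = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - k)            # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)       # coefficient covariance matrix
    se = np.sqrt(np.diag(cov))
    ci = np.column_stack([beta - z * se, beta + z * se])
    return beta, se, ci

# Synthetic example data (assumption, for illustration only).
rng = np.random.default_rng(1)
n_obs = 500
X_demo = rng.normal(size=(n_obs, 2))
y_demo = 3.0 + 1.5 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(size=n_obs)

coef, std_err, conf_int = ols_with_uncertainty(X_demo, y_demo)
for b, s, (lo, hi) in zip(coef, std_err, conf_int):
    print(f"{b:.3f} +/- {s:.3f}  [{lo:.3f}, {hi:.3f}]")
```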

5. What are the implications of ignoring uncertainty in regression analysis?

If uncertainty is ignored in regression analysis, the results may be biased and misleading. It may also lead to incorrect conclusions about the relationship between variables and can result in inaccurate predictions. Therefore, it is important to consider uncertainty in regression analysis to obtain more accurate and reliable results.
