Problem at Work with Regression

  • Context: Undergrad 
  • Thread starter: Diffy
  • Tags: Regression, Work
SUMMARY

The discussion centers on the challenges faced in creating a sensitivity analysis tool using multiple linear regression. The user developed a model represented by the equation y = c1*x1 + c2*x2 + ... + cn*xn, where the coefficients (c's) are calculated based on independent variables (x's) to predict a dependent variable (y). Despite achieving a high R² value of 0.978 and an F-statistic of 93.967, the user encounters unexpected results where increasing certain variables leads to a decrease in y, raising questions about the appropriateness of the regression type and the influence of noise in the data. Suggestions include exploring alternative modeling techniques and examining the nature of the data for potential noise reduction.

PREREQUISITES
  • Understanding of multiple linear regression and its mathematical representation.
  • Familiarity with statistical metrics such as R² and F-statistic.
  • Knowledge of data noise and its impact on regression models.
  • Experience with sensitivity analysis and its applications in modeling.
NEXT STEPS
  • Explore alternative regression techniques such as polynomial regression or non-linear regression.
  • Investigate methods for noise reduction in datasets, including data cleaning and transformation techniques.
  • Learn about model validation techniques to assess the accuracy of predictive models.
  • Research best practices in sensitivity analysis to enhance model robustness and reliability.
USEFUL FOR

Data analysts, statisticians, and professionals involved in predictive modeling and sensitivity analysis who seek to improve their understanding of regression techniques and data interpretation.

Diffy
Hi

I've been tasked with making a sort of sensitivity analysis tool. The goal was to get as many parameters as possible and then use them to build a model that would allow users to change variables a bit and see what happens to the one dependent variable.

So I used a multivariable regression tool to come up with a single equation:

y = c1*x1 + c2*x2 + ... + cn*xn

Here c's are the calculated coefficients, x's are the independent variables and y is the one dependent value.
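For readers who want to reproduce this kind of fit, here is a minimal sketch in Python with NumPy. It assumes the model has no intercept term (as the equation above is written) and uses made-up data; `fit_coefficients` and `predict` are hypothetical helper names, not the tool used in this thread.

```python
import numpy as np


def fit_coefficients(X, y):
    """Least-squares fit of y = c1*x1 + ... + cn*xn (no intercept)."""
    # lstsq finds the c that minimizes ||X @ c - y||^2
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted equation for one row of inputs."""
    return float(np.dot(coeffs, x))


# Illustrative data only: 50 observations of 3 variables, no noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_c = np.array([2.0, -0.5, 1.25])
y = X @ true_c

coeffs = fit_coefficients(X, y)
print(coeffs)  # should be very close to true_c for this noise-free data
```

With real data the rows of `X` would be the observed values of the independent variables and `y` the observed dependent value.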

Then I built a tool that allows users to adjust the variables they want and see how that changes y. The problem is that when some variables are adjusted up, y goes down, which does not make much sense in our business model.
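Because the model is linear, moving one variable by some amount changes y by exactly that variable's coefficient times the amount, so any variable with a negative fitted coefficient will pull y down when it is raised. A minimal sketch of such a what-if calculation, with hypothetical coefficients and baseline values (`what_if` is an illustrative name, not the poster's tool):

```python
def predict(coeffs, xs):
    """Evaluate y = c1*x1 + ... + cn*xn."""
    return sum(c * x for c, x in zip(coeffs, xs))


def what_if(coeffs, baseline, index, delta):
    """Change in y when variable `index` moves by `delta`, others held fixed."""
    adjusted = list(baseline)
    adjusted[index] += delta
    return predict(coeffs, adjusted) - predict(coeffs, baseline)


# Hypothetical numbers: the second coefficient is negative
coeffs = [2.0, -0.5]
baseline = [10.0, 4.0]

change = what_if(coeffs, baseline, index=1, delta=1.0)
print(change)  # -0.5: raising x2 by 1 lowers y by |c2|
```

Checking the signs of the fitted coefficients directly is therefore enough to see which variables will behave this way in the tool.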

So I have a few questions:
1) Should I be using a different type of regression?
2) Is there a better way to go about this?
3) Are there ways to influence the coefficients I calculate?


Additional info:
Please let me know if additional info is needed.
R^2 = .978
F = 93.967
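
For reference, these two statistics are usually defined as follows, assuming an ordinary least-squares fit with an intercept, $n$ observations and $p$ independent variables. The thread states neither $n$ nor $p$, and the equation above has no intercept, so the regression tool's exact definitions may differ.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)}
```

Both describe how well the model as a whole fits the data; neither says whether an individual coefficient has a sensible sign.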
 
So maybe I will try asking in a different way since I did not get any responses yet.

I want to build a model based on variables to predict a value, but my variables have very little correlation with the value I want to predict. What I mean is that when I look at scatter plots of each variable versus the value I want to predict, there is no immediately obvious best-fit line; the graphs are truly scattered!
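
One way to put numbers on what the scatter plots show is the Pearson correlation of each variable with the target. Below is a minimal sketch with NumPy on synthetic data; `correlations_with_target` is a hypothetical helper name. In this made-up example y is an exact sum of four independent variables, yet each single-variable correlation is only moderate, which suggests that scattered single-variable plots do not by themselves rule out a good joint fit.

```python
import numpy as np


def correlations_with_target(X, y):
    """Pearson correlation of each column of X with y."""
    return [float(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]


# Illustrative data only: y depends equally on four independent variables
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X.sum(axis=1)

r = correlations_with_target(X, y)
print([round(v, 2) for v in r])  # each value is well below 1
```

Whether the same thing is happening in the poster's data cannot be told from the thread; this only shows that it is possible.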

Is there even a way to build an accurate model?
 
Lies, damn lies and statistics. You are touching a very sensitive subject. The main problem if you are asking physicists is this: We make a model and then we fit it to our data. If the fitted model yields wrong results we discard it, or say that it produces wrong results in variable a, but say that it explains some values for variable b better than other models.

You have decided that you have some process that produces a variable y. Your model assumes that it can be expressed as a function of some variables which has a dominating linear component overshadowed by noise.

In physics we usually know the noise. We can measure whether it fits our model. And now we come to your job. You are the person who claims that your data can be modeled in the manner you stated. Your model seems to yield wrong results; maybe you should discard it. Why is there noise in your data? Can you reduce it? Is it Gaussian?
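
A rough way to start on the "is it Gaussian?" question is to look at the skewness and excess kurtosis of the fit residuals (observed y minus predicted y); both are close to zero for Gaussian noise. This is only a sketch with NumPy on synthetic residuals, `residual_shape` is a hypothetical name, and it is a quick diagnostic rather than a formal normality test.

```python
import numpy as np


def residual_shape(residuals):
    """Return (skewness, excess kurtosis) of the residuals."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()          # center before taking moments
    m2 = np.mean(r**2)        # variance (population form)
    m3 = np.mean(r**3)
    m4 = np.mean(r**4)
    return float(m3 / m2**1.5), float(m4 / m2**2 - 3.0)


# Synthetic Gaussian residuals for illustration only
rng = np.random.default_rng(2)
skew, excess_kurt = residual_shape(rng.normal(size=5000))
print(round(skew, 2), round(excess_kurt, 2))  # both should be near 0
```

Values far from zero on real residuals would hint that the noise is skewed or heavy-tailed.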

Of course there are more tools you could use, and maybe squeeze more from your data, but we don't even know what kind of data you have, and you are probably better advised to look at what other people are doing in your field.

On the other hand your fit parameters don't look so bad. Maybe this site gives you some ideas: http://documents.wolfram.com/applications/eda/FittingDataToLinearModelsByLeast-SquaresTechniques.html
 