What is Regression: Definition and 359 Discussions

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the true data and that line (or hyperplane). For specific mathematical reasons (see linear regression), this allows the researcher to estimate the conditional expectation (or population average value) of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression or Necessary Condition Analysis) or estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression).
Regression analysis is primarily used for two conceptually distinct purposes. First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables. Importantly, regressions by themselves only reveal relationships between a dependent variable and a collection of independent variables in a fixed dataset. To use regressions for prediction or to infer causal relationships, respectively, a researcher must carefully justify why existing relationships have predictive power for a new context or why a relationship between two variables has a causal interpretation. The latter is especially important when researchers hope to estimate causal relationships using observational data.
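
The ordinary least squares criterion described above can be illustrated with a minimal sketch for the single-predictor case. It is an illustration only, not a reference implementation: the function names `fit_line` and `sum_squared_residuals` and the data values are hypothetical. The closed-form slope and intercept it computes are the values that minimize the sum of squared differences between the observed data and the fitted line.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """The quantity that ordinary least squares minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, SSR={best:.4f}")
```

Moving the intercept or slope away from the fitted values can only increase the sum of squared residuals, which is one quick way to check a fit by hand.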

  1. avner yakov

    A Error estimation in linear regression

    I have a data set of 11 predictors and one response for 1000 observations, and I want to do linear regression. I also have measurement errors for the predictors (also an 11×1000 matrix), and I need to account for them in the total error estimation. How can I do that?
  2. T

    I ANOVA and Linear Regression Resource

    Hello, Can someone please let me know of a resource (book or other) that explains how to use ANOVA in linear regression? I didn't even know what ANOVA was until some days ago so I'm looking for something that explains it thoroughly with deductions. The resources I've read focused solely on...
  3. CopyOfA

    A Linear regression with discrete independent variable

    Hey, I have a problem where I have a discrete independent variable (integers spanning 1 through 27) and a continuous dependent variable (50 data points for each independent variable). I am wondering about the best method of regression here. Should I just fit to the mean or median? Is there a way...
  4. E

    I Logistic Regression: Estimating Probability of Survival

    I have a simple dataset that consists of one predictor Sex and one response variable Survived. Let's say the estimated coefficients are ##\hat{\beta_0}## and ##\hat{\beta_1}## for the intercept and the coefficient of Sex predictor, respectively. Mathematically this means that: \hat{p}(X) =...
  5. W

    Linear Regression, etc : Standard vs ML Techniques

    Hi All, This is probably trivial. What is the difference between techniques such as Linear/Logistic Regression, and others done in ML, when they are done in standard software packages: Excel (incl. add-ons), SPSS, etc.? Why use ML algorithms when the same can be accomplished with standard...
  6. T

    Simple linear regression model

    Homework Statement For the first question, why does the moisture content increase by 0.2727 percent when the humidity increases by 1 percent? Shouldn't the moisture content increase by 0.2727 + 0.4911 percent when the humidity increases by 1 percent? Second question, it's clear that...
  7. T

    Regression rate of fuel in rocket nozzle

    So we have: stability for n < 1, instability for n > 1. I assume that to get the time growth, I have to integrate the first equation above. However, I also get ln(δpc) on the LHS from the integration. The above equation for time growth removed that ln(δpc), and thus I can't...
  8. E

    Python Polynomial Regression with Scikit-learn

    Hello, I followed an example in a book that compares polynomial regression with linear regression. We have one feature or explanatory variable. The code is the following: import numpy as np import matplotlib.pyplot as plt from sklearn.linear_model import LinearRegression from...
  9. M

    A Indirect effect and spuriousity

    Say one has a regression result (OLS) with significant coefficients for all independent variables. Then a new variable (Z) is added. This new variable is either something that reveals a spurious relationship between one of the initially included variables (x) and the dependent variable (y), or...
  10. R

    Conceptual Question regarding hypothesis testing regression

    Homework Statement Hi, I had a question regarding testing a regression model's coefficients. Say there is a regression model that has the form: y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + e. For the sake of simplicity let: e be the random error, x1 is age, x2 is severity, and x3 is anxiety. y is...
  11. U

    MHB Simple question regarding linear regression model poisson

    The question: Suppose $Y$ is discrete and only takes on non-negative integers and that the conditional distribution of $Y$ given $X=x$ is Poisson, that is, $$P(Y=y|X=x) = \frac{\exp(-x'\beta) (x'\beta)^y}{y!}$$ where $y = 0, 1, 2, \cdots$. First compute $E(Y|X=x)$ and $Var(Y|X=x)$, does this...
  12. F

    A Differential of Multiple Linear Regression

    Say you have a log-level regression as follows: $$\log Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n$$ We're trying to come up with a meaningful interpretation for changes in Y due to a change in some Xk. If we take the partial derivative with respect to Xk, we end up with...
  13. M

    A Centering variables, linear regression

    I am working with multiple regression with two independent variables, and an interaction between them. The expression is: y = b1x1 + b2x2 + b3x1x2. The question is: does one center both independent variables at the same time, when checking for the significance of the effect of the independent...
  14. B

    I Econometrics regression method

    I am using panel data for 30 countries over 25 years to estimate a regression model. Expenditure on cars is my dependent variable and then I use economic theory to find some explanatory variables. First, is 30 countries and 25 years an ok sample? Or should years > countries? Second, is it an...
  15. R

    A F-test regression test, when and how?

    I am aware that F-tests can be used to check the null hypothesis when comparing regression models if the models are nested. What I am confused about is whether I can apply an F-test to compare the following (and if so, what is the best way). I have two regression laws Y = a1*X1 + a2*X2 + b Y =...
  16. M

    A R2 Addition in OLS Regression: Unrelated Variables?

    Hey. I am working with OLS regression. First I run 2 regression operations with each having just one independent variable. Then I run another regression using both the independent variables from the first two regressions. If the explanatory "power" (R^2) in the third regression was to be the sum...
  17. Simoyd

    I Regression of a sine wave

    So my question is, how does this work (hopefully I'm allowed to do hyperlinks): https://www.desmos.com/calculator/zlvrts7mul Given a table of x and y coordinates, how do I find the sine wave of best fit? I need to get f (frequency), a (amplitude), and p (phase) for the function in this form f(x)...
  18. S

    Linear Least-Squares Regression: a_0 = ?

    Homework Statement Hello to everyone that's reading this. :) For this linear least-squares regression problem (typed below and also), I correctly find the value of g (which is what the problem statement wants to have found), but I was curious about the value of ##a_0## (and that's what this...
  19. A

    I Regression on extracted factors

    My initial objective is to make a regression of ##y## dependent variable on a given set of ##x_1##, ##x_2##... and ##x_m## independent variables. Suppose, I am dealing with a data set of ##n## samples, I found that the variables are correlated so I decided to do factor analysis to best represent...
  20. maistral

    A Nonlinear regression in two or more independent variables

    Hi. I wanted to learn more on this topic, but it seems all the available resources on the internet point to using R, SPSS, MINITAB or EXCEL. Is there an established numerical method for such cases? I am aware of the Levenberg-Marquardt, Gauss-Newton and such methods for nonlinear regression on...
  21. M

    I Linear regression and probability distribution

    I have some data that I want to do simple linear regression on. However I don't have a lot of datapoints, and would like to know the uncertainty in the parameters. I.e. the slope and the intercept of the linear regression model. I know it should be possible to get a prob. distribution of the...
  22. Ackbach

    MHB Linear Regression Gradient Descent: Feature Normalization/Scaling for Prediction

    Cross-posted on SE.DS Beta. I'm just doing a simple linear regression with gradient descent in the multivariate case. Feature normalization/scaling is a standard pre-processing step in this situation, so I take my original feature matrix $X$, organized with features in columns and samples in...
  23. FallenApple

    A Keeping Randomized Variable in Regression?

    So if I have a study where people are randomized within treatment sites, do I always have to have the site indicator in the regression equation? This text (Hosmer and Lemeshow, 2nd ed.) says yes. Here is some context provided below. And here is the output for the multivariate equation after...
  24. F

    How can I use the expression for a in this problem

    Homework Statement A random sample of size ##n## from a bivariate distribution is denoted by ##(x_r,y_r), r=1,2,3,...,n##. Show that if the regression line of ##y## on ##x## passes through the origin of its scatter diagram then $$\bar y \sum^n_{r=1} x_r^2=\bar x\sum^n_{r=1} x_r y_r$$ where...
  25. U

    I Error in declination of linear regression

    During a lab exercise we measured different masses of a magnetic material on a scale while changing the strength of the magnetic field it was in. Afterwards we plotted the masses and the field strength, hoping to find a linear slope. Then we drew a linear slope by using linear regression and found...
  26. FallenApple

    A Leaving out a confounder in longitudinal regression?

    Say I want to analyze how a relation changes through time. Like usual, I would throw the potential confounders into the regression model. But what if some of the confounders are not determined at the beginning of the study but at some point within time? For example, say I want to analyze the...
  27. W

    A Follow-Up on F-Test in Multi-Linear Regression

    Hi All, Say we want to linearly regress Y (dependent) against ## X_1, X_2,..., X_n ## (Independent) , all numerical variables to get a model ## Y=a_1X_1+...+a_n X_n ## . Then we test ## H_0 ## for whether : ##H_0: 0= a_1= a_2 =...=a_n ## ## H_1 : a_i \neq 0 ## for some ## i=1,2,..,n ##...
  28. M

    I Significant correlation, not significant coefficient

    A couple of questions today. First, I am running a panel data regression test. First I check the correlations between the independent variables and the dependent variable. These are the results. The D/(D+Em) is the dependent variable, and the independent are the 4 variables most adjacent...
  29. J

    MHB Mediation & Mediated Regression: Summing Effect Sizes for Overall Direction?

    Hi everyone, I am new to this forum. I hope someone could help me. I am a Psychology student and currently working as a student assistant in a research project. We are testing a mediated model, i.e. we want to see whether A's relationship to B is explainable by the indirect relationship through...
  30. FallenApple

    A Interpreting Poisson Regression Estimates across groups

    Say for example I want to see the rate of injury for firefighter vs police vs soldier. ##InjuryCount_{i}##: the number of injuries recorded for the ith person over time. ##T_{i}##: time the person was followed; varies from person to person. ##I(f)_{i}##: indicator for ith person of being a...
  31. FallenApple

    A Multivariate Regression vs Stratification for confounding

    So say we want to see if child birth order causes Down syndrome. Say we get that child birth order is statistically associated with Down syndrome. We know that this association could be explained by a third variable: age. Women that give birth to, say, their third child are of course going to be...
  32. M

    I F-test longitudinal data

    Can I use a standard F-test on longitudinal data for a linear multiple regression? Mons
  33. M

    I Panel study, multiple linear regression, assumptions

    Hey. I am doing a project where I am studying a set of companies over a 7-year period. I am doing a multiple linear regression analysis either with fixed or random effects (so, it's a panel study). What I am wondering is if the general assumptions/requirements apply when using the fixed/random...
  34. Ma Xie Er

    A Logistic Regression Interpretation

    I was trying to find an easy interpretation of the predicted probabilities of a logistic regression model, when one of my coworkers claimed that the logistic regression model is a likelihood. Now, I know that maximum likelihood estimation is used to estimate the parameters, but I didn't think...
  35. W

    I Is Adjusting for Weight in TEE Calculation Reliable in Regression Analysis?

    The article Energy expenditure in adults living in developing compared with industrialized countries: a meta-analysis of doubly labeled water studies has the following shocking conclusion: The authors argued that the lack of physical activities in industrialized countries had little effect on...
  36. R

    I Experimental Data - Error in slope

    I have conducted a tensile test on five specimens. I intend to do a linear regression for every set of data and get a value for the slope (modulus of elasticity) and its error by finding the standard deviation of the slope (using the LINEST function in Excel). I will now end up with 5 slope values...
  37. W

    I Use of R^2 adjusted in Simple Linear Regression-Excel?

    Hi All, I am kind of confused at the fact that Excel uses the measurement of adjusted ##R^2## for simple linear regressions; please see below. What does that even mean, what is the adjustment for?
  38. Dethrone

    MHB Optimizing Linear Regression Cost Function

    I'm trying to optimize the function below, but I'm not sure where I made a mistake. (This is an application in machine learning.)$$J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$$ where $\theta$ is a $n$ by $k$ matrix and $x$ is a $n$ by 1 matrix...
  39. W

    A "Many-to-One" Mapping of Variables in Logistic Regression

    Hi all, I have logistically regressed 3 different numerical variables, v1, v2, v3, separately against the same variable w. All variables have the same type of S-curve (meaning, in this case, that probabilities increase as vi, i=1,2,3, increases). Is there a way of somehow joining the three...
  40. A

    Linear Regression with Measurement Errors

    Hello, I have a set of data, two columns, and each datum has its measurement error like illustration shows below: x | y --------------|----------------- x1+/-xe1 | y1+/-ye1 . | . . | ...
  41. iCloud

    A Need help with regression modeling in Minitab?

    Hi! I am trying to do a project under regression, trying to show how the enrollment (male and female) in college is affected by unemployment (of male and female), Interest rates. I am using Excel and have also tried Minitab but I am unsure how do I factor in the dummy variables (1 or 0) for the...
  42. D

    A Nonlinear regression which can be partially reduced to linear regression

    I encountered several times the following problem: Say I have a variable y dependent in a nonlinear way on m parameters ##\{x_i\}##, with ##i \in \{1,m\}##. However there is a linear relation between n>m functions ##f_j\in{x_i}##, i.e., ##y=\sum_j z_j f_j##. So I can get a solution of my problem...
  43. iCloud

    A Regression analysis and Time Series decomposition

    If we can use Regression analysis to forecast, why do we use “Time Series Decomposition”? What's the difference between the 2? Thanks
  44. T

    I Linear regression on data collection error

    Hi, I've collected a few sets of data and obtained significantly different linear regression values (R^2) in 2 particular sets of data. Does that indicate the 2 sets of data are not valid, which might be due to data collection error? For example, 20 sets of data have a linear regression (R^2) of 0.900+...
  45. L

    I Help with a difficult (?) regression

    I wonder if someone can please help me understand why a nonlinear regression I'm attempting doesn't work, and suggest how I can tackle it. It's based on this equation: $$(1-y) \cdot (D \cdot y+K) = n \cdot x \cdot y$$ where x is the independent variable, a real number usually between 6400 and...
  46. R

    Changing sign of data (regression)

    Homework Statement I was trying to fit experimental data which are theoretically modeled by an exponential function. They are shown in the first graph below. For the first set (red line), the data does not appear to be exponential. My lecturer once said that I need to flip the signs of the...
  47. petrushkagoogol

    B Regression of Universe: Is Time Repeatable?

    If every particle of the Universe was made to occupy the state at which it was 2 years ago, would there be an action replay of events from (T-2) yrs to T? Or would subsequent events be different?
  48. N

    MHB Regression analysis/optimization

    Hello everyone! I just need some suggestions about a topic for our case study. The case study should be about the application of regression analysis or optimization. Where we gather data or do a simple experiment and at the end make a conclusion out of it using regression analysis or...
  49. W

    I Lack of Fit in Ordinal Regression -- Analysis/Alternatives?

    Hi All, I ran a binary logistic of Y on three different numerical variables A,B,C respectively. I am having an issue of separation of variables with all of them, meaning that there are values Ao,Bo, Co for each of A,B,C (different values for each, of course) so that for ## A>Ao, B>Bo...
  50. W

    I Are there Issues with Separation of Values in Ordinal Logistic Regression

    Hi all, just curious if someone knows of any issues with Separation of Points in Ordinal 3-valued Logistic Regression. I think I have an idea of why there are issues with separation in binary Logistic -- the need for the S-curve to go to 0 quickly makes the Bo term go to infinity. Are there...