What is Regression: Definition and 359 Discussions

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the true data and that line (or hyperplane). For specific mathematical reasons (see linear regression), this allows the researcher to estimate the conditional expectation (or population average value) of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression or Necessary Condition Analysis) or estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression).
Regression analysis is primarily used for two conceptually distinct purposes. First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables. Importantly, regressions by themselves only reveal relationships between a dependent variable and a collection of independent variables in a fixed dataset. To use regressions for prediction or to infer causal relationships, respectively, a researcher must carefully justify why existing relationships have predictive power for a new context or why a relationship between two variables has a causal interpretation. The latter is especially important when researchers hope to estimate causal relationships using observational data.
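
The ordinary least squares line described above has a closed form when there is a single predictor. The slope is b1 = Sxy / Sxx and the intercept is b0 = ȳ - b1·x̄, where Sxx and Sxy are the centred sums of squares and cross-products. Below is a minimal sketch that assumes plain Python lists. The function name `ols_line` is illustrative and does not come from any library.

```python
# Hypothetical helper name, for illustration only.
def ols_line(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares and return (b0, b1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centred sum of squares of x and cross-product of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        # All x values equal: no unique least-squares line exists
        raise ValueError("x values are all equal; slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Points lying exactly on y = 1 + 2x are recovered exactly
print(ols_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (1.0, 2.0)
```

The "unique line" in the definition requires that the x values are not all equal (Sxx > 0). With several predictors, the same criterion leads to the matrix normal equations instead of these scalar formulas.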

Source: Wikipedia.org
  1. T

    Linear regression with asymmetric error bars

    I've been trying to figure out how to do a linear regression on data with asymmetric x and y error bars (different for each data point). Any help would be much appreciated.
  2. A

    Multiple Regression: Steps for Variable Selection

    Hey everyone, does anyone know the steps that Minitab or any other stats software takes to eliminate variables when doing forward selection, backward elimination, standard (general) stepwise regression, or best subsets regression on a set of data? I'm not sure if the correlation coefficients...
  3. R

    Calculating m and b for Logarithmic Regression with Small Data Set

    I have a very small set of data. Usually 3 points, sometimes 4. Best fit is a logarithmic equation y=m*Ln(x)+b How can I obtain m and b?
  4. T

    Stats: Simple Linear Regression

    Homework Statement: [image: http://img822.imageshack.us/img822/4421/statsii.jpg] The Attempt at a Solution: I've done parts (a) and (b). How do I do parts (c) and (d)? Is the simple linear regression model just $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where \varepsilon_i \stackrel...
  5. M

    Variance analysis and regression

    Homework Statement: Assume a one-way analysis of variance model of the form $Y_{ij} = \mu + \alpha_i + e_{ij}$, where the $e_{ij}$ are independent with expectation 0 and constant variance. Define $z_{ijl} = \begin{cases} 1 & \text{if } i = l \\ 0 & \text{otherwise} \end{cases}$ and show that: a)...
  6. M

    Understanding the Relationship Between One-Way ANOVA and Regression Models

    Homework Statement: Assume a one-way ANOVA model $Y_{ij} = \mu + \alpha_i + e_{ij}$, where the $e_{ij}$ are independent and normally distributed with variance $\sigma$ and expectation 0. Define $z_{ijl} = 1$ if $i = l$, and 0 otherwise. Show that $Y_{ij} = \mu + \sum_{l=1}^{I} \alpha_l z_{ijl} + e_{ij}$. Homework...
  7. brainpushups

    Uncertainty in the parameters A and B for a linear regression

    I'm working through John Taylor's An Introduction to Error Analysis and so far this is the only problem I haven't been able to solve. I was hoping someone could lend me some insight. The problem asks you to use error propagation to verify that the uncertainties in A and B for a line of the...
  8. M

    Nonlinear Regression: Best Software to Calculate a & b

    Hello. Can anyone tell me what would be the best software to use for the nonlinear regression? I have an equation with two unknown parameters which I would like to find out. Equation looks something like this: f(x)=a*b*x/(1+b*x), where x and f(x) are known (several points) and I would like to...
  9. T

    Least square linear regression

    In least squares linear regression, say we have $y = Xb + e$ ($y$, $b$, $e$ are vectors and $X$ is a matrix; $y$ holds the observations, $b$ the coefficients, and $e$ the error term), so we need to minimize $e'e = (y - Xb)'(y - Xb) = y'y - y'Xb - b'X'y + b'X'Xb$. We can take the derivative of $e'e$ with respect to $b$, and we get $0 - 2X'y + 2X'Xb$...
  10. E

    Multiple linear regression + Q-Q plot problem

    I want to do multiple linear regression, but one of the requirements is that the residuals be normally distributed. I can check that with Q-Q plots, but the Q-Q plot shows that about 95% of the data fit the normal line while 5% is way off! Can I still proceed, or do I have to find a way...
  11. M

    (regression) why would you exclude an explanatory variable

    If someone is interested in modelling data on households to find out whether there is discrimination in the workplace why would they ever leave out variables which are relevant to explaining the dependent variable but not so relevant to the investigation? e.g. let's say they survey age, race...
  12. L

    Need advice applying a logistic regression to model school data

    To determine whether schools are increasing the number of students meeting or exceeding standards, I obtained a MEAP database reporting the number of students who scored in the met-or-exceeded standards range during 2005-2009 for grades 3-5. I then graphed the data for using numbers...
  13. G

    How to correct regression towards the means

    Does anyone know about this problem and how to compensate for it?
  14. C

    Multiple Linear Regression (2 factors, 1 output)

    Hello all; I'm doing work for my job, and I've forgotten my statistics =(. I first want to know if what I'm trying to do is possible. I want to create a linear regression of the form Y = a * x1 + b * x2 + c. [image: http://imgur.com/Q4vGP] As you can see, there is space that is grayed...
  15. majormuss

    Finding Y1 on the TI-83 Plus Calculator for Linear Regression

    Homework Statement: Where can I find Y1 (Y with subscript 1) on the TI-83 Plus calculator? I was working on linear regressions in the book and came across a question that requires me to enter Y1 on the screen, but I can't find the 'Y'. How can I find it? Homework Equations The Attempt at...
  16. T

    How can I solve this Least Squares Regression problem?

    Homework Statement: [image: http://img683.imageshack.us/img683/4744/leastsquares.jpg] [image: http://img149.imageshack.us/img149/4793/graphwd.jpg] Homework Equations The Attempt at a Solution: So would these be the points? (-41,51), (-22,62), (23,63), (44,24). I'm not too sure how...
  17. K

    MATLAB Multiple Regression with four variables? Also in MATLAB

    I am not sure if Multiple regression is what I want. I essentially have a function Y that is a function of three independent variables. I have a collection of points which have given Y-outputs. So when given the inputs (1, 30, 60), Y = 5. When given (1, 30, 210), Y=8. When given (2, 70...
  18. B

    Linear regression and high correlation problems

    Hi guys, I have data on 20 people's height, weight, calorie intake and skinfold thickness. I have carried out a regression of calorie intake on height, on weight, and on height and weight together. I have done the same thing for skinfold thickness. I then used R to work out the summary of results. Each...
  19. T

    Multivariate linear regression

    I have a model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, $\varepsilon \sim N(0,1)$. How do I test the hypothesis $\beta_1 = 0$? Is the test the same for $\beta_2 = 0$?
  20. maverick280857

    Logistic Regression with Dummy Variables as regressors?

    Hi, is it possible to use both dummy-variable regressors and a categorical response variable in the same model? Consider the following model from cricket (you don't need to know the game to answer this question): Y = mode of dismissal (0 = not out, 1 = bowled, 2 = caught, 3 =...
  21. D

    Problem at Work with Regression

    Hi, I've been tasked with making a sort of sensitivity analysis tool. The goal was to gather as many parameters as possible and then use them to build a model that would allow users to change variables a bit and see what happens to the one dependent variable. So I used a multivariable...
  22. J

    Simple Logistic Regression

    Hello all, I've performed a simple logistic regression and could use some help with interpretation and minor calculations. Five different doses of insecticide were applied under standardized conditions to samples of an insect species. The data were: Dose: 2.6, 3.8, 5.1, 7.7, 10.2; Dead...
  23. D

    Linear and Non-Linear Regression

    Homework Statement: I have conducted a regression analysis and plotted my residuals against my input variable. I know that for a linear regression the residuals must be randomly scattered, but this graph looks like it has a curve to it. Is this because I only...
  24. maverick280857

    Coefficient of Determination in case of repeat points, in linear regression

    Hello, in simple linear regression (or even in multiple linear regression), how does one prove that the coefficient of determination, given by $R^2 = \frac{SS_{Reg}}{SS_{Total}} = 1 - \frac{SS_{Res}}{SS_{Total}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \overline{y})^2}$, is...
  25. R

    Multivariate regression for project cost estimates

    Hi all, I hope this is the right sub-forum for this. I currently work for an engineering consultancy and am doing some project management for what are essentially site investigation reports (sites are quite small, typically a few acres/hectares). Our clients are mostly developers who have...
  26. L

    Propagating Measurement Uncertainty into a Linear Regression Model

    I am trying to figure out how to combine uncertainty (in x and y) into the standard error of the best fit line from the linear regression for that dataset. I am plotting units of concentration (x) versus del t/height (y) to get a value for the flux (which is the slope) I understand how...
  27. S

    Non Linear Regression Initial Guesses

    Hi: This is my first post and I'm not sure if this is the right forum. Please redirect if necessary. I'm new to nonlinear regression, but from what I've read I realize that making "good" initial guesses for the model parameters is very important, otherwise a "best fit" may not result...
  28. M

    Always positive function with regression

    I'm attempting to solve a multiple regression. My problem is that I want the resultant function to be always positive. I need a regression of the machining cutting force for several values of cutting parameters (cutting speed, depth of cut, ...). The cutting force has to be always positive...
  29. D

    A question about uncertainty using a regression line and experimental data

    Homework Statement: I'm new to uncertainty, so I need a little assistance. The circuit in this experiment was simple: a dry cell connected to a variable resistor. I used an ammeter in the circuit and a voltmeter across the resistor, and took some measurements. The experiment was to...
  30. C

    Converting a sin regression to cos regression

    Homework Statement convert a*sin(bx+c)+d to (mx+d)+a*cosb(x-c) Homework Equations The Attempt at a Solution
  31. M

    MATLAB Matlab: Linear Regression - Get Variance of Slope & Y-Component

    I need to get the variance of the slope and of the y-component for the fitted line of a given set of data. Please, someone help me; it is beyond me.
  32. W

    Regression strategy for hypothesis testing vs prediction

    Hi, I'm interested in this question primarily in how it relates to what is usually called classic or frequentist statistics (p-values, etc). I'm fully aware of how stepwise and other automated techniques can negatively impact analyses and render inference based upon parameter estimates...
  33. N

    F90/C Weighted Linear Regression Code

    Hi guys, can anyone recommend code, preferably in Fortran 90/77 or possibly in C, that performs a weighted linear regression on a 3-column file? Natski
  34. T

    Formula for closest distance to regression line

    I need to calculate the distance of the point that lies closest to the regression line for my programming, but I am not sure what the formula is. Maybe someone can help me out here? Thanks in advance.
  35. M

    How Does Firm Age Influence Growth When Evaluated at Mean Values?

    Homework Statement: Evaluate the partial effect of a firm's age on growth (evaluated at the means). Homework Equations: $\text{Growth} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{size}\cdot\text{age} + \beta_4\,\text{plant}\cdot\text{age}$ The Attempt at a Solution: We're supposed to do something like this...
  36. A

    Standard error for marginal effect in regression?

    Hi, I have a regular OLS regression that includes a variable both by itself and squared (i.e. y=b0+b1*x+b2*x^2). I am interested in the marginal effect of the variable at the mean. I know how to get the point estimate, but does anyone know how to get standard errors for the marginal effect...
  37. B

    How to calculate regression of a logistic curve

    I have some data and want to find a logistic curve regression line. Is there a formula I can apply to find the regression line from the data?
  38. P

    Conditional expectation and Least Squares Regression

    Hello everybody, I have two questions on conditional expectation with respect to (polynomial) OLS: Let $X_t$ be a random variable and $F_t$ the associated filtration, $\mathrm{Vect}_n\{X_t\}$ the vector space spanned by the polynomials of order $i$, $i \le n$, and $f(\cdot)$ a function with enough regularity. I am wondering how...
  39. K

    Sum of the residuals in multiple linear regression

    In my textbook, the following results are proved in the context of SIMPLE linear regression: $\sum e_i = 0$ and $\sum e_i \hat{Y}_i = 0$. I tried to modify the proofs for multiple linear regression, but I am unable to do so, so I am puzzled... Are these results also true in MULTIPLE linear regression...
  40. K

    Regression SS in multiple linear regression

    In MULTIPLE linear regression, is it still true that the regression sum of squares is equal to $\sum (\hat{Y}_i - \bar{Y})^2$? My textbook defines regression SS in the chapters for simple linear regression as $\sum (\hat{Y}_i - \bar{Y})^2$, and then in the chapters for multiple linear regression, the...
  41. M

    OLS regression - using an assumption as the proof?

    Hi, My question is about a common procedure used to find minimum and maximum values of a function. In many problems we find the first derivative of a function and then equate it to zero. I understand the use of this method when one is trying to find the minimum or maximum value of the...
  42. K

    Multiple linear regression: partial F-test

    "Suppose that in a MULTIPLE linear regression analysis, it is of interest to compare a model with 3 independent variables to a model with the same response variable and these same 3 independent variables plus 2 additional independent variables. As more predictors are added to the model, the...
  43. J

    Performing Regression Analysis in Excel: Two Methods

    Does anyone know how to do a regression analysis in Excel? I need to find the correlation coefficient and the coefficient of determination, and I only know how to do that with a TI-83+ graphing calculator.
  44. B

    Question about linear regression and sample sizes

    Consider this situation. There is an exam designed in such a way that it appears that the pass/failure rate of the exam has a linear relationship to the age of the exam taker. The older the test taker, the higher the pass rate. I'm not interested in the exact scores of the exam, only pass or...
  45. K

    Linear Regression: reversing the roles of X and Y

    Simple linear regression: $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is random error. The fitted (predicted) value of $Y$ for each $X$ is $\hat{Y} = b_0 + b_1 X$ (e.g. $\hat{Y} = 7.2 + 2.6X$). Consider $\hat{X} = b_0' + b_1' Y$ [the $b_0$, $b_1$, $b_0'$, and $b_1'$ are least-squares estimates of the $\beta$'s]. Prove whether or not...
  46. O

    How Can I Perform Sinusoidal Regression in Excel or MAPLE?

    Homework Statement We did a lab that involved using sound probes that gave us a bunch of data points which we were able to plot into a waveform using excel. However, we have to determine the equation of the waveform and it is expected that we include the uncertainty of the coefficients (much...
  47. O

    Maple Sound Wave Regression Lab | Excel & MAPLE Solutions

    We did a lab that involved using sound probes that gave us a bunch of data points which we were able to plot into a waveform using excel. However, we have to determine the equation of the waveform and it is expected that we include the uncertainty of the coefficients (much like using the LINEST...
  48. B

    Simple least squares regression problem. Am I doing anything wrongly?

    Least squares regression of Y on A-D based on sample size of 506. Reported results with standard errors are: Y = 11.08 - 0.954*A - 0.134*B + 0.255*C - 0.052*D s.errs (0.32) (0.117) (0.043) (0.019) (0.006) R^2 = 0.581 problem A. Test null that coefficient on D is equal to 0 d =...
  49. B

    Simple least squares regression problem. Am I doing anything wrongly?

    Least squares regression of Y on A-D based on sample size of 506 Y = 11.08 - 0.954*A - 0.134*B + 0.255*C - 0.052*D s.errs (0.32) (0.117) (0.043) (0.019) (0.006) R^2 = 0.581 problem A. Test null that coefficient on D is equal to 0 d = coefficient on D null: D ~ N(0...
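
For the thread "Calculating m and b for Logarithmic Regression with Small Data Set", y = m*ln(x) + b is linear in m and b. Substituting u = ln(x) therefore turns it into an ordinary straight-line fit, and the result is the exact least-squares solution, not an approximation. This is a minimal sketch under the assumption that every x is positive. `fit_log` is a hypothetical name.

```python
import math


# Hypothetical helper name, for illustration only.
def fit_log(xs, ys):
    """Least-squares fit of y = m*ln(x) + b; returns (m, b). Needs every x > 0."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    if any(x <= 0 for x in xs):
        raise ValueError("ln(x) is undefined for x <= 0")
    # Transform the predictor, then apply the simple OLS formulas to (u, y)
    us = [math.log(x) for x in xs]
    n = len(us)
    u_bar = sum(us) / n
    y_bar = sum(ys) / n
    suu = sum((u - u_bar) ** 2 for u in us)
    suy = sum((u - u_bar) * (y - y_bar) for u, y in zip(us, ys))
    if suu == 0:
        raise ValueError("all x values are equal")
    m = suy / suu
    b = y_bar - m * u_bar
    return m, b
```

With 3 points there is only one residual degree of freedom, and with 2 points the curve passes through both exactly. Treat goodness-of-fit numbers from such small sets with caution.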
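
The derivative quoted in the thread "Least square linear regression" can be carried through to the normal equations. This assumes X has full column rank, so X'X is invertible. The two middle terms combine because y'Xb is a scalar and so equals its transpose b'X'y:

```latex
\begin{aligned}
e'e &= y'y - 2b'X'y + b'X'Xb
  && (y'Xb = b'X'y \text{ since it is a scalar})\\
\frac{\partial (e'e)}{\partial b} &= -2X'y + 2X'Xb = 0
  && \Longrightarrow\; X'Xb = X'y\\
\hat b &= (X'X)^{-1}X'y
  && (\text{if } X'X \text{ is invertible})
\end{aligned}
```

The second derivative is 2X'X, which is positive definite under full column rank. The stationary point is therefore the minimum, which is why setting the first derivative to zero is justified here.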
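
The thread "Multiple Linear Regression (2 factors, 1 output)" asks for Y = a*x1 + b*x2 + c. One way to sketch this with only the standard library is to build the design matrix with a column of ones for c, form the normal equations X'X·β = X'y, and solve them by Gaussian elimination. The helper names `solve_linear` and `fit_two_predictors` are hypothetical. The code assumes the two predictors are not collinear.

```python
# Hypothetical helper names, for illustration only.
def solve_linear(a, rhs):
    """Solve the square system a·v = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (m[i][n] - s) / m[i][i]
    return v


def fit_two_predictors(x1, x2, y):
    """OLS fit of y = a*x1 + b*x2 + c via the normal equations X'X beta = X'y."""
    rows = [[u, w, 1.0] for u, w in zip(x1, x2)]  # last column models the intercept c
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a, b, c = solve_linear(xtx, xty)
    return a, b, c
```

Forming X'X squares the condition number of the problem. For badly scaled or nearly collinear predictors, an orthogonal-decomposition solver such as `numpy.linalg.lstsq` (SVD-based) is the safer choice.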
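
On the thread "Coefficient of Determination in case of repeat points, in linear regression": the two forms SS_Reg/SS_Total and 1 - SS_Res/SS_Total agree because least squares with an intercept gives SS_Total = SS_Reg + SS_Res. This is a small sketch of the second form, assuming the fitted values come from such a model. `r_squared` is a hypothetical name.

```python
# Hypothetical helper name, for illustration only.
def r_squared(ys, fitted):
    """Coefficient of determination R^2 = 1 - SS_Res / SS_Total."""
    n = len(ys)
    y_bar = sum(ys) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in ys)              # total sum of squares
    if ss_tot == 0:
        raise ValueError("all y values are equal; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

For example, y = (0, 1, 1) at x = (0, 1, 2) has the least-squares line ŷ = 1/6 + x/2, and `r_squared([0, 1, 1], [1/6, 2/3, 7/6])` gives 0.75.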
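
The thread "Matlab: Linear Regression - Get Variance of Slope & Y-Component" asks for the variances of the slope and intercept. Under the usual assumptions of independent errors with constant variance, the textbook estimates are Var(b1) = s²/Sxx and Var(b0) = s²(1/n + x̄²/Sxx), with s² = SSE/(n - 2). The sketch below is in Python; `slope_intercept_variances` is a hypothetical name.

```python
# Hypothetical helper name, for illustration only.
def slope_intercept_variances(xs, ys):
    """Return (b0, b1, var_b0, var_b1) for a simple OLS line.

    Assumes independent errors with constant variance; sigma^2 is
    estimated by s^2 = SSE / (n - 2).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)  # residual variance estimate
    var_b1 = s2 / sxx
    var_b0 = s2 * (1.0 / n + x_bar ** 2 / sxx)
    return b0, b1, var_b0, var_b1
```

Taking square roots of `var_b0` and `var_b1` gives the standard errors that most regression summaries report.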
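
For the thread "F90/C Weighted Linear Regression Code", here is a Python sketch rather than Fortran or C. It assumes, since the thread does not say, that the three columns are x, y and the uncertainty σ of y, with weights w = 1/σ². It minimizes Σ w(y - b0 - b1·x)² using weighted means. If σ really is the measurement uncertainty, the slope variance is 1/Σw(x - x̄w)². The names `read_three_columns` and `weighted_line` are hypothetical.

```python
# Hypothetical helper names; assumes columns are "x y sigma".
def read_three_columns(lines):
    """Parse whitespace-separated 'x y sigma' lines, skipping blanks and '#' comments."""
    xs, ys, sigmas = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        x, y, s = (float(t) for t in line.split()[:3])
        xs.append(x)
        ys.append(y)
        sigmas.append(s)
    return xs, ys, sigmas


def weighted_line(xs, ys, sigmas):
    """Weighted least-squares fit of y = b0 + b1*x with weights w = 1/sigma^2.

    Returns (b0, b1, var_b1); var_b1 assumes the sigmas are the true y uncertainties.
    """
    ws = [1.0 / s ** 2 for s in sigmas]
    sw = sum(ws)
    xw = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted mean of x
    yw = sum(w * y for w, y in zip(ws, ys)) / sw  # weighted mean of y
    sxx = sum(w * (x - xw) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xw) * (y - yw) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    b0 = yw - b1 * xw
    return b0, b1, 1.0 / sxx
```

With a real file, pass `open(path)` to `read_three_columns`. If the third column already holds weights rather than σ, use it directly as `ws`.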
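
For the thread "Formula for closest distance to regression line", the perpendicular distance from a point (x0, y0) to the line y = b0 + b1·x is |b1·x0 - y0 + b0| / √(b1² + 1). This minimal sketch assumes the line coefficients are already known. The function names are hypothetical.

```python
import math


# Hypothetical helper names, for illustration only.
def distance_to_line(x0, y0, b0, b1):
    """Perpendicular distance from (x0, y0) to the line y = b0 + b1*x."""
    return abs(b1 * x0 - y0 + b0) / math.sqrt(b1 ** 2 + 1.0)


def closest_point(points, b0, b1):
    """Return (index, distance) of the point nearest the line y = b0 + b1*x."""
    dists = [distance_to_line(x, y, b0, b1) for x, y in points]
    i = min(range(len(dists)), key=lambda k: dists[k])
    return i, dists[i]
```

This is the geometric distance. The vertical distance |y0 - (b0 + b1·x0)| is the residual that ordinary least squares minimizes, so pick whichever one the program actually needs.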
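
On the thread "Standard error for marginal effect in regression?": for y = b0 + b1·x + b2·x², the marginal effect is b1 + 2·b2·x. Because this is linear in (b1, b2), its variance follows exactly from the coefficient covariance matrix: Var(b1) + 4x²·Var(b2) + 4x·Cov(b1, b2). The sketch assumes those three quantities are read from the fitted model's covariance output. Evaluating at the mean treats x̄ as fixed. The function name and the example numbers are hypothetical.

```python
import math


# Hypothetical helper name, for illustration only.
def marginal_effect_se(b1, b2, var_b1, var_b2, cov_b1_b2, x):
    """Marginal effect dy/dx = b1 + 2*b2*x of y = b0 + b1*x + b2*x^2, with its SE.

    Var(b1 + 2x*b2) = Var(b1) + 4x^2 Var(b2) + 4x Cov(b1, b2).
    """
    effect = b1 + 2.0 * b2 * x
    var = var_b1 + 4.0 * x ** 2 * var_b2 + 4.0 * x * cov_b1_b2
    return effect, math.sqrt(var)


# Hypothetical coefficient estimates and covariances, evaluated at x = 2.0
effect, se = marginal_effect_se(0.5, -0.1, 0.04, 0.0001, -0.001, 2.0)
```

Leaving out the covariance term is a common mistake. b1 and b2 are usually strongly correlated when x and x² are both in the model.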
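
The question in "Sum of the residuals in multiple linear regression" follows from the normal equations, which say the residual vector e = y - Xb̂ is orthogonal to every column of X. The first result below needs an intercept, so that one column of X is all ones. The second holds for any least-squares fit:

```latex
X'e = X'(y - X\hat b) = X'y - X'X\hat b = 0
\;\Longrightarrow\;
\begin{cases}
\mathbf{1}'e = \sum_i e_i = 0 & (\text{if } \mathbf{1} \text{ is a column of } X)\\[2pt]
\hat y' e = \hat b' X' e = \sum_i \hat Y_i e_i = 0 &
\end{cases}
```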
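
For "Linear Regression: reversing the roles of X and Y", the Y-on-X slope is b1 = Sxy/Sxx and the X-on-Y slope is b1' = Sxy/Syy. Their product is Sxy²/(Sxx·Syy) = r². Rearranging the X-on-Y line into Y-versus-X form gives slope 1/b1', which equals b1 only when r² = 1. This small sketch checks that numerically. `slopes_both_ways` is a hypothetical name.

```python
# Hypothetical helper name, for illustration only.
def slopes_both_ways(xs, ys):
    """Return (b1, b1_rev, r_squared) for the Y-on-X and X-on-Y least-squares lines."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx       # slope of Y regressed on X
    b1_rev = sxy / syy   # slope of X regressed on Y
    r2 = sxy ** 2 / (sxx * syy)
    return b1, b1_rev, r2
```

For x = (0, 1, 2) and y = (0, 1, 1), this gives b1 = 0.5 and b1' = 1.5. So 1/b1' ≈ 0.667 differs from b1, while b1·b1' = 0.75 = r².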
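
The coefficient test in "Simple least squares regression problem. Am I doing anything wrongly?" uses t = (estimate - 0) / standard error. The same recipe answers the question in "Multivariate linear regression" about testing β1 = 0 and β2 = 0, each with its own standard error. The reported 0.006 is a standard deviation, so under the null the estimate is approximately N(0, 0.006²), not N(0, 0.006). With 506 observations and five estimated coefficients, there are 501 residual degrees of freedom. The sketch below therefore assumes the normal approximation to the t distribution is adequate for the p-value. `coefficient_t_test` is a hypothetical name.

```python
import math


# Hypothetical helper name, for illustration only.
def coefficient_t_test(estimate, std_error, null_value=0.0):
    """t statistic for H0: coefficient == null_value, with a two-sided p-value.

    The p-value uses the standard normal approximation to the t distribution,
    which is reasonable when the residual degrees of freedom are large.
    """
    t = (estimate - null_value) / std_error
    p_two_sided = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return t, p_two_sided


# Coefficient on D from the thread: estimate -0.052, standard error 0.006
t, p = coefficient_t_test(-0.052, 0.006)
print(round(t, 2))  # -8.67
```

A |t| near 8.7 is far beyond any conventional critical value, so the null that the coefficient on D is zero would be rejected at any usual significance level.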