Linear regression Definition and 118 Threads

In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.
Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.
Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

If the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to make a prediction of the response.
If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.
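
As a rough illustration of the fitting approaches mentioned above, the following minimal sketch fits a simple linear model by ordinary least squares and by ridge regression, then predicts the response for new predictor values. The synthetic data, the true coefficients (2 and 3), the penalty `lam`, and all variable names are assumptions chosen for the example; for brevity the ridge penalty here also shrinks the intercept, which many implementations avoid.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 2 + 3*x + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=200)

# Design matrix: a column of ones (intercept) plus the predictor
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimise ||y - X b||^2
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge regression: minimise ||y - X b||^2 + lam * ||b||^2
# (closed form; the intercept is penalised too, a simplification)
lam = 10.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Prediction for new predictor values that have no observed response
x_new = np.array([4.0, 12.0])
y_pred = np.column_stack([np.ones_like(x_new), x_new]) @ beta_ols

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
print("Predictions:       ", y_pred)
```

The ridge coefficient vector always has a norm no larger than the OLS one, which is the shrinkage effect of the L2 penalty; a lasso (L1) fit has no closed form and would need an iterative solver.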

View More On Wikipedia.org
  1. F

    I Exploring OLS and GLM Models: Understanding the Link Function and Coefficients

    Hello, I know this is a big topic but I would like to check that what I understand so far is at least correct. Will look more into it. GLM is a family of statistical models in which the coefficients betas are "linear". The relation between ##Y## and the covariates ##Xs## can be nonlinear (ex...
  2. F

    I Linear Regression, Population, Sample

    Hello, 1) Let's consider a population of 1,000,000 data points with each data point being represented by the pair of values (x,y). Let's assume that, when plotted on a graph, the 1,000,000 points look like a spread out cloud with an overall positive linear trend. These 1,000,000 points...
  3. F

    Statistical significance of a ML model...

    Hello, How do we check if a ML model is statistically significant? For models like linear regression, logistic regression, etc. there are tests (t-tests, F-tests, etc.) that will tell us if the model, trained on some dataset, is statistically significant or not. But in the case of ML models...
  4. F

    I Coefficient sign flip in linear regression with correlated predictors

    Hello Forum, I have read about an interesting example of multiple linear regression (https://online.stat.psu.edu/stat501/lesson/12/12.3). There are two highly correlated predictors, ##X_1## as territory population and ##X_2## as per capita income with Sales as the ##Y## variable. My...
  5. M

    Use linear regression to find Planck’s constant

    I am trying to find Planck's constant using Excel given the data: Frequency [Hz] Photon Energy [J] 7.5E+14 4.90E-19 6.7E+14 4.50E-19 6E+14 4.00E-19 5.5E+14 3.60E-19 5E+14 3.30E-19 4.6E+14 3.00E-19 4.3E+14 2.80E-19 4E+14 2.65E-19 3.75E+14 2.50E-19 I am using Linear...
  6. F

    I Linear regression, feature scaling, and regression coefficients

    Hello, In studying linear regression more deeply, I learned that scaling plays an important role in multiple ways: a) the range of the independent variables ##X## affects the values of the regression coefficients. For example, a predictor variable ##X## with a large range typically gets assigned...
  7. F

    I Expected coefficient change from simple to multiple linear regression

    Hello forum, I have created some linear regression models based on a simple dataset with 4 variables (columns). The first models simply involve one predictor variable: $$Y=\beta_1 X_1+\beta_0$$ and $$Y=\beta_2 X_2+ \beta_0$$ The 3rd model is multiple linear regression model involving the 3...
  8. F

    I Linear regression and random variables

    Hello, I have a question about linear regression models and correlation. My understanding is that our finite set of data ##(x,y)## represents a random sample from a much larger population. Each pair is an observation in the sample. We find, using OLS, the best fit line and its coefficients and...
  9. F

    I Can Alternative Least Squares Methods Be Used in Linear Regression?

    Hello, Simple linear regression aims at finding the slope and intercept of the best-fit line for a pair of ##X## and ##Y## variables. In general, the optimal intercept and slope are found using OLS. However, I learned that "O" means ordinary and there are other types of least square...
  10. shivajikobardan

    Comp Sci Can't we use linear regression for classification/prediction?

    they say that linear regression is used to predict numerical/continuous values whereas logistic regression is used to predict categorical value. but i think we can predict yes/no from linear regression as well Just say that for x>some value, y=0 otherwise, y=1. What am I missing? What is its...
  11. W

    I Bias in Linear Regression (x-intercept) vs Statistics

    Hi, In simple regression for machine learning , a model : Y=mx +b , Is said AFAIK, to have bias equal to b. Is there a relation between the use of bias here and the use of bias in terms of estimators for population parameters, i.e., the bias of an estimator P^ for a population parameter P is...
  12. S

    MHB Calculating probability from linear regression parameters

    I'm a bit stuck on how to calculate the probability in part b from the linear regression parameters. I tried plugging the parameter values into the linear regression model: Y =β0+β1X+ε, ε∼N(0,σ) So P(Y=y| X=40) = 2.85 + 0.07 * 40 + 1^2 P(Y=y|X=40) = 5.65 But I don't think this is the...
  13. yecko

    Comp Sci Linear regression doubling time

    y=mx+c by z=log y axis m=9-2.5 / 35 = 13/70 z=13/70 * (t-1983) + 2.5 log y = 13/70 * (t-1983) + 2.5 # doubling time: t1=y1, t2=y2=2y1 log y1 = 13/70 * (t1-1983) + 2.5 ---{1} log (2*y1) = 13/70 * (t2-1983) + 2.5 ---{2} {2}-{1}: log2=13/70 * [(t2)-(t1)] [(t2)-(t1)] = log2/(13/70) # for log...
  14. F

    B Limitations of Multivariate Linear Regression

    Hello, With multivariate linear regression, there is a single dependent variable ##y## and multiple independent variables ##x_1##, ##x_2##, ##x_3##, etc. There is a linear, weighted relationship between ##y## and the various ##x## variables: $$ y = c_1 x_1 + c_2 x_2 + c_3 x_3 $$ The...
  15. S

    I Why linear in linear regression?

    In linear regression, one estimates parameters that are supposed to be linear with respect to the dependent variable, for instance ##y=\theta_0 e^x+\epsilon \ ,## or ##y=\theta_0+\theta_1 x_1+\theta_2x_2+...+\theta_n x_n+\epsilon \ . ## Is it not true that neither ##y(\theta_0)## nor...
  16. K

    I Linear regression with error term

    I'm not a statistician, but this has been bothering me for a bit. Suppose we have the simple model Y = aX + b + U where Y, X and U are taken to be random variables representing the dependent variable, the independent variable and the error term respectively. In the case of a stochastic...
  17. W

    Python Getting this Array to be in 2D instead of 1D for Python Linear Regression

    import matplotlib import matplotlib.pyplot as plt import numpy as np from sklearn import datasets, linear_model import pandas as pd # Load CSV and columns df = pd.read_csv("C:\Housing.csv") Y = df['price'] X = df['lotsize'] # Split the data into training/testing sets X_train = X[:-250] X_test =...
  18. Vital

    I Least squares line - understanding formulas

    Hello. I have listened to a great lecture, which gave helpful intuitive insight into correlation and regression (basic stuff). But there are formulas, which I cannot grasp intuitively and don't know their origin. To remember them I would like to understand what's happening in each part of the...
  19. synMehdi

    I Linear least squares regression for model matrix identification

    Summary: I need to Identify my linear model matrix using least squares . The aim is to approach an overdetermined system Matrix [A] by knowing pairs of [x] and [y] input data in the complex space. I need to do a linear model identification using least squared method. My model to identify is a...
  20. M

    A "Population-averaged"regression on panel data using Stata

    Hey. I am running regression on panel data. I test different approaches using Stata. When using "population-averaged" no squared R measures are reported. The approach is equal to running a regular linear regression on the panel data, and according to my professor, a squared R is statistically...
  21. A

    I Statistics proof: y = k x holds for a data set

    Simple linear regression statistics: If I have a linear relation (or wish to prove such a relation): y = k x where k = constant. I have a set of n experimental data points ...(y0, x0), (y1, x1)... measured with some error estimates. Is there some way to present how well the n data points shows...
  22. FallenApple

    A Continuous output: logistic vs linear regression

    So say I suspect that there is a positive trend in the data from the scatter plot. Say the output y is continuous. A linear regression would give me a positive estimate of the slope. For a one unit increase in x, I would get a so and so increase in y. I can also split the data for the y...
  23. avner yakov

    A Error estimation in linear regression

    I have a data set of 11 predictors and one response for 1000 observations and I want to do linear regression. I also have measurement errors of the predictors (also an 11×1000 matrix) and I need to account for them in the total error estimation. How can I do that?
  24. T

    I ANOVA and Linear Regression Resource

    Hello, Can someone please let me know of a resource (book or other) that explains how to use ANOVA in linear regression? I didn't even know what ANOVA was until some days ago so I'm looking for something that explains it thoroughly with deductions. The resources I've read focused solely on...
  25. CopyOfA

    A Linear regression with discrete independent variable

    Hey, I have a problem where I have a discrete independent variable (integers spanning 1 through 27) and a continuous dependent variable (50 data points for each independent variable). I am wondering about the best method of regression here. Should I just fit to the mean or median? Is there a way...
  26. W

    Linear Regression, etc : Standard vs ML Techniques

    Hi All, This is probably trivial. What is the difference between techniques such as Linear/Logistic Regression, others done in ML , when they are done in Standard software packages : Excel ( incl. add-ons ), SPSS, etc? Why use ML algorithms when the same can be accomplished with standard...
  27. T

    Simple linear regression model

    Homework Statement For the first question, why does the moisture content increase by 0.2727 percent when the humidity increases by 1 percent? Shouldn't the moisture content increase by 0.2727 + 0.4911 percent when the humidity increases by 1 percent? Second question, it's clear that...
  28. U

    MHB Simple question regarding linear regression model poisson

    The question: Suppose $Y$ is discrete and only takes on non-negative integers and that the conditional distribution of $Y$ given $X=x$ is Poisson, that is, $$P(Y=y|X=x) = \frac{\exp(-x'\beta) (x'\beta)^y}{y!}$$ where $y = 0, 1, 2, \cdots$. First compute $E(Y|X=x)$ and $Var(Y|X=x)$, does this...
  29. F

    A Differential of Multiple Linear Regression

    Say you have a log-level regression as follows: $$\log Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n$$ We're trying to come up with a meaningful interpretation for changes in Y due to a change in some Xk. If we take the partial derivative with respect to Xk, we end up with...
  30. M

    A Centering variables, linear regression

    I am working with multiple regression with two independent variables, and an interaction between them. The expression is: y = b1x1 + b2x2 + b3x1x2. The question is: does one center both independent variables at the same time, when checking for the significance of the effect of the independent...
  31. M

    I Linear regression and probability distribution

    I have some data that I want to do simple linear regression on. However I don't have a lot of datapoints, and would like to know the uncertainty in the parameters. I.e. the slope and the intercept of the linear regression model. I know it should be possible to get a prob. distribution of the...
  32. Ackbach

    MHB Linear Regression Gradient Descent: Feature Normalization/Scaling for Prediction

    Cross-posted on SE.DS Beta. I'm just doing a simple linear regression with gradient descent in the multivariate case. Feature normalization/scaling is a standard pre-processing step in this situation, so I take my original feature matrix $X$, organized with features in columns and samples in...
  33. U

    I Error in declination of linear regression

    During a lab exercise we measured different masses of a magnetic material on a scale while changing the strength of the magnetic field it was in. Afterwards we plotted the masses and the fieldstrength hoping to find a linear slope. Then we drew a linear slope by using linear regression and found...
  34. M

    I Panel study, multiple linear regression, assumptions

    Hey. I am doing a project where I am studying a set of companies over a 7-year period. I am doing a multiple linear regression analysis either with fixed or random effects (so, it's a panel study). What I am wondering is if the general assumptions/requirements apply when using the fixed/random...
  35. Dethrone

    MHB Optimizing Linear Regression Cost Function

    I'm trying to optimize the function below, but I'm not sure where I made a mistake. (This is an application in machine learning.)$$J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$$ where $\theta$ is a $n$ by $k$ matrix and $x$ is a $n$ by 1 matrix...
  36. A

    Linear Regression with Measurement Errors

    Hello, I have a set of data, two columns, and each datum has its measurement error like illustration shows below: x | y --------------|----------------- x1+/-xe1 | y1+/-ye1 . | . . | ...
  37. D

    A Nonlinear regression which can be partially reduced to linear regression

    I encountered several times the following problem: Say I have a variable y dependent in a nonlinear way on m parameters ##\{x_i\}##, with ##i \in \{1,m\}##. However there is a linear relation between n>m functions ##f_j\in{x_i}##, i.e., ##y=\sum_j z_j f_j##. So I can get a solution of my problem...
  38. T

    I Linear regression on data collection error

    Hi, I've collected a few sets of data and obtained significantly different linear regression fits (R^2) in 2 particular sets of data. Does that indicate that the 2 sets of data are not valid, which might be due to data collection error? For example, 20 sets of data have a linear regression R^2 of 0.900+...
  39. E

    I If you were to perform a linear regression of log10(B) vs log10(x)....

    If you were to perform a linear regression of log10(B) vs log10(x) what would you expect the slope to be? The expected relationship between B and x is ##B(x) = \mu_0 I (2\pi x)^{-1}##
  40. J

    A Linear Regression with Non Linear Basis Functions

    So I am currently learning some regression techniques for my research and have been reading a text that describes linear regression in terms of basis functions. I got linear basis functions down and know exactly how to get there because I saw this a lot in my undergrad basically, in matrix...
  41. Josh Terrill

    B Linear regression with two data sets?

    I want to try to predict the USA summer highs using a linear regression. I know I can probably take data from the last 10 summers and plug that in, and use that to predict, but I'd like to use two data sources. 1 data source from the historical highs from past summers in the USA, and the 2nd...
  42. R

    Obtaining standard deviation of a linear regression intercep

    Hello, I have an experiment that I'm trying to conduct where I measure quantity A and normalize by quantity B. I then want to report normalized quantity A with error bars showing standard deviation. Quantity B is obtained via a standard curve that I generated (8 data points measured once each...
  43. W

    Deciding Diminishing Returns based on Data (Regression)

    Hi All, I am thinking of the issue of diminishing returns re linear regression. Can it be determined/decided from the data itself, or is it decided just from the context? I was thinking of examples like that of grade vs daily study hours or (height )jump length vs year ( winner heights have...
  44. W

    Linear Regression with Many y for each x

    Hi, Say we collect data points ##(x_i,y_j)## to do a linear regression, but so that for each ##x_i ## we collect values ##y_{i1}, y_{i2},...,y_{ij} ## . Is there a standard way of doing linear regression with this type of dataset? Would we, e.g., average the ##y_{ij}## and define it to be ##...
  45. J

    The linear in linear least squares regression

    It is my understanding that you can use linear least squares to fit a plethora of different functions (quadratic, cubic, quartic etc). The requirement of linearity applies to the coefficients (i.e B in (y-Bx)^2). It seems to me that I can find a solution such that a coefficient b_i^2=c_i, in...
  46. X

    Understanding multivariate linear regression

    I am trying to understand multivariate linear regression. I have a list of time that it took running processes based on several params, like % of cpu usage, and data read. Eg, I have a process that took 50 seconds to run, with a cpu usage of 70%, and the process read 10bytes of data. I have...
  47. C

    Multiple linear regression

    I am doing a multiple linear regression on a dataset. It is test scores. It has three highly correlated variables being income, reading score, and math score. Obviously since the test score is the sum of the math score and reading score would it be appropriate to exclude them simply based off...
  48. M

    Linear regression and measured values

    So I'm trying to identify a system that happens to be a synchronous generator via linear regression. I've got a model with the unknown coefficients A, B and C, and the measured variables I, w and T according to I(w, T) = A*T + B*w + C 1. What I fear is that I could get multiple solutions that...
  49. F

    Is the correlation coefficient significant in this data set?

    I also made a graph which is not pictured. 1.) Calculate the least squares line. Put the equation in the form of: y-hat = a + bx. I got: y hat = 11.304 + 106.218x a.) Find correlation coefficient. Is it significant? (use the p-value to decide) I got: r = 0.913... no it...
  50. E

    Regression Analysis of Tidal Phases

    I have some 3-D model output for a river system that is tidally forced at the entrance. Right now, I'm trying to perform some linear regression on the harmonic constants of various tidal constituents for several locations along the river compared to the observed tidal data. A linear...
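
Thread 5 in the list above quotes a complete set of frequency and photon-energy values, which makes a convenient worked example of simple linear regression. The sketch below is only an illustration in Python (the poster works in Excel, and this is not their method): it assumes the model E = h·f + c, rescales the data to units of 1e14 Hz and 1e-19 J for numerical convenience, and reads Planck's constant off the fitted slope. All variable names are hypothetical.

```python
import numpy as np

# Data as quoted in thread 5, rescaled: frequency in 1e14 Hz, energy in 1e-19 J
freq = np.array([7.5, 6.7, 6.0, 5.5, 5.0, 4.6, 4.3, 4.0, 3.75])
energy = np.array([4.90, 4.50, 4.00, 3.60, 3.30, 3.00, 2.80, 2.65, 2.50])

# Least squares fit of energy = slope * freq + intercept
X = np.column_stack([freq, np.ones_like(freq)])
(slope, intercept), *_ = np.linalg.lstsq(X, energy, rcond=None)

# Undo the rescaling: (1e-19 J) / (1e14 Hz) = 1e-33 J*s
h_estimate = slope * 1e-33

print(f"Estimated Planck constant: {h_estimate:.3e} J*s")
print(f"Fitted intercept: {intercept * 1e-19:.2e} J")
```

Fitting with a free intercept is a design choice; since the physical relation E = h·f passes through the origin, a fit constrained to zero intercept is an equally reasonable alternative and gives a similar slope for this data.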