What is Linear regression: Definition and 118 Discussions
In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.
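The affine conditional-mean assumption described above can be written out explicitly. The symbols here (##p## predictors ##x_1, \dots, x_p## and coefficients ##\beta_0, \dots, \beta_p##) are notation assumed for illustration, not taken from the text:
$$E[y \mid x_1, \dots, x_p] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
With ##p = 1## this is simple linear regression; with ##p > 1## it is multiple linear regression. The model is called linear because the right-hand side is linear in the unknown coefficients ##\beta_j##.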
Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.
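One way to see why such models are easy to fit: for a single predictor, the least-squares estimates have a closed form and need no iterative search. The following is a minimal sketch in plain Python; the function name `fit_simple_ols` and the data are hypothetical and chosen only for illustration.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered sums of squares and cross-products
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Illustrative data lying exactly on the line y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)
```

Because the data lie exactly on a line, the fit recovers that line's intercept and slope; with noisy data the same formulas give the best-fit line in the least-squares sense.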
Linear regression has many practical uses. Most applications fall into one of the following two broad categories:
If the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to make a prediction of the response.
If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.
Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.
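To make the contrast between plain and penalized least squares concrete, here is a small sketch for the one-predictor case, followed by a prediction at a new predictor value. It assumes the intercept is left unpenalized, in which case the ridge slope is the centered cross-product sum divided by the centered sum of squares plus the penalty. The names `fit_line` and `l2_penalty` and the data are hypothetical.

```python
def fit_line(xs, ys, l2_penalty=0.0):
    """Fit y ~ intercept + slope * x.

    l2_penalty = 0 gives ordinary least squares; a positive value adds
    l2_penalty * slope**2 to the cost (ridge, intercept not penalized).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / (sxx + l2_penalty)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Illustrative data with a roughly linear trend
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ols_intercept, ols_slope = fit_line(xs, ys)
ridge_intercept, ridge_slope = fit_line(xs, ys, l2_penalty=10.0)

# Prediction for a new predictor value that has no observed response
x_new = 6.0
ols_prediction = ols_intercept + ols_slope * x_new
print(ols_slope, ridge_slope, ols_prediction)
```

The penalty shrinks the slope toward zero relative to the ordinary least-squares fit. Lasso and least absolute deviations have no comparably simple closed form in general, so they are not sketched here.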
Hello,
I know this is a big topic but I would like to check that what I understand so far is at least correct. Will look more into it. GLM is a family of statistical models in which the coefficients betas are "linear". The relation between ##Y## and the covariates ##Xs## can be nonlinear (ex...
Hello,
1) Let's consider a population of 1,000,000 data points with each data point being represented by the pair of values (x,y).
Let's assume that, when plotted on a graph, the 1,000,000 points look like a spread out cloud with an overall positive linear trend. These 1,000,000 points...
Hello,
How do we check if a ML model is statistically significant? For models like linear regression, logistic regression, etc. there are tests (t-tests, F-tests, etc.) that will tell us if the model, trained on some dataset, is statistically significant or not.
But in the case of ML models...
Hello Forum,
I have read about an interesting example of multiple linear regression (https://online.stat.psu.edu/stat501/lesson/12/12.3). There are two highly correlated predictors, ##X_1## as territory population and ##X_2## as per capita income with Sales as the ##Y## variable. My...
I am trying to find Planck's constant using Excel given the data:
Frequency [Hz] | Photon Energy [J]
---------------|------------------
7.5E+14 | 4.90E-19
6.7E+14 | 4.50E-19
6E+14 | 4.00E-19
5.5E+14 | 3.60E-19
5E+14 | 3.30E-19
4.6E+14 | 3.00E-19
4.3E+14 | 2.80E-19
4E+14 | 2.65E-19
3.75E+14 | 2.50E-19
I am using Linear...
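For the data listed in the post above, a least-squares slope of photon energy against frequency can be sketched in a few lines. This uses Python rather than the poster's Excel workflow, and it assumes a model of the form E = h·f + c with a free intercept c; the variable names are illustrative.

```python
# Data copied from the post: frequency in Hz, photon energy in J
frequency_hz = [7.5e14, 6.7e14, 6e14, 5.5e14, 5e14, 4.6e14, 4.3e14, 4e14, 3.75e14]
photon_energy_j = [4.90e-19, 4.50e-19, 4.00e-19, 3.60e-19, 3.30e-19,
                   3.00e-19, 2.80e-19, 2.65e-19, 2.50e-19]

n = len(frequency_hz)
f_mean = sum(frequency_hz) / n
e_mean = sum(photon_energy_j) / n

# Centered sums keep the arithmetic well conditioned for these magnitudes
sxx = sum((f - f_mean) ** 2 for f in frequency_hz)
sxy = sum((f - f_mean) * (e - e_mean)
          for f, e in zip(frequency_hz, photon_energy_j))

slope = sxy / sxx                      # estimate of h in J*s (assumed model)
intercept = e_mean - slope * f_mean    # residual offset in J

print(f"slope = {slope:.3e} J*s, intercept = {intercept:.3e} J")
```

The slope comes out at about 6.6e-34 J·s. In a spreadsheet, the same quantity is what a linear trendline fitted to these two columns reports as its slope.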
Hello,
In studying linear regression more deeply, I learned that scaling plays an important role in multiple ways:
a) the range of the independent variables ##X## affects the values of the regression coefficients. For example, a predictor variable ##X## with a large range typically gets assigned...
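Point a) above can be checked numerically: re-expressing a predictor in units that are 100 times smaller makes its values 100 times larger and its fitted coefficient 100 times smaller, while the fitted line itself is unchanged. A minimal sketch with made-up data (the names and numbers are hypothetical):

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up heights and weights, for illustration only
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [52.0, 58.0, 65.0, 72.0, 80.0]

# Same predictor expressed in centimetres: values are 100 times larger
heights_cm = [100.0 * h for h in heights_m]

slope_per_m = ols_slope(heights_m, weights_kg)
slope_per_cm = ols_slope(heights_cm, weights_kg)
print(slope_per_m, slope_per_cm)
```

This is why raw coefficient sizes are not comparable across predictors measured on different scales.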
Hello forum,
I have created some linear regression models based on a simple dataset with 4 variables (columns). The first models simply involve one predictor variable: $$Y=\beta_1 X_1+\beta_0$$ and $$Y=\beta_2 X_2+ \beta_0$$
The 3rd model is multiple linear regression model involving the 3...
Hello,
I have a question about linear regression models and correlation. My understanding is that our finite set of data ##(x,y)## represents a random sample from a much larger population. Each pair is an observation in the sample.
We find, using OLS, the best fit line and its coefficients and...
Hello,
Simple linear regression aims at finding the slope and intercept of the best-fit line for a pair of ##X## and ##Y## variables.
In general, the optimal intercept and slope are found using OLS. However, I learned that "O" means ordinary and there are other types of least square...
They say that linear regression is used to predict numerical/continuous values, whereas logistic regression is used to predict categorical values. But I think we can predict yes/no from linear regression as well.
Just say that for x>some value, y=0 otherwise, y=1. What am I missing? What is its...
Hi,
In simple regression for machine learning, a model:
Y = mx + b,
is said, AFAIK, to have bias equal to b. Is there a relation between the use of bias here and the use of bias in terms of estimators
for population parameters, i.e., the bias of an estimator P^ for a population parameter P is...
I'm a bit stuck on how to calculate the probability in part b from the linear regression parameters.
I tried plugging the parameter values into the linear regression model: Y =β0+β1X+ε, ε∼N(0,σ)
So P(Y=y| X=40) = 2.85 + 0.07 * 40 + 1^2
P(Y=y|X=40) = 5.65
But I don't think this is the...
Hello,
With multivariate linear regression, there is a single dependent variable ##y## and multiple independent variables ##x_1##, ##x_2##, ##x_3##, etc.
There is a linear, weighted relationship between ##y## and the various ##x## variables:
$$ y = c_1 x_1 + c_2 x_2 + c_3 x_3 $$
The...
In linear regression, one estimates parameters that are supposed to be linear with respect to the dependent variable, for instance
##y=\theta_0 e^x+\epsilon \ ,##
or
##y=\theta_0+\theta_1 x_1+\theta_2x_2+...+\theta_n x_n+\epsilon \ . ##
Is it not true that neither ##y(\theta_0)## nor...
I'm not a statistician, but this has been bothering me for a bit. Suppose we have the simple model
Y= aX + b + U
where Y,X and U are taken to be random variables representing the explanatory variable, the independent variable and the error term respectively.
In the case of a stochastic...
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
import pandas as pd
# Load CSV and columns
df = pd.read_csv(r"C:\Housing.csv")  # raw string so the backslash is not read as an escape
Y = df['price']
X = df['lotsize']
# Split the data into training/testing sets
X_train = X[:-250]
X_test =...
Hello.
I have listened to a great lecture, which gave helpful intuitive insight into correlation and regression (basic stuff). But there are formulas, which I cannot grasp intuitively and don't know their origin. To remember them I would like to understand what's happening in each part of the...
Summary: I need to identify my linear model matrix using least squares. The aim is to approach an overdetermined system matrix [A] by knowing pairs of [x] and [y] input data in the complex space.
I need to do a linear model identification using least squared method.
My model to identify is a...
Hey. I am running regression on panel data. I test different approaches using Stata. When using "population-averaged" no squared R measures are reported. The approach is equal to running a regular linear regression on the panel data, and according to my professor, a squared R is statistically...
Simple linear regression statistics:
If I have a linear relation (or wish to prove such a relation): y = k x where k = constant. I have a set of n experimental data points ...(y0, x0), (y1, x1)... measured with some error estimates.
Is there some way to present how well the n data points shows...
so say I suspect that there is a positive trend in the data from the scatter plot. Say the output y is continuous.
A linear regression would give me a positive estimate of the slope. For a one unit increase in x, I would get a so and so increase in y.
I can also split the data for the y...
I have a data set of 11 predictors and one response for 1000 observations, and I want to do linear regression. I also have measurement errors for the predictors (also an 11×1000 matrix), and I need to account for them in the total error estimation. How can I do that?
Hello,
Can someone please let me know of a resource (book or other) that explains how to use ANOVA in linear regression? I didn't even know what ANOVA was until some days ago so I'm looking for something that explains it thoroughly with deductions. The resources I've read focused solely on...
Hey, I have a problem where I have a discrete independent variable (integers spanning 1 through 27) and a continuous dependent variable (50 data points for each independent variable). I am wondering about the best method of regression here. Should I just fit to the mean or median? Is there a way...
Hi All,
This is probably trivial. What is the difference between techniques such as linear/logistic regression and others done in ML, when they are done in standard software packages: Excel (incl. add-ons), SPSS, etc.? Why use ML algorithms when the same can be accomplished with standard...
Homework Statement
For the first question, why does the moisture content increase by 0.2727 percent when the humidity increases by 1 percent? Shouldn't the moisture content increase by 0.2727 + 0.4911 percent when the humidity increases by 1 percent?
Second question: it's clear that...
The question:
Suppose $Y$ is discrete and only takes on non-negative integers and that the conditional distribution of $Y$ given $X=x$ is Poisson, that is, $$P(Y=y|X=x) = \frac{\exp(-x'\beta) (x'\beta)^y}{y!}$$ where $y = 0, 1, 2, \cdots$. First compute $E(Y|X=x)$ and $Var(Y|X=x)$, does this...
Say you have a log-level regression as follows:
$$\log Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n$$
We're trying to come up with a meaningful interpretation for changes in Y due to a change in some Xk.
If we take the partial derivative with respect to Xk, we end up with...
I am working with multiple regression with two independent variables, and interaction between them.
The expression is: y = b1x1 + b2x2 + b3x1x2
The question is: does one center both independent variables at the same time, when checking for the significance of the effect of the independent...
I have some data that I want to do simple linear regression on.
However I don't have a lot of datapoints, and would like to know the uncertainty in the parameters. I.e. the slope and the intercept of the linear regression model.
I know it should be possible to get a prob. distribution of the...
Cross-posted on SE.DS Beta.
I'm just doing a simple linear regression with gradient descent in the multivariate case. Feature normalization/scaling is a standard pre-processing step in this situation, so I take my original feature matrix $X$, organized with features in columns and samples in...
During a lab exercise we measured different masses of a magnetic material on a scale while changing the strength of the magnetic field it was in. Afterwards we plotted the masses and the fieldstrength hoping to find a linear slope. Then we drew a linear slope by using linear regression and found...
Hey.
I am doing a project where I am studying a set of companies over a 7-year period. I am doing a multiple linear regression analysis either with fixed or random effects (so, it's a panel study). What I am wondering is if the general assumptions/requirements apply when using the fixed/random...
I'm trying to optimize the function below, but I'm not sure where I made a mistake. (This is an application in machine learning.) $$J(\theta)=\sum_{i=1}^n \left(\sum_{j=1}^{k}(\theta^Tx^{(i)}-y^{(i)})_j^2\right)$$
where $\theta$ is a $n$ by $k$ matrix and $x$ is a $n$ by 1 matrix...
Hello,
I have a set of data, two columns, and each datum has its measurement error like illustration shows below:
x | y
--------------|-----------------
x1+/-xe1 | y1+/-ye1
. | .
. | ...
I encountered several times the following problem: Say I have a variable y dependent in a nonlinear way on m parameters ##\{x_i\}##, with ##i \in \{1,m\}##. However there is a linear relation between n>m functions ##f_j\in{x_i}##, i.e., ##y=\sum_j z_j f_j##. So I can get a solution of my problem...
Hi
I've collected a few sets of data and obtained significantly different linear regression fits (R^2) in 2 particular sets of data.
Does that indicate that the 2 sets of data are not valid, which might be due to data collection error?
For example, 20 sets of data have a linear regression R^2 of 0.900+...
If you were to perform a linear regression of log10(B) vs log10(x) what would you expect the slope to be? The expected relationship between B and x is
##B(x) = \mu_0 I (2\pi x)^{-1}##
So I am currently learning some regression techniques for my research and have been reading a text that describes linear regression in terms of basis functions. I got linear basis functions down and know exactly how to get there because I saw this a lot in my undergrad basically, in matrix...
I want to try to predict the USA summer highs using a linear regression. I know I can probably take data from the last 10 summers and plug that in, and use that to predict, but I'd like to use two data sources. 1 data source from the historical highs from past summers in the USA, and the 2nd...
Hello,
I have an experiment that I'm trying to conduct where I measure quantity A and normalize by quantity B. I then want to report normalized quantity A with error bars showing standard deviation. Quantity B is obtained via a standard curve that I generated (8 data points measured once each...
Hi All,
I am thinking of the issue of diminishing returns re linear regression. Can it be determined/decided from the data itself, or is it decided just from the context? I was thinking of examples like that of grade vs daily study hours or (height) jump length vs year (winner heights have...
Hi,
Say we collect data points ##(x_i,y_j)## to do a linear regression, but so that for each ##x_i ## we collect
values ##y_{i1}, y_{i2},...,y_{ij} ## . Is there a standard way of doing linear regression with this type of dataset?
Would we, e.g., average the ##y_{ij}## and define it to be ##...
It is my understanding that you can use linear least squares to fit a plethora of different functions (quadratic, cubic, quartic etc). The requirement of linearity applies to the coefficients (i.e B in (y-Bx)^2). It seems to me that I can find a solution such that a coefficient b_i^2=c_i, in...
I am trying to understand multivariate linear regression.
I have a list of times that running processes took, based on several params, like % of CPU usage and data read. E.g., I have a process that took 50 seconds to run, with a CPU usage of 70%, and the process read 10 bytes of data. I have...
I am doing a multiple linear regression on a dataset. It is test scores. It has three highly correlated variables being income, reading score, and math score. Obviously since the test score is the sum of the math score and reading score would it be appropriate to exclude them simply based off...
So I'm trying to identify a system that happens to be a synchronous generator via linear regression. I've got a model with the unknown coefficients A, B and C, and the measured variables I, w and T according to
I(w, T) = A*T + B*w + C
1. What I fear is that I could get multiple solutions that...
I also made a graph which is not pictured.
1.) Calculate the least squares line. Put the equation in the form of: y-hat = a + bx.
I got: y hat = 11.304 + 106.218x
a.) Find correlation coefficient. Is it significant? (use the p-value to decide)
I got: r = 0.913... no it...
I have some 3-D model output for a river system that is tidally forced at the entrance. Right now, I'm trying to perform some linear regression on the harmonic constants of various tidal constituents at several locations along the river compared to the observed tidal data. A linear...