What is Regression: Definition and 358 Discussions

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the true data and that line (or hyperplane). For specific mathematical reasons (see linear regression), this allows the researcher to estimate the conditional expectation (or population average value) of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression or Necessary Condition Analysis) or estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression).
Regression analysis is primarily used for two conceptually distinct purposes. First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables. Importantly, regressions by themselves only reveal relationships between a dependent variable and a collection of independent variables in a fixed dataset. To use regressions for prediction or to infer causal relationships, respectively, a researcher must carefully justify why existing relationships have predictive power for a new context or why a relationship between two variables has a causal interpretation. The latter is especially important when researchers hope to estimate causal relationships using observational data.
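As a rough illustration of the ordinary least squares idea described above, the following Python sketch fits a line to synthetic data with NumPy. The data, the true coefficients (2.0 and 0.5), and the helper name `fit_ols` are assumptions made up for this example; they do not come from any of the discussions listed here.

```python
import numpy as np


def fit_ols(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (intercept) next to the predictor.
    X = np.column_stack([np.ones_like(x), x])
    # Least-squares solution of X @ coef ~= y.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


# Synthetic data (hypothetical): y = 2.0 + 0.5 * x plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

intercept, slope = fit_ols(x, y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")

# For a single predictor, the OLS slope equals cov(x, y) / var(x).
slope_closed_form = np.cov(x, y, bias=True)[0, 1] / np.var(x)
print(f"closed-form slope = {slope_closed_form:.3f}")
```

The fitted line estimates the conditional mean of y given x for this dataset only; whether it predicts well elsewhere, or means anything causally, has to be argued separately, as the paragraph above notes.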

View More On Wikipedia.org
  1. S

    B Linear regression question -- Is the slope equivalent to individually finding the slope between all the datapoints?

    So the linear regression formula is found here: https://www.ncl.ac.uk/webtemplate/ask-assets/external/maths-resources/statistics/regression-and-correlation/simple-linear-regression.html. Question - is the slope given by the regression formula mathematically equivalent to individually finding...
  2. hongseok

    I Stellar evolution path and Regression line

    I analyzed the relationship between the surface temperature and luminosity of stars of similar mass using a regression model. Through this, I was able to obtain a regression line. Since stars of similar mass show similar evolutionary paths, I believe this regression line can be viewed as a rough...
  3. F

    I How do leaf nodes behave in regression decision trees?

    Hello. Decision trees are really cool. They can be used for either regression or classification. They are built with nodes and each node represents an if-then statement that gets evaluated to be either true or false. Does that mean there are always and only two edges/branches coming out of an...
  4. F

    I Coefficient sign flip in linear regression with correlated predictors

    Hello Forum, I have read about an interesting example of multiple linear regression (https://online.stat.psu.edu/stat501/lesson/12/12.3). There are two highly correlated predictors, ##X_1## as territory population and ##X_2## as per capita income with Sales as the ##Y## variable. My...
  5. F

    Engineering Decision Tree Regression: Avoiding Overfitting in Training Data

    The decision tree in the following curve fits the fine details of the training data too closely and learns from the noise (overfitting). Ref: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html#sphx-glr-auto-examples-tree-plot-tree-regression-py I tried to remove the overfitting...
  6. F

    I What are the commonly used estimators in regression models?

    Hello everyone, I am trying to close the loop on this important topic of estimators. An estimator is really just a function to calculate point statistics that are close estimates (with low variance) of population parameters. For example, given a set of data, we can compute the mean and the...
  7. F

    I Exploring Nonlinear Least Squares for Regression Analysis

    Hello, Regression analysis is about finding/estimating the coefficients for a particular function ##f## that would best fit the data. The function ##f## could be a straight line, an exponential, a power law, etc. The goal remains the same: finding the coefficients. If the data does not show a...
  8. C

    Use linear regression to find Planck’s constant

    I am trying to find Planck's constant using Excel given the data (Frequency [Hz], Photon Energy [J]): (7.5E+14, 4.90E-19), (6.7E+14, 4.50E-19), (6E+14, 4.00E-19), (5.5E+14, 3.60E-19), (5E+14, 3.30E-19), (4.6E+14, 3.00E-19), (4.3E+14, 2.80E-19), (4E+14, 2.65E-19), (3.75E+14, 2.50E-19). I am using Linear...
  9. F

    I Linear regression, feature scaling, and regression coefficients

    Hello, In studying linear regression more deeply, I learned that scaling plays an important role in multiple ways: a) the range of the independent variables ##X## affects the values of the regression coefficients. For example, a predictor variable ##X## with a large range typically gets assigned...
  10. F

    I Expected coefficient change from simple to multiple linear regression

    Hello forum, I have created some linear regression models based on a simple dataset with 4 variables (columns). The first models simply involve one predictor variable: $$Y=\beta_1 X_1+\beta_0$$ and $$Y=\beta_2 X_2+ \beta_0$$ The 3rd model is a multiple linear regression model involving the 3...
  11. N

    A Non-significant variables in a logistic regression model

    If some variables of a logistic regression model are non-significant, should they be considered for a risk index calculation? Should the logistic model include only relevant variables? Thanks for the attention.
  12. F

    I Linear regression and random variables

    Hello, I have a question about linear regression models and correlation. My understanding is that our finite set of data ##(x,y)## represents a random sample from a much larger population. Each pair is an observation in the sample. We find, using OLS, the best fit line and its coefficients and...
  13. F

    I Linear Regression and OLS

    Hello, Simple linear regression aims at finding the slope and intercept of the best-fit line for a pair of ##X## and ##Y## variables. In general, the optimal intercept and slope are found using OLS. However, I learned that "O" means ordinary and there are other types of least square...
  14. B

    Simple overestimate of slope uncertainty in regression

    Hi all, I am a science educator in high school. I have been thinking about how to make a simple estimate that 1st and maybe 2nd year students can follow for the propagation of error to the uncertainty of the slope in linear regression. The problem is typically that they make some measurements...
  15. chwala

    Find the equation of the regression line of ##x## on ##y##

    The question is as shown below. ( Text book question). The textbook solution is indicated below. Discussion; Now they seemingly used ##r=1## to arrive at ##x=0.8+0.2y##. That is, ##y=-4+5x## then, since ##r=1##, ...implying perfect correlation therefore, ##5x=4+y## ##x=0.8+0.2y## My other...
  16. shivajikobardan

    Comp Sci Can't we use linear regression for classification/prediction?

    They say that linear regression is used to predict numerical/continuous values whereas logistic regression is used to predict categorical values. But I think we can predict yes/no from linear regression as well. Just say that for x > some value, y=0; otherwise, y=1. What am I missing? What is its...
  17. W

    Cleaning/Reordering towards regression

    I have quantitative data on all countries on two variables, say A and B, in Excel and I am trying to regress A on B. The problem is that the data are ordered by the magnitude of A and B, rather than alphabetically by country. Is there a reasonable way of ordering by country for each and then regressing A on B? If I...
  18. M

    I Fit a non-linear function to this time series

    I have an experimentally obtained time series: n_test(t) with about 5500 data points. Now I assume that this n_test(t) should follow the following equation: n(t) = n_max - (n_max - n_start)*exp(-t/tau). How can I find the values for n_start, n_max and tau so as to find the best fit to the...
  19. FactChecker

    B Comparing Approaches: Linear Regression of Y on X vs X on Y

    I do disagree. How accurately a variable can be measured is not the significant issue. The head/tail result of a coin toss can be measured with great accuracy but that does not make that result the independent variable. The decision of whether to model Y=aX+b+##\epsilon## versus...
  20. S

    B Question about regression line

    I have a note that states that the regression line of x on y is used when we want to calculate x for a given y, but in this case y is the dependent variable. I am pretty sure I can use either line if the value of the product moment correlation coefficient (r) is close to 1, but for the case of, let's say, r = 0.6, can we use...
  21. W

    I Bias in Linear Regression (x-intercept) vs Statistics

    Hi, In simple regression for machine learning, a model Y=mx+b is said, AFAIK, to have bias equal to b. Is there a relation between the use of bias here and the use of bias in terms of estimators for population parameters, i.e., the bias of an estimator P^ for a population parameter P is...
  22. S

    MHB Calculating probability from linear regression parameters

    I'm a bit stuck on how to calculate the probability in part b from the linear regression parameters. I tried plugging the parameter values into the linear regression model: Y =β0+β1X+ε, ε∼N(0,σ) So P(Y=y| X=40) = 2.85 + 0.07 * 40 + 1^2 P(Y=y|X=40) = 5.65 But I don't think this is the...
  23. yecko

    Comp Sci Linear regression doubling time

    y=mx+c by z=log y axis m=9-2.5 / 35 = 13/70 z=13/70 * (t-1983) + 2.5 log y = 13/70 * (t-1983) + 2.5 # doubling time: t1=y1, t2=y2=2y1 log y1 = 13/70 * (t1-1983) + 2.5 ---{1} log (2*y1) = 13/70 * (t2-1983) + 2.5 ---{2} {2}-{1}: log2=13/70 * [(t2)-(t1)] [(t2)-(t1)] = log2/(13/70) # for log...
  24. M

    MHB Least squares regression line (I'm very lost)

    Hi! Basically this is the exercise: Given that the covariance of x and y is -12 and the variance of x is 6.5, use the least squares line of best fit connecting x and y to estimate the value of x when y=15. x: 2 5 9 7 9 10 7; y: 25 17 11 10 8 7 13. Any help would mean everything, I'm desperate :(
  25. W

    A Error in (Multi)linear Regression

    Hi, I keep reading varying accounts of the conditions needed to "justify" the use of (multi)linear regression to model data. Specifically, I have seen several authors require errors to be normal and i.i.d., while others only require the errors to be i.i.d. with mean 0. Just where is the assumption of...
  26. F

    B Limitations of Multivariate Linear Regression

    Hello, With multivariate linear regression, there is a single dependent variable ##y## and multiple independent variables ##x_1##, ##x_2##, ##x_3##, etc. There is a linear, weighted relationship between ##y## and the various ##x## variables: $$ y = c_1 x_1 + c_2 x_2 + c_3 x_3 $$ The...
  27. M

    I How to propagate errors through a regression & non-linear model?

    Hi, I was working on a predictive linear regression model and was hoping to obtain some bounds to represent the uncertainty present in the model. Question: I suppose this boils down into two separate components: 1. What is a good measure of uncertainty from a linear regression model? MSE, or...
  28. M

    I Regression Prediction with Time Series Data

    Hi, I am not sure what the correct forum is for this question. Question: When do we need to remove seasonality from time series data to do a regression analysis? Context: I am planning to conduct a prediction analysis where I want to find out how a device performs. I hope to estimate a...
  29. S

    I Why linear in linear regression?

    In linear regression, one estimates parameters that are supposed to be linear with respect to the dependent variable, for instance ##y=\theta_0 e^x+\epsilon \ ,## or ##y=\theta_0+\theta_1 x_1+\theta_2x_2+...+\theta_n x_n+\epsilon \ . ## Is it not true that neither ##y(\theta_0)## nor...
  30. K

    I Linear regression with error term

    I'm not a statistician, but this has been bothering me for a bit. Suppose we have the simple model Y= aX + b + U where Y,X and U are taken to be random variables representing the explanatory variable, the independent variable and the error term respectively. In the case of a stochastic...
  31. J

    MHB Diff-in-Diff Regression: 6 Var, Interaction Terms & Estimate

    I am doing a difference-in-difference analysis on a set of survey data for a health education program and I need to find statistical significance for the difference-in-difference estimate. I know that I find this using a regression. I need to use a regression in a mixed logistic model including...
  32. W

    I Sharp Turn in Logistic Regression

    Hi, I am trying to remember the name of the situation in logistic regression when all data points beyond a fixed one are all successes or all failures. So we have data points ##( a_{i1}, a_{i2},.., a_{in} , 0/1)##, with data points ##a_{ij}## ordered; last input a Boolean and a fixed value for j...
  33. SchroedingersLion

    A Regression: Regularization parameter 0

    Hi guys, I am using ScikitLearn's Elastic Net implementation to perform regression on a data set where number of data points is larger than number of features. The routine uses crossvalidation to find the two hyperparameters: ElasticNetCV The elastic net minimizes ##\frac {1}{2N} ||y-Xw||^2 +...
  34. W

    Getting this Array to be in 2D instead of 1D for Python Linear Regression

    import matplotlib import matplotlib.pyplot as plt import numpy as np from sklearn import datasets, linear_model import pandas as pd # Load CSV and columns df = pd.read_csv("C:\Housing.csv") Y = df['price'] X = df['lotsize'] # Split the data into training/testing sets X_train = X[:-250] X_test =...
  35. W

    MHB Forecasting metric using regression. Is this a sound approach?

    Hello, First post here. I have some data I am trying to do some forecasting on and was hoping somebody who knows what they're actually doing can verify what I have done. A few years ago, the company I work for developed a mobile app for its customers and about 1 year ago they added some new...
  36. S

    Help with Interpreting Regression Results

    Hello all, I need your help with interpreting regression results. The results are given below for the regression that I ran in Excel. Here, the dependent variable Pc is -63.28 with a standard error of 15.86. But is this different from zero? I see the t-stat here is -3.99, and the p-value is...
  37. synMehdi

    I Linear least squares regression for model matrix identification

    Summary: I need to identify my linear model matrix using least squares. The aim is to approach an overdetermined system matrix [A] by knowing pairs of [x] and [y] input data in the complex space. I need to do a linear model identification using the least squares method. My model to identify is a...
  38. L

    MHB Levels of Measurement, Basic OLS Regression Questions

    Hello soon to be saviors, 😊 I have two really simple questions that I have already answered but the teacher wants more info. I am really stumped and I am not looking for the answers so much as an explanation on how to better answer the questions. I will copy and paste the problems and my answers...
  39. D

    MHB Require assistance with possible multiple regression analysis

    I am interested in determining more efficient ways of determining individuals' body fat percentage. To do this, I measure the circumference of a number of segments (10 of them) of the body and determine the person's percentage body fat through underwater weighing. I have done this for 252 total...
  40. S

    Regression for a first order system

    Homework Statement: I am carrying out a regression for the diameter of a part. Homework Equations: Diameter = -0.0531052 + 0.0443237 * exp (-0.0103633 * 'Time elapsed'). If the diameter is -0.052, can someone please calculate the value for time elapsed and explain the steps? The...
  41. M

    A Regression analysis: logarithm or relative change?

    Hi. I am currently studying the market for equity options and the use of these to predict stock return around company earnings announcements. The dependent variable in my regression analyses has been the relative change in stock price or log-return from the day before the announcement to...
  42. SchroedingersLion

    I Ridge Regression Cross Validation

    Hello guys, I have some difficulties understanding the procedure of cross validation to estimate the hyperparameter ## \lambda ## in Ridge Regression. The Ridge Regression yields the weight vector w from $$ \min_w \left( ||Y-Xw||^2 + \lambda ||w||^2 \right)$$ X is the data matrix that stores N data vectors...
  43. pslarsen

    Brute force regression software?

    Hi all, I have a lot of data, and was wondering whether there exists a program that will apply a brute force regression tool to try basically any conceivable combination of variables and mathematical expressions to minimize the error between Y and Y_predicted. The data [(x1 vs Y) (x2 vs Y)...
  44. M

    A Using standard deviation values as independent variables

    Hey. I am planning on doing some research, where I predict a change based on different types of risk. The question is simple. Can I use values of standard deviation as independent variables in a linear regression analysis (OLS)? The standard deviation values over time will be calculated in...
  45. O

    B Megastat's Regression Analysis keeps asking confidence level

    Hi, Anyone out there using Megastat? The course I am in requires using it/knowing how to use it for processes. Whenever I try to get a regression analysis it insists that I need to set a confidence level - I've tried different versions of typing 95%/0.95 to no avail, and I don't know what...
  46. FallenApple

    A Continuous output: logistic vs linear regression

    So say I suspect that there is a positive trend in the data from the scatter plot. Say the output y is continuous. A linear regression would give me a positive estimate of the slope. For a one unit increase in x, I would get a so and so increase in y. I can also split the data for the y...
  47. F

    Is there an optimal distance between measurements for regression

    Suppose I am trying to approximate a function which I do not know, but I can measure. Each measurement takes a lot of effort. Say the function I am approximating is ##y=f(x)## and ##x \in [0,100]## Suppose I know the expectation and variance of ##f(x)##. Is there a way to compute the confidence...
  48. S

    MHB Can Categorical Variables be Used in Multiple Regression Models?

    Hello, I am trying to do the following regression model; Y = N + T + F + NT + NF + NTF + error Y= Grams of seed N= Number of fruit T= Type of fruit (2 types, alpha) F= Field number (3) I have tried putting this in MiniTab and I can't get this set up correctly. Assistant> Regression>...
  49. avner yakov

    A Error estimation in linear regression

    I have a data set of 11 predictors and one response for 1000 observations and I want to do linear regression. I also have measurement errors of the predictors (also an 11x1000 matrix) and I need to account for them in the total error estimation. How can I do that?
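
Several of the threads above come down to fitting a straight line and reading a physical quantity off the slope. As one hedged sketch, the nine (frequency, photon energy) pairs quoted in the Planck's constant thread (item 8) can be fitted in Python instead of Excel. Fitting E = h·f + c with a free intercept, rescaling the units before the fit, and the variable names are all choices made for this illustration, not something stated in that thread.

```python
import numpy as np

# Data as quoted in the thread preview: frequency [Hz], photon energy [J].
freq_hz = np.array([7.5e14, 6.7e14, 6e14, 5.5e14, 5e14,
                    4.6e14, 4.3e14, 4e14, 3.75e14])
energy_j = np.array([4.90e-19, 4.50e-19, 4.00e-19, 3.60e-19, 3.30e-19,
                     3.00e-19, 2.80e-19, 2.65e-19, 2.50e-19])

# Rescale to order-one numbers so the fit is well conditioned:
# frequency in units of 1e14 Hz, energy in units of 1e-19 J.
f_scaled = freq_hz / 1e14
e_scaled = energy_j / 1e-19

# Degree-1 polynomial fit: returns [slope, intercept].
slope_scaled, intercept_scaled = np.polyfit(f_scaled, e_scaled, 1)

# Convert back to SI: (1e-19 J) / (1e14 Hz) = 1e-33 J*s.
h_est = slope_scaled * 1e-33
intercept_j = intercept_scaled * 1e-19

print(f"estimated h = {h_est:.3e} J*s")
print(f"intercept   = {intercept_j:.3e} J")
```

The slope is the estimate of Planck's constant; an intercept close to zero relative to the measured energies is consistent with E = h·f, and forcing the line through the origin would be an alternative modelling choice.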