What is Regression: Definition and 359 Discussions
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the true data and that line (or hyperplane). For specific mathematical reasons (see linear regression), this allows the researcher to estimate the conditional expectation (or population average value) of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression or Necessary Condition Analysis) or estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression).
Regression analysis is primarily used for two conceptually distinct purposes. First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables. Importantly, regressions by themselves only reveal relationships between a dependent variable and a collection of independent variables in a fixed dataset. To use regressions for prediction or to infer causal relationships, respectively, a researcher must carefully justify why existing relationships have predictive power for a new context or why a relationship between two variables has a causal interpretation. The latter is especially important when researchers hope to estimate causal relationships using observational data.
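As a minimal sketch of the ordinary least squares criterion described above, the following Python snippet computes the slope and intercept of the best-fit line for a single predictor using the closed-form solution. The function name `ols_line` and the sample points are illustrative assumptions and do not come from any of the discussions listed below.

```python
def ols_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = ols_line(xs, ys)
print(slope, intercept)
```

With several predictors the same criterion leads to a linear system rather than two scalar formulas, which is usually handed to a numerical library.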
Hi, I am a ninth-grade student from Portugal with nothing to do, and I have decided that I want to build a simple regression algorithm in Desmos (the online calculator) to fit random binary inputs and maybe predict the next binary digits (although that part may take a considerable amount of...
So the linear regression formula is found here: https://www.ncl.ac.uk/webtemplate/ask-assets/external/maths-resources/statistics/regression-and-correlation/simple-linear-regression.html
Question - is the slope given by the regression formula mathematically equivalent to individually finding...
I analyzed the relationship between the surface temperature and luminosity of stars of similar mass using a regression model. Through this, I was able to obtain a regression line. Since stars of similar mass show similar evolutionary paths, I believe this regression line can be viewed as a rough...
Hello.
Decision trees are really cool. They can be used for either regression or classification. They are built with nodes and each node represents an if-then statement that gets evaluated to be either true or false. Does that mean there are always and only two edges/branches coming out of an...
Hello Forum,
I have read about an interesting example of multiple linear regression (https://online.stat.psu.edu/stat501/lesson/12/12.3). There are two highly correlated predictors, ##X_1## as territory population and ##X_2## as per capita income with Sales as the ##Y## variable. My...
The decision tree in the following curve fits the fine details of the training data too closely and learns from the noise (overfitting).
Ref: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html#sphx-glr-auto-examples-tree-plot-tree-regression-py
I tried to remove the overfitting...
Hello everyone,
I am trying to close the loop on this important topic of estimators.
An estimator is really just a function to calculate point statistics that are close estimates (with low variance) of population parameters. For example, given a set of data, we can compute the mean and the...
Hello,
Regression analysis is about finding/estimating the coefficients for a particular function ##f## that would best fit the data. The function ##f## could be a straight line, an exponential, a power law, etc. The goal remains the same: finding the coefficients.
If the data does not show a...
I am trying to find Planck's constant using Excel given the data:
Frequency [Hz]    Photon Energy [J]
7.5E+14           4.90E-19
6.7E+14           4.50E-19
6E+14             4.00E-19
5.5E+14           3.60E-19
5E+14             3.30E-19
4.6E+14           3.00E-19
4.3E+14           2.80E-19
4E+14             2.65E-19
3.75E+14          2.50E-19
I am using Linear...
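One way to sketch this fit outside Excel: since photon energy is modeled as E = hf, the slope of a straight-line fit of energy against frequency is an estimate of Planck's constant. The Python below uses the frequency and energy values from the post; the variable names are assumptions, and the fit includes a free intercept, which may differ from how the truncated post sets up its trendline.

```python
# Data from the post: frequency in Hz, photon energy in J
freq = [7.5e14, 6.7e14, 6e14, 5.5e14, 5e14, 4.6e14, 4.3e14, 4e14, 3.75e14]
energy = [4.90e-19, 4.50e-19, 4.00e-19, 3.60e-19, 3.30e-19,
          3.00e-19, 2.80e-19, 2.65e-19, 2.50e-19]

n = len(freq)
f_mean = sum(freq) / n
e_mean = sum(energy) / n

# Closed-form least-squares slope and intercept for E = h*f + c
sff = sum((f - f_mean) ** 2 for f in freq)
sfe = sum((f - f_mean) * (e - e_mean) for f, e in zip(freq, energy))
h_estimate = sfe / sff                    # slope, in J*s
intercept = e_mean - h_estimate * f_mean  # in J

print(f"Estimated h = {h_estimate:.3e} J*s, intercept = {intercept:.3e} J")
```

The estimate can then be compared with the accepted value of about 6.626E-34 J·s.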
Hello,
In studying linear regression more deeply, I learned that scaling plays an important role in multiple ways:
a) the range of the independent variables ##X## affects the values of the regression coefficients. For example, a predictor variable ##X## with a large range typically gets assigned...
Hello forum,
I have created some linear regression models based on a simple dataset with 4 variables (columns). The first models simply involve one predictor variable: $$Y=\beta_1 X_1+\beta_0$$ and $$Y=\beta_2 X_2+ \beta_0$$
The 3rd model is a multiple linear regression model involving the 3...
If some variables of a logistic regression model are non-significant, should they be considered for a risk index calculation?
Should the logistic model include only relevant variables?
Thanks for the attention.
Hello,
I have a question about linear regression models and correlation. My understanding is that our finite set of data ##(x,y)## represents a random sample from a much larger population. Each pair is an observation in the sample.
We find, using OLS, the best fit line and its coefficients and...
Hello,
Simple linear regression aims at finding the slope and intercept of the best-fit line for a pair of ##X## and ##Y## variables.
In general, the optimal intercept and slope are found using OLS. However, I learned that "O" means ordinary and there are other types of least square...
Hi all,
I am a science educator in high school. I have been thinking about how to make a simple estimate that 1st and maybe 2nd year students can follow for the propagation of error to the uncertainty of the slope in linear regression. The problem is typically that they make some measurements...
The question is as shown below (textbook question).
The textbook solution is indicated below.
Discussion;
Now they seemingly used ##r=1## to arrive at ##x=0.8+0.2y##. That is,
##y=-4+5x##
then, since ##r=1##, ...implying perfect correlation therefore,
##5x=4+y##
##x=0.8+0.2y##
My other...
They say that linear regression is used to predict numerical/continuous values, whereas logistic regression is used to predict categorical values. But I think we can predict yes/no from linear regression as well.
Just say that for x > some value, y=0; otherwise, y=1. What am I missing? What is its...
I have quantitative data on all countries on two variables, say A and B, in Excel and I am trying to regress A on B. The problem is that the data are ordered by the magnitude of A and B rather than alphabetically by country. Is there a reasonable way of ordering by country for each and then regressing A on B? If I...
I have an experimentally obtained time series n_test(t) with about 5500 data points. Now I assume that this n_test(t) should follow the following equation:
n(t) = n_max - (n_max - n_start)*exp(-t/tau).
How can I find the values for n_start, n_max and tau so as to find the best fit to the...
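One possible approach, sketched under stated assumptions: for a fixed tau the model is linear in exp(-t/tau), so n_max and n_start follow from an ordinary straight-line fit, and tau can be chosen by scanning a grid for the smallest squared error. The function names, the tau grid, and the synthetic noise-free data below are all illustrative assumptions, not the poster's data. In practice a nonlinear least-squares routine such as SciPy's `curve_fit` is a common alternative.

```python
import math


def fit_line(zs, ns):
    """Least-squares fit of ns = a + b*zs; returns (a, b, sum of squared residuals)."""
    m = len(zs)
    z_mean = sum(zs) / m
    n_mean = sum(ns) / m
    szz = sum((z - z_mean) ** 2 for z in zs)
    szn = sum((z - z_mean) * (n - n_mean) for z, n in zip(zs, ns))
    b = szn / szz
    a = n_mean - b * z_mean
    sse = sum((n - (a + b * z)) ** 2 for z, n in zip(zs, ns))
    return a, b, sse


def fit_saturation(ts, ns, tau_grid):
    """Fit n(t) = n_max - (n_max - n_start)*exp(-t/tau) by scanning tau over a grid."""
    best = None
    for tau in tau_grid:
        zs = [math.exp(-t / tau) for t in ts]
        a, b, sse = fit_line(zs, ns)  # here a = n_max and b = n_start - n_max
        if best is None or sse < best[0]:
            best = (sse, a + b, a, tau)
    _, n_start, n_max, tau = best
    return n_start, n_max, tau


# Synthetic, noise-free data generated from known parameters (illustrative only)
true_start, true_max, true_tau = 2.0, 10.0, 30.0
ts = [float(t) for t in range(0, 200, 2)]
ns = [true_max - (true_max - true_start) * math.exp(-t / true_tau) for t in ts]

tau_grid = [1.0 + 0.5 * k for k in range(200)]  # hypothetical search range: 1.0 to 100.5
n_start, n_max, tau = fit_saturation(ts, ns, tau_grid)
print(n_start, n_max, tau)
```

The grid resolution limits how precisely tau is recovered; a finer grid around the best value, or a proper nonlinear optimizer started from it, would refine the result on real, noisy data.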
I do disagree. How accurately a variable can be measured is not the significant issue. The head/tail result of a coin toss can be measured with great accuracy but that does not make that result the independent variable. The decision of whether to model Y=aX+b+##\epsilon## versus...
I have a note that states that the regression line of x on y is used when we want to calculate x for a given y, but in this case y is the dependent variable. I am pretty sure I can use either line if the value of the product moment correlation coefficient (r) is close to 1, but for the case of, let's say, r = 0.6, can we use...
Hi,
In simple regression for machine learning, a model
Y = mx + b
is said, AFAIK, to have bias equal to b. Is there a relation between the use of bias here and the use of bias in terms of estimators for population parameters, i.e., the bias of an estimator P^ for a population parameter P is...
I'm a bit stuck on how to calculate the probability in part b from the linear regression parameters.
I tried plugging the parameter values into the linear regression model: Y =β0+β1X+ε, ε∼N(0,σ)
So P(Y=y| X=40) = 2.85 + 0.07 * 40 + 1^2
P(Y=y|X=40) = 5.65
But I don't think this is the...
Hi! Basically this is the exercise:
Given that the covariance of x and y is -12 and the variance of x is 6.5, use the least squares line of best fit connecting x and y to estimate the value of x when y=15.
x    2    5    9    7    9   10    7
y   25   17   11   10    8    7   13
any help would mean everything, I'm desperate :(
Hi,
I keep reading varying accounts on the conditions needed to "justify" the use of (multi)linear regression to model data.
Specifically, I have seen several authors require errors to be normal, i.i.d., while others only require the errors be i.i.d. with mean 0. Just where is the assumption of...
Hello,
With multiple linear regression, there is a single dependent variable ##y## and multiple independent variables ##x_1##, ##x_2##, ##x_3##, etc.
There is a linear, weighted relationship between ##y## and the various ##x## variables:
$$ y = c_1 x_1 + c_2 x_2 + c_3 x_3 $$
The...
Hi,
I was working on a predictive linear regression model and was hoping to obtain some bounds to represent the uncertainty present in the model.
Question:
I suppose this boils down into two separate components:
1. What is a good measure of uncertainty from a linear regression model? MSE, or...
Hi,
I am not sure what the correct forum is for this question.
Question: When do we need to remove seasonality from time series data to do a regression analysis?
Context:
I am planning to conduct a prediction analysis where I want to find out how a device performs. I hope to estimate a...
In linear regression, one estimates parameters that are supposed to be linear with respect to the dependent variable, for instance
##y=\theta_0 e^x+\epsilon \ ,##
or
##y=\theta_0+\theta_1 x_1+\theta_2x_2+...+\theta_n x_n+\epsilon \ . ##
Is it not true that neither ##y(\theta_0)## nor...
I'm not a statistician, but this has been bothering me for a bit. Suppose we have the simple model
Y= aX + b + U
where Y, X and U are taken to be random variables representing the dependent variable, the independent variable and the error term, respectively.
In the case of a stochastic...
I am doing a difference-in-difference analysis on a set of survey data for a health education program and I need to find statistical significance for the difference-in-difference estimate. I know that I find this using a regression. I need to use a regression in a mixed logistic model including...
Hi
I am trying to remember the name of the situation in logistic regression when all data points beyond a fixed one are all successes or all fails. So we have data points ##(a_{i1}, a_{i2}, \ldots, a_{in}, 0/1)##, with data points ##a_{ij}## ordered; last input a Boolean and a fixed value for j...
Hi guys,
I am using ScikitLearn's Elastic Net implementation to perform regression on a data set where number of data points is larger than number of features. The routine uses crossvalidation to find the two hyperparameters: ElasticNetCV
The elastic net minimizes ##\frac {1}{2N} ||y-Xw||^2 +...
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
import pandas as pd
# Load CSV and columns
df = pd.read_csv(r"C:\Housing.csv")  # raw string so the backslash is not read as an escape
Y = df['price']
X = df['lotsize']
# Split the data into training/testing sets
X_train = X[:-250]
X_test =...
Hello,
First post here. I have some data I am trying to do some forecasting on and was hoping somebody who knows what they're actually doing can verify what I have done. A few years ago, the company I work for developed a mobile app for its customers and about 1 year ago they added some new...
Hello all, need your help with interpreting regression results
The results are given below for the regression that I ran in Excel
Here, the dependent variable Pc is -63.28 with a standard error of 15.86.
But is this different from zero? I see the t-stat here is -3.99, and the p-value is...
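For reference, a t-statistic of this kind is the estimate divided by its standard error, which tests the null hypothesis that the true value is zero. The short Python check below uses the two numbers quoted in the post and assumes that -63.28 is the estimated coefficient for Pc. The p-value also depends on the residual degrees of freedom, which the excerpt does not give, so it is not computed here.

```python
coefficient = -63.28  # estimate quoted in the post (assumed to be the coefficient for Pc)
std_error = 15.86     # its standard error, also quoted in the post

# t-statistic for the null hypothesis that the true coefficient is zero
t_stat = coefficient / std_error
print(round(t_stat, 2))
```

A magnitude near 4 is large relative to the usual rule-of-thumb threshold of about 2 for moderately large samples, though the exact conclusion depends on the degrees of freedom.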
Summary: I need to identify my linear model matrix using least squares. The aim is to approach an overdetermined system matrix [A] by knowing pairs of [x] and [y] input data in the complex space.
I need to do a linear model identification using the least squares method.
My model to identify is a...
Hello soon to be saviors, 😊 I have two really simple questions that I have already answered but the teacher wants more info. I am really stumped and I am not looking for the answers so much as an explanation on how to better answer the questions. I will copy and paste the problems and my answers...
I am interested in determining more efficient ways of determining individuals' body fat percentage. To do this, I measure the circumference of a number of segments (10 of them) of the body and determine the person's percentage body fat through underwater weighing. I have done this for 252 total...
Homework Statement
I am carrying out a regression for diameter of a part
Homework Equations
Diameter = -0.0531052 + 0.0443237 * exp (-0.0103633 * 'Time elapsed')
If the diameter is -0.052, can someone please calculate the value of the time elapsed?
Would you please explain the steps?
The...
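The fitted equation above can be inverted for time by isolating the exponential and taking a natural logarithm: t = ln((Diameter - a) / b) / c, where a, b and c stand for the three fitted constants. The Python sketch below applies this; the names `a`, `b`, `c` and `time_elapsed` are assumptions introduced for illustration.

```python
import math

# Constants from the fitted equation: Diameter = a + b * exp(c * t)
a = -0.0531052
b = 0.0443237
c = -0.0103633


def time_elapsed(diameter):
    """Invert Diameter = a + b*exp(c*t) for t."""
    ratio = (diameter - a) / b
    if ratio <= 0:
        # The exponential is always positive, so no real t exists in this case
        raise ValueError("diameter is outside the range the model can reach")
    return math.log(ratio) / c


t = time_elapsed(-0.052)
check = a + b * math.exp(c * t)  # substituting t back should reproduce -0.052
print(t, check)
```

For a diameter of -0.052 this gives a time elapsed of roughly 356, in whatever time units the regression used.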
Hi. I am currently studying the market for equity options and the use of these to predict stock return around company earnings announcements. The dependent variable in my regression analyses have been the relative change in stock price or log-return from the day before the announcement to...
Hello guys,
I have some difficulties understanding the procedure of cross validation to estimate the hyperparameter ## \lambda ## in Ridge Regression.
The Ridge Regression yields the weight vector w from
$$ \min_w \left( ||Y-Xw||^2 + \lambda ||w||^2 \right) $$
X is the data matrix that stores N data vectors...
Hi all
I have a lot of data, and was wondering whether there exists a program that will apply a type of brute-force regression tool to try basically every conceivable combination of variables and mathematical expressions to minimize the error between Y and Y_predicted.
The data [(x1 vs Y) (x2 vs Y)...
Hey. I am planning on doing some research, where I predict a change based on different types of risk.
The question is simple. Can I use values of standard deviation as independent variables in a linear regression analysis (OLS)? The standard deviation values over time will be calculated in...
Hi,
Anyone out there using Megastat? The course I am in requires using it/knowing how to use it for processes. Whenever I try to get a regression analysis it insists that I need to set a confidence level - I've tried different versions of typing 95%/0.95 to no avail, and I don't know what...
So say I suspect that there is a positive trend in the data from the scatter plot. Say the output y is continuous.
A linear regression would give me a positive estimate of the slope. For a one unit increase in x, I would get a so and so increase in y.
I can also split the data for the y...
Suppose I am trying to approximate a function which I do not know, but I can measure. Each measurement takes a lot of effort.
Say the function I am approximating is ##y=f(x)## and ##x \in [0,100]##
Suppose I know the expectation and variance of ##f(x)##.
Is there a way to compute the confidence...
Hello, I am trying to do the following regression model:
Y = N + T + F + NT + NF + NTF + error
Y= Grams of seed
N= Number of fruit
T= Type of fruit (2 types, alpha)
F= Field number (3)
I have tried putting this in MiniTab and I can't get this set up correctly.
Assistant> Regression>...