What is Regression analysis: Definition and 39 Discussions
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the true data and that line (or hyperplane). For specific mathematical reasons (see linear regression), this allows the researcher to estimate the conditional expectation (or population average value) of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters (e.g., quantile regression or Necessary Condition Analysis) or estimate the conditional expectation across a broader collection of non-linear models (e.g., nonparametric regression).
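As a minimal sketch of the ordinary least squares idea described above, the snippet below fits a straight line to a handful of made-up points using the closed-form formulas for a single predictor. It then checks that nudging the intercept away from the fitted value increases the sum of squared residuals. The data and the function names (`ols_line`, `sum_squared_residuals`) are illustrative assumptions, not something taken from the text.

```python
def ols_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums of squares and cross-products
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (an assumption, not from any real study)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
# Any other line, e.g. one with a shifted intercept, has a larger sum of squares
worse = sum_squared_residuals(xs, ys, b0 + 0.1, b1)

print(f"intercept={b0:.3f}, slope={b1:.3f}, SSR={best:.4f}, shifted SSR={worse:.4f}")
```

With several predictors the same criterion is usually solved in matrix form by a numerical library rather than by hand-written sums.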
Regression analysis is primarily used for two conceptually distinct purposes. First, regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Second, in some situations regression analysis can be used to infer causal relationships between the independent and dependent variables. Importantly, regressions by themselves only reveal relationships between a dependent variable and a collection of independent variables in a fixed dataset. To use regressions for prediction or to infer causal relationships, respectively, a researcher must carefully justify why existing relationships have predictive power for a new context or why a relationship between two variables has a causal interpretation. The latter is especially important when researchers hope to estimate causal relationships using observational data.
Hey guys, I'll try to be as direct as possible. For school I'm doing an experiment at home to find out whether the diameter of a pot affects the time it takes to boil water in it, as the title says. I had three different pots with three different diameters. I got half a liter...
The general linear model is
$$y=a_0+\sum_{i=1}^{k} a_i x_i$$
In regression analysis one always collects ##n## observations of ##y## at different inputs of the ##x_i##. We need ##n \gg k##, or there will be many problems. For each regressor and the response ##y##, we tabulate all observations in a vector ##\textbf{x}_i## and...
Hello,
Regression analysis is about finding/estimating the coefficients for a particular function ##f## that would best fit the data. The function ##f## could be a straight line, an exponential, a power law, etc. The goal remains the same: finding the coefficients.
If the data does not show a...
Hi,
I keep reading varying accounts of the conditions needed to "justify" the use of (multi)linear regression to model data.
Specifically, I have seen several authors require errors to be normal and i.i.d., while others only require the errors to be i.i.d. with mean 0. Just where is the assumption of...
I am interested in determining more efficient ways of determining individuals' body fat percentage. To do this, I measure the circumference of a number of segments (10 of them) of the body and determine the person's percentage body fat through underwater weighing. I have done this for 252 total...
Hi. I am currently studying the market for equity options and the use of these to predict stock returns around company earnings announcements. The dependent variable in my regression analyses has been the relative change in stock price or log-return from the day before the announcement to...
Hi,
Anyone out there using Megastat? The course I am in requires using it and knowing how to use it for processes. Whenever I try to get a regression analysis it insists that I need to set a confidence level. I've tried different versions of typing 95%/0.95, to no avail, and I don't know what...
Homework Statement
Hi,
I had a question regarding testing a regression model's coefficients.
Say there is a regression model that has the form:
y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + e
For the sake of simplicity let: e be the random error, x1 is age, x2 is severity, and x3 is anxiety. y is...
I am aware that F-tests can be used to check the null hypothesis when comparing regression models if the models are nested.
What I am confused about is whether I can apply an F-test to compare the following (and if so, what is the best way).
I have two regression laws
Y = a1*X1 + a2*X2 + b
Y =...
Hi All,
I have many Likert variables regarding a single item issue. Specifically, I am dealing with several measures of IT Dept Quality, like % of budget devoted to the IT department, number of external audits, etc.; each is measured on a Likert scale. I ultimately want to regress EDIT against...
Homework Statement
A random sample of size ##n## from a bivariate distribution is denoted by ##(x_r,y_r), r=1,2,3,...,n##. Show that if the regression line of ##y## on ##x## passes through the origin of its scatter diagram, then
$$\bar y \sum^n_{r=1} x_r^2=\bar x\sum^n_{r=1} x_r y_r$$ where...
Hi All,
Say we want to linearly regress ##Y## (dependent) against ##X_1, X_2, \ldots, X_n## (independent), all numerical variables, to get a model ##Y = a_1 X_1 + \cdots + a_n X_n##.
Then we test:
##H_0: a_1 = a_2 = \cdots = a_n = 0##
##H_1: a_i \neq 0## for some ##i = 1, 2, \ldots, n##...
Hi.
I am using Stata to do a regression on longitudinal data. However, it does not produce numbers for standardized coefficients. How do I make Stata produce the standardized coefficients as part of the regression operation?
Mons
The article "Energy expenditure in adults living in developing compared with industrialized countries: a meta-analysis of doubly labeled water studies" has the following shocking conclusion:
The authors argued that the lack of physical activities in industrialized countries had little effect on...
Hi all, I have logistically regressed 3 different numerical variables, v1, v2, v3, separately against the same variable w. All variables have the same type of S-curve (meaning, in this case, that probabilities increase as vi, i=1,2,3, increases). Is there a way of somehow joining the three...
Hi All,
I ran a binary logistic regression of Y on three different numerical variables A, B, C, respectively. I am having an issue of separation of variables with all of them, meaning that there are values Ao, Bo, Co for each of A, B, C (different values for each, of course) so that for ## A>Ao, B>Bo...
Hi all, just curious if someone knows of any issues of separation of points in ordinal 3-valued logistic regression. I think I have an idea of why there are issues with separation in binary logistic regression: the need for the S-curve to go to 0 quickly makes the Bo term go to infinity. Are there...
Hello!
I am still quite weak in statistics, but I am learning some basic finance, and this requires running a regression.
Please take a look at the attached files: one Excel file that contains the results of the regression and one screenshot of the StatPlus window that I have to fill in. Before using my...
I'm trying to create a model which is of the form
$$y = (a_0 + a_1 l)\left[b_0 + \sum_{m=1}^{M} b_m \cos(mx - \alpha_m)\right]\left[c_0 + \sum_{n=1}^{N} c_n \cos(nz - \beta_n)\right]$$
In the above system, l,x and z are independent variables and y is the dependent variable. The a, b and c terms are the unknowns. To solve for these unknowns, I have two separate...
I did some data analysis in Excel, fitting some linear, zero-intercept data with a trend line and with the regression analysis tool. The slopes generated by the two methods differed by about 10%. The regression line seemed to be weighted differently. Are these two methods different for some...
I have some 3-D model output for a river system that is tidally forced at the entrance. Right now, I'm trying to perform some linear regression on the harmonic constants of various tidal constituents at several locations along the river compared to the observed tidal data. A linear...
Hi everyone. I'm a graduate student and am struggling with something that may possibly be trivial. So, my research is creating a mathematical model to represent a real system. I have data points from my real system that I want to compare my model to. How do I do a regression analysis and get...
Hello,
I am trying to construct a general function/method based on two sets of minimum/maximum data point constraints, which can take on new values in different situations. The only known data for this general function is the starting point (y-axis intercept) and the x-range. The rate of...
A bit confused with this question. My answers are below each question. Please help.
Branded Products, Inc., based in Halfway Tree, is a leading producer and marketer of household laundry detergent and bleach products. About a year ago, Branded Products rolled out its new Super Detergent in four...
Hello,
I'm a second year mathematics and economics student, and I've been hired by an economic development organisation to conduct a research project on the probability of loan default in micro-credit borrowers in rural Kenya (I'll be heading there in person this summer).
Basically, I'll...
Homework Statement
My question is q.3 in the attachment. I don't really understand the scenario of the question.
The Attempt at a Solution
For (a), if X = 1, will the model become: y = (b_1)(E_1) + epsilon? So (b_i)'s are the slopes of the models? But what is the assumption of the...
Hi there,
Could anybody offer any advice on a linear regression sample size problem?
I am using regression to predict the energy consumption (watt/mile) of an electric car based on a number of parameters such as average velocity, max velocity, average acceleration, the number of stops...
Does this even make sense?
I am told to do a multiple regression analysis. The response variable and the explanatory variables should add up to ~100 percent of the total product. Example:
Milk = water + fat + protein ~= 100% (all are in terms of percentages)
The regression I was...
Homework Statement
based on this data
http://www.stat.ufl.edu/~rrandles/sta4210/4210lectures/secondexreview/exam2rev.pdf
1) Consider the full (three predictor) model. Is this model useful? (are any of the predictors worthwhile?)
2) Use the All-Subsets and conduct a search for the best...
I am sooo lost in this class, please help.
1. Let the true (population) model be y = B0+B1x1+B2x2+u where u is an unobserved error term with u (conditional) x1, x2 and N(0, sigma^2). Hence, u is normally distributed with mean 0 and variance sigma^2 (i.e., E[u (conditional) x1, x2] = 0 and V...
Hello everyone, I have a homework assignment in my financial mathematics class and I don't fully understand it, so here is my problem:
I am supposed to backtest a given data set to see if a financial model works, in particular, for 30 maturity dates of the treasuries (bonds) I had to see how...
Hi,
I've asked this question on another forum, but have had no response so far. Maybe I will have a little bit of luck here. So, I have a problem. I have a set of 8 parameters and I use these parameters to compute a measure (I vary each parameter with a step of 50%). I would like to know...
Does anyone know how to do a regression analysis in Excel? I need to find the correlation coefficient and the coefficient of determination, and I only know how to do that with a TI-83+ graphing calculator.
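For reference, the two quantities asked about here can be sketched in a few lines of Python. This is only an illustration on made-up data, with a hypothetical helper named `pearson_r`; it relies on the fact that for a straight-line fit with one predictor the coefficient of determination equals the square of the correlation coefficient. In Excel itself, the worksheet functions `CORREL` and `RSQ` report the same two numbers.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data (an assumption)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
# For simple linear regression, the coefficient of determination is r squared
r_squared = r ** 2

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```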
Hello,
I am a first year undergraduate university student majoring in Engineering and Computing Sc. One of my courses is Linear Algebra. We have been given an assignment in which question no. 2 is out of syllabus. It is on Least Squares Regression Analysis. This has not been taught to us. We...
What are some of the elementary remedial procedures for multicollinearity (VIF >= 10) in linear regression? We were told to simply drop that particular independent variable, but someone else suggested we could center the predictor variables (i.e., xi = Xi - Xbar). Can somebody explain why...
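A small numerical sketch may help make the centering suggestion concrete. It assumes a model with exactly two predictors, a variable and its square, in which case the VIF of either predictor reduces to 1 / (1 - r^2), where r is the correlation between the two. The data and the helper names (`pearson_r`, `vif_two_predictors`) are illustrative assumptions. Note that centering only addresses this kind of structural collinearity between a variable and its own powers or interaction terms; it does not change the correlation between two distinct predictors.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up predictor taking the values 1..10 (an assumption)
x = [float(i) for i in range(1, 11)]

# Raw polynomial terms: x and x^2 are strongly correlated
x_sq = [v ** 2 for v in x]
vif_raw = vif_two_predictors(x, x_sq)

# Centered terms: subtract the mean before squaring
x_bar = sum(x) / len(x)
xc = [v - x_bar for v in x]
xc_sq = [v ** 2 for v in xc]
vif_centered = vif_two_predictors(xc, xc_sq)

print(f"VIF with raw terms: {vif_raw:.1f}")
print(f"VIF with centered terms: {vif_centered:.1f}")
```

Because the made-up values are symmetric about their mean, the centered variable and its square are uncorrelated here; with real, asymmetric data the reduction would typically be smaller.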
What are the most sophisticated methods of performing regression analysis, and how does least squares rank among them? Additionally, which of the categories below (if any) would the least squares method fit into:
Simple, Multiple, Non-linear, Robust, Ridge, Logistic
Thanks,
-Diffy
[SOLVED] Regression Analysis for a Gamma function
My regression analysis program, which I developed in BASIC back in the 1980s, handles half a dozen linear equations, some of which are transformed into log forms. I would like to modify my program to include this Gamma function...