Multiple regression, why use categories

  • Context: Undergrad
  • Thread starter: bradyj7
  • Tags: Multiple Regression
SUMMARY

The discussion centers on the use of categorical variables in multiple regression analysis for predicting energy consumption in electric cars. The author categorized 27 variables into four groups, ultimately selecting 16 variables after addressing correlations within groups. The advantages of categorization include reducing analysis complexity and focusing on expected correlations based on general concepts rather than specific routes. The conversation also touches on the implications of multicollinearity and the potential for using stepwise regression methods.

PREREQUISITES
  • Understanding of multiple regression analysis
  • Familiarity with multicollinearity concepts
  • Knowledge of variable categorization techniques
  • Experience with regression model selection methods, such as best subsets regression
NEXT STEPS
  • Research the impact of variable categorization on regression outcomes
  • Learn about multicollinearity and its effects on regression analysis
  • Explore stepwise regression techniques and their applications
  • Investigate best subsets regression and its advantages over traditional methods
USEFUL FOR

Data analysts, statisticians, and researchers involved in predictive modeling, particularly those focused on energy consumption analysis in automotive contexts.

bradyj7
Hello,

I have a question regarding multiple regression.

I am reading a paper in which the author performed a multiple regression to predict the energy consumption of an electric car based on 27 variables measured during journeys, such as speed and acceleration.

The author categorised the variables into 4 groups, as shown in this table. If two variables within a group were correlated, he dropped one of them. He ended up with 16 nominated variables for the regression.

https://dl.dropbox.com/u/54057365/All/regtable.JPG

My questions are:

1. What is the advantage of using the categories?
2. What if two variables in separate groups are correlated?
3. Could he have put all the variables in one group and done a stepwise or best subsets regression?

The reason I am asking these questions is that multicollinearity does not matter if your regression is used only for prediction. He removes correlated variables within the categories, but not between the categories.

I would have thought that leaving them all in one category, dropping one of each pair of highly correlated variables, and then doing a best subsets regression would be a better approach.
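The approach described above can be sketched in a few lines. This is only a toy illustration, not the paper's method: the variable names, the synthetic data, and the 0.9 correlation cutoff are all assumptions, and AIC is used here just as one common subset-selection criterion.

```python
# Toy sketch: drop one of each highly correlated pair, then run an
# exhaustive (best subsets) search over the survivors. All names and
# the 0.9 cutoff are illustrative assumptions, not from the paper.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 200
speed = rng.normal(50, 10, n)
accel = rng.normal(0, 1, n)
avg_speed = speed + rng.normal(0, 1, n)      # nearly duplicates `speed`
X_all = {"speed": speed, "accel": accel, "avg_speed": avg_speed}
y = 0.5 * speed + 2.0 * accel + rng.normal(0, 2, n)

# 1. Drop one of any pair whose |correlation| exceeds the cutoff
keep = []
for name in X_all:
    if all(abs(np.corrcoef(X_all[name], X_all[k])[0, 1]) < 0.9 for k in keep):
        keep.append(name)

# 2. Best subsets: fit every subset of the survivors, keep the lowest AIC
def aic(cols):
    X = np.column_stack([np.ones(n)] + [X_all[c] for c in cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (len(cols) + 1)

best = min((s for r in range(1, len(keep) + 1)
            for s in itertools.combinations(keep, r)), key=aic)
print(keep, best)
```

Here `avg_speed` is dropped in step 1 because it is almost collinear with `speed`, and the exhaustive search in step 2 then selects the subset that actually drives `y`.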

My main question is, what if any is the advantage of using the 4 categories?

Thank you

John
 
1. What is the advantage of using the categories?
I would guess that you look only at correlations you expect to arise from general properties of the car/tour/driver, and not from the specific routes chosen for the calibration.

Categories reduce the complexity of the analysis a bit - maybe it is just a question of computational power. The "best" concept (in an ideal world with test data of arbitrary size and infinite computation power) would be to use all variables, but that might be impractical.
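The computational-power point is easy to quantify. An exhaustive best-subsets search over all 27 variables fits every subset, whereas four separate per-group searches fit far fewer models. The group sizes below are assumptions (the table's actual split is not reproduced here); they merely need to sum to 27.

```python
# Rough arithmetic behind the "computational power" remark: exhaustive
# subset search over 27 variables vs. four separate per-group searches.
# The group sizes are hypothetical; only their sum (27) matters here.
group_sizes = [7, 7, 7, 6]
all_at_once = 2 ** 27                        # every subset of 27 variables
per_group = sum(2 ** g for g in group_sizes) # subsets searched group by group
print(all_at_once, per_group)                # 134217728 vs 448
```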
 
