Multiple regression, why use categories

In summary, the reply suggests that the author may have used categories to reduce the complexity of the analysis and possibly to save computational power. However, this approach does not account for correlations between variables in different categories, and the thread does not establish a specific advantage of using the categories in the multiple regression analysis.
#1
bradyj7
Hello,

I have a question regarding multiple regression.

I am reading a paper in which the author performed a multiple regression to predict the energy consumption of an electric car based on 27 variables measured during journeys, such as speed and acceleration.

The author categorised the variables into 4 groups, as shown in this table. If two variables within a group were correlated, he dropped one of them. In the end he had 16 nominated variables for the regression.

https://dl.dropbox.com/u/54057365/All/regtable.JPG
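The within-group screening described above might look roughly like the following minimal sketch. The 0.8 correlation threshold, the greedy "keep the first, drop later correlated ones" rule, and the variable and group names are all hypothetical, since the paper's actual threshold and table are not reproduced here. Note that variables in different groups are never compared, which is the situation raised in question 2.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prune_within_groups(data, groups, threshold=0.8):
    """Greedy screening inside each group: keep a variable only if |r| with
    every variable already kept in the SAME group is at most `threshold`.
    Variables in different groups are never compared."""
    kept = []
    for names in groups.values():
        group_kept = []
        for name in names:
            if all(abs(pearson(data[name], data[k])) <= threshold for k in group_kept):
                group_kept.append(name)
        kept.extend(group_kept)
    return kept


# Hypothetical journey summaries (not the paper's data).
data = {
    "mean_speed": [10, 20, 30, 40, 50],
    "max_speed": [15, 24, 37, 45, 58],        # r ~ 0.997 with mean_speed
    "stop_count": [3, 5, 1, 4, 2],            # r = -0.3 with mean_speed
    "mean_accel": [0.2, 0.4, 0.6, 0.8, 1.0],  # r ~ 1 with mean_speed, but another group
}
groups = {
    "speed": ["mean_speed", "max_speed", "stop_count"],
    "acceleration": ["mean_accel"],
}

kept = prune_within_groups(data, groups)
print(kept)  # ['mean_speed', 'stop_count', 'mean_accel']
```

Here `max_speed` is dropped, while `mean_accel` survives even though it is perfectly correlated with `mean_speed`, because the two sit in different groups.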

My questions are:

1. What is the advantage of using the categories?
2. What if two variables in separate groups are correlated?
3. Could he have put all the variables in one group and done a stepwise or best subsets regression?

The reason I am asking these questions is that multicollinearity does not matter if your regression is for prediction. He is removing correlated variables within the categories but not between them.
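A small illustration of the point about prediction, using made-up numbers: when one predictor is an exact copy of another, very different coefficient pairs give identical fitted values, so the individual coefficients are not identifiable but the predictions are. The caveat is that this only holds while new data keep the same correlation pattern; at a point where the two predictors disagree, the models diverge.

```python
def predict(b0, b1, b2, x1, x2):
    """Predictions of y = b0 + b1*x1 + b2*x2 for paired lists x1, x2."""
    return [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = list(x1)  # perfectly collinear copy of x1

# Three coefficient sets that split the same total effect (2.0) differently.
models = [(0.5, 2.0, 0.0), (0.5, 0.0, 2.0), (0.5, 1.0, 1.0)]

in_sample = [predict(*m, x1, x2) for m in models]
print(in_sample[0] == in_sample[1] == in_sample[2])  # True

# A hypothetical new journey where the two predictors no longer move together.
new_point = [predict(*m, [5.0], [3.0])[0] for m in models]
print(new_point)  # [10.5, 6.5, 8.5]
```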

I would have thought that putting them all in one category, dropping one of any two highly correlated variables, and then doing a best subsets regression would be a better approach.
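The alternative proposed here (screen all variables as one pool, then run an exhaustive best-subsets search) could look roughly like this pure-Python sketch. The 0.9 correlation threshold, scoring by adjusted R^2 (one of several possible criteria, alongside AIC, BIC or cross-validation), and the toy data are assumptions for illustration only; a real analysis would use a statistics library.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prune_pooled(data, threshold=0.9):
    """Greedy screening over ALL variables: drop a variable if |r| > threshold
    with any variable already kept, regardless of which group it came from."""
    kept = []
    for name in data:
        if all(abs(pearson(data[name], data[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_rss(columns, y):
    """Least-squares fit of y on an intercept plus `columns`; returns the RSS."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def best_subset(data, y):
    """Exhaustive best-subsets search, scored by adjusted R^2."""
    n = len(y)
    my = sum(y) / n
    tss = sum((v - my) ** 2 for v in y)
    best = (float("-inf"), ())
    for k in range(1, min(len(data), n - 2) + 1):
        for subset in combinations(data, k):
            rss = ols_rss([data[s] for s in subset], y)
            adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
            best = max(best, (adj_r2, subset))
    return best


# Toy data (hypothetical): y depends on a and b; a_dup duplicates a; c is unrelated.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [3, 1, 4, 1, 5, 9, 2, 6]
c = [2, 7, 1, 8, 2, 8, 1, 8]
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.2, -0.05, -0.1]
y = [1 + 2 * ai + 3 * bi + e for ai, bi, e in zip(a, b, noise)]
data = {"a": a, "a_dup": [2 * v for v in a], "b": b, "c": c}

kept = prune_pooled(data)
score, subset = best_subset({k: data[k] for k in kept}, y)
print(kept, subset, round(score, 4))
```

Because every pair is screened, `a_dup` is removed even though it would sit in the same pool as everything else; the search then considers every combination of the remaining variables.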

My main question is, what if any is the advantage of using the 4 categories?

Thank you

John
 
#2
1. What is the advantage of using the categories?
My guess is that the author only looks for correlations where they would be expected from general properties of the car, the journeys or the drivers, and not for correlations that arise from the specific routes chosen for the calibration.

Categories reduce the complexity of the analysis a bit; maybe it is just a question of computational power. The "best" approach (in an ideal world with test data of arbitrary size and infinite computational power) would be to use all variables, but that might be impractical.
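The computational-power point can be made concrete with a quick count: an exhaustive best-subsets search over p candidate variables has to fit 2^p - 1 non-empty models. The group sizes 7, 7, 7 and 6 below are hypothetical (the paper's table is not reproduced here), and a group-by-group search is cheaper precisely because it never tries combinations that mix groups.

```python
def n_subset_models(p):
    """Number of non-empty predictor subsets an exhaustive search must fit."""
    return 2 ** p - 1


pooled = n_subset_models(27)                              # all 27 variables in one pool
grouped = sum(n_subset_models(k) for k in (7, 7, 7, 6))   # hypothetical group sizes
after_screening = n_subset_models(16)                     # the 16 nominated variables

print(f"{pooled:,} vs {grouped:,} vs {after_screening:,}")
# 134,217,727 vs 444 vs 65,535
```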
 

1. What is multiple regression?

Multiple regression is a statistical method used to analyze the relationship between two or more independent variables and a dependent variable. It allows for the prediction of the value of the dependent variable based on the values of the independent variables.
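As a minimal illustration of the definition, the sketch below fits y = b0 + b1*x1 + b2*x2 by ordinary least squares through the normal equations (X^T X) b = X^T y, in pure Python. The data are made up and noise-free, so the fit recovers the generating coefficients up to rounding.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(columns, y):
    """Return [b0, b1, ..., bp] for y = b0 + sum_j b_j * columns[j]."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 0, 3, 5]
y = [1 + 2 * u - 0.5 * v for u, v in zip(x1, x2)]  # generated with b = (1, 2, -0.5)

coef = fit_ols([x1, x2], y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, -0.5]
```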

2. Why use multiple regression?

Multiple regression is used to understand the relationship between multiple variables and how they affect the outcome of interest. It is particularly useful in predicting future outcomes and identifying which variables have the strongest impact on the dependent variable.

3. What are the benefits of using categories in multiple regression?

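Using categories in multiple regression allows categorical variables, such as gender or race, to be included in the analysis, usually as 0/1 dummy variables. This can provide valuable insights into how these variables affect the dependent variable, which would not be possible with numerical variables alone. Note that this sense of "categories" (categorical predictors) differs from the thread above, where the author grouped numeric journey variables into four categories.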
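A minimal sketch of treatment (dummy) coding, assuming a hypothetical `driving_mode` variable: one 0/1 column per category except a reference category, which is the usual convention when the model also has an intercept.

```python
def dummy_code(values, reference=None):
    """Treatment coding: one 0/1 column per non-reference level.
    The reference level defaults to the alphabetically first one."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    columns = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in columns] for v in values]
    return columns, rows


driving_mode = ["normal", "eco", "sport", "eco"]  # hypothetical categorical variable
columns, rows = dummy_code(driving_mode)
print(columns)  # ['normal', 'sport']
print(rows)     # [[1, 0], [0, 0], [0, 1], [0, 0]]
```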

4. How do categories affect the interpretation of the regression results?

Categories affect the interpretation of the regression results because each dummy coefficient estimates the difference in the dependent variable between that category and a reference category, holding the other predictors constant. Testing these coefficients can help to identify significant differences between the categories in their impact on the outcome.
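In the simplest case, where the only predictors are the dummies for one categorical variable, the least-squares estimates have a closed form: the intercept is the mean of the reference category, and each dummy coefficient is that category's mean minus the reference mean. The sketch below computes them directly; the energy figures are invented for illustration. With other predictors in the model, the coefficients become adjusted differences and no longer reduce to raw group means.

```python
from statistics import mean


def dummy_coefficients(groups, reference):
    """Intercept and dummy coefficients of y ~ one categorical predictor
    (treatment coding). groups: dict level -> list of y values."""
    intercept = mean(groups[reference])
    effects = {lv: mean(ys) - intercept for lv, ys in groups.items() if lv != reference}
    return intercept, effects


# Hypothetical energy use per journey (kWh) by driving mode.
energy = {"eco": [10, 12], "normal": [14, 16], "sport": [20, 22]}
intercept, effects = dummy_coefficients(energy, reference="eco")
print(intercept, effects)  # intercept 11; normal +4 and sport +10 relative to eco
```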

5. Are there any limitations to using categories in multiple regression?

One limitation of using categories in multiple regression is the potential for multicollinearity, where the dummy variables for the categories are highly correlated with each other or with other predictors. This can make it difficult to determine the individual effect of each category on the dependent variable. Additionally, categories with a small number of observations give imprecise estimates and, if they are not representative of the population, can lead to biased results.
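Two quick collinearity checks, sketched with made-up data. For a model with exactly two numeric predictors, the variance inflation factor (VIF) of either one is 1/(1 - r^2), where r is their correlation; values far above 1 (rules of thumb of 5 or 10 are common) signal collinearity. For categorical predictors, coding every level as a dummy alongside an intercept makes the dummies sum to the intercept column in every row, which is perfect collinearity (the "dummy variable trap"); the level names below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


x = [1, 2, 3, 4, 5, 6, 7, 8]
weak = [3, 1, 4, 1, 5, 9, 2, 6]                                       # r ~ 0.48 with x
strong = [2 * v + (0.1 if i % 2 else -0.1) for i, v in enumerate(x)]  # nearly 2*x

vif_weak = vif_two_predictors(x, weak)      # about 1.3
vif_strong = vif_two_predictors(x, strong)  # over 2000

# Dummy variable trap: one dummy per level (no reference dropped) plus an intercept.
levels = ["eco", "normal", "sport"]
mode = ["normal", "eco", "sport", "eco"]
full_dummies = [[1 if m == lv else 0 for lv in levels] for m in mode]
trap = all(sum(row) == 1 for row in full_dummies)  # dummies add up to the intercept column

print(round(vif_weak, 2), round(vif_strong), trap)
```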
