Handling categorical variables in R

  • Thread starter: fog37
  • Tags: Variables
AI Thread Summary
In R, nominal categorical variables should be converted to factors before creating dummy variables, typically resulting in k-1 dummy variables for k levels. The lm() function in R automatically handles this conversion, but it's unclear if other models do the same. Converting categorical variables directly to dummy variables without the factor step may work, but it limits the ability to choose different contrasts. The discussion highlights that Python does not utilize factors, simplifying the process. Overall, understanding the factor step is crucial for effective categorical variable handling in R.
fog37
TL;DR Summary: Handling categorical variables in R
Hello R users,

My general understanding is that, in R, nominal categorical variables (with 2 or more levels) must first be converted into factors and THEN into dummy variables (k-1 dummy variables for k levels). Is that correct?

Once we have gone from categorical variable -> factor -> dummy variables, we can use the dummy variables as independent or dependent variables in a statistical model. (P.S.: the function lm() does the dummy variable conversion automatically, but I am not sure that is true for other models.)
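For reference, the categorical -> factor -> dummy variables chain described above might look like the following in base R. This is only a minimal sketch: the data frame `df` and the columns `y` and `color` are made up for illustration and do not come from the thread.

```r
# Toy data (hypothetical): one numeric response, one nominal variable with k = 3 levels
df <- data.frame(
  y     = c(10.1, 12.3, 9.8, 15.2, 14.9, 11.0),
  color = c("red", "green", "blue", "red", "green", "blue"),
  stringsAsFactors = FALSE
)

# Step 1: the explicit factor step
df$color_f <- factor(df$color)
levels(df$color_f)

# Step 2: dummy (treatment) coding. model.matrix() returns an intercept
# column plus k - 1 = 2 indicator columns; the first level is the baseline.
X <- model.matrix(~ color_f, data = df)
colnames(X)

# lm() builds the same design matrix internally from the factor
fit <- lm(y ~ color_f, data = df)
names(coef(fit))
```

The coefficient names of `fit` match the columns of `X`, which is the sense in which lm() does the dummy conversion for you.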

What if we converted the categorical variable to dummy variables without the intermediate factor step? Would that still work in R?

Python does not have factors so that intermediate "factor" step does not apply...

Thanks!
 
Can you give a code example? I'm not sure what the factor step is but seeing what's actually called might help.
 
fog37 said:
TL;DR Summary: Handling categorical variables in R

What if we converted the categorical variable to dummy variables without the intermediate factor step? Would that still work in R?
I have never tried this, but from my experience I would think that yes, you could do that. You would lose the ability to choose different contrasts, since the dummy variables you built by hand would themselves be the contrast coding. But I don't see why it wouldn't work.
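A small sketch of that comparison in base R, assuming a made-up data frame `df` with columns `y` and `color` (illustrative names, not from the thread). It fits the model once with hand-made 0/1 dummies and once through a factor, then switches the factor version to a different contrast coding:

```r
# Toy data (hypothetical), 3 levels with "blue" as the baseline
df <- data.frame(
  y     = c(10.1, 12.3, 9.8, 15.2, 14.9, 11.0),
  color = c("red", "green", "blue", "red", "green", "blue"),
  stringsAsFactors = FALSE
)

# Route A: hand-made dummies, no factor() call anywhere
df$is_green <- as.integer(df$color == "green")
df$is_red   <- as.integer(df$color == "red")
fit_manual  <- lm(y ~ is_green + is_red, data = df)

# Route B: factor with R's default treatment contrasts
df$color_f <- factor(df$color)
fit_factor <- lm(y ~ color_f, data = df)

# Both routes estimate the same coefficients (only the names differ)
all.equal(unname(coef(fit_manual)), unname(coef(fit_factor)))

# With a factor, a different coding is one argument away;
# with hand-made dummies you would have to rebuild the columns yourself.
fit_sum <- lm(y ~ color_f, data = df,
              contrasts = list(color_f = "contr.sum"))
coef(fit_sum)
```

The sum-coded fit has different coefficients but the same fitted values, so the choice of contrasts changes the interpretation of the parameters, not the model's predictions.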
 