I Handling categorical variables in R

  • I
  • Thread starter Thread starter fog37
  • Start date Start date
  • Tags Tags
    Variables
fog37
Messages
1,566
Reaction score
108
TL;DR Summary
Handling categorical variables in R
Hello R users,

My general understanding is that, in R, nominal categorical variables (with 2 or more levels) must be first converted into factors and THEN to dummy variables (k-1 dummy variables for k levels). Is that correct?

Once we accomplish categorical variable -> factor -> dummy variables, we can then use the dummy variable as an independent or dependent variable in a statistical model (P.S. : when using the function ##lm()## in R, the function ##lm()## automatically does the dummy variable conversion but I am not sure that being true for other models).

What if we converted the categorical variable to dummy variables without the intermediate factor step? Would that still work in R?

Python does not have factors so that intermediate "factor" step does not apply...

Thanks!
 
Physics news on Phys.org
Can you give a code example? I'm not sure what the factor step is but seeing what's actually called might help.
 
fog37 said:
TL;DR Summary: Handling categorical variables in R

What if we converted the categorical variable to dummy variables without the intermediate factor step? Would that still work in R?
I have never tried this, but from my experience I would think that yes you could do that. You would lose the ability to choose different contrasts, since that would be your dummy variables. But I don’t see why it wouldn’t work
 
Hi all, I've been a roulette player for more than 10 years (although I took time off here and there) and it's only now that I'm trying to understand the physics of the game. Basically my strategy in roulette is to divide the wheel roughly into two halves (let's call them A and B). My theory is that in roulette there will invariably be variance. In other words, if A comes up 5 times in a row, B will be due to come up soon. However I have been proven wrong many times, and I have seen some...
Thread 'Detail of Diagonalization Lemma'
The following is more or less taken from page 6 of C. Smorynski's "Self-Reference and Modal Logic". (Springer, 1985) (I couldn't get raised brackets to indicate codification (Gödel numbering), so I use a box. The overline is assigning a name. The detail I would like clarification on is in the second step in the last line, where we have an m-overlined, and we substitute the expression for m. Are we saying that the name of a coded term is the same as the coded term? Thanks in advance.
Back
Top