Different results for factor vs continuous

  1. Apr 27, 2017 #1
    So, I'm fitting an interaction model: the response against a treatment_type-by-age interaction, plus controls for confounders. Age is continuous, with patients ranging from 20 to 90 years old.

    I have two models:

    y = age + treatment_type + age*treatment_type

    y = factor(age) + treatment_type + factor(age)*treatment_type

    Basically, age is highly significant in the continuous model: a slight change in age has a huge effect on the treatment effect.

    However, when age is factorized into groups, there is barely any interaction effect (the p-value is not significant).


    Why? What could be the reason for this? Is the continuous model so significant simply because patients differ from each other so much that age appears to have an effect when it really doesn't?

    After all, age has no effect when looking at groups. Yet somehow, within the groups, slight deviations give a large effect.
     
  3. Apr 27, 2017 #2

    FactChecker

    Science Advisor
    Gold Member

    Just to clarify, how are your age factor levels defined?
     
  4. Apr 27, 2017 #3

    Dale

    Staff: Mentor

    That seems suspicious on its own. If an effect is large then you should clearly see it when you plot the data even without doing statistics. Is that the case?

    How do your regression diagnostic plots look? Do you have some high leverage or otherwise suspicious points?

    How many groups have you factored age into? If you have factored it into many groups then you will have a model with a large number of degrees of freedom. A good statistics package will take that into account and reduce the significance correspondingly.
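
    To put rough numbers on the degrees-of-freedom point, here is a minimal sketch in Python (my choice of language; the thread's formulas look like R). It assumes treatment (dummy) coding and a hypothetical three treatment types, since the thread never says how many there are. The function name is made up for illustration.

```python
def interaction_param_count(n_treatments, n_age_bins=None):
    """Count age-by-treatment interaction coefficients under dummy coding.

    n_age_bins=None means age enters as one continuous slope;
    otherwise age is a factor with n_age_bins levels.
    """
    # A continuous age contributes 1 column; a K-level factor contributes K - 1.
    age_columns = 1 if n_age_bins is None else n_age_bins - 1
    # Each age column is crossed with the (n_treatments - 1) treatment dummies.
    return age_columns * (n_treatments - 1)


# ASSUMPTION: 3 treatment types, chosen only for illustration.
n_treatments = 3
continuous = interaction_param_count(n_treatments)
four_bins = interaction_param_count(n_treatments, n_age_bins=4)
ten_bins = interaction_param_count(n_treatments, n_age_bins=10)
print(continuous, four_bins, ten_bins)
```

    Under those assumptions the continuous model spends 2 parameters on the interaction, a 4-level age factor spends 6, and a 10-level factor spends 18, so the same data has to support many more estimates as the number of age groups grows.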
     
  5. Apr 28, 2017 #4
    I've split it up into 4 sections. So basically 20-40, 40-60 etc.
     
  6. Apr 28, 2017 #5
    On high-leverage points: many. But I can't reject those because they occur due to some systematic process. I've accounted for that by using a negative binomial link.
    On the number of groups: just 4. But I've also re-factored age into many more groups, and here's the plot.

    [Attached plot: ZIgzag_Plot.png]


    It seems that the effects may be canceling out.
     
  7. Apr 28, 2017 #6

    FactChecker

    Science Advisor
    Gold Member

    Looking at your data, it looks like there is only one (solid line treatment, age (18.9, 27.4]) combination that is significantly different from the others. (Are the different lines different treatments?) Is there much data in that combination category or could it be a small-sample outlier?

    I recommend that you statistically analyse the one glaring (solid line treatment, age (18.9, 27.4]) combination as one step and then look at the others in a separate statistical analysis.
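
    One quick way to check whether that combination is a small-sample cell is to count observations per (treatment, age interval). A minimal sketch in Python follows; the records, the bin edges, and the threshold of 2 are all made up for illustration, and the intervals are right-closed like the (18.9, 27.4] labels in the plot.

```python
from bisect import bisect_left
from collections import Counter


def age_bin(age, edges):
    """Return i such that edges[i] < age <= edges[i + 1], or None if outside."""
    idx = bisect_left(edges, age)
    if idx == 0 or idx == len(edges):
        return None  # at or below the lowest edge, or above the highest
    return idx - 1


# HYPOTHETICAL data: (treatment, age) pairs, not the poster's data set.
records = [
    ("A", 25), ("A", 33), ("A", 47), ("A", 52), ("A", 58),
    ("B", 22), ("B", 45), ("B", 49), ("B", 51),
]
edges = [18, 40, 60]  # ASSUMED bin edges: (18, 40] and (40, 60]

counts = Counter((treatment, age_bin(age, edges)) for treatment, age in records)

# Flag cells with fewer than 2 observations (an arbitrary threshold).
sparse = sorted(cell for cell, n in counts.items() if n < 2)
print(counts)
print(sparse)
```

    With this toy data only the cell ("B", 0) is flagged. On real data, a flagged cell that also shows the extreme response would be a candidate for the separate analysis suggested above.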
     
    Last edited: Apr 28, 2017
  8. Apr 28, 2017 #7

    Dale

    Staff: Mentor

    That doesn't look like it should be non-significant. How does the data itself look? Can you see the interaction in the raw data?
     
  9. Apr 28, 2017 #8
    When I increase the number of partitions in the factor, it seems to follow the same trend (just a bunch of zigzags, with the solid one being the most prominent). I think that is why it's highly significant for continuous age: even one slight increment in age could send it in a certain direction.
     
  10. Apr 28, 2017 #9
    So split the data into two different sets? It is a small sample, less than 10% of the data set. Yet there's only a small number of people under this treatment option in the first place, so every data point counts. The response is a count of relatively rare events (negative side effects), so most of the responses would be zero anyway.
     
  11. Apr 28, 2017 #10

    FactChecker

    Science Advisor
    Gold Member

    From the looks of the data, I think that it would be very misleading to allow the extreme result from one combination of (treatment, age) to influence your conclusions about the other combinations. In fact, the other treatments show, if anything, a slight bit of the opposite trend. If you do not address that combination separately, I don't think your conclusions will have any merit.
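
    A toy illustration of how one extreme combination can drive a continuous age slope, as a minimal sketch in Python. The numbers are invented, and it uses plain least squares rather than the negative binomial model from the thread, so it only shows the mechanism.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# HYPOTHETICAL mean responses for one treatment at six age-group midpoints.
# The youngest group is extreme; the rest zigzag around a flat level.
ages = [23, 32, 41, 50, 59, 68]
response = [9.0, 2.0, 2.5, 2.0, 2.5, 2.0]

slope_all = ols_slope(ages, response)           # includes the extreme group
slope_rest = ols_slope(ages[1:], response[1:])  # extreme group set aside
print(slope_all, slope_rest)
```

    With the extreme group included the fitted slope is clearly negative (about -0.11 per year); with it set aside the slope is essentially zero. A single line through all ages therefore reports an "age effect" that really belongs to one cell.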
     
  12. Apr 28, 2017 #11

    Dale

    Staff: Mentor

    Then it doesn't sound like you will have enough data points to justify a large number of degrees of freedom. That is probably driving the lack of significance somewhat.

    Also, do you see this interaction when you plot the data itself (not the fit)? I think I have asked this three times now.
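
    By "the data itself" I mean something like the raw mean response in each (treatment, age group) cell, computed with no model involved. A minimal sketch in Python, with made-up rows and group labels:

```python
from collections import defaultdict


def raw_cell_means(rows):
    """Mean response per (treatment, age_group), straight from the observations."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of y, count]
    for treatment, age_group, y in rows:
        cell = totals[(treatment, age_group)]
        cell[0] += y
        cell[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# HYPOTHETICAL rows: (treatment, age group, event count).
rows = [
    ("A", "20-40", 4), ("A", "20-40", 0), ("A", "40-60", 1), ("A", "40-60", 0),
    ("B", "20-40", 0), ("B", "20-40", 1), ("B", "40-60", 1), ("B", "40-60", 0),
]
means = raw_cell_means(rows)

# Raw interaction contrast: does the age difference differ between treatments?
contrast = (means[("A", "20-40")] - means[("A", "40-60")]) - (
    means[("B", "20-40")] - means[("B", "40-60")]
)
print(means)
print(contrast)
```

    If an interaction is real and large, a contrast like this (or the corresponding plot of cell means) should show it before any regression is fitted.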
     
  13. Apr 28, 2017 #12
    I see. That makes sense. So generally, would I need to have the samples balanced?

    I thought of one thing: it might not work because there are so few observations in that combination, in the tens compared to over a thousand in total.

    But if age is continuous, then there is no per-combination sample of data. It's just the whole data set. Is this the correct way to see it?

    I'm not sure what you mean. The pattern that I plotted was based on the data. It wasn't derived from a regression.
     