Degrees of Freedom

  1. Mar 14, 2014 #1
    Hi all

    If my model consists of two steps,

    e.g., multiple linear regression to get an estimate of an intermediate response variable

    followed by a further regression to get the final estimate of response variable

    To estimate the degrees of freedom for the total model can I simply sum the degrees of freedom for the individual models?

    Thanks

    Emma
     
  3. Mar 14, 2014 #2

    FactChecker

    Science Advisor
    Gold Member

    I always hate it when someone suggests a different approach instead of answering the original question, but I have to suggest this: Why not apply multiple linear regression directly to estimate the response variable? It seems like either way will end up with a linear estimator, but the direct approach will allow you to apply existing tools and obtain all the relevant statistical information directly.
     
  4. Mar 15, 2014 #3
    Hi FactChecker, the reason I phrased the question that way is that it's slightly more complicated than regression. I'm comparing two "models", one of which requires pre-processing of the data, so I want to know: with this pre-processing step, can I simply add the degrees of freedom for each individual step?
     
  5. Mar 15, 2014 #4

    FactChecker

    Science Advisor
    Gold Member

    Emma, I see. My opinion is that you can only add the independent degrees of freedom of the second process from variables that are not the result of the first process. Anything more than that is beyond my abilities.
     
  6. Mar 15, 2014 #5

    Stephen Tashi

    Science Advisor

    Emma,

    "Degrees of freedom" has no precise meaning until a particular context is specified. (This is the case with many mathematical terms like "dual", "conjugate", "closed", "homogeneous".) I assume you are using a particular statistic or procedure which requires a "degrees of freedom" number. Explain exactly what procedure or formula you intend to use.
     
  7. Mar 15, 2014 #6
    Hi, I'm using the F-test for the comparison of two models.
     
  8. Mar 16, 2014 #7

    Stephen Tashi

    Science Advisor

    As I understand your first post, it talks about a single linear model that is created in stages. This model is not (in general) the same model that you would obtain by a least-squares linear regression because you did the fit in two stages. Your final model is something like z = A x1 + B y + C where x1 is from the data, and y is not. The "intermediate" variable y is the result of a least squares fit to the data that gave y = D x2 + E x3 + F where x2 and x3 are values from the data.

    In least squares regression, to obtain a model z = A x1 + B y + C, we assume there are no "errors" in the x1 and y measurements. So you can't say that your procedure produces the same model as you would have obtained if you had done the regression in a single step using the data (x1, x2, x3) because there are "errors" in the y values. (The method of "total least squares" regression is often used when the model assumes errors exist in several of the variables.)

    One technicality to investigate is whether the F-test comparison of two linear models actually applies if one model is not the result of a least squares fit.

    If you are comparing two models, then I assume the two models predict the same variable, which is z in my example. If so, my example describes only one of the models. Where does the other model come from?
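
A minimal sketch of the two-stage point above, under loud assumptions: the data are synthetic, an observed intermediate variable `y_obs` is assumed to exist, and the helper names (`solve`, `fit`, `predict`, `rss`) are hypothetical. It fits z in two stages and in one step, then compares the residual sums of squares. Because the two-stage prediction is itself a linear function of x1, x2, x3, the direct least-squares fit can never have the larger residual sum of squares.

```python
import random


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit(columns, target):
    """Least-squares coefficients (one per column, intercept last) via the normal equations."""
    rows = [list(values) + [1.0] for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, columns):
    """Evaluate the fitted linear model; the last coefficient is the intercept."""
    return [sum(c * v for c, v in zip(coeffs, values)) + coeffs[-1]
            for values in zip(*columns)]


def rss(predicted, target):
    """Residual sum of squares."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


# Synthetic data (an assumption for illustration, not the poster's data).
rng = random.Random(0)
n = 30
x1 = [rng.uniform(-1, 1) for _ in range(n)]
x2 = [rng.uniform(-1, 1) for _ in range(n)]
x3 = [rng.uniform(-1, 1) for _ in range(n)]
y_obs = [2.0 * b - c + 1.0 + rng.gauss(0, 0.1) for b, c in zip(x2, x3)]
z = [0.5 * a + 1.5 * b + 0.3 * c + rng.gauss(0, 0.1) for a, b, c in zip(x1, x2, x3)]

# Stage 1: y = D x2 + E x3 + F, fitted to the observed intermediate variable.
stage1 = fit([x2, x3], y_obs)
y_hat = predict(stage1, [x2, x3])

# Stage 2: z = A x1 + B y + C, using the fitted y rather than data.
stage2 = fit([x1, y_hat], z)
rss_two_stage = rss(predict(stage2, [x1, y_hat]), z)

# Direct fit: z as a linear function of x1, x2, x3 in a single step.
direct = fit([x1, x2, x3], z)
rss_direct = rss(predict(direct, [x1, x2, x3]), z)

print("two-stage RSS:", rss_two_stage)
print("direct RSS:   ", rss_direct)
```

The sketch only illustrates that the staged procedure need not reproduce the single-step least-squares fit; it says nothing about whether an F-test is valid for the staged model.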
     
  9. Mar 17, 2014 #8
    Thanks for the reply, Stephen, but without drifting off the topic too much, with respect to the degrees of freedom, is it legitimate to add the degrees of freedom for each individual step?

    Thanks

    Emma
     
  10. Mar 17, 2014 #9

    FactChecker

    Science Advisor
    Gold Member

    The math of "degrees of freedom" allows you to count up the number of variables in an equation that are independent of others and are free to vary. In that context, you can add them as long as you do not count the variables that are a result of your first step. Those variables are not free to vary since they are calculated in the first step.

    However, using the degrees of freedom in statistics like F or chi-squared requires additional assumptions about the distribution of the free variables and about the equation of the statistic being calculated. Since your calculation is not one of the usual ones (sample mean, sample variance, linear regression, goodness of fit, etc.), it is not clear what statistics are valid to use, even if your degrees-of-freedom count is correct. To use the standard distributions, you will have to use one of the processes that they apply to.
     
  11. Mar 18, 2014 #10

    Stephen Tashi

    Science Advisor

    Attempting some mind reading, the answer is no.

    Let's say you are dealing with a linear model and "degrees of freedom" in your context means the number of parameters in the model that were determined when you fit the model to data.

    Using the previous example, z = A x1 + B y + C can be written as z = A x1 + B( D x2 + E x3 + F) + C = A x1 + BD x2 + BE x3 + BF + C. This amounts to a linear model with 4 parameters P1 = A, P2 = BD, P3 = BE and P4 = (BF + C). So there are 4 degrees of freedom.

    There are 3 parameters in z = A x1 + B y + C and 3 parameters in y = D x2 + E x3 + F but there are only 4 parameters in the model that expresses z as a linear function of x1,x2,x3.
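
A minimal sketch of the substitution above, with a hypothetical helper name (`collapse`) and made-up numerical values. It shows two different sets of the six stage-wise constants collapsing to the same four coefficients, which is why the composite model is counted as having 4 parameters rather than 6.

```python
def collapse(A, B, C, D, E, F):
    """Substitute y = D*x2 + E*x3 + F into z = A*x1 + B*y + C.

    Returns (P1, P2, P3, P4) for z = P1*x1 + P2*x2 + P3*x3 + P4.
    """
    return (A, B * D, B * E, B * F + C)


# Two different choices of the six constants (illustrative values only).
first = collapse(A=1.0, B=2.0, C=3.0, D=4.0, E=5.0, F=6.0)
second = collapse(A=1.0, B=4.0, C=3.0, D=2.0, E=2.5, F=3.0)

print(first)   # (1.0, 8.0, 10.0, 15.0)
print(second)  # (1.0, 8.0, 10.0, 15.0)
```

Since the six constants cannot be recovered from the fitted linear function of x1, x2, x3, only the four collapsed coefficients are free to be determined by the data.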
     
  12. Mar 18, 2014 #11
    Thanks, Stephen.

    I'm a little confused as to why you're not counting, e.g., BD as two parameters. In MLR where, e.g., z = A X1, and A will now have many parameters, aren't all elements of A counted in this case?

    In the example you give above, simply adding the numbers of parameters (= 6 parameters above) would always result in more parameters, and thus even fewer degrees of freedom.

    When comparing the residual sums of squares of two models using the F-test, a simple model (S1) with degrees of freedom DF1 and a more complex model (S2) having fewer degrees of freedom (DF2):

    F=[(S1-S2)/S2]/[(DF1-DF2)/DF2]

    Estimating fewer degrees of freedom in the complex model than may actually exist would give a smaller F ratio and thus favour the simpler model. Would this assumption be correct?

    Thanks for your help

    Emma
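
A minimal sketch of the F ratio exactly as written in the post above, with made-up numbers and a hypothetical function name (`f_ratio`). It assumes S1 and S2 are residual sums of squares and DF1 > DF2 are residual degrees of freedom. Holding S1 and S2 fixed, a smaller DF2 does give a smaller F.

```python
def f_ratio(s1, s2, df1, df2):
    """F = [(S1 - S2) / S2] / [(DF1 - DF2) / DF2], as written in the post."""
    if not df1 > df2 > 0:
        raise ValueError("expected DF1 > DF2 > 0")
    return ((s1 - s2) / s2) / ((df1 - df2) / df2)


# Illustrative values only: same residual sums of squares, different DF2.
f_reference = f_ratio(s1=100.0, s2=60.0, df1=20, df2=17)
f_fewer_df = f_ratio(s1=100.0, s2=60.0, df1=20, df2=14)

print("F with DF2 = 17:", f_reference)
print("F with DF2 = 14:", f_fewer_df)
```

This only checks the arithmetic of the formula. Whether the resulting number follows an F distribution for models fitted in stages is the separate question raised elsewhere in the thread.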
     
  13. Mar 18, 2014 #12

    Stephen Tashi

    Science Advisor

    My understanding of applying the F test to compare linear models is that we assume the models are nested and that they are each a least squares fit to the data (...and a lot of other assumptions). So it's hard to answer your question because you are not fitting a model to data by a method that is guaranteed to produce a least squares fit. (And you have not mentioned a second model to which your first model is being compared.)

    But suppose we have a linear model of the form z = P x + Q y + R where P, Q, R are constants. Suppose we fit this to data by some procedure that goes in stages and is guaranteed to produce a least squares fit in the end. We write the model as z = (A)(B)(C) x + (D)(E)(F) y + (G)(H)(I). In stage 1, we find A, D, G. In stage 2 we find B, E, H. In stage 3 we find C, F, I. This does not change the fact that the final result of the process is a linear model that is a least squares fit to the data and has the form z = P x + Q y + R, which involves 3 constants.
     
  14. Mar 18, 2014 #13
    Thanks, Stephen, for your answer.

    I was, however, under the impression that the F test can still be used if the models were not fitted using least squares. Quoting the wiki:

    "It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. "

    I understand your proposition with respect to the number of constants. Could you please, however, clarify my earlier point:

    "I'm a little confused as to why you're not counting, e.g., BD as two parameters. In MLR where, e.g., z = A X1, and A will now have many parameters, aren't all elements of A counted in this case?"

    Thanks

    Emma
     
  15. Mar 18, 2014 #14

    Stephen Tashi

    Science Advisor

    We'd have to investigate whether "inexact" F-tests are a good idea. I don't know if the wiki is merely stating that they are a customary practice or whether it is stating they are a mathematically justifiable practice.

    The counting of parameters in the linear model counts the number of constants in the model, with each numerical coefficient of a variable counted as a single constant and the "constant term" of the model counted as a single constant. So for the term 16 x, the numerical value 16 is one constant even though it could be factored as (8)(2) or (2)(2)(2)(2).
     
  16. Mar 19, 2014 #15
    Hi Stephen thanks for your help:

    The models I'm comparing are:

    Model1: Y=A(B^-1)X

    Model2: Y=A([ Z-CH]D+ C(B^-1)X)

    How would you best proceed in this instance?

    Thanks

    Emma
     
  17. Mar 19, 2014 #16

    Stephen Tashi

    Science Advisor

    Which of those letters represent independent variables? X and Z? Isn't Z a function of X?
     
  18. Mar 19, 2014 #17
    X is the independent variable. Z is a linear function of a projection of X, i.e. S, in a lower-dimensional domain, i.e. an autoregressive describing the evolution in time of S.

    S=(A^-1)X;
     
  19. Mar 19, 2014 #18

    Stephen Tashi

    Science Advisor

    I suggest you give a precise definition of things involved.

    I don't know what "an autoregressive" might be. The term "autoregressive" suggests your independent variables might be values indexed with time. Is X a vector of values indexed by a "time" ? Or is the kth component of X the value of something at time k? Are the values of the dependent variable Y also indexed by time?
     
  20. Mar 20, 2014 #19
    Hi Stephen,

    Thanks for your patience with my problem:

    Model1: Y=A(B^-1)X

    A are the eigenvectors of Y,
    B are the eigenvectors of X,
    So the above is a total least squares type problem.
    There are no time indices in the above method

    Model2: Y=A([ Z-CH]D+ C(B^-1)X)

    [ Z-CH]D+ C - is a Kalman filter

    some preliminaries:
    S=(A^-1)X;
    D=(B^-1)Y
    S is an estimate of X in a lower dimensional domain
    D is an estimate of Y in a lower dimensional domain

    C is the Kalman gain
    H is a linear model between S and D
    Z is an autoregressive fit of D

    From the above, A and B are fixed; all other parameters can vary, as the Kalman filter is adaptive, depending on Y through an EM algorithm.

    Thanks for any help you may have
     
  21. Mar 20, 2014 #20

    Stephen Tashi

    Science Advisor

    It isn't clear what you are doing, because you haven't described the format of the observed data you are using.

    One effort at mind reading says that your data consists of M ordered pairs of vectors (X,Y), so to exhibit one pair of vectors as scalars: (X[k],Y[k]) = ( (X[k][1],X[k][2],...,X[k][nx]), (Y[k][1],Y[k][2],...,Y[k][ny]) ).

    Another effort at mind reading says your data consists of M ordered pairs of scalars (x[k],y[k]) and that there is a single vector Y = (y[1],y[2],...y[M]) and a single vector X = (x[1],x[2],...x[M]).

    You haven't written an equation which shows any random errors, so it isn't clear why you say that the fit of Model1 is a total least squares problem. I assume you mean that the model assumes a random additive error in both the X and Y terms.

    What do X and Y represent in Model2? (Are they the same variables in this model as they are in Model1?)

    The term Kalman filter suggests that there are time indices involved in this model. Which index represents time?
     