
Degrees of Freedom

  1. Mar 14, 2014 #1
    Hi all

    If my model consists of two steps, e.g., a multiple linear regression to obtain an estimate of an intermediate response variable, followed by a further regression to obtain the final estimate of the response variable, can I estimate the degrees of freedom for the total model by simply summing the degrees of freedom of the individual models?

    Thanks

    Emma
     
  3. Mar 14, 2014 #2

    FactChecker

    Science Advisor
    Gold Member
    2017 Award

    I always hate it when someone suggests a different approach instead of answering the original question, but I have to suggest this: Why not apply multiple linear regression directly to estimate the response variable? It seems like either way will end up with a linear estimator, but the direct approach will allow you to apply existing tools and obtain all the relevant statistical information directly.
     
  4. Mar 15, 2014 #3
    Hi FactChecker, the reason I phrased the question this way is that the situation is slightly more complicated than plain regression. I'm comparing two "models", one of which requires pre-processing of the data, so I want to know whether, for this pre-processing step, I can simply add the degrees of freedom of each individual step.
     
  5. Mar 15, 2014 #4

    FactChecker


    Emma, I see. My opinion is that, for the second process, you can only add degrees of freedom from variables that are not results of the first process. Anything more than that is beyond my abilities.
     
  6. Mar 15, 2014 #5

    Stephen Tashi

    Science Advisor

    Emma,

    "Degrees of freedom" has no precise meaning until a particular context is specified. (This is the case with many mathematical terms like "dual", "conjugate", "closed", "homogeneous".) I assume you are using a particular statistic or procedure which requires a "degrees of freedom" number. Explain exactly what procedure or formula you intend to use.
     
  7. Mar 15, 2014 #6
    Hi, I am using the F-test for the comparison of two models.
     
  8. Mar 16, 2014 #7

    Stephen Tashi


    As I understand your first post, it talks about a single linear model that is created in stages. This model is not (in general) the same model that you would obtain by a least-squares linear regression because you did the fit in two stages. Your final model is something like z = A x1 + B y + C where x1 is from the data, and y is not. The "intermediate" variable y is the result of a least squares fit to the data that gave y = D x2 + E x3 + F where x2 and x3 are values from the data.

    In least squares regression, to obtain a model z = A x1 + By + C, we assume there are no "errors" in the x1 and y measurements. So you can't say that your procedure produces the same model as you would have obtained if you had done the regression in a single step using the data (x1, x2, x3) because there are "errors" in the y values. (The method of "total least squares" regression is often used when the model assumes errors exist in several of the variables.)

    One technicality to investigate, is whether the F-test comparison of two linear models actually applies if one model is not the result of a least squares fit.

    If you are comparing two models, then I assume the two models predict the same variable, which is z in my example. If so, my example describes only one of the models. Where does the other model come from?
     
  9. Mar 17, 2014 #8
    Thanks for the reply, Stephen, but without drifting off topic too much: with respect to the degrees of freedom, is it legitimate to add the degrees of freedom for each individual step?

    Thanks

    Emma
     
  10. Mar 17, 2014 #9

    FactChecker


    The math of "degrees of freedom" allows you to count up the number of variables in an equation that are independent of others and are free to vary. In that context, you can add them as long as you do not count the variables that are a result of your first step. Those variables are not free to vary since they are calculated in the first step.

    However, using the degrees of freedom in statistics like F or chi-squared requires additional assumptions about the distribution of the free variables and about the equation of the statistic being calculated. Since your calculation is not one of the usual ones (sample mean, sample variance, linear regression, goodness of fit, etc.), it is not clear what statistics are valid to use, even if your degrees-of-freedom count is correct. To use the standard distributions, you will have to use one of the processes that they apply to.
     
  11. Mar 18, 2014 #10

    Stephen Tashi


    Attempting some mind reading, the answer is no.

    Let's say you are dealing with a linear model, and "degrees of freedom" in your context means the number of parameters in the model that were determined when you fit the model to data.

    Using the previous example, z = A x1 + B y + C can be written as z = A x1 + B( D x2 + E x3 + F) + C = A x1 + BD x2 + BE x3 + BF + C. This amounts to a linear model with 4 parameters P1 = A, P2 = BD, P3 = BE and P4 = (BF + C). So there are 4 degrees of freedom.

    There are 3 parameters in z = A x1 + B y + C and 3 parameters in y = D x2 + E x3 + F but there are only 4 parameters in the model that expresses z as a linear function of x1,x2,x3.
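    A quick numeric sketch of this parameter count, using made-up coefficient values (the numbers are purely illustrative):

```python
# Outer model: z = A*x1 + B*y + C; inner model: y = D*x2 + E*x3 + F.
# The coefficient values here are invented for illustration.
A, B, C = 2.0, 3.0, -1.0
D, E, F = 0.5, -4.0, 7.0

def z_two_stage(x1, x2, x3):
    """Evaluate the model in two stages, as in the posts above."""
    y = D * x2 + E * x3 + F
    return A * x1 + B * y + C

# Collapsed form: z = P1*x1 + P2*x2 + P3*x3 + P4 -- only 4 parameters.
P1, P2, P3, P4 = A, B * D, B * E, B * F + C

def z_collapsed(x1, x2, x3):
    return P1 * x1 + P2 * x2 + P3 * x3 + P4

# The two forms agree at every point, so the 6 staged constants
# determine only 4 identifiable parameters.
for pt in [(1.0, 2.0, 3.0), (-1.5, 0.0, 4.2)]:
    assert abs(z_two_stage(*pt) - z_collapsed(*pt)) < 1e-12
```

    Since the composed model can always be rewritten this way, no data set can distinguish more than the 4 collapsed parameters.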
     
  12. Mar 18, 2014 #11
    Thanks stephen

    I'm a little confused as to why you're not counting, e.g., BD as two parameters. In MLR where, e.g., z = A X1, A will have many elements; aren't all the elements of A counted in this case?

    In the example you give above, simply adding the numbers of parameters (6 parameters above) would always give more parameters than the collapsed model has, resulting in even fewer degrees of freedom.

    When comparing the sums of squared residuals of two models using the F-test, with a simple model (S1) having degrees of freedom DF1 and a more complex model (S2) having fewer degrees of freedom (DF2):

    F = [(S1 - S2)/S2] / [(DF1 - DF2)/DF2]

    Estimating fewer degrees of freedom in the complex model than may actually exist would give a smaller F ratio and thus favour the simpler model. Would this assumption be correct?
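    As a sanity check of that claim, with made-up values for the sums of squares and degrees of freedom:

```python
# All numbers below are hypothetical, chosen only to test the direction
# of the effect of understating DF2 in the formula from this post.
def f_ratio(S1, S2, DF1, DF2):
    # F = [(S1 - S2)/S2] / [(DF1 - DF2)/DF2]
    return ((S1 - S2) / S2) / ((DF1 - DF2) / DF2)

S1, S2 = 120.0, 80.0              # residual sums of squares (invented)
DF1 = 20                          # simple model's degrees of freedom
f_true = f_ratio(S1, S2, DF1, 16)   # suppose 16 is the true DF2
f_under = f_ratio(S1, S2, DF1, 12)  # understated DF2

# Understating DF2 enlarges (DF1 - DF2)/DF2 and so shrinks F,
# favouring the simpler model.
assert f_under < f_true
```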

    Thanks for your help

    Emma
     
  13. Mar 18, 2014 #12

    Stephen Tashi


    My understanding of applying the F-test to compare linear models is that we assume the models are nested and that each is a least squares fit to the data (...and a lot of other assumptions). So it's hard to answer your question, because you are not fitting a model to data by a method that is guaranteed to produce a least squares fit. (And you have not mentioned a second model to which your first model is being compared.)

    But suppose we have a linear model of the form z = P x + Q y + R where P, Q, R are constants. Suppose we fit this to data by some procedure that goes in stages and is guaranteed to produce a least squares fit in the end. We write the model as z = (A)(B)(C) x + (D)(E)(F) y + (G)(H)(I). In stage 1 we find A, D, G. In stage 2 we find B, E, H. In stage 3 we find C, F, I. This does not change the fact that the final result of the process is a linear model that is a least squares fit to the data and has the form z = P x + Q y + R, which involves 3 constants.
     
  14. Mar 18, 2014 #13
    Thanks stephen for your answer.

    I was, however, under the impression that the F-test can still be used if the models were not fitted using least squares. Quoting Wikipedia:

    "It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. "

    I understand your proposition with respect to number of constants. Could you please however clarify my earlier point.

    "I'm a little confused as to why you're not counting, e.g., BD as two parameters. In MLR where, e.g., z = A X1, A will have many elements; aren't all the elements of A counted in this case?"

    Thanks

    Emma
     
  15. Mar 18, 2014 #14

    Stephen Tashi


    We'd have to investigate whether "inexact" F-tests are a good idea. I don't know whether the wiki is merely stating that they are a customary practice or stating that they are a mathematically justifiable practice.

    The counting of parameters in the linear model counts the number of constants in the model, with each numerical coefficient of a variable counted as a single constant and the "constant term" of the model counted as a single constant. So for the term 16 x, the numerical value 16 is one constant, even though it could be factored as (8)(2) or (2)(2)(2)(2).
     
  16. Mar 19, 2014 #15
    Hi Stephen thanks for your help:

    The models I'm comparing are:

    Model1: Y = A(B^-1)X

    Model2: Y = A([Z - CH]D + C(B^-1)X)

    How would you best proceed in this instance?

    Thanks

    Emma
     
  17. Mar 19, 2014 #16

    Stephen Tashi


    Which of those letters represent independent variables? X and Z ? Z isn't a function of X?
     
  18. Mar 19, 2014 #17
    X is the independent variable. Z is a linear function of S, a projection of X into a lower-dimensional domain; specifically, Z is an autoregressive model describing the evolution in time of S.

    S = (A^-1)X
     
  19. Mar 19, 2014 #18

    Stephen Tashi


    I suggest you give a precise definition of things involved.

    I don't know what "an autoregressive" might be. The term "autoregressive" suggests your independent variables might be values indexed with time. Is X a vector of values indexed by a "time" ? Or is the kth component of X the value of something at time k? Are the values of the dependent variable Y also indexed by time?
     
  20. Mar 20, 2014 #19
    Hi Stephen,

    Thanks for your patience with my problem:

    Model1: Y=A(B^-1)X

    A are the eigenvectors of Y,
    B are the eigenvectors of X,
    So the above is a total least squares type problem.
    There are no time indices in the above method

    Model2: Y = A([Z - CH]D + C(B^-1)X)

    [Z - CH]D + C is a Kalman filter

    some preliminaries:
    S=(A^-1)X;
    D=(B^-1)Y
    S is an estimate of X in a lower dimensional domain
    D is an estimate of Y in a lower dimensional domain

    C is the Kalman gain
    H is a linear model between S and D
    Z is an autoregressive fit of D

    From the above A and B are fixed, all other parameters can vary, as the Kalman filter is adaptive, dependent upon Y using an EM algorithm.

    Thanks for any help you may have
     
  21. Mar 20, 2014 #20

    Stephen Tashi


    It isn't clear what you are doing, because you haven't described the format of the observed data you are using.

    One effort at mind reading says that your data consist of M ordered pairs of vectors (X, Y), so one pair, exhibited as scalars, is (X[k], Y[k]) = ((X[k][1], X[k][2], ..., X[k][nx]), (Y[k][1], Y[k][2], ..., Y[k][ny])).

    Another effort at mind reading says your data consists of M ordered pairs of scalars (x[k],y[k]) and that there is a single vector Y = (y[1],y[2],...y[M]) and a single vector X = (x[1],x[2],...x[M]).

    You haven't written an equation which shows any random errors, so it isn't clear why you say that the fit is a total least squares problem. I assume you mean that the model assumes a random additive error in both the X and Y terms.



    What do X and Y represent in this model? (Are they the same variables in this model as they are in Model1?)

    The term Kalman filter suggests that there are time indices involved in this model. Which index represents time?
     
  22. Mar 21, 2014 #21
    Hi Stephen,

    You're correct: M ordered pairs of vectors (X, Y) represent the data.

    The fit is total least squares because the eigen-domain is used, i.e., orthogonal regression.

    X and Y are the same in model 1 and model 2. Model 2, however, is dynamic, as mentioned previously:

    Yk = A([Zk - CkHk]Dk + Ck(B^-1)Xk), with k as a time index

    Similarly, model 1 can be written as

    Yk = A(B^-1)Xk

    if one wishes; it just depends on how the model is being used, i.e., with a batch of Xk's or with incremental data points.

    Any idea how to proceed here?
     
  23. Mar 22, 2014 #22

    Stephen Tashi


    Use simulation.

    For simulation you need stochastic models, not mere curve fits. Each model should specify a method for making a deterministic prediction (such as Y = AX), but it must also specify a model for how the observed data arise in a stochastic manner (such as Y = AX + B*err(k), where err(k) is an independent random draw at each time k from a normal distribution with mean 0 and variance 1).

    I think your x-data is a time series of vectors. You need to generate representative examples of the x-data by simulation or have such examples from actual observations (i.e. one "example" is an entire time series of vectors). So you might need a stochastic model for the x-data.

    I am assuming your predictive models give the predicted y-value as a function of the observed x-values, not as a function of the underlying "true" x-values. Of course, a model may use the observed x-values to predict the "true" x-values and then make its prediction based on those estimates.

    Once you have the capability to do simulations, you can investigate various statistics by the Monte-Carlo method.

    -------------
    For example:

    Let model_X be the stochastic model for generating the x-data.

    Create a Monte-Carlo simulation involving two (possibly identical) models, model_A and model_B, as follows. One replication of the simulation is:

    1) Generate the X-data using model_X
    2) Generate the Y-data using the stochastic model associated with model_A
    3) Generate the predicted Y-data using the deterministic model associated with model_A
    4) Compute RSS_A = the sum of the squared residuals between the Y-data of step 2 and the predicted Y-values of step 3.

    ( I'm assuming that when using the F-test, your intent was to define the "residual" between two vectors as the euclidean distance between them. Whether this is wise depends on details of the real world problem.)

    5) Generate the Y_data using the stochastic model associated with model_B
    6) Generate the predicted Y-data using the deterministic model associated with model_B
    7) Compute RSS_B = the sum of the squares of the residuals between the Y_data from step 5 and the predictions of step 6.

    8) Compute G = (RSS_A - RSS_B) / RSS_B

    G is an obvious imitation of the F-statistic. We don't know that G has the same distribution as any F-statistic, so we shouldn't call it one.

    We can set model_A = model_B = your model2 and use the Monte-Carlo simulation to estimate the distribution of G. (When the stochastic model associated with model 2 is applied to the same X-data twice, it probably won't produce the same residuals due to the stochastic terms. Hence the value of G will vary on different replications.)

    Take the "null hypothesis" to be that model1 is the same as model2 (as far as producing residuals goes). Compute the single numerical value G_obs = (RSS_1 - RSS_2)/RSS_2 by applying the two models to the actually observed X-data. Use the distribution of G to compute how likely it is to get a value of G equal or greater than G_obs. Then "accept" or "reject" the null hypothesis based on how this probability compares to whatever "significance level" you have chosen.
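    To make the recipe concrete, here is a minimal Python sketch under stand-in assumptions: model_X is uniform noise, both models share the same deterministic prediction y = a*x with additive Gaussian error, and the coefficient, noise scale, sample size, and G_obs value are all invented for illustration.

```python
import random

random.seed(0)
N = 50                      # observations per replication (assumed)
A_COEF, NOISE = 2.0, 1.0    # stand-in deterministic model and noise scale

def simulate_rss():
    """One replication: steps 1-4 (or 5-7) of the recipe above."""
    xs = [random.uniform(-1, 1) for _ in range(N)]          # step 1: model_X
    ys = [A_COEF * x + random.gauss(0, NOISE) for x in xs]  # step 2: stochastic model
    preds = [A_COEF * x for x in xs]                        # step 3: deterministic model
    return sum((y - p) ** 2 for y, p in zip(ys, preds))     # step 4: RSS

# Estimate the null distribution of G with model_A = model_B.
g_samples = []
for _ in range(2000):
    rss_a, rss_b = simulate_rss(), simulate_rss()           # steps 1-7
    g_samples.append((rss_a - rss_b) / rss_b)               # step 8: G

# Empirical test: the p-value is the fraction of simulated G values
# at or above the value computed from the actually observed data.
g_obs = 0.4                 # placeholder for the observed statistic
p_value = sum(g >= g_obs for g in g_samples) / len(g_samples)
```

    The "accept"/"reject" decision then comes from comparing p_value to the chosen significance level; no F distribution or degrees-of-freedom count is involved.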
     
  24. Mar 22, 2014 #23
    Hi stephen thanks for your lengthy reply,

    Wouldn't I still be in the same predicament, though, since testing the null hypothesis requires the degrees of freedom for each model, which is what I am not sure about?

    Thanks

    Emma
     
  25. Mar 22, 2014 #24

    Stephen Tashi


    No, you don't need to know any "degrees of freedom" information. You use the empirical distribution of G to do the test. There is never any need to deal with F statistics.
     
  26. Mar 23, 2014 #25
    I actually have 10 examples of the ground-truth Y. Would this be enough to avoid performing simulation, since I am calculating (RSS_A - RSS_B)/RSS_B using them? Also, the technique you mention depends on what one chooses as err(k) in Y = AX + B*err(k).

    I'm also a little confused about how you would use the empirical distribution of G to do the test; if you could clarify that, it would be great.

    Thanks so much for your help

    Emma
     