Comparing Kullback-Leibler divergence values

  1. Apr 17, 2017 #1
    I'm currently evaluating the "realism" of two survival models in R by comparing the Kullback-Leibler divergence between each model's simulated survival-time dataset (`dat.s1` and `dat.s2`) and a "true", observed survival-time dataset (`dat.obs`). Initially, the directed KLD values suggest that `dat.s2` is the better match to the observation:

    > library(LaplacesDemon)
    > KLD(dat.s1, dat.obs)$sum.KLD.py.px
    [1] 1.17196
    > KLD(dat.s2, dat.obs)$sum.KLD.py.px
    [1] 0.8827712

    However, when I plot the densities of all three datasets, it seems quite clear that `dat.s1` (green) aligns better with the observation:

    > plot(density(dat.obs), lwd=3, ylim=c(0,0.9))
    > lines(density(dat.s1), col='green')
    > lines(density(dat.s2), col='purple')

    What is the cause behind this discrepancy? Am I applying KLD incorrectly due to some conceptual misunderstanding?

  3. Apr 17, 2017 #2
    Keep in mind that the KL divergence is not symmetric: KL(P||Q) and KL(Q||P) generally differ, and the two orderings correspond to different objective functions (and different research questions). The way you're computing it (that is, KL(Q||P), where Q is being fit to P) tries to match regions of high density, and it does seem that the highest probability mass in your "worse fitting" model coincides with the highest probability mass in your target better than that of the "better fitting" model does. There's a fairly good discussion related to the topic here:


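For reference, the two orderings discussed here can be written out explicitly. The symbols below are introduced only for illustration: p and q stand for the densities of the target P and the fitted model Q.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx
\qquad\text{versus}\qquad
D_{\mathrm{KL}}(Q \,\|\, P) = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx
```

The first integral is weighted by p, so it grows wherever the target has mass that the model lacks. The second is weighted by q, so it grows wherever the model puts mass that the target lacks. That difference in weighting is why the two orderings can rank the same pair of models differently.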
    The other direction may actually be closer to what you're interested in:
    KLD(dat.obs, dat.s1)$sum.KLD.py.px
    KLD(dat.obs, dat.s2)$sum.KLD.py.px
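To see how the two directions can be compared on density estimates like the ones in the plot, here is a minimal base-R sketch. It rests on several assumptions: the Weibull samples are hypothetical stand-ins for `dat.obs`, `dat.s1` and `dat.s2` (the thread's data are not available), `kl_from_samples` is a helper name made up for this example, and the grid size and the small floor `eps` are arbitrary choices. It also assumes that the quantities to compare are the estimated densities rather than the raw vectors of survival times. As far as I understand its documentation, `KLD` in LaplacesDemon expects vectors of probability densities, so that distinction may matter.

```r
# Hypothetical stand-ins for the thread's datasets (the real data are not available).
set.seed(1)
dat.obs <- rweibull(500, shape = 1.5, scale = 2.0)  # "observed" survival times
dat.s1  <- rweibull(500, shape = 1.4, scale = 2.1)  # simulated by model 1
dat.s2  <- rweibull(500, shape = 3.0, scale = 1.5)  # simulated by model 2

# Estimate KL(P || Q) from two samples: evaluate kernel density estimates of
# both samples on one shared grid, then sum p * log(p / q) over the grid.
kl_from_samples <- function(p_sample, q_sample, n_grid = 512, eps = 1e-10) {
  rng <- range(c(p_sample, q_sample))
  p <- density(p_sample, from = rng[1], to = rng[2], n = n_grid)$y
  q <- density(q_sample, from = rng[1], to = rng[2], n = n_grid)$y
  # Floor the densities so log(p / q) stays finite where a tail is close to 0.
  p <- pmax(p, eps)
  q <- pmax(q, eps)
  # Renormalise to probability vectors over the grid cells.
  p <- p / sum(p)
  q <- q / sum(q)
  sum(p * log(p / q))
}

# Both directions for each model; smaller means closer under that direction.
kl_table <- rbind(
  s1 = c(obs_to_model = kl_from_samples(dat.obs, dat.s1),
         model_to_obs = kl_from_samples(dat.s1, dat.obs)),
  s2 = c(obs_to_model = kl_from_samples(dat.obs, dat.s2),
         model_to_obs = kl_from_samples(dat.s2, dat.obs))
)
print(kl_table)
```

Evaluating both density estimates on the same grid is the key design choice: the grid spacing then cancels in the ratio, and the result depends only on the shapes of the two curves. The estimate remains sensitive to the kernel bandwidth and to the floor `eps`, so it is better read as a rough comparison than as an exact value.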
    Last edited: Apr 17, 2017