pawelch
Hi all,
I am trying to devise a mathematical model for a project I am working on. The description is as follows:
We have a sample space
[tex] \Omega=\{w_1,w_2,\cdots, w_N\}[/tex]
It is very large. Suppose further that we have an assumed frequency of occurrence for each [itex]w_i[/itex], stored in a probability vector [itex]\pi[/itex].
Suppose we observe occurrences of the [itex]w_i[/itex], stored as a sequence [itex]\{S\}[/itex]. We would now like to compare [itex]\{S\}[/itex] against [itex]\pi[/itex].
One solution would be to apply the Kullback–Leibler divergence. The problem is that [itex]|S| < |\Omega|[/itex] (by [itex]|\cdot|[/itex] I mean cardinality), so for some [itex]w_i \in \Omega[/itex] the corresponding observed frequency [itex]s_i[/itex] in [itex]\{S\}[/itex] is 0, because [itex]\{S\}[/itex] has not managed to explore [itex]\Omega[/itex] fully. In this case the sum contains a term [itex]\pi_i \log \frac{\pi_i}{s_i}[/itex] with [itex]s_i = 0[/itex], which is infinite.
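To make the issue concrete, here is a minimal Python sketch. The function names, the four-element [itex]\pi[/itex] and the short sequence are made up for illustration only, and outcomes are encoded as indices 0..N-1. It computes the divergence in both directions: with the usual convention [itex]0 \cdot \log 0 = 0[/itex], the direction from [itex]\{S\}[/itex] to [itex]\pi[/itex] stays finite, while the direction from [itex]\pi[/itex] to [itex]\{S\}[/itex] is infinite as soon as one [itex]w_i[/itex] with [itex]\pi_i > 0[/itex] is never observed.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i), using the convention 0 * log(0 / q_i) = 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # zero-probability terms contribute nothing
        if q_i == 0.0:
            return math.inf  # p_i > 0 but q_i == 0: the term is infinite
        total += p_i * math.log(p_i / q_i)
    return total


def empirical_frequencies(sequence, n_outcomes):
    """Relative frequency of each outcome index 0..n_outcomes-1 in the sequence."""
    counts = Counter(sequence)
    return [counts.get(i, 0) / len(sequence) for i in range(n_outcomes)]


# Hypothetical data: assumed frequencies and a short observed sequence
# that never visits outcomes 2 and 3.
pi = [0.4, 0.3, 0.2, 0.1]
S = [0, 0, 1, 0, 1]

s = empirical_frequencies(S, len(pi))
d_s_pi = kl_divergence(s, pi)   # finite: unobserved outcomes are skipped
d_pi_s = kl_divergence(pi, s)   # infinite: pi puts mass where s is zero

print(s, d_s_pi, d_pi_s)
```

So whether the zero counts break the comparison depends on which of the two arguments [itex]\{S\}[/itex] is placed in.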
In principle, I generate multiple [itex]\{S\}[/itex] that are of two types, say, [itex]\{1,0\}[/itex]. The underlying idea of my project is that [itex]\{S\}_1[/itex], of the first type, will hit the [itex]w_i[/itex] that possess high frequency in [itex]\Omega[/itex], whereas [itex]\{S\}_2[/itex], of the second type, will generate the [itex]w_i[/itex] that have low frequency. It is also unlikely that [itex]|S| = |\Omega|[/itex].
So I thought I would compare [itex]\{S\}_1[/itex] against [itex]\pi[/itex], then [itex]\{S\}_2[/itex] against [itex]\pi[/itex], and observe the differences. But because of the terms with [itex]s_i = 0[/itex], for which [itex]\pi_i \log \frac{\pi_i}{s_i} = \infty[/itex], I could not get it to work. I then thought that maybe I could "normalise" (i.e. shrink) [itex]\Omega[/itex], so that the new [itex]\Omega[/itex] contains only the elements that have occurred in [itex]\{S\}[/itex], but I have been told that this is not a good idea either.
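For reference, this is roughly what I mean by shrinking [itex]\Omega[/itex], as a minimal Python sketch. The helper name and the numbers are hypothetical, and renormalising [itex]\pi[/itex] over the observed elements is my own assumption about how the shrinking would be done. It also reports how much probability mass of [itex]\pi[/itex] is thrown away by the restriction.

```python
import math
from collections import Counter


def restrict_and_renormalise(pi, observed):
    """Keep only the entries of pi for observed outcomes, rescaled to sum to 1."""
    mass = sum(pi[i] for i in observed)
    return {i: pi[i] / mass for i in observed}


# Hypothetical data: outcomes are indices into pi; 2 and 3 are never observed.
pi = [0.4, 0.3, 0.2, 0.1]
S = [0, 0, 1, 0, 1]

counts = Counter(S)
observed = sorted(counts)

# Empirical frequencies and assumed frequencies on the shrunken sample space.
s_restricted = {i: counts[i] / len(S) for i in observed}
pi_restricted = restrict_and_renormalise(pi, observed)

# KL divergence on the shrunken space: every s_i is positive, so it is finite.
kl_restricted = sum(
    pi_restricted[i] * math.log(pi_restricted[i] / s_restricted[i])
    for i in observed
)

# Probability mass of pi that the restriction ignores.
discarded_mass = 1.0 - sum(pi[i] for i in observed)

print(kl_restricted, discarded_mass)
```

The divergence is finite here, but each [itex]\{S\}[/itex] is then compared against a different renormalised [itex]\pi[/itex], and the discarded mass shows how much of the original [itex]\pi[/itex] is no longer taken into account.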
So the question is: how should I compare both types, [itex]\{S\}_1[/itex] and [itex]\{S\}_2[/itex], against [itex]\pi[/itex] if they are of different lengths?
Thank you for any suggestions, and please accept my apology for the poor mathematical language of this description,
cheers!