I have a question about the Wikipedia page on tf-idf (term frequency–inverse document frequency). In the Mathematical Details section, the inverse document frequency is defined as follows:
idf(t, D) = log [itex]\frac{|D|}{|\{d \in D : t \in d\}|}[/itex]
This is my first attempt at LaTeX, so apologies if any of the formatting is off.
The article states that it is common to adjust the denominator as follows to avoid division by zero:
[itex]|\{d \in D : t \in d\}| \to 1 + |\{d \in D : t \in d\}|[/itex]
The article then goes on to state that "the ratio inside the idf's log function is always greater than or equal to 1". However, with this adjustment, if a word appears in every document in D, the ratio becomes |D| / (1 + |D|), which is less than 1, so the log is negative. Shouldn't |D| also be increased by 1 so that idf(t, D) is always >= 0?
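To sanity-check the negative value, here is a minimal Python sketch. The function name `idf_adjusted` and the corpus size of 10 are made up for illustration, and I'm assuming the natural log (the sign of the result doesn't depend on the base):

```python
import math

def idf_adjusted(n_docs, doc_freq):
    """idf with 1 added to the denominator only: log(|D| / (1 + df))."""
    return math.log(n_docs / (1 + doc_freq))

n_docs = 10  # hypothetical corpus size

# A term found in every document has df = |D|, so the ratio is 10/11 < 1.
print(idf_adjusted(n_docs, doc_freq=10))  # about -0.0953

# A term found in all but one document gives ratio 10/10 = 1, so idf = 0.
print(idf_adjusted(n_docs, doc_freq=9))

# A term found in no document no longer divides by zero.
print(idf_adjusted(n_docs, doc_freq=0))
```

So the adjustment does remove the division by zero, but only the case df = |D| drops the ratio below 1.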
I'm thinking that the final formula should look like this:
idf(t, D) = log [itex]\frac{1 + |D|}{1 + |\{d \in D : t \in d\}|}[/itex]
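As a rough check of this version, the sketch below evaluates it for every possible document frequency from 0 to |D|. Again, `idf_smoothed` and the corpus size are just illustrative names and values of my own, with the natural log assumed:

```python
import math

def idf_smoothed(n_docs, doc_freq):
    """Proposed variant: log((1 + |D|) / (1 + df))."""
    return math.log((1 + n_docs) / (1 + doc_freq))

n_docs = 10  # hypothetical corpus size

# Document frequency can range from 0 (term in no document) to |D| (term in all).
values = [idf_smoothed(n_docs, df) for df in range(n_docs + 1)]

# Since df <= |D|, the ratio (1 + |D|) / (1 + df) is at least 1, so no value is negative.
print(min(values))                      # smallest value, reached at df = |D|
print(all(v >= 0 for v in values))
```

With 1 added to both numerator and denominator, the smallest value is reached when the term appears in every document, where the ratio is exactly 1.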