I have a question about the Wikipedia page on tf-idf (term frequency–inverse document frequency). In the Mathematical details section, the inverse document frequency is defined as follows:
$$\mathrm{idf}(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}$$
It states that it is common to adjust the denominator as follows to avoid division by zero:
$$|\{d \in D : t \in d\}| \;\to\; 1 + |\{d \in D : t \in d\}|$$
The article then goes on to state that "the ratio inside the idf's log function is always greater than or equal to 1". However, with the adjusted denominator, if a term appears in every document in the set $D$, the ratio becomes $|D| / (1 + |D|)$, which is less than 1, so the log is negative. Shouldn't $|D|$ in the numerator also be increased by 1 in order for $\mathrm{idf}(t, D)$ to always be $\ge 0$?
I'm thinking that the final formula should look like this:
$$\mathrm{idf}(t, D) = \log \frac{1 + |D|}{1 + |\{d \in D : t \in d\}|}$$
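
To check this numerically, here is a minimal Python sketch comparing the two adjustments. The function names are my own, and I am assuming the natural log and a toy collection of 4 documents; none of this comes from the article itself.

```python
import math

def idf_denominator_plus_one(n_docs, n_containing):
    # Adjustment described in the article: add 1 to the denominator only.
    return math.log(n_docs / (1 + n_containing))

def idf_both_plus_one(n_docs, n_containing):
    # Proposed variant: add 1 to both numerator and denominator.
    return math.log((1 + n_docs) / (1 + n_containing))

n_docs = 4  # assumed toy value for |D|
for n_containing in range(n_docs + 1):
    print(
        n_containing,
        idf_denominator_plus_one(n_docs, n_containing),
        idf_both_plus_one(n_docs, n_containing),
    )
```

With $|D| = 4$ and a term that appears in all four documents, the denominator-only version gives $\log(4/5)$, which is negative, while the version with 1 added to both terms gives $\log(5/5) = 0$.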