Log-Likelihood ratio in the context of natural language processing

Summary
The discussion centers on the log-likelihood ratio (LLR) in the context of document summarization, particularly how to calculate probabilities for a word in an input document and in a background corpus. The LLR is described as the ratio of two likelihoods: one computed assuming the word has the same probability in both corpora, the other assuming it has different probabilities in each. Participants seek clarification on how to compute these quantities, with the suggestion that the equal-probability case pools the word's counts across both corpora, while the different-probability case may simply use the product of the individual per-corpus probabilities. Concerns are also raised that the quoted definition leaves the sampling procedure unspecified and omits the logarithm from which the LLR takes its name; a later reply notes the ratio's frequent use in Bayesian statistics. Overall, the conversation highlights the care needed when applying statistical concepts to natural language processing tasks.
starcoast
First of all, let me apologize if this question is in the wrong place. It's fundamentally a statistics question but it relates to computer science. I'm also not sure if this falls under the "homework" category, since it's for a class, but I need assistance on a general idea, not a problem set. Anyway:

I am implementing some unsupervised methods of content-selection/extraction-based document summarization, and I'm confused about what my textbook calls the "log-likelihood ratio". The book briefly describes it as follows:

"The LLR for a word, generally called lambda(w), is the ratio between the probability of observing w in both the input and in the background corpus assuming equal probabilities in both corpora, and the probability of observing w in both assuming different probabilities for w in the input and the background corpus."

Breaking that down, we have the numerator: "the probability of observing w in both the input and in the background corpus assuming equal probabilities in both corpora". How do I calculate what probability to use here?

And the denominator: "the probability of observing w in both assuming different probabilities for w in the input and the background corpus". Is this as simple as the probability of the word occurring in the input times the probability of the word occurring in the background corpus? For example:

(count(word,input) / total words in input) * (count(word,corpus) / total words in corpus)

I've been looking over a paper my book references, Accurate Methods for the Statistics of Surprise and Coincidence (Dunning 1993), but I'm finding it difficult to relate it to the problem of calculating LLR values for individual words in extraction-based summarization. Any clarification here would be really appreciated.
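To make my guess concrete, here is a rough sketch of what I have in mind for those per-corpus probabilities (a toy example, not my actual code; the data and names are just placeholders):

Code:
from collections import Counter

input_tokens = "the cat sat on the mat".split()                       # placeholder input document
background_tokens = "the dog ran in the park near the house".split()  # placeholder background corpus

def unigram_probability(word, tokens):
    # maximum-likelihood estimate: count(word, tokens) / total words in tokens
    return Counter(tokens)[word] / len(tokens)

# my guess for the denominator: product of the two per-corpus probabilities
p_guess = unigram_probability("the", input_tokens) * unigram_probability("the", background_tokens)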
 
I don't know the conventions used in document analysis, and the passage you quoted isn't well written, so I can only guess at what is meant. My guess is that "the probability of observing w in both the input and in the background corpus assuming equal probabilities in both corpora" involves estimating "the probability that a randomly chosen word is w" by taking the ratio (total occurrences of w in the input + total occurrences of w in the background corpus) / (total words in the input + total words in the background corpus). My guess for the denominator would be the same as yours.
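In code, under that reading, the two kinds of estimates would look something like this (a sketch only; the function and variable names are mine):

Code:
def equal_probability_estimate(count_in, count_bg, n_in, n_bg):
    # one shared estimate for w, pooling the input and background corpora
    return (count_in + count_bg) / (n_in + n_bg)

def separate_probability_estimates(count_in, count_bg, n_in, n_bg):
    # one estimate per corpus, as in your guess for the denominator
    return count_in / n_in, count_bg / n_bg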

One problem with the quoted passage is that "the probability of observing w" depends on the sampling procedure. I am assuming that procedure is "pick one random word from a uniform probability distribution over all the words".

Another problem is that the topic is the "log" likelihood ratio, but the passage doesn't mention taking the logarithm of the ratio.
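For reference, the statistic in the Dunning (1993) paper you mention is usually built from binomial likelihoods, and what gets reported is -2 times the log of the ratio. Here is a rough sketch of that computation (the binomial coefficients cancel in the ratio, so they are omitted; the function and variable names are mine):

Code:
import math

def log_binom(k, n, p):
    """log of p**k * (1 - p)**(n - k), using the convention 0 * log(0) = 0."""
    result = 0.0
    if k > 0:
        result += k * math.log(p)              # p > 0 whenever k > 0 for the estimates below
    if n - k > 0:
        result += (n - k) * math.log(1.0 - p)  # 1 - p > 0 whenever n - k > 0 for the estimates below
    return result

def neg_two_log_lambda(k_in, n_in, k_bg, n_bg):
    """-2 log(lambda) for one word, in the style of Dunning (1993).

    k_in, k_bg: occurrences of the word in the input and background corpora.
    n_in, n_bg: total word counts of the input and background corpora.
    """
    p_pooled = (k_in + k_bg) / (n_in + n_bg)  # equal-probabilities hypothesis
    p_in = k_in / n_in                        # separate probabilities, one per corpus
    p_bg = k_bg / n_bg

    log_numerator = log_binom(k_in, n_in, p_pooled) + log_binom(k_bg, n_bg, p_pooled)
    log_denominator = log_binom(k_in, n_in, p_in) + log_binom(k_bg, n_bg, p_bg)
    return -2.0 * (log_numerator - log_denominator)

Under the null hypothesis of equal probabilities, -2 log(lambda) is approximately chi-square distributed (Dunning 1993), which is how a cutoff for picking "significant" words is usually chosen in the summarization setting.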
 
The log-likelihood ratio also shows up quite frequently in Bayesian analyses, where the log of the Bayes factor is a closely related quantity.
 