Bayesian model for hierarchical evaluation

  • Context: Graduate
  • Thread starter: hemanthk
  • Tags: Bayesian Model

Discussion Overview

The discussion revolves around modeling a hierarchical evaluation of document similarity using a Bayesian framework. Participants explore how to probabilistically represent a two-level similarity search process in a library organized into clusters, focusing on the application of Bayesian principles to this problem.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant describes a process involving two phases for finding documents similar to a query document, emphasizing the need for a Bayesian representation of the similarity evaluation.
  • Another participant questions the interpretation of the term "Bayesian framework," suggesting that it typically relates to statistical estimation or inference rather than simulation.
  • A participant seeks clarification on how to express the posterior probability of document similarity in a way that aligns with their two-phase evaluation method.
  • Concerns are raised about the clarity of the notation used, particularly regarding the meaning of "D_q ~ i" and the interpretation of clusters as events.

Areas of Agreement / Disagreement

Participants express differing views on the nature of a Bayesian framework and its applicability to the problem at hand. There is no consensus on how to properly model the hierarchical evaluation or the specific probabilistic expressions involved.

Contextual Notes

Participants highlight potential ambiguities in the notation and definitions used in the discussion, particularly regarding the representation of similarity and the role of clusters in the Bayesian framework.

hemanthk
Hi

I have a problem in which I have to search a huge library to find documents similar to a given query document.

The library is organised into clusters and each cluster contains documents of a particular class.

Given a query document, first we retrieve the most similar clusters and then find the most similar documents among the retrieved clusters.

Now my question is: how do we model this hierarchical evaluation in a Bayesian framework? (Assume we already have the retrieval methods in place.) We just need a framework to represent this hierarchical search probabilistically.

I just need a starting point or an example (a research paper, textbook, etc)...

Any help would be greatly appreciated. Thank you
 
hemanthk said:
how do we model this hierarchical evaluation in a Bayesian framework? (Assuming we already have the retrieval methods in place) We just need a framework to probabilistically represent this hierarchical search.

When I think of modeling such a situation, I think of writing a simulation for it. Writing simulations can involve computing conditional probabilities by Bayes' rule, but I wouldn't call that a "Bayesian framework". I don't know what it would mean to have a Bayesian framework for a simulation.

I think of a Bayesian framework as an approach to problems of statistical estimation or inference.

In the field of artificial intelligence, there are types of learning algorithms that are called Bayesian.

If you already have your procedures in place and you want to simulate them at work, I think you are just asking about how to create a simulation.

If you are trying to optimize your procedures by using a statistical decision method then it makes sense to ask about a Bayesian approach.

I think you should clarify which situation you're asking about.
 
Hi Stephen,

Thanks for your reply. Actually, it's not a simulation that I want to do.
However, as you said, I would like to perform my two-level similarity evaluation and express the result as a posterior (a conditional probability representation).

I'll explain briefly.

Let's say the query document is D_q, and there is a library of documents organised into k clusters (c_1 ... c_k). Let the total number of documents in the library (across all clusters) be N.

Now we have to find the document in the library (across the k clusters) most similar to the query D_q.
We do it in two phases:
Phase I: Perform a similarity analysis of D_q against each of the k clusters and keep some number of the most similar clusters. Let us denote the set of these clusters as C_s.
(The search space is now narrowed down to C_s, a small part of the huge library.)
Phase II: Perform a similarity analysis of D_q against each document of each cluster in the set C_s.
Finally, we have a list of N similarities, one for each of the N documents in the library.
The non-zero similarity values are the ones corresponding to the documents of C_s.
All the other documents are given zero similarity, since they did not even qualify in the first level of analysis.
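
The two phases described above can be sketched in Python. This is only an illustration under assumptions the post does not state: documents are taken to be plain numeric vectors, a cluster is summarised by its centroid, similarity is cosine similarity, and the names `two_phase_search` and `top_m` (the size of C_s) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(docs):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]

def two_phase_search(query, clusters, top_m):
    """Return one similarity per library document, in cluster order.

    clusters: list of k clusters, each a list of document vectors.
    top_m:    how many clusters survive Phase I (the size of C_s).
    """
    # Phase I: score every cluster against the query and keep the top_m.
    cluster_scores = [cosine(query, centroid(c)) for c in clusters]
    ranked = sorted(range(len(clusters)),
                    key=lambda j: cluster_scores[j], reverse=True)
    selected = set(ranked[:top_m])  # indices of the clusters in C_s

    # Phase II: score documents only inside C_s; all others get zero.
    similarities = []
    for j, cluster in enumerate(clusters):
        for doc in cluster:
            similarities.append(cosine(query, doc) if j in selected else 0.0)
    return similarities

# Toy library: k = 3 clusters, N = 5 documents (made-up vectors).
clusters = [
    [[1.0, 0.0], [0.9, 0.1]],   # c_1
    [[0.0, 1.0], [0.1, 0.9]],   # c_2
    [[0.7, 0.7]],               # c_3
]
sims = two_phase_search([1.0, 0.2], clusters, top_m=2)
```

With this toy data, Phase I keeps c_1 and c_3, so the two documents of c_2 receive zero similarity without ever being compared to the query.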

This is how we do things right now.
Now what we want is to represent this evaluation as a similarity posterior over all the documents in the library given the clusters forming the library:

For i=1 to N
p(D_q~i/c_1,c_2,...c_k)=...

The crux of my problem is here: how do I decompose this posterior probability expression so that it corresponds to the procedure explained above? I thought this might be done using Bayes' rule, but I am not sure how!

I hope I made myself clear; if I did not, I am ready to explain further.

Thank you
 
hemanthk said:
p(D_q~i/c_1,c_2,...c_k)=...

It isn't clear what that means.

Does "D_q ~ i" refer to a yes-or-no random variable that says document D_q is (or is not) similar to document i? Or does D_q refer to a real number that measures similarity, or to a vector of numbers?

What event does "c_1,c_2,...c_k" denote? I thought the c_i were clusters, not events.
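
One possible way to pin the notation down, offered as an assumption and not as something either poster states: let Z be a random variable taking values in 1..N for "which library document is the best match for D_q", let C take values in 1..k for the cluster containing that document, and condition on the query D_q instead of on the clusters themselves. The law of total probability then gives a two-level form:

```latex
p(Z = i \mid D_q) \;=\; \sum_{j=1}^{k} p(Z = i \mid C = j,\, D_q)\; p(C = j \mid D_q)
```

Because each document sits in exactly one cluster, every term vanishes except the one for the cluster that contains document i. Under this reading, Phase I amounts to setting p(C = j | D_q) = 0 for every cluster outside C_s, which forces p(Z = i | D_q) = 0 for the documents in those clusters, and Phase II supplies p(Z = i | C = j, D_q) for the surviving clusters. Turning raw similarity scores into these probabilities (for example by normalising them to sum to 1) is a further modelling choice that the thread does not settle.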
 
