Probability of Normal Distribution Generating a Sample

In summary: given the model, the probability of the series is fixed; that is the "probability of the series given the model". The model consists of the initial probabilities, the transition probabilities, and the output probabilities, so the model itself can be assigned a probability. The model is not the series; it is a model for the series.
  • #1
verdverm
I would like to know how to calculate the probability that a normal distribution generated a sample.

More specifically, I am clustering lines, so I have several assumed normal distributions. Each cluster has a mean and variance/StdDev of both slope and length.

Given a set of clusters (normal distributions) AND a sample line,
I would like to be able to calculate the probabilities for each cluster.

I think it is something like:
P(L|C_i) = P_len(L.len|C_i) * P_slp(L.slp|C_i)
I don't know how to calculate the two RHS probabilities.

Thanks in advance,
Tony
 
  • #2
verdverm said:
I would like to know how to calculate the probability that a normal distribution generated a sample.

Unless you adopt a Bayesian approach, you can't calculate "the probability that a normal distribution generated a sample".

Judging by the remainder of your post, you may be able to calculate something that might be loosely interpreted as "the probability of a particular sample, given that we assume a particular normal distribution generated it." If that's what you meant, then we can discuss how to do it. First let's clarify what you are trying to do.

(The type of distinction you must make is between "The probability of A given B" versus "The probability of B given A". They aren't the same. )
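For example, going from one direction to the other requires Bayes' theorem, which is where prior probabilities for the clusters would have to come in:

$$P(C_i \mid L) = \frac{P(L \mid C_i)\,P(C_i)}{\sum_j P(L \mid C_j)\,P(C_j)}$$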
 
  • #3
For a detailed specific reference: research.microsoft.com/pubs/144983/mod480-wang.pdf
( specifically the calculation of b_i(L) from section 3.3 )

I'm a little unclear on how the Bayesian approach comes into play...
perhaps because of the formula, perhaps because there are several clusters

a little clarification on the objective...

given a time series, I break it into line segments (Piecewise Linear Approximation).
Each line segment has a θ and a length.
Next I group the lines into clusters based on these values.
Then from each group/cluster we can calculate the mean and variance of the θ and length of the lines.

So at this point I have a bunch of clusters with 2 normal distributions each.
(one for θ and one for length) (joint probability?)
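If the θ and length of a line are treated as independent within a cluster (an assumption on my part), the joint density of a line L under cluster C_i would just be the product of the two Gaussian densities:

$$P(L \mid C_i) = \mathcal{N}(L.\theta;\ \mu_{\theta,i}, \sigma_{\theta,i}^2)\cdot\mathcal{N}(L.len;\ \mu_{len,i}, \sigma_{len,i}^2)$$

(strictly speaking these are density values rather than probabilities)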

Now, given a new line, I want to associate a probability with each cluster.
This probability should encapsulate the likelihood that the cluster generated the new line.

Quote:
"(The type of distinction you must make is between "The probability of A given B" versus "The probability of B given A". They aren't the same.)"

I will always have the case of "The probability of LINE given CLUSTER"
 
  • #4
verdverm said:
This probability should encapsulate the likelihood that the cluster generated the new line.

I will always have the case of "The probability of LINE given CLUSTER"

Assuming you are attempting to define your goal, don't you see that these are contradictory statements?

Your first statement has the tortuous phrase "the probability should encapsulate the likelihood", but it amounts to saying that you want "the probability that a specific cluster generated the line, given the data that defines the line". The second statement obviously refers to "the probability of the data that defines a line, given the cluster that generated it".

The paper you mention assumes the reader is familiar with the context of applying the Viterbi algorithm. I'm not, but from a few minutes of Wikipedia surfing, it appears this algorithm can be applied to data assumed to come from a Markov model. The Markov model has a vector of probabilities for its initial states. I suppose these might function as "prior probabilities" for a Bayesian analysis. Can you explain the probability model that the paper assumes?
 
  • #5
Not contradictory given that it is an iterative algorithm...

a Hidden Markov Model (HMM) has many states, each with:
- initial probability ( to start an observation series )
- transition probabilities ( to move from one state to another state ) { Matrix }
* output probability(ies) ( the probability of generating an observation )

the idea is to determine the hidden states of the model from the observations.
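As a rough Go sketch of what each state would carry (the package, type, and field names are mine, for illustration only, not from the paper):

package phmm

// State sketches one hidden state of the pHMM described above.
type State struct {
    InitialProb float64   // probability of starting the observation series in this state
    Transitions []float64 // row of the transition matrix: probability of moving to each other state
    // Output distribution: each state emits a line with a Gaussian angle and a Gaussian length.
    ThetaMean, ThetaVar float64
    LenMean, LenVar     float64
}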

In the paper, instead of the points in time being the observations, the lines that approximate the data are the observations.

so my problem is with calculating the *output probabilities*

To initialize an iterative refinement, we first segment the series using the previously mentioned PLA
Then we cluster the lines created by PLA
Next, each cluster becomes a hidden state in an initial HMM (pHMM in the paper)
The output probabilities are calculated from the cluster of lines that is associated with the state (1-1 correspondence)

The output probabilities of a state are the {mean and variance} of the {angle and length} of the lines that comprise the cluster ( 4 values for the output, in order to calculate probabilities later ).

So now we get to the iterative refinement stage after creating an initial HMM...
-- Re-segment the time series under guidance of the initial HMM
( this is where my question arises from )

given a candidate line from the new segmentation,
for each state in the HMM,
*** measure how likely it is that this state generated the candidate line [ b_i(L) in section 3.3 ] ***

The measure is somehow related to the two Gaussian distributions of each state ( angle & length )
and the current candidate line under consideration.

The HMM will remain constant through the course of the re-segmentation
The candidate line will always be a different 'sample'

b_i(L) is used as part of a larger computation to find a new, optimal segmentation given the current HMM.

The iterative process continues until the HMM doesn't change:
-- resegment with the current HMM
-- create a new HMM from the resegmentation

I could provide sample clusters and a single line if actual numbers are desired.

Tony
 
  • #6
verdverm said:
Not contradictory given that it is an iterative algorithm...

a Hidden Markov Model (HMM) has many states, each with:
- initial probability ( to start an observation series )
- transition probabilities ( to move from one state to another state ) { Matrix }
* output probability(ies) ( the probability of generating an observation )

the idea is to determine the hidden states of the model from the observations.

If this bears any resemblance to a standard Markov modeling problem (which it seems to), then if you have the initial probabilities and the transition matrix, what you need to do is find the steady-state solutions, which should correspond to the "output probabilities".

Is the above assumption correct or is there something else that we are missing?
 
  • #7
Okay, I think people are looking too far into this...


the problem I am having is simply this:

given (possibly joint) Gaussian probability distributions (listed below as mean and variance/StdDev for each cluster):

pHMM
1 |339|
theta: 1.4544 0.2695
lens: 26.8225 6.2101
2 |24|
theta: 0.8524 0.1335
lens: 2.4693 0.5381
3 |72|
theta: -0.9516 0.2081
lens: 3.7492 0.8248
4 |21|
theta: 0.0000 0.0000
lens: 2.0000 0.0000
5 |24|
theta: -0.1932 0.2335
lens: 3.1475 0.1783
6 |21|
theta: 0.6506 0.3428
lens: 3.3084 0.0837


and given a line:

line
theta: 1.0
lens: 3.0



what is the probability that the line belongs to / was generated by / fits in with / ... each cluster:
1: ?
2: ?
3: ?
4: ?
5: ?
6: ?

I need the probability of the line with each of the clusters in the pHMM.

Currently I am using what I think is a hack
( a function of the number of standard deviations away from the mean, with domain [-3,3] and range [0,1] )



import "math"

// pState holds the parameters for one cluster / hidden state
// (struct and ZSCORE declarations reconstructed here so the snippet is complete).
type pState struct {
    lMean, lVari float64 // length: mean and variance/StdDev
    tMean, tVari float64 // theta:  mean and variance/StdDev
}

// ZSCORE is a lookup table of standard normal CDF values in [0.5, 0.9999],
// copied from the back of a probability book; contents omitted here.
var ZSCORE [31][10]float64

func (s *pState) calcLineGenProb(length, theta float64) float64 {
    lDiff, tDiff := length-s.lMean, theta-s.tMean // difference from the mean
    lNorm, tNorm := lDiff/s.lVari, tDiff/s.tVari  // normalize to Std Deviations
    ret := calcZscore(lNorm) * calcZscore(tNorm)  // calc hack ~= [0,1]*[0,1]
    return ret // close to 1 for a 'probable' line, close to 0 for an 'unlikely' line
}

// hack helper function
func calcZscore(X float64) float64 {
    X = math.Abs(X) // only care about magnitude
    if X > 3.0 || math.IsNaN(X) {
        return 0.000001
    }
    d := int(X * 100.0)  // index scaling
    d1 := d / 10         // table row (tenths of a std deviation)
    d2 := d % 10         // table column (hundredths)
    z := ZSCORE[d1][d2]  // standard normal CDF value from the table
    R := (z - 0.5) * 2.0 // rescale to [0,1] so that close to the mean is close to 0
    return 1.0 - R       // invert: close to 1 for good lines
}
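For comparison, here is a minimal sketch of what a Gaussian-density version of b_i(L) could look like, assuming θ and length are independent within a cluster and that the second number stored for each dimension is a standard deviation (take square roots first if they are variances). The function and type names are illustrative, not from the paper:

package main

import (
    "fmt"
    "math"
)

// cluster holds the output-distribution parameters for one pHMM state.
type cluster struct {
    tMean, tStd float64 // theta (angle): mean, std deviation
    lMean, lStd float64 // length: mean, std deviation
}

// normPDF evaluates the Gaussian density N(x; mu, sigma^2).
func normPDF(x, mu, sigma float64) float64 {
    z := (x - mu) / sigma
    return math.Exp(-0.5*z*z) / (sigma * math.Sqrt(2*math.Pi))
}

// lineGivenCluster is b_i(L): the joint density of a line under one cluster,
// i.e. N(theta) * N(length) under the independence assumption.
func lineGivenCluster(theta, length float64, c cluster) float64 {
    return normPDF(theta, c.tMean, c.tStd) * normPDF(length, c.lMean, c.lStd)
}

func main() {
    // The six clusters from the pHMM dump above (spreads taken as std devs).
    clusters := []cluster{
        {1.4544, 0.2695, 26.8225, 6.2101},
        {0.8524, 0.1335, 2.4693, 0.5381},
        {-0.9516, 0.2081, 3.7492, 0.8248},
        {0.0000, 0.0001, 2.0000, 0.0001}, // zero spread replaced by a tiny value to avoid division by zero
        {-0.1932, 0.2335, 3.1475, 0.1783},
        {0.6506, 0.3428, 3.3084, 0.0837},
    }
    theta, length := 1.0, 3.0 // the sample line

    // Densities b_i(L): the "probability of LINE given CLUSTER" direction.
    densities := make([]float64, len(clusters))
    total := 0.0
    for i, c := range clusters {
        densities[i] = lineGivenCluster(theta, length, c)
        total += densities[i]
    }

    // Normalizing across clusters gives P(cluster | line) under equal priors.
    for i, d := range densities {
        fmt.Printf("cluster %d: density %.6g, posterior %.4f\n", i+1, d, d/total)
    }
}

The normalized values in the last loop are "probability of CLUSTER given LINE" only under equal priors; weighting each density by its cluster's size (339, 24, 72, ...) before normalizing would give the Bayesian posterior with non-uniform priors, which is the distinction raised earlier in the thread.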
 

1. What is the normal distribution?

The normal distribution, also known as the Gaussian distribution, is a probability distribution that is often used to model real-world data. It is characterized by a bell-shaped curve and is symmetric around its mean value.

2. How is the normal distribution related to probability?

The normal distribution is used to calculate probabilities of events occurring within a certain range of values. It can also be used to generate random samples that follow a specific set of characteristics.

3. How can the normal distribution be used to generate a sample?

A sample can be generated by drawing random values according to the distribution's density, which is determined by its mean and standard deviation. Conversely, the likelihood that a normal distribution generated a given sample is computed from that same probability density function; for several independent samples it is the product of the individual densities.
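For a single value x, the density is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where μ is the mean and σ is the standard deviation.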

4. What is the significance of the central limit theorem in probability?

The central limit theorem is one of the most important concepts in probability and statistics. It states that as the sample size increases, the distribution of sample means will approach a normal distribution, regardless of the underlying distribution of the population.

5. How is the normal distribution used in hypothesis testing?

The normal distribution is used in hypothesis testing to determine the probability of obtaining a certain sample mean, given a specific population mean and standard deviation. This allows researchers to make inferences about the population based on a sample of data.
