Generating a probability density function

Summary:
To generate a probability density function (PDF) from a dataset with mixed data types, one must assume a specific model due to the inherent uncertainty in the data. Missing data can complicate the process, and techniques for handling "censored data" may be applicable. Estimating parameters from the data is essential, as a unique PDF cannot be determined without making assumptions. Providing more details about the dataset can help in identifying suitable modeling approaches. Overall, a structured approach that considers both the continuous and discrete nature of the data is necessary for accurate likelihood calculations.
mrb427
I am trying to create a simple implementation of the Bayes decision rule with minimum error criterion and I am running into a problem. Specifically, if I have a data set consisting of a number of feature vectors stored in rows, how can I generate a probability density function from this data?

Also, how can I do this if some of the data is discrete, some is continuous, and some is missing? For example, let us assume each feature vector, x, has three elements.

x = [ a, b, c]

where:

a is categorical data and will be an element of the set {0, 1, 2, 3}
b is continuous data and will be in the range [0,1]
c is also continuous data in the range [0,1], but may be missing for some feature vectors
I want to be able to calculate the likelihood of a feature vector, x, based on the total data set or given that x is from a subset, w, of the total data set.

p(x) = ? and p(x|w) = ?

I have also posted this on Stack Exchange Mathematics, here:
http://math.stackexchange.com/quest...sity-function-from-a-set-of-multivariate-data

I would really appreciate it if someone could help me out or point me in the right direction!
 
mrb427 said:
Specifically, if I have a data set consisting of a number of feature vectors stored in rows, how can I generate a probability density function from this data?

When you don't have enough information to solve a problem, a standard technique is to assume a specific model for the data, one with a few unknown parameters, and then estimate those parameters from the data.

There's no use pretending that you "make no assumptions". Whatever you do, you will end up making assumptions of some sort, because even a simple data set does not determine a unique probability density function on its own.

Treating situations where data is missing is known as dealing with "censored data". If you search on those keywords, you might find something that applies to your problem. To get suggestions for a plausible model for your data, I think you have to reveal more details about it.
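To make the "assume a model, estimate its parameters" advice concrete, here is a minimal sketch in Python. It makes several assumptions beyond anything stated in the thread: the features are treated as independent given the class (a naive Bayes assumption), the categorical feature a gets an empirical multinomial distribution, the continuous features b and c get Gaussians fitted by maximum likelihood, and a missing c is marginalized out simply by omitting its factor from the product. The function and variable names (fit_class_model, likelihood) are illustrative, not from any library.

```python
import math
from collections import Counter

def fit_class_model(rows):
    """Estimate parameters for one class w from rows [a, b, c].
    c may be None (missing). Assumes features are independent given
    the class, so p(x|w) factors into per-feature terms."""
    n = len(rows)
    # Empirical multinomial for the categorical feature a.
    p_a = {k: v / n for k, v in Counter(r[0] for r in rows).items()}

    def gaussian_params(values):
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        return mu, max(var, 1e-9)  # floor the variance to avoid a degenerate PDF

    b_mu, b_var = gaussian_params([r[1] for r in rows])
    # Fit c only on the rows where it is observed.
    c_mu, c_var = gaussian_params([r[2] for r in rows if r[2] is not None])
    return {"p_a": p_a, "b": (b_mu, b_var), "c": (c_mu, c_var)}

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def likelihood(x, model):
    """p(x | w) under the fitted model. A missing c is marginalized
    out by dropping its factor, which is valid under the
    independence assumption."""
    a, b, c = x
    p = model["p_a"].get(a, 0.0) * gaussian_pdf(b, *model["b"])
    if c is not None:
        p *= gaussian_pdf(c, *model["c"])
    return p
```

For the minimum-error Bayes rule, you would fit one such model per class w, then assign x to the class maximizing prior(w) * likelihood(x, model_w). Everything here, independence, the choice of Gaussians on [0, 1], and dropping the missing factor, is a modeling assumption you could replace (e.g. Beta distributions for b and c, or a full covariance model for the continuous pair).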
 
