Bayesian Statistics - obtaining parameters for model from real data

In summary, the conversation discusses using a mathematical model to generate data on an epidemic and the free parameters involved. The goal is to find values for the parameters that closely match the original data distribution, but the method for measuring this closeness is not clear. Suggestions for using maximum likelihood fitting and Bayesian statistics are mentioned, but it is noted that the model may be deterministic and therefore not suitable for statistical analysis. It is also mentioned that the format of the data is important in determining the best approach for fitting the model.
  • #1
trelek2
Hello,

I've got some data on an epidemic in various locations: the total number of agents and the number killed by the infection after 1 year. This gives me a distribution of the percentages of each population killed by the infection (all of the percentage values are relatively small).

I wrote a mathematical ODE model for the disease spread within a population with 3 free parameters:

p1 - probability of getting infected externally from the environment
p2 - probability of infecting a new agent once at least one is already sick
p3 - once an agent dies, it is replaced with a new one, the probability that the new one is already infected is given by p3.

Now I need to choose values for p1,p2 and p3 so that the model generates data distributed as closely to the original distribution as possible.
The trouble is that I have never done anything like this before and have very little experience with any sort of statistics.

How should I define the original data distribution: as a list of percentages of killed agents, or somehow as a continuous function?

Then should I choose values for p1,p2,p3 by trial and error and run simulations multiple times to also generate distributions of data?

Lastly, is there a proper way of comparing the obtained data with the original set? I've seen somewhere something about distance functions, what would be the best way of implementing this?

Thanks for any advice!
 
  • #2
trelek2 said:
Now I need to choose values for p1,p2 and p3 so that the model generates data distributed as closely to the original distribution as possible.

That goal doesn't quite define a mathematical problem. To define a mathematical problem you must reveal how you intend to measure the "closeness" of the model and the data.

Also, your goal is not necessarily related to Bayesian statistics. A typical goal of using Bayesian statistics would be to find the values p1,p2,p3 that have maximum posterior probability given the data. To do that, you must specify a prior probability distribution for p1,p2,p3.
 
  • #3
Thanks for your reply.

That's what I was mainly asking about - how should I go about measuring the closeness of the data? I don't have ideas for that.

Since I don't know what p1,p2,p3 might be, I could give each of them a uniform prior distribution between 0 and 0.1, since I expect them to be small.
 
  • #4
trelek2 said:
That's what I was mainly asking about - how should I go about measuring the closeness of the data? I don't have ideas for that.

There is no universal law of statistics that tells you how to measure the closeness of a model to data.

In special situations people know what decisions will be made by using a model with particular parameters. Definite economic (or other) rewards and costs can be assigned for "errors" between the model's predictions and actual events.

However, it is more common that people fitting models to data can't specify how the model will be put to practical use. In that case there are certain methods of fitting that are traditional - and tradition isn't any objective justification for the methods being "best".

One traditional method is "maximum likelihood" fitting. You search for the values of p1,p2,p3 that make the data "most likely". If you are dealing with discrete probability distributions, you can say that you search for the values of p1,p2,p3 that make the observed data the most probable sample. If you are dealing with continuous distributions, the word "likelihood" is used instead of "probability" since the probability of picking an exact sample value v from a continuous distribution is usually zero, even if the value of the probability density f(x) of the distribution is not zero at x = v. We can call f(v) a "likelihood", but not a probability.

You can do "maximum likelihood" fitting without using a Bayesian prior, but you can't claim the parameters that maximize the likelihood are "the most probable values of the parameters".
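As a minimal sketch of what maximum likelihood fitting looks like in code, assume (purely for illustration) that every agent in every location dies within the year independently with the same probability q, so the death counts are binomial. The data values and the names `data`, `log_likelihood` and `q_hat` are made up for this example, and the single parameter q only stands in for the p1,p2,p3 of the real model.

```python
import math

# Hypothetical data: (population size, number killed) for each location.
data = [(1000, 12), (500, 4), (2000, 31), (800, 9)]

def log_likelihood(q, data):
    """Binomial log-likelihood of the death counts for death probability q."""
    total = 0.0
    for n, k in data:
        # log of the binomial coefficient C(n, k), computed via log-gamma
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += log_binom + k * math.log(q) + (n - k) * math.log(1.0 - q)
    return total

# Crude grid search over candidate values of q strictly between 0 and 0.1.
grid = [i / 10000 for i in range(1, 1000)]
q_hat = max(grid, key=lambda q: log_likelihood(q, data))

# For this simple model the maximizer is known exactly: total deaths / total agents.
q_closed = sum(k for _, k in data) / sum(n for n, _ in data)
print(q_hat, q_closed)
```

The grid search is only there to show the idea of "search for the parameter values that make the data most likely"; with three parameters and a smooth likelihood, a numerical optimizer would normally replace it.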

If you assume Bayesian prior distributions for the parameters then you can (in theory) compute a posterior distribution for the parameters and pick, as your estimate for the parameters, the value that is "most likely" in the posterior distribution. (Often the "most likely" Bayesian posterior parameters are nearly the same as the parameters that maximize the likelihood of the data, but there is no theorem that this must always be true.)
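In symbols, writing D for the data, L for the likelihood and π for the prior density (notation introduced here for illustration, not taken from your model), the posterior and the "most likely" posterior estimate are:

```latex
\[
  P(p_1,p_2,p_3 \mid D)
  = \frac{L(D \mid p_1,p_2,p_3)\,\pi(p_1,p_2,p_3)}
         {\int L(D \mid q_1,q_2,q_3)\,\pi(q_1,q_2,q_3)\,dq_1\,dq_2\,dq_3}
\]
\[
  (\hat p_1,\hat p_2,\hat p_3)_{\mathrm{MAP}}
  = \arg\max_{p_1,p_2,p_3}\; L(D \mid p_1,p_2,p_3)\,\pi(p_1,p_2,p_3)
\]
```

The denominator does not depend on p1,p2,p3, so it can be ignored when maximizing. If π is constant over the region being searched (for example, uniform on 0 to 0.1 for each parameter), the second line reduces to maximizing L alone over that region, which is one case where the two estimates agree.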

To give more specific advice, we need to know how your model is implemented. How hard would it be to compute the likelihood of the data for a given set of parameters p1,p2,p3? For example, if your model is implemented as a Monte-Carlo simulation, it might be rather hard. Is it possible to compute the likelihood of the data in a deterministic way?
 
  • #5
trelek2 said:
I wrote a mathematical ODE model for the disease spread within a population with 3 free parameters:

It occurs to me that since you used an ODE model, your predictions for the data might be deterministic - i.e. you might predict only one possibility for the data instead of a probability distribution of possibilities. If you are using a deterministic model then probability and statistics can't help you unless you put some probabilistic feature into the scenario. You can assume there is something that introduces error in measuring the data or you can assume there is something that introduces variation in the model.
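To illustrate the first option, here is a rough sketch that assumes the observed values equal a deterministic prediction plus independent Gaussian measurement error of constant variance. Under that assumption, maximizing the likelihood is the same as minimizing the sum of squared errors. The model `predicted_fraction` (a constant per-capita death rate) and the observations are invented for the example and are not your model or your data.

```python
import math

# Hypothetical observations: (time in years, fraction of the population killed).
observations = [(0.25, 0.004), (0.5, 0.007), (0.75, 0.012), (1.0, 0.015)]

def predicted_fraction(t, rate):
    """Toy deterministic model (an assumption): constant per-capita death rate."""
    return 1.0 - math.exp(-rate * t)

def sum_squared_error(rate, observations):
    """Sum of squared differences between observed and predicted fractions."""
    return sum((y - predicted_fraction(t, rate)) ** 2 for t, y in observations)

# Grid search over candidate rates from 0.00001 to 0.09999.
candidates = [i / 100000 for i in range(1, 10000)]
best_rate = min(candidates, key=lambda r: sum_squared_error(r, observations))
print(best_rate, sum_squared_error(best_rate, observations))
```

A different error assumption (for example, binomial counting error instead of Gaussian error) would lead to a different function to minimize, so the choice of error model is part of the modeling.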
 
  • #6
Sorry, it is not an ODE model. That was my initial idea, but I ended up with a Monte Carlo type simulation without a lattice.

That's why I am not sure how to proceed. All I can really do with my current knowledge is run sample simulations using guessed parameters.
 
  • #7
What is the exact format of the data you have? Is it a single value (number of people infected)? Or is it a sequence of values (number of people infected at time t = 1,2,...N)?
 
  • #8
I do have the time points for each of the values.
 
  • #9
trelek2 said:
I do have the time points for each of the values.

So an output of the model is a vector of values. There are probably too many possible output vectors for you to run the model enough times to estimate the probability of each possible output vector occurring.

I think we must get into the details of your model to find a way to fit it to data. At a given time step, what are the variables that define the state of the process? What algorithm produces the next state from the current state? Which of the state variables are recorded in the real data and which are not known from the real data? (For example, you mention "agents". I'd guess the number of agents is a state variable in the model but not something that was measured in the real data.)
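In the meantime, one generic simulation-based approach can be sketched. It is usually called rejection sampling for approximate Bayesian computation, and it was not proposed earlier in this thread: draw parameters from the prior, run the simulation, and keep the draws whose output is within some tolerance of the real data under a chosen distance function. Everything below is an assumption made for illustration: the rules inside `simulate`, the `p_die` parameter, the observed values, the distance (difference of mean fractions killed) and the tolerance. Only the uniform priors on 0 to 0.1 come from post #3.

```python
import random

def simulate(p1, p2, p3, rng, n_agents=100, n_steps=10, p_die=0.1):
    """Toy stand-in for the real simulation; these rules are assumptions."""
    sick = [False] * n_agents
    deaths = 0
    for _ in range(n_steps):
        # External infection always acts; p2 is added once anyone is sick.
        p_infect = p1 + (p2 if any(sick) else 0.0)
        for i in range(n_agents):
            if sick[i]:
                if rng.random() < p_die:
                    deaths += 1
                    # The replacement agent is already infected with probability p3.
                    sick[i] = rng.random() < p3
            elif rng.random() < p_infect:
                sick[i] = True
    return deaths / n_agents

def distance(simulated, observed):
    """One possible distance: absolute difference of mean fractions killed."""
    return abs(sum(simulated) / len(simulated) - sum(observed) / len(observed))

def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep the prior draws whose simulated data lie within the tolerance."""
    accepted = []
    for _ in range(n_draws):
        # Uniform priors on [0, 0.1] for p1, p2, p3.
        params = tuple(rng.uniform(0.0, 0.1) for _ in range(3))
        simulated = [simulate(*params, rng) for _ in observed]
        if distance(simulated, observed) <= tolerance:
            accepted.append(params)
    return accepted

rng = random.Random(0)
observed = [0.10, 0.15, 0.08, 0.12]  # hypothetical fractions killed per location
accepted = abc_rejection(observed, n_draws=300, tolerance=0.05, rng=rng)
print(len(accepted), "of 300 draws accepted")
```

The accepted parameter triples approximate a sample from the posterior, and the approximation improves as the tolerance shrinks. Comparing only the means throws away most of the shape of the distribution; comparing sorted percentages or a Kolmogorov-Smirnov statistic would be alternative distance choices, and which one is appropriate depends on what the fitted model is for.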
 
  • #11
trelek2,

What is your definition of "agent"? In a quick look at Google hits for "epidemic agent", I only see "agent" mentioned as an object in computer software that simulates epidemics. Does "agent" have a medical definition?
 

1. What is Bayesian statistics?

Bayesian statistics is a branch of statistics that uses a mathematical framework to update our beliefs about a certain phenomenon or event based on new evidence or data. It involves using prior knowledge and incorporating new information to make more accurate predictions or inferences.

2. How is Bayesian statistics different from traditional statistics?

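The main difference is that Bayesian statistics treats the parameters as random quantities with a prior distribution that encodes existing knowledge and beliefs, while traditional (frequentist) statistics treats them as fixed unknowns and bases inference on the data alone. This gives Bayesian methods more flexibility in making predictions and inferences when prior information is available.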

3. How do you obtain parameters for a model using Bayesian statistics?

In Bayesian statistics, parameters for a model are obtained by using a combination of prior knowledge and data. The process involves specifying a prior distribution for the parameters, updating it with the data using Bayes' theorem, and obtaining a posterior distribution for the parameters. This posterior distribution can then be used to make inferences and predictions.
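The prior-to-posterior update can be sketched numerically on a grid. This example assumes a single death probability q with a uniform prior on 0 to 0.1 and a binomial likelihood; the totals `deaths` and `agents` are hypothetical and are not taken from the thread.

```python
import math

deaths, agents = 56, 4300  # hypothetical totals, for illustration only

grid = [i / 10000 for i in range(1, 1000)]   # candidate q values in (0, 0.1)
prior = [1.0 / len(grid)] * len(grid)        # uniform prior over the grid

# Unnormalized log-posterior = log-likelihood + log-prior.
# The binomial coefficient is omitted because it does not depend on q.
log_post = [deaths * math.log(q) + (agents - deaths) * math.log(1.0 - q) + math.log(p)
            for q, p in zip(grid, prior)]

shift = max(log_post)                        # subtract the max to avoid underflow
weights = [math.exp(lp - shift) for lp in log_post]
total = sum(weights)
posterior = [w / total for w in weights]     # normalized (Bayes' theorem on the grid)

q_map = grid[posterior.index(max(posterior))]            # posterior mode
q_mean = sum(q * w for q, w in zip(grid, posterior))     # posterior mean
print(q_map, q_mean)
```

The full list `posterior` is the result of the update; the mode and the mean are just two ways to summarize it, and the spread of the posterior quantifies the uncertainty in q.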

4. What are the advantages of using Bayesian statistics?

One of the main advantages of Bayesian statistics is its ability to incorporate prior knowledge and beliefs, which can lead to more accurate predictions. It also allows for the quantification of uncertainty in the parameter estimates, and the ability to update and refine these estimates as new data becomes available.

5. What are some common applications of Bayesian statistics?

Bayesian statistics has a wide range of applications, including in biology, economics, medicine, and social sciences. It is commonly used in fields where there is limited data or where incorporating prior knowledge can lead to more accurate predictions, such as in climate modeling, marketing research, and risk analysis.
