QuArK21343 said:
It is known that a process occurs with a frequency of 10 events per second and a second process with a frequency of 12 per second. Consider the two situations:
1. It is possible to repeat the experiment, which consists in counting the processes over a period of 10 seconds, N times.
2. It is possible to take a single measurement of the number of processes over a time T arbitrarily large.
How should I choose N and T in order to claim, with a confidence level of 95%, that the two processes indeed occur at different rates?
Hey QuArK21343 and welcome to the forums.
For 1., as long as you collect data at the resolution of your process, you should be OK. In other words, if you are guaranteed to have at most one event per (sufficiently small) interval of time, then counting events is fine. If you can't guarantee this, or can't guarantee that events are independent of one another, then you should not use a Poisson distribution, which is the common model for counts and rates. Are you familiar with the Poisson distribution and how its parameter is interpreted?
I'm not exactly sure what you mean by number 2. Could you explain physically what you are doing in terms of data collection and what specifically that data is referring to?
In terms of showing that the two processes are different, that is a different issue.
The first thing you need to do is figure out what constraints you have on the model (not necessarily on the parameters). The fewer constraints you impose, the more general the model is and the harder it will be to make a specific assertion.
If your processes seem to fit or are at least well approximated by a Poisson distribution then that becomes your first constraint. This alone makes a huge simplification in terms of analysis.
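One quick sanity check on the Poisson assumption is to compare the sample variance of your counts with their sample mean, since the two should be roughly equal for Poisson data. Here is a minimal sketch using simulated counts; the helper names (`simulate_count`, `dispersion_index`), the seed, and the 2000 repetitions are just assumptions for illustration, not anything from your experiment.

```python
import random
import statistics


def simulate_count(rate, duration, rng):
    """Count events of a Poisson process in [0, duration] by summing
    exponential waiting times with the given rate (events per second)."""
    t = rng.expovariate(rate)
    count = 0
    while t <= duration:
        count += 1
        t += rng.expovariate(rate)
    return count


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson counts."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(0)  # fixed seed so the run is repeatable
# 2000 simulated 10-second windows of a 10 events/second process
counts = [simulate_count(10.0, 10.0, rng) for _ in range(2000)]

print("mean count:", statistics.mean(counts))
print("variance / mean:", dispersion_index(counts))
```

With real data you would pass your measured counts to `dispersion_index`. A ratio far from 1 would suggest that the Poisson model is not a good fit.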
The next thing is whether you want to assume that these two processes are independent of one another, in that the data from one process is independent of the data from the other. That is to say, nothing that happens in one process has any bearing on the other. If the two groups of data have some dependency, this will change things a lot and you will need to alter your model and analytic techniques to take it into account.
So if you have two Poisson distributions with no dependency between one set of data and the other (corresponding to the two processes), then you move on to a statistical test of whether the two distributions (which under the above assumptions are independent Poisson) have the same rate or significantly different rates.
This means that, for a given confidence level, the sample sizes of both datasets, and the data itself, you do a hypothesis test of whether both processes have the same parameter. If the test does not reject equality, the data are consistent with a single common rate (so you have no evidence for your claim). If it does reject equality, you have evidence that the processes occur at different rates (which supports your claim).
Now a Poisson distribution has the property that the mean is equal to the variance, which means you only have to test that one parameter. As for the hypothesis test, I would need to look up the assumptions, but if you have enough data I would think that a t-test should be appropriate. This follows from the central limit theorem, which says that with enough data the distribution of your sample mean, standardized by the true mean and its standard error, approaches a normal distribution. Also, just for clarification, the t-test should be a two-sample test.
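To make the two-sample idea concrete, here is a rough sketch that simulates N = 30 ten-second counts for each process and computes a Welch-style test statistic. Note the assumptions: the data are simulated rather than measured, N = 30 is an arbitrary choice, and the p-value uses a normal approximation instead of the t distribution (a proper t-test would use the t distribution with Welch's degrees of freedom). The helper names are made up for the example.

```python
import math
import random
import statistics
from statistics import NormalDist


def simulate_count(rate, duration, rng):
    """Count events of a Poisson process in [0, duration] by summing
    exponential waiting times with the given rate (events per second)."""
    t = rng.expovariate(rate)
    count = 0
    while t <= duration:
        count += 1
        t += rng.expovariate(rate)
    return count


def two_sample_test(a, b):
    """Return (statistic, two-sided p-value) for the difference in means,
    using unequal-variance standard errors and a normal approximation."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    stat = (statistics.mean(b) - statistics.mean(a)) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


rng = random.Random(1)  # fixed seed so the run is repeatable
slow = [simulate_count(10.0, 10.0, rng) for _ in range(30)]  # 10 events/s
fast = [simulate_count(12.0, 10.0, rng) for _ in range(30)]  # 12 events/s

stat, p_value = two_sample_test(slow, fast)
print("statistic:", stat)
print("p-value:", p_value)
print("different at the 95% level:", p_value < 0.05)
```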
The actual determination of N then comes down to using tables: for a specific level of confidence you look up the critical value, which depends on the value of N for both datasets.
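As a back-of-the-envelope sketch of how N and T could be estimated, assume both processes are Poisson and independent, that each is observed for the same total time T, and that the normal approximation holds. The estimated rate (count divided by T) then has variance rate/T, so the difference of the two estimated rates has standard deviation sqrt((r1 + r2)/T). Requiring the expected difference |r1 - r2| to be at least z standard deviations gives T >= z^2 (r1 + r2) / (r1 - r2)^2. The function names below are made up for the example.

```python
import math
from statistics import NormalDist


def min_total_time(rate_a, rate_b, confidence=0.95):
    """Smallest observation time (seconds, per process) at which the expected
    rate difference equals the two-sided critical value, under a normal
    approximation to the Poisson counts."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z ** 2 * (rate_a + rate_b) / (rate_a - rate_b) ** 2


def min_repetitions(rate_a, rate_b, window, confidence=0.95):
    """Number of fixed-length windows needed to reach that total time."""
    return math.ceil(min_total_time(rate_a, rate_b, confidence) / window)


T = min_total_time(10.0, 12.0)         # roughly 21.1 seconds
N = min_repetitions(10.0, 12.0, 10.0)  # 3 windows of 10 seconds

print("T >=", T, "seconds")
print("N >=", N)
```

Keep in mind this only marks the point where the expected difference just reaches the critical value, so the chance of actually seeing a significant result there is only about 50%. For a reliable detection you would want T (or N) comfortably larger, and with N this small the normal approximation to the t-test is crude.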