PainterGuy said:
How is such a situation is handled in such scenarios
I don't know what you mean specifically by "handled". Are you asking how to compute a sampling distribution?
To talk about a sampling distribution, you must first specify what statistic is being sampled. One can compute statistics from a sample in various ways. For example, from a sample of 5 persons, one could record the height of the tallest of those 5 persons (an example of "order statistics"). Or one could compute the average height of those 5 persons, or one could compute how many of those 5 persons wear suspenders.
As I understand your example, we have two investigators, I1 and I2. They each draw a sample of 5 persons from a population of 30 as follows. I1 selects 1 person from the 30. Then I2 selects 1 person from the 29 persons remaining. The process is repeated on the original population of 30 until each investigator has selected 5 people.
Intuitively, I see no reason why this changes the probability of I2 getting a particular sample. For example, assume the persons are designated by numbers 1,2,...,30. When I2 does random sampling with replacement by himself, there is a certain probability that I2 selects a multi-set such as {1,1,15,18,25}. Does the probability of getting that multi-set change when I1 is picking a person before I2 picks a person?
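To make that intuition concrete, here is a minimal sketch of one round of the scheme. It assumes (my reading of the example, not something stated in it) that I1 picks uniformly from the 30, I2 picks uniformly from the remaining 29, and both persons go back into the population before the next round. The function name `i2_pick_probability` is just for illustration.

```python
from fractions import Fraction

N = 30  # population size, persons labelled 1..30

def i2_pick_probability(j, n=N):
    """Exact probability that I2 picks person j in one round when I1 picks first."""
    total = Fraction(0)
    for i in range(1, n + 1):       # i = person taken by I1, probability 1/n each
        if i != j:                  # I2 can only pick j if I1 did not take j
            total += Fraction(1, n) * Fraction(1, n - 1)
    return total

probs = [i2_pick_probability(j) for j in range(1, N + 1)]
print(probs[0])  # 1/30
```

Under those assumptions every person has probability 1/30 of being I2's pick in a round, the same as if I2 sampled alone. If the rounds are also independent, I2's 5 picks have the same distribution as 5 draws with replacement.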
and which one of the two, 'sampling with replacement' or 'sampling without replacement', should be chosen as a more practical and sensible methodology?
There is no general rule.
I got the correct number now. But how do interpret the result.
The result is ##Pr(X_1 + X_2 = 4)##.
In post #11 the following was said.
...
The way you are forming questions in post #11 is incoherent because it fails to distinguish between the concept of data and the concept of a probability distribution.
In post #11 you are referring to the collection of data. The calculation you are asking about is done from the values in a probability distribution. To repeat advice from previous posts, a set of data representing samples from a probability distribution is not the same thing as the probability distribution.
A given binomial distribution such as ##B(70,p)## can be used to model different situations. ##B(70,p)## might model the probability that there are ##k## latenesses among 70 employees on 1 day. It might also model the lateness of 1 employee over a period of 70 days. Or it might model the total latenesses of 5 employees over 14 days.
One interpretation of the probability distribution ##B(5,p)## is that it gives the probability that ##k## employees will be late on 1 given day. The possible values of ##k## are 0,1,2,3,4,5.
If you are talking about 2 weeks of data about lateness, then we consider both ##k##, the number of employees late on a given day, and also ##d##, the number of days where ##k## employees were late. The possible values of ##d## are 0,1,2,...,14. The data concerns the frequency of latenesses. Frequencies are not the same concept as probabilities.
Apparently you are imagining that 2 weeks worth of data are taken and that the data is used to estimate a probability distribution. Ignoring the order of the 14 days, two weeks worth of data can be represented by a set of pairs of numbers of the form ##(k,d)##. It might turn out that the data consists of one pair of numbers such as (2,14). Or it might turn out that we need as many as 6 pairs of numbers such as {(0,7),(1,1),(2,1),(3,1),(4,1),(5,3)}. That data could be used to estimate the probability for ##k## employees being late on 1 given day. The procedure for estimation can be done in different ways.
If we take ##B(5,p)## as the probability distribution for ##k## employees being late on 1 given day and assume each day is an independent experiment, then we can compute the probability distribution for ##x## total latenesses in ##D## total days. This is done by taking the "D-fold convolution" of ##B(5,p)##. Because of the special nature of the binomial distribution, the D-fold convolution of ##B(5,p)## is ##B(5D,p)##. Those calculations have nothing to do with specific data.
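As a numerical check of that convolution claim, here is a rough sketch in plain Python. The value ##p = 0.2## is an arbitrary choice for illustration, not something taken from your data, and the helper names are my own.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of 0..n successes for B(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """Distribution of the sum of two independent counts with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def d_fold_convolution(pmf, d):
    """Distribution of the sum of d independent copies of pmf."""
    result = [1.0]  # point mass at 0, the identity for convolution
    for _ in range(d):
        result = convolve(result, pmf)
    return result

p = 0.2  # hypothetical probability that one employee is late on one day
D = 14   # number of days

total = d_fold_convolution(binomial_pmf(5, p), D)  # 14-fold convolution of B(5,p)
direct = binomial_pmf(5 * D, p)                    # B(70,p) computed directly
max_diff = max(abs(a - b) for a, b in zip(total, direct))
print(len(total), max_diff)
```

The two lists agree up to floating-point rounding, which is what the statement "the D-fold convolution of ##B(5,p)## is ##B(5D,p)##" says. No observed data enters the calculation anywhere.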
The binomial distribution is special. Let ##f(K)## be a non-binomial distribution for the probability that ##k = 0,1,2,...,K## employees are late on 1 given day. It is not, in general, true that the D-fold convolution of ##f(K)## is ##f(KD)##. Constraints about the total number of employees and the total number of days might be critical if we use a non-binomial distribution to model what happens on 1 day.
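For a concrete counterexample, take ##f(K)## to be the uniform distribution on 0,1,...,K. That choice is mine, purely for illustration. The sketch below convolves the uniform distribution on 0..5 with itself and compares the result with the uniform distribution on 0..10.

```python
from fractions import Fraction

def uniform_pmf(K):
    """Hypothetical f(K): each of 0..K is equally likely."""
    return [Fraction(1, K + 1)] * (K + 1)

def convolve(a, b):
    """Distribution of the sum of two independent counts with pmfs a and b."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

K, D = 5, 2
conv = convolve(uniform_pmf(K), uniform_pmf(K))  # 2-fold convolution of f(5)
same_family = uniform_pmf(K * D)                 # f(10)

print(conv == same_family)  # False
print(conv[0], conv[5])     # 1/36 1/6
```

The sum over 2 days has a triangular distribution (a total of 5 is six times as likely as a total of 0), while ##f(10)## would make every total from 0 to 10 equally likely. So for this ##f## the shortcut that works for the binomial fails.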