(Complicated?) Probability problem

KStolen · Feb 15, 2012

I've got a problem I'd like to solve and I haven't done any probability in years.

Here's the problem.

There are two M*M grids of squares. An N*N mask is placed randomly on the second grid, such that it covers at least one square, but up to N*N squares. (i.e, the majority of the mask may lie outside the grid)

If the covered squares are then copied into the corresponding squares in the first grid, then n is the average number of times I will have to perform this step to have covered at least 50% of the first grid with squares from the second grid.

What is n in terms of N and M?

M > N > 0

I don't even know where to begin. Taking repeating squares into account seems especially difficult. Anyone have any tips for this?

KStolen · Feb 16, 2012

Anyone?

Stephen Tashi · Feb 16, 2012

Tell us whether this is a textbook problem - then we'll have more hope! Offhand it looks like a very hard problem.

I assume you want some kind of formula as an answer. The practical answer would be to do Monte-Carlo simulations.

One suggestion (I won't call it a "tip" because I don't know it would work) is to analyze the problem in the style of induction. Assume you know the formula for an n by n square and see how that answer relates to the formula for an (n+1) by (n+1) dimensional square.

Ray Vickson · Feb 16, 2012

KStolen said:

I've got a problem I'd like to solve and I haven't done any probability in years.

Here's the problem.

There are two M*M grids of squares. An N*N mask is placed randomly on the second grid, such that it covers at least one square, but up to N*N squares. (i.e, the majority of the mask may lie outside the grid)

If the covered squares are then copied into the corresponding squares in the first grid, then n is the average number of times I will have to perform this step to have covered at least 50% of the first grid with squares from the second grid.

What is n in terms of N and M?

M > N > 0

I don't even know where to begin. Taking repeating squares into account seems especially difficult. Anyone have any tips for this?

The general case is difficult, so let’s look at the simplest case, with N = 1, where the geometry of the currently-covered squares does not matter. At any stage if k squares are covered (i.e., already selected) then the probability that a new square will be covered next is p(k) = 1 – k/M^2, while the probability that we just re-cover an existing square is q(k) = k/M^2. This is a Markov chain with states 0, 1, 2, …, M^2 (representing the number of currently-covered squares). At time 0 we start in state 0, then move to state 1 with probability 1 at time 1. From state 1, a time-step either takes us back to state 1 again (probability = q(1)) or takes us to state 2 (probability = p(1)). From state 2 we stay in state 2 with probability q(2) or move to state 3 with probability p(2), and so forth. Given a particular K (say near M^2/2) you want the expected number of steps needed to reach state K, starting from state 0. This is the expected first-passage time, and can be found by a fairly straightforward recursive procedure, as follows. Let x(j) = expected first passage-time from state j < K to state K. We have x(K-1) = 1 + q(K-1)*x(K-1), so x(K-1) = 1/p(K-1) [remember: 1-q(K-1) = p(K-1)]. For j < K-1 we have x(j) = 1 + q(j)*x(j) + p(j)*x(j+1), hence x(j) = [1 + p(j)*x(j+1)]/p(j) = x(j+1) + 1/p(j). So, from x(K-1) we can easily get x(K-2), then can get x(K-3),… and eventually get x(0), which is the answer we want. With a bit of work we get Answer = x(0) = 1 + 1/p(1) + 1/p(2) + 1/p(3) + ... + 1/p(K-1). [You could also get this by recognizing that the number of steps needed to reach a new square has a geometric distribution, so we need the expected value of independent but non-identically-distributed geometric random variables.]

Even the case N = 2 is much, much more difficult. Now not only is the area of already-covered squares important, but also their “geometry” (that is, the shape of the figure already covered). It could be that the problem is sufficiently hard that the most reasonable approach to it would be to do Monte-Carlo simulation (basically, an experimental method).

RGV

KStolen · Feb 16, 2012

Tell us whether this is a textbook problem - then we'll have more hope! Offhand it looks like a very hard problem.

Unfortunately it's not a textbook problem - it came up in a programming project.

Thanks to both of you - Ray and Stephen. I hadn't realized it was so difficult for the general case. I had hoped for a formula into which I could plug in my values, instead of having to determine the value of n experimentally. I'm able to simulate it fairly easily, so I suppose that will have to do.

Thanks again :)

pbuk · Feb 16, 2012

I started off by thinking the analytical soltion was highly complicated, requiring the analysis of many different branching possibilities. But now I wonder...

The NxN square can be laid in a total of (M + N - 1)² positions.

Each cell of the MxM square is covered by a total of N² different positions of the NxN square.

The probability of any individual cell of the MxM square being covered in a single iteration is therefore
N² / (M + N - 1)².

After n iterations, the probability of any individual cell NOT being covered is therefore
(1 - N² / (M + N - 1)²)ⁿ.

When a fraction p of the cells are covered, 1 - p are not covered so we have
1 - p = (1 - N² / (M + N - 1)²)ⁿ

n = log(1 - p) / log(1 - (N / (M + N -1))²)

If you are going to do some Monte Carlo simulation I would love to so how it compares.

pbuk · Feb 17, 2012

Hmmm, don't need Monte Carlo simulation, it's fairly easy to enumerate the (M + N -1)² possible placings of the tile as MxM matrices and then simply add matrices together n times for a total of (M + N - 1)^k branches where k = 2ⁿ.

The formula (1 - (N / (M + N - 1))²)ⁿ gives the expected value of the proportion of cells not covered after n placings. Taking M = 5, N = 3 and n=3 I have verified that the mean number of cells not covered is 64000/117649 (note 117649=49² ) exactly as given by the formula.

Looks like my intuition about scale/translation invariance was correct.

Ray Vickson · Feb 17, 2012

MrAnchovy said:

Hmmm, don't need Monte Carlo simulation, it's fairly easy to enumerate the (M + N -1)² possible placings of the tile as MxM matrices and then simply add matrices together n times for a total of (M + N - 1)^k branches where k = 2ⁿ.

The formula (1 - (N / (M + N - 1))²)ⁿ gives the expected value of the proportion of cells not covered after n placings. Taking M = 5, N = 3 and n=3 I have verified that the mean number of cells not covered is 64000/117649 (note 117649=49² ) exactly as given by the formula.

Looks like my intuition about scale/translation invariance was correct.

Your formula does not give the correct result for n (the mean number of steps until a certain fraction of cells are covered) in the case of n = 1, although it *does come close* numerically. Given M = m^2 cells, if we want to cover K cells, the expected number of steps we need is
S = sum(M/(M-k),k=0..K). Basically, the n = 1 problem is much like the classical Coupon Collector's Problem, but instead of asking for how many coupons we need to have a complete set we merely ask for an incomplete set (K out of M). For more on the coupon collector's problem, see http://en.wikipedia.org/wiki/Coupon_collector's_problem .

RGV

pbuk · Feb 17, 2012

Ah yes, I have answered a different question - thanks for pointing it out.

I have determined the number of steps required to achive a mean coverage of 50% of the cells, not the mean number of steps required to cover 50% of the cells.

Oops!

KStolen · Feb 19, 2012

Ah yes, I have answered a different question - thanks for pointing it out.

I have determined the number of steps required to achive a mean coverage of 50% of the cells, not the mean number of steps required to cover 50% of the cells.

Oops!

Hey, thanks for the reply. Although I initially thought that I wanted the mean number of steps required to cover 50% of the cells, I think the number of steps required to achive a mean coverage of 50% of the cells might be better.

If n is the mean number of steps required to cover 50% of the cells and
o is the number of steps required to achieve a mean coverage 50% of the cells,

then would I be correct in saying that in several iterations of o steps, the number of cells covered in each step would be closer to 50% than in several iterations of n steps?

(Complicated?) Probability problem

What is probability?

What is a complicated probability problem?

How do you approach a complicated probability problem?

What are some common mistakes to avoid when solving a complicated probability problem?

How can I improve my skills in solving complicated probability problems?

Similar threads

Hot Threads

Recent Insights