# Number of Nodes as Neighbors: Probability Question

## Homework Statement

Consider an unstructured overlay network in which every node randomly chooses c neighbors. To search for a file, a node floods a request to its neighbors and requests those to flood the request once more. How many nodes will be reached?

## The Attempt at a Solution

I already have the solution (from the textbook), but it doesn't explain it very well. Can you please explain it to me? I don't understand the probability part (I do understand the $c \times (c-1)$).

TEXTBOOK SOLUTION:
An easy upper bound can be computed as $c \times (c-1)$, but in that case we ignore the fact that neighbors of node P can be each other's neighbor as well. The probability $q$ that a neighbor of P will send a message only to nonneighbors of P is 1 minus the probability of sending it to at least one neighbor of P:
$$q = 1 - \sum\limits_{k=1}^{c-1} \binom{c-1}{k} \left( \frac{c}{N-1}\right)^k \left( 1-\frac{c}{N-1} \right)^{c-1-k}$$

In that case, this flooding strategy will reach $c \times q \times (c-1)$ nodes. For example, with $c = 20$ and $N = 10{,}000$, a query will be flooded to 365.817 nodes.

So, that's the textbook's answer... but I'm really confused! I don't understand the probability part. I mean, I understand what it's trying to do, but I'm totally lost on where the numbers and the summation are coming from. I would REALLY appreciate an explanation of this! I think it's trying to enumerate all the combinations and assign each one a probability somehow, but I don't see how they arrived at this. Our teacher said this was an "almost trivial" problem... and when I submitted my homework, I answered $c \times (c-1)$, then I saw this and almost fell over.
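For reference, the sum in the textbook's $q$ is a binomial distribution, and it collapses to a single term. Write $p = c/(N-1)$ for the probability that one forwarded copy lands on one of P's $c$ neighbors ($N$ is presumably the total number of nodes; the textbook excerpt never says so explicitly), and treat a neighbor's $c-1$ forwards as independent trials, which is the assumption behind the binomial. The number of hits $K$ is then binomial with parameters $c-1$ and $p$, so:

$$
\begin{aligned}
\Pr[K = k] &= \binom{c-1}{k} p^k (1-p)^{c-1-k}, \qquad k = 0, 1, \dots, c-1,\\
\sum_{k=0}^{c-1} \Pr[K = k] &= \bigl(p + (1-p)\bigr)^{c-1} = 1,\\
q = 1 - \sum_{k=1}^{c-1} \Pr[K = k] &= \Pr[K = 0] = (1-p)^{c-1},\\
c \times q \times (c-1) &= c(c-1)\left(1 - \frac{c}{N-1}\right)^{c-1}.
\end{aligned}
$$

So $q$ is simply the probability that none of the $c-1$ forwards hits a neighbor of P. With $c = 20$ and $N = 10{,}000$ this gives $380 \times (1 - 20/9999)^{19} \approx 380 \times 0.9627 \approx 365.8$, in line with the textbook's figure.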

tiny-tim
Homework Helper

hi randomuser11! welcome to pf!

> The probability q that a neighbor of P will send a message only to nonneighbors of P is 1 minus the probability of sending it to at least one neighbor of P:
> $$q = 1 - \sum\limits_{k=1}^{c-1} \binom{c-1}{k} \left( \frac{c}{N-1}\right)^k \left( 1-\frac{c}{N-1} \right)^{c-1-k}$$

the probability that the second flood from one node in the first flood will include exactly k nodes from the c-1 other nodes in the first flood (and therefore c-1-k nodes not from the first flood) is

$$\binom{c-1}{k} \left( \frac{c}{N-1}\right)^k \left( 1-\frac{c}{N-1} \right)^{c-1-k}$$

isn't it?

randomuser11

I'm confused. :( Why don't we consider nodes that are repeated at the third level, then? After all, some nodes on the second level may share a neighbor on the third level, right? Do we just ignore that case?

And why wouldn't it be c-2? We wouldn't send a message back to the node that originally sent it, and we wouldn't send a message to ourselves, so shouldn't it be c-2? It's just confusing...

tiny-tim
Homework Helper
hi randomuser11!

> Why don't we consider nodes that are repeated at the third level, then? After all, some nodes on the second level may share a neighbor on the third level, right? Do we just ignore that case?

because that's double-counting

if you count a repeated node the second time, you're counting it twice!

when you count, it's essential to count everything only once! so in practice, what you do is you count them twice, and subtract the number that you counted twice: the ∑ is the number you counted twice
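To put numbers on "subtract the number you counted twice", here is a small Python sketch (the function names are hypothetical) that evaluates the textbook's sum term by term, checks it against the closed form $(1 - c/(N-1))^{c-1}$, and reports how much the correction removes from the naive $c(c-1)$. Strictly speaking the ∑ is a probability rather than a count: it is the chance that a given neighbor's forwards hit at least one of P's neighbors, and multiplied by $c(c-1)$ it gives the expected number of forwards the textbook discards. It assumes, as the formula does, a hit probability of $c/(N-1)$ per forward.

```python
from math import comb


def textbook_q(c: int, N: int) -> float:
    """The textbook's q, summed term by term over k = 1 .. c-1."""
    p = c / (N - 1)  # assumed chance that one forward hits a neighbour of P
    at_least_one_hit = sum(
        comb(c - 1, k) * p**k * (1 - p) ** (c - 1 - k) for k in range(1, c)
    )
    return 1 - at_least_one_hit


def closed_form_q(c: int, N: int) -> float:
    """The k = 0 binomial term: none of the c-1 forwards hits."""
    return (1 - c / (N - 1)) ** (c - 1)


def flood_reach(c: int, N: int) -> tuple[int, float, float]:
    """Naive bound c*(c-1), the textbook's estimate c*q*(c-1), and the
    amount the correction subtracts from the naive bound."""
    naive = c * (c - 1)
    estimate = naive * textbook_q(c, N)
    return naive, estimate, naive - estimate


if __name__ == "__main__":
    c, N = 20, 10_000
    print(textbook_q(c, N), closed_form_q(c, N))  # the two should agree
    naive, estimate, removed = flood_reach(c, N)
    print(naive, round(estimate, 1), round(removed, 1))
```

For $c = 20$ and $N = 10{,}000$ the estimate comes out at about 365.8, so the correction only trims roughly 14 forwards off the naive 380.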

> And why wouldn't it be c-2? We wouldn't send a message back to the node that originally sent it...

yes, you could: the question doesn't say no-backsies!

randomuser11

My point was something like this. The formula seems to be trying to correct for cases like this (p is the top-level node):

      p
     / \
    n - n

This is the case where c = 2 and the neighbors are repeated on the second level.

But what about a case like this:

      p
     / \
    n   n
     \ /
      n

It seems like we could also have repetition at the third level (as in the example above), right? This seems like a different case from the one at the second level...
Or am I just overthinking it?

Thanks!
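Whether the extra repetition matters can be checked empirically. The Python sketch below is a Monte Carlo simulation under a hypothetical model (the node numbering, the uniform random choices, and the name `simulate_flood` are all assumptions, not from the textbook): P is node 0, it picks $c$ distinct neighbors, and each neighbor forwards to $c-1$ distinct random nodes other than itself and P. Every forward is sorted into one of three buckets: a hit on one of P's neighbors (the overlap in the first diagram, which the textbook's $q$ handles), a repeat of a node another neighbor already reached (the overlap in the second diagram), or a genuinely new node.

```python
import random


def simulate_flood(c: int, N: int, trials: int = 1000, seed: int = 0):
    """Monte Carlo sketch of the two-hop flood from P (hypothetical model).

    Nodes are 0..N-1 and P is node 0. P picks c distinct neighbours
    uniformly at random; each neighbour forwards to c-1 distinct nodes
    chosen uniformly from everyone except itself and P.

    Returns per-trial averages of:
      new:     distinct nodes reached by the second flood that are not
               P's neighbours,
      hits:    forwards that landed on one of P's neighbours,
      repeats: forwards that landed on a node already reached by an
               earlier forward from a different neighbour.
    Every forward falls in exactly one bucket, so in each trial
    new + hits + repeats == c * (c - 1).
    """
    if c - 1 > N - 2:
        raise ValueError("need N >= c + 1 so the choices are possible")
    rng = random.Random(seed)
    new_sum = hits_sum = repeats_sum = 0
    for _ in range(trials):
        neighbours = set(rng.sample(range(1, N), c))
        reached = set()  # distinct new nodes reached so far in this trial
        for n in neighbours:
            targets = set()
            while len(targets) < c - 1:
                t = rng.randrange(1, N)  # never P, since P is node 0
                if t != n:  # a node does not forward to itself
                    targets.add(t)
            for t in targets:
                if t in neighbours:
                    hits_sum += 1
                elif t in reached:
                    repeats_sum += 1
                else:
                    reached.add(t)
        new_sum += len(reached)
    return new_sum / trials, hits_sum / trials, repeats_sum / trials


if __name__ == "__main__":
    c, N = 20, 10_000
    new, hits, repeats = simulate_flood(c, N)
    print(f"new distinct nodes ~ {new:.1f} (upper bound {c * (c - 1)})")
    print(f"hits on P's neighbours ~ {hits:.2f}, repeats ~ {repeats:.2f}")
```

Because every forward lands in exactly one bucket, the three averages always add up to $c(c-1)$. Note that the textbook's $c \times q \times (c-1)$ treats hits differently (it drops all $c-1$ forwards of any neighbor that hits one of P's neighbors at least once) and does not subtract repeats at all, so the simulated count and 365.817 need not agree.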

tiny-tim
i honestly don't understand your diagrams: can you say it in words?