Number of Nodes as Neighbors: Probability Question

  • Thread starter randomuser11
  • Tags: Nodes, Probability
SUMMARY

The discussion centers on calculating how many nodes are reached in an unstructured overlay network where each node randomly selects c neighbors and a search request is flooded for two rounds. The textbook's answer, c × q × (c − 1), scales the naive upper bound c(c − 1) by the probability q that a neighbor of the querying node forwards the message only to nodes that are not themselves neighbors of the querying node. That probability comes from a binomial sum over the c − 1 forwarded messages, each of which lands on one of the c neighbors with probability c/(N − 1), where N is the total number of nodes. For example, with c = 20 and N = 10,000, the flood reaches approximately 365.8 nodes, compared with the upper bound of 380.

PREREQUISITES
  • Understanding of unstructured overlay networks
  • Familiarity with probability theory and binomial coefficients
  • Knowledge of flooding algorithms in network communications
  • Basic grasp of combinatorial mathematics
NEXT STEPS
  • Study the derivation of the probability q in network flooding scenarios
  • Learn about binomial distributions and their applications in network theory
  • Explore advanced concepts in overlay networks, such as structured vs. unstructured designs
  • Investigate the implications of neighbor selection on network efficiency and performance
USEFUL FOR

This discussion is beneficial for computer scientists, network engineers, and students studying distributed systems, particularly those interested in network algorithms and their probabilistic foundations.

randomuser11

Homework Statement


Consider an unstructured overlay network in which every node randomly chooses c neighbors. To search for a file, a node floods a request to its neighbors and requests those to flood the request once more. How many nodes will be reached?


Homework Equations




The Attempt at a Solution


I already have the solution (from the textbook), but it doesn't explain it very well. Can you please explain it to me? I don't understand the probability part (I do understand the c*(c-1) part).

TEXTBOOK SOLUTION:
An easy upper bound can be computed as c*(c-1), but in that case we ignore the fact that neighbors of node P can also be each other's neighbors. The probability q that a neighbor of P will send a message only to nonneighbors of P is 1 minus the probability of sending it to at least one neighbor of P:
q = 1 - \sum\limits_{k=1}^{c-1} \binom{c-1}{k} \left( \frac{c}{N-1}\right)^k \left( 1-\frac{c}{N-1} \right)^{c-1-k}

In that case, this flooding strategy will reach c \times q \times (c-1) nodes. For example, with c = 20 and N = 10,000, a query will be flooded to 365.817 nodes.
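As a sanity check on the textbook's number, here is a minimal Python sketch that evaluates the formula exactly as written. It assumes the textbook's model: each of P's c neighbors forwards to c − 1 further nodes, and each forward independently lands on one of P's c neighbors with probability c/(N − 1). The function names `q_textbook` and `nodes_reached` are our own, not the textbook's.

```python
from math import comb

def q_textbook(c: int, N: int) -> float:
    """q = 1 - sum_{k=1}^{c-1} C(c-1, k) p^k (1-p)^(c-1-k), with p = c/(N-1)."""
    p = c / (N - 1)  # chance that one forwarded message hits a neighbor of P
    hit_at_least_one = sum(
        comb(c - 1, k) * p**k * (1 - p) ** (c - 1 - k) for k in range(1, c)
    )
    return 1 - hit_at_least_one

def nodes_reached(c: int, N: int) -> float:
    """Textbook estimate c * q * (c - 1) for the second round of flooding."""
    return c * q_textbook(c, N) * (c - 1)

c, N = 20, 10_000
print(f"upper bound c(c-1) = {c * (c - 1)}")
print(f"q = {q_textbook(c, N):.4f}")
print(f"c*q*(c-1) = {nodes_reached(c, N):.3f}")
```

This comes out at about 365.8, in line with the textbook's figure and below the upper bound of 380. So the textbook's "365.817" is a decimal (roughly 366 nodes), not 365,817.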


So, that's the textbook's answer... but I'm really confused! I don't understand the probability part. I mean, I understand what it's trying to do, but I'm totally lost on where the numbers and the summation come from. I would REALLY appreciate an explanation of this! I think it's trying to enumerate all combinations and assign each a probability in some way, but I'm lost on how they arrived at this. Our teacher said this was an "almost trivial" problem... and when I submitted my homework, I submitted c \times (c-1), and then I saw this and almost fell over.

Thanks in advance!
 
welcome to pf!

hi randomuser11! welcome to pf! :smile:
randomuser11 said:
The probability q that a neighbor of P will send a message only to nonneighbors of P is 1 minus the probability of sending it to at least one neighbor of P:
q = 1 - \sum\limits_{k=1}^{c-1} \binom{c-1}{k} \left( \frac{c}{N-1}\right)^k \left( 1-\frac{c}{N-1} \right)^{c-1-k}

the probability that the second flood from one node in the first flood will include exactly k nodes from the c-1 other nodes in the first flood (and therefore c-1-k nodes not from the first flood) is

\binom{c-1}{k} \left( \frac{c}{N-1}\right)^k \left( 1-\frac{c}{N-1} \right)^{c-1-k}

isn't it? :wink:
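Summing tiny-tim's term over every k from 0 to c − 1 gives exactly 1, because it is the Binomial(c − 1, p) distribution with p = c/(N − 1). The textbook's sum skips only the k = 0 term, so it collapses to a one-line expression for q. A short derivation (our algebra, not quoted from the textbook):

```latex
p = \frac{c}{N-1}, \qquad
\sum_{k=0}^{c-1} \binom{c-1}{k} p^k (1-p)^{c-1-k} = \bigl(p + (1-p)\bigr)^{c-1} = 1

q = 1 - \sum_{k=1}^{c-1} \binom{c-1}{k} p^k (1-p)^{c-1-k}
  = \binom{c-1}{0} p^0 (1-p)^{c-1}
  = \left(1 - \frac{c}{N-1}\right)^{c-1}

c = 20,\; N = 10\,000: \quad
q = \left(1 - \tfrac{20}{9999}\right)^{19} \approx 0.9627, \qquad
c \times q \times (c-1) \approx 380 \times 0.9627 \approx 365.8
```

In words: q is simply the probability that none of the c − 1 forwarded messages lands on one of P's c neighbors.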
 
I'm confused. :( Why don't we consider nodes that are repeated at the third level, then? Some nodes on the second level may also be neighbors of more than one node on the third level. Do we just ignore that case?

And why wouldn't it be c-2? A node wouldn't send the message back to the node that sent it to them, and it wouldn't send the message to itself. It's just confusing...
 
hi randomuser11! :smile:
randomuser11 said:
why don't we consider nodes that are repeated at the third level then? Because some nodes from the second level may also be neighbors with more than one node on the third level? Do we just ignore that case then?

because that's double-counting

if you count a repeated node the second time, you're counting it twice!

when you count, it's essential to count everything only once! :smile:

so in practice, what you do is count everything (repeats included), then subtract what you counted twice: the ∑ is the probability that a branch contains a repeat, so c(c-1) × ∑ is (roughly) the amount you subtract
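One way to see the subtraction concretely: since q = 1 − ∑, the textbook's c × q × (c − 1) is exactly the naive count c(c − 1) minus c(c − 1) × ∑. Here is a minimal Python sketch of that split for the textbook's example (c = 20, N = 10,000). The names `sigma`, `naive`, and `subtracted` are our own illustrative choices. The split is an algebraic identity, not a claim that exactly that many nodes are duplicates.

```python
from math import comb

c, N = 20, 10_000
p = c / (N - 1)  # probability that one forwarded message hits a neighbor of P

# The textbook's sum: probability that a second-round flood hits >= 1 neighbor of P
sigma = sum(comb(c - 1, k) * p**k * (1 - p) ** (c - 1 - k) for k in range(1, c))

naive = c * (c - 1)             # count everything, repeats included
subtracted = naive * sigma      # the part the formula treats as double-counted
estimate = naive - subtracted   # same value as c * q * (c - 1)

print(f"naive={naive}, subtracted={subtracted:.2f}, estimate={estimate:.2f}")
```

About 14 of the 380 naive deliveries get subtracted, which leaves roughly 365.8.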
And why wouldn't it be c-2 because we wouldn't send a message back to the node that originally sent it ..

yes, you could: the question doesn't say no-backsies! :wink:
 
My point was something like this. The formula seems to be trying to correct for cases like this (p is the top-level node):
    p
   / \
  n - n

This is the case where c = 2 and the neighbors are repeated on the second level. But what about a case like this:

    p
   / \
  n   n
   \ /
    n

It seems like we could also have the problem of repetition at the third level, right (as in the above example)? This seems like a different case than the case at the second level...
Or am I just overthinking it?

Thanks!
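The case in the second diagram, where two of P's neighbors both forward to the same new node, is not something the textbook's q models. To get a feel for how often it happens, here is a rough Monte Carlo sketch under a hypothetical model of our own. P is node 0 and picks c distinct neighbors uniformly at random. Each neighbor then forwards to c − 1 distinct nodes chosen uniformly from every node except itself, so sending back to P or to P's other neighbors is allowed ("no no-backsies"). The functions `flood_overlaps` and `average_overlaps` are illustrative names, not from the textbook.

```python
import random
from collections import Counter

def flood_overlaps(c: int, N: int, rng: random.Random) -> tuple[int, int]:
    """One trial of a hypothetical two-round flood from node P = 0.

    Returns (neighbor_hits, repeats):
      neighbor_hits -- forwards that land on one of P's own neighbors
                       (the overlap the textbook's q accounts for)
      repeats       -- extra deliveries to a new node that was already reached
                       from another neighbor of P (the second diagram above)
    """
    P = 0
    neighbors = set(rng.sample(range(1, N), c))  # P picks c distinct nodes
    new_targets = Counter()
    neighbor_hits = 0
    for n in neighbors:
        # n forwards to c - 1 distinct nodes drawn from every node except n
        for x in rng.sample(range(N - 1), c - 1):
            target = x if x < n else x + 1  # shift past n so n never picks itself
            if target in neighbors:
                neighbor_hits += 1
            elif target != P:
                new_targets[target] += 1
    repeats = sum(count - 1 for count in new_targets.values())
    return neighbor_hits, repeats

def average_overlaps(c: int, N: int, trials: int = 1000, seed: int = 1):
    """Average both overlap counts over many independent trials."""
    rng = random.Random(seed)
    total_hits = total_repeats = 0
    for _ in range(trials):
        hits, reps = flood_overlaps(c, N, rng)
        total_hits += hits
        total_repeats += reps
    return total_hits / trials, total_repeats / trials

hits, reps = average_overlaps(20, 10_000)
print(f"avg forwards hitting P's neighbors: {hits:.2f}")
print(f"avg repeated deliveries to new nodes: {reps:.2f}")
```

Under these assumptions, repeats at the "third level" turn up a few times per query. That is small next to c(c − 1) = 380 but not zero, so the case is real; the textbook's c × q × (c − 1) simply does not account for it.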
 
i honestly don't understand your diagrams :confused:

can you say it in words? :smile:
 
