# Probability of finding a k-subset in a d-dimensional random

• A
• derpetermann
In summary, the problem presented is to find the probability of finding a k-subset in a d-dimensional random binary array of length n, where the subset has at least s_1 ones in dimension 1, s_2 ones in dimension 2, ..., s_d ones in dimension d. The total number of ones per dimension is fixed. It is related to Combinatorial and Ramsey Theory, and simulations may be used to find approximate solutions.
derpetermann
For a somwehat simple version of the problem, imagine a group of ten people (n=10), for which we observe two binary attributes (d= 2). One of the attributes tells us if any given person wears glasses (d_1) the other if any given person wears a hat (d_2). We know that in total 5 people wear hats and 6 people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size 4 (k= 4) such that at least three people in the subset wear a hat (s_1 = 3) AND at least four people wear glasses (s_2 = 4)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size k with the above properties. I can't figure, though, how to calculate the probabilities.

The more general definition of the problem is this:

What is the probability of finding a k-subset in a d-dimensional random binary array of length n, such that this subset has at least s_1 ones in dimension 1, s_2 ones in dimension 2, ..., s_d ones in dimension d. The total number of ones per dimension (ts_1 to ts_d) are fixed.

Is there a name to this problem? I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.

Is the problem tractable analytically? If yes does the solution scale to larger d, k and n when solved algorithmically?

Your help is highly appreciated! Thanks a lot in advance!

derpetermann said:
For a somwehat simple version of the problem, imagine a group of ten people (n=10), for which we observe two binary attributes (d= 2). One of the attributes tells us if any given person wears glasses (d_1) the other if any given person wears a hat (d_2). We know that in total 5 people wear hats and 6 people wear glasses, but we don't know who wears what (random binary array). What is the probability of finding a subset of size 4 (k= 4) such that at least three people in the subset wear a hat (s_1 = 3) AND at least four people wear glasses (s_2 = 4)? It's somewhat clear that there are realisations where wearing a hat and wearing glasses are mutually exclusive, such that those who have a hat don't have glasses, and realisations where having glasses and having a hat go together. Thus, for any realisation of the random binary array there might or might not be a subset of size k with the above properties. I can't figure, though, how to calculate the probabilities.

The more general definition of the problem is this:

What is the probability of finding a k-subset in a d-dimensional random binary array of length n, such that this subset has at least s_1 ones in dimension 1, s_2 ones in dimension 2, ..., s_d ones in dimension d. The total number of ones per dimension (ts_1 to ts_d) are fixed.

Is there a name to this problem? I tried to google, but I really did not know what to look for, since the problem definition is quite unwieldy to begin with.

Is the problem tractable analytically? If yes does the solution scale to larger d, k and n when solved algorithmically?

Your help is highly appreciated! Thanks a lot in advance!

Sorry, I know this is vague ( and I will look into it further and get back to you when/if I find something better) but it seems like a Combinatorial question maybe related to Ramsey Theory https://duckduckgo.com/?q=Ramsey+numbers&t=h_&ia=web

Klystron
Are d_1 and d_2 independent?
There are 5 options for the number of people wearing both a hat and glasses, you can calculate the probability separately for each case and then add them.

I don't know about nice analytic solutions when there are too many cases to consider all of them, but simulations should work very well for approximate answers.

## What is the probability of finding a k-subset in a d-dimensional random?

The probability of finding a k-subset in a d-dimensional random depends on the specific distribution of the random variables and the size of the subset. Generally, the probability can be calculated using combinatorics and the binomial distribution formula.

## What is the difference between a k-subset and a d-dimensional random?

A k-subset refers to a set of k elements selected from a larger set, while a d-dimensional random refers to a set of d random variables that are independent and identically distributed. The k-subset is a subset of the d-dimensional random, and the probability of finding the k-subset depends on the distribution of the d-dimensional random.

## How can the probability of finding a k-subset in a d-dimensional random be calculated?

The probability can be calculated using combinatorics and the binomial distribution formula, which takes into account the number of possible combinations of the k elements in the d-dimensional random. The formula for calculating the probability is P = (n choose k) * p^k * (1-p)^(n-k), where n is the number of trials, k is the size of the subset, and p is the probability of success for each trial.

## What factors affect the probability of finding a k-subset in a d-dimensional random?

The main factors that affect the probability are the distribution of the d-dimensional random variables and the size of the subset. In addition, the number of trials and the probability of success for each trial also play a role in determining the overall probability.

## How is the probability of finding a k-subset in a d-dimensional random used in scientific research?

The probability of finding a k-subset in a d-dimensional random is often used in research to determine the likelihood of certain events occurring. It can also be used to analyze data and make predictions about future outcomes. In fields such as statistics, biology, and physics, understanding the probability of finding a k-subset in a d-dimensional random is crucial for making accurate conclusions and decisions based on data.

• Set Theory, Logic, Probability, Statistics
Replies
7
Views
2K
• Set Theory, Logic, Probability, Statistics
Replies
6
Views
5K
• Set Theory, Logic, Probability, Statistics
Replies
1
Views
2K
• Quantum Physics
Replies
6
Views
2K
• Set Theory, Logic, Probability, Statistics
Replies
5
Views
3K
• Set Theory, Logic, Probability, Statistics
Replies
4
Views
4K
• Calculus and Beyond Homework Help
Replies
9
Views
2K
• Calculus and Beyond Homework Help
Replies
7
Views
1K
• Calculus and Beyond Homework Help
Replies
1
Views
1K
• Math Proof Training and Practice
Replies
100
Views
7K