Understanding Probability Distributions for Generating Random Data

  • Context: Undergrad 
  • Thread starter: EliteLegend
  • Tags: Numbers, Random
SUMMARY

This discussion starts from using the Pareto distribution's probability density function (pdf) to generate datasets in which items are selected with specified probabilities. The user asks how to make a designated 40% of a set of 50 items (20 items) account for 60% of the selections. The proposed method duplicates items in proportion to their required probabilities and then draws uniformly from the enlarged set. Equivalently, each of the m favoured items out of n is selected with probability y/m and each remaining item with probability (1-y)/(n-m), where y is the favoured group's share of the draws.

PREREQUISITES
  • Understanding of Pareto Distribution and its probability density function (pdf)
  • Familiarity with basic probability concepts and item selection techniques
  • Knowledge of dataset manipulation and item duplication methods
  • Experience with random sampling techniques in data generation
NEXT STEPS
  • Research the mathematical foundations of the Pareto Distribution and its applications
  • Learn about advanced random sampling techniques, including stratified sampling
  • Explore programming libraries for generating random data, such as NumPy in Python
  • Investigate methods for simulating probability distributions in data science
USEFUL FOR

Data scientists, statisticians, and software developers interested in generating random datasets with specific probability distributions will benefit from this discussion.

EliteLegend
I recently came across the 80/20 rule, and I am using the Pareto distribution's pdf to generate a dataset I need. Now, if I have a set of 50 items and I need to generate 40% of these items 60% of the time, how should I go about it? I know how to select items with given probabilities, but this task is confusing me. Does anyone have any input, please?
 
One way is to create duplicate (or multiple) copies of items until each item's share of the enlarged set matches the desired proportion, then select uniformly at random from that set.

For example, if I had 2 items {x, y} and wanted to obtain x about 67% of the time (2/3), I'd duplicate x once and make draws from the set {x, x, y}.
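
The duplication idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `build_pool` and the `{item: copies}` dictionary layout are made up for this sketch, and the fixed seed is there only to make the run repeatable.

```python
import random
from collections import Counter

def build_pool(copies):
    """Expand {item: number_of_copies} into a flat list with duplicates."""
    pool = []
    for item, count in copies.items():
        pool.extend([item] * count)
    return pool

# x appears twice and y once, so a uniform draw returns x with probability 2/3.
pool = build_pool({"x": 2, "y": 1})

rng = random.Random(0)  # fixed seed, only so the run is repeatable
draws = [rng.choice(pool) for _ in range(30_000)]
freq_x = Counter(draws)["x"] / len(draws)
print(pool, round(freq_x, 3))
```

The observed frequency of x should land close to 2/3. Duplication only works directly when the target probabilities are ratios of small integers; otherwise the pool can become large.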
 
To select one of m designated items, out of n items in total, a proportion y of the time (here y is a proportion, not the item y from the example above), select each of the m items with probability y/m and each of the remaining n-m items with probability (1-y)/(n-m).
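
Applied to the original question (n = 50, m = 20, y = 0.6), the rule above might look like the sketch below. It assumes the favoured items are simply the first m in the list and uses integer labels as stand-ins for real items. The function name `selection_weights` is hypothetical, and `random.choices` from the standard library does the weighted draw in place of explicit duplication.

```python
import random

def selection_weights(n, m, y):
    """Per-item probabilities: the first m items share y, the other n - m share 1 - y."""
    if not (0 < m < n) or not (0 <= y <= 1):
        raise ValueError("need 0 < m < n and 0 <= y <= 1")
    return [y / m] * m + [(1 - y) / (n - m)] * (n - m)

n, m, y = 50, 20, 0.6   # 40% of 50 items is 20 items, chosen 60% of the time
items = list(range(n))  # placeholder labels 0..49; items 0..19 are the favoured group
weights = selection_weights(n, m, y)

rng = random.Random(1)  # fixed seed, only so the run is repeatable
draws = rng.choices(items, weights=weights, k=50_000)
share_favoured = sum(d < m for d in draws) / len(draws)
print(weights[0], weights[-1], round(share_favoured, 3))
```

Each favoured item gets probability 0.6/20 = 0.03 and each other item 0.4/30 ≈ 0.0133, a ratio of 9:4. The same result follows from duplication: 9 copies of each favoured item and 4 copies of each other item give 180 + 120 = 300 entries, of which 180/300 = 60% are favoured.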
 