Data Analysis: Automating Distribution Comparisons from Daily Stats

SUMMARY

The discussion focuses on automating the analysis of daily statistical updates and calculating distributions for a dataset involving 1000 individuals. The user seeks a programming or database language, likely SQL, to facilitate this process. Key requirements include the ability to create distributions for various variables and visually represent individual statistics against these distributions, potentially using color coding. Excel is mentioned as a viable tool for this task, particularly with macros for data importation and statistical calculations.

PREREQUISITES
  • Understanding of SQL for database management and queries.
  • Familiarity with Excel for data manipulation and statistical analysis.
  • Basic knowledge of statistical concepts such as distributions and percentiles.
  • Experience with macros in Excel for automating repetitive tasks.
NEXT STEPS
  • Learn advanced SQL techniques for data aggregation and analysis.
  • Explore Excel's statistical functions and conditional formatting for visual data representation.
  • Research how to implement macros in Excel for automating data import and processing.
  • Investigate statistical software tools like R or Python's Pandas for more complex data analysis.
USEFUL FOR

Data analysts, statisticians, and anyone involved in automating statistical comparisons and visualizing data distributions will benefit from this discussion.

bloynoys
Hey guys, I have a question about how to approach a problem. I am trying to decide which coding/database language to learn next (most likely SQL, with some Access thrown in), but I have a more general question.

I am looking for an application, or a way, to crunch daily statistical updates, calculate distributions from them, and then go back and compare the numbers to those distributions. Say there are 1000 people who do things daily that are measured across many different variables, and these get tabulated and updated every day. I would like to find a program (or write one) that takes this data, creates a distribution for each column variable, and then compares each individual statistic to the overall distribution and color codes it (or marks it in some other way) depending on where it falls in the distribution. For example, white for people in the top 5% of that variable, yellow for the next 15%, and so on. Is there a way to do this that is easier than epically long formulas in Excel and more automated for reading in daily stat updates?

Thanks!
 
All common statistics tools should be able to do that.
If your data is a list of (name, value) pairs, marking the top 5%, 20%, and so on is easy in Excel as well, so you don't need those tools. You could probably import the data with a macro.
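For anyone who would rather script this than use Excel, here is a minimal sketch in plain Python of the banding step described above. It is only one way to do it. The names `BANDS`, `band_for`, and `color_code` are made up for illustration, and the "none" band for everyone below the top 20% is an assumption, since the question only specifies white (top 5%) and yellow (next 15%).

```python
from bisect import bisect_right

# Hypothetical bands: (cumulative top percentage, label).
# "white" = top 5%, "yellow" = next 15% (up to the top 20%);
# "none" for everyone else is an assumption, not from the thread.
BANDS = [(5, "white"), (20, "yellow"), (100, "none")]


def band_for(value, sorted_values):
    """Return the band label for one value, given its column sorted ascending."""
    n = len(sorted_values)
    # Count how many people have a strictly higher value in this column.
    higher = n - bisect_right(sorted_values, value)
    for upper_pct, label in BANDS:
        # Integer comparison avoids floating-point edge cases at band borders.
        if higher * 100 < upper_pct * n:
            return label
    return BANDS[-1][1]


def color_code(table):
    """Map {person: {variable: number}} to {person: {variable: band label}}."""
    variables = {v for row in table.values() for v in row}
    # Build one sorted column per variable; this is the "distribution".
    columns = {
        v: sorted(row[v] for row in table.values() if v in row)
        for v in variables
    }
    return {
        person: {v: band_for(x, columns[v]) for v, x in row.items()}
        for person, row in table.items()
    }


if __name__ == "__main__":
    # Made-up data: 100 people with scores 1..100 in a single variable.
    demo = {f"p{i}": {"score": i} for i in range(1, 101)}
    bands = color_code(demo)
    print(bands["p100"], bands["p90"], bands["p50"])
```

To handle the daily updates, the same function could simply be re-run on each day's refreshed table. Tied values land in the same band because the rank counts only strictly higher values.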
 
