
News Dangers of Using Statistics Wrongly in Scientific Research

  1. Apr 24, 2017 #1

    jedishrfu

    Staff: Mentor

  3. Apr 24, 2017 #2
    Indeed a very interesting article. Thanks for sharing it.
    Yes, I completely agree with you.
     
  4. Apr 24, 2017 #3

    StatGuy2000

    Education Advisor

    There has been an ongoing discussion on the blog of Andrew Gelman, a professor of statistics at Columbia University, regarding p-hacking and deep data dives in general, and in particular the work of Brian Wansink, which the Ars Technica article above refers to.

    Here is one blog post, among many others:

    http://andrewgelman.com/2016/12/15/hark-hark-p-value-heavens-gate-sings/
     
  5. Apr 24, 2017 #4

    Ygggdrasil

    Science Advisor

    FiveThirtyEight has also run a few features about p-hacking. One has a nice interactive tool demonstrating how the same dataset can be p-hacked to support either of two opposite conclusions (https://fivethirtyeight.com/features/science-isnt-broken/#part1). In another, they ran surveys and p-hacked the results to find spurious correlations, such as links between raw tomato consumption and Judaism, or between drinking lemonade and believing Crash deserved to win Best Picture (http://fivethirtyeight.com/features/you-cant-trust-what-you-read-about-nutrition/).

    These are important points to keep in mind when someone makes wild claims about how data mining with artificial intelligence will revolutionize a field or help cure cancer.
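    The effect those FiveThirtyEight demonstrations rely on, that running enough tests on pure noise all but guarantees a "significant" hit somewhere, can be sketched in a few lines (a hypothetical simulation for illustration, not FiveThirtyEight's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_tests, n = 2000, 20, 50  # repeat the whole experiment 2000 times

false_hits = 0
for _ in range(n_trials):
    # 20 independent "hypotheses", every one of them truly null (mean 0)
    data = rng.normal(size=(n_tests, n))
    z = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n))
    if np.any(np.abs(z) > 1.96):  # is ANY test "significant" at alpha = 0.05?
        false_hits += 1

print(f"chance of >= 1 false positive in 20 tests: {false_hits / n_trials:.2f}")
# Theory for exact z-tests: 1 - 0.95**20, roughly 0.64
```

    Testing one hypothesis at the 5% level is fine; silently shopping among twenty and reporting the winner inflates the false-positive rate to around two thirds.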
     
  6. Apr 24, 2017 #5
    In his book "Introduction to Medical Statistics" (second edition), Robert Mould gives an example of the dangers of interpreting correlations. Actual data on the number of storks counted in various towns shows a striking linear correlation with the towns' populations. Finding correlations between seemingly unrelated variables can lead to dangerous conclusions if we do not start with some underlying idea of a possible relationship between the variables. In the case of the stork data, a biologist would know that storks nest on houses, so the stork-population correlation is no surprise.

    P-hacking is statistics bass-ackwards.
     
  7. May 20, 2017 #6

    Stephen Tashi

    Science Advisor

    How do we reconcile the advice "Don't do p-hacking" with advice like "Always graph your data to see what it looks like"? Is this just a matter of accepting perceptions of patterns that we find visually "obvious" and rejecting patterns detected by other means?
     
  8. May 20, 2017 #7

    Ygggdrasil

    Science Advisor

    I would say do the opposite of p-hacking. Analyze your data in multiple ways, and only trust your conclusion if the statistical significance is robust to multiple means of analysis.
     
  9. May 20, 2017 #8

    jedishrfu

    Staff: Mentor

    Folks in data mining do a form of p-hacking when scoring and clumping groups of data, and they must then develop a rationale that explains what they found.

    As an example, analysis of bank customer histories can identify a group of customers likely to leave the bank because they match the profiles of customers who already have. From there you can drill down to see why the two groups are similar and develop marketing plans to stem the loss.

    In contrast, Cornell researchers developed a program that teases out the equations describing a system from measurements alone. It successfully rediscovered the equations of motion of a compound pendulum.

    Some biology researchers did the same thing and got some great equations, but they couldn't publish because they couldn't explain the equations with a plausible theory.
     
  10. Jun 2, 2017 #9

    DrDu

    Science Advisor

    Therefore they adjust their significance levels for multiple testing.
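    The simplest such adjustment is the Bonferroni correction: with m tests, require p < α/m rather than p < α. A minimal sketch (real pipelines often use less conservative methods such as Holm or Benjamini–Hochberg):

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only hypotheses whose p-value clears alpha / (number of tests)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Five tests: taken one at a time, four would look "significant" at alpha = 0.05.
pvals = [0.001, 0.012, 0.030, 0.040, 0.200]
print(bonferroni(pvals))  # only 0.001 < 0.05 / 5 = 0.01 survives
```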
     
  11. Jun 2, 2017 #10

    DrDu

    Science Advisor

    Of course you should look at data to find something interesting. The point is that you shouldn't use the same data to test your hypotheses.
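    In practice that means splitting the data once, up front: hunt for patterns in one half, then run a single pre-committed test on the untouched half. A hypothetical sketch with synthetic null data:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 10))  # 200 samples of 10 pure-noise variables

# Split once, before looking at anything.
explore, confirm = data[:100], data[100:]

# Exploration half: shop freely for the most "interesting" variable.
idx = int(np.argmax(np.abs(explore.mean(axis=0))))

# Confirmation half: one pre-registered test on fresh data.
n = confirm.shape[0]
z = confirm[:, idx].mean() / (confirm[:, idx].std(ddof=1) / np.sqrt(n))
print(f"variable {idx}: confirmation z = {z:.2f}")
# On null data the confirmation test is "significant" only ~5% of the time,
# even though the exploration step looked at all 10 variables.
```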
     