What is Statistics: Definition and 999 Discussions

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.
When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.
Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.
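As a rough illustration of the descriptive measures named above, the following sketch computes two measures of central tendency and two of dispersion using Python's standard `statistics` module. The data values are invented for illustration and do not come from any thread on this page.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency (location)
mean = statistics.mean(sample)
median = statistics.median(sample)

# Dispersion (variability)
sample_sd = statistics.stdev(sample)        # divides by n - 1 (data treated as a sample)
population_sd = statistics.pstdev(sample)   # divides by n (data treated as the whole population)

print(mean, median, sample_sd, population_sd)
```

The choice between `stdev` and `pstdev` depends on whether the data are treated as a sample drawn from a larger population or as the entire population of interest.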
A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected giving a "false positive") and Type II errors (null hypothesis fails to be rejected and an actual relationship between populations is missed giving a "false negative"). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also occur. The presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems.
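The two error types can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: it uses a two-sided z-test with a known standard deviation, and the sample size, effect size, number of trials, and function names (`z_test_rejects`, `rejection_rate`) are all hypothetical choices. It estimates how often the null hypothesis is rejected when it is true (Type I error rate) and how often it fails to be rejected when it is false (Type II error rate).

```python
import random
from statistics import NormalDist, fmean


def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, trials=2000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a false positive (Type I error).
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false here, so every failure to reject is a false negative (Type II error).
power = rejection_rate(true_mean=0.5)
type_ii_rate = 1 - power

print(type_i_rate, type_ii_rate)
```

Under these assumptions the estimated Type I rate should land near the chosen significance level of 0.05, while the Type II rate depends on the effect size and the sample size, which is one reason obtaining a sufficient sample size matters.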

  1. S

    B Should I be treating the data I have as a Population or Sample?

    A study on strength properties of high-performance concrete obtained by using super-plasticizers and certain binders recorded the following data on flexural strength (in mega-pascals, MPa) from 28 tests: 6.1, 5.6, 7.1, 7.3, 6.6, 8.0, 6.8, 6.6, 7.6, 6.8, 6.7, 6.6, 6.8, 7.6, 9.3, 8.2, 8.7, 7.7...
  2. Gary Venter

    New member: Stat Modeler Learning Physics

    I studied foundations of mathematics from mathematical and philosophical angles in grad school but then went on to a career of building and testing statistical risk models. The guiding philosophy there, which I call Boxian Skepticism, derives from a quote of George Box: "All models are wrong but...
  3. J

    A Bivariate Smoothing Splines

    Does anyone know of a bivariate smoothing spline package that lets you set your own loss function? All of the public domain software I've been able to find (e.g., SCIPY) appears to minimize the sum of squared errors. For example, I'd like to set the spline coefficients to maximize the...
  4. chwala

    Test whether the diets are different from one another at ##α=5\%##

    Looking at stats today. In my working I have: Let ##H_0: μ_1=μ_2## vs ##H_1: μ_1-μ_2≠ 0##, then ##\bar x = \dfrac{134+83+...+123}{12}=120## ##\bar y = \dfrac{70+118...+94}{7}=101## ##t=\dfrac{\bar x- \bar y}{S_p ⋅\sqrt {\dfrac{1}{n_1}+\dfrac{1}{n_2}}}## ##t=\dfrac{120-101}{21.21...
  5. TomVassos

    I Calculating the End of the Universe Using Standard Deviation Statistics

    One possible end to the Universe is called vacuum decay, where a Higgs boson could transition from a false vacuum to a true vacuum state. This would create a vacuum decay bubble (known as bubble nucleation) that would expand at light speed, destroying everything in its path. According to Anders...
  6. R

    I Variation of the Liar's Paradox

    A variation of the Liar's Paradox occurred to me: "Statistics are wrong 90% of the time". This statement seems to refute itself, but does so in a less straightforward way. I would appreciate any insights! And what about, "Statistics are wrong 50% of the time"? (Even odds.)
  7. fresh_42

    B AI Detection - Phase 1: sample collection

    I know two programs that claim to be able to detect whether a text has been written by a machine or by a human. A (ZeroGPT): https://www.zerogpt.com/ B (OpenAI): https://openai-openai-detector.hf.space/ Character Count: https://www.lettercount.com/ If you have time and examples, please test...
  8. A

    B Definition of a random variable in quantum mechanics?

    In a line of reasoning that involves measurement outcomes in quantum mechanics, such as spins, photons hitting a detection screen (with discrete positions, like in a CCD), atomic decays (like in a Geiger detector counting at discrete time intervals, etc.), I would like to define rigorously the...
  9. T

    Statistics problem: Comparing written work with & w/out use of AI

    I want to compare performance on written work under different conditions, for example with and without the use of AI, according to some specified criteria. Assume the written work is a critical analysis of specific content. The written work will be scored on a number of dimensions, such as...
  10. Memo

    Statistics and Probabilities: How many type A pigs from the farm's herd of 1,000 pigs have 3 DYL blood?

    Mentor note: Thread moved from technical section to here, so is missing the homework template. TL;DR Summary: The weight of DYL 3-blood hybrid pigs after correction of a farm is a random quantity with a normal distribution. Knowing that the probability of a pig weighing over 20 kg is 0.1587 and...
  11. A

    Meaning of "Average" Flux Tallies in MCNP

    Hello, I've been working with MCNP on and off for a few years now, but just recently realized that I don't entirely understand how tallies are actually calculated in MCNP, and what they signify. Taking the example of the F2 tally, the user manual (Section 3.3.5.1) states that F2 is the "flux...
  12. C

    I Help with probability problem: Probability that one random Gaussian event will happen before another one

    For concreteness I'll use atoms and photons but this problem is actually just about probabilities. There's an atom A whose probability to emit a photon between times t and t+dt is given by a Gaussian probability distribution P_A centered around time T_A with variance V_A. There's a similar atom...
  13. M

    Hello friends!

    Post-grad, my background is in mathematical physics, probability/statistics, and information theory. I am here for discussion and collaboration on things I find interesting from time to time.
  14. Artemisa

    Error floors in this Bayesian analysis

    In this article (https://arxiv.org/pdf/2001.04581.pdf), the authors use a Bayesian analysis based on the positions of astrophysical bodies and their errors in the medians. This statistical analysis uses Markov chain Monte Carlo chains. The uncertainties in the positions are large, so what...
  15. hagopbul

    About meta trader platform

    TL;DR Summary: Asking about the meta trader platform and what mathematical theories I should read about. Hello: A claim about the meta trader platform, and how you can use it as a supplementary income source, recently got my attention. What is this platform exactly? What should I read to be able to use...
  16. Graham87

    I Basic standard deviation calculation

    I don’t get how they got the equation for the standard deviation. Why do they only square with the time in the denominator? Thanks!
  17. H

    Good introductory book on statistical/data analysis?

    TL;DR Summary: I'm looking for a book on statistical/data analysis. Hey all. I've been doing statistical analysis in my research (such as using PCA and LDA), but I have never received a formal education on statistical analysis or data mining, and what I know about analysis is quite scattered...
  18. P

    Understanding the meaning of "expected fraction" (Statistics)

    The first part of the question asked me to calculate the mean and standard deviation for the number of remain votes in the simple binomial model consisting of total sample size of 2091 people. I believe this is fairly straightforward, it was simply ##E(X) = \mu = 2091(0.5) = 1045.5## votes and...
  19. S

    Probability of Hypokalemia w/ 1 or Multiple Measurements

    TL;DR Summary: Finding the probability with one measurement and multiple measurements on separate days. Question: Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mmol/L. Let’s assume we know a patient whose measured potassium levels vary daily according to N(µ = 3.8...
  20. A

    Break a Stick Example: Random Variables

    Hello, I would like to confirm my answers to the following random variables question. Would anyone be willing to provide feedback and see if I'm on the right track? Thank you in advance. My attempt:
  21. shahbaznihal

    A Computing the Fisher Matrix numerically

    Hi, I have been studying the Fisher matrix to apply in a project. I understand how to compute a fisher matrix when you have a simple model for example which is linear in the model parameters (in that case the derivatives of the model with respect to the parameters are independent of the...
  22. P

    I Are Boltzmann's statistics compatible with a deterministic universe?

    Are Boltzmann's statistics compatible with a deterministic universe? Suppose that the gas molecules in a given container are perfectly elastic objects obeying Newton's laws. Suppose further that we select the initial conditions (impulse and position of each molecule) at random. Is it true that, if...
  23. A

    A How to derive the sampling distribution of some statistics

    Assume that ##T## has an Erlang distribution: $$\displaystyle f \left(t \, | \, k \right)=\frac{\lambda ^{k }~t ^{k -1}~e^{-\lambda ~t }}{\left(k -1\right)!}$$ and ##K## has a geometric distribution $$\displaystyle P \left( K=k \right) \, = \, \left( 1-p \right) ^{k-1}p$$ Then the compound...
  24. WMDhamnekar

    MHB Probability, Expected value, joint P.D.F. and order statistics

    I want to know how the author derived the red-underlined term in the example given below. Would any member of Math Help Board enlighten me in this regard? Any math help will be accepted.
  25. L

    [Statistics] Calculate the percentage

    My attempt: P(x>=90) = 85/90 = 17/18 Is my understanding of the equation correct? Thanks
  26. A

    I Modeling the concentration of gas constituents in a Force Field

    Say there is a gas made up of two gas molecules: Molecule A and Molecule B. Molecule A has a mass: ma and mole fraction: na. Molecule B has a mass: mb and mole fraction: nb. The gas is at thermal equilibrium and has a constant temperature throughout itself (T) everywhere. It is placed in a...
  27. D

    Learn About Ancillary Statistics & Their Role in Education

    Ancillary statistics! You don't know what this means? I didn't know either, so I looked it up: http://utstat.toronto.edu/reid/research/A20n41.pdf As a non-native speaker, I didn't even know what "ancillary" means, so I had to look it up, too. The word has its root in Latin "ancilla" which is...
  28. A

    Calculus Advanced Calculus with Applications in Statistics

    Has anyone heard of this book written by Andre I. Khuri (Professor Emeritus in science at the University of Florida)? Judging by the table of contents, the book seems to cover a lot of calculus/multivariable calculus, and in a rigorous way according to the preface (they argue that...
  29. S

    I Bayesian statistics in science

    [Moderator's note: This thread has been split off from a previous thread since its topic is best addressed in a separate discussion. This post has been edited to focus on the topic for separate discussion.] Jaynes has used in the derivation of the rules of probability as the logic of plausible...
  30. chwala

    Solve the variance problem below - statistics

    The question is below: below is my own working; the mark scheme for the question is below here; i am seeking for any other approach that may be there...am now trying to refresh on stats...bingo!
  31. tixi

    Labwork Statistics help: Average of averages

    I have done the experiment, and have a lot of data. For each data point (we have five), we did ten repetitions, for which we need to do video analysis. The analysis works frame by frame and gives a velocity between each frame. So, to get the value of one repetition, we already need to calculate...
  32. Amitkumarr

    I Finding bias of the coin from noise corrupted signals

    Suppose there are two persons A and B such that both have a personal communication system which can transmit and receive bits. B has a biased coin whose bias is not known. A asks B to toss the coin 2000 times, send a 0 when a tail comes up and a 1 when a head comes up. It is known that whatever...
  33. V

    B Convince Covid-19 Vaccine Efficiency Through Statistics

    I have been trying to convince someone that it is wrong to compare the death percentages of two different populations (percentage of death of Covid-19 cases per category: vaccinated vs unvaccinated) in an uncontrolled setting (i.e. real-world data), and conclude that the Covid-19 vaccine does...
  34. ohwilleke

    I Why Do Physicists Use Gaussian Error Distributions?

    David C. Bailey. "Not Normal: the uncertainties of scientific measurements." Royal Society Open Science 4(1): 160600 (2017). How bad are the tails? According to Bailey in an interview, "The chance of large differences does not fall off exponentially as you'd expect in a normal bell curve," and...
  35. S

    Stock trading volume statistics

    Has the advent of computer trading greatly increased the size of statistics for trading volume? - or do those statistics (for individual stocks) somehow omit the flash trades done by computers? In the pre-computer days, there were people who had theories of stock trading based on both the...
  36. W

    A Using Statistics to Test for Normality of Pi

    Is there a "reasonable" way to test for the normality of ##\pi##, i.e., that every digit occurs with the same frequency? Someone suggested randomly sampling strings of size 20 and outputting the frequency. Then I guess we could average the frequencies among samples, use a chi-squared test...
  37. Falgun

    Prob/Stats Looking for a probability and statistics textbook

    I want to learn some probability & statistics on my own. I am well versed in Calc 1-3 and elementary ODEs, and know a little linear algebra. I want a comprehensive, introductory textbook which is NOT COOKBOOK STYLE. I might be self-studying AP Statistics next term so if the book covers everything I...
  38. shahbaznihal

    A Galaxy statistics calculation in Saslaw's book

    I am trying to follow a calculation from the book of William C. Saslaw, The Distribution of the Galaxies: Gravitational Clustering in Cosmology. The calculation is shown on the pages following page 122 in chapter 14 where the author talks about the Correlation function. I am able to reproduce...
  39. chwala

    Discrete data vs continuous data in statistics

    I would like to seek your take on the two terms, discrete and continuous, in this context. In my understanding, when we look at the height of individuals (in cm), this measure in general or by definition implies continuous data. If we are to look at a specific math problem that involves the height of say...
  40. W

    I Bias in Linear Regression (y-intercept) vs Statistics

    Hi, in simple regression for machine learning, a model Y = mx + b is said, AFAIK, to have bias equal to b. Is there a relation between the use of bias here and the use of bias in terms of estimators for population parameters, i.e., the bias of an estimator P^ for a population parameter P is...
  41. L

    How to Start a Problem I'm Struggling With

    I really don't know what to do for this problem. I looked at similar threads but couldn't seem to grasp the idea of it. I would like help on how to start.
  42. V

    MHB Creating An Awesome Statistics Course For Students

    Hi everybody, my name is Vaughny. I was once a statistics tutor, and I loved making statistics easier to understand for those who struggle learning the material. I've seen students in high school and college go through a painful experience of not having enough resources or just having horrible...
  43. S

    Question about hint given by the problem related to statistics

    I want to ask about the "problem-solving" box on the right. I don't understand why the class boundaries for the 16 - 25 group are 16 and 26. If I try to find it using my usual way, it will be 15.5 and 25.5 and the midpoint will be (15.5 + 25.5) / 2 = 20.5 Or if I use the hint "since age is...
  44. S

    MHB What is the probability of exceeding maximum weight with a normal distribution?

    The weight of goats at a farm is normally distributed with a mean of 60 kg and a standard deviation of 10 kg. A truck used to transport goats can only accommodate not more than 650 kg. If 10 goats are selected at random from the population, what is the probability that the total weight exceeds...
  45. Athenian

    Finding the Relative Uncertainty for the Standard Error of the Mean

    While I will not be showing the graph here, I am trying to dissect what the question even means. While I do understand that relative uncertainty can be found via the equation ##\frac{\sigma_A}{A}##, I do not understand how I can find the "relative uncertainty of SEM". Does anybody here have any...
  46. CPW

    Encouraging fact from cancer statistics

    We study cancer to get better at killing it. And here is an encouraging detail: Since 1975, the cancer death rate in the United States has decreased by 21.9% with a 15% decrease from 2007 to 2017. (https://seer.cancer.gov/csr/1975_2017)
  47. C

    I Large Q^2 statistics at different colliders

    How tractable is it experimentally to measure deeply virtual Compton scattering in bins of large Q^2, where Q^2 is the virtuality of the incoming photon, at e.g. Jefferson Lab, which collides electrons and protons? I know at LHC, colliding proton-proton, such processes would instead be statistics...