Statistics - What should I conclude about this data?

musicgold · Jan 22, 2014

Hi,

This is not really a homework question. Attached is a slide from a statistics presentation I found on the web.

I am not sure what conclusion I can draw from this data if I ignore the outlier (1906). Saying "higher magnitude earthquakes result in fewer deaths" seems totally counteractive.

Thanks.

berkeman · Jan 22, 2014

musicgold said:

Hi,

This is not really a homework question. Attached is a slide from a statistics presentation I found on the web.

I am not sure what conclusion I can draw from this data if I ignore the outlier (1906). Saying "higher magnitude earthquakes result in fewer deaths" seems totally counteractive.

Thanks.

It does look funny, but the data are not well organized. The high magnitude low death datapoint is for an area with very sparse population, so should be excluded. A better graph would compare quakes in similar population areas with similar building codes.

musicgold · Jan 22, 2014

Ok. Thanks.

SteamKing · Jan 22, 2014

musicgold said:

I am not sure what conclusion I can draw from this data if I ignore the outlier (1906). Saying "higher magnitude earthquakes result in fewer deaths" seems totally counteractive.

I think the phrase you are looking for is 'counterintuitive' rather than 'counteractive'.

haruspex · Jan 22, 2014

berkeman said:

It does look funny, but the data are not well organized. The high magnitude low death datapoint is for an area with very sparse population, so should be excluded. A better graph would compare quakes in similar population areas with similar building codes.

Might be able to do a bit better than that. If you knew the population density in each area, you could normalise the data by taking the deaths as a fraction of the population. It looks like population density is influencing the numbers rather more than the severity of the earthquake is. The question is what area to take around each site. Ideally, it would be some kind of integral with respect to radius from the epicentre, something like ##\int_{r=0}\frac{\text{density}(r)}{1+k r^n}\,r\,dr##, but maybe you could just fix on a circle large enough to encompass all deaths.
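For anyone who wants to try the weighted-population idea numerically, here is a minimal sketch. It estimates the integral with the trapezoidal rule and then divides a death count by it. Everything specific is an assumption for illustration only: the function names `exposure` and `deaths_per_exposure`, the finite cutoff `r_max` (the expression above does not state an upper limit), the values of `k` and `n`, and the made-up exponential density profile. None of it comes from the slide.

```python
import math


def exposure(density, k, n, r_max, steps=10_000):
    """Trapezoidal estimate of the integral of
    density(r) / (1 + k * r**n) * r dr  for r from 0 to r_max.

    r_max is an assumed finite cutoff radius.
    """
    h = r_max / steps

    def integrand(r):
        return density(r) / (1.0 + k * r ** n) * r

    total = 0.5 * (integrand(0.0) + integrand(r_max))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h


def deaths_per_exposure(deaths, density, k=0.01, n=2, r_max=100.0):
    """Deaths divided by the distance-weighted population measure."""
    return deaths / exposure(density, k, n, r_max)


# Hypothetical density profile: dense near the epicentre, thinning with distance.
def example_density(r):
    return 1000.0 * math.exp(-r / 20.0)


# Made-up death count, purely to show the call.
print(deaths_per_exposure(60, example_density))
```

The ratio is only comparable between events if the same `k`, `n` and `r_max` are used for every site, and those would have to be chosen or fitted separately.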

AlephZero · Jan 22, 2014

Also, you could investigate what size of correlation coefficient is significant, for such a small sample size.

Or even whether the notion of "correlation" is meaningful at all, with so few data points. To take a ridiculously extreme example, if you only have two data points, you will always get a correlation of +1 or -1.
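Both points can be checked with a short script. The sketch below computes Pearson's r by hand and then an exact permutation p-value: the fraction of all possible pairings of the y values with the x values that give a correlation at least as strong as the observed one. The function names and the seven data pairs are made up for illustration; they are not the values from the slide.

```python
import itertools
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes neither list is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys):
    """Exact two-sided p-value from all len(ys)! pairings of ys with xs."""
    observed = abs(pearson_r(xs, ys))
    perms = list(itertools.permutations(ys))
    # Small tolerance so floating-point noise does not drop exact ties.
    hits = sum(1 for p in perms if abs(pearson_r(xs, p)) >= observed - 1e-12)
    return hits / len(perms)


# Two points: |r| is 1 for either pairing, so the p-value is 1.
print(pearson_r([6.0, 7.5], [100, 3]))
print(permutation_p_value([6.0, 7.5], [100, 3]))

# Seven made-up (magnitude, deaths) pairs; 7! = 5040 pairings are enumerated.
magnitudes = [6.5, 6.7, 6.9, 7.1, 7.3, 7.5, 7.8]
deaths = [65, 60, 63, 12, 3, 1, 700]
print(pearson_r(magnitudes, deaths))
print(permutation_p_value(magnitudes, deaths))
```

Enumerating every pairing is only practical for very small samples, which is exactly the situation here.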

haruspex said:

If you knew the density of population in each area you could normalise the data by taking the deaths as a fraction of the population.

Maybe ... but the relevant building codes would be different in a low population density rural area, compared with skyscrapers in a city center. And earthquakes don't necessarily happen where planners think they are most likely to happen.

The danger of going down this route is that you do a lot of research and end up with 7 different "stories" about 7 different events, but you still can't really draw any general conclusions because these events don't have much in common except that they were all "earthquakes".
