How is causality proven when it's impossible to change the IV?

SUMMARY

This discussion centers on the challenges of proving causality when the independent variable (IV) cannot be manipulated. It highlights that correlation is often misinterpreted as causation, as seen in studies like the one from WebMD linking breakfast consumption to lower obesity rates. Participants argue that while correlation can suggest causation, it is not definitive without experimental manipulation. The conversation emphasizes the importance of understanding the context of data and the methodologies used in statistical analysis, particularly when assessing non-linear relationships and employing data mining techniques such as Naive Bayes and Support Vector Machines.

PREREQUISITES
  • Understanding of correlation vs. causation in statistics
  • Familiarity with statistical methodologies and confidence intervals
  • Knowledge of data mining techniques, including Naive Bayes and Support Vector Machines
  • Ability to interpret scatter plots and assess linear vs. non-linear relationships
NEXT STEPS
  • Research the implications of correlation in scientific studies and its limitations
  • Explore advanced statistical methods for establishing causality without manipulation
  • Learn about data mining techniques and their applications in causal inference
  • Investigate the role of experimental design in proving causality in research
USEFUL FOR

Researchers, statisticians, data scientists, and anyone interested in understanding the complexities of causality and correlation in statistical analysis.

Cinitiator
How is causality usually proven when it's impossible to change the independent variable?
Also, why is correlation often treated as a very reliable hint to causation, even in various respected journals?

Here's an example:
http://www.webmd.com/diet/news/20080303/eating-breakfast-may-beat-teen-obesity

This article directly implies causality between eating breakfast and weighing less ("Although adolescents may think that skipping breakfast seems like a good way to save on calories, findings suggest the opposite."), based only on a correlation. However, that correlation could have many causes. For example, those who are more physically active (which contributes to weight loss) may be more likely to eat breakfast. Or those who sleep less may be more likely to skip breakfast, and also have a slower metabolism as a result of sleep deprivation. Doesn't one need to conduct an experiment and see whether manipulating the independent variable (breakfast) actually leads to a reduction in weight?
 
Cinitiator said:
How is causality usually proven when it's impossible to change the independent variable?

Also, why is correlation often treated as a very reliable hint to causation, even in various respected journals?

It's easy to conclude that "Showing X doesn't mathematically prove causality" for any X that you want to pick, since there is no rigorous definition of causality in mathematical statistics. Thus there are no mathematical proofs of it.
 
Cinitiator said:
How is causality usually proven when it's impossible to change the independent variable?

You argue: give evidence, refute objections, etc. and try to persuade that X causes Y.

Cinitiator said:
Also, why is correlation often treated as a very reliable hint to causation, even in various respected journals?

It is a reliable hint.
 
As has been pointed out, the point is to provide evidence and to suggest, based on what has been found, where to look next and how to interpret what has been observed.

All of statistics is about confidence at some point, but even then we need to look at why we are confident about a particular assertion, and that means looking at what our data is, what the context of the experiment is, and how the whole process leads to a conclusion or interpretation.

Confidence can be a con-game, and it's important to understand when this is the case by setting the results aside and looking at where they came from and how they were generated.
 
ImaLooser said:
It is a reliable hint.

Let's say that ice cream sales usually go up in the summer, and so do the drowning deaths.
Is the correlation between the rise of the aggregate ice cream consumption and drowning deaths a reliable hint to a causality between the two?
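The ice cream example can be made concrete with a small simulation. This is only a sketch under stated assumptions: temperature is taken as the common cause, and the coefficients, noise levels, and helper names (`pearson`, `residuals`) are made up for illustration. Neither simulated variable affects the other, yet they correlate; once the linear effect of temperature is removed from both, the correlation largely disappears.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data-generating process: temperature drives both outcomes,
# and there is NO direct link between ice cream sales and drownings.
temperature = [rng.uniform(0.0, 35.0) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0.0, 10.0) for t in temperature]
drownings = [0.5 * t + rng.gauss(0.0, 5.0) for t in temperature]

# Raw correlation between the two outcomes (clearly positive).
raw_r = pearson(ice_cream, drownings)

# Partial correlation: correlate what is left after removing temperature.
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation:     {raw_r:.3f}")
print(f"partial correlation: {partial_r:.3f}")
```

Adjusting like this only helps for confounders that were actually measured, so it supports a causal argument rather than proving one.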

chiro said:
As has been pointed out, the point is to provide evidence and to suggest, based on what has been found, where to look next and how to interpret what has been observed.

All of statistics is about confidence at some point, but even then we need to look at why we are confident about a particular assertion, and that means looking at what our data is, what the context of the experiment is, and how the whole process leads to a conclusion or interpretation.

Confidence can be a con-game, and it's important to understand when this is the case by setting the results aside and looking at where they came from and how they were generated.

But how can we demonstrate causality with any level of confidence if we can't change the independent variable? What kinds of statistical tools are available for assessing causality in these cases?
 
You can determine that something fits a particular model that relates two variables together in a way that they are highly dependent.

The simplest way of doing this is to use correlation, but correlation only measures linear association: it can be close to zero even when there is a strong non-linear relationship.
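A quick way to see this limitation is a symmetric quadratic relationship. The sketch below uses made-up data (y is x squared plus a little noise, with x symmetric around zero); the `pearson` helper is a hand-written illustration, not a library function. The correlation between x and y comes out near zero even though y is almost completely determined by x.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical data: y depends strongly, but not linearly, on x.
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
y = [xi ** 2 + rng.gauss(0.0, 0.05) for xi in x]

# Linear correlation misses the relationship entirely.
r_linear = pearson(x, y)

# Correlating y with x squared recovers it.
r_squared_term = pearson([xi ** 2 for xi in x], y)

print(f"corr(x, y):   {r_linear:.3f}")
print(f"corr(x^2, y): {r_squared_term:.3f}")
```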

The next thing is to look at whether there is evidence of a non-linear relationship given the scatter-plot or the data. Typically one will either try and transform the data to something resembling a linear fit or they will try one of several general methods to find relationships.
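As an illustration of the transformation approach, the sketch below assumes an exponential relationship with multiplicative noise (the constants 2.0 and 1.2 and the helper names are invented for the example). Taking the logarithm of y turns the curve into a straight line, so an ordinary least-squares fit on the transformed data recovers the growth rate.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical data: y = 2 * exp(1.2 * x), with multiplicative noise.
x = [rng.uniform(0.0, 5.0) for _ in range(2000)]
y = [2.0 * math.exp(1.2 * xi) * math.exp(rng.gauss(0.0, 0.2)) for xi in x]

# log(y) = log(2) + 1.2 * x + noise, which is linear in x.
log_y = [math.log(yi) for yi in y]

r_raw = pearson(x, y)
r_log = pearson(x, log_y)
slope, intercept = fit_line(x, log_y)

print(f"corr(x, y):      {r_raw:.3f}")
print(f"corr(x, log y):  {r_log:.3f}")
print(f"fitted slope:    {slope:.3f}  (true value used above: 1.2)")
print(f"fitted intercept:{intercept:.3f}  (true value used above: log 2)")
```

The choice of transform is itself a modelling assumption; a log works here only because the data were generated exponentially.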

The area of data mining has a lot of these general methods and they have different ideas in terms of how they originated and what they focus on.

The idea is that these methods look for patterns of any kind, and you have everything from Naive Bayes to Support Vector Machines for finding the so-called patterns.

Statistically, Naive Bayes and the entropy methods are ways of finding patterns and suggesting possible causal links between variables or, at the least, between subsets of the data that you have.

There are definitions for orthogonality of random variables and also for independence of random variables and these can also be used to make hypotheses of whether certain variables may have a relation or not.
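For reference, the definitions mentioned here can be written as follows for two random variables X and Y with finite second moments. The orthogonality line follows one common convention; some texts use the term differently.

```latex
\begin{align*}
\text{independent:}  \quad & P(X \in A,\; Y \in B) = P(X \in A)\,P(Y \in B)
                             \quad \text{for all events } A, B \\
\text{uncorrelated:} \quad & \operatorname{Cov}(X, Y) = E[XY] - E[X]\,E[Y] = 0 \\
\text{orthogonal:}   \quad & E[XY] = 0
\end{align*}
```

Independence implies zero covariance, but not the other way round, and orthogonality coincides with being uncorrelated when at least one of the variables has mean zero. None of these conditions says anything about the direction of a causal link.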
 
Cinitiator said:
But how can we demonstrate causality with any level of confidence if we can't change the independent variable? What kinds of statistical tools are available for assessing causality in these cases?

As I said before, there is no formal definition of "causality" in mathematical statistics. You are asking a question about scientific methodology, not a question about mathematics.
 
