stats
Dear all,
One of the main challenges in social science is inferring causality from nonexperimental data.
Statistics can help. In a regression context, regressing Y on T may not give the causal effect of T on Y because of unobserved confounding. One solution is to find a variable Z, called an instrumental variable, that strongly affects T (strength of the instrument) but does not affect Y other than through T (validity of the instrument). The problem in many areas of social science is that such a variable is very hard or impossible to come by.
In the most general instrumental variable (IV) model, you want to estimate the effect of T on Y:
Y = f(T,X,U)
T = g(Z,X,V),
where X are control variables, U and V are possibly related error terms, and Z is the instrument. Here identification of the model hinges on
Y independent of Z given T,X
To cut a long story short, something that has not been used here, but could be useful, is conditional mutual information (MI) from information theory, a measure of conditional dependence between two variables that also captures nonlinear relationships: http://en.wikipedia.org/wiki/Conditional_mutual_information .
That is because MI(Y;Z | T,X) = 0 iff Y and Z are independent given T,X.
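To make the independence criterion concrete, here is a minimal sketch of a plug-in estimate of conditional mutual information for discrete data. The function name `conditional_mutual_information` and the toy data are assumptions made for illustration only; with continuous variables or small samples a plug-in estimate like this would be biased and a different estimator would be needed.

```python
from collections import Counter
from itertools import product
from math import log


def conditional_mutual_information(a, b, c):
    """Plug-in estimate of MI(A;B | C) in nats from discrete samples.

    a, b, c are equal-length sequences of hashable values. To condition
    on several variables, pass tuples, e.g. c = list(zip(t, x)).
    (Hypothetical helper, not taken from any library.)
    """
    n = len(a)
    n_abc = Counter(zip(a, b, c))
    n_ac = Counter(zip(a, c))
    n_bc = Counter(zip(b, c))
    n_c = Counter(c)
    cmi = 0.0
    for (ai, bi, ci), count in n_abc.items():
        # p(a,b,c) * log[ p(a,b,c) p(c) / (p(a,c) p(b,c)) ];
        # the 1/n factors cancel inside the logarithm.
        ratio = count * n_c[ci] / (n_ac[(ai, ci)] * n_bc[(bi, ci)])
        cmi += (count / n) * log(ratio)
    return cmi


# Toy data (assumed): every combination of binary T, X, Z appears once.
samples = list(product([0, 1], repeat=3))
t = [s[0] for s in samples]
x = [s[1] for s in samples]
z = [s[2] for s in samples]
cond = list(zip(t, x))  # condition on (T, X) jointly

y_valid = list(t)    # Y depends on T only: Z is a valid instrument
y_invalid = list(z)  # Y depends on Z directly: Z is not valid

cmi_valid = conditional_mutual_information(y_valid, z, cond)
cmi_invalid = conditional_mutual_information(y_invalid, z, cond)
print(cmi_valid, cmi_invalid)  # 0 for the valid case, about log(2) = 0.693 otherwise
```

In the first case Y is constant once T and X are fixed, so Z carries no further information about it; in the second case Y is fully determined by Z, which gives the maximum of one bit (log 2 nats).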
Suppose we have a strong instrument Z, that is, one strongly related to T. However, it may not be fully valid: it may be related to Y other than through T and X. We could make it valid by transforming it with the following optimization program:
find function f such that
max MI(T;f(Z) | X )
s.t. MI(Y;f(Z) | X,T) = 0
or find function f such that max E[lik(f(Z), T | X)]
s.t. lik(f(Z), Y | X,T) = 0
or equivalently
s.t. g(Y | X,T) = g(Y | X,T,f(Z)),
where lik stands for log-likelihood and g for probability density function. Does anyone have an idea how to solve this constrained optimization problem?
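For reference, the constraint can be written out using the definition of conditional mutual information, and the program can be restated in one possible penalized form. The penalty weight \lambda \ge 0 is an assumption introduced here for illustration; this only restates the problem and is not a solution to it:

\[
I(Y; f(Z) \mid X, T) = E\left[ \log \frac{p(Y, f(Z) \mid X, T)}{p(Y \mid X, T)\, p(f(Z) \mid X, T)} \right]
\]
\[
\max_{f} \; I(T; f(Z) \mid X) \; - \; \lambda \, I(Y; f(Z) \mid X, T)
\]

As \lambda grows, the penalty term pushes MI(Y;f(Z) | X,T) toward 0. Note that a constant f always satisfies the constraint but gives MI(T;f(Z) | X) = 0, so the interesting question is whether a non-trivial f exists.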