Dear all,

One of the main challenges in social science is inferring causality from nonexperimental data. Statistics can help. In a regression context, regressing Y on T may not give the causal effect of T on Y because of unobserved confounding. A solution is to find a variable Z, called an instrumental variable, that affects T strongly (strength of the instrument) and does not affect Y other than through T (validity of the instrument). The problem in many areas of social science is that such a variable is very hard or impossible to come by.

In the most general instrumental variable (IV) model, you want to estimate the effect of T on Y:

Y = f(T,X,U)
T = g(Z,X,V),

where X are control variables, U and V are possibly related error terms, and Z is the instrument. Here identification of the model hinges on

Y independent of Z given T,X

To cut the story short, what has not been used here but could potentially be useful is the concept of conditional mutual information (MI) from information theory, a measure of conditional nonlinear dependence between two variables: http://en.wikipedia.org/wiki/Conditional_mutual_information . That is because MI(Y;Z | T,X) = 0 iff Y and Z are independent given T,X.

Suppose we have a strong instrument Z, strongly related to T. However, it may not be totally valid, that is, it may be related to Y other than through T and X. We could make it valid by transforming it with the following optimization program:

find function f such that
max MI(T;f(Z) | X)
s.t. MI(Y;f(Z) | X,T) = 0

or

find function f such that
max E(lik(f(Z), T | X))
s.t. lik(f(Z), Y | X,T) = 0

or equivalently

s.t. g(Y,T | X) = g(Y,T | X,f(Z)),

where lik stands for loglikelihood and g for probability density function (in the program, f is the transformation of Z and g a density, not the structural functions of the IV model).

Does anyone have an idea how to solve this constrained optimization problem?
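
For what it is worth, the constraint MI(Y;f(Z) | X,T) = 0 can at least be checked numerically for a candidate f. Below is a minimal sketch, assuming all variables are discrete and taking f as the identity. The function name conditional_mutual_information is hypothetical, the estimator is the simple plug-in (empirical frequency) one, and the toy data is made up with no unobserved confounder, so it only exercises the estimator. It evaluates the constraint; it does not solve the optimization program.

```python
import numpy as np
from collections import Counter


def conditional_mutual_information(a, b, c):
    """Plug-in estimate of MI(A;B | C) in nats from discrete samples.

    c is a 1-D array, or a 2-D array with one column per conditioning variable.
    """
    a = np.asarray(a).tolist()
    b = np.asarray(b).tolist()
    c = np.asarray(c)
    if c.ndim == 1:
        c = c[:, None]
    c = [tuple(row) for row in c.tolist()]  # one hashable key per observation
    n = len(a)

    # Empirical counts of the joint and the marginals that appear in the formula.
    n_abc = Counter(zip(a, b, c))
    n_ac = Counter(zip(a, c))
    n_bc = Counter(zip(b, c))
    n_c = Counter(c)

    # MI(A;B|C) = sum p(a,b,c) * log[ p(a,b,c) p(c) / (p(a,c) p(b,c)) ]
    cmi = 0.0
    for (ai, bi, ci), count in n_abc.items():
        ratio = count * n_c[ci] / (n_ac[(ai, ci)] * n_bc[(bi, ci)])
        cmi += (count / n) * np.log(ratio)
    return cmi


# Hypothetical toy data: binary Z, T, Y and one binary control X.
rng = np.random.default_rng(0)
n = 20000
x = rng.integers(0, 2, n)
z = rng.integers(0, 2, n)
t = np.where(rng.random(n) < 0.8, z, 1 - z)        # T follows Z 80% of the time
y_valid = np.where(rng.random(n) < 0.9, t, 1 - t)  # Y depends on T only
direct = (z == 1) & (rng.random(n) < 0.5)          # extra direct path from Z to Y
y_invalid = np.where(direct, 1 - y_valid, y_valid)

tx = np.column_stack([t, x])  # condition on (T, X) jointly
cmi_valid = conditional_mutual_information(y_valid, z, tx)
cmi_invalid = conditional_mutual_information(y_invalid, z, tx)
strength = conditional_mutual_information(t, z, x)  # objective MI(T;Z | X)

print(f"MI(Y;Z | T,X), Z affects Y only through T: {cmi_valid:.4f}")
print(f"MI(Y;Z | T,X), Z also affects Y directly:  {cmi_invalid:.4f}")
print(f"MI(T;Z | X), strength of the instrument:   {strength:.4f}")
```

The first estimate should be close to zero and the second clearly positive. Note that the plug-in estimate is biased upward in finite samples, so an exact equality to zero will not be observed in practice; some tolerance or a permutation test would be needed, and continuous variables would require a different estimator.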