# Function estimation subject to equality of densities

1. Dec 10, 2011

### stats

Dear all,

One of the main challenges in social science is inferring causality from nonexperimental data.
Statistics can help. In a regression context, regressing Y on T may not give the causal effect of T on Y because of unobserved confounding. One solution is to find a variable Z, called an instrumental variable, that strongly affects T (the strength of the instrument) but does not affect Y other than through T (the validity of the instrument). The problem in many areas of social science is that such a variable is very hard or impossible to come by.

In the most general instrumental variable (IV) model, you want to estimate the effect of T on Y:

Y = f(T,X,U)
T = g(Z,X,V),

where X are control variables, U and V possibly related error terms, and Z the instrument. Here identification of the model hinges on

Y independent of Z given T,X

To cut a long story short, what has not been used here but could be potentially useful is the concept of conditional mutual information (MI) from information theory, a measure of conditional nonlinear dependence between two variables: http://en.wikipedia.org/wiki/Conditional_mutual_information .
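For concreteness, here is what a plug-in estimate of conditional MI looks like for discrete data, computed directly from empirical frequencies. This is just a sketch to fix ideas (the helper name `cond_mutual_info` is my own, and a plug-in estimator like this is biased for small samples):

```python
import numpy as np

def cond_mutual_info(y, z, cond):
    """Plug-in estimate of MI(Y; Z | C) for discrete data.

    y, z: 1-D integer arrays; cond: array whose rows are the
    conditioning variables (e.g. T and X) for each observation.
    """
    n = len(y)
    # Collapse the conditioning variables into one discrete key per row.
    c = np.array([hash(tuple(row))
                  for row in np.atleast_2d(cond).reshape(n, -1)])
    mi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        p_c = mask.mean()                 # P(C = c)
        ys, zs = y[mask], z[mask]
        for yv in np.unique(ys):
            for zv in np.unique(zs):
                p_yz = np.mean((ys == yv) & (zs == zv))
                if p_yz == 0:
                    continue
                p_y = np.mean(ys == yv)   # P(Y = y | C = c)
                p_z = np.mean(zs == zv)   # P(Z = z | C = c)
                mi += p_c * p_yz * np.log(p_yz / (p_y * p_z))
    return mi
```

With a large sample, the estimate is near zero when Y and Z are independent given the conditioning set, and near the entropy of Z when Y is a deterministic function of Z, which is the property the identification condition above relies on.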

That is because MI(Y;Z | T,X) = 0 if and only if Y and Z are independent given T,X.
Suppose we have a strong instrument Z, strongly related to T. However, it may not be totally valid; that is, it may be related to Y other than through T and X. We could make it valid by transforming it via the following optimization program:

find function f such that
max MI(T; f(Z) | X)
s.t. MI(Y; f(Z) | X,T) = 0

or find function f such that max E(lik(f(Z), T | X))
s.t. lik(f(Z), Y | X,T) = 0
or equivalently
s.t. g(Y,T | X) = g(Y,T | X,f(Z)),

where lik stands for the log-likelihood and g for a probability density function. Does anyone have an idea how to solve this constrained optimization problem?

Last edited: Dec 10, 2011
2. Dec 10, 2011

### Stephen Tashi

My thoughts:

First the truisms:

If the distributions involved are all represented as expressions with known or unknown constant parameters, then this looks like the usual sort of optimization problem. The "MI" functions would be functions of the unknown constants, which we regard as the variables in the optimization problem. We try to maximize the function in the top line subject to the constraint given by the second line. I'm sure people can Wikipedize us with links on that topic (Lagrange multipliers, etc.).
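As a toy illustration of that parametric setup (the objective and constraint here are made-up stand-ins, not actual MI expressions), scipy's SLSQP solver handles exactly this maximize-subject-to-an-equality-constraint shape:

```python
from scipy.optimize import minimize

# Stand-ins for the two functionals, viewed as functions of the
# unknown constants theta of a parametric family f_theta(Z).
def objective(theta):
    # plays the role of MI(T; f(Z) | X); negated because SLSQP minimizes
    return -(theta[0] * theta[1])

def constraint(theta):
    # plays the role of MI(Y; f(Z) | X, T), to be driven to zero
    return theta[0] + theta[1] - 1.0

res = minimize(objective, x0=[0.2, 0.2], method="SLSQP",
               constraints=[{"type": "eq", "fun": constraint}])
```

For this toy pair (maximize theta[0]*theta[1] subject to theta[0]+theta[1] = 1), the Lagrange-multiplier solution is theta = (0.5, 0.5), and the solver recovers it. The real difficulty in the original problem is that the MI functionals themselves usually have no closed form and must be estimated from data.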

A limitation of this approach is that f(Z) doesn't vary over "the set of all possible functions"; it only varies over a certain parametric family.

The well-known field of math that allows f(Z) to vary over something more inclusive than a parameterized family is the calculus of variations. I don't know it well enough to say whether you could fit your problem into that form. Perhaps some other forum member will tell us; I suppose we could eventually figure it out.

Other thoughts:

I don't have a good intuition about whether the constraint MI(Y; f(Z) | X,T) = 0 is possible to meet. All I recall is that the entropy of a continuous distribution is not necessarily invariant under a change of variable, so if we are dealing with continuous distributions perhaps there is a chance. I also don't have a good intuition about whether a solution to the problem as stated might be trivial in some way, for example f(Z) = constant.

Due to my lack of intuition, I suggest starting with a simple version of this problem. If we want to model dependent random variables A and B, then a simple-minded way is to postulate the existence of a set of independent, identically distributed random variables $\{W_1,W_2,\ldots\}$ and assume A and B are each known functions of these variables, thereby introducing a dependence between A and B.

What type of functions of the $W_i$ make the problem simple? Maybe you've already tried something like this and you know.

Linear models are boring, but there are lots of known results about them. Let W be the column vector $(1,w_1,w_2,\ldots)$. Represent a random variable A as A = aW, where a is a row vector of constants.
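A quick numerical check of that construction (the loadings and sample size here are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000
# Rows of W: the constant 1, then i.i.d. standard normals w1, w2, w3.
W = np.vstack([np.ones(n), rng.normal(size=(3, n))])
a = np.array([0.0, 1.0, 1.0, 0.0])   # A = w1 + w2
b = np.array([0.0, 0.0, 1.0, 1.0])   # B = w2 + w3
A, B = a @ W, b @ W
# The shared w2 induces the dependence:
# Cov(A, B) = 1, Var(A) = Var(B) = 2, so Corr(A, B) = 0.5.
```

Two row vectors that load on a common $W_i$ give correlated variables; disjoint loadings give independent ones, so the overlap pattern of a and b controls the dependence structure directly.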

An example of the optimizing-over-a-parametrized-family approach would be to take T, Z, X, Y as having known constants and f(Z) as being in some parametric family of functions, the simplest interesting one being a linear function. Maybe working the problem this way would re-derive some well-known result from linear modeling.
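To sketch what that might look like: if everything is jointly Gaussian, conditional MI vanishes exactly when the corresponding partial correlation does, so with f(Z) linear in a basis expansion of Z the whole program collapses to linear algebra. This is my own reduction under that strong assumption, not an established method, and the function name is invented:

```python
import numpy as np

def valid_instrument(Z_feats, T, Y, X, ridge=1e-8):
    """Pick theta so f(Z) = Z_feats @ theta is maximally correlated
    with T given X, subject to zero partial correlation with Y given
    (X, T) -- the Gaussian analogue of MI(Y; f(Z) | X, T) = 0.

    Z_feats: (n, k) basis expansion of Z (e.g. columns Z and Z**2).
    """
    n = len(T)
    Xc = np.column_stack([np.ones(n), X])        # intercept + controls

    def residualize(v, M):
        beta, *_ = np.linalg.lstsq(M, v, rcond=None)
        return v - M @ beta

    Phi = np.column_stack([residualize(Z_feats[:, j], Xc)
                           for j in range(Z_feats.shape[1])])
    t = residualize(T, Xc)                       # T cleaned of X
    r = residualize(Y, np.column_stack([Xc, T])) # Y cleaned of X and T

    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    b = Phi.T @ t                                # objective direction
    c = Phi.T @ r                                # constraint direction
    Ainv_b = np.linalg.solve(A, b)
    Ainv_c = np.linalg.solve(A, c)
    # Maximize (b @ theta)^2 / (theta @ A @ theta) s.t. c @ theta = 0:
    # project A^{-1} b off the constraint direction in the A-metric.
    theta = Ainv_b - (c @ Ainv_b) / (c @ Ainv_c) * Ainv_c
    return Phi @ theta, theta
```

By construction the transformed instrument is exactly uncorrelated in-sample with the part of Y not explained by X and T, while retaining as much of its correlation with T as the constraint allows, which mirrors the MI program above in the linear-Gaussian special case.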