Function estimation subject to equality of densities

  • Context: Graduate 
  • Thread starter: stats
  • Tags: Estimation Function
SUMMARY

This discussion centers on the challenges of inferring causality in social sciences using instrumental variables (IV) within regression models. The key equations presented are Y = f(T,X,U) and T = g(Z,X,V), where Z serves as the instrumental variable. The concept of Conditional Mutual Information (MI) is introduced as a potential way to enforce the validity of the instrument Z, specifically through the optimization problem: find a function f that maximizes MI(T;f(Z) | X) subject to MI(Y;f(Z) | X,T) = 0. The discussion also touches on the limitations of the approach and suggests exploring the Calculus of Variations for a broader solution space.

PREREQUISITES
  • Understanding of instrumental variables in regression analysis
  • Familiarity with Conditional Mutual Information (MI) and its applications
  • Knowledge of optimization techniques, including Lagrange multipliers
  • Basic principles of the Calculus of Variations
NEXT STEPS
  • Research the application of Conditional Mutual Information in causal inference
  • Study optimization methods in statistics, focusing on Lagrange multipliers
  • Explore the Calculus of Variations and its relevance to constrained optimization problems
  • Investigate linear models and their implications in modeling dependent random variables
USEFUL FOR

Researchers in social sciences, statisticians working on causal inference, and data scientists interested in advanced regression techniques and optimization methods.

stats
Dear all,

One of the main challenges in social science is inferring causality from nonexperimental data.
Statistics can help. In a regression context, regressing Y on T may not give the causal effect of T on Y because of unobserved confounding. One solution is to find a variable Z, called an instrumental variable, that strongly affects T (the strength of the instrument) but does not affect Y other than through T (the validity of the instrument). The problem in many areas of social science is that such a variable is very hard or impossible to come by.

In the most general instrumental variable (IV) model, you want to estimate the effect of T on Y:

Y = f(T,X,U)
T = g(Z,X,V),

where X are control variables, U and V are possibly related error terms, and Z is the instrument. Here identification of the model hinges on

Y independent of Z given T,X

To cut a long story short, what has not been used here but could potentially be useful is the concept of Conditional Mutual Information (MI) from information theory, a measure of conditional nonlinear dependence between two variables: http://en.wikipedia.org/wiki/Conditional_mutual_information .

That is because MI(Y;Z | T,X) = 0 iff Y and Z are independent given T,X.
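
To make the "MI = 0 iff conditionally independent" statement concrete, here is a minimal sketch for discrete variables. It assumes the joint probability table is known exactly; the function name, the array layout and the toy probabilities are all made up for illustration and are not part of the original question.

```python
import numpy as np

def conditional_mutual_information(p_abc):
    """I(A;B|C) in nats from a joint pmf array indexed [a, b, c]."""
    p = np.asarray(p_abc, dtype=float)
    p = p / p.sum()                            # normalise defensively
    p_c = p.sum(axis=(0, 1), keepdims=True)    # p(c)
    p_ac = p.sum(axis=1, keepdims=True)        # p(a, c)
    p_bc = p.sum(axis=0, keepdims=True)        # p(b, c)
    mask = p > 0                               # skip zero-probability cells
    ratio = (p * p_c)[mask] / (p_ac * p_bc)[mask]
    return float(np.sum(p[mask] * np.log(ratio)))

# Toy binary example (illustrative numbers only).
p_z = np.array([0.5, 0.5])
p_t_given_z = np.array([[0.9, 0.1], [0.2, 0.8]])   # indexed [z, t]

# Case 1: Z influences Y only through T.
p_y_given_t = np.array([[0.7, 0.3], [0.4, 0.6]])   # indexed [t, y]
joint_valid = np.einsum("z,zt,ty->yzt", p_z, p_t_given_z, p_y_given_t)

# Case 2: Y also depends directly on Z.
p_y_given_tz = np.empty((2, 2, 2))                 # indexed [t, z, y]
p_y_given_tz[:, 0, :] = [0.9, 0.1]
p_y_given_tz[:, 1, :] = [0.1, 0.9]
joint_invalid = np.einsum("z,zt,tzy->yzt", p_z, p_t_given_z, p_y_given_tz)

mi_valid = conditional_mutual_information(joint_valid)      # MI(Y;Z | T)
mi_invalid = conditional_mutual_information(joint_invalid)  # MI(Y;Z | T)
print(mi_valid, mi_invalid)
```

In the first case the value is zero up to rounding error, in the second it is clearly positive. With real data the joint distribution would have to be estimated, which is a separate problem this sketch does not address.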
Suppose we have a strong instrument Z, strongly related to T. However, it may not be entirely valid, that is, it may be related to Y other than through T and X. We could make it valid by transforming it through the following optimization program:

find function f such that
max MI(T;f(Z) | X )
s.t. MI(Y;f(Z) | X,T) = 0

or find function f such that max E(lik(f(Z), T | X))
s.t. lik(f(Z), Y | X,T) = 0
or equivalently
s.t. g(Y | X,T) = g(Y | X,T,f(Z)).

where lik stands for the log-likelihood and g for a probability density function. Does anyone have an idea how to solve this constrained optimization problem?
 
stats said:
find function f such that
max MI(T;f(Z) | X )
s.t. MI(Y;f(Z) | X,T) = 0

My thoughts:

First the truisms:

If the distributions involved are all represented as expressions with known or unknown constant parameters, then this looks like the usual sort of optimization problem. The "MI" functions would be functions of the unknown constants, which we regard as the variables in the optimization problem. We try to maximize the function in the top line subject to the constraint given by the second line. I'm sure people can supply Wikipedia links on that topic (Lagrange multipliers etc.)
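
As a sketch of what that would look like, assume (this notation is not in the original post) that f is restricted to a family f_θ indexed by a finite parameter vector θ. The Lagrangian would then be written as:

```latex
\mathcal{L}(\theta,\lambda)
  = \mathrm{MI}\bigl(T; f_\theta(Z) \mid X\bigr)
  - \lambda\,\mathrm{MI}\bigl(Y; f_\theta(Z) \mid X, T\bigr),
\qquad
\nabla_\theta \mathcal{L} = 0,
\quad
\mathrm{MI}\bigl(Y; f_\theta(Z) \mid X, T\bigr) = 0 .
```

One caveat: mutual information is never negative, so the constraint pins it at its minimum value. If the constraint function is differentiable in θ, its gradient is zero at every feasible point, and the usual Lagrange condition may not be usable as written. A penalized objective, MI(T; f_θ(Z) | X) minus a large multiple of MI(Y; f_θ(Z) | X,T), might be a more practical starting point.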

A limitation of this approach is that f(Z) doesn't vary over "the set of all possible functions", it only varies over a certain parametric family.

The well-known field of math that allows f(Z) to vary over something more inclusive than a parameterized family is the Calculus of Variations. I don't know it well enough, so I can't say whether you could fit your problem into that form. Perhaps some other forum member will tell us. I suppose we could eventually figure it out.

Other thoughts:

I don't have a good intuition about whether the constraint MI(Y;f(Z)| X,T) = 0 is possible to meet. All I recall is that the entropy of a continuous distribution is not necessarily invariant under change of variable. So if we are dealing with continuous distributions perhaps there is a chance. I also don't have a good intuition about whether a solution to the problem as stated might be trivial in some way, for example f(Z) = constant.
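
Two standard facts about mutual information bear on these questions; they are written out below as a sketch, with c denoting an arbitrary constant and φ an arbitrary invertible map (both symbols introduced here for illustration).

```latex
% A constant transform is always feasible, but the objective is then zero:
f(Z) \equiv c
\;\Longrightarrow\;
\mathrm{MI}\bigl(Y; f(Z) \mid X, T\bigr) = 0
\quad\text{and}\quad
\mathrm{MI}\bigl(T; f(Z) \mid X\bigr) = 0 .

% Mutual information, unlike differential entropy, is unchanged by an
% invertible transformation of one of its arguments:
\mathrm{MI}\bigl(Y; \varphi(Z) \mid X, T\bigr) = \mathrm{MI}\bigl(Y; Z \mid X, T\bigr)
\quad\text{for invertible } \varphi .
```

So the constraint can always be met, at least trivially, and an invertible f cannot turn an invalid instrument into a valid one. Any useful f would have to discard some of the information in Z.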

Due to my lack of intuition, I suggest starting with a simple version of this problem. If we want to model dependent random variables A and B, then a simple-minded way is to postulate the existence of a set of independent identically distributed random variables [itex]\{W_1,W_2,\ldots\}[/itex] and assume A and B are each known functions of these variables, thereby introducing a dependence between A and B.

What type of functions of the [itex]W_i[/itex] make the problem simple? Maybe you've already tried something like this and you know.

Linear models are boring, but there are lots of known results about them. Let W be the column vector [itex](1,w_1,w_2,\ldots)[/itex]. Represent a random variable A as A = aW, where a is a row vector of constants.

An example of the approach of optimizing over a parametrized family would be to take T, Z, X, Y as having known constants and f(Z) as belonging to some parametric family of functions, the simplest interesting one being the linear functions. Maybe working the problem this way would re-derive some well-known result from linear modeling.
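
Here is a minimal sketch of that linear version, under assumptions that are mine rather than the original poster's: the w_i are iid standard normal, Z has two components Z1 and Z2, Z2 also affects Y directly, and f(Z) = b1·Z1 + b2·Z2. The coefficients are invented, and the leading 1 in W is dropped because intercepts do not affect covariances. For jointly Gaussian scalars, MI(A;B | C) = -0.5·ln(1 - ρ²), where ρ is the partial correlation, so the constraint reduces to one linear equation in b.

```python
import numpy as np

# Each variable is a row vector of constants times W = (w1, ..., w5),
# with the w_i iid standard normal, so Cov(A, B) = a @ b for rows a, b.
# All coefficients below are hypothetical.
coef = {
    "X":  np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
    "Z1": np.array([0.0, 1.0, 0.0, 0.0, 0.0]),
    "Z2": np.array([0.0, 0.0, 1.0, 0.0, 0.0]),
}
noise_T = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
noise_Y = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
coef["T"] = coef["X"] + coef["Z1"] + coef["Z2"] + noise_T
coef["Y"] = 2.0 * coef["T"] + coef["X"] + 0.5 * coef["Z2"] + noise_Y

def cond_cov(rows_a, rows_b, rows_c):
    """Cov(A, B | C) for jointly Gaussian variables given as coefficient rows."""
    A, B, C = (np.atleast_2d(r) for r in (rows_a, rows_b, rows_c))
    return A @ B.T - A @ C.T @ np.linalg.solve(C @ C.T, C @ B.T)

def gaussian_cmi(a, b, rows_c):
    """MI(A;B|C) = -0.5*ln(1 - rho^2) for scalar Gaussian A and B (nats)."""
    cab = cond_cov(a, b, rows_c)[0, 0]
    caa = cond_cov(a, a, rows_c)[0, 0]
    cbb = cond_cov(b, b, rows_c)[0, 0]
    return -0.5 * np.log1p(-(cab**2) / (caa * cbb))

Z = np.vstack([coef["Z1"], coef["Z2"]])
XT = np.vstack([coef["X"], coef["T"]])

# Constraint MI(Y; b.Z | X,T) = 0  <=>  b . Cov(Z, Y | X,T) = 0.
c = cond_cov(Z, coef["Y"], XT)[:, 0]
b = np.array([c[1], -c[0]])        # any nonzero vector orthogonal to c
fZ = b @ Z                          # coefficient row of f(Z)

constraint = gaussian_cmi(coef["Y"], fZ, XT)      # should be ~0
objective = gaussian_cmi(coef["T"], fZ, coef["X"])
print("b =", b, "constraint =", constraint, "objective =", objective)
```

With a two-component Z the constraint fixes b up to scale, and rescaling b does not change either MI, so nothing is left to optimize: in this toy setup b comes out proportional to (2, 1). With more components in Z there would be a genuine maximization over the directions orthogonal to c. With a scalar Z and a linear f, only f = 0 satisfies the constraint unless Z is already valid.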
 
