Function estimation subject to equality of densities

In summary, the conversation discusses the challenge of inferring causality from non-experimental data in social science. Regression may not recover the causal effect because of unobserved confounding. The standard remedy is a strong instrumental variable Z that affects T but affects Y only through T, and such a variable is often hard to find. The original poster proposes using Conditional Mutual Information from information theory to transform an imperfect instrument: choose a function f that maximizes the information f(Z) carries about T, subject to f(Z) being conditionally independent of Y given T and X. The reply treats this as a constrained optimization problem, suggests the calculus of variations for optimizing over general functions, and recommends starting with a simple version of the problem, such as a linear model, which might also give intuition about what can be achieved.
  • #1
stats
Dear all,

One of the main challenges in social science is inferring causality from nonexperimental data.
Statistics can help. In a regression context, regressing Y on T may not give the causal effect of T on Y because of unobserved confounding. A solution is to find a variable Z, called an instrumental variable, that strongly affects T (the strength of the instrument) but affects Y only through T (the validity of the instrument). The problem in many areas of social science is that such a variable is very hard or impossible to come by.

In the most general instrumental variable (IV) model, you want to estimate the effect of T on Y:

Y = f(T,X,U)
T = g(Z,X,V),

where X are control variables, U and V possibly correlated error terms, and Z the instrument. Here identification of the model hinges on

Y independent of Z given T,X
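A small simulation of the linear special case shows why regressing Y on T can mislead. It assumes no controls X, a true effect of 2 and a correlation of 0.8 between U and V; these numbers and the name `simulate_iv` are illustrative assumptions, not from the post. The IV estimate is the simple Wald ratio cov(Z, Y)/cov(Z, T), which is valid for one instrument and one treatment in a linear model.

```python
import numpy as np


def simulate_iv(n=100_000, effect=2.0, rho=0.8, seed=0):
    """Linear special case of Y = f(T,X,U), T = g(Z,X,V) with no controls X.

    Hypothetical data-generating process:
        Z ~ N(0,1)                          instrument
        U ~ N(0,1)                          error in the outcome equation
        V = rho*U + sqrt(1 - rho^2)*E       error in the treatment equation, correlated with U
        T = Z + V
        Y = effect*T + U
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    u = rng.standard_normal(n)
    v = rho * u + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    t = z + v
    y = effect * t + u

    # OLS slope of Y on T: biased, because T is correlated with U through V.
    beta_ols = np.cov(t, y)[0, 1] / np.var(t, ddof=1)
    # Wald / IV estimator: uses only the variation in T that is driven by Z.
    beta_iv = np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]
    return beta_ols, beta_iv


ols, iv = simulate_iv()
print(f"OLS: {ols:.3f}  IV: {iv:.3f}")
```

With these assumed values the population OLS slope is 2 + Cov(V,U)/Var(T) = 2 + 0.8/2 = 2.4, while the IV estimate should land close to the true effect of 2.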

To cut a long story short, what has not been used here, but could potentially be useful, is the concept of Conditional Mutual Information (MI) from information theory, a measure of nonlinear conditional dependence between two variables: http://en.wikipedia.org/wiki/Conditional_mutual_information .
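Estimating conditional MI from data is itself nontrivial. Under a simplifying assumption not made in the post, namely that all variables are jointly Gaussian, it has a closed form in the partial correlation ρ of Y and Z given the conditioning variables: MI(Y;Z | T,X) = -½ log(1 - ρ²), which is zero exactly when Y and Z are conditionally independent. The helper names `residualize` and `gaussian_cmi` and the toy data below are hypothetical; for non-Gaussian data a nonparametric estimator would be needed instead.

```python
import numpy as np


def residualize(v, cond):
    """Residuals of v after least-squares regression on an intercept and cond."""
    design = np.column_stack([np.ones(len(v)), cond])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def gaussian_cmi(y, z, cond):
    """MI(Y;Z | cond) in nats; only valid if (y, z, cond) are jointly Gaussian."""
    ry = residualize(y, cond)
    rz = residualize(z, cond)
    r = np.corrcoef(ry, rz)[0, 1]  # partial correlation of y and z given cond
    return -0.5 * np.log(1.0 - r**2)


# Hypothetical toy data with no confounding.
rng = np.random.default_rng(1)
n = 50_000
x = rng.standard_normal(n)
z = rng.standard_normal(n)
t = x + z + rng.standard_normal(n)
y_valid = t + x + rng.standard_normal(n)  # Z affects Y only through T
y_invalid = y_valid + 0.5 * z             # Z also affects Y directly

cond = np.column_stack([t, x])
print(gaussian_cmi(y_valid, z, cond))    # close to 0
print(gaussian_cmi(y_invalid, z, cond))  # clearly positive
```

In this toy model the second quantity has population value -½ log(8/9) ≈ 0.059.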

This is relevant because MI(Y;Z | T,X) = 0 if and only if Y and Z are independent given T and X.
Suppose we have a strong instrument Z, strongly related to T. However, it may not be totally valid, that is, it may be related to Y other than through T and X. We could make it valid by transforming it with the following optimization program:

find function f such that
max MI(T;f(Z) | X )
s.t. MI(Y;f(Z) | X,T) = 0

or find function f such that max E(lik(f(Z), T | X))
s.t. lik(f(Z), Y | X,T) = 0
or equivalently
s.t. g(Y | T,X) = g(Y | T,X,f(Z)).

where lik stands for log-likelihood and g for probability density function. Does anyone have an idea how to solve this constrained optimization problem?
 
  • #2
stats said:
find function f such that
max MI(T;f(Z) | X )
s.t. MI(Y;f(Z) | X,T) = 0

My thoughts:

First the truisms:

If the distributions involved are all represented as expressions with known or unknown constant parameters, then this looks like the usual sort of optimization problem. The "MI" functions would be functions of the unknown constants, which we regard as the variables in the optimization problem. We try to maximize the function in the top line subject to the constraint given by the second line. I'm sure people can Wikipedize us with links on that topic (Lagrange multipliers etc.).
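As one concrete instance of optimizing over a parametric family, assume (a simplification not made in the thread) that all variables are jointly Gaussian and that f is linear, f(Z) = c·Z, where Z is a vector of several candidate instruments. Then MI(T; f(Z) | X) increases with the squared partial correlation of T and c·Z given X, which equals (b·c)²/(cᵀMc) up to a constant, where M and b are the partial covariances of the Z_j with each other and with T given X. The constraint MI(Y; f(Z) | X,T) = 0 becomes the linear equation a·c = 0, with a_j the partial covariance of Y and Z_j given X and T. A Lagrange-multiplier argument gives c ∝ M⁻¹(b − λa) with λ = (aᵀM⁻¹b)/(aᵀM⁻¹a). The data and helper names below are hypothetical.

```python
import numpy as np


def residualize(v, cond):
    """Residuals of v (a vector or a matrix of columns) after OLS on [1, cond]."""
    design = np.column_stack([np.ones(len(v)), cond])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def objective(c, rz_x, rt_x):
    """Squared partial correlation of T and c.Z given X (Gaussian MI is monotone in it)."""
    return np.corrcoef(rz_x @ c, rt_x)[0, 1] ** 2


def best_linear_instrument(y, t, z, x):
    """Maximize (b.c)^2 / (c'Mc) subject to a.c = 0 using the Lagrange solution."""
    n = len(y)
    xt = np.column_stack([x, t])
    rz_x, rt_x = residualize(z, x), residualize(t, x)
    rz_xt, ry_xt = residualize(z, xt), residualize(y, xt)
    m = rz_x.T @ rz_x / n    # partial covariance matrix of Z given X
    b = rz_x.T @ rt_x / n    # partial covariances of each Z_j with T given X
    a = rz_xt.T @ ry_xt / n  # partial covariances of each Z_j with Y given X, T
    m_inv_a = np.linalg.solve(m, a)
    m_inv_b = np.linalg.solve(m, b)
    lam = (a @ m_inv_b) / (a @ m_inv_a)  # multiplier that enforces a.c = 0
    c = m_inv_b - lam * m_inv_a
    return c / np.linalg.norm(c), a, rz_x, rt_x


# Hypothetical data: three candidate instruments, the third also affects Y directly.
rng = np.random.default_rng(2)
n = 20_000
x = rng.standard_normal(n)
z = rng.standard_normal((n, 3))
t = x + z @ np.array([1.0, 0.5, 1.0]) + rng.standard_normal(n)
y = t + x + 0.7 * z[:, 2] + rng.standard_normal(n)

c, a, rz_x, rt_x = best_linear_instrument(y, t, z, x)
print("weights c:", c)
print("constraint a.c:", a @ c)
print("objective:", objective(c, rz_x, rt_x))
```

Using the closed form rather than a generic solver is a design choice that only works for this linear family; for other parametric families, `scipy.optimize.minimize` with an equality constraint would be one alternative.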

A limitation of this approach is that f(Z) doesn't vary over "the set of all possible functions", it only varies over a certain parametric family.

The well-known field of math that allows f(Z) to vary over something more inclusive than a parameterized family is the calculus of variations. I don't know it well enough! So I can't say whether you could fit your problem into that form. Perhaps some other forum member will tell us. I suppose we could eventually figure it out.

Other thoughts:

I don't have a good intuition about whether the constraint MI(Y;f(Z) | X,T) = 0 is possible to meet. All I recall is that the (differential) entropy of a continuous distribution is not necessarily invariant under a change of variable, so if we are dealing with continuous distributions, perhaps there is a chance. I also don't have a good intuition about whether a solution to the problem as stated might be trivial in some way, for example f(Z) = constant.
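Some of these questions can be checked on a small discrete example. Conditional mutual information is unchanged when Z is replaced by a one-to-one relabeling of it, so only a non-invertible f, one that merges values of Z, can change MI(Y;f(Z) | X,T); by the data processing inequality, merging can never increase MI(T;f(Z) | X). A constant f satisfies the constraint trivially but makes the objective zero. The plug-in estimator `plugin_cmi` and the toy data are illustrative assumptions.

```python
import math
import random
from collections import Counter


def plugin_cmi(a, b, c):
    """Plug-in estimate of MI(A;B | C) in nats from paired discrete samples."""
    n = len(a)
    n_abc = Counter(zip(a, b, c))
    n_ac = Counter(zip(a, c))
    n_bc = Counter(zip(b, c))
    n_c = Counter(c)
    total = 0.0
    for (ai, bi, ci), k in n_abc.items():
        # p(a,b,c) * log[p(a,b,c) p(c) / (p(a,c) p(b,c))], written with raw counts
        total += (k / n) * math.log((k * n_c[ci]) / (n_ac[(ai, ci)] * n_bc[(bi, ci)]))
    return total


# Hypothetical discrete toy data.
random.seed(0)
n = 5000
x = [random.randint(0, 1) for _ in range(n)]
z = [random.randint(0, 3) for _ in range(n)]
t = [(xi + zi + random.randint(0, 1)) % 3 for xi, zi in zip(x, z)]
# Y depends on T, X and (directly) on the parity of Z; about 20% of outcomes are flipped.
y = [(ti + xi + zi % 2 + (random.random() < 0.2)) % 2 for ti, xi, zi in zip(t, x, z)]

xt = list(zip(x, t))
relabel = {0: 7, 1: 3, 2: 9, 3: 1}     # one-to-one map of Z's values
z_relabeled = [relabel[zi] for zi in z]
f_const = [0] * n                       # constant f: trivially satisfies the constraint
f_parity = [zi % 2 for zi in z]         # non-invertible f: merges values of Z

print(plugin_cmi(y, z, xt), plugin_cmi(y, z_relabeled, xt))  # identical
print(plugin_cmi(y, f_const, xt), plugin_cmi(t, f_const, x))  # 0.0 0.0
print(plugin_cmi(t, f_parity, x), "<=", plugin_cmi(t, z, x))  # data processing inequality
```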

Due to my lack of intuition, I suggest starting with a simple version of this problem. If we want to model dependent random variables A and B, then a simple-minded way is to postulate the existence of a set of independent identically distributed random variables [itex]\{W_1,W_2,..\}[/itex] and assume A and B are each known functions of these variables, thereby introducing a dependence between A and B.

What type of functions of the [itex] W_i [/itex] make the problem simple? Maybe you've already tried something like this and you know.

Linear models are boring, but there are lots of known results about them. Let W be the column vector [itex] (1,w_1,w_2,...) [/itex]. Represent a random variable A as A = aW, where a is a row vector of constants.
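In this linear representation, if w_1, w_2, ... are independent with mean 0 and variance 1 (an assumption about the W_i that the post leaves open), the constant component only shifts the mean and Cov(A, B) = Σ_{i≥1} a_i b_i, so covariance comes only from the w_i that carry nonzero weight in both A and B. The sketch below checks that formula against a simulation; the coefficient vectors and the helper `linear_cov` are arbitrary illustrations.

```python
import numpy as np

# Row vectors of constants; position 0 multiplies the constant 1 in W = (1, w_1, w_2, w_3).
a = np.array([0.5, 1.0, 2.0, 0.0])   # A = 0.5 + w_1 + 2 w_2
b = np.array([-1.0, 0.0, 1.0, 3.0])  # B = -1 + w_2 + 3 w_3 (shares only w_2 with A)


def linear_cov(a, b):
    """Cov(aW, bW) when w_1, w_2, ... are independent with unit variance."""
    return float(a[1:] @ b[1:])


rng = np.random.default_rng(0)
n = 200_000
w = np.vstack([np.ones(n), rng.standard_normal((3, n))])  # shape (4, n), first row constant
A, B = a @ w, b @ w
print(linear_cov(a, b))    # 2.0: only w_2 contributes, 2.0 * 1.0
print(np.cov(A, B)[0, 1])  # simulated covariance, close to 2.0
```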

An example of optimizing over a parametrized family would be to take T, Z, X, Y as linear combinations of W with known constants, and f(Z) as a member of some parametric family of functions, the simplest interesting one being a linear function. Maybe working the problem this way would re-derive some well-known result from linear modeling.
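Here is one way to carry out that suggestion under assumed specifics. Every variable is a known linear combination of independent standard normal w_i, so everything is jointly Gaussian and conditional independence is equivalent to zero partial covariance. The hypothetical model takes X = w_1, two candidate instruments Z_1 = w_2 and Z_2 = w_3 (the latter also entering Y directly), T = X + Z_1 + Z_2 + w_4 and Y = T + X + Z_2 + w_5. With f(Z) = c_1 Z_1 + c_2 Z_2, the constraint MI(Y;f(Z) | X,T) = 0 becomes one linear equation in (c_1, c_2).

```python
import numpy as np

# Hypothetical model: coefficient rows on (w_1, ..., w_5), independent N(0, 1).
X = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
Z1 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # affects Y only through T
Z2 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # also enters Y directly
T = X + Z1 + Z2 + np.array([0.0, 0.0, 0.0, 1.0, 0.0])
Y = T + X + Z2 + np.array([0.0, 0.0, 0.0, 0.0, 1.0])


def partial_cov(rows_a, rows_b, given):
    """Cov(A, B | given) via the Schur complement, for linear combinations of the w_i."""
    A, B, G = np.vstack(rows_a), np.vstack(rows_b), np.vstack(given)
    return A @ B.T - (A @ G.T) @ np.linalg.solve(G @ G.T, G @ B.T)


# Constraint: a . c = 0, where a_j = Cov(Y, Z_j | X, T).
a = partial_cov([Y], [Z1, Z2], [X, T]).ravel()
print(a)  # approximately [-1/3, 2/3]

# In two dimensions the feasible directions are orthogonal to a; fix the scale.
c = np.array([a[1], -a[0]])
c /= np.linalg.norm(c)
print(c)  # (2, 1) / sqrt(5)

# Objective: Gaussian MI(T; f(Z) | X) from the partial correlation of T and f(Z) given X.
F = c[0] * Z1 + c[1] * Z2
cov_tf = partial_cov([T], [F], [X])[0, 0]
var_t = partial_cov([T], [T], [X])[0, 0]
var_f = partial_cov([F], [F], [X])[0, 0]
rho2 = cov_tf**2 / (var_t * var_f)
mi_t = -0.5 * np.log(1.0 - rho2)
print(rho2, mi_t)  # approximately 0.6 and 0.458
```

Two things show up in this toy model. The constraint alone fixes f up to scale, because MI is unchanged by rescaling f(Z), so the maximization only has room to act when there are more candidate instruments than constraints. And Z_1 by itself does not satisfy the constraint (a_1 = -1/3): conditioning on T makes Z_1 and Z_2 dependent, so Z_1 carries information about Z_2 and, through it, about Y.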
 

1. What is function estimation subject to equality of densities?

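In this thread, it refers to estimating a function f of a candidate instrument Z subject to a constraint that two conditional densities are equal: the density of Y given T and X must not change when f(Z) is added to the conditioning set, which is the same as requiring MI(Y;f(Z) | X,T) = 0. Within that constraint, f is chosen so that f(Z) stays as informative as possible about T, for example by maximizing MI(T;f(Z) | X) or an expected log-likelihood.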

2. How is function estimation subject to equality of densities different from other methods?

Standard instrumental-variable estimation takes the instrument as given and assumes that it is valid. Here the instrument itself is transformed: the estimation is a constrained optimization over functions f, trading off the strength of f(Z) against a conditional-independence constraint that encodes validity.

3. What are some applications of function estimation subject to equality of densities?

The motivating application in the thread is causal inference from non-experimental data in the social sciences and econometrics, where instrumental variables that are both strong and valid are hard to find.

4. What are some challenges of using function estimation subject to equality of densities?

The reply in the thread raises several. If f is restricted to a parametric family, the answer depends on that family, while optimizing over all functions would call for something like the calculus of variations. It is not obvious that the constraint MI(Y;f(Z) | X,T) = 0 can be met nontrivially, and a trivial solution such as a constant f(Z) satisfies the constraint while carrying no information about T.

5. How can function estimation subject to equality of densities be validated?

Similar to other statistical methods, function estimation subject to equality of densities can be validated through various techniques such as cross-validation, goodness of fit tests, and comparing the estimated function to other known relationships in the data. It is important to thoroughly evaluate the results and consider potential limitations of the method.
