Could Someone Explain Partial Dependence to me?

Aero51 · Aug 29, 2013

I am venturing into the realm of probability and I came across a concept that I call partial dependence. I.e., it is a set of events which are neither independent nor dependent on each other. They fall somewhere "in between". I have looked on the internet and I really don't understand the explanations because they are outside my mathematical background.

chiro · Aug 30, 2013

Hey Aero51.

Do you have a link for us to see the actual definition? Is there a wiki page (or similar)?

Stephen Tashi · Aug 30, 2013

Aero51 said:

I came across a concept that I call partial dependence.

That's what you call it? What was it called in the source where you came across it?

Perhaps you were reading about "partial correlation" instead.

Aero51 · Aug 30, 2013

Here is an example:
http://www.avianknowledge.net/content/features/archive/visualizing-predictor-effects-with-partial-dependence-plots

But I was talking about that in the context of a "probable" (I really don't know the vernacular) differential equation. In other words, a probability mass function defining a set of functions where each one has a probability of being selected. For example:

[itex]f_1= x*y, p(f_1) = .25[/itex]
[itex]f_2= x*y^2, p(f_2) = .75[/itex]
[itex]f_3= x^2*y, p(f_3) = .25[/itex]

However I am subjecting two sets of functions described by their own probability mass functions to the following constraint:

[itex]\partial f_1 / \partial x_1 + \partial f_2 / \partial x_2 = 0[/itex]

for example, let:
[itex]f_1 = 3x^2 +1[/itex]
[itex]f_2 = -6xy + \sin(t)[/itex]

It is clear that the above condition is satisfied. But only parts of f₁ and f₂ actually depend on each other. In other words, we can break each of f₁ and f₂ into two distinct parts, say f_1a and f_1b. f_1a only has to satisfy the above condition provided that f_1b is not a function of x₁.
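As a quick check of the example, assuming x₁ and x₂ stand for x and y (the notation is not spelled out, so this mapping is an assumption):

[itex]\partial f_1 / \partial x = 6x, \qquad \partial f_2 / \partial y = -6x, \qquad 6x + (-6x) = 0[/itex]

Under the same assumption, one possible split is f_1a = 3x^2, f_1b = 1 and f_2a = -6xy, f_2b = sin(t). Only the "a" parts enter the constraint, since f_1b does not depend on x and f_2b does not depend on y.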

From a probabilistic standpoint, how would you describe the dependence of a more general system that shares the properties I just explained?

chiro · Aug 30, 2013

In probability, a collection of random variables is described by a joint distribution, which specifies the probabilities of every combination of their values.

If you have multiple random variables that are a function of a smaller number of random variables, then basically the minimum representation of those random variables and the probability spaces is the minimum joint distribution that is used to describe those random variables.

Once you have the joint distribution of the minimum space, you can then use a few techniques in probability like characteristic and probability-generating functions, transformation theorems of random variables, results on product/ratio/summation distributions, as well as non-analytic techniques like simulation (including Monte Carlo and Markov chain Monte Carlo).

You can think of it in the way that you reduce a matrix. If you have a lot of linear dependence, then you can eliminate a few rows by reducing them to zero. What you are left with is the actual information of the system, and instead of vectors, you have a joint distribution.

Aero51 · Aug 31, 2013

Do you have a document or a specific topic in probability I may view? Currently I own "A First Course in Probability" and "Introduction to Mathematical Statistics".

It's interesting you mention Markov chain methods. The model I am working on reduces to a Markov chain if you can assume f₁, f₂, ..., f_n are all independent events.

I really need some concrete information regarding this subject I've come across. I am presenting my work to a professor, who will help guide me into a further area of research.

chiro · Aug 31, 2013

There are quite a lot of topics that cover introductory probability and random variables, but some of the topics I have mentioned are probably going to be found when doing a google search or reading journal papers (if you don't get the information directly from say internal university lecture notes).

Something like this should be OK for an overview of the core basic topics:

https://www.amazon.com/dp/0321795431/?tag=pfamazon01-20

Stuff like simulation and MCMC is quite new (in terms of the established theory) and a lot of research is still going into these areas for various applications (finance, bio-statistics, and general statistical theory).

Basically the things you want to look at are the joint distribution and how to establish independence between two variables. You then use these concepts (in conjunction with the appropriate results) to establish what the minimum number of random variables is and what their joint distribution is (given sufficient information to figure it out).

In the case that you have the definitions of the random variables, you can just first count the number of independent variables (say u = xy, v = x^2y, w = xy^2 then x and y are the independent variables), and then use definitions of independence to see if they are independent.

If the variables are independent, then the joint distribution is the product of the individual distributions: P(A = a, B = b) = P(A = a)*P(B = b). If they aren't, then the joint distribution can't be simplified or separated out.
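As a rough illustration of the product rule above, here is a minimal Python sketch that tests whether a discrete joint PMF factors into its marginals. The two joint tables and the helper names (`marginals`, `is_independent`) are made up for the example and are not taken from any particular source.

```python
from itertools import product
from math import isclose

def marginals(joint):
    """Return the marginal PMFs of A and B from a joint PMF {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return pa, pb

def is_independent(joint, tol=1e-9):
    """True if P(A=a, B=b) equals P(A=a) * P(B=b) for every pair (a, b)."""
    pa, pb = marginals(joint)
    return all(
        isclose(joint.get((a, b), 0.0), pa[a] * pb[b], abs_tol=tol)
        for a, b in product(pa, pb)
    )

# Hypothetical joint PMFs, invented purely for illustration.
factorizes = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}
does_not = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

print(is_independent(factorizes))  # marginals (0.4, 0.6) and (0.3, 0.7) multiply back
print(is_independent(does_not))    # marginals are all 0.5, but 0.25 != 0.40
```

The check has to hold for every pair of values; a single cell that fails the product rule is enough to rule out independence.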

If you want to look at a function of random variables, then depending on what you want you will need to look at the things I mentioned above (transformation theorem, characteristic function, ratio and product results, convolution theorem, MCMC simulation, normal simulation, etc).
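For the simulation route, a plain Monte Carlo estimate might look like the sketch below. It assumes, purely for illustration, that x and y are independent Uniform(0, 1) variables and that the quantity of interest is the mean of u = x*y; the function names are hypothetical.

```python
import random

def monte_carlo_mean(g, sampler, n=100_000, seed=0):
    """Estimate E[g(X, Y)] by averaging g over n simulated draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += g(*sampler(rng))
    return total / n

def sample_xy(rng):
    # Assumption: x and y are independent Uniform(0, 1) draws.
    return rng.random(), rng.random()

estimate = monte_carlo_mean(lambda x, y: x * y, sample_xy)
exact = 0.25  # E[x*y] = E[x] * E[y] = 0.5 * 0.5 for independent uniforms
print(estimate, exact)
```

Here the exact answer is available for comparison; simulation earns its keep when the transformation is too awkward for the analytic results.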

If you only care about the mean and variance (or other moments), then you don't need the above and you can use the results that are talked about in your introductory statistics book.

Aero51 · Aug 31, 2013

Thank you, I'll take a look at some of those topics. To narrow down my description (and perhaps help you direct me to a more specific topic) I will sum up my problem as follows.

I essentially have a set of probability mass functions, which do not describe random variables per se, but do describe the probability of observing a specific function. We'll say there are N functions in each PMF. The possible set of functions from each probability mass function must adhere to a certain criterion, in my case the equation I described above. If one function is randomly selected from each PMF and they do not meet the criterion, that specific combination will be "thrown out". The possible combinations and the corresponding probabilities of said combinations are what I need.

In addition, there is also the possibility that the PMFs can themselves evolve over a spatio-temporal domain, which makes my problem even more difficult :)

chiro · Aug 31, 2013

Without knowing anything extra, you would probably have to formulate a joint distribution via exhaustion of the state-space and set the thrown out events to zero probability leaving the joint distribution with only non-zero events.

This is probably not what you wanted to hear, but without knowing more about it I can't really say anything else.

If you have a way to link all the functions together through some kind of parametric family (i.e. link functions together by using a new set of parameters) you can solve for the parameters that don't make the cut and use this information to form your joint distribution.
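The exhaustive approach described above can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the two PMFs, their probabilities, the reading of the constraint as ∂f₁/∂x + ∂f₂/∂y = 0, independent selection from each PMF before filtering, and renormalizing the surviving combinations so they sum to 1. The constraint is only checked numerically on a small grid, which is not a proof that it holds everywhere.

```python
from itertools import product

# Hypothetical PMFs over candidate functions. Each entry is
# (label, the partial derivative the constraint needs, probability).
pmf_1 = [  # candidates for f_1, with d f_1 / dx
    ("3x^2 + 1", lambda x, y: 6 * x, 0.5),
    ("x*y", lambda x, y: y, 0.5),
]
pmf_2 = [  # candidates for f_2, with d f_2 / dy
    ("-6xy + sin(t)", lambda x, y: -6 * x, 0.25),
    ("-y^2/2", lambda x, y: -y, 0.25),
    ("x*y^2", lambda x, y: 2 * x * y, 0.5),
]

# Test points for the numerical check (an arbitrary choice).
GRID = [(x, y) for x in (-1.0, 0.5, 2.0) for y in (-1.0, 0.5, 2.0)]

def satisfies_constraint(d1_dx, d2_dy, tol=1e-12):
    """Check d f_1/dx + d f_2/dy == 0 at every grid point."""
    return all(abs(d1_dx(x, y) + d2_dy(x, y)) <= tol for x, y in GRID)

def constrained_joint(pmf_a, pmf_b):
    """Enumerate all pairs, drop the ones that fail, renormalize the rest."""
    weights = {}
    for (name_a, da, pa), (name_b, db, pb) in product(pmf_a, pmf_b):
        if satisfies_constraint(da, db):
            weights[(name_a, name_b)] = pa * pb  # assumes independent selection
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no combination satisfies the constraint")
    return {pair: w / total for pair, w in weights.items()}

joint = constrained_joint(pmf_1, pmf_2)
for pair, p in joint.items():
    print(pair, p)
```

With these made-up inputs only two of the six combinations survive, so the thrown-out pairs carry zero probability and the remaining mass is split between the two valid pairs.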

Stephen Tashi · Sep 1, 2013

Aero51 said:

Here is an example:
http://www.avianknowledge.net/content/features/archive/visualizing-predictor-effects-with-partial-dependence-plots

That example and further web searching show that "partial dependence plots" are a specialized topic associated with a particular method of "machine learning". So you won't find this subject treated in books about general probability theory.

But I was talking about that in the context of a "probable" (I really don't know the vernacular) differential equation. In other words, a probability mass function defining a set of functions where each one has a probability of being selected. For example:

[itex]f_1= x*y, p(f_1) = .25[/itex]
[itex]f_2= x*y^2, p(f_2) = .75[/itex]
[itex]f_3= x^2*y, p(f_3) = .25[/itex]

I can't tell if your example is supposed to be related to the machine learning technique described in the link you gave. The relation to standard probability theory goes this way. When we deal with random functions (as opposed to random variables), the selection of a random function determines more than a single scalar or finite vector of values. The selection of a random function gives a graph or "trajectory" (if we visualize the graph as a function of something plotted vs time). Selecting random functions is the most common example of a "continuous stochastic process". Keywords like "time series", "random functions", "stationary random functions", "stochastic differential equations" are appropriate things to use in searching.
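To make the "random function gives a trajectory" idea concrete, here is a small Python sketch using the three functions from the quoted example. The listed probabilities add up to 1.25, so the sketch treats them as relative weights (an assumption), and it fixes y = 2.0 and an arbitrary grid of x values just to have something to evaluate.

```python
import random

# Candidate functions and weights copied from the quoted example.
# The listed values sum to 1.25, so they are used as relative weights.
candidates = [
    ("x*y", lambda x, y: x * y, 0.25),
    ("x*y^2", lambda x, y: x * y**2, 0.75),
    ("x^2*y", lambda x, y: x**2 * y, 0.25),
]

def draw_function(rng):
    """Pick one (label, function) pair according to the relative weights."""
    options = [(label, f) for label, f, _ in candidates]
    weights = [w for _, _, w in candidates]
    return rng.choices(options, weights=weights, k=1)[0]

rng = random.Random(0)
label, f = draw_function(rng)

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
trajectory = [f(x, 2.0) for x in xs]  # y held at 2.0 (assumption)
print(label, trajectory)
```

A single random draw fixes the whole list of values at once, which is the difference from drawing a single random number.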

If your example is related to the specialized topic of machine learning, I don't know any details.

However I am subjecting two sets of functions described by their own probability mass functions to the following constraint:

[itex]\partial f_1 / \partial x_1 + \partial f_2 / \partial x_2 = 0[/itex]

for example, let:
[itex]f_1 = 3x^2 +1[/itex]
[itex]f_2 = -6xy + \sin(t)[/itex]

I don't understand your notation. What are [itex] x_1 [/itex] and [itex] x_2 [/itex]? I don't see either variable mentioned in the functions [itex] f_1, f_2 [/itex].

Aero51 · Sep 4, 2013

I solved the problem. I ended up having to decompose the functions into distinct parts and applied a bunch of logic and basic set theory to express two distributions: 1) a distribution relating the sets of different functions and 2) a distribution relating the functions' domains.
