# Could Someone Explain Partial Dependence to me?

## Main Question or Discussion Point

I am venturing into the realm of probability and I came across a concept that I call partial dependence, i.e., a set of events which are neither independent of nor fully dependent on each other. They fall somewhere "in between". I have looked on the internet, and I really don't understand the explanations because they are outside my mathematical background.

chiro
Hey Aero51.

Do you have a link for us to see the actual definition? Is there a wiki page (or similar)?

Stephen Tashi
I came across a concept that I call partial dependence.
That's what you call it? What was it called in the source where you came across it?

Here is an example:
http://www.avianknowledge.net/content/features/archive/visualizing-predictor-effects-with-partial-dependence-plots [Broken]

But I was talking about that in the context of a "probable" (I really don't know the vernacular) differential equation. In other words, a probability mass function defining a set of functions, where each one has a probability of being selected. For example:

$f_1= x*y, p(f_1) = .25$
$f_2= x*y^2, p(f_2) = .75$
$f_3= x^2*y, p(f_3) = .25$

However I am subjecting two sets of functions described by their own probability mass functions to the following constraint:

$\partial f_1 / \partial x_1 + \partial f_2 / \partial x_2 = 0$

for example, let:
$f_1 = 3x^2 +1$
$f_2 = -6xy +sin(t)$

It is clear that the above condition is satisfied, but only parts of $f_1$ and $f_2$ actually depend on each other. In other words, we can break $f_1$ into two distinct parts, say $f_{1a}$ and $f_{1b}$. Then $f_{1a}$ only has to satisfy the above condition, provided that $f_{1b}$ is not a function of $x_1$.
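A quick symbolic check of the example (assuming, since the notation in the thread is ambiguous, that $x_1$ and $x_2$ in the constraint refer to $x$ and $y$ respectively):

```python
import sympy as sp

# The two example functions from the post.
x, y, t = sp.symbols("x y t")
f1 = 3 * x**2 + 1
f2 = -6 * x * y + sp.sin(t)

# Constraint: df1/dx + df2/dy should vanish identically.
residual = sp.diff(f1, x) + sp.diff(f2, y)
print(sp.simplify(residual))  # 0
```

Note that the $\sin(t)$ term in $f_2$ plays the role of the "$f_{1b}$-like" part: it does not involve $x$ or $y$, so it drops out of the constraint entirely.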

From a probabilistic standpoint, how would you describe the dependence of a more general system which shares the properties I just explained?

chiro
In probability, a random variable has a joint distribution when it comes to specifying the probabilities.

If you have multiple random variables that are functions of a smaller number of random variables, then basically the minimal representation of those random variables and their probability spaces is the minimal joint distribution used to describe them.

Once you have the joint distribution of the minimal space, you can then use a few techniques from probability, like characteristic functions and probability-generating functions, transformation theorems for random variables, and results for product/ratio/sum distributions, as well as non-analytic techniques like simulation (including Monte Carlo and Markov chain Monte Carlo).

You can think of it in the way that you reduce a matrix. If you have a lot of linear dependence, then you may eliminate a few rows that reduce to zero. What you are left with is the actual information of the system, and instead of vectors, you have a joint distribution.
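The matrix analogy can be made concrete with a rank computation (the matrix values here are illustrative, not from the thread):

```python
import numpy as np

# A matrix whose second row is linearly dependent on the first:
# after reduction, only two rows carry information, just as redundant
# random variables reduce to a smaller "minimal" joint distribution.
A = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],   # 2x the first row -> dependent
    [0.0, 1.0, 1.0],
])

# The rank counts the independent rows.
rank = np.linalg.matrix_rank(A)
print(rank)  # 2
```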

Do you have a document or a specific topic in probability I may view? Currently I own "A First Course in Probability" and "Introduction to Mathematical Statistics".

It's interesting you mention Markov chain methods. The model I am working on reduces to a Markov chain if you can assume $f_1, f_2, \ldots, f_n$ are all independent events.

I really need some concrete information regarding this subject I've come across. I am presenting my work to a professor, who will help guide me into a further area of research.

chiro
There are quite a lot of topics that cover introductory probability and random variables, but some of the topics I have mentioned are probably going to be found when doing a google search or reading journal papers (if you don't get the information directly from say internal university lecture notes).

Something like this should be OK for an overview of the core basic topics:

https://www.amazon.com/dp/0321795431/?tag=pfamazon01-20

Stuff like simulation and MCMC is quite new (in terms of the established theory) and a lot of research is still going into these areas for various applications (finance, bio-statistics, and general statistical theory).

Basically the thing you want to look at is the joint distribution, establishing independence between the variables, and using these concepts (in conjunction with the appropriate results) to establish what the minimal number of random variables is and what their joint distribution is (given sufficient information to figure it out).

In the case that you have the definitions of the random variables, you can first count the number of independent variables (say $u = xy$, $v = x^2y$, $w = xy^2$; then $x$ and $y$ are the independent variables), and then use the definitions of independence to see whether they are independent.

If variables are independent, then the joint distribution is a product of the individual distributions: P(A = a, B = b) = P(A = a)*P(B = b). If they aren't, then the joint distribution can't be simplified or separated out.
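A minimal sketch of that independence check for two discrete variables (the joint probabilities below are made up for illustration):

```python
# Hypothetical joint distribution of two discrete random variables A and B,
# given as a dict mapping (a, b) -> probability.
joint = {
    (0, 0): 0.12, (0, 1): 0.28,
    (1, 0): 0.18, (1, 1): 0.42,
}

# Compute the marginals P(A = a) and P(B = b) by summing over the other variable.
pA, pB = {}, {}
for (a, b), p in joint.items():
    pA[a] = pA.get(a, 0.0) + p
    pB[b] = pB.get(b, 0.0) + p

# A and B are independent iff P(A=a, B=b) = P(A=a) * P(B=b) in every cell.
independent = all(
    abs(p - pA[a] * pB[b]) < 1e-12
    for (a, b), p in joint.items()
)
print(independent)  # True for this particular table
```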

If you want to look at a function of random variables, then depending on what you want you will need to look at the things I mentioned above (transformation theorem, characteristic function, ratio and product results, convolution theorem, MCMC simulation, normal simulation, etc).

If you only care about the mean and variance (or other moments), then you don't need the above and you can use the results that are talked about in your introductory statistics book.

Thank you, I'll take a look at some of those topics. To narrow down my description (and perhaps help you direct me to a more specific topic) I will sum up my problem as follows.

I essentially have a set of probability mass functions, which do not describe random variables per se, but rather the probability of observing a specific function. We'll say there are N functions in each PMF. The possible set of functions from each probability mass function must satisfy a certain criterion, in my case the equation I described above. If one function is randomly selected from each PMF and they do not meet the criterion, that specific combination will be "thrown out". The possible combinations and the corresponding probabilities of those combinations are what I need.
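The throw-out-and-keep procedure described above can be sketched mechanically as rejection plus renormalization (all names and probabilities here are hypothetical, and the constraint is a stand-in for the real PDE condition):

```python
from itertools import product

# Two illustrative PMFs over candidate functions, as label -> probability.
pmf1 = {"f1a": 0.25, "f1b": 0.75}
pmf2 = {"f2a": 0.50, "f2b": 0.50}

# Stand-in for the real criterion: reject one arbitrary pairing
# just to show the mechanics.
def satisfies_constraint(g, h):
    return (g, h) != ("f1a", "f2b")

# Joint probability of each pair under independent selection, keeping
# only the pairs that meet the constraint.
kept = {
    (g, h): pmf1[g] * pmf2[h]
    for g, h in product(pmf1, pmf2)
    if satisfies_constraint(g, h)
}

# Renormalize so the surviving combinations form a proper distribution.
total = sum(kept.values())
conditional = {pair: p / total for pair, p in kept.items()}
```

This is exactly the "set the thrown-out events to zero and keep the rest" construction, phrased as conditioning on the event that the constraint holds.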

In addition, there is also the possibility that the PMFs can themselves evolve over a spatio-temporal domain, making my problem even more difficult :)

chiro
Without knowing anything extra, you would probably have to formulate a joint distribution via exhaustion of the state space and set the thrown-out events to zero probability, leaving the joint distribution with only the non-zero events.

This is probably not what you wanted to hear, but without knowing more about it I can't really say anything else.

If you have a way to link all the functions together through some kind of parametric family (i.e., link functions together by using a new set of parameters), you can solve for the parameters that don't make the cut and use this information to form your joint distribution.

Stephen Tashi
Here is an example:
http://www.avianknowledge.net/content/features/archive/visualizing-predictor-effects-with-partial-dependence-plots [Broken]
That example and further web searching show that "partial dependence plots" are a specialized topic associated with a particular method of machine learning. So you won't find this subject treated in books about general probability theory.

But I was talking about that in the context of a "probable" (I really don't know the vernacular) differential equation. In other words, a probability mass function defining a set of functions, where each one has a probability of being selected. For example:

$f_1= x*y, p(f_1) = .25$
$f_2= x*y^2, p(f_2) = .75$
$f_3= x^2*y, p(f_3) = .25$
I can't tell if your example is supposed to be related to the machine learning technique described in the link you gave. The relation to standard probability theory goes this way. When we deal with random functions (as opposed to random variables), the selection of a random function determines more than a single scalar or finite vector of values. The selection of a random function gives a graph or "trajectory" (if we visualize the graph as a function of something plotted vs time). Selecting random functions is the most common example of a "continuous stochastic process". Keywords like "time series", "random functions", "stationary random functions", "stochastic differential equations" are appropriate things to use in searching.

If your example is related to the specialized topic of machine learning, I don't know any details.

However I am subjecting two sets of functions described by their own probability mass functions to the following constraint:

$\partial f_1 / \partial x_1 + \partial f_2 / \partial x_2 = 0$

for example, let:
$f_1 = 3x^2 +1$
$f_2 = -6xy +sin(t)$
I don't understand your notation. What are $x_1$ and $x_2$? I don't see either variable mentioned in the functions $f_1, f_2$.

I solved the problem. I ended up having to decompose the functions into distinct parts and apply a bunch of logic and basic set theory to express two distributions: 1) one relating the sets of different functions, and 2) one relating the functions' domains.