Could Someone Explain Partial Dependence to me?

In summary, partial dependence is a concept that Aero51 came across and describes as a set of events which are neither independent nor dependent on each other; it falls somewhere between the two. He has looked on the internet but does not understand the explanations because they are outside his mathematical background.
  • #1
Aero51
I am venturing into the realm of probability and I came across a concept that I call partial dependence. I.e., it is a set of events which are neither independent nor dependent on each other; they fall somewhere "in between". I have looked on the internet and I really don't understand the explanations because they are outside my mathematical background.
 
  • #2
Hey Aero51.

Do you have a link for us to see the actual definition? Is there a wiki page (or similar)?
 
  • #3
Aero51 said:
I came across a concept that I call partial dependence.

That's what you call it? What was it called in the source where you came across it?

Perhaps you were reading about "partial correlation" instead.
 
  • #4
Here is an example:
http://www.avianknowledge.net/content/features/archive/visualizing-predictor-effects-with-partial-dependence-plots

But I was talking about that in the context of a "probable" (I really don't know the vernacular) differential equation. In other words, a probability mass function defining a set of functions, where each one has a probability of being selected. For example:

[itex]f_1= x*y, p(f_1) = .25[/itex]
[itex]f_2= x*y^2, p(f_2) = .75[/itex]
[itex]f_3= x^2*y, p(f_3) = .25[/itex]

However I am subjecting two sets of functions described by their own probability mass functions to the following constraint:

[itex]\partial f_1 / \partial x_1 + \partial f_2 / \partial x_2 = 0[/itex]

for example, let:
[itex]f_1 = 3x^2 +1[/itex]
[itex]f_2 = -6xy +sin(t)[/itex]


It is clear that the above condition is satisfied, but only parts of f1 and f2 actually depend on each other. In other words, we can break f1 (and likewise f2) into two distinct parts, say f1a and f1b; only f1a has to satisfy the above condition, provided that f1b is not a function of x1.
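
As a quick sanity check, here is a minimal sketch (assuming x_1 is read as x and x_2 as y in the constraint above) that verifies the example pair symbolically:

[code]
# Minimal sketch: verify d(f1)/dx + d(f2)/dy = 0 for the example pair.
# Assumption: x_1 is read as x and x_2 as y; sin(t) does not depend on y.
import sympy as sp

x, y, t = sp.symbols('x y t')

f1 = 3 * x**2 + 1
f2 = -6 * x * y + sp.sin(t)

constraint = sp.diff(f1, x) + sp.diff(f2, y)   # 6x + (-6x)
print(sp.simplify(constraint))                 # prints 0, so the condition holds
[/code]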


From a probabilistic standpoint, how would you describe the dependence of a more general system that shares the properties I just explained?
 
  • #5
In probability, a collection of random variables has a joint distribution that specifies their probabilities.

If you have multiple random variables that are functions of a smaller number of random variables, then basically the minimal representation of those random variables and their probability spaces is the minimal joint distribution used to describe them.

Once you have the joint distribution of the minimal space, you can then use a few techniques in probability like characteristic functions / probability-generating functions, transformation theorems for random variables, and results for product/ratio/sum distributions, as well as non-analytic techniques like simulation (including Monte Carlo and Markov chain Monte Carlo).

You can think of it the way you reduce a matrix: if there is a lot of linear dependence, you can eliminate a few rows that reduce to zero. What you are left with is the actual information in the system, except that instead of vectors you have a joint distribution.
 
  • #6
Do you have a document or a specific topic in probability I may view? Currently I own "A First Course in Probability" and "Introduction to Mathematical Statistics".

It's interesting you mention Markov chain methods. The model I am working on reduces to a Markov chain if you can assume f1, f2, ..., fn are all independent events.

I really need some concrete information regarding this subject I've come across. I am presenting my work to a professor, who will help guide me into a further area of research.
 
  • #7
There are quite a lot of texts that cover introductory probability and random variables, but some of the topics I have mentioned are probably going to be found by doing a Google search or reading journal papers (if you don't get the information directly from, say, internal university lecture notes).

Something like this should be OK for an overview of the core basic topics:

https://www.amazon.com/dp/0321795431/?tag=pfamazon01-20

Stuff like simulation and MCMC is quite new (in terms of the established theory) and a lot of research is still going into these areas for various applications (finance, bio-statistics, and general statistical theory).

Basically, the things you want to look at are the joint distribution and establishing independence between variables; use these concepts (in conjunction with the appropriate results) to establish what the minimum number of random variables is and what their joint distribution is (given sufficient information to figure it out).

In the case that you have the definitions of the random variables, you can first count the number of underlying independent variables (say u = xy, v = x^2y, w = xy^2; then x and y are the underlying variables), and then use the definitions of independence to see whether they are independent.
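
A minimal sketch of that counting idea, using the u, v, w above (the exact relation v·w = u³ below follows directly from the definitions, not from any extra assumption):

[code]
# Minimal sketch: u, v, w look like three random variables, but they are all
# functions of only two underlying variables x and y, so their joint behavior
# is fully determined by the 2-D (x, y) distribution. Here v*w = u^3 exactly.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=100_000)   # underlying independent variables
y = rng.uniform(1.0, 2.0, size=100_000)

u = x * y
v = x**2 * y
w = x * y**2

print(np.allclose(v * w, u**3))   # True: only two degrees of freedom underneath
[/code]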

If variables are independent then the joint distribution is a product of the individual distributions P(A = a, B = b) = P(A = a)*P(B = b). If it isn't, then the joint distribution can't be simplified or separated out.
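
And a minimal sketch of that factorization check for a small discrete joint distribution (the numbers are made up purely for illustration):

[code]
# Minimal sketch: check whether P(A = a, B = b) = P(A = a) * P(B = b)
# for a discrete joint PMF given as a table (rows = values of A, cols = values of B).
import numpy as np

# This joint PMF is built as an outer product, so A and B are independent by construction.
joint = np.outer([0.2, 0.8], [0.5, 0.3, 0.2])

p_a = joint.sum(axis=1)   # marginal of A
p_b = joint.sum(axis=0)   # marginal of B

print(np.allclose(joint, np.outer(p_a, p_b)))   # True: the joint factorizes
[/code]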

If you want to look at a function of random variables, then depending on what you want you will need to look at the things I mentioned above (transformation theorem, characteristic function, ratio and product results, convolution theorem, MCMC simulation, normal simulation, etc).

If you only care about the mean and variance (or other moments), then you don't need the above and you can use the results that are talked about in your introductory statistics book.
 
  • #8
Thank you, I'll take a look at some of those topics. To narrow down my description (and perhaps help you direct me to a more specific topic) I will sum up my problem as follows.

I essentially have a set of probability mass functions which do not describe random variables per se, but rather the probability of observing a specific function. We'll say there are N functions in each PMF. The possible set of functions from each probability mass function must adhere to a certain criterion, in my case the equation I described above. If one function is randomly selected from each PMF and the functions do not meet the criterion, that specific combination will be "thrown out". The possible combinations and the corresponding probabilities of those combinations are what I need.
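
A minimal sketch of that procedure (the functions and probabilities below are made-up placeholders, not my actual model), enumerating the combinations, discarding the ones that violate the constraint, and renormalizing what survives:

[code]
# Minimal sketch: two PMFs over candidate functions; keep only the pairs that
# satisfy d(f1)/dx + d(f2)/dy = 0 and renormalize their probabilities.
# Assumption: the draws from the two PMFs are independent before the constraint is applied.
from itertools import product
import sympy as sp

x, y = sp.symbols('x y')

pmf_1 = {3 * x**2 + 1: 0.5, x * y: 0.5}     # {function: probability}
pmf_2 = {-6 * x * y: 0.25, x * y**2: 0.75}

surviving = {}
for (f1, p1), (f2, p2) in product(pmf_1.items(), pmf_2.items()):
    if sp.simplify(sp.diff(f1, x) + sp.diff(f2, y)) == 0:
        surviving[(f1, f2)] = p1 * p2       # joint probability before renormalizing

total = sum(surviving.values())
joint = {pair: p / total for pair, p in surviving.items()}
print(joint)   # the allowed combinations with their renormalized probabilities
[/code]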

In addition, there is also the possibility that the PMFs can themselves evolve over a spatio-temporal domain, making my problem even more difficult. :)
 
  • #9
Without knowing anything extra, you would probably have to formulate a joint distribution via exhaustion of the state space, set the thrown-out events to zero probability, and leave the joint distribution with only the non-zero events.

This is probably not what you wanted to hear, but without knowing more about it I can't really say anything else.

If you have a way to link all the functions together through some kind of parametric family (i.e. link functions together by using a new set of parameters) you can solve for the parameters that don't make the cut and use this information to form your joint distribution.
 
  • #10
Aero51 said:
Here is an example:
http://www.avianknowledge.net/content/features/archive/visualizing-predictor-effects-with-partial-dependence-plots

That example and further web searching show that "partial dependence plots" are a specialized topic associated with a particular method of "machine learning", so you won't find this subject treated in books about general probability theory.

But I was talking about that in the context of a "probable" (I really don't know the vernacular) differential equation. In other words, a probability mass function defining a set of functions, where each one has a probability of being selected. For example:

[itex]f_1= x*y, p(f_1) = .25[/itex]
[itex]f_2= x*y^2, p(f_2) = .75[/itex]
[itex]f_3= x^2*y, p(f_3) = .25[/itex]
I can't tell if your example is supposed to be related to the machine learning technique described in the link you gave. The relation to standard probability theory goes this way. When we deal with random functions (as opposed to random variables), the selection of a random function determines more than a single scalar or finite vector of values. The selection of a random function gives a graph or "trajectory" (if we visualize the graph as a function of something plotted vs time). Selecting random functions is the most common example of a "continuous stochastic process". Keywords like "time series", "random functions", "stationary random functions", "stochastic differential equations" are appropriate things to use in searching.

If your example is related to the specialized topic of machine learning, I don't know any details.

However I am subjecting two sets of functions described by their own probability mass functions to the following constraint:

[itex]\partial f_1 / \partial x_1 + \partial f_2 / \partial x_2 = 0[/itex]

for example, let:
[itex]f_1 = 3x^2 +1[/itex]
[itex]f_2 = -6xy +sin(t)[/itex]

I don't understand your notation. What are [itex] x_1 [/itex] and [itex] x_2 [/itex]? I don't see either variable mentioned in the functions [itex] f_1, f_2 [/itex].
 
  • #11
I solved the problem. I ended up having to decompose the functions into distinct parts and applied some logic and basic set theory to express two distributions: one relating the sets of different functions, and another relating the functions' domains.
 

1. What is partial dependence?

Partial dependence is a statistical concept that measures the relationship between a specific feature or variable and the outcome of a statistical model, while controlling for the effects of other variables. It helps to understand the impact of a single variable on the model's predictions.

2. How is partial dependence different from other measures of variable importance?

Partial dependence takes into account the effects of other variables in the model, whereas other measures of variable importance, such as feature importance or coefficient values, only consider the individual variable's impact on the outcome.

3. How is partial dependence calculated?

Partial dependence is typically calculated by choosing a grid of values for the feature of interest. For each grid value, that feature is set to the value for every observation in the data while all other features keep their observed values; the model's predictions are then averaged, and the resulting curve of average predictions is the partial dependence for that feature.
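
A minimal sketch of that computation (model-agnostic; `model` here is assumed to be any object with a `predict` method, purely for illustration):

[code]
# Minimal sketch: sweep one feature over a grid, and for each grid value average
# the model's predictions over the observed values of all other features.
import numpy as np

def partial_dependence(model, X, feature_idx, grid):
    """Return the average prediction at each grid value of the chosen feature."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value                 # pin the feature of interest
        averages.append(model.predict(X_mod).mean())  # average over everything else
    return np.array(averages)
[/code]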

4. What is the benefit of using partial dependence?

Partial dependence helps to identify and understand the relationships between individual features and the outcome of a statistical model. It can also reveal non-linear relationships or interactions between variables that may not be apparent from other measures of variable importance.

5. Are there any limitations to using partial dependence?

Partial dependence assumes that all other variables in the model are held constant, which may not accurately reflect real-world scenarios. Additionally, it may not capture the full complexity of relationships between variables, especially in more complex models.
