Decision Theory: Discriminant function for 2D Gaussians

Master1022 · Apr 7, 2021

Hi,

I was working on the following problem:
Two classes ## C_1 ## and ## C_2 ## have equal priors. The likelihoods of ## x ## belonging to each class are given by 2D normal distributions with different means, but the same covariance: [tex] p(x|C_1) = N(\mu_x, \Sigma) \text{ and } p(x|C_2) = N(\mu_y, \Sigma) [/tex]
where we know the relationship between ## \mu_x ## and ## \mu_y ##.
Determine the shape of the discriminant curve.

In a previous part of the question, we are told that ## y = Ax + t ## and thus we know that ## \mu_y = A \mu_x + t ## and ## \Sigma_y = A \Sigma_x A^T ##
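
As a quick sanity check of these two identities, here is a minimal NumPy sketch. The matrix `A`, the offset `t` and the sample data are hypothetical values chosen only for illustration (they are not from the problem); the sample mean and sample covariance obey the same transformation rules, so the gaps should be at round-off level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical affine map y = A x + t (illustrative values, not from the problem).
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # a 90-degree rotation
t = np.array([2.0, -1.0])

# Draw correlated 2D samples x (one per row), then transform each one.
mixing = np.array([[1.5, 0.0],
                   [0.4, 0.8]])
X = rng.normal(size=(500, 2)) @ mixing.T
Y = X @ A.T + t

mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
cov_x = np.cov(X, rowvar=False)  # rows are observations
cov_y = np.cov(Y, rowvar=False)

# mu_y = A mu_x + t and Sigma_y = A Sigma_x A^T hold for the sample statistics too.
mean_gap = np.max(np.abs(mean_y - (A @ mean_x + t)))
cov_gap = np.max(np.abs(cov_y - A @ cov_x @ A.T))
print(mean_gap, cov_gap)
```

With a rotation for `A` as above, the covariance generally changes; it stays the same only for special choices of `A`.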

Attempt:
From some online lecture notes, I gather that this discriminant curve should be a hyperplane (a straight line in 2D), and I now want to verify that this is the case.

We can define ## g(x) ## as follows:

[tex] g(x) = \ln \left( \frac{p(C_1 | x)}{p(C_2 | x)} \right) = \ln \left( \frac{p(x | C_1)}{p(x | C_2)} \right) + \ln \left( \frac{p(C_1)}{p(C_2)} \right) [/tex]
where the second equality follows from Bayes' rule (the evidence ## p(x) ## cancels in the ratio). Since the priors are equal, the second term is 0 and we are left with the first term. Its components are defined as:

[tex] p(x | C_i ) = \frac{1}{(2 \pi) |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^T \Sigma_i ^{-1} (x - \mu_i) \right) [/tex]
Given that ## \Sigma_x = \Sigma_y = \Sigma ##, the normalisation constants cancel in the ratio, so we can separate the logarithms and keep only the terms from the exponents:
[tex] g(x) = -\frac{1}{2} (x - \mu_x)^T \Sigma^{-1} (x - \mu_x) + \frac{1}{2} (x - \mu_y)^T \Sigma^{-1} (x - \mu_y) [/tex]
The discriminant curve should be where the classes are equiprobable, and thus ## g(x) = \ln \left( \frac{p(C_1 | x)}{p(C_2 | x)} \right) = 0 ##. Multiplying through by 2:
[tex] 0 = -(x - \mu_x)^T \Sigma^{-1} (x - \mu_x) + (x - \mu_y)^T \Sigma^{-1} (x - \mu_y) [/tex]

Now I suppose there are two ways to proceed:
1. Algebra
2. There is a hint about transforming ## \Sigma ## to the identity matrix, but I am not sure (a) how to properly do that, and (b) how that can help us. How could I do this second method?

Given that I don't quite understand how to do the transformation of the covariance matrix, I will continue with the algebra:
[tex] 0 = - ( x^T \Sigma^{-1} x - x^T \Sigma^{-1} \mu_x - \mu_x ^T \Sigma^{-1} x + \mu_x ^T \Sigma^{-1} \mu_x) + ( x^T \Sigma^{-1} x - x^T \Sigma^{-1} \mu_y - \mu_y ^T \Sigma^{-1} x + \mu_y ^T \Sigma^{-1} \mu_y) [/tex]
[tex] 0 = 2 x^T \Sigma^{-1} \mu_x - \mu_x ^T \Sigma^{-1} \mu_x - 2 x^T \Sigma^{-1} \mu_y + \mu_y ^T \Sigma^{-1} \mu_y [/tex]
[tex] 0 = 2 x^T \Sigma^{-1} (\mu_x - \mu_y) - (\mu_x + \mu_y)^T \Sigma^{-1} (\mu_x - \mu_y) [/tex]
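
The algebra above can be checked numerically. The sketch below evaluates ## g(x) ## directly from the two quadratic forms (keeping the factor of 1/2 from the Gaussian exponent, which does not change where ## g(x) = 0 ##) and compares it with an affine function ## w^T x + b ##. The values of `mu_x`, `mu_y` and `Sigma` are hypothetical, and the names `w` and `b` are introduced here only for illustration.

```python
import numpy as np

# Hypothetical example values (not from the problem statement).
mu_x = np.array([1.0, 2.0])
mu_y = np.array([3.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def mahalanobis_sq(x, mu):
    """Quadratic form (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return d @ Sigma_inv @ d

def g(x):
    """Log-likelihood ratio ln p(x|C_1) - ln p(x|C_2) for a shared covariance."""
    return -0.5 * mahalanobis_sq(x, mu_x) + 0.5 * mahalanobis_sq(x, mu_y)

# Candidate affine form: the x^T Sigma^{-1} x terms cancel, leaving w^T x + b.
w = Sigma_inv @ (mu_x - mu_y)
b = -0.5 * (mu_x + mu_y) @ w

rng = np.random.default_rng(0)
points = 5.0 * rng.normal(size=(1000, 2))
g_values = np.array([g(p) for p in points])
max_gap = np.max(np.abs(g_values - (points @ w + b)))

# The midpoint of the two means should lie on the boundary g(x) = 0.
midpoint = 0.5 * (mu_x + mu_y)
print(max_gap, g(midpoint))
```

If the gap is at round-off level, ## g(x) ## is affine in ## x ##, so its zero set is a straight line with normal vector ## \Sigma^{-1} (\mu_x - \mu_y) ## passing through the midpoint of the two means.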

We know that ## \mu_x - \mu_y = (I - A) \mu_x - t ##, but I am not really sure how to proceed from here. I see that the factor ## \Sigma^{-1} (\mu_x - \mu_y) ## is common to both terms, but I am not sure how this brings us closer to seeing that this is a hyperplane.

Any help would be greatly appreciated.

mighty2000 · Apr 7, 2021

Hi there,

Thanks for your post! It seems like you have a good understanding of the problem and are on the right track. To answer your question about the transformation of the covariance matrix, the hint is referring to a technique called "whitening" or "decorrelation". This is a common technique in machine learning and pattern recognition for simplifying data. Essentially, we apply a linear change of variables to the data so that its covariance matrix becomes the identity matrix. This removes the correlations between the variables and makes the calculations easier.

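To do this, we can transform the data as ## z = \Sigma^{-1/2} x ##. The covariance of ## z ## is then
[tex] \Sigma_w = \Sigma^{-1/2} \Sigma \Sigma^{-1/2} = I [/tex]
where ## \Sigma_w ## is the whitened covariance matrix and ## \Sigma^{-1/2} ## is the (symmetric) inverse square root of the original covariance matrix.

In the whitened coordinates both classes have identity covariance, so the quadratic forms in your ## g(x) ## become squared Euclidean distances to the whitened means ## \Sigma^{-1/2} \mu_x ## and ## \Sigma^{-1/2} \mu_y ##. The discriminant curve is then the set of points equidistant from the two whitened means, which is their perpendicular bisector: a straight line in 2D. Since the whitening map is linear, that line maps back to a line (hyperplane) in the original coordinates. I hope this helps! Let me know if you have any other questions or if you need further clarification. Good luck with your problem!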
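
Here is a minimal NumPy sketch of the whitening step, assuming ## \Sigma ## is symmetric positive definite so that its inverse square root can be built from an eigendecomposition. The numbers for `mu_x`, `mu_y` and `Sigma` are made up for illustration, and the variable names are my own.

```python
import numpy as np

# Hypothetical example values (not from the original problem).
mu_x = np.array([1.0, 2.0])
mu_y = np.array([3.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Symmetric inverse square root via the eigendecomposition Sigma = V diag(lam) V^T.
lam, V = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = V @ np.diag(lam ** -0.5) @ V.T

# Covariance of the whitened variable z = Sigma^{-1/2} x; should be the identity.
whitened_cov = Sigma_inv_sqrt @ Sigma @ Sigma_inv_sqrt

# Whitened class means.
m_x = Sigma_inv_sqrt @ mu_x
m_y = Sigma_inv_sqrt @ mu_y

# In z-space the boundary is the perpendicular bisector of m_x and m_y:
# normal n = m_x - m_y, passing through the midpoint c.
n = m_x - m_y
c = 0.5 * (m_x + m_y)

# Any point c + s * perp on that bisector is equidistant from both whitened means.
perp = np.array([-n[1], n[0]])
z0 = c + 3.0 * perp
dist_gap = abs(np.linalg.norm(z0 - m_x) - np.linalg.norm(z0 - m_y))

# Mapping the normal back to x-space gives Sigma^{-1} (mu_x - mu_y).
w_back = Sigma_inv_sqrt @ n
w_direct = np.linalg.inv(Sigma) @ (mu_x - mu_y)
print(whitened_cov, dist_gap, w_back, w_direct)
```

The last two vectors should agree, which ties the whitening picture back to the linear term you found with the algebra.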
