# Equation related to Linear Discriminant Analysis (LDA)?

• zak100
In summary, the equation says that the projection matrix, W, maximizes the separation between two classes.

## Homework Statement

I can't understand an equation related to LDA. The context is:
The objective of LDA is to perform dimensionality reduction while
preserving as much of the class discriminatory information as
possible
Maybe the lecturer is trying to create a proof of the equation given below.

I know from the above that LDA projects the points onto an axis so that we get maximum separation between the two classes, in addition to reducing dimensionality.

## Homework Equations

I am not able to understand the following equation:
##Y = W^T X##

It says that:
Assume we have a set of D-dimensional samples ##\{x_1, x_2, \ldots, x_N\}##, ##N_1## of which belong to class
##\Omega_1## and ##N_2## to class ##\Omega_2##. We seek to obtain a scalar ##Y## by projecting the samples ##X## onto a line:
There is no W in the statement above, so I want to know what W is.

## The Attempt at a Solution

W might represent the projection line? And I take T to mean transpose.

Somebody please guide me. For the complete description, please see the attached file.

Zulfi.

#### Attachments

• l6.pdf
If you are patient enough, we can step through this.

There are some severe notation and definitional roadblocks that will come up. From past threads, you know that a projection matrix satisfies ##P^2 = P##. Idempotence implies that ##P## is square and diagonalizable, and that it has full rank iff it is the identity matrix, yet this contradicts slide 8 of your attachment. (I have a guess as to what's actually being said here, but the attachment is problematic. My guess, btw, is that ##W^T W = I## but ##WW^T = P##.)
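To make that guess concrete, here is a minimal NumPy sketch. The sizes `D` and `r` and the random `W` are made up for illustration and are not taken from the attachment; the point is only that a tall matrix with orthonormal columns gives ##W^T W = I## while ##WW^T## is an idempotent matrix of rank ##r## that is not the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, r = 5, 2  # hypothetical sizes: D-dimensional data projected to r dimensions

# Build a D x r matrix with orthonormal columns via a reduced QR factorization.
W, _ = np.linalg.qr(rng.standard_normal((D, r)))

P = W @ W.T  # D x D candidate projection matrix

print(np.allclose(W.T @ W, np.eye(r)))  # W^T W is the r x r identity
print(np.allclose(P @ P, P))            # P is idempotent
print(np.linalg.matrix_rank(P))         # P has rank r, not D
print(np.allclose(P, np.eye(D)))        # so P is not the identity
```

So ##W## itself is not a projection matrix in the ##P^2 = P## sense (it is not even square); it is the ##D \times r## map whose transpose produces the reduced coordinates.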

Typically more than half the battle is clearly stating what is being asked; after that I'd finish it off with something a bit esoteric like matrix calculus or majorization. The fact mentioned on page 9, that LDA can be interpreted / derived as a maximum likelihood method for certain normal distributions, is probably the most direct route.

Hi,
Thanks for your reply. Do you mean that W is the sample matrix?

Zulfi.

zak100 said:
Hi,
Thanks for your reply. Do you mean that W is the sample matrix?

Zulfi.

Have you looked at pages 7 and 9 in detail? It seems fairly clear to me that ##W## is made up. Equivalently, you choose it, and you should choose it optimally (page 9).

- - - -
My belief, btw, is that page 8 shows

##J(W) = \frac{\det\big(W^T S_B W\big)}{\det\big(W^T S_W W\big)}##

where ##S_W## and ##S_B## are symmetric positive (semi?) definite matrices. However, since I've conjectured that ##W^T W = I## but ##WW^T = P##, my belief is that you select ##W## to be a rank ##r## matrix and hence

##J(W) = \frac{\det\big(W^T S_B W\big)}{\det\big(W^T S_W W\big)} = \frac{e_r\big(W^T S_B W\big)}{e_r\big(W^T S_W W\big)} = \frac{e_r\big(P S_B\big)}{e_r\big(P S_W\big)} = \frac{e_r\big(P S_B P\big)}{e_r\big(P S_W P\big)}##

where ##e_r## is the rth elementary symmetric function of the eigenvalues of the matrix inside. But these notes clearly are part of a much bigger sequence and are not standalone. There should be a notational lookup somewhere.
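As a numerical sanity check of that chain of equalities, here is a small sketch. Everything in it is an assumption for illustration: `S_B` and `S_W` are random symmetric positive definite stand-ins (real scatter matrices would come from data, and ##S_B## would typically have low rank), and `e_r` is a hypothetical helper that sums the products of eigenvalues over all ##r##-subsets.

```python
from itertools import combinations

import numpy as np


def e_r(eigs, r):
    """r-th elementary symmetric function: sum of products over all r-subsets."""
    return sum(np.prod(subset) for subset in combinations(eigs, r))


def real_eigs(M):
    """Eigenvalues of M; the matrices used here have real spectra up to round-off."""
    return np.linalg.eigvals(M).real


rng = np.random.default_rng(0)
D, r = 4, 2  # hypothetical sizes, not taken from the slides


def random_spd(n):
    """Random symmetric positive definite matrix (stand-in for a scatter matrix)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + np.eye(n)


S_B, S_W = random_spd(D), random_spd(D)

# W has orthonormal columns, so W^T W = I_r and P = W W^T is a rank-r projector.
W, _ = np.linalg.qr(rng.standard_normal((D, r)))
P = W @ W.T

J_det = np.linalg.det(W.T @ S_B @ W) / np.linalg.det(W.T @ S_W @ W)
J_PS = e_r(real_eigs(P @ S_B), r) / e_r(real_eigs(P @ S_W), r)
J_PSP = e_r(real_eigs(P @ S_B @ P), r) / e_r(real_eigs(P @ S_W @ P), r)

print(np.isclose(J_det, J_PS), np.isclose(J_det, J_PSP))
```

The reason this works is that ##W^T S W##, ##PS## and ##PSP## share the same nonzero eigenvalues, and the determinant of an ##r \times r## matrix is the product of its ##r## eigenvalues, i.e. ##e_r## of them.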

Hi,
Thanks. Do you mean that W represents the matrix of eigenvectors?

Kindly tell me what the difference is between ##\mu## and ##\hat{\mu}## in slide #3. ##\mu## represents the mean of the X values, whereas ##\hat{\mu}## represents the mean of the Y values. If both are means, why is one written with the ^ symbol and the other without? We could have represented them as ##\mu_1## and ##\mu_2##. I can't understand this.

Zulfi.
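On the notation question, taking the reading given in the post (##\mu## is the mean of the ##x## samples, ##\hat{\mu}## the mean of the projected ##y## values), the two are linked through ##y = w^T x##: averaging both sides gives ##\hat{\mu} = w^T \mu##. The hat then marks "the same quantity after projection", whereas subscripts such as ##\mu_1, \mu_2## are usually reserved for the class index. A quick sketch with made-up samples and a hypothetical unit vector `w`:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical samples of one class: N = 50 points in D = 3 dimensions.
X = rng.standard_normal((50, 3)) + np.array([1.0, -2.0, 0.5])
w = np.array([0.6, 0.0, 0.8])  # hypothetical unit-length projection direction

y = X @ w            # one scalar y_n = w^T x_n per sample
mu = X.mean(axis=0)  # mean in the original D-dimensional space (a vector)
mu_hat = y.mean()    # mean in the projected space (a scalar)

print(mu.shape, np.ndim(mu_hat))
print(np.isclose(mu_hat, w @ mu))  # mean of projections equals projection of mean
```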

## 1. What is Linear Discriminant Analysis and how is it used?

Linear Discriminant Analysis (LDA) is a statistical method used for classification and dimensionality reduction. It involves finding a linear combination of features that best separates different classes of data. LDA is commonly used in data mining, pattern recognition, and machine learning.

## 2. What is the difference between LDA and PCA?

LDA and PCA (Principal Component Analysis) are both dimensionality reduction techniques, but they have different objectives. LDA aims to find a linear combination of features that best separates classes, while PCA aims to find a linear combination of features that captures the most variance in the data. Additionally, LDA is a supervised learning algorithm, while PCA is unsupervised.
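The contrast can be seen on a toy two-class data set. The sketch below is illustrative only: the data are synthetic, and the LDA direction uses the standard two-class Fisher solution ##w \propto S_W^{-1}(\mu_1 - \mu_2)##, which is not spelled out in the thread above. The classes are stretched along the first axis but separated along the second, so the direction of largest variance and the direction of best class separation differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic classes: large spread along axis 0, class means differ along axis 1.
X1 = rng.normal(loc=[0.0, 0.0], scale=[3.0, 0.3], size=(200, 2))
X2 = rng.normal(loc=[0.0, 2.0], scale=[3.0, 0.3], size=(200, 2))

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
d = mu1 - mu2

# Within-class and between-class scatter matrices (two-class case).
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
S_B = np.outer(d, d)

# LDA (supervised): Fisher direction w ~ S_W^{-1} (mu1 - mu2), normalized.
w_lda = np.linalg.solve(S_W, d)
w_lda /= np.linalg.norm(w_lda)

# PCA (unsupervised): top eigenvector of the total scatter, class labels ignored.
X = np.vstack([X1, X2])
Xc = X - X.mean(axis=0)
_, vecs = np.linalg.eigh(Xc.T @ Xc)  # eigh returns eigenvalues in ascending order
w_pca = vecs[:, -1]


def fisher_J(w):
    """Fisher criterion: between-class over within-class scatter along w."""
    return (w @ S_B @ w) / (w @ S_W @ w)


print("LDA direction:", w_lda, "J =", fisher_J(w_lda))
print("PCA direction:", w_pca, "J =", fisher_J(w_pca))
```

On this data the PCA direction lies almost along the first axis, where the classes overlap completely, while the LDA direction lies almost along the second axis and scores a much larger Fisher criterion.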

## 3. How does LDA handle class imbalance in the data?

When used as a classifier, LDA takes the prior probability of each class into account: the priors shift the decision boundary between classes but do not change the direction of the linear combination of features. Smaller classes are therefore still modeled, although with a strong imbalance the boundary moves toward the minority class, which can then be under-predicted.

## 4. Can LDA be used for non-linearly separable data?

Not well. LDA is a linear method, meaning its decision boundaries are straight lines, planes, or hyperplanes. If the classes are not linearly separable, LDA will not be able to classify the data accurately.

## 5. Are there any assumptions or limitations of LDA?

Yes, LDA has several assumptions and limitations. It assumes that the data within each class are normally distributed and that the classes have equal covariance matrices. It also requires that the number of features be less than the number of data points; otherwise the within-class scatter matrix is singular and cannot be inverted. Additionally, LDA may not perform well if the classes overlap significantly or if there are many highly correlated features.