How do we represent multidimensional data vectors on a 2D plot?

In summary, the conversation covers the topic of representing multidimensional vectors on a 2D plot using unique features and eigenvalues for dimensional reduction. The speaker is seeking more information on the topic and asks for book or link suggestions. The expert recommends learning about data mining and multivariate statistics, particularly regression modeling and principal components. They also suggest learning R, a free and open-source tool for statistical analysis, and exploring topics like neural networks, decision trees, and spatial classification. The expert advises building a strong foundation in statistics and other related subjects, as different researchers may use different platforms.
  • #1
dexterdev
Hi all,
I would like to know whether there is any method for representing multidimensional vectors on a 2D plot by extracting some unique features of those vectors. Can eigenvalues be used for such purposes, for example for dimensionality reduction? If so, how? I would like to know more about these things; any links or book suggestions would be helpful.

-Devanand
 
  • #2
Hey dexterdev.

The key question here is: what kind of features are you looking for?

Eigenvalues give information about the spectrum of an operator and, together with the eigenvectors, give an idea of the basis and the principal directions along which existing vectors are scaled.

Do you want, for example, to see which eigenvector corresponds to a really small eigenvalue and remove that principal basis vector to make a good enough "approximation" and reduce the dimension?

You could sort the eigenvectors by eigenvalue (in decreasing order) to do this kind of dimension reduction; the magnitude of each eigenvalue tells you how much that particular vector contributes.
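
As a rough illustration of the sorting idea above, here is a minimal sketch in Python with NumPy (the thread itself does not use Python; the function name `project_to_2d` and the random test data are made up for the example). It assumes the vectors are stored as rows of a matrix, computes the covariance matrix, sorts the eigenvectors by decreasing eigenvalue, and keeps the first two as the axes of the 2D plot.

```python
import numpy as np

def project_to_2d(X):
    """Project the rows of X (n_samples x n_features) onto the two
    eigenvectors of the covariance matrix with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = np.cov(Xc, rowvar=False)           # n_features x n_features covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # indices for decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = Xc @ eigvecs[:, :2]             # 2D coordinates, ready to plot
    return coords, eigvals

# Hypothetical data: 200 correlated vectors in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
coords, eigvals = project_to_2d(X)
```

The two columns of `coords` can then be passed to any scatter-plot routine as the x and y values.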
 
  • #3
Sir,
Basically, I am interested in learning 'machine learning', but I am still a beginner. I am trying to get some insight into eigenvalues, eigenvectors, etc. (The spectrum you mentioned here is entirely different from the frequency spectrum, right, Sir?) When you say operator, I don't understand anything.
I also believe that principal components are orthogonal. Is that right?
Can you suggest some books or links related to this?
 
  • #4
There are links between the frequency spectrum and the spectrum of an operator when you are talking about, say, solutions to linear ordinary differential equations. It really depends on what you are looking at and what you are using the results for.

Principal components are indeed orthogonal: what they actually do is take a set of random vectors (assumed to be independent observations) whose components are generally correlated, and "orthogonalize" them by un-correlating the components. The way they do this is to "rotate" the vectors, much in the same way you rotate a vector by multiplying it by a valid rotation matrix (an orthogonal matrix with determinant 1).
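
The orthogonality and un-correlating claims can be checked numerically. The following is only a sketch in Python with NumPy on made-up data (the mixing matrix and variable names are assumptions for the example): it verifies that the eigenvector matrix `V` satisfies V^T V = I and that the covariance matrix of the rotated data is diagonal.

```python
import numpy as np

# Hypothetical correlated 3-D data: independent noise mixed by a fixed matrix.
rng = np.random.default_rng(1)
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.2]])
X = rng.normal(size=(500, 3)) @ mix
Xc = X - X.mean(axis=0)

# Columns of V are the principal directions (eigenvectors of the covariance).
eigvals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Orthogonality: V^T V should be the identity matrix.
orthogonal = np.allclose(V.T @ V, np.eye(3))

# Rotating the data into the eigenvector basis un-correlates the components:
# the covariance of the rotated data has (numerically) zero off-diagonals.
scores = Xc @ V
score_cov = np.cov(scores, rowvar=False)
off_diag = score_cov - np.diag(np.diag(score_cov))
uncorrelated = np.allclose(off_diag, 0.0, atol=1e-8)

# The determinant is +1 or -1; eigenvector signs are arbitrary, so flip the
# sign of one column if a proper rotation (determinant +1) is wanted.
det = np.linalg.det(V)
```

Here both `orthogonal` and `uncorrelated` should come out `True`, and `det` should be +1 or -1 up to rounding.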

These components are sorted by their contribution to the variation (think variance) in decreasing order, so the first vectors make large contributions to the variation of the data set and the last make the lowest.

This is why, in PCA, throwing out the last set of vectors is a way of working in a lower dimension, and also a way to check for linear dependence or "close" linear dependence (by checking for eigenvalues at or near zero).
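
To make the "contribution to variation" idea concrete, here is a small sketch in Python with NumPy (the data and the 99% threshold are arbitrary choices for the example). The third variable is built as almost a linear combination of the first two, so the smallest eigenvalue is close to zero and two components are enough.

```python
import numpy as np

# Hypothetical data in which the third variable is nearly a linear
# combination of the first two ("close" linear dependence).
rng = np.random.default_rng(2)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 2.0 * b + 1e-3 * rng.normal(size=300)
X = np.column_stack([a, b, c])

# Eigenvalues of the covariance matrix, sorted in decreasing order.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# Fraction of the total variance carried by each component.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative share reaches 99%.
k = int(np.searchsorted(cumulative, 0.99) + 1)
```

A near-zero entry at the end of `explained` is the signal of (close) linear dependence mentioned above, and `k` is the number of components kept.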

The area that you want to check out is Data Mining for more information on these topics. In terms of recommendations, you need to specify the approach you want to take.

Some books are purely practical in the sense that they give you a tool (like a library, an independent piece of software, etc.) and tell you how to use it through typed commands or a GUI.

Other books are more theoretical (data mining by its nature is practical): they cover results and proofs in a way that gives understanding, and they are often targeted at researchers, academics, or professionals who have an interest in the theory and its extensions.

If you want the pure theory, I'd recommend you look at the mathematics or statistics books. PCA is part of multivariate statistics, and eigenvalues and the various matrix decompositions are part of linear and multilinear algebra.
 
  • #5
Sir,
Basically, what I am looking for is a practical approach, but I would also like to learn the theory behind it. I try to read IEEE papers etc., but when I see big equations I get lost. My basic problem is that I lack the fundamentals, and I take a long time to understand concepts. I have a master's degree in electronics engineering (from India), but machine learning is a new area for me. I am also not an expert in my field. Only recently did I discover that signal processing and statistics are my favourite subjects. I dream of doing a PhD in areas like ML and signal processing.
 
  • #6
From the statistical point of view, you should understand regression modelling and multivariate statistics.

If you have access to a university library, go to the statistics section and get books on those topics. You can use amazon or something similar to get feedback and ratings on the books, but most books should cover the same sorts of things.

There is a dedicated book on Principal Components:

https://www.amazon.com/dp/0387954422/?tag=pfamazon01-20

In terms of data mining, here is a practical book that covers the tools (not so much the theory) and uses freely available open-source software:

https://www.amazon.com/dp/1441998896/?tag=pfamazon01-20

Note that there are a lot of books like this, but since R and Rattle are free and open source, you can download them and play around with them straight away, as opposed to something that costs money (and is expensive).

Also, if you plan to do a lot of analysis in the future that involves statistics, then learning R is a worthwhile investment, since there are packages for almost everything you might want to do in statistical analysis.

If you don't have a good statistics background to start with, then I'd suggest you get one in some form.

There are a tonne of introductory statistics books, including this one:

https://www.amazon.com/dp/0321795431/?tag=pfamazon01-20

On top of statistics you will probably want to look at things like neural networks, decision trees, and various classification schemes like spatial classification and support vector machines.

Spatial classification looks at dividing space into disjoint regions. It involves things like parametric classification (spheres, ellipsoids, cuboids, etc., that are specified using parameters) or non-parametric classification (k-DOPs, convex hulls, etc., that are defined using general planes). For this you will need to understand geometry and linear algebra.

Also if you read basic research papers, you will need to know what integrals and derivatives are and what they mean in the context of your problem.

Also be aware that different researchers use different platforms. R is a multi-platform environment (Linux, Windows, Mac), but some source code might be written for Linux, or depend on packages that are Linux-only. So if you have to use Linux, Windows, or a Mac exclusively, make sure you know how to work within that constraint.
 
  • #7
Thank you for the guidance and suggestions.
What about MATLAB / Octave as a tool in this area? I have prior experience with MATLAB.
 
  • #8
There should be libraries out there for MATLAB (and possibly Octave) and they will range from open source libraries to commercial ones depending on what you need them for, how good they are performance wise, and what kind of functionality they have.

Just note, though, that MATLAB's core data structure is the matrix: it's not the only thing it handles, but it is the core structure and the tool is designed with this in mind.

If you are going to use statistical calculations and similar functionality, then R is a good tool for this.

Different tools have different advantages and disadvantages: MATLAB is a computational tool primarily for matrix problems and numerical calculations. R is primarily for statistical type calculations and output (including graphs).

In your work you will have to use multiple tools for the job, and you should get used to this idea, because using the output of one tool as the input of another is common when a project is broad in scale.

Be aware of what a tool's tradeoffs are and what it should be used for (as well as what it should not).
 
  • #9
thanks again sir...
 

1. What is the purpose of representing multidimensional data vectors on a 2D plot?

The purpose of representing multidimensional data vectors on a 2D plot is to visually display the relationships and patterns between multiple variables in the data. This allows for easier interpretation and analysis of the data.

2. How do we choose which dimensions to plot on the x and y axes?

The dimensions chosen for the x and y axes depend on the specific research question or hypothesis being investigated. Generally, the most important or influential variables are chosen to be plotted on the axes.

3. Can we represent more than two dimensions on a 2D plot?

Yes, it is possible to represent more than two dimensions on a 2D plot using techniques such as color coding, size of data points, and adding additional layers or panels to the plot. However, this can make the plot more complex and difficult to interpret.

4. How do we ensure that the 2D plot accurately represents the data?

To ensure that the 2D plot accurately represents the data, it is important to carefully choose the scaling and range of the axes, and to use appropriate visualization techniques such as using a logarithmic scale or normalizing the data. It is also important to properly label the axes and provide a clear and accurate legend for any additional layers or panels.
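
As one illustration of the normalization step mentioned above, here is a minimal sketch in Python with NumPy (the function name `standardize` and the sample values are hypothetical). It rescales each feature to zero mean and unit variance so that a feature measured in large units does not dominate the plot or a subsequent principal component analysis.

```python
import numpy as np

def standardize(X):
    """Rescale each column of X to zero mean and unit sample variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0          # leave constant columns unscaled
    return (X - mean) / std

# Hypothetical data: the second feature is on a much larger scale.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0],
              [4.0, 6000.0]])
Z = standardize(X)
```

After this step both columns of `Z` are on a comparable scale. For strictly positive, heavily skewed features, taking `np.log` of the values before plotting is a common alternative.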

5. What are the limitations of representing multidimensional data on a 2D plot?

One limitation of representing multidimensional data on a 2D plot is that it can be difficult to accurately represent the relationships between more than two variables. Additionally, it may not be possible to capture all of the complexity and nuance of the data in a 2D plot, and certain patterns or trends may not be easily visible. In these cases, other visualization techniques such as 3D plots or interactive graphics may be more useful.
