How to learn Topological Data Analysis

In summary, it looks like you don't need a full proof-based course in algebraic topology to be able to use the algorithms, but if you want to know when it would be useful to use them, then that is something you might need to look into.
  • #1
FallenApple
It's a really interesting idea. I think I want to eventually add this to my toolbox. But how? I've heard it uses ideas from algebraic topology. But how much of theoretical topology do I actually need to learn? Are proofs important? I don't care about developing the algorithms from scratch. I mean, I just want to be able to use the algorithms or know when it would be useful to use them in a real world data analysis setting. So I'm not sure if a full proof based course in algebraic topology is needed. Opinions?

I have a background in stats and applied math so I already understand the ideas of data analysis. But in terms of relevant theory that is somewhat related, I've only finished up to real and complex analysis. I have self studied the bare bones basics of abstract algebra and point set topology.

Also, how useful is topological data analysis in industry?
 
  • #2
FallenApple said:
It's a really interesting idea. I think I want to eventually add this to my toolbox. But how? I've heard it uses ideas from algebraic topology. But how much of theoretical topology do I actually need to learn? Are proofs important? I don't care about developing the algorithms from scratch. I mean, I just want to be able to use the algorithms or know when it would be useful to use them in a real world data analysis setting. So I'm not sure if a full proof based course in algebraic topology is needed. Opinions?

I have a background in stats and applied math so I already understand the ideas of data analysis. But in terms of relevant theory that is somewhat related, I've only finished up to real and complex analysis. I have self studied the bare bones basics of abstract algebra and point set topology.

Also, how useful is topological data analysis in industry?

I am no expert on the general area, but I do know a bit about persistent homology: you attach homology groups to a data set so that properties of the data set are mirrored in the complex you build from it. Homology is a "graded" object, with one group in each integer dimension. Features that persist across a range of scales are taken to be an accurate (albeit noisy) reflection of the original data set, while features that do not persist are considered noise and discarded. Sorry, but most of what I know in the area involves algebraic topology with minor tinges of point-set topology.
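To make "persistence" concrete in the simplest case (dimension 0, i.e. connected components), here is a minimal sketch. It assumes a filtration in which two points become connected once the scale reaches their distance; the function name `h0_persistence` and the sample points are made up for illustration and are not taken from any TDA library.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Return the scales at which connected components merge (die).

    Every point starts as its own component at scale 0. Hypothetical helper,
    written only to illustrate the idea of persistence in dimension 0.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges sorted by length: the order in which they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)  # one component dies when two merge
    return deaths  # n - 1 values; the last surviving component never dies

# Made-up data: two tight clusters that are far apart.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
deaths = h0_persistence(points)
print([round(d, 2) for d in deaths])
```

Four of the merges happen at a scale of about 0.1 (short-lived components, read as noise), while one component survives until a scale of about 7 (a long-lived feature, read as "there are really two clusters").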
 
  • Like
Likes FallenApple
  • #3
WWGD said:
I am no expert on the general area, but I do know a bit about persistent homology: you attach homology groups to a data set so that properties of the data set are mirrored in the complex you build from it. Homology is a "graded" object, with one group in each integer dimension. Features that persist across a range of scales are taken to be an accurate (albeit noisy) reflection of the original data set, while features that do not persist are considered noise and discarded. Sorry, but most of what I know in the area involves algebraic topology with minor tinges of point-set topology.

So is it something like the Betti numbers? For example, a point cloud sampled from a torus could be a data set and would be "close" to a torus manifold. And that manifold would have holes of different character in each dimension? For example, it has two one-dimensional holes, bounded by loops that can be drawn on the surface, and one two-dimensional hole corresponding to the enclosed 3D void. And there are different groups associated with these holes. And data that does not fit into any of these schemes is discarded?
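The torus counts mentioned here (one connected component, two one-dimensional holes, one enclosed void) can be checked on a small triangulation. The following is a sketch under stated assumptions: it uses a 9-vertex triangulation of the torus built from a 3 x 3 grid with wrap-around, works with coefficients in GF(2), and assumes the complex consists only of triangles and their faces. The helper names `rank_gf2` and `betti_numbers_gf2` are hypothetical.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # leading-bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)

def betti_numbers_gf2(triangles):
    """(b0, b1, b2) of a complex made of triangles and their faces, over GF(2)."""
    triangles = {tuple(sorted(t)) for t in triangles}
    edges = {e for t in triangles for e in combinations(t, 2)}
    vertices = {v for e in edges for v in e}
    v_index = {v: k for k, v in enumerate(sorted(vertices))}
    e_index = {e: k for k, e in enumerate(sorted(edges))}
    # Boundary of an edge: its two endpoints. Boundary of a triangle: its three edges.
    d1 = [(1 << v_index[a]) | (1 << v_index[b]) for a, b in edges]
    d2 = [sum(1 << e_index[e] for e in combinations(t, 2)) for t in triangles]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    return (len(vertices) - r1, len(edges) - r1 - r2, len(triangles) - r2)

# 3 x 3 grid with opposite sides glued: each square is split into two triangles.
n = 3
torus = []
for i in range(n):
    for j in range(n):
        a, b = (i, j), ((i + 1) % n, j)
        c, d = (i, (j + 1) % n), ((i + 1) % n, (j + 1) % n)
        torus += [(a, b, d), (a, c, d)]

print(betti_numbers_gf2(torus))  # (1, 2, 1)
```

The same function applied to the four faces of a tetrahedron (a hollow sphere) gives (1, 0, 1): no loops, one void.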
 
  • #4
FallenApple said:
So is it something like the Betti numbers? For example, a point cloud sampled from a torus could be a data set and would be "close" to a torus manifold. And that manifold would have holes of different character in each dimension? For example, it has two one-dimensional holes, bounded by loops that can be drawn on the surface, and one two-dimensional hole corresponding to the enclosed 3D void. And there are different groups associated with these holes. And data that does not fit into any of these schemes is discarded?

The case I am most familiar with is that of simplicial homology. You break your simplicial complex down into its simplices of each dimension. Each k-dimensional simplex reflects k-dimensional features of the data. Please give me a bit more time to elaborate; Panera is shutting down in 5 minutes.
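As a rough illustration of "breaking a complex down by dimension", the sketch below builds a Vietoris-Rips complex from a point cloud and groups its simplices by dimension. The choice of the Vietoris-Rips construction is an assumption on my part (the post does not name one), and `rips_complex` and the sample points are hypothetical.

```python
import math
from itertools import combinations

def rips_complex(points, scale, max_dim=2):
    """Simplices of the Vietoris-Rips complex at `scale`, grouped by dimension.

    A set of k + 1 points spans a k-simplex when all of its pairwise
    distances are at most `scale`.
    """
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= scale

    by_dim = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        by_dim[k] = [
            s for s in combinations(range(n), k + 1)
            if all(close(i, j) for i, j in combinations(s, 2))
        ]
    return by_dim

# Made-up data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for scale in (0.5, 1.0, 1.5):
    counts = {k: len(s) for k, s in rips_complex(square, scale).items()}
    print(scale, counts)
```

At scale 0.5 there are only the four vertices; at 1.0 the four sides appear as edges and enclose a loop; at 1.5 the diagonals and all four triangles appear and the loop is filled in.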
 
  • #5
The article A User’s Guide to Topological Data Analysis by Elizabeth Munch looks promising. It's available as a free PDF download from the publishing journal's website.

It doesn't look like you need a ton of topology for this. If you need any references, you could get by with just a few books. A book on discrete mathematics that covers graph theory, like an older edition of Rosen, would be a good start if you haven't learned that already. You've probably learned that stuff, but I mention it for completeness' sake. After that, introductory texts in abstract algebra and topology should suffice as references for the basic necessary concepts. I recommend Pinter's A Book of Abstract Algebra and Introduction to Topology, 2e by Gamelin and Greene, both Dover books and both excellent in my opinion.
 
  • Like
Likes FallenApple
  • #6
FallenApple said:
It's a really interesting idea. I think I want to eventually add this to my toolbox. But how? I've heard it uses ideas from algebraic topology. But how much of theoretical topology do I actually need to learn? Are proofs important? I don't care about developing the algorithms from scratch. I mean, I just want to be able to use the algorithms or know when it would be useful to use them in a real world data analysis setting. So I'm not sure if a full proof based course in algebraic topology is needed. Opinions?

I have a background in stats and applied math so I already understand the ideas of data analysis. But in terms of relevant theory that is somewhat related, I've only finished up to real and complex analysis. I have self studied the bare bones basics of abstract algebra and point set topology.

Also, how useful is topological data analysis in industry?

As it happens, I just reviewed a book about this. I agree it's a highly interesting topic.

Mario Rasetti and Emanuela Merelli wrote an excellent chapter in the book; maybe look through their publications and see what you can find:

https://www.isi.it/en/publications?year=&domain=&author=Rasetti
http://dblp.org/pers/hd/m/Merelli:Emanuela
 
  • Like
Likes WWGD and FallenApple
  • #7
As I understand it, the general idea is an analogy between topology, seen as the study of properties not affected by continuous transformations, and noisy data, seen as the result of "continuously distorting" the original source. Since the data set is seen as a continuous distortion of the original, the "intrinsic" properties of the data are still recognizable.
 
  • Like
Likes FallenApple
  • #8
Andy Resnick said:
As it happens, I just reviewed a book about this. I agree it's a highly interesting topic.

Mario Rasetti and Emanuela Merelli wrote an excellent chapter in the book; maybe look through their publications and see what you can find:

https://www.isi.it/en/publications?year=&domain=&author=Rasetti
http://dblp.org/pers/hd/m/Merelli:Emanuela

You didn't directly mention the title of the book their chapter is in. I assume you mean "Advances in Disordered Systems, Random Processes and Some Applications."
 
  • #9
The Bill said:
You didn't directly mention the title of the book their chapter is in. I assume you mean "Advances in Disordered Systems, Random Processes and Some Applications."

Yep- that's it.
 
  • Like
Likes WWGD and The Bill
  • #10
WWGD said:
As I understand it, the general idea is an analogy between topology, seen as the study of properties not affected by continuous transformations, and noisy data, seen as the result of "continuously distorting" the original source. Since the data set is seen as a continuous distortion of the original, the "intrinsic" properties of the data are still recognizable.

Ah, OK. That makes sense. So it's like an imperfect scanner seeing a doughnut as the data cloud when in actuality it's a coffee mug, and hence the doughnut cloud would be heavily noisy because it is an extreme distortion. It makes sense that the hole would have less noise, because there would simply be completely different light coming from it in a systematic manner.
 
  • Like
Likes WWGD
  • #11
FallenApple said:
Ah, OK. That makes sense. So it's like an imperfect scanner seeing a doughnut as the data cloud when in actuality it's a coffee mug, and hence the doughnut cloud would be heavily noisy because it is an extreme distortion. It makes sense that the hole would have less noise, because there would simply be completely different light coming from it in a systematic manner.

Yes, that seems right. It would be interesting to see how holes are treated in persistent homology, e.g., by taking data from, say, a doughnut and seeing how the point cloud somehow reflects/describes the existence of a hole. EDIT: Seeing how the associated, say, simplicial complex describes the hole.
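A small experiment along these lines, as a sketch rather than a real TDA pipeline: instead of a full doughnut, it uses 12 evenly spaced, noise-free points on a unit circle as a stand-in for the loop around the hole, builds a Vietoris-Rips complex at a few scales, and counts components (b0) and loops (b1) over GF(2). The helper names `rank_gf2` and `betti_0_1`, the point count, and the scales are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # leading-bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]
    return len(pivots)

def betti_0_1(points, scale):
    """(b0, b1) over GF(2) of the Vietoris-Rips complex at the given scale."""
    n = len(points)
    edges = [e for e in combinations(range(n), 2)
             if math.dist(points[e[0]], points[e[1]]) <= scale]
    edge_set = set(edges)
    e_index = {e: k for k, e in enumerate(edges)}
    # A triangle is present when all three of its edges are present.
    triangles = [t for t in combinations(range(n), 3)
                 if all(e in edge_set for e in combinations(t, 2))]
    d1 = [(1 << a) | (1 << b) for a, b in edges]
    d2 = [sum(1 << e_index[e] for e in combinations(t, 2)) for t in triangles]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    return n - r1, len(edges) - r1 - r2

# 12 evenly spaced points on a unit circle (no noise, for reproducibility).
circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]
for scale in (0.3, 0.6, 1.1, 2.1):
    print(scale, betti_0_1(circle, scale))
```

At scale 0.3 the points are still 12 separate components with no loop; at 0.6 and 1.1 there is one component and one loop; at 2.1 every pair of points is connected and the loop has been filled in. The loop surviving over a wide range of scales is how the point cloud "describes" the hole.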
 

1. What is Topological Data Analysis (TDA)?

Topological Data Analysis (TDA) is a mathematical framework that combines concepts from topology and data analysis to help analyze and understand complex data sets. It uses topological methods to identify patterns and structures in the data, which can then be used for further analysis.

2. What are the benefits of learning TDA?

Learning TDA can help scientists and researchers better understand and analyze complex data sets that may be difficult to interpret using traditional methods. It can also provide new insights and perspectives on the data, leading to new discoveries and advancements in various fields.

3. What are the prerequisites for learning TDA?

A strong foundation in mathematics, specifically in linear algebra and topology, is necessary to understand and apply TDA. A basic knowledge of programming and data analysis is also helpful, as TDA often involves working with large data sets and using computational tools.

4. What are some common applications of TDA?

TDA has a wide range of applications in various fields, including biology, neuroscience, computer science, and social sciences. It has been used to analyze biological networks, brain imaging data, and social networks, among other things. It can also be applied to image and signal processing, and to analyze complex systems in physics and engineering.

5. How can I start learning TDA?

There are various online courses, tutorials, and books available for learning TDA. It is recommended to start with a basic understanding of topology and then explore TDA-specific resources. It is also helpful to practice with real-world data sets and collaborate with other researchers in the field to gain hands-on experience.
