Studying: How to Learn Topological Data Analysis

The discussion centers on incorporating topological data analysis (TDA) into data analysis practice, particularly its connection to algebraic topology. The original poster asks how much theoretical knowledge is required, specifically whether a deep understanding of algebraic topology and its proofs is necessary for practical application. The consensus suggests that a full proof-based course may not be essential, especially for those who aim to use existing algorithms rather than develop them from scratch. The poster has a background in statistics and applied mathematics, has completed real and complex analysis, and has self-studied the basics of abstract algebra and point-set topology. The poster also asks how useful TDA is in industry. Replies discuss persistent homology and Betti numbers, which help describe the structure of a dataset by identifying topological features (such as connected components and holes) that persist across a range of scales. Resources such as introductory texts on discrete mathematics, abstract algebra, and topology are recommended for foundational concepts.
FallenApple
It's a really interesting idea. I think I want to eventually add this to my toolbox. But how? I've heard it uses ideas from algebraic topology. But how much theoretical topology do I actually need to learn? Are proofs important? I don't care about developing the algorithms from scratch. I mean, I just want to be able to use the algorithms, or know when it would be useful to use them in a real-world data analysis setting. So I'm not sure if a full proof-based course in algebraic topology is needed. Opinions?

I have a background in stats and applied math, so I already understand the ideas of data analysis. But in terms of relevant theory, I've only finished up to real and complex analysis. I have self-studied the bare-bones basics of abstract algebra and point-set topology.

Also, how useful is topological data analysis in industry?
 
I am no expert on the general area, but I do know a bit about persistent homology: you attach homology groups to a dataset (via a simplicial complex built on the data points) so that properties of the data set are mirrored in the complex you attach to it. The homology is a "graded" object: it contains one group for each integer dimension. Features that persist across a wide range of scales are said to be an accurate (albeit noisy) reflection of the original data set, while features that do not persist are considered to be noise and discarded. Sorry, but most of what I know in the area does involve algebraic topology, with minor tinges of point-set topology.
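To make "features that persist across scales" concrete, here is a minimal sketch restricted to dimension 0 (connected components), which fits in plain Python. It grows the scale, adds edges between points in order of length (a Vietoris-Rips filtration, assuming the convention that an edge appears at a scale equal to the distance between its endpoints), and records the scale at which each component merges into another. The function name `h0_persistence` and the sample points are hypothetical; libraries such as GUDHI or Ripser handle higher dimensions in full generality.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """(birth, death) pairs for connected components in a Vietoris-Rips filtration.

    Convention (an assumption): an edge enters the complex at a scale equal to the
    distance between its endpoints, so every point is born at scale 0.
    """
    parent = list(range(len(points)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Edges in the order they appear as the scale grows.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # two components merge: one of them dies here
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))     # the last component never dies
    return pairs

# Hypothetical data: two tight clusters far apart.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
for birth, death in h0_persistence(points):
    print(birth, death)
```

Four components die almost immediately (at about 0.1, the spacing inside each cluster), one survives until about 7.0 (the gap between the clusters), and one never dies. The two long-lived components are the "signal" (two clusters); the short-lived ones are what gets discarded as noise.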
 
WWGD said:
I am no expert on the general area, but I do know a bit about persistent homology: you attach homology groups to a dataset (via a simplicial complex built on the data points) so that properties of the data set are mirrored in the complex you attach to it. The homology is a "graded" object: it contains one group for each integer dimension. Features that persist across a wide range of scales are said to be an accurate (albeit noisy) reflection of the original data set, while features that do not persist are considered to be noise and discarded. Sorry, but most of what I know in the area does involve algebraic topology, with minor tinges of point-set topology.

So is it something like the Betti numbers? For example, a torus-shaped point cloud could be a data set that is "close" to a torus manifold. And that manifold would have holes of different character depending on the dimension we look at? For example, it has two one-dimensional holes (loops that can be drawn on the surface) and a two-dimensional hole corresponding to the enclosed 3D void. And there are different groups associated with these holes. And data that does not fit into any of these schemes is discarded?
 
FallenApple said:
So is it something like the Betti numbers? For example, a torus-shaped point cloud could be a data set that is "close" to a torus manifold. And that manifold would have holes of different character depending on the dimension we look at? For example, it has two one-dimensional holes (loops that can be drawn on the surface) and a two-dimensional hole corresponding to the enclosed 3D void. And there are different groups associated with these holes. And data that does not fit into any of these schemes is discarded?
The case I am most familiar with is that of simplicial homology. You break your simplicial complex down by dimension, and the k-th homology group reflects the k-dimensional features (holes) of the data. Please give me a bit more time to elaborate; Panera is shutting down in 5 minutes.
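Breaking a simplicial complex down by dimension can be sketched in a few lines. The code below groups the simplices by dimension, builds each boundary map with coefficients mod 2 (GF(2), so orientation signs can be ignored; this is a simplifying assumption, not the only choice of coefficients), and reads off the Betti numbers as b_k = dim C_k - rank(boundary_k) - rank(boundary_{k+1}). The test complex is the standard 7-vertex triangulation of the torus, which should give the one component, two independent loops and one enclosed void from the torus example in the thread. The function names are hypothetical.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2) of a 0/1 matrix whose rows are stored as int bitmasks."""
    pivots = {}  # leading bit -> row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # eliminate the leading bit (addition mod 2)
    return len(pivots)

def betti_numbers(maximal_simplices):
    """Betti numbers over GF(2) of the simplicial complex generated by the given simplices."""
    # Close the complex under taking faces.
    simplices = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for size in range(1, len(s) + 1):
            simplices.update(combinations(s, size))
    top = max(len(s) for s in simplices) - 1
    # Break the complex down by dimension: by_dim[k] lists the k-simplices.
    by_dim = [sorted(s for s in simplices if len(s) == k + 1) for k in range(top + 1)]
    index = [{s: i for i, s in enumerate(level)} for level in by_dim]
    # rank[k] = rank of the boundary map C_k -> C_{k-1}; rank[0] and rank[top + 1] stay 0.
    rank = [0] * (top + 2)
    for k in range(1, top + 1):
        rows = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):  # the k+1 faces of dimension k-1
                mask |= 1 << index[k - 1][face]
            rows.append(mask)
        rank[k] = rank_gf2(rows)
    # b_k = dim C_k - rank(boundary_k) - rank(boundary_{k+1})
    return [len(by_dim[k]) - rank[k] - rank[k + 1] for k in range(top + 1)]

# Hollow triangle (a circle) and the 7-vertex triangulation of the torus.
circle = [(0, 1), (1, 2), (0, 2)]
torus = [(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)] + \
        [(i, (i + 2) % 7, (i + 3) % 7) for i in range(7)]
print(betti_numbers(circle))  # [1, 1]
print(betti_numbers(torus))   # [1, 2, 1]
```

Filling in the triangle, i.e. `betti_numbers([(0, 1, 2)])`, kills the loop and gives `[1, 0, 0]`. Persistent homology tracks how these numbers change as the complex grows with the scale parameter.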
 
The article A User’s Guide to Topological Data Analysis by Elizabeth Munch looks promising. It's available as a free PDF download from the publishing journal's website.

It doesn't look like you need a ton of topology for this. If you need any references, you could get by with just a few books. A book on discrete mathematics that covers graph theory, such as an older edition of Rosen, would be a good start if you haven't learned that already. You've probably learned that stuff, but I mentioned it for completeness' sake. After that, introductory texts in abstract algebra and topology should suffice as references for the basic necessary concepts. I recommend Pinter's A Book of Abstract Algebra and Gamelin and Greene's Introduction to Topology (2nd ed.), both Dover books and both excellent in my opinion.
 
As it happens, I just reviewed a book about this; I agree it's a highly interesting topic.

Mario Rasetti and Emanuela Merelli wrote an excellent chapter in the book, maybe look through their publications and see what you can find:

https://www.isi.it/en/publications?year=&domain=&author=Rasetti
http://dblp.org/pers/hd/m/Merelli:Emanuela
 
As I understood it, the general idea is an analogy between topology, seen as studying properties not affected by continuous transformations, and noisy data, seen as the result of "continuously distorting" the original source. Since the dataset is seen as a continuous distortion of the original, the "intrinsic" properties of the data are still recognizable.
 
Andy Resnick said:
As it happens, I just reviewed a book about this; I agree it's a highly interesting topic.

Mario Rasetti and Emanuela Merelli wrote an excellent chapter in the book, maybe look through their publications and see what you can find:

https://www.isi.it/en/publications?year=&domain=&author=Rasetti
http://dblp.org/pers/hd/m/Merelli:Emanuela

You didn't directly mention the title of the book their chapter is in. I assume you mean "Advances in Disordered Systems, Random Processes and Some Applications."
 
The Bill said:
You didn't directly mention the title of the book their chapter is in. I assume you mean "Advances in Disordered Systems, Random Processes and Some Applications."

Yep, that's it.
 
WWGD said:
As I understood it, the general idea is an analogy between topology, seen as studying properties not affected by continuous transformations, and noisy data, seen as the result of "continuously distorting" the original source. Since the dataset is seen as a continuous distortion of the original, the "intrinsic" properties of the data are still recognizable.

Ah, OK. That makes sense. So it's like an imperfect scanner seeing a doughnut-shaped data cloud when the object is actually a coffee mug, and hence the doughnut cloud would be heavily noisy because it is an extreme distortion. It makes sense that the hole would have less noise, because there would simply be completely different light coming from it in a systematic manner.
 
FallenApple said:
Ah, OK. That makes sense. So it's like an imperfect scanner seeing a doughnut-shaped data cloud when the object is actually a coffee mug, and hence the doughnut cloud would be heavily noisy because it is an extreme distortion. It makes sense that the hole would have less noise, because there would simply be completely different light coming from it in a systematic manner.
Yes, that seems right. It would be interesting to see how holes are treated in persistent homology, e.g., by taking data from, say, a doughnut and seeing how the point cloud somehow reflects/describes the existence of a hole. EDIT: Seeing how the associated simplicial complex, say, describes the hole.
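The question of how the simplicial complex "sees" a hole can be tried out on the simplest space with a hole, a circle, rather than a full doughnut. The sketch below builds the Vietoris-Rips complex of a slightly noisy circle at a few scales and reports how many components (b0) and one-dimensional holes (b1) it has. Assumptions: the function names and the 12 sample points are hypothetical, coefficients are mod 2, and the complex is built only up to triangles, which is enough to compute b0 and b1.

```python
import math
from itertools import combinations

def rank_gf2(rows):
    """Rank over GF(2) of a 0/1 matrix whose rows are int bitmasks."""
    pivots = {}
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]
    return len(pivots)

def rips_betti_0_1(points, eps):
    """(b0, b1) over GF(2) of the Vietoris-Rips complex at scale eps, built up to triangles."""
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= eps

    edges = [(i, j) for i, j in combinations(range(n), 2) if close(i, j)]
    edge_index = {e: k for k, e in enumerate(edges)}
    # A triangle enters as soon as all three of its edges are present.
    triangles = [(a, b, c) for a, b, c in combinations(range(n), 3)
                 if close(a, b) and close(a, c) and close(b, c)]
    rank1 = rank_gf2([(1 << i) | (1 << j) for i, j in edges])
    rank2 = rank_gf2([(1 << edge_index[(a, b)]) | (1 << edge_index[(a, c)]) |
                      (1 << edge_index[(b, c)]) for a, b, c in triangles])
    return n - rank1, len(edges) - rank1 - rank2

# Hypothetical "noisy circle": 12 points with radii alternating 1.05 and 0.95.
points = []
for i in range(12):
    r = 1.05 if i % 2 == 0 else 0.95
    angle = 2 * math.pi * i / 12
    points.append((r * math.cos(angle), r * math.sin(angle)))

for eps in (0.3, 0.6, 1.1, 2.5):
    print(eps, rips_betti_0_1(points, eps))
# 0.3 (12, 0)   no edges yet: twelve separate points
# 0.6 (1, 1)    neighbors joined into a loop: one component, one hole
# 1.1 (1, 1)    more edges and triangles, but the hole is still there
# 2.5 (1, 0)    every pair is joined and the hole is filled in
```

The hole shows up as b1 = 1 over a wide range of scales despite the radial noise, which is the kind of persistent feature described earlier in the thread; a full persistence computation would record exactly where it is born and where it dies.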
 