How to learn Topological Data Analysis

FallenApple · Oct 5, 2017

It's a really interesting idea. I think I want to eventually add this to my toolbox. But how? I've heard it uses ideas from algebraic topology. But how much theoretical topology do I actually need to learn? Are proofs important? I don't care about developing the algorithms from scratch. I just want to be able to use the algorithms, or know when it would be useful to use them in a real-world data analysis setting. So I'm not sure a full proof-based course in algebraic topology is needed. Opinions?

I have a background in stats and applied math, so I already understand the ideas of data analysis. But in terms of related theory, I've only finished up to real and complex analysis. I have self-studied the bare-bones basics of abstract algebra and point-set topology.

Also, how useful is topological data analysis in industry?

WWGD · Oct 5, 2017

I am no expert on the general area, but I do know a bit about persistent homology: you attach a family of simplicial complexes to the dataset, one for each scale, and compute their homology, so that the properties of the data set are mirrored in the homology. Homology is a "graded" object containing one group in each integer dimension. Features that persist across a wide range of scales are taken to be an accurate (albeit noisy) reflection of the structure behind the original data set, while features that do not persist are considered to be noise and discarded. Sorry, but most of what I know in the area does involve Algebraic Topology, with minor tinges of point set to it.
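
To make the idea of persistence concrete, here is a minimal sketch for the simplest case, connected components (dimension 0). It is not taken from any TDA library: the function name `zero_dim_persistence` and the toy point cloud are assumptions made for illustration. Every point starts as its own component at scale 0, and a component "dies" at the scale where it merges into another one.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return the scales at which connected components merge (H0 death times).

    Hypothetical helper for illustration. Every point is born as its own
    component at scale 0. An edge joins two points once their distance is
    <= scale; a union-find structure records when components merge.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge: one of them dies
            deaths.append(length)
    return deaths  # n - 1 finite values; one component never dies


# Assumed toy data: two tight clusters that are far apart.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
deaths = zero_dim_persistence(cloud)
print([round(d, 3) for d in deaths])
```

Four of the five merges happen at a scale of about 0.1, and one happens at a scale of about 7. The short-lived components would be read as noise, while the long-lived one suggests that the data really has two clusters.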

FallenApple · Oct 5, 2017

So is it something like the Betti numbers? For example, a point cloud sampled from a torus could be a data set and would be "close" to a torus manifold. And that manifold would have holes of different character depending on the dimension we look at? For example, it has two one-dimensional holes, corresponding to loops that can be drawn on the surface, and a two-dimensional hole corresponding to the 3D void. And there are different groups associated with these holes. And data that does not fit into any of these schemes is discarded?
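
For reference, the standard homology of the torus $T^2$ (with integer coefficients) matches this count of holes. The Betti number $\beta_k$ is the rank of the $k$-th homology group, and the alternating sum of the Betti numbers gives the Euler characteristic:

```latex
\[
H_0(T^2) \cong \mathbb{Z}, \qquad
H_1(T^2) \cong \mathbb{Z}^2, \qquad
H_2(T^2) \cong \mathbb{Z}
\]
\[
\beta_0 = 1, \quad \beta_1 = 2, \quad \beta_2 = 1, \qquad
\chi(T^2) = \beta_0 - \beta_1 + \beta_2 = 1 - 2 + 1 = 0
\]
```

Here $\beta_0 = 1$ says the torus is one connected piece, $\beta_1 = 2$ counts the two independent loops, and $\beta_2 = 1$ counts the enclosed void.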

WWGD · Oct 5, 2017

The case I am most familiar with is that of Simplicial Homology. You break down your simplicial complex into its simplices of each dimension. The k-dimensional simplices reflect k-dimensional features of the data. Please give me a bit more time to elaborate; Panera is shutting down in 5 minutes.

The Bill · Oct 6, 2017

The article A User’s Guide to Topological Data Analysis by Elizabeth Munch looks promising. It's available as a free PDF download from the publishing journal's website.

It doesn't look like you need a ton of topology for this. If you need any references, you could get by with just a few books. A book on discrete mathematics that covers graph theory, like an older edition of Rosen, would be a good start if you haven't learned that already. You've probably learned that stuff, but I mention it for completeness' sake. After that, introductory texts in abstract algebra and topology should suffice as references for the basic necessary concepts. I recommend Pinter's A Book of Abstract Algebra and Introduction to Topology, 2e by Gamelin and Greene, both Dover books and both excellent in my opinion.

Andy Resnick · Oct 6, 2017

As it happens, I just reviewed a book about this. I agree it's a highly interesting topic.

Mario Rasetti and Emanuela Merelli wrote an excellent chapter in the book; maybe look through their publications and see what you can find:

https://www.isi.it/en/publications?year=&domain=&author=Rasetti
http://dblp.org/pers/hd/m/Merelli:Emanuela

WWGD · Oct 6, 2017

As I understood it, the general idea is an analogy: topology studies properties not affected by continuous transformations, and noisy data is seen as the result of "continuously distorting" the original source. Since the dataset is seen as a continuous distortion of the original, the "intrinsic" properties of the data are still recognizable.

The Bill · Oct 6, 2017

Andy Resnick said:

As it happens, I just reviewed a book about this. I agree it's a highly interesting topic.

Mario Rasetti and Emanuela Merelli wrote an excellent chapter in the book; maybe look through their publications and see what you can find:

https://www.isi.it/en/publications?year=&domain=&author=Rasetti
http://dblp.org/pers/hd/m/Merelli:Emanuela

You didn't directly mention the title of the book their chapter is in. I assume you mean "Advances in Disordered Systems, Random Processes and Some Applications."

Andy Resnick · Oct 6, 2017

Yep, that's it.

FallenApple · Oct 8, 2017

WWGD said:

As I understood it, the general idea is an analogy: topology studies properties not affected by continuous transformations, and noisy data is seen as the result of "continuously distorting" the original source. Since the dataset is seen as a continuous distortion of the original, the "intrinsic" properties of the data are still recognizable.

Ah ok. That makes sense. So it's like an imperfect scanner that sees a doughnut as the data cloud when in actuality it's a coffee mug, and hence the doughnut cloud would be heavily noisy because it is an extreme distortion. It makes sense because the hole would have less noise, since there would be completely different light coming from it in a systematic manner.

WWGD · Oct 8, 2017

Yes, that seems right. It would be interesting to see how holes are treated in persistent homology, e.g., by taking data from, say, a doughnut and seeing how the point cloud reflects the existence of a hole. EDIT: Seeing how the associated simplicial complex describes the hole.
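
One way to try this out, as a rough sketch rather than a reference implementation: sample points from a circle (a 2D stand-in for the doughnut's hole), build a Vietoris-Rips complex at a chosen scale by joining points closer than that scale and filling in triangles whose three edges are all present, and count holes from the ranks of the boundary matrices. The function names, the 12-point sample, and the use of coefficients mod 2 are all assumptions made for this illustration.

```python
import math
from itertools import combinations


def rank_gf2(columns):
    """Rank over GF(2) of a matrix whose columns are given as int bitmasks."""
    pivots = {}  # leading bit position -> reduced column with that leading bit
    for col in columns:
        while col:
            high = col.bit_length() - 1
            if high not in pivots:
                pivots[high] = col
                break
            col ^= pivots[high]  # cancel the leading bit and keep reducing
    return len(pivots)


def betti_numbers(points, scale):
    """Return (b0, b1) of the Vietoris-Rips complex of `points` at `scale`."""
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= scale]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(pair in edge_index for pair in combinations(t, 2))]
    # Boundary of an edge: its two endpoints. Boundary of a triangle: its edges.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [sum(1 << edge_index[pair] for pair in combinations(t, 2))
          for t in triangles]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    b0 = n - r1                    # connected components
    b1 = (len(edges) - r1) - r2    # independent loops not filled by triangles
    return b0, b1


# Assumed toy data: 12 evenly spaced points on the unit circle.
circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]

for scale in (0.3, 0.6, 1.1, 2.1):
    print(scale, betti_numbers(circle, scale))
```

For this sample, the scale 0.3 is too small to connect anything (12 components, no loop), the scales 0.6 and 1.1 both give one component with one loop, and at 2.1 every triangle is filled in and the loop is gone. The loop showing up at more than one intermediate scale is the kind of "persistent" feature that would be read as a real hole in the data.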
