# Topological Data Analysis - Persistent Homology

#### phys_student1

Hi,

I am not a mathematician, but I have noticed some recent papers on a seemingly new field called Topological Data Analysis (see this relevant paper).

I have skimmed the applications, and it seems that when you have data points sampled from some source (e.g. an image), you can use persistent homology to visualize what this data looks like in higher dimensions (this is my understanding).

I am still unsure what this really means. Will any data set have higher-dimensional shape or geometry?

This is built on thin ice. "Topological" here means the absence of scales, metrics, and coordinates. But data are measured somehow, which gives natural coordinates, even though the author says otherwise. With topology you also lose all analytical tools, and a finite set of data points only allows trivial topologies unless additional hypotheses are added.

I wouldn't take the paper very seriously; i.e., a closer examination of these additional conditions is due. Topology is a rather new field of mathematics - only about 100 years old - so people are still looking for applications outside mathematics. Of course this is a personal opinion, so let's wait and see.

I’m by no means an expert, but I have noticed a significant increase in interest around the field of topological data analysis by a number of US funding agencies.

As I understand it, the basic technique is to take high-dimensional data and find lower-dimensional features that are persistent over several different length scales (according to some relevant metric). Features that persist are presumed to be functionally related in some way, whereas features that aren't are generally disregarded as noise. I have no idea how useful the technique is, but I wanted to chime in to point out that it seems to have caught funders' attention here in the US.

Don't be put off by people who dismiss topology because it is 'new' -- judge topological data analysis on its own merits.

A basic discussion is here.

https://towardsdatascience.com/from-tda-to-dl-d06f234f51d
A little more discussion is here (I tend to agree with the commentary: it's an interesting idea, and it could bring us some powerful mathematical tools for sorting structure out from noise in high-dimensional data, but as of now results are mixed).

https://rviews.rstudio.com/2018/11/...rspective-on-topological-data-analysis-and-r/

This may be close to an appeal to authority, but I have heard people who seem knowledgeable and otherwise smart endorse it. You build a simplicial complex associated to the data you are given. Features that "persist" across scales are thought to be structural; the rest are considered noise.
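To make "build a complex associated to the data" concrete, here is a minimal sketch of the standard Vietoris-Rips construction (my own illustrative code, not from any particular library): a simplex is included whenever all of its vertices are pairwise within a distance threshold, and varying the threshold is what produces the filtration people then compute persistence over.

```python
import itertools
import math

def rips_complex(points, threshold):
    """Vietoris-Rips complex (up to dimension 2) on a point cloud.

    A simplex is included when all pairwise distances among its
    vertices are at most `threshold`.  Simplices are vertex tuples.
    """
    n = len(points)
    vertices = [(i,) for i in range(n)]
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= threshold]
    edge_set = set(edges)
    # A triangle enters exactly when all three of its edges are present.
    triangles = [(i, j, k) for i, j, k in itertools.combinations(range(n), 3)
                 if (i, j) in edge_set and (i, k) in edge_set
                 and (j, k) in edge_set]
    return vertices + edges + triangles

# Four corners of a unit square: at threshold 1.1 the four sides appear
# but not the diagonals, so the complex is a hollow cycle -- a 1-hole.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
complex_ = rips_complex(square, 1.1)
```

At a larger threshold the diagonals and then the triangles appear and the hole fills in; whether a feature survives across many thresholds is exactly the "persistence" being discussed.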

This is what I have understood: we somehow assign, "functorially", a simplicial complex K to a data set S, together with a filtration F, meaning the 1st complex is a subcomplex of the 2nd, and in general, if i < j, the i-th complex is a subcomplex of the j-th. The filtration in question usually arises from a real-valued function ##f: K \rightarrow \mathbb R## defined to mimic or model the problem of interest: the sublevel sets ##f^{-1}((-\infty, a])##, one for each real a, give the filtration. The k-th persistent homology group is then the image of the map on homology induced by inclusion. Ultimately we use the fundamental theorem on decomposition of finitely generated modules over a PID: the free part corresponds to the features that persist indefinitely, and the torsion part denotes the non-persistent features. We exploit the fact that there is a correspondence between F[t]-modules (graded modules over the polynomial ring F[t]*) and "barcodes". Barcodes are collections of intervals describing the persistence of a homology class. Persistence means the homomorphism induced by inclusion has a non-zero image.
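In symbols, the picture above is (a sketch, using what I believe is the standard notation):

```latex
% Filtration of simplicial complexes from the sublevel sets of f
K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n = K,
\qquad K_i = f^{-1}\big((-\infty, a_i]\big)

% Inclusions induce maps on homology
\iota_k^{i,j} : H_k(K_i) \longrightarrow H_k(K_j), \qquad i \le j

% The (i,j)-persistent k-th homology group is the image of that map
H_k^{i,j} = \operatorname{im}\, \iota_k^{i,j}
```

A class born in ##H_k(K_i)## "persists" to stage j precisely when it has non-zero image under ##\iota_k^{i,j}##; the interval from its birth to its death is one bar of the barcode.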

Hope I did not make it more confusing. I will try to rewrite this more clearly when I can.

* I am not sure what these are exactly, but I believe they are standard modules where "multiplication" is given by some fixed transformation.

I feel like people are making it out to be more complicated than it is: the whole idea behind persistent homology is that
1) high-dimensional structured data (e.g. images) often lives on some submanifold of the total space, and these manifolds often carry nontrivial topological data, e.g. nontrivial homology groups;
2) we can compute these groups effectively via persistent homology, which measures the homology that persists as you take a sample of points in your high-dimensional space and grow balls around them. (E.g. if your data contains a 1-dimensional hole, represented in H_1, then as you grow balls about the points the hole will persist for a while and then die off. The persistent features encode the homology of the underlying manifold.)
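For dimension 0 (connected components) the growing-balls picture can be computed exactly with nothing but a union-find, which may make the idea less abstract. The sketch below is my own illustration, not a library: every point is born as its own component at scale 0, and a component dies at the length of the edge that merges it into another, i.e. at the edges of a minimum spanning tree.

```python
import itertools
import math

def h0_barcode(points):
    """Finite death times of the 0-dimensional persistence barcode
    of a point cloud under the Rips filtration (edge appears at its
    length).  One bar, the last surviving component, lives forever
    and is not returned.
    """
    parent = list(range(len(points)))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Process edges in order of increasing length, as the balls grow.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components: a bar dies
            parent[ri] = rj
            deaths.append(d)
    return deaths

# Two tight pairs far apart: two bars die quickly (within each pair),
# one longer bar survives until the pairs connect at distance 9.
pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(h0_barcode(pts))   # deaths at 1.0, 1.0, and 9.0
```

The long bar (9.0) against the short ones (1.0) is the "persistent" feature: it records that the data has two well-separated clusters. Higher-dimensional barcodes need actual boundary-matrix reduction, which libraries like GUDHI or Ripser implement.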

TeethWhitener
But how do you explain the exact match between simplicial homology and barcodes? Why simplicial homology and not other types?