Topological Data Analysis - Persistent Homology

Discussion Overview

The discussion revolves around Topological Data Analysis (TDA), specifically focusing on Persistent Homology and its applications in analyzing high-dimensional data. Participants explore the theoretical underpinnings, practical implications, and the current state of research in this emerging field.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants express uncertainty about the implications of Persistent Homology and whether all data sets exhibit higher-dimensional shapes or geometries.
  • Concerns are raised regarding the foundational aspects of topology in relation to data analysis, suggesting that finite data sets may only allow trivial topologies without additional hypotheses.
  • Others note a growing interest in TDA from funding agencies, indicating its potential relevance and application in various fields.
  • Some participants argue that TDA should be evaluated on its own merits, despite being a relatively new field in mathematics.
  • Applications in neuroscience are mentioned as examples of TDA's utility.
  • A participant describes the process of assigning a Simplicial Complex to a data set and discusses the role of filtration and persistent homology groups in this context.
  • There is a discussion about the relationship between high-dimensional structured data and the nontrivial topological features that can be extracted through persistent homology.
  • Questions are raised about the connection between simplicial homology and bar codes, specifically why simplicial methods are preferred over others.

Areas of Agreement / Disagreement

Participants exhibit a mix of agreement and disagreement. While some acknowledge the potential of TDA and its applications, others express skepticism about its foundational aspects and the validity of current research. The discussion remains unresolved regarding the overall effectiveness and theoretical grounding of TDA.

Contextual Notes

Participants highlight limitations related to the assumptions underlying TDA, the dependence on definitions of topology, and the unresolved nature of certain mathematical steps involved in the analysis.

phys_student1
Hi,

I am not a mathematician, but I have noticed some recent papers on this seemingly new field, called Topological Data Analysis (see this relevant paper).

I have had an overview of the applications, and it seems that when you have data points sampled from some source (e.g. an image), you can use Persistent Homology to visualize what these data look like in higher dimensions (this is my understanding).

I am still unsure what this really means. Will any data set have a higher-dimensional shape or geometry?
 
This is built on thin ice. Topological here means the absence of scales, metrics and coordinates. But data are measured somehow, which gives natural coordinates, even though the author says otherwise. With topology you also lose all analytical tools, and a finite set of data points only allows trivial topologies unless some hypotheses are added.

I wouldn't take the paper very seriously, i.e. a closer examination of these additional conditions is due. Topology is a rather new field of mathematics - only 100 years old - so people are still looking for applications outside mathematics. Of course this is a personal opinion, so let's wait and see.
 
I’m by no means an expert, but I have noticed a significant increase in interest around the field of topological data analysis by a number of US funding agencies.

As I understand it, the basic technique is to take high-dimensional data and find lower-dimensional features that are persistent over several different length scales (according to some relevant metric). Features that are persistent are presumed to be related functionally in some way, whereas features that aren't are generally disregarded as noise. I have no idea how useful the technique is, but I wanted to chime in to point out that it seems to have caught funders' attention here in the US.
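
To make "persistent over several length scales" concrete, here is a minimal sketch of the simplest case: tracking connected components (0-dimensional homology) as the length scale grows. The function name `h0_barcode`, the toy point set, and the threshold of 5.0 are illustrative assumptions, not anything taken from the papers mentioned in this thread. Every point starts as its own component at scale 0, and a component "dies" at the scale where it merges into another one.

```python
import itertools
import math

def h0_barcode(points):
    """Return (birth, death) intervals for connected components (H_0).

    Two points are joined once the length scale reaches their distance.
    Hypothetical helper written for illustration only.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from the shortest to the longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # two components merge here
            bars.append((0.0, length))   # so one of them dies at this scale
    bars.append((0.0, math.inf))         # the last component never dies
    return bars

# Two well-separated clusters: three points near the origin, two near x = 10.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
bars = h0_barcode(points)
long_bars = [bar for bar in bars if bar[1] - bar[0] > 5.0]
print(long_bars)  # [(0.0, 9.0), (0.0, inf)]
```

The three short bars (length 1) come from points inside a cluster merging with their neighbours and would be read as noise; the two long bars correspond to the two clusters, which is the "persistent" structure in this toy example.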
 
Don't be put off by people who dismiss topology because it is 'new' -- judge topological data analysis on its own merits.

A basic discussion is here.

https://towardsdatascience.com/from-tda-to-dl-d06f234f51d
A little more discussion is next (I tend to agree with the commentary here: It's an interesting idea, and could bring us some powerful mathematical ideas for sorting structure out from noise in high-dimensional data, but as of now results are mixed.)

https://rviews.rstudio.com/2018/11/...rspective-on-topological-data-analysis-and-r/
 
This may be close to an appeal to authority, but I have heard people who seem knowledgeable and otherwise smart enough endorse it. You build a complex associated with the data you are given. Features that "persist" across scales are thought to be "structural"; the rest are considered noise.
 
This is what I have understood: We somehow assign, "functorially", a simplicial complex K to a data set S, together with a filtration, meaning a nested family of subcomplexes ##K_1 \subseteq K_2 \subseteq \dots \subseteq K##: in general, if i<j, the i-th complex is a subcomplex of the j-th. The filtration usually arises from a real-valued function ##f: K \rightarrow \mathbb R## chosen to model the problem of interest, which gives a subcomplex for every real number a through the sublevel set ##f^{-1}((-\infty, a])##. The k-th persistent homology group is then the image of the map on k-th homology induced by the inclusion of the i-th complex into the j-th.

We ultimately use the structure theorem for finitely generated modules over a PID. As far as I can tell, one takes coefficients in a field F, so that the homology groups of the whole filtration assemble into a module over the polynomial ring F[t], which is a PID. The free part (copies of F[t]) then describes the features that persist forever, and the torsion part describes the features that die at some finite stage. We exploit the fact that there is a correspondence between such F[t]-modules ("F adjoin t" modules *) and "bar codes". Bar codes are collections of intervals, each describing the lifetime of an element of homology. An element persists from stage i to stage j if its image under the homomorphism induced by inclusion is non-zero.

Hope I did not make it more confusing. Will try to rewrite into more clarity when I can.
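* I am not entirely sure about these, but I believe F[t] is the ring of polynomials in t with coefficients in a field F, and an F[t]-module is an F-vector space in which "multiplication by t" is given by some fixed linear transformation.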
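
For reference, the decomposition alluded to above is usually written as follows. This is a sketch under stated assumptions: coefficients in a field F, a filtration ##K_0 \subseteq K_1 \subseteq \dots## with finitely many simplices, and t acting as the map induced by the inclusion of each stage into the next. The symbols ##b_a, b_c, d_c## and the shift notation ##\Sigma^{b}## (move the grading up by b) are notation introduced here, not taken from the posts.

```latex
\bigoplus_{i \ge 0} H_k(K_i; F)
  \;\cong\;
  \Bigl( \bigoplus_{a} \Sigma^{b_a} F[t] \Bigr)
  \;\oplus\;
  \Bigl( \bigoplus_{c} \Sigma^{b_c} F[t] / \bigl( t^{\,d_c - b_c} \bigr) \Bigr)
```

Each free summand corresponds to a bar ##[b_a, \infty)##, a class born at stage ##b_a## that never dies. Each torsion summand corresponds to a bar ##[b_c, d_c)##, a class born at stage ##b_c## that is killed at stage ##d_c##.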
 
I feel like people are making it out to be more complicated than it is: the whole idea behind persistent homology is that
1) high-dimensional structured data (e.g. images) often lives on some submanifold of the total space, and these manifolds often have nontrivial topological invariants, e.g. nontrivial homology groups;
2) we can compute these groups effectively via persistent homology, which measures the homology that persists as you take a sample of points in your high-dimensional space and grow balls around them (e.g. if your data has a 1-dimensional hole represented in H_1, then as you grow balls about the points, the hole will persist for a while, then die off. The persistent features encode the homology of the underlying manifold).
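
The "grow balls and watch the 1-hole persist, then die" picture can be sketched on a toy example: eight points sampled from a circle. The code below is a minimal illustration, not a production implementation. It swaps the union of balls for the Vietoris-Rips complex (a simplex enters once all its vertices are pairwise within the current scale), uses Z/2 coefficients, and only builds simplices up to triangles, which is enough for H_1. The function name `rips_h1_bars` and the tolerance `1e-9` are assumptions made for this sketch.

```python
import itertools
import math

def rips_h1_bars(points):
    """Return (birth, death) pairs of H_1 classes of the Rips filtration.

    Hypothetical helper: standard column reduction of the boundary matrix
    over Z/2, restricted to vertices, edges and triangles. Classes that
    never die are not reported.
    """
    n = len(points)
    # Each simplex is (scale at which it appears, sorted vertex tuple).
    simplices = [(0.0, (i,)) for i in range(n)]
    for size in (2, 3):  # 2 vertices = edge, 3 vertices = triangle
        for verts in itertools.combinations(range(n), size):
            scale = max(
                math.dist(points[a], points[b])
                for a, b in itertools.combinations(verts, 2)
            )
            simplices.append((scale, verts))
    # Order by scale; at equal scale, faces come before higher simplices.
    simplices.sort(key=lambda s: (s[0], len(s[1]), s[1]))
    index = {verts: k for k, (_, verts) in enumerate(simplices)}

    reduced = {}  # lowest row index -> reduced column having that lowest row
    bars = []
    for scale, verts in simplices:
        if len(verts) == 1:
            continue  # vertices have empty boundary
        # Boundary of a simplex: its faces with one vertex removed.
        column = {index[f] for f in itertools.combinations(verts, len(verts) - 1)}
        while column and max(column) in reduced:
            column ^= reduced[max(column)]  # add an earlier column mod 2
        if column:
            low = max(column)
            reduced[low] = column
            if len(verts) == 3:
                # A triangle kills the cycle created by the edge `low`.
                bars.append((simplices[low][0], scale))
    return bars

n = 8
circle = [
    (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
    for k in range(n)
]
bars = rips_h1_bars(circle)
# Drop bars of (numerically) zero length, which are artefacts of ties.
long_bars = [(b, d) for b, d in bars if d - b > 1e-9]
print(long_bars)
```

For this sample, exactly one bar of positive length survives. It is born when neighbouring points connect into a closed loop, at scale 2 sin(pi/8), roughly 0.77, and dies when triangles spanning the circle fill the loop in, at scale 2 sin(3 pi/8), roughly 1.85. That single long bar is the H_1 class of the underlying circle.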
 
springbottom said:
I feel like people are making it out to be more complicated than it is: the whole idea behind persistent homology is that
1) high-dimensional structured data (e.g. images) often lives on some submanifold of the total space, and these manifolds often have nontrivial topological invariants, e.g. nontrivial homology groups;
2) we can compute these groups effectively via persistent homology, which measures the homology that persists as you take a sample of points in your high-dimensional space and grow balls around them (e.g. if your data has a 1-dimensional hole represented in H_1, then as you grow balls about the points, the hole will persist for a while, then die off. The persistent features encode the homology of the underlying manifold).
But how do you explain the ideal match between simplicial homology and bar codes? Why simplicial homology and not other types?
 
