Topological Data Analysis - Persistent Homology

In summary, the thread suggests that persistent homology could be useful for understanding the shape of high-dimensional data and for separating persistent structure from noise.
  • #1
phys_student1
Hi,

I am not a mathematician, but I have noticed some recent papers on this seemingly new field, called Topological Data Analysis (see this relevant paper).

I have skimmed the applications, and it seems that when you have data points that were sampled from some source (e.g. an image), you can use Persistent Homology to visualize what these data look like in higher dimensions (this is my understanding).

I am still unsure what this really means. Does every data set have a higher-dimensional shape or geometry?
 
  • #2
This is built on thin ice. Topological here means the lack of scales, metrics and coordinates. But data are measured somehow, which gives natural coordinates, even though the author says differently. With topology you also lose all analytical tools; moreover, a finite set of data points only allows trivial topologies unless some hypotheses are added.

I wouldn't take the paper very seriously, i.e. a closer examination of these additional conditions is due. Topology is a rather new field of mathematics - only 100 years old - so people are still looking for applications outside mathematics. Of course this is a personal opinion, so let's wait and see.
 
  • #3
I’m by no means an expert, but I have noticed a significant increase in interest around the field of topological data analysis by a number of US funding agencies.

As I understand it, the basic technique is to take high-dimensional data and find lower-dimensional features that are persistent over several different length scales (according to some relevant metric). Features that are persistent are presumed to be related functionally in some way, whereas features that aren’t are generally disregarded as noise. I have no idea how useful the technique is, but I wanted to chime in to point out that it seems to have caught funders’ attention here in the US.
 
  • #4
Don't be put off by people who dismiss topology because it is 'new' -- judge topological data analysis on its own merits.

A basic discussion is here.

https://towardsdatascience.com/from-tda-to-dl-d06f234f51d
A little more discussion is next (I tend to agree with the commentary here: It's an interesting idea, and could bring us some powerful mathematical ideas for sorting structure out from noise in high-dimensional data, but as of now results are mixed.)

https://rviews.rstudio.com/2018/11/...rspective-on-topological-data-analysis-and-r/
 
  • #6
This may be close to an appeal to authority, but I have heard people who seem knowledgeable and otherwise smart enough endorse it. You create an associated complex to the data you are given. Features that "persist" across scales are thought to be "structural"; the rest are considered noise.
 
  • #7
This is what I have understood: We assign, "functorially", a simplicial complex K to a data set S, together with a filtration, meaning a nested sequence of subcomplexes ## K_1 \subseteq K_2 \subseteq \dots \subseteq K ##: if i<j, the i-th complex is a subcomplex of the j-th. The filtration usually arises from a real-valued function ## f: K \rightarrow \mathbb R ## chosen to model the problem of interest, which gives a subcomplex for every real number a through the sublevel set ## f^{-1}((-\infty, a]) ##.

Then the k-th persistent homology group is the image of the map on k-th homology induced by the inclusion of one complex of the filtration into a later one. We ultimately use the structure theorem for finitely generated modules over a PID: taking coefficients in a field F, the homology of the whole filtration is a module over F[t] ("F adjoin t" *), whose free part (copies of ## F[t] ##) holds the features that persist forever and whose torsion part holds the features that eventually die.

We exploit the fact that there is a correspondence between such F[t]-modules and "bar codes". Bar codes are collections of intervals, each describing how long an element of homology persists. Persistence means the homomorphism given by inclusion has a non-zero image on that element.

Hope I did not make it more confusing. Will try to rewrite this more clearly when I can.
* I am not sure about these, but I believe an F[t]-module is a vector space over F where "multiplication" by t is given by some fixed linear transformation.
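
For readers who want the algebra spelled out, here is a sketch of the usual statement of the structure theorem for persistence modules. It assumes a finite filtration indexed by integers and homology with coefficients in a field F; the indices a and c, the birth and death indices b and d, and the grading-shift symbol Σ are notation introduced here, not taken from the post.

```latex
% Persistent homology group: classes of K_i that are still alive in K_j (i <= j)
H_k^{i,j} = \operatorname{im}\bigl( H_k(K_i; F) \to H_k(K_j; F) \bigr)

% Structure theorem over the PID F[t]; t acts by the inclusion-induced maps,
% and \Sigma^{b} shifts the grading up by b
\bigoplus_{i} H_k(K_i; F) \;\cong\;
  \bigoplus_{a} \Sigma^{b_a} F[t]
  \;\oplus\;
  \bigoplus_{c} \Sigma^{b_c} F[t] / (t^{\,d_c - b_c})

% Bar code: each free summand gives the interval [b_a, \infty),
% each torsion summand gives the interval [b_c, d_c)
```

Read this way, a bar is just a record of the index at which a generator appears and, if it is torsion, the index at which a power of t kills it.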
 
  • #8
I feel like people are making it out to be more complicated than it is: the whole idea behind persistent homology is that
1) high-dimensional structured data (e.g. images) often live on some submanifold of the total space, and these manifolds often have nontrivial topological data associated, e.g. nontrivial homology groups
2) we can compute these groups effectively via persistent homology, which measures the homology that persists as you take a sample of points in your high-dimensional space and grow balls around them (e.g. if you have a 1-hole represented in H_1 in your data, then as you grow balls about the points, the 1-hole will persist for a while, then die off). The persistent features encode the homology of the underlying manifold.
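
The "1-hole persists for a while, then dies off" picture can be checked on a toy example. The following is a minimal sketch, not a production implementation: it assumes a Vietoris-Rips filtration (a simplex enters at the length of its longest edge, which stands in for growing balls), coefficients in Z/2, and the textbook column reduction of the boundary matrix. The function names `rips_filtration`, `persistence_pairs` and `barcode`, and the 12-point circle sample, are made up for illustration.

```python
import itertools
import math


def rips_filtration(points, max_dim=2):
    """List (value, dim, vertices) for every simplex up to max_dim.

    The value of a simplex is the length of its longest edge
    (0.0 for a single vertex), i.e. the Vietoris-Rips scale.
    """
    simplices = []
    for k in range(1, max_dim + 2):  # k vertices -> dimension k - 1
        for verts in itertools.combinations(range(len(points)), k):
            value = max(
                (math.dist(points[a], points[b])
                 for a, b in itertools.combinations(verts, 2)),
                default=0.0,
            )
            simplices.append((value, k - 1, verts))
    # Sort by value, then dimension, so every face precedes its cofaces.
    simplices.sort()
    return simplices


def persistence_pairs(simplices):
    """Reduce the Z/2 boundary matrix; return (birth_index, death_index) pairs."""
    index = {verts: i for i, (_, _, verts) in enumerate(simplices)}
    columns = []       # reduced columns, each a set of row indices
    pivot_owner = {}   # largest row index of a column -> that column's index
    pairs = []
    for j, (_, dim, verts) in enumerate(simplices):
        col = set()
        if dim > 0:
            col = {index[face] for face in itertools.combinations(verts, dim)}
        # Add earlier columns (mod 2) until the pivot is new or the column is empty.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        columns.append(col)
        if col:
            pivot_owner[max(col)] = j
            pairs.append((max(col), j))
    return pairs


def barcode(points, dim):
    """Finite bars (birth, death) of H_dim for the Rips filtration of points."""
    simplices = rips_filtration(points, max_dim=dim + 1)
    return [
        (simplices[i][0], simplices[j][0])
        for i, j in persistence_pairs(simplices)
        if simplices[i][1] == dim
    ]


# Twelve evenly spaced points on the unit circle: one obvious 1-hole.
circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]
h1_bars = barcode(circle, dim=1)
longest = max(h1_bars, key=lambda bar: bar[1] - bar[0])
print(longest)
```

On this sample the longest H_1 bar should run from roughly 0.52 (the spacing between neighbouring points, where the loop closes) to roughly 1.73 (where triangles spanning the centre fill the hole in); every other H_1 bar has essentially zero length and would be read as noise.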
 
  • #9
springbottom said:
I feel like people are making it out to be more complicated than it is: the whole idea behind persistent homology is that
1) high-dimensional structured data (e.g. images) often live on some submanifold of the total space, and these manifolds often have nontrivial topological data associated, e.g. nontrivial homology groups
2) we can compute these groups effectively via persistent homology, which measures the homology that persists as you take a sample of points in your high-dimensional space and grow balls around them (e.g. if you have a 1-hole represented in H_1 in your data, then as you grow balls about the points, the 1-hole will persist for a while, then die off). The persistent features encode the homology of the underlying manifold.
But how do you explain the ideal match between simplicial homology and bar codes? Why simplicial and not other types?
 

1. What is Topological Data Analysis (TDA)?

Topological Data Analysis (TDA) is a mathematical framework that uses topological concepts and techniques to analyze and extract meaningful information from complex datasets. It is a powerful tool for understanding the underlying structure and patterns in high-dimensional data.

2. What is Persistent Homology?

Persistent Homology is a key technique in TDA that aims to identify and measure topological features that persist across different scales in a dataset. It provides a way to capture the shape and connectivity of data points and visualize them in a way that is both robust and interpretable.

3. How does Persistent Homology work?

Persistent Homology works by building a nested family of topological spaces (typically simplicial complexes) on top of a dataset: the data points are the vertices, and points are joined once the distance between them, measured by a chosen metric, falls below a growing scale parameter. Algebraic topology techniques are then applied across the whole family to identify topological features and record the range of scales over which each one persists.
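
As a small illustration of that recipe in the simplest case, the sketch below tracks only connected components (H_0). It assumes the Euclidean metric and a Vietoris-Rips style rule in which two points are joined once the scale reaches their distance; components are merged with a union-find structure, which is a shortcut that works for H_0 only. The function name `h0_barcode` and the six sample points are hypothetical.

```python
import itertools
import math


def h0_barcode(points):
    """Bars (birth, death) for connected components as the scale grows."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair of points becomes an edge at scale = their distance.
    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in itertools.combinations(range(len(points)), 2)
    )
    bars = []
    for length, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Two components merge here, so one of them dies at this scale.
            parent[root_a] = root_b
            bars.append((0.0, length))
    # The last surviving component never dies.
    bars.append((0.0, math.inf))
    return bars


# Two well-separated clusters of three points each.
sample = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
bars = h0_barcode(sample)
long_bars = [bar for bar in bars if bar[1] > 2]
print(len(bars), len(long_bars))
```

The four short bars end at scale 1, where the points inside each cluster join up; the two long bars (one finite, one infinite) are the persistent features and correspond to the two clusters.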

4. What are the applications of Topological Data Analysis?

Topological Data Analysis has a wide range of applications in various fields, including biology, neuroscience, computer vision, and materials science. It has been used to study patterns in biological networks, identify anomalies in brain imaging data, and analyze complex materials and their properties.

5. What are the benefits of using Topological Data Analysis?

Topological Data Analysis offers several benefits, such as being able to handle noisy and incomplete datasets, providing a global perspective of the data, and being robust to changes in scale and resolution. It also allows for the extraction of meaningful features and patterns that may not be apparent through traditional data analysis methods.
