Topological data analysis - where to begin

Ben · Aug 6, 2014

I've recently come across 'topological data analysis' (TDA) as a unique way of visualizing large datasets. Here is a Stanford paper with example output towards the end: https://research.math.osu.edu/tgda/mapperPBG.pdf.

I'd like to produce similar results but am having difficulty finding runnable code online: something where you install a package, load sample data, and then execute a few lines (like the http://scikit-learn.org/ examples). My preferred language is Python, but I could use R as well.

Has anybody been able to get traction with TDA? If so, do you have any advice on how to get code up and running?
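For orientation, the basic Mapper construction behind the linked paper's figures can be sketched in plain Python with no dependencies. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses a 1-D filter function, covers the filter's range with equal-width overlapping intervals, and clusters each interval's preimage by single linkage at a fixed distance threshold. All names (`mapper_graph`, `n_intervals`, `overlap`, `eps`) are hypothetical.

```python
import math
from itertools import combinations


def single_linkage_clusters(points, idx, eps):
    """Split the points with indices `idx` into connected components,
    linking any two points closer than `eps` (single linkage cut at eps)."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) < eps:
            parent[find(i)] = find(j)

    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges): each node is a set of point indices (one cluster
    inside one cover interval); an edge joins two nodes sharing a point."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap  # widen each interval so neighbouring ones overlap
    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - pad
        b = lo + (k + 1) * width + pad
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage_clusters(points, idx, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


if __name__ == "__main__":
    # 20 evenly spaced points on the unit circle; filter = x-coordinate
    pts = [(math.cos(2 * math.pi * k / 20), math.sin(2 * math.pi * k / 20))
           for k in range(20)]
    nodes, edges = mapper_graph(pts, lambda p: p[0], n_intervals=4,
                                overlap=0.25, eps=0.5)
    print(len(nodes), len(edges))  # prints "6 6": a single loop
```

On the sampled circle the output graph is a 6-node cycle, so Mapper recovers the circle's one hole. Practical uses swap in other filters (the paper discusses, for example, density and eccentricity) and a more robust clustering step.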

Answer

vonjd · Apr 15, 2015

There is a new R package out:

TDA: Statistical Tools for Topological Data Analysis
This package provides tools for the statistical analysis of persistent homology and for density clustering.
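To make "persistent homology" concrete: in degree 0 it simply tracks connected components of the Rips filtration as the distance scale grows, and each component's lifetime is one bar of the persistence diagram. The sketch below illustrates that idea in plain Python (union-find over edges sorted by length, i.e. a Kruskal-style sweep); it is not the package's API. The name `rips_h0_persistence` is hypothetical, edges are assumed to enter at their full Euclidean length (some libraries use half the length), and higher-degree features such as loops are not computed.

```python
import math
from itertools import combinations


def rips_h0_persistence(points):
    """Degree-0 persistence pairs (birth, death) of the Vietoris-Rips filtration.

    Assumes edges enter the filtration at their Euclidean length. Every point is
    born at 0; a component dies when an edge merges it into another one, and the
    last surviving component is reported with death = inf.
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Kruskal-style sweep: process edges from shortest to longest
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge, so one bar ends here
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))
    return pairs


if __name__ == "__main__":
    # two well-separated clusters on a line
    pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
    for birth, death in rips_h0_persistence(pts):
        print(birth, death)  # three bars die at 1.0, one at 8.0, one never
```

The lone long finite bar (dying at 8.0) reflects the gap between the two clusters; the short bars are within-cluster merges. Deciding which bars are "long enough" to count is exactly the statistical question the package's confidence-set methods address.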

The very well-written vignette can be found here: Introduction to the R package TDA

Abstract

We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement the methods discussed in Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh (2014), Chazal, Fasy, Lecci, Rinaldo, and Wasserman (2014c) and Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014a). The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree.
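The pipeline the abstract describes (evaluate a function such as a kernel density estimator over a grid, then take persistent homology of its superlevel sets) can be illustrated in one dimension with plain Python. This is a toy sketch under stated assumptions, not the package's implementation: it uses a Gaussian KDE with a hand-picked bandwidth `h`, a 1-D grid, and only degree-0 features, so each bar corresponds to a mode of the estimated density. The function names `gaussian_kde_1d` and `superlevel_h0` are hypothetical; for real data, the R package's own functions and its significance analysis are the place to go.

```python
import math


def gaussian_kde_1d(data, grid, h):
    """Gaussian kernel density estimate with bandwidth h at each grid point."""
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
            for x in grid]


def superlevel_h0(values):
    """Degree-0 persistence of the superlevel sets {f >= t} of a function sampled
    on a 1-D grid (neighbours are i - 1 and i + 1).

    Cells are added from the highest value down. When two components meet, the
    one with the lower peak dies at the current level (the "elder rule").
    Returns (birth, death) pairs with birth > death; the global maximum never
    dies and is reported with death = -inf.
    """
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    parent = {}
    peak = {}  # root index -> index of the highest cell in its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:  # neighbour not yet in the superlevel set
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            if values[peak[ri]] < values[peak[rj]]:
                young, old = ri, rj
            else:
                young, old = rj, ri
            if values[peak[young]] > values[i]:  # skip zero-length bars
                pairs.append((values[peak[young]], values[i]))
            parent[young] = old
    pairs.append((values[order[0]], -math.inf))
    return pairs


if __name__ == "__main__":
    data = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2]  # two clumps of samples
    grid = [-4 + 0.1 * k for k in range(81)]  # grid on [-4, 4]
    dens = gaussian_kde_1d(data, grid, h=0.5)
    for birth, death in superlevel_h0(dens):
        print(f"mode born at {birth:.3f}, dies at {death:.3g}")
```

With these inputs there is one finite bar plus the infinite bar of the global maximum, matching the two clumps. The bandwidth matters: a small enough `h` tends to add short bars for spurious bumps, which is why the abstract stresses assessing the significance of features.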