| Title: | Persistence Homology Utilities |
|---|---|
| Description: | A low-level package for hosting persistence data. It is part of the 'TDAverse' suite of packages, which is designed to provide a collection of packages for enabling machine learning and data science tasks using persistent homology. Implements a class for hosting persistence data, a number of coercers from and to already existing and used data structures from other packages and functions to compute distances between persistence diagrams. A formal definition and study of bottleneck and Wasserstein distances can be found in Bubenik, Scott and Stanley (2023) <doi:10.1007/s41468-022-00103-8>. Their implementation in 'phutil' relies on the 'C++' Hera library developed by Kerber, Morozov and Nigmetov (2017) <doi:10.1145/3064175>. |
| Authors: | Aymeric Stamm [aut, cre] (ORCID: <https://orcid.org/0000-0002-8725-3654>), Jason Cory Brunson [aut] (ORCID: <https://orcid.org/0000-0003-3126-9494>), Michael Kerber [ctb] (HERA C++ code), Dmitriy Morozov [ctb] (HERA C++ code), Arnur Nigmetov [ctb] (HERA C++ code) |
| Maintainer: | Aymeric Stamm <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.2.9000 |
| Built: | 2026-05-17 09:22:25 UTC |
| Source: | https://github.com/tdaverse/phutil |
A collection of 24 persistence diagrams, each computed using the
TDA::ripsDiag() function with maxdimension = 2 and maxscale = 6 from a sample
of 120 points on the arch spiral. Each sample has been generated using
the tdaunif::sample_arch_spiral() function with the following parameters:
n = 120, arms = 2 and sd = 0.05. The seed was fixed to 28415.
    arch_spirals
A list of length 24, where each element is an object of class 'persistence'.
This collection of functions computes the distance between two persistence diagrams of the same homology dimension. The diagrams must be represented as 2-column matrices. The first column of the matrix contains the birth times and the second column contains the death times of the points.
    bottleneck_distance(x, y, tol = sqrt(.Machine$double.eps), validate = TRUE, dimension = 0L)
    wasserstein_distance(x, y, tol = sqrt(.Machine$double.eps), p = 1, validate = TRUE, dimension = 0L)
    kantorovich_distance(x, y, tol = sqrt(.Machine$double.eps), p = 1, validate = TRUE, dimension = 0L)
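| Argument | Description |
|---|---|
| x | Either a matrix of shape $n \times 2$ or an object of class persistence specifying the first persistence diagram. |
| y | Either a matrix of shape $m \times 2$ or an object of class persistence specifying the second persistence diagram. |
| tol | A numeric value specifying the relative error. Defaults to sqrt(.Machine$double.eps). |
| validate | A boolean value specifying whether to validate the input persistence diagrams. Defaults to TRUE. |
| dimension | An integer value specifying the homology dimension for which to compute the distance. Defaults to 0L. |
| p | A numeric value specifying the power for the Wasserstein distance. Defaults to 1. |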
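As a rough illustration of how the dimension and p arguments interact, the sketch below compares two diagrams from the bundled persistence_sample data set in homology dimension 1. It assumes the 'phutil' package is installed and attached; the variable names are illustrative only.

```r
library(phutil)

# Two diagrams from the bundled sample (objects of class 'persistence').
d1 <- persistence_sample[[1]]
d2 <- persistence_sample[[2]]

# Compare the degree-1 features only, with different powers p.
b  <- bottleneck_distance(d1, d2, dimension = 1L)
w1 <- wasserstein_distance(d1, d2, p = 1, dimension = 1L)
w2 <- wasserstein_distance(d1, d2, p = 2, dimension = 1L)

c(bottleneck = b, wasserstein_1 = w1, wasserstein_2 = w2)
```

Up to the relative error controlled by tol, one would expect the bottleneck value to be no larger than the 2-Wasserstein value, which in turn is no larger than the 1-Wasserstein value.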
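A matching $\varphi : D_1 \to D_2$ between persistence diagrams is a
bijection of multisets, where both diagrams are assumed to have all points on
the diagonal with infinite multiplicity. The $p$-Wasserstein distance
between $D_1$ and $D_2$ is defined as the infimum over all matchings
of the expression

$$W_p(D_1, D_2) = \inf_{\varphi : D_1 \to D_2} \left( \sum_{x \in D_1} \lVert x - \varphi(x) \rVert^p \right)^{1/p}$$

that can be thought of as the Minkowski distance between the diagrams viewed
as vectors on the shared coordinates defined by the matching $\varphi$.
The norm $\lVert \cdot \rVert$ can be arbitrary; as implemented here, it
is the infinity norm $\lVert (x_1, x_2) \rVert_\infty = \max(\lvert x_1 \rvert, \lvert x_2 \rvert)$. In
the limit $p \to \infty$, the Wasserstein distance becomes the
bottleneck distance:

$$B(D_1, D_2) = \inf_{\varphi : D_1 \to D_2} \sup_{x \in D_1} \lVert x - \varphi(x) \rVert.$$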
The Wasserstein metric is also called the Kantorovich metric in recognition of the originator of the metric.
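To make the definition concrete, here is a minimal base R sketch that evaluates it by brute force on two tiny diagrams. It is not how 'phutil' or Hera compute the distance; perms() and toy_wasserstein() are hypothetical helpers written for this illustration, and enumerating all matchings is only feasible for a handful of points. Each diagram is padded with diagonal slots so that every point can be matched either to a point of the other diagram or to the diagonal, at a cost of half its persistence under the infinity norm.

```r
# Hypothetical helper (not part of 'phutil'): all permutations of a vector.
perms <- function(v) {
  if (length(v) <= 1L) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in perms(v[-i])) out[[length(out) + 1L]] <- c(v[i], rest)
  }
  out
}

# Hypothetical helper: brute-force p-Wasserstein distance between two small
# diagrams given as 2-column (birth, death) matrices with at least one row.
# Use p = Inf for the bottleneck distance.
toy_wasserstein <- function(x, y, p = 1) {
  n <- nrow(x)
  m <- nrow(y)
  N <- n + m
  # Rows: points of x, then m diagonal slots.
  # Columns: points of y, then n diagonal slots.
  cost <- matrix(0, N, N)
  # Point-to-point costs under the infinity norm.
  for (i in seq_len(n)) {
    for (j in seq_len(m)) cost[i, j] <- max(abs(x[i, ] - y[j, ]))
  }
  # Matching a point to the diagonal costs half its persistence.
  cost[seq_len(n), m + seq_len(n)] <- matrix((x[, 2] - x[, 1]) / 2, n, n)
  cost[n + seq_len(m), seq_len(m)] <-
    matrix((y[, 2] - y[, 1]) / 2, m, m, byrow = TRUE)
  # Diagonal-to-diagonal matches cost 0, which the matrix already holds.
  totals <- vapply(perms(seq_len(N)), function(s) {
    matched <- cost[cbind(seq_len(N), s)]
    if (is.infinite(p)) max(matched) else sum(matched^p)^(1 / p)
  }, numeric(1))
  min(totals)
}

x <- rbind(c(0, 1), c(0.2, 0.3))
y <- rbind(c(0, 1.1))
toy_wasserstein(x, y, p = 1)
toy_wasserstein(x, y, p = 2)
toy_wasserstein(x, y, p = Inf)
```

In this example the optimal matching pairs (0, 1) with (0, 1.1) at cost 0.1 and sends (0.2, 0.3) to the diagonal at cost 0.05, so the 1-Wasserstein distance is 0.15 and the bottleneck distance is 0.1.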
A numeric value storing either the bottleneck or the Wasserstein distance between the two persistence diagrams.
    bottleneck_distance(
      persistence_sample[[1]]$pairs[[1]],
      persistence_sample[[2]]$pairs[[1]]
    )
    bottleneck_distance(
      persistence_sample[[1]],
      persistence_sample[[2]]
    )
    wasserstein_distance(
      persistence_sample[[1]]$pairs[[1]],
      persistence_sample[[2]]$pairs[[1]]
    )
    wasserstein_distance(
      persistence_sample[[1]],
      persistence_sample[[2]]
    )
A simulated data set consisting of 100 points sampled from a circle with additive Gaussian noise using a standard deviation of 0.05.
    noisy_circle_points
    noisy_circle_ripserr
    noisy_circle_tda_rips
- noisy_circle_points: A matrix with 100 rows and 2 columns listing the coordinates of the points.
- noisy_circle_ripserr: An object of class 'PHom' as returned by the ripserr::vietoris_rips() function, which is a data frame with 3 variables:
  - dimension: the dimension/degree of the feature,
  - birth: the birth value of the feature,
  - death: the death value of the feature.
- noisy_circle_tda_rips: A list of length 1 containing an object of class 'diagram', as stored in the diagram element of the output of the TDA::ripsDiag() function, which is a matrix with 3 columns:
  - dimension: the dimension/degree of the feature,
  - birth: the birth value of the feature,
  - death: the death value of the feature.
An object of class PHom (inherits from data.frame) with 101 rows and 3 columns.
An object of class list of length 1.
The point cloud stored in noisy_circle_points has been generated using the
tdaunif package using the
tdaunif::sample_circle()
function. Specifically, the following parameters were used: n = 100, sd = 0.05 and a seed of 1234.
The persistence diagram stored in noisy_circle_ripserr has been computed
using the ripserr package with the
ripserr::vietoris_rips()
function. Specifically, the following parameters were used: max_dim = 1L.
The persistence diagram stored in noisy_circle_tda_rips has been computed
using the TDA package with the
TDA::ripsDiag()
function. Specifically, the following parameters were used: maxdimension = 1L and maxscale = 1.6322.
https://tdaverse.github.io/tdaunif/reference/circles.html, https://tdaverse.github.io/ripserr/reference/vietoris_rips.html, https://www.rdocumentation.org/packages/TDA/versions/1.9.1/topics/ripsDiag
This collection of functions computes the pairwise distance matrix between all pairs in a set of persistence diagrams of the same homology dimension. The diagrams must be represented as 2-column matrices. The first column of the matrix contains the birth times and the second column contains the death times of the points.
    bottleneck_pairwise_distances(x, tol = sqrt(.Machine$double.eps), validate = TRUE, dimension = 0L, ncores = 1L)
    wasserstein_pairwise_distances(x, tol = sqrt(.Machine$double.eps), p = 1, validate = TRUE, dimension = 0L, ncores = 1L)
    kantorovich_pairwise_distances(x, tol = sqrt(.Machine$double.eps), p = 1, validate = TRUE, dimension = 0L, ncores = 1L)
| Argument | Description |
|---|---|
| x | A list of either 2-column matrices or objects of class persistence specifying the set of persistence diagrams. |
| tol | A numeric value specifying the relative error. Defaults to sqrt(.Machine$double.eps). |
| validate | A boolean value specifying whether to validate the input persistence diagrams. Defaults to TRUE. |
| dimension | An integer value specifying the homology dimension for which to compute the distance. Defaults to 0L. |
| ncores | An integer value specifying the number of cores to use for parallel computation. Defaults to 1L. |
| p | A numeric value specifying the power for the Wasserstein distance. Defaults to 1. |
An object of class 'dist' containing the pairwise distance matrix between the persistence diagrams.
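Because the result is a standard 'dist' object, it can be handed to base R tools that expect one. The following sketch assumes the 'phutil' package is installed and attached; the choice of five diagrams, homology dimension 1 and average linkage is arbitrary.

```r
library(phutil)

# First five diagrams of the bundled sample, compared in homology dimension 1.
D <- bottleneck_pairwise_distances(persistence_sample[1:5], dimension = 1L)

# A 'dist' object stores only the lower triangle; expand it when a full
# square matrix is needed.
M <- as.matrix(D)
dim(M)

# 'dist' objects plug directly into stats::hclust().
hc <- hclust(D, method = "average")
cutree(hc, k = 2)
```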
    spl <- persistence_sample[1:10]
    # Extract the list of 2-column matrices for dimension 0 in the sample
    x <- lapply(spl, function(ph) ph$pairs[[1]])
    # Compute the pairwise bottleneck distances
    Db <- bottleneck_pairwise_distances(spl)
    Db <- bottleneck_pairwise_distances(x)
    # Compute the pairwise Wasserstein distances
    Dw <- wasserstein_pairwise_distances(spl)
    Dw <- wasserstein_pairwise_distances(x)
S3 class for storing persistence data

A collection of functions to coerce persistence data into objects of class
persistence (see the Value section for more details on this class). It is
currently possible to coerce persistence data from the following sources:
- a matrix with at least 3 columns (dimension/degree, start/birth, end/death) as returned by ripserr::vietoris_rips() in the form of the 'PHom' class,
- a list as returned by any *Diag() function in the TDA package.
    as_persistence(x, warn = TRUE, ...)
    ## S3 method for class 'list'
    as_persistence(x, warn = TRUE, ...)
    ## S3 method for class 'persistence'
    as_persistence(x, warn = TRUE, ...)
    ## S3 method for class 'data.frame'
    as_persistence(x, warn = TRUE, ...)
    ## S3 method for class 'matrix'
    as_persistence(x, warn = TRUE, ...)
    ## S3 method for class 'diagram'
    as_persistence(x, warn = TRUE, ...)
    ## S3 method for class 'PHom'
    as_persistence(x, ...)
    ## S3 method for class 'hclust'
    as_persistence(x, warn = TRUE, birth = NULL, ...)
    ## S3 method for class 'persistence'
    print(x, ...)
    ## S3 method for class 'persistence'
    format(x, ...)
    get_pairs(x, dimension, ...)
    ## S3 method for class 'persistence'
    as.matrix(x, ...)
    ## S3 method for class 'persistence'
    as.data.frame(x, row.names = NULL, optional = TRUE, ...)
    as_diagram(x, ...)
    ## S3 method for class 'persistence'
    as_diagram(x, list = TRUE, ...)
    ## S3 method for class 'PHom'
    as_diagram(x, list = TRUE, ...)
| Argument | Description |
|---|---|
| x | An |
| warn | A boolean specifying whether to issue a warning if the input persistence data contains unordered pairs. Defaults to TRUE. |
| ... | Parameters passed to methods. |
| birth | A numeric value specifying the height at which to declare all leaves were born. Defaults to NULL. |
| dimension | A non-negative integer specifying the homology dimension for which to recover a matrix of persistence pairs. |
| row.names | |
| optional | logical. If |
| list | Logical; whether to return the |
Caution. When providing an unnamed input matrix, the matrix coercer assumes that it has at least 3 columns, with the first column being the dimension/degree, the second column being the start/birth and the third column being the end/death.
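The column convention for unnamed matrices can be illustrated with a small hand-made input. This is a sketch assuming the 'phutil' package is installed and attached; the matrix values are made up for the example.

```r
library(phutil)

# Unnamed 3-column matrix: the columns are read positionally.
m <- cbind(
  c(0, 0, 1),    # column 1: dimension/degree
  c(0, 0, 0.5),  # column 2: start/birth
  c(1, 2, 0.9)   # column 3: end/death
)

ph <- as_persistence(m)

# Two features in dimension 0 and one in dimension 1 are expected.
get_pairs(ph, dimension = 0)
get_pairs(ph, dimension = 1)
```

If the columns of an existing matrix are in a different order, reordering them before the call avoids births and deaths being silently swapped.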
An object of class persistence which is a list of 2 elements:
- pairs: A list of 2-column matrices containing birth-death pairs. The $i$-th element of the list corresponds to the $(i-1)$-th homology dimension. If there are no pairs for a given dimension but there are pairs in higher dimensions, the corresponding element(s) is/are filled with a $0 \times 2$ numeric matrix.
- metadata: A list of length 6 containing information about how the data was computed:
  - orderered_pairs: A boolean indicating whether the pairs in the pairs list are ordered (i.e. the first column is strictly less than the second column).
  - data: The name of the object containing the original data on which the persistence data was computed.
  - engine: The name of the package and the function of this package that computed the persistence data in the form "package_name::package_function".
  - filtration: The filtration used in the computation in a human-readable format (i.e. full names, capitals where needed, etc.).
  - parameters: A list of parameters used in the computation.
  - call: The exact call that generated the persistence data.
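The structure described above can be inspected directly on a coerced object. This sketch assumes the 'phutil' package is installed and attached and uses the bundled noisy_circle_ripserr data set; it only reads the object and does not rely on any accessor beyond get_pairs().

```r
library(phutil)

ph <- as_persistence(noisy_circle_ripserr)

names(ph)           # top-level elements of the object
length(ph$pairs)    # one matrix per homology dimension, starting at 0
names(ph$metadata)  # provenance fields listed above

# Element i of `pairs` holds dimension i - 1, so these two should agree.
identical(get_pairs(ph, dimension = 1), ph$pairs[[2]])
```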
    as_persistence(noisy_circle_ripserr)
    x <- as_persistence(noisy_circle_tda_rips)
    x
    as_persistence(x)
    get_pairs(x, dimension = 1)
    as.data.frame(x)
    # back and forth between `diagram` and `persistence`
    x <- tdaunif::sample_projective_plane(n = 12)
    ( d <- TDA::alphaComplexDiag(x, maxdimension = 2) )
    ( p <- as_persistence(d) )
    as_diagram(p)
    as_diagram(p, list = FALSE)
    # distances between cities
    euroclust <- hclust(eurodist, method = "ward.D")
    as_persistence(euroclust)
    # `hclust()` can accommodate negative distances
    d <- as.dist(rbind(c(0, 3, -4), c(3, 0, 5), c(-4, 5, 0)))
    hc <- hclust(d, method = "single")
    ph <- as_persistence(hc, birth = -10)
    get_pairs(ph, 0)
A collection of 100 persistence diagrams, each computed using the
TDA::ripsDiag() function with parameters maxdimension = 1L and
maxscale = 1.6322 from a sample of 100 points on the sphere. Each sample has
been generated using the tdaunif::sample_2sphere() function with the following
parameters: n = 100 and sd = 0.05. The seed was fixed to 1234.
    persistence_sample
A list of length 100, where each element is an object of class 'persistence'.
An 'S3' class object for storing sets of persistence diagrams
    as_persistence_set(x)
    ## S3 method for class 'persistence_set'
    format(x, ...)
    ## S3 method for class 'persistence_set'
    print(x, ...)
| Argument | Description |
|---|---|
| x | A list of objects of class persistence. |
| ... | Additional arguments passed to the function. |
An object of class 'persistence_set' containing the set of persistence diagrams.
    # Create a persistence set from a list of persistence diagrams
    as_persistence_set(persistence_sample[1:10])
A collection of 24 persistence diagrams, each computed using the
TDA::ripsDiag() function with maxdimension = 2 and maxscale = 6 from a sample
of 120 points on the trefoil. Each sample has been generated using the
tdaunif::sample_trefoil() function with the following parameters: n = 120 and sd = 0.05. The seed was fixed to 28415.
    trefoils
A list of length 24, where each element is an object of class 'persistence'.