In this vignette, we will generate a point cloud using a sample of
2-dimensional points on the unit circle’s circumference; this will be
stored in a variable named circle_df
.
# create reproducible dataset
set.seed(42)
angles <- runif(25, 0, 2 * pi)
circle_df <- data.frame(x = cos(angles),
y = sin(angles))
# take a peek at first 6 rows
head(circle_df)
#> x y
#> 1 0.8601210 -0.5100900
#> 2 0.9228553 -0.3851468
#> 3 -0.2251251 0.9743299
#> 4 0.4842164 -0.8749483
#> 5 -0.6289353 -0.7774577
#> 6 -0.9928106 -0.1196957
Above, each of the 100 rows represents a single point, with each of
the 2 columns representing a Cartesian coordinate for a single
dimension. Column x
contains the x-coordinates of the 100
points and column y
contains the respective y-coordinates.
To confirm that the points in circle_df
do lie on the
circumference of a circle, we can quickly create a scatterplot.
Given that the points in circle_df
are uniformly
distributed across the circumference of a circle without any error or
noise, we expect a single prominent 1-cycle to be present in its
persistent homology. The Ripser C++ library is
wrapped by R using Rcpp,
and performs calculations on a Vietoris-Rips complex created with the
input point cloud. These calculations result in a data frame that
contains all the necessary information to characterize the persistence
of homological features within circle_df
, and can be
performed with a single line of R code using ripserr.
# calculate persistent homology
circ_phom <- vietoris_rips(circle_df)
# print first 6 features (ordered by dimension and birth)
head(circ_phom)
#> dimension birth death
#> 1 0 0 0.01509939
#> 2 0 0 0.01846671
#> 3 0 0 0.02540582
#> 4 0 0 0.02859409
#> 5 0 0 0.04180345
#> 6 0 0 0.06699952
# print last 6 features (ordered by dimension and birth)
tail(circ_phom)
#> dimension birth death
#> 20 0 0.000000 0.4582335
#> 21 0 0.000000 0.5059727
#> 22 0 0.000000 0.5793416
#> 23 0 0.000000 0.5812266
#> 24 0 0.000000 0.7170409
#> 25 1 1.026735 1.7859821
Each row in the homology matrix returned by the
vietoris_rips
function (variable named
circ_phom
) represents a single feature (cycle). The
homology matrix has 3 columns in the following order:
Persistence of a feature is generally defined as the length of the
interval of the radius within which the feature exists. This can be
calculated as the numerical difference between the second (birth) and
third (death) columns of the homology matrix. Confirmed in the output of
the head
and tail
functions above, the
homology matrix is ordered by dimension, with the birth column used to
compare features of the same dimension. As expected for
circle_df
, the homology matrix contains a single prominent
1-cycle (last line of tail
’s output). Although we suspect
the feature to be a persistent 1-cycle, comparison with the other
features in the homology matrix is required to confirm that it is
sufficiently persistent. This task is done far more easily with an
appropriate visualization (e.g. topological barcode or persistence
diagram) than by eyeballing the contents of circ_phom
. The
ggtda and TDAstats R
packages could be useful to create such visualizations.
Although we used a data frame to store the example point cloud, a
matrix or dist object would work equally well. Let’s test this out by
creating the matrix and dist equivalents of circle_df
and
checking if the resulting vietoris_rips
outputs are equal
using base::all.equal
to compare data frames.
# matrix format
circ_mat <- as.matrix(circle_df)
head(circ_mat)
#> x y
#> [1,] 0.8601210 -0.5100900
#> [2,] 0.9228553 -0.3851468
#> [3,] -0.2251251 0.9743299
#> [4,] 0.4842164 -0.8749483
#> [5,] -0.6289353 -0.7774577
#> [6,] -0.9928106 -0.1196957
# dist object
circ_dist <- dist(circ_mat)
head(circ_dist)
#> [1] 0.1398085 1.8388207 0.5238567 1.5128695 1.8936112 1.0621818
# calculate persistent homology of each
mat_phom <- vietoris_rips(circ_mat)
dist_phom<- vietoris_rips(circ_dist)
# compare equality
all.equal(circ_phom, mat_phom)
#> [1] TRUE
all.equal(circ_phom, dist_phom)
#> [1] TRUE
Ripser can calculate persistent homology with coefficients in various
prime fields (ℤ/pℤ). ripserr
implements this functionality via the p
parameter in
vietoris_rips
.