
InDelsTopo

InDelsTopo is a Python package for studying the topological structure of sets of words, especially sets whose variation arises primarily from insertions and deletions.

It implements the Insertion Chain Complex introduced in:

Natasha Jonoska, Francisco Martinez-Figueroa, and Masahico Saito,
The Insertion Chain Complex: A Topological Approach to the Structure of Word Sets, 2025.


Overview

The Insertion Chain Complex framework provides a topological structure on the relationships between words that differ by insertions and deletions.
It was originally developed to model DNA sequence variation during double-strand break repair, but it applies to any setting in which the structure of a set of words needs to be analyzed.
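As a rough illustration of the underlying relation (not InDelsTopo's API), the sketch below connects two words when one is obtained from the other by inserting a single letter. The helper names `is_single_insertion` and `indel_edges` are hypothetical. The resulting graph only captures one-letter adjacencies; it does not build the blocks of the actual Insertion Chain Complex.

```python
from itertools import combinations


def is_single_insertion(short: str, long: str) -> bool:
    """Return True if `long` is obtained from `short` by inserting one letter."""
    if len(long) != len(short) + 1:
        return False
    # Deleting some single position of `long` must recover `short`.
    return any(long[:i] + long[i + 1:] == short for i in range(len(long)))


def indel_edges(words):
    """Hypothetical helper: all pairs of words that differ by one insertion/deletion."""
    edges = []
    for u, v in combinations(words, 2):
        if is_single_insertion(u, v) or is_single_insertion(v, u):
            edges.append((u, v))
    return edges


print(indel_edges(['a', 'b', 'ab', '', 'ba']))
```

With the Quick Start words this yields six edges: the empty word connects to `a` and `b`, and each of `a` and `b` connects to both `ab` and `ba`.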

InDelsTopo provides:

  • Construction of Filtrations and Complexes over word sets
  • Tools to compute homology, Euler characteristic curves, and persistent homology
  • Optional integration with SageMath
  • Methods to analyze and visualize the topological structure of word sets
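The Euler characteristic of a finite complex is the alternating sum of the number of cells in each dimension, and an Euler characteristic curve records that value for each sublevel complex of a filtration. The sketch below computes such a curve from a hypothetical list of `(dimension, entry_height)` pairs; this input format and the function name are assumptions for illustration, not how InDelsTopo stores its complexes.

```python
from collections import Counter


def euler_characteristic_curve(cells, heights):
    """Euler characteristic of each sublevel complex.

    cells: iterable of (dimension, entry_height) pairs (hypothetical input format).
    heights: thresholds t at which to evaluate chi(t).
    """
    cells = list(cells)
    curve = []
    for t in heights:
        # Count the cells of each dimension present at height t.
        counts = Counter(dim for dim, h in cells if h <= t)
        # chi(t) = sum over k of (-1)^k * (number of k-dimensional cells with height <= t)
        curve.append(sum((-1) ** k * n for k, n in counts.items()))
    return curve


# Toy example: two vertices at height 1, an edge joining them at height 2,
# then a third vertex and two edges at height 3 that close a cycle.
toy_cells = [(0, 1), (0, 1), (1, 2), (0, 3), (1, 3), (1, 3)]
print(euler_characteristic_curve(toy_cells, [1, 2, 3]))  # [2, 1, 0]
```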

Installation

Install via pip:

pip install InDelsTopo

For full functionality (e.g., integer homology over \(\mathbb{Z}\)), install SageMath and run your notebooks in a SageMath kernel.
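To check whether the current kernel can see SageMath, a minimal sketch using only the standard library is shown below. The function name `sage_available` is hypothetical; whether InDelsTopo itself detects Sage this way is not stated here.

```python
import importlib.util


def sage_available() -> bool:
    """Return True if the `sage` package can be imported in this kernel."""
    # find_spec returns None (without importing anything) when a top-level module is missing.
    return importlib.util.find_spec("sage") is not None


if sage_available():
    print("SageMath detected: integer homology over Z should be available.")
else:
    print("SageMath not found: Sage-dependent features will be unavailable.")
```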


Quick Start

from InDelsTopo import Filtration, filtration_plot

# Create a filtration from a small set of words
F = Filtration()
F.compute_d_skeleton(['a', 'b', 'ab', '', 'ba'], [1, 2, 3, 4, 5])

# Visualize the sublevel sets at different heights
F.get_graph(1)
F.get_graph(2)
F.get_graph(5)
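If the second argument of `compute_d_skeleton` assigns one height to each word in order (an assumption based on the example above), the words present in the sublevel set at height t are those with height at most t. The pure-Python sketch below only mimics that selection without calling InDelsTopo; `sublevel_words` is a hypothetical helper.

```python
words = ['a', 'b', 'ab', '', 'ba']
heights = [1, 2, 3, 4, 5]

# Assumption: the i-th height belongs to the i-th word.
height_of = dict(zip(words, heights))


def sublevel_words(t):
    """Words whose height is at most t (the words present at height t)."""
    return [w for w in words if height_of[w] <= t]


for t in (1, 2, 5):
    print(t, sublevel_words(t))
```

Because a word present at height t is also present at every larger height, the sets are nested, which is what a filtration requires.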

Main Concepts

| Concept    | Description                                                    |
|------------|----------------------------------------------------------------|
| Block      | A combinatorial element generated by insertions.               |
| Chain      | A linear combination of blocks with integer coefficients.      |
| Complex    | A collection of blocks and faces forming a topological structure. |
| Filtration | A sequence of nested complexes indexed by a height function.   |
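To make the "linear combination of blocks with integer coefficients" idea concrete, the sketch below models a chain as a mapping from blocks to integers. Blocks are treated as opaque labels (`'B1'`, `'B2'`, `'B3'` are placeholders), and `ToyChain` is a hypothetical class, not InDelsTopo's `Chain`; it shows only addition and integer scaling, not the boundary map.

```python
from collections import Counter


class ToyChain:
    """Hypothetical stand-in for a chain: a formal Z-linear combination of blocks.

    Blocks are opaque hashable labels here; this is NOT InDelsTopo's Chain class.
    """

    def __init__(self, terms=None):
        # Map block label -> integer coefficient, dropping zero coefficients.
        self.terms = {b: c for b, c in (terms or {}).items() if c != 0}

    def __add__(self, other):
        total = Counter(self.terms)
        total.update(other.terms)  # adds coefficients, including negative ones
        return ToyChain(dict(total))

    def __rmul__(self, k: int):
        # Integer scalar multiplication, e.g. 3 * chain.
        return ToyChain({b: k * c for b, c in self.terms.items()})

    def __eq__(self, other):
        return isinstance(other, ToyChain) and self.terms == other.terms

    def __repr__(self):
        return " + ".join(f"{c}*{b}" for b, c in self.terms.items()) or "0"


x = ToyChain({'B1': 2, 'B2': -1})
y = ToyChain({'B2': 1, 'B3': 4})
print(x + y)  # 2*B1 + 4*B3
print(3 * x)  # 6*B1 + -3*B2
```

Dropping zero coefficients keeps equal chains equal regardless of how they were built, so `x + (-1) * x` compares equal to the empty chain.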

For detailed examples, see the Jupyter Notebook Tutorial.


Documentation Structure

Reference: API documentation.


Citation

If you use InDelsTopo in academic work, please cite:

Jonoska, N., Martinez-Figueroa, F., & Saito, M.
The Insertion Chain Complex: A Topological Approach to the Structure of Word Sets.
arXiv preprint arXiv:2509.12607, 2025.


Contributing

Contributions, pull requests, and feedback are welcome!
Please open an issue on the GitHub repository.


License

This project is licensed under the MIT License.
See the LICENSE file for details.


Acknowledgements

This project was developed at the University of South Florida in collaboration with Prof. Natasha Jonoska and Prof. Masahico Saito (University of South Florida), with insights from experimental data provided by Prof. Francesca Storici’s lab (Georgia Tech).

This package was built under the auspices of the Southeast Center for Mathematics and Biology, an NSF-Simons Research Center for Mathematics of Complex Biological Systems, under National Science Foundation Grant No. DMS-1764406 and Simons Foundation Grant No. 594594, as well as NSF Grants DMS-2054321, CCF-2107267, and CCF-2505771 and the W.M. Keck Foundation.