Leveraging Markov Model Lumpability and Ontologies to Enhance Phylogenetics and Biosystematics

At the bi-weekly CCEM meeting on 11 September 2024, Sergei Tarasov, curator and Associate Professor at the Finnish Museum of Natural History, gave a presentation.

Abstract

Analyzing phenotypes—traits of organisms—is a cornerstone of evolutionary biology. Traits are crucial for reconstructing phylogenetic trees, inferring ancestral character states, and understanding the mechanisms driving species diversification across the Tree of Life.
In this talk, I will primarily focus on discrete traits, which are modeled in a phylogenetic context using continuous-time Markov chains (CTMCs). I will explore how an intriguing mathematical concept known as Markov chain lumpability can enhance existing models and support the development of new models to address emerging biological questions. Lumpability also offers a solution to the issue of model congruence, a problem that arises when multiple models representing vastly different evolutionary scenarios fit the data equally well, challenging standard model selection practices.
Furthermore, I will discuss how to make species traits interpretable by computers using ontologies—computational frameworks that represent domain knowledge (e.g., anatomy) through well-defined terms and relationships. Specifically, I will highlight Phenoscript, a language that leverages ontologies to generate machine-readable descriptions of species and their traits. This approach has the potential to facilitate large-scale trait analysis for phylogenetics and biosystematics.

Bio

Sergei Tarasov earned his PhD in Entomology from the University of Oslo and completed two postdoctoral fellowships on modeling trait evolution, at the National Institute for Mathematical and Biological Synthesis (Knoxville, USA) and at Virginia Tech (Blacksburg, USA). Currently, he serves as a curator and Associate Professor at the Finnish Museum of Natural History. His research centers on the systematics of beetles, with empirical insights fueling his theoretical work. This includes developing computational tools and statistical models for trait analysis on phylogenetic trees and taxonomy, aimed at addressing a wide range of evolutionary questions.