Research
Research topics
Bioinformatics
Bioinformatics applies computational techniques to the analysis of biological sequence data, with DNA as a primary focus. DNA is composed of the four nucleotides adenine (A), cytosine (C), guanine (G) and thymine (T), arranged in linear sequences that store genetic information. Typical bioinformatics tasks include the identification of protein-coding genes, transcription factor binding sites, regulatory regions and the intron–exon structure of transcripts. A central objective is to detect and systematically annotate these genomic features to support molecular and genetic analyses.
Key tasks are:
- Assembly: Reconstruction of longer genomic sequences from individual sequencing reads, often complicated by repetitive regions and uneven coverage.
- Alignment: Mapping sequencing reads to a reference genome to determine their genomic origin and enable downstream analyses.
- Variant Calling: Identification of genetic differences such as single-nucleotide variants, insertions, deletions or structural variants by comparing aligned reads to the reference.
- Annotation / Gene Prediction: Detection and characterization of genomic elements—including genes, introns, exons, and regulatory regions—using computational models such as Hidden Markov Models.
- Variant Effect Prediction: Evaluation of the potential functional consequences of genetic variants. Recent approaches increasingly make use of DNA language models, which learn statistical patterns in genomic sequences to improve predictions of regulatory, structural, or splicing-related impacts.
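As a rough illustration of how alignment and variant calling fit together, the sketch below piles up reads that are already aligned and reports positions where the reads consistently disagree with the reference. It is a toy example only: the function name `call_snvs`, the thresholds `min_depth` and `min_fraction`, and the ungapped `(start, sequence)` read format are assumptions made for illustration. Real variant callers also model base qualities, indels and diploid genotypes.

```python
from collections import Counter


def call_snvs(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Report single-nucleotide differences between aligned reads and a reference.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first read base (ungapped alignment;
    an illustrative simplification).
    """
    # One base counter per reference position (the "pileup").
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call at this position
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth))
    return variants


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (1, "CGTTCGTA"), (3, "TTCGTAC")]
variants = call_snvs(reference, reads)
print(variants)  # one (position, reference base, alternative base, depth) tuple per call
```

Each reported tuple gives the position, the reference base, the base seen in the reads, and the read depth that supports the call.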
DNA Language Models
DNA Language Models apply concepts from natural language processing to genomic sequences. During training, the model is presented with DNA sequences in which individual positions are masked. The model then attempts to predict the correct base using the surrounding context. This procedure enables the model to learn statistical patterns and structural features of DNA.
By capturing these context-dependent relationships, DNA Language Models can identify informative regions within the genome. Positions where the observed base deviates from the model's prediction may indicate deleterious mutations.
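The masking idea and the use of prediction deviations can be sketched with a much simpler stand-in for a neural language model: a table of how often each base occurs between a given pair of flanking contexts. This is not how an actual DNA language model is built. The names `train_context_model`, `predict_masked` and `surprisal`, the flank width and the pseudocount are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

BASES = "ACGT"


def train_context_model(sequences, flank=2):
    """Count which base occurs between each (left, right) flanking context."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(flank, len(seq) - flank):
            context = (seq[i - flank:i], seq[i + 1:i + 1 + flank])
            counts[context][seq[i]] += 1
    return counts


def predict_masked(counts, seq, i, flank=2, pseudocount=1.0):
    """Return a probability for each base at masked position i, given its flanks."""
    context = (seq[i - flank:i], seq[i + 1:i + 1 + flank])
    seen = counts.get(context, Counter())
    total = sum(seen.values()) + pseudocount * len(BASES)
    return {b: (seen[b] + pseudocount) / total for b in BASES}


def surprisal(counts, seq, i, flank=2):
    """Negative log2 probability of the observed base; high values flag deviations."""
    return -math.log2(predict_masked(counts, seq, i, flank)[seq[i]])


# Toy training data: a strictly periodic sequence (illustrative only).
model = train_context_model(["ACGT" * 10])
original = "ACGTACGTACGT"
mutated = "ACGTACCTACGT"  # G -> C at position 6

probs = predict_masked(model, original, 6)
print(max(probs, key=probs.get))
print(surprisal(model, original, 6), surprisal(model, mutated, 6))
```

In this toy setting the substituted base receives a higher surprisal than the original base, which is the kind of signal that can be used to flag a candidate mutation.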
Our research examines the applicability of such models to previously uncharacterized species, particularly when training is performed on closely related sister species. We also analyze differences in the resulting predictive motifs to identify both species-specific and conserved patterns in genomic sequences.
Population Genetics
Population genetics studies the genetic variability within populations. It uses mathematical models, such as the Wright-Fisher process or the Kingman coalescent, to describe evolutionary dynamics and to compare observed genetic variation with expectations under stochastic models. Deviations between expected and observed patterns can provide evidence for adaptive processes.
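A minimal sketch of the neutral Wright-Fisher process may help make the stochastic model concrete. In each generation, every gene copy draws its parent at random from the previous generation, so the allele count follows a binomial distribution. The function name `wright_fisher`, the diploid population size and the parameter values are illustrative assumptions. Selection, mutation and migration are deliberately left out.

```python
import random


def wright_fisher(pop_size, init_freq, generations, seed=None):
    """Simulate the frequency of one allele under pure genetic drift.

    pop_size is the number of diploid individuals, so each generation
    consists of 2 * pop_size gene copies.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    count = round(init_freq * n_copies)
    trajectory = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Binomial sampling: each gene copy inherits the allele with probability p.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        trajectory.append(count / n_copies)
    return trajectory


trajectory = wright_fisher(pop_size=50, init_freq=0.5, generations=200, seed=1)
print(trajectory[0], trajectory[-1])
```

Running many replicates with different seeds gives the distribution of outcomes expected under drift alone, against which observed allele frequencies can be compared.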
Our research focuses on how different evolutionary forces—such as genetic drift, recombination, natural selection, migration, changes in population size, and gene conversion—shape genetic variation within populations. We also develop and apply statistical methods to detect and quantify the impact of these forces in genomic data.
Gene Duplications
Gene duplication is an important mechanism in genome evolution, generating additional copies of genes that can contribute to adaptation and functional diversification. Duplicated genes may become fixed in populations or exist as copy number variants, providing raw material for evolutionary innovation.
Duplications can have immediate effects on gene dosage, potentially altering the expression levels of the duplicated genes and affecting cellular processes. Over time, duplicated genes may undergo neofunctionalization, acquiring new functions that were not present in the original gene. In other cases, diversifying selection can act on gene copies, promoting the retention of variants that confer distinct or complementary functions within the genome.
Gene copies are often subject to gene conversion, a process that can homogenize sequences between copies. The number of gene copies in a genome can change over time through mechanisms such as duplication, gene loss, and unequal recombination, and is further shaped by selective pressures.
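The homogenizing effect of gene conversion can be illustrated with a toy example in which a tract of one gene copy is overwritten by the corresponding tract of another. The helper names `hamming` and `gene_conversion`, the equal-length copies and the tract coordinates are assumptions made for illustration, not a model of the underlying molecular mechanism.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gene_conversion(donor, acceptor, start, length):
    """Copy the tract donor[start:start + length] into the acceptor copy."""
    end = min(start + length, len(acceptor))
    return acceptor[:start] + donor[start:end] + acceptor[end:]


copy_a = "ACGTACGTACGT"
copy_b = "ACGAACTTACCT"  # differs from copy_a at three positions

converted_b = gene_conversion(copy_a, copy_b, start=2, length=6)
print(hamming(copy_a, copy_b), hamming(copy_a, converted_b))
```

Differences inside the converted tract are erased, while differences outside it remain, so the two copies become more similar.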