SELECTED PUBLICATIONS

2026
Single-Cell Profiling Reveals RAB13+ Endothelial Cells and Profibrotic Mesenchymal Cells in Aged Human Bone Marrow
Aging Cell, 04/2026. Aging Cell

Aging alters the bone marrow microenvironment, affecting key cell populations involved in hematopoiesis. Single-cell RNA sequencing reveals transcriptional changes in endothelial and mesenchymal stromal cells. Aged endothelial cells show prothrombotic features and reduced mitochondrial and vascular function, along with a new arterial subset. Stromal cells exhibit impaired matrix remodeling, partly driven by a profibrotic cell population absent in younger individuals. Spatial analyses confirm these aging-associated changes, providing insight into mechanisms and potential therapeutic targets.

Predicting and interpreting cell-type-specific drug responses in the small-data regime using inductive priors
Nature Machine Intelligence, 03/2026. Nature Machine Intelligence | GitHub

Predicting cell-type-specific responses to small molecules is key for drug discovery but remains difficult. A graph-based deep learning approach models transcriptional responses using cell-type-specific co-expression networks. It captures gene interactions and enables interpretable, gene-level predictions. Across multiple single-cell and bulk datasets, it generalizes to unseen drugs and cell types, outperforming baseline methods. The approach supports scalable and accurate prediction of drug effects at cell-type resolution.

Human neuronal differentiation under Aβ exposure: a single-cell transcriptomic and epigenomic dataset
Scientific Data, 03/2026. Scientific Data | GSE307094

Human iPSC-derived neural progenitor cells are used to model early neuronal differentiation across four time points, with and without amyloid-β exposure. A multimodal single-cell dataset combines scRNA-seq and scATAC-seq to capture both transcriptional and chromatin dynamics. After quality control, tens of thousands of cells were analyzed to characterize cell composition, signaling pathways, and regulatory networks. The dataset is also compared with human hippocampal bulk RNA-seq data. It serves as a reference resource for studying neuronal differentiation and multimodal single-cell analysis.

Single-cell and spatial transcriptomic profiling of cardiac fibroblasts following myocardial infarction
Scientific Data, 01/2026. Scientific Data

Cardiac fibroblasts are essential for heart repair after myocardial infarction, with a reparative subset driving scar formation and preventing rupture. The timing and molecular basis of their emergence are not well defined. A multi-modal dataset captures early transcriptional changes using bulk RNA sequencing, in situ hybridization, and spatial transcriptomics. This enables mapping of the transition from activated fibroblasts to reparative states across time and tissue. The dataset provides a resource to study fibroblast heterogeneity and repair mechanisms.

2025
Leveraging network motifs to improve artificial neural networks
Nature Communications, 12/2025. Nature Communications | GitHub

Small structural patterns within neural networks (three-node motifs) play a key role in performance. Analysis of hundreds of thousands of motifs and training experiments shows that incoherent loops offer better representational capacity, numerical stability, and robustness to noise. In contrast, coherent loops tend to focus on high-gradient regions during learning, which can reduce stability. By avoiding this behavior, incoherent structures enable more stable adaptation across tasks, highlighting the importance of motif design for building more reliable and accurate neural networks.

Uncovering the regulatory landscape of early human B cell lymphopoiesis and its implications in the pathogenesis of B-ALL
Science Advances, 10/2025. Science Advances | Atlas

A multiomics atlas of chromatin accessibility and gene expression across early human B cell precursors reveals cell type–specific regulatory elements and reconstructs differentiation networks. Candidate regulons, such as ELK3, were validated using single-cell data, refining the regulatory landscape. This publicly available resource provides key insights into B cell development and disease, supporting studies of immunity and hematologic malignancies.

GeneSetCluster 2.0: a comprehensive toolset for summarizing and integrating gene-sets analysis
BMC Bioinformatics, 08/2025. BMC Bioinformatics | GitHub | ShinyApp

Gene-Set Analysis (GSA) often struggles with redundancies that complicate clustering and interpretation. GeneSetCluster 2.0 addresses this with improved methods for handling duplicated gene-sets, a seriation-based clustering algorithm, faster computation, and enhanced cluster annotations linking results to tissues and biological processes. A user-friendly web application and R package make the tool accessible to both programmers and non-programmers, enabling efficient and interpretable gene-set analyses.

SPELL: Spatial Prompting with Chain-of-Thought for Zero-Shot Learning in Spatial Transcriptomics
ICLR Singapore Conference, 04/2025. ICLR | Poster

SPELL introduces a zero-shot learning framework for cell-type classification in spatial transcriptomics, integrating spatial embeddings and chain-of-thought prompting. Using graph autoencoders and BART models, it achieves high accuracy (e.g., 64% on MERFISH) without task-specific fine-tuning. Spatial context significantly enhances performance, highlighting its critical role in biologically interpretable classification across diverse datasets.

stDiffusion: A Diffusion Based Model for Generative Spatial Transcriptomics
ICLR Singapore Conference, 04/2025. ICLR | Poster

stDiffusion employs a denoising diffusion probabilistic model to generate spatial transcriptomics data, predicting unseen tissue slices. It learns 2D gene expression patterns and interpolates between finite ST slices, advancing AI-augmented spatial transcriptomics. The model sets the stage for predictive 3D tissue modeling from limited data.

Interpretable Causal Representation Learning for Biological Data in the Pathway Space
ICLR Singapore Conference, 04/2025. ICLR | Poster

SENA-discrepancyVAE enhances causal representation learning by mapping latent factors to interpretable biological processes. It predicts the effects of genomic and drug perturbations while maintaining performance comparable to non-interpretable models. The SENA-δ encoder ensures biologically meaningful causal factors, improving therapy development.

NLRP3-mediated glutaminolysis controls microglial phagocytosis to promote Alzheimer’s disease progression
Immunity, 02/2025. Immunity

NLRP3 inflammasome activation contributes to Alzheimer’s disease, but its broader roles were unclear. Evidence shows that Aβ deposition directly activates NLRP3, while its loss enhances glutamine/glutamate metabolism and mitochondrial activity in microglia. This shift increases Aβ clearance and induces epigenetic and transcriptional changes via α-ketoglutarate. The mechanism is conserved in human and mouse cells. Chronic inhibition of NLRP3 reproduces these effects, highlighting its role in regulating metabolism and disease progression.

2024
Reviewability and supportability: New complementary principles to empower research software practices
Computational and Structural Biotechnology Journal, 12/2024. Computational and Structural Biotechnology Journal | GitHub

This review proposes reviewability and supportability as principles to enhance research software, complementing FAIR principles. It highlights software’s role in reproducibility and transparency in life sciences. The principles aim to improve peer review efficiency and guide scientists in developing robust research software.

ClustAll: An R package for patient stratification in complex diseases
PLOS Computational Biology, 12/2024. PLOS Computational Biology | Bioconductor | GitHub

ClustAll is a Bioconductor R package for unsupervised patient stratification in complex diseases using clinical data. Built on a validated clustering framework, it handles mixed data types, missing values, and collinearity, identifying multiple robust stratifications within a population. It uses parallel computing and user-friendly tools, validated on public clinical datasets for personalized medicine.

A comparative analysis of blastoid models through single-cell transcriptomics
iScience, 11/2024. iScience | Atlas

This study uses single-cell RNA sequencing to compare blastoid models with human blastocysts, assessing cell-type composition and lineage profiles. Blastoids from naive pluripotent stem cells resemble blastocysts more closely than those from extended pluripotent stem cells, which show more primitive endoderm and ambiguous cells. Gene expression heterogeneity in the starting cell lines influences blastoid lineage differentiation, a finding that can aid the optimization of embryogenesis models.

Derivation of two iPSC lines (KAIMRCi004-A, KAIMRCi004-B) from a Saudi patient with Biotin-Thiamine-responsive Basal Ganglia Disease (BTBGD) carrying homozygous pathogenic missense variant in the SCL19A3 gene
Human Cell, 09/2024. Human Cell

Biotin-thiamine-responsive basal ganglia disease is a rare genetic neurological disorder caused by mutations in the SLC19A3 gene. Patient-derived induced pluripotent stem cells were generated and differentiated into neural progenitors to model the disease in vitro. This system provides a way to study how the mutation affects cellular function and contributes to disease pathology. The results establish a functional cellular model that enables investigation of underlying mechanisms. It offers a foundation for improving understanding of the disease and supporting the development of targeted therapies.

Competition shapes the landscape of X-chromosome-linked genetic diversity
Nature Genetics, 07/2024. Nature Genetics

X chromosome inactivation creates distinct cell clones within individuals, each expressing different X-linked variants. Using variation in the STAG2 gene, it is shown that these clones contribute normally to most tissues but fail to generate lymphocytes. This absence is not only due to intrinsic defects but also to competition from wild-type clones. These interactions actively shape how genetic diversity is expressed across cell types. Overall, clonal competition influences the impact of X-linked variation in a tissue-specific manner.

Global compositional and functional states of the human gut microbiome in health and disease
Genome Research, 06/2024. Genome Research

This study analyzes 6,014 gut metagenome samples across 19 countries and 23 diseases to map microbial diversity and function. It identifies key bacteria like Fusobacterium nucleatum (enriched) and Anaerostipes hadrus (depleted) in disease cohorts, revealing distinct functional profiles in westernized and nonwesternized populations. The findings are accessible via the Human Gut Microbiome Atlas for exploring microbiota signatures.

An atlas of cells in the human tonsil
Immunity, 02/2024. Immunity | GitHub

A comprehensive tonsil cell atlas was created using 556,000 cells profiled via multi-modal single-cell and spatial transcriptomics. The atlas identifies 121 cell types and states, mapping developmental trajectories and functional units critical for immunological defense against pathogens. It defines immune cell functions, validates findings in lymphoma, and reveals age-related shifts in tonsillar composition.

2023
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 12/2023. Conference | GitHub | Video

Whispering LLaMA introduces a cross-modal framework for generative error correction in speech recognition, using acoustic and linguistic data. It improves word error rate by 37.66% compared to n-best Oracle, leveraging pre-trained models. The open-source code encourages further research.

Reusability report: Learning the transcriptional grammar in single-cell RNA-sequencing data using transformers
Nature Machine Intelligence, 11/2023. Nature Machine Intelligence | GitHub

This report evaluates scBERT, a transformer-based model, for annotating cell types in single-cell RNA-seq data. It leverages pretraining and self-attention to learn transcriptional patterns but is sensitive to imbalanced cell-type distributions. Subsampling and oversampling techniques mitigate this, enhancing generalizability in single-cell genomics.

Gene therapy restores the transcriptional program of hematopoietic stem cells in Fanconi anemia
Haematologica, 10/2023. Haematologica

This study uses single-cell RNA sequencing to show that lentiviral gene therapy corrects the transcriptional defects in hematopoietic stem and progenitor cells (HSPCs) of Fanconi anemia patients. It demonstrates that corrected HSPCs resemble healthy cells, with downregulated TGF-β and p21 and upregulated DNA repair pathways, suggesting gene therapy can reverse molecular defects in Fanconi anemia HSPCs.

A second update on mapping the human genetic architecture of COVID-19
Nature, 09/2023. Nature

A large-scale genome-wide association study analyzed host genetic factors influencing COVID-19 susceptibility and severity across diverse populations. The analysis identified 51 genetic loci linked to infection risk and disease severity, highlighting key biological pathways including viral entry, airway defense, and type I interferon response. Results show that genetic variation influences both susceptibility to infection and progression to severe disease through distinct mechanisms.

LEP-AD: Language Embeddings of Proteins and Attention to Drugs predicts drug target interactions
ICLR Conference, 04/2023. ICLR | GitHub

LEP-AD combines Evolutionary Scale Modeling (ESM-2) and Transformer-GCN to predict drug-target interactions, outperforming methods like SimBoost and DeepCPI. It achieves state-of-the-art binding affinity predictions using pre-trained protein language models across multiple datasets (e.g., Davis, KIBA). Pre-trained protein embeddings surpass AlphaFold 3D representations, scaling well with training data size.

Preclinical models for prediction of immunotherapy outcomes and immune evasion mechanisms in genetically heterogeneous multiple myeloma
Nature Medicine, 03/2023. Nature Medicine

This study develops 15 genetically diverse mouse models of multiple myeloma to study immunotherapy outcomes. A MAPK–MYC pathway accelerates progression, influencing immune evasion. Rapid MYC-driven progressors show high CD8+ T cell activation with low Treg cells, while slow progressors have higher Treg infiltration. High CD8+ T/Treg ratios predict immunotherapy response, guiding strategies to overcome resistance.

Translating single-cell genomics into cell types
Nature Machine Intelligence, 01/2023. Nature Machine Intelligence

This news piece discusses machine translation techniques that automatically classify cell types from single-cell transcriptomic data. It highlights potential for analyzing complex clinical samples like tumors at scale, advancing precision medicine.

2022
Data-driven bioinformatics to disentangle cells within a tissue microenvironment
Trends in Cell Biology, 06/2022. Trends in Cell Biology

This spotlight showcases how machine learning advances molecular profiling of clinical tissues by enabling the deconvolution of mixed cell types and the identification of population shifts in response to infections or drug treatments, supporting precision medicine.

Deconvolution of the hematopoietic stem cell microenvironment reveals a high degree of specialization and conservation
iScience, 04/2022. iScience

This study integrates single-cell RNA-seq datasets to map the hematopoietic stem cell microenvironment, identifying 14 endothelial and 11 mesenchymal cell states. It reveals high specialization and conserved regulatory features across species, advancing understanding of the bone marrow microenvironment.

2021
Mapping the human genetic architecture of COVID-19
Nature, 07/2021. Nature | GitHub

This GWAS of 49,562 COVID-19 cases across 46 studies identifies 13 loci associated with SARS-CoV-2 infection and severity, implicating lung, autoimmune, and inflammatory pathways. Mendelian randomization supports smoking and BMI as causal risk factors. It identifies actionable mechanisms for therapeutic development and informs future genetic studies of pandemics.

A robust machine learning framework to identify signatures for frailty: a nested case-control study in four aging European cohorts
GeroScience, 02/2021. GeroScience

This study uses machine learning to identify frailty biomarkers across four aging cohorts, analyzing genomic, proteomic, and metabolomic data. It finds protective (vitamin D3, lutein zeaxanthin, miRNA125b-5p) and risk (cardiac troponin T, pro-BNP, sRAGE) biomarkers. Oxidative stress, vitamin D, and cardiovascular markers vary by disability status, offering insights into multi-systemic pathological processes.

2020
Harmonization of quality metrics and power calculation in multi-omic studies
Nature Communications, 06/2020. Nature Communications | GitHub

The MultiPower method harmonizes quality metrics across omic platforms, estimating optimal sample sizes for multi-omic experiments. Complemented by MultiML, it supports machine learning classification tasks, offering graphical tools for experimental design. The approach ensures robust multi-omic data analysis, enhancing the reliability of comprehensive cellular models in diverse experimental settings.

2019
Building gene regulatory networks from scATAC-seq and scRNA-seq using Linked Self Organizing Maps
PLOS Computational Biology, 11/2019. PLOS Computational Biology

SOMatic uses self-organizing maps to integrate scATAC-seq and scRNA-seq, building gene regulatory networks. Applied to a B cell differentiation time-course with Ikaros overexpression, it recovers known interactions and predicts new Ikaros targets. The method overcomes challenges of sparse and noisy single-cell data, enabling integrative analysis of heterogeneous genomic datasets and advancing regulatory network discovery.

Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility
Science, 09/2019. Science

This study identifies over 200 risk loci for multiple sclerosis (MS) using genome-wide association studies. The analysis explains up to 48% of MS’s genetic contribution, highlighting immune pathways. It implicates immune cells and microglia, with enrichment in brain-resident immune cells, clarifying MS susceptibility mechanisms.

An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems
iScience, 09/2019. iScience

This method uses algorithmic information content to control and reprogram systems via controlled interventions. Validated on cellular automata, graphs, and biological networks (e.g., E. coli, Th17 cells), it reconstructs phase spaces and predicts causal interactions for therapeutic applications.

Therapeutic efficacy of dimethyl fumarate in relapsing-remitting multiple sclerosis associates with ROS pathway in monocytes
Nature Communications, 07/2019. Nature Communications

This study links dimethyl fumarate’s efficacy in multiple sclerosis to increased monocytic ROS, with changes in monocyte methylome and transcriptome preceding T cell effects. A NOX3 gene variant is linked to beneficial treatment response. Monocyte counts and redox state serve as potential biomarkers for DMF therapy decisions, implicating oxidative processes in autoimmune disease treatment.

Combining evidence from four immune cell types identifies DNA methylation patterns that implicate functionally distinct pathways during Multiple Sclerosis progression
EBioMedicine, 05/2019. EBioMedicine

The omicsNPC framework integrates DNA methylation data from four immune cell types, identifying changes in relapsing-remitting (RRMS) and secondary progressive (SPMS) multiple sclerosis. RRMS shows lymphocyte signaling and T cell activation, while SPMS implicates myeloid metabolism and neuronal pathways. Shared methylation patterns co-localize with MS risk loci, offering insights into disease progression and pathogenesis.

Causal deconvolution by algorithmic generative models
Nature Machine Intelligence, 01/2019. Nature Machine Intelligence | GitHub

This study introduces a parameter-free, algorithmic probability-based method to deconvolve complex interactions into generative models. It successfully infers generative models for bit strings, images, and networks, complementing statistical approaches to tackle causation in complex systems.

2018
DNA methylation as a mediator of HLA-DRB1*15:01 and a protective variant in multiple sclerosis
Nature Communications, 06/2018. Nature Communications

The HLA-DRB1*15:01 haplotype, a major multiple sclerosis risk factor, is hypomethylated in monocytes, driving increased expression. A differentially methylated region in HLA-DRB1 exon 2 regulates expression, while a protective variant (rs9267649) increases methylation, reducing risk. Causal inference supports HLA variants’ role in MS via methylation changes, suggesting therapeutic strategies targeting epigenetic regulation.

A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity
Entropy, 04/2018. Entropy

The Block Decomposition Method extends the Coding Theorem Method to estimate algorithmic complexity by decomposing objects into small programs. It performs well on low-complexity objects but aligns with Shannon entropy when less accurate, offering multi-dimensional applications.

2017
Low-algorithmic-complexity entropy-deceiving graphs
Physical Review E, 07/2017. Physical Review E

Using Borel-normal integer sequences, this study constructs recursive and nonrecursive graphs to expose limitations of entropy-based complexity measures. Different lossless descriptions of the same graph yield disparate entropy values, misrepresenting causal likelihood. The approach highlights the dependence of computable measures on object representation, advocating for algorithmic complexity metrics in graph analysis.

2016
A survey of best practices for RNA-seq data analysis
Genome Biology, 01/2016. Genome Biology

This review outlines RNA-seq analysis steps, including experimental design, quality control, read alignment, and differential expression analysis. It addresses challenges in quantifying gene/transcript levels, detecting alternative splicing, and integrating with other genomics techniques. It also discusses small RNA analysis.