Deep learning for transcription factor binding site prediction

Publication (in preparation): maxATAC-v2 infers nucleosome positions from single-cell ATAC-seq training data for improved genome-wide transcription factor binding site prediction

Poster presentations:

Transcription factors (TFs) regulate gene expression by binding to specific DNA sequences called motifs. However, the presence of a motif does not guarantee TF binding in vivo, because binding also depends on other epigenetic factors such as chromatin accessibility and DNA methylation. In this project, we investigate the use of deep learning to predict cell-specific binding for 127 human TFs, using the reference DNA sequence and ATAC-seq signal as inputs. The model uses a transformer architecture to integrate these multimodal epigenetic inputs and predicts binding at 32-bp resolution.
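
As a rough illustration of the input and output layout described above (not the maxATAC-v2 code itself), the sketch below pairs a one-hot encoded DNA sequence with a per-base ATAC-seq signal and collapses per-base binding labels into 32-bp bins. The function names, the five-channel layout, and the rule that a bin counts as bound if any base in it is bound are all assumptions made for this example.

```python
BIN_SIZE = 32   # output resolution in base pairs, as stated in the text
BASES = "ACGT"  # channel order for the one-hot encoding (assumed)


def encode_inputs(sequence, atac_signal):
    """Return one row per base: 4 one-hot DNA channels plus 1 ATAC-seq channel.

    Hypothetical helper: the real model's input layout may differ.
    """
    if len(sequence) != len(atac_signal):
        raise ValueError("sequence and ATAC-seq signal must have equal length")
    rows = []
    for base, signal in zip(sequence.upper(), atac_signal):
        # Ambiguous bases such as N become an all-zero DNA vector.
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        rows.append(one_hot + [float(signal)])
    return rows


def bin_labels(per_base_labels, bin_size=BIN_SIZE):
    """Collapse per-base 0/1 binding labels to one label per bin.

    Assumed convention: a bin is positive if any base inside it is bound.
    """
    if len(per_base_labels) % bin_size != 0:
        raise ValueError("label length must be a multiple of the bin size")
    return [
        max(per_base_labels[start:start + bin_size])
        for start in range(0, len(per_base_labels), bin_size)
    ]


# Toy 64-bp window: flat ATAC-seq signal, binding over the last 24 bases.
sequence = "ACGT" * 16
atac_signal = [0.5] * 64
labels = [0] * 40 + [1] * 24

inputs = encode_inputs(sequence, atac_signal)
targets = bin_labels(labels)
print(len(inputs), len(inputs[0]), targets)  # 64 5 [0, 1]
```

In this toy window, 64 input positions map to two 32-bp output bins, which is the kind of coarse-grained target a 32-bp-resolution model would be trained against.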

My contributions include building the prototype transformer, benchmarking its performance on held-out hematopoietic stem cells, curating and cross-correlating bulk and single-cell ATAC-seq data, and leading the attention-matrix visualization analyses used for model interpretability.

My utmost gratitude to Dr. Matthew Weirauch and Dr. Emily Miraldi for allowing me to work on the project!

maxATAC v2 final architecture