Machine Learning for Genomics Explorations (MLGenX)


Our limited understanding of the biological mechanisms underlying diseases remains a critical bottleneck in drug discovery. As a result, we often lack insights into why patients develop specific conditions, leading to the failure of many drug candidates in clinical trials. Recent advancements in genomics platforms and the emergence of diverse omics datasets have sparked increasing interest in this field. The primary objective of this workshop is to bridge the gap between machine learning and genomics, emphasizing target identification and emerging drug modalities such as gene and cell therapies and RNA-based drugs. By fostering interdisciplinary collaboration, we aim to advance the integration of these disciplines and accelerate innovation in drug discovery.




📢 The MLGenX Workshop will take place on Sunday, April 27th, in Room Garnet 212–213!


Schedule

Time | Title | Presenter
09:00 - 09:15 | Opening Remarks |
09:15 - 09:35 | (Oral Presentation) Test-Time View Selection for Multi-Modal Decision Making | Eeshaan Jain
09:40 - 10:00 | (Oral Presentation) Efficient Fine-Tuning Of Single-Cell Foundation Models Enables Zero-Shot Molecular Perturbation Prediction | Sepideh Maleki
10:00 - 10:15 | Coffee Break |
10:15 - 11:00 | (Invited Talk) Empowering Biomedical Discovery with "AI Scientists" | Marinka Zitnik
11:05 - 11:25 | (Oral Presentation) LangPert: LLM-Driven Contextual Synthesis for Unseen Perturbation Prediction | Kaspar Märtens
11:30 - 13:00 | Lunch Break |
13:00 - 13:45 | (Invited Talk) Jure Leskovec | Jure Leskovec
13:45 - 14:40 | (Panel Discussion) AI Agents in Biology and Drug Discovery | Limsoon Wong, Gabriele Scalia, Daniel Burkhardt, Aya Abdelsalam Ismail, Jan-Christian Hütter
14:40 - 15:00 | (Oral Presentation) AI Agent For Data-Driven Hypothesis Exploration In Single-Cell Transcriptomics | Artemy Bakulin
15:00 - 15:15 | Coffee Break |
15:15 - 16:00 | (Invited Talk) The Next Genomics Frontier: From Risk Prediction to Digital Twins | Mihaela van der Schaar
16:00 - 17:25 | Poster Session - In Person |
17:25 - 17:30 | Closing Remarks |
17:35 - 18:00 | (Invited Talk) Large Language Models and AI Agents in Cancer Research and Oncology | Jakob Nikolas Kather


Tentative Speakers & Panelists

Marinka Zitnik (Harvard University)
Jure Leskovec (Stanford University)
Shekoofeh Azizi (Google DeepMind)
Mihaela van der Schaar (University of Cambridge)
Djork-Arne Clevert (Pfizer)
Limsoon Wong (National University of Singapore)
Jakob Nikolas Kather (NCT, TDU)
Daniel Burkhardt (NVIDIA)
Aya Abdelsalam Ismail (Guide Labs)


Accepted Papers

Spotlight Papers | Authors
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding | Yijia Xiao et al.
Efficient Fine-Tuning Of Single-Cell Foundation Models Enables Zero-Shot Molecular Perturbation Prediction | Sepideh Maleki et al.
ESM-Effect: An Effective and Efficient Fine-Tuning Framework towards accurate prediction of Mutation's Functional Effect | Moritz Glaser et al.
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference | Tianyu Cui et al.
RAG-ESM: Improving pretrained protein language models via sequence retrieval | Damiano Sgarbossa et al.
Decision Tree Induction with Dynamic Feature Generation: A Framework for Interpretable DNA Sequence Analysis | Nicolas Huynh et al.
Relaxed Equivariance via Multitask Learning | Ahmed A. A. Elhag et al.
Learning Non-Equilibrium Signaling Dynamics in Single-Cell Perturbation Dynamics | Heman Shakeri et al.
AI-Powered Virtual Tissues from Spatial Proteomics for Clinical Diagnostics and Biomedical Discovery | Johann Wenckstern et al.
Learning Representations of Instruments for Partial Identification of Treatment Effects | Jonas Schweisthal et al.
RAG-Enhanced Collaborative LLM Agents for Drug Discovery | Namkyeong Lee et al.
AI Agent For Data-Driven Hypothesis Exploration In Single-Cell Transcriptomics | Artemy Bakulin et al.
Test-Time View Selection for Multi-Modal Decision Making | Eeshaan Jain et al.
LangPert: LLM-Driven Contextual Synthesis for Unseen Perturbation Prediction | Kaspar Märtens et al.

Poster Papers | Authors
SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | Jiwei Zhu et al.
Uncertainty-aware genomic deep learning with knowledge distillation | Jessica Zhou et al.
Flexible Models of Functional Annotations to Variant Effects using Accelerated Linear Algebra | Alan Nawzad Amin et al.
Sampling Protein Language Models for Functional Protein Design | Jeremie Theddy Darmawan et al.
Supervised Contrastive Block Disentanglement | Taro Makino et al.
Graph Pseudotime Analysis and Neural Stochastic Differential Equations for Analyzing Retinal Degeneration Dynamics and Beyond | Dai Shi et al.
Structure-based metabolite function prediction using graph neural networks | Tancredi Cogne et al.
Spatially-Informed Sampling Enables Accurate Prediction of Large-Scale Mutational Effects | Maxime Basse et al.
Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | Keyue Qiu et al.
Detecting cell level transcriptomic changes of Perturb-seq using Contrastive Fine-tuning of Single-Cell Foundation Models | Wenmin Zhao et al.
MolCap-Arena: A Comprehensive Captioning Benchmark on Language-Enhanced Molecular Property Prediction | Carl Edwards et al.
A Scalable LLM Framework for Therapeutic Biomarker Discovery: Grounding Q/A Generation in Knowledge Graphs and Literature | Marc Boubnovski Martell et al.
HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model | Mingqian Ma et al.
ShortListing Model: A Streamlined Simplex Diffusion for Biological Sequence Generation | Yuxuan Song et al.
Predicting Drug-likeness via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization | Dongmin Bang et al.
Large Language Models for Zero-shot Inference of Causal Structures in Biology | Izzy Newsham et al.
LLM4GRN: Discovering Causal Gene Regulatory Networks with LLMs – Evaluation through Synthetic Data Generation | Tejumade Afonja et al.
A data-driven recommendation framework for genomic discovery | Ying Yang et al.
Integrating Protein Language Model and Active Learning for Few-Shot Viral Variant Detection | Marian Huot et al.
Curly Flow Matching for Learning Non-gradient Field Dynamics | Katarina Petrović et al.
PIONEER: a virtual platform for iterative improvement of genomic deep learning | Alessandro Crnjar et al.
GENATATOR: de novo Gene Annotation With DNA Language Model | Aleksei Shmelev et al.
Leveraging GPT Continual Fine-Tuning for Improved RNA Editing Site Prediction | Zohar Rosenwasser et al.
COLOR: A Compositional Linear Operation Based Representation of Protein Sequences for Identification of Monomer Contributions to Properties | Akash Pandey et al.
CellMemory: Hierarchical Interpretation of Out-of-Distribution Cells Using Bottlenecked Transformer | Qifei Wang et al.
Predicting Time-Varying Metabolic Dynamics Using Structured Neural ODE Processes | Santanu Rathod et al.
RNAGym: Benchmarks for RNA Fitness and Structure Prediction | Rohit Arora et al.
SpaceDX: A Bayesian test for localized differential expression in population-level spatial transcriptomics datasets | Niklas Stotzem et al.
GraphPINE: Graph importance propagation for interpretable drug response prediction | Yoshitaka Inoue et al.
ECG-Nest-FM: A Frequency-Focused ECG Foundation Model with Nested Embeddings | Abhishek Sharma et al.
NOLAN: Self-Supervised Framework for Mapping Continuous Tissue Organization | Artemy Bakulin et al.
Featurization of single cell trajectories through kernel mean embedding of optimal transport maps | Alec Plotkin et al.
Enhancing DNA Foundation Models to Address Masking Inefficiencies | Monireh Safari et al.
Building Foundation Models to Characterize Cellular Interactions via Geometric Self-Supervised Learning on Spatial Genomics | Yuning You et al.
Uncovering BioLOGICAL Motifs and Syntax via Sufficient and Necessary Explanations | Beepul Bharti et al.
LoFTPat: Low-Rank Subspace Optimization for Parameter-Efficient Fine-Tuning of Genomic Language Models in Pathogenicity Identification | Sajib Acharjee Dip et al.
Capturing functional context of genetic pathways through hyperedge disentanglement | Yoonho Lee et al.
A Topologically Guided Machine Learning Framework for Enhanced Fine-Mapping in Whole-Genome Bacterial Studies | Tamsin Emily James et al.
Multiplexed DNA Assembly from Oligo-pools | Shaozhong Zou et al.
When repeats drive the vocabulary: a Byte-Pair Encoding analysis of T2T primate genomes | Marina Popova et al.
Talk2Biomodels and Talk2KnowledgeGraph: AI agent-based application for prediction of patient biomarkers and reasoning over biomedical knowledge graphs | Gurdeep Singh et al.

Tiny Papers | Authors
Exploring the potential of genetic variation and zygosity in DNA language models | Ali Saadat et al.
Enhancing Downstream Analysis in Genome Sequencing: Species Classification While Basecalling | Riselda Kodra et al.
Knockoff Statistics-Driven Interpretable Deep Learning Models for Uncovering Potential Biomarkers for COVID-19 Severity Prediction | Qian Liu et al.
Benchmarking Fine-Tuned RNA Language Models for Intronic Branch Point Prediction | Pablo Rodenas Ruiz et al.
HARMONY: A Multi-Representation Framework for RNA Property Prediction | Junjie Xu et al.
MutEmbed: Self-Supervised Learning of Biological Latent Embeddings from Cancer Mutational Profiles | Aakansha Narain et al.
Multi-modal single-cell foundation models via dynamic token adaptation | Wenmin Zhao et al.
Beyond Sequence-Only Models: Leveraging Structural Constraints for Antibiotic Resistance Prediction in Sparse Genomic Datasets | Mahbuba Tasmin et al.
Reference-free cell-type annotation with LLM agents | Yidi Huang et al.
Multi-omic Causal Discovery using Genotypes and Gene Expression | Stephen M. Asiedu et al.
2DE: a probabilistic method for differential expression across niches in spatial transcriptomics data | Nathan Levy et al.
Searching for Phenotypic Needles in Genomic Haystacks: DNA Language Models for Sex Prediction | Alla Chepurova et al.
Gradient-Based Gene Selection for Multimodal scRNA-seq Foundation Models | Pakaphol Thadawasin et al.
Aligning Molecules and Fragments in a Shared Embedding Space for RL-Based Molecule Generation | Youngkuk Kim et al.
DrugAgent: Multi-Agent Large Language Model-Based Reasoning for Drug-Target Interaction Prediction | Yoshitaka Inoue et al.
LIMEADE: Local Interpretable Manifold Explanations for Dimension Evaluations | Tarek M Zikry et al.
Helix-mRNA: A Hybrid Foundation Model For Full Sequence mRNA Therapeutics | Matthew Wood et al.
stDiffusion: A Diffusion Based Model for Generative Spatial Transcriptomics | Sumeer Ahmad Khan et al.
SPELL: Spatial Prompting with Chain-of-Thought for Zero-Shot Learning in Spatial Transcriptomics | Sumeer Ahmad Khan et al.
Enhancing E. coli Genomic Analysis with Retrieval-Augmented Generation | Kritika Chugh et al.
Transferring Preclinical Drug Response to Patient via Tumor Heterogeneity-Aware Alignment and Perturbation Modeling | Inyoung Sung et al.
PRISM: Enhancing Protein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations | Sazan Mahbub et al.
Interpretable prediction of DNA replication origins in S. cerevisiae using attention-based motif discovery | Zohreh Piroozeh et al.
Interpretable single-cell perturbations from decoder gradients | Andreas Bjerregaard et al.
To trap or not to trap--analyzing the trade-offs in diffusion transport models | Rushmila Shehreen Khan et al.
Gene Set Function Discovery with LLM-Based Agents and Knowledge Retrieval | Daniela Pinto Veizaga et al.
FACA-GEN: Investigating Bias and Generalization in Active Learning for Genomics AI | Amber Qayum Hawabaz et al.
BirdieDNA: Reward-Based Pre-Training for Genomic Sequence Modeling | Sam Blouir et al.
Wasserstein CycleGAN for Single-Cell RNA-Seq Data Generation Using Cross-Modality Translation | Sajib Acharjee Dip et al.
Pathway-Attentive GAN for Interpretable Biomolecular Design | Azmine Toushik Wasi et al.
Hallucinations vs. Predictions: Reframing Uncertainty in LLM-Generated Medical Responses | Saleh Afroogh et al.

Camera Ready Instructions

An email with instructions for uploading camera-ready submissions will go out in mid-March 2025. To prepare your camera-ready version, please use the MLGenX 2025 template style.

  • Authors may use one additional page beyond the page limit specified at submission, for a total of 9 pages for main/special track papers and 5 pages for tiny track papers. This extra page can be used to address the comments received during the review process.

Important Dates

All deadlines are 11:59 pm UTC-12 ("Anywhere on Earth"). All authors must have an OpenReview profile when submitting.

  • Submission Deadline (Main and Special Tracks - Up to 8 pages): February 16, 2025 (extended from February 12, 2025)
  • Submission Deadline (Tiny Papers Track - Up to 4 pages): February 23, 2025
  • Acceptance Notification: March 5, 2025
  • Camera-Ready Deadline: April 24, 2025
  • Workshop Date: Sunday, April 27, 2025

Organizers

Ehsan Hajiramezanali
Aviv Regev
Fabian Theis
Arman Hasanzadeh
Mengdi Wang
Tommaso Biancalani
Sara Mostafavi
Aïcha Bentaieb
Gabriele Scalia
Edward De Brouwer

Student Organizers

Namkyeong Lee
Sofia Kapsiani