Machine Learning for Genomics Explorations (MLGenX)
The main objective of this workshop is to bridge the gap between machine learning (ML) and functional genomics (Gen), focusing on target identification, a pivotal step in drug discovery. Our goal is to explore this challenging aspect of modern drug development: identifying the biological targets that play a critical role in modulating disease. We will delve into the intersection of ML and genomics, with a specific focus on areas where data availability has expanded thanks to emerging technologies (e.g., large-scale genomic screens and single-cell and spatial omics platforms). From a biological perspective, our discussions will encompass sequence design, molecular perturbations, and single-cell and spatial omics, shedding light on key biological questions in target identification. On the ML front, we aim to address topics such as interpretability, foundation models for genomics and biology, generalizability, and causal discovery, emphasizing the significance of ML in advancing target identification.
Overview
The critical bottleneck in drug discovery remains our limited understanding of the biological mechanisms underlying disease. Consequently, we often do not know why patients develop specific diseases, and many drug candidates fail in clinical trials. Recent advances in genomics platforms and the growth of diverse omics datasets have ignited growing interest in this field. In parallel, machine learning has played a pivotal role in driving successes in language processing, image analysis, and molecular design. The boundaries between these two domains are becoming increasingly blurred, particularly with the emergence of modern foundation models that sit at the intersection of data-driven approaches, self-supervised techniques, and genomic exploration. This workshop aims to elucidate the intricate relationship between genomics, target identification, and fundamental machine learning methods. By strengthening the connection between machine learning and target identification via genomics, new possibilities for interdisciplinary research in these areas will emerge.
The goal of this workshop is to bring together communities at the intersection of machine learning and genomics to discuss areas of interaction and explore possibilities for future areas of research.
During this workshop, participants will gain valuable insights into the synergies between ML and genomics-related research, and help refine the next generation of applied and theoretical ML methods for target identification. We look forward to your participation in this exciting discourse on the future of (foundational) genomics and AI.
Call for Papers
We consider a broad range of subject areas including but not limited to the following topics:
Foundation models for genomics
Biological sequence design
Interpretability and generalizability in genomics
Causal representation learning
Perturbation biology
Modeling long-range dependencies in sequences, single-cell and spatial omics
Integrating multimodal perturbation readouts
Active learning in genomics
Generative models in biology
Multimodal representation learning
Uncertainty quantification
Optimal transport
Experimental design for biology
Graph neural networks and knowledge graphs
New datasets and benchmarks for genomics explorations
We welcome both contributions that introduce new ML methods for existing problems and those that highlight and explain open problems.
We also encourage submissions on applications in molecular biology, including but not limited to single-cell RNA analysis, bulk RNA studies, proteomics, and microscopy imaging of cells and tissues.
Important Dates
All deadlines are 11:59 pm UTC-12 ("Anywhere on Earth"). All authors must have an OpenReview profile when submitting.
Submission Deadline: February 8, 2024
Acceptance Notification: March 3, 2024
Camera-Ready Deadline: April 26, 2024
Workshop Date: Saturday, May 11, 2024 (in-person)
Workshop Registration
Everyone is welcome to attend, whether you're a seasoned professional or a curious enthusiast; you do not need an accepted paper to participate.
If you have already registered for ICLR, you can join us at the MLGenX workshop. If you're interested only in the workshop, you can still participate by registering for the "Saturday Workshop 1 Day Pass". Please visit this link to secure your spot.
We look forward to meeting you in Vienna!
Speakers & Panelists
Silvia Chiappa
Google DeepMind
James Zou
Stanford University
Jason Hartford
Recursion
Lindsay Edwards
CTO, Relation Therapeutics
Nicola Richmond
VP, BenevolentAI
Kyunghyun Cho
NYU, Genentech
Michael Bronstein
University of Oxford
Brian Hie
Stanford University
Bianca Dumitrascu
Columbia University
Schedule (CET)
Title: (Invited Speaker) Functional Causal Bayesian Optimization and DiscoGen for Learning Optimal Interventions and Inferring Gene Regulatory Networks
Presenter: Silvia Chiappa
Bio
https://csilviavr.github.io
Abstract
Advances in scientific disciplines such as biology and medicine require solving causal problems such as learning optimal interventions or inferring causal structure. In this talk, I will introduce Functional Causal Bayesian Optimization, a graph-based sequential decision making method that considers contextual interventions, and DiscoGen, a supervised transformer-based method that enables inferring large and cyclic gene regulatory networks.
Title: (Invited Speaker) Leveraging (natural) language models for biology
Presenter: James Y Zou
Bio
https://www.james-zou.com
Abstract
Biological language models, such as ESM2 for proteins, achieve impressive capabilities purely by learning correlation patterns between sequences. However, they are mostly ignorant of the vast biological knowledge in the literature and previous experiments. On the other hand, natural language models like GPT-4 have been trained on vast amounts of scientific papers. In this talk, I will discuss our recent work on how to get the best of both worlds by leveraging natural language models for biology. I will present GenePT, an approach for projecting single-cell data into the semantic space of GPT-4. Then I will discuss ProteinCLIP, which enhances the capabilities of models like ESM2 by integrating text knowledge.
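To make the GenePT idea above a little more concrete, the sketch below embeds short gene descriptions with a generic off-the-shelf text encoder and averages them, weighted by expression, into a cell-level representation. The encoder, gene descriptions, and weighting scheme are illustrative assumptions, not the actual GenePT implementation (which uses GPT-based embeddings).

# Sketch of a GenePT-style cell embedding: embed gene descriptions with a
# generic text encoder, then average them weighted by expression.
# The encoder, descriptions, and weighting are illustrative assumptions,
# not the GenePT implementation.
import numpy as np
from sentence_transformers import SentenceTransformer  # any text encoder works here

gene_descriptions = {
    "TP53": "Tumor suppressor regulating cell cycle arrest and apoptosis.",
    "MYC": "Transcription factor controlling proliferation and metabolism.",
    "GATA1": "Transcription factor required for erythroid development.",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")
genes = list(gene_descriptions)
gene_emb = encoder.encode([gene_descriptions[g] for g in genes])  # (n_genes, d)

# Toy expression vector for one cell (e.g. normalized counts), in gene order.
expression = np.array([0.1, 2.5, 0.7])

# Cell embedding = expression-weighted average of gene-description embeddings.
weights = expression / expression.sum()
cell_embedding = weights @ gene_emb
print(cell_embedding.shape)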
Title: (Oral Paper) DNA-DIFFUSION: Leveraging generative models for controlling chromatin accessibility and gene expression via synthetic regulatory elements
Presenter: Luca Pinello
Authors
Luca Pinello
Abstract
The challenge of systematically modifying and optimizing regulatory elements for precise gene expression control is central to modern genomics and synthetic biology. Advancements in generative AI have paved the way for designing synthetic sequences with the aim of safely and accurately modulating gene expression. We leverage diffusion models to design context-specific DNA regulatory sequences, which hold significant potential toward enabling novel therapeutic applications requiring precise modulation of gene expression. Our framework uses a cell type-specific diffusion model to generate synthetic 200 bp DNA regulatory elements based on chromatin accessibility across different cell types. We evaluate the generated sequences based on key metrics to ensure they retain properties of endogenous sequences: transcription factor binding site composition, potential for cell type-specific chromatin accessibility, and capacity for sequences generated by DNA diffusion to activate gene expression in different cell contexts using state-of-the-art prediction models. Our results demonstrate the ability to robustly generate DNA sequences with cell type-specific regulatory potential. DNA-Diffusion paves the way for revolutionizing a regulatory modulation approach to mammalian synthetic biology and precision gene therapy.
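One of the evaluation criteria mentioned in the abstract, transcription factor binding site composition, can be illustrated with a simple position weight matrix (PWM) scan over generated sequences. The motif, threshold, and scoring below are toy placeholders, not the metrics or tools used in the paper.

# Minimal sketch of one evaluation idea from the abstract: scanning a generated
# 200 bp sequence for matches to a transcription factor motif (PWM).
# The motif and score threshold are toy placeholders, not the paper's metrics.
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string into a (len, 4) array."""
    idx = np.array([BASES.index(b) for b in seq])
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out

# Toy 6 bp motif as a log-odds position weight matrix (rows: positions, cols: ACGT).
pwm = np.log2(np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]) / 0.25)

def motif_hits(seq, pwm, threshold=6.0):
    """Count windows whose PWM log-odds score exceeds the threshold."""
    x = one_hot(seq)
    k = pwm.shape[0]
    scores = [float((x[i:i + k] * pwm).sum()) for i in range(len(seq) - k + 1)]
    return sum(s >= threshold for s in scores)

generated = "".join(np.random.choice(list(BASES), size=200))  # stand-in for a generated sequence
print(motif_hits(generated, pwm))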
Title: (Oral Paper) Dirichlet flow matching with applications to DNA sequence design
Presenter: Gabriele Corso
Authors
Hannes Stark, Bowen Jing, Chenyu Wang, Gabriele Corso, Bonnie Berger, Regina Barzilay, Tommi Jaakkola
Abstract
Discrete diffusion or flow models could enable faster and more controllable sequence generation than autoregressive models. We show that naive linear flow matching on the simplex is insufficient toward this goal since it suffers from discontinuities in the training target and further pathologies. To overcome this, we develop Dirichlet flow matching on the simplex based on mixtures of Dirichlet distributions as probability paths. In this framework, we derive a connection between the mixtures' scores and the flow's vector field that allows for classifier and classifier-free guidance. Further, we provide distilled Dirichlet flow matching, which enables one-step sequence generation with minimal performance hits, resulting in O(L) speedups compared to autoregressive models. On complex DNA sequence generation tasks, we demonstrate superior performance compared to all baselines in distributional metrics and in achieving desired design targets for generated sequences. Finally, we show that our classifier-free guidance approach improves unconditional generation and is effective for generating DNA that satisfies design targets.
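For intuition about the Dirichlet probability paths described above, the sketch below samples points on the 4-way DNA simplex from Dirichlet distributions whose concentration on the target base grows with time, so samples move from near-uniform noise toward the target vertex. The parameterization is a simplification for illustration, not the exact construction derived in the paper.

# Illustrative sketch of a Dirichlet probability path on the 4-way DNA simplex:
# at t = 0 samples are near-uniform over {A, C, G, T}; as t grows, probability
# mass concentrates on the target base. This is a simplified parameterization
# for intuition, not the exact construction used in the paper.
import numpy as np

rng = np.random.default_rng(0)

def sample_path(target_idx, t, n_samples=5):
    """Sample simplex points from Dir(1 + t * e_target)."""
    alpha = np.ones(4)
    alpha[target_idx] += t
    return rng.dirichlet(alpha, size=n_samples)

for t in [0.0, 2.0, 20.0]:
    samples = sample_path(target_idx=2, t=t)  # target base: G
    print(f"t={t:5.1f}  mean simplex point: {samples.mean(axis=0).round(3)}")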
Title: (Panel Discussion) Foundation models in biology
Panelists: Kyunghyun Cho, Lindsay Edwards, Nicola Richmond, Bianca Dumitrascu, Michael Bronstein, Aïcha Bentaieb
Abstract
A panel discussion on foundation models in biology.
Title: (Invited Speaker) Efficiently detecting interactions from high dimensional observations of pairwise perturbations
Presenter: Jason Hartford
Bio
Jason Hartford is a Research Unit Lead and Staff Research Scientist at Valence Labs and an incoming Assistant Professor at the University of Waterloo and Vector Affiliate Member (starting July 2024). Previously, he was a postdoc with Prof Yoshua Bengio at Mila where he worked on causal representation learning. Before joining Mila, he completed his Master's and PhD at the University of British Columbia with Prof Kevin Leyton-Brown.
Abstract
In principle, pairwise perturbations—such as pairwise gene knockouts or drug combinations—allow us to observe interactions between perturbants, but experiments of this sort are expensive because experimental costs scale quadratically in the number of perturbants, and when your observations are high dimensional (e.g. microscopy images or expression data), it is not obvious how to measure these interactions. In this talk, I will discuss recent work that shows how we can combine representations of single gene perturbations to detect interactions in pairwise perturbations, and then I will show how this can be used as a reward in an active learning algorithm. By leveraging active learning, we are able to avoid the quadratic costs and efficiently find these interactions. Finally, I will discuss theory that gives a formal characterization of the assumptions that need to hold for the detected interactions to correspond to real biological interactions.
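A generic sketch of the core idea, under illustrative assumptions rather than the presented method: compose two single-perturbation representations (here by simple addition) to predict the pairwise readout, score measured pairs by how far observations deviate from that prediction, and use the scores to prioritize which unmeasured pairs to run next.

# Generic sketch: predict a pairwise-perturbation readout by composing the two
# single-perturbation representations (simple addition here), score each
# measured pair by its deviation from that prediction, and use the scores to
# choose which unmeasured pairs to run next. The additive composition, toy
# data, and selection rule are assumptions for illustration only.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_genes, dim = 8, 16

single = rng.normal(size=(n_genes, dim))          # single-perturbation embeddings
measured = {(0, 1): single[0] + single[1] + 2.0,  # pair (0, 1) deviates: interaction
            (2, 3): single[2] + single[3]}        # pair (2, 3) is purely additive

def interaction_score(i, j, observation):
    """Distance between the observed pair readout and the additive prediction."""
    return float(np.linalg.norm(observation - (single[i] + single[j])))

scores = {pair: interaction_score(*pair, obs) for pair, obs in measured.items()}
print(scores)  # (0, 1) should score much higher than (2, 3)

# Naive acquisition rule: propose unmeasured pairs involving genes that already
# showed strong interactions (a stand-in for a proper active-learning reward).
hot_genes = {g for pair, s in scores.items() if s > 1.0 for g in pair}
candidates = [p for p in itertools.combinations(range(n_genes), 2)
              if p not in measured and (p[0] in hot_genes or p[1] in hot_genes)]
print(candidates[:5])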
Title: (Oral Paper) Season combinatorial intervention predictions with Salt & Peper
Presenter: Thomas Gaudelet
Authors
Thomas Gaudelet, Alice Del Vecchio, Eli M Carrami, Juliana Cudini, Chantriolnt-Andreas Kapourani, Caroline Uhler, Lindsay Edwards
Abstract
In biology, interventions—particularly genetic ones enabled by CRISPR technologies—play a pivotal role in the study of complex systems. These interventions are instrumental in both identifying potential therapeutic targets and understanding the mechanisms of action for existing treatments. With the advancement of CRISPR and the proliferation of genome-scale analyses, the challenge shifts to navigating the vast combinatorial space of genetic interventions. Addressing this, our work concentrates on estimating the effects of pairwise genetic combinations. We introduce two novel contributions: Salt, a biologically-inspired baseline that posits the mostly additive nature of combination effects, and Peper, a deep learning model that extends Salt's additive assumption to achieve unprecedented accuracy. Our comprehensive comparison against existing state-of-the-art methods, grounded in diverse metrics, and our out-of-distribution analysis highlight the limitations of current models in realistic settings. This analysis underscores the necessity for improved modeling techniques and data acquisition strategies, paving the way for more effective exploration of genetic intervention effects.
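The additive assumption behind Salt can be sketched in a few lines: predict the effect of a double perturbation as the sum of the two single-perturbation effects and compare against the observed double effect. The toy data and metrics below are for illustration only and are not the paper's model, data, or benchmarks; a learned model in the spirit of Peper would presumably aim to capture the residual, non-additive component.

# Minimal sketch of an additive baseline in the spirit of Salt: predict the
# transcriptional effect of a double perturbation as the sum of the two single
# perturbation effects, then compare to the observed double effect. The toy
# data and evaluation are illustrative, not the paper's model or benchmarks.
import numpy as np

rng = np.random.default_rng(2)
n_readout_genes = 50

effect_a = rng.normal(size=n_readout_genes)   # expression change under knockout A
effect_b = rng.normal(size=n_readout_genes)   # expression change under knockout B

# Simulated observation: mostly additive plus a small non-additive component.
observed_ab = effect_a + effect_b + 0.2 * rng.normal(size=n_readout_genes)

predicted_ab = effect_a + effect_b            # the additive, Salt-style prediction

pearson = np.corrcoef(predicted_ab, observed_ab)[0, 1]
rmse = np.sqrt(np.mean((predicted_ab - observed_ab) ** 2))
print(f"Pearson r = {pearson:.3f}, RMSE = {rmse:.3f}")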
Title: (Oral Paper) A mechanistically interpretable neural-network architecture for discovery of regulatory genomics
Presenter: Alex M Tseng
Authors
Alex M Tseng, Gökcen Eraslan, Nathaniel Lee Diamant, Tommaso Biancalani, Gabriele Scalia
Abstract
Deep neural networks have shown unparalleled success in mapping genomic DNA sequences to associated readouts such as protein–DNA binding. Beyond prediction, the goal of these networks is to then learn the underlying motifs (and their syntax) which drive genome regulation. Traditionally, this has been done by applying fragile and computationally expensive post-hoc analysis pipelines to trained models. Instead, we propose an entirely alternative method for learning motif biology from neural networks. We designed a mechanistically interpretable neural-network architecture for regulatory genomics, where motifs and their syntax are directly encoded and readable from the learned weights and activations, thus eliminating the need for post-hoc pipelines. Our model is also more robust to variable sequence contexts and against adversarial attacks, while attaining predictive performance comparable to its traditional counterparts.
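For contrast with the proposed architecture, the generic model below shows the conventional setup the abstract argues against relying on: a convolutional network over one-hot DNA whose first-layer filters resemble position weight matrices but normally require fragile post-hoc pipelines to interpret. This is an illustrative baseline, not the paper's architecture.

# Generic illustration (not the paper's architecture): in a conventional
# convolutional model over one-hot DNA, each first-layer filter is a
# (4 x width) weight matrix that can be read as a candidate motif, but post-hoc
# normalization and attribution are usually needed to interpret it. The
# proposed architecture instead makes motifs and their syntax directly
# readable from the learned weights and activations.
import torch
import torch.nn as nn

class TinyDNAModel(nn.Module):
    def __init__(self, n_filters=8, motif_width=10):
        super().__init__()
        self.conv = nn.Conv1d(4, n_filters, kernel_size=motif_width)  # filters ~ motifs
        self.head = nn.Linear(n_filters, 1)

    def forward(self, x):                       # x: (batch, 4, seq_len), one-hot DNA
        h = torch.relu(self.conv(x))            # motif match scores along the sequence
        pooled = h.max(dim=-1).values           # strongest match per filter
        return self.head(pooled)                # predicted readout, e.g. binding signal

model = TinyDNAModel()
x = torch.zeros(2, 4, 200)
x[:, 0, :] = 1.0                                # dummy all-A sequences
print(model(x).shape)                           # torch.Size([2, 1])

# "Reading" filter 0 as a motif-like weight matrix over A, C, G, T positions:
print(model.conv.weight[0].shape)               # torch.Size([4, 10])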
Title: (Invited Speaker) Evo: Long-context modeling from molecular to genome scale
Presenter: Brian Hie
Bio
https://brianhie.com
Abstract
The genome is a sequence that completely encodes the DNA, RNA, and proteins that orchestrate the function of a whole organism. Advances in machine learning combined with massive datasets of whole genomes could enable a biological foundation model that accelerates the mechanistic understanding and generative design of complex molecular interactions. We report Evo, a genomic foundation model that enables prediction and generation tasks from the molecular to genome scale. Using an architecture based on advances in deep signal processing, we scale Evo to 7 billion parameters with a context length of 131 kilobases (kb) at single-nucleotide, byte resolution. Trained on whole prokaryotic genomes, Evo can generalize across the three fundamental modalities of the central dogma of molecular biology to perform zero-shot function prediction that is competitive with, or outperforms, leading domain-specific language models. Evo also excels at multielement generation tasks, which we demonstrate by generating synthetic CRISPR-Cas molecular complexes and entire transposable systems for the first time. Using information learned over whole genomes, Evo can also predict gene essentiality at nucleotide resolution and can generate coding-rich sequences up to 650 kb in length, orders of magnitude longer than previous methods. Advances in multi-modal and multi-scale learning with Evo provide a promising path toward improving our understanding and control of biology across multiple levels of complexity.
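The phrase "single-nucleotide, byte resolution" means each base is its own token, so a 131 kb context corresponds to roughly 131,000 tokens. A trivial sketch of such a tokenization follows; the mapping is illustrative, not Evo's actual tokenizer.

# Trivial sketch of single-nucleotide (byte-level) tokenization: each base maps
# to one token, so a 131 kb context corresponds to ~131,000 tokens.
# The mapping below is illustrative, not Evo's actual tokenizer.
seq = "ATGCGTACGTTAG"
tokens = list(seq.encode("ascii"))   # one byte (token) per nucleotide
print(tokens[:8], len(tokens))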
Time | Title | Presenter
09:00 - 09:15 | Opening Remarks |
09:15 - 09:50 | (Invited Speaker) Functional Causal Bayesian Optimization and DiscoGen for Learning Optimal Interventions and Inferring Gene Regulatory Networks | Silvia Chiappa
09:50 - 10:00 | Coffee Break |
10:00 - 10:35 | (Invited Speaker) Leveraging (natural) language models for biology | James Y Zou
10:40 - 11:00 | (Oral Paper) DNA-DIFFUSION: Leveraging generative models for controlling chromatin accessibility and gene expression via synthetic regulatory elements | Luca Pinello
11:05 - 11:25 | (Oral Paper) Dirichlet flow matching with applications to DNA sequence design |