DM2 Lab

Meng Jiang

Hi! I am an Associate Professor and Frank M. Freimann Collegiate Professor of Computer Science and Engineering at the University of Notre Dame. I'm appointed as a Lucy Family Institute Fellow as well as the Program Chair of ND-IBM Tech Ethics Lab. I am also an Amazon Scholar. My research fields are AI and Data Science. I'm interested in text and graph data for applications such as material discovery, recommender system, question answering, education, and mental health. My recent projects focus on knowledge-augmented NLP, instructed LLM, self-correct LLM, personalized LLM, unlearned LLM, graph data augmentation, and graph diffusion model. [C.V.]

I am directing the Foundation Models and Applications Lab (FAML) at Lucy Institute. By harnessing cutting-edge foundation models, AI systems can rapidly adapt to diverse tasks, from accelerating material discovery and combating climate change to transforming education into a more engaging, personalized experience.

I am also directing the Data Mining towards Decision Making (DM2) Lab, supported by National Science Foundation (NSF), National Institutes of Health (NIH), and Office of Naval Research (ONR).

The DM2 Lab at Notre Dame CSE is recruiting two PhD students to begin in Spring or Fall 2026. Our research focuses on AI for Material Discovery, leveraging graph neural networks and multimodal LLMs. We work closely with experts in mechanical and chemical engineering on material property measurement and novel material synthesis. To apply, please visit this link. Experience in Molecular Dynamics and its software, Graph Neural Network design, development, and evaluation, and/or Multimodal Large Language Model training and testing will be preferred. Feel free to reach out to me (mjiang2 [at] nd.edu) if you are interested! -- Some selected research activities/outcomes:

Open Polymer Challenge: Leveraging Machine Learning for Polymer Informatics was accepted to NeurIPS 2025 Competition Track and is now LAUNCHed on Kaggle! It is co-organized by University of Notre Dame, University of Wisconsin-Madison, and Kaggle. JOIN US AND WIN $50,000 Awards! YES, FOUR "0"s - it's $50,000! Soooo what are YOU waiting for???

Tool for molecular discovery: torch-molecule is a package that facilitates molecular discovery through deep learning, featuring a user-friendly, sklearn-style interface. It includes model checkpoints for efficient deployment and benchmarking across a range of molecular tasks, including predictive models, generative models, and representation models.

Modeling Polymers with Neural Networks is for polymer scientists that are interested in applying machine learning and neural networks. It is designed for college students and published by American Chemical Society in July 2025.

Deep Learning for Polymer Discovery: Foundation and Advances is for scientists that are interested in advanceing deep learning and data science for polymer informatics. It is designed for graduate-level research and published by Springer in June 2025. (eBook on Google Play)

Graph data augmentation: for rationale supervised learning (KDD'22), semi-supervised learning (KDD'23), unsupervised learning (NeurIPS'23), self-supervised learning (LoG'24); supported by NSF IIS-2146761;
Generative models: graph diffusion transformers (NeurIPS'24), heterogeneous molecular representations (ICLR'25), multimodal LLMs for inverse molecular design (ICLR'25); supported by IBM PhD Fellowship and NSF IIS-2146761;
Novel polymer materials: two superior polymeric gas separation membrane materials (Cell Reports Physical Science), a comprehensive review on AI for gas separation material design (Chemical Physics Reviews), transfer learning for predicting thermal conductivity (Materials Today Physics) and electron properties (Science Advances); supported by NSF CBET-2332270.

The FAML Lab at Lucy Family Institute will be looking for one postdoctoral research associate to join in Spring or Fall 2026 and be co-advised by me and Prof. Xiangliang Zhang. The research topic is Foundation Models and Applications, emphasizing interdisciplinary collaborations. Stay tuned for the job post. Drop me an e-mail (mjiang2 [at] nd.edu) if you are interested!

What's New

August 2025: "LLM Function Calling" (led by Hy Dang) and "Zipf's Law in Tokenization" (led by iSURE student Yanjin He) were accepted to EMNLP!
June 2025: Zhihan Zhang and Lingbo Tong have successfully passed their dissertation defense. Congratulations, Dr. Zhang and Dr. Tong!
May 2025: NeurIPS Open Polymer Challenge is launched on Kaggle! JOIN US AND WIN $50,000!
May 2025: DM2 students are graduating: Zhihan Zhang will join Amazon Rufus as a scientist in June. Lingbo Tong will join the School of Education in the University of Wisconsin-Madison as an assistant professor in August. Qingkai Zeng will join the School of Computer Science in Nankai University as an assistant professor in 2026. And, Gang Liu will be on the academic job market!
May 2025: Leopard (Text-Rich Multi-Image Vision-Language Model) was accepted to TMLR!
May 2025: Eight papers were accepted to ACL Main and one paper was accepted to ACL Findings!
April 2025: Midwest Speech and Language Days (MSLD) was very successful -- over 130 registrations, 75 presentations, and 4 keynote speakers!
April 2025: MIT News covered Llamole -- Gang's ICLR work on molecular multimodal LLMs!
March 2025: Three new benchmarks Multimodal Unlearning (led by Frank), Instruction-following Hierarchy (led by Zhihan), and MultiChartQA were accepted to NAACL!
February 2025: MLLM for Molecular Design and Learning Molecular Representations in a Cell (both led by Gang) were accepted to ICLR!
January 2025: Gang Liu received the 2024-2025 IBM PhD Fellowship for his work on Foundation Models. Congratulations!
December 2024: Qingkai Zeng successfully defended his dissertation Improving Scientific Information Extraction with Text Generation. Congratulations, Dr. Zeng!

Latest Publications

Pre-trained Models Perform the Best When Token Distributions Follow Zipf's Law, EMNLP, 2025.
Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates, EMNLP, 2025.
Leopard: A Vision Language Model for Text-Rich Multi-Image Tasks, TMLR, 2025.
CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts, Findings of ACL, 2025.
QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation, ACL, 2025.
Optimizing Decomposition for Optimal Claim Verification, ACL, 2025.
Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models, ACL, 2025.
Disentangling Biased Knowledge from Reasoning in Large Language Models via Machine Unlearning, ACL, 2025.
Aligning Large Language Models with Implicit Preferences from User-Generated Content, ACL, 2025.
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction, ACL, 2025.
UniConv: Unifying Retrieval and Response Generation for Large Language Model in Conversation, ACL, 2025.
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench, NAACL, 2025.
IHEval: Evaluating Language Models on Following the Instruction Hierarchy, NAACL, 2025.
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems, NAACL, 2025.
Benchmarking Language Model Creativity: A Case Study on Code Generation, NAACL, 2025.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning, ICLR, 2025.
Learning Molecular Representation in a Cell, ICLR, 2025.
Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks, IUI, 2025.
Learning Attribute as Explicit Relation for Sequential Recommendation, KDD, 2025.
Motif-aware Attribute Masking for Molecular Graph Pre-training, LoG, 2024.
Graph Diffusion Transformer for Multi-Conditional Molecular Generation, NeurIPS, 2024.
Large Language Models Can Self-Correct with Key Condition Verification, EMNLP, 2024.
Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts, EMNLP, 2024.
Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning, EMNLP, 2024.
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning, EMNLP, 2024.
Reference-based Metrics Disprove Themselves in Question Generation, Findings of EMNLP, 2024.
TOWER: Tree Organized Weighting for Evaluating Complex Instructions, Findings of EMNLP, 2024.

Advised PhD Dissertations

Last updated on August 21, 2025.