I'm an Associate Professor in the Department of Computer Science and Engineering (CSE) at the University of Notre Dame. I am appointed as a Lucy Family Institute Fellow and am also an Amazon Scholar. My research fields are data mining, machine learning, and natural language processing. My data science research focuses on graph and text data for applications such as material discovery, recommender systems, question answering, education, and mental health. My recent projects focus on knowledge-augmented NLP, instruction-following LLMs, self-correcting LLMs, personalized LLMs, LLM machine unlearning, graph neural networks, graph data augmentation, and graph diffusion models. [C.V.]
I am co-directing the Foundation Models and Applications Lab (FAML). By harnessing cutting-edge foundation models, our AI systems can rapidly adapt to diverse tasks, from accelerating material discovery and combating climate change to transforming education into a more engaging, personalized experience.
I am also directing the Data Mining towards Decision Making (DM2) Lab, supported by the National Science Foundation (NSF), the National Institutes of Health (NIH), and the Office of Naval Research (ONR).
The DM2 Lab at Notre Dame CSE is recruiting two PhD students to begin in Spring or Fall 2026. Our research focuses on AI for Material Discovery, leveraging graph neural networks and multimodal LLMs, and we work closely with experts in mechanical and chemical engineering on material property measurement and novel material synthesis. To apply, please visit this link. Experience with molecular dynamics and its software; with graph neural network design, development, and evaluation; and/or with multimodal large language model training and testing is preferred. Feel free to reach out to me (mjiang2 [at] nd.edu) if you are interested! Some selected research outcomes of ours in this field:
- Graph data augmentation: for rationale supervised learning (KDD'22), semi-supervised learning (KDD'23), unsupervised learning (NeurIPS'23), self-supervised learning (LoG'24); supported by NSF IIS-2146761;
- Generative models: graph diffusion transformers (NeurIPS'24), heterogeneous molecular representations (ICLR'25), multimodal LLMs for inverse molecular design (ICLR'25); supported by IBM PhD Fellowship and NSF IIS-2146761;
- Novel polymer materials: two superior polymeric gas separation membrane materials (Cell Reports Physical Science), a comprehensive review on AI for gas separation material design (Chemical Physics Reviews), transfer learning for predicting thermal conductivity (Materials Today Physics) and electron properties (Science Advances); supported by NSF CBET-2332270;
- Tool for molecular discovery: torch-molecule is a package that facilitates molecular discovery through deep learning, featuring a user-friendly, sklearn-style interface. It includes model checkpoints for efficient deployment and benchmarking across a range of molecular tasks, covering predictive, generative, and representation models (see the sketch after this list).
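To give a flavor of that sklearn-style interface, here is a minimal usage sketch. The class name GNNMolecularPredictor and the exact constructor and fit/predict signatures are illustrative assumptions, not torch-molecule's confirmed API; please consult the package documentation for the actual entry points.

```python
# Minimal sketch of an sklearn-style molecular property prediction
# workflow. NOTE: `GNNMolecularPredictor` and the fit/predict
# signatures below are assumptions for illustration, not the
# confirmed torch-molecule API.
from torch_molecule import GNNMolecularPredictor  # assumed class name

# Toy data: SMILES strings paired with scalar property labels
train_smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
train_labels = [0.42, 1.37, 0.88, 0.51]

model = GNNMolecularPredictor()        # assumed default hyperparameters
model.fit(train_smiles, train_labels)  # sklearn-style training call
preds = model.predict(["CCOC", "c1ccncc1"])  # predict on unseen molecules
```

The appeal of an sklearn-style design is that generative and representation models can expose the same fit/predict-shaped surface, which keeps benchmarking uniform across the package's model families.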
The FAML Lab at the Lucy Family Institute will be looking for one postdoctoral research associate to join in Spring or Fall 2026, co-advised by me and Prof. Xiangliang Zhang. The research topic is Foundation Models and Applications, with an emphasis on interdisciplinary collaborations. Stay tuned for the job post. Drop me an e-mail (mjiang2 [at] nd.edu) if you are interested!
What's New
Latest Publications
- Leopard: A Vision Language Model for Text-Rich Multi-Image Tasks, TMLR, 2025.
- CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts, Findings of ACL, 2025.
- QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation, ACL, 2025.
- Optimizing Decomposition for Optimal Claim Verification, ACL, 2025.
- Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models, ACL, 2025.
- Disentangling Biased Knowledge from Reasoning in Large Language Models via Machine Unlearning, ACL, 2025.
- Aligning Large Language Models with Implicit Preferences from User-Generated Content, ACL, 2025.
- Enhancing Mathematical Reasoning in LLMs by Stepwise Correction, ACL, 2025.
- UniConv: Unifying Retrieval and Response Generation for Large Language Model in Conversation, ACL, 2025.
- Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench, NAACL, 2025.
- IHEval: Evaluating Language Models on Following the Instruction Hierarchy, NAACL, 2025.
- MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems, NAACL, 2025.
- Benchmarking Language Model Creativity: A Case Study on Code Generation, NAACL, 2025.
- Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning, ICLR, 2025.
- Learning Molecular Representation in a Cell, ICLR, 2025.
- Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks, IUI, 2025.
- Learning Attribute as Explicit Relation for Sequential Recommendation, KDD, 2025.
- Motif-aware Attribute Masking for Molecular Graph Pre-training, LoG, 2024.
- Graph Diffusion Transformer for Multi-Conditional Molecular Generation, NeurIPS, 2024.
- Large Language Models Can Self-Correct with Key Condition Verification, EMNLP, 2024.
- Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts, EMNLP, 2024.
- Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning, EMNLP, 2024.
- Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning, EMNLP, 2024.
- Reference-based Metrics Disprove Themselves in Question Generation, Findings of EMNLP, 2024.
- TOWER: Tree Organized Weighting for Evaluating Complex Instructions, Findings of EMNLP, 2024.
Advised PhD Dissertations
Last updated on June 10, 2025.