I'm an Associate Professor in the Department of Computer Science and Engineering at the University of Notre Dame. My research fields are data mining, machine learning, and natural language processing. My data science research focuses on graph and text data for applications such as materials discovery, recommender systems, question answering, education, and mental health. [C.V.]
My recent projects focus on knowledge-augmented NLP, automatic instruction generation for LLMs, self-correcting LLMs, harm unlearning in LLMs, graph neural networks, graph data augmentation, and graph diffusion transformers.
I direct the Data Mining towards Decision Making (DM2) Lab, supported by the National Science Foundation (NSF), the National Institutes of Health (NIH), the Office of Naval Research (ONR), Amazon, Snap, Condé Nast, and ND International.
What's New
Latest Publications
- Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples,
CIKM, 2024.
- PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning,
ACL, 2024.
- Towards Safer Large Language Models through Machine Unlearning,
Findings of ACL, 2024.
- Instructing Large Language Models to Identify and Ignore Irrelevant Conditions,
NAACL, 2024. [project]
- OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models,
Findings of NAACL, 2024. [project]
- Get an A in Math: Progressive Rectification Prompting,
AAAI, 2024. [project]
- Pre-training Language Models for Comparative Reasoning,
EMNLP, 2023.
- IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions,
EMNLP, 2023. (Outstanding Paper Award)
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models,
Findings of EMNLP, 2023.
- Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions,
EMNLP (from TACL), 2023.
- Data-Centric Learning from Unlabeled Graphs with Diffusion Model,
NeurIPS, 2023.
- Generate rather than Retrieve: Large Language Models are Strong Context Generators,
ICLR, 2023.
- Semi-Supervised Graph Imbalanced Regression,
KDD, 2023.
- Large Language Models are Built-in Autoregressive Search Engines,
Findings of ACL, 2023.
- A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods,
EACL, 2023.
- Rationalizing Graph Neural Networks with Data Augmentation,
TKDD, 2023.
- User Modeling in the Era of Large Language Models,
IEEE Data Engineering Bulletin, 2023.
- Graph Data Augmentation for Graph Machine Learning: A Survey,
IEEE Data Engineering Bulletin, 2023.
Advised PhD Dissertations
Talks and Abstracts
- Effective and Efficient Knowledge-Intensive NLP (2023)
[abstract]:
covering RACo (EMNLP 2022), GenRead (ICLR 2023), and EDMem (EMNLP 2022).
- Data Augmentation for Graph Regression (2023)
[abstract]:
covering GREA (KDD 2022), SGIR (KDD 2023), and DCT (NeurIPS 2023).
- Enhancing Language Generation with Knowledge Graphs (2022)
[abstract]:
covering FASum (NAACL 2021), MoKGE (ACL 2022), and EDMem (EMNLP 2022).
- Novel Methods that Learn to Augment Graph Data (2021)
[abstract]:
covering GAug (AAAI 2021), Eland (CIKM 2021), CFLP (ICML 2022), and GREA (KDD 2022).
- Structured Knowledge is Still Essential to Understand Sciences (2020)
[abstract]:
covering SciKG (KDD 2019), MIMO (EMNLP 2019), Tablepedia (WWW 2020), TCN (WWW 2021), and GenTaxo (KDD 2021).
- Graph Learning for Behavior Modeling (2020):
covering TUBE (KDD 2019), M2TUBE (TNNLS 2022), CalendarGNN (KDD 2020), CoEvoGNN (DLG 2020 Best Paper / TKDE 2021), GAL (CIKM 2021), and PamFul (TNNLS 2021), with applications to user profiling, recommendation, and fraud detection.
Last updated on September 7, 2024.