NSF III: Small: Intelligent Scientific Text Analytics with Knowledge-Augmented Abductive Reasoning

Project Description (NSF IIS-2234058)

Scientists produce vast numbers of research articles and patents every year to advance our understanding of our world and the universe. Meanwhile, they invest considerable effort in building tools to boost their productivity. While these tools can process scientific text, they lack the intelligence to think or write like scientists, which limits how much they can help with scientific work. Natural language generation systems can produce new statements that are fluent and hard to distinguish from human-written text. However, existing systems are not as intelligent or reliable as human research assistants because they lack the ability to reason about scientific innovation. This project aims to enable comparative reasoning, which is missing in existing systems, in a novel intelligent system for scientific text analytics. Comparative reasoning establishes the importance of something by comparing it against something else. It plays a central role in scientific innovation and can be categorized as abductive reasoning in the context of artificial intelligence. This project will design and develop novel text generation approaches for scientific abductive reasoning and intelligent scientific text analytics. Moreover, this research will support the professional development of a cohort of PhD, undergraduate, and high school students.

The technical aims of the project are divided into three thrusts. The first thrust develops and compares natural language generation models based on two architectures: a data-driven architecture and a novel architecture inspired by and rooted in theories of abduction. These models will be evaluated on the tasks of comparative summarization and comparative argument generation in scientific domains. The second thrust designs retrieval-augmented approaches that draw on heterogeneous knowledge sources such as tables, taxonomies, and knowledge graphs to improve the performance of scientific abductive reasoning models. Because retrieving and encoding every instance can be very time-consuming, the third thrust builds knowledge memory networks that learn and manage distributed representations of scientific concepts and relations from the knowledge sources. These networks will accelerate retrieval augmentation when all types of scientific source data are large in scale. Finally, these techniques will be integrated into a new artificial intelligence system that accurately generates explanatory sentences to automate comparative reasoning and assist scientific innovation.
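To make the second and third thrusts concrete, here is a minimal sketch of retrieval augmentation backed by a memory of precomputed representations. It is an illustration under stated assumptions, not the project's models: the bag-of-words `encode` function stands in for a learned encoder, and the names `KnowledgeMemory` and `build_augmented_input`, the `[SEP]` separator, and the two knowledge entries are all hypothetical.

```python
import math
from collections import Counter


def encode(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnowledgeMemory:
    """Stores one representation per knowledge entry, computed once on insert,
    so retrieval does not re-encode the knowledge sources for every query."""

    def __init__(self):
        self.entries = []      # list of (text, vector) pairs
        self.encode_calls = 0  # counts how often an entry was encoded

    def add(self, text):
        self.entries.append((text, encode(text)))
        self.encode_calls += 1

    def retrieve(self, query, k=1):
        """Return the k stored entries most similar to the query."""
        q = encode(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_augmented_input(query, memory, k=1):
    """Concatenate the query with retrieved entries, as a generator's input."""
    return " [SEP] ".join([query] + memory.retrieve(query, k))


# Made-up entries for illustration only.
memory = KnowledgeMemory()
memory.add("method A uses a graph encoder")
memory.add("method B uses a table encoder")

query = "compare graph encoder methods"
augmented = build_augmented_input(query, memory, k=1)
print(augmented)
```

In this sketch each knowledge entry is encoded once when it is added, and later queries reuse the stored vectors. That is the kind of saving the memory networks are meant to provide at much larger scale, with learned representations in place of word counts.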

We are grateful to NSF for the support that makes this project possible!

Faculty

Meng Jiang

Research Assistants

Mengxia Yu

Zhihan Zhang

Wenhao Yu

Publications

Optimizing Decomposition for Optimal Claim Verification. Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
Aligning Large Language Models with Implicit Preferences from User-Generated Content. Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Scientific Comparative Argument Generation. Third Document Intelligence Workshop (DI) at the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2022.
A Unified Encoder-Decoder Framework with Entity Memory. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Diversifying Content Generation for Commonsense Reasoning with Mixture of Knowledge Graph Experts. Findings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022.