NSF III: Small: Intelligent Scientific Text Analytics with Knowledge-Augmented Abductive Reasoning

Project Description (NSF IIS-2234058)

Scientists produce vast numbers of research articles and patents every year to advance our understanding of our world and the universe. Meanwhile, they are making a great effort to build tools that boost their productivity. While these tools can process scientific text, they lack the intelligence to think or write like scientists and so offer limited help with scientific work. Natural language generation systems can produce new statements that read fluently and are hard to distinguish from human-written text. However, existing systems are not as intelligent or reliable as human research assistants because they lack the ability to reason about scientific innovation. This project aims to enable comparative reasoning, which is missing in existing systems, in a novel intelligent system for scientific text analytics. Comparative reasoning establishes the importance of something by comparing it against something else. It plays a central role in scientific innovation and can be categorized as abductive reasoning in the context of artificial intelligence. This project will design and develop novel text generation approaches for scientific abductive reasoning and intelligent scientific text analytics. Moreover, this research will support the professional development of a cohort of PhD, undergraduate, and high school students.

The technical aims of the project are divided into three thrusts. The first thrust develops and compares natural language generation models based on a data-driven architecture and a novel architecture inspired by and rooted in theories of abduction. These models will be evaluated on the tasks of comparative summarization and comparative argument generation in scientific domains. The second thrust designs retrieval-augmented approaches with heterogeneous knowledge sources such as tables, taxonomies, and knowledge graphs to improve the performance of scientific abductive reasoning models. Because retrieving and encoding every instance can be very time-consuming, the third thrust builds knowledge memory networks that learn and manage distributed representations of scientific concepts and relations from the knowledge sources. These networks will accelerate retrieval augmentation when all types of scientific source data are large in scale. Finally, these techniques will be integrated into a new artificial intelligence system that accurately generates explanatory sentences to automate comparative reasoning and assist scientific innovation.

We are grateful for NSF support to make this project possible!


Research Assistants