Scientific Text Mining and Knowledge Graphs

Tutorial at the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 23-27, 2020

San Diego, California, USA (virtual, via Zoom)



Meng Jiang is an Assistant Professor in the Department of Computer Science and Engineering at the University of Notre Dame. His research interests include data mining, machine learning, and information extraction. He has published over 50 conference and journal papers on these topics. His work was a KDD 2014 Best Paper Finalist. He has delivered seven tutorials at conferences such as KDD, SIGMOD, WWW, CIKM, ICDM, and SDM. He is a recipient of the Notre Dame Global Gateway Faculty Award.


Jingbo Shang is an Assistant Professor at UC San Diego, jointly appointed by the Computer Science and Engineering (CSE) Department and the Halıcıoğlu Data Science Institute (HDSI). His research focuses on mining and constructing structured knowledge from massive text corpora with minimal human effort. His research has been recognized with multiple prestigious awards, including the Grand Prize of the Yelp Dataset Challenge in 2015 and the Google PhD Fellowship in Structured Data and Database Management in 2017.


Unstructured scientific text, in various forms of textual artifacts including manuscripts, publications, patents, and proposals, stores the tremendous wealth of knowledge that researchers accumulate over weeks, months, and years of developing hypotheses, working in the lab or clinic, and analyzing results. A grand challenge in data mining research is to develop effective methods for transforming scientific text into well-structured forms (e.g., ontologies, taxonomies, knowledge graphs), so that intelligent machine systems can build on them for hypothesis generation and validation. In this tutorial, we provide a comprehensive overview of recent research and development in this direction. First, we introduce a series of text mining methods that extract phrases, entities, scientific concepts, relations, claims, and experimental evidence. Then we discuss methods that construct and learn from scientific knowledge graphs for accurate search, document classification, and exploratory analysis. Throughout, we focus on scalable, effective, weakly supervised methods that work on text from scientific domains (e.g., chemistry, biology).
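As a rough illustration of the "well-structured form" mentioned in the abstract, the following is a minimal sketch, assuming facts have already been extracted from text as (head, relation, tail) triples. It indexes them as an adjacency map and enumerates two-hop paths, the simplest kind of traversal that exploratory analysis over a knowledge graph might start from. The triples and the function names `build_graph`, `neighbors`, and `paths_of_length_two` are hypothetical and not taken from the tutorial; a real pipeline would obtain triples from the extraction methods surveyed here and would typically use a dedicated graph store.

```python
from collections import defaultdict

# Hypothetical toy triples, standing in for the output of a relation-extraction
# step: (head entity, relation, tail entity). Illustrative only.
TRIPLES = [
    ("aspirin", "inhibits", "COX-1"),
    ("aspirin", "inhibits", "COX-2"),
    ("COX-2", "involved_in", "inflammation"),
]


def build_graph(triples):
    """Index triples as an adjacency map: head -> list of (relation, tail)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def neighbors(graph, entity, relation=None):
    """Return tail entities linked from `entity`, optionally filtered by relation."""
    return [t for r, t in graph.get(entity, []) if relation is None or r == relation]


def paths_of_length_two(graph, start):
    """Enumerate two-hop paths start -r1-> mid -r2-> end."""
    paths = []
    for r1, mid in graph.get(start, []):
        for r2, end in graph.get(mid, []):
            paths.append((start, r1, mid, r2, end))
    return paths


if __name__ == "__main__":
    kg = build_graph(TRIPLES)
    print(neighbors(kg, "aspirin", "inhibits"))  # ['COX-1', 'COX-2']
    print(paths_of_length_two(kg, "aspirin"))
    # [('aspirin', 'inhibits', 'COX-2', 'involved_in', 'inflammation')]
```

A plain adjacency map keeps the sketch dependency-free; the two-hop path it finds links an entity to a concept that no single extracted triple connects directly, which is the intuition behind using knowledge graphs for hypothesis generation.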

Tutors' Previous Related Tutorials