NSF III: Small: Comprehensive Methods to Learn to Augment Graph Data

Project Description (NSF IIS-2146761)

Machine learning algorithms learn from data. The quality and quantity of training data have as much to do with the success of machine learning projects as the algorithms themselves. Data augmentation methods aim to increase the amount of data, on the premise that training examples do not have to be the raw data. They have become progressively more successful in image classification: from ad hoc augmentation such as cropping, flipping, and rotation to learning-to-augment methods. Graph machine learning (GML) methods such as graph neural networks (GNNs) play an important role in studying many types of graphs, including social networks, molecular networks, and knowledge graphs. The computational graph in a GNN does not have to be the same as the raw graph, and an optimal computational graph can improve GML performance. Ad hoc augmentation methods have been applied and have achieved state-of-the-art performance. For example, the computational graph could connect all 2-hop neighbors in the raw graph so that representations would be aggregated directly rather than through two layers; it could also add a virtual node that connects to all nodes in the graph so that none of them would be isolated. However, these computational graphs are far from optimal, and learning-to-augment methods are missing for graph data and GML. This project will explore the possibility of building computational graphs by augmenting raw graph data to improve the performance of graph learning methods.
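The two ad hoc augmentations mentioned above (connecting 2-hop neighbors and adding a virtual node) can be illustrated with a minimal sketch. This is not the project's method; it only shows how a computational graph can differ from the raw graph. The adjacency-set representation, the function names `add_two_hop_edges` and `add_virtual_node`, and the virtual node id `-1` are assumptions made for this example, and the graph is assumed to be undirected with every node present as a key.

```python
from typing import Dict, Set

# Assumed representation: undirected graph as node -> set of neighbor nodes.
Graph = Dict[int, Set[int]]


def add_two_hop_edges(graph: Graph) -> Graph:
    """Return a copy of `graph` with an added edge between all 2-hop neighbors."""
    augmented = {node: set(nbrs) for node, nbrs in graph.items()}
    for node, nbrs in graph.items():
        for nbr in nbrs:
            for two_hop in graph[nbr]:
                if two_hop != node:  # skip walking back to the start node
                    augmented[node].add(two_hop)
                    augmented[two_hop].add(node)
    return augmented


def add_virtual_node(graph: Graph, virtual_id: int = -1) -> Graph:
    """Return a copy of `graph` with one extra node linked to every original node."""
    if virtual_id in graph:
        raise ValueError("virtual_id collides with an existing node")
    augmented = {node: set(nbrs) | {virtual_id} for node, nbrs in graph.items()}
    augmented[virtual_id] = set(graph)  # the virtual node neighbors all nodes
    return augmented


# Raw graph: a path 0 - 1 - 2 plus an isolated node 3.
raw = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
two_hop = add_two_hop_edges(raw)      # nodes 0 and 2 become direct neighbors
with_virtual = add_virtual_node(raw)  # node 3 is no longer isolated
```

In this toy graph, a single aggregation step over `two_hop` lets node 0 receive information from node 2, which would otherwise take two layers; in `with_virtual`, the isolated node 3 gains a neighbor. Both rules are fixed by hand, which is the limitation that learning-to-augment methods aim to address.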

The technical aims of the project are divided into three thrusts. The first thrust develops novel graph machine learning techniques that augment the graph data by counterfactual inference on the effect of edges as treatment variables. The second thrust develops novel graph machine learning techniques that augment the graph by forecasting when sequential, temporal, or dynamic patterns exist in the data. The third thrust develops novel graph machine learning techniques that augment the graph by pseudo-labeling and by disconnecting nodes that belong to very different communities or clusters. This project will deliver novel methods that learn to augment graph data by integrating theories and methods from a variety of research fields, such as statistical causal analysis, sequence modeling and prediction, and graph mining algorithms. It will advance the technologies of graph machine learning and expand the scope of data augmentation for deep learning.

We are grateful for the NSF support that makes this project possible!

Faculty

Research Assistants

Publications