NSF III: Small: Comprehensive Methods to Learn to Augment Graph Data

Project Description (NSF IIS-2146761)

Machine learning algorithms learn from data. The quality and quantity of training data have as much to do with the success of machine learning projects as the algorithms themselves. Data augmentation methods aim to increase the amount of data, on the assumption that training examples do not have to be the raw data. They have become progressively more successful in image classification, moving from ad hoc augmentation such as cropping, flipping, and rotation to learning-to-augment methods. Graph machine learning (GML) methods such as graph neural networks (GNNs) play an important role in studying many types of graphs, including social networks, molecular networks, and knowledge graphs. The computational graph in a GNN does not have to be the same as the raw graph, and a well-chosen computational graph can improve GML performance. Ad hoc augmentation methods have been applied and have achieved state-of-the-art performance. For example, the computational graph could connect all 2-hop neighbors in the raw graph so that representations are aggregated directly rather than through two layers; it could also add a virtual node that connects to all nodes in the graph so that none of them is isolated. However, these computational graphs are far from optimal, and learning-to-augment methods are missing for graph data and GML. This project will explore the possibility of building computational graphs, by augmenting raw graph data, to improve the performance of graph learning methods.
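The two ad hoc augmentations mentioned above can be illustrated with a minimal sketch in plain Python. It assumes an undirected graph stored as a dictionary mapping each node to a set of neighbors; the function names, the `"VIRTUAL"` node label, and the toy graph are hypothetical and chosen only for illustration. They are not the project's methods, which aim to learn the augmentation rather than apply fixed rules.

```python
def add_two_hop_edges(adj):
    """Return a copy of the graph in which every node is also linked
    to its 2-hop neighbors (neighbors of its neighbors)."""
    augmented = {node: set(neighbors) for node, neighbors in adj.items()}
    for node, neighbors in adj.items():
        for mid in neighbors:
            for two_hop in adj[mid]:
                if two_hop != node:  # skip the trivial path back to the start
                    augmented[node].add(two_hop)
    return augmented


def add_virtual_node(adj, virtual="VIRTUAL"):
    """Return a copy of the graph with one extra node linked to every
    original node, so that no original node remains isolated."""
    augmented = {node: set(neighbors) | {virtual} for node, neighbors in adj.items()}
    augmented[virtual] = set(adj)  # the virtual node neighbors all original nodes
    return augmented


# Toy raw graph (hypothetical): a path a-b-c plus an isolated node d.
raw_graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": set(),
}

two_hop_graph = add_two_hop_edges(raw_graph)
virtual_graph = add_virtual_node(raw_graph)
```

In the toy graph, the 2-hop augmentation links `a` and `c` directly, while the virtual node gives the isolated node `d` a neighbor. Both functions leave the raw graph unchanged and return a separate computational graph.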

The technical aims of the project are divided into three thrusts. The first thrust develops novel graph machine learning techniques to augment the graph data by counterfactual inference on the effect of edges as treatment variables. The second thrust develops novel graph machine learning techniques to augment the graph by forecasting when sequential, temporal, or dynamic patterns exist in the data. The third thrust develops novel graph machine learning techniques to augment the graph by pseudo-labeling and by disconnecting nodes that belong to very different communities or clusters. This project will deliver novel methods that learn to augment graph data by integrating theories and methods from a variety of research fields such as statistical causal analysis, sequence modeling and prediction, and graph mining algorithms. It will advance the technologies of graph machine learning and expand the scope of data augmentation for deep learning.

We are grateful for NSF support to make this project possible!

Faculty

Meng Jiang

Research Assistants

Tong Zhao

Gang Liu

Eric Inae

Weike Fang: REU

Jackson Ballow: REU

Publications

Motif-aware Attribute Masking for Molecular Graph Pre-training. Learning on Graphs Conference (LoG), 2024.
Graph Diffusion Transformer for Multi-Conditional Molecular Generation. Conference on Neural Information Processing Systems (NeurIPS), 2024. (Oral)
Data-Centric Learning from Unlabeled Graphs with Diffusion Model. Conference on Neural Information Processing Systems (NeurIPS), 2023.
Semi-Supervised Graph Imbalanced Regression. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2023.
Leveraging Low-Fidelity Data to Improve Machine Learning of Sparse High-Fidelity Thermal Conductivity Data via Transfer Learning. Materials Today Physics.
A Synergistic Approach for Graph Anomaly Detection with Pattern Mining and Feature Learning by T. Zhao, T. Jiang, N. Shah, M. Jiang. IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
AutoGDA: Automated Graph Data Augmentation for Node Classification. Learning on Graphs Conference (LoG), 2022.
Graph Rationalization with Environment-based Augmentations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2022.
Learning from Counterfactual Graph for Link Prediction. International Conference on Machine Learning (ICML), 2022.
Action Sequence Augmentation for Early Graph-based Anomaly Detection. ACM International Conference on Information and Knowledge Management (CIKM), 2021.
Data Augmentation for Graph Neural Networks. AAAI Conference on Artificial Intelligence (AAAI), 2021.