Data-Driven Approaches towards Malicious Behavior Modeling

Tutorial at the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Sunday, August 13, 2017

Halifax, Nova Scotia, Canada

Tutors

Meng Jiang is a Postdoctoral Research Associate in the Computer Science Department at the University of Illinois at Urbana-Champaign. He will join the Computer Science and Engineering Department at the University of Notre Dame as an Assistant Professor in Fall 2017. His research interests focus on data-driven behavioral analytics for prediction, recommendation, and suspicious behavior detection. He received his Ph.D. and a Dissertation Award in 2015 and his B.E. in 2010 from Tsinghua University, China. He visited CMU in 2013. He has published over 20 refereed articles and 2 book chapters. He was a KDD 2014 Best Paper Finalist. More details can be found at http://www.meng-jiang.com/.

Srijan Kumar is a Postdoctoral Researcher in the Computer Science Department at Stanford University. He obtained his Ph.D. in 2017 from the Computer Science Department at the University of Maryland, College Park, USA. His research focuses on malicious user and information detection. This work has formed major parts of his tutorials at the ASONAM 2016 and WWW 2017 conferences. He is a WorldQuant PhD Fellow and has been awarded the Dr. Bidhan Chandra Roy Gold Medal and the UMD Outstanding Graduate Student Dean's Fellowship. He completed his undergraduate education at the Indian Institute of Technology (IIT), Kharagpur, India. More details can be found at http://cs.umd.edu/~srijan/.

VS Subrahmanian is a Professor in the Computer Science Department, Director of the Lab for Computational Cultural Dynamics, and Director of the Center for Digital International Government at the University of Maryland, College Park. His work stands squarely at the intersection of big data analytics for increased security, policy, and business needs. He has published over 280 peer-reviewed papers, including papers on detecting bots on Twitter, detecting trolls on Slashdot, and detecting vandals on Wikipedia. He led the team that won DARPA's Twitter Bot Challenge in early 2015. He currently serves on the boards of numerous journals, including Science, ACM Transactions on Intelligent Systems & Technology, ACM Transactions on Computational Logic, and IEEE Transactions on Computational Social Systems. He also currently serves on the Research Advisory Board of Tata Consultancy Services and on the Board of Directors of the Development Gateway, Sentimetrix, Inc., and CosmosId. More details can be found at http://cs.umd.edu/~vs/.

Christos Faloutsos is a Professor in the Department of Computer Science and the Department of Electrical and Computer Engineering at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, the SIGKDD Innovations Award (2010), 24 "best paper" awards (including 5 "test of time" awards), and four teaching awards. Six of his advisees have received KDD or SCS dissertation awards. He is an ACM Fellow. He has served as a member of the executive committee of SIGKDD and has published over 350 refereed articles, 17 book chapters, and two monographs. He holds seven patents (with 2 pending) and has given over 40 tutorials and over 20 invited distinguished lectures. His research interests include large-scale data mining with emphasis on graphs and time sequences, anomaly detection, tensors, and fractals. More details can be found at http://www.cs.cmu.edu/~christos/.

Abstract

The safety, reliability, and usability of web platforms are often compromised by malicious entities, such as vandals on Wikipedia, bot connections on Twitter, fake likes on Facebook, and many others. Computational models developed with large-scale real-world behavioral data have shown significant progress in identifying these malicious entities. This tutorial discusses three broad directions of state-of-the-art data-driven methods to model malicious behavior: (i) feature-based algorithms, in which distinguishing behavioral features are proposed to predict malicious users; (ii) spectral-based algorithms, which have been widely used in settings of directed graphs, undirected graphs, and bipartite graphs such as "who-follows-whom" Twitter data and "who-likes-what" Facebook data; and (iii) density-based algorithms, which efficiently look for suspicious, highly dense components in multi-dimensional behavioral data. This tutorial will introduce the details of general algorithms from these three classes that can be applied to any platform and dataset.
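
As a rough illustration of the density-based direction (iii), the sketch below applies generic greedy peeling to a toy "who-likes-what" bipartite graph: it repeatedly removes the lowest-degree node and keeps the node set with the highest average degree seen along the way. This is not one of the specific algorithms covered in the tutorial; the function name densest_block, the toy edge list, and the choice of edges-per-node as the density measure are all assumptions made for illustration.

```python
from collections import defaultdict


def densest_block(edges):
    """Greedy peeling on a bipartite (user, page) edge list.

    Repeatedly removes the node with the smallest degree and returns the
    node set with the highest density (edges / nodes) seen during peeling.
    Hypothetical helper for illustration only.
    """
    adj = defaultdict(set)
    for user, page in edges:
        u, p = ("user", user), ("page", page)
        adj[u].add(p)
        adj[p].add(u)

    nodes = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density = n_edges / len(nodes) if nodes else 0.0
    best_nodes = set(nodes)

    while len(nodes) > 1:
        # Peel the node that contributes the fewest edges.
        victim = min(nodes, key=lambda n: len(adj[n]))
        n_edges -= len(adj[victim])
        for nbr in adj[victim]:
            adj[nbr].discard(victim)
        del adj[victim]
        nodes.remove(victim)

        density = n_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)

    return best_nodes, best_density


# Toy data (assumed): three accounts that all like the same three pages,
# plus a few ordinary users with a single like each.
toy_edges = [(f, p) for f in ("f1", "f2", "f3") for p in ("a", "b", "c")]
toy_edges += [("u1", "x"), ("u2", "y"), ("u3", "a")]

block, density = densest_block(toy_edges)
print(sorted(block), density)
```

On this toy input the peeling strips away the single-like users first, so the returned block is the fully connected group of three accounts and three pages. Real detectors in this family typically use more careful suspiciousness measures than plain average degree.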

Tutors' Previous Related Tutorials

Srijan Kumar, Justin Cheng, and Jure Leskovec. "Antisocial behavior on the Web: characterization and detection", International World Wide Web Conference (WWW), 2017.
Meng Jiang, Peng Cui, and Jiawei Han. "Data-driven behavioral analytics: observations, representations and models", ACM International Conference on Information and Knowledge Management (CIKM), 2016.
Srijan Kumar, Francesca Spezzano and VS Subrahmanian. "Identifying malicious actors on social media", International Conference on Advances in Social Network Analysis and Mining (ASONAM), 2016.
Meng Jiang and Peng Cui. "Behavioral modeling in social networks: from micro to macro", IEEE International Conference on Data Mining (ICDM), 2015.
Alex Beutel, Leman Akoglu, and Christos Faloutsos. "Graph-based user behavior modeling: from prediction to fraud detection", ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2015.

Schedule: Detailed Description [pdf]

Time	Topic (Preliminary Slides)
2:00-2:15	Introduction
2:15-3:30	Feature-based methods: Bots, sockpuppets, vandals, hoaxes
3:30-4:00	Break
4:00-4:30	Spectral-based methods: Visualization, camouflage
4:30-5:15	Density-based methods: Ill-gotten likes, synchronized behaviors, social spam, advertising campaigns
5:15-5:30	Conclusions