iDev: Enhancing Social Coding Security by Cross-platform User Identification Between GitHub and Stack Overflow

Yujie Fan, Yiming Zhang, Shifu Hou, Lingwei Chen, Yanfang Ye, Chuan Shi, Liang Zhao, Shouhuai Xu

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 2272-2278. https://doi.org/10.24963/ijcai.2019/315

As modern social coding platforms such as GitHub and Stack Overflow become increasingly popular, their potential security risks increase as well (e.g., risky or malicious code can be easily embedded and distributed). To enhance social coding security, in this paper we propose to automate cross-platform user identification between GitHub and Stack Overflow to combat attackers who attempt to poison the modern software programming ecosystem. An important insight of this work is to leverage social coding properties in addition to user attributes for cross-platform user identification. We first introduce an attributed heterogeneous information network (AHIN) to model users in GitHub and Stack Overflow (with their attribute information), projects, questions, and answers, as well as the rich semantic relations among them. We then propose a novel AHIN representation learning model, AHIN2Vec, to efficiently learn node (i.e., user) representations in the AHIN for cross-platform user identification. Comprehensive experiments on data collected from GitHub and Stack Overflow validate the effectiveness of our system iDev, which integrates the proposed method, in cross-platform user identification compared with other baselines.
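
To make the AHIN idea concrete, the following is a minimal illustrative sketch of an attributed heterogeneous graph with typed nodes and typed relations. It is not the authors' implementation and does not reproduce AHIN2Vec; the class name `AHIN`, the helper `follow`, the node identifiers, and the relation names (`owns`, `posts`, `references`) are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AHIN:
    """Toy attributed heterogeneous information network (hypothetical sketch)."""
    node_type: dict = field(default_factory=dict)    # node id -> type label
    node_attrs: dict = field(default_factory=dict)   # node id -> attribute dict
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, ntype, **attrs):
        self.node_type[node_id] = ntype
        self.node_attrs[node_id] = attrs

    def add_edge(self, src, relation, dst):
        # Store the relation and its inverse so paths can be walked both ways.
        self.edges[(src, relation)].add(dst)
        self.edges[(dst, relation + "_inv")].add(src)

    def neighbors(self, node_id, relation):
        return self.edges.get((node_id, relation), set())


def follow(graph, start, relations):
    """Walk a fixed sequence of relation types starting from one node."""
    frontier = {start}
    for rel in relations:
        frontier = {n for node in frontier for n in graph.neighbors(node, rel)}
    return frontier


# Hypothetical data: one GitHub user, one Stack Overflow user, and the
# project and answer that connect them.
g = AHIN()
g.add_node("gh:alice", "github_user", username="alice")
g.add_node("so:alice_dev", "so_user", display_name="alice_dev")
g.add_node("proj:1", "project", name="demo")
g.add_node("ans:1", "answer", text="see demo")
g.add_edge("gh:alice", "owns", "proj:1")
g.add_edge("so:alice_dev", "posts", "ans:1")
g.add_edge("ans:1", "references", "proj:1")

# Stack Overflow users reachable from gh:alice via
# project <- answer <- user (an assumed relation sequence).
candidates = follow(g, "gh:alice", ["owns", "references_inv", "posts_inv"])
print(candidates)
```

In this sketch, a relation sequence only surfaces candidate account pairs; a representation learning model such as the one proposed in the paper would additionally use node attributes and learned embeddings to decide whether two accounts belong to the same person.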
Keywords:
Machine Learning: Data Mining
Multidisciplinary Topics and Applications: Security and Privacy