Out-of-sample Node Representation Learning for Heterogeneous Graph in Real-time Android Malware Detection

Out-of-sample Node Representation Learning for Heterogeneous Graph in Real-time Android Malware Detection

Yanfang Ye, Shifu Hou, Lingwei Chen, Jingwei Lei, Wenqiang Wan, Jiabin Wang, Qi Xiong, Fudong Shao

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 4150-4156. https://doi.org/10.24963/ijcai.2019/576

The increasingly sophisticated Android malware calls for new defensive techniques that are capable of protecting mobile users against novel threats. In this paper, we first extract the runtime Application Programming Interface (API) call sequences from Android apps, and then analyze higher-level semantic relations within the ecosystem to comprehensively characterize the apps. To model different types of entities (i.e., app, API, device, signature, affiliation) and rich relations among them, we present a structured heterogeneous graph (HG) for modeling. To efficiently classify nodes (e.g., apps) in the constructed HG, we propose the HG-Learning method to first obtain in-sample node embeddings and then learn representations of out-of-sample nodes without rerunning/adjusting HG embeddings at the first attempt. We later design a deep neural network classifier taking the learned HG representations as inputs for real-time Android malware detection. Comprehensive experiments on large-scale and real sample collections from Tencent Security Lab are performed to compare various baselines. Promising results demonstrate that our developed system AiDroid which integrates our proposed method outperforms others in real-time Android malware detection.
Keywords:
Machine Learning: Classification
Multidisciplinary Topics and Applications: Security and Privacy