Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code / 1606
Xuan Huo, Ming Li, Zhi-Hua Zhou
Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source code according to a bug report remains a great challenge in software maintenance. Many previous studies treated the source code as natural language by representing both the bug report and source code based on bag-of-words feature representations, and correlate the bug report and source code by measuring similarity in the same lexical feature space. However, these approaches fail to consider the structure information of source code which carries additional semantics beyond the lexical terms. Such information is important in modeling program functionality. In this paper, we propose a novel convolutional neural network NP-CNN, which leverages both lexical and program structure information to learn unified features from natural language and source code in programming language for automatically locating the potential buggy source code according to bug report. Experimental results on widely-used software projects indicate that NP-CNN significantly outperforms the state-of-the-art methods in locating the buggy source files.