Towards Generating Summaries for Lexically Confusing Code through Code Erosion

Fan Yan, Ming Li

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 3721-3727. https://doi.org/10.24963/ijcai.2021/512

Code summarization aims to describe code functionality in high-level natural language to assist code comprehension. Recent approaches in this field mainly focus on generating summaries for code with precise identifier names, in which meaningful words indicating code functionality can be found. When faced with lexically confusing code, current approaches are likely to fail because the correlation between lexical code tokens and summaries is weak. To tackle this problem, we propose a novel summarization framework named VECOS. VECOS introduces an erosion mechanism to overcome the model's reliance on precisely defined lexical information. To facilitate learning the eroded code's functionality, we force the representation of the eroded code to align with the representation of its original counterpart via variational inference. Experimental results show that our approach outperforms state-of-the-art approaches and generates coherent and reliable summaries for various kinds of lexically confusing code.
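
To make the notion of lexically confusing code concrete, the following minimal sketch shows one way identifier names could be stripped of meaning while the program's behavior is preserved. It is an illustration only: the abstract does not specify how VECOS implements erosion, and the function name `erode_identifiers` and the placeholder scheme `v0, v1, ...` are assumptions made for this example.

```python
import builtins
import io
import keyword
import tokenize


def erode_identifiers(source: str) -> str:
    """Replace each user-defined identifier with an opaque placeholder.

    Hypothetical illustration, not the VECOS erosion mechanism:
    keywords and built-in names are kept, every other name becomes
    v0, v1, ... in order of first appearance.
    """
    mapping = {}
    eroded_tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        is_user_name = (
            tok.type == tokenize.NAME
            and not keyword.iskeyword(text)
            and not hasattr(builtins, text)
        )
        if is_user_name:
            # Reuse the same placeholder for repeated occurrences.
            text = mapping.setdefault(text, f"v{len(mapping)}")
        eroded_tokens.append((tok.type, text))
    # With 2-tuples, untokenize regenerates spacing itself, so the
    # output is valid code but not formatted like the input.
    return tokenize.untokenize(eroded_tokens)


original = "def add_numbers(first, second):\n    return first + second\n"
eroded = erode_identifiers(original)
print(eroded)
```

The eroded function computes the same result as the original, but its tokens no longer hint at what it does, which is the situation in which a summarizer that relies on identifier names would be expected to struggle.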
Keywords:
Multidisciplinary Topics and Applications: Knowledge-based Software Engineering
Data Mining: Mining Codebase and Software Repository