Supporting Readability by Comprehending the Hierarchical Abstraction of a Software Project

Section 1: Publication

Publication Type

Authorship

Bhattacharjee A, Roy B, and Schneider KA

Title

Supporting Readability by Comprehending the Hierarchical Abstraction of a Software Project

Year

2022

Publication Outlet

ACM 15th Innovation in Software Engineering Conference (ISEC 2022), Article 13, pp. 1-10, DA-IICT Gandhinagar, February 2022

DOI

https://doi.org/10.1145/3511430.3511441

ISBN

ISSN

Citation

Bhattacharjee A, Roy B, and Schneider KA, Supporting Readability by Comprehending the Hierarchical Abstraction of a Software Project, ACM 15th Innovation in Software Engineering Conference (ISEC 2022), Article 13, pp. 1-10, DA-IICT Gandhinagar, February 2022.

Abstract

Exploring the source code of a software system is a common task for contributors to a system. Practitioners often use call graphs to aid in understanding the source code of an inadequately documented software system. Call graphs, when visualized, show caller and callee relationships between functions. A static call graph provides an overall structure of a software system, and dynamic call graphs generated from execution logs can be used to trace program behaviour for a particular scenario. Unfortunately, a call graph of an entire system can be very complicated and hard to understand. Hierarchically abstracting a call graph can summarize an entire system’s structure and make function calls easier to comprehend. In this work, we mine concepts from source code entities (functions) to generate a concept cluster tree with improved naming of cluster nodes to complement existing studies and facilitate more effective program comprehension for developers. We apply three different information retrieval techniques (TFIDF, LDA, and LSI) on function names and function name variants to label the nodes of a concept cluster tree generated by clustering execution paths. In our experiment comparing automatic labelling with manual labelling by participants for 12 use cases, we found that, on average, TFIDF performed best among the techniques, with 64% matching. LDA and LSI had 37% and 23% matching respectively. In addition, using the words in function name variants performed at least 5% better in participant ratings for all three techniques on average for the use cases.
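To make the labelling idea concrete, the following is a minimal sketch of how TFIDF could rank words drawn from function names to label a cluster. It is not the authors' implementation. The helper names (`split_identifier`, `label_clusters`), the sample clusters, the treatment of each cluster as one document, and the smoothed IDF formula are all assumptions chosen for illustration; the abstract does not specify the paper's preprocessing, clustering, or weighting.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a function name into lowercase words (snake_case and camelCase).

    Hypothetical helper: the paper's own tokenization is not described here.
    """
    words = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words if w]


def label_clusters(clusters, top_k=2):
    """Return the top_k highest-scoring TFIDF words for each cluster.

    Assumption: each cluster (a list of function names) is one document.
    """
    docs = {
        cid: Counter(w for fn in fns for w in split_identifier(fn))
        for cid, fns in clusters.items()
    }
    n_docs = len(docs)

    # Document frequency: in how many clusters does each word occur?
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())

    labels = {}
    for cid, counts in docs.items():
        total = sum(counts.values())
        # Term frequency times a smoothed IDF (an assumed variant).
        scores = {
            w: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[cid] = ranked[:top_k]
    return labels


# Hypothetical clusters of function names, e.g. from grouped execution paths.
clusters = {
    "c1": ["load_image", "resizeImage", "save_image"],
    "c2": ["parse_config", "loadConfig", "validate_config"],
}
labels = label_clusters(clusters)
print(labels)
```

In this toy input, the word "load" occurs in both clusters and is therefore down-weighted, while "image" and "config" are frequent within one cluster only and rank first as labels. This is the general intuition behind using TFIDF to name a cluster node.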

Plain Language Summary