Supporting Readability by Comprehending the Hierarchical Abstraction of a Software Project
Section 1: Publication
Publication Type
Journal Article
Authorship
Bhattacharjee A, Roy B, and Schneider KA
Title
Supporting Readability by Comprehending the Hierarchical Abstraction of a Software Project
Year
2022
Publication Outlet
ACM 15th Innovation in Software Engineering Conference (ISEC 2022), Article 13, pp. 1-10, DA-IICT Gandhinagar, February
DOI
ISBN
ISSN
Citation
Bhattacharjee A, Roy B, and Schneider KA, Supporting Readability by Comprehending the Hierarchical Abstraction of a Software Project, ACM 15th Innovation in Software Engineering Conference (ISEC 2022), Article 13, pp. 1-10, DA-IICT Gandhinagar, February 2022.
Abstract
Exploring the source code of a software system is a task that contributors to the system perform frequently. Practitioners often use call graphs to aid in understanding the source code of an inadequately documented software system. Call graphs, when visualized, show caller and callee relationships between functions. A static call graph provides the overall structure of a software system, while dynamic call graphs generated from execution logs can be used to trace program behaviour for a particular scenario. Unfortunately, a call graph of an entire system can be very complicated and hard to understand. Hierarchically abstracting a call graph can summarize an entire system’s structure and make function calls easier to comprehend. In this work, we mine concepts from source code entities (functions) to generate a concept cluster tree with improved naming of cluster nodes, to complement existing studies and facilitate more effective program comprehension for developers. We apply three different information retrieval techniques (TFIDF, LDA, and LSI) to function names and function name variants to label the nodes of a concept cluster tree generated by clustering execution paths. In our experiment comparing automatic labelling with manual labelling by participants for 12 use cases, we found that, on average, TFIDF performed best with 64% matching, while LDA and LSI had 37% and 23% matching, respectively. In addition, using the words in function name variants performed at least 5% better in participant ratings for all three techniques, on average across the use cases.
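As an illustration of the TFIDF labelling idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the clusters of functions are already given (the paper derives them by clustering execution paths), treats each cluster as one document, and picks the highest-scoring words from the function names as the label. The cluster ids, function names, and the helpers `split_identifier` and `label_clusters` are all hypothetical, and function name variants are not modelled.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case function name into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def label_clusters(clusters, top_k=2):
    """Return the top_k TF-IDF words for each cluster of function names."""
    # Each cluster is one "document" made of the words in its function names.
    docs = {
        cid: Counter(w for fn in fns for w in split_identifier(fn))
        for cid, fns in clusters.items()
    }
    n_docs = len(docs)
    # Document frequency: number of clusters in which each word appears.
    df = Counter(w for counts in docs.values() for w in counts)

    labels = {}
    for cid, counts in docs.items():
        total = sum(counts.values())
        # Plain TF-IDF: relative term frequency times log inverse document frequency.
        scores = {
            w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[cid] = ranked[:top_k]
    return labels


# Hypothetical clusters, for illustration only.
clusters = {
    "c1": ["loadUserProfile", "saveUserProfile", "validate_user"],
    "c2": ["renderChart", "renderTable", "update_chart_axis"],
    "c3": ["parseConfig", "loadConfig", "validate_config"],
}

for cluster_id, words in label_clusters(clusters).items():
    print(cluster_id, words)
```

Words shared by several clusters (here `load` and `validate`) receive a low inverse document frequency, so the labels favour words that distinguish one cluster from the others.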
Plain Language Summary
Section 2: Additional Information
Program Affiliations
Project Affiliations
Submitters
Publication Stage
Published
Theme
Presentation Format
Additional Information
Computer Science Core Team, Refereed Publications