HCPC: Human centric program comprehension by grouping static execution scenarios, June

Section 1: Publication

Publication Type

Authorship

Bhattacharjee Avijit

Title

HCPC: Human centric program comprehension by grouping static execution scenarios, June

Year

2021

Publication Outlet

Co-supervisors: BRoy and Schneider

DOI

https://hdl.handle.net/10388/13522

ISBN

ISSN

Citation

Bhattacharjee Avijit, HCPC: Human centric program comprehension by grouping static execution scenarios, June 2021. Co-supervisors: BRoy and Schneider

Abstract

New members of a software team can struggle to locate user requirements if proper software engineering principles are not practiced. Reading through code, finding relevant methods, classes and files take a significant portion of software development time. Many times developers have to fix issues in code written by others. Having a good tool support for this code browsing activity can reduce human effort and increase overall developers' productivity. To help program comprehension activities, building an abstract code summary of a software system from the call graph is an active research area. A call graph is a visual representation of caller-callee relationships between different methods of a software project. Call graphs can be difficult to comprehend for a larger code-base. The motivation is to extract the essence from the call graph by finding execution scenarios from a call graph and then cluster them together by concentrating the information in the code-base. Later, different techniques are applied to label nodes in the abstract code summary tree. In this thesis, we focus on static call graphs for creating an abstract code summary tree as it clusters all possible program scenarios and groups similar scenarios together. Previous work on static call graph clusters execution paths and uses only one information retrieval technique without any feedback from developers. First, to advance existing work, we introduced new information retrieval techniques alongside human-involved evaluation. We found that developers prefer node labels generated by terms in method names with TFIDF (term frequency-inverse document frequency). Second, from our observation, we introduced two new types of information (text description using comments and execution patterns) for abstraction nodes to provide better overview. Finally, we introduced an interactive software tool which can be used to browse the code-base in a guided way by targeting specific units of the source code. In the user study, we found developers can use our tool to overview a project alongside finding help for doing particular jobs such as locating relevant files and understanding relevant domain knowledge.

Plain Language Summary