WaterNet for Canada
Publication Type
Thesis
Authorship
Nafi, K. W.
Title
Exploring Cross-Language Software Similarity Analysis Using Source Code Context
Year
2026
Publication Outlet
Department of Computer Science, University of Saskatchewan
Handle
https://hdl.handle.net/10388/18057
Citation
Nafi, K. W. (2026). Exploring Cross-Language Software Similarity Analysis Using Source Code Context [Thesis]. Department of Computer Science, University of Saskatchewan. https://hdl.handle.net/10388/18057
Abstract
The rapid growth of multi-language and cross-platform software development has created an urgent need for effective techniques that identify functional similarity across programming languages. Developers routinely reuse or reimplement functionally similar code across cross-language and multilingual software systems. This practice results in intentional and unintentional cross-language similar code fragments, as well as in the adaptation of APIs and libraries that serve similar purposes but are implemented in different languages. While such adaptations can improve portability and extend software to a broader range of users, they also increase development cost and maintenance complexity and raise the risk of inconsistencies or defects. Despite recent advances in machine learning, code representation learning, and Large Language Models (LLMs), existing approaches to cross-language software similarity often struggle with deep syntactic reasoning and diverse coding styles, and they are constrained by the limited availability of high-quality multilingual code datasets. This thesis is grounded in the premise that accurate detection of cross-language code similarity can significantly mitigate longstanding challenges in cross-language software development and maintenance. Motivated by this premise, the thesis investigates the foundational problem of establishing reliable and robust cross-language code similarity measures, with the aim of supporting a wide range of software engineering tasks, including single-language, cross-language, and multi-language development and maintenance activities. Drawing on a comprehensive systematic literature review, the thesis identifies key limitations in the state of the art and proposes five complementary contributions spanning four levels of code granularity. First, it introduces CroLSim, a universal software similarity detector that categorizes cross-language software applications by leveraging the similarity of their API call documentation.
Second, it presents CLCDSA, a cross-language clone detection model that is driven by source code features and adapted to API documentation; it combines syntactic features with the semantic similarity of API documentation to identify cross-language clones more accurately. Third, it develops XLCoCo, an LLM-guided multimodal framework that fuses multi-intent source code information retrieved from LLMs with attention-based variational autoencoders (VAEs) to predict structural feature similarity, improving performance on cross-language code-to-code search and clone detection. Fourth, it proposes XLibRec, a technique for recommending analogical cross-language libraries by mining reliable library usage information from multiple developer community discussion forums, together with short library descriptions collected from various package managers. Finally, it introduces XAPIRec, an efficient method for analogical API mapping based on automatically collected, functionally equivalent API usage patterns and LLM-driven API documentation similarity. XAPIRec removes the need to manually mine functionally similar parallel code fragments and requires neither prior knowledge of true API mappings nor a labeled API mapping dataset. Together, these contributions form a scalable ecosystem that advances automation, accuracy, and practical applicability in industrial-scale cross-language software development and maintenance. The techniques are extensively evaluated against state-of-the-art baselines across diverse datasets and programming languages, demonstrating consistent improvements in precision, recall, ranking quality, and real-world usability. Overall, this thesis offers a unified framework that supports developers and organizations in building, understanding, and maintaining robust cross-language software systems at scale.
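Documentation similarity recurs in CroLSim, CLCDSA, and XAPIRec, and the basic idea can be illustrated with a deliberately simple stand-in. The Python sketch below is not the thesis's method. It assumes plain TF-IDF vectors and cosine similarity over API documentation strings (XAPIRec, for example, uses LLM-driven documentation similarity instead). The API identifiers and documentation strings in `API_DOCS` are hypothetical toy data, and the helper names (`tfidf_vectors`, `rank_candidates`) are illustrative. The sketch ranks candidate target-language APIs for a source-language API by how similar their documentation text is.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: API identifiers mapped to short documentation strings.
API_DOCS = {
    "java.util.ArrayList.add": "Appends the specified element to the end of this list.",
    "python.list.append": "Append an element to the end of the list.",
    "python.str.split": "Return a list of the words in the string, using sep as the delimiter string.",
    "python.dict.get": "Return the value for key if key is in the dictionary, else default.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each documentation string."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        # Length-normalized term frequency times a smoothed IDF.
        vectors[name] = {
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    source_api: str, candidates: list[str], docs: dict[str, str]
) -> list[tuple[str, float]]:
    """Rank candidate APIs by documentation similarity to source_api, highest first."""
    vectors = tfidf_vectors(docs)
    scores = [(name, cosine(vectors[source_api], vectors[name])) for name in candidates]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_candidates(
        "java.util.ArrayList.add",
        ["python.list.append", "python.str.split", "python.dict.get"],
        API_DOCS,
    )
    for name, score in ranking:
        print(f"{name}: {score:.3f}")
```

On this toy data, `python.list.append` ranks first because its documentation shares the most weighted terms (such as "element", "end", and "list") with the Java `add` description. A purely lexical score like this is only one signal; the abstract describes combining documentation similarity with syntactic features (CLCDSA) or with mined usage patterns (XAPIRec).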
Program Affiliations
GWF: Global Water Futures
GWFO: Global Water Futures Observatories
Project Affiliations
GWF-MWF: Mountain Water Futures
Publication Stage
Published
Download Links
https://harvest.usask.ca/bitstreams/749ebc2b-923f-416f-850a-5da1760d8d03/download
© 2026 - WaterNet Version 2026-06-10
Global Water Futures Observatories
Powered by GWF Net