Publication Type
Book Chapter
Authorship
Debasish Chakroborti, Banani Roy, Amit Kumar Mondal, Golam Mostaeen, Ralph Deters, Chanchal K. Roy and Kevin A. Schneider.
Title
A Data Management Scheme for Micro-Level Modular Computation-intensive Programs in Big Data Platforms
Year
2019
Publication Outlet
In: Moshirpour M., Far B., Alhajj R. (eds) Highlighting the Importance of Big Data Management and Analysis for Various Applications, 20pp., vol x. Springer (to appear with minor revisions).
DOI
https://link.springer.com/chapter/10.1007/978-3-030-32587-9_9
ISBN
978-3-030-32586-2
Citation
Debasish Chakroborti, Banani Roy, Amit Kumar Mondal, Golam Mostaeen, Ralph Deters, Chanchal K. Roy and Kevin A. Schneider. A Data Management Scheme for Micro-Level Modular Computation-intensive Programs in Big Data Platforms, In: Moshirpour M., Far B., Alhajj R. (eds) Highlighting the Importance of Big Data Management and Analysis for Various Applications, 20pp., vol x. Springer (to appear with minor revisions). Book Chapter
Abstract
Big Data analytics systems built on parallel distributed processing frameworks (e.g., Hadoop and Spark) are becoming popular for extracting insights from large amounts of heterogeneous data (e.g., image, text, and sensor data). These systems offer a wide range of tools and connect them to form workflows for processing Big Data. Many researchers have already proposed independent schemes for managing the programs and data of workflows, and most of these systems include data or metadata management. However, to the best of our knowledge, no study specifically discusses the performance implications of using the intermediate states of data and programs generated at the various execution steps of a workflow on distributed platforms. To address this shortcoming, we propose a Big Data management scheme for micro-level modular computation-intensive programs on a Spark- and Hadoop-based platform. In this paper, we investigate whether managing the intermediate states can speed up the execution of an image processing pipeline, consisting of various image processing tools/APIs, on the Hadoop Distributed File System (HDFS) while ensuring appropriate reusability and error monitoring. Our experiments yielded notable results; for example, with intermediate data management, we can save up to 87% of the computation time for an image processing job.
Program Affiliations
GWF: Global Water Futures
Publication Stage
N/A
Download Links
https://link.springer.com/chapter/10.1007/978-3-030-32587-9_9