MyETL: a flexible and efficient data quality profiling framework
1 August 2022
Nianfeng Weng, Jianjun Cao, Guoquan Jiang
Proceedings Volume 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022); 122570S (2022) https://doi.org/10.1117/12.2640107
Event: 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 2022, Guangzhou, China
Abstract
Data quality is critical in data-centric environments. The data quality profiling task is to filter out data records that violate domain semantic rules. A data quality profiling framework is expected to be flexible enough to represent complex domain semantics and efficient enough to handle large volumes of data records. We propose a data quality profiling framework, named MyETL, which is derived from the ETL paradigm to fulfil these requirements. In the design phase, a directed acyclic graph is employed to represent domain semantic rules. The graph is then optimized by a topology optimization procedure. Finally, the graph is mapped to threads and memory objects and scheduled for execution. Because it is implemented on the OSGi framework, MyETL is composed of bundles and can be conveniently extended.
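The idea of representing rules as a directed acyclic graph can be illustrated with a minimal Python sketch. This is not MyETL's implementation: the rule names, the `depends_on` mapping, and the `profile` function are hypothetical, and the sketch omits the topology optimization step and the mapping to threads and memory objects. It only shows rules being evaluated in dependency order, with violating records separated from clean ones.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical domain rules: each returns True when the record is valid.
rules = {
    "has_age": lambda r: "age" in r,
    "age_is_int": lambda r: isinstance(r["age"], int),
    "age_in_range": lambda r: 0 <= r["age"] <= 150,
}

# Hypothetical rule DAG: each rule maps to the rules that must pass before it.
depends_on = {
    "has_age": set(),
    "age_is_int": {"has_age"},
    "age_in_range": {"age_is_int"},
}


def profile(records, rules, depends_on):
    """Split records into clean ones and (record, first_failed_rule) pairs."""
    # A topological order guarantees prerequisites are checked first.
    order = list(TopologicalSorter(depends_on).static_order())
    clean, violations = [], []
    for record in records:
        # The generator stops at the first failing rule, so later rules
        # never run on a record that failed one of their prerequisites.
        failed = next((name for name in order if not rules[name](record)), None)
        if failed is None:
            clean.append(record)
        else:
            violations.append((record, failed))
    return clean, violations


records = [{"age": 30}, {"age": "30"}, {"name": "x"}, {"age": 200}]
clean, violations = profile(records, rules, depends_on)
```

Evaluating in topological order means a rule such as `age_in_range` can assume its prerequisites already hold, which keeps each individual rule small.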
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Nianfeng Weng, Jianjun Cao, and Guoquan Jiang "MyETL: a flexible and efficient data quality profiling framework", Proc. SPIE 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 122570S (1 August 2022); https://doi.org/10.1117/12.2640107
KEYWORDS
Profiling, Optimization (mathematics), Databases, Data processing