HAQ, MUHAMMAD RAIHAN IZHARUL (2025) INTEGRASI TEKNOLOGI OCR DENGAN PROSES ETL UNTUK OTOMATISASI DATA PIPELINE ARSIP PENCAIRAN KEUANGAN BPS KABUPATEN SUKABUMI. Other thesis, Nusa Putra University.
Abstract
In the digital era, managing archival data poses challenges for many institutions, including the Central Statistics Agency (BPS) of Sukabumi Regency, especially when the archives consist of unstructured PDF documents. This study develops a data pipeline that integrates Optical Character Recognition (OCR) technology with the Extract, Transform, Load (ETL) process. Data were automatically extracted from unstructured SPM (Surat Perintah Membayar, payment order) and SP2D (Surat Perintah Pencairan Dana, fund disbursement order) documents with high accuracy: an average of 98.52% for SPM, using a combination of OCR and PDFPlumber, and 100% for SP2D, using PDFPlumber alone. The extraction results were stored in a data warehouse, transformed using Apache Spark, and loaded into data marts. The ETL process was automated using Apache Airflow, which executed the tasks reliably in dependency order. The processed data were presented in real time through an interactive Looker Studio dashboard, supporting efficient archive management and better-informed decision-making. Beyond solving the existing archive-management problems, this study opens opportunities for further applications of big data technologies and business process automation in the public sector.
Keywords: Data Warehouse, Optical Character Recognition (OCR), Extract Transform Load (ETL), Automated Pipeline, Financial Disbursement
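The dependency-ordered execution described in the abstract can be illustrated with a minimal sketch. It does not use Apache Airflow; instead it uses Python's standard-library `graphlib.TopologicalSorter` as a stand-in for a DAG scheduler. The task names (`extract_spm`, `extract_sp2d`, `load_warehouse`, `transform_spark`, `load_datamart`) are hypothetical labels for the stages named in the abstract, not the thesis's actual DAG, and each task is a placeholder callable rather than real OCR, PDFPlumber, or Spark code.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names for the pipeline described in the abstract.
# Each key maps a task to the set of upstream tasks it depends on.
PIPELINE = {
    "extract_spm": set(),                                # OCR + PDFPlumber on SPM PDFs
    "extract_sp2d": set(),                               # PDFPlumber on SP2D PDFs
    "load_warehouse": {"extract_spm", "extract_sp2d"},   # store raw extraction results
    "transform_spark": {"load_warehouse"},               # Spark transformation step
    "load_datamart": {"transform_spark"},                # publish to data marts
}


def run_pipeline(tasks, graph):
    """Run each task only after all of its upstream tasks have finished.

    `tasks` maps a task name to a zero-argument callable. Returns the
    names in execution order. If a task raises, the exception propagates
    and no downstream task runs. A cyclic graph raises graphlib.CycleError.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    executed = []
    while sorter.is_active():
        # Sort the ready batch so tasks with no mutual dependency run in a
        # deterministic order in this sequential sketch.
        for name in sorted(sorter.get_ready()):
            tasks[name]()
            executed.append(name)
            sorter.done(name)
    return executed


if __name__ == "__main__":
    # Placeholder tasks that only record that they ran.
    log = []
    demo_tasks = {name: (lambda n=name: log.append(n)) for name in PIPELINE}
    print(run_pipeline(demo_tasks, PIPELINE))
    # prints ['extract_sp2d', 'extract_spm', 'load_warehouse', 'transform_spark', 'load_datamart']
```

A real Airflow deployment would express the same edges with operators and `>>` dependencies and would run independent extraction tasks in parallel; the sketch only shows the ordering guarantee that downstream stages never start before their inputs exist.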
| Item Type: | Thesis (Other) |
|---|---|
| Subjects: | T Technology > Computer Science > Informatic Engineering |
| Divisions: | Faculty of Engineering, Computer and Design > Informatic Engineering |
| Depositing User: | Unnamed user with email liu@nusaputra.ac.id |
| Date Deposited: | 24 Jul 2025 09:37 |
| Last Modified: | 24 Jul 2025 09:37 |
| URI: | http://repository.nusaputra.ac.id/id/eprint/1529 |
