The goal of this research is to bring the digitized text closer to the original newspaper articles by applying post-correction. Post-correction improves the quality of digitized text by manipulating the textual output of the OCR process directly. The idea is that better-quality data boosts eHumanities research. Although the KB newspaper data would benefit from improvements to the OCR process itself (better image recognition), post-correction will still be necessary, because historical newspapers are poor input for OCR (for example, due to poor paper and print quality).
https://www.esciencecenter.nl/project/deep-learning-ocr-post-correction
Ongoing
2018
The Netherlands eScience Center
Netherlands — Academia
https://www.esciencecenter.nl
National Library of the Netherlands
ITU, Place des Nations, 1211 Geneva 20, Switzerland