Main Article Content
Administrative activity in an institution still relies largely on paper-based mail and documents. Managing and archiving these documents therefore takes considerable effort, requiring both physical storage space and a categorization system. Digitizing documents by scanning them into digital images is one way to reduce the effort of archiving and categorizing them. Digitized documents can also be searched through metadata that is entered manually during the digitization process. The metadata can contain the document title, a summary, or a category. The need to enter this metadata manually can be removed by using Optical Character Recognition (OCR), which converts the text in a document image into machine-readable text stored in a database system. This research focuses on implementing an OCR system that extracts text from scanned document images and on optimizing one of its pre-processing stages, image thresholding. The aim of the optimization is to increase OCR accuracy by tuning the threshold value over a given set of candidate values, which yielded 0.6 as the best threshold value. The experiment was performed by running text extraction on several scanned documents and achieved an accuracy rate of 92.568%.
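The threshold tuning described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes grayscale pixels in the range 0 to 255 normalized to [0, 1], a global threshold, and a character-level accuracy based on edit distance; the abstract does not state which accuracy metric or OCR engine was used. The names `binarize`, `char_accuracy`, `best_threshold`, and the `ocr` callable are hypothetical and stand in for whatever the real system provides.

```python
def binarize(gray, threshold=0.6):
    """Global thresholding: pixels brighter than `threshold` (on a 0..1
    scale) become white (255), all others black (0).
    `gray` is a list of rows of 0..255 intensities (assumed layout)."""
    return [[255 if px / 255 > threshold else 0 for px in row] for row in gray]


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def char_accuracy(reference, recognized):
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1 - edit_distance(reference, recognized) / len(reference))


def best_threshold(gray, reference, ocr, candidates):
    """Try each candidate threshold and return (threshold, accuracy) for
    the one whose OCR output is closest to the reference text.
    `ocr` is a hypothetical callable: binarized image -> recognized string."""
    scored = [(char_accuracy(reference, ocr(binarize(gray, t))), t)
              for t in candidates]
    accuracy, threshold = max(scored)
    return threshold, accuracy
```

In practice the `ocr` argument would wrap a real OCR engine, and the accuracy would be averaged over several scanned documents with known ground-truth text rather than measured on a single image.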