What is Optical Character Recognition (OCR)?

Optical character recognition (OCR) is essentially the conversion of images of printed, typed or handwritten documents into machine-readable, encoded text. It can be applied to many kinds of content, including handwritten and typed letters and forms, photographs and even text shown on television, such as subtitles.

Original documents are scanned and digitised (or existing images of text are processed), and intelligent data capture then extracts the information. The result is a new, machine-readable version of the file that opens up a range of options that weren’t previously available.

The OCR Algorithm

OCR relies on two basic types of algorithm: matrix matching and feature extraction.

Matrix matching, also known as pattern recognition or image correlation, works best with typewritten text and was the technique used by early photocell-based OCR systems. It compares each character image with the glyphs (characters) stored in the system, one pixel at a time. For this to work, each input glyph must be cleanly separated from its neighbours, with ample spacing between characters, and must appear in a similar font and at the same scale as the stored glyph. This is why cursive script and individual handwriting are almost impossible for it to recognise.
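To make the pixel-by-pixel comparison concrete, here is a minimal Python sketch of matrix matching. The 5x3 bitmaps in `TEMPLATES` and the helper names `match_score` and `recognise` are hypothetical, and the sketch assumes each glyph has already been isolated and scaled to the template size, which is exactly the restriction described above.

```python
# Hypothetical 5x3 binary templates: "1" = ink, "0" = background.
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def to_grid(rows):
    """Convert a list of '0'/'1' strings into a 2-D list of ints."""
    return [[int(ch) for ch in row] for row in rows]


def match_score(glyph, template):
    """Return the fraction of pixels that agree between two same-sized bitmaps."""
    if len(glyph) != len(template) or any(
        len(g) != len(t) for g, t in zip(glyph, template)
    ):
        # Matrix matching only works if the glyph lines up with the template.
        raise ValueError("glyph must be isolated and scaled to the template size")
    total = sum(len(row) for row in glyph)
    agree = sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )
    return agree / total


def recognise(glyph_rows):
    """Compare the glyph with every stored template and return the best match."""
    glyph = to_grid(glyph_rows)
    scores = {ch: match_score(glyph, to_grid(rows)) for ch, rows in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A smudged "L" whose bottom-right pixel is missing.
print(recognise(["100", "100", "100", "100", "110"]))
```

The smudged "L" still matches `L`, with 14 of its 15 pixels agreeing. A glyph drawn in a different size or font would not line up with any template at all, which is why this approach struggles with varied fonts and handwriting.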

Feature extraction works by breaking glyphs down into smaller features, such as closed loops, line intersections and line directions. These features are then compared with an abstract, vector-like representation of each character, and the closest match is chosen. Because it looks at these details rather than at exact pixels, it copes better with variations in font and size.

Commonly used to digitise handwritten surveys, forms and questionnaires, this approach is sometimes referred to as “intelligent character recognition” (ICR). It is a more advanced form of OCR that can make even hard-to-read handwriting machine-readable.
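The feature-extraction idea can be sketched in Python under simplifying assumptions. The three features used here (number of closed loops, share of ink in the right half, share of ink in the top half), the prototype glyphs and the function names are all hypothetical choices for illustration, not a standard feature set. Each glyph is reduced to a small feature vector and assigned to the nearest prototype.

```python
from math import dist


def to_grid(rows):
    """Convert a list of '0'/'1' strings into a 2-D list of ints."""
    return [[int(ch) for ch in row] for row in rows]


def count_loops(grid):
    """Count closed loops: background regions that never touch the image border."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    loops = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] or seen[r][c]:
                continue
            # Flood-fill one background region and note whether it reaches the border.
            stack = [(r, c)]
            seen[r][c] = True
            touches_border = False
            while stack:
                y, x = stack.pop()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if not touches_border:
                loops += 1
    return loops


def features(rows):
    """Reduce a glyph to (loop count, share of ink in right half, share of ink in top half)."""
    grid = to_grid(rows)
    h, w = len(grid), len(grid[0])
    ink = sum(map(sum, grid))
    if ink == 0:
        raise ValueError("empty glyph")
    right = sum(grid[r][c] for r in range(h) for c in range(w // 2, w))
    top = sum(grid[r][c] for r in range(h // 2) for c in range(w))
    return (count_loops(grid), right / ink, top / ink)


# Hypothetical 5x4 prototype glyphs, reduced once to their feature vectors.
PROTOTYPE_GLYPHS = {
    "O": ["1111", "1001", "1001", "1001", "1111"],
    "C": ["1111", "1000", "1000", "1000", "1111"],
    "L": ["1000", "1000", "1000", "1000", "1111"],
}
PROTOTYPES = {ch: features(rows) for ch, rows in PROTOTYPE_GLYPHS.items()}


def classify(rows):
    """Return the prototype character whose feature vector is nearest (Euclidean)."""
    f = features(rows)
    return min(PROTOTYPES, key=lambda ch: dist(f, PROTOTYPES[ch]))


# A larger 6x5 "O": no 5x4 prototype fits it pixel for pixel, but its features match.
print(classify(["01110", "10001", "10001", "10001", "10001", "01110"]))
```

Because the features are a loop count and ratios rather than raw pixels, the larger "O" is still classified as `O`, something the pixel-by-pixel approach could not do. The loop count is an integer, so in this toy distance it dominates the two ratios; a real system would weight or normalise its features.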

How Can OCR Be Used?

Many people and companies use OCR as part of their data entry processes. Businesses that receive large volumes of paper documents use it to convert everything from questionnaires and HR forms to invoices and books.

The data extraction step recognises characters and symbols and converts them into electronic text. The resulting digitised documents can be stored without bulky filing cabinets, and can be searched, edited, revised and displayed in a format that is much easier to read and share.

Finding specific files becomes much faster, as documents can be searched by a number of parameters and by more than one person at a time if need be. Documents are also much easier to share, as they can be placed in computer folders or emailed to the relevant people.

How Accurate is OCR?

Between 1992 and 1996, studies were carried out on late 19th- and early 20th-century newspapers to assess the level of accuracy that OCR can achieve. Accuracy of up to 99% was recorded, and 100% was reached after the text was manually reviewed. These studies are now decades old, and advances in technology have since improved precision further, although a manual review is still usually performed to catch any discrepancies.
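The studies above do not say exactly how accuracy was measured. One common definition is character accuracy: one minus the number of single-character edits (insertions, deletions and substitutions, i.e. the Levenshtein distance) needed to turn the OCR output into the correct text, divided by the length of the correct text. A minimal Python sketch of that definition, with hypothetical function names:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth, ocr_output):
    """1 - (edit distance / reference length), floored at 0 for very noisy output."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    errors = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1 - errors / len(ground_truth))


print(character_accuracy("a" * 100, "a" * 99 + "b"))
print(character_accuracy("The quick brown fox", "Tne quick brovvn fox"))
```

Under this definition, one wrong character in a 100-character passage gives 99% accuracy, while the three edits needed to fix "Tne" and "brovvn" in the 19-character sentence bring it down to about 84%.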