Find out more about how CACI Knowledge-Based Solutions can help you!

Knowledge-Based Solutions - Optical Character Recognition (OCR)

OCR converts the text on a scanned page into machine-readable character codes such as ASCII, so that pages can be found by searching for words and phrases in the document text. We also perform OCR on documents that are already in electronic image form (such as PDF files) and do not require scanning.

The CACI OCR pipeline is modular and flexible. We analyze client needs and test alternatives before recommending the optimal pre-processing and complement of OCR engines for a project. All OCR results are delivered in the format and on the media specified by the customer.

CACI's OCR pipeline undergoes continuous process improvement to increase automation, accuracy, capacity and speed. We employ multiple polling OCR engines and offer automated OCR correction. CACI uses state-of-the-art ScanSoft software to ensure the best possible product.
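
As a rough illustration of how polling several OCR engines can work, the sketch below takes a majority vote, word by word, across the text returned by each engine. It is not a description of CACI's pipeline: the function name `vote_tokens` and the sample engine outputs are hypothetical, and the sketch assumes every engine returned the same number of words, whereas a production system would first align the outputs.

```python
from collections import Counter
from itertools import zip_longest


def vote_tokens(engine_outputs):
    """Return the text formed by the most common word at each position.

    engine_outputs: list of strings, one per (hypothetical) OCR engine.
    Assumption: outputs are already aligned word for word. On a tie,
    the word from the earliest engine in the list wins.
    """
    token_lists = [text.split() for text in engine_outputs]
    voted = []
    for candidates in zip_longest(*token_lists, fillvalue=""):
        # Ignore padding from engines that returned fewer words.
        counts = Counter(c for c in candidates if c)
        if counts:
            voted.append(counts.most_common(1)[0][0])
    return " ".join(voted)


if __name__ == "__main__":
    outputs = [
        "The qu1ck brown fox",   # engine A misreads "i" as "1"
        "The quick brown f0x",   # engine B misreads "o" as "0"
        "The quick brovvn fox",  # engine C misreads "w" as "vv"
    ]
    print(vote_tokens(outputs))
```

Because each engine tends to make different mistakes, a word that two of the three engines agree on is usually the correct reading, which is why the vote can beat any single engine.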

CACI provides foreign character set recognition and can OCR over 100 languages (including mixed languages on a single page in some instances).

QC/Cleanup

  • Image cleanup to ensure the best initial OCR results
  • Specialized post-OCR software to correct common OCR errors and idiosyncrasies
  • Spell check (not merely against a dictionary, but also libraries of known OCR problems)
  • De-hyphenation
  • Special settings for OCR of columnar data
  • Random checking of OCR output for particular batches
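
Two of the cleanup steps above, de-hyphenation and correction against a library of known OCR problems, can be sketched in a few lines. This is a minimal example under stated assumptions, not CACI's software: the `KNOWN_OCR_FIXES` table and both function names are hypothetical, and a real library of OCR problems would be far larger and built from observed errors.

```python
import re

# Hypothetical table of known OCR confusions (lowercase keys).
KNOWN_OCR_FIXES = {"tbe": "the", "rnodel": "model", "0f": "of"}


def dehyphenate(text):
    """Join words split by a hyphen at a line break.

    Caveat: this also joins genuine hyphenated compounds that happen
    to break at the end of a line; a fuller version would check the
    joined word against a dictionary first.
    """
    return re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)


def fix_known_errors(text, fixes=KNOWN_OCR_FIXES):
    """Replace whole words found in the table of known OCR errors."""
    def replace(match):
        word = match.group(0)
        fixed = fixes.get(word.lower())
        if fixed is None:
            return word
        # Keep an initial capital if the misread word had one.
        return fixed.capitalize() if word[0].isupper() else fixed

    return re.sub(r"\w+", replace, text)


if __name__ == "__main__":
    raw = "Tbe recog-\nnition rnodel 0f text"
    print(fix_known_errors(dehyphenate(raw)))
```

Running de-hyphenation first matters: a word split across two lines cannot be matched against a dictionary or an error table until its halves are rejoined.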

Contact CACI
For more information, contact Veronica Hubbard, 703-642-4562, vhubbard@caci.com