An Extensible Optical Character Recognition Framework


Conjecture is a modular, extensible, open-source C++ framework for Optical Character Recognition (OCR). Conjecture is not a single OCR, but rather an extensible collection of OCRs that can be explored, analyzed, compared, extended, modified, and merged within a unified environment.


  • Modular: Conjecture is built around a collection of OCR Modules.
    • Independent OCRs: Each Module provides a complete solution to the problem of converting an input image into output text. When the conjecture executable is invoked, the user specifies which OCR Module will be used. Conjecture is not a single OCR, but rather an arbitrarily large collection of OCRs.
    • Class-based: Each Module is implemented by a C++ "module" class, which uses a number of additional "component" classes that implement solutions to the various subproblems associated with OCR.
    • Third-party commercial OCRs: Conjecture can create a Module that "wraps" an arbitrary third-party OCR, thus allowing that OCR to participate in the Conjecture framework.
    • Third-party open-source OCRs: Conjecture provides Modules for third-party open-source OCRs like GOCR, Ocrad and Claraocr that incorporate their code bases into Conjecture. This allows Conjecture to benefit from and build on top of existing OCR development.
  • Testable: Conjecture provides a sophisticated testing infrastructure that:
    • Database: Maintains a database of input images and associated text files representing what an OCR should produce if it is 100% accurate.
    • Module Variants: Formalizes the concept of a Module Variant, which consists of a particular OCR Module and a set of values for the configurable parameters associated with that Module.
    • Testing: Allows users to execute one or more variants of one or more OCR modules on one or more input images.
    • Verification: Allows one to verify that a particular change in the Conjecture code base has not caused unexpected consequences. This is accomplished by invoking conjecture on all variants of all modules on all input images and comparing the output of each run against expected output.
    • Assessment: Allows one to establish the accuracy of one or more module/variant/image triples by comparing generated output against desired output.
  • Extensible: Conjecture makes it easy to make Conjecture better.
    • Open Source: Conjecture is open source (LGPL license), and read/write accessible (via a Subversion repository) to all who are interested in advancing the state of the art in optical character recognition.
    • User Extensible: Conjecture provides an infrastructure that allows non-programmers to experiment with different configurable parameters, and to record the combination of parameter values that produce the best results. Over time, the best possible combination of input parameters for each module/image will be identified, benefitting all Conjecture users.
    • Programmer Extensible: Conjecture uses an object-oriented design emphasizing modularity, making it easy for programmers to make everything from tiny incremental changes in a single method of an existing Component, to entirely new Modules that implement everything necessary for image-to-text conversion.
      • Modules (and the associated Components) are described by an entry in a special Conjecture.modules file.
      • Conjecture can automatically and fully generate the module class based on the entry in the modules file.
      • A Module can reuse components that have already been developed for some other Module, or can create new ones.
      • Conjecture automatically produces a stub class for each new component. These stub classes can then be edited in order to provide a new implementation of the issue being addressed by the Component in question.
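
The module/component split and the assessment step described above can be illustrated with a small C++ sketch. It is not taken from the Conjecture code base: every name in it (Image, Parameters, Segmenter, Recognizer, Module, LineSegmenter, EchoRecognizer, accuracy) is hypothetical, the "image" is simply a list of text lines, and the accuracy measure is a crude position-by-position comparison chosen only for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input image: one string per text line.
using Image = std::vector<std::string>;

// Hypothetical parameter values; together with a module they form a "variant".
using Parameters = std::map<std::string, std::string>;

// A component solves one OCR subproblem. Here: splitting an image into lines.
class Segmenter {
public:
    virtual ~Segmenter() = default;
    virtual std::vector<std::string> segment(const Image& image) const = 0;
};

// Another component: turning one segmented line into text.
class Recognizer {
public:
    virtual ~Recognizer() = default;
    virtual std::string recognize(const std::string& line,
                                  const Parameters& params) const = 0;
};

// A module is a complete image-to-text solution assembled from components.
class Module {
public:
    Module(std::unique_ptr<Segmenter> segmenter,
           std::unique_ptr<Recognizer> recognizer)
        : segmenter_(std::move(segmenter)), recognizer_(std::move(recognizer)) {
        assert(segmenter_ && recognizer_);
    }

    // Runs every segmented line through the recognizer, one output line each.
    std::string run(const Image& image, const Parameters& params) const {
        std::string text;
        for (const auto& line : segmenter_->segment(image)) {
            text += recognizer_->recognize(line, params);
            text += '\n';
        }
        return text;
    }

private:
    std::unique_ptr<Segmenter> segmenter_;
    std::unique_ptr<Recognizer> recognizer_;
};

// Toy segmenter: the "image" is already split into lines.
class LineSegmenter : public Segmenter {
public:
    std::vector<std::string> segment(const Image& image) const override {
        return image;
    }
};

// Toy recognizer: copies the line, upper-casing it when "case" is "upper".
class EchoRecognizer : public Recognizer {
public:
    std::string recognize(const std::string& line,
                          const Parameters& params) const override {
        std::string text = line;
        const auto it = params.find("case");
        if (it != params.end() && it->second == "upper") {
            std::transform(text.begin(), text.end(), text.begin(),
                           [](unsigned char c) {
                               return static_cast<char>(std::toupper(c));
                           });
        }
        return text;
    }
};

// Crude assessment: fraction of character positions where the generated
// text equals the expected text, relative to the longer of the two.
double accuracy(const std::string& generated, const std::string& expected) {
    const std::size_t longest = std::max(generated.size(), expected.size());
    if (longest == 0) return 1.0;
    const std::size_t shared = std::min(generated.size(), expected.size());
    std::size_t matches = 0;
    for (std::size_t i = 0; i < shared; ++i) {
        if (generated[i] == expected[i]) ++matches;
    }
    return static_cast<double>(matches) / static_cast<double>(longest);
}
```

In this sketch, a module would be built with Module m(std::make_unique<LineSegmenter>(), std::make_unique<EchoRecognizer>()), run once per parameter set with m.run(image, params), and each output scored with accuracy(output, expected). Swapping in a different Segmenter or Recognizer leaves the Module class untouched, which is the kind of component reuse the list above describes.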


07 Jan 2007
Moved to SourceForge.
12 Jun 2006
Initial version of this website.
02 Jun 2006
Project started.

Quick Links

  Downloads : V-0.06
  Howto : Install
  Community : Mailing List
  To Do : Questions

Conjecture is using services provided by SourceForge