Synopsis
Conjecture is a modular, extensible, open-source C++ framework for
Optical Character Recognition (OCR). Conjecture is not a single OCR, but
rather is an extensible collection of OCRs that can be explored,
analyzed, compared, extended, modified, and merged within a unified
environment.
Features
-
Modular: Conjecture is built around a collection of OCR Modules.
-
Independent OCRs: Each Module provides a complete solution to
the problem of converting an input image into output text. When the
conjecture executable is invoked, the user specifies
which OCR Module will be used. Conjecture is not a single OCR, but
rather an arbitrarily large collection of OCRs.
-
Class-based: Each Module is implemented by a C++ "module" class, which
uses a number of additional "component" classes that implement
solutions to the various subproblems associated with OCR.
-
Third-party commercial OCRs: Conjecture can create a Module that "wraps"
an arbitrary third-party OCR, thus allowing that OCR to participate
in the Conjecture framework.
-
Third-party open-source OCRs: Conjecture provides Modules for
third-party open-source OCRs like GOCR, Ocrad and Claraocr that
incorporate their code bases into Conjecture. This allows Conjecture
to benefit from and build on top of existing OCR development.
-
Testable: Conjecture provides a sophisticated testing infrastructure.
-
Database: Maintains a database of input images and
associated text files representing what an OCR should produce if it
is 100% accurate.
-
Module Variants: Formalizes the concept of a Module Variant, which
consists of a particular OCR Module and a set of values for the
configurable parameters associated with that Module.
-
Testing: Allows users to execute one or more variants of one or
more OCR modules on one or more input images.
-
Verification: Allows one to verify that a particular change
in the Conjecture code base has not caused unexpected consequences.
This is accomplished by invoking conjecture on all variants of all
modules on all input images and comparing the output of each run
against the expected output.
-
Assessment: Allows one to establish the accuracy of one or
more module/variant/image triples by comparing generated output
against desired output.
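As an illustration of the Module Variant and Assessment ideas above, here is a minimal C++ sketch. It is not taken from the Conjecture code base: the `ModuleVariant` struct, the `label`, `editDistance` and `accuracy` functions, and the example parameter names are all hypothetical, and character-level edit distance is only one plausible way to compare generated output against desired output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of a Module Variant: an OCR Module name plus
// a value for each of that Module's configurable parameters.
struct ModuleVariant {
    std::string module;                         // illustrative, e.g. "gocr"
    std::map<std::string, std::string> params;  // parameter name -> value
};

// Builds a readable label such as "gocr[deskew=on,threshold=128]".
// Parameters appear in sorted order because std::map is ordered by key.
std::string label(const ModuleVariant& v) {
    assert(!v.module.empty());  // a variant must name its Module
    std::string out = v.module + "[";
    bool first = true;
    for (const auto& kv : v.params) {
        if (!first) out += ",";
        out += kv.first + "=" + kv.second;
        first = false;
    }
    return out + "]";
}

// Levenshtein edit distance between two strings, using two rolling rows.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Character accuracy in [0, 1]; 1.0 means the generated text matches the
// desired text exactly. Values are clamped at 0 for very poor output.
double accuracy(const std::string& generated, const std::string& expected) {
    if (expected.empty()) return generated.empty() ? 1.0 : 0.0;
    std::size_t d = editDistance(generated, expected);
    if (d >= expected.size()) return 0.0;
    return 1.0 - static_cast<double>(d) / static_cast<double>(expected.size());
}
```

Under these assumptions, a verification run would require `generated == expected` for every module/variant/image triple, while an assessment run would record the `accuracy` score for each triple.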
-
Extensible: Conjecture makes it easy to make Conjecture better.
-
Open Source: Conjecture is open source (LGPL license), and
read/write accessible (via a Subversion repository) to all who are
interested in advancing the state of the art in optical character
recognition.
-
User Extensible: Conjecture provides an infrastructure that
allows non-programmers to experiment with different configurable
parameters, and to record the combination of parameter values that
produce the best results. Over time, the best possible combination
of input parameters for each module/image will be identified,
benefitting all Conjecture users.
-
Programmer Extensible: Conjecture uses an object-oriented
design emphasizing modularity, making it easy for programmers to
make everything from tiny incremental changes in a single method of
an existing Component, to entirely new Modules that implement
everything necessary for image-to-text conversion.
- Modules (and the associated Components) are described
by an entry in a special Conjecture.modules file.
- Conjecture can automatically and fully generate the
module class based on the entry in the modules file.
- A Module can reuse components that have already been
developed for some other Module, or can create new ones.
- Conjecture automatically produces a stub class for
each new component. These stub classes can then be edited in
order to provide a new implementation of the issue being
addressed by the Component in question.
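To make the module/component relationship concrete, the following C++ sketch shows one way a "module" class could be composed from "component" classes, along with what a not-yet-implemented stub component might look like. Every name here (`Image`, `Binarizer`, `Recognizer`, `ThresholdBinarizer`, `StubRecognizer`, `Module`) is a hypothetical stand-in chosen for illustration; the actual classes that Conjecture generates, and the format of the Conjecture.modules file, are not described on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Placeholder image type (assumption): 8-bit grayscale pixels, row-major.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Hypothetical component interface for one OCR subproblem: binarization.
class Binarizer {
public:
    virtual ~Binarizer() = default;
    virtual Image binarize(const Image& in) const = 0;
};

// Hypothetical component interface for another subproblem: recognition.
class Recognizer {
public:
    virtual ~Recognizer() = default;
    virtual std::string recognize(const Image& in) const = 0;
};

// A concrete, reusable component: fixed-threshold binarization.
class ThresholdBinarizer : public Binarizer {
public:
    explicit ThresholdBinarizer(std::uint8_t threshold) : threshold_(threshold) {}
    Image binarize(const Image& in) const override {
        Image out = in;
        for (auto& p : out.pixels) {
            p = (p >= threshold_) ? std::uint8_t{255} : std::uint8_t{0};
        }
        return out;
    }
private:
    std::uint8_t threshold_;
};

// What a generated stub might look like: it compiles and can be plugged
// into a module, but returns no text until a programmer fills it in.
class StubRecognizer : public Recognizer {
public:
    std::string recognize(const Image&) const override {
        return "";  // TODO: replace with a real implementation
    }
};

// A module owns its components and chains them into a complete
// image-to-text conversion.
class Module {
public:
    Module(std::unique_ptr<Binarizer> b, std::unique_ptr<Recognizer> r)
        : binarizer_(std::move(b)), recognizer_(std::move(r)) {
        assert(binarizer_ && recognizer_);  // both components are required
    }
    std::string run(const Image& in) const {
        return recognizer_->recognize(binarizer_->binarize(in));
    }
private:
    std::unique_ptr<Binarizer> binarizer_;
    std::unique_ptr<Recognizer> recognizer_;
};
```

In this arrangement a new Module can reuse `ThresholdBinarizer` unchanged while supplying its own recognizer, which mirrors the reuse of existing Components described above.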
News
- 07 Jan 2007
-
Moved to SourceForge.
- 12 Jun 2006
-
Initial version of this website.
- 02 Jun 2006
-
Project started.