Conjecture: An Optical Character Recognizer

We need a name!
We need a logo!
We need someone to make our website CSS-enabled and beautiful!

Downloading Conjecture

There are two main ways to obtain the Conjecture source code:
  1. From the latest stable release, Conjecture-0.04, or a previous release.
           % cd <somedir>
           % wget http://www.holst.ca/Conjecture/docs/release/Conjecture-current.tgz
           % tar xzvf Conjecture-current.tgz
           % export CONJECTUREROOT=<somedir>/Conjecture-<version>
        
  2. From the SVN repository (if you have svn installed on your platform):
           % cd <somedir>
           % svn co svn://www.holst.ca/Conjecture
           % export CONJECTUREROOT=<somedir>/Conjecture
        
    • Note: If prompted for a username and password, email Wade to obtain access.
    • Note: Write access to the repository always requires a username/password.

Installing Conjecture

Installing the Conjecture environment currently requires the following steps:

  1. Obtain the source by:
    • downloading a Conjecture-<version>.tgz release bundle and extracting it into <somedir>/Conjecture-<version>, or
    • making a local working copy of the Conjecture SVN repository in <somedir>/Conjecture

  2. Set your CONJECTUREROOT environment variable:
       # Using Bourne-based shells
       % export CONJECTUREROOT=<somedir>/Conjecture-<version>   # official release
                  or
       % export CONJECTUREROOT=<somedir>/Conjecture             # svn repository
    
       # Using csh-based shells
       % setenv CONJECTUREROOT <somedir>/Conjecture-<version>   # official release
                  or
       % setenv CONJECTUREROOT <somedir>/Conjecture             # svn repository
       

    Note: Most Conjecture makefile targets and scripts require CONJECTUREROOT to be set, so add the appropriate line above to your .bashrc or .cshrc file to ensure it is always initialized.

  3. Compile the source code:
       % cd $CONJECTUREROOT
       % make
       
  4. Run the test harness verification suite:
       % cd $CONJECTUREROOT
       % make verify
       

NOTE: An autoconf compilation environment will be provided in a future release.
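
Because the build and verification targets depend on CONJECTUREROOT, it can help to check the variable before invoking make. The following is a minimal sketch for Bourne-based shells, not part of the Conjecture distribution: the function names check_conjecture_root and build_and_verify are hypothetical, and the only project commands assumed are the `make` and `make verify` targets described in steps 3 and 4.

```shell
#!/bin/sh
# Hypothetical helpers; these names are NOT provided by Conjecture.

# Succeeds only if CONJECTUREROOT is set and points to an existing directory.
check_conjecture_root() {
    if [ -z "${CONJECTUREROOT:-}" ]; then
        echo "CONJECTUREROOT is not set" >&2
        return 1
    fi
    if [ ! -d "$CONJECTUREROOT" ]; then
        echo "CONJECTUREROOT is not a directory: $CONJECTUREROOT" >&2
        return 1
    fi
    return 0
}

# Runs steps 3 and 4 (compile, then verify) once the environment checks out.
# The subshell keeps the caller's working directory unchanged.
build_and_verify() {
    check_conjecture_root || return 1
    ( cd "$CONJECTUREROOT" && make && make verify )
}
```

After sourcing these definitions, running `build_and_verify` stops at the first failing step, so a missing variable is reported before make is started.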

An Overview of Conjecture

Conjecture has the same aim as every other OCR system: identifying Unicode characters in graphical images. However, the philosophy behind its implementation differs from that of most other projects. It is based on the observation that no single strategy for character recognition will be optimal, given the vast variation possible in the input (differing fonts, font sizes, noise levels, orientation angles, interspersed text and pictures, and numerous other issues that affect OCR accuracy). One strategy might excel for one document, while a very different strategy may work better for another. Most existing open-source OCR systems use a single strategy for the overall problem. Conjecture is meant to be the opposite: a repository for as many implementations and strategies as possible, addressing as many different OCR-related issues as possible.

The design of Conjecture has the following goals in mind:

Individuals can interact with the Conjecture framework at many different levels:

Documentation