The Testing Harness of Conjecture
Conjecture encourages users to contribute everything from tiny
modifications to existing methods of existing classes, to full-fledged
OCR Modules. However, modifications to the codebase can affect more
than is immediately apparent. Testing a change by using a single set
of command-line flags, for a single OCR module, on a single input
image, does not accurately assess the overall impact that the change
has on the Conjecture environment as a whole.
For this reason, Conjecture provides an assessment infrastructure with
the following features:
- An (arbitrarily large) collection of input images.
- A corresponding collection of text files representing what an
  OCR implementation would produce if it had 100% accuracy.
- The ability to test individual OCRs or groups of OCRs (or all OCRs),
  against individual images or groups of images (or all images).
- Every OCR Module has an associated set of configurable parameters,
  values that influence its behavior and can be set by the user.
  A particular combination of values for all configurable parameters
  represents a single module variant. The assessment infrastructure
  allows one to test a specific variant or a group of variants.
  (Note that the complete set of possible variants is effectively
  infinite, because some parameters can take on floating-point values.)
  For this reason, Conjecture provides a way of enumerating and
  naming variants that produce interesting results.
- The ability to establish the accuracy of each variant of each OCR.
- The ability to report the effect that a change in the code base has
  on the entire execution environment (increased or decreased
  accuracy, increased or decreased time-to-compute, etc.)
The $CONJECTURE/harness directory structure
$CONJECTURE/harness has three important components:
- The ../config/Conjecture.tests file, describing the modules and
  module variants to test. [wmh: this filename will be changing!]
- The db subdirectory
- The modules subdirectory
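Schematically (each component is described in the sections below):
$CONJECTURE/config/Conjecture.tests    which modules and variants to test
$CONJECTURE/harness/db/                input images and .valid reference files
$CONJECTURE/harness/modules/           per-module, per-variant test output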
The $CONJECTURE/config/Conjecture.tests file
This file describes the set of OCR Modules and module variants to test.
# this is the default module,
# used if -M is not specified
MODULE default
VARIANT v1;
MODULE gocr
# variant 1 of the gocr module
VARIANT v1;
MODULE ocrad # a comment, ignored
VARIANT v1;
VARIANT v2 FLAGS { -Z "-T .7" };
VARIANT v3 FLAGS { -Z "-T .75" };
In this file, anything after a '#' (including the '#') is removed as
a comment. The ocrgen tool ignores blank lines in this file. Except
for the above, there are only two legal line formats within this file
(any other line format is an error and is ignored by ocrgen):
- a line containing the keyword MODULE followed by a module name,
  <module>. The <module> must be a legal identifier (starting with a
  letter or underscore, and continuing with letters, underscores, or
  digits) and must correspond to the name of a Module defined in
  $CONJECTURE/config/Conjecture.modules .
  [wmh: the .modules and .tests files will probably be merged later]
- a line containing the keyword VARIANT, followed by a variant name,
  <variant>, followed optionally by the keyword FLAGS, an open brace,
  an arbitrary sequence of conjecture command-line flags, a close
  brace, and a semicolon. The variant name must be a legal identifier.
  If FLAGS is not provided, default values are used for all flags.
  If it is provided, the open/close braces, flags, and semicolon must
  all be present.
Note that the value associated with FLAGS does not represent the
entire set of flags sent to conjecture. In particular, the following
flags are implicitly provided by the test harness during each
invocation of conjecture that it generates:
-V 0
-M <module>
-i <input>.<suffix>
-o modules/<module>/<variant>/<input>.ocr
All of these flags are added after the flags from
the VARIANT (and thus override any attempts to
define them within FLAGS).
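For example, for variant v2 of the ocrad module shown above, each
generated invocation would look roughly like this (the input file
name 4x6.pnm is hypothetical):
% conjecture -Z "-T .7" -V 0 -M ocrad -i 4x6.pnm -o modules/ocrad/v2/4x6.ocr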
[wmh: In general, XML is the proper way to describe
input formats nowadays. However, I wanted this file to be
accessible to non-programmers, and didn't want to force them
to learn XML. By defining a very simple format, with only two
kinds of acceptable lines, it is hoped that
programming-phobic individuals will be able to edit and
experiment with Conjecture at this level. The format is
actually based on an XML-equivalent syntax (related to
research completely unrelated to OCRs).]
The harness/db directory
This directory contains an arbitrarily large collection of input
image files in various graphics formats. Each input image has a
corresponding .valid file providing the text
representation of the image file. The .valid file
represents the desired output that would be produced by an OCR
that has 100% accuracy.
One ramification of the above is that if two image files have the same
prefix, with different suffixes (.jpg vs .pnm, for example), both
images must refer to exactly the same textual content, since both
files will be described by the single associated .valid file.
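For example (hypothetical file names), harness/db might contain:
4x6.pnm       4x6.valid
ocr-a.jpg     ocr-a.pnm     ocr-a.valid
Here ocr-a.jpg and ocr-a.pnm must render exactly the same text, since
both are checked against the single file ocr-a.valid.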
The harness/modules directory
The modules directory contains a subdirectory for each
OCR module, <module>. Within each of these directories, a
further subdirectory exists for each variant, <variant>.
The test harness involves executing conjecture on many
input files using many command-line variants. The output from each
run is placed into
harness/modules/<module>/<variant>/<input>.ocr .
The variant subdirectories contain, in addition to
.ocr files, <input>.val files.
These files are similar to
harness/db/<input>.valid files, except that
rather than representing the perfect output, they represent
the "expected" result for the given module and variant. These
files are used to quickly determine whether a change in the
code base has had an effect (intentional or otherwise) on the
performance of a specific module variant.
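For the Conjecture.tests file shown earlier, and a hypothetical input
image 4x6.pnm, this directory would contain entries such as:
harness/modules/ocrad/v1/4x6.ocr
harness/modules/ocrad/v1/4x6.val
harness/modules/ocrad/v2/4x6.ocr
harness/modules/ocrad/v2/4x6.val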
The ocrtest Script
The ocrtest script is the interface to the Conjecture
test harness. It is usually invoked with one of these purposes in
mind:
- validity testing
- assessment testing
- comparison testing
- individual testing
Validity Testing
When changes to the codebase have been made, it is important to
know the overall impact these changes have had. Testing on a single
input file with a single combination of flags does not guarantee that
a change hasn't introduced unintended effects that detrimentally
(or beneficially) affect some other module and/or variant.
Using the command:
% ocrtest -a
runs the test harness over every module and variant specified
in $CONJECTURE/config/Conjecture.tests , for
every image file found in $CONJECTURE/harness/db ,
writing results into
harness/modules/<module>/<variant>/<input>.ocr . Each result is then
compared against
harness/modules/<module>/<variant>/<input>.val . A 2D table is then
generated, with module/variants as rows, and input files as
columns. Within the table, if the module/variant/input
produces an .ocr that matches .val
exactly, the table entry will contain a plus ('+') sign. If they
do not match, this is indicated by an asterisk ('*').
Several other characters are used to indicate other kinds
of errors ('@' if no .ocr file was generated, '%' if the
.val file is missing, '?' if the executable
exited abnormally).
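In summary, the documented symbols are:
+   output matches the expected .val file exactly
*   output does not match the expected .val file
@   no .ocr file was generated
%   the .val file is missing
?   the executable exited abnormally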
An example of the output produced by ocrtest
during validity testing is shown below:
============================================
Name 4x6 5x7 5x8 ocr-a ocr-b rod
--------------------------------------------
default
v1 + + + + + +
gocr
v1 + + + + + +
ocrad
v1 + + + + + +
v2 + + + + + +
v3 + + + + + +
--------------------------------------------
Table: Conjecture Test Harness
============================================
If the table contains only '+' signs, then the entire application
is working exactly as expected. However, if any '*' appears (or
any of a number of other symbols), it indicates that
unexpected output has occurred. Whether this is a good thing
or a bad thing depends on whether the difference represents an
increase in accuracy or not, which is established using
assessment testing, discussed next.
[wmh: document the other symbols that have been
added!]
The ocrtest script accepts a -x
argument that specifies which conjecture
executable to use. By default, it is conjecture ,
and the PATH environment variable is used to establish where
to find it. However,
$CONJECTURE/harness/Makefile explicitly
specifies a -x ./ocrprog flag in its invocations
of ocrtest . Various targets in
$CONJECTURE/src copy executables to
$CONJECTURE/harness/ocrprog when appropriate.
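For example, to run validity testing against a locally built
executable in the harness directory:
% ocrtest -a -x ./ocrprog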
Assessment Testing
The goal of an OCR is to produce accurate results. The assessment
capabilities of the Conjecture test harness allow one to see how every
module variant performs across all input files, by reporting the
accuracy (as a percentage correct relative to the expected output).
Assessment testing allows us to identify the best module variant for
each input file. As well, when changes to the code base have produced
changes in output (as indicated by validity testing discussed above),
assessment testing allows one to see whether the changes were an
improvement or a hindrance to overall accuracy.
Using the command:
% ocrtest -A
does everything that validity testing does, but in addition, it
executes the ocrdiff script to compare
harness/modules/<module>/<variant>/<input>.val
against db/<input>.valid . Remember that
db/<input>.valid represents the "goal" output that
an OCR with 100% accuracy would produce. The ocrdiff
script performs an intelligent character-by-character analysis to
establish the accuracy of the OCR output.
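The exact algorithm ocrdiff uses is not described here. As a rough
illustration only, accuracy metrics of this kind are often based on
edit distance; the following minimal Python sketch (not Conjecture
code) computes a percentage in that style. Note that such a formula
goes negative when the number of edits needed exceeds the length of
the reference text, which may be why negative entries such as -22
appear in the table below.
# Hypothetical accuracy metric based on Levenshtein edit distance.
# An illustrative sketch, NOT the actual ocrdiff algorithm.
def accuracy(ocr_text, valid_text):
    m, n = len(ocr_text), len(valid_text)
    prev = list(range(n + 1))              # distances for empty OCR prefix
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ocr_text[i - 1] == valid_text[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # delete from ocr_text
                         cur[j - 1] + 1,       # insert into ocr_text
                         prev[j - 1] + cost)   # substitute (or match)
        prev = cur
    edits = prev[n]
    # 100 when no edits are needed; below 0 when the output needs
    # more corrections than the reference has characters.
    return round(100 * (1 - edits / max(n, 1)))

print(accuracy("he1lo world", "hello world"))   # prints 91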
The result of assessment testing is another table, like this:
============================================
Name 4x6 5x7 5x8 ocr-a ocr-b rod
--------------------------------------------
default
v1 87 97 98 88 100 80
gocr
v1 87 97 98 88 100 80
ocrad
v1 0 0 0 49 -22 57
v2 0 0 0 49 -22 54
v3 0 0 0 49 -22 57
--------------------------------------------
Table: Conjecture Test Harness
============================================
It reports the percentage accuracy for each module variant when
applied to every input image. It also reports the validity
information, except that anytime a '+' would have been shown in the
validity table, a space (' ') is shown (to avoid unnecessary clutter).
[wmh: update this table, explain why the -22 is occurring, etc.]
Comparison Testing
Comparison testing is very similar to assessment testing:
% ocrtest -A -x <newexec> -X <oldexec>
The only difference is that assessment results are computed for two
different executables. Individual tables are shown for each, and then
a "difference" table is presented, showing the accuracy of the first
minus the accuracy of the second. This gives a convenient means of
assessing at a glance the relative impact that a particular change in
the code base has had on overall performance. Positive entries in this
table indicate improvements in accuracy, while negative entries
indicate a worsening of accuracy.
Naturally, comparison testing assumes that you have two different
executables to compare. When you are experimenting with a new
algorithm, you should always make a copy of the "baseline" executable
(and any other incremental improvements along the way) so that you
will have them available for comparison testing.
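A hypothetical workflow (file names are illustrative):
% cd $CONJECTURE/harness
% cp ocrprog ocrprog.baseline     # preserve the current baseline
  (modify the code, rebuild, and reinstall ./ocrprog)
% ocrtest -A -x ./ocrprog -X ./ocrprog.baseline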