For this reason, Conjecture provides an assessment infrastructure with the following features:
- a ../config/Conjecture.tests file, describing algsets and algset variants to test [wmh: this filename will be changing!]
- a db subdir
- a custom subdir
- an ocrtest test harness script
  # this is the default module, used if -A is not specified
  MODULE default
  VARIANT v1;
  MODULE gocr
  # variant 1 of the gocr module
  VARIANT v1;
  MODULE ocrad
  VARIANT v1;
  VARIANT v2 FLAGS { -Z "-T .7" };
  VARIANT v3 FLAGS { -Z "-T .75" };
  # this is a comment and will be removed
There are two legal formats for lines in this file ($CONJECTUREROOT/config/Conjecture.modules): a MODULE line, consisting of the keyword MODULE followed by a module name; and a VARIANT line, consisting of the keyword VARIANT, a variant name, optionally the keyword FLAGS followed by an open brace, conjecture command line flags, and a close brace, and finally a semicolon. The variant name must be a legal identifier. Some additional comments about the format:
Note that the flags specified for each variant do not represent the entire set of flags sent to conjecture. In particular, the following flags are implicitly provided by the ocrtest test harness script to each invocation:
-V 0 -A <algset> -i <input>.<suffix> -o custom/<algset>/<variant>.ocr
All of these flags occur after the flags from the variant (and thus override any attempts to define them within the variants).
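As a concrete illustration, here is a minimal sketch (hypothetical code, not the actual ocrtest implementation; the function name and structure are assumptions) of how a harness might compose the final command line, appending the implicit flags after the variant's own flags so that they take precedence:

```python
# Hypothetical sketch: compose a conjecture command line for one run.
# The variant's flags come first; the implicit harness flags are appended
# afterwards, so they override any conflicting flags in the variant.
import shlex

def build_command(algset, variant, variant_flags, input_name, suffix):
    implicit = [
        "-V", "0",
        "-A", algset,
        "-i", f"{input_name}.{suffix}",
        "-o", f"custom/{algset}/{variant}.ocr",
    ]
    return ["conjecture"] + shlex.split(variant_flags) + implicit

# e.g. variant v2 of ocrad, with FLAGS { -Z "-T .7" }, on image 5x7.pnm
cmd = build_command("ocrad", "v2", '-Z "-T .7"', "5x7", "pnm")
```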
[wmh: In general, XML is the proper way to describe input formats nowadays. However, I wanted this file to be accessible to non-programmers, and didn't want to force them to learn XML. By defining a very simple format, with only two kinds of acceptable lines, it is hoped that programming-phobic individuals will be able to edit and experiment with Conjecture at this level. Yes, the format can be made more readable. Yes, I'll work on it sometime :-)]
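To make the two-line-format rule concrete, here is a minimal reader sketch (hypothetical, not part of Conjecture) under the assumptions above: MODULE lines, VARIANT lines terminated by a semicolon, and '#' comments that are removed.

```python
# Hypothetical sketch of a reader for the two legal line formats:
#   MODULE <name>
#   VARIANT <name> [FLAGS { <flags> }];
# '#' comments are stripped, mirroring "this is a comment and will be removed".
import re

MODULE_RE = re.compile(r"^MODULE\s+(\w+)$")
VARIANT_RE = re.compile(r"^VARIANT\s+(\w+)(?:\s+FLAGS\s+\{(.*)\})?\s*;$")

def parse_tests(text):
    modules = {}   # module name -> {variant name: flags string}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        if m := MODULE_RE.match(line):
            current = modules.setdefault(m.group(1), {})
        elif m := VARIANT_RE.match(line):
            if current is None:
                raise ValueError("VARIANT line before any MODULE line")
            current[m.group(1)] = (m.group(2) or "").strip()
        else:
            raise ValueError(f"illegal line: {raw!r}")
    return modules
```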
Each image file in the db subdir has an associated .valid file providing the text representation of the image file. The .valid file represents the desired output that should be produced by an OCR that has 100% accuracy. One ramification of the above is that if two image files have the same prefix with different suffixes (.jpg vs .pnm, for example), both images must refer to exactly the same textual content, since both files will be described by the single associated .valid file.
The custom directory contains a subdirectory for each OCR module, <module>. Within each of these directories, a subdirectory exists for each variant, <variant>.
The test harness involves executing conjecture on many input files using many command-line variants. The output from each run is placed into harness/custom/<module>/<variant>/<input>.ocr.
The variant subdirectories contain, in addition to the .ocr files from individual executions of the OCR, <input>.val files. These files are similar to the harness/db/<input>.valid files, except that rather than representing the perfect output, they represent the "expected" result for the given module and variant. These files are used to quickly determine whether a change in the code base has had an effect (intentional or otherwise) on the performance of a specific module variant.
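The validity check itself is conceptually a byte-for-byte comparison of a run's output against the stored expectation. A sketch of the idea (hypothetical code, not the actual harness):

```python
# Hypothetical sketch: a run's <input>.ocr output matches the stored
# <input>.val expectation exactly -> '+', any difference -> '*'.
from pathlib import Path

def validity_symbol(ocr_path, val_path):
    ocr_text = Path(ocr_path).read_text()
    expected = Path(val_path).read_text()
    return "+" if ocr_text == expected else "*"
```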
The ocrtest script is the interface to the Conjecture test harness. It is usually invoked with one of these purposes in mind:
Using the command:
% ocrtest -a
runs the test harness over every module and variant specified in $CONJECTUREROOT/config/Conjecture.tests, for every image file found in $CONJECTUREROOT/harness/db, writing results into harness/custom/<module>/<variant>/<input>.ocr. Each result is then compared against harness/custom/<module>/<variant>/<input>.val. If the files match identically, this is indicated with a '+' sign; if they do not match, this is indicated by a '*' sign. A table is produced summarizing the results. An example of such output is:
  ============================================
  Name        4x6  5x7  5x8  ocr-a  ocr-b  rod
  --------------------------------------------
  default v1   +    +    +     +      +     +
  gocr    v1   +    +    +     +      +     +
  ocrad   v1   +    +    +     +      +     +
          v2   +    +    +     +      +     +
          v3   +    +    +     +      +     +
  --------------------------------------------
  Table: Conjecture Test Harness
  ============================================
If the table contains all '+', then the entire application is working exactly as expected. However, if any '*' appear (or any of a number of other symbols), it indicates that unexpected output has occurred. Whether this is a good thing or a bad thing depends on whether the difference represents an increase in accuracy or not, which is established using assessment testing, discussed next.
[wmh: document the other symbols that have been added!]
The ocrtest script accepts a -x argument that specifies the specific conjecture executable to use. By default, it is conjecture, and the PATH environment variable is used to establish where to find it. However, $CONJECTUREROOT/harness/Makefile explicitly specifies a -x ./ocrprog flag to its invocations of ocrtest. Various targets in $CONJECTUREROOT/src copy executables to $CONJECTUREROOT/harness/ocrprog when appropriate.
Assessment testing allows us to identify the best module variant for each input file. In addition, when changes to the code base have produced changes in output (as indicated by the validity testing discussed above), assessment testing allows one to see whether the changes were an improvement or a hindrance to overall accuracy.
Using the command:
% ocrtest -A
does everything that validity testing does, but in addition, it executes the ocrdiff script to compare harness/custom/<module>/<variant>/<input>.val against db/<input>.valid. Remember that db/<input>.valid represents the "goal" output that an OCR with 100% accuracy would produce. The ocrdiff script performs an intelligent character-by-character analysis to establish the accuracy of the OCR output.
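The exact metric ocrdiff uses is not documented here, but one plausible character-level accuracy measure can be sketched with an edit-distance calculation. Note that a metric of this shape can go negative when the OCR emits far more wrong characters than the valid text contains; this is an assumption about the metric, not a statement of how ocrdiff actually computes it.

```python
# Hypothetical sketch of a character-level accuracy measure in the spirit
# of ocrdiff (the real script's algorithm may differ). Accuracy is taken
# as the fraction of the .valid text not consumed by edit errors, so
# heavy spurious output can drive the score below zero.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def accuracy_percent(ocr_text, valid_text):
    if not valid_text:
        return 100 if not ocr_text else 0
    errors = edit_distance(ocr_text, valid_text)
    return round(100 * (len(valid_text) - errors) / len(valid_text))
```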
The result of assessment testing is another table, like this:
  ============================================
  Name        4x6  5x7  5x8  ocr-a  ocr-b  rod
  --------------------------------------------
  default v1   87   97   98    88    100    80
  gocr    v1   87   97   98    88    100    80
  ocrad   v1    0    0    0    49    -22    57
          v2    0    0    0    49    -22    54
          v3    0    0    0    49    -22    57
  --------------------------------------------
  Table: Conjecture Test Harness
  ============================================
It reports the percentage accuracy for each module variant when applied to every input image. It also reports the validity information, except that anytime a '+' would have been shown in the validity table, a space (' ') is shown (to avoid unnecessary clutter).
[wmh: update this table, explain why the -22 is occurring, etc.]
Comparison testing uses the command:

% ocrtest -A -x <newexec> -X <oldexec>
The only difference is that assessment results are computed for two different executables. Individual tables are shown for each, and then a "difference" table is presented showing the accuracy of the first minus the accuracy of the second. This gives a convenient means of assessing at a glance the relative impact that a particular change in the code base has had on overall performance. Positive entries in this table indicate improvements in accuracy, while negative entries indicate a worsening of accuracy.
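The difference table is then a straightforward per-cell subtraction, sketched hypothetically below (the cell keys and the example numbers are illustrative, not taken from a real pair of executables):

```python
# Hypothetical sketch: the difference table is new accuracy minus old
# accuracy for each (module, variant, image) cell; positive = improvement.
def difference_table(new_scores, old_scores):
    return {cell: new_scores[cell] - old_scores[cell] for cell in new_scores}

delta = difference_table({("ocrad", "v2", "rod"): 57},
                         {("ocrad", "v2", "rod"): 54})
# a positive entry means the new executable gained accuracy on that cell
```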
Naturally, comparison testing assumes that you have two different executables to compare. When you are experimenting with a new algorithm, you should always make a copy of the "baseline" executable (and any other incremental improvements along the way) so that you will have them available for comparison testing.