This page provides a summary of terminology used in the Conjecture framework.
- horizontal: the direction along which the glyphs of a line are aligned. Normally this is simply the x direction, but if the image is rotated, 'horizontal' refers to the x direction after the rotation has been compensated for.
- vertical: the direction orthogonal to horizontal.
- line partitioning: The identification of the regions within a Page representing individual Lines of horizontally adjacent Glyphs. Line partitioning may occur before or after Glyph partitioning, depending on the overall partitioning strategy employed.
- glyph partitioning: The identification of the regions within a Page representing individual Glyphs. Glyph partitioning is not the same as Glyph identification; it precedes it.
- glyph identification: The process of converting a graphic image representing a character into a Unicode value; the heart of any OCR program. Its efficacy depends not only on its internal strategies, but also on the accuracy of Glyph partitioning. If a region is identified as a Glyph but is actually only part of a glyph, or several glyphs, accuracy will naturally decline.
- component: A formalization of an OCR issue. Each component has a canonical name and an entire class hierarchy within the Conjecture framework. For example, one important issue is 'glyph identification', which has been formalized as the IdentifyComponent. The IdentifyComponent class provides an interface through which the issue can be "resolved" or "addressed". Subclasses of IdentifyComponent provide alternative implementations of this interface.
- OCR issue: A concept or problem of importance during optical character recognition; a conceptual term. Examples include segmentation, identification, formatting, dust removal, and line angle detection.
- component implementation: for a component <name>, an implementation is a subclass of the <name>Component interface that provides a concrete implementation of that interface.
- strategy: synonym for 'implementation'
- OCR: This term is used in two different senses within Conjecture. It can mean "optical character recognition" (the process), or "optical character recognizer" (the thing that performs the process). The OCR class hierarchy uses the second sense: each subclass of OCR is an optical character recognizer (that performs optical character recognition :-). Optical character recognition takes an image as input and produces formatted text as output.
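To make the horizontal/vertical definitions concrete, here is a minimal Python sketch (not Conjecture's code) that, given a known rotation angle, expresses an image point in text-aligned coordinates. The function names `text_axes` and `to_text_frame` are hypothetical, and the sketch assumes a y-up coordinate system with the angle measured counterclockwise; real image coordinates often have y pointing down, which flips the sign of the angle.

```python
import math

def text_axes(skew_degrees):
    """Return (horizontal, vertical) unit vectors in image coordinates for a
    page whose text lines are rotated counterclockwise by skew_degrees.
    Hypothetical helper; assumes a y-up coordinate system."""
    t = math.radians(skew_degrees)
    horizontal = (math.cos(t), math.sin(t))   # direction along a text line
    vertical = (-math.sin(t), math.cos(t))    # orthogonal to horizontal
    return horizontal, vertical

def to_text_frame(x, y, skew_degrees):
    """Project an image point onto the text axes. This is the same as
    rotating the point by -skew_degrees, i.e. compensating for the rotation."""
    (hx, hy), (vx, vy) = text_axes(skew_degrees)
    return (x * hx + y * hy, x * vx + y * vy)

# Two glyph centres 10 units apart on a line tilted by 30 degrees:
# both get a vertical coordinate of about 0, and their horizontal
# coordinates differ by about 10.
t = math.radians(30)
print(to_text_frame(0.0, 0.0, 30))
print(to_text_frame(10 * math.cos(t), 10 * math.sin(t), 30))
```

Once points are in this frame, "horizontally adjacent" simply means close in the first coordinate and similar in the second, regardless of how the scan was rotated.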
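Line partitioning and glyph partitioning can be illustrated with projection profiles, a common and deliberately simple technique; this is not necessarily the strategy Conjecture uses. The sketch below assumes a binary, already deskewed page stored as a list of rows of 0/1 pixels (1 = ink), and the function names `runs`, `partition_lines` and `partition_glyphs` are hypothetical. It partitions lines first and then glyphs within each line, which is one of the two orderings the line partitioning entry allows.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, for each maximal run of
    nonzero entries in profile."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                        # a run begins
        elif not value and start is not None:
            spans.append((start, i))         # a run ends at a blank entry
            start = None
    if start is not None:
        spans.append((start, len(profile)))  # run reaches the edge
    return spans

def partition_lines(page):
    """Line partitioning: rows containing ink, grouped into (top, bottom) ranges."""
    return runs([sum(row) for row in page])

def partition_glyphs(page, line):
    """Glyph partitioning within one line: inked columns grouped into
    (left, right) ranges."""
    top, bottom = line
    width = len(page[0])
    column_ink = [sum(page[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(column_ink)

page = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
for line in partition_lines(page):
    print(line, partition_glyphs(page, line))
# (0, 2) [(0, 1), (2, 4)]
# (3, 4) [(1, 3), (4, 5)]
```

Because a glyph region here is just a run of inked columns, touching characters merge into one region and a broken glyph with a fully blank column splits into several, which are exactly the partitioning errors that degrade glyph identification.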
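The component / implementation / strategy relationship can be sketched as an abstract interface plus interchangeable subclasses. The Python below only mirrors the idea: `IdentifyComponent` and `OCR` are the names used on this page, but the method names (`identify`, `recognize`), the subclasses `TemplateIdentify` and `SimpleOCR`, and the exact-match lookup are hypothetical and do not reflect Conjecture's actual API. To keep it short, `SimpleOCR` takes glyph images that have already been partitioned, rather than a whole page image, and returns unformatted text.

```python
from abc import ABC, abstractmethod

class IdentifyComponent(ABC):
    """Interface for the 'glyph identification' OCR issue."""

    @abstractmethod
    def identify(self, glyph_image):
        """Return the Unicode character(s) represented by glyph_image."""

class TemplateIdentify(IdentifyComponent):
    """One hypothetical strategy: exact lookup of a glyph bitmap in a table."""

    def __init__(self, templates):
        # Maps a bitmap (tuple of row tuples) to the character it represents.
        self.templates = dict(templates)

    def identify(self, glyph_image):
        # U+FFFD (REPLACEMENT CHARACTER) marks glyphs with no matching template.
        return self.templates.get(glyph_image, "\ufffd")

class OCR(ABC):
    """An optical character recognizer (the noun sense of 'OCR')."""

    @abstractmethod
    def recognize(self, glyph_images):
        """Return recognized text for a sequence of partitioned glyph images."""

class SimpleOCR(OCR):
    """Delegates glyph identification to whichever strategy it is given."""

    def __init__(self, identifier):
        self.identifier = identifier

    def recognize(self, glyph_images):
        return "".join(self.identifier.identify(g) for g in glyph_images)

A = ((1, 1), (1, 0))
B = ((0, 1), (1, 1))
ocr = SimpleOCR(TemplateIdentify({A: "a", B: "b"}))
print(ocr.recognize([A, B, A]))  # aba
```

Swapping `TemplateIdentify` for another `IdentifyComponent` subclass changes the strategy without touching `SimpleOCR`, which is the point of formalizing an issue as a component.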
Generated on Mon Jun 12 20:27:16 2006 for Conjecture by Doxygen 1.4.6