Conjecture Terminology
- component
-
A formalization of an issue that needs to be addressed
in order to provide image-to-text conversion. Each
component has a canonical name and generates an entire
class hierarchy within the Conjecture framework. For
example, one issue of importance is 'glyph
identification', which has been formalized into the
IdentifyComponent component. The
IdentifyComponent class provides an
interface that "resolves" or "addresses" the issue.
Subclasses of IdentifyComponent provide
alternative implementations of this interface. A
particular Module will use a specific subclass of
IdentifyComponent to provide glyph
identification.
- component implementation
-
For a component <Name>, an implementation is a
subclass of the <Name>Component interface.
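
The relationship between a component, its implementations, and a
Module can be sketched roughly as follows. This is only an
illustration in Python, not the framework's actual code: the method
names identify and recognize, the LookupIdentify and ConstantIdentify
subclasses, and the use of plain strings as glyph images are all
assumptions made for the example.

```python
from abc import ABC, abstractmethod


class IdentifyComponent(ABC):
    """Interface that addresses the 'glyph identification' issue."""

    @abstractmethod
    def identify(self, glyph_image):
        """Return the Unicode character for one glyph image (assumed signature)."""


class LookupIdentify(IdentifyComponent):
    """Hypothetical implementation: exact match against known glyph images."""

    def __init__(self, known):
        self._known = dict(known)

    def identify(self, glyph_image):
        # Unknown glyphs map to U+FFFD, the Unicode replacement character.
        return self._known.get(glyph_image, "\ufffd")


class ConstantIdentify(IdentifyComponent):
    """Hypothetical implementation: always returns the same character."""

    def __init__(self, answer):
        self._answer = answer

    def identify(self, glyph_image):
        return self._answer


class Module:
    """Uses one specific IdentifyComponent subclass, chosen at construction."""

    def __init__(self, identifier):
        self._identifier = identifier

    def recognize(self, glyph_images):
        # Delegate every glyph to whichever implementation was supplied.
        return "".join(self._identifier.identify(g) for g in glyph_images)
```

Swapping LookupIdentify for ConstantIdentify changes the strategy
without changing Module, which is the point of formalizing an issue
as a component.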
- glyph segmentation
-
The identification of the regions within a Page
representing individual Glyphs. Glyph segmentation is
not the same as Glyph identification; it necessarily
precedes it.
- glyph identification
-
The process of converting a graphic image representing
a character (a Glyph) into a Unicode value. It is at
the core of any OCR program. Its efficacy depends not
only on its internal strategies, but also on the accuracy
of Glyph segmentation. If a region is identified as a
Glyph, but is instead only part of a glyph, or
multiple glyphs, the accuracy of identification will
inevitably decline.
- horizontal
-
The direction that aligns glyphs within a line.
Normally, this is just the x direction, but if the
image is rotated, then 'horizontal' refers to the x
direction after this rotation has been compensated
for.
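
As a rough illustration of compensating for rotation, the sketch
below derives the 'horizontal' direction, and the direction
orthogonal to it, from a known page angle and re-expresses an image
point along those directions. The function names, the use of
radians, and the sign convention for the angle are assumptions for
the example; how the angle is detected is a separate issue.

```python
import math


def page_axes(angle_radians):
    """Return unit vectors (horizontal, vertical) for a page rotated by the angle."""
    c, s = math.cos(angle_radians), math.sin(angle_radians)
    horizontal = (c, s)    # direction along which glyphs in a line are aligned
    vertical = (-s, c)     # orthogonal to horizontal
    return horizontal, vertical


def to_page_coords(point, angle_radians):
    """Project an image-space point onto the page's horizontal and vertical axes."""
    (hx, hy), (vx, vy) = page_axes(angle_radians)
    x, y = point
    return (x * hx + y * hy, x * vx + y * vy)
```

With an angle of zero the two directions coincide with the image's x
and y axes, and to_page_coords returns the point unchanged.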
- issue
-
Some concept or problem of importance during optical
character recognition. A conceptual term. Examples of
issues include segmentation, identification,
formatting, dust removal, line angle detection, etc.
- line segmentation
-
The identification of the regions within a Page
representing individual Lines of horizontally
adjacent Glyphs. Line segmentation may occur before
or after Glyph segmentation, depending on the overall
segmentation strategy employed.
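
One possible "Glyph segmentation first" strategy is sketched below:
glyph bounding boxes are grouped into lines wherever their vertical
extents overlap. This is a minimal example under stated assumptions,
not a description of any Conjecture implementation. The
(left, top, right, bottom) box format and the function name are
hypothetical, and the coordinates are assumed to be already
compensated for rotation.

```python
def group_into_lines(boxes):
    """Group glyph boxes (left, top, right, bottom) into lines, top to bottom."""
    lines = []  # each entry is [line_top, line_bottom, member_boxes]
    for box in sorted(boxes, key=lambda b: b[1]):
        left, top, right, bottom = box
        if lines and top < lines[-1][1]:
            # The box starts above the current line's bottom edge: same line.
            lines[-1][1] = max(lines[-1][1], bottom)
            lines[-1][2].append(box)
        else:
            # No vertical overlap with the current line: start a new one.
            lines.append([top, bottom, [box]])
    # Within each line, order glyphs along the horizontal direction.
    return [sorted(members, key=lambda b: b[0]) for _, _, members in lines]
```

A strategy that segments lines first would instead work on the page
image directly and hand each line region to Glyph segmentation.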
- OCR
-
This term is used in two different contexts within
Conjecture. It can mean "optical character
recognition" (a verb), or it can mean "optical
character recognizer" (a noun). The OCR class
hierarchy uses the noun semantics: each subclass of
OCR is an optical character recognizer (one that performs
optical character recognition). Optical character
recognition involves taking an image as input, and
producing formatted text as output.
- strategy
-
Synonym for 'implementation'.
- vertical
-
The direction orthogonal to horizontal. It too is
affected by page rotation issues.