The Kernel Classes
The Conjecture framework defines a core collection of C++
classes that represent the fundamental objects that any OCR needs.
The kernel classes consist primarily of Image and the
Element hierarchy, but also include various other
fundamental classes. New classes will be considered for inclusion
in the kernel based on their general utility.
The Element Class Hierarchy
The Element hierarchy is at the absolute center of OCR
processing. Instances of the classes in this hierarchy are used to
describe the fundamental concepts that an OCR will operate on.
Class Summary
Class | Description
Root | Abstract superclass of all classes in the Conjecture framework.
Env | A class providing a public interface to other programs.
Image | A collection of pixels. In-memory representation of a file in a known graphics format.
Element | Abstract superclass of a collection of classes representing a part-whole decomposition of a graphic image into smaller and smaller semantic units.
Page | An Image and associated meta-data representing an entire page of to-be-scanned data. May be sub-divided into Regions, Lines, Words and/or Glyphs, although only Glyphs are crucial.
Region | A sub-region of a Page, usually used to represent the graphical area for a single column in a multi-column image. May be sub-divided into Lines, Words, and/or Glyphs. This class is not currently prioritized and may be deleted if deemed unnecessary.
Line | A sub-region of a Page (or Region) consisting entirely of Glyphs. May be sub-divided into Words that contain Glyphs, but may often contain only Glyphs.
Word | A sub-region of a Line consisting entirely of Glyphs, separated from other Words within the Line by more horizontal space than
Glyph | A sub-region of a Page (or Region or Line or Word) representing a single to-be-identified character.
The Distinction between the is-a and has-a Element hierarchies
Note that the Element class is an aggregation of one or more instances
of itself. This design allows for a great deal of flexibility, and
significant code reuse between Element and its subclasses. However,
the design does mean that there is both an Element is-a hierarchy (the
collection of classes shown above and how they are hierarchically
related) and an Element has-a hierarchy (the collection of specific
objects and how they are connected to one another). It is important to
keep this distinction between the two hierarchies clear. To this end,
in Conjecture documentation we will refer to the Element class hierarchy
(for the is-a relationship) and to the Element containment hierarchy
(for the has-a relationship).
Element defines both a 'parent' field (pointing to the Element within
which this Element is contained) and a 'parts' field (containing all
the "child" elements). Note that in this context, the terms 'parent'
and 'child' do NOT refer to the class hierarchy, but instead to the
containment (object) hierarchy. For example, if we say that a Page
contains two Regions, each of which contains 40 Lines, each of which
contains a variety of Words, each of which contains a variety of
Glyphs, we are talking about the hierarchy of objects, not
the hierarchy of classes. At the class level, Pages, Regions, Lines,
Words and Glyphs are all "equal" subclasses of Element, but at the
object level, there is an inherent asymmetry due to the semantics of
the classes; a Page can contain Regions or Lines or Words or Glyphs,
but a Glyph cannot contain a Word or Line or Region or Page, a Line
cannot contain a Region or Page, etc.
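The two hierarchies can be illustrated with a minimal C++ sketch. Only
the class names and the 'parent' and 'parts' fields come from the
description above; the add() and countGlyphs() helpers, the use of
std::unique_ptr for ownership, and the exact member layout are
assumptions made for illustration and may differ from the actual
Conjecture sources.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch, not the Conjecture API: add() and countGlyphs()
// are assumed helper names.
class Element {
public:
    virtual ~Element() = default;

    // Containment hierarchy: the Element this one is contained in
    // (nullptr for a top-level Element such as a Page).
    Element* parent = nullptr;

    // Containment hierarchy: the "child" Elements owned by this Element.
    std::vector<std::unique_ptr<Element>> parts;

    // Takes ownership of 'child', links it back to this Element and
    // returns a non-owning pointer to it.
    Element* add(std::unique_ptr<Element> child) {
        assert(child != nullptr);
        child->parent = this;
        parts.push_back(std::move(child));
        return parts.back().get();
    }
};

// Class hierarchy: all five are "equal" direct subclasses of Element.
class Page   : public Element {};
class Region : public Element {};
class Line   : public Element {};
class Word   : public Element {};
class Glyph  : public Element {};

// Counts the Glyph objects reachable from 'root' through 'parts',
// that is, by walking the containment hierarchy.
inline std::size_t countGlyphs(const Element& root) {
    std::size_t n = dynamic_cast<const Glyph*>(&root) != nullptr ? 1 : 0;
    for (const auto& part : root.parts) {
        n += countGlyphs(*part);
    }
    return n;
}
```

In this sketch a Page that directly holds one Line (with two Glyphs)
and one loose Glyph is a perfectly valid containment tree, even though
Page, Line and Glyph are siblings in the class hierarchy.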
One way to enforce this asymmetry would have been to NOT define a
'parts' field in Element (containing Elements), but instead to have
the Page class define a 'regions' field containing Region instances,
have Region define a 'lines' field containing Line instances, have
Line define a 'words' field containing Word instances, etc. Although
this alternative allows for more specificity, and thus better
compile-time type-checking, it is also very constraining because it
forces every OCR to always divide each Page into Regions into Lines
into Words into Glyphs. Because Conjecture is attempting to be a
universal framework that can support any OCR implementation
imaginable, we do not want to place unnecessary restrictions on how an
implementation performs its duties. All that is strictly needed is to
divide Pages into Glyphs (the subdivisions into Words, Lines and
Regions are not strictly necessary).
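To make the constraint concrete, here is a rough sketch of what the
rigid alternative might look like. The 'regions', 'lines' and 'words'
field names come from the paragraph above; the 'glyphs' field and the
wrapGlyphs() helper are hypothetical and exist only to show the cost
of the design.

```cpp
#include <utility>
#include <vector>

// Hypothetical sketch of the rejected design: every level has exactly
// one kind of child, so the nesting depth is fixed at compile time.
struct Glyph  {};
struct Word   { std::vector<Glyph>  glyphs;  };  // 'glyphs' is an assumed name
struct Line   { std::vector<Word>   words;   };
struct Region { std::vector<Line>   lines;   };
struct Page   { std::vector<Region> regions; };

// An OCR that only finds a flat run of Glyphs still has to invent one
// Word, one Line and one Region just to attach them to a Page.
inline Page wrapGlyphs(std::vector<Glyph> glyphs) {
    Word word;
    word.glyphs = std::move(glyphs);
    Line line;
    line.words.push_back(std::move(word));
    Region region;
    region.lines.push_back(std::move(line));
    Page page;
    page.regions.push_back(std::move(region));
    return page;
}
```

The compiler rejects a Glyph holding a Word here, which is the
type-checking benefit mentioned above, but the three artificial
wrapper levels are the price.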
So, instead of using the above strategy, Element defines a 'parts'
field, which allows Pages to contain Regions, Lines, Words or Glyphs,
allows Regions to contain Lines, Words, or Glyphs, and allows Lines to
contain Words or Glyphs, which is more flexible. Of course, from a
compile-time perspective this design is problematic because
nonsensical containments are also possible (a Glyph could contain a
Word or Line or Region or Page, a Line could contain a Region or Page,
etc). However, this is easily addressed with some run-time checks
in the 'element-adding' functionality defined on the Element class.
The increased flexibility of this approach was deemed worth the
reduction in compile-time typechecking accuracy.
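One possible shape for such a run-time check is sketched below. It
assumes that each Element can report what kind of Element it is and
that the kinds can be ordered from outermost to innermost; the Kind
enum, kind(), mayContain() and add() are all hypothetical names, and
the choice to throw std::logic_error on a nonsensical containment is
likewise an assumption. The rule that a Glyph may contain other Glyphs
follows a remark made later in this document.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical names: Kind, kind(), mayContain() and add() are assumptions.
// Kinds are ordered from the outermost container to the innermost part.
enum class Kind { Page, Region, Line, Word, Glyph };

class Element {
public:
    explicit Element(Kind kind) : kind_(kind) {}
    virtual ~Element() = default;

    Kind kind() const { return kind_; }

    Element* parent = nullptr;
    std::vector<std::unique_ptr<Element>> parts;

    // A container may hold only parts that come strictly later in the
    // Kind ordering. The single exception assumed here is that a Glyph
    // may hold other Glyphs.
    static bool mayContain(Kind container, Kind part) {
        if (container == Kind::Glyph) {
            return part == Kind::Glyph;
        }
        return static_cast<int>(part) > static_cast<int>(container);
    }

    // The 'element-adding' entry point: rejects nonsensical containments
    // at run time, since the type system no longer can.
    Element& add(std::unique_ptr<Element> part) {
        assert(part != nullptr);
        if (!mayContain(kind_, part->kind())) {
            throw std::logic_error("nonsensical containment");
        }
        part->parent = this;
        parts.push_back(std::move(part));
        return *parts.back();
    }

private:
    Kind kind_;
};
```

With this sketch, adding a Glyph to a Line succeeds, while adding a
Region to a Line throws and leaves the Line's 'parts' unchanged.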
How the Element Hierarchy interacts with Image
The overall idea behind Elements is a part-whole decomposition of an
input image into smaller and smaller images. The manner in which this
decomposition is implemented can have significant time and space
efficiency ramifications.
One approach is to have each Element subclass maintain a local
copy of that portion of the overall image to which it applies. Fields
'height' and 'width' would establish the pixel dimensions, and a
'data' field could store the actual pixel information. For example,
each Glyph could maintain a copy of that portion of the Page
represented by the Glyph.
The problem with the above naive implementation is that it consumes
more and more memory the more sub-divided a Page becomes. Although it
is common for Pages to contain just a collection of Glyphs, it is also
possible for a Page to contain Regions that contain Lines that contain
Words that contain Glyphs (which might contain other Glyphs!). In such
a situation, the pixel data representing an individual Glyph would be
copied (and maintained in memory) up to 5 times (stored in a Glyph,
stored in a Word, stored in a Line, stored in a Region, and stored in
the Page).
The above memory impact can be avoided by taking a different approach.
Instead of each Element maintaining a separate copy of its portion of
an image, we note that each input image corresponds to a Page, and
thus a Page represents the largest image. In the containment
hierarchy, all other subclasses of Element eventually "belong" to a
Page. For this reason, if Page stores an Image instance, then every
Element subclass has access to that "big picture" image by following
its 'parent' field up until a Page is reached (which is guaranteed by
the Conjecture framework to always occur). Each Element can be thought
of as representing a specific rectangular region within that "big
picture" image associated with the Page.
By having each Element store the top-left and bottom-right coordinates
(relative to the "big-picture" image), we can avoid requiring each
Element to maintain its own copy of the pixel data. It is for this reason that the
Element hierarchy does NOT inherit from Image, the reason the Element
class itself does not define an Image field, and the reason the Page
class has an Image field. This approach significantly reduces memory
costs and thus improves efficiency.
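The shared-image approach might be sketched as follows. The Box
struct, the page() and pixelAt() members, and the layout of Image
(row-major, one byte per pixel) are assumptions made for illustration
and are not taken from the Conjecture sources; only the idea that the
Page owns the Image and every other Element stores coordinates into it
comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed pixel layout: row-major, one byte per pixel.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> data;

    std::uint8_t at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Top-left and bottom-right corners (inclusive, by assumption),
// expressed in the coordinates of the Page's "big picture" image.
struct Box { int left, top, right, bottom; };

class Page;

class Element {
public:
    Element(Element* parent, Box box) : parent(parent), box(box) {}
    virtual ~Element() = default;

    Element* parent;
    Box box;

    // Follows 'parent' links upward until the top of the containment
    // hierarchy is reached, which is expected to be a Page.
    const Page* page() const;

    // Reads the pixel at (x, y), relative to this Element's top-left
    // corner, straight from the Page's image. No pixels are copied.
    std::uint8_t pixelAt(int x, int y) const;
};

// Only Page holds an Image; its box covers the whole image.
class Page : public Element {
public:
    explicit Page(Image img)
        : Element(nullptr, Box{0, 0, img.width - 1, img.height - 1}),
          image(std::move(img)) {}
    Image image;
};

class Glyph : public Element {
public:
    using Element::Element;
};

inline const Page* Element::page() const {
    const Element* e = this;
    while (e->parent != nullptr) {
        e = e->parent;
    }
    return dynamic_cast<const Page*>(e);
}

inline std::uint8_t Element::pixelAt(int x, int y) const {
    const Page* p = page();
    assert(p != nullptr);
    return p->image.at(box.left + x, box.top + y);
}
```

Under these assumptions each non-Page Element costs four integers and
a pointer regardless of how many pixels it covers, whereas the copying
approach costs one byte per covered pixel at every level of nesting.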