                             Root
                              |
  .---------------------------^----------------.
  |                           |                |
  |                   .___    |                |
  |             parts |   \   |                |
  |                   |    |  |                |
  |                   --<>Element             Env
  |                          |
  |          .---------------^--------------.
  |          |       |       |      |       |
Image <---- Page   Region   Line   Word   Glyph

Env : A class providing a public interface to other programs.
Image : A collection of pixels. In-memory representation of a file in a known graphics format.
Element : Abstract superclass of a collection of classes representing a part-whole decomposition of a graphic image into smaller and smaller semantic units.
Page : An Image and associated meta-data representing an entire page of to-be-scanned data. May be sub-divided into Regions, Lines, Words and/or Glyphs, although only Glyphs are crucial.
Region : A sub-region of a Page, usually used to represent the graphical area for a single column in a multi-column image. May be sub-divided into Lines, Words, and/or Glyphs. This class is not currently prioritized and may be deleted if deemed unnecessary.
Line : A sub-region of a Page (or Region) consisting entirely of Glyphs. May be sub-divided into Words that contain Glyphs, but may often contain only Glyphs.
Word : A sub-region of a Line consisting entirely of Glyphs, separated from other Words within the Line by more horizontal space than separates the Glyphs within a Word.
Glyph : A sub-region of a Page (or Region or Line or Word) representing a single to-be-identified character.
Element defines both a 'parent' field (pointing to the Element within which this Element is contained) and a 'parts' field (containing all the "child" elements). Note that in this context, the terms 'parent' and 'child' do NOT refer to the class hierarchy, but instead to the containment (object) hierarchy. For example, if we say that a Page contains two Regions, each of which contains 40 Lines, each of which contains a variety of Words, each of which contains a variety of Glyphs, we are talking about the hierarchy of objects, not the hierarchy of classes. At the class level, Pages, Regions, Lines, Words and Glyphs are all "equal" subclasses of Element, but at the object level, there is an inherent asymmetry due to the semantics of the classes; a Page can contain Regions or Lines or Words or Glyphs, but a Glyph cannot contain a Word or Line or Region or Page, a Line cannot contain a Region or Page, etc.
One way to enforce this asymmetry would have been to NOT define a 'parts' field in Element (containing Elements), but instead to have the Page class define a 'regions' field containing Region instances, have Region define a 'lines' field containing Line instances, have Line define a 'words' field containing Word instances, etc. Although this alternative allows for more specificity, and thus better compile-time type-checking, it is also very constraining because it forces every OCR implementation to always divide each Page into Regions into Lines into Words into Glyphs. Because Conjecture is attempting to be a universal framework that can support any OCR implementation imaginable, we do not want to place unnecessary restrictions on how an implementation performs its duties. All that is strictly needed is to divide Pages into Glyphs (the subdivisions into Words, Lines and Regions are not strictly necessary).
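To make the constraint concrete, the rejected alternative can be sketched as follows. This is an illustration only, written in Python with type hints standing in for compile-time types; the class bodies and the helper 'wrap_glyphs' are hypothetical and are not part of Conjecture.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Glyph:
    pass


@dataclass
class Word:
    glyphs: List[Glyph] = field(default_factory=list)


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Region:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    regions: List[Region] = field(default_factory=list)


def wrap_glyphs(glyphs):
    """Hypothetical helper: an OCR that only finds Glyphs must still
    invent one Region, one Line and one Word to hold them."""
    word = Word(glyphs=list(glyphs))
    return Page(regions=[Region(lines=[Line(words=[word])])])


page = wrap_glyphs([Glyph(), Glyph()])
```

Each field has a precise element type, but a Page has no direct way to hold Glyphs, so every implementation pays for the intermediate levels whether or not it uses them.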
So, instead of using the above strategy, Element defines a 'parts' field, which allows Pages to contain Regions, Lines, Words or Glyphs, allows Regions to contain Lines, Words, or Glyphs, and allows Lines to contain Words or Glyphs, which is more flexible. Of course, from a compile-time perspective this design is problematic because nonsensical containments are also possible (a Glyph could contain a Word or Line or Region or Page, a Line could contain a Region or Page, etc). However, this is easily addressed with some run-time checks in the 'element-adding' functionality defined on the Element class. The increased flexibility of this approach was deemed worth the reduction in compile-time type-checking accuracy.
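The run-time checks mentioned above might look something like the following minimal sketch. It is written in Python for brevity and is not Conjecture's actual code; the method name 'add' and the 'allowed_parts' table are assumptions made for illustration. The table follows the containments listed above, plus Words containing Glyphs and Glyphs containing other Glyphs.

```python
class Element:
    """Base class of the containment hierarchy (illustrative sketch)."""

    # Hypothetical table: the classes an instance of this class may contain.
    allowed_parts = ()

    def __init__(self):
        self.parent = None  # the Element that contains this one
        self.parts = []     # the Elements contained in this one

    def add(self, child):
        """Attach child to this Element, rejecting nonsensical containments."""
        if not isinstance(child, self.allowed_parts):
            raise TypeError(
                f"{type(self).__name__} cannot contain {type(child).__name__}"
            )
        if child.parent is not None:
            raise ValueError("element already belongs to another Element")
        child.parent = self
        self.parts.append(child)
        return child


class Page(Element): pass
class Region(Element): pass
class Line(Element): pass
class Word(Element): pass
class Glyph(Element): pass


# Filled in after the classes exist, since the table refers to them.
Page.allowed_parts = (Region, Line, Word, Glyph)
Region.allowed_parts = (Line, Word, Glyph)
Line.allowed_parts = (Word, Glyph)
Word.allowed_parts = (Glyph,)
Glyph.allowed_parts = (Glyph,)

# A Page may hold Lines directly, skipping Regions and Words.
page = Page()
line = page.add(Line())
glyph = line.add(Glyph())
```

With this table, a call such as glyph.add(Word()) raises TypeError at run time, which is the trade-off described above: the error is caught, but later than a compiler would catch it.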
One approach to storing pixel data is to have each Element maintain a local copy of the portion of the overall image to which it applies. Fields 'height' and 'width' would give the pixel dimensions, and a 'data' field would store the actual pixel information. For example, each Glyph would maintain a copy of the portion of the Page that it represents.
The problem with this naive implementation is that it consumes more and more memory the more finely an Element is sub-divided. Although it is common for Pages to contain just a collection of Glyphs, it is also possible for a Page to contain Regions that contain Lines that contain Words that contain Glyphs (which might contain other Glyphs!). In such a situation, the pixel data representing an individual Glyph would be copied (and maintained in memory) up to 5 times (stored in a Glyph, in a Word, in a Line, in a Region, and in the Page).
The above memory impact can be avoided by taking a different approach. Instead of each Element maintaining a separate copy of its portion of an image, we note that each input image corresponds to a Page, and thus a Page represents the largest image. In the containment hierarchy, all other subclasses of Element eventually "belong" to a Page. For this reason, if Page stores an Image instance, then every Element subclass has access to that "big picture" image by following its 'parent' field up until a Page is reached (which is guaranteed by the Conjecture framework to always occur). Each Element can be thought of as representing a specific rectangular region within that "big picture" image associated with the Page.
By having each Element store its top-left and bottom-right coordinates (relative to the "big-picture" image), we avoid requiring each Element to maintain its own copy of the pixel data. This is why the Element hierarchy does NOT inherit from Image, why the Element class itself does not define an Image field, and why the Page class does have an Image field. This approach significantly reduces memory costs and thus improves efficiency.
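The shared-image design can be sketched as below. This is an illustrative Python sketch under stated assumptions, not Conjecture's actual interface: the Image stand-in (a list of pixel rows), the method names 'page' and 'pixels', and the convention that the bottom-right corner is exclusive are all hypothetical.

```python
class Image:
    """Hypothetical stand-in for Image: a grid of pixels stored as rows."""

    def __init__(self, rows):
        self.rows = rows
        self.height = len(rows)
        self.width = len(rows[0]) if rows else 0


class Element:
    def __init__(self, left, top, right, bottom, parent=None):
        # Corners are in the Page image's coordinates; right and bottom
        # are treated as exclusive here (an assumption of this sketch).
        self.left, self.top = left, top
        self.right, self.bottom = right, bottom
        self.parent = parent
        self.parts = []
        if parent is not None:
            parent.parts.append(self)

    def page(self):
        """Follow 'parent' links upward until a Page is reached."""
        node = self
        while node is not None and not isinstance(node, Page):
            node = node.parent
        if node is None:
            raise LookupError("element is not contained in any Page")
        return node

    def pixels(self):
        """Cut this Element's rectangle out of the Page image on demand."""
        rows = self.page().image.rows[self.top:self.bottom]
        return [row[self.left:self.right] for row in rows]


class Page(Element):
    def __init__(self, image):
        super().__init__(0, 0, image.width, image.height)
        self.image = image  # the only Element that holds pixel data


class Line(Element): pass
class Glyph(Element): pass


image = Image([[0, 1, 2, 3],
               [4, 5, 6, 7],
               [8, 9, 10, 11]])
page = Page(image)
line = Line(0, 1, 4, 3, parent=page)
glyph = Glyph(1, 1, 3, 3, parent=line)
glyph_pixels = glyph.pixels()
```

Here each Line or Glyph costs four integers plus its links, regardless of how large its rectangle is; a copy of the pixels is made only when 'pixels' is called.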