Conjecture::Element Class Reference

#include <Element.h>

Inheritance diagram for Conjecture::Element:

[Inheritance diagram: Conjecture::Root, Conjecture::Glyph, Conjecture::Line, Conjecture::Page, Conjecture::Region, Conjecture::Word]

Detailed Description

Abstract superclass of a collection of classes representing a part-whole decomposition of a graphic image into smaller and smaller semantic units.


The Element class is at the root of a critical hierarchy of subclasses. Two subclasses (Page and Glyph) are the most important classes in the implementation, and much of their functionality is inherited from Element. The other subclasses (Region, Line and Word) provide "refinements" but aren't strictly necessary to the proper execution of the code.

An Element consists of state and functionality supporting a hierarchical decomposition of larger parts into a collection of smaller Elements contained within their "parent" part. The subclasses of Element have a conceptual "size order", and a tree structure links Elements to their parents and to their contained sub-images. Pages are the biggest, Regions are contained within Pages, Lines are contained within Regions or Pages, Words are contained within Lines, Regions, or Pages, and Glyphs are contained within Words, Lines, Regions or Pages.

Implementation subtleties:

Since Elements are part of a part-whole decomposition into smaller and smaller images, the manner in which this decomposition is implemented can have significant time and space efficiency ramifications.

One strategy would be to have each Element subclass maintain a local copy of that portion of the overall image to which it applies. Fields 'height' and 'width' would establish the pixel dimensions, and a 'data' field could store the actual pixel information. For example, each Glyph could maintain a copy of that portion of the Page represented by the Glyph.

The problem with the above naive implementation is that it consumes more and more memory the more sub-divided an Element becomes. Although it is common for Pages to contain just a collection of Glyphs, it is also possible for a Page to contain Regions that contain Lines that contain Words that contain Glyphs (which might contain other Glyphs). In such a situation, the pixel data representing an individual Glyph would be copied (and maintained in memory) up to 5 times (stored in a Glyph, stored in a Word, stored in a Line, stored in a Region, and stored in the Page).

The above memory impact can be avoided by taking a different approach. Instead of each Element maintaining a separate copy of its portion of an image, we note that the largest possible Element is a Page, and all other subclasses of Element are conceptually contained within a Page. If each Element has access to its containing Page, then we can replace the 'data' field with an <x,y> coordinate identifying the top-left pixel within the Page at which the sub-image begins. In addition, we can maintain either a <width,height> pair, or another <x,y> pair representing the bottom-right corner (either one allows computation of the other). By maintaining topleft relative to the Page, we do not need to maintain multiple copies of the data (which is what took up most of the memory in the first variant above).

Because this second strategy significantly reduces memory costs, it is the strategy currently used. However, the interface is separate from the implementation, and we can change the implementation as desired if it becomes necessary.
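The second strategy can be illustrated with a minimal sketch. It does not use the real Conjecture classes: the names sketch::Coord, sketch::PageImage and sketch::SubImage are hypothetical stand-ins, and treating bottomright as an inclusive corner is an assumption, since the text above does not say whether the corner is inclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for Conjecture's Coord type.
struct Coord {
    int x;
    int y;
};

// Hypothetical grayscale page image: the only place pixel data is stored.
struct PageImage {
    int width;
    int height;
    std::vector<unsigned char> pixels;  // row-major, width * height entries

    PageImage(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}

    unsigned char at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }

    void set(int x, int y, unsigned char value) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        pixels[static_cast<std::size_t>(y) * width + x] = value;
    }
};

// A sub-image stores two corners and a pointer to the shared page image,
// never a copy of the pixels.
struct SubImage {
    const PageImage *page;
    Coord topleft;      // page coordinates (assumed inclusive)
    Coord bottomright;  // page coordinates (assumed inclusive)

    int width() const { return bottomright.x - topleft.x + 1; }
    int height() const { return bottomright.y - topleft.y + 1; }

    // Read a pixel using coordinates local to this sub-image.
    unsigned char at(int localX, int localY) const {
        assert(localX >= 0 && localX < width());
        assert(localY >= 0 && localY < height());
        return page->at(topleft.x + localX, topleft.y + localY);
    }
};

}  // namespace sketch
```

However deeply the decomposition nests, each additional sub-image in this sketch costs two coordinates and one pointer rather than another copy of the pixel data.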

An Element has the following fields:

  parent      --> the parent Element
  parts       --> the list of sub parts
  topleft     --> the <x,y> pixel coordinate, within the Page image to which this image belongs, of the top-left corner of this sub image. <x,y> is <0,0> for Page instances.
  bottomright --> the <x,y> pixel coordinate, within the Page image to which this image belongs, of the bottom-right corner of this sub image.

An Element can establish its Page by traversing its 'parent' links until one is found that has no parent (Page instances have a NULL parent).
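The parent-link traversal described above might look like the following sketch. sketch::Node and sketch::rootOf are hypothetical names that model only the 'parent' field; the sketch uses nullptr where the original code uses NULL.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical, stripped-down node holding only the 'parent' link.
struct Node {
    Node *parent;
    explicit Node(Node *p = nullptr) : parent(p) {}
};

// Follow 'parent' links until reaching the node that has no parent.
inline const Node *rootOf(const Node *node) {
    assert(node != nullptr);
    while (node->parent != nullptr) {
        node = node->parent;
    }
    return node;
}

}  // namespace sketch
```

Calling rootOf on a node nested three levels below a parentless node returns that parentless node. As the constructor documentation notes, Glyphs created by the C code can also have a NULL parent, so a real implementation presumably cannot assume that the root it finds is always a Page.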


Public Types

typedef std::list< Element * > ElementList

Public Member Functions

 Element (Element *parent, const Coord &topleft, const Coord &size)
void markInvisible (bool invisible=true)
virtual int type () const
void registerElement (Element *part)
Element * firstElement () const
Element * lastElement () const
Coord size () const
void allGlyphs (std::vector< Glyph * > &glyphs)
void allGlyphs (std::vector< const Glyph * > &glyphs) const
Page * containingPage () const
virtual const Image * pageImage () const
virtual void writeText (std::ostream &os) const
void writeImage (const std::string &filebase, const ImageArgs &config)
virtual Page * asPage ()
virtual Region * asRegion ()
virtual Line * asLine ()
virtual Word * asWord ()
virtual Glyph * asGlyph ()
virtual const Page * asPage () const
virtual const Region * asRegion () const
virtual const Line * asLine () const
virtual const Word * asWord () const
virtual const Glyph * asGlyph () const
int height () const
int width () const
u2 topY () const
u2 bottomY () const
u2 leftX () const
u2 rightX () const
void printStructure (std::ostream &os=std::cerr, const std::string &indent="") const
virtual void printSummary (std::ostream &os=std::cerr, const std::string &indent="", int index=-1) const
void writeGlyphs (const std::string &dir, const ImageArgs &adj) const
virtual std::string id () const
int findIndex (const Element *image) const
ElementList & partsRef ()
const Coord & topleft () const
const Coord & bottomright () const
const ElementList & parts () const
const Element * parent () const
bool invisible () const

Static Public Member Functions

static void test (int argc=0, const char *argv[]=NULL)

Protected Member Functions

void topleftIs (const Coord &topleft)
Coord & topleftRef ()
void bottomrightIs (const Coord &bottomright)
Coord & bottomrightRef ()
void partsIs (const ElementList &parts)
void parentIs (Element *parent)
Element *& parentRef ()
void invisibleIs (bool invisible)

Static Protected Attributes

static char * ClassName [6]


Constructor & Destructor Documentation

Conjecture::Element::Element ( Element * parent,
const Coord & topleft,
const Coord & size )
 

Each Element specifies its parent.

Checks are performed to ensure that the parent specified can be a parent of the image in question. See the 'type' method and 'parent' field for more details.

It is possible for parent to be NULL. When the reliance on C code is removed, this should only occur for Page instances (which do not have an Element parent), but currently the C code creates by-value Glyphs using a default constructor, so we must allow them to have NULL parents as well. Note that the Page constructor always passes a NULL pointer to its parent constructor, and that Glyph has a default value of NULL, while the other subclasses require a real image.


Member Function Documentation

void Conjecture::Element::allGlyphs ( std::vector< Glyph * > & glyphs )
 

Accumulate all glyphs in this Element.

Adds to the 'glyphs' vector all Glyph images within this Element, found by recursively descending the 'parts' field.
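A recursive descent of this kind can be sketched as follows. sketch::Node, its isGlyph flag and sketch::collectGlyphs are hypothetical. The depth-first order, and the choice to collect only descendants rather than the starting node itself, are assumptions and not documented behaviour.

```cpp
#include <cassert>
#include <list>
#include <vector>

namespace sketch {

// Hypothetical node: a flag instead of a real Glyph subclass.
struct Node {
    bool isGlyph;
    std::list<Node *> parts;  // immediate sub-parts
    explicit Node(bool glyph = false) : isGlyph(glyph) {}
};

// Append every glyph below 'node' to 'glyphs', depth-first.
inline void collectGlyphs(Node *node, std::vector<Node *> &glyphs) {
    assert(node != nullptr);
    for (Node *part : node->parts) {
        if (part->isGlyph) {
            glyphs.push_back(part);
        }
        // Recurse even into glyphs, since glyphs may contain other glyphs.
        collectGlyphs(part, glyphs);
    }
}

}  // namespace sketch
```

Because the vector is passed by reference and only appended to, a caller can accumulate glyphs from several elements into the same vector.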

virtual Page* Conjecture::Element::asPage ( ) [inline, virtual]
 

Although there are good reasons for superclasses not knowing about subclasses, I find the convenience of these downcasting methods too useful to avoid. Redefinitions are provided in subclasses for the relevant method (Page::asPage() returns 'this', Glyph::asGlyph returns 'this', etc.)

Reimplemented in Conjecture::Page.
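The downcasting pattern can be sketched with a reduced hierarchy. The classes below are hypothetical stand-ins in a sketch namespace, and having the base-class versions return a null pointer is an assumption: the documentation above only states what the subclass redefinitions return.

```cpp
#include <cassert>

namespace sketch {

class Page;
class Glyph;

class Element {
public:
    virtual ~Element() {}
    // Assumed default: an Element that is not a Page or Glyph yields null.
    virtual Page *asPage() { return nullptr; }
    virtual Glyph *asGlyph() { return nullptr; }
};

class Page : public Element {
public:
    Page *asPage() override { return this; }
};

class Glyph : public Element {
public:
    Glyph *asGlyph() override { return this; }
};

// Hypothetical helper for callers that already know 'e' must be a Page.
inline Page &requirePage(Element &e) {
    Page *page = e.asPage();
    assert(page != nullptr);
    return *page;
}

}  // namespace sketch
```

Compared with dynamic_cast, this style makes each downcast an ordinary virtual call, at the cost of the superclass naming its subclasses, which is the trade-off the note above acknowledges.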

Page * Conjecture::Element::containingPage ( ) const
 

Returns the Page within which this Element resides.

If the Element is a Page, returns itself. Otherwise, the Element is a sub-region of a Page (and the Page of which it is a sub-region is returned).

This method is important because the 'topleft' and 'bottomright' fields of an Element describe coordinates within the Image associated with Page. That is, Element subclasses other than Page do not maintain their own local Images, but instead maintain information establishing what sub-region of the Page Image they apply to. Doing so allows us to avoid maintaining relative coordinates and redundant copies of parts.

int Conjecture::Element::findIndex ( const Element * image ) const
 

Returns the index of 'image' within this Element's list of subparts.

Returns -1 if the image is not an immediate sub-image of this image.
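A linear search with the same contract might be sketched as below. sketch::Item and the free function sketch::findIndex are hypothetical, and zero-based indexing is an assumption, since the documentation does not state the index base.

```cpp
#include <cassert>
#include <list>

namespace sketch {

// Hypothetical placeholder for an Element.
struct Item {};

// Position of 'target' within 'parts' (zero-based), or -1 if absent.
inline int findIndex(const std::list<Item *> &parts, const Item *target) {
    assert(target != nullptr);
    int index = 0;
    for (std::list<Item *>::const_iterator it = parts.begin();
         it != parts.end(); ++it, ++index) {
        if (*it == target) {
            return index;
        }
    }
    return -1;
}

}  // namespace sketch
```

Only immediate sub-parts are examined, so an item nested deeper in the hierarchy yields -1, matching the behaviour described above.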

int Conjecture::Element::height ( ) const [inline]
 

The height in pixels of this Element.

string Conjecture::Element::id ( ) const [virtual]
 

Returns a unique string identifying this image among all others.

Consists of 'id', followed by integers separated by underscores. Each integer identifies a position within a container (Region, Line, Word, Glyph). The result is designed to be useful in filenames.

NOTE: This is not an efficient method - it is only meant to be used for debugging purposes, not core functionality.
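One way such an identifier could be assembled is sketched below. sketch::Node, sketch::indexIn and sketch::makeId are hypothetical names. The exact output format (an underscore before every integer, a bare "id" for the root, zero-based positions) is an assumption that the description above does not confirm.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical node that registers itself with its parent on construction.
struct Node {
    Node *parent;
    std::list<Node *> parts;
    explicit Node(Node *p = nullptr) : parent(p) {
        if (parent != nullptr) {
            parent->parts.push_back(this);
        }
    }
};

// Zero-based position of 'child' among the parts of 'container', or -1.
inline int indexIn(const Node *container, const Node *child) {
    int index = 0;
    for (const Node *part : container->parts) {
        if (part == child) {
            return index;
        }
        ++index;
    }
    return -1;
}

// Build "id" followed by one "_<index>" per level, outermost level first.
inline std::string makeId(const Node *node) {
    assert(node != nullptr);
    std::vector<int> path;  // indices collected from 'node' up to the root
    for (const Node *n = node; n->parent != nullptr; n = n->parent) {
        path.push_back(indexIn(n->parent, n));
    }
    std::ostringstream os;
    os << "id";
    for (std::vector<int>::reverse_iterator it = path.rbegin();
         it != path.rend(); ++it) {
        os << '_' << *it;
    }
    return os.str();
}

}  // namespace sketch
```

Each level of the walk performs a linear search through a list, which is consistent with the note above that the method is intended for debugging rather than core functionality.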

void Conjecture::Element::markInvisible ( bool invisible = true ) [inline]
 

Marks the element as invisible - it can be ignored by all processing that is directed to producing text.

const Image * Conjecture::Element::pageImage ( ) const [virtual]
 

The Image that this Element is a sub-region of.

Every subclass of Element stores topleft and bottomright coordinates relative to an Image stored within a Page. This method returns the Image.

Not named 'image', to avoid confusion with the accessors in Page for field 'image'.

void Conjecture::Element::printStructure ( std::ostream & os = std::cerr,
const std::string & indent = ""
) const
 

Print out the containment hierarchy for this image.

virtual void Conjecture::Element::printSummary ( std::ostream & os = std::cerr,
const std::string & indent = "",
int index = -1
) const [virtual]
 

Print out information about this Element.

Reimplemented in Conjecture::Glyph, Conjecture::Line, Conjecture::Page, Conjecture::Region, and Conjecture::Word.

void Conjecture::Element::registerElement ( Element * part )
 

Add an Element to the set of parts within myself.

void Conjecture::Element::test ( int argc = 0,
const char * argv[] = NULL
) [static]
 

Unit testing method.

This static method should create instances of the class (and instances of any other class necessary) and perform tests to ensure that all methods within the class are working as expected.

Reimplemented in Conjecture::Glyph, Conjecture::Line, Conjecture::Page, Conjecture::Region, and Conjecture::Word.

int Conjecture::Element::type ( ) const [virtual]
 

Returns an integer establishing how "small" this image type is, relative to other image types. It has nothing to do with width or height, but instead with the conceptual size of the type itself. All instances of a particular subtype always return the same value. The code is designed so that Glyph returns a larger number than Word, which is larger than Line, which is larger than Region, which is larger than Page. This allows us to perform some sanity checks on hierarchical decompositions to ensure that we don't make silly structures in which Lines have Glyphs as parents, etc.

FUTURE FIX: This method should be pure-virtual, but making it pure-virtual causes compilation failure (pure virtual method invoked in constructor).
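The sanity check enabled by this ordering could be sketched as below. The enumerator values are assumptions (they mirror the order of the ClassName array but are not confirmed to be what type() returns), and sketch::canContain and sketch::checkNesting are hypothetical helpers. Allowing a Glyph inside a Glyph follows the class description, which mentions Glyphs that contain other Glyphs. Whether other same-type nesting is legal is not stated, so the sketch rejects it.

```cpp
#include <cassert>

namespace sketch {

// Assumed values: larger number means conceptually smaller type.
enum Kind { PAGE = 1, REGION = 2, LINE = 3, WORD = 4, GLYPH = 5 };

// True if an element of type 'parentType' may hold one of 'childType'.
inline bool canContain(int parentType, int childType) {
    if (parentType == GLYPH && childType == GLYPH) {
        return true;  // glyphs may contain other glyphs
    }
    return parentType < childType;
}

// Abort (in debug builds) on a nonsensical parent/child pairing.
inline void checkNesting(int parentType, int childType) {
    assert(canContain(parentType, childType));
}

}  // namespace sketch
```

With these values a Page may hold a Glyph directly, while a Glyph may not hold a Line, which is the kind of "silly structure" the ordering is meant to rule out.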

Reimplemented in Conjecture::Glyph, Conjecture::Line, Conjecture::Page, Conjecture::Region, and Conjecture::Word.

int Conjecture::Element::width ( ) const [inline]
 

The width in pixels of this Element.

void Conjecture::Element::writeGlyphs ( const std::string & dir,
const ImageArgs & adj
) const
 

Creates a .pgm file for every Glyph within this Element.

void Conjecture::Element::writeImage ( const std::string & filebase,
const ImageArgs & config )
 

Write image sub-region to disk.

Creates a file (with prefix 'filebase' and suffix '.pgm') for the sub-region of the pageImage representing this Element. The output may be magnified, thresholded, and/or filtered. See the ImageArgs documentation for more on how to perform such modifications.
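As a rough illustration of exporting a sub-region, the sketch below writes a rectangular crop of a page image in the plain-text PGM variant (magic number P2) to any output stream. sketch::Gray and sketch::writePlainPgm are hypothetical. Whether Conjecture emits plain or binary PGM is not stated, the corners are assumed inclusive, and the magnify, threshold and filter options of ImageArgs are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

namespace sketch {

// Hypothetical 8-bit grayscale image, row-major.
struct Gray {
    int width;
    int height;
    std::vector<unsigned char> pixels;
};

// Write the inclusive rectangle [left..right] x [top..bottom] of 'page'
// as a plain (ASCII) PGM.
inline void writePlainPgm(std::ostream &os, const Gray &page,
                          int left, int top, int right, int bottom) {
    assert(0 <= left && left <= right && right < page.width);
    assert(0 <= top && top <= bottom && bottom < page.height);
    // Header: magic number, crop width and height, maximum gray value.
    os << "P2\n"
       << (right - left + 1) << ' ' << (bottom - top + 1) << "\n255\n";
    for (int y = top; y <= bottom; ++y) {
        for (int x = left; x <= right; ++x) {
            if (x > left) {
                os << ' ';
            }
            std::size_t offset = static_cast<std::size_t>(y) * page.width + x;
            os << static_cast<int>(page.pixels[offset]);
        }
        os << '\n';
    }
}

}  // namespace sketch
```

Passing a std::ofstream opened on filebase plus ".pgm" would produce a file on disk, while a std::ostringstream is convenient for inspecting the output in tests.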

virtual void Conjecture::Element::writeText ( std::ostream & os ) const [virtual]
 

Writes a textual representation of this element to the given output stream.

Reimplemented in Conjecture::Glyph, Conjecture::Line, Conjecture::Page, and Conjecture::Word.


Member Data Documentation

char * Conjecture::Element::ClassName [static, protected]
 

Initial value:

 {
        "Element",
        "Page",
        "Region",
        "Line",
        "Word",
        "Glyph",
    }


The documentation for this class was generated from the following files:
Generated on Thu Jun 15 19:56:11 2006 for Conjecture by  doxygen 1.4.6