Guidelines For Modifying and Extending The Conjecture Framework
[wmh: these are preliminary and subject to change as the Conjecture
community grows - currently just my own personal set of coding
conventions - please forgive the soap-box :-)]
Readability, code-reuse, abstraction, modularity, encapsulation and
information hiding aren't just buzzwords, they are critical to smooth,
ongoing evolution of code. And they don't occur by magic. Here are
some guidelines to follow when working on the Conjecture framework.
- Every class is documented. Every method is documented. Every field
is documented. Use '///' or '/** ...*/' so that the
class/method/field documentation is available to doxygen. The UML
diagrams are to be kept up-to-date. Overall changes in design are
kept up-to-date.
- Every class should have a static 'Test(int argc, char* argv[])'
method that acts as a unit test. This method should create one or
more instances of the class (and any other classes needed to get
the class to work) and test the public interface methods, checking
return results/output against expected results to verify that it is
working as expected. The 'main' program provides special support
for invoking these unit-testing methods instead of performing
normal OCR processing, allowing convenient per-class testing.
- All fields are defined in private scope and are preceeded by an
underscore. Accessors are defined for each field. See the
more detailed discussion on 'Accessors' for more.
- getters can be public, but should ALWAYS have read-only
semantics, and should be moved out of 'public' scope if the
field isn't critical to the public interface.
- setters should never be at public scope. Use 'friend's
when classes need access to the non-public functionality
of other classes.
- Classes should be fully initialized by their constructors. Partial
initialization requires class clients to remember to perform
additional initialization before the class is safe to be used, and
completely defeats the point of having an initializer to begin
with! Furthermore, partial-initialization constructors require the
introduction of public access to internal state (directly or
indirectly via public read-write accessors). Note that this mandate
implies that classes should NOT have default constructors unless
the default values assigned to fields are not only intuitive and
meaningful, but also useable as-is.
- Avoid symbol conflicts.
- All classes are placed in the Conjecture
namespace.
- No global variables.
- Do NOT place 'using namespace std' statements in header files.
Yes, I know it allows you to avoid putting 'std::' in front of
all sorts of types, but putting such a statement in header files
forces third-party code to fully-qualify class names that
conflict with those in 'std'.
- The use of 'using namespace std' in source files is completely ok.
- Avoid assuming there will only be one instances of any particular
class. Support for multi-threading/parallel/distributed algorithms
suggests that even classes that seem like Singletons may not be.
- Write code in a thread-safe manner. Avoid non-const static
variables where possible.
- Document important changes in the $CONJECTURE/ChangeLog file.
Especially important to document is the introduction, changing or
removal of public interface methods in any kernel or utils class.
Concise notes on *everything* are probably better than detailed
notes on a few things. Detailed notes can be placed in appropriate
system/class/method documentation.
- All methods should be virtual by default. It is unlikely that the
minor gains in efficiency from having a non-virtual method will
significantly improve performance. The exception to this rule is
setters and getters, which can incur a significant efficiency hit
if made virtual because they can then no longer be inlined. Other
exceptions include methods that are expected to be executed
millions of times.
- If a method is virtual, explicitly specify the 'virtual' keyword in
the class header even when doing so is redundant. This convention
allows everyone to establish whether a method is virtual or not by
looking in a single header file, instead of having to search every
header file up the entire inheritance path when virtualness is in
doubt. Although some programming environments make this convention
unnecessary, program for all environments, not just your own.
- The "appearance" of code can have a significant impact on
encouraging or discouraging others to contribute. If the code is
"open" and "accessible", it is more likely people will be motivated
to modify and extend. If the code is obscure, cluttered, or
undocumented, it is much less likely that people will contribute.
Make efforts to make your code esthetically pleasant. Document
classes and methods, of course. Within each method, document each
conceptual "block" of code. One line of natural-language
description can make 10 lines of code much more accessible!
Accessors
Suppose we have a field called 'name', of type 'Type' (for example,
'Type' might be 'int' or 'A*'). One way to define it would be:
class A {
public:
Type name;
}
And then an individual would read/write the field using:
A* a = new A;
a->name = 10;
cout << "name = " << a->name << endl;
A public field is a terrible implementation decision because:
- All code, everywhere, is allowed to modify (and thus potentially
corrupt) its value. There is no "safety" available and
modification-at-a-distance is difficult to debug.
- If, during the evolution of the code, the type of the field
(or the existence of the field) comes into question, changes
are both difficult, and guaranteed to break any third-party
code that relies on this field.
However, using a simple strategy can address both of these problems
and add some additional capability. And in C++, this comes at
absolutely no runtime performance cost, as we'll see below.
Instead of defining our field as shown above, we do the following:
class A {
public:
inline const Type& name() const { return this->_name; }
protected:
inline void nameIs(Type val) { this->_name = val; }
inline Type& nameRef() { return this->_name; }
private:
Type _name;
}
In the above, we have moved the field definition itself into private
scope, and places an underscore in front of it. We have also added
three tiny inlined methods providing get/set access to the field.
Features worth noting:
- The 'name()' method returns a const version of the underlying
field. It allows 'read-only' access to the field, but does not
allow it to be modified. It is often at public scope, but should
be placed in protected scope if the field is not part of the
public interface. Obviously, the fewer field accessors in the
public interface, the more separation between implementation and
interface.
- The 'nameIs()' method sets the value of the field to its single
argument. This method should almost never be placed at public
scope. Classes should be designed so that they, are a small set of
tightly-coupled friend classes, are responsible for modifying
themselves. In this way, corruption of state is much easier to
identify, because only a small number of classes can be
responsible, and ultimately the change had to have occured via
this setter method.
- The 'nameRef()' method has exactly the same code block as
'name()', but its return value and non-const status make it
much more powerful. It too should almost never be placed at
public scope, as it provides a second means of modifying the
field. For example, if Type is 'int', we can write:
because 'nameRef()' returns a reference to the 'int' field. This
accessor is useful because anything we could do with the
underlying field 'a->_name', we can also do with 'a->nameRef()'.
For example:
cout << a->_name << endl;
a->_name += 10;
++a->_name;
is the same as
cout << a->nameRef() << endl;
a->nameRef() += 10;
++a->nameRef();
except that using 'nameRef' provides for some separation of
interface and implementation. Note that since 'nameRef' is
inline, the above two sets of lines really are identical
after compilation.
Additional Features
- The type of the argument in 'nameIs()' can sometimes be made
'const', but when it is a pointer type, it cannot be made 'const'
unless the underlying field itself is 'const'.
- Although many C++ programmers avoid using 'const', constness
can have a significant positive impact on code quality. Rather
than solving 'const' issues by not using 'const', it takes only
a little effort to understand the basic principles. Remember that:
- If a variable is 'const', it cannot be modified. That is, if
it appears on the right-hand side of an assignment statement,
the compiler will generate an error.
- 'const'ness is contagious - if one variable is 'const', it
often forces other variables or methods to also be 'const',
which may in turn force other variables or methods to be
'const', etc. etc. Because of this, the strategy of "I'll get
my code working without 'const' first, then add 'const'
semantics in later" is a terrible idea, because it is
difficult to "add in 'const'" incrementally - making one
variable/method 'const' requires many others to be made const
before the code will compile again. Because of this, it is
much better to always make EVERYTHING (variables and methods)
'const' by default, at least conceptually, then remove
'const' when it is established that the variable/method in
question cannot be 'const' and still retain the desired
semantics.
- If a variable is 'const', then only 'const' methods can
be invoked with that variable as a receiver. For example:
const Person p;
p.somefunc();
Only if 'somefunc' is marked as 'const' in its definition:
void somefunc() const { ... }
will the above 'p.somefunc();' invocation be allowed by
the compiler.
- A non-static method f() defined within a class A has a
special variable called 'this'. Since it is a variable,
it must have a type. The type of 'this', within class A,
is ALWAYS either 'A* const' or 'const A*const'. Which
one it is depends on whether the method f() is 'const'
or not.
- A method is 'const' if the keyword 'const' appears
AFTER its parameter list.
- If 'this' is of type 'A* const', it means that the
space for the pointer itself cannot be changed, so the
statement
is not allowed. Since the 'this' variable is ALWAYS
'A* const' or 'const A*const', one is never allowed to
modify 'this'. However, a variable being '*const'
does not affect the ability to modify what the variable
points to, so
isn't disallowed.
- If 'this' is of type 'const A* const', it means that
everything above for 'A* const' holds, AND that we
cannot modify what the variable points to, so
becomes a compile-time error.
|
|