[sword-devel] Some Comments on Sword's Code

David White sword-devel@crosswire.org
11 Feb 2002 22:07:18 +1100


While I was looking through the searching code for Sword, I noticed a
few things which seemed little strange, and I thought I'd make comments
on ways in which they could be improved:

there is lots of code in destructors that looks like this:

	if (entrybuf)
		delete [] entrybuf;

this is unnecessary, since C++ guarantees that delete [] NULL; will have
no effect.

Further, there seems to be a large amount of unnecessary memory
allocation and deallocation. For example, entrybuf is allocated as new
char[1]; - and as far as I can see it exists for the life of the object,
and then is deleted in the destructor; why not just use char
entrybuf[1]; ? - it is more efficient, more maintainable and more
reliable.

Instead of using constructor initialiser lists, variables are
initialised inside the body of the constructor; it is generally
considered better to initialise variables properly.

The search function, takes a parameter "int searchType", which is
documented to be a regex search if search is >= 0, a phrase if search is
-1, and a multiword search if phrase is -2. It would be alot cleaner,
and typesafe just to use an enumeration:

enum SearchType { REGEX_SEARCH, MULTIWORD_SEARCH, PHRASE_SEARCH };

Search also takes a pointer to a function, "void (*percent)(char, void
*)". It would be more flexible to make the function templated, and make
percent a function or function-like object that is passed in. That way,
the percentUserData parameter would also be unnecessary, as the user
could pass in a functoid which stores data if that is the desired
behaviour.

There is alot of code that could be written better using the STL, for
example:

for (unsigned int i = 0; i < strlen(wordBuf); i++)
	wordBuf[i] = toupper(wordBuf[i]);

to start with, this is inefficient, as it runs in O(n ^ 2) time, because
of all the calls to strlen, secondly it could be done much more cleanly
using,

std::transform(wordBuf, wordBuf+strlen(wordBuf), wordBuf, toupper);

I think consideration should be given to using the STL more extensively
in general. Use of std::string is often more efficient, elegant and
maintinable than use of C-style strings, and I think consideration
should be made of it's use (but yes, this would be a substantial rework
of the code). An example of a better way to write code using STL is in
rawtext.cpp, around line 333:

			words = (char **)calloc(sizeof(char *), 10);
			int allocWords = 10;
			words[wordCount] = strtok(wordBuf, " ");
			while (words[wordCount]) {
				wordCount++;
				if (wordCount == allocWords) {
					allocWords+=10;
					words = (char **)realloc(words, sizeof(char *)*allocWords);
				}
				words[wordCount] = strtok(NULL, " ");
			}

I see this code as being unnecessary complicated and error prone. What
if something through an exception? What would happen to the buffer
pointed to by words? This code can be rewritten more elegantly:

std::vector<std::string> allocWords;
std::istringstream stream(wordBuf);
std::string s;

std::copy(
          std::istream_iterator<std::string>(wordBuf),
          std::istream_iterator<std::string>(),
          std::back_inserter(allocWords)
         );

There are numerous calls to C-style allocation functions (calloc and
realloc), I would suggest considering using the more typesafe and
powerful std::vector. In particular, it is very difficult to make
C-style allocated code exception-safe, and if you really have to use it,
I would suggest using an auto_array class, in the spirit of
std::auto_ptr. Also, a problem with C-style allocation, is it makes it
even more difficult to deallocate memory properly, since you have to be
careful whether to use free() or delete or delete [], corresponding to
whether the memory was allocated using malloc/calloc/realloc, new, or
new []. Worse, on many compilers, allocating with new [] and freeing
with free() is fine, but only when you port to another platform will you
find lots of weird bugs.

I also noticed lines that looked like this:

indexes.erase(indexes.begin(), indexes.end());

this should be: indexes.clear();

I also noticed, that other than dynamic casts, Sword seems to
exclusively use C-style casts. I would recommend not using C-style
casts, as they are more dangerous, and might do an unexpected const_cast
or reinterpret_cast - both of which can easily cause undefined
behaviour. Generally, try to use static_casts instead of C-style casts.

Sword often uses the statement using namespace std; It is generally
considered bad style to import an entire namespace, particularly in a
header file. Safer is to just import the names you actually use as in,
using std::list; or simply explicitly qualify all uses of names. I
prefer the latter option personally, although I understand use of the
former.

This is getting rather voluminous, so I think I'll stop now. I apologise
if any of my comments seem like unnecessary criticisms, they are
intended rather to be helpful comments on how to improve the quality of
the code base. I can assure you that the code to my program is far worse
than Sword's code (I really had no idea what I was doing when I first
started writing it). Also, some of my above comments may be invalid,
since use of a subset of C++'s features may be intentional to allow
portability to some platforms with poor C++ support.

I would be happy to comment further if any of my comments have proved
helpful or instructive.

God Bless,

David.