[sword-devel] SwordReader PocketPC and SmartPhone CAB files.

Mon Jul 7 11:27:44 MST 2008

On Jul 7, 2008, at 1:41 PM, David Trotz wrote:

> Troy,
>>
>> You of all people should know this... :)  SWORD stores all the tags
>> inline in the text.  When you start turning things OFF is when all  
>> the
>> filters have to do work to remove things.  If everything is left on,
>> then the text is just handed back as it was stored.  I'm still not
>> convinced. :)
>>
> I should have known it, but I was honestly confused, but it does make
> more sense the way you describe it. My guess is I need a better way to
> grow my text buffer. It must be what is slowing me down so much.  
> 'Cause
> if I push in a single very long string I can render very fast, but  
> when
> I push in a verse at a time, I render very slowly. I will do some more
> testing.

I've run into this before. Here are some things I have tried. Maybe  
one of them will work well in your situation:
1) Over-allocate for growth.
When allocating exactly for growth (e.g. 1 verse at a time) the  
algorithm is O(n^^2)

Using a doubling algorithm (everytime you run out of space, allocate  
to the next 2^^n boundary, or allocate 2x what you need. This will be  
O(nlogn), IIRC.

I've tried a doubling algorithm, whose impact is that it can grossly  
over allocate.

It would be better to use an algorithm that starts out by aggressive  
over-allocation, but eases up with time. If you know nothing about  
your input, then the following might be useful. I'll be replacing  
Lucene's doubling algorithm with the following taken from Python:
	/* This over-allocates proportional to the list size, making room
	 * for additional growth.  The over-allocation is mild, but is
	 * enough to give linear-time amortized behavior over a long
	 * sequence of appends() in the presence of a poorly-performing
	 * system realloc().
	 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
	 */
	new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;

I think you can do better when you know something about the input. For  
example, you can know the average size of a verse and the maximum  
length of a chapter. Together you can use these to create a custom  
series for growth. It is also possible to cheaply get the size of each  
chapter in the module. This could be useful in creating a heuristic  
estimate.

2) Consumer-producer. I don't know if it is possible to have threads  
or co-processes. But you might be able to arrange for a producer to  
provide the next desired verse and a consumer to render the HTML.

3) Use "pages" for large output. Currently, the user can only see a  
bit at a time and has to do something to see the next bit. So in a way  
the user is used to paging.

We do this in Bible Desktop when the output is more than 50 verses.  
(The user can configure this limit and we are changing it to 200  
because common desktop hardware is getting better.) Visually we use  
bottom tabs, but our users find it confusing and we are looking for  
something more obvious.

In Him,
	DM