[jsword-devel] Getting the global key list for a book

Chris Burrell chris at burrell.me.uk
Sat Feb 16 05:30:59 MST 2013


Hi DM

On the following:
*I'm wondering whether it'd be cheaper to build a list of those that aren't
present. (Just wondering).*
CJB>> Perhaps, but we'll still need to set each bit in the bitset. And for
Old Testaments that only have the Psalms this may be slower. But are you
pointing towards having an AntiBitSetPassage which would store the results
in reverse? I've got this running fast enough for me, i.e. 3ish seconds to
get all global lists from all modules. (generates about 13MBs of sitemap,
so my lost time is now spent in network transfer). So I'm happy enough.

*Also have been musing on whether it'd be good to modify Bitwise passage to
hold two testament bit maps. And null if not needed. Or maybe one per book.
*
CJB>>Not entirely sure how this would help in this case, because presumably
the BitSet is a O(1) operation for setting. I guess it speeds up for
contains() in that you may not need to look at a number of bits. Before we
go down that root, it would make sense to look at exactly where time is
being spent on an Android device. I'm thinking we're better spending time
on optimizing those areas (which are likely to be XML related) rather than
the other ones.

It might be good to add a method to SwordUtils with the same decode params
and determines whether the array has content. (i.e. is zero). This is in a
tight loop, so would be fast.
CJB>> Not quite sure what you're getting at here. I did add a comment,
which effectively said that:

                    // can this be simplified to temp[8] == 0 && temp[9] ==
0?
                    int verseSize = SwordUtil.decodeLittleEndian16(temp, ii
+ 8);

so presumably instead of calling out to decodeLittleEndian16(), we could
simply have the expression in the comment? But again, small performance (<
1% I reckon) improvements compared to the overall operations.

I've committed an extra couple of bits as per your original comments.
Chris



On 15 February 2013 22:21, Chris Burrell <chris at burrell.me.uk> wrote:

> Let me push my changes:
> https://github.com/tyndale/jsword/commit/c0d50633b28348185035a59bc9738672a0db5b42
>
>
> (There's the missing data file stuff in there too - let me know if you
> want it separated out).
>
> The reason for the name change is that there was no method initially. it
> just used contains() on the backend. The Book interface has not changed,
> but tries a call to backend.getFGKL() and if the method is unsupported it
> reverts back to the existing functionality. The name change was to
> highlight that SwordBook has a good way of getting the FGKL
>
>
> I did this before I realised we only had 1 implementation of a
> AbstractPassageBook. And I moved the default from AbstractPassageBook
> because it required access to backend, so I had a choice of exposing the
> backend or moving the function. I'm not thinking entirely straight and it
> may be have been better to expose backend instead.
>
> I take your comments in both emails and I'll look at implementing some of
> them tomorrow.
>
>     /* (non-Javadoc)
>      * @see org.crosswire.jsword.book.Book#getGlobalKeyList()
>      */
>     public final Key getGlobalKeyList() {
>         if (global == null) {
>             try {
>                 global = this.backend.getFastGlobalKeyList();
>                 return global;
>             } catch (UnsupportedOperationException ex) {
>                 // fail silently, operation not supported by the backend
>                 log.debug(ex.getMessage());
>             } catch (BookException ex) {
>                 // failing silently, as previous behaviour was to attempt
> to
>                 // return as much as we can using the slower method
>                 log.debug(ex.getMessage());
>             }
>
>             Versification v11n = super.getVersification();
>
>             global = super.createEmptyKeyList();
>             Key all = PassageKeyFactory.instance().getGlobalKeyList(v11n);
>
>             for (Key key : all) {
>                 if (contains(key)) {
>                     global.addAll(key);
>                 }
>             }
>         }
>
>         return global;
>     }
>
>
> Chris
>
>
> On 15 February 2013 22:08, DM Smith <dmsmith at crosswire.org> wrote:
>
>> Contains(xxx) is correct. So don't need to concern about the indirection.
>>
>> There are three parts to a compressed testament. The one index points to
>> a second index which points to the data.
>> -- DM
>>
>> On Feb 15, 2013, at 5:05 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>
>> I'm confused about the double indirection in that the contains() method
>> doesn't use it which is what I based my bit on.
>>
>> It simply checked for the size given an ordinal. So I'm doing the
>> opposite, getting all the data first, then iterating through them looking
>> for non-zero sizes
>>
>> I'm pretty sure it works (tested a little bit). It's shaved off heaps off
>> time (several minutes to a few seconds for all 250 modules I have
>> installed), but obviously that's no good if it isn't doing the right thing!
>> Chris
>>
>>
>>
>> On 15 February 2013 21:59, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> I like the direction this is heading. I'm not sure if it properly
>>> handles the double indirection of the compressed module. (I didn't look.)
>>> So assuming it is....
>>>
>>> Just directly create and use a bitwise passage. don't get it from the
>>> factory.
>>>
>>> The purpose of the factory was to be able to switch between
>>> implementations.
>>>
>>> I added the two bitwise passage classes and it is pretty much all we use
>>> now.
>>>
>>> I'm wondering whether it'd be cheaper to build a list of those that
>>> aren't present. (Just wondering).
>>>
>>> Also have been musing on whether it'd be good to modify Bitwise passage
>>> to hold two testament bit maps. And null if not needed. Or maybe one per
>>> book.
>>>
>>> Anyway, by writing to an api we can swap out the implementation. (We
>>> should add the new method as an overload by name to Passage and abstract
>>> passage. That way you don't need to know what kind of passage it is.)
>>>
>>> It might be good to add a method to SwordUtils with the same decode
>>> params and determines whether the array has content. (i.e. is zero). This
>>> is in a tight loop, so would be fast.
>>>
>>>
>>> On Feb 15, 2013, at 3:26 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>
>>> Hi DM
>>>
>>> I'm about to test the following method for at least all Z backends:
>>>
>>> Maybe you can tell me what you think of it:
>>>
>>> *getFastGlobalKeyList*
>>>
>>> public Key getFastGlobalKeyList() throws BookException {
>>>         ZVerseBackendState rafBook = null;
>>>         try {
>>>             rafBook = initState();
>>>
>>>             String v11nName =
>>> getBookMetaData().getProperty(ConfigEntryType.VERSIFICATION).toString();
>>>             Versification v11n =
>>> Versifications.instance().getVersification(v11nName);
>>>
>>>             Testament[] testaments = new Testament[] {
>>>                     Testament.OLD, Testament.NEW
>>>             };
>>>
>>>             Key globalList =
>>> PassageKeyFactory.instance().createEmptyKeyList(v11n);
>>>             Passage passage = KeyUtil.getPassage(globalList, v11n);
>>>             BitwisePassage bitwisePassage = null;
>>>             if(passage instanceof BitwisePassage) {
>>>                 bitwisePassage = (BitwisePassage) passage;
>>>                 bitwisePassage.raiseEventSuppresion();
>>>                 bitwisePassage.raiseNormalizeProtection();
>>>             }
>>>
>>>
>>>             for (Testament currentTestament : testaments) {
>>>                 RandomAccessFile compRaf = currentTestament ==
>>> Testament.NEW ? rafBook.getNtCompRaf() : rafBook.getOtCompRaf();
>>>
>>>                 // If Bible does not contain the desired testament, then
>>> false
>>>                 if (compRaf == null) {
>>>                     // no keys in this testament
>>>                     continue;
>>>                 }
>>>
>>>                 int maxIndex = v11n.getCount(currentTestament);
>>>
>>>                 // Read in the whole index, a few hundred Kb at most.
>>>                 byte[] temp = SwordUtil.readRAF(compRaf, 0,
>>> COMP_ENTRY_SIZE * maxIndex);
>>>
>>>                 // for each block of 10 bytes, we consider the last 2
>>> bytes.
>>>                 for (int ii = 0; ii < temp.length; ii += 10) {
>>>                     // can this be simplified to temp[8] == 0 && temp[9]
>>> == 0?
>>>                     int verseSize = SwordUtil.decodeLittleEndian16(temp,
>>> 8);
>>>
>>>                     //can this be optimized even further - i.e. why
>>> decodeOrdinal, when add() go simply pass in and store an ordinal
>>>                     if (verseSize > 0) {
>>>                         if(bitwisePassage != null) {
>>>                             bitwisePassage.addVersifiedOrdinal(ii % 10);
>>>                         } else {
>>>                             globalList.addAll(v11n.decodeOrdinal(ii %
>>> 10));
>>>                         }
>>>                     }
>>>                 }
>>>             }
>>>
>>>             if(bitwisePassage != null) {
>>>                 bitwisePassage.lowerNormalizeProtection();
>>>                 bitwisePassage.lowerEventSuppressionAndTest();
>>>             }
>>>
>>>             return globalList;
>>>         } catch (IOException e) {
>>>             throw new BookException(JSMsg.gettext("Unable to read key
>>> list from book."));
>>>         } finally {
>>>             IOUtil.close(rafBook);
>>>         }
>>>     }
>>>
>>>
>>> *addVersifiedOrdinal (is just a simplification of add, so that we can
>>> give the ordinals directly, rather than converting to a verse and back
>>> again).*
>>> /**
>>>      * A shortcut to adding a key, by ordinal. The ordinal needs to be
>>> taken from the same versification as the passage being created.
>>>      *
>>>      * @param ordinal the ordinal
>>>      */
>>>     public void addVersifiedOrdinal(int ordinal) {
>>>         Versification v11n = getVersification();
>>>         optimizeWrites();
>>>
>>>             store.set(ordinal);
>>>
>>>         // we do an extra check here because the cost of calculating the
>>>         // params is non-zero and may be wasted
>>>         if (suppressEvents == 0) {
>>>             Verse verse = v11n.decodeOrdinal(ordinal);
>>>             fireIntervalAdded(this, verse, verse);
>>>         }
>>>     }
>>>
>>>
>>>
>>> On 15 February 2013 20:22, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>> That's what http://www.crosswire.org/tracker/browse/JS-246 is all
>>>> about.
>>>>
>>>> There are a good number of places that we get all the keys for a
>>>> module. But the construction of the list is dumb.
>>>>
>>>> The question is what do you want? Do you want all the possible keys in
>>>> a versification? Or do you want the keys for the verses that are actually
>>>> in a module?
>>>>
>>>> If you only want chapters that exist, consider working off of
>>>> Versification to get the book names and the chapters. For each chapter, get
>>>> the number of verses and grab one (maybe verse 1 or one halfway into the
>>>> chapter) on the assumption that if one exists then the rest do.
>>>>
>>>>
>>>> Calling book.contains(key) is what you want to do. It goes against the
>>>> backend to determine whether the verse is in the module.
>>>>
>>>> Don't call book.getGlobalKeyList(); We probably should deprecate the
>>>> method. It is too easy to call it and it is very expensive for what it does.
>>>>
>>>> I have a 3 day weekend and hope to finish the av11n work. It impacts
>>>> this. Will probably add iterators for a versification.
>>>>
>>>> In Him,
>>>>         DM
>>>>
>>>>
>>>>
>>>> On Feb 15, 2013, at 2:14 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>>
>>>> > Hi all
>>>> >
>>>> > getGlobalKeyList() seems to do a lot of work against the file system.
>>>> >
>>>> > I'm attempting to create a sitemap, which includes a URL to every
>>>> chapter. Presumably getGlobalKeyList is what I want to ensure that I'm only
>>>> displaying those chapters that exist. I was intending to do:
>>>> >
>>>> > globalKeyList = getGlobalKeyList();
>>>> > for(each book in versification) {
>>>> >     myBookKey = book.getValidKey(book.getShortName())
>>>> >     myBookKey.retainAll(globalKeyList);
>>>> >
>>>> >     if(myBookKey.getCardinality() > 0) {
>>>> >         outputSiteMapChapter();
>>>> >     }
>>>> > }
>>>> >
>>>> > So for each version I call getGlobalKeyList(), but that in terms seem
>>>> to make individual reads (through the contains()) method for every
>>>> potential key in the versification.
>>>> >
>>>> > public final Key getGlobalKeyList() {
>>>> >         if (global == null) {
>>>> >             Versification v11n =
>>>> Versifications.instance().getVersification(versification);
>>>> >             global = keyf.createEmptyKeyList(v11n);
>>>> >             Key all = keyf.getGlobalKeyList(v11n);
>>>> >             for (Key key : all) {
>>>> >                 if (contains(key)) {
>>>> >                     global.addAll(key);
>>>> >                 }
>>>> >             }
>>>> >         }
>>>> >         return global;
>>>> > }
>>>> >
>>>> > The contains() method above makes a backend call every time.
>>>> >
>>>> > Isn't there a way to obtain the list of keys from the index? I'm not
>>>> quite sure where the versification comes in here, or is it there simply to
>>>> map the names of the keys to their positions in the index file? If so, can
>>>> we simply read all keys in OT and NT, and check
>>>> >
>>>> > Doing this on GlobalKeyList() would also speed up index creation.
>>>> >
>>>> > Finally, it seems contains() only works on a single key, or takes the
>>>> first key/verse if it's a passage. Presumably that is correct?
>>>> >
>>>> > Chris
>>>> >
>>>> > _______________________________________________
>>>> > jsword-devel mailing list
>>>> > jsword-devel at crosswire.org
>>>> > http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>
>>>>
>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.crosswire.org/pipermail/jsword-devel/attachments/20130216/b29da8f3/attachment-0001.html>


More information about the jsword-devel mailing list