[jsword-devel] New versification mapping - preload mappings to prevent pauses

Chris Burrell chris at burrell.me.uk
Sat Jan 18 15:47:04 MST 2014


Presumably if we have a basic osis ref parser, we can easily construct a
verse range directly. Meaning a lot of the parsing on the versification
mapping is simplified.

The slight complication with the versification mappings is that it
currently copes for the case as follows:

Gen.1.1=Gen.1-2
Gen.1.1=Gen.1.4

i.e. split ranges, meaning that Gen 1:1 would map to verses 1,2 and 4
(leaving verse 3 out).

So if we used a osis ref parser, we could two ranges very very quickly
(Gen.1-2 and Gen.1-4). However, what structure would we use to store the
combination of both? I was thinking a RangedPassage but you suggested the
construction of this would be heavy? Presumably we could introduce a
constructor that takes a collection of VerseRanges, meaning its
construction would be also very quick?

One more point, on the osis ref parser. Presumably we would want to use
BibleBookNames but looks like it already normalizes the booknames to the
English locale. Is there any particular reason for the normalize() method
in the static initializer? Seems redundant to me... (apart from the
lowercase). I suggest we optimize that, saving another hosts of pattern
matches. (only a handful), but on Android it will save a number of method
calls which tend to be more expensive than expected. Is there a strong
reason to have the lowercase? or would it be possible to define them
lowercase to start with? Presumably our osis parser would want to ignore
case...

Chris



On 18 January 2014 22:19, DM Smith <dmsmith at crosswire.org> wrote:

> See embedded.
> On Jan 18, 2014, at 4:05 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> I think to encourage the use of the Passages directly we need to make
> those constructors public. That's the main reason I have stayed away from
> using it.
>
>
> I see that TallyPassage is fully public and the rest are not. This must
> have been so that Lucene ranked searches could use them.
>
> Maybe, we should modify PassageKeyFactory to take an optional PassageType?
>
>
> Obviously a fast way to construct the versification mappings would be to
> turn them into verse numbers. But that means losing readability.
>
>
> I think readability is critical to maintenance. You'll note that the v11n
> tables are readable.
>
> The obvious thought here is to generate an intermediary file as part of
> the build from what we have already code (custom maven plugin/ant task).
> But that seems rather hacky, but would line up with the way the
> versifications are defined in code.
>
> So basically what we want is:
> * an easy way of reading in a verse range/passage from text form into
> Verse/VerseRanges
>
>
> The are a couple of things here that we could benefit from:
> 1) An osisID parser and an osisRef parser. An osisRef is merely two
> osisIDs separated by a dash.
> Note: Gen.1.1-2 is not an osisRef as 2 is not an osisID. To parse "2" the
> parser has to know that the "basis" (the context) of "2" is Genesis 1:1.
> Thus 2 is understood as a verse and not a chapter. However, if the basis
> was Gen.1 then the 2 would be understood as a chapter.
>
> By having fully formed OSIS references*, we gain the following:
> There are no spaces to trim.
> Splitting on - will split into two parts, both of which are fully formed
> osisIDs.
> Splitting an osisID on . will result in 1 to 3 parts. book, book and
> chapter or book, chapter and verse. The verse is always a number, never
> "ff" or a roman numeral.
> Lookup of the book names is simplified. No need to try to figure out that
> Jo means John (or does it mean Job, or Jonah???). Book names are fixed
> ascii, never internationalized, never have spaces, dashes, periods or roman
> numerals.
>
> *Later, when we have references with works and grains, we may want to
> handle those. E.g. ignore them.
>
> 2) Today, conceptually a Passage is an ordered list of Verses and
> VerseRanges that don't overlap. However, JSword does not have a way to
> build a Verse or a VerseRange from a String. Maybe we need a way to
> construct a Passage from bottom up. I.e. create a Verse or VerseRange from
> a String (from osisIDs and osisRefs) and then put them into an initially
> empty Passage. Since we control the input, we know that the VerseRanges do
> not overlap, so maybe we can have a smart constructor that takes a List of
> Verse and VerseRanges and adds them.
>
>
> Basically, the way the code works is to iterate through the properties
> file and map and reverse-map every verse present to the maximum size
> available (i.e. 1 verse to a Passage). The idea behind this is that we
> store less that way.
>
> So if a mapping specifies as follows: Gen.1.1=Gen.1.1-2, then we only need
> to store
> left to right: Gen.1.1=>Gen.1.1-2
> right to left: Gen.1=>Gen.1.1, Gen.1.2=>Gen.1.1
>
> In that way, we've saved a mapping, and therefore memory (assuming we
> represent those ranges correctly).
> Chris
>
>
>
> On 18 January 2014 20:48, DM Smith <dmsmith at crosswire.org> wrote:
>
>>
>> On Jan 18, 2014, at 3:19 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>
>> Great stuff - helps really explain things. So I think for the
>> versification mappings, seen as they don't change much as they are mostly
>> contiguous ranges a RangePassage might be the best bet.
>>
>>
>> A RangedPassage is really good once it is constructed. Building it takes
>> a lot of time. Certainly not good for canonical ordered search results.
>>
>>
>>
>> I think the other thing that's quite expensive
>> is: PassageKeyFactory.instance().getKey(targetVersification,
>> qualifiedKey.getKey().getOsisRef())
>>
>>
>> I think this is expensive because of how it is done.
>> From memory: getOsisRef() returns a string, which needs to be parsed into
>> what getKey() already is. The parser does not handle osisRefs. It throws an
>> exception and the caller then mungs the reference such that the parser can
>> handle it. JSword needs an osisRef parser, which would be much simpler and
>> faster than the current parser that tries to understand what an end user is
>> trying to provide.
>>
>> If the reference is a string (e.g. from a user, from a module, these
>> methods are appropriate), these are what should be used. If not a string,
>> then stay away. It is expensive to convert a Passage to a String and it is
>> expensive to parse a String and to build the Passage.
>>
>> It'd be better to get an empty key list and add qualifiedKey.getKey().
>>
>>
>>
>> I'm not sure what's the best way of dealing with this but I suspect that
>> we may want to have specific cases for Verses as I believe the above is
>> quite heavy as well.
>>
>> I think I also often
>> use: PassageKeyFactory.instance().createEmptyKeyList(versification) which
>> probably should use a RangePassage as well
>>
>>
>> It should use the PassageType that the program finds best. A server side
>> application has very different requirements from a desktop and that from a
>> mobile app.
>>
>> If you need a specific implementation, don't use the factory. Use what is
>> best for the particular location.
>>
>> Just note that if two different implementations are used (like adding one
>> passage to another) the result will probably be the program default.
>>
>>
>> Would that make sense?
>> Chris
>>
>>
>>
>> On 18 January 2014 20:11, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> There are several different implementations for Passage. They have
>>> trade-offs.
>>> BitwisePassage is strictly a BitMap, with one bit for each verse in the
>>> v11n. It is very space heavy but fast for most operations. Especially
>>> adding or removing Verses. Iteration is fast as it uses NextSetBit (or
>>> whatever it is called to find the next set bit).
>>> RocketPassage is a very heavy implementation. Not appropriate for
>>> mobile. It occupies lots more space than a BitwisePassage
>>> PassageTally is basically a weighted set of verses such that you can
>>> order by weights. This is used to return weighted search results. It is
>>> probably not much use otherwise. It is used to return a ranked Lucene
>>> search.
>>> RangedPassage keeps ranges (start verse and length) ordered in a
>>> TreeSet. It probably is a good candidate for mobile. Maybe the best
>>> regarding storage. Works well if the RangedPassage is not modified much. If
>>> you add a verse (not in the RangedPassage) it may cause two VerseRanges to
>>> be merged into one, otherwise it'll extend one or create a new one. That
>>> computation takes time. Removing a verse is also complex.
>>> DistinctPassage keeps verses ordered in a TreeSet. This can be a good
>>> candidate for mobile, if there are not many verses in the Passage. If there
>>> are lots it gets bigger than BitwisePassage.
>>>
>>> The PassageKeyFactory is used throughout JSword to create the preferred
>>> type of Passage. You can set it to one of the above via PassageType
>>> typically at program startup.
>>>
>>> I wrote BitwisePassage; Joe wrote the others.
>>>
>>> I've been thinking that there might be another variation of
>>> BitwisePassage that might make sense. One with testament or book bitmaps.
>>>
>>> Hope this helps.
>>>
>>>  In Him,
>>> DM
>>>
>>>
>>> On Jan 18, 2014, at 2:42 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>
>>> The reason I went for passages is for less memory consumption. I also
>>> thought it would provide better parsing and we're parsing less.
>>>
>>> But I think the way we store passages is to represent every verse by a
>>> bit. means that iterating through ranges, calculating cardinality or
>>> initialising them means is expensive.
>>>
>>> I wonder if we could have a passage that just stores boundaries rather
>>> than all its content. That would be very fast to parse and very quick for
>>> operations on it.
>>>
>>> Chris
>>>  On 18 Jan 2014 19:17, "Martin Denham" <mjdenham at gmail.com> wrote:
>>>
>>>> I think I have found a workaround for this problem (using a background
>>>> thread makes it run very quickly) but it seems to point toward a possible
>>>> point of improvement in the new JSword mapping data structure.
>>>>
>>>> In the old AB v11n mapping code I stored maps of Verses.  However the
>>>> new JSword code seems to store maps of RocketPassage/BitwisePassage.  Can
>>>> you verify this is the best data structure for what will normally be a
>>>> single verse.  I don't know an awful lot about this so I am just asking,
>>>> but maybe a Passage will always consume more memory than a Verse.
>>>>
>>>> Thanks
>>>> Martin
>>>>
>>>>
>>>>
>>>>
>>>> On 18 January 2014 13:30, Martin Denham <mjdenham at gmail.com> wrote:
>>>>
>>>>> I tried your patch and it did noticeably improve things, so it is a
>>>>> worthwhile patch, but did not solve the actual problem.  Still viewing the
>>>>> RusSynodal straight after download takes well over a minute, whereas to
>>>>> boot straight into it takes about 2 secs.
>>>>>
>>>>> One or 2 things I have noticed:
>>>>>
>>>>>    - a huge amount of memory allocation and GC occurs when trying to
>>>>>    view RusSynodal straight after download
>>>>>    - I am not great with the profiler but here are my tentative
>>>>>    observations
>>>>>       - Seems to be spending about half the time in
>>>>>       java.util.BitSet.cardinality()
>>>>>          - called by
>>>>>          BitwisePassage.countVerses()<-RocketPassage.countVerses()<-AbstractPassage.getCardinality()<-VersificationToKJVMapper.add...
>>>>>       - I tried caching AbstractPassage.getCardinality but that had
>>>>>       no affect which would be the case if there were lots of different keys
>>>>>    - Could it be that a module is in an unusual state after download?
>>>>>     I have noticed that a module cannot be deleted directly after download,
>>>>>    until the app is restarted due to:
>>>>>       - sbmd.getConfigFile()==null when called by
>>>>>       SwordBookDriver.isDeletable directly after module download
>>>>>
>>>>> Any suggestions are welcome.
>>>>>
>>>>> Cheers
>>>>> Martin
>>>>>
>>>>>
>>>>> On 16 January 2014 23:13, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>
>>>>>> I think I will use the old AB v11n mapping for the next release of AB
>>>>>> and come back to this after the release which has quite a lot of changes in
>>>>>> it already anyway.
>>>>>>
>>>>>> The problem could be AB, Android, or JSword related and it is not
>>>>>> trivial to reproduce involving repeated re-installs of RusSynodal.  I had a
>>>>>> similar problem a year or two ago, it took a long time to investigate and
>>>>>> ended up being a flaw in the Android VM which required a change to JSword.
>>>>>>
>>>>>> I keep trying to think what could cause these symptoms and drawing a
>>>>>> blank.
>>>>>>
>>>>>> Cheers
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> On 16 January 2014 21:35, Chris Burrell <chris at burrell.me.uk> wrote:
>>>>>>
>>>>>>> I'll try profiling it! Nothing comes to mind though.
>>>>>>>  On 16 Jan 2014 20:04, "Martin Denham" <mjdenham at gmail.com> wrote:
>>>>>>>
>>>>>>>> There is something unusual happening with the v11n mapping creation
>>>>>>>> straight after downloading a file requiring mapping e.g. It takes over 2
>>>>>>>> mins to load the mapping (a single call to mapVerse) for RusSynodal
>>>>>>>> straight after download but a few seconds to load the file normally.
>>>>>>>>
>>>>>>>> 1. View ESV and anything else in split screen
>>>>>>>> 2. Download RusSynodal
>>>>>>>> 3. Exit download screen after download
>>>>>>>> 4. select RusSynodal for display (with ESV) in 1 half of split
>>>>>>>> screen
>>>>>>>> 5. Mapping code now takes over 2 minutes to load mapping
>>>>>>>> 6. But there are no errors and then everything seems fine
>>>>>>>>
>>>>>>>> 7. Exit and kill And Bible
>>>>>>>> 8. Restart with ESV/RST in split screens
>>>>>>>> 9. This time everything is initialised in a few seconds
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>
>>>>>>>> On 16 January 2014 19:06, Chris Burrell <chris at burrell.me.uk>wrote:
>>>>>>>>
>>>>>>>>> Hi Martin
>>>>>>>>>
>>>>>>>>> Fine by me. Think it should be opt in if possible.
>>>>>>>>>
>>>>>>>>> A couple of thoughts on that note. Do you know of it s the io or
>>>>>>>>> the cpu time that's the issue?
>>>>>>>>>
>>>>>>>>> We probably want to opt in with a list as well as all as there's
>>>>>>>>> no point in loading the v11n if not required.
>>>>>>>>>
>>>>>>>>> I guess especially if we end up with mappings per module
>>>>>>>>> eventually.
>>>>>>>>>
>>>>>>>>> On a separate note, I also found the books.installed call is very
>>>>>>>>> expensive. Thinking it may be worth partially loading these. With around
>>>>>>>>> 200 modules we spend almost 15 seconds loading them.
>>>>>>>>>
>>>>>>>>> Finally you can apply for a open source license of jprofiler which
>>>>>>>>> helps massively to work out what's going on. Have got a couple of
>>>>>>>>> uncommitted fixes for books. Installed find with that.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>> On 16 Jan 2014 18:10, "Martin Denham" <mjdenham at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>  Hi,
>>>>>>>>>>
>>>>>>>>>> I have integrated the new v11n mapping code and find I am getting
>>>>>>>>>> a pause when doing the initial mapping between any 2 v11ns.
>>>>>>>>>>
>>>>>>>>>> I originally had the same problem with the AB mapping code so
>>>>>>>>>> pre-loaded all mapping required for the installed set of documents at the
>>>>>>>>>> start in a background thread to prevent delays.  The code I used is in
>>>>>>>>>> VersificationMappingFactory.initialiseRequiredMappings()<https://github.com/mjdenham/and-bible/blob/master/AndBible/src/net/bible/android/control/versification/mapping/VersificationMappingFactory.java>
>>>>>>>>>> .
>>>>>>>>>>
>>>>>>>>>> I am trying to think of the best way to do this in the new code.
>>>>>>>>>>  Either i) you could add a method to do it which could be called or ii) you
>>>>>>>>>> could proeload all required mappings automatically or iii) add a new method
>>>>>>>>>> to allow AB (or other frontend) to trigger preload of all required mappings
>>>>>>>>>> by adding a public method a bit like the new
>>>>>>>>>> VersificationsMapper.ensure(v11ntopreload).
>>>>>>>>>>
>>>>>>>>>> This is quite an issue for mobile users.  I have a fast mobile
>>>>>>>>>> and loading a mapping causes a noticeable delay the first time a verse
>>>>>>>>>> changes, but the preload fix is fairly simple and worked well.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> jsword-devel mailing list
>>>>>>>>>> jsword-devel at crosswire.org
>>>>>>>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> jsword-devel mailing list
>>>>>>>>> jsword-devel at crosswire.org
>>>>>>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>  _______________________________________________
>>> jsword-devel mailing list
>>> jsword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.crosswire.org/pipermail/jsword-devel/attachments/20140118/b3c0f6c7/attachment-0001.html>


More information about the jsword-devel mailing list