[jsword-devel] New versification mapping - preload mappings to prevent pauses

Chris Burrell chris at burrell.me.uk
Mon Jan 20 08:25:22 MST 2014


Great - once you've done that, I'll see if I can put in the OSIS Ref Parser
which may shave some more time off.

Chris



On 20 January 2014 14:54, Martin Denham <mjdenham at gmail.com> wrote:

> I did some experiments using a RangedPassage and it performs well with the
> Versification code.  I removed teh workaround and found there is be a delay
> of ~1 sec using RangedPassage, instead of 1.5 mins using the original
> RocketPassage/BitwisePassage.
>
> I will leave the workaround in and also set RangedPassage to be the
> default passage type for AB.
>
> I needed to change PassageType.MIX to RangedPassage and will commit it
> shortly.
>
> Martin
>
>
> On 19 January 2014 00:56, DM Smith <dmsmith at crosswire.org> wrote:
>
>>
>> On Jan 18, 2014, at 5:47 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>
>> Presumably if we have a basic osis ref parser, we can easily construct a
>> verse range directly. Meaning a lot of the parsing on the versification
>> mapping is simplified.
>>
>>
>> We don't have a basic osisID parser. If we did then...
>>
>>
>> The slight complication with the versification mappings is that it
>> currently copes for the case as follows:
>>
>> Gen.1.1=Gen.1-2
>> Gen.1.1=Gen.1.4
>>
>>
>> These should be written:
>> Gen.1.1=Gen.1.1-Gen.1.2
>> Gen.1.1=Gen.1.4
>> That is the right hand side of the range should be an osisID.
>>
>>
>> i.e. split ranges, meaning that Gen 1:1 would map to verses 1,2 and 4
>> (leaving verse 3 out).
>>
>> So if we used a osis ref parser, we could two ranges very very quickly
>> (Gen.1-2 and Gen.1-4).
>>
>> Yes.
>>
>> However, what structure would we use to store the combination of both? I
>> was thinking a RangedPassage but you suggested the construction of this
>> would be heavy?
>>
>> The construction of a ranged passage one verse at a time is heavy.
>> Especially when the verses are not ordered.
>>
>> Imagine if we added the verses for Gen 1:1 to Gen 1:6 one verse at a
>> time, but in the following order 1, 3, 2, 4, 6, 7, 5, 9, 10, 8.
>>
>> 1 > Create a VerseRange for Gen 1:1 to Gen 1:1 and add it to the
>> collection.
>> 3 > Check to see if 3 is already represented or if it is adjacent to any
>> existing VerseRange. Since it is not create a new VerseRange for Gen 1:3 to
>> Gen 1:3.
>> 2 > Do the checks, finding that it is adjacent to Gen 1:1. Extend that
>> VerseRange to become Gen 1:1-Gen 1:2. Since it was also adjacent to the
>> other, throw that one away and extend the first to encompass all the verses
>> in that.
>> 4> Extend the one VerseRange.
>> 6> Create a new VerseRange.
>> 7> Find and extend a VerseRange.
>> 5> Find and extend a VerseRange. Find and another adjacent VerseRange and
>> extend the other to include it. Then throw it away. (Now we have one verse
>> range Gen 1:1-7.
>> 9> Create a Verse Range.
>> 10> Extend that Verse Range
>> 8> Find, extend, find, merge, wack.
>>
>> If the RangePassage is simply constructed (e.g. construct a verse range
>> and add it to an empty RangedPassage) There is no work.
>>
>> If the RangedPassage is "Gen.1.1-Gen.1.2 Gen.1.4", is built as a
>> VerseRange for both Gen.1.1-Gen.1.2 and for Gen.1.4-Gen.1.4 and added to
>> the RangedPassage, there will be little overhead.
>> (In OSIS a passage is a space separate list of osisIDs and/or osisRefs).
>>
>> I just don't know if this is directly supported in the RangedPassage
>> code, but it should.
>>
>>
>> Presumably we could introduce a constructor that takes a collection of
>> VerseRanges, meaning its construction would be also very quick?
>>
>>
>> Yes.
>>
>>
>> One more point, on the osis ref parser. Presumably we would want to use
>> BibleBookNames but looks like it already normalizes the booknames to the
>> English locale.
>>
>> Not really. The toUpper and toLower should always be called with an
>> explicit Locale. Using Locale.ENGLISH means that it won't use the user's
>> Locale. Since we control the text, we know that Locale.ENGLISH is the
>> correct thing to do.
>>
>> The normalizing of the names is a concession of end user text entry,
>> where they might not know that Gen is the one and only way to specify
>> Genesis in OSIS.
>>
>> It would be better to have the lookup be exact. Maybe a new hash and a
>> new call?
>>
>> Is there any particular reason for the normalize() method in the static
>> initializer?
>>
>>
>>
>>  Seems redundant to me... (apart from the lowercase).
>>
>>
>> This could be optimized to call toLower directly.
>>
>> The principle is that the index and the search should be normalized in
>> exactly the same way. If we chose, we could change normalize to upper odd
>> chars and lower even. The code would continue to work.
>>
>> But if we change it then comments should be here and in normalize to note
>> that a change to the one may require a change to the other.
>>
>>
>>  I suggest we optimize that, saving another hosts of pattern matches.
>> (only a handful), but on Android it will save a number of method calls
>> which tend to be more expensive than expected. Is there a strong reason to
>> have the lowercase? or would it be possible to define them lowercase to
>> start with?
>>
>>
>> These values are used to construct osisIDs and the Mixed case is defined
>> by the standard.
>>
>> Presumably our osis parser would want to ignore case...
>>
>>
>> The osis parser should probably not want to ignore case. The
>> specification is mixed case. Skipping toLower would be a bit faster.
>>
>>
>> Chris
>>
>>
>>
>> On 18 January 2014 22:19, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> See embedded.
>>> On Jan 18, 2014, at 4:05 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>
>>> I think to encourage the use of the Passages directly we need to make
>>> those constructors public. That's the main reason I have stayed away from
>>> using it.
>>>
>>>
>>> I see that TallyPassage is fully public and the rest are not. This must
>>> have been so that Lucene ranked searches could use them.
>>>
>>> Maybe, we should modify PassageKeyFactory to take an optional
>>> PassageType?
>>>
>>>
>>> Obviously a fast way to construct the versification mappings would be to
>>> turn them into verse numbers. But that means losing readability.
>>>
>>>
>>> I think readability is critical to maintenance. You'll note that the
>>> v11n tables are readable.
>>>
>>> The obvious thought here is to generate an intermediary file as part of
>>> the build from what we have already code (custom maven plugin/ant task).
>>> But that seems rather hacky, but would line up with the way the
>>> versifications are defined in code.
>>>
>>> So basically what we want is:
>>> * an easy way of reading in a verse range/passage from text form into
>>> Verse/VerseRanges
>>>
>>>
>>> The are a couple of things here that we could benefit from:
>>> 1) An osisID parser and an osisRef parser. An osisRef is merely two
>>> osisIDs separated by a dash.
>>> Note: Gen.1.1-2 is not an osisRef as 2 is not an osisID. To parse "2"
>>> the parser has to know that the "basis" (the context) of "2" is Genesis
>>> 1:1. Thus 2 is understood as a verse and not a chapter. However, if the
>>> basis was Gen.1 then the 2 would be understood as a chapter.
>>>
>>> By having fully formed OSIS references*, we gain the following:
>>> There are no spaces to trim.
>>> Splitting on - will split into two parts, both of which are fully formed
>>> osisIDs.
>>>  Splitting an osisID on . will result in 1 to 3 parts. book, book and
>>> chapter or book, chapter and verse. The verse is always a number, never
>>> "ff" or a roman numeral.
>>> Lookup of the book names is simplified. No need to try to figure out
>>> that Jo means John (or does it mean Job, or Jonah???). Book names are fixed
>>> ascii, never internationalized, never have spaces, dashes, periods or roman
>>> numerals.
>>>
>>> *Later, when we have references with works and grains, we may want to
>>> handle those. E.g. ignore them.
>>>
>>> 2) Today, conceptually a Passage is an ordered list of Verses and
>>> VerseRanges that don't overlap. However, JSword does not have a way to
>>> build a Verse or a VerseRange from a String. Maybe we need a way to
>>> construct a Passage from bottom up. I.e. create a Verse or VerseRange from
>>> a String (from osisIDs and osisRefs) and then put them into an initially
>>> empty Passage. Since we control the input, we know that the VerseRanges do
>>> not overlap, so maybe we can have a smart constructor that takes a List of
>>> Verse and VerseRanges and adds them.
>>>
>>>
>>> Basically, the way the code works is to iterate through the properties
>>> file and map and reverse-map every verse present to the maximum size
>>> available (i.e. 1 verse to a Passage). The idea behind this is that we
>>> store less that way.
>>>
>>> So if a mapping specifies as follows: Gen.1.1=Gen.1.1-2, then we only
>>> need to store
>>> left to right: Gen.1.1=>Gen.1.1-2
>>> right to left: Gen.1=>Gen.1.1, Gen.1.2=>Gen.1.1
>>>
>>> In that way, we've saved a mapping, and therefore memory (assuming we
>>> represent those ranges correctly).
>>> Chris
>>>
>>>
>>>
>>> On 18 January 2014 20:48, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>>
>>>> On Jan 18, 2014, at 3:19 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>>
>>>> Great stuff - helps really explain things. So I think for the
>>>> versification mappings, seen as they don't change much as they are mostly
>>>> contiguous ranges a RangePassage might be the best bet.
>>>>
>>>>
>>>> A RangedPassage is really good once it is constructed. Building it
>>>> takes a lot of time. Certainly not good for canonical ordered search
>>>> results.
>>>>
>>>>
>>>>
>>>> I think the other thing that's quite expensive
>>>> is: PassageKeyFactory.instance().getKey(targetVersification,
>>>> qualifiedKey.getKey().getOsisRef())
>>>>
>>>>
>>>> I think this is expensive because of how it is done.
>>>> From memory: getOsisRef() returns a string, which needs to be parsed
>>>> into what getKey() already is. The parser does not handle osisRefs. It
>>>> throws an exception and the caller then mungs the reference such that the
>>>> parser can handle it. JSword needs an osisRef parser, which would be much
>>>> simpler and faster than the current parser that tries to understand what an
>>>> end user is trying to provide.
>>>>
>>>> If the reference is a string (e.g. from a user, from a module, these
>>>> methods are appropriate), these are what should be used. If not a string,
>>>> then stay away. It is expensive to convert a Passage to a String and it is
>>>> expensive to parse a String and to build the Passage.
>>>>
>>>> It'd be better to get an empty key list and add qualifiedKey.getKey().
>>>>
>>>>
>>>>
>>>> I'm not sure what's the best way of dealing with this but I suspect
>>>> that we may want to have specific cases for Verses as I believe the above
>>>> is quite heavy as well.
>>>>
>>>> I think I also often
>>>> use: PassageKeyFactory.instance().createEmptyKeyList(versification) which
>>>> probably should use a RangePassage as well
>>>>
>>>>
>>>> It should use the PassageType that the program finds best. A server
>>>> side application has very different requirements from a desktop and that
>>>> from a mobile app.
>>>>
>>>> If you need a specific implementation, don't use the factory. Use what
>>>> is best for the particular location.
>>>>
>>>> Just note that if two different implementations are used (like adding
>>>> one passage to another) the result will probably be the program default.
>>>>
>>>>
>>>> Would that make sense?
>>>> Chris
>>>>
>>>>
>>>>
>>>> On 18 January 2014 20:11, DM Smith <dmsmith at crosswire.org> wrote:
>>>>
>>>>> There are several different implementations for Passage. They have
>>>>> trade-offs.
>>>>> BitwisePassage is strictly a BitMap, with one bit for each verse in
>>>>> the v11n. It is very space heavy but fast for most operations. Especially
>>>>> adding or removing Verses. Iteration is fast as it uses NextSetBit (or
>>>>> whatever it is called to find the next set bit).
>>>>> RocketPassage is a very heavy implementation. Not appropriate for
>>>>> mobile. It occupies lots more space than a BitwisePassage
>>>>> PassageTally is basically a weighted set of verses such that you can
>>>>> order by weights. This is used to return weighted search results. It is
>>>>> probably not much use otherwise. It is used to return a ranked Lucene
>>>>> search.
>>>>> RangedPassage keeps ranges (start verse and length) ordered in a
>>>>> TreeSet. It probably is a good candidate for mobile. Maybe the best
>>>>> regarding storage. Works well if the RangedPassage is not modified much. If
>>>>> you add a verse (not in the RangedPassage) it may cause two VerseRanges to
>>>>> be merged into one, otherwise it'll extend one or create a new one. That
>>>>> computation takes time. Removing a verse is also complex.
>>>>> DistinctPassage keeps verses ordered in a TreeSet. This can be a good
>>>>> candidate for mobile, if there are not many verses in the Passage. If there
>>>>> are lots it gets bigger than BitwisePassage.
>>>>>
>>>>> The PassageKeyFactory is used throughout JSword to create the
>>>>> preferred type of Passage. You can set it to one of the above via
>>>>> PassageType typically at program startup.
>>>>>
>>>>> I wrote BitwisePassage; Joe wrote the others.
>>>>>
>>>>> I've been thinking that there might be another variation of
>>>>> BitwisePassage that might make sense. One with testament or book bitmaps.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>>  In Him,
>>>>> DM
>>>>>
>>>>>
>>>>> On Jan 18, 2014, at 2:42 PM, Chris Burrell <chris at burrell.me.uk>
>>>>> wrote:
>>>>>
>>>>> The reason I went for passages is for less memory consumption. I also
>>>>> thought it would provide better parsing and we're parsing less.
>>>>>
>>>>> But I think the way we store passages is to represent every verse by a
>>>>> bit. means that iterating through ranges, calculating cardinality or
>>>>> initialising them means is expensive.
>>>>>
>>>>> I wonder if we could have a passage that just stores boundaries rather
>>>>> than all its content. That would be very fast to parse and very quick for
>>>>> operations on it.
>>>>>
>>>>> Chris
>>>>>  On 18 Jan 2014 19:17, "Martin Denham" <mjdenham at gmail.com> wrote:
>>>>>
>>>>>> I think I have found a workaround for this problem (using a
>>>>>> background thread makes it run very quickly) but it seems to point toward a
>>>>>> possible point of improvement in the new JSword mapping data structure.
>>>>>>
>>>>>> In the old AB v11n mapping code I stored maps of Verses.  However the
>>>>>> new JSword code seems to store maps of RocketPassage/BitwisePassage.  Can
>>>>>> you verify this is the best data structure for what will normally be a
>>>>>> single verse.  I don't know an awful lot about this so I am just asking,
>>>>>> but maybe a Passage will always consume more memory than a Verse.
>>>>>>
>>>>>> Thanks
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 18 January 2014 13:30, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>>
>>>>>>> I tried your patch and it did noticeably improve things, so it is a
>>>>>>> worthwhile patch, but did not solve the actual problem.  Still viewing the
>>>>>>> RusSynodal straight after download takes well over a minute, whereas to
>>>>>>> boot straight into it takes about 2 secs.
>>>>>>>
>>>>>>> One or 2 things I have noticed:
>>>>>>>
>>>>>>>    - a huge amount of memory allocation and GC occurs when trying
>>>>>>>    to view RusSynodal straight after download
>>>>>>>    - I am not great with the profiler but here are my tentative
>>>>>>>    observations
>>>>>>>       - Seems to be spending about half the time in
>>>>>>>       java.util.BitSet.cardinality()
>>>>>>>          - called by
>>>>>>>          BitwisePassage.countVerses()<-RocketPassage.countVerses()<-AbstractPassage.getCardinality()<-VersificationToKJVMapper.add...
>>>>>>>       - I tried caching AbstractPassage.getCardinality but that had
>>>>>>>       no affect which would be the case if there were lots of different keys
>>>>>>>    - Could it be that a module is in an unusual state after
>>>>>>>    download?  I have noticed that a module cannot be deleted directly after
>>>>>>>    download, until the app is restarted due to:
>>>>>>>       - sbmd.getConfigFile()==null when called by
>>>>>>>       SwordBookDriver.isDeletable directly after module download
>>>>>>>
>>>>>>> Any suggestions are welcome.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>> On 16 January 2014 23:13, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>>>
>>>>>>>> I think I will use the old AB v11n mapping for the next release of
>>>>>>>> AB and come back to this after the release which has quite a lot of changes
>>>>>>>> in it already anyway.
>>>>>>>>
>>>>>>>> The problem could be AB, Android, or JSword related and it is not
>>>>>>>> trivial to reproduce involving repeated re-installs of RusSynodal.  I had a
>>>>>>>> similar problem a year or two ago, it took a long time to investigate and
>>>>>>>> ended up being a flaw in the Android VM which required a change to JSword.
>>>>>>>>
>>>>>>>> I keep trying to think what could cause these symptoms and drawing
>>>>>>>> a blank.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>
>>>>>>>> On 16 January 2014 21:35, Chris Burrell <chris at burrell.me.uk>wrote:
>>>>>>>>
>>>>>>>>> I'll try profiling it! Nothing comes to mind though.
>>>>>>>>>  On 16 Jan 2014 20:04, "Martin Denham" <mjdenham at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> There is something unusual happening with the v11n mapping
>>>>>>>>>> creation straight after downloading a file requiring mapping e.g. It takes
>>>>>>>>>> over 2 mins to load the mapping (a single call to mapVerse) for RusSynodal
>>>>>>>>>> straight after download but a few seconds to load the file normally.
>>>>>>>>>>
>>>>>>>>>> 1. View ESV and anything else in split screen
>>>>>>>>>> 2. Download RusSynodal
>>>>>>>>>> 3. Exit download screen after download
>>>>>>>>>> 4. select RusSynodal for display (with ESV) in 1 half of split
>>>>>>>>>> screen
>>>>>>>>>> 5. Mapping code now takes over 2 minutes to load mapping
>>>>>>>>>> 6. But there are no errors and then everything seems fine
>>>>>>>>>>
>>>>>>>>>> 7. Exit and kill And Bible
>>>>>>>>>> 8. Restart with ESV/RST in split screens
>>>>>>>>>> 9. This time everything is initialised in a few seconds
>>>>>>>>>>
>>>>>>>>>> Any ideas?
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 16 January 2014 19:06, Chris Burrell <chris at burrell.me.uk>wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Martin
>>>>>>>>>>>
>>>>>>>>>>> Fine by me. Think it should be opt in if possible.
>>>>>>>>>>>
>>>>>>>>>>> A couple of thoughts on that note. Do you know of it s the io or
>>>>>>>>>>> the cpu time that's the issue?
>>>>>>>>>>>
>>>>>>>>>>> We probably want to opt in with a list as well as all as there's
>>>>>>>>>>> no point in loading the v11n if not required.
>>>>>>>>>>>
>>>>>>>>>>> I guess especially if we end up with mappings per module
>>>>>>>>>>> eventually.
>>>>>>>>>>>
>>>>>>>>>>> On a separate note, I also found the books.installed call is
>>>>>>>>>>> very expensive. Thinking it may be worth partially loading these. With
>>>>>>>>>>> around 200 modules we spend almost 15 seconds loading them.
>>>>>>>>>>>
>>>>>>>>>>> Finally you can apply for a open source license of jprofiler
>>>>>>>>>>> which helps massively to work out what's going on. Have got a couple of
>>>>>>>>>>> uncommitted fixes for books. Installed find with that.
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>> On 16 Jan 2014 18:10, "Martin Denham" <mjdenham at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>  Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have integrated the new v11n mapping code and find I am
>>>>>>>>>>>> getting a pause when doing the initial mapping between any 2 v11ns.
>>>>>>>>>>>>
>>>>>>>>>>>> I originally had the same problem with the AB mapping code so
>>>>>>>>>>>> pre-loaded all mapping required for the installed set of documents at the
>>>>>>>>>>>> start in a background thread to prevent delays.  The code I used is in
>>>>>>>>>>>> VersificationMappingFactory.initialiseRequiredMappings()<https://github.com/mjdenham/and-bible/blob/master/AndBible/src/net/bible/android/control/versification/mapping/VersificationMappingFactory.java>
>>>>>>>>>>>> .
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to think of the best way to do this in the new
>>>>>>>>>>>> code.  Either i) you could add a method to do it which could be called or
>>>>>>>>>>>> ii) you could proeload all required mappings automatically or iii) add a
>>>>>>>>>>>> new method to allow AB (or other frontend) to trigger preload of all
>>>>>>>>>>>> required mappings by adding a public method a bit like the new
>>>>>>>>>>>> VersificationsMapper.ensure(v11ntopreload).
>>>>>>>>>>>>
>>>>>>>>>>>> This is quite an issue for mobile users.  I have a fast mobile
>>>>>>>>>>>> and loading a mapping causes a noticeable delay the first time a verse
>>>>>>>>>>>> changes, but the preload fix is fairly simple and worked well.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> jsword-devel mailing list
>>>>>>>>>>>> jsword-devel at crosswire.org
>>>>>>>>>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> jsword-devel mailing list
>>>>>>>>>>> jsword-devel at crosswire.org
>>>>>>>>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>  _______________________________________________
>>>>> jsword-devel mailing list
>>>>> jsword-devel at crosswire.org
>>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.crosswire.org/pipermail/jsword-devel/attachments/20140120/4e8e162b/attachment-0001.html>


More information about the jsword-devel mailing list