[jsword-devel] New versification mapping - preload mappings to prevent pauses
Martin Denham
mjdenham at gmail.com
Mon Jan 20 07:54:45 MST 2014
I did some experiments using a RangedPassage and it performs well with the
Versification code. I removed teh workaround and found there is be a delay
of ~1 sec using RangedPassage, instead of 1.5 mins using the original
RocketPassage/BitwisePassage.
I will leave the workaround in and also set RangedPassage to be the default
passage type for AB.
I needed to change PassageType.MIX to RangedPassage and will commit it
shortly.
Martin
On 19 January 2014 00:56, DM Smith <dmsmith at crosswire.org> wrote:
>
> On Jan 18, 2014, at 5:47 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> Presumably if we have a basic osis ref parser, we can easily construct a
> verse range directly. Meaning a lot of the parsing on the versification
> mapping is simplified.
>
>
> We don't have a basic osisID parser. If we did then...
>
>
> The slight complication with the versification mappings is that it
> currently copes for the case as follows:
>
> Gen.1.1=Gen.1-2
> Gen.1.1=Gen.1.4
>
>
> These should be written:
> Gen.1.1=Gen.1.1-Gen.1.2
> Gen.1.1=Gen.1.4
> That is the right hand side of the range should be an osisID.
>
>
> i.e. split ranges, meaning that Gen 1:1 would map to verses 1,2 and 4
> (leaving verse 3 out).
>
> So if we used a osis ref parser, we could two ranges very very quickly
> (Gen.1-2 and Gen.1-4).
>
> Yes.
>
> However, what structure would we use to store the combination of both? I
> was thinking a RangedPassage but you suggested the construction of this
> would be heavy?
>
> The construction of a ranged passage one verse at a time is heavy.
> Especially when the verses are not ordered.
>
> Imagine if we added the verses for Gen 1:1 to Gen 1:6 one verse at a time,
> but in the following order 1, 3, 2, 4, 6, 7, 5, 9, 10, 8.
>
> 1 > Create a VerseRange for Gen 1:1 to Gen 1:1 and add it to the
> collection.
> 3 > Check to see if 3 is already represented or if it is adjacent to any
> existing VerseRange. Since it is not create a new VerseRange for Gen 1:3 to
> Gen 1:3.
> 2 > Do the checks, finding that it is adjacent to Gen 1:1. Extend that
> VerseRange to become Gen 1:1-Gen 1:2. Since it was also adjacent to the
> other, throw that one away and extend the first to encompass all the verses
> in that.
> 4> Extend the one VerseRange.
> 6> Create a new VerseRange.
> 7> Find and extend a VerseRange.
> 5> Find and extend a VerseRange. Find and another adjacent VerseRange and
> extend the other to include it. Then throw it away. (Now we have one verse
> range Gen 1:1-7.
> 9> Create a Verse Range.
> 10> Extend that Verse Range
> 8> Find, extend, find, merge, wack.
>
> If the RangePassage is simply constructed (e.g. construct a verse range
> and add it to an empty RangedPassage) There is no work.
>
> If the RangedPassage is "Gen.1.1-Gen.1.2 Gen.1.4", is built as a
> VerseRange for both Gen.1.1-Gen.1.2 and for Gen.1.4-Gen.1.4 and added to
> the RangedPassage, there will be little overhead.
> (In OSIS a passage is a space separate list of osisIDs and/or osisRefs).
>
> I just don't know if this is directly supported in the RangedPassage code,
> but it should.
>
>
> Presumably we could introduce a constructor that takes a collection of
> VerseRanges, meaning its construction would be also very quick?
>
>
> Yes.
>
>
> One more point, on the osis ref parser. Presumably we would want to use
> BibleBookNames but looks like it already normalizes the booknames to the
> English locale.
>
> Not really. The toUpper and toLower should always be called with an
> explicit Locale. Using Locale.ENGLISH means that it won't use the user's
> Locale. Since we control the text, we know that Locale.ENGLISH is the
> correct thing to do.
>
> The normalizing of the names is a concession of end user text entry, where
> they might not know that Gen is the one and only way to specify Genesis in
> OSIS.
>
> It would be better to have the lookup be exact. Maybe a new hash and a new
> call?
>
> Is there any particular reason for the normalize() method in the static
> initializer?
>
>
>
> Seems redundant to me... (apart from the lowercase).
>
>
> This could be optimized to call toLower directly.
>
> The principle is that the index and the search should be normalized in
> exactly the same way. If we chose, we could change normalize to upper odd
> chars and lower even. The code would continue to work.
>
> But if we change it then comments should be here and in normalize to note
> that a change to the one may require a change to the other.
>
>
> I suggest we optimize that, saving another hosts of pattern matches. (only
> a handful), but on Android it will save a number of method calls which tend
> to be more expensive than expected. Is there a strong reason to have the
> lowercase? or would it be possible to define them lowercase to start with?
>
>
> These values are used to construct osisIDs and the Mixed case is defined
> by the standard.
>
> Presumably our osis parser would want to ignore case...
>
>
> The osis parser should probably not want to ignore case. The specification
> is mixed case. Skipping toLower would be a bit faster.
>
>
> Chris
>
>
>
> On 18 January 2014 22:19, DM Smith <dmsmith at crosswire.org> wrote:
>
>> See embedded.
>> On Jan 18, 2014, at 4:05 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>
>> I think to encourage the use of the Passages directly we need to make
>> those constructors public. That's the main reason I have stayed away from
>> using it.
>>
>>
>> I see that TallyPassage is fully public and the rest are not. This must
>> have been so that Lucene ranked searches could use them.
>>
>> Maybe, we should modify PassageKeyFactory to take an optional PassageType?
>>
>>
>> Obviously a fast way to construct the versification mappings would be to
>> turn them into verse numbers. But that means losing readability.
>>
>>
>> I think readability is critical to maintenance. You'll note that the v11n
>> tables are readable.
>>
>> The obvious thought here is to generate an intermediary file as part of
>> the build from what we have already code (custom maven plugin/ant task).
>> But that seems rather hacky, but would line up with the way the
>> versifications are defined in code.
>>
>> So basically what we want is:
>> * an easy way of reading in a verse range/passage from text form into
>> Verse/VerseRanges
>>
>>
>> The are a couple of things here that we could benefit from:
>> 1) An osisID parser and an osisRef parser. An osisRef is merely two
>> osisIDs separated by a dash.
>> Note: Gen.1.1-2 is not an osisRef as 2 is not an osisID. To parse "2" the
>> parser has to know that the "basis" (the context) of "2" is Genesis 1:1.
>> Thus 2 is understood as a verse and not a chapter. However, if the basis
>> was Gen.1 then the 2 would be understood as a chapter.
>>
>> By having fully formed OSIS references*, we gain the following:
>> There are no spaces to trim.
>> Splitting on - will split into two parts, both of which are fully formed
>> osisIDs.
>> Splitting an osisID on . will result in 1 to 3 parts. book, book and
>> chapter or book, chapter and verse. The verse is always a number, never
>> "ff" or a roman numeral.
>> Lookup of the book names is simplified. No need to try to figure out that
>> Jo means John (or does it mean Job, or Jonah???). Book names are fixed
>> ascii, never internationalized, never have spaces, dashes, periods or roman
>> numerals.
>>
>> *Later, when we have references with works and grains, we may want to
>> handle those. E.g. ignore them.
>>
>> 2) Today, conceptually a Passage is an ordered list of Verses and
>> VerseRanges that don't overlap. However, JSword does not have a way to
>> build a Verse or a VerseRange from a String. Maybe we need a way to
>> construct a Passage from bottom up. I.e. create a Verse or VerseRange from
>> a String (from osisIDs and osisRefs) and then put them into an initially
>> empty Passage. Since we control the input, we know that the VerseRanges do
>> not overlap, so maybe we can have a smart constructor that takes a List of
>> Verse and VerseRanges and adds them.
>>
>>
>> Basically, the way the code works is to iterate through the properties
>> file and map and reverse-map every verse present to the maximum size
>> available (i.e. 1 verse to a Passage). The idea behind this is that we
>> store less that way.
>>
>> So if a mapping specifies as follows: Gen.1.1=Gen.1.1-2, then we only
>> need to store
>> left to right: Gen.1.1=>Gen.1.1-2
>> right to left: Gen.1=>Gen.1.1, Gen.1.2=>Gen.1.1
>>
>> In that way, we've saved a mapping, and therefore memory (assuming we
>> represent those ranges correctly).
>> Chris
>>
>>
>>
>> On 18 January 2014 20:48, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>>
>>> On Jan 18, 2014, at 3:19 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>
>>> Great stuff - helps really explain things. So I think for the
>>> versification mappings, seen as they don't change much as they are mostly
>>> contiguous ranges a RangePassage might be the best bet.
>>>
>>>
>>> A RangedPassage is really good once it is constructed. Building it takes
>>> a lot of time. Certainly not good for canonical ordered search results.
>>>
>>>
>>>
>>> I think the other thing that's quite expensive
>>> is: PassageKeyFactory.instance().getKey(targetVersification,
>>> qualifiedKey.getKey().getOsisRef())
>>>
>>>
>>> I think this is expensive because of how it is done.
>>> From memory: getOsisRef() returns a string, which needs to be parsed
>>> into what getKey() already is. The parser does not handle osisRefs. It
>>> throws an exception and the caller then mungs the reference such that the
>>> parser can handle it. JSword needs an osisRef parser, which would be much
>>> simpler and faster than the current parser that tries to understand what an
>>> end user is trying to provide.
>>>
>>> If the reference is a string (e.g. from a user, from a module, these
>>> methods are appropriate), these are what should be used. If not a string,
>>> then stay away. It is expensive to convert a Passage to a String and it is
>>> expensive to parse a String and to build the Passage.
>>>
>>> It'd be better to get an empty key list and add qualifiedKey.getKey().
>>>
>>>
>>>
>>> I'm not sure what's the best way of dealing with this but I suspect that
>>> we may want to have specific cases for Verses as I believe the above is
>>> quite heavy as well.
>>>
>>> I think I also often
>>> use: PassageKeyFactory.instance().createEmptyKeyList(versification) which
>>> probably should use a RangePassage as well
>>>
>>>
>>> It should use the PassageType that the program finds best. A server side
>>> application has very different requirements from a desktop and that from a
>>> mobile app.
>>>
>>> If you need a specific implementation, don't use the factory. Use what
>>> is best for the particular location.
>>>
>>> Just note that if two different implementations are used (like adding
>>> one passage to another) the result will probably be the program default.
>>>
>>>
>>> Would that make sense?
>>> Chris
>>>
>>>
>>>
>>> On 18 January 2014 20:11, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>> There are several different implementations for Passage. They have
>>>> trade-offs.
>>>> BitwisePassage is strictly a BitMap, with one bit for each verse in the
>>>> v11n. It is very space heavy but fast for most operations. Especially
>>>> adding or removing Verses. Iteration is fast as it uses NextSetBit (or
>>>> whatever it is called to find the next set bit).
>>>> RocketPassage is a very heavy implementation. Not appropriate for
>>>> mobile. It occupies lots more space than a BitwisePassage
>>>> PassageTally is basically a weighted set of verses such that you can
>>>> order by weights. This is used to return weighted search results. It is
>>>> probably not much use otherwise. It is used to return a ranked Lucene
>>>> search.
>>>> RangedPassage keeps ranges (start verse and length) ordered in a
>>>> TreeSet. It probably is a good candidate for mobile. Maybe the best
>>>> regarding storage. Works well if the RangedPassage is not modified much. If
>>>> you add a verse (not in the RangedPassage) it may cause two VerseRanges to
>>>> be merged into one, otherwise it'll extend one or create a new one. That
>>>> computation takes time. Removing a verse is also complex.
>>>> DistinctPassage keeps verses ordered in a TreeSet. This can be a good
>>>> candidate for mobile, if there are not many verses in the Passage. If there
>>>> are lots it gets bigger than BitwisePassage.
>>>>
>>>> The PassageKeyFactory is used throughout JSword to create the preferred
>>>> type of Passage. You can set it to one of the above via PassageType
>>>> typically at program startup.
>>>>
>>>> I wrote BitwisePassage; Joe wrote the others.
>>>>
>>>> I've been thinking that there might be another variation of
>>>> BitwisePassage that might make sense. One with testament or book bitmaps.
>>>>
>>>> Hope this helps.
>>>>
>>>> In Him,
>>>> DM
>>>>
>>>>
>>>> On Jan 18, 2014, at 2:42 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>>
>>>> The reason I went for passages is for less memory consumption. I also
>>>> thought it would provide better parsing and we're parsing less.
>>>>
>>>> But I think the way we store passages is to represent every verse by a
>>>> bit. means that iterating through ranges, calculating cardinality or
>>>> initialising them means is expensive.
>>>>
>>>> I wonder if we could have a passage that just stores boundaries rather
>>>> than all its content. That would be very fast to parse and very quick for
>>>> operations on it.
>>>>
>>>> Chris
>>>> On 18 Jan 2014 19:17, "Martin Denham" <mjdenham at gmail.com> wrote:
>>>>
>>>>> I think I have found a workaround for this problem (using a background
>>>>> thread makes it run very quickly) but it seems to point toward a possible
>>>>> point of improvement in the new JSword mapping data structure.
>>>>>
>>>>> In the old AB v11n mapping code I stored maps of Verses. However the
>>>>> new JSword code seems to store maps of RocketPassage/BitwisePassage. Can
>>>>> you verify this is the best data structure for what will normally be a
>>>>> single verse. I don't know an awful lot about this so I am just asking,
>>>>> but maybe a Passage will always consume more memory than a Verse.
>>>>>
>>>>> Thanks
>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 18 January 2014 13:30, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>
>>>>>> I tried your patch and it did noticeably improve things, so it is a
>>>>>> worthwhile patch, but did not solve the actual problem. Still viewing the
>>>>>> RusSynodal straight after download takes well over a minute, whereas to
>>>>>> boot straight into it takes about 2 secs.
>>>>>>
>>>>>> One or 2 things I have noticed:
>>>>>>
>>>>>> - a huge amount of memory allocation and GC occurs when trying to
>>>>>> view RusSynodal straight after download
>>>>>> - I am not great with the profiler but here are my tentative
>>>>>> observations
>>>>>> - Seems to be spending about half the time in
>>>>>> java.util.BitSet.cardinality()
>>>>>> - called by
>>>>>> BitwisePassage.countVerses()<-RocketPassage.countVerses()<-AbstractPassage.getCardinality()<-VersificationToKJVMapper.add...
>>>>>> - I tried caching AbstractPassage.getCardinality but that had
>>>>>> no affect which would be the case if there were lots of different keys
>>>>>> - Could it be that a module is in an unusual state after
>>>>>> download? I have noticed that a module cannot be deleted directly after
>>>>>> download, until the app is restarted due to:
>>>>>> - sbmd.getConfigFile()==null when called by
>>>>>> SwordBookDriver.isDeletable directly after module download
>>>>>>
>>>>>> Any suggestions are welcome.
>>>>>>
>>>>>> Cheers
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> On 16 January 2014 23:13, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>>
>>>>>>> I think I will use the old AB v11n mapping for the next release of
>>>>>>> AB and come back to this after the release which has quite a lot of changes
>>>>>>> in it already anyway.
>>>>>>>
>>>>>>> The problem could be AB, Android, or JSword related and it is not
>>>>>>> trivial to reproduce involving repeated re-installs of RusSynodal. I had a
>>>>>>> similar problem a year or two ago, it took a long time to investigate and
>>>>>>> ended up being a flaw in the Android VM which required a change to JSword.
>>>>>>>
>>>>>>> I keep trying to think what could cause these symptoms and drawing a
>>>>>>> blank.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>> On 16 January 2014 21:35, Chris Burrell <chris at burrell.me.uk> wrote:
>>>>>>>
>>>>>>>> I'll try profiling it! Nothing comes to mind though.
>>>>>>>> On 16 Jan 2014 20:04, "Martin Denham" <mjdenham at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> There is something unusual happening with the v11n mapping
>>>>>>>>> creation straight after downloading a file requiring mapping e.g. It takes
>>>>>>>>> over 2 mins to load the mapping (a single call to mapVerse) for RusSynodal
>>>>>>>>> straight after download but a few seconds to load the file normally.
>>>>>>>>>
>>>>>>>>> 1. View ESV and anything else in split screen
>>>>>>>>> 2. Download RusSynodal
>>>>>>>>> 3. Exit download screen after download
>>>>>>>>> 4. select RusSynodal for display (with ESV) in 1 half of split
>>>>>>>>> screen
>>>>>>>>> 5. Mapping code now takes over 2 minutes to load mapping
>>>>>>>>> 6. But there are no errors and then everything seems fine
>>>>>>>>>
>>>>>>>>> 7. Exit and kill And Bible
>>>>>>>>> 8. Restart with ESV/RST in split screens
>>>>>>>>> 9. This time everything is initialised in a few seconds
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 16 January 2014 19:06, Chris Burrell <chris at burrell.me.uk>wrote:
>>>>>>>>>
>>>>>>>>>> Hi Martin
>>>>>>>>>>
>>>>>>>>>> Fine by me. Think it should be opt in if possible.
>>>>>>>>>>
>>>>>>>>>> A couple of thoughts on that note. Do you know of it s the io or
>>>>>>>>>> the cpu time that's the issue?
>>>>>>>>>>
>>>>>>>>>> We probably want to opt in with a list as well as all as there's
>>>>>>>>>> no point in loading the v11n if not required.
>>>>>>>>>>
>>>>>>>>>> I guess especially if we end up with mappings per module
>>>>>>>>>> eventually.
>>>>>>>>>>
>>>>>>>>>> On a separate note, I also found the books.installed call is very
>>>>>>>>>> expensive. Thinking it may be worth partially loading these. With around
>>>>>>>>>> 200 modules we spend almost 15 seconds loading them.
>>>>>>>>>>
>>>>>>>>>> Finally you can apply for a open source license of jprofiler
>>>>>>>>>> which helps massively to work out what's going on. Have got a couple of
>>>>>>>>>> uncommitted fixes for books. Installed find with that.
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>> On 16 Jan 2014 18:10, "Martin Denham" <mjdenham at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have integrated the new v11n mapping code and find I am
>>>>>>>>>>> getting a pause when doing the initial mapping between any 2 v11ns.
>>>>>>>>>>>
>>>>>>>>>>> I originally had the same problem with the AB mapping code so
>>>>>>>>>>> pre-loaded all mapping required for the installed set of documents at the
>>>>>>>>>>> start in a background thread to prevent delays. The code I used is in
>>>>>>>>>>> VersificationMappingFactory.initialiseRequiredMappings()<https://github.com/mjdenham/and-bible/blob/master/AndBible/src/net/bible/android/control/versification/mapping/VersificationMappingFactory.java>
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>>>> I am trying to think of the best way to do this in the new code.
>>>>>>>>>>> Either i) you could add a method to do it which could be called or ii) you
>>>>>>>>>>> could proeload all required mappings automatically or iii) add a new method
>>>>>>>>>>> to allow AB (or other frontend) to trigger preload of all required mappings
>>>>>>>>>>> by adding a public method a bit like the new
>>>>>>>>>>> VersificationsMapper.ensure(v11ntopreload).
>>>>>>>>>>>
>>>>>>>>>>> This is quite an issue for mobile users. I have a fast mobile
>>>>>>>>>>> and loading a mapping causes a noticeable delay the first time a verse
>>>>>>>>>>> changes, but the preload fix is fairly simple and worked well.
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> jsword-devel mailing list
>>>>>>>>>>> jsword-devel at crosswire.org
>>>>>>>>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> jsword-devel mailing list
>>>>>>>>>> jsword-devel at crosswire.org
>>>>>>>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>> jsword-devel mailing list
>>>> jsword-devel at crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.crosswire.org/pipermail/jsword-devel/attachments/20140120/439e2d68/attachment-0001.html>
More information about the jsword-devel
mailing list