[jsword-devel] Out of Memory Issues Loading repo module lists

Martin Denham mjdenham at gmail.com
Fri Jan 15 10:35:47 MST 2016


I have done a few experiments using different data structures in IniSection
and they seem to occupy a lot less memory, but some IniSection functions are
no longer well supported, e.g. get(String key, int index), so I am not sure
if this is worth pursuing.

HashMap<String, String>: 3Mb for all repos (I forgot to check eBible alone)
HashMap<String, HashSet<String>>: 3.6Mb for eBible and 5Mb for all repos

This compares to 10Mb just for eBible with the original data structure.
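Martin's point that a flat HashMap loses get(String key, int index) can be illustrated with a minimal sketch. `CompactSection` is hypothetical, not JSword's IniSection: it keeps one HashMap from each key to a small ArrayList of values, so values stay in insertion order and indexed access survives. The duplicate-rejecting `add` policy is an assumption, and no memory figures are claimed for this variant.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical compact stand-in for IniSection (not the JSword class):
 * each key maps to a small list, so values keep their insertion order
 * and can still be fetched by index.
 */
class CompactSection {
    private final Map<String, List<String>> values = new HashMap<String, List<String>>();

    /** Adds a value; returns false if the key already has that value (assumed policy). */
    boolean add(String key, String value) {
        List<String> list = values.get(key);
        if (list == null) {
            list = new ArrayList<String>(1); // sized for the common single-value case
            values.put(key, list);
        } else if (list.contains(value)) {
            return false;
        }
        return list.add(value);
    }

    /** Returns the index-th value for key, or null if there is none. */
    String get(String key, int index) {
        List<String> list = values.get(key);
        if (list == null || index < 0 || index >= list.size()) {
            return null;
        }
        return list.get(index);
    }

    /** Returns how many values are stored for key. */
    int size(String key) {
        List<String> list = values.get(key);
        return list == null ? 0 : list.size();
    }

    /** Returns a read-only view of all values for key. */
    List<String> getValues(String key) {
        List<String> list = values.get(key);
        return list == null ? Collections.<String>emptyList() : Collections.unmodifiableList(list);
    }
}
```

The trade-off: `list.contains` is a linear scan, which stays cheap only while a key holds few values; a HashSet makes that check constant-time but, as noted above, gives up indexed access.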

Martin

On 15 January 2016 at 15:05, Martin Denham <mjdenham at gmail.com> wrote:

> I was also thinking that maybe the default should be to return a full sbmd,
> with another method or parameter causing a partial sbmd to be returned.
> This would prevent surprises for other JSword users who don't have tight
> memory constraints.
>
> Martin
>
> On 15 January 2016 at 14:11, DM Smith <dmsmith at crosswire.org> wrote:
>
>> I’ll do that soon and get back to you. I hadn’t meant to move
>> GatherAllReferences and ReadEverything from tests to examples yet; that
>> was meant for later.
>>
>> — DM
>>
>> On Jan 15, 2016, at 9:09 AM, Martin Denham <mjdenham at gmail.com> wrote:
>>
>> Hi DM,
>>
>> Could you verify that you have checked in the Category fix as I still
>> have the problem after fetching the latest commit?  The last commit says
>> 'Fixed bug in category' but only includes changes in the examples package.
>> Also, the changes in the examples package, GatherAllReferences and
>> ReadEverything, do not compile as they are in the wrong package.
>>
>> Martin
>>
>> On 14 January 2016 at 02:16, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> Glad it is a good, workable solution.
>>>
>>> I saw that category problem earlier today and checked in a fix for it. I
>>> was addressing a problem with Bible Desktop’s display of a red ? over the
>>> cult bibles in the installer. I tried breaking it out into its own category
>>> and I tried putting it into “Other”. I'm not quite satisfied yet, but all
>>> the others are appropriately classified.
>>> — DM
>>>
>>> On Jan 13, 2016, at 5:30 PM, Martin Denham <mjdenham at gmail.com> wrote:
>>>
>>> The latest code seems to be running quite smoothly.
>>> eBible: 680 modules, 10Mb RAM.
>>>
>>> I could not notice any new major pauses.
>>> sbmd.reload works well and AB can show the About dialog.  I can see a
>>> slight but insignificant pause as the full attributes are loaded.
>>> I have installed various modules from different repositories.
>>> The above tests were done on a fairly low spec Android 2.2 AVD with 64Mb
>>> heap.
>>>
>>> The only issue I have noticed is that non-Bibles are appearing in the
>>> list of Bibles so I think there may be an issue with Category.  I can see
>>> some commentaries and GenBooks in a list that should just contain Bibles.
>>> If the problem is not obvious I can investigate further later.
>>>
>>> Thanks
>>> Martin
>>>
>>> On 12 January 2016 at 13:58, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>> I’m working on transforming the tar.gz to a zip. A zip gives much faster
>>>> access to the files in it: every file takes about the same time to reach.
>>>> A tar.gz is a compressed Tape ARchive, so it is fast to get the first file
>>>> and slow to get the last.
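DM's point about zip versus tar.gz comes from the formats themselves: a zip ends with a central directory listing every entry, so `java.util.zip.ZipFile.getEntry` can jump straight to one file, while a tar.gz must be decompressed and read sequentially up to the file wanted. A minimal sketch, assuming the confs sit in the zip under names like `mods.d/kjv.conf` (an illustrative layout); `ZipConfReader` is a hypothetical helper, not part of JSword. It returns raw bytes because conf files are not necessarily UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: random-access read of one conf from a zip. */
class ZipConfReader {
    /** Returns the raw bytes of entryName, or null if the zip has no such entry. */
    static byte[] readConf(File zip, String entryName) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            // getEntry looks the name up in the zip's central directory,
            // so the entries stored before it are never decompressed.
            ZipEntry entry = zf.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zf.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
}
```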
>>>>
>>>> I just did some computations and unpacking the tar.gz is not good for
>>>> your app. But you told me that…. :)
>>>>
>>>> In Him,
>>>> DM
>>>>
>>>> On Jan 12, 2016, at 8:16 AM, Martin Denham <mjdenham at gmail.com> wrote:
>>>>
>>>> I like this idea: "A file in a jar has an URL that is something like
>>>> …/fred.jar!file"
>>>>
>>>> Martin
>>>>
>>>> On 11 January 2016 at 23:29, DM Smith <dmsmith at crosswire.org> wrote:
>>>>
>>>>>
>>>>> On Jan 11, 2016, at 6:07 PM, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>
>>>>> My estimate of file size might be too low because I forgot to take
>>>>> into account block size. A quick experiment on my Android device suggests
>>>>> this adds about 40%, making it at least 7Mb for the conf files.
>>>>>
>>>>>
>>>>> Understand.
>>>>>
>>>>>
>>>>> By 'fluff' do you mean extracting all the files from mods.d.tar.gz and
>>>>> writing them all to disk? I am a little concerned about repeatedly writing
>>>>> and deleting hundreds of small files on the SD card. SD cards do not cope
>>>>> with heavy read/write traffic as well as normal disks or flash drives do.
>>>>> That is the reason I do not store the AB database on the SD card.
>>>>>
>>>>>
>>>>> By fluff, I meant that the conf file would be re-read without a
>>>>> filter, thus getting everything.
>>>>>
>>>>>
>>>>> The process described would also make viewing a description (in AB
>>>>> right-click About) an unexpectedly expensive operation involving writing
>>>>> hundreds of files to the sd card.
>>>>>
>>>>>
>>>>> It would involve re-reading the one file without a filter. It should
>>>>> happen fast.
>>>>>
>>>>>
>>>>> I did not know about sbmd.toOSIS() and have not used it.  AB just pops
>>>>> up a little dialog with a few fields like About, copyright, licence,
>>>>> version, versification.
>>>>>
>>>>>
>>>>> Ok. Then you’ll need to call “fluff” before retrieving those fields.
>>>>> The code for fluff (or whatever we call it) would be something like:
>>>>> public void fluff() {
>>>>>     if (partiallyLoaded) {
>>>>>         // re-read and process the conf without a filter
>>>>>         partiallyLoaded = false;
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> For the 2 reasons above my preference would be to avoid writing
>>>>> hundreds of files to the SD card but I can't think of a perfect solution.
>>>>> While grappling with this last week I was just trying to get the original
>>>>> code to work more efficiently (but failed).  I am not very experienced in
>>>>> Memory Analysis but suspected the memory use was higher than it might have
>>>>> been.
>>>>>
>>>>> By design, which files do you write to SD card? If they are only
>>>>> written when the mods.d.tar.gz is downloaded, would that help?
>>>>>
>>>>>
>>>>> If the tar.gz were searched for the conf each time, it would be more
>>>>> expensive to process the tar.gz every time a Description is requested, but
>>>>> the first load would be quicker than writing hundreds of .conf files. To be
>>>>> honest, I think a lot of people do not know about the long-press menu in
>>>>> AB, so most of the time probably just the initial list of modules would be
>>>>> used.
>>>>>
>>>>>
>>>>> I don’t know about the long-press menu. In BibleDesktop, it is easy to
>>>>> navigate from one available module to the next and each time it shows the
>>>>> full conf.
>>>>>
>>>>>
>>>>> Coincidentally, my Android device slowed to a crawl when I tried to copy
>>>>> all of eBible's .conf files to it just now: initially fast, then 1 file per
>>>>> 3 seconds. After 10 minutes I unplugged it, although that probably is not a
>>>>> realistic test and there is probably an explanation for the issue.
>>>>>
>>>>>
>>>>> I’ve nearly got the code written to unpack the conf. Let me zip up the
>>>>> files that have changed and send them to you.
>>>>>
>>>>> Basically, if you delete mods.d.tar.gz, it will fetch a new one
>>>>> (current behavior). If you delete mods.d/ it will unpack mods.d.tar.gz into
>>>>> it. If you fetch mods.d.tar.gz it will unpack it into mods.d. All of this
>>>>> takes place in the folder where mods.d.tar.gz is located.
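The rules DM describes for keeping mods.d in step with mods.d.tar.gz can be sketched as one decision method. `ModsDirSync` and the `RepoActions` interface are hypothetical; the two callbacks stand in for the installer's real download and unpack code, which the thread does not show, and `forceFetch` models the case where the user explicitly fetches a new mods.d.tar.gz.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical hooks standing in for the installer's download and unpack code. */
interface RepoActions {
    void fetchIndex(Path archive);             // download a fresh mods.d.tar.gz
    void unpack(Path archive, Path targetDir); // extract the confs into mods.d/
}

/** Hypothetical sketch of the mods.d / mods.d.tar.gz rules described above. */
class ModsDirSync {
    /**
     * Returns true if mods.d was (re)built. Both paths live in the folder
     * that holds mods.d.tar.gz.
     */
    static boolean ensureUnpacked(Path repoDir, boolean forceFetch, RepoActions actions) {
        Path archive = repoDir.resolve("mods.d.tar.gz");
        Path modsDir = repoDir.resolve("mods.d");
        boolean fetched = false;
        if (forceFetch || !Files.exists(archive)) {
            actions.fetchIndex(archive); // archive deleted or refresh requested: fetch
            fetched = true;
        }
        if (fetched || !Files.isDirectory(modsDir)) {
            actions.unpack(archive, modsDir); // fresh archive or mods.d deleted: unpack
            return true;
        }
        return false; // both present and no refresh requested: nothing to do
    }
}
```

Removing confs that vanished from a newer archive is left to the unpack step here; one way to handle it is to unpack into a fresh folder and swap it in.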
>>>>>
>>>>> I tried adding new confs to mods.d that weren’t in mods.d.tar.gz to
>>>>> simulate a takedown and that works as well.
>>>>>
>>>>> If this code is no good for you, I’ve another thought. A file in a jar
>>>>> has an URL that is something like …/fred.jar!file. Maybe we can transform
>>>>> the mods.d.tar.gz into mods.d.tar and use that addressing mechanism to
>>>>> fetch the file? I’ll take a look at how the JRE does that. Maybe I’ll roll
>>>>> something similar for JSword over a tar.gz file.
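The JRE addressing mechanism mentioned here is the `jar:` URL scheme, `jar:<archive URL>!/<entry>`, served by `java.net.JarURLConnection`. It works for any zip-format archive but not for tar or tar.gz, so mods.d.tar.gz would need a JSword equivalent. A minimal sketch; `JarUrlReader` is a hypothetical helper and the entry name is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical helper reading a zip entry through the JRE's jar: URL handler. */
class JarUrlReader {
    /** Builds a URL of the form jar:file:/path/to/mods.zip!/mods.d/kjv.conf. */
    static URL entryUrl(File archive, String entryName) throws IOException {
        return URI.create("jar:" + archive.toURI() + "!/" + entryName).toURL();
    }

    /** Opens the entry via JarURLConnection and returns its raw bytes. */
    static byte[] read(File archive, String entryName) throws IOException {
        URLConnection conn = entryUrl(archive, entryName).openConnection();
        conn.setUseCaches(false); // close the archive when done instead of caching it
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

A missing entry surfaces as an IOException (a FileNotFoundException) when the connection is opened.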
>>>>>
>>>>> DM
>>>>>
>>>>>
>>>>> Martin
>>>>>
>>>>> On 11 January 2016 at 19:28, DM Smith <dmsmith at crosswire.org> wrote:
>>>>>
>>>>>> I have been thinking about this a bit more. I knew there was a
>>>>>> need to prevent stale confs. The time performance is something that I’m not
>>>>>> able to test. My machine has an SSD, a fast 4 core CPU and gobs of RAM. So
>>>>>> I need you to keep me in line. ;)
>>>>>>
>>>>>> The easiest way to keep it pristine is to unpack it into a temporary
>>>>>> folder, rename the old folder, rename the new folder into place, and
>>>>>> finally delete the old folder. Doing it in this order minimizes the time
>>>>>> that mods.d is unavailable, which is important for multi-threaded apps and
>>>>>> for multiple apps sharing the same machine simultaneously.
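The unpack-then-rename sequence can be sketched with `java.nio.file`. This assumes the fresh folder is created next to mods.d, on the same file system, so that `Files.move` is a rename rather than a copy; `ModsDirSwap` and the `.old` suffix are hypothetical. mods.d is absent only in the gap between the two moves.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical sketch of swapping a freshly unpacked folder into place. */
class ModsDirSwap {
    /** Renames target aside, renames freshDir into place, then deletes the old copy. */
    static void swap(Path freshDir, Path target) throws IOException {
        Path old = target.resolveSibling(target.getFileName() + ".old");
        deleteTree(old); // clear leftovers from an interrupted earlier swap
        if (Files.exists(target)) {
            Files.move(target, old); // a rename when both paths share a file system
        }
        Files.move(freshDir, target);
        deleteTree(old);
    }

    /** Recursively deletes a directory tree; does nothing if it is absent. */
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir); // the directory is empty once its children are gone
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Because the fresh folder replaces the old one wholesale, confs that were taken down from the repository disappear along with the old folder.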
>>>>>>
>>>>>> Right now the SwordBookMetaData remembers the File for the conf of
>>>>>> installed modules and is able to re-read it easily. But it does not store
>>>>>> anything about a conf’s location when it is from mods.d.tar.gz. I suppose I
>>>>>> could have it remember the location of mods.d.tar.gz and the name of the
>>>>>> conf entry, and create a method to extract that conf out of the compressed
>>>>>> archive. This would need to be done for each module the user requests
>>>>>> info for. Doing this is quite expensive, as it means inflating the file and
>>>>>> then iterating over the contents until the desired conf is found.
>>>>>>
>>>>>> I think that it would be better to see how much time it adds to
>>>>>> extract the files and store them on disk. Fluffing would only happen
>>>>>> when the user wants to browse a description of the module.
>>>>>>
>>>>>> I’d like to modify sbmd.toOSIS to check if the sbmd is partial or
>>>>>> full and if not full re-read the conf fully and then continue as before. I
>>>>>> think that is how JSword is designed to retrieve the conf for presentation
>>>>>> to the end user. Does AndBible use that or some other mechanism to get what
>>>>>> it wants for presentation?
>>>>>>
>>>>>> I think I’ll add a “fluff” method to BookMetaData that will do this.
>>>>>> This could be called to get it to fluff at another time.
>>>>>>
>>>>>> DM
>>>>>>
>>>>>> On Jan 11, 2016, at 1:00 PM, Martin Denham <mjdenham at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> My rough estimates have the total size of conf files in all repos at
>>>>>> about 5Mb which is not too different to the size of a module like ESV so
>>>>>> the impact should not be significant and it should not be a problem if this
>>>>>> is required.
>>>>>>
>>>>>> Other things to consider that come to mind: (i) conf files no longer in
>>>>>> mods.d.tar.gz would need to be removed, or everything deleted and
>>>>>> re-extracted, after a refresh; (ii) the time taken to save the files, since
>>>>>> loading the list is already slow.
>>>>>>
>>>>>> I can't think of any major reason not to do as you describe.
>>>>>>
>>>>>> However, would an easier approach be to find files in the zip a bit
>>>>>> like this
>>>>>> <http://stackoverflow.com/questions/11123528/finding-a-file-in-zipentry-java>?
>>>>>> Speed would not be an issue because it would only be done once or twice
>>>>>> after fetching the list, e.g. to view About or to actually download. The
>>>>>> mod.conf file name/path could be saved in SBMD if required.
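Martin's alternative, finding a single entry inside the archive without unpacking it, looks roughly like this for a zip using `java.util.zip.ZipInputStream`. The JDK has no built-in tar reader, so this assumes the index were a zip; for mods.d.tar.gz the same loop would run over tar entries through a tar library. The scan is sequential, so its cost grows with the entry's position in the archive, which matters little if it runs only once or twice after fetching the list. `ZipStreamSearch` is a hypothetical helper.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: sequential search for one entry in a zip stream. */
class ZipStreamSearch {
    /**
     * Reads entries in order until entryName is found and returns its bytes,
     * or null if the archive has no such entry. Closes the stream.
     */
    static byte[] find(InputStream in, String entryName) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in)) {
            for (ZipEntry e; (e = zin.getNextEntry()) != null; ) {
                if (e.isDirectory() || !e.getName().equals(entryName)) {
                    continue; // skipped entries are still read through and discarded
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                for (int n; (n = zin.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
            return null;
        }
    }
}
```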
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> On 11 January 2016 at 01:39, DM Smith <dmsmith at crosswire.org> wrote:
>>>>>>
>>>>>>> I’m trying to figure out how to reload a conf from a remote source
>>>>>>> (to go from a partial load to a full load).  The problem is that the
>>>>>>> AbstractSwordInstaller sits over top of mods.d.tar.gz, which it does not
>>>>>>> unpack. Instead, it iterates over all the entries in that binary file and
>>>>>>> handles each entry (i.e. a conf) in core. It doesn’t hit the disk. I’m
>>>>>>> wondering whether it would be alright to unpack the file in the same
>>>>>>> folder? That would allow a SwordBookMetaData to reload the file. It would
>>>>>>> also mean that SwordBookMetaData would only need one means of reading a
>>>>>>> conf as it’d be a file and not a byte array.
>>>>>>>
>>>>>>> It isn’t a problem with desktop or server apps, but it might be for
>>>>>>> AndBible.
>>>>>>>
>>>>>>> — DM
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Jan 10, 2016, at 3:31 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>>>>>
>>>>>>> The problem you encountered was 2 bugs:
>>>>>>> 1. When the module is not UTF-8, the remote repository’s conf is
>>>>>>> re-read, but the filter wasn’t passed.
>>>>>>> 2. Not intended, but IniSection required a filter, rather than treating a
>>>>>>> null filter as meaning everything passes.
>>>>>>>
>>>>>>> I’ve checked in that fix. Still trying to make the memory less….
>>>>>>>
>>>>>>> — DM
>>>>>>>
>>>>>>> On Jan 10, 2016, at 1:18 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>>>>>
>>>>>>> The “Partial load of conf file.” commit was meant to load everything in a
>>>>>>> conf that the JSword engine needs to work with a module. I don’t know why
>>>>>>> the CrossWire repo is working for me but not for you. I’ll keep working on
>>>>>>> it today. The problem with the previous commit was fixed with the last
>>>>>>> commit. I wasn’t “adjusting” the module after loading to fill in things
>>>>>>> like BookDriver and BookCategory.
>>>>>>>
>>>>>>> I’m wondering whether getting the list of Books from the installer
>>>>>>> creates a deep rather than a shallow copy of them.
>>>>>>>
>>>>>>> Today I hope to make SwordBookMetaData even more lazy. It has a
>>>>>>> BookDriver and validates its storage when the repo is loaded. I plan to
>>>>>>> break one of my modules by renaming one of the files and see the impact.
>>>>>>> Chris and I have noticed that the FileState objects are not fully released.
>>>>>>> This actually is part of the design.
>>>>>>>
>>>>>>> Anyway, I think it is going in the right direction. Reducing the
>>>>>>> memory 4x is a good thing. The data structures within the IniSection may
>>>>>>> be too heavy. I may relax the requirement that it maintain the SWORD conf's
>>>>>>> order. The idea was to be able to modify the provided conf, retaining its
>>>>>>> order. However, now we never modify that conf.
>>>>>>>
>>>>>>> configAll was a deep clone of configSword. configAll adds in the
>>>>>>> contents of configJSword and then configFrontend. These last two are
>>>>>>> created even if not needed. We could make them lazy as well.
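Making configJSword and configFrontend lazy amounts to creating each one on first access. A minimal sketch; `LazyConfigs` is hypothetical, the generic `T` stands in for whatever section type JSword actually uses, and the suppliers are assumed to build a section when called. It uses Java 8's `java.util.function.Supplier`.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder (not JSword code) that defers building the JSword and
 * front-end config sections until something actually asks for them.
 */
class LazyConfigs<T> {
    private final Supplier<T> jswordFactory;
    private final Supplier<T> frontendFactory;
    private T configJSword;   // null until first requested
    private T configFrontend; // null until first requested

    LazyConfigs(Supplier<T> jswordFactory, Supplier<T> frontendFactory) {
        this.jswordFactory = jswordFactory;
        this.frontendFactory = frontendFactory;
    }

    /** Builds configJSword on the first call and returns the same instance afterwards. */
    synchronized T getConfigJSword() {
        if (configJSword == null) {
            configJSword = jswordFactory.get();
        }
        return configJSword;
    }

    /** Builds configFrontend on the first call and returns the same instance afterwards. */
    synchronized T getConfigFrontend() {
        if (configFrontend == null) {
            configFrontend = frontendFactory.get();
        }
        return configFrontend;
    }

    /** Reports whether configFrontend has been built yet, without building it. */
    synchronized boolean isFrontendLoaded() {
        return configFrontend != null;
    }
}
```

The getters are synchronized on the assumption that one metadata object may be shared between threads; without that sharing, the locking can be dropped.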
>>>>>>>
>>>>>>> DM
>>>>>>>
>>>>>>> On Jan 10, 2016, at 11:07 AM, Martin Denham <mjdenham at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Thanks for the quick response.  I have had a brief look at the new
>>>>>>> commits.
>>>>>>>
>>>>>>> A lot of the attributes aren't being returned now so it is tricky to
>>>>>>> test and there are various errors but running the current tip 'Partial
>>>>>>> load of conf file.
>>>>>>> <https://github.com/crosswire/jsword/commit/80020f51c6a762d458ce8ae70007b78eadee1fb3>'
>>>>>>> the SBMD for eBible is now only a quarter of the original size at 10Mb
>>>>>>> which is fine but I still don't understand why it is so large for the
>>>>>>> minimal attribute set now being returned.
>>>>>>>
>>>>>>> I get a lot of errors like:
>>>>>>> SwordBookMetaData(492): Book not supported: malformed conf file for
>>>>>>> [BBE] no ModDrv found.
>>>>>>> SwordBookMetaData(492): Malformed conf file: missing
>>>>>>> [BBE]Description=. Using BBE
>>>>>>>
>>>>>>> and peculiarly the eBible repo seems to be the only repo I can use
>>>>>>> because all the others fail with errors.
>>>>>>>
>>>>>>> I also tried the previous commit Cut the memory requirements of a
>>>>>>> SwordBookMetaData in half.
>>>>>>> <https://github.com/crosswire/jsword/commit/cc32ba8f1bb245932a747390d03874b2be70e9a1> but
>>>>>>> it did not work because basic attributes like language were not being
>>>>>>> returned.
>>>>>>>
>>>>>>> I still don't understand why removing configSword should reduce
>>>>>>> memory by half because it should just be removing references to data that
>>>>>>> is also referenced from configAll, so it would reduce memory slightly but
>>>>>>> not much.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10 January 2016 at 04:14, DM Smith <dmsmith at crosswire.org> wrote:
>>>>>>>
>>>>>>>> OK. That’s done. I also accidentally introduced a bug with the last
>>>>>>>> commit. It is noticeably fast.
>>>>>>>>
>>>>>>>> Next up, allow for *a* SwordBookMetaData to be reloaded fully. This
>>>>>>>> is needed to bring in all the other elements which are information only,
>>>>>>>> such as About, in order to display info to the end user. Since the user
>>>>>>>> will only look at one module's info at a time, it will load that one. You
>>>>>>>> may need to change your code (hope not) to force that one to reload.
>>>>>>>>
>>>>>>>> Give the code a try to see if it solves your out of memory error.
>>>>>>>>
>>>>>>>> DM
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jan 9, 2016, at 9:06 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>>>>>>
>>>>>>>> I’ll be adding a filter to IniSection. Something like:
>>>>>>>> if (filter.test(key)) {
>>>>>>>>     // use the key
>>>>>>>> }
>>>>>>>> // otherwise do nothing: the key is dropped
>>>>>>>>
>>>>>>>> SwordBookMetaData will be responsible for building the filter. At
>>>>>>>> least for a first go around. A single object should do.
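The IniSection filter can be sketched as a conf reader that consults a `java.util.function.Predicate<String>` for every key and treats a null filter as "keep everything", which the thread later says was the intended behaviour. `FilteredConfReader` is hypothetical, not JSword's IniSection, and it simplifies conf syntax: it skips the section header and lines starting with `#` (assumed to be comments), ignores continuation lines, and keeps only the last value of a repeated key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical filtered conf reader (not JSword's IniSection). */
class FilteredConfReader {
    /**
     * Parses key=value lines, storing only the keys the filter accepts.
     * A null filter accepts every key.
     */
    static Map<String, String> read(String conf, Predicate<String> filter) {
        Map<String, String> kept = new LinkedHashMap<String, String>(); // keeps file order
        for (String line : conf.split("\\r?\\n")) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            // Skip blank lines, assumed comments, the [Section] header and lines without a key.
            if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.startsWith("[") || eq <= 0) {
                continue;
            }
            String key = trimmed.substring(0, eq).trim();
            if (filter == null || filter.test(key)) {
                kept.put(key, trimmed.substring(eq + 1).trim()); // a repeated key overwrites
            }
            // Keys the filter rejects are parsed but never stored.
        }
        return kept;
    }
}
```

Every line is still parsed, matching the observation that a conf has no order and must be read in full; the saving is only in what gets stored.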
>>>>>>>>
>>>>>>>> DM
>>>>>>>>
>>>>>>>> On Jan 9, 2016, at 6:29 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, like you I have thought of streamlining conf loading for repo
>>>>>>>> lists.  One idea I had was to enable specification of a filter to
>>>>>>>> SwordBookMetaData to limit the conf values that are stored.
>>>>>>>>
>>>>>>>>
>>>>>>>> I was thinking of something similar. My ideas aren’t good enough to
>>>>>>>> be put into practice yet, but I imagined some kind of flag indicating
>>>>>>>> whether it is empty, partially loaded or fully loaded. Empty would mean
>>>>>>>> that it hasn’t gone to disk to get the conf. Partial would mean that it
>>>>>>>> read everything but threw most of it away as uninteresting (since the
>>>>>>>> conf has no order, you have to read and parse all of it). Full would mean
>>>>>>>> that nothing was pitched.
>>>>>>>> SwordBookMetaData.getProperty would need to be changed to determine whether
>>>>>>>> the key is in memory or might be on disk and do the right thing. Or we
>>>>>>>> could keep getProperty as it is and if you want one of the fields that is
>>>>>>>> not stored (e.g. About) you have to call reload().
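The empty / partial / full idea, with getProperty deciding whether a key might still be on disk, can be sketched as follows. `LazyMetaData`, its `LoadState` enum and the `loader` function are hypothetical, not SwordBookMetaData's API; the loader is assumed to re-read the conf and apply a key filter, where null means no filter. `reload()` plays the role of the "fluff" step discussed in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of lazy conf loading (not SwordBookMetaData's API). */
class LazyMetaData {
    enum LoadState { EMPTY, PARTIAL, FULL }

    // Re-reads the conf, keeping the keys the filter accepts (null = keep all).
    private final Function<Predicate<String>, Map<String, String>> loader;
    // Keys the engine needs to work with a module; always kept in a partial load.
    private final Set<String> coreKeys;
    private Map<String, String> props = new HashMap<String, String>();
    private LoadState state = LoadState.EMPTY;

    LazyMetaData(Function<Predicate<String>, Map<String, String>> loader, Set<String> coreKeys) {
        this.loader = loader;
        this.coreKeys = coreKeys;
    }

    synchronized LoadState getState() {
        return state;
    }

    /** Reads the whole conf but keeps only the core keys. */
    synchronized void loadPartial() {
        if (state == LoadState.EMPTY) {
            props = new HashMap<String, String>(loader.apply(coreKeys::contains));
            state = LoadState.PARTIAL;
        }
    }

    /** Re-reads the conf without a filter, so nothing is pitched. */
    synchronized void reload() {
        if (state != LoadState.FULL) {
            props = new HashMap<String, String>(loader.apply(null));
            state = LoadState.FULL;
        }
    }

    /** Goes back to disk only when the key could have been filtered out. */
    synchronized String getProperty(String key) {
        if (state == LoadState.EMPTY) {
            loadPartial();
        }
        if (state == LoadState.PARTIAL && !coreKeys.contains(key)) {
            reload();
        }
        return props.get(key);
    }
}
```

A request for a core key never triggers a reload, because a partial load already kept it if the conf had it; the first request for any other key costs one full re-read, after which everything is in memory.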
>>>>>>>>
>>>>>>>> Maybe we could also cache that info in one or more separate files? When
>>>>>>>> mods.d.tar.gz is updated, the cache would be recomputed. In doing the
>>>>>>>> computation, each conf would be read and then pitched. Basically, the
>>>>>>>> storage would be o.c.c.utils.Ini if one file, or IniSection if many files.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> jsword-devel mailing list
>>>>>>>> jsword-devel at crosswire.org
>>>>>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel