[jsword-devel] Out of Memory Issues Loading repo module lists

Martin Denham mjdenham at gmail.com
Tue Jan 12 06:16:32 MST 2016


I like this idea: "A file in a jar has an URL that is something like
…/fred.jar!file"
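
For reference, a minimal sketch of that addressing scheme as the JDK already supports it for zip-format archives, via URLs of the form jar:<archive-url>!/<entry>. The class and method names here (JarEntryUrlDemo, entryUrl, readEntry, writeSampleArchive) are made up for illustration and are not JSword code. The built-in handler reads jar/zip files, so a tar or tar.gz would still need its own handler.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

final class JarEntryUrlDemo {
    /** Builds a URL of the form jar:<archive-url>!/<entry>. */
    static URL entryUrl(Path archive, String entryName) throws IOException {
        return new URL("jar:" + archive.toUri().toURL() + "!/" + entryName);
    }

    /** Reads one entry through the jar: URL without unpacking the archive. */
    static String readEntry(Path archive, String entryName) throws IOException {
        URLConnection conn = entryUrl(archive, entryName).openConnection();
        conn.setUseCaches(false); // do not keep the archive open in the JVM-wide cache
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Helper for trying it out: writes a one-entry jar to a temp file. */
    static Path writeSampleArchive(String entryName, String content) throws IOException {
        Path jar = Files.createTempFile("mods", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return jar;
    }
}
```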

Martin

On 11 January 2016 at 23:29, DM Smith <dmsmith at crosswire.org> wrote:

>
> On Jan 11, 2016, at 6:07 PM, Martin Denham <mjdenham at gmail.com> wrote:
>
> My estimate of file size might be too low because I forgot to take block
> size into account.  Quick experiments on my Android device suggest it adds
> about 40%, making it at least 7Mb for the conf files.
>
>
> Understand.
>
>
> By 'fluff' do you mean extracting all the files from mods.d.tar.gz and
> writing them all to disk?  I am a little concerned about writing and deleting
> hundreds of small files to the sd card repeatedly.  SD cards are not as
> good at high r/w as normal disks or flash drives.  That is the reason I do
> not store the AB database on the SD card.
>
>
> By fluff, I meant that the conf file would be re-read without a filter,
> thus getting everything.
>
>
> The process described would also make viewing a description (in AB
> right-click About) an unexpectedly expensive operation involving writing
> hundreds of files to the sd card.
>
>
> It would involve re-reading the one file without a filter. It should
> happen fast.
>
>
> I did not know about sbmd.toOSIS() and have not used it.  AB just pops up
> a little dialog with a few fields like About, copyright, licence, version,
> versification.
>
>
> Ok. Then you’ll need to call “fluff” before retrieving those fields. The
> code for fluff (or whatever we call it) would be something like:
> public void fluff() {
>     if (partiallyLoaded) {
>         // re-read and process the conf without a filter
>         partiallyLoaded = false;
>     }
> }
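
A slightly fuller sketch of that idea, under stated assumptions: PartialMetaData and its fullReader supplier are hypothetical stand-ins, not the real SwordBookMetaData API. The supplier represents "re-read the conf without a filter", and the flag makes sure the expensive re-read happens at most once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class PartialMetaData {
    // Hypothetical hook: re-reads and parses the whole conf with no filter.
    private final Supplier<Map<String, String>> fullReader;
    private Map<String, String> values;
    private boolean partiallyLoaded = true;

    PartialMetaData(Map<String, String> partialValues, Supplier<Map<String, String>> fullReader) {
        this.values = new HashMap<>(partialValues);
        this.fullReader = fullReader;
    }

    /** Upgrades a partial load to a full load; later calls are no-ops. */
    void fluff() {
        if (partiallyLoaded) {
            values = new HashMap<>(fullReader.get());
            partiallyLoaded = false;
        }
    }

    /** Returns null for keys that were filtered out and not yet fluffed in. */
    String getProperty(String key) {
        return values.get(key);
    }
}
```

A front end would call fluff() just before showing fields such as About, and could keep using getProperty() unchanged for the fields kept by the partial load.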
>
>
> For the 2 reasons above my preference would be to avoid writing hundreds
> of files to the SD card but I can't think of a perfect solution.  While
> grappling with this last week I was just trying to get the original code to
> work more efficiently (but failed).  I am not very experienced in Memory
> Analysis but suspected the memory use was higher than it might have been.
>
> By design, which files do you write to SD card? If they are only written
> when the mods.d.tar.gz is downloaded, would that help?
>
>
> If the tar.gz were searched for the conf each time, every Description
> request would be more expensive, but the first load would be quicker than
> writing hundreds of .conf files. To be honest, I think a lot of people do
> not know about the long-press menu in AB, so most of the time probably only
> the initial list of modules would be used.
>
>
> I don’t know about the long-press menu. In BibleDesktop, it is easy to
> navigate from one available to the next and each time it shows the full
> conf.
>
>
> Coincidentally, my Android device slowed to a crawl when I tried to copy
> all of eBible's .conf files to it just now: initially fast, then about 1
> file every 3 seconds. After 10 minutes I unplugged it, although that is
> probably not a realistic test and there is probably an explanation for the
> issue.
>
>
> I’ve nearly got the code written to unpack the conf. Let me zip up the
> files that have changed and send them to you.
>
> Basically, if you delete mods.d.tar.gz, it will fetch a new one (current
> behavior). If you delete mods.d/ it will unpack mods.d.tar.gz into it. If
> you fetch mods.d.tar.gz it will unpack it into mods.d. All of this takes
> place in the folder where mods.d.tar.gz is present.
>
> I tried adding new confs to mods.d that weren’t in mods.d.tar.gz to
> simulate a takedown and that works as well.
>
> If this code is no good for you, I’ve another thought. A file in a jar has
> an URL that is something like …/fred.jar!file. Maybe we can transform the
> mods.d.tar.gz into mods.d.tar and use that addressing mechanism to fetch
> the file? I’ll take a look at how the JRE does that. Maybe I’ll roll the
> same for JSword over a tar.gz file.
>
> DM
>
>
> Martin
>
> On 11 January 2016 at 19:28, DM Smith <dmsmith at crosswire.org> wrote:
>
>> I have been thinking about this a bit more. I knew there was a need
>> to prevent stale confs. The time performance is something that I’m not able
>> to test. My machine has an SSD, a fast 4 core CPU and gobs of RAM. So I
>> need you to keep me in line. ;)
>>
>> The easiest way to keep it pristine is to unpack it into a temporary
>> folder, rename the old folder, rename the new folder into place and
>> finally delete the old folder. Doing it in this order minimizes the time
>> that mods.d is unavailable, which is important for multi-threaded apps and
>> for multiple apps that share the same machine simultaneously.
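
The rename order described above can be sketched with java.nio.file as follows. This is an illustration only: DirSwap, swapIn and the ".old" suffix are invented names, and it assumes the temporary folder sits next to mods.d on the same file system so that each move is a plain rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DirSwap {
    /** Replaces the live folder with the freshly unpacked one. */
    static void swapIn(Path live, Path fresh) throws IOException {
        Path old = live.resolveSibling(live.getFileName() + ".old");
        deleteTree(old);              // clear leftovers from an interrupted earlier swap
        if (Files.exists(live)) {
            Files.move(live, old);    // 1. rename the old folder out of the way
        }
        Files.move(fresh, live);      // 2. rename the new folder into place
        deleteTree(old);              // 3. the slow part happens after the swap
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

The folder is only missing between steps 1 and 2, which are two renames. Any conf that is no longer in the archive disappears with the old folder, so stale files do not need to be tracked separately.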
>>
>> Right now the SwordBookMetaData remembers the File for the conf of
>> installed modules and is able to re-read it easily. But it does not store
>> anything about a conf’s location when it is from mods.d.tar.gz. I suppose I
>> could have it remember the location of mods.d.tar.gz and the name of the
>> conf entry and create a method to extract that conf out of the compressed
>> archive. This would need to be done for each module for which the user
>> requests info. To do this is quite expensive as it means inflating the file
>> then iterating over the contents until the desired conf is found.
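
To make the cost concrete, here is a rough sketch of such a lookup using only the JDK. It is not JSword's tar code. TarGzScan and its methods are hypothetical, and the reader handles only the plain name and octal size fields of a tar header (no long-name or other extensions). A tar.gz has no index, so every entry ahead of the wanted one has to be inflated and read past.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class TarGzScan {
    private static final int BLOCK = 512; // tar stores headers and data in 512-byte blocks

    /** Inflates the stream and walks the entries in order; returns null if not found. */
    static byte[] findEntry(InputStream tarGz, String wanted) throws IOException {
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(tarGz))) {
            byte[] header = new byte[BLOCK];
            while (true) {
                try {
                    in.readFully(header);
                } catch (EOFException endOfStream) {
                    return null;
                }
                if (header[0] == 0) {
                    return null; // a header starting with NUL is treated as end of archive
                }
                String name = text(header, 0, 100);
                int size = Integer.parseInt(text(header, 124, 12).trim(), 8);
                byte[] data = new byte[(size + BLOCK - 1) / BLOCK * BLOCK];
                in.readFully(data); // earlier entries must be read past to reach the next header
                if (name.equals(wanted)) {
                    return Arrays.copyOf(data, size);
                }
            }
        }
    }

    /** Decodes a NUL-terminated ASCII header field. */
    private static String text(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    /** Helper for trying it out: builds a one-entry tar.gz in memory. */
    static byte[] sampleArchive(String name, byte[] data) throws IOException {
        byte[] header = new byte[BLOCK];
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(nameBytes, 0, header, 0, nameBytes.length);
        byte[] sizeBytes = String.format("%011o", data.length).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(sizeBytes, 0, header, 124, sizeBytes.length);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(header);
            gz.write(data);
            gz.write(new byte[(BLOCK - data.length % BLOCK) % BLOCK]); // pad to a block boundary
            gz.write(new byte[2 * BLOCK]); // end-of-archive marker
        }
        return bytes.toByteArray();
    }
}
```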
>>
>> I think that it would be better to see how much time it adds to extract
>> the files and store them on disk. The fluffing of them would only be when
>> the user wants to browse a description of the module.
>>
>> I’d like to modify sbmd.toOSIS to check if the sbmd is partial or full
>> and if not full re-read the conf fully and then continue as before. I think
>> that is how JSword is designed to retrieve the conf for presentation to the
>> end user. Does AndBible use that or some other mechanism to get what it
>> wants for presentation?
>>
>> I think I’ll add a “fluff” method to BookMetaData that will do this. This
>> could be called to get it to fluff at another time.
>>
>> DM
>>
>> On Jan 11, 2016, at 1:00 PM, Martin Denham <mjdenham at gmail.com> wrote:
>>
>> My rough estimates have the total size of conf files in all repos at
>> about 5Mb which is not too different to the size of a module like ESV so
>> the impact should not be significant and it should not be a problem if this
>> is required.
>>
>> Other things to consider that come to mind i) would need to remove conf
>> files no longer in mods.d.tar.gz or delete and re-extract everything after
>> a refresh ii) Time taken to save files - loading the list is already slow.
>>
>> I can't think of any major reason not to do as you describe.
>>
>> However, would an easier approach be to find files in the zip a bit like
>> this
>> <http://stackoverflow.com/questions/11123528/finding-a-file-in-zipentry-java>.
>> Speed would not be an issue because it would only be done once or twice
>> after fetching the list e.g. to view About or to actually download.  The
>> mod.conf file name/path could be saved in SBMD if required.
>>
>> Martin
>>
>>
>> On 11 January 2016 at 01:39, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> I’m trying to figure out how to reload a conf from a remote source (to
>>> go from a partial load to a full load).  The problem is that the
>>> AbstractSwordInstaller sits over top of mods.d.tar.gz, which it does not
>>> unpack. Instead, it iterates over all the entries in that binary file and
>>> handles each entry (i.e. a conf) in core. It doesn’t hit the disk. I’m
>>> wondering whether it would be alright to unpack the file in the same
>>> folder? That would allow a SwordBookMetaData to reload the file. It would
>>> also mean that SwordBookMetaData would only need one means of reading a
>>> conf as it’d be a file and not a byte array.
>>>
>>> It isn’t a problem with desktop or server apps, but it might be for
>>> AndBible.
>>>
>>> — DM
>>>
>>>
>>>
>>> On Jan 10, 2016, at 3:31 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>> The problem you encountered was 2 bugs:
>>> 1. When the module is not UTF-8 the remote repository’s conf is re-read,
>>>    but the filter wasn’t passed.
>>> 2. Not intended, but IniSection required a filter, rather than saying a
>>>    null filter meant everything passed.
>>>
>>> I’ve checked in that fix. Still trying to make the memory less….
>>>
>>> — DM
>>>
>>> On Jan 10, 2016, at 1:18 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>> The “Partial load of conf file.” was to load all of the things in a conf
>>> that the JSword engine needs to work with a module. I don’t know why the
>>> CrossWire repo is working for me but not for you. I’ll keep working on it
>>> today. The problem with the previous commit was fixed with the last commit.
>>> I wasn’t “adjusting” the module after loading to fill in things like
>>> BookDriver and BookCategory.
>>>
>>> I’m wondering whether getting the list of Books from the installer
>>> creates a deep rather than a shallow copy of them.
>>>
>>> Today I hope to make SwordBookMetaData even more lazy. It has a
>>> BookDriver and validates its storage when the repo is loaded. I plan to
>>> break one of my modules by renaming one of the files and see the impact.
>>> Chris and I have noticed that the FileState objects are not fully released.
>>> This actually is part of the design.
>>>
>>> Anyway, I think it is going in the right direction. Reducing the memory
>>> 4x is a good thing. The data structures within the IniSection may be too
>>> heavy. I may relax the requirement that it maintains the SWORD conf’s order.
>>> The idea was to be able to modify the provided conf, retaining its order.
>>> However, now we never modify that conf.
>>>
>>> configAll was a deep clone of configSword. configAll adds in the
>>> contents of configJSword and then configFrontend. These last two are
>>> created even if not needed. We could make them lazy as well.
>>>
>>> DM
>>>
>>> On Jan 10, 2016, at 11:07 AM, Martin Denham <mjdenham at gmail.com> wrote:
>>>
>>> Thanks for the quick response.  I have had a brief look at the new
>>> commits.
>>>
>>> A lot of the attributes aren't being returned now so it is tricky to
>>> test and there are various errors but running the current tip 'Partial
>>> load of conf file.
>>> <https://github.com/crosswire/jsword/commit/80020f51c6a762d458ce8ae70007b78eadee1fb3>'
>>> the SBMD for eBible is now only a quarter of the original size at 10Mb
>>> which is fine but I still don't understand why it is so large for the
>>> minimal attribute set now being returned.
>>>
>>> I get a lot of errors like:
>>> SwordBookMetaData(492): Book not supported: malformed conf file for
>>> [BBE] no ModDrv found.
>>> SwordBookMetaData(492): Malformed conf file: missing [BBE]Description=.
>>> Using BBE
>>>
>>> and peculiarly the eBible repo seems to be the only repo I can use
>>> because all the others error.
>>>
>>> I also tried the previous commit 'Cut the memory requirements of a
>>> SwordBookMetaData in half.
>>> <https://github.com/crosswire/jsword/commit/cc32ba8f1bb245932a747390d03874b2be70e9a1>'
>>> but it did not work because basic attributes like language were not being
>>> returned.
>>>
>>> I still don't understand why removing configSword should reduce memory
>>> by half because it should just be removing references to data that is also
>>> referenced from configAll, so it would reduce memory slightly but not much.
>>>
>>> Martin
>>>
>>>
>>>
>>> On 10 January 2016 at 04:14, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>> OK. That’s done. Also accidentally introduced a bug with the last
>>>> commit. It is noticeably fast.
>>>>
>>>> Next up, allow for *a* SwordBookMetaData to be reloaded fully. This is
>>>> needed to bring in all the other elements which are information only, such
>>>> as About, in order to display info to the end user. Since the user will
>>>> only look at one module’s info at a time, it will load that one. You may
>>>> need to change your code (hope not) to force that one to reload.
>>>>
>>>> Give the code a try to see if it solves your out of memory error.
>>>>
>>>> DM
>>>>
>>>>
>>>> On Jan 9, 2016, at 9:06 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>>
>>>> I’ll be adding a filter to IniSection. Something like:
>>>> if (filter.test(key)) {
>>>>     // use the key
>>>> } else {
>>>>     // do nothing
>>>> }
>>>>
>>>> SwordBookMetaData will be responsible for building the filter. At least
>>>> for a first go around. A single object should do.
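
One way such a filter could look, as a sketch only: KeyFilterDemo, keepOnly and read are illustrative names rather than the IniSection API, and the line-based parsing is deliberately naive. A null filter is taken to mean that every key passes, and a single Predicate built from a set of wanted keys can be shared across all confs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class KeyFilterDemo {
    /** Builds one reusable filter that accepts only the listed keys. */
    static Predicate<String> keepOnly(String... keys) {
        Set<String> wanted = new HashSet<>(Arrays.asList(keys));
        return wanted::contains;
    }

    /** Parses key=value lines, keeping a key only when the filter accepts it. */
    static Map<String, String> read(String conf, Predicate<String> filter) {
        Map<String, String> kept = new LinkedHashMap<>(); // preserves the conf's order
        for (String line : conf.split("\n")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // section headers such as [KJV] and blank lines carry no key
            }
            String key = line.substring(0, eq).trim();
            if (filter == null || filter.test(key)) { // null filter: everything passes
                kept.put(key, line.substring(eq + 1).trim());
            }
        }
        return kept;
    }
}
```

The whole conf is still read and parsed; the filter only limits what stays in memory afterwards.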
>>>>
>>>> DM
>>>>
>>>> On Jan 9, 2016, at 6:29 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>>
>>>>
>>>> Yes, like you I have thought of streamlining conf loading for repo
>>>> lists.  One idea I had was to enable specification of a filter to
>>>> SwordBookMetaData to limit the conf values that are stored.
>>>>
>>>>
>>>> I was thinking of something similar. My ideas aren’t good enough to be
>>>> put into practice, but some kind of flag indicating empty, partially or
>>>> fully loaded. Empty would mean that it hasn’t gone to disk to get the conf.
>>>> Partial means that it read everything, but threw away most as not
>>>> interesting (since the conf does not have order you have to read and parse
>>>> it all). Full would mean that nothing was pitched.
>>>> SwordBookMetaData.getProperty would need to be changed to determine whether
>>>> the key is in memory or might be on disk and do the right thing. Or we
>>>> could keep getProperty as it is and if you want one of the fields that is
>>>> not stored (e.g. About) you have to call reload().
>>>>
>>>> Maybe we could also cache that info into a separate file(s)? When
>>>> mods.d.tar.gz is updated then the cache would be recomputed. In doing the
>>>> computation, each conf would be read then pitched. Basically, the storage
>>>> would be o.c.c.utils.Ini, if one file or IniSection, if many files.
>>>>
>>>> What do you think?
>>>>
>>>>
>>>> _______________________________________________
>>>> jsword-devel mailing list
>>>> jsword-devel at crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel