[jsword-devel] Out of Memory Issues Loading repo module lists

DM Smith dmsmith at crosswire.org
Sat Jan 9 16:29:40 MST 2016


I’ve checked in the change to cut the storage in half.

> On Jan 9, 2016, at 4:44 PM, Martin Denham <mjdenham at gmail.com> wrote:
> 
> While debugging I was looking for inefficiencies leading to unintended storage of data but really struggled to analyse the memory use.  A couple of things I noted:
> many values had long sequences of \u0000 at the end.  I don't know why but they disappeared after adding buf.trimToSize() at the end of IniSection.more()
A StringBuilder has a capacity sufficient for growth of the string. That extra space has \u0000. buf.trimToSize() will remove that extra capacity. Adding it to the end of IniSection.more() will reduce the memory, but the object is scoped to the method and is available to garbage collect. You can see the discussion of this in StringBuilder.setLength(n).

Adding warnings.trimToSize() should be helpful. I’ll do that.

> IniSections.warnings can get long and is never cleared.
I’ll change the report method to clear the buffer. So it is good to call report once.
> Inisection.MAX_BUFF_SIZE is probably too large for conf files
That size is what you provided earlier. But the reader is only kept for the duration of reading the conf and then it is closed and goes out of scope. We could reduce it if you like, but I don’t think it will help.
> Can buffer memory creation be moved outside the loop in AbstractSwordInstaller.loadCachedIndex; and just cleared at the start of each loop; to relieve some stress on the gc
> SwordBookMetaData spends most of it's time doing pattern matching in report().  Not sure if there is any quicker alternative.
We can make reporting optional. Both in IniSection and in SwordBookMetaData. This will save some memory for confs that have reportable problems. But not for the others.

During debugging pattern matching is quite slow. Much faster when not debugging. Even faster not doing it at all. Of course it is not a memory problem.
> In memory analysis I saw many key strings and value strings (as expected) but I also saw many complete config line strings and could not work out where they were being stored e.g. "History_2.18=Automatically generated on 2015-12-09 from source files dated 2014-09-16 by the Digital Bible Society (http://dbs.org <http://dbs.org/>)".  If we store keys, and values, and the whole key=value string then that would double data storage, but try as I might I could not find where or how the whole line was being stored.
History is an odd one. I think the conf entries should have been "History=2.18 Automatically …..”. JSword takes these entries and transforms them to this format, replacing the old format. It shouldn’t be stored but once. If it is stored more than once that is a bug.


> However, I tried most of the above and still could not load the repositories.  I just listed the above suggestions for you to consider.
> 
> Removing configSword sounds like a good idea.  It could potentially save half the storage.  However, it might not save much if it is just storing references to data that is also stored in configJSword - I got confused about this when scanning the code.

configJSword just stores the entries that are specific to it. Say you want to save Scope and BookList, the conf will be:
[xyz]
Scope=….
BookList=….

It won’t have anything else.

Same idea for configFrontend. Of course the IniSection still is created and takes space. Not allocating them until needed would be better. But at this time, I doubt that is the memory problem.

> 
> Yes, like you I have thought of streamlining conf loading for repo lists.  One idea I had was to enable specification of a filter to SwordBookMetaData to limit the conf values that are stored.

I was thinking of something similar. My ideas aren’t good enough to be put into practice, but some kind of flag indicating empty, partially or fully loaded. Empty would mean that it hasn’t gone to disk to get the conf. Partial means that it read everything, but threw away most as not interesting (since the conf does not have order you have to read and parse it all). Full would mean that nothing was pitched. SwordBookMetaData.getProperty would need to be changed to determine whether the key is in memory or might be on disk and do the right thing. Or we could keep getProperty as it is and if you want one of the fields that is not stored (e.g. About) you have to call reload().

Maybe we could also cache that info into a separate file(s)? When mods.d.tar.gz is updated then the cache would be recomputed. In doing the computation, each conf would be read then pitched. Basically, the storage would be o.c.c.utils.Ini, if one file or IniSection, if many files.

What do you think?

DM

> 
> Martin
> 
> On 9 January 2016 at 20:39, DM Smith <dmsmith at crosswire.org <mailto:dmsmith at crosswire.org>> wrote:
> Hmm. Not sure where the problem could be. Here is an overview of the design, which is quite different than before.
> 
> The SWORD conf is a Windows-style INI file. From a technical perspective it is a collection of named multi-maps. Each key can exist more than once in the file. This is represented by the class o.c.c.utils.Ini. While SWORD allows for all the confs to be concatenated into a single file, which Ini supports. Each conf is a named multi-map. This is represented by o.c.c.utils.IniSection. JSword only uses IniSection.
> 
> The class IniSection maintains the order of the keys as seen in the module and for the order of values for a key that is seen multiple times. As long as all repeated keys are adjacent, the reading and writing of the conf will yield the same lines in the same order. Internally it uses a TreeMap<String, ListSet<String>> to hold the keys and values. It uses List<String> to maintain the insertion order of keys. Rather than spitting out errors as they are seen, they are stored in a StringBuilder. It also remembers the File to which it was loaded.
> 
> This implementation may be too heavy for memory constrained devices.
> 
> The SwordBookMetaData maintains multiple IniSections for each conf on an as needed basis. This is to support the basic idea that the SWORD conf should be kept pristine and any modifications to it should either be in memory or be stored on a shared basis or a private basis. There are 4 IniSection objects that support this:
> configSword — The pristine SWORD conf for the module
> configJSword — The changes to the SWORD conf that should be persistent and shared by all front-ends
> configFrontend — The changes to the SWORD conf that should be persistent and private to the current frontend.
> configAll — The merge of the above such that the configFrontend takes precedence over configJSword and configJSword over configFrontend. It only maintains that precedence during loading. putProperty, which targets either configAll, configJSword or configFrontend, does not honor that precedence at this time. (It could.)
> 
> The obvious optimization I see is that since configSword is only used during construction it doesn’t need to exist as a member variable, but as a local variable. That would cut memory by half. I’ll make that change.
> 
> With the increase in the number of modules, we may need to re-think the Books class. There’s no need to have a full blown SwordBookMetaData for each book in a remote repository. People have no interest beyond curiosity in most of the modules. Unless they are showing the content of a conf, the vast majority of the conf is not seen. Unfortunately, the SWORD conf does not have a defined order and necessary elements can be at the end. So reading the entire conf and then analyzing it is necessary.
> 
> Hope this helps you target what else needs to be done.
> 
> DM
> 
> > On Jan 9, 2016, at 2:19 PM, Martin Denham <mjdenham at gmail.com <mailto:mjdenham at gmail.com>> wrote:
> >
> > Testing And Bible on a couple of smaller devices I have noticed OutOfMemory errors during the creation of SwordBookMetaData when attempting to get a list of downloadable modules from a repo.  I can just about load the eBible repo or all the other repos but get out of memory errors when loading all repos.  I can no longer even load just the Crosswire repo alone on my old G1.
> >
> > All the memory consumed by the list of downloadable modules is consumed by the BookMetaData i.e. bmd.configAll and bmd.configSword.
> >
> > I have been struggling to analyse memory usage by SwordBookMetaData and so I thought I would raise the issue here for comments.
> >
> > My small test Android devices only have about 64Mb of heap, much of which is used by other system classes.
> >
> > Sizes for the list of Books:
> > Crosswire 318 modules 10Mb
> > eBible 680 modules 41Mb
> >
> > The size of Crosswire conf files is normally less than 1k so 318*1k = 318k
> > The size of eBible conf files is up to about 6k so 680*6k = 4megs
> >
> > So I suspect an inefficiency in JSword but have been pulling my hair out trying to find the issue.  I tweaked a couple of things but nothing made a lot of difference.
> >
> > Has anybody else noticed anything similar?
> >
> > Martin
> > _______________________________________________
> > jsword-devel mailing list
> > jsword-devel at crosswire.org <mailto:jsword-devel at crosswire.org>
> > http://www.crosswire.org/mailman/listinfo/jsword-devel <http://www.crosswire.org/mailman/listinfo/jsword-devel>
> 
> 
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org <mailto:jsword-devel at crosswire.org>
> http://www.crosswire.org/mailman/listinfo/jsword-devel <http://www.crosswire.org/mailman/listinfo/jsword-devel>
> 
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.crosswire.org/pipermail/jsword-devel/attachments/20160109/15df81a1/attachment.html>


More information about the jsword-devel mailing list