[jsword-devel] Code optimisation request

Sat Jun 26 05:21:14 MST 2010

Welcome Martin,

I'm excited about your work.

Yes you can make changes! But not directly. Only a few people can commit changes directly.

The preferred method is to open an issue in Jira at www.crosswire.org/bugs under the appropriate bucket. In this case it would be "Common Java Utilities". And attach a patch to it.

This does several things:
1) It keeps the issue from getting lost in this mailing list.
2) It allows for discussion/iteration there.
3) It is a centralized location for changes, allowing one to know what went into a release.
...

That said, see below:

In Him,
DM

On Jun 26, 2010, at 5:22 AM, Martin Denham wrote:

> Hi,
> 
> I am not sure if I can make code changes in the JSword-common but I haven't made any before so I thought I would request the change.
> 
> In Zip.java, in the uncompress method, could you change:
>         ByteArrayOutputStream bos = new ByteArrayOutputStream();
>         BufferedOutputStream out = new BufferedOutputStream(bos, expectedLength);
>         InflaterInputStream in = new InflaterInputStream(input, new Inflater(), expectedLength);
>         byte[] buf = new byte[expectedLength];
> To:
>         ByteArrayOutputStream out = new ByteArrayOutputStream(expectedLength);
>         InflaterInputStream in = new InflaterInputStream(input, new Inflater(), expectedLength);
>         byte[] buf = new byte[expectedLength];
> 
> There are 2 optimisations:
> ByteArrrayOutputStream is passed the expectedLength parameter
I traced through the code and I'm not sure that this fixes the problems. If expectedLength is as is implied by the routine, the expected length of the uncompressed result, and a block is big, there might still be OOMs.

The ByteArrayOutputStream is naive in its allocation of additional space for its internal buffer. If not given a size, it starts at 32 and then doubles each time it needs to grow the buffer. Setting the initial size to the expected size of the final result, eliminates re-allocation. This change is an important speed improvement. And if garbage collection is not optimized, it will save memory.

At the end, the buffer is retrieved with toByteArray, which creates a copy of the buffer. So at a minimum, with this change there are two copies until "bos" is garbage collected. To eliminate that final copy, we'd have to write our own or find something else.

I also looked at InflaterInputStream. Here I am not sure that one should pass it expectedLength. It will set the internal size of its buffer. I'm not sure this buffer needs to be the expected size. It defaults to 512 and once set does not grow. It just controls how often one gets more bytes from the stream to inflate. I'm thinking that perhaps it should be small, either the default or perhaps 1024, 2048, 4096, ... Maybe min(expectedLength, 4096)?

Allocation in Zip.java of buf, probably doesn't need to be the expected size either. It can probably be set to the same as what is passed to the inflater stream.

So right now, I think that with your improvement, to unzip requires 4x the expected length or more. 

> BufferedOutputStream is not used because there is no benefit to buffering byte array streams.
Yes, I see this now. Reading the file may need to be buffered, but he byte array streams have their own internal buffering.
> These changes may seem trivial but I am attempting to use JSword on an Android mobile and the above prevents many OutOfMemory errors.
> 
> Many thanks
> Martin
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.crosswire.org/pipermail/jsword-devel/attachments/20100626/407b027b/attachment.html>