[jsword-devel] [sword-devel] Replacement Lucene Analyzer for Japanese

DM Smith dmsmith at crosswire.org
Tue Feb 12 11:01:06 MST 2013


It is a lot of work. The analyzers and filters that we have written would need to be re-written. The code no longer uses String but rather char[] (or equivalent).
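
To give a feel for the String versus char[] difference, here is a minimal plain-Java sketch. It is not Lucene's API or JSword code; the class and method names are hypothetical. The assumption is that a term lives in a reusable character buffer plus a length, and a filter edits that buffer in place instead of building a new String.

```java
import java.util.Locale;

// Illustration only: contrasts String-based and buffer-based term handling.
class TermBufferDemo {
    // String style: every normalization step returns a newly allocated String.
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Buffer style: lower-case the first `length` chars in place.
    // Characters beyond `length` are left untouched, since the buffer
    // may be larger than the term it currently holds.
    static void normalizeInPlace(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            buffer[i] = Character.toLowerCase(buffer[i]);
        }
    }
}
```

The in-place version avoids an allocation per token, which is presumably why the rewrite touches every analyzer and filter we have.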

This happened well before 4.0. Typically with Lucene you don't want to upgrade directly from an early version of a prior release, but only from the x.9 release. The difference between 3.9 and earlier is that lots of stuff is deprecated. The difference between 3.9 and 4.0 is that the deprecations are gone.

This deprecation path has been very helpful in identifying how to go from one major release to the next.

We have custom language converters because theirs do too much. For example, they remove stop words. While this is generally nice, there are theological phrases in which stop words are significant, e.g. "in Christ".
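
A minimal plain-Java sketch of the stop-word problem follows. It does not use Lucene and is not how JSword's analyzers are written; the class name, the tiny stop list, and the whitespace tokenizer are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a toy tokenizer with optional stop-word removal.
class StopWordDemo {
    // Hypothetical, tiny stop list for the example.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "in", "of", "the"));

    // Lower-case the text, split on whitespace, and drop tokens found in stopWords.
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "in Christ";
        // With stop-word removal the preposition is lost.
        System.out.println(tokenize(phrase, STOP_WORDS));                       // [christ]
        // Without it, both words of the phrase survive.
        System.out.println(tokenize(phrase, Collections.<String>emptySet()));   // [in, christ]
    }
}
```

Once "in" is dropped at index time, a search for the phrase "in Christ" cannot be told apart from a search for "Christ" alone.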

Also, most are built on StandardAnalyzer, which is slow and has features that are not appropriate for us. We use a very simple analyzer from Lucene.

There are some new Filters and Analyzers that we should be using.

I'd like to do this before we release or shortly after.

BTW, I want to get back to a release-often practice.

In Him,
	DM

On Feb 12, 2013, at 10:15 AM, Chris Burrell <chris at burrell.me.uk> wrote:

> So on the JSword front, it would be good to move up to Lucene 4 at some stage. Are we saying this will need more work than just a simple upgrade?
> 
> Also, why do we have our custom language converters? Lucene seems to have most of the ones we're using, and we seem to simply wrap around the Filters in the library.
> 
> Chris
> 
> 
> On 12 February 2013 15:12, DM Smith <dmsmith at crosswire.org> wrote:
> Reposting to JSword-devel.
> 
> On Feb 12, 2013, at 6:47 AM, David Haslam  wrote:
> 
> > Some languages, like Japanese and Chinese, are configured in JSword to use
> > the SmartCN Lucene Analyzer.
> >
> > SmartCN contains a massive dictionary which is too large for most mobiles.
> >
> > We don't package SmartCN with And Bible so somebody needs to do some work to
> > find a replacement Lucene Analyzer for Japanese.
> >
> > cf. For Chinese we now use mmseg4j.
> >
> > David (on behalf of Martin)
> >
> > https://code.google.com/p/and-bible/issues/detail?id=160
> >
> >
> >
> > --
> > View this message in context: http://sword-dev.350566.n4.nabble.com/Replacement-Lucene-Analyzer-for-Japanese-tp4651942.html
> > Sent from the SWORD Dev mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > sword-devel mailing list: sword-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page
> 
> 
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel
