[jsword-devel] A smaller Chinese Lucene Analyzer

Martin Denham mjdenham at gmail.com
Thu Nov 11 13:54:21 MST 2010


Does anybody know if there is a Chinese Lucene Analyzer that is more
lightweight than smartcn or if it is possible to configure smartcn to use
less memory?
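
To make concrete what "more lightweight" could mean here: a dictionary-free scheme such as overlapping bigram tokenization (the general idea behind Lucene's CJKAnalyzer) needs no large word list in memory. The following is only a minimal standalone sketch in plain Java, not Lucene code. The names `BigramSketch`, `bigrams`, `isCjk` and `flush` are hypothetical, and the check against the CJK Unified Ideographs block is a simplifying assumption. Search quality would probably be lower than with smartcn's dictionary-based segmentation.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {

    // Hypothetical helper: turn each run of CJK characters into overlapping
    // two-character tokens. Non-CJK characters only act as separators here.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isCjk(c)) {
                run.append(c);
            } else {
                flush(run, tokens);
            }
        }
        flush(run, tokens);
        return tokens;
    }

    // Simplifying assumption: only the basic CJK Unified Ideographs block counts.
    static boolean isCjk(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    // Emit the bigrams of the current run, or the lone character if the run
    // has length one, then reset the run.
    static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() == 1) {
            tokens.add(run.toString());
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        // A seven-character Chinese sample, written as Unicode escapes so the
        // source file stays ASCII-only.
        String sample = "\u8d77\u521d\u795e\u521b\u9020\u5929\u5730";
        System.out.println(bigrams(sample));
    }
}
```

A run of seven characters yields six overlapping tokens, and the only state held in memory is the current run, so nothing comparable to the smartcn dictionary has to be loaded.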

The Smart Chinese Analyzer (smartcn) will not run on Android: it tries to load
a large bigram dictionary, which it uses to split phrases, and runs out of
memory while doing so. Here is the stack trace:

11-11 20:38:28.296: ERROR/AndroidRuntime(8925): java.lang.ExceptionInInitializerError
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter.process(HHMMSegmenter.java:201)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.WordSegmenter.segmentSentence(WordSegmenter.java:50)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.WordTokenFilter.incrementToken(WordTokenFilter.java:69)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.PorterStemFilter.incrementToken(PorterStemFilter.java:53)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:225)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:87)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:61)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:599)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1449)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1337)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1265)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1254)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:200)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.crosswire.jsword.index.lucene.LuceneIndex.find(Unknown Source)
<deleted a bit of the stack trace here>
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): Caused by: java.lang.OutOfMemoryError
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.lang.reflect.Array.newInstance(Array.java:492)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.io.ObjectInputStream.readNewArray(ObjectInputStream.java:1637)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:927)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:2285)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:2240)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.loadFromInputStream(BigramDictionary.java:99)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.load(BigramDictionary.java:120)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.getInstance(BigramDictionary.java:71)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.BiSegGraph.<clinit>(BiSegGraph.java:46)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     ... 35 more

For now I will have to disable searching in Chinese texts.

Kind regards
Martin