[jsword-devel] A smaller Chinese Lucene Analyzer
Martin Denham
mjdenham at gmail.com
Thu Nov 11 13:54:21 MST 2010
Does anybody know if there is a Chinese Lucene Analyzer that is more
lightweight than smartcn or if it is possible to configure smartcn to use
less memory?
SmartChineseAnalyzer (smartcn) will not run on Android: it tries to load a
large dictionary, which it needs in order to split phrases into words, and
runs out of memory while doing so.
Here is a stack trace:
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): java.lang.ExceptionInInitializerError
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter.process(HHMMSegmenter.java:201)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.WordSegmenter.segmentSentence(WordSegmenter.java:50)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.WordTokenFilter.incrementToken(WordTokenFilter.java:69)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.PorterStemFilter.incrementToken(PorterStemFilter.java:53)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:225)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:87)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:61)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:599)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1449)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1337)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1265)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1254)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:200)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.crosswire.jsword.index.lucene.LuceneIndex.find(Unknown Source)
<deleted a bit of the stack trace here>
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): Caused by: java.lang.OutOfMemoryError
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.lang.reflect.Array.newInstance(Array.java:492)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.io.ObjectInputStream.readNewArray(ObjectInputStream.java:1637)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:927)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.io.ObjectInputStream.readObject(ObjectInputStream.java:2285)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.io.ObjectInputStream.readObject(ObjectInputStream.java:2240)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.loadFromInputStream(BigramDictionary.java:99)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.load(BigramDictionary.java:120)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.getInstance(BigramDictionary.java:71)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.BiSegGraph.<clinit>(BiSegGraph.java:46)
11-11 20:38:28.296: ERROR/AndroidRuntime(8925): ... 35 more
For now I will have to disable searching in Chinese texts.
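To illustrate what a more lightweight, dictionary-free approach could look
like: overlapping bigram tokenization, the general technique that Lucene's
CJKAnalyzer is built around, needs no dictionary at all. The following is only
a rough sketch of that idea in plain Java, not Lucene code. The class and
method names (BigramSketch, isHan, bigrams) are hypothetical, it only
recognises the basic CJK Unified Ideographs block, and it simply skips all
other text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Lucene or JSword.
final class BigramSketch {

    // Assumption: only the basic CJK Unified Ideographs block counts as Chinese.
    // Extension blocks and other scripts are ignored in this sketch.
    public static boolean isHan(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    // Splits each run of Han characters into overlapping two-character tokens.
    // A run of length one is emitted as a single-character token.
    public static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        List<Integer> run = new ArrayList<Integer>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isHan(cp)) {
                run.add(cp);
            } else {
                flush(run, tokens); // a non-Han character ends the current run
            }
        }
        flush(run, tokens);
        return tokens;
    }

    // Emits the tokens for one run of Han characters, then clears the run.
    private static void flush(List<Integer> run, List<String> tokens) {
        if (run.size() == 1) {
            tokens.add(new String(Character.toChars(run.get(0))));
        }
        for (int j = 0; j + 1 < run.size(); j++) {
            tokens.add(new String(Character.toChars(run.get(j)))
                    + new String(Character.toChars(run.get(j + 1))));
        }
        run.clear();
    }

    public static void main(String[] args) {
        System.out.println(bigrams("中文分词")); // prints [中文, 文分, 分词]
    }
}
```

The trade-off is that bigrams produce more index terms and less precise
matches than real word segmentation, but memory use stays small because
nothing has to be loaded up front.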
Kind regards
Martin