[jsword-devel] A smaller Chinese Lucene Analyzer

DM Smith dmsmith at crosswire.org
Thu Nov 11 15:16:54 MST 2010


Martin,

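In the lucene-analyzers jar, try either of these (abbreviating org.apache.lucene.analysis as o.a.l.a):
o.a.l.a.cn.ChineseAnalyzer or o.a.l.a.cjk.CJKAnalyzer
Neither relies on a segmentation dictionary the way smartcn does. The former indexes each Chinese character as its own token; the latter indexes overlapping bigrams and thus has a bigger index size.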
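
To make the difference concrete, here is a rough sketch in plain Java of the two tokenization styles. It is not Lucene's code and uses none of its classes; the class and method names (NGramSketch, unigrams, bigrams) are made up for illustration, and it ignores details such as stop words and mixed Latin text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics the idea of per-character and
// overlapping-bigram tokens, not the actual Lucene tokenizers.
class NGramSketch {

    // One token per Unicode code point (the single-character style).
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            tokens.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp);
        }
        return tokens;
    }

    // Overlapping pairs of adjacent characters (the bigram style).
    // A lone character is kept as a single token so it stays searchable.
    static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> tokens = new ArrayList<>();
        if (chars.size() == 1) {
            tokens.add(chars.get(0));
            return tokens;
        }
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four Chinese characters, written as escapes to keep the source ASCII.
        String text = "\u4e2d\u6587\u5206\u8bcd";
        System.out.println(unigrams(text));
        System.out.println(bigrams(text));
    }
}
```

Per document the number of tokens is about the same either way. The number of distinct terms is what differs: there are far more distinct character pairs than distinct single characters, which is presumably where the larger index comes from. Neither style needs a dictionary held in memory.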

Hope this helps.

In Him,
	DM

On Nov 11, 2010, at 3:54 PM, Martin Denham wrote:

> Does anybody know if there is a Chinese Lucene Analyzer that is more lightweight than smartcn or if it is possible to configure smartcn to use less memory?
> 
> Smart Chinese Analyzer will not run on Android because it attempts to load up a large dictionary in order to split phrases and runs out of memory.  Here is a stack trace:
> 
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): java.lang.ExceptionInInitializerError
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter.process(HHMMSegmenter.java:201)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.WordSegmenter.segmentSentence(WordSegmenter.java:50)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.WordTokenFilter.incrementToken(WordTokenFilter.java:69)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.PorterStemFilter.incrementToken(PorterStemFilter.java:53)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:225)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:87)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:61)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:599)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1449)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1337)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1265)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1254)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:200)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.crosswire.jsword.index.lucene.LuceneIndex.find(Unknown Source)
> <deleted a bit of the stack trace here>
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): Caused by: java.lang.OutOfMemoryError
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.lang.reflect.Array.newInstance(Array.java:492)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.io.ObjectInputStream.readNewArray(ObjectInputStream.java:1637)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:927)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:2285)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:2240)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.loadFromInputStream(BigramDictionary.java:99)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.load(BigramDictionary.java:120)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.getInstance(BigramDictionary.java:71)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     at org.apache.lucene.analysis.cn.smart.hhmm.BiSegGraph.<clinit>(BiSegGraph.java:46)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925):     ... 35 more
> 
> For now I will have to disable searching in Chinese texts.
> 
> Kind regards
> Martin
> 
> 



