[jsword-devel] A smaller Chinese Lucene Analyzer
DM Smith
dmsmith at crosswire.org
Thu Nov 11 14:13:55 MST 2010
You might try asking on Lucene's user list.
I don't know enough about Chinese to say whether there is a lighter approach (I think the other Chinese analyzer is also heavy), but you might do well with bigram searching.
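
For illustration, here is a minimal sketch of what bigram tokenizing means: the text is cut into overlapping two-character tokens, so no dictionary has to be loaded. The class name BigramSketch and the method bigrams are made up for this example and are not part of Lucene or JSword. Lucene's contrib analyzers include a CJKAnalyzer that works along these lines; whether it fits in Android's memory limits is something you would have to test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not a Lucene or JSword class.
final class BigramSketch {

    /** Splits text into overlapping two-character tokens, skipping whitespace. */
    static List<String> bigrams(String text) {
        // Collect characters by code point so surrogate pairs stay intact.
        List<String> chars = new ArrayList<String>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isWhitespace(cp)) {
                chars.add(new String(Character.toChars(cp)));
            }
        }

        List<String> tokens = new ArrayList<String>();
        // A lone character cannot form a pair, so emit it by itself.
        if (chars.size() == 1) {
            tokens.add(chars.get(0));
        }
        // Each adjacent pair of characters becomes one token.
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("中华人民")); // prints [中华, 华人, 人民]
    }
}
```

The same tokenizing has to be applied both when indexing and when parsing the query, otherwise the terms will not match. The trade-off is a larger index and some false matches across word boundaries in exchange for needing no dictionary in memory.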
In Him,
DM
On Nov 11, 2010, at 3:54 PM, Martin Denham wrote:
> Does anybody know if there is a Chinese Lucene Analyzer that is more lightweight than smartcn or if it is possible to configure smartcn to use less memory?
>
> Smart Chinese Analyzer will not run on Android because it attempts to load up a large dictionary in order to split phrases and runs out of memory. Here is a stack trace:
>
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): java.lang.ExceptionInInitializerError
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter.process(HHMMSegmenter.java:201)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.WordSegmenter.segmentSentence(WordSegmenter.java:50)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.WordTokenFilter.incrementToken(WordTokenFilter.java:69)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.PorterStemFilter.incrementToken(PorterStemFilter.java:53)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:225)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:87)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:61)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:599)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1449)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1337)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1265)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1254)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:200)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.crosswire.jsword.index.lucene.LuceneIndex.find(Unknown Source)
> <deleted a bit of the stack trace here>
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): Caused by: java.lang.OutOfMemoryError
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.lang.reflect.Array.newInstance(Array.java:492)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.io.ObjectInputStream.readNewArray(ObjectInputStream.java:1637)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:927)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.io.ObjectInputStream.readObject(ObjectInputStream.java:2285)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at java.io.ObjectInputStream.readObject(ObjectInputStream.java:2240)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.loadFromInputStream(BigramDictionary.java:99)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.load(BigramDictionary.java:120)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.BigramDictionary.getInstance(BigramDictionary.java:71)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): at org.apache.lucene.analysis.cn.smart.hhmm.BiSegGraph.<clinit>(BiSegGraph.java:46)
> 11-11 20:38:28.296: ERROR/AndroidRuntime(8925): ... 35 more
>
> For now I will have to disable searching in Chinese texts.
>
> Kind regards
> Martin