org.crosswire.jsword.index.lucene.analysis
Class ConfigurableSnowballAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.crosswire.jsword.index.lucene.analysis.AbstractBookAnalyzer
          extended by org.crosswire.jsword.index.lucene.analysis.ConfigurableSnowballAnalyzer
All Implemented Interfaces:
Closeable

public class ConfigurableSnowballAnalyzer
extends AbstractBookAnalyzer

An Analyzer whose TokenStream is built from a LowerCaseTokenizer, optionally filtered with a StopFilter and a SnowballFilter.
Default behavior: stemming is done; stop words are not removed.
A Snowball stemmer is configured according to the language of the Book. The following stemmer names are currently accepted (the stemmers available in the Lucene Snowball package net.sf.snowball.ext):

     Danish
     Dutch
     English
     Finnish
     French
     German2
     German
     Italian
     Kp
     Lovins
     Norwegian
     Porter
     Portuguese
     Russian
     Spanish
     Swedish
 
This list is expected to expand as the Snowball project adds support for more languages.
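
The token pipeline described above can be sketched in plain Java without Lucene. This is only an illustration under stated assumptions: the class AnalyzerPipelineSketch, its method names, and the toy stemmer are hypothetical and are not part of JSword or Lucene. The toy stemmer merely strips a trailing "s" and stands in for a real Snowball stemmer. The ordering (lower-casing tokenizer, then optional stop-word removal, then optional stemming) follows the description of tokenStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

public class AnalyzerPipelineSketch {

    /** Splits text at non-letters and lower-cases each token, roughly like a LowerCaseTokenizer. */
    static List<String> lowerCaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Applies the optional stop-word filter, then the optional stemmer, to each token. */
    static List<String> analyze(String text,
                                boolean doStopWords, Set<String> stopSet,
                                boolean doStemming, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String token : lowerCaseTokens(text)) {
            if (doStopWords && stopSet.contains(token)) {
                continue; // stop word removed
            }
            out.add(doStemming ? stemmer.apply(token) : token);
        }
        return out;
    }

    /** Toy placeholder, NOT a Snowball stemmer: strips one trailing "s" from longer tokens. */
    static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "and"));
        String text = "The Kings and the Prophets";
        // Default behavior per the class description: stemming on, stop words kept.
        System.out.println(analyze(text, false, stop, true, AnalyzerPipelineSketch::toyStem));
        // prints [the, king, and, the, prophet]
        // With stop-word removal enabled as well.
        System.out.println(analyze(text, true, stop, true, AnalyzerPipelineSketch::toyStem));
        // prints [king, prophet]
    }
}
```

In the real class, the two switches correspond to the inherited doStemming and doStopWords fields, and the stemmer is chosen by name from the list above rather than passed in as a function.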

Author:
sijo cherian
See Also:
The GNU Lesser General Public License for details.

Field Summary
private static HashMap<String,Set<?>> defaultStopWordMap
           
private static Map<String,String> languageCodeToStemmerLanguageNameMap
           
private  org.apache.lucene.util.Version matchVersion
           
private  String stemmerName
          The name of the stemmer to use.
 
Fields inherited from class org.crosswire.jsword.index.lucene.analysis.AbstractBookAnalyzer
book, doStemming, doStopWords, stopSet
 
Fields inherited from class org.apache.lucene.analysis.Analyzer
overridesTokenStreamMethod
 
Constructor Summary
ConfigurableSnowballAnalyzer()
           
 
Method Summary
 void pickStemmer(String languageCode)
          Selects the stemmer to use for the given language code.
 org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName, Reader reader)
           
 void setBook(Book newBook)
          The book for which analysis is being performed.
 org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
          Filters a LowerCaseTokenizer with a StopFilter (if stop-word removal is enabled) and a SnowballFilter (if stemming is enabled).
 
Methods inherited from class org.crosswire.jsword.index.lucene.analysis.AbstractBookAnalyzer
getBook, getDoStopWords, setDoStemming, setDoStopWords, setStopWords
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

stemmerName

private String stemmerName
The name of the stemmer to use.


languageCodeToStemmerLanguageNameMap

private static Map<String,String> languageCodeToStemmerLanguageNameMap

defaultStopWordMap

private static HashMap<String,Set<?>> defaultStopWordMap

matchVersion

private final org.apache.lucene.util.Version matchVersion
Constructor Detail

ConfigurableSnowballAnalyzer

public ConfigurableSnowballAnalyzer()
Method Detail

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(String fieldName,
                                                                Reader reader)
Filters a LowerCaseTokenizer with a StopFilter (if stop-word removal is enabled) and a SnowballFilter (if stemming is enabled).

Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer

reusableTokenStream

public org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName,
                                                                  Reader reader)
                                                           throws IOException
Overrides:
reusableTokenStream in class org.apache.lucene.analysis.Analyzer
Throws:
IOException

setBook

public void setBook(Book newBook)
Description copied from class: AbstractBookAnalyzer
The book for which analysis is being performed.

Overrides:
setBook in class AbstractBookAnalyzer

pickStemmer

public void pickStemmer(String languageCode)
Selects the stemmer to use for the given language code.

Parameters:
languageCode - the language code used to look up the stemmer name
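
A lookup from language code to stemmer name, as suggested by the languageCodeToStemmerLanguageNameMap field, might look like the minimal sketch below. Everything in it is an assumption made for illustration: the class StemmerLookupSketch, the method pickStemmerName, the two-letter codes used as keys, and the choice to throw IllegalArgumentException for an unknown code are hypothetical. The actual contents of the map and the actual error handling are not shown on this page. Only the stemmer names themselves are taken from the list in the class description.

```java
import java.util.HashMap;
import java.util.Map;

public class StemmerLookupSketch {

    // Hypothetical entries: the keys are assumed two-letter language codes,
    // the values are stemmer names from the list in the class description.
    private static final Map<String, String> CODE_TO_STEMMER = new HashMap<>();
    static {
        CODE_TO_STEMMER.put("en", "English");
        CODE_TO_STEMMER.put("de", "German");
        CODE_TO_STEMMER.put("fr", "French");
    }

    /** Returns the stemmer name for a language code; rejecting unknown codes is an assumption. */
    static String pickStemmerName(String languageCode) {
        String name = CODE_TO_STEMMER.get(languageCode);
        if (name == null) {
            throw new IllegalArgumentException(
                    "No stemmer configured for language code: " + languageCode);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(pickStemmerName("de")); // prints German
    }
}
```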

Copyright © 2003-2015