public class ConfigurableSnowballAnalyzer extends AbstractBookAnalyzer
The TokenStream is built from a LowerCaseTokenizer, optionally filtered with SnowballFilter and StopFilter.
Default behavior: stemming is done; stop words are not removed.
A Snowball stemmer is configured according to the language of the Book. It currently accepts the following stemmer names (the stemmers available in the Lucene Snowball package net.sf.snowball.ext):
Danish, Dutch, English, Finnish, French, German2, German, Italian, Kp, Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, Swedish
This list is expected to grow as the Snowball project adds support for more languages.
See also: The GNU Lesser General Public License for details.
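The filter chain described above can be modelled in plain Java. The following is a minimal sketch and not the class's actual implementation: it uses no Lucene or JSword classes, the class name SnowballPipelineSketch and its methods are hypothetical, and toyStem is a deliberately trivial stand-in for a real Snowball stemmer. It only illustrates the order of the steps (lowercasing tokenizer, then optional stop-word removal, then optional stemming) and the documented defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the filter chain; it does not use Lucene or JSword classes.
final class SnowballPipelineSketch {

    // Stand-in for LowerCaseTokenizer: split at non-letters, lowercase each token.
    static List<String> lowerCaseTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Hypothetical toy stemmer (strips one trailing "s"); NOT a Snowball algorithm.
    static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Stop-word removal first (if enabled), then stemming (if enabled).
    static List<String> analyze(String text, boolean doStopWords,
                                boolean doStemming, Set<String> stopSet) {
        List<String> out = new ArrayList<>();
        for (String token : lowerCaseTokenize(text)) {
            if (doStopWords && stopSet.contains(token)) {
                continue; // dropped by the stop filter
            }
            out.add(doStemming ? toyStem(token) : token);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopSet = new HashSet<>(Arrays.asList("the", "of"));
        String text = "The Books of the Prophets";
        // Documented default: stemming on, stop words kept.
        System.out.println(analyze(text, false, true, stopSet)); // [the, book, of, the, prophet]
        // Stop-word removal switched on as well.
        System.out.println(analyze(text, true, true, stopSet));  // [book, prophet]
    }
}
```

In the real class the two switches correspond to the inherited doStopWords and doStemming fields, set through setDoStopWords and setDoStemming.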
Modifier and Type | Field and Description |
---|---|
private static HashMap<String,Set<?>> | defaultStopWordMap |
private static Map<String,String> | languageCodeToStemmerLanguageNameMap |
private org.apache.lucene.util.Version | matchVersion |
private String | stemmerName - The name of the stemmer to use. |

Fields inherited from class AbstractBookAnalyzer: book, doStemming, doStopWords, stopSet

Constructor and Description |
---|
ConfigurableSnowballAnalyzer() |
Modifier and Type | Method and Description |
---|---|
void | pickStemmer(String languageCode) - Given the name of a stemmer, use that one. |
org.apache.lucene.analysis.TokenStream | reusableTokenStream(String fieldName, Reader reader) |
void | setBook(Book newBook) - The book for which analysis is being performed. |
org.apache.lucene.analysis.TokenStream | tokenStream(String fieldName, Reader reader) - Filters LowerCaseTokenizer with StopFilter if enabled and SnowballFilter. |

Methods inherited from class AbstractBookAnalyzer: getBook, getDoStopWords, setDoStemming, setDoStopWords, setStopWords

private String stemmerName
private static Map<String,String> languageCodeToStemmerLanguageNameMap
private final org.apache.lucene.util.Version matchVersion
public final org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
Filters LowerCaseTokenizer with StopFilter if enabled and SnowballFilter.
Specified by: tokenStream in class org.apache.lucene.analysis.Analyzer
public org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
Overrides: reusableTokenStream in class org.apache.lucene.analysis.Analyzer
Throws: IOException
public void setBook(Book newBook)
Description copied from class: AbstractBookAnalyzer
The book for which analysis is being performed.
Overrides: setBook in class AbstractBookAnalyzer
public void pickStemmer(String languageCode)
Parameters: languageCode -
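As a rough illustration of how a language code might be resolved to one of the stemmer names listed in the class description, here is a minimal sketch in plain Java. It is not the class's actual code: the class name StemmerPickerSketch, the method pickStemmerName, the two-letter keys in the map, and the choice to throw IllegalArgumentException for an unknown code are all assumptions made for the example. This page does not show the real contents of languageCodeToStemmerLanguageNameMap or what pickStemmer does when no stemmer matches.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a language-code-to-stemmer-name lookup.
final class StemmerPickerSketch {

    // Stemmer names exactly as listed in the class description.
    static final Set<String> AVAILABLE = new HashSet<>(Arrays.asList(
            "Danish", "Dutch", "English", "Finnish", "French", "German2",
            "German", "Italian", "Kp", "Lovins", "Norwegian", "Porter",
            "Portuguese", "Russian", "Spanish", "Swedish"));

    // ASSUMPTION: two-letter language codes as keys; the real map is not shown.
    static final Map<String, String> CODE_TO_NAME = new HashMap<>();
    static {
        CODE_TO_NAME.put("da", "Danish");
        CODE_TO_NAME.put("nl", "Dutch");
        CODE_TO_NAME.put("en", "English");
        CODE_TO_NAME.put("fi", "Finnish");
        CODE_TO_NAME.put("fr", "French");
        CODE_TO_NAME.put("de", "German");
        CODE_TO_NAME.put("it", "Italian");
        CODE_TO_NAME.put("no", "Norwegian");
        CODE_TO_NAME.put("pt", "Portuguese");
        CODE_TO_NAME.put("ru", "Russian");
        CODE_TO_NAME.put("es", "Spanish");
        CODE_TO_NAME.put("sv", "Swedish");
    }

    // Resolve a code to a stemmer name; rejecting unknown codes is a choice
    // made for this sketch, not documented behavior of pickStemmer.
    static String pickStemmerName(String languageCode) {
        String name = CODE_TO_NAME.get(languageCode);
        if (name == null || !AVAILABLE.contains(name)) {
            throw new IllegalArgumentException(
                    "No stemmer configured for language code: " + languageCode);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(pickStemmerName("en")); // English
        System.out.println(pickStemmerName("de")); // German
    }
}
```

Keeping the lookup in a static map means a new language only needs a new map entry once a matching stemmer exists.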