[sword-devel] Fwd: [sword-svn] r2045 - trunk/src/modules
DM Smith
dmsmith555 at yahoo.com
Thu May 3 12:28:22 MST 2007
Everyone,
I need to apologize for the change from StandardAnalyzer to
SimpleAnalyzer. I should not have made the change without an open
conversation here first. I did not think through the impact of such a
change. Troy was gracious enough to point it out to me. To revert the
change, just replace SimpleAnalyzer with standard::StandardAnalyzer
globally. I can supply a patch if need be.
In His Service,
DM
DM Smith wrote:
> Martin Gruner wrote:
>
>> Hi Chris,
>>
>> are you sure you want to move from StandardAnalyzer to SimpleAnalyzer? IIRC
>> searches won't find English stop words like "for", "then", "and"...
>>
>>
>
> The SimpleAnalyzer does no stop word analysis. The only thing it does is
> lowercase everything and find tokens as sequences of letters bounded by
> non-letters.
>
> The StandardAnalyzer does a whole boatload of stuff, in addition to what
> SimpleAnalyzer does:
> * Splits words at punctuation characters, removing punctuation. However,
> a dot that's not followed by whitespace is considered part of a token.
> (Eliminated later as part of an acronym)
> * Splits words at hyphens, unless there's a number in the token, in
> which case the whole token is interpreted as a product number and is not
> split.
> * Recognizes email addresses and internet hostnames as one token.
> * Removes ' from words followed by a trailing s or S.
> * Removes . from things it considers acronyms.
> * Eliminates the following English stop words:
> a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no,
> not, of, on, or, such, that, the, their, then, there, these, they, this,
> to, was, will, with
>
> The SimpleAnalyzer is correct.
>
>
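To make the difference concrete, here is a rough sketch of the two behaviors described above. It is not CLucene code: simpleTokens and dropStopWords are hypothetical helper names, the tokenizer handles ASCII letters only, and only the stop word step of StandardAnalyzer is imitated (none of the punctuation, hyphen, email, or acronym handling). The stop word list is the one quoted above.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, roughly what SimpleAnalyzer is described as doing:
// lowercase everything and emit runs of letters bounded by non-letters.
// Assumption: ASCII only; a real analyzer would work on wide characters.
std::vector<std::string> simpleTokens(const std::string &text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}

// Hypothetical helper imitating only the stop word step listed for
// StandardAnalyzer: tokens found in the list are dropped.
std::vector<std::string> dropStopWords(const std::vector<std::string> &tokens) {
    static const std::set<std::string> stopWords = {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
        "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
        "such", "that", "the", "their", "then", "there", "these", "they",
        "this", "to", "was", "will", "with"};
    std::vector<std::string> kept;
    for (const std::string &token : tokens) {
        if (stopWords.count(token) == 0) {
            kept.push_back(token);
        }
    }
    return kept;
}
```

With these helpers, simpleTokens("For God so loved the world") keeps "for" and "the" as tokens, while passing that result through dropStopWords removes them, so a search for those words could only match in the first case.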
>> mg
>>
>> ---------- Forwarded message ----------
>>
>> Subject: [sword-svn] r2045 - trunk/src/modules
>> Date: Thursday, 3 May 2007
>> From: chrislit at www.crosswire.org
>> To: sword-cvs at crosswire.org
>>
>> Author: chrislit
>> Date: 2007-05-03 03:41:07 -0700 (Thu, 03 May 2007)
>> New Revision: 2045
>>
>> Modified:
>> trunk/src/modules/swmodule.cpp
>> Log:
>> DM's RAMDirectory patch for CLucene indexing
>>
>> Modified: trunk/src/modules/swmodule.cpp
>> ===================================================================
>> --- trunk/src/modules/swmodule.cpp 2007-05-01 17:35:31 UTC (rev 2044)
>> +++ trunk/src/modules/swmodule.cpp 2007-05-03 10:41:07 UTC (rev 2045)
>> @@ -515,7 +515,7 @@
>> is = new IndexSearcher(ir);
>> (*percent)(10, percentUserData);
>>
>> - standard::StandardAnalyzer analyzer;
>> + SimpleAnalyzer analyzer;
>> lucene_utf8towcs(wcharBuffer, istr, MAX_CONV_SIZE); //TODO Is istr always
>> utf8?
>> q = QueryParser::parse(wcharBuffer, _T("content"), &analyzer);
>> (*percent)(20, percentUserData);
>> @@ -960,10 +960,12 @@
>> setKey(*searchKey);
>> }
>>
>> - IndexWriter *writer = NULL;
>> + RAMDirectory *ramDir = NULL;
>> + IndexWriter *coreWriter = NULL;
>> + IndexWriter *fsWriter = NULL;
>> Directory *d = NULL;
>>
>> - standard::StandardAnalyzer *an = new standard::StandardAnalyzer();
>> + SimpleAnalyzer *an = new SimpleAnalyzer();
>> SWBuf target = getConfigEntry("AbsoluteDataPath");
>> bool includeKeyInSearch =
>> getConfig().has("SearchOption", "IncludeKeyInSearch");
>> char ch = target.c_str()[strlen(target.c_str())-1];
>> @@ -972,19 +974,10 @@
>> target.append("lucene");
>> FileMgr::createParent(target+"/dummy");
>>
>> - if (IndexReader::indexExists(target.c_str())) {
>> - d = FSDirectory::getDirectory(target.c_str(), false);
>> - if (IndexReader::isLocked(d)) {
>> - IndexReader::unlock(d);
>> - }
>> -
>> - writer = new IndexWriter( d, an, false);
>> - } else {
>> - d = FSDirectory::getDirectory(target.c_str(), true);
>> - writer = new IndexWriter( d ,an, true);
>> - }
>> + ramDir = new RAMDirectory();
>> + coreWriter = new IndexWriter(ramDir, an, true);
>> +
>>
>> -
>>
>> char perc = 1;
>> VerseKey *vkcheck = 0;
>> @@ -1222,7 +1215,7 @@
>> if (good) {
>> //printf("writing (%s).\n", (const char *)*key);
>> //fflush(stdout);
>> - writer->addDocument(doc);
>> + coreWriter->addDocument(doc);
>> }
>> delete doc;
>>
>> @@ -1230,9 +1223,29 @@
>> err = Error();
>> }
>>
>> - writer->optimize();
>> - writer->close();
>> - delete writer;
>> + // Optimizing automatically happens with the call to addIndexes
>> + //coreWriter->optimize();
>> + coreWriter->close();
>> +
>> + if (IndexReader::indexExists(target.c_str())) {
>> + d = FSDirectory::getDirectory(target.c_str(), false);
>> + if (IndexReader::isLocked(d)) {
>> + IndexReader::unlock(d);
>> + }
>> +
>> + fsWriter = new IndexWriter( d, an, false);
>> + } else {
>> + d = FSDirectory::getDirectory(target.c_str(), true);
>> + fsWriter = new IndexWriter( d ,an, true);
>> + }
>> +
>> + Directory *dirs[] = { ramDir, 0 };
>> + fsWriter->addIndexes(dirs);
>> + fsWriter->close();
>> +
>> + delete ramDir;
>> + delete coreWriter;
>> + delete fsWriter;
>> delete an;
>>
>> // reposition module back to where it was before we were called