[sword-devel] Fwd: [sword-svn] r2045 - trunk/src/modules
DM Smith
dmsmith555 at yahoo.com
Thu May 3 11:19:58 MST 2007
Martin Gruner wrote:
> Hi Chris,
>
> are you sure you want to move from StandardAnalyzer to SimpleAnalyzer? IIRC
> searches won't find English stop words like "for", "then", "and"...
>
The SimpleAnalyzer does no stop word analysis. All it does is lowercase
everything and find tokens as sequences of letters bounded by
non-letters.
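To make that concrete, here is a rough sketch of the behaviour just
described. It is not CLucene code: simpleTokens is a hypothetical helper
of my own, and it only handles ASCII letters, whereas the real analyzer
works on wide characters.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of CLucene): approximates the described
// SimpleAnalyzer behaviour for ASCII input only.
// A token is a maximal run of letters; every token is lowercased.
std::vector<std::string> simpleTokens(const std::string &text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            // Letters extend the current token, folded to lowercase.
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            // Any non-letter ends the current token.
            tokens.push_back(current);
            current.clear();
        }
    }
    // Flush a token that runs to the end of the input.
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

Under those assumptions, "For God so loved" yields the tokens for, god,
so, loved. Note that "for" survives, because nothing is filtered out.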
The StandardAnalyzer does a whole boatload of stuff, in addition to what
SimpleAnalyzer does:
* Splits words at punctuation characters, removing punctuation. However,
a dot that's not followed by whitespace is considered part of a token.
(Eliminated later as part of an acronym)
* Splits words at hyphens, unless there's a number in the token, in
which case the whole token is interpreted as a product number and is not
split.
* Recognizes email addresses and internet hostnames as one token.
* Removes a trailing 's or 'S from words.
* Removes . from things it considers acronyms.
* Eliminates the following English stop words:
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no,
not, of, on, or, such, that, the, their, then, there, these, they, this,
to, was, will, with
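The stop word step is the one that matters for Martin's question. As a
sketch only (removeStopWords is a hypothetical name, not the CLucene
StopFilter API, and it assumes the tokens are already lowercased), the
filtering amounts to this:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not the CLucene API): drops the English stop
// words listed above from a sequence of already-lowercased tokens.
std::vector<std::string> removeStopWords(const std::vector<std::string> &tokens) {
    static const std::set<std::string> stopWords = {
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
        "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
        "such", "that", "the", "their", "then", "there", "these", "they",
        "this", "to", "was", "will", "with"
    };
    std::vector<std::string> kept;
    for (const std::string &token : tokens) {
        // Keep only tokens that are not in the stop word set.
        if (stopWords.count(token) == 0) {
            kept.push_back(token);
        }
    }
    return kept;
}
```

With the tokens for, god, so, loved, the, world as input, only god, so,
loved, world come out, so a search for "for" or "the" has nothing left
to match against.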
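So the SimpleAnalyzer is the correct choice: it drops no stop words, so
searches can still find words like "for", "then" and "and".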
> mg
>
> ---------- Forwarded message ----------
>
> Subject: [sword-svn] r2045 - trunk/src/modules
> Date: Thursday, 3 May 2007
> From: chrislit at www.crosswire.org
> To: sword-cvs at crosswire.org
>
> Author: chrislit
> Date: 2007-05-03 03:41:07 -0700 (Thu, 03 May 2007)
> New Revision: 2045
>
> Modified:
> trunk/src/modules/swmodule.cpp
> Log:
> DM's RAMDirectory patch for CLucene indexing
>
> Modified: trunk/src/modules/swmodule.cpp
> ===================================================================
> --- trunk/src/modules/swmodule.cpp 2007-05-01 17:35:31 UTC (rev 2044)
> +++ trunk/src/modules/swmodule.cpp 2007-05-03 10:41:07 UTC (rev 2045)
> @@ -515,7 +515,7 @@
> is = new IndexSearcher(ir);
> (*percent)(10, percentUserData);
>
> - standard::StandardAnalyzer analyzer;
> + SimpleAnalyzer analyzer;
> lucene_utf8towcs(wcharBuffer, istr, MAX_CONV_SIZE); //TODO Is istr always utf8?
> q = QueryParser::parse(wcharBuffer, _T("content"), &analyzer);
> (*percent)(20, percentUserData);
> @@ -960,10 +960,12 @@
> setKey(*searchKey);
> }
>
> - IndexWriter *writer = NULL;
> + RAMDirectory *ramDir = NULL;
> + IndexWriter *coreWriter = NULL;
> + IndexWriter *fsWriter = NULL;
> Directory *d = NULL;
>
> - standard::StandardAnalyzer *an = new standard::StandardAnalyzer();
> + SimpleAnalyzer *an = new SimpleAnalyzer();
> SWBuf target = getConfigEntry("AbsoluteDataPath");
> bool includeKeyInSearch =
> getConfig().has("SearchOption", "IncludeKeyInSearch");
> char ch = target.c_str()[strlen(target.c_str())-1];
> @@ -972,19 +974,10 @@
> target.append("lucene");
> FileMgr::createParent(target+"/dummy");
>
> - if (IndexReader::indexExists(target.c_str())) {
> - d = FSDirectory::getDirectory(target.c_str(), false);
> - if (IndexReader::isLocked(d)) {
> - IndexReader::unlock(d);
> - }
> -
> - writer = new IndexWriter( d, an, false);
> - } else {
> - d = FSDirectory::getDirectory(target.c_str(), true);
> - writer = new IndexWriter( d ,an, true);
> - }
> + ramDir = new RAMDirectory();
> + coreWriter = new IndexWriter(ramDir, an, true);
> +
>
> -
>
> char perc = 1;
> VerseKey *vkcheck = 0;
> @@ -1222,7 +1215,7 @@
> if (good) {
> //printf("writing (%s).\n", (const char *)*key);
> //fflush(stdout);
> - writer->addDocument(doc);
> + coreWriter->addDocument(doc);
> }
> delete doc;
>
> @@ -1230,9 +1223,29 @@
> err = Error();
> }
>
> - writer->optimize();
> - writer->close();
> - delete writer;
> + // Optimizing automatically happens with the call to addIndexes
> + //coreWriter->optimize();
> + coreWriter->close();
> +
> + if (IndexReader::indexExists(target.c_str())) {
> + d = FSDirectory::getDirectory(target.c_str(), false);
> + if (IndexReader::isLocked(d)) {
> + IndexReader::unlock(d);
> + }
> +
> + fsWriter = new IndexWriter( d, an, false);
> + } else {
> + d = FSDirectory::getDirectory(target.c_str(), true);
> + fsWriter = new IndexWriter( d ,an, true);
> + }
> +
> + Directory *dirs[] = { ramDir, 0 };
> + fsWriter->addIndexes(dirs);
> + fsWriter->close();
> +
> + delete ramDir;
> + delete coreWriter;
> + delete fsWriter;
> delete an;
>
> // reposition module back to where it was before we were called
>
>