[sword-devel] Fwd: [sword-svn] r2045 - trunk/src/modules

DM Smith dmsmith555 at yahoo.com
Thu May 3 11:19:58 MST 2007


Martin Gruner wrote:
> Hi Chris,
>
> are you sure you want to move from StandardAnalyzer to SimpleAnalyzer? IIRC 
> searches won't find English stop words like "for", "then", "and"...
>   

The SimpleAnalyzer does no stop word analysis. All it does is 
lowercase everything and find tokens as sequences of letters bounded by 
non-letters.
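
To make that concrete, here is a rough sketch of the same rule in plain C++. It is not the CLucene code; simpleTokens is a made-up name, and it only handles ASCII letters, which is an assumption for brevity since the real analyzer works on wide characters.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Sketch of SimpleAnalyzer-style tokenizing (hypothetical helper, not CLucene):
// a token is a maximal run of letters, stored lowercased.
// Assumption: ASCII input only; anything that is not a letter ends a token.
std::vector<std::string> simpleTokens(const std::string &text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            // A non-letter closes the token collected so far.
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

With this rule, "For then, AND!" comes out as the three tokens for, then, and. Nothing is thrown away, so stop words stay searchable.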

The StandardAnalyzer does a whole boatload of stuff, in addition to what 
SimpleAnalyzer does:
* Splits words at punctuation characters, removing punctuation. However, 
a dot that's not followed by whitespace is considered part of a token 
(and is eliminated later if the token is treated as an acronym).
* Splits words at hyphens, unless there's a number in the token, in 
which case the whole token is interpreted as a product number and is not 
split.
* Recognizes email addresses and internet hostnames as one token.
* Removes a trailing 's or 'S from words.
* Removes . from things it considers acronyms.
* Eliminates the following English stop words:
    a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, 
not, of, on, or, such, that, the, their, then, there, these, they, this, 
to, was, will, with
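
As a rough illustration of what the stop word step means for a search, here is a small sketch in plain C++. It is not the CLucene filter; dropStopWords and kStopWords are made-up names, and the input is assumed to be already tokenized and lowercased.

```cpp
#include <set>
#include <string>
#include <vector>

// The English stop words from the list above.
static const std::set<std::string> kStopWords = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with"};

// Hypothetical helper: keeps only the tokens that are not stop words.
// Assumption: tokens are already lowercased.
std::vector<std::string> dropStopWords(const std::vector<std::string> &tokens) {
    std::vector<std::string> kept;
    for (const std::string &t : tokens) {
        if (kStopWords.count(t) == 0) kept.push_back(t);
    }
    return kept;
}
```

A query made up of "for", "then" and "and" has nothing left after this step, and "in the beginning" is reduced to "beginning".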

The SimpleAnalyzer is the correct choice.

> mg
>
> ----------  Forwarded message  ----------
>
> Subject: [sword-svn] r2045 - trunk/src/modules
> Date: Thursday, 3 May 2007
> From: chrislit at www.crosswire.org
> To: sword-cvs at crosswire.org
>
> Author: chrislit
> Date: 2007-05-03 03:41:07 -0700 (Thu, 03 May 2007)
> New Revision: 2045
>
> Modified:
>    trunk/src/modules/swmodule.cpp
> Log:
> DM's RAMDirectory patch for CLucene indexing
>
> Modified: trunk/src/modules/swmodule.cpp
> ===================================================================
> --- trunk/src/modules/swmodule.cpp	2007-05-01 17:35:31 UTC (rev 2044)
> +++ trunk/src/modules/swmodule.cpp	2007-05-03 10:41:07 UTC (rev 2045)
> @@ -515,7 +515,7 @@
>  			is = new IndexSearcher(ir);
>  			(*percent)(10, percentUserData);
>  
> -			standard::StandardAnalyzer analyzer;
> +			SimpleAnalyzer analyzer;
>  			lucene_utf8towcs(wcharBuffer, istr, MAX_CONV_SIZE); //TODO Is istr always utf8?
>  			q = QueryParser::parse(wcharBuffer, _T("content"), &analyzer);
>  			(*percent)(20, percentUserData);
> @@ -960,10 +960,12 @@
>  		setKey(*searchKey);
>  	}
>  
> -	IndexWriter *writer = NULL;
> +	RAMDirectory *ramDir = NULL;
> +	IndexWriter *coreWriter = NULL;
> +	IndexWriter *fsWriter = NULL;
>  	Directory *d = NULL;
>   
> -	standard::StandardAnalyzer *an = new standard::StandardAnalyzer();
> +	SimpleAnalyzer *an = new SimpleAnalyzer();
>  	SWBuf target = getConfigEntry("AbsoluteDataPath");
>  	bool includeKeyInSearch = getConfig().has("SearchOption", "IncludeKeyInSearch");
>  	char ch = target.c_str()[strlen(target.c_str())-1];
> @@ -972,19 +974,10 @@
>  	target.append("lucene");
>  	FileMgr::createParent(target+"/dummy");
>  
> -	if (IndexReader::indexExists(target.c_str())) {
> -		d = FSDirectory::getDirectory(target.c_str(), false);
> -		if (IndexReader::isLocked(d)) {
> -			IndexReader::unlock(d);
> -		}
> -																		   
> -		writer = new IndexWriter( d, an, false);
> -	} else {
> -		d = FSDirectory::getDirectory(target.c_str(), true);
> -		writer = new IndexWriter( d ,an, true);
> -	}
> +	ramDir = new RAMDirectory();
> +	coreWriter = new IndexWriter(ramDir, an, true);
> +	
>  
> -
>   
>  	char perc = 1;
>  	VerseKey *vkcheck = 0;
> @@ -1222,7 +1215,7 @@
>  		if (good) {
>  //printf("writing (%s).\n", (const char *)*key);
>  //fflush(stdout);
> -			writer->addDocument(doc);
> +			coreWriter->addDocument(doc);
>  		}
>  		delete doc;
>  
> @@ -1230,9 +1223,29 @@
>  		err = Error();
>  	}
>  
> -	writer->optimize();
> -	writer->close();
> -	delete writer;
> +	// Optimizing automatically happens with the call to addIndexes
> +	//coreWriter->optimize();
> +	coreWriter->close();
> +
> +	if (IndexReader::indexExists(target.c_str())) {
> +		d = FSDirectory::getDirectory(target.c_str(), false);
> +		if (IndexReader::isLocked(d)) {
> +			IndexReader::unlock(d);
> +		}
> + 
> +		fsWriter = new IndexWriter( d, an, false);
> +	} else {
> +		d = FSDirectory::getDirectory(target.c_str(), true);
> +		fsWriter = new IndexWriter( d ,an, true);
> +	}
> +
> +	Directory *dirs[] = { ramDir, 0 };
> +	fsWriter->addIndexes(dirs);
> +	fsWriter->close();
> +
> +	delete ramDir;
> +	delete coreWriter;
> +	delete fsWriter;
>  	delete an;
>  
>  	// reposition module back to where it was before we were called
>
>
