[jsword-devel] regex searching

Trenton D. Adams trent.jsword at trentonadams.ca
Thu Feb 25 11:27:51 MST 2010


Hi DM,

I'll look into it a little later.  I'm switching back to my other project for now.

p.s.
If you have any maven questions, let me know.  And eventually, we can look into setting up with the core maven repo.

Thanks.

----- "DM Smith" <dmsmith at crosswire.org> wrote:

> From: "DM Smith" <dmsmith at crosswire.org>
> To: jsword-devel at crosswire.org
> Sent: Wednesday, February 24, 2010 10:05:25 PM GMT -06:00 US/Canada Central
> Subject: Re: [jsword-devel] regex searching
>
> JSword search indexes support the full search syntax of Lucene. This
> has some support for regular expressions. Specifically, it allows for
> * to mean zero or more characters. I think we've got the Lucene flag
> set to allow prefix wildcards. Note that this is more like what a
> shell uses: it is not a modifier for the previous character.
> 
> It was correctly noted that wash.*word will not search the verse as a
> whole. Lucene search is based on words. The correct pattern in Lucene
> would be wash*word, as '.' does not mean any character but rather it
> means a punctuation mark. So "wash.*word" would be split into "wash
> *word" and would find all verses with the word "wash" and any word
> ending with "word", such as "sword".
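
To make the word-based, shell-style behaviour described above concrete,
here is a minimal sketch in plain Java. It does not use Lucene or JSword;
the class and method names (WordWildcard, toPattern, matchingWords) are
made up for illustration, and the tokenizing rule (split on anything that
is not a letter or digit) is an assumption, not Lucene's actual analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: mimics word-based, shell-style
// wildcard matching. It is not how Lucene or JSword implement it.
final class WordWildcard {

    // Translate a shell-style pattern, where '*' means zero or more
    // characters, into a regex. Every other character is taken literally.
    static Pattern toPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Return the words of the verse that match the pattern as a whole word.
    // Assumption: words are runs of letters and digits.
    static List<String> matchingWords(String glob, String verse) {
        Pattern pattern = toPattern(glob);
        List<String> hits = new ArrayList<>();
        for (String word : verse.split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty() && pattern.matcher(word).matches()) {
                hits.add(word);
            }
        }
        return hits;
    }
}
```

Because each word is tested on its own, a pattern such as "*word" can hit
"sword", but no pattern can span the gap between two separate words of a
verse.
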
> 
> The default connector for JSword is OR. It would be good to add the
> ability for a user to change it to AND. With that, "wash* word" would
> produce the expected results as it would be interpreted as "wash* AND
> word" instead of "wash* OR word". (The choice would be something like
> "Search for ALL words instead of ANY words.") We made OR the default
> because it more closely matched various familiar search engines, such
> as Google, and because it is the default of Lucene.
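
The difference between the OR and AND connectors can be sketched in plain
Java as follows. This is only an illustration of the ANY-words versus
ALL-words idea; it does not call Lucene or JSword, and DefaultOperatorDemo,
termMatches and verseMatches are hypothetical names. I believe Lucene's
QueryParser exposes the real switch through setDefaultOperator, but the
sketch does not depend on that.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration only: shows how the default connector
// changes which verses match a multi-term query.
final class DefaultOperatorDemo {

    enum Operator { OR, AND }

    // True if at least one word of the verse matches the term, where
    // '*' in the term stands for zero or more characters.
    static boolean termMatches(String term, String verse) {
        String regex = Arrays.stream(term.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return Arrays.stream(verse.split("[^\\p{L}\\p{N}]+"))
                .anyMatch(w -> !w.isEmpty() && pattern.matcher(w).matches());
    }

    // OR: any term may match. AND: every term must match.
    static boolean verseMatches(String query, String verse, Operator op) {
        String[] terms = query.trim().split("\\s+");
        return op == Operator.AND
                ? Arrays.stream(terms).allMatch(t -> termMatches(t, verse))
                : Arrays.stream(terms).anyMatch(t -> termMatches(t, verse));
    }
}
```

With the query "wash* word", a verse that contains "washing" but not
"word" matches under OR and is rejected under AND.
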
> 
> Lucene does have a regular expression capability, but it is not part
> of JSword. It would be a good addition. Still, it would be based on
> words and not on the text of a verse.
> 
> Adding the ability to search an arbitrary regular expression would be
> a good addition. I don't think it would be too hard to add it. JSword
> already has the interface for any search implementation. Java's Regex
> is a variant of Perl's but has a bit more power.  There are some
> issues:
> 
> We'll be adding highlighting to Lucene's search. Adding that to Regex
> would be a separate effort. The regex search would be mutually
> exclusive with Lucene search, so that would need to be made obvious.
> (The SWORD Project for Windows has both, and it is a bit confusing
> because the distinction is not clear.)
> 
> And yes, it will need to go through the "plain" text of each verse. It
> will be about as slow as creating an index. Basically, it will need to
> take the raw verse and strip the markup. (This is part of JSword
> already and Lucene indexing uses it.) And perhaps strip out the
> punctuation. Finally, it would do the search.
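
The steps above (take the raw verse, strip the markup, perhaps strip the
punctuation, then search) could be sketched roughly as below. This is a
sketch under stated assumptions, not JSword code: the verses are passed in
as a plain map from a reference string to raw text, the tag-stripping
regex is a crude stand-in for JSword's own markup handling, and the names
VerseRegexSearch, toPlainText and search are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only: a brute-force regex search over the
// plain text of every verse. It does not use the JSword API.
final class VerseRegexSearch {

    // Assumption: markup looks like <...> tags. Real module markup
    // should be stripped with JSword's existing facilities instead.
    private static final Pattern MARKUP = Pattern.compile("<[^>]*>");

    // \p{Punct} covers ASCII punctuation only by default.
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{Punct}+");

    // Strip markup first, then punctuation, then collapse whitespace.
    static String toPlainText(String rawVerse) {
        String noMarkup = MARKUP.matcher(rawVerse).replaceAll(" ");
        String noPunct = PUNCTUATION.matcher(noMarkup).replaceAll(" ");
        return noPunct.trim().replaceAll("\\s+", " ");
    }

    // Return the keys of all verses whose plain text contains a match.
    // Every verse is visited, so this is linear in the size of the book.
    static List<String> search(Map<String, String> rawVerses, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : rawVerses.entrySet()) {
            if (pattern.matcher(toPlainText(entry.getValue())).find()) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

Unlike the word-based index search, a pattern such as wash.*word is
applied here to the verse as a whole, so it can span several words.
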
> 
> If you want, add an issue or two to Jira (www.crosswire.org/bugs)
> under JSword. That way it won't be forgotten.
> 
> In Christ,
>      DM
> 
> 
> On 02/23/2010 08:33 PM, Trenton D. Adams wrote:
> > But then again, I wonder if it's even needed, who knows.
> >
> > ----- "Trenton D. Adams"<trent.jsword at trentonadams.ca>  wrote:
> >
> >    
> >> From: "Trenton D. Adams"<trent.jsword at trentonadams.ca>
> >> To: "J-Sword Developers Mailing"<jsword-devel at crosswire.org>
> >> Sent: Tuesday, February 23, 2010 7:29:08 PM GMT -06:00 US/Canada Central
> >> Subject: [jsword-devel] regex searching
> >>
> >> Hello,
> >>
> >> I'm getting the impression that regular expression searching is
> >> not at all possible without implementing something that loads the
> >> books itself, and goes through each verse of the bible.  Is this
> >> true?
> >>
> >> It seems like the Lucene Index just doesn't support regular
> >> expression searches, eh?  And it also seems like SearcherFactory is
> >> currently not finished being implemented as a SearcherFactory,
> >> correct?  Or, perhaps other methods need to be added, like
> >> createSearcher(Book, Class searcherClass)???  Then you could
> >> provide another searcher type.
> >>
> >> What would be appropriate for this? adding the new
> >> SearcherFactory.createSearcher(), adding another "find" method to
> >> AbstractBook/Book that is something like "find(SearchRequest,
> >> Searcher)"???
> >>
> >> Are the books and everything abstract enough that I can search
> >> them by loading the text?  Or would I have to restrict it to the
> >> book types I know how to parse?
> >>
> >> Anything else?
> >>
> >> Thanks.
> >>
> >> _______________________________________________
> >> jsword-devel mailing list
> >> jsword-devel at crosswire.org
> >> http://www.crosswire.org/mailman/listinfo/jsword-devel


