[jsword-devel] Advanced Search Engine for Jsword
Mullins, Steven
Steven.Mullins at dmme.virginia.gov
Wed Jun 18 07:20:31 MST 2008
I have found a GPL'ed search engine, Emdros, that can do the advanced text searches I need. For example, to find all verses in the Greek NT that contain the sequence "Noun EIS Noun", I use the following query, which takes about 1.5 seconds to run:
SELECT ALL OBJECTS
WHERE
[verse
    [Word psp = N]
    ..<1
    [Word lem_ascii = 'eis']
    ..<1
    [Word psp = N]
]
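To make the semantics of this query concrete, here is a minimal in-memory sketch in Python of what it asks for. It is not how Emdros evaluates MQL. The data model (a verse as a list of hypothetical `(lem_ascii, psp)` pairs), the function name `noun_eis_noun` and the `max_gap` parameter are illustrative assumptions. Reading `..<1` as "fewer than one intervening word", i.e. adjacent words, is likewise an interpretation, not something taken from the Emdros documentation.

```python
from typing import List, Tuple

# Hypothetical toy model: each word is a (lem_ascii, psp) pair.
Word = Tuple[str, str]


def noun_eis_noun(verse: List[Word], max_gap: int = 1) -> bool:
    """Return True if the verse has Noun, 'eis', Noun in that order, with
    fewer than max_gap words between each consecutive pair of matches.

    Assumption: max_gap=1 models '..<1' as 'the words are adjacent'.
    """
    n = len(verse)
    for i in range(n):
        if verse[i][1] != "N":
            continue
        # 'eis' must follow the first noun with fewer than max_gap words between.
        for j in range(i + 1, min(i + 1 + max_gap, n)):
            if verse[j][0] != "eis":
                continue
            # The second noun must follow 'eis' under the same gap limit.
            for k in range(j + 1, min(j + 1 + max_gap, n)):
                if verse[k][1] == "N":
                    return True
    return False
```

For example, the toy verse `[("logos", "N"), ("eis", "P"), ("kosmos", "N")]` matches, while inserting `("kai", "C")` before `eis` makes it fail unless `max_gap` is raised to 2.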
Now, if I want to restrict the two nouns to be the same word (the same lemma), I can use:
SELECT ALL OBJECTS
WHERE
[verse
    [Word AS faith psp = N]
    ..<1
    [Word lem_ascii = 'eis']
    ..<1
    [Word lem_ascii = faith.lem_ascii and psp = N]
]
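The `AS faith` label binds the first word so that a later block can compare against its features. A minimal Python sketch of the same idea over toy data follows; it treats `..<1` as meaning the three words are adjacent (an interpretation), and the `(lem_ascii, psp)` pairs, their tag values and the name `same_noun_eis_noun` are hypothetical. The sample is modelled on Rom 1:17, "ἐκ πίστεως εἰς πίστιν".

```python
from typing import List, Tuple

# Hypothetical toy model: each word is a (lem_ascii, psp) pair.
Word = Tuple[str, str]


def same_noun_eis_noun(verse: List[Word]) -> List[str]:
    """Return the lemma of every adjacent Noun 'eis' Noun triple whose two
    nouns share a lemma (the job 'AS faith' / 'faith.lem_ascii' does)."""
    hits = []
    # Slide a window of three consecutive words over the verse.
    for first, middle, last in zip(verse, verse[1:], verse[2:]):
        bound_lemma = first[0]  # plays the role of the 'faith' binding
        if (first[1] == "N" and middle[0] == "eis"
                and last[1] == "N" and last[0] == bound_lemma):
            hits.append(bound_lemma)
    return hits
```

With `[("ek", "P"), ("pistis", "N"), ("eis", "P"), ("pistis", "N")]` the function returns `["pistis"]`; change the last lemma and the result is empty.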
This is a very, very powerful linguistic search tool. In fact, it is the same engine that drives the syntax search in Logos Libronix, and the German Bible Society uses it in their Stuttgart Electronic Study Bible. It's the best syntax search engine I know of, free or otherwise, for the Biblical original languages. It is written in C++, but has SWIG wrappers for Python, Ruby, C#, Perl, PHP and, yes, Java. Emdros is available and well documented at www.emdros.org.
Emdros is basically a middleware solution: it sits between the query interface and a database. The structure looks like this:
Query Tool <---> Emdros <---> Database
Emdros currently supports several database back ends. I use SQLite, since the whole database is a single file and SQLite is already integrated into Emdros.
I think this would be a terrific supplement to, or extension of, the Lucene search capabilities in jsword and/or sword. Perhaps a second search option called "emdros search" would be in order: it would open a window where the user can enter a query and save/open queries as text files. Emdros would return the references that match the query, which would then be fed to jsword for display and further study.
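If the Emdros side reports each matching verse as numbers, turning the hits into references jsword can display is mostly bookkeeping. A minimal Python sketch, assuming (hypothetically) that results arrive as `(book, chapter, verse)` tuples with NT books numbered 1 to 27 in canonical order, converts them to OSIS-style references such as `Rom.1.17`, dropping duplicates and keeping canonical order. The names `NT_OSIS_BOOKS`, `to_osis` and `to_passage_string` are illustrative only.

```python
from typing import Iterable, Tuple

# Hypothetical numbering: NT books 1-27 in canonical order -> OSIS book names.
NT_OSIS_BOOKS = [
    "Matt", "Mark", "Luke", "John", "Acts", "Rom", "1Cor", "2Cor", "Gal",
    "Eph", "Phil", "Col", "1Thess", "2Thess", "1Tim", "2Tim", "Titus",
    "Phlm", "Heb", "Jas", "1Pet", "2Pet", "1John", "2John", "3John",
    "Jude", "Rev",
]


def to_osis(book: int, chapter: int, verse: int) -> str:
    """Format one hit as an OSIS-style reference, e.g. (6, 1, 17) -> 'Rom.1.17'."""
    if not 1 <= book <= len(NT_OSIS_BOOKS):
        raise ValueError(f"not an NT book number: {book}")
    return f"{NT_OSIS_BOOKS[book - 1]}.{chapter}.{verse}"


def to_passage_string(hits: Iterable[Tuple[int, int, int]]) -> str:
    """Drop duplicate hits, sort them into canonical order, join with spaces."""
    return " ".join(to_osis(*hit) for hit in sorted(set(hits)))
```

Sorting the numeric tuples before formatting is what keeps the list in canonical order; sorting the formatted strings would put `1Cor` before `Matt`.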
I wrote a python script to create an emdros database from the raw MorphGNT text. You can see the source at http://gntools.svn.sourceforge.net/viewvc/gntools/morphgnt2emdros/morphgnt2emdros.py?revision=1&view=markup.
With the resulting database you can search on lemmas and original words in UTF-8 Greek (with or without accents) or in plain ASCII Greek (B-Greek encoded). You can also search by word type, tense, mood, aspect, number, etc. I also have a Python GUI for running searches and saving queries, but I'd like to see all of this in j/sword.
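Searching "with or without accents" and in B-Greek implies storing extra normalized forms of each word next to the UTF-8 text. A minimal Python sketch of deriving those forms with the standard `unicodedata` module: decompose to NFD, drop combining marks (accents, breathings, iota subscripts), then map letters one by one. The letter table is an assumed version of the B-Greek scheme and should be checked against what morphgnt2emdros.py actually does; in particular, this sketch simply discards breathing marks. `strip_accents` and `to_bgreek` are illustrative names.

```python
import unicodedata

# Assumed B-Greek-style letter mapping (verify against the real encoder).
BGREEK = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "h",
    "θ": "q", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "c",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "u",
    "φ": "f", "χ": "x", "ψ": "y", "ω": "w",
}


def strip_accents(word: str) -> str:
    """Remove accents, breathings and other combining marks from Greek text."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def to_bgreek(word: str) -> str:
    """Strip diacritics, lower-case, then transliterate letter by letter."""
    plain = strip_accents(word).lower()
    # Characters outside the table (punctuation, Latin) pass through unchanged.
    return "".join(BGREEK.get(ch, ch) for ch in plain)
```

For instance, `to_bgreek("εἰς")` gives `"eis"`, the form used in the `lem_ascii = 'eis'` queries.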
Emdros is under the GPL, but it is not Java. That may rule out distributing it with jsword the way Lucene is distributed. However, if a user has Emdros installed, could we add a config option in jsword such as "path to emdros library" and use the SWIG connection? Even simpler, we could have a "path to emdros exe" option and read the query output from a system pipe. The latter is what my Python program does, and I have already written the parsing code and regexes to read the output (in Python).
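The pipe approach can be sketched in a few lines. This is Python rather than Java, but the same shape maps onto Java's `ProcessBuilder`. It assumes (hypothetically) that the configured command reads the MQL query on standard input and that its output, once post-processed, contains one OSIS-style reference per match; real Emdros output would need the parsing code mentioned above, which is not reproduced here. `run_query`, `parse_refs` and `REF_PATTERN` are illustrative names, and no Emdros command-line flags are assumed.

```python
import re
import subprocess
from typing import List, Sequence

# Hypothetical output format: OSIS-style references such as 'Rom.1.17'.
REF_PATTERN = re.compile(r"\b([1-3]?[A-Z][a-z]+)\.(\d+)\.(\d+)\b")


def run_query(command: Sequence[str], query: str, timeout: float = 60.0) -> str:
    """Run the user-configured query program, send the query on stdin and
    return its stdout. 'command' is the configured executable path plus any
    arguments; which arguments Emdros needs is deliberately left open here."""
    result = subprocess.run(
        list(command),
        input=query,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return result.stdout


def parse_refs(output: str) -> List[str]:
    """Pull every OSIS-style reference out of the program's output."""
    return [f"{book}.{chapter}.{verse}"
            for book, chapter, verse in REF_PATTERN.findall(output)]
```

Passing the command as an argument list avoids shell-quoting problems with user-supplied paths, and `check=True` turns a failing query into an exception instead of an empty result list.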
I hope I have whetted some appetites. Any takers?
Steve