[bt-devel] Re: BibleTime
Martin Gruner
mg.pub at gmx.net
Sat Dec 17 00:54:27 MST 2005
Troy,
thanks for reminding us. We want to try our own implementation.
mg
On Friday, 16 December 2005 20:08, Troy A. Griffitts wrote:
> Hey guys,
> Just a quick note. Are you all aware that SWORD already exposes clucene
> searching in its API? We have an interface to query whether indexes have
> been created, and to ask for them to be created (reporting status) if
> they have not been.
>
> Also, it is my impression that clucene does not yet work correctly with
> wide characters (wchar_t also has different sizes on different platforms,
> as noted below, and does not conform to any standard).
>
> Hope this adds a little,
> -Troy.
>
> Martin Gruner wrote:
> > Dear Lee,
> >
> > I'm more than excited! My comments below...
> >
> >>I have been meaning to send regular updates, but I keep thinking, "No
> >>I'll do this one more thing then send an update." Right now, in my
> >>local BibleTime tree, I have an index-based search going! Currently, it
> >>simply uses the existing search dialog, but I ignore some of the fields.
> >>The results show up in the normal results tab of the search dialog.
> >>
> >>Here's what I've done. I implemented the search as another function in
> >>CSwordModuleInfo. Where the search dialog normally called search(), I
> >>call searchIndexed() which is my new function. The results are returned
> >>to m_searchResults as normal.
> >>
> >>I fought the Unicode issue again. The search string came from Qt in
> >>UCS2. CLucene uses TCHAR, which is wchar_t if built for Unicode and
> >>just char if built for ANSI. To make matters worse, wchar_t is 2 bytes
> >>on Windows and 4 bytes on Linux. Fortunately, I found some conversion
> >>utilities in CLucene that allowed me to convert from UTF-8 to wide-char
> >>strings. So I use QString to convert to UTF-8, then those utils to
> >>convert to CLucene's wchar types. Then I search my index and convert the
> >>results from wchar types back to UTF-8 to stuff into SWKey results. *Phew*
> >>
> >>:)
> >>:
> > :-p
> >
> > I suggest that we _demand_ that users install clucene built for Unicode.
> > Isn't wchar_t UCS2? Perhaps we could speed up index creation if we have a
> > direct conversion routine, instead of UCS2 - UTF8 - WCHAR_T (UCS2)? I'm
> > no expert here. We could add that later, also.
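
A note on the conversion question: as Lee points out above, wchar_t is 2 bytes on Windows and 4 bytes on Linux, so a wide string holds UTF-16 code units on one platform and whole code points on the other. A straight UCS2 copy would therefore only be valid for the 2-byte case. The sketch below is a standalone illustration of the UTF-8 to wide-string step that handles both widths. It is not CLucene's own conversion utility; the names utf8ToWide and checkUtf8ToWide are made up for this example, and it deliberately skips stricter checks such as overlong encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of CLucene, SWORD or Qt).
// Decodes UTF-8 into code points and appends them as wchar_t units:
// UTF-16 with surrogate pairs when wchar_t is 2 bytes, UTF-32 otherwise.
std::wstring utf8ToWide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 2-byte wchar_t: split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}

// Small self-check: ASCII passes through, "ü" (bytes C3 BC) becomes U+00FC.
void checkUtf8ToWide() {
    assert(utf8ToWide("abc") == L"abc");
    assert(utf8ToWide("\xC3\xBC") == std::wstring(1, static_cast<wchar_t>(0xFC)));
}
```

Going through code points rather than copying 16-bit units is what keeps the same routine usable on both platforms; whether it is faster than the UTF-8 detour described above would have to be measured.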
> >
> >>I am currently working on limiting the results to the search scope
> >>specified in the search dialog. I came up with a list of questions I
> >>wanted to ask to go further. I was going to send them tonight actually,
> >>but since you pinged me, here they are :)
> >>
> >>1. Search syntax. As you know CLucene has a rich search syntax. Do we
> >>want to expose that syntax directly (i.e. the user types their query in
> >>the syntax supported by CLucene) or do we want to break out the syntax
> >>into user interface elements (e.g. the AND/OR/ANY buttons, etc.)?
> >>
> >>2. Do we want index-based searching to be "the search method" or do we
> >>want it to be an option along with the search that's there now?
> >
> > It will be the standard and the only method. =) And IMO we should
> > directly expose the search syntax and offer some nice help for users to
> > learn it. This means that we can remove many buttons/boxes in the search
> > dialog. That will be easier for us and more flexible for the users.
> >
> >>3. Index-building. When do we want to build the index? It almost makes
> >>sense to build the index when the user adds a module. However, this is
> >>a potentially long operation. We could kick off a thread to do it and
> >>keep the UI free for other purposes. Also, we could do like most search
> >>engines and force the user to build the index the first time they search.
> >
> > The last is what I'd suggest. You could make a little blocking pop-up
> > window that just says "(Re)building index for module XY, this may take
> > a while" and has a progress bar. No user interaction needed.
> > Another question: Will we be able to access the index directly, e.g.
> > getting a list of all words starting or ending with XY? I have plans for
> > an "instant concordance" function later which would operate on the index.
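
On the question of listing all words that start with XY: if the index's terms are available in sorted order, a prefix lookup is a range scan that starts at the first term not less than the prefix and stops at the first term that no longer begins with it. The sketch below uses a std::set as a stand-in for the index's term dictionary. It does not use CLucene's API, and the names termsWithPrefix and checkTermsWithPrefix are made up for this example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns every term that begins with `prefix`.
// `terms` stands in for a sorted term dictionary; it is not a CLucene type.
std::vector<std::string> termsWithPrefix(const std::set<std::string>& terms,
                                         const std::string& prefix) {
    std::vector<std::string> hits;
    // lower_bound jumps to the first term >= prefix; all matches follow it
    // contiguously because the set is sorted.
    for (std::set<std::string>::const_iterator it = terms.lower_bound(prefix);
         it != terms.end() && it->compare(0, prefix.size(), prefix) == 0;
         ++it) {
        hits.push_back(*it);
    }
    return hits;
}

// Small self-check with a toy vocabulary.
void checkTermsWithPrefix() {
    std::set<std::string> terms;
    terms.insert("grace");
    terms.insert("great");
    terms.insert("law");
    terms.insert("love");
    assert(termsWithPrefix(terms, "gr").size() == 2);
    assert(termsWithPrefix(terms, "x").empty());
}
```

Words ending with XY could be handled the same way over a second sorted set that stores each term reversed, at the cost of keeping that extra set.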
> >
> >>4. Index-location. Where do we store the index? Do we currently have a
> >>.bibletime or something to store such things? (I might be able to answer
> >>this myself, I haven't looked for it yet.)
> >
> > You can use:
> > QString dir( KGlobal::dirs()->saveLocation("data", "bibletime/indices/") );
> >
> > On my system, this will return ~/.kde/share/apps/bibletime/indices/,
> > which would be a nice location. ~/.kde/share/apps/bibletime/cache/ is
> > where we currently store the lexicon entry cache files (very simple
> > logic). Indexes also need to be rebuilt should the version of an
> > installed module OR the way we create indexes change. So I guess our
> > module version number and the "index layout" version number need to be
> > stored somewhere. Whenever the index layout changes, we increase the
> > index layout version number, and all indices will be rebuilt for the
> > users.
> > We perhaps also need a button to "Delete all index and cache files", in
> > case a user has disk space problems.
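
The rebuild rule described above (rebuild when the installed module's version or the index layout version changes) can be sketched as a comparison against a small record stored next to each index. Everything named here is hypothetical: IndexStamp, kIndexLayoutVersion and indexNeedsRebuild are illustration names, and how the record is stored on disk is left open.

```cpp
#include <cassert>
#include <string>

// Hypothetical constant: increased whenever the way indexes are built changes.
const int kIndexLayoutVersion = 1;

// Hypothetical record written alongside an index when it is created.
struct IndexStamp {
    std::string moduleVersion;  // version of the module the index was built from
    int layoutVersion;          // index layout version in effect at build time
};

// True if the stored index no longer matches the installed module
// or was built with a different index layout.
bool indexNeedsRebuild(const IndexStamp& stored,
                       const std::string& installedModuleVersion) {
    return stored.moduleVersion != installedModuleVersion
        || stored.layoutVersion != kIndexLayoutVersion;
}

// Small self-check covering the three cases.
void checkIndexNeedsRebuild() {
    IndexStamp stamp;
    stamp.moduleVersion = "1.2";
    stamp.layoutVersion = kIndexLayoutVersion;
    assert(!indexNeedsRebuild(stamp, "1.2"));  // nothing changed
    assert(indexNeedsRebuild(stamp, "1.3"));   // module was updated
    stamp.layoutVersion = kIndexLayoutVersion + 1;
    assert(indexNeedsRebuild(stamp, "1.2"));   // layout differs
}
```

Comparing for inequality rather than "older than" also triggers a rebuild after a module downgrade, which seems the safer default.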
> >
> >>Also, what about Bibles?
> >>Their indices are not going to change. Should we distribute index files
> >>with the modules? The user wouldn't have to build at all!
> >
> > This is not possible, because Crosswire distributes the module files, and
> > we'll likely use a different index format than other Sword frontends. So
> > I guess we'll have to take care of it.
> >
> > How long does it take? How big do they get?
> >
> >>5. Analyzers. It seems that there are many different Analyzers that can
> >>be used to build an index. (Some that differentiate between lower and
> >>uppercase, some that take into account grammar rules for certain
> >>languages, etc.) Do we want this flexibility extended to the user? Or
> >>do we just use the simple analyzer which simply breaks up words?
> >
> > I don't know, have to read more. Perhaps we should start with the simple
> > one?
> >
> >>6. Exceptions. When building my search in, CLucene code complained that
> >>C++ exceptions were turned off and CLucene requires them on. Was there
> >>a reason for them being turned off?
> >
> > Joachim, can you say something about it?
> >
> >>I think that's it for the moment... I'll try to send status updates
> >>more often :)
> >>
> >>
> >>[...]
> >
> > Lee, I just tagged cvs with rel-1-5-3 to reflect the status of the 1.5.3
> > release which just came out. Feel free to start working in cvs HEAD.
> > Should we need to make more bugfix releases in the meantime, we can
> > create a branch and work there. Once this works well and is documented,
> > we can release 1.6.
> >
> >
> > So much for now.
> >
> > mg
> >
> >>Thanks,
> >>
> >>Lee C.
> >>
> >>Martin Gruner wrote:
> >>>Hey Lee,
> >>>
> >>>just wanted to ask how you are and about your progress with BibleTime
> >>>coding / investigation. Here's a nice clucene-based project that I just
> >>>found: http://kioclucene.objectis.net/ (just as a demonstration).
> >>>
> >>>I hope you, Anna and your dear wife are doing well,
> >>>
> >>>Martin
> >
> > _______________________________________________
> > bt-devel mailing list
> > bt-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/bt-devel