[sword-devel] demo TEI modules

Chris Little chrislit at crosswire.org
Wed Sep 19 14:49:51 MST 2007



Troy A. Griffitts wrote:
> We probably need to do a few things here besides toupper (to assure 
> entry matches), as we've learned and done in our search code.  We 
> probably should at least normalize the utf8.  This is not a big hit 
> because it is only done on module creation for every key, and then once 
> for the input word before the binary search starts.

I wish we could display keys in non-touppered form. Capitals are so 
ugly, especially outside of basic modern western European languages.
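
One way to get both, sketched here under assumptions rather than taken from the SWORD code: keep the original key for display and store a separate normalized, uppercased key that is used only for sorting and matching. The function names (`lookup_key`, `build_index`, `find`) and the sample entries are hypothetical, and NFC plus `str.upper()` stands in for whatever normalization the engine would actually apply.

```python
import bisect
import unicodedata


def lookup_key(key: str) -> str:
    """Hypothetical matching key: NFC-normalize, then uppercase."""
    return unicodedata.normalize("NFC", key).upper()


def build_index(entries):
    """Return (lookup_key, display_key) pairs sorted by lookup key.

    Done once, at module creation time.
    """
    return sorted((lookup_key(e), e) for e in entries)


def find(index, word):
    """Binary-search the index; return the stored display form or None."""
    target = lookup_key(word)  # normalized once per input word
    # A 1-tuple sorts before any 2-tuple sharing the same first element,
    # so this lands on the first pair whose lookup key is >= target.
    pos = bisect.bisect_left(index, (target,))
    if pos < len(index) and index[pos][0] == target:
        return index[pos][1]
    return None


# "cafe\u0301" is "cafe" followed by a combining acute accent.
index = build_index(["cafe\u0301", "apple", "Zebra"])
```

Here `find(index, "caf\u00e9")` matches the decomposed entry even though the input is precomposed, and the value returned is the key as the module author wrote it, not the uppercased form.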

> We could change the actual order to use a utf8 strcmp method, but this 
> would likely come with a relatively significant performance hit (though 
> maybe not-- the binary search algorithm will significantly limit the number 
> of actual utf8 strcmp operations we would need to perform).  This change 
> would require remaking any modules which use multibyte utf8 keys.

Collation is tricky. For one, it is always language-dependent. We have 
all the necessary data (at least for modern languages) in ICU, but using 
that means requiring ICU. I'm quite fine with that for desktop/server 
frontends, but it isn't as practical for handhelds.
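
A small illustration of why plain code-point ordering is not enough. This is only a sketch using Python's standard library, not ICU: `crude_key` is a hypothetical, language-blind approximation that strips combining marks, and it is no substitute for real per-language collation rules.

```python
import unicodedata

words = ["zebra", "\u00e9clair", "apple"]  # "\u00e9" is e with acute accent

# Plain code-point ordering: U+00E9 sorts after "z" (U+007A).
by_code_point = sorted(words)


def crude_key(word):
    """Hypothetical, language-blind key: drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Accent-insensitive ordering puts the accented word between the other two.
by_crude_key = sorted(words, key=crude_key)
```

With code-point order the accented word lands after "zebra"; with the crude key it sorts between "apple" and "zebra". Which of these (or some third order) is correct depends on the language, which is exactly the data ICU carries.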

Independent of basic, language-wide collation standards, some dictionary 
editors pick different sort orders. The only way to cater to that is to 
store the records in their own module-specific order (e.g. using a 
GenBook-based system for the whole thing or somehow throwing away the 
binary search system). Given that most front-ends are listing the 
complete contents of the LD modules, which negates the utility of the 
binary searches, it might not be a bad idea to scrap the current system 
and make key-entry operate as a pattern-matching search (maybe regex?).
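
A rough sketch of what that could look like, assuming the keys are simply kept in a list in the editor's chosen order. The key list and the `match_keys` name are made up for illustration; this is a linear scan with Python's `re` module, not anything in the current engine.

```python
import re

# Hypothetical keys, stored in the module's own order rather than byte order.
module_keys = ["ab", "abba", "abaddon", "baal"]


def match_keys(pattern, keys):
    """Return the keys matching the regex, preserving the module's order."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [k for k in keys if rx.search(k)]


prefix_hits = match_keys("^ab", module_keys)
infix_hits = match_keys("A{2}", module_keys)
```

The cost is one pass over all keys per query instead of a binary search, but the results come back in whatever order the module stores them, so no collation is needed at lookup time.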

--Chris


