[sword-devel] demo TEI modules
Chris Little
chrislit at crosswire.org
Wed Sep 19 14:49:51 MST 2007
Troy A. Griffitts wrote:
> We probably need to do a few things here besides toupper (to ensure
> entry matches), as we've learned and done in our search code. We
> probably should at least normalize the utf8. This is not a big hit
> because it is only done on module creation for every key, and then once
> for the input word before the binary search starts.
I wish we could display keys in their original, non-uppercased form. Capitals are so
ugly, especially outside of basic modern western European languages.
> We could change the actual order to use a utf8 strcmp method, but this
> would likely come with a relatively significant performance hit (though
> maybe not-- the binary search algorithm will significantly limit the number
> of actual utf8 strcmp operations we would need to perform). This change
> would require remaking any modules which use multibyte utf8 keys.
Collation is tricky. For one, it is always language-dependent. We have
all the necessary data (at least for modern languages) in ICU, but using
that means requiring ICU, which I'm quite fine with for desktop/server
frontends, but isn't as practical for handhelds.
Independent of basic, language-wide collation standards, some dictionary
editors pick different sort orders. The only way to cater to that is to
store the records in their own module-specific order (e.g. using a
GenBook-based system for the whole thing or somehow throwing away the
binary search system). Given that most front-ends list the complete
contents of LD (lexicon/dictionary) modules anyway, which negates the
utility of the binary searches, it might not be a bad idea to scrap the current system
and make key-entry operate as a pattern-matching search (maybe regex?).
--Chris