[sword-devel] search failing in Hebrew modules
Chris Little
chrislit at crosswire.org
Thu Jul 30 23:45:35 MST 2009
DM Smith wrote:
> Unicode can have multiple possible representations (byte sequences) for
> a single decorated character. Search will work only if the request and
> index match.
Something to bear in mind here is that, while we've agreed to
standardize on NFC normalization of Unicode, WLC is not normalized. This
is because NFC reorders the combining marks on Hebrew letters decorated
with vowels, dagesh, and cantillation, which can result in incorrect
rendering. So wherever our encoding differs from NFC (and I don't know
how rare such cases are), the index and the search key could mismatch.
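As an illustration, here is a minimal Python sketch using the standard-library `unicodedata` module (not SWORD code). A bet with dagesh and qamats can be stored with either mark first. The two strings render alike but compare unequal, and NFC's canonical reordering (qamats has combining class 18, dagesh 21) maps both to a single sequence.

```python
import unicodedata

# Bet + dagesh (combining class 21) + qamats (combining class 18)
typed = "\u05D1\u05BC\u05B8"
# Bet + qamats + dagesh: the same marks in canonical (ascending class) order
canonical = "\u05D1\u05B8\u05BC"

print(typed == canonical)  # False: different code point sequences
print(unicodedata.normalize("NFC", typed) == canonical)  # True
print(unicodedata.combining("\u05BC"), unicodedata.combining("\u05B8"))  # 21 18
```

A search that compares raw code points would therefore miss a match whenever the module text and the user's key use different mark orders.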
Thus, for WLC, it would be wise to include UTF8NFC() in the set of
stripFilters, in addition to NFC-normalizing the search key provided by
the user, so that both the index and the key end up in the same form.
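Here is a toy Python sketch of that approach. It assumes a hypothetical in-memory index: `build_index`, `search`, and `nfc` are illustrative names, not SWORD's API, and `unicodedata.normalize` stands in for what the UTF8NFC filter does. Only the searchable copy of each verse and the user's key are NFC-normalized. The stored text keeps its original mark order for display. The Hebrew strings below are one possible stored ordering, chosen for illustration, not a quotation of WLC's actual encoding.

```python
from __future__ import annotations

import unicodedata


def nfc(s: str) -> str:
    # Stand-in for a UTF8NFC-style filter: return the NFC form of s.
    return unicodedata.normalize("NFC", s)


def build_index(verses: dict[str, str]) -> dict[str, str]:
    # Hypothetical index: verse key -> NFC-normalized copy of the text.
    # The original (unnormalized) text in `verses` is left untouched for display.
    return {key: nfc(text) for key, text in verses.items()}


def search(index: dict[str, str], query: str) -> list[str]:
    # Normalize the user's key the same way before matching.
    q = nfc(query)
    return [key for key, text in index.items() if q in text]


# "bereshit" with one possible non-NFC ordering: dagesh before sheva,
# shin dot before hiriq.
verses = {"Gen.1.1": "\u05D1\u05BC\u05B0\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u05D9\u05EA"}
# The same word typed with the marks in canonical order.
query = "\u05D1\u05B0\u05BC\u05E8\u05B5\u05D0\u05E9\u05B4\u05C1\u05D9\u05EA"

index = build_index(verses)
print(query in verses["Gen.1.1"])  # False: raw text and key differ
print(search(index, query))        # ['Gen.1.1']
```

Normalizing only one side would not be enough. The raw text and the key differ code point by code point even though they represent the same word, so the match succeeds only when both pass through NFC.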
--Chris