[sword-devel] French ligatures in Louis SÉGOND’s text
Chris Little
chrislit at crosswire.org
Mon Jul 16 10:52:40 MST 2007
DM Smith wrote:
> Doesn't ICU have locale sensitive decomposition (or transliteration)?
> If it does then why can't we use the language of the module to set
> the locale then decompose. This is what we are planning to do for
> JSword (it has been on the todo list for years).
I don't see anything like this in ICU. I couldn't find anything in the
API docs and there's nothing in the locale files themselves.
I think our best option may be to tag words on a per module basis with
alternative forms and then index the forms as alternates with Lucene, as
your last post suggested. For non-Lucene searches we can normalize the
text & search strings via the strip filters as Troy suggests.
Someone else would have to provide the code side of things, but in terms
of markup, I think we just want to do something along the lines of:
<w xlit="basic:coeur">cœur</w>
And the strip filter (for non-Lucene searches) will just replace that
with "coeur".
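As a rough illustration only (Python rather than SWORD's C++, and not
actual SWORD filter code), the replacement could look something like the
sketch below. The names W_XLIT, strip_xlit and index_forms are
hypothetical, and it assumes the xlit attribute is the only attribute on
<w> and always takes the form "basic:...":

```python
import re

# Hypothetical pattern for the proposed markup, e.g.
#   <w xlit="basic:coeur">cœur</w>
# Assumes xlit is the only attribute and uses the "basic:" prefix.
W_XLIT = re.compile(r'<w\s+xlit="basic:([^"]*)">(.*?)</w>', re.DOTALL)


def strip_xlit(text):
    """Replace each tagged word with its 'basic' alternate form."""
    return W_XLIT.sub(lambda m: m.group(1), text)


def index_forms(text):
    """Return (surface form, alternate form) pairs, e.g. for indexing
    both spellings as alternates."""
    return [(m.group(2), m.group(1)) for m in W_XLIT.finditer(text)]


if __name__ == "__main__":
    verse = 'Mon <w xlit="basic:coeur">cœur</w> est ferme'
    print(strip_xlit(verse))
    print(index_forms(verse))
```

The same pattern yields both the stripped text for plain searches and
the word pairs that a Lucene indexer could store as alternates.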
--Chris