[sword-devel] French ligatures in Louis SÉGOND’s text

Chris Little chrislit at crosswire.org
Mon Jul 16 10:52:40 MST 2007



DM Smith wrote:
> Doesn't ICU have locale sensitive decomposition (or transliteration)?  
> If it does then why can't we use the language of the module to set  
> the locale then decompose. This is what we are planning to do for  
> JSword (it has been on the todo list for years).

I don't see anything like this in ICU. I couldn't find anything in the 
API docs and there's nothing in the locale files themselves.

I think our best option may be to tag words on a per-module basis with 
alternative forms and then index the forms as alternates with Lucene, as 
your last post suggested. For non-Lucene searches we can normalize the 
text & search strings via the strip filters, as Troy suggests.

Someone else would have to provide the code side of things, but in terms 
of markup, I think we just want to do something along the lines of:

<w xlit="basic:coeur">cœur</w>

And the strip filter (for non-Lucene searches) will just replace that 
with "coeur".
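
To make the idea concrete, here is a rough sketch in Python (not SWORD 
code) of what such a strip step might do. The function names, the regular 
expression, and the sample sentence are all made up for illustration; the 
only thing taken from the proposal is the xlit="basic:..." attribute on <w>.

```python
import re

# Hypothetical pattern for the proposed markup: <w xlit="basic:FORM">SURFACE</w>.
# The xlit attribute and its "basic:" prefix are only a proposal, not an
# existing SWORD or OSIS convention.
W_XLIT = re.compile(r'<w\s+xlit="basic:([^"]*)"\s*>(.*?)</w>', re.DOTALL)


def strip_xlit(text):
    """Replace each tagged word with its basic form (for non-Lucene searches)."""
    return W_XLIT.sub(lambda m: m.group(1), text)


def alternate_forms(text):
    """Return (surface, basic) pairs that an indexer could store as alternates."""
    return [(m.group(2), m.group(1)) for m in W_XLIT.finditer(text)]


# Made-up sample sentence, not a quotation from the module.
sample = 'Mon <w xlit="basic:coeur">cœur</w> est ferme'
print(strip_xlit(sample))
print(alternate_forms(sample))
```

The same (surface, basic) pairs could feed the Lucene side, so that both 
"cœur" and "coeur" are indexed for the tagged word. Untagged text passes 
through unchanged.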

--Chris




