[sword-devel] Soft hyphens - a question about mkfastmod and Lucene search

David Haslam dfhdfh at protonmail.com
Thu Jun 11 07:01:06 EDT 2020


If the text of a SWORD module has words that contain a soft hyphen (U+00AD), what happens to them when the Lucene search index is created?

Are such soft hyphens stripped by mkfastmod?

My understanding is that words containing an ordinary hyphen (U+2010) or a hyphen-minus (U+002D) are treated as multiple words, i.e. as if the hyphen were a space.

IMHO, the same procedure should not apply to soft hyphens, since a soft hyphen only marks an optional line-break point inside a single word. At this stage, though, I'm mainly interested in learning what currently happens.
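
To make the distinction concrete, here is a minimal Python sketch of the behaviour I have in mind. It is only an illustration and does not claim to show what mkfastmod or Lucene actually does; the function name tokenize and the strip_soft_hyphens flag are hypothetical, made up for this example.

```python
SOFT_HYPHEN = "\u00ad"

# Assumed rule (from my understanding above, not verified against Lucene):
# hyphen (U+2010) and hyphen-minus (U+002D) behave like a space.
WORD_BREAKING_HYPHENS = {ord("\u2010"): " ", ord("\u002d"): " "}


def tokenize(text, strip_soft_hyphens=True):
    """Split text into index tokens (hypothetical helper, for illustration only)."""
    if strip_soft_hyphens:
        # Drop soft hyphens so the word is indexed as one unbroken token.
        text = text.replace(SOFT_HYPHEN, "")
    # Treat word-breaking hyphens as if they were spaces.
    text = text.translate(WORD_BREAKING_HYPHENS)
    # Split on whitespace; U+00AD is not whitespace, so it never splits a word.
    return text.split()
```

With strip_soft_hyphens=False the soft hyphen stays inside the token, so a search for the plain spelling of the word would not match that token exactly. That is the outcome I would like to rule out.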

Best regards,

David




