[sword-devel] Chinese PinYin, OSIS, SWORD and front-ends
Chris Little
chrislit at crosswire.org
Tue Oct 19 14:20:22 MST 2010
On 10/19/2010 1:54 PM, Matthew Talbert wrote:
> On Tue, Oct 19, 2010 at 4:19 AM, David Haslam<d.haslam at ukonline.co.uk> wrote:
>>
>> Something to ponder for the future then, maybe?
>>
>> See http://crosswire.org/wiki/Talk:Transliteration
>>
>> Thanks, Chris, for useful comments there.
>
> As Chris says there, it would require indexing both versions of the
> module, something I don't believe is currently possible. What would be
> cool (imo) is to have the transliterated text available in a different
> field, much as lemma is done now. Then a search for trans:something
> would access the transliterated data. Of course, it would be nice to
> provide this transparently to the end user.
I'm really about as ignorant of (C)Lucene as a person can be, so someone
please correct me if I'm wrong. I believe our indexing just indexes at
the record level (verses or dictionary entries). So, upon creation of
the index, you could just concatenate the text and the transliterated
text and do indexing for that. Unless you need to support exact string
matches across record boundaries, the concatenation shouldn't affect
results.
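Here is a rough sketch of the concatenation idea in Python. It uses a toy inverted index in place of CLucene and a hypothetical four-entry pinyin table; a real module would need a full dictionary and proper tone handling. The sample record strings are also illustrative. The indexed text for each record is its original text plus its transliteration, so a search for either form returns the same record key:

```python
from collections import defaultdict

# Hypothetical toy pinyin table (tones dropped); not real module data.
PINYIN = {"神": "shen", "爱": "ai", "世": "shi", "人": "ren"}

# Illustrative records keyed like verses; values are sample strings.
RECORDS = {
    "John 3:16": "神爱世人",
    "Gen 1:1": "起初神创造天地",
}

def transliterate(text):
    # One syllable per character; unknown characters pass through unchanged.
    return " ".join(PINYIN.get(ch, ch) for ch in text)

def tokenize(text):
    # ASCII words become lowercase tokens; non-ASCII (CJK) runs are split
    # into single characters, roughly like a CJK-aware analyzer.
    tokens = []
    for chunk in text.split():
        if chunk.isascii():
            tokens.append(chunk.lower())
        else:
            tokens.extend(chunk)
    return tokens

def build_index(records):
    # Record-level indexing: original text and transliteration are
    # concatenated and indexed under the same record key.
    index = defaultdict(set)
    for key, text in records.items():
        combined = text + " " + transliterate(text)
        for tok in tokenize(combined):
            index[tok].add(key)
    return index

def search(index, term):
    return sorted(index.get(term.lower(), set()))

idx = build_index(RECORDS)
print(search(idx, "shen"))
print(search(idx, "神"))
```

Both searches return the same two keys, since "神" and "shen" were indexed together. One caveat: a phrase query could match across the seam between the original and the transliterated text, which is a boundary issue of the same sort mentioned above.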
Something I mention on the wiki, that I think you're also advocating, is
doing transliteration of the text on a word-by-word basis and placing
the result in the <w xlit="..."> attribute (all via a filter). That
partly depends on the sourcetype being OSIS (though we could do it to
plaintext too, and change its sourcetype at runtime). We could certainly
run such a filter process prior to indexing, which would mean that the
transliterated text could be searched, even if transliteration is turned
off in the current view.
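A similarly rough sketch of the word-by-word filter follows. It uses Python's xml.etree.ElementTree rather than SWORD's own filter classes. The pinyin table, the function names, and the format of the xlit value (syllables run together, with no tones or scheme prefix) are all assumptions for illustration. `indexable_text` shows the filter running before indexing, regardless of whether transliteration is switched on in the view:

```python
import xml.etree.ElementTree as ET

# Hypothetical toy pinyin table (tones dropped); not real module data.
PINYIN = {"神": "shen", "爱": "ai", "世": "shi", "人": "ren"}

def is_w(el):
    # Match <w> with or without the OSIS namespace ("{...}w").
    return el.tag == "w" or el.tag.endswith("}w")

def transliterate_word(word):
    return "".join(PINYIN.get(ch, ch) for ch in word)

def add_xlit(osis_fragment):
    # Hypothetical filter: fill in xlit on each <w> that lacks one.
    root = ET.fromstring(osis_fragment)
    for el in root.iter():
        if is_w(el) and el.text and "xlit" not in el.attrib:
            el.set("xlit", transliterate_word(el.text))
    return ET.tostring(root, encoding="unicode")

def indexable_text(osis_fragment):
    # Run the filter first, then index the plain text plus all xlit values.
    root = ET.fromstring(add_xlit(osis_fragment))
    text = "".join(root.itertext())
    xlits = [el.get("xlit") for el in root.iter() if is_w(el) and el.get("xlit")]
    return text + " " + " ".join(xlits)

verse = '<verse osisID="John.3.16"><w>神</w><w>爱</w><w>世人</w></verse>'
print(add_xlit(verse))
print(indexable_text(verse))
```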
--Chris