[sword-devel] Devanagari text displays different in SWORD than in the source IMP file

Tim Chase tchase at maf.org
Tue Sep 8 10:31:57 MST 2009


Hi All

Another related issue with regard to ZWJ and ZWNJ is how the front-end
applications handle these characters when searching for words.  Why would it
make a difference?  Some words in Devanagari script can be spelled with or
without the ZWJ character and still be valid.  I did a small test to see how
SPW and BPB handle the searching.  For BPB it made no difference whether the
ZWJ was included or not: in both cases I got 33 hits for the word in my test
module for MARK in Nepali.  For SPW I got zero hits without the ZWJ and 9
hits when I included the ZWJ, which is how the word was keyed in the source.
BPB brings up hits that are not exact matches but very close, while SPW will
only bring up hits with exact matches that include the ZWJ.
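
To illustrate the difference, here is a minimal Python sketch of a search
that ignores ZWJ and ZWNJ by stripping them from both the query and the text
before comparing. This is only an assumption about one way a front end could
do it, not how BPB or SPW actually implement searching; the function names
and the sample strings are made up for the example.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def strip_joiners(text):
    """Remove ZWJ and ZWNJ so that spelling variants compare equal."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


def count_hits(query, verses, ignore_joiners=True):
    """Count the verses that contain the query as a substring.

    With ignore_joiners=True, ZWJ/ZWNJ are stripped from both sides first.
    With ignore_joiners=False, only exact code point matches count.
    """
    if ignore_joiners:
        query = strip_joiners(query)
        verses = [strip_joiners(v) for v in verses]
    return sum(1 for v in verses if query in v)


# Hypothetical sample data: the same conjunct keyed with and without ZWJ.
with_zwj = "\u0915\u094d\u200d\u0937"   # KA + VIRAMA + ZWJ + SSA
without_zwj = "\u0915\u094d\u0937"      # KA + VIRAMA + SSA
verses = [with_zwj + " one", without_zwj + " two", "three"]

print(count_hits(without_zwj, verses, ignore_joiners=False))  # 1
print(count_hits(without_zwj, verses, ignore_joiners=True))   # 2
```

Stripping the joiners only at search time leaves the module text itself
untouched, so the rendering of the displayed verse is not affected.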

I found an interesting web page that shows how various search engines handle
the ZWJ character.  Google finds these words with or without the ZWJ.
I thought it might add something more to the discussion.

http://gii2.nagaokaut.ac.jp/gii/blog/lopdiary.php/lopdiary.php?itemid=507

BPB - BP Bible
SPW - SWORD Project for Windows

Tim


-----Original Message-----
From: Chris Little [mailto:chrislit at crosswire.org] 
Sent: Tuesday, September 08, 2009 2:05 PM
To: SWORD Developers' Collaboration Forum
Subject: Re: [sword-devel] Devanagari text displays different in SWORD than
in the source IMP file

Before anyone starts making authoritative statements about ZWJ or ZWNJ 
in various modules and their reflexes in front ends, I would like to see 
some sort of proof that this is even relevant to the problem.

If ZWNJ is present in the module, it isn't being changed by Sword or by 
BibleCS. If you copy text from BibleCS and paste it into an editor that 
renders things correctly, such as BabelPad or Notepad, you get back the 
correct rendering--so it's not inserting, deleting, or changing codepoints.
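
One way to run that kind of check is to list the code points of the text
copied out of the front end and compare them with the source. The following
is a minimal Python sketch under that assumption; the function names are
made up for the example, and the sample string stands in for text pasted
from the module.

```python
import unicodedata


def describe_codepoints(text):
    """Return one 'U+XXXX NAME' entry per code point in the text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


def same_codepoints(source_text, copied_text):
    """True if both strings hold exactly the same code point sequence."""
    return [ord(c) for c in source_text] == [ord(c) for c in copied_text]


# Hypothetical sample: KA + VIRAMA + ZWNJ, as it might appear in a source file.
source_text = "\u0915\u094d\u200c"
copied_text = "\u0915\u094d\u200c"   # stand-in for text pasted from the front end

for entry in describe_codepoints(copied_text):
    print(entry)
# U+0915 DEVANAGARI LETTER KA
# U+094D DEVANAGARI SIGN VIRAMA
# U+200C ZERO WIDTH NON-JOINER

print(same_codepoints(source_text, copied_text))  # True
```

If the two sequences match, nothing was inserted, deleted, or changed on the
way through, and any difference on screen comes from rendering.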

My own feeling is that the problem lies in the renderer used by various 
front ends. And specific to BibleCS, I suspect we can fix the issue by 
compiling in a more recent version of C++Builder (which I'll try to do, 
when I get a chance, unless Troy beats me to it).

Font choice is important. You have to use a font with the correct font 
tables. (Graphite tables would work, but OpenType tables are entirely 
sufficient for this kind of Indic application.) However, the fonts named 
in the initial post, and my own testing following that report, demonstrate 
that even fonts with good OT tables won't render correctly in BibleCS's 
current renderer.

--Chris


David Haslam wrote:
> Zero Width Joiner and non-Joiner
> 
> We should gather the evidence we have collected about zwj & zwnj (in e.g.
> Devanagari) by adding a new row in the table on this wiki page.
> 
> http://www.crosswire.org/wiki/Choosing_a_SWORD_program#Module_Support
> 
> Ideally, the new row should be below the one for Complex Scripts. This would
> help clarify the situation for everyone.  A single footnote should provide
> the background and give the explanation.
> 
> -- David Haslam
> 
> 
> 

_______________________________________________
sword-devel mailing list: sword-devel at crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page



