[sword-devel] Burmese rendering, was Portable Bible prototype

Chris Little chrislit at crosswire.org
Sun Jun 8 08:08:51 MST 2008


Burmese/Myanmar has been in Unicode since 3.0, but it got a major 
overhaul in version 5.1, which came out just a couple of months ago. (And 
apparently some people still feel the new encoding model has 
shortcomings.)

The Burmese Bible we have most likely uses the Unicode 3.0 model, while 
fonts like Padauk use the Unicode 5.1 model. We might get a lot closer 
to the correct rendering by simply updating the encoding model of the 
text. Performing normalization on the text might also help fix some of 
the rendering problems we're seeing--I don't know whether that was done 
with the posted module.
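
To make the normalization idea concrete, here is a minimal sketch using 
Python's standard unicodedata module. The helper names are made up for 
illustration, and I am assuming NFC is the form we would want. Note that 
normalization only handles canonical equivalence (composing sequences 
and reordering combining marks); it will not convert text from the 
Unicode 3.0 model to the 5.1 model, which would need a separate 
conversion step.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as a readable string, e.g. 'U+1026'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def normalize_text(text, form="NFC"):
    """Normalize text to the given Unicode normalization form (assumed: NFC)."""
    return unicodedata.normalize(form, text)


def changed_by_normalization(text, form="NFC"):
    """Report whether normalization would alter the text at all."""
    return normalize_text(text, form) != text


if __name__ == "__main__":
    # MYANMAR LETTER U followed by MYANMAR VOWEL SIGN II is canonically
    # equivalent to the single code point MYANMAR LETTER UU.
    sample = "\u1025\u102e"
    print(code_points(sample), "->", code_points(normalize_text(sample)))
    print("changed:", changed_by_normalization(sample))
```

Running changed_by_normalization over each verse of the module would 
at least tell us whether the posted text was normalized before it was 
built.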

The *correct* rendering of the existing Genesis text is in the 
screenshot below. (This is the correct rendering of that encoding, which 
may or may not be the best encoding in Unicode 5.1.) This is the text 
from BibleCS pasted into WorldPad, which utilizes the Graphite tables in 
the Padauk font.

http://www.crosswire.org/~chrislit/burmeseGenesis.jpg

Regarding the possibility of complex script rendering in BD, there was a 
brief discussion of complex scripts in Java on graphite-devel a week 
ago: 
http://sourceforge.net/mailarchive/forum.php?thread_name=OF488F087E.E5E55533-ON86257456.004F079E-86257456.00506903%40notes.sil.org&forum_name=silgraphite-devel

The suggested best course of action for today is to use the Eclipse SWT 
framework, since it will use Pango on Linux and Uniscribe on Windows.

--Chris


Peter von Kaehne wrote:
> I attach a screenshot
> 
> The differences I found in Gen1:1 are
> 
> 1A vs 1B - BD appears to have two characters in sequence while GS has
> split the second character and made it "embrace" the first one.
> 
> 2A vs 2B(a) At the centre of the word appears a character a little bit
> like a handwritten latinate small M with a circle above it, a colon and
> a huge bracket from the left side. The order of these elements is very
> different in GS vs BD.
> 
> 2A vs 2B(b) a hook is set under the last character of the word in GS,
> while BD shows this hook trailing the last "m"-shaped character.
> 
> 3A vs 3B : a character like a smiley and two parallel lines. In GS they
> are sorted horizontally, in BD sequentially, just as the 2A vs 2B(b)
> difference.
> 
> To be honest I think the problem is with BD. It appears that these are
> similar to characters with diacritics, which should be correctly shaped
> before they are rendered.
> 
> I mentioned a couple of days or so ago a similar problem with Farsi,
> where some diacritics are only displayed if the base text uses the
> code point for the unified character; rendering flakes out with a box
> character when the character and the diacritic are given as sequential
> Unicode code points that the renderer is expected to combine.
> 
> My immediate Farsi solution would be to do a mass exchange of characters
> where I exchange the two characters with the one unified one. But I
> wonder whether this is possible in Burmese. Apart from that, I think it
> creates search and input difficulties.
> 
> 
> 
> In Him
> 
> Peter
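
On the search concern raised above: one common way around it is to 
normalize both the stored text and the query to the same form before 
comparing, so a composed and a decomposed spelling still match. This is 
only a sketch under that assumption; normalized_find is a hypothetical 
helper, not anything in Sword or BD, and the index it returns refers to 
the normalized text rather than the original.

```python
import unicodedata


def normalized_find(haystack, needle, form="NFC"):
    """Find needle in haystack after normalizing both to the same form.

    Returns the index within the normalized haystack, or -1 if absent.
    """
    norm_haystack = unicodedata.normalize(form, haystack)
    norm_needle = unicodedata.normalize(form, needle)
    return norm_haystack.find(norm_needle)


if __name__ == "__main__":
    # ALEF + MADDAH ABOVE + BEH (decomposed) versus
    # ALEF WITH MADDA ABOVE + BEH (the "unified" character).
    decomposed = "\u0627\u0653\u0628"
    composed = "\u0622\u0628"
    print("raw match:", decomposed.find(composed))
    print("normalized match:", normalized_find(decomposed, composed))
```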


