[sword-devel] Burmese rendering, was Portable Bible prototype
Chris Little
chrislit at crosswire.org
Sun Jun 8 08:08:51 MST 2008
Burmese/Myanmar has been in Unicode since 3.0, but it got a major
overhaul in version 5.1, which came out just a couple of months ago. (And
apparently the new encoding model is still felt to have some
shortcomings.)
The Burmese Bible we have most likely uses the Unicode 3.0 model, while
fonts like Padauk use the Unicode 5.1 model. We might get a lot closer
to the correct rendering by simply updating the encoding model of the
text. Performing normalization on the text might also help fix some of
the rendering problems we're seeing--I don't know whether that was done
with the posted module.
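To make the normalization idea concrete, here is a minimal sketch in Python using the standard unicodedata module. The function names are my own, and the check for "5.1-style" text is only a heuristic: it assumes that the asat and the four medial consonant signs (U+103A through U+103E) were added in Unicode 5.1, so text containing none of them may be using the older model. This is not what was done to the posted module, just one way to inspect a text.

```python
import unicodedata

# Assumption: U+103A..U+103E (asat and the medial consonant signs) were
# added in Unicode 5.1, so their presence hints at the newer model.
UNICODE_51_MYANMAR_SIGNS = {chr(cp) for cp in range(0x103A, 0x103F)}


def normalize_text(text, form="NFC"):
    """Return text in the given Unicode normalization form."""
    return unicodedata.normalize(form, text)


def needs_normalization(text, form="NFC"):
    """Return True if normalizing would change the text."""
    return normalize_text(text, form) != text


def uses_51_signs(text):
    """Heuristic (hypothetical helper): any Unicode 5.1 Myanmar sign present?"""
    return any(ch in UNICODE_51_MYANMAR_SIGNS for ch in text)


def dump_code_points(text):
    """List code points as 'U+XXXX' strings for eyeballing the encoding."""
    return ["U+%04X" % ord(ch) for ch in text]


if __name__ == "__main__":
    sample = "\u0627\u0653"  # Arabic alef followed by combining madda
    print(dump_code_points(sample), needs_normalization(sample))
    print(dump_code_points(normalize_text(sample)))
```

NFC also composes a base character and a combining mark into the single precomposed character where one exists, which is the same substitution as the Farsi workaround discussed in the quoted message. Normalization alone would probably not convert text from one encoding model to the other; that would need a dedicated mapping.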
The *correct* rendering of the existing Genesis text is in the
screenshot below. (This is the correct rendering of that encoding, which
may or may not be the best encoding in Unicode 5.1.) This is the text
from BibleCS pasted into WorldPad, which utilizes the Graphite tables in
the Padauk font.
http://www.crosswire.org/~chrislit/burmeseGenesis.jpg
Regarding the possibility of complex script rendering in BD, there was a
brief discussion of complex scripts in Java on graphite-devel a week
ago:
http://sourceforge.net/mailarchive/forum.php?thread_name=OF488F087E.E5E55533-ON86257456.004F079E-86257456.00506903%40notes.sil.org&forum_name=silgraphite-devel
The suggested best course of action for today is to use the Eclipse SWT
framework, since it will use Pango on Linux and Uniscribe on Windows.
--Chris
Peter von Kaehne wrote:
> I attach a screenshot
>
> The differences I found in Gen1:1 are
>
> 1A vs 1B - BD appears to have two characters in sequence while GS has
> split the second character and made it "embrace" the first one.
>
> 2A vs 2B(a) At the centre of the word appears a character a little bit
> like a handwritten latinate small M with a circle above it, a colon and
> a huge bracket from the left side. The order of these elements is very
> different in GS vs BD.
>
> 2A vs 2B(b) A hook is set under the last character of the word in GS,
> while BD shows this hook trailing the last "m"-shaped character.
>
> 3A vs 3B : a character like a smiley and two parallel lines. In GS they
> are sorted horizontally, in BD sequentially, just as the 2A vs 2Bb
> difference.
>
> To be honest I think the problem is with BD. It appears that these are
> similar to characters with diacritics, which should be correctly shaped
> before they are rendered.
>
> I mentioned a couple of days ago a similar problem with Farsi, where
> some diacritics are displayed only if the text uses the code point for
> the unified character. When the character and the diacritic are given
> as sequential Unicode items, which the renderer is expected to unite,
> a box character is shown instead.
>
> My immediate Farsi solution would be a mass replacement in which each
> two-character sequence is exchanged for the one unified character. But I
> wonder whether this is possible in Burmese. Apart from that, I think it
> creates search and input difficulties.
>
>
>
> In Him
>
> Peter