[sword-devel] Bible in Myanmar

David Haslam dfhdfh at protonmail.com
Wed May 15 04:41:54 MST 2019


Yep - sure - later I can do that.

David


On Wed, May 15, 2019 at 11:26, Cyrille <lafricain79 at gmail.com> wrote:

> David, I have no Box account, and I don't want to create one. Can you upload it to https://framadrop.org/ instead? It's totally free and secure (and private).
> Thank you.
>
> On 15/05/2019 11:46, David Haslam wrote:
>
>> Interim progress report.
>>
>> I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the contents to Mat_utf8-odt
>>
>> I opened the .odt file using 7-Zip from the Windows Explorer context menu, and extracted the file contents.xml
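An .odt file is a ZIP package, so the extraction step can also be scripted. The following is a minimal sketch, not the method used above: the function name and paths are hypothetical, and it assumes the main XML member carries the conventional ODF name content.xml (the message calls it contents.xml).

```python
import zipfile
from pathlib import Path


def extract_content_xml(odt_path, out_dir, member_name="content.xml"):
    """Extract one member from an .odt package and return the extracted path.

    Assumption: the document body lives in "content.xml", which is the
    conventional member name in an ODF package.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(odt_path) as package:
        # extract() raises KeyError if the member is not in the archive.
        package.extract(member_name, path=out_dir)
    return out_dir / member_name


# Example call (hypothetical file names):
# extract_content_xml("Mat_utf8.odt", "Mat_utf8-odt")
```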
>>
>> I used Notepad++ plug-in XMLTools to pretty print the XML file and saved it as contents.pp.xml
>> This is simply a layout change that's easier to read.
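The pretty-print step can be approximated with the Python standard library. This is only a sketch and not the XMLTools plug-in's own algorithm; the function name is hypothetical. Note that re-indenting can add whitespace around text in mixed content, so the result is best treated as a reading copy.

```python
from xml.dom import minidom


def pretty_print_xml(xml_text):
    """Return xml_text re-indented, with whitespace-only lines removed."""
    document = minidom.parseString(xml_text)
    pretty = document.toprettyxml(indent="  ")
    # toprettyxml can emit whitespace-only lines when the input already
    # contained indentation; drop them so the output stays compact.
    return "\n".join(line for line in pretty.splitlines() if line.strip())
```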
>>
>> I viewed the .pp.xml file in BabelPad, which confirmed that the non-XML text was (mostly) Myanmar Unicode.
>>
>> I used a TextPipe filter to remove all XML tags, leading and trailing whitespace on each line, and all blank lines.
>> The output file is now contents.pp.txt
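The filter step described above (remove tags, trim each line, drop blank lines) can be sketched as follows. This is not the TextPipe filter itself; the function name is hypothetical, the regular expression is a crude tag matcher that would mishandle comments or CDATA containing ">", and decoding entities such as &amp; is an addition that the original filter may not have performed.

```python
import html
import re

# Crude tag matcher: "<", then anything up to the next ">".
# [^>] also matches newlines, so tags spanning several lines are removed.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_markup(xml_text):
    """Remove tags, trim each line, and drop lines that end up empty."""
    without_tags = TAG_PATTERN.sub("", xml_text)
    # Assumption: entity references should be decoded to plain characters.
    decoded = html.unescape(without_tags)
    kept = []
    for raw_line in decoded.splitlines():
        line = raw_line.strip()
        if line:
            kept.append(line)
    return "\n".join(kept)
```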
>>
>> The result is now readable content in Myanmar Unicode, with some English text such as "The Gospel according Matthew" near the start.
>>
>> The file is best viewed using BabelPad with the option Display Colours | Colour Code by Script.
>> This shows Myanmar characters in light green, and non-Myanmar characters in other colours.
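A rough scripted equivalent of that colour-coded view is to list every character that falls outside the Myanmar code blocks. The sketch below uses Unicode block ranges as an approximation of the script property, which is an assumption and not how BabelPad classifies characters; the function names are hypothetical.

```python
# Unicode blocks that hold Myanmar-script characters.
MYANMAR_RANGES = (
    (0x1000, 0x109F),  # Myanmar
    (0xA9E0, 0xA9FF),  # Myanmar Extended-B
    (0xAA60, 0xAA7F),  # Myanmar Extended-A
)


def is_myanmar(char):
    """Return True if char lies in one of the Myanmar blocks."""
    code = ord(char)
    return any(low <= code <= high for low, high in MYANMAR_RANGES)


def find_non_myanmar(text, ignore_ascii=True):
    """List (line_number, column, char) for characters outside the Myanmar blocks.

    Whitespace is always skipped. With ignore_ascii=True, plain ASCII
    (for example English headings and digits) is skipped as well.
    """
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char.isspace() or is_myanmar(char):
                continue
            if ignore_ascii and ord(char) < 0x80:
                continue
            hits.append((line_number, column, char))
    return hits
```

Leftovers from an incomplete font conversion would then show up as a short list of positions to inspect.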
>>
>> Observations:
>> 1. The font conversion to Unicode left a few scattered characters unconverted. :(
>>
>> 0000C8	È	18	LATIN CAPITAL LETTER E WITH GRAVE
>> 0000D8	Ø	20	LATIN CAPITAL LETTER O WITH STROKE
>> 0000F2	ò	3	LATIN SMALL LETTER O WITH GRAVE
>>
>> The complete character frequency analysis is attached.
>>
>> 2. A few verse numbers (?) are still present here and there.
>> 3. The content contains section headings and parallel passage headings as well as verse text.
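A character frequency report in the same column layout as the three rows under observation 1 (code point, character, count, Unicode name) can be produced with the standard library. This is a sketch under assumptions: the function name is hypothetical, whitespace is left out of the count, and rows are sorted by code point.

```python
import unicodedata
from collections import Counter


def character_frequencies(text):
    """Return tab-separated rows: code point, character, count, Unicode name."""
    counts = Counter(char for char in text if not char.isspace())
    rows = []
    # sorted() on (char, count) pairs orders the rows by code point.
    for char, count in sorted(counts.items()):
        name = unicodedata.name(char, "<unnamed>")
        rows.append(f"{ord(char):06X}\t{char}\t{count}\t{name}")
    return rows
```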
>>
>> I have just uploaded the file contents.pp.zip to a new folder in my Box account and added Cyrille & Michael as viewers.
>>
>> Best regards,
>>
>> David
>>
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Monday, May 13, 2019 9:19 AM, Cyrille <lafricain79 at gmail.com> wrote:
>>
>>> Hello,
>>> I recently received a modern Myanmar translation of the NT, Psalms and
>>> Proverbs, with permission to create a new module.
>>> But there are many problems... The first is getting the text.
>>> I tried different ways, but it was typeset with PageMaker!
>>> I can get the text, but the problem is that I don't have the verse numbers,
>>> because they sit next to it in a parallel column, and when I copy it I get
>>> only the biblical text.
>>> I also have a PDF, but when I convert it to text (with pdftotext) the
>>> columns are mixed.
>>> Can someone help me with any ideas?
>>> The next problem is Unicode... The text is not typed in Unicode but uses
>>> a special font.
>>> I can send everything you need or push it to git.crosswire.
>>>
>>> Thanks for your help.
>>>
>>> sword-devel mailing list:
>>> sword-devel at crosswire.org
>>>
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>