[sword-devel] Bible in Myanmar

David Haslam dfhdfh at protonmail.com
Wed May 15 04:38:10 MST 2019


Cyrille writes:

“My question is, can you do something with the txt file for adding the verse number?”

Well - yes - that’s my intention.
It was after all an “interim progress report”.  ;)

David

On Wed, May 15, 2019 at 11:24, Cyrille <lafricain79 at gmail.com> wrote:

> On 15/05/2019 11:46, David Haslam wrote:
>
>> Interim progress report.
>>
>> I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the contents to Mat_utf8-odt
>>
>> I opened the .odt file using 7-Zip from the Windows Explorer context menu, and extracted the file contents.xml
>>
>> I used Notepad++ plug-in XMLTools to pretty print the XML file and saved it as contents.pp.xml
>> This is simply a layout change that's easier to read.
>>
>> I viewed the .pp.xml file in BabelPad, which confirmed that the non-XML text was (mostly) Myanmar Unicode.
>>
>> I used a TextPipe filter to remove all XML tags, leading and trailing blanks on each line, and all blank lines.
>> The output file is now contents.pp.txt
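
For readers without TextPipe, the tag-stripping step described above could be approximated with a short script. This is only a minimal sketch, not the filter that was actually used: the function name `strip_xml` is hypothetical, it assumes the input is already pretty-printed (as contents.pp.xml is) so that each text run sits on its own line, and it additionally decodes XML entities such as `&amp;`, which the original filter may not have done.

```python
import html
import re

# Matches any XML tag, opening, closing, or self-closing.
TAG_RE = re.compile(r"<[^>]+>")


def strip_xml(xml_text: str) -> str:
    """Remove tags, trim blanks at the start and end of each line,
    and drop lines that end up empty."""
    kept = []
    for line in xml_text.splitlines():
        text = html.unescape(TAG_RE.sub("", line)).strip()
        if text:
            kept.append(text)
    return "\n".join(kept)


# Possible usage (file names taken from the steps above):
#   with open("contents.pp.xml", encoding="utf-8") as src:
#       plain = strip_xml(src.read())
#   with open("contents.pp.txt", "w", encoding="utf-8") as dst:
#       dst.write(plain + "\n")
```

A regular expression is enough here only because the goal is to discard all markup; anything that needs the document structure (headings versus verse text, for example) would be better served by a real XML parser.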
>>
>> The result is readable content in Myanmar Unicode, with some English text such as "The Gospel according Matthew" near the start.
>>
>> The file is best viewed using BabelPad with the option Display Colours | Colour Code by Script.
>> This shows Myanmar characters in light green, and non-Myanmar characters in other colours.
>>
>> Observations:
>> 1. The font conversion to Unicode left a few scattered characters unconverted. :(
>>
>> 0000C8	È	18	LATIN CAPITAL LETTER E WITH GRAVE
>> 0000D8	Ø	20	LATIN CAPITAL LETTER O WITH STROKE
>> 0000F2	ò	3	LATIN SMALL LETTER O WITH GRAVE
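
A listing in the same layout as the rows above (code point, character, count, Unicode name) can be produced with a few lines of Python. This is a sketch under stated assumptions rather than the tool that generated the attached analysis: the helper names are hypothetical, and "stray" is taken here to mean any non-whitespace character that is neither ASCII nor in one of the Myanmar blocks (U+1000..U+109F, U+A9E0..U+A9FF, U+AA60..U+AA7F), so legitimate typographic punctuation would be reported as well.

```python
import unicodedata
from collections import Counter

# Myanmar, Myanmar Extended-B, and Myanmar Extended-A blocks.
MYANMAR_RANGES = ((0x1000, 0x109F), (0xA9E0, 0xA9FF), (0xAA60, 0xAA7F))


def is_myanmar(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in MYANMAR_RANGES)


def stray_characters(text: str):
    """Count characters that are neither whitespace, ASCII, nor Myanmar.
    Returns (character, count) pairs sorted by code point."""
    counts = Counter(
        ch for ch in text
        if not ch.isspace() and not ch.isascii() and not is_myanmar(ch)
    )
    return sorted(counts.items())


def format_report(text: str) -> str:
    """One tab-separated row per stray character."""
    rows = []
    for ch, n in stray_characters(text):
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(f"{ord(ch):06X}\t{ch}\t{n}\t{name}")
    return "\n".join(rows)
```

Running `format_report` over the contents of contents.pp.txt would list the candidates for manual correction; which Myanmar character each one should become still has to come from the PDF or from someone who reads the language.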
>
> Yes, but this can easily be changed. I can ask my friends which characters to replace them with (or have a look in the PDF).
>
>> The complete character frequency analysis is attached.
>>
>> 2. A few of what appear to be verse numbers are still present here and there.
>> 3. The content contains section headings and parallel passage headings as well as verse text.
>>
>> I have just uploaded the file contents.pp.zip to a new folder in my Box account and added Cyrille & Michael as viewers.
>
> My question is, can you do something with the txt file for adding the verse number?
>
>> Best regards,
>>
>> David
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Monday, May 13, 2019 9:19 AM, Cyrille <lafricain79 at gmail.com> wrote:
>>
>>> Hello,
>>> I recently received a modern Myanmar translation of the NT, Psalms and
>>> Proverbs, with permission to create a new module.
>>> But the problems are many... The first is getting the text.
>>> I tested different ways, but it was typeset with PageMaker!
>>> I can get the text, but the problem is that I don't have the verse numbers,
>>> because they sit in a parallel column next to it, and when I copy it I get
>>> only the biblical text.
>>> I also have a PDF, but when I convert it to text (with pdftotext) the
>>> columns are mixed up.
>>> Can someone help me with any ideas?
>>> The next problem is Unicode... The text is not typed in Unicode but uses
>>> a special font.
>>> I can send everything you need or push it to git.crosswire.
>>>
>>> Thanks for help.
>>>
>>> sword-devel mailing list:
>>> sword-devel at crosswire.org
>>>
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page