[sword-devel] Bible in Myanmar

Cyrille lafricain79 at gmail.com
Wed May 15 03:24:22 MST 2019



On 15/05/2019 11:46, David Haslam wrote:
> Interim progress report.
>
> I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the contents to Mat_utf8-odt
>
> I opened the .odt file using 7-Zip from the Windows Explorer context menu, and extracted the file contents.xml
>
> I used Notepad++ plug-in XMLTools to pretty print the XML file and saved it as contents.pp.xml
> This is simply a layout change that's easier to read.
>
> I viewed the .pp.xml file in BabelPad, which confirmed that the non-XML text was (mostly) Myanmar Unicode.
>
> I used a TextPipe filter to remove all XML tags, leading and trailing blanks on each line, and all blank lines.
> The output file is now contents.pp.txt
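For anyone without 7-Zip, Notepad++ and TextPipe, the steps above can be roughly sketched in Python. This is only a sketch under some assumptions: the text body of an ODF package is stored under the member name content.xml (the message calls the extracted file contents.xml), every paragraph or heading becomes one output line, and ODF space, tab and line-break elements are simply dropped. The function name odt_to_plain_text is made up for illustration.

```python
import re
import zipfile
from html import unescape


def odt_to_plain_text(odt_path):
    """Return the text of an .odt file, one non-empty line per paragraph."""
    # An .odt file is a zip archive; the document body is in content.xml.
    with zipfile.ZipFile(odt_path) as archive:
        xml = archive.read("content.xml").decode("utf-8")
    # Put a newline at the end of every paragraph and heading,
    # so that paragraphs stay separate once the tags are gone.
    xml = re.sub(r"</text:(p|h)>", "\n", xml)
    # Remove all XML tags, then decode entities such as &amp;.
    text = unescape(re.sub(r"<[^>]+>", "", xml))
    # Trim blanks at the start and end of each line and drop blank lines.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Calling odt_to_plain_text on the unzipped .odt and writing the result to a UTF-8 file should give something close to contents.pp.txt, though not necessarily identical to the TextPipe output.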
>
> The result is readable content in Myanmar Unicode, with some English text such as "The Gospel according Matthew" near the start.
>
> The file is best viewed using BabelPad with the option Display Colours | Colour Code by Script.
> This shows Myanmar characters in light green, and non-Myanmar characters in other colours.
>
> Observations:
> 1. The font conversion to Unicode left a few scattered characters unconverted. :(
>
> 0000C8	È	18	LATIN CAPITAL LETTER E WITH GRAVE
> 0000D8	Ø	20	LATIN CAPITAL LETTER O WITH STROKE
> 0000F2	ò	3	LATIN SMALL LETTER O WITH GRAVE
Yes, but this can easily be fixed. I can ask my friends which
characters to replace them with (or have a look in the PDF).
> The complete character frequency analysis is attached.
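A table like the one above, and the replacement step once the right characters are known, could be sketched in Python as follows. This is a minimal sketch, assuming that "unconverted" means any non-ASCII character outside the Myanmar Unicode blocks and that the third column of the table is the number of occurrences. The function names are made up for illustration, and the replacement mapping has to be supplied by someone who knows the legacy font; nothing here guesses it.

```python
import unicodedata
from collections import Counter


def is_myanmar(ch):
    """True for the Myanmar block and its Extended-A and Extended-B blocks."""
    cp = ord(ch)
    return (0x1000 <= cp <= 0x109F
            or 0xAA60 <= cp <= 0xAA7F
            or 0xA9E0 <= cp <= 0xA9FF)


def stray_character_report(text):
    """List non-ASCII, non-Myanmar characters as: code point, char, count, name."""
    counts = Counter(ch for ch in text
                     if ord(ch) > 0x7F and not is_myanmar(ch))
    return [f"{ord(ch):06X}\t{ch}\t{n}\t{unicodedata.name(ch, 'UNKNOWN')}"
            for ch, n in sorted(counts.items())]


def replace_strays(text, replacements):
    """Apply a {stray character: correct string} mapping to the whole text."""
    return text.translate(str.maketrans(replacements))
```

The mapping passed to replace_strays would have one entry for each of È, Ø and ò, filled in after checking the PDF.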
>
> 2. A few verse numbers? are still present here and there.
> 3. The content contains section headings and parallel passage headings as well as verse text.
>
> I have just uploaded the file contents.pp.zip to a new folder in my Box account and added Cyrille & Michael as viewers.
My question is: can you do something with the txt file to add the
verse numbers?
>
> Best regards,
>
> David
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, May 13, 2019 9:19 AM, Cyrille <lafricain79 at gmail.com> wrote:
>
>> Hello,
>> I recently received a modern Myanmar translation of the NT, Psalms and
>> Proverbs, with permission to create a new module.
>> But there are many problems... The first is getting the text.
>> I tried different ways, but it was typeset with PageMaker!
>> I can get the text, but the problem is that I don't have the verse numbers,
>> because they are in a parallel column next to the text, and when I copy it
>> I get only the biblical text.
>> I also have a PDF, but when I convert it to text (with pdftotext) the
>> columns are mixed.
>> Can someone help me with any ideas?
>> The next problem is Unicode... The text is not typed in Unicode but uses
>> a special font.
>> I can send everything you need or push it to git.crosswire.
>>
>> Thanks for help.
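One possible way around the mixed columns is to run pdftotext once per column, using its crop options -x, -y, -W and -H, so that the verse-number column and the text column end up in separate files. The Python sketch below only builds and runs that command. The function names are made up for illustration, the coordinates in the usage note are hypothetical and would have to be measured on the real pages, and the output would still be in the legacy font encoding.

```python
import subprocess


def column_command(pdf_path, out_path, x, y, width, height):
    """Build a pdftotext command that extracts only one rectangular area.

    x and y are the top-left corner of the crop area, width and height its
    size, in pixels at pdftotext's resolution (72 DPI unless -r is given).
    """
    return ["pdftotext",
            "-x", str(x), "-y", str(y),
            "-W", str(width), "-H", str(height),
            pdf_path, out_path]


def extract_column(pdf_path, out_path, x, y, width, height):
    """Run pdftotext on one column; raises CalledProcessError on failure."""
    subprocess.run(column_command(pdf_path, out_path, x, y, width, height),
                   check=True)
```

For example, extract_column("Mat.pdf", "numbers.txt", 0, 0, 60, 800) would take a narrow strip on the left of every page, if that is where the verse numbers sit; the file name and the numbers are placeholders.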
>>
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
