[sword-devel] Bible in Myanmar

Cyrille lafricain79 at gmail.com
Wed May 15 10:08:42 MST 2019


I haven't understood everything yet... but I trust you. If you have the
patience to explain it to me, I'd like to learn :)
What I don't understand is how you find the marker for each verse and
chapter in the UTF-8 text. What exactly is this marker?

On 15/05/2019 19:03, David Haslam wrote:
> Michael’s description matches how I imagined the method
> during my waking moments this morning. :)
>
> David
>
> Sent from ProtonMail Mobile
>
>
> On Wed, May 15, 2019 at 17:33, Michael H <cmahte at gmail.com> wrote:
>> I've been working long hours and emailing during my breaks. David
>> has covered the basics of converting to VPL (verse per line).
>>
>> I would then put the entire work into one column of a spreadsheet,
>> one verse per row.
>>
>> Then, in another column, insert the list of book/chapter/verse
>> references in order.
>>
>> The BCV and verse-text columns should line up row for row. Check them,
>> and adjust wherever they don't match exactly, e.g. if 3 John has 15
>> verses instead of 14.
>>
>> Once the columns align, you can merge them into a third column using
>> the concatenation operator (&). This last column becomes your output.
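A minimal Python sketch of the spreadsheet step just described, assuming the verse texts are already split one verse per line and that the number of verses per chapter is known. The names `make_refs` and `merge_columns`, and the "3John 1:15" reference style, are illustrative assumptions, not the format any particular SWORD import tool requires. The length check stands in for "verify and adjust": if the columns do not line up, it stops instead of silently shifting every later verse.

```python
def make_refs(book: str, verse_counts: list[int]) -> list[str]:
    """The BCV column: 'Book chapter:verse' in canonical order.
    verse_counts[i] is the number of verses in chapter i + 1 (assumed known)."""
    return [f"{book} {chapter}:{verse}"
            for chapter, count in enumerate(verse_counts, start=1)
            for verse in range(1, count + 1)]


def merge_columns(refs: list[str], verses: list[str]) -> list[str]:
    """The concatenation column: like =A1 & " " & B1 filled down."""
    if len(refs) != len(verses):
        # The columns are out of step (a split, merged or missing verse).
        raise ValueError(f"{len(refs)} references but {len(verses)} verse lines")
    return [f"{ref} {text.strip()}" for ref, text in zip(refs, verses)]
```

For example, `merge_columns(make_refs("3John", [15]), lines)` fails loudly unless `lines` holds exactly 15 verse lines, which is the 3 John case mentioned above.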
>>
>> The output needs to take into account that section titles and section
>> ranges belong in front of the verse marker. That takes a somewhat more
>> complex search and replace, but it can be done.
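One possible reading of the heading step, sketched in Python: each heading is attached to the verse that follows it rather than the one before. It assumes the lines have already been classified as "heading" or "verse" (that classification is the hard part and is not shown). Wrapping headings in OSIS `<title>` elements is an assumption about the target markup, and `attach_headings` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape


def attach_headings(items: list[tuple[str, str]], refs: list[str]) -> list[str]:
    """Emit one 'ref text' line per verse. Headings seen since the previous
    verse are placed in front of the next verse's text as <title> elements."""
    out: list[str] = []
    pending: list[str] = []  # headings still waiting for their verse
    verse_no = 0
    for kind, text in items:
        if kind == "heading":
            pending.append(f"<title>{escape(text)}</title>")
        elif kind == "verse":
            if verse_no >= len(refs):
                raise ValueError("more verse lines than references")
            out.append(f"{refs[verse_no]} {''.join(pending)}{escape(text)}")
            pending.clear()
            verse_no += 1
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    if pending:
        raise ValueError("heading at end of text with no verse after it")
    if verse_no != len(refs):
        raise ValueError(f"{len(refs)} references but only {verse_no} verses")
    return out
```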
>>
>>
>>
>> On Wed, May 15, 2019 at 11:12 AM David Haslam <dfhdfh at protonmail.com> wrote:
>>
>>     The attachment contains a counted list of Myanmar words
>>     containing a font conversion error.
>>     /NB. We need to match these words with what they are in the
>>     legacy font./
>>
>>     This issue should be discussed with the current maintainer of the
>>     SIL *TECkit* converter, whoever that may be.
>>
>>     It may be worthwhile asking our friends at the SIL *Writing
>>     Systems Technology* team. See
>>     https://scripts.sil.org/default
>>
>>     /Aside: My friend Martin Hosken of SIL knew the late Keith
>>     Stribley - the former webmaster of ThanLwinSoft./
>>
>>     Best regards,
>>
>>     David
>>
>>     Sent with ProtonMail <https://protonmail.com> Secure Email.
>>
>>     ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>     On Wednesday, May 15, 2019 4:41 PM, David Haslam
>>     <dfhdfh at protonmail.com> wrote:
>>
>>>     _*Observations*: (continued)_
>>>
>>>     5. The string "*Kd;*" also looks anomalous. It's found only once in 
>>>     ကိုယ်တော်၏ဦးခေါင်းတော်အပေါ်၌ လည်း ဤသူသည်ကား ဂျူးလူမျ Kd;တို့၏ဘုရင်၊
>>>
>>>     6. It's evident from the PDF file that the text is paragraphed
>>>     with indented first lines. See 
>>>     https://www.dropbox.com/s/do5e675i19xfomf/Screenshot%202019-05-15%2016.29.10.png?dl=0
>>>
>>>     My hunch is that these leading paragraph indents may have been
>>>     coded within contents.xml as the self-closing
>>>     element *<text:tab/>*. There are 372 matches to this.
>>>
>>>     So not only do we need to provide chapter and verse tags (plus
>>>     section headings & parallel passage titles, etc), we also need
>>>     to reconstruct all the paragraph tags.
>>>
>>>     /NB. All structural XML indents were removed by the filter
>>>     "Remove blanks at SOL" in the file *contents.pp.txt*, which
>>>     was output by my simple TextPipe filter. So that's quite a
>>>     different matter./
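To test the `<text:tab/>` hunch directly on the XML, before any tags are stripped, here is a standard-library Python sketch. It assumes the standard ODF text namespace; note that inside an .odt package the member is normally named content.xml. Only a tab that is the very first child of a `<text:p>` counts as an indent here; a tab nested inside a `<text:span>` would be missed. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P_TAG = f"{{{TEXT_NS}}}p"      # <text:p>
TAB_TAG = f"{{{TEXT_NS}}}tab"  # <text:tab/>


def count_tabs(xml_bytes: bytes) -> int:
    """Total number of <text:tab/> elements anywhere in the document."""
    return sum(1 for _ in ET.fromstring(xml_bytes).iter(TAB_TAG))


def paragraphs_with_indent(xml_bytes: bytes) -> list[tuple[bool, str]]:
    """(starts_with_tab, plain_text) for every <text:p> in an ODF content.xml."""
    result = []
    for p in ET.fromstring(xml_bytes).iter(P_TAG):
        first = p[0] if len(p) else None
        # Indented only if no text precedes the first child and that child is a tab.
        starts_with_tab = (not (p.text or "").strip()
                           and first is not None and first.tag == TAB_TAG)
        result.append((starts_with_tab, "".join(p.itertext()).strip()))
    return result
```

Comparing `count_tabs` with the number of paragraphs flagged as indented gives a quick check on whether the 372 tabs really are paragraph starts.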
>>>
>>>     Best regards,
>>>
>>>     David
>>>
>>>     ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>     On Wednesday, May 15, 2019 2:22 PM, David Haslam
>>>     <dfhdfh at protonmail.com> wrote:
>>>
>>>>     _*Observations:* (continued)_
>>>>
>>>>     4. In addition to the reported instances of the three anomalous
>>>>     characters (*È, Ø, ò*) left after the font conversion,
>>>>     there are 6 instances of the string "*m;*" that are
>>>>     probably also due to bugs in the converter.
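A small Python sketch for reviewing suspect strings such as "m;" (and "Kd;" from observation 5) in context, so that each occurrence can be matched against the legacy-font original. `find_with_context` is a hypothetical helper; it reports non-overlapping matches with a fixed number of characters on either side.

```python
import re


def find_with_context(text: str, needle: str, width: int = 20) -> list[str]:
    """Each occurrence of `needle` with up to `width` characters on each side."""
    hits = []
    for m in re.finditer(re.escape(needle), text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        hits.append(text[start:end])
    return hits
```

Running it once per suspect string (for example "m;", "Kd;", "È", "Ø", "ò") over contents.pp.txt gives a review list for each.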
>>>>
>>>>     Best regards,
>>>>
>>>>     David
>>>>
>>>>     ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>     On Wednesday, May 15, 2019 12:41 PM, David Haslam
>>>>     <dfhdfh at protonmail.com> wrote:
>>>>
>>>>>     Yep - sure - later I can do that. 
>>>>>
>>>>>     David
>>>>>
>>>>>     On Wed, May 15, 2019 at 11:26, Cyrille <lafricain79 at gmail.com> wrote:
>>>>>>     David, I don't have a Box account, and I don't want to create one.
>>>>>>     Can you upload it to https://framadrop.org/ instead? It's completely
>>>>>>     free and secure (and private).
>>>>>>     Thank you.
>>>>>>
>>>>>>
>>>>>>     On 15/05/2019 11:46, David Haslam wrote:
>>>>>>>     Interim progress report.
>>>>>>>
>>>>>>>     I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the contents to Mat_utf8-odt
>>>>>>>
>>>>>>>     I opened the .odt file using 7-Zip from the Windows Explorer context menu, and extracted the file contents.xml
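The 7-Zip step can also be scripted, because an .odt file is a ZIP package. Here is a minimal Python sketch. It assumes the standard ODF layout, in which the document body is stored in the member content.xml. `read_odt_content` is a hypothetical name.

```python
import io
import zipfile
from typing import BinaryIO, Union


def read_odt_content(source: Union[str, bytes, BinaryIO]) -> bytes:
    """Return the raw content.xml member of an OpenDocument (.odt) package.
    `source` may be a file path, the package's bytes, or an open binary file."""
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as odt:
        return odt.read("content.xml")
```

For example, `read_odt_content("Mat_utf8.odt")` assumes that file name for the unzipped document. Either way, the returned bytes are UTF-8 XML, ready for pretty-printing or parsing.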
>>>>>>>
>>>>>>>     I used the Notepad++ plug-in XMLTools to pretty-print the XML file and saved it as contents.pp.xml.
>>>>>>>     This is simply a layout change that makes it easier to read.
>>>>>>>
>>>>>>>     I viewed the .pp.xml file in BabelPad, which confirmed that the non-XML text was (mostly) Myanmar Unicode.
>>>>>>>
>>>>>>>     I used a TextPipe filter to remove all XML tags, blanks from SOL & EOL and all blank lines.
>>>>>>>     The output file is now contents.pp.txt
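A rough Python equivalent of that TextPipe filter, for anyone without TextPipe. It assumes the pretty-printed contents.pp.xml is the input, so that each paragraph's text sits on its own line; on the original single-line XML the paragraphs would run together. It uses a simple regular expression rather than an XML parser. Like the original filter, it discards paragraph markup such as `<text:tab/>` along with every other tag, which matters for the paragraph indents discussed in observation 6. `strip_xml` is a hypothetical name.

```python
import re
from xml.sax.saxutils import unescape

TAG_RE = re.compile(r"<[^>]+>")  # naive: any <...> run is treated as a tag


def strip_xml(xml_text: str) -> str:
    """Drop all tags, trim each line (SOL/EOL blanks), and remove blank lines."""
    plain = TAG_RE.sub("", xml_text)
    # Decode the predefined XML entities once the markup is gone;
    # numeric references such as &#160; are left as they are.
    plain = unescape(plain, {"&quot;": '"', "&apos;": "'"})
    lines = (line.strip() for line in plain.splitlines())
    return "\n".join(line for line in lines if line)
```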
>>>>>>>
>>>>>>>     This is now something that's readable content in Myanmar Unicode, with some English text such as "The Gospel according Matthew" near the start.
>>>>>>>
>>>>>>>     The file is best viewed using BabelPad with the option Display Colours | Colour Code by Script.
>>>>>>>     This shows Myanmar characters in light green, and non-Myanmar characters in other colours.
>>>>>>>
>>>>>>>     Observations:
>>>>>>>     1. The font conversion to Unicode left a few scattered characters unconverted. :(
>>>>>>>
>>>>>>>     Code point	Char	Count	Unicode name
>>>>>>>     0000C8	È	18	LATIN CAPITAL LETTER E WITH GRAVE
>>>>>>>     0000D8	Ø	20	LATIN CAPITAL LETTER O WITH STROKE
>>>>>>>     0000F2	ò	3	LATIN SMALL LETTER O WITH GRAVE
>>>>>>>
>>>>>>>     The complete character frequency analysis is attached.
>>>>>>>
>>>>>>>     2. A few apparent verse numbers are still present here and there.
>>>>>>>     3. The content contains section headings and parallel passage headings as well as verse text.
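Observations 1 and 2 can be approximated with a short Python sketch. It builds a frequency table of every non-whitespace character outside the main Myanmar block (U+1000 to U+109F), using the same code point / character / count / name layout as the table above. It also searches for leftover digit runs (ASCII or Myanmar digits) that may be stray verse numbers. The function names are hypothetical. Characters from the Myanmar Extended-A/B blocks, and the English text near the start, would also appear in the table.

```python
import re
import unicodedata
from collections import Counter

MYANMAR_BLOCK = range(0x1000, 0x10A0)  # U+1000..U+109F only; Extended-A/B not included
DIGIT_RUN = re.compile(r"[0-9\u1040-\u1049]+")  # ASCII digits or Myanmar digits


def non_myanmar_table(text: str) -> list[tuple[str, str, int, str]]:
    """Rows of (code point, character, count, Unicode name), most frequent first,
    for every non-whitespace character outside the Myanmar block."""
    counts = Counter(ch for ch in text
                     if ord(ch) not in MYANMAR_BLOCK and not ch.isspace())
    return [(f"{ord(ch):06X}", ch, n, unicodedata.name(ch, "<no name>"))
            for ch, n in counts.most_common()]


def digit_runs(text: str) -> list[str]:
    """Runs of digits that may be leftover verse numbers."""
    return DIGIT_RUN.findall(text)
```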
>>>>>>>
>>>>>>>     I have just uploaded the file contents.pp.zip to a new folder in my Box account and added Cyrille & Michael as viewers.
>>>>>>>
>>>>>>>
>>>>>>>     Best regards,
>>>>>>>
>>>>>>>     David
>>>>>>>
>>>>>>>     ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>>     On Monday, May 13, 2019 9:19 AM, Cyrille <lafricain79 at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>     Hello,
>>>>>>>>     I recently received a modern Myanmar translation of the NT, Psalms and
>>>>>>>>     Proverbs, with permission to create a new module.
>>>>>>>>     But there are many problems... First, getting the text.
>>>>>>>>     I have tried different ways, but it was typeset with PageMaker!
>>>>>>>>     I can get the text, but the problem is that I don't get the verse
>>>>>>>>     numbers, because they sit in a parallel column next to the text, and
>>>>>>>>     when I copy it I get only the biblical text.
>>>>>>>>     I also have a PDF, but when I convert it to text (with pdftotext) the
>>>>>>>>     columns get mixed together.
>>>>>>>>     Can anyone help me with some ideas?
>>>>>>>>     The next problem is Unicode... The text is not typed in Unicode but
>>>>>>>>     uses a special font.
>>>>>>>>     I can send you everything you need, or push it to git.crosswire.
>>>>>>>>
>>>>>>>>     Thanks for your help.
>>>>>>>>
>>>>>>>>     sword-devel mailing list: sword-devel at crosswire.org <mailto:sword-devel at crosswire.org>
>>>>>>>>     http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>     Instructions to unsubscribe/change your settings at above page
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
