[sword-devel] Bible in Myanmar
Cyrille
lafricain79 at gmail.com
Wed May 15 10:08:42 MST 2019
I have not understood everything yet, but I trust you. If you have the
patience to explain it to me, I would like to learn :)
What I don't understand is how you can find the marker for each verse and
chapter in the UTF-8 text. What is this marker?
On 15/05/2019 19:03, David Haslam wrote:
> Michael’s description matches how I imagined the method
> during my waking moments this morning. :)
>
> David
>
> Sent from ProtonMail Mobile
>
>
> On Wed, May 15, 2019 at 17:33, Michael H <cmahte at gmail.com
> <mailto:cmahte at gmail.com>> wrote:
>> I've been working long hours and emailing in my break time. David
>> has the basics of converting to VPL.
>>
>> I would then make the entire work a column in a spreadsheet.
>>
>> Then in other columns, insert a list of Book/chapter/verse references in order.
>>
>> The BCV and versetext columns should align and can be verified, and
>> adjusted where things don't match perfectly, like maybe 3 John has 15
>> instead of 14 verses.
>>
>> Once the columns align, you can merge them into another column via
>> concatenation operations (&). This last column becomes your output.
>>
>> The output needs to take into account that section titles and section ranges
>> belong in front of the verse marker. That takes a somewhat more complex
>> search and replace, but it can be done successfully.
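For anyone who prefers a script to a spreadsheet, here is a minimal Python sketch of the same column-alignment idea. It assumes the VPL line shape `Book C:V text`; the book abbreviation, the verse counts and the helper names `make_refs` and `build_vpl` are illustrative assumptions, not something taken from the actual files. The "marker" is simply the reference generated from a known list of verse counts; it is not found in the text itself.

```python
def make_refs(book, verses_per_chapter):
    """Generate 'Book C:V' references in canonical order.

    verses_per_chapter is a list of verse counts, one entry per chapter.
    """
    return [
        f"{book} {chapter}:{verse}"
        for chapter, count in enumerate(verses_per_chapter, start=1)
        for verse in range(1, count + 1)
    ]


def build_vpl(refs, verses):
    """Concatenate each reference with its verse text (the '&' step)."""
    # Refuse to merge when the two columns have different lengths,
    # which is the check done by eye in the spreadsheet.
    if len(refs) != len(verses):
        raise ValueError(
            f"columns do not align: {len(refs)} references, "
            f"{len(verses)} verse texts"
        )
    return [f"{ref} {text.strip()}" for ref, text in zip(refs, verses)]


if __name__ == "__main__":
    # Hypothetical two-chapter book with 2 and 1 verses.
    refs = make_refs("Matt", [2, 1])
    for line in build_vpl(refs, ["first verse", "second verse", "third verse"]):
        print(line)
```

A length mismatch only says that something is off, not where. The adjustment for a book whose versification differs still has to be made by hand in the verse-count list.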
>>
>>
>>
>> On Wed, May 15, 2019 at 11:12 AM David Haslam <dfhdfh at protonmail.com
>> <mailto:dfhdfh at protonmail.com>> wrote:
>>
>> The attachment contains a counted list of Myanmar words
>> containing a font conversion error.
>> /NB. We need to match these words with what they are in the
>> legacy font./
>>
>> This issue should be discussed with the current maintainer of the
>> SIL *TECkit* converter, whoever that may be.
>>
>> It may be worthwhile asking our friends at the SIL *Writing
>> Systems Technology* team. See
>> https://scripts.sil.org/default
>>
>> /Aside: My friend Martin Hosken of SIL knew the late Keith
>> Stribley - the former webmaster of ThanLwinSoft./
>>
>> Best regards,
>>
>> David
>>
>> Sent with ProtonMail <https://protonmail.com> Secure Email.
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Wednesday, May 15, 2019 4:41 PM, David Haslam
>> <dfhdfh at protonmail.com <mailto:dfhdfh at protonmail.com>> wrote:
>>
>>> _*Observations*: (continued)_
>>>
>>> 5. The string "*Kd;*" also looks anomalous. It's found only once in
>>> ကိုယ်တော်၏ဦးခေါင်းတော်အပေါ်၌ လည်း ဤသူသည်ကား ဂျူးလူမျ Kd;တို့၏ဘုရင်၊
>>>
>>> 6. It's evident from the PDF file that the text is paragraphed
>>> with indented first lines. See
>>> https://www.dropbox.com/s/do5e675i19xfomf/Screenshot%202019-05-15%2016.29.10.png?dl=0
>>>
>>> My hunch is that these leading paragraph indents may have been
>>> coded within contents.xml as the self-closing
>>> element *<text:tab/>*. There are 372 matches to this.
>>>
>>> So not only do we need to provide chapter and verse tags (plus
>>> section headings & parallel passage titles, etc), we also need
>>> to reconstruct all the paragraph tags.
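The hunch about leading tabs could be checked with a short Python sketch like the one below. It assumes the paragraphs are `text:p` elements in the standard OpenDocument text namespace and treats a paragraph as "indented" when its first child is a `text:tab` with no text before it. The function name and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace for text:p, text:tab, etc.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraphs_with_indent(xml_string):
    """Return (starts_with_tab, text) for every text:p element."""
    root = ET.fromstring(xml_string)
    result = []
    for p in root.iter(f"{{{TEXT_NS}}}p"):
        first_child = p[0] if len(p) else None
        starts_with_tab = (
            first_child is not None
            and first_child.tag == f"{{{TEXT_NS}}}tab"
            and not (p.text or "").strip()  # nothing before the tab
        )
        result.append((starts_with_tab, "".join(p.itertext()).strip()))
    return result


if __name__ == "__main__":
    sample = (
        '<doc xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
        "<text:p><text:tab/>First paragraph.</text:p>"
        "<text:p>Continuation line.</text:p>"
        "</doc>"
    )
    for indented, text in paragraphs_with_indent(sample):
        print("PARA " if indented else "     ", text)
```

If the count of flagged paragraphs comes out close to 372, that would support the idea that the tabs mark paragraph starts.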
>>>
>>> /NB. All structural XML indents were removed by the filter
>>> "Remove blanks at SOL", so they are absent from the file
>>> *contents.pp.txt* that was output by my simple TextPipe filter.
>>> So that's quite a different matter./
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Wednesday, May 15, 2019 2:22 PM, David Haslam
>>> <dfhdfh at protonmail.com <mailto:dfhdfh at protonmail.com>> wrote:
>>>
>>>> _*Observations*: (continued)_
>>>>
>>>> 4. In addition to the reported instances of the anomalous 3
>>>> characters (*È,Ø,ò*) found after the font conversion,
>>>> there are 6 instances of the string "*m;*" that are
>>>> also probably due to bugs in the converter.
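To match such leftovers against the legacy-font text, it helps to list each occurrence with a little surrounding context. This is a minimal Python sketch; the function name `find_residue` and the context width are arbitrary choices, and the sample string is invented.

```python
import re


def find_residue(text, patterns, context=10):
    """List (pattern, offset, snippet) for every literal occurrence."""
    hits = []
    for pattern in patterns:
        for match in re.finditer(re.escape(pattern), text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            hits.append((pattern, match.start(), text[start:end]))
    return hits


if __name__ == "__main__":
    sample = "abcm;def and later Kd; again"
    for pattern, offset, snippet in find_residue(sample, ["m;", "Kd;"], context=3):
        print(pattern, offset, repr(snippet))
```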
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>> On Wednesday, May 15, 2019 12:41 PM, David Haslam
>>>> <dfhdfh at protonmail.com <mailto:dfhdfh at protonmail.com>> wrote:
>>>>
>>>>> Yep - sure - later I can do that.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Wed, May 15, 2019 at 11:26, Cyrille <lafricain79 at gmail.com
>>>>> <mailto:lafricain79 at gmail.com>> wrote:
>>>>>> David, I have no Box account, and I do not want to create one.
>>>>>> Could you upload it to https://framadrop.org/ instead? It's totally free and
>>>>>> secure (and private).
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>> On 15/05/2019 11:46, David Haslam wrote:
>>>>>>> Interim progress report.
>>>>>>>
>>>>>>> I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the contents to Mat_utf8-odt
>>>>>>>
>>>>>>> I opened the .odt file using 7-Zip from the Windows Explorer context menu, and extracted the file contents.xml
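The extraction step works because an .odt file is a ZIP package, so it can also be done without 7-Zip. Here is a minimal Python sketch. Note that OpenDocument packages normally store the body as `content.xml`, so that is the default here; the member name is a parameter in case the file at hand differs. The function name and the .odt path in the usage line are hypothetical.

```python
import zipfile


def read_odt_member(odt_file, member="content.xml"):
    """Return one member of an .odt package as a UTF-8 string.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        return archive.read(member).decode("utf-8")


# Usage, assuming a hypothetical file name:
# xml_text = read_odt_member("Mat_utf8.odt")
```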
>>>>>>>
>>>>>>> I used Notepad++ plug-in XMLTools to pretty print the XML file and saved it as contents.pp.xml
>>>>>>> This is simply a layout change that's easier to read.
>>>>>>>
>>>>>>> I viewed the .pp.xml file in BabelPad, which confirmed that the non-XML text was (mostly) Myanmar Unicode.
>>>>>>>
>>>>>>> I used a TextPipe filter to remove all XML tags, blanks from SOL & EOL and all blank lines.
>>>>>>> The output file is now contents.pp.txt
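The TextPipe filter itself is not shown, but the three operations described (remove tags, trim blanks at start and end of line, drop blank lines) can be approximated in Python. This is a rough sketch: it uses a regular expression rather than an XML parser, so it assumes no literal `<` or `>` inside attribute values, and it also decodes entities such as `&amp;`, which the original filter may or may not have done.

```python
import re
from html import unescape

# Matches any tag, including ones whose attributes span several lines.
TAG = re.compile(r"<[^>]+>")


def strip_xml_to_text(xml_text):
    """Remove tags, trim each line, and drop lines left empty."""
    without_tags = TAG.sub("", xml_text)
    lines = []
    for line in without_tags.splitlines():
        text = unescape(line).strip()
        if text:
            lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "<a>\n  <b>Hello &amp; </b>\n\n  <c/>\n  World  \n</a>"
    print(strip_xml_to_text(sample))
```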
>>>>>>>
>>>>>>> This is now something that's readable content in Myanmar Unicode, with some English text such as "The Gospel according Matthew" near the start.
>>>>>>>
>>>>>>> The file is best viewed using BabelPad with the option Display Colours | Colour Code by Script.
>>>>>>> This shows Myanmar characters in light green, and non-Myanmar characters in other colours.
>>>>>>>
>>>>>>> Observations:
>>>>>>> 1. The font conversion to Unicode left a few scattered characters unconverted. :(
>>>>>>>
>>>>>>> 0000C8 È 18 LATIN CAPITAL LETTER E WITH GRAVE
>>>>>>> 0000D8 Ø 20 LATIN CAPITAL LETTER O WITH STROKE
>>>>>>> 0000F2 ò 3 LATIN SMALL LETTER O WITH GRAVE
>>>>>>>
>>>>>>> The complete character frequency analysis is attached.
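A report in the same layout as the three lines above can be produced with a few lines of Python. This sketch counts only characters outside the Myanmar blocks (U+1000..U+109F plus Extended-A and Extended-B) and skips whitespace, so it is narrower than the complete frequency analysis mentioned above. The function names are illustrative.

```python
from collections import Counter
import unicodedata


def is_myanmar(ch):
    """True for the Myanmar block and its Extended-A/B blocks."""
    cp = ord(ch)
    return (
        0x1000 <= cp <= 0x109F      # Myanmar
        or 0xAA60 <= cp <= 0xAA7F   # Myanmar Extended-A
        or 0xA9E0 <= cp <= 0xA9FF   # Myanmar Extended-B
    )


def non_myanmar_frequencies(text):
    """Return 'CODEPOINT CHAR COUNT NAME' lines, sorted by code point."""
    counts = Counter(
        ch for ch in text if not is_myanmar(ch) and not ch.isspace()
    )
    return [
        f"{ord(ch):06X} {ch} {n} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch, n in sorted(counts.items())
    ]


if __name__ == "__main__":
    for line in non_myanmar_frequencies("\u1000\u00c8\u00c8 \u00d8"):
        print(line)
```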
>>>>>>>
>>>>>>> 2. A few apparent verse numbers are still present here and there.
>>>>>>> 3. The content contains section headings and parallel passage headings as well as verse text.
>>>>>>>
>>>>>>> I have just uploaded the file contents.pp.zip to a new folder in my Box account and added Cyrille & Michael as viewers.
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>> On Monday, May 13, 2019 9:19 AM, Cyrille <lafricain79 at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> I recently received a modern Myanmar translation of the NT, Psalms and
>>>>>>>> Proverbs, with permission to create a new module.
>>>>>>>> But there are many problems. The first is getting the text.
>>>>>>>> I tried different ways, but it was typeset with PageMaker!
>>>>>>>> I can get the text, but I don't get the verse numbers,
>>>>>>>> because they sit in a parallel column, and when I copy I get
>>>>>>>> only the biblical text.
>>>>>>>> I also have a PDF, but when I convert it to text (with pdftotext) the
>>>>>>>> columns are mixed up.
>>>>>>>> Can someone help me with any ideas?
>>>>>>>> The next problem is Unicode. The text is not typed in Unicode but uses
>>>>>>>> a special font.
>>>>>>>> I can send everything you need, or push it to git.crosswire.
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> sword-devel mailing list: sword-devel at crosswire.org <mailto:sword-devel at crosswire.org>
>>>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>> Instructions to unsubscribe/change your settings at above page