<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 15/05/2019 19:18, David Haslam
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:5bUI1WcpT8-iUOilbslt9pvipCFxmoiZtc0gyw2Yq0jMD-0LFr2rKTT51o8AY9FGoyX2nps90i8OyjAcpBSHcnM4qoDP7ytBAAzvd6zRgpU=@protonmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div>Each of the last 1 or 2 characters of each verse is a regular
        Myanmar punctuation mark.</div>
      <div><br>
      </div>
    </blockquote>
    Do you know which mark?<br>
    <blockquote type="cite"
cite="mid:5bUI1WcpT8-iUOilbslt9pvipCFxmoiZtc0gyw2Yq0jMD-0LFr2rKTT51o8AY9FGoyX2nps90i8OyjAcpBSHcnM4qoDP7ytBAAzvd6zRgpU=@protonmail.com">
      <div>We need to be careful how we apply this.  There may well be
        some exceptions.</div>
      <div><br>
      </div>
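      <div>One way to check for exceptions is to tally the characters that
        actually end each line of the extracted text. The following is a
        minimal sketch, assuming the text has already been split one verse
        per line; the function names <b>trailing_char_tally</b> and
        <b>describe</b> are hypothetical. Unicode names U+104A and U+104B as
        Myanmar punctuation signs, but which marks end the verses of this
        text is exactly what the tally is meant to show.</div>
      <pre>
```python
import unicodedata
from collections import Counter


def trailing_char_tally(lines, width=2):
    """Count the characters found in the last `width` positions of each line."""
    tally = Counter()
    for line in lines:
        stripped = line.rstrip()
        # Slicing is safe for short or empty lines.
        tally.update(stripped[-width:])
    return tally


def describe(tally):
    """Return (code point, Unicode name, count) rows, most frequent first."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"), count)
        for ch, count in tally.most_common()
    ]
```
      </pre>
      <div>Any row whose name is not a Myanmar punctuation sign points at a
        line that needs a closer look before the rule is applied.</div>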
      <div>Windows users should install BabelPad. This free Unicode text
        editor is highly recommended.</div>
      <div><br>
      </div>
      <div><a href="http://www.babelstone.co.uk/Software/BabelPad.html"
          moz-do-not-send="true">http://www.babelstone.co.uk/Software/BabelPad.html</a><br>
      </div>
      <div><br>
      </div>
      <div>It will help in all sorts of ways, not least in analysis.</div>
      <div><br>
      </div>
      <div>David</div>
      <div><br>
      </div>
      <div id="protonmail_mobile_signature_block">
        <div>Sent from ProtonMail Mobile</div>
      </div>
      <div><br>
      </div>
      <div><br>
      </div>
      On Wed, May 15, 2019 at 18:08, Cyrille &lt;<a
        href="mailto:lafricain79@gmail.com" class=""
        moz-do-not-send="true">lafricain79@gmail.com</a>&gt; wrote:
      <blockquote class="protonmail_quote" type="cite"> <span
          class="tlid-translation translation" lang="en"><span title="">I
            have not understood everything yet... but I trust you.</span>
          <span title="">If you are willing to explain it to me, I
            would like to learn :)</span><br>
          <span title="">What I don't understand is how you can find the
            marker for each verse and chapter in the UTF-8 text.</span> <span
            title="" class="">What is the marker in question?</span></span><br>
        <br>
        <div class="moz-cite-prefix">On 15/05/2019 19:03, David Haslam
          wrote:<br>
        </div>
        <blockquote type="cite">
          <div>Michael’s description matches how I imagined the method
            during my waking moments this morning. :)</div>
          <div><br>
          </div>
          <div>David</div>
          <div><br>
          </div>
          <div id="protonmail_mobile_signature_block">
            <div>Sent from ProtonMail Mobile</div>
          </div>
          <div><br>
          </div>
          <div><br>
          </div>
          On Wed, May 15, 2019 at 17:33, Michael H &lt;<a
            href="mailto:cmahte@gmail.com" class=""
            moz-do-not-send="true">cmahte@gmail.com</a>&gt; wrote:
          <blockquote class="protonmail_quote" type="cite">
            <div dir="ltr">
              <div class="gmail_default"
                style="font-family:garamond,serif;font-size:large">I've
                been working long hours and emailing in my break time. 
                David has the basics of converting to VPL.  <br>
                <br>
                I would then make the entire work a column in a
                spreadsheet. <br>
                <br>
                Then in other columns insert a list of
                Book/chapter/verse in order. <br>
                <br>
                The BCV and verse-text columns should align; this can be
                verified and adjusted wherever things don't match
                perfectly, for example if 3 John has 15 verses instead
                of 14. <br>
                <br>
                Once the columns align, you can merge them into another
                column via concatenation operations (&amp;).  This last
                column becomes your output. <br>
                <br>
                The output needs to take into account that section titles and
                section ranges belong in front of the verse marker. That
                takes a somewhat more complex search and replace, but it
                can be done successfully. </div>
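              <div>The column alignment described above can also be sketched
                in a few lines of Python instead of a spreadsheet. This is
                only an illustration under stated assumptions: the names
                <b>expand_references</b> and <b>build_vpl</b> are
                hypothetical, the per-chapter verse counts have to be
                supplied by hand, and the "Book C:V text" line layout is an
                assumed VPL convention, not something confirmed in this
                thread.</div>
              <pre>
```python
def expand_references(book, verses_per_chapter):
    """List (book, chapter, verse) tuples in order from per-chapter verse counts."""
    return [
        (book, chapter, verse)
        for chapter, count in enumerate(verses_per_chapter, start=1)
        for verse in range(1, count + 1)
    ]


def build_vpl(references, verses):
    """Pair each reference with its verse text, one output line per verse."""
    # Refuse to merge when the two columns do not line up.
    if len(references) != len(verses):
        raise ValueError(
            f"{len(references)} references but {len(verses)} verse lines; "
            "realign before merging"
        )
    return [
        f"{book} {chapter}:{verse} {text}"
        for (book, chapter, verse), text in zip(references, verses)
    ]
```
              </pre>
              <div>The length check plays the role of the visual
                verification step: a mismatch means the versification list
                or the verse splitting needs adjusting first. Section titles
                are not handled here.</div>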
              <div class="gmail_default"
                style="font-family:garamond,serif;font-size:large"><br>
              </div>
              <div class="gmail_default"
                style="font-family:garamond,serif;font-size:large"><br>
              </div>
            </div>
            <br>
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">On Wed, May 15, 2019 at
                11:12 AM David Haslam &lt;<a
                  href="mailto:dfhdfh@protonmail.com"
                  moz-do-not-send="true">dfhdfh@protonmail.com</a>&gt;
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px
                0px&#xA; 0.8ex;border-left:1px solid&#xA;
                rgb(204,204,204);padding-left:1ex">
                <div>The attachment contains a counted list of Myanmar
                  words containing a font conversion error.<br>
                  <i>NB. We need to match these words with what they are
                    in the legacy font.</i></div>
                <div><br>
                </div>
                <div>
                  <div>This issue should be discussed with the current
                    maintainer of the SIL <b>TECkit</b> converter,
                    whoever that may be.<br>
                  </div>
                  <div><br>
                  </div>
                  <div>It may be worthwhile asking our friends at the
                    SIL <b>Writing Systems Technology</b> team. See</div>
                  <a href="https://scripts.sil.org/default"
                    moz-do-not-send="true">https://scripts.sil.org/default</a><br>
                  <br>
                  <i>Aside: My friend Martin Hosken of SIL knew the late
                    Keith Stribley - the former webmaster of
                    ThanLwinSoft.</i></div>
                <div><br>
                </div>
                <div
                  class="gmail-m_-4120532262546096157protonmail_signature_block">
                  <div
                    class="gmail-m_-4120532262546096157protonmail_signature_block-user">
                    <div>Best regards,<br>
                    </div>
                    <div><br>
                    </div>
                    <div>David<br>
                    </div>
                  </div>
                  <div><br>
                  </div>
                  <div
                    class="gmail-m_-4120532262546096157protonmail_signature_block-proton">Sent
                    with <a href="https://protonmail.com"
                      moz-do-not-send="true">ProtonMail</a> Secure
                    Email.<br>
                  </div>
                </div>
                <div><br>
                </div>
                <div>‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐<br>
                </div>
                <div> On Wednesday, May 15, 2019 4:41 PM, David Haslam
                  &lt;<a href="mailto:dfhdfh@protonmail.com"
                    moz-do-not-send="true">dfhdfh@protonmail.com</a>&gt;
                  wrote:<br>
                </div>
                <div> <br>
                </div>
                <blockquote
                  class="gmail-m_-4120532262546096157protonmail_quote"
                  type="cite">
                  <div><u><b>Observations</b>: (continued)</u><br>
                  </div>
                  <div><br>
                  </div>
                  <div>
                    <div>5. The string "<b>Kd;</b>" also looks
                      anomalous. It's found only once in <br>
                    </div>
                    <div>ကိုယ်တော်၏ဦးခေါင်းတော်အပေါ်၌ လည်း ဤသူသည်ကား
                      ဂျူးလူမျ Kd;တို့၏ဘုရင်၊<br>
                    </div>
                  </div>
                  <div><br>
                  </div>
                  <div>
                    <div>6. It's evident from the PDF file that the text
                      is paragraphed with indented first lines. See <br>
                    </div>
                    <div><a
href="https://www.dropbox.com/s/do5e675i19xfomf/Screenshot%202019-05-15%2016.29.10.png?dl=0"
                        moz-do-not-send="true">https://www.dropbox.com/s/do5e675i19xfomf/Screenshot%202019-05-15%2016.29.10.png?dl=0</a><br>
                    </div>
                    <div><br>
                    </div>
                    <div>My hunch is that these leading paragraph
                      indents may have been coded within contents.xml as
                      the self-closing element <b>&lt;text:tab/&gt;</b>. <span
                        style="font-family:arial,sans-serif">There are
                        372 matches to this.<br>
                        <br>
                        So not only do we need to provide chapter and
                        verse tags (plus section headings &amp; parallel
                        passage titles, etc), we also need to
                        reconstruct all the paragraph tags.<br>
                        <br>
                        <i>NB. All structural XML indents were removed
                          by the filter "Remove blanks at SOL" in the
                          file </i><b><i>contents.pp.txt</i></b><i> that
                          was output by my simple TextPipe filter. So
                          that's quite a different matter.</i></span></div>
                    <div><br>
                    </div>
                  </div>
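                  <div>If the hunch about the tab element is right, the
                    paragraph starts could be recovered while the XML is
                    still intact. The sketch below is one possible approach,
                    not the method used in this thread: it assumes the
                    standard ODF text namespace, and the names
                    <b>count_tabs</b>, <b>mark_indented_paragraphs</b> and
                    the "¶ " marker are hypothetical placeholders.</div>
                  <pre>
```python
import xml.etree.ElementTree as ET

# Standard ODF namespace for text:p, text:tab and related elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TAB = f"{{{TEXT_NS}}}tab"
PARA = f"{{{TEXT_NS}}}p"


def count_tabs(xml_bytes):
    """Count every text:tab element in the document."""
    return sum(1 for _ in ET.fromstring(xml_bytes).iter(TAB))


def mark_indented_paragraphs(xml_bytes, marker="¶ "):
    """Return paragraph texts, prefixing those that begin with a tab element."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for para in root.iter(PARA):
        text = "".join(para.itertext()).strip()
        if not text:
            continue
        # A leading indent means: first child is a tab with no text before it.
        starts_with_tab = (
            len(para) > 0
            and para[0].tag == TAB
            and not (para.text or "").strip()
        )
        lines.append(marker + text if starts_with_tab else text)
    return lines
```
                  </pre>
                  <div>Comparing the result of count_tabs with the number of
                    marked paragraphs would show how many of the tab matches
                    are really leading indents.</div>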
                  <div
                    class="gmail-m_-4120532262546096157protonmail_signature_block">
                    <div
                      class="gmail-m_-4120532262546096157protonmail_signature_block-user">
                      <div>Best regards,<br>
                      </div>
                      <div><br>
                      </div>
                      <div>David<br>
                      </div>
                    </div>
                    <div><br>
                    </div>
                    <div
                      class="gmail-m_-4120532262546096157protonmail_signature_block-proton">Sent
                      with <a href="https://protonmail.com"
                        moz-do-not-send="true">ProtonMail</a> Secure
                      Email.<br>
                    </div>
                  </div>
                  <div><br>
                  </div>
                  <div>‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐<br>
                  </div>
                  <div>On Wednesday, May 15, 2019 2:22 PM, David Haslam
                    &lt;<a href="mailto:dfhdfh@protonmail.com"
                      moz-do-not-send="true">dfhdfh@protonmail.com</a>&gt;
                    wrote:<br>
                  </div>
                  <div><br>
                  </div>
                  <blockquote type="cite"
                    class="gmail-m_-4120532262546096157protonmail_quote">
                    <div><u><b>Observations</b>: (continued)</u><br>
                    </div>
                    <div><br>
                    </div>
                    <div>
                      <div>4. In addition to the reported instances of
                        the anomalous 3 characters (<b>È,Ø,ò</b>) found
                        after the font conversion,<br>
                      </div>
                      <div>there are 6 instances of the string "<b>m;</b>"
                        that are also probably due to bugs in the
                        converter.<br>
                      </div>
                      <div><br>
                      </div>
                    </div>
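                    <div>Both kinds of residue can be listed automatically.
                      The sketch below is an assumption-laden illustration,
                      not the tool used for the attached analysis: the
                      function names are hypothetical, only the main Myanmar
                      block (U+1000 to U+109F) is treated as Myanmar, and the
                      ASCII pattern simply looks for letters followed by a
                      semicolon.</div>
                    <pre>
```python
import re
import unicodedata
from collections import Counter


def is_myanmar(char):
    """True for code points in the main Myanmar block (U+1000 to U+109F)."""
    return ord(char) in range(0x1000, 0x10A0)


def residue_report(text):
    """List non-ASCII, non-Myanmar characters as (hex, char, count, name) rows."""
    counts = Counter(
        ch for ch in text
        if not ch.isspace() and not ch.isascii() and not is_myanmar(ch)
    )
    return [
        (f"{ord(ch):06X}", ch, count, unicodedata.name(ch, "UNNAMED"))
        for ch, count in sorted(counts.items())
    ]


def ascii_residue_tokens(text):
    """Count ASCII letter runs that end in a semicolon."""
    return Counter(re.findall(r"[A-Za-z]+;", text))
```
                    </pre>
                    <div>Characters from the Myanmar Extended blocks would
                      also be reported by residue_report, so its output still
                      needs a human check.</div>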
                    <div
                      class="gmail-m_-4120532262546096157protonmail_signature_block">
                      <div
                        class="gmail-m_-4120532262546096157protonmail_signature_block-user">
                        <div>Best regards,<br>
                        </div>
                        <div><br>
                        </div>
                        <div>David<br>
                        </div>
                      </div>
                      <div><br>
                      </div>
                      <div
                        class="gmail-m_-4120532262546096157protonmail_signature_block-proton">Sent
                        with <a href="https://protonmail.com"
                          moz-do-not-send="true">ProtonMail</a> Secure
                        Email.<br>
                      </div>
                    </div>
                    <div><br>
                    </div>
                    <div>‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐<br>
                    </div>
                    <div>On Wednesday, May 15, 2019 12:41 PM, David
                      Haslam &lt;<a href="mailto:dfhdfh@protonmail.com"
                        moz-do-not-send="true">dfhdfh@protonmail.com</a>&gt;
                      wrote:<br>
                    </div>
                    <div><br>
                    </div>
                    <blockquote
                      class="gmail-m_-4120532262546096157protonmail_quote"
                      type="cite">
                      <div>Yep - sure - later I can do that. <br>
                      </div>
                      <div><br>
                      </div>
                      <div>David<br>
                      </div>
                      <div><br>
                      </div>
                      <div
                        id="gmail-m_-4120532262546096157protonmail_mobile_signature_block">
                        <div>Sent from ProtonMail Mobile<br>
                        </div>
                      </div>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                      <div>On Wed, May 15, 2019 at 11:26, Cyrille &lt;<a
                          href="mailto:lafricain79@gmail.com"
                          moz-do-not-send="true">lafricain79@gmail.com</a>&gt;
                        wrote:<br>
                      </div>
                      <blockquote type="cite"
                        class="gmail-m_-4120532262546096157protonmail_quote">
                        <div>David, I have no Box account, and I don't
                          want to create one. Can you upload it to <a
                            href="https://framadrop.org/"
                            moz-do-not-send="true">https://framadrop.org/</a>
                          instead? It's totally free and secure (and private).<br>
                        </div>
                        <div>Thank  you.<br>
                        </div>
                        <div><br>
                        </div>
                        <div><br>
                        </div>
                        <div
                          class="gmail-m_-4120532262546096157moz-cite-prefix">On
                          15/05/2019 11:46, David Haslam wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <pre>Interim progress report.

I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped the contents to Mat_utf8-odt

I opened the .odt file using 7-Zip from the Windows Explorer context menu, and extracted the file contents.xml

I used Notepad++ plug-in XMLTools to pretty print the XML file and saved it as contents.pp.xml
This is simply a layout change that's easier to read.

I viewed the .pp.xml file in BabelPad, which confirmed that the non-XML text was (mostly) Myanmar Unicode.

I used a TextPipe filter to remove all XML tags, blanks from SOL &amp; EOL and all blank lines.
The output file is now contents.pp.txt

This is now something that's readable content in Myanmar Unicode, with some English text such as "The Gospel according Matthew" near the start.

The file is best viewed using BabelPad with the option Display Colours | Colour Code by Script.
This shows Myanmar characters in light green, and non-Myanmar characters in other colours.

Observations:
1. The font conversion to Unicode left a few scattered characters unconverted. :(

0000C8        È        18        LATIN CAPITAL LETTER E WITH GRAVE
0000D8        Ø        20        LATIN CAPITAL LETTER O WITH STROKE
0000F2        ò        3        LATIN SMALL LETTER O WITH GRAVE

The complete character frequency analysis is attached.

2. A few verse numbers? are still present here and there.
3. The content contains section headings and parallel passage headings as well as verse text.

I have just uploaded the file contents.pp.zip to a new folder in my Box account and added Cyrille &amp; Michael as viewers.


Best regards,

David

Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, May 13, 2019 9:19 AM, Cyrille <a href="mailto:lafricain79@gmail.com" moz-do-not-send="true">&lt;lafricain79@gmail.com&gt;</a> wrote:


</pre>
                          <blockquote type="cite">
                            <pre>Hello,
I recently received a modern Myanmar translation of the NT, Psalms and
Proverbs, with permission to create a new module.
But there are many problems... The first is getting the text.
I tried different approaches, but it was done with PageMaker!
I can get the text, but I don't have the verse numbers,
because they sit in a parallel column, and when I copy I get
only the biblical text.
I also have a PDF, but when I convert it to text (with pdftotext) the
columns are mixed together.
Can someone help me with any ideas?
The next problem is Unicode... The text is not typed in Unicode but uses
a special font.
I can send everything you need or push it to git.crosswire.

Thanks for help.

sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org" moz-do-not-send="true">sword-devel@crosswire.org</a>
<a href="http://www.crosswire.org/mailman/listinfo/sword-devel" moz-do-not-send="true">http://www.crosswire.org/mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page

</pre>
                          </blockquote>
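                          <div>The extraction steps in the interim progress
                            report above can be approximated with the Python
                            standard library. This is a rough sketch that
                            swaps in an XML parser for the 7-Zip, Notepad++
                            and TextPipe steps, so its output will not match
                            contents.pp.txt line for line. The name
                            <b>odt_text_lines</b> and its <b>member</b>
                            parameter are hypothetical; the default of
                            content.xml is the usual name of that member in
                            an ODF package and is an assumption about this
                            particular file.</div>
                          <pre>
```python
import zipfile
import xml.etree.ElementTree as ET


def odt_text_lines(odt_path, member="content.xml"):
    """Return the non-blank text of each paragraph or heading in an .odt file."""
    # An .odt file is a zip archive; the document body is one XML member.
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read(member))
    lines = []
    for element in root.iter():
        # Drop the namespace part of the tag, keeping only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("p", "h"):
            text = "".join(element.itertext()).strip()
            if text:
                lines.append(text)
    return lines
```
                          </pre>
                          <div>Paragraphs nested inside other paragraphs (for
                            example in text boxes) would be emitted twice by
                            this sketch, so the result should be compared
                            against the PDF.</div>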
                          <div><br>
                          </div>
                        </blockquote>
                      </blockquote>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                  </blockquote>
                  <div><br>
                  </div>
                </blockquote>
                <div><br>
                </div>
                </blockquote>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div><br>
          </div>
          <br>
        </blockquote>
        <br>
      </blockquote>
      <div><br>
      </div>
      <div><br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>