<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 14/05/2019 22:26, David Haslam
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:9JK6fdu-uy4_G3T4aCrM581jrnnd4GAExQ3bJV_Ayu8AovpRa_U-GG8XldgmdXx_s40UpA3rzrBCADGwZgvOkt0NhZxkHdKeCO9QeGNsT14=@protonmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div>If Michael’s observations are anything to go by, then maybe I
        can script the recovery of chapter &amp; verse tags. </div>
      <div><br>
      </div>
      <div>We shall see ....</div>
      <div><br>
      </div>
      <div>Even if I’m not immediately successful - valuable lessons can
        be learned in the attempt. <br>
      </div>
    </blockquote>
    Very well, I'll wait for you ;)<br>
    <blockquote type="cite"
cite="mid:9JK6fdu-uy4_G3T4aCrM581jrnnd4GAExQ3bJV_Ayu8AovpRa_U-GG8XldgmdXx_s40UpA3rzrBCADGwZgvOkt0NhZxkHdKeCO9QeGNsT14=@protonmail.com">
      <div><br>
      </div>
      <div>David</div>
      <div><br>
      </div>
      <div id="protonmail_mobile_signature_block">
        <div>Sent from ProtonMail Mobile</div>
      </div>
      <div><br>
      </div>
      <div><br>
      </div>
      On Tue, May 14, 2019 at 21:21, Cyrille &lt;<a
        href="mailto:lafricain79@gmail.com" class=""
        moz-do-not-send="true">lafricain79@gmail.com</a>&gt; wrote:
      <blockquote class="protonmail_quote" type="cite"> OK, thank you! I
        already have all the text in Unicode, but without the verse
        and chapter numbers... I have begun adding them manually...<br>
        <br>
        <div class="moz-cite-prefix">On 14/05/2019 22:17, David Haslam
          wrote:<br>
        </div>
        <blockquote type="cite">
          <div>Hi Cyrille </div>
          <div><br>
          </div>
          <div>If I can find the time tomorrow or later, I’ll have a
            look at what might be feasible. </div>
          <div><br>
          </div>
          <div>Thanks for all these useful links. </div>
          <div><br>
          </div>
          <div>David</div>
          <div><br>
          </div>
          <div><br>
          </div>
          <div><br>
          </div>
          On Tue, May 14, 2019 at 14:08, Cyrille &lt;<a
            href="mailto:lafricain79@gmail.com" class=""
            moz-do-not-send="true">lafricain79@gmail.com</a>&gt; wrote:
          <blockquote class="protonmail_quote" type="cite"> I am sending my
            message again because the first one was too big.<br>
            <br>
            The conversion to UTF-8 is 99% solved!! I used an online
            converter:<br>
            <a class="moz-txt-link-freetext"
href="https://thanlwinsoft.github.io/www.thanlwinsoft.org/ThanLwinSoft/MyanmarUnicode/Conversion/myanmarConverter.html"
              moz-do-not-send="true">https://thanlwinsoft.github.io/www.thanlwinsoft.org/ThanLwinSoft/MyanmarUnicode/Conversion/myanmarConverter.html</a><br>
            or:<br>
            <a class="moz-txt-link-freetext"
              href="http://burglish.my-mm.org/latest/trunk/web/fontconv.htm"
              moz-do-not-send="true">http://burglish.my-mm.org/latest/trunk/web/fontconv.htm</a><br>
            <br>
            See the result <a
href="https://framadrop.org/r/jKnYnvuQIH#mE+FWcvzD1N/Omnfr7uWMZmI/HZUUVPdvnVVkBFyFrA="
              moz-do-not-send="true">here</a>.<br>
            <br>
            Now the only problem is how to get the verse and chapter
            numbers... <br>
            <br>
            <br>
            <div class="moz-cite-prefix">On 14/05/2019 13:53, Michael H
              wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div dir="ltr">
                  <div dir="ltr">
                    <div class="gmail_default"><font size="4"
                        face="garamond,&#xA; serif">Cyrille, (Peter), <br>
                        <br>
                        Maybe further discussion on this belongs in
                        Gitlab as issues.  Can I get added to this
                        project? <br>
                        <br>
                        Here are the first few lines of Matthew copied
                        from the PDF: </font><br>
                      ------<br>
                      <div class="gmail_default"
                        style="font-family:garamond,serif;font-size:large">&amp;Sifrmaw;OD;
                        {0Ha*vdusrf;</div>
                      <div class="gmail_default"
                        style="font-family:garamond,serif;font-size:large">The
                        Gospel According to Matthew</div>
                      <div class="gmail_default"
                        style="font-family:garamond,serif;font-size:large">ed'gef;</div>
                      <div class="gmail_default"
                        style="font-family:garamond,serif;font-size:large">usr;f
                        ûyy*k Kd¾v f &amp;iS rf maw;O;D \b0rwS wf r;f</div>
                      <div class="gmail_default"
                        style="font-family:garamond,serif;font-size:large">usr;f
                        ûyy*k Kd¾v f &amp;iS rf maw;O;Don f *gavav;,e,rf
                        S*sL;vrl sK;d tmvaf z;O;D \om;jzp\f / (rmu
                        k2;14)</div>
                      <div class="gmail_default"
                        style="font-family:garamond,serif;font-size:large">olonf
                        tcGefcHoltjzpf trIxrf;chJonf/ (vk 5;27)
                        a,Zl;ocif\aemufvdkufwynfhrjzpfrD ol\trnfrSm</div>
                      <div class="gmail_default"
                        style="font-family:garamond,serif;font-size:large">av0djzp\f
                        / ool n f wad b;&amp;,d tidk tf e;DwGi f
                        a,Z;lociEf iS ahf wG U Ny;D<br>
                        <br>
                      </div>
                      <div class="gmail_default"
                        style="font-family:garamond,serif;font-size:large">-----</div>
                      <div class="gmail_default"><font size="4"
                          face="garamond,&#xA; serif">And here are the
                          first few lines of Matthew copied from the
                          Pagemaker file: </font></div>
                      <div class="gmail_default"><font size="4"
                          face="garamond,&#xA; serif">-----<br>
                        </font>
                        <div class="gmail_default"><font size="4"
                            face="garamond, serif">Sifrmaw;OD;
                            {0Ha*vdusrf;</font></div>
                        <div class="gmail_default"><font size="4"
                            face="garamond, serif">The Gospel According
                            to Matthew</font></div>
                        <div class="gmail_default"><span
                            style="font-family:garamond,serif;font-size:large">ed'gef;</span><br>
                        </div>
                        <div class="gmail_default"><span
                            style="font-family:garamond,serif;font-size:large">usrf;�yyk*�dKvf 
                            &amp;Sifrmaw;OD;\b0rSwfwrf;  </span><br>
                        </div>
                        <div class="gmail_default"><span
                            style="font-family:garamond,serif;font-size:large">usrf;�yyk*�dKvf 
                            &amp;Sifrmaw;OD;onf  *gavav;,e,frS
                            *sL;vlrsKd; tmvfaz;OD;\om;jzpf\/ (rmuk 2;14)
                            olonf  tcGefcHoltjzpf trIxrf;chJonf/ (vk
                            5;27) a,Zl;ocif\aemufvdkufwynfhrjzpfrD 
                            ol\trnfrSm av0djzpf\/ olonf 
                            wdab;&amp;d,tkdifteD;wGif 
                            a,Zl;ocifESifhawGU  NyD;<br>
                            <br>
                            <br>
                            You can see that some letters have changed,
                            and some others are in a different order. <br>
                            <br>
                          </span><span
                            style="font-family:garamond,serif;font-size:large">The
                            letters that change are likely those points
                            that aren't compatible with unicode, and
                            pagemaker reassigned them to ensure that the
                            file is more widely viewable. Since a
                            conversion is already planned, these won't
                            matter as much, but the font embedded in the
                            PDF is different than the font attached to
                            the pagemaker file. If you do start from
                            the PDF, you'll need to extract the font to
                            get the code points. </span><br
                            style="font-family:garamond,serif;font-size:large">
                          <span
                            style="font-family:garamond,serif;font-size:large"><br>
                            The problem is that the PDF export from
                            pagemaker sorts the letters into the order
                            they appear on the page.  Burmese text has
                            Indian style ligatures, where vowels tend to
                            jump over or under the previous letters,
                            sometimes back two or three letters. If you
                            study the snippets from the beginning of
                            Matthew above, you can see that the order
                            differs and that some glyphs are
                            modified. <br>
                            <br>
                            So, from the PDF letters are out of order,
                            but from Pagemaker, letters are encoded into
                            control points. Fixing the control points is
                            easy and happens with the unicode
                            conversion.  Fixing the letter order is not
                            easy. You'll need a first language speaker
                            and plenty of time. </span></div>
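                        <div class="gmail_default">To make the ordering problem concrete, here is a minimal Python sketch of one single case only. It assumes the text has already been converted to Unicode code points but is still in visual order, so the vowel sign E (U+1031) sits before the consonant it belongs to instead of after it. The function name <code>visual_to_logical</code> is made up for illustration, and real Burmese reordering has many more cases than this one, which is why a first language speaker is still needed.</div>
<pre>
```python
import re

# U+1031 is MYANMAR VOWEL SIGN E. It is drawn to the left of its consonant,
# but Unicode stores it after the consonant and any medial signs.
# U+1000..U+1021 are the consonants, U+103B..U+103E are the medials.
# Assumption: input is Unicode text that is still in visual (on-page) order.
VISUAL_E = re.compile("\u1031([\u1000-\u1021][\u103B-\u103E]*)")


def visual_to_logical(text):
    """Move each leading vowel sign E behind its consonant and medials."""
    return VISUAL_E.sub("\\1\u1031", text)


if __name__ == "__main__":
    sample = "\u1031\u1000"  # E sign followed by KA, as seen on the page
    fixed = visual_to_logical(sample)
    print([hex(ord(ch)) for ch in fixed])
```
</pre>
                        <div class="gmail_default">This handles only the simplest pattern. Stacked consonants, kinzi and vowels that jump back more than one letter would each need their own rule.</div>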
                        <div class="gmail_default"><span
                            style="font-family:garamond,serif;font-size:large"><br>
                            The guidance I received on another group was
                            to use either LO Draw or Indesign to export
                            the text from Pagemaker.  I'll look into LO
                            Draw again, but I don't have access to an
                            older version of Indesign (the pagemaker
                            import was removed in CS6). </span><span
                            style="font-family:garamond,serif;font-size:large"><br>
                          </span></div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
              <div dir="ltr">
                <div class="gmail_default"
                  style="font-family:garamond,serif;font-size:large"><br>
                </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Mon, May 13, 2019
                  at 10:40 AM Michael H &lt;<a
                    href="mailto:cmahte@gmail.com"
                    moz-do-not-send="true">cmahte@gmail.com</a>&gt;
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px
                  0px&#xA; 0px&#xA; 0.8ex;border-left:1px solid&#xA;
                  rgb(204,204,204);padding-left:1ex">
                  <div dir="ltr">
                    <div class="gmail_default"
                      style="font-family:garamond,serif;font-size:large">I
                      unzipped the pagemaker file, and when I open
                      NT_Proverb/Pagemaker (10.1mb), with a Hex editor,
                      I can 'find' all of the book names, and see the
                      text there.  <br>
                      <br>
                      To see the raw text: rename NT_Proverb.pmd &gt;
                      NT_Proverb.zip and open it with a zip archive
                      program.  The text is in the Pagemaker file at
                      the top level of the archive, but encoded with a
                      lot of extraneous information.  (The English text
                      "Matthew" appears at hex location 7A76972). <br>
                      <br>
                      When I open the fonts with fontforge, Fontforge
                      suggests the fonts are encoded as unicode (but the
                      glyphs are obviously not in the right spot.) <br>
                      However, when I copy the text (I copied from LO
                      Draw), paste it into jedit and save that as
                      unicode, reopening the file gives the warning 'not
                      unicode, text may be missing'. <br>
                      <br>
                      So, what this means is that there are some glyphs
                      encoded into locations that unicode treats as
                      control or non-printing codes. The text needs to
                      be dealt with as a specific encoding that matches
                      whatever the original font actually uses. I
                      haven't figured out what the original text files
                      were encoded with. Without that knowledge, I'm not
                      sure my system clipboard or editor (jedit) will
                      properly respect the glyphs in unusual locations
                      until the conversion to unicode, and I don't trust
                      myself to be able to detect if it is or is not
                      properly converted. <br>
                    </div>
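                    <div class="gmail_default"
                      style="font-family:garamond,serif;font-size:large">The hex editor search described above can also be scripted. This is only a sketch: the helper names <code>find_all</code> and <code>hex_offsets</code> are hypothetical, and the path passed in is whatever file was extracted from the archive.</div>
<pre>
```python
def find_all(data, needle):
    """Yield every byte offset at which needle occurs in data."""
    start = data.find(needle)
    while start != -1:
        yield start
        start = data.find(needle, start + 1)


def hex_offsets(path, needle=b"Matthew"):
    """Return the offsets of needle in the file at path, as hex strings."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [format(pos, "X") for pos in find_all(data, needle)]


if __name__ == "__main__":
    demo = b"..Matthew..Mark..Matthew"
    print([format(pos, "X") for pos in find_all(demo, b"Matthew")])
```
</pre>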
                  </div>
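                  <div>One way to check for glyphs sitting in control or non-printing locations, without relying on the clipboard or the editor, is to list them with Python's standard <code>unicodedata</code> module. A minimal sketch, assuming the copied text has been saved and read back as a string; the function name <code>suspicious_code_points</code> is invented for this example.</div>
<pre>
```python
import unicodedata

# General categories treated as suspicious here:
# Cc = control, Cf = format, Co = private use, Cn = unassigned.
SUSPICIOUS = ("Cc", "Cf", "Co", "Cn")


def suspicious_code_points(text):
    """Return (index, code point, category) for each suspicious character.

    Ordinary line breaks and tabs are ignored.
    """
    hits = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category in SUSPICIOUS and ch not in "\n\r\t":
            hits.append((index, "U+%04X" % ord(ch), category))
    return hits


if __name__ == "__main__":
    for hit in suspicious_code_points("ab\x85c\n"):
        print(hit)
```
</pre>
                  <div>An empty result does not prove the conversion is right. It only shows that no character landed in one of these categories.</div>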
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Mon, May 13,
                      2019 at 10:11 AM Cyrille &lt;<a
                        href="mailto:lafricain79@gmail.com"
                        moz-do-not-send="true">lafricain79@gmail.com</a>&gt;
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px&#xA; 0px&#xA; 0.8ex;border-left:1px
                      solid&#xA;&#xA; rgb(204,204,204);padding-left:1ex">
                      <div bgcolor="#FFFFFF"> David,<br>
                        You are probably right about <a
href="http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&amp;cat_id=TECkit"
                          moz-do-not-send="true">TECkit</a>: once we get
                        the text, it will help us convert it to Unicode.<br>
                        As for how to get the text, your method is beyond
                        my skills :)<br>
                        If you succeed, please let me know.<br>
                        <br>
                        <div
class="gmail-m_3757925966681618317gmail-m_-6550991463107192144gmail-m_-2496802141858019636moz-cite-prefix">On
                          13/05/2019 16:21, David Haslam wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div>Given the insights from Michael Hart, it
                            may be feasible to temporarily rearrange the
                            main text stream as follows :</div>
                          <div><br>
                          </div>
                          <div>1. Replace every EOL by a horizontal
                            tab. </div>
                          <div>2. Insert an EOL after each verse end
                            character. </div>
                          <div><br>
                          </div>
                          <div>Observe that the above two steps are
                            wholly reversible such that the original
                            text stream can be restored later. </div>
                          <div><br>
                          </div>
                          <div>In effect the text stream is now in verse
                            per line (VPL) layout, albeit without verse
                            tags. Some adjustments may be necessary if
                            there are any section headings, etc. </div>
                          <div><br>
                          </div>
                          <div>3. Add line numbers with the first number
                            being reset to 1 at the start of each
                            chapter, numbers incrementing by 1 for each
                            line. </div>
                          <div>4. Add a left margin USFM verse tag \v_<br>
                          </div>
                          <div><br>
                          </div>
                          <div
id="gmail-m_3757925966681618317gmail-m_-6550991463107192144gmail-m_-2496802141858019636protonmail_mobile_signature_block">
                            <div>Steps 3&amp;4 can be implemented in
                              various ways. For my part, I’d use a
                              bespoke TextPipe filter. </div>
                            <div><br>
                            </div>
                            <div>Another method to consider might be to
                              use Excel formulae. I recall resorting to
                              such a method in the early days of Go
                              Bible. </div>
                            <div><br>
                            </div>
                            <div>Now restore the original layout by
                              reverting steps 2 &amp; 1, if this is
                              really necessary. That is, if the original
                              text layout appeared to be paragraphed. </div>
                            <div><br>
                            </div>
                            <div>5. Decide how &amp; where to insert
                              paragraph tags. </div>
                            <div><br>
                            </div>
                            <div>6. Add chapter tags, book ID and main
                              title tags, etc. </div>
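                            <div>A rough Python sketch of steps 1 to 4 above, offered as one possible implementation rather than the TextPipe filter itself. It assumes that "/" is the verse end character, that the stream contains no tabs to begin with, and that chapter starts have already been marked by hand with a line beginning with the USFM chapter tag. All function names are invented for this example.</div>
<pre>
```python
# Assumption: "/" closes every verse in the legacy-encoded stream.
VERSE_END = "/"


def to_vpl(text, verse_end=VERSE_END):
    """Steps 1 and 2: one verse per line, original line breaks kept as tabs."""
    if "\t" in text:
        raise ValueError("text contains tabs, so step 1 would not be reversible")
    flat = text.replace("\n", "\t")                    # step 1
    return flat.replace(verse_end, verse_end + "\n")   # step 2


def from_vpl(vpl):
    """Revert steps 2 and 1 to restore the original line layout."""
    return vpl.replace("\n", "").replace("\t", "\n")


def tag_verses(vpl_lines, chapter_marker="\\c "):
    """Steps 3 and 4: prefix a USFM verse tag, restarting at 1 per chapter.

    Assumption: a line starting with chapter_marker opens each chapter.
    """
    tagged = []
    verse = 0
    for line in vpl_lines:
        if line.startswith(chapter_marker):
            verse = 0
            tagged.append(line)
            continue
        verse += 1
        tagged.append("\\v " + str(verse) + " " + line)
    return tagged


if __name__ == "__main__":
    stream = "first verse/ second\nverse/"
    for line in tag_verses(to_vpl(stream).splitlines()):
        print(line)
```
</pre>
                            <div>Tagging has to happen on the VPL lines, before any revert. Section headings and the book introduction would still need handling by hand, since they also end up on numbered lines.</div>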
                            <div><br>
                            </div>
                            <div>Hope this gives some useful suggestions
                              that point towards a practical solution. </div>
                            <div><br>
                            </div>
                            <div>Best regards </div>
                            <div><br>
                            </div>
                            <div>David</div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                          </div>
                          <div><br>
                          </div>
                          <div><br>
                          </div>
                          On Mon, May 13, 2019 at 14:57, Michael H &lt;<a
                            href="mailto:cmahte@gmail.com"
                            moz-do-not-send="true">cmahte@gmail.com</a>&gt;
                          wrote:
                          <blockquote
class="gmail-m_3757925966681618317gmail-m_-6550991463107192144gmail-m_-2496802141858019636protonmail_quote"
                            type="cite">
                            <div dir="ltr">
                              <div dir="ltr">
                                <div dir="ltr">
                                  <div dir="ltr">
                                    <div class="gmail_default"
                                      style="font-family:garamond,serif;font-size:large">Cyrille<br>
                                      <br>
                                      LibreOffice Draw attempts to open
                                      the pagemaker file, with limited
                                      success. But it confirms that even
                                      in the pagemaker source, the verse
                                      numbers are a separate text
                                      stream. With this source, there is
                                      no way to copy the text with verse
                                      numbers intact. Each book appears
                                      to be stored in its own text
                                      stream in the Pagemaker file. LO
                                      Draw isn't rendering all of the
                                      pages, only the first 10, so I've
                                      only explored Matthew further. <br>
                                      <br>
                                      Based on Matthew only, the verses
                                      seem to all end with the character
                                      "-" or ";/", which should aid in
                                      the reconstruction. I've looked
                                      through the PDF and this seems to
                                      be the case for all books visually
                                      as well. However, this isn't
                                      perfect: I find 1107 of these
                                      characters in Matthew, instead of
                                      the expected 1071 verses.  But
                                      since the text stream has a book
                                      introduction, this is likely
                                      easily explained. Hopefully this
                                      gets you well down the path to
                                      creating a stream with verses. <br>
                                      <br>
                                      I would NOT start from the PDF
                                      file, but from the pagemaker
                                      file.  The PDF almost certainly
                                      has a lot of text rearranging and
                                      extra characters like page numbers
                                      and running heads.  Pagemaker has
                                      the book text in a single stream,
                                      in a form that will convert to
                                      unicode relatively easily. </div>
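                                    <div class="gmail_default"
                                      style="font-family:garamond,serif;font-size:large">The terminator count mentioned above (1107 found against 1071 expected, a surplus of 36) is easy to repeat for other books. A minimal sketch, assuming the book's text stream is available as a Python string and that "-" and ";/" are the only verse end markers; the function names are hypothetical.</div>
<pre>
```python
# Assumption: these two markers end every verse in the stream.
TERMINATORS = ("-", ";/")


def count_verse_ends(stream, terminators=TERMINATORS):
    """Count how often each verse end marker occurs in the stream."""
    return {marker: stream.count(marker) for marker in terminators}


def surplus(stream, expected_verses, terminators=TERMINATORS):
    """Markers found minus verses expected; above zero means extra markers."""
    found = sum(count_verse_ends(stream, terminators).values())
    return found - expected_verses


if __name__ == "__main__":
    demo = "ab;/ cd- ef;/"
    print(count_verse_ends(demo))
    print(surplus(demo, 2))
```
</pre>
                                    <div class="gmail_default"
                                      style="font-family:garamond,serif;font-size:large">A positive surplus points to markers outside the verse text, such as those in a book introduction.</div>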
                                    <div class="gmail_default"
                                      style="font-family:garamond,serif;font-size:large"><br>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div><br>
                          </div>
                          <br>
                          <fieldset
class="gmail-m_3757925966681618317gmail-m_-6550991463107192144gmail-m_-2496802141858019636mimeAttachmentHeader"></fieldset>
                          <pre class="gmail-m_3757925966681618317gmail-m_-6550991463107192144gmail-m_-2496802141858019636moz-quote-pre">_______________________________________________
sword-devel mailing list: <a class="gmail-m_3757925966681618317gmail-m_-6550991463107192144gmail-m_-2496802141858019636moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org" moz-do-not-send="true">sword-devel@crosswire.org</a>
<a class="gmail-m_3757925966681618317gmail-m_-6550991463107192144gmail-m_-2496802141858019636moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/sword-devel" moz-do-not-send="true">http://www.crosswire.org/mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page</pre>
                        </blockquote>
                        <br>
                      </div>
                      </blockquote>
                  </div>
                </blockquote>
              </div>
              <br>
            </blockquote>
            <br>
          </blockquote>
          <div><br>
          </div>
          <div><br>
          </div>
          <br>
        </blockquote>
        <br>
      </blockquote>
      <div><br>
      </div>
      <div><br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>