<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Thank you Michael for your help!<br>

    Let me know if you manage to get something working.<br>

    <br>

    <div class="moz-cite-prefix">On 13/05/2019 15:57, Michael H

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAJ9hia8pYeDvfvnK2i_rnhqzk3NC53g5zAftZDkuOhaximqBhA@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <div dir="ltr">

          <div dir="ltr">

            <div dir="ltr">

              <div class="gmail_default"

                style="font-family:garamond,serif;font-size:large">Cyrille<br>

                <br>

                LibreOffice Draw can open the PageMaker file, with

                limited success, but enough to confirm that even in the

                PageMaker source the verse numbers are a separate text

                stream. With this source, there is no way to copy the

                text with verse numbers intact. Each book appears to be

                stored in its own text stream in the PageMaker file. LO

                Draw isn't rendering all of the pages, only the first

                10, so I've only explored Matthew further. <br>

                <br>

                Based on Matthew only, the verses all seem to end with

                either the character "-" or the sequence ";/", which

                should aid in the reconstruction. Looking through the

                PDF, this seems to hold visually for all books as well.

                However, this isn't perfect: I find 1107 of these

                markers in Matthew, instead of the expected 1071

                verses. Since the text stream also contains a book

                introduction, the 36 extra are likely easily explained.

                Hopefully this gets you well down the path to creating a

                stream with verses. <br>

                <br>
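                A minimal sketch of that reconstruction step in Python,
                assuming a book's text stream is already available as a
                string. The function name split_verses, the sample
                string, and the marker pattern are illustrative
                assumptions; the markers are the ones observed in
                Matthew only, still in the legacy encoding:<br>
<pre>
```python
import re

# Assumption: verse-end markers as observed in Matthew ("-" or ";/").
# ";/" is listed first so it is matched as one two-character marker.
VERSE_END = re.compile(r";/|-")


def split_verses(stream):
    """Split a text stream into chunks that each end with a marker."""
    chunks = []
    start = 0
    for match in VERSE_END.finditer(stream):
        # Keep the marker with the chunk it terminates.
        chunk = stream[start:match.end()].strip()
        if chunk:
            chunks.append(chunk)
        start = match.end()
    # Any trailing text without a marker is kept as a final chunk.
    tail = stream[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks


# Hypothetical sample; real input would be one book's text stream.
sample = "abc def- ghi jkl;/ mno-"
print(len(split_verses(sample)))
```
</pre>
                Comparing the number of chunks with the expected verse
                count for each book would show how many extra chunks,
                such as those from the introduction, still need to be
                handled by hand. <br>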

                I would NOT start from the PDF file, but from the

                PageMaker file.  The PDF almost certainly has a lot of

                rearranged text and extra characters like page numbers

                and running heads.  PageMaker has each book's text in a

                single stream, in a form that will convert to Unicode

                relatively easily. </div>

              <div class="gmail_default"

                style="font-family:garamond,serif;font-size:large"><br>

              </div>

            </div>

          </div>

        </div>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>