<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
David,<br>
You are probably right about <a moz-do-not-send="true"
href="http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=TECkit">TECkit</a>:
once we have the text, it will help us convert it to Unicode.<br>
As for how to get the text, your method is beyond my skills :)<br>
If you succeed, please let me know.<br>
<br>
<div class="moz-cite-prefix">On 13/05/2019 16:21, David Haslam
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:FAbiJuySD_kKVeWnnUe9G_72DwnmUPE7CMlf_75cHrJR7O0HX_XLWO7Y--3spNJhAF3PNjAmJVYheZY8jCjxiFkX3sayG43u8wH3BksIXJ4=@protonmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div>Given the insights from Michael Hart, it may be feasible to
temporarily rearrange the main text stream as follows :</div>
<div><br>
</div>
<div>1. Replace every EOL by a horizontal tab. </div>
<div>2. Insert an EOL after each verse end character. </div>
<div><br>
</div>
<div>Observe that the above two steps are wholly reversible such
that the original text stream can be restored later. </div>
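<div>A minimal Python sketch of steps 1 and 2 and of their reversal. It assumes the text uses plain "\n" line endings, contains no tabs of its own, and that the verse end markers are the "-" and ";/" strings reported for Matthew. The function names and the default markers are illustrative only and do not come from any existing tool:</div>
<pre>
```python
def to_vpl(text, verse_end_markers=("-", ";/")):
    """Steps 1 and 2: flatten the stream, then break it after each verse end marker."""
    if "\t" in text:
        # A tab already in the text would be indistinguishable from a replaced EOL.
        raise ValueError("text contains tabs; step 1 would not be reversible")
    # Step 1: replace every EOL by a horizontal tab.
    flat = text.replace("\n", "\t")
    # Step 2: insert an EOL after each verse end marker.
    # Assumes no marker is a substring of another marker.
    for marker in verse_end_markers:
        flat = flat.replace(marker, marker + "\n")
    return flat


def from_vpl(vpl):
    """Revert step 2 (drop the inserted EOLs), then step 1 (tabs back to EOLs)."""
    return vpl.replace("\n", "").replace("\t", "\n")
```
</pre>
<div>Because every EOL present after step 2 was inserted by step 2, removing all of them and then turning tabs back into EOLs restores the original stream.</div>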
<div><br>
</div>
<div>In effect the text stream is now in verse per line (VPL)
layout, albeit without verse tags. Some adjustments may be
necessary if there are any section headings, etc. </div>
<div><br>
</div>
<div>3. Add line numbers with the first number being reset to 1 at
the start of each chapter, numbers incrementing by 1 for each
line. </div>
<div>4. Add a left margin USFM verse tag \v_<br>
</div>
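<div>Steps 3 and 4 could be sketched in Python roughly as follows. This assumes the chapter boundaries are already known and passed in as a set of 0-based line indices (a hypothetical input; how chapters are detected is not settled here), and it reads the trailing underscore in \v_ as a space:</div>
<pre>
```python
def number_verses(vpl_lines, chapter_starts):
    """Prefix each line with a USFM verse tag, restarting at 1 for each chapter.

    vpl_lines: list of verse-per-line strings.
    chapter_starts: set of 0-based indices of lines that begin a chapter
                    (hypothetical input, supplied by the caller).
    """
    tagged = []
    verse = 0
    for index, line in enumerate(vpl_lines):
        if index in chapter_starts:
            verse = 0  # Step 3: reset the counter at the start of each chapter.
        verse += 1
        # Step 4: left margin verse tag followed by the verse number.
        tagged.append("\\v " + str(verse) + " " + line)
    return tagged
```
</pre>
<div>Book introductions or section headings would need to be filtered out first, otherwise they would be numbered as verses.</div>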
<div><br>
</div>
<div id="protonmail_mobile_signature_block">
<div>Steps 3&4 can be implemented in various ways. For my
part, I’d use a bespoke TextPipe filter. </div>
<div><br>
</div>
<div>Another method to consider might be to use Excel formulae.
I recall resorting to such a method in the early days of Go
Bible. </div>
<div><br>
</div>
<div>Now restore the original layout by reverting steps 2 &
1, if this is really necessary. That is, if the original text
layout appeared to be paragraphed. </div>
<div><br>
</div>
<div>5. Decide how & where to insert paragraph tags. </div>
<div><br>
</div>
<div>6. Add chapter tags, book ID and main title tags, etc. </div>
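<div>As a rough illustration of step 6, the sketch below wraps already tagged verse lines in the standard USFM markers \id, \mt1, \c and \p. The function name is hypothetical, and placing a single \p at the start of each chapter is only a placeholder, since step 5 leaves the paragraphing to be decided:</div>
<pre>
```python
def build_usfm(book_id, title, chapters):
    """Return USFM lines for one book.

    book_id: USFM book code, e.g. "MAT".
    title: main title text.
    chapters: list of chapters, each a list of lines already carrying \\v tags.
    """
    lines = ["\\id " + book_id, "\\mt1 " + title]
    for number, verses in enumerate(chapters, start=1):
        lines.append("\\c " + str(number))
        lines.append("\\p")  # Placeholder paragraph break (see step 5).
        lines.extend(verses)
    return lines
```
</pre>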
<div><br>
</div>
<div>Hope this gives some useful suggestions that point towards
a practical solution. </div>
<div><br>
</div>
<div>Best regards </div>
<div><br>
</div>
<div>David</div>
</div>
<div><br>
</div>
<div><br>
</div>
On Mon, May 13, 2019 at 14:57, Michael H <<a
href="mailto:cmahte@gmail.com" class="" moz-do-not-send="true">cmahte@gmail.com</a>>
wrote:
<blockquote class="protonmail_quote" type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_default"
style="font-family:garamond,serif;font-size:large">Cyrille<br>
<br>
                  LibreOffice Draw attempts to open the PageMaker file,
                  with limited success. But it confirms that even in the
                  PageMaker source, the verse numbers are a separate
                  text stream. With this source, there is no way to copy
                  the text with verse numbers intact. Each book appears
                  to be stored in its own text stream in the PageMaker
                  file. LO Draw isn't rendering all of the pages, only
                  the first 10, so I've only explored Matthew further. <br>
<br>
                  Based on Matthew only, the verses all seem to end with
                  the characters "-" or ";/", which should aid in the
                  reconstruction. I've looked through the PDF and,
                  visually, this seems to be the case for all books.
                  However, this isn't perfect: I find 1107 of these
                  characters in Matthew, instead of the expected 1071
                  verses. But since the text stream also includes a book
                  introduction, the surplus is probably easy to explain.
                  Hopefully this gets you well down the path to creating
                  a stream with verses. <br>
<br>
                  I would NOT start from the PDF file, but from the
                  PageMaker file. The PDF almost certainly has a lot of
                  rearranged text and extra characters like page
                  numbers and running heads. PageMaker has the book
                  text in a single stream, in a form that will convert
                  to Unicode relatively easily. </div>
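<div class="gmail_default" style="font-family:garamond,serif;font-size:large">The marker count mentioned above could be reproduced with a short Python sketch like this one. It assumes the book's text stream has already been extracted to a string; the function names are made up for illustration, and the default markers are the two observed in Matthew:</div>
<pre>
```python
def count_verse_end_markers(text, markers=("-", ";/")):
    """Count non-overlapping occurrences of each verse end marker."""
    return {marker: text.count(marker) for marker in markers}


def marker_surplus(text, expected_verses, markers=("-", ";/")):
    """Markers found minus verses expected; a positive value means extra markers."""
    found = sum(count_verse_end_markers(text, markers).values())
    return found - expected_verses
```
</pre>
<div class="gmail_default" style="font-family:garamond,serif;font-size:large">Running the count separately on the introduction and on the main text would show whether the introduction accounts for the whole difference.</div>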
<div class="gmail_default"
style="font-family:garamond,serif;font-size:large"><br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
sword-devel mailing list: <a class="moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a>
<a class="moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/sword-devel">http://www.crosswire.org/mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page</pre>
</blockquote>
<br>
</body>
</html>