[sword-devel] Bible in Myanmar

David Haslam dfhdfh at protonmail.com
Mon May 13 07:21:32 MST 2019


Given Michael Hart's insights, it may be feasible to rearrange the main text stream temporarily, as follows:

1. Replace every EOL by a horizontal tab.
2. Insert an EOL after each verse end character.

Observe that these two steps are wholly reversible, so the original text stream can be restored later (provided the source contains no horizontal tabs to begin with).
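For anyone who prefers a script to a TextPipe filter, here is a minimal Python sketch of steps 1 & 2. It assumes LF line endings and takes the verse-end markers "-" and ";/" from Michael's findings for Matthew; the function name to_vpl is only illustrative. The guard against existing tabs is what keeps step 1 reversible.

```python
# Verse-end markers as reported for Matthew; an assumption to check per book.
VERSE_ENDS = (";/", "-")


def to_vpl(text: str) -> str:
    """Steps 1 and 2: turn a running text stream into verse-per-line layout."""
    if "\t" in text:
        # Step 1 is only reversible if tabs do not already occur in the source.
        raise ValueError("source already contains tabs; pick another placeholder")
    flat = text.replace("\n", "\t")              # step 1: every EOL -> tab
    for mark in VERSE_ENDS:
        flat = flat.replace(mark, mark + "\n")   # step 2: EOL after each verse end
    return flat
```

After step 1 no EOL is left in the stream, so every EOL in the result marks a verse end. An original EOL that followed a verse end shows up as a tab at the start of the next line.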

In effect the text stream is now in verse per line (VPL) layout, albeit without verse tags. Some adjustments may be necessary if there are any section headings, etc.

3. Add line numbers, resetting to 1 at the start of each chapter and incrementing by 1 for each line.
4. Add a USFM verse tag \v_ in the left margin, in front of each line number (here _ stands for a space).

Steps 3 & 4 can be implemented in various ways. For my part, I’d use a bespoke TextPipe filter.
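One such way, sketched in Python. This assumes the VPL text has already been split into one string per chapter (how chapter boundaries are found is left open) and that headings have been handled. The names tag_chapter and tag_book are hypothetical. If a line starts with tabs (former EOLs), the \v tag goes after them, so that reverting steps 2 & 1 later leaves each tag at the start of its verse.

```python
def tag_chapter(vpl_chapter: str) -> str:
    """Steps 3 and 4 for one chapter: number verse lines from 1 and add \\v tags."""
    tagged = []
    number = 0
    for line in vpl_chapter.split("\n"):
        body = line.lstrip("\t")
        if not body.strip():
            tagged.append(line)                  # blank/tab-only line: keep, unnumbered
            continue
        number += 1                              # step 3: line number, reset per chapter
        lead = line[: len(line) - len(body)]     # leading tabs stand for former EOLs
        tagged.append(f"{lead}\\v {number} {body}")  # step 4: USFM verse tag
    return "\n".join(tagged)


def tag_book(vpl_chapters: list[str]) -> list[str]:
    """Apply steps 3 and 4 chapter by chapter, so numbering restarts at 1."""
    return [tag_chapter(chapter) for chapter in vpl_chapters]
```

For example, tag_book(["x-", "y-\nz-"]) numbers the second chapter from 1 again. Any heading left on its own line would be counted as a verse, which is why headings need attention first.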

Another method to consider might be to use Excel formulae. I recall resorting to such a method in the early days of Go Bible.

Now restore the original layout by reverting steps 2 & 1, if this is really necessary, i.e. if the original text layout appeared to be paragraphed.
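Reverting is short. Step 1 removed every original EOL, so any EOL still present must have been inserted by step 2 and can simply be dropped. The tabs then become EOLs again. Here is a minimal sketch, assuming LF line endings and that steps 3 & 4 only added prefixes (no new EOLs). The name restore_layout is hypothetical.

```python
def restore_layout(tagged_vpl: str) -> str:
    """Revert step 2, then step 1, keeping any \\v tags added in between."""
    # Revert step 2: every remaining EOL was inserted after a verse end.
    joined = tagged_vpl.replace("\n", "")
    # Revert step 1: tabs stand for the original EOLs.
    return joined.replace("\t", "\n")
```

The verse tags from step 4 then sit inline, each directly in front of its verse, within the original paragraphing.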

5. Decide how & where to insert paragraph tags.

6. Add chapter tags, book ID and main title tags, etc.
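Steps 5 & 6 mostly come down to assembling USFM markers around the tagged verses. The sketch below makes several assumptions: one string of tagged verse lines per chapter, a three-letter USFM book code such as MAT, and a single \p per chapter as a placeholder until step 5 is decided. The function name is hypothetical.

```python
def assemble_usfm(book_id: str, title: str, tagged_chapters: list[str]) -> str:
    """Wrap tagged chapters with \\id, \\mt1, \\c and (placeholder) \\p markers."""
    lines = [f"\\id {book_id}", f"\\mt1 {title}"]     # book ID and main title
    for number, chapter in enumerate(tagged_chapters, start=1):
        lines.append(f"\\c {number}")                  # chapter tag
        lines.append("\\p")                            # placeholder paragraph (step 5 TBD)
        lines.append(chapter)                          # verses already carry \v tags
    return "\n".join(lines) + "\n"
```

Once the paragraph breaks are known, the placeholder \p lines would be replaced by \p markers at those positions.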

Hope this gives some useful suggestions that point towards a practical solution.

Best regards

David


On Mon, May 13, 2019 at 14:57, Michael H <cmahte at gmail.com> wrote:

> Cyrille
>
> LibreOffice Draw attempts to open the PageMaker file, with limited success. But it confirms that even in the PageMaker source, the verse numbers are a separate text stream. With this source, there is no way to copy the text with verse numbers intact. Each book appears to be stored in its own text stream in the PageMaker file. LO Draw isn't rendering all of the pages, only the first 10, so I've only explored Matthew further.
>
> Based on Matthew only, the verses all seem to end with the character "-" or ";/", which should aid in the reconstruction. I've looked through the PDF, and visually this seems to be the case for all books as well. However, this isn't perfect: I find 1107 of these characters in Matthew instead of the expected 1071 verses. But since the text stream also contains a book introduction, the surplus is likely easily explained. Hopefully this gets you well down the path to creating a stream with verses.
>
> I would NOT start from the PDF file, but from the PageMaker file. The PDF almost certainly has a lot of rearranged text and extra characters such as page numbers and running heads. PageMaker has the book text in a single stream, in a form that will convert to Unicode relatively easily.
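To track down the surplus verse-end characters Michael mentions, a small counting helper may help. The sketch below assumes the same two markers and prints nothing itself. It returns the count, the difference from an expected verse count, and a short snippet before each hit so stray ones (e.g. in the introduction) can be inspected. The name verse_end_report is hypothetical.

```python
import re

# Verse-end markers reported for Matthew; "-" may also occur elsewhere
# (e.g. in the book introduction), which would inflate the count.
VERSE_END = re.compile(r"-|;/")


def verse_end_report(text: str, expected_verses: int, context: int = 15):
    """Count verse-end markers and show what precedes each one."""
    matches = list(VERSE_END.finditer(text))
    surplus = len(matches) - expected_verses
    # A short snippet ending at each marker helps spot the extra hits by eye.
    snippets = [text[max(0, m.start() - context): m.end()] for m in matches]
    return len(matches), surplus, snippets
```

With Michael's figures for Matthew (1107 found, 1071 expected), the surplus would be 36.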