[sword-devel] Bible in Myanmar
Cyrille
lafricain79 at gmail.com
Mon May 13 08:10:24 MST 2019
David,
You are probably right about TECkit
<http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=TECkit>:
if we get the text, it will help us convert it to Unicode.
As for how to get the text, your method is beyond my skills :)
If you succeed, please let me know.
On 13/05/2019 16:21, David Haslam wrote:
> Given the insights from Michael Hart, it may be feasible to
> temporarily rearrange the main text stream as follows :
>
> 1. Replace every EOL by a horizontal tab.
> 2. Insert an EOL after each verse end character.
>
> Observe that the above two steps are wholly reversible such that the
> original text stream can be restored later.
>
> In effect the text stream is now in verse per line (VPL) layout,
> albeit without verse tags. Some adjustments may be necessary if there
> are any section headings, etc.
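
A minimal sketch of steps 1 and 2 in Python, in case it helps to see them written out. The verse-end markers are the ones Michael reports for Matthew; the function names are only illustrative, and the round trip is reversible only if the original text contains no tab characters and uses plain "\n" line endings (both are assumptions here).

```python
import re

# Verse-end markers as reported for Matthew in the legacy encoding.
# Assumption: the same markers hold for the other books.
VERSE_END = (";/", "-")


def to_vpl(text, verse_end=VERSE_END):
    """Step 1: replace every EOL by a tab. Step 2: add an EOL after each verse end."""
    if "\t" in text:
        # A pre-existing tab would make the transformation irreversible.
        raise ValueError("text already contains tabs; pick another placeholder")
    flat = text.replace("\n", "\t")
    # Try longer markers first so a multi-character marker is matched whole.
    pattern = "|".join(re.escape(m) for m in sorted(verse_end, key=len, reverse=True))
    return re.sub(pattern, lambda m: m.group(0) + "\n", flat)


def from_vpl(vpl):
    """Revert step 2 (drop the added EOLs), then step 1 (tabs back to EOLs)."""
    return vpl.replace("\n", "").replace("\t", "\n")
```

Calling from_vpl(to_vpl(text)) should give back the original stream unchanged, which is the reversibility David describes.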
>
> 3. Add line numbers with the first number being reset to 1 at the
> start of each chapter, numbers incrementing by 1 for each line.
> 4. Add a left margin USFM verse tag \v_
>
> Steps 3&4 can be implemented in various ways. For my part, I’d use a
> bespoke TextPipe filter.
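
For anyone without TextPipe, steps 3 and 4 could be sketched in Python roughly as follows. How a chapter start is recognised is not settled in this thread, so the caller supplies that test; the "@@CHAPTER" marker in the usage note is purely hypothetical.

```python
def add_verse_tags(lines, is_chapter_start):
    """Number verse lines from 1 within each chapter and prefix a USFM \\v tag.

    `lines` is the verse-per-line text; `is_chapter_start` is a caller-supplied
    predicate (an assumption, not something defined in the source files).
    Chapter-start lines are passed through untouched and reset the counter.
    """
    tagged = []
    verse = 0
    for line in lines:
        if is_chapter_start(line):
            verse = 0
            tagged.append(line)
            continue
        verse += 1
        tagged.append(f"\\v {verse} {line}")
    return tagged
```

For example, add_verse_tags(lines, lambda s: s == "@@CHAPTER") restarts the numbering at every line equal to that placeholder. Introductions and section headings would still need separate handling before this step.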
>
> Another method to consider might be to use Excel formulae. I recall
> resorting to such a method in the early days of Go Bible.
>
> Now restore the original layout by reverting steps 2 & 1, if this is
> really necessary. That is, if the original text layout appeared to be
> paragraphed.
>
> 5. Decide how & where to insert paragraph tags.
>
> 6. Add chapter tags, book ID and main title tags, etc.
>
> Hope this gives some useful suggestions that point towards a practical
> solution.
>
> Best regards
>
> David
>
>
> Sent from ProtonMail Mobile
>
>
> On Mon, May 13, 2019 at 14:57, Michael H <cmahte at gmail.com
> <mailto:cmahte at gmail.com>> wrote:
>> Cyrille
>>
>> LibreOffice Draw attempts to open the PageMaker file, with limited
>> success. But it confirms that even in the PageMaker source, the verse
>> numbers are a separate text stream. With this source, there is no way
>> to copy the text with verse numbers intact. Each book appears to be
>> stored in its own text stream within the PageMaker file. LO Draw
>> isn't rendering all of the pages, only the first 10, so I've only
>> explored Matthew further.
>>
>> Based on Matthew only, the verses seem to all end with the character
>> "-" or ";/", which should aid in the reconstruction. I've looked
>> through the PDF and this seems to be the case for all books visually
>> as well. However, this isn't perfect: I find 1107 of these characters
>> in Matthew, instead of the expected 1071 verses. But since the text
>> stream has a book introduction, this is likely easily explained.
>> Hopefully this gets you well down the path to creating a stream with
>> verses.
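
The count Michael describes can be reproduced with a short Python check, sketched here under the assumption that the two markers he names are the only verse-end characters. The function names are illustrative.

```python
import re


def count_verse_ends(text, markers=(";/", "-")):
    """Count occurrences of the verse-end markers in a book's text stream."""
    pattern = "|".join(re.escape(m) for m in sorted(markers, key=len, reverse=True))
    return len(re.findall(pattern, text))


def surplus(found, expected):
    """How many more markers were found than there are verses."""
    return found - expected
```

With the figures quoted for Matthew, surplus(1107, 1071) is 36, which is the number of extra markers to account for in the book introduction or elsewhere.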
>>
>> I would NOT start from the PDF file, but from the PageMaker file.
>> The PDF almost certainly has a lot of rearranged text and extra
>> characters such as page numbers and running heads. PageMaker has the
>> book text in a single stream, in a form that will convert to Unicode
>> relatively easily.
>>