[sword-devel] Best way to encode interlinear texts in OSIS?

Thu Aug 9 06:48:06 MST 2007

Kahunapule Michael Johnson wrote:
> Any suggestions on the best way to encode interlinear texts in OSIS?
>
> Take a look at a good Greek/English interlinear New Testament for an
> idea of what I'm talking about.
>
> My current idea is to encode the translations as separate works, but add
> milestone markers at sync points to each translation. Any other thoughts?
>
> Michael
>   

A couple of comments first:
This is essentially a mapping function which can map:
1 English to 1 Greek
1 English to n Greek
n English to 1 Greek
m English to n Greek (m > 1, n > 1)
0 English to n Greek (n >= 1, words not translated)
n English to 0 Greek (n >= 1, words inserted)

Further, some interlinears note when tense changes from one language to 
another.

The words of a verse may differ from one Greek text to another, both in 
position and in the actual words used. There is perhaps less variation 
here than between different English translations. Editions of either 
text may also vary.

So an interlinear needs to be created between one edition of an English 
translation and one edition of a Greek text. We have done this 
for the KJV and TR for the NT. In the KJV, we currently use <w> to do 
the mapping.

In these examples, the mapping is always between the same verses.
1 to 1 mapping:
<w src="3">Lord</w>
"Lord" corresponds to the 3rd word in the TR.

1 to n mapping:
<w src="3 4">everything</w>
"everything" maps to the 3rd and 4th word.

n to 1 mapping:
<w src="4">he ate</w>

m to n mapping:
<w src="3 4">May it never be</w>

0 to n mapping:
<w src="1"/>
<w src="3 4"/>
Arguably, these can be deduced. Note in the KJV, we don't use the 
latter. It would be a reasonable optimization. Also since these are 
words not translated, their position in the verse is immaterial.

n to 0 mapping:
<transChange type="added">The</transChange>

Marking tense changes:
<transChange type="tense">has</transChange>

There are some more complicated examples, which would be quite prevalent 
in a thought-for-thought, dynamic translation, but are also present in a 
word-for-word, literal translation.
This is a slightly more complicated example:
<w src="3 6">everything</w> <w src="5">good</w>
"everything" maps to the 3rd and 6th word. "good" maps to the 5th.
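
For what it's worth, here is a minimal sketch in Python of how a program 
might read the src attributes in the examples above and report which kind 
of mapping each element expresses. The function name classify and the 
wrapping <verse> root are assumptions made for illustration, not part of 
any existing tool, and the English word count is just a naive whitespace 
split.

```python
import xml.etree.ElementTree as ET


def classify(fragment):
    """Return (english, greek_positions, kind) for each child element.

    `fragment` is a run of <w> / <transChange> elements like the examples
    above. It is wrapped in a dummy <verse> root so that it parses as XML.
    """
    root = ET.fromstring(f"<verse>{fragment}</verse>")
    rows = []
    for el in root:
        english = (el.text or "").split()          # English words, if any
        greek = [int(n) for n in el.get("src", "").split()]  # Greek positions
        kind = f"{len(english)} to {len(greek)}"   # e.g. "1 to 2", "2 to 1"
        rows.append((" ".join(english), greek, kind))
    return rows


if __name__ == "__main__":
    sample = (
        '<w src="3 6">everything</w> <w src="5">good</w> '
        '<w src="4">he ate</w> <w src="1"/> '
        '<transChange type="added">The</transChange>'
    )
    for english, greek, kind in classify(sample):
        print(repr(english), greek, kind)
```

An element with no src (such as transChange) comes out as "n to 0", and an 
empty <w src="..."/> comes out as "0 to n", which matches the cases listed 
at the top.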

Finally there is the reverse possibility of a single Greek word whose 
translation is not a contiguous run of words. Here we use type="x-split" 
to indicate that the second occurrence of "12" goes with the first. (BTW, 
the subType numbers the extra tokens as parts beyond the Greek. Here 
there are 16 words in the Greek, so the subType starts at 17. I have no 
idea how this could be useful.)

In the following, the Greek, translated word for word, reads:
... her to make a publick example ... to put away her.
But it is translated:
... to make her a publick example ... to put her away...

<verse osisID="Matt.1.19">
<w src="2">Then</w>
<w src="1">Joseph</w>
<w src="5">her</w>
<w src="3 4">husband</w>,
<w src="7">being</w>
<w src="6">a just</w>
<transChange type="added">man</transChange>,
<w src="8">and</w>
<w src="9">not</w>
<w src="10">willing</w>
<w src="12">to make</w>
<w src="11">her</w>
<w src="12" type="x-split" subType="x-17">a publick example</w>,
<w src="13">was minded</w>
<w src="15">to put</w>
<w src="16">her</w>
<w src="15" type="x-split" subType="x-18">away</w>
<w src="14">privily</w>.
</verse>

So from this information, it would be possible to present an English/Greek 
interlinear and a Greek/English one as well.
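
As a rough illustration of the Greek/English direction, the following 
Python sketch regroups the English glosses of the Matt.1.19 markup above 
by Greek word position. The function name greek_order and the " ... " 
joiner for split translations are my own choices for the example, not 
anything the markup prescribes. Added words (transChange) have no src and 
so are left out of the Greek-ordered view.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

VERSE = """
<verse osisID="Matt.1.19">
<w src="2">Then</w>
<w src="1">Joseph</w>
<w src="5">her</w>
<w src="3 4">husband</w>,
<w src="7">being</w>
<w src="6">a just</w>
<transChange type="added">man</transChange>,
<w src="8">and</w>
<w src="9">not</w>
<w src="10">willing</w>
<w src="12">to make</w>
<w src="11">her</w>
<w src="12" type="x-split" subType="x-17">a publick example</w>,
<w src="13">was minded</w>
<w src="15">to put</w>
<w src="16">her</w>
<w src="15" type="x-split" subType="x-18">away</w>
<w src="14">privily</w>.
</verse>
""".strip()


def greek_order(verse_xml):
    """Return [(greek_position, english_gloss), ...] sorted by position."""
    glosses = defaultdict(list)
    for w in ET.fromstring(verse_xml).iter("w"):
        # A <w> may point at several Greek words; record it under each one.
        for pos in w.get("src", "").split():
            glosses[int(pos)].append(w.text or "")
    # Split translations (x-split) end up as several parts under one position.
    return [(pos, " ... ".join(parts)) for pos, parts in sorted(glosses.items())]


if __name__ == "__main__":
    for pos, gloss in greek_order(VERSE):
        print(pos, gloss)
```

Note that "husband" shows up under both position 3 and position 4, since 
the markup does not say which part of the English goes with which Greek 
word.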

As an added bonus, the lemma and morph are maintained as parallel, 
space-separated lists.
So when src="1 2 3"
then lemmas for Strong's will be
lemma="strong:A strong:B strong:C"
and morphs for Robinson's will be:
morph="robinson:X robinson:Y robinson:Z"

This maps (1,strong:A,robinson:X), (2,strong:B,robinson:Y), and 
(3,strong:C,robinson:Z).
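
In code, the parallel lists line up with a simple zip. This is only a 
sketch in Python; the function name align is hypothetical, and the length 
check is an assumption about how mismatched lists ought to be handled.

```python
def align(src, lemma, morph):
    """Pair each Greek position with its lemma and morphology code."""
    positions = [int(p) for p in src.split()]
    lemmas = lemma.split()
    morphs = morph.split()
    # The lists are only meaningful if they are the same length.
    if not (len(positions) == len(lemmas) == len(morphs)):
        raise ValueError("src, lemma and morph lists differ in length")
    return list(zip(positions, lemmas, morphs))


if __name__ == "__main__":
    print(align("1 2 3",
                "strong:A strong:B strong:C",
                "robinson:X robinson:Y robinson:Z"))
```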

Externalizing the mapping is an interesting idea, especially if there 
were to be more than one mapping.