[sword-devel] Best way to encode interlinear texts in OSIS?
DM Smith
dmsmith555 at yahoo.com
Thu Aug 9 06:48:06 MST 2007
Kahunapule Michael Johnson wrote:
> Any suggestions on the best way to encode interlinear texts in OSIS?
>
> Take a look at a good Greek/English interlinear New Testament for an
> idea of what I'm talking about.
>
> My current idea is to encode the translations as separate works, but add
> milestone markers at sync points to each translation. Any other thoughts?
>
> Michael
>
A couple of comments first:
This is essentially a mapping function which can map:
1 English to 1 Greek
1 English to n Greek
n English to 1 Greek
m English to n Greek (m > 1, n > 1)
0 English to n Greek (n >= 1, words not translated)
n English to 0 Greek (n >= 1, words inserted)
Further, some interlinears note when tense changes from one language to
another.
The words of a verse may differ from one Greek text to another, both in
position and in the actual words used. There is perhaps less variation
here than between different English translations. Editions of either
text may also vary.
So an interlinear needs to be created between one edition of an English
translation and one edition of a Greek text. We have done this
for the KJV and TR for the NT. In the KJV, we currently use <w> to do
the mapping.
In these examples, the mapping is always between the same verses.
1 to 1 mapping:
<w src="3">Lord</w>
"Lord" corresponds to the 3rd word in the TR.
1 to n mapping:
<w src="3 4">everything</w>
"everything" maps to the 3rd and 4th word.
n to 1 mapping:
<w src="4">he ate</w>
m to n mapping:
<w src="3 4">May it never be</w>
0 to n mapping:
<w src="1"/>
<w src="3 4"/>
Arguably, these can be deduced. Note that in the KJV we don't currently
use these empty elements. It would be a reasonable optimization. Also,
since these are words not translated, their position in the verse is
immaterial.
n to 0 mapping:
<transChange type="added">The</transChange>
Marking tense changes:
<transChange type="tense">has</transChange>
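To make the mapping kinds above concrete, here is a rough Python sketch that reads such elements and reports each one as "m English to n Greek". It is only an illustration, not how SWORD processes the KJV: the wrapping <verse> and its osisID are made up, counting English words by splitting on whitespace is my assumption, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment assembled from the examples above; the <verse>
# wrapper and its osisID are illustrative only.
FRAGMENT = """
<verse osisID="Example.1.1">
<w src="3">Lord</w>
<w src="3 4">everything</w>
<w src="4">he ate</w>
<w src="3 4">May it never be</w>
<w src="1"/>
<transChange type="added">The</transChange>
</verse>
"""

def classify(elem):
    """Label one element as 'm English to n Greek'.

    Assumption: English words are counted by whitespace splitting, and an
    element without a src attribute maps to zero Greek words.
    """
    english = len((elem.text or "").split())
    greek = len(elem.get("src", "").split())
    return f"{english} English to {greek} Greek"

def classify_all(xml_text):
    """Classify every <w> and every <transChange type="added"> in a verse."""
    results = []
    for elem in ET.fromstring(xml_text):
        is_word = elem.tag == "w"
        is_added = elem.tag == "transChange" and elem.get("type") == "added"
        if is_word or is_added:
            results.append((elem.text or "", classify(elem)))
    return results

for text, label in classify_all(FRAGMENT):
    print(f"{text!r}: {label}")
```

Tense changes are skipped on purpose here, since a type="tense" element marks a changed word rather than an inserted one.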
There are some more complicated examples, which would be quite prevalent
in a thought-for-thought, dynamic translation, but are also present in a
word-for-word, literal translation.
This is a slightly more complicated example:
<w src="3 6">everything</w> <w src="5">good</w>
"everything" maps to the 3rd and 6th word. "good" maps to the 5th.
Finally there is the reverse possibility: a single Greek word whose
translation is not a run of contiguous words. Here we use type="x-split"
to indicate that the second occurrence of "12" goes with the first. (BTW,
the subType numbers the extra tokens as parts beyond the Greek. Here
there are 16 words in Greek, so the subType starts at 17. I have no idea
how this could be useful.)
In the following, the Greek reads as a word-for-word translation:
... her to make a publick example ... to put away her.
But it is translated:
... to make her a publick example ... to put her away...
<verse osisID="Matt.1.19">
<w src="2">Then</w>
<w src="1">Joseph</w>
<w src="5">her</w>
<w src="3 4">husband</w>,
<w src="7">being</w>
<w src="6">a just</w>
<transChange type="added">man</transChange>,
<w src="8">and</w>
<w src="9">not</w>
<w src="10">willing</w>
<w src="12">to make</w>
<w src="11">her</w>
<w src="12" type="x-split" subType="x-17">a publick example</w>,
<w src="13">was minded</w>
<w src="15">to put</w>
<w src="16">her</w>
<w src="15" type="x-split" subType="x-18">away</w>
<w src="14">privily</w>.
</verse>
So from this information, it would be possible to present an English/Greek
interlinear and a Greek/English one as well.
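As a sketch of the Greek/English direction, the Python below regroups the English of Matt.1.19 by Greek word position, so the split pieces for 12 and 15 come back together. This is an illustration under my own assumptions, not an existing SWORD routine: the function name is hypothetical, pieces sharing a src number are joined with a single space, and an English word with several src numbers is simply repeated under each one.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

VERSE = """<verse osisID="Matt.1.19">
<w src="2">Then</w>
<w src="1">Joseph</w>
<w src="5">her</w>
<w src="3 4">husband</w>,
<w src="7">being</w>
<w src="6">a just</w>
<transChange type="added">man</transChange>,
<w src="8">and</w>
<w src="9">not</w>
<w src="10">willing</w>
<w src="12">to make</w>
<w src="11">her</w>
<w src="12" type="x-split" subType="x-17">a publick example</w>,
<w src="13">was minded</w>
<w src="15">to put</w>
<w src="16">her</w>
<w src="15" type="x-split" subType="x-18">away</w>
<w src="14">privily</w>.
</verse>"""

def greek_order(xml_text):
    """Return {greek_position: english_text}, sorted by Greek position.

    Elements without src (such as <transChange type="added">) have no
    Greek counterpart and are left out.
    """
    by_position = defaultdict(list)
    for w in ET.fromstring(xml_text).iter("w"):
        for pos in w.get("src", "").split():
            by_position[int(pos)].append(w.text or "")
    return {pos: " ".join(parts) for pos, parts in sorted(by_position.items())}

for pos, english in greek_order(VERSE).items():
    print(pos, english)
```

Read in position order, 11 through 16 give "her", "to make a publick example", "was minded", "privily", "to put away", "her", which matches the word-for-word reading quoted earlier.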
As an added bonus, the lemma and morph are maintained as parallel, space
separated lists.
So when src="1 2 3"
then lemmas for Strong's will be
lemma="strong:A strong:B strong:C"
and morphs for Robinson's will be:
morph="robinson:X robinson:Y robinson:Z"
This maps (1,strong:A,robinson:X), (2,strong:B,robinson:Y), and
(3,strong:C,robinson:Z).
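A small Python sketch of reading those parallel lists back into triples follows. The A/B/C and X/Y/Z placeholders are the ones used above; the word "example", the function name, and the length check are my own assumptions.

```python
import xml.etree.ElementTree as ET

# Placeholder values from the text above; the English word is hypothetical.
ELEMENT = (
    '<w src="1 2 3" '
    'lemma="strong:A strong:B strong:C" '
    'morph="robinson:X robinson:Y robinson:Z">example</w>'
)

def parallel_entries(xml_text):
    """Zip the space-separated src, lemma and morph lists into triples."""
    w = ET.fromstring(xml_text)
    src = [int(s) for s in w.get("src", "").split()]
    lemma = w.get("lemma", "").split()
    morph = w.get("morph", "").split()
    # The lists are only meaningful as parallel lists if they line up.
    if not (len(src) == len(lemma) == len(morph)):
        raise ValueError("src, lemma and morph lists differ in length")
    return list(zip(src, lemma, morph))

print(parallel_entries(ELEMENT))
```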
Externalizing the mapping is an interesting idea. Especially if there
were to be more than one mapping.