<html><head></head><body> <div dir="auto">Hi Timothy,</div><div dir="auto"><br></div><div dir="auto">Please consult the developers’ wiki:</div><div dir="auto"><br></div><div dir="auto">https://wiki.crosswire.org/</div><div dir="auto"><br></div><div dir="auto">In particular, see the page about OSIS Bibles.</div><div dir="auto"><br></div><div dir="auto">David</div><div><br></div> <div><br></div>On Sat, Sep 30, 2023 at 10:54, Timothy Allen &lt;<a href="mailto:thristian@gmail.com">thristian@gmail.com</a>&gt; wrote:<blockquote type="cite" class="protonmail_quote">
<p>The Berean Standard Bible is available in two machine-readable
formats: USFM, and "translation tables", a 40MB Excel spreadsheet
with a row for every Hebrew or Greek word in their chosen source
texts, alongside the English text it's translated to. I would like to
make one module with the nice formatting of the USFM sources and
the metadata from the spreadsheet. I've spent the last few
weeks writing a script that runs through them both in parallel and
makes sure everything lines up, so I'm now confident that I have
an accurate mapping between them.</p>
<p>My question now is, how can I translate the data from the
spreadsheet into OSIS?</p>
<p>Here's the information the spreadsheet gives me:</p>
<table border="1" cellpadding="2" cellspacing="2" width="100%">
<tbody>
<tr>
<th valign="top">Column<br>
</th>
<th valign="top">Example<br>
</th>
<th valign="top">Notes<br>
</th>
</tr>
<tr>
<td valign="top">he_ordinal<br>
</td>
<td valign="top">1<br>
</td>
<td valign="top">"Hebrew Ordinal", increments for each
spreadsheet row in the Old Testament, set to 999999 for each
row in the New Testament<br>
</td>
</tr>
<tr>
<td valign="top">el_ordinal<br>
</td>
<td valign="top">0<br>
</td>
<td valign="top">"Greek Ordinal", set to 0 for each row in the
Old Testament, increments for each row in the New Testament,
except for Mark 1:1 which has a word with the number 18379.5
(presumably something needed to be inserted and they didn't
want to renumber everything else)<br>
</td>
</tr>
<tr>
<td valign="top">en_ordinal<br>
</td>
<td valign="top">1<br>
</td>
<td valign="top">"English Ordinal", increments for each
spreadsheet row (except for that word in Mark 1:1)<br>
</td>
</tr>
<tr>
<td valign="top">language<br>
</td>
<td valign="top">Hebrew<br>
</td>
<td valign="top">"Hebrew", "Greek", or sometimes "Aramaic"<br>
</td>
</tr>
<tr>
<td valign="top">verse_ordinal<br>
</td>
<td valign="top">1<br>
</td>
<td valign="top">Increments for each verse in the Bible, so
every word in Genesis 1:1 has "1", etc.<br>
</td>
</tr>
<tr>
<td valign="top">source_word<br>
</td>
<td valign="top">בְּרֵאשִׁ֖ית<br>
</td>
<td valign="top">The word in the original source text.
Sometimes includes fancy brackets to mark sources other than
WLC or Nestle 1904: {TR} ⧼RP⧽ (WH) 〈NE〉 [NA] ‹SBL› [[ECM]]<br>
</td>
</tr>
<tr>
<td valign="top">transliteration<br>
</td>
<td valign="top">bə·rê·šîṯ<br>
</td>
<td valign="top">A transliteration of the source word into the
Latin alphabet<br>
</td>
</tr>
<tr>
<td valign="top">grammar_code<br>
</td>
<td valign="top">Prep-b | N-fs<br>
</td>
<td valign="top">A code describing the grammatical form of the
word; these don't appear to be Robinson codes, but their own
custom thing for Hebrew
(<a href="https://biblehub.com/hebrewparse.htm" class="moz-txt-link-freetext">https://biblehub.com/hebrewparse.htm</a>) and Greek
(<a href="https://biblehub.com/abbrev.htm" class="moz-txt-link-freetext">https://biblehub.com/abbrev.htm</a>)<br>
</td>
</tr>
<tr>
<td valign="top">grammar_description<br>
</td>
<td valign="top">Preposition-b | Noun - feminine singular<br>
</td>
<td valign="top">The grammar code, unabbreviated<br>
</td>
</tr>
<tr>
<td valign="top">strongs_number<br>
</td>
<td valign="top">7225<br>
</td>
<td valign="top">The Strong's number of the basic form of this
word<br>
</td>
</tr>
<tr>
<td valign="top">translation<br>
</td>
<td valign="top">In the beginning<br>
</td>
<td valign="top">The English text that appears in the BSB<br>
</td>
</tr>
<tr>
<td valign="top">gloss<br>
</td>
<td valign="top">1) first, beginning, best, chief<br>
1a) beginning<br>
1b) first<br>
1c) chief<br>
1d) choice part<br>
</td>
<td valign="top">A definition from the Brown-Driver-Briggs
Hebrew Lexicon, or Thayer's Greek Definitions, as
appropriate<br>
</td>
</tr>
</tbody>
</table>
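<p>As a rough illustration of two quirks in the table above, here is a
minimal Python sketch. It assumes that the bracket styles listed under
source_word wrap the whole word (the spreadsheet may use them
differently), and the names <code>source_ordinal</code>,
<code>split_source_marker</code> and <code>BRACKETS</code> are
hypothetical helpers, not part of any existing tool.</p>

```python
from decimal import Decimal

# Sentinel the spreadsheet uses in he_ordinal for New Testament rows.
HE_NOT_APPLICABLE = Decimal(999999)

# (opener, closer, siglum) for the bracket styles listed under source_word.
# "[[" must be tried before "[" so ECM is not mistaken for NA.
# Assumption: the brackets wrap the entire word.
BRACKETS = [
    ("[[", "]]", "ECM"),
    ("{", "}", "TR"),
    ("⧼", "⧽", "RP"),
    ("(", ")", "WH"),
    ("〈", "〉", "NE"),
    ("[", "]", "NA"),
    ("‹", "›", "SBL"),
]


def source_ordinal(he_ordinal, el_ordinal):
    """Return (testament, ordinal) for one spreadsheet row.

    Decimal keeps the fractional 18379.5 in Mark 1:1 exact.
    """
    he = Decimal(str(he_ordinal))
    el = Decimal(str(el_ordinal))
    if he != HE_NOT_APPLICABLE:
        return ("OT", he)
    return ("NT", el)


def split_source_marker(source_word):
    """Return (bare_word, siglum); siglum is None when no brackets wrap the word."""
    text = source_word.strip()
    for opener, closer, siglum in BRACKETS:
        if (len(text) > len(opener) + len(closer)
                and text.startswith(opener) and text.endswith(closer)):
            return text[len(opener):-len(closer)].strip(), siglum
    return text, None
```

<p>Words without brackets come back unchanged with a siglum of
<code>None</code>, which under this reading would mean WLC or Nestle 1904.</p>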
<p>Looking at the OSIS 2.1.1 User's Manual (and sniffing around in
the KJVA module), to represent this information in OSIS I should
use the &lt;w&gt; element, which supports the following attributes
(copy/pasted from the Manual):</p>
<ul>
<li><b>gloss</b> Record comments on a particular word or its
usage.</li>
<li><b>lemma</b> Use to record the base form of a word.</li>
<li><b>morph</b> Use to record grammatical information for a word.</li>
<li><b>POS</b> Use to record the function of a word according to a
particular view of the language's syntax.</li>
<li><b>src</b> Use to record origin of the word.</li>
<li><b>xlit</b> Use to record a transliteration of a word.</li>
</ul>
<p>The first problem is that sometimes multiple source words are
translated into a single English span, and it's not made clear how
to express that in these attributes. From poking around in the
KJVA module, I get the impression these are supposed to be
space-delimited lists. Is that correct?</p>
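<p>If the space-delimited reading is right, building such an element
could look roughly like the Python sketch below. This is only an
illustration under that assumption: <code>make_w</code> is a
hypothetical helper, and the lemma and xlit values are placeholders
rather than data from the spreadsheet.</p>

```python
from xml.sax.saxutils import escape, quoteattr


def make_w(english_text, **attributes):
    """Build one <w> element.

    Each attribute value is a list with one entry per source word;
    the entries are joined with single spaces.
    """
    parts = []
    for name, values in attributes.items():
        joined = " ".join(str(v) for v in values)
        if joined:
            parts.append(f"{name}={quoteattr(joined)}")
    attr_text = (" " + " ".join(parts)) if parts else ""
    return f"<w{attr_text}>{escape(english_text)}</w>"


# Placeholder values, not taken from the spreadsheet.
print(make_w("some English span", lemma=["H01", "H02"], xlit=["aaa", "bbb"]))
```

<p>Any value that itself contains a space would break the one-entry-per-word
alignment, so such values would need to be left out or encoded some other way.</p>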
<p>Assuming that's the case, here are my guesses at how to fill out
these attributes for each span:</p>
<ul>
<li><b>gloss</b> can't be done, because each gloss contains spaces
which means the displaying app can't figure out which part of
the gloss goes with which word</li>
<li><b>lemma</b> is where Strong's numbers go; Greek Strong's
numbers should be prefixed with "G" and Hebrew/Aramaic ones with
"H0"</li>
<li><b>morph</b> might be used for the "grammar code" content, but
I would probably need to figure out how to translate them into
Robinson codes first, since that seems to be the only
morphological dictionary module in the CrossWire repositories</li>
<li><b>POS</b> is unclear to me; I don't see how it differs from
the "morph" attribute</li>
<li><b>src</b> is also unclear: is this for the word order
(he_ordinal or el_ordinal, possibly numbered from the beginning
of the verse rather than the beginning of the entire Bible) or
the actual choice of source text (Nestle1904, TR, NA, SBL,
etc.)?</li>
<li><b>xlit</b> clearly comes from the "transliteration" field</li>
</ul>
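<p>The two guesses above that seem settled (lemma and xlit) can be
sketched in Python as follows. This follows the prefixes exactly as
guessed in the list and is not confirmed against the OSIS manual or any
module; <code>lemma_for</code> and <code>w_attributes</code> are
hypothetical names, and each row is assumed to be a dict keyed by the
spreadsheet column names. The morph, POS and src attributes are left
out because their mapping is still an open question.</p>

```python
def lemma_for(language, strongs_number):
    """Prefix a Strong's number as guessed above: "G" for Greek, "H0" for Hebrew/Aramaic."""
    if language == "Greek":
        return f"G{strongs_number}"
    if language in ("Hebrew", "Aramaic"):
        return f"H0{strongs_number}"
    raise ValueError(f"unexpected language: {language!r}")


def w_attributes(rows):
    """Combine the rows behind one English span into attribute values.

    gloss is left out because its text contains spaces.
    """
    return {
        "lemma": " ".join(lemma_for(r["language"], r["strongs_number"]) for r in rows),
        "xlit": " ".join(r["transliteration"] for r in rows),
    }


# The example row from the table above.
row = {"language": "Hebrew", "strongs_number": 7225, "transliteration": "bə·rê·šîṯ"}
print(w_attributes([row]))
```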
<p>One thing that's clearly missing is where to put the source word.
How does that work?<br>
</p>
<p>Is there another way to represent information that doesn't fit into
the &lt;w&gt; element? I'd like this module to be as useful as
possible, so I'm hesitant to toss out any information that can be
usefully represented.</p>
<p>Is there anything else I've missed or misunderstood?</p>
<p><br>
</p>
<p>Timothy.<br>
</p>
</blockquote></body></html>