[sword-devel] Creating a version of the BSB module with interlinear support

Mon Oct 2 03:50:14 EDT 2023

Morphology is not restricted to Robinson.

The wiki page merely gave that as an example.

A different morphology dictionary could be specified in the OSIS header.

That can be done even before any such dictionary module has been created.
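As a rough sketch of what declaring such a dictionary might look like, the Python below builds OSIS `<work>` elements for a header. It assumes, as I read the OSIS manual, that the `osisWork` value is the prefix that appears before the colon in attribute values like `strong:H07225`. The `bsbmorph` prefix and the `Dict.Strongs` / `Dict.BSBMorph` refSystem strings are hypothetical placeholders. They are not names confirmed by the wiki or by any existing module, and the output is a fragment, not a complete header.

```python
import xml.etree.ElementTree as ET

def work_declaration(osis_work: str, ref_system: str) -> ET.Element:
    """Build an OSIS <work> element that declares one attribute prefix.

    osis_work is the prefix that would appear before the colon in values
    such as morph="bsbmorph:N-fs"; ref_system names the dictionary it
    points at. Both values used below are illustrative assumptions.
    """
    work = ET.Element("work", {"osisWork": osis_work})
    ref = ET.SubElement(work, "refSystem")
    ref.text = ref_system
    return work

header = ET.Element("header")
header.append(work_declaration("strong", "Dict.Strongs"))
# Hypothetical prefix for the BSB's own grammar codes; it can be declared
# before any matching dictionary module exists.
header.append(work_declaration("bsbmorph", "Dict.BSBMorph"))
print(ET.tostring(header, encoding="unicode"))
```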

David

Sent from [Proton Mail](https://proton.me/mail/home) for iOS

On Mon, Oct 2, 2023 at 08:38, Timothy Allen <thristian at gmail.com> wrote:

> Ah, thanks. I did look at that page when I started making my module, but I'd forgotten about it by the time I needed this more detailed advice. Thanks for reminding me! Using this to update the guesses from my original message:
>
> - gloss: I *might* be able to try grabbing the first word from the BDB/Thayer gloss, but that seems error-prone, and I probably won't bother unless somebody really wants it.
> - lemma: This should be used for Strong's numbers, marked up as "strong:G123" or "strong:H123", but it could also be used for storing the original source text as "lemma.BSB:בְּרֵאשִׁ֖ית" if we assume a hypothetical lexicon that indexes all the words in the BSB.
> - morph: This should be used for Robinson morphology codes, so I should not bother with it until I can figure out how to translate the BSB's codes to Robinson ones. The wiki page also has "strongMorph" codes in its examples, but I can't find any further information on what system this might refer to. Apparently there aren't any Hebrew morphology lexicons available for SWORD; maybe someday I could make one?
> - POS: Still unclear to me; it's not mentioned on the wiki page.
> - src: Apparently this is for word order in the source language, but it's not at all clear where "word 1" is. The start of the <w> element? The start of the verse? The start of the chapter? The start of the book? The start of the Bible? Or does it not matter, because front-ends are intended to just sort the words they have?
> - xlit: Still for the transliteration, simple enough.
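To make the attribute choices listed above concrete, here is a minimal sketch of a single `<w>` element for the Genesis 1:1 example, built with Python's standard `xml.etree.ElementTree`. It assumes the "H0" Strong's form seen in the KJVA module and uses the hypothetical `lemma.BSB` lexicon as a second lemma token. `src="1"` assumes verse-relative numbering, which is still an open question, and morph is left out until the grammar codes can be mapped.

```python
import xml.etree.ElementTree as ET

def make_w(english: str, attrs: dict[str, str]) -> ET.Element:
    """Wrap one English span in an OSIS <w> element.

    Attributes with empty values are dropped, so a word whose morphology
    has not been mapped yet gets no morph attribute at all.
    """
    w = ET.Element("w", {name: value for name, value in attrs.items() if value})
    w.text = english
    return w

w = make_w("In the beginning", {
    # Strong's number plus the hypothetical per-word BSB lexicon entry,
    # space-delimited as in the KJVA module.
    "lemma": "strong:H07225 lemma.BSB:בְּרֵאשִׁ֖ית",
    "morph": "",         # left empty until BSB grammar codes can be mapped
    "xlit": "bə·rê·šîṯ",
    "src": "1",          # assumes numbering from the start of the verse
})
print(ET.tostring(w, encoding="unicode"))
```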
>
> According to the wiki page, there's also an "n" attribute not mentioned in the official OSIS docs, which is for "marking enumerated words". I don't know what this means, and the wiki page doesn't include any examples. I'm going to guess I don't need it.
>
> Do I have all that right? Is there anything I've misunderstood?
>
> Also, would it be better to have "lemma.BSB:בְּרֵאשִׁ֖ית" and use the same "BSB" lexicon for every word in the entire text, or would it be more appropriate to use "lemma.WLC:בְּרֵאשִׁ֖ית" and use different lexicons to indicate the different sources used for the translation (Nestle1904, TR, NA, SBL, etc.)?
>
> Timothy
>
> On 30/9/23 20:00, David Haslam wrote:
>
>> Hi Timothy,
>>
>> Please consult the developers’ wiki
>>
>> https://wiki.crosswire.org/
>>
>> And consult the page about OSIS Bibles.
>>
>> David
>>
>> On Sat, Sep 30, 2023 at 10:54, Timothy Allen <thristian at gmail.com> wrote:
>>
>>> The Berean Standard Bible is available in two machine-readable formats: USFM, and "translation tables", a 40MB Excel spreadsheet with a row for every Hebrew or Greek word in their chosen source texts alongside the English text it's translated to. I would like to make one module with the nice formatting of the USFM sources and the metadata from the spreadsheet. I've spent the last few weeks writing a script that runs through both in parallel and checks that everything lines up, and I'm now confident that I have an accurate mapping between them.
>>>
>>> My question now is, how can I translate the data from the spreadsheet into OSIS?
>>>
>>> Here's the information the spreadsheet gives me:
>>>
>>> - he_ordinal (e.g. 1): "Hebrew Ordinal"; increments for each spreadsheet row in the Old Testament and is set to 999999 for each row in the New Testament.
>>> - el_ordinal (e.g. 0): "Greek Ordinal"; set to 0 for each row in the Old Testament and increments for each row in the New Testament, except for Mark 1:1, which has a word numbered 18379.5 (presumably something needed to be inserted and they didn't want to renumber everything else).
>>> - en_ordinal (e.g. 1): "English Ordinal"; increments for each spreadsheet row (except for that word in Mark 1:1).
>>> - language (e.g. Hebrew): "Hebrew", "Greek", or sometimes "Aramaic".
>>> - verse_ordinal (e.g. 1): increments for each verse in the Bible, so every word in Genesis 1:1 has "1", etc.
>>> - source_word (e.g. בְּרֵאשִׁ֖ית): the word in the original source text. It sometimes includes fancy brackets to mark sources other than WLC or Nestle 1904: {TR} ⧼RP⧽ (WH) 〈NE〉 [NA] ‹SBL› [[ECM]]
>>> - transliteration (e.g. bə·rê·šîṯ): a transliteration of the source word into the Latin alphabet.
>>> - grammar_code (e.g. Prep-b | N-fs): a code describing the grammatical form of the word. These don't appear to be Robinson codes, but their own custom scheme for Hebrew (https://biblehub.com/hebrewparse.htm) and Greek (https://biblehub.com/abbrev.htm).
>>> - grammar_description (e.g. Preposition-b | Noun - feminine singular): the grammar code, unabbreviated.
>>> - strongs_number (e.g. 7225): the Strong's number of the basic form of this word.
>>> - translation (e.g. In the beginning): the English text that appears in the BSB.
>>> - gloss (e.g. "1) first, beginning, best, chief; 1a) beginning; 1b) first; 1c) chief; 1d) choice part"): a definition from the Brown-Driver-Briggs Hebrew Lexicon, or Thayer's Greek Definitions, as appropriate.
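The bracket markers in `source_word` encode which edition a word comes from, so they would have to be stripped before the word is stored in any attribute. A minimal sketch, assuming that a marked word is wrapped entirely in one bracket pair (not verified against the spreadsheet) and that unmarked words come from WLC or Nestle 1904. `split_source` is a hypothetical helper name.

```python
# Bracket pairs that mark non-default editions in source_word, taken from the
# column notes above. "[[" is checked before "[" so ECM is not read as NA.
MARKERS = [
    ("[[", "]]", "ECM"),
    ("{", "}", "TR"),
    ("⧼", "⧽", "RP"),
    ("(", ")", "WH"),
    ("〈", "〉", "NE"),
    ("[", "]", "NA"),
    ("‹", "›", "SBL"),
]

def split_source(word: str, default: str) -> tuple[str, str]:
    """Return (bare word, edition) for one source_word cell.

    Assumes a marked word is wrapped entirely in a single bracket pair;
    anything else is attributed to the default edition (WLC or Nestle1904).
    """
    text = word.strip()
    for opening, closing, edition in MARKERS:
        wrapped = text.startswith(opening) and text.endswith(closing)
        if wrapped and len(text) > len(opening) + len(closing):
            return text[len(opening):-len(closing)].strip(), edition
    return text, default

print(split_source("{λέγων}", "Nestle1904"))
print(split_source("[[καί]]", "Nestle1904"))
```

If the per-source lexicon option from the question above were chosen, the returned edition name is what would go after `lemma.`; whether that is the right design is still an open question.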
>>>
>>> Looking at the OSIS 2.1.1 User's Manual (and sniffing around in the KJVA module), to represent this information in OSIS I should use the <w> element, which supports the following attributes (copy/pasted from the Manual):
>>>
>>> - gloss Record comments on a particular word or its usage.
>>> - lemma Use to record the base form of a word.
>>> - morph Use to record grammatical information for a word.
>>> - POS Use to record the function of a word according to a particular view of the language's syntax.
>>> - src Use to record origin of the word.
>>> - xlit Use to record a transliteration of a word.
>>>
>>> The first problem is that sometimes multiple source words are translated into a single English span, and it's not made clear how to express that in these attributes. From poking around in the KJVA module, I get the impression these are supposed to be space-delimited lists. Is that correct?
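If these attributes are space-delimited lists, one detail matters: the n-th token of every attribute has to describe the same source word. A value that is empty or contains a space would shift everything after it, which is the same problem the gloss raises further down. A minimal sketch under that assumption; `SourceRow` and `merged_attributes` are hypothetical names, and the rows in the example are placeholders, not spreadsheet data.

```python
from dataclasses import dataclass

@dataclass
class SourceRow:
    """One spreadsheet row, reduced to the fields used here (hypothetical)."""
    lemma: str            # e.g. "strong:H07225"
    transliteration: str  # e.g. "bə·rê·šîṯ"
    position: int         # word position used for src (numbering scheme assumed)

def merged_attributes(rows: list[SourceRow]) -> dict[str, str]:
    """Combine the source words behind one English span into <w> attributes.

    Each attribute is a space-delimited list in row order, so the n-th token
    of lemma, xlit and src all describe the same source word. A value that is
    empty or contains whitespace would break that alignment, so it is rejected.
    """
    for row in rows:
        for value in (row.lemma, row.transliteration):
            if not value or any(ch.isspace() for ch in value):
                raise ValueError(f"cannot use {value!r} as a list token")
    return {
        "lemma": " ".join(row.lemma for row in rows),
        "xlit": " ".join(row.transliteration for row in rows),
        "src": " ".join(str(row.position) for row in rows),
    }

# Placeholder values, not real spreadsheet data.
attrs = merged_attributes([
    SourceRow("strong:G1", "a", 3),
    SourceRow("strong:G2", "b", 4),
])
print(attrs)
```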
>>>
>>> Assuming that's the case, here are my guesses at how to fill out these attributes for each span:
>>>
>>> - gloss can't be done, because each gloss contains spaces which means the displaying app can't figure out which part of the gloss goes with which word
>>> - lemma is where Strong's numbers go; Greek Strong's numbers should be prefixed with "G" and Hebrew/Aramaic ones with "H0"
>>> - morph might be used for the "grammar code" content, but I would probably need to figure out how to translate them into Robinson codes first, since that seems to be the only morphological dictionary module in the CrossWire repositories
>>> - POS is unclear to me; I don't see how it differs from the "morph" attribute
>>> - src is also unclear: is this for the word order (he_ordinal or el_ordinal, possibly numbered from the beginning of the verse rather than the beginning of the entire Bible) or the actual choice of source text (Nestle1904, TR, NA, SBL, etc.)?
>>> - xlit clearly comes from the "transliteration" field
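Two of those guesses are mechanical enough to sketch in Python. This assumes the "G"/"H0" lemma convention observed in the KJVA module, and it assumes src counts source words from the start of the verse, which is only one of the options being asked about. `strongs_lemma` and `source_positions` are hypothetical helper names.

```python
from collections import defaultdict

def strongs_lemma(number: str, language: str) -> str:
    """Format a strongs_number cell as a lemma token.

    Follows the convention observed in the KJVA module: "G" for Greek,
    "H0" for Hebrew and Aramaic. Empty cells give "".
    """
    if not str(number).strip():
        return ""
    prefix = "G" if language == "Greek" else "H0"
    return f"strong:{prefix}{int(float(number))}"

def source_positions(rows):
    """Number source words from 1 within each verse.

    rows holds (verse_ordinal, he_ordinal, el_ordinal) tuples. OT rows have
    el_ordinal 0 and NT rows have he_ordinal 999999, so whichever ordinal is
    meaningful is used. Ordinals are parsed as floats so that a fractional
    value such as 18379.5 sorts between its neighbours.
    """
    by_verse = defaultdict(list)
    for verse, he, el in rows:
        ordinal = float(el) if float(el) != 0 else float(he)
        by_verse[verse].append(ordinal)
    positions = {}
    for verse, ordinals in by_verse.items():
        for i, ordinal in enumerate(sorted(ordinals), start=1):
            positions[(verse, ordinal)] = i
    return positions
```

For example, `strongs_lemma("7225", "Hebrew")` gives `"strong:H07225"`, matching the form seen in the KJVA module. If src turns out to be numbered per chapter or per book instead, only the grouping key in `source_positions` would need to change.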
>>>
>>> One thing that's clearly missing is where to put the source word. How does that work?
>>>
>>> Is there another way to represent information that doesn't fit into the <w> element? I'd like this module to be as useful as possible, so I'm hesitant to throw away any information that can be usefully represented.
>>>
>>> Is there anything else I've missed or misunderstood?
>>>
>>> Timothy.
>>
>> _______________________________________________
>> sword-devel mailing list:
>> sword-devel at crosswire.org
>>
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page