[sword-devel] Creating a version of the BSB module with interlinear support

David Haslam dfhdfh at protonmail.com
Sat Sep 30 06:00:53 EDT 2023


Hi Timothy,

Please consult the developers’ wiki

https://wiki.crosswire.org/

And consult the page about OSIS Bibles.

David

On Sat, Sep 30, 2023 at 10:54, Timothy Allen <thristian at gmail.com> wrote:

> The Berean Standard Bible is available in two machine-readable formats: USFM, and "translation tables", a 40MB Excel spreadsheet with a row for every Hebrew or Greek word in their chosen source texts, alongside the English text it's translated to. I would like to make one module with the nice formatting of the USFM sources and the metadata from the spreadsheet. I've spent the last few weeks writing a script that runs through them both in parallel and makes sure everything lines up, so I'm now confident that I have an accurate mapping between them.
>
> My question now is, how can I translate the data from the spreadsheet into OSIS?
>
> Here's the information the spreadsheet gives me:
>
> Column	Example	Notes
> he_ordinal	1	"Hebrew Ordinal", increments for each spreadsheet row in the Old Testament, set to 999999 for each row in the New Testament
> el_ordinal	0	"Greek Ordinal", set to 0 for each row in the Old Testament, increments for each row in the New Testament, except for Mark 1:1 which has a word with the number 18379.5 (presumably something needed to be inserted and they didn't want to renumber everything else)
> en_ordinal	1	"English Ordinal", increments for each spreadsheet row (except for that word in Mark 1:1)
> language	Hebrew	"Hebrew", "Greek", or sometimes "Aramaic"
> verse_ordinal	1	Increments for each verse in the Bible, so every word in Genesis 1:1 has "1", etc.
> source_word	בְּרֵאשִׁ֖ית	The word in the original source text. Sometimes includes fancy brackets to mark sources other than WLC or Nestle 1904: {TR} ⧼RP⧽ (WH) 〈NE〉 [NA] ‹SBL› [[ECM]]
> transliteration	bə·rê·šîṯ	A transliteration of the source word into the Latin alphabet
> grammar_code	Prep-b | N-fs	A code describing the grammatical form of the word; these don't appear to be Robinson codes, but their own custom thing for Hebrew (https://biblehub.com/hebrewparse.htm) and Greek (https://biblehub.com/abbrev.htm)
> grammar_description	Preposition-b | Noun - feminine singular	The grammar code, unabbreviated
> strongs_number	7225	The Strongs number of the basic form of this word
> translation	In the beginning	The English text that appears in the BSB
> gloss	1) first, beginning, best, chief; 1a) beginning; 1b) first; 1c) chief; 1d) choice part	A definition from the Brown-Driver-Briggs Hebrew Lexicon, or Thayer's Greek Definitions, as appropriate
>
> Looking at the OSIS 2.1.1 User's Manual (and sniffing around in the KJVA module), to represent this information in OSIS I should use the <w> element, which supports the following attributes (copy/pasted from the Manual):
>
> - gloss Record comments on a particular word or its usage.
> - lemma Use to record the base form of a word.
> - morph Use to record grammatical information for a word.
> - POS Use to record the function of a word according to a particular view of the language's syntax.
> - src Use to record origin of the word.
> - xlit Use to record a transliteration of a word.
>
> The first problem is that sometimes multiple source words are translated into a single English span, and it's not made clear how to express that in these attributes. From poking around in the KJVA module, I get the impression these are supposed to be space-delimited lists. Is that correct?
>
> Assuming that's the case, here are my guesses at how to fill out these attributes for each span:
>
> - gloss can't be done, because each gloss contains spaces which means the displaying app can't figure out which part of the gloss goes with which word
> - lemma is where Strongs numbers go; Greek Strongs numbers should be prefixed with "G" and Hebrew/Aramaic ones with "H0"
> - morph might be used for the "grammar code" content, but I would probably need to figure out how to translate them into Robinson codes first, since that seems to be the only morphological dictionary module in the Crosswire repositories
> - POS is unclear to me; I don't see how it differs from the "morph" attribute
> - src is also unclear: is this for the word order (he_ordinal or el_ordinal, possibly numbered from the beginning of the verse rather than the beginning of the entire Bible) or the actual choice of source text (Nestle1904, TR, NA, SBL, etc.)?
> - xlit clearly comes from the "transliteration" field
>
> One thing that's clearly missing is where to put the source word. How does that work?
>
> Is there another way to represent information that doesn't fit into the <w> element? I'd like this module to be as useful as possible, so I'm hesitant to toss out any information that can be usefully represented.
>
> Is there anything else I've missed or misunderstood?
>
> Timothy.
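
For illustration only, the attribute mapping guessed at in the quoted message could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a confirmed OSIS or SWORD convention: the function names are hypothetical, the row dictionaries simply reuse the spreadsheet column names, the "strong:" prefix on lemma values is an assumption not stated in the message, and the space-delimited lists are Timothy's unconfirmed impression from the KJVA module.

```python
from xml.sax.saxutils import escape, quoteattr


def strongs_lemma(number, language):
    """Format one Strong's number using the prefixes guessed in the message."""
    # "G" for Greek, "H0" for Hebrew/Aramaic, applied literally as described.
    prefix = "G" if language == "Greek" else "H0"
    # ASSUMPTION: the "strong:" work prefix is not stated in the message.
    return f"strong:{prefix}{number}"


def w_element(rows, text):
    """Build one <w> element for an English span backed by one or more rows.

    `rows` is a list of dicts keyed by the spreadsheet column names
    (language, strongs_number, transliteration). Hypothetical helper.
    """
    # ASSUMPTION: multiple source words become space-delimited lists.
    lemma = " ".join(
        strongs_lemma(r["strongs_number"], r["language"]) for r in rows
    )
    xlit = " ".join(r["transliteration"] for r in rows)
    # quoteattr() escapes the value and adds the surrounding quotes.
    return f"<w lemma={quoteattr(lemma)} xlit={quoteattr(xlit)}>{escape(text)}</w>"


# The Genesis 1:1 example row from the table in the message.
row = {
    "language": "Hebrew",
    "strongs_number": 7225,
    "transliteration": "bə·rê·šîṯ",
}
print(w_element([row], "In the beginning"))
```

The sketch leaves out gloss, morph, POS, and src, since the message itself is unsure how to fill them. It also applies the "H0" prefix literally, so whether shorter Hebrew numbers need further zero padding is left open.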