[sword-devel] Creating a version of the BSB module with interlinear support

Mon Oct 2 06:42:01 EDT 2023


On 02/10/2023 at 09:38, Timothy Allen wrote:
>
> Ah, thanks. I did look at that page when I started making my module, 
> but I'd forgotten about it by the time I needed this more detailed 
> advice. Thanks for reminding me! Using this to update the guesses from 
> my original message:
>
> gloss
>     I *might* be able to try grabbing the first word from the
>     BDB/Thayer gloss, but that seems error-prone and I probably won't
>     bother unless somebody really wants it
> lemma
>     This should be used for Strong's numbers, marked up as
>     "strong:G123" or "strong:H123", but could also be used for storing
>     the original source text as "lemma.BSB:בְּרֵאשִׁ֖ית" if we assume
>     a hypothetical lexicon that indexes all the words in the BSB.
> morph
>     This should be used for Robinson morphology codes, so I should not
>     bother with this until I can figure out how to translate the BSB's
>     codes to Robinson ones. The wiki page also has "strongMorph" codes
>     in its examples, but I can't find any extra information on what
>     system this might refer to. Apparently there aren't any Hebrew
>     morphology lexicons available for SWORD; maybe someday I could
>     make one?
>

For Hebrew, we have the OSHM module.
>
> POS
>     Still unclear to me; it's not mentioned on the wiki page
> src
>     Apparently this is for word order in the source language, but it's
>     not at all clear where "word 1" is. The start of the <w> element?
>     The start of the verse? The start of the chapter? The start of the
>     book? The start of the Bible? Does it not matter, because
>     front-ends are intended to just sort the words they have?
> xlit
>     Still for the transliteration; simple enough.
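The *lemma* and *xlit* guesses above could be turned into a `<w>` element roughly as follows. This is a minimal Python sketch using the standard `xml.etree.ElementTree` module; the helper names are hypothetical, the space-separated lemma list follows the impression from the KJVA module rather than a confirmed rule, and the optional "lemma.BSB:" entry mirrors the hypothetical lexicon discussed in this message. Zero-padding of Hebrew numbers (the "H0" form guessed in the first message) is left as an off-by-default option because this thread does not settle whether front-ends need it.

```python
import xml.etree.ElementTree as ET


def strongs_ref(number, language, hebrew_width=0):
    """Format a Strong's number as a lemma value such as "strong:G3056".

    hebrew_width > 0 zero-pads Hebrew/Aramaic numbers (5 gives "H07225");
    whether SWORD front-ends need that padding is an assumption to check.
    """
    prefix = "G" if language == "Greek" else "H"  # Aramaic uses Hebrew numbers
    digits = str(int(number))  # the spreadsheet may yield 7225.0
    if prefix == "H" and hebrew_width:
        digits = digits.zfill(hebrew_width)
    return f"strong:{prefix}{digits}"


def make_w(text, number, language, xlit, source_word=None, lexicon="BSB"):
    """Build one <w> element whose lemma values are space-separated."""
    lemmas = [strongs_ref(number, language)]
    if source_word:
        # Hypothetical per-module lexicon key, as in "lemma.BSB:..."
        lemmas.append(f"lemma.{lexicon}:{source_word}")
    w = ET.Element("w", {"lemma": " ".join(lemmas), "xlit": xlit})
    w.text = text
    return w


if __name__ == "__main__":
    w = make_w("In the beginning", 7225, "Hebrew", "bə·rê·šîṯ",
               source_word="בְּרֵאשִׁ֖ית")
    print(ET.tostring(w, encoding="unicode"))
```

Keeping the Strong's formatting in its own function makes it easy to switch to padded numbers later without touching the element-building code.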
>
> According to the wiki page, there's also an "n" attribute not 
> mentioned in the official OSIS docs, which is for "marking enumerated 
> words". I don't know what this means, and the wiki page doesn't 
> include any examples. I'm going to guess I don't need it.
>
>
> Do I have all that right? Is there anything I've misunderstood?
>
> Also, would it be better to have "lemma.BSB:בְּרֵאשִׁ֖ית" and use the 
> same "BSB" lexicon for every word in the entire text, or would it be 
> more appropriate to use "lemma.WLC:בְּרֵאשִׁ֖ית" and use different 
> lexicons to indicate the different sources used for the translation 
> (Nestle1904, TR, NA, SBL, etc.)?
>
>
> Timothy
>
>
> On 30/9/23 20:00, David Haslam wrote:
>> Hi Timothy,
>>
>> Please consult the developers’ wiki
>>
>> https://wiki.crosswire.org/
>>
>> And consult the page about OSIS Bibles.
>>
>> David
>>
>> Sent from Proton Mail <https://proton.me/mail/home> for iOS
>>
>>
>> On Sat, Sep 30, 2023 at 10:54, Timothy Allen <thristian at gmail.com> wrote:
>>>
>>> The Berean Standard Bible is available in two machine-readable 
>>> formats: USFM, and "translation tables", a 40MB Excel spreadsheet 
>>> with a row for every Hebrew or Greek word in their chosen source 
>>> texts with the English text it's translated to. I would like to make 
>>> one module that combines the nice formatting of the USFM sources with 
>>> the metadata from the spreadsheet, so I've spent the last few weeks 
>>> writing a script that runs through both in parallel and checks that 
>>> everything lines up. I'm now confident that I have an accurate 
>>> mapping between them.
>>>
>>> My question now is, how can I translate the data from the 
>>> spreadsheet into OSIS?
>>>
>>> Here's the information the spreadsheet gives me:
>>>
>>> Column (example): notes
>>>
>>>   * *he_ordinal* (1): "Hebrew Ordinal"; increments for each
>>>     spreadsheet row in the Old Testament and is set to 999999 for
>>>     each row in the New Testament
>>>   * *el_ordinal* (0): "Greek Ordinal"; set to 0 for each row in the
>>>     Old Testament and increments for each row in the New Testament,
>>>     except for Mark 1:1, which has a word numbered 18379.5
>>>     (presumably something needed to be inserted and they didn't want
>>>     to renumber everything else)
>>>   * *en_ordinal* (1): "English Ordinal"; increments for each
>>>     spreadsheet row (except for that word in Mark 1:1)
>>>   * *language* (Hebrew): "Hebrew", "Greek", or sometimes "Aramaic"
>>>   * *verse_ordinal* (1): increments for each verse in the Bible, so
>>>     every word in Genesis 1:1 has "1", etc.
>>>   * *source_word* (בְּרֵאשִׁ֖ית): the word in the original source text;
>>>     sometimes includes fancy brackets to mark sources other than WLC
>>>     or Nestle 1904: {TR} ⧼RP⧽ (WH) 〈NE〉 [NA] ‹SBL› [[ECM]]
>>>   * *transliteration* (bə·rê·šîṯ): a transliteration of the source
>>>     word into the Latin alphabet
>>>   * *grammar_code* (Prep-b | N-fs): a code describing the grammatical
>>>     form of the word; these don't appear to be Robinson codes, but
>>>     their own custom scheme for Hebrew
>>>     (https://biblehub.com/hebrewparse.htm) and Greek
>>>     (https://biblehub.com/abbrev.htm)
>>>   * *grammar_description* (Preposition-b | Noun - feminine singular):
>>>     the grammar code, unabbreviated
>>>   * *strongs_number* (7225): the Strong's number of the basic form of
>>>     this word
>>>   * *translation* (In the beginning): the English text that appears
>>>     in the BSB
>>>   * *gloss*: a definition from the Brown-Driver-Briggs Hebrew Lexicon,
>>>     or Thayer's Greek Definitions, as appropriate; for example:
>>>
>>>         1) first, beginning, best, chief
>>>         1a) beginning
>>>         1b) first
>>>         1c) chief
>>>         1d) choice part
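Whichever attribute ends up recording the source text, the bracket markers on *source_word* will probably need to be separated from the word itself. A minimal Python sketch, assuming each marker pair wraps an entire cell (if the spreadsheet opens a bracket on one row and closes it on a later one, a stateful pass over consecutive rows would be needed instead); the `split_source` name and the source labels it returns are hypothetical:

```python
# Bracket pairs listed in the spreadsheet notes. "[[" must be tried
# before "[" so that ECM readings are not mistaken for NA ones.
BRACKETS = [
    ("[[", "]]", "ECM"),
    ("{", "}", "TR"),
    ("\u29fc", "\u29fd", "RP"),   # ⧼ ⧽
    ("(", ")", "WH"),
    ("\u3008", "\u3009", "NE"),   # 〈 〉
    ("[", "]", "NA"),
    ("\u2039", "\u203a", "SBL"),  # ‹ ›
]


def split_source(word, language):
    """Return (bare_word, source_text) for one source_word cell.

    Assumes a bracket pair wraps the whole cell; unbracketed words are
    attributed to the base text named in the notes (WLC or Nestle 1904).
    """
    word = word.strip()
    for opener, closer, source in BRACKETS:
        inner_len = len(word) - len(opener) - len(closer)
        if inner_len > 0 and word.startswith(opener) and word.endswith(closer):
            return word[len(opener):len(word) - len(closer)], source
    base = "Nestle1904" if language == "Greek" else "WLC"  # Hebrew or Aramaic
    return word, base


if __name__ == "__main__":
    for cell in ["λόγος", "{καὶ}", "[[καὶ]]", "\u2039καὶ\u203a"]:
        print(cell, split_source(cell, "Greek"))
```

Ordering the table from most to least specific opener is what keeps `[[...]]` from being read as `[...]`.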
>>>
>>> Looking at the OSIS 2.1.1 User's Manual (and sniffing around in the 
>>> KJVA module), to represent this information in OSIS I should use the 
>>> <w> element, which supports the following attributes (copy/pasted 
>>> from the Manual):
>>>
>>>   * *gloss* Record comments on a particular word or its usage.
>>>   * *lemma* Use to record the base form of a word.
>>>   * *morph* Use to record grammatical information for a word.
>>>   * *POS* Use to record the function of a word according to a
>>>     particular view of the language's syntax.
>>>   * *src* Use to record origin of the word.
>>>   * *xlit* Use to record a transliteration of a word.
>>>
>>> The first problem is that sometimes multiple source words are 
>>> translated into a single English span, and it's not made clear how 
>>> to express that in these attributes. From poking around in the KJVA 
>>> module, I get the impression these are supposed to be 
>>> space-delimited lists. Is that correct?
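If the space-delimited reading is right (an impression from the KJVA module, not something confirmed in this thread), several spreadsheet rows that share one English span could be folded into one set of attribute values. A minimal Python sketch; the `span_attributes` name and the `strongs`/`xlit` dictionary keys are hypothetical:

```python
def span_attributes(words):
    """Merge several source words sharing one English span into the
    attribute values of a single <w> element.

    Each word is a dict with hypothetical keys 'strongs' (e.g. "H7225")
    and 'xlit'. Every attribute lists the words in the same order, so a
    front-end could pair values up by position.
    """
    for w in words:
        for value in (w["strongs"], w["xlit"]):
            if any(ch.isspace() for ch in value):
                # Internal spaces would make positional pairing ambiguous.
                raise ValueError(f"whitespace inside value: {value!r}")
    return {
        "lemma": " ".join(f"strong:{w['strongs']}" for w in words),
        "xlit": " ".join(w["xlit"] for w in words),
    }


if __name__ == "__main__":
    print(span_attributes([
        {"strongs": "H853", "xlit": "et"},
        {"strongs": "H8064", "xlit": "hashamayim"},
    ]))
```

Rejecting values that contain whitespace is the same constraint that makes *gloss* unusable in this scheme.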
>>>
>>> Assuming that's the case, here are my guesses at how to fill out these 
>>> attributes for each span:
>>>
>>>   * *gloss* can't be done, because each gloss contains spaces which
>>>     means the displaying app can't figure out which part of the
>>>     gloss goes with which word
>>>   * *lemma* is where Strong's numbers go; Greek Strong's numbers
>>>     should be prefixed with "G" and Hebrew/Aramaic ones with "H0"
>>>   * *morph* might be used for the "grammar code" content, but I
>>>     would probably need to figure out how to translate them into
>>>     Robinson codes first, since that seems to be the only
>>>     morphological dictionary module in the Crosswire repositories
>>>   * *POS* is unclear to me; I don't see how it differs from the
>>>     "morph" attribute
>>>   * *src* is also unclear: is this for the word order (he_ordinal or
>>>     el_ordinal, possibly numbered from the beginning of the verse
>>>     rather than the beginning of the entire Bible) or the actual
>>>     choice of source text (Nestle1904, TR, NA, SBL, etc.)?
>>>   * *xlit* clearly comes from the "transliteration" field
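Of the *src* readings above, "word order counted from the start of the verse" is easy to derive from the spreadsheet. A minimal Python sketch, assuming each row is a dictionary with hypothetical `verse_ordinal` and `ordinal` keys, where `ordinal` holds he_ordinal for Old Testament rows and el_ordinal for New Testament rows; whether SWORD front-ends expect this numbering is not settled in this thread:

```python
from collections import defaultdict


def src_positions(rows):
    """Map each (verse_ordinal, ordinal) pair to a 1-based position in its verse.

    Each row is a dict with hypothetical keys 'verse_ordinal' and 'ordinal';
    'ordinal' holds he_ordinal (Old Testament) or el_ordinal (New Testament).
    """
    by_verse = defaultdict(list)
    for row in rows:
        by_verse[row["verse_ordinal"]].append(row["ordinal"])
    positions = {}
    for verse, ordinals in by_verse.items():
        # Rank rather than subtract, so fractional ordinals such as
        # 18379.5 still receive whole-number positions.
        for rank, ordinal in enumerate(sorted(ordinals), start=1):
            positions[(verse, ordinal)] = rank
    return positions


if __name__ == "__main__":
    # Illustrative rows in English order; source order differs.
    rows = [
        {"verse_ordinal": 1, "ordinal": 2},
        {"verse_ordinal": 1, "ordinal": 1},
        {"verse_ordinal": 2, "ordinal": 9},
        {"verse_ordinal": 2, "ordinal": 8.5},
    ]
    for (verse, ordinal), pos in sorted(src_positions(rows).items()):
        print(verse, ordinal, pos)
```

Ranking the ordinals, rather than subtracting the verse's first ordinal, keeps the fractional 18379.5 in Mark 1:1 from producing a non-integer position.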
>>>
>>> One thing that's clearly missing is where to put the source word. 
>>> How does that work?
>>>
>>> Is there another way to represent information that doesn't fit into 
>>> the <w> element? I'd like this module to be as useful as possible, 
>>> so I'm hesitant to toss out any information that can be usefully 
>>> represented.
>>>
>>> Is there anything else I've missed or misunderstood?
>>>
>>>
>>> Timothy.
>>>
>>
>> _______________________________________________
>> sword-devel mailing list:sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
> _______________________________________________
> sword-devel mailing list:sword-devel at crosswire.org
> http://crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://crosswire.org/pipermail/sword-devel/attachments/20231002/7ffd6ad7/attachment-0001.htm>