[sword-devel] Creating a version of the BSB module with interlinear support
Fr Cyrille
fr.cyrille at tiberiade.be
Mon Oct 2 06:42:01 EDT 2023
On 02/10/2023 at 09:38, Timothy Allen wrote:
>
> Ah, thanks. I did look at that page when I started making my module,
> but I'd forgotten about it by the time I needed this more detailed
> advice. Thanks for reminding me! Using this to update the guesses from
> my original message:
>
> gloss
>     I *might* be able to try grabbing the first word from the
>     BDB/Thayer gloss, but that seems error-prone and I probably won't
>     bother unless somebody really wants it
> lemma
>     This should be used for Strongs numbers, marked up as
>     "strong:G123" or "strong:H123", but could also be used for storing
>     the original source text as "lemma.BSB:בְּרֵאשִׁ֖ית" if we assume
>     a hypothetical lexicon that indexes all the words in the BSB.
> morph
>     This should be used for Robinson morphology codes, so I should not
>     bother with this until I can figure out how to translate the BSB's
>     codes to Robinson ones. The wiki page also has "strongMorph" codes
>     in its examples, but I can't find any extra information on what
>     system this might refer to. Apparently there aren't any Hebrew
>     morphology lexicons available for SWORD; maybe someday I could
>     make one?
>
For Hebrew, we have the OSHM module.
>
> POS
>     Still unclear to me, it's not mentioned on the wiki page
> src
>     Apparently this is for word order in the source language, but it's
>     not at all clear where "word 1" is. The start of the <w> element?
>     The start of the verse? The start of the chapter? The start of the
>     book? The start of the Bible? Does it not matter, because
>     front-ends are intended to just sort the words they have?
> xlit
>     Still for the transliteration, simply enough.
>
> According to the wiki page, there's also an "n" attribute not
> mentioned in the official OSIS docs, which is for "marking enumerated
> words". I don't know what this means, and the wiki page doesn't
> include any examples. I'm going to guess I don't need it.
>
>
> Do I have all that right? Is there anything I've misunderstood?
>
> Also, would it be better to have "lemma.BSB:בְּרֵאשִׁ֖ית" and use the
> same "BSB" lexicon for every word in the entire text, or would it be
> more appropriate to use "lemma.WLC:בְּרֵאשִׁ֖ית" and use different
> lexicons to indicate the different sources used for the translation
> (Nestle1904, TR, NA, SBL, etc.)?
>
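To make the two lemma options concrete, here is a minimal sketch of building one <w> element from the spreadsheet columns. The function name make_w, the tuple layout, and the "lexicon" parameter are hypothetical; the "strong:" and "lemma.BSB:" value shapes are simply the ones quoted above, and the sketch assumes a source word never contains a space. The exact value conventions should be checked against the wiki page.

```python
import xml.etree.ElementTree as ET


def make_w(text, words, lexicon="BSB"):
    """Build one <w> element for an English span (hypothetical helper).

    `words` is a list of (language, strongs_number, source_word, xlit)
    tuples, one per source word translated by this span.
    """
    lemma_parts = []
    for language, strongs, source_word, _xlit in words:
        # Assumption: Aramaic rows share the Hebrew "H" prefix.
        prefix = "G" if language == "Greek" else "H"
        lemma_parts.append(f"strong:{prefix}{strongs}")
        lemma_parts.append(f"lemma.{lexicon}:{source_word}")

    w = ET.Element("w")
    # Several source words become one space-delimited attribute value.
    w.set("lemma", " ".join(lemma_parts))
    w.set("xlit", " ".join(word[3] for word in words))
    w.text = text
    return w


row = ("Hebrew", 7225, "בְּרֵאשִׁ֖ית", "bə·rê·šîṯ")
w = make_w("In the beginning", [row])
print(ET.tostring(w, encoding="unicode"))
```

Passing lexicon="WLC" (or another source label) instead of the default gives the second variant from the question, so the choice between one lexicon and several stays a one-argument change.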
>
> Timothy
>
>
> On 30/9/23 20:00, David Haslam wrote:
>> Hi Timothy,
>>
>> Please consult the developers’ wiki
>>
>> https://wiki.crosswire.org/
>>
>> And consult the page about OSIS Bibles.
>>
>> David
>>
>>
>>
>> On Sat, Sep 30, 2023 at 10:54, Timothy Allen <thristian at gmail.com> wrote:
>>>
>>> The Berean Standard Bible is available in two machine-readable
>>> formats: USFM, and "translation tables", a 40MB Excel spreadsheet
>>> with a row for every Hebrew or Greek word in their chosen source
>>> texts with the English text it's translated to. I would like to make
>>> one module with the nice formatting of the USFM sources and the
>>> metadata from the spreadsheet, so I've spent the last few weeks
>>> writing a script that runs through them both in parallel and makes
>>> sure everything lines up, so I'm now confident that I have an
>>> accurate mapping between them.
>>>
>>> My question now is, how can I translate the data from the
>>> spreadsheet into OSIS?
>>>
>>> Here's the information the spreadsheet gives me:
>>>
>>> Column: he_ordinal
>>> Example: 1
>>> Notes: "Hebrew Ordinal", increments for each spreadsheet row in the Old
>>>     Testament, set to 999999 for each row in the New Testament
>>>
>>> Column: el_ordinal
>>> Example: 0
>>> Notes: "Greek Ordinal", set to 0 for each row in the Old Testament,
>>>     increments for each row in the New Testament, except for Mark 1:1
>>>     which has a word with the number 18379.5 (presumably something
>>>     needed to be inserted and they didn't want to renumber everything else)
>>>
>>> Column: en_ordinal
>>> Example: 1
>>> Notes: "English Ordinal", increments for each spreadsheet row (except for
>>>     that word in Mark 1:1)
>>>
>>> Column: language
>>> Example: Hebrew
>>> Notes: "Hebrew", "Greek", or sometimes "Aramaic"
>>>
>>> Column: verse_ordinal
>>> Example: 1
>>> Notes: Increments for each verse in the Bible, so every word in Genesis
>>>     1:1 has "1", etc.
>>>
>>> Column: source_word
>>> Example: בְּרֵאשִׁ֖ית
>>> Notes: The word in the original source text. Sometimes includes fancy
>>>     brackets to mark sources other than WLC or Nestle 1904: {TR} ⧼RP⧽
>>>     (WH) 〈NE〉 [NA] ‹SBL› [[ECM]]
>>>
>>> Column: transliteration
>>> Example: bə·rê·šîṯ
>>> Notes: A transliteration of the source word into the Latin alphabet
>>>
>>> Column: grammar_code
>>> Example: Prep-b | N-fs
>>> Notes: A code describing the grammatical form of the word; these don't
>>>     appear to be Robinson codes, but their own custom thing for Hebrew
>>>     (https://biblehub.com/hebrewparse.htm) and Greek
>>>     (https://biblehub.com/abbrev.htm)
>>>
>>> Column: grammar_description
>>> Example: Preposition-b | Noun - feminine singular
>>> Notes: The grammar code, unabbreviated
>>>
>>> Column: strongs_number
>>> Example: 7225
>>> Notes: The Strongs number of the basic form of this word
>>>
>>> Column: translation
>>> Example: In the beginning
>>> Notes: The English text that appears in the BSB
>>>
>>> Column: gloss
>>> Example: 1) first, beginning, best, chief
>>>     1a) beginning
>>>     1b) first
>>>     1c) chief
>>>     1d) choice part
>>> Notes: A definition from the Brown-Driver-Briggs Hebrew Lexicon, or
>>>     Thayer's Greek Definitions, as appropriate
>>>
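The bracket markers described for the source_word column could be separated from the word itself along these lines. This is only a sketch: MARKERS and split_source are hypothetical names, and it assumes that a marker pair wraps the whole content of a single cell, which the description above does not confirm (a bracketed passage might just as well span several rows).

```python
# Bracket pairs as listed in the source_word notes. "[[" comes first so
# that it is tested before the single "[" used for NA.
MARKERS = [
    ("[[", "]]", "ECM"),
    ("{", "}", "TR"),
    ("⧼", "⧽", "RP"),
    ("(", ")", "WH"),
    ("〈", "〉", "NE"),
    ("[", "]", "NA"),
    ("‹", "›", "SBL"),
]


def split_source(word, default=None):
    """Return (source_label, bare_word) for one source_word cell.

    `default` is returned as the label for unmarked words, i.e. those
    taken from the base text (WLC or Nestle 1904).
    """
    word = word.strip()
    for opener, closer, label in MARKERS:
        if word.startswith(opener) and word.endswith(closer):
            return label, word[len(opener):-len(closer)].strip()
    return default, word
```

The returned label could then feed whichever attribute ends up recording the choice of source text.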
>>> Looking at the OSIS 2.1.1 User's Manual (and sniffing around in the
>>> KJVA module), to represent this information in OSIS I should use the
>>> <w> element, which supports the following attributes (copy/pasted
>>> from the Manual):
>>>
>>> * *gloss* Record comments on a particular word or its usage.
>>> * *lemma* Use to record the base form of a word.
>>> * *morph* Use to record grammatical information for a word.
>>> * *POS* Use to record the function of a word according to a
>>> particular view of the language's syntax.
>>> * *src* Use to record origin of the word.
>>> * *xlit* Use to record a transliteration of a word.
>>>
>>> The first problem is that sometimes multiple source words are
>>> translated into a single English span, and it's not made clear how
>>> to express that in these attributes. From poking around in the KJVA
>>> module, I get the impression these are supposed to be
>>> space-delimited lists. Is that correct?
>>>
>>> Assuming that's the case, here are my guesses at how to fill out these
>>> attributes for each span:
>>>
>>> * *gloss* can't be done, because each gloss contains spaces which
>>> means the displaying app can't figure out which part of the
>>> gloss goes with which word
>>> * *lemma* is where Strongs numbers go; Greek Strongs numbers
>>> should be prefixed with "G" and Hebrew/Aramaic ones with "H0"
>>> * *morph* might be used for the "grammar code" content, but I
>>> would probably need to figure out how to translate them into
>>> Robinson codes first, since that seems to be the only
>>> morphological dictionary module in the Crosswire repositories
>>> * *POS* is unclear to me, I don't see how it differs from the
>>> "morph" attribute
>>> * *src* is also unclear: is this for the word order (he_ordinal or
>>> el_ordinal, possibly numbered from the beginning of the verse
>>> rather than the beginning of the entire Bible) or the actual
>>> choice of source text (Nestle1904, TR, NA, SBL, etc.)?
>>> * *xlit* clearly comes from the "transliteration" field
>>>
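If word order does turn out to be numbered from the start of each verse, the Bible-wide ordinals can be renumbered mechanically. The sketch below only shows that renumbering; whether "src" is the right home for the result is exactly the open question. The name verse_positions and the (verse_ordinal, source_ordinal) pair layout are hypothetical.

```python
from itertools import groupby


def verse_positions(rows):
    """Renumber source-language ordinals so each verse starts at 1.

    `rows` is an iterable of (verse_ordinal, source_ordinal) pairs, where
    source_ordinal is he_ordinal or el_ordinal. Returns a list of
    (verse_ordinal, source_ordinal, position) tuples in source order.
    """
    # Sorting by (verse, ordinal) puts words in source-language order,
    # and a fractional ordinal such as 18379.5 falls into place naturally.
    ordered = sorted(rows)
    result = []
    for verse, group in groupby(ordered, key=lambda row: row[0]):
        for position, (_, ordinal) in enumerate(group, start=1):
            result.append((verse, ordinal, position))
    return result
```

Because the positions come from sorting rather than subtraction, gaps or inserted half-numbers in the original ordinals do not leak into the per-verse numbering.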
>>> One thing that's clearly missing is where to put the source word.
>>> How does that work?
>>>
>>> Is there any other way to represent information that doesn't fit into
>>> the <w> element? I'd like this module to be as useful as possible,
>>> so I'm hesitant to toss out any information that can be usefully
>>> represented.
>>>
>>> Is there anything else I've missed or misunderstood?
>>>
>>>
>>> Timothy.
>>>
>>
>> _______________________________________________
>> sword-devel mailing list:sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page