[sword-devel] OSHB 2.1: valid content for the lemma attribute ?
David Haslam
dfhdfh at protonmail.com
Tue Dec 29 04:28:04 EST 2020
Thanks Daniel.
Happy New Year!
David
Sent from ProtonMail Mobile
On Tue, Dec 29, 2020 at 09:20, Daniel Owens <dcowens76 at gmail.com> wrote:
> Good question. Looking at their readme file, this bit is relevant:
>
> *****
> *TOTHT - Tyndale OT Hebrew Tagged text
> <https://github.com/tyndale/STEPBible-Data>*
> The Leningrad codex based on Westminster via OpenScriptures, with full
> morphological and semantic tags for all words, prefixes and suffixes.
> Semantic tags use the extended Strongs linked to BDB by OS, is
> backwardly compatible with simple Strongs tags and includes all affixes
> (as defined in TBESH).
>
> *****
>
> If OS = Open Scriptures, then perhaps they got their Strongs linkings
> from us. That is at least plausible.
>
> Daniel
>
> On 12/28/20 8:15 PM, David Haslam wrote:
>> David,
>>
>> How do the Open Scriptures extensions to Strong’s numbers relate (if
>> at all) to the augmented Strong’s numbers documented by Tyndale House,
>> Cambridge and implemented in STEP Bible ?
>>
>> https://github.com/tyndale/STEPBible-Data
>>
>> Best regards,
>>
>> David Haslam
>>
>> Sent from ProtonMail Mobile
>>
>>
>> On Mon, Dec 28, 2020 at 12:54, pierre amadio <amadio.pierre at gmail.com> wrote:
>>> Hello.
>>>
>>> I received the following feedback from Daniel Owens:
>>>
>>> #############
>>> When creating the OSHB, we ran into the problem that Strong's numbers
>>> did not have a place for a number of prefixed lemmas, including the
>>> inseparable prepositions and the vav conjunction. So we created some
>>> additional Strong's "numbers" to be able to mark up such lemmas. You
>>> will notice that in the morph attribute, there are two parsings, "HR"
>>> for "Hebrew Preposition" and "HNcfsa" for "Hebrew Noun common feminine
>>> singular absolute". The preposition is the prefixed bet (בְּ). I hope
>>> that answers your question.
>>> #############
>>>
>>> I understand the logic behind the choice, but it looks to me as though
>>> this is not behaving as expected with the Sword engine.
>>>
>>> Hebrew can express in a single word things that require several words
>>> in English.
>>> In my previous mail mentioning Genesis 1:1, bereshit (in a beginning)
>>> is made up of 2 semantic units, be/reshit:
>>>
>>> Excerpt from morphhb/oxlos-import/wlc.txt (which I think is the "raw"
>>> text used to build the module) from
>>> https://github.com/openscriptures/morphhb
>>> Gen 1:1.1 7225 בְּ/רֵאשִׁ֖ית
>>>
>>> Here we can see that / is used as a separator (or is it a reverse \ ?
>>> :-) )
>>>
>>> Let's look at an example with 3 elements, from Genesis 12:1: "Now
>>> the Lord had said unto Abram, Get thee out of thy country".
>>> The "out of your country" is a single word: from/earth-land/yours
>>> me-artze-kha
>>> Excerpt from morphhb's wlc.txt
>>> Gen 12:1.7 776 מֵ/אַרְצְ/ךָ֥
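The two wlc.txt lines quoted above can be split mechanically. The following is a minimal Python sketch, assuming every line has the shape "Book chapter:verse.word number word" exactly as in the two excerpts; the real file may contain variations this does not handle. The names WlcEntry and parse_wlc_line are illustrative, not part of morphhb.

```python
import re
from typing import List, NamedTuple


class WlcEntry(NamedTuple):
    ref: str             # e.g. "Gen 12:1.7"
    strong: str          # e.g. "776" (kept as text, not converted)
    segments: List[str]  # the word split on the "/" separator


# Assumed line shape, taken only from the two excerpts in this thread.
LINE_RE = re.compile(r"^(\S+ \d+:\d+\.\d+)\s+(\S+)\s+(\S+)$")


def parse_wlc_line(line: str) -> WlcEntry:
    """Split one wlc.txt line into reference, number and word segments."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised line: %r" % line)
    ref, strong, word = match.groups()
    return WlcEntry(ref, strong, word.split("/"))


if __name__ == "__main__":
    for raw in ("Gen 1:1.1 7225 בְּ/רֵאשִׁ֖ית", "Gen 12:1.7 776 מֵ/אַרְצְ/ךָ֥"):
        entry = parse_wlc_line(raw)
        print(entry.ref, entry.strong, len(entry.segments))
```

With these two lines, the first word yields two segments and the second yields three, while each line carries only one number.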
>>>
>>> If I look at this word with
>>> diatheke -b OSHB -o avlmn -f OSIS -k Genesis 12:1
>>>
>>> OSHB 1.4
>>> <w lemma="strong:H0776" morph="oshm:HR/Ncbsc/Sp2ms">מֵאַרְצְךָ</w>
>>> OSHB 2.1
>>> <w lemma="strong:Hm strong:H0776" morph="oshm:HR/Ncbsc/Sp2ms">מֵאַרְצְךָ</w>
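To compare the two versions side by side, the lemma and morph attributes can be pulled apart with Python's standard xml.etree.ElementTree. This is only a sketch: it assumes the bare <w> fragments exactly as pasted above (no namespaces, no surrounding verse markup), and parse_w is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strip_prefix(token):
    """Drop a leading 'strong:' or 'oshm:' style prefix if one is present."""
    prefix, sep, rest = token.partition(":")
    return rest if sep else token


def parse_w(xml_fragment):
    """Return the text, lemma tokens and morph segments of one <w> element."""
    elem = ET.fromstring(xml_fragment)
    lemmas = [strip_prefix(t) for t in elem.get("lemma", "").split()]
    morph = strip_prefix(elem.get("morph", "")).split("/")
    return {"text": elem.text, "lemmas": lemmas, "morph": morph}


if __name__ == "__main__":
    v14 = '<w lemma="strong:H0776" morph="oshm:HR/Ncbsc/Sp2ms">מֵאַרְצְךָ</w>'
    v21 = '<w lemma="strong:Hm strong:H0776" morph="oshm:HR/Ncbsc/Sp2ms">מֵאַרְצְךָ</w>'
    for label, fragment in (("1.4", v14), ("2.1", v21)):
        print(label, parse_w(fragment))
```

For the 1.4 fragment this gives one lemma token (H0776); for the 2.1 fragment it gives two (Hm and H0776). The morph segments are the same three in both.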
>>>
>>> I see several problems:
>>>
>>> 1) As with bereshit (in a beginning), the Strong's number for the
>>> prefix is not a number and does not exist in the Strong's dictionary.
>>> This will probably result in unexpected behaviour from frontends
>>> trying to show Strong's number definitions.
>>>
>>> 2) With the Genesis 12:1 example, the resulting XML node mentions only
>>> 2 elements (from/earth) מֵ/אַרְצְ and omits the "yours".
>>> I would have expected 3 entries מֵ/אַרְצְ/ךָ֥ in order to be consistent
>>> with how things are displayed with bereshit.
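Both points can be checked mechanically on the attribute values shown earlier. The sketch below assumes that a "plain" Strong's key is H or G followed only by digits, and that morph segments are separated by "/"; check_word is a hypothetical name and is not part of Sword or OSHB.

```python
import re

# Assumption: a plain Strong's key is "H" or "G" followed by digits only.
NUMERIC_STRONG = re.compile(r"^[HG]\d+$")


def strip_prefix(token):
    """Drop a leading 'strong:' or 'oshm:' style prefix if one is present."""
    prefix, sep, rest = token.partition(":")
    return rest if sep else token


def check_word(lemma_attr, morph_attr):
    """Report non-numeric lemma keys and the lemma/morph segment counts."""
    lemmas = [strip_prefix(t) for t in lemma_attr.split()]
    segments = strip_prefix(morph_attr).split("/")
    return {
        "non_numeric": [k for k in lemmas if not NUMERIC_STRONG.match(k)],
        "lemma_count": len(lemmas),
        "morph_count": len(segments),
    }


if __name__ == "__main__":
    print(check_word("strong:Hm strong:H0776", "oshm:HR/Ncbsc/Sp2ms"))
```

For the OSHB 2.1 value from Genesis 12:1 this reports "Hm" as the only non-numeric key, with 2 lemma entries against 3 morph segments.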
>>>
>>> I tried to see whether this had an effect with diatheke, searching
>>> for Strong's entry H0776:
>>>
>>> OSHB 1.4
>>> /usr/local/sword/bin/diatheke -b OSHB -s lucene -r Genesis -k "lemma:H0776"
>>> 252 matches
>>> /usr/local/sword/bin/diatheke -b OSHB -s attribute -r Genesis -k "Word//Lemma./H0776/"
>>> 292 matches
>>>
>>> OSHB 2.1
>>> /usr/local/sword/bin/diatheke -b OSHB -s lucene -r Genesis -k "lemma:H0776"
>>> 252 matches
>>> /usr/local/sword/bin/diatheke -b OSHB -s attribute -r Genesis -k "Word//Lemma./H0776/"
>>> 252 matches
>>>
>>> It looks like diatheke still finds the entry. What I do not
>>> understand is why the attribute search with version 1.4 finds 292
>>> matches instead of 252 (which all the other searches seem to agree
>>> on).
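One way to narrow down the 292 versus 252 difference would be to compare the two result lists directly. The sketch below does not explain the discrepancy. It assumes the verse references from each search have already been extracted into Python lists by some other means, and compare_hits is a hypothetical helper. It uses Counter so that a reference listed more than once in one result set still shows up as a difference.

```python
from collections import Counter


def compare_hits(hits_a, hits_b):
    """Return (only_in_a, only_in_b), keeping repeated references."""
    count_a, count_b = Counter(hits_a), Counter(hits_b)
    only_a = sorted((count_a - count_b).elements())
    only_b = sorted((count_b - count_a).elements())
    return only_a, only_b


if __name__ == "__main__":
    # Placeholder references for illustration only, not real search output.
    attribute_hits = ["Genesis 1:1", "Genesis 1:2", "Genesis 1:2"]
    lucene_hits = ["Genesis 1:1", "Genesis 1:2"]
    print(compare_hits(attribute_hits, lucene_hits))
```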
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page