[sword-devel] OSISMorphSegmentation

David Haslam dfhdfh at protonmail.com
Sat Dec 30 10:19:12 MST 2017


Hi DM,

Oh - indeed - I wholly agree.

The OpenScriptures morphhb team has used the solidus in a manner that goes against the whole philosophy of OSIS.
It's our job to help steer them back to the right track, as we probably have more expertise in OSIS than most other organisations put together.
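
For illustration only, here is a minimal Python sketch of the kind of conversion that would move the segmentation out of the text and into markup. It takes one w element in the morphhb style (like the Genesis 1:1 example quoted further down) and rewrites the solidus-separated text as seg elements of type x-morph. The function name split_morphemes is hypothetical, and the sketch assumes a single un-namespaced w fragment with no child elements; a real OSIS file would also need namespace handling.

```python
import xml.etree.ElementTree as ET


def split_morphemes(w_xml: str, seg_type: str = "x-morph") -> str:
    """Rewrite 'a/b' text inside one <w> element as <seg> children.

    Assumption: the fragment is a single <w> with text only (no child
    elements). Attributes such as lemma and morph are left untouched,
    even though they also contain a solidus.
    """
    w = ET.fromstring(w_xml)
    parts = (w.text or "").split("/")
    # Only rewrite words that really contain a separator and have no children.
    if len(parts) > 1 and len(w) == 0:
        w.text = None
        for part in parts:
            seg = ET.SubElement(w, "seg", {"type": seg_type})
            seg.text = part
    return ET.tostring(w, encoding="unicode")


if __name__ == "__main__":
    sample = '<w lemma="b/7225" n="1.0" morph="HR/Ncfsa">בְּ/רֵאשִׁ֖ית</w>'
    print(split_morphemes(sample))
```

Words without a solidus are returned unchanged in this sketch; whether single-morpheme words should also be wrapped in a seg is a separate decision.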

btw. Although I'm not a Hebraist myself, it's still fair to say that I've been looking at various Biblical Hebrew text sources on and off since February 2014.

It's not as if they are unaware that XML elements such as seg can be used for particular semantic purposes.
Here's a counted list of seg elements with a type attribute.

00004 <seg type="x-large">
42577 <seg type="x-maqqef">
02278 <seg type="x-paseq">
01181 <seg type="x-pe">
00009 <seg type="x-reversednun">
01981 <seg type="x-samekh">
00003 <seg type="x-small">
23192 <seg type="x-sof-pasuq">
00004 <seg type="x-suspended">
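
One way such a tally could be produced is sketched below in Python. The function names count_seg_types and format_counts are illustrative assumptions, and the regular expression assumes that type is the first attribute in each seg start tag and that values are double-quoted.

```python
import re
from collections import Counter
from typing import List

# Matches the start of a seg tag and captures the value of its type attribute.
SEG_TYPE = re.compile(r'<seg\s+type="([^"]+)"')


def count_seg_types(osis_text: str) -> Counter:
    """Count how often each seg type value occurs in the given OSIS text."""
    return Counter(SEG_TYPE.findall(osis_text))


def format_counts(counts: Counter) -> List[str]:
    """Format counts as zero-padded lines, sorted by type name."""
    return [f'{n:05d} <seg type="{t}">' for t, n in sorted(counts.items())]


if __name__ == "__main__":
    sample = (
        '<w>a</w><seg type="x-maqqef">-</seg><w>b</w>'
        '<seg type="x-sof-pasuq">:</seg>'
    )
    for line in format_counts(count_seg_types(sample)):
        print(line)
```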

Best regards,

David

Sent with [ProtonMail](https://protonmail.com) Secure Email.

> -------- Original Message --------
> Subject: Re: [sword-devel] OSISMorphSegmentation
> Local Time: 30 December 2017 4:25 PM
> UTC Time: 30 December 2017 16:25
> From: dmsmith at crosswire.org
> To: David Haslam <dfhdfh at protonmail.com>, SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
>
> I wish you had started with this. I find it more satisfying to know what the end goal is. I had thought you merely wanted to update the wiki.
>
> I’m wondering how a “morphological segment” should be rendered to an end user. I presume that it is of value to a student of Hebrew. Is it merely highlighting the segment differently than an adjacent one, or are there other attributes that should be provided to the end user, e.g. Strong’s Number, part of speech, …?
>
> Having / in the OSIS as a semantic marker is bad, bad, bad. Text in XML shouldn’t have to be parsed. Never. You are right to suggest a different markup. The problem with a milestone is that it marks the “between” but not the segment.
>
> The incompletely supported markup for SWORD is <seg type="morph">text</seg> or <seg type="x-morph">text</seg>.
>
> Not sure how it should be indexed. At least by whole word. Perhaps also by segment?
>
> In Him,
> DM
>
>> On Dec 30, 2017, at 3:54 AM, David Haslam <dfhdfh at protonmail.com> wrote:
>>
>> Thanks everyone for each "pearl of wisdom".
>>
>> It wasn't that I didn't understand what Hebrew morpheme segmentation is.
>> That's pretty evident from examining the content of the modules concerned.
>>
>> Nor was it that I am unfamiliar with use of the seg element in OSIS XML.
>> I've been using OSIS for long enough to have seen this many times.
>>
>> My concern is that either we have a listed filter that seems to do nothing,
>> or the module[s] where it's specified do not match what the filter expects.
>>
>> Neither scenario is good.
>>
>> Recently, I've been looking at the [openscriptures/morphhb](https://github.com/openscriptures/morphhb) project on GitHub.
>> Aside: I even issued a pull request yesterday.
>> btw. Some of the team members are known to us.
>>
>> They have adopted a simpler method to separate morpheme segments.
>> They just use a solidus as a separator character within the Hebrew word, e.g.
>> <w lemma="b/7225" n="1.0" morph="HR/Ncfsa">בְּ/רֵאשִׁ֖ית</w>
>> That's fine in the raw XML but it would look very odd to a Hebrew reader.
>>
>> When the openscriptures team gets round to rebuilding the OSHB module, we require a filter that works.
>> It need not look for the OSIS feature that doesn't seem to have any effect in our defective WLC module.
>>
>> My proposal would be to provide a simple mechanism in SWORD to hide or display a specified marker.
>>
>> Their present OSIS files could be converted by replacing each solidus in the Hebrew text with the following XML element.
>> <milestone type="x-mss" marker="/" />
>> The type attribute proposed is merely an abbreviation for "morpheme segment separator".
>>
>> We already have something like this in SWORD.
>> When the KJV module switches to paragraphs, the Pilcrow signs disappear.
>> They are coded as milestone markers.
>>
>> We could generalise the concept by having a filter called
>> GlobalOptionFilter=OSISMilestoneMarker
>>
>> Who knows how many further good uses it might find?
>>
>> Best regards,
>>
>> David
>>
>>> -------- Original Message --------
>>> Subject: Re: [sword-devel] OSISMorphSegmentation
>>> Local Time: 30 December 2017 12:03 AM
>>> UTC Time: 30 December 2017 00:03
>>> From: scribe at crosswire.org
>>> To: sword-devel at crosswire.org
>>>
>>> A few brief points:
>>>
>>> The logs have this as the initial commit:
>>>
>>> commit ecaac871e4fa607a32d81f1049e928795db4eaa1
>>> Author: chrislit chrislit at bcd7d363-81e1-0310-97ec-a550e20fc99c
>>> Date:   Wed Jan 11 19:45:21 2006 +0000
>>>
>>>     added OSISMorphSegmentation files (from BibleTime) to repository;
>>>     not integrated into projects/make system yet
>>>
>>>     git-svn-id: https://crosswire.org/svn/sword/trunk@1884 bcd7d363-81e1-0310-97ec-a550e20fc99c
>>>
>>> Maybe the BibleTime team can lend a little info on the original intent.
>>>
>>> I did a little work on it nearly 4 years ago:
>>>
>>> commit 0eda5565f50a1a6b22b4b96e147e81b04e88b859
>>> Author: scribe scribe at bcd7d363-81e1-0310-97ec-a550e20fc99c
>>> Date:   Mon Apr 14 16:22:11 2014 +0000
>>>
>>>     fixed osismorphsegmentation to look for both type=morph and x-morph
>>>     fixed close seg to check inMorph before processing </seg> as close morph
>>>
>>>     git-svn-id: https://crosswire.org/svn/sword/trunk@3153 bcd7d363-81e1-0310-97ec-a550e20fc99c
>>>
>>> ... which I believe was in reference to Daniel Owen's work with the WHM
>>> database:
>>>
>>> http://crosswire.org/~dowens76/swordweb/parallelstudy.jsp?add=KJV&add=WHM&key=Gen.1.1
>>>
>>> On 12/29/2017 04:34 PM, Tom Sullivan wrote:
>>>
>>>> DM:
>>>> There may be a terminology problem here.
>>>> Re:
>>>> <seg type="x-morph">הַ</seg>
>>>> The letter He is used as the definite article and it is prepended to
>>>> the word. Example using English, L to R: "The Land" would be He-Eretz.
>>>> Hebrew also appends pronominal suffixes, so perhaps those are
>>>> segments as well. The pronominal suffixes also have meaning on their
>>>> own.
>>>> Highly inflected languages can be a bear for English speakers, so it
>>>> would make some sense to parse out the word. I am no Hebrew scholar
>>>> and cannot recall all of the exact terminology that should be used.
>>>> We could use some help here from someone whose Hebrew is fresh in
>>>> their mind. Correct terminology and a bit more explanation on all of
>>>> these kinds of options would help.
>>>> All of us who are programmers should take heed from this issue. One
>>>> should not have to decipher code to know about the inputs and outputs.
>>>> Tom
>>>>
>>>> Tom Sullivan
>>>> info at BeForgiven.INFO
>>>> FAX: 815-301-2835
>>>>
>>>> Great News!
>>>> God created you, owns you and gave you commands to obey.
>>>> You have disobeyed God - as your conscience very well attests to you.
>>>> God's holiness and justice compel Him to punish you in Hell.
>>>> Jesus Christ became Man, was crucified, buried and rose from the dead
>>>> as a substitute for all who trust in Him, redeeming them from Hell.
>>>> If you repent (turn from your sin) and believe (trust) in Jesus Christ,
>>>> you will go to Heaven. Otherwise you will go to Hell.
>>>> Warning! Good works are a result, not cause, of saving trust.
>>>> More info is at [www.esig.beforgiven.info](http://www.esig.beforgiven.info/)
>>>> Do you believe this? Copy this signature into your email program
>>>> and use the Internet to spread the Great News every time you email.
>>>> On 12/29/2017 06:12 PM, DM Smith wrote:
>>>>
>>>>> I have no idea. I can read and write C++, but it’s been over 20 years
>>>>> since I did it on a regular basis. I’m not interested in trying to
>>>>> decipher the code or what Chris L. had in mind. Just glancing at the
>>>>> code it says it pertains to WLC and it has Morph and Segmentation in
>>>>> the name. That’s quite a clue.
>>>>> The code has a construct I’ve seen with respect to footnotes and Strong’s
>>>>> numbers, though I don’t know what it does or how it is used. (Within a
>>>>> verse, buf is set to 1 for the first seg, 2 for the next, and so on;
>>>>> tagText is the text content of the seg element.)
>>>>> module->getEntryAttributes()["Morpheme"][buf]["body"] = tagText;
>>>>> If it parallels footnotes, strongs, … then perhaps it is a numerical
>>>>> superscript that when clicked on brings up the segment. I don’t think
>>>>> that makes sense. Unless someone can make sense of it, I don’t think
>>>>> it’s worthy of documenting in the wiki.
>>>>> Perhaps the following is a clue. It is the content of Genesis 1:1.
>>>>> <w><seg type="x-morph">בְּ</seg><seg type="x-morph">רֵאשִׁ֖ית</seg></w>
>>>>> <w><seg type="x-morph">בָּרָ֣א</seg></w>
>>>>> <w><seg type="x-morph">אֱלֹהִ֑ים</seg></w>
>>>>> <w><seg type="x-morph">אֵ֥ת</seg></w>
>>>>> <w><seg type="x-morph">הַ</seg><seg type="x-morph">שָּׁמַ֖יִם</seg></w>
>>>>> <w><seg type="x-morph">וְ</seg><seg type="x-morph">אֵ֥ת</seg></w>
>>>>> <w><seg type="x-morph">הָ</seg><seg type="x-morph">אָֽרֶץ׃</seg></w>
>>>>> It appears that each w (aka word) is made up of one or more seg
>>>>> (segments). Each segment is marked as x-morph. While I took 7 credits
>>>>> of Biblical Hebrew, I don’t remember a lick of it. I’m guessing that
>>>>> a segment is part of the word that has meaning on its own.
>>>>> DM
>>>>>
>>>>>> On Dec 29, 2017, at 5:14 PM, David Haslam <dfhdfh at protonmail.com> wrote:
>>>>>> I know it's still the holiday season, yet I would still like someone
>>>>>> to explain what difference in output we should see
>>>>>> when the OSISMorphSegmentation filter is applied.
>>>>>> There are modules which have this specified in the .conf file, yet
>>>>>> I've not seen any discernable difference in what (e.g.) Xiphos
>>>>>> displays when this module option is ticked.
>>>>>> /Is that too much to ask?/
>>>>>> Best regards,
>>>>>> David
>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [sword-devel] OSISMorphSegmentation
>>>>>>> Local Time: 26 December 2017 12:10 PM
>>>>>>> UTC Time: 26 December 2017 12:10
>>>>>>> From: dfhdfh at protonmail.com
>>>>>>> To: sword-devel mailing list <sword-devel at crosswire.org>
>>>>>>> All very well if you're a C++ programmer, but "as clear as mud" to
>>>>>>> those like me that aren't.
>>>>>>> What exactly is the intended difference in output with the filter
>>>>>>> enabled?
>>>>>>> Where segments of a Hebrew word are in different seg elements, what
>>>>>>> should I expect to see at the locations where the OSIS has
>>>>>>> </seg><seg.+> ?
>>>>>>> Does the filter insert a space or some other character as a
>>>>>>> separator between consecutive segments?
>>>>>>> Best regards,
>>>>>>> David
>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [sword-devel] OSISMorphSegmentation
>>>>>>>> Local Time: 25 December 2017 3:16 PM
>>>>>>>> UTC Time: 25 December 2017 15:16
>>>>>>>> From: dmsmith at crosswire.org
>>>>>>>> To: David Haslam <dfhdfh at protonmail.com>, SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
>>>>>>>> All of the filters are in the folder
>>>>>>>> http://www.crosswire.org/svn/sword/trunk/src/modules/filters/
>>>>>>>> Each filter has a corresponding file whose name is in lowercase
>>>>>>>> with the extension cpp.
>>>>>>>> See:
>>>>>>>> http://www.crosswire.org/svn/sword/trunk/src/modules/filters/osismorphsegmentation.cpp
>>>>>>>> From the code:
>>>>>>>> SWFilter descendant to toggle splitting of
>>>>>>>> morphemes (for morpheme segmented Hebrew in
>>>>>>>> the WLC)
>>>>>>>>
>>>>>>>>> On Dec 25, 2017, at 9:23 AM, David Haslam <dfhdfh at protonmail.com> wrote:
>>>>>>>>> I want to update the wiki page for OSIS Bibles
>>>>>>>>> https://crosswire.org/wiki/OSIS_Bibles#Marking_morpheme_segmentation.
>>>>>>>>> Please would someone explain exactly what is looked for in the
>>>>>>>>> OSIS XML for SWORD to actually filter something in the module for
>>>>>>>>> GlobalOptionFilter=OSISMorphSegmentation
>>>>>>>>> /We seem to have overlooked the documentation requirements
>>>>>>>>> since I first enquired almost 4 years ago/.
>>>>>>>>> Best regards,
>>>>>>>>> David
>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>> Instructions to unsubscribe/change your settings at above page