[sword-devel] OSISMorphSegmentation
David Haslam
dfhdfh at protonmail.com
Sat Dec 30 01:54:51 MST 2017
Thanks everyone for each "pearl of wisdom".
It wasn't that I didn't understand what Hebrew morpheme segmentation is.
That's pretty evident from examining the content of the modules concerned.
Nor was it that I am unfamiliar with use of the seg element in OSIS XML.
I've been using OSIS for long enough to have seen this many times.
My concern is that we have a listed filter that either does nothing,
or is specified in module[s] whose markup does not match what the filter expects.
Neither scenario is good.
Recently, I've been looking at the [openscriptures/morphhb](https://github.com/openscriptures/morphhb) project on GitHub.
Aside: I even issued a pull request yesterday.
btw. Some of the team members are known to us.
They have adopted a simpler method to separate morpheme segments.
They just use a solidus as a separator character within the Hebrew word, e.g.
<w lemma="b/7225" n="1.0" morph="HR/Ncfsa">בְּ/רֵאשִׁ֖ית</w>
That's fine in the raw XML but it would look very odd to a Hebrew reader.
When the openscriptures team gets round to rebuilding the OSHB module, we require a filter that works.
It need not look for the OSIS feature that doesn't seem to have any effect in our defective WLC module.
My proposal would be to provide a simple mechanism in SWORD to hide or display a specified marker.
Their present OSIS files could be converted by replacing each solidus in the Hebrew text with the following XML element:
<milestone type="x-mss" marker="/" />
The type attribute proposed is merely an abbreviation for "morpheme segment separator".
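As a rough illustration of that conversion (a sketch only, not part of any existing tool), the Python below replaces each solidus in the text content of a w element with the proposed milestone, leaving the lemma and morph attributes untouched. The function name solidus_to_milestone is hypothetical, and the regular expression assumes that each w element contains plain text with no nested elements, as in the example above.

```python
import re

# The milestone element proposed above.
MILESTONE = '<milestone type="x-mss" marker="/" />'

# Matches <w ...>text</w> where the content has no nested elements.
# Group 1 is the start tag (attributes included), group 2 the text.
W_ELEMENT = re.compile(r'(<w\b[^>]*>)([^<]*)(</w>)')


def solidus_to_milestone(osis: str) -> str:
    """Replace each solidus in the text of a <w> element with a milestone.

    Solidi inside attribute values (lemma, morph) are part of the start
    tag and are therefore left alone.
    """
    def convert(match: re.Match) -> str:
        start_tag, text, end_tag = match.groups()
        return start_tag + text.replace('/', MILESTONE) + end_tag

    return W_ELEMENT.sub(convert, osis)


# Usage with the word quoted earlier in this message.
example = '<w lemma="b/7225" n="1.0" morph="HR/Ncfsa">בְּ/רֵאשִׁ֖ית</w>'
converted = solidus_to_milestone(example)
```

Because a converted w element now contains a nested milestone, running the function a second time leaves it unchanged.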
We already have something like this in SWORD.
When the KJV module switches to paragraphs, the Pilcrow signs disappear.
They are coded as milestone markers.
We could generalise the concept by having a filter called
GlobalOptionFilter=OSISMilestoneMarker
Who knows how many further good uses it might find?
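To make the idea concrete, here is a minimal sketch in Python of what such an option might do to the text. It is not SWORD code (a real filter would live in the C++ engine), the function name render_markers and its parameters are hypothetical, and the pattern assumes the type attribute comes before marker, as in the element proposed above.

```python
import re


def render_markers(osis: str, marker_type: str, show: bool) -> str:
    """Show or hide the markers carried by milestones of one type.

    With show=True each matching milestone is replaced by its marker
    character; with show=False the milestone is dropped entirely.
    Milestones of any other type are left as they are.
    """
    pattern = re.compile(
        r'<milestone\s+type="' + re.escape(marker_type)
        + r'"\s+marker="([^"]*)"\s*/>'
    )
    return pattern.sub(lambda m: m.group(1) if show else '', osis)


# Usage: the same text with the option switched on and off.
text = '<w>ab<milestone type="x-mss" marker="/" />cd</w>'
shown = render_markers(text, "x-mss", show=True)    # '<w>ab/cd</w>'
hidden = render_markers(text, "x-mss", show=False)  # '<w>abcd</w>'
```

Passing the milestone type as a parameter is what would let one mechanism serve several kinds of marker.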
Best regards,
David
Sent with [ProtonMail](https://protonmail.com) Secure Email.
> -------- Original Message --------
> Subject: Re: [sword-devel] OSISMorphSegmentation
> Local Time: 30 December 2017 12:03 AM
> UTC Time: 30 December 2017 00:03
> From: scribe at crosswire.org
> To: sword-devel at crosswire.org
>
> A few brief points:
>
> The logs have this as the initial commit:
>
> commit ecaac871e4fa607a32d81f1049e928795db4eaa1
> Author: chrislit chrislit at bcd7d363-81e1-0310-97ec-a550e20fc99c
>
> Date: Wed Jan 11 19:45:21 2006 +0000
>
> added OSISMorphSegmentation files (from BibleTime) to repository;
> not integrated into projects/make system yet
>
> git-svn-id: https://crosswire.org/svn/sword/trunk@1884
> bcd7d363-81e1-0310-97ec-a550e20fc99c
>
> Maybe the BibleTime team can lend a little info on the original intent.
>
> I did a little work on it over 3 years ago:
>
> commit 0eda5565f50a1a6b22b4b96e147e81b04e88b859
> Author: scribe scribe at bcd7d363-81e1-0310-97ec-a550e20fc99c
> Date: Mon Apr 14 16:22:11 2014 +0000
>
> fixed osismorphsegmentation to look for both type=morph and x-morph
> fixed close seg to check inMorph before processing </seg> as close morph
>
> git-svn-id: https://crosswire.org/svn/sword/trunk@3153
> bcd7d363-81e1-0310-97ec-a550e20fc99c
>
> ... which I believe was in reference to Daniel Owen's work with the WHM
> database:
>
> http://crosswire.org/~dowens76/swordweb/parallelstudy.jsp?add=KJV&add=WHM&key=Gen.1.1
>
> On 12/29/2017 04:34 PM, Tom Sullivan wrote:
>
>> DM:
>> There may be a terminology problem here.
>> Re:
>> <seg type="x-morph">הַ</seg>
>> The letter He is used as the definite article and it is prepended to
>> the word. Example using English, L to R: "The Land" would be He-Eretz.
>> Hebrew also appends pronominal suffixes, so perhaps those are
>> segments as well. The pronominal suffixes also have meaning on their
>> own.
>> Highly inflected languages can be a bear for English speakers, so it
>> would make some sense to parse out the word. I am no Hebrew scholar
>> and cannot recall all of the exact terminology that should be used.
>> We could use some help here from someone whose Hebrew is fresh in
>> their mind. Correct terminology and a bit more explanation on all of
>> these kinds of options would help.
>> All of us who are programmers should take heed from this issue. One
>> should not have to decipher code to know about the inputs and outputs.
>> Tom
>>
>> Tom Sullivan
>> info at BeForgiven.INFO
>> FAX: 815-301-2835
>>
>> Great News!
>> God created you, owns you and gave you commands to obey.
>> You have disobeyed God - as your conscience very well attests to you.
>> God's holiness and justice compel Him to punish you in Hell.
>> Jesus Christ became Man, was crucified, buried and rose from the dead
>> as a substitute for all who trust in Him, redeeming them from Hell.
>> If you repent (turn from your sin) and believe (trust) in Jesus Christ,
>> you will go to Heaven. Otherwise you will go to Hell.
>> Warning! Good works are a result, not cause, of saving trust.
>> More info is at www.esig.beforgiven.info
>> Do you believe this? Copy this signature into your email program
>> and use the Internet to spread the Great News every time you email.
>> On 12/29/2017 06:12 PM, DM Smith wrote:
>>
>>> I have no idea. I can read and write C++, but it’s been over 20 years
>>> since I did it on a regular basis. I’m not interested in trying to
>>> decipher the code or what Chris L. had in mind. Just glancing at the
>>> code it says it pertains to WLC and it has Morph and Segmentation in
>>> the name. That’s quite a clue.
>>> The code has a construct I’ve seen with respect to footnotes and Strong’s
>>> numbers, though I don’t know what it does or how it is used. (Within a
>>> verse, buf is set to 1 for the first seg, 2 for the next, and so on;
>>> tagText is the text content of the seg element.)
>>> module->getEntryAttributes()["Morpheme"][buf]["body"] = tagText;
>>> If it parallels footnotes, strongs, … then perhaps it is a numerical
>>> superscript that when clicked on brings up the segment. I don’t think
>>> that makes sense. Unless someone can make sense of it, I don’t think
>>> it’s worthy of documenting in the wiki.
>>> Perhaps the following is a clue. It is the content of Genesis 1:1.
>>> <w><seg type="x-morph">בְּ</seg><seg type="x-morph">רֵאשִׁ֖ית</seg></w>
>>> <w><seg type="x-morph">בָּרָ֣א</seg></w>
>>> <w><seg type="x-morph">אֱלֹהִ֑ים</seg></w>
>>> <w><seg type="x-morph">אֵ֥ת</seg></w>
>>> <w><seg type="x-morph">הַ</seg><seg type="x-morph">שָּׁמַ֖יִם</seg></w>
>>> <w><seg type="x-morph">וְ</seg><seg type="x-morph">אֵ֥ת</seg></w>
>>> <w><seg type="x-morph">הָ</seg><seg type="x-morph">אָֽרֶץ׃</seg></w>
>>> It appears that each w (aka word) is made up of one or more seg
>>> (segments). Each segment is marked as x-morph. While I took 7 credits
>>> of Biblical Hebrew, I don’t remember a lick of it. I’m guessing that
>>> a segment is part of the word that has meaning on its own.
>>> DM
>>>
>>>> On Dec 29, 2017, at 5:14 PM, David Haslam <dfhdfh at protonmail.com> wrote:
>>>> I know it's still the holiday season, yet I would still like to have
>>>> it explained what is the difference in output that we should see
>>>> when the OSISMorphSegmentation filter is applied.
>>>> There are modules which have this specified in the .conf file, yet
>>>> I've not seen any discernible difference in what (e.g.) Xiphos
>>>> displays when this module option is ticked.
>>>> /Is that too much to ask?/
>>>> Best regards,
>>>> David
>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [sword-devel] OSISMorphSegmentation
>>>>> Local Time: 26 December 2017 12:10 PM
>>>>> UTC Time: 26 December 2017 12:10
>>>>> From: dfhdfh at protonmail.com
>>>>> To: sword-devel mailing list <sword-devel at crosswire.org>
>>>>> All very well if you're a C++ programmer, but "as clear as mud" to
>>>>> those like me that aren't.
>>>>> What exactly is the intended difference in output with the filter
>>>>> enabled?
>>>>> Where segments of a Hebrew word are in different seg elements, what
>>>>> should I expect to see at the locations where the OSIS has
>>>>> </seg><seg.+> ?
>>>>> Does the filter insert a space or some other character as a
>>>>> separator between consecutive segments?
>>>>> Best regards,
>>>>> David
>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [sword-devel] OSISMorphSegmentation
>>>>>> Local Time: 25 December 2017 3:16 PM
>>>>>> UTC Time: 25 December 2017 15:16
>>>>>> From: dmsmith at crosswire.org
>>>>>> To: David Haslam <dfhdfh at protonmail.com>, SWORD Developers' Collaboration
>>>>>> Forum <sword-devel at crosswire.org>
>>>>>> All of the filters are in the folder
>>>>>> http://www.crosswire.org/svn/sword/trunk/src/modules/filters/
>>>>>> Each filter has a corresponding file whose name is in lowercase
>>>>>> with the extension of cpp.
>>>>>> See:
>>>>>> http://www.crosswire.org/svn/sword/trunk/src/modules/filters/osismorphsegmentation.cpp
>>>>>> From the code:
>>>>>> SWFilter descendant to toggle splitting of
>>>>>> morphemes (for morpheme segmented Hebrew in
>>>>>> the WLC)
>>>>>>
>>>>>>> On Dec 25, 2017, at 9:23 AM, David Haslam <dfhdfh at protonmail.com> wrote:
>>>>>>> I want to update the wiki page for OSIS Bibles
>>>>>>> https://crosswire.org/wiki/OSIS_Bibles#Marking_morpheme_segmentation.
>>>>>>> Please would someone explain exactly what is looked for in the
>>>>>>> OSIS XML for SWORD to actually filter something in the module for
>>>>>>> GlobalOptionFilter=OSISMorphSegmentation
>>>>>>> /We seem to have overlooked the documentation requirements
>>>>>>> since I first enquired almost 4 years ago/.
>>>>>>> Best regards,
>>>>>>> David
>>>>>>> ---------------------------------------------------------------
>>>>>>>
>>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>> Instructions to unsubscribe/change your settings at above page