[sword-devel] Localized parsing symbols [was: C++ volunteer]

Troy A. Griffitts scribe at crosswire.org
Tue May 28 09:50:25 MST 2019


On 5/28/19 9:24 AM, Cyrille wrote:
>
>
> Il 28/05/2019 17:40, Troy A. Griffitts ha scritto:
>>
>> So, a little background surrounding why the logic is difficult to
>> work out a solution for this problem:
>>
>> The current verse parser, which works fairly well, always has 3 sets
>> of possibilities in view:
>>
>> OSISRef
>> Current Locale
>> English
>>
>> The parser needs to handle any of these three, typically in the
>> preference order listed above.  The issue with changing out symbols
>> while parsing is that some symbols (notoriously the comma) are used
>> for different purposes across these 3 sets.
>>
>> One might think that localized output might be easier than parsing,
>> e.g., once parsed, we could at least output the reference: Jn 3,16. 
>> The problem here is that what the engine outputs it also expects to
>> be able to parse.
>>
>> While we would like to solve this problem, it isn't as simple as
>> adding to the locale files:
>>
>> ChapterVerseSeparator=,
>>
>> RangeSeparator=-
>>
>> ListSeparator=.
>>
>> This would be enough to define the locale, but not solve the
>> problem.  We would need a fundamental change in how parsing is done,
>> e.g., explicitly telling the parser, "Hey, I'm sending you localized
>> input, so don't guess.  You can count on the symbols I'm sending you
>> to be localized."  Right now everyone has the convenience of just
>> passing any of the 3 sets of parsing text listed above and the parser
>> just figuring it out-- with the caveat that chapter, range, and list
>> separators are not localizable.
>>
>> Hope this gives some background,
>>
> Yes, thank you, but I just don't understand why it is already possible
> with two separators (. and : ) but not with one more? Maybe I
> can't understand it because it is too hard (technically) for me ;)

Because ':' is unambiguous across all three.  '.' retains its OSIS
semantics and is the book and chapter separator in all three.  The
problem comes when you have an entry like:

Jn 3,16

Currently, the parser will understand this as John chapter 3 and chapter
16.  You obviously would want this to be chapter 3, verse 16.  The
question is: when does the parser decide it is a list separator, and when
does it decide it is a chapter/verse separator?  It is ambiguous unless
you tell the parser: "This is strictly locale formatted text.  It is not
OSIS.  It is not English."
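
To make the ambiguity concrete, here is a minimal C++ sketch.  It is not
SWORD code: the Separators and Ref structs and parseNumbers() are
hypothetical names invented for illustration, and it only handles the
numeric part that follows the book name.  The point is only that the
caller declares which symbol set is in use, so the parser never has to
guess what a comma means.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types for illustration only; none of this is SWORD API.
struct Separators {
    char chapterVerse;  // ':' in English, ',' in the locale discussed here
    char list;          // ',' in English, '.' in the locale discussed here
};

struct Ref {
    int chapter;
    int verse;  // 0 means "the whole chapter"
};

// Parses the numeric part of a reference using ONLY the separators the
// caller declares.  Nothing is guessed from the text itself.
std::vector<Ref> parseNumbers(const std::string &text, const Separators &seps) {
    // If both roles shared one symbol, the input would be ambiguous again.
    assert(seps.chapterVerse != seps.list);

    std::vector<Ref> refs;
    int chapter = 0;          // chapter most recently established
    int number = 0;           // number currently being read
    bool haveNumber = false;  // true once at least one digit was read
    bool inVerse = false;     // true once a chapter/verse separator was seen

    // Turns the number read so far into a reference.
    auto emit = [&]() {
        if (!haveNumber) return;
        if (inVerse) refs.push_back(Ref{chapter, number});
        else refs.push_back(Ref{number, 0});
        number = 0;
        haveNumber = false;
    };

    for (char c : text) {
        if (c >= '0' && c <= '9') {
            number = number * 10 + (c - '0');
            haveNumber = true;
        } else if (c == seps.chapterVerse) {
            chapter = number;  // what we just read was a chapter
            number = 0;
            haveNumber = false;
            inVerse = true;
        } else if (c == seps.list) {
            emit();            // end of one list item
        }
        // Any other character (spaces etc.) is ignored in this sketch.
    }
    emit();
    return refs;
}
```

Called with Separators{':', ','} (English style), the text "3,16" comes
back as two whole-chapter references, 3 and 16.  Called with
Separators{',', '.'} (the locale style from the keys proposed earlier),
the same text comes back as a single reference: chapter 3, verse 16.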

Hope this helps clarify.

Troy


>> Troy
>>
>>
>> On 5/28/19 6:10 AM, David Haslam wrote:
>>> OK - but my observations were not entirely irrelevant. 
>>>
>>> Some front-ends never need the user to enter a reference in an edit
>>> box. Navigation is done entirely via menu selections or clicking
>>> search results etc. 
>>> AFAICT this is true of PocketSword. 
>>>
>>> Other front-ends are designed at the opposite extreme. All
>>> navigation is done through an edit box. This is true (eg) of STEP
>>> Bible. 
>>>
>>> Best regards,
>>>
>>> David. 
>>>
>>> Sent from ProtonMail Mobile
>>>
>>>
>>> On Tue, May 28, 2019 at 13:54, refdoc at gmx.net <refdoc at gmx.net
>>> <mailto:refdoc at gmx.net>> wrote:
>>>> Sorry, David, that is a complete misunderstanding. Modules need
>>>> osisref. There is and will be no need to do anything to the
>>>> modules. This is about the engine parser to read references locale
>>>> appropriately.
>>>>
>>>> Sent from my mobile. Please forgive shortness, typos and weird
>>>> autocorrects.
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [sword-devel] C++ volunteer
>>>> From: David Haslam
>>>> To: SWORD Developers' Collaboration Forum
>>>> CC:
>>>>
>>>>
>>>>     Parsing native references is not a simple task, as we know from
>>>>     the fact that adyeth's orefs.py was kicked into touch indefinitely.
>>>>
>>>>     And that’s even when punctuation marks are defined in the
>>>>     specified configuration file. 
>>>>
>>>>     Unless we might consider the possibility of adding keys to
>>>>     module .conf files that define the module specific
>>>>     native reference punctuation marks and separators. 
>>>>
>>>>     That could be a huge undertaking, considering the need to
>>>>     maintain backwards compatibility. 
>>>>
>>>>     And it’s not as if it really is module specific entirely. A
>>>>     user can be switching between modules with different languages,
>>>>     yet would need the current reference to always work, no matter
>>>>     what. 
>>>>
>>>>     Best regards 
>>>>
>>>>     David
>>>>
>>>>     Sent from ProtonMail Mobile
>>>>
>>>>
>>>>     On Tue, May 28, 2019 at 12:10, refdoc at gmx.net <refdoc at gmx.net
>>>>     <mailto:refdoc at gmx.net>> wrote:
>>>>>     The improvement request for allowing commas in references...
>>>>>     adding commas in the suggested form would make millions of
>>>>>     currently valid Anglo references invalid. The problem is a
>>>>>     much wider one, references should be localised in their
>>>>>     punctuation too. I am not sure how difficult this would be,
>>>>>     but I guess we could make a start by defining what punctuation
>>>>>     is used for which purpose, and then take it from there.
>>>>>
>>>>>     Cyrille, maybe start a page on the wiki and start thinking there.
>>>>>
>>>>>     Sent from my mobile. Please forgive shortness, typos and weird
>>>>>     autocorrects.
>>>>>
>>>>>
>>>>>     -------- Original Message --------
>>>>>     Subject: Re: [sword-devel] C++ volunteer
>>>>>     From: Cyrille
>>>>>     To: SWORD Developers' Collaboration Forum
>>>>>     CC:
>>>>>
>>>>>
>>>>>         Hello Richard,
>>>>>         Welcome!
>>>>>         May I make a very selfish proposal to Richard who offers
>>>>>         his help. There are two issues that I really want to be
>>>>>         resolved. One of which particularly handicaps Catholic
>>>>>         users (but I discovered today that the issue hadn't been
>>>>>         reported!!! I just did it):
>>>>>         https://tracker.crosswire.org/browse/API-216
>>>>>         And the second:
>>>>>         https://tracker.crosswire.org/projects/API/issues/API-180
>>>>>
>>>>>         If there are more important things that I can't judge, not
>>>>>         being a developer, at least I will have tried my luck ;)
>>>>>
>>>>>         Il 28/05/2019 01:38, Troy A. Griffitts ha scritto:
>>>>>>         Richard, sorry, I meant to give you the link to our tracker:
>>>>>>
>>>>>>         https://tracker.crosswire.org
>>>>>>
>>>>>>
>>>>>>         On 5/27/19 4:32 PM, Troy A. Griffitts wrote:
>>>>>>>         Welcome, Richard!
>>>>>>>
>>>>>>>         I would start at 2 places:
>>>>>>>
>>>>>>>         First, have a look at our tracker here.  We are not very (very not)
>>>>>>>         disciplined at keeping it current.  Skimming through there and
>>>>>>>         commenting on anything that looks interesting, or even cleaning a few
>>>>>>>         things up in there that you confirm are no longer a problem might be a
>>>>>>>         useful exercise to get you poking around at internals and would be a
>>>>>>>         blessing for us.  Our modus operandi as of late is to create a new unit
>>>>>>>         test in sword/tests/testsuite/ which fails at the bug and then once
>>>>>>>         fixed, the test should pass and we leave the test around to be sure we
>>>>>>>         don't regress.  We can always use more tests in our test suite.
>>>>>>>
>>>>>>>         Next, we have the intention to modularize our search engines support and
>>>>>>>         search types.  Right now, SWModule (which represents a Bible) implements
>>>>>>>         our SWSearchable interface, which is fine, but right now it has a bunch
>>>>>>>         of #ifdef logic and switch statements to take different code paths
>>>>>>>         depending on which search engine is compiled into SWORD and which search
>>>>>>>         type is specified.  This was fine initially, but has grown to such that
>>>>>>>         we now support spaghetti in there.  It should probably simply have a set
>>>>>>>         of SWSearchable objects in a map<SEARCH_TYPE, SWSearchable> and proxy
>>>>>>>         the search request to the appropriate SWSearchable impl based on what
>>>>>>>         types are registered for the module.  This would allow us to implement
>>>>>>>         new types and register them with modules which support special search
>>>>>>>         types, e.g., advanced Hebrew Morphology searching.  That's the general
>>>>>>>         idea anyway.
>>>>>>>
>>>>>>>         You should probably become familiar with SWFilter and how we use these
>>>>>>>         throughout the engine. These prepare a buffer for particular
>>>>>>>         objectives.  We have RenderFilters, EncodingFilters, StripFilters, ... 
>>>>>>>         The last prepares an SWModule entry for searching by, typically,
>>>>>>>         stripping out all markup and leaving only a plaintext buffer which can
>>>>>>>         be searched.  We have some special code in the SWModule::search
>>>>>>>         spaghetti which takes Greek and Hebrew modules and turns buffers into a
>>>>>>>         series of Strong#@MorphCode Strong#@MorphCode ... which allows regex
>>>>>>>         searches to do some advanced morph searching... like: Find this Strong's
>>>>>>>         number, any morphology, followed by any verb within 2 words.  You
>>>>>>>         have to be pretty familiar with the Strong#@MorphCode syntax to
>>>>>>>         formulate something like that, but the idea is that a frontend could
>>>>>>>         have a nice UI to help a user come up with some creative searches. 
>>>>>>>         Anyway, these should all probably be modularized out by renaming the
>>>>>>>         StripFilter concept to SearchFilter, and then pushing all this special
>>>>>>>         code out to SearchFilter impls which do these special things...
>>>>>>>
>>>>>>>         Finally, an objective of all this search modularization is also to break
>>>>>>>         out the code required to create search indexes for each of the search
>>>>>>>         engines we support.  Ideally, we should be able to support the same
>>>>>>>         searches either as an indexed or brute force search.  The same code
>>>>>>>         which iterates a module, prepares each entry, and pushes that entry to
>>>>>>>         the search engine, building the search index, should also work for a
>>>>>>>         brute force search-- iterating the module, preparing each entry for the
>>>>>>>         search engine.. and then performing a check on that buffer to see if it
>>>>>>>         matches the search expression.
>>>>>>>
>>>>>>>         I hope this gives you a few things to think about. It has been good for
>>>>>>>         me to refresh thoughts on all of this.  Have a look and let me know what
>>>>>>>         you think.
>>>>>>>
>>>>>>>         Welcome!  Looking forward to sharing in service together,
>>>>>>>
>>>>>>>         Troy
>>>>>>>
>>>>>>>          
>>>>>>>
>>>>>>>         On 5/27/19 1:09 PM, Richard Smith wrote:
>>>>>>>>         Hi,
>>>>>>>>
>>>>>>>>         My name's Richard Smith. I'm a C++ software engineer with 10 years
>>>>>>>>         experience in various industries. I was wondering if there was any
>>>>>>>>         space for a volunteer. I've started taking a look at things (building
>>>>>>>>         repos on Win/unix), but if there are specific things that are
>>>>>>>>         required, within my ability, I'm happy to do that.
>>>>>>>>
>>>>>>>>         Best Regards
>>>>>>>>         Richard Smith
>>>>>>>>
>>>>>>>>         _______________________________________________
>>>>>>>>         sword-devel mailing list: sword-devel at crosswire.org
>>>>>>>>         http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>         Instructions to unsubscribe/change your settings at above page