[sword-devel] Localized parsing symbols [was: C++ volunteer]

Cyrille lafricain79 at gmail.com
Tue May 28 09:59:51 MST 2019



On 28/05/2019 18:50, Troy A. Griffitts wrote:
> On 5/28/19 9:24 AM, Cyrille wrote:
>>
>>
>> On 28/05/2019 17:40, Troy A. Griffitts wrote:
>>>
>>> So, a little background surrounding why the logic is difficult to
>>> work out a solution for this problem:
>>>
>>> The current verse parser, which works fairly well, always has 3 sets
>>> of possibilities in view:
>>>
>>> OSISRef
>>> Current Locale
>>> English
>>>
>>> The parser needs to handle any of these three, typically in the
>>> preference order listed above.  The issue with changing out symbols
>>> while parsing is that some symbols (notoriously the comma) are used
>>> for different purposes across these 3 sets.
>>>
>>> One might think that localized output might be easier than parsing,
>>> e.g., once parsed, we could at least output the reference: Jn 3,16. 
>>> The problem here is that what the engine outputs it also expects to
>>> be able to parse.
>>>
>>> While we would like to solve this problem, it isn't as simple as
>>> adding to the locale files:
>>>
>>> ChapterVerseSeparator=,
>>>
>>> RangeSeparator=-
>>>
>>> ListSeparator=.
>>>
>>> This would be enough to define the locale, but not solve the
>>> problem.  We would need a fundamental change in how parsing is done,
>>> e.g., explicitly telling the parser, "Hey, I'm sending you localized
>>> input, so don't guess.  You can count on the symbols I'm sending you
>>> to be localized"  Right now everyone has the convenience of just
>>> passing any of the 3 sets of parsing text listed above and the parser
>>> just figuring it out, with the caveat that chapter, range, and list
>>> separators are not localizable.
>>>
>>> Hope this gives some background,
>>>
>> Yes, thank you, but I just don't understand why it is already possible
>> with two separators (. and :) but not with just one more? Maybe I
>> can't understand it because it is too hard (technically) for me ;)
>
> Because ':' is unambiguous between all three.  '.' retains OSIS
> semantics and is book and chapter separator between all three.  The
> problem comes when you have an entry like:
>
> Jn 3,16
>
> Currently, the parser will understand this as John chapter 3 and
> chapter 16.  You obviously would want this to be chapter 3, verse 16. 
> The question is, when does the parser decide it is a list separator
> and when does it decide it is a chapter/verse separator.
>

No, in the "Catholic" model the comma is never a list separator. If the
comma is used as the chapter/verse separator, then the semicolon is used
as the list separator.
E.g.:
Jn 3,16-20; 5,3.8
Confusion is not possible.
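
To make that concrete, here is a minimal sketch in Python. It is not SWORD
code and says nothing about how the engine's parser works internally; the
names Separators and parse_strict are made up for illustration. It assumes
the caller has already said "this is strictly locale-formatted text", so
each symbol has exactly one meaning: comma = chapter/verse, hyphen = range,
dot = verse list, semicolon = list of chapter references. It also assumes a
one-word book abbreviation followed by a space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Separators:
    """Hypothetical separator set for one locale convention."""
    chapter_verse: str = ","
    verse_range: str = "-"
    verse_list: str = "."
    reference_list: str = ";"


def parse_strict(text, seps=Separators()):
    """Parse text written strictly in the convention described by seps.

    Returns a list of (book, chapter, first_verse, last_verse) tuples.
    A chapter given without verses yields (book, chapter, None, None).
    """
    # Assumption: the book abbreviation is the first space-delimited word.
    book, _, rest = text.strip().partition(" ")
    results = []
    for part in rest.split(seps.reference_list):
        chapter_text, _, verses_text = part.strip().partition(seps.chapter_verse)
        chapter = int(chapter_text)
        if not verses_text:
            results.append((book, chapter, None, None))
            continue
        for item in verses_text.split(seps.verse_list):
            first, _, last = item.strip().partition(seps.verse_range)
            start = int(first)
            end = int(last) if last else start
            results.append((book, chapter, start, end))
    return results


print(parse_strict("Jn 3,16-20; 5,3.8"))
# [('Jn', 3, 16, 20), ('Jn', 5, 3, 3), ('Jn', 5, 8, 8)]
```

The sketch only works because it never has to guess: the same string handed
to a parser that must also accept OSIS and English input would still be
ambiguous, which is the difficulty Troy describes.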
>
> It is ambiguous unless you tell the parser: "This is strictly locale
> formatted text.  It is not OSIS.  It is not English."
>
> Hope this helps clarify.
>
> Troy
>
>
>>> Troy
>>>
>>>
>>> On 5/28/19 6:10 AM, David Haslam wrote:
>>>> OK - but my observations were not entirely irrelevant. 
>>>>
>>>> Some front-ends never need the user to enter a reference in an edit
>>>> box. Navigation is done entirely via menu selections or clicking
>>>> search results etc. 
>>>> AFAICT this is true of PocketSword. 
>>>>
>>>> Other front-ends are designed at the opposite extreme. All
>>>> navigation is done through an edit box. This is true (eg) of STEP
>>>> Bible. 
>>>>
>>>> Best regards,
>>>>
>>>> David. 
>>>>
>>>> Sent from ProtonMail Mobile
>>>>
>>>>
>>>> On Tue, May 28, 2019 at 13:54, refdoc at gmx.net wrote:
>>>>> Sorry, David, that is a complete misunderstanding. Modules need
>>>>> osisref. There is and will be no need to do anything to the
>>>>> modules. This is about the engine parser to read references locale
>>>>> appropriately.
>>>>>
>>>>> Sent from my mobile. Please forgive shortness, typos and weird
>>>>> autocorrects.
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [sword-devel] C++ volunteer
>>>>> From: David Haslam
>>>>> To: SWORD Developers' Collaboration Forum
>>>>> CC:
>>>>>
>>>>>
>>>>>     Parsing native references is not a simple task, as we know
>>>>>     from the fact that adyeth's orefs.py was kicked into touch
>>>>>     indefinitely. 
>>>>>
>>>>>     And that’s even when punctuation marks are defined in the
>>>>>     specified configuration file. 
>>>>>
>>>>>     Unless we might consider the possibility of adding keys to
>>>>>     module .conf files that define the module specific
>>>>>     native reference punctuation marks and separators. 
>>>>>
>>>>>     That could be a huge undertaking, considering the need to
>>>>>     maintain backwards compatibility. 
>>>>>
>>>>>     And it’s not as if it really is module specific entirely. A
>>>>>     user can be switching between modules with different
>>>>>     languages, yet would need the current reference to always
>>>>>     work, no matter what. 
>>>>>
>>>>>     Best regards 
>>>>>
>>>>>     David
>>>>>
>>>>>     Sent from ProtonMail Mobile
>>>>>
>>>>>
>>>>>     On Tue, May 28, 2019 at 12:10, refdoc at gmx.net wrote:
>>>>>>     The improvement request for allowing commas in references...
>>>>>>     adding commas in the suggested form would make millions of
>>>>>>     currently valid Anglo references invalid. The problem is a
>>>>>>     much wider one: references should be localised in their
>>>>>>     punctuation too. I am not sure how difficult this would be,
>>>>>>     but I guess we could make a start by defining what
>>>>>>     punctuation is used for which purpose, and then take it from
>>>>>>     there.
>>>>>>
>>>>>>     Cyrille, maybe start a page on the wiki and start thinking
>>>>>>     there.
>>>>>>
>>>>>>     Sent from my mobile. Please forgive shortness, typos and
>>>>>>     weird autocorrects.
>>>>>>
>>>>>>
>>>>>>     -------- Original Message --------
>>>>>>     Subject: Re: [sword-devel] C++ volunteer
>>>>>>     From: Cyrille
>>>>>>     To: SWORD Developers' Collaboration Forum
>>>>>>     CC:
>>>>>>
>>>>>>
>>>>>>         Hello Richard,
>>>>>>         Welcome!
>>>>>>         May I make a very selfish proposal to Richard, who offers
>>>>>>         his help? There are two issues that I really want to see
>>>>>>         resolved. One of them particularly handicaps Catholic
>>>>>>         users (but I discovered today that the issue hadn't been
>>>>>>         reported!!! I just did it):
>>>>>>         https://tracker.crosswire.org/browse/API-216
>>>>>>         And the second:
>>>>>>         https://tracker.crosswire.org/projects/API/issues/API-180
>>>>>>
>>>>>>         If there are more important things that I, not being a
>>>>>>         developer, am unable to judge, at least I will have tried my luck ;)
>>>>>>
>>>>>>         On 28/05/2019 01:38, Troy A. Griffitts wrote:
>>>>>>>         Richard, sorry, I meant to give you the link to our tracker:
>>>>>>>
>>>>>>>         https://tracker.crosswire.org
>>>>>>>
>>>>>>>
>>>>>>>         On 5/27/19 4:32 PM, Troy A. Griffitts wrote:
>>>>>>>>         Welcome, Richard!
>>>>>>>>
>>>>>>>>         I would start at 2 places:
>>>>>>>>
>>>>>>>>         First, have a look at our tracker here.  We are not very (very not)
>>>>>>>>         disciplined at keeping it current.  Skimming through there and
>>>>>>>>         commenting on anything that looks interesting, or even cleaning a few
>>>>>>>>         things up in there that you confirm are no longer a problem might be a
>>>>>>>>         useful exercise to get you poking around at internals and would be a
>>>>>>>>         blessing for us.  Our modus operandi as of late is to create a new unit
>>>>>>>>         test in sword/tests/testsuite/ which fails at the bug and then once
>>>>>>>>         fixed, the test should pass and we leave the test around to be sure we
>>>>>>>>         don't regress.  We can always use more tests in our test suite.
>>>>>>>>
>>>>>>>>         Next, we have the intention to modularize our search engines support and
>>>>>>>>         search types.  Right now, SWModule (which represents a Bible) implements
>>>>>>>>         our SWSearchable interface, which is fine, but right now it has a bunch
>>>>>>>>         of #ifdef logic and switch statements to take different code paths
>>>>>>>>         depending on which search engine is compiled into SWORD and which search
>>>>>>>>         type is specified.  This was fine initially, but has grown such that
>>>>>>>>         we now support spaghetti in there.  It should probably simply have a set
>>>>>>>>         of SWSearchable objects in a map<SEARCH_TYPE, SWSearchable> and proxy
>>>>>>>>         the search request to the appropriate SWSearchable impl based on what
>>>>>>>>         types are registered for the module.  This would allow us to implement
>>>>>>>>         new types and register them with modules which support special search
>>>>>>>>         types, e.g., advanced Hebrew Morphology searching.  That's the general
>>>>>>>>         idea anyway.
>>>>>>>>
>>>>>>>>         You should probably become familiar with SWFilter and how we use these
>>>>>>>>         throughout the engine. These prepare a buffer for particular
>>>>>>>>         objectives.  We have RenderFilters, EncodingFilters, StripFilters, ... 
>>>>>>>>         The last prepares an SWModule entry for searching by, typically,
>>>>>>>>         stripping out all markup and leaving only a plaintext buffer which can
>>>>>>>>         be searched.  We have some special code in the SWModule::search
>>>>>>>>         spaghetti which takes Greek and Hebrew modules and turns buffers into a
>>>>>>>>         series of Strong#@MorphCode Strong#@MorphCode ... which allows regex
>>>>>>>>         searches to do some advanced morph searching... like: Find this Strong's
>>>>>>>>         number, any morphology, followed by any verb within 2 words.  You
>>>>>>>>         have to be pretty familiar with the Strong#@MorphCode syntax to
>>>>>>>>         formulate something like that, but the idea is that a frontend could
>>>>>>>>         have a nice UI to help a user come up with some creative searches. 
>>>>>>>>         Anyway, these should all probably be modularized out by renaming the
>>>>>>>>         StripFilter concept to SearchFilter, and then pushing all this special
>>>>>>>>         code out to SearchFilter impls which do these special things...
>>>>>>>>
>>>>>>>>         Finally, an objective of all this search modularization is also to break
>>>>>>>>         out the code required to create search indexes for each of the search
>>>>>>>>         engines we support.  Ideally, we should be able to support the same
>>>>>>>>         searches either as an indexed or brute force search.  The same code
>>>>>>>>         which iterates a module, prepares each entry, and pushes that entry to
>>>>>>>>         the search engine, building the search index, should also work for a
>>>>>>>>         brute force search-- iterating the module, preparing each entry for the
>>>>>>>>         search engine.. and then performing a check on that buffer to see if it
>>>>>>>>         matches the search expression.
>>>>>>>>
>>>>>>>>         I hope this gives you a few things to think about. It has been good for
>>>>>>>>         me to refresh thoughts on all of this.  Have a look and let me know what
>>>>>>>>         you think.
>>>>>>>>
>>>>>>>>         Welcome!  Looking forward to sharing in service together,
>>>>>>>>
>>>>>>>>         Troy
>>>>>>>>
>>>>>>>>          
>>>>>>>>
>>>>>>>>         On 5/27/19 1:09 PM, Richard Smith wrote:
>>>>>>>>>         Hi,
>>>>>>>>>
>>>>>>>>>         My name's Richard Smith. I'm a C++ software engineer with 10 years
>>>>>>>>>         experience in various industries. I was wondering if there was any
>>>>>>>>>         space for a volunteer. I've started taking a look at things (building
>>>>>>>>>         repos on Win/unix), but if there are specific things that are
>>>>>>>>>         required, within my ability, I'm happy to do that.
>>>>>>>>>
>>>>>>>>>         Best Regards
>>>>>>>>>         Richard Smith
>>>>>>>>>
>>>>>>>>>         _______________________________________________
>>>>>>>>>         sword-devel mailing list: sword-devel at crosswire.org
>>>>>>>>>         http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>>         Instructions to unsubscribe/change your settings at above page
