[sword-devel] Localized parsing symbols [was: C++ volunteer]
Cyrille
lafricain79 at gmail.com
Tue May 28 09:59:51 MST 2019
On 28/05/2019 18:50, Troy A. Griffitts wrote:
> On 5/28/19 9:24 AM, Cyrille wrote:
>>
>>
>> On 28/05/2019 17:40, Troy A. Griffitts wrote:
>>>
>>> So, a little background surrounding why the logic is difficult to
>>> work out a solution for this problem:
>>>
>>> The current verse parser, which works fairly well, always has 3 sets
>>> of possibilities in view:
>>>
>>> OSISRef
>>> Current Locale
>>> English
>>>
>>> The parser needs to handle any of these three, typically in the
>>> preference order listed above. The issue with changing out symbols
>>> while parsing is that some symbols (notoriously the comma) are used
>>> for different purposes across these 3 sets.
>>>
>>> One might think that localized output might be easier than parsing,
>>> e.g., once parsed, we could at least output the reference: Jn 3,16.
>>> The problem here is that what the engine outputs it also expects to
>>> be able to parse.
>>>
>>> While we would like to solve this problem, it isn't as simple as
>>> adding to the locale files:
>>>
>>> ChapterVerseSeparator=,
>>>
>>> RangeSeparator=-
>>>
>>> ListSeparator=.
>>>
>>> This would be enough to define the locale, but not solve the
>>> problem. We would need a fundamental change in how parsing is done,
>>> e.g., explicitly telling the parser, "Hey, I'm sending you localized
>>> input, so don't guess. You can count on the symbols I'm sending you
>>> to be localized." Right now everyone has the convenience of just
>>> passing any of the 3 sets of parsing text listed above and the parser
>>> just figuring it out, with the caveat that chapter, range, and list
>>> separators are not localizable.
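
A minimal sketch of the "don't guess" idea above, to make the ambiguity concrete. Everything in it is hypothetical (SeparatorSet, InputStyle, roleOf are illustration names, not SWORD API). The English-style set is a simplified assumption; the locale set uses the three example keys quoted above. The point is only that the role of ',' cannot be decided from the symbol alone, but becomes unambiguous once the caller states which style it is sending.

```cpp
#include <cassert>

// Role a punctuation symbol can play inside a verse reference.
enum class SymbolRole { ChapterVerse, Range, List, None };

// Hypothetical container for the three localizable separators.
struct SeparatorSet {
    char chapterVerse;
    char range;
    char list;
};

// Simplified English-style conventions (an assumption for this sketch).
constexpr SeparatorSet kEnglishStyle{':', '-', ','};
// Values of the three example locale keys quoted above.
constexpr SeparatorSet kLocaleStyle{',', '-', '.'};

// The caller states explicitly which style the input uses.
enum class InputStyle { English, Localized };

// Look the symbol up only in the separator set the caller declared.
SymbolRole roleOf(char symbol, InputStyle style) {
    const SeparatorSet &set =
        (style == InputStyle::Localized) ? kLocaleStyle : kEnglishStyle;
    if (symbol == set.chapterVerse) return SymbolRole::ChapterVerse;
    if (symbol == set.range) return SymbolRole::Range;
    if (symbol == set.list) return SymbolRole::List;
    return SymbolRole::None;
}

// Usage: the same ',' gets a different role once the style is stated.
void demoCommaRoles() {
    assert(roleOf(',', InputStyle::English) == SymbolRole::List);
    assert(roleOf(',', InputStyle::Localized) == SymbolRole::ChapterVerse);
}
```

With a guessing parser there is no InputStyle argument, which is exactly where "Jn 3,16" becomes ambiguous.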
>>>
>>> Hope this gives some background,
>>>
>> Yes, thank you, but I still don't understand why it is already possible
>> with two separators (. and :) but not with one more. Maybe I can't
>> understand it because it is too technical for me ;)
>
> Because ':' is unambiguous between all three. '.' retains OSIS
> semantics and is book and chapter separator between all three. The
> problem comes when you have an entry like:
>
> Jn 3,16
>
> Currently, the parser will understand this as John chapter 3 and
> chapter 16. You obviously would want this to be chapter 3, verse 16.
> The question is, when does the parser decide it is a list separator
> and when does it decide it is a chapter/verse separator?
>
No, in the "Catholic" model the comma is never a list separator. If the comma is
used as the chapter/verse separator, then the semicolon is used as the list
separator.
E.g.:
Jn 3,16-20; 5,3.8 : confusion is not possible.
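
To show that this notation can be read without guessing, here is a rough sketch of a parser for the chapter/verse part only (book names are left out). It is not SWORD code: VerseRange and parseCatholicStyle are hypothetical names, and it assumes ',' separates chapter from verse, '-' marks a verse range, '.' separates verses within the same chapter, and ';' separates list items.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One parsed item: a chapter and an inclusive verse range
// (firstVerse == lastVerse for a single verse).
struct VerseRange {
    int chapter;
    int firstVerse;
    int lastVerse;
};

// Hypothetical helper: parses text such as "3,16-20; 5,3.8".
std::vector<VerseRange> parseCatholicStyle(const std::string &text) {
    std::vector<VerseRange> result;
    std::size_t pos = 0;

    auto skipSpaces = [&]() {
        while (pos < text.size() &&
               std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
    };
    auto readNumber = [&]() {
        skipSpaces();
        if (pos >= text.size() ||
            !std::isdigit(static_cast<unsigned char>(text[pos])))
            throw std::invalid_argument("number expected");
        int value = 0;
        while (pos < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[pos])))
            value = value * 10 + (text[pos++] - '0');
        return value;
    };
    // Consume the symbol if it comes next; report whether it did.
    auto accept = [&](char symbol) {
        skipSpaces();
        if (pos < text.size() && text[pos] == symbol) { ++pos; return true; }
        return false;
    };

    do {  // one iteration per ';'-separated item
        const int chapter = readNumber();
        if (!accept(','))
            throw std::invalid_argument("',' expected after chapter");
        do {  // one iteration per '.'-separated verse or verse range
            const int first = readNumber();
            const int last = accept('-') ? readNumber() : first;
            result.push_back({chapter, first, last});
        } while (accept('.'));
    } while (accept(';'));

    skipSpaces();
    if (pos != text.size())
        throw std::invalid_argument("unexpected trailing text");
    return result;
}

// Usage: the example above yields 3:16-20, 5:3 and 5:8.
void demoCatholicStyle() {
    const std::vector<VerseRange> refs = parseCatholicStyle("3,16-20; 5,3.8");
    assert(refs.size() == 3);
    assert(refs[0].chapter == 3 && refs[0].firstVerse == 16 && refs[0].lastVerse == 20);
    assert(refs[2].chapter == 5 && refs[2].firstVerse == 8 && refs[2].lastVerse == 8);
}
```

Each symbol has exactly one role here, so the parser never has to choose between two readings. This works only because the sketch knows in advance that the input follows this convention.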
>
> It is ambiguous unless you tell the parser: "This is strictly locale
> formatted text. It is not OSIS. It is not English."
>
> Hope this helps clarify.
>
> Troy
>
>
>>> Troy
>>>
>>>
>>> On 5/28/19 6:10 AM, David Haslam wrote:
>>>> OK - but my observations were not entirely irrelevant.
>>>>
>>>> Some front-ends never need the user to enter a reference in an edit
>>>> box. Navigation is done entirely via menu selections or clicking
>>>> search results etc.
>>>> AFAICT this is true of PocketSword.
>>>>
>>>> Other front-ends are designed at the opposite extreme. All
>>>> navigation is done through an edit box. This is true (eg) of STEP
>>>> Bible.
>>>>
>>>> Best regards,
>>>>
>>>> David.
>>>>
>>>> Sent from ProtonMail Mobile
>>>>
>>>>
>>>> On Tue, May 28, 2019 at 13:54, refdoc at gmx.net wrote:
>>>>> Sorry, David, that is a complete misunderstanding. Modules need
>>>>> osisref. There is and will be no need to do anything to the
>>>>> modules. This is about getting the engine's parser to read
>>>>> references in a locale-appropriate way.
>>>>>
>>>>> Sent from my mobile. Please forgive shortness, typos and weird
>>>>> autocorrects.
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [sword-devel] C++ volunteer
>>>>> From: David Haslam
>>>>> To: SWORD Developers' Collaboration Forum
>>>>> CC:
>>>>>
>>>>>
>>>>> Parsing native references is not a simple task, as we know
>>>>> from the fact that adyeth's orefs.py was kicked into touch
>>>>> indefinitely.
>>>>>
>>>>> And that’s even when punctuation marks are defined in the
>>>>> specified configuration file.
>>>>>
>>>>> Unless we might consider the possibility of adding keys to
>>>>> module .conf files that define the module specific
>>>>> native reference punctuation marks and separators.
>>>>>
>>>>> That could be a huge undertaking, considering the need to
>>>>> maintain backwards compatibility.
>>>>>
>>>>> And it’s not as if it really is module specific entirely. A
>>>>> user can be switching between modules with different
>>>>> languages, yet would need the current reference to always
>>>>> work, no matter what.
>>>>>
>>>>> Best regards
>>>>>
>>>>> David
>>>>>
>>>>> Sent from ProtonMail Mobile
>>>>>
>>>>>
>>>>> On Tue, May 28, 2019 at 12:10, refdoc at gmx.net wrote:
>>>>>> The improvement request for allowing commas in references...
>>>>>> adding commas in the suggested form would make millions of
>>>>>> currently valid Anglo references invalid. The problem is a
>>>>>> much wider one: references should be localised in their
>>>>>> punctuation too. I am not sure how difficult this would be,
>>>>>> but I guess we could make a start by defining what
>>>>>> punctuation is used for which purpose, and then take it from
>>>>>> there.
>>>>>>
>>>>>> Cyrille, maybe start a page on the wiki and start thinking
>>>>>> there.
>>>>>>
>>>>>> Sent from my mobile. Please forgive shortness, typos and
>>>>>> weird autocorrects.
>>>>>>
>>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [sword-devel] C++ volunteer
>>>>>> From: Cyrille
>>>>>> To: SWORD Developers' Collaboration Forum
>>>>>> CC:
>>>>>>
>>>>>>
>>>>>> Hello Richard,
>>>>>> Welcome!
>>>>>> May I make a very selfish proposal to Richard, who offers
>>>>>> his help? There are two issues that I really want to be
>>>>>> resolved. One of them particularly handicaps Catholic
>>>>>> users (but I discovered today that the issue hadn't been
>>>>>> reported!!! I just did it):
>>>>>> https://tracker.crosswire.org/browse/API-216
>>>>>> And the second:
>>>>>> https://tracker.crosswire.org/projects/API/issues/API-180
>>>>>>
>>>>>> If there are more important things that I cannot judge,
>>>>>> not being a developer, at least I will have tried my luck ;)
>>>>>>
>>>>>> On 28/05/2019 01:38, Troy A. Griffitts wrote:
>>>>>>> Richard, sorry, I meant to give you the link to our tracker:
>>>>>>>
>>>>>>> https://tracker.crosswire.org
>>>>>>>
>>>>>>>
>>>>>>> On 5/27/19 4:32 PM, Troy A. Griffitts wrote:
>>>>>>>> Welcome, Richard!
>>>>>>>>
>>>>>>>> I would start at 2 places:
>>>>>>>>
>>>>>>>> First, have a look at our tracker here. We are not very (very not)
>>>>>>>> disciplined at keeping it current. Skimming through there and
>>>>>>>> commenting on anything that looks interesting, or even cleaning a few
>>>>>>>> things up in there that you confirm are no longer a problem might be a
>>>>>>>> useful exercise to get you poking around at internals and would be a
>>>>>>>> blessing for us. Our modus operandi as of late is to create a new unit
>>>>>>>> test in sword/tests/testsuite/ which fails at the bug and then once
>>>>>>>> fixed, the test should pass and we leave the test around to be sure we
>>>>>>>> don't regress. We can always use more tests in our tests suite.
>>>>>>>>
>>>>>>>> Next, we have the intention to modularize our search engines support and
>>>>>>>> search types. Right now, SWModule (which represents a Bible) implements
>>>>>>>> our SWSearchable interface, which is fine, but right now it has a bunch
>>>>>>>> of #ifdef logic and switch statements to take different code paths
>>>>>>>> depending on which search engine is compiled into SWORD and which search
>>>>>>>> type is specified. This was fine initially, but has grown to the point
>>>>>>>> that we now support spaghetti in there. It should probably simply have a set
>>>>>>>> of SWSearchable objects in a map<SEARCH_TYPE, SWSearchable> and proxy
>>>>>>>> the search request to the appropriate SWSearchable impl based on what
>>>>>>>> types are registered for the module. This would allow us to implement
>>>>>>>> new types and register them with modules which support special search
>>>>>>>> types, e.g., advanced Hebrew Morphology searching. That's the general
>>>>>>>> idea anyway.
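
A rough sketch of the map-based proxy described above, under loud assumptions: SearchType, Searcher, PhraseSearcher and SearchableModule are hypothetical stand-ins, not the real SWModule or SWSearchable API, and a module is reduced to a vector of plain-text entries. It only illustrates how registering one implementation per search type replaces the switch and #ifdef branching.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical search types; a real engine would define its own set.
enum class SearchType { Phrase, AllWords };

// Hypothetical interface: one implementation per search type.
class Searcher {
public:
    virtual ~Searcher() = default;
    virtual bool matches(const std::string &entryText,
                         const std::string &query) const = 0;
};

// Simplest possible implementation: plain substring match.
class PhraseSearcher : public Searcher {
public:
    bool matches(const std::string &entryText,
                 const std::string &query) const override {
        return entryText.find(query) != std::string::npos;
    }
};

// Stand-in for a module: it owns its entries and proxies each search
// request to whichever Searcher is registered for the requested type.
class SearchableModule {
public:
    explicit SearchableModule(std::vector<std::string> entries)
        : entries_(std::move(entries)) {}

    void registerSearcher(SearchType type, std::unique_ptr<Searcher> searcher) {
        searchers_[type] = std::move(searcher);
    }

    // Returns the indexes of all matching entries.
    std::vector<std::size_t> search(SearchType type,
                                    const std::string &query) const {
        const auto it = searchers_.find(type);
        if (it == searchers_.end())
            throw std::invalid_argument("search type not registered");
        std::vector<std::size_t> hits;
        for (std::size_t i = 0; i < entries_.size(); ++i)
            if (it->second->matches(entries_[i], query)) hits.push_back(i);
        return hits;
    }

private:
    std::vector<std::string> entries_;
    std::map<SearchType, std::unique_ptr<Searcher>> searchers_;
};

// Usage: register one search type, then search through the proxy.
void demoSearchProxy() {
    std::vector<std::string> entries;
    entries.push_back("In the beginning");
    entries.push_back("God so loved the world");
    SearchableModule bible(entries);
    bible.registerSearcher(SearchType::Phrase,
                           std::unique_ptr<Searcher>(new PhraseSearcher));
    const std::vector<std::size_t> hits = bible.search(SearchType::Phrase, "loved");
    assert(hits.size() == 1 && hits[0] == 1);
}
```

A special search type (for example a morphology search) would then be one more Searcher subclass registered only on the modules that support it.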
>>>>>>>>
>>>>>>>> You should probably become familiar with SWFilter and how we use these
>>>>>>>> throughout the engine. These prepare a buffer for particular
>>>>>>>> objectives. We have RenderFilters, EncodingFilters, StripFilters, ...
>>>>>>>> The last prepares an SWModule entry for searching by, typically,
>>>>>>>> stripping out all markup and leaving only a plaintext buffer which can
>>>>>>>> be searched. We have some special code in the SWModule::search
>>>>>>>> spaghetti which takes Greek and Hebrew modules and turns buffers into a
>>>>>>>> series of Strong#@MorphCode Strong#@MorphCode ... which allows regex
>>>>>>>> searches to do some advanced morph searching... like: Find this Strong's
>>>>>>>> number, any morphology, followed by any verb within 2 words. You
>>>>>>>> have to be pretty familiar with the Strong#@MorphCode syntax to
>>>>>>>> formulate something like that, but the idea is that a frontend could
>>>>>>>> have a nice UI to help a user come up with some creative searches.
>>>>>>>> Anyway, these should all probably be modularized out by renaming the
>>>>>>>> StripFilter concept to SearchFilter, and then pushing all this special
>>>>>>>> code out to SearchFilter impls which do these special things...
>>>>>>>>
>>>>>>>> Finally, an objective of all this search modularization is also to break
>>>>>>>> out the code required to create search indexes for each of the search
>>>>>>>> engines we support. Ideally, we should be able to support the same
>>>>>>>> searches either as an indexed or brute force search. The same code
>>>>>>>> which iterates a module, prepares each entry, and pushes that entry to
>>>>>>>> the search engine, building the search index, should also work for a
>>>>>>>> brute force search-- iterating the module, preparing each entry for the
>>>>>>>> search engine.. and then performing a check on that buffer to see if it
>>>>>>>> matches the search expression.
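
A small sketch of the "same iteration code for indexing and brute force" idea above. All names are hypothetical (prepareEntry, forEachPreparedEntry, buildIndex, bruteForceSearch), the module is again just a vector of strings, and prepareEntry is only a toy stand-in for a strip filter: it drops anything between '<' and '>' and lowercases the rest.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a strip filter: remove <...> markup and lowercase.
std::string prepareEntry(const std::string &raw) {
    std::string plain;
    bool insideTag = false;
    for (unsigned char c : raw) {
        if (c == '<') insideTag = true;
        else if (c == '>') insideTag = false;
        else if (!insideTag) plain += static_cast<char>(std::tolower(c));
    }
    return plain;
}

using EntrySink = std::function<void(std::size_t, const std::string &)>;

// The shared part: iterate the module, prepare each entry, hand it on.
void forEachPreparedEntry(const std::vector<std::string> &entries,
                          const EntrySink &sink) {
    for (std::size_t i = 0; i < entries.size(); ++i)
        sink(i, prepareEntry(entries[i]));
}

using WordIndex = std::map<std::string, std::set<std::size_t>>;

// Consumer 1: push every prepared entry into a word index.
WordIndex buildIndex(const std::vector<std::string> &entries) {
    WordIndex index;
    forEachPreparedEntry(entries, [&](std::size_t i, const std::string &text) {
        std::istringstream words(text);
        std::string word;
        while (words >> word) index[word].insert(i);
    });
    return index;
}

// Consumer 2: check every prepared entry against the query directly.
std::set<std::size_t> bruteForceSearch(const std::vector<std::string> &entries,
                                       const std::string &word) {
    std::set<std::size_t> hits;
    forEachPreparedEntry(entries, [&](std::size_t i, const std::string &text) {
        std::istringstream words(text);
        std::string candidate;
        while (words >> candidate) {
            if (candidate == word) { hits.insert(i); break; }
        }
    });
    return hits;
}

// Usage: both paths give the same answer for the same entries.
void demoSharedIteration() {
    std::vector<std::string> entries;
    entries.push_back("<w lemma=\"G25\">Loved</w> the world");
    entries.push_back("In the beginning");
    const WordIndex index = buildIndex(entries);
    assert(index.at("loved") == bruteForceSearch(entries, "loved"));
    assert(index.at("the") == bruteForceSearch(entries, "the"));
}
```

Because both consumers go through forEachPreparedEntry, an indexed search and a brute-force search see identically prepared text.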
>>>>>>>>
>>>>>>>> I hope this gives you a few things to think about. It has been good for
>>>>>>>> me to refresh thoughts on all of this. Have a look and let me know what
>>>>>>>> you think.
>>>>>>>>
>>>>>>>> Welcome! Looking forward to sharing in service together,
>>>>>>>>
>>>>>>>> Troy
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/27/19 1:09 PM, Richard Smith wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> My name's Richard Smith. I'm a C++ software engineer with 10 years
>>>>>>>>> experience in various industries. I was wondering if there was any
>>>>>>>>> space for a volunteer. I've started taking a look at things (building
>>>>>>>>> repos on Win/unix), but if there are specific things that are
>>>>>>>>> required, within my ability, I'm happy to do that.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Richard Smith
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>> Instructions to unsubscribe/change your settings at above page