[sword-devel] Localized parsing symbols [was: C++ volunteer]
Troy A. Griffitts
scribe at crosswire.org
Tue May 28 09:50:25 MST 2019
On 5/28/19 9:24 AM, Cyrille wrote:
>
>
> Il 28/05/2019 17:40, Troy A. Griffitts ha scritto:
>>
>> So, a little background on why it is difficult to work out a
>> solution for this problem:
>>
>> The current verse parser, which works fairly well, always has 3 sets
>> of possibilities in view:
>>
>> OSISRef
>> Current Locale
>> English
>>
>> The parser needs to handle any of these three, typically in the
>> preference order listed above. The issue with changing out symbols
>> while parsing is that some symbols (notoriously the comma) are used
>> for different purposes across these 3 sets.
>>
>> One might think that localized output might be easier than parsing,
>> e.g., once parsed, we could at least output the reference: Jn 3,16.
>> The problem here is that what the engine outputs it also expects to
>> be able to parse.
>>
>> While we would like to solve this problem, it isn't as simple as
>> adding to the locale files:
>>
>> ChapterVerseSeparator=,
>>
>> RangeSeparator=-
>>
>> ListSeparator=.
>>
>> This would be enough to define the locale, but not solve the
>> problem. We would need a fundamental change in how parsing is done,
>> e.g., explicitly telling the parser, "Hey, I'm sending you localized
>> input, so don't guess. You can count on the symbols I'm sending you
>> to be localized." Right now everyone has the convenience of just
>> passing any of the 3 sets of parsing text listed above and the parser
>> just figuring it out, with the caveat that chapter, range, and list
>> separators are not localizable.
>>
>> Hope this gives some background,
>>
> Yes, thank you, but I just don't understand why it is already possible
> with two separators (. and :) but not with just one more? Maybe I
> can't understand it because it is too hard (technically) for me ;)
Because ':' is unambiguous across all three. '.' retains its OSIS
semantics and is the book and chapter separator in all three. The
problem comes when you have an entry like:
Jn 3,16
Currently, the parser will understand this as John chapter 3 and chapter
16. You obviously would want this to be chapter 3, verse 16. The
question is: when does the parser decide it is a list separator, and when
does it decide it is a chapter/verse separator? It is ambiguous unless
you tell the parser: "This is strictly locale formatted text. It is not
OSIS. It is not English."
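
As a rough illustration of that ambiguity (a Python sketch, not SWORD's actual parser): the caller states which convention the input follows, and the same text then gets a different reading. `Separators`, `ENGLISH`, `LOCALIZED`, and `parse_chapter_verse` are hypothetical names, the localized values simply mirror the separator keys mentioned earlier in the thread, and ranges and book names are left out. A list item without a chapter/verse separator is treated as a whole chapter, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Separators:
    chapter_verse: str
    list_item: str


# English-style convention: "3:16" is chapter:verse, "," separates list items.
ENGLISH = Separators(chapter_verse=":", list_item=",")
# Hypothetical localized convention: "3,16" is chapter,verse, "." separates items.
LOCALIZED = Separators(chapter_verse=",", list_item=".")


def parse_chapter_verse(text, seps):
    """Parse the numeric part of a reference into (chapter, verse) pairs.

    verse is None when a list item names a whole chapter.
    """
    results = []
    for item in text.split(seps.list_item):
        item = item.strip()
        if seps.chapter_verse in item:
            chapter, verse = item.split(seps.chapter_verse, 1)
            results.append((int(chapter), int(verse)))
        else:
            results.append((int(item), None))
    return results


# Same input, two readings, depending on what the caller declares.
as_english = parse_chapter_verse("3,16", ENGLISH)      # chapters 3 and 16
as_localized = parse_chapter_verse("3,16", LOCALIZED)  # chapter 3, verse 16
```

The point of the sketch is only that the decision moves out of the parser's guessing and into an explicit argument supplied by the caller.
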
Hope this helps clarify.
Troy
>> Troy
>>
>>
>> On 5/28/19 6:10 AM, David Haslam wrote:
>>> OK - but my observations were not entirely irrelevant.
>>>
>>> Some front-ends never need the user to enter a reference in an edit
>>> box. Navigation is done entirely via menu selections or clicking
>>> search results etc.
>>> AFAICT this is true of PocketSword.
>>>
>>> Other front-ends are designed at the opposite extreme. All
>>> navigation is done through an edit box. This is true (eg) of STEP
>>> Bible.
>>>
>>> Best regards,
>>>
>>> David.
>>>
>>> Sent from ProtonMail Mobile
>>>
>>>
>>> On Tue, May 28, 2019 at 13:54, refdoc at gmx.net wrote:
>>>> Sorry, David, that is a complete misunderstanding. Modules need
>>>> osisref. There is and will be no need to do anything to the
>>>> modules. This is about getting the engine parser to read references
>>>> in a locale-appropriate way.
>>>>
>>>> Sent from my mobile. Please forgive shortness, typos and weird
>>>> autocorrects.
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [sword-devel] C++ volunteer
>>>> From: David Haslam
>>>> To: SWORD Developers' Collaboration Forum
>>>> CC:
>>>>
>>>>
>>>> Parsing native references is not a simple task, as we know from
>>>> the fact that adyeth's orefs.py was kicked into touch indefinitely.
>>>>
>>>> And that’s even when punctuation marks are defined in the
>>>> specified configuration file.
>>>>
>>>> Unless we might consider the possibility of adding keys to
>>>> module .conf files that define the module specific
>>>> native reference punctuation marks and separators.
>>>>
>>>> That could be a huge undertaking, considering the need to
>>>> maintain backwards compatibility.
>>>>
>>>> And it’s not as if it really is module specific entirely. A
>>>> user can be switching between modules with different languages,
>>>> yet would need the current reference to always work, no matter
>>>> what.
>>>>
>>>> Best regards
>>>>
>>>> David
>>>>
>>>>
>>>> On Tue, May 28, 2019 at 12:10, refdoc at gmx.net wrote:
>>>>> Regarding the improvement request for allowing commas in references:
>>>>> adding commas in the suggested form would make millions of
>>>>> currently valid Anglo references invalid. The problem is a
>>>>> much wider one: references should be localised in their
>>>>> punctuation too. I am not sure how difficult this would be,
>>>>> but I guess we could make a start by defining what punctuation
>>>>> is used for which purpose, and then take it from there.
>>>>>
>>>>> Cyrille, maybe start a page on the wiki and start thinking there.
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [sword-devel] C++ volunteer
>>>>> From: Cyrille
>>>>> To: SWORD Developers' Collaboration Forum
>>>>> CC:
>>>>>
>>>>>
>>>>> Hello Richard,
>>>>> Welcome!
>>>>> May I make a very selfish proposal to Richard, who is offering
>>>>> his help? There are two issues that I would really like to see
>>>>> resolved. One of them particularly handicaps Catholic
>>>>> users (but I discovered today that the issue hadn't been
>>>>> reported!!! I just did it):
>>>>> https://tracker.crosswire.org/browse/API-216
>>>>> And the second:
>>>>> https://tracker.crosswire.org/projects/API/issues/API-180
>>>>>
>>>>> If there are more important things, which I am not able to
>>>>> judge since I am not a developer, at least I will have tried my luck ;)
>>>>>
>>>>> Il 28/05/2019 01:38, Troy A. Griffitts ha scritto:
>>>>>> Richard, sorry, I meant to give you the link to our tracker:
>>>>>>
>>>>>> https://tracker.crosswire.org
>>>>>>
>>>>>>
>>>>>> On 5/27/19 4:32 PM, Troy A. Griffitts wrote:
>>>>>>> Welcome, Richard!
>>>>>>>
>>>>>>> I would start at 2 places:
>>>>>>>
>>>>>>> First, have a look at our tracker here. We are not very (very not)
>>>>>>> disciplined at keeping it current. Skimming through there and
>>>>>>> commenting on anything that looks interesting, or even cleaning a few
>>>>>>> things up in there that you confirm are no longer a problem might be a
>>>>>>> useful exercise to get you poking around at internals and would be a
>>>>>>> blessing for us. Our modus operandi as of late is to create a new unit
>>>>>>> test in sword/tests/testsuite/ which fails at the bug; once the bug is
>>>>>>> fixed, the test should pass, and we leave the test around to be sure we
>>>>>>> don't regress. We can always use more tests in our test suite.
>>>>>>>
>>>>>>> Next, we have the intention to modularize our search engine support and
>>>>>>> search types. Right now, SWModule (which represents a Bible) implements
>>>>>>> our SWSearchable interface, which is fine, but right now it has a bunch
>>>>>>> of #ifdef logic and switch statements to take different code paths
>>>>>>> depending on which search engine is compiled into SWORD and which search
>>>>>>> type is specified. This was fine initially, but has grown to the point
>>>>>>> that we now support spaghetti in there. It should probably simply have a set
>>>>>>> of SWSearchable objects in a map<SEARCH_TYPE, SWSearchable> and proxy
>>>>>>> the search request to the appropriate SWSearchable impl based on what
>>>>>>> types are registered for the module. This would allow us to implement
>>>>>>> new types and register them with modules which support special search
>>>>>>> types, e.g., advanced Hebrew Morphology searching. That's the general
>>>>>>> idea anyway.
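
To make the registry idea concrete, here is a small sketch in Python rather than SWORD's C++. It is not the engine's API: `SearchType`, `PhraseSearch`, `RegexSearch`, `Module`, and `register_searcher` are hypothetical stand-ins for the map<SEARCH_TYPE, SWSearchable> arrangement described above, and the two entries are toy data.

```python
import re
from enum import Enum


class SearchType(Enum):
    PHRASE = "phrase"
    REGEX = "regex"


class PhraseSearch:
    """Case-insensitive substring search over prepared entry text."""

    def search(self, entries, query):
        return [key for key, text in entries.items() if query.lower() in text.lower()]


class RegexSearch:
    """Regular-expression search over prepared entry text."""

    def search(self, entries, query):
        pattern = re.compile(query)
        return [key for key, text in entries.items() if pattern.search(text)]


class Module:
    """Holds entries and proxies each search to whichever searcher is registered."""

    def __init__(self, entries):
        self.entries = entries
        self._searchers = {}  # SearchType -> searcher object

    def register_searcher(self, search_type, searcher):
        self._searchers[search_type] = searcher

    def search(self, search_type, query):
        try:
            searcher = self._searchers[search_type]
        except KeyError:
            raise ValueError(f"search type not registered: {search_type.name}") from None
        return searcher.search(self.entries, query)


module = Module({"John 3:16": "For God so loved the world", "John 11:35": "Jesus wept"})
module.register_searcher(SearchType.PHRASE, PhraseSearch())
module.register_searcher(SearchType.REGEX, RegexSearch())
phrase_hits = module.search(SearchType.PHRASE, "loved")
regex_hits = module.search(SearchType.REGEX, r"^Jesus\b")
```

A module that supports a special search type would register one more searcher; nothing in `Module.search` needs to change.
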
>>>>>>>
>>>>>>> You should probably become familiar with SWFilter and how we use these
>>>>>>> throughout the engine. These prepare a buffer for particular
>>>>>>> objectives. We have RenderFilters, EncodingFilters, StripFilters, ...
>>>>>>> The last prepares an SWModule entry for searching by, typically,
>>>>>>> stripping out all markup and leaving only a plaintext buffer which can
>>>>>>> be searched. We have some special code in the SWModule::search
>>>>>>> spaghetti which takes Greek and Hebrew modules and turns buffers into a
>>>>>>> series of Strongs#@MorphCode Strong#@MorphCode ... which allows regex
>>>>>>> searches to do some advanced morph searching... like: Find this Strong's
>>>>>>> number, any morphology, followed by any verb within 2 words. You
>>>>>>> have to be pretty familiar with the Strong#@MorphCode syntax to
>>>>>>> formulate something like that, but the idea is that a frontend could
>>>>>>> have a nice UI to help a user come up with some creative searches.
>>>>>>> Anyway, these should probably all be modularized out by renaming the
>>>>>>> StripFilter concept to SearchFilter, and then pushing all this special
>>>>>>> code out to SearchFilter impls which do these special things...
>>>>>>>
>>>>>>> Finally, an objective of all this search modularization is also to break
>>>>>>> out the code required to create search indexes for each of the search
>>>>>>> engines we support. Ideally, we should be able to support the same
>>>>>>> searches either as an indexed or brute force search. The same code
>>>>>>> which iterates a module, prepares each entry, and pushes that entry to
>>>>>>> the search engine, building the search index, should also work for a
>>>>>>> brute force search-- iterating the module, preparing each entry for the
>>>>>>> search engine.. and then performing a check on that buffer to see if it
>>>>>>> matches the search expression.
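
One way to picture the shared code path, as a Python sketch with hypothetical names (`strip_markup`, `prepared_entries`, `build_index`, `indexed_search`, `brute_force_search`) and toy data: a single iterator prepares each entry, and both the index builder and the brute-force search consume it. The tag-stripping regex merely stands in for a real strip filter.

```python
import re
from collections import defaultdict


def strip_markup(raw):
    """Stand-in for a strip/search filter: drop tags and lowercase the text."""
    return re.sub(r"<[^>]+>", "", raw).lower()


def prepared_entries(module):
    """The single iteration path: yield (key, search-ready buffer) pairs."""
    for key, raw in module.items():
        yield key, strip_markup(raw)


def build_index(module):
    """Indexed path: push every prepared buffer into a word -> keys index."""
    index = defaultdict(set)
    for key, buffer in prepared_entries(module):
        for word in re.findall(r"\w+", buffer):
            index[word].add(key)
    return index


def indexed_search(index, word):
    return sorted(index.get(word.lower(), ()))


def brute_force_search(module, word):
    """Brute-force path: same iteration, but check each buffer on the spot."""
    needle = word.lower()
    return sorted(
        key
        for key, buffer in prepared_entries(module)
        if needle in re.findall(r"\w+", buffer)
    )


module = {
    "John 11:35": "<w>Jesus</w> wept.",
    "John 3:16": "For God so <w>loved</w> the world",
}
from_index = indexed_search(build_index(module), "Jesus")
from_scan = brute_force_search(module, "Jesus")
```

Because both paths go through `prepared_entries`, any change to how entries are prepared affects indexed and brute-force results in the same way.
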
>>>>>>>
>>>>>>> I hope this gives you a few things to think about. It has been good for
>>>>>>> me to refresh thoughts on all of this. Have a look and let me know what
>>>>>>> you think.
>>>>>>>
>>>>>>> Welcome! Looking forward to sharing in service together,
>>>>>>>
>>>>>>> Troy
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 5/27/19 1:09 PM, Richard Smith wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> My name's Richard Smith. I'm a C++ software engineer with 10 years'
>>>>>>>> experience in various industries. I was wondering if there was any
>>>>>>>> space for a volunteer. I've started taking a look at things (building
>>>>>>>> repos on Win/unix), but if there are specific things that are
>>>>>>>> required, within my ability, I'm happy to do that.
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>> Richard Smith
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>> Instructions to unsubscribe/change your settings at above page