[sword-devel] Localized parsing symbols [was: C++ volunteer]
Troy A. Griffitts
scribe at crosswire.org
Tue May 28 08:40:03 MST 2019
So, a little background on why it is difficult to work out a solution
for this problem:
The current verse parser, which works fairly well, always has 3 sets of
possibilities in view:
OSISRef
Current Locale
English
The parser needs to handle any of these three, typically in the
preference order listed above. The issue with changing out symbols
while parsing is that some symbols (notoriously the comma) are used for
different purposes across these 3 sets.
One might think that localized output might be easier than parsing,
e.g., once parsed, we could at least output the reference: Jn 3,16. The
problem here is that what the engine outputs it also expects to be able
to parse.
While we would like to solve this problem, it isn't as simple as adding
to the locale files:
ChapterVerseSeparator=,
RangeSeparator=-
ListSeparator=.
This would be enough to define the locale, but not solve the problem.
We would need a fundamental change in how parsing is done, e.g.,
explicitly telling the parser, "Hey, I'm sending you localized input, so
don't guess. You can count on the symbols I'm sending you to be
localized." Right now everyone has the convenience of just passing any
of the 3 sets of parsing text listed above and the parser just figuring
it out, with the caveat that chapter, range, and list separators are
not localizable.
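
To make the "don't guess" idea concrete, here is a minimal sketch of a parser that is handed its separators explicitly. It is not engine code: SeparatorSet, VerseRange, and parseNumericRef are hypothetical names made up for this illustration, and it only handles the numeric part of a reference (no book names, no whitespace, a single chapter).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; nothing here exists in SWORD.
struct SeparatorSet {
    char chapterVerse;  // between chapter and verse, e.g. ':' or ','
    char range;         // between the two ends of a verse range
    char list;          // between items of a verse list
};

struct VerseRange {
    int chapter;
    int firstVerse;
    int lastVerse;
};

// Reads a run of decimal digits starting at pos; returns false if none found.
static bool readInt(const std::string &s, std::size_t &pos, int &out) {
    const std::size_t start = pos;
    int value = 0;
    while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') {
        value = value * 10 + (s[pos] - '0');
        ++pos;
    }
    if (pos == start) return false;
    out = value;
    return true;
}

// Parses text such as "3,16-18.20" using ONLY the separators in sep.
// Returns an empty vector when the input does not fit that separator set,
// instead of falling back to another convention.
std::vector<VerseRange> parseNumericRef(const std::string &text,
                                        const SeparatorSet &sep) {
    std::vector<VerseRange> result;
    std::size_t pos = 0;
    int chapter = 0;
    if (!readInt(text, pos, chapter)) return {};
    if (pos >= text.size() || text[pos] != sep.chapterVerse) return {};
    ++pos;
    while (true) {
        int first = 0;
        if (!readInt(text, pos, first)) return {};
        int last = first;  // a single verse is a range of length one
        if (pos < text.size() && text[pos] == sep.range) {
            ++pos;
            if (!readInt(text, pos, last)) return {};
        }
        result.push_back({chapter, first, last});
        if (pos == text.size()) return result;
        if (text[pos] != sep.list) return {};
        ++pos;
    }
}
```

With a set of {',', '-', '.'} the string "3,16-18.20" comes back as chapter 3, verses 16-18 and verse 20; with an English-style set of {':', '-', ','} the same string is rejected rather than reinterpreted. That is the trade: the caller gives up the convenience of passing anything and gets unambiguous symbols in return.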
Hope this gives some background,
Troy
On 5/28/19 6:10 AM, David Haslam wrote:
> OK - but my observations were not entirely irrelevant.
>
> Some front-ends never need the user to enter a reference in an edit
> box. Navigation is done entirely via menu selections or clicking
> search results etc.
> AFAICT this is true of PocketSword.
>
> Other front-ends are designed at the opposite extreme. All navigation
> is done through an edit box. This is true (eg) of STEP Bible.
>
> Best regards,
>
> David.
>
> Sent from ProtonMail Mobile
>
>
> On Tue, May 28, 2019 at 13:54, refdoc at gmx.net wrote:
>> Sorry, David, that is a complete misunderstanding. Modules need
>> osisRef. There is and will be no need to do anything to the modules.
>> This is about getting the engine parser to read references in a
>> locale-appropriate way.
>>
>> Sent from my mobile. Please forgive shortness, typos and weird
>> autocorrects.
>>
>>
>> -------- Original Message --------
>> Subject: Re: [sword-devel] C++ volunteer
>> From: David Haslam
>> To: SWORD Developers' Collaboration Forum
>> CC:
>>
>>
>> Parsing native references is not a simple task, as we know from
>> the fact that adyeth's orefs.py was kicked into touch indefinitely.
>>
>> And that’s even when punctuation marks are defined in the
>> specified configuration file.
>>
>> Unless we might consider the possibility of adding keys to module
>> .conf files that define the module specific native reference
>> punctuation marks and separators.
>>
>> That could be a huge undertaking, considering the need to
>> maintain backwards compatibility.
>>
>> And it’s not as if it really is module specific entirely. A user
>> can be switching between modules with different languages, yet
>> would need the current reference to always work, no matter what.
>>
>> Best regards
>>
>> David
>>
>> Sent from ProtonMail Mobile
>>
>>
>> On Tue, May 28, 2019 at 12:10, refdoc at gmx.net wrote:
>>> The improvement request for allowing commas in references...
>>> adding commas in the suggested form would make millions of
>>> currently valid Anglo references invalid. The problem is a much
>>> wider one: references should be localised in their punctuation
>>> too. I am not sure how difficult this would be, but I guess we
>>> could make a start by defining what punctuation is used for
>>> which purpose, and then take it from there.
>>>
>>> Cyrille, maybe start a page on the wiki and start thinking there.
>>>
>>> Sent from my mobile. Please forgive shortness, typos and weird
>>> autocorrects.
>>>
>>>
>>> -------- Original Message --------
>>> Subject: Re: [sword-devel] C++ volunteer
>>> From: Cyrille
>>> To: SWORD Developers' Collaboration Forum
>>> CC:
>>>
>>>
>>> Hello Richard,
>>> Welcome!
>>> May I make a very selfish proposal to Richard who offers his
>>> help. There are two issues that I really want to be
>>> resolved. One of them particularly handicaps Catholic
>>> users (I discovered today that the issue hadn't been
>>> reported, so I just filed it):
>>> https://tracker.crosswire.org/browse/API-216
>>> And the second:
>>> https://tracker.crosswire.org/projects/API/issues/API-180
>>>
>>> If there are more important things, which I cannot judge since I
>>> am not a developer, at least I will have tried my luck ;)
>>>
>>> Il 28/05/2019 01:38, Troy A. Griffitts ha scritto:
>>>> Richard, sorry, I meant to give you the link to our tracker:
>>>>
>>>> https://tracker.crosswire.org
>>>>
>>>>
>>>> On 5/27/19 4:32 PM, Troy A. Griffitts wrote:
>>>>> Welcome, Richard!
>>>>>
>>>>> I would start at 2 places:
>>>>>
>>>>> First, have a look at our tracker here. We are not very (very not)
>>>>> disciplined at keeping it current. Skimming through there and
>>>>> commenting on anything that looks interesting, or even cleaning a few
>>>>> things up in there that you confirm are no longer a problem might be a
>>>>> useful exercise to get you poking around at internals and would be a
>>>>> blessing for us. Our modus operandi as of late is to create a new unit
>>>>> test in sword/tests/testsuite/ which fails at the bug and then once
>>>>> fixed, the test should pass and we leave the test around to be sure we
>>>>> don't regress. We can always use more tests in our test suite.
>>>>>
>>>>> Next, we have the intention to modularize our search engine support and
>>>>> search types. Right now, SWModule (which represents a Bible) implements
>>>>> our SWSearchable interface, which is fine, but right now it has a bunch
>>>>> of #ifdef logic and switch statements to take different code paths
>>>>> depending on which search engine is compiled into SWORD and which search
>>>>> type is specified. This was fine initially, but has grown to the point
>>>>> that we now have spaghetti in there. It should probably simply have a set
>>>>> of SWSearchable objects in a map<SEARCH_TYPE, SWSearchable> and proxy
>>>>> the search request to the appropriate SWSearchable impl based on what
>>>>> types are registered for the module. This would allow us to implement
>>>>> new types and register them with modules which support special search
>>>>> types, e.g., advanced Hebrew Morphology searching. That's the general
>>>>> idea anyway.
>>>>>
>>>>> You should probably become familiar with SWFilter and how we use these
>>>>> throughout the engine. These prepare a buffer for particular
>>>>> objectives. We have RenderFilters, EncodingFilters, StripFilters, ...
>>>>> The last prepares an SWModule entry for searching by, typically,
>>>>> stripping out all markup and leaving only a plaintext buffer which can
>>>>> be searched. We have some special code in the SWModule::search
>>>>> spaghetti which takes Greek and Hebrew modules and turns buffers into a
>>>>> series of Strong#@MorphCode Strong#@MorphCode ... which allows regex
>>>>> searches to do some advanced morph searching, like: find this Strong's
>>>>> number, any morphology, followed by any verb within 2 words. You
>>>>> have to be pretty familiar with the Strong#@MorphCode syntax to
>>>>> formulate something like that, but the idea is that a frontend could
>>>>> have a nice UI to help a user come up with some creative searches.
>>>>> Anyway, these should all probably be modularized out by renaming the
>>>>> StripFilter concept to SearchFilter, and then pushing all this special
>>>>> code out to SearchFilter impls which do these special things...
>>>>>
>>>>> Finally, an objective of all this search modularization is also to break
>>>>> out the code required to create search indexes for each of the search
>>>>> engines we support. Ideally, we should be able to support the same
>>>>> searches either as an indexed or brute force search. The same code
>>>>> which iterates a module, prepares each entry, and pushes that entry to
>>>>> the search engine, building the search index, should also work for a
>>>>> brute force search-- iterating the module, preparing each entry for the
>>>>> search engine.. and then performing a check on that buffer to see if it
>>>>> matches the search expression.
>>>>>
>>>>> I hope this gives you a few things to think about. It has been good for
>>>>> me to refresh thoughts on all of this. Have a look and let me know what
>>>>> you think.
>>>>>
>>>>> Welcome! Looking forward to sharing in service together,
>>>>>
>>>>> Troy
>>>>>
>>>>>
>>>>>
>>>>> On 5/27/19 1:09 PM, Richard Smith wrote:
>>>>>> Hi,
>>>>>>
>>>>>> My name's Richard Smith. I'm a C++ software engineer with 10 years'
>>>>>> experience in various industries. I was wondering if there was any
>>>>>> space for a volunteer. I've started taking a look at things (building
>>>>>> repos on Win/unix), but if there are specific things that are
>>>>>> required, within my ability, I'm happy to do that.
>>>>>>
>>>>>> Best Regards
>>>>>> Richard Smith
>>>>>>
>>>>>> _______________________________________________
>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>> Instructions to unsubscribe/change your settings at above page