[sword-devel] Hyphens in book names
Greg Hellings
greg.hellings at gmail.com
Thu Sep 30 13:28:18 MST 2010
On Thu, Sep 30, 2010 at 3:24 PM, Weston Ruter <westonruter at gmail.com> wrote:
> So there would have to be a tokenizer and parser that determines the meaning
> of the token based on context.
A better way to do this is what DM suggested: construct a trie of the
book names. The parser can then walk the trie character by character
from the start of the input string until it reaches a point where it
has either determined the book or determined that the input is an
error.
A trie is well suited to exactly this kind of prefix-based
disambiguation in parsing.
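As a rough sketch of the idea, in Python rather than SWORD's C++: the
BookTrie class, its method names, the split_range helper, and the
"Amos" and "Romans" entries are all made up for illustration and are
not part of the SWORD API. It assumes every accepted spelling of a
book is added to the trie up front, and that any prefix leading to
exactly one book is accepted as an abbreviation.

```python
class Node:
    def __init__(self):
        self.children = {}   # next character -> Node
        self.books = set()   # every book whose name passes through this node
        self.book = None     # set when a complete name ends exactly here


class BookTrie:
    """Hypothetical trie of localized book names (not SWORD code)."""

    def __init__(self):
        self.root = Node()

    def add(self, name, book):
        node = self.root
        for ch in name:
            node = node.children.setdefault(ch.lower(), Node())
            node.books.add(book)
        node.book = book

    def match(self, text):
        """Walk text from its start; return (book, chars_consumed).

        Returns (None, 0) when nothing matches or the prefix is ambiguous.
        """
        node, consumed = self.root, 0
        for ch in text:
            nxt = node.children.get(ch.lower())
            if nxt is None:
                break            # left the trie: the name ends here
            node, consumed = nxt, consumed + 1
        if consumed == 0:
            return None, 0
        if node.book is not None:
            return node.book, consumed               # exact name
        if len(node.books) == 1:
            return next(iter(node.books)), consumed  # unambiguous prefix
        return None, 0                               # ambiguous prefix


def split_range(text, trie):
    """Split 'Book ref - rest' on the first hyphen after the book name."""
    book, n = trie.match(text)
    if book is None:
        raise ValueError("unrecognized or ambiguous book name: %r" % text)
    start, sep, end = text[n:].partition("-")
    return book, start.strip(), (end.strip() if sep else None)


trie = BookTrie()
trie.add("Apostle-Works", "Acts")
trie.add("Amos", "Amos")
trie.add("Romans", "Rom")

print(split_range("Apostle-Works 4:32 - Romans 3:21", trie))
print(trie.match("Ap 4:32"))
print(trie.match("A 1:1"))
```

The hyphen inside "Apostle-Works" is consumed as part of the trie
walk, so only the later hyphen is treated as the range separator.
"Ap" resolves to Acts because no other book shares that prefix, while
"A" is rejected because it could be Acts or Amos. Forms such as "A-W"
or "Wrks" would have to be added as extra entries; mixed forms like
"Ap-Wo" are not handled by this sketch.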
--Greg
>
> On Thu, Sep 30, 2010 at 1:16 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>
>> It's not quite as simple as working with the fully spelled out names.
>> SWORD allows other alternates as well. For example, perhaps the following
>> would work just as well for Apostle-Works:
>> A-W
>> AW
>> Wrks
>> Wrk
>> Wks
>> Wk
>> and any proper prefix of Apostle-Works that does not conflict with another
>> book's abbreviations:
>> Apostle-Work
>> Apostle-Wor
>> Apostle-Wo
>> Apostle-W
>> Apostle-
>> Apostle
>> Apostl
>> ...
>> Ap
>>
>> How about prefixes on both sides of the dash?
>> Ap-Works
>> Apo-Works
>> Ap-Wo
>>
>> How about abbreviations of just one side or the other:
>> Apo-Wrks
>> Apostle-Wrk
>> A-Wks
>>
>> In Him,
>> DM
>>
>>
>> On 09/30/2010 01:24 PM, Weston Ruter wrote:
>>
>> I think the fundamental problem here is that the SWORD reference parser is
>> too simple. Namely, the parser needs to not blindly split on a hyphen
>> character but rather tokenize the input stream and contextually determine
>> what each token is as it processes the tokens in sequence. For example, if I
>> had the following passage span (assuming the language has "Apostle-Works" as
>> the book name for "Acts"):
>>
>> Apostle-Works 4:32 - Romans 3:21
>>
>> In this case, the parser would come across that first hyphen and could
>> contextually determine it's not a passage span separator hyphen since the
>> following token "Works" is not recognized as a book, and also that
>> "Apostle" is not a full book in itself but "Apostle-Works" is. Otherwise,
>> there could be a pre-processor that does a first pass inspecting the token
>> stream and replacing localized book name token sequences with their internal
>> OSIS names and then just split on the hyphen as usual.
>>
>> Does that sound right?
>>
>> On Thu, Sep 30, 2010 at 9:52 AM, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>> On 09/30/2010 11:11 AM, David Troidl wrote:
>>>
>>> Hi Robert,
>>>
>>> There are many Unicode characters for hyphens and dashes. Could you
>>> substitute, for example, the hyphen from General Punctuation (‐)?
>>> This would give the proper appearance, without conflicting with the 'normal'
>>> hyphen separator.
>>>
>>> I think this is at core a user input problem. Telling users that they
>>> have to use a special character that is not on their keyboard is a problem.
>>> I don't think it will do at all.
>>>
>>> If we parse the user input to figure out whether a hyphen is a range
>>> specifier or part of a name and if part of a name then substitute it with
>>> something else, then we should add that to the SWORD reference parser.
>>>
>>>
>>> Peace,
>>>
>>> David
>>>
>>> On 9/29/2010 5:28 PM, Robert Hunt wrote:
>>>
>>> On 30/09/10 10:17, Greg Hellings wrote:
>>>
>>> OP was not talking about a transliteration from the sounds of his email,
>>> but rather the original language where the hyphen is a letter.
>>>
>>> You are in effect proposing that an English speaker not use the letter s
>>> in the Bible names list. It might be comprehensible but it would be horrible
>>> usability and I probably wouldn't take such software seriously!
>>>
>>> Exactly!
>>>
>>> Perhaps allowing each locale to define its own numerals and hyphen-like
>>> character would be a good solution?
>>>
>>> Yes, I'm sure there are probably dozens of languages in the world that are
>>> likely to have hyphens in book names. Even in English, hyphen is a valid
>>> letter as you can see in the sentence above. (It's just fortunate that it
>>> doesn't occur in book names.)
>>>
>>> Surely this issue has come up many times before???
>>>
>>> Robert.
>>>
>>> On Sep 29, 2010 4:08 PM, "Daniel Owens" <dhowens at pmbx.net> wrote:
>>> >
>>> > On 09/29/2010 03:55 PM, Robert Hunt wrote:
>>> >> New Zealand.
>>> >>
>>> >> Hello all,
>>> >>
>>> >> I am spending today studying the documentation on the Crosswire
>>> >> Sword wiki so I'm likely to have a few questions. Please let me know
>>> >> if this is not the right forum to ask questions.
>>> >>
>>> >> I see in http://www.crosswire.org/wiki/DevTools:SWORD that
>>> >> localised book names are not allowed hyphens in them (because the
>>> >> hyphen is used for verse ranges). In the Philippine language that we
>>> >> worked with as Bible translators, the hyphen is a letter in the
>>> >> alphabet and appears in several book names!
>>> >>
>>> >> Is this still a current limitation? If so, what is the suggested
>>> >> work-around?
>>> >>
>>> >> Thanks,
>>> >> Robert.
>>> >>
>>> > This problem came up with Vietnamese, and I was just told to drop the
>>> > hyphens. The result was not ideal, but in the end it is still
>>> > comprehensible in Vietnamese. I think the hyphen was needed because
>>> > Vietnamese is monosyllabic, but more recent "transliterations" of
>>> > foreign names have simply dropped the hyphens. Would the names still be
>>> > comprehensible without the hyphen?
>>> >
>>> > Daniel
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>>
>>
>>
>>
>> --
>> Weston Ruter
>> http://weston.ruter.net/
>> @westonruter - Google Profile
>>
>>
>
>
>
> --
> Weston Ruter
> http://weston.ruter.net/
> @westonruter - Google Profile
>
>