[jsword-devel] PassageKeyFactory, resolving verse references

Chris Burrell chris at burrell.me.uk
Sun Oct 28 02:09:37 MST 2012


Hi

For now I think we can work with it as is. But if others feel it would be
a valuable addition, I wouldn't be averse to pursuing this further.

Chris


On 27 October 2012 02:13, DM Smith <dmsmith at crosswire.org> wrote:

> Aside from the fact that the parsing needs overhauling, this behavior is
> intentional. It matches what SWORD does. But that doesn't mean it cannot be
> changed. I'm not aware of anything that would prevent it from being
> different.
>
> Here is what JSword does: It reserves space, period, colon and dash as
> meta-characters; and letters and numbers as characters. A passage is split
> into parts by other characters.
>
> So in your example of Ps 27:7-14;28, it is split into Ps 27:7-14 and 28.
> The semi-colon is thrown away.
>
> The first part is noted to have a range character, dash, so it is split
> into Ps 27:7 and 14.
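
The two splitting steps described above can be sketched as follows. This is
only an illustration of the description in this thread, not the actual JSword
code; the class and method names (ReferenceSplitSketch, splitParts,
splitRange) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JSword's parser.
final class ReferenceSplitSketch {

    // Meta-characters as described above: space, period, colon and dash.
    static boolean isMeta(char c) {
        return c == ' ' || c == '.' || c == ':' || c == '-';
    }

    // Split the input on any character that is neither a letter, a digit
    // nor a meta-character. The separator itself is thrown away.
    static List<String> splitParts(String input) {
        List<String> parts = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetterOrDigit(c) || isMeta(c)) {
                current.append(c);
            } else {
                addIfNotBlank(parts, current);
                current.setLength(0);
            }
        }
        addIfNotBlank(parts, current);
        return parts;
    }

    // Split one part at the first range character (dash), if there is one.
    static String[] splitRange(String part) {
        int dash = part.indexOf('-');
        if (dash < 0) {
            return new String[] { part.trim() };
        }
        return new String[] {
            part.substring(0, dash).trim(),
            part.substring(dash + 1).trim()
        };
    }

    private static void addIfNotBlank(List<String> parts, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            parts.add(s);
        }
    }

    public static void main(String[] args) {
        for (String part : splitParts("Ps 27:7-14;28")) {
            System.out.println(String.join(" | ", splitRange(part)));
        }
    }
}
```

Note that the comma and the semi-colon take the same path through
splitParts, which is why the parser cannot tell them apart afterwards.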
>
> Then Ps 27:7 is parsed. Since it has three parts (Book, Chapter and Verse)
> and the Book is valid, it can be understood as a full reference. But if it
> were merely 27:7, then we'd have to know which Book it belongs to. And
> if it were just 27, then we'd have to know whether its context is a
> book or a book and a chapter.
>
> That's where "basis" comes in. When parsing, the parser is given a "basis",
> a context reference by which to disambiguate incomplete references. The
> basis can be a book [B], a book and a chapter [BC] or a book, chapter and
> verse [BCV].
>
> So now we come to 14, which has a basis of BCV (Ps 27:7), so it has to be
> interpreted as V. So a range is created by taking BC from the basis and
> adding in V. The range is then from Ps 27:7 to Ps 27:14.
>
> If the reference were Ps 27-28;40, the first reference would be BC, so the
> 28 would be interpreted as BC as well. So the range is created by taking
> the B from the basis and adding in the C. Then the range is Ps 27 to Ps 28.
>
> Now we come to the number after the semi-colon (actually after the
> discarded character that was not a letter, number, space, period, colon or
> dash). In Ps 27:7-14;28, the basis at that point is the BCV Ps 27:14. The 28
> could be interpreted as either a C or a V. The disambiguation rule is that a
> single number following a BCV is always interpreted as a V. If it follows a
> BC then it is a C.
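
The disambiguation rule for a lone number can be sketched like this. It is a
minimal illustration of the rule as described in this thread, not the
AccuracyType code itself; the Ref class and resolveLoneNumber are
hypothetical names, and the handling of a book-only basis is an assumption,
since the thread does not spell that case out.

```java
// Hypothetical sketch of the "basis" rule, not JSword's AccuracyType.
final class BasisSketch {

    // A reference with optional parts; 0 means "not specified".
    static final class Ref {
        final String book;
        final int chapter;
        final int verse;

        Ref(String book, int chapter, int verse) {
            this.book = book;
            this.chapter = chapter;
            this.verse = verse;
        }

        // "B", "BC" or "BCV", depending on which parts are present.
        String accuracy() {
            if (verse > 0) {
                return "BCV";
            }
            return chapter > 0 ? "BC" : "B";
        }

        @Override
        public String toString() {
            if (verse > 0) {
                return book + " " + chapter + ":" + verse;
            }
            return chapter > 0 ? book + " " + chapter : book;
        }
    }

    // Interpret a single number using the basis that precedes it.
    static Ref resolveLoneNumber(int n, Ref basis) {
        if ("BCV".equals(basis.accuracy())) {
            // A single number following a BCV is always a verse.
            return new Ref(basis.book, basis.chapter, n);
        }
        // Following a BC it is a chapter. Treating a book-only basis the
        // same way is an assumption made for this sketch.
        return new Ref(basis.book, n, 0);
    }

    public static void main(String[] args) {
        // "Ps 27:7-14;28": the 28 follows the BCV basis Ps 27:14.
        System.out.println(resolveLoneNumber(28, new Ref("Ps", 27, 14)));
        // "Ps 27-28": the 28 follows the BC basis Ps 27.
        System.out.println(resolveLoneNumber(28, new Ref("Ps", 27, 0)));
    }
}
```

With a BCV basis of Ps 27:14 the 28 comes out as a verse of Ps 27, which is
the behavior reported at the start of this thread.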
>
> Note, if we know that we are showing a verse, then it will be the basis of
> references in it. Likewise, if we are showing a chapter then verse 1 of
> that chapter can be the basis. But if we have no idea, then the basis is
> some fixed verse, say Genesis 1:1 or Matt 1:1.
>
> All this is in the type AccuracyType.
>
> If you want to change this then you'll have to modify the parsing to
> remember the separator, assign meaning to it, and have it passed in and
> throughout AccuracyType.
>
> I mentioned at the beginning that the parser needs to be rewritten. This
> needs to be done for the sake of internationalization. A few things need
> to be adjusted:
> 1) Allow any character, including meta-characters, especially dash and
> period, to be part of a book name. Difficulties abound such as: Gen-Exo.
> Basically, we need an incremental parsing of the book name. As long as the
> next character in the input still is a valid prefix of a book name, keep on
> going. If it doesn't and it is not an expected meta-character, then there
> is an input error.
> 2) Allow per-language BC separators. Currently only period and space are
> allowed.
> 3) Allow per-language CV separators. Currently only period and colon are
> allowed.
> 4) Allow per-language range separators. Currently only dash is allowed.
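
The incremental book-name parsing suggested in item 1 might look roughly like
the sketch below. It is an assumption-laden illustration, not a proposal for
the real parser: BookPrefixSketch, BOOKS and parseBook are hypothetical, and
the tiny book list (including the German-style "1. Mose") is there only to
show a name that contains a period and a space.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of incremental book-name matching.
final class BookPrefixSketch {

    // Illustrative book names only; a real list would come from the
    // per-language book name data.
    static final List<String> BOOKS = Arrays.asList("Gen", "Exo", "Ps", "1. Mose");

    static boolean isMeta(char c) {
        return c == ' ' || c == '.' || c == ':' || c == '-';
    }

    // True if candidate is a (case-insensitive) prefix of some book name.
    static boolean isPrefixOfSomeBook(String candidate) {
        String lower = candidate.toLowerCase(Locale.ROOT);
        for (String book : BOOKS) {
            if (book.toLowerCase(Locale.ROOT).startsWith(lower)) {
                return true;
            }
        }
        return false;
    }

    // Keep consuming characters while the text consumed so far is still a
    // valid prefix of a book name. Stop at the first character that breaks
    // the prefix; it must then be an expected meta-character.
    static String parseBook(String input) {
        int end = 0;
        while (end < input.length() && isPrefixOfSomeBook(input.substring(0, end + 1))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("No book name at start of: " + input);
        }
        if (end < input.length() && !isMeta(input.charAt(end))) {
            throw new IllegalArgumentException(
                "Unexpected character '" + input.charAt(end) + "' in: " + input);
        }
        return input.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(parseBook("Gen-Exo"));
        System.out.println(parseBook("1. Mose 3"));
    }
}
```

This greedy version never backtracks, so a book name that itself starts with
"Gen-" would make Gen-Exo fail. That is one of the difficulties mentioned in
item 1.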
>
> So, you've presumed that comma and semi-colon are meta-characters. They
> are not. If we make them such then we need to allow per language variations
> of them as well.
>
> The other reason to rewrite the parser is to split it into two parsers,
> one for osisID/osisRef and one for everything else. Right now the one does
> double duty. The osisID/osisRef parser would not need any notion of "basis"
> as every part is BCV.
>
> Hope this helps.
>
> In Him,
>         DM
>
> On Oct 26, 2012, at 9:57 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> > Hi All
> >
> > I've encountered an issue where I'm trying to parse the following
> references: "Ps 27:7-14; 28".
> >
> > It seems that JSword is treating the 28 as a verse rather than as a
> chapter. Previously I had assumed that when separating by commas you would
> be referencing a verse, but when separating by a semi-colon you would be
> expressing a chapter.
> >
> > The JSword code doesn't make a difference between the two parts, and
> therefore assumes here that 28 is a verse. This then causes an issue
> because Ps 27 doesn't have a verse 28.
> >
> > Presumably introducing a difference between separators of verses, verse
> ranges and chapters (e.g. comma means between verse, but semi-colon means
> between chapters) is something fairly fundamental and therefore a change
> that we wouldn't want to make?
> >
> > Chris
> >
> > _______________________________________________
> > jsword-devel mailing list
> > jsword-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/jsword-devel
>
>