[jsword-devel] PassageKeyFactory, resolving verse references
DM Smith
dmsmith at crosswire.org
Fri Oct 26 18:13:46 MST 2012
Apart from the fact that the parsing needs overhauling, this behavior is intentional. It matches what SWORD does. But that doesn't mean it cannot be changed. I'm not aware of anything that would prevent it from being different.
Here is what JSword does: It reserves space, period, colon and dash as meta-characters, and treats letters and numbers as ordinary characters. A passage is split into parts at any other character.
So in your example of Ps 27:7-14;28, it is split into Ps 27:7-14 and 28. The semi-colon is thrown away.
The first part is noted to have a range character, dash, so it is split into Ps 27:7 and 14.
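To make the two splitting steps concrete, here is a minimal sketch in Python. It is not JSword's code; the names PART_SEPARATOR, split_passage and split_range are made up for illustration, and it assumes plain ASCII letters and digits.

```python
import re

# Assumption for this sketch: the only meaningful characters are ASCII letters,
# digits, space, period, colon and dash. Any run of other characters
# separates parts and is thrown away.
PART_SEPARATOR = re.compile(r"[^0-9A-Za-z .:-]+")


def split_passage(text):
    """Split a passage into parts, discarding whatever separated them."""
    return [p.strip() for p in PART_SEPARATOR.split(text) if p.strip()]


def split_range(part):
    """Split one part at its first dash; the end is None when there is no range."""
    start, dash, end = part.partition("-")
    return start.strip(), (end.strip() if dash else None)


print(split_passage("Ps 27:7-14;28"))
print(split_range("Ps 27:7-14"))
```

Note that a comma and a semi-colon give the same result here, because neither is remembered once the split is done.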
Then Ps 27:7 is parsed. Since it has three parts (Book, Chapter and Verse) and the Book is valid, it can be understood as a full reference. But if it were merely 27:7, then we'd have to know which Book it belongs to. And if it were just 27, then we'd have to know whether its context is a book or a book and a chapter.
That's where "basis" comes in. When parsing, the parser is given a "basis", a context reference by which to disambiguate incomplete references. The basis can be a book [B], a book and a chapter [BC] or a book, chapter and verse [BCV].
So now we come to 14, which has a basis of BCV (Ps 27:7), so it has to be interpreted as V. So a range is created by taking BC from the basis and adding in V. The range is then from Ps 27:7 to Ps 27:14.
If the reference were Ps 27-28;40, the first reference would be BC, so the 28 would be interpreted to be BC as well. The range is created by taking the B from the basis and adding in the C, so the range is Ps 27 to Ps 28.
Now we come to the number after the semi-colon (actually after the forgotten character that was not a letter, number, space, period, colon or dash). In Ps 27:7-14;28, the basis at that point is the BCV of Ps 27:14. The 28 could be interpreted as either a C or a V. The disambiguation rule is that a single number following BCV is always interpreted as a V. If it follows a BC then it is a C.
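The disambiguation rule can be sketched in a few lines of Python. This is only an illustration, not how AccuracyType is written; Ref and resolve_number are hypothetical names, and treating a lone number after a book-only basis as a chapter is my assumption for the sketch.

```python
from collections import namedtuple

# Hypothetical reference type: chapter and verse are None when not given,
# so Ref("Ps", 27, None) is a BC reference and Ref("Ps", 27, 14) is BCV.
Ref = namedtuple("Ref", "book chapter verse")


def resolve_number(number, basis):
    """Interpret a bare number using the basis as context."""
    if basis.verse is not None:
        # Basis is BCV: keep BC from the basis, the number is a verse.
        return Ref(basis.book, basis.chapter, number)
    # Basis is BC: keep B from the basis, the number is a chapter.
    # Basis is B only: also read as a chapter (assumption of this sketch).
    return Ref(basis.book, number, None)


# The 14 in Ps 27:7-14 (basis Ps 27:7) and the 28 in Ps 27-28 (basis Ps 27).
print(resolve_number(14, Ref("Ps", 27, 7)))
print(resolve_number(28, Ref("Ps", 27, None)))
# The 28 after Ps 27:7-14 has the basis Ps 27:14, so it becomes a verse.
print(resolve_number(28, Ref("Ps", 27, 14)))
```

The last call is the case from the original question: the result is Ps 27:28, which does not exist, because the separator that preceded the 28 plays no part in the decision.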
Note, if we know that we are showing a verse, then it will be the basis of references in it. Likewise, if we are showing a chapter then verse 1 of that chapter can be the basis. But if we have no idea, then the basis is some fixed verse, say Genesis 1:1 or Matt 1:1.
All this is in the type AccuracyType.
If you want to change this then you'll have to modify the parsing to remember the separator, assign meaning to it, and have it passed in and throughout AccuracyType.
I mentioned at the beginning that the parser needs to be rewritten. This needs to be done for the sake of internationalization. A few things need to be adjusted:
1) Allow any character, including meta-characters, especially dash and period, to be part of a book name. Difficulties abound such as: Gen-Exo. Basically, we need an incremental parsing of the book name. As long as the input read so far, including the next character, is still a valid prefix of a book name, keep on going. If it isn't and the next character is not an expected meta-character, then there is an input error.
2) Allow per language BC separators. Currently only period and space are allowed.
3) Allow per language CV separators. Currently only period and colon are allowed.
4) Allow per language range separator. Currently only dash is allowed.
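The incremental book-name parsing in item 1 might look something like the following Python sketch. It is a proposal for illustration only, not existing JSword code; match_book, its parameters and the tiny book list are hypothetical, and matching is done on simple lower-cased prefixes.

```python
def match_book(text, book_names, meta_chars=" .:-"):
    """Consume the longest leading run of text that is a prefix of a book name.

    Returns (book_part, rest). Raises ValueError when the run stops at a
    character that is not one of the expected meta-characters.
    """
    names = [name.lower() for name in book_names]
    lowered = text.lower()
    end = 0
    # Keep going while the input read so far is still a prefix of some name.
    while end < len(lowered) and any(n.startswith(lowered[:end + 1]) for n in names):
        end += 1
    if end == 0:
        raise ValueError("no book name at the start of %r" % text)
    if end < len(text) and text[end] not in meta_chars:
        raise ValueError("unexpected character %r in %r" % (text[end], text))
    return text[:end], text[end:]


books = ["Genesis", "Exodus", "1 John"]
print(match_book("Gen-Exo", books))      # the dash ends the book name here
print(match_book("1 John 4:8", books))   # the space is part of the book name here
```

Because the book list decides where the name ends, a space or dash inside a name no longer has to be special-cased, while a dash after Gen still ends the name and is left for the range handling.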
So, you've presumed that comma and semi-colon are meta-characters. They are not. If we make them such then we need to allow per language variations of them as well.
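As a rough picture of what per language separators could mean, here is a small Python sketch. None of this exists in JSword; Separators, roles and the two instances are hypothetical, and the second instance only illustrates a convention where comma sits between chapter and verse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Separators:
    """Hypothetical per-language table of meta-characters."""
    book_chapter: str = " ."    # BC separators
    chapter_verse: str = ".:"   # CV separators
    range_sep: str = "-"        # range separator
    list_sep: str = ""          # list separators: none are meta-characters today


def roles(ch, seps):
    """Return the set of roles a character may play under the given table."""
    found = set()
    if ch in seps.book_chapter:
        found.add("BC")
    if ch in seps.chapter_verse:
        found.add("CV")
    if ch in seps.range_sep:
        found.add("range")
    if ch in seps.list_sep:
        found.add("list")
    return found


CURRENT = Separators()
# Illustrative variant: comma between chapter and verse, semi-colon between items.
COMMA_CV = Separators(chapter_verse=",", list_sep=";")

print(roles(",", CURRENT))    # empty set: comma has no meaning today
print(roles(",", COMMA_CV))
print(roles(".", CURRENT))    # period is ambiguous: both BC and CV
```

The comma case shows why fixing a meaning for comma and semi-colon globally would not work: the same character can need a different role depending on the language.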
The other reason to rewrite the parser is to split it into two parsers, one for osisIDs/osisRefs and one for everything else. Right now the one does double duty. The osisID/osisRef parser would not need any notion of "basis" as every part is BCV.
Hope this helps.
In Him,
DM
On Oct 26, 2012, at 9:57 AM, Chris Burrell <chris at burrell.me.uk> wrote:
> Hi All
>
> I've encountered an issue where I'm trying to parse the following references: "Ps 27:7-14; 28".
>
> It seems that JSword is treating the 28 as a verse rather than as a chapter. Previously I had assumed that when separating by commas you would be referencing a verse, but when separating by a semi-colon you would be expressing a chapter.
>
> The JSword code doesn't make a difference between the two parts, and therefore assumes here that 28 is a verse. This then causes an issue because Ps 27 doesn't have a verse 28.
>
> Presumably introducing a difference between separators of verses, verse ranges and chapters (e.g. comma means between verse, but semi-colon means between chapters) is something fairly fundamental and therefore a change that we wouldn't want to make?
>
> Chris
>