Hi<div><br></div><div>For now I think we can work with it as is. But if others feel it would be a valuable addition, I wouldn't be averse to pursuing this further.</div><div><br></div><div>Chris</div><div><br><div>
<br><div class="gmail_quote">On 27 October 2012 02:13, DM Smith <span dir="ltr">&lt;<a href="mailto:dmsmith@crosswire.org" target="_blank">dmsmith@crosswire.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Aside from the fact that the parsing needs overhauling, this behavior is intentional. It matches what SWORD does. But that doesn't mean it cannot be changed. I'm not aware of anything that would prevent it from being different.<br>
<br>
Here is what JSword does: It reserves space, period, colon and dash as meta-characters; and letters and numbers as characters. A passage is split into parts by other characters.<br>
<br>
So in your example of Ps 27:7-14;28, it is split into Ps 27:7-14 and 28. The semi-colon is thrown away.<br>
<br>
The first part is noted to have a range character, dash, so it is split into Ps 27:7 and 14.<br>
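<br>
As a rough sketch of these two splitting steps (this is not the actual JSword code; the class and method names are made up for illustration), it could look something like this in Java:<br>
<pre>
```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and do not mirror JSword's parser.
final class PassageSplitter {
    // Split on runs of anything that is not a letter, number,
    // space, period, colon or dash. The separators are thrown away.
    static String[] splitParts(String passage) {
        return Arrays.stream(passage.split("[^\\p{L}\\p{N} .:-]+"))
                .map(String::trim)
                .filter(PassageSplitter::notEmpty)
                .toArray(String[]::new);
    }

    static boolean notEmpty(String s) {
        return !s.isEmpty();
    }

    // Split one part at the first range character (dash) into
    // a start and, if present, an end.
    static String[] splitRange(String part) {
        return Arrays.stream(part.split("-", 2))
                .map(String::trim)
                .toArray(String[]::new);
    }
}
```
</pre>
With this sketch, splitParts("Ps 27:7-14;28") gives "Ps 27:7-14" and "28", and splitRange on the first part gives "Ps 27:7" and "14".<br>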
<br>
Then Ps 27:7 is parsed. Since it has three parts (Book, Chapter and Verse) and the Book is valid, it can be understood as a full reference. But if it were merely 27:7, then we'd have to know which Book it belongs to. And if it were just 27, then we'd have to know whether its context is a book or a book and a chapter.<br>
<br>
That's where "basis" comes in. When parsing, the parser is given a "basis", a context reference by which to disambiguate incomplete references. The basis can be a book [B], a book and a chapter [BC] or a book, chapter and verse [BCV].<br>
<br>
So now we come to 14, which has a basis of BCV, so it has to be interpreted as V. So a range is created by taking BC from the basis and adding in V. The range is then from Ps 27:7 to Ps 27:14.<br>
<br>
If the reference were Ps 27-28;40, the first reference would be BC, so the 28 would be interpreted to be BC as well. So the range is created by taking the B from the basis and adding in the C. Then the range is Ps 27 to Ps 28.<br>
<br>
Now we come to the number after the semi-colon (actually after the forgotten character that was not a letter, number, space, period, colon or dash). In Ps 27:7-14;28, the basis is the BCV of Ps 27:14. The 28 could be interpreted as either a C or a V. The disambiguation rule is that a single number following BCV is always interpreted as a V. If it follows a BC then it is a C.<br>
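<br>
To make the rule concrete, here is a minimal sketch in Java. It is not how AccuracyType is written; the class, enum and method names are invented for illustration, and the handling of a book-only basis is my assumption:<br>
<pre>
```java
// Illustrative sketch only; names are hypothetical and do not mirror AccuracyType.
final class BasisSketch {
    // How much of a reference the basis supplies: B, BC or BCV.
    enum Basis { BOOK, BOOK_CHAPTER, BOOK_CHAPTER_VERSE }

    // Complete a bare number using the book and chapter of the basis.
    static String resolve(String book, int chapter, Basis basis, int number) {
        switch (basis) {
            case BOOK_CHAPTER_VERSE:
                // A single number following BCV is always a verse.
                return book + " " + chapter + ":" + number;
            case BOOK_CHAPTER:
                // A single number following BC is a chapter.
                return book + " " + number;
            default:
                // Assumption: with only a book as basis, treat the number as a chapter.
                return book + " " + number;
        }
    }
}
```
</pre>
With a basis of Ps 27:14 (BCV), the number 28 resolves to Ps 27:28, which is why the 28 ends up as a verse. With a basis of Ps 27 (BC), the same number resolves to Ps 28.<br>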
<br>
Note, if we know that we are showing a verse, then it will be the basis of references in it. Likewise, if we are showing a chapter then verse 1 of that chapter can be the basis. But if we have no idea, then the basis is some fixed verse, say Genesis 1:1 or Matt 1:1.<br>
<br>
All this is in the type AccuracyType.<br>
<br>
If you want to change this then you'll have to modify the parsing to remember the separator, assign meaning to it, and have it passed in and throughout AccuracyType.<br>
<br>
I mentioned at the beginning that the parser needs to be rewritten. This needs to be done for the sake of internationalization. A few things need to be adjusted:<br>
1) Allow any character, including meta-characters, especially dash and period, to be part of a book name. Difficulties abound such as: Gen-Exo. Basically, we need an incremental parsing of the book name. As long as the input consumed so far, together with the next character, is still a valid prefix of a book name, keep on going. If it isn't and the next character is not an expected meta-character, then there is an input error.<br>
2) Allow per language BC separators. Currently only period and space are allowed.<br>
3) Allow per language CV separators. Currently only period and colon are allowed.<br>
4) Allow per language range separator. Currently only dash is allowed.<br>
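<br>
The incremental book-name parsing in point 1 might be sketched roughly as below. This is only an idea of the approach, not existing JSword code; the class name, the tiny book list and the dashed name "Xy-Z" are all made up for illustration:<br>
<pre>
```java
// Illustrative sketch only; nothing here exists in JSword.
final class BookNameScanner {
    // Hypothetical book names; "Xy-Z" is invented to show a dash inside a name.
    static final String[] BOOKS = { "Gen", "Exo", "Ps", "Xy-Z" };

    static boolean isPrefixOfAnyBook(String text) {
        for (String book : BOOKS) {
            if (book.startsWith(text)) {
                return true;
            }
        }
        return false;
    }

    // Count how many leading characters of the input can be consumed
    // while the consumed text is still a prefix of some book name.
    static int scan(String input) {
        int end = 0;
        while (end != input.length()) {
            if (!isPrefixOfAnyBook(input.substring(0, end + 1))) {
                break;
            }
            end++;
        }
        return end;
    }
}
```
</pre>
For "Gen-Exo" the scan stops after "Gen", because "Gen-" is not a prefix of any name, so the dash is left to be read as a range character. For "Xy-Z 1" the dash is consumed as part of the name. A real parser would still have to check that what was consumed is a complete name or an accepted abbreviation, and would need to handle case.<br>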
<br>
So, you've presumed that comma and semi-colon are meta-characters. They are not. If we make them such then we need to allow per language variations of them as well.<br>
<br>
The other reason to rewrite the parser is to split it into two parsers, one for osisIDs/osisRefs and one for everything else. Right now the one does double duty. The osisID/osisRef parser would not need any notion of "basis" as every part is BCV.<br>
<br>
Hope this helps.<br>
<br>
In Him,<br>
DM<br>
<div><div class="h5"><br>
On Oct 26, 2012, at 9:57 AM, Chris Burrell &lt;<a href="mailto:chris@burrell.me.uk">chris@burrell.me.uk</a>&gt; wrote:<br>
<br>
> Hi All<br>
><br>
> I've encountered an issue where I'm trying to parse the following reference: "Ps 27:7-14; 28".<br>
><br>
> It seems that JSword is treating the 28 as a verse rather than as a chapter. Previously I had assumed that when separating by commas you would be referencing a verse, but when separating by a semi-colon you would be expressing a chapter.<br>
><br>
> The JSword code doesn't make a difference between the two separators, and therefore assumes here that 28 is a verse. This then causes an issue because Ps 27 doesn't have a verse 28.<br>
><br>
> Presumably introducing a difference between separators of verses, verse ranges and chapters (e.g. comma means between verse, but semi-colon means between chapters) is something fairly fundamental and therefore a change that we wouldn't want to make?<br>
><br>
> Chris<br>
><br>
</div></div>> _______________________________________________<br>
> jsword-devel mailing list<br>
> <a href="mailto:jsword-devel@crosswire.org">jsword-devel@crosswire.org</a><br>
> <a href="http://www.crosswire.org/mailman/listinfo/jsword-devel" target="_blank">http://www.crosswire.org/mailman/listinfo/jsword-devel</a><br>
<br>
</blockquote></div><br></div></div>