FW: [osis-core] character counting issue: proposed solution
Steve DeRose
osis-core@bibletechnologieswg.org
Tue, 25 Jun 2002 12:11:39 -0400
At 06:27 AM -0400 06/21/02, Patrick Durusau wrote:
>Steve,
>
>Can you and Harry (and any one else who has comments on this issue)
>derive a consensus on the solution?
I think if we went with my suggestion from earlier, all we'd have to
do is change 'char' to 'cp' in the schema (and put a "?" for the
+length if it isn't there already), skipping (g) below. The rest goes
in comments or other prose. I'd be more inclined to go for Harry's
suggestion of using just strings, except that string comparison has
some of the same problems as counting in general...
>
>>I'm inclined to suggest:
>>
>>a) change 'character' to 'code point' and explain that it's dumb.
-- by 'dumb', I meant that it only counts code points, so surrogates
and other such stuff may not come out right (except via (b)).
>>
>>b) adopt Harry's method of looking forward upon finding mismatch.
>>
>>c) make the +length optional, and default it to the string length
>>
>>d) state that length 0 is a point selection before the nth char
>>
>>e) state that offsets start at 1 and can't be negative to count backwards.
>>
>>f) state what happens if the offset or length goes beyond the
>>content of the referenced element we're counting in. Just copy the
>>xpointer rules on this, I suppose (now, if i could only remember
>>what they are...).
>>
>>g) perhaps? make offset optional in which case you get the string. eh.
>>
>>Does that cut a plausible compromise on well-defined counting vs.
>>ease of implementation? Any boundary cases left unspecified?
>>
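To make the boundary cases in (a) and (c)-(g) concrete, here is a
rough sketch in Python, where strings are already indexed by code
point. The function name select() is made up for illustration. Two
things are assumptions rather than anything decided above: an omitted
length is read as "through the end of the string", and going out of
range raises an error (the list defers that to the XPointer rules,
which are not reproduced here). Harry's look-ahead from (b) is not
covered.

```python
def select(text, offset=None, length=None):
    """Select by 'dumb' code point counting (hypothetical helper)."""
    n = len(text)  # Python's len() on str counts code points

    # (g) no offset at all: you get the whole string
    if offset is None:
        if length is not None:
            raise ValueError("a length needs an offset")
        return text

    # (e) offsets start at 1; no negative offsets counting backwards
    if offset < 1:
        raise ValueError("offsets start at 1")
    start = offset - 1

    # (f) ASSUMPTION: out-of-range is an error in this sketch
    if start > n:
        raise ValueError("offset is beyond the content")

    # (c) ASSUMPTION: omitted length means "to the end of the string"
    if length is None:
        length = n - start
    if length < 0:
        raise ValueError("length cannot be negative")
    if start + length > n:
        raise ValueError("selection runs beyond the content")

    # (d) length 0 gives an empty (point) selection before the nth char
    return text[start:start + length]


if __name__ == "__main__":
    print(select("abcdef", 2, 3))
    # (a) a base letter plus a combining accent counts as two code points
    print(len("e\u0301"))
```

Because the counting is by code point, select("e\u0301x", 2, 1) returns
only the combining accent, which is the sense of 'dumb' in (a).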
--
Steve DeRose -- http://www.stg.brown.edu/~sjd
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@speakeasy.net
Backup email: sderose@mac.com, sjd@stg.brown.edu