[osis-core] annotateRef: Question on whitespace
Patrick Durusau
osis-core@bibletechnologieswg.org
Mon, 15 Sep 2003 10:27:45 -0400
Greetings!
Note that the whitespace issue Todd raised recently has NOT been
resolved. I need comments, suggestions, etc.
From the minutes:
Commentaries: special issues
Need mechanism to indicate the text that is being commented upon.
Decision: new attribute, annotateRef, global, use Ref regex, refers to
work in header (annotateRef a list of osisRefType (which will be global),
space delimited by their nature)
Example:
<p annotateRef="bible.kjv:1Tim.1.1-1Tim.1.5"
annotateType="commentary"><catchWord osisRef="1Tim.1.1@s[Paul an
apostle]">Paul an apostle</catchWord> - Familiarity is to be set aside where
the things of God are concerned. According to the commandment of God
- The authoritative appointment of God the Father. <catchWord
osisRef="1Tim.1.1@s[Our Saviour]">Our Savior</catchWord> - So
styled in many other places likewise, as being the grand orderer of
the whole scheme of our salvation. And Christ our hope - That is, the
author, object, and ground, of all our hope.</p>
<snip>discussion of fix of catchWord moved to separate post</snip>
Decision: In user's manual, deprecate annotateWork.
Todd notes in a subsequent post:
> annotateRef="Esth.4.14@s[It could] John.3.16@s:[gave his] Gen.1.1@s:[and
> the earth]"
>
> Would yield the following whitespace separated tokens
> Esth.4.14@s[It
> could]
> John.3.16@s:[gave
> his]
> Gen.1.1@s:[and
> the
> earth]
>
> rather than what is expected as follows:
> Esth.4.14@s[It could]
> John.3.16@s:[gave his]
> Gen.1.1@s:[and the earth]
I responded:
> Are you saying this is a problem with XML Schema regexes or with the regexes you are using in your application?
>
> Seems to me, without checking, in the middle of something at the moment, that a regex should match
>
> [chars + whitespace]
>
> differently from
>
> [chars + whitespace] [chars + whitespace] [chars + whitespace]
>
> Note that I am not matching whitespace but each entire expression.
>
> Requires better regex handling than simply splitting on whitespace.
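To make the difference concrete, here is a minimal sketch in Python. The TOKEN pattern is a hypothetical simplification I made up for illustration (it is not the real osisRef regex); it matches each entire expression, keeping a bracketed string together, where a plain whitespace split breaks it apart.

```python
import re

# Hypothetical, simplified token pattern (NOT the real osisRef regex):
# a run of characters with no whitespace or brackets, optionally followed
# by one bracketed string that may itself contain spaces.
TOKEN = re.compile(r"[^\s\[\]]+(?:\[[^\]]*\])?")

def split_refs(value):
    """Return the reference tokens, keeping bracketed strings intact."""
    return TOKEN.findall(value)

value = "Esth.4.14@s[It could] John.3.16@s:[gave his] Gen.1.1@s:[and the earth]"

naive = value.split()      # breaks inside the brackets: 7 fragments
aware = split_refs(value)  # one token per reference: 3 tokens

print(naive)
print(aware)
```

The cost is that any consumer of the attribute has to use a matcher like this instead of the whitespace tokenization most tools give you for free.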
Harry says:
> It seems to me that the strings are well enough defined,
> but processing them with standard tools may be harder. For
> example, if XSLT had a function contains-token, you couldn't
> use such a thing if some of the tokens contain whitespace.
Todd's most recent post (and the last traffic on this issue):
> The issue is not what software will do or what we say the rules are but
> the fact that with XML Schema a list is defined to be whitespace
> separated.
>
> It is possible to express in XML Schema that a simple type is a list of
> other simple types that allow whitespace. But a list is defined as
> being whitespace separated. So in practice you must not allow
> whitespace in a simple type you use to make a simple type that is a
> list.
>
> My suggestion is to not allow whitespace in the string portion of the
> @s: grain structure.
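Todd's point can be modelled in a few lines of Python. This is only a rough model of list validation, assuming the value is cut on whitespace first and each piece is then checked against the item type; ITEM is a hypothetical stand-in for an item pattern that permits spaces inside the brackets, not the real osisRef regex.

```python
import re

# Hypothetical item pattern that allows spaces inside [...].
# It stands in for the item type of the list; it is NOT the real osisRef regex.
ITEM = re.compile(r"[^\s\[\]]+(?:\[[^\]]*\])?")

def validate_as_list(value):
    """Rough model of list validation: split on whitespace, then check each item."""
    return [(item, ITEM.fullmatch(item) is not None) for item in value.split()]

# On its own the reference matches the item pattern...
single_ok = ITEM.fullmatch("Esth.4.14@s[It could]") is not None

# ...but as a list value it is cut in two before the pattern is ever applied.
result = validate_as_list("Esth.4.14@s[It could]")

print(single_ok)
print(result)
```

Under this model the space inside the brackets can never reach the item pattern, which is why allowing it in the item type does not help once the type is used in a list.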
Todd: Is your proposal to not allow whitespace in the string portion of
the @s: grain structure only for annotateRef?
Hmmm, I think having whitespace in the string portion of the @s: grain
structure is fairly important in other uses of the osisRef regex.
What is the general feeling: will it be confusing to allow whitespace in
some uses of the regex but not in others?
We could define a separate regex for annotateRef that does not allow the
whitespace.
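As a sketch of what that split might look like, assuming two patterns that differ only in what they accept between the brackets (both GENERAL and ANNOTATE are hypothetical simplifications, not the real osisRef regex):

```python
import re

# Both patterns are hypothetical simplifications, NOT the real osisRef regex.
GENERAL = re.compile(r"[^\s\[\]]+(?:\[[^\]]*\])?")     # spaces allowed in [...]
ANNOTATE = re.compile(r"[^\s\[\]]+(?:\[[^\]\s]*\])?")  # no whitespace in [...]

def valid_annotate_ref(value):
    """Treat the value as a whitespace-separated list of ANNOTATE items."""
    items = value.split()
    return bool(items) and all(ANNOTATE.fullmatch(i) for i in items)

# A list whose grain strings contain no whitespace survives the split.
ok = valid_annotate_ref("bible.kjv:1Tim.1.1-1Tim.1.5 John.3.16@s:[gave]")

# The same kind of reference with a space in the grain string is rejected.
bad = valid_annotate_ref("John.3.16@s:[gave his]")

# The general pattern would still accept it as a single osisRef.
general_ok = GENERAL.fullmatch("John.3.16@s:[gave his]") is not None

print(ok, bad, general_ok)
```

The trade-off is the one raised above: the same grain string would be valid in osisRef and invalid in annotateRef.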
Suggestions, comments?
Hope everyone is having a great day!
Patrick
--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Topic Maps: Human, not artificial, intelligence at work!