[osis-core] Lists in Attribute values: final call
Todd Tillinghast
osis-core@bibletechnologieswg.org
Mon, 20 Oct 2003 18:29:10 -0600
Chris,
Assuming xA0 is not whitespace, as Patrick suggests, this would be a good
way to keep spaces and hyphens separate when they are replaced by xA0 and _
respectively.
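
For illustration only (this is not code from the thread): a minimal Python
sketch of the space-to-xA0 idea, assuming list items are tokenized strictly
on the XML S production quoted below. The helper names encode_item,
split_xml_list and decode_item are hypothetical.

```python
import re

# XML 1.0 production S: only these four characters count as white space.
XML_WS = re.compile(r"[\x20\x09\x0D\x0A]+")
NBSP = "\u00A0"  # non-breaking space, absent from production S


def encode_item(value):
    """Hypothetical helper: hide ordinary spaces inside one list item."""
    return value.replace(" ", NBSP)


def split_xml_list(attr_value):
    """Split on the S characters only, so NBSP never divides a token."""
    return [tok for tok in XML_WS.split(attr_value) if tok]


def decode_item(token):
    """Hypothetical helper: restore ordinary spaces for display."""
    return token.replace(NBSP, " ")


items = ["pld:123", "pld:123 567"]
attr = " ".join(encode_item(i) for i in items)
tokens = split_xml_list(attr)
restored = [decode_item(t) for t in tokens]
```

One caveat for this sketch: Python's str.split() with no arguments does
treat xA0 as whitespace, which is why an explicit pattern is used here.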
Todd
> -----Original Message-----
> From: osis-core-admin@bibletechnologieswg.org
> [mailto:osis-core-admin@bibletechnologieswg.org] On Behalf Of Chris Little
> Sent: Monday, October 20, 2003 5:51 PM
> To: osis-core@bibletechnologieswg.org
> Subject: Re: [osis-core] Lists in Attribute values: final call
>
> Patrick,
>
> I don't know if it's necessarily a good idea, as you say, but it looks
> like it would work as a hack. xA is LF. NBSP (xA0) is absent from the
> list.
>
> I'm still waiting for my port to be turned on here at SIL. Network
> support assures me it's on. Sadly, they're wrong. So looks like at least
> another day of sub-33.6 for me.
>
> --Chris
>
>
> On Mon, 20 Oct 2003, Patrick Durusau wrote:
>
> > Chris,
> >
> > You have to go through XML Schema structures to be referred to DataTypes
> > and thence to the XML productions which say:
> >
> > S ::= (#x20 | #x9 | #xD | #xA)+
> >
> > Good idea but defined as whitespace.
> >
> > Hope you are having a great day!
> >
> > Patrick
> >
> > Chris Little wrote:
> > > For those XML gurus in the crowd, I've got a question:
> > >
> > > Do XML's lists use space (0x20) as a delimiter, specifically, or does any
> > > form of whitespace split tokens? And does non-breaking space (0xA0) count
> > > as whitespace in that case?
> > >
> > > This is just a thought, but if we do end up dropping | as a delimiter, we
> > > could suggest that users convert 0x20 to 0xA0, provided that the latter
> > > does not divide tokens. On the encoding side, it will require extra
> > > effort, but on the rendering side, it can be left alone. You can even
> > > type non-breaking spaces using option+space--provided your OS is advanced
> > > enough to support the option key. ;)
> > >
> > > --Chris
> > >
> > > On Mon, 20 Oct 2003, Scribe wrote:
> > >
> > >
> > >>While I conceded to making use of ' ' as a separator for a short time (~10
> > >>hours), I believe I also would like to recant that concession. There are
> > >>just too many cases where a ' ' might occur in the data.
> > >>
> > >>I have always thought choosing a ' ' for a list delimiter was a silly
> > >>thing. I think the XML group will also feel the same in a future version
> > >>of the XML spec.
> > >>
> > >>I don't like the idea of forcing USERS to modify data to meet the list
> > >>requirement. Patrick has a good point about user error (I realize not all
> > >>documents will be hand edited by users and I'm sure that will be pointed
> > >>out). If I had to pick a ' ' or '|' to be least likely in the data, '|'
> > >>for my money. There are no tools that I know of that do anything useful
> > >>with attribute lists based on spaces (I believe Patrick may have alluded
> > >>to one in the message below). It's easy for me to change (already done
> > >>actually) my code to look for a '|' to separate my list.
> > >>
> > >>I realize that following this logic might lead one to conclude that I
> > >>should just as soon favour changing all lists to use '|', then. Well,
> > >>actually, I'd be fine with that. Maybe it would speak loud enough to
> > >>accelerate the change of the XML spec, or maybe I'm being arrogant again.
> > >>I always get progress and the latter mixed up ;)
> > >>
> > >> -Troy.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>On Mon, 20 Oct 2003, Patrick Durusau wrote:
> > >>
> > >>
> > >>>Greetings!
> > >>>
> > >>>Well we have spilled a lot of ink, errr, electrons on this one!
> > >>>
> > >>>At the heart of the dispute seems to me to be how one declares and
> > >>>treats lists in XML attribute values.
> > >>>
> > >>>From an XML standpoint, it is really quite simple, if you want a list
> > >>>in an attribute value, it is a space delimited list and that excludes
> > >>>any values in the list that have spaces. End of discussion.
> > >>>
> > >>>On the other hand, the no white space in the values is an arbitrary
> > >>>limitation of XML lists, which may not conform to the data that we wish
> > >>>to store in such lists.
> > >>>
> > >>>Now the argument can be made (and has been made) that we can reform the
> > >>>values that are to be placed in such lists (substitute underscores,
> > >>>etc.) for the values as seen by a user entering the text.
> > >>>
> > >>>The major problem with the reformation argument is that I tend to type
> > >>>what I am familiar with more accurately and consistently than I do when
> > >>>I try to conform to an unfamiliar practice. Even when I know I should be
> > >>>using an underscore or some other character, I will slip, and if the
> > >>>prefix is optional, there is no XML error to alert me to the mistake.
> > >>>(That is, if pld:123 is valid and pld:123_567 is valid, pld:123 567
> > >>>should not be. I don't have a prefix on 567, and actually there should
> > >>>not be one because I really meant: pld:123_567.)
> > >>>
> > >>>Now, using that same example, I can also write a list as
> > >>>"pld:123|pld:123 567" because I am not using the XML list mechanism and
> > >>>can have spaces, so long as the separator does not otherwise appear in
> > >>>the string.
> > >>>
> > >>>I can even validate that expression by requiring the "|" symbol between
> > >>>the parts of the list, thus:
> > >>>
> > >>><xs:pattern value="(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?(\|(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?)?"/>
> > >>>
> > >>>Yeah, ugly isn't it?
> > >>>
> > >>>The point of all this being that we are faced with two ways to handle
> > >>>lists in attribute values:
> > >>>
> > >>>1. XML list (white space delimited)
> > >>>
> > >>>2. Delimited by some other separator (in the example, the pipe "|" sign)
> > >>>
> > >>>Either way, the list must be processed by software to do more than find
> > >>>something is in the list. So the question is: Does it really make any
> > >>>difference to an application whether it splits on the "|" or on a white
> > >>>space?
> > >>>
> > >>>My sympathies are with the XML method but I do now know that there are
> > >>>POS values (in modern Hebrew) that do have spaces.
> > >>>
> > >>>Could take the path of saying that data has to be reformed to meet our
> > >>>specifications but that introduces user error.
> > >>>
> > >>>Where I am coming out on this is that I don't see the benefit of
> > >>>following the whitespace protocol of the XML standard. Won't be
> > >>>processed meaningfully by an XML parser anyway so I am not sure what that
> > >>>gets us for these cases.
> > >>>
> > >>>Note that I am aware of the uses of list where you have an enumerated
> > >>>set of values to validate against an attribute value restriction, but so
> > >>>far as I know, no one has proposed such a set for any of these
> > >>>attributes. That would be a case for making it a list but I would be
> > >>>real leery of saying that everyone had to use our names for their
> > >>>linguistic categories.
> > >>>
> > >>>Got to run, have to eat my snack and jump into a conference call on
> > >>>OpenOffice.
> > >>>
> > >>>Will try to make the rounds this afternoon so we can get back on schedule.
> > >>>
> > >>>Hope everyone is in good health and spirits!
> > >>>
> > >>>Patrick
> > >>>
> > >>>
> > >>
> > >>_______________________________________________
> > >>osis-core mailing list
> > >>osis-core@bibletechnologieswg.org
> > >>http://www.bibletechnologieswg.org/mailman/listinfo/osis-core
> > >>