[osis-core] Lists in Attribute values: final call
Patrick Durusau
osis-core@bibletechnologieswg.org
Mon, 20 Oct 2003 10:34:00 -0400
Greetings!
Well, we have spilled a lot of ink, errr, electrons on this one!
The heart of the dispute, it seems to me, is how one declares and
treats lists in XML attribute values.
From an XML standpoint, it is really quite simple: if you want a list
in an attribute value, it is a space-delimited list, and that excludes
any values in the list that have spaces. End of discussion.
On the other hand, the ban on white space within values is an arbitrary
limitation of XML lists, which may not conform to the data that we wish
to store in such lists.
Now the argument can be made (and has been made) that we can reform the
values that are to be placed in such lists (substitute underscores for
spaces, etc.) rather than store the values as seen by a user entering
the text.
The major problem with the reformation argument is that I type what I am
familiar with more accurately and consistently than I do when I try
to conform to an unfamiliar practice. Even when I know I should be using
an underscore or some other character, I will slip, and if the prefix is
optional, there is no XML error to alert me to the mistake. (That is, if
pld:123 is valid and pld:123_567 is valid, then pld:123 567 should not
be. I don't have a prefix on 567, and actually there should not be one,
because I really meant pld:123_567.)
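A minimal sketch of that slip, in Python. The token shape (an optional
"prefix:" followed by letters, digits, "_", "." or "-") and the names
TOKEN and check_xml_list are assumptions made for illustration only, not
anything taken from the schema:

```python
import re

# Assumed token shape: optional "prefix:" then letters, digits, "_", "." or "-".
TOKEN = re.compile(r"(\w+:)?[\w.\-]+")


def check_xml_list(value):
    """Split on white space, as an XML list does, and check every token."""
    tokens = value.split()
    return tokens, all(TOKEN.fullmatch(t) is not None for t in tokens)


print(check_xml_list("pld:123_567"))  # (['pld:123_567'], True)
print(check_xml_list("pld:123 567"))  # (['pld:123', '567'], True)
```

The second call is the typo: it still validates, but as two items rather
than the one I meant.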
Now, using that same example, I can also write a list as
"pld:123|pld:123 567" because I am not using the XML list mechanism and
can have spaces, so long as the separator does not otherwise appear in
the string.
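As a rough illustration (Python, with hypothetical function names), the
same value splits differently under the two conventions:

```python
def split_pipe_list(value, sep="|"):
    """Split on an explicit separator; spaces inside an item survive."""
    return value.split(sep)


def split_xml_list(value):
    """Split on runs of white space, the way an XML list is tokenized."""
    return value.split()


value = "pld:123|pld:123 567"
print(split_pipe_list(value))  # ['pld:123', 'pld:123 567']
print(split_xml_list(value))   # ['pld:123|pld:123', '567']
```

The pipe version only works on the stated condition that "|" never
occurs inside an item.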
I can even validate that expression by requiring the "|" symbol between
the parts of the list, thus:
<xs:pattern
value="(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?(\|(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?)?"/>
Yeah, ugly isn't it?
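An application could make the same check with less pain by splitting
first and testing each item. The sketch below is Python and is only an
approximation: Python's re module has no \p{...} classes, so \w stands
in for \p{L}|\p{N}|_ and a literal space stands in for \p{Zs}. The names
ITEM and is_valid_pipe_list are hypothetical. Unlike the pattern above,
which as I read it allows at most one "|", this accepts any number of
items.

```python
import re

# One item: prefix, optional ".x" suffix on the prefix, ":", then the value.
# \w approximates \p{L}|\p{N}|_ ; the literal space approximates \p{Zs}.
ITEM = re.compile(r"\w+(\.\w)?:[\w .\-]*")


def is_valid_pipe_list(value):
    """True if every "|"-separated item has a prefix and an allowed value."""
    return all(ITEM.fullmatch(item) is not None for item in value.split("|"))


print(is_valid_pipe_list("pld:123|pld:123 567"))  # True
print(is_valid_pipe_list("pld:123|567"))          # False
```

The second value fails because 567 has no prefix, which is exactly the
error the white space list could not catch.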
The point of all this being that we are faced with two ways to handle
lists in attribute values:
1. XML list (white space delimited)
2. Delimited by some other separator (in the example, the pipe "|" sign)
Either way, the list must be processed by software to do more than find
that something is in the list. So the question is: Does it really make
any difference to an application whether it splits on the "|" or on
white space?
My sympathies are with the XML method, but I do now know that there are
POS values (in modern Hebrew) that do have spaces.
We could take the path of saying that data has to be reformed to meet
our specifications, but that introduces user error.
Where I am coming out on this is that I don't see the benefit of
following the whitespace protocol of the XML standard. The list won't be
processed meaningfully by an XML parser anyway, so I am not sure what
that gets us for these cases.
Note that I am aware of the uses of a list where you have an enumerated
set of values to validate against an attribute value restriction, but so
far as I know, no one has proposed such a set for any of these
attributes. That would be a case for making it a list, but I would be
real leery of saying that everyone had to use our names for their
linguistic categories.
Got to run, have to eat my snack and jump into a conference call on
OpenOffice.
Will try to make the rounds this afternoon so we can get back on schedule.
Hope everyone is in good health and spirits!
Patrick
--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Topic Maps: Human, not artificial, intelligence at work!