[osis-core] Lists in Attribute values: final call

Patrick Durusau osis-core@bibletechnologieswg.org
Mon, 20 Oct 2003 10:34:00 -0400


Greetings!

Well, we have spilled a lot of ink, errr, electrons on this one!

The heart of the dispute seems to me to be how one declares and
treats lists in XML attribute values.

From an XML standpoint, it is really quite simple: if you want a list
in an attribute value, it is a space-delimited list, and that excludes
any values in the list that have spaces. End of discussion.

On the other hand, barring white space inside the values is an arbitrary
limitation of XML lists, which may not conform to the data that we wish
to store in such lists.
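
To make the exclusion concrete, here is a minimal sketch in Python of how
a white-space-delimited list gets split. The function name is made up for
illustration, and str.split() only approximates XML list handling (it also
splits on white space characters that XML does not treat as separators).
The identifiers are borrowed from the pld: example used later in this
message.

```python
def xml_list_items(attr_value):
    """Split an attribute value roughly the way an XML list type does."""
    # With no argument, str.split() splits on runs of white space and
    # ignores leading and trailing white space.
    return attr_value.split()


# Intended as two items: "pld:123" and "pld:123 567".
print(xml_list_items("pld:123 pld:123 567"))
# prints ['pld:123', 'pld:123', '567']
```

The second item comes back as two, which is exactly why values containing
spaces cannot live in such a list.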

Now the argument can be made (and has been made) that we can reform the
values that are to be placed in such lists (substitute underscores,
etc.) rather than store the values as seen by a user entering the text.

The major problem with the reformation argument is that I type what I
am familiar with more accurately and consistently than I do when I try
to conform to an unfamiliar practice. Even when I know I should be using
an underscore or some other character, I will slip, and if the prefix is
optional, there is no XML error to alert me to the mistake. (That is, if
pld:123 is valid and pld:123_567 is valid, then pld:123 567 should not
be, yet it passes as a two-item list. I don't have a prefix on 567, and
there should not be one, because I really meant pld:123_567.)
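
The silent failure can be sketched in Python. The item syntax below
(ITEM) and the function name are assumptions made for illustration, not
the OSIS pattern: an optional "prefix:" followed by letters, digits,
underscores, dots, or hyphens.

```python
import re

# Hypothetical item syntax with an OPTIONAL prefix. Not the OSIS pattern.
ITEM = re.compile(r"(?:\w+:)?[\w.\-]+")


def validate_xml_list(attr_value):
    """Return the white-space-separated items if all match ITEM, else None."""
    items = attr_value.split()
    if all(ITEM.fullmatch(item) for item in items):
        return items
    return None


print(validate_xml_list("pld:123_567"))  # prints ['pld:123_567']
print(validate_xml_list("pld:123 567"))  # prints ['pld:123', '567']
```

Both values validate. The second one is the slip: it is accepted as two
items, so nothing tells the person typing that the underscore is missing.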

Now, using that same example, I can also write a list as 
"pld:123|pld:123 567" because I am not using the XML list mechanism and 
can have spaces, so long as the separator does not otherwise appear in 
the string.

I can even validate that expression by requiring the "|" symbol between 
the parts of the list, thus:

<xs:pattern 
value="(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?(\|(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?)?"/>

Yeah, ugly isn't it?
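
For comparison, a sketch of the same check done in application code
rather than in the schema. This is Python, not part of the OSIS schema,
and it only approximates the pattern above: Python's re module has no
\p{...} classes, so \w stands in for \p{L}|\p{N}|_ and a plain space
stands in for \p{Zs}. The function name is hypothetical. Note one
deliberate difference: the pattern as written ends its repeated group
with "?", so it admits at most two items, while this sketch accepts any
number.

```python
import re

# One list item: prefix (optionally with a dot and one more character),
# a colon, then a body that may contain spaces, dots, and hyphens.
ITEM = re.compile(r"\w+(?:\.\w)?:[\w .\-]*")


def parse_pipe_list(attr_value):
    """Split on "|" and return the items; raise ValueError on a bad item."""
    items = attr_value.split("|")
    for item in items:
        if not ITEM.fullmatch(item):
            raise ValueError(f"not a valid list item: {item!r}")
    return items


print(parse_pipe_list("pld:123|pld:123 567"))
# prints ['pld:123', 'pld:123 567']
```

Here the prefix is required, so an item such as "567" on its own is
rejected instead of slipping through.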

The point of all this being that we are faced with two ways to handle 
lists in attribute values:

1. XML list (white space delimited)

2. Delimited by some other separator (in the example, the pipe "|" sign)

Either way, the list must be processed by software to do more than find
that something is in the list. So the question is: does it really make
any difference to an application whether it splits on the "|" or on
white space?

My sympathies are with the XML method, but I now know that there are
POS values (in modern Hebrew) that do have spaces.

We could take the path of saying that data has to be reformed to meet
our specifications, but that introduces user error.

Where I am coming out on this is that I don't see the benefit of
following the whitespace protocol of the XML standard. These lists won't
be processed meaningfully by an XML parser anyway, so I am not sure what
that gets us for these cases.

Note that I am aware of the use of a list where you have an enumerated
set of values to validate an attribute value against, but so far as I
know, no one has proposed such a set for any of these attributes. That
would be a case for making it a list, but I would be real leery of
saying that everyone had to use our names for their linguistic
categories.

Got to run, have to eat my snack and jump into a conference call on 
OpenOffice.

Will try to make the rounds this afternoon so we can get back on schedule.

Hope everyone is in good health and spirits!

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!