[osis-core] Lists in Attribute values: final call

Patrick Durusau osis-core@bibletechnologieswg.org
Mon, 20 Oct 2003 18:52:41 -0400


Chris,

You have to go through XML Schema structures to be referred to DataTypes 
and thence to the XML productions which say:

S   ::=   (#x20 | #x9 | #xD | #xA)+

Good idea but defined as whitespace.
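
Read literally, that production lists only four characters, and #xA0 is not one of them. Here is a minimal Python sketch of what splitting on S looks like; the name split_xml_list is my own for illustration and does not come from any spec or tool:

```python
import re
from typing import List

# XML's S production: one or more of space, tab, carriage return, line feed.
XML_WS = re.compile(r"[\x20\x09\x0D\x0A]+")

def split_xml_list(value: str) -> List[str]:
    """Split an attribute value on runs of XML whitespace, dropping empty tokens."""
    return [tok for tok in XML_WS.split(value) if tok]

# U+00A0 (no-break space) is not in S, so it stays inside the second token.
tokens = split_xml_list("pld:123 pld:123\u00a0567")
```

Note that Python's bare str.split() treats U+00A0 as whitespace, so it would not give the same answer; the explicit character class is what matches the production.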

Hope you are having a great day!

Patrick

Chris Little wrote:
> For those XML gurus in the crowd, I've got a question:
> 
> Do XML's lists use space (0x20) as a delimiter, specifically, or does any 
> form of whitespace split tokens?  And does non-breaking space (0xA0) count 
> as whitespace in that case?
> 
> This is just a thought, but if we do end up dropping | as a delimiter, we 
> could suggest that users convert 0x20 to 0xA0, provided that the latter 
> does not divide tokens.  On the encoding side, it will require extra 
> effort, but on the rendering side, it can be left alone.  You can even 
> type non-breaking spaces using option+space--provided your OS is advanced 
> enough to support the option key. ;)
> 
> --Chris
> 
> On Mon, 20 Oct 2003, Scribe wrote:
> 
> 
>>While I conceded to making use of ' ' as a separator for a short time (~10 
>>hours), I believe I also would like to recant that concession.  There are 
>>just too many cases where a ' ' might occur in the data.
>>
>>I have always thought choosing a ' ' for a list delimiter was a silly
>>thing.  I think the XML group will also feel the same in a future version
>>of the XML spec.
>>
>>I don't like the idea of forcing USERS to modify data to meet the list 
>>requirement.  Patrick has a good point about user error (I realize not all 
>>documents will be hand edited by users and I'm sure that will be pointed 
>>out).  If I had to pick a ' ' or '|' to be least likely in the data, '|' 
>>for my money.  There are no tools that I know of that do anything useful 
>>with attribute lists based on spaces (I believe Patrick may have alluded 
>>to one in the message below).  It's easy for me to change (already done 
>>actually) my code to look for a '|' to separate my list.
>>
>>I realize that following this logic might lead one to conclude that I 
>>should just as soon favour changing all lists to use '|', then.  Well, 
>>actually, I'd be fine with that.  Maybe it would speak loud enough to 
>>accelerate the change of the XML spec, or maybe I'm being arrogant again.  
>>I always get progress and the latter mixed up ;)
>>
>>	-Troy.
>>
>>
>>
>>
>>
>>On Mon, 20 Oct 2003, Patrick Durusau wrote:
>>
>>
>>>Greetings!
>>>
>>>Well we have spilled a lot of ink, errr, electrons on this one!
>>>
>>>At the heart of the dispute seems to me to be how one declares and 
>>>treats lists in XML attribute values.
>>>
>>>From an XML standpoint, it is really quite simple: if you want a list 
>>>in an attribute value, it is a space delimited list and that excludes 
>>>any values in the list that have spaces. End of discussion.
>>>
>>>On the other hand, the ban on white space in the values is an arbitrary 
>>>limitation of XML lists, which may not conform to the data that we wish 
>>>to store in such lists.
>>>
>>>Now the argument can be made (and has been made) that we can reform the 
>>>values that are to be placed in such lists (substitute underscores, 
>>>etc.) for the values as seen by a user entering the text.
>>>
>>>The major problem with the reformation argument is that I type what I 
>>>am familiar with more accurately and consistently than I do when I try 
>>>to conform to an unfamiliar practice. Even when I know I should be using 
>>>an underscore or some other character, I will slip, and if the prefix is 
>>>optional, there is no XML error to alert me to the mistake. (That is, if 
>>>pld:123 is valid and pld:123_567 is valid, then pld:123 567 should not 
>>>be. I don't have a prefix on 567, and actually there should not be one, 
>>>because I really meant pld:123_567.)
>>>
>>>Now, using that same example, I can also write a list as 
>>>"pld:123|pld:123 567" because I am not using the XML list mechanism and 
>>>can have spaces, so long as the separator does not otherwise appear in 
>>>the string.
>>>
>>>I can even validate that expression by requiring the "|" symbol between 
>>>the parts of the list, thus:
>>>
>>><xs:pattern 
>>>value="(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?(\|(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?)?"/>
>>>
>>>Yeah, ugly isn't it?
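
For anyone who wants to try the idea outside a schema validator, here is a rough Python sketch loosely modelled on the pattern above. It is an approximation, not a translation: the standard re module has no \p{...} classes, so \w stands in for \p{L}|\p{N}|_ and a literal space stands in for \p{Zs}. It also allows any number of "|"-separated items, where the pattern above, as I read it, allows at most two. The names ITEM, PIPE_LIST and is_valid_pipe_list are made up for the example.

```python
import re

# One item: prefix, optional ".ext", a colon, then a value that may contain spaces.
# \w approximates \p{L}|\p{N}|_ and " " approximates \p{Zs} (assumptions, see above).
ITEM = r"\w+(?:\.\w+)?:[\w .\-]*"

# A list is one item followed by zero or more "|item" repetitions.
PIPE_LIST = re.compile(rf"{ITEM}(?:\|{ITEM})*")

def is_valid_pipe_list(value: str) -> bool:
    """Return True if the whole value is a '|'-separated list of prefixed items."""
    return PIPE_LIST.fullmatch(value) is not None

ok = is_valid_pipe_list("pld:123|pld:123 567")   # True: space allowed inside a value
bad = is_valid_pipe_list("pld:123|567")          # False: second item has no prefix
```

Because every item must carry its prefix and colon, a dropped prefix is reported as an error instead of slipping through.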
>>>
>>>The point of all this being that we are faced with two ways to handle 
>>>lists in attribute values:
>>>
>>>1. XML list (white space delimited)
>>>
>>>2. Delimited by some other separator (in the example, the pipe "|" sign)
>>>
>>>Either way, the list must be processed by software to do more than find 
>>>that something is in the list. So the question is: Does it really make 
>>>any difference to an application whether it splits on the "|" or on 
>>>white space?
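
To put the question in concrete terms, here is a small Python sketch of an application-side splitter that handles either convention. The function name split_attr_list and its delimiter parameter are hypothetical, chosen only for this illustration:

```python
import re
from typing import List, Optional

XML_WS_CHARS = " \t\r\n"  # the four characters of XML's S production

def split_attr_list(value: str, delimiter: Optional[str] = None) -> List[str]:
    """Split an attribute value into items.

    delimiter=None: XML-list style, split on runs of XML whitespace.
    Otherwise: split on the given delimiter and keep spaces inside items.
    """
    if delimiter is None:
        parts = re.split(r"[ \t\r\n]+", value)
    else:
        parts = value.split(delimiter)
    # Trim surrounding whitespace and drop empty items in both modes.
    return [p.strip(XML_WS_CHARS) for p in parts if p.strip(XML_WS_CHARS)]

value = "pld:123|pld:123 567"
by_pipe = split_attr_list(value, "|")  # ['pld:123', 'pld:123 567']
by_space = split_attr_list(value)      # ['pld:123|pld:123', '567']
```

The code cost is the same either way; the difference is only in which values survive the split intact.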
>>>
>>>My sympathies are with the XML method but I do now know that there are 
>>>POS values (in modern Hebrew) that do have spaces.
>>>
>>>Could take the path of saying that data has to be reformed to meet our 
>>>specifications but that introduces user error.
>>>
>>>Where I am coming out on this is that I don't see the benefit of 
>>>following the whitespace protocol of the XML standard. It won't be 
>>>processed meaningfully by an XML parser anyway, so I am not sure what 
>>>that gets us for these cases.
>>>
>>>Note that I am aware of the uses of list where you have an enumerated 
>>>set of values to validate against an attribute value restriction, but so 
>>>far as I know, no one has proposed such a set for any of these 
>>>attributes. That would be a case for making it a list but I would be 
>>>real leery of saying that everyone had to use our names for their 
>>>linguistic categories.
>>>
>>>Got to run, have to eat my snack and jump into a conference call on 
>>>OpenOffice.
>>>
>>>Will try to make the rounds this afternoon so we can get back on schedule.
>>>
>>>Hope everyone is in good health and spirits!
>>>
>>>Patrick
>>>
>>>
>>
>>_______________________________________________
>>osis-core mailing list
>>osis-core@bibletechnologieswg.org
>>http://www.bibletechnologieswg.org/mailman/listinfo/osis-core
>>
> 
> 


-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!