[osis-core] Regex News!
Troy A. Griffitts
osis-core@bibletechnologieswg.org
Thu, 12 Feb 2004 13:25:13 -0700
Chris,
Thank you for the concern.
My position on the conversion of ' ' to NBSP is still the same as when
we discussed the issue in MI:
XML is meant to be a human readable/editable format.
In most regular text editors an NBSP will probably look no different from
an ordinary space and will mislead humans, so I don't like forcing the conversion.
That said, if I remember correctly, I think we were talking about using
it all over in the body text (can't remember the exact case for which it
was proposed).
Forcing it in osisIDType and similar _should_ be isolated and less
common, and IMO completely reversible. I'm not sure why anyone would
ever have an NBSP in their morph codes, etc., unless it was for
formatting issues, which we shouldn't really care about, so we should be
able to toggle back and forth as we see fit.
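
A minimal sketch of that toggle, assuming the codes are handled as Unicode
strings. The function names and the sample code "V PAI 3S" are hypothetical,
made up for illustration. The round trip is only lossless when the original
code contains no NBSP of its own, which is the assumption made above.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE (0xA0); not XML 1.0 whitespace


def to_nbsp(code: str) -> str:
    """Replace every plain space (0x20) with NBSP (0xA0)."""
    return code.replace(" ", NBSP)


def from_nbsp(code: str) -> str:
    """Replace every NBSP (0xA0) with a plain space (0x20)."""
    return code.replace(NBSP, " ")


def is_reversible(code: str) -> bool:
    """True if to_nbsp followed by from_nbsp returns the original code.

    This fails only when the original already contains an NBSP, because
    from_nbsp cannot tell that one apart from a converted space.
    """
    return NBSP not in code


if __name__ == "__main__":
    sample = "V PAI 3S"  # hypothetical morph code containing spaces
    encoded = to_nbsp(sample)
    print(" " in encoded)                # False: no plain spaces remain
    print(from_nbsp(encoded) == sample)  # True: the round trip is lossless
```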
I can't think of any system I have that includes spaces in its codes.
I don't have a better solution.
So, I think the scale leans toward me being satisfied, especially since
the change it comes with lets me get caught up on my work.
Thanks again for remembering my concern from MI,
-Troy.
Chris Little wrote:
> Right, but the question is whether this will suffice for Troy, or whether
> a translation of 0x20 to 0xA0 (that may not necessarily be reversible)
> will be objected to.
>
> --Chris
>
> On Wed, 11 Feb 2004, Patrick Durusau wrote:
>
>
>>Chris,
>>
>>Should not be a problem since non-breaking space is by definition not
>>XML 1.0 whitespace.
>>
>>From the spec:
>>
>>http://www.w3.org/TR/2004/REC-xml-20040204/#NT-S
>>
>>S ::= (#x20 | #x9 | #xD | #xA)+
>>
>>Hope you are having a great day!
>>
>>Patrick
>>
>>Chris Little wrote:
>>
>>>Patrick,
>>>
>>>Sounds okay, but I'll go ahead and play the devil's advocate (read: look
>>>at things from Troy's position)...
>>>
>>>I think one of Troy's desires was to be able to encode _anything_ as a
>>>valid osisGenRef, and I presume this change is partly/mostly intended to
>>>placate him. Since spaces are part of some morphology codes, how should
>>>he encode those? Non-breaking space?
>>>
>>>If so, Troy, does that work for you?
>>>
>>>--Chris
>>>
>>>Patrick Durusau wrote:
>>>
>>>
>>>>Greetings!
>>>>
>>>>The new addition to the regexes reads as follows:
>>>>
>>>>|(\\[^\s])
>>>>
>>>>This means that any single character (excluding whitespace, but
>>>>including all of Unicode, and therefore the PUA) can be used in any of
>>>>the OSIS regex expressions.
>>>>
>>>>The character must be preceded by a "\" if it is one of the ones we have
>>>>reserved for use in ID or REF syntax.
>>>>
>>>>Those characters are: ".", ":", "!", "[", "]", "@", "-" and "\".
>>>>
>>>>This is only for the portion following the prefix, which is terminated
>>>>by a ":".
>>>>
>>>>All applications are required to recognize the "\" as an escape
>>>>character applying to the single character that follows it.
>>>>
>>>>Hope everyone is having a great day!
>>>>
>>>>Patrick
>>>>
>>>
>>>_______________________________________________
>>>osis-core mailing list
>>>osis-core@bibletechnologieswg.org
>>>http://www.bibletechnologieswg.org/mailman/listinfo/osis-core
>>>
>>
>>
>>
>
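
The escaping rule from Patrick's original message in this thread can be
sketched roughly as follows. This is only an illustration, not code from the
OSIS schema: the names escape, unescape, RESERVED and ESCAPED_CHAR are
hypothetical, and splitting off the prefix (the part terminated by ":") is
left out. One assumption worth stating: in XML Schema regexes \s means exactly
space, tab, LF and CR, while Python's \s also matches NBSP in str patterns, so
the character class is spelled out by hand.

```python
import re

# The eight characters reserved for ID and REF syntax, per the message.
RESERVED = set(".:![]@-\\")

# Python equivalent of the schema alternative (\\[^\s]): a backslash
# followed by one character that is not XML whitespace.
ESCAPED_CHAR = re.compile(r"\\[^ \t\n\r]")


def escape(text: str) -> str:
    """Put a backslash in front of every reserved character."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape(text: str) -> str:
    """Drop each escaping backslash and keep the character after it."""
    return re.sub(r"\\([^ \t\n\r])", lambda m: m.group(1), text)


if __name__ == "__main__":
    raw = "x-y:z"
    print(escape(raw))                   # x\-y\:z
    print(unescape(escape(raw)) == raw)  # True
```

Under these assumptions a backslash followed by NBSP matches ESCAPED_CHAR,
while a backslash followed by a plain space does not, which is the case the
rest of the thread discusses.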