[osis-core] Regex News!

Troy A. Griffitts osis-core@bibletechnologieswg.org
Thu, 12 Feb 2004 13:25:13 -0700


Chris,
	Thank you for the concern.

	My position on the conversion of ' ' to NBSP is still the same as when 
we discussed the issue in MI:

	XML is meant to be a human readable/editable format.
	NBSP in most regular text editors will probably not show up any 
differently from an ordinary space and will mislead humans, so I don't 
like forcing the conversion.

	That said, if I remember correctly, we were talking then about using 
it all over the body text (I can't remember the exact case for which it 
was proposed).

	Forcing it in osisIDType and similar _should_ be isolated and less 
common, and IMO completely reversible.  I'm not sure why anyone would 
ever have an NBSP in their morph codes, etc., unless it was for 
formatting reasons, which we shouldn't really care about, so we should be 
able to toggle back and forth as we see fit.
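
	To make the reversibility point concrete, here is a rough sketch in 
Python.  The function names and the sample codes are made up for 
illustration; nothing here comes from the schema itself.  It only shows 
that the 0x20 -> 0xA0 translation round-trips exactly when the original 
code contains no NBSP of its own.

```python
SPACE = "\u0020"  # ordinary space, XML whitespace
NBSP = "\u00a0"   # non-breaking space, not XML whitespace


def spaces_to_nbsp(code):
    """Translate every 0x20 to 0xA0 (hypothetical helper)."""
    return code.replace(SPACE, NBSP)


def nbsp_to_spaces(code):
    """Translate every 0xA0 back to 0x20 (hypothetical helper)."""
    return code.replace(NBSP, SPACE)


def round_trips(code):
    """True if translating there and back returns the original code."""
    return nbsp_to_spaces(spaces_to_nbsp(code)) == code


# A code with only ordinary spaces survives the round trip.
print(round_trips("V PAI 3S"))
# A code that already held an NBSP comes back with a plain space instead.
print(round_trips("V\u00a0PAI 3S"))
```

	So the translation only loses information for codes that already 
carry an NBSP, which is the case I don't expect to see in practice.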

	I can't think of any system I have that includes spaces in its 
codes.

	I don't have a better solution.

	So, I think the scales tip toward my being satisfied, especially since 
the change it comes with lets me get caught up on my work.

	Thanks again for remembering my concern from MI,
		-Troy.



Chris Little wrote:
> Right, but the question is whether this will suffice for Troy, or whether 
> a translation of 0x20 to 0xA0 (that may not necessarily be reversible) 
> will be objected to.
> 
> --Chris
> 
> On Wed, 11 Feb 2004, Patrick Durusau wrote:
> 
> 
>>Chris,
>>
>>Should not be a problem since non-breaking space is by definition not 
>>XML 1.0 whitespace.
>>
>>From the spec:
>>
>>http://www.w3.org/TR/2004/REC-xml-20040204/#NT-S
>>
>>S   ::=   (#x20 | #x9 | #xD | #xA)+
>>
>>Hope you are having a great day!
>>
>>Patrick
>>
>>Chris Little wrote:
>>
>>>Patrick,
>>>
>>>Sounds okay, but I'll go ahead and play the devil's advocate (read: look 
>>>at things from Troy's position)...
>>>
>>>I think one of Troy's desires was to be able to encode _anything_ as a 
>>>valid osisGenRef, and I presume this change is partly/mostly intended to 
>>>placate him.  Since spaces are part of some morphology codes, how should 
>>>he encode those?  Non-breaking space?
>>>
>>>If so, Troy, does that work for you?
>>>
>>>--Chris
>>>
>>>Patrick Durusau wrote:
>>>
>>>
>>>>Greetings!
>>>>
>>>>The new addition to the regexes reads as follows:
>>>>
>>>>|(\\[^\s])
>>>>
>>>>This means that any single character (excluding whitespace, but 
>>>>including all of Unicode, and therefore the PUA) can be used in any 
>>>>of the OSIS regex expressions.
>>>>
>>>>The character must be preceded by a "\" if it is one of the ones we 
>>>>have reserved for use in ID or REF syntax.
>>>>
>>>>Those characters are: ".", ":", "!", "[", "]", "@", "-" and "\".
>>>>
>>>>This is only for the portion following the prefix, which is terminated 
>>>>by a ":".
>>>>
>>>>All applications are required to recognize the "\" as an escape 
>>>>character applying to the single character that follows it.
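
The escape rule above could be handled along these lines.  This is only 
a sketch in Python, not the OSIS implementation: the helper names are 
hypothetical and "N-NSM" is just an illustrative code.  One thing to 
watch is that \s in XML Schema regexes covers only #x20, #x9, #xA and 
#xD, while Python's \s also matches U+00A0, so the character class is 
spelled out explicitly to keep NBSP usable after a "\".

```python
import re

# Reserved characters listed in the announcement; each needs a leading "\".
RESERVED = set('.:![]@-\\')

# Mirrors (\\[^\s]) with XML Schema's meaning of \s (space, tab, LF, CR).
ESCAPED_CHAR = re.compile(r'\\([^ \t\n\r])')


def escape_segment(text):
    """Prefix every reserved character with a backslash (hypothetical helper)."""
    return ''.join('\\' + ch if ch in RESERVED else ch for ch in text)


def unescape_segment(text):
    """Drop the backslash in front of each escaped character."""
    return ESCAPED_CHAR.sub(r'\1', text)


escaped = escape_segment('N-NSM')
print(escaped)
print(unescape_segment(escaped) == 'N-NSM')
```

This applies only to the part after the prefix, so a caller would split 
off everything up to the first unescaped ":" before using either helper.
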
>>>>
>>>>Hope everyone is having a great day!
>>>>
>>>>Patrick
>>>>
>>>
>>>_______________________________________________
>>>osis-core mailing list
>>>osis-core@bibletechnologieswg.org
>>>http://www.bibletechnologieswg.org/mailman/listinfo/osis-core
>>>
>>
>>
>>