[osis-core] osisWork regex: Summary

Patrick Durusau osis-core@bibletechnologieswg.org
Thu, 22 Aug 2002 09:33:00 -0400


Guys,

Summarizing the posts from yesterday in preparation for making schema 
repairs/edits/changes:


On the osisWork regex:

1. Harry and Luke discovered that SoftQuad (XMetal) does not support 
category escapes in regexes.

Q: Should we take non-compliant implementations of XML Schema into 
account in the syntax of osisCore? I have not tested expanding the 
category escapes into the sort of explicit character ranges that XMetal 
apparently recognizes (after all, the range for \p{L} is quite a bit 
larger than [a-z]|[A-Z]). It would certainly be a very ugly expansion.
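
To give a rough sense of how large that expansion would be, here is a 
minimal Python sketch that walks the Unicode database shipped with the 
interpreter and collects every contiguous run of code points whose 
general category starts with "L". The function name letter_ranges is 
made up for illustration, and the counts depend on the Unicode version 
of whatever Python runs it, so treat the output as indicative only.

```python
import sys
import unicodedata


def letter_ranges():
    """Return contiguous (start, end) code point ranges in category L*."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        is_letter = unicodedata.category(chr(cp)).startswith("L")
        if is_letter and start is None:
            start = cp  # a new run of letters begins here
        elif not is_letter and start is not None:
            ranges.append((start, cp - 1))  # the run ended one code point back
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


ranges = letter_ranges()
total = sum(end - start + 1 for start, end in ranges)
print(f"{len(ranges)} ranges covering {total} code points")
```

Each entry in ranges would become one [x-y] alternative in a hand-expanded 
pattern, and the same exercise would have to be repeated for \p{N}.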

2. Chris found that:

>>Here's the regex: ((\p{L}|\p{N}|_)*)((\.(\p{L}|\p{N}|_)+)*)?
>>
>
>If this is the schema as it stands, the first * needs to be changed to a 
>+.  Currently, "" is a valid osisWork.
>
The same criticism applies to osisIDType and osisRefType.

I think this is clearly a bug and should be repaired as indicated by Chris.
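
A quick way to see Chris's point outside a schema validator is the 
Python sketch below. Two assumptions to flag: Python's re module has no 
\p{L} or \p{N}, so \w is used as an approximate stand-in for 
(\p{L}|\p{N}|_), and fullmatch stands in for the implicit anchoring of 
XML Schema patterns. The names ORIGINAL, FIXED and is_valid, and the 
sample value "Bible.KJV", are illustrative only and not taken from the 
schema.

```python
import re

# Approximate stand-in for (\p{L}|\p{N}|_); Python's re lacks category escapes.
WORD = r"\w"

# The pattern as quoted above, and the same pattern with the first * changed to +.
ORIGINAL = re.compile(rf"(({WORD})*)((\.({WORD})+)*)?")
FIXED = re.compile(rf"(({WORD})+)((\.({WORD})+)*)?")


def is_valid(pattern, value):
    """XML Schema patterns must match the whole value, hence fullmatch."""
    return pattern.fullmatch(value) is not None


for value in ["", ".KJV", "Bible", "Bible.KJV"]:
    print(repr(value), is_valid(ORIGINAL, value), is_valid(FIXED, value))
```

Under these assumptions the original pattern accepts "" and the repaired 
one rejects it. As a side effect, the repaired pattern also rejects 
values that begin with a "." such as ".KJV", which the original lets 
through because its first group may be empty.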

Proposed Action:

1. Repair all regexes so that "" is no longer a valid value.

Outstanding Question:

2. How to accommodate (if at all) non-support for category escapes in 
regexes?

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu