[osis-core] New osisRef regex

Patrick Durusau osis-core@bibletechnologieswg.org
Sun, 08 Jun 2003 17:12:17 -0400


Guys,

Just to walk through the new osisRef regex (tonight's release will 
probably still be an alpha, sorry, but it will have the regexes, and I 
will try to have the time regex in place as well):

Present:

1. (((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)? - optional work (note: only at the beginning of the osisRef)

2. ((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)? - our usual letter/number citation

3. (@(cp:\[(\p{Nd})*\]|s:\[(\p{L}|\p{N}|\s)*\]))? - our cp or s operator

4. (\-(((\p{L}|\p{N}|_)*)((\.(\p{L}|\p{N}|_)+)*)) - second letter/number citation (note, no work prefix)

5. (@(cp:\[(\p{Nd})*\]|s:\[(\p{L}|\p{N}|\s)*\]))?)? - assuming no typos, the same cp or s operator
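
As a rough illustration of how the five pieces fit together, here is a 
minimal Python sketch. Python's built-in re module has no \p{...} 
classes, so \w, [^\W_] and \d stand in for (\p{L}|\p{N}|_), 
(\p{L}|\p{N}) and \p{Nd}. Those stand-ins, the constant names, and 
is_osis_ref are assumptions for illustration, not part of the schema.

```python
import re

# Approximations for Python's built-in re module, which lacks \p{...}:
#   \w      stands in for (\p{L}|\p{N}|_)
#   [^\W_]  stands in for (\p{L}|\p{N})
#   \d      stands in for \p{Nd}
SEG = r"\w"
WORK = rf"(?:{SEG}+(?:\.{SEG}+)*:)?"                  # 1. optional work prefix
CITE = rf"{SEG}+(?:\.{SEG}+)*"                        # 2. letter/number citation
GRAIN = r"(?:@(?:cp:\[\d*\]|s:\[(?:[^\W_]|\s)*\]))?"  # 3 and 5. cp or s operator
RANGE_END = rf"{SEG}*(?:\.{SEG}+)*"                   # 4. second citation, no work prefix

OSIS_REF_PRESENT = re.compile(rf"{WORK}{CITE}{GRAIN}(?:-{RANGE_END}{GRAIN})?")


def is_osis_ref(text: str) -> bool:
    # XML Schema patterns must match the whole value, hence fullmatch.
    return OSIS_REF_PRESENT.fullmatch(text) is not None
```

With this sketch, "Bible.KJV:Gen.1.1-Gen.1.5" is accepted, while 
"Gen.1.1-KJV:Gen.1.5" is rejected because the second citation takes no 
work prefix.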


Looks to me like the ref extension, 
(!((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?)?, conceptually should go 
at the end of 2 and 4, so that it precedes the cp and s operators.

So, rewritten, it becomes:

1. (((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)? - optional work (note: only at the beginning of the osisRef)


2. ((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?(!((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?)? - our usual letter/number citation, with ref extension mechanism


3. (@(cp:\[(\p{Nd})*\]|s:\[(\p{L}|\p{N}|\s)*\]))? - our cp or s operator


4. (\-(((\p{L}|\p{N}|_)*)((\.(\p{L}|\p{N}|_)+)*))(!((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?)? - second letter/number citation (note, no work prefix), with ref extension mechanism


5. (@(cp:\[(\p{Nd})*\]|s:\[(\p{L}|\p{N}|\s)*\]))?)? - assuming no typos, the same cp or s operator


The notes will say that you can use the reference extension, but that 
it can be ignored when going outside of the present document.
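
One way a processor might ignore the extension is simply to drop the 
"!" part before resolving the reference. The sketch below is only an 
assumption about how that could look: strip_extensions is a 
hypothetical helper, and \w approximates (\p{L}|\p{N}|_) because 
Python's built-in re module has no \p{...} classes.

```python
import re

# Hypothetical helper: remove every "!extension" suffix from an osisRef.
# \w approximates (\p{L}|\p{N}|_) in Python's re module.
EXTENSION = re.compile(r"!\w+(?:\.\w+)*")


def strip_extensions(osis_ref: str) -> str:
    # The cp and s operators cannot contain "!", so a plain substitution
    # only ever touches the extension parts.
    return EXTENSION.sub("", osis_ref)
```

For example, strip_extensions("Gen.1.1!note.a@cp:[4]-Gen.1.2!note.b") 
gives "Gen.1.1@cp:[4]-Gen.1.2".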

Or in full:

(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?(!((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?)?(@(cp:\[(\p{Nd})*\]|s:\[(\p{L}|\p{N}|\s)*\]))?(\-(((\p{L}|\p{N}|_)*)((\.(\p{L}|\p{N}|_)+)*))(!((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?)?(@(cp:\[(\p{Nd})*\]|s:\[(\p{L}|\p{N}|\s)*\]))?)?
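
For anyone who wants to try the full expression outside a schema 
validator, here is a hedged Python sketch. It writes each "!" extension 
group as (!...)?, which is an assumption about the intended grouping, 
and it mechanically swaps the \p{...} classes for rough equivalents in 
Python's re module (\w, [^\W_] and \d). The names XSD_PATTERN, 
to_python and is_osis_ref are made up for the example.

```python
import re

# The full expression in XML Schema syntax, one numbered piece per line,
# with each "!" extension group written as (!...)? (an assumption).
XSD_PATTERN = (
    r"(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?"
    r"((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?"
    r"(!((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?)?"
    r"(@(cp:\[(\p{Nd})*\]|s:\[(\p{L}|\p{N}|\s)*\]))?"
    r"(\-(((\p{L}|\p{N}|_)*)((\.(\p{L}|\p{N}|_)+)*))"
    r"(!((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?)?"
    r"(@(cp:\[(\p{Nd})*\]|s:\[(\p{L}|\p{N}|\s)*\]))?)?"
)


def to_python(xsd: str) -> str:
    """Swap Unicode property classes for rough re-module equivalents."""
    return (
        xsd.replace(r"(\p{L}|\p{N}|_)", r"\w")
        .replace(r"(\p{L}|\p{N}|\s)", r"(?:[^\W_]|\s)")
        .replace(r"(\p{Nd})", r"\d")
    )


OSIS_REF = re.compile(to_python(XSD_PATTERN))


def is_osis_ref(text: str) -> bool:
    # XML Schema patterns must match the whole value, hence fullmatch.
    return OSIS_REF.fullmatch(text) is not None
```

Under these assumptions, "Gen.1.1!note@cp:[3]-Gen.1.2!note.b" is 
accepted, while "Gen.1.1!" (an empty extension) and "Gen.1.1@cp:[x]" (a 
non-digit cp value) are rejected.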


Now that is a truly ugly regex! Worthy of a European TEI editor, one 
would say! (Sorry, inside joke that only Steve will understand. That too 
is part of the irony of the foregoing.)

Todd: Have the new osisID and osisRef regexes validated. Going for time 
before dinner, probably early evening for the next alpha release.

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!