[osis-core] Long, ugly but it works!
Patrick Durusau
osis-core@bibletechnologieswg.org
Mon, 03 Jun 2002 14:51:29 -0400
--------------010109040102090604040200
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Guys,
New regex! With a test file!
It incorporates Steve's request that we allow a range of grains on a ref.
(Grains can be char offsets or some other syntax.)
Note that I am trying to validate the syntax of a reference separately
from the content of a reference. I think those are two separate issues.
This saves us from having to specify references for all of the classical
authors, for example, for which I don't think we have a good set of
possible references.
Content validation can then be built on top of syntax validation.
In other words: is this valid syntax? If yes, proceed to
validate against the content.
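As a rough illustration of that two-step idea (syntax first, then
content), here is a minimal Python sketch. It is not the schema pattern
itself: XSD regular expressions have no direct Python equivalent for \c,
so the pattern below is a hypothetical, stricter transliteration
(segments exclude ".", "@", "-" and parentheses, and offsets need at
least one digit). The names OSIS_REF, check_content, validate and the
known_works set are assumptions made up for this example.

```python
import re

# Hypothetical transliteration of the reference grammar, not the XSD pattern.
SEGMENT = r"[^\s.@()-]+"                      # one dotted component, e.g. "Matt"
REF = rf"{SEGMENT}(?:\.{SEGMENT}){{0,6}}"     # up to seven dotted components
# A grain is "@char:start+end(text)" or "@x-type:path(text)".
GRAIN = r"@(?:char:\d+\+\d+|x-[\w.-]+:[^\s()]+)\([^()]*\)"
GRAIN_RANGE = rf"{GRAIN}(?:-{GRAIN})?"        # a grain, optionally ranged to a second grain

OSIS_REF = re.compile(
    rf"(?P<start>{REF})(?P<start_grain>{GRAIN_RANGE})?"
    rf"(?:-(?P<end>{REF})(?P<end_grain>{GRAIN_RANGE})?)?"
)


def check_content(match, known_works):
    """Step 2: the first component of each ref must name a known work."""
    for group in ("start", "end"):
        ref = match.group(group)
        if ref is not None and ref.split(".")[0] not in known_works:
            return False
    return True


def validate(ref, known_works):
    """Step 1 checks syntax only; step 2 runs only if step 1 passes."""
    match = OSIS_REF.fullmatch(ref)
    if match is None:
        return "bad syntax"
    if not check_content(match, known_works):
        return "unknown work"
    return "ok"
```

With known_works set to {"Matt"}, a reference such as "Homer.1.1"
passes the syntax step and is rejected only at the content step, which
is the separation I am after.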
More to follow, probably in the morning.
Patrick
--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu
--------------010109040102090604040200
Content-Type: text/xml;
name="regex2.xml"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="regex2.xml"
<?xml version="1.0" encoding="UTF-8"?>
<text xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="C:\downloads\osis-current\regex\regex2.xsd">
<body>
<p>Matt.1.1-Matt.1.3</p>
<p>Matt.1.1@char:124+130(logos)-@char:135+145(word)</p>
<p>Matt.1.2@char:123+134(logos)</p>
<p>Matt.1.5@char:123+134(logos)-Matt.1.6</p>
<p>Matt.1.5@char:123+134(logos)-Matt.1.6@char:234+236(Uriah)</p>
<p>Matt.1.5-Matt.1.6@char:120+124(now)-@char:150+135(then)</p>
<p>Matt.1.5@x-xpath:\text\div\p\line(Asaph)</p>
<p>Matt.1.5@x-xpath:\text\div\p\line(Asaph)-Matt.1.6</p>
<p>Matt.1.5@x-xpath:\text\div\p\line(Uriah)-Matt.1.6@x-xpath:\text\div\p\line(Asaph)</p>
</body>
</text>
--------------010109040102090604040200
Content-Type: text/plain;
name="regex2.xsd"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="regex2.xsd"
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="unqualified">
<xs:element name="text">
<xs:complexType>
<xs:sequence>
<xs:element ref="body" minOccurs="1" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="body">
<xs:complexType>
<xs:sequence>
<xs:element ref="p" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="p" type="osisRef"/>
<xs:simpleType name="osisRef">
<xs:restriction base="xs:string">
<!-- <xs:pattern value="(([^\s]*\.){0,6}([^\s]*))"/> first part valid -->
<!-- <xs:pattern
value="@((char:(\d*)\+(\d*)\((.*)\))|((x-(\c*):)(.*)\((.*)\)))|(-@((char:(\d*)\+(\d*)\((.*)\))|((x-(\c*):)(.*)\((.*)\))))?"/> -->
<!-- second part valid, remember to make optional with ? -->
<!-- <xs:pattern value="((-(([^\s]*\.){0,6}([^\s]*)))(@((char:(\d*)\+(\d*)\((.*)\))|((x-(\c*):)(.*)\((.*)\)))|(-@((char:(\d*)\+(\d*)\((.*)\))|((x-(\c*):)(.*)\((.*)\))))?)?)?"/> third part valid, combines second ref and grain -->
<!-- Now combine parts 1, 2, and 3: ref1 is required and its grain is optional (in all cases, so you can have a plain ref1-ref2); ref2 is optional, and its grain is optional as well. Note that I have repeated the expression for the second ref, so that you can have ref2 alone or ref2 plus its grain, but not ref1 (or ref1 plus grain1) followed directly by grain2: to list grain2 you must have ref2. This makes the regex a little harder to read but adds a little validation. -->
<xs:pattern value="(([^\s]*\.){0,6}([^\s]*))(@((char:(\d*)\+(\d*)\((.*)\))|((x-(\c*):)(.*)\((.*)\)))|(-@((char:(\d*)\+(\d*)\((.*)\))|((x-(\c*):)(.*)\((.*)\))))?)?((-(([^\s]*\.){0,6}([^\s]*)))(@((char:(\d*)\+(\d*)\((.*)\))|((x-(\c*):)(.*)\((.*)\)))|(-@((char:(\d*)\+(\d*)\((.*)\))|((x-(\c*):)(.*)\((.*)\))))?)?)?"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
--------------010109040102090604040200--